The Pith of Performance: Visualizing Variance

The typical presentation of variance in textbooks often looks like this Wikipedia definition. Quite daunting for the non-expert. So, how would you explain the notion of variance to someone who has little or no background in statistics and couldn't easily digest all that gobbledygook?

The Mean

Let's drop back a notch. How would you explain the statistical mean? A common way to do that is to utilize the simple visual device of the "bell curve" belonging to the normal distribution (Fig. 1).

Figure 1. A normal distribution

The normal distribution, $N(x,\mu,\sigma^2)$, is specified by two parameters:

Mean, usually denoted by $\mu$
Variance, usually denoted by $\sigma^2$

that determine (1) the location and (2) the shape of the curve. In Fig. 1, $\mu = 4$. Being a probability density, the curve must be normalized to enclose unit area. Also, since $N(x)$ is unimodal and symmetric about $\mu$, the mean, median and mode are all located at the same position on the $x$-axis. Therefore, it's easy to point to the mean as being the $x$-position of the peak. Anybody can see that immediately. Mission accomplished.

But what about the variance? Where is that in Figure 1?

The Variance Defined

The variance has something to do with the dispersion or spread of the measured data. Technically, the variance, $Var(X)$, of a random variable, $X$, is the spread about the mean defined as: \begin{equation} Var(X) = E( (\mu - X)^2 ) \label{eq:var1} \end{equation} where $E(X) = \mu$ is the expectation value of the random variable, i.e., the average value of $X$. This is the probability-theoretic statement of the more familiar expression for the variance of a set of $N$ data points seen in statistics textbooks: \begin{equation} Var(X) = \dfrac{1}{N} \sum_{i=1}^N (\mu - x_i)^2 \end{equation} The summation over the $N$ independent samples, $x_i$, of the random variable, followed by division by $N$, provides the averaging that corresponds to the expectation in \eqref{eq:var1}. (When $\mu$ is unknown and replaced by the sample mean, textbooks usually divide by $N-1$ instead.) Another familiar expression for the variance is given by: \begin{equation} Var(X) = E(X^2) - E^2(X) \end{equation} which is just a reorganization of the terms in \eqref{eq:var1}.

Any of these mathematical expressions simply formalizes the idea that the spread is represented by a kind of average squared difference between each data point and the mean, viz., a kind of average width. The square ensures that differences below and above the mean cannot cancel, so the variance is never negative. This is true for any data distribution, not just the normal distribution. But since we already used the peak of the normal distribution to visualize the mean, it would also be useful to visualize the corresponding variance of the normal distribution.
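The equivalence of these expressions is easy to check numerically. Here is a minimal sketch in Python, assuming a small made-up data set (the values in `data` are purely illustrative and not taken from this post), with $\mu$ taken to be the mean of the data:

```python
# Illustrative data only; these values are an assumption, not from the post.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mu = sum(data) / n  # mean of the data, standing in for E(X)

# Average squared difference from the mean: (1/N) * sum((mu - x_i)^2)
var_direct = sum((mu - x) ** 2 for x in data) / n

# Reorganized form: E(X^2) - E(X)^2
var_moments = sum(x * x for x in data) / n - mu ** 2

print(mu, var_direct, var_moments)  # 5.0 4.0 4.0
```

Both routes give the same number, as they should, since the second is only an algebraic rearrangement of the first.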

The Variance Visualized

So, the variance of a normal distribution, $Var(X) = \sigma^2$, is identified (through its square root, the standard deviation $\sigma$) with the spread or width of $N(x)$ about its peak location. We can write this width as $\mu~\pm~\sigma$ to indicate that there is spread below the mean ($-\sigma$) as well as above the mean ($+\sigma$). The entire interval is therefore: $\sigma - (-\sigma) = 2\sigma$. But the normal curve (being a curve) has different widths at different heights. So, what height corresponds to the width $2\sigma$ for an arbitrary normal curve?

Figure 2. Width of $N(x)$ at mid height

Since each side of the normal curve has an 'S' shape, a reasonable guess might be the width of the curve midway between its base at the x-axis and the peak of the curve or its maximum, i.e., the width shown as the horizontal line segment in Fig. 2.

Rough Approximation

This guess turns out to be not quite correct, for reasons I'll explain in a minute. Nonetheless, it is a good first approximation. In signal processing and experimental physics, the width of the normal curve at this half-way location is called the full width at half maximum or FWHM, for short. In Fig. 3, you can see that the spread at that height covers a bit more than 2 units on the x-axis. In fact, FWHM $\approx$ 2.35 units.

Figure 3. Corresponding interval measured on x-axis

FWHM corresponds to the entire width at half the peak height or half max height. The standard deviation, however, is defined as the width measured on one side or the other of the mean. In other words, our estimate of the standard deviation (let's denote it by $\hat{\sigma}$) is simply half the FWHM interval: \begin{equation} \hat{\sigma} = \dfrac{\text{FWHM}}{2} = 1.1774 \label{eq:sighat} \end{equation}

The actual value is $\sigma = 1.0000$ (by construction in Fig. 1). So, the approximation in \eqref{eq:sighat} overestimates the correct value by almost 18%.
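These numbers can be reproduced in a few lines. The sketch below assumes the exact FWHM factor $2 \, (2\ln(2))^{1/2}$ quoted in the footnote to this post and the value $\sigma = 1$ from Fig. 1; the variable names are my own choice.

```python
import math

sigma = 1.0  # true standard deviation, as in Fig. 1

# Full width at half maximum of a normal curve: 2 * sqrt(2 ln 2) * sigma
fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

sigma_hat = fwhm / 2.0                      # naive estimate: half the FWHM
overestimate = (sigma_hat - sigma) / sigma  # relative error of the estimate

print(round(fwhm, 4), round(sigma_hat, 4), round(100 * overestimate, 2))
# 2.3548 1.1774 17.74
```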

We can correct for this error by adjusting the height of the blue rectangle under the normal curve in an upward direction while keeping its area fixed. This has the effect of shrinking the width until it matches the correct positions on the x-axis at $\mu - \sigma = 3$ and $\mu + \sigma = 5$ in Figure 4.

Figure 4. Exact areal definition of $\sigma$

In terms of areas, $\mu \pm \sigma$ are actually the locations on the x-axis of the vertical boundaries that contain 68.2689% of the total area under $N(x,\mu,\sigma^2)$. In Fig. 4, it's the dark blue area that lies symmetrically about the mean at $\mu = 4$. Gauging that area is too subtle to use as an immediate visualization, but we can use it to improve our estimator in \eqref{eq:sighat}.
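The 68.2689% figure can be verified with the error function from Python's standard library. This is only a sketch: the helper `normal_cdf` is a name I have introduced for illustration, and the parameters are those of Fig. 1 ($\mu = 4$, $\sigma = 1$).

```python
import math

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 4.0, 1.0

# Area between the vertical boundaries at mu - sigma and mu + sigma
area = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
print(round(100 * area, 4))  # 68.2689

# For comparison: area enclosed by the wider FWHM interval
half_fwhm = math.sqrt(2.0 * math.log(2.0)) * sigma
area_fwhm = normal_cdf(mu + half_fwhm, mu, sigma) - normal_cdf(mu - half_fwhm, mu, sigma)
print(area_fwhm > area)  # True, since the FWHM interval is wider
```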

Better Approximation

The difference between the width of the correct area (dark blue) and the width of our estimator (blue) is indicated by the two light-blue marginal areas in Fig. 5.

Figure 5. Areal differences

Each of these marginal widths is about a sixth of a unit on the x-axis (0.1774 with $\sigma = 1$), so the two together add about $2/6 = 1/3$ of a unit to the correct width of $2\sigma = 2$. In other words, FWHM $\approx (2 + 1/3)\,\sigma$, and a better approximation is therefore given by a denominator that is bigger than 2 by about one third ^†: \begin{equation} \hat{\sigma} = \dfrac{\text{FWHM}}{2 + 1/3} = 1.009 \label{eq:sighat2} \end{equation} which leads to the following rule of thumb.
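To see how much the larger denominator helps, the two estimators can be compared side by side. This is a rough sketch, again assuming $\sigma = 1$ and the exact FWHM factor from the footnote; with the unrounded FWHM of 2.3548 the improved estimate comes out near 1.009, while rounding the FWHM to 2.35 first gives about 1.007.

```python
import math

sigma = 1.0
exact_factor = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.3548
fwhm = exact_factor * sigma

sigma_hat_half = fwhm / 2.0                # first estimate: FWHM / 2
sigma_hat_third = fwhm / (2.0 + 1.0 / 3.0) # improved estimate: FWHM / (2 + 1/3)

for name, est in [("FWHM/2", sigma_hat_half), ("FWHM/(2+1/3)", sigma_hat_third)]:
    error_pct = 100.0 * (est - sigma) / sigma
    print(f"{name}: sigma_hat = {est:.4f}, error = {error_pct:.2f}%")
```

The error drops from nearly 18% to under 1%, which is more than adequate for a visual rule of thumb.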

Rule of Thumb (or Forefinger)

For the purposes of visualizing the variance:

Draw a normal curve (Fig. 1) and point to the mean ($x$-location of the peak).
Point to the full width of the normal curve at half peak-height. (see Fig. 2)
Slide your forefinger up by another 10% of the peak height to compensate for width overestimation ^‡.
The full width at this corrected height corresponds to $2\sigma$.

The standard deviation ($\sigma$) is half of that width and the variance ($\sigma^2$) is the square of that value.
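The heights used in this rule of thumb can be checked directly from the shape of the normal curve. In the sketch below, `relative_height` is a helper name I have made up; it returns the curve's height at a given distance from the mean as a fraction of the peak height, and $\sigma = 1$ is assumed as in Fig. 1.

```python
import math

def relative_height(offset, sigma):
    """Height of a normal curve at mu +/- offset, as a fraction of the peak."""
    return math.exp(-offset ** 2 / (2.0 * sigma ** 2))

sigma = 1.0
half_fwhm = math.sqrt(2.0 * math.log(2.0)) * sigma  # half of the FWHM

print(round(relative_height(half_fwhm, sigma), 4))  # 0.5    (half maximum)
print(round(relative_height(sigma, sigma), 4))      # 0.6065 (where the width is 2*sigma)
```

So the full width equals $2\sigma$ at roughly 60% of the peak height, about 10 percentage points above the half-maximum level.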

^† The proper correction factor in the denominator of \eqref{eq:sighat2} is $2 \, (2\ln(2))^{1/2} \approx 2.3548$.
^‡ Your forefinger should now be at about 60% of peak height.

Sunday, January 6, 2013