Monday, May 23, 2011

What do you mean "average"?

It's not like there aren't many options.

Many people implicitly mean the median when they say "50% of X are below average." This is probably because they assume that the distribution is symmetric around the arithmetic, or simple, mean, and that the mean therefore equals the median. But why limit ourselves to the simple mean?

The most common average is the simple mean:

$m =\frac{1}{N} \sum_{i=1}^{N} x_i,$

although one could use a quadratic, cubic, quartic, quintic, etc. mean (for $k=2,3,4,5,\ldots$):

$m =\left[ \frac{1}{N} \sum_{i=1}^{N}  x_i ^k \right]^{1/k}.$

Or maybe something more esoteric, like a geometric mean:

$m =\left[ \prod_{i=1}^{N}  x_i \right]^{1/N},$

or the harmonic mean:

$m= \frac{N}{ \sum_{i=1}^{N}  \frac{1}{x_i}}.$
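To make the differences concrete, here is a minimal Python sketch of the four means above. The function names and the sample data are my own choices for illustration, and the power mean is written with the $1/N$ inside the root, so that $k=1$ recovers the simple mean. The geometric and harmonic versions assume strictly positive $x_i$.

```python
import math

def power_mean(xs, k):
    """Power mean of order k: (1/N * sum(x**k)) ** (1/k); k=1 is the simple mean."""
    return (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)

def geometric_mean(xs):
    """N-th root of the product, computed via logs (assumes every x > 0)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    """N divided by the sum of reciprocals (assumes every x is nonzero)."""
    return len(xs) / sum(1.0 / x for x in xs)

xs = [1.0, 2.0, 4.0]  # illustrative data only

simple = power_mean(xs, 1)
quadratic = power_mean(xs, 2)
geometric = geometric_mean(xs)
harmonic = harmonic_mean(xs)

print(harmonic, geometric, simple, quadratic)
```

For positive data that are not all equal, the four results come out in the order harmonic < geometric < simple < quadratic, so which "average" you quote changes the number you report.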

Strange though they may seem, all of these have their uses and their problems. For example, Europeans report automobile fuel consumption in liters per 100 km, while Americans report fuel economy in miles/gal or, with appropriate scaling to grown-up units, km/l. Because most people think better with linear means than harmonic means, the European representation leads to better understanding of fuel economy comparisons: over equal distances, consumption figures average with a simple mean, while km/l figures need a harmonic mean. The choice of the mean has to be appropriate to the problem (engineers will notice that RMS is a quadratic mean and MSE is its square).
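The fuel example can be checked with a few lines of Python. This is only a sketch under made-up assumptions: two hypothetical cars, rated 10 km/l and 20 km/l, each driven the same 100 km, with consumption expressed in liters per 100 km.

```python
def fleet_km_per_l(km_per_l, distance_km=100.0):
    """Combined fuel economy when every car drives the same distance."""
    fuel_used = sum(distance_km / e for e in km_per_l)  # liters burned in total
    return distance_km * len(km_per_l) / fuel_used      # total km / total liters

cars_km_per_l = [10.0, 20.0]                            # hypothetical ratings
cars_l_per_100km = [100.0 / e for e in cars_km_per_l]   # same cars as consumption

# Simple mean of consumption: the right answer in consumption units.
mean_consumption = sum(cars_l_per_100km) / len(cars_l_per_100km)

# Simple mean of km/l: looks natural, but overstates the combined economy.
naive_economy = sum(cars_km_per_l) / len(cars_km_per_l)

# What the two cars actually achieve together (the harmonic mean of km/l).
true_economy = fleet_km_per_l(cars_km_per_l)

print(mean_consumption, naive_economy, true_economy)
```

The simple mean of the consumption figures (7.5 liters per 100 km) converts exactly to the true combined economy (about 13.3 km/l), while the simple mean of the km/l figures (15 km/l) is too optimistic.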

And this is before we even get to the problems with the $x_i$ that go into the average.