Friday, April 3, 2015

Does "50% below average" convey innumeracy?

Apparently some people believe that saying "fifty percent are below average" shows ignorance of statistics.

There's some ignorance going on, but it tends to belong to those who act as if the phrase is a mathematical tautology. Consider what happens to a group of non-millionaire friends that gets in a room with Bill Gates: all but one person in that room will have below-room-average wealth.
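To put numbers on the Bill Gates room, here is a minimal Python sketch. The wealth figures are made up for illustration (nine hypothetical friends plus a round number standing in for a billionaire); only the comparison between the mean and the median matters.

```python
from statistics import mean, median

# Hypothetical net worths in dollars: nine non-millionaire friends...
friends = [40_000, 55_000, 60_000, 75_000, 90_000,
           120_000, 150_000, 200_000, 300_000]
# ...plus one billionaire (an illustrative round figure, not a real net worth).
room = friends + [80_000_000_000]

room_mean = mean(room)
room_median = median(room)
# Count how many people in the room have less than the room's mean wealth.
below_mean = sum(w < room_mean for w in room)

print(f"mean   = {room_mean:,.0f}")
print(f"median = {room_median:,.0f}")
print(f"{below_mean} of {len(room)} people are below the mean")
```

With these made-up numbers, nine of the ten people fall below the mean, while the median still splits the room five and five.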

Use that example whenever smug people who "like math" as long as understanding it is optional make fun of the "fifty percent are below average" phrase.

There are many real-life cases where the mean (or "average"; added later: see below, note IV) is different from the median (the point in the support of the distribution that has half the probability mass on either side). Understanding this is quite important for many things in life.

Consider independent random events in time. Think, for example, of random customers walking into a store, computer processes generating demand for CPU time, packets in a switching network requesting dispatch or queueing, time of death for certain terminal diseases, or radioactive decay.

If you have random independent events that can happen with some fixed probability per unit time, then the time between those events follows an exponential distribution with a probability density function
\[
f_{T}(t) = \lambda \, \exp(-\lambda \, t)
\]
where the mean time between occurrences of the event is $1/\lambda$. The median of this distribution is $\log(2)/\lambda$; since $\log(2) \approx 0.69 < 1$, the median lies below the mean, which implies that there's always more probability on the left side of the mean than on the right. To be precise, about $63\%$ of all intervals between successive events have a length below $1/\lambda$, the mean interval length.

"Sixty-three percent are below the mean." And true!
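For the skeptical, a quick simulation sketch in Python checks this. It assumes nothing beyond the standard library; the function name `fraction_below_mean`, the sample size, and the seed are my own choices for illustration. It draws exponential intervals for a few values of $\lambda$ and counts how many fall below the mean $1/\lambda$.

```python
import math
import random

def fraction_below_mean(lam, n=200_000, seed=0):
    """Simulate n exponential intervals with rate lam and return the
    fraction that are shorter than the mean interval, 1/lam."""
    rng = random.Random(seed)
    mean_interval = 1.0 / lam
    samples = (rng.expovariate(lam) for _ in range(n))
    return sum(t < mean_interval for t in samples) / n

# Closed-form value of Pr(T <= 1/lambda) for the exponential distribution.
exact = 1.0 - math.exp(-1.0)
print(f"exact fraction below the mean: {exact:.4f}")

for lam in (0.1, 1.0, 25.0):
    median_interval = math.log(2) / lam
    print(f"lambda={lam:>5}: median={median_interval:.4f}, "
          f"mean={1 / lam:.4f}, "
          f"simulated fraction below mean={fraction_below_mean(lam):.3f}")
```

The simulated fractions should land close to the exact value for every $\lambda$, up to sampling noise.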

This asymmetry, from skewness of the distribution, also applies to more complex inter-temporal laws with dependent events, like Weibull random variables, and to power laws, which describe many natural, social, and artificial phenomena. Not always $63\%$, obviously; the exact fraction depends on the shape of the distribution.
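As a rough illustration of how the fraction moves, here is a sketch using the standard closed forms for a Weibull distribution (shape $k$, any scale) and a Pareto power law (tail index $\alpha > 1$, so that the mean exists). The function names and the particular parameter values are my own picks for illustration, not anything canonical.

```python
import math

def weibull_fraction_below_mean(k):
    """Pr(T < mean) for a Weibull with shape k (the scale cancels out).
    Uses CDF F(t) = 1 - exp(-(t/scale)**k) and mean = scale * Gamma(1 + 1/k)."""
    return 1.0 - math.exp(-math.gamma(1.0 + 1.0 / k) ** k)

def pareto_fraction_below_mean(alpha):
    """Pr(X < mean) for a Pareto with tail index alpha (the minimum cancels out).
    Uses CDF F(x) = 1 - (x_min/x)**alpha and mean = alpha * x_min / (alpha - 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return 1.0 - ((alpha - 1.0) / alpha) ** alpha

for k in (0.5, 1.0, 2.0):
    print(f"Weibull shape k={k}: {weibull_fraction_below_mean(k):.3f} below the mean")

for alpha in (1.5, 2.0, 3.0):
    print(f"Pareto alpha={alpha}: {pareto_fraction_below_mean(alpha):.3f} below the mean")
```

Setting $k = 1$ recovers the exponential case, so that line should match the $63\%$ figure; the other parameter values give different fractions.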

So, the next time someone mocks the "fifty percent below average" as proof of innumeracy, educate them about the difference between the mean and the median.

-- -- -- --

Note I: Neil nothing-like-Carl-Sagan Tyson apparently uses the phrase to mock other people. This is no surprise, since his schtick is basically the same as Penn & Teller's: mockery of the out-group and praise of the in-group, with no education at all or, occasionally, anti-education.

Note II: $\log(2)$ is the logarithm of $2$ in the natural base $e$. Even though I'm an engineer, I follow the mathematicians' convention and use $\log_{10}$ or $\log_{2}$ to make explicit when I'm not using the natural base.

Note III: Yes, it's always $63\%$, no matter the $\lambda$:
\[
\Pr(T \le 1/\lambda) = \int_{0}^{1/\lambda} \, \lambda \, \exp( - \lambda \, t) \, dt = \Bigg[ - \exp( - \lambda \, t) \Bigg]_{0}^{1/\lambda} = 1 - \exp(-1) \approx 0.63.
\]
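This is a peculiarity of the exponential distribution: evaluating the antiderivative at the mean, $t = 1/\lambda$, makes the $\lambda$ cancel, so the result doesn't depend on it. As you can see, unlike many "science" popularizers, I show my work.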

Note IV: A family member points out that "average" can be used for many other measures of central tendency (a point I had made in this earlier post), but: (a) pretty much all instances of the use of that phrase that I've seen refer to the mean; and (b) the people who mock the usage I explain are generally not cognizant of the other measures of central tendency, they just want to play the identity game.