Friday, July 19, 2019

Fat tails and extremistan - not the same thing



Extremistan and mediocrestan


What, are we making up words, now? (All words are made up. Think about it.)

Extremistan and mediocrestan are characterizations of distributions; a simple way to think about them is that very large events either totally dominate (extremistan) or don't (mediocrestan):

Height is in mediocrestan: if the average height in a room with ten people is 200 cm, that's probably from ten people between 190 and 210 cm tall and not nine people 100 cm tall and one person 1100 cm tall.  
Wealth is in extremistan: if the average wealth in a room with ten people is 2 billion dollars, that's more likely to be one billionaire with 20 billion and nine average-income people than ten billionaires with 2 billion each.
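
One way to make the contrast concrete is a small simulation. The sketch below is an illustration under assumptions that are not in the text above: heights are drawn from a Normal with mean 175 cm and standard deviation 7 cm, wealth from a Pareto with shape 1.1, and the helper name `mean_share_of_largest` is made up for this example. It estimates what fraction of a ten-person room's total belongs to the largest member.

```python
import random


def mean_share_of_largest(draw, room_size=10, rooms=2000):
    """Average, over many simulated rooms, of max(values) / sum(values)."""
    total = 0.0
    for _ in range(rooms):
        values = [draw() for _ in range(room_size)]
        total += max(values) / sum(values)
    return total / rooms


rng = random.Random(2019)  # fixed seed so the run is repeatable

# Assumed height model: Normal(175 cm, 7 cm).
height_share = mean_share_of_largest(lambda: rng.gauss(175.0, 7.0))

# Assumed wealth model: Pareto with shape 1.1 (arbitrary units).
wealth_share = mean_share_of_largest(lambda: rng.paretovariate(1.1))

print(f"largest person's share of total height: {height_share:.3f}")
print(f"largest person's share of total wealth: {wealth_share:.3f}")
```

A share close to 1/10 means nobody dominates the room; a share well above 1/10 means one person accounts for much of the total.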

This classification determines whether you can estimate relevant population parameters from samples (mediocrestan yes, extremistan no) and how well-behaved order statistics (maximum, second place, etc.) are (mediocrestan nicely predictable, extremistan not so much).

There's a fairly common error that people make when they learn about extremistan: they think that because distributions in extremistan have fat tails and are dominated by extreme values, it follows (and this is the error) that distributions with fat tails, especially those with extreme values, are in extremistan.

Note the error: $a \Rightarrow b$ is being used to assert $b \Rightarrow a$.

As we'll see next, not all fat-tailed and extreme-valued distributions are in extremistan.


A tale of two tails


Let us compare (a) the probability (strictly, the joint density) of $n$ similar outcomes of large size $M$, which add up to a combined event of size $nM$ (or, equivalently, average to $M$), with (b) the probability of one extreme event of size $nM$ plus $n-1$ events of size 0, which add up to the same combined event $nM$. If the first is higher than the second, we're in mediocrestan; if the second is higher than the first, we're in extremistan.

For the Normal distribution, the probabilities (a) denoted $P(\text{Similar})$ and (b) denoted $P(\text{Extreme})$ are:
\begin{eqnarray*}
P(\text{Similar}) &=& \frac{1}{(2 \pi)^{n/2}} \exp(- n \, M^2/2) \\
P(\text{Extreme})  &=& \frac{1}{(2 \pi)^{n/2}} \exp( - n^2 \,  M^2/2)
\end{eqnarray*}
It's trivial to see that for the Normal, with $n > 1$ and $M \neq 0$, we have
\[
P(\text{Similar}) > P(\text{Extreme}).
\]
Unsurprisingly enough, with its reference excess kurtosis of 0, the Normal distribution is well inside mediocrestan.
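
To make the size of the gap explicit, divide one Normal expression by the other (a worked step that uses only the two formulas above):
\[
\frac{P(\text{Similar})}{P(\text{Extreme})} = \exp\left( \frac{n (n-1) \, M^2}{2} \right),
\]
which is greater than 1 whenever $n > 1$ and $M \neq 0$. For instance, with $n = 10$ and $M = 3$ the exponent is $10 \times 9 \times 9 / 2 = 405$.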

For our fat-tailed, extreme-valued distribution, we'll use the Gumbel distribution, which is also known as Extreme Value Type I. A simple form of this distribution has the following pdf:
\[
f_X(x) = \exp(- x - \exp(-x))
\]
Its variance is $\pi^2/6$, while the Normal above has variance 1; but since we're comparing within class (Normal with Normal and Gumbel with Gumbel), the mismatch makes no difference, and using that pdf as is saves a lot of unnecessary clutter.

For Gumbel we have the following probabilities:
\begin{eqnarray*}
P(\mathrm{Similar}) &=& \exp(-nM - n \, \exp(-M))
\\
P(\mathrm{Extreme}) &=&  \exp(-nM - \exp(-nM) -n+1)
\end{eqnarray*}
For large $M$ we have $\exp(-nM) \approx 0$ and $\exp(-M) \approx 0$, so $\exp(-nM) + n - 1 > n \, \exp(-M)$ whenever $n > 1$. Hence, for Gumbel we also have
\[
P(\text{Similar}) > P(\text{Extreme}).
\]
The Gumbel distribution belongs in mediocrestan, despite its fat tails and extreme values.
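
The inequality can also be checked without the large-$M$ approximation. Taking the log of the ratio of the two Gumbel expressions above gives
\[
g(M) = \ln \frac{P(\mathrm{Similar})}{P(\mathrm{Extreme})} = n - 1 + \exp(-nM) - n \, \exp(-M).
\]
Since $g(0) = 0$ and $g'(M) = n \, (\exp(-M) - \exp(-nM)) > 0$ for $M > 0$ and $n > 1$, we get $g(M) > 0$ for every $M > 0$. Note that $g(M) \to n - 1$ as $M$ grows, so the Gumbel ratio levels off at $\exp(n-1)$ instead of growing with $M$.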

This really makes us think about the specialness of scale-independent distributions, where we can bet on a big event to overwhelm all the small events (i.e., an extremistan distribution). Those are the distributions for which a trading strategy of enduring many small losses to capture the one big win can beat a strategy of consistent small wins.
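
The similar-versus-extreme comparison can be run numerically on log densities. The sketch below is a hedged illustration: the function names are invented for this example, and the Lomax (Pareto type II) density with shape 1.5 is an assumption brought in as a stand-in for a power-law, scale-independent tail; it does not appear in the analysis above.

```python
import math


def log_similar(logpdf, n, M):
    """Log density of n outcomes that are each of size M."""
    return n * logpdf(M)


def log_extreme(logpdf, n, M):
    """Log density of one outcome of size n*M plus n-1 outcomes of size 0."""
    return logpdf(n * M) + (n - 1) * logpdf(0.0)


def normal_logpdf(x):
    # Standard Normal.
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x


def gumbel_logpdf(x):
    # Standard Gumbel: f(x) = exp(-x - exp(-x)).
    return -x - math.exp(-x)


def lomax_logpdf(x, alpha=1.5):
    # Assumed example: f(x) = alpha * (1 + x)**(-(alpha + 1)) for x >= 0.
    return math.log(alpha) - (alpha + 1.0) * math.log1p(x)


def regime(logpdf, n=10, M=5.0):
    """Classify a density by which of the two scenarios is more likely."""
    if log_similar(logpdf, n, M) > log_extreme(logpdf, n, M):
        return "mediocrestan"
    return "extremistan"


for name, logpdf in [("Normal", normal_logpdf),
                     ("Gumbel", gumbel_logpdf),
                     ("Lomax", lomax_logpdf)]:
    print(name, regime(logpdf))
```

Working in logs avoids underflow: at $n = 10$ and $M = 5$ the Normal extreme-case density is far too small to represent as an ordinary floating-point number.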


What about the maximum?


In many cases the maximum is more relevant than the mean or median. So, how do fat tails influence the maxima?

When you look at the maximum of something, say the fastest kid in a class, the larger the class, the higher the maximum will be, on average. So the fastest kid in a group of 100 is on average faster than the fastest kid in a group of 10, for example.

In mediocrestan this increase is concave in the number of kids (the difference between the fastest kids in classes of 100 and 200 kids is bigger than the difference between the fastest kids in classes of 1100 and 1200 kids, on average); in extremistan there are no guarantees.

But once again, fat-tailed, extreme-valued distributions (the Gumbel, here scaled to have variance 1) have well-behaved maxima:



This nice concavity (note the logarithmic horizontal scale) makes things predictable; since many real-world metrics are known to be fat-tailed, it's comforting to know that their maxima don't explode all of a sudden.

Note that there's an effect of the extreme value: the maxima are larger and they grow faster, with less concavity than for the Normal.
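
The growth of the expected maximum can be sketched numerically. The code below is an illustration, not the computation behind the plot: it assumes the Gumbel is centered to mean 0 as well as scaled to variance 1 (the text only states the variance), it uses the fact that the maximum of $n$ iid Gumbels is again a Gumbel with its location shifted by the scale times $\ln n$, and it integrates the Normal case with a plain trapezoid rule. The function names are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_normal(n, lo=-10.0, hi=10.0, steps=20000):
    """E[max of n iid standard Normals], by trapezoidal integration of
    x * n * pdf(x) * cdf(x)**(n - 1) over [lo, hi]."""
    nd = NormalDist()
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * nd.pdf(x) * nd.cdf(x) ** (n - 1)
    return total * h


def expected_max_gumbel(n):
    """E[max of n iid Gumbels], assuming mean 0 and variance 1."""
    beta = math.sqrt(6.0) / math.pi   # scale that gives variance 1
    mu = -beta * EULER_GAMMA          # location that gives mean 0
    # The max of n iid Gumbel(mu, beta) is Gumbel(mu + beta * ln n, beta).
    return mu + beta * (math.log(n) + EULER_GAMMA)


for n in (10, 100, 1000, 10000):
    print(n, round(expected_max_normal(n), 3), round(expected_max_gumbel(n), 3))
```

Under these assumptions the Gumbel's expected maximum is linear in $\ln n$, while the Normal's grows more slowly, which matches the remark that the Gumbel maxima are larger and less concave.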


And the point is…?


There are a number of people who assert that all sorts of research and social metrics are unusable because their analysis is based on mediocrestan (either by using sample statistics to estimate population statistics or by assuming regular behavior from order statistics), whereas, so goes the argument, these real-world metrics have fat tails, so they are in extremistan.

The point of the above was to show that this form of argument (usually punctuated with gratuitous insults, expletives, and Mathematica-based math or other forms of using pretend-math to bully one's audience) is wrong, tout court.

Only a small subset of fat-tailed, extreme-valued distributions is in extremistan. For all the rest, we can use our usual tools.