Saturday, October 3, 2020

More Talebian nonsense: eyeball 1.0 vs statistics

Apparently Nassim Nicholas Taleb* doesn't like some paper in psychology and decided to debunk it using a very advanced technique called "can you tell the difference between these graphs?"

Yes, the Talebian method is to look (with eyeball 1.0) at 2-D graphics, and his argument is that if we can't tell the difference between a graphic with uncorrelated data and one with a small effect size, then we should dismiss the paper.

Wait, that's not entirely accurate. That rationale only applies to papers that have conclusions Taleb disagrees with. As far as I know, NNT hasn't criticized the massive amount of processing that was necessary to come up with the "photo" of the Messier 87 supermassive black hole from the raw data of the Event Horizon Telescope.

No, the "use your eyeball" method applies selectively to papers NNT doesn't like; and apparently his conclusions then apply to an entire field (psychologists, who NNT seems to have a problem with, minor exceptions allowed).

Okay, so what's wrong with this logic? 

Everything!

The reason we developed statistical analysis methods is that our eyes aren't that good at capturing subtle patterns in data, even when those patterns are there.

Here are two charts plotting three variables pairwise. Can you tell which one has a correlation?



(C'mon, don't lie; you can't and neither can I — and I made the charts.)

Here, we'll fit an OLS model to the data. Now, can you tell?



(You should: the line on the left has a 10% grade, and as anyone who has ever biked up a long street with a 10% grade knows, that's a lot steeper than you'd guess.)

The thing is, there's no noise in that data; what appears to be noise is simply a missing factor, an artifact created because you can't really represent three continuous variables on a flat 2-D plot. (You can use a 2-D projection of a 3-D surface and move it around with a cursor to simulate 3-D motion, but that's not really the point here.)

That data is $Y = 0.1 \times X + Z$; note how there's no error in it. $X$ and $Z$ have some variability, but are uncorrelated. $Y$ is determined (with no error) from $X$ and $Z$, but when we plot $Y$ on $X$, the variation due to the missing variable $Z$ obscures the more subtle variation due to $X$.**
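To see why the eyeball fails, here's the arithmetic. It assumes only what the text says: $X$ and $Z$ are uncorrelated. Their variances are written as $\sigma_X^2$ and $\sigma_Z^2$; those symbols are mine, not taken from the original charts.

$$
\begin{aligned}
\operatorname{Cov}(X, Y) &= \operatorname{Cov}(X,\, 0.1X + Z) = 0.1\,\sigma_X^2 + \operatorname{Cov}(X, Z) = 0.1\,\sigma_X^2 \\
\operatorname{Var}(Y) &= 0.01\,\sigma_X^2 + \sigma_Z^2 \\
\beta_{Y \mid X} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = 0.1 \\
\rho_{XY} &= \frac{0.1\,\sigma_X}{\sqrt{0.01\,\sigma_X^2 + \sigma_Z^2}}
\end{aligned}
$$

For illustration, assume $\sigma_X = \sigma_Z$. Then $\rho_{XY} = 0.1/\sqrt{1.01} \approx 0.0995$ and $\rho_{XY}^2 \approx 0.0099$. So $X$ accounts for about 1% of the variance of $Y$, and the missing $Z$ accounts for the other 99%. The slope is exactly 0.1, but the cloud around it is almost entirely $Z$.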

This is why we use statistical methods to elicit estimates, rather than eyeball 1.0.
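Here is a minimal simulation of the setup, using only the Python standard library. The distributions are an assumption, not necessarily what the original charts used: $X$ and $Z$ are independent standard normals, with 10,000 points and a fixed seed.

The script computes three numbers:

- the correlation between $X$ and $Y$, which is small;
- the OLS slope of $Y$ on $X$ alone, which lands near 0.1 despite the apparent noise;
- the OLS fit of $Y$ on both $X$ and $Z$, which recovers 0.1 and 1 to floating-point precision, because there is no error term.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def centered_cross(a, b):
    """Sum of products of deviations from the mean: sum((a_i - mean(a)) * (b_i - mean(b)))."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def correlation(a, b):
    """Pearson correlation coefficient."""
    return centered_cross(a, b) / math.sqrt(centered_cross(a, a) * centered_cross(b, b))


def ols_one(x, y):
    """OLS slope of y on x (model includes an intercept)."""
    return centered_cross(x, y) / centered_cross(x, x)


def ols_two(x, z, y):
    """OLS slopes of y on x and z (model includes an intercept), via the 2x2 normal equations."""
    sxx, szz, sxz = centered_cross(x, x), centered_cross(z, z), centered_cross(x, z)
    sxy, szy = centered_cross(x, y), centered_cross(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (sxy * szz - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z


def simulate(n=10_000, seed=42):
    rng = random.Random(seed)
    # Assumption: X and Z are independent standard normals.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Y is fully determined by X and Z: there is no error term.
    y = [0.1 * xi + zi for xi, zi in zip(x, z)]
    return x, z, y


if __name__ == "__main__":
    x, z, y = simulate()
    print(f"corr(X, Y)             = {correlation(x, y):.4f}")
    print(f"slope of Y on X alone  = {ols_one(x, y):.4f}")
    b_x, b_z = ols_two(x, z, y)
    print(f"slopes of Y on X and Z = {b_x:.6f}, {b_z:.6f}")
```

How close the one-variable slope gets to 0.1 depends on the sample. Its standard error here is roughly $\sigma_Z / (\sigma_X \sqrt{n}) \approx 0.01$. Adding the missing factor $Z$ removes all of the apparent noise.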

- - - - -

* When one tracks topics like statistics, sometimes one gets a link to Nassim Nicholas Taleb making a fool of himself. I only watched the first couple of minutes, until NNT unveiled his Mathematica-based illustration, at which point his argument was already clear. And clearly wrong.

** I have two chapters in my (coming soon) book on missing factors, by the way. 🤓