Tuesday, October 20, 2020

Pomposity!

Let $f(x)$, $f \in \mathrm{C}^{\infty}$, be the following infinitely continuously differentiable function over the space of real numbers:
\[ 
f(x) \doteq 
\sum_{n=0}^{\infty} \frac{e^{-2} \, 2^n}{n!}
+ \frac{1}{\sqrt{2 \, \pi}}\int_{-\infty}^{+ \infty} x  \, \exp(-y^2/2) \, dy;
\]
then, applying Taylor's theorem and the Newton–Leibniz axiom,
\[f(1) = 2.\]
Time out! What the Heck?!?!

Okay. Breathe.

Let's restate the above in non-pompous terms.

Let $f(x)$ be the following function
\[f(x) = 1 + x\]
then $f(1) = 2$.

All the words between "following" and "function" in the first paragraph mean "smooth," which this function certainly is; $f \in \mathrm{C}^{\infty}$ is the formal way to say all the words in that sentence, so it's redundant. 
 
As for the complicated formula, it uses a series and an integral that each evaluate to one. Eagle-eyed readers will notice that the first is the Taylor series expansion of $e^2$ times the constant $e^{-2}$, and the second is $x$ times the integral of the p.d.f. of the standard Normal distribution in $y$, which by definition of a probability has to integrate to 1. Taylor's theorem and the Newton–Leibniz axiom are used to get the values for the series and the integral from first principles, as is done in first-year mathematical analysis classes, and as no one would ever do in a practical calculation.
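For the skeptical, here's a quick numerical sanity check that each piece of the "scary" formula really is just 1 (the term count and integration grid are arbitrary choices, more than enough for six-digit accuracy):

```python
import math

# Series: sum_{n=0}^inf e^{-2} 2^n / n!  -- the Taylor series of e^2, times e^{-2}.
series = sum(math.exp(-2) * 2**n / math.factorial(n) for n in range(100))

# Integral: (1/sqrt(2*pi)) * integral of exp(-y^2/2) dy over the real line,
# i.e. the standard Normal p.d.f.; approximated with a Riemann sum on [-10, 10].
dy = 0.001
integral = sum(math.exp(-(i * dy) ** 2 / 2) * dy
               for i in range(-10_000, 10_000)) / math.sqrt(2 * math.pi)

print(round(series, 6), round(integral, 6))  # both print 1.0
```

So $f(x) = 1 \cdot 1 + x \cdot 1 = 1 + x$, and all the machinery was decoration.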

I took a trivially simple function and turned it into a complicated, nay, scary formula. With infinite sums, integrals, and theorems. Taylor is relatively unknown, but Newton and Leibniz? Didn't they invent calculus? (Yes.) So my nonsensical formula acquires immense gravitas. Newton! And Leibniz!!

And that's the problem with an increasing number of public intellectuals and an increasing amount of technical material.

There are some genuinely complex things out there, and to even understand the problems in some of these complex things one needs serious grounding in the tools of the field. There's no question about that. But there's a lot of deliberate obfuscation of the clear and unnecessary complexification of the simple.

Why? And what can we do about it?


Why does this happen? Because, sadly, it works: many audiences incorrectly judge the competence of a speaker or writer by how hard it is to follow their logic. And many speakers and writers thus create a simulacrum of expertise by using jargon, dropping obscure references and provisos into the text, and avoiding simple, clear examples in favor of complex, hard-to-follow, "rich" examples.

What can we do about it? This is a systemic problem, so individual action will not solve it. But there's one thing we each can do: starve the pompous of the attention and recognition they so crave. In other words, and in a less pompous phrasing, when we realize someone is purposefully obfuscating the clear and complexifying the simple, we can stop paying attention to them. 

Simplicity actually requires more competence than haphazard complexity; it requires the ability to separate what is essential from what's ancillary. To make things, as Einstein said, as simple as possible, but no simpler.

It's also a good thinking tool for general use. Feynman describes how he used to follow complicated topological proofs by thinking of balls, with hair growing on them, and changing colors:

As they’re telling me the conditions of the theorem, I construct something which fits all the conditions. You know, you have a set (one ball)—disjoint (two balls). Then the balls turn colors, grow hairs, or whatever, in my head as they put more conditions on. Finally they state the theorem, which is some dumb thing about the ball which isn’t true for my hairy green ball thing, so I say, “False!”

If it’s true, they get all excited, and I let them go on for a while. Then I point out my counterexample.

“Oh. We forgot to tell you that it’s Class 2 Hausdorff homomorphic.”

“Well, then,” I say, “It’s trivial! It’s trivial!” By that time I know which way it goes, even though I don’t know what Hausdorff homomorphic means.

Excerpt From: Richard Feynman, “Surely You’re Joking, Mr. Feynman: Adventures of a Curious Character.”

Let's strive to be like Einstein and Feynman.



- - - - -
This post was inspired by an old paper that starts with $1+1=2$ and ends with a multi-line formula, but I've lost the reference; it might have been in the Ig Nobel Prizes collection.

Sunday, October 18, 2020

Of martingales and election forecasts

(This post started its life as a response to a video, but during its development I decided that there's enough negativity in the world, so it's now a stand-alone post.)


What are these martingales?

Originally a gambling strategy, martingales are discrete-time stochastic processes... hold on, I sound like the person in that video: pompous, jargon-spewing, and unhelpful.

Let's say we have some metric that evolves over time, like the advantage candidate A (for Aiden) has over candidate B (for Brenna) in an election in the fictional country of Zambonia, and that we get measures of this metric at some discrete points (every time we take a poll, for example). Note that these form an ordered sequence of points, not necessarily equidistant. That's what discrete-time means: the "independent variable" (time) is ordinal but not cardinal.

(This makes a difference for many models; in actual electoral metrics it's not very important since most campaigns run daily tracking polls.)

So, we have a metric, say $A_i$, the point advantage of Aiden in poll number $i$. This is just a sequence of numbers. If they come from an underlying process which includes some unobservable or random parts, we say that the $A_i$ follow a stochastic process. (Stochastic is a [insert Harvard tuition here] word for random.)

A discrete-time stochastic process is a martingale if the best estimate we have for the metric in the future, given everything observed so far, is the current value; in other words,

\[ E[A_{i+1} \mid A_1, \ldots, A_i] = A_i. \]

In some sense, we already sort-of assume that the elections are some sort of martingale: we treat the daily poll as the best estimate of the future results. Well, we used to. Some people still do, and add a lot of unsupported assumptions to develop option pricing models for... oh, bother, almost got into that negativity again.


Martingales and forecasting

A simple example of a martingale is a symmetric random walk,

\[A_{i+1} = \left\{ \begin{array}{ll}   A_i + a & \text{ with prob.  1/2} \\  A_i - a & \text{ with prob. 1/2} \end{array}\right.\]

Here are two examples, with different $a$, to show how that parameter influences the dispersion.



We can see from that figure that even though the current value is the best estimate of future values, we can make serious errors if we don't consider that dispersion. Consider the red process and note how bad the values for $A_{13}$ (POINT A) and $A_{41}$ (POINT B) are as estimates of the final value. Note also that $A_{13}$ is closer to the final value than $A_{41}$, despite $A_{41}$ being much farther along in the process (its index $i=41$ is closer to the final index $i=66$ than $i=13$ is).

Another example of a martingale is $A_{i+1} = A_i + \epsilon$ where $\epsilon$ is a Normal random variable with mean 0 and standard deviation $\sigma$. Using a standard Normal, $\sigma = 1$, here are two examples of this process:
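A sketch of this Gaussian-increment process (seeds and step count are arbitrary choices) shows how two realizations with identical parameters and starting points diverge:

```python
import random

def gaussian_walk(sigma, steps, start=0.0, seed=None):
    """Simulate A_{i+1} = A_i + eps, with eps ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

# Same sigma, same start, different seeds: very different realizations.
walk1 = gaussian_walk(sigma=1.0, steps=66, seed=1)
walk2 = gaussian_walk(sigma=1.0, steps=66, seed=2)
print(walk1[-1], walk2[-1])
```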



Note how despite the same parameters and starting point, the processes' evolution is quite different. This becomes more obvious when the processes have different standard deviations:



The main point here is that even though martingales appear very simple, in that the best estimate for the future is the current value of the metric, the actual realizations of the future may be very different from the current metric.

That alone would be a good reason to try to find better ways to model elections. However, this is not the only, or even the best, argument against models of elections using martingales. As Ron Popeil used to say:


But wait, there's more!

The real argument here is that the process of interest (who people will vote for) and the process being measured (who the people who are willing to answer poll questions say they'll vote for) are not the same.

What's primarily wrong is that the information being used to create the $A_i$ at any point isn't an unbiased measure of the probability of Aiden winning. And that's not on the math, that's on (a) polling technique and (b) political use of polls.

Polling technique depends on people's answers, usually corrected with some measures of demographics and representativeness. For example, if Zambonia has 20% senior citizens and the polling sample only has 10%, that has to be accounted for with some statistical corrections.
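Here's a minimal sketch of that kind of correction (post-stratification weighting); the shares and support numbers are made up for illustration, not Zambonian data:

```python
# Reweight each respondent group so the sample matches known population shares.
population_share = {"senior": 0.20, "non_senior": 0.80}  # known demographics
sample_share = {"senior": 0.10, "non_senior": 0.90}      # what the poll got

weights = {g: population_share[g] / sample_share[g] for g in population_share}
# Seniors get weight 2.0; non-seniors get about 0.89.

# Suppose 30% of sampled seniors and 55% of sampled non-seniors back Aiden:
support = {"senior": 0.30, "non_senior": 0.55}
raw = sum(sample_share[g] * support[g] for g in support)
weighted = sum(sample_share[g] * weights[g] * support[g] for g in support)
print(round(raw, 3), round(weighted, 3))  # raw over-counts the non-senior vote
```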

Another correction comes from noticing, for example, that in previous elections the model was off by some percentage and dealing with that: if the polls for Zamboni City had Clarisse winning by 10% in the last elections but Hannibal won Zamboni City by 5%, that response bias needs to be corrected, somehow, in newer models.

Political use of polls happens when results that are known to be biased are released for political reasons. For example Aiden may release what their campaign knows to be wrong numbers to discourage Brenna donors, volunteers, and voters.

So, the problem with using martingales as a model of the election is that the information being used to generate the metrics being tracked is not an unbiased representation of the underlying reality. It's possible that the dynamics of the metric are a martingale, but what the metric is measuring is not the electoral vote but a mix of socially acceptable answers (who wants to say they're voting Hannibal rather than Clarisse, even when they are?) and push-poll results designed to influence the electoral process.

Many professional political forecasters deal with this mismatch using field-specific knowledge and heuristics. Certain others criticize them for the heuristics and field-specific knowledge while missing the problems implicit in using martingale-based models.

Good, no Taleb references at all. 🤓


Recommendation: readers interested in political (and other) forecasting might want to read Superforecasting, by Phil Tetlock and Dan Gardner.

Saturday, October 3, 2020

More Talebian nonsense: eyeball 1.0 vs statistics

Apparently Nassim Nicholas Taleb* doesn't like some paper in psychology and decided to debunk it using a very advanced technique called "can you tell the difference between these graphs?"

Yes, the Talebian method is to look (with eyeball 1.0) at 2-D graphics and his argument is that if we can't tell the difference between a graphic with uncorrelated data and one with a small effect size, then we should dismiss the paper.

Wait, that's not entirely accurate. That rationale only applies to papers that have conclusions Taleb disagrees with. As far as I know, NNT hasn't criticized the massive amount of processing that was necessary to come up with the "photo" of the Messier 87 supermassive black hole from the raw data of the Event Horizon Telescope.

No, the "use your eyeball" method applies selectively to papers NNT doesn't like; and apparently his conclusions then apply to an entire field (psychologists, who NNT seems to have a problem with, minor exceptions allowed).

Okay, so what's wrong with this logic? 

Everything!

The reason we developed statistical analysis methods is because our eyes aren't that good at capturing subtle patterns in data when they are there.

Here are two charts plotting three variables pairwise. Can you tell which one has a correlation?



(C'mon, don't lie; you can't and neither can I — and I made the charts.)

Here, we'll fit an OLS model to the data. Now, can you tell?



(You should; the line on the left has a 10% grade; and as anyone who's ever tried to bike a long 10% grade street knows, that's a lot steeper than you'd guess.)

The thing is, there's no noise in that data; what appears to be noise is simply a missing factor, an artifact created because you can't really represent three continuous variables on a 2-D flat plot. (You can use a 2-D projection of a 3-D surface and move it around with a cursor to simulate 3-D motion, but that's not really the point here.)

That data is $Y = 0.1 \times X + Z$; note how there's no error in it. $X$ and $Z$ have some variability, but are uncorrelated. $Y$ is determined (with no error) from $X$ and $Z$, but when we plot $Y$ on $X$, the variation due to the missing variable $Z$ obscures the more subtle variation due to $X$.**
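The whole example can be reproduced in a few lines; this sketch generates data the same way the text describes (sample size and seed are arbitrary choices) and shows that a least-squares fit recovers the slope eyeball 1.0 can't see:

```python
import random

# Y = 0.1*X + Z with no error term; X and Z are independent standard Normals.
rng = random.Random(42)
n = 2000
X = [rng.gauss(0, 1) for _ in range(n)]
Z = [rng.gauss(0, 1) for _ in range(n)]
Y = [0.1 * x + z for x, z in zip(X, Z)]

# Simple OLS slope of Y on X: cov(X, Y) / var(X).
mx = sum(X) / n
my = sum(Y) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(X, Y))
         / sum((x - mx) ** 2 for x in X))
print(round(slope, 3))  # estimated slope; the true value is 0.1
```

Scatter-plotting `Y` against `X` looks like noise, yet the fitted slope sits near 0.1, because the regression averages away the variation coming from the omitted `Z`.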

This is why we use statistical methods to elicit estimates, rather than eyeball 1.0.

- - - - -

* When one tracks topics like statistics, sometimes one gets a link to Nassim Nicholas Taleb making a fool of himself. I only watched the first couple of minutes until NNT unveils his Mathematica-based illustration, at which point his argument was already clear. And clearly wrong.

** I have two chapters in my (coming soon) book on missing factors, by the way. 🤓