Sunday, April 17, 2016

Tax day

In memory of income fallen to the revenuers, there will be no posts in April.

Wednesday, March 30, 2016

Three cardinal sins of presenting

Observations from yet another terrible talk.

(To protect the guilty, the presenter will be called "Epic," short for "Epic Fail II," and without loss of generality will be referred to with masculine pronouns.)

Epic committed three cardinal sins of presentations (there are more than three and some of the others were present in the terrible talk), in increasing order of badness:

The sin of humming: 

"Hum... like... basically..." were Epic's most common words. Or sounds, more precisely, because that's what they are. Sounds that Epic made as his brain composed the sentence that was to come.

This is the main problem of using slides-as-presenter-notes, though it also happens to presenters who have separate "talk skeleton" notes and don't rehearse a few times: bullet points aren't feasible out-loud sentences, so, to unprepared presenters, they act as stumbling blocks rather than helpful hints.

Some people are very articulate; some can be articulate from notes; most of the others need to do at least one run-through of the notes, preferably to camera so they can review it. The camera is essential, as without feedback there's little improvement.

Humming is a sign the presenter didn't care enough for the audience to rehearse his presentation.

The sin of non-preparedness:

Like most presenters, Epic seems to have created his presentation in a small fraction of the presentation time. That's usually a recipe for disaster. While some people can make good presentations impromptu or quasi-impromptu, most presenters should prepare carefully.

Epic's presentation had no clear objectives, no clear structure, and above all, no clear arguments. For comparison, there was another presenter at the conference who, in order to explain a programming philosophy, created a motivating example based on refactoring a cookbook.

The procedure for preparing isn't complicated: decide what the presentation objectives are; decide how they sequence into each other; devise ways to explain these objectives; assemble the presentation; rehearse.

Epic skipped all these stages except the assembling of the presentation as a sequence of presenter-notes-on-slides, and even that without thinking much about what each point meant. Epic didn't think about the phrasing of the points (see previous sin), let alone consider how best to explain them to the audience.

Good presentations begin in the preparation; bad presentations in the lack of it.

The sin of self-absorption:

The audience was promised, and therefore expected, a technical talk about a technical tool. Epic delivered a presentation about Epic: Epic's education (really, a CV slide and multiple name-drops to Epic's school, Epic's degree, Epic's degree advisor); Epic's actions ("I did this," "I found that" not "data show" or "tool does this"); Epic's performance on Epic's job (via repeated references to a sort of limited field contests/competitions, to which the audience groan was the only appropriate answer).

Two other presenters in the same session described highly technical tools, barely ever using the first person, talking about the tools, offering interesting if technically challenging knowledge. That's because, unlike Epic, they understood that the audience wasn't there to learn about the presenters' lives, but rather about the tools.

Epic, like many terrible presenters, bought into the idea that every presentation has to be a story (more or less right, even for a technical audience) about the presenter (absolutely wrong, unless you're presenting an autobiography).

Audiences don't like bait-and-switch: deliver what was promised, not what you like.

Many talks are bad, and that's a choice made by the presenter.

Saturday, March 12, 2016

Read before writing

A quick refresher this morning before tackling a writing task in the afternoon.

A quick read of my notes on these two books always helps focus my attention for any writing task.

I make a point of re-reading Zinsser's book in its entirety at least once a year. It takes but a couple of hours, best 'writing skills preventative maintenance' I can think of. It's also worth re-reading my notes prior to any major writing task, which is why I'm doing it today. I think of it as 'pre-flighting my writing skills'.

Before any major writing task, I go over Strunk & White's rules so that they're fresh in my mind as I write. That helps cut down on editing time later.

-- -- -- --

For the terminally lazy: Amazon links to On Writing Well and The Elements Of Style. (I would make them affiliate links, but I too am lazy.)

Saturday, March 5, 2016

Powerlifters vs Gym Rats - A tale of two means

In my last post I wrote:

For example, some time ago I had a discussion with a friend about strength training. The gist of it was that powerlifters are typically much stronger than the average athlete, but they are also much fewer; because of that, in a typical gym the strongest athlete might not be a powerlifter, but as we get into regional competitions and national competitions, the winner is going to be a powerlifter.

And the explanation, which the friend didn't understand, was "because on the upper tail the difference between means is going to dominate the difference in sizes of the population."

So here's an illustration of what I meant, with pictures and numbers and bad jokes.

First let's make the setup explicit. That's the great power of math and numerical examples, making things explicit. "Powerlifters are typically much stronger than the average athlete" will be operationalized with four assumptions:
A1: There's some composite metric of strength that we care about, call it $S$, and we'll normalize it so that across gym rats it has a mean $\mu(S_{\mathrm{GR}})$ of zero and a variance of $1$. 
A2: The distribution of strength within the population of gym rats is Normally distributed. 
A3: The distribution of strength in the sub-population of powerlifters is also Normally distributed. 
A4: For illustration purposes only, we will assume that powerlifters have a mean $\mu(S_{\mathrm{PL}})$ of 2 and the same variance as the rest of the gym rats.
We operationalize "they are also much fewer" with
A5: For illustration, the number of powerlifters is $1\%$ of gym rats.
(Powerlifters are gym rats, so the distribution for $S_{\mathrm{GR}}$ includes these $1\%$, balanced by CrossFit people, who bring down the mean strength and IQ in the gym while raising the insurance premiums. Watch Elgintensity to understand.)

The following figure shows the distributions:

When we look at the people in a gym with above-average strength, that is people with $S_{\mathrm{GR}}>0$, we find that one-half of all gym rats have that, and $98\%$ of all powerlifters have that: $\Pr(S_{\mathrm{GR}}>0) = 0.5$ and $\Pr(S_{\mathrm{PL}}>0) = 0.98$. This is illustrated in the next figure:

Powerlifters are over-represented among athletes of above-average strength, approximately twice as much as in the general population, but they are only about $2\%$ of the total, as their over-representation is multiplied by $1\%$.

As we become more selective, the over-representation goes up. For athletes that are at least one standard deviation above the mean, we have:

with $\Pr(S_{\mathrm{GR}}>1) = 0.16$ and $\Pr(S_{\mathrm{PL}}>1) = 0.84$. Powerlifters are over-represented 5-fold, so about $5\%$ of the total athletes in this category.

When we become more and more selective, for example when we compute the number of gym rats that have at least as much strength as the average powerlifter, $\Pr(S_{\mathrm{GR}}>2)$, we get

with $\Pr(S_{\mathrm{GR}}>2) = 0.023$ and $\Pr(S_{\mathrm{PL}}>2) = 0.5$, a 22-fold over-representation, meaning that of every six athletes in this category, one is a powerlifter. (Yes, one out of six, not one out of five. See if you can figure out why; if not, look at the solution for $S>6$ below and you'll understand. Or not, but that's a different problem.)

And as we look at subsets of stronger and stronger athletes, the over-representation of powerlifters becomes higher and higher: $\Pr(S_{\mathrm{GR}}>3) = 0.00135$ and $\Pr(S_{\mathrm{PL}}>3) = 0.159$, a $118$-fold ratio. There will be a few more powerlifters in this group than other gym rats; another way to say that is that powerlifters will be a little bit more than one-half of all gym rats that are at least one standard deviation stronger than the average powerlifter.

The ratios grow exponentially with increasing values for strength (the rare correct use of "exponentially" as they are ratios of Normal distribution tail probabilities; see below).

For $S>4$ the ratio is $718$, for $S>5$ the ratio is $4700$, and for $S>6$ the ratio is $32\,100$; in other words, there will be one non-powerlifter per group of $322$ gym rats with strength greater than 6 standard deviations above the mean of all gym rats.

This is what the effect of the differences in the tails of Normals always implies: eventually the small size of the better population (powerlifters) will be irrelevant as the higher mean will dominate.
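For anyone who wants to reproduce the numbers, here is a minimal sketch in Python. It uses assumptions A1 to A5 as stated; the function names are mine and purely illustrative. One extra assumption, flagged here and in the code: the share is computed by treating the gym-rat distribution as that of the non-powerlifters, with one powerlifter per hundred of them, which is how the "one in six" and "one in 322" figures come out.

```python
import math

PL_MEAN = 2.0       # assumption A4: powerlifter mean strength
PL_FRACTION = 0.01  # assumption A5: powerlifters per gym rat


def upper_tail(threshold, mean=0.0, sd=1.0):
    """Pr(S > threshold) when S is Normal(mean, sd)."""
    z = (threshold - mean) / sd
    # erfc keeps precision far out in the tail, where 1 - cdf would not.
    return 0.5 * math.erfc(z / math.sqrt(2))


def overrepresentation(threshold):
    """Ratio of powerlifter to gym-rat tail probabilities above threshold."""
    return upper_tail(threshold, PL_MEAN) / upper_tail(threshold)


def powerlifter_share(threshold):
    """Share of powerlifters among athletes above threshold.

    Assumption (mine): odds of powerlifter to non-powerlifter are
    PL_FRACTION times the over-representation ratio.
    """
    odds = PL_FRACTION * overrepresentation(threshold)
    return odds / (1 + odds)


for s in range(0, 7):
    print(f"S > {s}: ratio {overrepresentation(s):10.1f}, "
          f"powerlifter share {powerlifter_share(s):.3f}")
```

The printed ratios should track the ones quoted in the post, up to the rounding of the tail probabilities.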

See? That wasn't complicated at all.

-- -- -- --

For the mathematically inclined (strangely themselves over-represented in the set of powerlifters...)

Note that the ratio of probability density functions for the two Normal distributions in the post, for realizations of strength $S = x$ is
$\frac{f_{S}(x|\mu_{S}=2)}{f_{S}(x|\mu_{S}=0)}= \frac{e^{-(x-2)^2/2}}{e^{-x^2/2}}= e^{2x-2}$
which grows unbounded with $x$; no matter how small the fraction of powerlifters, say $\epsilon$, there's always a minimal $\bar S$ beyond which that ratio becomes greater than $1/\epsilon$. Which means that at some point above $\bar S$ the ratio of the remaining tail itself becomes greater than $1/\epsilon$. (It's very easy to calculate $\bar S$ and I have done so; I'll leave it as an exercise for the dedicated reader...)

Oh, that's the rare occurrence of the correct use of "exponentially," which is usually incorrectly treated as a synonym for "convex."
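For the not-so-dedicated reader, a minimal sketch of the exercise (spoiler alert). Setting the density ratio $e^{2x-2}$ equal to $1/\epsilon$ and solving for $x$ gives $\bar S = 1 + \frac{1}{2}\ln(1/\epsilon)$. The function names are illustrative, and the formula holds only for the means of 2 and 0 and the unit variances assumed in this post.

```python
import math


def density_ratio(x):
    """Ratio of the Normal(2, 1) density to the Normal(0, 1) density at x."""
    return math.exp(2 * x - 2)


def s_bar(epsilon):
    """Strength beyond which the density ratio exceeds 1/epsilon.

    Solves exp(2x - 2) = 1/epsilon for x.
    """
    return 1 + 0.5 * math.log(1 / epsilon)


threshold = s_bar(0.01)  # powerlifters are 1% of gym rats (assumption A5)
print(f"S-bar for a 1% fraction: {threshold:.2f}")
print(f"Density ratio at S-bar: {density_ratio(threshold):.1f}")
```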

Wednesday, March 2, 2016

Acalculia, innumeracy, or numerophobia?

I think there's an epidemic of number-induced brain paralysis going around.

There are quite a few examples of quant questions in interviews creating the mental equivalent of a frozen operating system (including this post by Sprezzaturian), but I think that there's something beyond that, something that applies in social situations and that affects people who should know better.

Here's a simple example. What is the orbital speed of the International Space Station, roughly? No, don't google it, calculate it. Orbital period is about 90 minutes, altitude (distance to ground) about 400km, Earth radius is about 6370km.

Seriously, this question stumps people with university degrees, including some in the life sciences who necessarily have taken college level science courses.

And what college-level math do you need to answer it? The formula for the circumference of a circle of radius $r$. Yes, $2\times\pi\times r$. The orbital velocity in km/h is the total number of kilometers per orbit ($2\times\pi\times (6370+400)$) divided by the time to orbit in hours ($1\frac{1}{2}$), that is around $28,000$ km/h, which is close to the actual value, $27\, 600$ km/h. (The orbit is an ellipse and takes more than 90 minutes.)
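The arithmetic fits in a few lines of code. A minimal sketch in Python, using only the round numbers from this post and the same circular-orbit simplification (the function and constant names are mine, purely illustrative):

```python
import math

EARTH_RADIUS_KM = 6370.0  # approximate, as given in the post
ALTITUDE_KM = 400.0       # approximate distance to ground
PERIOD_HOURS = 1.5        # about 90 minutes per orbit


def orbital_speed_kmh(radius_km, altitude_km, period_hours):
    """Speed on a circular orbit: circumference divided by period."""
    circumference_km = 2 * math.pi * (radius_km + altitude_km)
    return circumference_km / period_hours


speed = orbital_speed_kmh(EARTH_RADIUS_KM, ALTITUDE_KM, PERIOD_HOURS)
print(f"{speed:,.0f} km/h")  # a little over 28,000 km/h
```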

Can it possibly be ignorance, innumeracy? Is it plausible that college-educated professionals don't know the circumference formula?  Nope, they can recite the formula when prompted.

Or is it acalculia? That they have a mental inability to do calculation? Nope, they can compute exactly how much I owe on the lunch bill for the extra crème brûlée and the expensive entrée.

No, I think it's a mild case of numerophobia, a mental paralysis created by the appearance of an unexpected numerical challenge in normal life. This is a problem, as most of the world can be perceived more deeply if one thinks like a quant all the time; many strange "paradoxes" become obvious when seen through the lens of numerical (or parametrical) thinking.

For example, some time ago I had a discussion with a friend about strength training. The gist of it was that powerlifters are typically much stronger than the average athlete, but they are also much fewer; because of that, in a typical gym the strongest athlete might not be a powerlifter, but as we get into regional competitions and national competitions, the winner is going to be a powerlifter.

"That's because on the upper tail the difference between means is going to dominate the difference in sizes of the population." That quoted sentence is what I said. I might as well have said "boo-blee-gaa-gee in-a-gadda-vida hidee-hidee-hidee-oh" for all the comprehension. The friend is an engineer. A numbers person. But apparently, numbers are work-domain only.

The awesome power of quant thinking is being blocked by this strange social numerophobia. We must fight it. Liberate your inner quant; learn to love numbers in all areas of life.

Everything is numbers.

Thursday, February 25, 2016

People in glass houses shouldn't call smart kids ignorant

So, an acquaintance forwarded another "kids these days can only take tests but don't know anything important" link; it included these questions as examples of the problem:

"Who fought in the Peloponnesian war?  What was at stake at the Battle of Salamis?  Who taught Plato, and whom did Plato teach?  How did Socrates die?  Raise your hand if you have read both the Iliad and the Odyssey.  The Canterbury Tales?  Paradise Lost? The Inferno? 
Who was Saul of Tarsus?  What were the 95 theses, who wrote them, and what was their effect?  Why does the Magna Carta matter?  How and where did Thomas Becket die?  What happened to Charles I?  Who was Guy Fawkes, and why is there a day named after him?  What happened at Yorktown in 1781?  What did Lincoln say in his Second Inaugural?  His first Inaugural?  How about his third Inaugural? Who can tell me one or two of the arguments that are made in Federalist 10? Who has read Federalist 10?  What are the Federalist Papers?"

The funny thing, and I'm not the first one to notice this, is that the people who ask these questions in order to call others ignorant have little knowledge of the sciences, technologies, engineering, and math. (Or economics and business, for that matter.)

So, here's my response:

What happens when you drop metallic copper into sulfuric acid? What does it mean that the half-life of caffeine in the human body is approximately 2 hours? What is the main function of the kidneys and how does the heart work, namely what's connected to each part? Raise your hand if you can write the chemical equations for sodium hydroxide reacting with hydrochloric acid and for the combustion of propane. The quadratic equation solution formula? The equations of motion for a ballistic projectile? The complex conjugate of $(4 - 7i)\times (3+ 2i)$? 
What is discounted cash flow? How far are the Sun and the Moon from Earth? What is kinetic energy, and for a given moving object does it increase more when you double the mass or the speed? Why does the standard error for an estimate matter? How does a pressure cooker do its faster cooking? What's the difference in market outcomes for an increase in demand and an increase in supply, everything else being constant? What happens at Lagrange Points? What amino acids are essential, and why are they "essential"? What's Newton's first law of motion? His second law? What's an example of the difference in programming languages between a cycle and a conditional statement? Who can tell me one or two main differences between Newtonian physics and general relativity? Newtonian physics and quantum mechanics? What makes quantum mechanics "quantum"?

I contend that knowing the answers to my questions is a lot more important than knowing the answers to the first set of questions. Alas, many "educated" people don't think so. After all, most of the top questions lead to discussions where one can say more or less what one wants, but the bottom questions all have outside validators (the science, engineering, math, and economics or business).

The kids may well be ignorant, but the haughty superciliousness of most people whose knowledge base is the Humanities or Social Sciences is completely undeserved.

I'm going to start asking people who make big pronouncements about the ignorance of today's youth to calculate something like the missing value in the diagram above. It's basic Pythagorean theorem, applied twice, so everyone with a basic education should be able to do it, right? Right? RIGHT?

[Thoughts ruminate during the work day…]

The more I think about these two cultures, the more I see it's not just about different knowledge, it's about the focus of attention.

Compare the following two questions, the first from the original article and the second from my list:
Who taught Plato, and whom did Plato teach?  
What is kinetic energy, and for a given moving object does it increase more when you double the mass or the speed?
The answer the author was looking for, I think, is Socrates and Aristotle. Not the thoughts of Socrates and of Aristotle, but simply the persons. A lot of the questions in the original article are about people or events, not about concepts, ideas, or tools, which are what all my questions are about. (Kinetic energy is the energy of motion, $E_{K} = \frac{1}{2} m v^{2}$ so doubling the speed quadruples the kinetic energy, while doubling the mass only doubles the energy.)

Of course, some questions are out-and-out cultural virtue signaling. I'll see your
Raise your hand if you have read both the Iliad and the Odyssey.
And raise you a
Raise your hand if you have read both Molecular Biology of the Gene and Walter Rudin's Real and Complex Analysis and can answer the questions at the end of the chapters.
Game, set, and match, as they say in the Super Bowl.

One of the funniest things to see is the collision of these two focuses of attention, for example when people who don't like science try to pretend they "love" science by emphasizing people or events. That's when we see "science" questions like
  • Where was Einstein born? 
  • What Nobel Prizes did Marie Curie win?
These are, at best, history questions. Compare with
  • What is the energy of a 1kg mass going $99\%$ of the speed of light? 
  • If we start with 100g of Thorium-231 ($^{231}\mathrm{Th}$, an isotope in the decay chain of Uranium) and wait 51 hours (two half-lives), how much $^{231}\mathrm{Th}$ is left?
The answers to these don't depend on historic events or individual people. (They do relate to the people in the questions above by way of their work.) They require computation and thinking, for real. And that "for real" part is killer. For example, one can argue endlessly about the meaning of texts and the existence of "penumbras" in law or sticking to original intent, but there is no arguing with the technical questions.

That's one of the big issues that separates technical material from "soft" material: there's really an answer, and that answer can be shown to be right or tested with experiments that don't depend on feelings or whether Taul of Sarsus came up with it in the $94 \frac{1}{2}$ theses he nailed to the door of the Delicatessen in Wittenberg while he went in for a Schlagobers after the battle of the Salamis (pork against beef against chicken against vegan).

BTW, people who "love" science and haughty non-STEM professoriate: what's the answer to those two technical questions? Hint: don't forget the Lorentz correction.
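For readers who want to check their work, here is a minimal sketch of both computations. Assumptions that are mine, not part of the questions: I report both the total relativistic energy and the kinetic part, since the question as posed doesn't say which; the half-life of 25.5 hours is simply the 51 hours divided by the two half-lives stated in the question; the function names are illustrative.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def lorentz_factor(fraction_of_c):
    """Gamma for a speed given as a fraction of the speed of light."""
    return 1.0 / math.sqrt(1.0 - fraction_of_c ** 2)


def total_energy_j(mass_kg, fraction_of_c):
    """Total relativistic energy, gamma * m * c^2, in joules."""
    return lorentz_factor(fraction_of_c) * mass_kg * SPEED_OF_LIGHT_M_S ** 2


def kinetic_energy_j(mass_kg, fraction_of_c):
    """Relativistic kinetic energy, (gamma - 1) * m * c^2, in joules."""
    return (lorentz_factor(fraction_of_c) - 1.0) * mass_kg * SPEED_OF_LIGHT_M_S ** 2


def remaining_mass_g(initial_g, elapsed_hours, half_life_hours):
    """Mass left after exponential decay with the given half-life."""
    return initial_g * 0.5 ** (elapsed_hours / half_life_hours)


print(f"gamma at 0.99c:   {lorentz_factor(0.99):.3f}")
print(f"total energy:     {total_energy_j(1.0, 0.99):.3e} J")
print(f"kinetic energy:   {kinetic_energy_j(1.0, 0.99):.3e} J")
# Half-life of 25.5 h inferred from "51 hours (two half-lives)".
print(f"Th-231 remaining: {remaining_mass_g(100.0, 51.0, 25.5):.1f} g")
```

Skipping the correction and using $\frac{1}{2} m v^{2}$ gives roughly $4.4 \times 10^{16}$ J, more than ten times too small.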

"Won't someone rid us of these meddlesome quants?"

Saturday, February 20, 2016

Much ado about time preference

Today's José wants tomorrow's José to go on a diet, but when tomorrow arrives, the "new today" José will want the "new tomorrow" José to go on a diet, etc.

("My diet starts tomorrow" XXXL t-shirts available in the gift shop.)

As far as I know, Richard Thaler was the first economist to illustrate the inconsistency between choices in the short term and the long term with a simple pair of questions. First:

Q1: Do you prefer an apple in one year or two apples in one year and a day?

Most people choose the two apples. Then Thaler hit them with the second question:

Q2: Do you prefer an apple now, or two apples tomorrow?

And most people choose the one apple. This, trained economists and careful thinkers will say, is inconsistent. (This is one of the rare occasions when trained economists and careful thinkers will agree, so it's worth noting. :-)

Why is it inconsistent? For the same reason "my diet starts tomorrow" t-shirts are a good joke: because the decision is reversed simply by the passing of time. If instead of "in one year" and "in one year and a day" we had dates, say "on Feb 20th, 2017" and "on Feb 21st, 2017" and repeated the question every day, at some point the answer to Q1 would become "one apple," say on Feb 4, 2017.

Or maybe not. Maybe only on Feb 20th, 2017. Still, just the passing of time would reverse the choice, which is what "inconsistent over time" means.

Two common models of time preference that account for these inconsistencies are hyperbolic discounting, in which the exponential discounting used in finance (and in rational models in economics) is replaced by a hyperbolic function; and a non-immediacy penalty for any delayed reward. In the second case, all future payoffs are discounted by a factor $\beta \times \delta(t)$, where $\delta(t)$ is the standard exponential discount factor and $\beta < 1$ is the non-immediacy penalty. The lower the $\beta$, the more now-oriented the decision-maker.

The reason why I've come to like the $(\beta,\delta(t))$ formulation is that it models a number of explanations that have little to do with time orientation and a lot to do with the actual circumstances of getting a reward.

For example, I give these choices to participants in one-day managerial decision-making exec-ed events:

Q3: Choose between $\$10$ now or $\$20$ tomorrow. (Nearly all choose the $\$10$.)
Q4: Choose between $\$10$ in a week or $\$20$ in eight days. (Nearly all choose the $\$20$.)

And when we discuss the "inconsistency" participants mostly bring up the mechanics of the transaction: how exactly are they going to get the money after the event is over? (It's hypothetical, of course; in these events money comes my way. But participants play along and take the decision seriously.) If it's now, they can just get the money and walk away. So the future is discounted not just because of the opportunity cost of having the money later but also because it's associated with more hassle and uncertainty. Of course, when both payoffs are in the future, then participants prefer the larger payoff, as both payoffs have the same hassle and uncertainty.
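The $(\beta,\delta(t))$ formulation reproduces both choices with perfectly ordinary numbers. A minimal sketch, in which everything specific is a hypothetical of mine and not data from the events: a non-immediacy penalty of $\beta = 0.4$, a $5\%$ annual rate for $\delta(t)$, and the function names.

```python
def exponential_discount(days, annual_rate):
    """Standard exponential discount factor delta(t) for a delay in days."""
    return (1.0 + annual_rate) ** (-days / 365.0)


def present_value(amount, days, beta, annual_rate):
    """Beta-delta value: immediate payoffs are not penalized,
    any delayed payoff is scaled by beta * delta(t)."""
    if days == 0:
        return amount
    return beta * exponential_discount(days, annual_rate) * amount


def choose(options, beta, annual_rate):
    """Pick the (amount, days) option with the highest beta-delta value."""
    return max(options, key=lambda o: present_value(o[0], o[1], beta, annual_rate))


BETA = 0.4          # hypothetical non-immediacy penalty
ANNUAL_RATE = 0.05  # hypothetical yearly discount rate

q3 = [(10, 0), (20, 1)]  # $10 now or $20 tomorrow
q4 = [(10, 7), (20, 8)]  # $10 in a week or $20 in eight days

print("Q3 choice:", choose(q3, BETA, ANNUAL_RATE))
print("Q4 choice:", choose(q4, BETA, ANNUAL_RATE))
print("Q3 choice without the penalty:", choose(q3, 1.0, ANNUAL_RATE))
```

With these numbers, any $\beta$ below about one-half flips Q3 to the immediate $\$10$ while leaving Q4 at the $\$20$, because in Q4 both payoffs carry the same penalty.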

Given the advantages of being temporally consistent (which include delaying gratification for bigger rewards), these non-opportunity cost reasons for now-preference are quite important. For example, in the case of people going on diets, their experience with bad diets may make them ask "what's the point? I might as well have that second crème brûlée and a chocolate soufflé while I'm at it…"

I think that Scott Adams was right: the best thing is to stop considering goals (that is, making payoff-based choices) and adopt systems that work by bypassing the choice mechanisms. For me, the Paleo diet is one of them; strength training and rowing are another. YMMV, of course.

Another possibility is to practice delaying gratification as an exercise; it will be prophylactic against temporal inconsistency. There's a problem with this, of course: sometimes it's taken too far and leads to bad choices in itself. But in general, postponing a decision for a few days or considering whether a decision would change if the timing were shifted by a couple of days is a good idea.

Living for the now is a sure way to compromise the future.

--  --  --  --

For the quants…

The notion that the choice in Q3 could be due to standard discount (that is, a matter of opportunity cost of only having the money tomorrow instead of today) becomes ludicrous when we compute the discount rate associated: annualizing a $1/2$ one-day discount factor we get a yearly rate of (drumroll please…):

$\delta(\text{1 day}) = \frac{1}{(1+r)^{1/365}}= 1/2 \quad \Rightarrow \quad r = 2^{365}-1 = 7.515 \times 10^{109}$.
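A quick numerical check of that figure, as a minimal sketch (the variable names are mine; the only input is the one-day discount factor of $1/2$ implied by Q3):

```python
DAILY_DISCOUNT_FACTOR = 0.5  # $10 now valued the same as $20 tomorrow
DAYS_PER_YEAR = 365

# delta(1 day) = (1 + r) ** (-1/365)  =>  r = delta ** (-365) - 1
annual_rate = DAILY_DISCOUNT_FACTOR ** (-DAYS_PER_YEAR) - 1

print(f"Implied yearly rate: {annual_rate:.3e}")  # about 7.5e109
```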

Choices like those captured by Q1-Q4 have to be driven by immediacy, as any attempt to find a discount mechanism that makes sense without a discontinuity at "now" quickly runs into these ridiculously high discount rates.

References for the academically inclined:

✏︎ Thaler, Richard (1980): "Toward a positive theory of consumer choice," Journal of Economic Behavior and Organization.
✏︎ Thaler, Richard (1981): "Some empirical evidence on dynamic inconsistency," Economics Letters.