Tuesday, August 30, 2016

Some thoughts on quant interviews

Being a curmudgeonly quant, I started reacting to people who "love" science and math with simple Post-It questions like this:


(This is not a gotcha question; all you need is to apply the Pythagorean theorem twice. I even picked numbers that work out well. Yes, $9 \sqrt{2}$ is a number that works out well.)

Which reminds me of quant interviews and their shortcomings.

I already wrote about what I think is the most important problem in quantitative thinking for the general public, in Innumeracy, Acalculia, or Numerophobia, which was inspired by this Sprezzaturian's post (Sprezzaturian was writing about quant interviews).


In search of quants

That was for the general public. This post is specifically about interviewing to determine the quality of quantitative thinking, which is more than just mathematical and statistical knowledge.

One way to test mathematical knowledge is to ask the same type of questions one gets in an exam, such as:

$\qquad$ Compute $\frac{\partial }{\partial x} \frac{\partial }{\partial y} \frac{2 \sin(x) - 3 \sin(y)}{\sin(x)\sin(y)}$.
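For the record, this particular example rewards a little algebra before any calculus: the fraction splits into two terms that each depend on only one variable, so the mixed partial vanishes. A worked version:

$\qquad$ $\frac{2 \sin(x) - 3 \sin(y)}{\sin(x)\sin(y)} = \frac{2}{\sin(y)} - \frac{3}{\sin(x)}$

$\qquad$ $\frac{\partial }{\partial y} \left( \frac{2}{\sin(y)} - \frac{3}{\sin(x)} \right) = - \frac{2 \cos(y)}{\sin^2(y)}$

$\qquad$ $\frac{\partial }{\partial x} \left( - \frac{2 \cos(y)}{\sin^2(y)} \right) = 0$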

Having interacted with self-appointed "analytics experts" who had trouble with basic calculus (sometimes even basic algebra), this kind of test sounds very appealing at first. But its focus is on the wrong side of the skill set.

Physicist Eric Mazur has the best example of the disconnect between being able to answer a technical question and understanding the material:

TL; DR: students can't apply Newton's third law of motion (for every action there's an equal and opposite reaction) to a simple problem (car collision), though they can all recite that selfsame third law. I wrote a post about this before.

Testing what matters

Knowledge tests should at the very least be complemented with (if not superseded by) "facility with quantitative thinking"-type questions. For example, let's say Bob is interviewing for a job and is given the following graph (and formula):

Nina, the interviewer, asks Bob to explain what the formula means and to grok the parameters.

Bob Who Recites Knowledge will say something like "it's a sine with argument $2 \pi \rho x$ multiplied by an exponential of $- \kappa x$; if you give me the data points I can use Excel Solver to fit a model to get estimates of $\rho$ and $\kappa$."

Bob Who Understands will start by calling the graph what it is: a dampened oscillation over $x$. Treating $x$ as time for exposition purposes, that makes $\rho$ a frequency in hertz and $\kappa$ the dampening factor.

Next, Bob Who Understands says that there appear to be 5 1/4 cycles between 0 and 1, so $\hat \rho = 5.25$. Estimating $\kappa$ is a little harder, but since the first 3/4 cycle maps to an amplitude of $-0.75$, all we need is to solve two equations, first translating 3/4 cycle to the $x$ scale,

$\qquad$ $ 10.5 \,  \pi x = 1.5 \,  \pi$ or  $x= 0.14$

and then computing a dampening of $0.75$ at that point, since $\sin(3/2 \, \pi) = - 1$,

$\qquad$  $\exp(-\hat\kappa \times 0.14) = 0.75$, or $\hat \kappa = - \log(0.75)/0.14 \approx 2.05$
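The eyeball arithmetic is easy to double-check. A minimal Python sketch, assuming only the two readings stated above (5 1/4 cycles on $[0, 1]$ and a trough of $-0.75$ at the first 3/4 cycle). Note that $-\log(0.75)/0.14$ works out to about 2.05; with the unrounded $x = 1/7$ it is about 2.01.

```python
import math

# Readings taken off the graph, as described in the text
cycles_in_unit_interval = 5.25   # 5 1/4 cycles between x = 0 and x = 1
trough_value = -0.75             # value of the function at the first 3/4 cycle

# rho is cycles per unit of x
rho_hat = cycles_in_unit_interval / 1.0

# 3/4 cycle: 2*pi*rho*x = 1.5*pi, so x = 0.75 / rho
x_three_quarters = 1.5 * math.pi / (2 * math.pi * rho_hat)

# sin(1.5*pi) = -1, so exp(-kappa * x) must equal 0.75 at that point
kappa_hat = -math.log(-trough_value) / x_three_quarters
kappa_hat_rounded_x = -math.log(-trough_value) / round(x_three_quarters, 2)

print(f"rho_hat   = {rho_hat}")
print(f"x (3/4)   = {x_three_quarters:.4f}")
print(f"kappa_hat = {kappa_hat:.2f} (exact x), {kappa_hat_rounded_x:.2f} (x = 0.14)")
```

Rounding $x$ to 0.14 moves $\hat\kappa$ by only a few percent, which is well inside the precision of reading a graph by eye.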

Bob Who Understands then says, "of course, these are only approximations; given the data points I can quickly fit a model in #rstats that gets better estimates, plus quality measures of those estimates."
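The post leaves the fitting to #rstats; as a stand-in, here is a minimal, standard-library-only Python sketch of the same idea. Assumptions, all mine: the model is $e^{-\kappa x}\sin(2 \pi \rho x)$ as read from Bob's description; the data are synthetic, generated from assumed values $\rho = 5.25$ and $\kappa = 2$ plus a little noise (not the data behind the graph); and the fit is a brute-force least-squares grid search around the eyeball estimates rather than a proper nonlinear solver. Unlike R's nls, it returns point estimates only, with no standard errors.

```python
import math
import random


def damped_sine(x, rho, kappa):
    """f(x) = exp(-kappa * x) * sin(2 * pi * rho * x)."""
    return math.exp(-kappa * x) * math.sin(2 * math.pi * rho * x)


def sse(xs, ys, rho, kappa):
    """Sum of squared residuals for one candidate (rho, kappa)."""
    return sum((y - damped_sine(x, rho, kappa)) ** 2 for x, y in zip(xs, ys))


def grid_fit(xs, ys, rho_range, kappa_range, steps=51):
    """Brute-force least squares: try every (rho, kappa) on a grid, keep the best."""
    best_err, best_rho, best_kappa = float("inf"), None, None
    for i in range(steps):
        rho = rho_range[0] + (rho_range[1] - rho_range[0]) * i / (steps - 1)
        for j in range(steps):
            kappa = kappa_range[0] + (kappa_range[1] - kappa_range[0]) * j / (steps - 1)
            err = sse(xs, ys, rho, kappa)
            if err < best_err:
                best_err, best_rho, best_kappa = err, rho, kappa
    return best_rho, best_kappa


# Synthetic data from ASSUMED true values (not the data behind the post's graph)
random.seed(42)
xs = [i / 100 for i in range(101)]
ys = [damped_sine(x, 5.25, 2.0) + random.gauss(0, 0.02) for x in xs]

# Search a window around the back-of-the-envelope estimates
rho_fit, kappa_fit = grid_fit(xs, ys, rho_range=(5.0, 5.5), kappa_range=(1.0, 3.0))
print(f"rho_fit = {rho_fit:.2f}, kappa_fit = {kappa_fit:.2f}")
```

The approximate reasoning still earns its keep here: it is what tells you where to put the search window.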

(Nerd note: If instead of $e^{-\kappa x}$ the dampening had been $2^{-\kappa x}$, then $1/\kappa$ would be the half-life of the process; but the numbers aren't as clean with base $e$.)
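Working the nerd note through: set the dampening term to one half and solve for the half-life $t_{1/2}$ in each base, then plug in the rough estimate $\hat\kappa = -\log(0.75)/0.14 \approx 2.05$:

$\qquad$ $e^{-\kappa t_{1/2}} = \tfrac{1}{2}$, so $t_{1/2} = \log 2 / \kappa$

$\qquad$ $2^{-\kappa t_{1/2}} = \tfrac{1}{2}$, so $t_{1/2} = 1 / \kappa$

$\qquad$ $\log 2 / 2.05 \approx 0.34$, so the envelope of the oscillation halves roughly every third of a unit of $x$.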

This facility with approximate reasoning (and use of #rstats :-) signals something important about Bob Who Understands: he understands what the numbers mean in terms of their effects on the function; he groks the function.

Nina hires Bob Who Understands. Bonuses galore follow.

Bob Who Recites Knowledge joins a government agency, funding research based on "objective, quantitative" metrics, where he excels at memorizing the 264,482 pages of regulation defining rules for awarding grants.

Friday, August 19, 2016

The strange case of the oscillating black hole at the gym

There's a micro-black hole at my gym, and it oscillates between a position under the squat cages and the deadlift platform, in synchrony with my powerlifting program.

It's the only possible explanation.

The data: following my not-very-demanding powerlifting program (which is very low volume, even for powerlifting), I should have gained $5\%$ in both the squat and the deadlift in the last six weeks. This is feasible because I'm recovering strength from the beginning of the year, not creating new strength. (Who said powerlifters have crazy superstitions?) Instead, both lifts have stalled.

Clearly, what is happening is that the same mass (bar + weights) is exerting a larger force on my body, which means that there's a micro-black hole under the gym. Since I estimate that there's a 3-meter foundation, this being earthquake territory and all, I postulate that the black hole is below that:



Furthermore, it has to move, since it doesn't affect bench press (where, despite a damaged right rotator cuff, I'm recovering strength well above the program envelope) but it affects the other two lifts. Since it affects both lifts by the same percentage, that black hole has to move diagonally, as seen above. It also oscillates back and forth between the squat and deadlift stations.

Two black holes, you say? Don't be ridiculous. Two black holes, indeed! Pah!

So, a little back-of-the-drawing calculation, the kind that drives OCD quants crazy...



(with three or four corrections along the way, including a slight ahem when I used $c = 3\times 10^{9}$ instead of the correct $c = 3\times 10^{8}$ m/s)

... and I have my micro-black hole. As small as the total gains of an entire continent's worth of CrossFitters, at a Schwarzschild radius of $3\times 10^{-16}$ meters, and as heavy as a Planet Fitness personal trainer, at a mass of $2\times 10^{11}$ kilograms. That mass means that the black hole won't evaporate anytime soon, so my stalled gains will continue.

(The time to evaporate a black hole with a mass of $10^{11}$ kg is in the billions of years, about the time it would take for a CrossFitter to do one good chin-up or a Planet Fitness client to lose five ounces of fat.)
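Both headline numbers can be sanity-checked with textbook formulas: $r_s = 2GM/c^2$ for the Schwarzschild radius and, for the evaporation time, the standard photon-only Hawking estimate $t \approx 5120 \pi G^2 M^3 / (\hbar c^4)$. That second formula is a choice made for this check, not necessarily what was on the drawing, and the constants below are rounded. A minimal Python sketch:

```python
import math

# Physical constants, SI units, rounded
G = 6.674e-11              # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8                # speed of light, m/s (3e8, not 3e9!)
HBAR = 1.0546e-34          # reduced Planck constant, J s
SECONDS_PER_YEAR = 3.156e7


def schwarzschild_radius(mass_kg):
    """r_s = 2 G M / c^2, in meters."""
    return 2 * G * mass_kg / C ** 2


def evaporation_time_years(mass_kg):
    """Photon-only Hawking estimate t = 5120 pi G^2 M^3 / (hbar c^4), in years."""
    seconds = 5120 * math.pi * G ** 2 * mass_kg ** 3 / (HBAR * C ** 4)
    return seconds / SECONDS_PER_YEAR


for m in (1e11, 2e11):
    print(f"M = {m:.0e} kg: r_s = {schwarzschild_radius(m):.1e} m, "
          f"evaporation ~ {evaporation_time_years(m):.1e} years")
```

Because the evaporation time scales with $M^3$, doubling the mass from $10^{11}$ to $2\times 10^{11}$ kg stretches it from a few billion years to a few tens of billions.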

Clearly this is an important discovery. Hello, Nobel Committee? Got a pick for 2017 Physics yet?


Alternative explanation 1: weight gain on my part

First off, to cancel a $5\%$ increase in squat and deadlift, I'd have to have gained close to $15\%$ of my bodyweight over six weeks. That's not impossible (or even unheard of), but in reality I've been losing weight at about 1 kg per week, mostly fat, hopefully more than 1.5 kg of fat per week (muscle mass increasing at 0.5 kg/week during a recovery is reasonable).

Also, weight gain wouldn't affect squat and deadlift in the same way, unless I gained all the weight above my sternum. (I continue to improve my mental skills, but that doesn't significantly increase the mass of the brain...)

In an ass-to-grass squat (my type of squat), the femur goes over a 110-120 degree arc, hence getting to the weakest part of the quads' force curve. So, some more weight in the torso may affect the ability to squat heavy. But for the deadlift, the drive with the legs is only an arc of 60-70 degrees, well away from the weak part of the quads' force curve; therefore the loss of deadlifting power given additional bodyweight should be much lower than the loss for the squat, not the same. But the same it is.

(In the squat my weakness is the quads, in the deadlift, the spinal erectors; never my glutes. Hip thrusts, baby, hip thrusts FTW! I do leg-extensions with the full stack for reps, but only an ass-to-grass squat hits the quads at their full extension...)

This explanation is therefore dismissed.


Alternative explanation 2: poor supplementation

A picture is worth a thousand words (and about two hundred dollars):



Of course, I eschew that marvelous "supplement" family, anabolic steroids, or if one wants to be a little more discreet, TRT, testosterone replacement therapy. I like my reproductive system to stay at manufacturer's specification. It's kind of a big deal for me, to have the theoretical capability for reproduction (Theoretical because the 3.75 billion women in the world took a vote and unanimously –minus my mother– decided that for the good of the universe I should not reproduce; who am I to question democracy?)

This explanation is therefore dismissed.


Alternative explanation 3: I'm no longer an 18-year-old kid

Poppycock and balderdash! Balderdash, I say!

Age is but a number and you're as young as you feel. Besides, I'm barely in my early middle age -- just a few months past 30.*

This explanation is dismissed with extreme prejudice and a SEAL Team 6 visit.


Conclusion

The only possible logical conclusion is that, like in the 1990 David Brin scifi novel Earth, there's a naughty micro-black hole oscillating between the space under the squat cages and the lifting platforms, and by enormous coincidence its period matches my powerlifting program.

Obviously the solution is to combine the Westside Barbell approach of growing a big belly with the Testosterone Nation recommendation of synergistic beard growing and head shaving so that, even with increasing gravity, gains will come:


It's Science!


-- -- -- --
* 236 months, to be precise.

Sunday, August 14, 2016

Working the solution versus solving the problem

Some time ago I tweeted that I was going to row a number of nautical miles on my trusty old Concept IIc machine. As an engineer, I use SI units for everything --- except on the water, where I use traditional units: nautical miles and knots.

A couple of rowers I know asked me how I had hacked the controller on the Concept IIc to change the units. This was my answer:

How I "hacked the software" on the Concept IIc to use nautical miles. #genius

What many people miss is that the questioners were making a common problem-solving mistake, one that forecloses most creative solutions:

The mistake is working the solution instead of solving the problem.

Hidden in the question about the hack is an assumption: that the solution has to come from my programming skills (they know what I do, so it's not an unreasonable assumption). That assumption sets a path to a solution, which would include reprogramming the firmware inside the Concept IIc controller.

The key step in the thinking here is the ability to backtrack from that path to the beginning and choose another. Too many people start on one path and can't get off it to pursue other possible paths to the solution.

The mistake was avoided by focussing on the problem, i.e. the question "what is to be achieved?", rather than on the solution under consideration, changing the software.

Yes, this is a trivial and obvious (after the fact) example, but often the difference between a non-working "solution" and a working solution is a matter of focus on the problem to be solved.

Alas, changing their focus is too hard for some would-be problem solvers.

Wednesday, August 10, 2016

Numerical fun: tracking my blood caffeine level in one day

A few days ago, I decided to see what my blood caffeine profile looks like on a typical day. Since I didn't want to draw blood at regular intervals for analysis, I did the next best thing and tracked consumption and computed the blood level using a model of its dynamics.

Tracking consumption was simple: I have two French presses, both used for tea; the smaller one (1 liter) brews the caffeine equivalent of two espressos (80 mg each, or 160 mg total), and the larger one (1.5 liters) brews the equivalent of three espressos (240 mg). I just made a note of when I finished each French press and which one it was.

To convert consumption into blood level, we need a state equation. We make the following assumptions:
  1. Caffeine level on wakeup is zero (an approximation).
  2. Time $t$ is discrete and measured in half-hours.
  3. Caffeine half-life in the body is two hours.*
The last assumption fixes the fraction of caffeine retained per half-hour step at $2^{-1/4} \approx 0.8409$, which gives the equation

$\qquad L(t) = c(t) + 0.8409 \times L(t-1)$

where $L(t)$ is the level and $c(t)$ is the consumption at time $t$. This equation is an exponential decay process with a half-life of two hours: for a given $t=T$, assuming no consumption,

$\qquad L(T+4) = (0.8409)^4 \times L(T) = 0.5000 \times L(T)$.

(Two hours is 4 half-hours, since we're using the half-hour as the time unit.)
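A minimal Python sketch of this state equation under the three assumptions above. The consumption schedule is hypothetical, since the actual log for the day isn't given: a small press (160 mg) finished at half-hour 2 and a large one (240 mg) at half-hour 12. The level is in mg of caffeine in the body, the same units as consumption.

```python
def retention_per_step(half_life_hours, step_hours=0.5):
    """Fraction of caffeine left after one step: 2^(-step / half-life)."""
    return 0.5 ** (step_hours / half_life_hours)


def caffeine_profile(consumption_mg, half_life_hours=2.0):
    """Iterate L(t) = c(t) + a * L(t-1), with L = 0 before the first step."""
    a = retention_per_step(half_life_hours)
    levels, level = [], 0.0
    for c in consumption_mg:
        level = c + a * level
        levels.append(level)
    return levels


# Hypothetical day (not the actual log): 32 half-hour steps
consumption = [0.0] * 32
consumption[2] = 160.0    # small French press
consumption[12] = 240.0   # large French press

profile = caffeine_profile(consumption)
print(f"a = {retention_per_step(2.0):.4f}")
print(f"level at t=2: {profile[2]:.1f} mg, at t=6: {profile[6]:.1f} mg")
```

Four steps after the first press the level has dropped from 160 mg to 80 mg, which is the two-hour half-life at work.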

Putting the consumption and the initial condition into the equation and graphing it on a scale for the day in question we get

My average level was a bit high, but I'm used to it.

-- -- -- --
* I got this number from a doctor, but several sources have told me it's too low. Online sources point to a half-life of 3-6 hours. This changes the coefficient for $L(t-1)$ in the equation above to somewhere between 0.8909 (for three hours) and 0.9439 (for six hours). Possibly there's an update to this post in the future to deal with that.
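The coefficients in this footnote follow from the same rule as before: with half-hour steps and a half-life of $h$ hours, the coefficient is $2^{-1/(2h)}$. A short Python check:

```python
def retention_per_half_hour(half_life_hours):
    """Coefficient on L(t-1) when t is in half-hours: 2^(-0.5 / half-life)."""
    return 0.5 ** (0.5 / half_life_hours)


for h in (2, 3, 6):
    print(f"half-life {h} h -> coefficient {retention_per_half_hour(h):.4f}")
```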

Update in the future: I did the computations:

Corrected Caffeine Level Profile