Thursday, February 27, 2020

Learning and understanding technical material – some thoughts

Learning technical material


From my YouTube subscriptions, the image that inspired all this:


Ah, MIT teaching, where professors get former students whom they consult for/with to teach all their classes, while still getting their teaching requirement filled…

(For what it's worth, students probably get better teaching this way, given the average quality of MIT engineering professors' teaching.)

These are not the typical MIT/Stanford/Caltech post-docs or PhD students teaching the classes of their Principal Investigators or Doctoral Advisors. These are business associates of Tom Eagar, who get roped into teaching his class "as an honor." (In other words, for free.)

Note that there is such a thing in academia as "organizing a seminar series," which some professors do (for partial teaching credit), formally different from "teaching a class" (full teaching credit). Doing the former for the credit of the latter… questionable, but sadly common in certain parts of academe.

On the other hand, as most MIT faculty and students will confirm, technical learning is 0.1% lectures, 0.9% reading textbook/notes, 9% working through solved examples, 90% solving problem sets, so all this "who teaches what" is basically a non-issue. (These numbers aren't precise estimates, just an orders-of-magnitude reference used at MIT.)


That's probably the major difference between technical fields and non-technical fields, that all the learning (all the understanding, really) is in the problem-solving. Concepts, principles, and tools only matter inasmuch as they are understood well enough to be used to solve problems.

(Sports analogy: No matter how strong you are, no matter how many books you read and videos you watch about handstand walks, the only way to do handstand walks is to get into a handstand, then "walk" with your hands.)

Which brings us to the next section:


Understanding technical material


There are roughly five levels of understanding technical material, counting 'no knowledge or understanding at all' as a level; the other four are illustrated in the following picture:


The most basic knowledge is that the phenomenon exists, perhaps with some general idea of its application. We'll be using gravity as the example, so the lowest level of understanding is just knowing that things under gravity, well, fall.

This might seem prosaic, but in some technical fields one meets people whose knowledge of the material is limited to knowing the words but not their meaning. Sometimes these people can bluff their way into significant positions with a barrage of jargon aimed at unsuspecting victims, but they are generally found out quickly by anyone with a deeper understanding of the material.

A second rough level of knowledge and understanding is a conceptual or qualitative understanding of a field; this is the type of understanding one gets from reading well-written and correct mass-market non-fiction. In other words, an amateur's level of understanding, which is fine for amateurs.

In the case of gravity this would include things like knowing that gravity is different on different planets, that there's some relationship with the mass of the planet, and that on a given planet objects of different masses fall at the same rate (with some caveats regarding friction and fluid displacement forces).

The big divide is between this qualitative level of understanding (which in technical fields is for amateurs, though it's also the level some professionals decay to by not keeping up with the field and not keeping their learned skills sharp) and the level at which a person can operationalize the knowledge to solve problems.

Operational understanding means that we can solve problems using the material. For example, we can use the formula $d= 1/2 \, g \, t^2$ to determine that a ball bearing falling freely will drop 4.9 m in the first second. We can also compute the equivalent result for the Moon, using $g_{\mathrm{Moon}} = g/6$, so on the Moon the ball bearing would only fall 82 cm in the first second.
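For the curious, here's a quick sketch of that arithmetic in Python (the post doesn't show any code, so this is just an illustration, using the usual $g \approx 9.81$ m/s$^2$):

```python
# Free-fall distance d = 1/2 * g * t^2, ignoring air resistance.
g_earth = 9.81          # m/s^2, standard surface gravity
g_moon = g_earth / 6    # the post's approximation for lunar gravity

def drop_distance(g, t):
    """Distance fallen from rest after t seconds under constant gravity g."""
    return 0.5 * g * t**2

print(drop_distance(g_earth, 1.0))  # ~4.9 m
print(drop_distance(g_moon, 1.0))   # ~0.82 m
```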

This level of understanding is what technical training (classes, textbooks, problem sets, etc.) is for. It's possible to learn by self-study, of course, since that's a component of all learning (textbooks were the original MOOCs), but the only way to get real operational understanding is to solve problems.

There's a level of understanding beyond operational, typically reserved for people who work in research and development, or the people moving the concepts, principles, and tools of the field forward. Since that kind of research and development needs a good understanding of the foundations of (and causality within) the field, I chose to call it deep understanding, but one might also call it causal understanding. Such an understanding of gravity would come from doing research in physics (reading and publishing research papers), rather than from applying physics to solve, say, engineering problems.


An example: Sergei Krikalev, the time-traveling cosmonaut


The difference between qualitative understanding and operational understanding can be clarified with how each level processes the following tweet:


More precise data can be obtained from the linked article and that's what we'll use below.*

Qualitative understanding: Special Relativity says that when people are moving their time passes slower than that of people who are stationary; the 0.02 seconds in the tweet come from the ISS moving around the Earth very fast.

(There are a lot of issues with that explanation; for example: from the viewpoint of Krikalev the Earth was moving while he was stationary, so why is Krikalev, instead of the Earth, in the future? Viascience explains this apparent paradox here.)

Operational understanding: the time-dilation factor for a clock moving at speed $v$ relative to an observer is $\gamma(v) = (1 - (v/c)^2)^{-1/2}$. The ISS moves at approximately 7700 m/s, so the dilation factor is $\gamma(7700) = 1.00000000032939$. Applying this factor to the total time Krikalev spent in orbit (803 days, 9 hours, and 39 minutes = 69,413,940 s), we get that an additional 0.0228642576966 seconds passed on Earth during that time.
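Here's a sketch of that computation in Python; the exact orbital speed is an assumption (the post says "approximately 7700 m/s", and the quoted digits suggest the article used a slightly more precise value), so the output matches the numbers above only to a couple of significant figures:

```python
c = 299_792_458.0   # speed of light, m/s

def gamma(v):
    """Lorentz factor for speed v (m/s)."""
    return 1.0 / (1.0 - (v / c)**2)**0.5

def extra_earth_seconds(v, t_seconds):
    """Extra time elapsed on Earth relative to a clock moving at speed v for t_seconds."""
    return (gamma(v) - 1.0) * t_seconds

v_iss = 7700.0                            # m/s, approximate orbital speed (assumed)
t_mission = 803*86400 + 9*3600 + 39*60    # 803 d 9 h 39 min = 69,413,940 s

print(gamma(v_iss))                            # ~1.00000000033
print(extra_earth_seconds(v_iss, t_mission))   # ~0.0229 s
```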

Because we have operational understanding of time dilation, we can ask how much further into the future Krikalev would have traveled at higher speeds (not on the ISS, since its orbit determines its speed). If Krikalev had moved at twice the ISS speed, he'd have been 0.0914570307864 seconds younger; at ten times the speed, 2.2864181341266 seconds younger; and at 10,000 times the speed (over 25% of the speed of light), almost 28 days younger.
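Reusing the helpers from the sketch above, those hypothetical speeds are a one-line loop:

```python
# Hypothetical (non-orbital) speeds: 2x, 10x, and 10,000x the ISS speed.
for k in (2, 10, 10_000):
    print(k, extra_earth_seconds(k * v_iss, t_mission))
# roughly 0.09 s, 2.3 s, and 2.4 million seconds (about 28 days), respectively
```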

As a curiosity, we can use that $\gamma(7700)$ to compute kinetic energy, $E_k(v) = (\gamma(v)-1) \, mc^2$, or more precisely, since we don't have the mass, the specific energy, $E_k(v)/m = (\gamma(v)-1) \, c^2$. At its speed of 7.7 km/s the ISS and its contents have the specific energy of ethanol (30 MJ/kg) or seven times that of an equivalent mass of TNT.
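And the specific-energy curiosity, again reusing gamma from the sketch above (4.2 MJ/kg for TNT is the standard reference value):

```python
# Specific kinetic energy E_k/m = (gamma - 1) * c^2 at the ISS's orbital speed.
specific_energy = (gamma(v_iss) - 1.0) * c**2
print(specific_energy / 1e6)   # ~30 MJ/kg; TNT is ~4.2 MJ/kg, so roughly 7x
```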

To say that one understands technical material without being able to solve problems with that same understanding is like saying one knows French without being able to speak, read, write, or understand French speech or text. Sacré bleu!

The application is what counts.


- - - - -
* The article also refers to the effect of gravity, noting that it's too low to make any difference (Earth's gravity at the ISS's average altitude of ~400 km is about 89% of surface gravity; at either value, the General Relativity effect of gravity slowing down time is too small to have any noticeable impact on Krikalev, or for that matter on anyone on Earth).

Wednesday, February 19, 2020

Seriously, pulling a Mensa card?

Some thoughts on IQ testing, inspired by someone who pulled an "I have a higher IQ than you" on scifi author TJIC (read his books if you like hard scifi: first, second) and then pulled (I kid you not) a Mensa card. An actual Mensa card.


Leaving aside the obvious logical fallacy (implying "I have a high IQ, therefore what I say is right" is itself evidence of not engaging one's intelligence), there's something funny about claims that IQ, as measured by tests designed for the mass of the population, is somehow a measure of the ability to think about complex or difficult issues.

Note that for mass testing purposes the IQ test as designed is useful, for reasons that will become clear below.

The tests typically consist of a number of simple pattern-matching problems and other tasks of low algorithmic and computational complexity. This works out as a good way to sort people whose intelligence falls anywhere from zero up to a standard deviation or two above the mean. In other words, this type of testing separates people who will have serious difficulties, mild difficulties, or no difficulties following basic education (say, up to high school) from people who can do well in education beyond the basic if they choose to.

Because some of the people who do well in these tests go on to do well in situations with high algorithmic and/or computational complexity, IQ metrics (or proxies thereof like the SAT) are used as one of the tools in selection for jobs or education that include such tasks, such as STEM education and jobs.

(Note that it is possible for someone to do badly in IQ tests and still do well in tasks with high algorithmic and/or computational complexity, though that tends to be unlikely and generally happens due to considerations orthogonal to actual intellectual capabilities.)

Because some of the people who do well in IQ tests don't do well once the algorithmic and/or computational complexity increases, using IQ measures as the sole selection tool would be a bad idea, which is why most recruiters look at school transcripts, relevant achievements (like code on Github, Ramanujan's notebooks, billionaire parents*), and other metrics.

The people who do well in IQ tests but not so well in more complex tasks tend to be the ones who join Mensa, which is why it's so funny anyone would think showing a Mensa card means anything.

Oh, a small thing, though...

The tasks in these tests, themselves, tend to be, well... there's really no nice way to say this, wrong. Just wrong.

Other than word analogy tests (A : B :: C : ?), which measure vocabulary fluidity above all else, pretty much all the pattern matching tasks in these tests can be coded as "what's the next vector in this sequence of vectors of numbers?" to which anyone with a basic understanding of mathematics would answer "a vector of appropriate dimension with any numbers you want."

For example, consider the following sequence: 1, 1, 2, 3, 5, 8. Which is the next number in the sequence?

It's $e^{\pi^3}$.

Clearly!

That's because that sequence is clearly an enumeration in increasing order of the zeros of the following polynomial:

$(x-1)^2 \, (x-2) \, (x-3) \, (x-5) \, (x-8) \, (x - e^{\pi^3})$.
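For anyone who wants to check that claim, here's a small sympy sketch (not from the post) verifying that every term of the sequence, plus the claimed continuation, is indeed a zero of that polynomial:

```python
import sympy as sp

x = sp.symbols('x')
next_term = sp.exp(sp.pi**3)   # the claimed "next number", about 2.9e13
p = (x - 1)**2 * (x - 2) * (x - 3) * (x - 5) * (x - 8) * (x - next_term)

# Every term of the sequence (and the claimed continuation) is a zero of p.
assert all(p.subs(x, r) == 0 for r in (1, 2, 3, 5, 8, next_term))
print(sp.N(next_term))   # ~2.9e13
```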

How about the sequence 1, 1, 1, 1, 1: what number comes next?

Clearly the next number is 5. This is the well-known Cinconacci sequence (five ones followed by the sum of the previous five numbers), after the Tribonacci (three ones followed by the sum of the previous three numbers) and Fibonacci (two ones followed by the sum of the previous two numbers) sequences. The Quatronacci sequence is left as an exercise to the reader.
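A sketch of the general construction, following the post's convention of starting from $n$ ones:

```python
def n_bonacci(n, length):
    """n ones followed by terms that are each the sum of the previous n terms."""
    seq = [1] * n
    while len(seq) < length:
        seq.append(sum(seq[-n:]))
    return seq[:length]

print(n_bonacci(2, 8))  # Fibonacci:  [1, 1, 2, 3, 5, 8, 13, 21]
print(n_bonacci(3, 8))  # Tribonacci (per the post's definition): [1, 1, 1, 3, 5, 9, 17, 31]
print(n_bonacci(5, 8))  # Cinconacci: [1, 1, 1, 1, 1, 5, 9, 17]
```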

By the way, only people with limited imagination think that the sequence 1, 1, 2, 3, 5, 8 above could only be the beginning of the Fibonacci sequence. That's a cultural bias towards a specific sequence out of an infinity of possible sequences. (A big infinity, at that: $2^{\aleph_0}$, the cardinality of the continuum.)

Note that a smart test-taker will realize that in the infinity of sequences there are some sequences the people who write the test believe are the "right" ones, so a smart test-taker will choose those, thus using both the ability to recognize patterns and the perception of test makers' intellectual limitations.

This is not to say that the tests are useless per se; beyond being good at separating the lower levels of thinking ability, they measure the ability to follow instructions and to concentrate on a task for some time, both of which are important as markers of brain executive function.**

But, as I've told many a recruiter (as a consultant to the recruiter, not as a candidate), if you want to know whether a candidate can write code or solve math problems, don't bother them with puzzles; give them a coding task or a math problem.

Alas, puzzle interviews have grown to mythological status, so they're here to stay.


- - - - -

* Or as they call them at Hahvahd admissions, high-potential donors.

** There's an old recruitment test, no longer used, with an instruction sheet and a worksheet. The top of the instruction sheet said in large type "read through all the instructions before beginning," and proceeded in regular type with instructions like "1 - draw a line on the worksheet, diagonally from top left to bottom right," and, say, another nine like it; at the bottom of the page it said "turn page to continue" and on the back it said, again in large type, "don't follow instructions 1-10; just write your name in the center of the worksheet and hand that in."

A significant number of people failed the test by doing tasks 1-10 as they read them, ignoring the "read all instructions before beginning" command at the top. This test is no longer used because (a) it's too well-known and (b) people who fail it never want to accept that it's their fault for not following the main instruction to read all instructions before beginning.



AFTERTHOUGHT:

My IQ, you ask? I'm pretty sure, say with 99% probability, that it falls somewhere between 50 and 500. On a good day, of course.


Wednesday, February 12, 2020

Contagion, coronavirus, and charlatans

This post is an illustration of a simple epidemiological model and of why the ad-hoc coronavirus modeling that some charlatans are spreading on social media is a nonsensical distraction.


Math of contagion: the SIR-1 model


A simple model for infectious diseases, the SIR-1 model (also known as the Kermack-McKendrick model), is too simple for the coronavirus, but it captures some of the basic behavior of any epidemic.

The model uses a fixed population, with no deaths, no natural immunity, no latent period for the disease (when a person is exposed but not infectious; not to be mistaken for what happens with the coronavirus, where people are infectious but asymptomatic), and a simple topology (the population is in a single homogeneous pool, instead of different cities and countries sparsely connected).

There are three states that a given individual can be in: susceptible (fraction of the population in this state represented by $S$), infectious (fraction represented by $I$), and recovered (fraction represented by $R$); recovered means immune, so there is no reinfection.

There are two parameters: $\beta$, the contagiousness of the disease, and $\gamma$, the recovery rate. To illustrate using discretized time, $\beta= 0.06$ means that any infectious individual has a 6% chance of infecting another individual in the next period (say, a day); $\gamma= 0.03$ means that any infectious individual has a 3% chance of recovering in the next period.

The dynamics of the model are described by three differential equations:

$\dot S = - \beta S I$;
$\dot I = (\beta S - \gamma) I$;
$\dot R = \gamma I$.

The ratio $R_0 = \beta/\gamma$ is critical to the behavior of an epidemic: if it's lower than one, the infection dies off without noticeable expansion; if it's much higher than one, it becomes a large epidemic.

There is no analytic solution to the differential equations, but they're easy enough to simulate and to fit data to. Here are some results for a discretized, 200-period simulation for some values of the parameters $(\beta, \gamma)$, starting with an initial infected population of 1%.
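Here's a minimal sketch of such a discretized simulation in Python; the parameter values and the 1% initial infected fraction are from the post, everything else is my guess at the setup:

```python
import numpy as np

def simulate_sir(beta, gamma, periods=200, i0=0.01):
    """Discretized SIR-1 dynamics: returns arrays of S, I, R fractions per period."""
    S, I, R = np.empty(periods), np.empty(periods), np.empty(periods)
    s, i, r = 1.0 - i0, i0, 0.0
    for t in range(periods):
        S[t], I[t], R[t] = s, i, r
        new_infections = beta * s * i    # S-dot = -beta * S * I
        new_recoveries = gamma * i       # R-dot = gamma * I
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
    return S, I, R

# First example from the post: beta = 0.06, gamma = 0.03, so R0 = 2.
S, I, R = simulate_sir(0.06, 0.03)
print("peak infectious fraction:", I.max().round(3))
print("cumulative infected after 200 periods:", (I[-1] + R[-1]).round(3))
```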

First, a model with an $R_0=2$, illustrating the three processes:


Note that although a large percentage of the population is eventually infected (if we run the model for longer, the cumulative infected fraction converges not to 100% but to its final size, roughly 80% for $R_0 = 2$), the number of people infectious at a given time (and presumably also feeling the symptoms of the disease) is much lower, and this is a very important metric, as the number of people sick at a given time determines how effectively health providers can deal with the disease.

Next, a model of runaway epidemic (the $R_0 = 24$ is beyond any epidemic I've known; used here only to make the point in a short 200 periods):


In this case, the number of sick people grows very fast, which makes it difficult for the health system to cope with the disease, plus the absence of the sick people from the workforce leads to second-order problems, including stalled production, insufficient logistics to distribute needed supplies, and lack of services and support for necessary infrastructure.

Finally, a model closer to non-epidemic diseases, like the seasonal flu (as opposed to epidemic flu), though the $(\beta,\gamma)$ are too high for that disease; this was necessary for presentation purposes, in order to make the 200-period chart more than three flat lines.


Note how low the number of people infected at any given time is, which is why these things tend to die off instead of growing into epidemics: once people start taking precautions, $\beta$ becomes smaller than $\gamma$, which leads to $R_0 < 1$, the condition for the disease to eventually die off.


The problem with estimating ad-hoc models


One of the problems with ignoring the elements of these epidemiological models and calibrating statistical models on early data can be seen when we take the first example above ($\beta=0.06,\gamma=0.03$) and use the first 50 data points to calibrate a statistical model for forecasting the evolution of the epidemic:


As a general rule of thumb, models for processes that follow an S-shaped curve are extremely difficult to calibrate on early data; any data set that doesn't extend at least some periods into the concave region of the model is going to be of questionable value, especially if there are errors in measurement (as is always the case).
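To make that concrete, here's a sketch that reuses simulate_sir from the sketch above, fits a plain exponential-growth model to the first 50 periods of cumulative infections (the choice of an exponential fit is mine; the post doesn't say which statistical model was calibrated), and extrapolates:

```python
# Reusing simulate_sir and numpy from the sketch above (beta = 0.06, gamma = 0.03).
S, I, R = simulate_sir(0.06, 0.03)
cumulative = I + R
t = np.arange(len(cumulative))

# Fit log(cumulative) ~ a + b*t on the first 50 periods, i.e. pure exponential growth.
b, a = np.polyfit(t[:50], np.log(cumulative[:50]), 1)
forecast = np.exp(a + b * t)

print("true cumulative fraction at the last period:", cumulative[-1].round(3))
print("extrapolated 'fraction' at the last period:", forecast[-1].round(1))  # far above 1.0
```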

Consider that the failure of that estimation happens for the simplest model (SIR-1), without the complexities of topology (multiple populations in different locations, each with a $(\beta,\gamma)$ of its own, connected by a transportation network with different levels of quarantine and preventative measures, etc.), possible obfuscation of some data due to political concerns, misdiagnosis and under-reporting due to latency, changes to $\beta$ and $\gamma$ as people's behavior and health services adapt, and the many other complications of a real-world epidemic, including second-order effects on health services and essential infrastructure, which change people's behavior as well.

No, that forecasting error comes simply from that rule of thumb, that until the process passes the inflection point, it's almost certain that estimates based on aggregate numbers (as opposed to clinical measures of $\beta$ and $\gamma$, based on analysis of clinical cases; these are what epidemiologists use, by the way) will give nonsensical predictions.

But those nonsensical predictions get retweets, YouTube video views, and SuperChat money.

Saturday, February 8, 2020

Fun with numbers for February 8, 2020

Some collected twitterage and other nerditude from the interwebs.

Converting California to EVs: we're going to need a bigger boat grid


I like how silent electric vehicles are, but if California is to convert a significant fraction of its fossil-fuel cars to electric (50-80%), its grid will need to deliver 11-18% more energy (we already import around one third of our electricity, and our grid is not exactly underutilized).




Playing around with diffusion models to avoid thinking about coronavirus


Playing around with some diffusion models of infection, not really sophisticated enough to deal with the topological complexities of coronavirus given air travel but better than people who believe you get that virus from drinking too much Corona beer… 🤯




Better choose winners of the past or the new thing?


Based on the following tweet by TJIC, author of Prometheus Award winning hard scifi books (first, second) about homesteading the Moon, with uplifted (genetically engineered, intelligent) Dogs and sentient AI,


I decided to create a simple model and just run with it. For laughs only.

We need some sort of metric of quality, $x$, and we'll assume that since people can stop reading a novel if it's too bad, $x \ge 0$. We also know Sturgeon's law, that 90% of everything is dross, so we'll need a distribution with most of its mass at low quality and a long tail toward high quality. For now we're okay with the exponential distribution $f_X(x) = \lambda \exp(-\lambda x)$, and we'll go with $\lambda = 1$ to start.

Instead of changing the average quality of the novels for different years, we'll change the sample size from which the winners are chosen; what we're interested in is, therefore, $M(x)_N = E\left[\max\{x_1,\ldots,x_N\}\right]$ for different $N$, the number of novels. For $N$ ranging from 10 to 100,000, we can use a simple simulation to find those $M(x)_N$:
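(A sketch of such a simulation, not the post's original code. As a sanity check, the expected maximum of $N$ i.i.d. Exp(1) draws is known in closed form: it's the harmonic number $H_N \approx \ln N + 0.577$.)

```python
import numpy as np

rng = np.random.default_rng(42)

def expected_max(n, trials=2000):
    """Monte Carlo estimate of E[max of n i.i.d. Exp(1) draws]."""
    return np.mean([rng.exponential(1.0, n).max() for _ in range(trials)])

for n in (10, 100, 1_000, 10_000, 100_000):
    h_n = np.sum(1.0 / np.arange(1, n + 1))   # harmonic number H_n = exact expectation
    print(f"N={n:>6}: simulated {expected_max(n):.3f}, analytic {h_n:.3f}")
```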


The results are

$M(x)_{10} = 2.899432$
$M(x)_{100} = 5.230011$
$M(x)_{1000} = 7.512119$
$M(x)_{10000} = 9.750539$
$M(x)_{100000} = 12.122326$

Let's say there are between 100 and 1000 scifi novels worthy of that name in any given year of the last 100 years. So, unless the new novels have on average between 5.2 and 7.5 times the average quality of those in the previous 100 years, one is better off picking a winner at random from those 100 years than a random new novel.

(Yes, there's a lot of nonsense in this model, but the idea is just to show that when most of the mass is at low quality with a long tail of rare gems, which is what Sturgeon's law implies --- and this distribution isn't even that skewed --- randomly picking past winners is a better choice than randomly picking new novels even if the average quality improved a bit relative to the past.)



No numbers, just Bay Area seamanship





Live long and prosper.