Sunday, May 22, 2016

Nerds/geeks as an example of aggregation problem - A rant.

Aggregation problems come from the loss of information and detail from data reduction procedures. It applies to classifications of humans as well. The problem, that is.

Hi. My name is José, and I'm a geek.*

The classification of someone as a nerd/geek (used interchangeably throughout this post) has been on my mind recently. For clarity, I mean the old-style STEM-geek, not the new-agey "anything"-geek like food-geek or exercise-geek.

Here are a few things that get put under geekdom:

1. Playing video games

2. Reading comic books. I mean, graphic novels.

3. Watching science fiction television shows and movies.

(Huh, these read as "basic entertainment," with a notable lack of physicality. So far I score basically a zero. I watch some SciFi television and movies, but as a fraction of my [already minimal] media consumption, they are negligible.)

4. Reading science fiction books.

5. Solving logic puzzles.

6. Learning STEM for ludic purposes (as opposed to for school)

7. Watching science and engineering documentaries (including YouTube).

(Well, now I'm batting 1.000 on these last four.)

8. Applying math, engineering, and science to everyday problems.

9. Having a home lab, building mechanical, electrical, or electronic devices, programming computers (no, not just "using" computers), basically being an amateur scientist or engineer.

10. Choosing a career in STEM.

(Again, 1.000 in the last three.)

My point here is that 1--3, the most popular "geek" activities, represent a choice of entertainment that is mostly non-physical but has minimal intellectual involvement (SPARE ME YOUR BELLYACHING, GAMERS, I HAVE READ MOLECULAR BIOLOGY OF THE GENE FOR FUN, COMPARE THAT WITH YOUR PEW-PEW-BOOM), 4--7 represent more intellectually challenging choices of entertainment, and 8--10 are essentially having the mindset that leads to a career in STEM.

So, what's the point, really? Bragging?


There's a lot of buzz around the "rise of nerds" or some such idea, basically that because of the importance of technology and the enrichment of some entrepreneurial nerds, society is moving towards a more accepting attitude towards nerds.

That might be true, but the evidence I see for this is almost always from the rise of activities in the 1--3 points above. And that isn't what makes for a real societal change (except in the undesirable consequences of having young people who avoid physical exertion, something that I never did, mens sana in corpore sano and all that...)

People who really like machines (like I do), will easily spend hours watching Chris Boden "autopsy" equipment (someone give Mr. Boden a sandwich or twenty, please):

People who really like the science in "science fiction" will argue about different parts of the movies than audience members who are there for the spectacle or for the back story of the characters:

(If I were ill-tempered, I'd invite readers to compare those reviews, by an actual engineer who works in space exploration, with the generic comments by a science popularizer that plays a scientist in the media but who's an administrator in a museum in NYC. Just for comparison, before Carl Sagan became a popularizer with Cosmos, he had a long list of academic publications. Unlike this popularizer, who made a fool of himself trying to fill Sagan's shoes. But I won't name him, because that would be mean.)

Hardcore nerds might even spend some quality time with a textbook or two, learning new stuff in their middle-age, or taking in a lecture from a little technical school in Massachusetts, free!

I'm not saying that one thing is better than another, just that they're different. That science books that appeal to "mass market geeks" (1--3) are going to focus on different matters (events, people) than science books for the "hardcore geeks" (the STEM, the logic), but with this confusion between the first (and larger) group and the second (smaller but more dedicated) group, many popularization channels for STEM are becoming more like mass entertainment and losing their focus.

(Ranting? You bet. On top of my full-time quant job, I agreed to teach an MBA class, and even though I'm no longer an academic I still get refereeing requests; so I'm tired, I'm over-caffeinated, I'm fed up with people who "love science" like they "loved Armani" in the 1990s, and people who think that a biography of a physicist is a science book. And don't get me started on the people who try to crowbar the Arts into STEM to create STEAM, a steaming pile of... but I digress.)

It's nice that geeks are more accepted by society in general, but it's important to feed the hunger for knowledge of the "hardcore geeks" not just pander to the "mass market geeks."

(Seriously, I've worked about 70 hours since last Monday and now there's an unpaid referee report that I have to write. I'd much rather watch a few more equipment autopsies.)

-- -- -- --
* Also a powerlifter, so watch your mouth. :-)

Saturday, May 21, 2016

Groups who facilitate or benefit from technological illiteracy

Based on a comment I made on this post by Dystopian Science Fiction author Davis Aurini (author of As I Walk These Broken Roads).

Technological illiteracy isn’t a random occurrence; without starting conspiracy theories, there are quite a few identifiable groups of people who participate in and benefit from this state of affairs. Just off the top of my head:

“True believers” really think that if they concentrate enough on the whiteness and maleness of Isaac Newton they won’t die when they fall off a cliff, as long as they self-identify as something other than white and male. Well, maybe not gravity, but there are true believers in a variety of nonsense who think that just because something is virtuous (say Solar Power), it must be immune from the laws of Physics and Economics.

“Dunning-Krugers” consider that anything that isn’t their part of the job, like say engineering and manufacturing, is a trivial point that can be solved in an afternoon, while their part, say choosing the color for the packaging, is a key success factor and must be the most important part of the project. (See: Fontus Water Bottle.)

“People who love science” (as long as they don’t have to learn any) are always looking for ways to virtue signal their love of science, so anything technological which allows them to pretend they’re at the forefront of technology will be eagerly embraced. More so if the right celebrities, especially sciencey celebrities, are behind it. (See: Solar Roadways.)

“Geek-haters” did poorly in science and math class, and they hate the people who actually understand STEM, so when they see a popular product that only geeks complain about they take the opportunity to attack those who did well in STEM. In other words, they see these nonsense products as opportunities for creating friction between the geeks and the general population. (See: Triton Artificial Gills.)

“Early Outs” understand that the product is made of vaporware, hype, and fraud, but they also know that before that’s exposed their share of the company will be sold to the next level of investors so they’ll make a fortune and have no liability, as they will have all sorts of CYA written into the bylaws of the company. (See: Solyndra.)

“Banksters” know how to pass any losses they might have from buying later into the company to their clients or to taxpayers, so they don’t care about the long-term feasibility of any company as long as they get their fees and carry-over trade gains. (See: Dot-com bubble of 2000.)

Probably a few more. Certainly “patient enemies of our nations” would be a possibility, but that would be conspiratorial now…

Monday, May 9, 2016

Service notice

As I'm preparing to teach a class on top of my day job, and there are books I want to read and online courses I want to take, posting will be light for the month of May.

Sunday, April 17, 2016

Tax day

In memory of income fallen to the revenuers, there will be no posts in April.

Wednesday, March 30, 2016

Three cardinal sins of presenting

Observations from yet another terrible talk.

(To protect the guilty, the presenter will be called "Epic," short for "Epic Fail II," and without loss of generality will be referred to with masculine pronouns.)

Epic committed three cardinal sins of presentations (there are more than three and some of the others were present in the terrible talk), in increasing order of badness:

The sin of humming: 

"Hum... like... basically..." were Epic's most common words. Or sounds, more precisely, because that's what they are. Sounds that Epic made as his brain composed the sentence that was to come.

This is the main problem of using slides-as-presenter-notes, though it also happens to presenters who have separate "talk skeleton" notes and don't rehearse a few times: bullet points aren't feasible out-loud sentences, so, to unprepared presenters, they act as stumbling blocks rather than helpful hints.

Some people are very articulate; some can be articulate from notes; most of the others need to do at least one run-through of the notes, preferably to camera so they can review it. The camera is essential, as without feedback there's little improvement.

Humming is a sign the presenter didn't care enough for the audience to rehearse his presentation.

The sin of non-preparedness:

Like most presenters, Epic seems to have created his presentation in a small fraction of the presentation time. That's usually a recipe for disaster. While some people can make good presentations impromptu or quasi-impromptu, most presenters should prepare carefully.

Epic's presentation had no clear objectives, no clear structure, and above all, no clear arguments. For comparison, there was another presenter at the conference who, in order to explain a programming philosophy, created a motivating example based on refactoring a cookbook.

The procedure for preparing isn't complicated: decide what the presentation objectives are; decide how they sequence into each other; devise ways to explain these objectives; assemble the presentation; rehearse.

Epic skipped all these stages, except the assembling of the presentation as a sequence of presenter-notes-on-slides, but without actually thinking much about each point. Epic didn't think about the phrasing of the points (see previous sin), let alone consider how to best explain them to the audience.

Good presentations begin in the preparation; bad presentations in the lack of it.

The sin of self-absorption:

The audience was promised, and therefore expected, a technical talk about a technical tool. Epic delivered a presentation about Epic: Epic's education (really, a CV slide and multiple name-drops to Epic's school, Epic's degree, Epic's degree advisor); Epic's actions ("I did this," "I found that" not "data show" or "tool does this"); Epic's performance on Epic's job (via repeated references to a sort of limited field contests/competitions, to which the audience groan was the only appropriate answer).

Two other presenters in the same session described highly technical tools, barely ever using the first person, talking about the tools, offering interesting if technically challenging knowledge. That's because, unlike Epic, they understood that the audience wasn't there to learn about the presenters' lives, but rather about the tools.

Epic, like many terrible presenters, bought into the idea that every presentation has to be a story (more or less right, even for a technical audience) about the presenter (absolutely wrong, unless you're presenting an autobiography).

Audiences don't like bait-and-switch: deliver what was promised, not what you like.

Many talks are bad, and that's a choice made by the presenter.

Saturday, March 12, 2016

Read before writing

A quick refresher this morning before tackling a writing task in the afternoon.

A quick read of my notes on these two books always helps focus my attention for any writing task.

I make a point of re-reading Zinsser's book in its entirety at least once a year. It takes but a couple of hours, best 'writing skills preventative maintenance' I can think of. It's also worth re-reading my notes prior to any major writing task, which is why I'm doing it today. I think of it as 'pre-flighting my writing skills'.

Before any major writing task, I go over Strunk & White's rules so that they're fresh in my mind as I write. That helps cut down on editing time later.

-- -- -- --

For the terminally lazy: Amazon links to On Writing Well and The Elements Of Style. (I would make them affiliate links, but I too am lazy.)

Saturday, March 5, 2016

Powerlifters vs Gym Rats - A tale of two means

In my last post I wrote:

For example, some time ago I had a discussion with a friend about strength training. The gist of it was that powerlifters are typically much stronger than the average athlete, but they are also much fewer; because of that, in a typical gym the strongest athlete might not be a powerlifter, but as we get into regional competitions and national competitions, the winner is going to be a powerlifter.

And the explanation, which the friend didn't understand, was "because on the upper tail the difference between means is going to dominate the difference in sizes of the population."

So here's an illustration of what I meant, with pictures and numbers and bad jokes.

First let's make the setup explicit. That's the great power of math and numerical examples, making things explicit. "Powerlifters are typically much stronger than the average athlete" will be operationalized with four assumptions:
A1: There's some composite metric of strength that we care about, call it $S$, and we'll normalize it so that the population of gym rats has a mean $\mu(S_{\mathrm{GR}})$ of zero and a variance of $1$. 
A2: The distribution of strength within the population of gym rats is Normally distributed. 
A3: The distribution of strength in the sub-population of powerlifters is also Normally distributed. 
A4: For illustration purposes only, we will assume that powerlifters have a mean $\mu(S_{\mathrm{PL}})$ of 2 and the same variance as the rest of the gym rats.
We operationalize "they are also much fewer" with
A5: For illustration, the number of powerlifters is $1\%$ of gym rats.
(Powerlifters are gym rats, so the distribution for $S_{\mathrm{GR}}$ includes these $1\%$, balanced by CrossFit people, who bring down the mean strength and IQ in the gym while raising the insurance premiums. Watch Elgintensity to understand.)

The following figure shows the distributions:

When we look at the people in a gym with above-average strength, that is people with $S_{\mathrm{GR}}>0$, we find that one-half of all gym rats have that, and $98\%$ of all powerlifters have that: $\Pr(S_{\mathrm{GR}}>0) = 0.5$ and $\Pr(S_{\mathrm{PL}}>0) = 0.98$. This is illustrated in the next figure:

Powerlifters are over-represented in the above-average strength group, approximately twice as much as in the general population, but they are only about $2\%$ of that group, as their two-fold over-representation is multiplied by their $1\%$ share of gym rats.

As we become more selective, the over-representation goes up. For athletes that are at least one standard deviation above the mean, we have:

with $\Pr(S_{\mathrm{GR}}>1) = 0.16$ and $\Pr(S_{\mathrm{PL}}>1) = 0.84$. Powerlifters are over-represented 5-fold, so about $5\%$ of the total athletes in this category.

When we become more and more selective, for example when we compute the number of gym rats that have at least as much strength as the average powerlifter, $\Pr(S_{\mathrm{GR}}>2)$, we get

with $\Pr(S_{\mathrm{GR}}>2) = 0.023$ and $\Pr(S_{\mathrm{PL}}>2) = 0.5$, a 22-fold over-representation, meaning that of every six athletes in this category, one is a powerlifter. (Yes, one out of six, not one out of five. See if you can figure out why; if not, look at the solution for $S>6$ below and you'll understand. Or not, but that's a different problem.)

And as we look at subsets of stronger and stronger athletes, the over-representation of powerlifters becomes higher and higher: $\Pr(S_{\mathrm{GR}}>3) = 0.00135$ and $\Pr(S_{\mathrm{PL}}>3) = 0.159$, a $118$-fold ratio. There will be a few more powerlifters in this group than other gym rats; another way to say that is that powerlifters will be a little bit more than one-half of all gym rats that are at least one standard deviation stronger than the average powerlifter.

The ratios grow exponentially with increasing values for strength (the rare correct use of "exponentially" as they are ratios of Normal distribution tail probabilities; see below).

For $S>4$ the ratio is $718$, for $S>5$ the ratio is $4700$, and for $S>6$ the ratio is $32\,100$; in other words, after multiplying by the $1\%$ share, there will be one non-powerlifter per group of $322$ gym rats with strength greater than 6 standard deviations above the mean of all gym rats.
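For anyone who wants to check these numbers, here is a minimal Python sketch of the tail calculations under assumptions A1 to A5. The function names (`upper_tail`, `tail_summary`) are mine, not part of any library, and the accounting follows the one used in this post: the mean-zero distribution stands in for the non-powerlifters, and powerlifters are weighted by their $1\%$ share.

```python
from math import erfc, sqrt

PL_MEAN = 2.0    # A4: assumed mean strength of powerlifters
PL_SHARE = 0.01  # A5: assumed number of powerlifters relative to gym rats


def upper_tail(x, mean=0.0, sd=1.0):
    """Pr(S > x) for a Normal(mean, sd^2) variable, via the complementary error function."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2.0)))


def tail_summary(threshold):
    """Return (Pr(S_GR > t), Pr(S_PL > t), over-representation ratio, powerlifter share)."""
    p_gr = upper_tail(threshold)                # mean 0, variance 1 (A1, A2)
    p_pl = upper_tail(threshold, mean=PL_MEAN)  # mean 2, variance 1 (A3, A4)
    ratio = p_pl / p_gr
    # Powerlifters per non-powerlifter above the threshold, then their share of that group.
    pl_per_other = PL_SHARE * ratio
    share = pl_per_other / (1.0 + pl_per_other)
    return p_gr, p_pl, ratio, share


if __name__ == "__main__":
    print("S >   Pr(GR)       Pr(PL)       ratio        PL share")
    for t in range(7):
        p_gr, p_pl, ratio, share = tail_summary(float(t))
        print(f"{t:<5d} {p_gr:<12.3g} {p_pl:<12.3g} {ratio:<12.4g} {share:.3f}")
```

The share column is where the "one out of six, not one out of five" comes from: a ratio of 22 times a $1\%$ share gives about 0.22 powerlifters per non-powerlifter, and 0.22 out of 1.22 is a bit less than one-fifth.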

This is what the effect of the differences in the tails of Normals always implies: eventually the small size of the better population (powerlifters) will be irrelevant as the higher mean will dominate.

See? That wasn't complicated at all.

-- -- -- --

For the mathematically inclined (strangely themselves over-represented in the set of powerlifters...)

Note that the ratio of probability density functions for the two Normal distributions in the post, for realizations of strength $S = x$ is
$$\frac{f_{S}(x|\mu_{S}=2)}{f_{S}(x|\mu_{S}=0)}= \frac{e^{-(x-2)^2/2}}{e^{-x^2/2}}= e^{2x-2}$$
which grows unbounded with $x$; no matter how small the fraction of powerlifters, say $\epsilon$, there's always a minimal $\bar S$ beyond which that ratio becomes greater than $1/\epsilon$. Which means that at some point above $\bar S$ the ratio of the remaining tail itself becomes greater than $1/\epsilon$. (It's very easy to calculate $\bar S$ and I have done so; I'll leave it as an exercise for the dedicated reader...)

Oh, that's the rare occurrence of the correct use of "exponentially," which is usually incorrectly treated as a synonym for "convex."
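For readers who would rather check the exercise than do it, here is one way to get $\bar S$ from the density ratio $e^{2x-2}$, under the post's assumptions (means of 0 and 2, unit variance). The value $\epsilon = 0.01$ is borrowed from A5 for illustration; treat this as a sketch rather than the official solution.

$$e^{2\bar S-2} = \frac{1}{\epsilon} \quad\Longrightarrow\quad \bar S = 1 + \frac{1}{2}\ln\frac{1}{\epsilon}, \qquad\text{so for } \epsilon = 0.01:\quad \bar S = 1 + \frac{1}{2}\ln 100 \approx 3.30$$

Since the density ratio is increasing in $x$, the ratio of tail probabilities at any point is at least the density ratio at that point, which is consistent with the tail ratio already exceeding $100$ at $S>3$.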