## Wednesday, December 21, 2011

### Powerful problems with power law estimation papers

Perhaps I shouldn't try to make resolutions: I resolved to blog book notes till the end of the year, and instead I'm writing something about estimation.

A power law is a relationship of the form $y = \gamma_0 x^{\gamma_1}$ and can be linearized for estimation using OLS (with a very stretchy assumption on stochastic disturbances, but let's not quibble) into

$\log(y) = \beta_0 + \beta_1 \log(x) +\epsilon$,

from which the original parameters can be trivially recovered:

$\hat\gamma_0 = \exp(\hat\beta_0)$ and $\hat\gamma_1 = \hat\beta_1$.

Power laws are plentiful in Nature, especially when one includes the degree distribution of social networks in a – generous and uncommon, I admit it – definition of Nature. An usually proposed source of power law degree distribution is preferential attachment in network formation: the probability of a new node $i$ being connected to an old node $j$ is an increasing function of the degree of $j$.

The problem with power laws in the wild is that they are really hard to estimate precisely, and I got very annoyed at the glibness of some articles, which report estimation of power laws in highly dequantized manner: they don't actually show the estimates or their descriptive statistics, only charts with no error bars.

Here's my problem: it's well-known that even small stochastic disturbances can make parameter identification in power law data very difficult. And yet, that is never mentioned in those papers. This omission, coupled with the lack of actual estimates and their descriptive statistics, is unforgivable. And suspicious.

Perhaps this needs a couple of numerical examples to clarify; as they say at the end of each season of television shows now:

– To be continued –

## Tuesday, December 20, 2011

### Marginalia: Writing in one's books

I've done it for a long time now, shocking behavior though it is to some of my family and friends.

WHY I make notes

Some of my family members and friends are shocked that I write in my books. The reasons to keep the books in pristine condition vary from maintaining resale value (not an issue for me, as I don't think of books as transient presences in my life) to keeping the integrity of the author's work. Obviously, if I had a first edition of Newton's Principia, I wouldn't write on in; the books I write on are workaday copies, many of them cheap paperbacks or technical books.

The reason why I makes notes is threefold:

To better understand the book as I read it. Actively reading a book, especially a non-fiction or work book, is essentially a dialog between the book and the knowledge I can access, both in my mind and in outside references. Deciding what is important enough to highlight and what points deserve further elaboration in the form of commentary or an example that I furnish, makes reading a much more immersive experience than simply processing the words.

To collect my ideas from several readings (I read many books more than once) into a place where they are not lost. Sometimes points from a previous reading are more clarifying to me than the text itself, sometimes I disagree vehemently with what I wrote before.

To refer to later when I need to find something in the book. This is particularly important in books that I read for work, in particular for technical books where many of the details have been left out (for space reasons) but I added notes that fill those in for the parts I care about.

WHAT types of notes I make

In an earlier post about marginalia on my personal blog I included this image (click for bigger),

showing some notes I made while reading the book Living With Complexity, by Donald Norman. These notes fell into six cases:

Summaries of the arguments in text. Often texts will take long circuitous routes to get to the point. (Norman's book is not one of these.) I tend to write quick summaries, usually in implication form like the one above, that cut down the entropy.

My examples to complement the text. Sometimes I happen to know better examples, or examples that I prefer, than those in the book; in that case I tend to note them in the book so that the example is always connected to the context in which I thought of it. This is particularly useful in work books (and papers, of course) when I turn them into teaching or executive education materials.

Comparisons with external materials. In this case I make a note to compare Norman's point about default choices with the problems Facebook faced in similar matters regarding its privacy.

Notable passages. Marking funny passages with smiley faces and surprising passages with an exclamation point helps find these when browsing the book quickly. Occasionally I also mark passages for style or felicitous turn of phrase, typically with "nice!" on the margin.

Personal commentary. Sometimes the text provokes some reaction that I think is work recording in the book. I don't write review-like commentary in books as a general rule, but I might note something about missing or hidden assumptions, innumeracy, biases, statistical issues; I might also comment positively on an idea, for example, that I had never thought of except for the text.

Quotable passages. These are self-explanatory and particularly easy to make on eBooks. Here's one from George Orwell's Homage To Catalonia:
The constant come-and-go of troops had reduced the village to a state of unspeakable filth. It did not possess and never had possessed such a thing as a lavatory or a drain of any kind, and there was not a square yard anywhere where you could tread without watching your step. (Chapter 2.)

A few other types of marginalia that I have used in other books:

Proofs and analysis to complement what's in the text. As an example, in a PNAS paper on predictions based on search, the authors call $\log(y) = \beta_0 + \beta_1 \log(x)$ a linear model, with the logarithms used to account for the skewness of the variables. I inserted a note that this is clearly a power law relationship, not a linear relationship, with the two steps of algebra that show $y = e^{\beta_0} \times x^{\beta_1}$, in case I happen to be distracted when I reread this paper and can't think through the baby math.

Adding missing references or checking the references (which sometime are incorrect, in which case I correct them). Yep, I'm an academic nerd at heart; but these are important, like a chain of custody for evidence or the provenance records for a work of art.

Diagrams clarifying complicated points. I do this in part because I like visual thinking and in part because if I ever need to present the material to an audience I'll have a starting point for visual support design.

Data that complements the text. Sometimes the text is dequantized and refers to a story for which data is available. I find that adding the data to the story helps me get a better perspective and also if I ever want to use the story I'll have the data there to make a better case.

Counter-arguments. Sometimes I disagree with the text, or at least with the lack of feasible counter-arguments (even when I agree with a position I don't like that the author presents the opposing points of view only in strawman form), so I write the counter-arguments in order to remind me that they exist and the presentation in the text doesn't do them justice.

Markers for things that I want to get. For example, while reading Ted Gioia's The History of Jazz, I marked several recordings that he mentions for acquisition; when reading technical papers I tend to mark the references I want to check; when reading reviews I tend to add things to wishlists (though I also prune these wishlists often).

HOW to make notes

A few practical points for writing marginalia:

Highlighters are not good for long-term notes. They either darken significantly, making it hard to read the highlighted text, or they fade, losing the highlight. I prefer underlining with a high contrast color for short sentences or segments or marking beginning and end of passages on the margin.

Margins are not the only place. I add free-standing inserts, usually in the form of large Post-Its or pieces of paper. Important management tip: write the page number the note refers to on the note.

Transcribing important notes to a searchable format (a text file on my laptop) makes it easy to find stuff later. This is one of the advantages of eBooks of the various types (Kindle, iBook, O'Reilly PDFs), making it easy to search notes and highlights.

Keeping a commonplace book of felicitous turns of phrase (the ones in the books and the ones I come up with) either in a file or on an old-style paper journal helps me become a better writer.

-- -- -- --

Note: This blog may become a little more varied in topics as I decided to write posts more often to practice writing for a general audience. After all, the best way to become a better writer is to write and let others see it. (No comments on the blog, but plenty of ones by email from people I know.)

## Monday, December 12, 2011

### How many possible topologies can a N-node network have?

Short answer, for an undirected network: $2^{N(N-1)/2}$.

Essentially the number of edges is $N(N-1)/2$ so the number of possible topologies is two raised to the number of edges, capturing every possible case where an edge can either be present or absent. For a directed network the number of edges is twice that of those in an undirected network so the number of possible topologies is the square (or just remove the $/2$ part from the formula above).

To show how quickly things get out of control, here are some numbers:

$N=1 \Rightarrow 1$ topology
$N=2 \Rightarrow 2$ topologies
$N=3 \Rightarrow 8$ topologies
$N=4 \Rightarrow 64$ topologies
$N=5 \Rightarrow 1024$ topologies
$N=6 \Rightarrow 32,768$ topologies
$N=7 \Rightarrow 2,097,152$ topologies
$N=8 \Rightarrow 268,435,456$ topologies
$N=9 \Rightarrow 68,719,476,736$ topologies
$N=10 \Rightarrow 35,184,372,088,832$ topologies
$N=20 \Rightarrow 1.5693 \times 10^{57}$ topologies
$N=30 \Rightarrow 8.8725 \times 10^{130}$ topologies
$N=40 \Rightarrow 6.3591 \times 10^{234}$ topologies
$N=50 \Rightarrow 5.7776 \times 10^{368}$ topologies

This is the reason why any serious analysis of a network requires the use of mathematical modeling and computer processing: our human brains are not equipped to deal with this kind of exploding complexity.

And for the visual learners, here's a graph denoting the pointlessness of trying to grasp network topologies "by hand" (note logarithmic vertical scale):

## Saturday, December 10, 2011

### Why analytics practitioners need to worry about "abstruse" statistical/econometric issues

Because these issues are plentiful in data!

Let's say a computer manufacturer has a web site where its customers can configure laptops. The manufacturer notes that a lot of people don't complete the process – some just look at the page, some configure bits and pieces, some go all the way to the end but don't buy, and some configure and buy a laptop – but it gets a lot of data nonetheless.

Analysts salivate over this big data and the multiple measures and the available level of disaggregation and use machine-learning tools to find patterns. But sometimes they fail to check for basic data problems. Here are some customer-behavior-related sources of data problems:

1. Customers configuring laptops are likely to each have a budget, even if it's only a mental one. This makes their choices of variable values in their laptop configurations, say X memory and Y processor speed, interdependent via an unobserved variable. (When they configure the laptop their choices of these dimensions are driven partially by the budget but that budget is not observed by the analysts.) This will create collinearities in the right-hand side variables of the data that would be detected by traditional statistical tools (like factor or principal component analysis, or more simply the non-significant coefficients in a choice model estimation) but are obscured by some machine learning algorithms.

2. Many of the dependent measures used, other than choice-related ones, like customer time-on-page, number of options clicked, number of pages seen, number of menus browsed, number of re-entries into the same page, Facebook likes, page tweets, etc. are highly collinear as well. Often these measures are presented as independent and corroborating measures of interest. This is misleading: they are measures of the same thing using different proxies. (This can be identified with factor or principal component analysis; if two variables are really independent measures of interest – which would be necessary for them to be corroborating – then PCA or FA would separate them as such, given enough data.)

3. Different customers will have different degrees of expertise about laptops, so their choices are likely to have different degrees of idiosyncratic error, which becomes an individual variance (different from other individuals') for their stochastic disturbances. In other words, the data is probably plagued with heteroskedasticity. That's not a big problem per se since it's easily corrected in estimation, but it becomes a problem when, on the rare occasion that standard errors of estimates are shown, the analysts fail to use robust standard error estimation.

4. Often a customer will configure multiple versions of the same laptop to see the price of different feature combinations. This is likely to create serial correlation in the stochastic disturbances: if Bob comes in today after a friend told him how important memory was for a Windows computer, that idiosyncratic error will propagate across all configurations Bob creates today; if tomorrow Bob hears disk space is the key issue, that idiosyncratic error will propagate across all configurations Bob creates tomorrow.

Added Dec 12: Serial correlation is an even more likely problem when analyzing clickstreams, as web links are unidirectional (the reverse motion via "back" button being unobserved in many cases by the clickstream collection system; also not used very often in the middle of long clickstreams), and idiosyncratic factors on one page may drive an long sequence of browsing down one branch of the pages tree rather than another branch. (End of addition.)

5. If Nina configures a laptop from her home computer and another from her work computer, without signing in first, the two configurations will share all of Nina's individual factors but will count as two separate individuals for estimation purpose, giving her individual factors twice the weight in the final estimation. (Also giving the variance of her idiosyncratic stochastic disturbances twice the importance.)

I've been told more than once by people who work in analytics that these are "minor" or "abstruse" statistical points; the people in question "learned" them in a statistics or econometrics class in their far past, but proceeded to forget them, at least operationally, in their careers.  Of course, these "minor" "abstruse" points are the difference between results being informative and being little more than random noise.

I'm pro-analytics, but I want them done correctly.

## Saturday, December 3, 2011

### Why I'm not a fan of "presentation training"

Because there are too many different types of presentation for any sort of abstract training to be effective. So "presentation training" ends up – at best – being "presentation software training."

Learning about information design, writing and general verbal communication, stage management and stage presence, and operation of software and tools used in presentations may help one become a better presenter. But, like in so many technical fields, all of these need some study of the foundations followed by a lot of field- and person-specific practice.

I recommend Edward Tufte's books (and seminar) for information design; Strunk and White's The Elements of Style, James Humes's Speak like Churchill, Stand like Lincoln, and William Zinsser's On Writing Well for verbal communication; and a quick read of the manual followed by exploration of the presentation software one uses. I have no recommendations regarding stage management and stage presence short of joining a theatre group, which is perhaps too much of a commitment for most presenters.

I have already written pretty much all I think about presentation preparation; the present post is about my dislike of "presentation training." To be clear, this is not about preparation for teaching or training to be an instructor. These, being specialized skills – and typically field-specific skills – are a different case.

Problem 1: Generic presentation training is unlikely to help any but the most incompetent of presenters

Since an effective presentation is one designed for its objective, within the norms of its field, targeted to its specific audience, and using the technical knowledge of its field, what use is it to learn generic rules, beyond the minimum of information design, clarity in verbal expression, and stage presence?

(My understanding from people who have attended presentation training is that there was little about information design, nothing about verbal expression, and just platitudes about stage presence.)

For someone who knows nothing about presentations and learns the basics of operating the software, presentation training may be of some use. I think Tufte made this argument: the great presenters won't be goaded into becoming "death by powerpoint" presenters just because they use the software; the terrible presenters will be forced to come up with some talking points, which may help their presentations be less disastrous. But the rest will become worse presenters by focussing on the software and some hackneyed rules – instead of the content of and the audience for the presentation.

Problem 2: Presentation trainers tend to be clueless about the needs of technical presentations

Or, the Norman Critique of the Tufte Table Argument, writ large.

The argument (which I wrote as point 1 in this post) is essentially that looking at a table, a formula, or a diagram as a presentation object – understanding its aesthetics, its information design, its use of color and type – is very different from looking at a table to make sense of the numbers therein, understand the implications of a formula to a mathematical or chemical model, and interpret the implications of the diagram for its field.

Tufte, in his attack on Powerpoint, talks about a table but focusses on its design, not how the numbers would be used, which is what prompted Donald Norman to write his critique; but, of all the people who could be said to be involved in presentation training, Tufte is actually the strongest advocate for content.

The fact remains that there's a very big difference between technical material which is used as a prop to illustrate some presentation device or technique to an audience which is mostly outside the technical field of the material and the same material being used to make a technical point to an audience of the appropriate technical field.

Presentation training, being generic, cannot give specific rules for a given field; but those rules are actually useful to anyone in the field who has questions about how to present something.

Problem 3: Presentation training actions are typically presentations (lectures), which is not an effective way to teach technical material

The best way to teach technical material is to have the students prepare by reading the foundations (or watching video on their own, allowing them to pace the delivery by their own learning speed) and preparing for a discussion or exercise applying what they learned.

This is called participant-centered learning; it's the way people learn technical material. Even in lecture courses the actual learning only happens when the students practice the material.

Almost all presentation training is done in lecture form, delivered as a presentation from the instructor with question-and-answer periods for the audience. But since the audience doesn't actually practice the material in the lecture, they may have only questions of clarification. The real questions that appear during actual practice don't come up during a lecture, and those are the questions that really need an answer.

Problem 4: Most presentation training is too narrowly bracketed

Because it's generic, presentation training misses the point of making a presentation to begin with.

After all, presentations aren't made in a vacuum: there's a purpose to the presentation (say, report market research to decision-makers), an audience with specific needs (product designers who need to understand the parameters of the consumer choice so they can tweak the product line), supporting material that may be used for further reference (a written report with the details of the research), action items and metrics for those items (follow-up research and a schedule of deliverables and budget), and other elements that depend on the presentation.

There's also the culture of the organization which hosts the presentation, disclosure and privacy issues, reliability of sources, and a host of matters apparently unrelated to a presentation that determine its success a lot more than the design of the slides.

In fact, the use of slides, or the idea of a speaker talking to an audience, is itself a constraint on the type of presentations the training is focussed on. And that trains people to think of a presentation as a lecture-style presentation. Many presentations are interactive, perhaps with the "presenter" taking the position of moderator or arbitrator; some presentations are made in roundtable fashion, as a discussion where the main presenter is one of many voices.

Some time ago, I summarized a broader view of a specific type of presentation event (data scientists presenting results to managers) in this diagram, illustrating why and how I thought data scientists should take more care with presentation design (click for larger):

(Note that this is specific advice for people making presentations based on data analysis to managers or decision-makers that rely on the data analysis for action, but cannot do the analysis themselves. Hence the blue rules on the right to minimize the miscommunication between the people from two different fields. This is what I mean by field-specific presentation training.)

These are four reasons why I don't like generic presentation training. Really it's just one: generic presentation training assumes that content is something secondary, and that assumption is the reason why we see so many bad presentations to begin with.

NOTE: Participant-centered learning is a general term for using the class time for discussion and exercises, not necessarily for the Harvard Case Method, which is one form of participant-centered learning.

Related posts:

Posts on presentations in my personal blog.

Posts on teaching in my personal blog.

Posts on presentations in this blog.

My 3500-word post on preparing presentations.

## Friday, December 2, 2011

### Dilbert gets the Correlation-Causation difference wrong

This was the Dilbert comic strip for Nov. 28, 2011:

It seems to imply that even though there's a correlation between the pointy-haired boss leaving Dilbert's cubicle and receiving an anonymous email about the worst boss in the world, there's no causation.

THAT IS WRONG!

Clearly there's causation: PHB leaves Dilbert's cubicle, which causes Wally to send the anonymous email. PHB's implication that he thinks Dilbert sends the email is wrong, but that doesn't mean that the correlation he noticed isn't in this case created by a causal link between leaving Dilbert's cubicle and getting the email.

I think Edward Tufte once said that the statement "correlation is not causation" was incomplete; at least it should read "correlation is not causation, but it sure hints at some relationship that must be investigated further." Or words to that effect.

## Friday, November 25, 2011

### Online Education and the Dentist vs Personal Trainer Models of Learning

Most of the (traditional) teaching I received was squarely based on what I call the Dentist Model of Education: a [student|patient] goes into the [classroom|dentist's office] and the [instructor|dentist] does something technical to the [student|patient]. Once the professional is done, the [student|patient] goes away and [forgets the lecture|never flosses].

I learned almost nothing from that teaching. Like every other person in a technical field, I learned from studying and solving practice problems. (Rule of thumb: learning is 1% lecture, 9% study, 90% practice problems.)

A better education model, the Personal Trainer Model of Education asserts that, like in fitness training, results come from the [trainee|student] practicing the [movements|materials] himself/herself. The job of the [personal trainer|instructor] is to guide that practice and select [exercises|materials] that are appropriate to the [training|instruction] objectives.

Which is why I'm two-thirds skeptical of the goodness of online education.

Obviously there are advantages to online materials: there's low distribution cost, which allows many people to access high quality materials; there's a culture of sharing educational materials, spearheaded by some of the world's premier education institutions; there are many forums, question and answer sites and – for those willing to pay a small fee – actual online courses with instructors and tests.

Leaving aside the broad accessibility of materials, there's no getting around the 1-9-90 rule for learning. Watching Walter Lewin teaching physics may be entertaining, but  without practicing, by solving problem sets, no one watching will become a physicist.

Consider the plethora of online personal training advice and assume that the aspiring trainee manages to find a trainer who knows what he/she is doing. Would this aspiring trainee get better at her fitness exercises by reading a web site and watching videos of the personal trainer exercising? And yet some people believe that they can learn computer programming by watching online lectures. (Or offline lectures, for that matter.*)

If practice is the key to success, why do so many people recognize the absurdity of the video-watching, gym-avoiding fitness trainee while at the same time assume that online lectures are the solution to technical education woes?

(Well-designed online instruction programs are much more than lectures, of course; but what most people mean by online education is not what I consider well-designed and typically is an implementation of the dentist model of education.)

The second reason why I'm skeptic (hence the two-thirds share of skepticism) is that the education system has a second component, beyond instruction: it certifies skills and knowledge. (We could debate how well it does this, but certification is one of the main functions of education institutions.)

Certification of a specific skill can be done piecemeal but complex technical fields depend on more than a student knowing the individual skills of the field; they require the ability to integrate across different sub-disciplines, to think like a member of the profession, to actually do things. That's why engineering students have engineering projects, medical students actually treat patients, etc. These are part of the certification process, which is very hard to do online or with short in-campus events, even if we remove questions of cheating from the mix.

There's enormous potential in online education, but it can only be realized by accepting that education is not like a visit to the dentist but rather like a training session at the gym. And that real, certified learning requires a lot of interaction between the education provider and the student: not something like the one-way lectures one finds online.

(This is not to say that there aren't some good online education programs, but they tend to be uncommon.)

Just like the best-equipped gym in the world will do nothing for a lazy trainee, the best online education platform in the world will do nothing for an unmotivated student. But a motivated kid with nothing but a barbell & plates can become a competitive powerlifter and a motivated kid a with a textbook will learn more than the hordes who watch online lectures while tweeting and facebooking.

The key success factor is not technology; it's the student. It always is.

ADDENDUM (Nov 27, 2011): I've received some comments to the effect that I'm just defending universities from the disruptive innovation of entrants. Perhaps, but:

Universities have several advantages over new institutions, especially when so many of these new institutions have no understanding of what technical education requires. If there was a new online way to sell hamburgers would it surprise anyone that McDs and BK were better at doing it than people who are great at online selling engineering but who never made an hamburger in their lives?

This is not to say that there isn't [vast] room to improve in both the online and offline offerings of universities. But it takes a massive dose of arrogance to assume that everything that went before (in regards to education) can be ignored because of a low cost of content distribution.

--------
* For those who never learned computer programming: you learn by writing programs and testing them. Many many many programs and many many many tests. A quick study of the basics of the language in question is necessary, but better done individually than in a lecture room. Sometimes the learning process can be jump-started by adapting other people's programs. A surefire way to not learn how to program is to listen to someone else talk about programming.

## Thursday, November 24, 2011

### Data cleaning or cherry-picking?

Sometimes there's a fine line between data cleaning and cherry-picking your data.

My new favorite example of this is based on something Nassim Nicholas Taleb said at a talk at Penn (starting at 32 minutes in): that 92% of all kurtosis for silver in the last 40 years of trading could be traced to a single day; 83% of stock market kurtosis could also be traced to one day in 40 years.

One day in forty years is about 1/14,600 of all data. Such a disproportionate effect  might lead some "outlier hunters" to discard that one data point. After all, there are many data butchers (not scientists if they do this) who create arbitrary rules for outlier detection (say, more than four standard deviations away from the mean) and use them without thinking.

In the NNT case, however, that would be counterproductive: the whole point of measuring kurtosis (or, in his argument, the problem that kurtosis is not measurable in any practical way) is to hedge against risk correctly. Underestimating kurtosis will create ineffective hedges, so disposing of the "outlier" will undermine the whole point of the estimation.

In a recent research project I removed one data point from the analysis, deeming it an outlier. But I didn't do it because it was four standard deviations from the mean alone. I found it because it did show an aggregate behavior that was five standard deviations higher than the mean. Then I examined the disaggregate data and confirmed that this was anomalous behavior: the experimental subject had clicked several times on links and immediately clicked back, not even looking at the linked page. This temporally disaggregate behavior, not the aggregate measure of total clicks, was the reason why I deemed the datum an outlier, and excluded it from analysis.

Data cleaning is an important step in data analysis. We should take care to ensure that it's done correctly.

## Sunday, November 13, 2011

### Vanity Fair bungles probability example

There's an interesting article about Danny Kahneman in Vanity Fair, written by Michael Lewis. Kahneman's book Thinking: Fast And Slow is an interesting review of the state of decision psychology and well worth reading, as it the Vanity Fair article.

But the quiz attached to that article is an example of how not to popularize technical content.

This example, question 2, is wrong:
A team of psychologists performed personality tests on 100 professionals, of which 30 were engineers and 70 were lawyers. Brief descriptions were written for each subject. The following is a sample of one of the resulting descriptions:

Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies, which include home carpentry, sailing, and mathematics.
What is the probability that Jack is one of the 30 engineers?

A. 10–40 percent
B. 40–60 percent
C. 60–80 percent
D. 80–100 percent

If you answered anything but A (the correct response being precisely 30 percent), you have fallen victim to the representativeness heuristic again, despite having just read about it.
No. Most people have knowledge beyond what is in the description; so, starting from the appropriate prior probabilities, $p(law) = 0.7$ and $p(eng) = 0.3$, they update them with the fact that engineers like math more than lawyers, $p(math|eng) >> p(math|law)$. For illustration consider

$p(math|eng) = 0.5$; half the engineers have math as a hobby.
$p(math|law) = 0.001$; one in a thousand lawyers has math as a hobby.

Then the posterior probabilities (once the description is known) are given by
$p(eng|math) = \frac{ p(math|eng) \times p(eng)}{p(math)}$
$p(law|math) = \frac{ p(math|law) \times p(law)}{p(math)}$
with $p(math) = p(math|eng) \times p(eng) + p(math|law) \times p(law)$. In other words, with the conditional probabilities above,
$p(eng|math) = 0.995$
$p(law|math) = 0.005$
Note that even if engineers as a rule don't like math, only a small minority does, the probability is still much higher than 0.30 as long as the minority of engineers is larger than the minority of lawyers*:
$p(math|eng) = 0.25$ implies $p(eng|math) = 0.991$
$p(math|eng) = 0.10$ implies $p(eng|math) = 0.977$
$p(math|eng) = 0.05$ implies $p(eng|math) = 0.955$
$p(math|eng) = 0.01$ implies $p(eng|math) = 0.811$
$p(math|eng) = 0.005$ implies $p(eng|math) = 0.682$
$p(math|eng) = 0.002$ implies $p(eng|math) = 0.462$
Yes, that last case is a two-to-one ratio of engineers who like math to lawyers who like math; and it still falls out of the 10-40pct category.

I understand the representativeness heuristic, which mistakes $p(math|eng)/p(math|law)$ for $p(eng|math)/p(law|math)$, ignoring the base rates, but there's no reason to give up the inference process if some data in the description is actually informative.

-- -- -- --
* This example shows the elucidative power of working through some numbers. One might be tempted to say "ok, there's some updating, but it will probably still fall under the 10-40pct category" or "you may get large numbers with a disproportionate example like one-half of the engineers and one-in-a-thousand lawyers, but that's just an extreme case." Once we get some numbers down, these two arguments fail miserably.

Numbers are like examples, personas, and prototypes: they force assumptions and definitions out in the open.

## Saturday, November 5, 2011

### How The Mighty Fall, by Jim Collins

(I'm closing my booknotes blog and reposting some of the notes.)

These are reading notes I made for myself and not to be taken either as a review or a substitute for reading Jim Collins's How The Mighty Fall.

I think it's worth reading it and reflecting upon the ideas therein. I don't think it should be used as a how-to manual without some significant further reflection -- which is also the opinion of the author, as he explains on page 36 when discussing the difference between knowing what versus knowing why.

Academic critics of Collins were very happy when some companies in Good to Great and Built to Last failed miserably. I agree with Collins that their failure is not necessarily proof that the books were wrong (since companies change and don't necessarily follow the prescriptions that were derived from their previous behavior) but rather presents an opportunity to study the decline of great firms.

Discounting some methodological issues with the research, the study is interesting and I think elucidative. According to Collins, decline comes in five phases:

Stage 1: Hubris born of success
Success is viewed as deserved while failure is attributed to bad luck (a version of the fundamental attribution bias); this leads to a sense that one is entitled to success regardless, which in turn becomes arrogance.

Distracted by new ventures and outside threats the company starts neglecting its core value proposition and it grows stale or fails to adapt.

Instead of understanding root causes (WHY) the decision-makers focus on what to do that has worked before (WHAT), possibly ignoring changing underlying environment.

This leads to a decline in learning orientation (which is a consequence also of getting big and having to hire more people).

Stage 2: Undisciplined pursuit of more
Packard's law: Companies die more often of indigestion from pursuing too many opportunities than starvation from too few opportunities to pursue. (Named after HP founder David Packard.)

Quest for growth at any cost mistakes big for great. Leads to a vicious cycle of expectations, their disconfirmation, demotivated personnel. Also propitious to undisciplined discontinuous leaps, leading to non-core businesses that detract from core value of firm.

Easy cash erodes discipline (instead of cost control, focus on revenue building; when counter-cycle hits, costs stay and revenues dwindle). Bureaucracy subverts discipline (people think of jobs rather than responsibilities). Politicking for power as the company becomes larger destroys morale.

Stage 3: Denial of risk and peril
Psychological tendency to amplify the positive and discount the negative turns decisions from "can you show that it is safe" to "can you prove that is unsafe" like with the Challenger launch decision. Wrong framing leads to:

Big bets and bold moves without empirical validation. This brings up the possibility of huge downside risk based on ambiguous data. In particular taking risks that are below the waterline thinking they are above the waterline. (It's a ship analogy: is this risk equivalent to getting a hole in the hull above or below the waterline? the first is a problem that needs to be solved for optimal operation, the second places survivability at risk.)

Team dynamics fall apart as people who see the risks are marginalized by those who focus on the positive side; the latter externalize the blame (the fundamental attribution bias again).

This is the phase when firms start making obsessive reorganization, as if the problems just described were a matter or shuffling people around on an organization chart or merging and splitting departments. This is activity masquerading as useful action, while internal politics overwhelm considerations over outside environment.

Stage 4: Grasping for salvation
The last point of stage three becomes endemic: reorganizations, restructuring and looking for leaders who are magical saviors. The company is in a panic, looking for magic solutions and silver bullets that will stem decline, typically missing the point, which is that they have no actual value proposition for the market and their internal processes are dysfunctional.

Radical change is touted as a solution (which raises the problem that everyone wants change -- the company is in decline -- but most want different types of change than each other). Organization-wide placebo effect (known as the Hawthorne effect) leads to short bursts of improvement followed by long periods of further decline

Lots of hype and hope instead of actual results, leading to confusion and cynicism when the deception becomes clear.

Stage 5: Capitulation to irrelevance or death

## Tuesday, November 1, 2011

### Less

I found a magic word and it's "less."

On September 27, 2011, I decided to run a lifestyle experiment. Nothing radical, just a month of no non-essential purchases, the month of October 2011. These are the lessons from that experiment.

Separate need, want, and like

One of the clearest distinctions a "no non-essential purchases" experiment required me to make was the split between essential and non-essential.

Things like food, rent, utilities, gym membership, Audible, and Netflix I categorized as essential, or needs. The first three for obvious reasons, the last three because the hassle of suspending them wasn't worth the savings.

A second category of purchases under consideration was wants, things that I felt that I needed but could postpone the purchase until the end of the month. This included things like Steve Jobs's biography, for example. I just collected these in the Amazon wish list.

A third category was likes. Likes were things that I wanted to have but knew that I could easily live without them. (Jobs's biography doesn't fall into this category, as anyone who wants to discuss the new economy seriously has to read it. It's a requirement of my work, as far as I am concerned.) I placed these in the Amazon wish list as well.

Over time, some things that I perceived as needs were revealed as simply wants or even likes. And many wants ended up as likes. This means that just by delaying the decision to purchase for some time I made better decisions.

This doesn't mean that I won't buy something because I like it (I do have a large collection of music, art, photography, history, science, and science fiction books, all of which are not strictly necessary). What it means is that the decision to buy something is moderated by the preliminary categorization into these three levels of priority.

A corollary of this distinction is that it allows me to focus on what is really important in the activities that I engage in. I summarized some results in the following table (click for bigger):

One of the regularities of this table is that the entries in the middle column (things that are wrongly emphasized) tend to be things that are bought, while entries in the last column (what really matters) tend to be things that are learned or experienced.

Correct accounting focusses on time, not on nominal money

Ok, so I can figure out a way to spend less in things that are not that necessary. Why is this a source of happiness?

Because money to spend costs time and I don't even get all the money.

When I spend one hour working a challenging technical marketing problem for my own enjoyment, I get the full benefit of that one hour of work, in the happiness solving a puzzle always brings me. When I work for one hour on something that I'd rather not be doing for a payment of X dollars, I get to keep about half of those X dollars (when everything is accounted for). I wrote an illustration of this some time ago.

In essence, money to spend comes, at least partially from doing things you'd rather not do, or doing them at times when you'd rather be doing something else, or doing them at locations that you'd rather not travel to. I like the teaching and research parts of my job, but there are many other parts that I do because it's the job. I'm lucky in that I like my job; but even so I don't like all the activities it involves.

The less money I need, the fewer additional things I have to do for money. And, interestingly, the higher my price for doing those things. (If my marginal utility of money is lower, you need to pay more for me to incur the disutility of teaching that 6-9AM on-location exec-ed seminar than you'd have to pay to a alternate version of me that really wants money to buy the latest glued "designer" suit.)

Clarity of purpose, not simply frugality, is the key aspect

I'm actually quite frugal, having never acquired the costly luxury items of a wife and children, but the lessons here are not about frugality, rather about clarity of purpose.

## Monday, August 29, 2011

### Decline and fall of Western Manufacturing - a pessimistic reading of Pisano and Shih (2009)

Those who don't know history are condemned to repeat it.

Unfortunately those of us who do know history get dragged right along with the others, because we live in a world where everything is connected to everything else.

Above is my visualization of Pisano and Shih's 2009 Harvard Business Review article "Restoring American Competitiveness." This is a stylized version of a story that has happened in several industries.

Step 1: Companies start outsourcing their manufacturing operations to companies (or countries) which can perform them in a more cost-effective manner. Perhaps these companies/countries have cheaper labor, fewer costly regulations, or less overhead.

Step 2: Isolated from their manufacture, companies lose the skills for process engineering. After all, improving manufacturing processes is a task that depends on continuous experimentation and feedback from the manufacturing process. If the manufacturing process is outsourced, the necessary interaction between manufacturing and process engineers happens progressively more inside the contractor, not the original manufacturer.

Step 3: Without process engineering to motivate it, the original manufacturer (and the companies supporting it in the original country, in the diagram the US) stops investing in process technology development. For example, the companies that developed machine tools for US manufacturers in conjunction with US process engineers now have to so do with Taiwanese engineers in Taiwan, which leads to relocation of these companies and eventually of the skilled professionals.

Step 4: Because of spillovers in technological development between process technologies and product technologies (including the development of an engineering class and engineering support infrastructure), more and more product technology development is outsourced. For example, as fewer engineering jobs are available in the original country, fewer people go to engineering school; the opposite happens in the outsourced-to country, where an engineering class grows. That growth is a spillover that is seldom accounted for.

Step 5: As more and more technology development happens in the outsourced-to country, it captures more and more of the product innovation process, eventually substituting for the innovators in the original manufacturer's country. Part of this innovation may still be under contract with the original manufacturer, but the development of innovation skills in the outsourced-to country means that at some point it will have its own independent manufacturers (who will compete with the original manufacturer).

Pisano and Shih are optimists, as their article proposes solutions to slow, stop, and reverse this process of technological decline of the West (in their case, the US). It's worth a read (it's not free but it's cheaper than a day worth of lattes, m'kay?) and ends in an upbeat note.

I'm less optimistic than Pisano and Shih. Behold:

Problem 1: Too many people and too much effort dedicated to non-wealth-creating activities and too many people and too much effort aimed at stopping wealth-creating activities.

Problem 2: Lack of emphasis in useful skills (particularly STEM, entrepreneurship, and "maker" culture) in education. Sadly accompanied by a sense of entitlement and self-confidence which is inversely proportional to the actual skills.

Problem 3: Too much public discourse (politicians of both parties, news media, entertainment) which vilifies the creation of wealth and applauds the forcible redistribution of whatever wealth is created.

Problem 4: A generalized confusion between wealth and pieces of fancy green paper with pictures of dead presidents (or Ben Franklin) on them.

Problem 5: A lack of priorities or perspective beyond the immediate sectorial interests.

We are doomed!

## Monday, August 22, 2011

### Preparing instruction is different from preparing presentations

The title bears repeating, as many people confuse instruction and presentation preparation skills and criteria for success: Preparing instruction is different from preparing presentations.

My 3500-word post on preparing presentations is exactly for that purpose, preparing presentations. I could try to write a post for preparing instruction, but it would quickly get to book size. In fact, I recommend several books in this post describing the evolution of information design in my teaching approach. (The most relevant books for teaching are at the addendum to this post.)

I made a diagram depicting my process of preparing for a instruction event (the diagram was for my personal use, but there's no reason not to share it; click for larger):

And, for comparison, the process for preparing presentations:

Because they look similar, I need to point out that the tools used in each phase of the process are different for presentations and for instruction.

I'm a big fan of participant-centered learning (though not necessarily the HBS cases that people always associate with PCL); the idea is simple: students learn from doing, not from watching the instructor do. So, many of the "materials" (more precisely, most of the time in the "plan with timing" part of the diagram) in an instruction event are audience work: discussions, examples brought by the audience (to complement those brought by the instructor) and exercises. These are not materials that can be used in a speech or a presentation to a large audience.

Also, while a story works as a motivator for both presentations and instruction, I tend to use exercises or problems as motivators for instruction. For example, I start a class on promotion metrics by asking "how do you measure the lift" of some promotional activity, and proceed from there. By making it a management task that they have to do as part of their jobs, I get some extra attention from the audience. Plus, they can immediately see how the class will help them with their jobs.*

There are presentations that are mostly for instruction purposes, and there are parts of instruction events that are presentations. But never mistake one for the other: preparing instruction is different from preparing presentations.

Though so much instruction is so poorly prepared that even the basics of presentation preparation will help make instruction less of a disaster, that's just a step towards instruction-specific preparation.

- - - - - - - - - - - -

*I have a large variety of exercises for each knowledge unit I teach, and they are not all of the form "here's a problem, what's the solution?" Some are of the forms "here's what a company is doing, what are they trying to achieve?" and "here's a problem, here's what the company is doing, what is wrong with that?"

Addendum: Two books on preparation (and delivery) of instruction, from the post describing the evolution of information design in my teaching approach:

Tools for teaching, by Barbara Gross Davis covers every element of course design, class design, class management, and evaluation. It is rather focussed on institutional learning (like university courses), but many of the issues, techniques, and checklists are applicable in other instruction environments.

Designing effective instruction, by Gary Morrison, Steven Ross, and Jerrold Kemp, complements Tools for teaching. While TfT has the underlying model of a class, this book tackles the issues of training and instruction from a professional service point of view. (In short: TfT is geared towards university classes, DEI is geared towards firm-specific Exec-Ed.)

## Thursday, August 11, 2011

Copyright "reformers," or better yet copyright warriors (they're at war with copyright a lot more than they want to reform it) fail to understand some basic business points; that gets in the way of meaningful discussions about copyright. Which turns all discussions of copyright into politics.

My evidence of the copyright warriors' blind spot is this convenient video of Cory Doctorow giving the keynote address at SIGGRAPH 2011:

I agree that there have been some abuses of copyright law, but the whole address is marred by his failure to see value in anything other than content creation.

Here are some points that anyone with a minimal interest in the business of content would make:

1. Distribution is a value-added activity. This point needs hammering in, because there's a large subset of otherwise intelligent people who believe that distribution is some sort of parasite on the creatives. Because they don't understand the various functions of distribution beyond the physical movement of materials (or bits), they fail to see that financing, promoting, and filtering/bundling have a value of their own.

2. Revenue models are built with many elements. Another point that seems to escape the copyright warriors is that the hardware, the services of the brick-and-mortar bookstore, and customer support are cheap (or free) because the provider funds those services out of the revenues collected otherwise. When deciding whether to launch a new product (and at what price), companies take into account the costs and investments required and all the sources of revenue.

That's why restaurants charge a corkage fee. Yes, you can buy the same Chateau d'Proglos at Safeway for half the price, but the restaurant's markup on the wine is how it subsidizes the price of the meal. Yes, the very expensive meal is still subsidized by the drinks. Copyright warriors want to pay the subsidized price for the hardware and then escape paying the content prices that support that subsidy.* (Economists call this an horizontal externality.)

3. Ecosystems and their components don't fall from the sky. Apple had to develop the iTunes/iPhone/iPad ecosystems by actually paying money to engineers, designers, patent holders, and business consultants. Much of that money went into blind alleys, under the rubric of "acceptable business risk" and "cancelled project." The components of the ecosystem may be cheap to reproduce, but they were expensive to produce. (This problem is much larger in pharmaceuticals.)

When deciding on these investments, spending billions to develop their own chips and millions to design their interfaces, Apple and other companies look at total revenue projections, not just the small markup on hardware (yes, it looks big – to people who have no notion of the cost of capital or the failure rate for internal projects).

4. Corporations are not college bull sessions writ large. A corporation like Apple, Amazon, Microsoft, or Google is bound by a very large body of law and regulation. If a corporation fails to protect its intellectual property, for example, it soon loses its right to control it. Therefore, it's a fiduciary obligation of that corporation's management to use legal means to protect it. It's true that many companies manipulate the legal -- and sometimes the political -- process to their advantage. But failing to protect their copyrights would be a failure of the fiduciary obligation; a dereliction of duty.

Companies have some flexibility given the trade-off between the obligation and public relations, but the only cases that are visible are the ones where the trade-off comes on the side of the obligation; that many companies ignore violations of DMCA is obvious by the number of copies of Handbrake found in macs, for example.

5. There are good reasons for keeping one's ecosystem controlled (as much as possible) beyond revenue creation. For example, Apple's vetting of apps for iPhone/iTouch/iPad is in part a protection of Apple's brand equity, the part of which that says "apple is a safer product to use than others." (See also the footnote about the iOS payment system.)

6. Recent history shows the need for some copy protection. Not that it is very hard to defeat, but without it – consider the free distribution of music in the early days of file sharing – making a living out of content becomes very difficult. In other words, copy protection is necessary to have a consistent revenue model that can fund creative industries. Yes, you can write a book very cheaply, but what about TV shows, blockbuster movies, professionally recorded music, art photographs with professional models or in remote locales?

All these observations stem from the same point, the inability to see value created by other parts of the process involved in the consumption of content. Yes, there are abuses of copyright protection, made for simple revenue enhancement; but treating content as carrying all the value and the rest of the ecosystem as being somehow irrelevant – or as something that exists for the benefit of content creators only – is unbelievably myopic.

A first step towards a rational discussion of copyright is to accept that actions other than content creation have value. Cory still hasn't taken that step.

-- -- -- -- -- --

* Here's the response to the "evil Apple wants 30% of all money spent through their platform," which is the corkage fee-equivalent: this can be avoided simply by having the customers make their purchases through the web site of the provider, even from the iOS product itself.

What that 30% fee is capturing is not just the price of convenience; what iOS is giving to the app is the credibility of "transaction via Apple," as opposed to "transaction via the web page of a company that makes the app." Is it hard to understand where the value comes from in this difference?

Yes, Amazon can complain, but they can easily use the Kindle app to tell their servers what sample the reader just ended, and to put the book sampled at the top of the recommendation list; Apple would probably get into serious trouble legally if it gave Amazon privileged treatment.

NOTE: I'm a fan of Cory's fiction and I agree that there are clear cases of abuse and even political use of copyright for nefarious ends. But to reform copyright we need to accept that businesses exist for the creation and capture of value and that includes businesses built around content. Pretending that the content business is somehow insulated from the rules of economics is myopic and ultimately self-defeating.