Wednesday, December 21, 2011

Powerful problems with power law estimation papers

Perhaps I shouldn't try to make resolutions: I resolved to blog book notes till the end of the year, and instead I'm writing something about estimation.

A power law is a relationship of the form $y = \gamma_0 x^{\gamma_1}$ and can be linearized for estimation using OLS (with a very stretchy assumption on stochastic disturbances, but let's not quibble) into

$\log(y) = \beta_0 + \beta_1 \log(x) +\epsilon$,

from which the original parameters can be trivially recovered:

$\hat\gamma_0 = \exp(\hat\beta_0)$ and $\hat\gamma_1 = \hat\beta_1$.
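As a sanity check, here's a minimal sketch of this recovery in Python (assuming, as the linearization requires, multiplicative log-normal disturbances; the values of $\gamma_0$ and $\gamma_1$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# True power law: y = gamma0 * x^gamma1, with multiplicative log-normal
# noise -- the "stretchy assumption" that makes log-log OLS valid.
gamma0, gamma1 = 2.0, 1.5
x = rng.uniform(1.0, 100.0, size=500)
y = gamma0 * x**gamma1 * np.exp(rng.normal(0.0, 0.1, size=500))

# OLS on log(y) = beta0 + beta1*log(x) + eps
X = np.column_stack([np.ones_like(x), np.log(x)])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

gamma0_hat, gamma1_hat = np.exp(beta[0]), beta[1]
print(gamma0_hat, gamma1_hat)  # close to 2.0 and 1.5
```

With well-behaved multiplicative noise the recovery is clean; the trouble starts, as discussed below, when the disturbances are not of this convenient form.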

Power laws are plentiful in Nature, especially when one includes the degree distribution of social networks in a – generous and uncommon, I admit it – definition of Nature. A commonly proposed source of power law degree distributions is preferential attachment in network formation: the probability of a new node $i$ being connected to an old node $j$ is an increasing function of the degree of $j$.

The problem with power laws in the wild is that they are really hard to estimate precisely, and I got very annoyed at the glibness of some articles, which report power law estimates in a highly dequantized manner: they don't actually show the estimates or their descriptive statistics, only charts with no error bars.

Here's my problem: it's well-known that even small stochastic disturbances can make parameter identification in power law data very difficult. And yet, that is never mentioned in those papers. This omission, coupled with the lack of actual estimates and their descriptive statistics, is unforgivable. And suspicious.

Perhaps this needs a couple of numerical examples to clarify; as they say at the end of each season of television shows now:

– To be continued –

Tuesday, December 20, 2011

Marginalia: Writing in one's books

I've done it for a long time now, shocking behavior though it is to some of my family and friends.

WHY I make notes

The reasons for keeping books in pristine condition vary from maintaining resale value (not an issue for me, as I don't think of books as transient presences in my life) to preserving the integrity of the author's work. Obviously, if I had a first edition of Newton's Principia, I wouldn't write in it; the books I write in are workaday copies, many of them cheap paperbacks or technical books.

The reasons why I make notes are threefold:

To better understand the book as I read it. Actively reading a book, especially a non-fiction or work book, is essentially a dialog between the book and the knowledge I can access, both in my mind and in outside references. Deciding what is important enough to highlight, and which points deserve further elaboration in the form of commentary or an example that I furnish, makes reading a much more immersive experience than simply processing the words.

To collect my ideas from several readings (I read many books more than once) into a place where they are not lost. Sometimes points from a previous reading are more clarifying to me than the text itself, sometimes I disagree vehemently with what I wrote before.

To refer to later when I need to find something in the book. This is particularly important in books that I read for work, in particular for technical books where many of the details have been left out (for space reasons) but I added notes that fill those in for the parts I care about.

WHAT types of notes I make

In an earlier post about marginalia on my personal blog I included this image (click for bigger),


showing some notes I made while reading the book Living With Complexity, by Donald Norman. These notes fell into six cases:

Summaries of the arguments in text. Often texts will take long circuitous routes to get to the point. (Norman's book is not one of these.) I tend to write quick summaries, usually in implication form like the one above, that cut down the entropy.

My examples to complement the text. Sometimes I happen to know better examples, or examples that I prefer, than those in the book; in that case I tend to note them in the book so that the example is always connected to the context in which I thought of it. This is particularly useful in work books (and papers, of course) when I turn them into teaching or executive education materials.

Comparisons with external materials. In this case I make a note to compare Norman's point about default choices with the problems Facebook faced in similar matters regarding its privacy.

Notable passages. Marking funny passages with smiley faces and surprising passages with an exclamation point helps find these when browsing the book quickly. Occasionally I also mark passages for style or felicitous turn of phrase, typically with "nice!" on the margin.

Personal commentary. Sometimes the text provokes some reaction that I think is worth recording in the book. I don't write review-like commentary in books as a general rule, but I might note something about missing or hidden assumptions, innumeracy, biases, or statistical issues; I might also comment positively on an idea, for example one that would never have occurred to me without the text.

Quotable passages. These are self-explanatory and particularly easy to make on eBooks. Here's one from George Orwell's Homage To Catalonia:
The constant come-and-go of troops had reduced the village to a state of unspeakable filth. It did not possess and never had possessed such a thing as a lavatory or a drain of any kind, and there was not a square yard anywhere where you could tread without watching your step. (Chapter 2.)

A few other types of marginalia that I have used in other books:

Proofs and analysis to complement what's in the text. As an example, in a PNAS paper on predictions based on search, the authors call $\log(y) = \beta_0 + \beta_1 \log(x)$ a linear model, with the logarithms used to account for the skewness of the variables. I inserted a note that this is clearly a power law relationship, not a linear relationship, with the two steps of algebra that show $y = e^{\beta_0} \times x^{\beta_1}$, in case I happen to be distracted when I reread this paper and can't think through the baby math.
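For the record, the two steps are just exponentiation of both sides and the product rule for exponents:

$y = \exp(\beta_0 + \beta_1 \log(x)) = e^{\beta_0} \times e^{\beta_1 \log(x)} = e^{\beta_0} \times x^{\beta_1}$.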

Adding missing references or checking the references (which sometimes are incorrect, in which case I correct them). Yep, I'm an academic nerd at heart; but these are important, like a chain of custody for evidence or the provenance records for a work of art.

Diagrams clarifying complicated points. I do this in part because I like visual thinking and in part because if I ever need to present the material to an audience I'll have a starting point for visual support design.

Data that complements the text. Sometimes the text is dequantized and refers to a story for which data is available. I find that adding the data to the story helps me get a better perspective and also if I ever want to use the story I'll have the data there to make a better case.

Counter-arguments. Sometimes I disagree with the text, or at least with its omission of feasible counter-arguments (even when I agree with a position, I don't like that the author presents the opposing points of view only in strawman form), so I write in the counter-arguments to remind myself that they exist and that the presentation in the text doesn't do them justice.

Markers for things that I want to get. For example, while reading Ted Gioia's The History of Jazz, I marked several recordings that he mentions for acquisition; when reading technical papers I tend to mark the references I want to check; when reading reviews I tend to add things to wishlists (though I also prune these wishlists often).

HOW to make notes

A few practical points for writing marginalia:

Highlighters are not good for long-term notes. They either darken significantly, making it hard to read the highlighted text, or they fade, losing the highlight. I prefer underlining with a high contrast color for short sentences or segments or marking beginning and end of passages on the margin.

Margins are not the only place. I add free-standing inserts, usually in the form of large Post-Its or pieces of paper. Important management tip: write the page number the note refers to on the note.

Transcribing important notes to a searchable format (a text file on my laptop) makes it easy to find stuff later. This is one of the advantages of eBooks of the various types (Kindle, iBook, O'Reilly PDFs), making it easy to search notes and highlights.

Keeping a commonplace book of felicitous turns of phrase (the ones in the books and the ones I come up with) either in a file or on an old-style paper journal helps me become a better writer.

-- -- -- --

Note: This blog may become a little more varied in topics as I decided to write posts more often to practice writing for a general audience. After all, the best way to become a better writer is to write and let others see it. (No comments on the blog, but plenty of ones by email from people I know.)

Monday, December 12, 2011

How many possible topologies can a N-node network have?

Short answer, for an undirected network: $2^{N(N-1)/2}$.

Essentially, the maximum number of edges in an undirected network is $N(N-1)/2$, so the number of possible topologies is two raised to the number of edges, capturing every case where an edge can either be present or absent. For a directed network the number of possible edges is twice that of an undirected network, so the number of possible topologies is the square of the undirected count (or just remove the $/2$ part from the formula above).

To show how quickly things get out of control, here are some numbers:

$N=1 \Rightarrow 1$ topology
$N=2 \Rightarrow 2$ topologies
$N=3 \Rightarrow 8$ topologies
$N=4 \Rightarrow 64$ topologies
$N=5 \Rightarrow 1024$ topologies
$N=6 \Rightarrow 32,768$ topologies
$N=7 \Rightarrow 2,097,152$ topologies
$N=8 \Rightarrow 268,435,456$ topologies
$N=9 \Rightarrow 68,719,476,736$ topologies
$N=10 \Rightarrow 35,184,372,088,832$ topologies
$N=20 \Rightarrow 1.5693 \times 10^{57}$ topologies
$N=30 \Rightarrow 8.8725 \times 10^{130}$ topologies
$N=40 \Rightarrow 6.3591 \times 10^{234}$ topologies
$N=50 \Rightarrow 5.7776 \times 10^{368}$ topologies
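These counts are trivial to reproduce, since Python's arbitrary-precision integers handle the exact values with no effort (a quick sketch; the function names are mine):

```python
def undirected_topologies(n):
    """Number of labeled undirected networks on n nodes: 2^(n(n-1)/2)."""
    return 2 ** (n * (n - 1) // 2)

def directed_topologies(n):
    """Twice the edges, so the square of the undirected count: 2^(n(n-1))."""
    return 2 ** (n * (n - 1))

for n in [1, 2, 3, 4, 5, 10, 50]:
    print(n, undirected_topologies(n))
```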

This is the reason why any serious analysis of a network requires the use of mathematical modeling and computer processing: our human brains are not equipped to deal with this kind of exploding complexity.

And for the visual learners, here's a graph denoting the pointlessness of trying to grasp network topologies "by hand" (note logarithmic vertical scale):

Number of network topologies as a function of the number of nodes

Saturday, December 10, 2011

Why analytics practitioners need to worry about "abstruse" statistical/econometric issues

Because these issues are plentiful in data!

Let's say a computer manufacturer has a web site where its customers can configure laptops. The manufacturer notes that a lot of people don't complete the process – some just look at the page, some configure bits and pieces, some go all the way to the end but don't buy, and some configure and buy a laptop – but it gets a lot of data nonetheless.

Analysts salivate over this big data and the multiple measures and the available level of disaggregation and use machine-learning tools to find patterns. But sometimes they fail to check for basic data problems. Here are some customer-behavior-related sources of data problems:

1. Customers configuring laptops are likely to each have a budget, even if it's only a mental one. This makes their choices of variable values in their laptop configurations, say X memory and Y processor speed, interdependent via an unobserved variable. (When they configure the laptop their choices of these dimensions are driven partially by the budget but that budget is not observed by the analysts.) This will create collinearities in the right-hand side variables of the data that would be detected by traditional statistical tools (like factor or principal component analysis, or more simply the non-significant coefficients in a choice model estimation) but are obscured by some machine learning algorithms.

2. Many of the dependent measures used, other than choice-related ones, like customer time-on-page, number of options clicked, number of pages seen, number of menus browsed, number of re-entries into the same page, Facebook likes, page tweets, etc. are highly collinear as well. Often these measures are presented as independent and corroborating measures of interest. This is misleading: they are measures of the same thing using different proxies. (This can be identified with factor or principal component analysis; if two variables are really independent measures of interest – which would be necessary for them to be corroborating – then PCA or FA would separate them as such, given enough data.)

3. Different customers will have different degrees of expertise about laptops, so their choices are likely to have different degrees of idiosyncratic error, which becomes an individual variance (different from other individuals') for their stochastic disturbances. In other words, the data is probably plagued with heteroskedasticity. That's not a big problem per se since it's easily corrected in estimation, but it becomes a problem when, on the rare occasion that standard errors of estimates are shown, the analysts fail to use robust standard error estimation.

4. Often a customer will configure multiple versions of the same laptop to see the price of different feature combinations. This is likely to create serial correlation in the stochastic disturbances: if Bob comes in today after a friend told him how important memory was for a Windows computer, that idiosyncratic error will propagate across all configurations Bob creates today; if tomorrow Bob hears disk space is the key issue, that idiosyncratic error will propagate across all configurations Bob creates tomorrow.

Added Dec 12: Serial correlation is an even more likely problem when analyzing clickstreams, as web links are unidirectional (the reverse motion via the "back" button is unobserved in many cases by the clickstream collection system, and is also not used very often in the middle of long clickstreams), and idiosyncratic factors on one page may drive a long sequence of browsing down one branch of the page tree rather than another branch. (End of addition.)

5. If Nina configures a laptop from her home computer and another from her work computer, without signing in first, the two configurations will share all of Nina's individual factors but will count as two separate individuals for estimation purpose, giving her individual factors twice the weight in the final estimation. (Also giving the variance of her idiosyncratic stochastic disturbances twice the importance.)
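To make point 3 concrete, here's a sketch in Python comparing naive OLS standard errors with White's heteroskedasticity-robust (HC0) ones. The data-generating process is invented for illustration: error variance grows with an observable, as it would if less expert customers make noisier choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Error variance grows with x: heteroskedasticity, as in point 3.
x = rng.uniform(0.0, 10.0, size=n)
sigma = 0.5 + 0.5 * x
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y          # OLS coefficients (still consistent)
resid = y - X @ beta

# Naive OLS standard errors (homoskedasticity assumed)
s2 = resid @ resid / (n - 2)
se_naive = np.sqrt(np.diag(s2 * XtX_inv))

# White (HC0) robust standard errors: sandwich estimator
meat = X.T @ (X * (resid**2)[:, None])
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_naive, se_robust)  # the naive slope SE understates the robust one
```

The coefficients themselves are fine; it's the inference (the standard errors) that the naive formula gets wrong, which is exactly why reporting estimates without robust standard errors is misleading here.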

I've been told more than once by people who work in analytics that these are "minor" or "abstruse" statistical points; the people in question "learned" them in a statistics or econometrics class in their distant past, but proceeded to forget them, at least operationally, in their careers. Of course, these "minor," "abstruse" points are the difference between results that are informative and results that are little more than random noise.

I'm pro-analytics, but I want them done correctly.

Saturday, December 3, 2011

Why I'm not a fan of "presentation training"

Because there are too many different types of presentation for any sort of abstract training to be effective. So "presentation training" ends up – at best – being "presentation software training."

Learning about information design, writing and general verbal communication, stage management and stage presence, and operation of software and tools used in presentations may help one become a better presenter. But, like in so many technical fields, all of these need some study of the foundations followed by a lot of field- and person-specific practice.

I recommend Edward Tufte's books (and seminar) for information design; Strunk and White's The Elements of Style, James Humes's Speak like Churchill, Stand like Lincoln, and William Zinsser's On Writing Well for verbal communication; and a quick read of the manual followed by exploration of the presentation software one uses. I have no recommendations regarding stage management and stage presence short of joining a theatre group, which is perhaps too much of a commitment for most presenters.

I have already written pretty much all I think about presentation preparation; the present post is about my dislike of "presentation training." To be clear, this is not about preparation for teaching or training to be an instructor. These, being specialized skills – and typically field-specific skills – are a different case.

Problem 1: Generic presentation training is unlikely to help any but the most incompetent of presenters

Since an effective presentation is one designed for its objective, within the norms of its field, targeted to its specific audience, and using the technical knowledge of its field, what use is it to learn generic rules, beyond the minimum of information design, clarity in verbal expression, and stage presence?

(My understanding from people who have attended presentation training is that there was little about information design, nothing about verbal expression, and just platitudes about stage presence.)

For someone who knows nothing about presentations and learns the basics of operating the software, presentation training may be of some use. I think Tufte made this argument: the great presenters won't be goaded into becoming "death by powerpoint" presenters just because they use the software; the terrible presenters will be forced to come up with some talking points, which may help their presentations be less disastrous. But the rest will become worse presenters by focussing on the software and some hackneyed rules – instead of the content of and the audience for the presentation.

Problem 2: Presentation trainers tend to be clueless about the needs of technical presentations

Or, the Norman Critique of the Tufte Table Argument, writ large.

The argument (which I wrote as point 1 in this post) is essentially that looking at a table, a formula, or a diagram as a presentation object – understanding its aesthetics, its information design, its use of color and type – is very different from looking at the table to make sense of the numbers therein, working out the implications of the formula for a mathematical or chemical model, or interpreting the implications of the diagram for its field.

Tufte, in his attack on Powerpoint, talks about a table but focusses on its design, not how the numbers would be used, which is what prompted Donald Norman to write his critique; but, of all the people who could be said to be involved in presentation training, Tufte is actually the strongest advocate for content.

The fact remains that there's a very big difference between technical material which is used as a prop to illustrate some presentation device or technique to an audience which is mostly outside the technical field of the material and the same material being used to make a technical point to an audience of the appropriate technical field.

Presentation training, being generic, cannot give specific rules for a given field; but those rules are actually useful to anyone in the field who has questions about how to present something.

Problem 3: Presentation training actions are typically presentations (lectures), which is not an effective way to teach technical material

The best way to teach technical material is to have the students prepare by reading the foundations (or watching video on their own, allowing them to pace the delivery by their own learning speed) and preparing for a discussion or exercise applying what they learned.

This is called participant-centered learning; it's the way people learn technical material. Even in lecture courses the actual learning only happens when the students practice the material.

Almost all presentation training is done in lecture form, delivered as a presentation from the instructor with question-and-answer periods for the audience. But since the audience doesn't actually practice the material in the lecture, they may have only questions of clarification. The real questions that appear during actual practice don't come up during a lecture, and those are the questions that really need an answer.

Problem 4: Most presentation training is too narrowly bracketed

Because it's generic, presentation training misses the point of making a presentation to begin with.

After all, presentations aren't made in a vacuum: there's a purpose to the presentation (say, report market research to decision-makers), an audience with specific needs (product designers who need to understand the parameters of the consumer choice so they can tweak the product line), supporting material that may be used for further reference (a written report with the details of the research), action items and metrics for those items (follow-up research and a schedule of deliverables and budget), and other elements that depend on the presentation.

There's also the culture of the organization which hosts the presentation, disclosure and privacy issues, reliability of sources, and a host of matters apparently unrelated to a presentation that determine its success a lot more than the design of the slides.

In fact, the use of slides, or the idea of a speaker talking to an audience, is itself a constraint on the type of presentations the training is focussed on. And that trains people to think of a presentation as a lecture-style presentation. Many presentations are interactive, perhaps with the "presenter" taking the position of moderator or arbitrator; some presentations are made in roundtable fashion, as a discussion where the main presenter is one of many voices.

Some time ago, I summarized a broader view of a specific type of presentation event (data scientists presenting results to managers) in this diagram, illustrating why and how I thought data scientists should take more care with presentation design (click for larger):

Putting some thought into presentations - backward induction approach

(Note that this is specific advice for people making presentations based on data analysis to managers or decision-makers that rely on the data analysis for action, but cannot do the analysis themselves. Hence the blue rules on the right to minimize the miscommunication between the people from two different fields. This is what I mean by field-specific presentation training.)

These are four reasons why I don't like generic presentation training. Really it's just one: generic presentation training assumes that content is something secondary, and that assumption is the reason why we see so many bad presentations to begin with.

NOTE: Participant-centered learning is a general term for using the class time for discussion and exercises, not necessarily for the Harvard Case Method, which is one form of participant-centered learning.

Related posts:

Posts on presentations in my personal blog.

Posts on teaching in my personal blog.

Posts on presentations in this blog.

My 3500-word post on preparing presentations.

Friday, December 2, 2011

Dilbert gets the Correlation-Causation difference wrong

This was the Dilbert comic strip for Nov. 28, 2011:

It seems to imply that even though there's a correlation between the pointy-haired boss leaving Dilbert's cubicle and receiving an anonymous email about the worst boss in the world, there's no causation.


Clearly there's causation: PHB leaves Dilbert's cubicle, which causes Wally to send the anonymous email. PHB's implication that he thinks Dilbert sends the email is wrong, but that doesn't mean that the correlation he noticed isn't in this case created by a causal link between leaving Dilbert's cubicle and getting the email.

I think Edward Tufte once said that the statement "correlation is not causation" was incomplete; at least it should read "correlation is not causation, but it sure hints at some relationship that must be investigated further." Or words to that effect.

Friday, November 25, 2011

Online Education and the Dentist vs Personal Trainer Models of Learning

I'm a little skeptical about online education. About 2/3 skeptical.

Most of the (traditional) teaching I received was squarely based on what I call the Dentist Model of Education: a [student|patient] goes into the [classroom|dentist's office] and the [instructor|dentist] does something technical to the [student|patient]. Once the professional is done, the [student|patient] goes away and [forgets the lecture|never flosses].

I learned almost nothing from that teaching. Like every other person in a technical field, I learned from studying and solving practice problems. (Rule of thumb: learning is 1% lecture, 9% study, 90% practice problems.)

A better education model, the Personal Trainer Model of Education asserts that, like in fitness training, results come from the [trainee|student] practicing the [movements|materials] himself/herself. The job of the [personal trainer|instructor] is to guide that practice and select [exercises|materials] that are appropriate to the [training|instruction] objectives.

Which is why I'm two-thirds skeptical of the goodness of online education.

Obviously there are advantages to online materials: there's low distribution cost, which allows many people to access high quality materials; there's a culture of sharing educational materials, spearheaded by some of the world's premier education institutions; there are many forums, question and answer sites and – for those willing to pay a small fee – actual online courses with instructors and tests.

Leaving aside the broad accessibility of materials, there's no getting around the 1-9-90 rule for learning. Watching Walter Lewin teach physics may be entertaining, but without practicing by solving problem sets, no one watching will become a physicist.

Consider the plethora of online personal training advice and assume that the aspiring trainee manages to find a trainer who knows what he/she is doing. Would this aspiring trainee get better at her fitness exercises by reading a web site and watching videos of the personal trainer exercising? And yet some people believe that they can learn computer programming by watching online lectures. (Or offline lectures, for that matter.*)

If practice is the key to success, why do so many people recognize the absurdity of the video-watching, gym-avoiding fitness trainee while at the same time assume that online lectures are the solution to technical education woes?

(Well-designed online instruction programs are much more than lectures, of course; but what most people mean by online education is not what I consider well-designed and typically is an implementation of the dentist model of education.)

The second reason why I'm skeptical (hence the two-thirds share of skepticism) is that the education system has a second component, beyond instruction: it certifies skills and knowledge. (We could debate how well it does this, but certification is one of the main functions of education institutions.)

Certification of a specific skill can be done piecemeal but complex technical fields depend on more than a student knowing the individual skills of the field; they require the ability to integrate across different sub-disciplines, to think like a member of the profession, to actually do things. That's why engineering students have engineering projects, medical students actually treat patients, etc. These are part of the certification process, which is very hard to do online or with short in-campus events, even if we remove questions of cheating from the mix.

There's enormous potential in online education, but it can only be realized by accepting that education is not like a visit to the dentist but rather like a training session at the gym. And that real, certified learning requires a lot of interaction between the education provider and the student: not something like the one-way lectures one finds online.

(This is not to say that there aren't some good online education programs, but they tend to be uncommon.)

Just like the best-equipped gym in the world will do nothing for a lazy trainee, the best online education platform in the world will do nothing for an unmotivated student. But a motivated kid with nothing but a barbell and plates can become a competitive powerlifter, and a motivated kid with a textbook will learn more than the hordes who watch online lectures while tweeting and facebooking.

The key success factor is not technology; it's the student. It always is.

ADDENDUM (Nov 27, 2011): I've received some comments to the effect that I'm just defending universities from the disruptive innovation of entrants. Perhaps, but:

Universities have several advantages over new institutions, especially when so many of these new institutions have no understanding of what technical education requires. If there were a new online way to sell hamburgers, would it surprise anyone that McDs and BK were better at doing it than people who are great at online selling but who never made a hamburger in their lives?

This is not to say that there isn't [vast] room to improve in both the online and offline offerings of universities. But it takes a massive dose of arrogance to assume that everything that went before (in regards to education) can be ignored because of a low cost of content distribution.

* For those who never learned computer programming: you learn by writing programs and testing them. Many many many programs and many many many tests. A quick study of the basics of the language in question is necessary, but better done individually than in a lecture room. Sometimes the learning process can be jump-started by adapting other people's programs. A surefire way to not learn how to program is to listen to someone else talk about programming.

Thursday, November 24, 2011

Data cleaning or cherry-picking?

Sometimes there's a fine line between data cleaning and cherry-picking your data.

My new favorite example of this is based on something Nassim Nicholas Taleb said at a talk at Penn (starting at 32 minutes in): that 92% of all kurtosis for silver in the last 40 years of trading could be traced to a single day; 83% of stock market kurtosis could also be traced to one day in 40 years.

One day in forty years is about 1/14,600 of all data. Such a disproportionate effect might lead some "outlier hunters" to discard that one data point. After all, there are many data butchers (not scientists, if they do this) who create arbitrary rules for outlier detection (say, more than four standard deviations away from the mean) and use them without thinking.

In the NNT case, however, that would be counterproductive: the whole point of measuring kurtosis (or, in his argument, the problem that kurtosis is not measurable in any practical way) is to hedge against risk correctly. Underestimating kurtosis will create ineffective hedges, so disposing of the "outlier" will undermine the whole point of the estimation.
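Taleb's numbers become easy to believe once you try the arithmetic on simulated data. A sketch (the return series and the size of the extreme day are invented for illustration, not the actual silver data):

```python
import numpy as np

rng = np.random.default_rng(7)

# ~40 years of daily returns (illustrative), plus one extreme day
returns = rng.normal(0.0, 0.01, size=40 * 365)
returns[1000] = 0.20  # a single 20% move

# Share of the fourth-moment sum (the raw material of kurtosis)
# contributed by that single observation
dev = returns - returns.mean()
share = dev[1000]**4 / np.sum(dev**4)
print(f"{share:.1%}")  # the one day dominates the fourth moment
```

Because the fourth power crushes small deviations and magnifies large ones, one extreme observation can carry most of the measured kurtosis even in a series of 14,600 points.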

In a recent research project I removed one data point from the analysis, deeming it an outlier. But I didn't exclude it simply because of a standard-deviation rule. I flagged it because it showed an aggregate behavior that was five standard deviations above the mean; I then examined the disaggregate data and confirmed that this was anomalous behavior: the experimental subject had clicked several times on links and immediately clicked back, not even looking at the linked page. This temporally disaggregate behavior, not the aggregate measure of total clicks, was the reason why I deemed the datum an outlier and excluded it from the analysis.

Data cleaning is an important step in data analysis. We should take care to ensure that it's done correctly.

Sunday, November 13, 2011

Vanity Fair bungles probability example

There's an interesting article about Danny Kahneman in Vanity Fair, written by Michael Lewis. Kahneman's book Thinking, Fast and Slow is an interesting review of the state of decision psychology and well worth reading, as is the Vanity Fair article.

But the quiz attached to that article is an example of how not to popularize technical content.

This example, question 2, is wrong:
A team of psychologists performed personality tests on 100 professionals, of which 30 were engineers and 70 were lawyers. Brief descriptions were written for each subject. The following is a sample of one of the resulting descriptions:

Jack is a 45-year-old man. He is married and has four children. He is generally conservative, careful, and ambitious. He shows no interest in political and social issues and spends most of his free time on his many hobbies, which include home carpentry, sailing, and mathematics. 
What is the probability that Jack is one of the 30 engineers?

A. 10–40 percent
B. 40–60 percent
C. 60–80 percent
D. 80–100 percent

If you answered anything but A (the correct response being precisely 30 percent), you have fallen victim to the representativeness heuristic again, despite having just read about it. 
No. Most people have knowledge beyond what is in the description; so, starting from the appropriate prior probabilities, $p(law) = 0.7$ and $p(eng) = 0.3$, they update them with the fact that engineers like math much more than lawyers do, $p(math|eng) \gg p(math|law)$. For illustration, consider

$p(math|eng) = 0.5$; half the engineers have math as a hobby.
$p(math|law) = 0.001$; one in a thousand lawyers has math as a hobby.

Then the posterior probabilities (once the description is known) are given by
$p(eng|math) = \frac{ p(math|eng) \times p(eng)}{p(math)}$
$p(law|math) = \frac{ p(math|law) \times p(law)}{p(math)}$
with $p(math) = p(math|eng) \times p(eng) + p(math|law) \times p(law)$. In other words, with the conditional probabilities above,
$p(eng|math) = 0.995$
$p(law|math) = 0.005$
Note that even if engineers as a rule don't like math (only a small minority does), the probability is still much higher than 0.30 as long as the minority of math-liking engineers is proportionally larger than the minority of math-liking lawyers*:
$p(math|eng) = 0.25$ implies $p(eng|math) = 0.991$
$p(math|eng) = 0.10$ implies $p(eng|math) = 0.977$
$p(math|eng) = 0.05$ implies $p(eng|math) = 0.955$
$p(math|eng) = 0.01$ implies $p(eng|math) = 0.811$
$p(math|eng) = 0.005$ implies $p(eng|math) = 0.682$
$p(math|eng) = 0.002$ implies $p(eng|math) = 0.462$
Yes, that last case has only a two-to-one ratio of math-liking engineers to math-liking lawyers, and it still falls outside the 10–40 percent category.

I understand the representativeness heuristic, which mistakes $p(math|eng)/p(math|law)$ for $p(eng|math)/p(law|math)$, ignoring the base rates, but there's no reason to give up the inference process if some data in the description is actually informative.
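The updating above is easy to check mechanically. A minimal sketch, using the post's own priors and the illustrative likelihoods:

```python
def p_eng_given_math(p_math_eng, p_math_law=0.001, p_eng=0.3, p_law=0.7):
    """Posterior probability of 'engineer' after learning 'math hobby'.

    Bayes' rule with the post's priors: p(eng) = 0.3, p(law) = 0.7.
    """
    # Total probability of the evidence: p(math)
    p_math = p_math_eng * p_eng + p_math_law * p_law
    return p_math_eng * p_eng / p_math

# Reproduce the table in the post:
for likelihood in (0.5, 0.25, 0.10, 0.05, 0.01, 0.005, 0.002):
    print(f"p(math|eng) = {likelihood:5.3f}  ->  "
          f"p(eng|math) = {p_eng_given_math(likelihood):.3f}")
```

Even the weakest likelihood in the table, $p(math|eng) = 0.002$ against $p(math|law) = 0.001$, pushes the posterior above the 0.30 prior.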

-- -- -- --
* This example shows the elucidative power of working through some numbers. One might be tempted to say "OK, there's some updating, but it will probably still fall within the 10–40 percent category" or "you may get large numbers with a disproportionate example like one-half of the engineers and one-in-a-thousand lawyers, but that's just an extreme case." Once we get some numbers down, both arguments fail miserably.

Numbers are like examples, personas, and prototypes: they force assumptions and definitions out in the open.

Saturday, November 5, 2011

How The Mighty Fall, by Jim Collins

(I'm closing my booknotes blog and reposting some of the notes.)

These are reading notes I made for myself and not to be taken either as a review or a substitute for reading Jim Collins's How The Mighty Fall.

I think it's worth reading it and reflecting upon the ideas therein. I don't think it should be used as a how-to manual without some significant further reflection -- which is also the opinion of the author, as he explains on page 36 when discussing the difference between knowing what versus knowing why.

Academic critics of Collins were very happy when some companies in Good to Great and Built to Last failed miserably. I agree with Collins that their failure is not necessarily proof that the books were wrong (since companies change and don't necessarily follow the prescriptions that were derived from their previous behavior) but rather presents an opportunity to study the decline of great firms.

Discounting some methodological issues with the research, the study is interesting and I think elucidative. According to Collins, decline comes in five phases:

Stage 1: Hubris born of success
Success is viewed as deserved while failure is attributed to bad luck (a version of the fundamental attribution bias); this leads to a sense that one is entitled to success regardless, which in turn becomes arrogance.

Distracted by new ventures and outside threats, the company starts neglecting its core value proposition, which grows stale or fails to adapt.

Instead of understanding root causes (WHY), the decision-makers focus on what has worked before (WHAT), possibly ignoring a changing underlying environment.

This leads to a decline in learning orientation (which is a consequence also of getting big and having to hire more people).

Stage 2: Undisciplined pursuit of more
Packard's law: Companies die more often of indigestion from pursuing too many opportunities than starvation from too few opportunities to pursue. (Named after HP founder David Packard.)

Quest for growth at any cost mistakes big for great. It leads to a vicious cycle of inflated expectations, their disconfirmation, and demotivated personnel. It is also propitious to undisciplined discontinuous leaps, leading to non-core businesses that detract from the core value of the firm.

Easy cash erodes discipline (instead of cost control, focus on revenue building; when counter-cycle hits, costs stay and revenues dwindle). Bureaucracy subverts discipline (people think of jobs rather than responsibilities). Politicking for power as the company becomes larger destroys morale.

Stage 3: Denial of risk and peril
Psychological tendency to amplify the positive and discount the negative turns decisions from "can you show that it is safe?" into "can you prove that it is unsafe?", as with the Challenger launch decision. Wrong framing leads to:

Big bets and bold moves without empirical validation. This brings up the possibility of huge downside risk based on ambiguous data; in particular, taking risks that are below the waterline while believing they are above it. (It's a ship analogy: is this risk equivalent to a hole in the hull above or below the waterline? The first is a problem to be solved for optimal operation; the second puts survivability at risk.)

Team dynamics fall apart as people who see the risks are marginalized by those who focus on the positive side; the latter externalize the blame (the fundamental attribution bias again).

This is the phase when firms begin obsessive reorganizations, as if the problems just described were a matter of shuffling people around on an organization chart or merging and splitting departments. This is activity masquerading as useful action, while internal politics overwhelm considerations of the outside environment.

Stage 4: Grasping for salvation
The last point of stage three becomes endemic: reorganizations, restructuring and looking for leaders who are magical saviors. The company is in a panic, looking for magic solutions and silver bullets that will stem decline, typically missing the point, which is that they have no actual value proposition for the market and their internal processes are dysfunctional.

Radical change is touted as a solution (which raises the problem that everyone wants change -- the company is in decline -- but most people want different types of change). An organization-wide placebo effect (known as the Hawthorne effect) leads to short bursts of improvement followed by long periods of further decline.

Lots of hype and hope instead of actual results, leading to confusion and cynicism when the deception becomes clear.

Stage 5: Capitulation to irrelevance or death

Tuesday, November 1, 2011


I found a magic word and it's "less."

On September 27, 2011, I decided to run a lifestyle experiment. Nothing radical, just a month of no non-essential purchases, the month of October 2011. These are the lessons from that experiment.

Separate need, want, and like

One of the clearest distinctions a "no non-essential purchases" experiment required me to make was the split between essential and non-essential.

Things like food, rent, utilities, gym membership, Audible, and Netflix I categorized as essential, or needs. The first three for obvious reasons, the last three because the hassle of suspending them wasn't worth the savings.

A second category of purchases was wants: things I felt I needed but whose purchase could be postponed until the end of the month. This included things like Steve Jobs's biography, for example. I just collected these in the Amazon wish list.

A third category was likes: things I wanted to have but knew I could easily live without. (Jobs's biography doesn't fall into this category, as anyone who wants to discuss the new economy seriously has to read it. It's a requirement of my work, as far as I am concerned.) I placed these in the Amazon wish list as well.

Over time, some things that I perceived as needs were revealed as simply wants or even likes. And many wants ended up as likes. This means that just by delaying the decision to purchase for some time I made better decisions.

This doesn't mean that I won't buy something because I like it (I do have a large collection of music, art, photography, history, science, and science fiction books, all of which are not strictly necessary). What it means is that the decision to buy something is moderated by the preliminary categorization into these three levels of priority.

A corollary of this distinction is that it allows me to focus on what is really important in the activities that I engage in. I summarized some results in the following table (click for bigger):

Misplaced priorities (image for blog post)

One of the regularities of this table is that the entries in the middle column (things that are wrongly emphasized) tend to be things that are bought, while entries in the last column (what really matters) tend to be things that are learned or experienced.

Correct accounting focusses on time, not on nominal money

OK, so I can figure out a way to spend less on things that are not that necessary. Why is this a source of happiness?

Because money to spend costs time and I don't even get all the money.

When I spend one hour working on a challenging technical marketing problem for my own enjoyment, I get the full benefit of that hour of work, in the happiness that solving a puzzle always brings me. When I work for one hour on something I'd rather not be doing, for a payment of X dollars, I get to keep about half of those X dollars (when everything is accounted for). I wrote an illustration of this some time ago.

In essence, money to spend comes, at least partially from doing things you'd rather not do, or doing them at times when you'd rather be doing something else, or doing them at locations that you'd rather not travel to. I like the teaching and research parts of my job, but there are many other parts that I do because it's the job. I'm lucky in that I like my job; but even so I don't like all the activities it involves.

The less money I need, the fewer additional things I have to do for money. And, interestingly, the higher my price for doing those things. (If my marginal utility of money is lower, you need to pay more for me to incur the disutility of teaching that 6-9AM on-location exec-ed seminar than you'd have to pay an alternate version of me that really wants money to buy the latest glued "designer" suit.)

Clarity of purpose, not simply frugality, is the key aspect

I'm actually quite frugal, having never acquired the costly luxury items of a wife and children, but the lessons here are not about frugality, rather about clarity of purpose.

I have a $\$$2000 17mm ultra-wide angle tilt-shift lens on my wishlist, as a want. I do want to buy it, though I don't need it for now. Once I'm convinced that the lens on the camera, rather than my skills as a photographer, is the binding constraint in my photography, I plan to buy the lens. (Given the low speed at which my photography skill is improving, this may be a non-issue. ☺)

Many of our decisions are driven by underlying identity or symbolic reasons; other decisions are driven by narrowly framed problems; some decisions are just herd behavior or influenced by information cascades that overwhelm reasonable criteria; others still are purely hedonic, in-the-moment, impulses. Clarity of purpose avoids all these. I ask:

Why am I doing this, really?

I was surprised at how many times the answer was "erm...I don't know," "isn't everybody?" or infinitely worse "to impress X." These were not reasonable criteria for a decision. (Note that this is not just about purchase decisions, it's about all sorts of little decisions one makes every day, which deplete our wallets but also our energy, time, and patience.)

Clarity of purpose is hard to achieve during normal working hours, shopping, or the multiple activities that constitute a lifestyle. Borrowing some tools designed for lifestyle marketing, I have a simple way to do a "personal lifestyle review" using the real person "me" as the persona used in lifestyle marketing analysis. Adapted from the theory, it is:

1. Create a comprehensive list of stuff (not just material possessions, but relationships, work that is pending, even persons in one's life).

2. Associate each entry in the list with a sub-persona (for non-marketers, this means a part of the lifestyle that is more or less independent of the others).

3. For each sub-persona, determine the activities that gave origin to the stuff.

4. Evaluate the activities using the "clarity of purpose" criterion: why am I doing this?

5. Purge the activities that are purely symbolic and those that were adopted for hedonic reasons but do not provide the hedonic rewards associated with their cost (in money, constraints to life, time, etc), plus any functional activities that are no longer operative.

6. Guide life decisions by the activities that survive the purge. Revise criteria only by undergoing a lifestyle review process, not by spur-of-the-moment impulses.

(This procedure is offered with no guarantees whatsoever; marketers may recognize the underlying structure from lifestyle marketing frameworks with all the consumer decisions reversed.)

Less. It works for me.

A final, cautionary thought: if the ideas I wrote here were widely adopted, most economies would crash. But I don't think there's any serious risk of that.

Monday, October 24, 2011

Thinking skill, subject matter expertise, and information

Good thinking depends on all three, but they have different natures.

To illustrate, I'm going to use a forecasting tool called Scenario Planning to determine my chances of dating Milla Jovovich.

First we must figure out the causal structure of the scenario. The desired event, "Milla and I live happily ever after," we denote by $M$. Using my subject matter expertise on human relationships, I postulate that $M$  depends on a conjunction of two events:
  • Event $P$  is "Paul Anderson – her husband – runs away with a starlet from one of his movies"
  • Event $H$  is "I pick up the pieces of Milla's broken heart"
So the scenario can be described by $P \wedge H \Rightarrow M$. And probabilistically,

$\Pr(M) = \Pr(P) \times \Pr(H).$

Now we can use information from the philandering of movie directors and the knight-in-shining-armor behavior of engineering/business geeks [in Fantasyland, where Milla and I move in the same circles] to estimate $\Pr(P) =0.2$  (those movie directors are scoundrels) and $\Pr(H)=0.1$  (there are other chivalrous nerds willing to help Milla) for a final result of $\Pr(M)=0.02$, or 2% chance.

Of course, scenario planning allows for more granularity and for sensitivity analysis.

We could decompose event $P$  further into a conjunction of two events, $S$  for "attractive starlet in Paul's movies" and $I$  for "Paul goes insane and chooses starlet over Milla." We could now determine $\Pr(P)$  from these events instead of estimating it directly at 0.2 from the marital unreliability of movie directors in general, using $\Pr(P) = \Pr(S) \times \Pr(I).$

Or, going in another direction, we could do a sensitivity analysis. Instead of assuming a single value for $\Pr(P)$ and $\Pr(H)$, we could find upper and lower bounds, say $0.1 < \Pr(P) < 0.3$  and $0.05 < \Pr(H) < 0.15$. This would mean that  $0.005 < \Pr(M) < 0.045$.
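The arithmetic of the scenario, both the point estimates and the sensitivity bounds, is trivially reproducible. A sketch with the post's numbers:

```python
from math import prod

def conjunction(*probabilities):
    """Probability of a conjunction of independent events."""
    return prod(probabilities)

# Point estimate: Pr(M) = Pr(P) * Pr(H)
p_m = conjunction(0.2, 0.1)
print(f"Pr(M) = {p_m:.3f}")  # a 2% chance

# Sensitivity analysis: propagate the interval bounds
# 0.1 < Pr(P) < 0.3 and 0.05 < Pr(H) < 0.15.
lo = conjunction(0.1, 0.05)
hi = conjunction(0.3, 0.15)
print(f"{lo:.3f} < Pr(M) < {hi:.3f}")
```

The same two-line computation covers the carbon-fiber reading of the events; only the subject matter expertise and the input probabilities change.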

Of course, if instead of the above interpretation we had
  • Event $P$  is "contraction in the supply of carbon fiber"
  • Event $H$  is "increase in the demand for lightweight camera tripods and monopods"
  • Event $M$  is "precipitous increase in price and shortages of carbon fiber tennis rackets"
the same scenario-planning machinery would serve the logistics management of a sports retailer's provisioning.

Which brings us to the three different competencies needed for scenario planning, and more generally, for thinking about something:

Thinking skill is, in this case, knowing how to use scenarios for planning. It includes knowing that the tool exists, knowing what its strengths and weaknesses are, how to compute the final probabilities, how to do sensitivity analysis, and other procedural matters. All the computations above, which don't depend on what the events mean, are pure thinking skill.

Subject matter expertise is where the specific elements of the scenario and their chains of causality come from. It includes knowing what to include and what to ignore, understanding how the various events in a specific subject area are related, and understanding the meaning of the events (as opposed to just computing inferential chains like an algorithm). So knowing that movie directors tend to abandon their wives for starlets allows me to decompose the event $P$  into $S$  and $I$  in the Milla example. But only an expert in the carbon fiber market would know how to decompose $P$  when it becomes the event "contraction in the supply of carbon fiber."

Information, in this case, consists of the probabilities used as inputs to the calculation, as long as those probabilities come from data. (Some of them, of course, could be parameters of the scenario, which would make them subject matter expertise. Also, instead of a strict implication we could have probabilistic causality.) For example, $\Pr(P)=0.2$ could be a simple statistical count of how many directors married to fashion models leave their wives for movie starlets.

Of these three competencies, thinking skill is the most transferable: knowing how to do the computations associated with scenario planning allows one to do them in military forecasting or in the choice of dessert for dinner. It is also the one that should be carefully learned and practiced in management programs but typically isn't given the importance its real-world usefulness would imply.

Subject matter expertise is the hardest to acquire – and the most valuable – since it requires both acquiring knowledge and developing judgment. It is also very hard to transfer: understanding the reactions of retailers in a given area doesn't transfer easily to forecasting nuclear proliferation. 

Information is problem-specific and though it may cost money it doesn't require either training (like thinking skill) or real learning (like subject matter expertise). Knowing which information to get requires both thinking skill and subject matter expertise, of course.

Getting these three competencies confused leads to hilarious (or tragic) choices of decision-maker. For example, the idea that "smart is what matters" in recruiting for specific tasks ignores the importance of subject matter expertise.*

Conversely, sometimes a real subject matter expert makes a fool of himself when he opines on matters beyond his expertise, even simple ones. He may be very successful in his own area because expertise makes up for faulty reasoning skills; in areas where he's not an expert, those faults in reasoning become apparent.

Let's not pillory a deceased equine by pointing out the folly of making decisions without information; on the other hand, it's important to note the idiocy of mistaking someone who is well-informed (and just that) for a clear thinker or a knowledgeable expert.

Understanding the structure of good decisions requires separating these three competencies. It's a pity so few people do.

-- -- -- --
* "Smart" is usually a misnomer: people identified as "smart" tend to be good thinkers, not necessarily those who score highly on intelligence tests. Think of intelligence as raw strength and thinking as Olympic weightlifting: the first helps the second, but strength without skill is irrelevant. In fact, some intelligent people end up being poor thinkers because they use their intelligence to defend points of view that they adopted without thinking and that turned out to be seriously flawed.

Note 1: This post was inspired by a discussion about thinking and forecasting with a real clear thinker and also a subject matter expert on thinking, Wharton professor Barbara Mellers.

Note 2: No, I don't believe I have a 2% chance of dating Milla Jovovich. I chose that example precisely because it's so far from reality that it will give a smile to any of my friends or students reading this.

Saturday, October 22, 2011

There's no consistent realistic utility over wealth

Kind of a technical point, but I'm tired of making it in discussion forums.

Utility over wealth is usually considered concave: an extra 10 dollars makes more difference to one's life when one's total wealth is 100 dollars than when it is 100,000 dollars. This is an outcome of optimal use of limited resources: spend scarce dollars on the essentials, abundant ones on frivolities.

On the other hand, response to price is also usually assumed concave: a price increase of 1 dollar elicits a stronger response at an initial price of 10 dollars than at 10,000 dollars.

These two statements are mutually inconsistent, yet both are empirically true. In other words, no single utility function over wealth captures both phenomena, even though both exist in reality. It's fairly intuitive once you note that price is negative wealth: diminishing sensitivity in wealth and diminishing sensitivity in price pull the curvature of the utility function in opposite directions.

Formally, if $u(\cdot)$ is the utility over wealth $w$, the response to price $p$ is given by $u(w-p)$, since paying $p$ decreases wealth by that amount. The sensitivity to a price increase is $|\partial u(w-p)/\partial p| = u'(w-p)$. If $u(\cdot)$ is concave, $u'$ is decreasing, so $u'(w-p)$ increases with $p$: sensitivity to price grows as price grows. Diminishing sensitivity to price instead requires $u'(w-p)$ to decrease in $p$, that is, a convex $u(\cdot)$. Diminishing sensitivity in both directions at once is an impossible case.
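A quick numerical check of the asymmetry, taking $u = \log$ (a concave utility) as the example:

```python
from math import log

def marginal(f, x, h=1e-6):
    """Numerical derivative of f at x (central difference)."""
    return (f(x + h) - f(x - h)) / (2 * h)

u = log      # a concave utility over wealth
w = 100.0    # current wealth

# Diminishing sensitivity to wealth: marginal utility falls as wealth grows.
mu_poor = marginal(u, 100.0)
mu_rich = marginal(u, 100_000.0)
print(mu_poor > mu_rich)  # True: concavity in wealth holds

# But sensitivity to price RISES with price: |d u(w - p) / dp| = u'(w - p).
sens_low  = abs(marginal(lambda p: u(w - p), 10.0))
sens_high = abs(marginal(lambda p: u(w - p), 90.0))
print(sens_low < sens_high)  # True: no diminishing sensitivity to price
```

The same check with a convex $u$ flips both inequalities: whichever curvature you choose, one of the two empirical regularities is lost.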

But... but... but... I hear some empirical modelers say, we use a concave utility function over price all the time, typically $\log(p)$.

Yes. I know. That's why I'm writing this post. Of course, since price enters utility with a negative sign, that "concave" specification is no such thing: the term enters as $-\log(p)$, a convex function of price.

Math doesn't lie. All you need to do is pay attention.

Saturday, October 15, 2011

The costly consequences of misunderstanding cost

Apparently there's growing scarcity of some important medicines. And why wouldn't there be?

Some of these medicines are off-patent, some are price-controlled (at least in most of the world), some are bought at "negotiated" prices where one of the parties negotiating (the government) has the power to expropriate the patent from the producer. In other words, their prices are usually set at variable cost plus a small markup.

Hey, says Reggie the regulator, they're making a profit on each pill, so they should produce it anyway.

(Did you spot the error?)

(Wait for it...)

(Got it yet?)

Dear Reggie: pills are made in these things called "laboratories," that are really factories. Factories, you may be interested to know, have something called "capacity constraints," which means that using a production line for making one type of pill precludes that production line from making a different kind of pill. Manufacturers are in luck, though, because most production lines can be repurposed from one medication to another with relatively small configuration cost.

Companies make their decisions based on opportunity costs, not just variable costs. If they have a margin of say 90 cents/pill for growing longer eyelashes (I'm not kidding, there's a "medication" for that) and say 5 cents/pill to cure TB, they are going to dedicate as much of their production capacity to the eyelash-elongating "medication" as they can.* (They won't stop making the TB medication altogether because that would be bad for public relations.)
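A sketch of the capacity-allocation arithmetic (all numbers made up, per the example above):

```python
# Hypothetical margins per pill.
margin_eyelash = 0.90   # dollars per eyelash-elongating pill
margin_tb      = 0.05   # dollars per TB pill

# A production line switches to TB pills only if it can turn out enough
# extra volume to beat the eyelash margin on the same capacity.
breakeven_volume_ratio = margin_eyelash / margin_tb
print(f"TB output must exceed {breakeven_volume_ratio:.0f}x the eyelash "
      "output per line before TB wins the capacity.")
```

With these margins the breakeven ratio is eighteen to one, which is where the footnote's number comes from.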

Funny how these things work, huh?

* Unless they can make more than eighteen times more TB pills than eyelash "medicine" pills with the same production facilities, of course.

Tuesday, October 4, 2011

Books on teaching and presentations

During a decluttering of my place, I had to make decisions about which books to keep; these are some that I found useful for teaching and presentations, and I'm therefore keeping:

Some books I find helpful for teaching and presenting (Blog version)

They are stacked by book size (for stability), but I'll group them in four major topics: general presentation planning and design; teaching; speechwriting; and visuals design.

1. Presentation planning and design

Edward Tufte's Beautiful Evidence is not just about making presentations; rather, it's about analyzing, presenting, and consuming evidence.

Lani Arredondo's How to Present Like a Pro is the only "general presentation" book I'm keeping (and I'm still pondering that, as most of what it says is captured in my 3500-word post on preparing presentations). It's not especially good (or bad), it's just the best of the "general presentation" books I have, and there's no need for more than one. Whether I need one given Beautiful Evidence is an open question.

Donald Norman's Living With Complexity and Things That Make Us Smart are not about presentations but rather about designing cognitive artifacts (of which presentations and teaching exercises are examples) for handling complex and new units of knowledge.

Chip and Dan Heath's Made to Stick is a good book on memorability; inasmuch as we expect our students and audiences to take something away from a speech, class, or exec-ed, making memorable cognitive artifacts is an important skill to have.

Steve Krug's Don't Make Me Think is about making the process of interactions with cognitive artifacts as simple as possible (the book is mostly about the web, but the principles therein apply to presentation design as well).

Alan Cooper's The Inmates Are Running The Asylum is similar to Living With Complexity, with the added benefit of explicitly addressing the use of personas for designing complex products (a very useful product design tool for classes, I think).

I had other books on the general topic of presentations that I am donating/recycling. Most of them spend a lot of space discussing the management of stage fright, a problem with which I am not afflicted.

If I had to pick just one to keep, I'd choose Beautiful Evidence. (The others, except How To Present Like a Pro, are research-related, so I'd keep them anyway.)

2. Teaching

As I've mentioned previously, preparing instruction is different from preparing presentations. The two books I recommended then are the two books I'm keeping:

Tools for Teaching, by Barbara Gross Davis, covers every element of course design, class design, class management, and evaluation. It is rather focussed on institutional learning (like university courses), but many of the issues, techniques, and checklists are applicable in other instruction environments.

Designing Effective Instruction, by Gary Morrison, Steven Ross, and Jerrold Kemp, complements Tools for Teaching. While Tools for Teaching has the underlying model of a course, this book tackles the issues of training and instruction from a professional-services point of view. (In short: TfT is geared towards university classes, DEI towards firm-specific Exec-Ed.)

I had other books on the general topic of teaching (and a number of books on academic life) that I am donating/recycling.

3. Speechwriting and public speaking

Speak like Churchill, stand like Lincoln, by James Humes, should be mandatory reading for anyone who ever has to make a public speech. Of any kind. Humes is a speechwriter and public speaker by profession and his book gives out practical advice on both the writing and the delivery. I have read many books on public speaking and this one is in a class of its own.

I have a few books from the Toastmasters series; I'm keeping (for now at least) Writing Great Speeches and Choosing Powerful Words, though their content overlaps a lot with Virginia Tufte's Artful Sentences, a book I'm definitely keeping as part of my writing set.

I'm probably keeping Richard Dowis's The Lost Art of The Great Speech as a good reference for styles and as motivation reading. (Every so often one needs to be reminded of why one does these things.)

I have other books on writing, in general, but the ones in the pile above are specific to speechwriting. I'm throwing out a few books on the business of speechwriting; they are so bad that I thought of keeping them as satire. Donating them would be an act of cruelty towards the recipients.

If I had to pick just one book on speechwriting, I'd go with Speak like Churchill, Stand like Lincoln. Hands down the best in the category, and I've read many.

4. Visuals design

Yes, the design of visuals for presentations or teaching, not Visual Design the discipline.

Edward Tufte's books are the alpha and the omega in this category. Anyone with any interest in information design should read these books carefully and reread them often.

The Non-Designer's Design Book, by Robin Williams, lets us in on the secrets behind what works visually and what doesn't. It really makes one appreciate the importance of what appears at first to be over-fussy, unimportant detail. I complement it with The Non-Designer's Type Book and Robin Williams Design Workshop, the first specifically for type, the second as an elaboration of The Non-Designer's Design Book.

Universal Principles of Design, by William Lidwell, Kritina Holden, and Jill Butler, is a quick reference for design issues. I also like to peruse it regularly for reminders of design principles. It's organized alphabetically, and each principle gets a page or two, with examples.

Perhaps I'm a bit focussed on typography (a common symptom of reading design books, I'm told), but Robert Bringhurst's The Elements of Typographic Style is a really good and deeply interesting book on the subject. Much more technical than The Non-Designer Type Book, obviously, and the reason why I hesitate to switch from Adobe CS to iWork for my handouts.

Zakia and Page's Photographic Composition: A visual guide is very useful as a guide to laying out materials for impact. Designing the visual flow of a slide (or a handout) -- when there are options, of course, this is not about "reshaping" statistical charts -- helps tell a story even without narration or animation.

I had some other books on the general topic of slide design, which I am donating. I also have a collection of books on art, photography, and design in general, which affords me a reference library. (That collection I'm keeping.)

If I had to pare down the set further, the last ones I'd give up are the four Tufte books. If forced to pick just one (in addition to Beautiful Evidence, which fills the presentation category above), I'd choose The Visual Display of Quantitative Information, because that's the most germane to the material I cover.

CODA: A smaller set

Not that I'm getting rid of the books in the larger set above (that's the set that I'm keeping), but I think there's a core set of books I should reread at least once a year. Unsurprisingly, those are the same books I'd pick if I really could have only one per category (or one set for the last category):

Final Set Of Books (for blog post)

Note that the Norman, Heath Bros, Krug, Cooper books and my collection of art, photography, and design books are exempted from this choice, as they fall into separate categories: research-related or art. I also have several books on writing (some of them here).

And the books that didn't make the pile at the beginning of the post? Those, which I'm donating or recycling, make up a much larger pile (about 50% larger: 31 books on their way out).

Somewhat related posts:

Posts on presentations in my personal blog.

Posts on teaching in my personal blog.

Posts on presentations in this blog.

My 3500-word post on preparing presentations.

Wednesday, September 28, 2011

What to do about psychological biases? The answer tells a lot... about you.

There are many documented cases of behavior deviating from the normative "rational" prescription of decision sciences and economics. For example, in the book Predictably Irrational, Dan Ariely tells us how he got a large number of Sloan School MBA students to change their choices using an irrelevant alternative.

The Ariely example has two groups of students choose a subscription type for The Economist. The first group was given three options to choose from: (online only, $\$60$); (paper only, $\$120$); or (paper+online, $\$120$). Overwhelmingly, they chose the last option. The second group was given two options: (online only, $\$60$) or (paper+online, $\$120$). Overwhelmingly, they chose the first option.

Since no one chooses the (paper only, $\$120$) option, it should be irrelevant to the choices. However, removing it makes a large number of respondents change their minds. This is what is called a behavioral bias: an actual behavior that deviates from "rational" choice. (Technically these choices violate the Strong Axiom of Revealed Preference.)

(If you're not convinced that the behavior described is irrational, consider the following isomorphic problem: a waiter offers a group of people three desserts: ice cream, chocolate mousse, and fruit salad; most people choose the fruit salad, no one chooses the mousse. Then the waiter apologizes: it turns out there's no mousse. At that point most of the people who had ordered fruit salad switch to ice cream. This behavior is the same -- use some letters to represent options to remove any doubt -- as the one in Ariely's example. And few people would consider the fruit salad to ice-cream switchers rational.)
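Taking up the parenthetical's suggestion to use letters, here's a minimal sketch of the consistency check (the labels A, B, C and the helper function are mine, not Ariely's): A = online only, B = paper only, C = paper+online. Choosing C from {A, B, C} but A from {A, C} reveals C preferred to A and A preferred to C, which is the violation.

```python
def revealed_preference_violation(choices):
    """choices: list of (menu, picked) pairs.
    Returns True if some option x is directly revealed preferred to y
    in one menu while y is revealed preferred to x in another --
    a violation of revealed-preference consistency."""
    preferred = set()  # (x, y) means: x was chosen with y available
    for menu, picked in choices:
        for other in menu:
            if other != picked:
                preferred.add((picked, other))
    return any((y, x) in preferred for (x, y) in preferred)

# A = online $60, B = paper $120 (the decoy), C = paper+online $120
observed = [
    ({"A", "B", "C"}, "C"),  # majority choice with the decoy present
    ({"A", "C"}, "A"),       # majority choice with the decoy removed
]
print(revealed_preference_violation(observed))  # True
```

Removing the never-chosen B flips the choice between A and C, and the function flags it; a chooser who picked C both times would pass the check.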

Ok, so people do, in some cases (perhaps in a majority of cases), behave in ways that are "irrational" according to decision science and economics models. This is not entirely surprising, as those models are abstractions of idealized behavior, and people are concrete physical entities with limitations and -- some argue -- faulty software.

What is really enlightening is how people who know about this feel about the biases.

IGNORE. Many academic economists and others who use economics models try to ignore these biases. Inasmuch as these biases can be more or less important depending on the decision, the persons involved, and the context, this ignorance might work for the economists, for a while. However, pretending that reality is not real is not a good foundation for Science, or even life.

ATTACK. A number of people use the existence of biases as an attack on established economics. This is how science evolves, with theories being challenged by evidence and eventually changing to incorporate the new phenomena. Some people, however, may be motivated by personal animosity towards economics and decision sciences; this creates a bad environment for knowledge evolution -- it becomes a political game, never good news for Science.

EXPLOIT. Books like Nudge make this explicit, but many people think of these biases as a way to manipulate others' behavior. Manipulate is the appropriate verb here, since these people (maybe with what they think is the best of intentions -- I understand these pave the way to someplace...) want to change others' behavior without actually telling these others what they are doing. In addition to the underhandedness that, were this a commercial application, the Nudgers would be trying to outlaw, this type of attitude reeks of "I know better than others, but they are too stupid to agree." Underhanded manipulation presented as a virtue; the world certainly has changed a lot.

ADDRESS AND MANAGE. A more productive attitude is to design decisions and information systems to minimize the effect of these biases. For example, in the decision above, both scenarios could be presented, the inconsistency pointed out, and then a separate part-worth decision could be addressed (i.e. what are each of the two elements -- print and online -- worth separately?). Note that this is the one attitude that treats behavioral biases as damage and finds ways to route decisions around them, unlike the other three attitudes.

In case it's not obvious, my attitude towards these biases is to address and manage them.

Sunday, September 18, 2011

Probability interlude: from discrete events to continuous time

Lunchtime fun: the relationship between Bernoulli and Exponential distributions.

Let's say the probability of Joe getting a coupon for Pepsi in any given time interval $\Delta t$, say a month, is given by $p$. This probability depends on a number of things, such as intensity of couponing activity, quality of targeting, Joe not throwing away all junk mail, etc.

For a given integer number of months, $n$, we can easily compute the probability, $P$, of Joe getting at least one coupon during the period, which we'll call $t$, as

$P(n) = 1 - (1-p)^n$.

Since the period $t$ is $t = n \times \Delta t$, we can write that as

$P(t) = 1 - (1-p)^{\frac{t}{\Delta t}}.$

Or, with a bunch of assumptions that we'll assume away,

$P(t) = 1- \exp\left(t \times \frac{\log (1-p)}{\Delta t}\right).$

Note that $\log (1-p)<0$. Defining $r = - \log (1-p) /\Delta t$, we get

$P(t) = 1 - \exp (- r t)$.

And that is the relationship between the Bernoulli distribution and the Exponential distribution.
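A quick numeric check of the derivation (the values of $p$ and $\Delta t$ are illustrative assumptions of mine): at integer multiples of $\Delta t$ the discrete and continuous formulas agree exactly; the exponential form just extends $P(t)$ smoothly to all $t$.

```python
import math

p = 0.10   # assumed chance of at least one coupon per month
dt = 1.0   # Delta-t = one month
r = -math.log(1 - p) / dt   # the continuous-time rate r = -log(1-p)/Dt

for n in [1, 6, 12, 24]:
    t = n * dt
    discrete = 1 - (1 - p) ** n         # P(n) = 1 - (1-p)^n
    continuous = 1 - math.exp(-r * t)   # P(t) = 1 - exp(-r t)
    # The two agree to floating-point precision at integer n.
    print(f"n={n:2d}  discrete={discrete:.6f}  continuous={continuous:.6f}")
```

The agreement at integer $n$ is an algebraic identity, since $(1-p)^{t/\Delta t} = \exp\left(t \log(1-p)/\Delta t\right)$ exactly; the "assumed-away" assumptions concern treating non-integer $t$ as meaningful.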

We can now build continuous-time analyses of couponing activity. Continuous analysis is much easier to do than discrete analysis. Also, though most simulators are, by computational necessity, discrete, building them based on continuous time models is usually simpler and easier to explain to managers using them.

Saturday, September 17, 2011

Small probabilities, big trouble.

After a long – work-related – hiatus, I'm back to blogging with a downer: the troublesome nature of small probability estimation.

The idea for this post came from a speech by Nassim Nicholas Taleb at Penn. Though the video is a bit rambling, it contains several important points. One that is particularly interesting to me is the difficulty of estimating the probability of rare events.

For illustration, let's consider a Normally distributed random variable $P$, and see what happens when small model errors are introduced. In particular, we want to see how the probability density $f_{P}(\cdot)$ predicted by four different models changes as a function of the distance to zero, $x$. The higher the $x$, the more infrequently the event $P = x$ happens.

The densities are computed in the following table (click for larger):

Table for blog post

The first column gives $f_{P}(x)$ for $P \sim \mathcal{N}(0,1)$, the base case. The next column is similar, except that there's a 0.1% increase in the variance (10 basis points*). The third column is the ratio of these densities. (These are not probabilities, since $P$ is a continuous variable.)

Two observations jump out at us:

1. Near the mean, where most events happen, it's very difficult to separate the two cases: the ratio of the densities up to two standard deviations ($x=2$) is very close to 1.

2. Away from the mean, where events are infrequent (but potentially with high impact), the small error of 10 basis points is multiplied: at highly infrequent events ($x>7$) the density is off by over 500 basis points.

So: it's very difficult to tell the models apart with most data, but they make very different predictions for uncommon events. If these events are important when they happen, say a stock market crash, this means trouble.

Moving on, the fourth column uses $P \sim \mathcal{N}(0.001,1)$, the same 10 basis points error, but in the mean rather than the variance. Column five is the ratio of these densities to the base case.

Comparing column five with column three we see that similarly sized errors in mean estimation have less impact than errors in variance estimation. Unfortunately variance is harder to estimate accurately than the mean (it uses the mean estimate as an input, for one), so this only tells us that problems are likely to happen where they are more damaging to model predictive abilities.

Column six shows the effect of a larger variance (100 basis points off the standard, instead of 10); column seven shows the ratio of this density to the base case.

With an error of 1% in the estimate of the variance it's still hard to separate the models within two standard deviations (for a Normal distribution about 95% of all events fall within two standard deviations of the mean), but the error in density estimates at $x=7$ is 62%.

Small-probability events are very hard to predict because, most of the time, all the information available is not enough to choose between models with very close parameters, yet those models predict very different things for infrequent events.
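The table's arithmetic is easy to reproduce with a short sketch (mine, not the original spreadsheet; I read the "10 basis point" perturbation as applying to the scale parameter $\sigma$, which matches the magnitudes reported):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Normal(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

for x in [0, 1, 2, 3, 5, 7]:
    base = normal_pdf(x)                      # base case: N(0, 1)
    sigma_10bp = normal_pdf(x, sigma=1.001)   # 10 bp error in the scale
    mu_10bp = normal_pdf(x, mu=0.001)         # 10 bp error in the mean
    sigma_100bp = normal_pdf(x, sigma=1.01)   # 100 bp error in the scale
    print(f"x={x}: sigma ratio={sigma_10bp / base:.4f}  "
          f"mu ratio={mu_10bp / base:.4f}  "
          f"big sigma ratio={sigma_100bp / base:.4f}")
```

Running this shows the pattern in the post: within two standard deviations all the ratios are indistinguishable from 1, while at $x=7$ the 10 bp scale error moves the density by several percent, and the same-sized mean error moves it far less.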

Told you it was a downer.

-- -- --

* Some time ago I read a criticism of this nomenclature by someone who couldn't see its purpose. The purpose is good communication design: when there's a lot of 0.01% and 0.1% being spoken in a noisy environment, it's a good idea to say "one basis point" or "ten basis points" instead of "point zero one" or "zero point zero one" or "point zero zero one." It's the same reason we say "Foxtrot Uniform Bravo Alpha Romeo" instead of "eff u bee a arr" in audio communication.

NOTE for probabilists appalled at my use of $P$ in $f_{P}(x)$ instead of the more traditional nomenclature $f_{X}(x)$, where the uppercase $X$ means the variable and the lowercase $x$ the value: most people get confused when they see something like $p=\Pr(x=X)$.

Monday, August 29, 2011

Decline and fall of Western Manufacturing - a pessimistic reading of Pisano and Shih (2009)

Those who don't know history are condemned to repeat it.

Unfortunately those of us who do know history get dragged right along with the others, because we live in a world where everything is connected to everything else.

Evolution Of Capabilities – Image for a blog post

Above is my visualization of Pisano and Shih's 2009 Harvard Business Review article "Restoring American Competitiveness." This is a stylized version of a story that has happened in several industries.

Step 1: Companies start outsourcing their manufacturing operations to companies (or countries) which can perform them in a more cost-effective manner. Perhaps these companies/countries have cheaper labor, fewer costly regulations, or less overhead.

Step 2: Isolated from their manufacture, companies lose the skills for process engineering. After all, improving manufacturing processes is a task that depends on continuous experimentation and feedback from the manufacturing process. If the manufacturing process is outsourced, the necessary interaction between manufacturing and process engineers happens progressively more inside the contractor, not the original manufacturer.

Step 3: Without process engineering to motivate it, the original manufacturer (and the companies supporting it in the original country, in the diagram the US) stops investing in process technology development. For example, the companies that developed machine tools for US manufacturers in conjunction with US process engineers now have to do so with Taiwanese engineers in Taiwan, which leads to relocation of these companies and eventually of the skilled professionals.

Step 4: Because of spillovers in technological development between process technologies and product technologies (including the development of an engineering class and engineering support infrastructure), more and more product technology development is outsourced. For example, as fewer engineering jobs are available in the original country, fewer people go to engineering school; the opposite happens in the outsourced-to country, where an engineering class grows. That growth is a spillover that is seldom accounted for.

Step 5: As more and more technology development happens in the outsourced-to country, it captures more and more of the product innovation process, eventually substituting for the innovators in the original manufacturer's country. Part of this innovation may still be under contract with the original manufacturer, but the development of innovation skills in the outsourced-to country means that at some point it will have its own independent manufacturers (who will compete with the original manufacturer).

Pisano and Shih are optimists, as their article proposes solutions to slow, stop, and reverse this process of technological decline of the West (in their case, the US). It's worth a read (it's not free, but it's cheaper than a day's worth of lattes, m'kay?) and ends on an upbeat note.

I'm less optimistic than Pisano and Shih. Behold:

Problem 1: Too many people and too much effort dedicated to non-wealth-creating activities and too many people and too much effort aimed at stopping wealth-creating activities.

Problem 2: Lack of emphasis on useful skills (particularly STEM, entrepreneurship, and "maker" culture) in education. Sadly accompanied by a sense of entitlement and self-confidence inversely proportional to the actual skills.

Problem 3: Too much public discourse (politicians of both parties, news media, entertainment) which vilifies the creation of wealth and applauds the forcible redistribution of whatever wealth is created.

Problem 4: A generalized confusion between wealth and pieces of fancy green paper with pictures of dead presidents (or Ben Franklin) on them.

Problem 5: A lack of priorities or perspective beyond the immediate sectorial interests.

We are doomed!

Monday, August 22, 2011

Preparing instruction is different from preparing presentations

The title bears repeating, as many people confuse instruction and presentation preparation skills and criteria for success: Preparing instruction is different from preparing presentations.

My 3500-word post on preparing presentations is exactly for that purpose, preparing presentations. I could try to write a post for preparing instruction, but it would quickly get to book size. In fact, I recommend several books in this post describing the evolution of information design in my teaching approach. (The most relevant books for teaching are at the addendum to this post.)

I made a diagram depicting my process of preparing for an instruction event (the diagram was for my personal use, but there's no reason not to share it; click for larger):

Preparing Instruction (diagram for blog post)

And, for comparison, the process for preparing presentations:

My presentation preparation approach

Because they look similar, I need to point out that the tools used in each phase of the process are different for presentations and for instruction.

I'm a big fan of participant-centered learning (though not necessarily the HBS cases that people always associate with PCL); the idea is simple: students learn from doing, not from watching the instructor do. So, many of the "materials" (more precisely, most of the time in the "plan with timing" part of the diagram) in an instruction event are audience work: discussions, examples brought by the audience (to complement those brought by the instructor) and exercises. These are not materials that can be used in a speech or a presentation to a large audience.

Also, while a story works as a motivator for both presentations and instruction, I tend to use exercises or problems as motivators for instruction. For example, I start a class on promotion metrics by asking "how do you measure the lift" of some promotional activity, and proceed from there. By making it a management task that they have to do as part of their jobs, I get some extra attention from the audience. Plus, they can immediately see how the class will help them with their jobs.*

There are presentations that are mostly for instruction purposes, and there are parts of instruction events that are presentations. But never mistake one for the other: preparing instruction is different from preparing presentations.

Though so much instruction is so poorly prepared that even the basics of presentation preparation will help make instruction less of a disaster, that's just a step towards instruction-specific preparation.

- - - - - - - - - - - -

*I have a large variety of exercises for each knowledge unit I teach, and they are not all of the form "here's a problem, what's the solution?" Some are of the forms "here's what a company is doing, what are they trying to achieve?" and "here's a problem, here's what the company is doing, what is wrong with that?"

Addendum: Two books on preparation (and delivery) of instruction, from the post describing the evolution of information design in my teaching approach:

Tools for teaching, by Barbara Gross Davis covers every element of course design, class design, class management, and evaluation. It is rather focussed on institutional learning (like university courses), but many of the issues, techniques, and checklists are applicable in other instruction environments.

Designing effective instruction, by Gary Morrison, Steven Ross, and Jerrold Kemp, complements Tools for teaching. While TfT has the underlying model of a class, this book tackles the issues of training and instruction from a professional service point of view. (In short: TfT is geared towards university classes, DEI is geared towards firm-specific Exec-Ed.)

Thursday, August 11, 2011

Why can't copyright "reformers" understand minor business points?

Copyright "reformers," or better yet copyright warriors (they're at war with copyright a lot more than they want to reform it) fail to understand some basic business points; that gets in the way of meaningful discussions about copyright. Which turns all discussions of copyright into politics.

My evidence of the copyright warriors' blind spot is this convenient video of Cory Doctorow giving the keynote address at SIGGRAPH 2011:

I agree that there have been some abuses of copyright law, but the whole address is marred by his failure to see value in anything other than content creation.

Here are some points that anyone with a minimal interest in the business of content would make:

1. Distribution is a value-added activity. This point needs hammering in, because there's a large subset of otherwise intelligent people who believe that distribution is some sort of parasite on the creatives. Because they don't understand the various functions of distribution beyond the physical movement of materials (or bits), they fail to see that financing, promoting, and filtering/bundling have a value of their own.

2. Revenue models are built with many elements. Another point that seems to escape the copyright warriors is that the hardware, the services of the brick-and-mortar bookstore, and customer support are cheap (or free) because the provider funds those services out of the revenues collected otherwise. When deciding whether to launch a new product (and at what price), companies take into account the costs and investments required and all the sources of revenue.

That's why restaurants charge a corkage fee. Yes, you can buy the same Chateau d'Proglos at Safeway for half the price, but the restaurant's markup on the wine is how it subsidizes the price of the meal. Yes, even the very expensive meal is subsidized by the drinks. Copyright warriors want to pay the subsidized price for the hardware and then escape paying the content prices that support that subsidy.* (Economists call this a horizontal externality.)

3. Ecosystems and their components don't fall from the sky. Apple had to develop the iTunes/iPhone/iPad ecosystems by actually paying money to engineers, designers, patent holders, and business consultants. Much of that money went into blind alleys, under the rubric of "acceptable business risk" and "cancelled project." The components of the ecosystem may be cheap to reproduce, but they were expensive to produce. (This problem is much larger in pharmaceuticals.)

When deciding on these investments, spending billions to develop their own chips and millions to design their interfaces, Apple and other companies look at total revenue projections, not just the small markup on hardware (yes, it looks big – to people who have no notion of the cost of capital or the failure rate for internal projects).

4. Corporations are not college bull sessions writ large. A corporation like Apple, Amazon, Microsoft, or Google is bound by a very large body of law and regulation. If a corporation fails to protect its intellectual property, for example, it soon loses its right to control it. Therefore, it's a fiduciary obligation of that corporation's management to use legal means to protect it. It's true that many companies manipulate the legal -- and sometimes the political -- process to their advantage. But failing to protect their copyrights would be a failure of the fiduciary obligation; a dereliction of duty.

Companies have some flexibility given the trade-off between the obligation and public relations, but the only cases that are visible are the ones where the trade-off comes down on the side of the obligation; that many companies ignore violations of the DMCA is obvious from the number of copies of Handbrake found on Macs, for example.

5. There are good reasons for keeping one's ecosystem controlled (as much as possible) beyond revenue creation. For example, Apple's vetting of apps for iPhone/iTouch/iPad is in part a protection of Apple's brand equity, the part of which that says "apple is a safer product to use than others." (See also the footnote about the iOS payment system.)

6. Recent history shows the need for some copy protection. Not that it is very hard to defeat, but without it – consider the free distribution of music in the early days of file sharing – making a living out of content becomes very difficult. In other words, copy protection is necessary to have a consistent revenue model that can fund creative industries. Yes, you can write a book very cheaply, but what about TV shows, blockbuster movies, professionally recorded music, art photographs with professional models or in remote locales?

All these observations stem from the same point, the inability to see value created by other parts of the process involved in the consumption of content. Yes, there are abuses of copyright protection, made for simple revenue enhancement; but treating content as carrying all the value and the rest of the ecosystem as being somehow irrelevant – or as something that exists for the benefit of content creators only – is unbelievably myopic.

A first step towards a rational discussion of copyright is to accept that actions other than content creation have value. Cory still hasn't taken that step.

-- -- -- -- -- --

* Here's the response to the "evil Apple wants 30% of all money spent through their platform," which is the corkage fee-equivalent: this can be avoided simply by having the customers make their purchases through the web site of the provider, even from the iOS product itself.

What that 30% fee is capturing is not just the price of convenience; what iOS is giving to the app is the credibility of "transaction via Apple," as opposed to "transaction via the web page of a company that makes the app." Is it hard to understand where the value comes from in this difference?

Yes, Amazon can complain, but they can easily use the Kindle app to tell their servers what sample the reader just finished, and to put the book sampled at the top of the recommendation list; Apple would probably get into serious legal trouble if it gave Amazon privileged treatment.

NOTE: I'm a fan of Cory's fiction and I agree that there are clear cases of abuse and even political use of copyright for nefarious ends. But to reform copyright we need to accept that businesses exist for the creation and capture of value and that includes businesses built around content. Pretending that the content business is somehow insulated from the rules of economics is myopic and ultimately self-defeating.