Thursday, February 9, 2012

Quantitative thinking: not for everyone. And that's bad.

Not all smart people are quantitative thinkers.

I've noticed that some smart people I know have different views of the world. Not just social, cultural, political, or aesthetic. They really do see the world through different conceptual lenses: mine are quantitative, theirs are qualitative.

Let's keep in mind that these are smart people who, when prompted to do so, can do basic math. But many of them think about the world in general, and most problems in particular, in a dequantified manner or limit their quantitative thinking in ways that don't match their knowledge of math.

Level 1 - Three different categories for quantities

Many people seem to hold the three-level view of numbers: all quantities are divided into three bins: zero, one, many. In a previous post I explained why it's important to drill down into these categories: putting numbers in context requires, first of all, that the numbers are actual numbers, not categorical placeholders.

This tripartite view of the world is particularly bad when applied to probabilistic reasoning, because the world then becomes a three-part proposition: 0 (never), 50-50 (uncertain, which is almost always treated as the maximum entropy case), or 1 (always).

Once, at a conference, I was talking to a colleague from a prestigious school who, despite agreeing that a probability of 0.5 is different from a probability of 0.95, proceeded to argue his point based on an unstated 50-50 assumption. Knowing that $0.5 \neq 0.95$ didn't have any impact on his tripartite view of the world of uncertainty.

The problem with having a discussion with someone who thinks in terms of {zero, one, many} is that almost everything worth discussing requires better granularity than that. But the person who thinks thusly doesn't understand that it is even a problem.



Level 2 - Numbers and rudimentary statistics

Once we're past categorical thinking, things become more interesting to quantitatively focused people; this, by the way, is where a lot of muddled reasoning enters the picture. After all, many colleagues at this level of thinking believe that, by going beyond the three-category view of numbers, they are "great quants," which only proves that the Dunning-Kruger effect applies.

For illustration we consider the relationship between two variables, $x$ and $y$, say depth of promotional cut (as a percentage of price) and promotional lift (as a percentage increase in unit sales due to promotion). Yep, a business example; could be politics or any social science (or science for that matter), but business is a neutral field.

At the crudest level, understanding the relationship between $x$ and $y$ can be reduced to determining whether that relationship exists at all; usually this is done by determining whether variation in one, $x$, can predict variation in the other, $y$.  For example, a company could run a contrast experiment ("A-B test" for those who believe Google invented experiments) by having half their stores run a promotion and half not; the data would then be, say:

Sales in stores without promotion: 200,000 units/store
Sales in stores with promotion: 250,000 units/store

Looks like a relationship, right? An apparent 25-percent lift (without knowing the depth of the price cut I can't comment on whether this is good or bad). But what if the average sales for all stores, when no store runs a promotion, is 240,000 units/store? All this promotion apparently did was discourage some customers in the stores without the promotion (customers know about the promotion in other stores because you cannot stop information from diffusing over social media, for example) and incentivize a few of the discouraged to seek out the stores running the promotion.
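To make the trap concrete, here's a minimal sketch, in Python, contrasting the naive lift with the comparison against the no-promotion baseline (all numbers are the hypothetical ones from the example above):

```python
# Hypothetical numbers from the example (units/store).
treated = 250_000    # stores running the promotion
control = 200_000    # stores not running the promotion
baseline = 240_000   # average when NO store runs a promotion

naive_lift = (treated - control) / control        # the apparent 25% lift
true_effect = (treated - baseline) / baseline     # what the promotion actually added
control_dip = (control - baseline) / baseline     # what it took from the control stores

print(f"naive lift:  {naive_lift:+.1%}")
print(f"vs baseline: {true_effect:+.1%}")
print(f"control dip: {control_dip:+.1%}")
```

The naive comparison reports a +25% lift, while the baseline comparison shows only a +4.2% gain, with the control stores down 16.7% -- mostly reshuffled demand, not new demand.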

(A lot of anecdotes used to support public policy make the sort of mistake I just illustrated. There are plenty of other mistakes, too.)

To go beyond the simple observation of numbers and to use statistical tests, we need to have some formulation of the relationship, for example a linear one such as:

$\qquad y = \beta \, x + \epsilon$.

This formulation includes a term $\epsilon$ (called stochastic disturbance) which is the modeler's admission that we don't know everything we'd like to. (All tests have an underlying structure, even non-parametric tests; when people say that there's no structure what they are really saying is that they don't understand how the test works.)

Given some pairs of observations $\{(x_1,y_1), (x_2,y_2),\ldots\}$ , the relationship can be tested by estimating the parameter $\beta$ and determining whether the estimate $\hat \beta$ is significantly different from zero. If it's not, that means that the value of $y$ is statistically independent of $x$ (to the level of the test) and there is no relationship between them -- as far as statistical significance is concerned.
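A minimal sketch of such a test on synthetic data (the true slope of 2.0 and the noise level are assumptions for illustration only), using scipy's `linregress`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 0.5, 200)            # e.g. promotional depth, as a fraction of price
y = 2.0 * x + rng.normal(0, 0.1, 200)   # lift, with a stochastic disturbance

res = stats.linregress(x, y)
# res.pvalue tests H0: slope = 0; a small p-value means the estimated
# beta is significantly different from zero at conventional levels.
print(res.slope, res.pvalue)
```

With a true slope of 2.0 and modest noise, the test rejects the no-relationship null easily; with a true slope of 0 it would not (except at the rate given by the test's level).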

There's a lot to argue about significance testing, some of which I put in this video:


Once we get past simple tables and possibly the prepackaged statistical tests that can be run on those tables -- almost like an incantation, with statistical software taking the place of the magical forces -- few people remain who want to discuss details. But even within that small set, there are many different sub-levels of thinking.



Level 3 - Thinking in models and functions

Let's go back to the linear formulation in $y = \beta \, x + \epsilon$. What this means is that lift $y$ increases with price cut $x$ in a proportional way, independent of the magnitudes of each.

Ok, so what? ask a lot of people whose level of numerical reasoning is being stretched. The "what" is that the effect of a change of price cut from 4 to 5 percent is assumed to be equal to the effect of the change from 45 to 46 percent. And this assumption is probably not true (in fact, we have empirical evidence that it is not).

Many people are able to repeat the rationale in the previous paragraph, but don't grok the implications.

The questions of where we go from this simple model are complicated. Let us ignore questions of causality for now, and focus on how different people perceive the importance of details in the relationship between $x$ and $y$.

Increasing vs decreasing. Almost everyone who gets to this level of thinking cares about the direction of the effect. At this stage, however, many people forget that functions may be monotonic (increasing or decreasing) over an interval while outside that interval they may become non-monotonic (for example, increasing until a given point and then decreasing).

Convex versus concave. Even when the function is monotonic over the interesting domain, there's a big difference between linear, convex, and concave functions. Some disagreements with very smart people turned out to be over different assumptions regarding this second derivative: implicitly many people act as if the world is either linear or concave (assuming that the effect of adding 1 to 10 is bigger than the effect of adding 1 to 1000). As I pointed out in this post about network topologies and this post about models, combinatorics has a way of creating convexities. There are also a lot of s-shaped relationships in the world, but we'll leave those alone for now.
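A toy illustration of that second-derivative point: the marginal effect of adding 1 is constant for a linear function, shrinking for a concave one, and growing for a convex one (the three specific functions here are arbitrary choices):

```python
import math

def marginal(f, x, dx=1.0):
    """Effect of adding dx to x under function f."""
    return f(x + dx) - f(x)

linear  = lambda x: 0.5 * x        # constant marginal effect
concave = lambda x: math.log1p(x)  # diminishing returns
convex  = lambda x: x ** 2         # combinatorics-style growth

for name, f in [("linear", linear), ("concave", concave), ("convex", convex)]:
    print(name, marginal(f, 10), marginal(f, 1000))
```

The "adding 1 to 10 beats adding 1 to 1000" intuition holds only for the concave case; a convex world inverts it.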

Functional form. As I illustrated in my post on long tails, two decreasing convex functions (the probability mass functions of the Poisson and Zipf distributions) can have very important differences. Empirical researchers are likely to care more about this than theoretical modelers, but once we reach the stage where we are discussing in these terms (and the group of people who can follow and participate in this discussion) arguments tend to be solved by mathematical inference or model calibration. In other words, leaving personal issues and inconvenient implications aside.

(Needless to say -- but I'll write it anyway -- this is the level of discussion I'd like to have when consequences are important. Alas, it's not very common; certainly not in the political or social sciences arena. In business and economics it's becoming more common and in STEM it's a foundation.)

Elaboration is still possible. I'll illustrate by noting that underlying assumptions (that I never made explicit, mind you) can come back to bite us in the gluteus maximus.

(Non-trivial statistics geekdom follows; skip till after the next picture to avoid some technical points about model building.)

Let's assume that we collect and store the data disaggregated by customer, so that $y_i$ is the quantity (not lift) bought by customer $i$; after all, we can always make aggregate data from disaggregate data but can seldom do the opposite. How would we analyze this data?

First observation: expenditure per customer is never negative. But for some values of $\epsilon$ our model can predict a negative $y_i$, and hence a negative expenditure ($y_i$ times price, and price is a positive number). So our model needs to be tweaked to take into account the hard bound at zero.

If ours were retail stores, where the data collected by the PoS scanners exists only for customers who buy something (in other words, we never observe $y$ when $y=0$), we would have to use a truncated regression; if we do observe the zeros (as on an online retail site), then a censored-regression model, the Tobit, accounts for the pooling of the probability mass at zero.
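A small simulation, with made-up parameters, showing the difference between the two data situations (this only illustrates the data-generating mechanism, not the estimation):

```python
import numpy as np

rng = np.random.default_rng(1)
# Latent "desired purchase" for 10,000 customers; parameters are arbitrary.
latent = 1.5 * rng.uniform(0, 1, 10_000) + rng.normal(0, 1, 10_000)

censored = np.maximum(latent, 0.0)   # zeros observed: probability mass pools at 0
truncated = latent[latent > 0]       # zeros never recorded: sample is truncated

print("mass at zero:", (censored == 0).mean())
print("truncated mean vs latent mean:", truncated.mean(), latent.mean())
```

Dropping the zeros inflates the observed mean relative to the latent one, which is exactly the selection problem the truncated and censored models are built to correct.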

Second observation: the number of units bought by any given customer is an integer, yet we keep treating it as a continuous quantity. Typically regression models and their variants assume that the stochastic disturbances are Normal variables. That would allow predictions like $y_i = 1.35$, which is nonsensical for our new data: $y_i \in \{0,1,2,3,\ldots\}$.

Count models, like the Poisson regression (which has its own assumptions), take the discreteness into account and correct the problems introduced by the continuity assumption. In olden days (when? the 50s?) these were hard models to estimate, but now they are commonly included in statistical packages, so there is no reason not to use them.
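For the curious, here's a bare-bones sketch of a Poisson regression estimated by maximum likelihood on synthetic data (the coefficients 0.5 and 2.0 are made up for illustration; in practice one would use a statistical package's canned routine):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.uniform(0, 0.5, 500)              # e.g. depth of price cut
y = rng.poisson(np.exp(0.5 + 2.0 * x))    # integer units bought

def neg_loglik(beta):
    # Poisson regression: E[y|x] = exp(b0 + b1*x).
    # The constant log(y!) term is dropped since it doesn't affect the argmax.
    mu = np.exp(beta[0] + beta[1] * x)
    return -(y * np.log(mu) - mu).sum()

fit = minimize(neg_loglik, x0=[0.0, 0.0])
print(fit.x)   # estimates of (0.5, 2.0)
```

The exponential mean function keeps predictions positive, and the Poisson likelihood respects the integer support -- the two problems the Normal-disturbance model had.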

For illustration, here's what these models look like:

Illustrating model differences: OLS, Tobit, and Poisson




Conclusion - why is it so hard to explain these things?

Thinking quantitatively is like a super-power: where others know of phenomena, we know how much of a phenomenon.*

The problem is that this is not like an amplifier super-power, like telescopic vision is to vision, but rather an orthogonal super-power, like the ability to create multiple instances of oneself. It's hard to explain to people without the super-power (people who don't think in numbers, even though they're smart) and it's hard to understand their point of view.

Contrary to the tagline of the television show Numb3rs, not everyone thinks in numbers.

That's a pity.


-- -- -- --
* A tip of the hat to Dilbert creator Scott Adams, via Ilkka Kokkarinen's blog for pointing this out in a post which is now the opening chapter of his book.

Saturday, January 28, 2012

Four thoughts about digital content

Following every writer's advice on writing, which is to write often and about anything, I've been writing on my online scrapbook: from a short piece on sock strategy to a mid-sized piece on cargo pants to a large piece defending Apple to a series of pieces on digital content; that series spawned this post.



1. Digital content simplifies travel immensely

Comparing a packing list from 2000 to a packing list for 2012, I noticed that I now carry much less stuff and yet take much more content. But, in the post I originally wrote about this, I made the mistake of using the 2000 mindset to plan my 2012 content.

A single one-terabyte portable hard drive, smaller and lighter than one of the paperback scifi novels I carried in 2000, can take a vast library of music, podcasts, audiobooks, eBooks, television shows, and movies. I make sure that I take the content I want, then add as much as will fit.

Taking 200 eBooks, 100 audiobooks, 10 000 music tracks, and 300 videos of over one hour each for a four-week work trip might appear greedy -- especially since it's a work trip. But the point is that these represent options, not choices. While a choice is something you have to live with, an option is something you may use or not. And these are costless options, so there's no reason not to take them.

And, in keeping with the new social sharing meme: since there's a long stretch of travel involved, I'll be loading audiobooks for that in the iPod; if I were traveling tomorrow I'd listen to The Origins of Political Order from Pre-human Times to the French Revolution by Francis Fukuyama. I'd also load two Kindle books, as audiobooks aren't suited for airport noise: The Prague Cemetery by Umberto Eco and The Enchantress of Florence by Salman Rushdie. (Note: since I'm not traveling tomorrow, I availed myself of a dead tree copy of the latter book from the San Francisco Public Library; which brings up the next point.)



2. The economics of digital content create wealth for everyone

The free library one can build with iBooks, Kindle, and PDF from legal sources is comparable to some of the best libraries a wealthy person could own during the Gilded Age. Many technical books are also available as preprints from their authors.

There's a wealth of education and training opportunities available for free. For those who have no computers of their own, there are these buildings called "Public Library," which provide the computer and the internet access. This truly is an age of digital abundance.

So, when some technophobes start spouting nonsense like "the digital economy is broadening the chasm between the haves and the have-nots," all they're showing is their ignorance and a narrow focus on nominal dollars and who can buy the largest yacht or collectible $300 sneakers.

(This point is a special case of technological progress, which makes the lower middle class of 2012 much better off than those in the top 1% of income category in 1912 -- two words: modern dentistry, stealing from P J O'Rourke -- but the economics of digital content, namely negligible reproduction cost, make it an important special case.)



3. Changes to education due to digital content are overstated

Not to repeat myself over the problems of online education or the different components of education (prompted by the Kenan-Flagler Online MBA), but there are multiple components to education, only one and a half of which are covered by online materials.

First, there's the content side: learning how to program in R, for example. Motivated students can learn content from online sources; that's no surprise, since motivated students have always been able to learn from a precursor of online video textbooks (which is what lectures are): paper textbooks. And practice, of course, which may be tricky for some technical fields -- chemistry and nuclear physics come to mind -- but much less so for others. That's the one in "one and a half."

Second, there's certification of knowledge. Now that MIT decided that it would certify some courses for a small fee, that seems to be taken care of. But only in part (hence the "half"), since part of what the education system certifies is not purely content: getting to class (or at least labs) on time, performing consistently over long periods, in general doing things that one would rather not be doing. From a job market perspective, there's value in knowing whether a job candidate can do these things. Some people fret that education is more about fitting in than standing out, but for many jobs that's precisely what is desired of a new hire. This is part of the selection and screening made by education.

Third, there are skills beyond content that can only be obtained and observed in in-person interactions, like discussing and presenting, for example. Add to that the value of having a well-rounded foundation and, for technical fields, a problem-solving attitude. These are all things that can be observed in a few months on the job, but having an education institution do them first saves employers a lot of potential grief, especially if it's difficult to get rid of some people after you hire them (cf: Eurosclerosis).

Fourth, education institutions create networks of shared cultural values and experiences (observe the bonding between late middle age adolescents and late teenage adolescents at homecoming football games) and contacts, which are useful in later life. Networks on Facebook, Twitter, LinkedIn, Meetup, and Google Plus are useful for other purposes, but aren't complete replacements of a real social network. As in a network of people who have done things for each other.

Fifth, despite how much credentialism, nepotism, clique-ism, groupthink, and outright intellectual fraud can be laid at the foot of universities, for technical skills the university is still the only really reliable source of information. The people who dedicate themselves to research in a technical field are forced to take multiple levels of abstraction into consideration (from the intro courses every researcher has to teach occasionally to the graduate seminars to the "where is this field going" moments of self-reflection) in their normal course of work. This perspective is uncommon in any other institution, and the only place where über-nerds of any field can be reliably found is the research institute or the university (where the nerds teach).



4. We still need to find a solution for the copyright problem

For all my love of open content and free software, and my loathing of the ridiculous means by which content providers hamper their own content and value proposition to get minor revenue enhancements, I have no illusions that content quality will be maintained without some protection of creators' rights.

Yes, anyone can write a book for a handful of dollars and some videos of cats playing piano have millions of views. But art photos that require travel or models, professionally recorded music (say the Berliner Philharmoniker doing a Beethoven symphony series), movies and television shows with good production values and actors, all of these cost a lot of money to make. And professional writers need to be paid -- some of them don't want to rely on public speaking or other non-writing forms of revenue. They want to write for a living. Some free content may be very good, but that mostly is paid for in some other way -- like the aforementioned free preprints of technical books. In general, good stuff means expensive to create, despite how cheap it may be to reproduce.

I for one don't want future Hawaii Five-O seasons to be fan-fic, amateur-made video snippets with Comicon rejects playing Grace Park's role.

So, I'm fine with buying the DVDs, especially since I like to listen to the commentary tracks; I find it ridiculous that ripping them with Handbrake would be a violation of the DMCA, even if I did so just to watch movies I own on my iPad. (Many DVDs now come bundled with a digital copy to appease people like me who are -- mostly -- on the side of content creators but find the revenue model intromissions in the consumption almost worthy of a switch to the content pirates side.)



A bonus point: I trust the cloud, but only as a last-resort back-up.

My work content is divided into three priority levels: Crucial, Important, and the rest. All work content is on the laptop hard drive, backed up on the portable hard drive. (All the entertainment content mentioned in point 1 is on the portable hard drive too, with the content I think I really want to consume during the trip on the laptop, the iPad, and the iPod Touch as well. Obviously these aren't the only copies I have of that content.)

Important content, about 11GB of class materials, is also backed up on four 16GB flash drives: two in my pockets, one in the laptop bag, and one in the rolling carry-on.

Crucial content, about 1GB of class materials (handouts, notes, essential images) and 500MB of research-in-progress (papers, notes, computations, code, experimental data) also goes on those 16GB flash drives plus two older 2GB drives and is backed up on DropBox and Amazon Cloud Drive.

Someone told me the 3-2-1 theory of backups: three copies, two formats, one off-site. I think the Many-Many-Many approach is better. And the cloud, that's all fine and dandy, but I want a local copy. For luck, say.

Thursday, January 19, 2012

A tale of two long tails

Power law (Zipf) long tails versus exponential (Poisson) long tails: mathematical musings with important real-world implications.

There's a lot of talk about long tails, both in finance (where fat tails, a/k/a kurtosis, turn hedging strategies into a false sense of safety) and in retail (where some people think they just invented niche marketing). I leave finance to people with better salaries (er, brainpower), and focus only on retail for my examples.

A lot of money can be made serving the customers on the long tail; that much we already knew from decades of niche marketing. The question is how much, and for this there are quite a few considerations; I will focus on the difference between exponential decay (Poisson) long tails and hyperbolic decay (power law) long tails, and how that difference should affect the emphasis on long-tail targeting (that is, how much to invest going after these niche customers), say for a bookstore.

A Poisson distribution over $N\ge 0$ with parameter $\lambda$ has probability mass function:

$ \Pr(N=n|\lambda) =\frac{\lambda^{n}\, e^{-\lambda}}{n!}$.

A discrete power law (Zipf) distribution for $N\ge 1$ with parameter $s$ is given by:

$ \Pr(N=n|s) =\frac{n^{-s}}{\zeta(s)},$

where $\zeta(s)$ is the Riemann zeta function; note that it's only a scaling factor given $s$.

A couple of observations:

1. Because the power law has $\Pr(N=0|s)=0$, I'll actually use a Poisson + 1 process for the exponential long tail. This essentially means that the analysis would be restricted to people who buy at least one book. This assumption is not as bad as it might seem: (a) for brick-and-mortar retailers, this data is only collected when there's an actual purchase; (b) the process of buying a book at all -- which includes going to the store -- may be different from the process of deciding whether to buy a given book or the number of books to buy.

2. Since I'm not calibrating the parameters of these distributions on client data (which is confidential), I'm going to set these parameters to equalize the means of the two long tails. There are other approaches, for example setting them to minimize a measure of distance, say the Kullback-Leibler divergence or the mean square error, but the equal means is simpler.

The following diagram compares a Zipf distribution with $s=3$ (which makes $\mu=1.37$) and a 1 + Poisson process with $\lambda=0.37$ (click for larger):

Long tails example for blog post

The important data is the grey line, which maps into the right-side logarithmic scale: for all the visually impressive differences in the small numbers $N$ on the left, the really large ratios happen in the long tail. This is one of the issues a lot of probabilists point out to practitioners: it's really important to understand the behavior at the small probability areas of the distribution support, especially if they represent -- say -- the possibility of catastrophic losses in finance or the potential for the customers who buy large numbers of books.
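To see how extreme those tail ratios get, one can compute the two probability mass functions directly with the parameters from the text ($s=3$, $\lambda=0.37$):

```python
import math

s, lam = 3.0, 0.37   # parameters chosen in the text to equalize the means (~1.37)

def zipf_pmf(n, s, terms=100_000):
    """Discrete power law: Pr(N=n) = n^-s / zeta(s), zeta approximated by a sum."""
    zeta = sum(k ** -s for k in range(1, terms))
    return n ** -s / zeta

def shifted_poisson_pmf(n, lam):
    """1 + Poisson process: support {1, 2, 3, ...}."""
    k = n - 1
    return lam ** k * math.exp(-lam) / math.factorial(k)

for n in (1, 5, 10, 20):
    print(n, zipf_pmf(n, s) / shifted_poisson_pmf(n, lam))
```

Near the head the two distributions assign similar probabilities; by $N=10$ the Zipf/Poisson ratio is already in the millions -- which is exactly why the tail behavior, not the head, drives the value of the heavy buyers.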

An aside, from Seth Godin, about the importance of the heavy user segment in bookstores:

Amazon and the Kindle have killed the bookstore. Why? Because people who buy 100 or 300 books a year are gone forever. The typical American buys just one book a year for pleasure. Those people are meaningless to a bookstore. It's the heavy users that matter, and now officially, as 2009 ends, they have abandoned the bookstore. It's over.

To illustrate the importance of even the relatively small ratios for a few books, this diagram shows the percentage of purchases categorized by size of purchase:

Long tails example for blog post

Yes, the large number of customers who buy a small number of books still gets a large percent of the total, but each of these is not a good customer to have: elaborating on Seth's post, these one-book customers are costly to serve, typically buy a heavily-discounted best-seller, are unlikely to buy the high-margin specialized books, and tend to be followers, not influencers of what other customers will spend money on (so there are no spillovers from their purchase).

The small probabilities have been ignored long enough; finance is now becoming wary of kurtosis, and marketing should go back to its roots and merge niche marketing with big data, instead of trying to reinvent the well-known wheel.

Lunchtime addendum: The differences between the exponential and the power law long tail are reproduced, to a smaller extent, across different power law regimes:

Comparing Power Law Regimes (for blog post)

Note that the logarithmic scale implies that the increasing vertical distances with $N$ are in fact increasing probability ratios.

- - - - - - - - -

Well, that plan to make this blog more popular really panned out, didn't it? :-)

Wednesday, December 21, 2011

Powerful problems with power law estimation papers

Perhaps I shouldn't try to make resolutions: I resolved to blog book notes till the end of the year, and instead I'm writing something about estimation.

A power law is a relationship of the form $y = \gamma_0 x^{\gamma_1}$ and can be linearized for estimation using OLS (with a very stretchy assumption on stochastic disturbances, but let's not quibble) into

$\log(y) = \beta_0 + \beta_1 \log(x) +\epsilon$,

from which the original parameters can be trivially recovered:

$\hat\gamma_0 = \exp(\hat\beta_0)$ and $\hat\gamma_1 = \hat\beta_1$.
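A quick sketch of that recovery on synthetic data with multiplicative noise (the parameter values $\gamma_0 = 2$, $\gamma_1 = -1.5$ and the noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
gamma0, gamma1 = 2.0, -1.5
x = rng.uniform(1, 100, 1000)
# Multiplicative noise => additive Normal disturbance on the log scale,
# which is what the OLS linearization assumes.
y = gamma0 * x ** gamma1 * np.exp(rng.normal(0, 0.05, 1000))

b1, b0 = np.polyfit(np.log(x), np.log(y), 1)   # OLS on the log-log model
print(np.exp(b0), b1)                           # estimates of (gamma0, gamma1)
```

Under the friendly (multiplicative-noise) assumption, both parameters come back almost exactly.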

Power laws are plentiful in Nature, especially when one includes the degree distribution of social networks in a – generous and uncommon, I admit it – definition of Nature. A commonly proposed source of power law degree distributions is preferential attachment in network formation: the probability of a new node $i$ being connected to an old node $j$ is an increasing function of the degree of $j$.

The problem with power laws in the wild is that they are really hard to estimate precisely, and I got very annoyed at the glibness of some articles, which report estimation of power laws in a highly dequantized manner: they don't actually show the estimates or their descriptive statistics, only charts with no error bars.

Here's my problem: it's well-known that even small stochastic disturbances can make parameter identification in power law data very difficult. And yet, that is never mentioned in those papers. This omission, coupled with the lack of actual estimates and their descriptive statistics, is unforgivable. And suspicious.
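A small numerical illustration of that difficulty: keep the same power law as before, but make the disturbance additive rather than multiplicative, and the log-log OLS estimate of the exponent drifts noticeably (the parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
gamma0, gamma1 = 2.0, -1.5
x = rng.uniform(1, 100, 1000)
clean = gamma0 * x ** gamma1

# Additive noise, small in absolute terms, swamps the tiny tail values of y
# and flattens the apparent slope on the log-log plot.
y = clean + np.abs(rng.normal(0, 0.01, 1000))

b1, _ = np.polyfit(np.log(x), np.log(y), 1)
print(b1)   # noticeably flatter than the true exponent of -1.5
```

The noise standard deviation here (0.01) is tiny next to the head of the distribution but dominates the tail, so the estimated exponent is biased toward zero -- and nothing in a chart without error bars would warn the reader.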

Perhaps this needs a couple of numerical examples to clarify; as they say at the end of each season of television shows now:

– To be continued –

Tuesday, December 20, 2011

Marginalia: Writing in one's books

I've done it for a long time now, shocking behavior though it is to some of my family and friends.

WHY I make notes

Some of my family members and friends are shocked that I write in my books. The reasons to keep books in pristine condition vary from maintaining resale value (not an issue for me, as I don't think of books as transient presences in my life) to keeping the integrity of the author's work. Obviously, if I had a first edition of Newton's Principia, I wouldn't write in it; the books I write in are workaday copies, many of them cheap paperbacks or technical books.

The reason why I make notes is threefold:

To better understand the book as I read it. Actively reading a book, especially a non-fiction or work book, is essentially a dialog between the book and the knowledge I can access, both in my mind and in outside references. Deciding what is important enough to highlight and what points deserve further elaboration in the form of commentary or an example that I furnish, makes reading a much more immersive experience than simply processing the words.

To collect my ideas from several readings (I read many books more than once) into a place where they are not lost. Sometimes points from a previous reading are more clarifying to me than the text itself, sometimes I disagree vehemently with what I wrote before.

To refer to later when I need to find something in the book. This is particularly important in books that I read for work, in particular for technical books where many of the details have been left out (for space reasons) but I added notes that fill those in for the parts I care about.


WHAT types of notes I make

In an earlier post about marginalia on my personal blog I included this image (click for bigger),

For blog post

showing some notes I made while reading the book Living With Complexity, by Donald Norman. These notes fell into six cases:

Summaries of the arguments in text. Often texts will take long circuitous routes to get to the point. (Norman's book is not one of these.) I tend to write quick summaries, usually in implication form like the one above, that cut down the entropy.

My examples to complement the text. Sometimes I happen to know better examples, or examples that I prefer, than those in the book; in that case I tend to note them in the book so that the example is always connected to the context in which I thought of it. This is particularly useful in work books (and papers, of course) when I turn them into teaching or executive education materials.

Comparisons with external materials. In this case I made a note to compare Norman's point about default choices with the problems Facebook faced in similar matters regarding privacy.

Notable passages. Marking funny passages with smiley faces and surprising passages with an exclamation point helps find these when browsing the book quickly. Occasionally I also mark passages for style or felicitous turn of phrase, typically with "nice!" on the margin.

Personal commentary. Sometimes the text provokes some reaction that I think is worth recording in the book. I don't write review-like commentary in books as a general rule, but I might note something about missing or hidden assumptions, innumeracy, biases, statistical issues; I might also comment positively on an idea, for example, that I had never thought of except for the text.

Quotable passages. These are self-explanatory and particularly easy to make on eBooks. Here's one from George Orwell's Homage To Catalonia:
The constant come-and-go of troops had reduced the village to a state of unspeakable filth. It did not possess and never had possessed such a thing as a lavatory or a drain of any kind, and there was not a square yard anywhere where you could tread without watching your step. (Chapter 2.)

A few other types of marginalia that I have used in other books:

Proofs and analysis to complement what's in the text. As an example, in a PNAS paper on predictions based on search, the authors call $\log(y) = \beta_0 + \beta_1 \log(x)$ a linear model, with the logarithms used to account for the skewness of the variables. I inserted a note that this is clearly a power law relationship, not a linear relationship, with the two steps of algebra that show $y = e^{\beta_0} \times x^{\beta_1}$, in case I happen to be distracted when I reread this paper and can't think through the baby math.

Adding missing references or checking the references (which sometimes are incorrect, in which case I correct them). Yep, I'm an academic nerd at heart; but these are important, like a chain of custody for evidence or the provenance records for a work of art.

Diagrams clarifying complicated points. I do this in part because I like visual thinking and in part because if I ever need to present the material to an audience I'll have a starting point for visual support design.

Data that complements the text. Sometimes the text is dequantized and refers to a story for which data is available. I find that adding the data to the story gives me a better perspective, and if I ever want to use the story I'll have the data at hand to make a stronger case.

Counter-arguments. Sometimes I disagree with the text, or at least with its treatment of counter-arguments (even when I agree with a position, I don't like it when the author presents the opposing points of view only in strawman form). I write the counter-arguments down to remind myself that they exist and that the text doesn't do them justice.

Markers for things that I want to get. For example, while reading Ted Gioia's The History of Jazz, I marked several recordings that he mentions for acquisition; when reading technical papers I tend to mark the references I want to check; when reading reviews I tend to add things to wishlists (though I also prune these wishlists often).


HOW to make notes

A few practical points for writing marginalia:

Highlighters are not good for long-term notes. They either darken significantly, making the highlighted text hard to read, or they fade, losing the highlight. I prefer underlining short sentences or segments with a high-contrast color, or marking the beginning and end of longer passages in the margin.

Margins are not the only place. I add free-standing inserts, usually in the form of large Post-Its or pieces of paper. Important management tip: write the page number the note refers to on the note.

Transcribing important notes into a searchable format (a text file on my laptop) makes it easy to find things later. This is one of the advantages of eBooks of the various types (Kindle, iBooks, O'Reilly PDFs): they make searching notes and highlights easy.

Keeping a commonplace book of felicitous turns of phrase (the ones in the books and the ones I come up with) either in a file or on an old-style paper journal helps me become a better writer.


-- -- -- --

Note: This blog may become a little more varied in its topics, as I've decided to write posts more often to practice writing for a general audience. After all, the best way to become a better writer is to write and let others see it. (There are no comments on the blog, but plenty arrive by email from people I know.)

Monday, December 12, 2011

How many possible topologies can a N-node network have?

Short answer, for an undirected network: $2^{N(N-1)/2}$.

Essentially, the number of possible edges is $N(N-1)/2$, so the number of possible topologies is two raised to the number of edges, capturing every case in which each edge is either present or absent. For a directed network the number of possible edges is twice that of an undirected network, so the number of topologies is the square of the formula above (equivalently, remove the $/2$ from the exponent).

To show how quickly things get out of control, here are some numbers:

$N=1 \Rightarrow 1$ topology
$N=2 \Rightarrow 2$ topologies
$N=3 \Rightarrow 8$ topologies
$N=4 \Rightarrow 64$ topologies
$N=5 \Rightarrow 1024$ topologies
$N=6 \Rightarrow 32,768$ topologies
$N=7 \Rightarrow 2,097,152$ topologies
$N=8 \Rightarrow 268,435,456$ topologies
$N=9 \Rightarrow 68,719,476,736$ topologies
$N=10 \Rightarrow 35,184,372,088,832$ topologies
$N=20 \Rightarrow 1.5693 \times 10^{57}$ topologies
$N=30 \Rightarrow 8.8725 \times 10^{130}$ topologies
$N=40 \Rightarrow 6.3591 \times 10^{234}$ topologies
$N=50 \Rightarrow 5.7776 \times 10^{368}$ topologies
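A minimal sketch of this count in Python (the function name is my own; Python's arbitrary-precision integers make the exact values in the table easy to verify):

```python
def num_topologies(n, directed=False):
    """Number of possible topologies of a simple n-node network.

    Each potential edge is independently present or absent, so the
    count is 2 raised to the number of possible edges:
    n(n-1)/2 for an undirected network, n(n-1) for a directed one.
    """
    edges = n * (n - 1) // 2
    if directed:
        edges *= 2
    return 2 ** edges

# A few of the values from the table above:
for n in (3, 5, 10, 50):
    print(n, num_topologies(n))
```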

This is why any serious analysis of a network requires mathematical modeling and computer processing: our human brains are not equipped to deal with this kind of exploding complexity.

And for visual learners, here's a graph showing the pointlessness of trying to grasp network topologies "by hand" (note the logarithmic vertical scale):

Number of network topologies as a function of the number of nodes

Saturday, December 3, 2011

Why I'm not a fan of "presentation training"

Because there are too many different types of presentation for any sort of abstract training to be effective. So "presentation training" ends up – at best – being "presentation software training."

Learning about information design, writing and general verbal communication, stage management and stage presence, and the operation of the software and tools used in presentations may help one become a better presenter. But, as in so many technical fields, all of these require some study of the foundations followed by a lot of field- and person-specific practice.

I recommend Edward Tufte's books (and seminar) for information design; Strunk and White's The Elements of Style, James Humes's Speak like Churchill, Stand like Lincoln, and William Zinsser's On Writing Well for verbal communication; and a quick read of the manual followed by exploration of the presentation software one uses. I have no recommendations regarding stage management and stage presence short of joining a theatre group, which is perhaps too much of a commitment for most presenters.

I have already written pretty much all I think about presentation preparation; the present post is about my dislike of "presentation training." To be clear, this is not about preparation for teaching or training to be an instructor. These, being specialized skills – and typically field-specific skills – are a different case.


Problem 1: Generic presentation training is unlikely to help any but the most incompetent of presenters

Since an effective presentation is one designed for its objective, within the norms of its field, targeted to its specific audience, and using the technical knowledge of its field, what use is it to learn generic rules, beyond the minimum of information design, clarity in verbal expression, and stage presence?

(My understanding from people who have attended presentation training is that there was little about information design, nothing about verbal expression, and just platitudes about stage presence.)

For someone who knows nothing about presentations and learns the basics of operating the software, presentation training may be of some use. I think Tufte made this argument: the great presenters won't be goaded into becoming "death by powerpoint" presenters just because they use the software; the terrible presenters will be forced to come up with some talking points, which may help their presentations be less disastrous. But the rest will become worse presenters by focussing on the software and some hackneyed rules – instead of the content of and the audience for the presentation.


Problem 2: Presentation trainers tend to be clueless about the needs of technical presentations

Or, the Norman Critique of the Tufte Table Argument, writ large.

The argument (which I wrote as point 1 in this post) is essentially that looking at a table, a formula, or a diagram as a presentation object – understanding its aesthetics, its information design, its use of color and type – is very different from looking at a table to make sense of the numbers therein, understand the implications of a formula to a mathematical or chemical model, and interpret the implications of the diagram for its field.

Tufte, in his attack on PowerPoint, discusses a table but focusses on its design, not on how its numbers would be used, which is what prompted Donald Norman to write his critique; still, of all the people who could be said to be involved in presentation training, Tufte is the strongest advocate for content.

The fact remains that there's a very big difference between technical material used as a prop to illustrate some presentation device or technique to an audience mostly outside the material's technical field, and the same material used to make a technical point to an audience in the appropriate field.

Presentation training, being generic, cannot give specific rules for a given field; yet those field-specific rules are precisely what anyone in the field with questions about how to present something actually needs.


Problem 3: Presentation training is typically delivered as a presentation (lecture), which is not an effective way to teach technical material

The best way to teach technical material is to have the students prepare by reading the foundations (or watching video on their own, allowing them to pace the delivery by their own learning speed) and preparing for a discussion or exercise applying what they learned.

This is called participant-centered learning; it's the way people learn technical material. Even in lecture courses the actual learning only happens when the students practice the material.

Almost all presentation training is done in lecture form, delivered as a presentation by the instructor with question-and-answer periods for the audience. But since the audience doesn't actually practice the material during the lecture, they can ask only questions of clarification. The real questions surface during actual practice, not during a lecture, and those are the questions that really need answers.


Problem 4: Most presentation training is too narrowly bracketed

Because it's generic, presentation training misses the point of making a presentation to begin with.

After all, presentations aren't made in a vacuum: there's a purpose to the presentation (say, report market research to decision-makers), an audience with specific needs (product designers who need to understand the parameters of the consumer choice so they can tweak the product line), supporting material that may be used for further reference (a written report with the details of the research), action items and metrics for those items (follow-up research and a schedule of deliverables and budget), and other elements that depend on the presentation.

There's also the culture of the organization which hosts the presentation, disclosure and privacy issues, reliability of sources, and a host of matters apparently unrelated to a presentation that determine its success a lot more than the design of the slides.

In fact, the use of slides, or the very idea of a speaker talking to an audience, is itself a constraint on the type of presentation the training focusses on, and it trains people to think of every presentation as a lecture. Many presentations are interactive, with the "presenter" perhaps taking the position of moderator or arbitrator; some presentations are made in roundtable fashion, as a discussion in which the main presenter is one of many voices.

Some time ago, I summarized a broader view of a specific type of presentation event (data scientists presenting results to managers) in this diagram, illustrating why and how I thought data scientists should take more care with presentation design (click for larger):

Putting some thought into presentations - backward induction approach

(Note that this is specific advice for people making presentations based on data analysis to managers or decision-makers that rely on the data analysis for action, but cannot do the analysis themselves. Hence the blue rules on the right to minimize the miscommunication between the people from two different fields. This is what I mean by field-specific presentation training.)



These are four reasons why I don't like generic presentation training. Really they boil down to one: generic presentation training assumes that content is secondary, and that assumption is why we see so many bad presentations to begin with.


NOTE: Participant-centered learning is a general term for using the class time for discussion and exercises, not necessarily for the Harvard Case Method, which is one form of participant-centered learning.


Related posts:

Posts on presentations in my personal blog.

Posts on teaching in my personal blog.

Posts on presentations in this blog.

My 3500-word post on preparing presentations.