Sunday, May 29, 2011

Angelina Jolie shows problem with some economic models

Watching Megamind, I'm reminded of an old Freakonomics post about voice actors. It was very educational: it showed how having a model for something could make smart people say dumb things.

The argument went as follows: because voice actors are not seen, producers who pay a premium to use Angelina Jolie instead of some unknown voice actor are using the burning money theory of advertising: by destroying a lot of money arbitrarily, they signal their confidence in the value of their product to the market; after all, if the product was bad, they'd never make that lost money back. (Skip the next two paragraphs to avoid economics geekery.)

As models go, the burning money theory of advertising is full of holes: it's based on inference, which means that the equilibrium depends on beliefs off the equilibrium path; there's a folk theorem over games with uncertainty that shows any outcome on the convex hull of the individually-rational outcomes can be an equilibrium; the model works for some equilibrium concepts, like Perfect Bayesian Equilibrium, but not others, like Trembling-Hand Perfection; and it makes the assumption that advertising adds nothing to the product.

The reason for that model's popularity with economists is that it "explains" how advertising can make people prefer a known product A over a known product B without changing the utility of the products. A model where firm actions change customers' utilities is a no-no in Industrial Organization economics, because it cannot serve as a foundation for regulation: all the results become an artifact of how the modeler formulates that change.*

Ok, but then why hire Angelina Jolie? Ms. Jolie is rich and famous, so she didn't get the job by sleeping with the producer.

Two reasons: some people can act better than others and have a distinctive diction style (production reason) and Ms. Jolie's job is not just the acting part (promotion reason).

The first reason is obvious to anyone who ever had to read a speech to tape or narrate a slideshow: it's difficult work and the narration doesn't sound natural; acting out parts is even harder. Practice helps, but even professional readers (like the ones narrating audiobooks) aren't that good at acting parts. And some people's diction and voice have distinctive patterns and sounds that have proved themselves on the market: James Spader is now fat, but his voice still sells Lexus.

When the voice work is over, Ms. Jolie will help promote the movie: her fame gets her bookings on Leno and Letterman; her presence at a promotional event will draw a crowd. This kind of promotion is worth a lot of money not spent on advertising, and, of course, her name helps with the advertising as well. A good voice actor might be a cheaper actor (and let's note here that Ms. Jolie doesn't command as high a fee for voice work as for her regular acting), but will not get top billing and promotion on talk shows.

I like Economics' models. But not when they imply that Angelina Jolie is a waste of money.**

-- -- -- -- -- -- -- --

* For anyone who ever read a book about, took a course on, or worked in advertising, Industrial Organization models of advertising read like the Flat Earth Society trying to explain the Moon shot.

** And the video linked from the first sentence in that paragraph is evidence of the first reason above.

Saturday, May 28, 2011

Transition matrices and why "old" knowledge matters

An old tool, with well-known traps, still trapping people who don't bother to read "old" (aka 1980) technical marketing papers (yet another example of the arrogance of many people).

A consumption transition matrix is a tool to analyze state dependency. Assuming that there are two brands (Coke and Pepsi) and we can observe all consumption of a given person, we can turn histories of consumption like

...PPCCPPPCPCCCPCPPCPPCPPC...

into matrices where the entries are the probability of the column brand being chosen next given that the row brand is the current consumption, like

(1)  $\begin{array}{lcc}
 & C & P \\
 C & .3 & .7 \\
 P & .6 & .4 \\
 \end{array}$

which means, in this case, switching behavior; this person buys a Coke 60% of the time after a Pepsi consumption and Pepsi only 40% of the time after a Pepsi.
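For concreteness, here is a short Python sketch (the function name and equal weighting of observed transitions are my choices, not from the original) that estimates such a matrix from a history string; run on the fragment shown above (ellipses dropped), it gives probabilities close to, though not exactly, matrix (1):

```python
from collections import defaultdict

def transition_matrix(history, brands="CP"):
    """Estimate Pr(next = column | current = row) from a purchase history."""
    counts = {r: defaultdict(int) for r in brands}
    for current, nxt in zip(history, history[1:]):
        counts[current][nxt] += 1
    return {r: {c: counts[r][c] / max(sum(counts[r].values()), 1)
                for c in brands}
            for r in brands}

history = "PPCCPPPCPCCCPCPPCPPCPPC"  # the fragment above, ellipses dropped
m = transition_matrix(history)
for row in "CP":
    print(row, {col: round(m[row][col], 2) for col in "CP"})
```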

The structure of these switching matrices (which are actually embedded in choice models, in order to take into account marketing variables like price and promotion) gives some hints about the behavior of consumers. The one in matrix (1) is a moderate switcher with slightly longer Pepsi consumption runs, while

(2)  $\begin{array}{lcc}
 & C & P \\
 C & .3 & .7 \\
 P & .3 & .7 \\
 \end{array}$

is a random mixer with a preference for Pepsi. Note how consumption on the next period does not depend on consumption in the current period. On the other hand, there are also cases like

(3)  $\begin{array}{lcc}
 & C & P \\
 C & .9 & .1 \\
 P & .05 & .95 \\
 \end{array}$

where the behavior is overwhelmingly one of inertia, habit, or loyalty (the matrix cannot separate between these three very different psychological decision processes).

It has been known by marketing modelers for a long time (at least since the early 80s) that aggregating transition matrices across people creates or destroys state dependency by itself; yet, the eternal "rediscovery" of basic marketing truth by non-marketers working in analytics seems to have passed that knowledge by. (You know who you are.)

Take the case of a market that is half type (3) and half of the type described next:


(4)  $\begin{array}{lcc}
 & C & P \\
 C & .1 & .9 \\
 P & .95 & .05 \\
 \end{array}$

This market is composed of strong loyals and strong switchers, for whom the brand is very important in determining consumption. After aggregation, it will be described by a matrix of brand-indifferent people:

(5) $\begin{array}{lcc}
 & C & P \\
 C & .5 & .5 \\
 P & .5 & .5 \\
 \end{array}$

(To illustrate the creation of state dependency where none exists we need three brands; this is left as an exercise for the reader.)
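A minimal numerical check of the aggregation effect (a sketch assuming, as in the text, a 50/50 population split; the simple row-average is exact here because the two matrices are mirror images of each other):

```python
import numpy as np

# Matrix (3): inertia/habit/loyalty; matrix (4): strong brand-driven switching.
loyal = np.array([[0.90, 0.10],
                  [0.05, 0.95]])
switcher = np.array([[0.10, 0.90],
                     [0.95, 0.05]])

# Pool a 50/50 population. (In general the rows should be weighted by each
# segment's occupancy of the current state; by symmetry that's 50/50 here.)
pooled = 0.5 * loyal + 0.5 * switcher
print(pooled)
# Both rows come out (.5, .5): matrix (5), a population of apparently
# brand-indifferent random mixers -- the state dependency is gone.
```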

Moral: just because something was discovered by people who worked in marketing before it became cool with your peer group, it doesn't mean it's not important to know.

Wednesday, May 25, 2011

Some thoughts on (other people's) presentations problems

Slightly disjointed observations, inspired by a few presentations I've observed recently:

1. Obvious laziness is unprofessional. I saw a presentation to an audience that works with mathematics where the presenter used the "draw ellipse segment" tool to draw "exponentials" on a slide about exponential growth. Since exponentials look very different from quarter-ellipses, it was obvious that the presenter didn't think the presentation worth taking the one minute required to plot an actual exponential with a spreadsheet.

2. When in doubt, use less: colors, fonts, indent levels, bundled clipart; in fact, never use bundled clipart. Everyone has that same clipart, so the audience will be familiar with it, associating it with the other uses.

3. There is no correlation between the time it takes to make a slide and the time that slide should take in a presentation. I have several slides that took hours to make (just to make the slide, not to figure out the material going into it) that get shown for seconds in a presentation, because that's their job in that presentation. On the other hand I routinely keep one-word chyrons up for minutes, as chorus to what I'm saying.

4. If you're going to use quotations, make darn sure you get the reference right. Otherwise you'll sound like an idiot. Saying "Life is but a walking shadow" and attributing it to 'Q' in episode one of Star Trek The Next Generation shows ignorance both of the quotation (Shakespeare, Macbeth, Act 5, Scene 5 – you can find that on the interwebs) and of Star Trek TNG itself, in which John de Lancie (Q) clearly attributes it to Shakespeare. Also, complete sourcing (not just author) increases credibility by making the quotation easier to check.

5. Speaker notes are perfectly acceptable; just don't carry flash cards. Memorizing a speech is really hard and few people can do it correctly; if you're over 40 you can always make the joke that memory is the first thing to go (punch line: "I forgot where I heard that"). Your command of field knowledge can be demonstrated in the question-and-answer period; coincidentally, people who are good at memorizing speeches tend to do poorly in the Q&A... Just remember:

6. Speaker notes are for the speaker. Don't impose them on the audience. Most especially don't put them in outline form on your slides. It suggests that you don't know how to use "presenter screen" on your computer, or dead-tree-ware. Don Norman writes about that.

7. Preparation is essential. I already wrote 3500 words on this. Most presentations continue to fail due to obvious lack of preparation or of preparation time spent on the wrong end of the process (memorizing speech, rehearsing delivery; these are important finishing touches, but not where most preparation should focus).

And a bonus meta-observation, from Ilkka Kokkarinen: the biggest problem is still the incessant yammering for fifteen minutes to reach a conclusion that could have been written in one paragraph to be read in less than a minute. Good point! We are so used to our time being wasted that we no longer notice this.

[Added May 30, 2011.] A reader (who asked for anonymity) emails: Don't eat a beef and bean burrito in the two hours prior to the presentation. I'd go further and suggest carefully managing pre-presentation intake of liquids (a presenter with a full bladder becomes short-tempered and rushed) and foods with gastrointestinal disruption potential.

Monday, May 23, 2011

What do you mean "average"?

It's not like there aren't many options.

Many people implicitly assume the median, when they say "50% of X are below average." This is probably because they assume that the distribution is symmetric around the arithmetic, or simple, mean, and therefore the mean equals the median. But why limit ourselves to the simple mean?

The most common average is the simple mean:

$m =\frac{1}{N} \sum_{i=1}^{N} x_i,$

although one could use a quadratic, cubic, quartic, quintic, etc. mean (for $k=2,3,4,5,\ldots$):

$m =\left[ \frac{1}{N} \sum_{i=1}^{N}  x_i ^k \right]^{1/k}.$

Or maybe something more esoteric, like a geometric mean:

$m =\left[ \prod_{i=1}^{N}  x_i \right]^{1/N},$

or the harmonic mean

 $m= \frac{N}{ \sum_{i=1}^{N}  \frac{1}{x_i}}.$

Strange though they may seem, all of these have their uses and their problems. For example, Europeans report automobile fuel consumption in liters/100 km, while Americans report miles/gal, or, with appropriate scaling to grown-up units, km/l. Because most people reason better with linear means than with harmonic means, the European representation leads to better understanding of fuel economy comparisons. The choice of the mean has to be appropriate to the problem (engineers will notice how RMS and MSE both use quadratic means).
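To make the menagerie concrete, here is a sketch of these means in Python (function names are mine), with the fuel economy point done both ways for two equal-distance legs driven at 20 and 40 mpg:

```python
import math

def power_mean(xs, k):
    """Quadratic (k=2), cubic (k=3), ... mean."""
    return (sum(x ** k for x in xs) / len(xs)) ** (1 / k)

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)

# Two equal-distance legs at 20 mpg and 40 mpg: the trip average in the
# American representation is the harmonic mean, not the simple mean.
mpg = [20, 40]
print(sum(mpg) / len(mpg))   # 30.0: the naive (and wrong) simple mean
print(harmonic_mean(mpg))    # ~26.67: the correct trip-level mpg
```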

And this is before we even get to the problems with the $x_i$ that go into the average.

Sunday, May 22, 2011

Selection effects, Buffett's rebuttal, and the causality question

Some thoughts on causality based on a story I recall from Alice Schroeder's  The Snowball, Warren Buffett's biography. (I read the book over two years ago, and it was a library copy, so I can't be sure of the details, but I'm sure of the logic.)

Warren Buffett attended a conference on money management where he made a big splash against a group of efficient market advocates. Efficient financial markets imply that, in the long term, it's impossible to have returns above market average, something that Buffett had been doing for several years by then.

The efficient markets hypothesis advocates present at this conference made the predictable argument against reading too much in the outsized returns of a few money managers: if there's a lot of people trading securities, then some will do better than the median, while others will do worse than the median, just as an artifact of the randomness. To over-interpret this is to imagine clusters where none exists.

Buffett then told a parable along the following lines: "Imagine that you look at all the money managers in the market last year, say 20,000, and see that there are 24 that did much better than the rest of the 20,000. So far it could be the case of a random cluster, yes. Then you find those 24 traders, and discover that 23 came from a very small town, [Buffett gave it the name of a mentor, but I can't recall it] Buffettville. Now, most people would think that there's something in Buffettville that makes for good managers; but you are telling us that it's all a coincidence."

Buffett's argument carries some weight in the sense that the second variable (i.e. being from Buffettville) is not a priori related to having higher returns, so its association with success suggests a hitherto unknown causal relationship rather than a coincidence.

But there's a problem here. Even if a large proportion of the successful managers are from Buffettville, that doesn't mean that being from Buffettville makes people better managers; it might be the case that there were many other Buffettville managers in the 20,000 and those were at the very bottom. That would mean that managers from Buffettville have a much higher variance in returns than the market, and that the results, once again, were the result of randomness.
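The variance point is easy to demonstrate by simulation. In this sketch (all numbers are mine, purely illustrative), every manager has the same expected return, but the hypothetical Buffettville managers take higher-variance positions; they then crowd both the top and the bottom of the league table:

```python
import random

random.seed(1)
N_OTHER, N_BVILLE = 19_600, 400  # illustrative counts, not from the story

# Identical mean return (zero); Buffettville just has triple the spread.
returns = [(random.gauss(0.0, 1.0), "other") for _ in range(N_OTHER)]
returns += [(random.gauss(0.0, 3.0), "Buffettville") for _ in range(N_BVILLE)]

ranked = sorted(returns, key=lambda r: r[0])
n_top = sum(town == "Buffettville" for _, town in ranked[-24:])
n_bottom = sum(town == "Buffettville" for _, town in ranked[:24])
print(n_top, n_bottom)  # Buffettville dominates both tails
```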

My argument here is that the story as I recall it being told in Schroeder's book is an incomplete rebuttal of the efficient markets hypothesis, not a defense of that hypothesis. I'm not a finance theorist; I'm in marketing, where we do believe that some marketers are much better than others, so I have no bone to pick either with the theory or its critics.

I'm just a big fan of clear thinking in matters managerial or business.

Wednesday, May 18, 2011

Averages and trends

Some lessons from data processing for corporate planning (with, possibly, other applications).

In the 14th Century, when I sat on the student side of an MBA classroom, there was this discipline called "Strategic Planning And Corporate Policy," in which we spent several sessions learning the twin dark arts of forecasting and scenario planning. Fast forward to today and add some minor statistical sophistication, and here we are.*

Suppose there's a variable of interest that is an input to major strategic decisions and for which we have historical data with varying accuracy; say, population density. And to make decisions we need to get some sense of how it's changing, say an average trend.

The problem with density is that in some places it will make sense to measure it over areas and in other places, say city centers, over volume. So, how does one average two different metrics of the variable of interest ($people/km^2$ and $people/km^3$)?

For basic logistics planning one could flatten the volume and project it on the surface; but that doesn't work for other applications, like trends in the design of living space, where the average space in Manhattan is really volume. So, how to solve the general problem of finding an average that is a metric for trend computation?

Data reduction methods like Principal Component Analysis and Factor Analysis can take vectors of heterogeneous measures and find the common elements in them, therefore apparently solving this problem. Apparently -- if you don't know how these techniques work.

The trouble is that FA and PCA both select for variance, meaning that places that have the most variance in population density will be overweighted in the final metric. And this will lead to a trend estimate that is more volatile than the actual trend. (And most likely over-estimate any underlying trend as well, depending on the model formulation of trend as a function of metric.)
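A quick sketch of that variance-chasing (synthetic data; the noise scales are my own made-up choices): two series share the same underlying signal, but PCA's first component loads almost entirely on the noisier one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)                    # the common "density" trend
calm = signal + rng.normal(scale=0.1, size=n)  # precisely measured place
noisy = signal + rng.normal(scale=5.0, size=n) # high-variance place

X = np.column_stack([calm, noisy])
X = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigvecs[:, -1]  # component with the largest eigenvalue
print(np.round(np.abs(pc1), 3))  # loading on `noisy` dwarfs loading on `calm`
```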

So, if we want to use averages for trend estimation, we have to end up with some sort of "manual data cleaning and weighting" in the model, which is done by the judgment of the strategic planner's analysts. It's either that or use a correction for volatility in the estimated trend, but almost no one knows how to do that correctly and in many cases it's not possible.

A second problem with this trend estimation is that the locations where the data are collected depend on the actual variable being measured: areas with high density will have more census workers than empty areas, simply as a result of standard sampling approaches. And, if the population trends include migration, that migration changes the sampling strategy of demographers so it too will create additional volatility in the measurement of the trend. This can be controlled for with appropriate statistical techniques, but pretty much never is.

A third problem is the historical data. We trust our demographics data now, but in order to get trends, perhaps our strategic analysts needed to use historical records, and then indirect, proxy, variables. So they choose proxy variables that are well-correlated with the target variable and extrapolate the past. But to extrapolate the past the analysts need a strategy to cope with extrapolation error, which on average increases with extrapolation distance. Typically this will include some bootstrap method that uses the "good data" estimate of the trend as a starting point for the "bad data" part of the trend.

By using bootstrap methods, the analysts will smooth out any proxy variable effects that move away from the trend; in other words, if the historical data contradicts the trend, it will have only a small effect (depending on the smoothing technique and the bootstrapping strategy) on the final trend estimate, but if it is neutral or supports the trend, it will increase trend volatility.

Of course, we marketers at this point interject that this is all pre-1900 stuff. After all, the average is much less important than the marginal effects on important segments. For example, the effect of changes in Manhattan or Montana would be very unimportant, since for corporate purposes one is already highly dense and the other is empty. What matters is what happens in marginal places like Topeka, KS (marginal in the sense that they are highly sensitive to the choice of business strategy, no offense intended to the Topekans).

Marketers would want access to the disaggregate data, with separate data sets for the proxy variables. Instead of looking for a big number that summarizes some trend and then applying it blindly everywhere, we'd build two sets of models: local trends (as in how the population of Topeka evolves over time) and trends over the space of travel matrices (as in where people moving into Topeka come from and where departing Topekans go). Then we could find policy implications that mattered.

Imagine that there was a very strong trend towards increasing density. If that trend was all in high or low density places (meaning people from Montana moving to Manhattan), this would not affect our strategy. But if cities at the cusp of a phase-change were either increasing or decreasing in density, that would have major policy implications.

(That is the marketing secret: whenever possible, unpack and disaggregate. Applies to many important things in life.)

At this point the corporate governance types in the classroom would interrupt to say that this is not how things actually work in the real world. (In the class where I was a student they'd be in trouble, as the teacher was on the board of several large companies; but let's sidestep that point.) The real corporate world, they'd say, is about power.

According to these corporate governance types, in the real world the analysts and the planners would put out a report, filled with the analysts' footnotes and appendices (which no one would read) and summarized by the planners in short sentences of small words (but no nuance and sometimes contradicting the analysts).

Then the executives would either cherry-pick the parts that supported their pre-determined plans or simply ignore the actual report and say that it supported their pre-determined plans. Then they'd sell that idea to the board of directors (who never read the report) and to the shareholders (who have no access to the report, let alone the data).

Everyone (except the more ethical among the analysts, but those could be conveniently fired or ostracized) would be happy: the planners would get bonuses for keeping their mouths shut; the executives would continue their unchecked rule; the board would continue to avoid serious outside challenges; and the shareholders would keep being fleeced.

Perhaps the governance guys were onto something. Having data is great, but the link between reality and action is not as obvious as one might think.

(I miss being an MBA student. It was fun.)

-- -- -- --
* The class had little statistics. We spent most of the time making probabilistic decision trees and conditional NPV calculations; then we discussed corporate governance and implementation of these plans via incentive systems. Not much 7S going on in that class. The professor had some funny stories of board of directors vs executives shenanigans, though.

Monday, May 16, 2011

Two quick thoughts about Microsoft's purchase of Skype

1. Valuation of a property like Skype is a lot more than just some multiple of earnings.

Quite a few bloggers, twitterers, and forum participants jumped on Facebook, Google, and Microsoft for their billion-dollar valuations of Skype. Usually the criticism was based on Skype's lackluster earnings. This is a massively myopic point of view.

One can acquire a company for many reasons beyond its current revenue stream: the company may own resources that it is not adequately exploiting, such as technology or highly valuable personnel; it may have a valuable brand or a large user base (which is certainly true for Skype); it may have valuable information about its customers (again true for Skype as the communication graph -- not just the link graph -- is valuable); and finally, the company may have untapped revenue potential, just not with their current revenue model.

As a general rule, just because one cannot think of a way to monetize something, it doesn't mean that there is no way to monetize that thing.

Another possible reason to buy a company is strategy at a corporate level: to stop it from developing into a competitor for some of our products, to stop competitors from buying it (and therefore becoming better competitors), and to signal commitment to a specific market.


2. Perhaps there's a little Winner's Curse going on here, or perhaps not

When three companies (Google, Facebook, and Microsoft) compete for the same company, there's always the possibility of a little Winner's Curse effect:

 Assume that the value of Skype to these companies includes a big fraction that is common, meaning that it will be realized independent of the owner. Call that true common value $v$. To simplify, for now, assume that there are no synergies or strategic advantages for any of the buying companies; so the whole value is $v$.

Using all the information available, Google, Facebook, and Microsoft estimate $v$, each coming up with a number: $\tilde v_G$, $\tilde v_F$, and $\tilde v_M$. Note that these are estimates of the same $v$, not a representation of different actual value that Skype might have for these three companies. The estimates are different because each company uses different financial models and has access to different information or weighs it differently.

In a competitive market the winner will be the company who has the highest estimate, so we can assume that $\tilde v_M > \tilde v_G$ and $\tilde v_M > \tilde v_F$. The question now becomes: is what Microsoft paid for Skype higher than $v$ (the true $v$)?

Probabilistically the winning $\tilde v$ is likely to be higher than $v$,* since it's the maximum of three unbiased estimates -- one hopes these three companies have good financial advisers -- of the true $v$. Microsoft knows this and may shade its offer down a little from $\tilde v_M$. But even so, there's a chance that it paid too much.

Except that we're ignoring all the non-common value: synergies, strategic fit with Microsoft's other properties, and signaling to the market that Microsoft isn't yet a zombie like IBM was in the '90s.

There's a lot going on between Skype and Microsoft that the online commentariat missed. Then again, that's the fun of reading it.

(Hey, I finally wrote a business post in this blog that I repositioned as a business blog over a month ago!)

-------------------

* If the distribution of the errors in estimates of $v$ is symmetrical around zero (ergo the median of $\tilde v$ is $v$), the probability that the maximum of three observations $\tilde v$ is higher than $v$ is $7/8$.
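That 7/8 is easy to check by simulation (the values below are arbitrary placeholders, not estimates of the actual deal):

```python
import random

random.seed(0)
V_TRUE = 100.0   # true common value; units arbitrary
SD = 10.0        # spread of the three unbiased, symmetric estimates
TRIALS = 100_000

# Count how often the highest of three unbiased estimates exceeds V_TRUE.
overbid = sum(
    max(random.gauss(V_TRUE, SD) for _ in range(3)) > V_TRUE
    for _ in range(TRIALS)
)
print(overbid / TRIALS)  # close to 7/8 = 0.875
```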

Sunday, May 15, 2011

Factoring game and algorithmic game theory

(A vignette inspired by Ehud Kalai's talk at the Lens 2011 Conference.)

Consider the following sequential-move game:
  1. Player 1 chooses an integer $n > 1$.
  2. Player 2 chooses an integer $k > 1$.
  3. Player 2 wins if $k$ is a prime factor of $n$; Player 1 wins if $k$ is not a prime factor of $n$.
The backward induction solution to this game is obvious: Player 2 picks $k$ such that it is a prime factor of $n$, and Player 1 picks any $n$, which is irrelevant because Player 2 always wins.

This game, created by Ben-Sasson, Kalai, and Kalai, called the Factoring Game, illustrates a problem with the concept of equilibrium: it assumes that Player 2 can solve a complex problem (integer factorization) in useful time.*

So the "Player 2 always wins" conclusion above should really be preceded by "assuming that Player 2 has a quantum computer to run Shor's algorithm." In other words, in actual useful time the more likely event is that Player 1 wins (by picking a number that is the product of two very large primes, for example).
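A toy version of the game in Python (function names are mine): for small $n$, Player 2's naive trial-division best response works instantly; the same loop on a product of two 300-digit primes would run far longer than anyone can wait:

```python
import math

def player_2_wins(n, k):
    """Player 2 wins iff k is a prime factor of n."""
    is_prime = k > 1 and all(k % d for d in range(2, math.isqrt(k) + 1))
    return is_prime and n % k == 0

def best_response(n):
    """Naive factoring: fine for toy n, hopeless for a large semiprime."""
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d  # smallest prime factor
    return n  # n itself is prime, so k = n wins

n = 15
k = best_response(n)
print(k, player_2_wins(n, k))  # 3 True
```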

The Factoring Game exposes a problem with game-theoretic solutions to some strategic problems: they don't take into account computability or complexity. That is a problem for many real-world situations, like paid search and auction mechanism design.

There's a new-ish field at the intersection of economic game theory and computer science: algorithmic game theory. This field explicitly models computation as part of the process of solving games. It's something we should keep our eyes open for, as it already has real-world applications in search, mechanism design, and online auctions.

Game theory is really expanding its purview: modal logic, computational (simulated, numerical), algorithmic (computation-theoretic), and behavioral versions... good times.

Reference: E. Ben-Sasson, A. Kalai, and E. Kalai. "An approach to bounded rationality." In Advances in Neural Information Processing Systems 19, pages 145–152. MIT Press, Cambridge, MA, 2006.

* This game actually only illustrates the problem of subgame-perfect Nash equilibrium, not all equilibria concepts. Hey, I had to take a ton of game theory, might as well use some of it to be pedantic here.

A short observation about limits

Some scientists need a better understanding of the concept of limit as in
\[
\lim_{\substack{x \rightarrow 0 \\ y \rightarrow +\infty}} f(x) g(y)
\]
where $f(\cdot)$ and $g(\cdot)$ are increasing functions with $f(0)=0$ and $g(y) \rightarrow +\infty$, so the product is a $0 \times \infty$ indeterminate form.
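A quick numeric illustration (with the simplest admissible choices, $f(x)=x$ and $g(y)=y$, coupled through a single parameter $t$): depending on the relative rates, the product tends to a constant, to zero, or to infinity:

```python
t = 1e6  # stand-in for "t -> infinity"
for a, b in [(1, 1), (2, 1), (1, 2)]:
    x = t ** -a   # f(x) = x -> 0 at rate t^-a
    y = t ** b    # g(y) = y -> +infinity at rate t^b
    print(f"a={a}, b={b}: x*y = {x * y:g}")  # behaves like t^(b-a)
```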

This thought was motivated by Leslie Valiant's talk at the Lens 2011 Conference (video at the link). He was trying to determine the computational feasibility of evolution by random mutation and natural selection. In his case the question was simply whether the rates of mutation and the incidence of beneficial mutations could evolve the set of specific biochemistry cycles that control many functions of life.

The standard biologist's answer is that the rate of mutations is small and the probability is small, but when they accumulate over 4.5 billion years and integrate over all possible planets, they add to a big enough number. Clearly they don't understand that Small $\times$ Big is an undefined quantity, as the limit above can be anything. Their hand-waving argument is sloppy thinking.

You can't have that in science.

-- -- -- -- -- -- -- --

Addendum: Just in case it's not obvious, my issue is not with the theory of evolution per se, but with the lack of good numerical models and complexity evolution models which would allow for rate of evolution calculations.

Friday, May 13, 2011

A problem with the "less choice is better" idea

(Reposted because Blogger mulched its first instance.)

There's some research that shows that people do better when they have fewer choices. For example, when offered twenty different types of jam people will buy less jam (and those that buy will be less happy with their purchase) than when offered four types of jam.

There's some controversy around these results, but let us assume ad arguendum that, perhaps due to cognitive cost, perhaps due to stochastic disturbances in the choice process and associated regret, the result is true.

That does not imply what most people believe it implies.

The usual implication is something like: Each person does better with a choice set of four products; therefore let us restrict choice in this market to four products.

Oh! My! Goodness!

It's as if segmentation had never been invented. Even if each person is better off choosing when there are only four products in the market, instead of twenty, that doesn't mean that everybody wants the same four products in the choice set.

In fact, if there are 20 products total, there are $20!/(16! \times 4!) = 4845$ possible 4-unit choice sets.
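That count is just $\binom{20}{4}$; in Python:

```python
import math

n_products, set_size = 20, 4
print(math.comb(n_products, set_size))  # 4845 possible four-product assortments
```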

Even when restricting an individual's choice would make that individual better-off, restricting the population's choices has a significant potential to make most individuals worse-off.

Tuesday, May 10, 2011

Some recent finds on the web (technical, not managerial)

These are some technical papers I have recently found on the web, on the topics that interest me professionally:

Brian Karrer, M. E. J. Newman:  "Stochastic blockmodels and community structure in networks" (seen at the Lens 2011 conference).

Tim Roughgarden:  "Algorithmic Game Theory primer" (a short intro to his book).

Hugo Mercier: "Why do humans reason? Arguments for an argumentative theory" (makes the point that reasoning evolved to win arguments, not to search for the truth).

Johannes Neidhart, Joachim Krug: "Adaptive walks & extreme value theory" (I think this has implications for the evolution of various markets, including the Internet).

Denys Pommeret, Mohamed Boutahar, Badih Ghattas: "Nonparametric test for detecting change in distribution with panel data".

Constantin Rothkopf, Christos Dimitrakakis: "Preference elicitation and inverse reinforcement learning".

I also found the preprints of a book that was lauded at the Lens 2011 conference (no, it's not an illegal torrent, it's a preprint put online by one of the authors):

Marc Mézard, Andrea Montanari: Information, Physics, and Computation (Oxford Graduate Texts): Hardcover @ Amazon; preprint PDF.

Monday, May 9, 2011

That 81% prediction, it looks good, but needs further elaboration

Bobbing around the interwebs today we find a post about a prediction of UBL's location. A tip of the homburg to Drew Conway for being the first mention I saw. Now, for the prediction itself.

As impressive as an 81% chance attributed to the actual location of UBL is, it raises three questions. These are important questions for any prediction system after its prediction is realized. Bear in mind that I'm not criticizing the actual prediction model, just the attitude of cheering for the probability without further details.

Yes, 81% is impressive; did the model make other predictions (say the location of weapons caches), and if so were they also congruent with facts? Often models will predict several variables and get some right and others wrong. Other predicted variables can act as quality control and validation. (Choice modelers typically use a hold-out sample to validate calibrated models.) It's hard to validate a model based on a single prediction.

Equally important is the size of the space of possibilities relative to the size of the predicted event. If the space was over the entire world, and the prediction pointed to Abbottabad but not Islamabad, that's impressive; if the space was restricted to Af/Pk and the model predicted the entire Islamabad district, that's a lot less impressive. I predict that somewhere in San Francisco there's a panhandler with a "Why lie, the money's for beer" poster; that's not an impressive prediction. If I predict that the panhandler is on the Market - Valencia intersection, that's impressive.

Selection is the last issue: was this the only location model for UBL or were there hundreds of competing models and we're just seeing the best? In that case it's less impressive that a model gave a high probability to the actual outcome: it's sampling on the dependent variable. For example, when throwing four dice once, getting 1-1-1-1 is very unlikely ($1/6^4 \approx 0.0008$); when throwing four dice 10 000 times, it's very likely that the 1-1-1-1 combination will appear in one of them (that probability is $1-(1- 1/6^4)^{10000} \approx 1$).
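The two dice probabilities, as a check:

```python
p_once = (1 / 6) ** 4               # one throw of four dice, all showing 1
p_ever = 1 - (1 - p_once) ** 10_000 # at least one 1-1-1-1 in 10,000 throws
print(p_once, p_ever)  # ~0.00077 and ~0.9996
```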

Rules of model building and inference are not there because statisticians need a barrier to entry to keep the profession profitable. (Though they sure help with paying the bills.) They are there because there's a lot of ways in which one can make wrong inferences from good models.

Usama Bin Laden had to be somewhere; a sufficiently large set of models with large enough isoprobability areas will almost surely contain a model that gives a high probability to the actual location where UBL was, especially if it was allowed to predict the location of the top hundred Al-Qaeda people and it just happened to be right about UBL.

Lessons: 1) the value of a predicted probability $\Pr(x)$ for a known event $x$ can only be understood with the context of the predicted probabilities $\Pr(y)$ for other known events $y$; 2) we must be very careful in defining what $x$ is and what the space $\mathcal{X}: x \in \mathcal{X}$ is; 3) when analyzing the results of a model, one needs to control for the existence of other models [cough] Bayesian thinking [/cough].

Effective model building and evaluation need to take into account the effects of limited reasoning by those reporting model results, or, in simpler terms, make sure you look behind the curtain before you trust the magic model to be actually magical.

Summary of this post: in acrostic!