Si Tacuisses, Philosophus Mansisses: May 2011

Sunday, May 29, 2011

Angelina Jolie shows problem with some economic models

Watching Megamind, I'm reminded of an old Freakonomics post about voice actors. It was very educational: it showed how having a model for something could make smart people say dumb things.

The argument went as follows: because voice actors are not seen, producers who pay a premium to use Angelina Jolie instead of some unknown voice actor are using the burning money theory of advertising: by destroying a lot of money arbitrarily, they signal their confidence in the value of their product to the market; after all, if the product was bad, they'd never make that lost money back. (Skip two blue paragraphs to avoid economics geekery.)

As models go, the burning money theory of advertising is full of holes: it's based on inference, which means that the equilibrium depends on beliefs off the equilibrium path; there's a folk theorem over games with uncertainty that shows any outcome on the convex hull of the individually-rational outcomes can be an equilibrium; the model works for some equilibrium concepts, like Perfect Bayesian Equilibrium, but not others, like Trembling-Hand Perfection; and it makes the assumption that advertising adds nothing to the product.

The reason for that model's popularity with economists is that it "explains" how advertising can make people prefer a known product A over a known product B without changing the utility of the products. A model where firm actions change customers' utilities is a no-no in Industrial Organization economics, because it cannot serve as a foundation for regulation: all the results become an artifact of how the modeler formulates that change.*

Ok, but then why hire Angelina Jolie? Ms. Jolie is rich and famous, so she didn't get the job by sexing the producer.

Two reasons: some people can act better than others and have a distinctive diction style (production reason) and Ms. Jolie's job is not just the acting part (promotion reason).

The first reason is obvious to anyone who ever had to read a speech to tape or narrate a slideshow: it's difficult work and the narration doesn't sound natural; acting out parts is even harder. Practice helps, but even professional readers (like the ones narrating audiobooks) aren't that good at acting parts. And some people's diction and voice have distinctive patterns and sounds that have proved themselves on the market: James Spader is now fat, but his voice still sells Lexus.

When the voice work is over, Ms. Jolie will help promote the movie: her fame gets her bookings on Leno and Letterman; her presence at a promotional event will draw a crowd. This kind of promotion is worth a lot of money not spent on advertising, and, of course, her name helps with the advertising as well. A good voice actor might be a cheaper actor (and let's note here that Ms. Jolie doesn't command as high a fee for voice work as for her regular acting), but will not get top billing and promotion on talk shows.

I like Economics' models. But not when they imply that Angelina Jolie is a waste of money.**

-- -- -- -- -- -- -- --

* For anyone who ever read a book about, took a course on, or worked in advertising, Industrial Organization models of advertising read like the Flat Earth Society trying to explain the Moon shot.

** And the video linked from the first sentence in that paragraph is evidence of the first reason above.

Wednesday, May 25, 2011

Some thoughts on (other people's) presentations problems

Slightly disjointed observations, inspired by a few presentations I've observed recently:

1. Obvious laziness is unprofessional. I saw a presentation to an audience that works with mathematics where the presenter used the "draw ellipse segment" tool to draw "exponentials" on a slide about exponential growth. Since exponentials look very different from quarter-ellipses, it was obvious that the presenter didn't think the presentation worth taking the one minute required to plot an actual exponential with a spreadsheet.

2. When in doubt, use less: colors, fonts, indent levels, bundled clipart; in fact, never use bundled clipart. Everyone has that same clipart, so the audience will be familiar with it, associating it with the other uses.

3. There is no correlation between the time it takes to make a slide and the time that slide should take in a presentation. I have several slides that took hours to make (just to make the slide, not to figure out the material going into it) that get shown for seconds in a presentation, because that's their job in that presentation. On the other hand I routinely keep one-word chyrons up for minutes, as chorus to what I'm saying.

4. If you're going to use quotations, make darn sure you get the reference right. Otherwise you'll sound like an idiot. Saying "Life is but a walking shadow" and attributing it to 'Q' in episode one of Star Trek The Next Generation shows ignorance of both the quotation (Shakespeare, Macbeth, Act 5, Scene 5 – you can find that on the interwebs) and Star Trek TNG, where John de Lancie (Q) clearly attributes it to Shakespeare. Also, complete sourcing (not just author) increases credibility by making the quotation easier to check.

5. Speaker notes are perfectly acceptable; just don't carry flash cards. Memorizing a speech is really hard and few people can do it correctly; if you're over 40 you can always make the joke that memory is the first thing to go (punch line: "I forgot where I heard that"). Your command of field knowledge can be demonstrated in the question-and-answer period; coincidentally, people who are good at memorizing speeches tend to do poorly in the Q&A... Just remember:

6. Speaker notes are for the speaker. Don't impose them on the audience. Most especially don't put them in outline form on your slides. It suggests that you don't know how to use "presenter screen" on your computer, or dead-tree-ware. Don Norman writes about that.

7. Preparation is essential. I already wrote 3500 words on this. Most presentations continue to fail due to an obvious lack of preparation, or to preparation time spent on the wrong end of the process (memorizing the speech, rehearsing delivery; these are important finishing touches, but not where most preparation should focus).

And a bonus meta-observation, from Ilkka Kokkarinen: the biggest problem is still the incessant yammering for fifteen minutes to reach a conclusion that could have been written in one paragraph to be read in less than a minute. Good point! We are so used to our time being wasted that we no longer notice this.

[Added May 30, 2011.] A reader (who asked for anonymity) emails: Don't eat a beef and bean burrito in the two hours prior to the presentation. I'd go further and suggest carefully managing pre-presentation intake of liquids (a presenter with a full bladder becomes short-tempered and rushed) and foods with gastrointestinal disruption potential.

Sunday, May 22, 2011

Selection effects, Buffett's rebuttal, and the causality question

Some thoughts on causality based on a story I recall from Alice Schroeder's The Snowball, Warren Buffett's biography. (I read the book over two years ago, and it was a library copy, so I can't be sure of the details, but I'm sure of the logic.)

Warren Buffett attended a conference on money management where he made a big splash against a group of efficient market advocates. Efficient financial markets imply that, in the long term, it's impossible to have returns above market average, something that Buffett had been doing for several years by then.

The efficient markets hypothesis advocates present at this conference made the predictable argument against reading too much into the outsized returns of a few money managers: if there are a lot of people trading securities, then some will do better than the median, while others will do worse than the median, just as an artifact of the randomness. To over-interpret this is to imagine clusters where none exists.

Buffett then told a parable along the following lines: "Imagine that you look at all the money managers in the market last year, say 20,000, and see that there are 24 that did much better than the rest of the 20,000. So far it could be the case of a random cluster, yes. Then you find those 24 traders, and discover that 23 came from a very small town, [Buffett gave it the name of a mentor, but I can't recall it] Buffettville. Now, most people would think that there's something in Buffettville that makes for good managers; but you are telling us that it's all a coincidence."

Buffett's argument carries some weight in the sense that the second variable (i.e. being from Buffettville) is not a priori related to having higher returns, so it must be related by a hitherto unknown causality relationship.

But there's a problem here. Even if a large proportion of the successful managers are from Buffettville, that doesn't mean that being from Buffettville makes people better managers; it might be the case that there were many other Buffettville managers in the 20,000 and those were at the very bottom. That would mean that managers from Buffettville have a much higher variance in returns than the market, and that the results, once again, were the result of randomness.
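A minimal Python sketch of this variance argument, under made-up assumptions that are not in Schroeder's book: 500 of the 20,000 managers come from Buffettville, every manager has the same expected return (nobody has skill), and Buffettville returns simply have three times the standard deviation of everyone else's. The function name and all parameters are hypothetical.

```python
import random

def town_share_of_extremes(n_total=20_000, n_town=500, town_sd=3.0,
                           top_k=24, seed=1):
    """Return (share of top_k from the town, share of bottom_k from the town).

    All managers draw returns with mean zero; the only difference is that
    town managers have standard deviation town_sd instead of 1.0.
    """
    rng = random.Random(seed)
    results = []
    for i in range(n_total):
        from_town = i < n_town          # the first n_town managers are from the town
        sd = town_sd if from_town else 1.0
        results.append((rng.gauss(0.0, sd), from_town))

    results.sort(key=lambda r: r[0], reverse=True)
    top = results[:top_k]
    bottom = results[-top_k:]
    top_share = sum(is_town for _, is_town in top) / top_k
    bottom_share = sum(is_town for _, is_town in bottom) / top_k
    return top_share, bottom_share

top_share, bottom_share = town_share_of_extremes()
print(f"Town share of top 24: {top_share:.2f}, of bottom 24: {bottom_share:.2f}")
```

Under these assumed numbers, the town's managers are only 2.5% of the population yet tend to crowd both the top 24 and the bottom 24, which is the pattern a pure higher-variance story would produce without any difference in skill.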

My argument here is that the story as I recall it being told in Schroeder's book is an incomplete rebuttal of the efficient markets hypothesis, not a defense of that hypothesis. I'm not a finance theorist; I'm in marketing, where we do believe that some marketers are much better than others, so I have no bone to pick either with the theory or its critics.

I'm just a big fan of clear thinking in matters managerial or business.

Monday, May 16, 2011

Two quick thoughts about Microsoft's purchase of Skype

1. Valuation of a property like Skype is a lot more than just some multiple of earnings.

Quite a few bloggers, twitterers, and forum participants jumped on Facebook, Google, and Microsoft for their billion-dollar valuations of Skype. Usually the criticism was based on Skype's lackluster earnings. This is a massively myopic point of view.

One can acquire a company for many reasons beyond its current revenue stream: the company may own resources that it is not adequately exploiting, such as technology or highly valuable personnel; it may have a valuable brand or a large user base (which is certainly true for Skype); it may have valuable information about its customers (again true for Skype as the communication graph -- not just the link graph -- is valuable); and finally, the company may have untapped revenue potential, just not with their current revenue model.

As a general rule, just because one cannot think of a way to monetize something, it doesn't mean that there is no way to monetize that thing.

Another possible reason to buy a company is strategy at a corporate level: to stop it from developing into a competitor for some of our products, to stop competitors from buying it (and therefore becoming better competitors), and to signal commitment to a specific market.

2. Perhaps there's a little Winner's Curse going on here, or perhaps not

When three companies (Google, Facebook, and Microsoft) compete for the same company, there's always the possibility of a little Winner's Curse effect:

Assume that the value of Skype to these companies includes a big fraction that is common, meaning that it will be realized independent of the owner. Call that true common value $v$. To simplify, for now, assume that there are no synergies or strategic advantages for any of the buying companies; so the whole value is $v$.

Using all the information available, Google, Facebook, and Microsoft estimate $v$, each coming up with a number: $\tilde v_G$, $\tilde v_F$, and $\tilde v_M$. Note that these are estimates of the same $v$, not a representation of different actual value that Skype might have for these three companies. The estimates are different because each company uses different financial models and has access to different information or weighs it differently.

In a competitive market the winner will be the company who has the highest estimate, so we can assume that $\tilde v_M > \tilde v_G$ and $\tilde v_M > \tilde v_F$. The question now becomes: is what Microsoft paid for Skype higher than $v$ (the true $v$)?

Probabilistically the winning $\tilde v$ is likely to be higher than $v$,* since it's the maximum of three unbiased estimates -- one hopes these three companies have good financial advisers -- of the true $v$. Microsoft knows this and may shade its offer down a little from $\tilde v_M$. But even so, there's a chance that it paid too much.

Except that we're ignoring all the non-common value: synergies, strategic fit with Microsoft's other properties, and signaling to the market that Microsoft isn't yet a zombie like IBM was in the '90s.

There's a lot going on between Skype and Microsoft that the online commentariat missed. Then again, that's the fun of reading it.

(Hey, I finally wrote a business post in this blog that I repositioned as a business blog over a month ago!)

-------------------

* If the errors in the three estimates of $v$ are independent and their distribution is symmetrical around zero (ergo the median of $\tilde v$ is $v$), the probability that the maximum of three observations $\tilde v$ is higher than $v$ is $1 - (1/2)^3 = 7/8$.
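The $7/8$ figure can be checked with a short simulation. This is only a sketch: it assumes independent, normally distributed estimation errors, and the true value, the noise level, and the function name are illustrative choices that have nothing to do with the actual Skype numbers.

```python
import random

def prob_max_exceeds_true_value(v=100.0, n_bidders=3, noise_sd=10.0,
                                trials=100_000, seed=7):
    """Estimate Pr(max of n_bidders unbiased estimates > v) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each bidder's estimate is the true value plus symmetric noise.
        estimates = [v + rng.gauss(0.0, noise_sd) for _ in range(n_bidders)]
        if max(estimates) > v:
            hits += 1
    return hits / trials

exact = 1 - 0.5 ** 3   # all three estimates fall below v with probability (1/2)^3
simulated = prob_max_exceeds_true_value()
print(f"exact = {exact}, simulated = {simulated:.3f}")
```

The same function with a larger `n_bidders` shows how quickly the probability approaches one as more bidders compete for the same common value.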

Sunday, May 15, 2011

Factoring game and algorithmic game theory

(A vignette inspired by Ehud Kalai's talk at the Lens 2011 Conference.)

Consider the following sequential-move game:

Player 1 chooses an integer $n > 1$.
Player 2 chooses an integer $k > 1$.
Player 2 wins if $k$ is a prime factor of $n$; Player 1 wins if $k$ is not a prime factor of $n$.

The backward induction solution to this game is obvious: Player 2 picks $k$ such that it is a prime factor of $n$, and Player 1 picks any $n$, which is irrelevant because Player 2 always wins.

This game, created by Ben-Sasson, Kalai, and Kalai, called the Factoring Game, illustrates a problem with the concept of equilibrium: it assumes that Player 2 can solve a complex problem (integer factorization) in useful time.*

So that "because Player 2 always wins" boldface part above should really be preceded by "assuming that Player 2 has a quantum computer to run Shor's algorithm." In other words, in actual useful time the more likely event is that Player 1 wins (by picking a number that is the product of two very large primes, for example).
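To make the computational point concrete, here is a sketch of Player 2's most naive strategy, trial division. The function names are hypothetical, and the two primes used for $n$ (the Mersenne primes $2^{17}-1$ and $2^{19}-1$) are deliberately tiny so that the example finishes instantly.

```python
def smallest_prime_factor(n):
    """Player 2's naive strategy: trial division, about sqrt(n)/2 steps at worst."""
    if n % 2 == 0:
        return 2
    k = 3
    while k * k <= n:
        if n % k == 0:
            return k
        k += 2
    return n  # no divisor found up to sqrt(n), so n itself is prime

def player_2_wins(n, k):
    """Player 2 wins if k > 1 divides n and k is itself prime."""
    return k > 1 and n % k == 0 and smallest_prime_factor(k) == k

# Player 1 picks a product of two (small) primes.
n = (2**17 - 1) * (2**19 - 1)
k = smallest_prime_factor(n)
print(k, player_2_wins(n, k))
```

The loop runs on the order of $\sqrt{n}$ times, so if Player 1 multiplies two primes with hundreds of digits each, this strategy does not terminate in any useful time, and no known classical algorithm rescues Player 2.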

The Factoring Game exposes a problem with game-theoretic solutions to some strategic problems: they don't take into account computability or complexity. That is a problem for many real-world situations, like paid search and auction mechanism design.

There's a new-ish field at the intersection of economic game theory and computer science, algorithmic game theory. This field explicitly models computation as part of the process of solving games. Something that we should keep our eyes open for, as it already has real world applications in search, mechanism design, and online auctions.

Game theory is really expanding its purview: modal logic, computational (simulated, numerical), algorithmic (computation-theoretic), and behavioral versions... good times.

Reference: E. Ben-Sasson, A. Kalai, and E. Kalai. "An approach to bounded rationality." In Advances in Neural Information Processing Systems, Volume 19, pages 145–152. MIT Press, Cambridge, MA, 2006.

* This game actually only illustrates the problem of subgame-perfect Nash equilibrium, not all equilibrium concepts. Hey, I had to take a ton of game theory, might as well use some of it to be pedantic here.

Friday, May 13, 2011

A problem with the "less choice is better" idea

(Reposted because Blogger mulched its first instance.)

There's some research that shows that people do better when they have fewer choices. For example, when offered twenty different types of jam people will buy less jam (and those that buy will be less happy with their purchase) than when offered four types of jam.

There's some controversy around these results, but let us assume arguendo that, perhaps due to cognitive cost, perhaps due to stochastic disturbances in the choice process and associated regret, the result is true.

That does not imply what most people believe it implies.

The usual implication is something like: Each person does better with a choice set of four products; therefore let us restrict choice in this market to four products.

Oh! My! Goodness!

It's as if segmentation had never been invented. Even if each person is better off choosing when there are only four products in the market, instead of twenty, that doesn't mean that everybody wants the same four products in the choice set.

In fact, if there are 20 products total, there are $20!/(16! \times 4!) = 4845$ possible 4-unit choice sets.

Even when restricting an individual's choice would make that individual better-off, restricting the population's choices has a significant potential to make most individuals worse-off.
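The count of choice sets is easy to verify in Python. The second calculation is an added illustration, not part of the original argument: it assumes a customer whose favorite is one specific product out of the 20 and asks how many of the possible 4-product sets would include it.

```python
from math import comb, factorial

n_products, set_size = 20, 4

# Number of distinct 4-product choice sets out of 20 products.
by_formula = factorial(n_products) // (
    factorial(n_products - set_size) * factorial(set_size)
)
n_sets = comb(n_products, set_size)

# Sets that contain one particular product: fix it, choose the other 3 from 19.
sets_with_favorite = comb(n_products - 1, set_size - 1)
share_with_favorite = sets_with_favorite / n_sets

print(by_formula, n_sets, sets_with_favorite, share_with_favorite)
```

Under that assumption, a market-wide restriction chosen without regard to this customer would include their favorite product in only one out of five possible choice sets.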

Monday, May 9, 2011

That 81% prediction, it looks good, but needs further elaboration

Bobbing around the interwebs today we find a post about a prediction of UBL's location. A tip of the homburg to Drew Conway for being the first mention I saw. Now, for the prediction itself.

As impressive as an 81% chance attributed to the actual location of UBL is, it raises three questions. These are important questions for any prediction system after its prediction is realized. Bear in mind that I'm not criticizing the actual prediction model, just the attitude of cheering for the probability without further details.

Yes, 81% is impressive; did the model make other predictions (say the location of weapons caches), and if so were they also congruent with facts? Often models will predict several variables and get some right and others wrong. Other predicted variables can act as quality control and validation. (Choice modelers typically use a hold-out sample to validate calibrated models.) It's hard to validate a model based on a single prediction.

Equally important is the size of the space of possibilities relative to the size of the predicted event. If the space was over the entire world, and the prediction pointed to Abbottabad but not Islamabad, that's impressive; if the space was restricted to Af/Pk and the model predicted the entire Islamabad district, that's a lot less impressive. I predict that somewhere in San Francisco there's a panhandler with a "Why lie, the money's for beer" poster; that's not an impressive prediction. If I predict that the panhandler is on the Market - Valencia intersection, that's impressive.

Selection is the last issue: was this the only location model for UBL or were there hundreds of competing models and we're just seeing the best? In that case it's less impressive that a model gave a high probability to the actual outcome: it's sampling on the dependent variable. For example, when throwing four dice once, getting 1-1-1-1 is very unlikely ($1/6^4 \approx 0.0008$); when throwing four dice 10 000 times, it's very likely that the 1-1-1-1 combination will appear in one of them (that probability is $1-(1- 1/6^4)^{10000} \approx 1$).

Rules of model building and inference are not there because statisticians need a barrier to entry to keep the profession profitable. (Though they sure help with paying the bills.) They are there because there are a lot of ways in which one can make wrong inferences from good models.

Usama Bin Laden had to be somewhere; a sufficiently large set of models with large enough isoprobability areas will almost surely contain a model that gives a high probability to the actual location where UBL was, especially if it was allowed to predict the location of the top hundred Al-Qaeda people and it just happened to be right about UBL.

Lessons: 1) the value of a predicted probability $\Pr(x)$ for a known event $x$ can only be understood with the context of the predicted probabilities $\Pr(y)$ for other known events $y$; 2) we must be very careful in defining what $x$ is and what the space $\mathcal{X}: x \in \mathcal{X}$ is; 3) when analyzing the results of a model, one needs to control for the existence of other models [cough] Bayesian thinking [/cough].

Effective model building and evaluation need to take into account the effects of limited reasoning by those reporting model results, or, in simpler terms, make sure you look behind the curtain before you trust the magic model to be actually magical.

Summary of this post: in acrostic!
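As a postscript, the dice arithmetic from the selection paragraph above can be reproduced in a few lines of Python. The function name is hypothetical, and reading each "attempt" as a competing model is an analogy rather than a claim about how many location models actually existed.

```python
def prob_at_least_one_hit(p_single, attempts):
    """Probability that an event of probability p_single occurs at least
    once in `attempts` independent tries."""
    return 1 - (1 - p_single) ** attempts

p_once = 1 / 6**4                        # 1-1-1-1 on a single throw of four dice
p_in_10000 = prob_at_least_one_hit(p_once, 10_000)

print(f"single throw: {p_once:.4f}, at least once in 10,000 throws: {p_in_10000:.4f}")
```

Varying `attempts` shows how quickly an unlikely success becomes near-certain once enough independent tries are made and only the best one is reported.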