Saturday, November 21, 2009

Statisticians and marketers: two professions separated by a common dataset.

Some days ago, in response to a friend's question, I wrote a tweet about nested choice models:
Someone else saw the tweet and sent me a question (edited):
Why choose an ordered logit for quantity instead of a count model, say a Poisson regression?
This is a reasonable question: the Poisson "regression" (a.k.a. modeling the logarithm of the Poisson lambda parameter as a linear combination of independent variables) is widely used to model physical count processes, and it makes writing the likelihood function easier. (In the old days, we wrote likelihood functions by hand! Kids these days, I tell you, with the clothes and the music and the hair, I mean, the Mathematica and the R libraries and the powerful home computers...)
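For readers who haven't written one by hand lately, here is a minimal sketch of that likelihood in plain Python, assuming a single covariate plus an intercept and the usual log link. The function names (poisson_loglik, fit_poisson) are hypothetical, and Newton-Raphson is just one convenient way to maximize the likelihood; R's glm with a Poisson family would do the same job.

```python
import math

def poisson_loglik(b0, b1, xs, ys):
    """Poisson log-likelihood with log link: lambda_i = exp(b0 + b1 * x_i)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        ll += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return ll

def fit_poisson(xs, ys, iters=50, tol=1e-10):
    """Maximize poisson_loglik by Newton-Raphson (intercept + one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient g and negative Hessian H of the log-likelihood
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            lam = math.exp(b0 + b1 * x)
            g0 += y - lam
            g1 += (y - lam) * x
            h00 += lam
            h01 += lam * x
            h11 += lam * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, using the 2x2 inverse
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

With a binary covariate the estimates should reproduce the group means: for fit_poisson([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 5, 6]), exp(b0) is the mean count when x = 0 (that is, 2) and exp(b0 + b1) the mean when x = 1 (that is, 5).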

The question illustrates an important difference between the statistical view of purchase data and the marketing analysis view of purchase data.

Marketing research, both the commercial measurement kind and the experimental academic kind, has produced a large body of knowledge about consumer behavior. One of its findings is that a count process is unlikely to be a good description of how people buy products in varying quantities.

Instead, the number of cans of tuna purchased by customer Olivia is likely to fall into a handful of ordinal categories: zero, lower than her normal consumption over an inter-purchase interval, normal consumption over an inter-purchase interval, higher than normal consumption over an inter-purchase interval, and very high. It's this categorical variable that should be regressed (order-logited, methinks!) on the variables thought to change Olivia's behavior. For Olivia the categories might be
{0}, {1,2,3}, {4,5}, {6,7,8,9}, {n | n>9}
These categories can be defined before estimating the nested choice model; coding them into a separate dependent variable (DV) for the ordered logit is trivial. (The {0} category is never coded, as it is subsumed by the inter-purchase timing part of the nested model.)
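Coding that DV really is a few lines. A minimal sketch, assuming each customer's category boundaries are already known (here hard-coded from Olivia's example; how to derive them from her purchase history is a separate question). The names OLIVIA_UPPER_BOUNDS and quantity_category are hypothetical.

```python
from bisect import bisect_left

# Hypothetical: upper bounds of Olivia's categories {1,2,3}, {4,5}, {6,...,9};
# anything above 9 falls into the open top category {n | n > 9}.
OLIVIA_UPPER_BOUNDS = (3, 5, 9)

def quantity_category(n, upper_bounds=OLIVIA_UPPER_BOUNDS):
    """Return the ordinal category (1 = lowest) for a purchase quantity n.

    Zero is not coded: no-purchase occasions belong to the inter-purchase
    timing part of the nested model, so None is returned for them.
    """
    if n <= 0:
        return None
    # bisect_left gives the index of the first bound >= n,
    # i.e. the category that n falls into (0-based)
    return bisect_left(upper_bounds, n) + 1
```

For example, [quantity_category(q) for q in [0, 2, 5, 7, 12]] gives [None, 1, 2, 3, 4].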

Using the categories captures the behavior that marketers really want to understand: how does a marketing action M change Olivia's quantity consumption on Olivia's own scale, rather than on the shared scale of natural numbers?
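To make the "Olivia scale" concrete, here is a sketch of how an ordered logit turns a marketing action into shifts across her quantity categories. It assumes the standard ordered logit, P(Y <= k) = logistic(tau_k - eta), with eta the linear index; the cutpoints and the coefficient beta_m are made-up illustrative numbers, not estimates, and the function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ologit_probs(eta, cutpoints):
    """Category probabilities for an ordered logit.

    eta: linear index x'beta.
    cutpoints: strictly increasing tau_1 < ... < tau_{K-1} for K categories.
    P(Y <= k) = logistic(tau_k - eta); category probabilities are differences.
    """
    cdf = [logistic(t - eta) for t in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for c in cdf:
        probs.append(c - prev)
        prev = c
    return probs

# Hypothetical values: three cutpoints for Olivia's four positive-quantity
# categories, and a coefficient for a marketing action M (dummy variable).
cutpoints = [-1.0, 0.5, 2.0]
beta_m = 0.8

base = ologit_probs(0.0, cutpoints)       # M absent
with_m = ologit_probs(beta_m, cutpoints)  # M present: latent index shifts up
shift = [w - b for w, b in zip(with_m, base)]
```

With a positive beta_m, probability moves out of Olivia's lowest category and into her highest; the middle categories can move either way, depending on where the cutpoints sit.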

And this is one of the reasons why people analyzing marketing data need to know the basics of marketing model-building: thinking like a marketer is different from building models based on physical processes.

Post-Bloggum: Yes, I know about the identification problems and how sensitive the estimation is to numerical issues. Those really important details are part of the (unacknowledged and mostly misunderstood) barriers to entry to the profession.