There are many documented cases of behavior deviating from the normative "rational" prescription of decision sciences and economics. For example, in the book Predictably Irrational, Dan Ariely tells us how he got a large number of Sloan School MBA students to change their choices using an irrelevant alternative.
In Ariely's example, two groups of students choose a subscription to The Economist. The first group was given three options: (online only, $\$60$); (paper only, $\$120$); or (paper+online, $\$120$). Overwhelmingly they chose the last option. The second group was given two options: (online only, $\$60$) or (paper+online, $\$120$). Overwhelmingly they chose the first option.
Since no one chooses the (paper only, $\$120$) option, it should be irrelevant to the choices. However, removing it makes a large number of respondents change their minds. This is what is called a behavioral bias: an actual behavior that deviates from "rational" choice. (Technically these choices violate the Strong Axiom of Revealed Preference.)
(If you're not convinced that the behavior described is irrational, consider the following isomorphic problem: a waiter offers a group of people three desserts: ice cream, chocolate mousse, and fruit salad; most people choose the fruit salad, no one chooses the mousse. Then the waiter apologizes: it turns out there's no mousse. At that point most of the people who had ordered fruit salad switch to ice cream. This behavior is the same -- substitute letters for the options if there's any doubt -- as the one in Ariely's example. And few people would consider the fruit-salad-to-ice-cream switchers rational.)
OK, so people do, in some cases (perhaps in a majority of cases), behave in ways that are "irrational" as defined by decision science and economics models. This is not entirely surprising, as those models are abstractions of idealized behavior and people are concrete physical entities with limitations and -- some argue -- faulty software.
What is really enlightening is how people who know about this feel about the biases.
IGNORE. Many academic economists and others who use economics models try to ignore these biases. Inasmuch as these biases can be more or less important depending on the decision, the persons involved, and the context, this ignorance might work for the economists, for a while. However, pretending that reality is not real is not a good foundation for Science, or even life.
ATTACK. A number of people use the existence of biases as an attack on established economics. This is how science evolves, with theories being challenged by evidence and eventually changing to incorporate the new phenomena. Some people, however, may be motivated by personal animosity towards economics and decision sciences; this creates a bad environment for knowledge evolution -- it becomes a political game, never good news for Science.
EXPLOIT. Books like Nudge make this explicit, but many people think of these biases as a way to manipulate others' behavior. Manipulate is the appropriate verb here, since these people (maybe with what they think is the best of intentions -- I understand these pave the way to someplace...) want to change others' behavior without actually telling these others what they are doing. In addition to the underhandedness that, were this a commercial application, the Nudgers would be trying to outlaw, this type of attitude reeks of "I know better than others, but they are too stupid to agree." Underhanded manipulation presented as a virtue; the world certainly has changed a lot.
ADDRESS AND MANAGE. A more productive attitude is to design decisions and information systems to minimize the effect of these biases. For example, in the decision above, both scenarios could be presented, the inconsistency pointed out, and then a separate part-worth question asked (i.e. what is each of the two elements -- print and online -- worth separately?). Note that this is the one attitude that treats behavioral biases as damage and finds ways to route decisions around them, unlike the other three attitudes.
In case it's not obvious, my attitude towards these biases is to address and manage them.
Sunday, September 18, 2011
Probability interlude: from discrete events to continuous time
Lunchtime fun: the relationship between Bernoulli and Exponential distributions.
Let's say the probability of Joe getting a coupon for Pepsi in any given time interval $\Delta t$, say a month, is given by $p$. This probability depends on a number of things, such as intensity of couponing activity, quality of targeting, Joe not throwing away all junk mail, etc.
For a given integer number of months, $n$, we can easily compute the probability, $P$, of Joe getting at least one coupon during the period, which we'll call $t$, as
$P(n) = 1 - (1-p)^n$.
Since the period $t$ is $t= n \times \Delta t$, we can write that as
$P(t) = 1 - (1-p)^{\frac{t}{\Delta t}}.$
Or, with a bunch of assumptions that we'll assume away,
$P(t) = 1- \exp\left(t \times \frac{\log (1-p)}{\Delta t}\right).$
Note that $\log (1-p)<0$. Defining $r = - \log (1-p) /\Delta t$, we get
$P(t) = 1 - \exp (- r t)$.
And that is the relationship between the Bernoulli distribution and the Exponential distribution.
We can now build continuous-time analyses of couponing activity. Continuous analysis is much easier to do than discrete analysis. Also, though most simulators are, by computational necessity, discrete, building them based on continuous time models is usually simpler and easier to explain to managers using them.
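To make the algebra concrete, here is a minimal simulation sketch in Python (the monthly probability p = 0.2 and the six-month horizon are made up for illustration) checking that the discrete monthly model and the continuous-time exponential model agree:

import math
import random

p, dt, t = 0.2, 1.0, 6.0            # monthly coupon probability, month length, horizon
r = -math.log(1.0 - p) / dt         # the rate r defined above

n = int(t / dt)                     # number of months in the horizon
p_discrete = 1.0 - (1.0 - p) ** n   # P(at least one coupon) from the discrete model
p_continuous = 1.0 - math.exp(-r * t)   # P(t) = 1 - exp(-r t) from the exponential model

# Monte Carlo check of the discrete model: simulate n monthly Bernoulli draws.
random.seed(1)
trials = 100_000
hits = sum(any(random.random() < p for _ in range(n)) for _ in range(trials))

print(p_discrete, p_continuous, hits / trials)   # all three should agree closely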
Labels: mathematics, Probability
Saturday, September 17, 2011
Small probabilities, big trouble.
After a long – work-related – hiatus, I'm back to blogging with a downer: the troublesome nature of small probability estimation.
The idea for this post came from a speech by Nassim Nicholas Taleb at Penn. Though the video is a bit rambling, it contains several important points. One that is particularly interesting to me is the difficulty of estimating the probability of rare events.
For illustration, let's consider a Normally distributed random variable $P$, and see what happens when small model errors are introduced. In particular, we want to see how the probability density $f_{P}(\cdot)$ predicted by four different models changes as a function of the distance from zero, $x$. The higher the $x$, the more infrequently the event $P = x$ happens.
The densities are computed in the following table:

The first column gives $f_{P}(x)$ for $P \sim \mathcal{N}(0,1)$, the base case. The next column is similar except that there's a 0.1% increase in the variance (10 basis points*). The third column is the ratio of these densities. (These are not probabilities, since $P$ is a continuous variable.)
Two observations jump out at us:
1. Near the mean, where most events happen, it's very difficult to separate the two cases: the ratio of the densities up to two standard deviations ($x=2$) is very close to 1.
2. Away from the mean, where events are infrequent (but potentially with high impact), the small error of 10 basis points is multiplied: at highly infrequent events ($x>7$) the density is off by over 500 basis points.
So: it's very difficult to tell the models apart with most data, but they make very different predictions for uncommon events. If these events are important when they happen, say a stock market crash, this means trouble.
Moving on, the fourth column uses $P \sim \mathcal{N}(0.001,1)$, the same 10 basis points error, but in the mean rather than the variance. Column five is the ratio of these densities to the base case.
Comparing column five with column three we see that similarly sized errors in mean estimation have less impact than errors in variance estimation. Unfortunately variance is harder to estimate accurately than the mean (it uses the mean estimate as an input, for one), so this only tells us that problems are likely to happen where they are more damaging to model predictive abilities.
Column six shows the effect of a larger variance (100 basis points off the standard, instead of 10); column seven shows the ratio of this density to the base case.
With an error of 1% in the estimate of the variance it's still hard to separate the models within two standard deviations (for a Normal distribution about 95% of all events fall within two standard deviations of the mean), but the error in density estimates at $x=7$ is 62%.
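Here is a minimal Python sketch that recomputes density ratios like those in the table; the perturbations are my reading of the setup, with the 10- and 100-basis-point errors applied to the standard deviation (the exact figures depend on that choice, but the qualitative pattern does not):

import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of N(mu, sigma^2) at x.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

base = lambda x: normal_pdf(x)                       # base case: N(0, 1)
scale_10bp = lambda x: normal_pdf(x, sigma=1.001)    # scale off by 10 basis points
scale_100bp = lambda x: normal_pdf(x, sigma=1.01)    # scale off by 100 basis points
mean_10bp = lambda x: normal_pdf(x, mu=0.001)        # mean off by 10 basis points

print(" x  scale+10bp  scale+100bp  mean+10bp   (density ratios to the base case)")
for x in range(0, 9):
    print(f"{x:2d}  {scale_10bp(x)/base(x):10.4f}  {scale_100bp(x)/base(x):11.4f}"
          f"  {mean_10bp(x)/base(x):9.4f}")

# Near the mean the ratios are ~1 (the models look identical); in the tail (x >= 7)
# the same tiny parameter errors translate into large density errors.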
Small-probability events are very hard to predict because, most of the time, the information available is not enough to choose between models with very close parameters, and yet those models predict very different things for infrequent events.
Told you it was a downer.
-- -- --
* Some time ago I read a criticism of this nomenclature by someone who couldn't see its purpose. The purpose is good communication design: when there's a lot of 0.01% and 0.1% being spoken in a noisy environment, it's a good idea to say "one basis point" or "ten basis points" instead of "point zero one" or "zero point zero one" or "point zero zero one." It's the same reason we say "Foxtrot Uniform Bravo Alpha Romeo" instead of "eff u bee a arr" in audio communication.
NOTE for probabilists appalled at my use of $P$ in $f_{P}(x)$ instead of the more traditional nomenclature $f_{X}(x)$, where the uppercase $X$ means the variable and the lowercase $x$ the value: most people get confused when they see something like $p=\Pr(x=X)$.
Monday, August 29, 2011
Decline and fall of Western Manufacturing - a pessimistic reading of Pisano and Shih (2009)
Those who don't know history are condemned to repeat it.
Unfortunately those of us who do know history get dragged right along with the others, because we live in a world where everything is connected to everything else.

Above is my visualization of Pisano and Shih's 2009 Harvard Business Review article "Restoring American Competitiveness." This is a stylized version of a story that has happened in several industries.
Step 1: Companies start outsourcing their manufacturing operations to companies (or countries) which can perform them in a more cost-effective manner. Perhaps these companies/countries have cheaper labor, fewer costly regulations, or less overhead.
Step 2: Isolated from their manufacture, companies lose the skills for process engineering. After all, improving manufacturing processes is a task that depends on continuous experimentation and feedback from the manufacturing process. If the manufacturing process is outsourced, the necessary interaction between manufacturing and process engineers happens progressively more inside the contractor, not the original manufacturer.
Step 3: Without process engineering to motivate it, the original manufacturer (and the companies supporting it in the original country, in the diagram the US) stops investing in process technology development. For example, the companies that developed machine tools for US manufacturers in conjunction with US process engineers now have to do so with Taiwanese engineers in Taiwan, which leads to relocation of these companies and eventually of the skilled professionals.
Step 4: Because of spillovers in technological development between process technologies and product technologies (including the development of an engineering class and engineering support infrastructure), more and more product technology development is outsourced. For example, as fewer engineering jobs are available in the original country, fewer people go to engineering school; the opposite happens in the outsourced-to country, where an engineering class grows. That growth is a spillover that is seldom accounted for.
Step 5: As more and more technology development happens in the outsourced-to country, it captures more and more of the product innovation process, eventually substituting for the innovators in the original manufacturer's country. Part of this innovation may still be under contract with the original manufacturer, but the development of innovation skills in the outsourced-to country means that at some point it will have its own independent manufacturers (who will compete with the original manufacturer).
Pisano and Shih are optimists, as their article proposes solutions to slow, stop, and reverse this process of technological decline of the West (in their case, the US). It's worth a read (it's not free but it's cheaper than a day's worth of lattes, m'kay?) and ends on an upbeat note.
I'm less optimistic than Pisano and Shih. Behold:
Problem 1: Too many people and too much effort dedicated to non-wealth-creating activities and too many people and too much effort aimed at stopping wealth-creating activities.
Problem 2: Lack of emphasis on useful skills (particularly STEM, entrepreneurship, and "maker" culture) in education. Sadly accompanied by a sense of entitlement and self-confidence inversely proportional to the actual skills.
Problem 3: Too much public discourse (politicians of both parties, news media, entertainment) which vilifies the creation of wealth and applauds the forcible redistribution of whatever wealth is created.
Problem 4: A generalized confusion between wealth and pieces of fancy green paper with pictures of dead presidents (or Ben Franklin) on them.
Problem 5: A lack of priorities or perspective beyond the immediate sectorial interests.
We are doomed!
Labels: business, management, manufacturing, Strategy
Monday, August 22, 2011
Preparing instruction is different from preparing presentations
The title bears repeating, as many people confuse instruction and presentation preparation skills and criteria for success: Preparing instruction is different from preparing presentations.
My 3500-word post on preparing presentations is exactly for that purpose, preparing presentations. I could try to write a post for preparing instruction, but it would quickly get to book size. In fact, I recommend several books in this post describing the evolution of information design in my teaching approach. (The most relevant books for teaching are at the addendum to this post.)
I made a diagram depicting my process of preparing for an instruction event (the diagram was for my personal use, but there's no reason not to share it):

And, for comparison, the process for preparing presentations:

Because they look similar, I need to point out that the tools used in each phase of the process are different for presentations and for instruction.
I'm a big fan of participant-centered learning (though not necessarily the HBS cases that people always associate with PCL); the idea is simple: students learn from doing, not from watching the instructor do. So, many of the "materials" (more precisely, most of the time in the "plan with timing" part of the diagram) in an instruction event are audience work: discussions, examples brought by the audience (to complement those brought by the instructor) and exercises. These are not materials that can be used in a speech or a presentation to a large audience.
Also, while a story works as a motivator for both presentations and instruction, I tend to use exercises or problems as motivators for instruction. For example, I start a class on promotion metrics by asking "how do you measure the lift" of some promotional activity, and proceed from there. By making it a management task that they have to do as part of their jobs, I get some extra attention from the audience. Plus, they can immediately see how the class will help them with their jobs.*
There are presentations that are mostly for instruction purposes, and there are parts of instruction events that are presentations. But never mistake one for the other: preparing instruction is different from preparing presentations.
Though so much instruction is so poorly prepared that even the basics of presentation preparation will help make instruction less of a disaster, that's just a step towards instruction-specific preparation.
- - - - - - - - - - - -
*I have a large variety of exercises for each knowledge unit I teach, and they are not all of the form "here's a problem, what's the solution?" Some are of the forms "here's what a company is doing, what are they trying to achieve?" and "here's a problem, here's what the company is doing, what is wrong with that?"
Addendum: Two books on preparation (and delivery) of instruction, from the post describing the evolution of information design in my teaching approach:
Tools for teaching, by Barbara Gross Davis, covers every element of course design, class design, class management, and evaluation. It is rather focused on institutional learning (like university courses), but many of the issues, techniques, and checklists are applicable in other instruction environments.
Designing effective instruction, by Gary Morrison, Steven Ross, and Jerrold Kemp, complements Tools for teaching. While TfT has the underlying model of a class, this book tackles the issues of training and instruction from a professional service point of view. (In short: TfT is geared towards university classes, DEI is geared towards firm-specific Exec-Ed.)
Labels: presentations, teaching
Thursday, July 28, 2011
A simple, often overlooked, problem with models
There are just too many possibilities.
Let's say we have one dependent variable, $y$, and ten independent variables, $x_1,\ldots,x_{10}$. How many models can we build? For simplicity let's keep our formulation linear (in the usual sense of the word, that is linear in the coefficients; see footnote).
Inexcusably wrong answer: 11 models.
Wrong answer: 1024 models.
Right-ish answer: $1.8 \times 10^{308}$ models.
Right answer: an infinity of models.
Ok, 1024 is the number of models which include at most one instance of each variable and no interaction. Something like
$ y = \beta_0 + \beta_1 \, x_1 + \beta_3 \, x_3 + \beta_7 \, x_7$ ,
of which there are $2^{10}$ models. (Since the constant $\beta_0$ can be zero by calibration, we'll include it in all models -- otherwise we'd have to demean the $y$.)
Once we consider possible interactions among variables (like $x_1 x_7 x_8$, a three-way interaction), there are $2^{10}$ possible terms -- the constant, the main effects, and interactions of every order -- and therefore $2^{2^{10}} \approx 1.8 \times 10^{308}$ possible models. For comparison, the number of atoms in the known universe is estimated to be on the order of $10^{80}$.
Of course, each variable can enter the model in a variety of functional forms: $x_1^{2}$, $\log(x_7)$, $\sin(5 \, x_9)$ or $x_3^{-x_{2}/2}$, for example, making it an infinite number of possibilities. (And there can be interactions between these different functions of different variables, obviously.)
(Added on August 11th.) Using polynomial approximations for generalized functions, say to the fourth degree, the total number of possible terms is now $5^{10}=9765625$, as any variable may enter a term with one of five powers (0 through 4), and the total number of models is $2^{5^{10}}$, or around $10^{2900000}$. (End of addition.)
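As a quick back-of-the-envelope check of these counts, here is a sketch in Python, following the post's convention of counting the constant as one of the possible terms:

import math

n_vars = 10

subset_models = 2 ** n_vars             # models with main effects only (constant always included): 1024
n_terms = 2 ** n_vars                   # possible terms: constant, main effects, interactions of every order
all_interaction_models = 2 ** n_terms   # 2**1024

n_poly_terms = 5 ** n_vars              # polynomial terms up to degree 4: each variable enters with power 0..4
poly_model_digits = int(n_poly_terms * math.log10(2)) + 1   # decimal digits of 2**(5**10)

print(subset_models)                       # 1024
print(len(str(all_interaction_models)))    # 309 digits, i.e. about 1.8 x 10^308
print(n_poly_terms)                        # 9765625
print(poly_model_digits)                   # about 2.9 million digits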
So here's a combinatorial riddle for statisticians: how can you identify a model out of, let's be generous, $1.8 \times 10^{308}$ with data in the exa- or petabyte range? That's almost three hundred orders of magnitude too little, methinks.
The main point is that any non-trivial set of variables can be modeled in a vast number of ways, which means that a limited number of models presented for appreciation (or review) necessarily includes an inordinate amount of judgement from the model-builder.
It's unavoidable, but seldom acknowledged.
--------------
The "linear in coefficients" point is the following. Take the following formulation, which is clearly non-linear in the $x$:
$y = \beta_0 + \beta_1 \, x_1^{1/4} + \beta_2 \, x_1 \, x_7$
but can be made linear very easily by making two changes of variables: $ z_1 = x_1^{1/4}$ and $z_2 = x_1 \, x_7$.
In contrast, the model $y = \alpha \, \sin( \omega \, t )$ cannot be linearized in coefficients $\alpha$ and $\omega$.
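As a minimal numpy sketch of the change of variables (the data are simulated purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
x1, x7 = rng.uniform(1, 10, 200), rng.uniform(1, 10, 200)
y = 2.0 + 3.0 * x1 ** 0.25 + 0.5 * x1 * x7 + rng.normal(0, 0.1, 200)   # made-up data

# Change of variables: z1 = x1^(1/4), z2 = x1 * x7, then ordinary least squares.
Z = np.column_stack([np.ones_like(x1), x1 ** 0.25, x1 * x7])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta)   # recovers approximately [2.0, 3.0, 0.5]

# No such substitution linearizes y = alpha * sin(omega * t) in (alpha, omega).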
Sunday, July 24, 2011
Three thoughts on presentation advice
As someone who makes presentations for a living,* I regularly peruse several blogs and forums on presentations. Here are three thoughts on presentation advice, inspired by that perusal.
1. The problem with much presentation advice is that it's a meta exercise: a presentation about presentations. And it falls into what I like to call the Norman Critique of Tufte's Table Argument (NCoTTA). From Don Norman's essay "In Defense of PowerPoint":
"Tufte doesn't overload the audience in his own talks -- but that is because he doesn't present data as data, he presents data as examples of what slides and graphical displays might look like, so the fact that the audience might not have time to assimilate all the information is irrelevant."
It's funny that Tufte is actually one of the people who least deserve the NCoTTA; most presentation coaches make that error more often and to greater depths.
2. When an attendee of a short talk I recently gave asked me for quick advice on presentations I said: have a simple clear statement, in complete sentences, of what your presentation is supposed to achieve. He was flummoxed; I assume he wanted the secret sauce for my slides.
Here's a slide from that talk:

It's obvious that there is no secret sauce here; extending the cooking metaphor, what that slide shows is a good marinade: preparation. Though many presentation advice websites talk about rehearsal and working the room as preparation, what I mean is what this 3500-word post explains.
For example, knowing what the 100,000 SKU statistic is for, I chose to put the size of FMCG consideration sets as a footer, to contextualize the big number. Different uses of the big number get different footers to put it into the appropriate perspective. If all I wanted to do was illustrate how big that number is, I could say "if you bought a different SKU every day, you'd need almost 300 years to go through them all."
Most advice on presentations will not be useful because the content and the context of the presentation are much more important to the design of the presentation than generic rules. (Hence the NCoTTA problem so much advice has. Ditto for this slide, since I didn't explain what the talk was about.)
3. Speaking of Tufte, one of the things that separates him from the other presentation advocates is that he takes a full view of the communication process (partially illustrated in this post): from the speaker's data to the receiver's understanding. Here's a simple diagram to illustrate the sequence:
Most presentation advice is about the creation and, especially, the delivery of presentations. Tufte stands more or less alone as one who discusses the receiving and processing of presentation material: how to pay attention (not just being "engaged," but actually processing the information and checking for unstated assumptions, logical fallacies, psychological biases, or innumeracy) and how to elaborate on one's own, given presentation materials.
Other than Tufte and his constant reminder that receiving a presentation is an active process rather than a passive event, presentation coaches focus almost all their attention on the presenter-side processes. Many "Tufte followers" also miss this point: when processing a presentation by someone else they focus on the presentation itself (the slides, the design, the handouts) instead of the content of the presentation, i.e. the insights.
-- -- -- --
* Among other things, like teaching, creating original research, and writing.
Labels: presentations
