Tuesday, December 10, 2019

Analysis paralysis vs precipitate decisions

Making good decisions includes deciding when you should make the decision.

There was a discussion on Twitter where Tanner Guzy (whose writings and tweets about clothing provide a counterpoint to the stuffier subforums of The Style Forum and the traditionalist The London Lounge) expressed a common opinion that is, alas, too reductive:


The truth is out there... ahem, is more complicated than that:


Making a decision without enough information is precipitate and usually leads to wrong decisions: even if the outcome turns out well, that's down to luck, and relying on luck is not a good foundation for decision-making. The thing to do is to keep collecting information until the risk of making the decision is within acceptable parameters.

(If a decision has to be made by a certain deadline, then the risk parameters should work as a guide to whether it's better to pass on the opportunities afforded by the decision or to risk making the decision based on whatever information is available at that time.)

Once enough information has been obtained to make the decision risk acceptable, the decision-maker should commit to the appropriate course of action. If the decision-maker keeps postponing the decision and waiting for more information, that's what is correctly called "analysis paralysis."

Let us clarify some of these ideas with numerical examples, using a single yes/no decision for simplicity. Say our question is whether to short the stock of a company that's developing aquaculture farms in the Rub' al Khali.

Our quantity of interest is the probability that the right choice is "yes," call it $p(I_t)$, where $I_t$ is the set of information available at time $t$. At time zero we'll have $p(I_0) = 0.5$ to represent a no-information state.

Because we can hedge the decision somewhat, there's a defined range of probabilities for which the risk is unacceptable (say from 0.125 to 0.875 for our example), but outside of that range the decision can be taken: if the probability is consistently above 0.875 it's safe to choose yes, if it's below 0.125 it's safe to choose no.
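For concreteness, here is a minimal sketch of that decision rule in Python; the threshold names and the function are mine, not part of the original discussion:

```python
# A minimal sketch of the decision rule (threshold names are illustrative).
LOW, HIGH = 0.125, 0.875   # bounds of the zone of unacceptable risk

def decide(p):
    """Map the current probability that 'yes' is right to an action."""
    if p > HIGH:
        return "yes"
    if p < LOW:
        return "no"
    return "wait"            # inside the zone: keep collecting information
```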

Let's say we have some noisy data: there's one bit of information out there, $T$ (for true), which is either zero or one (zero means the decision should be no, one that it should be yes), but each data event is a noisy representation of $T$, call it $E_i$, where $i$ is the index of the data event, defined as

$E_i = T $ with probability $1 - \epsilon$  and

$E_i = 1-T $ with probability $\epsilon$,

where $\epsilon$ is the probability of an error. These data events could be financial analysts reports, feasibility analyses of aquaculture farms in desert climates, political stability in the area that might affect industrial policies, etc. As far as we're concerned, they're either favorable (if 1) or unfavorable (if 0) to our stock short.
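If we wanted to simulate these data events, a minimal Python sketch (the function name is mine) could be:

```python
import random

def noisy_event(T, epsilon, rng=random):
    """Return E_i: the true bit T with probability 1 - epsilon, flipped otherwise."""
    return T if rng.random() > epsilon else 1 - T
```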

Let's set $T=1$ for illustration, in other words, "yes" is the right choice (as seen by some hypothetical being with full information, not the decision-maker). In the words of the example decision, $T=1$ means it's a good idea to short the stock of companies that purport to build aquaculture farms in the desert (the "yes" decision).

The decision-maker doesn't know that $T=1$, and uses as a starting point the no-knowledge position, $p(I_0) = 0.5$.

The decision-maker collects information until such a time as the posterior probability is clearly outside the "zone of unacceptable risk," here the middle 75% of the probability range. Probabilities are updated using Bayes's rule, assuming that the decision-maker knows $\epsilon$, in other words the reliability of the data sources:

$p(I_{k+1} | E_{k+1} = 1) = \frac{ (1- \epsilon) \times p(I_k)}{(1- \epsilon) \times p(I_k) + \epsilon \times (1- p(I_k))}$  and

$p(I_{k+1} | E_{k+1} = 0) = \frac{ \epsilon \times p(I_k)}{  \epsilon \times p(I_k) + (1- \epsilon) \times (1- p(I_k)) }$.
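In code, those two update formulas become something like this (again a sketch, with names of my choosing):

```python
def update(p, event, epsilon):
    """Bayes update of p(I_k), the probability that T = 1, after observing E_{k+1}."""
    if event == 1:
        num, den = (1 - epsilon) * p, (1 - epsilon) * p + epsilon * (1 - p)
    else:
        num, den = epsilon * p, epsilon * p + (1 - epsilon) * (1 - p)
    return num / den
```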

For our first example, let's have $\epsilon=0.3$, a middle-of-the-road case. Here's an example (the 21 data events are in blue, but we can only see the ones because the zeros have zero height):


We get twenty-one reports and analyses; some (1, 4, 6, 8, 9, 13, 14, and 21) are negative (they say we shouldn't short the stock), while the others are positive; this data is used to update the probability, in red, and that probability is used to drive the decision. (Note that event 21 would be irrelevant as the decision would have been taken before that.)

In this case, making a decision before the 17th data event would be precipitate; for better resilience, one should wait at least two more events without the posterior re-entering the zone of unacceptable risk before committing to a yes, so making the decision only after event 19 isn't a case of analysis paralysis.
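Putting the pieces together, one possible way to simulate a full sequence, including the "wait a couple more events outside the zone" rule, is sketched below; it reuses `noisy_event` and `update` from above, and the exact form of the patience rule is my own reading of the examples:

```python
import random

def simulate(T=1, epsilon=0.3, n_events=21, low=0.125, high=0.875,
             patience=3, seed=None):
    """Run one sequence of noisy data events and report when it is safe to commit.

    Commits only after `patience` consecutive posteriors outside the zone of
    unacceptable risk (the first event that leaves the zone plus two confirming
    ones, as in the first example); returns (decision, event index), or
    ("undecided", n_events) if the posterior never stays outside the zone.
    """
    rng = random.Random(seed)
    p = 0.5                     # no-information prior, p(I_0)
    consecutive = 0             # posteriors in a row outside the risk zone
    for k in range(1, n_events + 1):
        e = noisy_event(T, epsilon, rng)
        p = update(p, e, epsilon)
        if p > high or p < low:
            consecutive += 1
            if consecutive >= patience:
                return ("yes" if p > high else "no", k)
        else:
            consecutive = 0     # fell back into the zone; start over
    return ("undecided", n_events)
```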

Another example, still with $\epsilon=0.3$:


In this case, committing to yes after event 13 would be precipitate, whereas after event 17 would be an appropriate time.

If we now consider cases with lower noise, $\epsilon=0.25$, we can see that decisions converge to the "yes" answer faster and also why one should not commit as soon as the first data event brings the posterior probability outside of the zone of unacceptable risk:



If we now consider cases with higher noise, $\epsilon=0.4$, we can see that it takes longer for the information to converge (longer than the 21 events depicted) and therefore a responsible decision-maker would wait to commit to the decision:



In the last example, the decision-maker might take a gamble after data event 18, but to be safe the commitment should only happen after a couple of events in which the posterior probability stays outside the zone of unacceptable risk.
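To get a feel for how the noise level changes the time to a safe commitment, one could rerun the simulation sketch above for different values of $\epsilon$ (purely illustrative; actual trajectories depend on the random draws):

```python
# Rough comparison of noise levels (one seed only; results vary across runs):
for eps in (0.25, 0.3, 0.4):
    print(eps, simulate(epsilon=eps, n_events=100, seed=2019))
```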

Deciding when to commit to a decision is as important as the decision itself; precipitate decisions come from committing too soon, analysis paralysis from a failure to commit when appropriate.