Si Tacuisses, Philosophus Mansisses: Medical tests and probabilities

You may have heard this one, but bear with me.

Let's say you get tested for a condition that affects ten percent of the population and the test is positive. The doctor says that the test is ninety percent accurate (presumably in both directions). How likely is it that you really have the condition?

[Think, think, think.]

Most people, including most doctors themselves, say something close to $90\%$; they might shade that number down a little, say to $80\%$, because they understand that "the base rate is important."

Yes, it is. That's why one must do the computation rather than fall prey to anchoring-and-adjustment bias.

Here's the computation for the example above:
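
Written out with Bayes' rule, using only the numbers stated above (base rate $0.1$, accuracy $0.9$ in both directions):

\[ \Pr(\text{sick}|\text{positive}) = \frac{0.1 \times 0.9}{0.1 \times 0.9 + 0.9 \times 0.1} = \frac{0.09}{0.18} = \frac{1}{2}. \]

The numerator counts the sick people who test positive; the second term of the denominator counts the healthy people who test positive anyway. With these numbers the two groups are the same size.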

One-half. That's the probability that you have the condition given the positive test result.

We can get a little more general: if the base rate is $\Pr(\text{sick}) = p$ and the accuracy (assumed symmetric) of the test is $\Pr(\text{positive}|\text{sick}) = \Pr(\text{negative}|\text{not sick}) = r$, then the probability of being sick given a positive test result is

\[ \Pr(\text{sick}|\text{positive}) = \frac{p \times r}{p \times r + (1- p) \times (1-r)}. \]

The following table shows that probability for a variety of base rates and test accuracies (again, assuming that the test is symmetric, that is, that the probabilities of a false positive and a false negative are the same; more about that below).

A quick perusal of this table shows some interesting things, such as the really low probabilities, even with very accurate tests, for the very small base rates (so, if you get a positive result for a very rare disease, don't fret too much, do the follow-up).
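
A table of this kind can be regenerated with a few lines of Python. This is a minimal sketch of the formula above; the function name `posterior` and the particular grid of base rates and accuracies are my own illustrative choices, not necessarily the ones in the original table.

```python
def posterior(p, r):
    """Pr(sick | positive) for base rate p and symmetric test accuracy r."""
    true_positives = p * r                # sick and correctly flagged
    false_positives = (1 - p) * (1 - r)   # healthy but flagged anyway
    return true_positives / (true_positives + false_positives)


# Illustrative grid only: these values are assumptions, not the original table's.
BASE_RATES = [0.001, 0.01, 0.1, 0.25]
ACCURACIES = [0.80, 0.90, 0.95, 0.99]

# Header row: one column per accuracy, then one row per base rate.
print("base rate" + "".join(f"  r={r:<5.2f}" for r in ACCURACIES))
for p in BASE_RATES:
    cells = "".join(f"  {posterior(p, r):<7.3f}" for r in ACCURACIES)
    print(f"{p:<9}" + cells)
```

For instance, `posterior(0.001, 0.99)` comes out at roughly 0.09: a 99% accurate test for a one-in-a-thousand condition still leaves only about a 9% chance of being sick after a positive result.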

There are many philosophical objections to all the above, but as a good engineer I'll ignore them all and go straight to the interesting questions that people ask about that table, for example, how the accuracy or precision of the test works.

Let's say you have a test of some sort (cholesterol, blood pressure, etc.) that produces an output variable, which we'll assume is continuous. Then there will be a distribution of these values for people who are healthy and, if the test is of any use, a different distribution for people who are sick. The scale is the same but, for example, healthy people might have blood pressure values centered around 110 over 80, while sick people have values centered around 140 over 100.

So, depending on the variables measured, the type of technology available, the combination of variables, one can have more or less overlap between the distributions of the test variable for healthy and sick people.

Assuming for illustration normal distributions with equal variance, here are two different tests, the second one being more precise than the first one:

Note that these distributions are fixed by the technology, the medical variables, the biochemistry, etc.; the two examples above would be, for example, the difference between measuring blood pressure (test 1) and measuring some blood chemical that is more closely associated with the medical condition (test 2), not some statistical magic performed on the same variable.

Note that there are other ways that a test A can be more precise than test B, for example if the variances for A are smaller than for B, even if the means are the same; or if the distributions themselves are asymmetric, with longer tails on the appropriate side (so that the overlap becomes much smaller).

(Note that the use of normal distributions with similar variances above was only for example purposes; most actual tests have significant asymmetries and different variances for the healthy versus sick populations. It's something that people who discover and refine testing technologies rely on to come up with their tests. I'll continue to use the same-variance normals in my examples, for simplicity.)
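
To make the link between overlap and accuracy concrete, here is a rough sketch using the standard library's `statistics.NormalDist`. The means and standard deviation are hypothetical values on an arbitrary scale, and the cutoff is assumed to sit halfway between the two means, which is what makes the test symmetric.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def symmetric_accuracy(mean_healthy, mean_sick, sigma):
    """Accuracy of a test whose cutoff sits halfway between the two means.

    Assumes mean_sick > mean_healthy and a common standard deviation sigma.
    """
    healthy = NormalDist(mean_healthy, sigma)
    sick = NormalDist(mean_sick, sigma)
    threshold = (mean_healthy + mean_sick) / 2
    true_negative = healthy.cdf(threshold)   # healthy people below the cutoff
    true_positive = 1 - sick.cdf(threshold)  # sick people above the cutoff
    return true_positive, true_negative


# Hypothetical tests: same spread, different separation between the means.
for name, mean_sick in [("test 1", 2.0), ("test 2", 4.0)]:
    tp, tn = symmetric_accuracy(0.0, mean_sick, 1.0)
    overlap = NormalDist(0.0, 1.0).overlap(NormalDist(mean_sick, 1.0))
    print(f"{name}: accuracy {tp:.3f}, overlap {overlap:.3f}")
```

With these made-up numbers, means two standard deviations apart give an accuracy of about 0.84, and means four standard deviations apart give about 0.98: less overlap, more precise test.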

A second question that interested (and interesting) people ask about these numbers is why the tests are symmetric (the probability of a false positive equal to that of a false negative).

They are symmetric in the examples we use to explain them, since it makes the computation simpler. In reality almost all important preliminary tests have a built-in bias towards the most robust outcome.

For example, many tests for dangerous conditions have a built-in positive bias, since the outcome of a positive preliminary test is more testing (usually followed by relief since the positive was a false positive), while the outcome of a negative can be lack of treatment for an existing condition (if it's a false negative).

To change the test from a symmetric error to a positive bias, all that is necessary is to change the threshold between positive and negative towards the side of the negative:
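
The effect of moving the threshold can be sketched numerically. The two distributions below are hypothetical (same-variance normals on an arbitrary scale, with higher values indicating sickness), and `error_rates` is a name I made up for illustration.

```python
from statistics import NormalDist  # standard library, Python 3.8+

# Hypothetical distributions of the test variable (arbitrary scale).
HEALTHY = NormalDist(0.0, 1.0)
SICK = NormalDist(2.0, 1.0)


def error_rates(threshold):
    """Return (false positive rate, false negative rate).

    Values above the threshold are reported as positive.
    """
    false_positive = 1 - HEALTHY.cdf(threshold)  # healthy, but above the cutoff
    false_negative = SICK.cdf(threshold)         # sick, but below the cutoff
    return false_positive, false_negative


# Midpoint cutoff (symmetric errors) versus a cutoff moved towards the healthy side.
for threshold in (1.0, 0.5):
    fp, fn = error_rates(threshold)
    print(f"threshold {threshold}: false positives {fp:.3f}, false negatives {fn:.3f}")
```

At the midpoint both error rates are about 0.16. Moving the cutoff to 0.5 raises the false positive rate to about 0.31 and drops the false negative rate to about 0.07, which is the positive bias described above.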

In fact, if you, the patient, have access to the raw data (you should be able to get it, at least in the US where doctors treat patients like humans, not NHS cost units), you can see how far from the threshold you are and look up actual distribution tables on the internet. (Don't argue these with your HMO doctor, though; most of them don't understand statistical arguments.)

For illustration, here are the posterior probabilities for a test that has bias $k$ in favor of false positives, understood as $\Pr(\text{positive}|\text{not sick}) = k \times \Pr(\text{negative}|\text{sick})$, for some different base rates $p$ and probability of accurate positive test $r$ (as above):
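
Under that definition, $\Pr(\text{negative}|\text{sick}) = 1 - r$ and so $\Pr(\text{positive}|\text{not sick}) = k \times (1-r)$, and Bayes' rule gives

\[ \Pr(\text{sick}|\text{positive}) = \frac{p \times r}{p \times r + (1- p) \times k \times (1-r)}, \]

which reduces to the symmetric formula when $k = 1$. A minimal Python sketch of this formula follows; the function name `biased_posterior` and the sample values are illustrative assumptions, not the entries of the original table.

```python
def biased_posterior(p, r, k):
    """Pr(sick | positive) when Pr(positive | not sick) = k * Pr(negative | sick).

    p: base rate, r: Pr(positive | sick), k: bias towards false positives.
    """
    false_positive_rate = k * (1 - r)
    if not 0 <= false_positive_rate <= 1:
        raise ValueError("k * (1 - r) must be a valid probability")
    return (p * r) / (p * r + (1 - p) * false_positive_rate)


# Illustrative values only: base rate 10%, r = 90%, increasing bias k.
for k in (1, 2, 3):
    print(f"k={k}: {biased_posterior(0.1, 0.9, k):.3f}")
```

With a base rate of 10% and $r = 0.9$, the posterior drops from one-half at $k = 1$ to one-third at $k = 2$ and one-quarter at $k = 3$.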

So, this is good news: if you get a scary positive test for a dangerous medical condition, that test is probably biased towards false positives (because of the scary part) and therefore the probability that you actually have that scary condition is much lower than you'd think, even if you'd been trained in statistical thinking (because that training, for simplicity, almost always uses symmetric tests). Therefore, be a little more relaxed when getting the follow-up test.

There's a third interesting question that people ask when shown the computation above: the probability of someone getting tested to begin with. It's an interesting question because in all these computational examples we assume that the population that gets tested has the same distribution of sick and healthy people as the general population. But the decision to be tested is usually a function of some reason (mild symptoms, hypochondria, job requirement), so the population of those tested may have a higher incidence of the condition than the general population.

This can be modeled by adding elements to the computation, which makes the computation more cumbersome and detracts from its value to make the point that base rates are very important. But it's a good elaboration and many models used by doctors over-estimate base rates precisely because they miss this probability of being tested. More good news there!
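
One way to sketch that extra step, under assumptions of my own: suppose sick people get tested with probability `tested_if_sick` and healthy people with probability `tested_if_healthy` (both hypothetical parameters, not numbers from any real study). The base rate among those tested is then itself a Bayes computation, and it feeds into the same symmetric formula as before.

```python
def base_rate_among_tested(p, tested_if_sick, tested_if_healthy):
    """Pr(sick | tested), given general base rate p and the two testing probabilities."""
    sick_and_tested = p * tested_if_sick
    healthy_and_tested = (1 - p) * tested_if_healthy
    return sick_and_tested / (sick_and_tested + healthy_and_tested)


def posterior(p, r):
    """Pr(sick | positive) for base rate p and symmetric test accuracy r."""
    return (p * r) / (p * r + (1 - p) * (1 - r))


# Hypothetical numbers: 10% base rate; sick people are three times as likely to get tested.
p_tested = base_rate_among_tested(0.1, tested_if_sick=0.6, tested_if_healthy=0.2)
print(f"base rate among tested: {p_tested:.2f}")
print(f"posterior after a positive: {posterior(p_tested, 0.9):.2f}")
```

With these made-up numbers the base rate among the tested is 0.25 rather than 0.10, and the posterior after a positive result is 0.75. If everyone is equally likely to be tested (a job requirement, say), the function returns the general base rate unchanged; which way the adjustment goes depends entirely on the two testing probabilities.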

Probabilities: so important to understand, so thoroughly misunderstood.

- - - - -
Production notes

1. There's nothing new above, but I've had to make this argument dozens of times to people and forum dwellers (particularly difficult when they've just received a positive result for some scary condition), so I decided to write a post that I can point people to.

2. [warning: rant] As someone who has railed against the use of spline drawing and quarter-ellipses in other people's slides, I did the right thing and plotted those normal distributions from the actual normal distribution formula. That's why they don't look like the overly-rounded "normal" distributions in some other people's slides: because these people make their "normals" with free-hand spline drawing and their exponentials with quarter-ellipses. That's extremely lazy in an age when any spreadsheet, RStats, Matlab, or Mathematica can easily plot the actual curve. The people I mean know who they are. [end rant]

Friday, January 13, 2017
