Wednesday, November 18, 2020

A thought about the DANMASK study and presenting quantitative results


This post is about the analysis and presentation of the results, not about the substantive question of whether to wear masks. Link to the study

The main point here is that the way the results are presented, without a comparison counterfactual, makes it difficult to understand what the study really means:



So, without further ado, the results themselves. Cutting through a lot of important but uninteresting stuff, there are four numbers that matter:

Size of the sub-sample with masks: 2392

Number of infected in the mask sub-sample: 43

Size of the sub-sample with no masks: 2470

Number of infected in the no-mask sub-sample: 52

From these four numbers we can compute the incidence of infection given masks (1.8%) and no masks (2.1%). We can also test these numbers in a variety of ways, including using the disaggregate data to calibrate a logit model (no, I won't call it "logistic regression"), but for now let's look at those two incidences only.
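For those who like to check the arithmetic, a quick sketch of those incidences in Python:

```python
# The four numbers from the study, as listed above
n_mask, k_mask = 2392, 43        # sample size and infections, mask group
n_nomask, k_nomask = 2470, 52    # sample size and infections, no-mask group

incidence_mask = k_mask / n_mask                              # ~1.8%
incidence_nomask = k_nomask / n_nomask                        # ~2.1%
incidence_pooled = (k_mask + k_nomask) / (n_mask + n_nomask)  # ~1.95%
```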


A likelihood ratio "test"


Here's a simple test we like: the likelihood ratio of two hypotheses, that each sample is drawn from its own incidence (2.1% and 1.8%) versus both samples being drawn from a common incidence (1.95%). In other words, we want

\[ LR = \frac{\Pr(52 \text{ pos out of } 2470|p = 0.021)\, \Pr(43 \text{ pos out of } 2392|p = 0.018)}{\Pr(52 \text{ pos out of } 2470|p = 0.0195)\, \Pr(43 \text{ pos out of } 2392|p = 0.0195)}\]

Using log-space computations to get around precision problems, we get $LR = 1.35$.

In other words, it's only 35% more likely that the data comes from two groups with different incidences than from a group with a common incidence. In order to be minimally convinced we'd like that likelihood ratio to be 20 or so, at least, so $LR = 1.35$ supports the frequentist analysis that these numbers seem to come from the same population.
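A minimal sketch of that log-space computation (the binomial coefficients are the same in numerator and denominator, so they cancel and we omit them):

```python
from math import exp, log

n_nomask, k_nomask = 2470, 52
n_mask, k_mask = 2392, 43
p_nomask, p_mask, p_pooled = 0.021, 0.018, 0.0195

def loglik(k, n, p):
    # Binomial log-likelihood of k positives in n trials at incidence p,
    # without the n-choose-k term (it cancels in the ratio)
    return k * log(p) + (n - k) * log(1 - p)

log_lr = (loglik(k_nomask, n_nomask, p_nomask) + loglik(k_mask, n_mask, p_mask)
          - loglik(k_nomask, n_nomask, p_pooled) - loglik(k_mask, n_mask, p_pooled))
lr = exp(log_lr)  # comes out around 1.35, nowhere near the 20 we'd want
```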

(Yes, this is the same test we apply to Rotten Tomatoes ratings.)


Presenting the results: make comparisons!


A problem with the paper is that the academese of the results is hard for many people to understand. One way to make the lack of effect of masks more obvious is to compare them with an alternative. We choose a simple 2:1 protection ratio, which is weak protection (a person wearing a mask has half the likelihood of infection of someone with no mask), but is enough to make the point.

Since we want to make a fair comparison, we need to use the same population size for both the mask and no-mask conditions (we'll choose 2400, as it's in the middle of those sample sizes) and infection rates similar to those of the test (we choose 1.25% and 2.5% for mask and no-mask, respectively). Now all we need to do is plot and compare:



(The more eagle-eyed readers will notice that these are Poisson distributions.)
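One way to quantify how much more the observed pair overlaps than the hypothetical pair is the shared probability mass of each pair of Poisson distributions; a sketch, assuming the population of 2400 and the incidences chosen above:

```python
from math import exp, lgamma, log

def poisson_pmf(k, lam):
    # Computed in log-space to avoid overflow in k!
    return exp(k * log(lam) - lam - lgamma(k + 1))

n = 2400  # common population size for the comparison

# Observed incidences (left pair) vs the hypothetical 2:1 protection (right pair)
pairs = {"observed": (0.018 * n, 0.021 * n),
         "hypothetical": (0.0125 * n, 0.025 * n)}

# Overlap: total probability mass shared by the two distributions in each pair
overlap = {name: sum(min(poisson_pmf(k, a), poisson_pmf(k, b)) for k in range(200))
           for name, (a, b) in pairs.items()}
# The observed pair overlaps a lot; the 2:1 hypothetical pair barely overlaps
```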

The comparison with a hypothetical, even one as basic as a 2:1 protection ratio, makes the point that the distributions on the left overlap a lot, and therefore there's a fair chance that they come from the same population (in other words, that there's no difference in incidence of infection between the use and non-use of masks).


Bayesians (and more attentive frequentists) might note at this point that having non-significant differences isn't the same thing as having a zero effect size; and that a richer model (including the distributions of the estimates themselves, which are random variables) might be useful to drive policy.
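As one sketch of such a richer model (flat Beta priors on each incidence, which is an assumption of this sketch, not anything from the paper), we can look at the posterior probability that masks lower incidence at all:

```python
import random

random.seed(0)

n_mask, k_mask = 2392, 43
n_nomask, k_nomask = 2470, 52

# Posterior for each incidence under a flat Beta(1, 1) prior is Beta(k+1, n-k+1)
draws = 20000
p_mask = [random.betavariate(k_mask + 1, n_mask - k_mask + 1) for _ in range(draws)]
p_nomask = [random.betavariate(k_nomask + 1, n_nomask - k_nomask + 1) for _ in range(draws)]

# Posterior probability that the mask incidence is lower at all; the paired
# draws also give the full distribution of the effect size, which carries
# more policy information than a significance verdict
prob_masks_help = sum(m < nm for m, nm in zip(p_mask, p_nomask)) / draws
```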

But for now, the point is that those four lines in the figure are much easier to interpret than the word-and-number salad under the subheading "results" in the paper itself.