Wednesday, June 12, 2019

A statistical analysis of reviews of L.A. Finest: audience vs. critics



"If numbers are available, let's use the numbers. If all we have are opinions, let's go with mine." -- variously attributed to a number of bosses.

There's a new police procedural this season, L.A. Finest, and Rotten Tomatoes has done it again: critics and audience appear to be at loggerheads. Like with The Orville, Star Trek Discovery, and the last season of Doctor Who.

But "appear to be" is a dequantified statement. And Rotten Tomatoes has numbers; so, what can these numbers tell us?

Before they can tell us anything, we need to write our question: first in words, then as a math problem. Then we can solve the math problem and translate that solution back into a "words" answer, but now a quantified "words" answer.

The question suggested by those numbers is:
Do the critics and the audience use similar or opposite criteria to rate this show?
One way to answer this question, which would have been feasible in the past when Rotten Tomatoes had user reviews, would be to do text analytics on the reviews themselves. But now the user reviews are gone, so that's no longer possible.

Another way, a simpler and cleaner way, is to use the numbers themselves.

To simplify, we'll assume that all ratings are either positive or negative, 1 or 0; there are unobservable random factors that make each person like a show more or less, so these ratings are random variables. For a given person $i$, the probability that that person likes L.A. Finest, in other words the probability of a positive rating, is captured in some parameter $\theta_i$ (which we don't observe, of course).

So, our question above is whether the $\theta_i$ of the critics and the $\theta_i$ of the audience are the same or "opposed." And what is "opposed"? If $i$ and $j$ use opposite criteria, the probability that $i$ gives a 1 is the probability that $j$ gives a 0, so $\theta_i = 1-\theta_j$.

We don't have the individual parameters $\theta_i$ but we can simplify again by assuming that all variation within each group (critics or audience) is random, so we really only need two $\theta$.

We are comparing two situations, call them: hypothesis zero, $H_0$, meaning the critics and the audience use the same criteria, that is, they have the same $\theta$, call it $\theta_0$; and hypothesis one, $H_1$, meaning the critics use criteria opposite to those of the audience, so if the critics' $\theta$ is $\theta_1$, the audience's $\theta$ is $(1-\theta_1)$.
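To make the two hypotheses concrete, here's a minimal Python sketch (mine, not part of the original analysis) of what simulated ratings would look like under each one; the $\theta$ value here is arbitrary and purely illustrative.

```python
import random

random.seed(1)  # only to make the illustration reproducible

def ratings(n, theta):
    # n independent Bernoulli(theta) ratings: 1 = positive, 0 = negative
    return [1 if random.random() < theta else 0 for _ in range(n)]

theta = 0.7  # arbitrary illustrative value; the actual estimates come later in the post

# H_0: critics and audience draw from the same theta
critics_h0, audience_h0 = ratings(10, theta), ratings(40, theta)

# H_1: critics draw from theta, the audience from 1 - theta ("opposite criteria")
critics_h1, audience_h1 = ratings(10, theta), ratings(40, 1 - theta)
```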

Yes, I know, we don't have $\theta_0$ or $\theta_1$. We'll get there.

Our "words" question now becomes the following math problem: how much more likely is it that the data we observe is created by $H_1$ versus created by $H_0$, or in a formula: what is the likelihood ratio

$LR = \frac{\Pr(\mathrm{Data}| H_1)}{\Pr(\mathrm{Data}| H_0)} $?

Observation: This is different from the usual statistics test: the usual test is whether the two distributions are different; we are testing for a specific type of difference, opposition. So there are in fact three states of the world: same, opposite, and different but not opposite; we want to compare the likelihood of the first two. If same is much more likely than opposite, then we conclude 'same.' If opposite is much more likely than same, we conclude 'opposite.' If same and opposite have similar likelihoods (for some notion of 'similar' we'd have to investigate), then we conclude 'different but not opposite.'
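In code, that three-way reading could look something like the sketch below; the cutoff `factor` is a placeholder for whatever notion of "much more likely" one settles on, not a value derived anywhere in this post.

```python
def verdict(p_h1, p_h0, factor=10.0):
    # factor is a hypothetical cutoff for "much more likely", for illustration only
    if p_h1 >= factor * p_h0:
        return "opposite"
    if p_h0 >= factor * p_h1:
        return "same"
    return "different but not opposite"
```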

Our data is four numbers: number of critics $N_C = 10$, number of positive reviews by critics $k_C = 1$, number of audience members $N_A = 40$, number of positive reviews by audience members $k_A = 30$.

But what about the $\theta_0$ and $\theta_1$?

This is where the lofty field of mathematics gives way to the down and dirty world of estimation. We estimate $\theta$ by maximum likelihood, and the maximum likelihood estimator for the probability of a positive outcome of a binary random variable (called a Bernoulli variable) is the sample mean.

Yep, all those words to say "use the share of 1s as the $\theta$."

Not so fast. True, for $H_0$, we use the share of ones

$\theta_0 = (k_C + k_A)/(N_C + N_A) = 31/50 = 0.62$;

but for $H_1$, we need to account for the audience's $1-\theta_1$ by reverse coding the audience's zeros and ones, in other words,

$\theta_1 = (k_C + (N_A - k_A))/(N_C + N_A) = 11/50 = 0.22$.

Yes, those two fractions are "estimation." Maximum likelihood estimation, at that.
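For anyone who wants to check the arithmetic, here is the same estimation as a few lines of Python, computed straight from the four counts.

```python
# The four observed counts
N_C, k_C = 10, 1    # critics: total reviews, positive reviews
N_A, k_A = 40, 30   # audience: total reviews, positive reviews

# H_0: one shared theta, the pooled share of ones
theta_0 = (k_C + k_A) / (N_C + N_A)            # 31/50 = 0.62

# H_1: reverse-code the audience's ratings, then pool
theta_1 = (k_C + (N_A - k_A)) / (N_C + N_A)    # 11/50 = 0.22
```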

Now that we are done with the dirty statistics, we come back to the shiny world of math by using our estimates to solve the math problem. That requires a small bit of combinatorics and probability theory, all in a single sentence:

If each individual data point is an independent and identically distributed Bernoulli variable, the sum of these data points follows the binomial distribution.

Therefore the desired probabilities, which are joint probabilities of two binomial distributions, one for the critics, one for the audience, are

$\Pr(\mathrm{Data}| H_0) = c(N_C,k_C) (\theta_0)^{k_C} (1- \theta_0)^{N_C- k_C} \times c(N_A,k_A) (\theta_0)^{k_A} (1- \theta_0)^{N_A- k_A}$

and

$\Pr(\mathrm{Data}| H_1) = c(N_C,k_C) (\theta_1)^{k_C} (1- \theta_1)^{N_C- k_C} \times c(N_A,k_A) (1 -\theta_1)^{k_A} (\theta_1)^{N_A- k_A}$.

Replacing the symbols with the estimates and the data we get

$\Pr(\mathrm{Data}| H_0) = 3.222\times 10^{-5}$;
$\Pr(\mathrm{Data}| H_1) = 3.066\times 10^{-2}$.
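These numbers can be reproduced with a short Python sketch; the binomial pmf is written out by hand so nothing beyond the standard library is needed, and the last line anticipates the likelihood ratio computed next.

```python
from math import comb

N_C, k_C = 10, 1
N_A, k_A = 40, 30
theta_0, theta_1 = 0.62, 0.22

def binom_pmf(n, k, p):
    # probability of exactly k successes in n independent Bernoulli(p) trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

# H_0: both groups share theta_0
p_h0 = binom_pmf(N_C, k_C, theta_0) * binom_pmf(N_A, k_A, theta_0)

# H_1: critics use theta_1, the audience uses 1 - theta_1
p_h1 = binom_pmf(N_C, k_C, theta_1) * binom_pmf(N_A, k_A, 1 - theta_1)

print(p_h0)         # ~3.22e-05
print(p_h1)         # ~3.07e-02
print(p_h1 / p_h0)  # the likelihood ratio, ~951
```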

We can now compute the likelihood ratio,

$LR = \frac{\Pr(\mathrm{Data}| H_1)}{\Pr(\mathrm{Data}| H_0)} = 951$,

and translate that into words to make the statement
It's 951 times more likely that critics are using criteria opposite to those of the audience than the same criteria.
Isn't that a lot more satisfying than saying they "appear to be at loggerheads"?