Let's say we have one dependent variable, $y$, and ten independent variables, $x_1,\ldots,x_{10}$. How many models can we build? For simplicity, let's keep our formulation linear (in the usual sense of the word, that is, linear in the coefficients; see footnote).

Inexcusably wrong answer: 11 models.

Wrong answer: 1024 models.

Right-ish answer: $1.8 \times 10^{308}$ models.

Right answer: an infinity of models.

OK, 1024 is the number of models that include each variable at most once and have no interactions. Something like

$y = \beta_0 + \beta_1 \, x_1 + \beta_3 \, x_3 + \beta_7 \, x_7$,

of which there are $2^{10}$ models. (Since the constant $\beta_0$ can be zero by calibration, we'll include it in all models -- otherwise we'd have to demean the $y$.)
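As a quick sanity check on that count, here is a minimal sketch that enumerates every subset of the ten variables, each subset being the regressor list of one model. The names `variables` and `main_effect_models` are made up for this illustration.

```python
from itertools import combinations

# Hypothetical labels for the ten independent variables.
variables = [f"x{i}" for i in range(1, 11)]

def main_effect_models(names):
    """Yield every subset of `names`; each subset is one model's regressors.

    The constant term is assumed to be present in every model, so it is
    not part of the enumeration.
    """
    for k in range(len(names) + 1):
        yield from combinations(names, k)

models = list(main_effect_models(variables))

print(len(models))                   # 1024
print(("x1", "x3", "x7") in models)  # True: the example model shown above
```

The empty subset is the constant-only model and the full tuple is the model with all ten variables.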

Once we consider possible interactions among variables, like the three-way interaction $x_1 x_7 x_8$, there are $2^{10}$ candidate terms (one per subset of the ten variables, if we also count the single variables and the empty product) and therefore $2^{2^{10}} \approx 1.8 \times 10^{308}$ possible models with all interactions. For comparison, the number of atoms in the known universe is estimated to be on the order of $10^{80}$.
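The arithmetic is easy to check with Python's arbitrary-precision integers. This is only a sketch of the count; it assumes, as the figure of $2^{10}$ terms does, that every subset of the ten variables (including the empty one) counts as a candidate term. The variable names are made up for the illustration.

```python
import math

n_vars = 10

# One candidate term per subset of the variables: choose k of them to
# multiply together, for k = 0..10 (k = 0 is the empty product).
n_terms = sum(math.comb(n_vars, k) for k in range(n_vars + 1))

# Each term is either in the model or not.
n_models = 2 ** n_terms

print(n_terms)                          # 1024
print(len(str(n_models)))               # 309 digits, i.e. about 1.8e308
print(round(math.log10(n_models), 2))   # order of magnitude as a power of ten
```

Note that `float(n_models)` would raise `OverflowError`, since $2^{1024}$ is just past the largest double-precision number; the integer arithmetic above avoids that.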

Of course, each variable can enter the model in a variety of functional forms: $x_1^{2}$, $\log(x_7)$, $\sin(5 \, x_9)$ or $x_3^{-x_{2}/2}$, for example, making it an infinite number of possibilities. (And there can be interactions between these different functions of different variables, obviously.)

**(Added on August 11th.)** Using polynomial approximations for general functions, say to the fourth degree, the total number of interactions is now $5^{10}=9765625$, as any variable may enter an interaction in one of five orders (0 through 4), and the total number of models is $2^{5^{10}}$, or around $10^{2939746}$.

**(End of addition.)**
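The number $2^{5^{10}}$ is too large to print comfortably, but its order of magnitude follows from $\log_{10} 2^{n} = n \log_{10} 2$. A small sketch, with variable names chosen just for this illustration:

```python
import math

n_vars = 10
max_degree = 4  # each variable enters a term with exponent 0, 1, 2, 3 or 4

# One candidate term per assignment of an exponent to each variable.
n_terms = (max_degree + 1) ** n_vars

# log10 of the number of models, 2**n_terms, without building the integer.
log10_models = n_terms * math.log10(2)

print(n_terms)  # 9765625
print(f"2**{n_terms} is about 10**{log10_models:,.0f}")
```

The exponent works out to roughly 2.94 million, since $\log_{10} 2 \approx 0.30103$.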

So here's a combinatorial riddle for statisticians: how can you identify a model out of, let's be generous, $1.8 \times 10^{308}$ with data in the exa- or petabyte range? That's almost three hundred orders of magnitude too little, methinks.

The main point is that any non-trivial set of variables can be modeled in a vast number of ways, which means that a limited number of models presented for appreciation (or review) necessarily includes an inordinate amount of **judgement** from the model-builder.

It's unavoidable, but seldom acknowledged.

--------------

The "linear in coefficients" point is the following. Take this formulation, which is clearly non-linear in the $x$:

$y = \beta_0 + \beta_1 \, x_1^{1/4} + \beta_2 \, x_1 \, x_7$

It can be made linear very easily with two changes of variables, $z_1 = x_1^{1/4}$ and $z_2 = x_1 \, x_7$, which give $y = \beta_0 + \beta_1 \, z_1 + \beta_2 \, z_2$.
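To make the change of variables concrete, here is a minimal sketch using NumPy and simulated data. The sample size, the ranges of $x_1$ and $x_7$, the noise level and the "true" coefficients in `beta_true` are all assumptions made up for the illustration, not anything estimated from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated regressors (hypothetical ranges, kept positive so x1**0.25 is real).
x1 = rng.uniform(1.0, 10.0, n)
x7 = rng.uniform(1.0, 10.0, n)

# Hypothetical coefficients (beta_0, beta_1, beta_2) used to generate y.
beta_true = np.array([2.0, -1.5, 0.3])

# The two changes of variables from the text.
z1 = x1 ** 0.25
z2 = x1 * x7

# Design matrix with a column of ones for the constant.
Z = np.column_stack([np.ones(n), z1, z2])
y = Z @ beta_true + rng.normal(0.0, 0.05, n)

# Ordinary least squares on the transformed variables.
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)
```

Once `z1` and `z2` are computed, the fit is plain linear least squares; nothing in the estimation step knows that the columns came from non-linear functions of the $x$.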

In contrast, the model $y = \alpha \, \sin( \omega \, t )$ cannot be linearized in the coefficients $\alpha$ and $\omega$.
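A small numerical illustration of the difference between the two coefficients: the model output scales proportionally with $\alpha$ but not with $\omega$. This is only a spot check at arbitrarily chosen values (the numbers below are made up), not a proof that no linearizing transformation exists.

```python
import math

def model(alpha, omega, t):
    """The sine model from the text: y = alpha * sin(omega * t)."""
    return alpha * math.sin(omega * t)

# Arbitrary illustration values.
alpha, omega, t = 2.0, 0.7, 1.0
baseline = model(alpha, omega, t)

# Tripling alpha triples y: the model is linear in alpha.
scaled_alpha = model(3 * alpha, omega, t)
print(math.isclose(scaled_alpha, 3 * baseline))  # True

# Tripling omega does not triple y: omega sits inside the sine.
scaled_omega = model(alpha, 3 * omega, t)
print(math.isclose(scaled_omega, 3 * baseline))  # False
```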