__Movie: The Man Who Knew Infinity__

There's not much math in the movie, but it's a good movie.*

Bonus: Numberphile on Ramanujan –

__Book: Weapons of Math Destruction, by Cathy O'Neil__

String theorist Lubos Motl doesn't like it. I think he's a bit harsh, but mostly correct. There are two problems with the book, IMNSHO:

1. An *attribution problem*: data-based decision-support systems (DB-DSSs) are blamed for problems that come from their use. That blame correctly belongs to the people making the decisions: either the users of the DB-DSSs or the people who choose/design/program those DB-DSSs. Most of her examples are of people in power hiding behind "math" to further an agenda, just as, in other times, people in power used "the divine right of kings" to further theirs.

2. An *inconvenient realities problem*: there are some real heterogeneities that O'Neil doesn't like but that have real-world consequences (say: young men are more likely than other demographics to die doing something stupid; their life insurance is concomitantly more expensive). O'Neil appears to want at least some of those realities hidden; DB-DSSs find them, therefore she wants them regulated/limited/overruled. I say: if they exist, find them, understand them, address them (for example, take mitigating actions to avoid their exploitation).

It's important to note that the choice we have is not between an ideal world and flawed DB-DSSs; it's between flawed big data, very flawed small data, and zero-data preconceived notions. So, in general, the more data the better:

For example: I'm a Portuguese male; knowing only that, most people would assume I like soccer (I don't). Netflix, YouTube, and Amazon, having precise data about my revealed preferences, never suggest anything soccer-related, though they will suggest a lot of exercise-related content/purchases. More data leads to more personalized recommendations and away from stereotypes. But...

But there are always outliers. People who are outside the predictive model; people whose data is incorrect; people who have changed significantly over time (most people don't change, but some do); people for whom there's no data; and, as the excerpt above illustrates, pretty much all metrics can be gamed and many are.

So, there's a need for *balance*, *accountability*, and *clarity*. In that, O'Neil is right, but that's not a DB-DSSs problem; it's a problem with *every human system*. If anything, DB-DSSs have the potential to improve all three. (Balance may appear strange at first, but there's a whole field of multi-criteria decision-making that studies the comparative statics of varying evaluation criteria.)
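To make that parenthetical concrete, here's a toy weighted-sum sketch in the multi-criteria spirit. Everything in it — candidates, criteria, and scores — is made up for illustration; the only point is that the ranking responds to the weights, which is exactly the kind of comparative statics that field studies.

```python
# Toy multi-criteria decision-making sketch. All names and scores are
# hypothetical; the point is that the ranking depends on the weights.

candidates = {
    "A": {"cost": 0.9, "quality": 0.3},  # cheap, lower quality
    "B": {"cost": 0.5, "quality": 0.8},  # pricier, higher quality
}

def score(criteria, w_cost):
    # Weighted sum of two criteria; w_cost is the weight on cost, in [0, 1].
    return w_cost * criteria["cost"] + (1 - w_cost) * criteria["quality"]

for w in (0.2, 0.5, 0.8):
    ranking = sorted(candidates, key=lambda c: score(candidates[c], w), reverse=True)
    print(f"w_cost={w}: {' > '.join(ranking)}")
```

With these made-up numbers, B wins when quality is weighted heavily, and the ranking flips to A once cost dominates — a two-line example of how "balance" is an explicit, inspectable parameter in a DB-DSS rather than a hidden prejudice.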

I might write something longer about this, as there's an increasing number of anti-DB-DSSs books, posts, and videos.

Bonus: here's O'Neil presenting some of the examples from the book –

__Paper: "Why does deep and cheap learning work so well?" by Henry Lin and Max Tegmark__

http://arxiv.org/abs/1608.08225

Via *The Tech Review*, I found this arXiv paper explaining the connection between deep learning and the structure of the universe. (Yes, it's a real technical paper. When I say I like math, it's not like the people who "love science," but only if they don't need to learn any.)

Abstract (JCS comments in *italic blue*):

We show how the success of deep learning depends not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can be approximated through "cheap learning" with exponentially fewer parameters than generic ones, because they have simplifying properties tracing back to the laws of physics.

In other words: the reason these deep-learning systems work when the math suggests they shouldn't is that the physical world is a lot more organized than arbitrary math spaces.

The exceptional simplicity of physics-based functions hinges on properties such as symmetry, locality, compositionality and polynomial log-probability, and we explore how these properties translate into exceptionally simple neural networks approximating both natural phenomena such as images and abstract representations thereof such as drawings.

In other words: as long as what we're learning is nicely behaved (defined as "symmetry, locality, compositionality and polynomial log-probability"), this trick for massive information-compression will work.

We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine-learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to renormalization group procedures. Various "no-flattening theorems" show when these efficient deep networks cannot be accurately approximated by shallow ones without efficiency loss - even for linear networks.

In other words: You really need the "deep" in deep learning.
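A back-of-the-envelope way to see that last point is the paper's multiplication example: a deep network can multiply n inputs with a binary tree of pairwise-product gadgets, while the paper's bound for a single hidden layer grows exponentially in n. The constants below (4 neurons per pairwise-product gate, a 2^n shallow bound) are my reading of the paper's numbers, so treat them as assumptions; the shape of the gap is what matters.

```python
# Rough neuron counts for multiplying n inputs, following the paper's
# multiplication example. Assumptions: 4 neurons per pairwise-product
# gate, and a 2**n neuron requirement for a single hidden layer.

def deep_neurons(n):
    # Binary tree of pairwise products: n - 1 gates, 4 neurons each.
    return 4 * (n - 1)

def shallow_neurons(n):
    # Single-hidden-layer count from the paper's no-flattening argument.
    return 2 ** n

for n in (2, 4, 8, 16, 32):
    print(f"n={n}: deep ~{deep_neurons(n)}, shallow ~{shallow_neurons(n)}")
```

By n = 32, the deep count is 124 neurons while the shallow bound is over four billion — the "no-flattening theorems" in miniature.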

Bonus: An introduction to deep learning, by Andrew Ng –

__Video: A Hole in a Hole in a Hole by Numberphile__

Ah, the crazy world of topology...

There's some extra footage on Numberphile's second channel:

-- -- -- -- -- -- --

* For people who say that it's impossible to make a compelling movie with complicated technical content in it, I recommend watching the movie *Copenhagen*, where several elements of quantum mechanics are interwoven with the story.