Tuesday, December 12, 2006

Malcolm Gladwell's foresight bias

Malcolm Gladwell's essay "The Formula" (not yet online) appears in the New Yorker's media issue of Oct 16 of this year. This time, his subject is a group of computer-savvy entrepreneurs who have developed a grand artificial-intelligence system that, he reports, can predict the earnings potential of a given film script or song.

It's an ambitious essay, the kind that sounds like it might be the basis for Tipping Point 3: The Blink-Master, and I'm glad that Gladwell reaches for big concepts; this one is entertaining and thought-provoking as always. But I wish MG were not so eager a convert to novel ideas.

As is his way, when he points out the surprising successes of this month's big idea, he writes as an evangelist rather than a journalist. The prediction system, we learn, foresaw the success of Norah Jones and the Gnarls Barkley song "Crazy", as well as the middling box office returns of the Nicole Kidman-Sean Penn snoozefest The Interpreter, with astonishing accuracy:
According to the formula, the final shooting script was a $69-million picture (an estimate that came within $4 million of the actual box-office).
This is exciting, but how did the system do with other movies? You know, the ones that these geniuses--who seem to be Gladwell's only source on the quality of the pricey service they sell--didn't jump to tell Gladwell all about? According to Gladwell's vague timetable, the prediction of $69 million was made after the film's box office total was made public; he doesn't hide that fact, but in the face of his enthusiasm it's hard to keep it in mind.

In fact there's not much besides enthusiasm and unsourced anecdotes in the article to validate Gladwell's scoop. It is believable that vague, subjective artistic qualities can be accurately measured by a formula; there have been successes in this field for decades. But Gladwell provides neither context for how difficult the predictive problem he describes really is, nor evidence that his subjects have accomplished anything at all.

It's not just that the examples are cherry-picked--by the subjects, or by Gladwell, or by both in succession--but that Gladwell gives the impression that the formula is basically mathematical. In fact it largely relies on subjective human judgment. Humans judge which results are to be considered normal for the system, and which are to be considered buggy; humans judge which of the various trials and results to report to Gladwell and to their clients; and, astonishingly, humans are even inputting their subjective quantifications of things like character development.

Here's how Gladwell explains the generation of the prediction system:
The two men... had broken down the elements of screenplay narrative into multiple categories, and then drawn on their encyclopedic knowledge of television and film to assign scripts a score in each of those categories--creating a giant screenplay report card... They could treat screenplays as mathematical propositions, using Mr. Pink and Mr. Brown's categories and scores... [emph. added]
And what are these categories and scores? First, Gladwell quotes one member of the group: "You know, the star wears a blue shirt. The star doesn't zip up his pants. Whatever." A system that generates an assessment of success from such atoms must be innovative indeed. But read on:
He started with the first film and had the neural network make a guess: maybe it said that the hero's moral crisis in act one, which rated a 7 on the 10-point moral-crisis scale, was worth $7 million, and having a gorgeous red-headed eighteen-year-old female lead whose characterization came in at 6.5 was worth $3 million and a 9-point bonding moment between the male lead and a four-year-old boy in act three was worth $2 million, and so on...
Gladwell uses passive language ("rated", "came in at") here, possibly because that allows him to avoid overtly mentioning that the 7/10, 6.5/10 and 9/10 ratings are entirely human-generated.
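The arithmetic Gladwell describes reduces to something like the following toy sketch (every name, score, and dollar weight below is hypothetical, chosen only to match the figures in his example): the "network" learns dollar-per-point weights, but every number it multiplies them by is a subjective human judgment.

```python
# Toy sketch of the scoring scheme described in the article.
# All values are made up; the weights are reverse-engineered from
# Gladwell's own example ($7M for a 7-rated moral crisis, etc.).
human_scores = {
    "moral_crisis_act_one": 7.0,      # assigned by a person, 10-point scale
    "lead_characterization": 6.5,     # assigned by a person
    "bonding_moment_act_three": 9.0,  # assigned by a person
}

# Dollar-per-point weights (in $ millions) the model might "learn".
learned_weights = {
    "moral_crisis_act_one": 1.0,      # 7 points -> $7 million
    "lead_characterization": 0.46,    # 6.5 points -> ~$3 million
    "bonding_moment_act_three": 0.22, # 9 points -> ~$2 million
}

# The "mathematical" prediction: a weighted sum of human opinions.
predicted_gross = sum(
    learned_weights[k] * human_scores[k] for k in human_scores
)
print(round(predicted_gross, 2))  # -> 11.97 ($ millions)
```

However sophisticated the weight-fitting is, the output can never be more objective than the hand-assigned scores going in.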

This is not the promised quantitative code for a hit. When the group gives analysis to studios, it turns out, they suggest things like "better characterization" and ask for "the city where the film was set to be much more of a presence". This advice may be useful, but not more so than a producer with a good eye for public taste. The predictions are certainly no stronger than the judgment behind the input values (6.5-point characterization etc.), just the type of datum that can be influenced subconsciously by other "hit" qualities. If, for example, there is a hilarious, winning scene that showcases the lead's star power, but does not develop the character deeply, can we trust that the human judges won't bump up the characterization score?

As for the accuracy of the system, any comp sci grad can code a neural network that will make amazing predictions from unrelated information. The trick is to pre-test various inputs to verify they will produce accurate results; when a group of inputs doesn't work well, you make up a reason why that data is biased (for example, it doesn't work with Indiana Jones-level hits because no one can predict the snowball effect of a blockbuster, but the same problem does not apply to Norah Jones, because in that case the system does predict correctly). Even better, show off the system using the same data the system was trained on. Thus trained, the innocent, mathematical formula can be applied to Butch Cassidy and Ishtar to dramatic effect, and if there is any correlation between script quality (as read by a human for hit potential) and earnings, the subjective input values can ensure the prediction is not wildly off the mark.
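To see how cheap that last trick is, here's a minimal sketch with entirely made-up data: a model that simply memorizes its training set (a one-nearest-neighbour lookup, something an over-parameterized neural network can also do) looks clairvoyant when "evaluated" on the very data it was trained on, even though its inputs are pure noise.

```python
import random

def fit_memorizer(xs, ys):
    # "Train" by storing the data: a 1-nearest-neighbour model,
    # which (like a big enough neural net) fits the training set
    # perfectly no matter how meaningless the inputs are.
    data = list(zip(xs, ys))
    def predict(x):
        nearest = min(data, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict

random.seed(0)
# Fifty fake "script scores" and fifty fake "grosses" ($M) --
# random numbers with no relationship to each other at all.
scripts = [random.random() for _ in range(50)]
grosses = [random.uniform(0, 200) for _ in range(50)]

model = fit_memorizer(scripts, grosses)

# Shown off on its own training data, the model is "astonishing":
train_error = max(abs(model(x) - y) for x, y in zip(scripts, grosses))
print(train_error)  # 0.0 -- every training "prediction" is exact

# On genuinely new noise, it is worthless (chance-level error):
new_scripts = [random.random() for _ in range(50)]
new_grosses = [random.uniform(0, 200) for _ in range(50)]
test_error = sum(abs(model(x) - y)
                 for x, y in zip(new_scripts, new_grosses)) / 50
print(round(test_error, 1))  # tens of millions off, on average
```

The only honest test is the one Gladwell never reports: predictions made before the outcomes were known, on data the system had never seen.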

These tricks seem hard to pass off on observers, but Gladwell is ever forgiving. It's hard to imagine a reporter who covers social science, and who is preparing an 8,000-word piece on amazing predictive abilities, not asking for a prediction of an event that will happen after the piece is published (say, how much Blood Diamond or The Pursuit of Happyness will make). But either Gladwell asked and was refused--and didn't mention this fishiness in his story--or, more worryingly, he just never thought to ask.

Gladwell recently excused intelligence agencies for failing to predict the Sept. 11, 2001 attacks:
I do think that recognition of hindsight bias can change the way we respond to failure. It ought to make us much more accepting of the mistakes of individuals and institutions. Unfortunately, it isn't very satisfying to acknowledge the role of hindsight bias, because there is something very psychologically and politically pleasing about identifying culprits and drafting plans for reform. We need to feel that we are making progress, even when the actual prospects for progress are quite small. [emph. added]
He would do well to take his own advice. He suffers not from hindsight bias, which is thinking that it was easy yesterday to see what was coming today, but from "foresight bias"--thinking that it's hard today to see what came yesterday.

(One last transgression: in the article he gives away the ending of a great film, Dear Frankie. Skip those paragraphs and rent the movie.)