01/17/2013 12:03 pm ET | Updated Mar 19, 2013

Obama Versus Lincoln and Argo

Branching out from politics and economics, I have been examining Oscar predictions over the last few weeks. While I approach the science of predictions the same way for both political elections and the Oscars, there are some key differences. When I forecast politics, I use four main sources of data: fundamental data (e.g., economic indicators and incumbency), prediction markets, polls, and user-generated data. Two of these sources, polls and fundamental data, are much less useful for the Oscars. This places greater strain on the other two sources: prediction markets and user-generated data.

Early in an election cycle, I rely on fundamental data to provide a baseline prediction for all of the elections. My model was very accurate for 2012, correctly predicting 50 of 51 Electoral College elections in mid-February. The same two candidates run in all 51 Electoral College races, so there is no state-by-state difference for some key fundamental categories: presidential approval, incumbency, and home state. But there is meaningful state-by-state identification for other key categories: past election results, economic indicators, and state-level ideology measures. This helps fundamental models provide extremely accurate early forecasts.

Fundamental data for movies do not offer the same type of identification that they do in elections. In many of the 24 categories, the same set of movies is running: Lincoln (12 nominations), Life of Pi (11 nominations), etc. Yet most of the fundamental data for movies are not category-specific: studio input choices (budget, release date, genre), success with general audiences (gross revenue and screens by week), and ratings. There is person-level data available for some categories, but there is little objective data on the value or rating of any one person's role relative to the overall movie. This makes fundamental models for the Oscars very imprecise.

As the election cycle progresses, I incorporate polling and prediction market data into my forecasts; these data allow me to see how the forecasts adjust to the main events of the campaign. Polls collect the voting intentions of a random sample of a representative group of voters, and historical data allow me to project that polling data to Election Day. Prediction markets gather the expectations of a self-selected group of high-information users. My models phase out fundamental data sharply after Labor Day and rely almost exclusively on these two reliable sources of information.
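The phase-out described above can be pictured with a small sketch. It is an illustration only, not the model behind these forecasts: the function names, the weights (0.5 before Labor Day, 0.05 afterward), the two-week ramp, and the simple linear blend are all assumptions chosen to make the idea concrete.

```python
from datetime import date


def fundamentals_weight(today: date, labor_day: date,
                        early_weight: float = 0.5,
                        late_weight: float = 0.05,
                        ramp_days: int = 14) -> float:
    """Hypothetical weight placed on fundamental data on a given day.

    All default values are illustrative assumptions, not published parameters.
    """
    days_past = (today - labor_day).days
    if days_past <= 0:
        return early_weight          # before Labor Day: fundamentals still matter
    if days_past >= ramp_days:
        return late_weight           # after the ramp: almost fully phased out
    frac = days_past / ramp_days     # linear drop during the ramp
    return early_weight + frac * (late_weight - early_weight)


def blended_forecast(fundamental_p: float, poll_market_p: float,
                     weight: float) -> float:
    """Weighted average of a fundamentals-based probability and a
    polls-and-markets probability (a simplifying assumption)."""
    return weight * fundamental_p + (1 - weight) * poll_market_p


labor_day_2012 = date(2012, 9, 3)
w_august = fundamentals_weight(date(2012, 8, 1), labor_day_2012)
w_october = fundamentals_weight(date(2012, 10, 15), labor_day_2012)
```

With these made-up settings, an August forecast leans equally on fundamentals and on polls and markets, while an October forecast is driven almost entirely by the latter.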

There is no reliable polling of the voters in the Academy of Motion Picture Arts and Sciences. Rather than all citizens who have reached the age of 18, the Oscar electorate is a far more select group of Academy members. While political pollsters face perilously low response rates in conjunction with the uptick in cellphone-only households, any potential Oscar pollster would have to overcome much greater obstacles to reach this elusive collection of movie insiders.

Fortunately, the Oscars are an ideal use case for prediction markets. For big elections, the vast majority of the information in prediction markets comes from the latest polls. Prediction markets have the advantage of being able to digest late-breaking events and are especially useful earlier in the cycle and in situations where there is less polling. They are the most reliable data available, but only offer a marginally better forecast than polling data in trustworthy hands. Yet for the Oscars there is a lot of information about likely outcomes, but little of it is objective, digestible data. It is common knowledge that Daniel Day-Lewis shone in Lincoln, but it is hard to pin down a reliable, statistically significant data point that demonstrates his role in the movie's success. Dispersed information, spread among dispersed informed users, makes this a case where prediction markets can shine relative to other forecasting data. They did very well in both 2011 and 2012 in forecasting the Oscars.

Experimental user data proved valuable during the 2012 election, but it was not necessary. In this column, I showed the value of the Xbox data we collected both during debates and in a daily panel. But its main use case is as a fun engagement device and for future research. I also showed off my prediction games, which explored correlations between states, but their promise lies in the future.

Experimental user data will prove not only valuable, but necessary, in forecasting the Oscars. There are not going to be detailed prediction markets in many categories, so we are going to rely on our users to supplement those categories. Further, we are excited to learn more about the correlations between different categories from the insights our users provide. How does winning the Oscar for best actor correlate with winning the Oscar for best picture? Stay tuned for the launch of these games later this month.
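To make the correlation question concrete, here is a minimal sketch of how a link between two categories could be read off a user's answers. It assumes, hypothetically, that a user states three probabilities: that a film wins best actor, that it wins best picture, and that it wins both. The function names and the numbers are invented for illustration and are not data from the games.

```python
import math


def conditional_probability(p_a: float, p_both: float) -> float:
    """P(B | A): chance of B given that A happens."""
    return p_both / p_a


def implied_correlation(p_a: float, p_b: float, p_both: float) -> float:
    """Correlation between two yes/no events implied by their
    individual and joint probabilities (the phi coefficient)."""
    covariance = p_both - p_a * p_b
    spread = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return covariance / spread


# Made-up example inputs, not real user data or forecasts.
p_actor = 0.80     # film's lead wins best actor
p_picture = 0.40   # film wins best picture
p_both = 0.36      # both happen

picture_given_actor = conditional_probability(p_actor, p_both)
correlation = implied_correlation(p_actor, p_picture, p_both)
```

If the two awards were unrelated, the joint probability would simply be the product of the two individual probabilities and the implied correlation would be zero; a joint probability above that product signals that users expect the wins to travel together.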

This column syndicates with my personal website: