Right now, the 2016 stories are all about individual presidential candidates -- Hillary and Jeb and Rand and Ted and maybe Martin. But let's just put aside what they're eating for lunch, and try to predict which party is most likely to win the White House next year.
Will the Republicans have a leg up after eight years of a Democrat in the Oval Office, or will the Democrats have an advantage because of an improving economy (assuming it keeps growing in the next 18 months)? The arguments fly back and forth, often backed by historical data and sophisticated-looking charts.
Many news and blog articles from journalists and academics will try to tell you that what’s happened in the past informs us about what will happen in 2016. Their anecdotes, data and charts assume the predictive power of a relationship between past presidential elections and some other piece of information. We’ve had presidents for over 200 years, so history is convincing, right?
Not always. Some skepticism is warranted. Presidential elections are neither frequent enough nor similar enough to support many conclusions drawn from prior contests.
There have been 57 presidential elections since 1788, but they are not all relevant today. Only since 1945 have presidential elections been dominated by the mix of global and domestic issues familiar to us, and by the Republican and Democratic parties as we know them. That leaves just 17 presidential elections since World War II to analyze, each of which has been unique in some ways and similar to others in other ways.
This cartoon from xkcd shows how easily you can find something unique about each election. Every president won in some singular way, but that unique factor may have had nothing to do with his victory. The same premise holds about similarities -- maybe there are similarities among the last 17 elections, but we can’t say with any certainty whether they mattered.
Data are not necessarily helpful under these circumstances. By a common rule of thumb in statistical practice, a set of data with fewer than 30 observations is considered small. Each presidential election is a single observation, so 17 elections equal ... a very small set of data.
The number of observations matters because traditional statistics focuses on finding “significant” relationships between variables -- meaning relationships that are unlikely to have arisen by chance alone. It’s much more difficult to rule out chance when you have only 17 observations.
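To get a feel for how easily chance alone can produce an apparent relationship in 17 observations, here is a minimal simulation sketch. It uses no election data at all: both series are random noise that are unrelated by construction, and the function names, trial count and seed are illustrative choices rather than anything from the analysis discussed here.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_correlation_threshold(n_obs, n_trials=2000, seed=0):
    """95th percentile of |r| between two series that are unrelated by design.

    Both series are pure random noise (an assumption of this sketch),
    so any correlation that shows up is an accident of the sample.
    """
    rng = random.Random(seed)
    abs_rs = []
    for _ in range(n_trials):
        xs = [rng.gauss(0, 1) for _ in range(n_obs)]
        ys = [rng.gauss(0, 1) for _ in range(n_obs)]
        abs_rs.append(abs(pearson_r(xs, ys)))
    abs_rs.sort()
    return abs_rs[int(0.95 * n_trials)]


threshold_17 = chance_correlation_threshold(17)
threshold_170 = chance_correlation_threshold(170)
print(f"n=17:  5% of unrelated pairs show |r| above {threshold_17:.2f}")
print(f"n=170: 5% of unrelated pairs show |r| above {threshold_170:.2f}")
```

With 17 observations, the threshold should land somewhere around 0.48, versus around 0.15 with ten times as many. In other words, a correlation that looks fairly strong on a chart can turn up by accident when the sample is this small.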
An argument from 2011 over whether the unemployment rate would have any effect on the president's re-election offers a nice illustration of the problem. Over the last 17 elections, the relationship between the incumbent presidential party's margin of victory and the unemployment rate looks like this:
To use those data to predict what an incumbent party’s margin of victory might be in a future election, the analyst needs to draw a single line that basically expresses how the margin of victory has responded to the unemployment rate. But there are so few points, and they are so spread out, that it’s almost impossible to do that.
In the second chart, the red line shows the most likely statistical relationship between the unemployment rate and the incumbent party's margin of victory, but the dotted lines show the range of statistical uncertainty about that red line -- which is, in the context of presidential margins of victory, enormous. Statistically speaking, any one of the blue lines might represent the "true" relationship between unemployment and margin of victory, and they're all over the place.
The lines trending downward would indicate that the incumbent party’s margin of victory decreases with higher unemployment rates. But the lines trending upward indicate the opposite -- that higher unemployment increases the incumbent party's margin of victory. The latter doesn't seem likely, but it can't be ruled out based on the available data. More data would reduce the range of uncertainty (i.e., bring those dotted lines closer together) and make us more certain of the relationship, but we don’t have more data.
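The effect of sample size on that range of uncertainty can be sketched with a small simulation. Everything in it is made up for illustration: the "true" slope, the noise level, the range of unemployment rates and the function names are assumptions, not the actual election figures, and the bootstrap used here is just one common way to gauge uncertainty, not necessarily the method behind the chart described above.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate_elections(n_obs, rng, true_slope=-1.0, noise_sd=8.0):
    """Hypothetical data: unemployment rate (%) vs. incumbent margin (points).

    The slope, intercept and noise level are invented for this sketch.
    """
    xs = [rng.uniform(4.0, 10.0) for _ in range(n_obs)]
    ys = [5.0 + true_slope * (x - 6.0) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def bootstrap_slope_interval(xs, ys, rng, n_boot=2000):
    """Approximate 95% interval for the slope by resampling the points."""
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if max(bx) == min(bx):
            continue  # degenerate resample: no line can be fitted
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    return slopes[int(0.025 * n_boot)], slopes[int(0.975 * n_boot)]


rng = random.Random(2016)
xs17, ys17 = simulate_elections(17, rng)
xs170, ys170 = simulate_elections(170, rng)
lo17, hi17 = bootstrap_slope_interval(xs17, ys17, rng)
lo170, hi170 = bootstrap_slope_interval(xs170, ys170, rng)
print(f"17 elections:  slope between {lo17:.2f} and {hi17:.2f}")
print(f"170 elections: slope between {lo170:.2f} and {hi170:.2f}")
```

Each resampled slope plays the role of one plausible line through the points. With only 17 simulated elections, the interval of plausible slopes comes out far wider than with 170, which is the same squeeze that more real elections would apply to the uncertainty band, if we had them.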
One thing we can be sure of is that over the next year or so, political observers will be declaring how the unemployment situation affects the chances of a Democrat or a Republican winning in November 2016. They'll be trotting out many other factors that they say relate to victory and basing predictions on them, too.
Just remember that there’s usually a whole lot of uncertainty when there are only 17 data points.