It started not long after the sunrise last Wednesday morning. One reporter after another wanted to know: Which poll or pollster was most accurate? Which was worst? Votes were still being counted (they still are in some places), results still unofficial, and yet the rush to crown a pollster champion (and goat) was already in full swing.
I am going to write several posts on pollster accuracy -- this is just the first -- but I want to emphasize three common themes: First, leaping to conclusions about "accuracy" without considering random sampling error is almost always misleading. Second, most of the pollsters came reasonably close to the final result in most places, so they tend to be bunched up in accuracy ratings; as a result, small differences in the way we choose to measure accuracy can produce different rankings. Third, I want to raise some questions about the polling industry's focus on the "last poll" as the ultimate measure of accuracy.
For today, let's start with something simple: It is foolish to focus on a single poll that "nailed" the result, given the random variation that is an inherent part of polling. Because most surveys involve random sampling (even internet panel surveys randomly sample from their pool of volunteers), they come with a degree of random variability built in, something we know as the "margin of error." If we assume that the final poll's "snapshot" of voter preferences comes close enough to the election to predict the outcome, then the best we should expect a poll to do is capture the actual result within its margin of error. Even then there is a caveat: the margin of error is usually based on a 95% level of statistical confidence, so we should expect roughly 1 poll in 20 to produce a result outside that margin by chance alone. So if all polls are as accurate as they can be, the difference between "nailing" the result and being a few points off is a matter of random chance -- or luck.
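A minimal simulation sketch makes the "1 poll in 20" point concrete. It assumes simple random sampling with no weighting or design effect, samples of 1,000, and a true Obama share of 52.7% (the current national count); the function names `margin_of_error` and `share_outside_moe` are hypothetical illustrations, not any pollster's method.

```python
import math
import random


def margin_of_error(p, n, z=1.96):
    """Half-width of the ~95% confidence interval for a share p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def share_outside_moe(true_p=0.527, n=1000, polls=1000, seed=2008):
    """Fraction of simulated, flawlessly drawn polls that miss true_p by more than the margin of error."""
    rng = random.Random(seed)
    moe = margin_of_error(true_p, n)
    misses = 0
    for _ in range(polls):
        # Each respondent independently supports the candidate with probability true_p.
        support = sum(rng.random() < true_p for _ in range(n))
        if abs(support / n - true_p) > moe:
            misses += 1
    return misses / polls


print(round(100 * margin_of_error(0.5, 1000), 1))  # -> 3.1 (published-style MoE for n=1,000, in points)
print(share_outside_moe())  # share of simulated polls outside their MoE
```

Over many simulated polls, the share that misses by more than the margin of error should hover around 5%, even though every one of them came from an unbiased procedure.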
If we are going to try to compare pollsters, the wisest thing to do is to measure accuracy across as many polls as possible, because the role of random chance will gradually diminish as the number of polls examined increases.
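The same logic can be sketched for comparing pollsters. In this hypothetical simulation every pollster is equally good: each poll's error on the margin is drawn from a normal distribution with mean zero and an assumed standard deviation of 3.1 points (roughly what sampling alone would give a 1,000-interview poll). The function name and its parameters are illustrative assumptions.

```python
import random
import statistics


def spread_of_average_error(polls_per_pollster, margin_sd=3.1, pollsters=2000, seed=1):
    """Standard deviation, across equally good simulated pollsters, of each one's
    average absolute error on the margin (in points)."""
    rng = random.Random(seed)
    averages = []
    for _ in range(pollsters):
        # Unbiased poll errors: pure sampling noise around the true margin.
        errors = [abs(rng.gauss(0, margin_sd)) for _ in range(polls_per_pollster)]
        averages.append(statistics.mean(errors))
    return statistics.stdev(averages)


for k in (1, 10, 50):
    print(k, round(spread_of_average_error(k), 2))
```

The spread of the average error across these identical pollsters should shrink roughly in proportion to 1/sqrt(k), so rankings built on one poll apiece are mostly noise, while averages over dozens of polls leave much less room for luck.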
Unfortunately, the role of random chance is not stopping a lot of reporters and observers from scanning the final national polls and trying to identify winners and losers. So before moving on to more elaborate aggregations, let's look at the final national polls conducted by 19 different organizations over the final week of the campaign. Looking first at the final survey results (as opposed to "projections" that allocated the undecided), we see that all of the polls had Obama leading by margins of 5 to 11 percentage points. A straight average of these surveys shows Obama leading by 7.6 points (51.4% to 43.8%).
How did these polls compare to the actual results? First, let's keep in mind that provisional and late-arriving mail-in ballots are still being counted in some places (and may not be reflected in the "99% of precincts counted" statistics typically provided by the Associated Press). The most current and complete national count I can find now shows Obama with a 6.6-point lead in the national popular vote (52.7% to 46.1%). Obama's margin has increased by about half a percentage point over the last week and (if the pattern in 2004 is a guide) may increase slightly more as secretaries of state release their final certified results.
Given that margin, however, just about every national poll can claim to have gotten the result "right" in some respect. Most captured either the individual candidate results or the margin within their reported margin of error (keeping in mind that the margin of error on the gap between the two candidates is a little less than double the reported margin of error for each poll). Many of those that reported more voters as undecided, and thus came in low on the individual candidate percentages, also offered "projections" that allocated the undecided. And remember, the 95% confidence level tells us that, by chance alone, we should expect about one of these 19 polls to fall outside its margin of error.
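The "a little less than double" rule follows from the sampling variance of a difference between two shares estimated from the same sample. A minimal sketch, assuming simple random sampling, a sample of 1,000, and the final-poll average shares of 51.4% and 43.8%; the function names are hypothetical.

```python
import math


def moe_share(p, n, z=1.96):
    """95% margin of error for one candidate's share, in points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def moe_lead(p1, p2, n, z=1.96):
    """95% margin of error for the lead p1 - p2 measured in the same sample, in points.

    The two shares come from one sample, so they are negatively correlated:
    Var(p1_hat - p2_hat) = (p1 + p2 - (p1 - p2) ** 2) / n.
    """
    return 100 * z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


reported = moe_share(0.5, 1000)  # published figures conventionally use p = 0.5
lead = moe_lead(0.514, 0.438, 1000)
print(round(reported, 1), round(lead, 1), round(lead / reported, 2))
```

Under these assumptions the reported margin of error is about 3.1 points and the margin of error on the lead about 6.0 points, a ratio of roughly 1.95. A single 1,000-interview poll showing a 7.6-point lead would therefore be statistically consistent with an actual margin anywhere from roughly 1.6 to 13.6 points.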
Of course, if we are hell-bent on crowning a champion, we still need to decide which accuracy measure is best (do we compare the margins, or how close the poll came to the percentage for one or both of the candidates?), and in some cases we would need to decide whether to focus on the survey results or the pollster's projection. For Battleground/GWU, for example, we have three sets of numbers: a final poll showing Obama with a 5-point lead and two projections (one each from the Democratic and Republican pollsters involved) showing Obama with leads of 5 and 2 points, respectively.
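A minimal sketch of how much the choice of measure matters: the hypothetical functions below score a poll two ways, by the absolute error on the Obama-minus-McCain margin and by the average absolute error on the two candidates' individual shares. The 48%-43% poll with 8% undecided is invented for illustration, and proportional allocation of the undecided is only one assumed way to build a "projection"; pollsters used their own methods.

```python
ACTUAL_OBAMA, ACTUAL_MCCAIN = 52.7, 46.1  # current national count, in percent


def margin_error(obama, mccain):
    """Absolute error on the Obama-minus-McCain margin, in points."""
    return abs((obama - mccain) - (ACTUAL_OBAMA - ACTUAL_MCCAIN))


def candidate_error(obama, mccain):
    """Average absolute error on the two candidates' individual shares, in points."""
    return (abs(obama - ACTUAL_OBAMA) + abs(mccain - ACTUAL_MCCAIN)) / 2


def allocate_undecided(obama, mccain, undecided):
    """Assumed projection rule: split the undecided in proportion to each candidate's share."""
    two_way = obama + mccain
    return (obama + undecided * obama / two_way,
            mccain + undecided * mccain / two_way)


raw = (48.0, 43.0)  # hypothetical poll with 8% undecided
projected = allocate_undecided(*raw, undecided=8.0)
for label, (o, m) in (("raw", raw), ("projected", projected)):
    print(label, round(margin_error(o, m), 1), round(candidate_error(o, m), 1))
```

The raw hypothetical poll is off by 1.6 points on the margin but 3.9 points on the candidate measure; its proportional projection scores about 1.2 and 0.6. The same survey can look middling or excellent depending on which number we grade and which measure we use.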
I am not devoting much effort here to calculating or charting the accuracy of the individual polls because, again, random chance plays such a big role in determining where each pollster ranks. I am working on another post, to follow soon (hopefully tomorrow), that will look at how pollsters did in statewide contests, where we can aggregate accuracy calculations across multiple polls.
But before moving on from the national polls, let's look at this issue another way. What if we back up and look at the "snapshot" of polls as of Friday, October 31? After all, we have considerable evidence that virtually all minds were made up by the final week of the campaign. According to the national exit poll, only 7% of voters say they made their decision in the final three days (10% over the course of the final week). Although McCain did slightly better among the late deciders -- running roughly even with Obama -- my colleague David Moore points out that those final decisions would have had little or no impact on the margins separating the candidates over the final week.
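A back-of-the-envelope mixture calculation shows why a small, evenly split group cannot move the overall margin much. This sketch assumes the late deciders split exactly evenly (the exit poll shows them only roughly even) and that the final margin is the 6.6 points in the current count; the function name is hypothetical.

```python
def early_decider_margin(total_margin, late_share, late_margin=0.0):
    """Margin among voters who decided before the final stretch, in points.

    Solves total = (1 - s) * early + s * late for `early`, where s is the
    late deciders' share of the electorate.
    """
    return (total_margin - late_share * late_margin) / (1 - late_share)


for s in (0.07, 0.10):  # decided in the final three days / over the final week
    early = early_decider_margin(6.6, s)
    print(f"late share {s:.0%}: early-decider margin {early:.1f}, shift {early - 6.6:.1f}")
```

Under those assumptions, the late deciders pull the overall margin about half a point below the early deciders' margin if they are 7% of voters, and about 0.7 points if they are 10%.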
The overall performance of the October 31 snapshot is about the same. The average of the polls in this table, all of which concluded between October 26 and October 31, shows an Obama lead of 7.1 points (51.4% to 43.0%) -- just slightly narrower than the 7.6-point margin on the final round of national polling. What is different, however, is the spread of results. Where the final polls' Obama margins ranged from 5 to 11 points, just three days earlier the spread was from 3 to 15. The standard deviation (a measure of the spread of results) on the Obama margin was 1.8 points for the final polls, but 3.2 points for the polls just a few days earlier.
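How tight is a standard deviation of 1.8 points? One rough benchmark is the standard deviation of the Obama margin that sampling error alone would produce. This sketch assumes simple random samples with no weighting or design effect, uses the final-round average shares (51.4% and 43.8%), and picks sample sizes only for illustration, since the actual sample sizes varied by pollster; the function name is hypothetical.

```python
import math


def sampling_sd_of_margin(p1, p2, n):
    """Standard deviation of the estimated margin p1 - p2, in points, for a simple random sample of n."""
    return 100 * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


for n in (1000, 2000, 3000):
    print(n, round(sampling_sd_of_margin(0.514, 0.438, n), 1))
```

By this yardstick, sampling noise alone implies a spread of about 3.1 points for samples of 1,000, 2.2 for 2,000, and 1.8 for 3,000. Weighting and differences in methodology would only widen those figures, so whether the final-round spread is unusually tight depends on the actual sample sizes and designs.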
I do not want to use this table to beat up on any individual pollster, especially since my October 31 cut-off is arbitrary and the field dates vary considerably (the Pew survey, for example, started and ended earlier than most of the others). A slightly different cut-off date would have produced a different picture. Obama's 5-point margin on the IBD/TIPP 10/27-31 survey, for example, shrank to just 2 points the next day and then expanded back to 8 points on their final release.
We should remember that pollsters hold the details of their "likely voter models" close, a habit that allows many to tinker with their selection and weighting procedures on their last poll. Gallup -- among the most transparent of pollsters in describing its likely voter model -- disclosed a small adjustment to its model made just days before the election (although Gallup's Jeff Jones explained via email that the change did not account for Obama's growing margin over the last few days of their survey).
All of this brings me to the question we ought to keep front and center as we think about the accuracy of state-level polls, where we are in a better position to quantify final-poll accuracy. How many pollsters were tinkering with or adjusting their models on that "last poll" with an eye toward the "final exam" coming on Election Day? And if the final poll results tended to converge around the average on the last round of polls, how much of that convergence was real and how much was the result of last-minute tinkering with LV models and weighting? And what does all of this say about focusing solely on "the last poll" as a way to rate pollster accuracy? After all, just 19 of the 543 polls displayed on our national poll table were the "last poll." Which surveys had the biggest impact on campaign coverage?