Yesterday, the National Council on Public Polls (NCPP) posted its biennial review of poll performance as a three-part set of PDF documents: tables scoring the final polls from each pollster at both the national and statewide levels, plus a top-line analysis (full disclosure: we provided NCPP with a database of the general election polls we logged here at Pollster.com, although we had no involvement in their analysis).
Historically, NCPP has focused its analysis on the national polls. Here are its main conclusions on the performance of the national polls in 2008:
In terms of Candidate Error, the average is less than one percentage point (0.9), whether the pollster chose to allocate undecided voters at the end or not. That is the same as the 0.9 percentage point error reported by NCPP for this analysis in 2004. It is slightly less than the 1.1 percentage point average in 2000. In 2008, estimated errors ranged from 0.1 to 2.4 percentage points.
Thus, despite widely discussed concerns such as the growing size of the cell-phone-only population (and, this year, the possibility of a repeat of the Bradley/Wilder effect), there was no change in the average poll error.
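For concreteness, a candidate-level error of this kind can be computed as half the absolute difference between a poll's margin and the actual margin, which is how the NCPP report describes its calculation later on. A minimal sketch, using hypothetical numbers rather than anything from the NCPP tables:

```python
def candidate_error(poll_dem, poll_rep, actual_dem, actual_rep):
    """Half the absolute difference between the poll's margin and the
    actual margin, in percentage points (one common way to approximate
    the error attributable to each candidate's estimate)."""
    poll_margin = poll_dem - poll_rep
    actual_margin = actual_dem - actual_rep
    return abs(poll_margin - actual_margin) / 2

# Hypothetical final poll: 52-44; hypothetical result: 52.9-45.6.
# Margins of 8.0 vs. 7.3 yield a candidate error of 0.35 points.
err = candidate_error(52, 44, 52.9, 45.6)
```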
NCPP is a consortium of media pollsters, and as such it concentrates on evaluating the performance of the polls (plural) rather than on rating or ranking individual pollsters or methodologies. So while the report has some useful data for making year-to-year, industry-wide comparisons, it will likely frustrate those trying to find the "best" or "worst" pollster.
That said, any thorough effort to rank the pollsters, separating "good" from "bad" on the basis of the accuracy of the last poll, is bound to frustrate for reasons the NCPP report identifies in an easily overlooked, next-to-last paragraph:
No method of judging the error works perfectly. Other evaluations of poll performance based on other methods may produce different conclusions.
The NCPP report includes two methods of measuring the poll error that differ slightly from the eight measures first proposed in 1948 by the renowned Harvard statistician Frederick Mosteller in his chapter of the report of the Social Science Research Council on the polling failures that year (and still used by many who score pollster error). The NCPP measures also differ from the odds-ratio scoring proposed three years ago in the pages of Public Opinion Quarterly by Martin, Traugott and Kennedy. I have looked at state-level pollster error using some of these methods, and can confirm that different methods can and do produce different rankings in 2008.
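To see how different scoring methods can reorder pollsters, here is a toy sketch of two Mosteller-style measures: one averages the absolute error on each candidate's share, the other looks only at the margin between them. The labels and poll numbers below are mine, purely for illustration, and are not drawn from the NCPP tables:

```python
def share_error(poll, actual):
    """Average absolute error across the two leading candidates'
    shares (in the spirit of one of Mosteller's measures)."""
    return (abs(poll[0] - actual[0]) + abs(poll[1] - actual[1])) / 2

def margin_error(poll, actual):
    """Absolute error on the margin between the two leading
    candidates (in the spirit of another Mosteller measure)."""
    return abs((poll[0] - poll[1]) - (actual[0] - actual[1]))

actual = (52.9, 45.6)      # hypothetical election result
poll_a = (50.0, 43.0)      # large undecided; margin nearly right
poll_b = (53.5, 47.5)      # shares nearly right; margin off

# By the share measure, B beats A (1.25 vs. 2.75 points); by the
# margin measure, A beats B (0.3 vs. 1.3 points). Same polls,
# opposite rankings.
```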
The reason is that four factors can affect the size of the error scores, especially when they are aggregated for any given pollster, and these factors are not comparable across organizations:
1) The number of polls conducted - Generally, if we average errors across multiple polls, those who do more polls should show lower average errors, by the logic of regression to the mean. Any one poll can produce a large error by chance, but as we average more and more surveys, the average errors should generally be lower (there is an exception).
2) The number of interviews for any given poll - More interviews should mean less random error, and different pollsters use different sample sizes. The sample sizes for some individual pollsters can also vary widely from state to state. So if we aggregate errors across pollsters, some will do better simply because their sample sizes are bigger.
3) How the scoring handles or interprets the "undecided" category - In general elections, "undecided" is not a choice on the ballot, so any reported undecided is an error, in a sense. What complicates the analysis is that some pollsters allocate undecided voters on their final poll and some do not. Some error scores effectively ignore the undecided (either by allocating or by focusing on the margins separating the candidates), while some scores penalize pollsters that leave undecideds unallocated. This issue remains a matter of considerable, unresolved debate among pollsters.
4) The lag between the dates of interviewing and the election -- A longer delay between the field dates and election day creates a greater potential for error due to last minute shifts in voter preferences. Those that field late have an inherent advantage over those that conclude earlier, although the size of any such advantage in any given election is debatable and hard to evaluate. And ignoring all the polls that came before "the last poll" opens the possibility of a misleading measure, especially when polls do seem to converge around a common mean on the last round of polls (at least they did in 2008, see our posts here, here, here and here).
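The second factor above can be made concrete with the textbook simple-random-sample formula: expected random error shrinks with the square root of the sample size. This is only a rough sketch, since actual pollsters adjust their reported margins for weighting and design effects:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error (in percentage points) for a single
    candidate's share, using the simple-random-sample formula
    z * sqrt(p*(1-p)/n). Real-world designs inflate this somewhat."""
    return z * math.sqrt(p * (1 - p) / n) * 100

# A pollster interviewing 2,000 voters has a built-in accuracy edge
# over one interviewing 500, before any methodological skill enters:
moe_small = margin_of_error(500)    # roughly 4.4 points
moe_large = margin_of_error(2000)   # roughly 2.2 points
```

Quadrupling the sample size halves the expected random error, which is why aggregate error comparisons that ignore sample size can flatter the pollsters with the biggest budgets.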
All of these are reasons why we have been cautious (so far) in producing a "best-to-worst" ranking of individual pollsters for Pollster.com. A few weeks ago, Mark Lindeman and I ranked pollsters based on their statewide surveys using 12 different scores and time frames (don't bother searching, as we have not yet posted these online).** Even when we narrowed the list to the 15 or so organizations that produced at least five "final" poll results in statewide contests, we found seven different pollsters ranking 1st or 2nd at least once, five ranking lowest or second lowest at least once, and three that ended up in both categories (best and worst) at least once. And none of these rankings controlled for the number of polls conducted or the sample sizes used, measuring each pollster against the standard of how well it should have done.
The NCPP report takes a first stab at that sort of analysis by comparing what they call candidate estimate error to one half of the margin of error. "A total of 53 of the state polls," the report tells us, "or 12.8 percent had results that fell outside of the sampling margin of error for that survey."*** Given that the margin of error is based on a 95% level of statistical confidence, if the surveys (and these comparisons) were perfect, we would expect only 5% of the results to fall outside the margin of error. Caveat: They arrive at this statistic by calculating the error on the margin predicted by the poll, dividing that number by two (to get an estimate of the error for each candidate) and comparing it to the reported margin of error for that poll.
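That computation can be sketched as follows. The poll numbers are hypothetical, and I am assuming the textbook margin-of-error formula in place of whatever margin each pollster actually reported:

```python
import math

def outside_moe(poll_margin, actual_margin, n, z=1.96):
    """NCPP-style check (as I read their caveat): halve the error on
    the margin to approximate the per-candidate error, then compare it
    to the poll's 95% margin of error. Assumes the simple-random-sample
    formula for the margin of error, which understates real designs."""
    per_candidate_error = abs(poll_margin - actual_margin) / 2
    moe = z * math.sqrt(0.25 / n) * 100
    return per_candidate_error > moe

# Hypothetical 600-person poll showing a +2 margin where the actual
# result was +12: per-candidate error of 5.0 exceeds the ~4.0 MoE.
flagged = outside_moe(2, 12, 600)
# A +4 poll against a +8 result (error 2.0) stays inside the MoE.
clean = outside_moe(4, 8, 600)
```

If polls behaved exactly as the 95% confidence level promises, this check would flag about 5% of them; the 12.8% the report finds is the interesting discrepancy.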
Do some pollsters do better than others when judged by that standard? I will try to assess that in my next post.
**I haven't posted those scores, mostly because the endless number of tables adds up to no obvious conclusion. I'm willing to post those tables, in all their glorious and confusing detail, if readers demand it. But I would much rather try to find ways to evaluate pollsters that attempt to control for the four factors listed above. As always, readers' suggestions are welcome.
***When I wrote this post, the links on the NCPP web site pointed to earlier drafts of the tables and analysis that were not based on the final results in each state. As a result, the original version of my entry quoted an earlier computation of the percentage of polls falling outside the sampling margin of error, which I have now corrected.