My column for next week has been posted a little earlier than usual. It covers the controversy over Nate Silver's pollster ratings and the bloggy exchange over the last day or two among Silver, Political Wire's Taegan Goddard and Research 2000's Del Ali about the transparency of the FiveThirtyEight pollster ratings. I have a few important footnotes and another aspect of transparency to review, but real life intrudes. So please click through and read it all, but come back to this post later tonight for an update.
I'm going to update this post in two parts. First I want to add some footnotes to the column, which covers the questions that have been raised about the database of past polls that Nate Silver created and used to score pollsters. The second part will discuss the transparency regarding additional aspects of Nate's model and scoring.
I want to emphasize that nothing I learned this week leads me to believe that Silver has intentionally misled anyone or done anything intentionally sinister. I have questions about the design and interpretation of the models used to score pollsters, and I wish he would be more transparent about the data and mechanics, but these are issues of substance. I'm not questioning his motives.
So on the footnotes: Earlier today, Del Ali of Research 2000 sent us a list of 12 of his poll results that he claimed Silver should have included in his database and 2 more that he said were in error. Later in the morning he sent one more omitted result. We did our best to review that list and confirm the information provided. Here is what we found.
First, the two polls included in Silver's database with errors:
- 2008-FL President (10/20-10/22) - Error (+3 Obama not +4)
- 2008-ME President (10/13-10/15) - Error (+17 Obama not +15)
These are both relatively small errors, and we noticed that the apparent mistake on the Maine poll was also present in the DailyKos summary of the poll published at the time.
There were four more polls in the omitted category that were either more than 21 days before the election (Hawaii and the Florida House race) or outside the range of races that Silver said he included (he did not include any gubernatorial primaries before 2010). [Correction: We overlooked that the NY-23 special election was omitted intentionally because of Silver's criteria of excluding races "where a candidate who had a tangible chance of winning the election drops out of it prematurely"].
- 2010-HI-01 Special Election House (4/11-4/14)
- 2006-FL-16 House (10/11-10/13)
- 2002-IL Dem Primary Governor (3/11-3/13)
- 2009-NY-23 Special (10/26-28)
Some may quarrel with Silver's decisions about the range of dates he sets as a cut-off, and I'm hoping to write more about that aspect of his scoring system. But as long as Silver applied his stated rules consistently, these examples do not qualify as erroneous omissions.
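For readers who want to check the cut-off themselves, here is a minimal sketch of the 21-day test. It rests on assumptions of mine, not on anything Silver has published: I measure from the last field date to election day and treat exactly 21 days as inside the window. The function names are hypothetical.

```python
from datetime import date

# Assumption: the window runs from the poll's last field date to election
# day, and a poll exactly 21 days out counts as inside the window.
WINDOW_DAYS = 21

def days_before_election(last_field_date: date, election_day: date) -> int:
    """Number of days between the end of fieldwork and election day."""
    return (election_day - last_field_date).days

def within_window(last_field_date: date, election_day: date,
                  window: int = WINDOW_DAYS) -> bool:
    """True if the poll finished no more than `window` days before the election."""
    return 0 <= days_before_election(last_field_date, election_day) <= window

# 2006-FL-16 House poll (10/11-10/13); the general election was Nov. 7, 2006.
fl16_days = days_before_election(date(2006, 10, 13), date(2006, 11, 7))
fl16_in = within_window(date(2006, 10, 13), date(2006, 11, 7))

# 2004-NV Senate poll (10/19-10/21); the general election was Nov. 2, 2004.
nv_days = days_before_election(date(2004, 10, 21), date(2004, 11, 2))
nv_in = within_window(date(2004, 10, 21), date(2004, 11, 2))

print(fl16_days, fl16_in)  # 25 False
print(nv_days, nv_in)      # 12 True
```

Under those assumptions the Florida House poll falls outside the window, while the Nevada Senate poll falls inside it.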
That leaves nine more Research 2000 polls that appear to be genuine omissions in the sense that they meet Silver's criteria but were not included in the database:
- 2000-IN President (10/28-10/30)
- 2000-NC President (10/28-10/30)
- 2000-NC Governor (10/28-10/30)
- 2002-IN-02 House (10/27-10/29)
- 2004-IA-03 (10/25-10/27)
- 2004-NV Senate (10/19-10/21)
- 2008-ID Senate (10/21-10/22)
- 2008-ID-01 (10/21-10/22)
- 2008-FL-18 (10/20-10/22)
Do these omissions indicate sloppiness? We were able to find the 2004 Nevada Senate and 2002 Indiana House polls on the Polling Report, and the Iowa 3rd CD poll from 2004 with a Google search at KCCI.com. So those examples should have been included but were not.
However, we could not find the 2000 North Carolina poll anywhere except the subscriber-only archives of The Hotline (although, oddly, with different field dates: 10/30-31). The Hotline database is not among Silver's listed resources.
We also checked the three results (from two polls) missing for 2008 and found they were also missing from the compilations published by our site, RealClearPolitics and the Polling Report during the campaign (though we did find mention of the Idaho poll on Research2000.com). We could not find the Indiana presidential result from 2000 anywhere.
The point of all of this is that there are really only a small number of examples that qualify as mistakes attributable to Silver's team. Most of the other oversights were also made by their sources. And even if we correct all of the errors and include all of the inside-the-21-day-window omissions, it changes the average error for Research 2000 hardly at all (as summarized in the column [and leaving out NY-23 does not change the average error]). These examples still represent imperfections in the data that should be corrected, and we can assume that more exist for the other pollsters, and as argued in the column, I'm all for greater transparency. But if you are looking for evidence of something "sinister," it just isn't there.
We created a spreadsheet that includes both the original list of Research 2000 polls included in the FiveThirtyEight database and a second tab that includes the corrections and appropriate omissions. It is certainly possible that our spreadsheet contains errors of its own, so in the spirit of transparency, we've made it available for download. Feel free to email us with corrections.
[I corrected a few typos and cleaned up one mangled sentence in the original post above -- Part II of the update coming over the weekend.]
Update (6/14): Since I did not finish the promised update until Monday afternoon, I posted it as a separate entry. Please click through for more.
Follow Mark Blumenthal on Twitter: www.twitter.com/MysteryPollster