06/14/2010 02:42 pm ET | Updated May 25, 2011

Transparency and Pollster Ratings: Update

[Update: On Friday night, I linked to my column for this week, which appeared earlier than usual. It covers the controversy over Nate Silver's pollster ratings, and an exchange last week between Silver, Political Wire's Taegan Goddard and Research 2000's Del Ali over the transparency in the FiveThirtyEight pollster ratings. In linking to the column I also posted additional details on the polls that Ali claimed Silver had missed and promised more on the subject of transparency that I did not have a chance to include in the column. That discussion follows below.]

Although my column discusses issues of transparency in the database Nate Silver created to rate pollster accuracy, it does not address transparency with regard to the details of the statistical models used to generate the ratings.

When Taegan Goddard challenged the transparency of the ratings, Silver shot back that the transparency is "here in an article that contains 4,807 words and 18 footnotes," and explains "literally every detail of how the pollster ratings are calculated."

Granted, Nate goes into great detail describing how his rating system works, but several pollsters and academics I talked to last week wanted to see more details of the model and the statistical output in order to better evaluate whether the ratings perform as advertised.

For example, Joel David Bloom, a survey researcher at the University at Albany who has done a similar regression of pollster accuracy, said he "would need to see the full regression table" for Silver's initial model that produces the "raw scores," a table that would include the standard error and level of significance for each coefficient (or score). He also says he "would like to see the results of statistical tests showing whether the addition of large blocks of variables (e.g., all the pollster variables, or all the election-specific variables) added significantly to the model's explanatory power."
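To make Bloom's two requests concrete, here is a minimal sketch of what they would look like in practice. The data and variable names below are entirely hypothetical (they are not Silver's actual variables); the point is only to show what a "full regression table" contains (coefficients with their standard errors and t-statistics) and what a test for a "block of variables" looks like (an F-test comparing a model with and without the pollster dummies):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a poll's error modeled as a function of an
# election-specific variable and a pollster dummy. Illustrative only.
n = 200
days_to_election = rng.uniform(0, 60, n)      # election-specific variable
pollster_b = rng.integers(0, 2, n).astype(float)  # dummy for one pollster
error = 2.0 + 0.03 * days_to_election + 0.5 * pollster_b + rng.normal(0, 1, n)

def ols(X, y):
    """Fit OLS; return coefficients, standard errors, RSS, and residual dof."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    dof = len(y) - X.shape[1]
    sigma2 = rss / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, rss, dof

# Restricted model: intercept + election-specific variable only.
X_restricted = np.column_stack([np.ones(n), days_to_election])
# Full model: add the "block" of pollster variables (here, one dummy).
X_full = np.column_stack([X_restricted, pollster_b])

beta_r, se_r, rss_r, dof_r = ols(X_restricted, error)
beta_f, se_f, rss_f, dof_f = ols(X_full, error)

# The "full regression table": coefficient, std. error, t-statistic per row.
t_stats = beta_f / se_f
for name, b, se, t in zip(["intercept", "days_out", "pollster_b"],
                          beta_f, se_f, t_stats):
    print(f"{name:>10s}  coef={b:6.3f}  se={se:5.3f}  t={t:6.2f}")

# F-test: did adding the pollster block significantly improve the fit?
q = X_full.shape[1] - X_restricted.shape[1]   # number of added variables
F = ((rss_r - rss_f) / q) / (rss_f / dof_f)
print(f"F-statistic for the pollster block: {F:.2f}")
```

A large F-statistic (compared against the F distribution with q and dof_f degrees of freedom) is what would answer Bloom's second question: whether the pollster variables, taken together, add real explanatory power.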

Similarly, Clifford Young, pollster and senior vice president at IPSOS Public Affairs, said that in order to evaluate Silver's scores, he would "need to see the fit of the model and whether the model violates or respects the underlying assumptions of the model," and more specifically, "what's the equation, what are all the variables, are they significant or aren't they significant."

I should stress that no one quoted above doubts Silver's motives or questions the integrity of his work. They are, however, trying to understand and assess his methods.

I emailed Silver and asked about both estimates of the statistical uncertainty associated with his error scores and about not providing more complete statistical output. On the "margin of error" of the accuracy scores, he wrote:

Estimating the errors on the PIE [pollster-introduced error] terms is not quite as straightforward as it might seem, but the standard errors generally seem to be on the order of +/- .2, so the 95% confidence intervals would be on the order of +/- .4. We can say with a fair amount of confidence that the pollsters at the top dozen or so positions in the chart are skilled, and the bottom dozen or so are unskilled i.e. "bad". Beyond that, I don't think people should be sweating every detail down to the tenth-of-a-point level.
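The arithmetic behind Silver's "+/- .4" is the standard normal approximation: a 95% confidence interval extends roughly 1.96 standard errors on either side of the estimate. A one-line check, using his stated standard error of 0.2:

```python
# 95% CI half-width under the normal approximation: ~1.96 standard errors.
se = 0.2            # Silver's rough standard error on a PIE score
z_95 = 1.96         # two-sided 95% critical value of the standard normal
half_width = z_95 * se
print(round(half_width, 2))   # 0.39 -- i.e., "on the order of +/- .4"
```

In practical terms, two pollsters whose PIE scores differ by less than about 0.4 points cannot be confidently distinguished from one another, which is why Silver cautions against "sweating every detail down to the tenth-of-a-point level."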

In a future post, I'm hoping to discuss the ratings themselves and whether it is appropriate to interpret differences in the scores as indicative of "skill" (short version: I'm dubious). Today's post, however, is about transparency. Here is what Silver had to say about not providing full statistical output:

Keep in mind that we're a commercial site with a fairly wide audience. I don't know that we're going to be in the habit of publishing our raw regression output. If people really want to pick things apart, I'd be much more inclined to appoint a couple of people to vet or referee the model like a Bob Erikson. I'm sure that there are things that can be improved and we have a history of treating everything that we do as an ongoing work-in-progress. With that said, a lot of the reason that we're able to turn out the volume of academic-quality work that we do is probably because (ironically) we're not in academia, and that allows us to avoid a certain amount of debates over methodological esoterica, in which my view very little value tends to be added.

To be clear, no one I talked to is urging FiveThirtyEight to start regularly publishing raw regression output. Even in this case, I can understand why Silver would not want to clutter up his already lengthy discussion with the output of a model featuring literally hundreds of independent variables. However, a link to an appendix in the form of a PDF file would have added no clutter.

I'm also not sure I understand why this particular scoring system requires a hand-picked referee or vetting committee. We are not talking about issues of national security or executive privilege.

That said, the pollster ratings are not the fodder of a typical blog post. Many in the worlds of journalism and polling are taking these ratings very seriously. They have already played a major role in getting one pollster fired. Soon these ratings will appear under the imprimatur of the New York Times. So with due respect, these ratings deserve a higher degree of transparency than FiveThirtyEight's typical work.

Perhaps Silver sees his models as proprietary and prefers to shield the details from the prying eyes of potential competitors (like, say, us). Such an urge would be understandable but, as Taegan Goddard pointed out last week, also ironic. Silver's scoring system gives bonus accuracy points to pollsters "that have made a public commitment to disclosure and transparency" through membership in the National Council on Public Polls (NCPP) or through commitment to the Transparency Initiative launched this month by the American Association for Public Opinion Research (AAPOR), because, he says, his data show that those firms produce more accurate results.

The irony is that Silver's reluctance to share details of his models may stem from some of the same instincts that have made many pollsters, including AAPOR members, reluctant to disclose more about their methods or even to support the Transparency Initiative itself. Those are the instincts that AAPOR's leadership hopes to change through the Initiative.

Last month, AAPOR's annual conference included a plenary session that discussed the Initiative (I was one of six speakers on the panel). The very last audience comment came from a pollster who said he conducts surveys for a small midwestern newspaper. "I do not see what the issue is," he said, referring to the reluctance of his colleagues to disclose more about their work, "other than the mere fact that maybe we're just so afraid that our work will be scrutinized." He recalled an episode in which he had been ready to disclose methodological data to someone who had emailed with a request but was stopped by the newspaper's editors, who were fearful "that somebody would find something to be critical of and embarrass the newspaper."

Gary Langer, the director of polling at ABC News, replied to the comment. His response is a good place to conclude this post:

You're either going to be criticized for your disclosure or you're going to be criticized for not disclosing, so you might as well be on the right side of it and be criticized for disclosure. Our work, if we do it with integrity and care, will and can stand the light of day, and we speak well of ourselves, of our own work and of our own efforts by undertaking the disclosure we are discussing tonight.