The T-shirt said it all. "The weighting is the hardest part."
The pun seen all around the 2014 conference of the American Association for Public Opinion Research (AAPOR) aimed to elicit chuckles, but it also inadvertently conveyed a critical message about modern polling. Between fewer landline phones and falling response rates, it's harder than ever to contact and interview a true random sample of Americans. That makes the key to assessing survey quality a better understanding of the ways pollsters draw and "weight" their samples to correct for the statistical bias that can creep into raw data.
Yet according to an extensive Huffington Post analysis, the published results from statewide polls on U.S. Senate and gubernatorial races in 2014 have frequently omitted crucial methodological details about sample frame, sample design and weighting procedures. Just 16 percent of the polls we looked at provided information on all three of these items, while just 48 percent provided information on at least one of them.
What Makes A Poll Trustworthy?
FiveThirtyEight founder Nate Silver's approach to rating pollsters has famously focused on scoring their accuracy on horse race questions asked in the final weeks of a campaign. In a recent update of his pollster ratings, however, Silver explains why his approach has evolved to consider both accuracy and methodological assessments.
"Pollster performance is predictable -- to some extent," he writes, but the differences in accuracy are not large, especially when there are only a handful of actual surveys to consider. "Our finding," Silver adds, "is that past performance reflects more noise than signal until you have about 30 polls to evaluate, so you should probably go with the firm with the higher methodological standards up to that point.”
The assertion that methodological standards matter is far from controversial. The difficult part is scoring those standards. One of the ways that Silver assesses methodology is by checking whether the pollster is a member of the National Council for Public Polls, is a supporter of the AAPOR’s Transparency Initiative, or archives data with the Roper Center. Any one of the three will do.
As Silver emphasizes, membership in the National Council for Public Polls and supporting AAPOR's Transparency Initiative are not direct measures of methodological quality, but are proxies for it. Silver's assumption, which he has backed up with data, is that pollsters willing to ally themselves with these groups have stronger methodological standards and therefore produce more accurate results.
Unfortunately, AAPOR's initiative remains a work in progress, with a few false starts and missed deadlines since the initial announcement in 2010. (AAPOR tells The Huffington Post that the initiative's formal launch is coming later in October.)
When it launches, the Transparency Initiative will require participating researchers to disclose the kind of methodological information that is so important in judging polls today. With many differing opinions about methodology and known problems for every method or mode, perhaps the most important aspect of polling is disclosing how these problems are addressed. How a pollster adjusts for problems can arguably tell us far more about the quality of its polls than any other information, and that's why AAPOR's code of professional ethics calls for this to be disclosed.
Do Pollsters Disclose Enough?
While waiting for the Transparency Initiative to get up to speed, HuffPost Pollster decided to take a closer look at whether pollsters have been abiding by AAPOR's disclosure standards. Over the summer, we coded organizations based on the amount of methodological information they make publicly available either in news releases or on an easy-to-find page on their websites. We gave credit only for information that an average person could find with relative ease, excluding facts buried deep within a website or hidden behind a subscription paywall.
We created our sample of polls/pollsters in June of this year. Included were the two most recent polls from every organization that had conducted a statewide or congressional district poll in the 2014 election cycle. If a pollster had conducted only one such poll by June, just that one was included. Ultimately, we ended up with 140 polls and 86 pollsters in the sample.
What we found, though not entirely surprising, was concerning: There is a distinct lack of disclosure out there.
All of the polls were coded for the items that AAPOR requires to be immediately disclosed, as well as a few other items we felt provide cues about openness. The full database is available, but here we focus on the sample frame, sample design and weighting procedures.
First, we looked at the sample frame description. Loosely defined, a sample frame is the list a pollster uses to create a sample. Some pollsters use lists of telephone numbers, while others use lists of registered voters. We gave credit to any pollster who even mentioned a sample frame. Even with this generous coding, less than one-quarter of the 140 polls we looked at (24 percent) mentioned the sample frame in a news release or on the pollster's website.
A slightly higher percentage provided a sample design description. Sample design refers to how the pollsters select their sample from the sample frame. Here again, we gave credit to any pollster who even mentioned a sample design, no matter how briefly. All in all, only 33 percent of the 140 polls did so.
Another item we looked for was any mention of data being stratified -- meaning the use of additional sampling procedures as data are being collected -- or weighted -- meaning the use of statistical adjustments afterward -- to assure that the final sample is representative of the target population. To be clear, virtually all pollsters have to weight or stratify to correct for low response rates and other problems contacting a truly random sample.
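The basic idea behind weighting can be sketched in a few lines. This is a minimal, illustrative example of simple post-stratification cell weighting -- real pollsters typically use more elaborate schemes (such as raking across several variables at once), and the age groups and population shares below are assumed for illustration, not drawn from any real survey or census table:

```python
# A minimal sketch of post-stratification weighting. The sample counts and
# population shares are hypothetical, for illustration only.
from collections import Counter

# Hypothetical sample of 1,000 respondents, each tagged with an age group.
sample = ["18-34"] * 150 + ["35-64"] * 550 + ["65+"] * 300

# Assumed population shares for the same age groups.
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

n = len(sample)
sample_share = {g: c / n for g, c in Counter(sample).items()}

# Each respondent's weight is the ratio of population share to sample share,
# so under-represented groups count for more and over-represented groups
# count for less.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

for g, w in sorted(weights.items()):
    print(f"{g}: weight = {w:.2f}")
```

Here the 18-to-34 group makes up 15 percent of the sample but 30 percent of the assumed population, so each of those respondents is counted twice; the over-represented older groups are weighted down correspondingly. How a pollster chooses the weighting variables and targets is exactly the kind of detail that disclosure standards ask to be reported.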
Less than half of the polls examined (43 percent) mentioned weighting or stratifying their data at all. An even smaller number actually provided some description of how it was done.
Obtaining sample and weighting information is particularly important for online polls. Most online polls do not begin with a random sample; rather they interview respondents who have previously volunteered to participate in surveys. There is no pretense of representativeness in the pool of volunteers. The key to obtaining representative data is the use of stratifying and weighting procedures.
Among the other items we looked at: Disclosure of response rates was extremely low, with only three of the 140 polls providing the rate of response (AAPOR's Transparency Initiative requires its disclosure only upon request). Publication of the party, gender, age, race and geographic breakdowns of the sample, as well as publication of tables showing vote choice broken down by party, gender, age, race and geography, was somewhat higher (these items are not part of the initiative). Sixty-two percent of the 140 polls provided a breakdown of the sample by party, 58 percent provided a breakdown by gender, 55 percent provided a breakdown by age, 46 percent provided a breakdown by race, and 36 percent provided a breakdown by geographic location of respondents. In addition, 43 percent of the polls we looked at disclosed vote choice broken down by party, 40 percent disclosed vote choice by gender, 34 percent disclosed vote choice by age, 24 percent disclosed vote choice by race, and 28 percent disclosed vote choice by geography.
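For readers unfamiliar with how a response rate is computed, AAPOR's Standard Definitions spell out several formulas; the strictest, Response Rate 1 (RR1), divides completed interviews by all eligible and unknown-eligibility cases. The sketch below implements that formula; the disposition counts plugged in are hypothetical, chosen only to show how contacting difficulties drive rates down:

```python
def aapor_rr1(complete, partial, refusal, noncontact, other,
              unknown_household, unknown_other):
    """AAPOR Response Rate 1: completed interviews divided by the sum of
    all eligible cases (interviews plus non-interviews) and all cases of
    unknown eligibility."""
    denominator = ((complete + partial)
                   + (refusal + noncontact + other)
                   + (unknown_household + unknown_other))
    return complete / denominator

# Hypothetical case dispositions for a telephone poll (assumed numbers).
rate = aapor_rr1(complete=800, partial=50, refusal=2000, noncontact=4000,
                 other=150, unknown_household=2500, unknown_other=500)
print(f"RR1 = {rate:.1%}")  # -> RR1 = 8.0%
```

With these made-up numbers, 10,000 dialed cases yield 800 completes, an RR1 of 8 percent -- single-digit rates of this kind are why weighting has become so central, and why disclosing the response rate matters.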
Pollsters tend to judge each other by their methods, and so should the public. What we have provided here is another example (in the same vein as the 2013 Marist Poll transparency project) of how much more work needs to be done to attain full transparency. Perhaps if we keep asking for more details, things will get better.
To see the more complete results from this project, click here.