I want to pick up where I left off on Tuesday, when I wrote about the way national surveys screen for primary voters. How well have the pollsters in early primary states done in disclosing how tightly they "screen" to identify the voters that will actually turn out to vote (or caucus)? Not very well, unfortunately.
For those just dropping in, here is the basic dilemma: Voter turnout in primary elections, and especially in caucus states like Iowa, is typically much lower than in the general election. A pre-election survey that aims to track and ultimately project the outcome of the "horse-race" -- the measure of voter preferences "if the election were held today" -- needs to represent the population of "likely voters." When the expected turnout is very low, that becomes a difficult task, especially when polling many months before an election.
And in Iowa and South Carolina, if history is a guide, that turnout will be a very small fraction of eligible adults,** as the following table shows:
When a pollster uses a random digit telephone methodology, they begin by randomly sampling adults in all households with landline telephone service. They need to use some mechanism to identify a probable electorate from within a sample of all adults. If recent history is a guide, the probable electorate in Iowa -- Democrats and Republicans -- will fall in the high single digits as a percentage of eligible adults. South Carolina's turnout is better, but is still unlikely to exceed 30% of adults. And while the New Hampshire primary typically draws the highest turnout of any of the presidential primaries, it still attracts less than half of the eligible adults in the state. Despite all the attention the New Hampshire primary receives, many voters who ultimately cast ballots in the November general election (roughly 30% in 2000) choose to skip their state's storied primary.
A pollster may not want to "screen" so that the size of their likely voter universe matches the exact level of turnout. Most campaign pollsters I have worked with prefer to shoot for a slightly more expansive universe, both to capture those genuinely uncertain about whether they will vote and to account for the presumption that "refusals" (those who hang up on their own before answering any questions) are more likely to be non-voters.
Nonetheless, the degree to which pollsters screen matters a great deal. If, hypothetically, one Democratic primary poll captures 10% of eligible adults while another captures 40%, the results could easily be very different (and I'll definitely put more faith in the first).
It also matters greatly how pollsters go about identifying likely voters. I wrote quite a bit about that process in October 2004 as it applies to random digit dial (RDD) surveys of general election voters. In extremely low turnout contests, such as the Iowa caucuses, most campaign pollsters now rely on samples drawn from lists of registered voters that include the vote history of individual voters. Most of the Democratic pollsters I know agree with Mark Mellman, who asserted in a must-read column in The Hill earlier this year that, "the only accurate way to poll the Iowa caucuses starts with the party's voter file."
So, based on the information they routinely release, what do we know about the way the recent polls in Iowa, New Hampshire and South Carolina screened for likely voters? As the many question marks in the tables below show, not much.
The gold star for disclosure goes to the automated pollster SurveyUSA. Of 22 survey organizations active so far in these states, they are the only organization that routinely releases (and makes available on their web site) all of the information necessary to determine how tightly they screen. Every release includes a simple statement like the one from their May poll of New Hampshire voters:
Filtering: 2,000 state of New Hampshire adults were interviewed by SurveyUSA 05/04/07 through 05/06/07. . . Of the 2,000 NH adults, 1,756 were registered to vote. Of them, 551 were identified by SurveyUSA as likely to vote in the Republican NH Primary, 589 were identified by SurveyUSA as likely to vote in the Democratic NH Primary, and were included in this survey.
I did the simple math using the numbers above (which are weighted values). For SurveyUSA's May survey, Democratic likely voters represented 29% of adults and Republican likely voters represented 28%, for a total of 57% of all New Hampshire adults. Their screen is a very reasonable fit for a survey fielded eight months before the primary.
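For readers who want to check the arithmetic, here is a minimal sketch in Python that recomputes those percentages from the weighted counts quoted in the SurveyUSA release. The variable and function names are my own, chosen only for illustration.

```python
# Weighted counts quoted in SurveyUSA's May 2007 New Hampshire release.
adults = 2000       # all adults interviewed
likely_rep = 551    # likely Republican primary voters
likely_dem = 589    # likely Democratic primary voters


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


dem_pct = share(likely_dem, adults)
rep_pct = share(likely_rep, adults)
total_pct = share(likely_dem + likely_rep, adults)

print(f"Democratic likely voters: {dem_pct:.0f}% of adults")
print(f"Republican likely voters: {rep_pct:.0f}% of adults")
print(f"Combined likely voters:   {total_pct:.0f}% of adults")
```

The same three lines of division work for any pollster that releases the number of adults interviewed alongside the number who passed the screen, which is exactly why that disclosure is so useful.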
Honorable mention for disclosure also goes to two Iowa polls. First, the Des Moines Register poll conducted by Selzer and Company. Ann Selzer provided me with very complete information upon request last year. Her first Iowa caucus survey last year used a registered voter list sample and screened to reach a population that represents roughly 11% of the eligible adults (assuming 2.0 million registered voters in Iowa and 2.2 million eligible adults).
Second, the poll conducted in March by the University of Iowa. While their survey asked an open-ended vote question (rendering the results incomparable with those included in our Iowa chart), their release did at least provide the basic numbers concerning their likely voter screen. They interviewed 298 Democratic likely caucus-goers and 178 Republican caucus-goers out of 1,290 "registered Iowa voters" (for an incidence of 37% of registered voters). Unfortunately, they did not specify whether they used a registered voter list or a random digit sample, although given the incidence of registered voters in Iowa, we can assume that the percentage of eligible adults that passed the screen was probably in the low 30s.
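A rough sketch of that conversion, again in Python. It borrows the round figures from the Selzer paragraph above (2.0 million registered voters and 2.2 million eligible adults in Iowa) as an assumption, and it treats the University of Iowa sample as if it represented all registered voters, which the release does not confirm. The names are mine.

```python
# Counts reported in the University of Iowa release.
dem_caucus_goers = 298
rep_caucus_goers = 178
registered_interviewed = 1290

# Assumption: round statewide figures used earlier for the Selzer poll.
registered_in_iowa = 2.0e6
eligible_adults_in_iowa = 2.2e6

# Share of interviewed registered voters who passed the screen.
incidence_among_registered = (
    (dem_caucus_goers + rep_caucus_goers) / registered_interviewed
)

# Scale by the share of eligible adults who are registered.
registered_share = registered_in_iowa / eligible_adults_in_iowa
incidence_among_eligible = incidence_among_registered * registered_share

print(f"Incidence among registered voters: {incidence_among_registered:.0%}")
print(f"Implied incidence among eligible adults: {incidence_among_eligible:.1%}")
```

Under those assumptions the implied figure lands in the low 30s, well above the roughly 11% that the Selzer screen produced.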
And speaking of the sampling frame, only 6 of 22 organizations (SurveyUSA, Des Moines Register/Selzer, Fox News, Rasmussen Reports, Zogby, and Winthrop University) specified the sampling method they used (random digit dial, registration-based sampling, or RBS, or listed telephone directory). I will give honorable mention to two more organizations -- Chernoff Newman/MarketSearch and the partnership of Hamilton Beattie (D) and Ayres McHenry (R) -- that disclosed their sample method to me upon request earlier this year.
The obfuscation of this information by the remaining 14 pollsters is particularly stunning given that the ethical codes of both the American Association for Public Opinion Research (AAPOR) and the National Council on Public Polls (NCPP) explicitly require the disclosure of the sampling method, also known as the sample "frame." The NCPP's principles of disclosure require the following of its member organizations for "all reports of survey findings issued for public release:"
Sampling method employed (for example, random-digit dialed telephone sample, list-based telephone sample, area probability sample, probability mail sample, other probability sample, opt-in internet panel, non-probability convenience sample, use of any oversampling).
The AAPOR code mandates disclosure of:
A definition of the population under study, and a description of the sampling frame used to identify this population.
Finally, while virtually all of these surveys told us how many "likely primary voters" they selected, very few provided details on how they determined that voters (or caucus-goers) were in fact "likely" to participate. The most notable exceptions were the Hamilton Beattie (D)/Ayres McHenry (R) and Chernoff Newman/MarketSearch polls in South Carolina, and the News 7/Suffolk University poll in New Hampshire. All of these included the questions used to screen for likely primary voters in the "filled-in" questionnaires that included full results.
So what should an educated poll consumer do? I have one more category of diagnostic questions to review, and then I want to propose something we might be able to do about the very limited methodological information available to us. For now, here's a two-word hint of what I have in mind: "upon request."
**Political scientists typically use two statistics to calculate turnout among adults: all adults of voting age (also known as the voting age population or VAP), or all adults who are eligible to vote (the voting-eligible population or VEP). George Mason University Professor Michael McDonald has helped popularize VEP as a better way to calculate voter turnout, because it excludes adults ineligible for voting such as non-citizens and ineligible felons. The perfect statistic for comparison to telephone surveys of adults would fall somewhere in between, because adult telephone samples do not reach those living in institutions or who do not speak English, but might still include non-citizens who speak English (or Spanish where pollsters use bilingual interviewers).
In a state like California, with a large non-citizen population, VAP is probably the better statistic for comparisons to the way polls screen for likely voters. In Iowa, New Hampshire and South Carolina, however, the choice has very little impact. Had I used VAP rather than VEP above, the turnout statistics in the table would have been roughly half a percentage point lower.
CORRECTION: Due to an error in my spreadsheet, the original version of the turnout table above incorrectly displayed turnout as a percentage of VAP rather than VEP. For reference, the table below has turnout as a percentage of VAP.
Follow Mark Blumenthal on Twitter: www.twitter.com/MysteryPollster