Two recent polls provide case studies in how pollsters determine the demographics of "likely voters," especially the gender breakdown. The answer is not as simple as you might imagine, although when it comes to gender, some public pollsters show a surprising reluctance to adjust likely voter samples that produce highly implausible percentages of women.
Most of us understand that polls aim to capture a snapshot of attitudes at the moment they are taken, "as if the election were held today." On that much, most pollsters agree. But my pollster colleagues tend to disagree about how much they are willing to assume about who will vote. Media pollsters generally prefer to use procedures that select "likely voters" based on respondents' reports of their likelihood to vote, past voting behavior and interest in the campaign, while allowing the demographics of the resulting likely voter sub-sample to vary from poll to poll. Internal campaign pollsters are more willing to estimate some of the demographics of likely voters and adjust their samples accordingly to keep the demographic composition reasonably consistent.
Given that background, consider our first case study, a recent survey of Michigan voters conducted by EPIC/MRA for the Detroit News and several local television stations. The survey gave Democratic governor Jennifer Granholm an eight-point lead over Republican challenger Dick DeVos (50% to 42%), but also reported a gender composition of 57% female. Some Republicans cried foul, so the National Journal's Hotline (subscription only) contacted EPIC/MRA pollster Ed Sarpolous for comment:
The poll is a great example of the science of weighting polls, says EPIC/MRA's Ed Sarpolous. He explains that, when conducting a poll for a media client, the client has two options: They can take a snapshot of the race as it stands at that moment in time, or they can choose to guess what the electorate will do come Election Day. The difference is all in how the pollster weights the polls.
Here I have to stop. In my experience, pollsters rarely try to "guess what the electorate will do," although we may sometimes make an educated guess about who the electorate will be as described above. It is puzzling that Sarpolous would describe his weighting procedure in these terms, although the specifics of the rest of his explanation have much more to do with the demographic composition of his sample than its vote preference. The Hotline report continues:
The survey Sarpolous conducted, though, was unweighted. The [unweighted] "snapshot," as he calls it, allows his clients to take a look at the state of the race today. In this case, he says, men -- especially Republican men -- weren't making it through his screens of likely voters. That is, they were telling his interviewers that they were unlikely to vote. That made the unweighted sample of likely voters overwhelmingly female.
Sarpolous says his clients had to answer a question when deciding how to weight their poll -- or leave it alone: "Are you looking to write a story about what's happening today, or what's going to happen in 55 days?"
Two things are odd about this explanation. The first is that most media pollsters begin with a sample of all adults, and weight that adult sample to match the highly reliable demographic estimates from the U.S. Census. They then select a pool of registered or likely voters from the larger adult sample, allowing the demographics of the sub-sample to vary.** It would be very unusual if the EPIC/MRA survey did no demographic weighting at all, but it is unclear from the explanation above.
Second, while estimating the composition of voters in terms of age or race can be difficult, we do have reasonably consistent estimates of gender. One such estimate comes from the Current Population Survey (CPS) of the U.S. Census. The CPS is also a survey, of course, and their voter estimates are based on self-reported voting behavior. However, the CPS is based on a very large initial sample (60,000+ households nationally each month) with a very high response rate (90%+). The following table shows the CPS estimates by state for 1998 and 2002 (kindly provided by Professor Michael McDonald of George Mason University, who has been analyzing CPS estimates of voter demographics for an upcoming journal article):
A note of caution: The CPS voter sample sizes in many states are well under 1,000, and as such, the estimates no doubt include much random variation due to sampling error. However, even with the variation, CPS reported very few states with a gender composition of 57% or higher in either year (the District of Columbia and Delaware in 2002, D.C. and Mississippi in 1998). The higher percentage of women in places like DC and Mississippi owes to a greater percentage of African American voters. CPS has shown a large and persistent gender gap in turnout among African Americans (see the report on differences in turnout by race in the 2002 CPS, Table B).
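To get a feel for how much random variation a state sample of a given size can carry, here is a rough sketch of the textbook 95% margin of error for a proportion. It assumes simple random sampling, which the CPS does not use (its more complex design would typically widen the margin somewhat), and the sample sizes below are hypothetical illustrations, not actual CPS state counts.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p,
    assuming a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical state voter-sample sizes (NOT actual CPS counts).
for n in (400, 700, 1000):
    moe = margin_of_error(0.53, n)
    print(f"n={n}: 53% female, plus or minus {moe * 100:.1f} points")
```

Under these assumptions, a state with only a few hundred sampled voters could easily land several points away from its true percentage of women by chance alone.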
Michigan had a larger sample size in both years (n=1,437 in 2002). The gender percentage there was reasonably consistent -- 53.0% in 1998, 53.4% in 2002. The CPS is not the only source. A pollster might also consider the results of past exit polls and the gender statistics available from the lists of past voters. Except for urban areas and states like Mississippi, the gender composition of voters rarely deviates more than a percentage point or two from 52-53% female. Speaking from personal experience, most campaign pollsters would weight "likely voter" data for that state to 53% female.
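For readers curious about the mechanics, here is a minimal sketch of a simple gender weight: each group's weight is its target share divided by its share of the sample. The 57% figure is the EPIC/MRA sample composition reported above; the 53% target is my assumption based on the CPS estimates for Michigan, and the function name is purely illustrative. Real weighting schemes usually adjust on several variables at once.

```python
def poststratification_weights(sample_shares, target_shares):
    """Weight for each group = target share / observed sample share."""
    return {g: target_shares[g] / sample_shares[g] for g in sample_shares}

# Gender mix of the likely-voter sample as reported (57% female).
sample = {"female": 0.57, "male": 0.43}
# Assumed target, in line with the CPS estimates for Michigan.
target = {"female": 0.53, "male": 0.47}

weights = poststratification_weights(sample, target)
for group, w in weights.items():
    print(f"{group}: weight {w:.3f}")
```

Each woman in the sample would count for slightly less than one interview and each man for slightly more, so that the weighted sample matches the target.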
The Hotline story went on to say that after "the public outcry," Sarpolous went back and weighted his results by gender to reflect a close balance of men and women.
As it turns out, men were nearly as likely as women to favor DeVos and, when weighted evenly to predict Election Day turnout, the results ended up the same.
That may be, though I am still puzzled why the pollster would not have conducted this sort of analysis before releasing the initial results.
The gender composition of our second case study was probably more consequential. A survey of Indiana's 8th Congressional District conducted recently for the Evansville Courier & Press by Indiana State University showed Democratic challenger Brad Ellsworth leading incumbent Republican John Hostettler by a surprising 15-point margin (47.4% to 31.8%). The Courier & Press reported that 63.5% of the 603 interviews conducted among registered voters were women. Most campaign pollsters would probably agree with the assessment of Republican pollster Bill Cullo, who called the gender mix "unprecedented" yesterday on Crosstabs.org.
Another recent automated survey on Indiana-08 conducted by the Majority Watch project (RT Strategies and Constituent Dynamics) weighted their results to 48% male, 52% female. They also showed a significant gender gap in voter preferences. Their survey had Hostettler leading by 14 points among men (56% to 42%) but trailing by a whopping 24 points among women (36% to 60%).
What were the results of the Courier & Press survey by gender? The initial poll story includes no tabulations by gender. However, given the high proportion of women in their sample, they owe their readers some indication of how this very unusual gender mix may have affected the results.
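As a purely illustrative sketch of why the gender mix matters, the arithmetic below takes the Majority Watch preferences by gender and recomputes the overall margin under two different percentages of women. To be clear about the assumptions: the Courier & Press results by gender are unknown, so borrowing another poll's subgroup numbers is only a thought experiment, and the function name is mine.

```python
# Majority Watch preferences by gender, as reported above (percent).
prefs = {
    "male":   {"Hostettler": 56, "Ellsworth": 42},
    "female": {"Hostettler": 36, "Ellsworth": 60},
}

def ellsworth_margin(pct_female):
    """Ellsworth's lead in points for a given percent female,
    holding preferences within each gender fixed."""
    share = {"female": pct_female / 100, "male": 1 - pct_female / 100}
    totals = {
        cand: sum(share[g] * prefs[g][cand] for g in share)
        for cand in ("Hostettler", "Ellsworth")
    }
    return totals["Ellsworth"] - totals["Hostettler"]

# 52% is the Majority Watch weighting target; 63.5% is the Courier & Press sample.
for pct in (52.0, 63.5):
    print(f"{pct}% female: Ellsworth {ellsworth_margin(pct):+.1f} points")
```

Under these assumptions, moving from 52% to 63.5% female widens the Ellsworth lead by roughly four points, which is the kind of effect a tabulation by gender would let readers judge for themselves.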
**CLARIFICATION: In using the word "pool" above I did not mean to imply that the process of selecting registered or likely voters involves a second round of random sampling. Pollsters simply select the subgroup of interest (self-identified registered voters, or voters that they classify as "likely") from the larger sample. The process is analogous to selecting any other subgroup (women, 18-30 year olds, union members, etc.).
Weighting or adjusting a sample of all adults by demographics like gender and age is not controversial among media and political pollsters, because, as I wrote above, we can base those adjustments on highly reliable U.S. Census estimates of the adult population. The practice of separately weighting the subgroup of registered or likely voters is more controversial, because the demographics of those populations vary slightly from election to election, and estimates are less reliable.
Finally, the Courier & Press survey was conducted by the Sociology Research Lab at Indiana State University. The original version of this post identified it incorrectly as Indiana University.
Follow Mark Blumenthal on Twitter: www.twitter.com/MysteryPollster