Following up on yesterday's column and my additional comments on the Huffington Post's requests for response rate data from pollsters last fall, I want to provide a little more of a user's guide to response rates, with a focus on how hard it can be to (a) calculate a response rate and (b) make valid comparisons across pollsters.
Generally speaking, the response rate alone does not tell us very much and, as such, is a poor indicator of the overall quality of the survey. That's one point that Evans Witt, the president of the National Council on Public Polls (NCPP), made in a comment worth reading on my post yesterday:
NCPP does not believe any single number is the perfect guideline to judge a poll. That is why NCPP calls for the release of a substantial amount of information when a poll is the subject of public debate. With all the required information in hand, the informed consumer can judge a given poll and evaluate it against other surveys.
My column referenced my 2008 interview on the subject with Robert Groves, then a University of Michigan professor, now director of the U.S. Census. For those not familiar with his career, Groves is one of the most widely respected authorities on survey methodology and non-response bias in surveys. I asked him whether we should consider it a problem that political surveys have response rates at or below 20%. His answer:
The key to answering that question is to determine whether the non-respondents are different from the respondents. What we do know from about ten years of research around the world is that [the response] rate, that 20% you cited, isn't itself informative to that answer. We don't know what a 20% response rate means with regard to the difference between respondents and non-respondents.
We do know, secondly, that in a single survey, some estimates are subject to large non-response biases -- that is, the respondents are really quite different from the non-respondents -- and others in the same survey are subject to no bias. So if you just know the response rate, you can't answer the question.
As always, knowing something about the non-respondents is hard, since we don't interview them. Groves goes on to talk about the importance of including "auxiliary variables" on the sample as a way to "get a purchase on an answer" of how respondents differ from non-respondents. For more detail, listen to the full interview or see his (free) article in the 2006 Public Opinion Quarterly special edition on non-response bias.
Next, on the difficulty of calculating a response rate, here's the short version: In the late 1990s, the American Association for Public Opinion Research (AAPOR) made an effort to standardize the computation of response rates. The document they published -- Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys -- is now 50 pages long. It includes six different ways to calculate a response rate, four ways to calculate a cooperation rate and three ways to calculate a refusal rate.
Now here's the long version (and if it gets too hopelessly wonky, just skip to the paragraph that starts "Let's go back to the Huffpollstrology feature"):
Why so many formulas? The underlying idea is not complicated. Suppose, hypothetically, you have a perfect list of some population you want to sample from and you draw a simple random sample of 1,000 names from that list and then attempt to interview everyone on the sampled list. The response rate would be the number of respondents to the survey divided by the 1,000 names sampled.
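In that idealized case the arithmetic is trivial. A minimal sketch (the sample size and respondent count here are hypothetical, matching the example above):

```python
def simple_response_rate(respondents: int, sampled: int) -> float:
    """Response rate for a perfect-list sample: completes divided by names drawn."""
    return respondents / sampled

# Hypothetical: 1,000 names drawn, 200 completed interviews.
print(simple_response_rate(200, 1000))  # prints 0.2, i.e. a 20% response rate
```

Everything that follows is about why real telephone surveys can't use this one-line version.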
In the real world of telephone surveys, however, this calculation gets a lot more complicated. First, should the numerator include partial interviews -- those that begin the survey but do not complete it? And where should the pollster draw the line between those that refuse to participate but answer a question or two before hanging up, and those that are otherwise cooperative but have to get off the phone before the interview is complete? The answer depends, in part, on what the pollster does with data gathered from partial interviews. If they throw out all partial interviews, the answer is simple: Exclude them from the numerator (which makes the response rate lower). If they use partial data in their results, the decision of how to calculate the response rate gets a lot more complicated (see pp. 12-13 of the AAPOR Standard Definitions).
The denominator of the formula is even more complicated, especially for telephone surveys that use randomly generated telephone numbers, the so-called random digit dial (RDD) methodology. In any RDD sample, some non-trivial percentage of the randomly generated numbers are non-working but will -- due to the vagaries of telephony -- either produce a busy signal or ring endlessly the way a working phone would if no one answered. Some unknown percentage of the "no answers" are business numbers that can be excluded from the response rate calculation because they are not eligible for the survey.
There is one very accurate way to determine which numbers are working or otherwise eligible for the survey: Dial each number over and over for a period of months, not days or weeks. Eventually, you end up identifying 99% or more of the working numbers. But virtually all political opinion surveys call for just a few days, so for most of the polls we care about, the pollster is left with some sampled "no answer" numbers whose status is uncertain. The AAPOR Standards document resolves this issue by allowing for three different calculations: A response rate that includes all of the mystery "no-answer" numbers (and is thus lower than the true number), a response rate that includes none of the mystery numbers (and is thus higher than the true number), and a response rate that involves an estimate of the percentage of eligible numbers from among the "no answers" (and -- surprise, surprise -- pollsters differ on the best way to estimate this percentage).
Put these variables together and you get six different official AAPOR response rates, labeled as RR1 through RR6 in the table below:
The main point, if you're having trouble following all the detail, is that AAPOR's Response Rate #1 (RR1) is the most conservative way to calculate the rate (i.e. it produces the lowest response rate, all else being equal) and RR6 is the most "liberal" (i.e. produces the highest rate). The two rates that involve estimates of the eligible "no answers," RR3 and RR4, usually produce rates somewhere in the middle.
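For the wonks, the six formulas can be sketched in code. This is a simplified rendering of the AAPOR Standard Definitions: I = complete interviews, P = partials, R = refusals, NC = non-contacts, O = other eligible non-interviews, U = cases of unknown eligibility (the mystery "no answers"), and e = the estimated share of unknown cases that are actually eligible. The disposition counts in the example are invented for illustration:

```python
def aapor_response_rates(I, P, R, NC, O, U, e):
    """Sketch of AAPOR Standard Definitions response rates RR1-RR6.

    I: complete interviews; P: partial interviews; R: refusals;
    NC: non-contacts; O: other eligible non-interviews;
    U: cases of unknown eligibility (unresolved "no answer" numbers);
    e: estimated proportion of unknown cases that are eligible.
    """
    known_eligible = I + P + R + NC + O
    return {
        "RR1": I / (known_eligible + U),        # all unknowns treated as eligible
        "RR2": (I + P) / (known_eligible + U),
        "RR3": I / (known_eligible + e * U),    # only estimated-eligible unknowns
        "RR4": (I + P) / (known_eligible + e * U),
        "RR5": I / known_eligible,              # unknowns excluded entirely
        "RR6": (I + P) / known_eligible,
    }

# Hypothetical dispositions: 250 completes, 50 partials, 400 refusals,
# 200 non-contacts, 50 other, 300 unknown-status numbers, e = 0.4.
rates = aapor_response_rates(250, 50, 400, 200, 50, 300, 0.4)
# RR1 comes out lowest, RR6 highest; RR3 and RR4 land in between,
# which is the pattern described above.
```

Note how the partial-interview question shows up as the choice between the odd-numbered rates (completes only) and the even-numbered ones (completes plus partials), while the "no answer" question shows up entirely in the denominator.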
Now, let's consider another reason why the response rate alone is a poor indicator of survey quality. The overall response rate is a combination of two things: how many sampled units the pollster is able to contact (the contact rate) and how many of those live human beings are willing to be interviewed once contacted (the cooperation rate, or its converse, the refusal rate). Pollsters know how to boost the response rate. That's easy. Just dial over and over again for a period of weeks. But how useful is a political campaign survey with a field period of a month or more? So focusing on the response rate can be deceiving.
Also consider some of the other methodological differences that might cause a response rate to go higher or lower, as noted by the ABC News summary of its methodology and standards:
It cannot be assumed that a higher response rate in and of itself ensures greater data integrity. By including business-listed numbers, for instance, ABC News increases coverage, yet decreases contact rates (and therefore overall response rates). Adding cell-only phones also increases coverage but lessens response rates. On the other hand, surveys that, for instance, do no within-household selection, or use listed-only samples, will increase their cooperation or contact rates (and therefore response rates), but at the expense of random selection or population coverage. (For a summary see Langer, 2003, Public Perspective, May/June: 16-8.)
Let's go back to the Huffpollstrology feature. In response to the Huffington Post requests, some pollsters specified the AAPOR formula they used, some did not. But even if all used the same response rate formula, we would still get variation in the rates depending on how each pollster drew their sample, how they selected individuals within each household, how long they stayed in the field and how they interpreted AAPOR's guidelines for coding the calls they made. Consider also that pollsters that use an automated method have much less ability to "resolve" the status of the calls they make. They can determine whether a human being answers the phone, but know little about why some choose to hang up.
So trying to make comparisons across pollsters is frustrating at best. When response rates are generally in the same range, they are also a lousy way of trying to tell "good" pollsters from "bad."
Finally, consider the comment I received via email from Ann Selzer, the Iowa-based pollster best known for conducting the Des Moines Register's Iowa Poll:
If low response rates were a big problem, no pollster could consistently match election outcomes. In the end, we have a good test of what matters more and what matters less. How one defines likely voters is much more important than the current (albeit seemingly low) response rates.
What pollsters do to minimize the potential for response bias, and how that intersects with how they select likely voters, is something I'm going to take up in subsequent posts.