I have read Yeager and Krosnick's recent, well-researched essay on this subject with great interest. It was written in response to my comments (of October 26) on their paper, posted in August 2009, comparing the accuracy of RDD telephone surveys and Internet surveys conducted with probability and non-probability samples.
In their new essay Yeager and Krosnick provide evidence to refute my two criticisms of their original paper.
My first criticism was that the data they presented, even if completely accurate, did not show that the "RDD telephone data was consistently more accurate than the non-probability surveys." Yeager and Krosnick agree with me that Harris Interactive's data points are closer to the benchmarks on two of the six items they used, by 2.64 and 0.56 percentage points. They argue that the word "consistently" was justified because these differences are small (and they are). So this is really a question of semantics. The Oxford English Dictionary defines "consistently" as "uniformly, with persistent uniformity." If the RDD sample had produced more accurate data on six out of the six variables, that would be consistently more accurate; four out of six is not.
Social Desirability Bias
Yeager and Krosnick agree with me that "Internet surveys are less subject to social desirability bias than are surveys involving live interviewers," and provide some useful references to support this conclusion. However, they argue that "the measures of smoking and drinking we examined were not contaminated by social desirability bias."
Smoking and Drinking
The authors provide several hypotheses, other than social desirability bias, that might explain why Harris Interactive's online surveys found more drinkers and smokers than the benchmark survey and the RDD survey, both involving live interviewers. For example, they suggest that "perhaps the people who agreed to participate in the opt-in Harris Interactive Internet surveys generally possessed the studied undesirable attributes at higher rates than did respondents to the RDD sample." This is possible, of course, just as it is possible that Harris Interactive's online respondents are much more likely to be gay or lesbian, and less likely to give money to charity, clean their teeth, believe in God, go to religious services, exercise regularly, abstain from alcohol, and drive under the speed limit. However, this hypothesis sounds very much like the argument used by the tobacco industry for thirty years or more that the correlation between smoking and lung cancer could be because those prone to this disease were more likely to smoke.
Yeager and Krosnick also address the evidence I quoted from the Federal government's NHANES survey, which found that, based on blood samples, more people had apparently smoked than admitted to smoking cigarettes when they were interviewed. The authors present several hypotheses to explain this difference, all of which may be true but none of which are proven. It is surely true, as they suggest, that part of the increase is due to people using tobacco in ways other than smoking cigarettes. But they also argue that the data from the blood samples cannot be used as a check on respondents' answers because for most respondents there was a gap of "between two and nine weeks" between the interview and the drawing of the blood sample, and smoking behavior may have changed during this time. If so, this would represent a big increase in the number of smokers over a short time, and if this trend continued it would rapidly increase the number of adult smokers, which has not happened.
As I suggested at the beginning, I am impressed by Yeager and Krosnick's research on the literature on this topic. Furthermore, I concede that I have not proved that social desirability bias is the only possible explanation for the differences between our online survey data and the live interviewer surveys on smoking and drinking (including our own). However, Yeager and Krosnick have not proved my hypothesis is wrong and their explanations for these differences are also hypothetical and, I submit, less plausible.
The 7 "secondary demographics"
This was not part of my argument about "were the benchmarks wrong?" but was in the original paper by Yeager and Krosnick and was referenced again in the authors' reply, so a few comments may be useful. The seven variables were picked by the authors from a long list that they might have used. Had they chosen other variables, the results might have told a different story, but we do not have those data. The average errors involved were modest (3.0 and 1.7 percentage points respectively), and the differences between the two samples were small. One of the seven variables was the number of adults in the household, a variable for which Harris normally weights; I am not sure why it was not weighted in this survey. By far the biggest error in the Harris survey was for people in households with incomes of between $50,000 and $60,000 (why that particular bracket and not others?). Replies to questions about incomes are notoriously unreliable, and here again social desirability bias may well be at work.
One other thing
At the risk of extending this dialogue, there is one other important point that should be made about the research on which Yeager and Krosnick have based their paper and their conclusions.
They reported that the RDD telephone survey used in these comparisons was very different from the typical telephone surveys behind any of the published polls. It was in the field for six months, non-respondents were offered a $10 incentive to participate, and it achieved a 35.6% response rate. In other words, the sample was presumably much better than the samples used in all the published telephone polls, which do not pay incentives, are usually in the field for only a few days, and achieve much lower response rates. Even if the RDD survey used by the authors had been more accurate than our online poll (which, of course, I dispute), it would say nothing about the accuracy of the RDD telephone polls published in the media.