All polls are non-probability to varying degrees

A new partnership between The Washington Post and SurveyMonkey brings nonprobability polling further into the spotlight.

09/07/2016 10:55 am ET | Updated Sep 07, 2016

The Washington Post used SurveyMonkey to survey 74,886 registered voters in all 50 states on who they would vote for in the upcoming election. I am very excited about the work, because I am a huge proponent of advancing polling methodology, but the methodological explanation and data detail raise some serious questions.

The Washington Post explanation conflates method and mode: the mode was online (versus telephone) and the method was a random sample of SurveyMonkey users with raked demographics (versus probability-based sampling with raked demographics). So, the first key difference is that it is online versus telephone. The second key difference is that the sample is random among SurveyMonkey users rather than among all telephone users. A third possible difference, the analytics, is not employed: despite the different mode and selection criteria, the Washington Post/SurveyMonkey team used traditional analytics. This poll is more like traditional polls than the Washington Post admits, in both its strengths and weaknesses.

Both online and telephone polls have limitations in who can be reached, but they have very similar coverage. As of September 2015, 89% of US adults were online in some context. Between cell phones and landlines, most people are reachable by telephone as well. About 90% have cell phones, and about half are cell phone only. But that is actually the problem: a confusing number of people have both cell phones and landlines, and many cell phone owners no longer live near the area code of their phone. So, while telephones may be able to reach slightly more American adults, that advantage is rapidly diminishing, and US adults without any internet access are very unlikely voters.

The bigger limitation is that the survey only reaches SurveyMonkey users, rather than any possible online users. I do not know the limitations of SurveyMonkey’s reach, but the Washington Post article notes that they are drawing from about three million daily active users. Over the course of the 24-day study, a non-trivial cross-section of US adults may have interacted with SurveyMonkey. While I have no way of knowing how it is biased versus the general population of online users, I assume it is relatively good at covering a cross-section of genders, ages, races, incomes, education levels, and geography.

So, while the Washington Post is right in saying this sample is non-probability, in that we do not know the probability of any voter answering the survey, the same is true of the traditional phone method. We do not know the probability of any given person being excluded from being called, especially with shifting cell phone and landline coverage. On a personal note, I do not get called by traditional polls because my cell phone area code is from where my parents lived when I got my first cell phone 15 years ago. And we do not know all of the dimensions that drive nonresponse among the people called (somewhere between 1% and 10% of people answer the phone). In short, both methods are non-probability.

What is disappointing to me is that the Washington Post/SurveyMonkey team then employed an analytical method that is optimized for probability-based telephone surveys: raking. Raking means matching the marginal demographics of the respondents to the Census on age, race, sex, education, and region. With 74,886 respondents, and a goal to provide state-level results, the team should use modeling and post-stratification, commonly implemented as multilevel regression and post-stratification (MRP). MRP employs all of the respondents to create an estimate for any sub-group. It draws on the idea that white men from Kansas can help estimate how white men from Arkansas, or white people in general from New York, may vote. It is a powerful tool for non-probability surveys (regardless of the mode or method).
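To make the raking step concrete, here is a minimal sketch of iterative proportional fitting in Python. It is not the Washington Post/SurveyMonkey team's code: the function names `rake` and `weighted_share`, the eight toy respondents, and the target shares are all hypothetical and chosen only for illustration. It uses two variables rather than the five listed above, and the shares are not Census figures.

```python
def rake(respondents, targets, max_iter=100, tol=1e-9):
    """Adjust weights until each variable's weighted margins match its targets.

    respondents: list of dicts, e.g. {"sex": "F", "educ": "college"}
    targets: {variable: {category: population share}}, shares summing to 1
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {}
            for w, r in zip(weights, respondents):
                current[r[var]] = current.get(r[var], 0.0) + w
            # Scale every respondent so this variable's margins hit the target.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * total / current[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # a full pass changed nothing: margins all match
            break
    return weights


def weighted_share(respondents, weights, var, category):
    """Weighted share of respondents falling in one category."""
    hit = sum(w for w, r in zip(weights, respondents) if r[var] == category)
    return hit / sum(weights)


# Hypothetical toy sample: too many men and too many college graduates.
respondents = [
    {"sex": "F", "educ": "college"},
    {"sex": "F", "educ": "college"},
    {"sex": "F", "educ": "no_college"},
    {"sex": "M", "educ": "college"},
    {"sex": "M", "educ": "college"},
    {"sex": "M", "educ": "college"},
    {"sex": "M", "educ": "no_college"},
    {"sex": "M", "educ": "no_college"},
]
# Hypothetical population margins (not real Census numbers).
targets = {
    "sex": {"F": 0.5, "M": 0.5},
    "educ": {"college": 0.3, "no_college": 0.7},
}

weights = rake(respondents, targets)
for var, shares in targets.items():
    for category in shares:
        print(var, category, round(weighted_share(respondents, weights, var, category), 4))
```

Note that raking only matches each margin separately. It does nothing to correct the joint distribution (say, college-educated women in one state), which is the gap that modeling and post-stratification is meant to fill.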

The team did break from tradition and weighted on party identification in Colorado, Florida, Georgia, Ohio, and Texas. Partisan nonresponse is a big problem, and party identification should be used both to stabilize the ups and downs of any given poll and to create a more accurate forecast of actual voting. But it should never be employed selectively within a single survey!

The Washington Post/SurveyMonkey team notes that “The Post-SurveyMonkey poll employed a ‘non-probability’ sample of respondents. While standard Washington Post surveys draw random samples of cellular and landline users to ensure every voter has a chance of being selected, the probability of any given voter being invited to a SurveyMonkey is unknown, and those who do not use the platform do not have a chance of being selected. A margin of sampling error is not calculated for SurveyMonkey results, since this is a statistical property only applicable to randomly sampled surveys.” As noted above, the true probability is never known for either the new SurveyMonkey survey or any of the Washington Post’s traditional polls. Empirically, the true margin of error for traditional polls is about twice as large as the stated margin of error, because the standard method for computing margin of error ignores coverage error (i.e., who could be included), nonresponse error (i.e., who actually answers), measurement error (i.e., how the poll itself shapes answers), and specification error (i.e., whether the poll answers the right question).
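For a sense of scale, here is a small Python sketch of the usual stated margin of error, which reflects sampling error only, next to the roughly doubled figure mentioned above. The function name `stated_margin_of_error`, the sample size of 1,000, and the use of the textbook simple-random-sample formula at 95% confidence are my assumptions for illustration. The factor of two is the rough empirical rule from the paragraph above, not something the formula produces.

```python
import math


def stated_margin_of_error(n, p=0.5, z=1.96):
    """Textbook margin of error for a proportion under simple random sampling.

    Accounts for sampling error only; coverage, nonresponse, measurement,
    and specification error are all ignored.
    """
    return z * math.sqrt(p * (1 - p) / n)


n = 1000  # hypothetical sample size for a typical poll
stated = stated_margin_of_error(n)
# Rough empirical rule from the text: total error is about twice the stated figure.
empirical = 2 * stated

print(f"stated:    +/- {100 * stated:.1f} points")
print(f"empirical: +/- {100 * empirical:.1f} points (about double)")
```

Under these assumptions a poll advertised at roughly plus or minus 3 points behaves more like one at plus or minus 6.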

In short, this poll is just like every other poll: it is non-probability and uses raking to overcome some of the sampling issues. Kudos to the Washington Post for endorsing this work. I look forward to better acceptance of this kind of data collection and to a drive to improve the analytical methods that can take it to the next level of accuracy and depth.