09/10/2009 03:19 pm ET | Updated May 25, 2011

Model-Based Inference

In a recent column citing a study by Krosnick et al. that "Finds Trouble for Opt-in Internet Surveys" (the same study that Doug Rivers responded to here on Tuesday), ABC News polling director Gary Langer re-issued an earlier challenge to "hit me" with a "reasonable theoretical justification" for opt-in Internet polling: "I welcome any coherent theoretical defense of the use of convenience samples in estimating population values; it's a debate we need to have."

Today, Stanford University political science professor Simon Jackman took a shot at an answer:

Try this: model-based inference is an idea that has been around for a long time, and contrasts quite markedly with design-based inference for data generated by surveys. There is plenty written on this, but I'd suggest starting with a reasonably accessible book on sampling, like Sharon Lohr's Sampling: Design and Analysis. Model-based inference for survey data is discussed in various places, typically in a "starred section" in each chapter (e.g., here's how we can do design of and inference for cluster sampling from the model-based perspective, etc). The references provided by Lohr include important works by Basu and Royall etc. See also the delightful book called Combined Survey Sampling Inference by Ken Brewer -- if you can get your hands on it. Doug Rivers pointed me to this book a year or two ago and it is a treat (as these things go).

As I've said before, as soon as non-response enters the picture we're relying on models (e.g., what variables to use when weighting for non-response) and the "purity" of randomization in the sampling design is starting to fall by the wayside.

Jackman goes on to note that "we've been making use of model-based ideas for decades (e.g., weighting to correct for non-response)." I'll second that. So why is it that pre-election telephone surveys that cut all sorts of methodological corners appear to predict election outcomes as well as those that apply the accepted best practices of what Jackman calls "design-based inference"? It surely has something to do with the "modeling" they apply via survey weights. As users of the data, we need to know more about how those models work and, in keeping with the underlying premise of the Krosnick study, more about the accuracy of the data they produce.
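
For readers who want to see what "modeling via survey weights" can look like in practice, here is a minimal sketch of one common approach, post-stratification weighting. It is an illustration only, not the procedure used by any pollster mentioned above. The group labels, responses, and population shares are all made up, and the function names are hypothetical. The model assumption hiding inside it is that, within each group, the people who responded resemble the people who did not.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Give each respondent a weight so weighted group shares match the population.

    sample_groups: list of group labels, one per respondent (hypothetical data).
    population_shares: dict mapping group label -> known population share.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # weight = (population share of the group) / (sample share of the group)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of the responses."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up sample: younger respondents are under-represented (2 of 10),
# although we assume they are half of the target population.
groups = ["young"] * 2 + ["old"] * 8
supports = [1, 1] + [1, 0, 0, 0, 0, 0, 0, 0]  # 1 = supports the candidate
population_shares = {"young": 0.5, "old": 0.5}  # assumed known, e.g. from a census

weights = poststratification_weights(groups, population_shares)
raw_estimate = sum(supports) / len(supports)
adjusted_estimate = weighted_mean(supports, weights)

print(f"unweighted: {raw_estimate:.4f}")
print(f"weighted:   {adjusted_estimate:.4f}")
```

In this toy sample the unweighted estimate is 30 percent support, while the weighted estimate is about 56 percent. The adjustment is only as good as the assumption behind it, which is exactly why users of the data need to know which variables go into the weights.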