08/13/2008 02:26 pm ET Updated May 25, 2011

How We Choose Polls to Plot: Part III

In the first two installments of this online dialogue, I asked a question we have heard from readers: why do we choose the results for "likely voters" (LVs) over "registered voters" (RVs) when pollsters release both? Charles answered by explaining the rationale behind our "fixed rule" for these situations (this is the gist):

That rule for election horse races is "take the sample that is most likely to vote" as determined by the pollster that conducted the survey. If the pollster was content to just survey adults, then so be it. That was their call. If they were content with registered voters, again use that. But if they offer more than one result, use the one that is intended to best represent the electorate. That is likely voters, when available.

Despite my own doubts, I'm convinced by the rule for this reason: I can't come up with a better one. Yes, we could arbitrarily choose RVs over LVs until some specified date, but that would leave us still plotting numbers from pollsters that only release LV samples. And on which date do we suddenly start using the LV numbers? After the conventions? After October 1? What makes sense to me about our rule is that in almost all cases (see the prior posts for examples) it defers to the judgement of the pollster.
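For readers who think in code, the rule can be sketched in a few lines of Python. This is only an illustration of the logic as Charles described it, not the code behind our charts: the function name `choose_result`, the population labels ("LV", "RV", "A" for adults) and the sample numbers are all hypothetical.

```python
# Preference order: the population most likely to vote comes first.
# Labels are hypothetical shorthand: likely voters, registered voters, adults.
PREFERENCE = ("LV", "RV", "A")


def choose_result(results):
    """Pick one result from a single poll.

    `results` maps a population label to whatever number the pollster
    released for it (for example, a candidate margin in points).
    Returns (label, value) for the most-likely-to-vote population
    that the pollster actually released.
    """
    for population in PREFERENCE:
        if population in results:
            return population, results[population]
    raise ValueError("poll has no result for a recognized population")


# Made-up numbers, purely for illustration.
print(choose_result({"RV": 3, "LV": 1}))  # pollster released both: use LV
print(choose_result({"RV": 3}))           # RV only: use RV
print(choose_result({"A": 5}))            # adults only: use adults
```

Note that the sketch never second-guesses the pollster: it only chooses among the results the pollster decided to release, which is the point of the rule.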

Several readers posed good questions in the comments on the last post. Let me tackle a few. Amit ("Systematic Error") asked how likely voter samples are constructed and whether we might be able to plot results by "a family of LV screens (say, LV_soft, LV_medium, LV_hard)" and allow readers to judge the effect.

I wrote quite a bit back in 2004 about how likely voter screens are created, and I posted a shorter version focusing on the Gallup model two weeks ago. One big obstacle to Amit's suggestion is that few pollsters provide enough information about how they model likely voters (and how that modeling changes over the course of the election cycle) to allow for such a categorization.

"Independent" raised a related issue:

Looking at the plot, it appears that Likely Voters show the highest variability as a function of time, while Registered Voters show the least. Is there some reason why LVs should be more volatile than RVs? If not, shouldn't one suspect that the higher variability of the LV votes is an artifact of the LV screening process?

The best explanation comes from a 2004 analysis (subs. req.) in Public Opinion Quarterly by Robert Erikson, Costas Panagopoulos and Christopher Wlezien. They found that the classic 7-question Gallup model "exaggerates" reported volatility in ways that are "not due to actual voter shifts in preference but rather to changes in the composition of Gallup's likely voter pool." I also summarized their findings in a blog post four years ago.

Finally, let me toss one new question back to Charles that many readers have raised in recent weeks. The two daily tracking surveys -- the Gallup Daily and the Rasmussen Reports automated survey -- contribute disproportionately to our national chart. For example, we have logged 51 national surveys since July 1, and more than half of those points on the chart (27) are either Gallup Daily or Rasmussen tracking surveys. Are we giving too much weight to the trackers? And what would the trend look like if we removed those surveys?
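For scale, the tracker share follows directly from the two counts above. A quick check in Python, using only those counts (the variable names are mine):

```python
total_surveys = 51    # national surveys logged since July 1
tracker_surveys = 27  # Gallup Daily or Rasmussen tracking surveys

tracker_share = tracker_surveys / total_surveys
other_surveys = total_surveys - tracker_surveys

print(f"Tracker share of chart points: {tracker_share:.1%}")
print(f"Surveys left if trackers are removed: {other_surveys}")
```

So the two trackers account for roughly 53 percent of the points, and removing them would leave 24 surveys from all other pollsters to define the trend.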