06/09/2010 11:37 pm ET Updated May 25, 2011

More on the Arkansas Surprise

Before moving on to the more important issues raised by both Nate Silver's new pollster accuracy ratings and their apparent role in the parting of ways between DailyKos and pollster Research 2000, I want to consider some possible lessons from last night's Arkansas surprise.

Let's start with the assertion that Del Ali, president of Research 2000, made to me earlier today. He says that the final result -- Blanche Lincoln prevailed by a 52.0% to 48.0% margin -- fell within the +/- 4% margin of error of his final poll, which showed Halter at 49% and Lincoln at 46%. That much appears to be true. However, Research 2000 did three polls on the Lincoln-Halter run-off, including a survey conducted entirely on the evening of the first primary, and all three gave Halter roughly the same margin as the final poll.

[Chart: Arkansas Senate runoff polls]

I'll spare you the math (and the argument about how we might calculate the margin of error for such a pooled sample), but if you treat all three polls as if they were one, the difference between the vote count and the consistent Research 2000 result looks far more statistically meaningful.
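For readers who want the math anyway, here is a minimal sketch of the textbook margin-of-error arithmetic behind the pooling argument. It assumes simple random sampling at 95% confidence and ignores weighting and design effects. The sample size of 600 per poll, and the assumption that all three polls showed Halter 49%, Lincoln 46%, are hypothetical: the post gives neither the sample sizes nor exact figures for the earlier polls. The lead is treated as the difference of two shares from the same sample, so its variance uses the multinomial formula in `moe_margin`, a helper named only for this illustration.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def moe_share(p, n, z=Z_95):
    """Margin of error for one candidate's share p in a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def moe_margin(p1, p2, n, z=Z_95):
    """Margin of error for the lead p1 - p2 when both shares come from the same sample.

    Multinomial variance of the difference: (p1 + p2 - (p1 - p2) ** 2) / n.
    """
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


# HYPOTHETICAL inputs: the post gives no sample sizes, so assume n = 600 per poll
# and that all three polls showed Halter 49%, Lincoln 46%.
N_PER_POLL = 600
N_POLLS = 3
HALTER, LINCOLN = 0.49, 0.46

poll_lead = HALTER - LINCOLN    # Halter +3 in the polls
actual_lead = 0.48 - 0.52       # Halter -4 in the count
miss = poll_lead - actual_lead  # about 7 points on the margin

for label, n in [("one poll", N_PER_POLL), ("pooled", N_PER_POLL * N_POLLS)]:
    lead_moe = moe_margin(HALTER, LINCOLN, n)
    print(f"{label:8s} n={n:5d}  share MoE ±{moe_share(0.5, n):.1%}  "
          f"lead MoE ±{lead_moe:.1%}  miss inside MoE: {miss <= lead_moe}")
```

Under these assumptions a single poll has the familiar ±4-point error on each share and roughly ±7.8 points on the lead, so the 7-point miss fits inside it. Pooled to 1,800 interviews, the error on the lead shrinks to about ±4.5 points, which the miss exceeds. Pooling in this way presumes the three polls are independent draws from an unchanging electorate, which is exactly the contestable part of the calculation.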

One big problem in this case is that Research 2000 was the only pollster releasing results into the public domain. Had other pollsters been active, producing the sort of pollster-to-pollster variation we typically see, those who follow the race may have been less surprised by the outcome.

I am told, however, that the Lincoln campaign and allies of the Halter campaign (presumably organized labor) did conduct internal polling that was not publicly released. I communicated with senior advisors to both campaigns today who say that each side polled immediately after the first primary and showed Lincoln ahead. Lincoln's internal poll showed her leading by ten points, while two post-primary polls conducted by Halter's allies showed Lincoln leading by six and four points. The advisors also claim that neither campaign fielded a tracking poll in the final week, as all remaining resources were devoted to advertising and efforts to get out the vote.

Now in fairness to Research 2000, all of these claims were made to me today, on background, and I have no way to verify them independently. So take this information with a grain of salt.

Are there lessons to be learned here?

First, let's remember the point I made a week ago, with the help of Nate Silver's data: Whatever the reason, polls show far more error in primaries, especially primary elections in southern states.

Second, consider something largely overlooked: Arkansas has one of the largest cell-phone-only populations in the nation. A year ago, the National Center for Health Statistics (NCHS) at the Centers for Disease Control and Prevention published estimates of wireless-only percentages by state. Arkansas ranked fourth for the percentage of cell-phone-only households (22.6%) and seventh for the percentage of cell-phone-only adults (21.2% -- for rankings, see the charts in our summary). And the national-level NCHS estimates of the cell-phone-only population have risen another 4.5 percentage points over the past year.

Nationally, the cell-phone-only population is largest among younger Americans, among those who rent rather than own their homes, and among non-whites. Those patterns could have made a difference in Arkansas.

Third is a point I made in my column earlier this week: The results of pre-election surveys are sometimes only as good as the assumptions that pollsters make in "modeling" likely voters.

For example, many pollsters stratify their likely voter samples regionally based on past turnout. In other words, they divide the state up into regions and use past vote returns to determine the size of each region as a percentage of the likely electorate. As should be obvious, these judgments are often subjective and rely heavily on the assumption that past turnout patterns will apply to future elections.
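A minimal sketch of that kind of regional stratification follows. Each region's interview quota is its share of the past vote times the planned sample size, rounded with the largest-remainder method so the quotas add up exactly. The region names and vote totals are hypothetical, not real Arkansas data, and `regional_quotas` is an illustrative helper rather than any pollster's actual procedure.

```python
# HYPOTHETICAL past vote totals by region (illustrative, not real Arkansas data).
PAST_VOTES = {"Northwest": 90_000, "Central": 110_000, "Delta": 60_000, "South": 70_000}


def regional_quotas(votes_by_region, sample_size):
    """Allocate interviews in proportion to each region's share of the past vote.

    Largest-remainder rounding makes the quotas sum exactly to sample_size.
    """
    total = sum(votes_by_region.values())
    exact = {r: sample_size * v / total for r, v in votes_by_region.items()}
    quotas = {r: int(x) for r, x in exact.items()}  # round each quota down
    leftover = sample_size - sum(quotas.values())
    # give the leftover interviews to the regions with the largest fractional parts
    by_remainder = sorted(exact, key=lambda k: exact[k] - quotas[k], reverse=True)
    for r in by_remainder[:leftover]:
        quotas[r] += 1
    return quotas


print(regional_quotas(PAST_VOTES, 600))
```

With these made-up totals and 600 interviews, the quotas come out to 164, 200, 109 and 127. The whole calculation rests on the chosen "past" election; swap in a different baseline and every quota moves.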

For the Arkansas runoff, however, pollsters could rely on a very proximate turnout model: The first primary on May 18 between Lincoln, Halter and D.C. Morrison. In fact, according to Del Ali, that's exactly what Research 2000 did for their runoff polls. They used the regional distribution of voters on May 18 to set regional quotas. They also conducted a survey of self-identified voters on primary election night, weighted it so that respondents' self-reported preferences matched the actual result, and used the resulting demographic profile to guide the demographic weighting of subsequent polls.
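The primary-night step can be sketched as simple cell weighting on reported vote: each respondent gets the ratio of a candidate's actual share to that candidate's share of the sample, and the weighted demographic profile then becomes a target for later polls. This is a minimal sketch assuming a single weighting variable. The respondents and target shares are hypothetical stand-ins, not Research 2000's data or the certified May 18 returns, and the firm's actual procedure may differ.

```python
from collections import Counter

# HYPOTHETICAL primary-night respondents: (self-reported vote, age group).
RESPONDENTS = [
    ("Lincoln", "65+"), ("Lincoln", "45-64"), ("Halter", "18-44"), ("Halter", "45-64"),
    ("Halter", "18-44"), ("Morrison", "45-64"), ("Lincoln", "65+"), ("Halter", "65+"),
]
# HYPOTHETICAL target shares standing in for the official May 18 result.
TARGET = {"Lincoln": 0.45, "Halter": 0.42, "Morrison": 0.13}


def vote_weights(sample, target_shares):
    """Per-candidate weights so the weighted vote distribution equals target_shares."""
    counts = Counter(vote for vote, _ in sample)
    n = len(sample)
    return {c: target_shares[c] / (counts[c] / n) for c in counts}


def weighted_distribution(sample, weights, field):
    """Weighted share of each category in column `field` (0 = vote, 1 = age group)."""
    totals = Counter()
    for row in sample:
        totals[row[field]] += weights[row[0]]  # each row carries its vote-cell weight
    grand = sum(totals.values())
    return {k: v / grand for k, v in totals.items()}


weights = vote_weights(RESPONDENTS, TARGET)
age_profile = weighted_distribution(RESPONDENTS, weights, field=1)
print(age_profile)  # candidate age targets for subsequent polls
```

The catch is built into the design: the age profile is only as good as the assumption that runoff voters look like first-primary voters.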

But here's the problem: As is typical, total turnout declined between the two elections. Roughly 70,000 voters (21% of those who voted in the first primary) did not vote in the runoff. More important, the fall-off was not uniform across the state, and its pattern favored Lincoln: Turnout remained relatively high in her base and fell off most where she was weakest.

I took the vote by county as reported by the Associated Press and calculated runoff turnout in each county as a percentage of the total vote cast there in the first primary. As the scatterplot below shows, the fall-off in turnout was typically greatest in counties where Lincoln's share of the vote on May 18 was lowest. (I omitted Baxter and Newton counties, whose reported totals increased between the two elections, suggesting clerical errors or omissions in AP's vote count.)
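The county calculation can be sketched as follows, assuming each row holds a county's May 18 total, its runoff total and Lincoln's May 18 share. The county names and figures are hypothetical. Following the rule described above, any county whose runoff total exceeds its first-primary total is dropped as a likely reporting error, and a Pearson correlation summarizes the relationship in the scatterplot: a positive value means turnout held up better where Lincoln ran stronger.

```python
import math

# HYPOTHETICAL rows: (county, May 18 total vote, runoff total vote, Lincoln's May 18 share).
COUNTIES = [
    ("County A", 10_000, 8_900, 0.55),
    ("County B", 12_000, 9_600, 0.46),
    ("County C", 8_000, 6_000, 0.38),
    ("County D", 5_000, 3_900, 0.33),
    ("County E", 4_000, 4_100, 0.40),  # runoff total above May 18: likely reporting error
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Runoff turnout as a share of May 18 turnout, dropping counties whose total went up.
rows = [(name, runoff / may, share)
        for name, may, runoff, share in COUNTIES if runoff <= may]
retention = [r for _, r, _ in rows]
lincoln_share = [s for _, _, s in rows]

for name, r, s in rows:
    print(f"{name}: Lincoln {s:.0%} on May 18, runoff turnout {r:.0%} of May 18")
print("correlation:", round(pearson(lincoln_share, retention), 2))
```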


The pattern is most likely explained by the fact that there were also Congressional run-off elections held yesterday in the 1st and 2nd Districts of Arkansas, which kept turnout higher in areas that are also Lincoln's base of support.

I don't want to make too much of the turnout pattern since, by my calculations, re-weighting the May 18 vote to match yesterday's county-level turnout would add less than a percentage point to Lincoln's lead. But hopefully it gives you some idea of what can happen when assumptions go awry. Region is just one variable. Other assumptions, such as those for race and age, may have been even more consequential. Other pollsters making different assumptions may have produced very different results. When just one public pollster is active in the race, the odds of misreading the horse-race are greater.
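The re-weighting check can be sketched like this: apply each county's May 18 candidate shares to two different sets of county weights (first-primary totals versus runoff totals) and compare the statewide Lincoln-minus-Halter lead. The four counties and their numbers are hypothetical, so the size of the shift they produce says nothing about Arkansas; the less-than-a-point estimate above comes from the actual AP returns. `statewide_lead` is an illustrative helper.

```python
# HYPOTHETICAL rows: (May 18 total, runoff total, Lincoln May 18 share, Halter May 18 share).
COUNTIES = [
    (10_000, 8_900, 0.55, 0.35),
    (12_000, 9_600, 0.46, 0.42),
    (8_000, 6_000, 0.38, 0.48),
    (5_000, 3_900, 0.33, 0.52),
]


def statewide_lead(rows, use_runoff_turnout):
    """Statewide Lincoln-minus-Halter lead built from county-level May 18 shares.

    Counties are weighted by their May 18 totals or, if use_runoff_turnout is
    True, by their runoff totals.
    """
    total_weight = weighted_lead = 0.0
    for may, runoff, lincoln, halter in rows:
        w = runoff if use_runoff_turnout else may
        total_weight += w
        weighted_lead += w * (lincoln - halter)
    return weighted_lead / total_weight


before = statewide_lead(COUNTIES, use_runoff_turnout=False)
after = statewide_lead(COUNTIES, use_runoff_turnout=True)
print(f"May 18 weights {before:+.1%}, runoff weights {after:+.1%}, shift {after - before:+.1%}")
```

Only the county weights change between the two calls; candidate shares within each county are held at their May 18 values, which isolates the effect of the uneven fall-off in turnout.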