Simon Jackman

Model-Based Poll Averaging: How Do We Do It?

Posted: 09/14/2012 8:58 am

Over the remainder of the election campaign, HuffPost will be presenting graphical summaries of the modeling I've been doing behind the scenes for them over the last few months. Some additional background on how it all works seems warranted.

The goal is to summarize the "state of play" in states where we have even a little polling, along with the national trend in voting intentions. Note right away that this isn't a forecast of what will happen in November on Election Day, but rather a synthesis of available polling -- the web site is called Pollster, not Forecaster.

That said, it is important to note that there is a lot more going on than just poll averaging.

Every couple of hours, my computer programs check to see if there are any new polls in the Pollster database. We're at a point in the campaign now where the tempo of polling is ramping up and polls are being published throughout the working day. But even on a quiet day, survey houses like Rasmussen or Gallup are updating their rolling, national tracking polls. So at least once a day (but usually more often than that), my computer programs find new polls to integrate into the estimate of where each state is.

The goal is to form today's best estimate of the national and state-by-state levels of support for Obama and Romney. If we literally had no new polling on a given day, then today's best guess is simply yesterday's estimate. But the more interesting case is when, say, we have new national level polls, and a few state-level polls. How do we use that information to make estimates for all of the states we're tracking, even for states without new polling data?
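The "no new polls means yesterday's estimate" idea can be illustrated with a simple random-walk update for a single state. This is only a sketch under stated assumptions, not the model behind the Pollster estimates: the function name `update_estimate`, the daily drift variance of 0.04, and the representation of each poll as a (share, sampling variance) pair are all hypothetical.

```python
def update_estimate(prior_mean, prior_var, polls, drift_var=0.04):
    """One day's update for one state under a random-walk assumption.

    prior_mean, prior_var: yesterday's estimate (percentage points) and its variance.
    polls: list of (share, sampling_variance) pairs published today (hypothetical input format).
    drift_var: assumed day-to-day variance of true opinion change (hypothetical value).
    """
    # Random walk: the mean carries over unchanged, the uncertainty grows a little.
    mean, var = prior_mean, prior_var + drift_var
    # Fold in each new poll, weighting it by how precise it is relative to the prior.
    for share, poll_var in polls:
        gain = var / (var + poll_var)
        mean = mean + gain * (share - mean)
        var = (1.0 - gain) * var
    return mean, var
```

With an empty list of polls the function returns yesterday's mean with slightly more uncertainty; with new polls the estimate moves toward them, and moves further for polls with smaller sampling variance.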

This is where the modeling comes in. My model assumes that states (and the national track) move together, but in clusters. That is, some states follow other states more closely than others. Further, some states follow the national trend more closely than others. I rely on historical election returns to help assign states to clusters, and for initial guesses as to the strength of the clustering; in exploiting the historical data I upweight recent presidential elections relative to less recent ones.

There is a big geographic component to the clustering (e.g., New England, the Plains, the deep South, the Pacific West). Since the national outcome is simply the sum of what happens in the states, all states track the national outcome to some extent; that said, in recent U.S. presidential elections some states track the national swing more closely than other states. For example, Indiana, New York, Arizona, New Mexico and (of course) Ohio and Michigan have tracked the national swing much more closely than, say, Mississippi and South Dakota. Accordingly, polls in, say, Ohio and Michigan tend to be more informative about the national trend (and vice versa) than polls from the deep South or the Plains.

In turn, this means that even though we may not have polls in every state on every day, we've got a plausible (if imprecise) model of how movement in one state might be related to movements in another state, and a model that links up state and national level changes in the vote. We'd always prefer more data to less, but the model lets us produce estimates day by day, even without a steady stream of polling data from each state, "borrowing strength" from states with polling data (or the national level) through the clustering.
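To make "borrowing strength" concrete, here is a minimal sketch for just two states whose day-to-day shifts are assumed to be jointly normal with a known correlation. The function names and the correlation value in the usage note are hypothetical; in the model described above the strength of the links comes from historical election returns and the clustering, which this sketch does not reproduce.

```python
import math


def borrowed_shift(observed_shift_a, sd_a, sd_b, rho):
    """Expected shift in state B given an observed shift in state A.

    Uses the conditional mean of a bivariate normal; rho is the assumed
    correlation between the two states' shifts (hypothetical input).
    """
    return rho * (sd_b / sd_a) * observed_shift_a


def remaining_sd(sd_b, rho):
    """Standard deviation of B's shift left over after conditioning on A."""
    return sd_b * math.sqrt(1.0 - rho ** 2)
```

For example, with equal standard deviations and an assumed correlation of 0.8, a 2-point move in the polled state implies an expected 1.6-point move in the unpolled state, with the unpolled state's uncertainty cut to about 60 percent of what it was. With a correlation near zero, almost nothing is borrowed.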

The model also deals with the fact that different polls have different sample sizes, and so not all polls contribute equally to the model's estimates. The model also corrects for "house effects" that I talked about in an earlier post, the tendency for some survey houses to produce estimates that are systematically higher or lower for one candidate than other pollsters.
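A rough sketch of how sample size and house effects can enter a pooled estimate: each poll is corrected by its pollster's house effect and then weighted by the inverse of its sampling variance. The function names, the input format, and the simple-random-sampling variance formula are assumptions for illustration; in the actual model the house effects are estimated, with their own uncertainty, rather than supplied as fixed numbers.

```python
def sampling_variance(share_pct, n):
    """Approximate sampling variance (squared percentage points) of a poll share,
    assuming simple random sampling with n respondents."""
    p = share_pct / 100.0
    return (p * (1.0 - p) / n) * 100.0 ** 2


def pooled_estimate(polls, house_effects):
    """Precision-weighted average of house-effect-corrected polls.

    polls: list of (pollster, share_pct, sample_size) tuples (hypothetical format).
    house_effects: dict mapping pollster -> points by which that pollster tends
    to run high (positive) or low (negative) for the candidate.
    """
    num = 0.0
    den = 0.0
    for pollster, share, n in polls:
        corrected = share - house_effects.get(pollster, 0.0)
        weight = 1.0 / sampling_variance(share, n)  # bigger samples count for more
        num += weight * corrected
        den += weight
    return num / den
```

Because the weights depend on sample size, a poll of 1,000 respondents counts twice as much as a poll of 500 reporting the same share.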

These house effect corrections are vital, letting us deal with the fact that some pollsters contribute many more polls than others; e.g., Rasmussen contributes 150 of the 720 presidential election polls in Pollster's collection of 2012 Obama-Romney head-to-head polls, Public Policy Polling contributes 103 polls, with another 109 survey houses contributing the remaining 467 polls. Without the house effect corrections, the model's estimate would be overwhelmed by any bias contained in the estimates produced by these more prolific pollsters. In future posts I'll dig into the specific estimates of house effects, with a specific focus on likely voter versus registered voter filters.

I also aggressively down-weight polls that come from "one-off" pollsters (appearing just once in the Pollster data) and/or polls that are radically at odds with the rest of the polls for a given state.
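One simple way to express this kind of down-weighting is a multiplicative penalty per poll. The sketch below is an assumption-laden illustration: the penalty factors (0.25), the 8-point outlier threshold, the use of the median as the reference point, and the function name `reliability_weights` are all hypothetical, and the post does not say how the down-weighting is actually implemented. For brevity, "one-off" here means appearing once in the list passed in, not once in the whole Pollster collection.

```python
from collections import Counter
from statistics import median


def reliability_weights(polls, one_off_factor=0.25, outlier_gap=8.0, outlier_factor=0.25):
    """Return one weight per poll for a single state.

    polls: list of (pollster, share_pct) pairs (hypothetical format).
    A poll is penalized if its pollster appears only once in the list,
    and penalized again if it sits far from the median of the polls.
    """
    counts = Counter(pollster for pollster, _ in polls)
    centre = median(share for _, share in polls)
    weights = []
    for pollster, share in polls:
        w = 1.0
        if counts[pollster] == 1:
            w *= one_off_factor   # one-off pollster
        if abs(share - centre) > outlier_gap:
            w *= outlier_factor   # radically at odds with the other polls
        weights.append(w)
    return weights
```

A poll that is both a one-off and an outlier is penalized twice, so it contributes very little to the state's estimate without being discarded outright.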

So, for each day, for about 38 states (states where we have at least one poll in 2012) -- and for the national level -- I obtain an estimate of the state of play. The estimate is only that -- an estimate -- and so comes with some uncertainty. The uncertainty comes from multiple sources: (1) sampling error in the polls themselves; (2) uncertainty about the house effect corrections; (3) uncertainty about how quickly vote intentions are changing; (4) uncertainty about the strength of the correlations between and within clusters of states and the national level.

I take these sources of uncertainty into account when producing state-by-state probabilities of winning and the Electoral College summary. In states with less polling, we have more uncertainty about the state of play. Happily, these are generally states where the outcome isn't in much doubt (e.g., Utah).
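As a rough illustration of how an estimate plus its uncertainty turns into a probability of winning and an Electoral College summary, the sketch below treats each state's margin as normally distributed and simulates many elections. It is a simplification under stated assumptions: the function names and input format are hypothetical, and states are drawn independently here, whereas the model described in this post lets states move together in clusters.

```python
import random
from statistics import NormalDist


def win_probability(margin, sd):
    """P(true margin > 0) when the margin estimate is normal with the given sd."""
    return 1.0 - NormalDist(mu=margin, sigma=sd).cdf(0.0)


def simulate_electoral_votes(states, draws=10000, seed=0):
    """Simulate electoral-vote totals for one candidate.

    states: list of (electoral_votes, margin, sd) tuples (hypothetical format).
    Each draw samples every state's margin independently (a simplifying
    assumption) and awards the state's votes when the sampled margin is positive.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(draws):
        total = 0
        for votes, margin, sd in states:
            if rng.gauss(margin, sd) > 0:
                total += votes
        totals.append(total)
    return totals
```

Under these assumptions, a lead of two standard deviations corresponds to a win probability of about 0.977, which is why a modest lead in a heavily polled state can translate into a high probability, while the same lead in a sparsely polled state does not.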

Now that we're well and truly past the conventions, we're going to see a lot more state-level polling, particularly in "battlegrounds" like Ohio, Virginia, Pennsylvania and Florida. Already these states dominate the state-level polling in the Pollster database: e.g., 41 polls from Florida, 38 from Ohio, 15 from New York, 13 from California, 1 from Utah. More data from more states will mean less uncertainty in the state-by-state estimates and less reliance on the model and its underlying assumptions.

In the meantime, please enjoy the estimates and graphs I'm producing for Pollster.

 

Follow Simon Jackman on Twitter: www.twitter.com/simonjackman

10:35 AM on 11/07/2012
Well done - the modeled election map showed the correct outcome several months in advance.
Christina Chapman
Extraordinary claims...
11:50 PM on 11/03/2012
I wonder how much method of polling factors in to the final figures.
jcd8822
03:13 PM on 11/03/2012
It will be interesting if the polls are incorrect in Texas. I am sure they did not do any polling in the Rio Grande Valley which is heavily Hispanic and where they are breaking records on early voting. In the first six days Hidalgo County was 43 per cent ahead of 2008.
07:49 AM on 11/02/2012
Jimmy and UT:
How very enlightening. I do find it interesting that in your elitist attitudes towards limiting the voters pool, you carefully avoided anything that could be applied to you.
To find that you, Jimmyforlife, are an advocate for the repeal of the 17th is not really a surprise to me, but, rather, an anticipated attitude.
Just for your edification: non-citizens and convicted felons[until they have petitioned for return of their civil rights and had them restored] are not members of the voting public as it is.
10:09 PM on 11/01/2012
Allow me please to be the "restorer of calm" for a moment. Like all of us, there is good and bad in both President Obama and Governor Romney. This has been an especially mean spirited campaign fraught with untruths and sometimes outright lies, by both campaigns. This rhetoric has served up a recipe that has brought out the worst in everyone, including ourselves. In the end, regardless of the outcome, we have to unite together to help heal the wounds of this great nation. A place we love and are so proud to call home. As such, I respectfully ask that everyone be mindful and respectful of each others feelings and opinions. After all, we are the "United" States of America. Thank You, Parker Hanson.
01:47 PM on 11/06/2012
Parker, You are absolutely right, thank you. (However, it would be easier if Repubs weren't so small-minded. JUST A JOKE!)
Sky Tripp
34 yo gay married hippy dude
08:28 PM on 11/01/2012
I for one would like to point out a fundamental flaw in all OLD SCHOOL polling, our household has not had a landline phone in more than 5 years. Cell phones only. and I only have one friend out of ten that has a land line and she is in her 70's. Younger more educated voters vote dem, almost always. but are almost never represented in traditional data!!!
Sky Tripp
34 yo gay married hippy dude
05:43 PM on 11/08/2012
and i was RIGHT!!! haaaaa it was not even close
10:29 PM on 10/30/2012
Why is Ohio a "tossup" at 97% likely for Obama win but North Carolina is "lean Romney" for a 94% likely Romney win?
04:13 PM on 10/30/2012
One big problem here in determining the average poll! HP conveniently left out Gallup Poll on all of their state averages....Hmmmmmm! One major and extremely credible polling firm conveniently left off even in Ohio....I guess HuffPo doesn't like it when it doesn't favor their "Chosen One"...And I'll reference the accuracy of the poll that in 2008 it predicted the landslide for Obama!!!! http://www.gallup.com/poll/107674/gallup-daily-election-2008.aspx
As of right now gallup has Romney ahead by 6 pts amongst likely voters...http://www.gallup.com/poll/154559/US-Presidential-Election-Center.aspx?ref=interactive
I wonder where Huffpo's electoral map would be if they added this poll into their calculations as they should!!!!
07:39 PM on 10/30/2012
Why should they include a national poll when calculating state averages? HP uses Gallup's national poll when calculating the popular vote, which is why it has Romney ahead:
http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama

But it's the Electoral College that matters, hence state polling. It doesn't matter if Romney gets 80% of the vote in the South (hence boosting his standing in national polls) if he's behind in swing states.
11:03 AM on 11/05/2012
Just took a look at the Gallup polls per your link...the problem with these is that they are all National Polls...We don't elect the President based upon popular vote nation-wide as a whole, so it really doesn't matter, in my view, for predictive purposes what these national polls say. I prefer to look at state by state polls which give a better picture of how the electoral map will pan out. This is maybe a reason the Gallup polls are not included.
09:31 AM on 10/28/2012
Really want the scoop----Follow the gambling odds in Vegas: OBAMA keeps the furniture by 6%.
RosesForObama
Obama WON Re-election. I CALLED It
02:16 AM on 10/28/2012
One number.

270

270

270

I don't care how we get there so long as we get there.
HUFFPOST COMMUNITY MODERATOR
EthnicHeart
02:35 AM on 10/28/2012
We're getting there. Obama/Dems 2012 F&F
08:57 PM on 10/26/2012
3 words
OHIO
OHIO
OHIO
herdingcats2012
Trying to Control the Uncontrollable
01:11 AM on 10/24/2012
Why are the two latest polls for Ohio from "polling organizations" that have not been used in your polling data for Ohio in the past?

So you "preselect" what polls to use in your model?
02:06 PM on 10/28/2012
Private polling can occur at any time. If there is no history for these organizations, it's best to not weight them much relative to the others, since they may just be spin-polls or the polls may be run by amateurs (with errors) or pros (with purposefully crafted data).
11:32 AM on 10/19/2012
I'd really like to see information on who is funding each poll prominently displayed with the results.

Whether they are selling advertising time to campaigns or merely collecting eyeballs and selling advertising time to traditional merchants, the media outlets - including Huffpost - are making money hand over fist from this manufactured uncertainty and tweaking the nation's collective blood pressure. It is another implementation of the fear-mongering that those in love with power use to try to drive public sentiment, and the willing perpetuation by the media of shoddy information (lies) for corporate gain.
11:29 AM on 10/19/2012
On election day, all of Romney's shortcomings will come home to roost - his unwillingness to disclose personal taxes, his track record as a vulture capitalist, his lack of details on implementation plans, his lack of foreign policy experience, his failed math. NONE of these issues is ideological in nature, and any one of them alone is a likely deal-killer. There are millions of voters out there just like me, who would be perfectly happy to vote for a qualified candidate with conservative values. But we don't have one. We will hold our noses and stay the course with the more experienced candidate because in hard times a known quantity is a more conservative choice than a candidate who doesn't want to discuss details. A candidate is an investment, and Romney represents a huge downside risk in a climate where upside potential is limited.

There is nothing newsworthy in these polls. When American voters pull together the gestalt of each candidate, Obama is going to win by a comfortable margin with >56% of the popular vote and 300+ electoral votes. He will enjoy a well-deserved legacy of being one of the least popular presidents of modern time, but he will win easily.

God help us all when the election is over and the Media juggernaut of this country has to find some other way of sucking revenue out of our tired economy.
12:01 PM on 11/05/2012
at least we only have one more day of Romney lies !!! It has been unreal and if that is what Mormons do I have to question that faith....surely they don't adhere to lying about everything. I have friends that are Mormon and they are embarrassed over him lying so much...so all Mormons do not do this
03:45 PM on 11/06/2012
And then there is always this:
PolitiFact has chronicled 19 “pants on fire” lies by Mr. Romney and 7 by Mr. Obama since 2007, but Mr. Romney’s whoppers have been qualitatively far worse: the “apology tour,” the “government takeover of health care,” the “$4,000 tax hike on middle class families,” the gutting of welfare-to-work rules, the shipment by Chrysler of jobs from Ohio to China. Said one of his pollsters, Neil Newhouse, “We’re not going to let our campaign be dictated by fact checkers.”

To be sure, the Obama campaign has certainly had its own share of dissembling and distortion, including about Mr. Romney’s positions on abortion and foreign aid. But nothing in it — or in past campaigns, for that matter — has equaled the efforts of the Romney campaign in this realm. Its fundamental disdain for facts is something wholly new.

The voters, of course, may well recoil against these cynical manipulations at the polls. But win or lose, the Romney campaign has placed a big and historic bet on the proposition that facts can be ignored, more or less, with impunity.