
Upgrading Pollster's Trend Lines: The Kalman Filter

First Posted: 10/22/10 12:19 AM ET Updated: 05/25/11 07:05 PM ET


Regular readers have probably noticed a few subtle changes in our trend lines over the last 24 hours (and perhaps a few temporary glitches). The good news is that we have been rolling out the first in a series of long overdue upgrades I hinted at when we debuted HuffPost Pollster, and I want to take a few minutes to explain what's changing and why.

Those who have followed Pollster.com from the beginning may remember that we started with charts based on simple "last 5 poll" averages whose trend lines were anything but smooth. Simple averaging reduced the random noise associated with individual surveys, but much remained. So in 2007 we started plotting charts using a loess regression model that draws smooth trend lines to fit noisy polling data. Rather than report simple estimates, we've reported the value of the end-point of the trend lines, or what we have called "trend estimates."
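To make the contrast concrete, here is a minimal Python sketch of a "last 5 poll" average. The function name `last_n_average` and the sample polls are hypothetical, made up for illustration. Each poll is a (day, share) pair, and the value at the final poll plays the role of a current estimate.

```python
def last_n_average(polls, n=5):
    """Average each poll with up to n-1 polls before it.

    polls: list of (day, share) pairs, share in percentage points.
    Returns one averaged value per poll, in date order.
    """
    shares = [share for _, share in sorted(polls)]  # order by day
    averages = []
    for i in range(len(shares)):
        window = shares[max(0, i - n + 1): i + 1]  # the most recent n polls (or fewer)
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical polls for one candidate: (day, share)
polls = [(1, 48), (3, 52), (4, 47), (7, 51), (9, 46), (12, 53)]
print(last_n_average(polls))  # → [48.0, 50.0, 49.0, 49.5, 48.8, 49.8]
```

The raw polls swing between 46 and 53 while the averages stay between 48.8 and 50. Even so, the line still steps up or down with every new poll, and that leftover noise is what a smoother trend line is meant to absorb.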

That smoothing has a huge practical benefit: It draws trend lines that typically represent real underlying trends better than individual polls do. Those lines resist "chasing" outlier polls, as well as variation that is either random and inherent in polling data or the result of differences in polling methodology rather than real shifts in opinion. When they work well, the smooth trend lines help you see real trends more accurately and put new polls into perspective: You can easily see how each new poll compares to the overall trend.

But the loess regression model has some limitations that we have struggled with. The computations generally run smoothly when polls are plentiful, but they sometimes go awry when we have only a small number of polls available. With fewer than 8 polls (the scenario that applies to most of the U.S. House races right now), we do not even attempt to draw loess lines and plot simple linear (straight) trend lines instead. The straight lines produce current "trend estimates" that are no more accurate than the most recent poll, and sometimes considerably less so.
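For comparison, here is a rough sketch of the straight-line fallback. The article does not say how those linear lines were fitted, so this assumes an ordinary least-squares fit of share on day, with the line's value on the latest poll date taken as the "trend estimate." The function name and the four polls are hypothetical.

```python
def linear_trend_estimate(polls, at_day=None):
    """Fit share = a + b * day by ordinary least squares and evaluate the line.

    polls:  list of (day, share) pairs
    at_day: day at which to read off the line (defaults to the latest poll date)
    """
    n = len(polls)
    mean_d = sum(d for d, _ in polls) / n
    mean_s = sum(s for _, s in polls) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in polls)
    if at_day is None:
        at_day = max(d for d, _ in polls)
    if sxx == 0:  # all polls on one day: no slope to estimate
        return mean_s
    slope = sum((d - mean_d) * (s - mean_s) for d, s in polls) / sxx
    return mean_s + slope * (at_day - mean_d)


# Hypothetical race with only four polls: (day, share)
polls = [(1, 44), (10, 47), (20, 45), (25, 50)]
print(round(linear_trend_estimate(polls), 2))  # → 48.43
```

Here the end point (about 48.4) sits well below the most recent poll (50) because the three earlier polls pull the line down. With so few points, a single poll can easily tilt the whole line.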

So as of yesterday, we have added an important new first step to this process. The generation of Pollster's trend lines now begins with a statistical tool called a "Kalman Filter," which smooths survey data in a manner that's conceptually similar to loess regression. However, as explained in a helpful 1999 article in Public Opinion Quarterly (Green, Gerber and De Boef, "Tracking Opinion Over Time"),* Kalman filtering adds some important properties. First, because it treats each data point on the chart as a survey estimate (with an associated sample size and margin of error) rather than just a number, it provides a means of quantifying the accuracy of the lines, including the end points that we typically call our "trend estimates." That property is useful in translating trend estimates into probabilities. Second, it provides us with some additional tools (which I'll describe in a later post) to improve the accuracy of forecasts based on the polling data in the chart. Third, from a purely practical perspective, Kalman filtering gives us a more consistent and reliable way to generate these charts when polls are sparse.
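The idea of treating each poll as an estimate with a known precision can be illustrated with textbook formulas. This sketch assumes simple random sampling, so the variance of a poll share p from n respondents is p(1 - p)/n. It also assumes a normal approximation for turning an estimated lead and its standard error into a probability. Real polls carry design effects, and the article leaves Pollster's actual probability method to a later post. The function names are hypothetical.

```python
import math


def sampling_variance(share, n):
    """Binomial sampling variance of a poll share (0-1) from n respondents."""
    return share * (1 - share) / n


def margin_of_error(share, n, z=1.96):
    """Half-width of the ~95% interval under the normal approximation."""
    return z * math.sqrt(sampling_variance(share, n))


def probability_ahead(lead, se):
    """P(true lead > 0) if the estimated lead is Normal(lead, se**2)."""
    return 0.5 * (1 + math.erf(lead / (se * math.sqrt(2))))


print(round(margin_of_error(0.50, 600), 3))     # → 0.04 (about +/- 4 points)
print(round(probability_ahead(0.04, 0.02), 3))  # → 0.977
```

Under these assumptions, a four-point lead with a two-point standard error corresponds to roughly a 97.7% chance that the leader is truly ahead.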

We have developed a specific Kalman Filter for our charts, adapted from a model developed by two academic friends-of-Pollster, Jeff Lewis and Simon Jackman. The next few paragraphs get a bit technical. They are intended for statisticians and others who know more than the rest of us about how Kalman Filters work, so feel free to skip to the paragraph that starts "In plainer English" below. Our Kalman Filter smooths the polling results by considering (1) how big the sample size is for each poll, weighting polls with fewer respondents less heavily, and (2) how likely the Kalman Filter "thinks" a race is to jump around. The second point is part of Jeff and Simon's model, which estimates the variance of each individual race over time as well as the covariance between races (this is sometimes referred to as the "process noise" or "innovation matrix" of the Kalman Filter).
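Both ingredients can be seen in a one-race, "local level" (random walk) Kalman filter. This is a minimal sketch, not the Lewis-Jackman model. It tracks a single race, uses a fixed process-noise variance `q` instead of estimating it, and treats each poll's sampling variance as p(1 - p)/n. The function name and the sample polls are hypothetical.

```python
import math


def kalman_local_level(days, polls, q, m0=0.5, p0=0.25):
    """One-race Kalman filter under a random-walk ("local level") model.

    days:  number of days to track (day 0 .. days-1)
    polls: dict mapping day -> list of (share, n), share as a 0-1 proportion
    q:     process-noise variance per day (how far true support may drift daily)
    m0,p0: vague starting mean and variance before any poll is seen
    Returns a list of (estimate, variance) pairs, one per day.
    """
    m, p = m0, p0
    out = []
    for day in range(days):
        if day > 0:
            p += q  # predict: mean carries over, uncertainty grows by q
        for share, n in polls.get(day, []):
            r = share * (1 - share) / n  # poll's sampling variance: larger n, smaller r
            k = p / (p + r)              # Kalman gain: weight given to this poll
            m += k * (share - m)         # move the estimate toward the poll
            p *= 1 - k                   # uncertainty shrinks after the update
        out.append((m, p))
    return out


# Hypothetical polls: day -> [(share, sample size)]
polls = {0: [(0.60, 800)], 5: [(0.61, 600)], 9: [(0.59, 1000)]}
for day, (est, var) in enumerate(kalman_local_level(12, polls, q=0.00001)):
    print(day, round(est, 3), round(math.sqrt(var), 4))
```

Two knobs do the work. A poll's n sets its measurement variance r, so a 1,000-person poll pulls the estimate harder than a 400-person poll. The value of q sets how quickly uncertainty grows on days without polls, which is how the filter encodes how much a race is expected to jump around.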

In other words, if a candidate has had a steady 60% in the polls, and then we suddenly see a poll where he or she has 20%, the Kalman Filter will be less inclined to trust the latter poll, whereas loess smoothing would have been dragged down toward it. In addition, the Kalman Filter can incorporate the correlation between races into its forecasts. For example, if the filter has learned that Barbara Boxer's scores go up when Harry Reid's scores go up and we see a new poll where Reid is doing well, it may give a slight bump to Boxer as well, even in the absence of new Boxer polls (for what it's worth, we are not currently seeing much in the way of this sort of national trend).
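The cross-race effect can be sketched as a single Kalman measurement update on two correlated races when only one of them is polled. The covariance values, poll numbers, and function name are hypothetical. A real model would estimate the covariance (the article says Jeff and Simon's model does) rather than fix it by hand.

```python
def update_two_races(mean, cov, polled, y, r):
    """Kalman measurement update for two correlated races when one is polled.

    mean:   [m0, m1], current estimates for race 0 and race 1
    cov:    2x2 covariance of those estimates, as nested lists
    polled: index (0 or 1) of the race the new poll measures
    y, r:   the poll's result and its sampling variance
    """
    i = polled
    s = cov[i][i] + r                      # variance of the poll's surprise
    gain = [cov[0][i] / s, cov[1][i] / s]  # unpolled race gets gain via covariance
    surprise = y - mean[i]
    new_mean = [mean[j] + gain[j] * surprise for j in range(2)]
    new_cov = [[cov[a][b] - gain[a] * cov[i][b] for b in range(2)] for a in range(2)]
    return new_mean, new_cov


# Hypothetical: race 0 = Reid, race 1 = Boxer, positively correlated.
mean = [0.45, 0.48]
cov = [[0.0004, 0.0002], [0.0002, 0.0004]]
new_mean, new_cov = update_two_races(mean, cov, polled=0, y=0.49, r=0.0004)
print([round(m, 3) for m in new_mean])  # → [0.47, 0.49]
```

Boxer's estimate rises a point even though no new Boxer poll arrived, and her variance shrinks slightly as well (0.0004 to 0.00035). Repeated across every pair of races, the same mechanism means a poll added anywhere can nudge every other race's estimate a little.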

One quirk of Kalman filtering, however, is that it moves forward in time, estimating today's support based only on what it saw yesterday and the days before. As a result, raw Kalman filter output often looks jagged or abrupt. To address this issue, we employ a commonly used technique called Forward Filtering Backward Sampling (FFBS) (Kim, Shephard, and Chib, 1998) to smooth our results. Among other things, FFBS can help create simulations of what happens between now and Election Day that can be used to estimate the probability that a candidate will win a race. More on that in a later post.
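Here is a minimal sketch of FFBS for the same kind of one-race random-walk model. A forward Kalman pass is followed by a backward pass that draws whole paths of "true" support consistent with all the polls, and the average of many paths serves as a smooth line. The last function extends each sampled path as a random walk to Election Day and counts how often the candidate finishes above 50%. The fixed `q`, the one-poll-per-day simplification, the function names, and the sample data are all assumptions for illustration, not Pollster's production code.

```python
import math
import random


def forward_filter(obs, q, m0=0.5, p0=0.25):
    """Forward Kalman pass. obs[t] is (share, sampling variance) or None."""
    means, variances = [], []
    m, p = m0, p0
    for t, item in enumerate(obs):
        if t > 0:
            p += q  # random walk: uncertainty grows each day
        if item is not None:
            y, r = item
            k = p / (p + r)
            m += k * (y - m)
            p *= 1 - k
        means.append(m)
        variances.append(p)
    return means, variances


def backward_sample(means, variances, q, rng):
    """Backward pass: draw one path of true support given all the polls."""
    path = [0.0] * len(means)
    path[-1] = rng.gauss(means[-1], math.sqrt(variances[-1]))
    for t in range(len(means) - 2, -1, -1):
        m, p = means[t], variances[t]
        j = p / (p + q)  # how strongly tomorrow's drawn value informs today
        cond_mean = m + j * (path[t + 1] - m)
        cond_var = p * q / (p + q)
        path[t] = rng.gauss(cond_mean, math.sqrt(cond_var))
    return path


def ffbs_smooth(obs, q, draws=1000, seed=0):
    """Average many backward-sampled paths into one smooth line."""
    rng = random.Random(seed)
    means, variances = forward_filter(obs, q)
    paths = [backward_sample(means, variances, q, rng) for _ in range(draws)]
    smoothed = [sum(day) / draws for day in zip(*paths)]
    return smoothed, paths


def win_probability(paths, q, days_to_election, seed=1):
    """Extend each path as a random walk to Election Day; share ending above 50%."""
    rng = random.Random(seed)
    sd = math.sqrt(q * days_to_election)  # sum of daily N(0, q) steps
    wins = sum(1 for path in paths if path[-1] + rng.gauss(0, sd) > 0.5)
    return wins / len(paths)


# Hypothetical data: 20 days, four polls given as (share, sampling variance).
obs = [None] * 20
obs[0], obs[6], obs[12], obs[19] = (0.52, 0.0004), (0.49, 0.0005), (0.51, 0.0004), (0.53, 0.0004)
smoothed, paths = ffbs_smooth(obs, q=0.00002)
print(round(smoothed[-1], 3), win_probability(paths, q=0.00002, days_to_election=12))
```

Because each backward draw conditions on polls from both sides of a given day, the averaged line no longer jumps only when a new poll arrives. The same sampled paths then feed the Election Day simulation.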

In plainer English, that means the Kalman Filter process brings a lot of important "extra stuff" to our charts. There is a catch, though, especially for those of us who have grown accustomed, for all the right reasons, to the even smoother lines that Professor Charles Franklin developed for Pollster.com. Even after FFBS, Kalman Filter output still looks more jagged than our standard chart. Here's an example using the Nevada Senate race:

[Chart: Nevada Senate race, raw Kalman Filter trend lines]

After pondering this issue, we decided to add an extra step: The standard charts now running on HuffPost Pollster take the slightly more jagged Kalman Filter trend lines and run them through the same loess regression "smoother" we have been using for the last three years. The net result provides what we consider the best of both worlds: the added properties of Kalman filtering, in the form of smoother trend lines that, when polls are ample (as they currently are in the most competitive Senate and Governor races), are virtually identical to what we've been doing all along. Here is the end result, again using the Nevada Senate race, showing our standard trend estimate line. It should look very familiar:

[Chart: Nevada Senate race, standard trend estimate line]
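The extra smoothing step can be sketched with a textbook loess (locally weighted linear regression with tricube weights, without the robustness iterations some implementations add) applied to a daily series such as Kalman Filter output. The `span` value, the function name, and the sample series are assumptions. The article does not say what settings Pollster's smoother uses.

```python
import math


def loess(xs, ys, span=0.5):
    """Locally weighted linear regression with tricube weights.

    For each x0, fit a weighted straight line to the neighbors of x0
    (roughly the nearest span * len(xs) points) and keep the line's value
    at x0. No robustness iterations.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))  # neighborhood size
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # neighborhood radius
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]  # tricube weights
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical daily Kalman Filter estimates (jagged), smoothed a second time.
days = list(range(10))
kf = [0.50, 0.52, 0.49, 0.51, 0.53, 0.50, 0.52, 0.54, 0.51, 0.53]
print([round(v, 3) for v in loess(days, kf, span=0.6)])
```

A smaller span fits each point from fewer neighbors and tracks the input more closely, while a larger span gives a smoother line. Either way, the last fitted value plays the role of the trend estimate.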

If you're a data geek and want to see it, we have tucked the raw Kalman Filter output just a few clicks away. Use the Smoothing tool in the chart and select the "More Sensitive" option, as I did to generate the first example above.

When polls are sparse, the lines will look a bit different than what we have been producing, but in a good way. The lines we generate will be more reliable and should better represent the underlying data while also bringing the additional statistical properties described above.

That said, there are two minor issues that regular Pollster readers should be aware of, which we will be working on after the election. First, we were not able to make the Kalman Filtering routine efficient enough to run in your browser, which is what drives the Filter tool. But don't worry: You can still filter out any pollster. The chart will simply fall back on the same loess process we have used previously, and the filtered results will be roughly comparable, especially when polls are ample.

Second, because the underlying model starts with the covariation between all races, you will see very small changes in the trend estimates for every race (usually no more than a tenth of a percent or two) whenever we add a new poll to any race.

So far, I am just describing the process that draws the trend lines. Next, I'll describe how we use the Kalman Filter model to generate the race classifications and probabilities.

*Many thanks to our friends at Public Opinion Quarterly and Oxford Journals for providing a free link to the Green, Gerber and De Boef article.


 
 
Comments
dcswampfox
12:09 PM on 11/01/2010
And congrats on giving ourselves only two bad choices (DEMS and REPUBS) to pick from.
dcswampfox
12:07 PM on 11/01/2010
It seems crazy to me that people expect the economy to be fixed in 2 years given that policies aimed toward big business had been implemented for 8 years of W. Bush. It seems asinine to me that folks take the position that 2 years is all the Democrats should have to turn things around. Even more asinine is people's position to return to office/power the very people (Repubs) that got us into trouble in the first place. If we are dumb enough to operate on this logic, then we do deserve to put back into power the very people who make policies that help big business and hurt the average Joe. We get the government and leaders we deserve. Congrats!
01:51 PM on 11/02/2010
Perhaps if Americans sensed the Obama administration was working on anything else but Healthcare, perhaps they'd be more willing to give the guy a break. But it's hard to be patient when you can't buy a quart of milk for your family.

I'm not saying Obama needed to bring unemployment down to 3%. I think Americans needed to see a downward trend. Obama promised 8% unemployment if we passed the stimulus, and unemployment remains stubbornly near 10%.

That's what's killing this administration: the perception that this administration ignored JOBS JOBS JOBS.
Chris1962
NYC
06:14 PM on 11/02/2010
The "socialist" perception of Obama and his policies is what's killing this election. And he did it to himself with his dictatorial, Castro-style HCR "mandate" and nationalizing everything that wasn't nailed down.
03:25 PM on 10/25/2010
"De Boef" not "De Beoff"
08:42 PM on 10/24/2010
Never underestimate the American People.

On November 2nd, 2010, We, The People, take America back!

http://www.youtube.com/watch?v=x_Os0cwpCQE&feature=player_embedded
bcgd
12:25 PM on 10/25/2010
Wow, sorry, but it's obvious you don't get it and will vote against your best interest again and again.
I will shoot down 60, 61, 100 and the anti-choice 63 AGAIN. I want taxes to save our schools, first responders and state. These areas are our foundation; it must be strong to grow the middle class, so I'm sorry, but borrow and spend or watch us rot. Lastly, Colorado has tried this three times since '04, if not 2000, I believe: to apply the title of living human to an embryo at the precise moment the seed enters the egg.
You're not going to turn Roe v. Wade over; that goes for entitlements and the new healthcare reforms.

We are not taking this country back to 1950. I know you want to, but you raised free-thinking, brilliant kids. We're not scared of black, brown, white, male, female, LGBT, immigrant, commie, socialist, right, left, anger, fear, terrorist AKA war. We understand screaming and fighting will not work, and there just happen to be 7 billion humans sharing a planet with each other and the rest of the organisms. We think it is our way of life that is detrimental to these other plants and animals. Thus our responsibility is to keep these other species around for your grandchildren's grandchildren.
So hate to break it to ya but we will be taking the country FORWARD.
06:21 PM on 10/26/2010
Faved.
12:33 PM on 10/27/2010
Who's this "we" you're talking 'bout? Do you have a mouse in your pocket? Misrepresenting others' positions doesn't make you right. Pontificating on self-righteous platitudes doesn't put people back to work or unify a Country. You claim not to be afraid. Well then, don't be afraid of the future. Don't be afraid of the direction the Country is moving in after this election. If you understand screaming and fighting won't work, try to convince your fellow liberals that it's time to be tolerant of All Americans, regardless of their political views. Taxes are not the answer, primarily because the government is the most inefficient way of spending taxpayers' money. Liberal government creates more non-productive citizens than does free enterprise and capitalism. Bigger and more invasive government drives our Country in the wrong direction and has cratered our economy.
oldgraymare
01:24 PM on 10/27/2010
take it back ... FROM WHOM?????????? From the corporate giants who have pumped millions and millions and millions of dollars into Republican candidates who will, in turn, pump millions and millions and millions of dollars back into these companies/corporations by way of favorable legislation? Do you REALLY think these folks are looking out for YOUR interests?
dpearl
05:57 PM on 10/24/2010
I think the Kalman filtering combined with the Lowess smoothing is a sensible approach here, and I applaud the improvement. However, my experience with the Pollster charts is that this is tinkering around the margins with respect to election prediction issues. Much more important to me is how you handle the undecideds and other candidate responses. Different surveys have very different ways of handling the undecideds and the degree to which they force a choice. As for other candidates, many surveys ignore them while many include them. I think it is important to include a method to adjust for these crucial differences in your charts. I can think of some sophisticated ways to handle this, but a crude method that simply allows us to use the percent of the two-party vote as an option would help greatly for now.
01:28 AM on 10/23/2010
I'd love to see pollster incorporate shaded areas showing the 68% and 95% confidence regions. Seeing these get thinner or wider would be as informative as the trend line. It would also be great to show the individual poll points as points with error bars to show house effects.
05:15 PM on 10/22/2010
In these days of overbearing partisan filtering of news, less manipulating is preferable. The bumps and uneven results more closely reflect volatile times and key turning points. But I am too far removed from Statistics and Quantitative Math to debate the Kalman filter. Besides, the most accurate polls for the senate races are found at intrade.com, where money bets determine the odds.
03:57 PM on 10/22/2010
NASA has used Kalman filters for decades, and they have the benefit of detecting real shifts (versus noise). Would be nice to see BOTH charts (you could add a link to the KF version for us geeks).

Cheers, AJStrata
03:39 PM on 10/22/2010
I don't see how this works without incorporating House Effects, they make up something like 96% of the poll to poll variance in the generic ballot. The Kalman Filter's estimated variance is going to shoot up considerably if it's not taken into account. That's probably why the graph is so "jagged" and needs all of the ad-hoc adjustment.
02:19 AM on 10/22/2010
How is all of this different from an Exponentially Smoothed Moving Average? It seems to me that an ESMA is what is called for here, since what happened last January is having no effect whatsoever on what is now happening.

Took a statistics course in college, btw, and found it fascinating.
03:36 PM on 10/22/2010
Pretty different: the results would be really sensitive to whatever smoothing parameter is used. It also doesn't easily incorporate sampling error or the missing data issues.
LaFlow
12:48 AM on 10/22/2010
Mark this is great! I don't know much about Kalman filtering with polling data... but I wonder if in future iterations you can also incorporate an online bias estimation routine that could identify house effects in different polls.
12:16 AM on 10/22/2010
I can't believe that I read most of that article; it's fascinating, but I'm kind of a dummy.
11:43 AM on 10/22/2010
Me too. But by reading all these articles, it eventually starts to sink in, and it keeps your brain from rotting.