The Huffington Post’s Senate prediction model uses all of the polls in HuffPost Pollster’s database to forecast the Senate elections. It’s a three-step process: We average the polls, calculate a win probability for each race, and estimate the probability that each party will hold the Senate majority after the election.
Step 1: Poll averaging
We estimate the probability of a win for each individual Senate race by using Pollster’s Bayesian Kalman filter model to average publicly available polls in the HuffPost Pollster database. Briefly, Kalman filter models combine so-called “noisy” data ― which is not completely precise ― into a single estimate of the underlying “signal” ― that is, what’s actually happening. For HuffPost, that means the model looks for trends in the polls and produces its best estimate of the polling average.
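To make the idea concrete, here is a minimal one-dimensional Kalman filter that fuses noisy readings into an estimate of the underlying signal. This is an illustration of the general technique, not HuffPost’s actual model, and all the numbers (the true support level, the variances) are made up for the example.

```python
import random

def kalman_filter_1d(observations, obs_variance, process_variance,
                     init_mean, init_variance):
    """Minimal 1-D Kalman filter: fuse noisy readings into a signal estimate."""
    mean, var = init_mean, init_variance
    estimates = []
    for y in observations:
        # Predict: the underlying signal may drift between observations.
        var += process_variance
        # Update: weight the new observation by its relative precision.
        gain = var / (var + obs_variance)
        mean += gain * (y - mean)
        var *= (1 - gain)
        estimates.append(mean)
    return estimates

# Noisy "polls" scattered around a hypothetical true support level of 48 percent.
random.seed(1)
polls = [48 + random.gauss(0, 3) for _ in range(20)]
smoothed = kalman_filter_1d(polls, obs_variance=9.0, process_variance=0.1,
                            init_mean=50.0, init_variance=25.0)
print(round(smoothed[-1], 1))
```

Even though each individual “poll” misses the true value by several points, the filtered estimate settles close to 48, which is the sense in which the filter separates signal from noise.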
That model runs 100,000 simulations of the data to find the most likely polling average. The simulations need starting values and assumptions about how the data behave ― these are the model’s “priors.” Many Bayesian models ― including the Pollster averaging model as it’s implemented for our charts ― use “uninformative” priors, which supply no background information and don’t pull the estimates in any direction.
However, in the Senate model we do use information from previous elections in these priors. The model predicts vote share proportions for each candidate, so we need information on how past elections have turned out. We use the Cook Political Report’s race ratings for current and past elections to create our priors.
The values for our 2016 priors are based on an analysis of Cook Senate race ratings issued in July or August of the election years 2004 through 2014. We pooled all Senate races rated “toss-up” from 2004 to 2014 and calculated the average and standard deviation of the actual vote proportions for each candidate. Then we did the same calculations for races rated “solid Democrat,” “solid Republican,” “likely Democrat,” “likely Republican,” “lean Democrat” and “lean Republican” ― all of the different Cook ratings.
The averages for vote shares in past elections mostly hover around 48 to 50 percent, but the standard deviation indicates how much those values vary, which is where differences are evident. The vote proportions don’t vary much in toss-up states since the two major-party candidates will have close to the same proportion ― say, 52 percent to 48 percent. The standard deviations are a bit larger for contests rated “lean” and “likely,” because there is a bigger difference in vote share between the candidates (e.g., the winner gets 57 percent of the vote, compared to the other candidate’s 43 percent). And the “solid” states, where winners often receive 60 percent or more of the vote, have the largest standard deviations.
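The prior calculation described above amounts to grouping past winners’ vote shares by Cook rating and taking a mean and standard deviation per group. The sketch below shows that computation on invented vote shares ― the numbers are illustrative only, not the actual 2004–2014 data.

```python
from statistics import mean, stdev

# Hypothetical winner vote shares from past races, grouped by Cook rating.
# (Illustrative numbers only -- not the actual 2004-2014 results.)
past_races = {
    "toss-up": [51.2, 49.8, 50.5, 48.9, 52.0],
    "lean":    [53.5, 55.1, 56.8, 52.4, 57.3],
    "likely":  [57.9, 54.2, 61.1, 58.4, 53.0],
    "solid":   [62.5, 60.3, 67.8, 58.9, 71.2],
}

# Each rating's prior is the average and spread of those historical outcomes.
priors = {rating: (round(mean(shares), 1), round(stdev(shares), 1))
          for rating, shares in past_races.items()}

for rating, (mu, sigma) in priors.items():
    print(f"{rating:8s} mean={mu:5.1f}  sd={sigma:4.1f}")
```

With numbers shaped like real elections, the standard deviations come out smallest for toss-ups and largest for “solid” races, mirroring the pattern described above.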
These priors start the simulations, and then polling data is incorporated to make the estimates more precise. The priors typically become inconsequential once the polling data is added, but the information is helpful when polls are scarce. The model begins running simulations to calculate estimates for each candidate on the date of the earliest poll. It incorporates the polls available for each subsequent day, pulling in additional surveys as it continues toward the current date ― at which point all of the polls meeting HuffPost’s criteria are being considered. Newer polls are more influential in a given day’s average than older polls, because older polls are inherently less reliable, more uncertain measures of the current state of the race.
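In the actual model, the down-weighting of older polls falls out of the filtering itself, but the intuition can be sketched as a weighted average where a poll’s influence decays with its age. The half-life here is an invented parameter for illustration, not a setting of HuffPost’s model.

```python
# Weighted average where a poll's influence decays with its age in days.
# The half-life value is illustrative, not a parameter of the actual model.
polls = [(50.2, 1), (48.7, 5), (51.5, 12), (47.9, 30)]  # (support %, days old)

HALF_LIFE = 14.0  # a poll 14 days old counts half as much as one from today

weights = [0.5 ** (age / HALF_LIFE) for _, age in polls]
average = sum(w * value for (value, _), w in zip(polls, weights)) / sum(weights)
print(round(average, 2))
```

The day-old poll carries roughly four times the weight of the month-old one, so the average leans toward the most recent readings.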
Once the simulations have run, the model produces a percentage estimate of support for each candidate on each date, in addition to estimates of undecided voter proportions and the margin between the candidates. The probability that the leading candidate actually leads, which we use to calculate the win probabilities, is determined based on the margin between the candidates.
When there isn’t enough (or any) polling data available, the Cook ratings provide our estimates of where the race stands. If the Senate race is rated as a “toss-up,” we give it a 50 percent probability of going in either direction. Since 2004, races that Cook rated as “lean” one direction or the other have gone in that direction roughly 81 percent of the time, so “lean” races get an 81 percent probability. “Likely” races have ended up correct about 93 percent of the time since 2004, so those are assigned 93 percent probability if there’s no polling available. And “solid” races nearly always end up correct, so those are assigned 99 percent probabilities of going in the anticipated direction.
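The fallback rules above reduce to a simple lookup from a race’s Cook rating to a win probability for the favored candidate. A sketch, using the historical accuracy rates given above:

```python
# Historical accuracy of Cook ratings (2004 onward), used as fallback win
# probabilities for the favored candidate when a race has little or no polling.
FALLBACK_WIN_PROB = {
    "toss-up": 0.50,
    "lean":    0.81,
    "likely":  0.93,
    "solid":   0.99,
}

def fallback_probability(rating):
    """Win probability for the favored candidate when no polls exist."""
    return FALLBACK_WIN_PROB[rating]

print(fallback_probability("lean"))  # → 0.81
```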
Step 2: Calculating the win probability for each race
We take two steps beyond the model run to calculate the probability that the polling leader will win in the fall: extending the model’s simulations out to November, and adjusting for undecided voters.
The extension to November is a departure from how the model is implemented for the Pollster chart averages, which stop on the current date. Obviously we don’t have polling data from the future, so the model assumes that the race generally continues along its current trajectory. The lack of new data means that the outcomes of the races get less certain as time goes on, so the probability of a leading candidate winning drops slightly over the gap between today and the election. This accounts for uncertainty: We don’t know what will happen between any given day and the election, so we can’t be as certain of a win on a future Election Day as we are of a lead today.
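One simple way to see why a lead implies less certainty the further out you project is to treat the race margin as a random walk whose variance grows with the forecast horizon. This is an illustrative assumption, not HuffPost’s actual projection method, and the drift parameter is invented.

```python
import math

def lead_probability(margin, margin_sd_today, days_ahead, drift_sd_per_day=0.3):
    """Probability the current leader still leads after `days_ahead` days,
    if the race drifts like a random walk (illustrative assumption)."""
    variance = margin_sd_today ** 2 + days_ahead * drift_sd_per_day ** 2
    # Normal CDF at zero for a lead of `margin` with the projected spread.
    return 0.5 * (1 + math.erf(margin / math.sqrt(2 * variance)))

# A 4-point lead is a safer bet today than 60 days from now.
today = lead_probability(margin=4.0, margin_sd_today=3.0, days_ahead=0)
election_day = lead_probability(margin=4.0, margin_sd_today=3.0, days_ahead=60)
print(round(today, 3), round(election_day, 3))
```

The same lead yields a lower win probability at the longer horizon, which is exactly the drop-off described above.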
The model also estimates what proportion of voters are undecided in the polls as of today. We divide that undecided proportion by the margin between the top two candidates, then subtract the result from the win probability. The smaller the margin between candidates, the more the undecided voters matter. For example, if the average undecided proportion is 7.9 percent but candidate A is ahead of candidate B by 20 percentage points, those undecided voters would not change the outcome of the race even if they all ultimately voted for candidate B.
In extreme cases, where margins are exceptionally small, we can generate some problematic results with our calculation. Imagine 7.9 percent undecided and 0.5 percent margin. That gives us: 7.9/0.5 = 15.8. If the margin is already that small, the unadjusted win probability could easily be less than 65 percent; subtracting 15.8 percent would cause the probability to dip below 50 and in effect “flip” the race in favor of the other candidate. We don’t want that to happen since the polling indicates which candidate leads, so we cap the undecided adjustment at a maximum of 10 percentage points and don’t allow the probability of the favored candidate winning to fall below 50 percent.
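Putting the adjustment, the 10-point cap and the 50 percent floor together, the rule can be sketched as follows (the function name and the example win probabilities are ours, for illustration; the 7.9 percent undecided figure and the margins are the ones discussed above):

```python
def adjust_for_undecideds(win_prob, undecided_pct, margin_pct,
                          max_penalty=10.0, floor=50.0):
    """Subtract an undecided-voter penalty from the leader's win probability.

    The penalty is the ratio of undecideds to the leader's margin (in
    percentage points), capped at 10 points, and the adjusted probability
    never drops below 50, so the polling leader stays favored.
    """
    penalty = min(undecided_pct / margin_pct, max_penalty)
    return max(win_prob - penalty, floor)

# Wide margin: 7.9 points undecided barely matters against a 20-point lead
# (penalty of 7.9/20, roughly 0.4 points).
print(adjust_for_undecideds(90.0, 7.9, 20.0))

# Razor-thin margin: the raw penalty 7.9/0.5 = 15.8 is capped at 10 points,
# and the result is floored at 50 so the race never "flips."
print(adjust_for_undecideds(58.0, 7.9, 0.5))
```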
Adjusting the Nov. 8 probability of lead for these undecided voters yields our estimate of each candidate’s final probability of winning on Election Day.
Step 3: Estimating the probability that each party will have the Senate majority
We use Monte Carlo simulations to calculate the probability of Democrats winning 51 or more seats in the Senate. The computer picks a random number for each race, then compares that number to the probability of the Democrat winning in that state. If the number is at or below that probability, the “spin” counts as a Democratic win; otherwise it’s a Republican win. For example, if the Democratic candidate in a given race has a 35 percent chance of winning according to the model, a random draw from 1 to 100 that lands at 35 or below counts as a Democratic win, while anything from 36 to 100 counts as a Republican win.
We repeat this process for every race and count the number of Democrat-won seats. If it’s 51 or more once we add in the 36 Senate seats held by Democrats not up for re-election, the simulation counts as a Democratic majority. That whole process repeats millions of times. The probability of a Democratic takeover of the Senate is the proportion of times that the Democrats win 51 or more seats in these simulations. The probability of a tie is the proportion of times the seat count landed at 50-50.
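The simulation loop can be sketched in a few lines. The per-race win probabilities below are invented for illustration ― only the 36 safe Democratic seats and the 51-seat majority threshold come from the description above.

```python
import random

def simulate_majority(dem_win_probs, safe_dem_seats=36, n_sims=100_000, seed=42):
    """Monte Carlo estimate of P(Democratic majority) and P(50-50 tie)."""
    rng = random.Random(seed)
    majorities = ties = 0
    for _ in range(n_sims):
        # One "spin" per race: a draw at or below the win probability is a D win.
        seats = safe_dem_seats + sum(rng.random() <= p for p in dem_win_probs)
        if seats >= 51:
            majorities += 1
        elif seats == 50:
            ties += 1
    return majorities / n_sims, ties / n_sims

# Hypothetical win probabilities for 34 contested races (not model output).
race_probs = ([0.92, 0.81, 0.65, 0.55, 0.50, 0.48, 0.35, 0.22]
              + [0.97] * 10 + [0.05] * 16)
p_majority, p_tie = simulate_majority(race_probs)
print(round(p_majority, 3), round(p_tie, 3))
```

Each pass through the loop is one simulated election night; the two returned proportions are the estimated probabilities of a Democratic majority and of a 50-50 split.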
The current model results can be viewed here.