In the eight weeks since the elections, I have seen a fair number of self-congratulatory press releases from pollsters boasting of their successes during 2008, but none had quite the audacity of the release put out last week by Investor's Business Daily about the polls they conducted along with the TechnoMetrica Institute of Policy & Politics. "IBD/TIPP Takes Top Honors Again," read the headline, continuing in bold print below:
Now that the '08 tally is official, we note that for the second election in a row, the IBD/TIPP Poll not only came closest to the final margin, but was right on the money -- tantamount to hitting a bullet with a bullet.
My thoughts on these snap judgments of poll "accuracy" over the last two months have had a common theme: We are placing far too much emphasis on "the last poll." Nothing supports that argument as much as the claims in this IBD release. And the story is worth telling in detail, so pull up a chair and let me begin at the beginning.
IBD/TIPP released results of daily tracking from October 13 through the day before the election, Monday, November 3. Their first release was based on interviews conducted over seven days (October 6-12). Their field period narrowed to five days as of the October 16 release and to four days on their second-to-last release on November 2.
As the chart shows below, the IBD/TIPP poll showed a persistent "house effect" favoring McCain. Twenty-two (22) of 23 releases before 11/4 produced an Obama margin that was below our trend estimate. It produced an Obama margin that averaged roughly four percentage points over the course of October, compared to six to eight points for our trend estimate over the same period. A roughly 2-3 point difference on the margin is not large in absolute terms, but it does put the margin reported by the IBD polls at the lower end of the range of those reported by other national polls during October 2008. Other polls produced leads for Obama on the opposite end of that range, but the average margin of roughly seven points during most of October matched the final result almost perfectly.
Without undecideds allocated, the final IBD/TIPP poll showed Obama leading by 5.1 points (47.5% to 42.4%), a result only slightly better for Obama than their average result over the previous four weeks. However, their final projection on Monday afternoon allocated to Obama two-thirds of the remaining 6% who were still undecided. That allocation produced a projection that was remarkably close to what the poll averages showed earlier that day. The final IBD/TIPP projection had Obama ahead by 7.2 percentage points (51.5% to 44.3%), exactly the same as our trend estimate as of Monday morning (Obama +7.2 points; 51.5% to 44.3%) and just one tenth of a percentage point different from the RealClearPolitics average that day (Obama +7.3 points; 51.6% to 44.3%).
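To make the arithmetic concrete, here is a rough sketch of what a flat two-thirds split of the undecided share does to the topline numbers. It is my own illustration, not TIPP's procedure (they describe allocating respondent by respondent), and the function name allocate_undecided is made up for the purpose.

```python
# Illustrative arithmetic only: a flat proportional split of the undecided
# share. This is an assumption for illustration, NOT TIPP's actual
# respondent-level allocation.

def allocate_undecided(obama, mccain, undecided, obama_share):
    """Split the undecided share between the two candidates.

    All inputs are percentages of the full sample; obama_share is the
    fraction of the undecided assigned to Obama.
    """
    obama_final = obama + undecided * obama_share
    mccain_final = mccain + undecided * (1 - obama_share)
    return round(obama_final, 1), round(mccain_final, 1)

# Pre-allocation figures from the final IBD/TIPP release, two-thirds to Obama.
two_to_one = allocate_undecided(47.5, 42.4, 6.0, 2 / 3)
print(two_to_one, round(two_to_one[0] - two_to_one[1], 1))

# The same figures with the undecided split evenly.
even = allocate_undecided(47.5, 42.4, 6.0, 0.5)
print(even, round(even[0] - even[1], 1))
```

The flat two-to-one split lands on 51.5% to 44.4%, within a tenth of a point of the published projection (the small gap presumably reflects rounding in the reported inputs). An even split would have left the margin where it started, at 5.1 points. In other words, roughly two points of the final 7.2-point margin come from the allocation itself.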
Obama's final margin -- as of this writing and based on final results in each state -- turns out to be 7.25 percentage points (52.9% to 45.7%), within a whisper of the final IBD/TIPP projection and the result that all other polls, collectively, had been showing for weeks (at the same time, the actual 1.2% support for candidates other than Obama and McCain fell far short of IBD/TIPP's projection of 4.1% -- something not accounted for in their boast of "top honors").
However, the "house effect" that made earlier IBD/TIPP results more favorable to McCain was not lost on those producing the IBD/TIPP analysis. Here is the first line of their first poll release:
In contrast to other polls, which show Obama leading McCain by 4 points (Reuters/C-SPAN/Zogby) to 11 (Newsweek), the IBD/TIPP Tracking Poll debuts today with Obama up just 2 points with 13% (including 25% of independents) undecided.
They came back to this theme repeatedly. On October 22:
Contrary to other polls, some of which show Obama ahead by double digits, the IBD/TIPP Poll shows a sudden tightening of Obama's lead to 3.7 from 6.0.
And again on October 29, the IBD/TIPP analysis pointed out that other polls were "migrating" (briefly, as it turned out) to their closer than average result:
The race tightened again to 3 points Wednesday, a margin IBD/TIPP has shown for six days and to which other polls appear to be migrating. For example, the Rasmussen and Gallup polls, each of which had Obama up 5 points two days ago, now have him at 3.
The relative closeness of the IBD/TIPP polls was also repeatedly noted by supporters of John McCain (all quotes via Nexis transcripts):
Not surprisingly, many conservative bloggers relied on the IBD/TIPP polls to support the same conclusion, namely that the race was closer than other polls made it appear. Ironically, to strengthen that argument, many pointed to IBD/TIPP's boast of being the "most accurate" pollster in 2004, a message that appeared prominently alongside each day's results. "An analysis of Final Certified Results for the 2004 election," they wrote, "showed IBD's polling partner, TIPP, was the most accurate pollster of the campaign season."
TIPP's final projection in 2004, showing a 2.1 percentage point lead, did come closest among the final national polls to the ultimate 2.5 percent margin, although it is worth noting that IBD was one of the few to report results to one decimal place. Four other pollsters showed Bush leading by margins of 2 or 3 percentage points on their final 2004 poll. Since they rounded off their results to the nearest whole number, we cannot say for certain who would have been "closest," although reporting results out to one decimal is largely meaningless for individual polls given margins of error of 3 percentage points or more.
The same can be said this year: Seven pollsters other than IBD/TIPP reported final results or projections that rounded to Obama leads of 7 or 8 percent. Again, IBD/TIPP can claim to be "closest" to the final margin mostly because they chose to report results to one (largely meaningless) decimal place. Other pollsters may have been just as close. Moreover, remember random sampling error. How close a pollster comes to the illusion of "pinpoint accuracy" on any given survey (or even on any two surveys) is still largely a matter of chance.
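A quick back-of-the-envelope sketch shows why the extra decimal carries so little information. The sample size of 1,000 and the 52-to-45 result below are hypothetical, chosen only for illustration, and the margin-of-error formula assumes simple random sampling, which no real telephone poll achieves exactly.

```python
import math

def moe_points(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for a single candidate's
    share, assuming simple random sampling (an idealization)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

def unrounded_margin_range(lead, trail):
    """Range of unrounded margins consistent with two shares that were each
    rounded to the nearest whole number (each can be off by up to 0.5)."""
    reported = lead - trail
    return reported - 1.0, reported + 1.0

# Hypothetical poll of 1,000 respondents.
print(round(moe_points(1000), 1))

# Hypothetical whole-number result of 52% to 45%.
print(unrounded_margin_range(52, 45))
```

With 1,000 interviews, the margin of error on each candidate's share is about 3.1 points, so a difference of a tenth of a point between two pollsters is far inside the noise. And a pollster reporting a whole-number 52 to 45 could have had an unrounded margin anywhere from 6 to 8 points, a range that includes the actual result.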
But the issue raised here that many will wonder about is the allocation of two-thirds of the undecided voters to Obama, especially given the analysis I linked to yesterday by Mickey Blum about the so-called "convergence mystery." She noted that allocations that moved estimates closer to the poll averages helped explain the sharply decreased variance among the final polls that David Moore identified (see also my commentary). So what do we make of the fact that the IBD/TIPP allocation produced a result that converged almost perfectly with the poll averages?
First, in their defense, IBD/TIPP is certainly not the only pollster to wait to allocate undecided voters until their final release. Gallup has followed that practice for decades, and five other national survey organizations did the same this year.
Second, Raghavan Mayur, president of TechnoMetrica, explained in a comment posted here yesterday that they used a "Hierarchical Heuristic algorithm" in both 2004 and 2008 that had been developed using their data for the decided. Such algorithms create mutually exclusive clusters of voters based on combinations of variables that, in this case, maximize the difference across those subgroups in terms of support for Obama or McCain. They divide up the decided electorate, creating some subgroups that are very likely to support Obama, some that are very likely to support McCain and others that fall somewhere in between based on what Mayur describes as "dominant demographics" that predict candidate support.
In response to my email query last week, Mayur provided more details on the workings of their algorithm. In this case they created groups of clusters based on combinations of seven variables (race, party identification, age, income, self-reported ideology, 2004 vote and religion). Once they had used those variables to create segmentation based on the decided, they developed allocation rules based on the demographic segments (and, presumably, on the level of support for Obama and McCain in each segment) that they used to allocate each undecided respondent.
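For readers who want a feel for the general idea, here is a toy sketch of segment-based allocation. To be clear about what is assumed: the respondents are made up, I use only two variables rather than seven, the segments are simple cross-tabulation cells rather than clusters found by an algorithm, and the "assign to whoever leads the segment" rule is my own stand-in. TIPP's actual rules are confidential, and nothing here should be read as their method.

```python
from collections import defaultdict

# Made-up decided respondents: (party, ideology, choice). Not real survey data.
decided = [
    ("Dem", "liberal", "Obama"), ("Dem", "liberal", "Obama"),
    ("Dem", "moderate", "Obama"), ("Dem", "moderate", "Obama"),
    ("Dem", "moderate", "McCain"),
    ("Rep", "conservative", "McCain"), ("Rep", "conservative", "McCain"),
    ("Ind", "moderate", "Obama"), ("Ind", "moderate", "McCain"),
]

# Made-up undecided respondents: (party, ideology).
undecided = [
    ("Dem", "moderate"), ("Ind", "moderate"),
    ("Rep", "conservative"), ("Rep", "moderate"),
]

def segment_support(rows):
    """Obama's share of the decided respondents within each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [obama_count, total]
    for party, ideology, choice in rows:
        segment = (party, ideology)
        counts[segment][0] += choice == "Obama"
        counts[segment][1] += 1
    return {seg: obama / total for seg, (obama, total) in counts.items()}

def allocate(rows, support, threshold=0.5):
    """Assign each undecided respondent to the candidate leading their
    segment. Respondents in tied or unseen segments stay undecided
    (an assumed rule, chosen for illustration)."""
    result = []
    for segment in rows:
        share = support.get(segment)
        if share is None or share == threshold:
            result.append("Undecided")
        elif share > threshold:
            result.append("Obama")
        else:
            result.append("McCain")
    return result

support = segment_support(decided)
print(allocate(undecided, support))
```

Even in a toy like this, the analyst has to decide what happens to respondents whose segment is evenly split or has no decided voters at all, and where to set the threshold. Each of those choices moves the final number.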
So Mayur's explanation is essentially that their allocation was based not on a gut hunch or an arbitrary "rule" based on past experience but rather on a statistical model driven by individual-level data. He also sticks doggedly to the initial IBD line that other pollsters had it wrong:
Other commentators questioned why we often had the Obama-McCain race tighter than other polls. The fact is, the race was tight right down to the last week, when undecided voters swung to Obama. The final 7.2-point spread was the widest we showed since our polling started Oct. 13.
Unfortunately, I find no evidence anywhere else to support the contention that two-thirds of undecided voters "broke" to the Democrat in either 2004 or 2008. Not in the exit polls. Not in comparisons between other polls and the final results. Only the IBD/TIPP poll achieved "awesome results" (as Mayur put it in an email), by assuming a 2-to-1 break to the Democrats in 2008 and nearly as much in 2004. More specifically:
There is also some confusion about the number of variables used to create the algorithm that allocated the undecideds. When the Wall Street Journal's Carl Bialik asked Mayur about it shortly after the election, Bialik reported that TIPP's allocation "method uses 10 variables, such as the respondent's party, gender and 2004 vote, to predict the 2008 vote." When I asked Mayur to specify the 10 variables, he listed only seven (race, party, age, income, ideology, 2004 vote and religion), explaining that "we also look at gender but do not use it in the model." However, the document he attached to the same email has one listing of classification rules that does not include the age variable, and a listing of computer code that does (Mayur asked that I treat the additional details contained within that document as "strictly 'confidential'" so that I would not "give away our 'bread and butter' to other firms").
The point here is that the underlying model used to allocate undecided voters still leaves a lot of room for subjective judgments. How many variables are used to create the segmentation? Which subgroups from that segmentation are used to allocate the undecided, and which are not? What are the criteria that determine whether a given subgroup gets allocated to Obama or McCain?
Writing here yesterday, Mayur offered a "moral" of their allocation story:
1. Close to ten percent of the electorate makes decision in the final weekend (Saturday, Sunday, and Monday).
2. They don't break even. Democrats have a clear advantage among them, at least in the past 3 races.
Respectfully, those are the wrong lessons. Here is what we have learned:
Finally, let us be clear that we are in no position to discern an inappropriate motive on the part of TIPP or Investor's Business Daily. The more important issue here is what we make of the results they produced.
PS: I asked Raghavan Mayur for his comment on my argument that we put too much emphasis on final polls generally, and on the final IBD/TIPP polls specifically, in measuring survey accuracy. His full response appears after the jump.
It is an interesting proposition, but I am sanguine for the following reasons:
1) It only makes sense that the main emphasis will be on the results of the last poll--it is the latest, freshest, best predictor of what voters will actually do on election day. The pollster who comes closest to the final result is the most accurate. If not, what then will be your datum for comparison to rate pollster accuracy? The popular vote is the best judge. These days, unfortunately many folks including scholars at universities don't have patience and jump the gun -- they can't even wait till all the votes are counted to understand poll accuracy. In 2008, some did a comparison on the morning after the Election Day. I got several emails how come we were off. I advised many to have patience and wait till all votes are counted to do the comparison.
2) I do not believe there is any contradiction in our results, as you claim. First, the landscape changes from day to day in an election cycle--especially in such an historic one as this with so many factors coming into play and influencing voters--right up until the last day. That is especially true for the relatively large segment of undecideds, who by definition did not make up their minds until the end. Our results showed they broke 2:1 for Obama. We do not know how they would have broken earlier in the cycle. Had we allocated throughout the election cycle, our margins would have reflected whatever the results were for that snapshot in time.
Second, the margin or difference becomes more precise when undecided voters are allocated. In 2004, our allocation much to dissatisfaction of some partisans, gave more for Kerry than Bush (2 to 1). It got our poll results better, narrowing the margin from 3.3-points to 2.1-points. And it's similar in 2008; it expanded by a few points. And you can do the improvement only based on the last few days. The fresher the data, the better the accuracy.
3) On the point of other pollsters...
At least for myself, I can make best judgments only on the data we have here. I never saw a double digit race others showed. And I am not sure why others don't carry the 6% to 8% undecided (even higher earlier in the cycle) till the end, when in reality there exists such a share. And that may have something to do with the double digit leads.
Our models are different. They are good for some things and not as good for others. But they perform great for the "margin." For instance, I had a 4% for "other" candidates. I don't argue with the data. I went with it (for all we knew it could have been a manifestation of a reverse Bradley effect). We turned out to be a bit high on that.
I also think it's important to note that we weight by party - a practice not used by many others, but we feel it's an important and valid predictive factor, that probably accounts at least in part for the tighter race we showed all along - typically from what we have seen in some internals Republicans are under-represented. Therefore, if party weighting is not used, there's an inherent over-representation of Democrats.
Also, we recognize that different organizations have different models and different approaches. We don't interpret any underlying motives for their results. For instance, the reputable Pew poll the week before the election showed Obama up 15-points, but dropped to 6-points in the final poll.
The venerable Gallup poll showed widely different results from their 2 methodologies just the week before the election--and reported only one result because of their observation that the 2 "converged" in the final run-up to election day (+5 and +10 versus +11 for the final prediction). The CBS poll consistently showed margins over 10. Also Newsweek showed double digits all along.
So the moral here is no two polls are alike and as professionals we don't ascribe motives or agendas to the other polling organizations' results. And we hope that others would afford us the same courtesy.
Follow Mark Blumenthal on Twitter: www.twitter.com/MysteryPollster