The Big Rock Candy Mountain of Data

01/16/2009 02:36 pm ET | Updated May 25, 2011
  • Mark Blumenthal Mark Blumenthal is the Head of Election Polling at SurveyMonkey.

Yesterday, I attended a press briefing on the 2008 elections by four organizations affiliated with the Democratic party. More important, they drew back the curtain a bit on what Catalist CEO Laura Quinn described as the "brand new, big rock candy mountain of data" that these organizations collected during the just-concluded campaign.

My colleague Marc Ambinder blogged yesterday on the briefing by these "best supporting actors" of the Democratic campaign last year and how the Democrats "have clearly caught up" to the Republicans in terms of their "back end" work "segregating data, segregating demographic groups and providing statistically valid data to election planner." What is less well understood by those of us who pore over public opinion and vote data is that this technical transformation is creating an enormous pool of data and facilitating some very advanced analysis so far not available to the rest of us.

Both political parties and their consultants have always invested considerable time and money in collecting and analyzing the "metrics" of politics (vote returns, survey data and databases of registered voters). So the data available to insiders for analysis has always been richer than what is available in the public domain. Less apparent to casual political observers is that the data collection and analysis going on behind the scenes is getting far more advanced than ever.

Yesterday's briefing gave a hint of what they are doing and what they learned last year. I continue after the jump, with the highlights.

Tom Bonier, director of targeting at the National Committee for an Effective Congress (NCEC), presented a county-level analysis of the 2008 and 2004 vote (not unlike those we have seen recently from various political scientists, here, here, here and here for example). He billed it as "simple analysis" meant to "lay the ground work" for the presentations by others.

Two slides Bonier presented helped tell the story of the 2008 elections in terms of both turnout and the "swing" in Democratic support from 2004 to 2008. Bonier analyzed county-level data on turnout (as a percentage of the voting age population) and Democratic performance (the percentage of the vote received by Kerry or Obama). The slide below shows something others have demonstrated with maps: Turnout was three to four percentage points higher in counties that typically vote Democratic (70% or better), but 4.2% lower in counties that typically vote heavily Republican (80% or better).


When Bonier looked only at mostly white counties (90% or better), he found a 1.2% decline in turnout compared to 2004, implying that the 0.8% overall increase in turnout occurred mostly in counties with significant minority populations. He then took the analysis a step further, breaking those mostly white counties out by median income. As the slide below shows, the most significant declines in turnout and support for Obama as compared to Kerry occurred in heavily (90%+) white counties with median annual household incomes of less than $25,000.


See the work of Ansolabehere and Stewart and of Gelman for very similar conclusions based on county-level vote data.

Bonier also described his analysis as "ongoing" because by early spring, NCEC will have collected vote return data for every precinct in the country. At that level of data, NCEC will be able to replicate this sort of analysis with far more precision in terms of matching geography and demography.

Jill Hanauer, the president of Project New West, presented data on the Democratic successes in 2008 in Western states. Their pollster Andrew Myers shared one particularly powerful slide based on over 65,000 interviews his firm conducted for various clients between 2004 and 2008. It showed a net swing of 16 percentage points in party identification among college-educated voters in Colorado, from a 10-point net advantage for the Republicans in 2004 to a 6-point Democratic advantage in 2008.
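For readers who want to check the arithmetic behind a "net swing," here is a minimal sketch. It assumes the swing is simply the change in the Democratic-minus-Republican margin between the two years. The function name and the sign convention (positive means a Democratic advantage) are illustrative choices, not anything shown in the briefing.

```python
def net_swing(margin_before: float, margin_after: float) -> float:
    """Change in the Democratic-minus-Republican margin, in percentage points.

    Margins are signed: positive means a Democratic advantage,
    negative means a Republican advantage (an assumed convention).
    """
    return margin_after - margin_before


# Figures from the Colorado slide: Republicans +10 in 2004, Democrats +6 in 2008.
margin_2004 = -10.0  # 10-point Republican advantage
margin_2008 = 6.0    # 6-point Democratic advantage

swing = net_swing(margin_2004, margin_2008)
print(f"Net swing toward Democrats: {swing:.0f} points")
```

Moving from 10 points behind to 6 points ahead covers 16 points of margin, which matches the figure on the slide.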


Of the four presentations, the most data-rich by far came from Erik Brauner, chief scientist at Catalist, the private company run by Harold Ickes and Laura Quinn. Catalist built a database with a record for almost every registered voter in the country and enhanced that list with commercial data and records of the personal contacts made by a list of subscribers that includes the Obama campaign and 90 progressive political organizations. I'll highlight three slides from Brauner's presentation.

First, when Catalist analyzed the data at the county level, they found significant correlations (shown in the chart below) between an increase in support for Obama (as compared to Kerry) and the number of personal contacts made by Obama and his allies (by phone, mail, door-to-door, email, etc.). They found the same pattern for stepped-up voter registration. Not surprisingly, more personal contacts correlated with higher percentages of new registrants.


Did these campaign activities cause higher support for Obama? To try to get at an answer, Brauner used a simple regression model and found that higher levels of personal contact, paid television advertising and new registration predicted higher support for Obama at the county level even after controlling for the most significant demographic variables (race, age, education, marriage, religious adherence and the presence of children in the household). We always need to be careful about assuming causation from correlations, but these results, as Brauner explained, show that personal contact by the Democratic campaign, voter registration activity and paid television advertising were "all acting together and explaining outcomes that are not explained simply by demographic factors" (and no, the slides do not include coefficient values, but the chart below does show the "relative influence" of each variable).
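To illustrate what "controlling for" a demographic variable means in a regression like this, here is a minimal sketch on synthetic data. Nothing in it comes from Catalist: the counties, the single demographic variable, the effect sizes and the helper `fit_ols` are all invented for illustration, and the actual model used several demographic controls whose coefficients were not shown.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [intercept, slope_1, slope_2, ...]."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


# Synthetic "counties" (hypothetical data, not Catalist's).
rng = np.random.default_rng(0)
n = 500
demographic = rng.uniform(0.0, 0.6, n)  # stand-in for one demographic share
# Campaigns contact some kinds of counties more, so contacts track demographics.
contacts = 2.0 * demographic + rng.normal(0.0, 0.3, n)
# Assumed truth: swing depends on both the demographic and the contacts.
swing = 3.0 * demographic + 1.5 * contacts + rng.normal(0.0, 0.5, n)

# Without the control, contacts absorb part of the demographic effect.
naive = fit_ols(contacts.reshape(-1, 1), swing)
# With the control, the contacts slope is estimated net of the demographic.
controlled = fit_ols(np.column_stack([contacts, demographic]), swing)

print("contacts slope, no control:  ", round(float(naive[1]), 2))
print("contacts slope, with control:", round(float(controlled[1]), 2))
```

In this toy setup the uncontrolled slope overstates the effect of contacts, while the controlled slope lands near the value built into the data. Even the controlled estimate is only as good as the list of controls, which is why the causation caveat still applies.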


The slide that (deservedly) drew the biggest response in the room involved two "heatmap" charts that plotted three characteristics of individual registered voters in Ohio drawn from the Catalist database and models: Their predicted likelihood of turning out to vote (as indicated by the vertical or y-axis), their predicted likelihood of voting Democratic (as indicated by the horizontal or x-axis) and the number of contacts made with each voter by the Democratic campaign (green=more, red=less).


These "heatmap" graphics vividly illustrate the revolutionary change that occurred in Ohio this year in the way the Obama campaign and its allies targeted voters. In 2004, the dark green vertical band at the right of the chart shows that Democrats targeted their phone calls and door knocking at heavily Democratic precincts. In 2008, they were far more efficient in targeting individual voters. Moreover, the green areas circled on the 2008 chart show they did a better job targeting two groups: (1) those with a high probability of voting Democratic but a weaker history of turning out to vote and (2) more persuadable "swing" voters with a very strong history of past voting.

The Catalist data make a strong case that Obama gained most among the voters that the Obama campaign and its allies targeted. But a complicated question remains: Did the field and media campaign activity cause the shift, or were the campaigns effectively piling on among voters that were the most likely to shift anyway? That question, as the Catalist analysts concede, is more difficult to answer with observational data, although within a few months they will add to their database records of which individuals actually voted in 2008, allowing for more analysis of which efforts helped boost turnout and which did not.

Of course, as all pollsters know, proving that sort of causation with this sort of data analysis is very difficult if not impossible. What works better are "randomized controlled experiments" that compare randomly sampled voters exposed to an experimental "treatment" (in this case various campaign activities) with randomly sampled voters in "control groups" with no such exposure. Nine months ago, Brendan Nyhan blogged about the founding of a new Democratic organization called the Analyst Institute, directed by a Harvard PhD named Todd Rogers. This development, Nyhan wrote, signaled that political operatives were "finally catching on" to the experimental work by Yale's Alan Gerber and Donald Green on the effectiveness of campaign techniques.

Yesterday, Rogers confirmed Nyhan's intuition. He drew back the curtain and provided a few examples of what he described as a "record use" of controlled experiments by the Democrats in 2008, used as they "had never been used before . . . to figure out exactly what impact their voter contact activities were having."

One such experiment involved post-election survey work conducted in 11 states by the Service Employees International Union (SEIU) on both experimental and control groups of their members. In this case they held back a random sample "control group" of voters who received no contact from SEIU during the campaign. They then surveyed both the control group of non-contacts and a random sample of all the other voters who received campaign mail and other contact by SEIU.
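The basic comparison in an experiment like this can be sketched in a few lines: estimate the share holding some opinion in each group, take the difference, and attach a standard error. The counts below are hypothetical placeholders, not SEIU's survey results, and the function name and the simple two-proportion standard error are assumptions about one reasonable way to summarize such data.

```python
import math


def treatment_effect(treated_yes: int, treated_n: int,
                     control_yes: int, control_n: int) -> tuple[float, float]:
    """Difference in proportions (treated minus control) and its standard error."""
    p_t = treated_yes / treated_n
    p_c = control_yes / control_n
    diff = p_t - p_c
    # Standard error for the difference of two independent sample proportions.
    se = math.sqrt(p_t * (1 - p_t) / treated_n + p_c * (1 - p_c) / control_n)
    return diff, se


# Hypothetical counts: respondents rating a candidate favorably in each group.
diff, se = treatment_effect(treated_yes=560, treated_n=1000,
                            control_yes=500, control_n=1000)

print(f"Estimated effect: {diff * 100:.1f} points (standard error {se * 100:.1f})")
```

Because the control group was held back at random, a difference well beyond its standard error can be attributed to the contacts themselves rather than to the kind of voter who happened to be contacted.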

What impact did the "hundreds of thousands" of targeted contacts SEIU made during the election have in "actually changing support for Obama?" According to Rogers, their post-election survey found the "surprisingly positive effects" illustrated in the slide below. The campaign contacts "undermined McCain favorability, increased Obama favorability" and convinced voters that "Obama was better on jobs, the economy and health care," exactly the messages communicated by the SEIU campaign.


As someone who worked as a Democratic pollster for twenty years (until turning to blogging full time in 2006), I can confirm the unprecedented nature of the experimental work that Rogers describes. Similar experiments had been conducted before (I worked on a few), but these previous efforts were typically sporadic and scattershot. What is different now is both the scale and sophistication of the work and also -- in one of the least understood aspects of campaign 2008 -- the increased cooperation now occurring among Democratic party organizations, campaigns, and consultants to systematically study which campaign techniques work, and which do not.