03/18/2014 05:40 pm ET Updated Mar 19, 2014

HUFFPOLLSTER: Google's Flu Mishap Shows The Limits Of Big Data


A team of academics offers lessons learned when Google's big data flu tracking went astray. The Roper Center makes raw exit poll data available to its members again, but with a catch. And another poll finds a drop in Mayor de Blasio's job approval rating. This is HuffPollster for Tuesday, March 18, 2014.

NEP RESTORES ACCESS TO EXIT POLLS - In an email newsletter publicly circulated on Tuesday, the Roper Center Public Opinion Archives announced that access to "the National Election Pool's Exit Poll collection spanning 1990 through 2010 has been restored." Data from 2012 has not yet been released to the archive. The Roper Center had made respondent-level exit poll data available for download to paid subscribers, but **roughly a year ago the Center "temporarily suspended" access to NEP data "due to a breach of the Center’s Terms of Use."** [Archive.org]

But there's a catch... - As explained on the Roper Archives web site, the data will now be accessible by application only: "the NEP exit polls are restricted data sets, therefore, in addition to the Center’s standard Terms and Conditions have other limits associated with their use. Access to these data is a privilege, not a right, and will be provided only if the proposed research or teaching objectives and data protection plan are deemed to be adequate. Interested parties should complete and submit the Application to Access NEP Exit Poll Files. All applications are kept confidential. The data are to be destroyed at the completion of the research project or teaching assignment, and no backup copies of the data are to be made." The new policy has been in effect since January. According to the Roper Center, it shared the news with its members "recently" prior to the public announcement. [Roper via @sfcpoll, @RoperCenter]

...and discomfort with the new restrictions:

-GMU political scientist Michael McDonald, via email: "As one who has published work critical of the exit polls, I am concerned that Roper's new application process will deny access to those who have research agendas that do not conform with the 'specific interests' of the NEP. The NEP should not be fearful of reasoned criticism since outside observers can often offer new perspectives that will improve the performance of future exit poll surveys."

-Jon Robinson (D): "Still no access to 2012 micro-data, kind of dumb that you have to 'destroy' the data after analysis. What about replication, etc?" [@jon_m_rob]

-Alex Lundry (R): "WTF? This reads to me like partisan pollsters won't have access to exit poll data. Am I reading that right?" [@AlexLundry]

What are the criteria? - In response to Lundry, the Roper Center tweeted: "Anyone who requires access to these data may apply. Members don't have to pay extra for them, just fill out the application." Lundry followed up: "But upon what criteria will applications be approved/denied?" [@RoperCenter, @AlexLundry]

Roper responds - Roper Center Associate Director Lois Timms-Ferrara sent the following response to a late afternoon query from HuffPollster: "Non-members may also apply for access to the polls, and there is a fee structure in place for doing so." Roper Executive Director Paul Herrnson "will review all applications for access. While we absolutely do not want to add barriers to accessing this important collection, the Roper Center is committed to assuring that these data remain available to the research community well into the future. These are the current requirements for doing so."

Past exit poll data still available via ICPSR - Whatever the issues involving the Roper Center, respondent-level exit poll data from 1994 through 2008 remain accessible to scholars via the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. [ICPSR]

GOOGLE FLU TRENDS: THE LIMITS OF BIG DATA - From a new academic paper by David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani (footnotes omitted): "In February 2013, Google Flu Trends (GFT) made headlines but not for a reason that Google executives or the creators of the flu tracking system would have hoped. [The journal] Nature reported that [Google Flu Trends] was predicting more than double the proportion of doctor visits for influenza-like illness (ILI) than the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from laboratories across the United States. This happened despite the fact that GFT was built to predict CDC reports. Given that GFT is often held up as an exemplary use of big data, what lessons can we draw from this error?" The authors identify two:

Big Data Hubris - "'Big data hubris' is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis. We have asserted that there are enormous scientific possibilities in big data. However, quantity of data does not mean that one can ignore foundational issues of measurement, construct validity and reliability, and dependencies among data. The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis."

Algorithm changes - "[I]t is quite likely that GFT was an unstable reflection of the prevalence of the flu because of algorithm dynamics affecting Google’s search algorithm. Algorithm dynamics are the changes made by engineers to improve the commercial service and by consumers in using that service. Several changes in Google’s search algorithm and user behavior likely affected GFT’s tracking...In improving its service to customers, Google is also changing the data-generating process. Modifications to the search algorithm are presumably implemented so as to support Google’s business model—for example, in part, by providing users useful information quickly and, in part, to promote more advertising revenue. Recommended searches, usually based on what others have searched, will increase the relative magnitude of certain searches." [Lazer et al. via @jcpolls, @ElectProject]

NEW FLORIDA POLL FINDS CLOSE RACE FOR GOVERNOR - Stephen Calabria: "Former Florida Gov. Charlie Crist (D) has just a 1-point edge against incumbent Gov. Rick Scott (R), according to a poll released Tuesday by the University of North Florida. According to the survey, 34 percent of registered voters said they favor Crist, while 33 percent said they favor Scott. Another 34 percent were undecided or favored another candidate. The UNF results are the closest findings between Crist and Scott in the race thus far...One reason for the relatively low support for both candidates: unlike UNF's October poll, which simply asked about Scott and Crist, the university's most recent survey asked voters to choose between 'Rick Scott the Republican, Charlie Crist the Democrat, or somebody else,' with 17 percent opting for another candidate." [HuffPost]

DE BLASIO'S RATING DROPS - Quinnipiac: "New York City voters approve 45 - 34 percent of the job Mayor Bill de Blasio is doing, below Police Commissioner William Bratton's 57 - 13 percent approval rating and City Comptroller Scott Stringer's 53 - 12 percent score, according to a Quinnipiac University poll released today. Mayor de Blasio's approval rating is down from a 53 - 13 percent score in a January 16 survey by the independent Quinnipiac University. A March 27, 2002, survey showed newly-elected Mayor Michael Bloomberg with a 62 - 16 percent approval rating. By a 65 - 29 percent margin, New York City voters are optimistic about the next four years under Mayor de Blasio. The new mayor will make life better for them and their families, 33 percent of voters say, while 22 percent say he will make life worse and 38 percent say he will have no effect." [Quinnipiac]

HUFFPOLLSTER VIA EMAIL! - You can receive this daily update every weekday via email! Just enter your email address at the bottom of this article or in the box on the upper right corner of the Pollster page, and click "sign up." That's all there is to it (and you can unsubscribe anytime).

TUESDAY'S 'OUTLIERS' - Links to the best of news at the intersection of polling, politics and political data:

-PPP (D) finds Republican Cory Gardner within two percentage points of Colorado Sen. Mark Udall. [PPP]

-An internal POS (R) survey for supporters of Oklahoman T.W. Shannon finds him gaining on Rep. James Lankford (R-Okla.) in the Republican primary to succeed retiring Sen. Tom Coburn. [Roll Call]

-The Chicago Sun-Times charts turnout in Tuesday's primary election. [Chicago Sun-Times]

-Kathy Frankovic explains why Americans don't see the economy as improving. [YouGov]

-Republican pollsters say bashing Obamacare remains their party's best tactic for 2014. [Time]

-Gene Ulm (R) argues the Democrats' turnout problems will "tilt the board heavily in favor of the GOP in '14." [POS]

-Just 30 percent of the uninsured realize that federal tax credits are available through the Obamacare exchanges to make insurance affordable to lower-income families. [Bankrate]

-Marc Tracy says Nate Silver is a hedgehog, not a fox. [New Republic]

-Carl Bialik explains how statisticians could help find the missing Malaysian plane. [538]

-Jeff Leek suggests a checklist for "decoding" news about health studies. [538]

-Time for some traffic problems in Fort Lee. [The Monkey Cage]