Probing The Realities of Big Data In Alzheimer's

04/29/2015 01:01 pm ET | Updated Jun 29, 2015

Still Alice, the film starring Academy Award-winning actress Julianne Moore, powerfully depicts a woman's steep descent into Alzheimer's Disease (AD) and demonstrates why people overwhelmingly cite Alzheimer's as the disease they most fear.

There is currently no effective treatment for AD. For individuals, an Alzheimer's diagnosis is the beginning of a dark, inevitable journey to a place where our connection to ourselves and the world slowly fades away. For governments, the spiraling global growth of the disease -- with nearly 150 million victims projected for 2050 -- threatens the budgets of even the wealthiest of nations.

To avert these dire projections, breakthrough innovations are sorely needed.

Modern technology now makes it possible to collect so-called Big Data on entire populations, opening up new possibilities for AD research. Today, genetic, imaging and even sensor data can be collected on massive numbers of people with strong privacy and security protections. In the traditional research approach, one lab or a small number of labs work on a data set and publish a limited set of findings months or years later. Big Data, by contrast, allows a global network of researchers to gain access to each other and to citizen-generated data for collaborative and patient-focused research. The challenge before us is to mine all this data to unearth the clues to understand and then beat Alzheimer's. And soon.

What barriers do we face in ushering in this new era of open science, and what advances are possible if researchers apply 21st century open science to the world's broadest and best sources of Alzheimer's data?

The Global CEO Initiative on Alzheimer's Disease (CEOi) and open science leaders at Sage Bionetworks and DREAM organized a collaborative competition to probe these questions -- the Alzheimer's Disease Big Data DREAM Challenge (AD#1). Challenge participants used the biggest and best data set now available -- the Alzheimer's Disease Neuroimaging Initiative (ADNI) -- to build predictive algorithms: (1) to make possible an earlier diagnosis of AD; (2) to identify people not impaired by AD even though they exhibit biological markers of the disease; or (3) to use only brain-imaging data to identify people having the hallmark features of AD. Organizers objectively evaluated the algorithms by assessing how well they worked when applied to other premier open data sets (from the US-based Rush Alzheimer's Disease Center or the EU-based AddNeuroMed).

The Challenge ran for three months over the summer of 2014. During that time, more than 520 researchers from around the world submitted a total of nearly 2,000 predictive algorithms. Challenge organizers identified the best current predictive models of AD and determined what has to happen next to make these types of models useful in primary care clinics.

Two major roadblocks emerged. First, the data sets were too small and too limited to produce the statistical "power" needed to give accurate answers to the three questions posed in the Challenge. The teams that worked on AD#1 collectively logged thousands of hours of effort to develop their predictions. Much greater success could have been achieved if they had worked on a data set with adequate size and composition to generate the statistical power to answer important questions about AD.
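For readers who want a feel for what statistical "power" means in practice, here is a minimal sketch. It assumes the simplest textbook case, a two-sided comparison of two equal-sized groups using a normal approximation, and the effect sizes are illustrative values, not figures from the Challenge. The function names are hypothetical.

```python
from statistics import NormalDist


def power_two_sample(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized difference between group means
    (difference divided by the common standard deviation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def n_for_power(effect_size: float, target: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest group size that reaches the target power (simple linear search)."""
    n = 2
    while power_two_sample(effect_size, n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative effect sizes only; subtler effects need far larger groups.
    for d in (0.8, 0.5, 0.2):
        print(f"effect size {d}: about {n_for_power(d)} participants per group")
```

The point of the sketch is the scaling: halving the size of the effect you hope to detect roughly quadruples the number of participants required, which is why subtle early signals of a disease demand very large data sets.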

Second was the surprising degree to which today's premier publicly available data repositories do not lend themselves to a crowd-sourced public research project like AD#1. Challenge organizers had to overcome the constraints of data use agreements and the mammoth task of harmonizing independent data sets so that they could, in a sense, "talk" to one another. Sadly, datasets widely viewed as 'open' were, as a practical matter, inaccessible because of narrow patient consents, inhospitable data structures, or administrative barriers to access.
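To make the harmonization task concrete, here is a small sketch of what getting two data sets to "talk" to one another can involve: renaming columns to a shared vocabulary and converting units. Every field name, unit and record below is hypothetical and chosen only for illustration; none of it reflects the actual structure of ADNI, the Rush data or AddNeuroMed.

```python
from typing import Any, Callable, Dict, Tuple

# Each entry maps a source column to (shared column name, converter).
# All names and units are hypothetical, not real cohort schemas.
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

COHORT_A: Mapping = {
    "subj_id": ("participant_id", str),
    "age_yrs": ("age_years", float),
    "mmse": ("mmse_score", int),
}

COHORT_B: Mapping = {
    "ID": ("participant_id", str),
    # This cohort is assumed to record age in months.
    "age_months": ("age_years", lambda months: float(months) / 12.0),
    "MMSE_total": ("mmse_score", int),
}


def harmonize(record: Dict[str, Any], mapping: Mapping, source: str) -> Dict[str, Any]:
    """Translate one record into the shared schema, keeping track of its origin."""
    out: Dict[str, Any] = {"source": source}
    for src_col, (dst_col, convert) in mapping.items():
        raw = record.get(src_col)
        # Missing values stay missing rather than being guessed.
        out[dst_col] = None if raw in (None, "") else convert(raw)
    return out


# Made-up example records, one per cohort.
merged = [
    harmonize({"subj_id": 101, "age_yrs": "72.5", "mmse": "27"}, COHORT_A, "cohort_a"),
    harmonize({"ID": "B-7", "age_months": 900, "MMSE_total": 24}, COHORT_B, "cohort_b"),
]
```

Even this toy version needs someone to know that one cohort records age in months and the other in years. Multiply that by thousands of variables, and by consent terms that differ from study to study, and the scale of the task becomes clear.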

The way we manage and use medical data is still decades behind the connectivity and fluidity that governs how we bank, how we shop and how we communicate. To address this, three key changes in the management of biomedical data are needed. First, the open repositories need to be brought current with existing and anticipated information technology. Second, data repository management should be built to support robust open access on a global level. Finally, 21st century data sets need to be interoperable so researchers can merge data sets together to glean the insights needed for breakthroughs in Alzheimer's disease. With focus and effort, existing global standards and open science practices can deliver interoperability, but this is no small task.

Hope is on the horizon, as we can see a future where privacy-protected biomedical data can be shared to advance scientific research across a global community of diverse experts. Today in the UK, there is the 2 million-volunteer Dementia Project. In the US, it's President Obama's recently announced million-person DNA sequencing initiative. In business, it's Apple's recent announcement of ResearchKit, which will leverage the world's 700 million iPhone users to collect and analyze new types of data in asthma, breast cancer, diabetes and Parkinson's Disease.

AD#1 exposed the impediments that limit the full potential of Big Data to address medicine's most taxing unmet medical needs. Let's think of what we can do for those at risk for Alzheimer's and let loose the innovation promised by 21st century citizen science.

This is co-authored by Stephen Friend, President of Sage Bionetworks. George Vradenburg is the Founder of USAgainstAlzheimer's and the Global CEO Initiative on Alzheimer's Disease.