11/12/2013 01:55 pm ET | Updated Jan 23, 2014

A Bold Experiment to Tackle the Deluge of Scientific Data

Amazing technological advances are generating staggering amounts of scientific data, and the volume, variety, and velocity are increasing daily. Efficiently harnessing the data being generated has the potential to revolutionize every field within the natural, computational, and social sciences. However, there are critical challenges: the data are overwhelming current practices in most fields, and academic institutions are not optimally structured to bring together and reward the people with both the scientific domain expertise and the computational, statistical, and mathematical skills needed to drive data-intensive science forward.

As a molecular biologist and geneticist, I personally experienced how the massive data from the genomics revolution completely changed how my lab group conducted research. We knew the questions we wanted to answer using our data, but we didn't have the computational, statistical, and mathematical skills needed to handle the data, let alone analyze them effectively. We partially solved our dilemma by collaborating with another laboratory across the country that had the skills we lacked, and by hiring a postdoctoral fellow who had sufficient cross-training to bridge the two laboratory groups. However, this slowed progress: our collaborator was in high demand and over-committed, and we were highly dependent on our bridge postdoctoral fellow. Once he left for his own position, it took months to find another person with the necessary skills. This 'solution' was neither sustainable nor scalable.

My career transitioned to leading the Science Program at the Moore Foundation, which funds discovery-driven research across the natural sciences. Our team members were observing challenges across the areas we support in the life and physical sciences that were similar to what I had faced as a researcher. We realized that trying to tackle these challenges one laboratory at a time wouldn't scale and that the problems were increasing as more disciplines were experiencing the data deluge. Thus, we began to explore the challenges, barriers and opportunities related to this issue and whether there might be a way for our foundation to make a difference across multiple fields. Along the way we found a partner with a shared vision in the Alfred P. Sloan Foundation.

On November 12th, at an Office of Science and Technology Policy event in Washington, DC, we announced a new five-year, $37.8 million partnership with Sloan and three universities to carry out bold experiments to demonstrate the power of data-intensive science. The three universities--the University of California at Berkeley, the University of Washington, and New York University--will focus on bringing multidisciplinary people together within institutional environments that provide the resources, freedom, and interconnected networks for data-intensive science to flourish.

Our hypothesis is that the greatest advances in tools and practices will result from meaningful interactions and sustained collaborations among data-intensive science researchers who build on one another's work, leverage the best existing practices and tools, and demonstrate solutions that can be used more broadly by others. It is also critical to establish long-term, sustainable career paths in academia for scientists who take a multidisciplinary approach to analyzing massive, noisy, and complex scientific data. Openly sharing practices, tools, lessons, and discoveries will help ensure that the network of experiments has an impact greater than the sum of its parts.