Even within the infrastructure of the American surveillance apparatus, the National Security Agency is notoriously secretive. The spy agency jealously guards from public view practically all aspects of its operations, from the information it collects to its plans for a massive 100,000-square-foot building being constructed in the Utah desert.
But when it comes to the agency's primary tool for making sense of all that data, the NSA hasn't been secretive at all. Indeed, two years ago, it made public the very code for a key program it uses to analyze the firehose of information pouring into its computer servers.
The NSA’s decision to give away that code to developers has helped fuel what is now a booming trend in technology known as "big data." The technology, Accumulo, makes it possible for companies to sift through massive amounts of information with essentially the same degree of sophistication and security as the country's top spy agency.
The use of computers to spot connections along a trail of digital breadcrumbs is hardly new. For years, major companies, from Amazon to Facebook to Google, have analyzed customer information to suggest books, friends or search results.
But the NSA’s use of such computing power was not widely understood until last week, when The Guardian and The Washington Post reported the agency was collecting and crunching huge amounts of Internet, phone and financial data in a bid to predict terrorist activity.
The revelation that the NSA was collecting a massive trove of phone and Internet records from Americans highlights privacy concerns around the use of data analysis to draw conclusions from a wide variety of information.
“There are all sorts of things you can do with this technology,” said Matt Turck, managing director at FirstMark Capital, a venture capital firm. “Now it’s up to society to decide what’s acceptable and what’s not acceptable.”
The same cheap data storage and free open-source software used by the NSA now allow companies to conduct the kind of sophisticated data analysis that was once available only to Internet giants like IBM and Google.
“Ten years ago, if you wanted to store and process that much data you would have to spend millions of dollars buying really expensive servers,” said Ben Siscovick, general partner at IA Ventures, a venture capital firm that invests in big data companies. “Now, the tools are out there, and they’re accessible in a low-cost way to just about anybody who wants it.”
For advocates of big data -- an industry with an estimated value approaching $100 billion -- the potential for technology like Accumulo has barely been tapped.
"This is the first technological innovation since the Internet with the potential to change the world," said Christopher Lynch, an investor that has bankrolled 10 Boston-area big data companies.
One of those companies is Sqrrl, which Lynch helped launch two years ago after poaching six NSA engineers who had developed Accumulo. Sqrrl markets its technology to companies in the telecom, health care and financial sectors that need extra security when dealing with sensitive customer data. The database sorts through enormous amounts of information and limits access to users who hold high-level security clearances, said Ely Kahn, the company's co-founder.
Its technology is used by major banks to predict whether customers will pay off their credit cards based on information like the demographic characteristics of their neighborhoods. It is also used by a telecom provider to spot damage on its network by searching for keywords like “broken” in a database of customer service calls, Kahn said.
“It’s similar to the way Amazon or eBay use databases to predict what you might want to buy next,” he said.
But the growing reliance on databases and software to draw conclusions has raised privacy concerns before. Target, for example, sparked controversy last year when an employee told The New York Times how the company could determine whether a woman was pregnant based on her purchasing history and demographic information.
Lenders have started assessing the creditworthiness of borrowers by doing big-data analysis on their social media connections. And some health insurers have started buying massive databases to potentially flag people for being at risk of obesity if they have a history of buying plus-sized clothing, according to The Wall Street Journal.
FirstMark Capital's Turck predicted that the ability of both the NSA and companies to unlock secrets from the data they collect “is only going to get more powerful and more precise.”
“The genie is out of the bottle,” he said.