As the Obama administration continues its efforts to stabilize the financial system, the question of Treasury Secretary Timothy Geithner's relationship to Wall Street keeps reemerging. Is he too close to the bankers over whom he now has to ride herd? As head of the New York Federal Reserve, was he getting the information he needed to make a clear-eyed assessment of the looming banking crisis, or was he surrounded by an insular inner circle that didn't ask enough unpleasant questions until it was too late?
The New York Times has attempted to address these questions in a way Geithner might not entirely appreciate: by publishing on its Web site 658 pages of Geithner's daily schedule, covering his last two years at the Fed. Readers can wade through it day by day and see whom Geithner had breakfast with, whom he met with, and whom he played tennis with.
While the prospect of having the public leaf through one's daybook may send chills down the spines of public officials and set off national-security and privacy alarm bells, this level of public exposure is the wave of the future. Indeed, the real significance of the Times' posting of Geithner's schedule is the glimpse it gives us into how much more informed we will be once we stop treating data like an artifact for journalists and historians and start treating it like a tangible asset we can leverage to our benefit.
Imagine if Geithner's schedule had been released in real time, or something close to it, rather than after the fact in response to a Freedom of Information Act request. More important, imagine if it were structured -- if, instead of PDFs of typed pages, his schedule were published in a database-ready format so it could be analyzed. With any of a number of readily available software packages for social network analysis, the questions of how closely he is connected to a particular group of bankers, how impermeable his circle is to outside perspectives, and who the most influential players in his network are could actually be answered in an intelligible, quasi-quantifiable way. And I would suggest that not only would we taxpayers find this interesting, but so would Geithner. Data doesn't only make you transparent to other people; it makes you transparent to yourself, giving you insights into your behavior that you are unlikely to intuit, as anyone who has tracked how much they actually spend at Starbucks in a week has experienced firsthand.
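To make "database-ready" a little more concrete, here is a minimal sketch of the kind of question a structured daybook lets anyone ask. It assumes, purely for illustration, that each schedule entry is stored as a date plus a list of attendees; the names below are invented placeholders, not entries from the actual schedule. It links people who attend the same meeting, scores each person by degree centrality (the share of everyone else in the network they are directly connected to), and measures what share of one person's meetings included someone from a given group. Real social network analysis packages offer far richer measures; the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, made-up schedule rows in a structured form: (date, attendees).
# Names are placeholders for illustration only, not real schedule data.
SCHEDULE = [
    ("2007-03-01", ["Official", "Banker A", "Banker B"]),
    ("2007-03-02", ["Official", "Banker A"]),
    ("2007-03-05", ["Official", "Economist C"]),
    ("2007-03-07", ["Official", "Banker B", "Banker D"]),
]


def build_graph(schedule):
    """Connect every pair of people who appear together in the same meeting."""
    graph = defaultdict(set)
    for _date, attendees in schedule:
        for a, b in combinations(set(attendees), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other people in the network each person is directly linked to."""
    others = len(graph) - 1
    if others <= 0:
        return {person: 0.0 for person in graph}
    return {person: len(links) / others for person, links in graph.items()}


def meeting_share(schedule, person, group):
    """Share of `person`'s meetings that included at least one member of `group`."""
    theirs = [attendees for _date, attendees in schedule if person in attendees]
    if not theirs:
        return 0.0
    hits = sum(1 for attendees in theirs if group & set(attendees))
    return hits / len(theirs)


if __name__ == "__main__":
    graph = build_graph(SCHEDULE)
    for person, score in sorted(degree_centrality(graph).items(), key=lambda kv: -kv[1]):
        print(f"{person}: {score:.2f}")
    bankers = {"Banker A", "Banker B", "Banker D"}
    print("Share of meetings with a banker present:", meeting_share(SCHEDULE, "Official", bankers))
```

With this toy data, "Official" is connected to everyone (centrality 1.0) and three of four meetings include a banker; with a real, structured daybook the same few lines would start to answer the "how insular is the circle" question directly.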
The notion that officials would expose this sort of data about themselves and invite the public to dig in is becoming less far-fetched by the day. In 2006, 96 congressional candidates promised to publicly release their daily schedules. The SEC is pushing companies to report their financial data in a standard database format that would make it easy for people to compare apples with apples and do their own financial analyses. At the Department of Education, Secretary Arne Duncan has said that part of the $100 billion in education funding slated to be distributed to the states will be contingent on the states providing reams of hard data on performance. Perhaps most notably, the White House is adopting a set of powerful data standards for some of the information it publishes on its Web sites, enabling that information to be read, analyzed, and debated by a growing and highly energized cadre of policy-oriented computer savants. And as we know from the history of the Internet, when computer savants are madly excited about something, we're all going to be doing it in ten years. That excitement is further fueled by anticipation of data.gov, the new Web site in the works to expose a broad array of government data.
These examples, in turn, fit into a larger movement underway in the technosphere. Tim Berners-Lee, the man who invented the World Wide Web by writing its underlying protocols, has been spreading a new gospel, one he reduces to a three-word chant: "Raw Data Now." Raw Data Now means simply that data -- from the government, NGOs, research labs, businesses, and individual citizens -- should be made available as quickly as possible in standard formats that everyone can play with. Raw Data Now takes the information revolution unleashed by the Web to its next logical step, what is called the Semantic Web. Just as the Web allowed us to link Web page to Web page, the Semantic Web will allow us to link data set to data set. Slate's social graph of the U.S. Senate, for example, does an enviable job of showing the inevitability of Arlen Specter's defection -- and who else might be next. Now imagine a similar graph for Geithner during his time at the Fed, mapped against a timetable of the unfolding financial meltdown. The Times does its best to do this, adding comments to particularly interesting daybook items, such as when Sandy Weill met with Geithner about taking over Citigroup. But with unstructured data, the Times is reduced to using electronic Post-it notes that are no less primitive for being digital.
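Mapping a schedule against a timetable of events is, at bottom, a matter of linking two data sets through something they share. The sketch below is a deliberately crude stand-in for that idea: it joins two hypothetical data sets on dates alone, listing who met the official in the days leading up to each event. The Semantic Web proper relies on shared identifiers and standards such as RDF rather than an ad hoc join like this, and all of the names, dates, and events here are invented placeholders; `meetings_before` is a hypothetical helper, not part of any library.

```python
from datetime import date, timedelta

# Hypothetical data set 1: meetings from a structured daybook (placeholder names).
MEETINGS = [
    (date(2008, 3, 10), "Banker A"),
    (date(2008, 3, 13), "Banker B"),
    (date(2008, 3, 20), "Economist C"),
]

# Hypothetical data set 2: a timeline of events, sharing one kind of key (the date).
EVENTS = [
    (date(2008, 3, 14), "Event X"),
]


def meetings_before(meetings, events, window_days=7):
    """For each event, list who met the official from `window_days` days before it
    up to and including the event day."""
    window = timedelta(days=window_days)
    linked = {}
    for event_day, label in events:
        linked[label] = [
            who for day, who in meetings if event_day - window <= day <= event_day
        ]
    return linked


if __name__ == "__main__":
    for event, people in meetings_before(MEETINGS, EVENTS).items():
        print(event, "was preceded by meetings with:", ", ".join(people) or "nobody")
```

Here the seven-day window picks up "Banker A" and "Banker B" but not "Economist C," whose meeting falls after the event. Once both data sets are published in structured form, changing the window or swapping in a different timeline is a one-line edit rather than another round of manual annotation.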
Of course, as public officials begin to expose more data -- and public expectation increases accordingly -- a host of questions will need to be addressed: When does transparency stop empowering democracy and start engendering the kind of paralysis that comes from having no private room to breathe or maneuver? How do we factor in national security concerns? As with most things involving the Internet, the answers will come through trial and error, as we think through new capabilities in light of conflicting needs, priorities, and values.
But for Berners-Lee and many others involved in Semantic Web initiatives, transparency is just the starting point -- the real goal is a richer, deeper understanding of the world we live in. We still have a ways to go before we get there. According to the Sunlight Foundation, of those 96 congressional candidates who promised to publicly release their daily schedules, only one, Kirsten Gillibrand, since appointed to replace Hillary Clinton in the Senate, was both elected and followed through.
The real issue, of course, is changing our attitude toward the data we generate as we move through the world. Rather than see data as an uninteresting byproduct of our actions, we need to see it as a component essential to collectively understanding who we are, how we behave, what works, and what doesn't. If we do so, fifteen years from now, the way we deal with data today -- even in the age of Google and Facebook -- will seem as primitive as a rotary telephone.