Democratization of Data and the Rise of Data Literacy

This post was published on the now-closed HuffPost Contributor platform.

It is hard to imagine an idea more powerful than democracy—that power rests with the people. One of the most important modern-day forces shaping the way such power is used is the democratization of data. It is a work in progress.

Not so long ago the gathering of data, its storage, and its analysis were, for the most part, the work of a fairly small circle of highly trained people. I know this because I was one of them, arriving on the data analysis scene about 30 years ago. In that world, researchers used social science methods and data largely drawn from large-scale surveys and other designed national datasets to produce accurate information that policymakers could use to make key decisions based on facts and not their gut.

I’m tempted to say: “How times have changed.” But I’m not going to go that far. Instead I will say: “How times are changing.” The walls of training and barriers to access that used to make data the province of an educated elite—the people who could interact with a raw dataset, had access to the right software, could extract meaning from the data, and understood the limitations of the sources—have fallen. Today anyone with a mobile device can access, create, analyze, and disseminate vast quantities of information. This is the democratization of data.

Think about the enormous amount of data available to anyone who wants it. Want a snapshot of what is on people’s minds? You have the easy availability of Twitter feeds. Want to do your own survey? Any number of mobile apps are there for you. Crowdsourced data is used by enormous companies and lay people alike. The transformation in the sources and uses of data is beyond dramatic—and it will continue to grow and develop.

A lot of it is good work that advances the public interest. Here in Chicago, I am struck by the efforts of Smart Chicago, an organization founded in part by the Chicago Community Trust, the MacArthur Foundation, and the City of Chicago that has as a big part of its mission developing products from data that will improve the lives of Chicagoans. Smart Chicago hosted an event not long ago with the Art Institute of Chicago that I think did a wonderful job of capturing the spirit of democratized data:

What is the relationship between information technology, urban space, and the public good in the age of big data? Where do “smart cities” initiatives like the Array of Things—which doesn’t collect any information about individuals—fit into contemporary conversations about privacy and surveillance? How can the arts and humanities help our society think through these issues?

That framing is about as far away as you can get from a cadre of data analysts doing arcane data analysis work for the federal government.

It was the spirit of publicly available data being put to work for the average citizen that motivated the Chicago Tribune’s remarkable work revealing that the city’s red light cameras were looking more like cash registers than traffic monitors. It was an exercise in the new field called data-based journalism in which the key source of information for the journalist is data, not people.

As the work to democratize data progresses, there are important challenges to address, many of them having to do with making sure the vast new sources of data are used wisely and well. What’s important now is expanding data literacy. The walls are down, but the challenge is to help make consumers literate about what they are seeing, reading, and using. When the data no longer flows through the hands of the experts, it must come with added education so that people can use it effectively and to their advantage.

Many of the key lessons learned in the 20th century about the promises and pitfalls of using data in the service of democracy are still very important and useful today. For example, it is important to remember that analysis of data is a science, and, whether we are compiling a dataset from traditional survey data or scraping it from social media, there are key questions we must keep in mind. First, is it representative of the population or phenomenon we are trying to understand? Second, is it big enough for us to draw meaningful conclusions? Third, is it asking the right questions, in the right ways, to address what we need to know?

These are not easy questions, but part of data literacy is asking them. Equally important is transparency. A person providing conclusions based upon analysis of data should also be asked about the data’s limitations, context, assumptions, and origin. Without that information, it is hard to trust the results.

And finally, there is the question of data advocacy, in which purportedly objective data are manipulated to advance a preconceived point of view. That is not analysis. It is a way of masking advocacy with pseudo-science in an effort to take advantage of those who don’t yet understand the limitations of data and data analysis. In this election season, we have seen plenty of efforts to manipulate the public with purposefully distorted or deceptive statistics.

The democratization of data can be a powerful force for good, and it will certainly transform the field of data science. As that transformation takes place, it is important to keep both the data sound and the science intact. Data literacy will help ensure this happens.
