Data science is one hot field these days. In a 2012 article in the Harvard Business Review, DJ Patil and Thomas H. Davenport stated unequivocally that "Data Scientist" would be "the sexiest job of the 21st century." A tidal wave of discussion, dissension, and downright contrarianism ensued as technical experts (and not-so-technical experts) argued about the merits of data science and how it will or won't change the world.
As I argue in Too Big to Ignore, more and more businesses are realizing value from data. More and more governments are embracing data as a way to improve lives, and more and more people are using data in their everyday decisions. Gathering and storing data (all kinds of data) is the first step on an exciting journey. Data science can serve as the map that will lead us to where we want to go, but how?
To answer that question, I sat down with Melinda Thielbar, a Ph.D. statistician with 15 years of experience in software development and analytics consulting. She is also a co-founder of Research Triangle Analysts.
PS: Why is data science so big right now?
MT: Just as a music buff will tell you rock 'n' roll has its roots in blues and folk, a data scientist will tell you data science isn't new. Data science is the latest incarnation of an idea that's as old as the scientific method. Gathering information in the right way gives you a chance to test your ideas, make adjustments, and do better next time. These methods have allowed medical science to progress from "bloodletting as cure-all" to the discovery of penicillin. As recently as the year 2000, applying those principles required a huge infrastructure investment most businesses couldn't even contemplate.
Now, almost every customer carries a smartphone. Most business communications are digital. Call centers are equipped with software that would have seemed like science fiction a decade ago. Even small businesses have a web presence (and are realizing huge benefits from the most basic investments). Each interaction with a customer generates data, giving businesses endless opportunities to test their processes and improve.
PS: What does a data scientist do?
MT: Getting meaning from vast amounts of data is not always straightforward. Let's take the example of the call center. Many call centers track how long calls last and the number of calls each employee processes. If an employee's average call length is longer than her peers', what does it mean? Is she bad at her job, taking too long to complete basic tasks? Is she great at her job and taking a little longer because she makes sure the customer's problem is solved before she hangs up? Or is she actually average, and just happens to get more complicated problems (what a statistician would call "random variation")?
These are the questions a data scientist works to answer. Any analysis package, including most spreadsheet programs, will ingest data and spit out basic statistics like mean and standard deviation. Some of them will even fit regressions and decision trees. Basic analysis packages will produce endless streams of statistics, but they do little to help users interpret the results. There's a real difference between generating the reports and understanding what they mean. Data science is all about making sure the numbers tell the real story for your organization.
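To make the "random variation" question concrete, here is a minimal sketch of one way an analyst might probe it: a permutation test on call lengths. It is an illustration rather than a method described in the interview. The function name, the call-length figures, and the choice of test are all assumptions made for this example.

```python
import random
import statistics


def permutation_p_value(employee_calls, peer_calls, n_permutations=5000, seed=0):
    """Estimate how often randomly reassigning calls produces a gap in
    mean call length at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed_gap = statistics.mean(employee_calls) - statistics.mean(peer_calls)

    pooled = list(employee_calls) + list(peer_calls)
    n_employee = len(employee_calls)

    hits = 0
    for _ in range(n_permutations):
        # Shuffle all calls, then pretend the first n_employee were hers.
        rng.shuffle(pooled)
        gap = statistics.mean(pooled[:n_employee]) - statistics.mean(pooled[n_employee:])
        if gap >= observed_gap:
            hits += 1

    return observed_gap, hits / n_permutations


# Hypothetical call lengths in minutes, invented purely for illustration.
employee_calls = [6.1, 7.4, 5.8, 9.2, 6.7, 8.0]
peer_calls = [5.2, 6.0, 4.8, 7.1, 5.5, 6.3, 5.9, 4.9, 6.6, 5.4, 7.0, 5.1]

gap, p_value = permutation_p_value(employee_calls, peer_calls)
print(f"Observed gap: {gap:.2f} minutes, estimated p-value: {p_value:.3f}")
```

A small p-value would suggest the gap is unlikely to be luck alone, but it cannot say whether the employee is slow or simply thorough. Answering that takes more data (for example, whether her customers call back) and the kind of interpretation described above.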
PS: So, say I'm a leader in a business or government organization, and I want to learn about data science. Where do I start?
MT: The Ghost Map is my favorite resource for leaders ready to use data science to drive results-oriented change. It is the story of Dr. John Snow and Reverend Henry Whitehead, and how they used 19th-century data science to discover the cause of cholera. Rather than focusing on the details of the statistics, The Ghost Map shows a technical expert and a community leader working together, respectfully disagreeing, and forming a bond based on mutual respect. Modern cities exist because these two men were willing to challenge their own assumptions as well as the accepted norms and follow where the analysis led them. No matter what your technical savvy, if you're willing to find someone with the right attitude and a passion for data, your organization can realize great benefits from bringing data science to your culture.
If you're new to the idea of data science, there are some semi-technical books that can help you understand the field better. Nate Silver's The Signal and the Noise is the new standard as a primer on prediction. Personally, I'm a huge fan of a little book called Proofiness, which talks about the difference between good statistics and bad.
PS: How can an organization get started with data science?
MT: Data science is a hybrid of programming, business analytics, and statistics. Chances are, your organization has people who are skilled in at least one of these areas. Consider bringing those skills together in an internal team and doing a "test run" data science implementation. A short-term, small-scale project that uses existing tools and existing talent can give you a much better perspective on what you need and what you already have. Think of something you wish you knew about your organization. Set a timeline for answering the question and a measurable goal for success. Follow it through, and document what you learn throughout the process. Even if you don't get the answers you're looking for, what you find out about yourselves and your organization's data will be invaluable. Data science is a new field, and being willing to learn is the first step in becoming a data-driven organization.
PS: What about consultants?
MT: Consultants give you access to skills you don't have, fresh perspectives on old problems, and leadership as your firm moves into a new field. Think of them as your company's expert guide in this exciting new territory. If you have the budget for a consultant, you should absolutely take advantage of the great people who are offering consulting services right now (including, but certainly not limited to, me). Not only can they help you find out what you don't know, they can also help you leverage what you do know in the best way possible.
PS: You say data science is new, so why hurry to get started?
MT: Why wait? The data tidal wave is already here. Smart organizations are taking advantage of that information to create better spaces for us to live, work, and play. Compared to the infrastructure needed to collect and store data, the data science needed to use it is cheap. The upsides are enormous. If you're a leader, you owe it to yourself and the people you lead to take advantage of this opportunity. Don't let it pass you by!