"My data is bigger than your data..."
"Yeah? Show me yours and I'll show you mine..."
An apocryphal conversation between two macho teens in the near future?
"Can't meet you later. Have incense burning duty at the Data Temple..."
The new religion perhaps?
"Wait. I love those red pants but I have to check what other people like me are buying. I hope they like them..."
Maybe you think I'm pushing the edge here, but when a search on "Big Data" returns more than 2 billion results, it's a Big Data challenge in itself to figure out what it all really means.
Let's start with a few simple explanations to set the stage -- beginning with Wikipedia:
"Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications..."
So far so good -- in other words, when a government decides to listen to everyone's phone calls, it's hard to manage with a bunch of cigarette-smoking guys in headsets hunched over recording devices... Or, for that matter, if you are in the retail business, how do you track what your customer had for lunch before browsing your website, then relate that to the local search for a haircut they had just concluded, and then to similar people who were walking their dogs at the same moment? You get the picture.
IBM has estimated, "Every day, we create 2.5 quintillion bytes of data -- so much that 90 percent of the data in the world today has been created in the last two years alone."
To give that some sort of visualization, let me share from an article by Theresa Riley on Moyers and Company:
"In Big Data: A Revolution That Will Transform How We Live, Work, and Think, published earlier this year, authors Viktor Mayer-Schönberger and Kenneth Cukier try to explain just how much data there is in big data. They write that 'in 2013 the amount of stored information in the world is estimated to be around 1,200 exabytes, of which less than 2 percent is non-digital.'"
What exactly is an exabyte, you might ask? They continue:
"There is no good way to think about what this size of data means. If it were all printed in books, they would cover the entire surface of the United States some 52 layers thick. If it were placed on CD-ROMs and stacked up, they would stretch to the moon in five separate piles. In the third century B.C., as Ptolemy II of Egypt strove to store a copy of every written work, the great Library of Alexandria represented the sum of all knowledge in the world. The digital deluge now sweeping the globe is the equivalent of giving every person living on Earth today 320 times as much information as is estimated to have been stored in the Library of Alexandria."
Bottom line, Ptolemy II would be a dummy today by the standards of Big Data.
I only point out that it might be worthwhile trying to analyze the difference between what he collected and what we collect. Think about it...
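For anyone who wants the numbers quoted above on a single scale, here is a rough back-of-the-envelope sketch. It assumes the decimal (SI) definition of an exabyte, 10^18 bytes, and the short-scale quintillion, also 10^18. The variable names are mine, and setting data created per day against data stored is only a loose illustration, not something either source claims.

```python
# Assumption: decimal (SI) units, so 1 exabyte = 10**18 bytes.
# Assumption: short-scale quintillion, so 1 quintillion = 10**18.
BYTES_PER_EXABYTE = 10**18

# IBM's estimate quoted above: 2.5 quintillion bytes created per day.
daily_bytes = 25 * 10**17

# Mayer-Schonberger and Cukier's 2013 estimate of stored information.
stored_exabytes = 1200

# Express the daily figure in exabytes.
daily_exabytes = daily_bytes / BYTES_PER_EXABYTE

# How many days of creation at that rate would add up to the stored total.
days_to_match = stored_exabytes / daily_exabytes

print(daily_exabytes)  # 2.5
print(days_to_match)   # 480.0
```

On those assumptions, IBM's daily figure works out to 2.5 exabytes, so the 1,200 exabytes in the book's estimate is roughly 480 days of creation at that pace.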
And then occasionally you run into scholarly articles like this one -- applying Big Data to Word of Mouth.
Frankly, I have no comment other than to say DIGIBABBLE.
Let's be clear: this is not a diatribe against data. Far from it -- I have been working with and in data for the better part of my career, and I fervently believe that the right data, applied the right way, in the right circumstances, can not only help you shop or drive more profits for Big Enterprise, but actually help our world and the people in it. Sadly, it won't stop war or hatred or injustice... and maybe that is where we need to focus to learn about what our new religion really is and isn't.
So my suggestion is that rather than follow the crowd and digibabble our way into groupthink, we take a step back and listen to some of the biggest collectors of Big Data (amazing how we don't pay enough attention to the fine print).
How about this one:
"Twitter uses a far-flung army of contract workers, whom it calls judges, to interpret the meaning and context of search terms that suddenly spike in frequency on the service... 'Humans are core to this system,' two Twitter engineers wrote in a blog post in January."
"There has been a shift in our thinking... A part of our resources are now more human curated." -- Scott Huffman, an engineering director in charge of search quality at Google.
"You need judgment, and to be able to intuitively recognize the smaller sets of data that are most important... To do that, you need some level of human involvement." -- Ben Taylor, a product manager at FindTheBest, a "'comparison engine' for finding and comparing more than 100 topics and products, from universities to nursing homes, smartphones to dog breeds."
And there you have it: The challenge in Big Data is to recognize the small data, the important data, the smart data -- and that is where technology needs insight in order to be useful.
As a final thought, I return to our ability to save our world, to help humankind through Big Data. Listen to Leon Wieseltier, writer, critic, philosopher, friend:
"The study of the consumer is one of capitalism's oldest techniques. But it is not fine that the consumer is mistaken for the entirety of the person."
Let's not make Big Data all about shopping and profit; let's not ascribe God-like powers to it; let's never forget that the beauty of being human is that we are unpredictable.
"The other day I was listening to Mahler in my library. When I caught sight of the computer on the table, it looked small." -- Leon
Bottom line: Your data might be bigger than mine, but next to Mahler, it's very, very small...
What do you think?
Follow David Sable on Twitter: www.twitter.com/DavidSable