10/17/2012 11:57 am ET Updated Dec 06, 2017

Graphing the Spillway of Big Data: Interview With Tagasauris CEO Todd Carter

At the unveiling of IBM's PureData system in New York City last week, two facts about the Big Bang-like explosion of unstructured data stilled the packed ballroom at the Grand Hyatt Hotel. In the past two years, new technologies have created more information than all of human history before them. Not to be outdone, there will be 44 times more data generated by 2020 than there is today.

The deluge of big data pitted against the diminishing returns of time is the greatest challenge consumers and businesses face today. To sift through the ocean of data in a fast, seamless way will become the next evolutionary step in social media. It's easy to collect data in a warehouse, but difficult to pull the integral data that can be accessed, analyzed, and acted upon. Without the ability to filter out the background noise of information, efficiency gains won't be realized and opportunities will be missed. And that could be something as simple as people's time.

With a limited number of hours per day, the time-consuming platforms of the current, receding social media wave -- Facebook and Foursquare -- will face stiff competition from the next wave of personalized search and full engagement.

Don't believe me?

"There will be a new wave in the post social world of new players. The post social players will dominate," said George Colony, the savvy CEO of Forrester Research. That little nugget is a year old. And he's right.

The next wave will be fought over time, the most valuable, taken-for-granted asset we all own. It will show the 'new players' disrupting the disrupters of five years ago.

Tagasauris, a metadata start-up that organizes the noise and makes it searchable on the Web, was founded in 2010. But its semantics technology is older than that. Like its machine intelligence and crowdsourcing approach to big data, Tagasauris' motto has evolved, too.

Today, the company slogan is "Scaling the discoverability of visual media." But it used to be: "We make your content smarter" with metadata curation.

The Human Face of Big Data

I first came in contact with Tagasauris at "The Human Face of Big Data" -- a summit that unveiled a new sociocultural, mobile, time capsule-esque experiment on human behavior: 10 million downloads of the Human Face app, which for a week (starting in November) will track users anonymously in their daily activities, all the while having them respond to surveys and questions.

Sounds creepy? No, it's the future of crowdsourcing on a global scale.

At that event, Tagasauris was one of two dozen start-ups invited to showcase their play on big data. What they are developing is fascinating. That night, I caught up with Tagasauris' cofounder and CEO Todd Carter at a presentation that his company held at the Museum of New York, with guest speakers including Professor David Alan Grier, author of the highly regarded book When Computers Were Human.


In a follow-up phone interview with Mr. Carter, this author looked under the hood of Tagasauris to see if it will become the 'next social wave player.'

"Tagasauris grew out of a lifelong interest in the power of metadata to unlock the value of digital media," Todd Carter said. "We built a team around an idea that included three PhDs with relevant, multi-disciplinary expertise in computer science, human-computer interaction, collective intelligence and web science. The impetus was making digital media more discoverable, connected and engaging."

The concept was inspired, in part, by a 2005 start-up that connected tags to musical features extracted from audio signals.

"We were aiming to become the Google of music. We built a query language based on sound that employed signal-processing techniques to extract audio features from music. We had a query analysis window that was ten seconds long. You could play any track in your collection and at any point you could click search," he explained. "Of course a key here was tagging the big pile of sound fragments that connected one electric Janis Joplin vocal to, perhaps, another from Robert Plant."

From Digital Music to Visual Media

"In order to make something like that work at a consumer level we had to close the semantic gap," he said. "The disparity between descriptions of multimedia content that can be computed automatically and the richness and subjectivity of semantics used to find, organize and share music."

Could they crowdsource that problem? Could they apply the lessons learned to other media types?

The problem, which has since become an opportunity for Tagasauris, was a project in Paris with the Magnum Photos archive. "Magnum is a photographic cooperative whose sixty photographer members upload thousands of digital image files each day to its archive," Mr. Carter said. "They are under deadline so they do this often with minimal labeling of information about an image or a story. To a computer this often looks like an arbitrary string of characters."

He continued:

"We looked at the problem. There were hundreds of thousands of images without metadata. The photographers wanted to focus on storytelling through pictures. To their creative minds, tagging images was akin to asking programmers to comment their code. At the same time, search engines required text about the images to hydrate their indices and editors needed images tagged with metadata about their content to find what they were looking for. That meant a whole new layer of contextual metadata would be needed."

Then the question hit home: "Could we build a human computation engine that would combine machines and crowds in a cooperative fashion to do things that neither could achieve in isolation? Could we do that for visual media?" he wondered.

Visualizing Big Data

Other questions surfaced as they examined the contours of the visual media landscape. Could they automate the discovery of certain context information from embedded camera metadata such as GPS? Could that be mapped to geo places? What about other sensor data? What about the generation of new data? How would they cost-effectively automate the creation of a new interlinked data layer of context for visual media in the Magnum Archive?
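
To make the GPS question concrete: cameras commonly embed a location as degrees, minutes and seconds plus a hemisphere letter, and mapping it to a place usually begins by converting that to signed decimal degrees. The sketch below is only an illustration of that first step, not Tagasauris' code; the function name and the sample coordinates are hypothetical.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter
    ('N', 'S', 'E' or 'W') to signed decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere reference: {ref!r}")
    # Fractions avoid rounding drift while the three parts are summed.
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # South and west are negative by convention.
    sign = -1 if ref in ("S", "W") else 1
    return float(sign * value)


# Hypothetical sample values, as a camera might record them.
latitude = dms_to_decimal(48, 51, 24, "N")
longitude = dms_to_decimal(2, 21, 3, "E")
print(round(latitude, 6), round(longitude, 6))
```

A decimal pair like this is what a gazetteer or reverse-geocoding service would then take as input to return a named place.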

"This was a web-scale problem," he said. "For me, I spent a decade building web-based media management systems that did new things with data. I know a pony when I see one. It was the consumer Internet, a programmed experience, metadata driven entertainment, like Pandora."

Todd Carter has more than 20 years of experience working with photo archives, libraries, museums and information retrieval (IR) systems, including most recently the linked open data community.

"For Tagasauris, the sandbox with Magnum was an ideal place to start to develop the methodology," he said. "Could we connect the dots at scale? Could we catch a ride on this combinatorial, data-driven rocket to the future of media and entertainment?"

In scaling the discovery of visual media, Tagasauris had to vastly improve the plain-text tag. "We needed something based on HTTP URIs that referenced unique, well-defined concepts and included metadata payloads," he said.
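
What a URI-based tag buys over a plain-text one can be shown in a few lines. This is a minimal sketch under stated assumptions, not Tagasauris' actual data model: the `ConceptTag` class, its fields and the `example.org` URIs are all hypothetical. The point is that identity comes from the URI, not the label, so two labels in different languages can name one concept while the same label can name two.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ConceptTag:
    """Hypothetical tag: an HTTP URI naming one concept, a display
    label, and a free-form metadata payload."""
    uri: str
    label: str
    payload: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject anything that is not an absolute HTTP(S) URI.
        parsed = urlparse(self.uri)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an HTTP URI: {self.uri!r}")


def same_concept(a, b):
    """Two tags refer to the same concept when their URIs match,
    whatever their labels say."""
    return a.uri == b.uri


# Made-up URIs on example.org, for illustration only.
paris_en = ConceptTag("http://example.org/concept/paris-france", "Paris",
                      {"type": "city", "country": "France"})
paris_it = ConceptTag("http://example.org/concept/paris-france", "Parigi")
paris_tx = ConceptTag("http://example.org/concept/paris-texas", "Paris",
                      {"type": "city", "country": "United States"})

print(same_concept(paris_en, paris_it))  # same URI, different labels
print(same_concept(paris_en, paris_tx))  # same label, different URIs
```

A plain-text tag would treat the two "Paris" entries as identical and "Parigi" as unrelated, which is the ambiguity the quote describes.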

Tagasauris is pursuing three big elements. The first two are discovery and connectedness. The third will be "content becomes more engaging."

Clients of Tagasauris include libraries, archives and museums, among them The New York Times, Ryerson University, and the Museum of the City of New York.

"The future for Tagasauris will be about cataloging and characterizing the richness and expressiveness of event and visual media data, the collectivization of knowledge and about creating a rich ecosystem for Linked Open Data innovation," Todd Carter said.

"At Tagasauris' core is a large-scale human computation engine. It's sector agnostic. The technology is completely made for the big data world."

The re-tweets of big data on the social graph are only going to get more interesting.