05/30/2016 03:20 pm ET Updated May 31, 2017

How data science can help save the world

This report was originally published on The Signal. To follow stories about tech, data, and culture, subscribe today.

***

When the Indonesian government threatened to remove democratic local elections, 118,000 citizens signed a petition to preserve their right to vote.

Inspired by public outcry and a petition signed by 140,000 people, a court of appeals exonerated American Tyrone Hood, who had been wrongfully imprisoned for 22 years.

In Canada, two teenage girls convinced school districts to make consent part of Ontario's sex ed curriculum, all with the support of a virtual community.

And in The Gambia, the president finally declared a ban on female genital mutilation after a single online petition caught on like wildfire. What inspired these global initiatives? One place on the internet that does one simple thing very well.

That place is Change.org, the world's largest online petition platform, whose 146 million users create over 200,000 successful campaigns a year.

"With the web mobilizing these internet campaigns, Change.org harnesses the power of scale to help our users turn passion into real life impact," said Andy Veluswami, director of data science and analytics. Thanks to the unorthodox way Andy and his team practice data science, the mechanics behind social activism, and how movements spread and gain momentum, are far clearer.

It turns out being a data scientist isn't just about crunching the numbers to uncover deep insights. According to Andy, cultural understanding and language are key drivers in how Change.org discovers, serves, and grows its community.

The stakes are high at Change.org, considering its users are often victims of injustices or defenders of the disenfranchised.

But no matter what kind of company you are, any product manager or data scientist can relate to Change.org's universal tenet: users aren't just "users" - they are people, and tech is here to serve those people in whatever capacity it's set out to do. This is no easy task.

While dealing with large datasets can get convoluted, scale is not the toughest part of the job. The real work of data science is in grappling with the difficult choices outside the realm of numbers. According to Andy and his team at Change.org, "data science" is more interdisciplinary than the name might suggest.

Another frontier

In the Change.org offices in Potrero Hill, Andy was wearing a typical Silicon Valley uniform - stylish jeans, a gray hoodie, salt and pepper beard. He looked the part of a quant, but anyone who talks to Andy for more than two seconds would get the impression that he stands out from the typical techie. He's always loyal to the data, but his sheer passion for social activism helps him see the greater context behind the viral spikes and trending petitions.

Not only was Andy involved with grant administration in a past life, but he has also been involved with several nonprofits throughout San Francisco. He counts among his personal heroes Jaha Dukureh, "an incredible woman who created the petition that banned female genital mutilation [in The Gambia]," he told me.

After his time at Zynga, the mobile gaming company that set the model for how a lot of companies do analytics today, Andy joined Change.org, bringing all of his experience to bear on social activism.

"What we're doing here at Change.org is another frontier. There's such a hunger in the world to learn not only about the numbers and underpinnings behind a movement, but how we can make people feel empowered in society," he said.

Feeding this hunger is not just about parsing the numbers, but also about grappling with the cultures and politics that drive these movements. As an example of this, Andy shares how data and research were important in empowering Change.org's users in India.

Finding new voices
For a country of 1.2 billion people, India's internet usage is exceptionally high, and it's largely on mobile. About 10% of the population speaks English, so after three years, Change.org's audience in India was modest, but sizable.

However, after conducting market research, Change.org realized that they needed to localize in another, vernacular language for a non-English-speaking audience. Not only were they missing 90% of Indian citizens, but that 90% didn't have the same advantages as their English-speaking counterparts.

"English speakers in India are generally more elite, more urban, better-educated and have connections to the diaspora of international folks and government postings," Andy told me. This socio-economic breakdown gave English speakers more agency in their political and social systems, but it was a much different story for those who didn't speak English.

Change.org found that Hindi speakers, about 400 million people, are generally more rural and less elite. "Hindi speakers don't have all the opportunities that English speakers have in India," Andy said. "In addition, the income difference between the two groups is significant, estimated to be anywhere between 9x and 100x." It quickly became apparent to their team that Hindi speakers needed a platform to express their political will.

"We have to mobilize these people," Andy remembered thinking. "These are the people that don't necessarily have the loudest voice at the table."

In October 2015, the Hindi site launched, and Change.org witnessed the impact it had on its Indian user base.

"We noticed the political orientation of users had differences between the English and Indian sites. An example that illustrates this is 'reservation'."

"In India, caste-based 'reservation' is essentially a form of affirmative action where India sets aside quotas to help disadvantaged minorities," Andy explained. "It's very similar to what we do in the US with race-based admissions, often called 'affirmative action.'"

This is where their data science team saw a direct relationship between language and the nature of petitions being created.

"On the English site, we saw a rather large petition that supported the ending of the reservation system. After we launched the Hindi site, we noticed petitions being created on the other side of the debate - i.e. in support of affirmative action."

They realized that the language their platform supports only helps those that speak that language. So in diversifying their product, Change.org gave more people "a seat at the table."

The multitudes in data science
"With our scale at Change.org, I see a lot of opportunity for data science to help users make an impact, and for our platform to gather more insight into what drives a movement and who we need to help," Andy said. Even as a glass-half-full type of data scientist, Andy is not blind to the complexities of the job. Data science unlocks insights, but it also comes with limitations.

"Data science is great at parsing through numbers. If you've got data, it can be crunched. But what data science can never analyze are things on a deeper level, like the motivations behind someone starting a petition," Andy said.

At Change.org, crunching numbers evaluates a petition's performance and reach, or its quantitative "success," whereas understanding the data from different perspectives - with political, cultural, or even personal insight - can get closer to pinpointing the motivation behind a user who created or signed a petition.

"Looking at the numbers and mapping it onto our understanding of a culture has been essential when figuring out where our next opportunity lies," Andy said.

This is why data science doesn't just encompass "data" and "science." Rather, data science is a practice involving a multitude of disciplines, such as politics, culture, psychology, anthropology, and beyond. Without a doubt, it's notoriously difficult to define, leaving it up to debate amongst the experts.

Data science DNA
According to Brad Schumitsch from Twitch, "data science brings together three things: statistics, programming, and product knowledge." Unsurprisingly, Andy defines how to be a data scientist differently, adding in a fourth category I hadn't heard before.

"Here, we have four teams with distinct and interdependent functions that make up our data science and analytics practice: data engineering, quantitative analytics, machine learning, and content science," he told me, sharing some slides that have made the rounds at data science meetups.

The first three components make sense, sure, but content science? I honestly didn't know what that meant. I have since learned that another term for "content science" is "taxonomy," or the art and science of categorizing and naming things.

"One of the exciting and unique things about being a data scientist is that content sometimes falls under our domain," Andy told me. That was exciting. I never thought that content, which is often seen as a "soft skill" by engineers, would play such an intrinsic role in data science. However, after hearing how much of an impact language made on Change.org's Indian users, it made complete sense. And it's not only about the overarching language on the website, but also about the specific terms used to describe things.

From a user's perspective, when there are hundreds of thousands of petitions to search through, how each is labeled and categorized greatly impacts how a campaign is discovered.

From the website's perspective, how a petition is labeled also informs Change.org's recommendation engine, an internal tool that learns the interests of a user based on what petitions she's signed in the past.

For example, if a user advocated for a PETA campaign, then the recommendation engine is likelier to serve up a petition calling "to stop the barbaric ritual of harvesting bear bile" than a petition supporting the open carry of guns, i.e., the right to "bear" arms. Content science, in essence, shapes Change.org's user experience, keeping someone engaged and retained, and ultimately signing more and more petitions over time.
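To make the idea concrete, here is a minimal sketch of how labels on past signatures could steer a recommendation. It is not Change.org's actual engine, which the article doesn't describe in detail; the `recommend` function, the tag names, and the sample petitions are all made up for illustration, and the scoring (count how often a candidate's tags show up in the user's history) is a simple stand-in.

```python
from collections import Counter


def recommend(signed_tags, candidates, top_n=1):
    """Rank candidate petitions by how often their tags appear in the user's history.

    signed_tags: list of tag sets, one per petition the user already signed.
    candidates:  dict mapping a petition title to its set of tags.
    Both structures are hypothetical; a real system would be far richer.
    """
    # Count how many signed petitions carried each tag.
    interest = Counter(tag for tags in signed_tags for tag in tags)
    # Score each candidate by summing the user's interest in its tags.
    ranked = sorted(
        candidates.items(),
        key=lambda item: sum(interest[tag] for tag in item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:top_n]]


# A user who has signed two animal-welfare petitions.
signed = [{"animal rights"}, {"animal rights", "environment"}]
candidates = {
    "Stop the harvesting of bear bile": {"animal rights"},
    "Support the right to bear arms": {"politics"},
}
print(recommend(signed, candidates))
```

The sketch only works if the tags are right in the first place: label the gun petition "animal rights" by mistake and it would score just as well as the bear bile one.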

Parsing vs. understanding
"Some topics, like human rights, are much more complex to categorize," Andy said. "We have to do a lot more sophisticated kind of number or text crunching to be able to definitively say a petition is a human rights petition."

Andy gives a classic example of how a computer could screw up: "A 'right to bear arms' isn't an animal rights issue. It's a political one. A computer doesn't have cultural context to know what this phrase means, but a person wouldn't trip up on this categorization."
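Here is a rough sketch of how that slip could happen, and how a human-maintained phrase list might catch it. Everything in it is an assumption made for illustration: the keyword table, the curated phrases, and both function names are invented, and nothing here reflects how Change.org's taxonomy tooling is actually built.

```python
# Hypothetical single-word keyword table, the kind a naive tagger might use.
KEYWORDS = {
    "bear": "animal rights",
    "whale": "animal rights",
    "refugees": "human rights",
}

# Hypothetical phrase list maintained by people who know the cultural context.
CURATED_PHRASES = {
    "right to bear arms": "politics",
}


def naive_category(title):
    """Label a petition by the first keyword found, with no sense of context."""
    for word in title.lower().split():
        if word in KEYWORDS:
            return KEYWORDS[word]
    return "uncategorized"


def curated_category(title):
    """Check human-curated phrases first, then fall back to single keywords."""
    lowered = title.lower()
    for phrase, category in CURATED_PHRASES.items():
        if phrase in lowered:
            return category
    return naive_category(title)


title = "Protect the right to bear arms"
print(naive_category(title))    # sees "bear" and picks animal rights
print(curated_category(title))  # the curated phrase wins: politics
```

The phrase list is the "human layer" in miniature: someone has to know the idiom exists before the software can be told about it.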

At this juncture, you can see how the interdisciplinary nature of data science (and just being a human) becomes more important than programming. A major part of Change.org's taxonomic effort goes into figuring out what the correct and most up-to-date terms are, so that nothing gets miscategorized.

"It's an art and a science," Andy continued. "Luckily we have some great political scientists and political minds in DC and New York and across the world helping us understand what are the right terms people and campaigners are using."

These political scientists also help determine what is a "right-leaning" vs. "left-leaning" campaign, a job a bit too nuanced for a computer to automate and do by itself.

"The political leaning of a petition is not always obvious. We really need that human layer and it's critical to our analytics process at Change.org," Andy said. Even on the algorithmic side, Andy explains to me, Natural Language Processing (NLP) still needs that human processing.

"Certain words in petitions or political trends cycle in and out depending on what's going on in the world. There's no way a computer would be able to know that," explained Andy.

"'Human rights' doesn't index well but 'refugees,' for example, did really well in Q3. Now there might be other terms that are trending, like 'migrants.' Our content science is a continuously evolving conversation that requires insight from our political scientists."

People with cultural and language fluency are vital to Change.org's operations, and it would be nearly impossible to reach the scale that it does without its diverse employee network.

Take the Hindi site. It couldn't have been created without those who speak and write native Hindi. Why? Meanings and definitions can get lost in translation. Have you ever compared a translation tool to a native speaker? Having the experts that understand the idioms of trending topics is key for accuracy in content science.

It goes to show that no matter the systems in place to collect and analyze data, cultural understanding offers insight that can't be replaced when building a product. And for startups trying to acquire new users, bringing other disciplines - even empathy - into data science will be that irreplaceable "soft skill" for product teams trying to find product-market fit. It also takes emotional intelligence to weigh a situation's urgency, even if the numbers don't reflect that magnitude.

Even in small numbers
When the Hindi site launched in October 2015, its impact didn't just allow citizens to voice their opinions on hot button issues. The platform became a vehicle for change, even at a smaller scale, Andy explained.

"Villagers in a small town were petitioning big industries that were polluting their rivers. Because of the petition, the village council and the chief councilor decided to set up a committee to regulate the pollution of the water," said Andy.

"The Hindi petition had 110 signatures," Andy continued. "It underscores the fact that petitions don't need a large number of signatures to win -- we realized that 'critical mass' is case-by-case, and the definition of 'success' is relative."

Despite its massive scale, Change.org learned that data science, especially for social activism, involves some counterintuitive thinking. Sometimes it's not a numbers game. Sometimes it's not about getting as many signatures as humanly possible on the internet. What matters is that the numbers make an impact in the context of the community they're meant to serve, no matter how small.
