False Tweets Busted By Algorithm That Sniffs Out The Truth On Twitter

Want to create the world’s least credible tweet? Try this:

During the London riots in the summer of 2011, one false tweet helped fan a rumor that rioters were burning down the London Eye. During Hurricane Sandy, which hit the U.S. East Coast this year, one tweeter managed to spread false information about floods and fires that didn't exist.

While it has been argued that Twitter is a "truth machine," where knowledgeable users squash falsehoods before they spread too far, first responders increasingly use social media to direct their rescue efforts, and false microblogging can divert or distract them. Reputable news organizations can’t be everywhere at once, which means that false tweets may remain uncorrected by "official" sources for hours after being reported on microblogging sites.

But information forensics may be stepping in to help. A new algorithm designed by Chilean researchers aims to automate the sorting of tweets into true and false categories. And according to the scientists’ soon-to-be-published paper "Predicting Information Credibility in Time-Sensitive Social Media," to appear in "The Power of Prediction with Social Media," the algorithm has sorted several thousand test tweets surprisingly well.

But how can a mere algorithm, an automated process, tell when people on Twitter are lying? According to the researchers, Carlos Castillo, Barbara Poblete and Marcelo Mendoza, "credible" news tweets share certain characteristics, as do “non-credible” news tweets.

The team’s algorithm examines 16 characteristics per tweet in order to determine the credibility of the tweet’s content. Though the researchers don't spell out all the specifics of the algorithm in their paper, they do give some details on what separates credible tweets from the less credible ones.

Unsurprisingly, tweets that tend to be more credible are sent by users with a larger number of followers, are generally longer, are more likely to contain URLs (especially URLs for the top 10,000 most visited domains on the web), and have a negative sentiment rather than a positive one. And, the study says, a lot can be learned from punctuation and point of view. “People tend to concentrate question and exclamation marks on non credible tweets, frequently using first and third person pronouns,” the authors write.

The researchers also examined emoticon usage, user mentions within the tweet, the number of retweets, and the number of times the author had previously tweeted about the topic.
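To make the idea of per-tweet "characteristics" concrete, here is a minimal Python sketch that pulls a handful of the signals mentioned above out of a tweet. It is an illustration only, not the researchers' code: the `Tweet` class, the `extract_features` function, the pronoun lists and the sample tweets are all assumptions made for this example, and the paper's actual 16 features and how they are weighted are not spelled out here.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns and word lists, chosen for this illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
THIRD_PERSON = {"he", "she", "they", "him", "her", "them",
                "his", "hers", "their", "theirs"}


@dataclass
class Tweet:
    """Hypothetical container for one tweet and a little author metadata."""
    text: str
    followers: int
    retweets: int


def extract_features(tweet):
    """Return a dict of simple signals of the kind the article describes."""
    # Strip URLs first so a '?' inside a link is not counted as punctuation.
    text_without_urls = URL_RE.sub(" ", tweet.text)
    words = re.findall(r"[a-z']+", text_without_urls.lower())
    return {
        "followers": tweet.followers,
        "retweets": tweet.retweets,
        "length": len(tweet.text),
        "has_url": bool(URL_RE.search(tweet.text)),
        "mentions": len(MENTION_RE.findall(tweet.text)),
        "question_marks": text_without_urls.count("?"),
        "exclamation_marks": text_without_urls.count("!"),
        "first_person": sum(w in FIRST_PERSON for w in words),
        "third_person": sum(w in THIRD_PERSON for w in words),
    }


# Two made-up tweets: a breathless rumor and a sober report with a link.
rumor = Tweet("OMG they are burning it down!!! Is it true?? @bob", 120, 3)
report = Tweet("Magnitude 8.8 quake reported near Concepcion "
               "http://example.com/report", 50000, 40)

rumor_features = extract_features(rumor)
report_features = extract_features(report)
print(rumor_features)
print(report_features)
```

A classifier would then be trained on vectors like these, alongside the other signals the researchers mention, such as sentiment and emoticon use, which this sketch leaves out.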

When the researchers used the algorithm on a cluster of tweets generated over a “normal” period of time (several days in the late spring of 2010), it was remarkably consistent. When given a true tweet and a false tweet, it ranked the true one as more credible 86 percent of the time.

The algorithm performed nearly as well in a simulated crisis situation; when the researchers tested the algorithm on tweets sent out in the 24 hours during and directly after the Chilean earthquake of 2010, the algorithm chose the true tweet over the false tweet at a rate of 82 percent.
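One way to read figures like "86 percent" and "82 percent" is as a pairwise ranking accuracy: take every possible pairing of a true tweet with a false one and count how often the true tweet gets the higher credibility score. The short Python sketch below computes that number. The function name and the scores are invented for illustration, and counting ties as half a win is a common convention assumed here, not something taken from the paper.

```python
from itertools import product


def pairwise_ranking_accuracy(true_scores, false_scores):
    """Fraction of (true, false) pairs in which the true tweet scores higher.

    Ties count as half a win, a convention assumed for this sketch.
    """
    pairs = list(product(true_scores, false_scores))
    if not pairs:
        raise ValueError("need at least one true and one false score")
    wins = sum(1.0 if t > f else 0.5 if t == f else 0.0 for t, f in pairs)
    return wins / len(pairs)


# Made-up credibility scores for three true tweets and two false ones.
true_scores = [0.9, 0.7, 0.4]
false_scores = [0.5, 0.2]

accuracy = pairwise_ranking_accuracy(true_scores, false_scores)
print(f"{accuracy:.0%}")  # 5 of the 6 pairs are ranked correctly
```

In this toy example only one pair is ranked the wrong way round: the true tweet scored 0.4 falls below the false tweet scored 0.5.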

The researchers’ paper will be published in the journal Internet Research in early 2013. The Huffington Post received an advance copy of the paper, courtesy of Carlos Castillo, one of its authors.

Correction: A previous version of this article mistakenly titled the paper that contained the algorithm "The Power of Prediction with Social Media." The paper is actually titled "Predicting Information Credibility in Time-Sensitive Social Media," and is to be published in an issue of the journal Internet Research entitled "The Power of Prediction with Social Media."