Can an Algorithm Solve Twitter’s Credibility Problem?

On October 29, 2012, when Hurricane Sandy made landfall, I was in my Brooklyn apartment, refreshing Twitter. The news on my timeline consisted mostly of grim dispatches from amateur storm spotters tracking Sandy’s march up the coast. By the time the storm reached New Jersey, sober reports of rising sea levels and wind speed and pictures of flooding on the east side of Manhattan gave way to apocalyptic photos that suggested the entire Eastern Seaboard had become a modern-day Atlantis. A shark swam in the streets of New Jersey. An enormous tidal wave crashed over the Statue of Liberty. A scuba diver navigated a flooded Brooklyn subway station less than a mile from my apartment. Of course, these photos were all fake. (The tidal wave was from the disaster flick “The Day After Tomorrow,” and only Jake Gyllenhaal on a boogie board would have made it less believable.)

The Twitter commons have a credibility problem, and, in the age of “big data,” all problems require an elegant, algorithmic solution. Last week, a group of researchers at the Qatar Computing Research Institute (Q.C.R.I.) and the Indraprastha Institute of Information Technology (I.I.I.T.), in Delhi, India, released what could be a partial fix. Tweetcred, a new extension for the Chrome browser, bills itself as a “real-time, web-based system to assess credibility of content on Twitter.” When you install Tweetcred, it appends a “credibility ranking” to every tweet in your feed when viewed on twitter.com. Each tweet’s rating, from one to seven, is represented by little blue starbursts next to the user’s name, almost like a Yelp rating. The program learns over time, and users can give tweets their own ratings to help it become more accurate.

Tweetcred is built on insights that researchers have gained from studying massive databases of tweets surrounding major news events. In 2012, the I.I.I.T. researcher Aditi Gupta analyzed more than thirty-five million tweets from fourteen major news events during 2011, ranging from the U.K. riots to Steve Jobs’s resignation to the uprising in Libya. Gupta wanted to see if she could use certain characteristics of a tweet to predict its credibility. Human analysts ranked the credibility of sample tweets in the database. Gupta then correlated the tweets’ scores with a number of variables to see what made a Credible Tweet: tweet length, whether the tweet included a U.R.L., the number of followers of the user who tweeted it, and so on.

She found, for example, that longer tweets were more credible, whereas tweets with swear words were, unsurprisingly, less credible. Tweets with pronouns were less credible because, Gupta writes, “Tweets that contain information or are reporting facts about the event are impersonal in nature.” From these results, Gupta developed an algorithm that could be used to automatically determine a tweet’s credibility, much as Google’s PageRank judges a Web site’s relative importance. Tweetcred pairs Gupta’s research with findings from a team that conducted a similar study of tweets surrounding the spread of rumors following the 2010 Chile earthquake. The Tweetcred algorithm uses forty-five different characteristics to calculate its credibility score.
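To make the idea concrete, here is a minimal sketch of this kind of feature-based scoring. It is not Tweetcred's actual algorithm: the real system uses forty-five features and learned weights, while the features, word lists, and hand-picked weights below are assumptions chosen only to mirror the correlations Gupta reported (longer tweets and tweets with links score higher; swear words and personal pronouns score lower).

```python
import re

# Illustrative weights (assumptions for this sketch, not Tweetcred's
# trained coefficients).
WEIGHTS = {
    "length": 0.02,      # per character: longer tweets rated more credible
    "has_url": 1.5,      # tweets citing a link rated more credible
    "swear_word": -2.0,  # per swear word
    "pronoun": -1.0,     # per personal pronoun: factual tweets are impersonal
}

SWEAR_WORDS = {"damn", "hell"}  # toy list for illustration
PRONOUNS = {"i", "me", "my", "you", "your", "we", "us"}

def credibility_score(tweet: str) -> int:
    """Map a tweet to a one-to-seven rating, Tweetcred-style."""
    words = re.findall(r"[a-z']+", tweet.lower())
    raw = WEIGHTS["length"] * len(tweet)
    if re.search(r"https?://\S+", tweet):
        raw += WEIGHTS["has_url"]
    raw += WEIGHTS["swear_word"] * sum(w in SWEAR_WORDS for w in words)
    raw += WEIGHTS["pronoun"] * sum(w in PRONOUNS for w in words)
    # Clamp the raw score into the one-to-seven display range.
    return max(1, min(7, round(raw)))
```

On this toy scale, an impersonal report with a source link outranks a short, sweary first-person exclamation, which is the shape of the pattern Gupta found; a production system would learn such weights from the human-labeled tweets rather than fix them by hand.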

When I installed Tweetcred on a relatively quiet news day, Twitter seemed no more or less credible than before. What does it mean that the “credibility” of a joke about the #LiberalRaceHorseNames hashtag ranks three points below a tweet from Slate promoting an article about the actor Tom Hardy? Twitter is a roiling conversation where context is as important as content, but Tweetcred’s algorithm treats each tweet as a self-contained object, like a novel, that can be judged against the canon of tweets, which, in turn, leads to a rigidity that seems inappropriate for the form.

In a Sandy-like event, I can imagine that even the crude Tweetcred score might have a positive impact. In another study, Gupta and her colleagues found that the vast majority of tweets spreading those fake photos during Hurricane Sandy (eighty-six per cent) were retweets. The problem wasn’t really that users posted fake photos—it was that other people passed them along. The only thing more mindless than a tweet is a retweet, and the visual cue of Tweetcred’s little blue dots might be enough to provoke the nanosecond of reflection needed to prompt an investigation into the provenance of a supposed fact. And it could help, say, first responders examining huge numbers of tweets to separate the obviously bogus from the potentially useful.

But as a tool for the everyday user, Tweetcred falls short of its elegant promise. The fake Sandy photos went viral precisely because they were the most incredible. Nobody in history has ever tweeted a photo with the exclamation, “You need to see this picture—it’s so credible!” Tweetcred smacks of what the technologist Evgeny Morozov calls “solutionism”—the search for a clean technical fix for an intractable human problem. Improving the platform’s credibility will take more than a browser plug-in; it will take a community effort to combat built-in incentives that encourage quick, thoughtless information sharing. Twitter, for what it’s worth, already has a credibility rating: the coveted blue “verified” check marks, which Twitter bestows upon its most valued members. But a surprising number of verified Twitter users shared fake photos during Sandy, and spread false reports after the Boston Marathon bombing, adding their imprimatur to the speculation.

A relatively small number of verified journalists set the pace for Twitter during breaking news events. Twitter, as part of its lucrative cultivation of media companies, created a two-tiered system to boost the signal of favored users. It stands to reason, then, that Twitter has the justification—some might even argue the obligation—to de-verify users if they recklessly tweet false information. This might be messy in practice, but the judicious de-verification of even one high-profile journalist would probably be enough to send a message to the rest.

There’s one other motivating factor that could ultimately outpace any algorithm: shame. As the worst of the Sandy photos were debunked, Twitter was seized by a giddy spasm of relief and a sort of collective embarrassment. Many of the same users who had just hours before breathlessly shared the photos now shared even more absurdly fake photos with good-humored chagrin. For those who passed along fake photos, the embarrassment from that one turbulent night has stayed thousands of ill-considered tweets and retweets. And, unlike Tweetcred, shame works on every browser.

Illustration by Roman Muradov.