Predicting the Credibility of Disaster Tweets Automatically

“Predicting Information Credibility in Time-Sensitive Social Media” is one of this year’s most interesting and important studies on “information forensics”. The analysis, co-authored by my QCRI colleague Carlos Castillo (“ChaTo”), will be published in Internet Research and should be required reading for anyone interested in the role of social media in emergency management and humanitarian response. The authors study disaster tweets and find measurable differences in the way they propagate. They show that “these differences are related to the news-worthiness and credibility of the information conveyed,” a finding that enabled them to develop an automatic and remarkably accurate way to identify credible information on Twitter.

The new study builds on this previous research, which analyzed the veracity of tweets during a major disaster. The research found “a correlation between how information propagates and the credibility that is given by the social network to it. Indeed, the reflection of real-time events on social media reveals propagation patterns that surprisingly has less variability the greater a news value is.” The graphs below depict this information propagation behavior during the 2010 Chile Earthquake.

The graphs depict the re-tweet activity during the first hours following the earthquake. Grey edges depict past retweets. Some of the re-tweet graphs reveal interesting patterns even within 30 minutes of the quake. “In some cases tweet propagation takes the form of a tree. This is the case of direct quoting of information. In other cases the propagation graph presents cycles, which indicates that the information is being commented and replied, as well as passed on.” When studying false-rumor propagation, the analysis reveals that “false rumors tend to be questioned much more than confirmed truths [...].”
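The tree-versus-cycle distinction described above is straightforward to operationalize. The sketch below is illustrative, not from the study: `has_cycle` is a hypothetical helper and the retweet edges are made-up data. It checks whether a propagation graph is a pure tree (direct quoting) or contains a cycle (commenting and replying):

```python
# Sketch: classify a retweet propagation graph as tree-like (direct quoting)
# or cyclic (information being commented, replied to, and passed back).
# Edges and helper are illustrative assumptions, not the study's code.

def has_cycle(edges):
    """Detect a cycle in an undirected propagation graph via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True  # joining two already-connected nodes => cycle
        parent[ru] = rv
    return False

# Hypothetical retweet edges (user_a retweeted user_b)
tree_like = [("a", "b"), ("a", "c"), ("b", "d")]
cyclic = [("a", "b"), ("b", "c"), ("c", "a")]
print(has_cycle(tree_like))  # False: pure tree, direct quoting
print(has_cycle(cyclic))     # True: discussion loops back on itself
```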

Building on these insights, the authors studied over 200,000 disaster tweets and identified 16 features that best separate credible from non-credible tweets. For example, users who spread credible tweets tend to have more followers. In addition, “credible tweets tend to include references to URLs which are included on the top-10,000 most visited domains on the Web. In general, credible tweets tend to include more URLs, and are longer than non credible tweets.” Furthermore, credible tweets also tend to express negative feelings, whilst non-credible tweets concentrate more on positive sentiments. Finally, question and exclamation marks tend to be associated with non-credible tweets, as are tweets that use first- and third-person pronouns. All 16 features are listed below.

• Average number of tweets posted in the past by the authors of the tweets on the topic.
• Average number of followees of the authors posting these tweets.
• Fraction of tweets having a positive sentiment.
• Fraction of tweets having a negative sentiment.
• Fraction of tweets containing the most frequent URL.
• Fraction of tweets containing a URL.
• Fraction of URLs pointing to a domain among the top 10,000 most visited ones.
• Fraction of tweets containing a user mention.
• Average length of the tweets.
• Fraction of tweets containing a question mark.
• Fraction of tweets containing an exclamation mark.
• Fraction of tweets containing a question or an exclamation mark.
• Fraction of tweets containing a “smiling” emoticon.
• Fraction of tweets containing a first-person pronoun.
• Fraction of tweets containing a third-person pronoun.
• Maximum depth of the propagation trees.
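Many of these features are simple aggregates over a topic’s tweets. The sketch below computes a subset of them; the field names (`text`, `author_followees`) and sample tweets are hypothetical, and the sentiment, top-10,000-domain, and propagation-depth features are omitted since they require external resources:

```python
# Sketch: compute a subset of the 16 topic-level features listed above.
# Input format and sample data are assumptions for illustration.
import re
import statistics

def topic_features(tweets):
    """tweets: list of dicts with hypothetical keys 'text', 'author_followees'."""
    n = len(tweets)
    frac = lambda pred: sum(1 for t in tweets if pred(t)) / n
    return {
        "avg_followees": statistics.mean(t["author_followees"] for t in tweets),
        "frac_url": frac(lambda t: "http" in t["text"]),
        "frac_mention": frac(lambda t: "@" in t["text"]),
        "avg_length": statistics.mean(len(t["text"]) for t in tweets),
        "frac_question": frac(lambda t: "?" in t["text"]),
        "frac_exclamation": frac(lambda t: "!" in t["text"]),
        "frac_smile": frac(lambda t: ":)" in t["text"] or ":-)" in t["text"]),
        "frac_first_person": frac(
            lambda t: re.search(r"\b(I|we|my|our)\b", t["text"]) is not None
        ),
    }

# Hypothetical disaster tweets on one topic
tweets = [
    {"text": "Bridge collapsed downtown, photos: http://example.com",
     "author_followees": 300},
    {"text": "Is this real?! I can't believe it @someone",
     "author_followees": 50},
]
feats = topic_features(tweets)
print(feats["frac_question"])  # 0.5
```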

Using natural language processing (NLP) and machine learning (ML), the authors used the insights above to develop an automatic classifier for identifying credible English-language tweets. This classifier achieved an AUC of 86%. This measure, which ranges from 0 to 100%, captures the classifier’s ability to rank credible tweets above non-credible ones. When applied to Spanish-language tweets, the classifier’s AUC was still relatively high at 82%, which demonstrates the robustness of the approach.
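To make the setup concrete, the sketch below trains a classifier over synthetic topic-level features and scores it with AUC. This is not the study’s pipeline or data; scikit-learn and the random-forest model are assumptions for illustration, and the labels are generated so that URL-richness and followee counts predict credibility while question marks predict the opposite, mirroring the findings above:

```python
# Sketch: supervised credibility classification scored by AUC.
# All data is synthetic; model choice is an illustrative assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Columns mimic: fraction of tweets with a URL, average author followees
# (scaled), fraction of tweets with a question mark.
X = rng.random((n, 3))
# Credible topics: more URLs and followees, fewer question marks, plus noise.
y = (0.8 * X[:, 0] + 0.6 * X[:, 1] - 0.7 * X[:, 2]
     + rng.normal(0, 0.3, n) > 0.35).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 means the ranking is no better than chance; 1.0 means every credible topic is ranked above every non-credible one, which is why the study’s 0.86 is a strong result.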

Interested in learning more about “information forensics”? See this link.
