Tag Archives: analysis

Analyzing Tweets on Malaysia Flight #MH370

My QCRI colleague Dr. Imran is using our AIDR platform (Artificial Intelligence for Disaster Response) to collect & analyze tweets related to Malaysia Flight 370, which went missing several days ago. He has collected well over 850,000 English-language tweets since March 11th using the following keywords/hashtags: Malaysia Airlines flight, #MH370, #PrayForMH370 and #MalaysiaAirlines.

MH370 Prayers

Imran then used AIDR to create a number of “machine learning classifiers” to automatically classify all incoming tweets into categories that he is interested in:

  • Informative: tweets that relay breaking news, useful info, etc.

  • Praying: tweets that are related to prayers and faith

  • Personal: tweets that express personal opinions

The process is super simple. All he does is tag several dozen incoming tweets into their respective categories. This teaches AIDR what an “Informative” tweet should “look like”. Since our novel approach combines human intelligence with artificial intelligence, AIDR is typically far more accurate at capturing relevant tweets than Twitter’s keyword search.

And the more tweets that Imran tags, the more accurate AIDR gets. At present, AIDR can auto-classify ~500 tweets per second, or 30,000 tweets per minute. This is well above the highest velocity of crisis tweets recorded thus far—16,000 tweets/minute during Hurricane Sandy.
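
For readers curious about what happens under the hood, the sketch below shows the general pattern in Python using scikit-learn. To be clear, this is not AIDR’s actual implementation; the tagged tweets, labels and model choice are all illustrative assumptions.

```python
# A minimal sketch of AIDR-style tweet classification (not AIDR's actual code).
# The tagged tweets below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few dozen human-tagged tweets bootstrap the classifier; three shown here.
tagged_tweets = [
    ("Search area for #MH370 expanded to the Andaman Sea", "informative"),
    ("Thoughts and prayers with the families #PrayForMH370", "praying"),
    ("I still can't believe a plane can just vanish", "personal"),
]
texts, labels = zip(*tagged_tweets)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Auto-classify an incoming tweet. predict_proba plays the role of the
# "Confidence" score mentioned further below.
incoming = ["Officials release new radar data on flight #MH370"]
print(model.predict(incoming)[0], round(model.predict_proba(incoming).max(), 2))
```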

The graph below depicts the number of tweets collected per day since we began the AIDR collection on March 11th.

Volume of Tweets per Day

This series of pie charts simply reflects the relative share of tweets per category over the past four days.

Tweets Trends

Below are some of the tweets that AIDR has automatically classified as being Informative (click to enlarge). The “Confidence” score simply reflects how confident AIDR is that it has correctly auto-classified a tweet. Note that Imran could also have crowdsourced the manual tagging—that is, he could have crowdsourced the process of teaching AIDR. To learn more about how AIDR works, please see this short overview and this research paper (PDF).

AIDR output

If you’re interested in testing AIDR (still very much under development) and/or would like the Tweet IDs for the 850,000+ tweets we’ve collected using AIDR, then feel free to contact me. In the meantime, we’ll start a classifier that auto-collects tweets related to hijacking, criminal causes, and so on. If you’d like us to create a classifier for a different topic, let us know, but we can’t make any promises since we’re working on an important project deadline. When we’re further along with the development of AIDR, anyone will be able to easily collect & download tweets and create & share their own classifiers for events related to humanitarian issues.


Acknowledgements: Many thanks to Imran for collecting and classifying the tweets. Imran also shared the graphs and tabular output that appear above.

Inferring International and Internal Migration Patterns from Twitter

My QCRI colleagues Kiran Garimella and Ingmar Weber recently co-authored an important study on migration patterns discerned from Twitter. The study was co-authored with Bogdan State (Stanford) and lead author Emilio Zagheni (CUNY). The authors analyzed 500,000 Twitter users based in OECD countries between May 2011 and April 2013. Since Twitter users are not representative of the OECD population, the study uses a “difference-in-differences” approach to reduce selection bias when estimating out-migration rates for individual countries. The paper is available here and key insights & results are summarized below.

Twitter Migration

To better understand the demographic characteristics of the Twitter users under study, the authors used face recognition software (Face++) to estimate both the gender and age of users based on their profile pictures. “Face++ uses computer vision and data mining techniques applied to a large database of celebrities to generate estimates of age and sex of individuals from their pictures.” The results are depicted below (click to enlarge). Naturally, there is an important degree of uncertainty about estimates for single individuals. “However, when the data is aggregated, as we did in the population pyramid, the uncertainty is substantially reduced, as overestimates and underestimates of age should cancel each other out.” One important limitation is that age estimates may still be biased if users upload younger pictures of themselves, which would result in underestimating the age of the sample population. This is why other methods to infer age (and gender) should also be applied.

Twitter Migration 3
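
The claim that aggregation washes out individual errors is easy to check with a toy simulation. The sketch below assumes zero-mean estimation errors; all numbers are invented. (Note that a systematic bias, such as users uploading younger photos, would not cancel out this way.)

```python
# Toy simulation: noisy individual age estimates, accurate aggregate.
import numpy as np

rng = np.random.default_rng(42)
true_ages = rng.integers(18, 60, size=10_000)     # hypothetical user ages
errors = rng.normal(0, 8, size=true_ages.size)    # zero-mean per-photo error
estimates = true_ages + errors

print(f"Typical individual error: {np.abs(errors).mean():.1f} years")
print(f"Error of aggregate mean:  {abs(estimates.mean() - true_ages.mean()):.2f} years")
```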

I’m particularly interested in the bias-correction “difference-in-differences” method used in this study, which demonstrates that one can still extract meaningful information about trends even though formal statistical inference is not possible, since the underlying data do not constitute a representative sample. Applying this method yields the following results (click to enlarge):

Twitter Migration 2

The above graph reveals a number of interesting insights. For example, one can observe a decline in out-migration rates from Mexico to other countries, which is consistent with recent estimates from Pew Research Center. Meanwhile, in Southern Europe, the results show that out-migration flows continue to increase for countries that were/are hit hard by the economic crisis, like Greece.
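
To make the difference-in-differences logic concrete, here is a hedged sketch of the general idea rather than the paper’s exact estimator: instead of trusting absolute rates from a biased sample, compare each country’s change over time against the overall change across countries. All rates below are invented.

```python
# Hedged sketch of difference-in-differences bias correction.
# Rates below are invented, not the study's estimates.
import pandas as pd

rates = pd.DataFrame(
    {"period_1": [0.020, 0.015, 0.030], "period_2": [0.018, 0.022, 0.033]},
    index=["Mexico", "Greece", "Ireland"],
)

change = rates["period_2"] - rates["period_1"]  # first difference (per country)
overall = change.mean()                         # shared trend across countries
did = change - overall                          # second difference

print(did)  # positive => out-migration rising faster than the overall trend
```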

The results of this study suggest that such methods can be used to “predict turning points in migration trends, which are particularly relevant for migration forecasting.” In addition, the results indicate that “geolocated Twitter data can substantially improve our understanding of the relationships between internal and international migration.” Furthermore, since the study relies on publicly available, real-time data, this approach could also be used to monitor migration trends on an ongoing basis.

To what extent the above is feasible remains to be seen. Very recent mobility data from official statistics are simply not available to more closely calibrate and validate the study’s results. In any event, this study is an important step towards addressing a central question that humanitarian organizations are also asking: how can we make statistical inferences from online data when ground-truth data is unavailable as a reference?

I asked Emilio whether techniques like “difference-in-differences” could be used to monitor forced migration. As he noted, there is typically little to no ground-truth data available in humanitarian crises. He thus believes that their approach is potentially relevant for evaluating forced migration. That said, he is quick to caution against making generalizations. Their study focused on OECD countries, which offer relatively large samples and high Internet diffusion, and thus low selection bias. In contrast, data samples for humanitarian crises tend to be far smaller and highly selected, which means that filtering out the bias may prove more difficult. I hope that this is a challenge that Emilio and his co-authors choose to take on in the near future.


#Westgate Tweets: A Detailed Study in Information Forensics

My team and I at QCRI have just completed a detailed analysis of the 13,200+ tweets posted from one hour before the attacks began until two hours into the attack. The purpose of this study, which will be launched at CrisisMappers 2013 in Nairobi tomorrow, is to make sense of the Big (Crisis) Data generated during the first hours of the siege. A summary of our results is displayed below. The full results of our analysis and discussion of findings are available as a GoogleDoc and also PDF. The purpose of this public GoogleDoc is to solicit comments on our methodology so as to inform the next phase of our research. Indeed, our aim is to categorize and study the entire Westgate dataset (730,000+ tweets) in the coming months. In the meantime, sincere appreciation goes to my outstanding QCRI Research Assistants, Ms. Brittany Card and Ms. Justine MacKinnon, for their hard work on the coding and analysis of the 13,200+ tweets. Our study builds on this preliminary review.

The following 7 figures summarize the main findings of our study. These are discussed in more detail in the GoogleDoc/PDF.

Figure 1: Who Authored the Most Tweets?

Figure 2: Frequency of Tweets by Eyewitnesses Over Time

Figure 3: Who Were the Tweets Directed At?

Figure 4: What Content Did Tweets Contain?

Figure 5: What Terms Were Used to Reference the Attackers?

Figure 6: What Terms Were Used to Reference Attackers Over Time?

Figure 7: What Kind of Multimedia Content Was Shared?

Hashtag Analysis of #Westgate Crisis Tweets

In July 2013, my team and I at QCRI launched this dashboard to analyze hashtags used by Twitter users during crises. Our first case study, which is available here, focused on Hurricane Sandy. Since then, both the UN and Greenpeace have also made use of the dashboard to analyze crisis tweets.

QCRI_Dashboard

We just uploaded 700,000+ Westgate related tweets to the dashboard. The results are available here and also displayed above. The dashboard is still under development, so we very much welcome feedback on how to improve it for future analysis. You can upload your own tweets to the dashboard if you’d like to test drive the platform.


See also: Forensics Analysis of #Westgate Tweets (Link)

Analyzing Crisis Hashtags on Twitter (Updated)

Update: You can now upload your own tweets to the Crisis Hashtags Analysis Dashboard here

Hashtag footprints can be revealing. The map below, for example, displays the top 200 locations in the world with the most Twitter hashtags. The top 5 are São Paulo, London, Jakarta, Los Angeles and New York.

Hashtag map

A recent study (PDF) of 2 billion geo-tagged tweets and 27 million unique hashtags found that “hashtags are essentially a local phenomenon with long-tailed life spans.” The analysis also revealed that hashtags triggered by external events like disasters “spread faster than hashtags that originate purely within the Twitter network itself.” Like other metadata, hashtags can be informative in and of themselves. For example, they can provide early warning signals of social tensions in Egypt, as demonstrated in this study. So might they also reveal interesting patterns during and after major disasters?

Tens of thousands of distinct crisis hashtags were posted to Twitter during Hurricane Sandy. While #Sandy and #hurricane featured most, thousands more were also used, for example: #SandyHelp, #rallyrelief, #NJgas, #NJopen, #NJpower, #staysafe, #sandypets, #restoretheshore, #noschool, #fail, etc. The #NJpower hashtag “helped keep track of the power situation throughout the state. Users and news outlets used this hashtag to inform residents where power outages were reported and gave areas updates as to when they could expect their power to come back” (1).

Sandy Hashtags
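
As an aside, extracting and counting hashtags from raw tweet text takes only a few lines; the sketch below uses a simple regex (real pipelines would typically rely on Twitter’s entity metadata instead), and the tweet texts are invented.

```python
# Minimal hashtag extraction and counting; tweet texts are invented.
import re
from collections import Counter

tweets = [
    "No power in Newark again #NJpower #Sandy",
    "Gas lines around the block #NJgas #NJpower",
]

counts = Counter(tag.lower() for t in tweets for tag in re.findall(r"#\w+", t))
print(counts.most_common(5))
```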

My colleagues and I at QCRI are studying crisis hashtags to better understand the variety of tags used during and in the immediate aftermath of major crises. Popular hashtags used during disasters often overshadow more hyperlocal ones, making the latter less discoverable. Other challenges include the “proliferation of hashtags that do not cross-pollinate and a lack of usability in the tools necessary for managing massive amounts of streaming information for participants who needed it” (2). To address these challenges and analyze crisis hashtags, we’ve just launched a Crisis Hashtags Analytics Dashboard. As displayed below, our first case study is Hurricane Sandy. We’ve uploaded about half a million tweets posted between October 27th and November 7th, 2012 to the dashboard.

QCRI_Dashboard

Users can visualize the frequency of tweets (orange line) and hashtags (green line) over time using different time-steps, ranging from 10-minute to 1-day intervals. They can also “zoom in” to capture more minute changes in the number of hashtags per time interval. (The dramatic drop on October 30th is due to a server crash. So if you have access to tweets posted during those hours, I’d be grateful if you could share them with us.)

Hashtag timeline
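
For those curious, the binning behind this kind of timeline is essentially a one-liner in pandas, assuming a DataFrame with one row per tweet and a “timestamp” column (the column name is hypothetical):

```python
# Count tweets per time interval; swap "10min" for "1h" or "1D" to change step.
import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2012-10-29 18:01", "2012-10-29 18:04", "2012-10-29 19:30",
])})

per_bin = df.set_index("timestamp").resample("10min").size()
print(per_bin)
```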

In the second part of the dashboard (displayed below), users can select any point on the graph to display the top “K” most frequent hashtags. The default value for K is 10 (i.e., the top-10 most frequent hashtags) but users can change this by typing in a different number. In addition, the 10 least-frequent hashtags are displayed, as are the 10 “middle-most” hashtags. The top-10 newest hashtags posted during the selected time are also displayed, as are the hashtags that have seen the largest increase in frequency. These latter two metrics, “New K” and “Top Increasing K”, may provide early warning signals during disasters. Indeed, the appearance of a new hashtag can reveal a new problem or need, while a rapid increase in the frequency of some hashtags can denote the spread of a problem or need.

QCRI Dashboard 2
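
Once hashtags are counted per interval, these per-window metrics are straightforward to compute. Here is a sketch assuming two consecutive windows of hashtag counts; all counts are invented and this is not the dashboard’s actual code.

```python
# "Top K", "New K" and "Top Increasing K" from two consecutive windows.
from collections import Counter

prev = Counter({"#sandy": 120, "#njpower": 30, "#staysafe": 25})
curr = Counter({"#sandy": 150, "#njpower": 90, "#njgas": 40})

k = 2
top_k = curr.most_common(k)                         # most frequent now
new_k = [tag for tag in curr if tag not in prev][:k]  # just appeared
growth = {tag: curr[tag] - prev.get(tag, 0) for tag in curr}
top_increasing_k = sorted(growth, key=growth.get, reverse=True)[:k]

print(top_k, new_k, top_increasing_k)
```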

The third part of the dashboard allows users to visualize and compare the frequency of top hashtags over time. This feature is displayed in the screenshot below. Patterns that arise from diverging or converging hashtags may indicate important developments on the ground.

QCRI Dashboard 3

We’re only at the early stages of developing our hashtags analytics platform (above), but we hope the tool will provide insights during future disasters. For now, we’re simply experimenting and tinkering. So feel free to get in touch if you would like to collaborate and/or suggest some research questions.


Acknowledgements: Many thanks to QCRI colleagues Ahmed Meheina and Sofiane Abbar for their work on developing the dashboard.

Boston Marathon Explosions: Analyzing First 1,000 Seconds on Twitter

My colleagues Rumi Chunara and John Brownstein recently published a short co-authored study entitled “Twitter as a Sentinel in Emergency Situations: Lessons from the Boston Marathon Explosions.” At 2:49pm EDT on April 15, two improvised bombs exploded near the finish line of the 117th Boston Marathon. Ambulances left the scene approximately 9 minutes later, just as public health authorities alerted regional emergency departments of the incident.

Meanwhile, on Twitter:

BostonTweets

An analysis of tweets posted within a 35-mile radius of the finish line reveals that word stems containing “explos*” and “explod*” appeared on Twitter just 3 minutes after the explosions. “While an increase in messages indicating an emergency from a particular location may not make it possible to fully ascertain the circumstances of an incident without computational or human review, analysis of such data could help public safety officers better understand the location or specifics of explosions or other emergencies.”
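
The stem matching described here amounts to a one-line regular expression. A minimal sketch, with invented tweet texts:

```python
# Flag tweets containing words that begin with "explos" or "explod".
import re

STEM = re.compile(r"\b(explos|explod)\w*", re.IGNORECASE)

tweets = [
    "Huge explosion near the finish line",
    "something just exploded downtown",
    "great race today!",
]
hits = [t for t in tweets if STEM.search(t)]
print(f"{len(hits)} of {len(tweets)} tweets match")
```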

In terms of geographical coverage, many of the tweets posted during the first 10 minutes were from witnesses in the immediate vicinity of the finish line. “Because of their proximity to the event and content of their postings, these individuals might be witnesses to the bombings or be of close enough proximity to provide helpful information. These finely detailed geographic data can be used to localize and characterize events assisting emergency response in decision-making.”

BostonBombing2

Ambulances were already on site for the marathon. This is not the case for the majority of crises, however. In those more common situations, “crowdsourced information may uniquely provide extremely timely initial recognition of an event and specific clues as to what events may be unfolding.” Of course, user-generated content is not always accurate. Filtering and analyzing this content in real-time is the first step in the verification process, hence the importance of advanced computing. More on this here.

“Additionally, by comparing newly observed data against temporally adjusted keyword frequencies, it is possible to identify aberrant spikes in keyword use. The inclusion of geographical data allows these spikes to be geographically adjusted, as well. Prospective data collection could also harness larger and other streams of crowdsourced data, and use more comprehensive emergency-related keywords and language processing to increase the sensitivity of this data source.” Furthermore, “the analysis of multiple keywords could further improve these prior probabilities by reducing the impact of single false positive keywords derived from benign events.”
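
One simple way to operationalize “temporally adjusted keyword frequencies” is a z-score against a time-of-day baseline, though this is only one possible reading of the approach. The counts and threshold below are invented.

```python
# Compare the current keyword count against comparable past time windows.
import statistics

baseline = [3, 5, 4, 6, 4]   # invented "explos*" counts in comparable windows
current = 48                 # invented count in the current window

mu = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
z = (current - mu) / sigma

if z > 3:                    # alerting threshold: a tunable assumption
    print(f"Aberrant spike detected (z = {z:.1f})")
```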


Using Twitter to Map Blackouts During Hurricane Sandy

I recently caught up with Gilad Lotan during a hackathon in New York and was reminded of his good work during Sandy, the largest Atlantic hurricane on record. Amongst other analytics, Gilad created a dynamic map of tweets referring to power outages. “This begins on the evening of October 28th as people mostly joke about the prospect of potentially losing power. As the storm evolves, the tone turns much more serious. The darker a region on the map, the more aggregate Tweets about power loss that were seen for that region.” The animated map is captured in the video below.

Hashtags played a key role in the reporting. The #NJpower hashtag, for example, was used to “help keep track of the power situation throughout the state” (1). As depicted in the tweet below, “users and news outlets used this hashtag to inform residents where power outages were reported and gave areas updates as to when they could expect their power to come back” (1).

NJpower tweet

As Gilad notes, “The potential for mapping out this kind of information in realtime is huge. Think of generating these types of maps for different scenarios – power loss, flooding, strong winds, trees falling.” Indeed, colleagues at FEMA and ESRI had asked us to automatically extract references to gas leaks on Twitter in the immediate aftermath of the EF5 tornado in Oklahoma. One could also use a platform like GeoFeedia, which maps multiple types of social media reports based on keywords (i.e., not machine learning). But the vast majority of Twitter users do not geo-tag their tweets. In fact, only 2.7% of tweets are geotagged, according to this study. This explains why enlightened policies are also important for humanitarian technologies to work, like asking the public to temporarily geo-tag their social media updates when these are relevant to disaster response.

“While basing these observations on people’s Tweets might not always bring back valid results (someone may jokingly tweet about losing power),” Gilad argues, “the aggregate, especially when compared to the norm, can be a pretty powerful signal.” The key word here is norm. If an established baseline of geo-tagged tweets for the northeast were available, one would have a base-map of “normal” geo-referenced Twitter activity. This would enable us to understand deviations from the norm. Such a base-map would thus place new tweets in temporal and geo-spatial context.
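
The same deviation-from-the-norm idea applies spatially: bin geo-tagged tweets into grid cells and flag cells far above their historical baseline. The sketch below is purely illustrative; the grid scheme and all numbers are assumptions.

```python
# Flag grid cells whose current tweet count deviates from the historical norm.
import numpy as np

baseline_mean = np.array([50.0, 12.0, 8.0])  # historical mean count per cell
baseline_std = np.array([10.0, 4.0, 3.0])    # historical std per cell
observed = np.array([55.0, 45.0, 7.0])       # counts in the current window

z = (observed - baseline_mean) / baseline_std
print("Cells deviating from the norm:", np.where(z > 3)[0])
```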

In sum, creating live maps of geo-tagged tweets is only a first step. Base-maps should be rapidly developed and overlaid with other datasets such as population and income distribution. Of course, these datasets are not always available, and accessing historical Twitter data can also be a challenge. The latter explains why Big Data Philanthropy for Disaster Response is so key.
