
Inferring International and Internal Migration Patterns from Twitter

My QCRI colleagues Kiran Garimella and Ingmar Weber recently co-authored an important study on migration patterns discerned from Twitter, together with Bogdan State (Stanford) and lead author Emilio Zagheni (CUNY). The authors analyzed 500,000 Twitter users based in OECD countries between May 2011 and April 2013. Since Twitter users are not representative of the OECD population, the study uses a “difference-in-differences” approach to reduce selection bias when estimating out-migration rates for individual countries. The paper is available here, and key insights & results are summarized below.

Twitter Migration

To better understand the demographic characteristics of the Twitter users under study, the authors used face recognition software (Face++) to estimate both the gender and age of users based on their profile pictures. “Face++ uses computer vision and data mining techniques applied to a large database of celebrities to generate estimates of age and sex of individuals from their pictures.” The results are depicted below (click to enlarge). Naturally, there is a considerable degree of uncertainty in the estimates for single individuals. “However, when the data is aggregated, as we did in the population pyramid, the uncertainty is substantially reduced, as overestimates and underestimates of age should cancel each other out.” One important limitation is that age estimates may still be biased if users upload younger pictures of themselves, which would result in underestimating the age of the sample population. This is why other methods to infer age (and gender) should also be applied.

Twitter Migration 3
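To see why aggregation washes out much of the individual-level error in the population pyramid above, here is a toy simulation of my own (not from the paper): each person's age is estimated with substantial random noise, yet the mean over thousands of people stays close to the true mean.

```python
import random

random.seed(42)

# Toy simulation (not from the paper): each individual's age estimate has
# substantial random error, yet the aggregate mean stays close to the true
# mean because over- and under-estimates cancel out.
true_ages = [random.randint(18, 65) for _ in range(10_000)]      # hypothetical sample
estimates = [age + random.gauss(0, 8) for age in true_ages]      # noisy per-person estimates

true_mean = sum(true_ages) / len(true_ages)
est_mean = sum(estimates) / len(estimates)

print(f"True mean age:      {true_mean:.2f}")
print(f"Estimated mean age: {est_mean:.2f}")   # close despite +/- 8-year individual noise
```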

I’m particularly interested in the bias-correcting “difference-in-differences” method used in this study, which demonstrates that one can still extract meaningful information about trends even when the underlying data do not constitute a representative sample and standard statistical inference is therefore not possible. Applying this method yields the following results (click to enlarge):

Twitter Migration 2

The above graph reveals a number of interesting insights. For example, one can observe a decline in out-migration rates from Mexico to other countries, which is consistent with recent estimates from the Pew Research Center. Meanwhile, in Southern Europe, the results show that out-migration flows continue to increase for countries hit hard by the economic crisis, such as Greece.
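For readers unfamiliar with the approach, here is a minimal sketch in Python of the general difference-in-differences logic, using made-up numbers; it is not the authors’ exact estimator. The first difference (the change over time within a country) removes country-specific selection bias that stays constant over time, and the second difference (against the pooled rate) nets out platform-wide shifts in Twitter usage.

```python
import pandas as pd

# Hypothetical out-migration rates derived from geolocated Twitter users
# (country x period); all numbers are made up for illustration only.
rates = pd.DataFrame({
    "country": ["Mexico", "Greece", "ALL_OECD"] * 2,
    "period":  ["2011-2012"] * 3 + ["2012-2013"] * 3,
    "rate":    [0.030, 0.018, 0.021, 0.024, 0.027, 0.022],
})

wide = rates.pivot(index="country", columns="period", values="rate")

# First difference: change in each country's Twitter-derived rate over time.
# Country-specific selection bias that is stable over time drops out here.
wide["change"] = wide["2012-2013"] - wide["2011-2012"]

# Second difference: subtract the change in the pooled rate to net out
# platform-wide shifts in Twitter usage over the same period.
wide["diff_in_diff"] = wide["change"] - wide.loc["ALL_OECD", "change"]

print(wide[["change", "diff_in_diff"]])
```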

The results of this study suggest that such methods can be used to “predict turning points in migration trends, which are particularly relevant for migration forecasting.” In addition, the results indicate that “geolocated Twitter data can substantially improve our understanding of the relationships between internal and international migration.” Furthermore, since the study relies on publicly available, real-time data, this approach could also be used to monitor migration trends on an ongoing basis.

To what extent the above is feasible remains to be seen. Very recent mobility data from official statistics are simply not available to calibrate and validate the study’s results more closely. In any event, this study is an important step towards addressing a central question that humanitarian organizations are also asking: how can we make statistical inferences from online data when ground-truth data is unavailable as a reference?

I asked Emilio whether techniques like “difference-in-differences” could be used to monitor forced migration. As he noted, there is typically little to no ground-truth data available in humanitarian crises, which is why he believes their approach is potentially relevant for evaluating forced migration. That said, he is quick to caution against generalizations. Their study focused on OECD countries, which offer relatively large samples and high Internet diffusion, and thus low selection bias. In contrast, data samples for humanitarian crises tend to be far smaller and highly selected, which means that filtering out the bias may prove more difficult. I hope this is a challenge that Emilio and his co-authors choose to take on in the near future.


#Westgate Tweets: A Detailed Study in Information Forensics

My team and I at QCRI have just completed a detailed analysis of the 13,200+ tweets posted from one hour before the attacks began until two hours into the attack. The purpose of this study, which will be launched at CrisisMappers 2013 in Nairobi tomorrow, is to make sense of the Big (Crisis) Data generated during the first hours of the siege. A summary of our results is displayed below. The full results of our analysis and a discussion of the findings are available as a GoogleDoc and also as a PDF. The purpose of this public GoogleDoc is to solicit comments on our methodology so as to inform the next phase of our research. Indeed, our aim is to categorize and study the entire Westgate dataset (730,000+ tweets) in the coming months. In the meantime, sincere appreciation goes to my outstanding QCRI Research Assistants, Ms. Brittany Card and Ms. Justine MacKinnon, for their hard work on the coding and analysis of the 13,200+ tweets. Our study builds on this preliminary review.

The following 7 figures summarize the main findings of our study. These are discussed in more detail in the GoogleDoc/PDF.

Figure 1: Who Authored the Most Tweets?

Figure 2: Frequency of Tweets by Eyewitnesses Over Time

Figure 3: Who Were the Tweets Directed At?

Figure 4: What Content Did Tweets Contain?

Figure 5: What Terms Were Used to Reference the Attackers?

Figure 6: What Terms Were Used to Reference Attackers Over Time?

Figure 7: What Kind of Multimedia Content Was Shared?

Hashtag Analysis of #Westgate Crisis Tweets

In July 2013, my team and I at QCRI launched this dashboard to analyze hashtags used by Twitter users during crises. Our first case study, which is available here, focused on Hurricane Sandy. Since then, both the UN and Greenpeace have also made use of the dashboard to analyze crisis tweets.


We just uploaded 700,000+ Westgate related tweets to the dashboard. The results are available here and also displayed above. The dashboard is still under development, so we very much welcome feedback on how to improve it for future analysis. You can upload your own tweets to the dashboard if you’d like to test drive the platform.


See also: Forensics Analysis of #Westgate Tweets (Link)

Analyzing Crisis Hashtags on Twitter (Updated)

Update: You can now upload your own tweets to the Crisis Hashtags Analysis Dashboard here

Hashtag footprints can be revealing. The map below, for example, displays the top 200 locations in the world with the most Twitter hashtags. The top 5 are São Paulo, London, Jakarta, Los Angeles and New York.

Hashtag map

A recent study (PDF) of 2 billion geo-tagged tweets and 27 million unique hashtags found that “hashtags are essentially a local phenomenon with long-tailed life spans.” The analysis also revealed that hashtags triggered by external events like disasters “spread faster than hashtags that originate purely within the Twitter network itself.” Like other metadata, hashtags can be informative in and of themselves. For example, they can provide early warning signals of social tensions in Egypt, as demonstrated in this study. So might they also reveal interesting patterns during and after major disasters?

Tens of thousands of distinct crisis hashtags were posted to Twitter during Hurricane Sandy. While #Sandy and #hurricane featured most prominently, thousands more were also used, for example: #SandyHelp, #rallyrelief, #NJgas, #NJopen, #NJpower, #staysafe, #sandypets, #restoretheshore, #noschool, #fail, etc. The #NJpower hashtag, for instance, “helped keep track of the power situation throughout the state. Users and news outlets used this hashtag to inform residents where power outages were reported and gave areas updates as to when they could expect their power to come back” (1).

Sandy Hashtags

My colleagues and I at QCRI are studying crisis hashtags to better understand the variety of tags used during and in the immediate aftermath of major crises. Popular hashtags used during disasters often overshadow more hyperlocal ones, making the latter less discoverable. Other challenges include the “proliferation of hashtags that do not cross-pollinate and a lack of usability in the tools necessary for managing massive amounts of streaming information for participants who needed it” (2). To address these challenges and analyze crisis hashtags, we’ve just launched a Crisis Hashtags Analytics Dashboard. As displayed below, our first case study is Hurricane Sandy. We’ve uploaded about half a million tweets posted between October 27 and November 7, 2012 to the dashboard.


Users can visualize the frequency of tweets (orange line) and hashtags (green line) over time using different time steps, ranging from 10-minute to 1-day intervals. They can also “zoom in” to capture more granular changes in the number of hashtags per time interval. (The dramatic drop on October 30th is due to a server crash, so if you have access to tweets posted during those hours, I’d be grateful if you could share them with us.)

Hashtag timeline
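For those curious how such a timeline could be computed, here is a minimal sketch using pandas; the file name and column names are assumptions, not the dashboard’s actual code.

```python
import pandas as pd

# Assumed input: a CSV of tweets with a "created_at" timestamp and a
# "text" column (file and column names are assumptions).
tweets = pd.read_csv("sandy_tweets.csv", parse_dates=["created_at"])
tweets["n_hashtags"] = tweets["text"].str.count(r"#\w+")

# Re-bin with any time step the dashboard supports, e.g. "10min", "1H", "1D".
binned = (tweets.set_index("created_at")
                .resample("10min")
                .agg({"text": "size", "n_hashtags": "sum"})
                .rename(columns={"text": "tweet_count",
                                 "n_hashtags": "hashtag_count"}))
print(binned.head())
```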

In the second part of the dashboard (displayed below), users can select any point on the graph to display the top “K” most frequent hashtags. The default value for K is 10 (i.e., the top-10 most frequent hashtags), but users can change this by typing in a different number. In addition, the 10 least frequent hashtags are displayed, as are the 10 “middle-most” hashtags. The top-10 newest hashtags posted during the selected time interval are also displayed, as are the hashtags that have seen the largest increase in frequency. These latter two metrics, “New K” and “Top Increasing K”, may provide early warning signals during disasters. Indeed, the appearance of a new hashtag can reveal a new problem or need, while a rapid increase in the frequency of certain hashtags can denote the spread of a problem or need.

QCRI Dashboard 2
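To make these per-interval metrics concrete, here is a toy illustration of top-K, newly appearing and fastest-increasing hashtags; it is my own sketch, not the dashboard’s implementation.

```python
from collections import Counter

def hashtag_stats(current_window, previous_window, k=10):
    """Top-K, newly appearing and fastest-increasing hashtags for one
    time interval (a toy sketch, not the dashboard's actual code)."""
    now, before = Counter(current_window), Counter(previous_window)
    top_k = now.most_common(k)
    new_k = [tag for tag, _ in now.most_common() if tag not in before][:k]
    increase = {tag: now[tag] - before.get(tag, 0) for tag in now}
    top_increasing = sorted(increase, key=increase.get, reverse=True)[:k]
    return top_k, new_k, top_increasing

# Hashtags extracted from two consecutive time intervals (made up).
prev = ["#sandy", "#hurricane", "#nj", "#sandy"]
curr = ["#sandy", "#njpower", "#njpower", "#njgas", "#sandy", "#sandy"]
print(hashtag_stats(curr, prev, k=3))
```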

The third part of the dashboard allows users to visualize and compare the frequency of top hashtags over time. This feature is displayed in the screenshot below. Patterns that arise from diverging or converging hashtags may indicate important developments on the ground.

QCRI Dashboard 3

We’re only at the early stages of developing our hashtag analytics platform (above), but we hope the tool will provide insights during future disasters. For now, we’re simply experimenting and tinkering, so feel free to get in touch if you would like to collaborate and/or suggest some research questions.


Acknowledgements: Many thanks to QCRI colleagues Ahmed Meheina and Sofiane Abbar for their work on developing the dashboard.

Boston Marathon Explosions: Analyzing First 1,000 Seconds on Twitter

My colleagues Rumi Chunara and John Brownstein recently published a short co-authored study entitled “Twitter as a Sentinel in Emergency Situations: Lessons from the Boston Marathon Explosions.” At 2:49pm EDT on April 15, two improvised bombs exploded near the finish line of the 117th Boston Marathon. Ambulances left the scene approximately 9 minutes later, just as public health authorities alerted regional emergency departments of the incident.

Meanwhile, on Twitter:


An analysis of tweets posted within a 35-mile radius of the finish line reveals that words containing the stems “explos*” and “explod*” appeared on Twitter just 3 minutes after the explosions. “While an increase in messages indicating an emergency from a particular location may not make it possible to fully ascertain the circumstances of an incident without computational or human review, analysis of such data could help public safety officers better understand the location or specifics of explosions or other emergencies.”

In terms of geographical coverage, many of the tweets posted during the first 10 minutes were from witnesses in the immediate vicinity of the finish line. “Because of their proximity to the event and content of their postings, these individuals might be witnesses to the bombings or be of close enough proximity to provide helpful information. These finely detailed geographic data can be used to localize and characterize events assisting emergency response in decision-making.”
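For illustration, here is one way to reproduce the two filters described above: restricting tweets to a 35-mile radius of the finish line and matching the “explos*”/“explod*” stems. The coordinates and field handling are my assumptions, not the authors’ code.

```python
import re
from math import radians, sin, cos, asin, sqrt

FINISH_LINE = (42.3496, -71.0784)   # approximate finish-line coordinates (assumption)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3958.8 * 2 * asin(sqrt(a))   # 3958.8 = Earth's radius in miles

def near_finish_line(lat, lon, radius_miles=35):
    """Keep only geotagged tweets within the study's 35-mile radius."""
    return haversine_miles(lat, lon, *FINISH_LINE) <= radius_miles

def mentions_explosion(text):
    """Match the 'explos*' / 'explod*' word stems used in the study."""
    return bool(re.search(r"\bexplo(s|d)\w*", text, flags=re.IGNORECASE))

print(near_finish_line(42.36, -71.06), mentions_explosion("Explosions at the finish line!"))
```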


Ambulances were already on site for the marathon. This is not the case for the majority of crises, however. In those more common situations, “crowdsourced information may uniquely provide extremely timely initial recognition of an event and specific clues as to what events may be unfolding.” Of course, user-generated content is not always accurate. Filtering and analyzing this content in real time is the first step of the verification process, hence the importance of advanced computing. More on this here.

“Additionally, by comparing newly observed data against temporally adjusted keyword frequencies, it is possible to identify aberrant spikes in keyword use. The inclusion of geographical data allows these spikes to be geographically adjusted, as well. Prospective data collection could also harness larger and other streams of crowdsourced data, and use more comprehensive emergency-related keywords and language processing to increase the sensitivity of this data source.” Furthermore, “the analysis of multiple keywords could further improve these prior probabilities by reducing the impact of single false positive keywords derived from benign events.”
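Here is a minimal sketch of the aberrant-spike idea described in that passage, using a simple z-score against recent keyword counts as a stand-in for the “temporally adjusted” baseline; this is my simplification, not the authors’ method.

```python
import statistics

def is_aberrant_spike(history, current_count, z_threshold=3.0):
    """Flag a keyword count as aberrant when it exceeds the mean of recent,
    temporally comparable intervals by more than z_threshold standard
    deviations (a simplification of the 'temporally adjusted' baseline)."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0   # avoid division by zero
    return (current_count - mean) / stdev > z_threshold

# Counts of "explos*" tweets in prior 10-minute windows vs. the current one.
baseline = [0, 1, 0, 0, 2, 1, 0, 0, 1, 0]
print(is_aberrant_spike(baseline, current_count=58))   # True
```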


Using Twitter to Map Blackouts During Hurricane Sandy

I recently caught up with Gilad Lotan during a hackathon in New York and was reminded of his good work during Sandy, the largest Atlantic hurricane on record. Amongst other analytics, Gilad created a dynamic map of tweets referring to power outages. “This begins on the evening of October 28th as people mostly joke about the prospect of potentially losing power. As the storm evolves, the tone turns much more serious. The darker a region on the map, the more aggregate Tweets about power loss that were seen for that region.” The animated map is captured in the video below.

Hashtags played a key role in the reporting. The #NJpower hashtag, for example, was used to “help keep track of the power situation throughout the state” (1). As depicted in the tweet below, “users and news outlets used this hashtag to inform residents where power outages were reported and gave areas updates as to when they could expect their power to come back” (1).

NJpower tweet

As Gilad notes, “The potential for mapping out this kind of information in realtime is huge. Think of generating these types of maps for different scenarios– power loss, flooding, strong winds, trees falling.” Indeed, colleagues at FEMA and ESRI had asked us to automatically extract references to gas leaks on Twitter in the immediate aftermath of the EF5 tornado in Oklahoma. One could also use a platform like GeoFeedia, which maps multiple types of social media reports based on keywords (i.e., not machine learning). But the vast majority of Twitter users do not geo-tag their tweets; in fact, only 2.7% of tweets are geotagged, according to this study. This explains why enlightened policies are also important for humanitarian technologies to work, such as asking the public to temporarily geo-tag their social media updates when these are relevant to disaster response.

“While basing these observations on people’s Tweets might not always bring back valid results (someone may jokingly tweet about losing power),” Gilad argues, “the aggregate, especially when compared to the norm, can be a pretty powerful signal.” The key word here is norm. If an established baseline of geo-tagged tweets for the northeast were available, one would have a base-map of “normal” geo-referenced Twitter activity. This would enable us to understand deviations from the norm. Such a base-map would thus place new tweets in temporal and geo-spatial context.
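Here is a small sketch of that deviation-from-the-norm idea: compare current per-region counts of power-loss tweets against a historical baseline for the same regions and hours. The file and column names are assumptions for illustration only.

```python
import pandas as pd

# Assumed inputs: per-region, per-hour counts of power-loss tweets during
# the storm, plus a historical baseline for the same regions and hours.
baseline = pd.read_csv("baseline_counts.csv")       # region, hour, mean_count, std_count
current = pd.read_csv("sandy_power_counts.csv")     # region, hour, count

merged = current.merge(baseline, on=["region", "hour"])
merged["z_score"] = (merged["count"] - merged["mean_count"]) / merged["std_count"]

# Regions whose power-loss chatter far exceeds their normal Twitter activity.
flagged = merged[merged["z_score"] > 3].sort_values("z_score", ascending=False)
print(flagged[["region", "hour", "count", "z_score"]])
```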

In sum, creating live maps of geo-tagged tweets is only a first step. Base-maps should be rapidly developed and overlaid with other datasets such as population and income distribution. Of course, these datasets are not always available, and accessing historical Twitter data can also be a challenge. The latter explains why Big Data Philanthropy for Disaster Response is so key.


Crowdsourcing Crisis Information from Syria: Twitter Firehose vs API

Over 400 million tweets are posted every day. But accessing 100% of these tweets (say, for disaster response purposes) requires access to Twitter’s “Firehose”. The latter, however, can be prohibitively expensive and also requires serious infrastructure to manage. This explains why many (all?) of us in the Crisis Computing & Humanitarian Technology space use Twitter’s “Streaming API” instead. But how representative are tweets sampled through the API vis-à-vis overall activity on Twitter? This important question is posed and answered in this new study, which uses Syria as a case study.

Tweets Syria

The analysis focused on “Tweets collected in the region around Syria during the period from December 14, 2011 to January 10, 2012.” The first dataset was collected via Firehose access while the second was sampled from the Streaming API. The tag clouds above (click to enlarge) display the most frequent terms found in each dataset. The hashtags and geoboxes used for the data collection are listed in the table below.

Syria List

The graph below shows the number of tweets collected between December 14, 2011 and January 10, 2012. This amounted to 528,592 tweets from the Streaming API and 1,280,344 tweets from the Firehose; on average, the API captured 43.5% of the tweets available on the Firehose. “One of the more interesting results in this dataset is that as the data in the Firehose spikes, the Streaming API coverage is reduced. One possible explanation for this phenomenon could be that due to the Western holidays observed at this time, activity on Twitter may have reduced causing the 1% threshold to go down.”

Syria Graph
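As a rough illustration of the coverage calculation above, daily counts from the two collections can be compared directly; the file and column names below are assumptions, and only the aggregate figures (528,592 vs 1,280,344 tweets, ~43.5% average coverage) come from the paper.

```python
import pandas as pd

# Assumed inputs: daily tweet counts from the two collections, each with a
# "date" index and a "count" column (file and column names are assumptions).
api = pd.read_csv("streaming_api_daily.csv", index_col="date")
firehose = pd.read_csv("firehose_daily.csv", index_col="date")

coverage = api["count"] / firehose["count"]
print(coverage.mean())                     # the paper reports ~43.5% on average
print(coverage.corr(firehose["count"]))    # negative if coverage drops as volume spikes
```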

The authors, Fred Morstatter, Jürgen Pfeffer, Huan Liu and Kathleen Carley, also carry out hashtag analysis using each dataset. “Here we see mixed results at small values of n [top hashtags], indicating that the Streaming data may not be good for finding the top hashtags. At larger values of n, we see that the Streaming API does a better job of estimating the top hashtags in the Firehose data.” In addition, the analysis reveals that the “Streaming API data does not consistently find the top hashtags, in some cases revealing reverse correlation with the Firehose data […]. This could be indicative of a filtering process in Twitter’s Streaming API which causes a misrepresentation of top hashtags in the data.”
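To make the top-hashtag comparison concrete, here is a simple overlap measure between the two datasets’ top-n hashtags; the paper uses more formal rank statistics, so this is only a rough stand-in.

```python
from collections import Counter

def top_n_overlap(streaming_tags, firehose_tags, n=20):
    """Share of the Firehose's top-n hashtags also found in the Streaming
    API's top-n (a rough overlap measure, not the paper's rank statistics)."""
    top_stream = {t for t, _ in Counter(streaming_tags).most_common(n)}
    top_fire = {t for t, _ in Counter(firehose_tags).most_common(n)}
    return len(top_stream & top_fire) / n

# Toy hashtag lists standing in for the two Syria collections.
print(top_n_overlap(["#syria", "#assad", "#homs"] * 3,
                    ["#syria", "#homs", "#damascus"] * 3, n=3))
```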

In terms of social network analysis, the authors were able to show that “50% to 60% of the top 100 key-players [can be identified] when creating the networks based on one day of Streaming API data.” Aggregating more days’ worth of data “can increase the accuracy substantially. For network level measures, first in-depth analysis revealed interesting correlation between network centralization indexes and the proportion of data covered by the Streaming API.”

Finally, the study also compares the geolocation of tweets. More specifically, the authors assess how the “geographic distribution of the geolocated tweets is affected by the sampling performed by the Streaming API. The number of geotagged tweets is low, with only 16,739 geotagged tweets in the Streaming data (3.17%) and 18,579 in the Firehose data (1.45%).” Still, the authors find that “despite the difference in tweets collected on the whole we get 90.10% coverage of geotagged tweets.”

In sum, the study finds that “the results of using the Streaming API depend strongly on the coverage and the type of analysis that the researcher wishes to perform. This leads to the next question concerning the estimation of how much data we actually get in a certain time period.” This is critical if researchers want to place their results in context and potentially apply statistical methods to account (and correct) for bias. The authors suggest that in some cases the Streaming API coverage can be estimated. In future research, they hope to “find methods to compensate for the biases in the Streaming API to provide a more accurate picture of Twitter activity to researchers.” In particular, they want to “determine whether the methodology presented here will yield similar results for Twitter data collected from other domains,” such as natural disasters, protests and elections.

The authors will present their paper at this year’s International Conference on Weblogs and Social Media (ICWSM), so I look forward to meeting them there to discuss related research we are carrying out at QCRI.


 See also: