Tag Archives: Twitter

Using AIDR to Collect and Analyze Tweets from Chile Earthquake

Wish you had a better way to make sense of Twitter during disasters than this?

Type in a keyword like #ChileEarthquake in Twitter’s search box above and you’ll see more tweets than you can possibly read in a day, let alone keep up with for more than a few minutes. Wish there were an easy, free and open source solution? Well, you’ve come to the right place. My team and I at QCRI are developing the Artificial Intelligence for Disaster Response (AIDR) platform to do just this. Here’s how it works:

First you login to the AIDR platform using your own Twitter handle (click images below to enlarge):

AIDR login

You’ll then see your collection of tweets (if you already have any). In my case, you’ll see I have three. The first is a collection of English language tweets related to the Chile Earthquake. The second is a collection of Spanish tweets. The third is a collection of more than 3,000,000 tweets related to the missing Malaysia Airlines plane. A preliminary analysis of these tweets is available here.

AIDR collections

Let’s look more closely at my Chile Earthquake 2014 collection (see below, click to enlarge). I’ve collected about a quarter of a million tweets in the past 30 hours or so. The label “Downloaded tweets (since last re-start)” simply refers to the number of tweets I’ve collected since adding a new keyword or hashtag to my collection. I started the collection yesterday at 5:39am my time (yes, I’m an early bird). Under “Keywords” you’ll see all the hashtags and keywords I’ve used to search for tweets related to the earthquake in Chile. I’ve also specified the geographic region I want to collect tweets from. Don’t worry, you don’t actually have to enter geographic coordinates when you set up your own collection; you simply highlight the area you’re interested in on the map and AIDR does the rest.

AIDR - Chile Earthquake 2014
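If you’re curious what that keyword-plus-region filtering boils down to, here is a minimal Python sketch. It works on simplified tweet dictionaries rather than Twitter’s full JSON, and the keywords and bounding box are made-up examples rather than AIDR’s actual internals:

```python
# Hedged sketch: the kind of keyword + bounding-box filter a collection applies.
# The tweets below are simplified dictionaries, not Twitter's full JSON.

KEYWORDS = {"#chileearthquake", "#terremotochile", "chile earthquake"}
# Illustrative bounding box for Chile: (min_lon, min_lat, max_lon, max_lat)
BOUNDING_BOX = (-76.0, -56.0, -66.0, -17.0)

def matches_keywords(text):
    """True if the tweet text contains any tracked keyword or hashtag."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

def inside_box(coordinates):
    """True if [longitude, latitude] coordinates fall inside the bounding box."""
    if coordinates is None:
        return False
    lon, lat = coordinates
    min_lon, min_lat, max_lon, max_lat = BOUNDING_BOX
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def keep(tweet):
    """Keep a tweet if it matches a keyword or is geo-tagged inside the region."""
    return matches_keywords(tweet["text"]) or inside_box(tweet.get("coordinates"))

sample = [
    {"text": "Strong shaking felt in Iquique #ChileEarthquake", "coordinates": None},
    {"text": "Just landed in Santiago", "coordinates": [-70.67, -33.45]},
    {"text": "Lovely morning in Doha", "coordinates": [51.53, 25.29]},
]
print([t["text"] for t in sample if keep(t)])  # keeps the first two tweets
```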

You’ll also note in the above screenshot that I’ve chosen to collect only tweets in English, but you can collect tweets in all languages or just a select few. Finally, the Collaborators section simply lists the colleagues I’ve added to my collection. This gives them the ability to add new keywords/hashtags and to download the tweets collected, as shown below (click to enlarge). More specifically, collaborators can download the most recent 100,000 tweets (and also share the link with others). The 100K tweet limit is based on Twitter’s Terms of Service (ToS). If collaborators want all the tweets, Twitter’s ToS allows for sharing the TweetIDs for an unlimited number of tweets.

AIDR download CSV
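As a rough illustration of that ToS-friendly sharing model, here is a small Python sketch (the file names and fields are illustrative, not AIDR’s actual export code): the most recent 100,000 tweets go out as a CSV, while the full collection is shared as bare TweetIDs.

```python
# Hedged sketch of the export rule described above: CSV for the most recent
# 100,000 tweets, plain TweetIDs (no limit) for the full collection.
import csv

TOS_CSV_LIMIT = 100_000

def export_collection(tweets, csv_path="recent_tweets.csv", ids_path="tweet_ids.txt"):
    """tweets: list of dicts with at least 'id' and 'text', ordered oldest to newest."""
    recent = tweets[-TOS_CSV_LIMIT:]  # only the most recent 100K go into the CSV
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "text"])
        writer.writeheader()
        writer.writerows({"id": t["id"], "text": t["text"]} for t in recent)
    # The complete collection can be shared as TweetIDs for later re-hydration.
    with open(ids_path, "w", encoding="utf-8") as f:
        f.writelines(f"{t['id']}\n" for t in tweets)

export_collection([
    {"id": 1, "text": "#ChileEarthquake strong aftershock just now"},
    {"id": 2, "text": "Power out in Arica #ChileEarthquake"},
])
```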

So that’s the AIDR Collector. We also have the AIDR Classifier, which helps you make sense of the tweets you’re collecting (in real-time). That is, your collection of tweets doesn’t stop, it continues growing, and as it does, you can make sense of new tweets as they come in. With the Classifier, you simply teach AIDR to classify tweets into whatever topics you’re interested in, like “Infrastructure Damage”, for example. To get started with the AIDR Classifier, simply return to the “Details” tab of our Chile collection. You’ll note the “Go To Classifier” button on the far right:

AIDR go to Classifier

Clicking on that button allows you to create a Classifier, say on the topic of disaster damage in general. So you simply create a name for your Classifier, in this case “Disaster Damage” and then create Tags to capture more details with respect to damage-related tweets. For example, one Tag might be, say, “Damage to Transportation Infrastructure.” Another could be “Building Damage.” In any event, once you’ve created your Classifier and corresponding tags, you click Submit and find your way to this page (click to enlarge):

AIDR Classifier Link

You’ll notice the public link for volunteers. That’s basically the interface you’ll use to teach AIDR. If you want to teach AIDR by yourself, you can certainly do so. You also have the option of “crowdsourcing the teaching” of AIDR. Clicking on the link will take you to the page below.

AIDR to MicroMappers

So, I called my Classifier “Message Contents,” which is not particularly insightful; I should have labeled it something like “Humanitarian Information Needs.” But bear with me and let’s click on that Classifier. This will take you to the following Clicker on MicroMappers:

MicroMappers Clicker

Now this is not the most awe-inspiring interface you’ve ever seen (at least I hope not); the reason being that this is simply our very first version. We’ll be providing different “skins” like the official MicroMappers skin (below), as well as a skin that allows you to upload your own logo, for example. In the meantime, note that AIDR shows every tweet to at least three different volunteers. Only if all three volunteers agree on how to classify a given tweet does AIDR take it into consideration when learning. In other words, AIDR wants to ensure that humans are really sure about how to classify a tweet before it decides to learn from that lesson. Incidentally, the MicroMappers smartphone apps for iPhone and Android will be available in the next few weeks. But I digress.

Yolanda TweetClicker4
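The agreement rule is simple enough to spell out in a few lines of Python. This is just a sketch of the idea (unanimity among three volunteers), not AIDR’s actual implementation:

```python
# Hedged sketch of the quality-control rule: a tweet only becomes a training
# example once three volunteers have tagged it and all three tags agree.
from collections import Counter

def unanimous_label(labels, required_votes=3):
    """Return the agreed tag if `required_votes` volunteers all gave it, else None."""
    if len(labels) < required_votes:
        return None  # not enough judgments yet
    tag, count = Counter(labels).most_common(1)[0]
    return tag if count == len(labels) else None

print(unanimous_label(["Building Damage", "Building Damage", "Building Damage"]))
print(unanimous_label(["Building Damage", "Other", "Building Damage"]))  # -> None
```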

As you and/or your volunteers classify tweets based on the Tags you created, AIDR starts to learn—hence the AI (Artificial Intelligence) in AIDR. AIDR begins to recognize that all the tweets you classified as “Infrastructure Damage” are indeed similar. Once you’ve tagged enough tweets, AIDR will decide that it’s time to leave the nest and fly on its own. In other words, it will start to auto-classify incoming tweets in real-time. (At present, AIDR can auto-classify some 30,000 tweets per minute; compare this to the peak rate of 16,000 tweets per minute observed during Hurricane Sandy.)
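To make the learning step a little more concrete, here is a minimal sketch using scikit-learn as a stand-in for AIDR’s own classifier; the labeled tweets are invented examples:

```python
# Hedged sketch of the learning step, with scikit-learn standing in for
# AIDR's internal classifier. The labeled tweets are made-up examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled_tweets = [
    ("Bridge collapsed on the coastal highway near Iquique", "Infrastructure Damage"),
    ("Several buildings cracked downtown, walls down", "Infrastructure Damage"),
    ("Praying for everyone affected by the quake", "Other"),
    ("So relieved my family is safe", "Other"),
]
texts, tags = zip(*labeled_tweets)

# Bag-of-words features plus a linear classifier: simple and fast, but enough
# to illustrate how human tags become an automatic filter.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, tags)

print(model.predict(["Overpass damaged, road to airport blocked"]))
```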

Of course, AIDR’s first solo “flights” won’t always go smoothly. But not to worry, AIDR will let you know when it needs a little help. Every tweet that AIDR auto-tags comes with a Confidence level. That is, AIDR will let you know: “I am 80% sure that I correctly classified this tweet.” If AIDR has trouble with a tweet, i.e., if its confidence level is 65% or below, it will send the tweet to you (and/or your volunteers) so it can learn from how you classify that particular tweet. In other words, the more tweets you classify, the more AIDR learns, and the higher AIDR’s confidence levels get. Fun, huh?
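The confidence-based routing looks roughly like this, assuming a trained scikit-learn-style model such as the one sketched above. The 65% threshold comes from the paragraph above; everything else is illustrative:

```python
# Hedged sketch of confidence-based routing: auto-tag confident predictions,
# send uncertain ones back to human volunteers so the model can keep learning.
CONFIDENCE_THRESHOLD = 0.65

def route_tweet(model, text):
    """Return an auto-tag if confident enough, otherwise flag for human tagging."""
    probabilities = model.predict_proba([text])[0]
    best_index = probabilities.argmax()
    confidence = float(probabilities[best_index])
    tag = model.classes_[best_index]
    if confidence > CONFIDENCE_THRESHOLD:
        return {"tag": tag, "confidence": round(confidence, 2)}
    # Low confidence: queue for human tagging so the classifier can learn from it.
    return {"tag": None, "needs_human": True, "confidence": round(confidence, 2)}
```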

To view the results of the machine tagging, simply click on the View/Download tab, as shown below (click to enlarge). The page shows you the latest tweets that have been auto-tagged along with the Tag label and the confidence score. (Yes, this too is the first version of that interface, we’ll make it more user-friendly in the future, not to worry). In any event, you can download the auto-tagged tweets in a CSV file and also share the download link with your colleagues for analysis and so on. At some point in the future, we hope to provide a simple data visualization output page so that you can easily see interesting data trends.

AIDR Results

So that’s basically all there is to it. If you want to learn more about how it all works, you might fancy reading this research paper (PDF). In the meantime, I’ll simply add that you can re-use your Classifiers. If (when?) another earthquake strikes Chile, you won’t have to start from scratch. You can auto-tag incoming tweets immediately with the Classifier you already have. Plus, you’ll be able to share your classifiers with your colleagues and partner organizations if you like. In other words, we’re envisaging an “App Store” of Classifiers based on different hazards and different countries. The more we re-use our Classifiers, the more accurate they will become. Everybody wins.

And voila, that is AIDR (at least our first version). If you’d like to test the platform and/or want the tweets from the Chile Earthquake, simply get in touch!


Note:

  • We’re adapting AIDR so that it can also classify text messages (SMS).
  • AIDR Classifiers are language specific. So if you speak Spanish, you can create a classifier to tag all Spanish language tweets/SMS that refer to disaster damage, for example. In other words, AIDR does not only speak English : )

Analyzing Tweets on Malaysia Flight #MH370

My QCRI colleague Dr. Imran is using our AIDR platform (Artificial Intelligence for Disaster Response) to collect & analyze tweets related to Malaysia Flight 370, which went missing several days ago. He has collected well over 850,000 English-language tweets since March 11th, using the following keywords/hashtags: Malaysia Airlines flight, #MH370, #PrayForMH370 and #MalaysiaAirlines.

MH370 Prayers

Imran then used AIDR to create a number of “machine learning classifiers” to automatically classify all incoming tweets into categories that he is interested in:

  • Informative: tweets that relay breaking news, useful info, etc

  • Praying: tweets that are related to prayers and faith

  • Personal: tweets that express personal opinions

The process is super simple. All he does is tag several dozen incoming tweets into their respective categories. This teaches AIDR what an “Informative” tweet should “look like”. Since our novel approach combines human intelligence with artificial intelligence, AIDR is typically far more accurate at capturing relevant tweets than Twitter’s keyword search.

And the more tweets that Imran tags, the more accurate AIDR gets. At present, AIDR can auto-classify ~500 tweets per second, or 30,000 tweets per minute. This is well above the highest velocity of crisis tweets recorded thus far—16,000 tweets/minute during Hurricane Sandy.

The graph below depicts the volume of tweets per day since we started the AIDR collection, i.e., March 11th.

Volume of Tweets per Day

This series of pie charts simply reflects the relative share of tweets per category over the past four days.

Tweets Trends
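For those curious, the per-category shares behind charts like these can be computed in a few lines of pandas. The counts below are invented, just to show the shape of the computation:

```python
# Hedged sketch of the analysis behind the pie charts: the share of
# auto-classified tweets per category per day. The counts are invented.
import pandas as pd

tagged = pd.DataFrame({
    "date": ["2014-03-11", "2014-03-11", "2014-03-11", "2014-03-12", "2014-03-12"],
    "category": ["Informative", "Praying", "Personal", "Informative", "Praying"],
    "count": [4200, 9800, 3100, 3900, 11500],
})

# Relative share of each category within each day, as percentages.
# Categories with no tweets on a given day show up as NaN.
shares = (tagged.pivot_table(index="date", columns="category",
                             values="count", aggfunc="sum")
                .pipe(lambda df: df.div(df.sum(axis=1), axis=0) * 100)
                .round(1))
print(shares)
```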

Below are some of the tweets that AIDR has automatically classified as being Informative (click to enlarge). The “Confidence” score simply reflects how confident AIDR is that it has correctly auto-classified a tweet. Note that Imran could also have crowdsourced the manual tagging—that is, he could have crowdsourced the process of teaching AIDR. To learn more about how AIDR works, please see this short overview and this research paper (PDF).

AIDR output

If you’re interested in testing AIDR (still very much under development) and/or would like the Tweet ID’s for the 850,000+ tweets we’ve collected using AIDR, then feel free to contact me. In the meantime, we’ll start a classifier that auto-collects tweets related to hijacking, criminal causes, and so on. If you’d like us to create a classifier for a different topic, let us know—but we can’t make any promises since we’re working on an important project deadline. When we’re further along with the development of AIDR, anyone will be able to easily collect & download tweets and create & share their own classifiers for events related to humanitarian issues.


Acknowledgements: Many thanks to Imran for collecting and classifying the tweets. Imran also shared the graphs and tabular output that appears above.

Inferring International and Internal Migration Patterns from Twitter

My QCRI colleagues Kiran Garimella and Ingmar Weber recently co-authored an important study on migration patterns discerned from Twitter. The study was co-authored with Bogdan State (Stanford) and lead author Emilio Zagheni (CUNY). The authors analyzed 500,000 Twitter users based in OECD countries between May 2011 and April 2013. Since Twitter users are not representative of the OECD population, the study uses a “difference-in-differences” approach to reduce selection bias when estimating out-migration rates for individual countries. The paper is available here and key insights & results are summarized below.

Twitter Migration
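Schematically, and in my own notation rather than the paper’s, the “difference-in-differences” idea is to work with changes in rates rather than levels, so that a country-specific, time-invariant selection bias cancels out:

```latex
% Schematic only -- my notation, not the estimator exactly as specified in the paper.
% r_{c,t}: out-migration rate for country c in period t, estimated from the Twitter
% sample; \rho_{c,t}: the true rate; b_c: a country-specific selection bias assumed
% constant over time.
\[
  r_{c,t} = \rho_{c,t} + b_c
  \quad\Longrightarrow\quad
  r_{c,t+1} - r_{c,t} = \rho_{c,t+1} - \rho_{c,t}
\]
% Comparing these first differences across countries and periods (a difference in
% differences) tracks genuine migration trends even though the level of r_{c,t}
% itself is biased by who selects into Twitter.
```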

To better understand the demographic characteristics of the Twitter users under study, the authors used face recognition software (Face++) to estimate both the gender and age of users based on their profile pictures. “Face++ uses computer vision and data mining techniques applied to a large database of celebrities to generate estimates of age and sex of individuals from their pictures.” The results are depicted below (click to enlarge). Naturally, there is an important degree of uncertainty about estimates for single individuals. “However, when the data is aggregated, as we did in the population pyramid, the uncertainty is substantially reduced, as overestimates and underestimates of age should cancel each other out.” One important limitation is that age estimates may still be biased if users upload younger pictures of themselves, which would result in underestimating the age of the sample population. This is why other methods to infer age (and gender) should also be applied.

Twitter Migration 3

I’m particularly interested in the bias-correction “difference-in-differences” method used in this study, which demonstrates that one can still extract meaningful information about trends even though the underlying data does not constitute a representative sample and therefore does not support standard statistical inference. Applying this method yields the following results (click to enlarge):

Twitter Migration 2

The above graph reveals a number of interesting insights. For example, one can observe a decline in out-migration rates from Mexico to other countries, which is consistent with recent estimates from Pew Research Center. Meanwhile, in Southern Europe, the results show that out-migration flows continue to increase for  countries that were/are hit hard by the economic crisis, like Greece.

The results of this study suggest that such methods can be used to “predict turning points in migration trends, which are particularly relevant for migration forecasting.” In addition, the results indicate that “geolocated Twitter data can substantially improve our understanding of the relationships between internal and international migration.” Furthermore, since the study relies on publicly available, real-time data, this approach could also be used to monitor migration trends on an ongoing basis.

To what extent the above is feasible remains to be seen. Very recent mobility data from official statistics are simply not available to more closely calibrate and validate the study’s results. In any event, this study is an important step towards addressing a central question that humanitarian organizations are also asking: how can we make statistical inferences from online data when ground-truth data is unavailable as a reference?

I asked Emilio whether techniques like “difference-in-differences” could be used to monitor forced migration. As he noted, there is typically little to no ground truth data available in humanitarian crises. He thus believes that their approach is potentially relevant to evaluate forced migration. That said, he is quick to caution against making generalizations. Their study focused on OECD countries, which represent relatively large samples and high Internet diffusion, which means low selection bias. In contrast, data samples for humanitarian crises tend to be far smaller and highly selected. This means that filtering out the bias may prove more difficult. I hope that this is a challenge that Emilio and his co-authors choose to take on in the near future.


Typhoon Yolanda: UN Needs Your Help Tagging Crisis Tweets for Disaster Response (Updated)

Final Update 14 [Nov 13th @ 4pm London]: Thank you for clicking to support the UN’s relief operations in the Philippines! We have now completed our mission as digital humanitarian volunteers. The early results of our collective online efforts are described here. Thank you for caring and clicking. Feel free to join our list-serve if you want to be notified when humanitarian organizations need your help again during the next disaster—which we really hope won’t be for a long, long time. In the meantime, our hearts and prayers go out to those affected by this devastating Typhoon.

-

The United Nations Office for the Coordination of Humanitarian Affairs (OCHA) just activated the Digital Humanitarian Network (DHN) in response to Typhoon Yolanda, which has already been described as possibly one of the strongest Category 5 storms in history. The Standby Volunteer Task Force (SBTF) was thus activated by the DHN to carry out a rapid needs & damage assessment by tagging reports posted to social media. So Ji Lucas and I at QCRI (+ Hemant & Andrew) and Justine Mackinnon from SBTF have launched MicroMappers to microtask the tagging of tweets & images. We need all the help we can get given the volume we’ve collected (and are continuing to collect). This is where you come in!

TweetClicker_PH2

You don’t need any prior experience or training, nor do you need to create an account or even login to use the MicroMappers TweetClicker. If you can read and use a computer mouse, then you’re all set to be a Digital Humanitarian! Just click here to get started. Every tweet will get tagged by 3 different volunteers (to ensure quality control) and those tweets that get identical tags will be shared with our UN colleagues in the Philippines. All this and more is explained in the link above, which will give you a quick intro so you can get started right away. Our UN colleagues need these tags to better understand who needs help and what areas have been affected.

ImageClicker YolandaPH

It only takes 3 seconds to tag a tweet or image, so if that’s all the time you have then that’s plenty! Better yet, share this link with family, friends and colleagues and invite them to tag along. We have also launched the ImageClicker to tag images by level of damage. So please stay tuned. What we need is the World Wide Crowd to mobilize in support of those affected by this devastating disaster. So please spread the word. And keep in mind that this is only the second time we’re using MicroMappers, so we know it is not (yet) perfect : ) Thank you!


p.s. If you wish to receive an alert next time MicroMappers is activated for disaster response, then please join the MicroMappers list-serve here. Thanks!

Previous updates:

Update 1: If you notice that all the tweets (tasks) have been completed, then please check back in 1/2 hour as we’re uploading more tweets on the fly. Thanks!

Update 2: Thanks for all your help! We are getting lots of traffic, so the Clicker is responding very slowly right now. We’re working on improving speed, thanks for your patience!

Update 3: We collected 182,000+ tweets on Friday from 5am-7pm (local time) and have automatically filtered this down to 35,175 tweets based on relevancy and uniqueness. These 35K tweets are being uploaded to the TweetClicker a few thousand tweets at a time. We’ll be repeating all this for just one more day tomorrow (Saturday). Thanks for your continued support!

Update 4: We/you have clicked through all of Friday’s 35K tweets and currently clicking through today’s 28,202 tweets, which we are about 75% of the way through. Many thanks for tagging along with us, please keep up the top class clicking, we’re almost there! (Sunday, 1pm NY time)

Update 5: Thanks for all your help! We’ll be uploading more tweets tomorrow (Monday, November 11th). To be notified, simply join this list-serve. Thanks again! [updated post on Sunday, November 10th at 5.30pm New York]

Update 6: We’ve uploaded more tweets! This is the final stretch, thanks for helping us on this last sprint of clicks!  Feel free to join our list-serve if you want to be notified when new tweets are available, many thanks! If the system says all tweets have been completed, please check again in 1/2hr as we are uploading new tweets around the clock. [updated Monday, November 11th at 9am London]

Update 7 [Nov 11th @ 1pm London]: We’ve just launched the ImageClicker to support the UN’s relief efforts. So please join us in tagging images to provide rapid damage assessments to our humanitarian partners. Our TweetClicker is still in need of your clicks too. If the Clickers are slow, then kindly be patient. If all the tasks are done, please come back in 1/2hr as we’re uploading content to both clickers around the clock. Thanks for caring and helping the relief efforts. An update on the overall digital humanitarian effort is available here.

Update 8 [Nov 11th @ 6.30pm NY]: We’ll be uploading more tweets and images to the TweetClicker & ImageClicker by 7am London on Nov 12th. Thank you very much for supporting these digital humanitarian efforts, the results of which are displayed here. Feel free to join our list-serve if you want to be notified when the Clickers have been fed!

Update 9 [Nov 12th @ 6.30am London]: We’ve fed both our TweetClicker and ImageClicker with new tweets and images. So please join us in clicking away to provide our UN partners with the situational awareness they need to coordinate their important relief efforts on the ground. The results of all our clicks are displayed here. Thank you for helping and for caring. If the Clickers are empty or offline temporarily, please check back again soon for more clicks.

Update 10 [Nov 12th @ 10am New York]: We’re continuing to feed both our TweetClicker and ImageClicker with new tweets and images. So please join us in clicking away to provide our UN partners with the situational awareness they need to coordinate their important relief efforts on the ground. The results of all our clicks are displayed here. Thank you for helping and for caring. If the Clickers are empty or offline temporarily, please check back again soon for more clicks. Try different browsers if the tweets/images are not showing up.

Update 11 [Nov 12th @ 5pm New York]: Only one more day to go! We’ll be feeding our TweetClicker and ImageClicker with new tweets and images by 7am London on the 13th. We will phase out operations by 2pm London, so this is the final sprint. The results of all our clicks are displayed here. Thank you for helping and for caring. If the Clickers are empty or offline temporarily, please check back again soon for more clicks. Try different browsers if the tweets/images are not showing up.

Update 12 [Nov 13th @ 9am London]: This is the last stretch, Clickers! We’ve fed our TweetClicker and ImageClicker with new tweets and images. We’ll be refilling them until 2pm London (10pm Manila) and phasing out shortly thereafter. Given that MicroMappers is still under development, we are pleased that this deployment went so well considering. The results of all our clicks are displayed here. Thank you for helping and for caring. If the Clickers are empty or offline temporarily, please check back again soon for more clicks. Try different browsers if the tweets/images are not showing up.

Update 13 [Nov 13th @ 11am London]: Just 3 hours left! Our UN OCHA colleagues have just asked us to prioritize the ImageClicker, so please focus on that Clicker. We’ll be refilling the ImageClicker until 2pm London (10pm Manila) and phasing out shortly thereafter. Given that MicroMappers is still under development, we are pleased that this deployment went so well considering. The results of all our clicks are displayed here. Thank you for helping and for caring. If the ImageClicker is empty or offline temporarily, please check back again soon for more clicks. Try different browsers if images are not showing up.

Social Media, Disaster Response and the Streetlight Effect

A police officer sees a man searching for his coin under a streetlight. After helping for several minutes, the exasperated officer asks if the man is sure that he lost his coin there. The man says “No, I lost it in the park a few blocks down the street.” The incredulous officer asks why he’s searching under the streetlight. The man replies, “Well, this is where the light is.”[1] This parable describes the “streetlight effect,” the observational bias that results from using the easiest way to collect information. The streetlight effect is one of the most important criticisms leveled against the use of social media for emergency management. It is certainly a valid concern, but one that needs to be placed into context.

muttjeff01
I had the honor of speaking on a UN panel with Hans Rosling in New York last year. During the Q&A, Hans showed Member States a map of cell phone coverage in the Democratic Republic of the Congo (DRC). The map was striking. Barely 10% of the country seemed to have coverage. This one map shut down the entire conversation about the value of mobile technology for data collection during disasters. Now, what Hans didn’t show was a map of the DRC’s population distribution, which reveals that the majority of the country’s population lives in urban areas; areas that have cell phone coverage. Hans’s map was also static and thus did not convey the fact that the number of cell phone subscribers increased by roughly 50% in the year leading up to the panel and ~50% again the year after.

Of course, the number of social media users in the DRC is far, far lower than the country’s 12.4 million unique cell phone subscribers. The map below, for example, shows the location of Twitter users over a 10 day period in October 2013. Now keep in mind that only 2% of users actually geo-tag their tweets. Also, as my colleague Kalev Leetaru recently discovered, the correlation between the location of Twitter users and access to electricity is very high, which means that every place on Earth that is electrified has a high probability of having some level of Twitter activity. Furthermore, Twitter was only launched 7 years ago compared to the first cell phone, which was built 30 years ago. So these are still early days for Twitter. But that doesn’t change the fact that there is clearly very little Twitter traffic in the DRC today. And just like the man in the parable above, we only have access to answers where an “electrified tweet” exists (if we restrict ourselves to the Twitter streetlight).

DRC twitter map 2

But this begs the following question, which is almost always overlooked: too little traffic for what? This study by Harvard colleagues, for example, found that Twitter was faster than (and as accurate as) official sources at detecting the start and early progress of cholera after the 2010 earthquake. And yet, the corresponding Twitter map of Haiti does not show significantly more activity than the DRC map over the same 10-day period. Keep in mind there were far fewer Twitter users in Haiti four years ago (i.e., before the earthquake). Other researchers have recently shown that “micro-crises” can also be detected via Twitter even though said crises elicit very few tweets by definition. More on that here.

Haiti twitter map

But why limit ourselves to the Twitter streetlight? Only a handful of “puzzle pieces” in our Haiti jigsaw may be tweets, but that doesn’t mean they can’t complement other pieces taken from traditional datasets and even other social media channels. Remember that there are five times more Facebook users than Twitter users. In certain contexts, however, social media may be of zero added value. I’ve reiterated this point in recent talks at the Council on Foreign Relations and the UN. Social media is forming a new “nervous system” for our planet, but one that is still very young, even premature in places and certainly imperfect in representation. Then again, so was 911 in the 1970s and 1980s, as explained here. In any event, focusing on more developed parts of the system (like Indonesia’s Twitter footprint below) makes more sense for some questions, as does complementing this new nervous system with other more mature data sources such as mainstream media via GDELT, as advocated here.

Indonesia twitter map2

The Twitter map of the Manila area below is also the result of 10-day traffic. While “only” ~12 million Filipinos (13% of the country) live in Manila, it behooves us to remember that urban populations across the world are booming. In just over 2,000 days, more than half of the population in the world’s developing regions will be living in urban areas, according to the UN. Meanwhile, the rural population of developing countries will decline by half-a-billion in the coming decades. At the same time, these rural populations will also grow a larger social media footprint since mobile phone penetration rates already stand at 89% in developing countries according to the latest ITU study (PDF). With Google and Facebook making it their (for-profit) mission to connect those off the digital grid, it is only a matter of time until very rural communities get online and click on ads.

Manila twitter map

The radical increase in population density means that urban areas will become even more vulnerable to major disasters (hence the Rockefeller Foundation’s program on 100 Resilient Cities). To be sure, as Rousseau noted in a letter to Voltaire after the massive 1755 Lisbon Earthquake, “an earthquake occurring in wilderness would not be important to society.” In other words, disaster risk is a function of population density. At the same time, however, a denser population also means more proverbial streetlights. But just as we don’t need a high density of streetlights to find our way at night, we hardly need everyone to be on social media for tweets and Instagram pictures to shed some light during disasters and facilitate self-organized disaster response at the grassroots level.

Credit: Heidi RYDER Photography

My good friend Jaroslav Valůch recounted a recent conversation he had with an old fireman in a very small town in Eastern Europe who had never heard of Twitter, Facebook or crowdsourcing. The old man said: “During crisis, for us, the firemen, it is like having a dark house where only some rooms are lit (i.e., information from mayors and other official local sources in villages and cities affected). What you do [with social media and crowdsourcing], is that you are lighting up more rooms for us. So don’t worry, it is enough.”

No doubt Hans Rosling will show another dramatic map if I happen to sit on another panel with him. But this time I’ll offer context so that instead of ending the discussion, his map will hopefully catalyze a more informed debate. In any event, I suspect (and hope that) Hans won’t be the only one objecting to my optimism in this blog post. So as always, I welcome feedback from iRevolution readers. And as my colleague Andrew Zolli is fond of reminding folks at PopTech:

“Be tough on ideas, gentle on people.”


Automatically Identifying Eyewitness Reporters on Twitter During Disasters

My colleague Kate Starbird recently shared a very neat study entitled “Learning from the Crowd: Collaborative Filtering Techniques for Identifying On-the-Ground Twitterers during Mass Disruptions” (PDF). As she and her co-authors rightly argue, “most Twitter activity during mass disruption events is generated by the remote crowd.” So can we use advanced computing to rapidly identify Twitter users who are reporting from ground zero? The answer is yes.

twitter-disaster-test

An important indicator of whether or not a Twitter user is reporting from the scene of a crisis is the number of times they are retweeted. During the Egyptian revolution in early 2011, “nearly 30% of highly retweeted Twitter users were physically present at those protest events.” Kate et al. drew on this insight to study tweets posted during the Occupy Wall Street (OWS) protests in September 2011. The authors manually analyzed a sample of more than 2,300 Twitter users to determine which were tweeting from the protests. They found that 4.5% of Twitter users in their sample were actually onsite. Using this dataset as training data, Kate et al. were able to develop a classifier that can automatically identify Twitter users reporting from the protests with an accuracy of just shy of 70%. I expect that more training data could very well help increase this accuracy score. 

In any event, “the information resulting from this or any filtering technique must be further combined with human judgment to assess its accuracy.” As the authors rightly note, “this ‘limitation’ fits well within an information space that is witnessing the rise of digital volunteer communities who monitor multiple data sources, including social media, looking to identify and amplify new information coming from the ground.” To be sure, “For volunteers like these, the use of techniques that increase the signal to noise ratio in the data has the potential to drastically reduce the amount of work they must do. The model that we have outlined does not result in perfect classification, but it does increase this signal-to-noise ratio substantially—tripling it in fact.”
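To give a flavor of the setup, and only as a sketch: the code below is not the collaborative-filtering approach used by Kate et al., just a toy classifier over a few hypothetical per-user features (how often the user is retweeted, geo-tagged posts near the event, mentions of local place names), trained on the kind of manually coded labels the authors produced:

```python
# Hedged sketch of the general setup, NOT the collaborative-filtering features
# from the Starbird et al. paper: separate on-the-ground accounts (label 1)
# from the remote crowd (label 0) using a few hypothetical per-user features.
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features per Twitter user:
# [times_retweeted, geotagged_posts_near_event, mentions_of_local_places]
X = [
    [350, 12, 9],   # manually coded as on the ground
    [280,  8, 6],
    [  5,  0, 0],   # manually coded as remote crowd
    [ 12,  0, 1],
    [  2,  0, 0],
]
y = [1, 1, 0, 0, 0]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([[300, 10, 7]]))  # likely flagged as on the ground
```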

I really hope that someone will leverage Kate’s important work to develop a standalone platform that automatically generates a list of Twitter users who are reporting from disaster-affected areas. This would be a very worthwhile contribution to the ecosystem of next-generation humanitarian technologies. In the meantime, perhaps QCRI’s Artificial Intelligence for Disaster Response (AIDR) platform will help digital humanitarians automatically identify tweets posted by eyewitnesses. I’m optimistic since we were able to create a machine learning classifier with an accuracy of 80%-90% for eyewitness tweets. More on this in our recent study.

MOchin - talked to family

One question remains: how do we automatically identify tweets like the one above? This person is not an eyewitness but was likely on the phone with her family, who are closer to the action. How do we develop a classifier to catch these “second-hand” eyewitness reports?


Hashtag Analysis of #Westgate Crisis Tweets

In July 2013, my team and I at QCRI launched this dashboard to analyze hashtags used by Twitter users during crises. Our first case study, which is available here, focused on Hurricane Sandy. Since then, both the UN and Greenpeace have also made use of the dashboard to analyze crisis tweets.

QCRI_Dashboard

We just uploaded 700,000+ Westgate related tweets to the dashboard. The results are available here and also displayed above. The dashboard is still under development, so we very much welcome feedback on how to improve it for future analysis. You can upload your own tweets to the dashboard if you’d like to test drive the platform.
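At its core, this kind of hashtag analysis starts with something as simple as counting hashtag frequencies across the collection. A minimal sketch with made-up tweets, quite separate from the dashboard’s actual code:

```python
# Hedged sketch of the core of a hashtag analysis: count how often each
# hashtag appears across a collection of tweet texts.
from collections import Counter
import re

HASHTAG_PATTERN = re.compile(r"#\w+")

def hashtag_counts(texts):
    """Return hashtags (lower-cased) ranked by how often they appear."""
    counter = Counter()
    for text in texts:
        counter.update(tag.lower() for tag in HASHTAG_PATTERN.findall(text))
    return counter

tweets = [
    "Thoughts with everyone at #Westgate #Nairobi",
    "Blood donations needed urgently #Westgate #WeAreOne",
    "#WeAreOne Kenya stands together",
]
print(hashtag_counts(tweets).most_common(3))
```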


See also: Forensics Analysis of #Westgate Tweets (Link)

Forensics Analysis of #Westgate Tweets (Updated)

Update 1: Our original Twitter collection of Westgate-related tweets included the following hashtags: #Kenya, #Nairobi #WestgateAttack, #WestagateMall, #WestgatemallAttack, #Westgateshootout & #Westgate. While we overlooked #Westlands and Westlands, we have just fixed the oversight. This explains why the original results below differed from the iHub’s analysis which was based on tweets with the keywords Westgate and Westlands.

Update 2: The list below of first tweets to report the attack has been updated to include tweets referring to Westlands. These are denoted by an asterisk (*). 

I’m carrying out some preliminary “information forensics” research on the 740,000+ tweets posted during the Westgate attack. More specifically, I’m looking for any clues in the hours leading up to the attack that may reveal something out of the ordinary prior to the siege. Other questions I’m hoping to answer: Were any tweets posted during the crisis actionable? Did they add situational awareness? What kind of multimedia content was shared? Which tweets were posted by eyewitnesses? Were any tweets posted by the attackers or their supporters? If so, did these carry tactical information?

Screen Shot 2013-10-03 at 4.20.23 AM

If you have additional suggestions on what else to search for, please feel free to post them in the comments section below, thank you very much. I’ll be working with QCRI research assistants over the next few weeks to dive deeper into the first 24 hours of the attack as reported on Twitter. This research would not be possible were it not for my colleagues at GNIP, who very kindly granted me access to their platform to download all the tweets. I’ve just reviewed the first hour of tweets (which proved to be highly emotional, as expected). Below are the very first tweets posted about the attack.

[12:38:20 local time]*
gun shots in westlands? wtf??

[12:41:49]*
Weird gunshot like sounds in westlands : (

[12:42:35]
Explosions and gunfight ongoing in #nairobi 

[12:42:38]
Something really bad goin on at #Westgate. Gunshots!!!! Everyone’s fled. 

[12:43:17] *
Somewhere behind Westlands? What’s up RT @[username]: Explosions and gunfight ongoing in #nairobi

[12:44:03]
Are these gunshots at #Westgate? Just heard shooting from the road behind sarit, sounded like it was coming from westgate 

[12:44:37]*
@[username] shoot out at westgate westlands mall. going on for the last 10 min

[12:44:38]
Heavily armed thugs have taken over #WestGate shopping mall. Al occupants and shoppers are on the floor. Few gunshots heard…more to follow 

[12:44:51]*
did anyone else in westlands hear that? #KOT #Nairobi 

[12:45:04]
Seems like explosions and small arms fire are coming from Westlands or Gigiri #nairobi 

[12:46:12]
Gun fight #westgate… @ntvkenya @KTNKenya @citizentvkenya any news… 

[12:46:44]*
Several explosions followed by 10 minutes of running gunfight in Nairobi westlands

[12:46:59]
Small arms fire is continuing to be exchanged intermittently. #nairobi

[12:46:59]
Something’s going on around #Westgate #UkayCentre area. Keep away if you can

[12:47:54]
Gunshots and explosions heard around #Westgate anybody nearby? #Westlands

[12:48:33]*
@KenyaRedCross explosions and gunshots heard near Westgate Mall in Westlands. Fierce shoot out..casualties probable

[12:48:36]
Shoot to kill order #westgate

See also:

  • We Are Kenya: Global Map of #Westgate Tweets [Link]
  • Did Terrorists Use Twitter to Increase Situational Awareness? [Link]
  • Analyzing Tweets Posted During Mumbai Terrorist Attacks [Link]
  • Web 2.0 Tracks Attacks on Mumbai [Link]


We Are Kenya: Global Map of #Westgate Tweets

I spent over an hour trying to write this first paragraph last week and still don’t know where to start. I grew up in Nairobi, my parents lived in Kenya for more than 15 years, their house was 5 minutes from Westgate, my brother’s partner is Kenyan and I previously worked for Ushahidi, a Kenyan not-for-profit group. Witnessing the tragedy online as it unfolded in real-time, graphic pictures and all, was traumatic;  I did not know the fate of several friends right away. This raw anxiety brought back memories from the devastating Haiti Earthquake of 2010; it took 12 long hours until I got word that my wife and friends had just made it out of a crumbling building.

WeAreKenya

What to do with this most recent experience and the pain that lingers? Amongst the graphic Westgate horror unfolding via Twitter, I also witnessed the outpouring of love, support and care; the offers of help from Kenyans and Somalis alike; collective grieving, disbelief and deep sadness; the will to remain strong, to overcome, to be united in support of the victims, their families and friends. So I reached out to several friends in Nairobi to ask them if aggregating and surfacing these tweets publicly could serve as a positive testament. They all said yes.

I therefore contacted colleagues at GNIP who kindly let me use their platform to collect more than 740,000 tweets related to the tragedy, starting from several hours before the horror began until the end of the siege. I then reached out to friends Claudia Perlich (data scientist) and Jer Thorp (data artist) for their help on this personal project. They both kindly agreed to lend their expertise. Claudia quickly put together the map above based on the location of Twitter users responding to the events in Nairobi (click map to enlarge). The graph below depicts where Twitter users covering the Westgate tragedy were tweeting from during the first 35 hours or so.

Westgate Continents

Westgate Table Continents

We also did some preliminary content analysis of selected keywords. The graph below displays the frequency of the terms “We Are One,” “Blood Appeal / Blood Donations,” and “Pray / Prayers” during the four day siege (click to enlarge).

Kenya We Are One
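A rough sketch of that kind of keyword-frequency analysis, using pandas and a handful of invented tweets (the real analysis ran over the full 740,000+ tweets):

```python
# Hedged sketch of the keyword-frequency analysis: count how often a few
# phrases appear per hour. The timestamps and texts are invented examples.
import pandas as pd

TERMS = ["we are one", "blood", "pray"]

tweets = pd.DataFrame({
    "time": pd.to_datetime(["2013-09-21 13:05", "2013-09-21 13:40",
                            "2013-09-21 14:10", "2013-09-22 09:00"]),
    "text": ["Pray for Nairobi", "Blood appeal at Kenyatta Hospital",
             "#WeAreOne we are one Kenya", "Please pray and donate blood"],
})

# Flag which terms each tweet mentions, then count mentions per hour.
for term in TERMS:
    tweets[term] = tweets["text"].str.lower().str.contains(term)

hourly = tweets.set_index("time")[TERMS].resample("1h").sum()
print(hourly)
```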

Jer suggested (thankfully) a more compelling and elegant data visualization approach, which we are exploring this week. So we hope to share some initial visuals in the coming days. If you have any specific suggestions on other ways to analyze and visualize the data, please do share them in the comments section below, thank you. 


See also: Forensics Analysis of #Westgate Tweets [Link]

AIDR: Artificial Intelligence for Disaster Response

Social media platforms are increasingly used to communicate crisis information when major disasters strike. Hence the rise of Big (Crisis) Data. Humanitarian organizations, digital humanitarians and disaster-affected communities know that some of this user-generated content can increase situational awareness. The challenge is to identify relevant and actionable content in near real-time to triangulate with other sources and make more informed decisions on the spot. Finding potentially life-saving information in this growing stack of Big Crisis Data, however, is like looking for the proverbial needle in a giant haystack. This is why my team and I at QCRI are developing AIDR.

haystpic_pic

The free and open source Artificial Intelligence for Disaster Response platform leverages machine learning to automatically identify informative content on Twitter during disasters. Unlike the vast majority of related platforms out there, we go beyond simple keyword search to filter for informative content. Why? Because recent research shows that keyword searches can miss over 50% of relevant content posted on Twitter. This is very far from optimal for emergency response. Furthermore, tweets captured via keyword search may not be relevant since words can have multiple meanings depending on context. Finally, keywords are restricted to one language only. Machine learning overcomes all these limitations, which is why we’re developing AIDR.

So how does AIDR work? There are three components of AIDR: the Collector, Trainer and Tagger. The Collector simply allows you to collect and save a collection of tweets posted during a disaster. You can download these tweets for analysis at any time and also use them to create an automated filter using machine learning, which is where the Trainer and Tagger come in. The Trainer allows one or more users to train the AIDR platform to automatically tag tweets of interest in a given collection of tweets. Tweets of interest could include those that refer to “Needs”, “Infrastructure Damage” or “Rumors” for example.

AIDR_Collector

A user creates a Trainer for tweets-of-interest by: 1) Creating a name for their Trainer, e.g., “My Trainer”; 2) Identifying topics of interest such as “Needs”, “Infrastructure Damage”, “Rumors” etc. (as many topics as the user wants); and 3) Classifying tweets by topic of interest. This last step simply involves reading collected tweets and classifying them as “Needs”, “Infrastructure Damage”, “Rumor” or “Other,” for example. Any number of users can participate in classifying these tweets. That is, once a user creates a Trainer, she can classify the tweets herself, or invite her organization to help her classify, or ask the crowd to help classify the tweets, or all of the above. She simply shares a link to her training page with whomever she likes. If she chooses to crowdsource the classification of tweets, AIDR includes a built-in quality control mechanism to ensure that the crowdsourced classification is accurate.

As noted here, we tested AIDR in response to the Pakistan Earthquake last week. We quickly hacked together the user interface displayed below, so functionality rather than design was our immediate priority. In any event, digital humanitarian volunteers from the Standby Volunteer Task Force (SBTF) tagged over 1,000 tweets based on the different topics (labels) listed below. As far as we know, this was the first time that a machine learning classifier was crowdsourced in the context of a humanitarian disaster. Click here for more on this early test.

AIDR_Trainer

The Tagger component of AIDR analyzes the human-classified tweets from the Trainer to automatically tag new tweets coming in from the Collector. This is where the machine learning kicks in. The Tagger uses the classified tweets to learn what kinds of tweets the user is interested in. When enough tweets have been classified (20 minimum), the Tagger automatically begins to tag new tweets by topic of interest. How many classified tweets is “enough”? This will vary but the more tweets a user classifies, the more accurate the Tagger will be. Note that each automatically tagged tweet includes an accuracy score—i.e., the probability that the tweet was correctly tagged by the automatic Tagger.

The Tagger thus displays a list of automatically tagged tweets updated in real-time. The user can filter this list by topic and/or accuracy score—displaying all tweets tagged as “Needs” with an accuracy of 90% or more, for example. She can also download the tagged tweets for further analysis. In addition, she can share the data link of her Tagger with developers so the latter can import the tagged tweets directly into their own platforms, e.g., MicroMappers, Ushahidi, CrisisTracker, etc. (Note that AIDR already powers CrisisTracker by automating the classification of tweets.) In addition, the user can share a display link with individuals who wish to embed the live feed into their websites, blogs, etc.
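For developers consuming that data link, the filtering step is straightforward. A hedged sketch, with invented records rather than AIDR’s actual feed format:

```python
# Hedged sketch of consuming the Tagger output: keep only tweets tagged with a
# given topic at or above a chosen accuracy score, the kind of filter a platform
# importing the feed might apply. The records below are invented examples.
def filter_tagged(tagged_tweets, topic, min_confidence=0.9):
    """tagged_tweets: iterable of dicts with 'text', 'tag' and 'confidence'."""
    return [t for t in tagged_tweets
            if t["tag"] == topic and t["confidence"] >= min_confidence]

feed = [
    {"text": "Main road to the hospital is blocked", "tag": "Needs", "confidence": 0.95},
    {"text": "Thinking of everyone affected", "tag": "Other", "confidence": 0.99},
    {"text": "We urgently need tents and water", "tag": "Needs", "confidence": 0.84},
]
print(filter_tagged(feed, topic="Needs"))  # only the first record passes at 0.9
```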

In sum, AIDR is an artificial intelligence engine developed to power consumer applications like MicroMappers. Any number of other tools can also be added to the AIDR platform, like the Credibility Plugin for Twitter that we’re collaborating on with partners in India. Added to AIDR, this plugin will score individual tweets based on the probability that they convey credible information. To this end, we hope AIDR will become a key node in the nascent ecosystem of next-generation humanitarian technologies. We plan to launch a beta version of AIDR at the 2013 CrisisMappers Conference (ICCM 2013) in Nairobi, Kenya this November.

In the meantime, we welcome any feedback you may have on the above. And if you want to help as an alpha tester, please get in touch so I can point you to the Collector tool, which you can start using right away. The other AIDR tools will be open to the same group of alpha testers in the coming weeks. For more on AIDR, see also this article in Wired.

AIDR_logo

The AIDR project is a joint collaboration with the United Nations Office for the Coordination of Humanitarian Affairs (OCHA). Other organizations that have expressed an interest in AIDR include the International Committee of the Red Cross (ICRC), American Red Cross (ARC), Federal Emergency Management Agency (FEMA), New York City’s Office for Emergency Management and their counterpart in the City of San Francisco. 


Note: In the future, AIDR could also be adapted to take in Facebook status updates and text messages (SMS).