Tag Archives: Twitter

We Are Kenya: Global Map of #Westgate Tweets

I spent over an hour trying to write this first paragraph last week and still don’t know where to start. I grew up in Nairobi, my parents lived in Kenya for more than 15 years, their house was 5 minutes from Westgate, my brother’s partner is Kenyan and I previously worked for Ushahidi, a Kenyan not-for-profit group. Witnessing the tragedy online as it unfolded in real time, graphic pictures and all, was traumatic; I did not know the fate of several friends right away. This raw anxiety brought back memories of the devastating Haiti Earthquake of 2010; it took 12 long hours until I got word that my wife and friends had just made it out of a crumbling building.

WeAreKenya

What to do with this most recent experience and the pain that lingers? Amongst the graphic Westgate horror unfolding via Twitter, I also witnessed the outpouring of love, support and care; the offers of help from Kenyans and Somalis alike; collective grieving, disbelief and deep sadness; the will to remain strong, to overcome, to be united in support of the victims, their families and friends. So I reached out to several friends in Nairobi to ask them if aggregating and surfacing these tweets publicly could serve as a positive testament. They all said yes.

I therefore contacted colleagues at GNIP who kindly let me use their platform to collect more than 740,000 tweets related to the tragedy, from several hours before the horror began until the end of the siege. I then reached out to friends Claudia Perlich (data scientist) and Jer Thorp (data artist) for their help on this personal project. They both kindly agreed to lend their expertise. Claudia quickly put together the map above based on the location of Twitter users responding to the events in Nairobi. The graph below depicts where Twitter users covering the Westgate tragedy were tweeting from during the first 35 hours or so.

Westgate Continents

Westgate Table Continents

We also carried out a preliminary content analysis of selected keywords. The graph below displays the frequency of the terms “We Are One,” “Blood Appeal / Blood Donations,” and “Pray / Prayers” during the four-day siege.
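A minimal sketch of how term frequencies like these could be counted per hour, assuming each tweet is available as a (timestamp, text) pair. The regular expressions are illustrative guesses at the three term groups, not the patterns actually used for the graph.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Illustrative patterns for the three term groups (assumed, not the originals).
TERM_GROUPS = {
    "we_are_one": re.compile(r"\bwe are one\b", re.IGNORECASE),
    "blood": re.compile(r"\bblood (?:appeal|donations?)\b", re.IGNORECASE),
    "pray": re.compile(r"\bpray(?:ers?)?\b", re.IGNORECASE),
}


def hourly_term_counts(tweets):
    """Count tweets matching each term group, bucketed by hour.

    `tweets` is an iterable of (datetime, text) pairs. A tweet is counted at
    most once per group, however many times the phrase appears in it.
    """
    counts = defaultdict(Counter)  # hour -> Counter of group name -> tweets
    for created_at, text in tweets:
        hour = created_at.replace(minute=0, second=0, microsecond=0)
        for group, pattern in TERM_GROUPS.items():
            if pattern.search(text):
                counts[hour][group] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        (datetime(2013, 9, 21, 14, 5), "We are one. Stay strong"),
        (datetime(2013, 9, 21, 15, 2), "Prayers for everyone"),
    ]
    for hour, counter in sorted(hourly_term_counts(sample).items()):
        print(hour, dict(counter))
```

Plotting each group's hourly counts as a line then gives a chart of the same kind as the one described above.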

Kenya We Are One

Jer suggested (thankfully) a more compelling and elegant data visualization approach, which we are exploring this week. So we hope to share some initial visuals in the coming days. If you have any specific suggestions on other ways to analyze and visualize the data, please do share them in the comments section below, thank you. 


See also: Forensics Analysis of #Westgate Tweets [Link]

AIDR: Artificial Intelligence for Disaster Response

Social media platforms are increasingly used to communicate crisis information when major disasters strike. Hence the rise of Big (Crisis) Data. Humanitarian organizations, digital humanitarians and disaster-affected communities know that some of this user-generated content can increase situational awareness. The challenge is to identify relevant and actionable content in near real-time to triangulate with other sources and make more informed decisions on the spot. Finding potentially life-saving information in this growing stack of Big Crisis Data, however, is like looking for the proverbial needle in a giant haystack. This is why my team and I at QCRI are developing AIDR.


The free and open source Artificial Intelligence for Disaster Response platform leverages machine learning to automatically identify informative content on Twitter during disasters. Unlike the vast majority of related platforms out there, we go beyond simple keyword search to filter for informative content. Why? Because recent research shows that keyword searches can miss over 50% of relevant content posted on Twitter. This is very far from optimal for emergency response. Furthermore, tweets captured via keyword search may not be relevant since words can have multiple meanings depending on context. Finally, keywords are restricted to one language only. Machine learning overcomes all these limitations, which is why we’re developing AIDR.

So how does AIDR work? There are three components of AIDR: the Collector, Trainer and Tagger. The Collector simply allows you to collect and save a collection of tweets posted during a disaster. You can download these tweets for analysis at any time and also use them to create an automated filter using machine learning, which is where the Trainer and Tagger come in. The Trainer allows one or more users to train the AIDR platform to automatically tag tweets of interest in a given collection of tweets. Tweets of interest could include those that refer to “Needs”, “Infrastructure Damage” or “Rumors” for example.

AIDR_Collector

A user creates a Trainer for tweets of interest by: 1) naming the Trainer, e.g., “My Trainer”; 2) identifying topics of interest such as “Needs”, “Infrastructure Damage”, “Rumors” etc. (as many topics as the user wants); and 3) classifying tweets by topic of interest. This last step simply involves reading collected tweets and classifying them as “Needs”, “Infrastructure Damage”, “Rumor” or “Other,” for example. Any number of users can participate in classifying these tweets. That is, once a user creates a Trainer, she can classify the tweets herself, invite her organization to help her classify, ask the crowd to help classify the tweets, or all of the above. She simply shares a link to her training page with whoever she likes. If she chooses to crowdsource the classification of tweets, AIDR includes a built-in quality control mechanism to ensure that the crowdsourced classification is accurate.
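The post does not say how AIDR’s quality control works. As an assumption, one common approach is to collect several crowd labels per tweet and accept a label only when enough volunteers agree; the sketch below does that with a simple agreement threshold. The function name and thresholds are hypothetical.

```python
from collections import Counter


def aggregate_labels(votes, min_votes=3, min_agreement=2 / 3):
    """Return {tweet_id: label} for tweets whose crowd labels agree enough.

    `votes` maps tweet_id -> list of labels from different volunteers.
    Tweets with too few votes, or without a sufficiently dominant label,
    are left out so they can be sent back to the crowd for more labels.
    """
    accepted = {}
    for tweet_id, labels in votes.items():
        if len(labels) < min_votes:
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            accepted[tweet_id] = label
    return accepted
```

Raising `min_votes` trades volunteer time for reliability; the accepted labels are what a Trainer would pass on to the learning step.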

As noted here, we tested AIDR in response to the Pakistan Earthquake last week. We quickly hacked together the user interface displayed below, so functionality rather than design was our immediate priority. In any event, digital humanitarian volunteers from the Standby Volunteer Task Force (SBTF) tagged over 1,000 tweets based on the different topics (labels) listed below. As far as we know, this was the first time that a machine learning classifier was crowdsourced in the context of a humanitarian disaster. Click here for more on this early test.

AIDR_Trainer

The Tagger component of AIDR analyzes the human-classified tweets from the Trainer to automatically tag new tweets coming in from the Collector. This is where the machine learning kicks in. The Tagger uses the classified tweets to learn what kinds of tweets the user is interested in. When enough tweets have been classified (20 minimum), the Tagger automatically begins to tag new tweets by topic of interest. How many classified tweets is “enough”? This will vary but the more tweets a user classifies, the more accurate the Tagger will be. Note that each automatically tagged tweet includes an accuracy score—i.e., the probability that the tweet was correctly tagged by the automatic Tagger.
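To make the Tagger step concrete, here is a small multinomial naive Bayes classifier written with the Python standard library. It is a stand-in chosen for brevity, not AIDR’s actual model. It only starts tagging once the 20-label minimum mentioned above is reached, and its score is the model’s posterior probability for the chosen topic.

```python
import math
import re
from collections import Counter, defaultdict

MIN_LABELS = 20  # minimum number of human-labelled tweets before tagging


def tokenize(text):
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayesTagger:
    """Minimal multinomial naive Bayes tagger (illustrative stand-in)."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> labelled tweets
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.vocab = set()

    def add_example(self, text, label):
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def ready(self):
        return sum(self.doc_counts.values()) >= MIN_LABELS

    def tag(self, text):
        """Return (label, probability), or None while too few labels exist."""
        if not self.ready():
            return None
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            words = self.word_counts[label]
            denom = sum(words.values()) + vocab_size  # add-one smoothing
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((words[tok] + 1) / denom)
            log_scores[label] = score
        # Normalize log scores into probabilities (subtract max for stability).
        top = max(log_scores.values())
        exp_scores = {l: math.exp(s - top) for l, s in log_scores.items()}
        best = max(exp_scores, key=exp_scores.get)
        return best, exp_scores[best] / sum(exp_scores.values())
```

Every new human label can be fed to `add_example`, so the tagger keeps improving as more tweets are classified, in line with “the more tweets a user classifies, the more accurate the Tagger will be.”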

The Tagger thus displays a list of automatically tagged tweets updated in real time. The user can filter this list by topic and/or accuracy score, for example to display all tweets tagged as “Needs” with an accuracy of 90% or more. She can also download the tagged tweets for further analysis. In addition, she can share the data link of her Tagger with developers so they can import the tagged tweets directly into their own platforms, e.g., MicroMappers, Ushahidi, CrisisTracker, etc. (Note that AIDR already powers CrisisTracker by automating the classification of tweets.) Finally, the user can share a display link with individuals who wish to embed the live feed into their websites, blogs, etc.
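Filtering the Tagger’s output by topic and score could look like the sketch below. The record layout (dicts with `text`, `label` and `confidence` keys) is a hypothetical stand-in for whatever AIDR actually exports.

```python
def filter_tagged(tagged, topic=None, min_confidence=0.0):
    """Return tweets for `topic` (or every topic if None) whose confidence is at
    least `min_confidence`, highest confidence first."""
    selected = [
        t for t in tagged
        if (topic is None or t["label"] == topic)
        and t["confidence"] >= min_confidence
    ]
    return sorted(selected, key=lambda t: t["confidence"], reverse=True)


if __name__ == "__main__":
    tagged = [
        {"text": "example tweet A", "label": "Needs", "confidence": 0.93},
        {"text": "example tweet B", "label": "Needs", "confidence": 0.61},
    ]
    # All tweets tagged "Needs" with a score of 90% or more.
    for t in filter_tagged(tagged, topic="Needs", min_confidence=0.9):
        print(t["confidence"], t["text"])
```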

In sum, AIDR is an artificial intelligence engine developed to power consumer applications like MicroMappers. Any number of other tools can also be added to the AIDR platform, like the Credibility Plugin for Twitter that we’re collaborating on with partners in India. Added to AIDR, this plugin will score individual tweets based on the probability that they convey credible information. To this end, we hope AIDR will become a key node in the nascent ecosystem of next-generation humanitarian technologies. We plan to launch a beta version of AIDR at the 2013 CrisisMappers Conference (ICCM 2013) in Nairobi, Kenya this November.

In the meantime, we welcome any feedback you may have on the above. And if you want to help as an alpha tester, please get in touch so I can point you to the Collector tool, which you can start using right away. The other AIDR tools will be open to the same group of alpha testers in the coming weeks. For more on AIDR, see also this article in Wired.


The AIDR project is a joint collaboration with the United Nations Office for the Coordination of Humanitarian Affairs (OCHA). Other organizations that have expressed an interest in AIDR include the International Committee of the Red Cross (ICRC), American Red Cross (ARC), Federal Emergency Management Agency (FEMA), New York City’s Office for Emergency Management and their counterpart in the City of San Francisco. 


Note: In the future, AIDR could also be adapted to take in Facebook status updates and text messages (SMS).

Developing MicroFilters for Digital Humanitarian Response

Filtering—or the lack thereof—presented the single biggest challenge when we tested MicroMappers last week in response to the Pakistan Earthquake. As my colleague Clay Shirky notes, the challenge with “Big Data” is not information overload but rather filter failure. We need to make damned sure that we don’t experience filter failure again in future deployments. To ensure this, I’ve decided to launch a stand-alone and fully interoperable platform called MicroFilters. My colleague Andrew Ilyas will lead the technical development of the platform with support from Ji Lucas. Our plan is to launch the first version of MicroFilters before the CrisisMappers conference (ICCM 2013) in November.

MicroFilters

A web-based solution, MicroFilters will allow users to upload their own Twitter data for automatic filtering purposes. Users will have the option of uploading this data using three different formats: text, CSV and JSON. Once uploaded, users can elect to perform one or more automatic filtering tasks from this menu of options:

[   ]  Filter out retweets
[   ]  Filter for unique tweets
[   ]  Filter tweets by language [English | Other | All]
[   ]  Filter for unique image links posted in tweets [Small | Medium | Large | All]
[   ]  Filter for unique video links posted in tweets [Short | Medium | Long | All]
[   ]  Filter for unique image links in news articles posted in tweets  [S | M | L | All]
[   ]  Filter for unique video links in news articles posted in tweets [S | M | L | All]
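A sketch of the first two options, assuming tweets arrive as dicts with a `text` key (and, for native retweets from Twitter’s v1.1 API, a `retweeted_status` key). What counts as “unique” is an assumption here: two tweets are treated as duplicates if they match after lower-casing, stripping URLs and collapsing whitespace.

```python
import re

RT_PREFIX = re.compile(r"^\s*RT\s+@\w+:?", re.IGNORECASE)
URL = re.compile(r"https?://\S+")


def is_retweet(tweet):
    """Flag native retweets and old-style 'RT @user:' copies."""
    return bool(tweet.get("retweeted_status")) or bool(RT_PREFIX.match(tweet["text"]))


def normalize(text):
    """Lower-case, drop URLs and collapse whitespace for duplicate detection."""
    return " ".join(URL.sub("", text.lower()).split())


def filter_tweets(tweets, drop_retweets=True, unique_only=True):
    seen = set()
    kept = []
    for tweet in tweets:
        if drop_retweets and is_retweet(tweet):
            continue
        if unique_only:
            key = normalize(tweet["text"])
            if key in seen:
                continue  # same content already kept
            seen.add(key)
        kept.append(tweet)
    return kept
```

Stripping URLs before comparing matters because the same message often circulates with different shortened links.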

Note that “unique image and video links” refers to the full-length (expanded) URLs, not shortened URLs such as bit.ly links. After selecting the desired filtering option(s), the user simply clicks on the “Filter” button. Once the filtering is completed (a countdown clock is displayed to inform the user of the expected processing time), MicroFilters provides the user with a download link for the filtered results. The link remains live for 10 minutes, after which the data is automatically deleted. If a CSV file was uploaded for filtering, the file format for download is also CSV; likewise for text and JSON files. Note that filtered tweets will appear in reverse chronological order (assuming time-stamp data was included in the uploaded file) when downloaded. The resulting file of filtered tweets can then be uploaded to MicroMappers within seconds.
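Keeping the output format equal to the input format and sorting newest first can be sketched for the CSV case as follows. The column name `created_at` and the timestamp format are assumptions; if the timestamp column is missing, the file is returned unchanged, mirroring the caveat above.

```python
import csv
import io
from datetime import datetime


def sort_csv_reverse_chronological(csv_text, time_field="created_at",
                                   time_format="%Y-%m-%d %H:%M:%S"):
    """Return CSV text with rows sorted newest first, keeping the same columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if not reader.fieldnames or time_field not in reader.fieldnames:
        return csv_text  # no time-stamp column: nothing to sort on
    rows = list(reader)
    rows.sort(key=lambda r: datetime.strptime(r[time_field], time_format),
              reverse=True)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Using `DictReader` and `DictWriter` with the original `fieldnames` preserves the uploaded column order, so the result can go straight back into a tool expecting the same layout.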

In sum, MicroFilters will be invaluable for future deployments of MicroMappers. Solving the “filter failure” problem will enable digital humanitarians to process far more relevant data and in a more timely manner. Since MicroFilters will be a standalone platform, anyone else will also have access to these free and automatic filtering services. In the meantime, however, we very much welcome feedback, suggestions and offers of help, thank you!


Results of MicroMappers Response to Pakistan Earthquake (Updated)

Update: We’re developing & launching MicroFilters to improve MicroMappers.

About 47 hours ago, the UN Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) in response to the Pakistan Earthquake. The activation request was for 48 hours, so the deployment will soon phase out. As already described here, the Standby Volunteer Task Force (SBTF) teamed up with QCRI to carry out an early test of MicroMappers, which was not set to launch until next month. This post shares some initial thoughts on how the test went along with preliminary results.

Pakistan Quake

Over roughly 40 hours, 109 volunteers from the SBTF and the public tagged just over 30,000 tweets that were posted during the first 36 hours or so after the quake. We were able to automatically collect these tweets thanks to our partnership with GNIP, filtering for them using half-a-dozen hashtags. Given the large volume of tweets collected, we did not require that each tweet be tagged at least 3 times by individual volunteers to ensure data quality control. Out of these 30,000+ tweets, volunteers tagged a total of 177 tweets as noting needs or infrastructure damage. A review of these tweets by the SBTF concluded that none were actually informative or actionable.

Just over 350 pictures were tweeted in the aftermath of the earthquake. These were uploaded to the ImageClicker for tagging purposes. However, none of the pictures captured evidence of infrastructure damage. In fact, the vast majority were unrelated to the earthquake. This was also true of pictures published in news articles. Indeed, we used an automated algorithm to identify all tweets with links to news articles; this algorithm would then crawl these articles for evidence of images. We found that the vast majority of these automatically extracted pictures were related to politics rather than infrastructure damage.
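The algorithm itself is not described here. As a rough sketch of its second step only, the snippet below pulls image URLs out of one article’s HTML with Python’s standard-library `HTMLParser`, resolving relative links against the article URL. Fetching the page (for example with `urllib.request`) and deciding which tweeted links point to news articles are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageExtractor(HTMLParser):
    """Collect absolute, de-duplicated URLs of <img> tags in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags via handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                absolute = urljoin(self.base_url, src)
                if absolute not in self.images:
                    self.images.append(absolute)


def extract_images(html, base_url):
    parser = ImageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.images
```

The extracted URLs could then be de-duplicated across articles and uploaded to the ImageClicker.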

Pakistan Quake2

A few preliminary thoughts and reflections from this first test of MicroMappers. First, however, a big, huge, gigantic thanks to my awesome QCRI team: Ji Lucas, Imran Muhammad and Kiran Garimella; to my outstanding colleagues on the SBTF Core Team including but certainly not limited to Jus Mackinnon, Melissa Elliott, Anahi A. Iaccuci, Per Aarvik & Brendan O’Hanrahan (bios here); to the amazing SBTF volunteers and members of the general public who rallied to tag tweets and images—in particular our top 5 taggers: Christina KR, Leah H, Lubna A, Deborah B and Joyce M! Also bravo to volunteers in the Netherlands, UK, US and Germany for being the most active MicroMappers; and last but certainly not least, big, huge and gigantic thanks to Andrew Ilyas for developing the algorithms to automatically identify pictures and videos posted to Twitter.

So what did we learn over the past 48 hours? First, the disaster-affected region is a remote area of south-western Pakistan with a very light social media footprint, so there was practically no user-generated content directly relevant to needs and damage posted on Twitter during the first 36 hours. In other words, there were no needles to be found in the haystack of information. This is in stark contrast to our experience when we carried out a very similar operation following Typhoon Pablo in the Philippines. Obviously, if there’s little to no social media footprint in a disaster-affected area, then monitoring social media is of no use at all to anyone. Note, however, that MicroMappers could also be used to tag 30,000+ text messages (SMS). (Incidentally, since the earthquake struck around 12 noon local time, there were only about 18 hours of daylight during the 36-hour period for which we collected the tweets.)

Second, while the point of this exercise was not to test our pre-processing filters, it was clear that the single biggest problem was ultimately with the filtering. Our goal was to upload as many tweets as possible to the Clickers and stress-test the apps. So we only filtered tweets using a number of general hashtags such as #Pakistan. Furthermore, we did not filter out any retweets, which probably accounted for 2/3 of the data, nor did we filter by geography to ensure that we were only collecting and thus tagging tweets from users based in Pakistan. This was a major mistake on our end. We were so preoccupied with testing the actual Clickers that we simply did not pay attention to the pre-processing of tweets. This was equally true of the images uploaded to the ImageClicker.
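Filtering by geography could start with a bounding-box check on the GeoJSON point that Twitter attaches to geo-tagged tweets (longitude first, then latitude). The box below is an approximate, assumed extent for Pakistan. A rectangle also takes in slices of neighboring countries, and tweets without coordinates are dropped, which, given how few tweets are geo-tagged, would discard most of the data.

```python
# Assumed, approximate extent of Pakistan: about 23.6-37.1 N, 60.8-77.9 E.
PAKISTAN_BBOX = (60.8, 23.6, 77.9, 37.1)  # (min_lon, min_lat, max_lon, max_lat)


def in_bbox(lon, lat, bbox):
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def filter_by_geography(tweets, bbox=PAKISTAN_BBOX):
    """Keep geo-tagged tweets whose point falls inside `bbox`."""
    kept = []
    for tweet in tweets:
        point = tweet.get("coordinates")  # GeoJSON Point dict, or None
        if not point:
            continue  # not geo-tagged, cannot be placed
        lon, lat = point["coordinates"]
        if in_bbox(lon, lat, bbox):
            kept.append(tweet)
    return kept
```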

Pakistan Quake 3

So where do we go from here? Well, we have pages and pages of feedback to go through and integrate into the next version of the Clickers. For me, one of the top priorities is to optimize our pre-processing algorithms and ensure that the resulting output can be automatically uploaded to the Clickers. We have to refine our algorithms and make damned sure that we only upload unique tweets and images to our Clickers. Volunteers should see the same tweet or image at most 3 times, for verification purposes. We should also be more careful with our hashtag filtering and also consider filtering by geography. Incidentally, when our free & open source AIDR platform becomes operational in November, we’ll also have the ability to automatically identify tweets referring to needs, reports of damage, and much, much more.
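The “at most 3 times” rule, combined with uploading only unique items, can be sketched as a small task queue: each item is served to at most three distinct volunteers and never twice to the same one. The class and method names are hypothetical, and the linear scan is fine for a sketch but would need an index at scale.

```python
from collections import defaultdict

MAX_VIEWS = 3  # each tweet or image is shown to at most 3 volunteers


class TaskQueue:
    """Serve each unique item to at most `max_views` distinct volunteers."""

    def __init__(self, item_ids, max_views=MAX_VIEWS):
        self.item_ids = list(dict.fromkeys(item_ids))  # de-duplicate, keep order
        self.max_views = max_views
        self.viewers = defaultdict(set)  # item_id -> volunteers who saw it

    def next_item(self, volunteer):
        """Return the next item this volunteer has not seen, or None when done."""
        for item_id in self.item_ids:
            seen_by = self.viewers[item_id]
            if len(seen_by) < self.max_views and volunteer not in seen_by:
                seen_by.add(volunteer)
                return item_id
        return None
```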

In fact, AIDR was also tested for the very first time. SBTF volunteers tagged about 1,000 tweets, and just over 130 of the tags enabled us to create an accurate classifier that can automatically identify whether a tweet is relevant for disaster response efforts specifically in Pakistan (80% accuracy). Now, we didn’t apply this classifier to incoming tweets because AIDR uses streaming Twitter data, not static, archived data, which is what we had (in the form of CSV files). In any event, we also made an effort to create classifiers for needs and infrastructure damage but did not get enough tags to make these accurate enough. Typically, we need a minimum of 20 or so tags (i.e., examples of actual tweets referring to needs or damage). The more tags, the more accurate the classifier.

The reason there were so few tags, however, is that there were very few, if any, informative tweets referring to needs or infrastructure damage during the first 36 hours. In any event, I believe this was the very first time that a machine learning classifier was crowdsourced for disaster response purposes. In the future, we may want to first crowdsource a machine learning classifier for disaster-relevant tweets and then upload the results to MicroMappers; this would reduce the number of unrelated tweets displayed on a TweetClicker.

As expected, we have also received a lot of feedback vis-a-vis user experience and the user interface of the Clickers. Speed is at the top of the list. That is, making sure that once I’ve clicked on a tweet/image, the next tweet/image automatically appears. At times, I had to wait more than 20 seconds for the next item to load. We also need to add more progress bars such as the number of tweets or images that remain to be tagged—a countdown display, basically. I could go on and on, frankly, but hopefully these early reflections are informative and useful to others developing next-generation humanitarian technologies. In sum, there is a lot of work to be done still. Onwards!


Enabling Crowdfunding on Twitter for Disaster Response

Twitter is increasingly used to communicate needs during crises. These needs often include requests for information and financial assistance, for example. Identifying these tweets in real time requires the use of advanced computing, and machine learning in particular. This is why my team and I at QCRI are developing the Artificial Intelligence for Disaster Response (AIDR) platform. My colleague Hemant Purohit has been working with us to develop machine learning classifiers to automatically identify and distinguish between different types of needs. He has also developed classifiers to automatically identify Twitter users offering different types of help, including financial support. Our aim is to develop a “Match.com” solution to match specific needs with offers of help. What we’re missing, however, is an easy way to post micro-donations on Twitter once financial needs and offers have been matched.
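The matching step could be sketched as below, assuming the classifiers have already labelled each tweet as a need or an offer and assigned it a category such as “money” or “shelter”. Those field names and the first-come, first-served pairing rule are assumptions for illustration, not how Hemant’s classifiers or AIDR actually match tweets.

```python
from collections import defaultdict, deque


def match_needs_offers(tweets):
    """Pair need-tweets with offer-tweets of the same category, oldest first.

    Each tweet is a dict with 'id', 'kind' ('need' or 'offer') and 'category';
    tweets are assumed to arrive in chronological order. Returns a list of
    (need_id, offer_id) pairs; unmatched tweets stay waiting.
    """
    waiting = {"need": defaultdict(deque), "offer": defaultdict(deque)}
    matches = []
    for tweet in tweets:
        kind, category = tweet["kind"], tweet["category"]
        other = "offer" if kind == "need" else "need"
        queue = waiting[other][category]
        if queue:
            partner = queue.popleft()  # oldest waiting counterpart
            need, offer = (tweet, partner) if kind == "need" else (partner, tweet)
            matches.append((need["id"], offer["id"]))
        else:
            waiting[kind][category].append(tweet)
    return matches
```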


This is where my colleague Clarence Wardell and his start-up TinyGive may come in. Geared towards nonprofits, TinyGive is the easiest way to accept donations on Twitter. Indeed, donating via TinyGive is as simple as tweeting five words: “Hey @[organization], here’s $5! #tinygive”. I recently tried the service at a fundraiser and it really is that easy. TinyGive turns your tweet into an actual donation (and public endorsement), thus drastically reducing the high barriers that currently exist for Twitter users who wish to help others. More broadly, TinyGive overcomes many of the barriers that currently exist in the mobile donation space.
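To hook a matched offer into a service like this, a bot would need to recognize donation tweets. The regular expression below is a hypothetical parser for the five-word format quoted above; TinyGive’s real parsing rules are not described in this post.

```python
import re
from decimal import Decimal

# Hypothetical pattern: a @handle, then a $amount, then the #tinygive tag.
DONATION = re.compile(
    r"@(?P<recipient>\w{1,15})\b.*?\$(?P<amount>\d+(?:\.\d{1,2})?)\b.*?#tinygive\b",
    re.IGNORECASE | re.DOTALL,
)


def parse_donation(text):
    """Return (recipient_handle, amount) if the tweet looks like a donation, else None."""
    match = DONATION.search(text)
    if not match:
        return None
    return match.group("recipient"), Decimal(match.group("amount"))
```

`Decimal` is used instead of `float` so amounts such as 12.50 are kept exactly.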

Combining the AIDR platform with TinyGive would enable us to automatically identify those asking for financial assistance following a disaster and also automatically tweet a link to TinyGive to those offering financial assistance via Twitter. We’re not all affected the same way by disasters and those of us who are in proximity to said disaster but largely unscathed could use Twitter to quickly help those nearby with a simple micro-donation here and there. Think of it as time-critical, peer-to-peer localvesting.

At this recent White House event on humanitarian technology and innovation (where I had been invited to speak but regrettably had prior commitments), US Chief Technology Officer Todd Park spoke about the need for “A crowdfunding platform for small businesses and others to receive access to capital to help rebuild after a disaster, including a rating system that encourages rebuilding efforts that improve the community.” Time-critical crowdfunding can build resilience and enable communities to bounce back (and forward) more quickly following a disaster. TinyGive may thus be able to play a role in building community resilience as well.

In the future, my hope is that platforms like TinyGive will also allow disaster-affected individuals (in addition to businesses and other organizations) to receive access to micro-donations during times of need directly via Twitter. There are of course important challenges still ahead, but the self-help, mutual-aid approach to disaster response that I’ve been promoting for years should also include crowdfunding solutions. So if you’ve heard of other examples like TinyGive applied to disaster response, please let me know via the comments section below. Thank you!


Can Official Disaster Response Apps Compete with Twitter?

There are over half-a-billion Twitter users, with an average of 135,000 new users signing up on a daily basis (1). Can emergency management and disaster response organizations win over some Twitter users by convincing them to use their apps in addition to Twitter? For example, will FEMA’s smartphone app gain as much “market share”? The app’s new crowdsourcing feature, “Disaster Reporter,” allows users to submit geo-tagged disaster-related images, which are then added to a public crisis map. So the question is, will more images be captured via FEMA’s app or from Twitter users posting Instagram pictures?

fema_app

This question is perhaps poorly stated. While FEMA may not get millions of users to share disaster-related pictures via their app, it is absolutely critical for disaster response organizations to explicitly solicit crisis information from the crowd. See my blog post “Social Media for Emergency Management: Question of Supply and Demand” for more information on the importance of demand-driven crowdsourcing. The advantage of soliciting crisis information from a smartphone app is that the sourced information is structured and thus easily machine readable. For example, the pictures taken with FEMA’s app are automatically geo-tagged, which means they can be automatically mapped if need be.
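As an illustration of why structured, geo-tagged reports are easy to map, the sketch below turns such reports into a GeoJSON FeatureCollection, a format most web-mapping libraries can display. The input field names are hypothetical; the format of FEMA’s app data is not described here.

```python
import json


def reports_to_geojson(reports):
    """Convert geo-tagged photo reports into a GeoJSON FeatureCollection.

    Each report is a dict with 'lat', 'lon', 'image_url' and 'timestamp'
    (hypothetical field names). Reports without coordinates are skipped.
    """
    features = []
    for report in reports:
        lat, lon = report.get("lat"), report.get("lon")
        if lat is None or lon is None:
            continue  # not geo-tagged: would need manual geo-tagging instead
        features.append({
            "type": "Feature",
            # GeoJSON lists coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "image_url": report.get("image_url"),
                "timestamp": report.get("timestamp"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [{"lat": 35.3, "lon": -97.5,
               "image_url": "https://example.org/1.jpg",
               "timestamp": "2013-05-20T20:00:00Z"}]
    print(json.dumps(reports_to_geojson(sample), indent=2))
```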

While many, many more pictures may be posted on Twitter, these may be more difficult to map. The vast majority of tweets are not geo-tagged, which means more sophisticated computational solutions are necessary. Instagram pictures are geo-tagged, but this information is not publicly available. So smartphone apps are a good way to overcome these challenges. But we shouldn’t overlook the value of pictures shared on Twitter. Many can be geo-tagged, as demonstrated by the Digital Humanitarian Network’s efforts in response to Typhoon Pablo. Moreover, about 40% of pictures shared on Twitter in the immediate aftermath of the Oklahoma Tornado had geographic data. In other words, while the FEMA app may have 10,000 users who submit a picture during a disaster, Twitter may have 100,000 users posting pictures. And while only 40% of the latter pictures may be geo-tagged, this would still mean 40,000 pictures compared to FEMA’s 10,000. Recall that over half-a-million Instagram pictures were posted during Hurricane Sandy alone.

The main point, however, is that FEMA could also solicit pictures via Twitter and ask eyewitnesses to simply geo-tag their tweets during disasters. They could also speak with Instagram and perhaps ask them to share geo-tag data for solicited images. These strategies would render tweets and pictures machine-readable and thus automatically mappable, just like the pictures coming from FEMA’s app. Ultimately, the key issue here is one of policy, and the best solution is to leverage multiple platforms to crowdsource crisis information. The technical challenge is how to deal with the high volume of pictures shared in real time across multiple platforms. This is where microtasking comes in and why MicroMappers is being developed. For tweets and images that do not contain automatically geo-tagged data, MicroMappers has a microtasking app specifically developed to crowdsource the manual tagging of images.

In sum, there are trade-offs. The good news is that we don’t have to choose one solution over the other; they are complementary. We can leverage both a dedicated smartphone app and very popular social media platforms like Twitter and Facebook to crowdsource the collection of crisis information. Either way, a demand-driven approach to soliciting relevant information will work best, both for smartphone apps and social media platforms.


Taking the Pulse of the Boston Marathon Bombings on Twitter

Social media networks are evolving into a new nervous system for our planet. These real-time networks provide immediate feedback loops when media-rich societies experience a shock. My colleague Todd Mostak recently shared with me the tweet map below, which depicts tweets referring to “marathon” (in red) shortly after the bombs went off during Boston’s marathon. The green dots represent all the other tweets posted at the time. (It is always difficult to write about data visualizations of violent events because they don’t capture the human suffering, thus seemingly minimizing the tragic events.)

Credit: Todd Mostak

Visualizing a social system at this scale gives a sense that we’re looking at a living, breathing organism, one that has just been wounded. This impression is even more stark in the dynamic visualization captured in the video below.

This is an excerpt of Todd’s longer video, available here. Note that this data visualization uses fewer than 3% of all posted tweets because more than 97% of tweets are not geo-tagged. So we’re not even seeing the full nervous system in action. For more analysis of tweets during the marathon, see this blog post entitled “Boston Marathon Explosions: Analyzing First 1,000 Seconds on Twitter.”
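The map’s construction can be approximated as follows: keep only tweets with coordinates, then split them by whether the text mentions “marathon”. The dict layout follows the GeoJSON point in Twitter’s v1.1 `coordinates` field; the function name is hypothetical, and the returned share makes the “fewer than 3% geo-tagged” effect visible on any sample.

```python
def partition_for_map(tweets, keyword="marathon"):
    """Split geo-tagged tweets into keyword points and other points.

    Returns (keyword_points, other_points, geotagged_share), where points are
    (lon, lat) tuples and tweets without coordinates are skipped.
    """
    keyword = keyword.lower()
    hits, others = [], []
    geotagged = 0
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:
            continue  # most tweets end up here: nothing to plot
        geotagged += 1
        lon, lat = point["coordinates"]
        (hits if keyword in tweet["text"].lower() else others).append((lon, lat))
    share = geotagged / len(tweets) if tweets else 0.0
    return hits, others, share
```

Plotting `hits` in red over `others` in green reproduces the basic layout described above.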


Using Social Media to Predict Disaster Resilience (Updated)

Social media is used to monitor and predict all kinds of social, economic, political and health-related behaviors these days. Could social media also help identify more disaster resilient communities? Recent empirical research reveals that social capital is the most important driver of disaster resilience; more so than economic and material resources. To this end, might a community’s social media footprint indicate how resilient it is to disasters? After all, “when extreme events at the scale of Hurricane Sandy happen, they leave an unquestionable mark on social media activity” (1). Could that mark be one of resilience?

Twitter Heatmap Hurricane

Sentiment analysis map of tweets posted during Hurricane Sandy.

In the immediate aftermath of a disaster, “social ties can serve as informal insurance, providing victims with information, financial help and physical assistance” (2). This informal insurance, “or mutual assistance involves friends and neighbors providing each other with information, tools, living space, and other help” (3). At the same time, social media platforms like Twitter are increasingly used to communicate during crises. In fact, data-driven research on tweets posted during disasters reveals that many tweets provide victims with information, help, tools, living space, assistance and more. Recent studies argue that “such interactions are not necessarily of inferior quality compared to simultaneous, face-to-face interactions” (4). What’s more, “In addition to the preservation and possible improvement of existing ties, interaction through social media can foster the creation of new relations” (5). Meanwhile, and “contrary to prevailing assumptions, there is evidence that the boom in social media that connects users globally may have simultaneously increased local connections” (6).

A recent study of 5 billion tweets found that Japan, Canada, Indonesia and South Korea have the highest percentage of reciprocity on Twitter (6). This is important because “Network reciprocity tells us about the degree of cohesion, trust and social capital in sociology” (7). In terms of network density, “the highest values correspond to South Korea, Netherlands and Australia.” The findings further reveal that “communities which tend to be less hierarchical and more reciprocal, also displays happier language in their content updates. In this sense countries with high conversation levels … display higher levels of happiness too” (8).
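The cited studies’ exact formulas are not given here. Under the common edge-based definitions (reciprocity as the share of directed edges that are reciprocated, density as edges divided by the n(n-1) possible directed edges), the two measures can be computed as below. For a simple directed graph without self-loops, networkx’s `reciprocity` and `density` functions should give the same values.

```python
def reciprocity_and_density(edges):
    """Compute reciprocity and density of a directed follower graph.

    `edges` is an iterable of (follower, followee) pairs. Nodes are inferred
    from the edges, so users with no edges do not count toward density.
    """
    edge_set = {(u, v) for u, v in edges if u != v}  # ignore self-loops
    nodes = {node for edge in edge_set for node in edge}
    m, n = len(edge_set), len(nodes)
    if m == 0:
        return 0.0, 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    reciprocity = mutual / m          # share of edges that are reciprocated
    density = m / (n * (n - 1))       # n >= 2 whenever m > 0
    return reciprocity, density
```

In the three-edge example a follows b, b follows a, and a follows c: two of the three edges are reciprocated (reciprocity 2/3) and three of six possible edges exist (density 0.5).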

A related study found that the language used in tweets can be used to predict the subjective well-being of those users (9). The same analysis revealed that the level of happiness expressed by Twitter users in a community is correlated with that of members of the same community who are not on social media. Data-driven studies on happiness also show that social bonds and social activities are more conducive to happiness than financial capital (10). Social media also includes blogs. A new study that analyzed more than 18.5 million blog posts found that “bloggers with lower social capital have fewer positive moods and more negative moods [as revealed by their posts] than those with higher social capital” (11).

Collectivism vs Individualism countries

Finally, another recent study analyzed more than 2.3 million Twitter users and found that users in collectivist countries engage with others more than those in individualistic countries (12). “In high collectivist cultures, users tend to focus more on the community to which they belong,” while people in individualistic countries are “in a more loosely knit social network,” and so typically “look after themselves or only after immediate family members” (13). The map above displays collectivist and individualistic countries, with the former represented by lighter shades and the latter by darker ones.

In sum, one should be able to measure “digital social capital” and thus disaster resilience by analyzing social media networks before, during and after disasters. “These disaster responses may determine survival, and we can measure the likelihood of them happening” via digital social capital dynamics reflected on social media (14). One could also combine social network analysis with sentiment analysis to formulate various indexes. Anyone interested in pursuing this line of research?


Analyzing Crisis Hashtags on Twitter (Updated)

Update: You can now upload your own tweets to the Crisis Hashtags Analysis Dashboard here.

Hashtag footprints can be revealing. The map below, for example, displays the 200 locations in the world with the most Twitter hashtags. The top five are São Paulo, London, Jakarta, Los Angeles and New York.

Hashtag map

A recent study (PDF) of 2 billion geo-tagged tweets and 27 million unique hashtags found that “hashtags are essentially a local phenomenon with long-tailed life spans.” The analysis also revealed that hashtags triggered by external events like disasters “spread faster than hashtags that originate purely within the Twitter network itself.” Like other metadata, hashtags can be informative in and of themselves. For example, they can provide early warning signals of social tensions in Egypt, as demonstrated in this study. Might they also reveal interesting patterns during and after major disasters?

Tens of thousands of distinct crisis hashtags were posted to Twitter during Hurricane Sandy. While #Sandy and #hurricane featured most prominently, thousands more were also used, such as #SandyHelp, #rallyrelief, #NJgas, #NJopen, #NJpower, #staysafe, #sandypets, #restoretheshore, #noschool and #fail. The #NJpower hashtag, for instance, “helped keep track of the power situation throughout the state. Users and news outlets used this hashtag to inform residents where power outages were reported and gave areas updates as to when they could expect their power to come back” (1).

Sandy Hashtags

My colleagues and I at QCRI are studying crisis hashtags to better understand the variety of tags used during and in the immediate aftermath of major crises. Popular hashtags used during disasters often overshadow more hyperlocal ones, making the latter less discoverable. Other challenges include the “proliferation of hashtags that do not cross-pollinate and a lack of usability in the tools necessary for managing massive amounts of streaming information for participants who needed it” (2). To address these challenges and analyze crisis hashtags, we’ve just launched a Crisis Hashtags Analytics Dashboard. As displayed below, our first case study is Hurricane Sandy. We’ve uploaded about half a million tweets posted between October 27 and November 7, 2012 to the dashboard.

QCRI_Dashboard

Users can visualize the frequency of tweets (orange line) and hashtags (green line) over time using different time-steps, ranging from 10-minute to 1-day intervals. They can also “zoom in” to capture finer changes in the number of hashtags per time interval. (The dramatic drop on October 30th is due to a server crash. So if you have access to tweets posted during those hours, I’d be grateful if you could share them with us.)
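Plotting those two lines comes down to bucketing tweets by time-step and counting. The following is a minimal sketch rather than the dashboard's actual pipeline, which isn't described here: it assumes each tweet is a hypothetical (timestamp, text) pair and extracts hashtags with a simple regular expression.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

HASHTAG_RE = re.compile(r"#(\w+)")  # simple hashtag pattern (an assumption)


def bucket_counts(tweets, start, step):
    """Count tweets and hashtag occurrences per interval of length `step`.

    tweets: iterable of (timestamp: datetime, text: str) pairs.
    Returns two Counters keyed by bucket index (0 = first interval after start).
    """
    tweet_counts = Counter()
    hashtag_counts = Counter()
    for ts, text in tweets:
        if ts < start:
            continue  # ignore tweets before the analysis window
        idx = int((ts - start) / step)  # timedelta / timedelta -> float
        tweet_counts[idx] += 1
        hashtag_counts[idx] += len(HASHTAG_RE.findall(text))
    return tweet_counts, hashtag_counts


# Hypothetical tweets, bucketed into 10-minute intervals.
start = datetime(2012, 10, 27)
tweets = [
    (datetime(2012, 10, 27, 0, 5), "#Sandy #NJpower out on our street"),
    (datetime(2012, 10, 27, 0, 12), "stay safe everyone #staysafe"),
    (datetime(2012, 10, 27, 0, 14), "no tags here"),
]
tweet_counts, hashtag_counts = bucket_counts(tweets, start, timedelta(minutes=10))
```

Changing `step` to `timedelta(days=1)` reproduces the coarser end of the range; "zooming in" simply means using a smaller step over a narrower window.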

Hashtag timeline

In the second part of the dashboard (displayed below), users can select any point on the graph to display the top “K” most frequent hashtags. The default value for K is 10 (i.e., the 10 most frequent hashtags), but users can change this by typing in a different number. In addition, the 10 least frequent hashtags are displayed, as are the 10 “middle-most” hashtags. The 10 newest hashtags posted during the selected time interval are also displayed, as are the hashtags that have seen the largest increase in frequency. These latter two metrics, “New K” and “Top Increasing K,” may provide early warning signals during disasters. Indeed, the appearance of a new hashtag can reveal a new problem or need, while a rapid increase in the frequency of some hashtags can denote the spread of a problem or need.
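The five lists can be computed from per-interval hashtag counts. This is a hedged sketch under several assumptions not stated in the post: "middle-most" is taken to mean the K hashtags centered on the median rank, "new" means not seen in any earlier interval, and "increase" is the raw count difference from the preceding interval. The function name and inputs are hypothetical.

```python
from collections import Counter


def hashtag_panels(current, previous, seen_before, k=10):
    """Compute dashboard-style hashtag lists for one selected time interval.

    current:     Counter of hashtag -> frequency in the selected interval
    previous:    Counter of hashtag -> frequency in the interval just before it
    seen_before: set of hashtags that appeared in any earlier interval
    """
    # Rank by descending frequency, breaking ties alphabetically for stable output.
    ranked = sorted(current.items(), key=lambda kv: (-kv[1], kv[0]))
    tags = [tag for tag, _ in ranked]
    # Window of k tags centered on the median rank (assumed meaning of "middle-most").
    lo = max(0, len(tags) // 2 - k // 2)
    # Change in frequency relative to the preceding interval.
    increases = sorted(
        ((t, current[t] - previous.get(t, 0)) for t in tags),
        key=lambda kv: (-kv[1], kv[0]),
    )
    return {
        "top_k": tags[:k],
        "least_k": tags[max(0, len(tags) - k):][::-1],  # least frequent first
        "middle_k": tags[lo:lo + k],
        "new_k": [t for t in tags if t not in seen_before][:k],
        "top_increasing_k": [t for t, delta in increases if delta > 0][:k],
    }


# Hypothetical counts for two consecutive intervals.
current = Counter({"sandy": 50, "njpower": 20, "staysafe": 5, "sandypets": 3})
previous = Counter({"sandy": 45, "njpower": 2, "staysafe": 5})
panels = hashtag_panels(current, previous, {"sandy", "njpower", "staysafe"}, k=2)
```

In this toy example #sandypets shows up under "New K" and #njpower leads "Top Increasing K", which is exactly the kind of signal the paragraph above describes as a possible early warning.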

QCRI Dashboard 2

The third part of the dashboard allows users to visualize and compare the frequency of top hashtags over time. This feature is displayed in the screenshot below. Patterns that arise from diverging or converging hashtags may indicate important developments on the ground.

QCRI Dashboard 3

We’re only at the early stages of developing our hashtags analytics platform (above), but we hope the tool will provide insights during future disasters. For now, we’re simply experimenting and tinkering. So feel free to get in touch if you would like to collaborate and/or suggest some research questions.


Acknowledgements: Many thanks to QCRI colleagues Ahmed Meheina and Sofiane Abbar for their work on developing the dashboard.

Boston Marathon Explosions: Analyzing First 1,000 Seconds on Twitter

My colleagues Rumi Chunara and John Brownstein recently published a short co-authored study entitled “Twitter as a Sentinel in Emergency Situations: Lessons from the Boston Marathon Explosions.” At 2:49 p.m. EDT on April 15, 2013, two improvised bombs exploded near the finish line of the 117th Boston Marathon. Ambulances left the scene approximately 9 minutes later, just as public health authorities alerted regional emergency departments to the incident.

Meanwhile, on Twitter:

BostonTweets

An analysis of tweets posted within a 35-mile radius of the finish line reveals that the word stems “explos*” and “explod*” appeared on Twitter just 3 minutes after the explosions. “While an increase in messages indicating an emergency from a particular location may not make it possible to fully ascertain the circumstances of an incident without computational or human review, analysis of such data could help public safety officers better understand the location or specifics of explosions or other emergencies.”
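The filter described above combines a distance test with a keyword-stem match. Below is a minimal sketch, not the authors' code: it assumes geotagged tweets arrive as hypothetical dicts with "lat", "lon" and "text" keys, uses the haversine great-circle distance, and turns the stems "explos*" and "explod*" into one case-insensitive regex. The center coordinates in the example are illustrative values in central Boston, not the exact finish-line location.

```python
import math
import re

# "explos*" and "explod*" expressed as a single regex (e.g. explosion, exploded).
STEM_RE = re.compile(r"\bexplo[sd]\w*", re.IGNORECASE)
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def matching_tweets(tweets, center, radius_miles=35.0):
    """Yield tweets within `radius_miles` of `center` containing an explos*/explod* stem."""
    clat, clon = center
    for t in tweets:
        near = haversine_miles(clat, clon, t["lat"], t["lon"]) <= radius_miles
        if near and STEM_RE.search(t["text"]):
            yield t


center = (42.35, -71.08)  # illustrative point in central Boston
tweets = [
    {"lat": 42.35, "lon": -71.08, "text": "Two explosions at the finish line"},
    {"lat": 42.36, "lon": -71.06, "text": "Great race today"},
    {"lat": 40.71, "lon": -74.01, "text": "Something exploded?"},  # New York, out of range
]
hits = list(matching_tweets(tweets, center))
```

Only the first tweet passes both tests: the second is nearby but has no matching stem, and the third matches the stem but lies well outside the 35-mile radius.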

In terms of geographical coverage, many of the tweets posted during the first 10 minutes came from users in the immediate vicinity of the finish line. “Because of their proximity to the event and content of their postings, these individuals might be witnesses to the bombings or be of close enough proximity to provide helpful information. These finely detailed geographic data can be used to localize and characterize events assisting emergency response in decision-making.”

BostonBombing2

Ambulances were already on site for the marathon. This is not the case in most crises, however. In those more common situations, “crowdsourced information may uniquely provide extremely timely initial recognition of an event and specific clues as to what events may be unfolding.” Of course, user-generated content is not always accurate. Filtering and analyzing this content in real time is the first step in the verification process, hence the importance of advanced computing. More on this here.

“Additionally, by comparing newly observed data against temporally adjusted keyword frequencies, it is possible to identify aberrant spikes in keyword use. The inclusion of geographical data allows these spikes to be geographically adjusted, as well. Prospective data collection could also harness larger and other streams of crowdsourced data, and use more comprehensive emergency-related keywords and language processing to increase the sensitivity of this data source.” Furthermore, “the analysis of multiple keywords could further improve these prior probabilities by reducing the impact of single false positive keywords derived from benign events.”
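The study does not spell out its detection method, so the following is only one plausible reading, labeled as such. It assumes "temporally adjusted" means comparing a keyword's count against counts from comparable past intervals (for instance, the same hour of day on previous days) and flags a spike when the count exceeds that baseline's mean by a z-score threshold. To reflect the point about multiple keywords, a hypothetical `min_keywords` rule raises an alert only when several keywords spike in the same interval.

```python
import statistics


def is_aberrant_spike(current_count, baseline_counts, threshold=3.0):
    """Flag a count exceeding the baseline mean by `threshold` standard deviations.

    baseline_counts: keyword counts from comparable past intervals (assumed to be
    the same hour of day on previous days, as one form of temporal adjustment).
    """
    if len(baseline_counts) < 2:
        return False  # not enough history to estimate variability
    mean = statistics.mean(baseline_counts)
    sd = statistics.stdev(baseline_counts)
    if sd == 0:
        return current_count > mean  # flat baseline: any increase stands out
    return (current_count - mean) / sd > threshold


def multi_keyword_alert(current, baselines, min_keywords=2, threshold=3.0):
    """Alert only when at least `min_keywords` keywords spike in the same interval,
    reducing the impact of a single false-positive keyword."""
    spiking = [
        kw for kw, count in current.items()
        if is_aberrant_spike(count, baselines.get(kw, []), threshold)
    ]
    return len(spiking) >= min_keywords, spiking


# Hypothetical baselines (same hour on five previous days) and current counts.
baselines = {
    "explosion": [1, 0, 2, 1, 1],
    "bomb": [0, 1, 0, 1, 0],
    "marathon": [30, 28, 35, 32, 31],
}
alert, spiking = multi_keyword_alert({"explosion": 40, "bomb": 25, "marathon": 33}, baselines)
```

Here "explosion" and "bomb" both jump far above their baselines and trigger an alert, while "marathon" stays within its normal range. A single spiking keyword on its own would be reported but would not raise the alert.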
