Category Archives: Crowdsourcing

Using AIDR to Collect and Analyze Tweets from Chile Earthquake

Wish you had a better way to make sense of Twitter during disasters than this?

Type in a keyword like #ChileEarthquake in Twitter’s search box above and you’ll see more tweets than you can possibly read in a day, let alone keep up with for more than a few minutes. Wish there were an easy, free and open source solution? Well, you’ve come to the right place. My team and I at QCRI are developing the Artificial Intelligence for Disaster Response (AIDR) platform to do just this. Here’s how it works:

First you log in to the AIDR platform using your own Twitter handle (click images below to enlarge):

AIDR login

You’ll then see your collections of tweets (if you already have any). In my case, you’ll see I have three. The first is a collection of English language tweets related to the Chile Earthquake. The second is a collection of Spanish tweets. The third is a collection of more than 3,000,000 tweets related to the missing Malaysia Airlines plane. A preliminary analysis of these tweets is available here.

AIDR collections

Let’s look more closely at my Chile Earthquake 2014 collection (see below, click to enlarge). I’ve collected about a quarter of a million tweets in the past 30 hours or so. The label “Downloaded tweets (since last re-start)” simply refers to the number of tweets I’ve collected since adding a new keyword or hashtag to my collection. I started the collection yesterday at 5:39am my time (yes, I’m an early bird). Under “Keywords” you’ll see all the hashtags and keywords I’ve used to search for tweets related to the earthquake in Chile. I’ve also specified the geographic region I want to collect tweets from. Don’t worry, you don’t actually have to enter geographic coordinates when you set up your own collection. You simply highlight (on the map) the area you’re interested in and AIDR does the rest.
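To make the collection settings concrete, here is a minimal sketch in Python of how keywords, a language choice and a map region could combine into a single filter. This is not AIDR's actual code. The `Tweet` and `CollectionFilter` names, the rule that a tweet is kept if it matches a keyword or falls inside the region, and the rough bounding box for Chile are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Tweet:
    text: str
    lang: str
    # (longitude, latitude) if the tweet is geotagged, otherwise None
    coords: Optional[Tuple[float, float]] = None


@dataclass
class CollectionFilter:
    keywords: list
    languages: set
    # (west, south, east, north) in degrees; hypothetical values in the usage below
    bbox: Tuple[float, float, float, float]

    def matches(self, tweet: Tweet) -> bool:
        # Language is applied as a hard filter (assumption).
        if self.languages and tweet.lang not in self.languages:
            return False
        text = tweet.text.lower()
        keyword_hit = any(k.lower() in text for k in self.keywords)
        geo_hit = False
        if tweet.coords is not None:
            lon, lat = tweet.coords
            west, south, east, north = self.bbox
            geo_hit = west <= lon <= east and south <= lat <= north
        # Assumption: a keyword match OR a location match is enough.
        return keyword_hit or geo_hit


chile_filter = CollectionFilter(
    keywords=["#ChileEarthquake", "terremoto"],
    languages={"en"},
    bbox=(-76.0, -56.0, -66.0, -17.0),  # very rough box around Chile
)
```

Highlighting an area on the map would, in this sketch, simply fill in the `bbox` tuple for you.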

AIDR - Chile Earthquake 2014

You’ll also note in the above screenshot that I’ve selected to only collect tweets in English, but you can collect all language tweets if you’d like or just a select few. Finally, the Collaborators section simply lists the colleagues I’ve added to my collection. This gives them the ability to add new keywords/hashtags and to download the tweets collected as shown below (click to enlarge). More specifically, collaborators can download the most recent 100,000 tweets (and also share the link with others). The 100K tweet limit is based on Twitter’s Terms of Service (ToS). If collaborators want all the tweets, Twitter’s ToS allows for sharing the TweetIDs for an unlimited number of tweets.
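As a rough illustration of the two sharing options, the Python sketch below splits an exported CSV into the most recent full tweets (capped at 100,000) and the complete list of TweetIDs. The column name `tweetID`, the assumption that rows are ordered oldest to newest, and the `split_export` helper are hypothetical. They are not taken from AIDR's actual export format.

```python
import csv
import io

MAX_FULL_TWEETS = 100_000  # download cap described in the post


def split_export(csv_text, id_column="tweetID", max_full=MAX_FULL_TWEETS):
    """Return (most recent full rows, every tweet ID) from a CSV export.

    Assumes rows are ordered oldest to newest and that the ID column
    is called `id_column`; both are illustrative assumptions.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    all_ids = [row[id_column] for row in rows]
    recent_rows = rows[-max_full:]
    return recent_rows, all_ids
```

With a real export you would read the file contents and pass them in as `csv_text`. The full rows go to collaborators and the ID list can be shared for the whole collection.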

AIDR download CSV

So that’s the AIDR Collector. We also have the AIDR Classifier, which helps you make sense of the tweets you’re collecting (in real-time). That is, your collection of tweets doesn’t stop, it continues growing, and as it does, you can make sense of new tweets as they come in. With the Classifier, you simply teach AIDR to classify tweets into whatever topics you’re interested in, like “Infrastructure Damage”, for example. To get started with the AIDR Classifier, simply return to the “Details” tab of our Chile collection. You’ll note the “Go To Classifier” button on the far right:

AIDR go to Classifier

Clicking on that button allows you to create a Classifier, say on the topic of disaster damage in general. So you simply create a name for your Classifier, in this case “Disaster Damage” and then create Tags to capture more details with respect to damage-related tweets. For example, one Tag might be, say, “Damage to Transportation Infrastructure.” Another could be “Building Damage.” In any event, once you’ve created your Classifier and corresponding tags, you click Submit and find your way to this page (click to enlarge):

AIDR Classifier Link

You’ll notice the public link for volunteers. That’s basically the interface you’ll use to teach AIDR. If you want to teach AIDR by yourself, you can certainly do so. You also have the option of “crowdsourcing the teaching” of AIDR. Clicking on the link will take you to the page below.

AIDR to MicroMappers

So, I called my Classifier “Message Contents” which is not particularly insightful; I should have labeled it something like “Humanitarian Information Needs”, but bear with me and let’s click on that Classifier. This will take you to the following Clicker on MicroMappers:

MicroMappers Clicker

Now this is not the most awe-inspiring interface you’ve ever seen (at least I hope not); the reason being that this is simply our very first version. We’ll be providing different “skins” like the official MicroMappers skin (below) as well as a skin that allows you to upload your own logo, for example. In the meantime, note that AIDR shows every tweet to at least three different volunteers. And only if all three of these volunteers agree on how to classify a given tweet does AIDR take that into consideration when learning. In other words, AIDR wants to ensure that humans are really sure about how to classify a tweet before it decides to learn from that lesson. Incidentally, the MicroMappers smartphone app for the iPhone and Android will be available in the next few weeks. But I digress.
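The agreement rule can be sketched in a few lines of Python. This is only an illustration of the idea, not AIDR's implementation. The `unanimous_labels` function and the shape of the vote records are assumptions.

```python
from collections import defaultdict

REQUIRED_VOTES = 3  # each tweet is shown to at least three volunteers


def unanimous_labels(votes):
    """votes: iterable of (tweet_id, volunteer_id, tag) tuples.

    Returns {tweet_id: tag} only for tweets that at least
    REQUIRED_VOTES different volunteers tagged identically.
    """
    by_tweet = defaultdict(dict)
    for tweet_id, volunteer_id, tag in votes:
        # One vote per volunteer per tweet; a later vote overwrites an earlier one.
        by_tweet[tweet_id][volunteer_id] = tag

    accepted = {}
    for tweet_id, per_volunteer in by_tweet.items():
        tags = list(per_volunteer.values())
        if len(tags) >= REQUIRED_VOTES and len(set(tags)) == 1:
            accepted[tweet_id] = tags[0]
    return accepted
```

Only the tweets returned by this function would be passed on as training examples. Anything with a disagreement, or with fewer than three votes, is held back.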

Yolanda TweetClicker4

As you and/or your volunteers classify tweets based on the Tags you created, AIDR starts to learn, hence the AI (Artificial Intelligence) in AIDR. AIDR begins to recognize that all the tweets you classified as “Infrastructure Damage” are indeed similar. Once you’ve tagged enough tweets, AIDR will decide that it’s time to leave the nest and fly on its own. In other words, it will start to auto-classify incoming tweets in real-time. (At present, AIDR can auto-classify some 30,000 tweets per minute; compare this to the peak rate of 16,000 tweets per minute observed during Hurricane Sandy).

Of course, AIDR’s first solo “flights” won’t always go smoothly. But not to worry, AIDR will let you know when it needs a little help. Every tweet that AIDR auto-tags comes with a Confidence level. That is, AIDR will let you know: “I am 80% sure that I correctly classified this tweet”. If AIDR has trouble with a tweet, i.e., if its confidence level is 65% or below, then it will send the tweet to you (and/or your volunteers) so it can learn from how you classify that particular tweet. In other words, the more tweets you classify, the more AIDR learns, and the higher AIDR’s confidence levels get. Fun, huh?
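Here is a minimal Python sketch of that routing rule, assuming the classifier hands back a tag and a confidence between 0 and 1. The `route` function and its return values are hypothetical names. Only the 65% cut-off comes from the description above.

```python
CONFIDENCE_THRESHOLD = 0.65  # "65% or below" goes back to humans


def route(tweet_id, tag, confidence):
    """Decide whether a machine-tagged tweet is kept or sent to volunteers."""
    if confidence <= CONFIDENCE_THRESHOLD:
        # Low confidence: drop the machine tag and ask a human instead.
        return ("ask_human", tweet_id, None)
    return ("auto_tagged", tweet_id, tag)
```

So a tweet tagged with 80% confidence is kept as is, while one at exactly 65% is queued for you or your volunteers, and their answer becomes a new training example.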

To view the results of the machine tagging, simply click on the View/Download tab, as shown below (click to enlarge). The page shows you the latest tweets that have been auto-tagged along with the Tag label and the confidence score. (Yes, this too is the first version of that interface, we’ll make it more user-friendly in the future, not to worry). In any event, you can download the auto-tagged tweets in a CSV file and also share the download link with your colleagues for analysis and so on. At some point in the future, we hope to provide a simple data visualization output page so that you can easily see interesting data trends.

AIDR Results

So that’s basically all there is to it. If you want to learn more about how it all works, you might fancy reading this research paper (PDF). In the meantime, I’ll simply add that you can re-use your Classifiers. If (when?) another earthquake strikes Chile, you won’t have to start from scratch. You can auto-tag incoming tweets immediately with the Classifier you already have. Plus, you’ll be able to share your classifiers with your colleagues and partner organizations if you like. In other words, we’re envisaging an “App Store” of Classifiers based on different hazards and different countries. The more we re-use our Classifiers, the more accurate they will become. Everybody wins.

And voila, that is AIDR (at least our first version). If you’d like to test the platform and/or want the tweets from the Chile Earthquake, simply get in touch!

bio

Note:

  • We’re adapting AIDR so that it can also classify text messages (SMS).
  • AIDR Classifiers are language specific. So if you speak Spanish, you can create a classifier to tag all Spanish language tweets/SMS that refer to disaster damage, for example. In other words, AIDR does not only speak English : )

Launching a Search and Rescue Challenge for Drone / UAV Pilots

My colleague Timothy Reuter (of AidDroids fame) kindly invited me to co-organize the Drone/UAV Search and Rescue Challenge for the DC Drone User Group. The challenge will take place on May 17th near Marshall in Virginia. The rules for the competition are based on the highly successful Search/Rescue challenge organized by my new colleague Chad with the North Texas Drone User Group. We’ll pretend that a person has gone missing by scattering (over a wide area) various clues such as pieces of clothing & personal effects. Competitors will use their UAVs to collect imagery of the area and will have 45 minutes after flying to analyze the imagery for clues. The full set of rules for our challenge is listed here but may change slightly as we get closer to the event.

searchrescuedrones

I want to try something new with this challenge. While previous competitions have focused exclusively on the use of drones/UAVs for the “Search” component of the challenge, I want to introduce the option of also engaging in the “Rescue” part. How? If UAVs identify a missing person, then why not provide that person with immediate assistance while waiting for the Search and Rescue team to arrive on site? The UAV could drop a small and light-weight first aid kit, or small water bottle, or even a small walkie talkie. Enter my new colleague Euan Ramsay who has been working on a UAV payloader solution for Search and Rescue; see the video demo below. Euan, who is based in Switzerland, has very kindly offered to share several payloader units for our UAV challenge. So I’ll be meeting up with him next month to take the units back to DC for the competition.

Another area I’d like to explore for this challenge is the use of crowdsourcing to analyze the aerial imagery & video footage. As noted here, the University of Central Lancashire used crowdsourcing in their UAV Search and Rescue pilot project last summer. This innovative “crowdsearching” approach is also being used to look for Malaysia Flight 370 that went missing several weeks ago. I’d really like to have this crowdsourcing element be an option for the DC Search & Rescue challenge.

UAV MicroMappers

My team and I at QCRI have developed a platform called MicroMappers, which can easily be used to crowdsource the analysis of UAV pictures and videos. The United Nations (OCHA) used MicroMappers in response to Typhoon Yolanda last year to crowdsource the tagging of pictures posted on Twitter. Since then we’ve added video tagging capability. So one scenario for the UAV challenge would be for competitors to upload their imagery/videos to MicroMappers and have digital volunteers look through this content for clues of our fake missing person.

In any event, I’m excited to be collaborating with Timothy on this challenge and will be sharing updates on iRevolution on how all this pans out.

See also:

  • Using UAVs for Search & Rescue [link]
  • Crowdsourcing Analysis of UAV Imagery for Search and Rescue [link]
  • How UAVs are Making a Difference in Disaster Response [link]
  • Grassroots UAVs for Disaster Response [link]

Results of the Crowdsourced Search for Malaysia Flight 370 (Updated)

Update: More than 3 million volunteers thus far have joined the crowdsourcing efforts to locate the missing Malaysia Airlines plane. These digital volunteers have viewed over a quarter-of-a-billion micro-maps and have tagged almost 3 million features in these satellite maps. Source of update.

Malaysian authorities have now gone on record to confirm that Flight 370 was hijacked, which reportedly explains why contact with the passenger jet abruptly ceased a week ago. The Search & Rescue operations now involve 13 countries around the world and over 100 ships, helicopters and airplanes. The costs of this massive operation must easily be running into the millions of dollars.

FlightSaR

Meanwhile, a free crowdsourcing platform once used by digital volunteers to search for Genghis Khan’s Tomb and displaced populations in Somalia (video below) has been deployed to search high-resolution satellite imagery for signs of the missing airliner. This is not the first time that crowdsourced satellite imagery analysis has been used to find a missing plane but this is certainly the highest profile operation yet, which may explain why the crowdsourcing platform used for the search (Tomnod) has reportedly been down for over a dozen hours since the online search began. (Note that Zooniverse can easily handle this level of traffic). Click on the video below to learn more about the crowdsourced search for Genghis Khan and displaced peoples in Somalia.

NatGeoVideo

Having current, high-resolution satellite imagery is almost as good as having your own helicopter. So the digital version of these search operations includes tens of thousands of digital helicopters, whose virtual pilots are covering over 2,000 square miles of the Gulf of Thailand right from their own computers. They’re doing this entirely for free, around the clock and across multiple time zones. This is what Digital Humanitarians have been doing ever since the 2010 Haiti Earthquake, and most recently in response to Typhoon Yolanda.

Tomnod has just released the top results of the crowdsourced digital search efforts, which are displayed in the short video below. Like other microtasking platforms, Tomnod uses triangulation to calculate areas of greatest consensus by the crowd. This is explained further here. Note: The example shown in the video is NOT a picture of Flight 370 but perhaps of an airborne Search & Rescue plane.
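To give a feel for how "areas of greatest consensus" might be computed, here is a deliberately simple Python sketch. It is not Tomnod's algorithm, which is not described here. It swaps in a plain grid-binning approach: tags are snapped to grid cells and a cell counts as a consensus area once enough different volunteers have tagged inside it. The function name, cell size and minimum number of volunteers are all assumptions.

```python
from collections import defaultdict


def consensus_cells(tags, cell_size=0.01, min_volunteers=3):
    """tags: iterable of (volunteer_id, lon, lat) tuples.

    Returns [(cell, volunteer_count), ...] sorted by count, highest first,
    where cell is the (column, row) index of a cell_size-degree grid square.
    """
    volunteers_by_cell = defaultdict(set)
    for volunteer_id, lon, lat in tags:
        cell = (int(lon // cell_size), int(lat // cell_size))
        # A set means repeat tags by the same volunteer count only once.
        volunteers_by_cell[cell].add(volunteer_id)

    ranked = [
        (cell, len(volunteers))
        for cell, volunteers in volunteers_by_cell.items()
        if len(volunteers) >= min_volunteers
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Counting distinct volunteers rather than raw tags keeps one enthusiastic tagger from creating a false hotspot on their own.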

While looking for evidence of the missing airliner is like looking for the proverbial needle in a massive stack of satellite images, perhaps the biggest value-added of this digital search lies in identifying where the aircraft is most definitely not located; that is, approaching this crowdsourced operation as a process of elimination. Professional imagery analysts can very easily and quickly review images tagged by the crowd, even if they are mistakenly tagged as depicting wreckage. In other words, the crowd can provide the first level filter so that expert analysts don’t waste their time looking at thousands of images of bare oceans. Basically, if the mandate is to leave no stone unturned, then the crowd can do that very well.

In sum, crowdsourcing can improve the signal-to-noise ratio so that experts can focus more narrowly on analyzing the potential signals. This process may not be perfect just yet but it can be refined and improved. (Note that professionals also get it wrong, like Chinese analysts did with this satellite image of the supposed Malaysian airliner).

If these digital efforts continue and Flight 370 has indeed been hijacked, then this will certainly be the first time that crowdsourced satellite imagery analysis is used to find a hijacked aircraft. The latest satellite imagery uploaded by Tomnod is no longer focused on bodies of water but rather land. The blue strips below (left) show the area that the new satellite imagery covers.

Tomnod New Imagery 2

Some important questions will need to be addressed if this operation is indeed extended. What if the hijackers make contact and order the cessation of all offline and online Search & Rescue operations? Would volunteers be considered “digital combatants,” potentially embroiled in political conflict in which the lives of 227 hostages are at stake?

Note: The Google Earth file containing the top results of the search is available here.

See also: Analyzing Tweets on Malaysia Flight #MH370 [link]

Using Social Media to Predict Economic Activity in Cities

Economic indicators in most developing countries are often outdated. A new study suggests that social media may provide useful economic signals when traditional economic data is unavailable. In “Taking Brazil’s Pulse: Tracking Growing Urban Economies from Online Attention” (PDF), the authors accurately predict the GDPs of 45 Brazilian cities by analyzing data from a popular micro-blogging platform (Yahoo Meme). To make these predictions, the authors used the concept of glocality, which notes that “economically successful cities tend to be involved in interactions that are both local and global at the same time.” The results of the study reveal that “a city’s glocality, measured with social media data, effectively signals the city’s economic well-being.”

The authors are currently expanding their work by predicting social capital for these 45 cities based on social media data. As iRevolution readers will know, I’ve blogged extensively on using social media to measure social capital footprints at the city and sub-city level. So I’ve contacted the authors of the study and look forward to learning more about their research. As they rightly note:

“There is growing interest in using digital data for development opportunities, since the number of people using social media is growing rapidly in developing countries as well. Local impacts of recent global shocks – food, fuel and financial – have proven not to be immediately visible and trackable, often unfolding ‘beneath the radar of traditional monitoring systems’. To tackle that problem, policymakers are looking for new ways of monitoring local impacts [...].”


Using Crowd Computing to Analyze UAV Imagery for Search & Rescue Operations

My brother recently pointed me to this BBC News article on the use of drones for Search & Rescue missions in England’s Lake District, one of my favorite areas of the UK. The picture below is one I took during my most recent visit. In my earlier blog post on the use of UAVs for Search & Rescue operations, I noted that UAV imagery & video footage could be quickly analyzed using a microtasking platform (like MicroMappers, which we used following Typhoon Yolanda). As it turns out, an enterprising team at the University of Central Lancashire has been using microtasking as part of their UAV Search & Rescue exercises in the Lake District.

Lake District

Every year, the Patterdale Mountain Rescue Team assists hundreds of injured and missing persons in the North of the Lake District. “The average search takes several hours and can require a large team of volunteers to set out in often poor weather conditions.” So the University of Central Lancashire teamed up with the Mountain Rescue Team to demonstrate that UAV technology coupled with crowdsourcing can reduce the time it takes to locate and rescue individuals.

The project, called AeroSee Experiment, worked as follows. The Mountain Rescue service receives a simulated distress call. As they plan their Search & Rescue operation, the University team dispatches their UAV to begin the search. Using live video-streaming, the UAV automatically transmits pictures back to the team’s website where members of the public can tag pictures that members of the Mountain Rescue service should investigate further. These tagged pictures are then forwarded to “the Mountain Rescue Control Center for a final opinion and dispatch of search teams.” Click to enlarge the diagram below.

AeroSee

Members of the crowd would simply log on to the AeroSee website and begin tagging. Although the experiment is over, you can still do a Practice Run here. Below is a screenshot of the microtasking interface (click to enlarge). One picture at a time is displayed. If the picture displays potentially important clues, then the digital volunteer points to said area of the picture and types in why they believe the clue they’re pointing at might be important.

AeroSee MT2

The results were impressive. A total of 335 digital volunteers looked through 11,834 pictures and the “injured” walker (UAV image below) was found within 69 seconds of the picture being uploaded to the microtasking website. The project team subsequently posted this public leaderboard to acknowledge all volunteers who participated, listing their scores and levels of accuracy for feedback purposes.

Aero MT3

Upon further review of the data and results, the project team concluded that the experiment was a success and that digital Search & Rescue volunteers were able to “home in on the location of our missing person before the drones had even landed!” The texts added to the tagged images were also very descriptive, which helped the team “locate the casualty very quickly from the more tentative tags on other images.”

If the area being surveyed during a Search & Rescue operation is fairly limited, then using the crowd to process UAV images is quick and straightforward, especially if the crowd is relatively large. We have over 400 digital humanitarian volunteers signed up for MicroMappers (launched in November 2013) and hope to grow this to 1,000+ in 2014. But for much larger areas, like Kruger National Park, one would need far more volunteers. Kruger covers 7,523 square miles compared to the Lake District’s 885 square miles.
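A quick back-of-the-envelope calculation in Python shows the size of the gap. The two areas come from the figures above. The assumption that volunteer needs grow linearly with area, the `scale_linearly` helper and the baseline of 1,000 volunteers are purely illustrative.

```python
import math

LAKE_DISTRICT_SQ_MI = 885
KRUGER_SQ_MI = 7_523


def scale_linearly(baseline_volunteers, baseline_area, target_area):
    """Volunteers needed if demand grows in proportion to area (assumption)."""
    return math.ceil(baseline_volunteers * target_area / baseline_area)


# Kruger is roughly 8.5 times the size of the Lake District.
area_ratio = KRUGER_SQ_MI / LAKE_DISTRICT_SQ_MI

# Hypothetical baseline: if 1,000 volunteers were enough for the Lake District...
kruger_volunteers = scale_linearly(1_000, LAKE_DISTRICT_SQ_MI, KRUGER_SQ_MI)
```

Under that simple linear assumption, a crowd of 1,000 for the Lake District would translate into roughly 8,500 volunteers for Kruger. In practice the number also depends on image resolution, flight coverage and how many volunteers review each image.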

kruger-gate-sign

One answer to this need for more volunteers could be the good work that my colleagues over at Zooniverse are doing. Launched in February 2009, Zooniverse has a unique volunteer base of one million volunteers. Another solution is to use machine computing to prioritize the flight paths of UAVs in the first place, i.e., use advanced algorithms to considerably reduce the search area by ruling out areas where missing people or other objects of interest (like rhinos in Kruger) are highly unlikely to be, based on weather, terrain, season and other data.

This is the area that my colleague Tom Snitch works in. As he noted in this recent interview (PDF), “We want to plan a flight path for the drone so that the number of unprotected animals is as small as possible.” To do this, he and his team use “exquisite mathematics and complex algorithms” to learn how “animals, rangers and poachers move through space and time.” In the case of Search & Rescue, ruling out areas that are too steep and impossible for humans to climb or walk through could go a long way to reducing the search area, not to mention the search time.
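The terrain idea can be illustrated with a toy Python example. It is a sketch under stated assumptions, not Tom Snitch's method. The search area is represented as a grid of slope values in degrees, and the 35-degree cut-off for "too steep to walk" is a hypothetical number chosen only for the example.

```python
def searchable_cells(slope_grid, max_slope_deg=35.0):
    """Return (cells worth searching, fraction of the area ruled out).

    slope_grid is a list of rows, each a list of slopes in degrees.
    Cells steeper than max_slope_deg are treated as impassable (assumption).
    """
    keep = [
        (row_idx, col_idx)
        for row_idx, row in enumerate(slope_grid)
        for col_idx, slope in enumerate(row)
        if slope <= max_slope_deg
    ]
    total = sum(len(row) for row in slope_grid)
    ruled_out = 1 - len(keep) / total
    return keep, ruled_out


# Tiny made-up terrain: half the cells are too steep.
terrain = [
    [5, 40, 10],
    [50, 60, 12],
]
cells, reduction = searchable_cells(terrain)
```

The same pattern extends to other layers such as weather or season: each layer rules out more cells, and the UAV flight path only needs to cover what is left.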

See also:

  • Using UAVs for Search & Rescue [link]
  • MicroMappers: Microtasking for Disaster Response [link]
  • Results of MicroMappers Response to Typhoon Yolanda [link]
  • How UAVs are Making a Difference in Disaster Response [link]
  • Crowdsourcing Evaluation of Sandy Building Damage [link]

Rapid Disaster Damage Assessments: Reality Check

The Multi-Cluster/Sector Initial Rapid Assessment (MIRA) is the methodology used by UN agencies to assess and analyze humanitarian needs within two weeks of a sudden onset disaster. A detailed overview of the process, methodologies and tools behind MIRA is available here (PDF). These reports are particularly insightful when comparing them with the processes and methodologies used by digital humanitarians to carry out their rapid damage assessments (typically done within 48-72 hours of a disaster).

MIRA PH

Take the November 2013 MIRA report for Typhoon Haiyan in the Philippines. I am really impressed by how transparent the report is vis-à-vis the very real limitations behind the assessment. For example:

  • “The barangays [districts] surveyed do not constitute a representative sample of affected areas. Results are skewed towards more heavily impacted municipalities [...].”
  • “Key informant interviews were predominantly held with baranguay captains or secretaries and they may or may not have included other informants including health workers, teachers, civil and worker group representatives among others.”
  • “Barangay captains and local government staff often needed to make their best estimate on a number of questions and therefore there’s considerable risk of potential bias.”
  • “Given the number of organizations involved, assessment teams were not trained in how to administrate the questionnaire and there may have been confusion on the use of terms or misrepresentation on the intent of the questions.”
  • “Only in a limited number of questions did the MIRA checklist contain before and after questions. Therefore to correctly interpret the information it would need to be cross-checked with available secondary data.”

In sum: the data collected was not representative; the process of selecting interviewees was biased given that said selection was based on a convenience sample; interviewees had to estimate (guesstimate?) the answer for several questions, thus introducing additional bias in the data; since assessment teams were not trained to administrate the questionnaire, this also introduces the problem of limited inter-coder reliability and thus limits the ability to compare survey results; and the data still needs to be validated with secondary data.
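For readers unfamiliar with inter-coder reliability, one common way to quantify it is Cohen's kappa, which compares how often two coders agree against how often they would agree by chance. The Python sketch below is a generic illustration with made-up labels. It is not something the MIRA report computes.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(coder_a)

    # Share of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up damage ratings from two assessment teams for the same four sites.
team_1 = ["severe", "severe", "minor", "minor"]
team_2 = ["severe", "minor", "minor", "minor"]
kappa = cohens_kappa(team_1, team_2)
```

A kappa near 1 means the teams are coding consistently. A value near 0 means their agreement is no better than chance, which is the risk when teams are not trained on the same questionnaire.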

I do not share the above to criticize, only to relay what the real world of rapid assessments resembles when you look “under the hood”. What is striking is how similar the above challenges are to those that digital humanitarians have been facing when carrying out rapid damage assessments. And yet, I distinctly recall rather pointed criticisms leveled by professional humanitarians against groups using social media and crowdsourcing for humanitarian response back in 2010 & 2011. These criticisms dismissed social media reports as being unrepresentative, unreliable, fraught with selection bias, etc. (Some myopic criticisms continue to this day). I find it rather interesting that many of the shortcomings attributed to crowdsourcing social media reports are also true of traditional information collection methodologies like MIRA.

The fact is this: no data or methodology is perfect. The real world is messy, both off- and online. Being transparent about these limitations is important, especially for those who seek to combine both off- and online methodologies to create more robust and timely damage assessments.

Yes, I’m Writing a Book (on Digital Humanitarians)

I recently signed a book deal with Taylor & Francis Press. The book, which is tentatively titled “Digital Humanitarians: How Big Data is Changing the Face of Disaster Response,” is slated to be published next year. The book will chart the rise of digital humanitarian response from the Haiti Earthquake to 2015, highlighting critical lessons learned and best practices. To this end, the book will draw on real-world examples of digital humanitarians in action to explain how they use new technologies and crowdsourcing to make sense of “Big (Crisis) Data”. In sum, the book will describe how digital humanitarians & humanitarian technologies are together reshaping the humanitarian space and what this means for the future of disaster response. The purpose of this book is to inspire and inform the next generation of (digital) humanitarians while serving as a guide for established humanitarian organizations & emergency management professionals who wish to take advantage of this transformation in humanitarian response.

2025

The book will thus consolidate critical lessons learned in digital humanitarian response (such as the verification of social media during crises) so that members of the public along with professionals in both international humanitarian response and domestic emergency management can improve their own relief efforts in the face of “Big Data” and rapidly evolving technologies. The book will also be of interest to academics and students who wish to better understand methodological issues around the use of social media and user-generated content for disaster response; or how technology is transforming collective action and how “Big Data” is disrupting humanitarian institutions, for example. Finally, this book will also speak to those who want to make a difference; to those of you who may have little to no experience in humanitarian response but who still wish to help others affected during disasters, even if you happen to be thousands of miles away. You are the next wave of digital humanitarians and this book will explain how you can indeed make a difference.

The book will not be written in a technical or academic writing style. Instead, I’ll be using a more “storytelling” form of writing combined with a conversational tone. This approach is perfectly compatible with the clear documentation of critical lessons emerging from the rapidly evolving digital humanitarian space. This conversational writing style is not at odds with the need to explain the more technical insights being applied to develop next generation humanitarian technologies. Quite the contrary, I’ll be using intuitive examples & metaphors to make the most technical details not only understandable but entertaining.

While this journey is just beginning, I’d like to express my sincere thanks to my mentors for their invaluable feedback on my book proposal. I’d also like to express my deep gratitude to my point of contact at Taylor & Francis Press for championing this book from the get-go. Last but certainly not least, I’d like to sincerely thank the Rockefeller Foundation for providing me with a residency fellowship this Spring in order to accelerate my writing.

I’ll be sure to provide an update when the publication date has been set. In the meantime, many thanks for being an iRevolution reader!
