Category Archives: Social Computing

Computing Research Institutes as an Innovation Pathway for Humanitarian Technology

The World Humanitarian Summit (WHS) is an initiative by United Nations Secretary-General Ban Ki-moon to improve humanitarian action. The Summit, which is to be held in 2016, stands to be one of the most important humanitarian conferences in a decade. One key pillar of WHS is humanitarian innovation. “Transformation through Innovation” is the WHS Working Group dedicated to transforming humanitarian action by focusing explicitly on innovation. I have the pleasure of being a member of this working group where my contribution focuses on the role of new technologies, data science and advanced computing. As such, I’m working on an applied study to explore the role of computing research institutes as an innovation pathway for humanitarian technology. The purpose of this blog post is to invite feedback on the ideas presented below.


I first realized that the humanitarian community faced a “Big Data” challenge in 2010, just months after I had joined Ushahidi as Director of Crisis Mapping, and just months after co-founding CrisisMappers: The Humanitarian Technology Network. The devastating Haiti Earthquake resulted in a massive overflow of information generated via mainstream news, social media, text messages and satellite imagery. I launched and spearheaded the Haiti Crisis Map at the time and, together with hundreds of digital volunteers from all around the world, went head-to-head with Big Data. As noted in my forthcoming book, we realized there and then that crowdsourcing and mapping software alone were no match for Big (Crisis) Data.


This explains why I decided to join an advanced computing research institute, namely QCRI. It was clear to me after Haiti that humanitarian organizations had to partner directly with advanced computing experts to manage the new Big Data challenge in disaster response. So I “embedded” myself in an institute with leading experts in Big Data Analytics, Data Science and Social Computing. I believe that computing research institutes (CRI’s) can & must play an important role in fostering innovation in next generation humanitarian technology by partnering with humanitarian organizations on research & development (R&D).

There is already some evidence to support this proposition. We (QCRI) teamed up with the UN Office for the Coordination of Humanitarian Affairs (OCHA) to create the Artificial Intelligence for Disaster Response platform, AIDR as well as MicroMappers. We are now extending AIDR to analyze text messages (SMS) in partnership with UNICEF. We are also spearheading efforts around the use and analysis of aerial imagery (captured via UAVs) for disaster response (see the Humanitarian UAV Network: UAViators). On the subject of UAVs, I believe that this new technology presents us (in the WHS Innovation team) with an ideal opportunity to analyze in “real time” how a new, disruptive technology gets adopted within the humanitarian system. In addition to UAVs, we catalyzed a partnership with Planet Labs and teamed up with Zooniverse to take satellite imagery analysis to the next level with large scale crowd computing. To this end, we are working with humanitarian organizations to enable them to make sense of Big Data generated via social media, SMS, aerial imagery & satellite imagery.

The incentives for humanitarian organizations to collaborate with CRI’s are obvious, especially if the latter (like QCRI) commits to making the resulting prototypes freely accessible and open source. But why should CRI’s collaborate with humanitarian organizations in the first place? Because the latter come with real-world challenges and unique research questions that many computer scientists are very interested in for several reasons. First, carrying out scientific research on real-world problems is of interest to the vast majority of computer scientists I collaborate with, both within QCRI and beyond. These scientists want to apply their skills to make the world a better place. Second, the research questions that humanitarian organizations bring enable computer scientists to differentiate themselves in the publishing world. Third, the resulting research can help advance the field of computer science and advanced computing.

So why are we not seeing more collaboration between CRI’s and humanitarian organizations? Because of a cognitive surplus mismatch between problem owners and problem solvers. It takes a Director of Social Innovation (or related full-time position) to serve as a translational leader between CRI’s and humanitarian organizations. It takes someone (ideally a team) to match the problem owners and problem solvers; to facilitate and manage the collaboration between these two very different types of expertise and organizations. In sum, CRI’s can serve as an innovation pathway if the following three ingredients are in place: 1) Translation Leader; 2) Committed CRI; and 3) Committed Humanitarian Organization. These are necessary but not sufficient conditions for success.

While research institutes have a comparative advantage in R&D, they are not the best place to scale humanitarian technology prototypes. In order to take these prototypes to the next level, make them sustainable and have them develop into enterprise-level software, they need to be taken up by for-profit companies. The majority of CRI’s (QCRI included) actually do have a mandate to incubate start-up companies. As such, we plan to spin off some of the above platforms as independent companies in order to scale the technologies in a robust manner. Note that the software will remain free to use for humanitarian applications; other uses of the platform will require a paid license. Therein lies the end-to-end innovation path that computing research institutes can offer humanitarian organizations vis-a-vis next generation humanitarian technologies.

As noted above, part of my involvement with the WHS Innovation Team entails working on an applied study to document and replicate this innovation pathway. As such, I am looking for feedback on the above as well as on the research methodology described below.

I plan to interview Microsoft Research, IBM Research, Yahoo Research, QCRI and other institutes as part of this research. More specifically, the interview questions will include:

  • Have you already partnered with humanitarian organizations? Why/why not?
  • If you have partnered with humanitarian organizations, what was the outcome? What were the biggest challenges? Was the partnership successful? If so, why? If not, why not?
  • If you have not yet partnered with humanitarian organizations, why not? What factors would be conducive to such partnerships and what factors serve as hurdles?
  • What are your biggest concerns vis-a-vis working with humanitarian groups?
  • What funding models did you explore if any?

I also plan to interview humanitarian organizations to better understand the prospects for this potential innovation pathway. More specifically, I plan to interview ICRC, UNHCR, UNICEF and OCHA using the following questions:

  • Have you already partnered with computing research groups? Why/why not?
  • If you have partnered with computing research groups, what was the outcome? What were the biggest challenges? Was the partnership successful? If so, why? If not, why not?
  • If you have not yet partnered with computing research groups, why not? What factors would be conducive to such partnerships and what factors serve as hurdles?
  • What are your biggest concerns vis-a-vis working with computing research groups?
  • What funding models did you explore if any?

My plan is to carry out the above semi-structured interviews in February-March 2015 along with secondary research. My ultimate aim with this deliverable is to develop a model to facilitate greater collaboration between computing research institutes and humanitarian organizations. To this end, I welcome feedback on all of the above (feel free to email me and/or add comments below). Thank you.


See also:

  • Research Framework for Next Generation Humanitarian Technology and Innovation [link]
  • From Gunfire at Sea to Maps of War: Implications for Humanitarian Innovation [link]

Establishing Social Media Hashtag Standards for Disaster Response

The UN Office for the Coordination of Humanitarian Affairs (OCHA) has just published an important, must-read report on the use of social media for disaster response. As noted by OCHA, this document was inspired by conversations with my team and me at QCRI. We jointly recognize that innovation in humanitarian technology is not enough. What is needed—and often lacking—is innovation in policymaking. Only then can humanitarian technology have widespread impact. This new think piece by OCHA seeks to catalyze enlightened policymaking.


I was pleased to provide feedback on earlier drafts of this new study and look forward to discussing the report’s recommendations with policymakers across the humanitarian space. In the meantime, many thanks to Roxanne Moore and Andrej Verity for making this report a reality. As Andrej notes in his blog post on this new study, the Filipino Government has just announced that “twitter will become another source of information for the Philippines official emergency response mechanism,” which will lead to an even more pressing Big (Crisis) Data challenge. The use of standardized hashtags will thus be essential.


The overflow of information generated during disasters can be as paralyzing to disaster response as the absence of information. While information scarcity has long characterized our information landscapes, today’s landscapes are increasingly marked by an overflow of information—Big Data. To this end, encouraging the proactive standardization of hashtags may be one way to reduce this Big Data challenge. Indeed, standardized hashtags—i.e., more structured information—would enable paid emergency responders (as well as affected communities) to “better leverage crowdsourced information for operational planning and response.” At present, the Government of the Philippines appears to be one of the few actors that actually endorses the use of specific hashtags during major disasters, as evidenced by their official crisis hashtags strategy.

The OCHA report thus proposes three hashtag standards and also encourages social media users to geo-tag their content during disasters. The latter can be done by enabling auto-GPS tagging or by using What3Words. Users should of course be informed of data-privacy considerations when geo-tagging their reports. As for the three hashtag standards:

  1. Early standardization of hashtags designating a specific disaster
  2. Standard, non-changing hashtag for reporting non-emergency needs
  3. Standard, non-changing hashtags for reporting emergency needs

1. As the OCHA think piece rightly notes, “News stations have been remarkably successful in encouraging early standardization of hashtags, especially during political events.” OCHA thus proposes that humanitarian organizations take a “similar approach for emergency response reporting and develop partnerships with Twitter as well as weather and news teams to publicly encourage such standardization. Storm cycles that create hurricanes and cyclones are named prior to the storm. For these events, an official hashtag should be released at the same time as the storm announcement.” For other hazards, “emergency response agencies should monitor the popular hashtag identifying a disaster, while trying to encourage a standard name.”

2. OCHA advocates for the use of #iSee, #iReport or #PublicRep for members of the public to designate tweets that refer to non-emergency needs such as “power lines, road closures, destroyed bridges, large-scale housing damage, population displacement or geographic spread (e.g., fire or flood).” When these hashtags are accompanied with GPS information, “responders can more easily identify and verify the information, therefore supporting more timely response & facilitating recovery.” In addition, responders can more easily create live crisis maps on the fly thanks to this structured, geo-tagged information.
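
A minimal sketch of how a responder-side system might pick out these geo-tagged public reports from a tweet stream. The field names (`text`, `coordinates`) follow Twitter’s classic API payload and are an assumption here; this is illustrative, not code from the OCHA report:

```python
import re

# Hashtags OCHA proposes for public, non-emergency reports.
REPORT_TAGS = {"#isee", "#ireport", "#publicrep"}

def extract_report(tweet: dict):
    """Return (text, (lat, lon)) if the tweet is a geo-tagged public
    report, else None. Expects a minimal tweet dict with 'text' and an
    optional GeoJSON 'coordinates' field."""
    tags = {t.lower() for t in re.findall(r"#\w+", tweet["text"])}
    if not (tags & REPORT_TAGS):
        return None
    coords = tweet.get("coordinates")  # GeoJSON point: [lon, lat]
    if not coords:
        return None  # un-geo-tagged reports are much harder to verify and map
    lon, lat = coords["coordinates"]
    return tweet["text"], (lat, lon)
```

Reports that pass both filters can be plotted directly on a live crisis map, which is precisely the “structured, geo-tagged information” advantage noted above.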

3. As for standard hashtags for emergency reports, OCHA notes emergency calls are starting to give way to emergency SMS’s. Indeed, “Cell phone users will soon be able to send an SMS to a toll-free phone number. For emergency reporting, this new technology could dramatically alter the way the public interacts with nation-based emergency response call centers. It does not take a large imaginary leap to see the potential move from SMS emergency calls to social media emergency calls. Hashtags could be one way to begin reporting emergencies through social media.”

Most if not all countries have national emergency phone numbers already. So OCHA suggests using these existing, well-known numbers as the basis for social media hashtags. More specifically, an emergency hashtag would be composed of the country’s emergency number (such as 911 in the US, 999 in the UK, 133 in Austria, etc) followed by the country’s two-letter code (US, UK, AT respectively). In other words: #911US, #999UK, #133AT. Some countries, like Austria, have different emergency phone numbers for different types of emergencies. So these could also be used accordingly. OCHA recognizes that many “federal agencies fear that such a system would result in people reporting through social media outside of designated monitoring times. This is a valid concern. However, as with the implementation of any new technology in the public service, it will take time and extensive promotion to ensure effective use.”
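
OCHA’s proposed construction (national emergency number followed by a two-letter country code) is mechanical enough to sketch in a few lines. The number-to-country mapping below is a small illustrative sample, not an official registry:

```python
# Sketch of OCHA's proposed emergency-hashtag scheme: the national
# emergency phone number followed by the country's two-letter code.
# This sample mapping is illustrative, not an official registry.
EMERGENCY_NUMBERS = {
    "US": "911",
    "UK": "999",  # OCHA's example uses "UK" rather than ISO's "GB"
    "AT": "133",  # Austria: police; other services have separate numbers
}

def emergency_hashtag(country_code: str) -> str:
    """Return the standardized emergency hashtag for a country."""
    code = country_code.upper()
    number = EMERGENCY_NUMBERS[code]
    return f"#{number}{code}"
```

One design question the scheme leaves open is exactly the Austrian case noted above: countries with several emergency numbers would need one hashtag per service, or a convention for choosing a single canonical number.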


Of course, “no monitoring system will be perfect in terms of low-cost, real-time analysis and high accuracy.” OCHA knows very well that there are a number of important limitations to the system they propose above. To be sure, “significant steps need to be taken to ensure that information flows from the public to response agencies and back to the public through improved efforts.” This is an important theme in my forthcoming book “Digital Humanitarians.”


See also:

  • Social Media & Emergency Management: Supply and Demand [link]
  • Using AIDR to Automatically Classify Disaster Tweets [link]

Using Flash Crowds to Automatically Detect Earthquakes & Impact Before Anyone Else

It is said that our planet has a new nervous system; a digital nervous system comprised of digital veins and intertwined sensors that capture the pulse of our planet in near real-time. Next generation humanitarian technologies seek to leverage this new nervous system to detect and diagnose the impact of disasters within minutes rather than hours. To this end, LastQuake may be one of the most impressive humanitarian technologies that I have recently come across. Spearheaded by the European-Mediterranean Seismological Center (EMSC), the technology combines “Flashsourcing” with social media monitoring to auto-detect earthquakes before they’re picked up by seismometers or anyone else.


Scientists typically draw on ground-motion prediction algorithms and data on building infrastructure to rapidly assess an earthquake’s potential impact. Alas, ground-motion predictions vary significantly and infrastructure data are rarely available at sufficient resolutions to accurately assess the impact of earthquakes. Moreover, a minimum of three seismometers are needed to calibrate a quake and said seismic data take several minutes to generate. This explains why the EMSC uses human sensors to rapidly collect relevant data on earthquakes, as these reduce the uncertainties that come with traditional rapid impact assessment methodologies. The Center’s important work clearly demonstrates how the Internet coupled with social media are “creating new potential for rapid and massive public involvement by both active and passive means” vis-a-vis earthquake detection and impact assessments. Indeed, the EMSC can automatically detect new quakes within 80-90 seconds of their occurrence while simultaneously publishing tweets with preliminary information on said quakes, like this one:


In reality, the first human sensors (increases in web traffic) can be detected within 15 seconds (!) of a quake. The EMSC’s system continues to automatically tweet relevant information (including documents, photos, videos, etc.) for the first 90 minutes after it first detects an earthquake and is also able to automatically create a customized and relevant hashtag for individual quakes.


How do they do this? Well, the team draws on two real-time crowdsourcing methods that “indirectly collect information from eyewitnesses on earthquakes’ effects.” The first is TED, which stands for Twitter Earthquake Detection—a system developed by the US Geological Survey (USGS). TED filters tweets by keyword, location and time to “rapidly detect sharing events through increases in the number of tweets” related to an earthquake. The second method, called “flashsourcing,” was developed by the EMSC to analyze traffic patterns on its own website, “a popular rapid earthquake information website.” The site gets an average of 1.5 to 2 million visits a month. Flashsourcing allows the Center to detect surges in web traffic that often occur after earthquakes—a detection method named Internet Earthquake Detection (IED). These traffic surges (“flash crowds”) are caused by “eyewitnesses converging on its website to find out the cause of their shaking experience” and can be detected by analyzing the IP locations of website visitors.
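
The core of flash-crowd detection—flagging a statistically unusual surge in visits relative to a recent baseline—can be sketched with a simple threshold rule. This is a toy illustration, not EMSC’s actual algorithm (which, among other things, also analyzes the IP locations of the visitors behind a surge):

```python
from statistics import mean, stdev

def detect_flash_crowd(hits_per_minute, window=60, threshold=4.0):
    """Flag minutes whose hit count is an unusual surge relative to the
    preceding baseline window (mean + threshold * standard deviation).
    A toy sketch of flashsourcing-style traffic-surge detection."""
    surges = []
    for i in range(window, len(hits_per_minute)):
        baseline = hits_per_minute[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Floor sigma at 1.0 so a perfectly flat baseline doesn't
        # make every tiny fluctuation look significant.
        if hits_per_minute[i] > mu + threshold * max(sigma, 1.0):
            surges.append(i)
    return surges
```

A felt earthquake shows up in such a series as a sudden jump—eyewitnesses converging on the site within seconds—which is why this signal can beat seismographic detection in poorly instrumented regions.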

It is worth emphasizing that both TED and IED work independently from traditional seismic monitoring systems. Instead, they are “based on real-time statistical analysis of Internet-based information generated by the reaction of the public to the shaking.” As EMSC rightly notes in a forthcoming peer-reviewed scientific study, “Detections of felt earthquakes are typically within 2 minutes for both methods, i.e., considerably faster than seismographic detections in poorly instrumented regions of the world.” TED and IED are highly complementary methods since they are based on two entirely “different types of Internet use that might occur after an earthquake.” TED depends on the popularity of Twitter while IED’s effectiveness depends on how well known the EMSC website is in the area affected by an earthquake. LastQuake automatically publishes real-time information on earthquakes by automatically merging real-time data feeds from both TED and IED as well as non-crowdsourcing feeds.


Let’s look into the methodology that powers IED. Flashsourcing can be used to detect felt earthquakes and provide “rapid information (within 5 minutes) on the local effects of earthquakes. More precisely, it can automatically map the area where shaking was felt by plotting the geographical locations of statistically significant increases in traffic […].” In addition, flashsourcing can also “discriminate localities affected by alarming shaking levels […], and in some cases it can detect and map areas affected by severe damage or network disruption through the concomitant loss of Internet sessions originating from the impacted region.” As such, this “negative space” (where there are no signals) is itself an important signal for damage assessment, as I’ve argued before.

In the future, EMSC’s flashsourcing system may also be able to discriminate, at the city level, between power cuts affecting indoor and outdoor Internet connections, since the system’s analysis of web traffic sessions will soon be based on web sockets rather than webserver log files. This automatic detection of power failures “is the first step towards a new system capable of detecting Internet interruptions or localized infrastructure damage.” Of course, flashsourcing alone does not “provide a full description of earthquake impact, but within a few minutes, independently of any seismic data, and, at little cost, it can exclude a number of possible damage scenarios, identify localities where no significant damage has occurred and others where damage cannot be excluded.”


EMSC is complementing their flashsourcing methodology with a novel mobile app that quickly enables smartphone users to report felt earthquakes. Instead of requiring data entry or written surveys, users simply click on cartoonish pictures that best describe the level of intensity they felt when the earthquake (or aftershocks) struck. In addition, EMSC analyzes and manually validates geo-located photos and videos of earthquake effects uploaded to their website (not from social media). The Center’s new app will also make it easier for users to post more pictures more quickly.


What about typical criticisms (by now broken records) that social media is biased and unreliable (and thus useless)? What about the usual theatrics about the digital divide invalidating any kind of crowdsourcing effort given that these will be heavily biased and hardly representative of the overall population? Despite these well-known shortcomings and despite the fact that our inchoate digital networks are still evolving into a new nervous system for our planet, the existing nervous system—however imperfect and immature—still adds value. TED and LastQuake demonstrate this empirically beyond any shadow of a doubt. What’s more, the EMSC has found that crowdsourced, user-generated information is highly reliable: “there are very few examples of intentional misuses, errors […].”

My team and I at QCRI are honored to be collaborating with EMSC on integrating our AIDR platform to support their good work. AIDR enables users to automatically detect tweets of interest by using machine learning (artificial intelligence), which is far more effective than searching for keywords. I recently spoke with Rémy Bossu, one of the masterminds behind the EMSC’s LastQuake project, about his team’s plans for AIDR:

“For us AIDR could be a way to detect indirect effects of earthquakes, and notably triggered landslides and fires. Landslides can be the main cause of earthquake losses, like during the 2001 Salvador earthquake. But they are very difficult to anticipate, depending among other parameters on the recent rainfalls. One can prepare a susceptibility map but whether there are or not landslides, where they have struck and their extent is something we cannot detect using geophysical methods. For us AIDR is a tool which could potentially make a difference on this issue of rapid detection of indirect earthquake effects for better situation awareness.”

In other words, as soon as the EMSC system detects an earthquake, the plan is for that detection to automatically launch an AIDR deployment to automatically identify tweets related to landslides. This integration is already completed and being piloted. In sum, EMSC is connecting an impressive ecosystem of smart, digital technologies powered by a variety of methodologies. This explains why their system is one of the most impressive & proven examples of next generation humanitarian technologies that I’ve come across in recent months.


Acknowledgements: Many thanks to Rémy Bossu for providing me with all the material and graphics I needed to write up this blog post.

See also:

  • Social Media: Pulse of the Planet? [link]
  • Taking Pulse of Boston Bombings [link]
  • The World at Night Through the Eyes of the Crowd [link]
  • The Geography of Twitter: Mapping the Global Heartbeat [link]

Integrating Geo-Data with Social Media Improves Situational Awareness During Disasters

A new data-driven study on the flooding of River Elbe in 2013 (one of the most severe floods ever recorded in Germany) shows that geo-data can enhance the process of extracting relevant information from social media during disasters. The authors use “specific geographical features like hydrological data and digital elevation models to prioritize crisis-relevant twitter messages.” The results demonstrate that an “approach based on geographical relations can enhance information extraction from volunteered geographic information,” which is “valuable for both crisis response and preventive flood monitoring.” These conclusions thus support a number of earlier studies that show the added value of data integration. This analysis also confirms several other key assumptions, which are important for crisis computing and disaster response.


The authors apply a “geographical approach to prioritize [the collection of] crisis-relevant information from social media.” More specifically, they combine information from “tweets, water level measurements & digital elevation models” to answer the following three research questions:

  • Does the spatial and temporal distribution of flood-related tweets actually match the spatial and temporal distribution of the flood phenomenon (despite Twitter bias, potentially false info, etc.)?
  • Does the spatial distribution of flood-related tweets differ depending on their content?
  • Is geographical proximity to flooding a useful parameter to prioritize social media messages in order to improve situation awareness?

The authors analyzed just over 60,000 tweets generated in Germany during the flooding of River Elbe in June 2013. Only 398 of these tweets (0.7%) contained keywords related to the flooding. The geographical distribution of flood-related tweets versus non-flood-related tweets is depicted below (click to enlarge).


As the authors note, “a considerable amount” of flood-related tweets are geo-located in areas of major flooding. So they tested whether flood-related tweets were located closer to the actual flooding than other tweets, and found their distance to be “statistically significantly lower compared to non-related Twitter messages.” This finding “implies that the locations of flood-related twitter messages and flood-affected catchments match to a certain extent. In particular this means that mostly people in regions affected by the flooding or people close to these regions posted twitter messages referring to the flood.” To this end, major urban areas like Munich and Hamburg were not the source of most flood-related tweets. Instead, “The majority of tweets referring to the flooding were posted by locals” closer to the flooding.
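
The kind of significance test behind this finding can be illustrated with a stdlib permutation test on tweet-to-flood distances. The distance values in the test below are invented for illustration, and the study’s own statistical procedure may differ:

```python
import random

def permutation_test(dist_a, dist_b, trials=10_000, seed=42):
    """One-sided permutation test: p-value for observing group A's mean
    distance being as much lower than group B's as it actually is, if
    group labels were assigned at random. A simple stdlib stand-in for
    the significance test reported in the study."""
    rng = random.Random(seed)
    observed = sum(dist_a) / len(dist_a) - sum(dist_b) / len(dist_b)
    pooled = list(dist_a) + list(dist_b)
    n_a = len(dist_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff <= observed:
            hits += 1
    return hits / trials
```

A small p-value here would support the study’s conclusion: flood-related tweets sit closer to affected catchments than chance alone would predict.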

Given that “most flood-related tweets were posted by locals it seems probable that these messages contain local knowledge only available to people on site.” To this end, the authors analyzed the “spatial distribution of flood-related tweets depending on their content.” The results, depicted below (click to enlarge), show that the geographical distribution of tweets do indeed differ based on their content. This is especially true of tweets containing information about “volunteer actions” and “flood level”. The authors confirm these results are statistically significant when compared with tweets related to “media” and “other” issues.


These findings also reveal that the content of Twitter messages can be combined into three groups given their distance to actual flooding:

Group A: flood level & volunteer related tweets are closest to the floods.
Group B: tweets on traffic conditions have a medium distance to the floods.
Group C: other and media related tweets are furthest from the flooding.

Tweets belonging to “Group A” yield greater situational awareness. “Indeed, information about current flood levels is crucial for situation awareness and can complement existing water level measurements, which are only available for determined geographical points where gauging stations are located. Since volunteer actions are increasingly organized via social media, this is a type of information which is very valuable and completely missing from other sources.”
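
The grouping above suggests a simple prioritization rule: rank by content group first (Group A before B before C), then by proximity to the flood. A minimal sketch, with the topic labels and the distance field invented for illustration:

```python
# Illustrative ranking following the study's grouping: content class
# first (Group A > B > C), then distance to the flooding.
GROUP_RANK = {
    "flood level": 0, "volunteer": 0,  # Group A: most situational awareness
    "traffic": 1,                      # Group B: medium distance/value
    "media": 2, "other": 2,            # Group C: furthest, least useful
}

def prioritize(tweets):
    """Sort tweet dicts (with 'topic' and 'km_to_flood' fields) so the
    most situationally useful messages come first."""
    return sorted(tweets, key=lambda t: (GROUP_RANK[t["topic"]], t["km_to_flood"]))
```

In a real pipeline the `topic` label would itself come from an automatic classifier rather than be hand-assigned.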


In sum, these results show that “twitter messages that are closest to the flood-affected areas (Group A) are also the most useful ones.” The authors thus conclude that “the distance to flood phenomena is indeed a useful parameter to prioritize twitter messages towards improving situation awareness.” To be sure, the spatial distribution of flood-related tweets is “significantly different from the spatial distribution of off-topic messages.” Whether this is also true of other social media platforms like Instagram and Flickr remains to be seen. This is an important area for future research given the increasing use of pictures posted on social media for rapid damage assessments in the aftermath of disasters.


“The integration of other official datasets, e.g. precipitation data or satellite images, is another avenue for future work towards better understanding the relations between social media and crisis phenomena from a geographical perspective.” I would add both aerial imagery (captured by UAVs) and data from mainstream news (captured by GDELT) to this data fusion exercise. Of course, the geographical approach described above is not limited to the study of flooding only but could be extended to other natural hazards.

This explains why my colleagues at GeoFeedia may be on the right track with their crisis mapping platform. That said, the main limitation with GeoFeedia and the study above is the fact that only 3% of all tweets are actually geo-referenced. But this need not be a deal breaker. Instead, platforms like GeoFeedia can be complemented by other crisis computing solutions that prioritize the analysis of social media content over geography.

Take the free and open-source “Artificial Intelligence for Disaster Response” (AIDR) platform that my team and I at QCRI are developing. Humanitarian organizations can use AIDR to automatically identify tweets related to flood levels and volunteer actions (deemed to provide the most situational awareness) without requiring that tweets be geo-referenced. In addition, AIDR can also be used to identify eyewitness tweets regardless of whether they refer to flood levels, volunteering or other issues. Indeed, we already demonstrated that eyewitness tweets can be automatically identified with an accuracy of 80-90% using AIDR. And note that AIDR can also be used on geo-tagged tweets only.
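
For a sense of the general kind of supervised learning involved, here is a minimal multinomial Naive Bayes classifier trained on labeled tweets. AIDR’s actual models and features are more sophisticated (and its labels come from human volunteers in an active-learning loop); the training examples below are made up:

```python
from collections import Counter, defaultdict
import math

class TinyTweetClassifier:
    """Minimal multinomial Naive Bayes over bag-of-words features,
    illustrating AIDR-style supervised tweet classification."""

    def fit(self, tweets, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(tweets, labels):
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best, best_lp = None, -math.inf
        for label, count in self.label_counts.items():
            lp = math.log(count / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                # Laplace smoothing so unseen words don't zero out the score.
                lp += math.log((self.word_counts[label][word] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

Because classification happens per tweet rather than per location, this approach needs no geo-reference at all, which is exactly the complementarity with geography-first platforms argued above.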

The authors of the above study recently got in touch to explore ways that their insights can be used to further improve AIDR. So stay tuned for future updates on how we may integrate geo-data more directly within AIDR to improve situational awareness during disasters.


See also:

  • Debating the Value of Tweets For Disaster Response (Intelligently) [link]
  • Social Media for Emergency Management: Question of Supply and Demand [link]
  • Become a (Social Media) Data Donor and Save a Life [link]

Using AIDR to Collect and Analyze Tweets from Chile Earthquake

Wish you had a better way to make sense of Twitter during disasters than this?

Type in a keyword like #ChileEarthquake in Twitter’s search box above and you’ll see more tweets than you can possibly read in a day, let alone keep up with for more than a few minutes. Wish there were an easy, free and open source solution? Well, you’ve come to the right place. My team and I at QCRI are developing the Artificial Intelligence for Disaster Response (AIDR) platform to do just this. Here’s how it works:

First you login to the AIDR platform using your own Twitter handle (click images below to enlarge):


You’ll then see your collection of tweets (if you already have any). In my case, you’ll see I have three. The first is a collection of English language tweets related to the Chile Earthquake. The second is a collection of Spanish tweets. The third is a collection of more than 3,000,000 tweets related to the missing Malaysia Airlines plane. A preliminary analysis of these tweets is available here.

AIDR collections

Let’s look more closely at my Chile Earthquake 2014 collection (see below, click to enlarge). I’ve collected about a quarter of a million tweets in the past 30 hours or so. The label “Downloaded tweets (since last re-start)” simply refers to the number of tweets I’ve collected since adding a new keyword or hashtag to the collection. I started the collection yesterday at 5:39am my time (yes, I’m an early bird). Under “Keywords” you’ll see all the hashtags and keywords I’ve used to search for tweets related to the earthquake in Chile. I’ve also specified the geographic region I want to collect tweets from. Don’t worry, you don’t actually have to enter geographic coordinates when you set up your own collection; you simply highlight the area you’re interested in on the map and AIDR does the rest.
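Under the hood, a collector like this boils down to matching each incoming tweet against your keywords and (optionally) your highlighted region. Here is a minimal sketch of that filtering logic; the tweet fields (“text”, “lat”, “lon”) and the bounding-box values are illustrative assumptions, not AIDR’s actual schema:

```python
# Hypothetical sketch of a Collector's filtering step: keep a tweet if it
# matches any tracked keyword/hashtag and (optionally) falls inside a
# bounding box. Field names and coordinates are illustrative only.

def matches(tweet, keywords, bbox=None):
    """Return True if the tweet matches a keyword and falls inside bbox."""
    text = tweet["text"].lower()
    if not any(kw.lower() in text for kw in keywords):
        return False
    if bbox is not None:
        lat, lon = tweet.get("lat"), tweet.get("lon")
        if lat is None or lon is None:
            return False  # no coordinates, can't satisfy the geo filter
        min_lat, min_lon, max_lat, max_lon = bbox
        return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    return True

chile_bbox = (-56.0, -76.0, -17.5, -66.0)  # rough bounding box around Chile
keywords = ["#ChileEarthquake", "terremoto"]

tweet = {"text": "Strong shaking in Iquique #ChileEarthquake",
         "lat": -20.2, "lon": -70.1}
print(matches(tweet, keywords, chile_bbox))  # True
```

In the real platform the geo filter is optional, which is why AIDR can also work with the large majority of tweets that carry no coordinates at all.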

AIDR - Chile Earthquake 2014

You’ll also note in the above screenshot that I’ve chosen to collect only English-language tweets, but you can collect tweets in all languages if you’d like, or just a select few. Finally, the Collaborators section simply lists the colleagues I’ve added to my collection. This gives them the ability to add new keywords/hashtags and to download the tweets collected, as shown below (click to enlarge). More specifically, collaborators can download the most recent 100,000 tweets (and also share the link with others). The 100K tweet limit is based on Twitter’s Terms of Service (ToS). If collaborators want all the tweets, Twitter’s ToS allows for sharing the TweetIDs of an unlimited number of tweets.
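The two-tier sharing rule above is easy to sketch in code: cap the full-tweet export at 100,000 but write out every TweetID. The CSV layout and field names below are illustrative assumptions, not AIDR’s actual export format:

```python
# Sketch of ToS-compliant sharing: export the most recent 100K tweets in
# full, but export TweetIDs for the entire collection (IDs are uncapped).
# The CSV layout here is illustrative, not AIDR's real export format.
import csv, io

def export(tweets, full_limit=100_000):
    """Return (full-tweet CSV capped at full_limit, uncapped TweetID list)."""
    full_csv, ids_csv = io.StringIO(), io.StringIO()
    writer = csv.writer(full_csv)
    writer.writerow(["tweet_id", "text"])
    for t in tweets[-full_limit:]:      # only the most recent tweets in full
        writer.writerow([t["id"], t["text"]])
    ids_csv.write("\n".join(str(t["id"]) for t in tweets))  # every TweetID
    return full_csv.getvalue(), ids_csv.getvalue()

tweets = [{"id": i, "text": f"tweet {i}"} for i in range(5)]
full, ids = export(tweets, full_limit=3)
print(len(ids.splitlines()))  # 5 -- all IDs survive the cap
```

Anyone holding the TweetID file can then re-download (“rehydrate”) the full tweets directly from Twitter.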

AIDR download CSV

So that’s the AIDR Collector. We also have the AIDR Classifier, which helps you make sense of the tweets you’re collecting (in real-time). That is, your collection of tweets doesn’t stop, it continues growing, and as it does, you can make sense of new tweets as they come in. With the Classifier, you simply teach AIDR to classify tweets into whatever topics you’re interested in, like “Infrastructure Damage”, for example. To get started with the AIDR Classifier, simply return to the “Details” tab of our Chile collection. You’ll note the “Go To Classifier” button on the far right:

AIDR go to Classifier

Clicking on that button allows you to create a Classifier, say on the topic of disaster damage in general. So you simply create a name for your Classifier, in this case “Disaster Damage” and then create Tags to capture more details with respect to damage-related tweets. For example, one Tag might be, say, “Damage to Transportation Infrastructure.” Another could be “Building Damage.” In any event, once you’ve created your Classifier and corresponding tags, you click Submit and find your way to this page (click to enlarge):

AIDR Classifier Link

You’ll notice the public link for volunteers. That’s basically the interface you’ll use to teach AIDR. If you want to teach AIDR by yourself, you can certainly do so. You also have the option of “crowdsourcing the teaching” of AIDR. Clicking on the link will take you to the page below.

AIDR to MicroMappers

So, I called my Classifier “Message Contents,” which is not particularly insightful; I should have labeled it something like “Humanitarian Information Needs.” But bear with me and let’s click on that Classifier. This will take you to the following Clicker on MicroMappers:

MicroMappers Clicker

Now this is not the most awe-inspiring interface you’ve ever seen (at least I hope not); the reason being that this is simply our very first version. We’ll be providing different “skins,” like the official MicroMappers skin (below), as well as a skin that allows you to upload your own logo, for example. In the meantime, note that AIDR shows every tweet to at least three different volunteers. Only if all three volunteers agree on how to classify a given tweet does AIDR take it into consideration when learning. In other words, AIDR wants to ensure that humans are really sure about how to classify a tweet before it decides to learn from that lesson. Incidentally, the MicroMappers smartphone apps for iPhone and Android will be available in the next few weeks. But I digress.
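The three-volunteer agreement rule described above can be sketched in a few lines; the function name and data structure are hypothetical, but the logic matches the text: a tweet’s label only becomes training data when all three volunteers agree.

```python
# Sketch of the agreement rule: a tweet's label is only used for training
# when all three volunteers classified it the same way. Otherwise the
# tweet is discarded as ambiguous. (Hypothetical names, not AIDR's code.)

def training_label(votes):
    """Return the agreed label if all 3+ votes match, else None."""
    if len(votes) >= 3 and len(set(votes)) == 1:
        return votes[0]
    return None  # disagreement or too few votes: don't learn from it

print(training_label(["Building Damage"] * 3))  # Building Damage
print(training_label(["Building Damage", "Not Relevant", "Building Damage"]))  # None
```

Requiring unanimity trades labeling speed for label quality, which matters because every mistake here gets amplified once the machine starts learning from it.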

Yolanda TweetClicker4

As you and/or your volunteers classify tweets based on the Tags you created, AIDR starts to learn—hence the AI (Artificial Intelligence) in AIDR. AIDR begins to recognize that all the tweets you classified as “Infrastructure Damage” are indeed similar. Once you’ve tagged enough tweets, AIDR will decide that it’s time to leave the nest and fly on its own. In other words, it will start to auto-classify incoming tweets in real-time. (At present, AIDR can auto-classify some 30,000 tweets per minute; compare this to the peak rate of 16,000 tweets per minute observed during Hurricane Sandy.)

Of course, AIDR’s first solo “flights” won’t always go smoothly. But not to worry: AIDR will let you know when it needs a little help. Every tweet that AIDR auto-tags comes with a confidence level; that is, AIDR will tell you, “I am 80% sure that I correctly classified this tweet.” If AIDR has trouble with a tweet, i.e., if its confidence level is 65% or below, it will send the tweet to you (and/or your volunteers) so it can learn from how you classify that particular tweet. In other words, the more tweets you classify, the more AIDR learns, and the higher AIDR’s confidence levels get. Fun, huh?
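This “auto-tag when confident, ask a human when not” loop is a simple form of active learning. Here is a minimal sketch of the routing step, using the 65% threshold from the text; the variable names and tuple layout are illustrative assumptions:

```python
# Sketch of confidence-based routing (a simple active-learning loop):
# predictions above the threshold are accepted automatically; the rest go
# back to human volunteers for labeling, so the model can learn from them.
CONFIDENCE_THRESHOLD = 0.65  # "65% or below" goes to humans, per the text

def route(predictions):
    """Split (tweet, label, confidence) triples into auto-accepted and human-review."""
    auto, to_humans = [], []
    for tweet, label, confidence in predictions:
        if confidence > CONFIDENCE_THRESHOLD:
            auto.append((tweet, label))
        else:
            to_humans.append(tweet)  # volunteers re-label; model retrains on these
    return auto, to_humans

preds = [("Bridge collapsed on Route 5", "Infrastructure Damage", 0.92),
         ("Thinking of everyone in Chile", "Infrastructure Damage", 0.40)]
auto, to_humans = route(preds)
print(len(auto), len(to_humans))  # 1 1
```

Routing only the low-confidence tweets to volunteers is what keeps the human workload manageable even at tens of thousands of tweets per minute.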

To view the results of the machine tagging, simply click on the View/Download tab, as shown below (click to enlarge). The page shows you the latest tweets that have been auto-tagged, along with the Tag label and the confidence score. (Yes, this too is a first version of the interface; we’ll make it more user-friendly in the future, not to worry.) In any event, you can download the auto-tagged tweets in a CSV file and share the download link with your colleagues for analysis and so on. At some point in the future, we hope to provide a simple data visualization output page so that you can easily see interesting data trends.

AIDR Results

So that’s basically all there is to it. If you want to learn more about how it all works, you might fancy reading this research paper (PDF). In the meantime, I’ll simply add that you can re-use your Classifiers. If (when?) another earthquake strikes Chile, you won’t have to start from scratch. You can auto-tag incoming tweets immediately with the Classifier you already have. Plus, you’ll be able to share your classifiers with your colleagues and partner organizations if you like. In other words, we’re envisaging an “App Store” of Classifiers based on different hazards and different countries. The more we re-use our Classifiers, the more accurate they will become. Everybody wins.
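Re-using a Classifier is, at bottom, just a matter of serializing the trained model so it can be reloaded or shared later. Here is a minimal sketch using Python’s pickle module; the classifier object below is a toy stand-in, not AIDR’s actual model format:

```python
# Sketch of classifier re-use: save a trained model to disk so it can be
# reloaded for the next disaster or shared with partner organizations.
# The dict below is a toy stand-in for a real trained classifier.
import pickle, tempfile, os

classifier = {"name": "Disaster Damage",
              "tags": ["Building Damage",
                       "Damage to Transportation Infrastructure"],
              "weights": {"collapsed": 2.1, "bridge": 1.4}}  # toy weights

path = os.path.join(tempfile.gettempdir(), "disaster_damage.pkl")
with open(path, "wb") as f:
    pickle.dump(classifier, f)      # this file is what you'd share

with open(path, "rb") as f:
    reloaded = pickle.load(f)       # ready for the next earthquake
print(reloaded["name"])  # Disaster Damage
```

An “App Store” of Classifiers is then just a shared catalog of such saved models, organized by hazard and country.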

And voila, that is AIDR (at least our first version). If you’d like to test the platform and/or want the tweets from the Chile Earthquake, simply get in touch!

bio

Note:

  • We’re adapting AIDR so that it can also classify text messages (SMS).
  • AIDR Classifiers are language specific. So if you speak Spanish, you can create a classifier to tag all Spanish language tweets/SMS that refer to disaster damage, for example. In other words, AIDR does not only speak English : )

Analyzing Tweets on Malaysia Flight #MH370

My QCRI colleague Dr. Imran is using our AIDR platform (Artificial Intelligence for Disaster Response) to collect and analyze tweets related to Malaysia Flight 370, which went missing several days ago. He has collected well over 850,000 English-language tweets since March 11th, using the following keywords/hashtags: Malaysia Airlines flight, #MH370, #PrayForMH370 and #MalaysiaAirlines.

MH370 Prayers

Imran then used AIDR to create a number of “machine learning classifiers” to automatically classify all incoming tweets into categories that he is interested in:

  • Informative: tweets that relay breaking news, useful info, etc

  • Praying: tweets that are related to prayers and faith

  • Personal: tweets that express personal opinions

The process is super simple. All he does is tag several dozen incoming tweets into their respective categories. This teaches AIDR what an “Informative” tweet should “look like”. Since our novel approach combines human intelligence with artificial intelligence, AIDR is typically far more accurate at capturing relevant tweets than Twitter’s keyword search.
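To make the “teach by tagging” idea concrete, here is a toy text classifier that learns Imran’s three categories from a handful of hand-tagged tweets. It is a bare-bones Naive Bayes sketch for illustration only; AIDR’s actual machine-learning pipeline is more sophisticated, and the example tweets are invented:

```python
# Toy illustration of learning from hand-tagged tweets: a tiny Naive Bayes
# classifier with add-one smoothing. AIDR's real pipeline is more advanced;
# this only shows how a few dozen tags can teach a model a category.
import math
from collections import Counter, defaultdict

class TinyNB:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> tagged tweets
        self.vocab = set()

    def tag(self, text, label):
        """A human tags one tweet; the model updates its counts."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Pick the label with the highest (smoothed) log-probability."""
        words = text.lower().split()
        best, best_score = None, -math.inf
        total = sum(self.label_counts.values())
        for label in self.label_counts:
            score = math.log(self.label_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

nb = TinyNB()
nb.tag("praying for all passengers on MH370", "Praying")
nb.tag("our thoughts and prayers are with the families", "Praying")
nb.tag("search area expanded officials confirm new radar data", "Informative")
nb.tag("breaking news debris spotted in search zone", "Informative")
print(nb.classify("officials confirm debris in the search area"))  # Informative
```

Even this toy version shows why a trained classifier beats keyword search: it weighs all the words in a tweet rather than requiring an exact keyword match.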

And the more tweets that Imran tags, the more accurate AIDR gets. At present, AIDR can auto-classify ~500 tweets per second, or 30,000 tweets per minute. This is well above the highest velocity of crisis tweets recorded thus far—16,000 tweets/minute during Hurricane Sandy.

The graph below depicts the number of tweets generated per day since we started the AIDR collection on March 11th.

Volume of Tweets per Day

This series of pie charts simply reflects the relative share of tweets per category over the past four days.

Tweets Trends

Below are some of the tweets that AIDR has automatically classified as being Informative (click to enlarge). The “Confidence” score simply reflects how confident AIDR is that it has correctly auto-classified a tweet. Note that Imran could also have crowdsourced the manual tagging—that is, he could have crowdsourced the process of teaching AIDR. To learn more about how AIDR works, please see this short overview and this research paper (PDF).

AIDR output

If you’re interested in testing AIDR (still very much under development) and/or would like the Tweet ID’s for the 850,000+ tweets we’ve collected using AIDR, then feel free to contact me. In the meantime, we’ll start a classifier that auto-collects tweets related to hijacking, criminal causes, and so on. If you’d like us to create a classifier for a different topic, let us know—but we can’t make any promises since we’re working on an important project deadline. When we’re further along with the development of AIDR, anyone will be able to easily collect & download tweets and create & share their own classifiers for events related to humanitarian issues.

Bio

Acknowledgements: Many thanks to Imran for collecting and classifying the tweets. Imran also shared the graphs and tabular output that appears above.

Using Social Media to Predict Economic Activity in Cities

Economic indicators in most developing countries are often outdated. A new study suggests that social media may provide useful economic signals when traditional economic data is unavailable. In “Taking Brazil’s Pulse: Tracking Growing Urban Economies from Online Attention” (PDF), the authors accurately predict the GDPs of 45 Brazilian cities by analyzing data from a popular micro-blogging platform (Yahoo Meme). To make these predictions, the authors used the concept of glocality, which notes that “economically successful cities tend to be involved in interactions that are both local and global at the same time.” The results of the study reveal that “a city’s glocality, measured with social media data, effectively signals the city’s economic well-being.”

The authors are currently expanding their work by predicting social capital for these 45 cities based on social media data. As iRevolution readers will know, I’ve blogged extensively on using social media to measure social capital footprints at the city and sub-city level. So I’ve contacted the authors of the study and look forward to learning more about their research. As they rightly note:

“There is growing interest in using digital data for development opportunities, since the number of people using social media is growing rapidly in developing countries as well. Local impacts of recent global shocks – food, fuel and financial – have proven not to be immediately visible and trackable, often unfolding ‘beneath the radar of traditional monitoring systems’. To tackle that problem, policymakers are looking for new ways of monitoring local impacts […].”


bio