Category Archives: Big Data

Using Social Media to Predict Economic Activity in Cities

Economic indicators in most developing countries are often outdated. A new study suggests that social media may provide useful economic signals when traditional economic data is unavailable. In “Taking Brazil’s Pulse: Tracking Growing Urban Economies from Online Attention” (PDF), the authors accurately predict the GDPs of 45 Brazilian cities by analyzing data from a popular micro-blogging platform (Yahoo Meme). To make these predictions, the authors used the concept of glocality, which notes that “economically successful cities tend to be involved in interactions that are both local and global at the same time.” The results of the study reveal that “a city’s glocality, measured with social media data, effectively signals the city’s economic well-being.”
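The paper’s exact formula isn’t reproduced here, but the underlying intuition is easy to sketch: a city whose interactions are split between local and global ties scores higher than one whose ties are all of one kind. The toy balance metric below is my own hypothetical stand-in, not the authors’ actual measure:

```python
def glocality(interaction_kinds):
    """Toy 'glocality' score for a city's interactions, each labeled
    'local' or 'global'. Returns 1.0 for a perfect 50/50 balance of
    local and global ties, and 0.0 when all ties are of one kind."""
    total = len(interaction_kinds)
    if total == 0:
        return 0.0
    p_local = sum(1 for k in interaction_kinds if k == "local") / total
    # 4p(1-p) peaks at p = 0.5 and vanishes at p = 0 or 1.
    return 4 * p_local * (1 - p_local)
```

A real implementation would work from the platform’s interaction graph and would likely weight ties by volume and geographic distance rather than merely counting them.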

The authors are currently expanding their work by predicting social capital for these 45 cities based on social media data. As iRevolution readers will know, I’ve blogged extensively on using social media to measure social capital footprints at the city and sub-city level. So I’ve contacted the authors of the study and look forward to learning more about their research. As they rightly note:

“There is growing interest in using digital data for development opportunities, since the number of people using social media is growing rapidly in developing countries as well. Local impacts of recent global shocks – food, fuel and financial – have proven not to be immediately visible and trackable, often unfolding ‘beneath the radar of traditional monitoring systems’. To tackle that problem, policymakers are looking for new ways of monitoring local impacts [...].”



New Insights on How To Verify Social Media

The “field” of information forensics has seen some interesting developments in recent weeks. Take the Verification Handbook or Twitter Lie-Detector project, for example. The Social Sensor project is yet another new initiative. In this blog post, I seek to make sense of these new developments and to identify where this new field may be going. In so doing, I highlight key insights from each initiative. 


The co-editors of the Verification Handbook remind us that misinformation and rumors are hardly new during disasters. Chapter 1 opens with the following account from 1934:

“After an 8.1 magnitude earthquake struck northern India, it wasn’t long before word circulated that 4,000 buildings had collapsed in one city, causing ‘innumerable deaths.’ Other reports said a college’s main building, and that of the region’s High Court, had also collapsed.”

These turned out to be false rumors. The BBC’s User Generated Content (UGC) Hub would have been able to debunk these rumors. In their opinion, “The business of verifying and debunking content from the public relies far more on journalistic hunches than snazzy technology.” So they would have been right at home in the technology landscape of 1934. To be sure, they contend that “one does not need to be an IT expert or have special equipment to ask and answer the fundamental questions used to judge whether a scene is staged or not.” In any event, the BBC does not “verify something unless [they] speak to the person that created it, in most cases.” What about the other cases? How many of those cases are there? And how did they ultimately decide whether the information was true or false even though they did not speak to the person that created it?

As this new study argues, big news organizations like the BBC aim to contact the original authors of user generated content (UGC) not only to try and “protect their editorial integrity but also because rights and payments for newsworthy footage are increasingly factors. By 2013, the volume of material and speed with which they were able to verify it [UGC] were becoming significant frustrations and, in most cases, smaller news organizations simply don’t have the manpower to carry out these checks” (Schifferes et al., 2014).


Chapter 3 of the Handbook notes that the BBC’s UGC Hub began operations in early 2005. At the time, “they were reliant on people sending content to one central email address. At that point, Facebook had just over 5 million users, rather than the more than one billion today. YouTube and Twitter hadn’t launched.” Today, more than 100 hours of content is uploaded to YouTube every minute; over 400 million tweets are sent each day and over 1 million pieces of content are posted to Facebook every 30 seconds. Now, as this third chapter rightly notes, “No technology can automatically verify a piece of UGC with 100 percent certainty. However, the human eye or traditional investigations aren’t enough either. It’s the combination of the two.” New York Times journalists concur: “There is a problem with scale… We need algorithms to take more onus off human beings, to pick and understand the best elements” (cited in Schifferes et al., 2014).

People often (mistakenly) see “verification as a simple yes/no action: Something has been verified or not. In practice, […] verification is a process” (Chapter 3). More specifically, this process is one of satisficing. As colleagues Leysia Palen et al. note in this study, “Information processing during mass emergency can only satisfice because […] the ‘complexity of the environment is immensely greater than the computational powers of the adaptive system.’” To this end, “It is an illusion to believe that anyone has perfectly accurate information in mass emergency and disaster situations to account for the whole event. If someone did, then the situation would not be a disaster or crisis.” This explains why Leysia et al. seek to shift the debate to one focused on the helpfulness of information rather than the problematic true/false dichotomy.


“In highly contextualized situations where time is of the essence, people need support to consider the content across multiple sources of information. In the online arena, this means assessing the credibility and content of information distributed across [the web]” (Leysia et al., 2011). This means that, “Technical support can go a long way to help collate and inject metadata that make explicit many of the inferences that the every day analyst must make to assess credibility and therefore helpfulness” (Leysia et al., 2011). In sum, the human versus computer debate vis-a-vis the verification of social media is somewhat pointless. The challenge moving forward resides in identifying the best ways to combine human cognition with machine computing. As Leysia et al. rightly note, “It is not the job of the […] tools to make decisions but rather to allow their users to reach a decision as quickly and confidently as possible.”

This may explain why Chapter 7 (which I authored) applies both human and advanced computing techniques to the verification challenge. Indeed, I explicitly advocate for a hybrid approach. In contrast, the Twitter Lie-Detector project known as Pheme apparently seeks to use machine learning alone to automatically verify online rumors as they spread on social networks. Overall, this is great news—the more groups that focus on this verification challenge, the better for those of us engaged in digital humanitarian response. It remains to be seen, however, whether machine learning alone will make Pheme a success.


In the meantime, the EU’s Social Sensor project is developing new software tools to help journalists assess the reliability of social media content (Schifferes et al., 2014). A preliminary series of interviews revealed that journalists were most interested in Social Sensor software for:

1. Predicting or alerting breaking news

2. Verifying social media content–quickly identifying who has posted a tweet or video and establishing “truth or lie”

So the Social Sensor project is developing an “Alethiometer” (Aletheia is Greek for ‘truth’) to “meter the credibility of information coming from any source by examining the three Cs—Contributors, Content and Context. These seek to measure three key dimensions of credibility: the reliability of contributors, the nature of the content, and the context in which the information is presented. This reflects the range of considerations that working journalists take into account when trying to verify social media content. Each of these will be measured by multiple metrics based on our research into the steps that journalists go through manually. The results of [these] steps can be weighed and combined [metadata] to provide a sense of credibility to guide journalists” (Schifferes et al., 2014).
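As a rough illustration of how the three Cs might be “weighed and combined,” consider a simple weighted average of per-dimension scores. The weights and the [0, 1] scoring scale below are assumptions of mine, not values from the Social Sensor project:

```python
def credibility_score(contributor, content, context,
                      weights=(0.4, 0.35, 0.25)):
    """Combine three-C scores (each in [0, 1]) into a single
    credibility estimate via a normalized weighted average.
    The default weights are illustrative placeholders only."""
    w_contributor, w_content, w_context = weights
    total = w_contributor + w_content + w_context
    return (w_contributor * contributor +
            w_content * content +
            w_context * context) / total
```

In practice each dimension would itself be an aggregate of many metrics (account age, corroborating sources, geographic plausibility, and so on), and the weights would be tuned against journalists’ manual verdicts.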


On our end, my colleagues and I at QCRI are continuing to collaborate with several partners to experiment with advanced computing methods to address the social media verification challenge. As noted in Chapter 7, Verily, a platform that combines time-critical crowdsourcing and critical thinking, is still in the works. We’re also continuing our collaboration on a Twitter credibility plugin (more in Chapter 7). In addition, we are exploring whether we can microtask the computation of source credibility scores using MicroMappers.

Of course, the above will sound like “snazzy technologies” to seasoned journalists with no background or interest in advanced computing. But this doesn’t seem to stop them from complaining that “Twitter search is very hit and miss;” that what Twitter “produces is not comprehensive and the filters are not comprehensive enough” (BBC social media expert, cited in Schifferes et al., 2014). As one of my PhD dissertation advisors (Clay Shirky) noted a while back, information overflow (Big Data) is due to “Filter Failure”. This is precisely why my colleagues and I are spending so much of our time developing better filters—filters powered by human and machine computing, such as AIDR. These types of filters can scale. BBC journalists on their own do not, unfortunately. But they can act on hunches and intuition based on years of hands-on professional experience.

The “field” of digital information forensics has come a long way since I first wrote about how to verify social media content back in 2011. While I won’t touch on the Handbook’s many other chapters here, the entire report is an absolute must-read for anyone interested and/or working in the verification space. At the very least, have a look at Chapter 9, which combines each chapter’s verification strategies in the form of a simple checklist. Also, Chapter 10 includes a list of tools to aid in the verification process.

In the meantime, I really hope that we end the pointless debate about human versus machine. This is not an either/or issue. As a colleague once noted, what we really need is a way to combine the power of algorithms and the wisdom of the crowd with the instincts of experts.


See also:

  • Predicting the Credibility of Disaster Tweets Automatically [link]
  • Auto-Ranking Credibility of Tweets During Major Events [link]
  • Auto-Identifying Fake Images on Twitter During Disasters [link]
  • Truth in the Age of Social Media: A Big Data Challenge [link]
  • Analyzing Fake Content on Twitter During Boston Bombings [link]
  • How to Verify Crowdsourced Information from Social Media [link]
  • Crowdsourcing Critical Thinking to Verify Social Media [link]

Quantifying Information Flow During Emergencies

I was particularly pleased to see this study appear in the top-tier journal, Nature. (Thanks to my colleague Sarah Vieweg for flagging). Earlier studies have shown that “human communications are both temporally & spatially localized following the onset of emergencies, indicating that social propagation is a primary means to propagate situational awareness.” In this new study, the authors analyze crisis events using country-wide mobile phone data. To this end, they also analyze the communication patterns of mobile phone users outside the affected area. So the question driving this study is this: how do the communication patterns of non-affected mobile phone users differ from those affected? Why ask this question? Understanding the communication patterns of mobile phone users outside the affected areas sheds light on how situational awareness spreads during disasters.

Nature graphs

The graphs above (click to enlarge) simply depict the change in call volume for three crisis events and one non-emergency event for the two types of mobile phone users. The set of users directly affected by a crisis is labeled G0, while users they contact during the emergency are labeled G1. Note that G1 users are not affected by the crisis. Since the study seeks to assess how G1 users change their communication patterns following a crisis, one logical question is this: does the call volume of G1 users increase like that of G0 users? The graphs above reveal that G1 and G0 users have instantaneous and corresponding spikes for crisis events. This is not the case for the non-emergency event.

“As the activity spikes for G0 users for emergency events are both temporally and spatially localized, the communication of G1 users becomes the most important means of spreading situational awareness.” To quantify the reach of situational awareness, the authors study the communication patterns of G1 users after they receive a call or SMS from the affected set of G0 users. They find 3 types of communication patterns for G1 users, as depicted below (click to enlarge).

Nature graphs 2

Pattern 1: G1 users call back G0 users (orange edges). Pattern 2: G1 users call forward to G2 users (purple edges). Pattern 3: G1 users call other G1 users (green edges). Which of these 3 patterns is most pronounced during a crisis? Pattern 1, call backs, constitutes 25% of all G1 communication responses. Pattern 2, call forwards, constitutes 70% of communications. Pattern 3, calls between G1 users, represents only 5% of all communications. This means that the spikes in call volumes shown in the above graphs are overwhelmingly driven by Patterns 1 and 2: call backs and call forwards.
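The three patterns are straightforward to operationalize once the G0 and G1 user sets are known. The sketch below is a hypothetical reconstruction of that bookkeeping, not the study’s actual code:

```python
from collections import Counter

def classify_pattern(caller, callee, g0, g1):
    """Label a follow-up communication originated by a G1 user using
    the study's three patterns: 'callback' (back to an affected G0
    user), 'g1-g1' (to another G1 user), or 'forward' (to a
    previously uninvolved G2 user)."""
    if caller not in g1:
        return None  # only calls made by G1 users are classified
    if callee in g0:
        return "callback"
    if callee in g1:
        return "g1-g1"
    return "forward"

def pattern_shares(calls, g0, g1):
    """Fraction of G1-originated calls falling into each pattern."""
    labels = [classify_pattern(caller, callee, g0, g1)
              for caller, callee in calls]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}
```

Running `pattern_shares` over a country-scale call detail record would reproduce the kind of 25/70/5 breakdown the authors report, assuming the G0 and G1 sets are built from the event’s time and location.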

The graphs below (click to enlarge) show call volumes by communication patterns 1 and 2. In these graphs, Pattern 1 is the orange line and Pattern 2 the dashed purple line. In all three crisis events, Pattern 1 (call backs) has clear volume spikes. “That is, G1 users prefer to interact back with G0 users rather than contacting with new users (G2), a phenomenon that limits the spreading of information.” In effect, Pattern 1 is a measure of reciprocal communications and indeed social capital, “representing correspondence and coordination calls between social neighbors.” In contrast, Pattern 2 measures the “dissemination of situational awareness, corresponding to information cascades that penetrate the underlying social network.”

Nature graphs 3

The histogram below shows average levels of reciprocal communication for the 4 events under study. These results clearly show a spike in reciprocal behavior for the three crisis events compared to the baseline. The opposite is true for the non-emergency event.

Nature graphs 4

In sum, a crisis early warning system based on communication patterns should seek to monitor changes in the following two indicators: (1) Volume of Call Backs; and (2) Deviation of Call Backs from baseline. Given that access to mobile phone data is near-impossible for the vast majority of academics and humanitarian professionals, one question worth exploring is whether similar communication dynamics can be observed on social networks like Twitter and Facebook.
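A minimal version of such an early warning check could compare the current call-back volume against its historical baseline, for example with a z-score. The threshold below is an arbitrary placeholder:

```python
from statistics import mean, stdev

def callback_alert(history, current, threshold=3.0):
    """Flag a potential emergency when the current call-back volume
    deviates from its historical baseline by more than `threshold`
    sample standard deviations."""
    baseline = mean(history)
    spread = stdev(history)  # requires at least two history points
    if spread == 0:
        return current > baseline
    return (current - baseline) / spread > threshold
```

A production system would of course need to handle seasonality (weekday versus weekend calling patterns) and localize the check spatially, since the study shows the spikes are geographically concentrated.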


Inferring International and Internal Migration Patterns from Twitter

My QCRI colleagues Kiran Garimella and Ingmar Weber recently co-authored an important study on migration patterns discerned from Twitter. The study was co-authored with Bogdan State (Stanford) and lead author Emilio Zagheni (CUNY). The authors analyzed 500,000 Twitter users based in OECD countries between May 2011 and April 2013. Since Twitter users are not representative of the OECD population, the study uses a “difference-in-differences” approach to reduce selection bias when estimating in- and out-migration rates for individual countries. The paper is available here and key insights & results are summarized below.

Twitter Migration

To better understand the demographic characteristics of the Twitter users under study, the authors used face recognition software (Face++) to estimate both the gender and age of users based on their profile pictures. “Face++ uses computer vision and data mining techniques applied to a large database of celebrities to generate estimates of age and sex of individuals from their pictures.” The results are depicted below (click to enlarge). Naturally, there is an important degree of uncertainty about estimates for single individuals. “However, when the data is aggregated, as we did in the population pyramid, the uncertainty is substantially reduced, as overestimates and underestimates of age should cancel each other out.” One important limitation is that age estimates may still be biased if users upload younger pictures of themselves, which would result in underestimating the age of the sample population. This is why other methods to infer age (and gender) should also be applied.

Twitter Migration 3

I’m particularly interested in the bias-correction “difference-in-differences” method used in this study, which demonstrates that one can still extract meaningful information about trends even when formal statistical inference is not possible because the underlying data do not constitute a representative sample. Applying this method yields the following results (click to enlarge):

Twitter Migration 2

The above graph reveals a number of interesting insights. For example, one can observe a decline in out-migration rates from Mexico to other countries, which is consistent with recent estimates from Pew Research Center. Meanwhile, in Southern Europe, the results show that out-migration flows continue to increase for countries that were/are hit hard by the economic crisis, like Greece.
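The core of a difference-in-differences correction in this setting can be sketched simply: if the selection bias of Twitter users in each country is roughly constant over time, then subtracting the average change across all countries from each country’s own change leaves a relative trend signal with that bias differenced out. This is my own simplified reading of the approach, not the paper’s full method:

```python
def did_trend_scores(rates_t1, rates_t2):
    """Relative out-migration trends from Twitter-derived rates in
    two periods. Each country's change between periods is compared
    to the average change across all countries; the (assumed
    time-constant) selection bias cancels in the differencing."""
    changes = {country: rates_t2[country] - rates_t1[country]
               for country in rates_t1}
    avg_change = sum(changes.values()) / len(changes)
    return {country: change - avg_change
            for country, change in changes.items()}
```

With invented numbers, a country whose observed rate falls while others rise would get a negative score (a declining trend, as with Mexico), and one whose rate rises faster than average a positive score (as with Greece).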

The results of this study suggest that such methods can be used to “predict turning points in migration trends, which are particularly relevant for migration forecasting.” In addition, the results indicate that “geolocated Twitter data can substantially improve our understanding of the relationships between internal and international migration.” Furthermore, since the study relies on publicly available, real-time data, this approach could also be used to monitor migration trends on an ongoing basis.

To what extent the above is feasible remains to be seen. Very recent mobility data from official statistics are simply not available to more closely calibrate and validate the study’s results. In any event, this study is an important step towards addressing a central question that humanitarian organizations are also asking: how can we make statistical inferences from online data when ground-truth data is unavailable as a reference?

I asked Emilio whether techniques like “difference-in-differences” could be used to monitor forced migration. As he noted, there is typically little to no ground truth data available in humanitarian crises. He thus believes that their approach is potentially relevant to evaluate forced migration. That said, he is quick to caution against making generalizations. Their study focused on OECD countries, which represent relatively large samples and high Internet diffusion, which means low selection bias. In contrast, data samples for humanitarian crises tend to be far smaller and highly selected. This means that filtering out the bias may prove more difficult. I hope that this is a challenge that Emilio and his co-authors choose to take on in the near future.


Yes, I’m Writing a Book (on Digital Humanitarians)

I recently signed a book deal with Taylor & Francis Press. The book, which is tentatively titled “Digital Humanitarians: How Big Data is Changing the Face of Disaster Response,” is slated to be published next year. The book will chart the rise of digital humanitarian response from the Haiti Earthquake to 2015, highlighting critical lessons learned and best practices. To this end, the book will draw on real-world examples of digital humanitarians in action to explain how they use new technologies and crowdsourcing to make sense of “Big (Crisis) Data”. In sum, the book will describe how digital humanitarians & humanitarian technologies are together reshaping the humanitarian space and what this means for the future of disaster response. The purpose of this book is to inspire and inform the next generation of (digital) humanitarians while serving as a guide for established humanitarian organizations & emergency management professionals who wish to take advantage of this transformation in humanitarian response.


The book will thus consolidate critical lessons learned in digital humanitarian response (such as the verification of social media during crises) so that members of the public along with professionals in both international humanitarian response and domestic emergency management can improve their own relief efforts in the face of “Big Data” and rapidly evolving technologies. The book will also be of interest to academics and students who wish to better understand methodological issues around the use of social media and user-generated content for disaster response; or how technology is transforming collective action and how “Big Data” is disrupting humanitarian institutions, for example. Finally, this book will also speak to those who want to make a difference; to those of you who may have little to no experience in humanitarian response but who still wish to help others affected during disasters—even if you happen to be thousands of miles away. You are the next wave of digital humanitarians and this book will explain how you can indeed make a difference.

The book will not be written in a technical or academic writing style. Instead, I’ll be using a more “storytelling” form of writing combined with a conversational tone. This approach is perfectly compatible with the clear documentation of critical lessons emerging from the rapidly evolving digital humanitarian space. This conversational writing style is not at odds with the need to explain the more technical insights being applied to develop next generation humanitarian technologies. Quite the contrary, I’ll be using intuitive examples & metaphors to make the most technical details not only understandable but entertaining.

While this journey is just beginning, I’d like to express my sincere thanks to my mentors for their invaluable feedback on my book proposal. I’d also like to express my deep gratitude to my point of contact at Taylor & Francis Press for championing this book from the get-go. Last but certainly not least, I’d like to sincerely thank the Rockefeller Foundation for providing me with a residency fellowship this Spring in order to accelerate my writing.

I’ll be sure to provide an update when the publication date has been set. In the meantime, many thanks for being an iRevolution reader!


The Best of iRevolution in 2013

iRevolution crossed the 1 million hits mark in 2013, so big thanks to iRevolution readers for spending time here during the past 12 months. This year also saw close to 150 new blog posts published on iRevolution. Here is a short selection of the Top 15 iRevolution posts of 2013:

How to Create Resilience Through Big Data
[Link]

Humanitarianism in the Network Age: Groundbreaking Study
[Link]

Opening Keynote Address at CrisisMappers 2013
[Link]

The Women of Crisis Mapping
[Link]

Data Protection Protocols for Crisis Mapping
[Link]

Launching: SMS Code of Conduct for Disaster Response
[Link]

MicroMappers: Microtasking for Disaster Response
[Link]

AIDR: Artificial Intelligence for Disaster Response
[Link]

Social Media, Disaster Response and the Streetlight Effect
[Link]

Why the Share Economy is Important for Disaster Response
[Link]

Automatically Identifying Fake Images on Twitter During Disasters
[Link]

Why Anonymity is Important for Truth & Trustworthiness Online
[Link]

How Crowdsourced Disaster Response Threatens Chinese Gov
[Link]

Seven Principles for Big Data and Resilience Projects
[Link]

#NoShare: A Personal Twist on Data Privacy
[Link]

I’ll be mostly offline until February 1st, 2014 to spend time with family & friends, and to get started on a new exciting & ambitious project. I’ll be making this project public in January via iRevolution, so stay tuned. In the meantime, wishing iRevolution readers a very Merry Happy Everything!


Video: Humanitarian Response in 2025

I gave a talk on “The future of Humanitarian Response” at UN OCHA’s Global Humanitarian Policy Forum (#aid2025) in New York yesterday. More here for context. A similar version of the talk is available in the video presentation below.

Some of the discussions that ensued during the Forum were frustrating, albeit an important reality check. Some policy makers still think that disaster response is about them and their international humanitarian organizations. They are still under the impression that aid does not arrive until they arrive. And yet, empirical research in the disaster literature points to the fact that the vast majority of lives saved during disasters are the result of local agency, not external intervention.

In my talk (and video above), I note that local communities will increasingly become tech-enabled first responders, thus taking pressure off the international humanitarian system. These tech-savvy local communities already exist. And they already respond to both “natural” (and manmade) disasters as noted in my talk vis-a-vis the information products produced by tech-savvy local Filipino groups. So my point about the rise of tech-enabled self-help was a more diplomatic way of conveying to traditional humanitarian groups that humanitarian response in 2025 will continue to happen with or without them; and perhaps increasingly without them.

This explains why I see OCHA’s Information Management (IM) Team increasingly taking on the role of “Information DJ”, mixing both formal and informal data sources for the purposes of both formal and informal humanitarian response. But OCHA will certainly not be the only DJ in town nor will they be invited to play at all “info events”. So the earlier they learn how to create relevant info mixes, the more likely they’ll still be DJ’ing in 2025.


Combining Radio, SMS and Advanced Computing for Disaster Response

I’m headed to the Philippines this week to collaborate with the UN Office for the Coordination of Humanitarian Affairs (OCHA) on humanitarian crowdsourcing and technology projects. I’ll be based in the OCHA Offices in Manila, working directly with colleagues Andrej Verity and Luis Hernando to support their efforts in response to Typhoon Yolanda. One project I’m exploring in this respect is a novel radio-SMS-computing initiative that my colleague Anahi Ayala (Internews) and I began drafting during ICCM 2013 in Nairobi last week. I’m sharing the approach here to solicit feedback before I land in Manila.


The “Radio + SMS + Computing” project is firmly grounded in GSMA’s official Code of Conduct for the use of SMS in Disaster Response. I have also drawn on the Bellagio Big Data Principles when writing up the ins and outs of this initiative with Anahi. The project is first and foremost a radio-based initiative that seeks to answer the information needs of disaster-affected communities.

The project: Local radio stations in the Philippines would create and broadcast radio programs inviting local communities to serve as “community journalists” to describe how the Typhoon has impacted their communities. The radio stations would provide a free SMS short-code and invite said communities to text in their observations. Each radio station would include in their broadcast a unique 2-letter identifier and would ask those texting in to start their SMS with that identifier. They would also emphasize that text messages should not include any Personal Identifying Information (PII) and no location information either. Those messages that do include PII would be deleted.

Text messages sent to the SMS short code would be automatically triaged by radio station (using the 2-letter identifier) and forwarded to the respective radio stations via SMS. (At this point, few local radio stations have web access in the disaster-affected areas). These radio stations would be funded to create radio programs based on the SMS’s received. These programs would conclude by asking local communities to text in their information needs—again using the unique radio identifier as a prefix in the text messages. Radio stations would create follow-up programs to address the information needs texted in by local communities (“news you can use”). This could be replicated on a weekly basis and extended to the post-disaster reconstruction phase.
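The triage step described above (route by 2-letter prefix, delete messages containing PII) could be sketched as follows. The station identifiers and the phone-number heuristic are placeholders of mine; real PII screening would need to be far more thorough:

```python
import re

# Hypothetical 2-letter radio station identifiers for the pilot.
STATION_IDS = {"DZ", "MN"}

# Crude phone-number pattern standing in for real PII screening.
PII_PATTERN = re.compile(r"\+?\d[\d\s-]{6,}\d")

def triage_sms(text):
    """Route an incoming SMS to its radio station using the 2-letter
    prefix. Messages with no valid identifier, or that appear to
    contain personal identifying information, are dropped (the
    project policy is to delete PII-bearing messages)."""
    prefix, _, body = text.partition(" ")
    station = prefix.strip().upper()
    if station not in STATION_IDS:
        return None
    if PII_PATTERN.search(body):
        return None
    return station, body.strip()
```

Routing on a short prefix keeps the system workable for radio stations without web access, since the triaged messages can be forwarded onward by plain SMS.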


In parallel, the text messages documenting the impact of the Typhoon at the community level would be categorized by Cluster—such as shelter, health, education, etc. Each classified SMS would then be forwarded to the appropriate Cluster Leads. This is where advanced computing comes in: the application of microtasking and machine learning. Trusted Filipino volunteers would be invited to tag each SMS by Cluster-category (and also translate relevant text messages into English). Once enough text messages have been tagged per category, the use of machine learning classifiers would enable the automatic classification of incoming SMS’s. As explained above, these classified SMS’s would then be automatically forwarded to a designated point of contact at each Cluster Agency.
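The microtasking-then-machine-learning pipeline described above is a standard text classification bootstrap: volunteers label SMS's by Cluster until a supervised classifier can take over. A minimal multinomial Naive Bayes over word counts illustrates the idea (the training examples below are invented):

```python
import math
from collections import Counter, defaultdict

class ClusterClassifier:
    """Minimal multinomial Naive Bayes with add-one smoothing,
    sketching how volunteer-tagged SMS's could bootstrap automatic
    Cluster classification (shelter, health, ...)."""

    def train(self, labeled_sms):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in labeled_sms:
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.label_counts[label] += 1
            self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_logprob = None, float("-inf")
        for label, count in self.label_counts.items():
            logprob = math.log(count / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                logprob += math.log((self.word_counts[label][word] + 1) / denom)
            if logprob > best_logprob:
                best_label, best_logprob = label, logprob
        return best_label
```

Once volunteers have tagged enough messages per category, predictions above a confidence threshold could be auto-forwarded to Cluster Leads while low-confidence messages loop back to the volunteers.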

This process would be repeated for SMS’s documenting the information needs of local communities. In other words, information needs would be classified by Cluster category and forwarded to Cluster Leads. The latter would share their responses to stated information needs with the radio stations who in turn would complement their broadcasts with the information provided by the humanitarian community, thus closing the feedback loop.

The radio-SMS project would be strictly opt-in. Radio programs would clearly state that the data sent in via SMS would be fully owned by local communities who could call in or text in at any time to have their SMS deleted. Phone numbers would only be shared with humanitarian organizations if the individuals texting to radio stations consented (via SMS) to their numbers being shared. Inviting communities to act as “citizen journalists” rather than asking them to report their needs may help manage expectations. Radio stations can further manage these expectations during their programs by taking questions from listeners calling in. In addition, the project seeks to limit the number of SMS’s that communities have to send. The greater the amount of information solicited from disaster-affected communities, the more challenging managing expectations may be. The project also makes a point of focusing on local information needs as the primary entry point. Finally, the data collection limits the geographical resolution to the village level for the purposes of data privacy and protection.


It remains to be seen whether this project gets funded, but I’d welcome any feedback iRevolution readers may have in any event since this approach could also be used in future disasters. In the meantime, my QCRI colleagues and I are looking to modify AIDR to automatically classify SMS’s (in addition to tweets). My UNICEF colleagues already expressed to me their need to automatically classify millions of text messages for their U-Report project, so I believe that many other humanitarian and development organizations will benefit from a free and open source platform for automatic SMS classification. At the technical level, this means adding “batch-processing” to AIDR’s current “streaming” feature. We hope to have an update on this in coming weeks. Note that a batch-processing feature will also allow users to upload their own datasets of tweets for automatic classification. 

Opening Keynote Address at CrisisMappers 2013

Welcome to Kenya, or as we say here, Karibu! This is a special ICCM for me. I grew up in Nairobi; in fact our school bus would pass right by the UN every day. So karibu, welcome to this beautiful country (and continent) that has taught me so much about life. Take “Crowdsourcing,” for example. Crowdsourcing is just a new term for the old African saying “It takes a village.” And it took some hard-working villagers to bring us all here. First, my outstanding organizing committee went way, way above and beyond to organize this village gathering. Second, our village of sponsors made it possible for us to invite you all to Nairobi for this Fifth Annual, International Conference of CrisisMappers (ICCM).

I see many new faces, which is really super, so by way of introduction, my name is Patrick and I develop free and open source next generation humanitarian technologies with an outstanding team of scientists at the Qatar Computing Research Institute (QCRI), one of this year’s co-sponsors.

We’ve already had an exciting two-days of pre-conference site visits with our friends from Sisi ni Amani and our co-host Spatial Collective. ICCM participants observed first-hand how GIS, mobile technology and communication projects operate in informal settlements, covering a wide range of topics that include governance, civic education and peacebuilding. In addition, our friend Heather Leson from the Open Knowledge Foundation (OKF) coordinated an excellent set of trainings at the iHub yesterday. So a big thank you to Heather, Sisi ni Amani and Spatial Collective for these outstanding pre-conference events.

This is my 5th year giving opening remarks at ICCM, so some of you will know from previous years that I often take this moment to reflect on the past 12 months. But reflecting on just the past 12 days alone would require its own separate ICCM. I’m referring, of course, to the humanitarian and digital humanitarian response to the devastating Typhoon in the Philippines. This response, which is still ongoing, is unparalleled in terms of the level of collaboration between members of the Digital Humanitarian Network (DHN) and formal humanitarian organizations like UN OCHA and WFP. All of these organizations, both formal and digital, are also members of the CrisisMappers Network.

The Digital Humanitarian Network, or DHN, serves as the official interface between formal humanitarian organizations and global networks of tech-savvy digital volunteers. These digital volunteers provide humanitarian organizations with the skills and surge capacity they often need to make timely sense of “Big (Crisis) Data” during major disasters. By Big Crisis Data, I mean social media content and satellite imagery, for example. The overflow of information generated during disasters can be as paralyzing to humanitarian response as the absence of information. And making sense of this overflow in response to Yolanda has required all hands on deck—i.e., an unprecedented level of collaboration between many members of the DHN.

So I’d like to share with you two initial observations from this digital humanitarian response to Yolanda; just two points that may be signs of things to come: local digital villages and World Wide (good) Will.

First, there were numerous local digital humanitarians on the ground in the Philippines. These digitally-savvy Filipinos were rapidly self-organizing and launching crisis maps well before any of us outside the Philippines had time to blink. One such group is Rappler, for example.

We (the DHN) reached out to them early on, sharing both our data and volunteers. Remember that “Crowdsourcing” is just a new word for the old African saying that “it takes a village…” and sometimes, it takes a digital village to support humanitarian efforts on the ground. And Rappler is hardly the only local digital community that mobilized in response to Yolanda; there are dozens of digital villages spearheading similar initiatives across the country.

The rise of local digital villages means that the distant future (or maybe not too distant future) of humanitarian operations may become less about the formal “brick-and-mortar” humanitarian organizations and, yes, also less about the Digital Humanitarian Network. Disaster response is, and always has been, about local communities self-organizing, and now about local digital communities self-organizing. The majority of lives saved during disasters are attributed to this local agency, not to international, external relief. Furthermore, these local digital villages are increasingly the source of humanitarian innovation, so we should pay close attention; we have a lot to learn from these digital villages. Naturally, they too are learning a lot from us.

The second point that struck me occurred when the Standby Volunteer Task Force (SBTF) completed its deployment of MicroMappers on behalf of OCHA. The response from several SBTF volunteers was rather pointed—some were disappointed that the deployment had closed; others were downright upset. What happened next was very interesting; you see, these volunteers simply kept going: they used (hacked) the SBTF Skype Chat for Yolanda (which already had over 160 members) to self-organize and support other digital humanitarian efforts that were still ongoing. So the SBTF Team sent an email to its 1,000+ volunteers with the following subject header: “Closing Yolanda Deployment, Opening Other Opportunities!”

The email provided a list of the most promising ongoing digital volunteer opportunities for the Typhoon response and encouraged volunteers to support whatever efforts they were most drawn to. This second point reveals that a “World Wide (good) Will” exists. People care. This is good! Until recently, when disasters struck in faraway lands, we would watch the news on television wishing we could somehow help. That private wish—that innate human emotion—would perhaps translate into a donation. Today, not only can you donate cash to support those affected by disasters, you can also donate a few minutes of your time to support the relief efforts on the ground thanks to new humanitarian technologies and platforms. In other words, you, me, all of us can now translate our private wishes into direct, online public action, which can support those working in disaster-affected areas, including local digital villages.

This surge of World Wide (good) Will explains why SBTF volunteers wanted to continue volunteering for as long as they wished, even after our formal digital humanitarian network had phased out operations. And this is beautiful. We should not seek to limit or control this global goodwill or play the professional-versus-amateur card too quickly. Besides, who are we kidding? We couldn’t control this flood of goodwill even if we wanted to. But we can embrace this goodwill and channel it. People care; they want to offer their time to help others thousands of miles away. This is beautiful and the kind of world I want to live in. To paraphrase the philosopher Hannah Arendt, the greatest harm in the world is caused not by evil but by apathy. So we should cherish the digital goodwill that springs up during disasters. This spring is the digital equivalent of mutual aid, of self-help. The global village of digital Good Samaritans is growing.

At the same time, this goodwill, this precious human emotion and the precious time it freely offers, can cause more harm than good if it is not channeled responsibly. When international volunteers pour into disaster areas wanting to help, their goodwill can have the opposite effect, especially when they are inexperienced. This is also true of digital volunteers flooding in to help online.

We in the CrisisMappers community have the luxury of having learned a lot about digital humanitarian response since the Haiti Earthquake; we have learned important lessons about data privacy and protection, codes of conduct, the critical information needs of humanitarian organizations and disaster-affected populations, standard operating procedures, and so on. Indeed, we now (for the first time) have data protection protocols that address crowdsourcing, social media and digital volunteers, thanks to our colleagues at the ICRC. We also have an official code of conduct on the use of SMS for disaster response, thanks to our colleagues at the GSMA. This year’s World Disasters Report (WDR 2013) also emphasizes the responsible use of next generation humanitarian technologies and the crisis data they manage.

Now, this doesn’t mean that we, the formal (digital) humanitarian sector, have figured it all out—far from it. It simply means that we’ve learned a few important and difficult lessons along the way. Unlike newcomers to the digital humanitarian space, we have the benefit of several years of hard experience to draw on when deploying for disasters like Typhoon Yolanda. While sharing these lessons and disseminating them as widely as possible is obviously a must, it is simply not good enough. Guidebooks and guidelines just won’t cut it. We also need to channel the global spring of digital goodwill and distribute it to avoid “flash floods” of goodwill. So what might these goodwill channels look like? Well, they already exist in the form of the Digital Humanitarian Network—more specifically, the members of the DHN.

These are the channels that focus digital goodwill in support of the humanitarian organizations that physically deploy to disasters. These channels operate using best practices, codes of conduct, protocols, etc., and can be held accountable. At the same time, however, these channels also block the upsurge of goodwill from new digital volunteers—those outside our digital villages. How? Our channels block this World Wide (good) Will by requiring technical expertise to engage with us and/or by requiring an inordinate time commitment. So we should not be surprised if the “World Wide (Good) Will” circumvents our channels altogether, and in so doing causes more harm than good during disasters. Our channels are blocking their engagement and preventing them from joining our digital villages. Clearly we need different channels to focus the World Wide (Good) Will.

Our friends at Humanitarian OpenStreetMap already figured this out two years ago when they set up their microtasking server, making it easier for less tech-savvy volunteers to engage. We need to democratize our humanitarian technologies to responsibly channel the huge surplus of global goodwill that exists online. This explains why my team and I at QCRI are developing MicroMappers, and why we deployed the platform in response to OCHA’s request within hours of Typhoon Yolanda making landfall in the Philippines.

This digital humanitarian operation was definitely far from perfect, but it was super simple to use and channeled 208 hours of global goodwill in just a matter of days. Those are 208 hours that did not cause harm. We had volunteers from dozens of countries around the world, of all ages and walks of life, offering their time on MicroMappers. OCHA, which had requested this support, channeled the resulting data to their teams on the ground in the Philippines.

These digital volunteers all cared and took the time to try and help others thousands of miles away. The same is true of the remarkable digital volunteers supporting the Humanitarian OpenStreetMap efforts. This is the kind of world I want to live in: a world in which humanitarian technologies harvest the global goodwill and channel it to make a difference to those affected by disasters.

So these are the two important trends I see moving forward: the rise of well-organized local digital humanitarian groups, like Rappler, and the rise of World Wide (Good) Will. We must learn from the former, from the local digital villages, and when asked, we should support them as best we can. We should also channel, even amplify, the World Wide (Good) Will by democratizing humanitarian technologies and embracing new ways to engage those who want to make a difference. Again, Crowdsourcing is simply a new term for the old African proverb that it takes a village. Let us not close the doors to that village.

So on this note, I thank *you* for participating in ICCM and for being a global village that cares, both on and offline. Big thanks as well to our current team of sponsors for caring about this community and making sure that our village does continue to meet in person every year. And now for the next 3 days, we have an amazing line-up of speakers, panelists & technologies for you. So please use these days to plot, partner and disrupt. And always remember: be tough on ideas, but gentle on people.

Thanks again, and keep caring.

#Westgate Tweets: A Detailed Study in Information Forensics

My team and I at QCRI have just completed a detailed analysis of the 13,200+ tweets posted from one hour before the attacks began until two hours into the attack. The purpose of this study, which will be launched at CrisisMappers 2013 in Nairobi tomorrow, is to make sense of the Big (Crisis) Data generated during the first hours of the siege. A summary of our results is displayed below. The full results of our analysis and a discussion of findings are available as a GoogleDoc and as a PDF. The purpose of this public GoogleDoc is to solicit comments on our methodology so as to inform the next phase of our research. Indeed, our aim is to categorize and study the entire Westgate dataset (730,000+ tweets) in the coming months. In the meantime, sincere appreciation goes to my outstanding QCRI Research Assistants, Ms. Brittany Card and Ms. Justine MacKinnon, for their hard work on the coding and analysis of the 13,200+ tweets. Our study builds on this preliminary review.

The following 7 figures summarize the main findings of our study. These are discussed in more detail in the GoogleDoc/PDF.

Figure 1: Who Authored the Most Tweets?

Figure 2: Frequency of Tweets by Eyewitnesses Over Time

Figure 3: Who Were the Tweets Directed At?

Figure 4: What Content Did Tweets Contain?

Figure 5: What Terms Were Used to Reference the Attackers?

Figure 6: What Terms Were Used to Reference Attackers Over Time?

Figure 7: What Kind of Multimedia Content Was Shared?
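As a rough illustration of the temporal aggregation behind a figure like Figure 2, here is a minimal sketch that counts human-coded eyewitness tweets in fixed time bins. The timestamps, category labels, and bin size are invented for illustration and are not taken from our actual Westgate dataset or coding scheme.

```python
# Hypothetical sketch: binning coded tweets by time to chart the
# frequency of eyewitness tweets. All data below is illustrative.
from collections import Counter
from datetime import datetime, timedelta

# (timestamp, source category) pairs, as a human coder might tag them
coded_tweets = [
    (datetime(2013, 9, 21, 12, 5), "eyewitness"),
    (datetime(2013, 9, 21, 12, 12), "media"),
    (datetime(2013, 9, 21, 12, 40), "eyewitness"),
    (datetime(2013, 9, 21, 13, 15), "eyewitness"),
    (datetime(2013, 9, 21, 13, 50), "government"),
]

start = datetime(2013, 9, 21, 12, 0)
bin_size = timedelta(minutes=30)

# Count eyewitness tweets per 30-minute bin since the start time
bins = Counter(
    int((ts - start) / bin_size)
    for ts, category in coded_tweets
    if category == "eyewitness"
)
for b in sorted(bins):
    print(f"{start + b * bin_size:%H:%M}  {bins[b]} eyewitness tweet(s)")
```

The same counting pattern, applied per category rather than per time bin, would produce the author and content breakdowns in the other figures.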