Data Science for 100 Resilient Cities

The Rockefeller Foundation recently launched a major international initiative called “100 Resilient Cities.” The motivation behind this global project stems from the recognition that cities are facing increasing stresses driven by the unprecedented pace of urbanization. More than 75% of people are expected to live in cities by 2050. The Foundation is thus rightly concerned: “As natural and man-made shocks and stresses grow in frequency, impact and scale, with the ability to ripple across systems and geographies, cities are largely unprepared to respond to, withstand, and bounce back from disasters” (1).

Resilience is the capacity to self-organize, and smart self-organization requires social capital and robust feedback loops. I’ve discussed these issues and related linkages at length in the posts listed below and so shan’t repeat myself here.

  • How to Create Resilience Through Big Data [link]
  • On Technology and Building Resilient Societies [link]
  • Using Social Media to Predict Disaster Resilience [link]
  • Social Media = Social Capital = Disaster Resilience? [link]
  • Does Social Capital Drive Disaster Resilience? [link]
  • Failing Gracefully in Complex Systems: A Note on Resilience [link]

Instead, I want to make a case for community-driven “tactical resilience” aided (not controlled) by data science. I came across the term “tactical urbanism” whilst at “The City Resilient” conference co-organized by PopTech & Rockefeller in June. Tactical urbanism refers to small and temporary projects that demonstrate what could be. We also need people-centered tactical resilience initiatives to show small-scale resilience in action and demonstrate what these could mean at scale. Data science can play an important role in formulating and implementing tactical resilience interventions and in demonstrating their resulting impact at various scales.

Ultimately, if tactical resilience projects do not increase local capacity for smart and scalable self-organization, then they may not render cities more resilient. “Smart Cities” should mean “Resilient Neighborhoods” but the former concept takes a mostly top-down approach focused on the physical layer while the latter recognizes the importance of social capital and self-organization at the neighborhood level. “Indeed, neighborhoods have an impact on a surprisingly wide variety of outcomes, including child health, high-school graduation, teen births, adult mortality, social disorder and even IQ scores” (1).

So just like IBM is driving the data science behind their Smart Cities initiatives, I believe Rockefeller’s 100 Resilient Cities grantees would benefit from similar data science support and expertise but at the tactical and neighborhood level. This explains why my team and I plan to launch a Data Science for Resilience Program at the Qatar Foundation’s Computing Research Institute (QCRI). This program will focus on providing data science support to promising “tactical resilience” projects related to Rockefeller’s 100 Resilient Cities initiative.

The initial springboard for these conversations will be the PopTech & Rockefeller Fellows Program on “Community Resilience Through Big Data and Technology”. I’m really honored and excited to have been selected as one of the PopTech and Rockefeller Fellows to explore the intersections of Big Data, Technology and Resilience. As mentioned to the organizers, one of my objectives during this two-week brainstorming session is to produce a joint set of “tactical resilience” project proposals with well-articulated research questions. My plan is to select the strongest questions and make them the basis for our initial data science for resilience research at QCRI.


Data Science for Social Good: Not Cognitive Surplus but Cognitive Mismatch

I’ve spent the past 12 months working with top notch data scientists at QCRI et al. The following may thus be biased: I think QCRI got it right. They strive to balance their commitment to positive social change with their primary mission of becoming a world class institute for advanced computing research. The two are not mutually exclusive. What it takes is a dedicated position, like the one created for me at QCRI. It is high time that other research institutes, academic programs and international computing conferences create comparable focal points to catalyze data science for social good.

Microsoft Research, to name just one company, carries out very interesting research that could have tremendous social impact, but the bridge necessary to transfer much of that research from knowledge to operation to social impact is often not there. And when it is, it is usually by happenstance. So researchers continue to formulate research questions based on what they find interesting rather than identifying equally interesting questions that could have direct social impact if answered by data science. Hundreds of papers get presented at computing conferences every month, and yet few if any of the authors have linked up with organizations like the United Nations, World Bank, Habitat for Humanity etc., to identify and answer questions with social good potential. The same is true for hundreds of computing dissertations that get defended every year. Doctoral students do not realize that a minor reformulation of their research question could perhaps make a world of difference to a community-based organization in India dedicated to fighting corruption, for example.

Cognitive Mismatch

The challenge here is not one of untapped cognitive surplus (to borrow from Clay Shirky), but rather complete cognitive mismatch. As my QCRI colleague Ihab Ilyas puts it: there are “problem owners” on the one hand and “problem solvers” on the other. The former have problems that prevent them from catalyzing positive social change. The latter know how to solve comparable problems and do so every day. But the two are not talking or even aware of each other. Creating and maintaining this two-way conversation requires more than one dedicated position (like mine at QCRI).

In short, I really want to have dedicated counterparts at Microsoft Research, IBM, SAP, LinkedIn, Bitly, GNIP, etc., as well as leading universities, top notch computing conferences and challenges; counterparts who have one foot in the world of data science and the other in the social sector; individuals who have a demonstrated track-record in bridging communities. There’s a community here waiting to be connected and needing to be formed. Again, carrying out cutting edge computing R&D is in no way incompatible with generating positive social impact. Moreover, the latter provides an important return on investment in the form of data, reputation, publicity, connections and social capital. In sum, social good challenges need to be formulated into research questions that have scientific as well as social good value. There is definitely a sweet spot here but it takes a dedicated community to bring problem owners and solvers together and hit that social good sweet spot.


Social Media for Emergency Management: Question of Supply and Demand

I’m always amazed by folks who dismiss the value of social media for emergency management based on the perception that said content is useless for disaster response. In that case, libraries are also useless (bar the few books you’re looking for, but those rarely represent more than 1% of all the books available in a major library). Does that mean libraries are useless? Of course not. Is social media useless for disaster response? Of course not. Even if only 0.01% of the 20+ million tweets posted during Hurricane Sandy were useful, and only half of these were accurate, this would still mean over 1,000 real-time and informative tweets, or some 15,000 words, i.e., the equivalent of a 25-page, single-spaced document exclusively composed of fully relevant, actionable & timely disaster information.
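The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a rough sketch: the useful share of 0.01% (one tweet in ten thousand) is the rate that reproduces the 1,000-tweet figure, while the 15 words per tweet and 600 words per single-spaced page are my own assumptions, chosen because they are consistent with the 15,000 words and 25 pages quoted.

```python
def useful_tweet_volume(total_tweets, useful_share, accurate_share,
                        words_per_tweet=15, words_per_page=600):
    """Return (tweets, words, pages) left after filtering for useful, accurate tweets.

    words_per_tweet and words_per_page are illustrative assumptions.
    """
    tweets = total_tweets * useful_share * accurate_share
    words = tweets * words_per_tweet
    pages = words / words_per_page
    return tweets, words, pages


# 20 million tweets, 0.01% useful, half of those accurate.
tweets, words, pages = useful_tweet_volume(20_000_000, 0.0001, 0.5)
print(round(tweets), round(words), round(pages))  # prints: 1000 15000 25
```

Even with a share ten times smaller, the same calculation would still leave roughly 100 informative tweets.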


Empirical studies clearly prove that social media reports can be informative for disaster response. Numerous case studies have also described how social media has saved lives during crises. That said, if emergency responders do not actively or explicitly create demand for relevant and high quality social media content during crises, then why should supply follow? If the 911 emergency number (999 in the UK) were never advertised, then would anyone call? If 911 were simply a voicemail inbox with no instructions, would callers know what type of actionable information to relay after the beep?

While the majority of emergency management centers do not create the demand for crowdsourced crisis information, members of the public are increasingly demanding that said responders monitor social media for “emergency posts”. But most responders fear that opening up social media as a crisis communication channel with the public will result in an unmanageable flood of requests. The London Fire Brigade seems to think otherwise, however. So let’s carefully unpack the fear of information flooding.

First of all, New York City’s 911 operators receive over 10 million calls every year that are accidental, false or hoaxes. Does this mean we should abolish the 911 system? Of course not. Now, assuming that 10% of these calls take an operator 10 seconds each to manage, this represents close to 3,000 hours or 115 days’ worth of “wasted work”. But this filtering is absolutely critical and requires human intervention. In contrast, “emergency posts” published on social media can be automatically filtered and triaged thanks to Big Data Analytics and Social Computing, which could save operators time. The Digital Operations Center at the American Red Cross is currently exploring this automated filtering approach. Moreover, just as it is illegal to report false emergency information to 911, there’s no reason why the same laws could not apply to social media when these communication channels are used for emergency purposes.
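For readers who want to verify the “wasted work” figure, here is a minimal sketch of the arithmetic. The function name is purely illustrative; the inputs are the numbers quoted above (10 million calls, 10% of them, 10 seconds each).

```python
def wasted_operator_time(total_calls, share_handled, seconds_per_call):
    """Return (hours, days) of operator time spent on the given share of calls."""
    seconds = total_calls * share_handled * seconds_per_call
    hours = seconds / 3600
    days = hours / 24
    return hours, days


hours, days = wasted_operator_time(10_000_000, 0.10, 10)
print(f"{hours:.0f} hours, {days:.1f} days")  # prints: 2778 hours, 115.7 days
```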

Second, if individuals prefer to share disaster-related information and/or needs via social media, this means they are less likely to call in as well. In other words, double reporting is unlikely to occur and could also be discouraged and/or penalized. As a result, the volume of emergency reports from “the crowd” need not increase substantially after all. Those who use the phone to report an emergency today may in the future opt for social media instead. The only significant change here is the ease of reporting for the person in need. Again, the question is one of supply and demand. Even if relevant emergency posts were to increase without a comparable fall in calls, this would simply reveal that the current voice-based system creates a barrier to reporting that discriminates against certain users in need.

Third, not all emergency calls/posts require immediate response by a paid professional with 10+ years of experience. In other words, the various types of needs can be triaged and responded to accordingly. As part of their police training or internships, new cadets could be tasked to respond to less serious needs, leaving the more seasoned professionals to focus on the more difficult situations. While this approach certainly has some limitations in the context of 911, these same limitations are far less pronounced for disaster response efforts in which most needs are met locally by the affected communities themselves anyway. In fact, the Filipino government actively promotes the use of social media reporting and crisis hashtags to crowdsource disaster response.

In sum, if disaster responders and emergency management professionals are not content with the quality of crisis reporting found on social media, then they should do something about it by implementing the appropriate policies to create the demand for higher quality and more structured reporting. The first emergency telephone service was launched in London some 80 years ago in response to a devastating fire. At the time, the idea of using a phone to report emergencies was controversial. Today, the London Fire Brigade is paving the way forward by introducing Twitter as a reporting channel. This move may seem controversial to some today, but give it a few years and people will look back and ask what took us so long to adopt new social media channels for crisis reporting.


Data Science for Social Good and Humanitarian Action

My (new) colleagues at the University of Chicago recently launched a new and exciting program called “Data Science for Social Good”. The program, which starts this summer, will bring together dozens of top-notch data scientists, computer scientists and social scientists to address major social challenges. Advisors for this initiative include Eric Schmidt (Google), Rayid Ghani (Obama Administration) and my very likable colleague Jake Porway (DataKind). Think of “Data Science for Social Good” as a “Code for America” but broader in scope and application. I’m excited to announce that QCRI is looking to collaborate with this important new program given the strong overlap with our Social Innovation Vision, Strategy and Projects.

My team and I at QCRI are hoping to mentor and engage fellows throughout the summer on key humanitarian & development projects we are working on in partnership with the United Nations, Red Cross, World Bank and others. This would provide fellows with the opportunity to engage in “real world” challenges that directly match their expertise and interests. In addition, we (QCRI) are hoping to replicate this type of program in Qatar in January 2014.

Why January? This will give us enough time to design the new program based on the results of this summer’s experiment. More importantly, perhaps, it will be freezing in Chicago ; ) and wonderfully warm in Doha. Plus January is an easier time for many students and professionals to take “time off”. The fellows program will likely be 3 weeks in duration (rather than 3 months) and will focus on applying data science to promote social good projects in the Arab World and beyond. Mentors will include top Data Scientists from QCRI and hopefully the University of Chicago. We hope to create 10 fellowship positions for this Data Science for Social Good program. The call for said applications will go out this summer, so stay tuned for an update.


Data Protection Protocols for Crisis Mapping

The day after the CrisisMappers 2011 Conference in Geneva, my colleague Phoebe Wynn-Pope organized and facilitated the most important workshop I attended that year. She brought together a small group of seasoned crisis mappers and experts in protection standards. The workshop concluded with a pressing action item: update the International Committee of the Red Cross’s (ICRC) Professional Standards for Protection Work in order to provide digital humanitarians with expert guidance on protection standards for humanitarianism in the network age.

My colleague Anahi Ayala and I were invited to provide feedback on the new 20+ page chapter specifically dedicated to data management and new technologies. We added many, many comments and suggestions on the draft. The full report is available here (PDF). Today, thanks to ICRC, I am in Switzerland to give a Keynote on Next Generation Humanitarian Technology for the official launch of the report. The purpose of this blog post is to list the protection protocols that relate most directly to Crisis Mapping & Digital Humanitarian Response; and to problematize some of these protocols.

The Protocols

In the preface of the ICRC’s 2013 Edition of the Professional Standards for Protection Work, the report lists three reasons for the updated edition. The first has to do with new technologies:

In light of the rapidly proliferating initiatives to make new uses of information technology for protection purposes, such as satellite imagery, crisis mapping and publicizing abuses and violations through social media, the advisory group agreed to review the scope and language of the standards on managing sensitive information. The revised standards reflect the experiences and good practices of humanitarian and human rights organizations as well as of information & communication technology actors.

The new and most relevant protection standards relating—or applicable to—digital humanitarians are listed below (indented text) together with commentary.

Protection actors must only collect information on abuses and violations when necessary for the design or implementation of protection activities. It may not be used for other purposes without additional consent.

A number of Digital Humanitarian Networks such as the Standby Volunteer Task Force (SBTF) only collect crisis information specifically requested by the “Activating Organization,” such as the UN Office for the Coordination of Humanitarian Affairs (OCHA). Volunteer networks like the SBTF are not “protection actors” but rather provide direct support to humanitarian organizations when the latter meet the SBTF’s activation criteria. In terms of what type of information the SBTF collects, again it is the Activating Organization that decides this, not the SBTF. For example, the Libya Crisis Map launched by the SBTF at the request of OCHA displayed categories of information that were decided by the UN team in Geneva.

Protection actors must collect and handle information containing personal details in accordance with the rules and principles of international law and other relevant regional or national laws on individual data protection.

These international, regional and national rules, principles and laws need to be made available to Digital Humanitarians in a concise, accessible and clear format. Such a resource is still missing.

Protection actors seeking information bear the responsibility to assess threats to the persons providing information, and to take necessary measures to avoid negative consequences for those from whom they are seeking information.

Protection actors setting up systematic information collection through the Internet or other media must analyse the different potential risks linked to the collection, sharing or public display of the information and adapt the way they collect, manage and publicly release the information accordingly.

Interestingly, when OCHA activated the SBTF in response to the Libya Crisis, it was the SBTF, not the UN, that took the initiative to formulate a Threat and Risks Mitigation Strategy that was subsequently approved by the UN. Furthermore, unlike other digital humanitarian networks, the Standby Task Force’s “Prime Directive” is to not interact with the crisis-affected population. Why? Precisely to minimize the risk to those voluntarily sharing information on social media.

Protection actors must determine the scope, level of precision and depth of detail of the information collection process, in relation to the intended use of the information collected.

Again, this is determined by the protection actor activating a digital humanitarian network like the SBTF.

Protection actors should systematically review the information collected in order to confirm that it is reliable, accurate, and updated.

The SBTF has a dedicated Verification Team that strives to do this. The verification of crowdsourced, user-generated content posted on social media during crises is no small task. But the BBC’s User-Generated Content (UGC) Hub has been doing just this for 8 years. Meanwhile, new strategies and technologies are under development to facilitate the rapid verification of such content. Also, the ICRC report notes that “Combining and cross-checking such [crowdsourced] information with other sources, including information collected directly from communities and individuals affected, is becoming standard good practice.”

Protection actors should be explicit as to the level of reliability and accuracy of information they use or share.

Networks like the SBTF make explicit whether a report published on a crisis map has been verified or not. If the latter, the report is clearly marked as “Unverified”. There are more nuanced ways to do this, however. I have recently given feedback on some exciting new research that is looking to quantify the probable veracity of user-generated content.

Protection actors must gather and subsequently process protection information in an objective and impartial manner, to avoid discrimination. They must identify and minimize bias that may affect information collection.

Objective, impartial, non-discriminatory and unbiased information is often more a fantasy than reality even with traditional data. Meeting these requirements in a conflict zone can be prohibitively expensive, overly time consuming and/or downright dangerous. This explains why advanced statistical methods dedicated to correcting biases exist. These can and have been applied to conflict and human rights data. They can also be applied to user-generated content on social media to the extent that the underlying demographic & census-based information is available.

To place this into context, Harvard University Professor Gary King reminded me that the vast majority of medical data is not representative either. Nor is the vast majority of crime data. Does that render these datasets void? Of course not. Please see this post on Demystifying Crowdsourcing: An Introduction to Non-Probability Sampling.
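To make the idea of bias correction a little more concrete, here is a minimal sketch of post-stratification, one simple reweighting technique among many. I am not claiming this is the method used in the conflict and human rights studies mentioned above. All group names and numbers are made up for illustration: the sample over-represents urban users, and census-style population shares are used to reweight the per-group results.

```python
def poststratify(group_means, sample_counts, population_shares):
    """Compare a naive sample average with a post-stratified estimate.

    group_means: observed mean per group (e.g. share of reports citing water shortages)
    sample_counts: number of sampled reports per group
    population_shares: known share of each group in the population (sums to 1)
    """
    total = sum(sample_counts.values())
    # Naive estimate: every report counts equally, so over-sampled groups dominate.
    naive = sum(group_means[g] * sample_counts[g] for g in sample_counts) / total
    # Post-stratified estimate: each group counts according to its population share.
    weighted = sum(group_means[g] * population_shares[g] for g in population_shares)
    return naive, weighted


# Hypothetical data: 90% of sampled reports come from urban users,
# but the population is split evenly between urban and rural areas.
group_means = {"urban": 0.2, "rural": 0.6}
sample_counts = {"urban": 900, "rural": 100}
population_shares = {"urban": 0.5, "rural": 0.5}

naive, weighted = poststratify(group_means, sample_counts, population_shares)
print(f"naive: {naive:.2f}, reweighted: {weighted:.2f}")  # prints: naive: 0.24, reweighted: 0.40
```

The correction is only as good as the population shares fed into it, which is why the availability of demographic and census data matters so much.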

Security safeguards appropriate to the sensitivity of the information must be in place prior to any collection of information, to ensure protection from loss or theft, unauthorized access, disclosure, copying, use or modification, in any format in which it is kept.

One of the popular mapping technologies used by digital humanitarian networks is the Ushahidi platform. When the SBTF learned in 2012 that security holes had still not been patched almost a year after reporting them to Ushahidi Inc., the SBTF Core Team made an executive decision to avoid using Ushahidi technology whenever possible given that the platform could be easily hacked. (Just last month, a colleague of mine who is not a techie but a UN practitioner was able to scrape Ushahidi’s entire Kenya election monitoring data from March 2013, which included some personally identifying information.) The SBTF has thus been exploring work-arounds and is looking to make greater use of GeoFeedia and Google’s new mapping technology, Stratomap, in future crisis mapping operations.

Protection actors must integrate the notion of informed consent when calling upon the general public, or members of a community, to spontaneously send them information through SMS, an open Internet platform, or any other means of communication, or when using information already available on the Internet.

This is perhaps the most problematic but important protection protocol as far as digital humanitarian work is concerned. While informed consent is absolutely critical, the vast majority of crowdsourced content displayed on crisis maps is user-generated and voluntarily shared on social media. The very act of communicating with these individuals to request their consent not only runs the risk of endangering these individuals but also violates the SBTF’s Prime Directive for the exact same reason. Moreover, interacting with crisis-affected communities may raise expectations of response that digital humanitarians are simply not in a position to guarantee. In situations of armed conflict and other situations of violence, conducting individual interviews can put people at risk not only because of the sensitive nature of the information collected, but because mere participation in the process can cause these people to be stigmatized or targeted.

That said, the ICRC does recognize that, “When such consent cannot be realistically obtained, information allowing the identification of victims or witnesses, should only be relayed in the public domain if the expected protection outcome clearly outweighs the risks. In case of doubt, displaying only aggregated data, with no individual markers, is strongly recommended.”
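As a rough illustration of what “aggregated data, with no individual markers” could look like in practice, the sketch below bins report coordinates into coarse grid cells and only keeps the per-cell counts. The cell size, the minimum-count threshold and the coordinates are all my own assumptions for illustration; the ICRC report does not prescribe any of them.

```python
from collections import Counter
import math


def aggregate_reports(reports, cell_size_deg=0.5, min_count=3):
    """Count reports per grid cell and drop cells with fewer than min_count reports.

    reports: iterable of (latitude, longitude) pairs
    cell_size_deg and min_count are illustrative assumptions, not ICRC guidance.
    """
    counts = Counter()
    for lat, lon in reports:
        # Snap each point to the south-west corner of its grid cell.
        cell = (math.floor(lat / cell_size_deg) * cell_size_deg,
                math.floor(lon / cell_size_deg) * cell_size_deg)
        counts[cell] += 1
    # Suppress sparsely populated cells, where a count could point to an individual.
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Made-up coordinates: three reports fall in one cell, one report is isolated.
reports = [(32.88, 13.19), (32.90, 13.21), (32.75, 13.10), (30.10, 20.05)]
print(aggregate_reports(reports))  # prints: {(32.5, 13.0): 3}
```

The isolated report disappears from the output entirely, which is the point: a map built from these counts shows trends without exposing any single source.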

Protection actors should, to the degree possible, keep victims or communities having transmitted information on abuses and violations informed of the action they have taken on their behalf – and of the ensuing results. Protection actors using information provided by individuals should remain alert to any negative repercussions on the individuals or communities concerned, owing to the actions they have taken, and take measures to mitigate these repercussions.

Part of this protocol is problematic for the same reason as the above protocol. The very act of communicating with victims could place them in harm’s way. As far as staying alert to any negative repercussions, I believe the more seasoned digital humanitarian networks make this one of their top priorities.

When handling confidential and sensitive information on abuses and violations, protection actors should endeavor when appropriate and feasible, to share aggregated data on the trends they observed.

The purpose of the SBTF’s Analysis Team is precisely to serve this function.

Protection actors should establish formal procedures on the information handling process, from collection to exchange, archiving or destruction.

Formal procedures to archive & destroy crowdsourced crisis information are largely lacking. Moving forward, the SBTF will defer this responsibility to the Activating Organization.


In conclusion, the ICRC notes that, “When it comes to protection, crowdsourcing can be an extremely efficient way to collect data on ongoing violence and abuses and/or their effects on individuals and communities. Made possible by the wide availability of Internet or SMS in countries affected by violence, crowdsourcing has rapidly gained traction.” To this end,

Although the need for caution is a central message [in the ICRC report], it should in no way be interpreted as a call to avoid sharing information. On the contrary, when the disclosing of protection information is thought to be of benefit to the individuals and communities concerned, it should be shared, as appropriate, with local, regional or national authorities, UN peacekeeping operations, other protection actors, and last but not least with service providers.

This is in line with the conclusions reached by OCHA’s landmark report, which notes that “Concern over the protection of information and data is not a sufficient reason to avoid using new communications technologies in emergencies, but it must be taken into account.” And so, “Whereas the first exercises were conducted without clear procedures to assess and to subsequently limit the risks faced by individuals who participated or who were named, the groups engaged in crisis mapping efforts over the years have become increasingly sensitive to the need to identify & manage these risks” (ICRC 2013).

It is worth recalling that the vast majority of the groups engaged in crisis mapping efforts, such as the SBTF, are first and foremost volunteers who are not only continuing to offer their time, skills and services for free, but are also taking it upon themselves to actively manage the risks involved in crisis mapping, risks that they, perhaps better than anyone else, understand and worry about the most because they are after all at the frontlines of these digital humanitarian efforts. And they do this all on a grand operational budget of $0 (as far as the SBTF goes). And yet, these volunteers continue to mobilize at the request of international humanitarian organizations and are always looking to learn, improve and do better. They continue to change the world, one map at a time.

I have organized a CrisisMappers Webinar on April 17, 2013, featuring presentations and remarks by the lead authors of the new ICRC report. Please join the CrisisMappers list-serve for more information.


See also:

  • SMS Code of Conduct for Disaster Response (Link)
  • Humanitarian Accountability Handbook (PDF)

A Research Framework for Next Generation Humanitarian Technology and Innovation

Humanitarian donors and organizations are increasingly championing innovation and the use of new technologies for humanitarian response. DfID, for example, is committed to using “innovative techniques and technologies more routinely in humanitarian response” (2011). In a more recent strategy paper, DfID confirmed that it would “continue to invest in new technologies” (2012). ALNAP’s important report on “The State of the Humanitarian System” documents the shift towards greater innovation, “with new funds and mechanisms designed to study and support innovation in humanitarian programming” (2012). A forthcoming landmark study by OCHA makes the strongest case yet for the use and early adoption of new technologies for humanitarian response (2013).


These strategic policy documents are game-changers and pivotal to ushering in the next wave of humanitarian technology and innovation. That said, the reports are limited by the very fact that the authors are humanitarian professionals and thus not necessarily familiar with the field of advanced computing. The purpose of this post is therefore to set out a more detailed research framework for next generation humanitarian technology and innovation—one with a strong focus on information systems for crisis response and management.

In 2010, I wrote this piece on “The Humanitarian-Technology Divide and What To Do About It.” This divide became increasingly clear to me when I co-founded and co-directed the Harvard Humanitarian Initiative’s (HHI) Program on Crisis Mapping & Early Warning (2007-2009). So I co-founded the annual International CrisisMappers Conference series in 2009 and have continued to co-organize this unique, cross-disciplinary forum on humanitarian technology. The CrisisMappers Network also plays an important role in bridging the humanitarian and technology divide. My decision to join Ushahidi as Director of Crisis Mapping (2009-2012) was a strategic move to continue bridging the divide, and to do so from the technology side this time.

The same is true of my move to the Qatar Computing Research Institute (QCRI) at the Qatar Foundation. My experience at Ushahidi made me realize that serious expertise in Data Science is required to tackle the major challenges appearing on the horizon of humanitarian technology. Indeed, the key words missing from the DfID, ALNAP and OCHA innovation reports include: Data Science, Big Data Analytics, Artificial Intelligence, Machine Learning, Machine Translation and Human Computing. This current divide between the humanitarian and data science space needs to be bridged, which is precisely why I joined the Qatar Computing Research Institute as Director of Innovation: to develop and prototype the next generation of humanitarian technologies by working directly with experts in Data Science and Advanced Computing.


My efforts to bridge these communities also explain why I am co-organizing this year’s Workshop on “Social Web for Disaster Management” at the 2013 World Wide Web conference (WWW13). The WWW event series is one of the most prestigious conferences in the field of Advanced Computing. I have found that experts in this field are very interested and highly motivated to work on humanitarian technology challenges and crisis computing problems. As one of them recently told me: “We simply don’t know what projects or questions to prioritize or work on. We want questions, preferably hard questions, please!”

Yet the humanitarian innovation and technology reports cited above overlook the field of advanced computing. Their policy recommendations vis-a-vis future information systems for crisis response and management are vague at best. At the same time, one of the major challenges that the humanitarian sector faces is the rise of Big (Crisis) Data. I have already discussed this here, here and here, for example. The humanitarian community is woefully unprepared to deal with this tidal wave of user-generated crisis information. There are already more mobile phone subscriptions than people in 100+ countries. And fully 50% of the world’s population in developing countries will be using the Internet within the next 20 months; the current figure is 24%. Meanwhile, close to 250 million people were affected by disasters in 2010 alone. Since then, the number of new mobile phone subscriptions has increased by well over one billion, which means that disaster-affected communities today are increasingly likely to be digital communities as well.

In the Philippines, a country highly prone to “natural” disasters, 92% of Filipinos who access the web use Facebook. In early 2012, Filipinos sent an average of 2 billion text messages every day. When disaster strikes, some of these messages will contain information critical for situational awareness & rapid needs assessment. The innovation reports by DfID, ALNAP and OCHA emphasize time and time again that listening to local communities is a humanitarian imperative. As DfID notes, “there is a strong need to systematically involve beneficiaries in the collection and use of data to inform decision making. Currently the people directly affected by crises do not routinely have a voice, which makes it difficult for their needs [to] be effectively addressed” (2012). But how exactly should we listen to millions of voices at once, let alone manage, verify and respond to these voices with potentially life-saving information? Over 20 million tweets were posted during Hurricane Sandy. In Japan, over half-a-million new users joined Twitter the day after the 2011 Earthquake. More than 177 million tweets about the disaster were posted that same day, i.e., 2,000 tweets per second on average.


Of course, the volume and velocity of crisis information will vary from country to country and disaster to disaster. But the majority of humanitarian organizations do not have the technologies in place to handle smaller tidal waves either. Take the case of the recent Typhoon in the Philippines, for example. OCHA activated the Digital Humanitarian Network (DHN) and asked it to carry out a rapid damage assessment by analyzing the 20,000 tweets posted during the first 48 hours of Typhoon Pablo. In fact, one of the main reasons digital volunteer networks like the DHN and the Standby Volunteer Task Force (SBTF) exist is to provide humanitarian organizations with this kind of skilled surge capacity. But analyzing 20,000 tweets in 12 hours (mostly manually) is one thing; analyzing 20 million requires more than a few hundred dedicated volunteers. What’s more, we do not have the luxury of having months to carry out this analysis. Access to information is as important as access to food; and like food, information has a sell-by date.
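A rough back-of-the-envelope calculation shows the size of that gap. The sketch below assumes that volunteer throughput stays constant at the Typhoon Pablo rate (20,000 tweets in 12 hours) and that the work scales linearly; both are simplifying assumptions, and the function name is purely illustrative.

```python
def hours_needed(items: int, items_per_hour: float) -> float:
    """Hours required to process `items` at a constant throughput."""
    return items / items_per_hour

# Throughput observed during the Typhoon Pablo deployment (figures from the text).
typhoon_rate = 20_000 / 12  # tweets analyzed per hour

# Same throughput applied to the Hurricane Sandy volume of 20 million tweets.
sandy_hours = hours_needed(20_000_000, typhoon_rate)
sandy_days = sandy_hours / 24

print(f"{sandy_hours:,.0f} hours, or about {sandy_days:,.0f} days of round-the-clock work")
```

At an unchanged pace, the larger dataset would take a thousand times longer, which is exactly why manual surge capacity alone cannot keep up with information that has a sell-by date.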

We clearly need a research agenda to guide the development of next generation humanitarian technology. One such framework is proposed here. The Big (Crisis) Data challenge is composed of (at least) two major problems: (1) finding the needle in the haystack; (2) assessing the accuracy of that needle. In other words, identifying the signal in the noise and determining whether that signal is accurate. Both of these challenges are exacerbated by serious time constraints. There are (at least) two ways to manage the Big Data challenge in real or near real-time: Human Computing and Artificial Intelligence. We know about these solutions because they have already been developed and used by other sectors and disciplines for several years now. In other words, our information problems are hardly as unique as we might think. Hence the importance of bridging the humanitarian and data science communities.

In sum, the Big Crisis Data challenge can be addressed using Human Computing (HC) and/or Artificial Intelligence (AI). Human Computing includes crowdsourcing and microtasking. AI includes natural language processing and machine learning. A framework for next generation humanitarian technology and innovation must thus promote Research and Development (R&D) that applies these methodologies to humanitarian response. For example, Verily is a project that leverages HC for the verification of crowdsourced social media content generated during crises. In contrast, this is an example of an AI approach to verification. The Standby Volunteer Task Force (SBTF) has used HC (microtasking) to analyze satellite imagery (Big Data) for humanitarian response. Another novel HC approach to managing Big Data is the use of gaming, something called Playsourcing. AI for Disaster Response (AIDR) is an example of AI applied to humanitarian response. In many ways, though, AIDR combines AI with Human Computing, as does MatchApp. Such hybrid solutions should also be promoted as part of the R&D framework on next generation humanitarian technology.
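To make the hybrid idea concrete, here is a minimal sketch of how a machine score and human microtasking might be combined. It is not how AIDR, Verily or MatchApp actually work: the keyword weights stand in for a trained classifier, and the thresholds, sample messages and all names in the code are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical keyword weights standing in for a trained machine-learning model.
KEYWORD_WEIGHTS = {
    "trapped": 0.9,
    "collapsed": 0.8,
    "injured": 0.8,
    "flooded": 0.7,
    "need": 0.4,
    "water": 0.3,
}

@dataclass
class TriageResult:
    text: str
    score: float
    route: str  # "auto_relevant", "auto_irrelevant" or "human_review"

def score_message(text: str) -> float:
    """Toy relevance score in [0, 1]: the highest weight among matched keywords."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    return max((KEYWORD_WEIGHTS[w] for w in words if w in KEYWORD_WEIGHTS), default=0.0)

def triage(text: str, high: float = 0.75, low: float = 0.25) -> TriageResult:
    """Label confident cases automatically; send uncertain ones to volunteers."""
    score = score_message(text)
    if score >= high:
        route = "auto_relevant"
    elif score <= low:
        route = "auto_irrelevant"
    else:
        route = "human_review"  # would become a microtask for a digital volunteer
    return TriageResult(text, score, route)

# Invented sample messages, for illustration only.
messages = [
    "Family trapped under collapsed house near the river",
    "We need drinking water here",
    "Good morning everyone!",
]
results = [triage(m) for m in messages]
for r in results:
    print(f"{r.route:16} {r.score:.1f}  {r.text}")
```

The design choice is that the machine handles the clear-cut cases at scale while volunteers spend their limited time on the ambiguous middle, and their labels could in turn be used to retrain the model.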

There is of course more to humanitarian technology than information management alone. Related is the topic of Data Visualization, for example. There are also exciting innovations and developments in the use of drones or Unmanned Aerial Vehicles (UAVs), meshed mobile communication networks, hyper low-cost satellites, etc. I am particularly interested in each of these areas and will continue to blog about them. In the meantime, I very much welcome feedback on this post’s proposed research framework for humanitarian technology and innovation.


Opening World Bank Data with QCRI’s GeoTagger

My colleagues and I at QCRI partnered with the World Bank several months ago to develop an automated GeoTagger platform to increase the transparency and accountability of international development projects by accelerating the process of opening key development and finance data. We are proud to launch the first version of the GeoTagger platform today. The project builds on the Bank’s Open Data Initiatives promoted by former President Robert Zoellick and continued under the current leadership of Dr. Jim Yong Kim.

QCRI GeoTagger 1

The Bank has accumulated an extensive amount of socio-economic data as well as a massive amount of data on Bank-sponsored development projects worldwide. Much of this data, however, is not directly usable by the general public due to numerous data format, quality and access issues. The Bank therefore launched its “Mapping for Results” initiative to visualize the location of Bank-financed projects in order to better monitor development impact and improve aid effectiveness and coordination, while enhancing transparency and social accountability. The geo-tagging of this data, however, has been especially time-consuming and tedious. Numerous interns were required to manually read through tens of thousands of dense World Bank project documents, safeguard documents and results reports to identify and geocode exact project locations. But there are hundreds of thousands of such PDF documents. To make matters worse, these documents make seemingly “random” passing references to project locations, with no sign of any standardized reporting structure whatsoever.

QCRI GeoTagger 2

The purpose of QCRI’s GeoTagger Beta is to automatically “read” through these countless PDF documents to identify and map all references to locations. GeoTagger does this using the World Bank Projects Data API and the Stanford Named Entity Recognizer (NER) & Alchemy. These tools help to automatically search through documents and identify place names, which are then geocoded using the Google Geocoder, Yahoo! PlaceFinder & GeoNames and placed on a dedicated map. QCRI’s GeoTagger will remain freely available and we’ll be making the code open source as well.
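The two steps described above, finding place names in text and then turning them into coordinates, can be illustrated with a deliberately simplified sketch. This is not GeoTagger’s code: a tiny hard-coded gazetteer with approximate coordinates stands in for both the named entity recognizer and the external geocoding services, and the sample sentence and function names are made up for illustration.

```python
import re

# Tiny hypothetical gazetteer: place name -> approximate (latitude, longitude).
# A real system would call an entity recognizer and a geocoding service instead.
GAZETTEER = {
    "Nairobi": (-1.29, 36.82),
    "Dhaka": (23.81, 90.41),
    "Lima": (-12.05, -77.04),
}

# Match any gazetteer name as a whole word.
PLACE_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b")

def extract_places(text: str) -> list:
    """Return distinct gazetteer names found in `text`, in order of first mention."""
    seen = []
    for match in PLACE_PATTERN.finditer(text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen

def geotag(text: str) -> list:
    """Pair each extracted place name with its gazetteer coordinates."""
    return [(name, *GAZETTEER[name]) for name in extract_places(text)]

# Invented sentence in the style of a passing location reference.
sample = (
    "The project funds road repairs outside Nairobi and a pilot clinic in Dhaka; "
    "the Nairobi works begin first."
)
tagged = geotag(sample)
for name, lat, lon in tagged:
    print(f"{name}: {lat}, {lon}")
```

A fixed lookup table obviously cannot cope with the “random” passing references found in real project documents, which is why statistical entity recognition and full geocoding services are needed in practice.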

Naturally, this platform could be customized for many different datasets and organizations, which is why we’ve already been approached by a number of prospective partners to explore other applications. So feel free to get in touch should this also be of interest to your project and/or organization. In the meantime, a very big thank you to my colleagues at QCRI’s Big Data Analytics Center: Dr. Ihab Ilyas, Dr. Shady El-Bassuoni, Mina Farid and, last but certainly not least, Ian Ye for their time on this project. Many thanks as well to my colleagues Johannes Kiess, Aleem Walji and team from the World Bank and Stephen Davenport at Development Gateway for the partnership.