Tag Archives: data

Data Science for Social Good: Not Cognitive Surplus but Cognitive Mismatch

I’ve spent the past 12 months working with top-notch data scientists at QCRI et al. The following may thus be biased: I think QCRI got it right. They strive to balance their commitment to positive social change with their primary mission of becoming a world-class institute for advanced computing research. The two are not mutually exclusive. What it takes is a dedicated position, like the one created for me at QCRI. It is high time that other research institutes, academic programs and international computing conferences create comparable focal points to catalyze data science for social good.

Microsoft Research, to name just one company, carries out very interesting research that could have tremendous social impact, but the bridge necessary to transfer much of that research from knowledge to operation to social impact is often not there. And when it is, it is usually by happenstance. So researchers continue to formulate research questions based on what they find interesting rather than identifying equally interesting questions that could have direct social impact if answered by data science. Hundreds of papers get presented at computing conferences every month, and yet few if any of the authors have linked up with organizations like the United Nations, World Bank, Habitat for Humanity etc., to identify and answer questions with social good potential. The same is true for hundreds of computing dissertations that get defended every year. Doctoral students do not realize that a minor reformulation of their research question could perhaps make a world of difference to a community-based organization in India dedicated to fighting corruption, for example.

Cognitive Mismatch

The challenge here is not one of untapped cognitive surplus (to borrow from Clay Shirky), but rather complete cognitive mismatch. As my QCRI colleague Ihab Ilyas puts it: there are “problem owners” on the one hand and “problem solvers” on the other. The former have problems that prevent them from catalyzing positive social change. The latter know how to solve comparable problems and do so every day. But the two are not talking or even aware of each other. Creating and maintaining this two-way conversation requires more than one dedicated position (like mine at QCRI).


In short, I really want to have dedicated counterparts at Microsoft Research, IBM, SAP, LinkedIn, Bitly, GNIP, etc., as well as leading universities, top-notch computing conferences and challenges; counterparts who have one foot in the world of data science and the other in the social sector; individuals who have a demonstrated track-record in bridging communities. There’s a community here waiting to be connected and, indeed, to be formed. Again, carrying out cutting edge computing R&D is in no way incompatible with generating positive social impact. Moreover, the latter provides an important return on investment in the form of data, reputation, publicity, connections and social capital. In sum, social good challenges need to be formulated into research questions that have scientific as well as social good value. There is definitely a sweet spot here, but it takes a dedicated community to bring problem owners and solvers together and hit that social good sweet spot.


Social Media for Emergency Management: Question of Supply and Demand

I’m always amazed by folks who dismiss the value of social media for emergency management on the grounds that most of said content is useless for disaster response. By that logic, libraries are also useless: the few books you’re actually looking for rarely represent more than 1% of all the books available in a major library. Does that mean libraries are useless? Of course not. Is social media useless for disaster response? Of course not. Even if only 0.01% of the 20+ million tweets posted during Hurricane Sandy were useful, and only half of these were accurate, this would still mean over 1,000 real-time and informative tweets, or some 15,000 words, i.e., the equivalent of a 25-page, single-spaced document composed exclusively of relevant, actionable & timely disaster information.
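To make the back-of-the-envelope arithmetic explicit, here is a quick sketch in Python; the 15 words-per-tweet and 600 words-per-page figures are my own rough assumptions:

```python
# Back-of-the-envelope check of the Hurricane Sandy figures above.
# Assumptions (mine): ~15 words per tweet, ~600 words per single-spaced page.
total_tweets = 20_000_000
useful_share = 0.0001      # 0.01% deemed useful
accurate_share = 0.5       # half of those accurate

useful_tweets = total_tweets * useful_share * accurate_share
words = useful_tweets * 15
pages = words / 600

print(f"{useful_tweets:,.0f} tweets, {words:,.0f} words, ~{pages:.0f} pages")
# -> 1,000 tweets, 15,000 words, ~25 pages
```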


Empirical studies clearly prove that social media reports can be informative for disaster response. Numerous case studies have also described how social media has saved lives during crises. That said, if emergency responders do not actively or explicitly create demand for relevant and high quality social media content during crises, then why should supply follow? If the 911 emergency number (999 in the UK) were never advertised, then would anyone call? If 911 were simply a voicemail inbox with no instructions, would callers know what type of actionable information to relay after the beep?

While the majority of emergency management centers do not create the demand for crowdsourced crisis information, members of the public are increasingly demanding that said responders monitor social media for “emergency posts”. But most responders fear that opening up social media as a crisis communication channel with the public will result in an unmanageable flood of requests. The London Fire Brigade seems to think otherwise, however. So let’s carefully unpack the fear of information flooding.

First of all, New York City’s 911 operators receive over 10 million calls every year that are accidental, false or hoaxes. Does this mean we should abolish the 911 system? Of course not. Now, assuming that 10% of these calls take an operator 10 seconds to manage, this represents close to 3,000 hours, or 115 days, worth of “wasted work”. But this filtering is absolutely critical and requires human intervention. In contrast, “emergency posts” published on social media can be automatically filtered and triaged thanks to Big Data Analytics and Social Computing, which could save operators considerable time. The Digital Operations Center at the American Red Cross is currently exploring this automated filtering approach. Moreover, just as it is illegal to report false emergency information to 911, there’s no reason why the same laws could not apply to social media when these communication channels are used for emergency purposes.
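As a toy illustration of what such automated filtering and triage might look like in its simplest form, here is a short sketch; the keywords and categories are purely illustrative assumptions, not an operational taxonomy:

```python
# Toy illustration of automated filtering of "emergency posts".
# Keywords and categories below are illustrative assumptions only;
# real systems combine far richer signals than keyword matching.
from typing import Optional

EMERGENCY_TERMS = {
    "trapped": "search_and_rescue",
    "injured": "medical",
    "flooded": "shelter",
    "no water": "water_and_sanitation",
}

def triage(post: str) -> Optional[str]:
    """Return a tentative category if the post looks like an emergency report."""
    text = post.lower()
    for term, category in EMERGENCY_TERMS.items():
        if term in text:
            return category
    return None  # likely noise; route to low-priority human review

posts = [
    "We are trapped on the roof, water rising fast",
    "Great sunset over the city tonight!",
]
print([triage(p) for p in posts])   # ['search_and_rescue', None]
```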

Second, if individuals prefer to share disaster-related information and/or needs via social media, then they are less likely to call in as well. Double reporting is thus unlikely to occur, and could in any case be discouraged and/or penalized. In other words, the volume of emergency reports from “the crowd” need not increase substantially after all. Those who use the phone to report an emergency today may in the future opt for social media instead. The only significant change here is the ease of reporting for the person in need. Again, the question is one of supply and demand. Even if relevant emergency posts were to increase without a comparable fall in calls, this would simply reveal that the current voice-based system creates a barrier to reporting that discriminates against certain users in need.

Third, not all emergency calls/posts require immediate response by a paid professional with 10+ years of experience. In other words, the various types of needs can be triaged and responded to accordingly. As part of their police training or internships, new cadets could be tasked to respond to less serious needs, leaving the more seasoned professionals to focus on the more difficult situations. While this approach certainly has some limitations in the context of 911, these same limitations are far less pronounced for disaster response efforts in which most needs are met locally by the affected communities themselves anyway. In fact, the Filipino government actively promotes the use of social media reporting and crisis hashtags to crowdsource disaster response.

In sum, if disaster responders and emergency management professionals are not content with the quality of crisis reporting found on social media, then they should do something about it by implementing the appropriate policies to create the demand for higher quality and more structured reporting. The first emergency telephone service was launched in London some 80 years ago in response to a devastating fire. At the time, the idea of using a phone to report emergencies was controversial. Today, the London Fire Brigade is paving the way forward by introducing Twitter as a reporting channel. This move may seem controversial to some today, but give it a few years and people will look back and ask what took us so long to adopt new social media channels for crisis reporting.


Data Science for Social Good and Humanitarian Action

My (new) colleagues at the University of Chicago recently announced a new and exciting program called “Data Science for Social Good”. The program, which kicks off this summer, will bring together dozens of top-notch data scientists, computer scientists and social scientists to address major social challenges. Advisors for this initiative include Eric Schmidt (Google), Rayid Ghani (Obama Administration) and my very likable colleague Jake Porway (DataKind). Think of “Data Science for Social Good” as a “Code for America” but broader in scope and application. I’m excited to announce that QCRI is looking to collaborate with this important new program given the strong overlap with our Social Innovation Vision, Strategy and Projects.

My team and I at QCRI are hoping to mentor and engage fellows throughout the summer on key humanitarian & development projects we are working on in partnership with the United Nations, Red Cross, World Bank and others. This would provide fellows with the opportunity to engage in “real world” challenges that directly match their expertise and interests. We are also hoping to replicate this type of program in Qatar in January 2014.

Why January? This will give us enough time to design the new program based on the results of this summer’s experiment. More importantly, perhaps, it will be freezing in Chicago ; ) and wonderfully warm in Doha. Plus, January is an easier time for many students and professionals to take “time off”. The fellows program will likely be 3 weeks in duration (rather than 3 months) and will focus on applying data science to social good projects in the Arab World and beyond. Mentors will include top Data Scientists from QCRI and hopefully the University of Chicago. We hope to create 10 fellowship positions for this Data Science for Social Good program. The call for applications will go out this summer, so stay tuned for an update.


Data Protection Protocols for Crisis Mapping

The day after the CrisisMappers 2011 Conference in Geneva, my colleague Phoebe Wynn-Pope organized and facilitated the most important workshop I attended that year. She brought together a small group of seasoned crisis mappers and experts in protection standards. The workshop concluded with a pressing action item: update the International Committee of the Red Cross’s (ICRC) Professional Standards for Protection Work in order to provide digital humanitarians with expert guidance on protection standards for humanitarianism in the network age.

My colleague Anahi Ayala and I were invited to provide feedback on the new 20+ page chapter specifically dedicated to data management and new technologies. We added many, many comments and suggestions on the draft. The full report is available here (PDF). Today, thanks to the ICRC, I am in Switzerland to give a Keynote on Next Generation Humanitarian Technology for the official launch of the report. The purpose of this blog post is to list the protection protocols that relate most directly to Crisis Mapping & Digital Humanitarian Response, and to problematize some of these protocols.

The Protocols

In the preface of the ICRC’s 2013 Edition of the Professional Standards for Protection Work, the report lists three reasons for the updated edition. The first has to do with new technologies:

In light of the rapidly proliferating initiatives to make new uses of information technology for protection purposes, such as satellite imagery, crisis mapping and publicizing abuses and violations through social media, the advisory group agreed to review the scope and language of the standards on managing sensitive information. The revised standards reflect the experiences and good practices of humanitarian and human rights organizations as well as of information & communication technology actors.

The new protection standards most relevant or applicable to digital humanitarians are listed below (indented text), together with commentary.

Protection actors must only collect information on abuses and violations when necessary for the design or implementation of protection activities. It may not be used for other purposes without additional consent.

A number of Digital Humanitarian Networks, such as the Standby Volunteer Task Force (SBTF), only collect crisis information specifically requested by the “Activating Organization,” such as the UN Office for the Coordination of Humanitarian Affairs (OCHA). Volunteer networks like the SBTF are not “protection actors” but rather provide direct support to humanitarian organizations when the latter meet the SBTF’s activation criteria. In terms of what type of information the SBTF collects, again it is the Activating Organization that decides this, not the SBTF. For example, the Libya Crisis Map launched by the SBTF at the request of OCHA displayed categories of information that were decided by the UN team in Geneva.

Protection actors must collect and handle information containing personal details in accordance with the rules and principles of international law and other relevant regional or national laws on individual data protection.

These international, regional and national rules, principles and laws need to be made available to Digital Humanitarians in a concise, accessible and clear format. Such a resource is still missing.

Protection actors seeking information bear the responsibility to assess threats to the persons providing information, and to take necessary measures to avoid negative consequences for those from whom they are seeking information.

Protection actors setting up systematic information collection through the Internet or other media must analyse the different potential risks linked to the collection, sharing or public display of the information and adapt the way they collect, manage and publicly release the information accordingly.

Interestingly, when OCHA activated the SBTF in response to the Libya Crisis, it was the SBTF, not the UN, that took the initiative to formulate a Threat and Risks Mitigation Strategy that was subsequently approved by the UN. Furthermore, unlike other digital humanitarian networks, the Standby Task Force’s “Prime Directive” is to not interact with the crisis-affected population. Why? Precisely to minimize the risk to those voluntarily sharing information on social media.

Protection actors must determine the scope, level of precision and depth of detail of the information collection process, in relation to the intended use of the information collected.

Again, this is determined by the protection actor activating a digital humanitarian network like the SBTF.

Protection actors should systematically review the information collected in order to confirm that it is reliable, accurate, and updated.

The SBTF has a dedicated Verification Team that strives to do this. The verification of crowdsourced, user-generated content posted on social media during crises is no small task. But the BBC’s User-Generated Content (UGC) Hub has been doing just this for 8 years. Meanwhile, new strategies and technologies are under development to facilitate the rapid verification of such content. Also, the ICRC report notes that “Combining and cross-checking such [crowdsourced] information with other sources, including information collected directly from communities and individuals affected, is becoming standard good practice.”

Protection actors should be explicit as to the level of reliability and accuracy of information they use or share.

Networks like the SBTF make explicit whether a report published on a crisis map has been verified or not. If the latter, the report is clearly marked as “Unverified”. There are more nuanced ways to do this, however. I have recently given feedback on some exciting new research that is looking to quantify the probable veracity of user-generated content.

Protection actors must gather and subsequently process protection information in an objective and impartial manner, to avoid discrimination. They must identify and minimize bias that may affect information collection.

Objective, impartial, non-discriminatory and unbiased information is often more fantasy than reality, even with traditional data. Meeting these requirements in a conflict zone can be prohibitively expensive, overly time-consuming and/or downright dangerous. This explains why advanced statistical methods dedicated to correcting biases exist. These can and have been applied to conflict and human rights data. They can also be applied to user-generated content on social media, to the extent that the underlying demographic and census-based information is available.
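To give a concrete, if deliberately simplified, sense of what such bias-correction methods look like, here is a sketch of post-stratification weighting; every number in it (age groups, census shares, sample counts, reported needs) is invented for illustration:

```python
# Simplified sketch of post-stratification, one standard way to re-weight
# a non-representative sample. All numbers are invented for illustration.
census_share  = {"18-29": 0.30, "30-49": 0.40, "50+": 0.30}   # population shares
sample_count  = {"18-29": 600,  "30-49": 300,  "50+": 100}    # e.g., sampled social media users
reported_need = {"18-29": 0.20, "30-49": 0.35, "50+": 0.50}   # share reporting a need

n = sum(sample_count.values())
# Weight each group by how under- or over-represented it is in the sample.
weights = {g: census_share[g] / (sample_count[g] / n) for g in census_share}

naive    = sum(sample_count[g] * reported_need[g] for g in census_share) / n
adjusted = sum(sample_count[g] * weights[g] * reported_need[g] for g in census_share) / n

print(f"naive estimate: {naive:.2f}, post-stratified estimate: {adjusted:.2f}")
# naive ≈ 0.28 (skewed by the over-sampled young group), adjusted ≈ 0.35
```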

To place this into context, Harvard University Professor Gary King reminded me that the vast majority of medical data is not representative either. Nor is the vast majority of crime data. Does that render these datasets void? Of course not. Please see this post on Demystifying Crowdsourcing: An Introduction to Non-Probability Sampling.

Security safeguards appropriate to the sensitivity of the information must be in place prior to any collection of information, to ensure protection from loss or theft, unauthorized access, disclosure, copying, use or modification, in any format in which it is kept.

One of the popular mapping technologies used by digital humanitarian networks is the Ushahidi platform. When the SBTF learned in 2012 that security holes had still not been patched almost a year after reporting them to Ushahidi Inc., the SBTF Core Team made an executive decision to avoid using Ushahidi technology whenever possible, given that the platform could be easily hacked. (Just last month, a colleague of mine who is not a techie but a UN practitioner was able to scrape Ushahidi’s entire Kenya election monitoring data from March 2013, which included some personal identifying information.) The SBTF has thus been exploring work-arounds and is looking to make greater use of GeoFeedia and Google’s new mapping technology, Stratomap, in future crisis mapping operations.

Protection actors must integrate the notion of informed consent when calling upon the general public, or members of a community, to spontaneously send them information through SMS, an open Internet platform, or any other means of communication, or when using information already available on the Internet.

This is perhaps the most problematic but important protection protocol as far as digital humanitarian work is concerned. While informed consent is absolutely of critical importance, the vast majority of crowdsourced content displayed on crisis maps is user-generated and voluntarily shared on social media. The very act of communicating with these individuals to request their consent not only runs the risk of endangering them but also violates the SBTF’s Prime Directive for the exact same reason. Moreover, interacting with crisis-affected communities may raise expectations of response that digital humanitarians are simply not in a position to guarantee. In situations of armed conflict and other situations of violence, conducting individual interviews can put people at risk not only because of the sensitive nature of the information collected, but because mere participation in the process can cause these people to be stigmatized or targeted.

That said, the ICRC does recognize that, “When such consent cannot be realistically obtained, information allowing the identification of victims or witnesses, should only be relayed in the public domain if the expected protection outcome clearly outweighs the risks. In case of doubt, displaying only aggregated data, with no individual markers, is strongly recommended.”

Protection actors should, to the degree possible, keep victims or communities having transmitted information on abuses and violations informed of the action they have taken on their behalf – and of the ensuing results. Protection actors using information provided by individuals should remain alert to any negative repercussions on the individuals or communities concerned, owing to the actions they have taken, and take measures to mitigate these repercussions.

Part of this protocol is problematic for the same reason as the above protocol. The very act of communicating with victims could place them in harm’s way. As far as staying alert to any negative repercussions, I believe the more seasoned digital humanitarian networks make this one of their top priorities.

When handling confidential and sensitive information on abuses and violations, protection actors should endeavor, when appropriate and feasible, to share aggregated data on the trends they observed.

The purpose of the SBTF’s Analysis Team is precisely to serve this function.

Protection actors should establish formal procedures on the information handling process, from collection to exchange, archiving or destruction.

Formal procedures to archive & destroy crowdsourced crisis information are largely lacking. Moving forward, the SBTF will defer this responsibility to the Activating Organization.

Conclusion

In conclusion, the ICRC notes that, “When it comes to protection, crowdsourcing can be an extremely efficient way to collect data on ongoing violence and abuses and/or their effects on individuals and communities. Made possible by the wide availability of Internet or SMS in countries affected by violence, crowdsourcing has rapidly gained traction.” To this end,

Although the need for caution is a central message [in the ICRC report], it should in no way be interpreted as a call to avoid sharing information. On the contrary, when the disclosing of protection information is thought to be of benefit to the individuals and communities concerned, it should be shared, as appropriate, with local, regional or national authorities, UN peacekeeping operations, other protection actors, and last but not least with service providers.

This is in line with the conclusions reached by OCHA’s landmark report, which notes that “Concern over the protection of information and data is not a sufficient reason to avoid using new communications technologies in emergencies, but it must be taken into account.” And so, “Whereas the first exercises were conducted without clear procedures to assess and to subsequently limit the risks faced by individuals who participated or who were named, the groups engaged in crisis mapping efforts over the years have become increasingly sensitive to the need to identify & manage these risks” (ICRC 2013).

It is worth recalling that the vast majority of the groups engaged in crisis mapping efforts, such as the SBTF, are first and foremost volunteers. They not only continue to offer their time, skills and services for free, but also take it upon themselves to actively manage the risks involved in crisis mapping: risks that they, perhaps better than anyone else, understand and worry about the most because they are, after all, at the frontlines of these digital humanitarian efforts. And they do all of this on a grand operational budget of $0 (as far as the SBTF goes). And yet, these volunteers continue to mobilize at the request of international humanitarian organizations and are always looking to learn, improve and do better. They continue to change the world, one map at a time.

I have organized a CrisisMappers Webinar on April 17, 2013, featuring presentations and remarks by the lead authors of the new ICRC report. Please join the CrisisMappers list-serve for more information.


See also:

  • SMS Code of Conduct for Disaster Response (Link)
  • Humanitarian Accountability Handbook (PDF)

A Research Framework for Next Generation Humanitarian Technology and Innovation

Humanitarian donors and organizations are increasingly championing innovation and the use of new technologies for humanitarian response. DfID, for example, is committed to using “innovative techniques and technologies more routinely in humanitarian response” (2011). In a more recent strategy paper, DfID confirmed that it would “continue to invest in new technologies” (2012). ALNAP’s important report on “The State of the Humanitarian System” documents the shift towards greater innovation, “with new funds and mechanisms designed to study and support innovation in humanitarian programming” (2012). A forthcoming landmark study by OCHA makes the strongest case yet for the use and early adoption of new technologies for humanitarian response (2013).


These strategic policy documents are game-changers and pivotal to ushering in the next wave of humanitarian technology and innovation. That said, the reports are limited by the very fact that the authors are humanitarian professionals and thus not necessarily familiar with the field of advanced computing. The purpose of this post is therefore to set out a more detailed research framework for next generation humanitarian technology and innovation—one with a strong focus on information systems for crisis response and management.

In 2010, I wrote this piece on “The Humanitarian-Technology Divide and What To Do About It.” This divide became increasingly clear to me when I co-founded and co-directed the Harvard Humanitarian Initiative’s (HHI) Program on Crisis Mapping & Early Warning (2007-2009). So I co-founded the annual International CrisisMappers Conference series in 2009 and have continued to co-organize this unique, cross-disciplinary forum on humanitarian technology. The CrisisMappers Network also plays an important role in bridging the humanitarian and technology divide. My decision to join Ushahidi as Director of Crisis Mapping (2009-2012) was a strategic move to continue bridging the divide, and to do so from the technology side this time.

The same is true of my move to the Qatar Computing Research Institute (QCRI) at the Qatar Foundation. My experience at Ushahidi made me realize that serious expertise in Data Science is required to tackle the major challenges appearing on the horizon of humanitarian technology. Indeed, the key words missing from the DfID, ALNAP and OCHA innovation reports include: Data Science, Big Data Analytics, Artificial Intelligence, Machine Learning, Machine Translation and Human Computing. This divide between the humanitarian and data science communities needs to be bridged, which is precisely why I joined the Qatar Computing Research Institute as Director of Innovation: to develop and prototype the next generation of humanitarian technologies by working directly with experts in Data Science and Advanced Computing.


My efforts to bridge these communities also explains why I am co-organizing this year’s Workshop on “Social Web for Disaster Management” at the 2013 World Wide Web conference (WWW13). The WWW event series is one of the most prestigious conferences in the field of Advanced Computing. I have found that experts in this field are very interested and highly motivated to work on humanitarian technology challenges and crisis computing problems. As one of them recently told me: “We simply don’t know what projects or questions to prioritize or work on. We want questions, preferably hard questions, please!”

Yet the humanitarian innovation and technology reports cited above overlook the field of advanced computing. Their policy recommendations vis-a-vis future information systems for crisis response and management are vague at best. At the same time, one of the major challenges that the humanitarian sector faces is the rise of Big (Crisis) Data. I have already discussed this here, here and here, for example. The humanitarian community is woefully unprepared to deal with this tidal wave of user-generated crisis information. There are already more mobile phone subscriptions than people in 100+ countries. And fully 50% of the world’s population in developing countries will be using the Internet within the next 20 months (the current figure is 24%). Meanwhile, close to 250 million people were affected by disasters in 2010 alone. Since then, the number of new mobile phone subscriptions has increased by well over one billion, which means that disaster-affected communities today are increasingly likely to be digital communities as well.

In the Philippines, a country highly prone to “natural” disasters, 92% of Filipinos who access the web use Facebook. In early 2012, Filipinos sent an average of 2 billion text messages every day. When disaster strikes, some of these messages will contain information critical for situational awareness & rapid needs assessment. The innovation reports by DfID, ALNAP and OCHA emphasize time and time again that listening to local communities is a humanitarian imperative. As DfID notes, “there is a strong need to systematically involve beneficiaries in the collection and use of data to inform decision making. Currently the people directly affected by crises do not routinely have a voice, which makes it difficult for their needs to be effectively addressed” (2012). But how exactly should we listen to millions of voices at once, let alone manage, verify and respond to these voices with potentially life-saving information? Over 20 million tweets were posted during Hurricane Sandy. In Japan, over half-a-million new users joined Twitter the day after the 2011 Earthquake. More than 177 million tweets about the disaster were posted that same day, i.e., some 2,000 tweets per second on average.


Of course, the volume and velocity of crisis information will vary from country to country and disaster to disaster. But the majority of humanitarian organizations do not have the technologies in place to handle smaller tidal waves either. Take the case of the recent Typhoon in the Philippines, for example. OCHA activated the Digital Humanitarian Network (DHN) and asked it to carry out a rapid damage assessment by analyzing the 20,000 tweets posted during the first 48 hours of Typhoon Pablo. In fact, one of the main reasons digital volunteer networks like the DHN and the Standby Volunteer Task Force (SBTF) exist is to provide humanitarian organizations with this kind of skilled surge capacity. But analyzing 20,000 tweets in 12 hours (mostly manually) is one thing; analyzing 20 million requires more than a few hundred dedicated volunteers. What’s more, we do not have the luxury of having months to carry out this analysis. Access to information is as important as access to food; and like food, information has a sell-by date.

We clearly need a research agenda to guide the development of next generation humanitarian technology. One such framework is proposed here. The Big (Crisis) Data challenge is composed of (at least) two major problems: (1) finding the needle in the haystack; (2) assessing the accuracy of that needle. In other words, identifying the signal in the noise and determining whether that signal is accurate. Both of these challenges are exacerbated by serious time constraints. There are (at least) two ways to manage the Big Data challenge in real or near real-time: Human Computing and Artificial Intelligence. We know about these solutions because they have already been developed and used by other sectors and disciplines for several years now. In other words, our information problems are hardly as unique as we might think. Hence the importance of bridging the humanitarian and data science communities.

In sum, the Big Crisis Data challenge can be addressed using Human Computing (HC) and/or Artificial Intelligence (AI). Human Computing includes crowdsourcing and microtasking. AI includes natural language processing and machine learning. A framework for next generation humanitarian technology and innovation must thus promote Research and Development (R&D) that applies these methodologies to humanitarian response. For example, Verily is a project that leverages HC for the verification of crowdsourced social media content generated during crises. In contrast, this is an example of an AI approach to verification. The Standby Volunteer Task Force (SBTF) has used HC (microtasking) to analyze satellite imagery (Big Data) for humanitarian response. Another novel HC approach to managing Big Data is the use of gaming, something called Playsourcing. AI for Disaster Response (AIDR) is an example of AI applied to humanitarian response. In many ways, though, AIDR combines AI with Human Computing, as does MatchApp. Such hybrid solutions should also be promoted as part of the R&D framework on next generation humanitarian technology.
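As a toy illustration of how HC and AI can be combined, here is a short sketch in which a handful of volunteer-labeled messages train a classifier that then scores new messages automatically. This is not AIDR’s actual code; the labels, example messages and model choice are all illustrative.

```python
# Toy illustration of the hybrid HC + AI approach: volunteers label a few
# messages (Human Computing), a classifier generalizes to the rest (AI).
# Not AIDR's actual code; data and model choice are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled = [  # (message, 1 = informative for responders, 0 = not)
    ("Bridge collapsed on the main road, cars stuck", 1),
    ("People need drinking water at the evacuation center", 1),
    ("Thoughts and prayers for everyone affected", 0),
    ("Can't believe the storm footage on TV", 0),
]
texts, labels = zip(*labeled)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# New messages are scored automatically; low-confidence ones can be routed
# back to volunteers for labeling (the human-in-the-loop step).
new = ["Road to the hospital is blocked by debris"]
print(model.predict_proba(new))
```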

There is of course more to humanitarian technology than information management alone. Related is the topic of Data Visualization, for example. There are also exciting innovations and developments in the use of drones or Unmanned Aerial Vehicles (UAVs), meshed mobile communication networks, hyper low-cost satellites, etc. I am particularly interested in each of these areas and will continue to blog about them. In the meantime, I very much welcome feedback on this post’s proposed research framework for humanitarian technology and innovation.


Opening World Bank Data with QCRI’s GeoTagger

My colleagues and I at QCRI partnered with the World Bank several months ago to develop an automated GeoTagger platform to increase the transparency and accountability of international development projects by accelerating the process of opening key development and finance data. We are proud to launch the first version of the GeoTagger platform today. The project builds on the Bank’s Open Data Initiatives promoted by former President Robert Zoellick and continued under the current leadership of Dr. Jim Yong Kim.


The Bank has accumulated an extensive amount of socio-economic data as well as a massive amount of data on Bank-sponsored development projects worldwide. Much of this data, however, is not directly usable by the general public due to numerous data format, quality and access issues. The Bank therefore launched its “Mapping for Results” initiative to visualize the location of Bank-financed projects to better monitor development impact, improve aid effectiveness and coordination while enhancing transparency and social accountability. The geo-tagging of this data, however, has been especially time-consuming and tedious. Numerous interns were required to manually read through tens of thousands of dense World Bank project documents, safeguard documents and results reports to identify and geocode exact project locations. But there are hundreds of thousands of such PDF documents. To make matters worse, these documents make seemingly “random” passing references to project locations, with no sign of any standardized reporting structure whatsoever.


The purpose of QCRI’s GeoTagger Beta is to automatically “read” through these countless PDF documents to identify and map all references to locations. GeoTagger does this using the World Bank Projects Data API and the Stanford Named Entity Recognizer (NER) & Alchemy. These tools help to automatically search through documents and identify place names, which are then geocoded using the Google Geocoder, Yahoo! Placefinder & GeoNames and placed on a dedicated map. QCRI’s GeoTagger will remain freely available and we’ll be making the code open source as well.
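For readers curious what such a pipeline looks like in code, here is a rough sketch that substitutes freely available components (spaCy for entity extraction, OpenStreetMap’s Nominatim for geocoding) for the tools named above; it is emphatically not the actual GeoTagger code, and the sample sentence is invented.

```python
# Rough sketch of the GeoTagger idea: extract place names from project
# documents, then geocode them. Uses spaCy + OpenStreetMap's Nominatim as
# stand-ins for the Stanford NER and commercial geocoders named above.
# Requires: pip install spacy geopy && python -m spacy download en_core_web_sm
import spacy
from geopy.geocoders import Nominatim

nlp = spacy.load("en_core_web_sm")
geocoder = Nominatim(user_agent="geotagger-sketch")

def extract_locations(document_text: str):
    """Return (place, latitude, longitude) tuples found in a document."""
    doc = nlp(document_text)
    places = {ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")}
    results = []
    for place in places:
        hit = geocoder.geocode(place)  # one lookup per unique place name
        if hit is not None:
            results.append((place, hit.latitude, hit.longitude))
    return results

sample = "The road rehabilitation component will cover Mombasa and Kisumu counties."
print(extract_locations(sample))
```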

Naturally, this platform could be customized for many different datasets and organizations, which is why we’ve already been approached by a number of prospective partners to explore other applications. So feel free to get in touch should this also be of interest to your project and/or organization. In the meantime, a very big thank you to my colleagues at QCRI’s Big Data Analytics Center: Dr. Ihab Ilyas, Dr. Shady El-Bassuoni, Mina Farid and last but certainly not least, Ian Ye for their time on this project. Many thanks as well to my colleagues Johannes Kiess, Aleem Walji and team from the World Bank and Stephen Davenport at Development Gateway for the partnership.


Launching: SMS Code of Conduct for Disaster Response

Shortly after the devastating Haiti Earthquake of January 12, 2010, I published this blog post on the urgent need for an SMS code of conduct for disaster response. Several months later, I co-authored this peer-reviewed study on the lessons learned from the unprecedented use of SMS following the Haiti Earthquake. This week, at the Mobile World Congress (MWC 2013) in Barcelona, GSMA’s Disaster Response Program organized two panels on mobile technology for disaster response and used the event to launch an official SMS Code of Conduct for Disaster Response (PDF). GSMA members comprise nearly 800 mobile operators based in more than 220 countries.

[Screenshot: the Ushahidi-Haiti Crisis Map]

Thanks to Kyla Reid, Director for Disaster Response at GSMA, and to Souktel’s Jakob Korenblum, my calls for an SMS code of conduct were not ignored. The three of us spent a considerable amount of time in 2012 drafting and re-drafting a detailed set of principles to guide SMS use in disaster response. During this process, we benefited enormously from many experts on the mobile operator side and in the humanitarian community, many of whom are at MWC 2013 for the launch of the guidelines. It is important to note that there have been a number of parallel efforts from which our combined work has greatly benefited. The Code of Conduct we launched this week does not seek to duplicate these important efforts but rather serves to inform GSMA members about the growing importance of SMS use for disaster response. We hope this will help catalyze a closer relationship between the world’s leading mobile operators and the international humanitarian community.

Since the impetus for this week’s launch began in response to the Haiti Earthquake, I was invited to reflect on the crisis mapping efforts I spearheaded at the time. (My slides for the second panel organized by GSMA are available here. My more personal reflections on the 3rd year anniversary of the earthquake are posted here). For several weeks, digital volunteers updated the Ushahidi-Haiti Crisis Map (pictured above) with new information gathered from hundreds of different sources. One of these information channels was SMS. My colleague Josh Nesbit secured an SMS short code for Haiti thanks to a tweet he posted at 1:38pm on Jan 13th (top left in image below). Several days later, the short code (4636) was integrated with the Ushahidi-Haiti Map.

[Slide: Josh Nesbit’s tweet requesting an SMS short code for Haiti (top left) and the lawyers’ email replies on data privacy (right)]

We received about 10,000 text messages from the disaster-affected population during the Search and Rescue phase. But we only mapped about 10% of these because we prioritized the most urgent and actionable messages. While mapping these messages, however, we had to address a critical issue: data privacy and protection. There’s an important trade-off here: the more open the data, the more widely usable that information is likely to be for professional disaster responders, local communities and the Diaspora. But goodbye privacy.

Time was not a luxury we had; an entire week had already passed since the earthquake. We were at the tail end of the search and rescue phase, which meant that literally every hour counted for potential survivors still trapped under the rubble. So we immediately reached out to two trusted lawyers in Boston, one of them a highly reputable Law Professor at The Fletcher School of Law and Diplomacy who is also a specialist on Haiti. You can read the lawyers’ written email replies along with the day/time they were received on the right-hand side of the slide. Both lawyers opined that consent was implied vis-à-vis the publishing of personal identifying information. We shared this opinion with all team members and partners working with us. We then made a joint decision 24 hours later to move ahead and publish the full content of incoming messages. This decision was supported by an Advisory Board I put together comprised of humanitarian colleagues from the Harvard Humanitarian Initiative who agreed that the risks of making this info public were minimal vis-à-vis the principle of Do No Harm. Ushahidi thus launched a micro-tasking platform to crowdsource the translation efforts and hosted this on 4636.Ushahidi.com [link no longer live], which volunteers from the Diaspora used to translate the text messages.

I was able to secure a small amount of funding in March 2010 to commission a fully independent evaluation of our combined efforts. The project was evaluated a year later by seasoned experts from Tulane University. The results were mixed. While the US Marine Corps publicly claimed to have saved hundreds of lives thanks to the map, it was very hard for the evaluators to corroborate this information during their short field visit to Port-au-Prince more than 12 months after the earthquake. Still, this evaluation remains the only professional, independent and rigorous assessment of Ushahidi and 4636 to date.


The use of mobile technology for disaster response will continue to increase for years to come. Mobile operators and humanitarian organizations must therefore be proactive in managing this increased demand by ensuring that the technology is used wisely. I, for one, never again want to spend 24+ precious hours debating whether or not urgent life-and-death text messages can be mapped because of uncertainties over data privacy and protection; 24 hours during a Search and Rescue phase is almost certain to make the difference between life and death. More importantly, however, I am stunned that a bunch of volunteers with little experience in crisis response and no affiliation whatsoever to any established humanitarian organization were able to secure and use an official SMS short code within days of a major disaster. It is little surprise that we made mistakes. So a big thank you to Kyla and Jakob for their leadership and perseverance in drafting and launching GSMA’s official SMS Code of Conduct to make sure the same mistakes are not made again.

While the document we’ve compiled does not solve every conceivable challenge, we hope it is seen as a first step towards a more informed and responsible use of SMS for disaster response. Rest assured that these guidelines are by no means written in stone. Please, if you have any feedback, kindly share it in the comments section below or privately via email. We are absolutely committed to making this a living document that can be updated.

To connect this effort with the work that my CrisisComputing Team and I are doing at QCRI, our contact at Digicel during the Haiti response had given us the option of sending out a mass SMS broadcast to their 2 million subscribers to get the word out about 4636. (We had thus far used local community radio stations). But given that we were processing incoming SMS’s manually, there was no way we’d be able to handle the increased volume and velocity of incoming text messages following the SMS blast. So my team and I are exploring the use of advanced computing solutions to automatically parse and triage large volumes of text messages posted during disasters. The project, which currently uses Twitter, is described here in more detail.
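To sketch what “automatically parse and triage” might mean in its very simplest form (this is illustrative only, not the system we are building at QCRI), each incoming message can be given a rough urgency score and pushed onto a priority queue so that the most urgent reports surface first:

```python
# Minimal illustration of automated SMS triage: score each message for
# urgency and keep a priority queue so responders see the most urgent first.
# The keywords and scores are invented for illustration.
import heapq
from itertools import count

URGENT_TERMS = {"trapped": 3, "bleeding": 3, "collapsed": 2, "water": 1, "food": 1}

def urgency(sms: str) -> int:
    text = sms.lower()
    return sum(score for term, score in URGENT_TERMS.items() if term in text)

queue, order = [], count()  # `order` breaks ties in first-come order

def ingest(sms: str) -> None:
    # heapq is a min-heap, so negate the score to pop the most urgent first.
    heapq.heappush(queue, (-urgency(sms), next(order), sms))

for msg in [
    "Need food and water in Carrefour",
    "Person trapped under collapsed house near the market",
    "Is the airport open yet?",
]:
    ingest(msg)

while queue:
    _, _, msg = heapq.heappop(queue)
    print(msg)
```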
