State of the Art in Digital Disease Detection

Larry Brilliant’s TED Talk back in 2006 played an important role in catalyzing my own personal interest in humanitarian technology. Larry spoke about the use of natural language processing and computational linguistics for the early detection of, and early response to, epidemics. So it was with tremendous honor and deep gratitude that I delivered the first keynote presentation at Harvard University’s Digital Disease Detection (DDD) conference earlier this year.

The field of digital disease detection has remained way ahead of the curve since 2006 in terms of leveraging natural language processing, computational linguistics and now crowdsourcing for the purposes of early detection of critical events. I thus highly, highly recommend watching the videos of the DDD Ignite Talks and panel presentations, which are all available here. Topics include “Participatory Surveillance,” “Monitoring Rumors,” “Twitter and Disease Detection,” “Search Query Surveillance,” “Open Source Surveillance,” “Mobile Disease Detection,” etc. The presentation on BioCaster is also well worth watching. I blogged about BioCaster here over three years ago and the platform is as impressive as ever.

These public health experts are really operating at the cutting edge and their insights are proving important to the broader humanitarian technology community. To be sure, the potential added value of cross-fertilization between fields is tremendous. Just take this example of a public health data mining platform (HealthMap) being used by Syrian activists to detect evidence of killings and human rights violations.

Crisis Mapping Syria: Automated Data Mining and Crowdsourced Human Intelligence

The Syria Tracker Crisis Map is without doubt one of the most impressive crisis mapping projects yet. Launched just a few weeks after the protests began one year ago, the crisis map is spearheaded by just a handful of US-based Syrian activists who have meticulously and systematically documented 1,529 reports of human rights violations, including a total of 11,147 killings. As recently reported in this New Scientist article, “Mapping the Human Cost of Syria’s Uprising,” the crisis map “could be the most accurate estimate yet of the death toll in Syria’s uprising [...].” Their approach? “A combination of automated data mining and crowdsourced human intelligence,” which “could provide a powerful means to assess the human cost of wars and disasters.”

On the data-mining side, Syria Tracker has repurposed the HealthMap platform, which mines thousands of online sources for the purposes of disease detection and then maps the results, “giving public-health officials an easy way to monitor local disease conditions.” The customized version of this platform for Syria Tracker (ST), known as HealthMap Crisis, mines English information sources for evidence of human rights violations, such as killings, torture and detainment. As the ST Team notes, their data mining platform “draws from a broad range of sources to reduce reporting biases.” Between June 2011 and January 2012, for example, the platform collected over 43,000 news articles and blog posts from almost 2,000 English-language sources from around the world (including some pro-regime sources).

Syria Tracker combines the results of this sophisticated data mining approach with crowdsourced human intelligence, i.e., field-based eye-witness reports shared via webform, email, Twitter, Facebook, YouTube and voicemail. This naturally presents several important security issues, which explains why the main ST website includes an instructions page detailing security precautions that need to be taken while submitting reports from within Syria. They also link to this practical guide on how to protect your identity and security online and when using mobile phones. The guide is available in both English and Arabic.

Eye-witness reports are subsequently translated, geo-referenced, coded and verified by a group of volunteers who triangulate the information with other sources such as those provided by the HealthMap Crisis platform. They also filter the reports and remove duplicates. Reports that have a low confidence level vis-a-vis veracity are also removed. Volunteers use a dig-up or vote-up/vote-down feature to “score” the veracity of eye-witness reports. Using this approach, the ST Team and their volunteers have been able to verify almost 90% of the documented killings mapped on their platform thanks to video and/or photographic evidence. They have also been able to associate specific names to about 88% of those reported killed by Syrian forces since the uprising began.
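To make this filtering step a bit more concrete, here is a minimal Python sketch of how duplicate removal and vote-based scoring might fit together. To be clear, this is my own illustration and not Syria Tracker's actual code: the `Report` fields, the duplicate rule (same date, location and normalized text), the scoring formula and the `min_score` threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # All field names here are hypothetical, chosen for illustration only.
    report_id: str
    date: str        # ISO date of the reported incident
    location: str
    text: str
    up_votes: int = 0
    down_votes: int = 0


def dedup_key(report):
    """Assumed duplicate rule: same date, location and whitespace/case-normalized text."""
    normalized_text = " ".join(report.text.lower().split())
    return (report.date, report.location.strip().lower(), normalized_text)


def veracity_score(report):
    """Net share of up-votes, from -1.0 (all down) to 1.0 (all up); 0.0 if unvoted."""
    total = report.up_votes + report.down_votes
    if total == 0:
        return 0.0
    return (report.up_votes - report.down_votes) / total


def filter_reports(reports, min_score=0.5):
    """Merge duplicates (pooling their votes), then drop low-scoring reports."""
    merged = {}
    for r in reports:
        key = dedup_key(r)
        if key in merged:
            merged[key].up_votes += r.up_votes
            merged[key].down_votes += r.down_votes
        else:
            # Copy so that the caller's objects are not modified.
            merged[key] = Report(r.report_id, r.date, r.location, r.text,
                                 r.up_votes, r.down_votes)
    return [r for r in merged.values() if veracity_score(r) >= min_score]
```

Pooling the votes of duplicates before scoring is a design choice on my part: it avoids discarding a well-corroborated incident simply because the first copy received few votes.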

Depending on the levels of violence in Syria, the turn-around time for a report to be mapped on Syria Tracker is between 1 and 3 days. The team also produces weekly situation reports based on the data they’ve collected along with detailed graphical analysis. KML files that can be uploaded and viewed using Google Earth are also made available on a regular basis. These provide “a more precisely geo-located tally of deaths per location.”
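For readers who have not worked with KML before, the following is a rough Python sketch of how a per-location tally could be written out as placemarks that Google Earth can open. It uses only the standard library. The function name `tallies_to_kml` and the input layout are my own assumptions, and I do not know how the ST Team actually generates their files.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def tallies_to_kml(tallies):
    """Build a KML string from (name, latitude, longitude, count) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    # Emit the KML namespace as the default namespace (no prefix on tags).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon, count in tallies:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = f"{name}: {count}"
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

Calling `tallies_to_kml([("Location A", 10.0, 20.0, 3)])` with placeholder values returns a single-placemark document that can be saved with a `.kml` extension.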

In sum, Syria Tracker is very much breaking new ground vis-a-vis crisis mapping. They’re combining automated data mining technology with crowdsourced eye-witness reports from Syria. In addition, they’ve been doing this for a year, which makes the project the longest-running crisis map I’ve seen in a hostile environment. Moreover, they’ve been able to sustain these important efforts with just a small team of volunteers. As for the veracity of the collected information, I know of no other public effort that has taken such a meticulous and rigorous approach to documenting the killings in Syria in near real-time. On February 24th, Al-Jazeera posted the following estimates:

Syrian Revolution Coordination Union: 9,073 deaths
Local Coordination Committees: 8,551 deaths
Syrian Observatory for Human Rights: 5,581 deaths

At the time, Syria Tracker had a total of 7,901 documented killings associated with specific names, dates and locations. While some duplicate reports may remain, the team argues that “missing records are a much bigger source of error.” Indeed, they believe that “the higher estimates are more likely, even if one chooses to disregard those reports that came in on some of the most violent days where names were not always recorded.”

The Syria Crisis Map itself has been viewed by visitors from 136 countries around the world and 2,018 cities, with the top 3 cities being Damascus, Washington DC and, interestingly, Riyadh, Saudi Arabia. The witnessing has thus been truly global and collective. When the Syrian regime falls, “the data may help subsequent governments hold him and other senior leaders to account,” writes the New Scientist. This was one of the principal motivations behind the launch of the Ushahidi platform in Kenya over four years ago. Syria Tracker is powered by Ushahidi’s cloud-based platform, Crowdmap. Finally, we know for a fact that the International Criminal Court (ICC) and Amnesty International (AI) closely followed the Libya Crisis Map last year.

Twitter, Crises and Early Detection: Why “Small Data” Still Matters

My colleagues John Brownstein and Rumi Chunara at Harvard University’s HealthMap project are continuing to break new ground in the field of Digital Disease Detection. Using data obtained from tweets and online news, the team was able to identify a cholera outbreak in Haiti weeks before health officials acknowledged the problem publicly. Meanwhile, my colleagues from UN Global Pulse partnered with Crimson Hexagon to forecast food prices in Indonesia by carrying out sentiment analysis of tweets. I had actually written this blog post on Crimson Hexagon four years ago to explore how the platform could be used for early warning purposes, so I’m thrilled to see this potential realized.

There is a lot that intrigues me about the work that HealthMap and Global Pulse are doing. But one point that really struck me vis-a-vis the former is just how little data was necessary to identify the outbreak. To be sure, not many Haitians are on Twitter and my impression is that most humanitarians have not really taken to Twitter either (I’m not sure about the Haitian Diaspora). This would suggest that accurate, early detection is possible even without Big Data; even with “Small Data” that is neither representative nor indeed verified. (Interestingly, Rumi notes that the Haiti dataset is actually larger than datasets typically used for this kind of study.)

In related news, a recent peer-reviewed study by the European Commission found that the spatial distribution of crowdsourced text messages (SMS) following the earthquake in Haiti was strongly correlated with building damage. Again, the dataset of text messages was relatively small. And again, this data was neither collected using random sampling (i.e., it was crowdsourced) nor was it verified for accuracy. Yet the analysis of this small dataset still yielded some particularly interesting findings that have important implications for rapid damage detection in post-emergency contexts.

While I’m no expert in econometrics, what these studies suggest to me is that detecting change over time is ultimately more critical than having a large-N dataset, let alone one that is obtained via random sampling or even vetted for quality control purposes. That doesn’t mean that the latter factors are not important; it simply means that the outcome of the analysis is relatively less sensitive to these specific variables. Changes in the baseline volume/location of tweets on a given topic appear to be strongly correlated with offline dynamics.
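To illustrate what I mean by detecting change relative to a baseline, here is a small Python sketch that flags days on which the volume of messages on a topic jumps well above its recent average. This is not the method used by HealthMap, Global Pulse or the European Commission study; the trailing 7-day window, the threshold of 3 standard deviations and the floor on the standard deviation are all assumptions I picked for the example.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, window=7, threshold=3.0, min_sigma=1.0):
    """Return indices of days whose count is far above the trailing baseline.

    A day is flagged when (count - baseline mean) / baseline std > threshold.
    `window` must be at least 2 so a standard deviation can be computed.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        # Floor the spread so a perfectly flat baseline does not divide by zero
        # or flag trivially small changes.
        sigma = max(stdev(baseline), min_sigma)
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

Note that nothing here requires a large or representative sample. The sketch only compares each day against the same source's own recent history, which is the sense in which the analysis is less sensitive to dataset size.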

What are the implications for crowdsourced crisis maps and disaster response? Could similar statistical analyses be carried out on Crowdmap data, for example? How small can a dataset be and still yield actionable findings like those mentioned in this blog post?

A Brief History of Crisis Mapping (Updated)

Introduction

One of the donors I’m in contact with about the proposed crisis mapping conference wisely recommended I add a big-picture background to crisis mapping. This blog post is my first pass at providing a brief history of the field. In a way, this is a combined summary of several other posts I have written on this blog over the past 12 months plus my latest thoughts on crisis mapping.

Evidently, this account of history is very much influenced by my own experience so I may have unintentionally missed a few relevant crisis mapping projects. Note that by crisis I refer specifically to armed conflict and human rights violations. As usual, I welcome any feedback and comments you may have so I can improve my blog posts.

From GIS to Neogeography: 2003-2005

The field of dynamic crisis mapping is new and rapidly changing. The three core drivers of this change are the increasing availability and accessibility of (1) open-source, dynamic mapping tools and (2) mobile data collection technologies, and lastly (3) the development of new methodologies.

Some experts at the cutting edge of this change call the results “Neogeography,” which is essentially about “people using and creating their own maps, on their own terms and by combining elements of an existing toolset.” The revolution in applications for user-generated content and mobile technology provides the basis for widely distributed information collection and crowdsourcing, a term coined by Wired less than three years ago. The unprecedented rise in citizen journalism is stark evidence of this revolution. New methodologies for conflict trends analysis increasingly take spatial and/or inter-annual dynamics into account and thereby reveal conflict patterns that otherwise remain hidden when using traditional methodologies.

Until recently, traditional mapping tools were expensive and highly technical geographic information systems (GIS), proprietary software that required extensive training to produce static maps.

In terms of information collection, trained experts traditionally collected conflict and human rights data and documented these using hard-copy survey forms, which typically became proprietary once completed. Scholars began coding conflict event-data but data sharing was the exception rather than the rule.

With respect to methodologies, the quantitative study of conflict trends was virtually devoid of techniques that took spatial dynamics into account because conflict data at the time was largely macro-level data constrained by the “country-year straightjacket.”

That is, conflict data was limited to the country-level and rarely updated more than once a year, which explains why methodologies did not seek to analyze sub-national and inter-annual variations for patterns of conflict and human rights abuses. In addition, scholars in the political sciences were more interested in identifying when conflict was likely to occur as opposed to where. For a more in-depth discussion of this issue, please see my paper from 2006, “On Scale and Complexity in Conflict Analysis” (PDF).

Neogeography is Born: 2005

The pivotal year for dynamic crisis mapping was 2005. This is the year that Google rolled out Google Earth. The application marks an important milestone in Neogeography because the free, user-friendly platform drastically reduced the cost of dynamic and interactive mapping, with cost understood in terms of both availability and accessibility. Microsoft has since launched Virtual Earth to compete with Google Earth and other potential contenders.

Interest in dynamic crisis mapping did exist prior to the availability of Google Earth. This is evidenced by the dynamic mapping initiatives I took at Swisspeace in 2003. I proposed that the organization use GIS tools to visualize, animate and analyze the geo-referenced conflict event-data collected by local Swisspeace field monitors in conflict-ridden countries—a project called FAST. In a 2003 proposal, I defined dynamic crisis maps as follows:

FAST Maps are interactive geographic information systems that enable users of leading agencies to depict a multitude of complex interdependent indicators on a user-friendly and accessible two-dimensional map. […] Users have the option of selecting among a host of single and composite events and event types to investigate linkages [between events]. Events and event types can be superimposed and visualized through time using FAST Map’s animation feature. This enables users to go beyond studying a static picture of linkages to a more realistic dynamic visualization.

I just managed to dig up old documents from 2003 and found the interface I had designed for FAST Maps using the template at the time for Swisspeace’s website.

[Images: fast-map1, fast-map2 (FAST Maps interface)]

However, GIS software was (and still is) prohibitively expensive and highly technical. As a result, Swisspeace was not compelled to make the necessary investments in 2004 to develop the first crisis mapping platform for producing dynamic crisis maps using geo-referenced conflict data. In hindsight, this was the right decision since Google Earth was rolled out the following year.

Enter PRIO and GROW-net: 2006-2007

With the arrival of Google Earth, a variety of dynamic crisis maps quickly emerged. In fact, one of the first, if not the first, applications of Google Earth for crisis mapping was carried out in 2006 by Jen Ziemke and me. We independently used Google Earth and newly available data from the Peace Research Institute, Oslo (PRIO) to visualize conflict data over time and space. (Note that both Jen and I were researchers at PRIO between 2006 and 2007.)

Jen used Google Earth to explain the dynamics and spatio-temporal variation in violence during the Angolan war. To do this, she first coded nearly 10,000 battle and massacre events that took place over a 40-year period, as reported in the Portuguese press.

Meanwhile, I produced additional dynamic crisis maps of the conflict in the Democratic Republic of the Congo (DRC) for PRIO and of the Colombian civil war for the Conflict Analysis Resource Center (CARC) in Bogota. At the time, researchers in Oslo and Bogota used proprietary GIS software to produce static maps (PDF) of their newly geo-referenced conflict data. PRIO eventually used Google Earth but only to publicize the novelty of their new geo-referenced historical conflict datasets.

Since then, PRIO has continued to play an important role in analyzing the spatial dynamics of armed conflict by applying new quantitative methodologies. Together with universities in Europe, the Institute formed the Geographic Representations of War-net (GROW-net) in 2006, with the goal of “uncovering the causal mechanisms that generate civil violence within relevant historical and geographical configurations.” In 2007, the Swiss Federal Institute of Technology in Zurich (ETH), a member of GROW-net, produced dynamic crisis maps using Google Earth for a project called WarViews.

Crisis Mapping Evolves: 2007-2008

More recently, Automated Crisis Mapping (ACM) tools, i.e., real-time and automated information collection mechanisms using natural language processing (NLP), have been developed for the automated and dynamic mapping of disaster and health-related events. Examples of such platforms include the Global Disaster Alert and Coordination System (GDACS), CrisisWire, Havaria and HealthMap. Similar platforms have been developed for automated mapping of other news events, such as Global Incident Map, BuzzTracker, Development Seed’s Managing the News, and the Joint Research Centre’s Europe Media Monitor.

Equally recent is the development of Mobile Crisis Mapping (MCM), mobile crowdsourcing platforms designed for the dynamic mapping of conflict and human rights data as exemplified by Ushahidi (with FrontLineSMS) and the Humanitarian Sensor Web (SensorWeb).

Another important development around this time is the practice of participatory GIS, preceded by the recognition that social maps and conflict maps can empower local communities and be used for conflict resolution. Like maps of natural disasters and environmental degradation, these can be developed and discussed at the community level to encourage conversation and joint decision-making. This is a critical component since one of the goals of crisis mapping is to empower individuals to make better decisions.

HHI’s Crisis Mapping Project: 2007-2009

The Harvard Humanitarian Initiative (HHI) is currently playing a pivotal role in crafting the new field of dynamic crisis mapping. Coordinated by Jennifer Leaning and myself, HHI is completing a two-year applied research project on Crisis Mapping and Early Warning. This project comprised a critical and comprehensive evaluation of the field and the documentation of lessons learned, best practices as well as alternative and innovative approaches to crisis mapping and early warning.

HHI also acts as an incubator for new projects and supported the conceptual development of new crisis mapping platforms like Ushahidi and the SensorWeb. In addition, HHI produced the first comparative and dynamic crisis map of Kenya by drawing on reports from the mainstream media, citizen journalists and Ushahidi to analyze spatial and temporal patterns of conflict events and communication flows during a crisis.

HHI Sets a Research Agenda: 2009

HHI has articulated an action-oriented research agenda for the future of crisis mapping based on the findings from the two-year crisis mapping project. This research agenda can be categorized into the following three areas, which were coined by HHI:

  1. Crisis Map Sourcing
  2. Mobile Crisis Mapping
  3. Crisis Mapping Analytics

1) Crisis Map Sourcing (CMS) seeks to further research on the challenge of visualizing disparate sets of data ranging from structural and dynamic data to automated and mobile crisis mapping data. The challenge of CMS is to develop appropriate methods and best practices for mashing data from Automated Crisis Mapping (ACM) tools and Mobile Crisis Mapping platforms (see below) to add value to Crisis Mapping Analytics (also below).

2) The purpose of setting an applied-research agenda for Mobile Crisis Mapping, or MCM, is to recognize that the future of distributed information collection and crowdsourcing will be increasingly driven by mobile technologies and new information ecosystems. This presents the crisis mapping community with a host of pressing challenges ranging from data validation and manipulation to data security.

These hurdles need to be addressed directly by the crisis mapping community so that new and creative solutions can be applied earlier rather than later. If the persistent problem of data quality is not adequately resolved, then policy makers may question the reliability of crisis mapping for conflict prevention, rapid response and the documentation of human rights violations. Worse still, inaccurate data may put lives at risk.

3) Crisis Mapping Analytics (CMA) is the third critical area of research set by HHI. CMA is becoming increasingly important given the unprecedented volume of geo-referenced data that is rapidly becoming available. Existing academic platforms like WarViews and operational MCM platforms like Ushahidi do not include features that allow practitioners, scholars and the public to query the data and to visually analyze and identify the underlying spatial dynamics of the conflict and human rights data. This is largely true of Automated Crisis Mapping (ACM) tools as well.

In other words, new and informative metrics need to be developed to identify patterns in human rights abuses and violent conflict both retrospectively and in real-time. In addition, existing techniques from spatial econometrics need to be rendered more accessible to non-statisticians and built into existing dynamic crisis mapping platforms.

Conclusion

Jen Ziemke and I thus conclude that the most pressing need in the field of crisis mapping is to bridge the gap between scholars and practitioners who self-identify as crisis mappers. This is the most pressing issue because bridging that divide will enable the field of crisis mapping to effectively and efficiently move forward by pursuing the three research agendas set out by the Harvard Humanitarian Initiative (HHI).

We think this is key to moving the crisis-mapping field into more mainstream humanitarian and human rights work—i.e., operational response. But doing so first requires that leading crisis mapping scholars and practitioners proactively bridge the existing gap. This is the core goal of the crisis mapping conference that we propose to organize.

Patrick Philippe Meier