Monthly Archives: October 2011

Tracking Population Movements using Mobile Phones and Crisis Mapping: A Post-Earthquake Geospatial Study in Haiti

I’ve been meaning to blog about this project since it was featured on BBC last month: “Mobile Phones Help to Target Disaster Aid, says Study.” I’ve since had the good fortune of meeting Linus Bengtsson and Xin Lu, the two lead authors of this study (PDF), at a recent strategy meeting organized by GSMA. The authors are now launching “Flowminder” in affiliation with the Karolinska Institutet in Stockholm to replicate their excellent work beyond Haiti. If “Flowminder” sounds familiar, you may be thinking of Hans Rosling’s “Gapminder” which also came out of the Karolinska Institutet. Flowminder’s mission: “Providing priceless information for free for the benefit of those who need it the most.”

As the authors note, “population movements following disasters can cause important increases in morbidity and mortality.” That is why the UN sought to develop early warning systems for refugee flows during the 1980s and 1990s. These largely didn’t pan out; forecasting is not a trivial challenge. Nowcasting, however, may be easier. That said, “no rapid and accurate method exists to track population movements after disasters.” So the authors used “position data of SIM cards from the largest mobile phone company in Haiti (Digicel) to estimate the magnitude and trends of population movements following the Haiti 2010 earthquake and cholera outbreak.”

The geographic locations of SIM cards were determined by the location of the mobile phone towers that SIM cards were connecting to when calling. The authors followed the daily positions of 1.9 million SIM cards for 42 days prior to the earthquake and 158 days following the quake. The results of the analysis reveal that an estimated 20% of the population in Port-au-Prince left the city within three weeks of the earthquake. These findings corresponded well with those of a large, retrospective, population-based survey carried out by the UN.
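To make the idea concrete, here is a minimal sketch of how a departure share could be computed from daily tower-derived positions. This is not the authors' method: the function names, the "most frequent pre-event area" definition of home, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def home_area(daily_areas):
    """Guess a SIM's home as its most frequent area before the event (an assumption)."""
    return Counter(daily_areas).most_common(1)[0][0]

def share_departed(pre, post, city):
    """Fraction of SIMs homed in `city` whose last observed post-event area is elsewhere.

    `pre` and `post` map a SIM id to a list of daily areas (hypothetical structure).
    """
    residents = [sim for sim, areas in pre.items() if areas and home_area(areas) == city]
    if not residents:
        return 0.0
    departed = sum(1 for sim in residents if post.get(sim) and post[sim][-1] != city)
    return departed / len(residents)

# Toy data: five SIMs homed in "PAP", one homed in "Jacmel".
pre = {
    "sim1": ["PAP", "PAP", "PAP"],
    "sim2": ["PAP", "PAP", "Jacmel"],
    "sim3": ["PAP", "PAP", "PAP"],
    "sim4": ["PAP", "PAP", "PAP"],
    "sim5": ["PAP", "PAP", "PAP"],
    "sim6": ["Jacmel", "Jacmel", "Jacmel"],
}
post = {
    "sim1": ["PAP", "Jacmel", "Jacmel"],  # the only resident who ends up outside the city
    "sim2": ["PAP", "PAP"],
    "sim3": ["PAP"],
    "sim4": ["PAP", "PAP"],
    "sim5": ["PAP", "PAP"],
    "sim6": ["Jacmel"],
}

print(share_departed(pre, post, "PAP"))
```

A real analysis would also have to deal with SIMs that go silent after the event and with scaling SIM counts up to population counts, neither of which this sketch attempts.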

“To demonstrate feasibility of rapid estimates and to identify areas at potentially increased risk of outbreaks,” the authors “produced reports on SIM card movements from a cholera outbreak area at its immediate onset and within 12 hours of receiving data.” This latter analysis tracked close to 140,000 SIM cards over an 8-day period. In sum, the “results suggest that estimates of population movements during disasters and outbreaks can be delivered rapidly and with potentially high validity in areas with high mobile phone use.”

I’m really keen to see the Flowminder team continue their important work in and beyond Haiti. I’ve invited them to present at the International Conference of Crisis Mappers (ICCM 2011) in Geneva next month and hope they’ll be able to join us. I’m interested to explore the possibilities of combining this type of data and analysis with crowdsourced crisis information and satellite imagery analysis. In addition, mobile phone data can also be used to estimate the hardest hit areas after a disaster. For more on this, please see my previous blog post entitled “Analyzing Call Dynamics to Assess the Impact of Earthquakes” and this post on using mobile phone data to assess the impact of building damage in Haiti.

Crowdsourcing Will Solve All Humanitarian Problems

Here’s one of my favorite false arguments: “There are some people who believe that crowdsourcing will solve all humanitarian challenges….” So said a good colleague of mine vis-a-vis crisis response at a recent strategy meeting. Of course, when I pressed him for names, he didn’t have a reply. I don’t know anyone who subscribes to the above-mentioned point of view. While I understand that he made the statement in jest and primarily to position himself, I’m concerned that some in the humanitarian community actually believe this comment to be true.

First of all, suggesting that some individuals subscribe to an extreme point of view is a cheap debating tactic and a real pet peeve of mine. Simply label your “opponent” as holding a fundamentalist view of the world and everything you say following that statement holds true, easily discrediting your competition in the eyes of the jury. Surely we’ve moved beyond these types of false arguments in the crisis mapping community.

Secondly, crowdsourcing is simply one among several methodologies that can, in some cases, be useful to collect information following a crisis. And as mentioned in this previous blog post entitled, “Demystifying Crowdsourcing: An Introduction to Non-Random Sampling,” the use of crowdsourcing, like any methodology, comes with advantages and disadvantages that depend both on goals and context. Surely, this is now common knowledge.

My point here is neither to defend nor to dismiss the use of crowdsourcing. My hope is that we move away from such false, dichotomous debates to conversations that recognize the complexities of an evolving situation; dialogues that value having more methodologies in the toolbox rather than fewer, along with corresponding manuals that clarify trade-offs and give appropriate guidance on when to use which methods, why and how. Crowdsourcing crisis information has never been an either-or argument, so let’s not turn it into one. Polarizing the conversation with fictitious claims will only get in the way of learning and innovation.

Detecting Emerging Conflicts with Web Mining and Crisis Mapping

My colleague Christopher Ahlberg, CEO of Recorded Future, recently got in touch to share some exciting news. We had discussed our shared interests a while back at Harvard University. It was clear then that his ideas and existing technologies were very closely aligned to those we were pursuing with Ushahidi’s Swift River platform. I’m thrilled that he has been able to accomplish a lot since we last spoke. His exciting update is captured in this excellent co-authored study entitled “Detecting Emergent Conflicts Through Web Mining and Visualization” which is available here as a PDF.

The study combines almost all of my core interests: crisis mapping, conflict early warning, conflict analysis, digital activism, pattern recognition, natural language processing, machine learning, data visualization, etc. The study describes a semi-automatic system which automatically collects information from pre-specified sources and then applies linguistic analysis to extract user-specified events and entities, i.e., structured data for quantitative analysis.

Natural Language Processing (NLP) and event-data extraction applied to crisis monitoring and analysis is of course nothing new. Back in 2004-2005, I worked for a company that was at the cutting edge of this field vis-a-vis conflict early warning. (The company subsequently joined the Integrated Conflict Early Warning System (ICEWS) consortium supported by DARPA.) Just a year later, Larry Brilliant told TED 2006 how the Global Public Health Information Network (GPHIN) had leveraged NLP and machine learning to detect an outbreak of SARS 3 months before the WHO. I blogged about this, Global Incident Map, European Media Monitor (EMM), Havaria, HealthMap and Crimson Hexagon back in 2008. Most recently, my colleague Kalev Leetaru showed how applying NLP to historical data could have predicted the Arab Spring. Each of these initiatives represents an important effort in leveraging NLP and machine learning for early detection of events of interest.

The RecordedFuture system works as follows. A user first selects a set of data sources (websites, RSS feeds, etc.) and determines the rate at which to update the data. Next, the user chooses one or several existing “extractors” to find specific entities and events (or constructs a new type). A taxonomy is also selected to specify exactly how the data is to be grouped. The data is then automatically harvested and passed through a linguistics analyzer which extracts useful information such as event types, names, dates, and places. Finally, the reports are clustered and visualized on a crisis map, in this case using an Ushahidi platform. This allows for all kinds of other datasets to be imported, compared and analyzed, such as high resolution satellite imagery and crowdsourced data.

A key feature of the RecordedFuture system is that it extracts and estimates the time of the event described rather than the publication time of the newspaper article parsed, for example. As such, the harvested data can include both historic and future events.
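The distinction between event time and publication time can be illustrated with a toy resolver for relative time words. This is only a sketch under strong assumptions: the `OFFSETS` table and the `event_date` function are hypothetical and bear no relation to how RecordedFuture's linguistic analysis actually works.

```python
from datetime import date, timedelta

# Hypothetical lookup of relative time words and their offset in days.
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

def event_date(text, published):
    """Estimate when the described event happens, relative to the publication date.

    Falls back to the publication date when no known time word is found.
    """
    lowered = text.lower()
    for word, days in OFFSETS.items():
        if word in lowered:
            return published + timedelta(days=days)
    return published

published = date(2011, 1, 24)
print(event_date("Protesters plan to march tomorrow", published))   # a future event
print(event_date("Police dispersed a rally yesterday", published))  # a past event
```

Even this crude version shows why a single article can yield an event dated after its own publication.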

In sum, the RecordedFuture system is composed of the following five features as described in the study:

1. Harvesting: a process in which text documents are retrieved from various sources and stored in the database. The documents are stored long-term if permitted by terms of use and IPR legislation; otherwise they are only stored temporarily for the needed analysis.

2. Linguistic analysis: the process in which the retrieved texts are analyzed in order to extract entities, events, time and location, etc. In contrast to other components, the linguistic analysis is language dependent.

3. Refinement: additional information can be obtained in this process by synonym detection, ontology analysis, and sentiment analysis.

4. Data analysis: application of statistical and AI-based models such as Hidden Markov Models (HMMs) and Artificial Neural Networks (ANNs) to generate predictions about the future and detect anomalies in the data.

5. User experience: a web interface for ordinary users to interact with, and an API for interfacing to other systems.

The authors ran a pilot that “manually” integrated the RecordedFuture system with the Ushahidi platform. The result is depicted in the figure below. In the future, the authors plan to automate the creation of reports on the Ushahidi platform via the RecordedFuture system. Intriguingly, the authors chose to focus on protest events to demo their Ushahidi-coupled system. Why is this intriguing? Because my dissertation analyzed whether access to new information and communication technologies (ICTs) is a statistically significant predictor of protest events in repressive states. Moreover, the protest data I used in my econometric analysis came from an automated NLP algorithm that parsed Reuters Newswires.

Using RecordedFuture, the authors extracted some 6,000 protest events for Quarter 1 of 2011. These events were identified and harvested using a “trained protest extractor” constructed using the system’s event extractor framework. Note that many of the 6,000 events are duplicates because they are the same events reported by different sources. Not surprisingly, Christopher and team plan to develop a duplicate detection algorithm that will also double as a triangulation & veracity scoring feature. I would be particularly interested to see them do this kind of triangulation and validation of crowdsourced data on the fly.
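One simple way to picture duplicate detection doubling as triangulation: group reports that share an event type, place and date, then count how many distinct sources back each group. The sketch below assumes a hypothetical report structure (`type`, `place`, `date`, `source`) and exact matching on a normalized key; the planned RecordedFuture algorithm is not described in the study and is surely more sophisticated.

```python
from collections import defaultdict

def merge_duplicates(reports):
    """Collapse reports describing the same event and count distinct sources per event."""
    groups = defaultdict(set)
    for report in reports:
        # Normalize the place name so "Cairo" and " cairo " fall into the same group.
        key = (report["type"], report["place"].strip().lower(), report["date"])
        groups[key].add(report["source"])
    return [
        {"type": t, "place": p, "date": d, "sources": len(sources)}
        for (t, p, d), sources in sorted(groups.items())
    ]

reports = [
    {"type": "protest", "place": "Cairo", "date": "2011-01-25", "source": "wire_a"},
    {"type": "protest", "place": " cairo ", "date": "2011-01-25", "source": "wire_b"},
    {"type": "protest", "place": "Tunis", "date": "2011-01-14", "source": "wire_a"},
]

for event in merge_duplicates(reports):
    print(event)
```

The `sources` count is a crude corroboration score: an event reported independently by two outlets is, all else equal, more credible than one reported by a single outlet.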

Below are the protest events picked up by RecordedFuture for both Tunisia and Egypt. From these two figures, it is possible to see how the Tunisian protests preceded those in Egypt.

The authors argue that if the platform had been set up earlier this year, a user would have seen the sudden rise in the number of protests in Egypt. However, the authors acknowledge that their data is a function of media interest and attention—the same issue I had with my dissertation. One way to overcome this challenge might be by complementing the harvested reports with crowdsourced data from social media and Crowdmap.

In the future, the authors plan to have the system auto-detect major changes in trends and to add support for the analysis of media in languages beyond English. They also plan to test the reliability and accuracy of their conflict early warning algorithm by comparing their forecasts of historical data with existing conflict data sets. I have several ideas of my own about next steps and look forward to speaking with Christopher’s team about ways to collaborate.
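As a rough illustration of what auto-detecting a major change in trend could look like, the sketch below flags days whose protest count far exceeds a trailing baseline. It is a plain threshold rule that I am substituting for illustration, not the statistical or AI-based models mentioned in the study; the `window` and `k` parameters and the counts are made up.

```python
from statistics import mean, pstdev

def detect_spikes(counts, window=7, k=3.0):
    """Return indices of days whose count exceeds the trailing mean by k standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Floor sigma at 1.0 so a very flat history does not flag tiny fluctuations.
        threshold = mu + k * max(sigma, 1.0)
        if counts[i] > threshold:
            spikes.append(i)
    return spikes

# Hypothetical daily protest counts with one sudden jump.
daily_counts = [2, 3, 2, 4, 3, 2, 3, 25, 3]
print(detect_spikes(daily_counts))
```

Because the counts come from media reports, a flagged spike may reflect a surge in coverage rather than a surge in protests, which is the same caveat the authors raise.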

Crisis Mapping the Opening Battle of the Sino-French War

I only had a few hours to explore Taipei last week and thus chose to visit the highly recommended National Palace Museum just outside the city. I was well impressed with the Museum’s use of technology, from table-sized “iPads” to 3D virtual reality displays of ancient artifacts. But it was a small and nondescript 127-year-old crisis map that truly stole the show for me.

The crisis map depicts the Battle of Fuzhou (Foochow), also known as the Battle of the Pagoda Anchorage, named for a remarkable Chinese pagoda, the Luoxingta (羅星塔), which still stands on a hill above the harbor today. The battle, which took place in August 1884, was the opening engagement of the Sino-French War, which lasted for a year and a half.

Admiral Amédée Courbet, in command of the French squadron, had noticed that the Chinese ships anchored near the harbor swung with the tide and thus decided to plan his attack just before high tide at 2 p.m. on Saturday, August 23, 1884, when he hoped the Chinese ships would have “swung away from the French ships and would be presenting their vulnerable sterns to the attackers.” Courbet’s strategy worked, “virtually destroying the Fujian Fleet, one of China’s four regional fleets.”

I took a picture of the Chinese crisis map on display in Taipei (see below), which is apparently the first copy to make it onto the Internet. The caption in English on the bottom right reads: “Diagram of engagement between the French and Chinese naval fleets at Mawei, French warships attack during the afternoon low tide. Chinese vessels anchored at the bows, now face the French astern, unable to use the powerful bow cannons, resulting in the total sinking of the Ch’ing Fuzhou (Foochow) Naval Fleet, August 23, 1884.”

I was so intrigued and surprised to find this crisis map that I followed up with some online research. The Wikipedia article on the battle was an absolute treasure trove of information and pictures. Take for example, the French version of the crisis map below.

Both maps appear to be more or less at the same scale but only the French one includes a distance bar (0-500 meters). The French map is also more detailed (history is written by the victors?) but the Chinese one makes more use of color-coding. To get a better sense of what the “battle field” and ships looked like, check out the following pictures.

The above was painted in the 19th century. The painting below depicts the bombing of the Fuzhou Arsenal on the following day, August 24th.

Contrast the above French version with the Chinese lithograph of the battle below and the Japanese depiction that follows.

The picture below shows the Chinese fleet the night before the French attack. The two following pictures depict the result, the sunken Chinese ships.


Curious to know what the area looks like today? The Wikipedia article also provided a number of pictures.

Know of other crisis maps from hundreds of years ago? If so, please feel free to share in the comments section below. Thanks!

It’s Official, I’m a PhD

Theorizing Ushahidi: An Academic Treatise

[This is an excerpt taken from Chapter 1 of my dissertation]

Activists are not only turning to social media to document unfolding events, they are increasingly mapping these events for the world to bear witness. We’ve seen this happen in Tunisia, Egypt, Libya, Syria, Yemen and beyond. My colleague Alexey Sidorenko describes this new phenomenon as a “mapping reflex.” When student activists from Khartoum got in touch earlier this year, they specifically asked for a map, one that would display their pro-democracy protests and the government crackdown. Why? They wanted the world to see that the Arab Spring extended to the Sudan.

The Ushahidi platform is increasingly used to map information generated by crowds in near-real time like the picture depicted above. Why is this important? Because live public maps can help synchronize shared awareness, an important catalyzing factor of social movements, according to Jürgen Habermas. Recall Habermas’s treatise that “those who take on the tools of open expression become a public, and the presence of a synchronized public increasingly constrains undemocratic rulers while expanding the right of that public.”

Sophisticated political maps have been around for hundreds of years. But the maps of yesteryear, like the books of old, were created and controlled by the few. While history used to be written by the victors, today, journalists like my colleague Anand Giridharadas from the New York Times are asking whether the triangulated crisis map will become the new first draft of history. In the field of geography and cartography, some refer to this new wave of democratized map-making as “neo-geography.” But this new type of geography is not only radically different from traditional approaches because it is user-generated and more participatory; the fact that today’s dynamic maps can also be updated and shared in near real-time opens up an entirely new world of possibilities and responses.

Having a real time map is almost as good as having your own helicopter. A live map provides immediate situational awareness, a third dimension and additional perspective on events unfolding in time and space. Moreover, creating a map catalyzes conversations between activists, raises questions about geographic patterns or new incidents, and leads to more questions regarding the status quo in a repressive environment. To be sure, mass media alone does not change people’s minds. Recall that political change is a two-step process, with the second, social step being where political opinions are formed (Katz and Lazarsfeld 1955). “This is the step in which the Internet in general, and social media in particular, can make a difference” (Shirky 2010). In addition, the collaboration that takes place when creating a live map can also reinforce weak and strong ties, both of which are important for civil resistance.

The Ushahidi platform enables a form of live-mapped “sousveillance,” which refers to the recording of an activity using portable personal technologies. In many respects, however, the use of Ushahidi goes beyond sousveillance in that it generates the possibility of “dataveillance” and a possible reversal of Bentham’s panopticon. “With postmodernity, the panopticon has been informationalized; what once was organized around hierarchical observation is now organized through decoding and recoding of information” (Lyon 2006). In Seeing Like a State, James Scott argues eloquently that this process of decoding and recoding was for centuries the sole privilege of the State. In contrast, the Ushahidi platform provides a participatory digital canvas for the public decoding, recoding and synchronization of information. In other words, the platform serves to democratize dataveillance by crowdsourcing what was once the exclusive realm of the “security-informational complex.”

In Domination and the Arts of Resistance: Hidden Transcripts, published in 1990, James Scott distinguishes between public and hidden transcripts. The former describes the open, public interactions that take place between dominators and oppressed while hidden transcripts relate to the critique of power that “goes on offstage” and which the power elites cannot decode. This hidden transcript comprises the second step, social conversations, that Katz and Lazarsfeld (1955) argue ultimately change political behavior. Scott writes that when the oppressed classes publicize this “hidden transcript”, they become conscious of its common status. Borrowing from Habermas, the oppressed thereby become a public and more importantly a synchronized public. In many ways, the Ushahidi platform is a vehicle by which the hidden transcript is collectively published and used to create shared awareness, thereby threatening to alter the balance of power between the oppressors and oppressed.

The new dynamics that are enabled by “liberation technologies” like Ushahidi may enable a different form of democracy, one arising from the fact that “the inability of electoral/representative politics to keep its promises [has thus] led to the development of indirect forms of democracy” (Rosanvallon 2008). More specifically, Rosanvallon identifies three channels whereby civil society can hold the state accountable not just during elections but also between elections and independent of their results. “The first refers to the various means whereby citizens (or, more accurately, organizations of citizens) are able to monitor and publicize the behavior of elected and appointed rulers; the second to their capacity to mobilize resistance to specific policies, either before or after they have been selected; the third to the trend toward ‘juridification’ of politics [cf. dataveillance] when individuals or social groups use the courts and, especially, jury trials to bring delinquent politicians to judgment” (Schmitter 2008, PDF).

These three phases correspond surprisingly well with the three waves of Ushahidi uses witnessed over the past three years. The first wave was reactive and documentary focused. The second was more pro-active and focused on action beyond documentation while the third seeks to capitalize on the first two to complete the rebalancing of power. Perhaps this final wave is the teleological purpose of the Ushahidi platform or What Technology Wants as per Kevin Kelly’s treatise. However, this third wave, the trend toward the “juridification” of democracy bolstered by crowdsourced evidence that is live-mapped on a public Ushahidi platform, is today more a timid ripple than a tsunami of change reversing the all-seeing “panopticon”. A considerable amount of learning-by-doing remains to be done by those who wish to use the Ushahidi platform for impact beyond the first two phases of Rosanvallon’s democracy.