Monthly Archives: April 2009

JRC: Geo-Spatial Analysis for Global Security

The European Commission’s Joint Research Center (JRC) is doing some phenomenal work on Geo-Spatial Information Analysis for Global Security and Stability. I’ve had several meetings with JRC colleagues over the years and have always been very impressed with their projects.

The group is not very well known outside Europe so the purpose of this blog post is to highlight some of the Center’s projects.

  • Enumeration of Refugee Camps: The project developed an operational methodology to estimate refugee populations using very high resolution (VHR) satellite imagery. “The methodology relies on a combination of machine-assisted procedures, photo-interpretation and statistical sampling.”

jrc1

  • Benchmarking Hand Held Equipment for Field Data Collection: This project tested new devices for the collection of geo-referenced information. “The assessment of the instruments considered their technical characteristics, like the availability of necessary instruments or functionalities, technical features, hardware specifics, software compatibility and interfaces.”

jrc3

  • GEOCREW – Study on Geodata and Crisis Early Warning: This project analyzed the use of geo-spatial technology in the decision-making process of institutions dealing with international crises. The project also aimed to show best practice in the use of geo-spatial technologies in the decision-making process.
  • Support to Peacekeeping Operations in the Sudan: Maps are generally unavailable or out of date for most of the conflict areas in which peacekeeping personnel are deployed. This UNDPKO Darfur mapping initiative aimed to create an alliance of partners that addressed this gap and shared the results.

jrc4

  • Temporary Settlement Analysis by Remote Sensing: The project analyzes different types of refugee and IDP settlements to identify single structures inside refugee settlements. “The objective of the project is to establish the first comprehensive catalog of image interpretation keys, based on last-generation satellite data and related to the analysis of transitional settlements.”

JRC colleagues often publish papers on their work and I highly recommend having a look at this book when it comes out in June 2009:

jrc5

Patrick Philippe Meier

Video Introduction to Crisis Mapping

I’ve given many presentations on crisis mapping over the past two years but these were never filmed. So I decided to create this video presentation with narration in order to share my findings more widely and hopefully get a lot of feedback in the process. The presentation is not meant to be exhaustive although the video does run to about 30 minutes.

The topics covered in this presentation include:

  • Crisis Map Sourcing – information collection;
  • Mobile Crisis Mapping – mobile technology;
  • Crisis Mapping Visualization – data visualization;
  • Crisis Mapping Analysis – spatial analysis.

The presentation references several blog posts of mine in addition to several operational projects to illustrate the main concepts behind crisis mapping. The individual blog posts featured in the presentation are listed below:

This research is the product of a 2-year grant provided by Humanity United  (HU) to the Harvard Humanitarian Initiative’s (HHI) Program on Crisis Mapping and Early Warning, where I am a doctoral fellow.

I look forward to any questions/suggestions you may have on the video primer!

Patrick Philippe Meier

Folksomaps: Gold Standard for Community Mapping

There were a number of mapping-related papers, posters and demos at ICTD2009. One paper in particular caught my attention given the topic’s direct relevance to my ongoing consulting work with the UN’s Threat and Risk Mapping Analysis (TRMA) project in the Sudan and the upcoming ecosystem project in Liberia with Ushahidi and Humanity United.

Introduction

Entitled “Folksomaps – Towards Community Intelligent Maps for Developing Regions,” the paper outlines a community-driven approach for creating maps by drawing on “Web 2.0 principles” and “Semantic Web technologies” but without having to rely entirely on a web-based interface. Indeed, Folksomaps “makes use of web and voice applications to provide access to its services.”

I particularly value the authors’ aim to “provide map-based services that represent user’s intuitive way of finding locations and directions in developing regions.” This is an approach that definitely resonates with me. Indeed, it is our responsibility to adapt and customize our community-based mapping tools to meet the needs, habits and symbology of the end user; not the other way around.

I highly recommend this paper (or summary below) to anyone doing work in the crisis mapping field. In fact, I consider it required reading. The paper is co-authored by Arun Kumar, Dipanjan Chakraborty, Himanshu Chauhan, Sheetal Agarwal and Nitendra Rajput of IBM India Research Lab in New Delhi.

Background

Vast rural areas of developing countries do not have detailed maps or mapping tools. Rural populations are generally semi-literate, low-income and non-tech savvy. They are hardly likely to have access to neogeography platforms like Google Earth. Moreover, the lack of electricity and Internet access further complicates the situation.

We also know that cities, towns and villages in developing countries “typically do not have well structured naming of streets, roads and houses,” which means “key landmarks become very important in specifying locations and directions.”

Drawing on these insights, the authors seek to tap the collective efforts of local communities to populate, maintain and access content for their own benefit—an approach I have described as crowdfeeding.

Surveys of Tech and Non-Tech Users

The study is centered on end-user needs, which is rather refreshing. The authors carried out a series of surveys to better understand the profiles of end-users, e.g., tech and non-tech users.

The first survey sought to identify answers to the following questions:

  • How do people find out points of interest?
  • How much do people rely on maps versus people on the streets?
  • How do people provide local information to other people?
  • Whether people are interested in consuming and feeding information for a community-driven map system?

The results are listed in the table below:

folksotb1

Non-tech savvy users did not use maps to find information about locations and only 36% of these users required precise information. In addition, 75% of non-tech respondents preferred the choice of a phone-based interface, which really drives home the need for what I have coined “Mobile Crisis Mapping” or MCM.

Tech-users also rely primarily on others (as opposed to maps) for location related information. The authors associate this result with the lack of signboards in countries like India. “Many a times, the maps do not contain fine-grained information in the first place.”

Most tech-users responded that they would like a phone-based location and direction finding system in addition to a web-based interface. Almost 80% expressed interest in “contributing to the service by uploading content either over the phone or through a web-based portal.”

The second survey sought to identify how tech and non-tech users express directions and local information. For example:

  • How do you give directions to people on the road or to friends?
  • How do you describe proximity of a landmark to another one?
  • How do you describe distance? Kilometers or using time-to-travel?

The results are listed in the table below:

folksotb2

The majority of non-tech savvy participants said they make use of landmarks when giving directions. “They use names of big roads […] and use ‘near to’, ‘adjacent to’, ‘opposite to’ relations with respect to visible and popular landmarks […].” Almost 40% of responders said they use time only to describe the distance between any two locations.

Tech-savvy participants almost always use both time and kilometers as a measure to represent distance. Only 10% or so of participants used kilometers only to represent distance.

The Technology

The following characteristics highlight the design choices that differentiate Folksomaps from established notions of map systems:

  • Relies on user generated content rather than data populated by professionals;
  • Strives for spatial integrity in the logical sense and does not consider spatial integrity in the physical sense as essential (which is a defining feature of social maps);
  • Does not consider visual representation as essential, which is important considering the fact that a large segment of users in developing countries do not have access to Internet (hence my own emphasis on mobile crisis mapping);
  • Is non-static and intelligent in the sense that it infers new information from what is entered by the users.
  • User input is not verified by the system and it is possible that pieces of incorrect information in the knowledgebase may be present at different points of time. Folksomaps adopts the Wiki model and allows all users to add, edit and remove content freely while keeping maps up-to-date.

Conceptual Design

Folksomaps uses “landmark” as the basic unit in the mapping knowledgebase model while “location” represents more coarse-grained geographical areas such as a village, city or country. The model then seeks to capture a few key logical characteristics of locations: direction, distance, proximity, reachability and layer.

The latter constitutes the granularity of the geographic area that a location represents. “The notion of direction and distance from a location is interpreted with respect to the layer that the location represents. In other words, direction and distance could be viewed as binary operator over locations of the same level. For instance, ‘is towards left of ’ would be appropriate if the location pair being considered is <Libya, Egypt>,” but not if the pair is <Nairobi, India>.

The knowledgebase makes use of two modules, the Web Ontology Language (OWL) and a graph database, to represent and store the above concepts. The Semantic Web language OWL is used to model the categorical characteristics of a landmark (e.g., direction, proximity, etc.), and thence to infer new relationships not explicitly specified by users of the system. In other words, OWL provides an ontology of locations.

The graph database is used to represent distance (numerical relationships) between landmarks. “The locations are represented by nodes and the edges between two nodes of the graph are labeled with the distance between the corresponding locations.” Given the insights gained from user surveys, precise distances and directions are not integral components of community-based maps.

The two modules are used to generate answers to queries submitted by users.
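To make the graph side of this concrete, here is a minimal sketch of how a “trace path” query could be answered from a landmark graph whose edges are labeled with distances. It is only an illustration under my own assumptions: the landmark names, the distances and the `trace_path` helper are hypothetical, and the paper does not say which shortest-path method Folksomaps actually uses (plain Dijkstra is used here).

```python
import heapq

def add_edge(graph, a, b, km):
    """Store an undirected edge labeled with the distance between two landmarks."""
    graph.setdefault(a, {})[b] = km
    graph.setdefault(b, {})[a] = km

def trace_path(graph, source, target):
    """Return (total_km, [landmarks]) for the shortest route, or None if unreachable."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, km in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (dist + km, neighbor, path + [neighbor]))
    return None

# Hypothetical landmarks and distances, invented purely for illustration.
landmarks = {}
add_edge(landmarks, "bus stand", "temple", 0.5)
add_edge(landmarks, "temple", "market", 0.7)
add_edge(landmarks, "bus stand", "market", 2.0)
add_edge(landmarks, "market", "school", 0.3)

distance_km, route = trace_path(landmarks, "bus stand", "school")
print(route, round(distance_km, 2))
```

The answer comes back as a sequence of landmarks rather than street names, which matches the way survey respondents said they give directions.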

User Interaction

The authors rightly recognize that the user interface design is critical to the success of community-based mapping projects. To be sure, users may be illiterate or semi-illiterate and not very tech-savvy. Furthermore, users will tend to query the map system when they need it most, e.g., “when they are stuck on the road looking for directions […] and would be pressed for time.” This very much holds true for crisis mapping as well.

Users can perform three main tasks with the system: “find place”, “trace path” and “add info.” In addition, some or all users may be granted the right to edit or remove entries from the knowledgebase. The Folksomaps system can also be bootstrapped from existing databases to populate instances of location types. “Two such sources of data in the absence of a full-fledged Geographical Information System (GIS) come from the Telecom Industry and the Postal Department.”

folksofig3

How the users interface with the system to carry out these tasks will depend on how tech-savvy or literate they are and what type of access they have to information and communication technologies.

Folksomaps thus provides three types of interface: web-based, voice-based and SMS-based. Each interface allows the user to query and update the database. The web-based interface was developed using Java Server Pages (JSP) while the voice-based interface uses JSPs and VoiceXML.

folksofig41

I am particularly interested in the voice-based interface. The authors point to previous studies that suggest a voice-based interaction works well with users who are illiterate or semi-illiterate and who cannot afford to have high-end devices but can use ordinary low-end phones.

folksofig1

I will share this with the Ushahidi development team in the hope that they will consider adding a voice-based interface for the platform later this year. To be sure, it could be very interesting to integrate Freedom Fone’s work in this area.

Insights from User Studies

The authors conducted user studies to verify the benefit and acceptability of Folksomaps. Tech-savvy participants used the web-based interface while non-tech savvy participants used the voice-based interface. The results are shown in the two tables below.

folksotb3

Several important insights surfaced from the results of the user studies. For example, an important insight gained from the non-tech user feedback was “the sense of security that they would get with such a system. […] Even though asking for travel directions from strangers on the street is an option, it exposes the enquirer to criminal elements […].”

Another insight gained was that many non-tech savvy participants were “willing to pay for the call even a small premium over normal charges as they saw value to having this information available to them at all times.” That said, the majority of participants “preferred the advertisement model where an advertisement played in the beginning of the call pays for the entire call.”

Interestingly, almost all participants preferred the voice-based interface over SMS even though the former led to a number of speech recognition errors. The reason being that “many people are either not comfortable using SMS or not comfortable using a mobile phone itself.”

There were also interesting insights on the issue of accuracy from the perspective of non-tech savvy participants. Most participants asked for full accuracy and only a handful were tolerant of minor mistakes. “In fact, one of the main reasons for preferring a voice call over asking people for directions was to avoid wrong directions.”

This need for high accuracy is driven by the fact that most people use public transportation, walk or use a bicycle to reach their destination, which means the cost of incorrect information is large compared to someone who owns a car.

This is an important insight since the authors had first assumed that tolerance for incorrect information was higher. They also learned that meta information is as important to non-tech savvy users as the landmarks themselves. For instance, low-income participants were more interested in knowing the modes of available transportation, timetables and bus route numbers than the road route from a source to a destination.

folkstb4

In terms of insights from tech-savvy participants, they did not ask for fine-grained directions all the time. “They were fine with getting high level directions involving major landmarks.” In addition, the need for accuracy was not as strong as for the non-tech savvy respondents and they preferred to have the content from the queries sent to them via SMS so they could store it for future access, “pointing out that it is easy to forget the directions if you just hear it.”

Some tech-savvy participants also suggested that the directions provided by Folksomaps should “take into consideration the amount of knowledge the subject already has about the area, i.e., it should be personalized based upon user profile.” Other participants mentioned that “frequent changes in road plans due to constructions should be captured by such a system—thus making it more usable than just getting directions.”

Conclusion

In sum, the user interface of Folksomaps needs to be “rich and adaptive to the information needs of the user […].” To be sure, given user preference towards “voice-based interface over SMS, designing an efficient user-friendly voice-based user interface […].” In addition, “dynamic and real-time information augmented with traditional services like finding directions and locations would certainly add value to Folksomaps.” Furthermore, the authors recognize that Folksomaps can “certainly benefit from user interface designs,” and “multi-model front ends.”

Finally, the user surveys suggest “the community is very receptive towards the concept of a community-driven map,” so it is important that the TRMA project in the Sudan and the ecosystem Liberia project build on the insights and lessons learned provided in this study.

Patrick Philippe Meier

Improving Quality of Data Collected by Mobile Phones

The ICTD2009 conference in Doha, Qatar, had some excellent tech demos. I had the opportunity to interview Kuang Chen, a PhD student with UC Berkeley’s computer science department, about his work on improving data quality using dynamic forms and machine learning.

I’m particularly interested in this area of research since ensuring data quality continues to be a real challenge in the fields of conflict early warning and crisis mapping. So I always look for alternative and creative approaches that address this challenge. I include below the abstract for Kuang’s project (which includes 5 other team members) and a short 2-minute interview.

Abstract

“Organizations in developing regions want to efficiently collect digital data, but standard data gathering practices from the developed world are often inappropriate. Traditional techniques for form design and data quality are expensive and labour-intensive. We propose a new data-driven approach to form design, execution (filling) and quality assurance. We demonstrate USHER, an end-to-end system that automatically generates data entry forms that enforce and maintain data quality constraints during execution. The system features a probabilistic engine that drives form-user interactions to encourage correct answers.”

In my previous post on data quality evaluation, I pointed to a study that suggests mobile-based data entry has significantly higher error rates. The study shows that a voice call to a human operator results in superior data quality—no doubt due to the human operator double-checking the respondent’s input verbally.  USHER’s ability to dynamically adjust the user interface (form layout and data entry widgets) is one approach to provide some context-specific data-driven user feedback that is currently lacking in mobile forms, as an automated proxy of a human data entry person on the other end of the line.

Interview

This is my first video so many thanks to Erik Hersman for his tips on video editing! And many thanks to Kuang for the interview.

Patrick Philippe Meier

Evaluating Accuracy of Data Collection on Mobile Phones

The importance of data validation is unquestioned but few empirical studies seek to assess the possible errors incurred during mobile data collection. Authors Somani Patnaik, Emma Brunskill and William Thies thus carried out what is possibly the first quantitative evaluation (PDF) of data entry accuracy on mobile phones in resource-constrained environments. They just presented their findings at ICTD 2009.

Mobile devices have become an increasingly important tool for information collection. Hence, for example, my interest in pushing forward the idea of Mobile Crisis Mapping (MCM). While studies on data accuracy exist for personal digital assistants (PDAs), there are very few that focus on mobile phones. This new study thus evaluates three user interfaces for information collection: 1) Electronic forms; 2) SMS and 3) voice.

The results of the study indicate the following associated error rates:

  • Electronic forms = 4.2%
  • SMS = 4.5%
  • Voice = 0.45%

For comparative purposes and context, note that error rates using PDAs have generally been less than 2%. These figures represent the fraction of questions that were answered incorrectly. However, since “each patient interaction consisted of eleven questions, the probability of error somewhere in a patient report is much higher. For both electronic forms and SMS, 10 out of 26 reports (38%) contained an error; for voice, only 1 out of 20 reports (5%) contained an error (which was due to operator transcription).”
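A quick back-of-the-envelope check shows why the per-report figures are so much higher than the per-question ones. The sketch below assumes that errors on the eleven questions are independent and equally likely, which is my simplification and not something the study claims; the function name is hypothetical.

```python
def report_error_probability(per_question_error, questions=11):
    """Chance that at least one of `questions` answers is wrong, assuming independence."""
    return 1.0 - (1.0 - per_question_error) ** questions

# Per-question error rates reported in the study.
rates = {"electronic forms": 0.042, "SMS": 0.045, "voice": 0.0045}
per_report = {name: report_error_probability(rate) for name, rate in rates.items()}

for name, probability in per_report.items():
    print(f"{name}: {probability:.1%} of reports expected to contain an error")
```

Under that assumption the expected share of reports with at least one error comes out at a little under 40% for electronic forms and SMS and at about 5% for voice, which is in line with the observed 38% and 5%.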

I do hope that the results of this study prompt many others to carry out similar investigations.  I think we need a lot more studies like this one but with a larger survey sample (N) and across multiple sectors (this study drew on just 13 healthworkers).

The UN Threat and Risk Mapping Analysis (TRMA) project I’m working on in the Sudan right now will be doing a study on data collection accuracy using mobile phones when they roll out their program later this month. The idea is to introduce mobile phones in a number of localities and not in neighboring ones. The team will then compare the data quality of both samples.

I look forward to sharing the results.

Patrick Philippe Meier

ICT for Development Highlights

Credit: http://farm2.static.flickr.com/1403/623843568_7fa3c0cbe9.jpg?v=0

For a moment there, during the 8-hour drive from Kassala back to Khartoum, I thought Doha was going to be a miss. My passport was still being processed by the Sudanese Ministry of Foreign Affairs and my flight to Doha was leaving in a matter of hours. I began resigning myself to the likelihood that I would miss ICT4D 2009. But thanks to the incredible team at IOM, not only did I get my passport back, but I got a one-year, multiple re-entry visa as well.

I had almost convinced myself that missing ICT4D would be ok. How wrong I would have been. When the quality of poster presentations and demos at a conference rivals the panels and presentations, you know that you’re in for a treat. As the title of this post suggests, I’m just going to point out a few highlights here and there.

Panels

  • Onno Purbo gave a great presentation on wokbolic, a cost-saving wi-fi receiver antenna made in Indonesia using a wok. The wokbolic has a 4km range and costs $5-$10/month. Great hack.

wok

  • Kentaro Toyama with Microsoft Research India (MSR India) made the point that all development is paternalistic and that we should stop fretting about this since development will by definition be paternalistic. I’m not convinced. Partnership is possible without paternalism.
  • Ken Banks noted the work of QuestionBox, which I found very interesting. I’d be interested to know how they remain sustainable, a point made by another colleague of mine at DigiActive.
  • Other interesting comments by various panelists included (and I paraphrase): “Contact books and status are more important than having an email address”; “Many people still think of mobile phones as devices one holds to the ear… How do we show that phones can also be used to view and edit content?”

Demos & Posters

I wish I could write more about the demos and posters below but these short notes and few pictures will have to do for now.

dudes

  • Analyzing Statistical Relationships between Global Indicators through Visualization:

geostats

  • Numeric Paper Forms for NGOs:

paperforms

  • Uses of Mobile Phones in Post-Conflict Liberia:

liberiaphones

  • Improving Data Quality with Dynamic Forms:

datavalidate

  • Open Source Data Collection Tools:

opensourcecollection

Patrick Philippe Meier

Crisis Mapping and Agent Based Models

The idea of combining crisis mapping and agent based modeling has been of great interest to me ever since I took my first seminar on complex systems back in 2006. There are few studies out there that ground agent based models (ABM) on conflict dynamics within a real-world geographical space. One of those few, entitled “Global Pattern Formation and Ethnic/Cultural Violence,” appeared in the journal Science in 2007.

Note that I take issue with a number of assumptions that underlie this study as well as the methodology used. That said, the study is a good illustration of how crisis mapping and ABM can be combined.

Introduction

The authors suggest that global patterns of violence arise due to “the structure of boundaries between groups rather than the groups themselves.” In other words, the spatial boundaries between different populations create a propensity for conflict, “so that spatial heterogeneity itself is predictive of local violence.”

The authors argue that this pattern is consistent with the natural dynamics of “type separation,” a specific pattern formation also observed in physical and chemical phase separation. The unit of analysis in this study’s ABM, however, is the local ethnic “patch size,” which represents the smallest unit of ethnic members that act collectively as one.

The Model

A simple model of type separation assumes that individuals (or ethnic units) prefer to move to areas where more individuals of the same type reside. Running the ABM yields progressively larger patches or “islands” of each ethnic group over time. The relationship between patch size and time follows a power law, “a universal behavior that does not depend on many of the details of the model […].”

In other words, the model depicts scale invariant behavior, which implies that “a number of individual agents of the model can be aggregated into a single agent if time is rescaled correspondingly without changing the behavior at the larger scales.”
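For readers who want to see type separation in action, here is a toy sketch. It is not the authors’ model: the grid size, the two types “A” and “B”, the wrap-around edges and the swap rule are all assumptions of mine, chosen only to show how a preference for like neighbors produces growing patches.

```python
import random

def like_neighbors(grid, row, col, kind):
    """Count the four wrap-around neighbors of (row, col) that hold `kind`."""
    n = len(grid)
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return sum(grid[(row + dr) % n][(col + dc) % n] == kind for dr, dc in offsets)

def same_type_pairs(grid):
    """Count adjacent cell pairs that share a type (a crude patchiness score)."""
    n = len(grid)
    return sum(
        (grid[r][c] == grid[r][(c + 1) % n]) + (grid[r][c] == grid[(r + 1) % n][c])
        for r in range(n) for c in range(n)
    )

def step(grid, rng):
    """Swap two random unlike cells unless that leaves them with fewer like neighbors."""
    n = len(grid)
    r1, c1, r2, c2 = (rng.randrange(n) for _ in range(4))
    a, b = grid[r1][c1], grid[r2][c2]
    if a == b:
        return
    before = like_neighbors(grid, r1, c1, a) + like_neighbors(grid, r2, c2, b)
    grid[r1][c1], grid[r2][c2] = b, a
    after = like_neighbors(grid, r1, c1, b) + like_neighbors(grid, r2, c2, a)
    if after < before:
        grid[r1][c1], grid[r2][c2] = a, b  # undo: the move made things worse

rng = random.Random(0)  # fixed seed so the run is repeatable
size = 20
grid = [[rng.choice("AB") for _ in range(size)] for _ in range(size)]
initial_pairs = same_type_pairs(grid)
initial_a = sum(row.count("A") for row in grid)

for _ in range(20000):
    step(grid, rng)

final_pairs = same_type_pairs(grid)
print(initial_pairs, "->", final_pairs)
for row in grid:
    print("".join(row))
```

Printing the grid before and after makes the “islands” visible. The sketch makes no attempt to reproduce the power-law growth reported in the paper.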

To model violent conflict, the authors assume that both highly mixed regions and well-segregated groups do not engage in violence. The rationale regarding the former is that in highly mixed regions, “groups of the same type are not large enough to develop strong collective identities, or to identify public spaces as associated with one or another group.” When groups are much bigger, “they typically form self-sufficient entities that enjoy local sovereignty.”

To this end, the authors argue that partial separation with poorly defined boundaries fosters conflict when groups are of a size that allows them to impose cultural norms on public spaces, “but where there are still intermittent violations of these rules due to the overlap of cultural domains.” In other words, conflict is a function of population distribution and not of the “specific mechanism by which the population achieves this structure, which may include internally or externally directed migrations.”

The model is therefore founded on the principle that the conditions under which violent conflict becomes likely can be determined by census.

The Analysis

The authors used 1991 census data of the former Yugoslavia and the Indian census data from 2001 and converted the data into map form (see figure below), which they used in an ABM simulation. “Mathematically, the expected violence was determined by detecting patches consisting of islands or peninsulas of one type surrounded by populations of other types.”

mexicanhat

A wavelet filter that has a positive center and a negative surround (also called a Mexican hat filter) was used to detect and correlate the islands/peninsulas.

scienceabm1

The red overlays depicted in Figure D above represent the maximum correlation over population types. The diameter of the positive region of the wavelet, i.e., “the size of the local population patches that are likely to experience violence,” is the main predictor of the model.
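To illustrate how such a filter picks out islands, here is a minimal sketch using the standard two-dimensional Mexican hat shape, (1 - q) * exp(-q) with q = r^2 / (2 * sigma^2). The exact kernel, its normalization and the parameter values used in the paper are not given in this post, so everything below (function names, `sigma`, the toy map) is an assumption for illustration only.

```python
import math

def mexican_hat_kernel(radius, sigma):
    """Build a square kernel with a positive center and a negative surround."""
    size = 2 * radius + 1
    kernel = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            q = ((i - radius) ** 2 + (j - radius) ** 2) / (2.0 * sigma ** 2)
            kernel[i][j] = (1.0 - q) * math.exp(-q)
    # Subtract the mean so that a uniform population gives zero response.
    mean = sum(map(sum, kernel)) / size ** 2
    return [[value - mean for value in row] for row in kernel]

def response(indicator, kernel, row, col):
    """Correlate the kernel with a 0/1 population map, centered on (row, col)."""
    radius = len(kernel) // 2
    total = 0.0
    for i, kernel_row in enumerate(kernel):
        for j, weight in enumerate(kernel_row):
            total += weight * indicator[row + i - radius][col + j - radius]
    return total

# Toy map: 1 marks one population type, 0 the other. A small island of
# type 1 (radius 3 cells) sits in the middle of a sea of type 0.
n = 41
island = [[1 if (r - 20) ** 2 + (c - 20) ** 2 <= 9 else 0 for c in range(n)]
          for r in range(n)]
uniform = [[1] * n for _ in range(n)]

kernel = mexican_hat_kernel(radius=8, sigma=3.0)
island_score = response(island, kernel, 20, 20)    # centered on the island
mixed_free_score = response(island, kernel, 8, 8)  # window far from the island
uniform_score = response(uniform, kernel, 20, 20)  # no boundary anywhere
print(island_score, mixed_free_score, uniform_score)
```

The response is large where a patch of one type roughly fills the positive center and is surrounded by another type, and vanishes where the population is uniform. Changing `sigma` changes the patch diameter the filter is sensitive to, which corresponds to the predictor described above.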

scienceabm2

To test the predictive power of their model, the authors compared the locations of red overlays with actual incidents of violence as reported in books, newspapers and online sources (the yellow dots in the crisis map below).

yugoabm

Their statistical results indicate that the Yugoslavia crisis map model has a correlation of 0.89 with reports. Moreover, “the predicted results are highly robust to parameter variation [patch size], with essentially equivalent agreement obtained for filter diameters ranging from 18 to 60 km […].”

The statistical results for the India crisis map model indicate a correlation of 0.98. The range of the patch size overlapped that of the former Yugoslavia but is shifted to larger values, up to 100km. This suggests that “regions of width less than 10km or greater than 100km may provide sufficient mixing or isolation to reduce the chance of violence.”

Conclusion

While the authors recognize the importance of social and institutional drivers of violence, they argue that, “influencing the spatial structure might address the conditions that promote violence described [in this study].” In sum, they suggest that, “peaceful coexistence need not require complete integration.”

What do you think?

Patrick Philippe Meier

Nation-State Routing: Globalizing Censorship

I just found an interesting piece on Internet censorship at arXiv, my favorite go-to place for scientific papers that are pre-publication. Entitled “Nation-State Routing: Censorship, Wiretapping and BGP,” this empirical study is possibly the first to determine the aggregate effect of national policies on the flow of international traffic.

As government control over the treatment of Internet traffic becomes more common, many people will want to understand how international reachability depends on individual countries and to adopt strategies either for enhancing or weakening the dependence on some countries.

Introduction

States typically impose censorship to prevent domestic users from reaching questionable content. Some censorship techniques, however, “may affect all traffic traversing an [Autonomous System].” For example, Internet Service Providers (ISPs) in China, Britain and Pakistan block Internet traffic at the Internet Protocol (IP) level by “filtering based on IP addresses and URLs in the data packets, or performing internal prefix hijacks, which could affect the international traffic they transit.”

The scope and magnitude of this effect is unclear. What we do know is that one may intentionally or by accident apply censorship policies to international traffic, as demonstrated by the global YouTube outage last year as a result of a domestic Pakistani policy directive.

Methodology

The authors therefore developed a framework to study interdomain routing at the nation-state level. They first adapted the “Betweenness Centrality” metric from statistical physics to measure the importance, or centrality, of each country to Internet reachability. Second, they designed, implemented and validated a Country Path Algorithm (CPA) to infer country-paths from a pair of source and destination IP addresses.
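As a rough illustration of the idea, the sketch below scores each country by the share of country-paths between other countries that transit through it. This is a simplified betweenness-style measure of my own, not the paper’s exact definition of Country Centrality, and the toy country-paths are invented; in the study they would come from the Country Path Algorithm.

```python
from collections import Counter

def country_centrality(paths):
    """For each country, the fraction of paths between other countries that cross it."""
    countries = {country for path in paths for country in path}
    through = Counter()
    eligible = Counter()
    for path in paths:
        endpoints = {path[0], path[-1]}
        transit = set(path[1:-1])
        for country in countries:
            if country in endpoints:
                continue  # only paths between *other* countries count
            eligible[country] += 1
            if country in transit:
                through[country] += 1
    return {c: (through[c] / eligible[c] if eligible[c] else 0.0) for c in countries}

# Invented country-paths (source country first, destination country last).
toy_paths = [
    ["PK", "US", "DE"],
    ["IN", "US", "GB"],
    ["CN", "JP", "US", "BR"],
    ["DE", "FR"],
    ["IN", "CN", "JP"],
]

centrality = country_centrality(toy_paths)
for country, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(country, round(score, 2))
```

In this toy data the US sits on three of the five paths that neither start nor end there, so it scores highest. A score near one would mean that nearly all traffic between other countries crosses that country.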

Findings

The table below shows Country Centrality (CC) computed directly from traceroute (TR) and Border Gateway Protocol (BGP) data. The closer the number is to one, the more impact that country’s domestic Internet censorship policies have on international Internet traffic.

arxivtable1

The second table below lists both Country Centrality (CC) and Strong Country Centrality (SCC). The latter measures how central countries are when alternative routes are considered. When SCC equals one, this suggests a country is completely unavoidable.

arxiv-table2
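The post does not reproduce the formal definition of SCC, so the following is only a stand-in to convey the intuition of “unavoidability”: build a graph of which countries are directly linked, remove one country, and measure what fraction of the remaining country pairs can no longer reach each other. The graph, the node names and the `unavoidability` function are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_graph(paths):
    """Link every pair of countries that appear next to each other on some path."""
    graph = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def reachable(graph, start, removed):
    """Breadth-first search from `start` that never enters the `removed` country."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def unavoidability(graph, country):
    """Fraction of other-country pairs left disconnected when `country` is removed."""
    others = [c for c in graph if c != country]
    pairs = list(combinations(others, 2))
    if not pairs:
        return 0.0
    cut = sum(1 for a, b in pairs if b not in reachable(graph, a, country))
    return cut / len(pairs)

# Invented topology: HUB carries almost everything, A and B also share a direct link.
toy_graph = build_graph([["A", "HUB", "B"], ["C", "HUB", "D"], ["A", "B"]])
hub_score = unavoidability(toy_graph, "HUB")
a_score = unavoidability(toy_graph, "A")
print(round(hub_score, 2), round(a_score, 2))
```

A score of one would mean that no pair of other countries can reach each other without the country in question, which is the sense in which a country is “completely unavoidable.”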

“Collectively, these results show that the ‘West’ continues to exercise disproportionate influence over international routing, despite the penetration of the Internet to almost every region of the world, and the rapid development of China and India.”

This last table below lists CC and SCC measures for authoritarian countries that are known for significant domestic censorship of Internet content. Aside from China, “these countries have very little influence over global reachability.”

arxiv-table3

Next Steps

The authors of the study point to a number of interesting questions for future research. For example, it would be interesting to know how the centrality results above change over time, i.e., which countries are becoming more central over time, and why?

Another important question is what economically driven strategies single countries (or small coalitions of countries) could adopt to increase their own centrality or to reduce that of other countries?

One final and particularly important question would be to find out what fraction of domestic paths are actually routed through another country. This is important because the answer to this question would “provide insight into the influence that foreign nations have over a country’s domestic routing and security, and would shed light on […] whether warrantless tapping on links in one country to another might inadvertently capture some purely domestic traffic.”

Patrick Philippe Meier

Developing Swift River to Validate Crowdsourcing

Swift River is an Ushahidi initiative to crowdsource the process of data validation. We’re developing a Swift River pilot to complement the VoteReport India crowdsourcing platform we officially launched this week. As part of the Swift River team, I’d like to share with iRevolution readers what I hope the Swift River tool will achieve.

We had an excellent series of brainstorming sessions several weeks ago in Orlando and decided we would combine both natural language processing (NLP) and decentralized human filtering to get one step closer to validating crowdsourced data. Let me expand on how I see both components working individually and together.

Automated Parsing

Double-counting has typically been the bane of traditional NLP or automated event-data extraction algorithms. At Virtual Research Associates (VRA), for example, we would parse headlines of Reuters newswires in quasi real-time, which meant that a breaking story would typically be updated throughout the day or week.

But the natural language parser was specifically developed to automate event-data extraction based on the parameters “Who did what, to whom, where and when?” In other words, the parser could not distinguish whether coded events were actually the same or related. This tedious task was left to VRA analysts to carry out.

Digital Straw

The logic behind eliminating double counting (duplicate event-data) is inevitably reversed given the nature of crowdsourcing. To be sure, the more reports are collected about a specific event, the more likely it is that the event in question actually took place as described by the crowd. Ironically, that is precisely why we want to “drink from the fire hose,” the swift river of data gushing through the wires of social media networks.

We simply need a clever digital straw to filter the torrent of data. This is where our Swift River project comes in and why I first addressed the issue of double counting. One of the central tasks I’d like Swift River to do is to parse the incoming reports from VoteReport India and to cluster them into unique event-clusters. This would be one way to filter the cascading data. Moreover, the parser could potentially help filter fabricated reports.

An Example

For example, if 17 individual reports from different sources are submitted over a two-day period about “forged votes,” then the reports in effect self-triangulate or validate each other. Of course, someone (with too much time on their hands) might decide to send 17 false reports about “forged votes.”

Our digital straw won’t filter all the impurities, but automating this first-level filter is surely better than nothing. Automating this process would require that the digital straw automate the extraction of nouns, verbs and place names from each report, i.e., actor, action and location. Date and time would automatically be coded based on when the report was submitted.

Reports that use similar verbs (synonyms) and refer to the same or similar actors at the same location on the same day can then be clustered into appropriate event-clusters. More on that in the section on crowdsourcing the filter below.
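Here is a rough sketch of what that clustering step might look like. It assumes the parser has already pulled out actor, action and location for each report, and it uses a tiny hand-made synonym table in place of real NLP. The field names, the synonym table and the sample reports are all hypothetical and are not part of any Swift River code.

```python
from collections import defaultdict

# Hypothetical synonym table mapping verbs to a shared action label.
ACTION_SYNONYMS = {
    "forged": "fraud",
    "faked": "fraud",
    "rigged": "fraud",
    "attacked": "violence",
    "beat": "violence",
}


def cluster_reports(reports):
    """Group reports that share actor, action label, location and day."""
    clusters = defaultdict(list)
    for report in reports:
        verb = report["action"].strip().lower()
        action = ACTION_SYNONYMS.get(verb, verb)
        key = (
            report["actor"].strip().lower(),
            action,
            report["location"].strip().lower(),
            report["date"],  # coded from the submission time
        )
        clusters[key].append(report)
    return dict(clusters)


# Made-up sample reports for illustration only.
sample = [
    {"actor": "Party officials", "action": "forged", "location": "Pune", "date": "2009-04-23"},
    {"actor": "party officials", "action": "Faked", "location": "pune", "date": "2009-04-23"},
    {"actor": "Police", "action": "attacked", "location": "Patna", "date": "2009-04-23"},
]
event_clusters = cluster_reports(sample)
```

The first two sample reports land in the same event-cluster even though they use different verbs, while the third starts a cluster of its own. Matching "similar" actors would need something smarter than the exact lowercase comparison used here.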

More Filters

A second-level filter would compare the content of the reports to determine if they were exact replicas. In other words, if someone were simply copying and pasting the same report, Swift River could flag those identical reports as suspicious. This means someone gaming the system would have to send multiple reports with different wording, thus making it a bit more time consuming to game the system.
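A copy-and-paste check like this is easy to sketch. The version below hashes each report after lowercasing it and collapsing whitespace, so trivially altered copies are still caught. That normalization step is my own assumption, and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict


def flag_exact_replicas(report_texts):
    """Return the positions of reports whose normalized text is repeated."""
    by_digest = defaultdict(list)
    for index, text in enumerate(report_texts):
        # Assumption: case and spacing differences do not make a report "new".
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(index)
    return sorted(
        index
        for group in by_digest.values()
        if len(group) > 1
        for index in group
    )


# Made-up sample texts for illustration only.
suspicious = flag_exact_replicas(
    ["Forged votes in Pune", "forged  votes in pune ", "Ballot boxes stolen"]
)
```

Reports that are reworded rather than copied will pass straight through this filter, which is exactly the extra effort it is meant to impose on someone gaming the system.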

A third-level filter or trip-wire could compare the source of the 17 reports. For example, perhaps 10 reports were submitted by email, 5 by SMS and two by Twitter. The greater the diversity of media used to report an event, the more likely that event actually happened. This means that someone wanting to game the system would have to send several emails, text messages and Tweets using different language to describe a particular event.
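One possible way to put a number on "diversity of media" is Shannon entropy, scaled so that reports spread evenly across every known channel score one and reports from a single channel score zero. To be clear, entropy is my own choice of measure here, not something the team has settled on, and the list of three known channels is an assumption taken from the example above.

```python
import math
from collections import Counter

# Assumption: the channels named in the example are the only ones available.
KNOWN_CHANNELS = ("email", "sms", "twitter")


def channel_diversity(channels, known_channels=KNOWN_CHANNELS):
    """Score from 0.0 (one channel) to 1.0 (evenly spread over all channels)."""
    counts = Counter(channels)
    if len(counts) < 2 or len(known_channels) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return min(1.0, entropy / math.log2(len(known_channels)))


# The 17 reports from the example: 10 by email, 5 by SMS and two by Twitter.
score = channel_diversity(["email"] * 10 + ["sms"] * 5 + ["twitter"] * 2)
```

A higher score would nudge an event-cluster toward being trusted, and a score of zero would be one more reason to send it for human review.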

A fourth-level filter could identify the email addresses, IP addresses and mobile phone numbers in question to determine if they too were different. A crook trying to game the system would now have to send emails from different accounts and IP addresses, different mobile phone numbers, and so on. Anything “looking suspicious” would be flagged for a human to review; more on that soon. The point is to make the gaming of the system as time consuming and frustrating as possible.
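As a minimal sketch of this trip-wire, the snippet below compares the number of distinct senders to the number of reports in an event-cluster. The "sender" field stands in for whatever identifier is available (email address, IP address or phone number), and the 0.5 threshold is an arbitrary placeholder, not a value we have tested.

```python
def distinct_sender_ratio(reports):
    """Distinct senders divided by number of reports (1.0 means all different)."""
    if not reports:
        return 0.0
    senders = {report["sender"].strip().lower() for report in reports}
    return len(senders) / len(reports)


def looks_suspicious(reports, min_ratio=0.5):
    """Flag a cluster for human review when too few senders account for it.

    Assumption: min_ratio is a hypothetical cut-off chosen for illustration.
    """
    return distinct_sender_ratio(reports) < min_ratio


# Made-up cluster: four reports, three of them from the same address.
cluster = [
    {"sender": "a@example.org"},
    {"sender": "A@example.org "},
    {"sender": "a@example.org"},
    {"sender": "+15550100"},
]
needs_review = looks_suspicious(cluster)
```

In practice one would probably want to track email addresses, IP addresses and phone numbers separately, since the same person can easily show up under more than one of them.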

Gaming the System

Of course, if someone is absolutely bent on submitting fabricated data that passes all the filters, then they will. But those individuals probably constitute a minority of offenders. Perhaps the longer and more often they do this, the more likely someone in the crowd will pick up on the con. As for the less die-hard crooks out there, they may try and game the system only to see that their reports do not get mapped. Hopefully they’ll give up.

I do realize I’m giving away some “secrets” to gaming the system, but I hope this will be more a deterrent than an invitation to crack the system. If you do happen to be someone bent on gaming the platform, I wish you’d get in touch with us instead and help us improve the filters. Either way, we’ll learn from you.

No one on the Swift River team claims that 100% of the dirt will be filtered. What we seek to do is develop a digital filter that makes the data that does come through palatable enough for public consumption.

Crowdsourcing the Filter

Remember the unique event-clusters idea from above? These could be visualized in a simple and intuitive manner for human volunteers (the crowd) to filter. Flag icons, perhaps using three different colors—green, orange and red—could indicate how suspicious a specific series of reports might be based on the results of the individual filters described above.

A green flag would indicate that the report has been automatically mapped on VoteReport upon receipt. An orange flag would indicate the need for review by the crowd while a red flag would send an alert for immediate review.

If a member of the crowd does confirm that a series of reports were indeed fabricated, Swift River would note the associated email address(es), IP address(es) and/or mobile phone number(s) and automatically flag future reports from those sources as red. In other words, Swift River would start rating the credibility of users as well.
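The flagging and source-rating logic could be sketched along these lines. The suspicion score is assumed to come from the individual filters and to run from 0 to 1, and the two cut-offs are placeholders I picked for illustration. The class and method names are hypothetical.

```python
class SourceRegistry:
    """Remembers sources the crowd has confirmed as fabricating reports."""

    def __init__(self, orange_at=0.3, red_at=0.7):
        # Assumption: thresholds are illustrative, not tuned values.
        self.orange_at = orange_at
        self.red_at = red_at
        self._fabricators = set()

    @staticmethod
    def _normalize(source):
        return source.strip().lower()

    def mark_fabricated(self, sources):
        """Record email addresses, IP addresses or phone numbers as untrusted."""
        self._fabricators.update(self._normalize(s) for s in sources)

    def flag_for(self, source, suspicion_score):
        """Return 'green', 'orange' or 'red' for a new report."""
        if self._normalize(source) in self._fabricators:
            return "red"  # known offender: always send for immediate review
        if suspicion_score >= self.red_at:
            return "red"
        if suspicion_score >= self.orange_at:
            return "orange"  # needs review by the crowd
        return "green"  # mapped automatically upon receipt


registry = SourceRegistry()
before = registry.flag_for("a@example.org", 0.1)
registry.mark_fabricated(["A@Example.org "])
after = registry.flag_for("a@example.org", 0.1)
```

A simple set of known offenders is the bluntest possible form of credibility rating. A graded score per source that moves up or down with each confirmed or debunked report would be the natural next step.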

If we can pull this off, Swift River may actually start to provide “early warning” signals. To be sure, if we fine tune our unique event-cluster approach, a new event-cluster would be created by a report that describes an event which our parser determines has not yet been reported on.

This should set off a (yellow) flag for immediate review by the crowd. This could either be a legitimate new event or a fabricated report that doesn’t fit into a pre-existing cluster. Of course, we will get a number of false positives, but that’s precisely why we include the human crowdsourcing element.

Simplicity

Either way, as the Swift River team has already agreed, this process of crowdsourcing the filter needs to be rendered as simple and seamless as possible. This means minimizing the number of clicks and “mouse motions” a user has to make and allowing for short-cut keys to be used, just like in Gmail. In addition, a user-friendly version of the interface should be designed specifically for mobile phones (various platforms and brands).

As always, I’d love to get your feedback.

Patrick Philippe Meier

Threat and Risk Mapping Analysis in Sudan

Massively informative.

That’s how I would describe my past 10 days with the UNDP’s Threat and Risk Mapping Analysis (TRMA) project in the Sudan. The team here is doing some of the most exciting work I’ve seen in the field of crisis mapping. Truly pioneering. I can’t think of a better project to apply the past two years of work I have done with the Harvard Humanitarian Initiative’s (HHI) Crisis Mapping and Early Warning Program.

TRMA combines all the facets of crisis mapping that I’ve been focusing on since 2007, namely crisis map sourcing (CMS), mobile crisis mapping (MCM), crisis mapping visualization (CMV), crisis mapping analytics (CMA) and crisis mapping platforms (CMP). I’ll be blogging about each of these in more detail later but wanted to provide a sneak preview in the meantime.

Crisis Map Sourcing (CMS)

The team facilitates 2-day focus groups using participatory mapping methods. Participants identify and map the most pressing crisis factors in their immediate vicinity. It’s really quite stunning to see just how much conversation a map can generate. Rich local knowledge.

trma1

What’s more, TRMA conducts these workshops at two levels for each locality (administrative boundaries within a state): the community-level and at the state-level. They can then compare the perceived threats and risks from both points of view. Makes for very interesting comparisons.

trma2

In addition to this consultative approach to crisis map sourcing, TRMA has played a pivotal role in setting up an Information Management Working Group (IMWG) in the Sudan, which includes the UN’s leading field-based agencies.

What is truly extraordinary about this initiative is that each agency has formally signed an information sharing protocol to share their geo-referenced data. TRMA had already been using much of this data but the process until now had always been challenging since it required repeated bilateral efforts. TRMA has also developed a close professional relationship with the Central Bureau of Statistics Office.

Mobile Crisis Mapping (MCM)

The team has just partnered with a multinational communications corporation to introduce the use of mobile phones for information collection. I’ll write more about this in the coming weeks. Needless to say, I’m excited. Hopefully it won’t be too late to bring up FrontlineSMS’s excellent work in this area, as well as Ushahidi’s.

Crisis Mapping Visualization (CMV)

The team needs some help in this area, but then again, that’s one of the reasons I’m here. Watching first reactions during focus groups when we show participants the large GIS maps of their state is really very telling. Lots more to write about on this and lots to contribute to TRMA’s work. I don’t yet know which maps can be made public but I’ll do my utmost to get permission to post one or two in the coming weeks.

Crisis Mapping Analytics (CMA)

The team has produced a rich number of different layers of data which can be superimposed to identify visual correlations and otherwise hidden patterns. Perhaps one of the most exciting examples is when the team started drawing fault lines on the maps based on the data collected and their own local area expertise. The team subsequently realized that these fault lines could potentially serve as “early warning” markers since a number of conflict incidents subsequently took place along those lines. Like the other crisis mapping components described above, there’s much more to write on this!

Crisis Mapping Platforms (CMP)

TRMA’s GIS team has used ArcGIS but this has been challenging given the US embargo on the Sudan. They therefore developed their own in-house mapping platforms using open-source software. These platforms include the “Threat Mapper” for data entry during (or shortly after) the focus groups and “4Ws” which stands for Who, What, Where and When. The latter tool is operational and will soon be fully developed. 4Ws will actually be used by members of the IMWG to share and visualize their data.

In addition, TRMA makes its many maps and layers available by distributing a customized DVD with ArcReader (which is free). Lots more on this in the coming weeks and hopefully some screenshots as well.

Closing the Feedback Loop

I’d like to end with one quick thought, which I will also expand on in the next few weeks. I’ve been in Blue Nile State over the past three days, visiting a number of different local ministries and civil society groups, including the Blue Nile’s Nomadic Union. We distributed dozens of poster-size maps and had at times hour-long discussions while poring over these maps. As I hinted above, the data visualization can be improved. But the question I want to pose at the moment is: how can we develop a manual GIS platform?

While the maps we distributed were of huge interest to our local partners, they were static, as hard-copy maps are bound to be. This got me thinking about possibly using transparencies to overlap different data/thematic layers over a general hard-copy map. I know transparencies can be printed on. I’m just not sure what size they come in or just how expensive they are, but they could start simulating the interactive functionality of ArcReader.

transparency

Even if they’re only available in A4 size, we could distribute binders with literally dozens of transparencies each with a printed layer of data. This would allow community groups to actually start doing some analysis themselves and could be far more compelling than just disseminating poster-size static maps, especially in rural areas. Another idea would be to use transparent folders like those below and hand-draw some of the major layers. Alternatively, there might be a type of thin plastic sheet available in the Sudan.

I’m thinking of trying to pilot this at some point. Any thoughts?

folders

Patrick Philippe Meier