Monthly Archives: April 2009

Nation-State Routing: Globalizing Censorship

I just found an interesting piece on Internet censorship at arXiv, my favorite go-to place for pre-publication scientific papers. Entitled “Nation-State Routing: Censorship, Wiretapping and BGP,” this empirical study is possibly the first to determine the aggregate effect of national policies on the flow of international traffic.

As government control over the treatment of Internet traffic becomes more common, many people will want to understand how international reachability depends on individual countries and to adopt strategies either for enhancing or weakening the dependence on some countries.

Introduction

States typically impose censorship to prevent domestic users from reaching questionable content. Some censorship techniques, however, “may affect all traffic traversing an [Autonomous System].” For example, Internet Service Providers (ISPs) in China, Britain and Pakistan block Internet traffic at the Internet Protocol (IP) level by “filtering based on IP addresses and URLs in the data packets, or performing internal prefix hijacks, which could affect the international traffic they transit.”

The scope and magnitude of this effect are unclear. What we do know is that censorship policies can be applied to international traffic, whether intentionally or by accident, as demonstrated by last year’s global YouTube outage, which resulted from a domestic Pakistani policy directive.

Methodology

The authors therefore developed a framework to study interdomain routing at the nation-state level. They first adapted the “Betweenness Centrality” metric from statistical physics to measure the importance, or centrality, of each country to Internet reachability. Second, they designed, implemented and validated a Country Path Algorithm (CPA) to infer country-paths from a pair of source and destination IP addresses.
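
To make the centrality idea concrete, here is a minimal sketch (mine, not the authors’) of how country importance might be approximated once traceroute- or BGP-derived paths have been mapped to country-level paths. The paths below are invented for illustration, and this count-based measure is only a rough stand-in for the paper’s adaptation of betweenness centrality.

```python
from collections import Counter

def country_centrality(country_paths):
    """Approximate centrality: the fraction of source-destination paths
    on which a country appears as a transit hop (endpoints excluded)."""
    counts = Counter()
    total = len(country_paths)
    for path in country_paths:
        for country in set(path[1:-1]):  # transit countries only, counted once per path
            counts[country] += 1
    return {c: n / total for c, n in counts.items()}

# Hypothetical country-level paths standing in for paths inferred from traceroute/BGP
paths = [
    ["IN", "US", "GB", "DE"],
    ["BR", "US", "FR"],
    ["PK", "GB", "US", "CA"],
]
print(country_centrality(paths))  # e.g. {'US': 1.0, 'GB': 0.67, ...}
```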

Findings

The table below shows Country Centrality (CC) computed directly from traceroute (TR) and Border Gateway Protocol (BGP) data. The closer the number is to one, the more impact that country’s domestic Internet censorship policies have on international Internet traffic.

[Table: Country Centrality (CC) computed from traceroute and BGP data]

The second table below lists both Country Centrality (CC) and Strong Country Centrality (SCC). The latter measures how central countries are when alternative routes are considered. When SCC equals one, this suggests a country is completely unavoidable.

[Table: Country Centrality (CC) and Strong Country Centrality (SCC)]

“Collectively, these results show that the ‘West’ continues to exercise disproportionate influence over international routing, despite the penetration of the Internet to almost every region of the world, and the rapid development of China and India.”

This last table below lists CC and SCC measures for authoritarian countries that are known for significant domestic censorship of Internet content. Aside from China, “these countries have very little influence over global reachability.”

[Table: CC and SCC for countries known for significant domestic Internet censorship]

Next Steps

The authors of the study point to a number of interesting questions for future research. For example, it would be interesting to know how the centrality results above change over time, i.e., which countries are becoming more central, and why.

Another important question is which economically driven strategies single countries (or small coalitions of countries) could adopt to increase their own centrality or to reduce that of other countries.

One final and particularly important question is what fraction of domestic paths are actually routed through another country. This matters because the answer would “provide insight into the influence that foreign nations have over a country’s domestic routing and security, and would shed light on […] whether warrantless tapping on links in one country to another might inadvertently capture some purely domestic traffic.”

Patrick Philippe Meier

Developing Swift River to Validate Crowdsourcing

Swift River is an Ushahidi initiative to crowdsource the process of data validation. We’re developing a Swift River pilot to complement the VoteReport India crowdsourcing platform we officially launched this week. As part of the Swift River team, I’d like to share with iRevolution readers what I hope the Swift River tool will achieve.

We had an excellent series of brainstorming sessions several weeks ago in Orlando and decided to combine natural language processing (NLP) and decentralized human filtering to get one step closer to validating crowdsourced data. Let me expand on how I see both components working individually and together.

Automated Parsing

Double-counting has typically been the bane of traditional NLP or automated event-data extraction algorithms. At Virtual Research Associates (VRA), for example, we would parse headlines of Reuters newswires in quasi real-time, which meant that a breaking story would typically be updated throughout the day or week.

But the natural language parser was specifically developed to automate event-data extraction based on the parameters “Who did what, to whom, where and when?” The parser could not distinguish whether two coded events were actually the same or related; that tedious task was left to VRA analysts to carry out.

Digital Straw

The logic behind eliminating double counting (duplicate event-data) is inevitably reversed given the nature of crowdsourcing. To be sure, the more reports are collected about a specific event, the more likely it is that the event in question actually took place as described by the crowd. Ironically, that is precisely why we want to “drink from the fire hose,” the swift river of data gushing through the wires of social media networks.

We simply need a clever digital straw to filter the torrent of data. This is where our Swift River project comes in and why I first addressed the issue of double counting. One of the central tasks I’d like Swift River to do is to parse the incoming reports from VoteReport India and to cluster them into unique event-clusters. This would be one way to filter the cascading data. Moreover, the parser could potentially help filter fabricated reports.

An Example

For example, if 17 individual reports from different sources are submitted over a two-day period about “forged votes,” then the reports in effect self-triangulate or validate each other. Of course, someone (with too much time on their hands) might decide to send 17 false reports about “forged votes.”

Our digital straw won’t filter all the impurities, but automating this first-level filter is surely better than nothing. Automating the process would require the digital straw to extract nouns, verbs and place names from each report, i.e., actor, action and location. Date and time would be coded automatically based on when the report was submitted.

Reports that use similar verbs (synonyms) and refer to the same or similar actors at the same location on the same day can then be clustered into appropriate event-clusters. More on that in the section on crowdsourcing the filter below.
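
To illustrate the event-cluster idea, here is a rough sketch that assumes the actor and location have already been extracted from each report and uses a toy synonym table in place of a real NLP parser; none of this is actual Swift River code.

```python
from collections import defaultdict

# Toy synonym groups standing in for a real parser's verb coding
ACTION_GROUPS = {
    "forge": "fraud", "forged": "fraud", "rig": "fraud", "rigged": "fraud",
    "intimidate": "intimidation", "threaten": "intimidation",
}

def normalize_action(text):
    """Map the first recognized verb in a report to its action group."""
    for word in text.lower().split():
        if word in ACTION_GROUPS:
            return ACTION_GROUPS[word]
    return "other"

def cluster_reports(reports):
    """Group reports into event-clusters keyed by (actor, action, location, date)."""
    clusters = defaultdict(list)
    for r in reports:
        key = (r["actor"].lower(), normalize_action(r["text"]),
               r["location"].lower(), r["date"])
        clusters[key].append(r)
    return clusters

reports = [
    {"actor": "party agent", "text": "votes forged at the polling booth",
     "location": "Lucknow", "date": "2009-04-16"},
    {"actor": "party agent", "text": "ballots rigged near the booth",
     "location": "Lucknow", "date": "2009-04-16"},
]
for key, members in cluster_reports(reports).items():
    print(key, "->", len(members), "report(s)")
```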

More Filters

A second-level filter would compare the content of the reports to determine whether they are exact replicas. In other words, if someone were simply copying and pasting the same report, Swift River could flag those identical reports as suspicious. Someone gaming the system would then have to send multiple reports with different wording, making the attack a bit more time consuming.

A third-level filter or trip-wire could compare the sources of the 17 reports. For example, perhaps ten reports were submitted by email, five by SMS and two by Twitter. The greater the diversity of media used to report an event, the more likely it is that the event actually happened. This means that someone wanting to game the system would have to send several emails, text messages and Tweets using different language to describe a particular event.

A fourth-level filter could examine the email addresses, IP addresses and mobile phone numbers in question to determine whether they too are different. A crook trying to game the system would now have to send emails from different accounts and IP addresses, use different mobile phone numbers, and so on. Anything “looking suspicious” would be flagged for a human to review; more on that soon. The point is to make gaming the system as time consuming and frustrating as possible.
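
Here is one way the second-, third- and fourth-level filters could be rolled into a single suspicion score for an event-cluster. This is purely a sketch: the weights, thresholds and field names are my own placeholders, not part of any Swift River specification, and the score feeds the green/orange/red flags discussed further below.

```python
import hashlib

def suspicion_score(cluster):
    """Score an event-cluster (a list of dicts with 'text', 'medium' and
    'source_id' fields); zero means nothing looked suspicious."""
    score = 0
    # Level 2: identical wording suggests copy-and-paste
    digests = {hashlib.sha1(r["text"].strip().lower().encode()).hexdigest()
               for r in cluster}
    if len(digests) < len(cluster):
        score += 2
    # Level 3: many reports all arriving via a single medium is weakly suspicious
    if len(cluster) > 3 and len({r["medium"] for r in cluster}) == 1:
        score += 1
    # Level 4: few distinct senders (email/phone/IP) behind many reports
    if len({r["source_id"] for r in cluster}) < len(cluster) / 2:
        score += 2
    return score

def flag(cluster):
    """Translate the score into the green/orange/red flags described below."""
    s = suspicion_score(cluster)
    return "green" if s == 0 else ("orange" if s <= 2 else "red")
```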

Gaming the System

Of course, if someone is absolutely bent on submitting fabricated data that passes all the filters, then they will. But those individuals probably constitute a minority of offenders. Perhaps the longer and more often they do this, the more likely it is that someone in the crowd will pick up on the con. As for the less die-hard crooks out there, they may try to game the system only to see that their reports do not get mapped. Hopefully they’ll give up.

I do realize I’m giving away some “secrets” to gaming the system, but I hope this will be more a deterrent than an invitation to crack the system. If you do happen to be someone bent on gaming the platform, I wish you’d get in touch with us instead and help us improve the filters. Either way, we’ll learn from you.

No one on the Swift River team claims that 100% of the dirt will be filtered. What we seek to do is develop a digital filter that makes the data that does come through palatable enough for public consumption.

Crowdsourcing the Filter

Remember the unique event-clusters idea from above? These could be visualized in a simple and intuitive manner for human volunteers (the crowd) to filter. Flag icons, perhaps using three different colors—green, orange and red—could indicate how suspicious a specific series of reports might be based on the results of the individual filters described above.

A green flag would indicate that the report has been automatically mapped on VoteReport upon receipt. An orange flag would indicate the need for review by the crowd while a red flag would send an alert for immediate review.

If a member of the crowd does confirm that a series of reports were indeed fabricated, Swift River would note the associated email address(es), IP address(es) and/or mobile phone number(s) and automatically flag future reports from those sources as red. In other words, Swift River would start rating the credibility of users as well.
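
A minimal sketch of that source-credibility bookkeeping might look as follows; the field names and logic are assumptions for illustration, not Swift River’s actual design.

```python
class SourceReputation:
    """Remember sources (email, phone, IP) whose reports were confirmed fabricated."""

    def __init__(self):
        self.blacklist = set()

    def mark_fabricated(self, report):
        # Record every identifier attached to the fabricated report
        for field in ("email", "phone", "ip"):
            if report.get(field):
                self.blacklist.add(report[field])

    def flag_for(self, report):
        # Future reports from a blacklisted source are flagged red automatically
        if any(report.get(f) in self.blacklist for f in ("email", "phone", "ip")):
            return "red"
        return None  # defer to the automated filters described above
```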

If we can pull this off, Swift River may actually start to provide “early warning” signals. To be sure, if we fine-tune our unique event-cluster approach, a new event-cluster would be created whenever a report describes an event that our parser determines has not yet been reported on.

This should set off a (yellow) flag for immediate review by the crowd. The report could either describe a legitimate new event or be a fabricated report that doesn’t fit into a pre-existing cluster. Of course, we will get a number of false positives, but that’s precisely why we include the human crowdsourcing element.

Simplicity

Either way, as the Swift River team has already agreed, this process of crowdsourcing the filter needs to be rendered as simple and seamless as possible. This means minimizing the number of clicks and “mouse motions” a user has to make and allowing for shortcut keys, just like in Gmail. In addition, a user-friendly version of the interface should be designed specifically for mobile phones (across various platforms and brands).

As always, I’d love to get your feedback.

Patrick Philippe Meier

Threat and Risk Mapping Analysis in Sudan

Massively informative.

That’s how I would describe my past 10 days with the UNDP’s Threat and Risk Mapping Analysis (TRMA) project in the Sudan. The team here is doing some of the most exciting work I’ve seen in the field of crisis mapping. Truly pioneering. I can’t think of a better project in which to apply the past two years of work I have done with the Harvard Humanitarian Initiative’s (HHI) Crisis Mapping and Early Warning Program.

TRMA combines all the facets of crisis mapping that I’ve been focusing on since 2007, namely: crisis map sourcing (CMS), mobile crisis mapping (MCM), crisis mapping visualization (CMV), crisis mapping analytics (CMA) and crisis mapping platforms (CMP). I’ll be blogging about each of these in more detail later but wanted to provide a sneak preview in the meantime.

Crisis Map Sourcing (CMS)

The team facilitates 2-day focus groups using participatory mapping methods. Participants identify and map the most pressing crisis factors in their immediate vicinity. It’s really quite stunning to see just how much conversation a map can generate. Rich local knowledge.

[Photo: TRMA participatory mapping focus group]

What’s more, TRMA conducts these workshops at two levels for each locality (an administrative unit within a state): the community level and the state level. They can then compare the perceived threats and risks from both points of view. Makes for very interesting comparisons.

[Photo: community-level and state-level mapping workshops]

In addition to this consultative approach to crisis map sourcing, TRMA has played a pivotal role in setting up an Information Management Working Group (IMWG) in the Sudan, which includes the UN’s leading field-based agencies.

What is truly extraordinary about this initiative is that each agency has formally signed an information sharing protocol to share their geo-referenced data. TRMA had already been using much of this data but the process until now had always been challenging since it required repeated bilateral efforts. TRMA has also developed a close professional relationship with the Central Bureau of Statistics Office.

Mobile Crisis Mapping (MCM)

The team has just partnered with a multinational communications corporation to introduce the use of mobile phones for information collection. I’ll write more about this in the coming weeks. Needless to say, I’m excited. Hopefully it won’t be too late to bring up FrontlineSMS‘s excellent work in this area, as well as Ushahidi‘s.

Crisis Mapping Visualization (CMV)

The team needs some help in this area, but then again, that’s one of the reasons I’m here. Watching participants’ first reactions during focus groups when we show them the large GIS maps of their state is really telling. Lots more to write on this and lots to contribute to TRMA’s work. I don’t yet know which maps can be made public, but I’ll do my utmost to get permission to post one or two in the coming weeks.

Crisis Mapping Analytics (CMA)

The team has produced a rich set of data layers that can be superimposed to identify visual correlations and otherwise hidden patterns. Perhaps one of the most exciting examples came when the team started drawing fault lines on the maps based on the data collected and their own local-area expertise. The team later realized that these fault lines could potentially serve as “early warning” markers, since a number of conflict incidents subsequently took place along those lines. As with the other crisis mapping components described above, there’s much more to write on this!

Crisis Mapping Platforms (CMP)

TRMA’s GIS team has used ArcGIS, but this has been challenging given the US embargo on the Sudan. They therefore developed their own in-house mapping platforms using open-source software. These include “Threat Mapper,” for data entry during (or shortly after) the focus groups, and “4Ws,” which stands for Who, What, Where and When. The latter tool is already operational and will soon be fully developed; 4Ws will actually be used by members of the IMWG to share and visualize their data.

In addition, TRMA makes its many maps and layers available by distributing a customized DVD with ArcReader (which is free). Lots more on this in the coming weeks and hopefully some screenshots as well.

Closing the Feedback Loop

I’d like to end with one quick thought, which I will also expand on in the next few weeks. I’ve been in Blue Nile State over the past three days, visiting a number of different local ministries and civil society groups, including the Blue Nile’s Nomadic Union. We distributed dozens of poster-size maps and at times had hour-long discussions while poring over them. As I hinted above, the data visualization can be improved. But the question I want to pose at the moment is: how can we develop a manual GIS platform?

While the maps we distributed were of huge interest to our local partners, they were static, as hard-copy maps are bound to be. This got me thinking about using transparencies to overlay different data/thematic layers on a general hard-copy map. I know transparencies can be printed on; I’m just not sure what sizes they come in or how expensive they are, but they could begin to simulate the interactive functionality of ArcReader.

[Photo: printable transparency sheets]

Even if they’re only available in A4 size, we could distribute binders with literally dozens of transparencies, each carrying a printed layer of data. This would allow community groups to start doing some analysis themselves and could be far more compelling than just disseminating poster-size static maps, especially in rural areas. Another idea would be to use transparent folders like those pictured below and hand-draw some of the major layers. Alternatively, there might be a type of thin plastic sheet available in the Sudan.

I’m thinking of trying to pilot this at some point. Any thoughts?

[Photo: transparent folders]

Patrick Philippe Meier

Ushahidi Comes to India for the Elections (Updated)

I’m very pleased to announce that the Ushahidi platform has been deployed at VoteReport.in to crowdsource the monitoring of India’s upcoming elections. The rollout followed our preferred model: an amazing group of Indian partners took the initiative to drive the project forward and are doing a superb job. I’m learning a lot from their strategic thinking.

[Screenshot: VoteReport.in]

We’re also excited about developing Swift River as part of VoteReport India to apply a crowdsourcing approach to filtering the incoming information for accuracy. This is of course all experimental and we’ll be learning a lot in the process. For a visual introduction to Swift River, please see Erik Hersman’s recent video documentary of the Swift River conversations we had a few weeks ago in Orlando.

[Screenshot: Swift River video]

As with our latest Ushahidi deployments, VoteReport users can report on the Indian elections by email, SMS, Tweet or by submitting an incident directly online at VoteReport. Users can also subscribe to email alerts, a functionality I’m particularly excited about since it closes the crowdsourcing-to-crowdfeeding feedback loop; I’m hoping we can also add SMS alerts, funding permitting. For more on crowdfeeding, please see my previous post on “Ushahidi: From Crowdsourcing to Crowdfeeding.”

[Screenshot: VoteReport.in reporting options]

You can read more about the project here and about the core team here. It really is an honor to be a part of this amazing group. We also have an official VoteReport blog here. I also highly recommend reading Gaurav Mishra‘s blog post on VoteReport here and Ushahidi’s here.

Next Steps

  • We’re thinking of using a different color to depict “All Categories” since red has cognitive connotations of violence and we don’t want this to be the first impression given by the map.
  • I’m hoping we can add a “download feature” that will allow users to directly download the VoteReport data as a CSV file and as a KML Google Earth layer. The latter will allow users to dynamically visualize VoteReports over space and time just like [I did here] with the Ushahidi data during the Kenyan elections (a rough sketch of such an export follows this list).
  • We’re also hoping to add a feature that asks those submitting incidents to check off that the information they submit is true. The motivation behind this is inspired by recent lessons from behavioral economics, as explained in my blog post on “Crowdsourcing Honesty.”
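
As a rough idea of what such an export could look like, the sketch below turns a CSV of reports into a time-stamped KML layer that Google Earth can animate over time. The column names (title, date, lat, lon) and file layout are my assumptions, not the actual VoteReport schema.

```python
import csv
from xml.sax.saxutils import escape

def csv_to_kml(csv_path, kml_path):
    """Convert report rows (title, date, lat, lon) into KML Placemarks with TimeStamps."""
    placemarks = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            placemarks.append(
                "  <Placemark>\n"
                f"    <name>{escape(row['title'])}</name>\n"
                f"    <TimeStamp><when>{row['date']}</when></TimeStamp>\n"
                f"    <Point><coordinates>{row['lon']},{row['lat']},0</coordinates></Point>\n"
                "  </Placemark>"
            )
    kml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
           + "\n".join(placemarks) + "\n</Document>\n</kml>\n")
    with open(kml_path, "w", encoding="utf-8") as f:
        f.write(kml)

# Hypothetical usage: csv_to_kml("votereport.csv", "votereport.kml")
```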

Patrick Philippe Meier

iRevolution One Year On…

I started iRevolution exactly one year ago and it’s been great fun! I owe the Fletcher A/V Club sincere thanks for encouraging me to blog. Little did I know that blogging was so stimulating or that I’d be blogging from the Sudan.

Here are some stats from iRevolution Year One:

  • Total number of blog posts = 212
  • Total number of comments = 453
  • Busiest day ever = December 15, 2008

And the Top 10 posts:

  1. Crisis Mapping Kenya’s Election Violence
  2. The Past and Future of Crisis Mapping
  3. Mobile Banking for the Bottom Billion
  4. Impact of ICTs on Repressive Regimes
  5. Towards an Emergency News Agency
  6. Intellipedia for Humanitarian Warning/Response
  7. Crisis Mapping Africa’s Cross-border Conflicts
  8. 3D Crisis Mapping for Disaster Simulation
  9. Digital Resistance: Digital Activism and Civil Resistance
  10. Neogeography and Crisis Mapping Analytics

I also have a second blog, which I started at the same time and which focuses specifically on Conflict Early Warning. There I have authored a total of 48 posts.

That makes 260 posts in 12 months. Now I know where all the time went!

The Top 10 posts:

  1. Crimson Hexagon: Early Warning 2.0
  2. CSIS PCR: Review of Early Warning Systems
  3. Conflict Prevention: Theory, Policy and Practice
  4. New OECD Report on Early Warning
  5. Crowdsourcing and Data Validation
  6. Sri Lanka: Citizen-based Early Warning/Response
  7. Online Searches as Early Warning Indicators
  8. Conflict Early Warning: Any Successes?
  9. Ushahidi and Conflict Early Response
  10. Detecting Rumors with Web-based Text Mining System

I look forward to a second year of blogging! Thanks to everyone for reading and commenting, I really appreciate it!

Patrick Philippe Meier

Peer Producing Human Rights

Molly Land at New York Law School has written an excellent paper on peer producing human rights, which will appear in the Alberta Law Review, 2009. This is one of the best pieces of research that I have come across on the topic. I highly recommend reading her article when published.

Molly considers Wikipedia, YouTube and Witness.org in her research but somewhat surprisingly does not reference Ushahidi. I thus summarize her main points below and draw on the case study of Ushahidi—particularly Swift River—to compare and contrast her analysis with my own research and experience.

Introduction

Funding for human rights monitoring and advocacy is particularly limited, which is why “amateur involvement in human rights activities has the potential to have a significant impact on the field.” At the same time, Molly recognizes that peer producing human rights may “present as many problems as it solves.”

Human rights reporting is the most professionalized activity of human rights organizations. This professionalization exists “not because of an inherent desire to control the process, but rather as a practical response to the demands of reporting, namely, the need to ensure accuracy of the information contained in the report.” The question is whether peer-produced human rights reporting can achieve the same degree of accuracy without a comparable centralized hierarchy.

Accurate documentation of human rights abuses is very important for building up a reputation as a credible human rights organization. Accuracy is also important to counter challenges by repressive regimes that question the validity of certain human rights reports. Moreover, “inaccurate reporting risks injury not only to the organization’s credibility and influence but also to those on whose behalf the organization advocates.”

Control vs Participation

A successful model for peer producing human rights monitoring would represent an important leap forward in the human rights community. Such a model would enable us to process a lot more information in a timelier manner and would also “increase the extent to which ordinary individuals connect to human rights issues, thus fostering the ability of the movement to mobilize broad constituencies and influence public opinion in support of human rights.”

Increased participation is often associated with an increased risk of inaccuracy. In fact, “even the perception of unreliability can be enough to provide […] a basis for critiquing the information as invalid.” Clearly, ensuring the trustworthiness of information in any peer-reviewed project is a continuing challenge.

Wikipedia uses corrective editing as the primary mechanism to evaluate the accuracy of crowdsourced information. Molly argues that this may not work well in the human rights context because direct observation, interviews and interpretation are central to human rights research.

To this end, “if the researcher contributes this information to a collaboratively-edited report, other contributors will be unable to verify the statements because they do not have access to either the witness’s statement or the information that led the researcher to conclude it was reliable.” Even if they were able to verify statements, much of human rights reporting is interpretive, which means that even experienced human rights professionals disagree about interpretive conclusions.

Models for Peer Production

Molly presents three potential models to outline how human rights reporting and advocacy might be democratized. The first two models focus on secondary and primary information respectively, while the third proposes certification by local NGOs. Molly outlines the advantages and challenges that each model presents. Below is a summary with my critiques. I do not address the third model because, as Molly notes, it is not entirely participatory.

Model 1. This approach would limit peer-production to collecting, synthesizing and verifying secondary information. Examples include “portals or spin-offs of existing portals, such as Wikipedia,” which could “allow participants to write about human rights issues but require them to rely only on sources that are verifiable […].” Accuracy challenges could be handled in the same way that Wikipedia does; namely through a “combination of collaborative editing and policies; all versions of the page are saved and it is easy for editors who notice gaming or vandalism to revert to the earlier version.”

The two central limitations of this approach are that (1) the model would be limited to a subset of available information restricted to online or print media; and (2) even limiting the subset of information might be insufficient to ensure reliability. To this end, this model might be best used to complement, not substitute, existing fact-finding efforts.

Model 2. This approach would limit the peer-production of human rights report to those with first-hand knowledge. While Molly doesn’t reference Ushahidi in her research, she does mention the possibility of using a website that would allow witnesses to report human rights abuses that they saw or experienced. Molly argues that this first-hand information on human rights violations could be particularly useful for human rights organizations that seek to “augment their capacity to collect primary information.”

This model still presents accuracy problems, however. “There would be no way to verify the information contributed and it would be easy for individuals to manipulate the system.” I don’t agree: the claim that “there would be no way to verify the information” is an exaggeration. There are multiple methods that could be employed to estimate the probability that contributed information is reliable, and this is precisely the motivation behind our Swift River project at Ushahidi, which seeks to use crowdsourcing to filter human rights information.

Since Swift River deserves an entire blog post to itself, I won’t describe the project. I’d just like to mention that the Ushahidi team just spent two days brainstorming creative ways that crowdsourced information could be verified. Stay tuned for more on Swift River.

We can still address Molly’s concerns without reference to Ushahidi’s Swift River.

Individuals who wanted to spread false allegations about a particular government or group, or to falsely refute such allegations, might make multiple entries (which would therefore corroborate each other) regarding a specific incident. Once picked up by other sources, such allegations ‘may take on a life of their own.’ NGOs using such information may feel compelled to verify this information, thus undermining some of the advantages that might otherwise be provided by peer production.

Unlike Molly, I don’t see the challenge of crowdsourced human rights data as first and foremost a problem of accuracy but rather volume. Accuracy, in many instances, is a function of how many data points exist in our dataset.

To be sure, more crowdsourced information can provide an ideal basis for triangulation and validation of peer-produced human rights reporting, particularly if we embrace multimedia in addition to text. In addition, more information allows us to use probability analysis to estimate the reliability of incoming reports. This would not undermine the advantages of peer production.
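
A toy calculation illustrates the point: if each independent report has some fixed chance of being accurate, the estimated reliability of the underlying event climbs quickly as corroborating reports accumulate. The 0.6 figure below is invented purely for illustration, and the independence assumption is exactly what filters like Swift River’s try to establish.

```python
def estimated_reliability(num_independent_reports, p_single=0.6):
    """Toy model: the chance that every one of n independent reports is wrong
    shrinks as n grows, so estimated reliability rises with volume."""
    return 1 - (1 - p_single) ** num_independent_reports

for n in (1, 3, 5, 10):
    print(n, round(estimated_reliability(n), 3))  # 0.6, 0.936, 0.99, ~1.0
```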

Of course, this method also faces some challenges since the success of triangulating crowdsourced human rights reports is dependent on volume. I’m not suggesting this is a perfect fix, but I do argue that this method will become increasingly tenable since we are only going to see more user-generated content, not less. For more on crowdsourcing and data validation, please see my previous posts here.

Molly is concerned that a website allowing peer-production based on primary information may “become nothing more than an opinion site.” However, a crowdsourcing platform like Ushahidi is not an efficient venue for interactive opinion sharing. Witnesses simply report on events, when they took place and where. Unlike blogs, the platform does not provide a way for users to comment on individual reports.

Capacity Building

Molly does raise an excellent point vis-à-vis the second model, however. The challenges of accuracy and opinion competition might be resolved by “shifting the purpose for which the information is used from identifying violations to capacity building.” As we all know, “most policy makers and members of the political elite know the facts already; what they want to know is what they should do about them.”

To this end, “the purpose of reporting in the context of capacity building is not to establish what happened, but rather to collect information about particular problems and generate solutions. As a result, the information collected is more often in the form of opinion testimony from key informants rather than the kind of primary material that needs to be verified for accuracy.”

This means that the peer produced reporting does not “purport to represent a kind of verifiable ‘truth’ about the existence or non-existence of a particular set of facts,” so the issue of “accuracy is somewhat less acute.” Molly suggests that accuracy might be further improved by “requiring participants to register and identify themselves when they post information,” which would “help minimize the risk of manipulation of the system.” Moreover, this would allow participants to view each other’s contributions and enable a contributor to build a reputation for credible contributions.

However, Molly points out that these potential solutions don’t change the fact that only those with Internet access would be able to contribute human rights reports, which could “introduce significant bias considering that most victims and eyewitnesses of human rights violations are members of vulnerable populations with limited, if any, such access.” I agree with this general observation, but I’m surprised that Molly doesn’t reference the use of mobile phones (and other mobile technologies) as a way to collect testimony from individuals without access to the Internet or in inaccessible areas.

Finally, Molly is concerned that Model 2 by itself “lacks the deep participation that can help mobilize ordinary individuals to become involved in human rights advocacy.” This is increasingly problematic since “traditional ‘naming and shaming’ may, by itself, be increasingly less effective in its ability to achieve changes in state conduct regarding human rights.” So Molly rightly encourages the human rights community to “investigate ways to mobilize the public to become involved in human rights advocacy.”

In my opinion, peer produced advocacy faces the same challenges as traditional human rights advocacy. It is therefore important that the human rights community adopt a more tactical approach to human rights monitoring. At Ushahidi, for example, we’re working to add a “subscribe-to-alerts” feature, which will allow anyone to receive SMS alerts for specific locations.

P2P Human Rights

The point is to improve the situational awareness of those who find themselves at risk so they can get out of harm’s way and not become another human rights statistic. For more on tactical human rights, please see my previous blog post.

Human rights organizations that are engaged in intervening to prevent human rights violations would also benefit from subscribing to Ushahidi. More importantly, the average person on the street would have the option of intervening as well. I, for one, am optimistic about the possibility of P2P human rights protection.

Patrick Philippe Meier