
Developing Swift River to Validate Crowdsourcing

Swift River is an Ushahidi initiative to crowdsource the process of data validation. We’re developing a Swift River pilot to complement the VoteReport India crowdsourcing platform we officially launched this week. As part of the Swift River team, I’d like to share with iRevolution readers what I hope the Swift River tool will achieve.

We had an excellent series of brainstorming sessions several weeks ago in Orlando and decided we would combine natural language processing (NLP) with decentralized human filtering to get one step closer to validating crowdsourced data. Let me expand on how I see both components working individually and together.

Automated Parsing

Double-counting has typically been the bane of traditional NLP or automated event-data extraction algorithms. At Virtual Research Associates (VRA), for example, we would parse headlines of Reuters newswires in quasi real-time, which meant that a breaking story would typically be updated throughout the day or week.

But the natural language parser was specifically developed to automate event-data extraction based on the parameters “Who did what, to whom, where and when?” In other words, the parser could not distinguish whether coded events were actually the same or related. This tedious task was left to VRA analysts to carry out.

Digital Straw

The logic behind eliminating double counting (duplicate event-data) is inevitably reversed given the nature of crowdsourcing. To be sure, the more reports are collected about a specific event, the more likely it is that the event in question actually took place as described by the crowd. Ironically, that is precisely why we want to “drink from the fire hose,” the swift river of data gushing through the wires of social media networks.

We simply need a clever digital straw to filter the torrent of data. This is where our Swift River project comes in and why I first addressed the issue of double counting. One of the central tasks I’d like Swift River to do is to parse the incoming reports from VoteReport India and to cluster them into unique event-clusters. This would be one way to filter the cascading data. Moreover, the parser could potentially help filter fabricated reports.

An Example

For example, if 17 individual reports from different sources are submitted over a two-day period about “forged votes,” then the reports in effect self-triangulate or validate each other. Of course, someone (with too much time on their hands) might decide to send 17 false reports about “forged votes.”

Our digital straw won’t filter all the impurities, but automating this first-level filter is surely better than nothing. Automating this process would require that the digital straw extract the nouns, verbs and place names from each report, i.e., actor, action and location. Date and time would automatically be coded based on when the report was submitted.
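To make this concrete, here is a minimal sketch of what that extraction step might look like. It uses spaCy purely as my choice for illustration (the original design names no library), and the ParsedReport structure and its field names are hypothetical:

```python
# A sketch of the first-level extraction: "who did what, where, when".
# Requires spaCy and its small English model (python -m spacy download en_core_web_sm).
import spacy
from dataclasses import dataclass
from datetime import datetime

nlp = spacy.load("en_core_web_sm")

@dataclass
class ParsedReport:
    actors: list        # named entities: "who" and "to whom"
    actions: list       # verb lemmas: "did what"
    locations: list     # place names: "where"
    timestamp: datetime # "when", taken from submission time

def parse_report(text: str, submitted_at: datetime) -> ParsedReport:
    doc = nlp(text)
    actors = [ent.text for ent in doc.ents if ent.label_ in ("PERSON", "ORG", "NORP")]
    locations = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]
    actions = [tok.lemma_ for tok in doc if tok.pos_ == "VERB"]
    return ParsedReport(actors, actions, locations, submitted_at)
```

Any off-the-shelf part-of-speech tagger and named-entity recognizer would do here; the point is only that actor, action and location can be pulled out automatically.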

Reports that use similar verbs (synonyms) and refer to the same or similar actors at the same location on the same day can then be clustered into appropriate event-clusters. More on that in the section on crowdsourcing the filter below.
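As a hedged sketch of how those event-clusters could be formed, the snippet below keys each report on a canonical verb (collapsing WordNet synonyms onto one lemma), the first actor, the first location and the submission date. The cluster_key scheme is my own illustration, not a settled Swift River design:

```python
# Cluster reports that share actor, (synonymous) action, location and day.
# Requires NLTK with the WordNet corpus downloaded (nltk.download("wordnet")).
from collections import defaultdict
from nltk.corpus import wordnet as wn

def canonical_verb(verb: str) -> str:
    """Map a verb to the first lemma of its first WordNet synset,
    so that synonyms collapse onto a single key."""
    synsets = wn.synsets(verb, pos=wn.VERB)
    return synsets[0].lemmas()[0].name() if synsets else verb

def cluster_key(report):
    # report is a ParsedReport from the extraction sketch above
    actor = report.actors[0].lower() if report.actors else ""
    action = canonical_verb(report.actions[0]) if report.actions else ""
    place = report.locations[0].lower() if report.locations else ""
    return (actor, action, place, report.timestamp.date())

def build_clusters(reports):
    clusters = defaultdict(list)
    for r in reports:
        clusters[cluster_key(r)].append(r)
    return clusters
```

A real implementation would need fuzzier matching than exact key equality, but even this crude grouping shows how 17 reports about “forged votes” in the same place on the same day would land in one cluster.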

More Filters

A second-level filter would compare the content of the reports to determine whether they are exact replicas. In other words, if someone were simply copying and pasting the same report, Swift River could flag those identical reports as suspicious. Someone gaming the system would then have to send multiple reports with different wording, which makes the attack a bit more time consuming.
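A minimal sketch of this duplicate check, assuming each report object carries its raw body in a text attribute (an assumption of mine):

```python
# Second-level filter: flag reports whose text is an exact
# (case- and whitespace-insensitive) copy of an earlier report.
import hashlib

def content_fingerprint(text: str) -> str:
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def flag_exact_duplicates(reports):
    """Return the reports whose body duplicates an earlier one."""
    seen, suspicious = set(), []
    for r in reports:
        fp = content_fingerprint(r.text)  # .text is assumed, not in the original post
        if fp in seen:
            suspicious.append(r)
        seen.add(fp)
    return suspicious
```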

A third-level filter or trip-wire could compare the sources of the 17 reports. For example, perhaps ten reports were submitted by email, five by SMS and two by Twitter. The greater the diversity of media used to report an event, the more likely that the event actually happened. This means that someone wanting to game the system would have to send several emails, text messages and Tweets using different language to describe a particular event.

A fourth-level filter could compare the email addresses, IP addresses and mobile phone numbers in question to determine whether they too are different. A crook trying to game the system would now have to send emails from different accounts and IP addresses, texts from different mobile phone numbers, and so on. Anything “looking suspicious” would be flagged for a human to review; more on that soon. The point is to make gaming the system as time consuming and frustrating as possible.
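Taken together, the third- and fourth-level filters amount to a diversity score over a cluster. The sketch below is illustrative only: the channel, sender_id and ip_address attributes and the 0.5 threshold are my assumptions, not part of the original design:

```python
# Third- and fourth-level filters combined: the more distinct channels
# and distinct sender identities behind a cluster, the less suspicious it is.
def diversity_score(cluster_reports) -> float:
    channels = {r.channel for r in cluster_reports}   # e.g. "email", "sms", "twitter"
    senders = {(r.sender_id, r.ip_address) for r in cluster_reports}
    n = len(cluster_reports)
    channel_diversity = min(len(channels) / 3.0, 1.0)  # 3 = channels in the example above
    sender_diversity = len(senders) / n
    return (channel_diversity + sender_diversity) / 2

def is_suspicious(cluster_reports, threshold: float = 0.5) -> bool:
    return diversity_score(cluster_reports) < threshold
```

Seventeen reports all coming from one phone number over one channel would score very low here, while the same 17 spread across emails, texts and Tweets from different senders would score high.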

Gaming the System

Of course, if someone is absolutely bent on submitting fabricated data that passes all the filters, then they will. But those individuals probably constitute a minority of offenders. Perhaps the longer and more often they do this, the more likely someone in the crowd will pick up on the con. As for the less die-hard crooks out there, they may try to game the system only to see that their reports do not get mapped. Hopefully they’ll give up.

I do realize I’m giving away some “secrets” to gaming the system, but I hope this will be more a deterrent than an invitation to crack the system. If you do happen to be someone bent on gaming the platform, I wish you’d get in touch with us instead and help us improve the filters. Either way, we’ll learn from you.

No one on the Swift River team claims that 100% of the dirt will be filtered. What we seek to do is develop a digital filter that makes the data that does come through palatable enough for public consumption.

Crowdsourcing the Filter

Remember the unique event-clusters idea from above? These could be visualized in a simple and intuitive manner for human volunteers (the crowd) to filter. Flag icons, perhaps using three different colors—green, orange and red—could indicate how suspicious a specific series of reports might be based on the results of the individual filters described above.

A green flag would indicate that the report has been automatically mapped on VoteReport upon receipt. An orange flag would indicate the need for review by the crowd while a red flag would send an alert for immediate review.
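A toy triage rule for those three flags might look like the following; the suspicion score is assumed to come from the filters above, and the cut-off values are invented for illustration:

```python
# Map a suspicion score to the three flag colours described above.
def flag_colour(suspicion: float) -> str:
    """suspicion in [0, 1]: 0 = clean, 1 = almost certainly fabricated."""
    if suspicion < 0.3:
        return "green"   # auto-map on VoteReport upon receipt
    if suspicion < 0.7:
        return "orange"  # queue for review by the crowd
    return "red"         # alert for immediate review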

If a member of the crowd does confirm that a series of reports was indeed fabricated, Swift River would note the associated email address(es), IP address(es) and/or mobile phone number(s) and automatically flag future reports from those sources as red. In other words, Swift River would start rating the credibility of users as well.
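A sketch of that source-credibility memory; storing the identifiers in a plain set, and the email, ip_address and phone attributes, are my illustrative assumptions:

```python
# Remember identifiers behind confirmed fabrications and
# auto-red-flag their future reports.
known_bad_sources = set()  # email addresses, IP addresses, phone numbers

def record_fabrication(report):
    for identifier in (report.email, report.ip_address, report.phone):
        if identifier:
            known_bad_sources.add(identifier)

def pre_flag(report) -> str:
    tainted = {report.email, report.ip_address, report.phone} & known_bad_sources
    return "red" if tainted else "pending"
```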

If we can pull this off, Swift River may actually start to provide “early warning” signals. To be sure, if we fine-tune our unique event-cluster approach, a new event-cluster would be created by a report that describes an event which our parser determines has not yet been reported on.

This should set off a red flag for immediate review by the crowd. It could either be a legitimate new event or a fabricated report that doesn’t fit into a pre-existing cluster. Of course, we will get a number of false positives, but that’s precisely why we include the human crowdsourcing element.
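Building on the clustering sketch above, the early-warning behaviour could be as simple as the following; again, this is my illustration rather than the team’s design:

```python
# A report that matches no existing cluster opens a new one
# and is flagged red for immediate review by the crowd.
def assign_report(report, clusters):
    key = cluster_key(report)      # from the clustering sketch above
    is_new_event = key not in clusters
    clusters[key].append(report)   # clusters is a defaultdict(list)
    return "red" if is_new_event else "pending"
```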

Simplicity

Either way, as the Swift River team has already agreed, this process of crowdsourcing the filter needs to be rendered as simple and seamless as possible. This means minimizing the number of clicks and “mouse motions” a user has to make and allowing for shortcut keys, just like in Gmail. In addition, a user-friendly version of the interface should be designed specifically for mobile phones (various platforms and brands).

As always, I’d love to get your feedback.

Patrick Philippe Meier

Ushahidi Comes to India for the Elections (Updated)

I’m very pleased to announce that the Ushahidi platform has been deployed at VoteReport.in to crowdsource the monitoring of India’s upcoming elections. The rollout followed our preferred model: an amazing group of Indian partners took the initiative to drive the project forward and are doing a superb job. I’m learning a lot from their strategic thinking.


We’re also excited about developing Swift River as part of VoteReport India to apply a crowdsourcing approach to filtering the incoming information for accuracy. This is of course all experimental and we’ll be learning a lot in the process. For a visual introduction to Swift River, please see Erik Hersman’s recent video documentary of the Swift River conversations we had a few weeks ago in Orlando.


As with our latest Ushahidi deployments, VoteReport users can report on the Indian elections by email, SMS, Tweet or by submitting an incident directly online at VoteReport. Users can also subscribe to email alerts—a functionality I’m particularly excited about as it closes the crowdsourcing-to-crowdfeeding feedback loop; I’m hoping we can also add SMS alerts, funding permitted. For more on crowdfeeding, please see my previous post on “Ushahidi: From Crowdsourcing to Crowdfeeding.”


You can read more about the project here and about the core team here. It really is an honor to be a part of this amazing group. We also have an official VoteReport blog here. I also highly recommend reading Gaurav Mishra‘s blog post on VoteReport here and Ushahidi’s here.

Next Steps

  • We’re thinking of using a different color to depict “All Categories” since red has cognitive connotations of violence and we don’t want this to be the first impression given by the map.
  • I’m hoping we can add a “download feature” that will allow users to directly download the VoteReport data as a CSV file and as a KML Google Earth layer. The latter will allow users to dynamically visualize VoteReports over space and time, just like I did here with the Ushahidi data during the Kenyan elections. (A rough sketch of the CSV-to-KML conversion follows this list.)
  • We’re also hoping to add a feature that asks those submitting incidents to check off that the information they submit is true. This is inspired by recent lessons from behavioral economics, as explained in my blog post on “Crowdsourcing Honesty.”
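On the download feature, converting report rows into a KML layer is straightforward. The sketch below assumes columns named title, description, latitude, longitude and date; these are guesses on my part, not the actual VoteReport schema:

```python
# Turn a CSV export of reports into a minimal KML layer for Google Earth.
import csv
from xml.sax.saxutils import escape

def csv_to_kml(csv_path: str, kml_path: str):
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    placemarks = []
    for r in rows:
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(r['title'])}</name>"
            f"<description>{escape(r['description'])} ({r['date']})</description>"
            # KML expects longitude before latitude
            f"<Point><coordinates>{r['longitude']},{r['latitude']}</coordinates></Point>"
            "</Placemark>"
        )
    with open(kml_path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
                + "".join(placemarks) + "</Document></kml>")
```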

Patrick Philippe Meier

WikiMapAid, Ushahidi and Swift River

Keeping up to date with science journals always pays off. New Scientist published a really interesting piece this morning on crisis mapping of diseases. I had to hop on a flight back to Boston, so I’m only uploading my post now.

The cholera outbreak in Zimbabwe is becoming increasingly serious, but the data on case numbers and fatalities needed to control the problem is difficult to obtain. The World Health Organization (WHO) in Zimbabwe has stated that “any system that improves data collecting and sharing would be beneficial.”

This is where WikiMapAid comes in. Developed by Global Map Aid, the wiki enables humanitarian workers to map information on a version of Google Maps that can be viewed by anyone. “The hope is that by circumventing official information channels, a clearer picture of what is happening on the ground can develop.” The website is based on a “Brazilian project called Wikicrimes, launched last year, in which members of the public share information about crime in their local area.”


WikiMapAid allows users to create markers and attach links to photographs or to post a report of the current situation in the area. Given the context of Zimbabwe, “if people feel they will attract attention from the authorities by posting information, they could perhaps get friends on the outside to post information for them.”

As always with peer-produced data, the validity of the information will depend on those supplying it. While moderators will “edit and keep track of postings […],” unreliable reporting could be a problem. To address this, the team behind the project is “developing an algorithm that will rate the reputation of users according to whether the information they post is corroborated, or contradicted.”
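A minimal sketch of what such a reputation update might look like; the increments and bounds are invented for illustration and have nothing to do with WikiMapAid’s actual algorithm:

```python
# Nudge a user's reputation up when their posts are corroborated
# and down when contradicted, clamped to [0, 1].
def update_reputation(score: float, corroborated: int, contradicted: int) -> float:
    score += 0.1 * corroborated - 0.2 * contradicted  # penalise contradiction harder
    return max(0.0, min(1.0, score))
```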

This is very much in line with the approach we’re taking at Ushahidi for the Swift River project. As WikiMapAid notes, “even if we’re just 80 per cent perfect, we will still have made a huge step forward in terms of being able to galvanize public opinion, raise funds, prioritize need and speed the aid on those who need it most.”

Time to get in touch with the good folks at WikiMapAid.

Patrick Philippe Meier