Data Mining Wikipedia in Real Time for Disaster Response

My colleague Fernando Diaz has continued working on an interesting Wikipedia project since he first discussed the idea with me last year. Since Wikipedia is increasingly used to crowdsource live reports on breaking news such as sudden-onset humanitarian crises and disasters, why not mine these pages for structured information relevant to humanitarian response professionals?


In computing-speak, Sequential Update Summarization is a task that generates useful, new and timely sentence-length updates about a developing event such as a disaster. Value Tracking, in contrast, follows the value of important event-related attributes, such as fatalities and financial impact, as they change over time. Fernando and his colleagues will be using both approaches to mine and analyze Wikipedia pages in real time. Other attributes worth tracking include injuries, number of displaced individuals, infrastructure damage and perhaps disease outbreaks. Pictures of the disaster uploaded to a given Wikipedia page may also be of interest to humanitarians, along with metadata such as the number of edits made to a page per minute or hour and the number of unique editors.
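To make the Value Tracking idea concrete, here is a minimal sketch in Python. It is my own illustration, not the method Fernando and his colleagues use: the Revision class, the regular expression and the function names are all assumptions, and a real system would need far more robust language processing than a single pattern.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical pattern (an assumption for this sketch): matches phrases such as
# "12 people were killed", "1,200 dead" or "45 deaths".
FATALITY_PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:people\s+)?(?:were\s+|have\s+been\s+)?"
    r"(?:killed|dead|deaths|fatalities)",
    re.IGNORECASE,
)


@dataclass
class Revision:
    """One saved version of a page (hypothetical structure)."""
    timestamp: datetime
    text: str


def extract_fatalities(text):
    """Return the largest fatality figure mentioned in the text, or None."""
    counts = [
        int(match.group(1).replace(",", ""))
        for match in FATALITY_PATTERN.finditer(text)
    ]
    return max(counts) if counts else None


def track_value(revisions):
    """Yield (timestamp, value) each time the extracted figure changes."""
    last = None
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        value = extract_fatalities(rev.text)
        if value is not None and value != last:
            last = value
            yield rev.timestamp, value
```

Feeding the successive revisions of a page through track_value produces a short timeline of how the reported figure evolved, which is the kind of structured output a humanitarian analyst could plot or compare against other sources.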

Fernando and his colleagues have recently launched a tech challenge to apply these two advanced computing techniques to disaster response based on crowdsourced Wikipedia articles. The challenge is part of the Text Retrieval Conference (TREC), which is being held in Maryland this November. As part of this applied research and prototyping challenge, Fernando et al. plan to use the resulting summarization and value tracking from Wikipedia to verify related crisis information shared on social media. Needless to say, I’m really excited about the potential, so Fernando and I are exploring ways to ensure that the results of this challenge are appropriately transferred to the humanitarian community. Stay tuned for updates.



Web App Tracks Breaking News Using Wikipedia Edits

A colleague of mine at Google recently shared a new and very interesting Web App that tracks breaking news events by monitoring Wikipedia edits in real time. The App, Wikipedia Live Monitor, alerts users to breaking news based on the frequency of edits to certain articles. Almost every significant news event has a Wikipedia page that gets updated in near real time and thus acts as a single, powerful cluster for tracking an evolving crisis.
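The general idea of alerting on edit frequency can be sketched in a few lines of Python. This is not the Wikipedia Live Monitor's actual code: the class name, the five-minute window and the thresholds are assumptions I picked for illustration, and the sketch assumes edits arrive in chronological order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class BreakingNewsDetector:
    """Flag articles that receive many edits from several editors in a short window."""

    def __init__(self, window=timedelta(minutes=5), min_edits=5, min_editors=2):
        # All three defaults are illustrative assumptions, not known settings.
        self.window = window
        self.min_edits = min_edits
        self.min_editors = min_editors
        self._edits = defaultdict(deque)  # article title -> deque of (timestamp, editor)

    def observe(self, title, editor, timestamp):
        """Record one edit; return True if the article now looks like breaking news."""
        recent = self._edits[title]
        recent.append((timestamp, editor))
        # Drop edits that have fallen out of the sliding window.
        while recent and timestamp - recent[0][0] > self.window:
            recent.popleft()
        editors = {name for _, name in recent}
        return len(recent) >= self.min_edits and len(editors) >= self.min_editors
```

Requiring more than one distinct editor is a simple guard against a single person making a burst of small corrections, which would otherwise look like a breaking story.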


Social media, in contrast, is far more distributed, which makes it more difficult to track. In addition, social media is highly prone to false positives. These, however, are almost immediately corrected on Wikipedia thanks to dedicated editors. Wikipedia Live Monitor currently works across several dozen languages and also “cross-checks edits with social media updates on Twitter, Google Plus and Facebook to help users get a better sense of what is trending” (1).

I’m really excited to explore the use of this Live Monitor for crisis response and possible integration with some of the humanitarian technology platforms that my colleagues and I at QCRI are developing. For example, the Monitor could be used to supplement crisis information collected via social media using the Artificial Intelligence for Disaster Response (AIDR) platform. In addition, the Wikipedia Monitor could also be used to triangulate reports posted to our Verily platform, which leverages time-critical crowdsourcing techniques to verify user-generated content posted on social media during disasters.


HURIDOCS09: From Wikipedia to Ushahidi

The Panel

I just participated in a panel on “Communicating Human Rights Information Through Technology” at the HURIDOCS conference in Geneva and presented Ushahidi as an alternative model. My fellow panelists included Florence Devouard, Chair of the Wikimedia Foundation, Sam Gregory from Witness.org, Lars Bromley from AAAS and Dan Brickley, a researcher, advocate and developer of Semantic Web technologies.


Out of the hundred-or-so participants in the plenary, only a handful, five-or-so, had heard of the Kenyan initiative. So this was a great opportunity to share the Ushahidi story with a diverse coalition of committed human rights workers. There were at least 40 countries or territories represented, ranging from Armenia and Ecuador to Palestine and Zimbabwe.

Since I’ve blogged about Ushahidi extensively already, I will only add a few observations here (see Slideshare for the slides). My presentation followed Florence’s talk on the latest developments at Wikipedia and I really hope to get more of her thoughts on applying lessons learned to the Ushahidi project. Both projects entail crowdsourcing and data validation processes.


“Nobody Knows Everything, but Everyone Knows Something.” I borrowed this line from Florence’s talk to explain the rationale behind Ushahidi. Applied to human rights reporting, “nobody knows about every human rights violation taking place, but everyone may know of some incidents.” The latter is the local knowledge that Ushahidi seeks to render more visible by taking a crowdsourcing approach.

Recognizing the powerful convergence of communication technologies and information ecosystems is key to Ushahidi’s platform. Various deployments of Ushahidi have allowed individuals to report human rights violations online, by SMS and/or via Twitter. Unlike the majority of human rights monitoring platforms, Ushahidi seeks to “close the feedback loop” by allowing individuals to subscribe to alerts in their cities. As we know only too well, monitoring human rights violations is not equivalent to preventing them.


Given the importance of data validation in human rights reporting, I outlined Ushahidi’s approach and introduced the Swift River initiative, which uses crowdsourcing to filter crisis information reported via Twitter, Ushahidi, Flickr, YouTube, and local mobile and web social networks. When Ushahidi published their first blog post on Swift River, I commented that Wikipedia was most likely the best at crowdsourcing the filter.

This explains why I’m eager to learn more from Florence regarding her experience with Wikipedia. She mentioned that one new way they track online vandalism of Wikipedia entries is by detecting “sudden changes” in the flow of edits by anonymous users. Under a new rule being considered by Wikipedia, edits of this nature would have to be validated by a third party before being officially published.
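Florence did not go into how these “sudden changes” are detected, so the following Python sketch is only one plausible reading and not Wikipedia's actual method. It assumes we already have a list of anonymous-edit counts per time interval for a page, and it flags the latest interval when it sits well above the recent baseline. The function name, the minimum history length and the threshold are my own assumptions.

```python
from statistics import mean, pstdev


def sudden_change(anon_counts, min_history=5, threshold=3.0):
    """Return True if the latest interval's anonymous-edit count spikes above the baseline.

    anon_counts: counts of anonymous edits per interval, oldest first.
    min_history and threshold are illustrative assumptions.
    """
    if len(anon_counts) <= min_history:
        return False  # not enough history to estimate a baseline
    *history, latest = anon_counts
    baseline = mean(history)
    spread = pstdev(history)
    # Floor the spread at 1.0 so a perfectly flat history does not flag tiny changes.
    return latest > baseline + threshold * max(spread, 1.0)
```

A page that normally sees two or three anonymous edits per interval and suddenly receives twenty-five would be flagged, and its edits could then be held for third-party review along the lines Florence described.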

One other point worth noting, and which I’ve blogged about before, is that Wikipedia continues to be used for real-time reporting of unfolding crises. We saw this during the London bombings back in 2005 and more recently with the Mumbai attacks. The pages were being edited at least a hundred times a day and as far as I know were as accurate as mainstream media reports and more up-to-date.

The point is, if Wikipedia can serve as a platform for accurate, real-time reporting of political crises, then so should Ushahidi. The challenge is to get enough contributors to Ushahidi to constitute “the crowd” and sufficient alerts to constitute a river. The power here is in the numbers. Perhaps in time the Ushahidi platform may become more like a public sphere where different perspectives on alerts might be exchanged. In other words, we may see a shift away from data “deconfliction,” which is reductionist.

The Q&A

The Questions and Answers session was productive and lively. Concerns about data validation and the security of those reporting in repressive environments were raised. The point to keep in mind is that Ushahidi does not exist in a vacuum, which is why I showed HHI’s Google Earth Layer of Kenya’s post-election violence. To be sure, Ushahidi does not replace but rather complements traditional sources of reporting like the national media or alternative sources like citizen journalism. Think of a collage as opposed to a painting.

Human rights incidents mapped on the Ushahidi platform may not be fully validated, but the purpose of Ushahidi is not to provide information on human rights violations that meets ICC standards. The point is to document instances of violations so they (1) can be investigated by interested parties, and (2) serve as potential early warnings for communities caught in conflict. In terms of the security of those engaged in reporting alerts using the Ushahidi platform, the team is adding a feature that allows users to report anonymously.

As expected, there were also concerns about “bad guys” gaming the Ushahidi platform. This is a tricky point to respond to because (1) to the best of my knowledge this hasn’t happened; (2) I’m not sure what the “bad guys” would stand to gain tactically and strategically; (3) Ushahidi has a fraction of the audience, and hence political influence, that television and radio stations have; (4) I doubt “bad guys” are immune to the digital “fog of war”; and (5) the point of Swift River is to make gaming difficult by filtering it out.

In any event, it would behoove Ushahidi to consider potential scenarios in which the platform could be used to promote disinformation and violence. At this point, however, I’m really not convinced that “bad guys” will see the Ushahidi platform as a useful tool to further their own ends.

Web 2.0 Tracks Attacks on Mumbai (Updated)

Twitter, Flickr, Wikipedia, YouTube – a few of the Web 2.0 and mobile applications tracking the Mumbai attacks in quasi real time along with the aftermath. Twitter was apparently faster than CNN in reporting the initial events, according to TechCrunch.


From TechMacro: “the local authority advised TV channels to stop broadcasting sensitive information which may help terrorists tracking army’s movements. It is much less likely that the terrorists are now using Twitter to find way to escape.”

For live, crowdsourced updates, see the following links on Twitter, Flickr and Wikipedia. The Wikipedia entry already includes a picture (probably taken with a mobile phone) of one of the terrorists.

Wired also writes that “local bloggers at Metblogs Mumbai have new updates every couple of minutes. So do the folks at GroundReport. Dozens of videos have been uploaded to YouTube. But the most remarkable citizen journalism may be coming from ‘Vinu,’ who is posting a stream of harrowing post-attack pictures to Flickr.”

