Tag Archives: Crowdsourcing

GDACSmobile: Disaster Responders Turn to Bounded Crowdsourcing

GDACS, the Global Disaster Alert and Coordination System, sparked my interest in technology and disaster response when it was first launched back in 2004, which is why I’ve referred to GDACS in multiple blog posts since. This near real-time, multi-hazard monitoring platform is a joint initiative between the UN’s Office for the Coordination of Humanitarian Affairs (OCHA) and the European Commission (EC). GDACS serves to consolidate and improve the dissemination of crisis-related information including rapid mathematical analyses of expected disaster impact. The resulting risk information is distributed via Web and automated email, fax and SMS alerts.


I recently had the pleasure of connecting with two new colleagues, Daniel Link and Adam Widera, researchers at the University of Muenster’s European Research Center for Information Systems (ERCIS). Daniel and Adam have been working on GDACSmobile, a smartphone app initially developed to extend the reach of the GDACS portal. The app grew out of a student project supervised by Daniel and Adam, together with the Center’s Chair, Bernd Hellingrath, in cooperation with Tom de Groeve from the Joint Research Center (JRC) and Minu Kumar Limbu, who is now with UNICEF Kenya.

GDACSmobile is intended for use by disaster responders and the general public, allowing for a combined crowdsourcing and “bounded crowdsourcing” approach to data collection and curation. This bounded approach was a deliberate design feature of GDACSmobile from the outset. I coined the term “bounded crowdsourcing” four years ago (see this blog post from 2009). The “bounded crowdsourcing” approach uses “snowball sampling” to grow a crowd of trusted reporters for the collection of crisis information. For example, one invites 5 (or more) trusted local reporters to collect relevant information and subsequently asks each of these to invite 5 additional reporters whom they fully trust; and so on, and so forth. I’m thrilled to see this term applied in practical applications such as GDACSmobile. For more on this approach, please see these blog posts.
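
To make the snowball-sampling idea concrete, here is a minimal Python sketch of how a bounded pool of trusted reporters might be grown from a handful of seeds. The function name, invitation limit and simulated invitees are illustrative assumptions, not part of GDACSmobile itself.

```python
# Minimal sketch of growing a "bounded crowdsourcing" pool via snowball
# sampling. All names and the invitation limit are illustrative assumptions.
from collections import deque

def grow_trusted_network(seed_reporters, invites_per_reporter=5, max_size=200):
    """Breadth-first growth of a trusted-reporter pool.

    Each reporter nominates up to `invites_per_reporter` people they
    personally trust; growth stops once `max_size` members are reached.
    """
    trusted = set(seed_reporters)
    queue = deque(seed_reporters)
    while queue and len(trusted) < max_size:
        reporter = queue.popleft()
        # In practice nominations come from the reporter; here we simulate them.
        for i in range(invites_per_reporter):
            invitee = f"{reporter}/invitee-{i}"
            if invitee not in trusted:
                trusted.add(invitee)
                queue.append(invitee)
                if len(trusted) >= max_size:
                    break
    return trusted

print(len(grow_trusted_network(["reporter-A", "reporter-B"])))  # -> 200
```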


GDACSmobile, which runs on all major smartphone platforms, uses a deliberately minimalist approach to situation reporting and can be used to collect information (via text & image) while offline. The collected data is then automatically transmitted when a connection becomes available. Users can also view & filter data via a map view and in list form. Daniel and Adam are considering the addition of an icon-based data-entry interface instead of text-based data entry, since the latter is more cumbersome & time-consuming.
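
The offline reporting behavior described above is essentially a store-and-forward pattern: queue reports locally, then flush the queue when connectivity returns. Below is a rough Python sketch of that pattern; the file name, record fields and the upload/connectivity callbacks are hypothetical, not GDACSmobile’s actual implementation.

```python
# Sketch of the store-and-forward pattern implied by offline reporting:
# reports are queued in a local file and flushed once a connection appears.
# File name, record fields and the upload/is_online callbacks are hypothetical.
import json
import time
from pathlib import Path

OUTBOX = Path("outbox.jsonl")

def queue_report(text, lat, lon, photo_path=None):
    """Append a report to the local outbox (works while offline)."""
    record = {"text": text, "lat": lat, "lon": lon,
              "photo": photo_path, "ts": time.time()}
    with OUTBOX.open("a") as f:
        f.write(json.dumps(record) + "\n")

def flush_outbox(upload, is_online):
    """Send queued reports when connectivity returns; keep failures queued."""
    if not OUTBOX.exists() or not is_online():
        return 0
    pending = [json.loads(line) for line in OUTBOX.read_text().splitlines() if line]
    remaining, sent = [], 0
    for record in pending:
        try:
            upload(record)            # e.g. an HTTP POST to the collection server
            sent += 1
        except OSError:
            remaining.append(record)  # leave failed uploads in the outbox
    OUTBOX.write_text("".join(json.dumps(r) + "\n" for r in remaining))
    return sent
```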


Meanwhile, the server side of GDACSmobile facilitates administrative tasks such as the curation of data submitted by app users and shared on Twitter. Other social media platforms may be added in the future, such as Flickr, to retrieve relevant pictures from disaster-affected areas (similar to GeoFeedia). The server-side moderation feature is used to ensure high data quality standards. But the ERCIS researchers are also open to computational solutions, which is one reason GDACSmobile is not a ‘data island’ and why other systems for computational analysis, microtasking etc., can be used to process the same dataset. The server also “offers a variety of JSON services to allow ‘foreign’ systems to access the data. [...] SQL queries can also be used with admin access to the server, and it would be very possible to export tables to spreadsheets [...].” 
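
Since the server reportedly exposes JSON services for “foreign” systems, a downstream tool might pull curated reports along the following lines. This is only a sketch: the endpoint URL, query parameter and response fields are assumptions, as the post does not document the actual API.

```python
# Sketch of how a "foreign" system might pull curated reports from the
# server's JSON services. The endpoint URL, query parameter and response
# fields are assumptions; the post only states that JSON services exist.
import json
from urllib.request import urlopen

def fetch_reports(base_url="https://example.org/gdacsmobile/api", event_id="demo"):
    """Fetch a list of report dictionaries for a given event."""
    with urlopen(f"{base_url}/reports?event={event_id}") as resp:
        return json.load(resp)

def count_by_category(reports):
    """Tally reports per (hypothetical) 'category' field."""
    counts = {}
    for r in reports:
        category = r.get("category", "uncategorized")
        counts[category] = counts.get(category, 0) + 1
    return counts
```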

I very much look forward to following GDACSmobile’s progress. Since Daniel and Adam have designed their app to be open and are themselves open to considering computational solutions, I have already begun to discuss with them our AIDR (Artificial Intelligence for Disaster Response) project at the Qatar Computing Research Institute (QCRI). I believe that making AIDR and GDACS interoperable would make a whole lot of sense. Until then, if you’re going to this year’s International Conference on Information Systems for Crisis Response and Management (ISCRAM 2013) in May, be sure to participate in the workshop (PDF) that Daniel and Adam are running there. The side-event will present the state of the art and future trends in rapid assessment tools to stimulate a conversation on current solutions and developments in mobile technologies for post-disaster data analytics and situational awareness. My colleague Dr. Imran Muhammad from QCRI will also be there to present findings from our crisis computing research, so I highly recommend connecting with him.


Crisis Mapping Syria: Automated Data Mining and Crowdsourced Human Intelligence

The Syria Tracker Crisis Map is without doubt one of the most impressive crisis mapping projects yet. Launched just a few weeks after the protests began one year ago, the crisis map is spearheaded by just a handful of US-based Syrian activists who have meticulously and systematically documented 1,529 reports of human rights violations including a total of 11,147 killings. As recently reported in this NewScientist article, “Mapping the Human Cost of Syria’s Uprising,” the crisis map “could be the most accurate estimate yet of the death toll in Syria’s uprising [...].” Their approach? “A combination of automated data mining and crowdsourced human intelligence,” which “could provide a powerful means to assess the human cost of wars and disasters.”

On the data-mining side, Syria Tracker has repurposed the HealthMap platform, which mines thousands of online sources for the purposes of disease detection and then maps the results, “giving public-health officials an easy way to monitor local disease conditions.” The customized version of this platform for Syria Tracker (ST), known as HealthMap Crisis, mines English information sources for evidence of human rights violations, such as killings, torture and detainment. As the ST Team notes, their data mining platform “draws from a broad range of sources to reduce reporting biases.” Between June 2011 and January 2012, for example, the platform collected over 43,000 news articles and blog posts from almost 2,000 English-based sources from around the world (including some pro-regime sources).
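
As a toy illustration of this kind of automated mining, the sketch below flags articles whose text mentions violation-related keywords. HealthMap’s real pipeline is far more sophisticated; the keyword list and Article structure here are purely illustrative.

```python
# Toy keyword filter in the spirit of the automated mining described above;
# the real HealthMap Crisis pipeline is far more sophisticated. The keyword
# list and the Article structure are purely illustrative.
from dataclasses import dataclass

VIOLATION_TERMS = {"killed", "killing", "shelling", "torture", "detained", "arrested"}

@dataclass
class Article:
    source: str
    title: str
    body: str

def flag_candidate_reports(articles):
    """Return articles whose text mentions any violation-related term."""
    flagged = []
    for art in articles:
        text = f"{art.title} {art.body}".lower()
        if any(term in text for term in VIOLATION_TERMS):
            flagged.append(art)
    return flagged

sample = [Article("example.net", "Shelling reported in Homs", "Residents say...")]
print(len(flag_candidate_reports(sample)))  # -> 1
```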

Syria Tracker combines the results of this sophisticated data mining approach with crowdsourced human intelligence, i.e., field-based eye-witness reports shared via webform, email, Twitter, Facebook, YouTube and voicemail. This naturally presents several important security issues, which explains why the main ST website includes an instructions page detailing security precautions that need to be taken while submitting reports from within Syria. They also link to this practical guide on how to protect your identity and security online and when using mobile phones. The guide is available in both English and Arabic.

Eye-witness reports are subsequently translated, geo-referenced, coded and verified by a group of volunteers who triangulate the information with other sources such as those provided by the HealthMap Crisis platform. They also filter the reports and remove duplicates. Reports that have a low confidence level vis-a-vis veracity are also removed. Volunteers use a Digg-style vote-up/vote-down feature to “score” the veracity of eye-witness reports. Using this approach, the ST Team and their volunteers have been able to verify almost 90% of the documented killings mapped on their platform thanks to video and/or photographic evidence. They have also been able to associate specific names with about 88% of those reported killed by Syrian forces since the uprising began.
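
The verification workflow described above can be pictured as a simple scoring rule: corroboration, votes and supporting media raise a report’s confidence, and low-confidence reports are dropped. The weights and threshold in this Python sketch are invented for illustration and are not Syria Tracker’s actual criteria.

```python
# Illustrative confidence scoring for eye-witness reports: corroboration,
# votes and supporting media raise the score; low-confidence reports are
# dropped. Weights and threshold are invented, not Syria Tracker's rules.
def confidence(upvotes, downvotes, corroborating_sources, has_media):
    score = 0.0
    total_votes = upvotes + downvotes
    if total_votes:
        score += 0.5 * (upvotes / total_votes)        # volunteer vote signal
    score += 0.1 * min(corroborating_sources, 3)      # capped corroboration bonus
    if has_media:                                     # photo or video evidence
        score += 0.2
    return min(score, 1.0)

def triage(reports, threshold=0.5):
    """Keep only reports whose confidence clears the threshold."""
    return [r for r in reports if confidence(**r["evidence"]) >= threshold]

report = {"id": 42, "evidence": {"upvotes": 8, "downvotes": 1,
                                 "corroborating_sources": 2, "has_media": True}}
print(round(confidence(**report["evidence"]), 3))  # -> 0.844
```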

Depending on the levels of violence in Syria, the turnaround time for a report to be mapped on Syria Tracker is one to three days. The team also produces weekly situation reports based on the data they’ve collected along with detailed graphical analysis. KML files that can be uploaded and viewed using Google Earth are also made available on a regular basis. These provide “a more precisely geo-located tally of deaths per location.”
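
For readers unfamiliar with KML, the snippet below shows how a handful of geo-located reports could be written out as a minimal KML file that Google Earth can open. The report fields are made up; only the KML structure itself is standard.

```python
# Minimal KML writer for geo-located reports like those mentioned above.
# The KML structure is standard; the report fields are made up.
def reports_to_kml(reports):
    placemarks = "\n".join(
        f"  <Placemark>\n"
        f"    <name>{r['name']}</name>\n"
        f"    <description>{r['date']}: {r['summary']}</description>\n"
        f"    <Point><coordinates>{r['lon']},{r['lat']},0</coordinates></Point>\n"
        f"  </Placemark>"
        for r in reports)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
            f"{placemarks}\n</Document>\n</kml>\n")

sample = [{"name": "Report 1", "date": "2012-02-24",
           "summary": "illustrative entry", "lat": 33.51, "lon": 36.29}]
with open("sample_reports.kml", "w") as f:
    f.write(reports_to_kml(sample))
```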

In sum, Syria Tracker is very much breaking new ground vis-a-vis crisis mapping. They’re combining automated data mining technology with crowdsourced eye-witness reports from Syria. In addition, they’ve been doing this for a year, which makes the project the longest-running crisis map I’ve seen in a hostile environment. Moreover, they’ve been able to sustain these important efforts with just a small team of volunteers. As for the veracity of the collected information, I know of no other public effort that has taken such a meticulous and rigorous approach to documenting the killings in Syria in near real-time. On February 24th, Al-Jazeera posted the following estimates:

Syrian Revolution Coordination Union: 9,073 deaths
Local Coordination Committees: 8,551 deaths
Syrian Observatory for Human Rights: 5,581 deaths

At the time, Syria Tracker had a total of 7,901 documented killings associated with specific names, dates and locations. While some duplicate reports may remain, the team argues that “missing records are a much bigger source of error.” Indeed, they believe that “the higher estimates are more likely, even if one chooses to disregard those reports that came in on some of the most violent days where names were not always recorded.”

The Syria Crisis Map itself has been viewed by visitors from 136 countries around the world and 2,018 cities, with the top 3 cities being Damascus, Washington DC and, interestingly, Riyadh, Saudi Arabia. The witnessing has thus been truly global and collective. When the Syrian regime falls, “the data may help subsequent governments hold him and other senior leaders to account,” writes the New Scientist. This was one of the principal motivations behind the launch of the Ushahidi platform in Kenya over four years ago. Syria Tracker is powered by Ushahidi’s cloud-based platform, Crowdmap. Finally, we know for a fact that the International Criminal Court (ICC) and Amnesty International (AI) closely followed the Libya Crisis Map last year.

Is Journalism Just Failed Crowdsourcing?

This provocative question materialized during a recent conversation I had with a Professor of Political Science while in New York this week. Major news companies like CNN have started to crowdsource citizen-generated news on the basis that “looking at the news from different angles gives us a deeper understanding of what’s going on.” CNN’s iReporter thus invites citizens to help shape the news “in order to paint a more complete picture of the news.”

This would imply that traditional journalism has provided a relatively incomplete picture of global events. So the question is, if crowdsourcing platforms had been available to journalists one hundred years ago, would they view these platforms as an exciting opportunity to get early leads on breaking stories? The common counter argument is: but crowdsourcing “opens the floodgates” of information and we simply can’t follow up on everything. Yes, but whoever said that every lead requires follow up?

Journalists are not always interested in following up on every lead that comes their way. They’ll select a few sources, interview them and then write up the story. What crowdsourcing citizen generated news does, however, is to provide them with many more leads to choose from. Isn’t this an ideal set up for a journalist? Instead of having to chase down leads across town, the leads come directly to them with names, phone numbers and email addresses.

Imagine that the field of journalism had started out using crowdsourcing platforms combined with investigative journalism. If these platforms were then outlawed for whatever reason, would investigative journalists be hindered in their ability to cover the news from different angles? Or would they still be able to paint an equally complete picture of the news?

Granted, one common criticism of citizen journalism is the lack of context it provides, especially on Twitter given the 140-character restriction. But surely 140 characters are plenty for the purposes of a potential lead. And if a mountain of Tweets started to point to the same lead story, then a professional journalist could take advantage of this information when deciding whether or not to follow up.


I also find the criticism against Twitter interesting coming from traditional journalists. In the early 1900s, large newspapers started hiring war correspondents “who used the new telegraph and expanding railways to move news faster to their newspapers.” However, the cost of sending telegrams forced reporters to develop a “new concise or ‘tight’ style of writing which became the standard for journalism through the next century.”

Today, the cost of hiring professional journalists means that a newspaper like the Herald of that era is not going to send a modern Henry Stanley to find a certain Dr. Livingstone in Africa. And besides, if the Herald had had global crowdsourcing platforms back in the 1870s, it may have instead used Twitter to crowdsource the coordinates of Dr. Livingstone.

This may imply that traditional journalism was primarily shaped by the constraints of the technology of the time. In a teleological sense, then, crowdsourcing may simply be the next phase in the future of journalism.

Patrick Philippe Meier

Crisis Information and The End of Crowdsourcing

When Wired journalist Jeff Howe coined the term crowdsourcing back in 2006, he did so in contradistinction to the term outsourcing and defined crowdsourcing as tapping the talent of the crowd. The tag line of his article was: “Remember outsourcing? Sending jobs to India and China is so 2003. The new pool of cheap labor: everyday people using their spare cycles to create content, solve problems, even do corporate R & D.”

If I had a tag line for this blog post it would be: “Remember crowdsourcing? Cheap labor to create content and solve problems using the Internet is so 2006. What’s new and cool today is the tapping of official and unofficial sources using new technologies to create and validate quality content.” I would call this allsourcing.

The word “crowdsourcing” is obviously a compound word that combines “crowd” and “sourcing”. But what exactly does “crowd” mean in this respect? And how has “sourcing” changed since Jeff introduced the term crowdsourcing over three-and-a-half years ago?

Let’s tackle the question of “sourcing” first. In his June 2006 article on crowdsourcing, Jeff provides case studies that all relate to novel applications of websites; perhaps the most famous example of crowdsourcing, Wikipedia, is also a website. But we’ve just recently seen some interesting uses of mobile phones to crowdsource information. See Ushahidi or Nathan Eagle’s talk at ETech09, for example:

So the word “sourcing” here goes beyond the website-based e-business approach that Jeff originally wrote about in 2006. The mobile technology component here is key. A “crowd” is not still. A crowd moves, especially in crisis, which is my area of interest. So the term “allsourcing” not only implies collecting information from all sources but also the use of “all” technologies to collect said information in different media.

As for the word “crowd”, I recently noted in this Ushahidi blog post that we may need some qualifiers—namely bounded and unbounded crowdsourcing. In other words, the term “crowd” can mean a large group of people (unbounded crowdsourcing) or perhaps a specific group (bounded crowdsourcing). Unbounded crowdsourcing implies that the identity of individuals reporting the information is unknown whereas bounded crowdsourcing would describe a known group of individuals supplying information.

The term “allsourcing” represents a combination of bounded and unbounded crowdsourcing coupled with new “sourcing” technologies. An allsourcing approach would combine information supplied by known/official sources and unknown/unofficial sources using the Web, e-mail, SMS, Twitter, Flickr, YouTube etc. I think the future of crowdsourcing is allsourcing because allsourcing combines the strengths of both bounded and unbounded approaches while reducing the constraints inherent to each individual approach.

Let me explain. One important advantage of unbounded crowdsourcing is the ability to collect information from unofficial sources. I consider this an advantage over bounded crowdsourcing since more information can be collected this way. The challenge, of course, is how to verify the validity of said information. Verifying information is by no means a new process, but unbounded crowdsourcing has the potential to generate a lot more information than bounded crowdsourcing since the former does not censor unofficial content. This presents a challenge.

At the same time, bounded crowdsourcing has the advantage of yielding reliable information since the reports are produced by known/official sources. However, bounded crowdsourcing is constrained to a relatively small number of individuals doing the reporting. Obviously, these individuals cannot be everywhere at the same time. But if we combined bounded and unbounded crowdsourcing, we would see an increase in (1) overall reporting, and (2) in the ability to validate reports from unknown sources.

The increased ability to validate information is due to the fact that official and unofficial sources can be triangulated when using an allsourcing approach. Given that official sources are considered trusted sources, any reports from unofficial sources that match official reports can be considered more reliable, along with their associated sources. And so the combined allsourcing approach in effect enables the identification of new reliable sources even if the identity of these sources remains unknown.
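
The triangulation logic sketched above can be expressed in a few lines: when an unofficial report matches a report from a trusted source, the unofficial source accumulates reliability. The matching rule and the scoring increment below are assumptions for illustration only.

```python
# Sketch of the allsourcing triangulation described above: unofficial reports
# that match a report from a trusted (official/bounded) source earn their
# source reliability credit. Matching rule and increment are assumptions.
from collections import defaultdict

def matches(a, b):
    """Hypothetical rule: same event type, same day, roughly the same place."""
    return (a["type"] == b["type"]
            and a["date"] == b["date"]
            and abs(a["lat"] - b["lat"]) < 0.1
            and abs(a["lon"] - b["lon"]) < 0.1)

def update_source_reliability(official_reports, unofficial_reports):
    reliability = defaultdict(float)
    for u in unofficial_reports:
        if any(matches(u, o) for o in official_reports):
            reliability[u["source"]] += 1.0   # corroborated by a trusted source
    return dict(reliability)
```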

Ushahidi is a good example of an allsourcing platform. Organizations can use Ushahidi to capture both official and unofficial sources using all kinds of new sourcing technologies. Allsourcing is definitely something new so there’s still much to learn. I have a hunch that there is huge potential. Jeff Howe titled his famous article in Wired “The Rise of Crowdsourcing.” Will a future edition of Wired include an article on “The Rise of Allsourcing”?

Patrick Philippe Meier

Three Common Misconceptions About Ushahidi

Cross posted on Ushahidi

Here are three interesting misconceptions about Ushahidi and crowdsourcing in general:

  1. Ushahidi takes the lead in deploying the Ushahidi platform
  2. Crowdsourced information is statistically representative
  3. Crowdsourced information cannot be validated

Let’s start with the first. We do not take the lead in deploying Ushahidi platforms. In fact, we often learn about new deployments second-hand via Twitter. We are a non-profit tech company and our goal is to continue developing innovative crowdsourcing platforms that cater to the growing needs of our current and prospective partners. We provide technical and strategic support when asked but otherwise you’ll find us in the backseat, which is honestly where we prefer to be. Our comparative advantage is not in deployment. So the credit for Ushahidi deployments really goes to the numerous organizations that continue to implement the platform in new and innovative ways.

On this note, keep in mind that the first downloadable Ushahidi platform was made available just this May, and the second version just last week. So implementing organizations have been remarkable test pilots, experimenting and learning on the fly without recourse to any particular manual or documented best practices. Most election-related deployments, for example, were even launched before May, when platform stability was still an issue and the code was still being written. So our hats go off to all the organizations that have piloted Ushahidi and continue to do so. They are the true pioneers in this space.

Also keep in mind that these organizations rarely had more than a month or two of lead-time before scheduled elections, like in India. If all of us have learned anything from watching these deployments in 2009, it is this: the challenge is not one of technology but election awareness and voter education. So we’re impressed that several organizations are already customizing the Ushahidi platform for elections that are more than 6-12 months away. These deployments will definitely be a first for Ushahidi and we look forward to learning all we can from implementing organizations.

The second misconception, “crowdsourced information is statistically representative,” often crops up in conversations around election monitoring. The problem is largely one of language. The field of election monitoring is hardly new. Established organizations have been involved in election monitoring for decades and have gained a wealth of knowledge and experience in this area. For these organizations, the term “election monitoring” has specific connotations, such as random sampling and statistical analysis, verification, validation and accredited election monitors.

When partners use Ushahidi for election monitoring, I think they mean something different. What they generally mean is citizen-powered election monitoring aided by crowdsourcing. Does this imply that crowdsourced information is statistically representative of all the events taking place across a given country? Of course not: I’ve never heard anyone suggest that crowdsourcing is equivalent to random sampling.

Citizen-powered election monitoring is about empowering citizens to take ownership over their elections and to have a voice. Indeed, elections do not start and stop at the polling booth. Should we prevent civil society groups from crowdsourcing crisis information on the basis that their reports may not be statistically representative? No. This is not our decision to make and the data is not even meant for us.

Another language-related problem has to do with the term “crowdsourcing”. The word “crowd” here can literally mean anyone (unbounded crowdsourcing) or a specific group (bounded crowdsourcing) such as designated election monitors. If these official monitors use Ushahidi and they are deliberately positioned across a country for random sampling purposes, then this becomes no different from standard and established approaches to election monitoring. Bounded crowdsourcing can be statistically representative.

The third misconception about Ushahidi has to do with the tradeoff between unbounded crowdsourcing and the validation of said crowdsourced information. One of the main advantages of unbounded crowdsourcing is the ability to collect a lot of information from a variety of sources and media—official and nonofficial sources—in near real time. Of course, this means that a lot more information can be reported at once, which can make the validation of said information a challenging process.

A common reaction to this challenge is to dismiss crowdsourcing altogether because unofficial sources may be unreliable or at worst deliberately misleading. Some organizations thus find it easier to write off all unofficial content because of these concerns. Ushahidi takes a different stance. We recognize that user-generated content is not about to disappear any time soon and that a lot of good can come out of such content, not least because official information can too easily become proprietary and guarded instead of shared.

So we’re not prepared to write off user-generated content because validating information happens to be challenging. Crowdsourcing crisis information is our business and so is (obviously) the validation of crowdsourced information. This is why Ushahidi is fully committed to developing Swift River. Swift is a free and open source platform that validates crowdsourced information in near real-time. Follow the Ushahidi blog for exciting updates!

Evolving a Global System of Info Webs

I’ve already blogged about what an ecosystem approach to conflict early warning and response entails. But I have done so with a country focus rather than thinking globally. This blog post applies a global perspective to the ecosystem approach given the proliferation of new platforms with global scalability.

Perhaps the most apt analogy here is one of food webs where the food happens to be information. Organisms in a food web are grouped into primary producers, primary consumers and secondary consumers. Primary producers such as grass harvest an energy source such as sunlight that they turn into biomass. Herbivores are primary consumers of this biomass while carnivores are secondary consumers of herbivores. There is thus a clear relationship known as a food chain.

This is an excellent video visualizing food web dynamics produced by researchers affiliated with the Santa Fe Institute (SFI):

Our information web (or Info Web) is also composed of multiple producers and consumers of information each interlinked by communication technology in increasingly connected ways. Indeed, primary producers, primary consumers and secondary consumers also crawl and dynamically populate the Info Web. But the shock of the information revolution is altering the food chains in our ecosystem. Primary consumers of information can now be primary producers, for example.

At the smallest unit of analysis, individuals are the primary producers of information. The mainstream media, social media, natural language parsing tools, crowdsourcing platforms, etc., arguably comprise the primary consumers of that information. Secondary consumers are larger organisms such as the global Emergency Information Service (EIS) and the Global Impact and Vulnerability Alert System (GIVAS).

These newly forming platforms are at different stages of evolution. EIS and GIVAS are relatively embryonic while the Global Disaster Alert and Coordination System (GDACS) and Google Earth are far more evolved. A relatively new organism in the Info Web is the UAV as exemplified by ITHACA. The BrightEarth Humanitarian Sensor Web (SensorWeb) is further along the information chain while Ushahidi’s Crisis Mapping platform and the Swift River driver are more mature but have not yet been deployed as a global instance.

InSTEDD’s GeoChat, Riff and Mesh4X solutions have already iterated through a number of generations. So have ReliefWeb and the Humanitarian Information Unit (HIU). There are of course additional organisms in this ecosystem, but the above list should suffice to demonstrate my point.

What if we connected these various organisms to catalyze a super organism? A Global System of Systems (GSS)? Would the whole—a global system of systems for crisis mapping and early warning—be greater than the sum of its parts? Before we can answer this question in any reasonable way, we need to know the characteristics of each organism in the ecosystem. These organisms represent the threads that may be woven into the GSS, a global web of crisis mapping and early warning systems.

Global System of Systems

Emergency Information Service (EIS) is slated to be a unified communications solution linking citizens, journalists, governments and non-governmental organizations in a seamless flow of timely, accurate and credible information—even when local communication infrastructures are rendered inoperable. This feature will be made possible by utilizing SMS as the communications backbone of the system.

In the event of a crisis, the EIS team would sift, collate, make sense of and verify the myriad of streams of information generated by a large humanitarian intervention. The team would gather information from governments, local media, the military, UN agencies and local NGOs to develop reporting that will be tailored to the specific needs of the affected population and translated into local languages. EIS would work closely with local media to disseminate messages of critical, life saving information.

Global Impact and Vulnerability Alert System (GIVAS) is being designed to closely monitor vulnerabilities and accelerate communication between the time a global crisis hits and when information reaches decision makers through official channels. The system is mandated to provide the international community with early, real-time evidence of how a global crisis is affecting the lives of the poorest and to provide decision-makers with real time information to ensure that decisions take the needs of the most vulnerable into account.

BrightEarth Humanitarian Sensor Web (SensorWeb) is specifically designed for UN field-based agencies to improve real time situational awareness. The dynamic mapping platform enables humanitarians to easily and quickly map infrastructure relevant for humanitarian response such as airstrips, bridges, refugee camps, IDP camps, etc. The SensorWeb is also used to map events of interest such as cholera outbreaks. The platform leverages mobile technology as well as social networking features to encourage collaborative analytics.

Ushahidi integrates web, mobile and dynamic mapping technology to crowdsource crisis information. The platform uses FrontlineSMS and can be deployed quickly as a crisis unfolds. Users can visualize events of interest on a dynamic map that also includes an animation feature to visualize the reported data over time and space.

Swift River is under development but designed to validate crowdsourced information in real time by combining machine learning for predictive tagging with human crowdsourcing for filtering purposes. The purpose of the platform is to create veracity scores to denote the probability of an event being true when reported across several media such as Twitter, online news, SMS, Flickr, etc.

GeoChat and Mesh4X could serve as the nodes connecting the above platforms in dynamic ways. Riff could be made interoperable with Swift River.

Can such a global Info Web be catalyzed? The question hinges on several factors, the most important of which are probably awareness and impact. The more these individual organisms know about each other, the better picture they will have of the potential synergies between their efforts and the more incentives they will find to collaborate. This is one of the main reasons I am co-organizing the first International Conference on Crisis Mapping (ICCM 2009) next week.

Patrick Philippe Meier

Is Crowdsourcing Really a Myth?

Dan Woods had an interesting piece in Forbes Magazine last month that labels crowdsourcing as a myth. As Dan puts it, the popular press and millions of people are deluded in thinking that “there is a crowd that solves problems better than individuals.”

Dan writes that…

  • The notion of crowds creating solutions appeals to our desire to believe that working together we can do anything, but in terms of innovation it is just ridiculous.
  • There is no crowd in crowdsourcing. There are only virtuosos, usually uniquely talented, highly trained people who have worked for decades in a field. [...] From their fervent brains spring new ideas. The crowd has nothing to do with it. The crowd solves nothing, creates nothing.
  • What really happens in crowdsourcing as it is practiced in a wide variety of contexts, from Wikipedia to open source to scientific research, is that a problem is broadcast to a large number of people with varying forms of expertise. Then individuals motivated by obsession, competition, money or all three apply their individual talent to creating a solution.
  • What bugs me is that misplaced faith in the crowd is a blow to the image of the heroic inventor. We need to nurture and fund inventors and give them time to explore, play and fail. A false idea of the crowd reduces the motivation for this investment, with the supposition that companies can tap the minds of inventors on the cheap.
  • Whatever term we use, let’s not call it crowdsourcing and pretend that 10,000 average Joes invent better products than Steve Jobs.

Dan certainly makes some valid points. But when Wired journalist Jeff Howe coined the term in 2006, he did so to differentiate the process from “outsourcing”, whence the term crowdsourcing originates. In his own words, crowdsourcing describes a process in which tasks are opened to anyone as a way “to tap the talent of the crowd.”

I looked up the definition of the word talent and sure enough the term can be used to describe both a person and a group. Clearly Dan takes issue with the semantics of the term crowdsourcing since he sees this as misleading and unfair to those who do most of the innovative work.

In the context of the Internet, the 1% rule or the 90-9-1 principle reflects an observation that “more people will lurk in a virtual community than will participate. This term is often used as a euphemism for participation inequality in the context of the Internet.” What is often overlooked, however, is that this 1% figure is one percent of a growing number, given increasing access to the Web around the world. So the 1% figure does constitute a crowd; a self-selected crowd for sure, but a crowd nevertheless.

In terms of innovation, new ideas are not isolated islands of thought. Ideas tend to ricochet off different minds. Innovation doesn’t occur in a vacuum; there is always context.

Why have innovation labs otherwise? Why cluster in Silicon Valley?

While I certainly respect individual talent and absolutely believe that individuals should get credit for their work, I feel that Dan’s article is bent on establishing ownership and safeguarding proprietary rights, i.e., control. This stands in contrast to the crowdsourcing approach which has individuals contribute without controlling.

Perhaps elitesourcing is the term that the author would prefer? But as James Surowiecki notes in his best seller, “The Wisdom of Crowds”:

Diversity and independence are important because the best collective decisions are the product of disagreement and contest, not consensus or compromise.

Under the right circumstances, groups are remarkably intelligent, and are often smarter than the smartest people in them. Groups do not need to be dominated by exceptionally intelligent people in order to be smart.

There is also confusion with respect to micro-motives and macro-behavior, or emergent behavior. Swarm intelligence also comes to mind. When we see a flock of birds seemingly “dancing” in the sky as if one entity, this is emergent behavior driven by local synchronization. Does this mean we value the individual birds any less? Of course not. When we say that “it takes a village to raise a child” do we devalue individual parents and family members? Surely not.

We credit the crowd because no one person lives in a vacuum and comes up with innovative ideas that are completely independent from their interaction with the outside world.

Patrick Philippe Meier

Accurate Crowdsourcing for Human Rights

This is a short video of the presentation I will be giving at the Leir Conference on The Next Generation of Human Rights. My talk focuses on the use of digital technologies to leverage the crowdsourcing and crowdfeeding of human rights information. I draw on Ushahidi’s Swift River initiative to describe how crowdsourced information can be auto-validated.

Here’s a copy of the agenda (PDF) along with more details. This Leir Conference aims to bring together world-leading human rights practitioners, advocates, and funders for discussions in an intimate setting. Three panels will be convened, with a focus on audience discussion with the panelists. The topics will include:

  1. Trends in Combating Human Rights Abuses;
  2. Human Rights 2.0: The Next Generation of Human Rights Organizations;
  3. Challenges and Opportunities of Technology for Human Rights.

I will be presenting on the third panel together with colleagues from Witness.org and The Diarna Project. For more details on the larger subject of my presentation, please see this blog post on peer-producing human rights.

The desired results of this conference are to allow participants to improve advocacy, funding, or operations through collaborative efforts and shared ideas in a natural setting.

Patrick Philippe Meier

Moving Forward with Swift River

This is an update on the latest Swift River open group meeting that took place this morning at the InSTEDD office in Palo Alto. Ushahidi colleague Kaushal Jhalla first proposed the idea behind Swift River after the terrorist attacks on Mumbai last November. Ushahidi has since taken on the initiative as a core project since the goal of Swift River is central to the group’s mission: the crowdsourcing of crisis information.

Kaushal and Chris Blow gave the first formal presentation of Swift River during our first Ushahidi strategy meeting in Orlando last March, where we formally established the Swift River group, which includes Andrew Turner, Sean Gourley, Erik Hersman and myself in addition to Kaushal and Chris. Andrew has played a pivotal role in getting Swift River and Vote Report India off the ground and I highly recommend reading his blog post on the initiative.

The group now includes several new friends of Ushahidi, a number of whom kindly shared their time and insights this morning after Chris kicked off the meeting to bring everyone up to speed.  The purpose of this blog post is to outline how I hope Swift River moves forward based on this morning’s fruitful session. Please see my previous blog post for an overview of the basic methodology.

The purpose of the Swift River platform, as I proposed this morning, is to provide two core services. The first, to borrow Gaurav Mishra‘s description, is to crowdsource the tagging of crisis information. The second is to triangulate the tagged information to assign reality scores to individual events. Confused? Not to worry, it’s actually really straightforward.

Crowdsourcing Tagging

Information on a developing crisis can be captured from several text-based sources, such as articles from online news media, Tweets and SMS, for example. Of course, video footage, pictures and satellite imagery can also provide important information, but we’re more interested in text-based data for now.

The first point to note is that information can range from being very structured to highly unstructured. The word structure is simply another way of describing how organized information is. A few examples are in order vis-a-vis text-based information.

A book is generally highly structured information. Why? Well, because the author hopefully used page numbers, chapter headings, paragraphs, punctuation, an index and table of contents. The fact that the book is structured makes it easier for the reader to find the information she is looking for. The other end of the “structure spectrum” would be a run-on sentence with nospacesandpunctuation. Not terribly helpful.

Below is a slide from a seminar I taught on disaster and conflict early warning back in 2006; ignore the (c).

[Slide: the tradeoff between control of data collection and the structure of the resulting data]

The slide above depicts the tradeoff between control and structure. We can impose structure on the data collected if we control the data entry process. Surveys are an example of a high-control process that yields high structure. We want high structure because this allows us to find and analyze the data more easily (cf. entropy). This has generally been the preferred approach, particularly amongst academics.

If we give up control, as one does when crowdsourcing crisis information, we open ourselves up to the possibility of having to deal with a range of structured and unstructured information. To make sense of this information typically requires data mining and natural language processing (NLP) techniques that can identify structure in said information. For example, we would want to identify nouns, verbs, places and dates in order to extract event-data.
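
One off-the-shelf way to pull out people, places and dates from free text is named-entity recognition. The sketch below uses spaCy (it assumes the en_core_web_sm model is installed); it is not Swift River’s actual parser, just an illustration of the kind of structure extraction described above.

```python
# One off-the-shelf way to pull "who / where / when" out of free text is
# named-entity recognition. This sketch uses spaCy and assumes the
# en_core_web_sm model is installed; it is not Swift River's actual parser.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_event_fields(text):
    doc = nlp(text)
    return {
        "who":   [e.text for e in doc.ents if e.label_ in ("PERSON", "ORG")],
        "where": [e.text for e in doc.ents if e.label_ in ("GPE", "LOC")],
        "when":  [e.text for e in doc.ents if e.label_ == "DATE"],
    }

print(extract_event_fields(
    "Protesters clashed with police in Mumbai on Wednesday, witnesses said."))
```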

One way to do this would be to automatically tag an article with the parameters “who, what, where and when.” A number of platforms such as Open Calais and Virtual Research Associate’s FORECITE already do this. However, these platforms are not customized for crowdsourcing of crisis information and most are entirely closed. (Note: I did consulting work for VRA many years ago).

So we need to draw on (and modify) relevant algorithms that are publicly available and provide a user-friendly interface for human oversight of the automated tagging (what we also referred to as crowdsourcing the filter). Here’s a proposed interface that Chris recently designed for Swift River.

[Mockup: proposed Swift River tagging interface, with the source text on the left and suggested tags on the right]

The idea would be to develop an algorithm that parses the text (on the left) and auto-suggests answers for the tags (on the right). The user would then confirm or correct the suggested tags and the algorithm would learn from its mistakes. In other words, the algorithm would become more accurate over time and the need for human oversight would decrease. In short, we’d be developing a data-driven ontology backed up by Freebase to provide semantic linkages.
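
A very reduced sketch of that confirm-or-correct loop might look like the following: the automated tagger proposes tags, a human confirms or fixes them, and fixes are remembered so the same text is handled correctly next time. A real system would update the underlying model rather than a lookup table; this is only meant to illustrate the feedback idea.

```python
# Very reduced sketch of the confirm-or-correct loop: the automated tagger
# proposes tags, a human confirms or fixes them, and fixes are remembered.
# A real system would retrain the underlying model instead of using a lookup.
class TagSuggester:
    def __init__(self, auto_tagger):
        self.auto_tagger = auto_tagger   # e.g. extract_event_fields above
        self.corrections = {}            # text -> human-approved tags

    def suggest(self, text):
        if text in self.corrections:     # learned answers take precedence
            return self.corrections[text]
        return self.auto_tagger(text)

    def record_feedback(self, text, approved_tags):
        self.corrections[text] = approved_tags
```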

VRA already does this, but (1) the data validation was carried out by one (poor) individual, (2) the articles were restricted to headlines from the Reuters and Agence France-Presse (AFP) newswires, and (3) the project did not draw on semantic analysis. The validation component entailed making sure that events described in the headlines were correctly coded by the parser and ensuring there were no duplicates. See VRA’s patent for the full methodology (PDF).

Triangulation and Scoring

The above tagging process would yield a highly structured event dataset like the example depicted below.

[Table: four structured event records produced by the tagging process]

We could then use simple machine analysis to cluster the same events together and thereby do away with any duplicate event-data. The four records above would then be collapsed into one record:

[Table: the four records collapsed into a single de-duplicated event record]

But that’s not all. We would use a simple weighting or scoring schema to assign a reality score to determine the probability that the event reported really happened. I already described this schema in my previous post so will just give one example: An event that is reported by more than one source is more likely to have happened. This increases the reality score of the event above and pushes it higher up the list. One could also score an event by the geographical proximity of the source to the reported event, and so on. These scores could be combined to give an overall score.
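
Putting the de-duplication and scoring steps together, a bare-bones version might group records on (what, where, when) and score each cluster by its number of independent sources. The grouping key and weights below are illustrative assumptions, not the actual Swift River schema.

```python
# Bare-bones version of the de-duplication and scoring steps: group records
# on (what, where, when), then score each cluster by its number of
# independent sources. Grouping key and weights are illustrative assumptions.
from collections import defaultdict

def collapse_duplicates(records):
    clusters = defaultdict(list)
    for rec in records:
        clusters[(rec["what"], rec["where"], rec["when"])].append(rec)
    return clusters

def reality_score(cluster, local_bonus=0.05):
    sources = {rec["source"] for rec in cluster}
    score = 1 - 0.5 ** len(sources)                    # more sources -> higher score
    score += local_bonus * sum(rec.get("source_is_local", False) for rec in cluster)
    return min(score, 1.0)

# clusters = collapse_duplicates(tagged_records)       # records from the tagging step
# ranked = sorted(clusters.values(), key=reality_score, reverse=True)
```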

Compelling Visualization

The database output above is not exactly compelling to most people. This is where we need some creative visualization techniques to render the information more intuitive and interesting. Here are a few thoughts. We could draw on Gapminder to visualize the triangulated event-data over time. We could also use the idea of a volume equalizer display.

[Image: a volume equalizer display]

This is not the best equalizer interface around for sure, but hopefully gets the point across. Instead of decibels on the Y-axis, we’d have probability scores that an event really happened. Instead of frequencies on the X-axis, we’d have the individual events. Since the data coming in is not static, the bars would bounce up and down as more articles/tweets get tagged and dumped into the event database.
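
A quick way to prototype that view is a bar chart with one bar per event and bar height equal to the event’s probability score. The data in this matplotlib sketch is made up; a real dashboard would redraw the bars as new tagged reports arrive.

```python
# Sketch of the "equalizer" view: one bar per event, bar height = probability
# the event really happened. The data is made up; a live dashboard would
# redraw the bars as new tagged reports arrive.
import matplotlib.pyplot as plt

events = ["Protest, Mumbai", "Bridge damaged", "Clinic closed", "Road blocked"]
scores = [0.92, 0.55, 0.31, 0.74]

plt.bar(events, scores)
plt.ylim(0, 1)
plt.ylabel("Probability event occurred")
plt.xticks(rotation=20, ha="right")
plt.title("Event 'equalizer' view (illustrative)")
plt.tight_layout()
plt.show()
```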

I think this would be an elegant way to visualize the data, not least because the animation would resemble the flow or waves of a swift river, while the volume-equalizer metaphor also suggests quieting the unwanted noise. For the actual Swift River interface, I’d prefer using more colors to denote different characteristics of each event and would give the user the option of double-clicking on a bar to drill down to the event sources and underlying text.

Patrick Philippe Meier

Video Introduction to Crisis Mapping

I’ve given many presentations on crisis mapping over the past two years but these were never filmed. So I decided to create this video presentation with narration in order to share my findings more widely and hopefully get a lot of feedback in the process. The presentation is not meant to be exhaustive although the video does run to about 30 minutes.

The topics covered in this presentation include:

  • Crisis Map Sourcing – information collection;
  • Mobile Crisis Mapping – mobile technology;
  • Crisis Mapping Visualization – data visualization;
  • Crisis Mapping Analysis – spatial analysis.

The presentation references several blog posts of mine in addition to several operational projects to illustrate the main concepts behind crisis mapping. The individual blog posts featured in the presentation are listed below:

This research is the product of a 2-year grant provided by Humanity United (HU) to the Harvard Humanitarian Initiative’s (HHI) Program on Crisis Mapping and Early Warning, where I am a doctoral fellow.

I look forward to any questions/suggestions you may have on the video primer!

Patrick Philippe Meier