Category Archives: Crowdsourcing

Code of Conduct: Cyber Crowdsourcing for Good

There is currently no unified code of conduct for digital crowdsourcing efforts in the development, humanitarian or human rights space. We therefore propose the following principles as a way to catalyze a conversation on these issues and to improve and/or expand this Code of Conduct as appropriate.

Screenshot: link to the open, editable Google Doc

This initial draft was put together by Kate Chapman, Brooke Simons and myself. The link above points to this open, editable Google Doc, so please feel free to contribute your thoughts by inserting comments where appropriate. Thank you.

An organization that launches a digital crowdsourcing project must:

  • Provide clear volunteer guidelines on how to participate in the project so that volunteers are able to contribute meaningfully.
  • Test their crowdsourcing platform prior to any project or pilot to ensure that the system will not crash due to obvious bugs.
  • Disclose the purpose of the project, exactly which entities will be using and/or have access to the resulting data, to what end exactly, over what period of time and what the expected impact of the project is likely to be.
  • Disclose whether volunteer contributions to the project will or may be used as training data in subsequent machine learning research.
  • Not ask volunteers to carry out any illegal tasks.
  • Explain any risks (direct and indirect) that may come with volunteer participation in a given project. To this end, carry out a risk assessment and produce corresponding risk mitigation strategies.
  • Clearly communicate if the results of volunteer tasks will or are likely to be sold to partners/clients.
  • Limit the level of duplication required (for data quality assurance) to a reasonable number based on previous research and experience. In sum, do not waste volunteers’ time and do not offer tasks that are not meaningful. When all tasks have been carried out, inform volunteers accordingly.
  • Be fully transparent on the results of the project even if the results are poor or unusable.
  • Only launch a full-scale crowdsourcing project if they are able to analyze the results and deliver the findings within a timeframe that provides added value to end-users of the data.

An organization that launches a digital crowdsourcing project should:

  • Share as much of the resulting data with volunteers as possible without violating data privacy or the principle of Do No Harm.
  • Enable volunteers to opt out of having their contributions used in subsequent machine learning research.
  • Assess how many digital volunteers are likely to be needed for a project and recruit accordingly. Using additional volunteers just because they are available is not appropriate. Should recruitment nevertheless exceed need, inform volunteers as soon as their input is no longer needed and, where possible, offer them options for redirecting their efforts.
  • Explain that the same crowdsourcing task (microtask) may/will be given to multiple digital volunteers for data control purposes. This often reassures volunteers who initially lack confidence when contributing to a project.

Automatically Classifying Text Messages (SMS) for Disaster Response

Humanitarian organizations like the UN and Red Cross often face a deluge of social media data when disasters strike areas with a large digital footprint. This explains why my team and I have been working on AIDR (Artificial Intelligence for Disaster Response), a free and open source platform to automatically classify tweets in real-time. Given that the vast majority of the world’s population does not tweet, we’ve teamed up with UNICEF’s Innovation Team to extend our AIDR platform so users can also automatically classify streaming SMS.

BulkSMS_graphic

After the Haiti Earthquake in 2010, the main mobile network operator there (Digicel) offered to send an SMS to each of their 1.4 million subscribers (at the time) to accelerate our disaster needs assessment efforts. We politely declined since we didn’t have any automated (or even semi-automated) way of analyzing incoming text messages. With AIDR, however, we should (theoretically) be able to classify some 1.8 million SMS (and tweets) per hour. Enabling humanitarian organizations to make sense of the “Big Data” generated by affected communities is obviously key for two-way communication with said communities during disasters, hence our work at QCRI on “Computing for Good”.
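
The classification step above can be sketched with a tiny supervised text classifier. This is only an illustrative analogue, not AIDR’s actual implementation (AIDR is a Java-based platform whose classifiers are trained on labels supplied by digital volunteers); the messages and labels below are invented:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSMS:
    """Tiny multinomial Naive Bayes for SMS triage (illustrative sketch only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of messages
        self.vocab = set()

    def train(self, message, label):
        """Add one volunteer-labeled message to the model."""
        tokens = message.lower().split()
        self.word_counts[label].update(tokens)
        self.label_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, message):
        """Return the most probable label for an incoming message."""
        tokens = message.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # log prior plus log likelihoods with add-one smoothing
            score = math.log(self.label_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

classifier = NaiveBayesSMS()
for message, label in [
    ("bridge collapsed need rescue", "relevant"),
    ("water rising in my street", "relevant"),
    ("happy birthday", "not_relevant"),
    ("see you at the party", "not_relevant"),
]:
    classifier.train(message, label)

print(classifier.classify("street flooded people trapped"))  # → relevant
```

In AIDR’s workflow, the labeled examples come from volunteers tagging a small sample of messages; the trained classifier then handles the full stream automatically, which is what makes throughput in the millions of messages per hour possible.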

AIDR/SMS applications are certainly not limited to disaster response. In fact, we plan to pilot the AIDR/SMS platform for a public health project with our UNICEF partners in Zambia next month and with other partners in early 2015. While still experimental, I hope the platform will eventually be robust enough for use in response to major disasters, allowing humanitarian organizations to poll affected communities and make sense of the resulting needs in near real-time. Millions of text messages could be automatically classified according to the Cluster System, for example, and the results communicated back to local communities via community radio stations, as described here.

These are still very early days, of course, but I’m typically an eternal optimist, so I hope that our research and pilots do show promising results. Either way, we’ll be sure to share the full outcome of said pilots publicly so that others can benefit from our work and findings. In the meantime, if your organization is interested in piloting and learning with us, then feel free to get in touch.

bio

Integrating Geo-Data with Social Media Improves Situational Awareness During Disasters

A new data-driven study on the flooding of River Elbe in 2013 (one of the most severe floods ever recorded in Germany) shows that geo-data can enhance the process of extracting relevant information from social media during disasters. The authors use “specific geographical features like hydrological data and digital elevation models to prioritize crisis-relevant twitter messages.” The results demonstrate that an “approach based on geographical relations can enhance information extraction from volunteered geographic information,” which is “valuable for both crisis response and preventive flood monitoring.” These conclusions thus support a number of earlier studies that show the added value of data integration. This analysis also confirms several other key assumptions, which are important for crisis computing and disaster response.

floods elbe

The authors apply a “geographical approach to prioritize [the collection of] crisis-relevant information from social media.” More specifically, they combine information from “tweets, water level measurements & digital elevation models” to answer the following three research questions:

  • Does the spatial and temporal distribution of flood-related tweets actually match the spatial and temporal distribution of the flood phenomenon (despite Twitter bias, potentially false info, etc)?
  • Does the spatial distribution of flood-related tweets differ depending on their content?
  • Is geographical proximity to flooding a useful parameter to prioritize social media messages in order to improve situation awareness?
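
The third question, geographic prioritization, can be sketched as ranking geotagged messages by their distance to known flood locations. A minimal sketch; the coordinates below are rough, hypothetical stand-ins for Elbe-area flood locations, not data from the study:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def prioritize(tweets, flood_points):
    """Rank geotagged tweets by distance to the nearest flooded location."""
    def min_distance(tweet):
        return min(haversine_km(tweet["lat"], tweet["lon"], lat, lon)
                   for lat, lon in flood_points)
    return sorted(tweets, key=min_distance)

# Hypothetical flood locations along the Elbe (near Wittenberg and Magdeburg)
flood_points = [(51.86, 12.64), (52.13, 11.63)]
tweets = [
    {"id": 1, "lat": 48.14, "lon": 11.58},  # Munich, far from the flooding
    {"id": 2, "lat": 52.12, "lon": 11.62},  # very close to Magdeburg
]
print([t["id"] for t in prioritize(tweets, flood_points)])  # → [2, 1]
```

Sorting by distance to the nearest flooded location is the simplest possible prioritization; the study itself combines this with water level measurements and digital elevation models to decide which areas count as flood-affected.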

The authors analyzed just over 60,000 disaster-related tweets generated in Germany during the flooding of River Elbe in June 2013. Only 398 of these tweets (0.7%) contained keywords related to the flooding. The geographical distribution of flood-related tweets versus non-flood related tweets is depicted below (click to enlarge).

Map: geographical distribution of flood-related versus non-flood-related tweets

As the authors note, “a considerable amount” of flood-related tweets are geo-located in areas of major flooding. So they compared the distance between flood-related tweets and the actual flooding, which they found to be “statistically significantly lower compared to non-related Twitter messages.” This finding “implies that the locations of flood-related twitter messages and flood-affected catchments match to a certain extent. In particular this means that mostly people in regions affected by the flooding or people close to these regions posted twitter messages referring to the flood.” To this end, major urban areas like Munich and Hamburg were not the source of most flood-related tweets. Instead, “The majority of tweets referring to the flooding were posted by locals” closer to the flooding.
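
This kind of distance comparison can be illustrated with a simple one-sided permutation test. The distance values below are invented for the sketch, not taken from the study:

```python
import random

def permutation_test(sample_a, sample_b, n_permutations=10000, seed=42):
    """One-sided permutation test: is mean(sample_a) lower than mean(sample_b)?
    Returns the p-value (fraction of shuffles at least as extreme)."""
    rng = random.Random(seed)
    observed = sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b)
    pooled = sample_a + sample_b
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        a = pooled[:len(sample_a)]
        b = pooled[len(sample_a):]
        if sum(a) / len(a) - sum(b) / len(b) <= observed:
            count += 1
    return count / n_permutations

# Hypothetical distances (km) to the nearest flood-affected catchment
flood_related = [2.1, 4.7, 3.3, 1.8, 5.0, 2.9]
off_topic = [48.2, 12.5, 80.1, 33.7, 95.4, 60.3]
p = permutation_test(flood_related, off_topic)
print(p)  # small p-value, well below 0.05
```

A small p-value indicates that the observed gap in mean distances is very unlikely under random labeling, mirroring the paper’s finding that flood-related tweets sit significantly closer to the flooding than off-topic messages.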

Given that “most flood-related tweets were posted by locals it seems probable that these messages contain local knowledge only available to people on site.” To this end, the authors analyzed the “spatial distribution of flood-related tweets depending on their content.” The results, depicted below (click to enlarge), show that the geographical distribution of tweets does indeed differ based on their content. This is especially true of tweets containing information about “volunteer actions” and “flood level”. The authors confirm these results are statistically significant when compared with tweets related to “media” and “other” issues.

Map: spatial distribution of flood-related tweets by content category

These findings also reveal that the content of Twitter messages can be combined into three groups given their distance to actual flooding:

Group A: flood-level and volunteer-related tweets are closest to the floods.
Group B: tweets on traffic conditions are a medium distance from the floods.
Group C: other and media-related tweets are furthest from the flooding.

Tweets belonging to “Group A” yield greater situational awareness. “Indeed, information about current flood levels is crucial for situation awareness and can complement existing water level measurements, which are only available for determined geographical points where gauging stations are located. Since volunteer actions are increasingly organized via social media, this is a type of information which is very valuable and completely missing from other sources.”

Chart: distance to the flooding by tweet content group

In sum, these results show that “twitter messages that are closest to the flood-affected areas (Group A) are also the most useful ones.” The authors thus conclude that “the distance to flood phenomena is indeed a useful parameter to prioritize twitter messages towards improving situation awareness.” To be sure, the spatial distribution of flood-related tweets is “significantly different from the spatial distribution of off-topic messages.” Whether this is also true of other social media platforms like Instagram and Flickr remains to be seen. This is an important area for future research given the increasing use of pictures posted on social media for rapid damage assessments in the aftermath of disasters.

ImageClicker

“The integration of other official datasets, e.g. precipitation data or satellite images, is another avenue for future work towards better understanding the relations between social media and crisis phenomena from a geographical perspective.” I would add both aerial imagery (captured by UAVs) and data from mainstream news (captured by GDELT) to this data fusion exercise. Of course, the geographical approach described above is not limited to the study of flooding only but could be extended to other natural hazards.

This explains why my colleagues at GeoFeedia may be on the right track with their crisis mapping platform. That said, the main limitation with GeoFeedia and the study above is the fact that only 3% of all tweets are actually geo-referenced. But this need not be a deal breaker. Instead, platforms like GeoFeedia can be complemented by other crisis computing solutions that prioritize the analysis of social media content over geography.

Take the free and open-source “Artificial Intelligence for Disaster Response” (AIDR) platform that my team and I at QCRI are developing. Humanitarian organizations can use AIDR to automatically identify tweets related to flood levels and volunteer actions (deemed to provide the most situational awareness) without requiring that tweets be geo-referenced. In addition, AIDR can also be used to identify eyewitness tweets regardless of whether they refer to flood levels, volunteering or other issues. Indeed, we already demonstrated that eyewitness tweets can be automatically identified with an accuracy of 80-90% using AIDR. And note that AIDR can also be used on geo-tagged tweets only.

The authors of the above study recently got in touch to explore ways that their insights can be used to further improve AIDR. So stay tuned for future updates on how we may integrate geo-data more directly within AIDR to improve situational awareness during disasters.


See also:

  • Debating the Value of Tweets For Disaster Response (Intelligently) [link]
  • Social Media for Emergency Management: Question of Supply and Demand [link]
  • Become a (Social Media) Data Donor and Save a Life [link]

May the Crowd Be With You

Three years ago, 167 digital volunteers and I combed through satellite imagery of Somalia to support the UN Refugee Agency (UNHCR) on this joint project. The purpose of this digital humanitarian effort was to identify how many Somalis had been displaced (easily 200,000) due to fighting and violence. Earlier this year, 239 passengers and crew went missing when Malaysia Flight 370 suddenly disappeared. In response, some 8 million digital volunteers mobilized as part of the digital search & rescue effort that followed.

May the Crowd be With You

So in the first case, 168 volunteers were looking for 200,000+ people displaced by violence and in the second case, some 8,000,000 volunteers were looking for 239 missing souls. Last year, in response to Typhoon Haiyan, digital volunteers spent 200 hours or so tagging social media content in support of the UN’s rapid disaster damage assessment efforts. According to responders at the time, some 11 million people in the Philippines were affected by the Typhoon. In contrast, well over 20,000 years of volunteer time went into the search for Flight 370’s missing passengers.

What to do about this heavily skewed distribution of volunteer time? Can (or should) we do anything? Are we simply left with “May the Crowd be with You”?

The massive (and as yet unparalleled) online response to Flight 370 won’t be a one-off. We’re entering an era of mass-sourcing where entire populations can be mobilized online. What happens when future mass-sourcing efforts ask digital volunteers to look for military vehicles and aircraft in satellite images taken of a mysterious, unnamed “enemy country” for unknown reasons? Think this is far-fetched? As noted in my forthcoming book, Digital Humanitarians, this online, crowdsourced military surveillance operation already took place (at least once).

As we continue heading towards this new era of mass-sourcing, those with the ability to mobilize entire populations online will indeed wield an impressive new form of power. And as millions of volunteers continue tagging and tracing various features, this volunteer-generated data combined with machine learning will be used to automate the future tagging and tracing needs of militaries and multi-billion dollar companies, thus obviating the need for large volumes of volunteers (especially handy should volunteers seek to boycott these digital operations).

At the same time, however, the rise of this artificial intelligence may level the playing field. But few players out there have ready access to high resolution satellite imagery and the actual technical expertise to turn volunteer-generated tags/traces into machine learning classifiers. To this end, perhaps one way forward is to try and “democratize” access to both satellite imagery and the technology needed to make sense of this “Big Data”. Easier said than done. But maybe less impossible than we may think. Perhaps new, disruptive initiatives like Planet Labs will help pave the way forward.


Proof: How Crowdsourced Election Monitoring Makes a Difference

My colleagues Catie Bailard & Steven Livingston have just published the results of their empirical study on the impact of citizen-based crowdsourced election monitoring. Readers of iRevolution may recall that my doctoral dissertation analyzed the use of crowdsourcing in repressive environments and specifically during contested elections. This explains my keen interest in the results of my colleagues’ new data-driven study, which suggests that crowdsourcing does have a measurable and positive impact on voter turnout.

Reclaim Naija

Catie and Steven are “interested in digitally enabled collective action initiatives” spearheaded by “nonstate actors, especially in places where the state is incapable of meeting the expectations of democratic governance.” They are particularly interested in measuring the impact of said initiatives. “By leveraging the efficiencies found in small, incremental, digitally enabled contributions (an SMS text, phone call, email or tweet) to a public good (a more transparent election process), crowdsourced elections monitoring constitutes [an] important example of digitally-enabled collective action.” To be sure, “the successful deployment of a crowdsourced elections monitoring initiative can generate information about a specific political process—information that would otherwise be impossible to generate in nations and geographic spaces with limited organizational and administrative capacity.”

To this end, their new study tests for the effects of citizen-based crowdsourced election monitoring efforts on the 2011 Nigerian presidential elections. More specifically, they analyzed close to 30,000 citizen-generated reports of failures, abuses and successes which were publicly crowdsourced and mapped as part of the Reclaim Naija project. Controlling for a number of factors, Catie and Steven find that the number and nature of crowdsourced reports are “significantly correlated with increased voter turnout.”
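
As a toy illustration of the kind of relationship being tested (the study itself runs a full regression with controls, which a simple two-variable correlation cannot replicate), here is a Pearson correlation over invented district-level numbers:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-district data: crowdsourced report counts vs. turnout (%)
reports = [12, 45, 8, 60, 22, 35, 5, 50]
turnout = [41, 58, 39, 63, 47, 52, 36, 60]
print(round(pearson_r(reports, turnout), 2))  # strongly positive
```

A strong positive coefficient on data like this would say only that districts with more reports also saw higher turnout; establishing that the reports caused resource reallocation, as the authors argue, requires the controls and qualitative evidence in the full study.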

Reclaim Naija 2

What explains this correlation? The authors “do not argue that this increased turnout is a result of crowdsourced reports increasing citizens’ motivation or desire to vote.” They emphasize that their data does not speak to individual citizen motivations. Instead, Catie and Steven show that “crowdsourced reports provided operationally critical information about the functionality of the elections process to government officials. Specifically, crowdsourced information led to the reallocation of resources to specific polling stations (those found to be in some way defective by information provided by crowdsourced reports) in preparation for the presidential elections.”

(As an aside, this finding is also relevant for crowdsourced crisis mapping efforts in response to natural disasters. In these situations, citizen-generated disaster reports can—and in some cases do—provide humanitarian organizations with operationally critical information on disaster damage and resulting needs).

In sum, “the electoral deficiencies revealed by crowdsourced reports […] provided actionable information to officials that enabled them to reallocate election resources in preparation for the presidential election […]. This strengthened the functionality of those polling stations, thereby increasing the number of votes that could be successfully cast and counted–an argument that is supported by both quantitative and qualitative data brought to bear in this analysis.” Another important finding is that the resulting “higher turnout in the presidential election was of particular benefit to the incumbent candidate.” As Catie and Steven rightly note, “this has important implications for how various actors may choose to utilize the information generated by new [technologies].”

In conclusion, the authors argue that “digital technologies fundamentally change information environments and, by doing so, alter the opportunities and constraints that the political actors face.” This new study is an important contribution to the literature and should be required reading for anyone interested in digitally-enabled, crowdsourced collective action. Of course, the analysis focuses on “just” one case study, which means that the effects identified in Nigeria may not occur in other crowdsourced, election monitoring efforts. But that’s another reason why this study is important—it will no doubt catalyze future research to determine just how generalizable these initial findings are.


See also:

  • Traditional Election Monitoring Versus Crowdsourced Monitoring: Which Has More Impact? [link]
  • Artificial Intelligence for Monitoring Elections (AIME) [link]
  • Automatically Classifying Crowdsourced Election Reports [link]
  • Evolution in Live Mapping: The Egyptian Elections [link]

Piloting MicroMappers: How to Become a Digital Ranger in Namibia (Revised!)

Many thanks to all of you who have signed up to search and protect Namibia’s beautiful wildlife! (There’s still time to sign up here; you’ll receive an email on Friday, September 26th with the link to volunteer).

Our MicroMappers Wildlife Challenge will launch on Friday, September 26th and run through Sunday, September 28th. More specifically, we’ll begin the search for Namibia’s wildlife at 12noon Namibia time that Friday (which is 12noon Geneva, 11am London, 6am New York, 6pm Shanghai, 7pm Tokyo, 8pm Sydney). You can join the expedition at any time after this. Simply add yourself to this list-serve to participate. Anyone who can get online can be a digital ranger, no prior experience necessary. We’ll continue our digital search until sunset on Sunday evening.

Namibia Map 1

As noted here, rangers at Kuzikus Wildlife Reserve need our help to find wild animals in their reserve. This will help our ranger friends to better protect these beautiful animals from poachers and other threats. According to the rangers, “Rhino poaching continues to be a growing problem that threatens to extinguish some rhino species within a decade or two. Rhino monitoring is thus important for their protection. Using digital maps in combination with MicroMappers to trace aerial images of rhinos could greatly improve rhino monitoring efforts.”

Namibia Map 2

At 12noon Namibia time on Friday, September 26th, we’ll send an email to the above list-serve with the link to our MicroMappers Aerial Clicker, which we’ll use to crowdsource the search for Namibia’s wildlife. We’ll also publish a blog post on MicroMappers.org with the link. Here’s what the Clicker looks like (click to enlarge the Clicker):

MM Aerial Clicker Namibia

When we find animals, we’ll draw “digital shields” around them. Before we show you how to draw these shields and what types of animals we’ll be looking for, here are examples of helpful shields (versus unhelpful ones); note that we’ve had to change these instructions, so please review them carefully! 

MM Rhino Zoom

This looks like two animals! So let’s draw two shields.

MM Rhine New YES

The white outlines are the shields that we drew using the Aerial Clicker above. Notice that our shields include the shadows of the animals; this is important. If the animals are close to each other, the shields can overlap, but there can only be one shield per animal (one shield per rhino in this case : )

MM Rhino New NO

These shields are too close to the animals, please give them more room!

MM Rhino No

These shields are too big.

If you’ve found something that may be an animal but you’re not sure, then please draw a shield anyway just in case. Don’t worry if most pictures don’t have any animals. Knowing where the animals are not is just as important as knowing where they are!

MM Giraffe Zoom

This looks like a giraffe! So let’s draw a shield.

MM Giraffe No2

This shield does not include the giraffe’s shadow! So let’s try again.

MM Giraffe No

This shield is too large. Let’s try again!

MM Giraffe New YES

Now that’s perfect!

Here are some more pictures of animals that we’ll be looking for. As a digital ranger, you’ll simply need to draw shields around these animals, that’s all there is to it. The shields can overlap if need be, but remember: one shield per animal, include their shadows and give them some room to move around : )

MM Ostritch

Can you spot the ostriches? Click the picture above to enlarge. You’ll be able to zoom in with the Aerial Clicker during the Wildlife Challenge.

MM Oryx

Can you spot the five oryxes in the picture above? (Actually, there may be a 6th one; can you see it in the shadows?)

MM Impala

And the impalas in the left of the picture? Again, you’ll be able to zoom in with the Aerial Clicker.

So how exactly does this Aerial Clicker work? Here’s a short video that shows just how easy it is to draw a digital shield using the Clicker (note that we’ve had to change the instructions, so please review this video carefully!):

Thanks for reading and for watching! The results of this expedition will help rangers in Namibia make sure they have found all the animals, which is important for their wildlife protection efforts. We’ll have thousands of aerial photographs to search through next week, which means that our ranger friends in Namibia need as much help as possible! So this is where you come in: please spread the word and invite your friends, families and colleagues to search and protect Namibia’s beautiful wildlife.

MicroMappers is a joint project with the United Nations (OCHA), and the purpose of this pilot is also to test the Aerial Clicker for future humanitarian response efforts. More here. Any questions or suggestions? Feel free to email me at patrick@iRevolution.net or add them in the comments section below.

Thank you!

Piloting MicroMappers: Crowdsourcing the Analysis of UAV Imagery for Disaster Response

New update here!

UAVs are increasingly used in humanitarian response. We have thus added a new Clicker to our MicroMappers collection. The purpose of the “Aerial Clicker” is to crowdsource the tagging of aerial imagery captured by UAVs in humanitarian settings. Trying out new technologies during major disasters can pose several challenges, however. So we’re teaming up with Drone Adventures, Kuzikus Wildlife Reserve, Polytechnic of Namibia, and l’École Polytechnique Fédérale de Lausanne (EPFL) to try out our new Clicker using high-resolution aerial photographs of wild animals in Namibia.

Kuzikus1

As part of their wildlife protection efforts, rangers at Kuzikus want to know how many animals (and what kinds) are roaming about their wildlife reserve. So Kuzikus partnered with Drone Adventures and EPFL’s Cooperation and Development Center (CODEV) and the Laboratory of Geographic Information Systems (LASIG) to launch the SAVMAP project, which stands for “Near real-time ultrahigh-resolution imaging from unmanned aerial vehicles for sustainable land management and biodiversity conservation in semi-arid savanna under regional and global change.” SAVMAP was co-funded by CODEV through LASIG. You can learn more about their UAV flights here.

Our partners are interested in experimenting with crowdsourcing to make sense of this aerial imagery and raise awareness about wildlife in Namibia. As colleagues at Kuzikus recently told us, “Rhino poaching continues to be a growing problem that threatens to extinguish some rhino species within a decade or two. Rhino monitoring is thus important for their protection. One problematic is to detect rhinos in large areas and/or dense bush areas. Using digital maps in combination with MicroMappers to trace aerial images of rhinos could greatly improve rhino monitoring efforts.” 

So our pilot project serves two goals: 1) Trying out the new Aerial Clicker for future humanitarian deployments; 2) Assessing whether crowdsourcing can be used to correctly identify wild animals.

MM Aerial Clicker

Can you spot the zebras in the aerial imagery above? If so, you’re already a digital ranger! No worries, you won’t need to know that those are actually zebras, you’ll simply outline any animals you find (using your mouse) and click on “Add my drawings.” Yes, it’s that easy : )

We’ll be running our Wildlife Challenge from September 26th-28th. To sign up for this digital expedition to Namibia, simply join the MicroMappers list-serve here. We’ll be sure to share the results of the Challenge with all volunteers who participate and with our partners in Namibia. We’ll also be creating a wildlife map based on the results so our friends know where the animals have been spotted (by you!).

MM_Rhino

Given that rhino poaching continues to be a growing problem in Namibia (and elsewhere), we will obviously not include the location of rhinos in our wildlife map. You’ll still be able to look for and trace rhinos (like those above) as well as other animals like ostriches, oryxes & giraffes, for example. Hint: shadows often reveal the presence of wild animals!

MM_Giraffe

Drone Adventures hopes to carry out a second mission in Namibia early next year. So if we’re successful in finding all the animals this time around, then we’ll have the opportunity to support the Kuzikus Reserve again in their future protection efforts. Either way, we’ll be better prepared for the next humanitarian disaster thanks to this pilot. MicroMappers is developed by QCRI and is a joint project with the United Nations Office for the Coordination of Humanitarian Affairs (OCHA).

Any questions or suggestions? Feel free to email me at patrick@iRevolution.net or add them in the comments section below. Thank you!