Raphael Hörler from ETH Zurich has just completed his thesis on the role of crowdsourcing in humanitarian action. His valuable research offers one of the most up-to-date and comprehensive reviews of the principal players and humanitarian technologies in action today. In short, I highly recommend this important resource. Raphael’s full thesis is available here (PDF).
There is currently no unified code of conduct for digital crowdsourcing efforts in the development, humanitarian or human rights space. As such, we propose the following principles (displayed below) as a way to catalyze a conversation on these issues and to improve and/or expand this Code of Conduct as appropriate.
This initial draft was put together by Kate Chapman, Brooke Simons and myself. The link above points to this open, editable Google Doc. So please feel free to contribute your thoughts by inserting comments where appropriate. Thank you.
An organization that launches a digital crowdsourcing project must:
- Provide clear volunteer guidelines on how to participate in the project so that volunteers are able to contribute meaningfully.
- Test their crowdsourcing platform prior to any project or pilot to ensure that the system will not crash due to obvious bugs.
- Disclose the purpose of the project, exactly which entities will be using and/or have access to the resulting data, to what end, over what period of time, and what the expected impact of the project is likely to be.
- Disclose whether volunteer contributions to the project will or may be used as training data in subsequent machine learning research.
- Not ask volunteers to carry out any illegal tasks.
- Explain any risks (direct and indirect) that may come with volunteer participation in a given project. To this end, carry out a risk assessment and produce corresponding risk mitigation strategies.
- Clearly communicate whether the results of volunteer tasks will be, or are likely to be, sold to partners/clients.
- Limit the level of duplication required (for data quality assurance) to a reasonable number based on previous research and experience. In sum, do not waste volunteers’ time and do not offer tasks that are not meaningful. When all tasks have been carried out, inform volunteers accordingly.
- Be fully transparent on the results of the project even if the results are poor or unusable.
- Only launch a full-scale crowdsourcing project if they are able to analyze the results and deliver the findings within a timeframe that provides added value to end-users of the data.
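The duplication principle above can be made concrete in a task queue. The following is a minimal Python sketch, not the design of any particular platform: the class `MicrotaskQueue`, its `max_redundancy` cap and the tile IDs are all hypothetical. It stops handing out a microtask once enough volunteers have answered it, never gives a volunteer the same task twice, and reports when every task is finished so that volunteers can be told.

```python
from collections import defaultdict
from typing import Optional


class MicrotaskQueue:
    """Hands out microtasks while capping how many volunteers answer each one.

    Hypothetical sketch: `max_redundancy` should come from prior research and
    experience, not from how many volunteers happen to be available.
    """

    def __init__(self, task_ids: list[str], max_redundancy: int = 3) -> None:
        if max_redundancy < 1:
            raise ValueError("max_redundancy must be at least 1")
        self.task_ids = list(task_ids)
        self.max_redundancy = max_redundancy
        # task id -> volunteers who have already answered that task
        self.done_by: dict[str, set[str]] = defaultdict(set)

    def next_task(self, volunteer: str) -> Optional[str]:
        """Return a task that still needs answers and this volunteer has not done."""
        for task in self.task_ids:
            answered = self.done_by[task]
            if len(answered) < self.max_redundancy and volunteer not in answered:
                return task
        return None  # nothing meaningful left for this volunteer

    def record(self, task: str, volunteer: str) -> None:
        """Store that `volunteer` has answered `task`."""
        self.done_by[task].add(volunteer)

    def all_done(self) -> bool:
        """True once every task has reached its redundancy cap."""
        return all(len(self.done_by[t]) >= self.max_redundancy for t in self.task_ids)


# Usage: two tiles, each to be seen by two volunteers, three volunteers arriving.
queue = MicrotaskQueue(["tile-001", "tile-002"], max_redundancy=2)
for volunteer in ["ana", "ben", "chloe"]:
    while (task := queue.next_task(volunteer)) is not None:
        queue.record(task, volunteer)
print(queue.all_done())  # True: "chloe" received no task and should be told so
```

Returning `None` rather than recycling already-completed tasks is the point of the design: a volunteer who arrives after the work is done gets a clear signal instead of busywork.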
An organization that launches a digital crowdsourcing project should:
- Share as much of the resulting data with volunteers as possible without violating data privacy or the principle of Do No Harm.
- Enable volunteers to opt out of having their contributions used in subsequent machine learning research.
- Assess how many digital volunteers are likely to be needed for a project and recruit accordingly. Using additional volunteers just because they are available is not appropriate. Should recruitment nevertheless exceed need, adjust the project so that volunteers are informed as soon as their inputs are no longer needed, and possibly give them options for redirecting their efforts.
- Explain that the same crowdsourcing task (microtask) may or will be given to multiple digital volunteers for data quality control purposes. This often reassures volunteers who initially lack confidence when contributing to a project.
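Recruitment needs and redundancy can be estimated together. The sketch below assumes a simple model in which the total workload is tasks × redundancy × minutes per task and each volunteer contributes a fixed number of minutes; both functions and all numbers are hypothetical. The redundant answers to a single microtask are then combined by simple majority vote, which is one common quality-control rule among several, not necessarily what any given project uses.

```python
import math
from collections import Counter
from typing import Optional


def volunteers_needed(n_tasks: int, redundancy: int,
                      minutes_per_task: float, minutes_per_volunteer: float) -> int:
    """Rough head count: total volunteer-minutes needed / minutes one person gives."""
    total_minutes = n_tasks * redundancy * minutes_per_task
    return math.ceil(total_minutes / minutes_per_volunteer)


def majority_label(labels: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Most common answer if it exceeds `min_agreement` of all answers, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None


# 10,000 microtasks, each seen by 3 volunteers, 30 seconds per task,
# and roughly one hour contributed per volunteer (all assumptions).
print(volunteers_needed(10_000, 3, 0.5, 60))             # 250
print(majority_label(["damage", "damage", "no damage"]))  # damage
print(majority_label(["damage", "no damage"]))            # None (a tie)
```

A `None` result is a natural trigger for sending the task to one more volunteer, which keeps duplication tied to actual disagreement rather than fixed in advance.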
Humanitarian organizations like the UN and Red Cross often face a deluge of social media data when disasters strike areas with a large digital footprint. This explains why my team and I have been working on AIDR (Artificial Intelligence for Disaster Response), a free and open source platform that automatically classifies tweets in real time. Given that the vast majority of the world’s population does not tweet, we’ve teamed up with UNICEF’s Innovation Team to extend AIDR so that users can also automatically classify streaming SMS.
After the Haiti Earthquake in 2010, the main mobile network operator there (Digicel) offered to send an SMS to each of its 1.4 million subscribers (at the time) to accelerate our disaster needs assessment efforts. We politely declined since we didn’t have any automated (or even semi-automated) way of analyzing incoming text messages. With AIDR, however, we should (theoretically) be able to classify some 1.8 million SMS (and tweets) per hour. Enabling humanitarian organizations to make sense of “Big Data” generated by affected communities is key to two-way communication with those communities during disasters, hence our work at QCRI on “Computing for Good”.
AIDR/SMS applications are certainly not limited to disaster response. In fact, we plan to pilot the AIDR/SMS platform for a public health project with our UNICEF partners in Zambia next month and with other partners in early 2015. While the platform is still experimental, I hope it will eventually be robust enough for use in response to major disasters, allowing humanitarian organizations to poll affected communities and make sense of the resulting needs in near real time. Millions of text messages could, for example, be automatically classified according to the Cluster System, and the results communicated back to local communities via community radio stations, as described here.
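AIDR’s internals are not described in this post, so the following is only an illustration of the general idea of supervised classification of short messages: a tiny multinomial naive Bayes classifier trained on a handful of human-labeled messages. The categories (`water`, `shelter`, `medical`) and the training messages are made up, and a real deployment would need far more labeled data and likely a stronger model.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split a message into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled examples (e.g., tagged by digital volunteers).
messages = [
    "we need clean drinking water",
    "no water left in the village",
    "house collapsed we need shelter tents",
    "families sleeping outside need tents",
    "injured people need a doctor",
    "clinic has no medicine for the injured",
]
labels = ["water", "water", "shelter", "shelter", "medical", "medical"]

clf = NaiveBayesTextClassifier().fit(messages, labels)
print(clf.predict("water needed urgently"))      # water
print(clf.predict("the injured need medicine"))  # medical
```

Naive Bayes is used here only because it fits in a few lines; the human-labeled examples are what make any such classifier work.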
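For a sense of scale, here is a back-of-the-envelope calculation using the figures above, assuming each of Digicel’s 1.4 million subscribers sent exactly one reply and that AIDR sustained its theoretical rate:

```latex
\begin{aligned}
\frac{1.8 \times 10^{6}\ \text{messages}}{3600\ \text{s}} &= 500\ \text{messages/s} \\
t = \frac{1.4 \times 10^{6}\ \text{messages}}{1.8 \times 10^{6}\ \text{messages/h}} &\approx 0.78\ \text{h} \approx 47\ \text{min}
\end{aligned}
```

Under those assumptions, replies from the entire subscriber base could in principle have been classified in under an hour.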
These are still very early days, of course, but I’m typically an eternal optimist, so I hope that our research and pilots do show promising results. Either way, we’ll be sure to share the full outcome of said pilots publicly so that others can benefit from our work and findings. In the meantime, if your organization is interested in piloting and learning with us, then feel free to get in touch.
Three years ago, 167 digital volunteers and I combed through satellite imagery of Somalia to support the UN Refugee Agency (UNHCR) on this joint project. The purpose of this digital humanitarian effort was to identify how many Somalis had been displaced (easily 200,000) due to fighting and violence. Earlier this year, 239 passengers and crew went missing when Malaysia Airlines Flight 370 suddenly disappeared. In response, some 8 million digital volunteers mobilized as part of the digital search & rescue effort that followed.
So in the first case, 168 volunteers were looking for 200,000+ people displaced by violence and in the second case, some 8,000,000 volunteers were looking for 239 missing souls. Last year, in response to Typhoon Haiyan, digital volunteers spent 200 hours or so tagging social media content in support of the UN’s rapid disaster damage assessment efforts. According to responders at the time, some 11 million people in the Philippines were affected by the Typhoon. In contrast, well over 20,000 years of volunteer time went into the search for Flight 370’s missing passengers.
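To make the skew concrete, here is the arithmetic using only the figures quoted above, and assuming a year of 8,760 hours (365 × 24):

```latex
\begin{aligned}
\text{Somalia:}\quad & \frac{200{,}000\ \text{displaced}}{168\ \text{volunteers}} \approx 1{,}190\ \text{people per volunteer} \\
\text{MH370:}\quad & \frac{8{,}000{,}000\ \text{volunteers}}{239\ \text{missing}} \approx 33{,}473\ \text{volunteers per missing person} \\
& 20{,}000\ \text{yr} \times 8{,}760\ \text{h/yr} = 1.752 \times 10^{8}\ \text{h}
  \;\Rightarrow\; \frac{1.752 \times 10^{8}\ \text{h}}{239} \approx 7.3 \times 10^{5}\ \text{h per missing person} \\
\text{Haiyan:}\quad & \frac{200\ \text{h}}{1.1 \times 10^{7}\ \text{affected}} \approx 1.8 \times 10^{-5}\ \text{h} \approx 0.065\ \text{s per affected person}
\end{aligned}
```

Since the 20,000-year figure is a lower bound, the MH370 time per missing person is, if anything, higher.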
What to do about this heavily skewed distribution of volunteer time? Can (or should) we do anything? Are we simply left with “May the Crowd be with You”? The massive (and as yet unparalleled) online response to Flight 370 won’t be a one-off. We’re entering an era of mass-sourcing where entire populations can be mobilized online. What happens when future mass-sourcing efforts ask digital volunteers to look for military vehicles and aircraft in satellite images taken of a mysterious, unnamed “enemy country” for unknown reasons? Think this is far-fetched? As noted in my forthcoming book, Digital Humanitarians, such an online, crowdsourced military surveillance operation has already taken place (at least once).
As we continue heading towards this new era of mass-sourcing, those with the ability to mobilize entire populations online will indeed wield an impressive new form of power. And as millions of volunteers continue tagging and tracing various features, this volunteer-generated data, combined with machine learning, will be used to automate the future tagging and tracing needs of militaries and multi-billion dollar companies, thus obviating the need for large numbers of volunteers (especially handy should volunteers seek to boycott these digital operations).
At the same time, however, the rise of this artificial intelligence may also help level the playing field. Yet few players out there have ready access to high-resolution satellite imagery and the technical expertise needed to turn volunteer-generated tags/traces into machine learning classifiers. To this end, perhaps one way forward is to try and “democratize” access to both satellite imagery and the technology needed to make sense of this “Big Data”. Easier said than done, but maybe less impossible than we think. Perhaps new, disruptive initiatives like Planet Labs will help pave the way forward.
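As an illustration of how volunteer tags become training data, the sketch below fits a nearest-centroid classifier, one of the simplest supervised methods, to image tiles that volunteers have already tagged, and then labels new tiles automatically. The two-number feature vectors and the tag names are hypothetical stand-ins for whatever features a real imagery pipeline would extract; nothing here reflects any specific organization’s system.

```python
import math
from collections import defaultdict

Features = tuple[float, ...]


def train_centroids(tiles: list[Features], labels: list[str]) -> dict[str, Features]:
    """Average the feature vectors of all tiles that share a volunteer tag."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for x, label in zip(tiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[label][i] += value
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in vec) for label, vec in sums.items()}


def classify(tile: Features, centroids: dict[str, Features]) -> str:
    """Assign the tag whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(tile, centroids[label]))


# Hypothetical per-tile features (e.g., edge density, brightness) with consensus volunteer tags.
tiles = [(0.9, 0.8), (0.85, 0.75), (0.2, 0.1), (0.25, 0.15)]
tags = ["structure", "structure", "empty", "empty"]

centroids = train_centroids(tiles, tags)
print(classify((0.8, 0.7), centroids))  # structure
print(classify((0.3, 0.2), centroids))  # empty
```

Once a model like this is accurate enough, new tiles no longer need a human tagger, which is precisely the dynamic described above.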