Raphael Hörler of ETH Zurich has just completed his thesis on the role of crowdsourcing in humanitarian action. His valuable research offers one of the most up-to-date and comprehensive reviews of the principal players and humanitarian technologies in action today. In short, I highly recommend this important resource. Raphael’s full thesis is available here (PDF).
There is currently no unified code of conduct for digital crowdsourcing efforts in the development, humanitarian, or human rights space. As such, we propose the following principles as a way to catalyze a conversation on these issues and to improve and/or expand this Code of Conduct as appropriate.
This initial draft was put together by Kate Chapman, Brooke Simons, and me. It lives in an open, editable Google Doc (linked above), so please feel free to contribute your thoughts by inserting comments where appropriate. Thank you.
An organization that launches a digital crowdsourcing project must:
- Provide clear guidelines on how to participate in the project so that volunteers are able to contribute meaningfully.
- Test their crowdsourcing platform prior to any project or pilot to ensure that the system will not crash due to obvious bugs.
- Disclose the purpose of the project: exactly which entities will use and/or have access to the resulting data, to what end, over what period of time, and what the expected impact of the project is likely to be.
- Disclose whether volunteer contributions to the project will or may be used as training data in subsequent machine learning research.
- Not ask volunteers to carry out any illegal tasks.
- Explain any risks (direct and indirect) that may come with volunteer participation in a given project. To this end, carry out a risk assessment and produce corresponding risk mitigation strategies.
- Clearly communicate whether the results of volunteer tasks will or are likely to be sold to partners/clients.
- Limit the level of duplication required (for data quality assurance) to a reasonable number based on previous research and experience. In sum, do not waste volunteers’ time and do not offer tasks that are not meaningful. When all tasks have been carried out, inform volunteers accordingly.
- Be fully transparent on the results of the project even if the results are poor or unusable.
- Only launch a full-scale crowdsourcing project if they are able to analyze the results and deliver the findings within a timeframe that provides added value to end-users of the data.
An organization that launches a digital crowdsourcing project should:
- Share as much of the resulting data with volunteers as possible without violating data privacy or the principle of Do No Harm.
- Enable volunteers to opt out of having their contributions used as training data in subsequent machine learning research or studies.
- Assess how many digital volunteers are likely to be needed for a project and recruit accordingly. Using additional volunteers just because they are available is not appropriate. Should recruitment nevertheless exceed need, inform volunteers as soon as their inputs are no longer needed and, where possible, give them options for redirecting their efforts.
- Explain that the same crowdsourcing task (microtask) may or will be given to multiple digital volunteers for data quality control purposes. This often reassures volunteers who initially lack confidence when contributing to a project. (A sketch of how such redundant judgments can be aggregated follows this list.)
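To make the duplication principle concrete, here is a minimal sketch, assuming a simple majority vote with a minimum-agreement threshold, of how redundant volunteer judgments might be combined into a single label. The task IDs, labels, and thresholds below are hypothetical illustrations of one common approach, not a prescribed implementation:

```python
from collections import Counter

# Hypothetical input: each microtask is shown to several volunteers,
# and each volunteer returns one label.
judgments = {
    "task-001": ["relevant", "relevant", "not_relevant"],
    "task-002": ["relevant", "relevant", "relevant"],
    "task-003": ["relevant", "not_relevant"],  # still awaiting a third judgment
}

MIN_JUDGMENTS = 3      # duplication level, chosen from prior research/experience
MIN_AGREEMENT = 2 / 3  # fraction of volunteers who must agree

def aggregate(labels):
    """Return the majority label, or None if the task needs more judgments."""
    if len(labels) < MIN_JUDGMENTS:
        return None  # keep the task open rather than record a low-confidence result
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= MIN_AGREEMENT else None

for task_id, labels in judgments.items():
    print(task_id, aggregate(labels))
```

Keeping MIN_JUDGMENTS small is precisely what the “reasonable number” principle above asks for: each additional redundant judgment buys diminishing returns in accuracy while costing real volunteer time.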
Humanitarian organizations like the UN and Red Cross often face a deluge of social media data when disasters strike areas with a large digital footprint. This explains why my team and I have been working on AIDR (Artificial Intelligence for Disaster Response), a free and open source platform that automatically classifies tweets in real time. Given that the vast majority of the world’s population does not tweet, we’ve teamed up with UNICEF’s Innovation Team to extend our AIDR platform so users can also automatically classify streaming SMS.
After the Haiti Earthquake in 2010, the main mobile network operator there (Digicel) offered to send an SMS to each of their 1.4 million subscribers (at the time) to accelerate our disaster needs assessment efforts. We politely declined since we didn’t have any automated (or even semi-automated) way of analyzing incoming text messages. With AIDR, however, we should (theoretically) be able to classify some 1.8 million SMSs (and tweets) per hour. Enabling humanitarian organizations to make sense of the “Big Data” generated by affected communities is obviously key for two-way communication with said communities during disasters, hence our work at QCRI on “Computing for Good”.
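For readers curious what this kind of real-time classification involves under the hood, below is a minimal sketch of the general technique: an incremental (online) linear classifier over hashed text features, written in Python with scikit-learn. To be clear, this is not AIDR’s actual code or API, and the training messages are invented; it only illustrates why such classification is fast enough for the throughput mentioned above (1.8 million messages per hour is 1,800,000 / 3,600 = 500 per second, well within reach of a linear model on commodity hardware):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hashing keeps memory constant no matter how many messages stream in.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
classifier = SGDClassifier(loss="log_loss")  # online logistic regression

classes = ["relevant", "not_relevant"]

# Invented training examples; in a real deployment these labels would come
# from digital volunteers tagging a sample of incoming messages.
batch_texts = [
    "bridge collapsed on main road, people trapped",
    "our clinic urgently needs water and medicine",
    "great weather today, going to the beach",
]
batch_labels = ["relevant", "relevant", "not_relevant"]

# partial_fit lets the model learn batch by batch as messages arrive,
# which is what makes high-throughput, streaming classification feasible.
classifier.partial_fit(vectorizer.transform(batch_texts), batch_labels, classes=classes)

# Classifying a new incoming SMS or tweet:
print(classifier.predict(vectorizer.transform(["school roof destroyed, need shelter"])))
```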
AIDR/SMS applications are certainly not limited to disaster response. In fact, we plan to pilot the AIDR/SMS platform for a public health project with our UNICEF partners in Zambia next month and with other partners in early 2015. While the platform is still experimental, I hope it will eventually be robust enough for use in response to major disasters, allowing humanitarian organizations to poll affected communities and make sense of the resulting needs in near real time. Millions of text messages could be automatically classified according to the Cluster System, for example, and the results communicated back to local communities via community radio stations, as described here.
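To illustrate that last point, the same sketch extends to the Cluster System case simply by swapping in multi-class labels. The cluster names below are real cluster categories, but everything else (the training texts and their labels) is invented for illustration:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
classifier = SGDClassifier(loss="log_loss")

# A few of the humanitarian clusters used in the Cluster System.
clusters = ["Health", "Shelter", "Food Security", "WASH"]

# Invented example messages, one per cluster, standing in for
# volunteer-labeled training data.
texts = [
    "clinic has no more antibiotics",
    "family of six sleeping outside since the storm",
    "no food distribution reached our village yet",
    "well water smells bad, children getting sick",
]
labels = ["Health", "Shelter", "Food Security", "WASH"]

classifier.partial_fit(vectorizer.transform(texts), labels, classes=clusters)
print(classifier.predict(vectorizer.transform(["roof torn off, nowhere to sleep"])))
```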
These are still very early days, of course, but I’m an eternal optimist, so I hope our research and pilots show promising results. Either way, we’ll be sure to share the full outcomes of these pilots publicly so that others can benefit from our work and findings. In the meantime, if your organization is interested in piloting and learning with us, please feel free to get in touch.