My colleague Kirk Morris recently pointed me to this very neat study on iterative versus parallel models of crowdsourcing for the analysis of satellite imagery. The study was carried out by French researcher and engineer Nicolas Maisonneuve for the upcoming GIScience 2012 conference.
Nicolas finds that after reaching a certain threshold, adding more volunteers to the parallel model does “not change the representativeness of opinion and thus will not change the consensual output.” His analysis also shows that the value of this threshold has a significant impact on the quality of the parallel work and should therefore be chosen carefully. As for the iterative approach, Nicolas finds that “the first iterations have a high impact on the final results due to a path dependency effect.” To this end, “stronger commitment during the first steps are thus a primary concern for using such model,” which means that “asking expert/committed users to start” is important.
Nicolas’s study also reveals that the parallel approach is better able than the iterative model to correct wrong annotations (wrong analysis of the satellite imagery) for images that are fairly straightforward to interpret. In contrast, the iterative model is better suited to handling more ambiguous imagery. But there is a catch: the potential path dependency effect in the iterative model means that “mistakes could be propagated, generating more easily type I errors as the iterations proceed.” In terms of spatial coverage, the iterative model is more efficient, since the parallel model relies on redundancy (several volunteers annotating the same image) to ensure data quality. Still, Nicolas concludes that the “parallel model provides an output which is more reliable than that of a basic iterative [because] the latter is sensitive to vandalism or knowledge destruction.”
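To make the contrast concrete, here is a rough Python sketch of the two models. It is not Nicolas's implementation: the function names, the `min_agreement` parameter, and the reading of the "threshold" as a minimum share of agreeing votes are all my own assumptions for illustration.

```python
from collections import Counter

def parallel_consensus(votes, min_agreement=0.6):
    """Parallel model (hypothetical sketch): volunteers annotate independently.

    Keep the majority label only if its share of the votes reaches
    min_agreement, an assumed stand-in for the study's threshold.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

def iterative_result(edits):
    """Iterative model (hypothetical sketch): each volunteer sees the
    previous answer and either keeps it (None) or overwrites it,
    so the most recent edit wins.
    """
    current = None
    for edit in edits:
        if edit is not None:
            current = edit
    return current

# In the parallel model, a single wrong vote is outvoted by the others.
print(parallel_consensus(["building", "building", "field", "building"]))

# In the iterative model, an early mistake survives if nobody corrects it.
print(iterative_result(["field", None, None]))
```

The second call is the path dependency problem in miniature: whoever goes first sets the answer that everyone else inherits.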
So the question that naturally follows is this: how can parallel and iterative methodologies be combined to produce a better overall result? Perhaps the parallel approach could be used as the default to begin with. Images that prove difficult to interpret would then get pushed from the parallel workflow to the iterative workflow, where they would first be processed by experts in order to create favorable path dependency. Could this hybrid approach be the winning strategy?
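Here is one way such a hybrid could be sketched in Python. Everything in it is hypothetical: the function name, the parameters, and especially the assumption that "difficult to interpret" can be approximated by low agreement among the parallel volunteers. The study does not prescribe any of this.

```python
from collections import Counter

def hybrid_annotate(votes, expert_label, later_edits, min_agreement=0.6):
    """Hypothetical hybrid workflow: parallel first, iterative as fallback.

    votes         independent labels from the parallel workflow
    expert_label  label an expert would give as the first iteration
    later_edits   subsequent volunteer edits (None means "keep as is")
    min_agreement assumed share of votes needed to accept a consensus
    Returns (label, workflow_used).
    """
    # Step 1: try the parallel model by default.
    if votes:
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_agreement:
            return label, "parallel"

    # Step 2: low agreement is treated as a sign of ambiguity (assumption),
    # so the image moves to the iterative workflow, seeded by an expert.
    current = expert_label
    for edit in later_edits:
        if edit is not None:
            current = edit
    return current, "iterative"

# Clear image: the crowd agrees, no expert needed.
print(hybrid_annotate(["hut", "hut", "hut", "tent"], "hut", []))

# Ambiguous image: no consensus, so the expert's label starts the chain.
print(hybrid_annotate(["hut", "tent", "rock"], "tent", [None, None]))
```

Note that the sketch still lets later volunteers overwrite the expert, so it inherits the iterative model's sensitivity to vandalism; a real system would presumably need some safeguard there.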