Why Bounded Crowdsourcing is Important for Crisis Mapping and Beyond

I coined the term “bounded crowdsourcing” a couple of years back to distinguish the approach from other methodologies for information collection. As tends to happen, some Muggles (in the humanitarian community) ridiculed the term. They freaked out about the semantics instead of trying to understand the underlying concept. It’s not their fault, though: they’ve never been to Hogwarts and have never taken Crowdsourcery 101 (joke!).

Open crowdsourcing, or “unbounded crowdsourcing,” refers to the collection of information with no intentional constraints. Anyone who hears about an effort to crowdsource information can participate. This definition is in line with the original description put forward by Jeff Howe: outsourcing a task to a generally large group of people in the form of an open call.

In contrast, the point of “bounded crowdsourcing” is to start with a small number of trusted individuals and to have each of them invite, say, 3 additional individuals to join the project: people they fully trust and can vouch for. After joining and working on the project, these individuals in turn invite 3 additional people they fully trust, and so on, at an exponential rate if desired. Just as crowdsourcing is nothing new in the field of statistics, neither is “bounded crowdsourcing”; its analog is snowball sampling.
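
To make the “exponential rate” concrete, here is a minimal Python sketch of how quickly a bounded crowd could grow. The function name, the seed group of 5 and the assumption that every newcomer invites exactly 3 new, distinct people are mine, purely for illustration; real invitation chains will overlap and stall.

```python
def bounded_crowd_size(seed_count, invites_per_person, rounds):
    """Return the cumulative number of participants after each round.

    Illustrative assumptions: only the newest wave sends invitations,
    and every invitation brings in a new, distinct person.
    """
    sizes = [seed_count]          # round 0: just the trusted seed group
    newest_wave = seed_count
    for _ in range(rounds):
        newest_wave *= invites_per_person      # each newcomer invites others
        sizes.append(sizes[-1] + newest_wave)  # running total of participants
    return sizes


# Hypothetical project: 5 trusted seeds, 3 invitations each, 4 rounds.
print(bounded_crowd_size(seed_count=5, invites_per_person=3, rounds=4))
# [5, 20, 65, 200, 605]
```

Under these idealized assumptions, five trusted people become more than six hundred after only four rounds of invitations.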

In snowball sampling, a number of individuals are identified who meet certain criteria but, unlike in purposive sampling, they are asked to recommend others who also meet these same criteria, thus expanding the network of participants. Although these “bounded” methods are unlikely to produce representative samples, they are more likely to produce trustworthy information. In addition, there are times when it may be the best (or indeed the only) method available. Incidentally, a recent study that analyzed various field research methodologies for conflict environments concluded that snowball sampling was the most effective method (Cohen and Arieli 2011).

I introduced the concept of bounded crowdsourcing to the field of crisis mapping in response to concerns over the reliability of crowdsourced information. One excellent real-world case study of bounded crowdsourcing for crisis response is this remarkable example from Kyrgyzstan. The “boundary” in bounded crowdsourcing is dynamic and can grow exponentially very quickly. Participants may not all know each other (just like in open crowdsourcing), so in some ways they become a crowd, but one bounded by an invite-only criterion.

I have since recommended this approach to several groups using the Ushahidi platform, such as the #OWS movement. The statistical method known as snowball sampling is decades old. So I’m not introducing a new technique; I’m simply applying a conventional approach from statistics to the field of crisis mapping and calling it bounded to distinguish the methodology from regular crowdsourcing efforts. What is different and exciting about combining snowball sampling with crowdsourcing is that a far larger group can be sampled, a lot more quickly and also more cost-effectively, given today’s real-time, free social networking platforms.

Demystifying Crowdsourcing: An Introduction to Non-Probability Sampling

The use of crowdsourcing may be relatively new to the technology, business and humanitarian sectors, but when it comes to statistics, crowdsourcing is a well-known and established sampling method. Crowdsourcing is just non-probability sampling. The crowdsourcing of crisis information is simply an application of non-probability sampling.

Let’s first review probability sampling, in which every unit in the population being sampled has a known probability (greater than zero) of being selected. This approach makes it possible to “produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection.”
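
The weighting idea in that quote can be sketched in a few lines of Python. This is only an illustration of inverse-probability weighting (often called the Horvitz-Thompson estimator); the function name and the household figures are hypothetical, not data from any real survey.

```python
def estimate_population_total(sampled_values, selection_probabilities):
    """Estimate a population total by weighting each sampled unit
    by the inverse of its known probability of selection."""
    if len(sampled_values) != len(selection_probabilities):
        raise ValueError("need one selection probability per sampled value")
    total = 0.0
    for value, probability in zip(sampled_values, selection_probabilities):
        if not 0 < probability <= 1:
            raise ValueError("selection probabilities must be in (0, 1]")
        total += value / probability  # a unit sampled with p=0.25 stands in for 4 units
    return total


# Hypothetical example: people per household in three sampled households,
# each with a known probability of having been selected.
household_sizes = [4, 2, 6]
selection_probabilities = [0.5, 0.25, 0.5]
print(estimate_population_total(household_sizes, selection_probabilities))
# 28.0
```

The estimate only works because every probability is known and greater than zero, which is exactly what non-probability sampling cannot guarantee.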

Non-probability sampling, on the other hand, describes an approach in which some units of the population have no chance of being selected or where the probability of selection cannot be accurately determined. An example is convenience sampling. The main drawback of non-probability sampling techniques is that “information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.”

There are several advantages, however. First, non-probability sampling is a quick way to collect and analyze data in a range of settings with diverse populations. The approach is also a “cost-efficient means of greatly increasing the sample, thus enabling more frequent measurement.” In some cases, non-probability sampling may actually be the only approach available, a common constraint in a lot of research, including many medical studies, not to mention Ushahidi Haiti. The method is also used in exploratory research, e.g., for hypothesis generation, especially when attempting to determine whether a problem exists or not.

The point is that non-probability sampling can save lives, many lives. Much of the data used for medical research is the product of convenience sampling. The patients who see a doctor or are hospitalized are not a representative sample of the population. Should the medical field throw away all this data because it constitutes non-probability sampling? Of course not; that would be ludicrous.

The notion of bounded crowdsourcing, which I blogged about here, is also a known sampling technique called purposive sampling. This approach involves targeting experts or key informants. Snowball sampling is another type of non-probability sampling, which may also be applied to the crowdsourcing of crisis information.

In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study. You then ask them to recommend others who they may know who also meet the criteria. Although this method would hardly lead to representative samples, there are times when it may be the best method available. Snowball sampling is especially useful when you are trying to reach populations that are inaccessible or hard to find.
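
For readers who think in code, the procedure just described can be sketched as a wave-by-wave walk over a referral network. This is a minimal Python illustration, not a survey tool: the function name, the `referrals` dictionary and the people in it are all hypothetical.

```python
def snowball_sample(referrals, seeds, max_waves):
    """Collect participants wave by wave, starting from the seeds.

    referrals: dict mapping each person to the people they recommend
               (assumed to have been checked against the inclusion criteria).
    """
    sampled = list(dict.fromkeys(seeds))  # keep order, drop duplicate seeds
    seen = set(sampled)
    current_wave = list(sampled)
    for _ in range(max_waves):
        next_wave = []
        for person in current_wave:
            for referred in referrals.get(person, []):
                if referred not in seen:  # never recruit anyone twice
                    seen.add(referred)
                    next_wave.append(referred)
        if not next_wave:  # the snowball has stopped rolling
            break
        sampled.extend(next_wave)
        current_wave = next_wave
    return sampled


# Hypothetical referral network.
referrals = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Cy", "Dee"],
    "Cy": [],
    "Dee": ["Eli"],
}
print(snowball_sample(referrals, seeds=["Ana"], max_waves=2))
# ['Ana', 'Ben', 'Cy', 'Dee']
```

Note that Eli is never reached within two waves, and anyone with no link to Ana never will be. That is the lack of representativeness mentioned above, in miniature.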

Projects like Mission 4636 and Ushahidi-Haiti could take advantage of this approach by using two-way SMS communication to ask respondents to spread the word. Individuals who sent in text messages about persons trapped under the rubble could (later) be sent an SMS asking them to share the 4636 short code with people who may know of other trapped individuals. When the humanitarian response began to scale during the search and rescue operations, purposive sampling using UN personnel could also have been implemented.

In contrast to non-probability sampling techniques, probability sampling often requires considerable time and extensive resources. Furthermore, non-response effects can easily turn any probability design into non-probability sampling if the “characteristics of non-response are not well understood” since these modify each unit’s probability of being sampled.
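
A rough way to see why non-response matters: a unit only ends up in the data if it is both selected and responds, so its effective probability of being sampled is the product of the two. The Python sketch below uses made-up numbers and hypothetical group names, and it assumes that responding is independent of being selected.

```python
def effective_selection_probability(design_probability, response_probability):
    """Probability that a unit is both selected and actually responds,
    assuming the two events are independent (an illustrative assumption)."""
    for probability in (design_probability, response_probability):
        if not 0 <= probability <= 1:
            raise ValueError("probabilities must lie between 0 and 1")
    return design_probability * response_probability


# Hypothetical survey: everyone is selected with probability 0.5,
# but the two groups respond at different rates.
design_probability = 0.5
response_by_group = {"urban": 0.5, "rural": 0.25}
effective = {
    group: effective_selection_probability(design_probability, rate)
    for group, rate in response_by_group.items()
}
print(effective)
# {'urban': 0.25, 'rural': 0.125}
```

If the response rates are unknown, so are the effective probabilities, and the careful weights of the probability design no longer apply.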

This is not to suggest that one approach is better than the other since this depends entirely on the context and research question.

Patrick Philippe Meier