
Seven Principles for Big Data and Resilience Projects

Authored by Kate Crawford, Patrick Meier, Claudia Perlich, Amy Luers, Gustavo Faleiros and Jer Thorp, 2013 PopTech & Rockefeller Foundation Bellagio Fellows

Update: See also “Big Data, Communities and Ethical Resilience: A Framework for Action” written by the above Fellows and available here (PDF).

Bellagio Fellows

The following is a draft “Code of Conduct” that seeks to provide guidance on best practices for resilience building projects that leverage Big Data and Advanced Computing. These seven core principles serve to guide data projects to ensure they are socially just, encourage local wealth- & skill-creation, require informed consent, and are maintainable over long timeframes. This document is a work in progress, so we very much welcome feedback. Our aim is not to enforce these principles on others but rather to hold ourselves accountable and in the process encourage others to do the same. Initial versions of this draft were written during the 2013 PopTech & Rockefeller Foundation workshop in Bellagio, August 2013.

1. Open Source Data Tools

Wherever possible, data analytics and manipulation tools should be open source, architecture independent and broadly prevalent (R, Python, etc.). Open source, hackable tools are generative, and building generative capacity is an important element of resilience. Data tools that are closed prevent end-users from customizing and localizing them freely. This creates dependency on external experts, which is a major point of vulnerability. Open source tools generate a large user base and typically have a wider open knowledge base. Open source solutions are also more affordable and by definition more transparent. Open Data Tools should be highly accessible and intuitive for non-technical users and those with limited technology access, in order to maximize the number of participants who can independently use and analyze Big Data.

2. Transparent Data Infrastructure

Infrastructure for data collection and storage should operate based on transparent standards to maximize the number of users that can interact with the infrastructure. Data infrastructure should have built-in documentation that is extensive and easy to access. Data is only as useful to data scientists as their understanding of how it was collected is correct. This is critical for projects to be maintained over time, regardless of team membership; otherwise projects will collapse when key members leave. To allow for continuity, the infrastructure has to be transparent and clear to a broad set of analysts, independent of the tools they bring to bear. Solutions such as Hadoop, JSON formats and the use of clouds are potentially suitable.

3. Develop and Maintain Local Skills

Make “Data Literacy” more widespread. Leverage local data labor and build on existing skills. The key and most constrained ingredient of effective data solutions remains human skill and knowledge, which needs to be retained locally. In doing so, consider cultural issues and language. Catalyze the next generation of data scientists and generate new required skills in the cities where the data is being collected. Provide members of local communities with hands-on experience; these are people who can draw on local understanding and socio-cultural context. Longevity of Big Data for Resilience projects depends on the continuity of local data science teams that maintain an active knowledge and skills base that can be passed on to other local groups. This means hiring local researchers and data scientists and getting them to build teams of the best established talent, as well as up-and-coming developers and designers. Risks emerge when non-resident companies are asked to spearhead data programs that are connected to local communities. They bring in their own employees, do not foster local talent over the long term, and extract value from the data and the learning algorithms, which are kept by the company rather than the local community.

4. Local Data Ownership

Use Creative Commons and licenses that state that data is not to be used for commercial purposes. The community directly owns the data it generates, along with the learning algorithms (machine learning classifiers) and derivatives. Strong data protection protocols need to be in place to protect identities and personally identifying information. Only the “Principle of Do No Harm” can trump consent, as explicitly stated by the International Committee of the Red Cross’s Data Protection Protocols (ICRC 2013). While the ICRC’s data protection standards are geared towards humanitarian professionals, their core protocols are equally applicable to the use of Big Data in resilience projects. Time limits on how long the data can be used should be transparently stated. Shorter timeframes should always be preferred, unless there are compelling reasons to do otherwise. People can give consent for how their data might be used in the short to medium term, but after that, the possibilities for data analytics, predictive modelling and de-anonymization will have advanced to a state that cannot at this stage be predicted, let alone consented to.

5. Ethical Data Sharing

Adopt existing data sharing protocols like the ICRC’s (2013). Permission for sharing is essential. How the data will be used should be clearly articulated. An opt-in approach should be the preference wherever possible, and the ability for individuals to remove themselves from a data set after it has been collected must always be an option. Projects should always explicitly state which third parties will get access to data, if any, so that it is clear who will be able to access and use the data. Sharing with NGOs, academics and humanitarian agencies should be carefully negotiated, and data should be shared with for-profit companies only when there are clear and urgent reasons to do so. In that case, clear data protection policies must be in place that will bind those third parties in the same way as the initial data gatherers. Transparency here is key: communities should be able to see where their data goes, along with a complete list of who has access to it and why.
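To make principles 4 and 5 a little more concrete, here is a minimal sketch of how a project might record opt-in consent with a stated purpose, a time limit, a named list of third parties, and the ability to withdraw. It is only an illustration under our own assumptions: the names ConsentRecord and ConsentRegistry, their fields, and the example values are hypothetical and are not taken from the ICRC protocols or any existing tool.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class ConsentRecord:
    """One person's opt-in consent (hypothetical schema, for illustration)."""
    person_id: str
    purpose: str              # how the data will be used, stated up front
    granted_on: date
    valid_for_days: int       # time limit on use; shorter is preferred
    shared_with: List[str] = field(default_factory=list)  # named third parties
    withdrawn: bool = False

    def expires_on(self) -> date:
        return self.granted_on + timedelta(days=self.valid_for_days)

    def is_usable(self, today: date) -> bool:
        # Usable only while consent is neither withdrawn nor expired.
        return not self.withdrawn and today <= self.expires_on()


class ConsentRegistry:
    """Opt-in registry: the absence of a record means no consent."""

    def __init__(self) -> None:
        self._records: Dict[str, ConsentRecord] = {}

    def opt_in(self, record: ConsentRecord) -> None:
        self._records[record.person_id] = record

    def withdraw(self, person_id: str) -> None:
        # Individuals can remove themselves after the data has been collected.
        if person_id in self._records:
            self._records[person_id].withdrawn = True

    def may_use(self, person_id: str, today: date) -> bool:
        record = self._records.get(person_id)
        return record is not None and record.is_usable(today)

    def access_list(self, person_id: str, today: date) -> List[str]:
        # Who currently has access to this person's data, and nothing more.
        if not self.may_use(person_id, today):
            return []
        return list(self._records[person_id].shared_with)


registry = ConsentRegistry()
registry.opt_in(ConsentRecord(
    person_id="p1",
    purpose="flood mapping",          # hypothetical example purpose
    granted_on=date(2013, 8, 1),
    valid_for_days=90,
    shared_with=["Local University"],  # hypothetical third party
))
```

The design choice worth noting is that every check defaults to "no": a missing record, an expired time limit, or a withdrawal all yield the same answer, so access has to be positively granted rather than positively revoked.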

6. Right Not To Be Sensed

Local communities have a right not to be sensed. Large scale city sensing projects must have a clear framework for how people are able to be involved or choose not to participate. All too often, sensing projects are established without any ethical framework or any commitment to informed consent. It is essential that the collection of any sensitive data, from social and mobile data to video and photographic records of houses, streets and individuals, is done with full public knowledge, community discussion, and the ability to opt out. One proposal is the #NoShare tag. In essence, this principle seeks to place “Data Philanthropy” in the hands of local communities and in particular individuals. Creating clear informed consent mechanisms is a requisite for data philanthropy.

7. Learning from Mistakes

Big Data and Resilience projects need to be open to facing, reporting, and discussing failures. Big Data technology is still very much in a learning phase. Failure, and the learning and insights resulting from it, should be accepted and appreciated. Without admitting what does not work, we are not learning effectively as a community. Quality control and assessment for data-driven solutions is notably harder than comparable efforts in other technology fields. The uncertainty about the quality of the solution is created by the uncertainty inherent in data. Even good data scientists struggle to assess the upside potential of incremental efforts on the quality of a solution. The correct analogy is more that of a craft than a science. As in traditional crafts, the most effective way to excellence is to learn from one’s mistakes under the guidance of a mentor with a collective knowledge of experiences of both failure and success.

Yes, But Resilience for Whom?

I sense a little bit of history repeating, and not the good kind. About ten years ago, I was deeply involved in the field of conflict early warning and response. Eventually, I realized that the systems we were designing and implementing excluded at-risk communities even though the rhetoric had me believe they were instrumented to protect them. The truth is that these information systems were purely extractive and ultimately did little other than fill the pockets of academics who were hired as consultants to develop these early warning systems.


The prevailing belief amongst these academics was (and still is) that large datasets and advanced quantitative methodologies can predict the escalation of political tensions and thus impede violence. To be sure, “these systems have been developed in advanced environments where the intention is to gather data so as to predict events in distant places. This leads to a division of labor between those who ‘predict’ and those ‘predicted’ upon” (Cited in Meier 2008, PDF).

Those who predict assume their sophisticated remote sensing systems will enable them to forecast and thus prevent impending conflict. Those predicted upon don’t even know these systems exist. The sum result? Conflict early warning systems have failed miserably at forecasting anything, let alone catalyzing preventive action or empowering local communities to get out of harm’s way. Conflict prevention is inherently political, and “political will is not an icon on your computer screen” (Cited in Meier 2013).

In Toward a Rational Society (1970), the German philosopher Jürgen Habermas describes “the colonization of the public sphere through the use of instrumental technical rationality. In this sphere, complex social problems are reduced to technical questions, effectively removing the plurality of contending perspectives” (Cited in Meier 2006, PDF). This instrumentalization of society depoliticizes complex social problems like conflict and resilience, recasting them in terms that are susceptible to technical solutions formulated by external experts. The participation of local communities thus becomes totally unnecessary to produce and deliver these technical solutions. To be sure, the colonization of the public sphere crowds out both local knowledge and participation.

We run the risk of repeating these mistakes with respect to the discourse on community resilience. While we speak of community resilience, we gravitate towards the instrumentalization of communities using Big Data, which is largely conceived as a technical challenge of real-time data sensing and optimization. This external, top-down approach bars local participation. The depoliticization of resilience also hides the fact that “every act of measurement is an act marked by the play of powerful relations” (Cited in Meier 2013b). To make matters worse, these measurements are almost always taken without the subjects’ knowledge, let alone their consent. And so we create the division between those who sense and those sensed upon, thereby fully excluding the latter, all in the name of building community resilience.


Acknowledgements: I raised the question “Resilience for whom?” during the PopTech and Rockefeller Foundation workshop on “Big Data & Community Resilience.” I am thus grateful to the organizers and fellows for informing my thinking and the motivation for this post.

Big Data, Disaster Resilience and Lord of the Rings

The Shire is a local community of Hobbits seemingly disconnected from the systemic changes taking place in Middle Earth. They are a quiet, self-sufficient community with high levels of social capital. Hobbits are not interested in “Big Data”; their world is populated by “Small Data” and gentle action. This doesn’t stop the “Eye of Sauron” from sensing this small harmless hamlet, however. During Gandalf’s visit, the Hobbits learn that all is not well in the world outside the Shire. The changing climate, deforestation and land degradation are wholly unnatural and ultimately threaten their own way of life.


Gandalf leads a small band of Hobbits (bonding social capital) out of the Shire to join forces with other peoples of Middle Earth (bridging social capital) in what he calls “The Fellowship of the Ring” (resilience in diversity). Together, they must overcome personal & collective adversity and travel to Mordor to destroy the one ring that rules them all. Only then will Sauron’s “All Seeing Eye” cease sensing and oppressing the world of Middle Earth.


I’m definitely no expert on J. R. R. Tolkien or The Lord of the Rings, but I’ve found that literature and indeed mythology often hold up important mirrors to our modern societies and remind us that the perils we face may not be entirely new. This implies that cautionary tales of the past may still bear some relevance today. The hero’s journey speaks to the human condition, and mythology serves as evidence of human resilience. These narratives carry deep truths about the human condition, our shortcomings and redeeming qualities. Mythologies, analogies and metaphors help us make sense of our world; we ignore them at our own risk.

This is why I’ve employed the metaphor of the Shire (local communities) and Big Data (Eye of Sauron) during recent conversations on Big Data and Community Resilience. There’s been push-back of late against Big Data, with many promoting the notion of Small Data. “For many problems and questions, small data in itself is enough” (1). Yes, for specific problems: locally disconnected problems. But we live in an increasingly interdependent and connected world with coupled systems that run the risk of experiencing synchronous failure and collapse. Our sensors cannot be purely local since the resilience of our communities is no longer mostly place-based. This is where the rings come in.


Frodo’s ring allows him to sense change well beyond the Shire and at the same time mask his local presence. But using the ring allows him to be sensed and hunted by Sauron. The same is true of Google and social media platforms like Facebook. We have no ways to opt out from being sensed if we wish to use these platforms. Community-generated content, our digital fingerprints, belong to the Great Eye, not to the Shire. This excellent piece on the Political Economy of Twitter clearly demonstrates that an elite few control user-generated content. The true owners of social media data are the platform providers, not the end users. In sum, “only corporate actors and regulators—who possess both the intellectual and financial resources to succeed in this race—can afford to participate,” which means “that the emerging data market will be shaped according to their interests.” Of course, the scandal surrounding PRISM makes Sauron’s “All Seeing Eye” even more palpable.

So when we say that we have more data than ever before in human history, it behooves us to ask “Who is we? And to what end?” Does the Shire have access to greater data than ever before thanks to Sauron? Hardly. Is this data used by Sauron to support community resilience? Fat chance. Local communities are excluded; they are observers, unwilling participants in a centralized system that ultimately undermines trust and their own resilience. Hobbits deserve the right not to be sensed. This should be a non-negotiable. They also deserve the right to own and manage their own “Small Data” themselves; that is, data generated by the community, for the community. We need respectful, people-centered data protection protocols like those developed by Open Paths. Community resilience ought to be ethical community resilience.

To be sure, we need to place individual data-sharing decisions in the hands of individuals rather than external parties. In addition to Open Paths, Creative Commons is an excellent example of what is possible. Why not extend that framework to personal and social media data? Why not include a temporal element in these licenses, as hinted in this blog post last year? That is, something like SnapChat, where the user decides for herself how long the data should be accessible and usable. Well, it turns out that these discussions and related conversations are taking place thanks to my fellow PopTech and Rockefeller Foundation Fellows. Stay tuned for updates. The ideas presented above are the result of our joint brainstorming sessions, and certainly not my ideas alone (but I take full blame for The Lord of the Rings analogy given my limited knowledge of said books!).

In closing, a final reference to The Lord of the Rings: Gandalf (who is a translational leader) didn’t empower the Hobbits, he took them on a journey that built on their existing capacities for resilience. That is, we cannot empower others, we can only provide them with the means to empower themselves. In sum, “Not all those who wander are lost.”


ps. I’m hoping my talented fellows Kate Crawford, Gustavo Faleiros, Amy Luers, Claudia Perlich and Jer Thorp will chime in, improve my Lord of the Rings analogy and post comments in full Elvish script.