
Quantifying Information Flow During Emergencies

I was particularly pleased to see this study appear in the top-tier journal, Nature. (Thanks to my colleague Sarah Vieweg for flagging). Earlier studies have shown that “human communications are both temporally & spatially localized following the onset of emergencies, indicating that social propagation is a primary means to propagate situational awareness.” In this new study, the authors analyze crisis events using country-wide mobile phone data. To this end, they also analyze the communication patterns of mobile phone users outside the affected area. So the question driving this study is this: how do the communication patterns of non-affected mobile phone users differ from those affected? Why ask this question? Understanding the communication patterns of mobile phone users outside the affected areas sheds light on how situational awareness spreads during disasters.

Nature graphs

The graphs above simply depict the change in call volume for three crisis events and one non-emergency event for the two types of mobile phone users. The set of users directly affected by a crisis is labeled G0 while users they contact during the emergency are labeled G1. Note that G1 users are not affected by the crisis. Since the study seeks to assess how G1 users change their communication patterns following a crisis, one logical question is this: does the call volume of G1 users increase like that of G0 users? The graphs above reveal that G1 and G0 users have instantaneous and corresponding spikes for crisis events. This is not the case for the non-emergency event.

“As the activity spikes for G0 users for emergency events are both temporally and spatially localized, the communication of G1 users becomes the most important means of spreading situational awareness.” To quantify the reach of situational awareness, the authors study the communication patterns of G1 users after they receive a call or SMS from the affected set of G0 users. They find 3 types of communication patterns for G1 users, as depicted below.

Nature graphs 2

Pattern 1: G1 users call back G0 users (orange edges). Pattern 2: G1 users call forward to G2 users (purple edges). Pattern 3: G1 users call other G1 users (green edges). Which of these 3 patterns is most pronounced during a crisis? Pattern 1, call backs, constitutes 25% of all G1 communication responses. Pattern 2, call forwards, constitutes 70% of communications. Pattern 3, calls between G1 users, represents only 5% of all communications. This means that the spikes in call volume shown in the above graphs are overwhelmingly driven by Patterns 1 and 2: call backs and call forwards.
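To make the three patterns concrete, here is a minimal Python sketch of how calls placed by G1 users might be sorted into call backs, call forwards and G1-to-G1 calls. This is not the authors' code: the function names, the (caller, callee) pair format and the simplification that a call to any G0 user counts as a call back are all my own assumptions for illustration.

```python
from collections import Counter


def classify_response(caller, callee, g0, g1):
    """Label one call placed by a G1 user (hypothetical helper).

    Assumption: a call to ANY G0 user counts as a call back, which is
    a simplification of the study's definition.
    """
    if caller not in g1:
        return None  # only responses made by G1 users are of interest
    if callee in g0:
        return "call_back"      # Pattern 1
    if callee in g1:
        return "g1_to_g1"       # Pattern 3
    return "call_forward"       # Pattern 2: callee is a new (G2) user


def pattern_shares(calls, g0, g1):
    """Return the share of each pattern among all G1 responses."""
    counts = Counter()
    for caller, callee in calls:
        label = classify_response(caller, callee, g0, g1)
        if label is not None:
            counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

Given a list of (caller, callee) pairs and the two user sets, `pattern_shares` returns fractions that can be compared against the 25%, 70% and 5% figures reported above.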

The graphs below show call volumes by communication patterns 1 and 2. In these graphs, Pattern 1 is the orange line and Pattern 2 the dashed purple line. In all three crisis events, Pattern 1 (call backs) has clear volume spikes. “That is, G1 users prefer to interact back with G0 users rather than contacting with new users (G2), a phenomenon that limits the spreading of information.” In effect, Pattern 1 is a measure of reciprocal communications and indeed social capital, “representing correspondence and coordination calls between social neighbors.” In contrast, Pattern 2 measures the “dissemination of situational awareness, corresponding to information cascades that penetrate the underlying social network.”

Nature graphs 3

The histogram below shows average levels of reciprocal communication for the 4 events under study. These results clearly show a spike in reciprocal behavior for the three crisis events compared to the baseline. The opposite is true for the non-emergency event.

Nature graphs 4

In sum, a crisis early warning system based on communication patterns should seek to monitor changes in the following two indicators: (1) Volume of Call Backs; and (2) Deviation of Call Backs from baseline. Given that access to mobile phone data is near-impossible for the vast majority of academics and humanitarian professionals, one question worth exploring is whether similar communication dynamics can be observed on social networks like Twitter and Facebook.
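As a rough illustration of the second indicator, the Python sketch below scores how far the current volume of call backs sits from a historical baseline. The z-score formulation, the threshold of 3 standard deviations and the function names are my own assumptions, not something taken from the study.

```python
from statistics import mean, stdev


def callback_deviation(baseline_counts, current_count):
    """Standard deviations between the current count and the baseline.

    baseline_counts: call-back counts for comparable past periods
    (at least two values are needed to estimate the spread).
    """
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return 0.0 if current_count == mu else float("inf")
    return (current_count - mu) / sigma


def is_anomalous(baseline_counts, current_count, threshold=3.0):
    """Flag a spike in call backs; the threshold is an assumed default."""
    return callback_deviation(baseline_counts, current_count) > threshold
```

For example, with hourly call-back counts of 10, 12, 11, 9 and 13 as the baseline, a sudden count of 30 would be flagged while a count of 12 would not.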


Big Data & Disaster Response: Even More Wrong Assumptions

“Arguing that Big Data isn’t all it’s cracked up to be is a straw man, pure and simple—because no one should think it’s magic to begin with.” Since citing this point in my previous post on Big Data for Disaster Response: A List of Wrong Assumptions, I’ve come across more mischaracterizations of Big (Crisis) Data. Most of these fallacies originate from the Ivory Towers; from a small number of social scientists who have carried out one or two studies on the use of social media during disasters and repeat their findings ad nauseam as if their conclusions are the final word on a very new area of research.


The mischaracterization of “Big Data and Sample Bias”, for example, typically arises when academics point out that marginalized communities do not have access to social media. First things first: I highly recommend reading “Big Data and Its Exclusions,” published by Stanford Law Review. While the piece does not address Big Crisis Data, it is nevertheless instructive when thinking about social media for emergency management. Secondly, identifying who “speaks” (and who does not speak) on social media during humanitarian crises is of course imperative, but that’s exactly why the argument about sample bias is such a straw man—all of my humanitarian colleagues know full well that social media reports are not representative. They live in the real world where the vast majority of data they have access to is unrepresentative and imperfect—hence the importance of drawing on as many sources as possible, including social media. Random sampling during disasters is a Quixotic luxury, which explains why humanitarian colleagues seek “good enough” data and methods.

Some academics also seem to believe that disaster responders ignore all other traditional sources of crisis information in favor of social media. This means, to follow their argument, that marginalized communities have no access to other communication lifelines if they are not active on social media. One popular observation is the “revelation” that some marginalized neighborhoods in New York posted very few tweets during Hurricane Sandy. Why some academics want us to be surprised by this, I know not. And why they seem to imply that emergency management centers will thus ignore these communities (since they apparently only respond to Twitter) is also a mystery. What I do know is that social capital and the use of traditional emergency communication channels do not disappear just because academics chose to study tweets. Social media is simply another node in the pre-existing ecosystem of crisis information.

negative space

Furthermore, the fact that very few tweets came out of the Rockaways during Hurricane Sandy can be valuable information for disaster responders, a point that academics often overlook. To be sure, monitoring social media footprints during disasters can help humanitarians get a better picture of the “negative space” and thus infer what they might be missing, especially when comparing these “negative footprints” with data from traditional sources. Indeed, knowing what you don’t know is a key component of situational awareness. No one wants blind spots, and knowing who is not speaking on social media during disasters can help correct said blind spots. Moreover, the contours of a community’s social media footprint during a disaster can shed light on how neighboring areas (that are not documented on social media) may have been affected. When I spoke about this with humanitarian colleagues in Geneva this week, they fully agreed with my line of reasoning and even added that they already apply “good enough” methods of inference with traditional crisis data.

My PopTech colleague Andrew Zolli is fond of saying that we shape the world by the questions we ask. My UN colleague Andrej Verity recently reminded me that one of the most valuable aspects of social media for humanitarian response is that it helps us to ask important questions (that would not otherwise be posed) when coordinating disaster relief. So the next time you hear a presentation on issues of bias and exclusion, feel free to share the above along with this list of wrong assumptions.

Most importantly, say this: “Arguing that Big Data isn’t all it’s cracked up to be is a straw man, pure and simple—because no one should think it’s magic to begin with.” It is high time we stop mischaracterizing Big Crisis Data. What we need instead is a can-do, problem-solving attitude. Otherwise we’ll all fall prey to the Smart-Talk trap.


Data Protection: This Tweet Will Self-Destruct In…

The permanence of social media such as tweets presents an important challenge for data protection and privacy. This is particularly true when social media is used to communicate during crises. Indeed, social media users tend to volunteer personal identifying information during disasters that they otherwise would not share, such as phone numbers and home addresses. They typically share this sensitive information to offer help or seek assistance. What if we could limit the visibility of these messages after their initial use?

Twitter self destruct

Enter TwitterSpirit and Efemr, which enable users to schedule their tweets for automatic deletion after a specified period of time using hashtags like #1m, #2h or #3d. According to Wired, using these services will (in some cases) also delete retweets. That said, tweets with #time hashtags can always be copied manually in any number of ways, so the self-destruction is not total. Nevertheless, their visibility can still be reduced by using TwitterSpirit and Efemr. Lastly, the use of these hashtags also sends a social signal that these tweets are intended to have limited temporal use.
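To show how such time hashtags could be interpreted, here is a small Python sketch that turns a tag like #1m, #2h or #3d into a deletion time. It is not the code used by TwitterSpirit or Efemr. I am assuming that m, h and d stand for minutes, hours and days, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed meaning of the suffixes: m = minutes, h = hours, d = days.
_TTL_TAG = re.compile(r"#(\d+)([mhd])\b")
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(text):
    """Return the lifetime requested by the first time hashtag, or None."""
    match = _TTL_TAG.search(text)
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def expiry_time(text, posted_at):
    """Return when the message should be deleted, or None if untagged."""
    ttl = parse_ttl(text)
    return None if ttl is None else posted_at + ttl
```

A message such as "Shelter open at the school #2h" posted at noon would be due for deletion at 2pm, while a message without a time hashtag is left alone.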


Note: My fellow PopTech and Rockefeller Foundation Fellows and I have been thinking of related solutions, which we plan to blog about shortly. Hence my interest in Spirit & Efemr, which I stumbled upon by chance just now.

Radical Visualization of Photos Posted to Instagram During Hurricane Sandy

Sandy Instagram Pictures

This data visualization (click to enlarge) displays more than 23,500 photos taken in Brooklyn and posted to Instagram during Hurricane Sandy. A picture’s distance from the center (radius) corresponds to its mean hue while a picture’s position along the perimeter (angle) corresponds to the time that picture was taken. “Note the demarcation line that reveals the moment of a power outage in the area and indicates the intensity of the shared experience (dramatic decrease in the number of photos, and their darker colors to the right of the line)” (1).
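The radial layout described above can be sketched in a few lines of Python: each photo becomes a point whose radius comes from its mean hue and whose angle comes from when it was taken. This is only my reading of the description, not the researchers' method. The hue range of 0 to 1, the counter-clockwise direction and the function name are assumptions.

```python
import math
from datetime import datetime


def polar_position(mean_hue, taken_at, start, end, max_radius=1.0):
    """Map one photo to (x, y) coordinates in a radial plot.

    mean_hue: assumed to be normalized to the range 0..1.
    taken_at, start, end: datetimes; start..end spans the full circle.
    """
    span = (end - start).total_seconds()
    fraction = (taken_at - start).total_seconds() / span
    angle = 2 * math.pi * fraction      # time sets the angle
    radius = mean_hue * max_radius      # mean hue sets the distance
    return radius * math.cos(angle), radius * math.sin(angle)
```

With a 24-hour window, a photo taken six hours in lands a quarter of the way around the circle, and a sudden drop in the number of photos shows up as a sparse wedge.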

Sandy Instagram 2

Click here to interact with the data visualization. The research methods behind this visualization are described here along with other stunning visuals.


Stunning Wind Map of Hurricane Sandy

Surface wind data from the National Digital Forecast Database is updated on an hourly basis. More galleries of stunning wind maps here.


Data Science for 100 Resilient Cities

The Rockefeller Foundation recently launched a major international initiative called “100 Resilient Cities.” The motivation behind this global project stems from the recognition that cities are facing increasing stresses driven by the unprecedented pace of urbanization. More than 75% of people are expected to live in cities by 2050. The Foundation is thus rightly concerned: “As natural and man-made shocks and stresses grow in frequency, impact and scale, with the ability to ripple across systems and geographies, cities are largely unprepared to respond to, withstand, and bounce back from disasters” (1).

Resilience is the capacity to self-organize, and smart self-organization requires social capital and robust feedback loops. I’ve discussed these issues and related linkages at length in the posts listed below and so shan’t repeat myself here.

  • How to Create Resilience Through Big Data [link]
  • On Technology and Building Resilient Societies [link]
  • Using Social Media to Predict Disaster Resilience [link]
  • Social Media = Social Capital = Disaster Resilience? [link]
  • Does Social Capital Drive Disaster Resilience? [link]
  • Failing Gracefully in Complex Systems: A Note on Resilience [link]

Instead, I want to make a case for community-driven “tactical resilience” aided (not controlled) by data science. I came across the term “tactical urbanism” whilst at “The City Resilient” conference co-organized by PopTech & Rockefeller in June. Tactical urbanism refers to small and temporary projects that demonstrate what could be. We also need people-centered tactical resilience initiatives to show small-scale resilience in action and demonstrate what these could mean at scale. Data science can play an important role in formulating and implementing tactical resilience interventions and in demonstrating their resulting impact at various scales.

Ultimately, if tactical resilience projects do not increase local capacity for smart and scalable self-organization, then they may not render cities more resilient. “Smart Cities” should mean “Resilient Neighborhoods” but the former concept takes a mostly top-down approach focused on the physical layer while the latter recognizes the importance of social capital and self-organization at the neighborhood level. “Indeed, neighborhoods have an impact on a surprisingly wide variety of outcomes, including child health, high-school graduation, teen births, adult mortality, social disorder and even IQ scores” (1).

So just as IBM is driving the data science behind its Smart Cities initiatives, I believe Rockefeller’s 100 Resilient Cities grantees would benefit from similar data science support and expertise but at the tactical and neighborhood level. This explains why my team and I plan to launch a Data Science for Resilience Program at the Qatar Foundation’s Computing Research Institute (QCRI). This program will focus on providing data science support to promising “tactical resilience” projects related to Rockefeller’s 100 Resilient Cities initiative.

The initial springboard for these conversations will be the PopTech & Rockefeller Fellows Program on “Community Resilience Through Big Data and Technology”. I’m really honored and excited to have been selected as one of the PopTech and Rockefeller Fellows to explore the intersections of Big Data, Technology and Resilience. As mentioned to the organizers, one of my objectives during this two-week brainstorming session is to produce a joint set of “tactical resilience” project proposals with well-articulated research questions. My plan is to select the strongest questions and make them the basis for our initial data science for resilience research at QCRI.


Data Science for Social Good: Not Cognitive Surplus but Cognitive Mismatch

I’ve spent the past 12 months working with top notch data scientists at QCRI et al. The following may thus be biased: I think QCRI got it right. They strive to balance their commitment to positive social change with their primary mission of becoming a world class institute for advanced computing research. The two are not mutually exclusive. What it takes is a dedicated position, like the one created for me at QCRI. It is high time that other research institutes, academic programs and international computing conferences create comparable focal points to catalyze data science for social good.

Microsoft Research, to name just one company, carries out very interesting research that could have tremendous social impact, but the bridge necessary to transfer much of that research from knowledge to operation to social impact is often not there. And when it is, it is usually by happenstance. So researchers continue to formulate research questions based on what they find interesting rather than identifying equally interesting questions that could have direct social impact if answered by data science. Hundreds of papers get presented at computing conferences every month, and yet few if any of the authors have linked up with organizations like the United Nations, World Bank, Habitat for Humanity etc., to identify and answer questions with social good potential. The same is true for hundreds of computing dissertations that get defended every year. Doctoral students do not realize that a minor reformulation of their research question could perhaps make a world of difference to a community-based organization in India dedicated to fighting corruption, for example.

Cognitive Mismatch

The challenge here is not one of untapped cognitive surplus (to borrow from Clay Shirky), but rather complete cognitive mismatch. As my QCRI colleague Ihab Ilyas puts it: there are “problem owners” on the one hand and “problem solvers” on the other. The former have problems that prevent them from catalyzing positive social change. The latter know how to solve comparable problems and do so every day. But the two are not talking or even aware of each other. Creating and maintaining this two-way conversation requires more than one dedicated position (like mine at QCRI).

sweet spot

In short, I really want to have dedicated counterparts at Microsoft Research, IBM, SAP, LinkedIn, Bitly, GNIP, etc., as well as leading universities, top notch computing conferences and challenges; counterparts who have one foot in the world of data science and the other in the social sector; individuals who have a demonstrated track-record in bridging communities. There’s a community here waiting to be connected and needing to be formed. Again, carrying out cutting edge computing R&D is in no way incompatible with generating positive social impact. Moreover, the latter provides an important return on investment in the form of data, reputation, publicity, connections and social capital. In sum, social good challenges need to be formulated into research questions that have scientific as well as social good value. There is definitely a sweet spot here but it takes a dedicated community to bring problem owners and solvers together and hit that social good sweet spot.
