Tag Archives: data

Integrating Geo-Data with Social Media Improves Situational Awareness During Disasters

A new data-driven study on the flooding of River Elbe in 2013 (one of the most severe floods ever recorded in Germany) shows that geo-data can enhance the process of extracting relevant information from social media during disasters. The authors use “specific geographical features like hydrological data and digital elevation models to prioritize crisis-relevant twitter messages.” The results demonstrate that an “approach based on geographical relations can enhance information extraction from volunteered geographic information,” which is “valuable for both crisis response and preventive flood monitoring.” These conclusions thus support a number of earlier studies that show the added value of data integration. This analysis also confirms several other key assumptions, which are important for crisis computing and disaster response.

floods elbe

The authors apply a “geographical approach to prioritize [the collection of] crisis-relevant information from social media.” More specifically, they combine information from “tweets, water level measurements & digital elevation models” to answer the following three research questions:

  • Does the spatial and temporal distribution of flood-related tweets actually match the spatial and temporal distribution of the flood phenomenon (despite Twitter bias, potentially false info, etc)?

  • Does the spatial distribution of flood-related tweets differ depending on their content?
  • Is geographical proximity to flooding a useful parameter to prioritize social media messages in order to improve situation awareness?

The authors analyzed just over 60,000 disaster-related tweets generated in Germany during the flooding of River Elbe in June 2013. Only 398 of these tweets (0.7%) contained keywords related to the flooding. The geographical distribution of flood-related tweets versus non-flood related tweets is depicted below (click to enlarge).

Screen Shot 2014-10-04 at 7.04.59 AM

As the authors note, “a considerable amount” of flood-related tweets are geo-located in areas of major flooding. So they tested the statistical correlation between the location of flood-related tweets and the actual flooding, which they found to be “statistically significantly lower compared to non-related Twitter messages.” This finding “implies that the locations of flood-related twitter messages and flood-affected catchments match to a certain extent. In particular this means that mostly people in regions affected by the flooding or people close to these regions posted twitter messages referring to the flood.” To this end, major urban areas like Munich and Hamburg were not the source of most flood-related tweets. Instead, “The majority of tweet referring to the flooding were posted by locals” closer to the flooding.

Given that “most flood-related tweets were posted by locals it seems probable that these messages contain local knowledge only available to people on site.” To this end, the authors analyzed the “spatial distribution of flood-related tweets depending on their content.” The results, depicted below (click to enlarge), show that the geographical distribution of tweets do indeed differ based on their content. This is especially true of tweets containing information about “volunteer actions” and “flood level”. The authors confirm these results are statistically significant when compared with tweets related to “media” and “other” issues.

Screen Shot 2014-10-04 at 7.22.05 AM

These findings also reveal that the content of Twitter messages can be combined into three groups given their distance to actual flooding:

Group A: flood level & volunteer related tweets are closest to the floods.
Group B: tweets on traffic conditions have a medium distance to the floods.
Group C: other and media related tweets a furthest to the flooding.

Tweets belonging to “Group A” yield greater situational awareness. “Indeed, information about current flood levels is crucial for situation awareness and can complement existing water level measurements, which are only available for determined geographical points where gauging stations are located. Since volunteer actions are increasingly organized via social media, this is a type of information which is very valuable and completely missing from other sources.”

Screen Shot 2014-10-04 at 6.55.49 AM

In sum, these results show that “twitter messages that are closest to the flood- affected areas (Group A) are also the most useful ones.” The authors thus conclude that “the distance to flood phenomena is indeed a useful parameter to prioritize twitter messages towards improving situation awareness.” To be sure, the spatial distribution of flood-related tweets is “significantly different from the spatial distribution of off-topic messages.” Whether this is also true of other social media platforms like Instagram and Flickr remains to be seen. This is an important area for future research given the increasing use of pictures posted on social media for rapid damage assessments in the aftermath of disasters.

ImageClicker

“The integration of other official datasets, e.g. precipitation data or satellite images, is another avenue for future work towards better understanding the relations between social media and crisis phenomena from a geographical perspective.” I would add both aerial imagery (captured by UAVs) and data from mainstream news (captured by GDELT) to this data fusion exercise. Of course, the geographical approach described above is not limited to the study of flooding only but could be extended to other natural hazards.

This explains why my colleagues at GeoFeedia may be on the right track with their crisis mapping platform. That said, the main limitation with GeoFeedia and the study above is the fact that only 3% of all tweets are actually geo-referenced. But this need not be a deal breaker. Instead, platforms like GeoFeedia can be complemented by other crisis computing solutions that prioritize the analysis of social media content over geography.

Take the free and open-source “Artificial Intelligence for Disaster Response” (AIDR) platform that my team and I at QCRI are developing. Humanitarian organizations can use AIDR to automatically identify tweets related to flood levels and volunteer actions (deemed to provide the most situational awareness) without requiring that tweets be geo-referenced. In addition, AIDR can also be used to identify eyewitness tweets regardless of whether they refer to flood levels, volunteering or other issues. Indeed, we already demonstrated that eyewitness tweets can be automatically identified with an accuracy of 80-90% using AIDR. And note that AIDR can also be used on geo-tagged tweets only.

The authors of the above study recently go in touch to explore ways that their insights can be used to further improve AIDR. So stay tuned for future updates on how we may integrate geo-data more directly within AIDR to improve situational awareness during disasters.

bio

See also:

  • Debating the Value of Tweets For Disaster Response (Intelligently) [link]
  • Social Media for Emergency Management: Question of Supply and Demand [link]
  • Become a (Social Media) Data Donor and Save a Life [link]

Quantifying Information Flow During Emergencies

I was particularly pleased to see this study appear in the top-tier journal, Nature. (Thanks to my colleague Sarah Vieweg for flagging). Earlier studies have shown that “human communications are both temporally & spatially localized following the onset of emergencies, indicating that social propagation is a primary means to propagate situational awareness.” In this new study, the authors analyze crisis events using country-wide mobile phone data. To this end, they also analyze the communication patterns of mobile phone users outside the affected area. So the question driving this study is this: how do the communication patterns of non-affected mobile phone users differ from those affected? Why ask this question? Understanding the communication patterns of mobile phone users outside the affected areas sheds light on how situational awareness spreads during disasters.

Nature graphs

The graphs above (click to enlarge) simply depict the change in call volume for three crisis events and one non-emergency event for the two types of mobile phone users. The set of users directly affected by a crisis is labeled G0 while users they contact during the emergency are labeled G1. Note that G1 users are not affected by the crisis. Since the study seeks to assess how G1 users change their communication patterns following a crisis, one logical question is this: do the call volume of G1 users increase like those of G0 users? The graphs above reveal that G1 and G0 users have instantaneous and corresponding spikes for crisis events. This is not the case for the non-emergency event.

“As the activity spikes for G0 users for emergency events are both temporally and spatially localized, the communication of G1 users becomes the most important means of spreading situational awareness.” To quantify the reach of situational awareness, the authors study the communication patterns of G1 users after they receive a call or SMS from the affected set of G0 users. They find 3 types of communication patterns for G1 users, as depicted below (click to enlarge).

Nature graphs 2

Pattern 1: G1 users call back G0 users (orange edges). Pattern 2: G1 users call forward to G2 users (purple edges). Pattern 3: G1 users call other G1 users (green edges). Which of these 3 patterns is most pronounced during a crisis? Pattern 1, call backs, constitute 25% of all G1 communication responses. Pattern 2, call forwards, constitutes 70% of communications. Pattern 3, calls between G1 users only represents 5% of all communications. This means that the spikes in call volumes shown in the above graphs is overwhelmingly driven by Patterns 1 and 2: call backs and call forwards.

The graphs below (click to enlarge) show call volumes by communication patterns 1 and 2. In these graphs, Pattern 1 is the orange line and Pattern 2 the dashed purple line. In all three crisis events, Pattern 1 (call backs) has clear volume spikes. “That is, G1 users prefer to interact back with G0 users rather than contacting with new users (G2), a phenomenon that limits the spreading of information.” In effect, Pattern 1 is a measure of reciprocal communications and indeed social capital, “representing correspondence and coordination calls between social neighbors.” In contrast, Pattern 2 measures the dissemination of the “dissemination of situational awareness, corresponding to information cascades that penetrate the underlying social network.”

Nature graphs 3

The histogram below shows average levels of reciprocal communication for the 4 events under study. These results clearly show a spike in reciprocal behavior for the three crisis events compared to the baseline. The opposite is true for the non-emergency event.Nature graphs 4

In sum, a crisis early warning system based on communication patterns should seek to monitor changes in the following two indicators: (1) Volume of Call Backs; and (2) Deviation of Call Backs from baseline. Given that access to mobile phone data is near-impossible for the vast majority of academics and humanitarian professionals, one question worth exploring is whether similar communication dynamics can be observed on social networks like Twitter and Facebook.

 bio

Big Data & Disaster Response: Even More Wrong Assumptions

“Arguing that Big Data isn’t all it’s cracked up to be is a straw man, pure and simple—because no one should think it’s magic to begin with.” Since citing this point in my previous post on Big Data for Disaster Response: A List of Wrong Assumptions, I’ve come across more mischaracterizations of Big (Crisis) Data. Most of these fallacies originate from the Ivory Towers; from [a small number of] social scientists who have carried out one or two studies on the use of social media during disasters and repeat their findings ad nauseam as if their conclusions are the final word on a very new area of research.

Screen Shot 2013-11-05 at 12.38.31 PM

The mischaracterization of “Big Data and Sample Bias”, for example, typically arises when academics point out that marginalized communities do not have access to social media. First things first: I highly recommend reading “Big Data and Its Exclusions,” published by Stanford Law Review. While the piece does not address Big Crisis Data, it is nevertheless instructive when thinking about social media for emergency management. Secondly, identifying who “speaks” (and who does not speak) on social media during humanitarian crises is of course imperative, but that’s exactly why the argument about sample bias is such a straw man—all of my humanitarian colleagues know full well that social media reports are not representative. They live in the real world where the vast majority of data they have access to is unrepresentative and imperfect—hence the importance of drawing on as many sources as possible, including social media. Random sampling during disasters is a Quixotic luxury, which explains why humanitarian colleagues seek “good enough” data and methods.

Some academics also seem to believe that disaster responders ignore all other traditional sources of crisis information in favor of social media. This means, to follow their argument, that marginalized communities have no access to other communication life lines if they are not active on social media. One popular observation is the “revelation” that some marginalized neighborhoods in New York posted very few tweets during Hurricane Sandy. Why some academics want us to be surprised by this, I know not. And why they seem to imply that emergency management centers will thus ignore these communities (since they apparently only respond to Twitter) is also a mystery. What I do know is that social capital and the use of traditional emergency communication channels do not disappear just because academics chose to study tweets. Social media is simply another node in the pre-existing ecosystem of crisis information. 

negative space

Furthermore, the fact that very few tweets came out of the Rockaways during Hurricane Sandy can be valuable information for disaster responders, a point that academics often overlook. To be sure, monitoring  social media footprints during disasters can help humanitarians get a better picture of the “negative space” and thus infer what they might be missing, especially when comparing these “negative footprints” with data from traditional sources. Indeed, knowing what you don’t know is a key component of situational awareness. No one wants blind spots, and knowing who is not speaking on social media during disasters can help correct said blind spots. Moreover, the contours of a community’s social media footprint during a disaster can shed light on how neighboring areas (that are not documented on social media) may have been affected. When I spoke about this with humanitarian colleagues in Geneva this week, they fully agreed with my line of reasoning and even added that they already apply “good enough” methods of inference with traditional crisis data.

My PopTech colleague Andrew Zolli is fond of saying that we shape the world by the questions we ask. My UN colleague Andrej Verity recently reminded me that one of the most valuable aspects of social media for humanitarian response is that it helps us to ask important questions (that would not otherwise be posed) when coordinating disaster relief. So the next time you hear an academic go on about [a presentation on] issues of bias and exclusion, feel free to share the above along with this list of wrong assumptions.

Most importantly, tell them [say] this: “Arguing that Big Data isn’t all it’s cracked up to be is a straw man, pure and simple—because no one should think it’s magic to begin with.” It is high time we stop mischaracterizing Big Crisis Data. What we need instead is a can-do, problem-solving attitude. Otherwise we’ll all fall prey to the Smart-Talk trap.

Bio

Data Protection: This Tweet Will Self-Destruct In…

The permanence of social media such as tweets presents an important challenge for data protection and privacy. This is particularly true when social media is used to communicate during crises. Indeed, social media users tend to volunteer personal identifying information during disasters that they otherwise would not share, such as phone numbers and home addresses. They typically share this sensitive information to offer help or seek assistance. What if we could limit the visibility of these messages after their initial use?

Twitter self destruct

Enter TwitterSpirit and Efemr, which enable users to schedule their tweets for automatic deletion after a specified period of time using hashtags like #1m, #2h or #3d. According to Wired, using these services will (in some cases) also delete retweets. That said, tweets with #time hashtags can always be copied manually in any number of ways, so the self-destruction is not total. Nevertheless, their visibility can still be reduced by using TwitterSpirit and Efemr. Lastly, the use of these hashtags also sends a social signal that these tweets are intended to have limited temporal use.

bio

Note: My fellow PopTech and Rockefeller Foundation Fellows and I have been thinking of related solutions, which we plan to blog about shortly. Hence my interest in Spirit & Efemr, which I stumbled upon by chance just now.

Radical Visualization of Photos Posted to Instagram During Hurricane Sandy

Sandy Instagram Pictures

This data visualization (click to enlarge) displays more than 23,500 photos taken in Brooklyn and posted to Instagram during Hurricane Sandy. A picture’s distance from the center (radius) corresponds to its mean hue while a picture’s position along the perimeter (angle) corresponds to the time that picture was taken. “Note the demarcation line that reveals the moment of a power outage in the area and indicates the intensity of the shared experience (dramatic decrease in the number of photos, and their darker colors to the right of the line)” (1).

Sandy Instagram 2

Click here to interact with the data visualization. The research methods behind this visualization are described here along with other stunning visuals.

bio

Stunning Wind Map of Hurricane Sandy

Surface wind data from the National Digital Forecast Database is updated on an hourly basis. More galleries of stunning wind maps here.

bio

Data Science for 100 Resilient Cities

The Rockefeller Foundation recently launched a major international initiative called “100 Resilient Cities.” The motivation behind this global project stems from the recognition that cities are facing increasing stresses driven by the unprecedented pace urbanization. More than 75% of people expected to live in cities by 2050. The Foundation is thus rightly concerned: “As natural and man-made shocks and stresses grow in frequency, impact and scale, with the ability to ripple across systems and geographies, cities are largely unprepared to respond to, withstand, and bounce back from disasters” (1).

Resilience is the capacity to self-organize, and smart self-organization requires social capital and robust feedback loops. I’ve discussed these issues and related linkages at lengths in the posts listed below and so shan’t repeat myself here. 

  • How to Create Resilience Through Big Data [link]
  • On Technology and Building Resilient Societies [link]
  • Using Social Media to Predict Disaster Resilience [link]
  • Social Media = Social Capital = Disaster Resilience? [link]
  • Does Social Capital Drive Disaster Resilience? [link]
  • Failing Gracefully in Complex Systems: A Note on Resilience [link]

Instead, I want to make a case for community-driven “tactical resilience” aided (not controlled) by data science. I came across the term “tactical urbanism” whilst at the “The City Resilient” conference co-organized by PopTech & Rockefeller in June. Tactical urbanism refers to small and temporary projects that demonstrate what could be. We also need people-centered tactical resilience initiatives to show small-scale resilience in action and demonstrate what these could mean at scale. Data science can play an important role in formulating and implementing tactical resilience interventions and in demonstrating their resulting impact at various scales.

Ultimately, if tactical resilience projects do not increase local capacity for smart and scalable self-organization, then they may not render cities more resilient. “Smart Cities” should mean “Resilient Neighborhoods” but the former concept takes a mostly top-down approach focused on the physical layer while the latter recognizes the importance of social capital and self-organization at the neighborhood level. “Indeed, neighborhoods have an impact on a surprisingly wide variety of outcomes, including child health, high-school graduation, teen births, adult mortality, social disorder and even IQ scores” (1).

So just like IBM is driving the data science behind their Smart Cities initiatives, I believe Rockefeller’s 100 Resilient Cities grantees would benefit from similar data science support and expertise but at the tactical and neighborhood level. This explains why my team and I plan to launch a Data Science for Resilience Program at the Qatar Foundation’s Computing Research Institute (QCRI). This program will focus on providing data science support to promising “tactical resilience” projects related to Rockefeller’s 100 Resilient Cities initiative.

The initial springboard for these conversations will be the PopTech & Rockefeller Fellows Program on “Community Resilience Through Big Data and Technology”. I’m really honored and excited to have been selected as one of the PopTech and Rockefeller Fellows to explore the intersections of Big Data, Technology and Resilience. As mentioned to the organizers, one of my objectives during this two-week brainstorming session is to produce a joint set of “tactical resilience” project proposals with well articulated research questions. My plan is to select the strongest questions and make them the basis for our initial data science for resilience research at QCRI.

bio