Category Archives: Social Computing

Data Science for Social Good and Humanitarian Action

My (new) colleagues at the University of Chicago recently launched a new and exciting program called “Data Science for Social Good”. The program, which launches this summer, will bring together dozens of top-notch data scientists, computer scientists and social scientists to address major social challenges. Advisors for this initiative include Eric Schmidt (Google), Rayid Ghani (Obama Administration) and my very likable colleague Jake Porway (DataKind). Think of “Data Science for Social Good” as a “Code for America” but broader in scope and application. I’m excited to announce that QCRI is looking to collaborate with this important new program given the strong overlap with our Social Innovation Vision, Strategy and Projects.

My team and I at QCRI are hoping to mentor and engage fellows throughout the summer on key humanitarian & development projects we are working on in partnership with the United Nations, Red Cross, World Bank and others. This would provide fellows with the opportunity to engage in “real world” challenges that directly match their expertise and interests. In addition, we (QCRI) are hoping to replicate this type of program in Qatar in January 2014.

Why January? This will give us enough time to design the new program based on the results of this summer’s experiment. More importantly, perhaps, it will be freezing in Chicago ; ) and wonderfully warm in Doha. Plus January is an easier time for many students and professionals to take “time off”. The fellows program will likely be 3 weeks in duration (rather than 3 months) and will focus on applying data science to promote social good projects in the Arab World and beyond. Mentors will include top Data Scientists from QCRI and hopefully the University of Chicago. We hope to create 10 fellowship positions for this Data Science for Social Good program. The call for applications will go out this summer, so stay tuned for an update.


Artificial Intelligence for Monitoring Elections (AIME)

Citizen-based, crowdsourced election observation initiatives are on the rise. Leading election monitoring organizations are also looking to leverage citizen-based reporting to complement their own professional election monitoring efforts. Meanwhile, the information revolution continues apace, with the number of new mobile phone subscriptions up by over 1 billion in just the past 36 months alone. The volume of election-related reports generated by “the crowd” is thus expected to grow significantly in the coming years. But international, national and local election monitoring organizations are completely unprepared to deal with the rise of Big (Election) Data.


The purpose of this collaborative research project, AIME, is to develop a free and open source platform to automatically filter relevant election reports from the crowd. The platform will include pre-defined classifiers (e.g., security incidents,  intimidation, vote-buying, ballot stuffing etc.) for specific countries and will also allow end-users to create their own classifiers on the fly. The project, launched by QCRI and several key partners, will specifically focus on unstructured user-generated content from SMS and Twitter. AIME partners include a major international election monitoring organization and several academic research centers. The AIME platform will use the technology being developed for QCRI’s AIDR project: Artificial Intelligence for Disaster Response.
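To make the idea of pre-defined and user-defined classifiers concrete, here is a minimal sketch in Python. It is not the AIME or AIDR implementation: it swaps in plain keyword matching where a real platform would presumably use trained machine-learning classifiers, and every category name, keyword and function name below is an assumption made up for illustration.

```python
# Toy stand-in for "classifiers": each category is just a set of keywords.
# All category names and keywords are hypothetical, not taken from AIME.
CATEGORY_KEYWORDS = {
    "security_incident": {"violence", "attack", "clash", "gunfire"},
    "intimidation": {"threat", "threatened", "intimidation", "harassed"},
    "vote_buying": {"bribe", "bribes", "cash", "vote-buying"},
    "ballot_stuffing": {"stuffing", "stuffed", "pre-marked"},
}


def tokenize(text):
    """Lower-case the words of a report and strip common punctuation."""
    return {word.strip(".,!?#@\"'").lower() for word in text.split()}


def classify(report, categories=CATEGORY_KEYWORDS):
    """Return the sorted names of all categories whose keywords appear."""
    tokens = tokenize(report)
    return sorted(name for name, keywords in categories.items()
                  if tokens & keywords)


def filter_relevant(reports, categories=CATEGORY_KEYWORDS):
    """Keep only reports that match at least one category."""
    labelled = [(report, classify(report, categories)) for report in reports]
    return [(report, labels) for report, labels in labelled if labels]


# An end-user could define a new classifier "on the fly" by passing
# their own category dictionary (again, purely illustrative).
custom = {"late_opening": {"late", "closed"}}
sms = [
    "Men offering cash for votes near the school",
    "Long queue but voting is calm",
]
print(filter_relevant(sms))
print(classify("Polling station still closed at 9am", custom))
```

The point of the sketch is the shape of the pipeline (a stream of SMS or tweets in, a smaller labelled stream out), not the matching technique itself, which would need to handle local languages, slang and misspellings.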


Self-Organized Crisis Response to #BostonMarathon Attack

I’m going to keep this blog post technical because the emotions from yesterday’s events are still too difficult to deal with. Within an hour of the bombs going off, I received several emails asking me to comment on the use of social media in Boston and how it differed from the digital humanitarian response efforts I am typically engaged in. So here are just a few notes, nothing too polished, but some initial reactions.


Once again, we saw the outpouring of operational support from the “Crowd” with over two thousand people in the Boston area volunteering to take people in if they needed help, and this within 60 minutes of the attack. This was coordinated via a Google Spreadsheet & Google Form. This is not the first time that these web-based solutions were used for disaster response. For example, Google Spreadsheets was used to coordinate grassroots response efforts during the major Philippine floods in 2012.

We’re not all affected the same way during a crisis and those of us who are less affected almost always look for ways to help. Unlike the era of television broadcasting, the crowd can now become an operational actor in disaster response. To be sure, paid disaster response professionals cannot be everywhere at the same time, but the crowd is always there. This explains why I have long called for a “Match.com for disaster response” to match local needs with local resources. So while I received numerous pings on Twitter, Skype and email about launching a crisis map for Boston, I am skeptical that doing so would have added much value.

What was/is needed is real-time filtering of social media content and matching of local needs (information and material needs) with local resources. There are two complementary ways to do this: human computing (e.g., crowdsourcing, microtasking, etc) and machine computing (natural language processing, machine learning, etc), which is why my team and I at QCRI are working on developing these solutions.

Other observations from the response to yesterday’s tragedy:

  • Boston Police made active use of their Twitter account to inform and advise. They also asked other Twitter users to spread their request for everyone to leave the city center area. The police and other emergency services also actively crowdsourced photographs and video footage to begin their criminal investigations. There was such heavy multimedia social media activity in the area that one could no doubt develop a Photosynth rendering of the scene.
  • There were calls for residents to unlock their Wifi networks to enable people in the streets to get access to the Internet. This was especially important after the cellphone network was taken offline for security reasons. To be sure, access to information is as important as access to water, food, shelter, etc, during a crisis.

I’d welcome any other observations from readers, e.g., similarities and differences between the use of technologies for domestic emergency management versus international humanitarian efforts. I would also be interested to hear thoughts about how the two could be integrated or at the very least learn from each other.


Humanitarianism in the Network Age: Groundbreaking Study

My colleagues at the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) have just published a groundbreaking must-read study on Humanitarianism in the Network Age; an important and forward-thinking policy document on humanitarian technology and innovation. The report “imagines how a world of increasingly informed, connected and self-reliant communities will affect the delivery of humanitarian aid. Its conclusions suggest a fundamental shift in power from capital and headquarters to the people [that] aid agencies aim to assist.” The latter is an unsettling prospect for many. To be sure, Humanitarianism in the Network Age calls for “more diverse and bottom-up forms of decision-making—something that most Governments and humanitarian organizations were not designed for. Systems constructed to move information up and down hierarchies are facing a new reality where information can be generated by anyone, shared with anyone and acted on by anyone.”


The purpose of this blog post (available as a PDF) is to summarize the 120-page OCHA study. In this summary, I specifically highlight the most important insights and profound implications. I also fill what I believe are some of the report’s most important gaps. I strongly recommend reading the OCHA publication in full, but if you don’t have time to leaf through the study, reading this summary will ensure that you don’t miss a beat. Unless otherwise stated, all quotes and figures below are taken directly from the OCHA report.

All in all, this is an outstanding, accurate, radical and impressively cross-disciplinary study. In fact, what strikes me most about this report is how far we’ve come since the devastating Haiti Earthquake of 2010. Just three short years ago, speaking the word “crowdsourcing” was blasphemous, like “Voldemort” (for all you Harry Potter fans). This explains why some humanitarians called me the CrowdSorcerer at the time (thinking it was a derogatory term). CrisisMappers was only launched three months before Haiti. The Standby Volunteer Task Force (SBTF) didn’t even exist at the time and the Digital Humanitarian Network (DHN) was to be launched 2 years hence. And here we are, just three short years later, with this official, high-profile humanitarian policy document that promotes crowdsourcing, digital humanitarian response and next generation humanitarian technology. Exciting times. While great challenges remain, I dare say we’re trying our darned best to find some solutions, and this time through collaboration, CrowdSorcerers and all. The OCHA report is a testament to this collaboration.


Summary

The Rise of Big (Crisis) Data

Over 100 countries have more mobile phone subscriptions than they have people. One in four individuals in developing countries use the Internet. This figure will double within 20 months. About 70% of Africa’s total population are mobile subscribers. In short, “The planet has gone online, producing and sharing vast quantities of information.” Meanwhile, however, hundreds of millions of people are affected by disasters every year—more than 250 million in 2010 alone. There have been over 1 billion new mobile phone subscriptions since 2010. In other words, disaster-affected communities have become increasingly “digital” as a result of the information revolution. These new digital technologies are evolving into a new nervous system for our planet, taking the pulse of our social, economic and political networks in real-time.

“Filipinos sent an average of 2 billion SMS messages every day in early 2012,” for example. When disaster strikes, many of these messages are likely to relay crisis information. In Japan, over half-a-million new users joined Twitter the day after the 2011 Earthquake. More than 177 million tweets about the disaster were posted that same day—that is, 2,000 tweets per second on average. Welcome to “The Rise of Big (Crisis) Data.” Meanwhile, back in the US, 80% of the American public expects emergency responders to monitor social media; and almost as many expect them to respond within three hours of posting a request on social media (1). These expectations have been shown to increase year-on-year. “At the same time,” however, the OCHA report notes that “there are greater numbers of people […] who are willing and able to respond to needs.”
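For readers who want to check the per-second figures, the arithmetic can be sketched in a few lines of Python. The daily totals are the ones quoted above; the only assumption is that the volume is spread evenly across a 24-hour day, which real traffic certainly is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily totals quoted in the text above.
tweets_in_one_day = 177_000_000
sms_per_day = 2_000_000_000

# Average rates, assuming (unrealistically) an even spread over the day.
tweets_per_second = tweets_in_one_day / SECONDS_PER_DAY
sms_per_second = sms_per_day / SECONDS_PER_DAY

print(f"{tweets_per_second:,.0f} tweets per second")
print(f"{sms_per_second:,.0f} SMS per second")
```

So "2,000 tweets per second" is a rounded average (closer to 2,050), and the Philippine SMS figure works out to roughly 23,000 messages per second. Peak rates in the hours after a disaster would be considerably higher than either average.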

Communities First

A few brave humanitarian organizations are embracing these changes and new realities, “reorienting their approaches around the essential objectives of helping people to help themselves.” That said, “the frontline of humanitarian action has always consisted of communities helping themselves before outside aid arrives.” What is new, however, is “affected people using technology to communicate, interact with and mobilize their social networks quicker than ever before […].” To this end, “by rethinking how aid agencies work and communicate with people in crisis, there is a chance that many more lives can be saved.” In sum, “the increased reach of communications networks and the growing network of people willing and able to help, are defining a new age—a network age—for humanitarian assistance.”

This stands in stark contrast to traditional notions of humanitarian assistance, which refer to “a small group of established international organizations, often based in and funded by high-income countries, providing help to people in a major crisis. This view is now out of date.” As my colleague Tim McNamara noted on the CrisisMappers list-serve (cited in the OCHA report), this is “…not simply a technological shift [but] also a process of rapid decentralization of power. With extremely low barriers to entry, many new entrants are appearing in the fields of emergency and disaster response. They are ignoring the traditional hierarchies, because the new entrants perceive that there is something they can do which benefits others.” In other words, the humanitarian “world order” is shifting towards a more multipolar system. And so, while Tim was “referring to the specific case of volunteer crisis mappers […], the point holds true across all types of humanitarian work.”

Take the case of Somalia Speaks, for example. A journalist recently asked me to list the projects I am most proud of in this field. Somalia Speaks ranks very high. I originally pitched the idea to my Al-Jazeera colleagues back in September 2011; the project was launched three months later. Together with my colleagues at Souktel, we texted 5,000 Somalis across the country to ask how they were personally affected by the crisis.


As the OCHA study notes, we received over 3,000 responses, which were translated into English and geotagged by the Diaspora and subsequently added to a crisis map hosted on the Al-Jazeera website. From the OCHA report: “effective communication can also be seen as an end itself in promoting human dignity. More than 3,000 Somalis responded to the Somalia Speaks project, and they seemed to feel that speaking out was a worthwhile activity.” In sum, “The Somalia Speaks project enabled the voices of people from one of the world’s most inaccessible, conflict-ridden areas, in a language known to few outside their community, to be heard by decision makers from across the planet.” The project has since been replicated several times; see Uganda Speaks for example. The OCHA study refers to Somalia Speaks at least four times, highlighting the project as an example of networked humanitarianism.

Privacy, Security & Protection

The report also emphasizes the critical importance of data security, privacy and protection in the network age. OCHA’s honest and balanced approach to the topic is another reason why this report is so radical and forward thinking. “Concern over the protection of information and data is not a sufficient reason to avoid using new communications technologies in emergencies, but it must be taken into account. To adapt to increased ethical risks, humanitarian responders and partners need explicit guidelines and codes of conduct for managing new data sources.” This is precisely why I worked with GSMA’s Disaster Response Program to draft and publish the first ever Code of Conduct for the Use of SMS in Disaster Response. I have also provided extensive feedback to the International Committee of the Red Cross’s (ICRC) latest edition of the “Professional Standards for Protection Work,” which was just launched in Geneva this month. My colleagues Emmanuel Letouzé and Patrick Vinck also included a section on data security and ethics in our recent publication on the use of Big Data for Conflict Prevention. In addition, I have blogged about this topic quite a bit: here, here and here, for example.

Crisis in Decision Making

“As the 2010 Haiti crisis revealed, the usefulness of new forms of information gathering is limited by the awareness of responders that new data sources exist, and their applicability to existing systems of humanitarian decision-making.” The fact of the matter is that humanitarian decision-making structures are simply not geared towards using Big Crisis Data let alone new data sources. More pointedly, however, humanitarian decision-making processes are often not based on empirical data in the first place, even when the data originate from traditional sources. As DfID notes in this 2012 strategy document, “Even when good data is available, it is not always used to inform decisions. There are a number of reasons for this, including data not being available in the right format, not widely dispersed, not easily accessible by users, not being transmitted through training and poor information management. Also, data may arrive too late to be able to influence decision-making in real time operations or may not be valued by actors who are more focused on immediate action.”

This is the classic warning-response gap, which has been discussed ad nauseam for decades in the field of famine early warning systems and conflict early warning systems. More data in no way implies action. Take the 2011 Somalia Famine, which was one of the best documented crises yet. So the famine didn’t occur because data was lacking. “Would more data have driven a better decision making process that could have averted disaster? Unfortunately, this does not appear to be the case. There had, in fact, been eleven months of escalating warnings emanating from the famine early warning systems that monitor Somalia. Somalia was, at the time, one of the most frequently surveyed countries in the world, with detailed data available on malnutrition prevalence, mortality rates, and many other indicators. The evolution of the famine was reported in almost real time, yet there was no adequate scaling up of humanitarian intervention until too late” (2).

At other times, “Information is sporadic,” which is why OCHA notes that “decisions can be made on the basis of anecdote rather than fact.” Indeed, “Media reports can significantly influence allocations, often more than directly transmitted community statements of need, because they are more widely read or better trusted.” (It is worth keeping in mind that the media makes mistakes; the New York Times alone makes over 7,000 errors every year). Furthermore, as acknowledged by OCHA, “The evidence suggests that new information sources are no less representative or reliable than more traditional sources, which are also imperfect in crisis settings.” This is one of the most radical statements in the entire report. OCHA should be applauded for their remarkable fortitude in plunging into this rapidly shifting information landscape. Indeed, they go on to state that, “Crowdsourcing has been used to validate information, map events, translate text and integrate data useful to humanitarian decision makers.”


The vast majority of disaster datasets are not perfect, regardless of whether they are drawn from traditional or non-traditional sources. “So instead of criticizing the lack of 100% data accuracy, we need to use it as a base and ensure our Monitoring and Evaluation (M&E) and community engagement pieces are strong enough to keep our programming relevant” (Bartosiak 2013). And so, perhaps the biggest impact of new technologies and recent disasters on the humanitarian sector is the self disrobing of the Emperor’s Clothes (or Data). “Analyses of emergency response during the past five years reveal that poor information management has severely hampered effective action, costing many lives.” Disasters increasingly serve as brutal audits of traditional humanitarian organizations; and the cracks are increasingly difficult to hide in an always-on social media world. The OCHA study makes clear that decision-makers need to figure out “how to incorporate these sources into decisions.”

Fact is, “To exploit the opportunity of the network age, humanitarians must understand how to use the new range of available data sources and have the capacity to transform this data into useful information.” Furthermore, it is imperative “to ensure new partners have a better understanding of how [these] decisions are made and what information is useful to improve humanitarian action.” These new partners include the members of the Digital Humanitarian Network (DHN), for example. Finally, decision-makers also need to “invest in building analytic capacity across the entire humanitarian network.” This analytic capacity can no longer rest on manual solutions alone. The private sector already makes use of advanced computing platforms for decision-making purposes. The humanitarian industry would be well served to recognize that their problems are hardly unique. Of course, investing in greater analytic capacity is an obvious solution but many organizations are already dealing with limited budgets and facing serious capacity constraints. I provide some creative solutions to this challenge below, which I refer to as “Data Science Philanthropy”.

Commentary

Near Perfection

OCHA’s report is brilliant, honest and forward thinking. This is by far the most important official policy document yet on humanitarian technology and digital humanitarian response—and thus on the very future of humanitarian action. The study should be required reading for everyone in the humanitarian and technology communities, which is why I plan to organize a panel on the report at CrisisMappers 2013 and will refer to the strategy document in all of my forthcoming talks and many a future blog post. In the meantime, I would like to highlight and address some of the issues that I feel need to be discussed to take this discussion further.

Ironically, some of these gaps appear to reflect a rather limited understanding of advanced computing & next generation humanitarian technology. The following topics, for example, are missing from the OCHA report: Microtasking, Sentiment Analysis and Information Forensics. In addition, the report does not relate OCHA’s important work to disaster resilience and people-centered early warning. So I’m planning to expand on the OCHA report in the technology chapter for this year’s World Disaster Report (WDR 2013). This high-profile policy document is an ideal opportunity to amplify OCHA’s radical insights and to take these to their natural and logical conclusions vis-à-vis Big (Crisis) Data. To be clear, and I must repeat this, the OCHA report is the most important forward thinking policy document yet on the future of humanitarian response. The gaps I seek to fill in no way make the previous statement any less valid. The team at OCHA should be applauded, recognized and thanked for their tremendous work on this report. So despite some of the key shortcomings described below, this policy document is by far the most honest, enlightened and refreshing look at the state of the humanitarian response today; a grounded and well-researched study that provides hope, leadership and a clear vision for the future of humanitarianism in the network age.

Big Data How

OCHA recognizes that “there is a significant opportunity to use big data to save lives,” and they also get that, “finding ways to make big data useful to humanitarian decision makers is one of the great challenges, and opportunities, of the network age.” Moreover, they realize that “While valuable information can be generated anywhere, detecting the value of a given piece of data requires analysis and understanding.” So they warn, quite rightly, that “the search for more data can obscure the need for more analysis.” To this end, they correctly conclude that “identifying the best uses of crowdsourcing and how to blend automated and crowdsourced approaches is a critical area for study.” But the report does not take these insights to their natural and logical conclusions. Nor does the report explore how to tap these new data sources let alone analyze them in real time.

Yet these Big Data challenges are hardly unique. Our problems in the humanitarian space are not that “special” or different. OCHA rightly notes that “Understanding which bits of information are valuable to saving lives is a challenge when faced with this ocean of data.” Yes. But such challenges have been around for over a decade in other disciplines. The field of digital disease detection, for example, is years ahead when it comes to real-time analysis of crowdsourced big data, not to mention private sector companies, research institutes and even new startups whose expertise is Big Data Analytics. I can also speak to this from my own professional experience. About a decade ago, I worked with a company specializing in conflict forecasting and early warning using Reuters news data (Big Data).

In sum, the OCHA report should have highlighted the fact that solutions to many of these Big Data challenges already exist, which is precisely why I joined the Qatar Computing Research Institute (QCRI). What’s more, a number of humanitarian technology projects at QCRI are already developing prototypes based on these solutions; and OCHA is actually the main partner in one such project, so it is a shame they did not get credit for this in their own report.

Sentiment Analysis

While I introduced the use of sentiment analysis during the Haiti Earthquake, this has yet to be replicated in other humanitarian settings. Why is sentiment analysis key to humanitarianism in the network age? The answer is simple: “Communities know best what works for them; external actors need to listen and model their response accordingly.” Indeed, “Affected people’s needs must be the starting point.” Actively listening to millions of voices is a Big Data challenge that has already been solved by the private sector. One such solution is real-time sentiment analysis to capture brand perception. This is a rapidly growing multimillion dollar market, which is why many companies like Crimson Hexagon exist. Numerous Fortune 500 companies have been actively using automated sentiment analysis for years now. Why? Because these advanced listening solutions enable them to better understand customer perceptions.


In Haiti, I applied this approach to tens of thousands of text messages sent by the disaster-affected population. It allowed us to track the general mood of this population on a daily basis. This is important because sentiment analysis as a feedback loop works particularly well with Big Data, which explains why the private sector is all over it. If just one or two individuals in a community are displeased with service delivery during a disaster, they may simply be “an outlier” or perhaps exaggerating. But if the sentiment analysis at the community level suddenly starts to dip, then this means hundreds, perhaps thousands of affected individuals are now all feeling the same way about a situation. In other words, sentiment analysis serves as a triangulating mechanism. The fact that the OCHA report makes no mention of this existing solution is unfortunate since sentiment feedback loops enable organizations to assess the impact of their interventions by capturing their clients’ perceptions.
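The feedback loop described above (score each message, average by day, and watch for a community-level dip) can be sketched as follows. This is a toy illustration in Python, not the method used in Haiti or by any commercial platform: the word lists, the scoring rule, and the `window` and `drop` thresholds are all assumptions chosen to keep the example short.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mini-lexicons; a real system would use a proper
# sentiment model in the relevant language(s).
POSITIVE = {"thanks", "thank", "good", "safe", "received", "helped"}
NEGATIVE = {"hungry", "angry", "no", "sick", "afraid", "waiting"}


def score(message):
    """Positive-word count minus negative-word count for one message."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def daily_mood(messages):
    """Map (day, text) pairs to {day: mean score}, with days in order."""
    by_day = defaultdict(list)
    for day, text in messages:
        by_day[day].append(score(text))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def flag_dips(mood, window=3, drop=0.5):
    """Flag days whose mean mood falls at least `drop` below the
    average of the previous `window` days."""
    days = list(mood)
    flagged = []
    for i in range(window, len(days)):
        baseline = mean(mood[d] for d in days[i - window:i])
        if baseline - mood[days[i]] >= drop:
            flagged.append(days[i])
    return flagged


# One unhappy message barely moves a daily mean built from thousands of
# messages; a flagged day means the whole community's average shifted.
mood_by_day = {1: 0.5, 2: 0.4, 3: 0.6, 4: -0.5}
print(flag_dips(mood_by_day))
```

Averaging over many messages is what makes this a triangulating mechanism: individual outliers wash out, while a sustained shift across hundreds of senders shows up as a dip against the recent baseline.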

Information Forensics

“When dealing with the vast volume and complexity of information available in the network age, understanding how to assess the accuracy and utility of any data source becomes critical.” Indeed, and the BBC’s User-Generated Content (UGC) Hub has been doing just this since 2005—when Twitter didn’t even exist. The field of digital information forensics may be new to the humanitarian sector, but that doesn’t mean it is new to every other sector on the planet. Furthermore, recent research on crisis computing has revealed that the credibility of social media reporting can be modeled and even predicted. Twitter has even been called a “Truth Machine” because of the self-correcting dynamic that has been empirically observed. Finally, one of QCRI’s humanitarian technology projects, Verily, focuses precisely on the issue of verifying crowdsourced information from social media. And the first organization I reached out to for feedback on this project was OCHA.

Microtasking

The OCHA report overlooks microtasking as well. Yes, the study does address and promote the use of crowdsourcing repeatedly, but again, this tends to focus on the collection of information rather than the processing of said information. Microtasking applications in the humanitarian space are not totally unheard of, however. Microtasking was used to translate and geolocate tens of thousands of text messages following the Haiti Earthquake. (As the OCHA study notes, “some experts estimated that 90 per cent [of the SMS's] were ‘repetition’, or ‘white noise’, meaning useless chatter”). There have been several other high profile uses of microtasking for humanitarian operations such as this one thanks to OCHA’s leadership in response to Typhoon Pablo. In sum, microtasking has been used extensively in other sectors to manage the big data and quality control challenge for many years now. So this important human computing solution really ought to have appeared in the OCHA report along with the immense potential of microtasking humanitarian information using massive online multiplayer games (more here).

Open Data is Open Power

OCHA argues that “while information can be used by anyone, power remains concentrated in the hands of a limited number of decision makers.” So if the latter “do not use this information to make decisions in the interests of the people they serve, its value is lost.” I don’t agree that the value is lost. One of the report’s main themes is the high-impact agency and ingenuity of disaster-affected communities. As OCHA rightly points out, “The terrain is continually shifting, and people are finding new and brilliant ways to cope with crises every day.” Openly accessible crisis information posted on social media has already been used by affected populations for almost a decade now. In other words, communities affected by crises are (quite rightly) taking matters into their own hands in today’s networked world—just like they did in the analog era of yesteryear. As noted earlier, “affected people [are] using technology to communicate, interact with and mobilize their social networks quicker than ever before […].” This explains why “the failure to share [information] is no longer a matter of institutional recalcitrance: it can cost lives.”

Creative Partnerships

The OCHA study emphasizes that “Humanitarian agencies can learn from other agencies, such as fire departments or militaries, on how to effectively respond to large amounts of often confusing information during a fast-moving crisis.” This is spot on. Situational awareness is first and foremost a military term. The latest Revolution in Military Affairs (RMA) provides important insights into the future of humanitarian technology—see these recent developments, for example. Meanwhile, the London Fire Brigade has announced plans to add Twitter as a communication channel, which means city residents will have the option of reporting a fire alert via Twitter. Moreover, the 911 service in the US (999 in the UK) is quite possibly the oldest and longest running crowdsourced emergency service in the world. So there is much that humanitarians can learn from 911. But the fact of the matter is that most domestic emergency response agencies are completely unprepared to deal with the tidal wave of Big (Crisis) Data, which is precisely why the Fire Department of New York City (FDNY) and San Francisco City’s Emergency Response Team have recently reached out to me.


But some fields are way ahead of the curve. The OCHA report should thus have pointed to crime mapping and digital disease detection since these fields have more effectively navigated the big data challenge. As for the American Red Cross’s Digital Operations Center, the main technology they are using, Radian6, has been used by private sector clients for years now. And while the latter can afford the very expensive licensing fees, it is unlikely that cash-strapped domestic emergency response officers and international humanitarian organizations will ever be able to afford these advanced solutions. This is why we need more than just “Data Philanthropy”.

We also need “Data Science Philanthropy”. As the OCHA report states, decision-makers need to “invest in building analytic capacity across the entire humanitarian network.” This is an obvious recommendation, but perhaps not particularly realistic given the limited budgets and capacity constraints in the humanitarian space. This means we need to create more partnerships with Data Science groups like DataKind, Kaggle and the University of Chicago’s Data Science for Social Good program. I’m in touch with these groups and others for this reason. I’ve also been (quietly) building a global academic network called “Data Science for Humanitarian Action” which will launch very soon. Open Source solutions are also imperative for building analytic capacity, which is why the humanitarian technology platforms being developed by QCRI will all be Open Source and freely available.

DISASTER RESILIENCE

This points to the following gap in the OCHA report: there is no reference whatsoever to resilience. While the study does recognize that collective self-help behavior is typical in disaster response and should be amplified, the report does not make the connection that this age-old mutual-aid dynamic is the humanitarian sector’s own lifeline during a major disaster. Resilience has to do with a community’s capacity for self-organization. Communication technologies increasingly play a pivotal role in self-organization. This explains why disaster preparedness and disaster risk reduction programs ought to place greater emphasis on building the capacity of at-risk communities to self-organize and mitigate the impact of disasters on their livelihoods. More about this here. Creating resilience through big data is also more than academic curiosity, as explained here.

DECENTRALIZING RESPONSE

As more and more disaster-affected communities turn to social media in time of need, “Governments and responders will soon need answers to the questions: ‘Where were you? We Facebooked/tweeted/texted for help, why didn’t someone come?’” Again, customer support challenges are hardly unique to the humanitarian sector. Private sector companies have had to manage parallel problems by developing more advanced customer service platforms. Some have even turned to crowdsourcing to manage customer support. I blogged about this here to drive the point home that solutions to these humanitarian challenges already exist in other sectors.

Yes, that’s right, I am promoting the idea of crowdsourcing crisis response. Fact is, disaster response has always been crowdsourced. The real first responders are the disaster affected communities themselves. Thanks to new technologies, this crowdsourced response can be accelerated and made more efficient. And yes, there’s an app (in the making) for that: MatchApp. This too is a QCRI humanitarian technology project (in partnership with MIT’s Computer Science and Artificial Intelligence Lab). The purpose of MatchApp is to decentralize disaster response. After all, the many small needs that arise following a disaster rarely require the attention of paid and experienced emergency responders. Furthermore, as a colleague of mine at NYU shared based on her disaster efforts following Hurricane Sandy, “Solving little challenges can make the biggest differences” for disaster-affected communities.

As noted above, more and more individuals believe that emergency responders should monitor social media during disasters and respond accordingly. This is “likely to increase the pressure on humanitarian responders to define what they can and cannot provide. The extent of communities’ desires may exceed their immediate life-saving needs, raising expectations beyond those that humanitarian responders can meet. This can have dangerous consequences. Expectation management has always been important; it will become more so in the network age.”

PEOPLE-CENTERED

“Community early warning systems (CEWS) can buy time for people to implement plans and reach safety during a crisis. The best CEWS link to external sources of assistance and include the pre-positioning of essential supplies.” At the same time, “communities do not need to wait for information to come from outside sources, […] they can monitor local hazards and vulnerabilities themselves and then shape the response.” This sense and shaping capacity builds resilience, which explains why “international humanitarian organizations must embrace the shift of warning systems to the community level, and help Governments and communities to prepare for, react and respond to emergencies using their own resources and networks.”

This is absolutely spot on and at least 7 years old as far as UN policy goes. In 2006, the UN’s International Strategy for Disaster Risk Reduction (UNISDR) published this policy document advocating for a people-centered approach to early warning and response systems. They defined the purpose of such systems as follows:

“… to empower individuals and communities threatened by hazards to act in sufficient time and in an appropriate manner so as to reduce the possibility of personal injury, loss of life, damage to property and the environment, and loss of livelihoods.”

Unfortunately, the OCHA report does not drive these insights to their logical conclusion. Disaster-affected communities are even more ill-equipped to manage the rise of Big (Crisis) Data. Storing, let alone analyzing, Big Data in real time is a major technical challenge. As noted here vis-à-vis Big Data Analytics on Twitter, “only corporate actors and regulators, who possess both the intellectual and financial resources to succeed in this race, can afford to participate […].” Indeed, only a handful of research institutes have the technical ability and large funding base to carry out the real-time analysis of Big (Crisis) Data. My team and I at QCRI, along with colleagues at UN Global Pulse and GSMA, are trying to change this. In the meantime, however, the “Big Data Divide” is already here and very real.

information > Food

“Information is not water, food or shelter; on its own, it will not save lives. But in the list of priorities, it must come shortly after these.” While I understand the logic behind this assertion, I consider it a step back, not forward from the 2005 World Disaster Report (WDR 2005), which states that “People need information as much as water, food, medicine or shelter. Information can save lives, livelihoods and resources.” In fact, OCHA’s assertion contradicts an earlier statement in the report; namely that “information in itself is a life-saving need for people in crisis. It is as important as water, food and shelter.” Fact is: without information, how does one know where/when and from whom clean water and food might be available? How does one know which shelters are open, whether they can accommodate your family and whether the road to the shelter is safe to drive on?

OCHA writes that, “Easy access to data and analysis, through technology, can help people make better life-saving decisions for themselves and mobilize the right types of external support. This can be as simple as ensuring that people know where to go and how to get help. But to do so effectively requires a clear understanding of how information flows locally and how people make decisions.” In sum, access to information is paramount, which means that local communities should have easy access to next generation humanitarian technologies that can manage and analyze Big Crisis Data. As a seasoned humanitarian colleague recently told me, “humanitarians sometimes have a misconception that all aid and relief comes through agencies. In fact, (especially with things such as shelter) people start to recover on their own or within their communities. Thus, information is vital in assuring that they do this safely and properly. Think of the Haiti, build-back-better campaign and the issues with cholera outbreaks.”

Them not us

The technologies of the network age should not be restricted to empowering second- and third-level responders. Unfortunately, as OCHA rightly observes, “there is still a tendency for people removed from a crisis to decide what is best for the people living through that crisis.” Moreover, these paid responders cannot be everywhere at the same time. But the crowd is always there. And as OCHA points out, there are “growing groups of people willing [and] able to help those in need;” groups that unlike their analog counterparts of yesteryear now operate in the “network age with its increased reach of communications networks.” So information is not simply or “primarily a tool for agencies to decide how to help people, it must be understood as a product, or service, to help affected communities determine their own priorities.” Recall the above definition of people-centered early warning. This definition does not all of a sudden become obsolete in the network age. The purpose of next generation technologies is to “empower individuals and communities threatened by hazards to act in sufficient time and in an appropriate manner so as to reduce the possibility of personal injury, loss of life, damage to property and the environment, and loss of livelihoods.”

Digital humanitarian volunteers are also highly unprepared to deal with the rise of Big Crisis Data, even though they are at the frontlines and indeed the pioneers of digital response. This explains why the Standby Volunteer Task Force (SBTF), a network of digital volunteers that OCHA refers to half-a-dozen times throughout the report, is actively looking to become an early adopter of next generation humanitarian technologies. Burnout is a serious issue with digital volunteers. They too require access to these next generation technologies, which is precisely why the American Red Cross equips their digital volunteers with advanced computing platforms as part of their Digital Operations Center. Unfortunately, some humanitarians still think that they can just as easily throw more (virtual) volunteers at the Big Crisis Data challenge. Not only are they terribly misguided but also insensitive, which is why, as OCHA notes, “Using new forms of data may also require empowering technical experts to overrule the decisions of their less informed superiors.” As the OCHA study concludes, “Crowdsourcing is a powerful tool, but ensuring that scarce volunteer and technical resources are properly deployed will take further research and the expansion of collaborative models, such as SBTF.”

Conclusion

So will next generation humanitarian technology solve everything? Of course not; I don’t know anyone naïve enough to make this kind of claim. (But it is a common tactic used by the ignorant to attack humanitarian innovation.) I have already warned about techno-centric tendencies in the past, such as here and here (see epilogue). Furthermore, one of the principal findings from this OECD report published in 2008 is that “An external, interventionist, and state-centric approach in early warning fuels disjointed and top down responses in situations that require integrated and multilevel action.” You can throw all the advanced computing technology you want at this dysfunctional structural problem but it won’t solve a thing. The OECD thus advocates for “micro-level” responses to crises because “these kinds of responses save lives.” Preparedness is obviously central to these micro-level responses and self-organization strategies. Shockingly, however, the OCHA study reveals that “only 3% of humanitarian aid goes to disaster prevention and preparedness,” while barely “1% of all other development assistance goes towards disaster risk reduction.” This is no way to build disaster resilience. I doubt these figures will increase substantially in the near future.

This reality makes it even more pressing to recognize that ensuring “responders listen to affected people and find ways to respond to their priorities will require a mindset change.” To be sure, “If aid organizations are willing to listen, learn and encourage innovation on the front lines, they can play a critical role in building a more inclusive and more effective humanitarian system.” This need to listen and learn is why next generation humanitarian technologies are not optional. Ensuring that first, second and third-level responders have access to next generation humanitarian technologies is critical for the purposes of self-help, mutual aid and external response.

A Research Framework for Next Generation Humanitarian Technology and Innovation

Humanitarian donors and organizations are increasingly championing innovation and the use of new technologies for humanitarian response. DfID, for example, is committed to using “innovative techniques and technologies more routinely in humanitarian response” (2011). In a more recent strategy paper, DfID confirmed that it would “continue to invest in new technologies” (2012). ALNAP’s important report on “The State of the Humanitarian System” documents the shift towards greater innovation, “with new funds and mechanisms designed to study and support innovation in humanitarian programming” (2012). A forthcoming landmark study by OCHA makes the strongest case yet for the use and early adoption of new technologies for humanitarian response (2013).

These strategic policy documents are game-changers and pivotal to ushering in the next wave of humanitarian technology and innovation. That said, the reports are limited by the very fact that the authors are humanitarian professionals and thus not necessarily familiar with the field of advanced computing. The purpose of this post is therefore to set out a more detailed research framework for next generation humanitarian technology and innovation—one with a strong focus on information systems for crisis response and management.

In 2010, I wrote this piece on “The Humanitarian-Technology Divide and What To Do About It.” This divide became increasingly clear to me when I co-founded and co-directed the Harvard Humanitarian Initiative’s (HHI) Program on Crisis Mapping & Early Warning (2007-2009). So I co-founded the annual International CrisisMappers Conference series in 2009 and have continued to co-organize this unique, cross-disciplinary forum on humanitarian technology. The CrisisMappers Network also plays an important role in bridging the humanitarian and technology divide. My decision to join Ushahidi as Director of Crisis Mapping (2009-2012) was a strategic move to continue bridging the divide, and to do so from the technology side this time.

The same is true of my move to the Qatar Computing Research Institute (QCRI) at the Qatar Foundation. My experience at Ushahidi made me realize that serious expertise in Data Science is required to tackle the major challenges appearing on the horizon of humanitarian technology. Indeed, the key words missing from the DfID, ALNAP and OCHA innovation reports include: Data Science, Big Data Analytics, Artificial Intelligence, Machine Learning, Machine Translation and Human Computing. This current divide between the humanitarian and data science space needs to be bridged, which is precisely why I joined the Qatar Computing Research Institute as Director of Innovation: to develop and prototype the next generation of humanitarian technologies by working directly with experts in Data Science and Advanced Computing.

My efforts to bridge these communities also explain why I am co-organizing this year’s Workshop on “Social Web for Disaster Management” at the 2013 World Wide Web conference (WWW13). The WWW event series is one of the most prestigious conferences in the field of Advanced Computing. I have found that experts in this field are very interested and highly motivated to work on humanitarian technology challenges and crisis computing problems. As one of them recently told me: “We simply don’t know what projects or questions to prioritize or work on. We want questions, preferably hard questions, please!”

Yet the humanitarian innovation and technology reports cited above overlook the field of advanced computing. Their policy recommendations vis-a-vis future information systems for crisis response and management are vague at best. Yet one of the major challenges that the humanitarian sector faces is the rise of Big (Crisis) Data. I have already discussed this here, here and here, for example. The humanitarian community is woefully unprepared to deal with this tidal wave of user-generated crisis information. There are already more mobile phone subscriptions than people in 100+ countries. And fully 50% of the world’s population in developing countries will be using the Internet within the next 20 months; the current figure is 24%. Meanwhile, close to 250 million people were affected by disasters in 2010 alone. Since then, the number of new mobile phone subscriptions has increased by well over one billion, which means that disaster-affected communities today are increasingly likely to be digital communities as well.

In the Philippines, a country highly prone to “natural” disasters, 92% of Filipinos who access the web use Facebook. In early 2012, Filipinos sent an average of 2 billion text messages every day. When disaster strikes, some of these messages will contain information critical for situational awareness & rapid needs assessment. The innovation reports by DfID, ALNAP and OCHA emphasize time and time again that listening to local communities is a humanitarian imperative. As DfID notes, “there is a strong need to systematically involve beneficiaries in the collection and use of data to inform decision making. Currently the people directly affected by crises do not routinely have a voice, which makes it difficult for their needs be effectively addressed” (2012). But how exactly should we listen to millions of voices at once, let alone manage, verify and respond to these voices with potentially life-saving information? Over 20 million tweets were posted during Hurricane Sandy. In Japan, over half-a-million new users joined Twitter the day after the 2011 Earthquake. More than 177 million tweets about the disaster were posted that same day, i.e., 2,000 tweets per second on average.
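
For readers who want to check the per-second figure, here is a minimal sketch of the arithmetic. It assumes the 177 million tweets were spread over one full 24-hour day; the function and variable names are mine, not from any of the reports.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a 24-hour day


def average_per_second(total_count, seconds=SECONDS_PER_DAY):
    """Average number of events per second over the given period."""
    return total_count / seconds


# Figure quoted above for the day after the 2011 earthquake in Japan.
tweets_in_one_day = 177_000_000

rate = average_per_second(tweets_in_one_day)
print(f"{rate:,.0f} tweets per second on average")
```

This works out to roughly 2,049 tweets per second, in line with the rounded figure of 2,000 quoted above.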

Of course, the volume and velocity of crisis information will vary from country to country and disaster to disaster. But the majority of humanitarian organizations do not have the technologies in place to handle smaller tidal waves either. Take the case of the recent typhoon in the Philippines, for example. OCHA activated the Digital Humanitarian Network (DHN) to ask them to carry out a rapid damage assessment by analyzing the 20,000 tweets posted during the first 48 hours of Typhoon Pablo. In fact, one of the main reasons digital volunteer networks like the DHN and the Standby Volunteer Task Force (SBTF) exist is to provide humanitarian organizations with this kind of skilled surge capacity. But analyzing 20,000 tweets in 12 hours (mostly manually) is one thing; analyzing 20 million requires more than a few hundred dedicated volunteers. What’s more, we do not have the luxury of having months to carry out this analysis. Access to information is as important as access to food; and like food, information has a sell-by date.

We clearly need a research agenda to guide the development of next generation humanitarian technology. One such framework is proposed here. The Big (Crisis) Data challenge is composed of (at least) two major problems: (1) finding the needle in the haystack; (2) assessing the accuracy of that needle. In other words, identifying the signal in the noise and determining whether that signal is accurate. Both of these challenges are exacerbated by serious time constraints. There are (at least) two ways to manage the Big Data challenge in real or near real-time: Human Computing and Artificial Intelligence. We know about these solutions because they have already been developed and used by other sectors and disciplines for several years now. In other words, our information problems are hardly as unique as we might think. Hence the importance of bridging the humanitarian and data science communities.

In sum, the Big Crisis Data challenge can be addressed using Human Computing (HC) and/or Artificial Intelligence (AI). Human Computing includes crowdsourcing and microtasking. AI includes natural language processing and machine learning. A framework for next generation humanitarian technology and innovation must thus promote Research and Development (R&D) that applies these methodologies to humanitarian response. For example, Verily is a project that leverages HC for the verification of crowdsourced social media content generated during crises. In contrast, this here is an example of an AI approach to verification. The Standby Volunteer Task Force (SBTF) has used HC (microtasking) to analyze satellite imagery (Big Data) for humanitarian response. Another novel HC approach to managing Big Data is the use of gaming, something called Playsourcing. AI for Disaster Response (AIDR) is an example of AI applied to humanitarian response. In many ways, though, AIDR combines AI with Human Computing, as does MatchApp. Such hybrid solutions should also be promoted as part of the R&D framework on next generation humanitarian technology.
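
To make the hybrid idea a little more concrete, here is a toy sketch of how a machine score and human review might be combined: messages the machine is confident about are sorted automatically, and the uncertain middle band is queued for volunteers. Everything in it is an assumption for illustration only. The keyword weights stand in for a trained classifier, the thresholds are arbitrary, and none of it describes how AIDR, Verily or MatchApp actually work.

```python
from dataclasses import dataclass, field

# Hypothetical keyword weights standing in for a trained classifier.
KEYWORD_WEIGHTS = {"trapped": 0.5, "flooded": 0.4, "collapsed": 0.4, "help": 0.2}


@dataclass
class Routed:
    auto_relevant: list = field(default_factory=list)
    auto_irrelevant: list = field(default_factory=list)
    needs_human_review: list = field(default_factory=list)


def relevance_score(text):
    """Crude relevance score in [0, 1] based on which keywords appear."""
    words = set(text.lower().split())
    return min(1.0, sum(w for k, w in KEYWORD_WEIGHTS.items() if k in words))


def route(messages, low=0.2, high=0.6):
    """Sort confident cases automatically; send uncertain ones to volunteers."""
    result = Routed()
    for message in messages:
        score = relevance_score(message)
        if score >= high:
            result.auto_relevant.append(message)
        elif score < low:
            result.auto_irrelevant.append(message)
        else:
            result.needs_human_review.append(message)
    return result


routed = route([
    "Family trapped in flooded house",
    "Great concert tonight",
    "Please help near the bridge",
])
print(routed)
```

The design point is simply that volunteer attention is the scarce resource, so the machine handles the clear-cut cases and people handle the ambiguous ones.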

There is of course more to humanitarian technology than information management alone. Related is the topic of Data Visualization, for example. There are also exciting innovations and developments in the use of drones or Unmanned Aerial Vehicles (UAVs), meshed mobile communication networks, hyper low-cost satellites, etc. I am particularly interested in each of these areas and will continue to blog about them. In the meantime, I very much welcome feedback on this post’s proposed research framework for humanitarian technology and innovation.

Analyzing Tweets Posted During Mumbai Terrorist Attacks

Over 1 million unique users posted more than 2.7 million tweets in just 3 days following the triple bomb blasts that struck Mumbai on July 13, 2011. Out of these, over 68,000 tweets were “original tweets” (in contrast to retweets) and related to the bombings. An analysis of these tweets yielded some interesting patterns. (Note that the Ushahidi Map of the bombings captured ~150 reports; more here).

One unique aspect of this study (PDF) is the methodology used to assess the quality of the Twitter dataset. The number of tweets per user was graphed in order to test for a power law distribution. The graph below shows the log distribution of the number of tweets per user. The straight line suggests power law behavior. This finding is in line with previous research done on Twitter. So the authors conclude that the quality of the dataset is comparable to the quality of Twitter datasets used in other peer-reviewed studies.

I find this approach intriguing because Professor Michael Spagat, Dr. Ryan Woodard and I carried out related research on conflict data back in 2006. One fascinating research question that emerges from all this, and which could be applied to Twitter datasets, is whether the slope of the power law says anything about the type of conflict/disaster being tweeted about, the expected number of casualties or even the propagation of rumors. If you’re interested in pursuing this research question (and have worked with power laws before), please do get in touch. In the meantime, I challenge the authors’ suggestion that a power law distribution necessarily says anything about the quality or reliability of the underlying data. Using the casualty data from SyriaTracker (which is also used by USAID in their official crisis maps), my colleague Dr. Ryan Woodard showed that this dataset does not follow a power law distribution, even though it is one of the most reliable on Syria.
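
For anyone who wants to try the tweets-per-user check on their own data, here is a minimal sketch of the idea: count how many users posted 1, 2, 3, ... tweets, take logs of both quantities, and fit a straight line. This is my own simplified illustration and not the study’s actual code. The function names and the synthetic data are assumptions, and a straight line on a log-log plot is only suggestive of a power law rather than a rigorous statistical test.

```python
import math
from collections import Counter


def tweets_per_user_histogram(user_ids):
    """Map 'number of tweets' to 'number of users who posted that many'."""
    per_user = Counter(user_ids)        # tweets posted by each user
    return Counter(per_user.values())   # users sharing each tweet count


def loglog_slope(histogram):
    """Least-squares slope of log(users) against log(tweets per user)."""
    xs = [math.log(k) for k in histogram]
    ys = [math.log(v) for v in histogram.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data following an exact power law with exponent 2:
# 1000 / k^2 users post exactly k tweets each.
synthetic = {k: 1000.0 / k ** 2 for k in range(1, 11)}
slope = loglog_slope(synthetic)
print(f"fitted log-log slope: {slope:.2f}")
```

On real data, `tweets_per_user_histogram` would be fed one user ID per tweet, and the fitted slope is the quantity the research question above asks about.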

Moving on to the content analysis of the Mumbai blast tweets: “The number of URLs and @-mentions in tweets increase during the time of the crisis in comparison to what researchers have exhibited for normal circumstances.” The table below lists the top 10 URLs shared on Twitter. Interestingly, the link to a Google Spreadsheet was amongst the most shared resources. Created by Twitter user Nitin Sagar, the spreadsheet was used to “coordinate relief operation among people. Within hours hundreds of people registered on the sheet via Twitter. People asked for or offered help on that spreadsheet for many hours.”

The analysis also reveals that “the number of tweets or updates by authority users (those with large number of followers) are very less, i.e., majority of content generated on Twitter during the crisis comes from non authority users.” In addition, tweets generated by authority users have a high level of retweets. The results also indicate that “the number of tweets generated by people with large follower base (who are generally like government owned accounts, celebrities, media companies) were very few. Thus, the majority of content generated at the time of crisis was from unknown users. It was also observed that, though the number of posts were less by users with large number of followers, these posts registered high numbers of retweets.”

Rumors related to the blasts also spread through Twitter. For example, rumors began to circulate about a fourth bomb going off. “Some tweets even specified locations of 4th blast as Lemington street, Colaba and Charni. Around 500+ tweets and retweets were posted about this.” False rumors about hospital blood banks needing donations were also propagated via Twitter. “They were initiated by a user, @KapoorChetan and around 2,000 tweets and retweets were made regarding this by Twitter users.” The authors of the study believe that such false rumors can be prevented if credible sources like the mainstream media companies and the government post updates on social media more frequently.

I did a bit of research on this and found that NDTV did use their Twitter feed (which has over half-a-million followers) to counter these rumors. For example, “RT @ndtv: Mumbai police: Don’t believe rumours of more bombs. False rumours being spread deliberately.” Journalist Sonal Kalra also acted to counter rumors: “RT @sonalkalra: BBMs about bombs found in Delhi are FALSE. Pls pls don’t spread rumours. #mumbaiblasts.”

In conclusion, the study considers the “privacy threats during the Twitter activity after the blasts. People openly tweeted their phone numbers on social media websites like Twitter, since at such moment of crisis people wished to reach out to help others. But, long after the crisis was over, such posts still remained publicly available on the Internet.” In addition, “people also openly posted their blood group, home address, etc. on Twitter to offer help to victims of the blasts.” The Ushahidi Map also includes personal information. These data privacy and security issues continue to pose major challenges vis-a-vis the use of social media for crisis response.

See also: Did Terrorists Use Twitter to Increase Situational Awareness? [Link]

Keynote: Next Generation Humanitarian Technology

I’m excited to be giving the Keynote address at the Social Media and Response Management Interface Event (SMARMIE 2013) in New York this morning. A big thank you to the principal driver behind this important event, Chuck Frank, for kindly inviting me to speak. This is my first major keynote since joining QCRI, so I’m thrilled to share what I’ve learned during this time and my vision for the future of humanitarian technology. But I’m even more excited by the selection of speakers and caliber of participants. I’m eager to learn about their latest projects, gain new insights and hopefully create pro-active partnerships moving forward.

You can follow this event via live stream and @smarmieNYC & #smarmie. I plan to live tweet the event at @patrickmeier. My slides are available for download here (125MB). Each slide includes speaking notes, which may be of interest to folks who are unable to follow via live stream. Feel free to use my slides but strictly for non-commercial purposes and only with direct attribution. I’ll be sure to post the video of my talk on iRevolution when it becomes available. In the meantime, these videos and publications may be of interest. Also, I’ve curated the table of contents below with 60+ links to every project and/or concept referred to in my keynote and slides (in chronological order) so participants and others can revisit these after the conference and, more importantly, keep our conversations going via Twitter and the comments section of the blog posts. I plan to hire a Research Assistant in the near future to turn these (and other posts) into a series of up-to-date e-books in which I’ll cite and fully credit the most interesting and insightful comments posted on iRevolution.

Social Media Pulse of Planet

http://iRevolution.net/2013/02/02/pulse-of-the-planet

http://iRevolution.net/2013/02/06/the-world-at-night

http://iRevolution.net/2011/04/20/network-witness

Big Crisis Data and Added Value

http://iRevolution.net/2011/06/22/no-data-bad-data

http://iRevolution.net/2012/02/26/mobile-technologies-crisis-mapping-disaster-response

http://iRevolution.net/2012/12/17/debating-tweets-disaster

http://iRevolution.net/2012/07/18/disaster-tweets-for-situational-awareness

http://iRevolution.net/2013/01/11/disaster-resilience-2-0

Standby Task Force (SBTF)

http://blog.standbytaskforce.com

http://iRevolution.net/2010/09/26/crisis-mappers-task-force

Libya Crisis Map

http://blog.standbytaskforce.com/libya-crisis-map-report

http://irevolution.net/2011/03/04/crisis-mapping-libya

http://iRevolution.net/2011/03/08/volunteers-behind-libya-crisis-map

http://iRevolution.net/2011/06/12/im-not-gaddafi-test

Philippines Crisis Map

http://iRevolution.net/2012/12/05/digital-response-to-typhoon-philippines

http://iRevolution.net/2012/12/08/digital-response-typhoon-pablo

http://iRevolution.net/2012/12/06/digital-disaster-response-typhoon

http://iRevolution.net/2012/06/03/geofeedia-for-crisis-mapping

http://iRevolution.net/2013/02/26/crowdflower-for-disaster-response

Digital Humanitarians 

http://www.digitalhumanitarians.com

Human Computation

http://iRevolution.net/2013/01/20/digital-humanitarian-micro-tasking

Human Computation for Disaster Response (submitted for publication)

Syria Crisis Map

http://iRevolution.net/2012/03/25/crisis-mapping-syria

http://iRevolution.net/2012/11/27/usaid-crisis-map-syria

http://iRevolution.net/2012/07/30/collaborative-social-media-analysis

http://iRevolution.net/2012/05/29/state-of-the-art-digital-disease-detection

Hybrid Systems for Disaster Response

http://iRevolution.net/2012/10/21/crowdsourcing-and-advanced-computing

http://iRevolution.net/2012/07/30/twitter-for-humanitarian-cluster

http://iRevolution.net/2013/02/11/update-twitter-dashboard

Credibility of Social Media: Compare to What?

http://iRevolution.net/2013/01/08/disaster-tweets-versus-911-calls

http://iRevolution.net/2010/09/22/911-system

Human Computed Credibility

http://iRevolution.net/2012/07/26/truth-and-social-media

http://iRevolution.net/2011/11/29/information-forensics-five-case-studies

http://iRevolution.net/2010/06/30/crowdsourcing-detective

http://iRevolution.net/2012/11/20/verifying-source-credibility

http://iRevolution.net/2012/09/16/accelerating-verification

http://iRevolution.net/2010/09/19/veracity-of-tweets-during-a-major-crisis

http://iRevolution.net/2011/03/26/technology-to-counter-rumors

http://iRevolution.net/2012/03/10/truthiness-as-probability

http://iRevolution.net/2013/01/27/mythbuster-tweets

http://iRevolution.net/2012/10/31/hurricane-sandy

http://iRevolution.net/2012/07/16/crowdsourcing-for-human-rights-monitoring-challenges-and-opportunities-for-information-collection-verification

Verily: Crowdsourced Verification

http://iRevolution.net/2013/02/19/verily-crowdsourcing-evidence

http://iRevolution.net/2011/11/06/time-critical-crowdsourcing

http://iRevolution.net/2012/09/18/six-degrees-verification

http://iRevolution.net/2011/09/26/augmented-reality-crisis-mapping

AI Computed Credibility

http://iRevolution.net/2012/12/03/predicting-credibility

http://iRevolution.net/2012/12/10/ranking-credibility-of-tweets

Future of Humanitarian Tech

http://iRevolution.net/2012/04/17/red-cross-digital-ops

http://iRevolution.net/2012/11/15/live-global-twitter-map

http://iRevolution.net/2013/02/16/crisis-mapping-minority-report

http://iRevolution.net/2012/04/09/humanitarian-future

http://iRevolution.net/2011/08/22/khan-borneo-galaxies

http://iRevolution.net/2010/03/24/games-to-turksource

http://iRevolution.net/2010/07/08/cognitive-surplus

http://iRevolution.net/2010/08/14/crowd-is-always-there

http://iRevolution.net/2011/09/14/crowdsource-crisis-response

http://iRevolution.net/2012/07/04/match-com-for-economic-resilience

http://iRevolution.net/2013/02/27/matchapp-disaster-response-app

http://iRevolution.net/2013/01/07/what-waze-can-teach-us

Policy

http://iRevolution.net/2012/12/04/catch-22

http://iRevolution.net/2012/02/05/iom-data-protection

http://iRevolution.net/2013/01/23/perils-of-crisis-mapping

http://iRevolution.net/2013/02/25/launching-sms-code-of-conduct

http://iRevolution.net/2013/02/26/haiti-lies

http://iRevolution.net/2012/06/04/big-data-philanthropy-for-humanitarian-response

http://iRevolution.net/2012/07/25/become-a-data-donor

Bio

ps. Please let me know if you find any broken links so I can fix them, thank you!

Did Terrorists Use Twitter to Increase Situational Awareness?

Those who are still skeptical about the value of Twitter for real-time situational awareness during a crisis ought to ask why terrorists likely think otherwise. In 2008, terrorists carried out multiple attacks on Mumbai in what many refer to as the worst terrorist incident in Indian history. This study, summarized below, explains how the terrorists in question could have used social media for coordination and decision-making purposes.

The study argues that “the situational information which was broadcast through live media and Twitter contributed to the terrorists’ decision making process and, as a result, it enhanced the effectiveness of hand-held weapons to accomplish their terrorist goal.” To be sure, the “sharing of real time situational information on the move can enable the ‘sophisticated usage of the most primitive weapons.’” In sum, “unregulated real time Twitter postings can contribute to increase the level of situation awareness for terrorist groups to make their attack decision.”

According to the study, “an analysis of satellite phone conversations between terrorist commandos in Mumbai and remote handlers in Pakistan shows that the remote handlers in Pakistan were monitoring the situation in Mumbai through live media, and delivered specific and situational attack commands through satellite phones to field terrorists in Mumbai.” These conversations provide “evidence that the Mumbai terrorist groups understood the value of up-to-date situation information during the terrorist operation. [...] They understood that the loss of information superiority can compromise their operational goal.”

Handler: See, the media is saying that you guys are now in room no. 360 or 361. How did they come to know the room you guys are in?…Is there a camera installed there? Switch off all the lights…If you spot a camera, fire on it…see, they should not know at any cost how many of you are in the hotel, what condition you are in, where you are, things like that… these will compromise your security and also our operation […]

Terrorist: I don’t know how it happened…I can’t see a camera anywhere.

A subsequent phone conversation reveals that “the terrorists group used the web search engine to increase their decision making quality by employing the search engine as a complement to live TV which does not provide detailed information of specific hostages. For instance, to make a decision if they need to kill a hostage who was residing in the Taj hotel, a field attacker reported the identity of a hostage to the remote controller, and a remote controller used a search engine to obtain the detailed information about him.”

Terrorist: He is saying his full name is K.R.Ramamoorthy.

Handler: K.R. Ramamoorthy. Who is he? … A designer … A professor … Yes, yes, I got it …[The caller was doing an internet search on the name, and a results showed up a picture of Ramamoorthy] … Okay, is he wearing glasses? [The caller wanted to match the image on his computer with the man before the terrorists.]

Terrorist: He is not wearing glasses. Hey, … where are your glasses?

Handler: … Is he bald from the front?

Terrorist: Yes, he is bald from the front …

The terrorist group had three specific political agendas: “(1) an anti-India agenda, (2) an anti-Israel and anti-Jewish agenda, and (3) an anti-US and anti-Nato agenda.” A content analysis of 900+ tweets posted during the attacks reveals whether said tweets may have provided situational awareness information in support of these three political goals. The results: 18% of tweets contained “situational information which can be helpful for Mumbai terrorist groups to make an operational decision of achieving their Anti-India political agenda. Also, 11.34% and 4.6% of posts contained operationally sensitive information which may help terrorist groups to make an operational decision of achieving their political goals of Anti-Israel/Anti-Jewish and Anti-US/Anti-Nato respectively.”

In addition, the content analysis found that “Twitter site played a significant role in relaying situational information to the mainstream media, which was monitored by Mumbai terrorists. Therefore, we conclude that the Mumbai Twitter page indirectly contributed to enhancing the situational awareness level of Mumbai terrorists, although we cannot exclude the possibility of its direct contribution as well.”

In conclusion, the study stresses the importance of analyzing a terrorist group’s political goals in order to develop an appropriate information control strategy. “Because terrorists’ political goals function as interpretative filters to process situational information, understanding of adversaries’ political goals may reduce costs for security operation teams to monitor and decide which tweets need to be controlled.”


Update: Twitter Dashboard for Disaster Response

Project name: Artificial Intelligence for Disaster Response (AIDR)

My Crisis Computing Team and I at QCRI have been working hard on the Twitter Dashboard for Disaster Response. We first announced the project on iRevolution last year. The experimental research we’ve carried out since has been particularly insightful vis-a-vis the opportunities and challenges of building such a Dashboard. We’re now using the findings from our empirical research to inform the next phase of the project—namely building the prototype for our humanitarian colleagues to experiment with so we can iterate and improve the platform as we move forward.


Manually processing disaster tweets is becoming increasingly difficult and unrealistic. Over 20 million tweets were posted during Hurricane Sandy, for example. This is the main problem that our Twitter Dashboard aims to solve. There are two ways to manage this challenge of Big (Crisis) Data: Advanced Computing and Human Computation. The former entails the use of machine learning algorithms to automatically tag tweets while the latter involves the use of microtasking, which I often refer to as Smart Crowdsourcing. Our Twitter Dashboard seeks to combine the best of both methodologies.

On the Advanced Computing side, we’ve developed a number of classifiers that automatically identify tweets that:

  • Contain informative content (in contrast to personal messages or information unhelpful for disaster response);
  • Are posted by eye-witnesses (as opposed to 2nd-hand reporting);
  • Include pictures, video footage, mentions from TV/radio;
  • Report casualties and infrastructure damage;
  • Relate to people missing, seen and/or found;
  • Communicate caution and advice;
  • Call for help and important needs;
  • Offer help and support.

These classifiers are developed using state-of-the-art machine learning techniques. This simply means that we take a Twitter dataset of a disaster, say Hurricane Sandy, and develop clear definitions for “Informative Content,” “Eye-witness accounts,” etc. We use this classification system to tag a random sample of tweets from the dataset (usually 100+ tweets). We then “teach” algorithms to find these different topics in the rest of the dataset. We tweak said algorithms to make them as accurate as possible; much like teaching a dog new tricks like go-fetch (wink).
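To make the tag-then-teach idea concrete, here is a minimal sketch in Python. It is not the code behind our classifiers: the naive Bayes model, the two labels and the six sample tweets are all assumptions chosen for illustration, and a real training sample would hold 100+ hand-tagged tweets per disaster.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of tagged tweets
        self.vocabulary = set()

    def train(self, tagged_tweets):
        """Learn word frequencies from (text, label) pairs tagged by hand."""
        for text, label in tagged_tweets:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def tag(self, text):
        """Return the label with the highest log-probability for this tweet."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-tagged sample; the tweets and labels are made up.
sample = [
    ("Bridge collapsed on 5th street, power lines down", "damage"),
    ("Roof torn off the school, flooding in the basement", "damage"),
    ("Subway tunnels flooded and several houses destroyed", "damage"),
    ("We have spare blankets and food, can deliver tonight", "offer_help"),
    ("Offering a free room for anyone displaced by the storm", "offer_help"),
    ("Volunteers available with a truck to deliver supplies", "offer_help"),
]

tagger = NaiveBayesTagger()
tagger.train(sample)

# Tag tweets the model has not seen before.
print(tagger.tag("Houses flooded and power lines down near the bridge"))
print(tagger.tag("We can deliver free food and blankets"))
```

The point of the sketch is the workflow rather than the algorithm: define the categories, tag a sample by hand, then let the model tag the rest.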


We’ve found from this research that the classifiers are quite accurate but sensitive to the type of disaster being analyzed and also the country in which said disaster occurs. For example, a set of classifiers developed from tweets posted during Hurricane Sandy tends to be less accurate when applied to tweets posted during New Zealand’s earthquake. Each classifier is developed based on tweets posted during a specific disaster. In other words, while the classifiers can be highly accurate (i.e., tweets are correctly tagged as being damage-related, for example), they only tend to be accurate for the type of disaster they’ve been trained for, e.g., weather-related disasters (tornadoes), earth-related (earthquakes) and water-related (floods).

So we’ve been busy trying to collect as many Twitter datasets of different disasters as possible, which has been particularly challenging and seriously time-consuming given Twitter’s highly restrictive Terms of Service, which prevents the direct sharing of Twitter datasets—even for humanitarian purposes. This means we’ve had to spend a considerable amount of time re-creating Twitter datasets for past disasters; datasets that other research groups and academics have already crawled and collected. Thank you, Twitter. Clearly, we can’t collect every single tweet for every disaster that has occurred over the past five years or we’ll never get to actually developing the Dashboard.

That said, some of the most interesting Twitter disaster datasets are of recent (and indeed future) disasters. Truth be told, tweets were still largely US-centric before 2010. But the international coverage has since increased, along with the number of new Twitter users, which almost doubled in 2012 alone (more neat stats here). This in part explains why more and more Twitter users actively tweet during disasters. There is also a demonstration effect. That is, the international media coverage of social media use during Hurricane Sandy, for example, is likely to prompt citizens in other countries to replicate this kind of pro-active social media use when disaster knocks on their doors.

So where does this leave us vis-a-vis the Twitter Dashboard for Disaster Response? Simply that a hybrid approach is necessary (see TEDx talk above). That is, the Dashboard we’re developing will have a number of pre-developed classifiers based on as many datasets as we can get our hands on (categorized by disaster type). In addition to that, the dashboard will also allow users to create their own classifiers on the fly by leveraging human computation. They’ll also be able to microtask the creation of new classifiers.

In other words, what they’ll do is this:

  • Enter a search query on the dashboard, e.g., #Sandy.
  • Click on “Create Classifier” for #Sandy.
  • Create a label for the new classifier, e.g., “Animal Rescue”.
  • Tag 50+ #Sandy tweets that convey content about animal rescue.
  • Click “Run Animal Rescue Classifier” on new incoming tweets.
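The steps above can be sketched in a few lines of Python. This is a rough illustration, not the Dashboard's actual design: the `OnTheFlyClassifier` class, its perceptron update rule and the four tagged tweets are all hypothetical. The `teach` method shows one simple way a user correction could be folded back into the classifier when it tags a tweet wrongly.

```python
import re


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class OnTheFlyClassifier:
    """Binary perceptron: does a tweet belong to the user's label or not?"""

    def __init__(self, label):
        self.label = label
        self.weights = {}  # word -> weight
        self.bias = 0.0

    def score(self, text):
        words = set(tokenize(text))
        return self.bias + sum(self.weights.get(w, 0.0) for w in words)

    def matches(self, text):
        return self.score(text) > 0

    def teach(self, text, is_match):
        """Adjust the weights only when the current prediction is wrong."""
        if self.matches(text) == is_match:
            return
        step = 1.0 if is_match else -1.0
        for word in set(tokenize(text)):
            self.weights[word] = self.weights.get(word, 0.0) + step
        self.bias += step

    def fit(self, examples, passes=10):
        """Run several passes over the hand-tagged examples."""
        for _ in range(passes):
            for text, is_match in examples:
                self.teach(text, is_match)


# Hypothetical tweets tagged by the user for the "Animal Rescue" label
# (a real session would use 50+ tagged tweets).
tagged = [
    ("Stray dogs rescued from flooded shelter #Sandy", True),
    ("Volunteers needed to rescue cats trapped in basement #Sandy", True),
    ("Power still out across lower Manhattan #Sandy", False),
    ("Subway service suspended until further notice #Sandy", False),
]

clf = OnTheFlyClassifier("Animal Rescue")
clf.fit(tagged)

# A new tweet about rescuing people rather than animals gets tagged wrongly.
incoming = "Firefighters rescue family from flooded home #Sandy"
before = clf.matches(incoming)

# The user flags the mistake and the classifier updates itself.
clf.teach(incoming, False)
after = clf.matches(incoming)

print(before, after)
```

A perceptron is used here only because its update rule fits in a few lines; any model that supports incremental updates would serve the same purpose.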

The new classifier will then automatically tag incoming tweets. Of course, the classifier won’t get it completely right. But the beauty here is that the user can “teach” the classifier not to make the same mistakes, which means the classifier continues to learn and improve over time. On the geo-location side of things, it is indeed true that only ~3% of all tweets are geotagged by users. But this figure can be boosted to 30% using full-text geo-coding (as was done in the TwitterBeat project). Some believe this figure can be more than doubled (towards 75%) by applying Google Translate to the full-text geo-coding. The remaining users can be queried via Twitter for their location and that of the events they are reporting.
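For readers wondering what full-text geo-coding looks like in practice, here is a minimal sketch. It assumes a tiny hand-made gazetteer with approximate coordinates and a simplified tweet record (a dict with `text` and `coordinates` keys); neither reflects the Twitter API or the approach used in the TwitterBeat project.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (lat, lon).
GAZETTEER = {
    "new york": (40.71, -74.01),
    "manila": (14.60, 120.98),
    "christchurch": (-43.53, 172.64),
    "mumbai": (19.08, 72.88),
}


def geocode(tweet):
    """Return (lat, lon, source) for a tweet, or None if no location is found.

    A user-supplied geotag is preferred; otherwise the tweet text is
    searched for any place name listed in the gazetteer.
    """
    if tweet.get("coordinates"):
        lat, lon = tweet["coordinates"]
        return (lat, lon, "geotag")
    text = tweet["text"].lower()
    for place, (lat, lon) in GAZETTEER.items():
        if re.search(r"\b" + re.escape(place) + r"\b", text):
            return (lat, lon, "text")
    return None


# Made-up tweets: one geotagged, two that mention a place, two with no location.
tweets = [
    {"text": "Water rising fast on our street", "coordinates": (40.58, -73.95)},
    {"text": "Major flooding reported in Manila, roads closed", "coordinates": None},
    {"text": "Aftershock felt across Christchurch just now", "coordinates": None},
    {"text": "Stay safe everyone", "coordinates": None},
    {"text": "Thinking of all those affected tonight", "coordinates": None},
]

located = [geocode(t) for t in tweets]
coverage = sum(loc is not None for loc in located) / len(tweets)

for tweet, loc in zip(tweets, located):
    print(loc, "<-", tweet["text"])
print(f"Share of tweets with a location: {coverage:.0%}")
```

A real gazetteer would need to handle ambiguous place names, misspellings and other languages, which is where translation could come into play.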

So that’s where we’re at with the project. Ultimately, we envision these classifiers to be like individual apps that can be used/created, dragged and dropped on an intuitive widget-like dashboard with various data visualization options. As noted in my previous post, everything we’re building will be freely accessible and open source. And of course we hope to include classifiers for other languages beyond English, such as Arabic, Spanish and French. Again, however, this is purely experimental research for the time being; we want to be crystal clear about this in order to manage expectations. There is still much work to be done.

In the meantime, please feel free to get in touch if you have disaster datasets you can contribute to these efforts (we promise not to tell Twitter). If you’ve developed classifiers that you think could be used for disaster response and you’re willing to share them, please also get in touch. If you’d like to join this project and have the required skill sets, then get in touch, we may be able to hire you! Finally, if you’re an interested end-user or want to share some thoughts and suggestions as we embark on this next phase of the project, please do also get in touch. Thank you!


Big Data for Development: From Information to Knowledge Societies?

Unlike analog information, “digital information inherently leaves a trace that can be analyzed (in real-time or later on).” But the “crux of the ‘Big Data’ paradigm is actually not the increasingly large amount of data itself, but its analysis for intelligent decision-making (in this sense, the term ‘Big Data Analysis’ would actually be more fitting than the term ‘Big Data’ by itself).” Martin Hilbert describes this as the “natural next step in the evolution from the ‘Information Age’ & ‘Information Societies’ to ‘Knowledge Societies’ [...].”

Hilbert has just published this study on the prospects of Big Data for international development. “From a macro-perspective, it is expected that Big Data informed decision-making will have a similar positive effect on efficiency and productivity as ICT have had during the recent decade.” Hilbert references a 2011 study that concluded the following: “firms that adopted Big Data Analysis have output and productivity that is 5–6% higher than what would be expected given their other investments and information technology usage.” Can these efficiency gains be brought to the unruly world of international development?

To answer this question, Hilbert introduces the above conceptual framework to “systematically review literature and empirical evidence related to the pre-requisites, opportunities and threats of Big Data Analysis for international development.” Words, Locations, Nature and Behavior are types of data that are becoming increasingly available in large volumes.

“Analyzing comments, searches or online posts [i.e., Words] can produce nearly the same results for statistical inference as household surveys and polls.” For example, “the simple number of Google searches for the word ‘unemployment’ in the U.S. correlates very closely with actual unemployment data from the Bureau of Labor Statistics.” Hilbert argues that the tremendous volume of free textual data makes “the work and time-intensive need for statistical sampling seem almost obsolete.” But while the “large amount of data makes the sampling error irrelevant, this does not automatically make the sample representative.” 
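The distinction between sampling error and representativeness can be illustrated with a small simulation. All numbers below are assumptions made up for the example: a population in which 30% of people are online, and a trait that is three times more common among those who are online than among those who are not.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population of (online, has_trait) pairs.
POPULATION_SIZE = 100_000
population = []
for _ in range(POPULATION_SIZE):
    online = random.random() < 0.30
    has_trait = random.random() < (0.60 if online else 0.20)
    population.append((online, has_trait))


def share(flags):
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


# The true rate across the whole population (roughly 0.32 by construction).
true_rate = share(trait for _, trait in population)

# "Big Data" sample: everyone who leaves a digital trace, i.e. is online.
big_sample = [trait for online, trait in population if online]
big_estimate = share(big_sample)

# Traditional survey: a small random sample of the whole population.
survey = random.sample(population, 1000)
survey_estimate = share(trait for _, trait in survey)

print(f"true rate:          {true_rate:.3f}")
print(f"big-data estimate:  {big_estimate:.3f} (n={len(big_sample)})")
print(f"survey estimate:    {survey_estimate:.3f} (n={len(survey)})")
```

The big sample is far larger, so its sampling error is tiny, yet it lands far from the true rate because only people who are online are in it. The small random survey is noisier but centered on the right answer.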

The increasing availability of Location data (via GPS-enabled mobile phones or RFIDs) needs no further explanation. Nature refers to data on natural processes such as temperature and rainfall. Behavior denotes activities that can be captured through digital means, such as user-behavior in multiplayer online games or economic affairs, for example. But “studying digital traces might not automatically give us insights into offline dynamics. Besides these biases in the source, the data-cleaning process of unstructured Big Data frequently introduces additional subjectivity.”

The availability and analysis of Big Data is obviously limited in areas with scant access to tangible hardware infrastructure. This corresponds to the “Infrastructure” variable in Hilbert’s framework. “Generic Services” refers to the production, adoption and adaptation of software products, since these are a “key ingredient for a thriving Big Data environment.” In addition, the exploitation of Big Data also requires “data-savvy managers and analysts and deep analytical talent, as well as capabilities in machine learning and computer science.” This corresponds to “Capacities and Knowledge Skills” in the framework.

The third and final side of the framework represents the types of policies that are necessary to actualize the potential of Big Data for international development. These policies are divided into those that elicit Positive Feedback Loops, such as financial incentives, and those that create regulations, such as interoperability, that is, Negative Feedback Loops.

The added value of Big Data Analytics is also dependent on the availability of publicly accessible data, i.e., Open Data. Hilbert estimates that a quarter of US government data could be used for Big Data Analysis if it were made available to the public. There is a clear return on investment in opening up this data. On average, governments with “more than 500 publicly available databases on their open data online portals have 2.5 times the per capita income, and 1.5 times more perceived transparency than their counterparts with less than 500 public databases.” The direction of “causality” here is questionable, however.

Hilbert concludes with a warning. The Big Data paradigm “inevitably creates a new dimension of the digital divide: a divide in the capacity to place the analytic treatment of data at the forefront of informed decision-making. This divide does not only refer to the availability of information, but to intelligent decision-making and therefore to a divide in (data-based) knowledge.” While the advent of Big Data Analysis is certainly not a panacea, “in a world where we desperately need further insights into development dynamics, Big Data Analysis can be an important tool to contribute to our understanding of and improve our contributions to manifold development challenges.”

I am troubled by the study’s assumption that we live in a Newtonian world of decision-making in which for every action there is an automatic equal and opposite reaction. The fact of the matter is that the vast majority of development policies and decisions are not based on empirical evidence. Indeed, rigorous evidence-based policy-making and interventions are still very much the exception rather than the rule in international development. Why? “Accountability is often the unhappy byproduct rather than desirable outcome of innovative analytics. Greater accountability makes people nervous” (Harvard 2013). Moreover, response is always political. But Big Data Analysis runs the risk of de-politicizing a problem. As Alex de Waal noted over 15 years ago, “one universal tendency stands out: technical solutions are promoted at the expense of political ones.” I hinted at this concern when I first blogged about the UN Global Pulse back in 2009.

In sum, James Scott (one of my heroes) puts it best in his latest book:

“Applying scientific laws and quantitative measurement to most social problems would, modernists believed, eliminate the sterile debates once the ‘facts’ were known. [...] There are, on this account, facts (usually numerical) that require no interpretation. Reliance on such facts should reduce the destructive play of narratives, sentiment, prejudices, habits, hyperbole and emotion generally in public life. [...] Both the passions and the interests would be replaced by neutral, technical judgment. [...] This aspiration was seen as a new ‘civilizing project.’ The reformist, cerebral Progressives in early twentieth-century America and, oddly enough, Lenin as well believed that objective scientific knowledge would allow the ‘administration of things’ to largely replace politics. Their gospel of efficiency, technical training and engineering solutions implied a world directed by a trained, rational, and professional managerial elite. [...].”

“Beneath this appearance, of course, cost-benefit analysis is deeply political. Its politics are buried deep in the techniques [...] how to measure it, in what scale to use, [...] in how observations are translated into numerical values, and in how these numerical values are used in decision making. While fending off charges of bias or favoritism, such techniques [...] succeed brilliantly in entrenching a political agenda at the level of procedures and conventions of calculation that is doubly opaque and inaccessible. [...] Charged with bias, the official can claim, with some truth, that ‘I am just cranking the handle’ of a nonpolitical decision-making machine.”

See also:

  • Big Data for Development: Challenges and Opportunities [Link]
  • Beware the Big Errors of Big Data (by Nassim Taleb) [Link]
  • How to Build Resilience Through Big Data [Link]