
Social Media as Passive Polling: Prospects for Development & Disaster Response

My Harvard/MIT colleague Todd Mostak wrote his award-winning Master’s Thesis on “Social Media as Passive Polling: Using Twitter and Online Forums to Map Islamism in Egypt.” For this research, Todd evaluated the “potential of Twitter as a source of time-stamped, geocoded public opinion data in the context of the recent popular uprisings in the Middle East.” More specifically, “he explored three ways of measuring a Twitter user’s degree of political Islamism.” Why? Because he wanted to test the long-standing debate on whether Islamism is associated with poverty.


So Todd collected millions of geo-tagged tweets from Egypt over a six-month period, which he then aggregated by census district in order to regress proxies for poverty against measures of Islamism derived from the tweets and the users’ social graphs. His findings reveal that “Islamist sentiment seems to be positively correlated with male unemployment, illiteracy, and percentage of land used in agriculture and negatively correlated with percentage of men in their youth aged 15-25. Note that female variables for unemployment and age were statistically insignificant.” As with all research, there are caveats, such as the weighting scale used for the variables and questions over the reliability of census variables.
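The district-level regression at the heart of this approach can be sketched in a few lines of plain Python. This is a purely illustrative single-predictor OLS fit with made-up toy numbers, not Todd’s actual model, data, or variable weighting:

```python
def ols_fit(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept).
    x: a poverty proxy per census district (e.g., male unemployment rate);
    y: the Islamism score derived from tweets for the same districts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return slope, mean_y - slope * mean_x

# Toy data for five districts: a positive slope would indicate that the
# proxy is positively correlated with Islamist sentiment.
unemployment = [0.10, 0.15, 0.20, 0.25, 0.30]
islamism = [0.30, 0.38, 0.46, 0.54, 0.62]
slope, intercept = ols_fit(unemployment, islamism)
```

The real analysis would of course use multiple regression across several proxies at once; this sketch only shows the basic shape of the exercise.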


To carry out his graduate research, Todd built a web-enabled database (MapD) powered by Graphics Processing Units (GPUs) to perform real-time querying and visualization of big datasets. He is now working with Harvard’s Center for Geographic Analysis (CGA) to make this available via a public web interface called Tweetmap. This Big Data streaming and exploration tool presently displays 119 million tweets from 12/10/2012 to 12/31/2012. He is adding 6-7 million new georeferenced tweets per day (but these are not yet publicly available on Tweetmap). According to Todd, the time delay from live tweet to display on the map is about 1 second. Thanks to this GPU-powered approach, he expects that billions of tweets could be displayed in real-time.
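At its core, the kind of query Tweetmap answers is “all tweets in this time window inside this map extent.” Here is a minimal, purely illustrative filter in Python; the real MapD executes this kind of predicate on the GPU over hundreds of millions of rows, and its actual API is not shown here:

```python
from datetime import datetime

def filter_tweets(tweets, start, end, bbox):
    """Keep tweets whose timestamp falls in [start, end] and whose
    (lat, lon) falls inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return [t for t in tweets
            if start <= t["time"] <= end
            and min_lat <= t["lat"] <= max_lat
            and min_lon <= t["lon"] <= max_lon]

# Hypothetical sample records, roughly matching Tweetmap's date range.
tweets = [
    {"time": datetime(2012, 12, 15), "lat": 30.0, "lon": 31.2, "text": "Cairo"},
    {"time": datetime(2012, 12, 20), "lat": 42.4, "lon": -71.1, "text": "Boston"},
]
egypt = filter_tweets(tweets, datetime(2012, 12, 10), datetime(2012, 12, 31),
                      (22.0, 25.0, 32.0, 36.0))
```

The point of the GPU approach is that this same filter, run naively row by row, does not scale to billions of tweets at interactive speeds.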


As always with impressive projects, no single person was behind the entire effort. Ben Lewis, who heads the WorldMap initiative at CGA, deserves a lot of credit for making Tweetmap a reality. Todd collaborated with him directly throughout this project and benefited extensively from his expertise. Matt Bertrand (lead developer for CGA) did the WorldMap-side integration of MapD to create the TweetMap interface.

Todd and I recently spoke about integrating his outstanding work on automated live mapping into QCRI’s Twitter Dashboard for Disaster Response. Exciting times. In the meantime, Todd has kindly shared his dataset of 700+ million geotagged tweets for my team and me to analyze. The reason I’m excited about this approach is best explained with this heatmap of the recent snowstorm in the northeastern US. Todd is already using Tweetmap for live crisis mapping. While this system filters by keyword, our Dashboard will use machine learning to provide more specific streams of relevant tweets, some of which could be automatically mapped on Tweetmap. See Todd’s Flickr page for more Tweetmap visuals.


I’m also excited by Todd’s GPU-powered approach for a project I’m exploring with UN and World Bank colleagues. The purpose of that research project is to determine whether socio-economic trends such as poverty and unemployment can be captured via Twitter. Our first case study is Egypt. Depending on the results, we may be able to take it one step further by applying sentiment analysis to real-time, georeferenced tweets to visualize Twitter users’ perception vis-a-vis government services—a point of interest for my UN colleagues in Cairo.


Some Thoughts on Real-Time Awareness for Tech@State

I’ve been invited to present at Tech@State in Washington DC to share some thoughts on the future of real-time awareness. So I thought I’d use my blog to brainstorm and invite feedback from iRevolution readers. The organizers of the event have shared the following questions with me as a way to guide the conversation: Where is all of this headed? What will social media look like in five to ten years, and what will we do with all of the data? Knowing that the data stream can only increase in size, what can we do now to prepare and prevent being overwhelmed by the sheer volume of data?

These are big, open-ended questions, and I will only have 5 minutes to share some preliminary thoughts. I shall thus focus on how time-critical crowdsourcing can yield real-time awareness and expand from there.

Two years ago, my good friend and colleague Riley Crane won DARPA’s $40,000 Red Balloon Competition. His team at MIT found the locations of 10 weather balloons hidden across the continental US in under 9 hours. The US covers more than 3.7 million square miles and the balloons were barely 8 feet wide. This was truly a needle-in-the-haystack kind of challenge. So how did they do it? They used crowdsourcing and leveraged social media—Twitter in particular—via a “recursive incentive mechanism” to recruit thousands of volunteers to the cause. This mechanism rewarded individual participants financially based on how important their contributions were to locating one or more balloons. The result? Real-time, networked awareness.

Around the same time that Riley and his team celebrated their victory at MIT, another novel crowdsourcing initiative was taking place just a few miles away at The Fletcher School. Hundreds of students were busy combing through social and mainstream media channels for actionable and mappable information on Haiti following the devastating earthquake that had struck Port-au-Prince. This content was then mapped on the Ushahidi-Haiti Crisis Map, providing real-time situational awareness to first responders like the US Coast Guard and US Marine Corps. At the same time, hundreds of volunteers from the Haitian Diaspora were busy translating and geo-coding tens of thousands of text messages from disaster-affected communities in Haiti who were texting in their location & most urgent needs to a dedicated SMS short code. Fletcher School students filtered and mapped the most urgent and actionable of these text messages as well.

One year after Haiti, the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) asked the Standby Volunteer Task Force (SBTF), a global network of 700+ volunteers, for a real-time map of crowdsourced social media information on Libya in order to improve their own situational awareness. Thus was born the Libya Crisis Map.

The result? The Head of OCHA’s Information Services Section at the time sent an email to SBTF volunteers to commend them for their novel efforts. In this email, he wrote:

“Your efforts at tackling a difficult problem have definitely reduced the information overload; sorting through the multitude of signals on the crisis is no easy task. The Task Force has given us an output that is manageable and digestible, which in turn contributes to better situational awareness and decision making.”

These three examples from the US, Haiti and Libya demonstrate what is already possible with time-critical crowdsourcing and social media. So where is all this headed? You may have noted from each of these examples that their success relied on the individual actions of hundreds and sometimes thousands of volunteers. This is primarily because automated solutions to filter and curate the data stream are not yet available (or rather accessible) to the wider public. Indeed, these solutions tend to be proprietary, expensive and/or classified. I thus expect to see free and open source solutions crop up in the near future; solutions that will radically democratize the tools needed to gain shared, real-time awareness.

But automated natural language processing (NLP) and machine learning alone are not likely to succeed, in my opinion. The data stream is actually not a stream; it is a massive torrent of non-indexed information, a 24-hour global firehose of real-time, distributed multimedia data that continues to outpace our ability to produce actionable intelligence from this torrential downpour of 0’s and 1’s. To turn this data tsunami into real-time shared awareness will require that our filtering and curation platforms become more automated and collaborative. I believe the key is thus to combine automated solutions with real-time collaborative crowdsourcing tools—that is, platforms that enable crowds to collaboratively filter and curate real-time information, in real-time.

Right now, when we comb through Twitter, for example, we do so on our own, sitting behind our laptop, isolated from others who may be seeking to filter the exact same type of content. We need to develop free and open source platforms that allow for the distributed-but-networked, crowdsourced filtering and curation of information in order to democratize the sense-making of the firehose. Only then will the wider public be able to win the equivalent of Red Balloon competitions without needing $40,000 or a degree from MIT.

I’d love to get feedback from readers about what other compelling cases or arguments I should bring up in my presentation tomorrow. So feel free to post some suggestions in the comments section below. Thank you!

Time-Critical Crowdsourcing for Social Mobilization and Crowd-Solving

My good friend Riley Crane just co-authored a very interesting study entitled “Time-Critical Social Mobilization” in the peer-reviewed journal Science. Riley spearheaded the team at MIT that won the DARPA Red Balloon competition last year. His team found the locations of all 10 weather balloons hidden around the continental US in under 9 hours. While we were already discussing alternative approaches to crowdsourcing for social impact before the competition, the approach he designed to win it certainly gave us a whole lot more to talk about, given the work I’d been doing on crowdsourcing crisis information and near real-time crisis mapping.

Crowd-solving non-trivial problems in quasi real-time poses important challenges. A very large number of participants is typically required, coupled with extremely fast execution. Another common challenge is the need for some sort of search process. “For example, search may be conducted by members of the mobilized community for survivors after a natural disaster.” Recruiting large numbers of participants, however, requires that individuals be motivated to actually conduct the search and participate in the information diffusion. Clearly, “providing appropriate incentives is a key challenge in social mobilization.”

This explains the rationale behind DARPA’s decision to launch its Red Balloon Challenge: “to explore the roles the Internet and social networking play in the timely communication, wide-area team-building, and urgent mobilization required to solve broad-scope, time-critical problems.” So 10 red weather balloons were discreetly placed at different locations in the continental US. A senior analyst at the National Geospatial-Intelligence Agency is said to have characterized the challenge as impossible for conventional intelligence-gathering methods. Riley’s team found all 10 balloons in 8 hours and 36 minutes. How did they do it?

Some 36 hours before the start of the challenge, the team at MIT had already recruited over 4,000 participants using a “recursive incentive mechanism.” They used the $40,000 prize money that would be awarded to the winners of the challenge as a “financial incentive structure rewarding not only the people who correctly located the balloons but also those connecting the finder [back to the MIT team].” If Riley and colleagues won:

“we would allocate $4000 in prize money to each of the 10 balloons. We promised $2000 per balloon to the first person to send in the correct balloon coordinates. We promised $1000 to the person who invited that balloon finder onto the team, $500 to whoever invited the inviter, $250 to whoever invited that person, and so on. The underlying structure of the ‘recursive incentive’ was that whenever a person received prize money for any reason, the person who invited them would also receive money equal to half that awarded to their invitee.”

In other words, the reward offered by Team MIT “scales with the size of the entire recruitment tree (because larger trees are more likely to succeed), rather than depending solely on the immediate recruited friends.” What is stunning about Riley et al.’s approach is that their “attrition rate” was almost half that of other comparable social network experiments. That is, participants in the MIT recruitment tree were about twice as likely to “play the game,” so to speak, rather than give up. In addition, the number recruited by each individual followed a power-law distribution, which suggests a possible tipping-point dynamic.
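The payout rule itself is easy to state as code. Here is a small sketch of the halving scheme the Science paper describes, applied to one hypothetical recruitment chain. A nice property worth noticing: even an infinitely long chain pays out less than the $4,000 allocated per balloon, since 2000 + 1000 + 500 + 250 + … converges to 4000.

```python
def balloon_payouts(chain, finder_prize=2000.0):
    """chain lists people from the balloon finder up through the chain of
    inviters to the original recruiter. The finder gets finder_prize;
    each inviter receives half of what their invitee received."""
    payouts, amount = {}, finder_prize
    for person in chain:
        payouts[person] = amount
        amount /= 2
    return payouts

rewards = balloon_payouts(["finder", "inviter", "inviter's inviter", "root"])
# finder: 2000.0, inviter: 1000.0, inviter's inviter: 500.0, root: 250.0
```

The names in the chain are of course hypothetical; the real mechanism operated over recruitment trees of thousands of participants.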

In conclusion, the mechanism devised by the winning team “simultaneously provides incentives for participation and for recruiting more individuals to the cause.” So what insights does this study provide vis-a-vis live crisis mapping initiatives that are volunteer-based, like those spearheaded by the Standby Volunteer Task Force (SBTF) and the Humanitarian OpenStreetMap Team (HOT) communities? While these networks don’t have any funding to pay volunteers (this would go against the spirit of volunteerism in any case), I think a number of insights can nevertheless be drawn.

In the volunteer sector, the “currency of exchange” is credit: the knowledge and acknowledgment that I participated in the Libya Crisis Map to support the UN’s humanitarian operations, for example. I recently introduced SBTF “deployment badges” to serve, in part, as a public-acknowledgment incentive. SBTF volunteers can now add badges for deployments they were engaged in, e.g., “Sudan 2011,” “New Zealand 2011,” etc.

What about using a recursive credit mechanism? For example, it would be ideal if volunteers could find out how a given report they worked on was ultimately used by a humanitarian colleague monitoring a live map. Using the Red Balloon analogy, the person who finds the balloon should be able to reward all those in her “recruitment tree”, or in our case the “SBTF network”. Let’s say Helena works for the UN and used the Libya Crisis Map while in Tripoli. She finds an important report on the map and shares it with her colleagues on the Tunisian border, who decide to take some kind of action as a result. Now let’s say this report came from a tweet that Chrissy in the Media Monitoring Team found while volunteering on the deployment. She shared the tweet with Jess in the GPS Team, who found the coordinates for the location referred to in that tweet. Melissa then added this to the live map being monitored by the UN. Wouldn’t it be ideal if each could be sent an email letting them know about Helena’s response? I realize this isn’t trivial to implement, but what would have to be in place to make something like this actually happen? Any thoughts?
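One way to think about implementing this recursive-credit idea: store, for each mapped report, the chain of volunteers who handled it, then walk that chain whenever an end user signals that the report led to action. A purely hypothetical sketch—no SBTF tooling actually works this way—using the volunteers from the scenario above:

```python
def credit_notifications(handling_chain, action):
    """handling_chain lists volunteers in the order they touched a report
    (e.g., media monitor -> geolocator -> mapper). Returns one
    acknowledgment message per volunteer when the report triggers action."""
    return [f"{name}: a report you worked on led to: {action}"
            for name in handling_chain]

messages = credit_notifications(["Chrissy", "Jess", "Melissa"],
                                "UN follow-up on the Tunisian border")
```

In practice, the hard part is not this traversal but capturing the handling chain in the first place, and getting a feedback signal back from the map’s end users.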

On the recruitment side, we haven’t really done anything explicitly to incentivize current volunteers to recruit additional volunteers. Could we incentivize this beyond giving credit? Perhaps we could design a game-like point system? Or a fun ranking system with different titles assigned according to the number of volunteers recruited? Another thought would be to simply ask existing volunteers to recruit one or two additional volunteers every year. We currently have about 700 volunteers in the SBTF, so this might be one way to increase substantially in size.

I’m not sure what type of mechanism we could devise to simultaneously provide incentives for participation and recruitment. Perhaps those incentives already exist, in the sense that the SBTF responds to international crises, which may serve as a sufficient draw on its own. I’d love to hear what iRevolution readers think, especially if you have good ideas that we could realistically implement!