Category Archives: Crowdsourcing

Social Media for Disaster Response – Done Right!

To say that Indonesia’s capital is prone to flooding would be an understatement. Well over 40% of Jakarta is at or below sea level. Add to this a rapidly growing population of over 10 million and you have a recipe for recurring disasters. Increasing the resilience of the city’s residents to flooding is thus imperative. Resilience is the capacity of affected individuals to self-organize effectively, which requires timely decision-making based on accurate, actionable and real-time information. But Jakarta is also flooded with information during disasters. Indeed, the Indonesian capital is the world’s most active Twitter city.


So even if relevant, actionable information on rising flood levels could somehow be gleaned from millions of tweets in real-time, these reports could be inaccurate or completely false. Besides, only 3% of tweets on average are geo-located, which means any reliable evidence of flooding reported via Twitter is typically not actionable—that is, unless local residents and responders know where waters are rising, they can’t take tactical action in a timely manner. These major challenges explain why most discount the value of social media for disaster response.

But Digital Humanitarians in Jakarta aren’t your average Digital Humanitarians. These Digital Jedis recently launched one of the most promising humanitarian technology initiatives I’ve seen in years. Code named Peta Jakarta, the project takes social media and digital humanitarian action to the next level. Whenever someone posts a tweet with the word banjir (flood), they receive an automated tweet reply from @PetaJkt inviting them to confirm whether they see signs of flooding in their area: “Flooding? Enable geo-location, tweet @petajkt #banjir and check petajakarta.org.” The user can confirm their report by turning geo-location on and simply replying with the keyword banjir or flood. The result gets added to a live, public crisis map, like the one below.

Credit: Peta Jakarta

Over the course of the 2014/2015 monsoon season, Peta Jakarta automatically sent 89,000 tweets to citizens in Jakarta as a call to action to confirm flood conditions. These automated invitation tweets served to inform the user about the project and linked to the video below (via Twitter Cards) to provide simple instructions on how to submit a confirmed report with approximate flood levels. If a Twitter user forgets to turn on the geo-location feature of their smartphone, they receive an automated tweet reminding them to enable geo-location and resubmit their tweet. Finally, the platform “generates a thank you message confirming the receipt of the user’s report and directing them to PetaJakarta.org to see their contribution to the map.” Note that the “overall aim of sending programmatic messages is not to simply solicit a high volume of replies, but to reach active, committed citizen-users willing to participate in civic co-management by sharing nontrivial data that can benefit other users and government agencies in decision-making during disaster scenarios.”

A report is considered verified when a confirmed geo-tagged tweet includes a picture of the flooding, like in the tweet below. These confirmed and verified tweets get automatically mapped and also shared with Jakarta’s Emergency Management Agency (BPBD DKI Jakarta). The latter are directly involved in this initiative since they’re “regularly faced with the difficult challenge of anticipating & responding to floods hazards and related extreme weather events in Jakarta.” This direct partnership also serves to limit the “Data Rot Syndrome” where data is gathered but not utilized. Note that Peta Jakarta is able to carry out additional verification measures by manually assessing the validity of tweets and pictures by cross-checking other Twitter reports from the same district and also by monitoring “television and internet news sites, to follow coverage of flooded areas and cross-check reports.”
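The reporting workflow described above can be summarized as a small decision function. This is only a minimal sketch of the logic as the post describes it, not Peta Jakarta's actual code: the `Tweet` fields, the `mentions_petajkt` flag and the state names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

KEYWORDS = ("banjir", "flood")  # the two keywords named in the post


@dataclass
class Tweet:
    text: str
    mentions_petajkt: bool = False  # hypothetical flag: tweet is addressed to @petajkt
    coordinates: Optional[Tuple[float, float]] = None  # (lat, lon) if geo-location is on
    has_photo: bool = False


def classify_report(tweet: Tweet) -> str:
    """Return an illustrative report state for a single tweet."""
    if not any(keyword in tweet.text.lower() for keyword in KEYWORDS):
        return "ignored"
    if not tweet.mentions_petajkt:
        return "unconfirmed"        # would trigger the automated invitation
    if tweet.coordinates is None:
        return "needs_geolocation"  # would trigger the automated reminder
    if tweet.has_photo:
        return "verified"           # confirmed, geo-tagged and with a picture
    return "confirmed"
```

For example, `classify_report(Tweet("@petajkt #banjir", True, (-6.2, 106.8)))` returns `"confirmed"`, and the same tweet with `has_photo=True` returns `"verified"`.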


During the latest monsoon season, Peta Jakarta “received and mapped 1,119 confirmed reports of flooding. These reports were formed by 877 users, indicating an average tweet to user ratio of 1.27 tweets per user. A further 2,091 confirmed reports were received without the required geolocation metadata to be mapped, highlighting the value of the programmatic geo-location ‘reminders’ […]. With regard to unconfirmed reports, Peta Jakarta recorded and mapped a total of 25,584 over the course of the monsoon.”

The Live Crisis Maps could be viewed via two different interfaces depending on the end user. For local residents, the maps could be accessed via smartphone with the visual display designed specifically for more tactical decision-making, showing flood reports at the neighborhood level and only for the past hour.


For institutional partners, the data is visualized in more aggregate terms for strategic decision-making based on trend analysis and data integration. “When viewed on a desktop computer, the web-application scaled the map to show a situational overview of the city.”

Credit: Peta Jakarta

Peta Jakarta has “proven the value and utility of social media as a mega-city methodology for crowdsourcing relevant situational information to aid in decision-making and response coordination during extreme weather events.” The initiative enables “autonomous users to make independent decisions on safety and navigation in response to the flood in real-time, thereby helping increase the resilience of the city’s residents to flooding and its attendant difficulties.” In addition, by “providing decision support at the various spatial and temporal scales required by the different actors within city, Peta Jakarta offers an innovative and inexpensive method for the crowdsourcing of time-critical situational information in disaster scenarios.” The resulting confirmed and verified tweets were used by BPBD DKI Jakarta to “cross-validate formal reports of flooding from traditional data sources, supporting the creation of information for flood assessment, response, and management in real-time.”


My blog post is based on several conversations I had with the Peta Jakarta team and on this white paper, which was published just a week ago. The report runs close to 100 pages and should absolutely be considered required reading for all Digital Humanitarians and CrisisMappers. The paper includes several dozen insights which a short blog post simply cannot do justice to. If you can’t find the time to read the report, then please see the key excerpts below. In a future blog post, I’ll describe how the Peta Jakarta team plans to leverage UAVs to complement social media reporting.

  • Extracting knowledge from the “noise” of social media requires designed engagement and filtering processes to eliminate unwanted information, reward valuable reports, and display useful data in a manner that further enables users, governments, or other agencies to make non-trivial, actionable decisions in a time-critical manner.
  • While the utility of passively-mined social media data can offer insights for offline analytics and derivative studies for future planning scenarios, the critical issue for frontline emergency responders is the organization and coordination of actionable, real-time data related to disaster situations.
  • User anonymity in the reporting process was embedded within the Peta Jakarta project. Whilst the data produced by Twitter reports of flooding is in the public domain, the objective was not to create an archive of users who submitted potentially sensitive reports about flooding events, outside of the Twitter platform. Peta Jakarta was thus designed to anonymize reports collected by separating reports from their respective users. Furthermore, the text content of tweets is only stored when the report is confirmed, that is, when the user has opted to send a message to the @petajkt account to describe their situation. Similarly, when usernames are stored, they are encrypted using a one-way hash function.
  • In developing the Peta Jakarta brand as the public face of the project, it was important to ensure that the interface and map were presented as community-owned, rather than as a government product or academic research tool. Aiming to appeal to first adopters—the young, tech-savvy Twitter-public of Jakarta—the language used in all the outreach materials (Twitter replies, the outreach video, graphics, and print advertisements) was intentionally casual and concise. Because of the repeated recurrence of flood events during the monsoon, and the continuation of daily activities around and through these flood events, the messages were intentionally designed to be more like normal twitter chatter and less like public service announcements.
  • It was important to design the user interaction with PetaJakarta.org to create a user experience that highlighted the community resource element of the project (similar to the Waze traffic app), rather than an emergency or information service. With this aim in mind, the graphics and language are casual and light in tone. In the video, auto-replies, and print advertisements, PetaJakarta.org never used alarmist or moralizing language; instead, the graphic identity is one of casual, opt-in, community participation.
  • The most frequent question directed to @petajkt on Twitter was about how to activate the geo-location function for tweets. So far, this question has been addressed manually by sending a reply tweet with a graphic instruction describing how to activate geo-location functionality.
  • Critical to the success of the project was its official public launch with, and promotion by, the Governor. This endorsement gave the platform very high visibility and increased legitimacy among other government agencies and public users; it also produced a very successful media event, which led to substantial media coverage and subsequent public attention.

  • The aggregation of the tweets (designed to match the spatio-temporal structure of flood reporting in the system of the Jakarta Disaster Management Agency) was still inadequate when looking at social media because it could result in their overlooking reports that occurred in areas of especially low Twitter activity. Instead, the Agency used the @petajkt Twitter stream to direct their use of the map and to verify and cross-check information about flood-affected areas in real-time. While this use of social media was productive overall, the findings from the Joint Pilot Study have led to the proposal for the development of a more robust Risk Evaluation Matrix (REM) that would enable Peta Jakarta to serve a wider community of users & optimize the data collection process through an open API.
  • Developing a more robust integration of social media data also means leveraging other potential data sets to increase the intelligence produced by the system through hybridity; these other sources could include, but are not limited to, government, private sector, and NGO applications (‘apps’) for on-the-ground data collection, LIDAR or UAV-sourced elevation data, and fixed ground control points with various types of sensor data. The “citizen-as-sensor” paradigm for urban data collection will advance most effectively if other types of sensors and their attendant data sources are developed in concert with social media sourced information.
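The anonymization excerpt above mentions storing usernames only after passing them through a one-way hash function. The white paper excerpt does not say which function was used, so the following is a minimal sketch under my own assumptions: a keyed SHA-256 hash (HMAC) with a secret key, and a hypothetical helper name.

```python
import hashlib
import hmac


def pseudonymize_username(username: str, secret_key: bytes) -> str:
    """Map a Twitter username to a stable, non-reversible identifier.

    Assumption: HMAC-SHA256 with a secret key; the project may have used
    a different one-way function.
    """
    normalized = username.strip().lstrip("@").lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same user always maps to the same identifier, so repeat reports can still be counted per user, but the stored value cannot be turned back into a username. Using a secret key rather than a bare hash makes it harder to recover usernames by simply hashing a list of known accounts.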

Assessing Disaster Damage from 3D Point Clouds

Humanitarian and development organizations like the United Nations and the World Bank typically carry out disaster damage and needs assessments following major disasters. The ultimate goal of these assessments is to measure the impact of disasters on the society, economy and environment of the affected country or region. This includes assessing the damage caused to building infrastructure, for example. These assessment surveys are generally carried out in person—that is, on foot and/or by driving around an affected area. This is a very time-consuming process with very variable results in terms of data quality. Can 3D (Point Clouds) derived from very high resolution aerial imagery captured by UAVs accelerate and improve the post-disaster damage assessment process? Yes, but a number of challenges related to methods, data & software need to be overcome first. Solving these challenges will require pro-active cross-disciplinary collaboration.

The following three-tiered scale is often used to classify infrastructure damage: “1) Completely destroyed buildings or those beyond repair; 2) Partially destroyed buildings with a possibility of repair; and 3) Unaffected buildings or those with only minor damage. By locating on a map all dwellings and buildings affected in accordance with the categories noted above, it is easy to visualize the areas hardest hit and thus requiring priority attention from authorities in producing more detailed studies and defining demolition and debris removal requirements” (UN Handbook). As one World Bank colleague confirmed in a recent email, “From the engineering standpoint, there are many definitions of the damage scales, but from years of working with structural engineers, I think the consensus is now to use a three-tier scale – destroyed, heavily damaged, and others (non-visible damage).”

That said, field-based surveys of disaster damage typically overlook damage caused to roofs since on-the-ground surveyors are bound by the laws of gravity. Hence the importance of satellite imagery. At the same time, however, “The primary problem is the vertical perspective of [satellite imagery, which] largely limits the building information to the roofs. This roof information is well suited for the identification of extreme damage states, that is completely destroyed structures or, to a lesser extent, undamaged buildings. However, damage is a complex 3-dimensional phenomenon,” which means that “important damage indicators expressed on building façades, such as cracks or inclined walls, are largely missed, preventing an effective assessment of intermediate damage states” (Fernandez Galarreta et al. 2014).


This explains why “Oblique imagery [captured from UAVs] has been identified as more useful, though the multi-angle imagery also adds a new dimension of complexity” as we experienced first-hand during the World Bank’s UAV response to Cyclone Pam in Vanuatu (Ibid, 2014). Obtaining photogrammetric data for oblique images is particularly challenging. That is, identifying GPS coordinates for a given house pictured in an oblique photograph is virtually impossible to do automatically with the vast majority of UAV cameras. (Only specialist cameras using gimbal mounted systems can reportedly infer photogrammetric data in oblique aerial imagery, but even then it is unclear how accurate this inferred GPS data is). In any event, oblique data also “lead to challenges resulting from the multi-perspective nature of the data, such as how to create single damage scores when multiple façades are imaged” (Ibid, 2014).

To this end, my colleague Jorge Fernandez Galarreta and I are exploring the use of 3D (point clouds) to assess disaster damage. Multiple software solutions like Pix4D and PhotoScan can already be used to construct detailed point clouds from high-resolution 2D aerial imagery (nadir and oblique). “These exceed standard LiDAR point clouds in terms of detail, especially at façades, and provide a rich geometric environment that favors the identification of more subtle damage features, such as inclined walls, that otherwise would not be visible, and that in combination with detailed façade and roof imagery have not been studied yet” (Ibid, 2014).

Unlike oblique images, point clouds give surveyors a full 3D view of an urban area, allowing them to “fly through” and inspect each building up close and from all angles. One need no longer be physically onsite, nor limited to simply one façade or a strictly field-based view to determine whether a given building is partially damaged. But what does partially damaged even mean when this kind of high resolution 3D data becomes available? Take this recent note from a Bank colleague with 15+ years of experience in disaster damage assessments: “In the [Bank’s] official Post-Disaster Needs Assessment, the classification used is to say that if a building is 40% damaged, it needs to be repaired. In my view this is too vague a description and not much help. When we say 40%, is it the volume of the building we are talking about or the structural components?”


In their recent study, Fernandez Galarreta et al. used point clouds to generate per-building damage scores based on a 5-tiered classification scale (D1-D5). They chose to compute these damage scores based on the following features: “cracks, holes, intersection of cracks with load-carrying elements and dislocated tiles.” They also selected non-damage related features: “façade, window, column and intact roof.” Their results suggest that the visual assessment of point clouds is very useful to identify the following disaster damage features: total collapse, collapsed roof, rubble piles, inclined façades and more subtle damage signatures that are difficult to recognize in more traditional BDA [Building Damage Assessment] approaches. The authors were thus able to compute a per-building damage score, taking into account both “the overall structure of the building,” and the “aggregated information collected from each of the façades and roofs of the building to provide an individual per-building damage score.”

Fernandez Galarreta et al. also explore the possibility of automating this damage assessment process based on point clouds. Their conclusion: “More research is needed to extract automatically damage features from point clouds, combine those with spectral and pattern indicators of damage, and to couple this with engineering understanding of the significance of connected or occluded damage indicators for the overall structural integrity of a building.” That said, the authors note that this approach would “still suffer from the subjectivity that characterizes expert-based image analysis.”

Hence my interest in using crowdsourcing to analyze point clouds for disaster damage. Naturally, crowdsourcing alone will not eliminate subjectivity. In fact, having more people analyze point clouds may yield all kinds of disparate results. This explains why a detailed and customized imagery interpretation guide is necessary, like this one, which was just released by my colleagues at the Harvard Humanitarian Initiative (HHI). This also explains why crowdsourcing platforms require quality-control mechanisms. One easy technique is triangulation: have ten different volunteers look at each point cloud and tag features in said cloud that show cracks, holes, intersection of cracks with load-carrying elements and dislocated tiles. Surely more eyes are better than two for tasks that require a good eye for detail.


Next, identify which features have the most tags; this is the triangulation process. For example, if one area of a point cloud is tagged as a “crack” by 8 or more volunteers, chances are there really is a crack there. One can then count the total number of distinct areas tagged as cracks by 8 or more volunteers across the point cloud to calculate the total number of cracks per façade. Do the same with the other metrics (holes, dislocated tiles, etc.), and you can compute a per-building damage score based on overall consensus derived from hundreds of crowdsourced tags. Note that “tags” can also be lines or polygons, meaning that individual cracks could be traced by volunteers, thus providing information on the approximate length or size of a crack. This variable could also be factored into the overall per-building damage score.
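To make the triangulation step concrete, here is a minimal sketch of the vote counting. It assumes that tags have already been snapped to discrete areas of the point cloud (the `area_id` values are hypothetical), which glosses over the harder problem of deciding when two volunteers tagged the "same" spot in 3D. The function names are illustrative only.

```python
from collections import defaultdict


def consensus_features(tags, min_votes=8):
    """Keep only (area, label) pairs tagged by at least `min_votes` distinct volunteers.

    `tags` is an iterable of (volunteer_id, area_id, label) tuples.
    Returns {label: sorted list of area_ids that reached consensus}.
    """
    voters = defaultdict(set)
    for volunteer_id, area_id, label in tags:
        # A set means a volunteer tagging the same area twice counts once.
        voters[(area_id, label)].add(volunteer_id)

    agreed = defaultdict(list)
    for (area_id, label), who in voters.items():
        if len(who) >= min_votes:
            agreed[label].append(area_id)
    return {label: sorted(areas) for label, areas in agreed.items()}


def damage_counts(tags, min_votes=8):
    """Count consensus areas per damage feature, e.g. cracks per façade."""
    agreed = consensus_features(tags, min_votes)
    return {label: len(areas) for label, areas in agreed.items()}
```

The per-feature counts returned by `damage_counts` could then feed whatever weighting scheme is used to compute the per-building damage score; that weighting is deliberately left out here since the post does not specify one.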

In sum, crowdsourcing could potentially overcome some of the data quality issues that have already marked field-based damage assessment surveys. In addition, crowdsourcing could potentially speed up the data analysis since professional imagery and GIS analysts tend to already be hugely busy in the aftermath of major disasters. Adding more data to their plate won’t help anyone. Crowdsourcing the analysis of 3D point clouds may thus be our best bet.

So why hasn’t this all been done yet? For several reasons. For one, creating very high-resolution point clouds requires more pictures and thus more UAV flights, which can be time consuming. Second, processing aerial imagery to construct point clouds can also take some time. Third, handling, sharing and hosting point clouds can be challenging given how large those files quickly get. Fourth, no software platform currently exists to crowdsource the annotation of point clouds as described above (particularly when it comes to the automated quality control mechanisms that are necessary to ensure data quality). Fifth, we need more robust imagery interpretation guides. Sixth, groups like the UN and the World Bank are still largely thinking in 2D rather than 3D. And those few who are considering 3D tend to approach this from a data visualization angle rather than using human and machine computing to analyze 3D data. Seventh, this area, point cloud analysis for 3D feature detection, is still a very new area of research. Many of the methodology questions that need answers have yet to be answered, which is why my team and I at QCRI are starting to explore this area from the perspective of computer vision and machine learning.

The holy grail? Combining crowdsourcing with machine learning for real-time feature detection of disaster damage in 3D point clouds rendered in real-time via airborne UAVs surveying a disaster site. So what is it going to take to get there? Well, first of all, UAVs are becoming more sophisticated; they’re flying faster and for longer and will increasingly be working in swarms. (In addition, many of the new micro-UAVs come with a “follow me” function, which could enable the easy and rapid collection of aerial imagery during field assessments). So the first challenge described above is temporary as are the second and third challenges since computer processing power is increasing, not decreasing, over time.

This leaves us with the software challenge and imagery guides. I’m already collaborating with HHI on the latter. As for the former, I’ve spoken with a number of colleagues to explore possible software solutions to crowdsource the tagging of point clouds. One idea is simply to extend MicroMappers. Another is to add simple annotation features to PLAS.io and PointCloudViz since these platforms are already designed to visualize and interact with point clouds. A third option is to use a 3D model platform like SketchFab, which already enables annotations. (Many thanks to colleague Matthew Schroyer for pointing me to SketchFab last week). I’ve since had a long call with SketchFab and am excited by the prospects of using this platform for simple point cloud annotation.

In fact, Matthew already used SketchFab to annotate a 3D model of the Durbar Square neighborhood in downtown Kathmandu post-earthquake. He found an aerial video of the area, took multiple screenshots of this video, created a point cloud from these and then generated a 3D model which he annotated within SketchFab. This model, pictured below, would have been much higher resolution if he had the original footage or 2D images.

3D Model 1 Nepal

3D Model 2 Nepal

3D Model 3 Nepal

3D Model 4 Nepal

Here’s a short video with all the annotations in the 3D model:

And here’s the link to the “live” 3D model. And to drive home the point that this 3D model could be far higher resolution if the underlying imagery had been directly accessible to Matthew, check out this other SketchFab model below, which you can also access in full here.


The SketchFab team has kindly given me a SketchFab account that allows up to 50 annotations per 3D model. So I’ll be uploading a number of point clouds from Vanuatu (post Cyclone Pam) and Nepal (post earthquakes) to explore the usability of SketchFab for crowdsourced disaster damage assessments. In the meantime, one could simply tag-and-number all major features in a point cloud, create a Google Form, and ask digital volunteers to rate the level of damage near each numbered tag. Not a perfect solution, but one that works. Ultimately, we’d need users to annotate point clouds by tracing 3D polygons if we wanted an easier way to use the resulting data for automated machine learning purposes.

In any event, if readers do have any suggestions on other software platforms, methodologies, studies worth reading, etc., feel free to get in touch via the comments section below or by email, thank you. In the meantime, many thanks to colleagues Jorge, Matthew, Ferda & Ji (QCRI), Salvador (PointCloudViz), Howard (PLAS.io) and Corentin (SketchFab) for the time they’ve kindly spent brainstorming the above issues with me.

Can Massively Multiplayer Online Games also be Next Generation Humanitarian Technologies?


My colleague Peter Mosur and I launched the Internet Response League (IRL) at QCRI a while back to actively explore the intersection of massively multiplayer online games & humanitarian response. IRL is also featured in my new book, Digital Humanitarians, along with many other innovative ideas & technologies. Shortly after the book came out, Peter and I had the pleasure of exploring a collaboration with the team at Massive Multiplayer Online Science (MMOS) and CCP Games—makers of the popular game EVE Online.

MMOS is an awesome group that aims to enable online gamers to contribute to scientific research while playing video games. Our colleagues at MMOS kindly reached out to us earlier this year as they’re really interested in supporting humanitarian efforts as well. They are thus kindly bringing IRL on board to help them explore the use of online games for humanitarian projects.

CCP Games has already been mentioned on the IRL blog here. Their gamers managed to raise an impressive $190,890 for the Icelandic Red Cross in response to Typhoon Haiyan/Yolanda with their PLEX for Good initiative. This is on top of the $100,000 that the company has raised with the program for various disasters in Japan, Haiti, Pakistan, and the United States.

CCP Games’ flagship title EVE Online passed 500,000 subscribers in 2013. The game is unique among MMORPGs. Rather than having a player base spread across many different servers, EVE Online keeps all players on one large server. Named “Tranquility”, this one server currently averages 25,000 players at any given time, with peaks of over 38,000 [1]. This equates to an average of 600,000 hours of human time spent playing EVE Online every day! The potential good to come out of a humanitarian partnership would be immensely valuable to the world!

So we’re currently exploring with the team at MMOS possible ways to process humanitarian data within EVE’s gaming environment. We’ll write another post soon detailing the unique challenges we’re facing in terms of seamlessly processing digital humanitarian tasks within EVE Online. This will require a lot of creativity to pull off and success is by no means guaranteed (just like life and online games). In sum, our humanitarian tasks must in no way disrupt the EVE Online experience; they basically need to be “invisible” to the gamer (besides an initial opt-in).

See the video below for an in-depth overview of the type of work that MMOS and CCP Games envision incorporating into EVE Online. The video was screened at the EVE Online Fanfest last month and also features a message from the Internet Response League at the 40:36 minute mark!

This blog post was co-authored with Peter Mosur.

Artificial Intelligence for Monitoring Elections (AIME)


I published a blog post with the same title a good while back. Here’s what I wrote at the time:

Citizen-based, crowdsourced election observation initiatives are on the rise. Leading election monitoring organizations are also looking to leverage citizen-based reporting to complement their own professional election monitoring efforts. Meanwhile, the information revolution continues apace, with the number of new mobile phone subscriptions up by over 1 billion in just the past 36 months alone. The volume of election-related reports generated by “the crowd” is thus expected to grow significantly in the coming years. But international, national and local election monitoring organizations are completely unprepared to deal with the rise of Big (Election) Data.

I thus introduced a new project to “develop a free and open source platform to automatically filter relevant election reports from the crowd.” I’m pleased to report that my team and I at QCRI have just tested AIME during an actual election for the very first time—the 2015 Nigerian Elections. My QCRI Research Assistant Peter Mosur (co-author of this blog post) collaborated directly with Oludotun Babayemi from Clonehouse Nigeria and Chuks Ojidoh from the Community Life Project & Reclaim Naija to deploy and test the AIME platform.

AIME is a free and open source (experimental) solution that combines crowdsourcing with Artificial Intelligence to automatically identify tweets of interest during major elections. As organizations engaged in election monitoring well know, there can be a lot of chatter on social media as people rally behind their chosen candidates, announce this to the world, ask their friends and family who they will be voting for, and update others when they have voted while posting about election-related incidents they may have witnessed. This can make it rather challenging to find reports relevant to election monitoring groups.


Election monitors typically monitor instances of violence, election rigging, and voter issues. These incidents are monitored because they reveal problems that arise with the elections. Election monitoring initiatives such as Reclaim Naija & Uzabe also monitor several other types of incidents, but for the purposes of testing the AIME platform, we selected the three types of events mentioned above. In order to automatically identify tweets related to these events, one must first provide AIME with example tweets. (Of course, if there is no Twitter traffic to begin with, then there won’t be much need for AIME, which is precisely why we developed an SMS extension that can be used with AIME).

So where does the crowdsourcing come in? Users of AIME can ask the crowd to tag tweets related to election violence, rigging and voter issues by simply tagging tweets posted to the AIME platform with the appropriate event type. (Several quality control mechanisms are built in to ensure data quality. Also, one does not need to use crowdsourcing to tag the tweets; this can be done internally as well or instead). What AIME does next is use a technique from Artificial Intelligence (AI) called statistical machine learning to understand patterns in the human-tagged tweets. In other words, it begins to recognize which tweets belong in which category type: violence, rigging and voter issues. AIME will then auto-classify new tweets that are related to these categories (and can auto-classify around 2 million tweets or text messages per minute).
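To give a feel for what "learning from human-tagged tweets" means, here is a minimal sketch of a text classifier. This post does not say which learning algorithm AIME uses, so I am swapping in a simple multinomial Naive Bayes model with add-one smoothing purely for illustration; the class name, tokenizer and label names are all my own assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayesTagger:
    """Toy multinomial Naive Bayes classifier (illustrative, not AIME's method)."""

    def __init__(self):
        self.label_counts = Counter()            # number of training tweets per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, examples):
        """`examples` is an iterable of (tweet_text, label) pairs tagged by humans."""
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        """Return the label with the highest log-probability for this text."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero out a label.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

After calling `train` on a handful of tagged tweets, `classify` assigns each new tweet to the category whose vocabulary it most resembles. With only a few examples per category, such a model has very little to go on, which is worth keeping in mind when reading accuracy figures for sparsely tagged categories.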


Before creating our automatic classifier for the Nigerian Elections, we first needed to collect examples of tweets related to election violence, rigging and voter issues in order to teach AIME. Oludotun Babayemi and Chuks Ojidoh kindly provided the expert local knowledge needed to identify the keywords we should be following on Twitter (using AIME). They graciously gave us many different keywords to use as well as a list of trusted Twitter accounts to follow for election-related messages. (Due to difficulties with AIME, we were not able to use the trusted accounts. In addition, many of the suggested keywords were unusable since words like “aggressive”, “detonate”, and “security” would have resulted in a large number of false positives).

Here is the full list of keywords used by AIME:

Nigeria elections, nigeriadecides, Nigeria decides, INEC, GEJ, Change Nigeria, Nigeria Transformation, President Jonathan, Goodluck Jonathan, Sai Buhari, saibuhari, All progressives congress, Osibanjo, Sambo, Peoples Democratic Party, boko haram, boko, area boys, nigeria2015, votenotfight, GEJwinsit, iwillvoteapc, gmb2015, revoda, thingsmustchange, and march4buhari

Out of this list, “NigeriaDecides” was by far the most popular keyword used in the elections. It accounted for over 28,000 Tweets out of a batch of 100,000. During the week leading up to the elections, AIME collected roughly 800,000 Tweets. Over the course of the elections and the few days following, the total number of collected Tweets jumped to well over 4 million.

We sampled just a handful of these tweets and manually tagged those related to violence, rigging and other voting issues using AIME. “Violence” was described as “threats, riots, arming, attacks, rumors, lack of security, vandalism, etc.” while “Election Rigging” was described as “Ballot stuffing, issuing invalid ballot papers, voter impersonation, multiple voting, ballot boxes destroyed after counting, bribery, lack of transparency, tampered ballots etc.” Lastly, “Voting Issues” was defined as “Polling station logistics issues, technical issues, people unable to vote, media unable to enter, insufficient staff, lack of voter assistance, inadequate voting materials, underage voters, etc.”

Any tweet that did not fall into these three categories was tagged as “Other” or “Not Related”. Our Election Classifiers were trained with a total of 571 human-tagged tweets, which enabled AIME to automatically classify well over 1 million tweets (1,263,654 to be precise). The results in the screenshot below show how accurate AIME was at auto-classifying tweets based on the different event types defined earlier. AUC is what captures the “overall accuracy” of AIME’s classifiers.

AIME_Nigeria

AIME was rather good at correctly tagging tweets related to “Voting Issues” (98% accuracy) but very poor at tagging tweets related to “Election Rigging” (0%). This is not AIME’s fault : ) since it only had 8 examples to learn from. As for “Violence”, the accuracy score was 47%, which is actually surprising given that AIME only had 14 human-tagged examples to learn from. Lastly, AIME did fairly well at auto-classifying unrelated tweets (accuracy of 86%).
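For readers unfamiliar with AUC (area under the ROC curve): it can be read as the probability that a classifier gives a randomly chosen relevant tweet a higher score than a randomly chosen irrelevant one. Here is a minimal sketch of that pairwise definition. The scores and labels are invented for illustration and are not figures from the deployment, and the `auc` function is a hypothetical helper, not AIME's code.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented classifier scores for five tweets; label 1 means "related to the event".
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))
```

In this toy example, 5 of the 6 positive/negative pairs are ranked correctly. An AUC of 0.5 is what random guessing would give, and with only a handful of positive examples the estimate itself becomes very noisy.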

Conclusion: this was the first time we tested AIME during an actual election and we’ve learned a lot in the process. The results are not perfect but enough to press on and experiment further with the AIME platform. If you’d like to test AIME yourself (and if you fully recognize that the tool is experimental and still under development, hence not perfect), then feel free to get in touch with me here. We have 2 slots open for testing. In the meantime, big thanks to my RA Peter for spearheading both this deployment and the subsequent research.

Crowdsourcing Point Clouds for Disaster Response

Point Clouds, or 3D models derived from high resolution aerial imagery, are in fact nothing new. Several software platforms already exist to reconstruct a series of 2D aerial images into fully fledged 3D-fly-through models. Check out these very neat examples from my colleagues at Pix4D and SenseFly:

What do a castle, Jesus and a mountain have to do with humanitarian action? As noted in my previous blog post, there’s only so much disaster damage one can glean from nadir (that is, vertical) imagery and oblique imagery. Let’s suppose that the nadir image below was taken by an orbiting satellite or flying UAV right after an earthquake, for example. How can you possibly assess disaster damage from this one picture alone? Even if you had nadir imagery for these houses before the earthquake, your ability to assess structural damage would be limited.

Screen Shot 2015-04-09 at 5.48.23 AM

This explains why we also captured oblique imagery for the World Bank’s UAV response to Cyclone Pam in Vanuatu (more here on that humanitarian mission). But even with oblique photographs, you’re stuck with one fixed perspective. Who knows what these houses below look like from the other side; your UAV may have simply captured this side only. And even if you had pictures for all possible angles, you’d literally have hundreds of pictures to leaf through and make sense of.

Screen Shot 2015-04-09 at 5.54.34 AM

What’s that famous quote by Henry Ford again? “If I had asked people what they wanted, they would have said faster horses.” We don’t need faster UAVs, we simply need to turn what we already have into Point Clouds, which I’m indeed hoping to do with the aerial imagery from Vanuatu, by the way. The Point Cloud below was made only from single 2D aerial images.

It isn’t perfect, but we don’t need perfection in disaster response, we need good enough. So when we as humanitarian UAV teams go into the next post-disaster deployment and ask humanitarians what they need, they may say “faster horses” because they’re not (yet) familiar with what’s really possible with the imagery processing solutions available today. That obviously doesn’t mean that we should ignore their information needs. It simply means we should seek to expand their imaginations vis-a-vis the art of the possible with UAVs and aerial imagery. Here is a 3D model of a village in Vanuatu constructed using 2D aerial imagery:

Now, the title of my blog post does lead with the word crowdsourcing. Why? For several reasons. First, it takes some decent computing power (and time) to create these Point Clouds. But if the underlying 2D imagery is made available to hundreds of Digital Humanitarians, we could use this distributed computing power to rapidly crowdsource the creation of 3D models. Second, each model can then be pushed to MicroMappers for crowdsourced analysis. Why? Because having a dozen eyes scrutinizing one Point Cloud is better than 2. Note that for quality control purposes, each Point Cloud would be shown to 5 different Digital Humanitarian volunteers; we already do this with MicroMappers for tweets, pictures, videos, satellite images and of course aerial images as well. Each digital volunteer would then trace areas in the Point Cloud where they spot damage. If the traces from the different volunteers match, then bingo, there’s likely damage at those x, y and z coordinates. Here’s the idea:
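As a rough illustration of the matching step described above, one could snap each volunteer's traced points to a coarse 3D grid and keep only the grid cells that enough volunteers marked. This is a sketch under stated assumptions, not how MicroMappers does it: the cell size, the `min_votes` threshold and the sample coordinates are all hypothetical.

```python
from collections import Counter


def to_cell(point, cell_size=1.0):
    """Snap an (x, y, z) point to the grid cell that contains it."""
    x, y, z = point
    return (int(x // cell_size), int(y // cell_size), int(z // cell_size))


def consensus_cells(traces, min_votes=3, cell_size=1.0):
    """Return the cells traced by at least `min_votes` different volunteers."""
    votes = Counter()
    for trace in traces:
        # Use a set so each volunteer gets at most one vote per cell.
        cells = {to_cell(p, cell_size) for p in trace}
        votes.update(cells)
    return {cell for cell, n in votes.items() if n >= min_votes}


# Invented traces from 5 volunteers looking at the same Point Cloud.
traces = [
    [(10.2, 4.1, 2.0), (10.8, 4.5, 2.3)],
    [(10.4, 4.9, 2.7)],
    [(10.1, 4.2, 2.9), (30.0, 7.0, 1.0)],
    [(30.5, 7.5, 1.5)],
    [(55.0, 1.0, 0.0)],
]
print(consensus_cells(traces))
```

Here three of the five volunteers marked the same cell, so only that cell is flagged as likely damage. A stricter or looser threshold trades missed damage against false alarms.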

We could easily use iPads to turn the process into a Virtual Reality experience for digital volunteers. In other words, you’d be able to move around and above the actual Point Cloud by simply changing the position of your iPad accordingly. This technology already exists and has for several years now. Tracing features in the 3D models that appear to be damaged would be as simple as using your finger to outline the damage on your iPad.

What about the inevitable challenge of Big Data? What if thousands of Point Clouds are generated during a disaster? Sure, we could try to scale our crowd-sourcing efforts by recruiting more Digital Humanitarian volunteers, but wouldn’t that just be asking for a “faster horse”? Just like we’ve already done with MicroMappers for tweets and text messages, we would seek to combine crowdsourcing and Artificial Intelligence to automatically detect features of interest in 3D models. This sounds to me like an excellent research project for a research institute engaged in advanced computing R&D.

I would love to see the results of this applied research integrated directly within MicroMappers. This would allow us to integrate the results of social media analysis via MicroMappers (e.g., tweets, Instagram pictures, YouTube videos) directly with the results of satellite imagery analysis as well as 2D and 3D aerial imagery analysis generated via MicroMappers.

Anyone interested in working on this?

How Digital Jedis Are Springing to Action In Response To Cyclone Pam

Digital Humanitarians sprang to action just hours after the Category 5 Cyclone collided with Vanuatu’s many islands. This first deployment focused on rapidly assessing the damage by analyzing multimedia content posted on social media and in the mainstream news. This request came directly from the United Nations (OCHA), which activated the Digital Humanitarian Network (DHN) to carry out the rapid damage assessment. So the Standby Task Force (SBTF), a founding member of the DHN, used QCRI’s MicroMappers platform to produce a digital, interactive Crisis Map of some 1,000+ geo-tagged pictures of disaster damage (screenshot below).

MM_ImageMap_Vanuatu

Within days of Cyclone Pam making landfall, the World Bank (WB) activated the Humanitarian UAV Network (UAViators) to quickly deploy UAV pilots to the affected islands. UAViators has access to a global network of 700+ professional UAV pilots in some 70+ countries worldwide. The WB identified two UAV teams from the Humanitarian UAV Network and deployed them to capture very high-resolution aerial photographs of the damage to support the Government’s post-disaster damage assessment efforts. Pictures from these early UAV missions are available here. Aerial images & videos of the disaster damage were also posted to the UAViators Crowdsourced Crisis Map.

Last week, the World Bank activated the DHN (for the first time ever) to help analyze the many, many gigabytes of aerial imagery from Vanuatu. So Digital Jedis from the DHN are now using Humanitarian OpenStreetMap (HOT) and MicroMappers (MM) to crowdsource the search for partially damaged and fully destroyed houses in the aerial imagery. The OSM team is specifically looking at the “nadir imagery” captured by the UAVs while MM is exclusively reviewing the “oblique imagery”. More specifically, digital volunteers are using MM to trace destroyed houses in red and partially damaged houses in orange, and using blue to denote houses that appear to have little to no damage. Below is an early screenshot of the Aerial Crisis Map for the island of Efate. The live Crisis Map is available here.

Screen Shot 2015-04-06 at 10.56.09 AM

Clicking on one of these markers will open up the high resolution aerial pictures taken at that location. Here, two houses are traced in blue (little to no damage) and two on the upper left are traced in orange (partial damage expected).

Screen Shot 2015-04-06 at 10.57.17 AM

The cameras on the UAVs captured the aerial imagery in very high resolution, as you can see from the close up below. You’ll note two traces for the house. These two traces were done by two independent volunteers (for the purposes of quality control). In fact, each aerial image is shown to at least 3 different Digital Jedis.

Screen Shot 2015-04-06 at 10.58.31 AM

Once this MicroMappers deployment is over, we’ll be using the resulting traces to create automated feature detection algorithms, just like we did here for the MicroMappers Namibia deployment. This approach, combining crowdsourcing with Artificial Intelligence (AI), is explored in more detail here vis-a-vis disaster response. The purpose of taking this hybrid human-machine computing solution is to accelerate (semi-automate) future damage assessment efforts.

Meanwhile, back in Vanuatu, the HOT team has already carried out some tentative, preliminary analysis of the damage based on the aerial imagery provided. They are also updating their OSM maps of the affected islands thanks to this imagery. Below is an initial damage assessment carried out by HOT for demonstration purposes only. Please visit their deployment page on the Vanuatu response for more information.

2015-04-04_18h04_00

So what’s next? Combining both the nadir and oblique imagery to interpret disaster damage is ultimately what is needed, so we’re actually hoping to make this happen (today) by displaying the nadir imagery directly within the Aerial Crisis Map produced by MicroMappers. (Many thanks to the MapBox team for their assistance on this). We hope this integration will help HOT and our World Bank partners better assess the disaster damage. This is the first time that we as a group are doing anything like this, so obviously lots of learning going on, which should improve future deployments. Ultimately, we’ll need to create 3D models (point clouds) of disaster affected areas (already easy to do with high-resolution aerial imagery) and then simply use MicroMappers to crowdsource the analysis of these 3D models.

And here’s a 3D model of a village in Vanuatu constructed using 2D aerial photos taken by UAV:

For now, though, Digital Jedis will continue working very closely with the World Bank to ensure that the latter have the results they need in the right format to deliver a comprehensive damage assessment to the Government of Vanuatu by the end of the week. In the meantime, if you’re interested in learning more about digital humanitarian action, then please check out my new book, which features UAViators, HOT, MM and lots more.

Artificial Intelligence Powered by Crowdsourcing: The Future of Big Data and Humanitarian Action

There’s no point spewing stunning statistics like this recent one from The Economist, which states that 80% of adults will have access to smartphones before 2020. The volume, velocity and variety of digital data will continue to skyrocket. To paraphrase Douglas Adams, “Big Data is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is.”

WP1

And so, traditional humanitarian organizations have a choice when it comes to battling Big Data. They can either continue business as usual (and lose) or get with the program and adopt Big Data solutions like everyone else. The same goes for Digital Humanitarians. As noted in my new book of the same title, those Digital Humanitarians who cling to crowdsourcing alone as their pièce de résistance will inevitably become the ivy-laden battlefield monuments of 2020.

bookcover

Big Data comprises a variety of data types such as text, imagery and video. Examples of text-based data include mainstream news articles, tweets and WhatsApp messages. Imagery includes Instagram, professional photographs that accompany news articles, satellite imagery and increasingly aerial imagery as well (captured by UAVs). Television channels, Meerkat and YouTube broadcast videos. Finding relevant, credible and actionable pieces of text, imagery and video in the Big Data generated during major disasters is like looking for a needle in a meadow (haystacks are ridiculously small datasets by comparison).

Humanitarian organizations, like many others in different sectors, often find comfort in the notion that their problems are unique. Thankfully, this is rarely true. Not only is the Big Data challenge not unique to the humanitarian space, real solutions to the data deluge have already been developed by groups that humanitarian professionals at worst don’t know exist and at best rarely speak with. These groups are already using Artificial Intelligence (AI) and some form of human input to make sense of Big Data.

Data digital flow

How does it work? And why do you still need some human input if AI is already in play? The human input, which can come via crowdsourcing or from a few individuals, is needed to train the AI engine, which uses a technique from AI called machine learning to learn from the human(s). Take AIDR, for example. This experimental solution, which stands for Artificial Intelligence for Disaster Response, uses AI powered by crowdsourcing to automatically identify relevant tweets and text messages in an exploding meadow of digital data. The crowd tags tweets and messages they find relevant and the AI engine learns to recognize the relevance patterns in real-time, allowing AIDR to automatically identify future tweets and messages.

As far as we know, AIDR is the only Big Data solution out there that combines crowdsourcing with real-time machine learning for disaster response. Why do we use crowdsourcing to train the AI engine? Because speed is of the essence in disasters. You need a crowd of Digital Humanitarians to quickly tag as many tweets/messages as possible so that AIDR can learn as fast as possible. Incidentally, once you’ve created an algorithm that accurately detects tweets relaying urgent needs after a Typhoon in the Philippines, you can use that same algorithm again when the next Typhoon hits (no crowd needed).

What about pictures? After all, pictures are worth a thousand words. Is it possible to combine artificial intelligence with human input to automatically identify pictures that show infrastructure damage? Thanks to recent breakthroughs in computer vision, this is indeed possible. Take Metamind, for example, a new startup I just met with in Silicon Valley. Metamind is barely 6 months old but the team has already demonstrated that one can indeed automatically identify a whole host of features in pictures by using artificial intelligence and some initial human input. The key is human input since this is what trains the algorithms. The more human-generated training data you have, the better your algorithms.

My team and I at QCRI are collaborating with Metamind to create algorithms that can automatically detect infrastructure damage in pictures. The Silicon Valley start-up is convinced that we’ll be able to create highly accurate algorithms if we have enough training data. This is where MicroMappers comes in. We’re already using MicroMappers to create training data for tweets and text messages (which is what AIDR uses to create algorithms). In addition, we’re already using MicroMappers to tag and map pictures of disaster damage. The missing link, in order to turn this tagged data into algorithms, is Metamind. I’m excited about the prospects, so stay tuned for updates as we plan to start teaching Metamind’s AI engine this month.

Screen Shot 2015-03-16 at 11.45.31 AM

How about videos as a source of Big Data during disasters? I was just in Austin for SXSW 2015 and met up with the CEO of WireWax, a British company that uses (you guessed it) artificial intelligence and human input to automatically detect countless features in videos. Their platform has already been used to automatically find guns and Justin Bieber across millions of videos. Several other groups are also working on feature detection in videos. Colleagues at Carnegie Mellon University (CMU), for example, are working on developing algorithms that can detect evidence of gross human rights violations in YouTube videos coming from Syria. They’re currently applying their algorithms to videos of disaster footage, which we recently shared with them, to determine whether infrastructure damage can be automatically detected.

What about satellite & aerial imagery? Well, the team driving DigitalGlobe’s Tomnod platform has already been using AI powered by crowdsourcing to automatically identify features of interest in satellite (and now aerial) imagery. My team and I are working on similar solutions with MicroMappers, with the hope of creating real-time machine learning solutions for both satellite and aerial imagery. Unlike Tomnod, the MicroMappers platform is free and open source (and also filters social media, photographs, videos & mainstream news).

Screen Shot 2015-03-16 at 11.43.23 AM

Screen Shot 2015-03-16 at 11.41.21 AM

So there you have it. The future of humanitarian information systems will not be an App Store but an “Alg Store”, i.e., an Algorithm Store providing a growing menu of algorithms that have already been trained to automatically detect certain features in the texts, imagery and videos that get generated during disasters. These algorithms will also “talk to each other” and integrate other feeds (from real-time sensors, Internet of Things) thanks to data-fusion solutions that already exist and others that are in the works.

Now, the astute reader may have noted that I omitted audio/speech in my post. I’ll be writing about this in a future post since this one is already long enough.