Tag Archives: Disaster

Results of MicroMappers Response to Pakistan Earthquake (Updated)

Update: We’re developing & launching MicroFilters to improve MicroMappers.

About 47 hours ago, the UN Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) in response to the Pakistan Earthquake. The activation request was for 48 hours, so the deployment will soon phase out. As already described here, the Standby Volunteer Task Force (SBTF) teamed up with QCRI to carry out an early test of MicroMappers, which was not set to launch until next month. This post shares some initial thoughts on how the test went along with preliminary results.

Pakistan Quake

During ~40 hours, 109 volunteers from the SBTF and the public tagged just over 30,000 tweets that were posted during the first 36 hours or so after the quake. We were able to automatically collect these tweets thanks to our partnership with GNIP, filtering for relevant tweets using half-a-dozen hashtags. Given the large volume of tweets collected, we did not require that each tweet be tagged by at least 3 individual volunteers, which is our usual data quality control. Out of these 30,000+ tweets, volunteers tagged a total of 177 tweets as noting needs or infrastructure damage. A review of these tweets by the SBTF concluded that none were actually informative or actionable.
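For readers curious about the mechanics, the hashtag filtering step can be sketched roughly as follows. This is an illustrative toy, not our actual GNIP pipeline; the specific hashtags shown are assumptions.

```python
# Hypothetical sketch of hashtag-based tweet filtering. The hashtag list
# below is an assumption for illustration, not the actual set we used.
HASHTAGS = {"#pakistan", "#earthquake", "#quake", "#awaran", "#balochistan"}

def matches_hashtags(tweet_text, hashtags=HASHTAGS):
    """Return True if the tweet contains at least one tracked hashtag."""
    words = {w.lower().rstrip(".,!?") for w in tweet_text.split()}
    return not words.isdisjoint(hashtags)

tweets = [
    "Buildings down in Awaran #Pakistan #earthquake",
    "Great match today! #cricket",
]
relevant = [t for t in tweets if matches_hashtags(t)]
```

In practice a broad filter like this is what produced the large haystack described above; the trade-off between recall and noise is exactly what the later reflections on pre-processing are about.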

Just over 350 pictures were tweeted in the aftermath of the earthquake. These were uploaded to the ImageClicker for tagging purposes. However, none of the pictures captured evidence of infrastructure damage. In fact, the vast majority were unrelated to the earthquake. This was also true of pictures published in news articles. Indeed, we used an automated algorithm to identify all tweets with links to news articles; this algorithm would then crawl these articles for evidence of images. We found that the vast majority of these automatically extracted pictures were related to politics rather than infrastructure damage.
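The article-crawling step can be illustrated with a minimal sketch: given the HTML of a linked news article, collect candidate image URLs. The parsing idea is all that is shown here; the real algorithm's fetching, deduplication, and relevance scoring are not reproduced, and the sample HTML is invented.

```python
# Minimal sketch: extract image URLs from an article's HTML.
# (Illustrative only; the actual crawler is more involved.)
from html.parser import HTMLParser

class ImageExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Record the src attribute of every <img> tag encountered.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

article_html = (
    '<html><body><img src="/photos/quake1.jpg">'
    '<p>Story text...</p><img src="/ads/banner.png"></body></html>'
)
parser = ImageExtractor()
parser.feed(article_html)
# parser.images now holds both image URLs, including the ad banner --
# which is precisely why most extracted pictures turned out to be unrelated.
```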

Pakistan Quake2

A few preliminary thoughts and reflections from this first test of MicroMappers. First, however, a big, huge, gigantic thanks to my awesome QCRI team: Ji Lucas, Imran Muhammad and Kiran Garimella; to my outstanding colleagues on the SBTF Core Team including but certainly not limited to Jus Mackinnon, Melissa Elliott, Anahi A. Iaccuci, Per Aarvik & Brendan O’Hanrahan (bios here); to the amazing SBTF volunteers and members of the general public who rallied to tag tweets and images—in particular our top 5 taggers: Christina KR, Leah H, Lubna A, Deborah B and Joyce M! Also bravo to volunteers in the Netherlands, UK, US and Germany for being the most active MicroMappers; and last but certainly not least, big, huge and gigantic thanks to Andrew Ilyas for developing the algorithms to automatically identify pictures and videos posted to Twitter.

So what did we learn over the past 48 hours? First, the disaster-affected region is a remote area of south-western Pakistan with a very light social media footprint, so there was practically no user-generated content directly relevant to needs and damage posted on Twitter during the first 36 hours. In other words, there were no needles to be found in the haystack of information. This is in stark contrast to our experience when we carried out a very similar operation following Typhoon Pablo in the Philippines. Obviously, if there’s little to no social media footprint in a disaster-affected area, then monitoring social media is of no use at all to anyone. Note, however, that MicroMappers could also be used to tag 30,000+ text messages (SMS). (Incidentally, since the earthquake struck around 12 noon local time, there was only about 18 hours of daylight during the 36-hour period for which we collected the tweets).

Second, while the point of this exercise was not to test our pre-processing filters, it was clear that the single biggest problem was ultimately with the filtering. Our goal was to upload as many tweets as possible to the Clickers and stress-test the apps. So we only filtered tweets using a number of general hashtags such as #Pakistan. Furthermore, we did not filter out any retweets, which probably accounted for 2/3 of the data, nor did we filter by geography to ensure that we were only collecting and thus tagging tweets from users based in Pakistan. This was a major mistake on our end. We were so preoccupied with testing the actual Clickers that we simply did not pay attention to the pre-processing of tweets. This was equally true of the images uploaded to the ImageClicker.
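The missing pre-processing can be sketched as a couple of simple filters: drop retweets and keep only tweets whose user profile mentions the affected country. This is a hedged illustration only; the field names are assumptions and do not match the actual GNIP schema, and real geo-filtering is considerably harder than a substring match.

```python
# Hedged sketch of the pre-processing the post says was missing.
# Field names ("text", "user_location") are illustrative assumptions.
def preprocess(tweets, country="pakistan"):
    kept = []
    for t in tweets:
        if t["text"].startswith("RT @"):  # crude retweet filter
            continue
        if country not in t.get("user_location", "").lower():
            continue  # crude geography filter on the profile location
        kept.append(t)
    return kept

sample = [
    {"text": "RT @news: damage reported", "user_location": "Karachi, Pakistan"},
    {"text": "Felt strong shaking here", "user_location": "Quetta, Pakistan"},
    {"text": "Thoughts with Pakistan", "user_location": "London, UK"},
]
filtered = preprocess(sample)
# Only the Quetta tweet survives both filters.
```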

Pakistan Quake 3

So where do we go from here? Well we have pages and pages worth of feedback to go through and integrate in the next version of the Clickers. For me, one of the top priorities is to optimize our pre-processing algorithms and ensure that the resulting output can be automatically uploaded to the Clickers. We have to refine our algorithms and make damned sure that we only upload unique tweets and images to our Clickers. Volunteers should see the same tweet or image at most 3 times, for verification purposes. We should also be more careful with our hashtag filtering and consider filtering by geography. Incidentally, when our free & open source AIDR platform becomes operational in November, we’ll also have the ability to automatically identify tweets referring to needs, reports of damage, and much, much more.

In fact, AIDR was also tested for the very first time. SBTF volunteers tagged about 1,000 tweets, and just over 130 of the tags enabled us to create an accurate classifier that can automatically identify whether a tweet is relevant for disaster response efforts specifically in Pakistan (80% accuracy). Now, we didn’t apply this classifier on incoming tweets because AIDR uses streaming Twitter data, not static, archived data which is what we had (in the form of CSV files). In any event, we also made an effort to create classifiers for needs and infrastructure damage but did not get enough tags to make these accurate enough. Typically, we need a minimum of 20 or so tags (i.e., examples of actual tweets referring to needs or damage). The more tags, the more accurate the classifier.
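To make the classifier idea concrete, here is a toy illustration of training a text classifier from a small set of crowdsourced labels, in the spirit of the AIDR test described above. It uses a tiny hand-rolled Naive Bayes with Laplace smoothing; the actual AIDR classifiers, features, and training data are not shown, and the labeled examples below are invented.

```python
# Toy Naive Bayes classifier trained on crowdsourced (text, label) pairs.
# Purely illustrative -- not the actual AIDR implementation.
from collections import Counter
import math

def train(examples):
    """examples: list of (text, label). Returns word counts and label counts."""
    counts, totals = {}, Counter()
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    vocab = len({w for c in counts.values() for w in c})
    best, best_score = None, float("-inf")
    for label, words in counts.items():
        n = sum(words.values())
        score = math.log(totals[label] / sum(totals.values()))  # prior
        for w in text.lower().split():
            # Laplace smoothing so unseen words don't zero out the score
            score += math.log((words[w] + 1) / (n + vocab))
        if score > best_score:
            best, best_score = label, score
    return best

labeled = [
    ("buildings collapsed in awaran", "relevant"),
    ("urgent need for tents and water", "relevant"),
    ("watch the cricket match tonight", "not_relevant"),
    ("new phone review is out", "not_relevant"),
]
counts, totals = train(labeled)
```

The key point from the post stands out even in this toy: with only a handful of labeled examples per class, accuracy is limited, and each additional tag improves the classifier.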

The reason there were so few tags, however, is that there were few if any informative tweets referring to needs or infrastructure damage during the first 36 hours. In any event, I believe this was the very first time that a machine learning classifier was crowdsourced for disaster response purposes. In the future, we may want to first crowdsource a machine learning classifier for disaster-relevant tweets and then upload the results to MicroMappers; this would reduce the number of unrelated tweets displayed on a TweetClicker.

As expected, we have also received a lot of feedback vis-a-vis user experience and the user interface of the Clickers. Speed is at the top of the list. That is, making sure that once I’ve clicked on a tweet/image, the next tweet/image automatically appears. At times, I had to wait more than 20 seconds for the next item to load. We also need to add more progress bars such as the number of tweets or images that remain to be tagged—a countdown display, basically. I could go on and on, frankly, but hopefully these early reflections are informative and useful to others developing next-generation humanitarian technologies. In sum, there is a lot of work to be done still. Onwards!

bio

MicroMappers Launched for Pakistan Earthquake Response (Updated)

Update 1: MicroMappers is now public! Anyone can join to help the efforts!
Update 2: Results of MicroMappers Response to Pakistan Earthquake [Link]

MicroMappers was not due to launch until next month but my team and I at QCRI received a time-sensitive request by colleagues at the UN to carry out an early test of the platform given yesterday’s 7.7 magnitude earthquake, which killed well over 300 and injured hundreds more in south-western Pakistan.

pakistan_quake_2013

Shortly after this request, the UN Office for the Coordination of Humanitarian Affairs (OCHA) in Pakistan officially activated the Digital Humanitarian Network (DHN) to rapidly assess the damage and needs resulting from the earthquake. The award-winning Standby Volunteer Task Force (SBTF), a founding member of the DHN, teamed up with QCRI to use MicroMappers in response to the request by OCHA-Pakistan. This exercise, however, is purely for testing purposes. We made this clear to our UN partners since the results may be far from optimal.

MicroMappers is simply a collection of microtasking apps (we call them Clickers) that we have customized for disaster response purposes. We just launched both the Tweet and Image Clickers to support the earthquake relief and may also launch the Tweet and Image GeoClickers in the next 24 hours. The TweetClicker is pictured below (click to enlarge).

MicroMappers_Pakistan1

Thanks to our partnership with GNIP, QCRI automatically collected over 35,000 tweets related to Pakistan and the Earthquake (we’re continuing to collect more in real-time). We’ve uploaded these tweets to the TweetClicker and are also filtering links to images for upload to the ImageClicker. Depending on how the initial testing goes, we may be able to invite help from the global digital village. Indeed, “crowdsourcing” is simply another way of saying “It takes a village…” In fact, that’s precisely why MicroMappers was developed, to enable anyone with an Internet connection to become a digital humanitarian volunteer. The Clicker for images is displayed below (click to enlarge).

MicroMappers_Pakistan2

Now, whether this very first test of the Clickers goes well remains to be seen. As mentioned, we weren’t planning to launch until next month. But we’ve already learned heaps from the past few hours alone. For example, while the Clickers are indeed ready and operational, our automatic pre-processing filters are not yet optimized for rapid response. The purpose of these filters is to automatically identify tweets that link to images and videos so that they can be uploaded to the Clickers directly. In addition, while our ImageClicker is operational, our VideoClicker is still under development—as is our TranslateClicker, both of which would have been useful in this response. I’m sure we’ll encounter other issues over the next 24-36 hours. We’re keeping track of these in a shared Google Spreadsheet so we can review them next week and make sure to integrate as much of the feedback as possible before the next disaster strikes.

Incidentally, we (QCRI) also teamed up with the SBTF to test the very first version of the Artificial Intelligence for Disaster Response (AIDR) platform for about six hours. As far as we know, this test represents the first time that machine learning classifiers for disaster response were created on the fly using crowdsourcing. We expect to launch AIDR publicly at the 2013 CrisisMappers conference this November (ICCM 2013). We’ll be sure to share what worked and didn’t work during this first AIDR pilot test. So stay tuned for future updates via iRevolution. In the meantime, a big, big thanks to the SBTF Team for rallying so quickly and for agreeing to test the platforms! If you’re interested in becoming a digital humanitarian volunteer, simply join us here.

Bio

Enabling Crowdfunding on Twitter for Disaster Response

Twitter is increasingly used to communicate needs during crises. These needs often include requests for information and financial assistance, for example. Identifying these tweets in real-time requires the use of advanced computing and machine learning in particular. This is why my team and I at QCRI are developing the Artificial Intelligence for Disaster Response (AIDR) platform. My colleague Hemant Purohit has been working with us to develop machine learning classifiers to automatically identify and disaggregate between different types of needs. He has also developed classifiers to automatically identify Twitter users offering different types of help, including financial support. Our aim is to develop a “Match.com” solution to match specific needs with offers of help. What we’re missing, however, is an easy way to post micro-donations on Twitter as a result of matching financial needs and offers.

tinyGive-logo (1)

This is where my colleague Clarence Wardell and his start-up TinyGive may come in. Geared towards nonprofits, TinyGive is the easiest way to accept donations on Twitter. Indeed, donating via TinyGive is as simple as tweeting five words: “Hey @[organization], here’s $5! #tinygive”. I recently tried the service at a fundraiser and it really is that easy. TinyGive turns your tweet into an actual donation (and public endorsement), thus drastically reducing the high barriers that currently exist for Twitter users who wish to help others. Indeed, many of the barriers that currently exist in the mobile donation space are overcome by TinyGive.

Combining the AIDR platform with TinyGive would enable us to automatically identify those asking for financial assistance following a disaster and also automatically tweet a link to TinyGive to those offering financial assistance via Twitter. We’re not all affected the same way by disasters and those of us who are in proximity to said disaster but largely unscathed could use Twitter to quickly help those nearby with a simple micro-donation here and there. Think of it as time-critical, peer-to-peer localvesting.
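A much simplified sketch of that matching idea: pair tweets classified as financial needs with tweets classified as financial offers, then draft a reply pointing each donor to the corresponding need. The classification step is assumed to have already happened (e.g., via AIDR), and the data structures and reply wording are invented for illustration.

```python
# Illustrative sketch of need/offer matching. The upstream classification
# and any actual TinyGive integration are assumptions, not shown here.
def match_offers_to_needs(needs, offers):
    """Greedily pair each offer with a need; returns (donor, recipient) pairs.
    A real matcher would weigh location, amount, and timing."""
    return list(zip((o["user"] for o in offers), (n["user"] for n in needs)))

needs = [{"user": "@affected1", "text": "We need funds for repairs"}]
offers = [{"user": "@donor1", "text": "Happy to donate to quake victims"}]

pairs = match_offers_to_needs(needs, offers)
replies = [f"{donor}, you can help {need} with a micro-donation"
           for donor, need in pairs]
```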

At this recent White House event on humanitarian technology and innovation (which I was invited to speak at but regrettably could not attend due to prior commitments), US Chief Technology Officer Todd Park talks about the need for “A crowdfunding platform for small businesses and others to receive access to capital to help rebuild after a disaster, including a rating system that encourages rebuilding efforts that improve the community.” Time-critical crowdfunding can build resilience and enable communities to bounce back (and forward) more quickly following a disaster. TinyGive may thus be able to play a role in building community resilience as well.

In the future, my hope is that platforms like TinyGive will also allow disaster-affected individuals (in addition to businesses and other organizations) to receive access to micro-donations during times of need directly via Twitter. There are of course important challenges still ahead, but the self-help, mutual-aid approach to disaster response that I’ve been promoting for years should also include crowdfunding solutions. So if you’ve heard of other examples like TinyGive applied to disaster response, please let me know via the comments section below. Thank you!

bio

Can Official Disaster Response Apps Compete with Twitter?

There are over half-a-billion Twitter users, with an average of 135,000 new users signing up on a daily basis (1). Can emergency management and disaster response organizations win over some Twitter users by convincing them to use their apps in addition to Twitter? For example, will FEMA’s smartphone app gain as much “market share”? The app’s new crowdsourcing feature, “Disaster Reporter,” allows users to submit geo-tagged disaster-related images, which are then added to a public crisis map. So the question is, will more images be captured via FEMA’s app or from Twitter users posting Instagram pictures?

fema_app

This question is perhaps poorly stated. While FEMA may not get millions of users to share disaster-related pictures via their app, it is absolutely critical for disaster response organizations to explicitly solicit crisis information from the crowd. See my blog post “Social Media for Emergency Management: Question of Supply and Demand” for more information on the importance of demand-driven crowdsourcing. The advantage of soliciting crisis information from a smartphone app is that the sourced information is structured and thus easily machine readable. For example, the pictures taken with FEMA’s app are automatically geo-tagged, which means they can be automatically mapped if need be.

While many, many more pictures may be posted on Twitter, these may be more difficult to map. The vast majority of tweets are not geo-tagged, which means more sophisticated computational solutions are necessary. Instagram pictures are geo-tagged, but this information is not publicly available. So smartphone apps are a good way to overcome these challenges. But we shouldn’t overlook the value of pictures shared on Twitter. Many can be geo-tagged, as demonstrated by the Digital Humanitarian Network’s efforts in response to Typhoon Pablo. Moreover, about 40% of pictures shared on Twitter in the immediate aftermath of the Oklahoma Tornado had geographic data. In other words, while the FEMA app may have 10,000 users who submit a picture during a disaster, Twitter may have 100,000 users posting pictures. And while only 40% of the latter pictures may be geo-tagged, this would still mean 40,000 pictures compared to FEMA’s 10,000. Recall that over half-a-million Instagram pictures were posted during Hurricane Sandy alone.

The main point, however, is that FEMA could also solicit pictures via Twitter and ask eyewitnesses to simply geo-tag their tweets during disasters. They could also speak with Instagram and perhaps ask them to share geo-tag data for solicited images. These strategies would render tweets and pictures machine-readable and thus automatically mappable, just like the pictures coming from FEMA’s app. In sum, the key issue here is one of policy and the best solution is to leverage multiple platforms to crowdsource crisis information. The technical challenge is how to deal with the high volume of pictures shared in real-time across multiple platforms. This is where microtasking comes in and why MicroMappers is being developed. For tweets and images that do not contain automatically geo-tagged data, MicroMappers has a microtasking app specifically developed to crowdsource the manual tagging of images.

In sum, there are trade-offs. The good news is that we don’t have to choose one solution over the other; they are complementary. We can leverage both a dedicated smartphone app and very popular social media platforms like Twitter and Facebook to crowdsource the collection of crisis information. Either way, a demand-driven approach to soliciting relevant information will work best, both for smartphone apps and social media platforms.

Bio

 

The First Ever Spam Filter for Disaster Response

While spam filters provide additional layers of security to websites, they can also be used to process all kinds of information. Perhaps most famously, for example, the reCAPTCHA spam filter was used to transcribe the New York Times’ entire paper-based archives. See my previous blog post to learn how this was done and how spam filters can also be used to process information for disaster response. Given the positive response I received from humanitarian colleagues who read the blog post, I teamed up with my colleagues at QCRI to create the first ever spam filter for disaster response.

During international disasters, the humanitarian community (often led by the UN’s Office for the Coordination of Humanitarian Affairs, OCHA) needs to carry out rapid damage assessments. Recently, these assessments have included the analysis of pictures shared on social media following a disaster. For example, OCHA activated the Digital Humanitarian Network (DHN) to collect and quickly tag pictures that capture evidence of damage in response to Typhoon Pablo in the Philippines (as described here and in the TEDx talk above). Some of these pictures, which were found on Twitter, were also geo-referenced by DHN volunteers. This enabled OCHA to create (overnight) the unique damage assessment map below.

Typhon PABLO_Social_Media_Mapping-OCHA_A4_Portrait_6Dec2012

OCHA intends to activate the DHN again in future disasters to replicate this type of rapid damage assessment operation. This is where spam filters come in. The DHN often needs support to quickly tag these pictures (which may number in the tens of thousands). Adding a spam filter that requires email users to tag which image captures disaster damage not only helps OCHA and other organizations carry out a rapid damage assessment, but also increases the security of email systems at the same time. And it only takes 3 seconds to use the spam filter.

OCHA reCAPTCHA

My team and I at QCRI have thus developed a spam filter plugin that can be easily added to email login pages like OCHA’s as shown above. When the Digital Humanitarian Network requires additional hands on deck to tag pictures during disasters, this plugin can simply be switched on. My team at QCRI can easily push the images to the plugin and pull data on which images have been tagged as showing disaster damage. The process for the end user couldn’t be simpler. Enter your username and password as normal and then simply select the picture below that shows disaster damage. If there are none, then simply click on “None” and then “Login”. The spam filter uses a predictive algorithm and an existing database of pictures as a control mechanism to ensure that the filter cannot be gamed. On that note, feel free to test the plugin here. We’d love your feedback as we continue testing.

recpatcha2

The desired outcome? Each potential disaster picture is displayed to 3 different email account users. Only if each of the 3 users tag the same picture as capturing disaster damage does that picture get automatically forwarded to members of the Digital Humanitarian Network. To tag more pictures after logging in, users are invited to do so via MicroMappers, which launches this September in partnership with OCHA. MicroMappers enables members of the public to participate in digital disaster response efforts with a simple click of the mouse.
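The "3 distinct users must agree" rule above can be sketched in a few lines. This is a hedged illustration of the aggregation logic only; the plugin's actual implementation and data model are not shown, and the identifiers below are invented.

```python
# Sketch of the three-way agreement rule: an image is forwarded only when
# at least 3 *different* users have tagged it as showing disaster damage.
from collections import defaultdict

def forwardable(tags, required=3):
    """tags: list of (image_id, user_id) 'shows damage' votes.
    Returns the image_ids confirmed by >= `required` distinct users."""
    votes = defaultdict(set)
    for image_id, user_id in tags:
        votes[image_id].add(user_id)  # a set, so repeat votes don't count
    return {img for img, users in votes.items() if len(users) >= required}

tags = [
    ("img1", "alice"), ("img1", "bob"), ("img1", "carol"),
    ("img2", "alice"), ("img2", "alice"),  # same user twice: one vote
    ("img2", "bob"),
]
confirmed = forwardable(tags)
# Only img1 reaches the 3-distinct-user threshold.
```

Using a set per image is the design point: it makes the quality-control rule about independent taggers, not raw click counts.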

I would ideally like to see an innovative and forward-thinking organization like OCHA pilot the plugin for a two week feasibility test. If the results are positive and promising, then I hope OCHA and other UN agencies engaged in disaster response adopt the plugin more broadly. As mentioned in my previous blog post, the UN employs well over 40,000 people around the world. Even if “only” 10% log in on one day, that’s still 4,000 images effortlessly tagged for use by OCHA and others during their disaster relief operations. Again, this plugin would only be used in response to major disasters when the most help is needed. We’ll be making the code for this plugin freely available and open source.

Please do get in touch if you’d like to invite your organization to participate in this innovative humanitarian technology project. You can support disaster response efforts around the world by simply logging into your email account, web portal, or Intranet!

bio

TEDx: Microtasking for Disaster Response

My TEDx talk on Digital Humanitarians presented at TEDxTraverseCity. I’ve automatically forwarded the above video to a short 4 minute section of the talk in which I highlight how the Digital Humanitarian Network (DHN) used micro-tasking to support the UN Office for the Coordination of Humanitarian Affairs (OCHA) in response to Typhoon Pablo in the Philippines. See this blog post to learn more about the operation. As a result of this innovative use of micro-tasking, my team and I at QCRI are collaborating with UN OCHA colleagues to launch MicroMappers—a dedicated set of microtasking apps specifically designed for disaster response. These will go live in September 2013.


bio

 

Disaster Response Plugin for Online Games

The Internet Response League (IRL) was recently launched for online gamers to participate in supporting disaster response operations. A quick introduction to IRL is available here. Humanitarian organizations are increasingly turning to online volunteers to filter through social media reports (e.g. tweets, Instagram photos) posted during disasters. Online gamers already spend millions of hours online every day and could easily volunteer some of their time to process crisis information without ever having to leave the games they’re playing.

A message like this would greet you upon logging in. (Screenshot is from World of Warcraft and has been altered)

Let’s take World of Warcraft, for example. If a gamer has opted in to receive disaster alerts, they’d see screens like the one above when logging in or like the one below whilst playing a game.

In game notification should have settings so as to not annoy players. (Screenshot is from World of Warcraft and has been altered)

If a gamer accepts the invitation to join the Internet Response League, they’d see the “Disaster Tagging” screen below. There they’d tag as many pictures as they wish by clicking on the level of disaster damage they see in each photo. Naturally, gamers can exit the disaster tagging area at any time to return directly to their game.

A rough concept of what the tagging screen may look like. (Screenshot is from World of Warcraft and has been altered)

Each picture would be tagged by at least 3 gamers in order to ensure the accuracy of the tagging. That is, if 3 volunteers tag the same image as “Severe”, then we can be reasonably assured that the picture does indeed show infrastructure damage. These pictures would then be sent back to IRL and shared with humanitarian organizations for rapid damage assessment analysis. There are already precedents for this type of disaster response tagging. Last year, the UN asked volunteers to tag images shared on Twitter after a devastating Typhoon hit the Philippines. More specifically, they asked them to tag images that captured the damage caused by the Typhoon. You can learn more about this humanitarian response operation here.

IRL is now looking to develop a disaster response plugin like the one described above. This way, gaming companies will have an easily embeddable plugin that they can insert into their gaming environments. For more on this plugin and the latest updates on IRL, please visit the IRL website here. We’re actively looking for feedback and welcome collaborators and partnerships.

Bio

Acknowledgements: Screenshots created by my colleague Peter Mosur who is the co-founder of the IRL.

Why the Share Economy is Important for Disaster Response and Resilience

A unique and detailed survey funded by the Rockefeller Foundation confirms the important role that social and community bonds play vis-à-vis disaster resilience. The new study, which focuses on resilience and social capital in the wake of Hurricane Sandy, reveals how disaster-affected communities self-organized, “with reports of many people sharing access to power, food and water, and providing shelter.” This mutual aid was primarily coordinated face-to-face. This may not always be possible, however. So the “Share Economy” can also play an important role in coordinating self-help during disasters.

In a share economy, “asset owners use digital clearinghouses to capitalize the unused capacity of things they already have, and consumers rent from their peers rather than rent or buy from a company” (1). During disasters, these asset owners can use the same digital clearinghouses to offer what they have at no cost. For example, over 1,400 kindhearted New Yorkers offered free housing to people heavily affected by the hurricane. They did this using AirBnB, as shown in the short video above. Meanwhile, on the West Coast, the City of San Francisco has just launched a partnership with BayShare, a sharing economy advocacy group in the Bay Area. The partnership’s goal is to “harness the power of sharing to ensure the best response to future disasters in San Francisco” (2).

fon wifi sharing

While share economy platforms like AirBnB are still relatively new, many believe that “the share economy is a real trend and not some small blip” (3). So it may be worth taking an inventory of share platforms out there that are likely to be useful for disaster response. Here’s a short list:

  • AirBnB: A global travel rental platform with accommodations in 192 countries. This service has already been used for disaster response as described above.
  • Fon: Enables people to share some of their home Wi-Fi in exchange for getting free Wi-Fi from 8 million people in Fon’s network. Access to information is always key during & after disasters. The map above displays a subset of all Fon users in that part of Europe.
  • LendingClub: A cheaper service than credit cards for borrowers. Also provides better interest rates than savings accounts for investors. Access to liquidity is often necessary after a disaster.
  • LiquidSpace: Provides high-quality temporary workspaces and office rentals. These can be rented by the hour and by the day. Dedicated spaces are key for coordinating disaster response.
  • Lyft: An on-demand ride-sharing smartphone app for cheaper, safer rides. This service could be used to transport people and supplies following a disaster. Similar to Sidecar.
  • RelayRides: A car sharing marketplace where participants can rent out their own cars. Like Lyft, RelayRides could be used to transport goods and people. Similar to Getaround. Also, ParkingPanda is the parking equivalent.
  • TaskRabbit: Get your deliveries and errands completed easily & quickly by trusted individuals in your neighborhood. This service could be used to run quick errands following disasters. Similar to Zaarly, a marketplace that helps you discover and hire local services.
  • Yerdle: An “eBay” for sharing items with your friends. This could be used to provide basic supplies to disaster-affected neighborhoods. Similar to SnapGood, which also allows for temporary sharing.

Feel free to add more examples via the comments section below if you know of other sharing economy platforms that could be helpful during disasters.

While these share tools don’t necessarily reinforce bonding social capital since face-to-face interactions are not required, they do stand to increase levels of bridging social capital. The former refers to social capital within existing social networks while the latter refers to “cooperative connections with people from different walks of life,” and is often considered “more valuable than ‘bonding social capital’” (3). Bridging social capital is “closely related to thin trust, as opposed to the bonding social capital of thick trust” (4). Platforms that facilitate the sharing economy provide reassurance vis-à-vis the thin trust since they tend to vet participants. This extra reassurance can go a long way during disasters and may thus facilitate mutual-aid at a distance.

 bio

Automatically Identifying Fake Images Shared on Twitter During Disasters

Artificial Intelligence (AI) can be used to automatically predict the credibility of tweets generated during disasters. AI can also be used to automatically rank the credibility of tweets posted during major events. Aditi Gupta et al. applied these same information forensics techniques to automatically identify fake images posted on Twitter during Hurricane Sandy. Using a decision tree classifier, the authors were able to predict which images were fake with an accuracy of 97%. Their analysis also revealed that retweets accounted for 86% of all tweets linking to fake images. In addition, their results showed that 90% of these retweets were posted by just 30 Twitter users.

Fake Images

The authors collected the URLs of fake images shared during the hurricane by drawing on the UK Guardian’s list and other sources. They compared these links with 622,860 tweets that contained links and the words “Sandy” & “hurricane” posted between October 20th and November 1st, 2012. Just over 10,300 of these tweets and retweets contained links to URLs of fake images while close to 5,800 tweets and retweets pointed to real images. Of the ~10,300 tweets linking to fake images, 84% (or 9,000) of these were retweets. Interestingly, these retweets spike about 12 hours after the original tweets are posted. This spike is driven by just 30 Twitter users. Furthermore, the vast majority of retweets weren’t made by Twitter followers but rather by those following certain hashtags. 

Gupta et al. also studied the profiles of users who tweeted or retweeted fake images (User Features) and also the content of their tweets (Tweet Features) to determine whether these features (listed below) might be predictive of whether a tweet points to a fake image. Their decision tree classifier achieved an accuracy of over 90%, which is remarkable. But the authors note that this high accuracy score is due to “the similar nature of many tweets since a lot of tweets are retweets of other tweets in our dataset.” In any event, their analysis also reveals that Tweet-based Features (such as length of tweet, number of uppercase letters, etc.) were far more accurate in predicting whether or not a tweeted image was fake than User-based Features (such as number of friends, followers, etc.). One feature that was overlooked, however, is gender.
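To make the Tweet-based Features concrete, here is a minimal sketch of the kind of content features such a classifier might consume. The feature names are illustrative rather than the paper’s exact set; in a real pipeline these vectors would be fed to a decision tree classifier (e.g., scikit-learn’s DecisionTreeClassifier).

```python
def tweet_features(text):
    """Extract simple content-based features from a tweet's text."""
    words = text.split()
    return {
        "length": len(text),                              # characters
        "num_words": len(words),
        "num_uppercase": sum(c.isupper() for c in text),  # shouting is a signal
        "num_exclamations": text.count("!"),
        "num_hashtags": sum(w.startswith("#") for w in words),
        "num_mentions": sum(w.startswith("@") for w in words),
        "has_url": "http" in text,
    }

features = tweet_features("OMG!! Sharks in the subway #sandy http://t.co/x")
print(features["num_hashtags"], features["has_url"])  # 1 True
```

The appeal of such features is that they require no knowledge of the posting account, which is consistent with the authors’ finding that tweet content outperformed user profiles as a predictor.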

Information Forensics

In conclusion, “content and property analysis of tweets can help us in identifying real image URLs being shared on Twitter with a high accuracy.” These results strengthen the evidence that machine computing and automated techniques can be used for information forensics as applied to images shared on social media. In terms of future work, the authors Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru and Anupam Joshi plan to “conduct a larger study with more events for identification of fake images and news propagation.” They also hope to expand their study to include the detection of “rumors and other malicious content spread during real world events apart from images.” Lastly, they “would like to develop a browser plug-in that can detect fake images being shared on Twitter in real-time.” Their full paper is available here.

Needless to say, all of this is music to my ears. Such a plugin could be added to our Artificial Intelligence for Disaster Response (AIDR) platform, not to mention our Verily platform, which seeks to crowdsource the verification of social media reports (including images and videos) during disasters. What I also really value about the authors’ approach is how pragmatic they are with their findings. That is, by noting their interest in developing a browser plugin, they are applying their data science expertise for social good. As per my previous blog post, this focus on social impact is particularly rare. So we need more data scientists like Aditi Gupta et al. This is why I was already in touch with Aditi last year given her research on automatically ranking the credibility of tweets. I’ve just reached out to her again to explore ways to collaborate with her and her team.


What is Big (Crisis) Data?

What does Big Data mean in the context of disaster response? Big (Crisis) Data refers to the relatively large volume, velocity and variety of digital information that may improve sense making and situational awareness during disasters. These are often referred to as the 3 V’s of Big Data.


Volume refers to the amount of data (20 million tweets were posted during Hurricane Sandy) while Velocity refers to the speed at which that data is generated (over 2,000 tweets per second were generated following the Japan Earthquake & Tsunami). Variety refers to the variety of data generated, e.g., numerical (GPS coordinates), textual (SMS), audio (phone calls), photographic (satellite imagery) and video-graphic (YouTube). Sources of Big Crisis Data thus include both public and private sources, such as images posted on social media (Instagram) on the one hand, and emails or phone calls (Call Detail Records) on the other. Big Crisis Data also relates to both raw data (the text of individual Facebook updates) as well as meta-data (the time and place those updates were posted, for example).

Ultimately, Big Data describes datasets that are too large to be effectively and quickly computed on your average desktop or laptop. In other words, Big Data is relative to the computing power—the filters—at your fingertips (along with the skills necessary to apply that computing power). Put differently, Big Data is “Big” because of filter failure. If we had more powerful filters, said “Big” Data would be easier to manage. As mentioned in previous blog posts, these filters can be created using Human Computing (crowdsourcing, microtasking) and/or Machine Computing (natural language processing, machine learning, etc.).
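A trivial example makes the filtering idea concrete. The sketch below is a deliberately simple stand-in for Machine Computing: a keyword filter that flags tweets which might report needs or damage, so human volunteers only review a small fraction of the stream. The keyword list is illustrative; a deployed system would use a trained classifier rather than fixed keywords.

```python
# Illustrative keywords only; a real filter would be a trained classifier.
DAMAGE_KEYWORDS = {"collapsed", "trapped", "injured", "destroyed",
                   "damage", "help", "rescue"}

def is_potentially_actionable(tweet):
    """Flag a tweet for human review if it contains any damage-related term."""
    tokens = {w.strip("#.,!?").lower() for w in tweet.split()}
    return bool(tokens & DAMAGE_KEYWORDS)

stream = [
    "Building collapsed near the market, people trapped",
    "Thoughts and prayers for everyone affected",
]
flagged = [t for t in stream if is_potentially_actionable(t)]
print(len(flagged))  # 1
```

Even a crude filter like this raises the “dotted line” described below: it shrinks the volume that humans must read, which is precisely the division of labor between Machine and Human Computing.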

BigData1

Take the above graph, for example. The horizontal axis represents time while the vertical one represents volume of information. On a good day, i.e., when there are no major disasters, the Digital Operations Center of the American Red Cross monitors and manually reads about 5,000 tweets. This “steady state” volume and velocity of data is represented by the green area. The dotted line just above denotes an organization’s (or individual’s) capacity to manage a given volume, velocity and variety of data. When disaster strikes, that capacity is stretched and often overwhelmed. More than 3 million tweets were posted during the first 48 hours after the EF5 tornado devastated Moore, Oklahoma, for example. What happens next is depicted in the graph below.

BigData 2

Humanitarian and emergency management organizations often lack the internal surge capacity to manage the rapid increase in data generated during disasters. This Big Crisis Data is represented by the red area. But the dotted line can be raised. One way to do so is by building better filters (using Human and/or Machine Computing). Real world examples of Human and Machine Computing used for disaster response are highlighted here and here respectively.

BigData 3

A second way to shift the dotted line is with enlightened leadership. An example is the Philippine Government’s actions during the recent typhoon. More on policy here. Both strategies (advanced computing & strategic policies) are necessary to raise that dotted line in a consistent manner.


See also:

  • Big Data for Disaster Response: A List of Wrong Assumptions [Link]