
Accelerating the Verification of Social Media Content

Journalists have already been developing a multitude of tactics to verify user-generated content shared on social media. As noted here, the BBC has a dedicated User-Generated Content (UGC) Hub that is tasked with verifying social media information. The UK Guardian, Al-Jazeera, CNN and others are also developing competency in what I refer to as “information forensics”. It turns out there are many tactics that can be used to try and verify social media content. The catch is that applying most of these existing tactics can be highly time-consuming.

So building a decision-tree that combines these tactics is the way to go. But doing digital detective work online is still a time-intensive effort. Numerous pieces of digital evidence need to be collected in order to triangulate and ascertain the veracity of just one given report. We therefore need tools that can accelerate the processing of a verification decision-tree. To be sure, information is the most perishable commodity in a crisis, for both journalists and humanitarian professionals. This means that after a certain period of time, it no longer matters whether a report has been verified or not, because the news cycle or the crisis has since moved on.

This is why I’m a fan of tools like Rapportive, which displays the social media profiles of your email contacts right inside Gmail. The point is for the decision-tree not only to serve as an instruction set on what types of evidence to collect, but also to power a platform that actually collects that evidence. There are two general strategies that could be employed to accelerate and scale the verification process. One is to split the tasks listed in the decision-tree into individual micro-tasks that can be distributed and completed independently using crowdsourcing. A second strategy is to develop automated ways to collect the evidence.
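To make the two strategies a little more concrete, here is a minimal sketch that represents a verification decision-tree as a flat list of checks, each optionally paired with an automated test. Automatable checks run immediately; the rest become independent micro-tasks that could be handed to a crowd. The `Check` and `Plan` structures, the check names and the report fields are all hypothetical illustrations, not the API of Rapportive or any other existing platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Check:
    """One node of a hypothetical verification decision-tree."""
    name: str
    question: str
    # If set, the check can be answered automatically from the report data.
    auto: Optional[Callable[[dict], bool]] = None


@dataclass
class Plan:
    evidence: dict = field(default_factory=dict)    # answers from automated checks
    microtasks: list = field(default_factory=list)  # questions for human volunteers


def plan_verification(report: dict, checks: list) -> Plan:
    """Run automatable checks now; turn the rest into independent micro-tasks."""
    plan = Plan()
    for check in checks:
        if check.auto is not None:
            plan.evidence[check.name] = check.auto(report)
        else:
            plan.microtasks.append(f"[{report['id']}] {check.question}")
    return plan


CHECKS = [
    Check("has_bio", "Does the account have a bio?",
          auto=lambda r: bool(r.get("bio"))),
    Check("has_media", "Does the report include a photo or video?",
          auto=lambda r: bool(r.get("media"))),
    Check("landmarks", "Do buildings or signs in the photo match the claimed location?"),
    Check("local_contact", "Can someone near the claimed location confirm the report?"),
]

# Example: one report with a bio and a photo attached.
report = {"id": "r1", "bio": "Local reporter", "media": ["photo.jpg"]}
plan = plan_verification(report, CHECKS)
```

Here `plan.evidence` holds the two automated answers, while the two questions that need human eyes come out as self-contained micro-tasks, each tagged with the report ID so volunteers can work on them in parallel.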

Of course, both strategies could also be combined. Indeed, some tasks are far better suited for automation while others can only be carried out by humans. In sum, the idea here is to save journalists and humanitarians time by considerably reducing how long it takes to verify user-generated content posted on social media. I am also particularly interested in gamification approaches to solving major challenges, like the protein-folding game Foldit. So if you know of any projects seeking to solve the verification challenge described above in novel ways, I’d be very grateful for your input in the comments section below. Thank you!

How People in Emergencies Use Communication to Survive

“Still Left in the Dark? How People in Emergencies Use Communication to Survive — And How Humanitarian Agencies Can Help” is an excellent report published by the BBC World Service Trust earlier this year. It is a follow-up to the BBC’s 2008 study “Left in the Dark: The Unmet Need for Information in Humanitarian Emergencies.” Both reports are absolute must-reads. I highlight the most important points from the 2012 publication below.

Are Humanitarians Being Left in the Dark?

The disruptive impact of new information and communication technologies (ICTs) is hardly a surprise. Back in 2007, researchers studying the use of social media during “forest fires in California concluded that ‘these emergent uses of social media are precursors of broader future changes to the institutional and organizational arrangements of disaster response.’” While the main danger in 2008 was that disaster-affected communities would continue to be left in the dark since humanitarian organizations were not prioritizing information delivery, in 2012, “it may now be the humanitarian agencies themselves […] who risk being left in the dark.” Why? “Growing access to new technologies make it more likely that those affected by disaster will be better placed to access information and communicate their own needs.” The question is: “are humanitarian agencies prepared to respond to, help and engage with those who are communicating with them and who demand better information?” Indeed, “one of the consequences of greater access to, and the spread of, communications technology is that communities now expect—and demand—interaction.”

Monitoring Rumors While Focusing on Interaction and Listening

The BBC Report invites humanitarian organizations to focus on meaningful interaction with disaster-affected communities, rather than simply on message delivery. “Where agencies do address the question of communication with affected communities, this still tends to be seen as a question of relaying information (often described as ‘messaging’) to an unspecified ‘audience’ through a channel selected as appropriate (usually local radio). It is to be delivered when the agency thinks that it has something to say, rather than in response to demand. In an environment in which […] interaction is increasingly expected, this approach is becoming more and more out of touch with community needs. It also represents a fundamental misunderstanding of the nature and potential of many technological tools particularly Twitter, which work on a real time many-to-many information model rather than a simple broadcast.”

Two-way communication with disaster-affected communities requires listening. Without listening, there can be no meaningful communication. “Listening benefits agencies, as well as those with whom they communicate. Any agency that does not monitor local media—including social media—for misinformation or rumors about their work or about important issues, such as cholera awareness risks, could be caught out by the speed at which information can move.” This is an incredibly important point. Alas, humanitarian organizations have yet to catch up with recent advances in social computing and big data analytics. This is one of the main reasons I joined the Qatar Computing Research Institute (QCRI): to spearhead the development of next-generation humanitarian technology solutions.

Combining SMS with Geofencing for Emergency Alerts

Meanwhile, in Haiti, “phone company Digicel responded to the 2010 cholera outbreak by developing methods that would send an SMS to anyone who travelled through an identified cholera hotspot, alerting them to the dangers and advising on basic precautions.” This is an excellent example of geofencing in action. That said, “while responders tend to see communication as a process either of delivering information (‘messaging’) or extracting it, disaster survivors seem to see the ability to communicate and the process of communication itself as every bit as important as the information delivered.”
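At its core, geofencing of this kind means testing whether a phone’s position falls inside a defined zone and, if so, triggering a message. The sketch below assumes circular hotspots (a centre plus a radius) and uses the haversine great-circle distance. The hotspot coordinates, radius, subscriber IDs and alert text are made-up placeholders, and this is not a description of how Digicel’s system actually worked; a mobile operator might well rely on cell-tower positions rather than GPS fixes.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical hotspots: (name, latitude, longitude, radius in km).
HOTSPOTS = [("hotspot-A", 19.0, -72.7, 5.0)]

ALERT = "Cholera risk in this area: drink treated water and wash hands with soap."


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def alerts_for(subscriber_id, lat, lon, already_alerted):
    """Return the SMS texts to send to a subscriber last seen at (lat, lon).

    `already_alerted` is a set of (subscriber, hotspot) pairs, so each
    subscriber is alerted at most once per hotspot.
    """
    messages = []
    for name, hlat, hlon, radius_km in HOTSPOTS:
        key = (subscriber_id, name)
        if key in already_alerted:
            continue
        if haversine_km(lat, lon, hlat, hlon) <= radius_km:
            already_alerted.add(key)
            messages.append(ALERT)
    return messages
```

Remembering who has already been alerted keeps the system from texting the same person every time their phone reports a new position inside the same hotspot.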

Communication & Community-Based Disaster Response Efforts

As the BBC Report notes, “there is also growing evidence that communities in emergencies are adept at leveraging communications technology to organize their own responses.” This is indeed true as these recent examples demonstrate:

“Communications technology is empowering first responders in new and extremely potent ways that are, at present, little understood by international humanitarians. While aid agencies hesitate, local communities are using communications technology to reshape the way they prepare for and respond to emergencies.” There is a definite payoff to those agencies that employ an “integrated approach to communicating and engaging with disaster affected communities […]” since they are “viewed more positively by beneficiaries than those that [do] not.” Indeed, “when disaster survivors are able to communicate with aid agencies their perceptions become more positive.”

Using New Technologies to Manage Local Feedback Mechanisms

So why don’t more agencies follow suit? Many are concerned that establishing feedback systems will prove impossible to manage, let alone sustain. They fear that “they would not be able to answer questions asked, that they [would] not have the skills or capacity to manage the anticipated volume of inputs and that they [would be] unequipped to deal with people who would (it is assumed) be both angry and critical.”

I wonder whether these aid agencies realize that many private sector companies have feedback systems that engage millions of customers every day, and that these companies are using social media and big data analytics to make this happen. Some are even crowdsourcing their customer service support. It is high time that the humanitarian community realized that the challenges they face aren’t that unique and that solutions have already been developed in other sectors.

There are only a handful of examples of positive deviance vis-à-vis the setting up of feedback systems in the humanitarian space. Oxfam found that simply combining the “automatic management of SMS systems” with “just one dedicated local staff member […] was enough to cope with demand.” When the Danish Refugee Council set up their own SMS complaints mechanism, they too expected to be overwhelmed with criticism. “To their surprise, more than half of the SMS’s they received via their feedback system […] have been positive, with people thanking the agency for their assistance […].” This appears to be a pattern since “many other agencies reported receiving fewer ‘difficult’ questions than anticipated.”
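The report does not say what “automatic management of SMS systems” involved, so the following is only a guess at the general idea, assuming a simple keyword approach: every incoming SMS gets an automatic acknowledgement, messages are sorted into rough categories, and only those that need a human answer land in the single staff member’s queue. The categories, keywords and reply text are hypothetical; a real deployment would need local-language keyword lists and considerably more care.

```python
from collections import deque

# Hypothetical keyword lists, checked in this order so that a complaint
# phrased politely ("thank you, but...") is not filed as thanks.
CATEGORIES = {
    "complaint": ["missing", "not received", "unfair", "complain"],
    "question": ["when", "where", "how", "?"],
    "thanks": ["thank", "merci", "grateful"],
}

AUTO_REPLY = "Thank you, your message has been received."


def categorize(text):
    """Return the first category with a keyword in the message (crude substring match)."""
    lowered = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def process_inbox(messages):
    """Auto-acknowledge every SMS; queue only those that need a human reply.

    `messages` is a list of (sender, text) pairs.
    """
    replies, staff_queue, counts = [], deque(), {}
    for sender, text in messages:
        category = categorize(text)
        counts[category] = counts.get(category, 0) + 1
        replies.append((sender, AUTO_REPLY))
        if category != "thanks":
            staff_queue.append((category, sender, text))
    return replies, staff_queue, counts
```

If, as the Danish Refugee Council found, more than half of incoming messages are thanks, a filter like this would keep most of the volume away from the staff member.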

Naturally, “a systematic and resourced approach for feedback” is needed either way. Interestingly, “many aid agencies are in fact now running de facto feedback and information line systems without realizing it. […] most staff who work directly with disaster survivors will be asked for contact details by those they interact with, and will give their own personal mobile numbers.” These ad hoc “systems” are hardly efficient, well-resourced or systematic, however.

User-Generated Content, Representativeness and Ecosystems

Obviously, user-generated content shared via social media may not be representative. “But, as costs fall and coverage increases, all the signs are that usage will increase rapidly in rural areas and among poorer people. […] As one Somali NGO staff member commented […], ‘they may not have had lunch — but they’ll have a mobile phone.’” Moreover, there is growing evidence that individuals turn to social media platforms for the first time as a result of crisis. “In Thailand, for example, the use of social media increased 20% when the 2010 floods began–with fairly equal increases found in metropolitan Bangkok and in rural provinces.”

While the vast majority of Haitians in Port-au-Prince are not on Twitter, “the city’s journalists overwhelmingly are and see it as an essential source of news and updates.” Since most Haitians listen to radio, “they are, in fact, the indirect beneficiaries of Twitter information systems.” Another interesting fact: “In Kenya, 27% of radio listeners tune in via their mobile phones.” This highlights the importance of an ecosystem approach when communicating with disaster-affected communities. On a related note, recent statistics reveal that individuals in developing countries spend about 17.5% of their income on ICTs compared to just 1.5% in developed countries.

Information Forensics: Five Case Studies on How to Verify Crowdsourced Information from Social Media

My 20+ page study on verifying crowdsourced information is now publicly available here as a PDF and here as an open Google Doc for comments. I very much welcome constructive feedback from iRevolution readers so I can improve the piece before it gets published in an edited book next year.


False information can cost lives. But no information can also cost lives, especially in a crisis zone. Indeed, information is perishable, so the potential value of information must be weighed against the urgency of the situation. Correct information that arrives too late is useless. Crowdsourced information can provide rapid situational awareness, especially when added to a live crisis map. But information in the social media space may not be reliable or immediately verifiable. This may explain why humanitarian (and news) organizations are often reluctant to leverage crowdsourced crisis maps. Many believe that verifying crowdsourced information is either too challenging or impossible. The purpose of this paper is to demonstrate that concrete strategies do exist for the verification of geo-referenced crowdsourced social media information. The study first provides a brief introduction to crisis mapping and argues that crowdsourcing is simply non-probability sampling. Next, five case studies comprising various efforts to verify social media are analyzed to demonstrate how different verification strategies work. The five case studies are: Andy Carvin and Twitter; Kyrgyzstan and Skype; BBC’s User-Generated Content Hub; the Standby Volunteer Task Force (SBTF); and U-Shahid in Egypt. The final section concludes the study with specific recommendations.

Update: See also this link and my other posts on Information Forensics.

How to Verify Social Media Content: Some Tips and Tricks on Information Forensics

Update: I have authored a 20+ page paper on verifying social media content based on 5 case studies. Please see this blog post for a copy.

I get this question all the time: “How do you verify social media data?” This question drives many of the conversations on crowdsourcing and crisis mapping these days. It’s high time that we start compiling our tips and tricks into an online how-to guide so that we don’t have to start from square one every time the question comes up. We need to build and accumulate our shared knowledge in information forensics. So here is the Google Doc version of this blog post; please feel free to add your best practices and ask others to contribute. Feel free to also add links to other studies on verifying social media content.

If every source we monitored in the social media space were known and trusted, then the need for verification would not be as pronounced. In other words, it is the plethora and virtual anonymity of sources that makes us skeptical of the content they deliver. Verifying social media data thus involves two steps: authenticating the source as reliable and triangulating the content as valid. If we can authenticate the source and find it trustworthy, this may be sufficient to trust the content and mark it as verified, depending on context. If the source is difficult to authenticate, then we need to triangulate the content itself.
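The two-step logic can be written down as a small decision function. This is only a sketch: it assumes source authentication has already produced a yes/no answer and that triangulation has produced a count of independent corroborating reports. The threshold of three reports and the `NEEDS_REVIEW` middle ground are arbitrary placeholders, and in practice “depending on context” means a human still makes the final call.

```python
from enum import Enum


class Status(Enum):
    VERIFIED = "verified"
    NEEDS_REVIEW = "needs review"
    UNVERIFIED = "unverified"


def verification_status(source_trusted, independent_reports, min_reports=3):
    """Two-step check: authenticate the source first, else triangulate the content.

    `min_reports` is a hypothetical threshold, not an established standard.
    """
    # Step 1: a trusted source may be enough on its own.
    if source_trusted:
        return Status.VERIFIED
    # Step 2: fall back on independent corroboration of the content.
    if independent_reports >= min_reports:
        return Status.VERIFIED
    if independent_reports > 0:
        return Status.NEEDS_REVIEW
    return Status.UNVERIFIED
```

Keeping an explicit middle state makes it clear which reports are partially corroborated and worth a volunteer’s time, rather than forcing every report into verified or unverified.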

Let’s unpack these two processes, authentication and triangulation, and apply them to Twitter, since the most pressing challenges regarding social media verification have to do with eyewitness, user-generated content. The first step is to try to determine whether the source is trustworthy. Here are some tips on how to do this:

  • Bio on Twitter: Does the source provide a name, picture, bio and any links to their own blog, identity, professional occupation, etc., on their page? If there’s a name, does searching for this name on Google provide any further clues to the person’s identity? Perhaps a Facebook page, a professional email address, a LinkedIn profile?
  • Number of Tweets: Is this a new Twitter handle with only a few tweets? If so, this makes authentication more difficult. Arasmus notes that “the more recent, the less reliable and the more likely it is to be an account intended to spread disinformation.” In general, the longer the Twitter handle has been around and the more tweets linked to this handle, the better. This provides a digital trace, a track record that can be scrutinized for signs of political bias, misinformation, etc. Arasmus specifies: “What are the tweets like? Does the person qualify his/her reports? Are they intelligible? Is the person given to exaggeration and inconsistencies?”
  • Number of followers: Does the source have a large following? If there are only a few followers, are any of them known and credible sources? Also, how many lists has this Twitter handle been added to?
  • Number following: How many Twitter users does the Twitter handle follow? Are these known and credible sources?
  • Retweets: What type of content does the Twitter handle retweet? Does the Twitter handle in question get retweeted by known and credible sources?
  • Location: Can the source’s geographic location be ascertained? If so, are they near the unfolding events? One way to try to find out by proxy is to examine during which periods of the day/night the source tweets the most. This may provide an indication as to the person’s time zone.
  • Timing: Does the source appear to be tweeting in near real-time? Or are there considerable delays? Does anything appear unusual about the timing of the person’s tweets?
  • Social authentication: If you’re still unsure about the source’s reliability, use your own social network (Twitter, Facebook, LinkedIn) to find out if anyone in your network knows about the source’s reliability.
  • Media authentication: Is the source quoted by trusted media outlets, whether in the mainstream or social media space?
  • Engage the source: Tweet them back and ask them for further information. NPR’s Andy Carvin has employed this technique particularly well. For example, you can tweet back and ask for the source of the report and for any available pictures, videos, etc. Place the burden of proof on the source.
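Several of these source checks could be collected automatically and rolled into a rough score, in the spirit of the decision-tree approach. The sketch below assumes the account metadata has already been fetched into a plain dictionary (it does not call the Twitter API), and the field names, the `TRUSTED` handles, every threshold and the equal weighting are hypothetical placeholders rather than established rules. It also includes the time-zone proxy from the Location tip, assuming the account’s quietest six-hour stretch is night-time centred on 3 a.m. local time.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical handles of sources that have already been verified.
TRUSTED = {"known_reporter", "trusted_outlet"}


def source_score(account, now):
    """Count simple, hypothetical reliability signals (0 to 6)."""
    score = 0
    score += int(bool(account.get("name") and account.get("bio")))       # Bio
    score += int((now - account["created"]).days > 180)                   # Account age
    score += int(account.get("tweet_count", 0) >= 500)                    # Number of tweets
    score += int(account.get("listed_count", 0) >= 10)                    # Lists
    score += int(bool(set(account.get("followed_by", [])) & TRUSTED))     # Followers
    score += int(bool(set(account.get("retweeted_by", [])) & TRUSTED))    # Retweets
    return score


def estimate_utc_offset(tweet_times, quiet_midpoint_local=3, window=6):
    """Guess a UTC offset in hours from when an account is least active.

    Expects timezone-aware datetimes. Finds the quietest `window`-hour
    stretch (in UTC) and assumes it is centred on `quiet_midpoint_local`
    o'clock local time. A crude heuristic, easily fooled.
    """
    hours = Counter(t.astimezone(timezone.utc).hour for t in tweet_times)

    def activity(start):
        return sum(hours[(start + i) % 24] for i in range(window))

    quiet_start = min(range(24), key=activity)
    quiet_mid_utc = (quiet_start + window // 2) % 24
    offset = (quiet_midpoint_local - quiet_mid_utc) % 24
    return offset - 24 if offset >= 12 else offset
```

A score like this is only a way of ordering the queue for human review; a high score does not authenticate anyone, and a determined impersonator can game every one of these signals.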

These are some of the tips that come to mind for source authentication. For more thoughts on this process, see my previous blog post “Passing the I’m-Not-Gaddafi-Test: Authenticating Identity During Crisis Mapping Operations.” If you have some tips of your own not listed here, please do add them to the Google Doc; they don’t need to be limited to Twitter either.

Now, let’s say that we’ve gone through the list above and found the evidence inconclusive. We then move on to triangulating the content. Here are some tips on how to do this:

  • Triangulation: Are other sources on Twitter or elsewhere reporting on the event you are investigating? As Arasmus notes, “remain skeptical about the reports that you receive. Look for multiple reports from different unconnected sources.” The more independent witnesses you can get information from, the better, and the less critical the need for identity authentication.
  • Origins: If the user reporting an event is not the original source, can the original source be identified and authenticated? In particular, if the original source is found, does the time/date of the original report make sense given the situation?
  • Social authentication: Ask members of your own social network whether the tweet you are investigating is being reported by other sources. Ask them how unusual the event reporting is to get a sense of how likely it is to have happened in the first place. Andy Carvin’s followers, for example, “help him translate, triangulate, and track down key information. They enable remarkable acts of crowdsourced verification [...] but he must always tell himself to check and challenge what he is told.”
  • Language: Andy Carvin notes that tweets that sound too official, using language like “breaking news”, “urgent”, “confirmed”, etc., need to be scrutinized. “When he sees these terms used, Carvin often replies and asks for additional details, for pictures and video. Or he will quote the tweet and add a simple one word question to the front of the message: Source?” The BBC’s UGC (user-generated content) Hub in London also checks whether the vocabulary, slang and accents are correct for the location that a source might claim to be reporting from.
  • Pictures: If the Twitter handle shares photographic “evidence”, does the photo provide any clues about the location where it was taken based on buildings, signs, cars, etc., in the background? The BBC’s UGC Hub checks weaponry against what is known to be used in the given country and also looks at shadows to determine the possible time of day that a picture was taken. In addition, they examine weather reports to “confirm that the conditions shown fit with the claimed date and time.” These same tips can be applied to tweets that share video footage.
  • Follow up: If you have contacts in the geographic area of interest, then you could ask them to follow up directly/in person to confirm the validity of the report. Obviously this is not always possible, particularly in conflict zones. Still, there is increasing anecdotal evidence that this strategy is being used by various media organizations and human rights groups. One particularly striking example comes from Kyrgyzstan, where a Skype group with hundreds of users across the country was able to disprove and counter rumors at a breathtaking pace. For more details, see this blog post and my post on “How to Use Technology to Counter Rumors During Crises: Anecdotes from Kyrgyzstan.”
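Two of these content checks lend themselves to simple automation: flagging tweets that use the “official” vocabulary Carvin treats with suspicion, and counting how many unconnected sources report the same event while identifying the earliest report. The sketch assumes reports have already been grouped by event and that each carries its original author (for a retweet, the account being retweeted). The phrase list comes from the examples above and is not exhaustive, and “unconnected” is approximated crudely as “distinct original authors”; two friends tweeting the same rumor would still count twice.

```python
import re
from datetime import datetime

RED_FLAG_PHRASES = ("breaking news", "urgent", "confirmed")


def red_flags(text):
    """Return the official-sounding phrases that appear as whole words in a tweet."""
    lowered = text.lower()
    # Word boundaries avoid false hits such as "urgent" inside "insurgent".
    return [p for p in RED_FLAG_PHRASES
            if re.search(r"\b" + re.escape(p) + r"\b", lowered)]


def triangulate(reports):
    """Summarize a list of reports about one event.

    Each report is a dict with 'author', 'original_author' (same as
    author unless it is a retweet), 'time' and 'text'. Retweets of one
    original author count once: they are not independent witnesses.
    """
    independent = {r["original_author"] for r in reports}
    earliest = min(reports, key=lambda r: r["time"])
    return {
        "independent_sources": len(independent),
        "earliest_author": earliest["original_author"],
        "earliest_time": earliest["time"],
        # Candidates for a "Source?" reply.
        "ask_for_source": [r["author"] for r in reports if red_flags(r["text"])],
    }


reports = [
    {"author": "a", "original_author": "a",
     "time": datetime(2012, 3, 1, 10, 5), "text": "Shelling near the port"},
    {"author": "b", "original_author": "a",
     "time": datetime(2012, 3, 1, 10, 9), "text": "RT Shelling near the port"},
    {"author": "c", "original_author": "c",
     "time": datetime(2012, 3, 1, 10, 1), "text": "BREAKING NEWS: port attacked"},
]
summary = triangulate(reports)
```

Here the three tweets collapse to two independent sources, and the earliest report is also the one using official-sounding language, which is exactly the kind of tweet worth replying to with a request for its source.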

These are just a handful of tips and tricks that come to mind. The number of bullet points above clearly shows that we are not completely powerless when verifying social media data. There are several strategies available. The main challenge, as the BBC points out, is that this type of information forensics “can take anything from seconds [...] to hours, as we hunt for clues and confirmation.” See, for example, my earlier post on “The Crowdsourcing Detective: Crisis, Deception and Intrigue in the Twittersphere,” which highlights some challenges but also new opportunities.

One of Storyful’s comparative strengths when it comes to real-time news curation is the growing list of authenticated users it follows. This represents more of a bounded (but certainly not static) approach. As noted in my previous blog post on “Seeking the Trustworthy Tweet,” following a bounded model presents some obvious advantages. This explains why the BBC recommends “maintaining lists of previously verified material [and sources] to act as a reference for colleagues covering the stories.” This strategy is also employed by the Verification Team of the Standby Volunteer Task Force (SBTF).

In sum, I still stand by my earlier blog post entitled “Wag the Dog: How Falsifying Crowdsourced Data can be a Pain.” I also continue to stand by my opinion that some data, even if not immediately verifiable, is better than no data. Also, it’s important to recognize that we have on some occasions seen social media prove to be self-correcting, as I blogged about here. Finally, we know that information is often perishable in times of crisis. By this I mean that crisis data often has a “use-by date” after which it no longer matters whether said information is true or not. So speed is often vital. This is why semi-automated platforms like SwiftRiver that aim to filter and triangulate social media content can be helpful.
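The “use-by date” idea can be made concrete in a verification queue. The sketch below is not how SwiftRiver works; it simply assumes that each type of report has a hypothetical shelf life, drops reports whose use-by date has passed, and orders the rest so that the reports closest to expiry get checked first.

```python
from datetime import datetime, timedelta

# Hypothetical shelf lives per report type; real values would depend on context.
SHELF_LIFE = {
    "trapped_person": timedelta(hours=6),
    "road_blocked": timedelta(hours=24),
    "shelter_open": timedelta(days=3),
}


def triage(reports, now):
    """Drop reports past their use-by date; return the rest, most urgent first.

    Each report is a dict with 'id', 'kind' (a key of SHELF_LIFE) and 'time'.
    """
    live = []
    for report in reports:
        use_by = report["time"] + SHELF_LIFE[report["kind"]]
        if use_by > now:
            live.append((use_by - now, report["id"]))
    # Least time remaining comes first.
    return [report_id for _, report_id in sorted(live)]


now = datetime(2012, 1, 1, 12, 0)
queue = triage([
    {"id": "r1", "kind": "trapped_person", "time": datetime(2012, 1, 1, 5, 0)},
    {"id": "r2", "kind": "road_blocked", "time": datetime(2012, 1, 1, 10, 0)},
    {"id": "r3", "kind": "trapped_person", "time": datetime(2012, 1, 1, 11, 0)},
], now)
```

In this example the first report has already expired and is dropped, while the more recent report of a trapped person moves ahead of the blocked road because it has less time left before it stops mattering.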