Tag Archives: SMS

Automatically Classifying Text Messages (SMS) for Disaster Response

Humanitarian organizations like the UN and Red Cross often face a deluge of social media data when disasters strike areas with a large digital footprint. This explains why my team and I have been working on AIDR (Artificial Intelligence for Disaster Response), a free and open source platform to automatically classify tweets in real-time. Given that the vast majority of the world’s population does not tweet, we’ve teamed up with UNICEF’s Innovation Team to extend our AIDR platform so users can also automatically classify streaming SMS.


After the Haiti Earthquake in 2010, the main mobile network operator there (Digicel) offered to send an SMS to each of their 1.4 million subscribers (at the time) to accelerate our disaster needs assessment efforts. We politely declined since we didn’t have any automated (or even semi-automated) way of analyzing incoming text messages. With AIDR, however, we should (theoretically) be able to classify some 1.8 million SMS’s (and tweets) per hour. Enabling humanitarian organizations to make sense of “Big Data” generated by affected communities is obviously key for two-way communication with said communities during disasters, hence our work at QCRI on “Computing for Good”.

AIDR/SMS applications are certainly not limited to disaster response. In fact, we plan to pilot the AIDR/SMS platform for a public health project with our UNICEF partners in Zambia next month and with other partners in early 2015. While still experimental, I hope the platform will eventually be robust enough for use in response to major disasters, allowing humanitarian organizations to poll affected communities and to make sense of resulting needs in near real-time. Millions of text messages could be automatically classified according to the Cluster System, for example, and the results communicated back to local communities via community radio stations, as described here.

These are still very early days, of course, but I’m typically an eternal optimist, so I hope that our research and pilots do show promising results. Either way, we’ll be sure to share the full outcome of said pilots publicly so that others can benefit from our work and findings. In the meantime, if your organization is interested in piloting and learning with us, then feel free to get in touch.


Combining Radio, SMS and Advanced Computing for Disaster Response

I’m headed to the Philippines this week to collaborate with the UN Office for the Coordination of Humanitarian Affairs (OCHA) on humanitarian crowdsourcing and technology projects. I’ll be based in the OCHA Offices in Manila, working directly with colleagues Andrej Verity and Luis Hernando to support their efforts in response to Typhoon Yolanda. One project I’m exploring in this respect is a novel radio-SMS-computing initiative that my colleague Anahi Ayala (Internews) and I began drafting during ICCM 2013 in Nairobi last week. I’m sharing the approach here to solicit feedback before I land in Manila.


The “Radio + SMS + Computing” project is firmly grounded in GSMA’s official Code of Conduct for the use of SMS in Disaster Response. I have also drawn on the Bellagio Big Data Principles when writing up the ins and outs of this initiative with Anahi. The project is first and foremost a radio-based initiative that seeks to answer the information needs of disaster-affected communities.

The project: Local radio stations in the Philippines would create and broadcast radio programs inviting local communities to serve as “community journalists” to describe how the Typhoon has impacted their communities. The radio stations would provide a free SMS short-code and invite said communities to text in their observations. Each radio station would include in their broadcast a unique 2-letter identifier and would ask those texting in to start their SMS with that identifier. They would also emphasize that text messages should not include any Personal Identifying Information (PII) and no location information either. Those messages that do include PII would be deleted.

Text messages sent to the SMS short code would be automatically triaged by radio station (using the 2-letter identifier) and forwarded to the respective radio stations via SMS. (At this point, few local radio stations have web access in the disaster-affected areas). These radio stations would be funded to create radio programs based on the SMS’s received. These programs would conclude by asking local communities to text in their information needs—again using the unique radio identifier as a prefix in the text messages. Radio stations would create follow-up programs to address the information needs texted in by local communities (“news you can use”). This could be replicated on a weekly basis and extended to the post-disaster reconstruction phase.
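The triage step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the station codes, the phone numbers and the `route_sms` helper are all hypothetical and not part of any deployed system. Note that a naive prefix match like this one would also catch ordinary messages that happen to start with the same two letters, so a real deployment would likely require a separator after the identifier.

```python
# Hypothetical mapping from 2-letter station identifiers to station phone numbers.
STATIONS = {
    "TB": "+630000000001",
    "OR": "+630000000002",
}


def route_sms(text, stations=STATIONS):
    """Return (station_number, message_body), or (None, text) if the prefix is unknown."""
    stripped = text.strip()
    prefix = stripped[:2].upper()
    if len(stripped) >= 2 and prefix in stations:
        # Drop the identifier and any separator before forwarding.
        body = stripped[2:].lstrip(" :,-")
        return stations[prefix], body
    # Unknown identifier: leave the message untouched for manual review.
    return None, text
```

Messages that come back with `None` as the station could be held in a queue for a volunteer to assign by hand.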


In parallel, the text messages documenting the impact of the Typhoon at the community level would be categorized by Cluster—such as shelter, health, education, etc. Each classified SMS would then be forwarded to the appropriate Cluster Leads. This is where advanced computing comes in: the application of microtasking and machine learning. Trusted Filipino volunteers would be invited to tag each SMS by Cluster-category (and also translate relevant text messages into English). Once enough text messages have been tagged per category, the use of machine learning classifiers would enable the automatic classification of incoming SMS’s. As explained above, these classified SMS’s would then be automatically forwarded to a designated point of contact at each Cluster Agency.
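To make the tag-then-train idea concrete, here is a minimal sketch in plain Python. It assumes a multinomial naive Bayes classifier with add-one smoothing, which is my choice for illustration only; the project description does not say which classifier would be used. The threshold, the function names and the sample messages are likewise made up.

```python
import math
from collections import Counter, defaultdict

MIN_TAGGED_PER_CLUSTER = 2  # hypothetical threshold; real use would need far more


def tokenize(text):
    return text.lower().split()


def ready_to_train(tagged, minimum=MIN_TAGGED_PER_CLUSTER):
    """True once every cluster seen so far has at least `minimum` tagged messages."""
    counts = Counter(label for _, label in tagged)
    return bool(counts) and all(n >= minimum for n in counts.values())


def train(tagged):
    """Fit a multinomial naive Bayes model on (text, cluster) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in tagged:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the most probable cluster for `text` using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up messages tagged by (hypothetical) volunteers.
tagged = [
    ("our roof was blown away", "shelter"),
    ("houses destroyed we need tents", "shelter"),
    ("many children have fever", "health"),
    ("clinic has no medicine", "health"),
]
model = train(tagged) if ready_to_train(tagged) else None
```

Once `model` is available, each incoming SMS can be passed to `classify` and forwarded to the point of contact for the returned cluster.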

This process would be repeated for SMS’s documenting the information needs of local communities. In other words, information needs would be classified by Cluster category and forwarded to Cluster Leads. The latter would share their responses to stated information needs with the radio stations who in turn would complement their broadcasts with the information provided by the humanitarian community, thus closing the feedback loop.

The radio-SMS project would be strictly opt-in. Radio programs would clearly state that the data sent in via SMS would be fully owned by local communities who could call in or text in at any time to have their SMS deleted. Phone numbers would only be shared with humanitarian organizations if the individuals texting to radio stations consented (via SMS) to their numbers being shared. Inviting communities to act as “citizen journalists” rather than asking them to report their needs may help manage expectations. Radio stations can further manage these expectations during their programs by taking questions from listeners calling in. In addition, the project seeks to limit the number of SMS’s that communities have to send. The greater the amount of information solicited from disaster-affected communities, the more challenging managing expectations may be. The project also makes a point of focusing on local information needs as the primary entry point. Finally, the data collection limits the geographical resolution to the village level for the purposes of data privacy and protection.


It remains to be seen whether this project gets funded, but I’d welcome any feedback iRevolution readers may have in any event since this approach could also be used in future disasters. In the meantime, my QCRI colleagues and I are looking to modify AIDR to automatically classify SMS’s (in addition to tweets). My UNICEF colleagues have already told me that they need to automatically classify millions of text messages for their U-Report project, so I believe that many other humanitarian and development organizations will benefit from a free and open source platform for automatic SMS classification. At the technical level, this means adding “batch-processing” to AIDR’s current “streaming” feature. We hope to have an update on this in the coming weeks. Note that a batch-processing feature will also allow users to upload their own datasets of tweets for automatic classification.


Making All Voices Count Using SMS and Advanced Computing

Local communities in Uganda send UNICEF some 10,000 text messages (SMS) every week. These messages reflect the voices of Ugandan youths who use UNICEF’s U-report SMS platform to share their views on a range of social issues. Some messages are responses to polls created by UNICEF while others are unsolicited reports of problems that youths witness in their communities. About 40% of text messages received by UNICEF require an SMS reply providing advice or an answer to a question while 7% of messages require immediate action. Over 220,000 young people in Uganda have enrolled in U-report, with 200 to 1,000 new users joining on a daily basis. UNICEF has neither the time nor the staff to manually analyze this high volume and velocity of incoming text messages. This is where advanced computing comes in.


IBM recently partnered with UNICEF Uganda to develop an automated system to classify incoming text messages. (If this sounds familiar to iRevolution readers it is because my team and I at QCRI are developing a similar platform called Artificial Intelligence for Disaster Response, or AIDR. While our system is first and foremost geared towards classifying tweets, it can also be used to filter large volumes of SMS). The automated platform classifies incoming text messages into one (or more) of the following categories: water, health & nutrition, orphans & vulnerable children, violence against children, education, employment, social policy, emergency, u-report, energy, family & relationships, irrelevant and poll.

IBM analysis

IBM created machine learning classifiers that are 40% more accurate than a keyword-based approach to automated classification. The predictive quality of the individual classifiers ranged from a low of 69.8% for family & relationships to a high of 98.4% for water-related issues. See the full list of results in the table above. Note that the IBM platform is limited to English-language text messages but the team is looking to provide multi-lingual support in the future.

UNICEF is using this system to automatically route classified text messages to the appropriate departments. For example, UNICEF recently received a surge of text messages about nodding disease and responded by sending out a series of mass SMS’s to communities living in the affected region. These text messages provided information on how to recognize symptoms and ways to get treated. The feedback loop also includes government agencies and ministries. Indeed, all Members of Parliament and Chief Administrative Officers receive SMS updates based on the automated classification platform.

U-report is now being deployed in Zambia, South Sudan, Yemen, Democratic Republic of Congo, Zimbabwe and Burundi. I plan to get in touch with the team at IBM to learn more about these deployments and explore where we at QCRI may be able to help given our related work on AIDR. In the meantime, many thanks to my colleague Claudia Perlich for pointing me to this project. To learn more about IBM’s automated system, please see this paper (PDF).


Automatically Classifying Crowdsourced Election Reports

As part of QCRI’s Artificial Intelligence for Monitoring Elections (AIME) project, I liaised with Kaggle to work with a top-notch Data Scientist to carry out a proof of concept study. As I’ve blogged in the past, crowdsourced election monitoring projects are starting to generate “Big Data” which cannot be managed or analyzed manually in real-time. Using the crowdsourced election reporting data recently collected by Uchaguzi during Kenya’s elections, we therefore set out to assess whether one could use machine learning to automatically tag user-generated reports according to topic, such as election-violence. The purpose of this post is to share the preliminary results from this innovative study, which we believe is the first of its kind.

uchaguzi

The aim of this initial proof-of-concept study was to create a model to classify short messages (crowdsourced election reports) into several predetermined categories. The classification models were developed by applying a machine learning technique called gradient boosting on word features extracted from the text of the election reports along with their titles. Unigrams, bigrams and the number of words in the text and title were considered in the model development. The tf-idf weighting function was used following internal validation of the model.
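For readers unfamiliar with these features, the sketch below shows one common way to compute tf-idf weights over unigrams and bigrams in plain Python. It is not the code used in the study: the whitespace tokenizer, the exact weighting formula (term frequency times log(N / document frequency)) and the sample reports are my own assumptions, and the gradient boosting step itself is not shown.

```python
import math
from collections import Counter


def ngrams(text):
    """Return the unigrams and bigrams of a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    term_lists = [ngrams(doc) for doc in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append({
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up example reports, not taken from the Uchaguzi data.
reports = [
    "long queue at polling station",
    "fight broke out at polling station",
    "voting is calm",
]
vectors = tfidf(reports)
```

Terms that appear in many reports, such as “polling station” here, receive lower weights than terms that are specific to one report, which is what makes the weighting useful for telling categories apart.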

The results depicted above confirm that classifiers can be developed to automatically classify short election observation reports crowdsourced from the public. The classification was generated by 10-fold cross validation. Our classifier is able to correctly predict whether a report is related to violence with an accuracy of 91%, for example. We can also accurately predict  89% of reports that relate to “Voter Issues” such as registration issues and reports that indicate positive events, “Fine” (86%).
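As a rough illustration of what 10-fold cross validation involves, the sketch below holds out each tenth of the data in turn, trains on the rest and averages the accuracy on the held-out folds. The fold assignment (every k-th item, no shuffling), the helper names and the placeholder majority-label model are assumptions for the example; they are not the procedure or model used in the study.

```python
from collections import Counter


def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k folds of near-equal size (assumes n_items >= k)."""
    return [list(range(i, n_items, k)) for i in range(k)]


def cross_validate(texts, labels, train_fn, predict_fn, k=10):
    """Return mean accuracy over k folds; each fold is held out exactly once."""
    accuracies = []
    for fold in k_fold_indices(len(texts), k):
        held_out = set(fold)
        train_x = [t for i, t in enumerate(texts) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct = sum(predict_fn(model, texts[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Placeholder model: always predicts the most common training label.
def train_majority(texts, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, text):
    return model


# Made-up data for demonstration only.
texts = ["report %d" % i for i in range(20)]
labels = ["violence"] * 15 + ["fine"] * 5
accuracy = cross_validate(texts, labels, train_majority, predict_majority)
```

Any real classifier can be plugged in through `train_fn` and `predict_fn`; the majority-label baseline is useful mainly as a floor against which reported accuracy figures can be compared.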

The plan for this Summer and Fall is to replicate this work for other crowdsourced election datasets from Ghana, Liberia, Nigeria and Uganda. We hope the insights gained from this additional research will reveal which classifiers and/or “super classifiers” are portable across certain countries and election types. Our hypothesis, based on related crisis computing research, is that classifiers for certain types of events will be highly portable. However, we also hypothesize that the application of most classifiers across countries will result in lower accuracy scores. To this end, our Artificial Intelligence for Monitoring Elections platform will allow election monitoring organizations (end users) to create their own classifiers on the fly and thus meet their own information needs.


Big thanks to Nao for his excellent work on this predictive modeling project.

Artificial Intelligence for Monitoring Elections (AIME)

Citizen-based, crowdsourced election observation initiatives are on the rise. Leading election monitoring organizations are also looking to leverage citizen-based reporting to complement their own professional election monitoring efforts. Meanwhile, the information revolution continues apace, with the number of new mobile phone subscriptions up by over 1 billion in just the past 36 months alone. The volume of election-related reports generated by “the crowd” is thus expected to grow significantly in the coming years. But international, national and local election monitoring organizations are completely unprepared to deal with the rise of Big (Election) Data.


The purpose of this collaborative research project, AIME, is to develop a free and open source platform to automatically filter relevant election reports from the crowd. The platform will include pre-defined classifiers (e.g., security incidents, intimidation, vote-buying, ballot stuffing, etc.) for specific countries and will also allow end-users to create their own classifiers on the fly. The project, launched by QCRI and several key partners, will specifically focus on unstructured user-generated content from SMS and Twitter. AIME partners include a major international election monitoring organization and several academic research centers. The AIME platform will use the technology being developed for QCRI’s AIDR project: Artificial Intelligence for Disaster Response.


  • Acknowledgements: Fredrik Sjoberg kindly provided the Uchaguzi data which he scraped from the public website at the time.
  • Qualification: Professor Michael Best has rightly noted that these preliminary results are overstated given that the machine learning analysis was carried out on a corpus of pre-structured reports.

PeaceTXT Kenya: Since Wars Begin in Minds of Men


“Since wars begin in the minds of men, it is in the minds of men that the defenses of peace must be constructed.” – UNESCO Constitution, 1945

Today, in Kenya, PeaceTXT is building the defenses of peace out of text messages (SMS). As The New York Times explains, PeaceTXT is developing a “text messaging service that sends out blasts of pro-peace messages to specific areas when trouble is brewing.” Launched by PopTech in partnership with the Kenyan NGO Sisi ni Amani (We are Peace), the Kenyan implementation of PeaceTXT uses mobile advertising to market peace and change men’s behaviors.

Conflicts are often grounded in the stories and narratives that people tell themselves and in the emotions that these stories evoke. Narratives shape identity and the social construct of reality; we interpret our lives through stories. These have the power to transform or infect relationships and communities. As US-based PeaceTXT partner CureViolence (formerly CeaseFire) has clearly shown, violence propagates in much the same way as infectious diseases do. The good news is that we already know how to treat the latter: by blocking transmission and treating the infected. This is precisely the approach taken by CureViolence to successfully prevent violence on the streets of Chicago, Baghdad and elsewhere.

The challenge? CureViolence cannot be everywhere at the same time. But the “Crowd” is always there and where the crowd goes, mobile phones often follow. PeaceTXT leverages this new reality by threading a social narrative of peace using mobile messages. Empirical research in public health (and mobile advertising) clearly demonstrates that mobile messages & reminders can change behaviors. Given that conflicts are often grounded in the narratives that people tell themselves, we believe that mobile messaging may also influence conflict behavior and possibly prevent the widespread transmission of violent mindsets.

To test this hypothesis, PopTech partnered with the Kenyan NGO Sisi ni Amani (SNA-K) in 2011 to pilot and assess the use of mobile messaging for violence interruption and prevention. SNA-K had already been using mobile messaging for almost three years to promote peace, raise awareness about civic rights and encourage recourse to legal instruments for dispute resolution. During the twelve months leading up to today’s Presidential Elections, SNA-K has worked with PopTech and PeaceTXT partners (Medic Mobile, QCRI, Ushahidi & CureViolence) to identify the causes of peace in some of the country’s most conflict-prone communities. Since wars begin in the minds of men, SNA-K has held dozens of focus groups in many local communities to better understand the kinds of messaging that might make would-be perpetrators think twice before committing violence. Focus group participants also discussed the kinds of messaging needed to counter rumors. Working with Ogilvy, a global public relations agency with expertise in social marketing, SNA-K subsequently codified the hundreds of messages developed by the local communities to produce a set of guidelines for SNA-K staff to follow. These guidelines describe what types of messages to send to whom, where and when depending on the kinds of tensions being reported.

In addition to organizing these important focus groups, SNA-K literally went door-to-door in Kenya’s most conflict-prone communities to talk with residents about PeaceTXT and invite them to subscribe to SNA-K’s free SMS service. Today, SNA-K boasts over 60,000 SMS subscribers across the country. Thanks to Safaricom, the region’s largest mobile operator, SNA-K will be able to send out 50 million text messages for free, which will significantly boost the NGO’s mobile reach during today’s elections. And thanks to SNA-K’s customized mobile messaging platform built by the Praekelt Foundation, the Kenyan NGO can target specific SMS’s to individual subscribers based on their location, gender and demographics. In sum, as CNN explains, “the intervention combines targeted SMS with intensive on-the-ground work by existing peace builders and community leaders to target potential flashpoints of violence.”

The partnership with PopTech enabled SNA-K to scale thanks to new funding and strategic partnerships. PeaceTXT and Sisi ni Amani have already had a positive impact in the lead-up to today’s important elections. For example, a volatile situation in Dandora recently led to the stabbing of several individuals, which could have resulted in a serious escalation of violence. So SNA-K sent the following SMS:

[Screenshot of the SMS sent by SNA-K]

“Tu dumisha amani!” means “Let’s keep the peace!” SNA-K’s local coordinator in Dandora spoke with a number of emotionally distraught and (initially) very angry individuals in the area who said they had been ready to mobilize and take revenge. But, as they later explained, the SMS sent out by SNA-K made them think twice. They discussed the situation and decided that more violence wouldn’t bring their friend back and would only bring more violence. They chose to resolve the volatile situation through mediation instead.

In Sagamian, recent tensions over land issues resulted in an outbreak of violence. So SNA-K sent the following message:

[Screenshot of the SMS sent by SNA-K]

Those involved in the fighting subsequently left the area, telling SNA-K that they had decided not to fight after receiving the SMS. What’s more, they even requested that additional messages be sent. Sisi ni Amani has collected dozens of such testimonials, which suggest that PeaceTXT is indeed having an impact. Historian Geoffrey Blainey once wrote that “for every thousand pages on the causes of war, there is less than one page directly on the causes of peace.” Today, the PeaceTXT Kenya & SNA-K partnership is making sure that for every one SMS that may incite violence, a thousand messages of peace, calm and solidarity will follow to change the minds of men. Tudumishe amani!


Cross-posted on PopTech blog.

Using CrowdFlower to Microtask Disaster Response

Cross-posted from CrowdFlower blog

A devastating earthquake struck Port-au-Prince on January 12, 2010. Two weeks later, on January 27th, CrowdFlower was used to translate text messages from Haitian Creole to English. Tens of thousands of messages were sent by affected Haitians over the course of several months. All of these were heroically translated by hundreds of dedicated Creole-speaking volunteers based in dozens of countries across the globe. While Ushahidi took the lead by developing the initial translation platform used just days after the earthquake, the translation efforts were eventually rerouted to CrowdFlower. Why? Three simple reasons:

  1. CrowdFlower is one of the leading and most robust micro-tasking platforms available;
  2. CrowdFlower’s leadership is highly committed to supporting digital humanitarian response efforts;
  3. Haitians in Haiti could now be paid for their translation work.

While the CrowdFlower project was launched 15 days after the earthquake, i.e., following the completion of search and rescue operations, every single digital humanitarian effort in Haiti was reactive. The key takeaway here was the proof of concept, namely that large-scale micro-tasking could play an important role in humanitarian information management. This was confirmed months later when devastating floods inundated much of Pakistan. CrowdFlower was once again used to translate incoming messages from the disaster-affected population. While still reactive, this second use of CrowdFlower demonstrated replicability.

The most recent and perhaps most powerful use of CrowdFlower for disaster response occurred right after Typhoon Pablo devastated the Philippines in early December 2012. The UN Office for the Coordination of Humanitarian Affairs (OCHA) activated the Digital Humanitarian Network (DHN) to rapidly deliver a detailed dataset of geo-tagged pictures and video footage (posted on Twitter) depicting the damage caused by the Typhoon. The UN needed this dataset within 12 hours, which required that 20,000 tweets be analyzed as quickly as possible. The Standby Volunteer Task Force (SBTF), a member of Digital Humanitarians, immediately used CrowdFlower to identify all tweets with links to pictures & video footage. SBTF volunteers subsequently analyzed those pictures and videos for damage and geographic information using other means.

This was the most rapid use of CrowdFlower following a disaster. In fact, this use of CrowdFlower was pioneering in many respects. This was the first time that a member of the Digital Humanitarian Network made use of CrowdFlower (and thus micro-tasking) for disaster response. It was also the first time that CrowdFlower’s existing workforce was used for disaster response. In addition, this was the first time that data processed by CrowdFlower contributed to an official crisis map produced by the UN for disaster response (see above).

These three use-cases, Haiti, Pakistan and the Philippines, clearly demonstrate the added value of micro-tasking (and hence CrowdFlower) for disaster response. If CrowdFlower had not been available in Haiti, the alternative would have been to pay a handful of professional translators. The total price could have come to some $10,000 for 50,000 text messages (at 0.20 cents per word). Thanks to CrowdFlower, Haitians in Haiti were given the chance to make some of that money by translating the text messages themselves. Income generation programs are absolutely critical to rapid recovery following major disasters. In Pakistan, the use of CrowdFlower enabled Pakistani students and the Diaspora to volunteer their time and thus accelerate the translation work for free. Following Typhoon Pablo, paid CrowdFlower workers from the Philippines, India and Australia categorized several thousand tweets in just a couple of hours while the volunteers from the Standby Volunteer Task Force geo-tagged the results. Had CrowdFlower not been available then, it is highly unlikely that the mission would have succeeded given the very short turn-around required by the UN.

While impressive, the above use-cases were also reactive. We need to be a lot more pro-active, which is why I’m excited to be collaborating with CrowdFlower colleagues to customize a standby platform for use by the Digital Humanitarian Network. Having a platform ready-to-go within minutes is key. And while digital volunteers will be able to use this standby platform, I strongly believe that paid CrowdFlower workers also have a key role to play in the digital humanitarian ecosystem. Indeed, CrowdFlower’s large, multinational and multi-lingual global workforce is simply unparalleled and has the distinct advantage of being very well versed in the CrowdFlower platform.

In sum, it is high time that the digital humanitarian space move from crowdsourcing to micro-tasking. It has been three years since the tragic earthquake in Haiti but we have yet to adopt micro-tasking more widely. CrowdFlower should thus play a key role in promoting and enabling this important shift. Their continued important leadership in digital humanitarian response should also serve as a model for other private sector companies in the US and across the globe.
