In their study, “Credibility Ranking of Tweets during High Impact Events,” authors Aditi Gupta and Ponnurangam Kumaraguru “analyzed the credibility of information in [tweets] corresponding to fourteen high impact news events of 2011 around the globe.” According to their analysis, “30% of total tweets about an event contained situational information about the event while 14% was spam.” In addition, only about 17% of total tweets contained situational awareness information that was also credible.
The study analyzed over 35 million tweets posted by ~8 million users, collected from the topics trending on Twitter at the time. From this data, the authors identified 14 major events reflected in the tweets, including the UK riots, the Libya crisis, the Virginia earthquake and Hurricane Irene.
“Using regression analysis, we identified the important content and sourced based features, which can predict the credibility of information in a tweet. Prominent content based features were number of unique characters, swear words, pronouns, and emoticons in a tweet, and user based features like the number of followers and length of username. We adopted a supervised machine learning and relevance feedback approach using the above features, to rank tweets according to their credibility score. The performance of our ranking algorithm significantly enhanced when we applied re-ranking strategy. Results show that extraction of credible information from Twitter can be automated with high confidence.”
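To make the feature list in the abstract more concrete, here is a minimal Python sketch of how such content and user features might be extracted and combined into a score for ranking. It is not the authors' implementation: the `Tweet` class, the tiny word lists, the emoticon pattern, and especially the `WEIGHTS` values are all assumptions made up for illustration. The study learned its ranking from labeled data with supervised learning and relevance feedback, which this sketch replaces with a hand-set linear score.

```python
import re
from dataclasses import dataclass

# Tiny illustrative lexicons (assumptions); a real system would use much larger lists.
SWEAR_WORDS = {"damn", "hell", "crap"}
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your", "he", "she", "they", "them"}
# Matches a few common Western emoticons such as :) ;-) :( =D :P
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")


@dataclass
class Tweet:
    """Hypothetical container for the fields the features need."""
    text: str
    username: str
    followers: int


def extract_features(tweet):
    """Compute the content and user features named in the abstract."""
    words = re.findall(r"[a-z']+", tweet.text.lower())
    return {
        "unique_chars": len(set(tweet.text)),
        "swear_words": sum(w in SWEAR_WORDS for w in words),
        "pronouns": sum(w in PRONOUNS for w in words),
        "emoticons": len(EMOTICON_RE.findall(tweet.text)),
        "followers": tweet.followers,
        "username_length": len(tweet.username),
    }


# Hypothetical weights: signs and magnitudes are arbitrary placeholders,
# NOT the coefficients fitted in the study.
WEIGHTS = {
    "unique_chars": 0.05,
    "swear_words": -1.0,
    "pronouns": -0.3,
    "emoticons": -0.5,
    "followers": 0.0001,
    "username_length": -0.02,
}


def credibility_score(tweet):
    """Linear combination of features; stands in for a learned ranking model."""
    features = extract_features(tweet)
    return sum(WEIGHTS[name] * value for name, value in features.items())


def rank_tweets(tweets):
    """Return tweets ordered from highest to lowest score."""
    return sorted(tweets, key=credibility_score, reverse=True)
```

Calling `rank_tweets` on a list of `Tweet` objects returns them sorted by this illustrative score. In a setup closer to the one described in the abstract, the weights would be learned from tweets that human annotators had labeled for credibility, and the initial ranking would then be refined with a re-ranking step.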