Evaluating Event Credibility on Twitter
Manish Gupta, Peixiang Zhao, Jiawei Han
SIAM Data Mining 2012, Anaheim, CA, 26th April 2012
Univ of Illinois at Urbana Champaign

Abstract
• Twitter acts as a sensor network and a news source, but rumors spread via Twitter cause considerable damage.
• Problem: automatically assess the credibility of events.
• Our solution:
– Credibility propagation on a network of users, tweets, and events.
– Enhanced event credibility via graph regularization over a network of events only.
• Using ~500 events extracted from two tweet feed datasets, we show that our methods are significantly more accurate (86%) than the feature-based classifier approach (72%).

Growing Twitter Influence
• Twitter was launched in 2006.
• In 2011, the average tweet rate was 1,620 tweets/sec.
• Whenever a new event takes place, Twitter reaches new traffic peaks (up to 8,868 tweets/sec).
• The volume of Twitter mentions on blogs, message boards, and forums has reached the same level as Facebook's.
• The user base is growing rapidly; users are quite active, and the user base spans multiple age groups and countries.

Importance of Twitter Analysis
• 4% of the content on Twitter relates to news (Pear Analytics report).
• The Library of Congress archives every tweet.
• Tweets can be summarized, and opinions can impact decisions.
• Popular events on Twitter keep users aware of what is happening all across the world.
• Twitter users act as a massive sensor force pushing real-time updates.

But is all the content trustworthy?

Twitter Hoaxes (Clearly incredible events)
• Fake celebrity death news.
• Fake events related to celebrities.
• Fake events related to particular places.
• Partly “spiced up” rumors.
• Old news that suddenly becomes popular.
• Identity theft.
• Erroneous claims made by politicians.
Does all the content look credible? (Seemingly incredible events)
• Informally conveyed news
– “thr is now a website where students can bet on grades. I love dis idea, I would bet on myself to get all C’s and make a killing!!!”
• (Lack/presence of) supportive evidence
– “Yahoo is up more than 6% in after-hours trading after ousting CEO Carol Bartz http://ow.ly/6nP6X”
• Absence of credibility-conveying words like “news”, “breaking”, “latest”, “report”.
• Absence of retweets.
• Lots of tweets written with first- or second-person pronouns.

Problem Definition
• Credibility of a Twitter event e: the degree to which a human can believe in the event by browsing a random sample of tweets related to it.
– i.e., the expected credibility of the tweets that belong to event e.
• Credibility of a Twitter user u: the expected credibility of the tweets he provides.
• Credibility of a tweet t: a function of the credibility of related events and users, and of the credibility of other tweets that make similar (supporting/opposing) assertions.
• Problem: given a set of events along with their related tweets and users, find which events in the set can be deemed credible.
• Hypothesis: both the content and the network provide many signals for the task.

Three Approaches
• Classifier-based approach.
• Basic Credibility Analyzer (BasicCA): exploits a network of users, tweets, and events.
• Event Optimization-based Credibility Analyzer (EventOptCA): exploits a network of events.

Classifier: User Features
• User has many friends and followers.
• User has linked his Twitter profile to his Facebook profile.
• User is a verified user.
• User registered on Twitter long ago.
• User has made many posts.
• A description, URL, profile image, and location are attached to the user’s profile.
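The classifier consumes features like those above, averaged across all entities involved in an event (as the later "not entity-aware" slide notes). A minimal sketch of that aggregation, assuming illustrative feature names (`verified`, `followers`, etc.) rather than the paper's exact feature set:

```python
import math
from statistics import mean

def user_features(user):
    """Per-user credibility signals (illustrative subset of the slide's list)."""
    return {
        "verified": 1.0 if user.get("verified") else 0.0,
        "has_profile_image": 1.0 if user.get("profile_image") else 0.0,
        "has_facebook_link": 1.0 if user.get("facebook_link") else 0.0,
        "log_followers": math.log1p(user.get("followers", 0)),  # dampen heavy tail
        "log_statuses": math.log1p(user.get("statuses", 0)),
    }

def event_features(users):
    """Not entity-aware: user features are simply averaged across
    all users who tweeted about the event."""
    per_user = [user_features(u) for u in users]
    return {k: mean(f[k] for f in per_user) for k in per_user[0]}

# Toy event with two users tweeting about it.
users = [
    {"verified": True, "followers": 1000, "statuses": 500, "profile_image": "a.png"},
    {"verified": False, "followers": 10, "statuses": 3},
]
feats = event_features(users)
print(round(feats["verified"], 2))           # fraction of verified users: 0.5
print(round(feats["has_profile_image"], 2))  # 0.5
```

The resulting per-event feature vectors would then be fed to any standard supervised learner (the paper's baseline [4] uses decision trees).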
Classifier: Tweet Features
• It is complete: a more complete tweet gives a more complete picture of the truth.
• A professionally written tweet with no slang words, question marks, exclamation marks, full-uppercase words, or smileys is more credible.
• Number of words with first-, second-, or third-person pronouns.
• Presence of supportive evidence in the form of external URLs.
• A tweet may be regarded as more credible if it is from the most frequent location related to the event.

Classifier: Event Features
• Number of tweets and retweets related to the event.
• Number of distinct URLs, domains, hashtags, user mentions, users, and locations related to the event.
• Number of hours for which the event has been popular.
• Percentage of tweets related to the event on the day the event reached its peak popularity.

Classifier-based Approach
• Previous work [4] suggested decision trees supported by user, tweet, event, and propagation features.
• The classifier approach (which focuses on content):
– Does not consider any network information, ignoring signals such as:
• With high probability, credible users provide credible tweets.
• Average credibility of tweets is higher for credible events than for non-credible events.
• Events sharing many common words and topics should have similar credibility scores.
– Is not entity-aware: features that belong to users and tweets are averaged across an event and used as event features.

Fact Finding on Twitter: BasicCA
• A network of users, tweets, and events (figure: three-layer graph of Users, Tweets, Events).

Inter-dependency Between Credibility of Entities
• With high probability, credible users provide credible tweets.
• With high probability, the average credibility of tweets related to a credible event is higher than that of tweets related to a non-credible event.
• Tweets can derive credibility from related users and events.
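These inter-dependencies suggest an iterative propagation over the user–tweet–event network. A minimal sketch of that intuition, assuming a toy network and simple mean-based updates (the paper's BasicCA additionally uses implication weights and SVM-based initialization, omitted here):

```python
from statistics import mean

# Toy user-tweet-event network: each tweet has one author and one event.
tweets = {
    "t1": {"user": "u1", "event": "e1"},
    "t2": {"user": "u1", "event": "e1"},
    "t3": {"user": "u2", "event": "e2"},
}
# Content-based initial tweet credibilities (e.g., from a classifier).
cred_t = {"t1": 0.9, "t2": 0.8, "t3": 0.2}

for _ in range(10):  # iterate toward a fixpoint
    # Users and events inherit the mean credibility of their tweets ...
    cred_u = {u: mean(c for t, c in cred_t.items() if tweets[t]["user"] == u)
              for u in {info["user"] for info in tweets.values()}}
    cred_e = {e: mean(c for t, c in cred_t.items() if tweets[t]["event"] == e)
              for e in {info["event"] for info in tweets.values()}}
    # ... and each tweet is pulled toward its author's and event's credibility.
    cred_t = {t: (cred_u[info["user"]] + cred_e[info["event"]]) / 2
              for t, info in tweets.items()}

print(cred_e["e1"] > cred_e["e2"])  # True: e1's tweets come from a stronger source
```

On this toy instance the updates converge after one pass; the real algorithm runs for a small fixed number of iterations, as the accuracy-vs-iterations slide later shows.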
Tweet Implications
• We would like to capture influences between tweets when computing the credibility of tweets.
• t1 = “Bull jumps into crowd in Spain: A bull jumped into the stands at a bullring in northern Spain injuring at least 30... http://bbc.in/bLC2qx”
• t2 = “Damn! Check out that bull that jumps the fence into the crowd. #awesome”
• Based on the overlapping event words (shown in bold on the slide): imp(t1 → t2) = 3/6, imp(t2 → t1) = 3/3.

EventType | D2010 | D2011
Non-Credible | 260.3 | 29.4
Credible | 374.4 | 204.6

Fact Finding on Twitter: BasicCA — Weights and Event Implications
• Weights are associated to make sure that the “comparison is done among equals”.
• Event implications (equation on slide).
• Event credibility (equation on slide).

Credibility Flow
(figure slide)

Credibility Initialization to Incorporate Signals from Content
• SVM weight SVM_f for each user/tweet feature f (figure slide).

Fact Finding on Twitter: BasicCA — Summing Up
• Overall complexity: O(I·T²/E).

Fact Finding on Twitter: EventOptCA — A Subgraph of our D2011 Event Graph
• Events can be considered similar if they share many common words or topics.
• Similar events should get the same credibility label.

EventOptCA: How to Find Topics?
• An event is represented as E = R_E + O_E, where R_E = core words and O_E = subordinate words.
• T_E = set of tweets that contain R_E.
• For each tweet t ∈ T_E, find all words present in t and also in O_E.
• S_t = maximal subset of words from O_E also present in t; compute S_t for every t ∈ T_E.
• The most frequent S_t’s are the topics for the event.

Fact Finding on Twitter: EventOptCA
• Similar events should have similar credibility scores.
• The change in the event credibility vector should be as small as possible.
• 2(D−W) + λI is positive definite (D−W is a positive semi-definite graph Laplacian, and λI adds a positive diagonal), so the problem has a unique global minimizer.
• Solved with the Trust Region Reflective algorithm implementation in MATLAB (quadprog).

Fact Finding on Twitter: EventOptCA — Complexity
• O(I·T²/E + E·n).

Experiments

Datasets
• D2010
– Events supplied by Castillo [4], using Twitter Monitor, for Apr–Sep 2010.
– 288 events were labeled as credible or not credible.
– Events with fewer than 10 tweets were removed, leaving 207 events.
– News obtained using Google News Archives.
• D2011
– Events identified from a March 2011 tweet feed of 34M tweets; events are Twitter Trends for March 2011.
– Events with <100 tweets were removed; 250 news events were selected.
– News obtained from Google News RSS feeds.
• Sentiment analysis: General Inquirer dictionaries.
• Labeling (via Google News archives):
– Step 1: check if the event occurred in news headlines.
– Step 2: check if a random sample of 10 tweets looks credible.

Statistic | D2010 | D2011
#Users | 47171 | 9245
#Tweets | 76730 | 76216
#Events | 207 | 250
#PosEvents | 140 | 167
#NegEvents | 67 | 83
#Topics | 2874 | 2101

Accuracy Comparison

Metric | Classifier D2010 | Classifier D2011 | BasicCA D2010 | BasicCA D2011 | EventOptCA D2010 | EventOptCA D2011
Accuracy | 72.46 | 72.4 | 76.9 | 75.6 | 85.5 | 86.8
False +ves | 36 | 41 | 24 | 23 | 28 | 31
False −ves | 21 | 28 | 22 | 28 | 2 | 2

Varying the parameter λ (accuracy):
λ | D2010 | D2011
0.25 | 83.1 | 82.4
0.5 | 84.4 | 86
0.75 | 86.5 | 84.4
1 | 85.5 | 86.8
5 | 81.6 | 86
20 | 78.7 | 84.8
50 | 78.3 | 82.8

• The classifier-based approach provides ~72% accuracy.
• Our simple fact-finder-based approach provides some boost.
• Updating the event vector using the event-graph-based optimization provides a significant boost in accuracy (14% over the baseline).

Important Features for D2010
User Features | Tweet Features | Event Features
UserHasProfileImage? | TweetWithMostFrequentUserMentioned? | CountDistinctDomains
UserRegistrationAge | LengthInChars | CountDistinctUsers
UserFollowerCount | TweetWithMostFreqURL? | CountDistinctHashtags
UserStatusCount | TweetFromMostFreqLoc? | NumberOfUserLocations
UserHasLocation? | URLfromTop10Domains? | NumberOfHours

Important Features for D2011
User Features | Tweet Features | Event Features
VerifiedUser? | NumURLs | CountDistinctDomains
UserStatusCount | TweetSentiment | CountDistinctURLs
UserHasFacebookLink? | TweetsWithMostFrequentUserMentioned? | CountDistinctHashtags
UserHasDesc? | ExclamationMark? | PercentTweetsOnPeakDay
UserFollowerCount | NumHashTags? | CountDistinctUsers

Effect of Removing Implications
(figure: accuracy of BasicCA and EventOptCA on D2010 and D2011, with and without implications)
• Without considering implications, we notice a significant loss in accuracy.

Accuracy Variation (w.r.t. #Iterations)
(figure: accuracy of BasicCA and EventOptCA on D2010 and D2011 over iterations 0–3)
• Accuracy improves as the number of iterations increases; after 3–4 iterations, the accuracy drops a little and then becomes stable.

Case Study 1: Most Credible Users
• D2010: NewsCluster, NewsRing, oieaomar, portland apts, newsnetworks, Newsbox101, euronewspure, nevisupdates, Reality Check, cnnbrk
• D2011: funkmasterflex, BreakingNews, cwcscores, RealTonyRocha, espncricinfo, GMANewsOnline, ipadmgfindcom, aolnews, QuakeTsunami, TechZader
• The most credible users often include “news reporting Twitter users”.
• Sometimes, celebrity users or automated users that tweet a lot also turn up at the top.

Case Study 2: Incredible Event
• Event: 24pp
• Tweets:
– “#24pp had enough - I can take small doses of all the people on the show but not that much.”
– “#24pp i love David Frost”
– “#24pp is the best way to spend a saturday :)”
– “#24pp Sir David Frost is hilarious!!”
– “Andy Parsons is being fucking brilliant on this #24PP edition of #MockTheWeek ... shame no-one else is really ...”
– “back link habits http://emap.ws/7/9Chvm4 DMZ #24pp Russell Tovey”
– “@benjamin_cook and @mrtinoforever are SO visible in the audience from the back of their heads alone! #24pp”
– “@brihonee I just setup my comptuer to my TV so I can watch #24pp until the end!”
– “#buzzcocks this is turning into a very funny episode #24pp”
– “David Walliams has a cheeky smile. #24pp”
• The classifier marks this event as positive, but BasicCA marks it as not credible, essentially because the tweet implications are really low.

Case Study 3: Event Similarity
• EventOptCA exploits similarity between events.
• Consider the three events: wembley, ghana, englandvsghana.
• BasicCA correctly marks “ghana” and “englandvsghana” as credible but marks “wembley” incorrectly.
• However, all three events share many common words and aspects, and so all three are marked credible by EventOptCA.

Drawbacks of Our Methods
• Deep NLP techniques could be used to obtain more accurate tweet implications.
• Evidence about an event being a rumor may be present in the tweets themselves.
– Deep semantic parsing of tweets may be quite inefficient.
• Entertainment events tend to look less credible, as the news is often written in a colloquial way.
– Predicting credibility for entertainment events may need to be studied separately.

Related Work
• Credibility analysis of online social content.
• Credibility analysis for Twitter.
• Fact finding approaches.
• Graph regularization approaches.

Related Work
• Credibility analysis of online social content:
– Credibility perception depends on user background (Internet-savvy nature, political interest, expertise, etc.), author gender, content attribution elements like the author’s image, type of content (blog, tweet, news website), and the way of presenting the content (e.g., presence of ads, visual design).
– 13% of Wikipedia articles have mistakes on average [5].
– In general, news > blogs > Twitter [27].
• Credibility analysis for Twitter:
– Not credible because (1) friends/followers are unknown, (2) tracing the origin is difficult, (3) tampering with the original message while retweeting is easy [21].
– Truthy service from Indiana University (crowdsourcing) [19].
– Warranting impossible [22].
– Classifier approach to establish credibility [4].

Related Work
• Fact finding approaches:
– TruthFinder [30].
– Different forms of credibility propagation [17].
– Extensions like hardness of claims and probabilistic claims [2, 3, 8, 9, 29].
• Graph regularization approaches:
– Mainly studied in semi-supervised settings.
– Used for many applications such as webpage categorization, large-scale semidefinite programming, and color image processing [12, 28, 32].

Conclusions
• We explored the possibility of detecting and recommending credible events from Twitter feeds using fact finders.
• We exploit:
– Tweet-feed content-based classifier results.
– Traditional credibility propagation using a simple network of tweets, users, and events.
– Event-graph-based optimization to assign similar scores to similar events.
• Using two real datasets, we showed that the credibility analysis approach with event-graph optimization works better than the basic credibility analysis approach or the classifier approach.

References
[1] R. Abdulla, B. Garrison, M. Salwen, and P. Driscoll. The Credibility of Newspapers, Television News, and Online News. Association for Education in Journalism and Mass Communication, Jan 2002.
[2] R. Balakrishnan. SourceRank: Relevance and Trust Assessment for Deep Web Sources based on Inter-Source Agreement. In International Conference on World Wide Web (WWW), pages 227–236. ACM, 2011.
[3] L. Berti-Equille, A. D. Sarma, X. Dong, A. Marian, and D. Srivastava. Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence. In Conference on Innovative Data Systems Research (CIDR). www.crdrdb.org, 2009.
[4] C. Castillo, M. Mendoza, and B. Poblete. Information Credibility on Twitter. In International Conference on World Wide Web (WWW), pages 675–684. ACM, 2011.
[5] T. Chesney. An Empirical Examination of Wikipedia’s Credibility. First Monday, 11(11), 2006.
[6] I. Dagan and O. Glickman. Probabilistic Textual Entailment: Generic Applied Modeling of Language Variability. In Learning Methods for Text Understanding and Mining, Jan 2004.
[7] I. Dagan, O. Glickman, and B. Magnini. The PASCAL Recognising Textual Entailment Challenge. In Machine Learning Challenges, volume 3944, pages 177–190. Springer, 2006.
[8] A. Galland, S. Abiteboul, A. Marian, and P. Senellart. Corroborating Information from Disagreeing Views. In ACM International Conference on Web Search and Data Mining (WSDM), pages 131–140. ACM, 2010.
[9] M. Gupta, Y. Sun, and J. Han. Trust Analysis with Clustering. In International Conference on World Wide Web (WWW), pages 53–54, New York, NY, USA, 2011. ACM.
[10] A. Hickl, J. Williams, J. Bensley, K. Roberts, B. Rink, and Y. Shi. Recognizing Textual Entailment with LCC’s GROUNDHOG System. In PASCAL Challenges Workshop on Recognising Textual Entailment, 2006.
[11] E. Kim, S. Gilbert, M. J. Edwards, and E. Graeff. Detecting Sadness in 140 Characters: Sentiment Analysis of Mourning Michael Jackson on Twitter. 2009.
[12] O. Lezoray, A. Elmoataz, and S. Bougleux. Graph Regularization for Color Image Processing. Computer Vision and Image Understanding, 107(1–2):38–55, 2007.
[13] M. Mathioudakis and N. Koudas. TwitterMonitor: Trend Detection over the Twitter Stream. In International Conference on Management of Data (SIGMOD), pages 1155–1157, 2010.
[14] MATLAB. Version 7.9.0.529 (R2009b). The MathWorks Inc., Natick, Massachusetts, 2009.
[15] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. In International Conference on World Wide Web (WWW), pages 161–172. ACM, 1998.
[16] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: Sentiment Classification using Machine Learning Techniques. In Empirical Methods in Natural Language Processing (EMNLP), pages 79–86. ACL, 2002.
[17] J. Pasternack and D. Roth. Knowing What to Believe (When You Already Know Something). In International Conference on Computational Linguistics (COLING). Tsinghua University Press, 2010.
[18] A. M. Popescu and M. Pennacchiotti. Detecting Controversial Events From Twitter. In ACM International Conference on Information and Knowledge Management (CIKM), pages 1873–1876. ACM, 2010.
[19] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, S. Patil, A. Flammini, and F. Menczer. Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams. arXiv, Nov 2010.
[20] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors. In International Conference on World Wide Web (WWW), pages 851–860. ACM, 2010.
[21] M. Schmierbach and A. Oeldorf-Hirsch. A Little Bird Told Me, So I Didn’t Believe It: Twitter, Credibility, and Issue Perceptions. In Association for Education in Journalism and Mass Communication. AEJMC, Aug 2010.
[22] A. Schrock. Are You What You Tweet? Warranting Trustworthiness on Twitter. In Association for Education in Journalism and Mass Communication. AEJMC, Aug 2010.
[23] S. Soderland. Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 34:233–272, 1999.
[24] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe. Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment. In International AAAI Conference on Weblogs and Social Media (ICWSM). The AAAI Press, 2010.
[25] P. Viola and M. Narasimhan. Learning to Extract Information from Semi-Structured Text using a Discriminative Context Free Grammar. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 330–337. ACM, 2005.
[26] V. V. Vydiswaran, C. Zhai, and D. Roth. Content-Driven Trust Propagation Framework. In International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 974–982. ACM, 2011.
[27] C. R. WebWatch. Leap of Faith: Using the Internet Despite the Dangers, Oct 2005.
[28] K. Q. Weinberger, F. Sha, Q. Zhu, and L. K. Saul. Graph Laplacian Regularization for Large-Scale Semidefinite Programming. In Advances in Neural Information Processing Systems (NIPS), 2007.
[29] X. Yin and W. Tan. Semi-Supervised Truth Discovery. In International Conference on World Wide Web (WWW), pages 217–226. ACM, 2011.
[30] X. Yin, P. S. Yu, and J. Han. Truth Discovery with Multiple Conflicting Information Providers on the Web. IEEE Transactions on Knowledge and Data Engineering (TKDE), 20(6):796–808, 2008.
[31] G. Zeng and W. Wang. An Evidence-Based Iterative Content Trust Algorithm for the Credibility of Online News. Concurrency and Computation: Practice and Experience (CCPE), 21:1857–1881, Oct 2009.
[32] T. Zhang, A. Popescul, and B. Dom. Linear Prediction Models with Graph Regularization for Web-page Categorization. In International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 821–826. ACM, 2006.

Thanks!
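Postscript: the EventOptCA step described in the slides reduces to a small linear solve. Minimizing Σᵢⱼ Wᵢⱼ(cᵢ − cⱼ)² + λ‖c − c₀‖² over the event credibility vector c sets the gradient to zero at (2(D − W) + λI)c = λc₀, the positive-definite system mentioned on the slides (solved there with MATLAB's quadprog). A NumPy sketch on an assumed toy three-event graph:

```python
import numpy as np

# Toy event-similarity graph: events 0 and 1 are similar, event 2 is isolated.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
D = np.diag(W.sum(axis=1))        # degree matrix
c0 = np.array([0.9, 0.1, 0.5])    # credibility scores coming out of BasicCA
lam = 1.0                         # the lambda trade-off parameter from the slides

# Gradient of  sum_ij W_ij (c_i - c_j)^2 + lam * ||c - c0||^2  vanishes at:
#   (2(D - W) + lam*I) c = lam * c0
A = 2 * (D - W) + lam * np.eye(3)  # positive definite => unique global minimizer
c = np.linalg.solve(A, lam * c0)

print(np.round(c, 3))  # similar events pulled together; isolated event keeps c0
```

On this instance the two similar events move from (0.9, 0.1) to (0.58, 0.42), while the isolated event stays at its initial score of 0.5, illustrating how the regularizer smooths credibility over the event graph.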