
Evaluating Event Credibility
on Twitter
Manish Gupta, Peixiang Zhao, Jiawei Han
SIAM Data Mining 2012, Anaheim, CA
26th April 2012
Univ of Illinois at Urbana Champaign
Abstract
• Twitter = sensor network/news source.
• Rumors spread via Twitter cause considerable damage.
• Problem: Automatically assess the credibility of events.
• Our solution
  – Credibility propagation on a network of users, tweets and events.
  – Enhance event credibility by graph regularization over a network of only events.
• Using ~500 events extracted from two tweet feed datasets, we show that our methods are significantly more accurate (86%) than the feature-based classifier approach (72%).
Growing Twitter Influence
• Twitter was launched in 2006.
• In 2011, the average number of tweets was 1,620 per second.
• Whenever a new event takes place, Twitter reaches new peaks of traffic (up to 8,868 tweets/sec).
• The volume of Twitter mentions on blogs, message boards and forums has reached the same level as Facebook.
• The user base is growing rapidly; users are quite active, and they are spread across different age groups and multiple countries.
Importance of Twitter Analysis
• 4% of the content on Twitter relates to news (Pear Analytics report).
• The Library of Congress archives every tweet.
• Tweets can be summarized, and opinions expressed in them can impact decisions.
• Popular events on Twitter keep users aware of what is happening all across the world.
• Twitter users act as a massive sensor force pushing real-time updates.
But is all the content trustworthy?
Twitter Hoaxes
(Clearly incredible events)
• Fake celebrity death news.
• Fake events related to celebrities.
• Fake events related to particular places.
• Partly “spiced up” rumors.
• Old news that suddenly becomes popular.
• Identity theft.
• Erroneous claims made by politicians.
Does all the content look credible?
(Seemingly incredible events)
• Informally Conveyed News
  – thr is now a website where students can bet on grades. I love dis idea, I would bet on myself to get all C’s and make a killing!!!
• (Lack/Presence of) Supportive Evidence
  – Yahoo is up more than 6% in after-hours trading after ousting CEO Carol Bartz http://ow.ly/6nP6X
• Absence of any credibility-conveying words like news, breaking, latest, report.
• Absence of retweets.
• Lots of tweets written with first-person or second-person pronouns.
Problem definition
• Credibility of a Twitter event e is the degree to which a human can believe in the event by browsing over a random sample of tweets related to it.
  – i.e., the expected credibility of the tweets that belong to event e.
• Credibility of a Twitter user u is the expected credibility of the tweets he provides.
• Credibility of a tweet t is a function of the credibility of related events and users, and the credibility of other tweets that make similar (supporting/opposing) assertions as the ones made by this tweet.
• Problem: Given a set of events along with their related tweets and users, find which of the events in the set can be deemed credible.
• Hypothesis: Both the content and the network provide a lot of signals for the task. (A compact statement of the definitions follows below.)
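In symbols, one compact way to restate the expected-credibility definitions above (notation ours, not from the slides; T_e and T_u denote the tweets related to event e and posted by user u):

```latex
c(e) \;=\; \frac{1}{|T_e|} \sum_{t \in T_e} c(t),
\qquad
c(u) \;=\; \frac{1}{|T_u|} \sum_{t \in T_u} c(t)
```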
Three Approaches
• Classifier-based Approach
• Basic Credibility Analyzer
– Exploits a network of users, tweets and events
• Event Optimization-based Credibility Analyzer
– Exploits a network of events.
Classifier: User Features
• User has many friends and followers.
• User has linked his Twitter profile to his Facebook profile.
• User is a verified user.
• User registered on Twitter a long time ago.
• User has made a lot of posts.
• A description, URL, profile image, and location are attached to the user’s profile.
Classifier: Tweet Features
• It is complete. A more complete tweet gives a more complete picture of the truth.
• A professionally written tweet with no slang words, question marks, exclamation marks, full-uppercase words, or smileys is more credible.
• Number of words with first-, second-, or third-person pronouns.
• Presence of supportive evidence in the form of external URLs.
• A tweet may be regarded as more credible if it is from the most frequent location related to the event. (A feature-extraction sketch follows this list.)
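A minimal sketch of how such per-tweet features could be computed; the feature names, regexes and counts below are our own illustration, not the paper's exact feature set:

```python
import re

PRONOUNS_1_2 = {"i", "me", "my", "we", "us", "our", "you", "your"}

def tweet_features(text: str) -> dict:
    """Illustrative per-tweet features in the spirit of the slide; not the paper's exact definitions."""
    words = re.findall(r"[A-Za-z']+", text)
    lower = [w.lower() for w in words]
    return {
        "length_chars": len(text),
        "num_urls": len(re.findall(r"https?://\S+|\bow\.ly/\S+", text)),
        "num_hashtags": text.count("#"),
        "num_user_mentions": text.count("@"),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "num_uppercase_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "num_smileys": len(re.findall(r"[:;]-?[)(DP]", text)),
        "num_first_second_person": sum(1 for w in lower if w in PRONOUNS_1_2),
    }

print(tweet_features("Yahoo is up more than 6% in after-hours trading after ousting CEO Carol Bartz http://ow.ly/6nP6X"))
```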
Classifier: Event Features
• Number of tweets and retweets related to the event.
• Number of distinct URLs, domains, hashtags, user mentions, users, and locations related to the event.
• Number of hours for which the event has been popular.
• Percentage of the event’s tweets posted on the day when the event reached its peak popularity.
Classifier-based Approach
• Previous work [4] suggested decision trees supported by user, tweet, event and propagation features.
• Classifier approach (focuses on content)
  – Does not consider any network information, i.e., it ignores the facts that:
    • With high probability, credible users provide credible tweets.
    • The average credibility of tweets is higher for credible events than for non-credible events.
    • Events sharing many common words and topics should have similar credibility scores.
  – Is not entity-aware: features that belong to users and tweets are averaged across an event and used as event features. (A sketch of this baseline follows the list below.)
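A minimal sketch of this baseline under our reading of the slide: per-tweet features are averaged across an event, event-level features are appended, and a classifier is trained on the resulting vectors. scikit-learn's decision tree is used only for illustration; [4] describes the actual classifier and feature set.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def event_vector(tweet_feature_dicts, event_feature_dict):
    """Average per-tweet features across the event and append event-level features.
    Mirrors the 'not entity-aware' averaging described on the slide; feature sets are illustrative."""
    keys = sorted(tweet_feature_dicts[0])
    averaged = [float(np.mean([f[k] for f in tweet_feature_dicts])) for k in keys]
    return np.array(averaged + [event_feature_dict[k] for k in sorted(event_feature_dict)])

# Hypothetical usage (tweets_of, event_features, events and labels are not shown):
# X = np.vstack([event_vector(tweets_of(e), event_features(e)) for e in events])
# y = labels  # 1 = credible, 0 = not credible
# clf = DecisionTreeClassifier().fit(X, y)
```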
Fact Finding on Twitter: BasicCA
• A network of users, tweets and events
[Figure: network linking Users, Tweets and Events]
Inter-dependency between credibility of entities
• With a high probability, credible users provide credible tweets.
• With a high probability, the average credibility of tweets related to a credible event is higher than that of tweets related to a non-credible event.
• Tweets can derive credibility from related users and events.
Tweet Implications
• We would like to capture influences between tweets when computing tweet credibility.
• t1 = “Bull jumps into crowd in Spain: A bull jumped into the stands at a bullring in northern Spain injuring at least 30... http://bbc.in/bLC2qx”
• t2 = “Damn! Check out that bull that jumps the fence into the crowd. #awesome” (Event words are shown in bold on the original slide.)
• imp(t1 → t2) = 3/6, imp(t2 → t1) = 3/3 (a sketch of this computation follows the table below).
Event Type     D2010   D2011
Non-Credible   260.3    29.4
Credible       374.4   204.6
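Under one plausible reading of the example, imp(ti → tj) is the fraction of ti's event words that also appear in tj; this interpretation and the example word sets below are ours, inferred from the 3/6 and 3/3 values, and the paper defines the implication precisely:

```python
def implication(src_event_words: set, dst_event_words: set) -> float:
    """imp(src -> dst): fraction of the source tweet's event words shared with the destination tweet.
    Interpretation inferred from the slide's 3/6 and 3/3 example; the paper's exact definition may differ."""
    if not src_event_words:
        return 0.0
    return len(src_event_words & dst_event_words) / len(src_event_words)

t1 = {"bull", "jumps", "crowd", "spain", "stands", "bullring"}   # assumed 6 event words
t2 = {"bull", "jumps", "crowd"}                                  # assumed 3 event words
print(implication(t1, t2))  # 0.5 -> matches imp(t1 -> t2) = 3/6
print(implication(t2, t1))  # 1.0 -> matches imp(t2 -> t1) = 3/3
```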
Fact Finding on Twitter: BasicCA
Weights and Event Implications
• Weights are associated to make sure that the
“comparison is done among equals”.
• Event Implications
• Event Credibility
Credibility Flow
Credibility Initialization to Incorporate Signals from Content
• An SVM weight SVM_f is learned for each user/tweet feature f; initial credibility scores are derived from these content-based weights. (A sketch follows below.)
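A minimal sketch, assuming the initialization is a weighted combination of feature values with the learned SVM weights, squashed to [0, 1]; the sigmoid squashing and the exact combination are our assumption, since the slide only states that SVM weights SVM_f are used:

```python
import numpy as np

def initial_credibility(feature_vector: np.ndarray, svm_weights: np.ndarray, bias: float = 0.0) -> float:
    """Content-based initial credibility from an SVM decision value, squashed into [0, 1].
    The squashing is an assumption; the slide only names the per-feature SVM weights."""
    score = float(svm_weights @ feature_vector + bias)
    return 1.0 / (1.0 + np.exp(-score))
```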
Fact Finding on Twitter: BasicCA
Summing up
• Time complexity: O(I·T²/E) (a rough sketch of the propagation loop follows below)
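A rough, heavily hedged sketch of how such a propagation loop could look, based only on the inter-dependency bullets above; the averaging and mixing rules, the parameter alpha, and the helper names are our own, and the paper's exact update equations, weights and event implications are not reproduced here:

```python
import numpy as np

def basic_ca(user_of, event_of, imp, init_tweet_cred, iterations=3, alpha=0.5):
    """Simplified BasicCA-style propagation over a users-tweets-events network.
    user_of[t], event_of[t]: user/event id for tweet t; imp: T x T tweet-implication matrix.
    The update rules below are illustrative, not the paper's exact equations."""
    T = len(init_tweet_cred)
    tweet = np.asarray(init_tweet_cred, dtype=float)
    for _ in range(iterations):
        # Users and events inherit the average credibility of their tweets.
        user = {u: tweet[[t for t in range(T) if user_of[t] == u]].mean() for u in set(user_of)}
        event = {e: tweet[[t for t in range(T) if event_of[t] == e]].mean() for e in set(event_of)}
        # Each tweet mixes credibility from its user and event with
        # implication-weighted support from other tweets.
        support = imp @ tweet / np.maximum(imp.sum(axis=1), 1e-9)
        from_entities = np.array([(user[user_of[t]] + event[event_of[t]]) / 2.0 for t in range(T)])
        tweet = alpha * from_entities + (1 - alpha) * support
    return tweet, event
```

Under this reading, the per-iteration cost is dominated by pairwise tweet implications within each event, which is presumably where the O(I·T²/E) figure comes from if tweets are spread roughly evenly across events.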
Fact Finding on Twitter: EventOptCA
A Subgraph of our D2011 Event Graph
• Events can be considered similar if they share a lot of common words or topics.
• Similar events should get the same credibility label.
EventOptCA: How to find topics?
• An event is represented as E = R_E + O_E, where R_E = core words and O_E = subordinate words.
• T_E = set of tweets that contain R_E.
• For each tweet t ∈ T_E, find all words present both in t and in O_E.
• S_t = maximal subset of O_E's words also present in t. Compute S_t for every t ∈ T_E.
• The most frequent S_t's are the topics for the event. (A sketch follows below.)
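A minimal sketch of this procedure as we read it; the tokenization, the subset matching and the top_k cut-off are simplifications of our own:

```python
from collections import Counter

def event_topics(core_words, subordinate_words, tweets, top_k=5):
    """Topics for an event: for each tweet containing all core words (T_E), collect which
    subordinate words it mentions (S_t); the most frequent such subsets are the topics.
    top_k and the whitespace tokenization are illustrative choices."""
    core = set(core_words)
    subordinate = set(subordinate_words)
    subsets = Counter()
    for text in tweets:
        words = set(text.lower().split())
        if core <= words:                         # tweet belongs to T_E: contains the core words
            s_t = frozenset(words & subordinate)  # subordinate words present in this tweet
            if s_t:
                subsets[s_t] += 1
    return [set(s) for s, _ in subsets.most_common(top_k)]
```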
Fact Finding on Twitter: EventOptCA
• Similar events should have similar credibility scores.
• The change in the event credibility vector should be as small as possible.
• 2(D − W) + 𝜆I is positive definite (D − W is a graph Laplacian, hence positive semi-definite, and 𝜆 > 0), so the problem has a unique global minimizer.
• Solved with the trust-region-reflective algorithm, implemented in MATLAB (quadprog). (A sketch follows below.)
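A minimal sketch of this step, assuming the objective is c^T(D − W)c + (𝜆/2)·‖c − c0‖² over the event-similarity graph W, which matches the 2(D − W) + 𝜆I Hessian on the slide; whether the paper also imposes bound constraints via quadprog is not shown here, so the unconstrained closed form below is an assumption:

```python
import numpy as np

def event_opt_ca(W: np.ndarray, c0: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Graph-regularized refinement of event credibility scores.
    Minimizes c^T (D - W) c + (lam/2) * ||c - c0||^2, whose Hessian is 2(D - W) + lam*I.
    This unconstrained closed form is an assumption; the slides solve it with MATLAB's quadprog."""
    D = np.diag(W.sum(axis=1))          # degree matrix of the event-similarity graph
    H = 2.0 * (D - W) + lam * np.eye(len(c0))
    # Stationarity condition: (2(D - W) + lam*I) c = lam * c0
    return np.linalg.solve(H, lam * c0)

# Hypothetical usage: W[i, j] = similarity between events i and j (shared words/topics),
# c0 = event credibilities from BasicCA; c = event_opt_ca(W, c0, lam=1.0).
```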
Fact Finding on Twitter: EventOptCA
• Time complexity: O(I·T²/E + E·n)
Experiments
Datasets
• D2010
  – Events supplied by Castillo [4], using Twitter Monitor, for Apr–Sep 2010.
  – 288 events were labeled as credible or not credible.
  – Events with fewer than 10 tweets were removed, leaving 207 events.
  – News obtained using Google News Archives.
• D2011
  – Events identified from a March 2011 tweet feed of 34M tweets.
  – Events are Twitter Trends for March 2011. We removed events with <100 tweets and selected 250 news events.
  – News obtained from Google News RSS feeds.
• Sentiment analysis: General Inquirer dictionaries.
• Labeling
  – Google News archives.
  – Step 1: Check if the event occurred in news headlines.
  – Step 2: Check if a random sample of 10 tweets looks credible.

Statistic     D2010   D2011
#Users        47171    9245
#Tweets       76730   76216
#Events         207     250
#PosEvents      140     167
#NegEvents       67      83
#Topics        2874    2101
Accuracy Comparison
             Classifier         BasicCA            EventOptCA
             D2010    D2011     D2010    D2011     D2010    D2011
Accuracy     72.46    72.4      76.9     75.6      85.5     86.8
False +ves   36       41        24       23        28       31
False -ves   21       28        22       28        2        2

Varying the parameter 𝜆:
𝜆        D2010   D2011
0.25     83.1    82.4
0.5      84.4    86
0.75     86.5    84.4
1        85.5    86.8
5        81.6    86
20       78.7    84.8
50       78.3    82.8

• The classifier-based approach provides ~72% accuracy.
• Our simple fact-finder-based approach (BasicCA) provides some boost.
• Updating the event vector using the event-graph-based optimization (EventOptCA) provides a significant boost in accuracy (14% over the baseline).
Important features

Important features for D2010
User Features             Tweet Features                          Event Features
UserHasProfileImage?      TweetWithMostFrequentUserMentioned?     CountDistinctDomains
UserRegistrationAge       LengthInChars                           CountDistinctUsers
UserFollowerCount         TweetWithMostFreqURL?                   CountDistinctHashtags
UserStatusCount           TweetFromMostFreqLoc?                   NumberOfUserLocations
UserHasLocation?          URLfromTop10Domains?                    NumberOfHours

Important features for D2011
User Features             Tweet Features                          Event Features
VerifiedUser?             NumURLs                                 CountDistinctDomains
UserStatusCount           TweetSentiment                          CountDistinctURLs
UserHasFacebookLink?      TweetsWithMostFrequentUserMentioned?    CountDistinctHashtags
UserHasDesc?              ExclamationMark?                        PercentTweetsOnPeakDay
UserFollowerCount         NumHashTags?                            CountDistinctUsers
Effect of removing implications
[Chart: accuracy (65–90%) of BasicCA and EventOptCA on D2010 and D2011, with and without tweet implications]
• Without considering implications, we notice a significant loss in accuracy.
Accuracy variation (wrt #iterations)
[Chart: accuracy (60–90%) of BasicCA and EventOptCA on D2010 and D2011 over iterations 0–3]
• Accuracy improves as the number of iterations increases. After 3–4 iterations, the accuracy drops a little and then becomes stable.
Case Study 1: Most Credible Users
D2010
NewsCluster
NewsRing
oieaomar
portland apts
newsnetworks
Newsbox101
euronewspure
nevisupdates
Reality Check
cnnbrk
D2011
funkmasterflex
BreakingNews
cwcscores
RealTonyRocha
espncricinfo
GMANewsOnline
ipadmgfindcom
aolnews
QuakeTsunami
TechZader
• Most credible users often include “news-reporting Twitter users”.
• Sometimes, celebrity users or automated users which tweet a lot also turn up at the top.
Case Study 2: Incredible Event
• Event: 24pp
• Tweets
– #24pp had enough - I can take small doses of all the people on the show but
not that much.
– #24pp i love David Frost
– #24pp is the best way to spend a saturday :)
– #24pp Sir David Frost is hilarious!!
– Andy Parsons is being fucking brilliant on this #24PP edition of
#MockTheWeek ... shame no-one else is really ...
– back link habits http://emap.ws/7/9Chvm4 DMZ #24pp Russell Tovey
– @benjamin_cook and @mrtinoforever are SO visible in the audience from the
back of their heads alone! #24pp
– @brihonee I just setup my comptuer to my TV so I can watch #24pp until the
end!
– #buzzcocks this is turning into a very funny episode #24pp
– David Walliams has a cheeky smile. #24pp
• The classifier marks this event as credible, but BasicCA marks it as not credible, essentially because the tweet implications are very low.
Case Study 3: Event Similarity
• EventOptCA exploits similarity between events.
• Consider the three events:
– wembley
– ghana
– englandvsghana
• BasicCA correctly marks “ghana” and “englandvsghana” as credible, but marks “wembley” incorrectly.
• However, all three events share a lot of common words and aspects, and so all are marked as credible by EventOptCA.
Drawbacks of our methods
• Deep NLP techniques could be used to obtain more accurate tweet implications.
• Evidence that an event is a rumor may be present in the tweets themselves.
  – However, deep semantic parsing of tweets may be quite inefficient.
• Entertainment events tend to look not so credible, as the news is often written in a colloquial way.
  – Predicting credibility for entertainment events may need to be studied separately.
Related Work
• Credibility Analysis of Online Social Content
• Credibility Analysis for Twitter
• Fact Finding Approaches
• Graph Regularization Approaches
Related Work
• Credibility Analysis of Online Social Content
  – Credibility perception depends on user background (Internet-savvy nature, political interest, expertise, etc.), author gender, content attribution elements like the author’s image, type of content (blog, tweet, news website), and the way of presenting the content (e.g., presence of ads, visual design).
  – 13% of Wikipedia articles have mistakes on average. [5]
  – In general, news > blogs > Twitter. [27]
• Credibility Analysis for Twitter
  – Tweets are perceived as not credible because (1) friends/followers are unknown, (2) tracing the origin is difficult, and (3) tampering with the original message while retweeting is easy. [21]
  – Truthy service from Indiana University (crowd sourcing). [19]
  – Warranting is impossible. [22]
  – Classifier approach to establish credibility. [4]
Related Work
• Fact Finding Approaches
  – TruthFinder [30]
  – Different forms of credibility propagation [17]
  – Extensions like hardness of claims, probabilistic claims [2, 3, 8, 9, 29]
• Graph Regularization Approaches
  – Mainly studied in semi-supervised settings.
  – Has been used for a lot of applications like webpage categorization, large-scale semidefinite programming, color image processing, etc. [12, 28, 32]
Conclusions
• We explored the possibility of detecting and recommending credible events from Twitter feeds using fact finders.
• We exploit
  – Tweet-feed content-based classifier results.
  – Traditional credibility propagation using a simple network of tweets, users and events.
  – Event-graph-based optimization to assign similar scores to similar events.
• Using two real datasets, we showed that the credibility analysis approach with event-graph optimization works better than the basic credibility analysis approach or the classifier approach.
References
[1] R. Abdulla, B. Garrison, M. Salwen, and P. Driscoll. The Credibility of Newspapers, Television News, and Online News.
Association for Education in Journalism and Mass Communication, Jan 2002.
[2] R. Balakrishnan. Source Rank: Relevance and Trust Assessment for Deep Web Sources based on Inter-Source
Agreement. In International Conference on World Wide Web (WWW), pages 227–236. ACM, 2011.
[3] L. Berti-Equille, A. D. Sarma, X. Dong, A. Marian, and D. Srivastava. Sailing the Information Ocean with Awareness of
Currents: Discovery and Application of Source Dependence. In Conference on Innovative Data Systems Research (CIDR).
www.cidrdb.org, 2009.
[4] C. Castillo, M. Mendoza, and B. Poblete. Information Credibility on Twitter. In International Conference on World
Wide Web (WWW), pages 675–684. ACM, 2011.
[5] T. Chesney. An Empirical Examination of Wikipedia’s Credibility. First Monday, 11(11), 2006.
[6] I. Dagan and O. Glickman. Probabilistic Textual Entailment: Generic Applied Modeling of Language Variability. In
Learning Methods for Text Understanding and Mining, Jan. 2004.
[7] I. Dagan, O. Glickman, and B. Magnini. The PASCAL Recognising Textual Entailment Challenge. In Machine Learning
Challenges, volume 3944, pages 177–190. Springer, 2006.
[8] A. Galland, S. Abiteboul, A. Marian, and P. Senellart. Corroborating Information from Disagreeing Views. In ACM
International Conference on Web Search and Data Mining (WSDM), pages 131–140. ACM, 2010.
[9] M. Gupta, Y. Sun, and J. Han. Trust Analysis with Clustering. In International Conference on World Wide Web
(WWW), pages 53–54, New York, NY, USA, 2011. ACM.
[10] A. Hickl, J. Williams, J. Bensley, K. Roberts, B. Rink, and Y. Shi. Recognizing Textual Entailment with LCC’s
GROUNDHOG System. In PASCAL Challenges Workshop on Recognising Textual Entailment, 2006.
[11] E. Kim, S. Gilbert, M. J. Edwards, and E. Graeff. Detecting Sadness in 140 Characters: Sentiment Analysis of
Mourning Michael Jackson on Twitter. 2009.
References
[12] O. Lezoray, A. Elmoataz, and S. Bougleux. Graph Regularization for Color Image Processing. Computer Vision and
Image Understanding, 107(1–2):38–55, 2007.
[13] M. Mathioudakis and N. Koudas. TwitterMonitor: Trend Detection over the Twitter Stream. International
Conference on Management of data (SIGMOD), pages 1155–1157, 2010.
[14] MATLAB. version 7.9.0.529 (R2009b). The MathWorks Inc., Natick, Massachusetts, 2009.
[15] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. In
International Conference on World Wide Web (WWW), pages 161–172. ACM, 1998.
[16] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up?: Sentiment Classification using Machine Learning Techniques. In
Empirical Methods in Natural Language Processing (EMNLP), pages 79–86. ACL, 2002.
[17] J. Pasternack and D. Roth. Knowing What to Believe (When You Already Know Something). In International
Conference on Computational Linguistics (COLING). Tsinghua University Press, 2010.
[18] A. M. Popescu and M. Pennacchiotti. Detecting Controversial Events From Twitter. In ACM International Conference
on Information and Knowledge Management (CIKM), pages 1873–1876. ACM, 2010.
[19] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, S. Patil, A. Flammini, and F. Menczer. Detecting and Tracking the
Spread of Astroturf Memes in Microblog Streams. arXiv, Nov 2010.
[20] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake Shakes Twitter users: Real-Time Event Detection by Social
Sensors. In International Conference on World Wide Web (WWW), pages 851–860. ACM, 2010.
[21] M. Schmierbach and A. Oeldorf-Hirsch. A Little Bird Told Me, so I didn’t Believe It: Twitter, credibility, and Issue
Perceptions. In Association for Education in Journalism and Mass Communication. AEJMC, Aug 2010.
[22] A. Schrock. Are You What You Tweet? Warranting Trustworthiness on Twitter. In Association for Education in
Journalism and Mass Communication. AEJMC, Aug 2010.
References
[23] S. Soderland. Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 34:233–
272, 1999.
[24] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe. Predicting Elections with Twitter: What 140 Characters
Reveal about Political Sentiment. In International AAAI Conference on Weblogs and Social Media (ICWSM). The AAAI
Press, 2010.
[25] P. Viola and M. Narasimhan. Learning to Extract Information from Semi-Structured Text using a Discriminative
Context Free Grammar. In International ACM Conference on Research and Development in Information Retrieval
(SIGIR), pages 330–337. ACM, 2005.
[26] V. V. Vydiswaran, C. Zhai, and D. Roth. Content-Driven Trust Propagation Framework. In International Conference
on Knowledge Discovery and Data Mining (SIGKDD), pages 974–982. ACM, 2011.
[27] C. R. WebWatch. Leap of Faith: Using the Internet Despite the Dangers, Oct 2005.
[28] K. Q. Weinberger, F. Sha, Q. Zhu, and L. K. Saul. Graph Laplacian Regularization for Large-Scale Semidefinite
Programming. In Advances in Neural Information Processing Systems (NIPS), 2007.
[29] X. Yin and W. Tan. Semi-Supervised Truth Discovery. In International Conference on World Wide Web (WWW),
pages 217–226. ACM, 2011.
[30] X. Yin, P. S. Yu, and J. Han. Truth Discovery with Multiple Conflicting Information Providers on the Web. IEEE
Transactions on Knowledge and Data Engineering (TKDE), 20(6):796–808, 2008.
[31] G. Zeng and W. Wang. An Evidence-Based Iterative Content Trust Algorithm for the Credibility of Online News.
Concurrency and Computation: Practice and Experience (CCPE), 21:1857–1881, Oct 2009.
[32] T. Zhang, A. Popescul, and B. Dom. Linear Prediction Models with Graph Regularization for Web-page
Categorization. In International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 821–826. ACM,
2006.
Thanks!