Ranking Tweets Considering Trust and Relevance
Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati
Arizona State University

• Twitter is one of the most prominent micro-blogging services.
• Twitter has over 140 million active users, generates over 340 million tweets daily, and handles over 1.6 billion search queries per day.
• Users access tweets by following other users and by using the search function.

Twitter Search Results for the Query: "Britney Spears"
• Results are sorted in reverse chronological order.
• The single most-retweeted tweet is selected as the top tweet.
• No relevance metrics are applied.
• Results contain spam and untrustworthy tweets.

TweetRank
[Figure: the user's query goes through TweetRank to Twitter, which returns the top K results; TweetRank returns the top N results to the user]
TweetRank acts as a mediator between the user and Twitter. Because K is much larger than N, TweetRank can eliminate untrustworthy results and still return N results.

Need for Relevance and Trust
The spread of false facts on Twitter has become an everyday event.
• Retweets and users can be bought.
• Relying on them as signals of trustworthiness therefore does not work.

Getting Relevant & Trustworthy Results
• Manual curation is out of the question (unless you are the Government of China :-) ).
  - How many curators would it take to clean up a micro-blog with 140 million active users?
• Automated analysis?
  - PageRank uses the explicit links between web pages to evaluate trust and relevance. But what are the links between tweets?

Links in Twitter Space
[Figure: tweets connected by Retweet and Agreement links]
Re-Tweet: explicit links between tweets.
Agreement: implicit links between tweets that contain the same fact.

Agreement
• Agreement between two tweets is defined as the degree of similarity between their contents.
• Retweets are not considered in agreement, because retweets are unverified endorsements.
• How does agreement capture relevance and trust?
  - A tweet that is agreed upon by a large number of other tweets is likely to be popular, and popular tweets are more likely to be relevant.
  - Since agreement excludes retweets, the most-agreed tweet is the one with the largest number of independent users agreeing on the same fact, and it is therefore more trustworthy.

Agreement Computation
• An effective computation of agreement would require understanding the meaning of each tweet, which needs natural language processing.
• As a preliminary approach, we compute agreement using Soft TF-IDF with Jaro-Winkler similarity.
• Soft TF-IDF is similar to TF-IDF, except that it also credits similar (not only exactly matching) tokens in the two compared document vectors.

Computing Ranked Results
• A simple voting technique is used to compute the ranked results.
[Figure: example agreement graph over tweets 1, 2 and 3, with pairwise agreement weights on the edges and each tweet's summed agreement]
• The agreement of a tweet is the sum of its agreement with all other tweets.
• Tweets are sorted by their agreement votes, and the top-N results are sent to the user.

Results: Britney Spears
Twitter Results (Oops?!)
Britney Spears is Engaged... Again! - its britney: http://t.co/1E9LsaH7
TweetRank Results
In entertainment: Britney Spears engaged to marry her longtime boyfriend and former agent Jason Trawick.
RT @GMA: Britney Spears Engaged
#Britney #Spears #engaged to Again http://t.co/5Ly0lga4 #boyfriend: #report: LOS ANGELES (Reuters) - Pop star Britney ... http://t.co/PiVU
Britney Spears engaged: http://t.co/gpQQ2S6I
Congratulations to Britney Spears and her beau Jason Trawick for getting engaged via a 3.5 carat ring! We are certainly happy for her!

Evaluation - Relevance
• The top-N results were manually labelled as follows:
  Not related to the topic, or spam: 0
  Remotely relevant to the topic: 1/3
  Some information on the topic: 2/3
  A good amount of information on the topic: 1

Evaluation - Trust
• The top-N results were manually labelled as follows:
  Untrustworthy tweets, such as spam or wrong facts: -1
  Opinions: 0
  Correct facts: 1

Ranking Cost
• The time increases quadratically with the number of tweets.
• Since agreement is computed pairwise, it can easily be parallelized using MapReduce.

Twitter Eco-System
[Figure: Twitter eco-system graph with relations labelled Tweeted URL, Tweeted By, Followers and Hyperlinks]

Summary
Micro-blog spamming is becoming increasingly lucrative and problematic. We are working on a ranking that is sensitive to the trustworthiness and relevance of micro-blog posts. We model the tweet space as a tri-layer graph containing a tweet layer, a user layer and a web-page layer. The ranking is derived from the users, the tweets, and the prestige of the referenced web pages.
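As a rough illustration of the Agreement Computation and Computing Ranked Results slides, the Python sketch below scores every pair of fetched tweets with Soft TF-IDF over Jaro-Winkler token similarity, sums each tweet's agreement into a vote, and returns the top N. It is a sketch under stated assumptions, not the authors' implementation: the tokenizer, the closeness threshold theta = 0.9, taking IDF over the K fetched tweets, averaging both directions of the asymmetric Soft TF-IDF score, and the hypothetical `is_retweet` helper (a simple "RT @" prefix test) are all illustrative choices.

```python
import math
import re
from collections import Counter

def jaro(s: str, t: str) -> float:
    """Jaro similarity: characters matched within a window, penalised for transpositions."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    window = max(max(ls, lt) // 2 - 1, 0)
    s_hit, t_hit = [False] * ls, [False] * lt
    m = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    k = half = 0
    for i in range(ls):
        if s_hit[i]:
            while not t_hit[k]:
                k += 1
            half += s[i] != t[k]
            k += 1
    return (m / ls + m / lt + (m - half // 2) / m) / 3

def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Boost the Jaro score by the length of the common prefix (capped at 4)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Unit-length log(TF+1) * log(N/DF) vectors; IDF is taken over `docs` (assumption)."""
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))
    vecs = []
    for d in docs:
        w = {tok: math.log(c + 1) * math.log(n / df[tok]) for tok, c in Counter(d).items()}
        norm = math.sqrt(sum(x * x for x in w.values()))
        vecs.append({tok: x / norm for tok, x in w.items()} if norm > 0 else {})
    return vecs

def soft_tfidf(vs: dict[str, float], vt: dict[str, float], theta: float = 0.9) -> float:
    """Credit each token of S through its closest token in T if their similarity > theta."""
    score = 0.0
    for w, weight in vs.items():
        sim, best = max(((jaro_winkler(w, v), v) for v in vt), default=(0.0, None))
        if best is not None and sim > theta:
            score += weight * vt[best] * sim
    return score

def is_retweet(text: str) -> bool:
    return text.startswith("RT @")  # hypothetical heuristic for spotting retweets

def agreement_votes(tweets: list[str], theta: float = 0.9) -> list[float]:
    """A tweet's vote is the sum of its agreement with every other tweet (O(K^2) pairs)."""
    vecs = tfidf_vectors([tokenize(t) for t in tweets])
    votes = [0.0] * len(tweets)
    for i in range(len(tweets)):
        for j in range(i + 1, len(tweets)):
            # Soft TF-IDF is asymmetric; averaging both directions is an assumption.
            a = 0.5 * (soft_tfidf(vecs[i], vecs[j], theta) + soft_tfidf(vecs[j], vecs[i], theta))
            votes[i] += a
            votes[j] += a
    return votes

def rank_top_n(tweets: list[str], n: int, theta: float = 0.9) -> list[str]:
    candidates = [t for t in tweets if not is_retweet(t)]  # retweets dropped before voting
    votes = agreement_votes(candidates, theta)
    order = sorted(range(len(candidates)), key=lambda i: votes[i], reverse=True)
    return [candidates[i] for i in order[:n]]

if __name__ == "__main__":
    fetched = [
        "Britney Spears engaged to Jason Trawick",
        "Britney Spears is engaged to boyfriend Jason Trawick",
        "Buy cheap followers now click here",
    ]
    print(rank_top_n(fetched, 2))
```

Each (i, j) pair in `agreement_votes` is scored independently of every other pair. That independence is the source of the quadratic cost noted on the Ranking Cost slide, and it is also what lets the pairwise work be split across MapReduce workers.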