Talk Slides - Raju Balakrishnan's Home Page

Ranking Tweets Considering Trust and Relevance
Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati
Arizona State University
• Twitter is one of the most prominent micro-blogging services.
• Twitter has over 140 million active users, generates over 340 million tweets daily, and handles over 1.6 billion search queries per day.
• Users access tweets by following other users and by using the search function.
Twitter Search
Results for the Query: “Britney Spears”
• Sorted by reverse chronological order.
• Selects the top retweeted single tweet as the top tweet.
• Does not apply any relevance metrics.
• Contains spam and untrustworthy tweets.
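TweetRank
[Figure: the user's query passes through TweetRank to Twitter; Twitter returns the Top K results to TweetRank, which returns the Top N results to the user.]
• TweetRank acts as a mediator between the user and Twitter.
• K is much higher than N, which lets TweetRank eliminate untrustworthy results.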
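The mediator role can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TweetRank's implementation: `mediate` is a hypothetical name, `search` is a hypothetical callable standing in for a Twitter search call, `score` stands in for whatever trust/relevance scorer is used, and the K and N values in the usage lines are arbitrary.

```python
from typing import Callable, List, Sequence

def mediate(query: str,
            search: Callable[[str, int], List[str]],
            score: Callable[[Sequence[str]], List[float]],
            k: int,
            n: int) -> List[str]:
    """Fetch the Top K tweets for `query`, re-score them, and return the Top N.

    `search` and `score` are hypothetical stand-ins supplied by the caller.
    """
    if k <= n:
        raise ValueError("K should be larger than N so low-scoring tweets can be dropped")
    candidates = search(query, k)      # Top K results from the search backend
    scores = score(candidates)         # one score per candidate tweet
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:n]]   # Top N results for the user

# Usage with a fake index and a toy scorer (longer tweets score higher).
fake_index = ["buy followers now",
              "Britney Spears engaged",
              "Britney Spears is engaged to Jason Trawick"]
result = mediate("Britney Spears",
                 search=lambda q, k: fake_index[:k],
                 score=lambda tweets: [len(t.split()) for t in tweets],
                 k=3, n=1)
print(result)
```

Requiring K > N mirrors the slide's point: the gap between the K fetched candidates and the N returned results is where untrustworthy tweets get filtered out.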
Need for Relevance and Trust
The spread of false facts on Twitter has become an everyday event.
• Retweets and users can be bought.
• Relying on them as signals of trustworthiness therefore does not work.
Getting Relevant & Trustworthy Results
• Manual curation is out of the question (unless you are the Government of China :-) )
- How many people would it take to clean up a micro-blog with 140 million active users?
• Automated analysis?
- PageRank uses the explicit links between web pages to evaluate trust and relevance.
- But what are the links between tweets?
Links in Twitter Space
[Figure: tweets connected by two kinds of links, Retweet and Agreement.]
Retweet: explicit links between tweets.
Agreement: implicit links between tweets that contain the same fact.
Agreement
• Agreement between two tweets is defined as the amount of similarity in their content.
• Retweets are not considered in agreement, because retweets are unverified endorsements.
• How does agreement capture relevance and trust?
- A tweet that is agreed upon by a large number of other tweets is likely to be popular, and popular tweets are more likely to be relevant.
- Since agreement does not include retweets, the most agreed-upon tweet has the largest number of independent users agreeing on the same fact, and hence it is more trustworthy.
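Because agreement ignores retweets, retweets have to be set aside before agreement is computed. A minimal sketch, assuming retweets can be recognized by a leading textual `RT @user:` prefix (as in the `RT @GMA:` tweet shown on the results slide). This is a heuristic and an assumption; the slides do not say how retweets are detected, and retweet metadata would be more reliable than text matching. The function names are hypothetical.

```python
import re
from typing import List

# Heuristic (assumption): manual retweets start with "RT @username", optionally followed by ':'.
RT_PREFIX = re.compile(r"^\s*RT\s+@\w+:?", re.IGNORECASE)

def is_retweet(text: str) -> bool:
    """Return True if the tweet text starts like a retweet."""
    return RT_PREFIX.match(text) is not None

def independent_tweets(tweets: List[str]) -> List[str]:
    """Keep only tweets that are not retweets, so each agreement vote
    comes from an independently written tweet."""
    return [t for t in tweets if not is_retweet(t)]

tweets = [
    "RT @GMA: Britney Spears Engaged Again",
    "Britney Spears engaged to Jason Trawick",
    "rt @fan: so happy for Britney",
]
print(independent_tweets(tweets))
```

Only a leading prefix is matched, so a tweet that quotes "RT @..." in the middle of its text is still treated as an original tweet.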
Agreement Computation
• For an accurate computation of agreement, we need to understand the meaning of each tweet, which requires natural language processing.
• As a preliminary approach, we compute agreement using Soft TF-IDF with Jaro-Winkler similarity.
• Soft TF-IDF is similar to TF-IDF, except that it also considers similar tokens in the two compared document vectors in addition to exactly matching terms.
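A compact sketch of Soft TF-IDF with Jaro-Winkler as the secondary token similarity, following the commonly used formulation: for each token of one tweet, find its most similar token in the other tweet, and if that similarity is at least a threshold θ, add the product of the two TF-IDF weights times the similarity. The threshold `theta=0.9`, the Winkler prefix scale `p=0.1`, the smoothed IDF, and the simple tokenizer are assumptions, not values given in the slides. The score is not symmetric in general.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    mismatched, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                mismatched += 1
            k += 1
    t = mismatched / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """L2-normalized TF-IDF vectors; the IDF is smoothed (assumption)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        weights = {tok: math.log(c + 1) * math.log(1 + n / df[tok])
                   for tok, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({tok: w / norm for tok, w in weights.items()} if norm else {})
    return vectors

def soft_tfidf(v1: Dict[str, float], v2: Dict[str, float], theta: float = 0.9) -> float:
    """Soft TF-IDF: exact and near-identical tokens both contribute."""
    score = 0.0
    for tok, w1 in v1.items():
        best, best_sim = None, 0.0
        for other in v2:
            sim = jaro_winkler(tok, other)
            if sim > best_sim:
                best, best_sim = other, sim
        if best is not None and best_sim >= theta:
            score += w1 * v2[best] * best_sim
    return score

tweets = ["Britney Spears engaged to Jason Trawick",
          "Britney Spears is engagd to Jason Trawick",
          "Buy cheap followers now"]
vecs = tfidf_vectors([tokenize(t) for t in tweets])
print(round(soft_tfidf(vecs[0], vecs[1]), 3), round(soft_tfidf(vecs[0], vecs[2]), 3))
```

The misspelled "engagd" still contributes to the first score because its Jaro-Winkler similarity to "engaged" clears the threshold, which is exactly what plain TF-IDF would miss.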
Computing Ranked Results
• A simple voting technique is used to compute the ranked results.
• The agreement of a tweet is the sum of its agreement with all other tweets.
• The tweets are sorted by agreement votes, and the Top-N results are sent to the user.
[Figure: example agreement graph over three tweets (1, 2, 3) with pairwise agreement weights.]
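The voting step can be sketched as follows, assuming the pairwise agreements have already been computed into a symmetric matrix (for example with a similarity such as Soft TF-IDF). The function names are hypothetical, and the three-tweet matrix in the usage lines is a made-up illustration, not the values from the slide's figure.

```python
from typing import List, Sequence

def agreement_votes(agreement: Sequence[Sequence[float]]) -> List[float]:
    """Vote of tweet i = sum of its agreement with every other tweet."""
    n = len(agreement)
    return [sum(agreement[i][j] for j in range(n) if j != i) for i in range(n)]

def top_n_by_agreement(agreement: Sequence[Sequence[float]], n: int) -> List[int]:
    """Indices of the n tweets with the highest agreement votes."""
    votes = agreement_votes(agreement)
    order = sorted(range(len(votes)), key=lambda i: votes[i], reverse=True)
    return order[:n]

# Hypothetical pairwise agreements among three tweets (0, 1, 2).
A = [[0.0, 0.7, 0.6],
     [0.7, 0.0, 0.2],
     [0.6, 0.2, 0.0]]
print(agreement_votes(A))         # summed agreement per tweet
print(top_n_by_agreement(A, 2))   # indices of the Top-2 tweets
```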
Results: Britney Spears
Twitter Results
• (Oops?!) Britney Spears is Engaged... Again! - its britney: http://t.co/1E9LsaH7
• RT @GMA: Britney Spears Engaged Again http://t.co/5Ly0lga4
• Britney Spears engaged: http://t.co/gpQQ2S6I"
TweetRank Results
• In entertainment: Britney Spears engaged to marry her longtime boyfriend and former agent Jason Trawick.
• #Britney #Spears #engaged to #boyfriend: #report: LOS ANGELES (Reuters) - Pop star Britney ... http://t.co/PiVU
• Congratulations to Britney Spears and her beau Jason Trawick for getting engaged via a 3.5 carat ring! We are certainly happy for her!
Evaluation - Relevance
• The Top N results were manually labelled as follows:
- 0: not related to the topic, or spam
- 1/3: remotely relevant to the topic
- 2/3: tweets which have some information on the topic
- 1: tweets which have a good amount of information
Evaluation - Trust
• The Top N results were manually labelled as follows:
- -1: untrustworthy tweets, such as spam or wrong facts
- 0: tweets which are opinions
- 1: tweets which contain correct facts
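The relevance and trust labelling scales can be encoded directly as lookup tables. How the per-tweet labels are aggregated is not stated on these slides; averaging them over the Top N results, as below, is an assumption, and the label names and `mean_score_at_n` are hypothetical.

```python
from typing import Dict, Sequence

# Relevance scale from the "Evaluation - Relevance" slide (label names are hypothetical).
RELEVANCE: Dict[str, float] = {
    "unrelated_or_spam": 0.0,
    "remotely_relevant": 1 / 3,
    "some_information": 2 / 3,
    "good_information": 1.0,
}

# Trust scale from the "Evaluation - Trust" slide (label names are hypothetical).
TRUST: Dict[str, float] = {
    "untrustworthy": -1.0,
    "opinion": 0.0,
    "correct_fact": 1.0,
}

def mean_score_at_n(labels: Sequence[str], scale: Dict[str, float], n: int) -> float:
    """Average label value over the first n ranked results (assumed aggregation)."""
    top = labels[:n]
    if not top:
        raise ValueError("need at least one labelled result")
    return sum(scale[label] for label in top) / len(top)

ranked_relevance = ["good_information", "some_information", "unrelated_or_spam"]
ranked_trust = ["correct_fact", "opinion", "untrustworthy", "correct_fact"]
print(mean_score_at_n(ranked_relevance, RELEVANCE, 3))
print(mean_score_at_n(ranked_trust, TRUST, 4))
```

Because the trust scale runs from -1 to 1, a negative mean trust would indicate that untrustworthy tweets outnumber correct ones in the Top N.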
Ranking Cost
• The time increases quadratically with the number of tweets.
• Since the computation of agreement is pairwise, it can be easily parallelized using MapReduce.
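The pairwise structure is what makes the cost quadratic: n tweets give n(n-1)/2 pairs. The sketch below imitates the MapReduce decomposition in plain Python (no Hadoop or other framework is assumed): each map call scores one pair and emits the score as a vote for both tweets, a shuffle groups the votes by tweet, and each reduce call sums them. The word-overlap `jaccard` function is a hypothetical stand-in for the Soft TF-IDF agreement, and all function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Tuple

def jaccard(a: str, b: str) -> float:
    """Toy stand-in similarity: word-set overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def map_pair(pair: Tuple[int, int], tweets: List[str],
             sim: Callable[[str, str], float]) -> Iterator[Tuple[int, float]]:
    """Map: score one pair and emit the agreement as a vote for both tweets."""
    i, j = pair
    a = sim(tweets[i], tweets[j])
    yield i, a
    yield j, a

def reduce_votes(votes: List[float]) -> float:
    """Reduce: total agreement for one tweet."""
    return sum(votes)

def agreement_mapreduce(tweets: List[str],
                        sim: Callable[[str, str], float]) -> Dict[int, float]:
    grouped: Dict[int, List[float]] = defaultdict(list)
    # Map calls are independent of each other, so a cluster could run them in parallel.
    for pair in combinations(range(len(tweets)), 2):
        for key, value in map_pair(pair, tweets, sim):
            grouped[key].append(value)   # shuffle: group votes by tweet
    return {key: reduce_votes(vals) for key, vals in grouped.items()}

tweets = ["britney spears engaged", "britney spears engaged", "buy followers"]
n = len(tweets)
print(len(list(combinations(range(n), 2))), n * (n - 1) // 2)   # number of pairs
print(agreement_mapreduce(tweets, jaccard))
```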
Twitter Eco-System
[Figure: the Twitter eco-system, with the relations Tweeted URL, Tweeted By, Followers, and Hyperlinks.]
Summary
• Micro-blog spamming is becoming increasingly lucrative and problematic.
• We are working on a ranking that is sensitive to the trustworthiness and relevance of micro-blogs.
• We model the tweet space as a tri-layer graph containing a tweet layer, a user layer, and a web-page layer.
• The ranking is derived from users, tweets, and the prestige of the referred web pages.
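One possible in-memory shape for the tri-layer graph, using the relations named on the Twitter eco-system slide (Tweeted By, Tweeted URL, Followers, Hyperlinks) plus agreement edges inside the tweet layer. This is a sketch under assumptions: the class, field, and method names are hypothetical, and how the ranking is propagated across the layers is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

@dataclass
class TriLayerGraph:
    # Node sets for the three layers.
    tweets: Set[str] = field(default_factory=set)
    users: Set[str] = field(default_factory=set)
    pages: Set[str] = field(default_factory=set)
    # Cross-layer edges.
    tweeted_by: Dict[str, str] = field(default_factory=dict)                            # tweet -> author
    tweeted_url: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))  # tweet -> pages
    # Intra-layer edges.
    followers: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))    # user -> followers
    hyperlinks: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))   # page -> linked pages
    agreement: Dict[Tuple[str, str], float] = field(default_factory=dict)               # tweet pair -> weight

    def add_tweet(self, tweet_id: str, user_id: str, urls: Iterable[str] = ()) -> None:
        """Add a tweet, its author (Tweeted By), and any URLs it contains (Tweeted URL)."""
        self.tweets.add(tweet_id)
        self.users.add(user_id)
        self.tweeted_by[tweet_id] = user_id
        for url in urls:
            self.pages.add(url)
            self.tweeted_url[tweet_id].add(url)

    def add_follower(self, user_id: str, follower_id: str) -> None:
        """Record that `follower_id` follows `user_id`."""
        self.users.update((user_id, follower_id))
        self.followers[user_id].add(follower_id)

    def add_agreement(self, t1: str, t2: str, weight: float) -> None:
        """Store an undirected agreement edge once, keyed by the sorted pair."""
        self.agreement[tuple(sorted((t1, t2)))] = weight

g = TriLayerGraph()
g.add_tweet("t1", "u1", ["http://example.com/a"])
g.add_follower("u1", "u2")
g.add_agreement("t2", "t1", 0.7)
print(g.tweeted_by["t1"], sorted(g.users), g.agreement)
```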