Talk Slides - Raju Balakrishnan's Home Page

Ranking Tweets Considering Trust and Relevance
Srijith Ravikumar, Raju Balakrishnan, and Subbarao Kambhampati
Arizona State University
1
• Twitter is one of the most prominent micro-blogging services.
• Twitter has over 140 million active users, generates over 340 million tweets daily, and handles over 1.6 billion search queries per day.
• Users access tweets by following other users and by using the search function.
2
Twitter Search
Results for the Query: “Britney Spears”
• Results are sorted in reverse chronological order.
• The single most retweeted tweet is selected as the top tweet.
• No relevance metrics are applied.
• Results contain spam and untrustworthy tweets.
3
TweetRank
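[Diagram: the user sends a Query to TweetRank and receives the Top N Results; TweetRank sends the Query to Twitter and receives the Top K Results]
TweetRank acts as a mediator between the user and Twitter.
K is much larger than N, which lets TweetRank eliminate untrustworthy results.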
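The mediator idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `mediate`, `fetch_top_k`, `rerank`, `fake_search`, and `longest_first` are hypothetical names, the search backend is a stub rather than the real Twitter API, and the length-based reranker is a placeholder for an actual trust and relevance ranking.

```python
from typing import Callable, List


def mediate(
    query: str,
    fetch_top_k: Callable[[str, int], List[str]],
    rerank: Callable[[List[str]], List[str]],
    k: int = 200,
    n: int = 10,
) -> List[str]:
    """Fetch K candidates from the backend, rerank them, return the best N."""
    if n > k:
        raise ValueError("N must not exceed K")
    candidates = fetch_top_k(query, k)   # over-fetch: K is much larger than N
    return rerank(candidates)[:n]        # keep only the top N after reranking


# Stub backend (assumption): a tiny in-memory "search" used for illustration.
POOL = [
    "Britney Spears engaged",
    "britney!!!",
    "Britney Spears engaged to Jason Trawick",
    "unrelated tweet",
]


def fake_search(query: str, k: int) -> List[str]:
    return [t for t in POOL if query.lower() in t.lower()][:k]


# Placeholder reranker (assumption): longer tweets first.
def longest_first(tweets: List[str]) -> List[str]:
    return sorted(tweets, key=len, reverse=True)


top = mediate("britney", fake_search, longest_first, k=3, n=2)
```

Because the backend and the reranker are passed in as functions, the same skeleton works with any scoring scheme that orders the K candidates.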
4
Need for Relevance and Trust
The spread of false facts on Twitter has become an everyday event.
• Retweets and users can be bought.
• Relying on them as signals of trustworthiness therefore does not work.
5
Getting Relevant & Trustworthy
Results
• Manual curation is out of the question (unless you are the Government of China :-) )
- How many people would it take to clean up a micro-blog with 140 million active users?
• Automated analysis?
- PageRank uses the explicit links between Web pages to evaluate trust and relevance. But what are the links between tweets?
6
Links in Twitter Space
• Retweet: explicit links between tweets.
• Agreement: implicit links between tweets that contain the same fact.
7
Agreement
• Agreement between two tweets is defined as the amount of similarity in their content.
• Retweets are not counted as agreement, because retweets are unverified endorsements.
• How does agreement capture relevance and trust?
- A tweet that is agreed upon by a large number of other tweets is likely to be popular, and popular tweets are more likely to be relevant.
- Since agreement does not include retweets, the most agreed-upon tweet has the largest number of independent users agreeing on the same fact, and hence it is more trustworthy.
8
Agreement Computation
• Computing agreement properly requires understanding the meaning of each tweet, which calls for natural language processing.
• As a preliminary idea, we compute agreement using Soft TF-IDF with Jaro-Winkler similarity.
• Soft TF-IDF is similar to TF-IDF, except that it also considers similar tokens in the two compared document vectors, in addition to exactly matching terms.
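A minimal sketch of Soft TF-IDF with Jaro-Winkler as the token-level similarity is given below. It follows one common formulation (each token in the first tweet is paired with its closest token in the second tweet if their similarity reaches a threshold); it is not taken from the slides. The threshold `theta=0.9`, the smoothed IDF `log(1 + N/df)`, whitespace tokenization, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def jaro(s, t):
    """Jaro similarity between two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, h in zip(s, s_hit) if h]
    t_seq = [c for c, h in zip(t, t_hit) if h]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by the common prefix (up to 4 characters)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def document_frequencies(docs):
    """Number of documents each token appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def tfidf_vector(tokens, df, n_docs):
    """Unit-length TF-IDF weights; smoothed IDF is an assumption."""
    raw = {w: math.log(c + 1) * math.log(1 + n_docs / max(df[w], 1))
           for w, c in Counter(tokens).items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {w: v / norm for w, v in raw.items()} if norm else {}


def soft_tfidf(s_tokens, t_tokens, df, n_docs, theta=0.9):
    """Sum weight products over tokens whose closest match is >= theta."""
    vs = tfidf_vector(s_tokens, df, n_docs)
    vt = tfidf_vector(t_tokens, df, n_docs)
    score = 0.0
    for w, weight in vs.items():
        sim, closest = max(((jaro_winkler(w, v), v) for v in vt),
                           default=(0.0, None))
        if closest is not None and sim >= theta:
            score += weight * vt[closest] * sim
    return score


docs = [t.split() for t in ("britney spears engaged",
                            "britney spears engagd",
                            "hello world")]
df = document_frequencies(docs)
agreement = soft_tfidf(docs[0], docs[1], df, len(docs))
```

With `theta=1.0` only exact token matches contribute, which reduces the measure to ordinary TF-IDF cosine similarity; lowering the threshold lets the misspelled "engagd" count as agreement with "engaged". Note that this formulation is not guaranteed to be symmetric in its two arguments.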
9
Computing Ranked Results
• A simple voting technique is used to compute the ranked results.
• The agreement of a tweet is the sum of its agreement with all other tweets.
• The tweets are sorted according to their agreement votes, and the top-N results are sent to the user.
[Figure: agreement example over tweets 1, 2, 3; values shown: 1.3, .6, 1.0, .7, .4, 0.0]
10
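The voting step described on this slide can be sketched as follows. The pairwise similarity is passed in as a function and assumed to be symmetric; the Jaccard token overlap used here is only a stand-in for the Soft TF-IDF agreement measure, and the names `rank_by_agreement` and `jaccard` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity (assumption): token-set overlap in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def rank_by_agreement(tweets, similarity, n):
    """Score each tweet by its summed agreement with all others; keep top n."""
    scores = [0.0] * len(tweets)
    for i, j in combinations(range(len(tweets)), 2):
        a = similarity(tweets[i], tweets[j])  # one vote per unordered pair
        scores[i] += a
        scores[j] += a
    order = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
    return [(tweets[i], scores[i]) for i in order[:n]]


tweets = [
    "britney spears engaged",
    "britney spears engaged again",
    "buy cheap followers now",
]
ranked = rank_by_agreement(tweets, jaccard, n=2)
```

In this toy input the two tweets about the engagement vote for each other, while the spam-like tweet receives no agreement and drops out of the top 2.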
Results: Britney Spears
Twitter Results:
• (Oops?!) Britney Spears is Engaged... Again! - its britney: http://t.co/1E9LsaH7
• RT @GMA: Britney Spears Engaged Again http://t.co/5Ly0lga4
• Britney Spears engaged: http://t.co/gpQQ2S6I"

TweetRank Results:
• In entertainment: Britney Spears engaged to marry her longtime boyfriend and former agent Jason Trawick.
• #Britney #Spears #engaged to #boyfriend: #report: LOS ANGELES (Reuters) - Pop star Britney ... http://t.co/PiVU
• Congratulations to Britney Spears and her beau Jason Trawick for getting engaged via a 3.5 carat ring! We are certainly happy for her!
11
Evaluation - Relevance
• The top N results were manually labelled as follows:
Score 0: not related to the topic, or spam
Score 1/3: remotely relevant to the topic
Score 2/3: tweets which have some information on the topic
Score 1: tweets which have a good amount of information
12
Evaluation - Trust
• The top N results were manually labelled as follows:
Score -1: untrustworthy tweets, such as spam or wrong facts
Score 0: tweets which are opinions
Score 1: tweets which contain correct facts
13
Ranking Cost
• The time increases quadratically with the number of tweets.
• Since the computation of agreement is pairwise, it can be easily parallelized using MapReduce.
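The two points above can be illustrated with a small map/reduce-style sketch in plain Python. This is not a Hadoop job, only an assumption about how the work could be split: the map phase handles each tweet pair independently (so pairs could be processed on different workers), and the reduce phase sums the emitted values per tweet. The names `pair_count`, `map_phase`, `reduce_phase`, and `overlap` are hypothetical, and the similarity is a stand-in.

```python
from collections import defaultdict
from itertools import combinations


def pair_count(n):
    """Number of unordered tweet pairs: n(n-1)/2, i.e. quadratic in n."""
    return n * (n - 1) // 2


def overlap(a, b):
    """Stand-in similarity (assumption): number of shared tokens."""
    return float(len(set(a.split()) & set(b.split())))


def map_phase(tweets, similarity):
    """Emit (tweet_index, agreement) twice per pair; pairs are independent."""
    for i, j in combinations(range(len(tweets)), 2):
        a = similarity(tweets[i], tweets[j])
        yield i, a
        yield j, a


def reduce_phase(emitted, n_tweets):
    """Sum the emitted agreement values per tweet index."""
    totals = defaultdict(float)
    for index, value in emitted:
        totals[index] += value
    return [totals[i] for i in range(n_tweets)]


tweets = ["a b c", "a b", "c d", "x y"]
scores = reduce_phase(map_phase(tweets, overlap), len(tweets))
```

Doubling the number of tweets roughly quadruples `pair_count`, which is the quadratic growth noted on the slide; since no pair depends on another, the map phase is the part that parallelizes.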
14
Twitter Eco-System
[Diagram labels: Tweeted URL, Tweeted By, Followers, Hyperlinks]
15
Summary
• Micro-blog spamming is becoming increasingly lucrative and problematic.
• We are working on a ranking that is sensitive to the trustworthiness and relevance of micro-blogs.
• We model the tweet space as a tri-layer graph containing a tweet layer, a user layer, and a web-page layer.
• Ranking is derived from users, tweets, and the prestige of the referred web pages.
16