What is Twitter, a Social Network or a News Media?

Haewoon Kwak, Changhyun Lee, Hosung Park, Sue Moon
Department of Computer Science, KAIST, Korea
19th International World Wide Web Conference (WWW2010)
Friday, April 30, 2010
TWITTER
Twitter, a microblog service
• Write a short message
• Read neighbors’ tweets
In most OSNs (online social networks)
“We are friends.”
In Twitter
“I follow you.”
Following on Twitter
“Unlike most social networks, following on Twitter
is not mutual. Someone who thinks you're
interesting can follow you, and you don't have to
approve, or follow back.”
http://help.twitter.com/entries/14019-what-is-following
Following = subscribing to tweets
• You see the recent tweets of your followings
http://blog.marsdencartoons.com/2009/06/18/cartoon-iranian-election-demonstrations-and-twitter/marsden-iran-twitter72/
PROBLEM STATEMENT
The goal of this work
We analyze how directed relations of following set Twitter apart from existing OSNs.
Then, we examine whether Twitter has any characteristics of news media.
me⋅di⋅a [mee-dee-uh]
1. a pl. of medium
2. the means of communication, as radio and television, newspapers, and magazines, that reach or influence people widely
http://dictionary.reference.com/
OUTLINE
Summary of our findings
1. Following is mostly not reciprocated (not so “social”)
2. Users talk about timely topics
3. A few users reach a large audience directly
4. Most users can reach a large audience quickly by WOM*
*WOM: word-of-mouth
Data collection (2009/6/1 to 2009/9/24)
• 41.7M user profiles (near-complete at that time)
• 1.47B following relations
• 4,262 trending topics
• 106M tweets mentioning trending topics
  ‣ Spam tweets removed by CleanTweets
*Publicly available: http://an.kaist.ac.kr/traces/WWW2010.html
How we crawled
• Twitter’s well-defined 3rd-party API
• With 20+ ‘whitelisted’ IPs
  ‣ Sending 20,000 requests per IP per hour
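These numbers can be checked with simple arithmetic. The sketch below assumes exactly 20 IPs (the slide says “20+”). It also assumes, purely hypothetically, one request per user profile. A real crawl also needs requests for paging through following lists and for retries, so treat this as a rough lower bound rather than the authors’ actual schedule.

```python
# Back-of-the-envelope crawl budget (a sketch, not the authors' crawler).
# Assumptions: exactly 20 whitelisted IPs and, hypothetically, one request per profile.
NUM_IPS = 20
REQUESTS_PER_IP_PER_HOUR = 20_000


def requests_per_hour(num_ips: int, per_ip: int) -> int:
    """Aggregate request budget per hour across all whitelisted IPs."""
    return num_ips * per_ip


def hours_needed(total_requests: int, num_ips: int, per_ip: int) -> float:
    """Hours needed to issue total_requests at the aggregate hourly rate."""
    return total_requests / requests_per_hour(num_ips, per_ip)


if __name__ == "__main__":
    print(requests_per_hour(NUM_IPS, REQUESTS_PER_IP_PER_HOUR))  # 400000
    # Hypothetical: one request for each of the 41.7M user profiles
    print(hours_needed(41_700_000, NUM_IPS, REQUESTS_PER_IP_PER_HOUR))  # 104.25
```

Under these assumptions, 20 IPs give 400,000 requests per hour. A single pass over 41.7M profiles at one request each would then take about 104 hours, roughly 4.3 days.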
Recent studies
• Ranking methodologies [WSDM’10]
• Predicting movie profits [HYPERTEXT’10]
• Recommending users [CHI’10 microblogging]
• Detecting real time events [WWW’10]
• The ‘entire’ Twittersphere remains unexplored
TRANSITION
Part I. Following is mostly not reciprocated (not so “social”)
1. ACTIVE SUBSCRIPTION
Why do people follow others?
• Reflection of offline social relationships
• Otherwise, subscription to others’ messages
Sociologists’ answer
• “Reciprocal interactions pervade every relation
of primitive life and in all social systems”
Is following reciprocal?
• Only 22.1% of user pairs follow each other
• Much lower than
  ‣ 68% on Flickr
  ‣ 84% on Yahoo! 360
  ‣ 77% on Cyworld guestbook messages
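The sketch below shows one way to compute this kind of reciprocity figure from a list of following relations. It assumes “user pairs” means pairs connected by at least one following relation, which the slide does not spell out. The function name and the toy graph are hypothetical.

```python
from typing import Iterable, Tuple


def reciprocity(edges: Iterable[Tuple[str, str]]) -> float:
    """Share of connected user pairs in which both users follow each other.

    edges: (follower, followee) pairs. Duplicate edges and self-follows are
    ignored. A pair counts as connected if at least one user follows the other.
    """
    directed = {(u, v) for u, v in edges if u != v}
    connected_pairs = {frozenset(edge) for edge in directed}
    mutual_pairs = {frozenset((u, v)) for u, v in directed if (v, u) in directed}
    return len(mutual_pairs) / len(connected_pairs) if connected_pairs else 0.0


# Toy graph: a and b follow each other; a -> c and c -> d are one-way.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(reciprocity(edges))  # 1 mutual pair out of 3 connected pairs
```

Working on unordered pairs (`frozenset`) means a mutual relation counts once, not twice.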
Low reciprocity of following
• Following is not used in the same way as friendship in other OSNs
  ‣ Not a reflection of offline social relationships
• Active subscription to tweets!
Part II. Users talk about timely topics
2. TIMELY TOPICS
Dynamically changing trends
User participation pattern can be a signature of a topic
5.3 User Participation in Trending Topics
How many topics does a user participate in on average? Out of 41 million Twitter users, a large number of users (8,262,545) participated in trending topics, and about 15% of those users participated in more than 10 topics during four months.
Figure 11: Cumulative numbers of tweets and users over time: (a) topic ‘apple’, (b) topic ‘#iranelection’
Majority of topics are headline news
• 31.5% “ephemeral”
• 54.3% “headline news”
• 7.3% “persistent news”
• 6.9%
Figure 13: The examples of classified popularity patterns: (a) Exogenous subcritical (topic ‘#backintheday’), (b) Exogenous critical (topic ‘beyonce’), (c) Endogenous subcritical (topic ‘lynn harris’), (d) Endogenous critical (topic ‘#redsox’)
Part III. A few users reach a large audience directly
3. A FEW HUBS
Before we delve into the eccentricities and peculiarities of Twitter, we run a batch of well-known analyses and present a summary.
3.1 Basic Analysis
How many followers does a user have?
Figure 1: Number of followings and followers
Complementary Cumulative Distribution Function (CCDF)
• CCDF(x = k): the fraction of nodes with degree k or more
$$\wp(k) \equiv \int_{k}^{\infty} P(k')\,dk' \sim k^{-\alpha}$$
• Power-law degree distributions attest to the existence of a relatively small number of nodes with a very large number of links. They also have distinguishing properties, such as a vanishing epidemic threshold, ultra-small worldness, and robustness against random errors [11, 12, 13, 14].
• When the degree distribution is plotted as a CCDF on a log-log plot, a power-law distribution shows up as a straight line, and its exponent is a representative characteristic that distinguishes one power-law distribution from another.
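The CCDF definition above translates directly into code. This minimal sketch builds the empirical CCDF from a list of degrees. It then fits a least-squares line on log-log axes, whose slope is roughly −α when the CCDF follows k^−α. The slide does not say how the exponent was estimated. A log-log least-squares fit is only a quick diagnostic; maximum-likelihood fitting is generally preferred for estimating power-law exponents.

```python
import math
from typing import Dict, Sequence


def empirical_ccdf(degrees: Sequence[int]) -> Dict[int, float]:
    """For each observed degree k, the fraction of nodes with degree >= k."""
    n = len(degrees)
    counts: Dict[int, int] = {}
    for d in degrees:
        counts[d] = counts.get(d, 0) + 1
    ccdf: Dict[int, float] = {}
    remaining = n  # number of nodes whose degree is >= the current k
    for k in sorted(counts):
        ccdf[k] = remaining / n
        remaining -= counts[k]
    return ccdf


def loglog_slope(ccdf: Dict[int, float]) -> float:
    """Least-squares slope of log CCDF against log k, using only k > 0.

    Needs at least two distinct positive degrees. For a CCDF that behaves
    like k^(-alpha), the slope is roughly -alpha.
    """
    points = [(math.log(k), math.log(p)) for k, p in ccdf.items() if k > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


print(empirical_ccdf([1, 1, 2, 3]))  # {1: 1.0, 2: 0.5, 3: 0.25}
```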
Plenty of super-hubs (Figure 1)
More super-hubs than projected by a power law
• Where do they get all the followers? Possibly from...
  ‣ Search by ‘name’
  ‣ Recommendation by Twitter
• They reach millions in one hop
… followings of the top 40 users is 114, three orders of magnitude smaller than the number of followers. We revisit the issue of reciprocity in Section 3.3.
3.2 Followers vs. Tweets
Are those who have many followers active?
Figure 2: The number of followers and that of tweets per user
How we plotted (example: Med. = 9, marked ×; Avg. = 8)
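The sketch below shows one way to produce such a plot. It assumes users are grouped by their exact follower count and that the median and average tweet counts are computed per group. That grouping rule is an assumption; sparse, high follower counts may need logarithmic bins instead. The hypothetical group of three users reproduces the slide’s example values, Med. = 9 and Avg. = 8.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple


def tweets_per_follower_count(
    users: Iterable[Tuple[int, int]]
) -> Dict[int, Tuple[float, float]]:
    """Group users by exact follower count.

    users: (followers, tweets) for each user.
    Returns {followers: (median tweets, average tweets)}, sorted by followers.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for followers, tweets in users:
        groups[followers].append(tweets)
    return {f: (median(t), mean(t)) for f, t in sorted(groups.items())}


# Hypothetical users: three with 100 followers, one with 5 followers.
users = [(100, 9), (100, 3), (100, 12), (5, 1)]
print(tweets_per_follower_count(users))  # {5: (1, 1), 100: (9, 8)}
```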
• More followers, more tweets
• Many followers without activity
Twitter user rankings by followers, PageRank, and RT
4.3 Comparison among Rankings
Great discrepancy among rankings
… independent news media based on online distribution. Ranking by retweets shows the rise of alternative media in Twitter.
Figure 8: Comparison among rankings
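The sketch below makes the comparison concrete on a toy follow graph. It ranks users by follower count and by PageRank, computed with plain power iteration. It assumes rank flows from follower to followee, a damping factor of 0.85, and uniform redistribution from users who follow nobody; none of these parameters is stated on the slide. A retweet-based ranking would instead count retweets per user.

```python
from typing import Dict, Iterable, List, Tuple


def pagerank(edges: Iterable[Tuple[str, str]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """PageRank by power iteration on a follow graph.

    Each (follower, followee) edge passes rank from the follower to the followee.
    Users who follow nobody (dangling nodes) spread their rank uniformly.
    """
    out: Dict[str, List[str]] = {}
    for u, v in set(edges):  # ignore duplicate edges
        out.setdefault(u, []).append(v)
        out.setdefault(v, [])
    nodes = sorted(out)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not out[u])
        base = (1 - damping) / n + damping * dangling / n
        new = {u: base for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def follower_counts(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct followers per user (users with none get 0)."""
    counts: Dict[str, int] = {}
    for u, v in set(edges):
        counts.setdefault(u, 0)
        counts[v] = counts.get(v, 0) + 1
    return counts


def ranking(scores: Dict[str, float]) -> List[str]:
    """Users sorted by descending score, ties broken by name."""
    return sorted(scores, key=lambda u: (-scores[u], u))


edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
print(ranking(follower_counts(edges)))  # ['hub', 'a', 'b', 'c']
print(ranking(pagerank(edges)))
```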
Part IV. Most users can reach a large audience quickly by word-of-mouth (WOM)
4. WORD-OF-MOUTH
Which is more efficient for WOM?
In Twitter, information flows in the opposite direction of following: from the followed user to the follower
… rather a source of information than a social networking site. Further validation is out of the scope of this paper, and we leave it for future work.
3.4 Degree of Separation
Average path length: 4.1
Figure 4: Degree of separation
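The average path length can be estimated with breadth-first search. This sketch counts hops along the direction of the given edges and averages over reachable pairs only. On a graph of 41.7M users, running BFS from every node is expensive, so in practice one would sample source nodes. That sampling step is an assumption, not a description of the authors’ method.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Directed adjacency lists; every node appears as a key."""
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    return adj


def hop_distances(adj: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj: Dict[str, List[str]], sources: Iterable[str]) -> float:
    """Mean hop count over all (source, reachable target) pairs, target != source."""
    total = pairs = 0
    for s in sources:
        for node, d in hop_distances(adj, s).items():
            if node != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


adj = adjacency([("a", "b"), ("b", "c")])
print(average_path_length(adj, adj))  # (1 + 2 + 1) / 3
```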
Retweet (RT)
• Relay tweets from a following to followers
• Example: node 0 posts “Last day of WWW’10”, and the tweet reaches each of node 0’s followers. One of them, node 4, retweets it as “RT @node0 Last day of WWW’10”, and the retweet reaches node 4’s own followers.
• Notation: w = writer, r = retweeter
• A retweet reaches not only the writer’s 1-hop neighbors
• It goes further, e.g., to 2-hop neighbors
We construct RT trees
• A tree with the writer (w) as its root and the retweeter(s) (r) as its other nodes
Height of RT trees
(Figure: example RT trees rooted at a writer W, with heights 1, 1, and 2)
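Building an RT tree and measuring its height can be sketched as follows. The input format is hypothetical: (retweeter, source) records, where source is the user the tweet was relayed from. The code assumes these records form a tree with no cycles. Height counts edges, so a writer whose tweet is retweeted only by direct followers has height 1.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def build_rt_tree(retweets: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Children lists of an RT tree.

    retweets: hypothetical (retweeter, source) records, where source is the
    user the tweet was relayed from (the writer or an earlier retweeter).
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for retweeter, source in retweets:
        children[source].append(retweeter)
    return children


def tree_height(children: Dict[str, List[str]], root: str) -> int:
    """Edges on the longest path from the writer (root) down to a retweeter."""
    height = 0
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        height = max(height, depth)
        for child in children.get(node, []):
            queue.append((child, depth + 1))
    return height


star = build_rt_tree([("r1", "w"), ("r2", "w"), ("r3", "w")])
chain = build_rt_tree([("r1", "w"), ("r2", "r1"), ("r3", "w")])
print(tree_height(star, "w"), tree_height(chain, "w"))  # 1 2
```

The share of height-1 trees (96% on the slide) would then be the fraction of trees for which `tree_height` returns 1.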
Empirical RT trees
• 96% of RT trees have height 1
Figure 15: Retweet trees of ‘air france flight’ tweets
Figure 16: Height and participating users in retweet trees
Boosting audience by RT
6.1 Audience Size of Retweet
… beyond adjacent neighbors. We dig into the retweet trees constructed per trending topic and examine key factors that impact the eventual spread of information.
Figure 14: Average and median numbers of additional recipients of the tweet via retweeting
Additional readers
• Example: the retweeter r has 3 followers, and 2 of them are additional readers brought by r, i.e., users not already reached by the writer w’s original tweet
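Counting additional readers amounts to a set difference. This minimal sketch assumes that the writer and the writer’s followers have already received the tweet, so only the retweeter’s other followers count. The user names are hypothetical stand-ins for the slide’s diagram.

```python
from typing import Set


def additional_readers(writer: str, writer_followers: Set[str],
                       retweeter_followers: Set[str]) -> Set[str]:
    """Followers of a retweeter who were not already reached by the original tweet.

    The writer and the writer's own followers saw the tweet already,
    so only the remaining followers of the retweeter are additional readers.
    """
    already_reached = writer_followers | {writer}
    return retweeter_followers - already_reached


# Hypothetical version of the slide's example: r follows w and has 3 followers,
# one of whom (c) already follows w.
w_followers = {"r", "c"}
r_followers = {"a", "b", "c"}
print(sorted(additional_readers("w", w_followers, r_followers)))  # ['a', 'b']
```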
A retweet brings a few hundred additional readers (Figure 14)
Time lag between hops in RT tree
… retweets appear and how long they last. Figure 17 plots the time lag from a tweet to its retweet. Half of retweeting occurs within an hour, and 75% under a day. However, about 10% of retweets take place a month later, …
Fast relaying of tweets by RT:
• 35% of RTs < 10 min.
• 55% of RTs < 1 hr.
Figure 17: Time lag between a retweet and the original tweet
SUMMARY
1. We study the entire Twittersphere
2. Low reciprocity distinguishes Twitter from other OSNs
3. Twitter has characteristics of news media:
‣ Tweets mentioning timely topics
‣ Plenty of hubs reaching a large public directly
‣ Fast and wide spread of word-of-mouth
Resources
• http://an.kaist.ac.kr/traces/WWW2010.html
Supplementary info.
A few numbers
• 105M registered accounts
• 55M tweets a day
• 180M unique visitors a month
• 19B searches a month
Source: http://chirp.twitter.com/
Homophily in terms of followers: assortative mixing
This can be interpreted as a large following in another continent. We conclude that Twitter users who have fewer than 2,000 reciprocal relations are likely to be geographically close.
Figure 6: The average number of followers of r-friends (reciprocal friends) per user
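Figure 6 relates each user to the followers of their r-friends. This sketch assumes r-friends are users who follow each other. For every user with at least one r-friend, it computes the mean follower count of those r-friends. Plotting that value against the user’s own follower count is one way to look for assortative mixing. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def r_friends(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reciprocal friends: v is an r-friend of u if u follows v and v follows u."""
    follows = {(u, v) for u, v in edges if u != v}
    friends: Dict[str, Set[str]] = defaultdict(set)
    for u, v in follows:
        if (v, u) in follows:
            friends[u].add(v)
    return friends


def avg_followers_of_r_friends(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """For each user with r-friends, the mean follower count of those r-friends."""
    follows = {(u, v) for u, v in edges if u != v}
    followers: Dict[str, int] = defaultdict(int)
    for _, followee in follows:
        followers[followee] += 1
    return {u: mean(followers[f] for f in fs) for u, fs in r_friends(follows).items()}


# Hypothetical graph: a <-> b, a <-> c, plus one-way followers x, y -> b and z -> c.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"),
         ("x", "b"), ("y", "b"), ("z", "c")]
print(avg_followers_of_r_friends(edges))
```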
Homophily in terms of location
Figure 5: The average time differences between a user and r-friends
Favoritism in RTs?
• A few informative users?
Disparity in weighted network
… evenly among one’s followers. How evenly is information diffused through retweets? To answer this question, we investigate disparity [2] in retweet trees.
For each user i, we define |r_ij| as the number of retweets from user j. Then Y(k, i) is defined as follows:
$$Y(k,i) = \sum_{j=1}^{k} \left( \frac{|r_{ij}|}{\sum_{l=1}^{k} |r_{il}|} \right)^{2} \qquad (3)$$
Y(k) represents Y(k, i) averaged over all nodes that have k outgoing (incoming) edges. Here an edge represents a retweet. When retweeting occurs evenly among followers, each of the k terms equals 1/k², so Y(k) = 1/k and kY(k) ∼ 1. If most of retweeting occurs within a subset of followers, then kY(k) ∼ k.
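Equation (3) and the two regimes of kY(k) can be checked numerically. In this sketch, each user maps to a hypothetical dictionary from neighbor j to |r_ij|, assumed positive. Whether neighbors are taken along outgoing or incoming retweet edges is up to the caller, mirroring the two panels of Figure 19.

```python
from collections import defaultdict
from typing import Dict, List


def disparity(retweet_counts: Dict[str, int]) -> float:
    """Y(k, i) for one user i, following Eq. (3).

    retweet_counts maps each of the user's k neighbors j to |r_ij| (> 0).
    """
    total = sum(retweet_counts.values())
    return sum((c / total) ** 2 for c in retweet_counts.values())


def k_times_y(per_user: Dict[str, Dict[str, int]]) -> Dict[int, float]:
    """{k: k * Y(k)}, where Y(k) averages Y(k, i) over users with k neighbors."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for counts in per_user.values():
        if counts and sum(counts.values()) > 0:
            groups[len(counts)].append(disparity(counts))
    return {k: k * sum(ys) / len(ys) for k, ys in sorted(groups.items())}


even = {"u": {"a": 2, "b": 2, "c": 2, "d": 2}}
skewed = {"u": {"a": 97, "b": 1, "c": 1, "d": 1}}
print(k_times_y(even))    # {4: 1.0}
print(k_times_y(skewed))  # roughly {4: 3.76}, i.e. close to k
```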
Favoritism in RTs
Figures 19(a) and 19(b) show a linear correlation up to 1,000 followers. The linear correlation to k represents favoritism in retweets: people retweet from only a small number of people, and only a subset of a user’s followers actually retweet. Chun et al. also report that favoritism exists in conversations in the guestbook logs of Cyworld, the biggest social network in Korea [5].
Figure 19: Disparity in retweet trees: (a) k_out Y(k_out) ∼ k_out, (b) k_in Y(k_in) ∼ k_in
Fast WOM by retweet
… responsive and basically occur back to back up to 5 hops away. Cha et al. report that favorite photos diffuse on the order of days in Flickr [4]. The strength of Twitter as a medium for information diffusion stands out in the speed of retweets.
Figure 18: Elapsed time of retweet from (n − 1) hop to n hop