What is Twitter, a Social Network or a News Media?
Haewoon Kwak, Changhyun Lee, Hosung Park, Sue Moon
Department of Computer Science, KAIST, Korea
19th International World Wide Web Conference (WWW2010)
Friday, April 30, 2010

TWITTER

Twitter, a microblog service
• Write a short message
• Read neighbors’ tweets

In most OSNs
“We are friends.”

In Twitter
“I follow you.”

Following on Twitter
“Unlike most social networks, following on Twitter is not mutual. Someone who thinks you're interesting can follow you, and you don't have to approve, or follow back.”
http://help.twitter.com/entries/14019-what-is-following

Following = subscribing to tweets
[Screenshot: recent tweets of followings]

[Cartoon]
http://blog.marsdencartoons.com/2009/06/18/cartoon-iranian-election-demonstrations-and-twitter/marsden-iran-twitter72/

PROBLEM STATEMENT

The goal of this work
We analyze how the directed relations of following set Twitter apart from existing OSNs. Then, we see if Twitter has any characteristics of news media.

me⋅di⋅a [mee-dee-uh]
1. a pl. of medium
2. the means of communication, as radio and television, newspapers, and magazines, that reach or influence people widely
http://dictionary.reference.com/

1. Following is mostly not reciprocated (not so “social”)
2. Users talk about timely topics
3. A few users reach a large audience directly
4.
Most users can reach a large audience by WOM* quickly
*WOM: word-of-mouth
(OUTLINE: Summary of our findings)

TWITTER

Data collection (2009/6/1 to 9/24)
• 41.7M user profiles (near-complete at that time)
• 1.47B following relations
• 4,262 trending topics
• 106M tweets mentioning trending topics
  ‣ Spam tweets removed by CleanTweets
*Publicly available: http://an.kaist.ac.kr/traces/WWW2010.html

How we crawled
• Twitter’s well-defined 3rd party API
• With 20+ ‘whitelisted’ IPs
  ‣ Send 20,000 requests per IP per hour

Recent studies
• Ranking methodologies [WSDM’10]
• Predicting movie profits [HYPERTEXT’10]
• Recommending users [CHI’10 microblogging]
• Detecting real time events [WWW’10]
• The ‘entire’ Twittersphere unexplored

TRANSITION

Part I.
1. Following is mostly not reciprocated (not so “social”)
2. Users talk about timely topics
3. A few users reach a large audience directly
4. Most users can reach a large audience by WOM* quickly

2. ACTIVE SUBSCRIPTION

Why do people follow others?
• Reflection of offline social relationships
  otherwise,
• Subscription to others’ messages

Sociologists’ answer
• “Reciprocal interactions pervade every relation of primitive life and in all social systems”

Is following reciprocal?
• Only 22.1% of user pairs follow each other
• Much lower than
  ‣ 68% on Flickr
  ‣ 84% on Yahoo! 360
  ‣ 77% on Cyworld guestbook messages

Low reciprocity of following
• Following is not used the way ‘friend’ is used in OSNs
  ‣ Not a reflection of offline social relationships
• Active subscription to tweets!

TRANSITION

Part II.
1. Following is mostly not reciprocated (not so “social”)
2. Users talk about timely topics
3. A few users reach a large audience directly
4.
Most users can reach a large audience by WOM* quickly

1. TIMELINESS TOPIC

Dynamically changing trends

User participation pattern can be a signature of a topic
How many topics does a user participate in on average? Out of 41 million Twitter users, a large number of users (8,262,545) participated in trending topics, and about 15% of those users participated in more than 10 topics during four months.
[Figure 11: Cumulative numbers of tweets and users over time. (a) Topic ‘apple’ (b) Topic ‘#iranelection’ (Section 5.3, User Participation in Trending Topics)]

Majority of topics are headline topics
• 31.5% “ephemeral”
• 54.3% “headline news”
• 7.3% “persistent news”
• 6.9%
[Figure 13: The examples of classified popularity patterns. (a) Exogenous subcritical (topic ‘#backintheday’) (b) Exogenous critical (topic ‘beyonce’) (c) Endogenous subcritical (topic ‘lynn harris’) (d) Endogenous critical (topic ‘#redsox’)]

TRANSITION

Part III.
1. Following is mostly not reciprocated (not so “social”)
2. Users talk about timely topics
3. A few users reach a large audience directly
4. Most users can reach a large audience by WOM* quickly

3. A FEW HUBS

How many followers does a user have?
[Figure 1: Number of followings and followers (Section 3.1, Basic Analysis)]
CCDF
• Complementary Cumulative Density Function
• $\mathrm{CCDF}(x = k) = \int_{k}^{\infty} P(x)\,dx$
The degree distribution is often plotted as a complementary cumulative distribution function (CCDF), $\wp(k) \equiv \int_{k}^{\infty} P(k')\,dk' \sim k^{-\alpha}$. A power-law distribution shows up as a straight line in a log-log plot.

Reading the graph
[Figure 1: Number of followings and followers (Section 3.1, Basic Analysis)]

Plenty of super-hubs
[Figure 1: Number of followings and followers (Section 3.1, Basic Analysis)]

More super-hubs than projected by power-law
• Where do they get all the followers? Possibly from...
  ‣ Search by ‘name’
  ‣ Recommendation by Twitter
• They reach millions in one hop

Are those who have many followers active?
[Figure 2: The number of followers and that of tweets per user (Section 3.2, Followers vs. Tweets)]

How we plotted
[Figure annotations: median and average (×) markers; values “= 9” and “= 8”]

More followers, more tweets
[Figure 2: The number of followers and that of tweets per user (Section 3.2, Followers vs. Tweets)]
Many followers without activity
[Figure 2: The number of followers and that of tweets per user (Section 3.2, Followers vs. Tweets)]

Twitter user rankings by Followers, PageRank and RT

Great discrepancy among rankings
Ranking by the retweets shows the rise of alternative media in Twitter.
[Figure 8: Comparison among rankings (Section 4.3, Comparison among Rankings)]

TRANSITION

Part IV.
1. Following is mostly not reciprocated (not so “social”)
2. Users talk about timely topics
3. A few users reach a large audience directly
4. Most users can reach a large audience by WOM* quickly
*WOM: word-of-mouth

4. WORD-OF-MOUTH

Which is more efficient for WOM?

In Twitter
[Diagram labels: Information, Following]

Average path length: 4.1
[Figure 4: Degree of separation (Section 3.4, Degree of Separation)]

Retweet (RT)
• Relay tweets from a following to followers
[Example: node 0 tweets “Last day of WWW’10”; the tweet is delivered to its followers; node 4 retweets it as “RT @node0 Last day of WWW’10”]
[Example, continued: “RT @node0 Last day of WWW’10” is delivered to the retweeter’s followers]
[Diagram legend: w = writer, r = retweeter]

Retweet (RT)
• Not only 1 hop neighbors
• More goes further
[Diagram: 1 hop and 2 hop neighbors of writer w and retweeter r]

We construct RT tree
• A tree with writer and retweeter(s)

Height of RT trees
[Diagram: three example RT trees rooted at writer W, with heights 1, 1 and 2]

Empirical RT trees

96% of RT trees = Height 1
[Figure 15: Retweet trees of ‘air france flight’ tweets]
[Figure 16: Height and participating users in retweet trees]

Boosting audience by RT
We dig into the retweet trees constructed per trending topic and examine key factors that impact the eventual spread of information.
[Figure 14: Average and median numbers of additional recipients of the tweet via retweeting (Section 6.1, Audience Size of Retweet)]

Additional readers
[Diagram: writer w has 3 followers; retweeter r brings 2 additional readers]

A retweet brings a few hundred additional readers
[Figure 14: Average and median numbers of additional recipients of the tweet via retweeting (Section 6.1, Audience Size of Retweet)]

Figure 17 plots the time lag from a tweet to its retweet. Half of retweeting occurs within an hour, and 75% under a day.
However, about 10% of retweets take place a month later, […]

Time lag between hops in RT tree
[Figure 17: Time lag between a retweet and the original tweet]

Fast relaying tweets by RT
• 35% of RT < 10 min.
• 55% of RT < 1 hr.

SUMMARY

1. We study the entire Twittersphere
2. Low reciprocity distinguishes Twitter from OSNs
3. Twitter has characteristics of news media:
  ‣ Tweets mentioning timely topics
  ‣ Plenty of hubs reaching a large public directly
  ‣ Fast and wide spread of word-of-mouth

Resources
• http://an.kaist.ac.kr/traces/WWW2010.html

SUPPLEMENTARY INFO

About Twitter
[Quotation not recovered]

A few numbers
• 105M registered accounts
• 55M tweets a day
• 180M unique visitors a month
• 19B searches a month
http://chirp.twitter.com/
Homophily in terms of followers: assortative mixing
[Figure 6: The average number of followers of r-friends per user]
This can be interpreted as a large following in another continent. We conclude that Twitter users who have reciprocal relations of fewer than 2,000 are likely to be geographically close.

Homophily in terms of location
[Figure 5: The average time differences between a user and r-friends]

Favoritism in RTs?
• A few informative users?

Disparity in weighted network
How even is the information diffusion in retweet? To answer this question we investigate disparity [2] in retweet trees. For each user i we define |r_ij| as the number of retweets from user j. Y(k, i) is defined as follows:

$$Y(k, i) = \sum_{j=1}^{k} \left( \frac{|r_{ij}|}{\sum_{l=1}^{k} |r_{il}|} \right)^{2} \qquad (3)$$

Y(k) represents Y(k, i) averaged over all nodes that have k outgoing (incoming) edges. Here an edge represents a retweet. When retweeting occurs evenly among followers, then kY(k) ∼ 1. If most of retweeting occurs within a subset of followers, then kY(k) […]

Favoritism in RTs
Figures 19(a) and 19(b) show a linear correlation up to 1,000 followers. The linear correlation to k represents favoritism in retweets: people only retweet from a small number of people, and only a subset of a user’s followers actually retweet. Chun et al. also report that favoritism exists in conversation from guestbook logs of Cyworld, the biggest social network in Korea [5].
[Figure 19: Disparity in retweet trees. (a) k_out Y(k_out) ∼ k_out (b) k_in Y(k_in) ∼ k_in]
Fast WOM by retweet
[…] responsive and basically occur back to back up to 5 hops away. Cha et al. report that favorite photos diffuse in the order of days in Flickr [4]. The strength of Twitter as a medium for information diffusion stands out by the speed of retweets.
[Figure 18: Elapsed time of retweet from (n − 1) hop to n hop]
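
Supplementary sketch: disparity in code
The disparity measure Y(k, i) from the "Disparity in weighted network" slide can be written out in a few lines of Python. This is a minimal illustration, not the authors' analysis code. The function names `disparity` and `k_times_avg_disparity`, the input layout (a mapping from each user to the list of retweet counts |r_ij| over that user's retweet partners), and the toy counts are all assumptions made for the example.

```python
from collections import defaultdict


def disparity(retweet_counts):
    """Y(k, i) for one user i: sum over partners j of (|r_ij| / sum_l |r_il|)^2."""
    total = sum(retweet_counts)
    if total == 0:
        raise ValueError("user has no retweets")
    return sum((r / total) ** 2 for r in retweet_counts)


def k_times_avg_disparity(counts_by_user):
    """Return {k: k * Y(k)}, where Y(k) averages Y(k, i) over users with k partners.

    counts_by_user is a hypothetical layout: user id -> list of retweet counts,
    one entry per retweet partner (one entry per edge in the retweet network).
    """
    by_degree = defaultdict(list)
    for counts in counts_by_user.values():
        positive = [r for r in counts if r > 0]  # an edge exists only if a retweet happened
        if positive:
            by_degree[len(positive)].append(disparity(positive))
    return {k: k * sum(ys) / len(ys) for k, ys in by_degree.items()}


# Toy data (made up): one user retweeted evenly, one dominated by a single partner.
toy = {"even_user": [5, 5, 5, 5], "skewed_user": [97, 1, 1, 1]}
even_ky = 4 * disparity(toy["even_user"])      # even retweeting gives k * Y = 1
skewed_ky = 4 * disparity(toy["skewed_user"])  # concentrated retweeting pushes k * Y toward k
print(even_ky, skewed_ky)
print(k_times_avg_disparity(toy))
```

With perfectly even retweeting, k·Y(k) stays at 1. When one partner dominates, it moves toward k, which is the pattern the Figure 19 panel titles report (k_out Y(k_out) ∼ k_out and k_in Y(k_in) ∼ k_in).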