Influence in Social Media
Kristina Lerman, University of Southern California
CS 599: Social Media Analysis

Identifying influential users in social networks
Hard problem for large online social networks

Topics covered
How do you quantitatively measure influence in social networks? What metrics should you use?
• Cha et al. 2010 “Measuring User Influence on Twitter: The Million Follower Fallacy”, in ICWSM.
• Bakshy et al. 2011 “Everyone’s an Influencer: Quantifying Influence on Twitter”, in WSDM.
• Ghosh & Lerman. 2010 “Predicting Influential Users in Online Social Networks”, in SNA-KDD workshop.

Measuring User Influence in Twitter: The Million Follower Fallacy
Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, Krishna P. Gummadi
Niki Parmar, University of Southern California

Problem
• The notion of influence is important and long studied
  - Business, fashion, voting trends, marketing, etc.
• Measuring influence is difficult
  - It involves human choices and societies, and is complex
• In the modern view, the key factors in measuring influence are
  - Interpersonal relationships among users
  - Readiness to adapt to change
• Influence drives information diffusion, trend setting, friendship, news, gossip, controversial issues, ...
• Can initiate large-scale chain reactions

Influence in Social Media
• Online communication is the new way to receive information
• Influence: “the power or capacity of causing an effect in indirect or intangible ways.”
• Twitter usage
  - 271 million monthly active users
  - 500 million tweets are sent per day
• Understand the roles of different users
  - Through followers, retweets, identity value
• Effective advertisement on Twitter using influential users
  - Target only a few influential people to reach the majority of users
Measuring Influence on Twitter
• Comparison of three influence measures
  - Indegree influence: number of followers of the user, indicating the size of the audience for that user
  - Retweet influence: number of retweets containing one’s name, indicating the ability to generate content with pass-along value
  - Mention influence: number of replies to the user, indicating the ability of the user to engage others in a conversation
• Measure dynamics of user influence over topics and time.
• How to gain and maintain influence? How can ordinary users be influential?

Data Collection
• Collected 54,981,152 in-use user accounts, which were connected to each other by 1,963,263,821 social links.
• Gathered the tweets generated by all users since the early days of the service, which amounts to 1,755,925,520 tweets.
• Focus on the largest component of the network, which contains 99% of the links and tweets.
• After filtering, measure the influence of 6,189,636 users against the entire set of users.

Comparing User Influence
Top users for each measure
• Followers count – public figures and news sources
  They get a lot of attention from their audience
• Retweets – content aggregation services such as news channels and news sources, and businessmen
  They are about the content and contain URLs
• Mentions – celebrities, known figures
  Ordinary users show great passion for celebrities

Comparing User Influence
• Correlation across all users is high because the set includes the least influential users, who have no retweets or mentions and are not well connected.
• Indegree has a low correlation with retweets and mentions. A large number of followers does not translate into influence: the Million Follower Fallacy.
• Retweets are about the content and mentions are identity driven – high correlation, greater impact on influence.

Influence Across Topics
• Measure across 3 diverse topics: the Iranian presidential election, the outbreak of the H1N1 influenza, and the death of Michael Jackson.
• Distribution of user ranks for retweets and mentions follows a power-law pattern!

Influence Across Topics
• Measure variation of a user’s influence across the three topics using Spearman’s rank correlation coefficient
• Observed strong correlation between topics
• Most influential users hold influence across a range of topics. Top users are similar for all three topics. Correlation is high for the top 1%.

Influence over time – Rise or Fall?
• Engagement of top influentials
  - Top 100 users based on the three measures
  - Calculate the probability P that a random tweet posted on Twitter during a 15-day period is a retweet (or a mention) of that user
  - Calculate this over an 8-month period and normalize by the total number of tweets

Influence of Ordinary Users
• Gather the top 20 users for each topic who tweet about only one topic
• Calculate the probability P again for these 60 users over 8 months
  - Increase in retweets and mentions over the time period
  - Influentials spread information about protests, controversial news, etc.
  - Users limited to a single topic show the largest increase in influence scores

Conclusion
• Capture different perspectives of influence – indegree, retweets, mentions
• A user’s popularity is not related to influence. There are different groups of influentials depending on content and name value
• Most influential users hold influence over a variety of topics
• Top Twitter users had a disproportionate amount of influence, indicating a power-law distribution
• Users need to self-advertise and maintain continuous effort and involvement to become influential over time

Thank you! #Questions?

Everyone’s an influencer [Bakshy et al.]
• Questions
  – What makes some content spread far but not others?
  – Can we reliably identify influential users on Twitter?
• Findings
  – Largest cascades are generated by users who have generated them in the past
  – Content matters?
Positive URLs tend to spread farther
  – But, cannot reliably predict which user or URL will generate a large cascade

Word-of-mouth diffusion & Influence
• Diffusion: mechanism for information spread on networks
  – A diffusion event is a cascade
• An influencer is someone who can consistently trigger large cascades

Marketing
• Can we maximize cascades by seeding information (or a product) with certain influential nodes?
  – These nodes can influence disproportionately many others
• What characteristics of influentials can help identify them?
  – Credibility
  – Expertise
  – Enthusiasm
  – Centrality?
• Need a large-scale, unbiased sample of observed cascades on a social network
  – Not only case studies of the most successful cascades

Twitter
• Users tweet short messages
  – Retweet posts of others
  – Tweets may contain URLs
    • Distinct markers that allow us to track diffusion
• Social networks
  – Users follow ‘friends’ to see their tweets
    • “who listens to whom”
• Compare the impact of different users by measuring observed activity on Twitter
  – User who “seeds” content (URL)
  – Seed’s influence is measured by the number of users connected to her who subsequently retweet the URL

Data set
• Tweets (diffusion events)
  – 87M tweets containing a bit.ly URL that were broadcast Sep 13, 2009 – Nov 15, 2009
  – 1.6M seed users active in both months who initiated 74M cascades (46.3 cascades each on average)
• Follower graph (social network)
  – Collect followers of every active user, and their followers, and so on
  – 56M users and 1.7B edges

Computing influence on Twitter
• Influence on Twitter = causing others to propagate information (URL) to their followers
• User A influenced user B to retweet the URL if
  – User A tweets first
  – User A is a friend of B (B is a follower of A)
  – But, what if more than one friend tweets before B?

Some hard facts about cascades
• Most URLs do not spread at all!
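
The attribution rule from the "Computing influence on Twitter" slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the input layout, and the tie-breaking choice are assumptions. In particular, the slide leaves open what to do when several friends tweet before B; here the credit is assumed to go to the friend who posted most recently. Cascade size is assumed to count the seed itself.

```python
from collections import defaultdict


def attribute_influence(posts, follows):
    """Assign each poster of one URL to the friend assumed to have influenced them.

    posts   -- list of (user, time) pairs for a single URL (hypothetical layout)
    follows -- dict mapping a user to the set of friends that user follows
    Returns a dict user -> influencing friend, or None if the user is a seed.
    """
    parent = {}
    seen = []  # (time, user) for everyone who has already posted the URL
    for user, t in sorted(posts, key=lambda p: p[1]):
        if user in parent:
            continue  # only a user's first post of the URL is considered
        friends = follows.get(user, set())
        earlier = [(tt, u) for tt, u in seen if u in friends and tt < t]
        # Assumption: credit the friend who posted most recently before this user.
        parent[user] = max(earlier)[1] if earlier else None
        seen.append((t, user))
    return parent


def cascade_sizes(parent):
    """Count the users in each cascade, keyed by its seed (seed included)."""
    sizes = defaultdict(int)
    for user in parent:
        root = user
        while parent[root] is not None:  # walk up to the seed
            root = parent[root]
        sizes[root] += 1
    return dict(sizes)


if __name__ == "__main__":
    posts = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
    follows = {"b": {"a"}, "c": {"a", "b"}, "d": set()}
    parent = attribute_influence(posts, follows)
    print(parent)
    print(cascade_sizes(parent))
```

In the toy data, user d posts the URL without following any earlier poster, so d is a seed whose cascade contains only itself, which is the common case the "hard facts" slide points to.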
Predicting individual influence
• User’s influence = log(average size of cascades the user seeds)
• What attributes of a user consistently predict influence?
  – Seed user attributes
    • # of followers
    • # of friends
    • # of tweets
    • Date of joining
  – Past influence of seed users
    • Average, minimum, maximum total influence
    • Average, minimum, maximum local influence
      – Local influence is the number of retweets by the seed’s followers
• Train a regression tree model on these attributes to predict future influence

Regression tree (part)
Most informative features: past local influence, # of followers

Influence as a function of most predictive features
[Figure panels: All users; Top 25]

Prediction performance of the regression tree
Most large cascades are driven by previously influential users. But the extreme rarity of such cascades means that most users with these attributes will not be successful.

Does content matter?
Do YouTube videos spread farther than niche news?
• Manually classify 1000 URLs using Amazon Mechanical Turk
  – Type of URL
    • Spam/not spam/unsure
    • Media sharing/social networking
    • Blog/forum
    • News/mass media
    • Other
  – Category of URL
    • Lifestyle, tech, offbeat, entertainment, gaming, science, news, business, sports, other

Average cascade size of types of content

Impact of content on cascade size
• Content judged by humans to be more interesting, or to elicit more positive feeling, spreads a little farther on average
• However, none of the content features had predictive value!

Summary
• What features help us predict whether a cascade will reach many people?
  – Content itself has no predictive power
  – Seed user’s features are somewhat predictive
    • Number of followers
    • Number of followers who retweet the seeder’s posts
• Targeting influencers
  – What viral marketing strategies reach a wide audience at minimal cost?
  – May be more cost-effective to target many “ordinary influencers” than a few highly influential users
  – Moot point, since “social epidemics” are so rare

Predicting influential users in online social networks [Ghosh & Lerman]
• Questions
  – Does network structure predict influence?
  – Which metric should we use to measure centrality in a particular social network?
• Findings
  – The choice of the metric depends on the nature of interactions
  – Fundamental relationship between dynamic processes and measurement of network structure

Centrality
• SNA metrics examine the topology of the network to identify important, or central, nodes
  – Degree
  – Alpha-centrality [Bonacich, 1987]
  – PageRank [Brin et al, 1998]
  – Betweenness [Freeman, 1977]
[Figure: example networks illustrating each metric]

Claim: The nature of interactions between nodes affects how we measure network structure
Consequences for network analysis
[Image: The Gossips, 1948, Norman Rockwell (American, 1894-1978)]
[Image: War News from Mexico, 1848, Richard Caton Woodville (American, 1825-1855)]

Types of interactions
Two classes of interactions between network nodes
• Conservative interactions
  – One-to-one, e.g., phone calls
  – Modeled by random walk
• Non-conservative interactions
  – One-to-many, e.g., epidemic, information diffusion
  – Modeled by contact process
[Figure: example networks for each class of interaction]

Centrality metrics
Centrality identifies important nodes in the network, e.g., those that are often visited by a process
• Conservative interactions
  – Random walk-based metrics, e.g., PageRank, …
• Non-conservative interactions
  – Path-based metrics, e.g., Alpha-centrality, …
[Figure: example networks; node size ~ centrality]

Which centrality metric is right?
• Empirical study of influence in social media
• Data from Digg and Twitter about how information (URL) spreads on the follower graph
[Figure: a submitter and its followers]

Ground truth
• Re-broadcasting (retweeting) provides ground truth for measuring influence
• Empirical measure of influence/importance
  1.
average number of re-broadcasts by followers
  2. average size of cascades a node triggers
• Rank nodes by the empirical measure (the ground truth)
• Compare rankings produced by centrality metrics to the ground truth

Statistical significance of the influence metric
Urn model for a post (submitter and fans):
  Users in OSN (N)               ↔  Balls in the urn (N)
  Fans of submitter in OSN (K)   ↔  White balls in the urn (K)
  No. of users who voted (n)     ↔  No. of balls picked (n)
  No. of fans who voted (k)      ↔  No. of white balls picked (k)
Hypergeometric distribution:
\[
P(X = k \mid K, N, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}
\]
Metric is statistically significant
[Figure panels: Digg; Twitter]

Which centrality metric is right for social media?
Correlation between the ground truth and rankings predicted by Alpha-Centrality and PageRank
[Figure panels: Digg; Twitter]
Non-conservative Alpha-Centrality best predicts node centrality (since information flow in social media is non-conservative)

Summary
• Network structure measurements
  – How we measure network structure depends on the nature of interactions between nodes
  – Affects how we compute centrality, strength of ties, and communities
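
As a worked illustration of the urn model from the "Statistical significance of the influence metric" slide, the hypergeometric probability can be computed directly with Python's standard library. This is a minimal sketch: the function names are hypothetical, the numbers in the example are made up for illustration, and using the upper tail P(X >= k) as a significance check is an assumption, not something stated on the slides.

```python
from math import comb


def hypergeom_pmf(k, K, N, n):
    """P(X = k | K, N, n): probability that exactly k of the n users who voted
    are fans, when K of the N users in the network are fans of the submitter."""
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0  # outside the support of the distribution
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tail(k, K, N, n):
    """P(X >= k): chance of seeing at least k fan votes if votes were drawn
    at random from the network (assumed here as the significance check)."""
    return sum(hypergeom_pmf(i, K, N, n) for i in range(k, min(K, n) + 1))


if __name__ == "__main__":
    # Made-up numbers: 1000 users, 50 fans, 20 voters, 5 of them fans.
    print(hypergeom_pmf(5, K=50, N=1000, n=20))
    print(hypergeom_tail(5, K=50, N=1000, n=20))
```

A small tail probability would suggest that fans vote for the submitter's post more often than random draws from the urn would explain.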