george

Influence and Correlation in Social Networks Aris Anagnostopoulos Ravi Kumar Mohammad Mahdian Preliminaries - Correlations exist in users' behaviors Preliminaries - Correlations exist in users' behaviors - Representation: individuals are nodes of a social graph, G every node is "active" or "inactive" - Formally, correlation = if u and v are adjacent in G: the event that u becomes active is correlated with v becoming active Preliminaries - Correlations exist in users' behaviors - Representation: individuals are nodes of a social graph, G every node is "active" or "inactive" - Formally, correlation = if u and v are adjacent in G: the event that u becomes active is correlated with v becoming active - Want to distinguish between different sources of social correlation Models of Social Correlation - Homophily = tendency for individuals to choose friends with similar characteristics / preferences Models of Social Correlation - Homophily = tendency for individuals to choose friends with similar characteristics / preferences - Confounding = external influence from elements in the environment (confounding factors) Models of Social Correlation - Homophily = tendency for individuals to choose friends with similar characteristics / preferences - Confounding = external influence from elements in the environment (confounding factors) - Influence = the action of one individual induces another individual to act in a similar way. Motivation - Useful to know when social influence is the source of correlation Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals - Influence behavior -> create "role models" (e.g. in fashion) Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals - Influence behavior -> create "role models" (e.g. in fashion) - We want to identify situations when such techniques can be applied. Motivation - Useful to know when social influence is the source of correlation - Viral marketing -> want to target select individuals - Influence behavior -> create "role models" (e.g. in fashion) - We want to identify situations when such techniques can be applied. - Also useful for analysis (predicting future state of network) Modeling Influence 1. Graph G drawn according to some distribution Modeling Influence 1. Graph G drawn according to some distribution 2. In each of the time steps 1, ..., T, each non-active agent decides whether to become active. Modeling Influence 1. Graph G drawn according to some distribution 2. In each of the time steps 1, ..., T, each non-active agent decides whether to become active. 3. An agent becomes active with probability p(a), a function of the number of neighboring and active nodes. or, alternatively, Some remarks... - The coefficient α measures social correlation. Some remarks... - The coefficient α measures social correlation. - Since actions are stored, a represents the number of users active at any earlier time step Some remarks... - The coefficient α measures social correlation. - Since actions are stored, a represents the number of users active at any earlier time step - This model is relatively simplistic: - the probability does not vary between nodes - or as time passes Some remarks... - The coefficient α measures social correlation. - Since actions are stored, a represents the number of users active at any earlier time step - This model is relatively simplistic: - the probability does not vary between nodes - or as time passes - However, these simplifying assumption are practical Estimating α, β - Can estimate using maximum likelihood logistic regression - Maximize expression where is the number of users who at the beginning of time had a active friends and became active at time t The Shuffle Test - Idea: if influence does not play a role, then the timing of activations amongst users should be independent of each other: Pr(a active before b) = Pr(b active before a) The Shuffle Test 1. Estimate α for initial graph 2. Randomly permute the order in which active nodes have been activated: set the time of 3. Estimate α' for this configuration 4. If the values for α and α' are close to each other, the model exhibits little or no social influence. The Edge-reversal Test 1. reverse direction of all the edges 2. run the same logistic regression on the data using the new graph If correlation is not due to influence, then α should not change Generative Models - No Correlation - Influence - Correlation, no influence Generative Models - No Correlation - network grows just as the real data - at every step, randomly pick n nodes, and make them active Influence Model - network grows just as the real data - at every step, every inactive node flips a coin, with Correlation, No Influence Model - network grows just as the real data - Pick a subset S of G: - randomly pick centers, add a ball of radius 2 from each to S - do this until |S| reaches parameter L - Pick nodes to become active uniformly at random, from S Distinguishing Influence: Shuffle Test Influence: Correlation: Distinguishing Influence: Edge Reversal Influence: Correlation: Real Data: the Flickr Dataset - analyzed 800K users over 16 months - about 340K exhibited tagging behavior - size of giant component: 160K - 2.8M directed edges, 28.5% not mutual - analyzed 1,700 tags independently - various types (event, color, object, etc) - various numbers of users - various growth patterns (bursty, smooth, periodic) Distinguishing Influence in Flickr Shuffle test Distinguishing Influence in Flickr Edge reversal test Some Influence - can discover traces of influence by looking at similar tags Some Influence - can discover traces of influence by looking at similar tags - for the tag "graffiti", the difference between αs was 0 - however, for the misspelling "grafitti", difference was slightly larger - with even less common misspelling "graffitti", difference increased even more Conclusions - distinguishing between correlation and causation is difficult Conclusions - distinguishing between correlation and causation is difficult - timing information can help answer the question (shuffle) Conclusions - distinguishing between correlation and causation is difficult - timing information can help answer the question (shuffle) - knowing of asymmetric social ties is also useful (edge-reversal) Further research directions - formal verification of results? (controlled experiments) - quantification of the strength of influence? - identify which nodes influence others - what if social ties are symmetric? - distinguishing between other forms of correlation - distinguishing between different forms of social influence Questions?

george

Related documents

Products

Support

george

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib