george

advertisement
Influence and Correlation in
Social Networks
Aris Anagnostopoulos
Ravi Kumar
Mohammad Mahdian
Preliminaries
- Correlations exist in users' behaviors
Preliminaries
- Correlations exist in users' behaviors
- Representation:
individuals are nodes of a social graph, G
every node is "active" or "inactive"
- Formally, correlation = if u and v are adjacent in G:
the event that u becomes active is correlated with v becoming active
Preliminaries
- Correlations exist in users' behaviors
- Representation:
individuals are nodes of a social graph, G
every node is "active" or "inactive"
- Formally, correlation = if u and v are adjacent in G:
the event that u becomes active is correlated with v becoming active
- Want to distinguish between different sources of social
correlation
Models of Social Correlation
- Homophily = tendency for individuals to choose friends with
similar characteristics / preferences
Models of Social Correlation
- Homophily = tendency for individuals to choose friends with
similar characteristics / preferences
- Confounding = external influence from elements in the
environment (confounding factors)
Models of Social Correlation
- Homophily = tendency for individuals to choose friends with
similar characteristics / preferences
- Confounding = external influence from elements in the
environment (confounding factors)
- Influence = the action of one individual induces another
individual to act in a similar way.
Motivation
- Useful to know when social influence is the source of
correlation
Motivation
- Useful to know when social influence is the source of
correlation
- Viral marketing -> want to target select individuals
Motivation
- Useful to know when social influence is the source of
correlation
- Viral marketing -> want to target select individuals
- Influence behavior -> create "role models" (e.g. in fashion)
Motivation
- Useful to know when social influence is the source of
correlation
- Viral marketing -> want to target select individuals
- Influence behavior -> create "role models" (e.g. in fashion)
- We want to identify situations when such techniques can be
applied.
Motivation
- Useful to know when social influence is the source of
correlation
- Viral marketing -> want to target select individuals
- Influence behavior -> create "role models" (e.g. in fashion)
- We want to identify situations when such techniques can be
applied.
- Also useful for analysis (predicting future state of network)
Modeling Influence
1. Graph G drawn according to some distribution
Modeling Influence
1. Graph G drawn according to some distribution
2. In each of the time steps 1, ..., T, each non-active agent
decides whether to become active.
Modeling Influence
1. Graph G drawn according to some distribution
2. In each of the time steps 1, ..., T, each non-active agent
decides whether to become active.
3. An agent becomes active with probability p(a), a function of
the number of neighboring and active nodes.
or, alternatively,
Some remarks...
- The coefficient α measures social correlation.
Some remarks...
- The coefficient α measures social correlation.
- Since actions are stored, a represents the number of users
active at any earlier time step
Some remarks...
- The coefficient α measures social correlation.
- Since actions are stored, a represents the number of users
active at any earlier time step
- This model is relatively simplistic:
- the probability does not vary between nodes
- or as time passes
Some remarks...
- The coefficient α measures social correlation.
- Since actions are stored, a represents the number of users
active at any earlier time step
- This model is relatively simplistic:
- the probability does not vary between nodes
- or as time passes
- However, these simplifying assumption are practical
Estimating α, β
- Can estimate using maximum likelihood logistic regression
- Maximize expression
where
is the number of users who at the beginning of time
had a active friends and became active at time t
The Shuffle Test
- Idea: if influence does not play a role, then the timing of
activations amongst users should be independent of each other:
Pr(a active before b) = Pr(b active before a)
The Shuffle Test
1. Estimate α for initial graph
2. Randomly permute the order in which active nodes have
been activated:
set the time of
3. Estimate α' for this configuration
4. If the values for α and α' are close to each other, the model
exhibits little or no social influence.
The Edge-reversal Test
1. reverse direction of all the edges
2. run the same logistic regression on the data using the new
graph
If correlation is not due to influence, then α should not change
Generative Models
- No Correlation
- Influence
- Correlation, no influence
Generative Models - No Correlation
- network grows just as the real data
- at every step, randomly pick n nodes, and make them active
Influence Model
- network grows just as the real data
- at every step, every inactive node flips a coin, with
Correlation, No Influence Model
- network grows just as the real data
- Pick a subset S of G:
- randomly pick centers, add a ball of radius 2 from each to S
- do this until |S| reaches parameter L
- Pick nodes to become active uniformly at random, from S
Distinguishing Influence: Shuffle Test
Influence:
Correlation:
Distinguishing Influence: Edge Reversal
Influence:
Correlation:
Real Data: the Flickr Dataset
- analyzed 800K users over 16 months
- about 340K exhibited tagging behavior
- size of giant component: 160K
- 2.8M directed edges, 28.5% not mutual
- analyzed 1,700 tags independently
- various types (event, color, object, etc)
- various numbers of users
- various growth patterns (bursty, smooth, periodic)
Distinguishing Influence in Flickr
Shuffle test
Distinguishing Influence in Flickr
Edge reversal test
Some Influence
- can discover traces of influence by looking at similar tags
Some Influence
- can discover traces of influence by looking at similar tags
- for the tag "graffiti", the difference between αs was 0
- however, for the misspelling "grafitti", difference was slightly
larger
- with even less common misspelling "graffitti", difference
increased even more
Conclusions
- distinguishing between correlation and causation is difficult
Conclusions
- distinguishing between correlation and causation is difficult
- timing information can help answer the question (shuffle)
Conclusions
- distinguishing between correlation and causation is difficult
- timing information can help answer the question (shuffle)
- knowing of asymmetric social ties is also useful (edge-reversal)
Further research directions
- formal verification of results? (controlled experiments)
- quantification of the strength of influence?
- identify which nodes influence others
- what if social ties are symmetric?
- distinguishing between other forms of correlation
- distinguishing between different forms of social influence
Questions?
Download