Maximizing the Spread of Influence through a Social Network

David Kempe, Jon Kleinberg, Eva Tardos
Cornell University
KDD 2003
Social network and spread of influence
• Social network spreads
INFLUENCE among its members
– Opinions, ideas, information …
• “Word-of-mouth” effect in Viral Marketing
Motivating scenarios
1. Adoption of a new drug by doctors and patients
How to reach many patients?
2. Adoption of a new book by profs and students
How to reach many students?
3. Bloggers blogging and publishing weblogs
Which bloggers should we follow to get the most information?
4. Battle of Water Sensor Networks
How to find the optimal sensor placement?
Problem setting
• Given
– A limited budget B for initial advertising
– Influence estimates between individuals
• Goal
– Trigger a large cascade of influence
• Question
– Which set of individuals should we target?
What do we have in this paper?
• Form models of influence in social networks
• Obtain data about a particular network
(to estimate inter-personal influence)
• Devise algorithm to maximize spread of
influence
Models of influence
• First mathematical models
– [Schelling ‘70/’78], [Granovetter ‘78]
• Large body of subsequent work
– [Rogers ‘95], [Valente ‘95], [Wasserman/Faust ‘94]
• Two basic classes of diffusion models: threshold and
cascade
• General operational view:
– A social network is a directed graph, each person
(individual) is a node
– Nodes start either active or inactive
– An active node may trigger activation of neighboring nodes
– Monotonicity assumption: active nodes never become inactive
Linear threshold model
• A node 𝑣 has a random threshold 𝜃𝑣 ∈ [0,1]
• A node 𝑣 is influenced by each neighbor 𝑤 according to
a weight 𝑏𝑣,𝑤 such that:
$\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$
• A node 𝑣 becomes active when at least (weighted) 𝜃𝑣
fraction of its neighbors are active:
$\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$
Example
[Figure: linear threshold model on a small weighted graph with nodes X, U, w, v. Legend: inactive node, active node, threshold, active neighbors. The run ends at "Stop!".]
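The linear threshold rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `linear_threshold`, the dictionary layout `in_weights[v][w] = b_{v,w}`, and the optional `thresholds` argument are assumptions made for this example.

```python
import random

def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Run the linear threshold process until no more nodes activate.

    in_weights[v] maps each neighbor w of v to the weight b_{v,w}
    (hypothetical layout; weights into a node should sum to at most 1).
    """
    rng = rng or random.Random()
    if thresholds is None:
        # Each node draws its threshold uniformly at random from [0, 1).
        thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, neighbors in in_weights.items():
            if v in active:
                continue
            # Total weight of the neighbors of v that are already active.
            weight = sum(b for w, b in neighbors.items() if w in active)
            if weight >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Illustrative weights and thresholds (made up for this sketch).
example_weights = {
    "u": {},
    "w": {"u": 0.5},
    "v": {"u": 0.2, "w": 0.3},
    "x": {"v": 0.1},
}
example_thresholds = {"u": 0.9, "w": 0.4, "v": 0.45, "x": 0.2}
final_active = linear_threshold(example_weights, {"u"}, thresholds=example_thresholds)
```

Because active nodes never deactivate, the order in which nodes are checked does not change the final active set; the loop simply repeats until a full pass activates nobody.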
Independent cascade model
• When node 𝑣 becomes active, it has a single
chance of activating each currently inactive
neighbor 𝑤
• The activation attempt succeeds with
independent probability 𝑝𝑣𝑤
Example
[Figure: independent cascade model on a small graph with nodes X, U, w, v and edge probabilities. Legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt. The run ends at "Stop!".]
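One run of the independent cascade process can be sketched as follows. The function name `independent_cascade` and the layout `out_probs[v][w] = p_vw` are assumptions for this illustration, and the probabilities in the example are made up.

```python
import random
from collections import deque

def independent_cascade(out_probs, seeds, rng=None):
    """Simulate one run of the independent cascade model.

    out_probs[v] maps each out-neighbor w of v to the probability p_vw
    (hypothetical layout chosen for this sketch).
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(active)  # newly active nodes that have not yet tried
    while frontier:
        v = frontier.popleft()
        # v gets a single chance to activate each currently inactive neighbor.
        for w, p in out_probs.get(v, {}).items():
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return active

# With probabilities 0 and 1 the outcome is deterministic.
example_probs = {"u": {"w": 1.0, "x": 0.0}, "w": {"v": 1.0}}
cascade_result = independent_cascade(example_probs, ["u"])
```

Each node is placed on the frontier exactly once, when it becomes active, so every edge is attempted at most once, matching the "single chance" rule.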
Influence maximization problem
• Influence of a node set 𝑆: 𝑓(𝑆)
– Expected number of active nodes at the end, if set 𝑆 is
the initial active set
• Problem:
– Given a parameter 𝑘 (budget), find a 𝑘-node set 𝑆 to
maximize 𝑓(𝑆)
– Constrained optimization problem with 𝑓(𝑆) as the
objective function
Properties of 𝑓(𝑆)
• Non-negative (obviously)
• Monotone: 𝑓(𝑆 + 𝑣) ≥ 𝑓(𝑆)
• Submodular:
– Let 𝑁 be a finite set
– A set function $f: 2^N \to \mathbb{R}$ is submodular iff
$\forall S \subset T \subset N,\ \forall v \in N \setminus T:$
$f(S + v) - f(S) \ge f(T + v) - f(T)$
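The definition can be checked by brute force on a tiny ground set. The sketch below uses a set-coverage function, a standard example of a monotone submodular function that is not taken from the slides; the names `is_submodular`, `coverage`, and the `reach` table are hypothetical.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(S + v) - f(S) >= f(T + v) - f(T) for all S within T, v outside T."""
    ground = frozenset(ground)
    for T in subsets(ground):
        for S in subsets(T):
            for v in ground - T:
                if f(S | {v}) - f(S) < f(T | {v}) - f(T):
                    return False
    return True

# Illustrative data: each node "covers" a set of items.
reach = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def coverage(S):
    """Number of distinct items covered by the nodes in S."""
    covered = set()
    for v in S:
        covered |= reach[v]
    return len(covered)

coverage_ok = is_submodular(coverage, reach)
squared_ok = is_submodular(lambda S: len(S) ** 2, reach)
```

The coverage function passes the check, while `len(S) ** 2` fails it because its marginal gains grow as the set grows. The check enumerates all subsets, so it is only practical for very small ground sets.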
Bad news
• For a submodular function 𝑓 that is monotone and takes only non-negative values, finding a 𝑘-element
set 𝑆 for which 𝑓(𝑆) is maximized is an NP-hard
optimization problem.
• It is NP-hard to determine the optimum for influence
maximization under both the independent cascade model and
the linear threshold model.
Good news
• We can use a greedy algorithm
– Start with an empty set 𝑆
– For 𝑘 iterations:
• Add the node 𝑣 to 𝑆 that maximizes 𝑓(𝑆 + 𝑣) − 𝑓(𝑆)
• How good (bad) is it?
– Theorem: The greedy algorithm is a (1 − 1/𝑒)-approximation
– The resulting set 𝑆 activates at least a (1 − 1/𝑒) > 63% fraction of
the number of nodes that any size-𝑘 set could activate
Greedy algorithm
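A minimal Python sketch of the greedy selection, assuming the independent cascade model and a Monte Carlo estimate of 𝑓(𝑆). The function names, the `graph[v][w] = p_vw` layout, and the default of 200 simulation runs are assumptions for this illustration, not values from the paper.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One independent cascade run; returns the number of active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        v = frontier.pop()
        for w, p in graph.get(v, {}).items():
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return len(active)

def estimate_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of f(S): average spread over several runs."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

def greedy(graph, nodes, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated gain."""
    rng = random.Random(seed)
    S = []
    for _ in range(k):
        base = estimate_spread(graph, S, runs, rng) if S else 0.0
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in S:
                continue
            gain = estimate_spread(graph, S + [v], runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidate nodes
            break
        S.append(best)
    return S

# Deterministic toy graph (all probabilities are 1).
toy_graph = {"a": {"b": 1.0, "c": 1.0}, "d": {"e": 1.0}}
toy_seeds = greedy(toy_graph, ["a", "b", "c", "d", "e"], k=2, runs=5)
```

Every round re-estimates the spread for every remaining node, which is the source of the cost discussed later in the deck.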
Other heuristics to find 𝑆
• High-degree
– Picks 𝑘 nodes with highest node degree 𝑑𝑣
• Distance centrality
– Picks 𝑘 nodes with lowest average distance to other
nodes in the network
• Random
– Randomly pick 𝑘 nodes
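The three baseline heuristics can be sketched on an undirected graph stored as an adjacency dictionary. The function names and the example graph are hypothetical; charging a distance of 𝑛 to unreachable pairs is an assumption made here so that disconnected graphs still get a finite score.

```python
import random
from collections import deque

def high_degree(adj, k):
    """Pick the k nodes with the highest degree."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

def distance_centrality(adj, k):
    """Pick the k nodes with the lowest average distance to the other nodes."""
    n = len(adj)

    def average_distance(source):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        unreachable = n - len(dist)
        # Assumption: unreachable nodes are charged a distance of n.
        return (sum(dist.values()) + unreachable * n) / max(n - 1, 1)

    return sorted(adj, key=average_distance)[:k]

def random_seeds(adj, k, seed=None):
    """Pick k distinct nodes uniformly at random."""
    return random.Random(seed).sample(list(adj), k)

# Path a-b-c-d-e with two extra leaves x, y attached to a.
example_adj = {
    "a": ["b", "x", "y"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d"], "x": ["a"], "y": ["a"],
}
top_degree = high_degree(example_adj, 1)
top_central = distance_centrality(example_adj, 1)
```

In this small graph the two heuristics disagree: node `a` has the most neighbors, while node `b` is closest on average to everyone else.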
Experiment setup
• Co-authorship network from physics section of arXiv.org
• A node is an author
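• Each co-authored paper adds a link between its authors, so 𝑢 and 𝑣 are joined by 𝑐𝑢,𝑣 parallel links
• LT model: The edge 𝑣 → 𝑢 has weight $c_{u,v} / d_u$, where $d_u$ is the degree of 𝑢
• IC model:
– 𝑝 = 1% − 10%
– The edge 𝑣 → 𝑢 has prob. $1 - (1 - p)^{c_{u,v}}$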
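The edge parameters in this setup can be derived from a list of papers as sketched below. The function names and the tiny author list are made up for illustration; the sketch assumes that the degree 𝑑𝑢 counts parallel links, so the LT weights into each node sum to 1.

```python
from collections import Counter
from itertools import combinations

def coauthor_counts(papers):
    """c[(u, v)] = number of papers that u and v wrote together."""
    c = Counter()
    for authors in papers:
        for u, v in combinations(sorted(set(authors)), 2):
            c[(u, v)] += 1
            c[(v, u)] += 1
    return c

def edge_parameters(papers, p):
    """Return LT weights and IC probabilities keyed by directed edge (v, u)."""
    c = coauthor_counts(papers)
    degree = Counter()
    for (u, _v), count in c.items():
        degree[u] += count  # degree counts parallel links (assumption)
    # Edge v -> u: LT weight c_{u,v} / d_u, IC probability 1 - (1 - p)^c_{u,v}.
    lt = {(v, u): count / degree[u] for (u, v), count in c.items()}
    ic = {(v, u): 1 - (1 - p) ** count for (u, v), count in c.items()}
    return lt, ic

example_papers = [["ann", "bob"], ["ann", "bob"], ["ann", "cat"]]
lt_weights, ic_probs = edge_parameters(example_papers, p=0.1)
```

Here `ann` has degree 3 (two links to `bob`, one to `cat`), so the edge `bob` → `ann` gets LT weight 2/3, and its IC probability is 1 − 0.9² = 0.19.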
Experiment result on IC model
• Result on LT model is similar
• At high 𝑝, the result is not sensitive to the choice of algorithm
Cost-effective Outbreak
Detection in Networks
Jure Leskovec, Andreas Krause, Carlos Guestrin,
Christos Faloutsos, Jeanne VanBriesen, Natalie Glance
Carnegie Mellon University
KDD 2007
Original Greedy
Inefficient!!!
Redundant!!!
A 15,000-node network takes
a few days to complete
Complexity 𝑂(𝑘𝑛𝑅𝑚): 𝑘 seeds, 𝑛 nodes, 𝑅 simulations per estimate, 𝑚 edges
Submodularity property
• Recall:
$\forall S \subset T \subset N,\ \forall v \in N \setminus T:$
$f(S + v) - f(S) \ge f(T + v) - f(T)$
• When adding a vertex v to seed set S, the gain
of adding v is larger if S is smaller
• Therefore: a large number of nodes do not
need to be re-evaluated
CELF algorithm
If $s_v^{r-1} < s_{v'}^{r}$
then discard 𝑣
700 times faster than
the original greedy!!!
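The lazy re-evaluation idea can be sketched with a priority queue. This is an illustration of the general technique, not the authors' implementation: the function name `celf`, the evaluation counter, and the coverage objective used in the example are assumptions. A stale gain from an earlier round is an upper bound on the current gain (by submodularity), so a node whose freshly computed gain is still on top of the queue can be selected without re-evaluating the others.

```python
import heapq

def celf(nodes, f, k):
    """Lazy greedy selection for a monotone submodular set function f.

    Returns the chosen seeds and the number of calls made to f for gains.
    """
    S, f_S, evaluations = [], f([]), 0
    heap = []
    for i, v in enumerate(nodes):
        gain = f([v]) - f_S
        evaluations += 1
        # Entry: (negated gain, tie-breaker, node, |S| when gain was computed).
        heap.append((-gain, i, v, 0))
    heapq.heapify(heap)
    for _ in range(k):
        while heap:
            neg_gain, i, v, computed_at = heapq.heappop(heap)
            if computed_at == len(S):
                # Gain is up to date and still the largest: select v.
                S.append(v)
                f_S -= neg_gain
                break
            # Stale gain: recompute against the current S and push back.
            gain = f(S + [v]) - f_S
            evaluations += 1
            heapq.heappush(heap, (-gain, i, v, len(S)))
    return S, evaluations

# Illustrative coverage objective (made-up data).
reach = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1}, "e": {7}}

def coverage(S):
    covered = set()
    for v in S:
        covered |= reach[v]
    return len(covered)

chosen, calls = celf(list(reach), coverage, k=2)
```

On this example the second round re-evaluates only `b` and `c` before selecting `c`, so 7 gain evaluations are made where plain greedy would need 9.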
Efficient Influence Maximization
in Social Networks
Wei Chen, Yajun Wang, Siyu Yang
Microsoft Research, Tsinghua University
KDD 2009
Improved greedy
• Construct a graph 𝐺’
• Obtain 𝐺’ from 𝐺 by removing each edge with prob. (1 − 𝑝),
i.e. the edges not used for propagation
• Use DFS/BFS to find the set of vertices reachable
from 𝑆 in 𝐺’, $R_{G'}(S)$
• Also obtain $R_{G'}(v)$, ∀𝑣 ∈ 𝑉\𝑆
• Remove overlapping elements so that $R_{G'}(v) \cap R_{G'}(S) = \emptyset$
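The steps above can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: the names `new_greedy` and `reachable`, the directed edge list, and the default 𝑅 = 200 are assumptions, and the score counts only the vertices reachable from 𝑣 that are not already reachable from 𝑆, which is one reading of the "remove overlapping elements" step.

```python
import random

def reachable(adj, sources):
    """Set of vertices reachable from sources in the sampled graph (DFS)."""
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def new_greedy(nodes, edges, p, k, R=200, seed=0):
    """Select k seeds by scoring candidates on R sampled graphs per round."""
    rng = random.Random(seed)
    S = []
    for _ in range(k):
        score = {v: 0 for v in nodes if v not in S}
        if not score:
            break
        for _ in range(R):
            # Build G': keep each edge with probability p (remove with 1 - p).
            adj = {}
            for u, w in edges:
                if rng.random() < p:
                    adj.setdefault(u, []).append(w)
            reached_by_S = reachable(adj, S)
            for v in score:
                # Count only vertices not already reached from S.
                score[v] += len(reachable(adj, [v]) - reached_by_S)
        S.append(max(score, key=score.get))
    return S

# With p = 1 every edge is kept, so the result is deterministic.
toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
toy_choice = new_greedy(list("abcde"), toy_edges, p=1.0, k=2, R=3)
```

The sketch runs a separate search from every candidate on every sampled graph, which favors clarity over speed; one sampled graph is nevertheless shared by all candidates in a round.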
Improved greedy
15-34% faster than
the original greedy!!!
Mix with CELF
• Cons
– CELF must evaluate all vertices in the
first round; the number of evaluations decreases only in later
rounds
– Improved greedy must build 𝐺’ 𝑅 times
• Mix
– First vertex: use Improved greedy
– Other vertices: use CELF