faloutsos-prakash-vi.. - School of Computer Science

advertisement
Understanding and Managing
Cascades on Large Graphs
B. Aditya Prakash
Carnegie Mellon University Virginia Tech.
Christos Faloutsos
Carnegie Mellon University
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
From: VLDB’12 tutorial
• http://vldb.org/pvldb/vol5/p2024_badityapra
kash_vldb2012.pdf
B. Aditya Prakash
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
2
Networks are everywhere!
Facebook Network [2010]
Gene Regulatory Network
[Decourty 2008]
Human Disease Network
[Barabasi 2007]
The Internet [2005]
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
3
Dynamical Processes over networks
are also everywhere!
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
4
Why do we care?
• Social collaboration
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
........
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
5
Why do we care? (1:
Epidemiology)
• Dynamical Processes over networks
[AJPH 2007]
Diseases over contact networks
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
CDC data: Visualization of
the first 35 tuberculosis
(TB) patients and their
1039 contacts
6
Why do we care? (1:
Epidemiology)
• Dynamical Processes over networks
• Each circle is a hospital
• ~3000 hospitals
• More than 30,000 patients
transferred
[US-MEDICARE
NETWORK 2005]
Graph Analytics wkshp
Problem: Given k units of
disinfectant, whom to immunize?
B. A. Prakash; C. Faloutsos
8
Why do we care? (1:
Epidemiology)
~6x
fewer!
CURRENT PRACTICE
[US-MEDICARE
NETWORK 2005]
OUR METHOD
Graph Analytics wkshp
Prakash; C. Faloutsos
9
Hospital-acquired
inf. tookB. A.99K+
lives, cost $5B+ (all per year)
Why do we care? (2: Online
Diffusion)
> 800m users, ~$1B
revenue [WSJ 2010]
~100m active users
> 50m users
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
10
Why do we care? (2: Online
Diffusion)
• Dynamical Processes over networks
Buy Versace™!
Followers
Celebrity
Graph Analytics wkshp
Social MediaB. Marketing
A. Prakash; C. Faloutsos
11
Why do we care?
(3: To change the world?)
• Dynamical Processes over networks
Social networks and Collaborative Action
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
12
High Impact – Multiple Settings
epidemic out-breaks
Q. How to squash rumors faster?
products/viruses
Q. How do opinions spread?
transmit s/w patches
Q. How to market better?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
13
Research Theme
ANALYSIS
Understanding
POLICY/
ACTION
DATA
Large real-world
networks
& processes
Graph Analytics wkshp
Managing
B. A. Prakash; C. Faloutsos
14
Research Theme – Public Health
ANALYSIS
Will an epidemic
happen?
POLICY/
ACTION
DATA
Modeling # patient
transfers
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
How to control
out-breaks?
15
Research Theme – Social Media
ANALYSIS
# cascades in
future?
POLICY/
ACTION
DATA
Modeling Tweets
spreading
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
How to market
better?16
In this tutorial
Given propagation models:
Q1: Will an epidemic
happen?
ANALYSIS
Understanding
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
17
In this tutorial
Q2: How to immunize and
control out-breaks better?
POLICY/
ACTION
Managing
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
18
In this tutorial
Q3: How do #hashtags
spread?
DATA
Large real-world
networks & processes
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
19
Outline
•
•
•
•
•
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
20
Part 1: Theory
• Q1: What is the epidemic threshold?
• Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
21
A fundamental question
Strong
Virus
Epidemic?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
22
example (static graph)
Weak Virus
Epidemic?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
23
Problem Statement
# Infected
above (epidemic)
below (extinction)
Separate the
regimes?
time
Find, a condition under which
– virus will die out exponentially quickly
– regardless of initial infection condition
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
24
Threshold (static version)
Problem Statement
• Given:
–Graph G, and
–Virus specs (attack prob. etc.)
• Find:
–A condition for virus extinction/invasion
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
25
Threshold: Why important?
•
•
•
•
Accelerating simulations
Forecasting (‘What-if’ scenarios)
Design of contagion and/or topology
A great handle to manipulate the spreading
– Immunization
– Maximize collaboration
…..
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
26
Part 1: Theory
• Q1: What is the epidemic threshold?
– Background
– Result and Intuition (Static Graphs)
– Proof Ideas (Static Graphs)
– Bonus: Dynamic Graphs
• Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
27
“SIR” model: life immunity
(mumps)
• Each node in the graph is in one of three states
– Susceptible (i.e. healthy)
– Infected
– Removed (i.e. can’t get infected again)
Prob. δ
t=1
Graph Analytics wkshp
t=2
B. A. Prakash; C. Faloutsos
t=3
28
Terminology: continued
• Other virus propagation models (“VPM”)
– SIS : susceptible-infected-susceptible, flu-like
– SIRS : temporary immunity, like pertussis
– SEIR : mumps-like, with virus incubation
(E = Exposed)
….………….
• Underlying contact-network – ‘who-can-infectwhom’
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
29
Related Work















R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press,
1991.
A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex
Networks. Cambridge University Press, 2010.
F. M. Bass. A new product growth for model consumer durables. Management Science,
15(5):215–227, 1969.
D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in
real networks. ACM TISSEC, 10(4), 2008.
D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly
Connected World. Cambridge University Press, 2010.
A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of
epidemics. IEEE INFOCOM, 2005.
Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free
networks and the effective immunization. arXiv:cond-at/0305549 v2, Aug. 6 2003.
H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000.
H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer
Lecture Notes in Biomathematics, 46, 1984.
J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer
viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991.
J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE
Computer Society Symposium on Research in Security and Privacy, 1993.
R. Pastor-Santorras and A. Vespignani. Epidemic spreading in scale-free networks.
Physical Review Letters 86, 14, 2001.
………
………
………Analytics wkshp
Graph
All are about either:
• Structured
topologies (cliques,
block-diagonals,
hierarchies, random)
• Specific virus
propagation models
• Static graphs
B. A. Prakash; C. Faloutsos
30
Part 1: Theory
• Q1: What is the epidemic threshold?
– Background
– Result and Intuition (Static Graphs)
– Proof Ideas (Static Graphs)
– Bonus: Dynamic Graphs
• Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
31
How should the answer look
like?
• Answer should depend on:
– Graph
– Virus Propagation Model (VPM)
• But how??
– Graph – average degree? max. degree? diameter?
– VPM – which parameters?
– How to combine – linear? quadratic? exponential?
2 2
(

d avg  d avg ) / d max ? …..
d avg   diameter ?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
32
Static Graphs: Our Main Result
• Informally,
For,
 any arbitrary topology (adjacency
matrix A)
 any virus propagation model (VPM) in
standard literature
•
the
epidemic threshold depends only
1. on the λ, first eigenvalue of A, and
2. some constant CVPM , determined by
the virus propagation model
λ
CVPM
No
epidemic if
λ * CVPM < 1
In Prakash+ ICDM 2011 (Selected among best papers).
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
33
Our thresholds for some models
• s = effective strength
• s < 1 : below threshold
Models
SIS, SIR, SIRS, SEIR
SIV, SEIV
Effective Strength
(s)
s=λ.
 
 
 
s=λ.
  


      
SI1I2 V1 V2 (H.I.V.) s
Graph Analytics wkshp
 1v2   2
= λ .  v2   v1 
B. A. Prakash; C. Faloutsos
Threshold (tipping
point)
s=1



34
Our result: Intuition for λ
“Official” definition:
• Let A be the adjacency
matrix. Then λ is the root
with the largest magnitude of
the characteristic polynomial
of A [det(A – xI)].
“Un-official” Intuition 
• λ ~ # paths in the graph
A
k
≈
u
k
.u
• Doesn’t give much intuition!
A
k (i, j) = # of paths i  j
of length k
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
35
Largest Eigenvalue (λ)
better connectivity
λ≈2
λ≈2
N = 1000
Graph Analytics wkshp
higher λ
λ= N
λ = N-1
λ= 31.67
λ= 999
B. A. Prakash; C. Faloutsos
N nodes
36
Footprint
Fraction of Infections
Examples: Simulations – SIR
(mumps)
Effective Strength
Time ticks
(a) Infection profile
Graph Analytics wkshp
(b) “Take-off” plot
PORTLAND graph
31 million links, 6 million nodes
B. A. Prakash; C. Faloutsos
37
Footprint
Fraction of Infections
Examples: Simulations – SIRS
(pertusis)
Time ticks
Effective Strength
(a) Infection profile
Graph Analytics wkshp
(b) “Take-off” plot
PORTLAND graph
31 million links, 6 million nodes
B. A. Prakash; C. Faloutsos
38
Part 1: Theory
• Q1: What is the epidemic threshold?
– Background
– Result and Intuition (Static Graphs)
– Proof Ideas (Static Graphs)
– Bonus: Dynamic Graphs
• Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
39
Proof Sketch
General VPM
structure
Model-based
λ * CVPM < 1
Topology and
stability
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
Graph-based
40
Models and more models
Model
Used for
SIR
Mumps
SIS
Flu
SIRS
Pertussis
SEIR
Chicken-pox
……..
SICR
Tuberculosis
MSIR
Measles
SIV
Sensor Stability
SI1I2 V1 V2 H.I.V.
……….
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
41
Ingredient 1: Our generalized
model
Endogenous
Transitions
Susceptible
Infected
Exogenous
Transitions
Endogenous
Transitions
Graph Analytics wkshp
Vigilant
B. A. Prakash; C. Faloutsos
42
Special case
Susceptible
Infected
Vigilant
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
43
Special case: H.I.V.
SI1I2 V1 V2
“Non-terminal”
“Terminal”
Multiple Infectious,
Vigilant states
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
44
Ingredient 2: NLDS+Stability
• View as a NLDS
– discrete time
– non-linear dynamical system (NLDS)
size
mN x 1
.
.
.
.
.
Graph Analytics wkshp.
Probability vector
Specifies the state of
the system at time t
size N (number of
nodes in the graph)
B. A. Prakash; C. Faloutsos
S
I
V
45
Ingredient 2: NLDS + Stability
• View as a NLDS
– discrete time
– non-linear dynamical system (NLDS)
size
mN x 1
.
.
.
.
.
Graph Analytics wkshp.
Non-linear function
Explicitly gives the
evolution of system
B. A. Prakash; C. Faloutsos
46
Ingredient 2: NLDS + Stability
• View as a NLDS
– discrete time
– non-linear dynamical system (NLDS)
• Threshold  Stability of NLDS
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
47
Special case: SIR
S
size
3N x 1
S
I
I
R
R
= probability that node
i is not attacked by
any of its infectious
NLDS
neighbors
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
48
Fixed Point
1
1
.
0
0
.
0
0
.
Graph Analytics wkshp
State when no node is
infected
Q: Is it stable?
B. A. Prakash; C. Faloutsos
49
Stability for SIR
Stable
under threshold
Graph Analytics wkshp
Unstable
above threshold
B. A. Prakash; C. Faloutsos
50
See paper for
full proof
General VPM
structure
Model-based
λ * CVPM < 1
Topology and
stability
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
Graph-based
51
Part 1: Theory
• Q1: What is the epidemic threshold?
– Background
– Result and Intuition (Static Graphs)
– Proof Ideas (Static Graphs)
– Bonus: Dynamic Graphs
• Q2: How do viruses compete?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
52
Dynamic Graphs: Epidemic?
Alternating behaviors
DAY
(e.g., work)
adjacency
matrix
8
8
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
53
Dynamic Graphs: Epidemic?
Alternating behaviors
NIGHT
(e.g., home)
adjacency
matrix
8
8
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
54
Model Description
Healthy
• SIS model
N2
Prob. β
– recovery rate δ
– infection rate β
N1
X
Prob. δ
Infected
N3
• Set of T arbitrary graphs
day
N
night
N
Graph Analytics wkshp
N
, weekend…..
N
B. A. Prakash; C. Faloutsos
55
Our result: Dynamic Graphs
Threshold
• Informally, NO epidemic if
eig (S) =
Single number!
Largest eigenvalue of
The system matrix S
In Prakash+, ECML-PKDD 2010
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
<1
S =
56
Infection-profile
log(fraction infected)
MIT Reality
Mining
Synthetic
ABOVE
ABOVE
AT
AT
BELOW
BELOW
Time
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
57
Footprint (#
infected @
“steady state”)
“Take-off” plots
Synthetic
MIT Reality
EPIDEMIC
Our
threshold
EPIDEMIC
NO EPIDEMIC
Our
threshold
NO EPIDEMIC
(log scale)
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
58
Part 1: Theory
• Q1: What is the epidemic threshold?
• Q2: What happens when viruses compete?
– Mutually-exclusive viruses
– Interacting viruses
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
59
Competing Contagions
iPhone v Android
Blu-ray v HD-DVD
Attack
v
Retreat
Biological common flu/avian flu, pneumococcal inf etc
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
60
A simple model
• Modified flu-like
• Mutual Immunity (“pick one of the two”)
• Susceptible-Infected1-Infected2-Susceptible
Virus 2
Virus 1
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
61
Question: What happens in the
end?
Number of
Infections
green: virus 1
red: virus 2
Footprint @ Steady State
Footprint @ Steady State
ASSUME:
Virus 1 is stronger than Virus 2
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
= ?
62
Question: What happens in the
end? Footprint @ Steady State
Number of
Infections
green: virus 1
red: virus 2
Footprint @ Steady State
??
Strength
Strength
=
2
Strength
Strength
ASSUME:
Virus 1 is stronger than Virus 2
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
63
Answer: Winner-Takes-All
Number of
Infections
green: virus 1
red: virus 2
ASSUME:
Virus 1 is stronger than Virus 2
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
64
Our Result: Winner-Takes-All
Given our model, and any graph, the
weaker virus always dies-out completely
1. The stronger survives only if it is above threshold
2. Virus 1 is stronger than Virus 2, if:
strength(Virus 1) > strength(Virus 2)
3. Strength(Virus) = λ β / δ  same as before!
In Prakash+ WWW 2012
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
65
Real Examples
[Google Search Trends data]
Reddit v Digg
Graph Analytics wkshp
Blu-Ray v HD-DVD
B. A. Prakash; C. Faloutsos
66
Part 1: Theory
• Q1: What is the epidemic threshold?
• Q2: What happens when viruses compete?
– Mutually-exclusive viruses
– Interacting viruses
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
67
A simple model: SI1|2S
• Modified flu-like (SIS)
• Susceptible-Infected1 or 2-Susceptible
• Interaction Factor ε
– Full Mutual Immunity: ε = 0
– Partial Mutual Immunity (competition): ε < 0
– Cooperation: ε > 0
Virus 1
Graph Analytics wkshp
&
Virus 2
B. A. Prakash; C. Faloutsos
68
Question: What happens in the
end?
ε=0
Winner takes all
ε=1
Co-exist independently
0.8
0.7
0.6
0.5
0.4
0.3
0.2
k1
k2
0.1
0
0
100
200
300
400
Time
500
600
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
k1
k2
0.1
i1,2
0
700
800
Footprint (Fraction of Population)
0.9
Footprint (Fraction of Population)
Footprint (Fraction of Population)
0.9
ε=2
Viruses cooperate
20
40
60
80
Time
100
120
0.8
0.7
0.6
0.5
0.4
0.3
0.2
k1
k2
0.1
i1,2
0
140
20
40
60
80
Time
100
120
What about for 0 < ε <1?
Is there a point at which both viruses can co-exist?
ASSUME:
Graph Analytics wkshp
Virus
1 is stronger than VirusB.2A. Prakash; C. Faloutsos
69
140
Footprint (Fraction of Population)
Answer: Yes!
There is a phase transition
0.8
0.6
0.4
0.2
k1
k2
0
0
ASSUME:
Virus
1 iswkshp
stronger than B.Virus
2Faloutsos
Graph Analytics
A. Prakash; C.
100
200
300
400
Time
500
600
700
70
800
Footprint (Fraction of Population)
Answer: Yes!
There is a phase transition
0.8
0.6
0.4
0.2
k1
k2
0
i1,2
20
ASSUME:
Virus
1 iswkshp
stronger than B.Virus
2Faloutsos
Graph Analytics
A. Prakash; C.
40
60
80
Time
100
120
140
71
Footprint (Fraction of Population)
Answer: Yes!
There is a phase transition
0.8
0.6
0.4
0.2
k1
k2
0
i1,2
0
ASSUME:
Virus
1 iswkshp
stronger than B.Virus
2Faloutsos
Graph Analytics
A. Prakash; C.
50
100
150
Time
200
250
72
300
Our Result: Viruses can Co-exist
Given our model and a fully connected graph,
there exists an εcritical such that for ε ≥ εcritical,
there is a fixed point where both viruses
survive.
1. The stronger survives only if it is above threshold
2. Virus 1 is stronger than Virus 2, if:
strength(Virus 1) > strength(Virus 2)
3. Strength(Virus) σ = N β / δ
In Beutel+ KDD 2012
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
73
Real Examples
[Google Search Trends data]
Hulu v Blockbuster
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
74
Real Examples
[Google Search Trends data]
Chrome v Firefox
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
75
Outline
•
•
•
•
•
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
76
Part 2: Algorithms
• Q3: Whom to immunize?
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
77
Full Static Immunization
Given: a graph A, virus prop. model and budget k;
Find: k ‘best’ nodes for immunization (removal).
?
?
k=2
?
?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
78
Part 2: Algorithms
• Q3: Whom to immunize?
– Full Immunization (Static Graphs)
– Full Immunization (Dynamic Graphs)
– Fractional Immunization
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
79
Challenges
• Given a graph A, budget k,
Q1 (Metric) How to measure the ‘shieldvalue’ for a set of nodes (S)?
Q2 (Algorithm) How to find a set of k nodes
with highest ‘shield-value’?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
80
Proposed vulnerability measure
λ
λ is the epidemic threshold
“Safe”
“Vulnerable”
“Deadly”
Increasing λ
Increasing vulnerability
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
81
A1: “Eigen-Drop”: an ideal shield
value
Eigen-Drop(S)
Δ λ = λ - λs
9
9
11
10
Δ
9
10
1
1
4
4
8
8
2
2
5
5
6
Original Graph
Graph Analytics wkshp
7
3
7
3
B. A. Prakash; C. Faloutsos
6
Without {2, 6}
82
(Q2) - Direct Algorithm too
expensive!
• Immunize k nodes which maximize Δ λ
S = argmax Δ λ
• Combinatorial!
• Complexity:
– Example:
• 1,000 nodes, with 10,000 edges
• It takes 0.01 seconds to compute λ
• It takes 2,615 years to find 5-best nodes!
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
83
A2: Our Solution
• Part 1: Shield Value
– Carefully approximate Eigen-drop (Δ λ)
– Matrix perturbation theory
• Part 2: Algorithm
– Greedily pick best node at each step
– Near-optimal due to submodularity
• NetShield (linear complexity)
– O(nk2+m) n = # nodes; m = # edges
In Tong, Prakash+ ICDM 2010
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
84
Our Solution: Part 1
• Approximate Eigen-drop (Δ λ)
• Δ λ ≈ SV(S) =
– Result using Matrix perturbation theory
– u(i) == ‘eigenscore’
~~ pagerank(i)
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
A
u(i)
u =λ. u
85
P1: node importance
Original Graph
Graph Analytics wkshp
P2: set diversity
Select by P1
B. A. Prakash; C. Faloutsos
Select by P1+P2
86
Our Solution: Part 2: NetShield
• We prove that:
SV(S) is sub-modular (& monotone non-decreasing)
Corollary: Greedy algorithm works
1. NetShield is near-optimal (w.r.t. max SV(S))
2. NetShield is O(nk2+m)
• NetShield: Greedily add best node at each step
Graph Analytics wkshp
Prakash;NetShield
C. Faloutsos
Footnote:
near-optimal meansB. A.SV(S
) >= (1-1/e) SV(S Opt)
87
Experiment: Immunization
quality
Log(fraction of
infected
nodes)
PageRank
Betweeness (shortest path)
Degree
Lower
is
better
Graph Analytics wkshp
Acquaintance
Eigs (=HITS)
NetShield
Time
B. A. Prakash; C. Faloutsos
88
Part 2: Algorithms
• Q3: Whom to immunize?
– Full Immunization (Static Graphs)
– Full Immunization (Dynamic Graphs)
– Fractional Immunization
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
89
Full Dynamic Immunization
• Given:
Set of T arbitrary graphs
day
N
N
night
N
, weekend…..
N
• Find:
k ‘best’ nodes to immunize (remove)
In Prakash+ ECML-PKDD 2010
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
90
Full Dynamic Immunization
• Our solution
– Recall theorem
– Simple: reduce
Matrix
Product
day
night
(=λ )
• Goal: max eigendrop Δ λ
Δ λ
=
λ befor e  λ after
• No competing policy for comparison
• We propose and evaluate many policies
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
91
Performance of Policies
Footprint after k=6 immunizations
34
32
30
28
26
24
22
20
Graph Analytics wkshp
Lower is
better
Footprint
B. A. Prakash; C. Faloutsos
MIT Reality
Mining
92
Part 2: Algorithms
• Q3: Whom to immunize?
– Full Immunization (Static Graphs)
– Full Immunization (Dynamic Graphs)
– Fractional Immunization
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
93
Fractional Immunization of Networks
B. Aditya Prakash, Lada Adamic, Theodore
Iwashyna (M.D.), Hanghang Tong, Christos
Faloutsos
Under Submission
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
94
Full Static Immunization
Given: a graph A, virus prop. model and budget k;
Find: k ‘best’ nodes for immunization (removal).
k=2
?
?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
95
Fractional Asymmetric
Immunization
• Fractional Effect [ f(x) =0.5 x]
• Asymmetric Effect
# antidotes = 3
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
96
Fractional Asymmetric
Immunization
• Fractional Effect [ f(x) =0.5 x]
• Asymmetric Effect
# antidotes = 3
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
97
Fractional Asymmetric
Immunization
• Fractional Effect [ f(x) =0.5 x]
• Asymmetric Effect
# antidotes = 3
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
98
Fractional Asymmetric
Immunization
Drug-resistant Bacteria
(like XDR-TB)
Another
Hospital
Hospital
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
99
Fractional Asymmetric
Immunization
=f
Drug-resistant Bacteria
(like XDR-TB)
Another
Hospital
Hospital
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
100
Fractional Asymmetric
Immunization
Problem: Given k units of disinfectant,
how to distribute them to maximize
hospitals saved?
Another
Hospital
Hospital
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
101
Our Algorithm “SMARTALLOC”
~6x
fewer!
[US-MEDICARE NETWORK 2005]
• Each circle is a hospital, ~3000 hospitals
• More than 30,000 patients transferred
CURRENT PRACTICE
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
SMART-ALLOC
102
Wall-Clock
Time
Running Time
> 1 week
≈
> 30,000x
speed-up!
Lower
is
better
Graph Analytics wkshp
14 secs
Simulations
B. A. Prakash; C. Faloutsos
SMART-ALLOC
103
Lower
is
better
Experiments
PENN-NETWORK
SECOND-LIFE
~5 x
K = 200
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
~2.5 x
K = 2000
104
Part 2: Algorithms
• Q3: Whom to immunize?
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
105
Outbreak detection
• Problems of finding sources of
contamination in water networks
and finding “hot” stories on blogs
are isomorphic.
– Minimize time to detection,
population affected
– Maximize probability of detection.
– Minimize sensor placement cost.
Blo
gs
Links
Graph Analytics wkshp
Po
sts
B. A. Prakash; C. Faloutsos
Information
cascade
106
• J. Leskovec, A. Krause, C. Guestrin, C.
Faloutsos, J. VanBriesen, N. Glance. "Costeffective Outbreak Detection in Networks”
KDD 2007
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
107
CELF: Main idea
• Given a graph G(V,E)
• and a budget of B sensors
• and data on how contaminations spread over the
network:
– for each contamination i we know the time T(i, u) when it
contaminated node u
• Minimize time to detect outbreak
• CELF algorithm uses submodularity and lazy evaluation
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
108
Blogs: Comparison to heuristics
Benefit
(higher=
better)
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
109
“Best 10 blogs to read”
NP - number of posts, IL- in-links, OLO- blog out links, OLA- all out links
•
•
•
•
•
•
•
•
•
•
•
k
1
2
3
4
5
6
7
8
9
10
PA score
0.1283
0.1822
0.2224
0.2592
0.2923
0.3152
0.3353
0.3508
0.3654
0.3778
Graph Analytics wkshp
Blog
http://instapundit.com
http://donsurber.blogspot.com
http://sciencepolitics.blogspot.com
http://www.watcherofweasels.com
http://michellemalkin.com
http://blogometer.nationaljournal.com
http://themodulator.org
http://www.bloggersblog.com
http://www.boingboing.net
http://atrios.blogspot.com
NP
4593
1534
924
261
1839
189
475
895
5776
4682
B. A. Prakash; C. Faloutsos
IL
4636
1206
576
941
12642
2313
717
247
6337
3205
OLO
1890
679
888
1733
1179
3669
1844
1244
1024
795
OLA
5255
3495
2701
3630
6323
9272
4944
10201
6183
3102
110
Part 2: Algorithms
• Q3: Whom to immunize?
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
111
• B. Aditya Prakash, Jilles Vreeken, Christos
Faloutsos ‘Detecting Culprits in Epidemics:
Who and How many?’
Under Submission
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
112
Problem definition
2-d grid
‘+’ -> infected
Who started it?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
113
Problem definition
2-d grid
‘+’ -> infected
Who started it?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
114
Culprits: Exoneration
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
115
Who are the culprits
• Two-part solution
– use MDL for number of seeds
– for a given number:
• exoneration = centrality + penalty
• our method uses smallest eigenvector of Laplacian
submatrix
• Running time =
– linear! (in edges and nodes)
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
116
Culprits: Results
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
117
Outline
•
•
•
•
•
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
118
Part 3: Empirical Studies
• Q6: How do cascades look like?
• Q7: How does activity evolve over time?
• Q8: How does external influence act?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
119
Cascading Behavior in
Large Blog Graphs
Blogs
Posts
Information
cascade
Links
How does information propagate
over the blogosphere?
J. Leskovec, M.McGlohon, C. Faloutsos, N. Glance, M.
Hurst. Cascading Behavior in Large Blog Graphs. SDM
2007.
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
120
Cascades on the Blogosphere
B1
B2
B1
1
1
a
B2
1
B3
1
B4
Blogosphere
blogs + posts
d
c
b
e
e
B3
c
2
B4
Blog network
links among blogs
3
d
e
Post network
links among posts
a
Cascade is graph induced by a
time ordered propagation of
information (edges)
Cascades
3 - 121Analytics wkshp
Graph
b
B. A. Prakash; C. Faloutsos
Blog data
45,000 blogs participating in cascades
All their posts for 3 months (Aug-Sept ‘05)
2.4 million posts
~5 million links (245,404 inside the dataset)
Number of posts




Number
of posts
3 - 122Analytics wkshp
Graph
Time C.
[1Faloutsos
day]
B. A. Prakash;
Popularity over time
# in links
1
2
3
lag: days after post
Post popularity drops-off – exponentially?
@t
@t + lag
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
123
Popularity over time
# in links
(log)
days after post
(log)
Post popularity drops-off – exponentially?
POWER LAW!
Exponent?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
124
Popularity over time
# in links
(log)
-1.6
days after post
(log)
Post popularity drops-off – exponentially?
POWER LAW!
Exponent? -1.6
• close to -1.5: Barabasi’s stack model
• and like the zero-crossings of a random walk
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
125
CMU SCS
-1.5 slope
J. G. Oliveira & A.-L. Barabási Human Dynamics: The
Correspondence Patterns of Darwin and Einstein.
Nature 437, 1251 (2005) . [PDF]
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
126
Topological patterns: Cascades
 Procedure for gathering cascades:




Find all initiators (nodes with out-degree 0)
Follow in-links
Produces directed acyclic graph
Count cascade shapes (use our multi-level graph
isomorphism testing algorithm)
a
a
b
b
c
d
c
d
e
3 - 127Analytics wkshp
Graph
d
e
e
B. A. Prakash; C. Faloutsos
c
b
e
a
Topological Observations
How do we measure how information flows through
the network?
Common cascade shapes extracted using
algorithms in [Leskovec, Singh, Kleinberg; PAKDD 2006].
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
128
Topological Observations
What graph properties do cascades exhibit?
Cascade size distributions also follow power law.
Observation 2: The probability of observing a cascade on n
nodes follows a Zipf distribution:
p(n)
n-2
Count
a=-2
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
Cascade
size (# of nodes)
129
Topological Observations
What graph properties do cascades exhibit?
Stars and chains also follow a power law, with
different exponents (star -3.1, chain -8.5).
a=-8.5
Count
Count
a=-3.1
Size of star (# nodes)
Graph Analytics wkshp
Size of chain (# nodes)
B. A. Prakash; C. Faloutsos
130
Blogs and structure
• Cascades take on different shapes (sorted by
frequency):
How can we use cascades
to identify communities?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
131
PCA on cascade types
~44,000 blogs
• Perform PCA on sparse
matrix.
• Use log(count+1)
• Project onto 2 PC…
Graph Analytics wkshp
~9,000 cascade types
…………
slashdot
boingboing
4.6
2.1
3.2
1.1
…
…
…
…
…
4.2
B. A. Prakash; C. Faloutsos
.09
3.4
.07
5.1
2.1
.67
1.1
.07
.01
132
PCA on cascade types
• Observation: Content of blogs and cascade behavior are
often related.
• Distinct clusters for
“conservative” and
“humorous” blogs
(hand-labeling).
M. McGlohon, J. Leskovec, C.
Faloutsos, M. Hurst, N. Glance.
Finding Patterns in Blog Shapes
and Blog Evolution. ICWSM 2007.
28
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
133
PCA on cascade types
• Observation: Content of blogs and cascade behavior are
often related.
• Distinct clusters for
“conservative” and
“humorous” blogs
(hand-labeling).
M. McGlohon, J. Leskovec, C.
Faloutsos, M. Hurst, N. Glance.
Finding Patterns in Blog Shapes
and Blog Evolution. ICWSM 2007.
29
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
134
Part 3: Empirical Studies
• Q6: How do cascades look like?
• Q7: How does activity evolve over time?
• Q8: How does external influence act?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
135
Rise and fall patterns in social media
• Meme (# of mentions in blogs)
– short phrases Sourced from U.S. politics in 2008
“you can put lipstick on a pig”
“yes we can”
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
136
Rise and fall patterns in social
media
• Can we find a unifying model, which
includes these patterns?
• four classes on YouTube [Crane et al. ’08]
• six classes on Meme [Yang et al. ’11]
100
100
100
50
50
50
0
0
100
50
100
50
0
0
Graph Analytics wkshp
0
0
100
50
100
50
50
100
0
0
0
0
100
50
100
50
100
50
50
100
B. A. Prakash; C. Faloutsos
0
0
137
Rise and fall patterns in social
media
• Answer: YES!
0
20
40
60 80
Time
20
40
60 80
Time
100 120
Value
20
40
60 80
Time
50
0
20
40
60 80
Time
100 120
50
0
100 120
Original
SpikeM
100
Value
Value
50
0
0
100 120
Original
SpikeM
100
50
Original
SpikeM
100
20
40
60 80
Time
100 120
Original
SpikeM
100
Value
Value
50
Original
SpikeM
100
Value
Original
SpikeM
100
50
0
20
40
60 80
Time
100 120
• We can represent all patterns by single model
In Matusbara+ SIGKDD 2012
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
138
Main idea - SpikeM
- 1. Un-informed bloggers (uninformed about rumor)
- 2. External shock at time nb (e.g, breaking news)
- 3. Infection (word-of-mouth)
Time n=0
Time n=nb
Infectiveness of a blog-post at age n:
b
f (n)
Time n=nb+1
f (n) = b * n-1.5
- Strength of infection (quality of news)
- Decay function (how infective a blog posting is)
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
β
Power Law
139
SpikeM - with periodicity
• Full equation of SpikeM
n
é
ù
DB(n +1) = p(n +1)× êU(n)× å (DB(t) + S(t))× f (n +1- t) + e ú
ê
ú
ë
t=n
û
b
Periodicity
12pm
Peak activity
Bloggers change their
activity over time
activity
3am
Low activity
p(n)
(e.g., daily, weekly, yearly)
Time n
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
140
Details
• Analysis – exponential rise and power-raw fall
Liner-log
Rise-part
SI -> exponential
Log-log
Graph Analytics wkshp
SpikeM -> exponential
B. A. Prakash; C. Faloutsos
141
Details
• Analysis – exponential rise and power-raw fall
Liner-log
Fall-part
SI -> exponential
SpikeM -> power law
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
Log-log
142
Tail-part forecasts
• SpikeM can capture tail part
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
143
“What-if” forecasting
(1) First spike
(2) Release date
(3) Two weeks before release
?
?
e.g., given (1) first spike,
(2) release date of two sequel movies
(3) access volume before the release date
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
144
“What-if” forecasting
–SpikeM can forecast not only tail-part, but also rise-part!
(1) First spike
(2) Release date
(3) Two weeks before release
• SpikeM can forecast upcoming spikes
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
145
Part 3: Empirical Studies
• Q6: How do cascades look like?
• Q7: How does activity evolve over time?
• Q8: How does external influence act?
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
146
Tweets Diffusion: Problem
Definition
• Given:
– Action log of people tweeting a #hashtag
– A network of users
• Find:
– How external influence varies with #hashtags?
??
??
?
?
Graph Analytics wkshp
?
B. A. Prakash; C. Faloutsos
?
147
Tweet Diffusion: Data
• Yahoo! Twitter firehose
• More than 750 million tweets (> 10 Tera-bytes)
• Test-bed of > 6000 machines
– Hadoop+PIG system ver 0.20.204.0
• Took top 500 hashtags (by volume) in Feb 2011
• Network of users:
– connecting user X to user Y if X directed at least 3
@-messages to Y (or RT-ed a tweet)
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
148
Tweet Diffusion
• Propagation = Influence + External
• Developed a model
– takes the previous observations into account
– with parameters representing external influence
• Learn from previous data
– EM-style alternating minimizing algorithm
• Group tags according to learnt params
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
149
Results: External Influence vs
Time
“External
Effects”
Long-running tags
#nowwatching, #nowplaying, #epictweets
“Word-of-mouth”
#purpleglasses, #brits, #famouslies
#oscar, #25jan
Bursty, external
events
#openfollow, #ihatequotes, #tweetmyjobs
time
“Word-of-mouth”
Not trending
Can also use for Forecasting, Anomaly
Detection!
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
150
Outline
•
•
•
•
•
Motivation
Part 1: Understanding Epidemics (Theory)
Part 2: Policy and Action (Algorithms)
Part 3: Learning Models (Empirical Studies)
Conclusion
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
151
Conclusions
• Epidemic Threshold
– It’s the Eigenvalue
• Fast Immunization
– Max. drop in eigenvalue,
linear-time near-optimal algorithm
• Bursts: SpikeM model
– Exponential growth,
Power-law decay
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
152
Theory
& Algo.
Biology
Physics
Comp.
Systems
ML &
Stats.
Social
Science
Propagation
on Networks
Econ.
153
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
Publications
* 1.
** 2.
* 3.4.
5.
*
*
*
*
6.
7.
Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos
Faloutsos) – In WWW 2012, Lyon
Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis
Faloutsos, Nicholas Valler, Christos Faloutsos) - In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of
ICDM.)
Times Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) - In ICML 2011, Bellevue
Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang
Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain
Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos
Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011.
On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE
ICDM 2010, Sydney, Australia
Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong,
Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain
8.
MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos,
Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) - In SIGKDD 2010, Washington D.C.
9.
Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos) - In VLDB 2010,
Singapore
10.
EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan,
Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos) – In PAKDD 2010, Hyderabad, India
11.
BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis
Faloutsos and Christos Faloutsos) – In ACM SIGKDD 2009, Paris, France.
12.
Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund
Seshadri, Sridhar Machiraju and Christos Faloutsos) – In IEEE ICDM Large Data Workshop 2009, Miami
13.
FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash)
– In Intl. Journal on Data Mining and Knowledge Discovery (DKMD), Springer, vol. 18, no. 1, February 2009, Ed:
Johannes Gehrke.
14.
Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash) – In
Graph Analytics
wkshp
B. A. Prakash; C. Faloutsos
154
IEEE ICDE
2007, Istanbul, Turkey.
Acknowledgements
Collaborators
Christos Faloutsos
Roni Rosenfeld,
Michalis Faloutsos,
Lada Adamic,
Theodore Iwashyna (M.D.),
Dave Andersen,
Tina Eliassi-Rad,
Iulian Neamtiu,
Varun Gupta,
Jilles Vreeken,
Graph Analytics wkshp
Deepayan Chakrabarti,
Hanghang Tong,
Kunal Punera,
Ashwin Sridharan,
Sridhar Machiraju,
Mukund Seshadri,
Alice Zheng,
Lei Li,
Polo Chau,
Nicholas Valler,
Alex Beutel,
Xuetao Wei
B. A. Prakash; C. Faloutsos
155
Acknowledgements
Funding
Graph Analytics wkshp
B. A. Prakash; C. Faloutsos
156
Dynamical Processes on Large
Networks
B. Aditya Prakash
Christos Faloutsos
Analysis
Graph Analytics wkshp
Policy/Action
B. A. Prakash; C. Faloutsos
Data
157
Download