The Universal Laws
of Structural Dynamics
in Large Graphs
Dmitri Krioukov
UCSD/CAIDA
David Meyer & David Rideout
UCSD/Math
F. Papadopoulos, M. Kitsak, M. Á. Serrano, M. Boguñá
M. Ostilli
DARPA’s GRAPHS, Washington, DC, Halloween 2012
(Cancelled by the Frankenstorm)
High-level project description
• Motivation:
– Predict network dynamics
– Detect anomalies
• Goal:
– Identify the universal laws of network dynamics
• Methods: geometry (random geometric graphs)
– Past work: static graphs
– Present/future work: dynamic graphs
Outline
• Hyperbolic = popular × similar
– Growing random hyperbolic graphs
• Next step
– Random Lorentzian graphs
Growing hyperbolic random geometric graph
• "Discretization" of a smooth manifold (B. Riemann, Nature, v.7), here hyperbolic
• Take a circle of radius R (growing version: R grows to R + dR)
• Sprinkle N points into it uniformly at random (growing version: 1 new point in [R, R + dR])
• Connect each pair of points iff the distance between them is x ≤ R (growing version: the new point connects to existing points)
• Coordinates of node t (see the sketch below):
  $t = 1, 2, 3, \ldots, \qquad r_t = \ln t, \qquad \theta_t \in U(0, 2\pi)$
Connecting to m closest nodes
The expected distance to the m'th closest node from t is
$R_t = \ln\frac{\pi m t}{2\left(1 - \frac{1}{t}\right)} \approx r_t + \ln\frac{\pi m}{2} \approx r_t$
New node t, located at radial coordinate $r_t = \ln t$ and connecting to all nodes within distance $R_t \approx r_t$, connects to a fixed number of closest nodes.
Closest nodes
The hyperbolic distance between s and t is
$x_{st} = \operatorname{arccosh}\left(\cosh r_s \cosh r_t - \sinh r_s \sinh r_t \cos\theta_{st}\right) \approx r_s + r_t + \ln\frac{\theta_{st}}{2} = \ln\frac{s t \theta_{st}}{2}$
Find the m nodes s, s < t, with the smallest $x_{st}$ for a given t:
$\min_s x_{st} = \min_s \ln\frac{s t \theta_{st}}{2} = \min_s \frac{s t \theta_{st}}{2} = \min_s s\,\theta_{st}$
New node t connects to a fixed number of existing nodes s with the smallest $s\,\theta_{st}$ (see the sketch below).
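As a complement, a small sketch (illustrative, not the authors' code) that grows a network by connecting each new node t to the m existing nodes s with the smallest s·θ_st, the approximation derived above; networkx is assumed to be available purely as a convenient graph container.

import math
import random
import networkx as nx   # assumed available; any adjacency structure would do

def grow_by_closest_nodes(n, m, seed=None):
    """Grow an n-node graph: node t gets r_t = ln t and a random angle, and
    links to the m existing nodes s minimizing s * theta_st."""
    rng = random.Random(seed)
    theta = {}
    g = nx.Graph()
    for t in range(1, n + 1):
        theta[t] = rng.uniform(0.0, 2 * math.pi)
        g.add_node(t, r=math.log(t), theta=theta[t])
        if t == 1:
            continue                              # nothing to connect to yet
        def score(s):
            d = math.pi - abs(math.pi - abs(theta[s] - theta[t]))  # theta_st
            return s * d
        for s in sorted(range(1, t), key=score)[:m]:
            g.add_edge(t, s)
    return g

# e.g. g = grow_by_closest_nodes(1000, m=2); per the next slides, its degree
# distribution should come out heavy-tailed.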
Hyperbolic = popular × similar
• Two dimensions of attractiveness
– Radial popularity: birth time s:
• The smaller the s, the more popular is the node s
– Angular similarity: distance θ_st
  • The smaller the θ_st, the more similar node s is to node t
• New node t connects to existing nodes s
optimizing trade-offs between
popularity and similarity
• This trade-off optimization yields
hyperbolic geometry
What else it yields
• Power-law graphs
• With strongest possible clustering
• Effective preferential attachment
π‘Ÿπ‘  𝑑 = π›½π‘Ÿπ‘  + (1 − 𝛽)π‘Ÿπ‘‘
𝑠<𝑑
0<𝛽≤1
1
𝛾 =1+
𝛽
𝑃 π‘˜ ~ π‘˜ −𝛾
Clustering
• Probability of new connections from t to s so far:
  $p(x_{st}) = \Theta(R_t - x_{st})$
• If we smoothen the threshold (sketched below):
  $p(x_{st}) = \frac{1}{1 + e^{(x_{st} - R_t)/T}} \to \Theta(R_t - x_{st}) \text{ as } T \to 0$
• Then average clustering linearly decreases
with T from maximum at T = 0 to zero at T = 1
• Clustering is always zero at T > 1
• The model becomes identical to PA at T → ∞
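A tiny sketch (illustration only) of the smoothed connection probability and its T → 0 limit:

import math

def connection_probability(x_st, R_t, T):
    """p(x_st) = 1 / (1 + exp((x_st - R_t)/T)); the T -> 0 limit is the
    sharp threshold Theta(R_t - x_st)."""
    if T <= 0.0:
        return 1.0 if x_st <= R_t else 0.0
    z = (x_st - R_t) / T
    return 1.0 / (1.0 + math.exp(min(z, 700.0)))   # cap the exponent to avoid overflow

# connection_probability(9.0, 10.0, 0.1) ~ 1;  connection_probability(11.0, 10.0, 0.1) ~ 0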
Effective preferential attachment
• Average attractiveness of nodes of degree k
is a linear function of k
• Probability that new node t connects to an existing node of degree k is
  $\Pi(k) = \frac{k - m}{t}$
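A quick consistency check (not on the slide), assuming each new node brings m links, so the t existing nodes carry about mt links and total degree about 2mt:
$\sum_{s<t} \Pi(k_s) = \sum_{s<t} \frac{k_s - m}{t} \approx \frac{2mt - mt}{t} = m$
i.e., the new node makes m connections on average, as it should.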
PSO
• PSO = PA × S, where
– PSO is popularity similarity optimization
– PA is preferential attachment (popularity)
– S is similarity (sphere)
• PA is 1-dimensional (radial popularity)
• PSO is (d+1)-dimensional, where d is the
dimensionality of the similarity space
Validation
• Take a series of historical snapshots of a real
network
• Infer angular/similarity coordinates for each
node
• Test if the probability of new connections
follows the model's theoretical prediction
Learning similarity coordinates
• Take a historical snapshot of a real network
• Apply a maximum-likelihood estimation method
(e.g., MCMC) using the static hyperbolic model
• Metropolis-Hastings example (sketched in code below)
  – Assign random coordinates to all nodes
  – Compute current likelihood $\mathcal{L}_c = \prod_{i<j} p(x_{ij})^{a_{ij}} \left[1 - p(x_{ij})\right]^{1 - a_{ij}}$
  – Select a random node
  – Move it to a new random angular coordinate
  – Compute new likelihood $\mathcal{L}_n$
  – If $\mathcal{L}_n > \mathcal{L}_c$, accept the move
  – If not, accept it with probability $\mathcal{L}_n / \mathcal{L}_c$
  – Repeat
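A compact Python sketch of this procedure (an illustration, not the authors' implementation). It assumes an adjacency matrix adj of 0/1 entries, fixed radial coordinates r, temperature T > 0, and the smoothed connection probability p(x) = 1/(1 + e^{(x−R)/T}) from above; it works with log-likelihoods, so accepting with probability ℒ_n/ℒ_c becomes accepting with probability exp(log ℒ_n − log ℒ_c).

import math
import random

def log_likelihood(adj, r, theta, R, T):
    """log L = sum_{i<j} [ a_ij * ln p(x_ij) + (1 - a_ij) * ln(1 - p(x_ij)) ]."""
    n = len(r)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.pi - abs(math.pi - abs(theta[i] - theta[j]))
            x = math.acosh(max(1.0, math.cosh(r[i]) * math.cosh(r[j])
                                    - math.sinh(r[i]) * math.sinh(r[j]) * math.cos(d)))
            p = 1.0 / (1.0 + math.exp(min((x - R) / T, 700.0)))
            p = min(max(p, 1e-15), 1.0 - 1e-15)           # keep the logs finite
            total += math.log(p) if adj[i][j] else math.log(1.0 - p)
    return total

def infer_angles(adj, r, R, T, steps=10_000, seed=0):
    """Metropolis-Hastings over angular coordinates, as in the bullet list above."""
    rng = random.Random(seed)
    n = len(r)
    theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]   # random start
    cur = log_likelihood(adj, r, theta, R, T)
    for _ in range(steps):
        i = rng.randrange(n)                          # select a random node
        old = theta[i]
        theta[i] = rng.uniform(0.0, 2 * math.pi)      # propose a random new angle
        new = log_likelihood(adj, r, theta, R, T)     # full recompute, for clarity only
        if new >= cur or rng.random() < math.exp(new - cur):
            cur = new                                 # accept the move
        else:
            theta[i] = old                            # reject: restore the old angle
    return theta, cur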
Real networks
• PGP web of trust
– Nodes: PGP certificates (roughly, people)
– Links: Trust relationships
• Internet
– Nodes: Autonomous systems (ASes)
– Links: Business relationships
• Metabolic (E. coli)
– Nodes: Metabolites
– Links: Reactions
Binning and overfitting
• The number of parameters (O(N), node coordinates) is much smaller than the number of unknowns (O(N²), distances between nodes)
• Overfitting is impossible, but we have to bin the hyperbolic distances with a small number of bins to compute the empirical connection probability (see the sketch below)
• More rigorous measures of the fitting quality,
independent of any binning, are desired
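One possible way to do the binning (a sketch under the assumption that the pairwise hyperbolic distances and the 0/1 adjacency indicators come as two aligned flat arrays over the node pairs i < j; numpy is assumed):

import numpy as np

def empirical_connection_probability(dist, adj, n_bins=20):
    """Fraction of linked pairs per hyperbolic-distance bin."""
    dist = np.asarray(dist, dtype=float)
    adj = np.asarray(adj, dtype=float)               # 1 if the pair is linked, else 0
    edges = np.linspace(dist.min(), dist.max(), n_bins + 1)
    which = np.clip(np.digitize(dist, edges) - 1, 0, n_bins - 1)
    pairs = np.bincount(which, minlength=n_bins)               # pairs per bin
    links = np.bincount(which, weights=adj, minlength=n_bins)  # linked pairs per bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore", divide="ignore"):
        p = links / pairs                            # bins with no pairs give NaN
    return centers, p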
More rigorous measures
of modeling quality
• By maximizing the likelihood $\mathcal{L}$, MLE minimizes the logarithmic loss $L \equiv -\log\mathcal{L}$
• The modeling quality is thus measured by either the log-loss difference $L - L_{rand}$, or the normalized likelihood $\mathcal{L}/\mathcal{L}_{rand} = \exp(L_{rand} - L)$, where
  – $L$ is the log-loss with the inferred coordinates
  – $L_{rand}$ is the log-loss with random angular coordinates
• The normalized likelihood $\mathcal{L}/\mathcal{L}_{rand}$ is thus the ratio
of the probability that a given network with the
inferred coordinates is generated by the model, to the
same probability with random coordinates, in which
case the network has “nothing to do” with the model
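In practice this ratio is astronomically large, so only its exponent is reported (as in the table on the next slide). A minimal sketch, assuming log-likelihoods such as those returned by the Metropolis-Hastings sketch above:

def log_loss_difference(logL_inferred, logL_random):
    """L - L_rand, with L = -log(likelihood); negative values mean the inferred
    coordinates fit better than random ones."""
    return (-logL_inferred) - (-logL_random)

def normalized_likelihood_exponent(logL_inferred, logL_random):
    """The natural log of the normalized likelihood, i.e. L_rand - L; returned
    as an exponent, since exp() of ~1e5 overflows a floating-point number."""
    return logL_inferred - logL_random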
Normalized likelihood
Network                        ℒ/ℒ_rand
Social web of trust (PGP)      exp(2.3 × 10^5)
Internet (AS)                  exp(1.3 × 10^5)
Metabolic (E. coli)            exp(6.7 × 10^3)
Actor collaborations (IMDB)    exp(−2.2 × 10^6)
• The popularity×similarity model does not describe the actor network well, because very dissimilar actors often collaborate on big movies
Soft community detection effect
• Inferred coordinates correlate with meaningful
node groups
Capturing network structure
• As a "simple" consequence of the fact that the PSO model accurately describes the large-scale network growth dynamics, it also reproduces the observed large-scale network structure very well, across a wide range of structural properties
Take-home messages (on PSO)
• Popularity×similarity optimization dynamics ⇒ geometrical hyperbolicity ⇒ topological heterogeneity and transitivity (real nets)
• Popularity is modeled by radial coordinates
• Similarity is modeled by angular coordinates
– Projections of a properly weighted combination of all
the factors shaping the network structure
Immediate applications
(submitted)
• New simple network-embedding method
– The idea is to “replay” the growth of a given network
snapshot according to PSO
• New link prediction method, outperforming all the
most popular link prediction methods
– Some classes of links can be predicted with 100%
accuracy
– Perhaps because the method captures all the factors
shaping the network structure
Something is definitely wrong
• Node density is not uniform unless γ = 3, while γ ≈ 2 in all the considered real networks
• Modeled graphs are not random geometric graphs
• They do not properly reflect hyperbolic geometry
• The main project goal (find fundamental laws of
network dynamics using geometry) cannot be
achieved using hyperbolic geometry
Plausible solution
• Geometry under real networks is not hyperbolic
but Lorentzian
• Lorentzian manifolds explicitly model time
• Proof that PSO graphs are random geometric
graphs on de Sitter spacetime (accepted)
Lorentzian manifolds
• A pseudo-Riemannian manifold is a manifold with a non-degenerate metric tensor that need not be positive-definite
  – Distances can be positive, zero, or negative
• A Lorentzian manifold is a pseudo-Riemannian manifold with metric signature (−, +, +, …, +)
  – The coordinate corresponding to the minus sign is called time
  – Negative distances are timelike
  – Positive distances are spacelike
Causal structure
• For each point p ∈ M, the set of points at timelike distances from p can be split into two subsets:
  – I⁺(p) = p's future
  – I⁻(p) = p's past
• If q ∈ I⁺(p), then A(p, q) = I⁺(p) ∩ I⁻(q) is called the Alexandrov set of (p, q)
Alexandrov sets
• Form a base of the manifold topology
– Similar to open balls in the Riemannian case
Lorentzian random geometric graph
• "Discretization" of a smooth manifold (B. Riemann, Nature, v.7), here Lorentzian
• Take a circle of radius R (Lorentzian version: a patch of spacetime of temporal size T, t ∈ [0, T])
• Sprinkle N points into it uniformly at random
• Connect each pair of points iff the distance between them is x ≤ 0 (instead of x ≤ R), because Alexandrov sets are the "balls" now (see the sketch below)
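A minimal sketch of this construction (illustration only), sprinkling into a patch of flat 1+1-dimensional Minkowski spacetime for simplicity rather than the de Sitter spacetime discussed in the talk; two points are linked iff they are timelike (causally) related, i.e., iff their squared interval is negative.

import random

def lorentzian_rgg(n, T=1.0, L=1.0, seed=None):
    """Sprinkle n points uniformly into the box t in [0, T], x in [0, L] of
    1+1 Minkowski spacetime and connect timelike-related pairs."""
    rng = random.Random(seed)
    points = [(rng.uniform(0.0, T), rng.uniform(0.0, L)) for _ in range(n)]  # (t, x)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dt = abs(points[i][0] - points[j][0])
            dx = abs(points[i][1] - points[j][1])
            if dx < dt:              # squared interval -dt^2 + dx^2 < 0: timelike
                edges.append((i, j))
    return points, edges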
Major challenge (in progress)
• On the one hand, random Lorentzian graphs are
random geometric graphs, and consequently
exponential random graphs (equilibrium ensembles)
• On the other hand, they are dynamic growing graphs
(non-equilibrium ensembles)
• Can it be the case that a given ensemble of graphs is
static (equilibrium) and dynamic (non-equilibrium)
at the same time???
• If we prove that it is indeed the case, then we
– Discover a previously unseen static-dynamic graph duality
– Open the possibility of applying the very powerful tools developed for equilibrium systems (e.g., exponential random graphs) to dynamic networks
• F. Papadopoulos, M. Kitsak, M. Á. Serrano, M. Boguñá, and D. Krioukov, "Popularity versus Similarity in Growing Networks", Nature, v. 489, p. 537, 2012