Hyperbolic geometry of complex network

advertisement
Popularity versus Similarity
in Growing Networks
Fragiskos Papadopoulos
Cyprus University of Technology
M. Kitsak, M. Á. Serrano, M. Boguñá, and
Dmitri Krioukov
CAIDA/UCSD
UMD, November 2011
Preferential Attachment (PA)
• Popularity is attractive
• If new connections in a growing network
prefer popular (high-degree) nodes, then
the network has a power-law distribution of
node degrees
– This result can be traced back to 1924 (Yule)
Issues with PA
• Zero clustering
• PA per se is impossible in real networks
– It requires global knowledge of the network
structure to be implemented
• The popularity preference should be exactly
a linear function of the node degree
– Otherwise, no power laws
One solution to these problems
• Mechanism:
– New node selects an existing edge uniformly at
random
– And connects to its both ends
• Results:
–
–
–
–
No global intelligence
Effective linear preference
Power laws
Strong clustering
• Dorogovtsev et al., PRE 63:062101, 2001
One problem with this solution
• It does not reflect reality
• It could not be validated against growth of
real networks
No model that would:
• Be simple and universal (like PA)
– Potentially describing (as a base line)
evolution of many different networks
• Yield graphs with observable properties
– Power laws, strong clustering, to start with
– But many other properties as well
• Not require any global intelligence
• Be validated
Validation of growth mechanism
• State of the art
– Here is my new model
– The graphs that it produces have power laws!
– And strong clustering!!
– And even X!!!
• Almost never the growth mechanism is
validated directly
• PA was validated directly for many
networks, because it is so simple
Paradox with PA validation
• Dilemma
– PA was validated
– But PA is impossible
• Possible resolution
– PA is an emergent phenomenon
– A consequence of some other underlying
processes
Popularity versus Similarity
• Intuition
– I (new node) connect to you (existing node)
not only if you are popular (like Google or
Facebook), but also if you are similar to me
(like Tartini or free soloing) — homophily
• Mechanism
– New connections are formed by trade-off
optimization between popularity and similarity
Mechanism (growth algorithm)
• Nodes t are introduced one by one
–t
1, 2, 3, …
• Measure of popularity
– Node’s birth time t
• Measure of similarity
– Upon its birth, node t gets positioned at a random
coordinate θt in a “similarity” space
– The similarity space is a circle
– θ is random variable uniformly distributed on [0,2π]
– Measure of similarity between t and s is θst |θs θt|
Mechanism (contd.)
• New connections
– New node t connects to m existing nodes s, s
minimizing sθst
– That is, maximizing the product between
popularity and similarity
t,
New node t connects to m existing nodes s that minimize
— the hyperbolic distance
between s and t
New nodes connects to m hyperbolically closest nodes
The expected distance to the m’th closest node from t is
— average degree is fixed to 2m
— average degree grows logarithmically with t
if
2
New node t is located at radial coordinate rt ~ ln t,
and connects to all nodes within distance Rt ~ rt
Popularity similarity
Similarity only
Clustering
• Probability of new connections from t to s so far
• If we smoothen the threshold
• Then average clustering linearly decreases
with T from maximum at T = 0 to zero at T = 1
• Clustering is always zero at T > 1
• The model becomes identical to PA at T
Validation
• Take a series of historical snapshots of a real
network
• Infer angular/similarity coordinates for each
node
• Test if the probability of new connections
follows the model theoretical prediction
Learning similarity coordinates
• Take a historical snapshot of a real network
• Apply a maximum-likelihood estimation method
(e.g., MCMC) using the static hyperbolic model
• Metropolis-Hastings example
–
–
–
–
–
–
–
–
Assign random coordinates to all nodes
aij
1 aij
Compute current likelihood Lc   p ( xij ) [1  p ( xij )]
i j
Select a random node
Move it to a new random angular coordinate
Compute new likelihood Ln
If Ln > Lc, accept the move
If not, accept it with probability Ln / Lc
Repeat
Popularity similarity optimization
• Explains PA as an emergent phenomenon
• Resolves all major issues with PA
• Generates graphs similar to real networks
across many vital metrics
• Directly validates against some real networks
– Technological (Internet)
– Social (web of trust)
– Biological (metabolic)
PSO compared to PA
• PA just ignores similarity, which leads to
severe aberrations
– Probability of similar connections is badly
underestimated
– Probability of dissimilar connections is badly
overestimated
• If the connection probability is correctly
estimated, then one immediate application
is link prediction
Link prediction
• Suppose that some network has zero
temperature
• Then one can predict links with 100%
accuracy!
– Because the connection probability is
either 0 or 1
Non-zero temperature
• Link prediction is worse than 100%, but it
must be still accurate since the connection
probability is close to the step function
• No global intelligence is required
– At zero temperature, new nodes connect to
exactly the closest nodes
– Non-zero temperature models reality where this
hyperbolic proximity knowledge cannot be exact,
and where it is mixed with errors and noise
• PA is an infinite-temperature regime with
similarity forces reduced to nothing but noise
Download