Kronecker Graphs

Kronecker Graphs: An Approach
to Modeling Networks
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg,
Christos Faloutsos, Zoubin Ghahramani
Presented by Eric Wang
4/21/2011
Introduction
• Modeling real-world graphs is important for two reasons:
– (1) It can readily generate hypothetical graphs for extrapolation and
hypothesis testing, and
– (2) It gives users a useful framework for studying the network properties
that generative models should obey to be realistic.
• In this paper, the authors propose a generative network model
called the Kronecker graph that obeys all the static network
patterns exhibited in real-world graphs.
• The three main goals of the paper are
– (1) Naturally produce networks where many properties of real networks
emerge.
– (2) Fast and scalable parameter estimation.
– (3) Generate realistic-looking networks that match statistical properties
of real networks.
Introduction
• To address the issue of efficient large-scale parameter
estimation, the authors introduce a maximum likelihood based
algorithm called KRONFIT.
• The fitted model has several interesting applications:
– Network structure.
– Null-model.
– Simulations.
– Extrapolations.
– Sampling.
– Graph similarity.
– Graph visualizations.
– Anonymization.
Network Properties
• Degree distribution: The degree of a node is the number of
connections it has. The degree distribution is heavy-tailed and is given by
$N_d \propto d^{-\gamma}$, where $N_d$ is the number of nodes with degree d and
$\gamma > 0$ is the power-law exponent.
• Small diameter: A graph with diameter D means that every
pair of nodes can be connected by a path of at most D edges.
This value tends to be small for most large real-world graphs.
• Hop-plot: Defined as the number of reachable pairs $g(h)$
within h hops. When normalized, $g(h)$ is the fraction of connected pairs
whose shortest connecting path is at most h hops.
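As a concrete illustration of the degree counts and the hop-plot, here is a minimal sketch on a hypothetical four-node path graph. The function names and the adjacency-list format are assumptions made for this example, and the hop-plot is computed in its normalized form (fraction of connected ordered pairs).

```python
from collections import Counter, deque


def degree_counts(adj):
    """Map each degree d to the number of nodes that have degree d."""
    return Counter(len(neighbors) for neighbors in adj.values())


def hop_plot(adj):
    """Return g, where g[h] is the fraction of connected ordered node pairs
    whose shortest connecting path is at most h hops."""
    dist_counts = Counter()
    for source in adj:
        # Breadth-first search yields shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for target, d in dist.items():
            if target != source:
                dist_counts[d] += 1
    total = sum(dist_counts.values())
    g, running = {}, 0
    for h in range(1, max(dist_counts) + 1):
        running += dist_counts[h]
        g[h] = running / total
    return g


# Hypothetical toy graph: the undirected path 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
counts = degree_counts(path)
g = hop_plot(path)
diameter = max(g)  # smallest h at which every connected pair is reached
```

On a connected graph the diameter is the smallest h with g[h] equal to 1, which is how the last line reads it off the hop-plot.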
Network Properties
• Scree plot: A plot of the eigen- or singular values of the graph
adjacency matrix versus their rank, on a logarithmic scale.
This is found to approximately obey a power law.
• Densification power law: States that real networks tend to
sprout many more edges than nodes, and thus grow denser over time.
The relationship between the number of edges E(t) and the
number of nodes N(t) at time t is $E(t) \propto N(t)^a$, where a is
typically larger than 1.
• Shrinking diameter: The effective diameter of graphs tends to
shrink and then stabilize for real world networks.
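One common way to estimate the densification exponent a from a sequence of snapshots is a least-squares fit of log E(t) against log N(t). The sketch below assumes that approach; the snapshot values are synthetic, generated from an exact power law purely for illustration, and are not measurements from any real network.

```python
import math


def fit_densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic snapshots (not real data): E(t) = 2 * N(t)^1.5 exactly.
nodes = [100, 200, 400, 800, 1600]
edges = [2 * n ** 1.5 for n in nodes]
a = fit_densification_exponent(nodes, edges)
```

Because the synthetic data follow the law exactly, the fitted slope recovers the exponent 1.5 up to floating-point error; real snapshots would scatter around the fitted line.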
The Challenge of Parameter Estimation
• The “standard” method of estimating network models is the
exponential random graph model (p* model). This model
defines a log-linear model over all possible graphs G,
$p(G \mid \theta) \propto \exp(\theta^{T} s(G))$,
where s(·) is a set of functions that define summary statistics
for the structural features of the network.
• p* models are useful in modeling small networks and local
features, but are prohibitively expensive when the number of
nodes is large (>100).
• Another challenge is in finding correspondence between a
synthetic node and its real world counterpart.
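To see why a log-linear model over all graphs becomes expensive, the following sketch evaluates $p(G \mid \theta) = \exp(\theta^{T} s(G)) / Z$ by brute force on four nodes. The choice of summary statistics (edge count and triangle count), the parameter values, and all function names are assumptions made for illustration only.

```python
import math
from itertools import combinations, product


def summary_stats(edge_set, n):
    """s(G): here (edge count, triangle count), an illustrative choice."""
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= edge_set
    )
    return (len(edge_set), triangles)


def ergm_probability(edge_set, n, theta):
    """Return (p(G | theta), number of graphs summed over to obtain Z)."""
    pairs = list(combinations(range(n), 2))

    def weight(es):
        stats = summary_stats(es, n)
        return math.exp(sum(t * s for t, s in zip(theta, stats)))

    # The normalizer Z needs one term per undirected graph: 2^(n choose 2).
    z = sum(
        weight({p for p, keep in zip(pairs, mask) if keep})
        for mask in product([0, 1], repeat=len(pairs))
    )
    return weight(set(edge_set)) / z, 2 ** len(pairs)


theta = (-0.5, 0.3)  # hypothetical parameters
triangle = {(0, 1), (0, 2), (1, 2)}
p, num_graphs = ergm_probability(triangle, 4, theta)
p_uniform, _ = ergm_probability(triangle, 4, (0.0, 0.0))
```

The number of terms in Z doubles with every additional node pair, so exact enumeration like this is only feasible for very small graphs.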
Symbols and Notation
Kronecker Graph
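• The Kronecker product $C = A \otimes B$ of two matrices A
and B of sizes n x m and n’ x m’ is given by
$$C = A \otimes B = \begin{pmatrix} a_{1,1}B & a_{1,2}B & \cdots & a_{1,m}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1}B & a_{n,2}B & \cdots & a_{n,m}B \end{pmatrix},$$
which has size (n·n’) x (m·m’).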
• The Kronecker product of two graphs is simply the Kronecker
product of their corresponding adjacency matrices.
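• A crucial observation: an edge $(X_{ij}, X_{kl})$ is in $G \otimes H$ if and only if
$(X_i, X_k)$ is an edge in G and $(X_j, X_l)$ is an edge in H.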
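A minimal sketch of the Kronecker product of two adjacency matrices follows. It also checks numerically that an entry of the product is 1 exactly when the corresponding entries of both factors are 1. The two small example graphs are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Row i*len(b)+k and column j*len(b[0])+l of the result hold a[i][j] * b[k][l].
    """
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


# Hypothetical factors: a 3-node chain with self-loops and a single 2-node edge.
g = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
h = [[0, 1],
     [1, 0]]
c = kron(g, h)

# Product node (i, k) links to (j, l) exactly when i-j is an edge of g
# and k-l is an edge of h.
n_h = len(h)
edge_rule_holds = all(
    c[i * n_h + k][j * n_h + l] == (g[i][j] and h[k][l])
    for i in range(len(g))
    for j in range(len(g))
    for k in range(n_h)
    for l in range(n_h)
)
```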
Kronecker Graph
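• Define the kth power of $K_1$ as
$K_1^{[k]} = K_k = \underbrace{K_1 \otimes K_1 \otimes \cdots \otimes K_1}_{k \text{ times}} = K_{k-1} \otimes K_1$.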
Kronecker Graph
• Formally, a Kronecker graph of order k is defined by the
adjacency matrix $K_1^{[k]}$, where $K_1$ is the Kronecker initiator
adjacency matrix.
• Several more examples of Kronecker graphs
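To make the definition of a Kronecker graph of order k concrete, this sketch builds the kth Kronecker power of a small initiator by repeated multiplication. The 3-node initiator (a chain with self-loops) and the helper names are assumptions chosen for the example; "edges" here means nonzero entries of the adjacency matrix, so self-loops and both directions are counted.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def kronecker_power(k1, k):
    """k-th Kronecker power of the initiator k1, built one factor at a time."""
    result = k1
    for _ in range(k - 1):
        result = kron(result, k1)
    return result


# Hypothetical initiator: a 3-node chain with a self-loop on every node.
k1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
k2 = kronecker_power(k1, 2)
k3 = kronecker_power(k1, 3)
num_nodes = len(k3)
num_edges = sum(map(sum, k3))  # nonzero adjacency entries
```

With 3 nodes and 7 nonzero entries in the initiator, the order-3 graph has 3^3 nodes and 7^3 nonzero entries, since both counts multiply at every step.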
Analysis of Kronecker Graphs
• A major advantage of Kronecker graphs is the ability to prove
analytical results regarding graph properties, including degree
distributions, diameters, eigenvalues, eigenvectors, and time evolution.
• Degree Distribution: Kronecker graphs have multinomial
degree distributions, for both in- and out- degrees. A careful
choice of the initiator graph makes the resulting multinomial
behave like a power law.
• Multinomial eigenvalue and eigenvector distributions: The
eigenvectors and eigenvalues of a Kronecker graph follow
multinomial distributions.
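The multinomial shape of the degree distribution can be seen from the fact that a row sum of a Kronecker product is a product of row sums of the factors. The sketch below predicts every out-degree of the order-3 graph from the initiator's out-degrees alone and compares the result with the explicit matrix. The initiator and function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def out_degrees_from_initiator(k1, k):
    """Out-degree of every node of the k-th power, from initiator row sums.

    Each node corresponds to a length-k tuple of initiator nodes, and its
    out-degree is the product of their out-degrees.
    """
    d1 = [sum(row) for row in k1]
    return [
        prod(d1[i] for i in digits)
        for digits in product(range(len(k1)), repeat=k)
    ]


k1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
k3 = kron(kron(k1, k1), k1)
explicit = [sum(row) for row in k3]
predicted = out_degrees_from_initiator(k1, 3)
degree_histogram = Counter(predicted)
```

The number of nodes sharing a given degree is a multinomial count: it depends only on how many of the k positions pick each initiator node.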
Analysis of Kronecker Graphs
• Connectivity of Kronecker Graphs: If at least one of G or H
is a disconnected graph, then $G \otimes H$ is also disconnected.
Further, if both G and H are connected but bipartite, then $G \otimes H$
is disconnected, and each of the two connected components is
again bipartite.
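• Densification Power Law: Kronecker graphs follow the
densification power law (DPL), with densification exponent
$a = \log(E_1) / \log(N_1)$, where $N_1$ and $E_1$ are the numbers of
nodes and edges of the initiator $K_1$.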
• Diameter: If $K_1$ has diameter D and a self-loop on every
node, then for every k, the graph $K_k$ also has diameter D.
• Effective diameter: If $K_1$ has diameter D and a self-loop on
every node, then for every q, the q-effective diameter of $K_k$
approaches D from below as k increases.
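The densification and diameter statements can be checked numerically on a small case. Because node and edge counts both multiply under the Kronecker product, the ratio log(edges) / log(nodes) stays fixed at its initiator value. This sketch uses a hypothetical 3-node initiator with self-loops and diameter 2; the helper names are assumptions.

```python
import math
from collections import deque


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def diameter(adj):
    """Longest shortest-path length in hops, or infinity if disconnected."""
    n = len(adj)
    worst = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return math.inf
        worst = max(worst, max(dist.values()))
    return worst


k1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
graph = k1
results = []  # (k, log(edges)/log(nodes), diameter) for k = 1, 2, 3
for k in range(1, 4):
    n, e = len(graph), sum(map(sum, graph))
    results.append((k, math.log(e) / math.log(n), diameter(graph)))
    graph = kron(graph, k1)
```

For this initiator every row of `results` carries the same exponent, log 7 / log 3, and the same diameter as the initiator.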
Stochastic Kronecker Graphs
• This particular construction of a stochastic Kronecker graph
relaxes the assumption of the binary initiator matrix, and
instead allows each entry to take values on the interval [0,1].
• Later, the authors introduce a highly efficient hierarchical
sampling scheme to generate an instance of a Kronecker
graph.
Stochastic Kronecker Graphs
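• The full edge-probability matrix of a stochastic Kronecker graph is highly
inefficient to store in memory, so it is useful to compute the probability
$p_{uv}$ of an edge (u,v) occurring in the kth Kronecker graph in O(k) time:
$p_{uv} = \prod_{i=0}^{k-1} \Theta\left[ \left\lfloor \frac{u-1}{N_1^i} \right\rfloor (\bmod\ N_1) + 1,\ \left\lfloor \frac{v-1}{N_1^i} \right\rfloor (\bmod\ N_1) + 1 \right]$,
with nodes numbered from 1 and $\Theta$ the $N_1 \times N_1$ initiator matrix.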
• The recursive nature of stochastic Kronecker graphs also lends
itself to a fast generative procedure. Naively generating a
stochastic Kronecker graph K on N nodes takes $O(N^2)$ time,
while the proposed method takes linear time in the number of
edges of the graph.
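The O(k) edge-probability computation can be sketched by reading off the base-$N_1$ digits of the two node indices and multiplying the matching initiator entries. The code below uses 0-based node indices, and the initiator values are hypothetical; it compares the result with the explicitly built probability matrix.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def edge_probability(theta, k, u, v):
    """Probability of edge (u, v) in the k-th Kronecker power of theta.

    u and v are 0-based; the loop runs k times, once per base-N1 digit.
    """
    n1 = len(theta)
    p = 1.0
    for _ in range(k):
        p *= theta[u % n1][v % n1]  # current digit of u and of v
        u //= n1
        v //= n1
    return p


theta = [[0.9, 0.5],
         [0.5, 0.1]]  # hypothetical initiator
k = 3
explicit = kron(kron(theta, theta), theta)
max_error = max(
    abs(explicit[u][v] - edge_probability(theta, k, u, v))
    for u in range(8)
    for v in range(8)
)
```

The explicit matrix needs $N^2$ entries, while the digit-based function needs no storage beyond the initiator itself.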
Stochastic Kronecker Graphs
• Following Figure c, the authors recursively choose subregions
of the graph following the initiator matrix (Figure a) until they
reach a single cell (after k steps), and place an edge there.
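The recursive descent can be sketched as k successive choices of a sub-block, each made with probability proportional to the corresponding initiator entry. This is an illustrative reading of the procedure, not the authors' code; the function name and the initiator values are assumptions.

```python
import random


def place_edge(theta, k, rng):
    """Pick one cell (u, v) of the N1^k x N1^k matrix by k recursive choices."""
    n1 = len(theta)
    cells = [(i, j) for i in range(n1) for j in range(n1)]
    weights = [theta[i][j] for i, j in cells]
    u = v = 0
    for _ in range(k):
        # Choose a sub-block with probability proportional to theta[i][j],
        # then descend into it.
        i, j = rng.choices(cells, weights=weights)[0]
        u = u * n1 + i
        v = v * n1 + j
    return u, v


rng = random.Random(0)
theta = [[0.9, 0.5],
         [0.5, 0.1]]  # hypothetical initiator
u, v = place_edge(theta, 4, rng)

# With a single nonzero initiator entry the descent is fully determined.
corner = place_edge([[0.0, 1.0], [0.0, 0.0]], 5, rng)
```

Each call does work proportional to k, which is what makes generation linear in the number of edges rather than quadratic in the number of nodes.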
Stochastic Kronecker Graphs
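• Another question that has to be answered is the number of
edges in the graph (to be generated). The authors state that the
number of edges in the kth stochastic Kronecker graph is
normally distributed with mean $\left(\sum_{i,j} \theta_{ij}\right)^k = E_1^k$,
where $\theta_{ij}$ are the entries of the initiator matrix.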
• Collisions of edges are rare (about 1% of edges collide); a colliding
edge is simply re-inserted.
• Due to this slight error, the proposed generative method does
not yield exact samples from the graph parameter distribution,
but the authors state that the end result is indistinguishable
from graphs generated using the exact naïve procedure.
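Putting the pieces together, a generator might look like the sketch below. It makes two simplifying assumptions that are not from the slides: the edge count is fixed at the rounded mean (sum of initiator entries raised to the power k) instead of being drawn from a normal distribution, and a collision is handled by drawing a fresh cell. Names and initiator values are hypothetical.

```python
import random


def generate_edges(theta, k, seed=0):
    """Return (set of directed edges, number of collisions encountered)."""
    rng = random.Random(seed)
    n1 = len(theta)
    cells = [(i, j) for i in range(n1) for j in range(n1)]
    weights = [theta[i][j] for i, j in cells]
    # Assumption: use the rounded expected edge count instead of sampling it.
    target = round(sum(weights) ** k)
    edges = set()
    collisions = 0
    while len(edges) < target:
        u = v = 0
        for _ in range(k):
            i, j = rng.choices(cells, weights=weights)[0]
            u, v = u * n1 + i, v * n1 + j
        if (u, v) in edges:
            collisions += 1  # cell already occupied: draw again
            continue
        edges.add((u, v))
    return edges, collisions


theta = [[0.9, 0.5],
         [0.5, 0.1]]  # hypothetical initiator, entries sum to 2
edges, collisions = generate_edges(theta, 8)
```

The loop assumes the target is well below the number of cells with nonzero probability; otherwise re-drawing on collision would become slow or fail to terminate.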
Simulations of Kronecker Graphs
• Citation network (CIT-HEP-TH):
Simulations of Kronecker Graphs
• Autonomous systems (AS-ROUTEVIEWS):
Kronecker Graph Model Estimation
• In this paper, the authors choose to find an initiator matrix
that yields a synthetic Kronecker graph K with the same
statistical properties as a real graph G.
• This is in contrast to existing parameter estimation schemes,
which optimize by directly matching a chosen set of statistical
properties; that is problematic because it is difficult to specify a
set of properties that accurately describes a graph.
• The authors choose a maximum likelihood based approach.
This presents three challenges:
– Model selection/overfitting.
– Node correspondence.
– Likelihood estimation.
Kronecker Graph Model Estimation
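• Consider a graph G on $N = N_1^k$ nodes, and an $N_1 \times N_1$
stochastic Kronecker graph initiator matrix $\Theta$ that we aim to
estimate.
• Using $\Theta$ we generate $\mathcal{P} = \Theta^{[k]}$ and want to solve
$\arg\max_{\Theta} P(G \mid \Theta)$,
where $\theta_{ij}$ are the parameters of $\Theta$.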
Kronecker Graph Model Estimation
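• Since the mapping of nodes in G to those in K is unknown, all
possible permutations must be considered. Let $\sigma$ denote a
particular mapping of nodes onto the stochastic adjacency matrix
$\mathcal{P} = \Theta^{[k]}$.
• Define the log likelihood $l(\Theta)$ as
$l(\Theta) = \log P(G \mid \Theta) = \log \sum_{\sigma} P(G \mid \Theta, \sigma)\, P(\sigma)$,
where
$P(G \mid \Theta, \sigma) = \prod_{(u,v) \in G} \mathcal{P}[\sigma_u, \sigma_v] \prod_{(u,v) \notin G} \left(1 - \mathcal{P}[\sigma_u, \sigma_v]\right)$,
since the probability of any given edge is Bernoulli distributed.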
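For a fixed permutation, the Bernoulli likelihood can be evaluated directly by visiting every node pair. The sketch below does this naively, so it is quadratic in the number of nodes. The toy graph, the initiator values, and the function names are assumptions for illustration; node indices are 0-based.

```python
import math


def edge_probability(theta, k, u, v):
    """Entry [u, v] of the k-th Kronecker power of theta (0-based indices)."""
    n1 = len(theta)
    p = 1.0
    for _ in range(k):
        p *= theta[u % n1][v % n1]
        u //= n1
        v //= n1
    return p


def log_likelihood(edges, n, theta, k, sigma):
    """log P(G | Theta, sigma), summed over all n^2 ordered node pairs."""
    total = 0.0
    for u in range(n):
        for v in range(n):
            p = edge_probability(theta, k, sigma[u], sigma[v])
            # An edge contributes log p, a missing edge log(1 - p).
            total += math.log(p if (u, v) in edges else 1.0 - p)
    return total


theta = [[0.9, 0.5],
         [0.5, 0.1]]  # hypothetical initiator
edges = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)}  # toy 4-node path
ll_identity = log_likelihood(edges, 4, theta, 2, [0, 1, 2, 3])
ll_reversed = log_likelihood(edges, 4, theta, 2, [3, 2, 1, 0])
```

The two values generally differ, which is why the choice of node correspondence matters for the fit.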
Kronecker Graph Model Estimation
• Now the question becomes, how can we find the best $\Theta$, the
parameters of the initiator matrix?
• Naively, a grid search could be used, but it is highly inefficient.
In fact, even a naïve gradient descent algorithm still requires
$O(N!\,N^2)$ time.
• The reason for this inefficiency is that without a clever way of
obtaining a good node permutation $\sigma$, we must sum over all
possible permutations, requiring N! time.
• The authors next introduce a Metropolis sampling approach
that performs the task in linear time.
Kronecker Graph Model Estimation
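• The permutation distribution is
$P(\sigma \mid G, \Theta) = \frac{P(\sigma, G, \Theta)}{\sum_{\tau} P(\tau, G, \Theta)} = \frac{P(\sigma, G, \Theta)}{Z}$,
where Z is a computationally intractable normalizing constant.
• However, Z cancels out when the ratio of the likelihoods between
permutations $\sigma$ and $\sigma'$ is computed.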
Kronecker Graph Model Estimation
• The authors define two different proposal distributions to
generate permutation $\sigma'$ from the current permutation $\sigma$.
• Empirically, the authors find that they obtain the best
performance by mixing the two: executing SwapNodes with probability
$\omega$ and SwapEdgeEndpoints with probability $1 - \omega$.
• Using this approach, the sampling can be done in O(kN)
time.
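A minimal Metropolis sketch over permutations is given below, using only a SwapNodes-style proposal (swap the positions of two randomly chosen nodes). It assumes a uniform prior over permutations and, for clarity, recomputes the full likelihood at every step instead of using the incremental O(kN) update, so it is far slower than the method described. SwapEdgeEndpoints is omitted, and all names and values are hypothetical.

```python
import math
import random


def edge_probability(theta, k, u, v):
    """Entry [u, v] of the k-th Kronecker power of theta (0-based indices)."""
    n1 = len(theta)
    p = 1.0
    for _ in range(k):
        p *= theta[u % n1][v % n1]
        u //= n1
        v //= n1
    return p


def log_likelihood(edges, n, theta, k, sigma):
    """log P(G | Theta, sigma), summed naively over all ordered node pairs."""
    total = 0.0
    for u in range(n):
        for v in range(n):
            p = edge_probability(theta, k, sigma[u], sigma[v])
            total += math.log(p if (u, v) in edges else 1.0 - p)
    return total


def sample_permutations(edges, n, theta, k, steps, seed=0):
    """Metropolis chain over permutations with a two-node swap proposal."""
    rng = random.Random(seed)
    sigma = list(range(n))
    current = log_likelihood(edges, n, theta, k, sigma)
    samples = []
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        proposal = sigma[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        candidate = log_likelihood(edges, n, theta, k, proposal)
        # Only the likelihood ratio is needed, so the normalizer Z never appears.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            sigma, current = proposal, candidate
        samples.append(tuple(sigma))
    return samples


theta = [[0.9, 0.5],
         [0.5, 0.1]]  # hypothetical initiator
edges = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)}  # toy 4-node path
samples = sample_permutations(edges, 4, theta, 2, steps=200)
```

Because the swap proposal is symmetric, the acceptance probability reduces to the likelihood ratio between the proposed and the current permutation.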
Kronecker Graph Model Estimation
• Like sampling the permutations, computing the likelihood of a
given graph naively is quadratic in the number of nodes.
• The authors exploit the sparseness of a real graph, by first
calculating the likelihood on an empty graph (with no edges),
and then correcting for the edges that actually appear in G.
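• The log-likelihood on an empty graph is approximated by a
second-order Taylor expansion,
$l_e(\Theta) = \sum_{u,v} \log(1 - p_{uv}) \approx -\left(\sum_{i,j} \theta_{ij}\right)^k - \frac{1}{2}\left(\sum_{i,j} \theta_{ij}^2\right)^k$,
and is then corrected by subtracting the “no-edge” likelihood and
adding the “edge” likelihood for each edge present in G:
$l(\Theta) = l_e(\Theta) + \sum_{(u,v) \in G} \left[ -\log\left(1 - \mathcal{P}[\sigma_u, \sigma_v]\right) + \log \mathcal{P}[\sigma_u, \sigma_v] \right]$.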
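The empty-graph approximation follows from log(1 - x) ≈ -x - x²/2 together with the fact that summing all entries of a Kronecker power gives a power of the initiator sum. The sketch below compares the exact sum with that closed form and then applies the per-edge correction. Initiator values, the toy edge set, and function names are assumptions; node indices are 0-based.

```python
import math


def edge_probability(theta, k, u, v):
    """Entry [u, v] of the k-th Kronecker power of theta (0-based indices)."""
    n1 = len(theta)
    p = 1.0
    for _ in range(k):
        p *= theta[u % n1][v % n1]
        u //= n1
        v //= n1
    return p


def empty_loglik_exact(theta, k):
    """Sum of log(1 - p_uv) over all N^2 pairs (quadratic in N)."""
    n = len(theta) ** k
    return sum(
        math.log(1.0 - edge_probability(theta, k, u, v))
        for u in range(n)
        for v in range(n)
    )


def empty_loglik_taylor(theta, k):
    """Second-order Taylor approximation, computed from the initiator only."""
    s1 = sum(t for row in theta for t in row)
    s2 = sum(t * t for row in theta for t in row)
    return -(s1 ** k) - 0.5 * (s2 ** k)


def approx_log_likelihood(edges, theta, k, sigma):
    """Empty-graph term plus a correction for each edge present in G."""
    total = empty_loglik_taylor(theta, k)
    for u, v in edges:
        p = edge_probability(theta, k, sigma[u], sigma[v])
        total += -math.log(1.0 - p) + math.log(p)
    return total


theta = [[0.9, 0.5],
         [0.5, 0.1]]  # hypothetical initiator
k = 6
exact = empty_loglik_exact(theta, k)
approx = empty_loglik_taylor(theta, k)
relative_error = abs(exact - approx) / abs(exact)

edges = {(0, 1), (1, 2), (2, 3)}  # toy edge set on the 64-node graph
ll = approx_log_likelihood(edges, theta, k, list(range(2 ** k)))
```

The approximate likelihood touches only the initiator and the edges of G, so its cost grows with the number of edges rather than with the square of the number of nodes.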
Experiments on Real and Synthetic Data