Diversity of Graphs with Highly Variable Connectivity*

David Alderson
Operations Research Department
Naval Postgraduate School
*Joint work with Lun Li (Caltech)
Acknowledgments: John Doyle, Walter Willinger, Daniel Whitney
IPAM Workshop: Random and Dynamic Graphs and Networks
May 8, 2007
Random and Dynamic Graphs and Networks
objective: characterize the structure and behavior of a large,
complex network
approach: focus on graph theoretic properties
• measure the connectivity statistics of real networks
• develop generative models to explain what is observed
• consider dynamics
– dynamics of the network: changes to the network itself
– dynamics on the network: separate dynamical
processes constrained by a given network structure
implicit assumption: graph theoretic properties adequately
capture key system features in order to serve as a basis for
comparison and contrast
What can go wrong?
Potential pitfall #1: attempting to use a simple graph to represent a
complex system involving
– heterogeneous components
– layered architectures
– feedback dynamics
Possible result: modeling artifacts lead to misinterpretation and
misrepresentation of what “matters” for system function
References:
• The “robust yet fragile” nature of Internet topology
[Doyle et al, PNAS 102, 14497 (2005)]
• Cellular metabolism [Tanaka, Phys Rev Lett 94, 168101 (2005)]
this will not be the focus of this talk 
What can go wrong?
Potential pitfall #2: ignoring the fact that many different
processes for network formation can give rise to the same
structural properties
Equivalently: assuming that the ability to reproduce an observed
structural property of a graph is evidence that a particular
mechanism “explains” the presence of that property
Example:
preferential attachment → power laws
Reference: Li, Alderson, Doyle, Willinger. Toward a Theory of
Scale-Free Networks: Definition, Properties, and Implications.
Internet Mathematics 2(4), 2006.
this will not be the focus of this talk 
What can go wrong?
Potential pitfall #3: assuming that a particular statistical
description is sufficient to characterize graph structure
Equivalently: failing to recognize that there can be great
diversity even among graphs having the same statistics
Example: degree distributions, particularly when heavy-tailed
Ref: D. Alderson and L. Li. Diversity of graphs with highly
variable connectivity. Phys Rev E 75, 046102 (2007)
this will be the focus of this talk 
basic notation
graphs with degree sequence D
• restriction to a particular D: popular for graph generation
• Configuration Model (CM) as a null hypothesis
– it yields graphs that are maximally random (in the
sense of maximum entropy)
• selected references
– Bender and Canfield (1978)
– Molloy and Reed (1995)
– Aiello et al (2000)
– Newman (2002)
– Chung and Lu (2003)
• Here, we always restrict attention to a particular D
degree sequences and correlation
general recognition: degree sequence of a graph provides
only a simplistic characterization of its properties
recently: consider more sophisticated descriptions of
network connectivity, with emphasis on correlation
– simple notions of network clustering (i.e.,
connectivity correlations between vertex triplets)
– more general degree-degree correlations (also called
the joint degree distribution or JDD)
– spectral methods
→ a growing literature on the importance of correlation
structure in networks
08 May 2007
IPAM Workshop: Random and Dynamic Graphs and Networks
9
correlation (Pearson) coefficient
• graph assortativity: how likely will a vertex connect to
another having similar degree?
• the correlation coefficient summarizes the joint distribution
P(k,k') that a randomly selected link in the network will
connect vertices having degree values k and k'
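A minimal sketch of how such a coefficient can be computed, assuming r(g) is the standard Pearson correlation between the degrees found at the two ends of a link (the slide's own formula did not survive extraction). The function name and the example graphs are illustrative only.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every link in both directions so the statistic is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    m = len(xs)
    mean = sum(xs) / m
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        return float("nan")  # undefined when all end degrees are equal
    return cov / var


star = [(0, i) for i in range(1, 6)]   # one hub, five leaves
path = [(i, i + 1) for i in range(5)]  # six vertices in a chain
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
```

In the star every link joins the hub to a leaf, so the coefficient is at its most negative; the short chain is only mildly disassortative.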
correlation (Pearson) coefficient
• a summary statistic for the graph’s correlation profile
– consistently positive for some kinds of networks
– consistently negative for others
• often cited as a key feature distinguishing various
classes of complex networks
• several explanations have been offered
– Maslov and Sneppen (2002)
– Newman and Park (2003)
• evidence suggesting that correlation coefficient is
constrained by the degree sequence of the graph
Basic Questions
1. How does the degree sequence of a graph dictate
connectivity features, including correlation structure?
2. What kind of diversity exists among graphs having the
same degree sequence?
3. What are the implications for the use of degree-based
graph generation techniques as models of real systems?
• Can the graph theoretic properties of networks from
different application contexts be directly compared?
• How should one interpret the graph theoretic properties
of a network when studied in isolation?
a structural metric
Implicitly, s(g) measures the extent to which the graph g
has a “hub-like” core and is maximized when high-degree
vertices are connected to other high-degree vertices.
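A small sketch of the metric, assuming the definition used in the cited Li, Alderson, Doyle, Willinger paper: s(g) is the sum, over all links (i, j), of the product of the endpoint degrees d_i d_j. The slide's formula was lost in extraction, so treat this as a reading of that reference; the two example trees are made up for illustration.

```python
def s_metric(edges):
    """Sum over links of the product of the two endpoint degrees."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree[u] * degree[v] for u, v in edges)


def degree_sequence(edges):
    """Vertex degrees in decreasing order."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sorted(degree.values(), reverse=True)


# Two trees with the same degree sequence [3, 2, 2, 1, 1, 1].
tree_a = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]  # both degree-2 vertices touch the hub
tree_b = [(0, 1), (0, 3), (0, 4), (1, 2), (2, 5)]  # degree-2 vertices form a tail
s_a, s_b = s_metric(tree_a), s_metric(tree_b)
```

Even on six vertices the two trees share D but differ in s(g), which is the kind of diversity the talk is about.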
s-metric: extreme points
the restricted space G(D)
properties of s(g) and smax
• s(g) easily computed for any graph g
• depends only on the structural features of g, not how it
was generated
In G(D):
• high degree nodes in the smax graph have high centrality
(a monotonic relationship in trees)
• smax graphs are self-similar under appropriately defined
operations of trimming and coarse graining
• the smax graph has the highest likelihood of being
generated by the Generalized Random Graph (GRG)
model [Chung and Lu 2003]
measuring graph diversity
• We will use s-values to measure diversity among graphs
having the same degree sequence D.
• The difference smax – smin provides a simple bound on the
possible diversity
equivalent
in practice
How different are the smin and smax values?
Answer: it depends on the variability in D itself.
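One way to quantify "variability in D" is the coefficient of variation CV(D). The sketch below assumes CV(D) is the standard deviation of the degrees divided by their mean, using the population standard deviation; the normalization used in the talk is not stated here, so that choice is an assumption.

```python
from statistics import mean, pstdev


def cv(degrees):
    """Coefficient of variation of a degree sequence (population form)."""
    return pstdev(degrees) / mean(degrees)


n = 100
chain_degrees = [1, 1] + [2] * (n - 2)     # two end vertices, n - 2 interior
star_degrees = [n - 1] + [1] * (n - 1)     # one hub, n - 1 leaves
cv_chain = cv(chain_degrees)
cv_star = cv(star_degrees)
```

For trees on 100 vertices the chain sits near the low end of variability and the star near the high end, so the two bracket the other degree sequences.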
reference graphs: chains and stars
[Figure: a chain and a star]
reference graphs: exponential and scaling
variability in degree sequence
[Figure: reference graphs compared by the variability of their degree sequences, labeled “high” and “low”]
a numerical experiment
Graph generation via preferential attachment:
Given a choice for n and p, a single experiment yields:
• A connected tree with unspecified degree sequence D
• Given D: solve analytically for smin and smax within the “unconstrained” space of graphs having D
• Given D: compute smax in G(D) (simple, connected graphs) via deterministic algorithm
• Given D: compute smin in G(D) heuristically
Repeating this experiment for different values of p yields
a systematic means for generating different degree sequences
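A hedged sketch of the analytic step, assuming the larger background space allows self-loops and multiple links. Under that assumption a graph is a perfect matching of link ends ("stubs"), each carrying the degree of its vertex, and the extreme s-values follow from the rearrangement inequality. The function names are illustrative, and the resulting pairings need not be simple or connected.

```python
def stub_labels(degrees):
    """One stub per link end; each stub carries the degree of its vertex."""
    return sorted((d for d in degrees for _ in range(d)), reverse=True)


def s_bounds_unconstrained(degrees):
    """Return (s_min, s_max) over all pairings of stubs, loops allowed."""
    stubs = stub_labels(degrees)
    if len(stubs) % 2:
        raise ValueError("degree sum must be even")
    half = len(stubs) // 2
    # Upper bound: pair stubs in sorted order (largest with next largest).
    s_max = sum(stubs[2 * i] * stubs[2 * i + 1] for i in range(half))
    # Lower bound: pair the largest remaining stub with the smallest.
    s_min = sum(stubs[i] * stubs[-1 - i] for i in range(half))
    return s_min, s_max


bounds = s_bounds_unconstrained([3, 2, 2, 1, 1, 1])
```

These are bounds only; the values attainable by simple, connected graphs with the same D lie inside this interval and may not reach either end.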
a numerical experiment
Graph generation via preferential attachment:
Note: one can obtain the reference graphs from different p
• p → -∞ yields Dchain
• p = 0 yields Dexp
• p = 1 yields Dscaling
• p → ∞ yields Dstar
We are more interested in the degree sequences D than
the values of p that generated them.
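A minimal sketch of such a generator, assuming each new vertex attaches to an existing vertex with probability proportional to (degree)^p. The attachment formula on the slide was lost in extraction; this form is chosen because it reproduces the limits listed above. The function name, the two-vertex seed graph, and the use of large finite exponents to stand in for the limits are all illustrative choices.

```python
import random


def attachment_tree(n, p, seed=None):
    """Grow a tree on n vertices; attachment weight is degree ** p."""
    rng = random.Random(seed)
    degree = [1, 1]          # start from a single link between vertices 0 and 1
    edges = [(0, 1)]
    for new in range(2, n):
        weights = [d ** p for d in degree]
        target = rng.choices(range(new), weights=weights)[0]
        edges.append((target, new))
        degree[target] += 1
        degree.append(1)
    return edges, sorted(degree, reverse=True)


edges, degrees = attachment_tree(100, 1.0, seed=0)      # linear attachment
_, near_star = attachment_tree(100, 50, seed=1)         # stands in for p -> infinity
_, near_chain = attachment_tree(100, -50, seed=1)       # stands in for p -> -infinity
```

Sweeping p between these extremes gives degree sequences whose variability ranges from chain-like to star-like.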
numerical results: trees of size n=100
[Figure: smax and smin in G(D) versus CV(D); s-values on a logarithmic axis from 10^3 to 10^5, CV(D) from 0 to 5]
measuring diversity with s(g)
• Raw values of s(g) may not be informative
• Consider normalized versions of s(g)
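The normalizing formulas on this slide did not survive extraction. The sketch below shows two natural candidates, both assumptions: the ratio s/smax, and the position of s within the range [smin, smax]. The numeric inputs are the values reported for the uniform attachment example in this talk.

```python
def normalized_s(s, s_min, s_max):
    """Return (s / s_max, (s - s_min) / (s_max - s_min))."""
    ratio = s / s_max
    position = (s - s_min) / (s_max - s_min)
    return ratio, position


# Uniform attachment example: original graph s = 765, smin = 572, smax = 843.
ratio, position = normalized_s(765, 572, 843)
```

With these inputs the two values round to 0.91 and 0.71, which agrees with the s/smax and S figures quoted for that example, though that agreement alone does not confirm how S is defined.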
numerical results: trees of size n=100
[Figure, two panels: smax and smin in G(D) versus CV(D) (logarithmic axis, 10^3 to 10^5); “normalized” values s/smax in G(D) versus CV(D) (0.4 to 1); CV(D) from 0 to 5]
assortativity revisited
a perfect zero assortativity “graph”
Pearson coefficient revisited
numerical results: trees of size n=100
[Figure, two panels: smax and smin in G(D) versus CV(D) (logarithmic axis, 10^3 to 10^5); s/smax for the smax and smin graphs versus CV(D) (0.4 to 1); CV(D) from 0 to 5]
numerical results: trees of size n=100
[Figure: rmax and rmin versus CV(D); r from -1 to 0.6, CV(D) from 0 to 5]
numerical results: trees of size n=100
[Figure, three panels: smax and smin in G(D) versus CV(D) (logarithmic axis, 10^3 to 10^5); rmax and rmin versus CV(D) (-1 to 0.6); s/smax for the smax and smin graphs versus CV(D) (0.4 to 1); CV(D) from 0 to 5]
Pearson coefficient and background sets
The implicit use of the “unconstrained” space as the background set for r(g) means:
• For degree sequences D with high CV(D), r(g) is always
negative and tends to hide differences in s(g)
• For degree sequences D with low CV(D), r(g) is very
sensitive to small structural changes and tends to
exaggerate differences in s(g)
[Figure: schematic comparing the ranges of CV(D), s/smax, and r]
four graphs with the same D
[Figure: four graphs having the same degree sequence D, with a log-log plot of node rank versus node degree]
smax = 77350, rmax = -0.4243
• s = 29876, s/smax = 0.3862, S = 0.022, r = -0.4815
• s = 33959, s/smax = 0.4390, S = 0.106, r = -0.4766
• s = 60271, s/smax = 0.7792, S = 0.648, r = -0.4449
• s = 74010, s/smax = 0.956, S = 0.931, r = -0.4283
Source: Doyle et al, PNAS (2005)
[Figure: the same four graphs, labeled “HOTnet”, “poor design”, “HSFnet”, and “random”, with the same s, s/smax, S, and r values]
Recap
• considerable diversity among graphs having same D
• sequence D constrains the possible values of s(g)
– variability in D itself
• background sets: implications for interpretation
– r(g) as a normalization of s(g) in the “unconstrained” space
– structural differences can be hidden or exaggerated
Questions
• How does a “random” graph compare against smin and
smax values?
• Implications for use of random graphs as a basis for
comparison?
numerical experiment revisited
• For a given attachment exponent p, generate a tree
having n = 100 nodes (with corresponding D)
• For the resulting degree sequence D:
– Solve analytically for smin and smax within the “unconstrained” space
– Compute smax in G(D) via deterministic algorithm
– Compute smin in G(D) heuristically
• Generate an ensemble of “random” graphs having D
– degree preserving rewiring in G(D) (simple, connected graphs)
– degree preserving rewiring in the “unconstrained” space
– configuration method (CM), implicitly in the “unconstrained” space
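A sketch of degree-preserving rewiring by repeated double-edge swaps, a standard technique that may differ in detail from what was used in the talk. The `simple` flag is a hypothetical switch: when set, swaps that would create a self-loop or a repeated link are rejected; when cleared, they are accepted. Connectivity is not checked here, so staying within simple, connected graphs would need an extra test after each swap.

```python
import random


def degree_sequence(edges):
    """Vertex degrees in decreasing order (a self-loop counts twice)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sorted(degree.values(), reverse=True)


def rewire(edges, n_swaps, seed=None, simple=True):
    """Apply up to n_swaps double-edge swaps; every vertex keeps its degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible re-pairings
        new1, new2 = (a, d), (c, b)
        if simple:
            if a == d or c == b:
                continue  # would create a self-loop
            if frozenset(new1) in present or frozenset(new2) in present:
                continue  # would create a repeated link
            present.discard(frozenset(edges[i]))
            present.discard(frozenset(edges[j]))
            present.add(frozenset(new1))
            present.add(frozenset(new2))
        edges[i], edges[j] = new1, new2
        done += 1
    return edges


tree = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (5, 6), (6, 7)]
rewired = rewire(tree, 20, seed=0)
loose = rewire(tree, 20, seed=0, simple=False)
```

Computing s(g) or r(g) on many such rewired graphs gives an empirical distribution to compare against the smin and smax values.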
uniform attachment (p = 0)
[Figure (a): the degree sequence D (vertex rank versus vertex degree, log-log); the smax graph in G(D), the original graph, and the smin graph in G(D); cumulative distributions of r-values and s-values for graphs having degree sequence D]
CV(D) = 0.6380
• the original graph: sorig = 765, s/smax = 0.91, S = 0.71, rorig = 0.01
• the smin graph in G(D): s = 572, s/smax = 0.68, S = 0.04, r = -0.82
• the smax graph in G(D): smax = 843, s/smax = 1, S = 1, rmax = 0.34
linear preferential attachment (p = 1)
[Figure (b): the degree sequence D (vertex rank versus vertex degree, log-log); the smax graph in G(D), the original graph, and the smin graph in G(D); cumulative distributions of r-values and s-values for graphs having degree sequence D]
CV(D) = 1.4121
• the original graph: sorig = 1894, s/smax = 0.71, S = 0.50, rorig = -0.31
• the smin graph in G(D): s = 1182, s/smax = 0.44, S = 0.03, r = -0.45
• the smax graph in G(D): smax = 2659, s/smax = 1, S = 1, rmax = -0.16
superlinear attachment (p > 1)
[Figure (c): the degree sequence D (vertex rank versus vertex degree, log-log); the smax graph in G(D), the original graph, and the smin graph in G(D); cumulative distributions of r-values and s-values (s-axis in units of 10^4) for graphs having degree sequence D]
CV(D) = 2.5104
• the original graph: sorig = 4623, s/smax = 0.90, S = 0.78, rorig = -0.44
• the smin graph in G(D): s = 2844, s/smax = 0.55, S = 0.03, r = -0.49
• the smax graph in G(D): smax = 5131, s/smax = 1, S = 1, rmax = -0.43
very unlikely that a “random” graph will be in G(D)
observations
1. For each D, there is considerable diversity
• smin is very “chain like”
• smax is very “star like”
2. Range for the “unconstrained” space is greater than for G(D), and this
increases with variability in D
3. Assortativity r(g) hides some of these differences, while
s(g) highlights them
4. Generating an ensemble of graphs using random
rewiring is unlikely to obtain the smin and smax values
5. Good correspondence between random rewiring in
the “unconstrained” space and CM, with values largely centered on r(g)=0
6. The distribution of graphs in the “unconstrained” space is consistently
shifted toward larger s-values than those in G(D)
numerical experiment: non-trees
• For a given attachment exponent p, generate a tree
having n = 100 nodes (with corresponding D)
– initial graph: n nodes, n-1 links
– add an additional k(n-1) links using the same attachment rule Π(k)
• For the resulting degree sequence D:
– Solve analytically for smin and smax within the “unconstrained” space
– Compute smax in G(D) via deterministic algorithm
– Compute smin in G(D) heuristically
– Compute rmin and rmax in G(D) accordingly
• Repeat many times
Takeaway Message #1
Considerable diversity exists among graphs having the
same degree sequence.
Open question: To what extent does a similar story hold for
higher order descriptions, including correlation structure?
Takeaway Message #2
Graphs that arise from different contexts may not be
directly comparable using structural metrics unless
defined in terms of an appropriate and consistent
background set.
The differences between the “unconstrained” space of graphs having D
and the space of simple, connected graphs G(D) may be
more important in determining graph properties than other
features as measured by aggregate statistics.
Takeaway Message #3
While it is clear that the evaluation of a graph based on its
structural properties may be appropriate only in relation to
the corresponding background set, understanding the
implication of those structural features (e.g., in terms of
function) remains an open question.
For example, it remains unclear what, if anything, the
relative placement of a graph within the range [smin , smax]
actually says about the graph itself.
selected references
• D. Alderson and L. Li. Diversity of Graphs with Highly Variable Connectivity. Phys Rev E 75,
046102 (2007)
• D. Alderson, H. Chang, M. Roughan, S. Uhlig, and W. Willinger. The Many Facets of
Internet Topology and Traffic. AIMS Journal on Networks and Heterogeneous Media,
1(4), Dec. 2006.
• L. Li, D. Alderson, J.C. Doyle, W. Willinger. Toward a Theory of Scale-Free Networks:
Definition, Properties, and Implications. Internet Mathematics 2(4), 2006.
• D. Alderson, L. Li, W. Willinger, J.C. Doyle. Understanding Internet Topology:
Principles, Models, and Validation. IEEE/ACM Trans. on Networking. 13(6): Dec 2005.
• J.C. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W.
Willinger. The "robust yet fragile" nature of the Internet. PNAS. October 4, 2005.
• D. Alderson and W. Willinger. A contrasting look at self-organization in the Internet
and next-generation communication networks. IEEE Comm. Magazine. July 2005.
• L. Li, D. Alderson, W. Willinger, and J. Doyle, A first-principles approach to
understanding the Internet’s router-level topology, Proc. ACM SIGCOMM 2004.
• D. Alderson, J. Doyle, R. Govindan, and W. Willinger. Toward an Optimization-Driven
Framework for Designing and Generating Realistic Internet Topologies. In ACM
SIGCOMM Computer Communications Review, January 2003.