Diversity of Graphs with Highly Variable Connectivity*
David Alderson
Operations Research Department, Naval Postgraduate School
*Joint work with Lun Li (Caltech)
Acknowledgments: John Doyle, Walter Willinger, Daniel Whitney
IPAM Workshop: Random and Dynamic Graphs and Networks, May 8, 2007

Random and Dynamic Graphs and Networks
objective: characterize the structure and behavior of a large, complex network
approach: focus on graph theoretic properties
• measure the connectivity statistics of real networks
• develop generative models to explain what is observed
• consider dynamics
– dynamics of the network: changes to the network itself
– dynamics on the network: separate dynamical processes constrained by a given network structure
implicit assumption: graph theoretic properties adequately capture key system features, so that they can serve as a basis for comparison and contrast

What can go wrong?
Potential pitfall #1: attempting to use a simple graph to represent a complex system involving
– heterogeneous components
– layered architectures
– feedback dynamics
Possible result: modeling artifacts lead to misinterpretation and misrepresentation of what “matters” for system function
References:
• The “robust yet fragile” nature of Internet topology [Doyle et al, PNAS 102, 14497 (2005)]
• Cellular metabolism [Tanaka, Phys Rev Lett 94, 168101 (2005)]
(this will not be the focus of this talk)

What can go wrong?
Potential pitfall #2: ignoring the fact that many different processes of network formation can give rise to the same structural properties
Equivalently: assuming that the ability to reproduce an observed structural property of a graph is evidence that a particular mechanism “explains” the presence of that property
Example: preferential attachment and power laws
Reference: Li, Alderson, Doyle, Willinger. Toward a Theory of Scale-Free Networks: Definition, Properties, and Implications. Internet Mathematics 2(4), 2006.
(this will not be the focus of this talk)

What can go wrong?
Potential pitfall #3: assuming that a particular statistical description is sufficient to characterize graph structure
Equivalently: failing to recognize that there can be great diversity even among graphs having the same statistics
Example: degree distributions, particularly when heavy-tailed
Reference: D. Alderson and L. Li. Diversity of graphs with highly variable connectivity. Phys Rev E 75, 046102 (2007)
(this will be the focus of this talk)

basic notation

graphs with degree sequence D
(notation: G(D) is the set of simple, connected graphs having degree sequence D; Ḡ(D) is the unconstrained set, which also admits self-loops, multiple links, and disconnected graphs)
• restriction to a particular D: popular for graph generation
• Configuration Model (CM) as a null hypothesis
– it yields graphs that are maximally random (in the sense of maximum entropy)
• selected references
– Bender and Canfield (1978)
– Molloy and Reed (1995)
– Aiello et al (2000)
– Newman (2002)
– Chung and Lu (2003)
• Here, we always restrict attention to a particular D

degree sequences and correlation
general recognition: the degree sequence of a graph provides only a simplistic characterization of its properties
recently: more sophisticated descriptions of network connectivity, with emphasis on correlation
– simple notions of network clustering (i.e., connectivity correlations between vertex triplets)
– more general degree-degree correlations (also called the joint degree distribution, or JDD)
– spectral methods
a growing literature on the importance of correlation structure in networks
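The Configuration Model named above is commonly realized by "stub matching": each vertex v contributes d_v stubs, and the stubs are paired uniformly at random. The sketch below assumes that construction (the helper names are ours, not the talk's). Because stub matching can produce self-loops, multiple links, or disconnected output, it also checks whether a sample is simple and connected. For D = (3, 1, 1, 1), only 6 of the 15 possible stub pairings give the star, so roughly 40% of samples are simple and connected.

```python
import random
from collections import deque


def configuration_model(degree_seq, rng):
    """Stub matching: pair the degree 'stubs' uniformly at random.
    The result may contain self-loops and multiple links."""
    stubs = [v for v, d in enumerate(degree_seq) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("sum of degrees must be even")
    rng.shuffle(stubs)  # a uniform shuffle, then consecutive pairs = uniform matching
    return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]


def is_simple_connected(edges, n):
    """True if there are no self-loops or repeated links and all n vertices are reachable."""
    seen = set()
    adj = [[] for _ in range(n)]
    for i, j in edges:
        key = (min(i, j), max(i, j))
        if i == j or key in seen:
            return False
        seen.add(key)
        adj[i].append(j)
        adj[j].append(i)
    reached = {0}
    queue = deque([0])
    while queue:  # breadth-first search from vertex 0
        u = queue.popleft()
        for w in adj[u]:
            if w not in reached:
                reached.add(w)
                queue.append(w)
    return len(reached) == n


rng = random.Random(1)
star_D = [3, 1, 1, 1]
trials = 20000
hits = sum(is_simple_connected(configuration_model(star_D, rng), 4) for _ in range(trials))
print(hits / trials)  # close to 6/15 = 0.4 for this D
```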
correlation (Pearson) coefficient
• graph assortativity: how likely is a vertex to connect to another vertex having similar degree?
• the correlation coefficient summarizes the joint distribution P(k,k') that a randomly selected link in the network connects vertices having degree values k and k'

correlation (Pearson) coefficient
• a summary statistic for the graph’s correlation profile
– consistently positive for some kinds of networks
– consistently negative for others
• often cited as a key feature distinguishing various classes of complex networks
• several explanations have been offered
– Maslov and Sneppen (2002)
– Newman and Park (2003)
• evidence suggests that the correlation coefficient is constrained by the degree sequence of the graph

Basic Questions
1. How does the degree sequence of a graph dictate connectivity features, including correlation structure?
2. What kind of diversity exists among graphs having the same degree sequence?
3. What are the implications for the use of degree-based graph generation techniques as models of real systems?
• Can the graph theoretic properties of networks from different application contexts be directly compared?
• How should one interpret the graph theoretic properties of a network when studied in isolation?

a structural metric
s(g) = Σ d_i d_j, summed over all links (i, j) of g, where d_i is the degree of vertex i
Implicitly, s(g) measures the extent to which the graph g has a “hub-like” core; it is maximized when high-degree vertices are connected to other high-degree vertices.
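A minimal sketch of the two quantities above, assuming an undirected simple graph given as an edge list. The s-metric follows the definition s(g) = Σ d_i d_j over links; for the correlation coefficient we use Newman's standard edge-based form of the degree Pearson coefficient (our choice, since the slide's own formula did not survive extraction). A chain and a star on five vertices serve as reference cases.

```python
from collections import Counter


def degrees(edges):
    """Degree of each vertex, counted from an undirected edge list."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def s_metric(edges):
    """s(g): sum over links (i, j) of d_i * d_j."""
    deg = degrees(edges)
    return sum(deg[i] * deg[j] for i, j in edges)


def pearson_r(edges):
    """Degree correlation coefficient in Newman's edge-based form.
    Undefined (division by zero) when every vertex has the same degree."""
    deg = degrees(edges)
    l = len(edges)
    mean_prod = sum(deg[i] * deg[j] for i, j in edges) / l
    mean_half_sum = sum(0.5 * (deg[i] + deg[j]) for i, j in edges) / l
    mean_half_sq = sum(0.5 * (deg[i] ** 2 + deg[j] ** 2) for i, j in edges) / l
    return (mean_prod - mean_half_sum ** 2) / (mean_half_sq - mean_half_sum ** 2)


chain = [(k, k + 1) for k in range(4)]  # 0-1-2-3-4
star = [(0, k) for k in range(1, 5)]    # hub 0 with four leaves
print(s_metric(chain), s_metric(star))  # 12 16
print(pearson_r(chain), pearson_r(star))  # about -0.333 and -1.0
```

For a chain on n vertices s = 4n - 8, and for a star s = (n - 1)^2, so even at n = 5 the star, with its single hub, scores higher.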
s-metric: extreme points

the restricted space G(D)
(G(D): simple, connected graphs having degree sequence D, a subset of the unconstrained space Ḡ(D), which also admits self-loops, multiple links, and disconnected graphs)

properties of s(g) and smax
• s(g) is easily computed for any graph g
• it depends only on the structural features of g, not on how g was generated
In G(D):
• high-degree nodes in the smax graph have high centrality (a monotonic relationship in trees)
• smax graphs are self-similar under appropriately defined operations of trimming and coarse graining
• the smax graph has the highest likelihood of being generated by the Generalized Random Graph (GRG) model [Chung and Lu 2003]

measuring graph diversity
• We use s-values to measure diversity among graphs having the same degree sequence D.
• The difference smax - smin provides a simple bound on the possible diversity [slide annotation: “equivalent in practice”]

How different are the smin and smax values?
Answer: it depends on the variability in D itself.
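To make smin and smax concrete, here is a brute-force sketch (our own illustration, feasible only for very small n) that enumerates every simple, connected labeled graph in which vertex v has degree D[v] and reports the range of s(g). For D = (3, 2, 2, 1, 1, 1), the sum of degrees forces 5 links on 6 vertices, so every member of G(D) is a tree. There are 12 such labeled trees, and s(g) takes only the values 18 and 19.

```python
from itertools import combinations


def _is_connected(n, edges):
    """Union-find connectivity check on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in range(n)}) == 1


def graphs_with_degree_sequence(D):
    """Yield every simple, connected labeled graph whose degrees equal D (exhaustive search)."""
    n, m = len(D), sum(D) // 2
    pairs = list(combinations(range(n), 2))  # simple graph: no loops, no repeats
    for edges in combinations(pairs, m):
        deg = [0] * n
        for i, j in edges:
            deg[i] += 1
            deg[j] += 1
        if deg == list(D) and _is_connected(n, edges):
            yield edges


def s_metric(edges, D):
    return sum(D[i] * D[j] for i, j in edges)


D = [3, 2, 2, 1, 1, 1]
s_values = [s_metric(g, D) for g in graphs_with_degree_sequence(D)]
print(len(s_values), min(s_values), max(s_values))  # 12 18 19
```

The smax trees place the degree-3 vertex between the two degree-2 vertices, while the smin trees string the three non-leaf vertices into a path with a degree-2 vertex in the middle.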
reference graphs: chains and stars
[figure: a chain; a star]

reference graphs: exponential and scaling
[figure]

variability in degree sequence
[figure: degree sequences ranging from high to low variability]

a numerical experiment
Graph generation via preferential attachment. Given a choice for n and p, a single experiment yields:
• a connected tree whose degree sequence D is not specified in advance
• given D: solve analytically for smin and smax within the unconstrained space Ḡ(D)
• given D: compute smax in G(D) via a deterministic algorithm
• given D: compute smin in G(D) heuristically
Repeating this experiment for different values of p provides a systematic means of generating different degree sequences.

a numerical experiment (continued)
Note: one can obtain the reference graphs from different values of p
• p → -∞ yields Dchain
• p = 0 yields Dexp
• p = 1 yields Dscaling
• p → ∞ yields Dstar
We are more interested in the degree sequences D than in the values of p that generated them.
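A sketch of the tree generator, under two assumptions not recoverable from the slides: a new vertex attaches to existing vertex i with probability proportional to d_i^p, and growth starts from a single link. Under that rule, p = 0 is uniform attachment and p = 1 is linear preferential attachment; large negative p favors leaves (chain-like growth) and large positive p favors the current hub (star-like growth). CV(D) is computed here as the population standard deviation of D divided by its mean, which is also an assumption.

```python
import random
import statistics


def attachment_tree(n, p, rng):
    """Grow a tree on n vertices. Each new vertex links to one existing vertex i
    chosen with probability proportional to d_i ** p (assumed attachment rule).
    For very large |p| the float weights can overflow or underflow."""
    deg = [1, 1]       # start from a single link 0-1
    edges = [(0, 1)]
    for new in range(2, n):
        weights = [d ** p for d in deg]
        target = rng.choices(range(new), weights=weights)[0]
        edges.append((target, new))
        deg[target] += 1
        deg.append(1)
    return edges, deg


def cv(D):
    """Coefficient of variation of a degree sequence: std / mean (population std)."""
    return statistics.pstdev(D) / statistics.mean(D)


rng = random.Random(0)
for p in (-5.0, 0.0, 1.0, 2.0):
    _, D = attachment_tree(100, p, rng)
    print(p, round(cv(D), 3))
```

As a check on the reference graphs: a 5-vertex star, D = (4, 1, 1, 1, 1), has CV(D) = 0.75, while a 5-vertex chain, D = (1, 2, 2, 2, 1), has CV(D) ≈ 0.306.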
numerical results: trees of size n = 100
[figure: smax and smin in Ḡ(D) and in G(D) versus CV(D), the coefficient of variation of D (0 to 5); log-scale s-axis from 10^3 to 10^5]

measuring diversity with s(g)
• Raw values of s(g) may not be informative
• Consider normalized versions of s(g)

numerical results: trees of size n = 100
[figure: as above, plus “normalized” values s/smax in G(D) for the smax and smin graphs versus CV(D); s/smax axis from 0.4 to 1]

assortativity revisited
[figure: s(g) ???]
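The normalization formulas on these slides did not survive extraction. The block below writes out two natural normalizations of s(g), with S(g) taken as an assumption; it is consistent with the s, smax, and S values reported later for the four graphs with the same D. It then gives a derivation of our own from Newman's standard edge-based formula, for an undirected graph with l links and degrees d_v, showing that r(g) is itself a rescaling of s(g).

```latex
% normalized s-metric (assumed forms)
\frac{s(g)}{s_{\max}}, \qquad
S(g) = \frac{s(g) - s_{\min}}{s_{\max} - s_{\min}}

% Newman's edge-based degree correlation coefficient, l = |E|
r(g) = \frac{\frac{1}{l}\sum_{(i,j)\in E} d_i d_j
        - \Big[\frac{1}{l}\sum_{(i,j)\in E} \tfrac{1}{2}(d_i + d_j)\Big]^2}
       {\frac{1}{l}\sum_{(i,j)\in E} \tfrac{1}{2}(d_i^2 + d_j^2)
        - \Big[\frac{1}{l}\sum_{(i,j)\in E} \tfrac{1}{2}(d_i + d_j)\Big]^2}

% each vertex v appears in exactly d_v links, so
\sum_{(i,j)\in E} (d_i + d_j) = \sum_v d_v^2, \qquad
\sum_{(i,j)\in E} (d_i^2 + d_j^2) = \sum_v d_v^3, \qquad
l = \tfrac{1}{2}\sum_v d_v

% multiplying numerator and denominator by l gives
r(g) = \frac{s(g) - s_r}{\bar{s} - s_r}, \qquad
s_r = \frac{\big(\sum_v d_v^2\big)^2}{2\sum_v d_v}, \qquad
\bar{s} = \tfrac{1}{2}\sum_v d_v^3
```

Both s_r and s̄ depend only on D, so r(g) is an affine rescaling of s(g): r = 0 at s(g) = s_r and r = 1 at s(g) = s̄. Since d_i d_j ≤ ½(d_i² + d_j²), s̄ is an upper bound on s(g). It is attained only if every link joins vertices of equal degree, which no connected graph with two or more distinct degrees can satisfy, so r = 1 generally refers to graphs outside G(D). Worked check on the 5-vertex star: s = 16, s_r = 20²/8 = 25, and s̄ = 68/2 = 34, giving r = (16 - 25)/(34 - 25) = -1.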
a perfect zero assortativity “graph”
[figure]

Pearson coefficient revisited
[figure]

numerical results: trees of size n = 100
[figure: smax and smin in Ḡ(D) and in G(D), normalized s/smax, and rmax and rmin, each versus CV(D) (0 to 5); r axis from -1 to 0.6]

Pearson coefficient and background sets
The implicit use of the unconstrained space Ḡ(D) as the background set for r(g) means:
• For degree sequences D with high CV(D), r(g) is always negative and tends to hide differences in s(g)
• For degree sequences D with low CV(D), r(g) is very sensitive to small structural changes and tends to exaggerate differences in s(g)
[figure: schematic of s/smax (0 to 1) and r (-1 to 1) versus CV(D)]

four graphs with the same D
[figure: node rank versus node degree (log-log); smax = 77350, rmax = -0.4243]
• s = 29876, s/smax = 0.3862, S = 0.022, r = -0.4815
• s = 33959, s/smax = 0.4390, S = 0.106, r = -0.4766
• s = 60271, s/smax = 0.7792, S = 0.648, r = -0.4449
• s = 74010, s/smax = 0.956, S = 0.931, r = -0.4283
Source: Doyle et al, PNAS (2005), where the four graphs are labeled “HOTnet”, “poor design”, “HSFnet”, and “random”.

Recap
• considerable diversity among graphs having the same D
• the sequence D constrains the possible values of s(g)
– variability in D itself
• background sets: implications for interpretation
– r(g) as a normalization of s(g) in Ḡ(D)
– structural differences can be hidden or exaggerated
Questions
• How does a “random” graph compare against the smin and smax values?
• What are the implications for the use of random graphs as a basis for comparison?
numerical experiment revisited
• For a given attachment exponent p, generate a tree having n = 100 nodes (with corresponding D)
• For the resulting degree sequence D:
– solve analytically for smin and smax within the unconstrained space Ḡ(D)
– compute smax in G(D) via a deterministic algorithm
– compute smin in G(D) heuristically
• Generate an ensemble of “random” graphs having D:
– degree-preserving rewiring in G(D)
– degree-preserving rewiring in Ḡ(D)
– configuration model (CM), implicitly in Ḡ(D)

[figure panels, each showing: the degree sequence D (vertex rank versus vertex degree); the smax graph in G(D), the original graph, and the smin graph in G(D); and cumulative distributions of graphs having degree sequence D over r-values and over s-values]

(a) uniform attachment (p = 0), CV(D) = 0.6380
• original graph: sorig = 765, s/smax = 0.91, S = 0.71, rorig = 0.01
• smin graph in G(D): s = 572, s/smax = 0.68, S = 0.04, r = -0.82
• smax graph in G(D): smax = 843, s/smax = 1, S = 1, rmax = 0.34

(b) linear preferential attachment (p = 1), CV(D) = 1.4121
• original graph: sorig = 1894, s/smax = 0.71, S = 0.50, rorig = -0.31
• smin graph in G(D): s = 1182, s/smax = 0.44, S = 0.03, r = -0.45
• smax graph in G(D): smax = 2659, s/smax = 1, S = 1, rmax = -0.16

(c) superlinear attachment (p > 1), CV(D) = 2.5104
• original graph: sorig = 4623, s/smax = 0.90, S = 0.78, rorig = -0.44
• smin graph in G(D): s = 2844, s/smax = 0.55, S = 0.03, r = -0.49
• smax graph in G(D): smax = 5131, s/smax = 1, S = 1, rmax = -0.43
• it is very unlikely that a “random” graph will be in G(D)

observations
1. For each D, there is considerable diversity
• the smin graph is very “chain-like”
• the smax graph is very “star-like”
2. The range for Ḡ(D) is greater than for G(D), and this difference increases with the variability in D
3. Assortativity r(g) hides some of these differences, while s(g) highlights them
4. Generating an ensemble of graphs by random rewiring is unlikely to attain the smin and smax values
5. Random rewiring in Ḡ(D) and CM correspond well, with values largely centered on r(g) = 0
6. The distribution of graphs in Ḡ(D) is consistently shifted toward larger s-values than the distribution in G(D)

numerical experiment: non-trees
• For a given attachment exponent p, generate a tree having n = 100 nodes (with corresponding D)
– initial graph: n nodes, n - 1 links
– add an additional k(n - 1) links using the same attachment rule
• For the resulting degree sequence D:
– solve analytically for smin and smax within Ḡ(D)
– compute smax in G(D) via a deterministic algorithm
– compute smin in G(D) heuristically
– compute rmin and rmax in G(D) accordingly
• Repeat many times

Takeaway Message #1
Considerable diversity exists among graphs having the same degree sequence.
Open question: To what extent does a similar story hold for higher-order descriptions, including correlation structure?

Takeaway Message #2
Graphs that arise from different contexts may not be directly comparable using structural metrics unless those metrics are defined in terms of an appropriate and consistent background set. The differences between the “unconstrained” space Ḡ(D) and the space of simple, connected graphs G(D) may be more important in determining graph properties than other features measured by aggregate statistics.

Takeaway Message #3
While it is clear that evaluating a graph by its structural properties may be appropriate only in relation to the corresponding background set, understanding the implications of those structural features (e.g., for function) remains an open question. For example, it remains unclear what, if anything, the relative placement of a graph within the range [smin, smax] actually says about the graph itself.
selected references
• D. Alderson and L. Li. Diversity of Graphs with Highly Variable Connectivity. Phys Rev E 75, 046102 (2007).
• D. Alderson, H. Chang, M. Roughan, S. Uhlig, and W. Willinger. The Many Facets of Internet Topology and Traffic. AIMS Journal on Networks and Heterogeneous Media 1(4), Dec. 2006.
• L. Li, D. Alderson, J.C. Doyle, and W. Willinger. Toward a Theory of Scale-Free Networks: Definition, Properties, and Implications. Internet Mathematics 2(4), 2006.
• D. Alderson, L. Li, W. Willinger, and J.C. Doyle. Understanding Internet Topology: Principles, Models, and Validation. IEEE/ACM Trans. on Networking 13(6), Dec. 2005.
• J.C. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W. Willinger. The "Robust Yet Fragile" Nature of the Internet. PNAS, October 4, 2005.
• D. Alderson and W. Willinger. A Contrasting Look at Self-Organization in the Internet and Next-Generation Communication Networks. IEEE Communications Magazine, July 2005.
• L. Li, D. Alderson, W. Willinger, and J. Doyle. A First-Principles Approach to Understanding the Internet’s Router-Level Topology. Proc. ACM SIGCOMM 2004.
• D. Alderson, J. Doyle, R. Govindan, and W. Willinger. Toward an Optimization-Driven Framework for Designing and Generating Realistic Internet Topologies. ACM SIGCOMM Computer Communication Review, January 2003.