Weighted Graphs and Disconnected Components: Patterns and a Generator
Mary McGlohon, Leman Akoglu, Christos Faloutsos
Carnegie Mellon University, School of Computer Science
McGlohon, Akoglu, Faloutsos KDD08

“Disconnected” components
● In graphs a largest connected component emerges.
● What about the smaller-size components?
● How do they emerge, and join with the large one?

Weighted edges
● Graphs have heavy-tailed degree distribution.
● What can we also say about these edges?
● How are they repeated, or otherwise weighted?

Our goals
● Observe “next-largest connected components” (NLCC’s)
Q1: How does the giant connected component (GCC) emerge?
Q2: How do NLCC’s emerge and join with the GCC?
● Find properties that govern edge weights
Q3: How does the total weight of the graph relate to the number of edges?
Q4: How do the weights of nodes relate to degree?
Q5: Does this relation change over time?
● Q6: Can we produce an emergent, generative model?

Outline
● Motivation
● Related work
● Preliminaries
● Data
● Observations
● Model
● Summary

Properties of networks
● Small diameter (“small world” phenomenon)
– [Milgram 67] [Leskovec, Horvitz 07]
● Heavy-tailed degree distribution
– [Barabasi, Albert 99] [Faloutsos, Faloutsos, Faloutsos 99]
● Densification
– [Leskovec, Kleinberg, Faloutsos 05]
● “Middle region” components as well as GCC and singletons
– [Kumar, Novak, Tomkins 06]

Generative Models
● Erdos-Renyi model [Erdos, Renyi 60]
● Preferential Attachment [Barabasi, Albert 99]
● Forest Fire model [Leskovec, Kleinberg, Faloutsos 05]
● Kronecker multiplication [Leskovec, Chakrabarti, Kleinberg, Faloutsos 07]
● Edge Copying model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal 00]
● “Winners don’t take all” [Pennock, Flake, Lawrence, Glover, Giles 02]
Diameter
● Diameter of a graph is the “longest shortest path”.
● Effective diameter is the smallest distance within which 90% of connected node pairs can reach each other.
[Figure: example graph on nodes n1 to n7, diameter = 3]

Unipartite Networks
● Postnet: Posts in blogs, hyperlinks between posts
● Blognet: Aggregated Postnet, repeated edges
● Patent: Patent citations
● NIPS: Academic citations
● Arxiv: Academic citations
● NetTraffic: Packets, repeated edges
● Autonomous Systems (AS): Packets, repeated edges
[Figure: example graph on nodes n1 to n7, shown unweighted, then with a repeated edge (multiplicity 3), then with edge weights]

Unipartite Networks (Nodes, Edges, Timestamps)
● Postnet: 250K nodes, 218K edges, 80 days
● Blognet: 60K nodes, 125K edges, 80 days
● Patent: 4M nodes, 8M edges, 17 yrs
● NIPS: 2K nodes, 3K edges, 13 yrs
● Arxiv: 30K nodes, 60K edges, 13 yrs
● NetTraffic: 21K nodes, 3M edges, 52 mo
● AS: 12K nodes, 38K edges, 6 mo
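The two diameter definitions from the preliminaries can be made concrete with a small breadth-first-search sketch. This is only an illustration under stated assumptions: the graph is unweighted and undirected, stored as an adjacency dictionary; the effective diameter is taken as an integer hop count with no interpolation; and the example path graph and all function names are hypothetical, not taken from the talk.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pair_distances(adj):
    """Sorted hop counts over all ordered pairs of distinct, connected nodes."""
    dists = []
    for u in adj:
        dists.extend(d for v, d in bfs_distances(adj, u).items() if v != u)
    return sorted(dists)


def diameter(adj):
    """The 'longest shortest path' (assumes the graph has at least one edge)."""
    return pair_distances(adj)[-1]


def effective_diameter(adj, fraction=0.9):
    """Smallest hop count d such that at least `fraction` of the
    connected node pairs are within d hops of each other."""
    dists = pair_distances(adj)
    k = max(1, math.ceil(fraction * len(dists)))
    return dists[k - 1]


# Hypothetical example: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(diameter(path), effective_diameter(path))
```

Running all-pairs BFS costs O(N * (N + E)), so on graphs the size of the datasets listed here one would sample source nodes or use an approximate neighborhood-counting method instead.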
Bipartite Networks
● IMDB: Actor-movie network
● Netflix: User-movie ratings
● DBLP: repeated edges
– Author-Keyword
– Keyword-Conference
– Author-Conference
● US Election Donations: $ weights, repeated edges
– Orgs-Candidates
– Individuals-Orgs
[Figure: example bipartite graph between nodes n1 to n4 and m1 to m3, shown unweighted and then with edge weights]

Bipartite Networks (Nodes, Edges, Timestamps)
● IMDB: 757K nodes, 2M edges, 114 yr
● Netflix: 125K nodes, 14M edges, 72 mo
● DBLP: 25 yr
– Author-Keyword: 27K nodes, 189K edges
– Keyword-Conference: 10K nodes, 23K edges
– Author-Conference: 17K nodes, 22K edges
● US Election Donations: 22 yr
– Orgs-Candidates: 23K nodes, 877K edges
– Individuals-Orgs: 6M nodes, 10M edges

Observation 1: Gelling Point
Q1: How does the GCC emerge?

Observation 1: Gelling Point
● Most real graphs display a gelling point, or burning-off period, marked by a spike in diameter.
● After the gelling point, they exhibit typical behavior.
[Figure: IMDB, diameter vs. time, gelling point at t=1914]

Observation 2: NLCC behavior
Q2: How do NLCC’s emerge and join with the GCC? Do they continue to grow in size? Do they shrink? Stabilize?
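One way to examine Q1 and Q2 on a timestamped edge list is to add edges in time order and record the sizes of the largest few connected components after each addition. The sketch below uses a union-find structure for this. It is not the authors' code: the `UnionFind` class, the `top_component_sizes` function, and the toy edge list are all hypothetical, and edges are treated as undirected.

```python
class UnionFind:
    """Disjoint sets with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def top_component_sizes(timed_edges, top=3):
    """Return [(t, sizes)], where sizes lists the `top` largest component
    sizes after the edge stamped t is added. sizes[0] is the GCC and the
    remaining entries are the next-largest components (NLCC's)."""
    uf = UnionFind()
    history = []
    for t, u, v in sorted(timed_edges, key=lambda e: e[0]):
        uf.union(u, v)
        # Recomputed from scratch each step for clarity, not speed.
        sizes = sorted(
            (uf.size[r] for r, p in uf.parent.items() if r == p),
            reverse=True,
        )
        history.append((t, sizes[:top]))
    return history


# Hypothetical toy stream of (time, node, node) edges.
edges = [(1, "a", "b"), (2, "c", "d"), (3, "e", "f"), (4, "b", "c"), (5, "g", "h")]
history = top_component_sizes(edges)
print(history)
```

In the toy stream, the edge at t=4 merges two small components into the largest one while the remaining components keep their size, which is the kind of behavior the next slide describes for real graphs.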
Observation 2: NLCC behavior
● After the gelling point, the GCC takes off, but NLCC’s remain constant or oscillate.
[Figure: IMDB, connected component size vs. time]

Observation 3
Q3: How does the total weight of the graph relate to the number of edges?

Observation 3: Fortification Effect
● Does total $ grow in proportion to the number of checks?
● Weight additions follow a power law with respect to the number of edges: W(t) ∝ E(t)^w
– W(t): total weight of graph at t
– E(t): total edges of graph at t
– w is the power-law (PL) exponent
– 1.01 < w < 1.5 = super-linear!
– (more checks, even more $)
[Figure: Orgs-Candidates, |$| vs. |Checks|, 1980 to 2004]

Observation 4 and 5
Q4: How do the weights of nodes relate to degree?
Q5: Does this relation change over time?

Observation 4: Snapshot Power Law
● At any time, the total incoming weight of a node follows a power law in its in-degree, with PL exponent iw.
– 1.01 < iw < 1.26, super-linear
– More donors, even more $
– e.g. John Kerry, $10M received, from 1K donors
[Figure: Orgs-Candidates, in-weights ($) vs. edges (# donors)]

Observation 5: Snapshot Power Law
● For a given graph, this exponent is constant over time.
[Figure: Orgs-Candidates, exponent vs. time]

Q6: Is there a generative, “emergent” model?
Goals of model
● a) Emergent, intuitive behavior
● b) Shrinking diameter
● c) Constant NLCC’s
● d) Densification power law
● e) Power-law degree distribution
= “Butterfly” Model

Butterfly model in action
● A node joins a network, with its own parameter pstep (“curiosity”).
● With (global) probability phost (“cross-disciplinarity”), it chooses a random host.
– With (global) probability plink (“friendliness”), it creates a link.
– With probability pstep, it travels to a random neighbor. Repeat.
[Figure: new node n8 joining a graph on nodes n1 to n7, with steps labeled phost, plink, pstep]
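The joining procedure on the “Butterfly model in action” slides can be sketched as a short simulation. This is a reading of the slides under several assumptions, not the authors' implementation: the graph is undirected and seeded with a single node, hosts are drawn uniformly from the nodes already present, a walk stops when the current node has no other neighbor, and repeated links are collapsed. The function name `butterfly_graph` and its arguments are hypothetical; the default values for phost and plink are the ones reported on the validation slide.

```python
import random


def butterfly_graph(n_nodes, p_host=0.3, p_link=0.5, seed=None):
    """Grow an undirected graph with a Butterfly-style joining process.

    Returns an adjacency dict {node: set(neighbors)}.
    """
    rng = random.Random(seed)
    adj = {0: set()}  # assumption: start from one isolated node
    for new in range(1, n_nodes):
        p_step = rng.random()  # per-node parameter, p_step ~ U(0, 1)
        adj[new] = set()
        # "Host" procedure: keep picking hosts while the p_host coin succeeds.
        while rng.random() < p_host:
            current = rng.randrange(new)  # uniform over existing nodes
            while True:
                # Possibly link to the node currently being visited.
                if rng.random() < p_link:
                    adj[new].add(current)
                    adj[current].add(new)
                # With probability p_step, travel on; otherwise stop walking.
                if rng.random() >= p_step:
                    break
                neighbors = [v for v in adj[current] if v != new]
                if not neighbors:
                    break
                current = rng.choice(neighbors)
    return adj


graph = butterfly_graph(2000, seed=42)
isolated = sum(1 for nbrs in graph.values() if not nbrs)
print(len(graph), "nodes,", isolated, "with no links yet")
```

A node that picks no host, or never links, starts its own component; a node that links through several hosts can merge components. Both behaviors are listed among the model's novelties.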
Butterfly model in action
● Once there are no more “steps”, repeat the “host” procedure:
– With phost, choose a new host, possibly link, etc.
– Until no more steps, and no more hosts.
[Figure: new node n8 joining a graph on nodes n1 to n7, with steps labeled phost, plink, pstep]

a) Emergent, intuitive behavior
Novelties of model:
● Nodes link with probability
– May choose host, but not link (start new component)
● Incoming nodes are “social butterflies”
– May have several hosts (merges components)
● Some nodes are friendlier than others
– pstep different for each node
– This creates power-law degree distribution (theorem)

Validation of Butterfly
● Chose following parameters:
– phost = 0.3
– plink = 0.5
– pstep ~ U(0,1)
● Ran 10 simulations
● 100,000 nodes per simulation

b) Shrinking diameter
● Shrinking diameter
– In model, gelling usually occurred around N=20,000
[Figure: diameter vs. nodes, gelling point marked at N=20,000]

c) Oscillating NLCC’s
● Constant / oscillating NLCC’s
[Figure: NLCC size vs. nodes, N=20,000 marked]

d) Densification power law
● Densification:
– Our datasets had exponent a between 1.03 and 1.7
– In [Leskovec+05KDD], a between 1.1 and 1.7
– Simulation produced a between 1.1 and 1.2
[Figure: edges vs. nodes, N=20,000 marked]

e) Power-law degree distribution
● Power-law degree distribution
– Exponents approx -2
[Figure: count vs. degree]

Summary
● Studied several diverse public graphs
– Measured at many timestamps
– Unipartite and bipartite
– Blogs, citations, real-world, network traffic
– Largest was 6 million nodes, 10 million edges

Summary
● Observations on unweighted graphs:
A1: The GCC emerges at the “gelling point”
A2: NLCC’s are of constant / oscillating size
● Observations on weighted graphs:
A3: Total weight increases super-linearly with edges
A4: Node’s weights increase super-linearly with degree, power law exponent iw
A5: iw remains constant over time
● A6: Intuitive, emergent generative “butterfly” model that matches these properties

References
[Barabasi+99] Barabasi, A. L. & Albert, R. (1999), 'Emergence of scaling in random networks', Science 286(5439), 509-512.
[Erdos+60] Erdos, P. & Renyi, A. (1960), 'On the evolution of random graphs', Publ. Math. Inst. Hungary. Acad. Sci. 5, 17-61.
[Faloutsos+99] Faloutsos, M.; Faloutsos, P. & Faloutsos, C. (1999), 'On Power-law Relationships of the Internet Topology', SIGCOMM, 251-262.
[Kumar+00] Kumar, R.; Raghavan, P.; Rajagopalan, S.; Sivakumar, D.; Tomkins, A. & Upfal, E. (2000), 'Stochastic models for the Web graph', in 'Proceedings of the 41st FOCS', pp. 57-65.
[Kumar+06] Kumar, R.; Novak, J. & Tomkins, A. (2006), 'Structure and evolution of online social networks', in 'KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining', pp. 611-617.
[Leskovec+05KDD] Leskovec, J.; Kleinberg, J. & Faloutsos, C. (2005), 'Graphs over time: densification laws, shrinking diameters and possible explanations', in 'KDD '05'.
[Leskovec+07] Leskovec, J. & Faloutsos, C. (2007), 'Scalable modeling of real graphs using Kronecker Multiplication', ICML 2007.
[Milgram+67] Milgram, S. (1967), 'The small-world problem', Psychology Today 2, 60-67.
[Pennock+02] Pennock, Flake, Lawrence, Glover & Giles (2002), 'Winners don’t take all: Characterizing the competition for links on the web'.

Contact us
Leman Akoglu
www.andrew.cmu.edu/~lakoglu
lakoglu@cs.cmu.edu
Christos Faloutsos
www.cs.cmu.edu/~christos
christos@cs.cmu.edu
Mary McGlohon
www.cs.cmu.edu/~mmcgloho
mmcgloho@cs.cmu.edu

Entropy plots [Wang+2002]
● From time series data, begin with resolution r = T/2.
● Record the entropy H_r at that resolution.
● Recursively take finer resolutions.
[Figure: weights vs. time, and entropy vs. resolution]

Entropy Plots
● Self-similarity: linear plot
● Uniform: slope of plot s = 1
● Point mass: s = 0
● Bursty: 0.2 < s < 0.9
[Figure: entropy vs. resolution, slope s = 0.59]
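The entropy-plot procedure above can be sketched in a few lines. The sketch makes several assumptions the slides do not state: the series length T is a power of two, entropy is Shannon entropy in bits over the normalized weight per window, the x-axis is the number of halvings (so the first point uses windows of width T/2), and the slope is an ordinary least-squares fit. The function names `entropy_plot` and `fitted_slope` are hypothetical.

```python
import math


def entropy_plot(series):
    """Return [(level, entropy_bits)] for window widths T/2, T/4, ..., 1."""
    T = len(series)
    levels = T.bit_length() - 1
    if T < 2 or 2 ** levels != T:
        raise ValueError("series length must be a power of two, at least 2")
    total = sum(series)
    if total <= 0:
        raise ValueError("series must have positive total weight")
    points = []
    for level in range(1, levels + 1):
        width = T >> level  # window width at this resolution
        bins = [sum(series[i:i + width]) for i in range(0, T, width)]
        probs = [b / total for b in bins if b > 0]
        entropy = sum(p * math.log2(1.0 / p) for p in probs)
        points.append((level, entropy))
    return points


def fitted_slope(points):
    """Least-squares slope of entropy against resolution level."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


uniform = [1.0] * 16              # weight spread evenly over time
point_mass = [5.0] + [0.0] * 15   # all weight in a single time tick
print(fitted_slope(entropy_plot(uniform)), fitted_slope(entropy_plot(point_mass)))
```

The two synthetic series reproduce the extremes listed on the slide: evenly spread weight gains one bit of entropy per halving, and a single spike gains none. Bursty real series would be expected to fall between the two.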