Part II: Complex Networks
Empirical Properties and Metrics

Thrasyvoulos Spyropoulos / spyropoul@eurecom.fr Eurecom, Sophia-Antipolis

Textbooks
- “Networks, Crowds, and Markets: Reasoning About a Highly Connected World” by D. Easley and J. Kleinberg (“NCM”: publicly available online)
- “Networks: An Introduction” by M. Newman (“Networks”: shared copies in the library)
- “Networked Life: 20 Questions and Answers” by M. Chiang (some chapters; shared copies in the library)

What is a Network?
- A set of “nodes”: humans, routers, web pages, telephone switches, airports, proteins, scientific articles, ...
- Relations between these nodes:
  - humans: friendship/relation or online friendship
  - routers, switches: connected by a communication link
  - web pages: hyperlinks from one to the other
  - airports: direct flights between them
  - articles: one citing the other
  - proteins: linked if chemically interacting
- A network is often represented as a graph: vertex = node, edge (link) = relation, edge weight = strength of the relation.

Social Networks (of the past)
The social network of friendships within a 34-person karate club provides clues to the fault lines that eventually split the club apart (Zachary, 1977).

Social Networks (of the past): High School Dating
Peter S. Bearman, James Moody and Katherine Stovel, “Chains of affection: The structure of adolescent romantic and sexual networks”, American Journal of Sociology 110, 44-91 (2004). Image drawn by Mark Newman.

Network Research of the Past
- Mostly done by social scientists
- Interested in human (social) networks: spread of diseases, influence, etc.
- Methodology: questionnaires, which are cumbersome and prone to (lots of) bias
- Network size: tens, or at most hundreds, of nodes

Email Network
Email flows among a large project team. Colors denote each participant’s department.

(Online) Social Networks

(a subset!) of the Internet Graph

The Science of Complex Networks
- The study of large networks coming from all sorts of diverse areas
- We will focus on technological networks (e.g. the Internet) and information networks (e.g. the Web, Facebook)
- Unlike the old social networks of a few tens of nodes, such networks cannot be observed visually: we need ways to measure them and to quantify their properties
- The field is often called Social Networks, Network Science, or Network Theory

Question 1: What are the statistical properties of real networks?
- Connectivity, path lengths, degree distributions
- How do we measure such huge networks? Sampling
Question 2: Why do these properties arise?
- Models of large networks: random graphs (deterministic models are too complex/restrictive)
Question 3: How can we take advantage of these properties?
- Connectivity (epidemiology, resilience)
- Spread (information, disease)
- Search (Web page, person)

Part I: Network Properties of Interest
There are many different properties we might be interested in, and which ones matter also depends on the application. But some properties are commonly studied, for two reasons:
1. These properties are important for key applications.
2. The majority of networks exhibit surprising similarities with respect to these properties.
The three commonly studied properties are:
1. Degree distribution (“scale-free structure”)
2. Path length (“small world phenomena”)
3. Clustering (“community structure”)

Measuring Real Networks: Degree Distributions
Problem: find the probability distribution that best fits the observed data.
- Frequency $f_k$ = fraction of nodes with degree $k$ = probability that a randomly selected node has degree $k$
[Figure: $f_k$ plotted against degree $k$]

Basic Graph Properties: Revision Material

Undirected Graphs
Graph $G = (V, E)$, where $V$ = set of vertices and $E$ = set of edges.
Example undirected graph on nodes 1 to 5: E = {(1,2), (1,3), (2,3), (3,4), (4,5)}

Directed Graphs
Graph $G = (V, E)$, where $V$ = set of vertices and $E$ = set of edges.
Example directed graph on nodes 1 to 5: E = {‹1,2›, ‹2,1›, ‹1,3›, ‹3,2›, ‹3,4›, ‹4,5›}

Weighted and Unweighted Graphs
Edges have (weighted) or do not have (unweighted) a weight associated with them.
[Figure: a weighted graph with edge weights 8, 13, 4, 5 next to an unweighted graph]

Undirected Graph: Degree Distribution
- Degree $d(i)$ of node $i$: number of edges incident on node $i$
- Degree distribution of the example undirected graph: 1 node with degree 1, 3 nodes with degree 2, 1 node with degree 3, so $P(1) = 1/5$, $P(2) = 3/5$, $P(3) = 1/5$
[Figure: bar plot of $P(k)$ against $k$ for the example graph]

Directed Graph: In- and Out-Degree
- In-degree $d_{in}(i)$ of node $i$: number of edges pointing to node $i$
- Out-degree $d_{out}(i)$ of node $i$: number of edges leaving node $i$
- For the example directed graph: in-degree sequence [1,2,1,1,1], out-degree sequence [2,1,2,1,0]

Paths
- Path from node $i$ to node $j$: a sequence of edges (directed or undirected) from node $i$ to node $j$
- Path length: number of edges on the path
- If such a path exists, nodes $i$ and $j$ are connected
- Cycle: a path that starts and ends at the same node

Shortest Paths
Shortest path from node $i$ to node $j$: also known as the BFS path or geodesic path.

Diameter
The longest shortest path in the graph.

Undirected Graph: Components
- Connected graph: a graph where every pair of nodes is connected
- Disconnected graph: a graph that is not connected
- Connected components: maximal subsets of vertices in which every pair of vertices is connected

Fully Connected Graph
Clique $K_n$: a graph that has all possible $n(n-1)/2$ edges.

Directed Graph: Connectivity
- Strongly connected graph: there exists a path from every node $i$ to every node $j$
- Weakly connected graph: the graph is connected if its edges are treated as undirected

Adjacency Matrix: Undirected Graph
The adjacency matrix is symmetric for undirected graphs. For the example undirected graph:
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

Adjacency Matrix: Directed Graph
The adjacency matrix is non-symmetric for directed graphs. For the example directed graph ($A_{ij} = 1$ if there is an edge from $i$ to $j$):
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Examples of Adjacency Matrices
[Figure: example graphs G1, G2, G3 and their adjacency matrices]
Storage: the matrix of an undirected graph is symmetric, so about $n^2/2$ entries suffice; a directed graph needs $n^2$.

Exponential Distribution
- Probability of having $k$ neighbors: $p(k) = \lambda e^{-\lambda k}$
- Identified by a line in the log-linear plot: $\log p(k) = -\lambda k + \log \lambda$ (slope $-\lambda$)

Power-Law Distributions
- Right-skewed/heavy-tailed distribution: $p(k) = C k^{-\alpha}$
- There is a non-negligible fraction of nodes with very high degree (hubs)
- Scale-free: $f(ax) = b f(x)$, i.e. there is no characteristic scale, so the average is not informative
- A power-law distribution gives a line in the log-log plot: $\log p(k) = -\alpha \log k + \log C$ (slope $-\alpha$)
- $\alpha$: power-law exponent (typically $2 \le \alpha \le 3$)

Power Law vs. Exponential Distribution
The difference is particularly obvious if we plot them on a logarithmic vertical scale: for large $x$ there are orders-of-magnitude differences between the two functions.
[Figure: power laws $f(x) = cx^{-0.5}$ and $f(x) = cx^{-1}$ versus an exponential $f(x) = ce^{-x}$, on log-log and semi-log axes]

Internet Topology Primer
- Internet backbone and regional connectivity
- Multi-tier AS topology
- Gateway routers inside ASs

Internet Degree Distribution
Holds for both AS and router topologies.

Degree Distribution for Other Networks

Power Law Exponent in Real Networks (M. Newman 2003)
$\alpha$: power-law exponent (typically $2 \le \alpha \le 3$)

Measuring Path Length
- $d_{ij}$ = length of the shortest path between $i$ and $j$
- Diameter: $d = \max_{i,j} d_{ij}$
- Average path length: $\ell = \frac{1}{n(n-1)/2} \sum_{i<j} d_{ij}$
- Also of interest: the distribution of all shortest paths

Path Length: Lattice Network
- A total of $n$ nodes arranged in a $\sqrt{n} \times \sqrt{n}$ grid
- Only neighbors (up, down, left, right) are connected
Q: What is the diameter of the network? A: $2\sqrt{n} - 2$
Q: What is the average distance, i.e. between two nodes picked at random? A: It is on the order of $\sqrt{n}$ (i.e. $c\sqrt{n}$)

Path Length: Random Geometric Network
- $n$ wireless nodes in an area of 1×1
- Each transmits at distance $R$
- For connectivity, $R$ must be at least $O(\sqrt{\log n / n})$
Q: Choose two random nodes. What is the expected hop count (distance) between them? A: $O(\sqrt{n / \log n})$

Milgram’s Small World Experiment
- Letters were handed out to people in Nebraska to be sent to a target in Boston
- People were instructed to pass the letters on to someone they knew on a first-name basis
- ~60 letters, only about 35% delivered
- The letters that reached the destination followed paths of length around 6
- “Six degrees of separation” (also the title of a play by John Guare)

Milgram’s Small World Experiment: Email Version
In 2001, Duncan Watts, a professor at Columbia University, recreated Milgram’s experiment using an e-mail message as the “package” that needed to be delivered. Surprisingly, after reviewing the data collected from 48,000 senders and 19 targets in 157 different countries, Watts found that the average number of intermediaries was again 6.
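The path-length metrics above ($d_{ij}$, diameter, average path length) and the degree distribution can all be computed from an edge list with breadth-first search. The following is a minimal Python sketch, assuming an undirected, unweighted, connected graph; the helper names (`build_adjacency`, `degree_distribution`, `bfs_distances`, `path_length_stats`, `grid_edges`) are hypothetical, not taken from any library. It reuses the example graph E = {(1,2), (1,3), (2,3), (3,4), (4,5)} from the revision slides and checks the lattice diameter $2\sqrt{n} - 2$ on a 4×4 grid.

```python
from collections import Counter, deque
from fractions import Fraction


def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """P(k): fraction of nodes with degree k (f_k in the slides)."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}


def bfs_distances(adj, source):
    """Hop count d_ij from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_length_stats(adj):
    """Diameter and average shortest-path length (assumes a connected graph)."""
    nodes = list(adj)
    n = len(nodes)
    total, diameter = 0, 0
    for s in nodes:
        dist = bfs_distances(adj, s)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    # every unordered pair {i, j} was counted twice (once from i, once from j)
    pairs = n * (n - 1) // 2
    return diameter, Fraction(total, 2 * pairs)


def grid_edges(side):
    """Edges of a side x side lattice (up/down/left/right neighbours only)."""
    edges = []
    for r in range(side):
        for c in range(side):
            if c + 1 < side:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < side:
                edges.append(((r, c), (r + 1, c)))
    return edges


edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]  # example graph from the slides
adj = build_adjacency(edges)
print(degree_distribution(adj))
print(path_length_stats(adj))
print(path_length_stats(build_adjacency(grid_edges(4)))[0])
```

For the example graph this gives $P(1) = 1/5$, $P(2) = 3/5$, $P(3) = 1/5$ (matching the degree distribution slide), a diameter of 3 and an average path length of 17/10; the 4×4 lattice ($n = 16$) has diameter $6 = 2\sqrt{16} - 2$. Running BFS from every node costs $O(n(n+m))$ for $n$ nodes and $m$ edges, which is one reason why measuring huge networks relies on sampling.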
Kevin Bacon Number: two actors are linked if they appeared in the same movie
[Figure: an example chain linking actors (e.g. Robert Wagner, Barry Norton) through the films Let’s Make It Legal, Austin Powers: The Spy Who Shagged Me, Wild Things, What Price Glory, A Few Good Men, and Monsieur Verdoux]

Kevin Bacon Number (statistics from IMDB)
- ~740,000 linkable actors
- Average path length = 3
- 99% of actors are less than 6 hops away
- Try your own actor here: http://www.cs.virginia.edu/oracle/

Erdős Number: Collaboration Networks
- Legendary mathematician Paul Erdős: around 1500 papers and 509 collaborators
- Collaboration graph: a link between two authors who wrote a paper together
- Erdős number of X: hop count between Erdős and author X in the collaboration graph
- ~260,000 authors in the connected component
[Figure: an example collaboration chain from T. Spyropoulos via Kostas Psounis]

Internet Path Lengths
- Number of ASes traversed by an email message: ~35,000 nodes, average path ~5!
- Number of routers traversed by an email message: >200,000 nodes, average path ~15
(Plots taken from R. van der Hofstad.)

Internet Path Length: Different Continents

Measurement Findings: Path Length
- Milgram’s experiment => small world phenomenon
- Short paths exist between most nodes: path length $\ell \ll$ total number of nodes $N$ (compare a line network, where $\ell = O(N)$)
- “Small world” = the average path length $\ell$ is at most of the order of $\log N$, i.e. $\ell = O(\log N)$

Clustering (Transitivity) Coefficient
Measures the density of triangles (local clusters) in the graph. There are two different ways to measure it. The first is the ratio of the means:
$$C^{(1)} = \frac{\sum_i \text{triangles centered at node } i}{\sum_i \text{triples centered at node } i}$$

Example
[Figure: example graph on nodes 1 to 5]
$$C^{(1)} = \frac{3}{1 + 1 + 6} = \frac{3}{8}$$

Clustering (Transitivity) Coefficient (continued)
Clustering coefficient for node $i$:
$$C_i = \frac{\text{triangles centered at node } i}{\text{triples centered at node } i}$$
The second is the mean of the ratios:
$$C^{(2)} = \frac{1}{n} \sum_i C_i$$

Clustering Coefficient in Real Networks (M. Newman 2003)

Summary of Findings
Most real networks have...
1. Short paths between nodes (“small world”)
2. A transitivity/clustering coefficient that is finite and > 0
3. A degree distribution that follows a power law
Q1. Can we design graph models that exhibit similar characteristics?
Q2. Can we explain how/why these phenomena occur in the first place?
Q3. Can we take advantage of these properties (e.g. searching, advertising, viral infection/immunization, etc.)?
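Both clustering definitions can be checked on a small graph with a short Python sketch. The edge set is an assumption: a hypothetical 5-node graph with one triangle (1, 2, 3) and node 3 additionally linked to nodes 4 and 5, chosen because it reproduces the counts $3/(1+1+6)$ of the example above. The sketch also assumes the common convention $C_i = 0$ for nodes of degree below 2; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_counts(adj, i):
    """Return (triangles centered at i, connected triples centered at i)."""
    nbrs = adj[i]
    k = len(nbrs)
    triples = k * (k - 1) // 2  # number of pairs of neighbours of i
    # a triangle at i is a pair of neighbours that are themselves linked
    triangles = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return triangles, triples


def clustering_ratio_of_means(adj):
    """C^(1): total triangles over total triples, summed across all nodes."""
    tri = trip = 0
    for i in adj:
        t, p = local_counts(adj, i)
        tri += t
        trip += p
    return Fraction(tri, trip)


def clustering_mean_of_ratios(adj):
    """C^(2): average of local C_i (assumes C_i = 0 when degree < 2)."""
    total = Fraction(0)
    for i in adj:
        t, p = local_counts(adj, i)
        total += Fraction(t, p) if p > 0 else 0
    return total / len(adj)


# hypothetical example graph: triangle (1, 2, 3) plus edges 3-4 and 3-5
adj = build_adjacency([(1, 2), (1, 3), (2, 3), (3, 4), (3, 5)])
print(clustering_ratio_of_means(adj))
print(clustering_mean_of_ratios(adj))
```

On this graph $C^{(1)} = 3/8$, while $C^{(2)} = (1 + 1 + 1/6 + 0 + 0)/5 = 13/30$. The two definitions generally differ: $C^{(2)}$ gives every node equal weight, whereas $C^{(1)}$ effectively weights each node by its number of triples, so high-degree nodes dominate it.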