Graphs: Definitions, Measures, Graphology, Pathology, Degree Distributions
Network Science: Graph Theory 2012

COMPONENTS OF A COMPLEX SYSTEM
• components: nodes, vertices (N)
• interactions: links, edges (L)
• system: network, graph (N, L)

NETWORKS OR GRAPHS?
Network often refers to real systems:
• WWW
• social network
• metabolic network
Language: (network, node, link)
Graph: the mathematical representation of a network:
• web graph
• social graph (a Facebook term)
Language: (graph, vertex, edge)
We will try to make this distinction whenever it is appropriate, but in most cases we will use the two terms interchangeably.

A COMMON LANGUAGE
[Figure: a social network (Mary, Peter, Albert; links labeled friend, brothers, co-worker), an actor network (Actors 1-4, Movies 1-3) and a protein network (Proteins 1, 2, 5, 9), shown with N = 4 and L = 4.]

CHOOSING A PROPER REPRESENTATION
The choice of the proper network representation determines our ability to use network theory successfully. In some cases there is a unique, unambiguous representation. In other cases, the representation is by no means unique. For example, the way we assign the links between a group of individuals will determine the nature of the question we can study.

UNDIRECTED VS. DIRECTED NETWORKS
Undirected
• Links: undirected (symmetrical). Graph.
• Undirected links: coauthorship links, actor network, protein interactions.
• An undirected link is the superposition of two opposite directed links.
Directed
• Links: directed (arcs). Digraph = directed graph.
• Directed links: URLs on the WWW, phone calls, metabolic reactions.
[Figure: an undirected and a directed example graph with lettered nodes.]

NODE DEGREES
Undirected
Node degree: the number of links connected to the node. In the example figure, $k_A = 1$.
Directed
In directed networks we can define an in-degree and an out-degree. The (total) degree is the sum of the in- and out-degree.
Further values from the example figures: $k_B = 4$ (undirected); $k_C^{in} = 2$, $k_C^{out} = 1$, $k_C = 3$ (directed).
Source: a node with $k^{in} = 0$. Sink: a node with $k^{out} = 0$.

GRAPH DENSITY
Number of connections that may exist between n nodes:
• directed graph: $e_{max} = n(n-1)$
• undirected graph: $e_{max} = n(n-1)/2$
What fraction are present? density $= e / e_{max}$
For example, out of 12 possible connections, this graph has 7, giving it a density of 7/12 = 0.583.
• Would this measure be useful for comparing networks of different sizes (different numbers of nodes)?
• As n → ∞, a graph whose density approaches 0 is a sparse graph; a graph whose density approaches a nonzero constant is a dense graph.

AVERAGE DEGREE
Recall: $\langle x \rangle = (x_1 + x_2 + \dots + x_N)/N = \sum_i x_i / N$
Undirected
$\langle k \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i$, $L = \frac{1}{2} \sum_{i=1}^{N} k_i$, so $\langle k \rangle \equiv \frac{2L}{N}$
N: the number of nodes in the graph
L: the number of links in the graph
Directed
$\langle k^{in} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{in}$, $\langle k^{out} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{out}$, $\langle k^{in} \rangle = \langle k^{out} \rangle$
$\langle k \rangle \equiv \frac{L}{N}$

COMPLETE GRAPH
The maximum number of links a network of N nodes can have is:
$L_{max} = \binom{N}{2} = \frac{N(N-1)}{2}$
A graph with $L = L_{max}$ links is called a complete graph, and its average degree is $\langle k \rangle = N - 1$.

REAL NETWORKS ARE SPARSE
Most networks observed in real systems are sparse: $L \ll L_{max}$, or $\langle k \rangle \ll N - 1$.

Network                    N         L            Lmax           <k>
WWW (ND Sample)            325,729   1.4 x 10^6   10^12          4.51
Protein (S. Cerevisiae)    1,870     4,470        10^7           2.39
Coauthorship (Math)        70,975    2 x 10^5     3 x 10^10      3.9
Movie Actors               212,250   6 x 10^6     1.8 x 10^13    28.78

(Source: Albert, Barabasi, RMP2002)

ADJACENCY MATRIX
$A_{ij} = 1$ if there is a link between nodes i and j.
$A_{ij} = 0$ if nodes i and j are not connected to each other.
Undirected example (4 nodes):
$A_{ij} = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$
Directed example (4 nodes):
$A_{ij} = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}$
The adjacency matrix of an undirected network is symmetric. The adjacency matrix of a directed network is, in general, not symmetric.
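The link count, average degree and density defined above can all be read off an adjacency matrix. The following is a minimal sketch, assuming a 0/1 matrix stored as a list of rows with no self-loops; the function name `graph_stats` and the small 4-node example graph are illustrative choices, not part of the original slides.

```python
def graph_stats(adj, directed=False):
    """Return (L, average degree, density) for a 0/1 adjacency matrix without self-loops."""
    n = len(adj)
    total = sum(sum(row) for row in adj)  # sum of all A_ij
    # In an undirected graph every link appears twice (A_ij and A_ji).
    links = total if directed else total // 2
    max_links = n * (n - 1) if directed else n * (n - 1) // 2
    # <k> = 2L/N for undirected graphs, <k> = L/N for directed graphs.
    avg_degree = links / n if directed else 2 * links / n
    density = links / max_links
    return links, avg_degree, density


# Illustrative undirected graph with links 1-2, 1-4, 2-4, 3-4 (assumed example).
A = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]
links, avg_degree, density = graph_stats(A)
print(links, avg_degree, density)
```

For this graph L = 4 out of a possible 6 links, so the density is 4/6 and the average degree is 2L/N = 2.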
ADJACENCY MATRIX
[Figure: example graph with nodes a to h.] Its adjacency matrix as given on the slide:

    a  b  c  d  e  f  g  h
a   0  1  0  0  1  0  1  0
b   1  0  1  0  0  0  0  1
c   0  1  0  1  0  1  1  0
d   0  0  1  0  1  0  0  0
e   1  0  0  1  0  0  0  0
f   0  0  1  0  0  0  0  0
g   1  0  1  0  0  1  0  0
h   0  1  0  0  0  0  0  0

ADJACENCY MATRIX AND NODE DEGREES
(The slide repeats the two 4-node example graphs and matrices from the ADJACENCY MATRIX slide.)
Undirected
$A_{ij} = A_{ji}$, $A_{ii} = 0$
$k_i = \sum_{j=1}^{N} A_{ij}$, $k_j = \sum_{i=1}^{N} A_{ij}$
$L = \frac{1}{2} \sum_{i=1}^{N} k_i = \frac{1}{2} \sum_{i,j}^{N} A_{ij}$
Directed
$A_{ij} \neq A_{ji}$, $A_{ii} = 0$
$k_j^{in} = \sum_{i=1}^{N} A_{ij}$, $k_i^{out} = \sum_{j=1}^{N} A_{ij}$
$L = \sum_{i=1}^{N} k_i^{in} = \sum_{j=1}^{N} k_j^{out} = \sum_{i,j}^{N} A_{ij}$

GRAPHOLOGY 1
Undirected
$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$
$A_{ii} = 0$, $A_{ij} = A_{ji}$, $L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{2L}{N}$
Examples: actor network, protein-protein interactions
Directed
$A_{ij} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
$A_{ii} = 0$, $A_{ij} \neq A_{ji}$, $L = \sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{L}{N}$
Examples: WWW, citation networks

GRAPHOLOGY 2
Unweighted (undirected)
$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$
$A_{ii} = 0$, $A_{ij} = A_{ji}$, $L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{2L}{N}$
Examples: protein-protein interactions, WWW
Weighted (undirected)
$A_{ij} = \begin{pmatrix} 0 & 2 & 0.5 & 0 \\ 2 & 0 & 1 & 4 \\ 0.5 & 1 & 0 & 0 \\ 0 & 4 & 0 & 0 \end{pmatrix}$
$A_{ii} = 0$, $A_{ij} = A_{ji}$, $L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij})$, $\langle k \rangle = \frac{2L}{N}$
Examples: call graph, metabolic networks

GRAPHOLOGY 3
Self-interactions (undirected)
$A_{ij} = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}$
$A_{ii} \neq 0$, $A_{ij} = A_{ji}$, $L = \frac{1}{2} \sum_{i,j=1, i \neq j}^{N} A_{ij} + \sum_{i=1}^{N} A_{ii}$
Examples: protein interaction network, WWW
Multigraph (undirected)
$A_{ij} = \begin{pmatrix} 0 & 2 & 1 & 0 \\ 2 & 0 & 1 & 3 \\ 1 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{pmatrix}$
$A_{ii} = 0$, $A_{ij} = A_{ji}$, $L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij})$, $\langle k \rangle = \frac{2L}{N}$
Examples: social networks, collaboration networks

GRAPHOLOGY 4
Complete Graph (undirected)
$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$
$A_{ii} = 0$, $A_{i \neq j} = 1$, $L = L_{max} = \frac{N(N-1)}{2}$
$\langle k \rangle = N - 1$
Examples: actor network, protein-protein interactions

GRAPHOLOGY: Real networks can have multiple characteristics
• WWW: directed multigraph with self-interactions
• Protein interactions: undirected, unweighted, with self-interactions
• Collaboration network: undirected multigraph or weighted
• Mobile phone calls: directed, weighted
• Facebook friendship links: undirected, unweighted

BIPARTITE GRAPHS
A bipartite graph (or bigraph) is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets.
Examples:
• Hollywood actor network
• Collaboration networks
• Disease network (diseasome)

PATHS
A path is a sequence of nodes in which each node is adjacent to the next one.
A path $P_{i_0, i_n}$ of length n between nodes $i_0$ and $i_n$ is an ordered collection of n+1 nodes and n links:
$P_n = \{i_0, i_1, i_2, \dots, i_n\}$
$P_n = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \dots, (i_{n-1}, i_n)\}$
• A path can intersect itself and pass through the same link repeatedly. Each time a link is crossed, it is counted separately.
• A legitimate path on the graph on the right: ABCBCADEEBA
• In a directed network, the path can follow only the direction of an arrow.
[Figure: example graph with nodes A, B, C, D, E.]

DISTANCE IN A GRAPH
Shortest Path, Geodesic Path
The distance (shortest path, geodesic path) between two nodes is defined as the number of edges along the shortest path connecting them.
If the two nodes are disconnected, the distance is infinity.
In directed graphs each path needs to follow the direction of the arrows. Thus in a digraph the distance from node A to B (on an AB path) is generally different from the distance from node B to A (on a BCA path).
[Figure: an undirected and a directed example graph with nodes A, B, C, D.]
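The asymmetry of distances in a digraph can be checked on a small example. This is a minimal sketch that uses breadth-first search to count the links on a shortest directed path; the function name `distance` and the four-node digraph are hypothetical, chosen so that A reaches B directly while B reaches A only through C.

```python
from collections import deque
import math


def distance(out_links, source, target):
    """Number of links on a shortest directed path, or math.inf if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        # Only follow links in the direction of the arrows.
        for nxt in out_links.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return math.inf


# Hypothetical digraph: A -> B, B -> C, C -> A, D -> A (no link points to D).
G = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
d_ab = distance(G, "A", "B")  # follows the single link A -> B
d_ba = distance(G, "B", "A")  # must go B -> C -> A
d_ad = distance(G, "A", "D")  # D cannot be reached from A
print(d_ab, d_ba, d_ad)
```

Here the distance from A to B is 1, the distance from B to A is 2, and the distance from A to D is infinite because no directed path leads to D.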
NUMBER OF PATHS BETWEEN TWO NODES
Adjacency Matrix
$N_{ij}$: the number of paths between any two nodes i and j.
Length n = 1: if there is a link between i and j, then $A_{ij} = 1$, and $A_{ij} = 0$ otherwise.
Length n = 2: if there is a path of length two between i and j through node k, then $A_{ik} A_{kj} = 1$, and $A_{ik} A_{kj} = 0$ otherwise. The number of paths of length 2 is:
$N_{ij}^{(2)} = \sum_{k=1}^{N} A_{ik} A_{kj} = [A^2]_{ij}$
Length n: in general, if there is a path of length n between i and j, then $A_{ik} \dots A_{lj} = 1$, and $A_{ik} \dots A_{lj} = 0$ otherwise. The number of paths of length n between i and j is:
$N_{ij}^{(n)} = [A^n]_{ij}$

FINDING DISTANCES: BREADTH FIRST SEARCH
Distance between node 1 and node 4:
1. Start at 1.
2. Find the nodes adjacent to 1. Mark them as at distance 1. Put them in a queue.
3. Take the first node out of the queue. Find the unmarked nodes adjacent to it in the graph. Mark them with the label of 2. Put them in the queue.
4. Repeat until you find node 4 or there are no more nodes in the queue.
5. The distance between 1 and 4 is the label of 4 or, if 4 does not have a label, infinity.
[Figure: example graph with nodes labeled by their distance from the starting node.]

NETWORK DIAMETER AND AVERAGE DISTANCE
Diameter: $d_{max}$, the maximum distance between any pair of nodes in the graph, measured along the shortest path.
Average path length/distance, $\langle d \rangle$, for a connected graph:
$\langle d \rangle \equiv \frac{1}{N(N-1)} \sum_{i, j \neq i} d_{ij}$
where $d_{ij}$ is the distance from node i to node j, and $N(N-1) = 2 L_{max}$ is the number of ordered node pairs.

THE BRIDGES OF KONIGSBERG
1736: in Königsberg (Prussia), 7 bridges over the river Pregel connected 4 land masses.
Can one walk across the seven bridges and never cross the same bridge twice?
Euler circuit: return to the starting point by traveling each link of the graph once and only once. An Euler path also travels each link once and only once, but need not return to the starting point.

EULERIAN GRAPH: it has an Eulerian circuit
Every vertex of this graph has an even degree, therefore this is an Eulerian graph. Following the edges in alphabetical order gives an Eulerian circuit/cycle.
http://en.wikipedia.org/wiki/Euler_circuit

EULER CIRCUITS IN DIRECTED GRAPHS
If a digraph is strongly connected and the in-degree of each node is equal to its out-degree, then there is an Euler circuit. Otherwise there is no Euler circuit: in a circuit we need to enter each node as many times as we leave it.
[Figure: example digraph with nodes A to G.]

PATHOLOGY: summary
• Path: a sequence of nodes such that each node is connected to the next node along the path by a link.
• Shortest path: the path with the shortest length between two nodes (distance).
• Diameter: the longest shortest path in a graph.
• Average path length: the average of the shortest paths for all pairs of nodes.
• Cycle: a path with the same start and end node.
• Self-avoiding path: a path that does not intersect itself.
• Eulerian path: a path that traverses each link exactly once.
• Hamiltonian path: a path that visits each node exactly once.
[Figures: each definition is illustrated on a 5-node example graph.]
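The condition for Euler circuits in directed graphs stated above can be tested directly. The sketch below checks the two requirements separately: equal in- and out-degree at every node, and strong connectivity, tested here by searching from one node along the links and then against them. The function name `has_euler_circuit` and the example digraphs are assumptions for illustration; the test only decides whether a circuit exists and does not construct one.

```python
def has_euler_circuit(out_links):
    """True if the digraph is strongly connected and every node has in-degree == out-degree."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    if not nodes:
        return False  # convention assumed here for the empty graph

    in_links = {v: [] for v in nodes}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].append(u)

    # We must enter each node as many times as we leave it.
    if any(len(in_links[v]) != len(out_links.get(v, [])) for v in nodes):
        return False

    def reaches_all(links):
        start = next(iter(nodes))
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in links.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen == nodes

    # Strongly connected: every node is reachable from start, and start from every node.
    return reaches_all(out_links) and reaches_all(in_links)


cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
chain = {"A": ["B"], "B": ["C"], "C": []}
two_cycles = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"]}
print(has_euler_circuit(cycle), has_euler_circuit(chain), has_euler_circuit(two_cycles))
```

The third example shows why the degree condition alone is not enough: every node has in-degree equal to out-degree, but the digraph falls into two pieces, so no single circuit can cover all links.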
CONNECTIVITY OF UNDIRECTED GRAPHS
Connected (undirected) graph: any two vertices can be joined by a path.
A disconnected graph is made up of two or more connected components.
• Largest component: giant component
• The rest: isolates
Bridge: a link whose removal makes the graph disconnected.
[Figure: a connected and a disconnected example graph with nodes A, B, C, D, F, G.]

CONNECTIVITY OF UNDIRECTED GRAPHS
Adjacency Matrix
The adjacency matrix of a network with several components can be written in a block-diagonal form, so that nonzero elements are confined to squares, with all other elements being zero.
Figure after Newman, 2010

THREE CENTRAL QUANTITIES IN NETWORK SCIENCE
• Degree distribution $p_k$
• Average path length $\langle d \rangle$
• Clustering coefficient C

STATISTICS REMINDER
We have a sample of values $x_1, \dots, x_N$.
Distribution of x (a.k.a. PDF): the probability that a randomly chosen value is x.
$P(x) = (\text{number of values equal to } x) / N$
$\sum_i P(x_i) = 1$ always!
Histograms >>>

pdf: the continuous case
Will work on the board.

DEGREE DISTRIBUTION
Degree distribution P(k): the probability that a randomly chosen vertex has degree k.
$N_k$ = number of nodes with degree k
$P(k) = N_k / N$
[Plot: P(k) versus k for k = 1 to 4.]

DEGREE DISTRIBUTION
Discrete representation: $p_k$ is the probability that a node has degree k.
Continuum description: p(k) is the pdf of the degrees, where
$\int_{k_1}^{k_2} p(k) \, dk$
represents the probability that a node's degree is between $k_1$ and $k_2$.
Normalization condition:
$\sum_{k=0}^{\infty} p_k = 1$, $\int_{k_{min}}^{\infty} p(k) \, dk = 1$
where $k_{min}$ is the minimal degree in the network.

CLUSTERING COEFFICIENT
Clustering coefficient: what portion of your neighbors are connected?
It captures the degree to which the neighbors of a given node link to each other.
For a node i with degree $k_i$:
$C_i = \frac{2 e_i}{k_i (k_i - 1)}$
where $e_i$ is the number of links between the $k_i$ neighbors of node i, and $C_i \in [0, 1]$.

THREE CENTRAL QUANTITIES IN NETWORK SCIENCE
A. Degree distribution: $p_k$
B. Path length: $\langle d \rangle$
C. Clustering coefficient: C
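Two of these quantities, the degree distribution and the clustering coefficient, can be computed from a neighbor list in a few lines. The following is a minimal sketch for an undirected graph without self-loops, assuming the standard local definition $C_i = 2 e_i / (k_i (k_i - 1))$ and, as a further assumption, $C_i = 0$ for nodes with fewer than two neighbors. The function names and the 4-node example (links 1-2, 1-3, 2-3, 2-4) are illustrative.

```python
from collections import Counter


def degree_distribution(neighbors):
    """Return {k: P(k)} with P(k) = N_k / N for an undirected neighbor-set dictionary."""
    n = len(neighbors)
    counts = Counter(len(nbrs) for nbrs in neighbors.values())
    return {k: counts[k] / n for k in sorted(counts)}


def clustering(neighbors, i):
    """Local clustering coefficient C_i = 2 e_i / (k_i (k_i - 1))."""
    nbrs = list(neighbors[i])
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined cases are reported as 0
    # e_i: number of links among the neighbors of i, each pair counted once.
    e = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in neighbors[nbrs[a]]
    )
    return 2 * e / (k * (k - 1))


# Illustrative undirected graph with links 1-2, 1-3, 2-3, 2-4.
G = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
p = degree_distribution(G)
c = {i: clustering(G, i) for i in G}
print(p)
print(c)
```

In this graph one node has degree 1, two have degree 2 and one has degree 3, so P(1) = 0.25, P(2) = 0.5 and P(3) = 0.25. Node 1 has both of its neighbors linked to each other, giving $C_1 = 1$, while node 2 has only one linked pair among its three neighbors, giving $C_2 = 1/3$.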