Graph Algorithms B. S. Panda Department of Mathematics IIT Delhi What is a Graph? Graph G = (V,E) ( simple, Undirected) Set V of vertices (nodes): A non-empty finite set Set E of edges: E V2 , 2-element subsets of V. Elements of E are unordered pairs {v,w} where v,w V. So a graph is an ordered pair of two sets V and E. Variations: Directed graph, ( E V V) Weighted graph W: ER, c: V R 2 What is a Graph? Modeling view: G=(V,E) V: A finite set of objects E: A symmetric, irreflexive binary relation on V. When E becomes just a relation it gives rise to a directed graph. 3 What is a Graph? Any n X n 0-1 symmetric matrix is a graph Vertices are row ( colum) numbers and there is an edge between two vertices i and j if the ij th entry is 1. (Adjacency matrix) 4 Graphs: Diagramatic representation Vertices (aka nodes) 5 Graphs — Diagramatic Representation Vertices (aka nodes) 3 2 1 4 5 6 Edges 6 Graphs — Weighted graphs Vertices (aka nodes) 3 618 1 2 2273 211 190 4 1987 344 318 5 2145 2462 Weights 6 Undirected Edges 7 Directed Graph (digraph) Vertices (aka nodes) 3 2 1 4 5 6 Edges 8 Why Graph Theory? Discrete Mathematics and in particular Graph Theory is the language of Computer Scientist. Tarjan & Hopcroft ( 1986): ACM Turing Award For fundamental contributions to “ Graph Algorithms and Data structures” Planarity testing and Splay Trees are Key contributions among others. Navanlina Prize in ICM 2010 ( Hyderabad) Daniel Spielman for his work “ Application of Graph Theory to Computer Science” 9 Why Graph Theory? ABEL PRIZE ( 2012) Endre Szemeredi For his fundamental contributions to Discrete Mathematics and Theoretical computer Science Is P=NP? ( 1 Milion $ open Problem) Many Decision problems concerning Graphs are NP Complete. ( HC Decision Problem etc) P=NP iff any NP-complete problem belongs to P. One possible way of attacking the above problem is through Graph Theory . 10 Terminology Path: a sequence of vertices (v[1], v[2],...,v[k]) s.t. <i<k (v[i],v[i+1]) E for all 0 Simple Path: No vertex is repeated in a simple path The length of a path is number of vertices in it minus 1. Cycle : a simple path that begins and ends with the same node Cyclic graph – contains at least one cycle Acyclic graph - no cycles 11 Terminology, cont’d Subgraph of a graph G =(V,E) is H=(V’,E’) if V’V and E’ E. Vertex Induced Subgraph: Induced by vertex set S, G[S]=(S,E’), where E’ ={ xy| xy E and x,y S}. Edge Induced subgraph: Induced by E’ , G[E’]=(V’,E’), V’={x| xy E’ for some y V} Connected graph: a graph where for every pair of nodes there exists a path between them. Connected component of a graph G a maximal connected subgraph of G. 12 Tree A connected acyclic Graph is a tree. Spanning tree of a connected graph G=(V,E): a sub graph T=(V,E’) of G, and T is a tree, i.e. T spans the whole of V of G. W(T)= sum of the weights of all edges of T. Minimum Spanning Tree ( MST): Spanning tree having minimum weight of a edge weighted connected graph. 13 Graph Representation Two popular computer representations of a graph. Both represent the vertex set and the edge set, but in different ways. 1. 2. Adjacency Matrix Use a 2D matrix to represent the graph Adjacency List Use a 1D array of linked lists Adjacency Matrix 2D array A[0..n-1, 0..n-1], where n is the number of vertices in the graph Each row and column is indexed by the vertex id e,g a=0, b=1, c=2, d=3, e=4 A[i][j]=1 if there is an edge connecting vertices i and j; otherwise, A[i][j]=0 The storage requirement is Θ(n2). It is not efficient if the graph has few edges. An adjacency matrix is an appropriate representation if the graph is dense: |E|=Θ(|V|2) We can detect in O(1) time whether two vertices are connected. Adjacency List If the graph is not dense, in other words, sparse, a better solution is an adjacency list The adjacency list is an array A[0..n-1] of lists, where n is the number of vertices in the graph. Each array entry is indexed by the vertex id Each list A[i] stores the ids of the vertices adjacent to vertex i Adjacency Matrix Example 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 1 0 8 1 0 0 1 1 0 0 0 1 0 1 2 2 0 1 0 0 1 0 0 0 1 0 9 3 0 1 0 0 1 1 0 0 0 0 1 3 4 4 0 0 1 1 0 0 0 0 0 0 7 6 5 5 0 0 0 1 0 0 1 0 0 0 6 0 0 0 0 0 1 0 1 0 0 7 0 1 0 0 0 0 1 0 0 0 8 1 0 1 0 0 0 0 0 0 1 9 0 1 0 0 0 0 0 0 1 0 Adjacency List Example 0 8 2 9 1 7 3 4 6 5 0 8 1 2 3 7 9 2 1 4 8 3 1 4 5 4 2 3 5 3 6 6 5 7 7 1 6 8 0 2 9 9 1 8 Storage of Adjacency List The array takes up Θ(n) space Define degree of v, deg(v), to be the number of edges incident to v. Then, the total space to store the graph is proportional to: deg( v) vertex v An edge e={u,v} of the graph contributes a count of 1 to deg(u) and contributes a count 1 to deg(v) Therefore, Σvertex vdeg(v) = 2m, where m is the total number of edges In all, the adjacency list takes up Θ(n+m) space If m = O(n2) (i.e. dense graphs), both adjacent matrix and adjacent lists use Θ(n2) space. If m = O(n), adjacent list outperform adjacent matrix However, one cannot tell in O(1) time whether two vertices are connected Adjacency List vs. Matrix Adjacency List More compact than adjacency matrices if graph has few edges Requires more time to find if an edge exists Adjacency Matrix Always require n2 space This can waste a lot of space if the number of edges are sparse Can quickly find if an edge exists Graph Traversals BFS ( breadth first search) DFS ( depth first search) Algorithm Generic GraphSearch for i=1 to n do { Mark[i]=0; P[i]=0; } Mark[1]=1; P[1]=1; Num[1]=1; Count=2; S={1}; While ( S ≠ V) { select a vertex x from S; if ( x has an unmarked neighbor y) { Mark[y]=1; P[y]=x; Num[y]=count; count=count+1; S=S {y}; } else S=S-{x}; } } BFS and DFS If S is implemented using a Queue, the search becomes BFS If S is implemented using a stack, the search becomes DFS BFS Algorithm // flag[ ]: visited table Why use queue? Need FIFO BFS for each vertex v flag[v]=false; For all vertices v if (flag[v]=false) BFS(v); BFS Example Adjacency List 0 8 source 2 9 1 7 3 6 4 Visited Table (T/F) 0 F 1 F 2 F 3 F 4 F 5 F 6 F 7 F 8 F 9 F 5 Initialize visited table (all False) Q= { } Initialize Q to be empty Adjacency List 0 8 source 2 9 1 7 3 4 6 Visited Table (T/F) 0 F 1 F 2 T 3 F 4 F 5 F 6 F 7 F 8 F 9 F 5 Flag that 2 has been visited Q= { 2 } Place source 2 on the queue Adjacency List Visited Table (T/F) 0 Neighbors 8 source 2 9 1 7 3 4 6 0 F 1 T 2 T 3 F 4 T 5 F 6 F 7 F 8 T 9 F 5 Mark neighbors as visited 1, 4, 8 Q = {2} → { 8, 1, 4 } Dequeue 2. Place all unvisited neighbors of 2 on the queue Adjacency List 0 8 source 2 9 1 7 3 4 Neighbors 6 Visited Table (T/F) 0 T 1 T 2 T 3 F 4 T 5 F 6 F 7 F 8 T 9 T 5 Mark new visited Neighbors 0, 9 Q = { 8, 1, 4 } → { 1, 4, 0, 9 } Dequeue 8. -- Place all unvisited neighbors of 8 on the queue. -- Notice that 2 is not placed on the queue again, it has been visited! Adjacency List 0 2 9 1 7 3 4 0 T 1 T 2 T 3 T 4 T 5 F 6 F 7 T 8 T 9 T Neighbors 8 source Visited Table (T/F) 6 5 Mark new visited Neighbors 3, 7 Q = { 1, 4, 0, 9 } → { 4, 0, 9, 3, 7 } Dequeue 1. -- Place all unvisited neighbors of 1 on the queue. -- Only nodes 3 and 7 haven’t been visited yet. Adjacency List 0 8 source 2 9 Neighbors 1 7 3 4 6 5 Q = { 4, 0, 9, 3, 7 } → { 0, 9, 3, 7 } Dequeue 4. -- 4 has no unvisited neighbors! Visited Table (T/F) 0 T 1 T 2 T 3 T 4 T 5 F 6 F 7 T 8 T 9 T Adjacency List Visited Table (T/F) 0 T 1 T 2 T 3 T 4 T 5 F 6 F 7 T 8 T 9 T Neighbors 0 8 source 2 9 1 7 3 4 6 5 Q = { 0, 9, 3, 7 } → { 9, 3, 7 } Dequeue 0. -- 0 has no unvisited neighbors! Adjacency List 0 8 source 2 9 1 7 3 4 6 Neighbors 5 Q = { 9, 3, 7 } → { 3, 7 } Dequeue 9. -- 9 has no unvisited neighbors! Visited Table (T/F) 0 T 1 T 2 T 3 T 4 T 5 F 6 F 7 T 8 T 9 T Adjacency List 0 8 source Neighbors 2 9 1 7 3 4 6 Visited Table (T/F) 0 T 1 T 2 T 3 T 4 T 5 T 6 F 7 T 8 T 9 T 5 Mark new visited Vertex 5 Q = { 3, 7 } → { 7, 5 } Dequeue 3. -- place neighbor 5 on the queue. Adjacency List 0 8 source 2 9 1 Neighbors 7 3 4 6 Visited Table (T/F) 0 T 1 T 2 T 3 T 4 T 5 T 6 T 7 T 8 T 9 T 5 Mark new visited Vertex 6 Q = { 7, 5 } → { 5, 6 } Dequeue 7. -- place neighbor 6 on the queue Adjacency List 0 8 source 2 9 Neighbors 1 7 3 4 6 5 Q = { 5, 6} → { 6 } Dequeue 5. -- no unvisited neighbors of 5 Visited Table (T/F) 0 T 1 T 2 T 3 T 4 T 5 T 6 T 7 T 8 T 9 T Adjacency List 0 8 source 2 9 1 7 3 4 Neighbors 6 5 Q= {6}→{ } Dequeue 6. -- no unvisited neighbors of 6 Visited Table (T/F) 0 T 1 T 2 T 3 T 4 T 5 T 6 T 7 T 8 T 9 T Adjacency List Visited Table (T/F) 0 8 source 2 9 1 7 3 4 6 0 T 1 T 2 T 3 T 4 T 5 T 6 T 7 T 8 T 9 T 5 What did we discover? Q= { } STOP!!! Q is empty!!! Look at “visited” tables. There exists a path from source vertex 2 to all vertices in the graph Time Complexity of BFS (Using Adjacency List) Assume adjacency list n = number of vertices m = number of edges O(n + m) Each vertex will enter Q at most once. Each iteration takes time proportional to deg(v) + 1 (the number 1 is to account for the case where deg(v) = 0 --- the work required is 1, not 0). Running Time Recall: Given a graph with m edges, what is the total degree? Σvertex v deg(v) = 2m The total running time of the while loop is: O( Σvertex v (deg(v) + 1) ) = O(n+m) this is summing over all the iterations in the while loop! Time Complexity of BFS (Using Adjacency Matrix) Assume adjacency Matrix n = number of vertices m = number of edges O(n2) Finding the adjacent vertices of v requires checking all elements in the row. This takes linear time O(n). Summing over all the n iterations, the total running time is O(n2). So, with adjacency matrix, BFS is O(n2) independent of the number of edges m. With adjacent lists, BFS is O(n+m); if m=O(n2) like in a dense graph, O(n+m)=O(n2). What Can you do with BFS? Find all the connected components Test whether G has a cycle or not Find a spanning tree of a connected graph Find shortest paths from s to every other vertices Test whether a graph has an odd cycle Many more!!! Definition of MST Let G=(V,E) be a connected, undirected graph. For each edge (u,v) in E, we have a weight w(u,v) specifying the cost (length of edge) to connect u and v. We wish to find a (acyclic) subset T of E that connects all of the vertices in V and whose total weight is minimized. Since the total weight is minimized, the subset T must be acyclic (no circuit). Thus, T is a tree. We call it a spanning tree. The problem of determining the tree T is called the minimum-spanning-tree problem. 43 Here is an example of a connected graph and its minimum spanning tree: 8 c b 4 7 8 d 2 11 a 7 h i 1 14 4 6 9 e 10 g 2 f Notice that the tree is not unique: replacing (b,c) with (a,h) yields another spanning tree with the same minimum weight. 44 Real Life Application of a MST A cable TV company is laying cable in a new neighborhood. If it is constrained to bury the cable only along certain paths, then there would be a graph representing which points are connected by those paths. Some of those paths might be more expensive, because they are longer, or require the cable to be buried deeper; these paths would be represented by edges with larger weights. A minimum spanning tree would be the network with the lowest total cost. power outlet or light MST Applications Electrical wiring of a house using minimum amount of wires (cables) 46 MST Applications 47 MST Applications 48 Graph Representation 49 Minimum Spanning Tree for electrical eiring 50 Cycle Property 8 f Cycle Property: Let T be a minimum spanning tree of a weighted graph G. Let e be an edge of G that is not in T and let C be the cycle formed by e with T For every edge f of C, weight(f) weight(e) Proof: By contradiction If weight(f) > weight(e) we can get a spanning tree of smaller weight by replacing e with f 4 C 2 9 6 3 e 8 7 7 Replacing f with e yields a better spanning tree 8 f 4 C 2 9 6 3 8 Minimum Spanning Trees 7 e 7 Partition Property U f Partition Property: Consider a partition of the vertices of G into subsets U and V. Let e be an edge of minimum weight across the partition. There is a minimum spanning tree of G containing edge e Proof: Let T be an MST of G If T does not contain e, consider the cycle C formed by e with T and let f be an edge of C across the partition By the cycle property, weight(f) weight(e) Thus, weight(f) = weight(e) We obtain another MST by replacing f with e Minimum Spanning Trees V 7 4 9 5 2 8 8 3 e 7 Replacing f with e yields another MST U f 2 V 7 4 9 5 8 8 e 7 3 Borůvka’s Algorithm The first MST Algorithm was proposed by Otakar Borůvka in 1926 The algorithm was used to create efficient connections between the electricity network in the Czech Republic Prim’s Algorithm Initially discovered in 1930 by Vojtěch Jarník, then rediscovered in 1957 by Robert C. Prim Starts off by picking any node within the graph and growing from there Prim’s Algorithm Cont. Label the starting node, A, with a 0 and all others with infinite Starting from A, update all the connected nodes’ labels to A with their weighted edges if it less than the labeled value Find the next smallest label and update the corresponding connecting nodes Repeat until all the nodes have been visited Prim’s Algorithm Input: A connected Graph G=(V,E) and the cost matrix C of G. Output: A Minimum spanning tree T=(V,E) of G. Method: { Step 1: Let u be any arbitrary vertex of G. T=(V,E), where V = {u} and E = . Step 2: while ( V V) { Choose a least cost edge from V to V-V. Let e=xy be a least cost edge such that x V and y V-V. V= V {x}; E = E { e }; } } The execution of Prim's algorithm 8 the root vertex c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 57 8 c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 58 8 c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 59 8 c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 60 8 c b 4 7 8 d 2 11 a 7 h i 1 14 4 6 9 e 10 g 2 f Bottleneck spanning tree: A spanning tree of G whose largest edge weight is minimum over all spanning trees of G. The value of the bottleneck spanning tree is the weight of the maximum-weight edge in T. Theorem: A minimum spanning tree is also a bottleneck spanning tree. (Challenge problem) 61 Prim’s Algorithm Example Prim’s Algorithm Example Kruskal’s Algorithm (1956 by Joseph Kruskal) Input: A connected Graph G=(V,E) and the cost matrix C of G. Output: A Minimum spanning tree T=(V,E) of G. Method: Step 1: Sort the edges of G in the non-decreasing order of their costs. Let the sorted list edges be e1, e2,…., em. Step 2: T=(V, E), where E=. i=1; count=0; while ( count < n-1 and i < m) { if ( T=(V, E { ei }) is acyclic) { E = E { ei }; count= count+1;} i=i+1; } The execution of Kruskal's algorithm •The edges are considered by the algorithm in sorted order by weight. •The edge under consideration at each step is shown with a red weight number. 8 c b 4 7 8 d 2 11 a 7 h i 1 14 4 6 9 e 10 g 2 f 65 8 c b 4 i 7 8 d 2 11 a 7 h 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 66 8 c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 67 8 c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 68 8 c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 69 8 c b 4 7 8 d 2 11 a 7 h i 2 8 c 7 8 d 2 11 a f 7 b 4 h i 1 e 10 g 1 14 4 6 9 14 4 6 9 e 10 g 2 f 70 Growing a MST(Generic Algorithm) GENERIC_MST(G,w) 1 A:={} 2 while A does not form a spanning tree do 3 find an edge (u,v) that is safe for A 4 A:=A∪{(u,v)} 5 return A Set A is always a subset of some minimum spanning tree. This property is called the invariant Property. An edge (u,v) is a safe edge for A if adding the edge to A does not destroy the invariant. A safe edge is just the CORRECT edge to choose to add to T. 71 How to find a safe edge We need some definitions and a theorem. A cut (S,V-S) of an undirected graph G=(V,E) is a partition of V. An edge crosses the cut (S,V-S) if one of its endpoints is in S and the other is in V-S. An edge is a light edge crossing a cut if its weight is the minimum of any edge crossing the cut. 72 8 V-S↓ c b 4 S↑ 7 8 d 2 11 a 7 h i 1 14 4 6 9 e ↑S 10 g 2 f ↓ V-S • This figure shows a cut (S,V-S) of the graph. • The edge (d,c) is the unique light edge crossing the cut. 73 Theorem 1 Let G=(V,E) be a connected, undirected graph with a realvalued weight function w defined on E. Let A be a subset of E that is included in some minimum spanning tree for G.Let (S,V-S) be any cut of G such that for any edge (u, v) in A, {u, v} S or {u, v} (V-S). Let (u,v) be a light edge crossing (S,V-S). Then, edge (u,v) is safe for A. Proof: Let Topt be a minimum spanning tree.(Blue). A --a subset of Topt and (u,v)-- a light edge crossing (S, V-S). If (u,v) is NOT safe, then (u,v) is not in T. (See the red edge in Figure) There MUST be another edge (u’, v’) in Topt crossing (S, V-S).(Green) We can replace (u’,v’) in Topt with (u, v) and get another treeT’opt Since (u, v) is light (the shortest edge connect crossing (S, V-S), T’opt is also optimal. 1 1 1 1 2 1 1 1 74 Corollary .2 Let G=(V,E) be a connected, undirected graph with a realvalued weight function w defined on E. Let A be a subset of E that is included in some minimum spanning tree for G. Let C be a connected component (tree) in the forest GA=(V,A). Let (u,v) be a light edge (shortest among all edges connecting C with others components) connecting C to some other component in GA. Then, edge (u,v) is safe for A. (For Kruskal’s algorithm) 75 The algorithms of Kruskal and Prim The two algorithms are elaborations of the generic algorithm. They each use a specific rule to determine a safe edge in line 3 of GENERIC_MST. In Kruskal's algorithm, The set A is a forest. The safe edge added to A is always a least-weight edge in the graph that connects two distinct components. In Prim's algorithm, The set A forms a single tree. The safe edge added to A is always a least-weight edge connecting the tree to a vertex not in the tree. 76 Kruskal's algorithm(basic part) (Sort the edges in an increasing order) 2 A:={} 3 while E is not empty do { 3 take an edge (u, v) that is shortest in E and delete it from E 4 if u and v are in different components then add (u, v) to A } Note: each time a shortest edge in E is considered. 1 77 Kruskal's algorithm (Fun Part, not required) MST_KRUSKAL(G,w) 1 A:={} 2 for each vertex v in V[G] 3 do MAKE_SET(v) 4 sort the edges of E by nondecreasing weight w 5 for each edge (u,v) in E, in order by nondecreasing weight 6 do if FIND_SET(u) != FIND_SET(v) 7 then A:=A∪{(u,v)} 8 UNION(u,v) 9 return A 78 Disjoint-Set Keep a collection of sets S1, S2, .., Sk, Each Si is a set, e,g, S1={v1, v2, v8}. Three operations Make-Set(x)-creates a new set whose only member is x. Union(x, y) –unites the sets that contain x and y, say, Sx and Sy, into a new set that is the union of the two sets. Find-Set(x)-returns a pointer to the representative of the set containing x. Each operation takes O(log n) time. 79 Our implementation uses a disjoint-set data structure to maintain several disjoint sets of elements. Each set contains the vertices in a tree of the current forest. The operation FIND_SET(u) returns a representative element from the set that contains u. Thus, we can determine whether two vertices u and v belong to the same tree by testing whether FIND_SET(u) equals FIND_SET(v). The combining of trees is accomplished by the UNION procedure. Running time O(|E| lg (|E|)). 80 Prim's algorithm(basic part) MST_PRIM(G,w,r) 1. A={} S:={r} (r is an arbitrary node in V) 3. Q=V-{r}; 4. while Q is not empty do { 5 take an edge (u, v) such that (1) u S and v Q (v S ) and 2. (u, v) is the shortest edge satisfying (1) 6 add (u, v) to A, add v to S and delete v from Q } 81 Further Reading Spanning Trees and Optimization Problems: by Bang Ye Wu and Kun-Mao Chao, Chapman and Hall, 2004