UBI529

3. Distributed Graph Algorithms

Distributed Algorithms Models

• Interprocess communication method: accessing shared memory, point-to-point or broadcast messages, or remote procedure calls.
• Timing model: synchronous or asynchronous.
• Failure model: reliable or faulty behavior; Byzantine failures (a failed processor can behave arbitrarily).

We assume

• A distributed network, modeled as a graph: nodes are processors and edges are communication links.
• Nodes can communicate directly only with their neighbors, through the edges.
• Nodes have unique processor identities.
• Synchronous model: time is measured in rounds (time steps).
• One message (typically of size O(log n)) can be sent through an edge in a time step. A node can send messages through all its edges simultaneously in a round.
• No failure of nodes or edges. No malicious nodes.

2.1 Vertex and Tree Coloring

• Vertex Coloring
• Sequential Vertex Coloring Algorithms
• Distributed Synchronous Vertex Coloring Algorithm
• Distributed Tree Coloring Algorithms

Preliminaries

Vertex Coloring Problem: Given an undirected graph G = (V, E), assign a color c_u to each vertex u ∈ V such that if e = (v, w) ∈ E, then c_v ≠ c_w. The aim is to use the minimum number of colors.

Definition 2.1.1: Given an undirected graph G, the chromatic number χ(G) is the minimum number of colors needed to color it. A vertex k-coloring uses exactly k colors. If χ(G) = k, G is k-colorable but not (k−1)-colorable.

Computing χ(G) is NP-hard. Deciding whether a graph is 3-colorable is NP-complete.

Applications:
• Assignment of radio frequencies: colors represent frequencies and transmitters are the vertices. Two stations are neighbors if they interfere.
• University course scheduling: vertices are courses, and an edge joins two courses that share a student.
• Fast register allocation in compilers: vertices are variables, and two variables are neighbors if they can be active at the same time.
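As a small illustration of the vertex coloring condition c_v ≠ c_w for every edge (v, w), the following sketch checks whether a given assignment is a proper coloring. The function name and the interference graph are hypothetical and chosen only for this example.

```python
def is_proper_coloring(edges, color):
    """Return True if no edge joins two vertices of the same color."""
    return all(color[v] != color[w] for v, w in edges)


# Hypothetical interference graph: transmitters a..d, an edge joins two
# transmitters that interfere (a, b, c form a triangle).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]

good = {"a": 1, "b": 2, "c": 3, "d": 1}   # three frequencies, no conflict
bad = {"a": 1, "b": 2, "c": 1, "d": 2}    # a and c interfere but share color 1

print(is_proper_coloring(edges, good))
print(is_proper_coloring(edges, bad))
```

The triangle a, b, c forces at least three colors here, so the assignment `good` is also optimal for this particular graph.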
Sequential Algorithm for Vertex Coloring

Algorithm 2.1.1: Sequential Vertex Coloring
Input: G with vertices v1, v2, ..., vn
Output: Vertex coloring f : V_G → {1, 2, 3, ...}
1. For i = 1 to n do
2.    f(vi) := smallest color number that does not conflict with any already colored neighbor of vi
3. Return vertex coloring f

Vertex Coloring Algorithms

Definition 2.1.2: The number of neighbors of a vertex v is called the degree of v, δ(v). The maximum degree over all vertices of G is called the degree of the graph, Δ(G) = Δ.

Theorem 2.1.1: The algorithm is correct and terminates in O(n) steps. It uses at most Δ + 1 colors.
Proof: Correctness and termination are straightforward. Since each node has at most Δ neighbors, there is always at least one free color in the range {1, …, Δ+1}.

Remarks:
• For many graphs, coloring can be done with far fewer than Δ + 1 colors.
• This algorithm is not distributed; only one processor is active at a time. However, the idea of the sequential algorithm can be used to define a “local” coloring subroutine.

Heuristic Vertex Coloring Algorithm: Largest Degree First

Idea (two observations): A vertex of large degree is more difficult to color than a vertex of smaller degree. Also, a vertex with more colored neighbors will be more difficult to color later.

Algorithm 2.1.2: Largest Degree First Algorithm
Input: G with vertices v1, v2, ..., vn
Output: Vertex coloring f : V_G → {1, 2, 3, ...}
1. While there are uncolored vertices of G
2.    Among the uncolored vertices of maximum degree, choose the vertex v with the maximum colored degree
3.    Assign the smallest possible color k to v: f(v) := k
4. Return vertex coloring f

Colored degree: the number of different colors used to color the neighbors of v.

[Figure: example graph on v1, ..., v8; the coloring order in the diagram is v3, v1, v2, v4, v8, v6, v7, v5.]

Coloring Trees: A Distributed Algorithm

Lemma 2.1.1: χ(Tree) ≤ 2.
Proof: If the distance of a node to the root is odd, color it 1; if it is even, color it 0. An odd node has only even neighbors and vice versa.
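The sequential coloring of Algorithm 2.1.1 can be sketched in a few lines. This is a minimal sketch, assuming the graph is given as an adjacency dictionary; the function name `greedy_coloring`, the optional `order` parameter, and the example graph are illustrative and not part of the slides.

```python
def greedy_coloring(adj, order=None):
    """Color vertices one by one with the smallest color (starting at 1)
    not used by an already colored neighbor."""
    order = list(adj) if order is None else order
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color


# Hypothetical example: a 5-cycle 1-2-3-4-5 with the chord 1-3.
adj = {
    1: [2, 5, 3],
    2: [1, 3],
    3: [2, 4, 1],
    4: [3, 5],
    5: [4, 1],
}
coloring = greedy_coloring(adj)
max_degree = max(len(nbrs) for nbrs in adj.values())
print(coloring, "colors used:", max(coloring.values()), "bound:", max_degree + 1)
```

Passing a different `order` (for example, vertices sorted by decreasing degree) gives a simplified variant of the largest-degree-first idea; the Δ + 1 bound of Theorem 2.1.1 holds for any order.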
If each node knows its parent (the root has no parent) and its children in the tree, the constructive proof of Lemma 2.1.1 gives a very simple algorithm.

Algorithm 2.1.3 [Slow tree coloring]:
1. The root sends color 0 to its children. (The root is colored 0.)
2. When receiving a message x from its parent, a node u picks color c_u = 1 − x and sends c_u to its children.

Distributed Tree Coloring

Remarks:
• By the proof of Lemma 2.1.1, Algorithm 2.1.3 is correct.
• The time complexity of the algorithm is the height of the tree.
• If the root is chosen arbitrarily, the height can be as large as the diameter of the tree.

2.2 Distributed Tree based Communication Algorithms

• Broadcast
• Convergecast
• BFS Tree Construction

Broadcast

Broadcasting means sending a message from a source node to all other nodes of the network. Two basic broadcasting approaches are flooding and spanning tree-based broadcast.

Flooding: A source node s wants to send a message to all nodes in the network. s simply forwards the message over all its edges. Any vertex v ≠ s, upon receiving the message for the first time (over an edge e), forwards it on every other edge. Upon receiving the message again, it does nothing.

Definition 2.2.1 [Broadcast]: A broadcast operation is initiated by a single processor, the source. The source wants to send a message to all other nodes in the system.

Definition 2.2.2 [Distance, Radius, Diameter]: The distance between two nodes u, v in an undirected graph is the number of hops of a minimum path between u and v.
• The radius of a node u in a graph is the maximum distance between u and any other node. The radius of a graph is the minimum radius of any node in the graph.
• The diameter of a graph is the maximum distance between two arbitrary nodes.

Theorem 2.2.1 [Lower Bound]: The message complexity of a broadcast is at least n − 1. The radius of the source, and hence also the radius of the graph, is a lower bound for the time complexity.
Proof: Every node must receive the message.
Remarks:
• You can use a pre-computed spanning tree to do the broadcast with tight message complexity.
• If the spanning tree is a breadth-first spanning tree (for a given source), then the time complexity is tight as well.

Definition 2.2.3: A graph (system/network) is clean if the nodes do not know the topology of the graph.

Theorem 2.2.2 [Clean Lower Bound]: For a clean network, the number of edges is a lower bound for the broadcast message complexity.
Proof: If you do not try every edge, you might miss a whole part of the graph behind it.

Flooding

Algorithm 2.2.1 [Flooding]: The source sends the message to all neighbors. Each node receiving the message for the first time forwards it to all (other) neighbors.

Remarks:
• If node v receives the message first from node u, then node v calls node u its “parent”. This parent relation defines a spanning tree T. If the flooding algorithm is executed in a synchronous system, then T is a breadth-first spanning tree (with respect to the root).
• More interestingly, the flooding algorithm also terminates after r time units in asynchronous systems, where r is the radius of the source. (But note that the constructed spanning tree need not be breadth-first.)

Flooding Analysis

Theorem: The message complexity of flooding is Θ(|E|) and the time complexity is Θ(D), where D is the diameter of G.
Proof: The message complexity follows from the fact that each edge delivers the message at least once and at most twice (once in each direction).
To show the time complexity, we use induction on t to show that after t time units, the message has already reached every vertex at a distance of t or less from the source.

Broadcast Over a Rooted Spanning Tree

Suppose the processors already have information about a rooted spanning tree of the communication topology:
• tree: connected graph with no cycles
• spanning tree: contains all processors
• rooted: there is a unique root node

The tree is implemented via parent and children local variables at each processor, which indicate the incident channels that lead to the parent and the children in the rooted spanning tree.

Broadcast Over a Rooted Spanning Tree: A Simple Algorithm

1. The root initially sends msg to its children.
2. When a node receives msg from its parent, it
   • sends msg to its children
   • terminates (sets a local boolean to true)

Synchronous model:
• time is the depth of the spanning tree, which is at most n − 1
• the number of messages is n − 1, since one message is sent over each spanning tree edge

Asynchronous model: same time and messages.

Tree Broadcast

Assume that a spanning tree has been constructed.

Theorem: For every n-vertex graph G with a spanning tree T rooted at r0, the message complexity of broadcast is n − 1 and the time complexity is depth(T).

A broadcast algorithm can be used to construct a spanning tree in G. The message complexity of broadcast is asymptotically equivalent to the message complexity of spanning tree construction. Using a breadth-first spanning tree, we get the optimal message and time complexities for broadcast.
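The flooding and tree broadcast bounds above can be checked on a small instance. The following is a sequential simulation sketch of synchronous flooding, not a distributed implementation; the names `flood_sync` and `tree_broadcast_cost` and the example graph are assumptions made for illustration. Rounds are counted until the last node is informed.

```python
def flood_sync(adj, source):
    """Simulate synchronous flooding from source.
    Returns (parent pointers, rounds until all nodes are informed, messages sent)."""
    parent = {source: None}
    frontier = [source]
    rounds = 0
    messages = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v == parent[u]:
                    continue              # never send back to the parent
                messages += 1
                if v not in parent:       # first receipt: u becomes v's parent
                    parent[v] = u
                    next_frontier.append(v)
        if next_frontier:
            rounds += 1
        frontier = next_frontier
    return parent, rounds, messages


def tree_broadcast_cost(parent):
    """Messages and time of a broadcast over the rooted tree given by parent pointers."""
    def depth(v):
        return 0 if parent[v] is None else 1 + depth(parent[v])
    messages = sum(1 for p in parent.values() if p is not None)  # one per tree edge
    return messages, max(depth(v) for v in parent)


# Hypothetical graph: 4-cycle 0-1-3-2-0 with a pendant node 4 attached to 3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
parent, rounds, messages = flood_sync(adj, 0)
print(rounds, messages)             # flooding: |E| = 5, so between 5 and 10 messages
print(tree_broadcast_cost(parent))  # broadcast over the resulting BFS tree
```

On this graph flooding needs 6 messages, which lies between |E| = 5 and 2|E| = 10, while the broadcast over the resulting breadth-first tree needs only n − 1 = 4 messages and the same 3 rounds.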
Convergecast

Again, suppose a rooted spanning tree has already been computed by the processors:
• parent and children variables at each processor

Do the opposite of broadcast:
• leaves send messages to their parents
• non-leaves wait to get a message from each child, then send the combined info to their parent

[Figure: convergecast on a rooted spanning tree with nodes a to h. Solid arrows show parent-child relationships, dotted lines show non-tree edges; edge labels list the combined node sets forwarded towards the root.]

Finding a Spanning Tree Given a Root

• A distinguished processor is known, to serve as the root.
• The root sends M to all its neighbors.
• When a non-root first gets M, it
   - sets the sender as its parent
   - sends a "parent" msg to the sender
   - sends M to all other neighbors
• When a node gets M otherwise, it sends a "reject" msg to the sender.
• Nodes use the "parent" and "reject" msgs to set their children variables and to know when to terminate.

Execution of Spanning Tree Alg.

[Figure: two executions of the algorithm on a graph with nodes a to h, one synchronous and one asynchronous.]

• Both models: O(m) messages, O(diam) time
• Synchronous: always gives a breadth-first search (BFS) tree
• Asynchronous: not necessarily a BFS tree

2.3 Distributed Minimum Spanning Tree Algorithms

Minimum Spanning Tree

Minimum spanning tree. Given a connected graph G = (V, E) with real-valued edge weights c_e, an MST is a subset of the edges T ⊆ E such that T is a spanning tree whose sum of edge weights is minimized.

[Figure: a weighted graph G = (V, E) and its MST T, with Σ_{e ∈ T} c_e = 50.]

Cayley's Theorem. There are n^(n−2) spanning trees of K_n, so the problem cannot be solved by brute force.

Applications

MST is a fundamental problem with diverse applications.
• Network design: telephone, electrical, hydraulic, TV cable, computer, road
• Approximation algorithms for NP-hard problems: traveling salesperson problem, Steiner tree
• Indirect applications:
   - max bottleneck paths
   - LDPC codes for error correction
   - image registration with Renyi entropy
   - learning salient features for real-time face verification
   - reducing data storage in sequencing amino acids in a protein
   - model locality of particle interactions in turbulent fluid flows
   - autoconfig protocol for Ethernet bridging to avoid cycles in a network
• Cluster analysis

Greedy Algorithms

Kruskal's algorithm. Start with T = ∅. Consider edges in ascending order of cost. Insert edge e in T unless doing so would create a cycle.

Reverse-Delete algorithm. Start with T = E. Consider edges in descending order of cost. Delete edge e from T unless doing so would disconnect T.

Prim's algorithm. Start with some root node s and greedily grow a tree T from s outward. At each step, add the cheapest edge e to T that has exactly one endpoint in T.

Remark. All three algorithms produce an MST.

Simplifying assumption. All edge costs c_e are distinct.

Cut property. Let S be any subset of nodes, and let e be the min cost edge with exactly one endpoint in S. Then the MST T* contains e.

Cycle property. Let C be any cycle in G, and let f be the max cost edge belonging to C. Then the MST T* does not contain f.

[Figure: a cut S with its cheapest crossing edge e (e is in the MST) and a cycle C with its most expensive edge f (f is not in the MST).]

Cycles and Cuts

Cycle. A set of edges of the form a-b, b-c, c-d, …, y-z, z-a.

[Figure: graph on nodes 1 to 8 with cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1.]

Cutset. A cut is a subset of nodes S. The corresponding cutset D is the subset of edges with exactly one endpoint in S.

[Figure: the same graph with cut S = { 4, 5, 8 } and cutset D = 5-6, 5-7, 3-4, 3-5, 7-8.]

Cycle-Cut Intersection

Claim. A cycle and a cutset intersect in an even number of edges.

Example: Cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1 and cutset D = 3-4, 3-5, 5-6, 5-7, 7-8 have intersection 3-4, 5-6.

Pf. (by picture) [Figure: a cycle C that leaves S for V − S must cross back into S, so it crosses the cut an even number of times.]

Proof of the cut property

Pf. (exchange argument)
• Suppose e does not belong to T*, and let's see what happens.
• Adding e to T* creates a cycle C in T*.
• Edge e is both in the cycle C and in the cutset D corresponding to S, so there exists another edge, say f, that is in both C and D.
• T' = T* ∪ { e } − { f } is also a spanning tree.
• Since c_e < c_f, cost(T') < cost(T*).
• This is a contradiction. ▪

Proof of the cycle property

Pf. (exchange argument)
• Suppose f belongs to T*, and let's see what happens.
• Deleting f from T* creates a cut S in T*.
• Edge f is both in the cycle C and in the cutset D corresponding to S, so there exists another edge, say e, that is in both C and D.
• T' = T* ∪ { e } − { f } is also a spanning tree.
• Since c_e < c_f, cost(T') < cost(T*).
• This is a contradiction. ▪

Prim's Algorithm: Proof of Correctness

Prim's algorithm. [Jarník 1930, Dijkstra 1957, Prim 1959]
• Initialize S = any node.
• Apply the cut property to S.
• Add the min cost edge in the cutset corresponding to S to T, and add one new explored node u to S.

Implementation: Prim's Algorithm

Implementation. Use a priority queue à la Dijkstra.
• Maintain the set of explored nodes S.
• For each unexplored node v, maintain the attachment cost a[v] = cost of the cheapest edge from v to a node in S.
• O(n²) with an array; O(m log n) with a binary heap.
Prim(G, c) {
   foreach (v ∈ V) a[v] ← ∞
   Initialize an empty priority queue Q
   foreach (v ∈ V) insert v onto Q
   Initialize set of explored nodes S ← ∅

   while (Q is not empty) {
      u ← delete min element from Q
      S ← S ∪ { u }
      foreach (edge e = (u, v) incident to u)
         if ((v ∉ S) and (c_e < a[v]))
            decrease priority a[v] to c_e
   }
}

Kruskal's Algorithm: Proof of Correctness

Kruskal's algorithm. [Kruskal, 1956]
• Consider edges in ascending order of weight.
• Case 1: If adding e to T creates a cycle, discard e according to the cycle property.
• Case 2: Otherwise, insert e = (u, v) into T according to the cut property, where S = set of nodes in u's connected component.

[Figure: Case 1, edge e closes a cycle; Case 2, edge e = (u, v) leaves the component S of u.]

Implementation: Kruskal's Algorithm

Implementation. Use the union-find data structure.
• Build the set T of edges in the MST.
• Maintain a set for each connected component.
• O(m log n) for sorting and O(m α(m, n)) for union-find. Since m ≤ n², log m is O(log n); α(m, n) is essentially a constant.

Kruskal(G, c) {
   Sort edge weights so that c1 ≤ c2 ≤ ... ≤ cm.
   T ← ∅

   foreach (u ∈ V) make a set containing singleton u

   for i = 1 to m
      (u, v) = ei
      if (u and v are in different sets) {     (are u and v in different connected components?)
         T ← T ∪ {ei}
         merge the sets containing u and v     (merge two components)
      }
   return T
}

Distributed Spanning Tree Construction

For a graph G = (V, E), a spanning tree is a maximal acyclic subgraph T = (V, E′), E′ ⊆ E: it connects all vertices, and if one more edge is added, the subgraph is no longer a tree. Spanning trees are used for broadcasting in a network.

Chang-Roberts algorithm {the root is known}
• Uses signals and acks, similar to the termination detection algorithm.
• Uses the same rule for sending acknowledgments.

[Figure: example graph on nodes 0 to 5 with node 0 as the root.]

Question: What if the root is not designated?

Chang-Roberts Spanning Tree Alg

program probe-echo
define    N : integer (no. of neighbors)
          C, D : integer
initially parent := i; C = 0; D = 0

{for the initiator}
send probes to each neighbor; D := no. of neighbors;
do D ≠ 0 ∧ echo → D := D − 1 od
{D = 0 signals end}

{for a non-initiator process i > 0}
do probe ∧ parent = i ∧ C = 0 → C := 1; parent := sender;
                                 if i is not a leaf → send probes to non-parent neighbors;
                                                      D := no. of non-parent neighbors
                                 fi
[] echo                       → D := D − 1
[] probe ∧ sender ≠ parent    → send echo to sender
[] C = 1 ∧ D = 0              → send echo to parent; C := 0
od

Graph Traversal

Consider web crawlers, exploration of social networks, graph layouts for visualization or drawing, etc. There are many applications of exploring an unknown graph by a visitor (a token, a mobile agent, or a robot). The goal of traversal is to visit every node at least once and return to the starting point.
- How efficiently can this be done?
- What is the guarantee that all nodes will be visited?
- What is the guarantee that the algorithm will terminate?

Graph Traversal and Spanning Tree Formation

Tarry's algorithm is one of the oldest (1895).

Rule 1. Send the token towards each neighbor exactly once.
Rule 2. If Rule 1 is not applicable, then send the token to the parent.

[Figure: example graph on nodes 0 to 6 with node 0 as the root.]

A possible route is: 0 1 2 5 3 1 4 6 2 6 4 1 3 5 2 1 0

The nodes and their parent pointers generate a spanning tree, which is not necessarily a DFS tree.

Distributed MST

Def MST Fragment: In a weighted graph G = (V, E, w), a tree T in G is called an MST fragment of G if there exists an MST of G such that T is a subgraph of that MST.

Def MWOE: An edge e is an outgoing edge of an MST fragment T iff exactly one of its endpoints belongs to T. The minimum weight outgoing edge is denoted MWOE(T).

Lemma: Consider an MST fragment T of a graph G = (V, E, w). Let e = MWOE(T). Then T ∪ {e} is an MST fragment as well.

Proof: Let T_M be an MST containing T. If T_M contains e, we are done. Otherwise, adding e to T_M creates a graph C with a cycle through e. Since e has exactly one endpoint in T, this cycle contains another edge e′ that connects T to the rest of T_M. Clearly, e′ is an outgoing edge of T, so w(e′) ≥ w(e). Discarding e′ from C yields a new spanning tree T′_M with w(T′_M) ≤ w(T_M). Hence T′_M is an MST, and it contains T ∪ {e}.
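The lemma justifies growing every fragment along its MWOE. The sketch below applies it sequentially in phases, in the style of Borůvka's algorithm; it is not the distributed GHS protocol, and the function name `mst_by_mwoe`, the edge format, and the example graph are assumptions for illustration. It also assumes a connected graph with distinct edge weights.

```python
def mst_by_mwoe(n, edges):
    """Build an MST by repeatedly adding the MWOE of every fragment.
    edges: list of (weight, u, v) with distinct weights, vertices 0..n-1."""
    frag = list(range(n))            # frag[v] = id of the fragment containing v
    tree = set()
    while len(set(frag)) > 1:
        mwoe = {}                    # fragment id -> its minimum weight outgoing edge
        for w, u, v in edges:
            if frag[u] != frag[v]:   # the edge is outgoing for both fragments
                for f in (frag[u], frag[v]):
                    if f not in mwoe or w < mwoe[f][0]:
                        mwoe[f] = (w, u, v)
        if not mwoe:                 # graph is disconnected: stop
            break
        for w, u, v in mwoe.values():
            if frag[u] != frag[v]:   # skip an edge chosen by both of its fragments
                tree.add((w, u, v))
                old, new = frag[v], frag[u]
                frag = [new if f == old else f for f in frag]
    return tree


# Hypothetical weighted graph on 4 nodes.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
tree = mst_by_mwoe(4, edges)
print(sorted(tree), sum(w for w, _, _ in tree))
```

With distinct weights, the MWOEs chosen in one phase never close a cycle, which is why adding all of them at once is safe.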
Minimum Spanning Tree

Given a weighted graph G = (V, E), generate a spanning tree T = (V, E′) such that the sum of the weights of all its edges is minimum.

Applications
• Approximate solutions to the traveling salesman problem on the Euclidean plane. (The traveling salesman problem asks for the shortest route to visit a collection of cities and return to the starting point.)
• Leasing phone lines to connect different offices at minimum cost.
• Visualizing multidimensional data (how entities are related to each other).

We are interested in distributed algorithms only.

[Figure: example.]

Sequential algorithms for MST

Review (1) Prim's algorithm and (2) Kruskal's algorithm.

Theorem. If the weight of every edge is distinct, then the MST is unique.

[Figure: weighted graph on nodes 0 to 6 with two fragments T1 and T2 connected by edge e.]

Gallager-Humblet-Spira (GHS) Algorithm

GHS is a distributed version of Prim's algorithm. (Its merging of fragments along least-cost edges is also close to Borůvka's algorithm.)

Bottom-up approach. The MST is recursively constructed from fragments joined by an edge of least cost.

[Figure: two fragments joined by their least-cost connecting edge.]

Challenges

Challenge 1. How will the nodes in a given fragment identify the edge to be used to connect with a different fragment?
A root node in each fragment is the coordinator.

Challenge 2. How will a node in T1 determine if a given edge connects to a node of a different tree T2 or of the same tree T1? Why will node 0 choose the edge e with weight 8, and not the edge with weight 4?
Nodes in a fragment acquire the same name before augmentation.

Two main steps

Each fragment has a level. Initially each node is a fragment at level 0.
• (MERGE) Two fragments at the same level L combine to form a fragment of level L+1.
• (ABSORB) A fragment at level L is absorbed by another fragment at level L′ (L < L′).

Least weight outgoing edge

To test if an edge is outgoing, each node sends a test message through a candidate edge. The receiving node may send accept or reject.
The root broadcasts initiate in its own fragment, collects the reports from the other nodes about eligible edges using a convergecast, and determines the least weight outgoing edge (lwoe).

[Figure: node 0 of T1 sends test over edge e (weight 8) and gets accept from T2; a test over an edge inside T1 gets reject.]

Accept or reject?

Let i send test to j.
Case 1. If name(i) = name(j), then send reject.
Case 2. If name(i) ≠ name(j) ∧ level(i) ≤ level(j), then send accept.
Case 3. If name(i) ≠ name(j) ∧ level(i) > level(j), then wait until level(j) = level(i).

Levels can only increase.

Question: Can fragments wait forever and lead to a deadlock?

Delayed response

[Figure: fragment A (level 5) and fragment B (level 3) with test, join, and initiate messages.]

B is about to change its level to 5, so B does not send an accept response to A in response to test.

The major steps

Repeat
   Test edges as outgoing or not
   Determine the lwoe; it becomes a tree edge
   Send join (or respond to join)
   Update level and name, and identify the new coordinator
until done

Classification of edges

• Basic (initially all edges are basic)
• Branch (all tree edges)
• Rejected (not a tree edge)

Branch and rejected are stable attributes.

Wrapping it up: Merge

The edge through which the join message is sent changes its status to branch and becomes a tree edge. Each root broadcasts an (initiate, L+1, name) message to the nodes in its own fragment.

[Figure: Example of merge. T has level L and T′ has level L′. (a) L = L′: T sends (join, L, T) and T′ sends (join, L′, T′). (b) L > L′: T′ sends (join, L′, T′).]

Wrapping it up: Absorb

T′ receives an initiate message. This indicates that the lower-level fragment has been absorbed by the higher-level fragment. They collectively search for the lwoe. The edge through which the join message was sent changes its status to branch.
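The accept/reject cases and the level rule for merge and absorb can be written down as two small functions. This is only a sketch of the rules as stated on the slides: the function names are hypothetical, message passing is not modeled, and `combine` looks only at the two levels.

```python
def respond_to_test(name_i, level_i, name_j, level_j):
    """Reply of node j (fragment name_j, level_j) to a test sent by node i."""
    if name_i == name_j:
        return "reject"      # Case 1: same fragment, the edge is not outgoing
    if level_i <= level_j:
        return "accept"      # Case 2: different fragment, j's level is high enough
    return "wait"            # Case 3: j delays its reply until its level catches up


def combine(level_a, level_b):
    """Kind of combination and resulting level for two joining fragments."""
    if level_a == level_b:
        return "merge", level_a + 1            # equal levels: new fragment at L+1
    return "absorb", max(level_a, level_b)     # lower level joins the higher one


print(respond_to_test("T1", 1, "T1", 1))
print(respond_to_test("T1", 0, "T2", 1))
print(respond_to_test("T1", 5, "T2", 3))   # the delayed-response situation
print(combine(2, 2), combine(1, 3))
```

The third call mirrors the delayed-response slide: a test from a level 5 fragment to a level 3 fragment is neither accepted nor rejected until the receiver's level has increased.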
[Figure: Example of absorb, case (b) L > L′. T′ (level L′) sends (join, L′, T′) to T (level L), and T answers with initiate.]

Example

[Figures: four snapshots of a GHS execution on the example weighted graph, showing successive merge and absorb steps.]

Message complexity

Each rejected edge carries two messages (test + reject), so rejected edges contribute at most 2|E| messages.

At each of the log N levels, a node can receive at most
(1) one initiate message,
(2) one accept message,
(3) one join message,
(4) one test message not leading to a rejection, and
(5) one changeroot message.

So the total number of messages has an upper bound of 2|E| + 5N log N.

MST Algorithms: Theory

Deterministic comparison based algorithms.
• O(m log n) [Jarník, Prim, Dijkstra, Kruskal, Borůvka]
• O(m log log n) [Cheriton-Tarjan 1976, Yao 1975]
• O(m β(m, n)) [Fredman-Tarjan 1987]
• O(m log β(m, n)) [Gabow-Galil-Spencer-Tarjan 1986]
• O(m α(m, n)) [Chazelle 2000]

Holy grail. O(m).

Notable.
• O(m) randomized [Karger-Klein-Tarjan 1995]
• O(m) verification [Dixon-Rauch-Tarjan 1992]

Euclidean.
• 2-d: O(n log n), by computing the MST of the edges in the Delaunay triangulation
• k-d: O(k n²), by dense Prim

Distributed MST Algorithms

• Gallager, Humblet, & Spira ’83: O(n log n) running time; message complexity O(|E| + n log n) (optimal)
• Chin & Ting ’85: O(n log log n) time
• Gafni ’85: O(n log* n)
• Awerbuch ’87: O(n), existentially optimal
• Garay, Kutten, & Peleg ’98: O(D + n^0.61), where D is the diameter
• Kutten & Peleg ’98: O(D + √n log* n)
• Elkin ’04: Õ(μ + √n), where μ is called the MST radius. Cannot detect termination unless μ is given as input.
• Peleg & Rubinovich ’99 showed a lower bound of Ω̃(√n) for the running time.
Distributed Graph Algs: Other areas of interest

• Distributed Cycle/Knot Detection
• Distributed Center Finding
• Distributed Connected Dominating Set Construction in MANETs, WSNs
• Distributed Clustering based on Graph Partitioning

References

• Introduction to Graph Theory, Douglas West, Prentice Hall, 2000 (basics)
• Graph Theory and Its Applications, Gross and Yellen, CRC Press, 1998 (basics)
• Distributed Algorithms Course Notes, J. Welch, TAMU (flooding and tree algorithms)
• CS590A Fall 2007, G. Pandurangan, Purdue University
• Distributed Computing Principles Course Notes, Roger Wattenhofer, ETH (coloring algorithms)
• Algorithm Design, Kleinberg and Tardos, Addison-Wesley, 2005 (MST dependent)
• 22C:166 Distributed Systems and Algorithms Course, Sukumar Ghosh, University of Iowa (routing part heavily dependent)