Tirgul 11
BFS, DFS: review, properties, use

Breadth-First-Search (BFS)
The BFS algorithm performs a breadth-first search over the graph.
The search starts at a source node s.
It first finds all the nodes adjacent to s, and then continues in the same way from each of those neighbors.
The output is a breadth-first tree.
The root of the breadth-first tree is s, and the tree contains all the nodes that are reachable from s.
Level h=0 contains only s.
Level h=1 contains all the nodes adjacent to s.
Level h=2 contains the not-yet-discovered nodes adjacent to the nodes of level 1.
And so on.
The tree path from s to a given node v is a shortest path (fewest edges) from s to v in the graph G.
For example, running BFS from s over a graph on the nodes r, s, t, u, v, w, x, y gives a breadth-first tree rooted at s.
[Figure: the example graph and its breadth-first tree rooted at s.]

The BFS Algorithm
The algorithm colors the graph nodes during the search.
Initially all nodes are white except the source s, which is gray.
A node becomes gray when it is discovered.
It becomes black once all its adjacent nodes have been discovered.
For each node u, d[u] is the length of the shortest path from node s to node u.

The BFS Run Time
Each node is enqueued at most once and dequeued at most once.
Enqueue and Dequeue take O(1) each, so maintaining the queue takes O(V) in total.
The adjacency list of each node is scanned only once, when the node is dequeued, so going through all the neighbors of all the nodes takes O(E).
Altogether the run time is O(V+E).

Questions
What is the run time of the BFS algorithm when the input graph is represented as an adjacency matrix?
How can it be determined whether an undirected graph is a complete bipartite graph?
What is the algorithm for finding the shortest path between two given nodes in a graph? What is its run time?

Depth-First-Search (DFS)
The DFS algorithm performs a depth-first search over the graph.
The search may be started several times, each time at a different node that has not been discovered yet.
The DFS algorithm finishes exploring one path before it backtracks to discover a new path.
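Before turning to the details of DFS, the BFS procedure from the earlier slides can be sketched in Python. This is only a minimal sketch under stated assumptions: the graph is a dict of adjacency lists in which every node appears as a key, None stands in for an infinite distance, and the names bfs, path_to and the sample graph are hypothetical, not part of the tirgul.

```python
from collections import deque


def bfs(adj, s):
    """Breadth-first search from s.

    adj is assumed to map every node to a list of its neighbors.
    Returns (d, p): d[u] is the number of edges on a shortest path
    from s to u (None if u is unreachable), p[u] is u's parent in
    the breadth-first tree (None for s and for unreachable nodes).
    """
    color = {u: "white" for u in adj}
    d = {u: None for u in adj}
    p = {u: None for u in adj}
    color[s] = "gray"
    d[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()          # each node is dequeued at most once
        for v in adj[u]:             # scan u's adjacency list once
            if color[v] == "white":  # v is discovered now
                color[v] = "gray"
                d[v] = d[u] + 1
                p[v] = u
                queue.append(v)
        color[u] = "black"           # all neighbors of u were discovered
    return d, p


def path_to(p, s, v):
    """Follow parent pointers from v back to s; None if v is unreachable."""
    path = [v]
    while path[-1] != s:
        parent = p[path[-1]]
        if parent is None:
            return None
        path.append(parent)
    return path[::-1]


# Hypothetical sample graph (undirected, so each edge is listed twice).
graph = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": [],
}
d, p = bfs(graph, "s")
print(d["d"])               # 3
print(path_to(p, "s", "d"))  # ['s', 'a', 'c', 'd']
```

Reading the tree path back through the parent pointers is one way to approach the shortest-path question above, for the case where path length is counted in edges.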
The DFS Algorithm
The algorithm colors the graph nodes in a similar manner to BFS.
All nodes are initially white.
A node becomes gray when it is discovered.
It becomes black when the algorithm backtracks from it, that is, after all its neighbors have been explored.
The algorithm records two timestamps for each node u: d[u] and f[u].
d[u] is the “time” at which the node was colored gray.
f[u] is the time at which the node was colored black.

DFS(G)
    for each node u
        color[u] = white; p[u] = NULL
    time = 1
    for each node u
        if color[u] = white
            DFS-Visit(u)

DFS-Visit(u)
    color[u] = gray; d[u] = time; time++
    for each neighbor v of u
        if color[v] = white
            p[v] = u
            DFS-Visit(v)
    color[u] = black; f[u] = time; time++

Example of DFS
[Figure: a six-node graph whose nodes are labeled with their d[u]/f[u] timestamps: 1/8, 2/7, 9/12, 4/5, 3/6, 10/11.]

The DFS Run Time
DFS-Visit() is called exactly once for each node in V: it is called only on white nodes, and it immediately colors the node gray => O(V).
For each node, DFS-Visit eventually goes over all its neighbors, so each edge is examined at most twice, once from each of its endpoints => O(E).
The total run time is O(V+E).

Two Properties of DFS
The parenthesis theorem: for any two nodes u, v, if d[u] < d[v] < f[u], then f[v] < f[u] and v is a descendant of u.
Proof sketch: Since d[u] < d[v] < f[u], u was gray when v was discovered, and thus v is a descendant of u. Moreover, because of the recursion, u is not finished until v is finished, and thus f[u] > f[v].
It is called the parenthesis theorem because we can print “(u” when u is discovered and “u)” when u is finished; the theorem tells us that the resulting expression has a proper parenthesis structure.
The white-path theorem: In a DFS tree, node v is a descendant of node u if and only if at time d[u] there is a white path from u to v.
Proof sketch: If v is a descendant of u, there is a path from u to v in the DFS tree.
Every node w on that path, other than u itself, is also a descendant of u, so d[w] > d[u] and w is white at time d[u].
In the other direction, suppose there is a white path from u to v at time d[u], but v is not a descendant of u. Let v’ be the first node on that path that is not a descendant of u, and let w be the predecessor of v’ on the path (so w is u or a descendant of u, and thus f[w] <= f[u]). v’ is discovered after u but before w is finished (since v’ is a neighbor of w), so d[u] < d[v’] < f[w] <= f[u]. It follows from the parenthesis theorem that v’ is a descendant of u, a contradiction.

Finding the Strongly-Connected Components of a Directed Graph
A strongly connected component (SCC) of a directed graph G = (V,E) is a maximal set of vertices U such that for every pair of vertices u and v in U, v is reachable from u and u is reachable from v.

Strongly-Connected-Components(G)
[1] call DFS(G) to compute f[u] for all vertices u
[2] compute G^T (G with all its edges reversed)
[3] call DFS(G^T), but in the main loop consider the vertices in order of decreasing f[u]
[4] the depth-first trees found in [3] are the strongly connected components of G

Why Does the SCC Algorithm Work?
What is the relation between DFS and SCCs? Claim 1 and Lemma 1 connect the two.
Claim 1: If nodes u, v are in the same SCC, then any node w on a path from u to v is also in that SCC.
Lemma 1: In any DFS, all the vertices of the same SCC are placed in the same depth-first tree.
Proof sketch of Lemma 1: Suppose r is the first node of an SCC to be discovered during the DFS. Consider some other node v in this SCC. All the nodes on a path from r to v are in the same SCC (by Claim 1), and thus they are all white at time d[r] (r is the first to be discovered). By the white-path theorem, v is a descendant of r and so is placed in the same depth-first tree as r.
Definition: node w is reachable from node u if u=w or if there is a path from u to w.
Definition: the forefather φ(u) of node u is a node w that is reachable from u and has maximal f[w].
Note that f[u] <= f[φ(u)], since u is reachable from u.

Forefathers and DFS Trees
Lemma 2: The forefather of u is an ancestor of u.
Proof sketch: Suppose the forefather of u is w. If w = u there is nothing to prove, so assume w is a different node; we need to show that color[w]=gray when u is discovered.
If color[w]=black, then w finished before u was discovered, so f[u] > f[w], a contradiction.
If color[w]=white, let v be the last node with non-white color on the path from u to w.
If color[v]=black, then v has no white neighbors, which is a contradiction since the next node after v on the path to w is white.
If color[v]=gray, then since there is a white path from v to w, w is a descendant of v and thus f[v] > f[w]. Since there is a path from u to v, it follows that v should be the forefather of u instead of w, a contradiction.

Forefathers and SCCs
Lemma 3: Two nodes u, v are in the same SCC if and only if they have the same forefather.
Proof sketch: If u, v are in the same SCC, then the forefather of u is reachable from v, and vice versa. Therefore, they must have the same forefather.
In the other direction, if u, v have the same forefather (say w), then u is reachable from w since w is its ancestor (Lemma 2), and w is reachable from u by definition, and thus u, w are in the same SCC. Similarly, v, w are in the same SCC, and therefore so are u, v.

Explaining the Algorithm
Notice that the node with the highest finish time (say u) must be a forefather.
All we need to do is find all the nodes that have a path to u.
These are exactly the nodes reached by a DFS over the transpose of G that starts at u.
This is the first SCC.
The next forefather must be the remaining white node with the highest finish time.
All the nodes in its SCC, and the paths to them, must still be white.
Performing a DFS from this forefather discovers exactly its SCC, and so on.

Proving the Algorithm
Theorem 1: The SCCs that the algorithm computes are the correct SCCs of G.
Proof sketch: We prove it by induction on the number of SCCs found so far.
Suppose r is the root of the next DFS tree to be computed.
Then r is its own forefather (by the induction hypothesis).
A node v that is included in the DFS tree of r is in r’s SCC, because r is its forefather.
A node v that is included in r’s SCC is included in the DFS tree of r, due to Lemma 1.
Hence r’s tree is exactly r’s SCC.
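To make the four steps of Strongly-Connected-Components(G) concrete, here is a minimal Python sketch. It rests on assumptions that are not in the tirgul: the directed graph is a dict mapping every node to a list of its successors, the helper names dfs_forest and transpose and the sample graph are hypothetical, and the DFS is recursive, so it is suitable only for small graphs.

```python
def dfs_forest(adj, order):
    """Run DFS, trying the roots in the given order.

    Returns (finished, trees): the nodes listed by increasing finish
    time, and the list of nodes of each depth-first tree.
    """
    visited = set()
    finished = []
    trees = []

    def visit(u, tree):
        visited.add(u)              # u is discovered (colored gray)
        tree.append(u)
        for v in adj[u]:
            if v not in visited:    # v is still white
                visit(v, tree)
        finished.append(u)          # u is finished (colored black)

    for u in order:
        if u not in visited:
            tree = []
            visit(u, tree)
            trees.append(tree)
    return finished, trees


def transpose(adj):
    """Return the graph with every edge reversed."""
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    return rev


def strongly_connected_components(adj):
    finished, _ = dfs_forest(adj, list(adj))        # [1] finish times on G
    rev = transpose(adj)                            # [2] reverse the edges
    _, trees = dfs_forest(rev, reversed(finished))  # [3] decreasing f[u]
    return [set(tree) for tree in trees]            # [4] each tree is an SCC


# Hypothetical sample graph: cycle a->b->c->a, cycle d<->e, isolated f.
graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": ["e"],
    "e": ["d"],
    "f": [],
}
sccs = strongly_connected_components(graph)
print(sorted(sorted(c) for c in sccs))
# [['a', 'b', 'c'], ['d', 'e'], ['f']]
```

Each root chosen in step [3] plays the role of a forefather in the proof: it is the unvisited node with the highest finish time from step [1], and its tree in the reversed graph is reported as one SCC.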