Directed Graph

In a directed graph, each edge (u, v) has a direction u → v. Thus (u, v) ≠ (v, u). Directed graphs are useful for modeling many practical problems (such as one-way roads in a traffic network, or asymmetric relations in a social network). However, the direction makes major differences in many algorithms. We examine some of them in this chapter.

Notations and Definitions

Directed edge: a directed edge (u, v) has a direction u → v. u is called the tail, and v is called the head.
Arc: a directed edge (u, v) is sometimes called an arc to emphasize that it is directed.
Degrees: indeg(v) is the number of incoming edges to v. outdeg(v) is the number of outgoing edges from v.
Reachable: We say t is reachable from s if there is a directed path from s to t: s → v1 → v2 → ⋯ → t.
Strongly connected: A directed graph is strongly connected if every pair of vertices are reachable from each other.

Write a ~ b if a and b are reachable from each other. Clearly, this relation is an "equivalence relation":
a ~ a. (Reflexivity)
if a ~ b then b ~ a. (Symmetry)
if a ~ b and b ~ c then a ~ c. (Transitivity)

Strongly connected component: A strongly connected component is a maximal strongly connected subgraph.
Directed acyclic graph (DAG): A directed graph without any directed cycles.

We are interested in the following questions:
Is a directed graph strongly connected?
Is a directed graph acyclic?
Find all strongly connected components.

BFS and DFS

A quick review: The BFS and DFS algorithms for undirected graphs still work for directed graphs. The only difference is that we only follow the out-edges in a directed graph. More specifically, when u is taken out from the queue or stack, we add the unvisited vertices v such that (u, v) ∈ E into the queue or stack.

The BFS or DFS algorithm in a directed graph G = (V, E) still runs in O(m + n) time, where n = |V| and m = |E|, to visit all vertices reachable from the starting vertex s. The edges responsible for adding new vertices to the queue or stack form the BFS and DFS trees. The BFS tree still has the shortest distance property.
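As a minimal sketch of the review above, the following Python code runs BFS on a directed graph by following out-edges only. The adjacency-dictionary representation and the name bfs_reachable are assumptions made for this illustration, not part of the notes.

```python
from collections import deque

def bfs_reachable(adj, s):
    """Return {vertex: distance from s} for every vertex reachable from s.

    adj is assumed to map each vertex to the list of its out-neighbours,
    i.e. v is in adj[u] exactly when (u, v) is an arc.
    """
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        # Only out-edges (u, v) are followed; in-edges of u are ignored.
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Small example: e can reach a, but a cannot reach e.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
print(bfs_reachable(adj, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

Each vertex enters the queue at most once and each arc is examined at most once, which is where the O(m + n) bound comes from.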
The DFS tree still has the parenthesis property for the start and finish times. However, the properties of the non-tree edges in undirected graphs no longer hold for directed graphs.

Reachability

Problem: Given s, find all vertices t such that there is a directed path from s to t.

Theorem: BFS (or DFS) on a directed graph ensures that
1. All vertices reachable from the source s are visited.
2. The time complexity is O(m + n).
Proof: The same as the undirected graph case. Omitted.

Theorem: For a directed graph, the path found by the BFS algorithm has the shortest distance from s to t.
Proof: The same as the undirected graph case. Omitted.

Strongly Connected Graph

Now we consider the problem of determining whether a graph is strongly connected.

Example: Determine whether a traffic network consisting of only one-way roads is sufficient, that is, whether every location can be reached from every other location.

In an undirected graph, all vertices reachable from s form the connected component containing s. This is because if there is a path from s to u, and a path from s to v, then there is a path from u to v. That makes the problem solvable by either the BFS or the DFS algorithm. However, this no longer applies to directed graphs.

Trivial Algorithm 1: For each v ∈ V, use BFS to check if it can reach every other vertex. Time complexity: O(n ⋅ (m + n)) = O(mn) when m ≥ n.

Think about the undirected graph again: we connect the paths u → s → v to build the path from u to v. It is just that in an undirected graph the path s → u is the same as the path u → s. In a directed graph, we cannot reverse a path. But nothing forbids us from finding another path from u to s.

Observation: Pick s ∈ V. If there is a path from s to every v, and a path from every v to s, then G is strongly connected. (The proof is trivial: for any u and v, concatenate a path from u to s with a path from s to v.)

On the other hand, if G is strongly connected, then the condition of the observation clearly holds. Thus, we have the lemma.

Lemma 1: For any s ∈ V, G is strongly connected ⇔ there is a path from s to every v, and a path from every v to s.

We know how to check "there is a path from s to every v" in O(m + n) time.
How do we check if there is a path from every v to s? Just reverse the edges, and the problem reduces to checking if there is a path from s to every v in the reversed graph.

Algorithm (Strong Connectivity)
1. Arbitrarily pick a vertex s ∈ V.
2. Use DFS to check if every vertex is reachable from s.
3. Reverse the direction of all edges.
4. Use DFS to check if every vertex is reachable from s.
5. If both answers are yes, then say "yes". Otherwise, say "no".

The correctness follows from Lemma 1. Time complexity: O(m + n).

Directed Acyclic Graphs

Problem: Determine if a directed graph is acyclic.

The three graphs in the figure are isomorphic. However, in the rightmost version it is easiest to determine whether the graph has a cycle: we only need to check if there is an arc that points backwards.

Topological ordering: An ordering of the vertices, obtained by relabeling them v1, v2, …, vn, such that there is no arc (vi, vj) for i > j.

Theorem 2: A directed graph is acyclic ⟺ it has a topological ordering.
Proof:
⇐) Trivial: every arc points forward in the ordering, so no directed path can return to its starting vertex.
⇒) We prove this by providing an algorithm to construct the topological ordering. First, there must be a vertex with in-degree 0. Otherwise, one can build an arbitrarily long backward path by repeatedly following an incoming edge of the vertex at the start of the path. Since the graph is finite, eventually we will visit a vertex twice, which gives a directed cycle. Suppose we find a vertex v with in-degree 0; then it can serve as the first vertex in the topological ordering. Now we consider G′ = G − v. Then G′ is a smaller acyclic graph. We topologically order G′ by recursion, and then put v at the leftmost position of the ordering. Details omitted.

Exercise: Find a way to implement the above algorithm in O(m + n) time.

Next let us examine a different algorithm for topological ordering. This is a useful preparation for the later algorithm for strongly connected components.

Algorithm (Sketch):
time ← 1
While not all vertices are visited
    Arbitrarily pick an unvisited vertex v
    DFS(v)

We require that the start and finish time of each vertex is recorded during DFS. Note that we may have a DFS forest.
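The sketch above can be written out as follows. This is only an illustrative Python version, assuming the graph is given as a dictionary adj in which every vertex is a key mapping to its list of out-neighbours; the name dfs_forest is hypothetical.

```python
def dfs_forest(adj):
    """Run DFS from every unvisited vertex, sharing one global clock.

    Returns (start, finish): two dictionaries holding the start and
    finish time of each vertex. adj is assumed to contain every vertex
    as a key.
    """
    start, finish = {}, {}
    time = 1

    def dfs(u):
        nonlocal time
        start[u] = time          # u is discovered
        time += 1
        for v in adj[u]:         # follow out-edges only
            if v not in start:
                dfs(v)
        finish[u] = time         # all descendants of u are done
        time += 1

    for u in adj:                # "arbitrarily pick an unvisited vertex"
        if u not in start:
            dfs(u)
    return start, finish

adj = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
start, finish = dfs_forest(adj)
print(start)   # {'a': 1, 'b': 2, 'c': 3, 'd': 7}
print(finish)  # {'c': 4, 'b': 5, 'a': 6, 'd': 8}
```

In this example the forest has two trees, rooted at a and d, with intervals [1, 6] and [7, 8]. The recursion is kept for readability; for very deep graphs an explicit stack would avoid Python's recursion limit.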
By using a global time, the start and finish time intervals of two nodes in two different trees are disjoint.

Claim 3: If G is acyclic, then for every arc (u, v), finish[v] < finish[u].
Proof:
Case 1. start[v] < start[u]. Because G is acyclic and (u, v) is an arc, there is no path from v to u. So v is not an ancestor of u in a DFS tree. Thus, the two intervals [start[v], finish[v]] and [start[u], finish[u]] are disjoint, and start[v] < start[u] implies that finish[v] < finish[u].
Case 2. start[u] < start[v]. Because of the arc (u, v), the algorithm ensures that v is a descendant of u. By the parenthesis property, finish[v] < finish[u].

With the claim in place, we have the following algorithm to find a topological ordering.

Algorithm:
1. Run DFS on the whole graph, and record the vertices in decreasing order of finish time during DFS.
2. Check if the ordering is a topological ordering. If yes, output the ordering. Otherwise output "cyclic".

If G is acyclic, then the algorithm outputs a topological ordering. Otherwise, it will realize that G is cyclic in step 2. Correctness follows from the claim. Time complexity: O(m + n). (Remark: Don't do comparison-based sorting after DFS. It will end up with O(n log n) time. Instead, put each vertex at the front of a list at the moment it finishes.)

Strongly Connected Component

The figure shows an example of the strongly connected components of a directed graph. Note that two strongly connected components do not share vertices with each other, because a shared vertex could be used as a bridge to connect the two components together. The right figure regards each strongly connected component as a super node. Two super nodes have an edge if there is an edge connecting the two components.

A straightforward algorithm is to modify the strongly connected graph algorithm. Find all vertices that are reachable from s, and all vertices from which s can be reached. Then take the intersection. But this way, finding each component takes O(m + n) time. If there are k components, it takes O(k(m + n)) time. We want to learn a linear time algorithm.

Lemma 4.
If we regard each strongly connected component as a super node, the resulting super graph is acyclic.
Proof: (As an exercise.)

Idea 1. The acyclic super graph has a sink node, that is, a node with out-degree 0. If our DFS starts with a vertex in this sink component, then it will find exactly that component. This suggests the following strategy: repeatedly find a sink component and remove it from the graph.

How to find a sink component? More precisely, how to find a vertex in a sink component?

Idea 2. Recall that if the graph is acyclic, the finish times in the DFS provide a reversed topological ordering. Now the super graph is acyclic. If we do DFS on the original graph, will the finish times provide any useful information?

The situation is illustrated in the following figure: C has edges entering C′, but there are no paths from C′ to C. We want to compare the finish times of vertices in the two components. This depends on which component is visited first during DFS.
Case 1. The first vertex visited in C ∪ C′ is v ∈ C′. Before v finishes, the DFS algorithm won't enter C at all, while all of C′ is finished by the time v finishes. So every vertex of C finishes after all of C′.
Case 2. The first vertex visited, v, is in C instead. Before v finishes, the DFS is guaranteed to finish all reachable unvisited vertices (and therefore all of C′).
Thus, no matter what, the latest finish time in C is later than the latest finish time in C′.

Lemma 5: Let C and C′ be two strongly connected components such that there are edges from C to C′. Then the latest finish time in C is later than the latest finish time in C′.
Proof: There is no path from C′ to C (otherwise there is a cycle in the super graph). Thus the lemma follows from the earlier discussion. QED.

Thus, the latest finish time of a component can be used to topologically order the super graph.

BUT, we do not know the components yet! Don't give up! Let's step back a bit and see if we've lost everything. We can compute the vertex with the latest finish time among all vertices; its finish time must also be the latest among all components. That vertex belongs to a source component!
BUT, we do not want a vertex in a source component. We need a vertex in a sink component! Only when s is in a sink component can we use DFS(s) to find the component. Other components do not have this property.

Again, don't give up. We've got something. We can find a source component. And we need to find a sink component. Can we make use of what we've got?

Idea 3. The answer is simple: if you reverse all edges, the strongly connected components stay the same, but sources and sinks are swapped. Then you can use the above method to find a sink component. Aha!

A straightforward summary:
While G is not empty
    Reverse the edge directions of G to get G′.
    DFS on G′. Let s be the vertex with the latest finish time.
    Do DFS(s) on G to find a strongly connected component C; output C.
    G ← G − C.

Notice that there is a lot of repeated computation in the straightforward implementation. Most of it can be removed straightforwardly. Special attention is required for the DFS on G′ in each iteration. The purpose of the DFS on G′ is to find s in C, which is a source component of G′ and a sink component of G. Removing C from G and G′ together results in a pair of reversed graphs again. Now, among the remaining vertices, the latest finish time of the previous DFS belongs to a new source component of the new G′. So, we do not need to do the DFS again.

Thus, the new algorithm:
Reverse the edge directions of G to get G′.
DFS on G′ to obtain a finish time for each vertex.
While G is not empty
    Let s be the remaining vertex with the latest finish time.
    Do DFS(s) on G to find a strongly connected component C; output C.
    G ← G − C.

Time complexity: O(m + n)

Exercise: Go through the proof of correctness and the technical details to get the time complexity.

Acknowledgement: Prepared based upon Lap Chi's notes. Many figures copied from his notes and textbooks.
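For readers who want to experiment, here is a minimal Python sketch of the final strongly connected component algorithm. It assumes the graph is a dictionary adj in which every vertex is a key mapping to its list of out-neighbours; the function and variable names are hypothetical, and "removing C from G" is simulated by marking vertices as removed rather than by modifying the graph.

```python
def strongly_connected_components(adj):
    """Return the strongly connected components of a directed graph.

    adj is assumed to map every vertex to the list of its out-neighbours.
    Components are returned in the order they are found (sinks first).
    """
    # Step 1: reverse the edge directions of G to get G'.
    radj = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)

    # Step 2: DFS on G', appending each vertex when it finishes, so that
    # finish_order lists vertices by increasing finish time (no sorting).
    visited = set()
    finish_order = []

    def dfs_finish(u):
        visited.add(u)
        for v in radj[u]:
            if v not in visited:
                dfs_finish(v)
        finish_order.append(u)

    for u in adj:
        if u not in visited:
            dfs_finish(u)

    # Step 3: repeatedly take the remaining vertex s with the latest
    # finish time and run DFS(s) on G, skipping removed vertices.
    removed = set()
    components = []

    def collect(u, comp):
        removed.add(u)
        comp.append(u)
        for v in adj[u]:
            if v not in removed:
                collect(v, comp)

    for s in reversed(finish_order):
        if s not in removed:
            comp = []
            collect(s, comp)
            components.append(comp)
    return components

# Example: the cycle 1 -> 2 -> 3 -> 1 feeds into the cycle 4 <-> 5.
adj = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
print(strongly_connected_components(adj))  # [[4, 5], [1, 2, 3]]
```

In the example the sink component {4, 5} is output first, as the discussion predicts. Each of the two passes touches every vertex and arc a constant number of times, in line with the O(m + n) bound.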