Directed Graph
In a directed graph, each edge (𝑢, 𝑣) has a direction 𝑢 → 𝑣. Thus (𝑢, 𝑣) ≠ (𝑣, 𝑢). Directed graphs are
useful for modeling many practical problems (such as one-way roads in a traffic network, and asymmetric
relations in a social network). However, the direction makes a major difference in many algorithms.
We examine some of them in this chapter.
Notations and Definitions
Directed edge: a directed edge (𝑢, 𝑣) has a direction 𝑢 → 𝑣. 𝑢 is called the tail, and 𝑣 is called the head.
Arc: a directed edge (𝑢, 𝑣) is sometimes called an arc to emphasize it is directed.
Degrees: indeg(𝑣) is the number of incoming edges to 𝑣. outdeg(𝑣) is the number of outgoing edges
from 𝑣.
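As a small illustration of the degree definitions, the sketch below counts in-degrees and out-degrees from a list of arcs. The function name `degrees` and the representation of arcs as `(tail, head)` pairs are assumptions made for this example only.

```python
def degrees(vertices, arcs):
    """Return (indeg, outdeg) dictionaries for a directed graph.

    `arcs` is a list of (u, v) pairs, meaning an arc u -> v
    with tail u and head v.
    """
    indeg = {v: 0 for v in vertices}
    outdeg = {v: 0 for v in vertices}
    for u, v in arcs:
        outdeg[u] += 1  # the arc leaves its tail u
        indeg[v] += 1   # the arc enters its head v
    return indeg, outdeg


indeg, outdeg = degrees([1, 2, 3], [(1, 2), (1, 3), (2, 3)])
print(indeg, outdeg)
```

Every arc contributes once to an out-degree and once to an in-degree, so both dictionaries sum to 𝑚.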
We say 𝑡 is reachable from 𝑠 if there is a directed path from 𝑠 to 𝑡: 𝑠 → 𝑢1 → 𝑢2 → ⋯ → 𝑢𝑘 → 𝑡.
Strongly connected: A directed graph is strongly connected if every two vertices are reachable from
each other.
Write a ~ b if a and b are reachable from each other. Clearly, this relation is an “equivalence relation”:
a ~ a. (Reflexivity)
if a ~ b then b ~ a. (Symmetry)
if a ~ b and b ~ c then a ~ c. (Transitivity)
Strongly connected component: A strongly connected component is a maximal strongly connected
subgraph.
Directed acyclic graph (DAG): A directed graph without any directed cycles.
We are interested in the following questions:
Is a directed graph strongly connected?
Is a directed graph acyclic?
Find all strongly connected components.
BFS and DFS
A quick review: The BFS and DFS algorithms for undirected graphs still work for directed graphs. The only
difference is that we only follow the out-edges in a directed graph. More specifically, when 𝑢 is taken
out of the queue or stack, we add the unvisited vertices 𝑣 such that (𝑢, 𝑣) ∈ 𝐸 to the queue or stack.
The BFS or DFS algorithm on a directed graph still runs in 𝑂(𝑛 + 𝑚) time to visit all vertices reachable from
the starting vertex 𝑠.
The edges responsible for adding new vertices to the queue or stack form the BFS and DFS trees. The BFS
tree still has the shortest distance property. The DFS tree still has the parenthesis property for the start
and finish times. However, the properties of the non-tree edges in undirected graphs no longer hold
for directed graphs.
Reachability:
Problem: Given 𝑠, find all vertices 𝑣 such that there is a directed path from 𝑠 to 𝑣.
Theorem: BFS (or DFS) on a directed graph ensures that
1. All vertices reachable from the source 𝑠 are visited.
2. The time complexity is 𝑂(𝑛 + 𝑚).
Proof: The same as the undirected graph case. Omitted.
Theorem: For a directed graph, the path found by the BFS algorithm is a shortest path (fewest arcs) from 𝑠 to 𝑣.
Proof: The same as the undirected graph case. Omitted.
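The reachability and shortest distance statements above can be illustrated with a minimal BFS sketch. It assumes the graph is given as a dictionary mapping each vertex to the list of heads of its out-edges; the name `bfs_distances` is chosen for this example only.

```python
from collections import deque


def bfs_distances(adj, s):
    """Return {v: number of arcs on a shortest directed path from s to v}.

    Only vertices reachable from s appear in the result.
    """
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):  # follow out-edges only
            if v not in dist:     # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Vertex 4 has an arc into 0, but nothing reaches 4 from 0.
example = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
print(bfs_distances(example, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

Each vertex enters the queue at most once and each out-edge is scanned once, which matches the 𝑂(𝑛 + 𝑚) bound.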
Strongly Connected Graph
Now we consider the problem of determining whether a graph is strongly connected. Example:
Determine whether a traffic network consisting only of one-way roads is sufficient, that is, whether
every location can be reached from every other location.
In an undirected graph, all vertices reachable from 𝑠 form the connected component containing 𝑠. This is
because if there is a path from 𝑠 to 𝑢, and a path from 𝑠 to 𝑣, then there is a path from 𝑢 to 𝑣. That
makes the problem solvable by either the BFS or the DFS algorithm. However, this no longer applies
to directed graphs.
Trivial Algorithm 1: For each 𝑠 ∈ 𝑉, use BFS to check if it can reach every other vertex. Time complexity:
𝑂(𝑛 ⋅ (𝑛 + 𝑚)) = 𝑂(𝑚𝑛), assuming 𝑚 ≥ 𝑛.
Think about the undirected graph again: we join the paths 𝑢 → 𝑠 and 𝑠 → 𝑣 to build a path from 𝑢 to 𝑣. It is
just that in an undirected graph the path 𝑢 → 𝑠 is the same as the path 𝑠 → 𝑢. In a directed graph, we cannot
reverse a path. But nothing forbids us from finding another path from 𝑢 to 𝑠.
Observation: Pick 𝑠 ∈ 𝑉. If there is a path from 𝑠 to every 𝑢, and a path from every 𝑢 to 𝑠, then 𝐺 is
strongly connected. (For any 𝑢 and 𝑣, join the path from 𝑢 to 𝑠 with the path from 𝑠 to 𝑣.)
On the other hand, if 𝐺 is strongly connected, then the condition of the observation also holds. Thus,
we have the lemma.
Lemma 1: For any 𝑠 ∈ 𝑉, 𝐺 is strongly connected ⇔ There is a path from 𝑠 to every 𝑢, and a path from
every 𝑢 to 𝑠.
We know how to check “there is a path from 𝑠 to every 𝑢” in 𝑂(𝑚 + 𝑛) time. How do we check if there is a
path from every 𝑢 to 𝑠? Just reverse the edges, and the problem is reduced to checking if there is a path
from 𝑠 to every 𝑢.
Algorithm (Strong Connectivity)
1. Arbitrarily pick a vertex 𝑠 ∈ 𝑉.
2. Use DFS to check if every vertex is reachable from 𝑠.
3. Reverse the direction of all edges.
4. Use DFS to check if every vertex is reachable from 𝑠.
5. If both answers are yes, then say “yes”. Otherwise, say “no”.
The correctness follows from Lemma 1. Time complexity: 𝑂(𝑛 + 𝑚).
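The five steps above can be sketched as follows. This is a minimal sketch, assuming the graph is a dictionary in which every vertex appears as a key mapped to the list of heads of its out-edges; the helper names `reachable_from`, `reversed_graph` and `is_strongly_connected` are made up for this example.

```python
def reachable_from(adj, s):
    """Iterative DFS; return the set of vertices reachable from s."""
    seen = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def reversed_graph(adj):
    """Return the graph with every arc (u, v) replaced by (v, u)."""
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    return rev


def is_strongly_connected(adj):
    if not adj:
        return True            # convention: the empty graph counts as "yes"
    s = next(iter(adj))        # step 1: arbitrary vertex
    n = len(adj)
    if len(reachable_from(adj, s)) != n:      # step 2
        return False
    rev = reversed_graph(adj)                 # step 3
    return len(reachable_from(rev, s)) == n   # steps 4 and 5


print(is_strongly_connected({0: [1], 1: [2], 2: [0]}))  # True
print(is_strongly_connected({0: [1], 1: [2], 2: []}))   # False
```

Both searches and the reversal touch each vertex and arc a constant number of times, in line with the 𝑂(𝑛 + 𝑚) bound.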
Directed Acyclic Graphs
Problem: Determine if a directed graph is acyclic.
The above three graphs are isomorphic. However, the rightmost drawing makes it easiest to determine whether
the graph has a cycle: we only need to check if there is an arc that points backwards.
Topological ordering: An ordering 𝑣1, 𝑣2, …, 𝑣𝑛 of the vertices (obtained by relabeling them) such that
there is no arc (𝑣𝑖 , 𝑣𝑗 ) with 𝑖 > 𝑗.
Theorem 2: A directed graph 𝐺 is acyclic ⟺ it has a topological ordering.
Proof: ⇐) Trivial: along any directed path the labels strictly increase, so a path can never return to its
starting vertex.
⇒) We prove by providing an algorithm to construct the topological ordering.
First, there must be a vertex with in-degree 0. Otherwise, one can build an arbitrarily long backward path by
repeatedly following an incoming edge of the first vertex of the path. Since the graph is finite, eventually
we visit a vertex twice, which gives a directed cycle.
Suppose we find a vertex 𝑣 with in-degree 0. Then it can serve as the first vertex in the topological
ordering. Now we consider 𝐺 ′ = 𝐺 − 𝑣. Then 𝐺 ′ is a smaller acyclic graph. We topologically order 𝐺 ′ by
recursion, and then put 𝑣 at the left-most position of the ordering. Details omitted.
Exercise: Find a way to implement the above algorithm in 𝑂(𝑚 + 𝑛) time.
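One possible approach to the exercise is sketched below. It keeps an in-degree counter for every vertex and a queue of vertices whose counter has dropped to 0, so that "removing" a vertex only costs time proportional to its out-degree. The function name `topological_order` and the dictionary-of-lists graph representation (every vertex is a key) are assumptions of this sketch, not part of the notes.

```python
from collections import deque


def topological_order(adj):
    """Return a topological ordering as a list, or None if there is a cycle."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1

    ready = deque(u for u in adj if indeg[u] == 0)  # in-degree 0 vertices
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)          # u goes next in the ordering
        for v in adj[u]:         # "remove" u by discounting its out-edges
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # If some vertex was never output, the remaining graph has no vertex
    # of in-degree 0, so it contains a directed cycle.
    return order if len(order) == len(adj) else None


print(topological_order({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}))
print(topological_order({"a": ["b"], "b": ["a"]}))  # None
```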
Next let us examine a different algorithm for topological ordering. This is a useful preparation for the
later algorithms for strongly connected components.
Algorithm (Sketch):
    time ← 1
    While not all vertices are visited
        Arbitrarily pick an unvisited vertex 𝑣
        DFS(𝑣)
We require that the start and finish times of each vertex 𝑣 are recorded during DFS. Note that we may
have a DFS forest. By using a global time, the [start, finish] intervals of two nodes in two different
trees are disjoint.
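A minimal sketch of the DFS forest with a global clock is given below. The function name `dfs_times` and the dictionary-of-lists representation (every vertex is a key) are assumptions for this example. The recursion is kept for readability; for large graphs an explicit stack would avoid Python's recursion limit.

```python
def dfs_times(adj):
    """Run DFS from every unvisited vertex; return (start, finish) dicts."""
    start, finish = {}, {}
    time = 1

    def visit(u):
        nonlocal time
        start[u] = time
        time += 1
        for v in adj[u]:
            if v not in start:   # v has not been visited yet
                visit(v)
        finish[u] = time
        time += 1

    for v in adj:                # may produce several DFS trees
        if v not in start:
            visit(v)
    return start, finish


start, finish = dfs_times({0: [1], 1: [], 2: [1]})
print(start)   # {0: 1, 1: 2, 2: 5}
print(finish)  # {1: 3, 0: 4, 2: 6}
```

In the example, vertex 2 is the root of a second tree, and its interval [5, 6] is disjoint from the intervals [1, 4] and [2, 3] of the first tree.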
Claim 3: If 𝐺 is acyclic, then for every arc (𝑢, 𝑣), finish[𝑣] < finish[𝑢].
Proof:
Case 1. start[𝑣] < start[𝑢].
Because 𝐺 is acyclic, there is no path from 𝑣 to 𝑢. So 𝑣 is not an ancestor of 𝑢 in a DFS tree. Thus, the two
intervals [start[𝑣], finish[𝑣]] and [start[𝑢], finish[𝑢]] are disjoint, and start[𝑣] < start[𝑢] implies that
finish[𝑣] < finish[𝑢].
Case 2. start[𝑢] < start[𝑣].
Because of the arc (𝑢, 𝑣), 𝑣 is visited before 𝑢 finishes, so the algorithm ensures that 𝑣 is a descendant
of 𝑢. By the parenthesis property, finish[𝑣] < finish[𝑢].
With the claim in place, we have the following algorithm to find a topological ordering.
Algorithm:
1. Run DFS on the whole graph, and record the vertices in decreasing order of finish time during DFS.
2. Check if the ordering is a topological ordering. If yes, output the ordering. Otherwise output
“cyclic”.
If 𝐺 is acyclic, then the algorithm outputs a topological ordering. Otherwise, it will realize that 𝐺 is cyclic in
step 2.
Correctness follows from the claim. Time complexity: 𝑂(𝑚 + 𝑛). (Remark: Don’t do comparison-based
sorting after DFS. It will end up with 𝑂(𝑛 log 𝑛) time. Instead, append each vertex to a list when it finishes
and reverse the list at the end.)
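The two steps can be sketched as follows. This is only a sketch under stated assumptions: the graph is a dictionary in which every vertex is a key mapped to its list of out-neighbors, the name `dfs_topological_order` is made up here, and the DFS uses an explicit stack instead of recursion. Vertices are appended to a list as they finish, so no sorting is needed.

```python
def dfs_topological_order(adj):
    """Return a topological ordering, or None if the graph is cyclic."""
    visited = set()
    by_finish = []                      # vertices in increasing finish time
    for root in adj:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbors = stack[-1]
            for v in neighbors:
                if v not in visited:    # descend into an unvisited vertex
                    visited.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:                       # all out-edges of u explored
                stack.pop()
                by_finish.append(u)     # u finishes now

    order = by_finish[::-1]             # step 1: decreasing finish time
    pos = {v: i for i, v in enumerate(order)}
    for u in adj:                       # step 2: look for a backward arc
        for v in adj[u]:
            if pos[u] >= pos[v]:
                return None             # "cyclic"
    return order


print(dfs_topological_order({0: [1, 2], 1: [3], 2: [3], 3: []}))  # [0, 2, 1, 3]
print(dfs_topological_order({0: [1], 1: [0]}))                    # None
```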
Strongly Connected Component
The figure shows an example of the strongly connected components of a directed graph. Note that two
distinct strongly connected components do not share vertices, because a shared vertex 𝑣 could be
used as a bridge to connect the two components together into a larger strongly connected subgraph,
contradicting maximality. The right figure regards each strongly
connected component as a super node. Two super nodes have an edge if there is an edge connecting
the two components.
A straightforward algorithm is to modify the strongly connected graph algorithm. Find all vertices that
are reachable from 𝑠, and all vertices from which 𝑠 can be reached. Then take the intersection. But this
way, finding each component takes 𝑂(𝑛 + 𝑚) time. If there are 𝑘 components, it takes 𝑂(𝑘(𝑛 + 𝑚))
time. We want to learn a linear time algorithm.
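The straightforward intersection idea for a single component can be sketched as below. The names `reach` and `scc_containing`, and the dictionary-of-lists representation with every vertex as a key, are assumptions of this example.

```python
def reach(adj, s):
    """Return the set of vertices reachable from s (iterative DFS)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def scc_containing(adj, s):
    """Return the strongly connected component that contains s."""
    rev = {u: [] for u in adj}          # reversed graph
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    forward = reach(adj, s)             # vertices reachable from s
    backward = reach(rev, s)            # vertices that can reach s
    return forward & backward


g = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}
print(scc_containing(g, 0))  # {0, 1, 2}
print(scc_containing(g, 3))  # {3, 4}
```

Each call costs 𝑂(𝑛 + 𝑚), which is where the 𝑂(𝑘(𝑛 + 𝑚)) total for 𝑘 components comes from.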
Lemma 4. If each strongly connected component is regarded as a super node, the resulting super graph is acyclic.
Proof: (As an exercise.)
Idea 1. The acyclic super graph has a sink node, that is, a node with out-degree 0. If our DFS starts with a
vertex in this sink component, then it will find exactly that component.
This suggests the following strategy: Repeatedly find a sink component and remove it from the graph.
How to find a sink component? More precisely, how to find a vertex in a sink component?
Idea 2. Recall that if the graph is acyclic, the finish times in the DFS provide a reversed topological
ordering. Now the super graph is acyclic. If we do DFS on the original graph, will the finish times provide
any useful information?
The situation is illustrated in the following figure: 𝐶 has edges entering 𝐶′, but there are no paths from 𝐶′
to 𝐶. We want to compare the finish times of vertices in the two components.
This depends on which component is visited first during DFS.
Case 1. The first vertex visited in 𝐶 ∪ 𝐶′ is 𝑣 ∈ 𝐶′. Before 𝑣 finishes, the DFS algorithm visits all of 𝐶′
but won’t enter 𝐶 at all. So every vertex of 𝐶 finishes after all of 𝐶′.
Case 2. The first vertex 𝑣 is in 𝐶 instead. Before 𝑣 finishes, the DFS is guaranteed to finish all reachable
unvisited vertices (and therefore all of 𝐶′).
Thus, no matter what, the latest finish time of 𝐶 is later than the latest finish time of 𝐶′.
Lemma 5: Let 𝐶 and 𝐶′ be two strongly connected components such that there are edges from 𝐶 to 𝐶′. Then
the latest finish time of 𝐶 is later than the latest finish time of 𝐶′.
Proof: There is no path from 𝐶 ′ to 𝐶 (otherwise there is a cycle in the super graph).
Thus the lemma follows from the earlier discussion. QED.
Thus, the latest finish time of a component can be used to topologically order the super graph. BUT, we
do not know the components yet!
Don’t give up! Let’s step backward a bit and see if we’ve lost everything. We can find the vertex with the
latest finish time among all vertices. Its finish time is also the latest among all components, so by the
lemma no other component has an edge into its component. That vertex belongs to a source component!
BUT, we do not want a vertex in a source component. We need a vertex in a sink component! Only
when 𝑣 is in a sink component can we use DFS(𝑣) to find the component. Other components do not have
this property.
Again, don’t give up. We’ve got something. We can find a source component. And we need to find a
sink component. Can we make use of what we’ve got?
Idea 3. The answer is simple: if you reverse all edges, the sources and sinks are swapped. And then you can
use the above method on the reversed graph to find a vertex in a sink component of the original graph. Aha!
A straightforward summary:
While 𝐺 is not empty
    Reverse the edge directions of 𝐺 to get 𝐺′.
    DFS on 𝐺′.
    Let 𝑣 be the vertex with the latest finishing time.
    Do DFS(𝑣) on 𝐺 to find a strongly connected component 𝐶; Output 𝐶.
    𝐺 ← 𝐺 − 𝐶.
Notice that there is a lot of repeated computation in the straightforward implementation. Most of it can be
removed easily. Special attention is required for the DFS on 𝐺 ′ each time.
The purpose of the DFS on 𝐺 ′ is to find 𝑣 in 𝐶, which is a source component of 𝐺 ′ and a sink component of 𝐺.
Removing 𝐶 from 𝐺 and 𝐺 ′ together results in a pair of reversed graphs again. Now the latest
finishing time of the previous DFS among the remaining vertices belongs to a new source component of the
new 𝐺 ′ . So, we do not need to do DFS on 𝐺 ′ again.
Thus, the new algorithm:
Reverse the edge directions of 𝐺 to get 𝐺′.
DFS on 𝐺′ to obtain a finish time for each vertex.
While 𝐺 is not empty
    Let 𝑣 be the remaining vertex with the latest finishing time.
    Do DFS(𝑣) on 𝐺 to find a strongly connected component 𝐶; Output 𝐶.
    𝐺 ← 𝐺 − 𝐶.
Time complexity: 𝑂(𝑛 + 𝑚)
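The algorithm above can be sketched in Python as follows. This is a minimal sketch, not a tuned implementation: it assumes a dictionary-of-lists graph in which every vertex is a key, the names `finish_order` and `strongly_connected_components` are made up for this example, and "removing" a component is simulated with a `removed` set instead of modifying the graph.

```python
def finish_order(adj):
    """DFS over the whole graph; return vertices in increasing finish time."""
    visited, order = set(), []
    for root in adj:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbors = stack[-1]
            for v in neighbors:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:                        # u has no unexplored out-edges left
                stack.pop()
                order.append(u)
    return order


def strongly_connected_components(adj):
    rev = {u: [] for u in adj}           # G': all edges reversed
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    order = finish_order(rev)            # one DFS on G'

    removed, components = set(), []
    for s in reversed(order):            # latest finishing time first
        if s in removed:
            continue
        component, stack = [s], [s]      # DFS(s) on the remaining part of G
        removed.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in removed:
                    removed.add(v)
                    component.append(v)
                    stack.append(v)
        components.append(component)     # G <- G - C via the removed set
    return components


g = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3], 5: []}
print(strongly_connected_components(g))
```

Scanning the finish order once from the end avoids searching for the latest remaining finish time in every iteration, which is one of the technical details behind the linear bound.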
Exercise: Go through the proof of correctness and technical details to get the time complexity.
Acknowledgement: Prepared based upon Lap Chi’s notes. Many figures copied from his notes and
textbooks.