Advanced Algorithms

Class Notes for Monday, October 23, 2012
Min Ye, Mingfu Shao, and Bernard Moret
Greedy Algorithms (continued)
The best known application where the greedy algorithm is optimal is surely the construction of a minimum spanning tree (MST). Most of you are familiar with Prim’s and
Kruskal’s algorithms, but they are just two of a rather large family of greedy algorithms
for the MST problem. We look at two more algorithms in that family and then proceed to
prove that any algorithm in that family indeed returns a minimum spanning tree.
Greedy algorithms for the MST problem
Greedy algorithms use only a fraction of the information available about the problem.
Bottom-up greedy methods build solutions piece by piece (starting from the empty set)
by selecting, among the remaining pieces, that which optimizes the value of the partial
solution, while ensuring that the subset selected so far can be extended into a feasible
solution. Top-down greedy methods (much less common) build solutions from the full set
of pieces by removing one piece at a time, selecting a piece whose removal optimizes the
value of the remaining collection while ensuring that this collection continues to contain
feasible solutions. Thus in both cases, and indeed for any greedy algorithm, the idea is to
produce the largest immediate gain while maintaining feasibility.
In the case of the MST problem, we are given an undirected graph G = (V, E), and a
length (distance/weight/etc.) for each edge, d : E −→ R. Our aim is to find a spanning
tree—a tree connecting all vertices—of minimum total weight. MSTs have two important
properties, usually called the cycle property and the cut property. The cycle property says
that, for any cycle X in the graph, X ⊂ E, if the weight of an edge e ∈ X is strictly larger
than the weight of every other edge of X , then this edge e cannot belong to any MST of the
graph. (In a form that allows ties: for any cycle X in the graph, if the weight of an edge
e ∈ X is larger than or equal to the weight of every other edge of X, then there exists an
MST that does not contain e.) Recall that a cut in a graph is a partition of the vertices of
the graph into two nonempty subsets, Y = {S, V − S}, where ∅ ≠ S ⊂ V; this partition induces
a set of cut edges, the cut-set C_Y = {{u, v} ∈ E | u ∈ S, v ∈ V − S}. In a connected graph (and
we are always given a connected graph for the MST problem), there is a bijection between
cuts and cut-sets, so that we can specify one or the other. The cut property says that, for
any cut Y of G, if the weight of an edge e ∈ C_Y is strictly smaller than the weight of
every other edge of C_Y, then this edge belongs to all MSTs of this graph. (In a form that
allows ties: for any cut Y of G, if the weight of an edge e ∈ C_Y is smaller than or
equal to the weight of every other edge of C_Y, then there exists an MST that contains e.)
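The following minimal Python sketch illustrates these definitions. It assumes that edges are given as (u, v, weight) triples. The helper names `cut_set` and `lightest_cut_edge` and the four-vertex example graph are hypothetical and chosen only for illustration. The sketch computes the cut-set C_Y of the cut {S, V − S} and finds an edge of least weight in it.

```python
# Illustrative sketch; edges are assumed to be (u, v, weight) triples.

def cut_set(edges, S):
    """Return the cut-set C_Y of the cut Y = {S, V - S}.

    edges: list of (u, v, weight) triples of an undirected graph.
    S: set of vertices forming one side of the cut.
    """
    # An edge crosses the cut when exactly one of its endpoints lies in S.
    return [(u, v, w) for (u, v, w) in edges if (u in S) != (v in S)]


def lightest_cut_edge(edges, S):
    """Return an edge of minimum weight in C_Y (ties broken by list order)."""
    return min(cut_set(edges, S), key=lambda e: e[2])


# Hypothetical example graph on vertices a, b, c, d.
edges = [("a", "b", 1), ("b", "c", 4), ("a", "c", 3), ("c", "d", 2), ("b", "d", 5)]
print(cut_set(edges, {"a", "b"}))            # [('b', 'c', 4), ('a', 'c', 3), ('b', 'd', 5)]
print(lightest_cut_edge(edges, {"a", "b"}))  # ('a', 'c', 3)
```

Here ('a', 'c', 3) is the unique lightest edge crossing the cut {{a, b}, {c, d}}. By the cut property, it therefore belongs to every MST of this graph.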
Now a bottom-up greedy method for the MST starts from an empty set and adds one
piece at a time (an edge or a vertex, although the distinction is somewhat artificial), subject
to not creating cycles, until a tree is built (or until there remains no candidate piece—
the two are equivalent). In contrast, a top-down greedy method for the MST starts from
the entire graph and removes one edge at a time, subject to not disconnecting the graph,
until no more edges can be removed. In each case, the choice is made on the basis of the
contribution made by the chosen edge (or vertex) to the current collection, a purely local
decision.
A top-down approach is the Reverse-Delete algorithm, first mentioned by Kruskal (but
not to be confused with Kruskal’s algorithm, which is, of course, a bottom-up approach).
The Reverse-Delete algorithm starts with the original graph and deletes edges from it. The
algorithm works as follows. Start with graph G, which contains a list of edges E. Sort
the edges in E in decreasing order by weight, then go through the edges one by one, from
largest weight down to smallest weight. For each edge in turn, check whether deleting the
edge would disconnect the graph; if not, remove that edge. The proof of optimality for this
algorithm is quite straightforward, using the cycle-property. If the graph G has no cycles,
then it is a tree and thus it is its own unique (and hence also optimal) spanning tree. As long
as there remains a cycle in the graph, we do not have a tree and must remove at least one
edge. Each edge that Reverse-Delete removes lies on a cycle of the current graph (since its
removal does not disconnect the graph), and it is an edge of largest weight on that cycle:
every heavier edge still present was examined earlier and kept because its removal would
have disconnected the graph; such an edge is still a bridge, and a bridge lies on no cycle.
Thus the cycle property (in the form that allows ties) ensures that some MST of the current
graph does not include the removed edge, so the remaining graph still contains an MST of G.
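The description above translates almost directly into code. The sketch below is a deliberately naive Python version under stated assumptions: the graph is connected and edges are (u, v, weight) triples. The function names `is_connected` and `reverse_delete` are hypothetical. Because it re-checks connectivity from scratch for every edge, it runs in O(|E| · (|V| + |E|)) time.

```python
from collections import defaultdict

# Illustrative sketch; assumes a connected graph with (u, v, weight) edges.


def is_connected(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected."""
    adj = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:                      # iterative depth-first search
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(vertices)


def reverse_delete(vertices, edges):
    """Scan edges by decreasing weight; delete each one whose removal keeps G connected."""
    order = sorted(edges, key=lambda e: e[2], reverse=True)  # largest weight first
    kept = list(order)
    for e in order:
        trial = list(kept)
        trial.remove(e)               # tentatively delete one copy of e
        if is_connected(vertices, trial):
            kept = trial              # e lay on a cycle: delete it for good
    return kept


V = {"a", "b", "c", "d"}
E = [("a", "b", 1), ("b", "c", 4), ("a", "c", 3), ("c", "d", 2), ("b", "d", 5)]
print(reverse_delete(V, E))  # [('a', 'c', 3), ('c', 'd', 2), ('a', 'b', 1)]
```

On this example, the edges of weight 5 and 4 are deleted. Each remaining edge is a bridge when it is examined, so it is kept.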
We can view the bottom-up methods as proceeding by coalescing equivalence classes.
An equivalence class consists of a set of vertices with an associated set of edges that form
a minimum spanning tree for the vertices in the class. Initially, each vertex is the sole element of its equivalence class and its associated set of edges is empty. When the algorithm
terminates, only one equivalence class remains and the associated set of edges defines a
minimum spanning tree. At each step of the algorithm, we select an edge with an endpoint
in each of two equivalence classes and coalesce these two classes, thereby combining two
trees into one larger tree. (Edges with both endpoints in the same equivalence class are
permanently excluded, since their selection would lead to a cycle.) In order to minimize
the increase in the value of the objective function, greediness dictates that the allowable
edge of least cost be chosen next. This choice can be made with or without additional
constraints. At one extreme we can apply no additional constraint and always choose the
shortest edge that combines two spanning trees into a larger spanning tree; at the other
extreme we can designate a special equivalence class which must be involved in any merge
operation. The first approach can be viewed as selecting edges; it is known as Kruskal’s
algorithm, after J. B. Kruskal, who first presented it in 1956; the second is best viewed (at
least for programming purposes) as selecting vertices (adding them one by one to a single
partial spanning tree) and is known as Prim’s algorithm, after R.C. Prim, who presented
it in 1957. A third, more general, approach is in fact the oldest algorithm proposed for
the MST and among the oldest algorithms formally defined in Computer Science; it is
due to O. Boruvka, who first published it in 1926 as a method of constructing an efficient
electrical network to serve the region of Moravia. Boruvka’s algorithm considers all current
equivalence classes (spanning trees for subsets of vertices) at once, joins each to its closest “neighbor” (where the distance is defined as the length of the shortest edge with one
endpoint in each of the two equivalence classes), subject to not creating a cycle.
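Two members of this family can be sketched compactly in Python with a union-find (disjoint-set) structure that tracks the equivalence classes. The sketch assumes a connected graph given as (u, v, weight) triples, and the names `find`, `kruskal`, and `boruvka` are hypothetical.

- **Kruskal's algorithm** scans all edges once in increasing order of weight.
- **The Boruvka-style routine** picks, in each round, the lightest outgoing edge of every class and then merges along all of them.

Simultaneous merges could close a cycle when weights tie. To prevent this, the sketch breaks ties by each edge's position in the input list, a common precaution that the description above does not spell out.

```python
# Illustrative sketch; assumes a connected graph with (u, v, weight) edges.

def find(parent, x):
    """Return the representative of x's equivalence class (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal(vertices, edges):
    """Kruskal: add the lightest edge that joins two different classes."""
    parent = {v: v for v in vertices}        # every vertex starts in its own class
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:                         # endpoints lie in different classes
            parent[ru] = rv                  # coalesce the two classes
            tree.append((u, v, w))
    return tree


def boruvka(vertices, edges):
    """Boruvka: every class joins its closest neighbour, all in one round."""
    parent = {v: v for v in vertices}
    tree, n_classes = [], len(parent)
    while n_classes > 1:
        cheapest = {}                        # class representative -> ((w, index), edge)
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                continue                     # both endpoints in one class: excluded
            for r in (ru, rv):
                if r not in cheapest or (w, i) < cheapest[r][0]:
                    cheapest[r] = ((w, i), (u, v, w))
        if not cheapest:
            raise ValueError("graph is not connected")
        for _, (u, v, w) in cheapest.values():
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:                     # both classes may have chosen the same edge
                parent[ru] = rv
                tree.append((u, v, w))
                n_classes -= 1
    return tree


V = ["a", "b", "c", "d"]
E = [("a", "b", 1), ("b", "c", 4), ("a", "c", 3), ("c", "d", 2), ("b", "d", 5)]
print(kruskal(V, E))  # [('a', 'b', 1), ('c', 'd', 2), ('a', 'c', 3)]
print(boruvka(V, E))  # [('a', 'b', 1), ('c', 'd', 2), ('a', 'c', 3)]
```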
We can prove the correctness of all three coalescence-based algorithms (Kruskal’s,
Prim’s, and Boruvka’s) at once by proving the correctness of the more general algorithm.
Theorem 1. If, at each step of the algorithm, an arbitrary equivalence class, Ti , is selected,
but the edge selected for inclusion is the smallest that has exactly one endpoint in Ti , then
the final tree that results is a minimum spanning tree.
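The general algorithm of Theorem 1 can be sketched directly in Python, again assuming a connected graph with (u, v, weight) edges. The function name `coalesce_mst` and its `choose` parameter are hypothetical. `choose` is a caller-supplied rule that returns the index of the class T_i to grow.

- Always choosing the class that contains a fixed root vertex gives Prim's algorithm.
- Choosing at random gives other members of the family. By the theorem, all of them return trees of the same minimum weight.

The sketch favors clarity over speed: it rescans every edge at each step.

```python
import random

# Illustrative sketch; assumes a connected graph with (u, v, weight) edges.


def coalesce_mst(vertices, edges, choose):
    """Generic coalescing MST algorithm of Theorem 1.

    choose(classes) returns the index of the equivalence class T_i to grow.
    """
    classes = [{v} for v in vertices]       # initially one singleton class per vertex
    tree = []
    while len(classes) > 1:
        i = choose(classes)
        T = classes[i]
        # Allowable edges have exactly one endpoint in T_i; take a lightest one.
        crossing = [e for e in edges if (e[0] in T) != (e[1] in T)]
        u, v, w = min(crossing, key=lambda e: e[2])
        other = v if u in T else u
        j = next(k for k, C in enumerate(classes) if other in C)
        classes[i] = T | classes[j]          # coalesce T_i with the class across the edge
        del classes[j]
        tree.append((u, v, w))
    return tree


def prim_rule(root):
    """Prim's algorithm: always grow the class containing `root`."""
    return lambda classes: next(k for k, C in enumerate(classes) if root in C)


def random_rule(seed):
    """Pick an arbitrary class at each step (reproducible via `seed`)."""
    rng = random.Random(seed)
    return lambda classes: rng.randrange(len(classes))


V = ["a", "b", "c", "d"]
E = [("a", "b", 1), ("b", "c", 4), ("a", "c", 3), ("c", "d", 2), ("b", "d", 5)]
print(coalesce_mst(V, E, prim_rule("a")))  # [('a', 'b', 1), ('a', 'c', 3), ('c', 'd', 2)]
```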
Proof. As G is connected, there is always at least one allowable edge at each iteration. As
each iteration can proceed, irrespective of our choice of Ti , and as an iteration decreases
the number of equivalence classes by one, the algorithm terminates.
The proof is by induction, though we present it somewhat informally. Let TA be a
spanning tree produced by the above algorithm and let TM be a minimum spanning tree.
We give a procedure for transforming TM into TA . We will form a sequence of trees, each
slightly different from its predecessor, with the property that the sums of the lengths of the
edges in successive trees are the same. Since TA will be at the far end of the sequence of
transformations, it must also be a minimum spanning tree. Label the edges of TA by the
iteration on which they entered the tree. Let ei be the edge of lowest index that is present
in TA but not in TM . The addition of ei to TM forms a cycle. Note that the length of ei
is greater than or equal to that of every other edge in the cycle; otherwise, TM would not
be a minimum spanning tree, because breaking any edge in the cycle would produce a
spanning tree and breaking a longer edge, if one such existed, would reduce the total cost.
Now, when ei was added to the forest of trees that eventually became TA , it connected an
arbitrarily chosen tree, Ti , to some other tree. Traverse the cycle in TM starting from the
endpoint of ei that was in Ti and going in the direction that does not lead to an immediate
traversal of ei . At some point in this traversal, we first encounter an edge with exactly
one endpoint in Ti . It might be the first edge we encounter, or it might be many edges
into the traversal, but such an edge must exist since the other endpoint of ei is not in Ti .
Furthermore, this edge, call it ê, cannot be ei . Now the length of ei cannot exceed that of
ê, because ê was an allowable edge, but was not selected; thus the two edges have equal
length. We replace ê with ei in TM : the resulting tree has the same total length. Note that
ê may or may not be an edge of TA , but if it is, then its index is greater than i, so that our
new tree now first differs from TA at some index greater than i. Replacing TM with this new
minimum spanning tree, we continue this process until there are no differences, i.e., until
TM has been transformed into TA .
Iterative improvement methods
Our next class of methods consists of those that start with a complete solution structure and proceed to refine it through successive iterations; at each iteration, an improvement is made on a local basis. This is a more powerful approach than the greedy approach: whereas a greedy approach never alters any choice it has already made, an iterative improvement approach has no problem doing so. Moreover, the number of iterations is not fixed (indeed, bounding the number of iterations is the major problem in analyzing an iterative improvement algorithm), whereas the number of steps taken by a greedy algorithm is simply the number of elements included in the solution.
Matching and flow
Matching and network flow are the two most important problems for which an iterative
improvement method delivers optimal solutions. We first consider the maximum matching
problem in bipartite graphs.
Maximum bipartite matching
A matching in a graph is a subset of edges no two of which share an endpoint; a maximum
matching is just a matching of maximum cardinality. A graph is said to be bipartite if its
vertices can be partitioned into two sets in such a way that every edge of the graph has
one endpoint in one set and the other endpoint in the other set; that is, the vertices of each
set form an independent set. Matching in bipartite graphs is one of the fundamental
optimization problems, as it is used for assignment problems, that is, problems where one
wants to find the optimal way to assign, say, work crews to jobs: problems solved every day on
construction sites, in factories, in airlines and railways, as well as in job scheduling for
computing systems.
We describe an iterative improvement algorithm for this problem: an algorithm that
refines the current solution through local changes, using an approach that can be repeated
many times, each time improving the quality of the solution. In the problem of maximum
bipartite matching, in order to improve an existing matching, we must start by identifying
unmatched vertices of degree at least 1—we need at least one such on each side of the
bipartite graph. Consider the trivial 4-vertex bipartite graph with vertex set {a, b, 1, 2} and
edge set {{a, 1}, {a, 2}, {b, 1}}, with current matching M = {{a, 1}}. It is clear that there
exists a larger matching, namely M ∗ = {{a, 2}, {b, 1}}. Note that, in order to transform
M into M ∗ , the set of matched vertices will simply gain two new members, but the set of
matched edges, while larger by one, may have nothing in common with the previous set.
In this trivial example, we could have started our search at vertex b, which has just one
neighbor, vertex 1; but vertex 1 was already matched, so we had to unmatch it (undoing
a previous decision), which turned its previous “mate,” vertex a, into an unmatched
vertex of degree at least 1. We then followed the matched edge back to vertex a, where we found
that a had an unmatched neighbor, vertex 2. We thus identified a path of three edges, the
first and the last unmatched, the middle one matched; by flipping the status of each edge,
from matched to unmatched and vice versa, we replaced a path with one matched edge by
the same path, but with two matched edges. Let us formally define this type of path.
Let G = (V, E) be a graph and M be a matching. An alternating path with
respect to M is a path such that every other edge on the path is in M,
while the others are in E − M. If, in addition, the path is of odd length and the
first and last vertices on the path are unmatched, then the alternating path is
called an augmenting path.
The reason it is called an augmenting path is that we can use it to augment the size of the
matching: whereas an alternating path may have the same number of edges in M and in
E − M, or one more in M, or one more in E − M, an augmenting path must have one more
edge in E − M than in M. Moreover, because of the definition of matchings, it is safe to flip
the status of every edge in an augmenting path from matched to unmatched, and vice versa:
none of the vertices on the augmenting path can have been the endpoint of a matched edge
other than those already on the path.
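For the small example used earlier, the definition and the flip can be checked mechanically. That example has vertex set {a, b, 1, 2}, edges {a, 1}, {a, 2}, {b, 1}, and M = {{a, 1}}. In this minimal Python sketch, edges are represented as two-element frozensets and a path as a list of vertices. The function names are hypothetical. Flipping the status of every edge on a path P is exactly the symmetric difference M ⊕ P, which Python's set operator `^` computes.

```python
# Illustrative sketch; edges are two-element frozensets, paths are vertex lists.

def path_edges(path):
    """Edges of a path given as a vertex sequence v0, v1, ..., vk."""
    return [frozenset(p) for p in zip(path, path[1:])]


def is_augmenting_path(path, E, M):
    """True iff `path` is an augmenting path of the graph (V, E) w.r.t. matching M."""
    edges = path_edges(path)
    matched = set().union(*M)                # vertices covered by M
    return (len(set(path)) == len(path)      # simple path
            and all(e in E for e in edges)   # uses only edges of the graph
            and len(edges) % 2 == 1          # odd length
            and all((e in M) == (i % 2 == 1) for i, e in enumerate(edges))  # alternates
            and path[0] not in matched and path[-1] not in matched)


def augment(M, path):
    """Return M ⊕ P: matched edges on the path become unmatched and vice versa."""
    return M ^ set(path_edges(path))


E = {frozenset(e) for e in [("a", "1"), ("a", "2"), ("b", "1")]}
M = {frozenset(("a", "1"))}
P = ["b", "1", "a", "2"]
print(is_augmenting_path(P, E, M))                # True
print(sorted(sorted(e) for e in augment(M, P)))   # [['1', 'b'], ['2', 'a']]
```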
Augmenting paths are thus the tool we needed to design an iterative improvement
algorithm: in general terms, we start with an arbitrary matching (including possibly an
empty one), then we search for an augmenting path in the graph; if one is found, we
augment the matching by flipping the status of all edges along the augmenting path; if
none is found, we stop. The obvious question, at this point, is whether the absence of any
augmenting path indicates just a local maximum or a global one. The answer is that it
indicates a global maximum: if G has no augmenting path with respect to M, then M is a
maximum matching, that is, it is optimal. We state this result in its contrapositive form.
Theorem 2. Let G be a graph, M ∗ an optimal matching for G, and M any matching for G
such that we have |M| < |M ∗ |. Then G has an augmenting path with respect to M.
This result is due to the French mathematician Claude Berge and is thus known as Berge’s
theorem. The proof is deceptively simple, but note that it is nonconstructive.
Proof. Let M ⊕ M ∗ denote the symmetric difference of M and M ∗, i.e., M ⊕ M ∗ = (M ∪
M ∗ ) − (M ∩ M ∗ ), and consider the subgraph G′ = (V, M ⊕ M ∗ ). All vertices of G′ have
degree two or less, because they have at most one incident edge from each of M and M ∗ ;
moreover, every connected component of G′ is one of: (i) a single vertex; (ii) a cycle of
even length, with edges drawn alternately from M and M ∗ ; or (iii) a path with edges drawn
alternately from M and M∗. As the cardinality of M∗ exceeds that of M, and as each even
cycle contains equally many edges from M and from M∗, there exists at least one path composed
of alternating edges from M and M∗, with more edges from M∗ than from M. This path must begin
and end with edges from M∗, and its endpoints are unmatched in M, because the path is a
connected component of G′ (an edge of M at an endpoint would belong to M ⊕ M∗ and would
extend the path); hence this path is an augmenting path.
Berge’s theorem shows that the use of augmenting paths not only enables us to improve
on the quality of an initial solution, but also enables us to obtain an optimal solution. Note
that the definitions of alternating paths and augmenting paths hold just as well for
nonbipartite graphs as for bipartite ones, and so does Berge’s theorem. The next step is to
develop an algorithm for finding augmenting paths, and this is where the difference between
bipartite and nonbipartite graphs will show.
In a bipartite graph, any augmenting path begins on one side of the graph and ends on
the other. Thus a search algorithm can simply start at any unmatched vertex on one side of
the graph, say the left side, and traverse any edge to the other side. If the endpoint on the
right side is also unmatched, then an augmenting path, consisting of a single unmatched
edge, has been found. If the other endpoint is matched, then the algorithm traverses that
matched edge to the left side and follows any unmatched edge, if one exists, to an unvisited
vertex on the right side. The process is repeated until either an augmenting path is found or
a dead end on the left side is reached. Unmatched edges are always traversed from the left
side to the right side and matched edges in the opposite direction. If a dead end is reached,
we must explore other paths until we find an augmenting path or run out of possibilities.
In developing an augmenting path, choices arise in only two places: in selecting an initial
unmatched vertex and in selecting an unmatched edge out of a vertex on the left side. In
order to examine all possibilities for augmenting paths, we need to explore these choices
in some systematic way; because all augmenting paths make exactly the same contribution
of one additional matched edge, we should search for the shortest augmenting paths.
Thus we use a breadth-first search of the graph, starting at each unmatched vertex
on the left side. If any of the current active vertices (the frontier in the BFS, which will
always be vertices on the left side) has an unmatched neighbor on the right, we are done.
Otherwise, from each neighbor on the right, we follow the matched edge of which it is an
endpoint back to a vertex on the left and repeat the process. Thus the BFS increases path
lengths by 2 at each iteration—because the move back to the left along matched edges is
forced. The BFS takes O(|E|) time, as it cannot look at an edge more than twice (once
from each end); as the number of augmenting paths we may find is in O(|V |), the running
time of this BFS augmenting strategy is O(|V | · |E|). Since the input size is Θ(|V | + |E|),
the time taken is more than linear, but no more than quadratic, in the size of the input.
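A minimal Python sketch of this strategy follows, under stated assumptions. The left side is given as a list, and adj[u] lists the right-side neighbours of each left vertex u. Left and right vertex names must be distinct. The function name `bfs_matching` and this representation are hypothetical.

Each round runs one breadth-first search from all unmatched left vertices at once and stops at the first unmatched right vertex. It then retraces the backpointers and flips the edges along the path. The sketch also records the length of each augmenting path it uses, so one can observe that these lengths never decrease (Theorem 4 below).

```python
from collections import deque

# Illustrative sketch; left/right vertex names are assumed to be distinct.


def bfs_matching(left, adj):
    """Maximum bipartite matching by repeated BFS for a shortest augmenting path.

    Returns (mate, lengths): mate maps every matched vertex to its partner;
    lengths lists the number of edges of each augmenting path used, in order.
    """
    mate, lengths = {}, []
    while True:
        parent = {}                                     # right vertex -> left vertex it was reached from
        depth = {u: 0 for u in left if u not in mate}   # unmatched left vertices
        queue = deque(depth)
        found = None
        while queue and found is None:
            u = queue.popleft()
            for r in adj[u]:
                if r in parent:
                    continue                            # right vertex already visited
                parent[r] = u
                if r not in mate:                       # unmatched right vertex: done
                    found = r
                    break
                w = mate[r]                             # forced move back along the matched edge
                depth[w] = depth[u] + 1
                queue.append(w)
        if found is None:
            return mate, lengths                        # no augmenting path: matching is maximum
        lengths.append(2 * depth[parent[found]] + 1)
        r = found
        while r is not None:                            # flip the edges along the path
            u = parent[r]
            nxt = mate.get(u)                           # u's previous partner, if any
            mate[u], mate[r] = r, u
            r = nxt


mate, lengths = bfs_matching(["a", "b"], {"a": ["1", "2"], "b": ["1"]})
print(lengths)  # [1, 3]
```

On the four-vertex example, the first round matches a to 1. The second round finds the path b, 1, a, 2 of length 3.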
However, we are wasting a lot of time: each new BFS starts from scratch and, most
likely, will follow many paths already followed in the previous BFS. And each BFS produces a single new augmenting path. Yet, there typically will be a number of augmenting
paths in a graph with respect to a matching, especially if that matching is small. Instead of
stopping at the first unmatched neighbor on the right, we could finish that stage of BFS, collecting all unmatched neighbors on the right. Doing so would not increase the worst-case
running time of the BFS, yet might yield multiple augmenting paths of the same length.
However, we can use multiple augmenting paths only if they are vertex-disjoint, since otherwise we could cause conflicting assignments of vertices or even edges. The BFS might
discover that k_l of the current active vertices (on the left) have an unmatched neighbor on
the right, but some of these neighbors might be shared; if there are k_r unmatched neighbors
on the right, the number of disjoint augmenting paths is at most min{k_l, k_r}. The number may be smaller, however, because this sharing of vertices can occur at any stage along
the alternating paths. Thus we must adjust our BFS to provide backpointers, so that we can
retrace paths from right-side unmatched vertices reached in the search; and we must add a
backtracing phase, which retraces at most one path for each unmatched vertex reached on
the right-hand side. The backtracing is itself a graph search. Specifically, for each left-side
vertex encountered during the breadth-first search, we record its distance from the closest
unmatched left-side vertex, passing as before through matched right-side vertices. We
use this information to run a (backward) depth-first search from each unmatched right-side
vertex discovered during the BFS: during a DFS we consider only edges that take us one
level closer to unmatched left-side vertices. When we discover an augmenting path, we
eliminate the vertices along this path from consideration by any remaining DFS, thereby
ensuring that our augmenting paths will be vertex-disjoint.
We can hope that the number of augmenting paths found during each search is more
than a constant, so that the number of searches (iterations) to be run is significantly decreased, preferably to o(|V |). We characterize the gain to be realized through a series of
small theorems; these theorems apply equally to general graphs and bipartite graphs. We
begin with a more precise proof of Berge’s theorem that allows us to refine its conclusion.
Theorem 3. Let M1 and M2 be two matchings in some graph, G = (V, E), with |M1 | >
|M2 |. Then the subgraph G′ = (V, M1 ⊕ M2 ) contains at least |M1 | − |M2 | vertex-disjoint
augmenting paths with respect to M2 .
Proof. Recall that every connected component of G′ is one of: (i) a single vertex; (ii) a
cycle of even length, with edges alternately drawn from M1 and M2 ; or (iii) a path with
edges alternately drawn from M1 and M2 . Let Ci = (Vi , Ei ) be the ith connected component
and define δ(Ci ) = |Ei ∩ M1 | − |Ei ∩ M2 |. From our previous observations, we know that
δ(Ci ) must be one of −1, 0, or 1 and that it equals 1 exactly when Ci is an augmenting path
with respect to M2 . Now we have
$$\sum_i \delta(C_i) = |M_1 - M_2| - |M_2 - M_1| = |M_1| - |M_2|,$$
so that at least |M1 | − |M2 | components Ci are such that δ(Ci ) equals 1, which proves the
theorem.
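The counting argument can be checked on small instances. In the Python sketch below, matchings are sets of two-element frozensets and the function names are hypothetical. It splits M1 ⊕ M2 into connected components and computes δ(C_i) for each one. Isolated vertices are omitted, since they contribute δ = 0. The sum equals |M1| − |M2|, and the components with δ(C_i) = 1 are vertex-disjoint augmenting paths with respect to M2.

```python
from collections import defaultdict

# Illustrative sketch; matchings are sets of two-element frozensets.


def components(D):
    """Connected components of the graph formed by the edge set D, as edge sets."""
    incident = defaultdict(list)             # vertex -> edges of D touching it
    for e in D:
        for x in e:
            incident[x].append(e)
    comps, seen = [], set()
    for start in D:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                         # depth-first search over edges
            e = stack.pop()
            if e not in comp:
                comp.add(e)
                for x in e:
                    stack.extend(incident[x])
        seen |= comp
        comps.append(comp)
    return comps


def delta(comp, M1, M2):
    """δ(C_i) = |E_i ∩ M1| − |E_i ∩ M2|."""
    return len(comp & M1) - len(comp & M2)


def edge(u, v):
    return frozenset((u, v))


# M1 and M2 share the edge {11, 12}; M1 ⊕ M2 is a 6-vertex path plus a 4-cycle.
M1 = {edge(1, 2), edge(3, 4), edge(5, 6), edge(7, 8), edge(9, 10), edge(11, 12)}
M2 = {edge(2, 3), edge(4, 5), edge(8, 9), edge(10, 7), edge(11, 12)}
comps = components(M1 ^ M2)
print(sorted(delta(c, M1, M2) for c in comps))   # [0, 1]
```

The path 1, 2, …, 6 has δ = 1 and is an augmenting path with respect to M2. The 4-cycle has δ = 0.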
This tells us that many disjoint augmenting paths exist, but says nothing about their
lengths, nor about finding them. Indeed, if we take M2 to be the empty set and M1 to be a
maximum matching, the theorem tells us that the original graph contains enough disjoint
augmenting paths to go from no matching at all to a maximum matching in a single step!
But finding such a set is no easier than the original problem (with respect to the empty
matching, these paths are single edges that together form a maximum matching), and with
respect to a nonempty matching the paths will normally be of various lengths. We will focus
on finding a set of disjoint shortest augmenting paths (thus all of the same length) with
respect to the current matching; such a set will normally not contain enough paths to obtain
a maximum matching in one step.
Our next result is intuitively obvious, but the theorem proves it and also makes it precise: successive shortest augmenting paths cannot become shorter.
Theorem 4. Let G = (V, E) be a graph, with M a matching that is not maximum, P a shortest augmenting path with respect to M, and P′ any augmenting path with respect to the augmented
matching M ⊕ P. Then we have
|P′ | ≥ |P| + |P ∩ P′ |
Proof. The matching M ⊕ P ⊕ P′ contains two more edges than M, so that, by our previous
theorem, M ⊕ (M ⊕ P ⊕ P′ ) = P ⊕ P′ contains (at least) two vertex-disjoint augmenting
paths with respect to M, call them P1 and P2 . Thus we have |P ⊕ P′ | ≥ |P1 | + |P2 |. Since
P is a shortest augmenting path with respect to M, we also have |P| ≤ |P1 | and |P| ≤ |P2 |,
so that we get |P ⊕ P′| ≥ 2|P|. Since P ⊕ P′ is (P ∪ P′) − (P ∩ P′), we can write |P ⊕ P′| =
|P| + |P′| − 2|P ∩ P′|. Substituting in our preceding inequality yields |P′| ≥ |P| + 2|P ∩ P′|,
which implies our conclusion.
An interesting corollary (especially given our BFS approach to finding shortest augmenting paths) is that two successive shortest augmenting paths have the same length only
if they are disjoint. Our new algorithm uses all disjoint shortest augmenting paths it finds,
as follows.
• Begin with an arbitrary (possibly empty) matching.
• Repeatedly find a maximal set of vertex-disjoint shortest augmenting paths, and use
them all to augment the current matching, until no augmenting path can be found.
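For bipartite graphs, this scheme is the algorithm usually credited to Hopcroft and Karp. The Python sketch below is one common way to code it, not the only one. It assumes the left vertices are given in a list and that adj[u] lists the right-side neighbours of u. All names are hypothetical, and `None` is reserved to mean "free".

The sketch swaps in a forward search in place of the backward depth-first search from unmatched right-side vertices described earlier:

- Its depth-first searches start from unmatched left vertices.
- They follow only edges that go exactly one BFS layer deeper.
- They discard dead ends for the rest of the phase.

```python
from collections import deque

INF = float("inf")

# Illustrative sketch; None must not be used as a vertex name.


def hopcroft_karp(left, adj):
    """Returns (mate_l, phases): the partner of each left vertex and the number of phases."""
    mate_l = {u: None for u in left}   # left vertex -> right partner (None = free)
    mate_r = {}                        # right vertex -> left partner
    dist = {}                          # BFS layer of each left vertex; dist[None] = path limit

    def bfs():
        # Layer 0 holds the free left vertices; dist[None] records the layer at which
        # a free right vertex is first reached, i.e., the shortest augmenting length.
        queue = deque()
        for u in left:
            dist[u] = 0 if mate_l[u] is None else INF
            if dist[u] == 0:
                queue.append(u)
        dist[None] = INF
        while queue:
            u = queue.popleft()
            if dist[u] < dist[None]:            # do not expand past the shortest length
                for r in adj[u]:
                    w = mate_r.get(r)           # None if r is free
                    if dist[w] == INF:
                        dist[w] = dist[u] + 1
                        if w is not None:
                            queue.append(w)
        return dist[None] != INF

    def dfs(u):
        if u is None:
            return True                         # reached a free right vertex
        for r in adj[u]:
            w = mate_r.get(r)
            if dist[w] == dist[u] + 1 and dfs(w):
                mate_l[u], mate_r[r] = r, u     # flip this edge into the matching
                return True
        dist[u] = INF                           # dead end: skip u for the rest of the phase
        return False

    phases = 0
    while bfs():
        phases += 1
        for u in left:
            if mate_l[u] is None:
                dfs(u)
    return mate_l, phases


print(hopcroft_karp(["a", "b"], {"a": ["1", "2"], "b": ["1"]}))  # ({'a': '2', 'b': '1'}, 2)
```

Each phase costs O(|E|), because every edge is examined a constant number of times by the BFS and by the depth-first searches.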
Now we are ready to prove the crucial result on the worst-case number of searches required
to obtain a maximum matching. The result itself is on the number of different lengths that
can be found among the collection of shortest augmenting paths produced in successive
searches.
Theorem 5. Let s be the cardinality of a maximum matching and let P_1, P_2, . . . , P_s be a
sequence of shortest augmenting paths that build on the empty matching, that is, each P_i is a
shortest augmenting path with respect to M_{i−1}, where M_0 = ∅ and M_i = M_{i−1} ⊕ P_i. Then the
number of distinct integers in the sequence |P_1|, |P_2|, . . . , |P_s| cannot exceed 2⌊√s⌋.
The intuition here is that, as we start the first search with an empty matching (or a
small one), there will be many disjoint shortest augmenting paths and so there will be
many repeated values towards the beginning of the sequence of path lengths; toward the
end, however, augmenting paths are more complex, longer, and rarer, so that most values
toward the end of the sequence will be distinct. The proof formalizes this intuition by
using a “midpoint” in the number of distinct values that is very far along the sequence of
augmenting paths: not at s/2, but at ⌊s − √s⌋.
Proof. Let r = ⌊s − √s⌋ and consider M_r, the rth matching in the augmentation sequence.
Since |M_r| = r and since the maximum matching has cardinality s > r, we conclude (using
Berge’s extended theorem, Theorem 3) that there exist at least s − r vertex-disjoint augmenting
paths with respect to M_r. (These need not be the remaining augmenting paths in our sequence,
P_{r+1}, P_{r+2}, . . . , P_s.) Altogether these paths contain at most all r edges of M_r, so
that the shortest of them contains at most ⌊r/(s − r)⌋ such edges (the worst case being when
the edges of M_r are evenly distributed among the paths) and thus at most 2⌊r/(s − r)⌋ + 1
edges in all. But P_{r+1}, the next path picked, is a shortest augmenting path with respect to
M_r, so that we get

$$
\begin{aligned}
|P_{r+1}| &\le 2\left\lfloor \frac{\lfloor s-\sqrt{s}\rfloor}{s-\lfloor s-\sqrt{s}\rfloor} \right\rfloor + 1 \\
&\le \frac{2\,(s-\sqrt{s})}{\sqrt{s}} + 1 \\
&= 2\sqrt{s} - 1 \\
&< 2\lfloor\sqrt{s}\rfloor + 1.
\end{aligned}
$$

Since |P_{r+1}| is an odd integer (all augmenting paths have odd length), we can conclude
that |P_{r+1}| ≤ 2⌊√s⌋ − 1. By Theorem 4, lengths never decrease along the sequence, so each
of P_1, P_2, . . . , P_r has length no greater than 2⌊√s⌋ − 1. Therefore, these r odd lengths
are distributed among at most ⌊√s⌋ different values, namely 1, 3, . . . , 2⌊√s⌋ − 1. The
remaining paths P_{r+1}, . . . , P_s number s − r = ⌈√s⌉. If P_1, . . . , P_r take all ⌊√s⌋
values, then |P_r| = 2⌊√s⌋ − 1 = |P_{r+1}|, so the remaining paths contribute at most
⌈√s⌉ − 1 ≤ ⌊√s⌋ new values; otherwise, P_1, . . . , P_r take at most ⌊√s⌋ − 1 values and the
remaining paths contribute at most ⌈√s⌉ ≤ ⌊√s⌋ + 1. In either case, the total number of
distinct integers in the sequence does not exceed 2⌊√s⌋.
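To make the quantities in this proof concrete, here is a worked instance with s = 10, a value chosen only for illustration:

$$
\begin{aligned}
\lfloor\sqrt{10}\rfloor &= 3, \qquad \lceil\sqrt{10}\rceil = 4,\\
r &= \lfloor 10 - \sqrt{10}\rfloor = \lfloor 6.837\ldots\rfloor = 6, \qquad s - r = 4 = \lceil\sqrt{10}\rceil,\\
|P_7| &\le 2\lfloor 6/4\rfloor + 1 = 3 \;\le\; 2\lfloor\sqrt{10}\rfloor - 1 = 5.
\end{aligned}
$$

Here the direct bound |P_7| ≤ 3 is even tighter than the general one. By Theorem 4, the lengths |P_1|, . . . , |P_6| therefore lie in {1, 3}, and P_7, . . . , P_10 add at most 4 further values. That gives at most 6 = 2⌊√10⌋ distinct lengths in all.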
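Thus our improved algorithm runs O(√|V|) phases, as opposed to O(|V|) iterations for the
original version. The reason is that the paths used in successive phases form a sequence of
shortest augmenting paths as in Theorem 5, and each phase leaves no augmenting path of its own
length. Different phases therefore use different lengths, of which there are at most
2⌊√s⌋ ≤ 2√(|V|/2). This is a substantial improvement, since the worst-case cost of an
iteration remains unchanged. For bipartite graphs, we can therefore construct a maximum
matching in O(√|V| · |E|) time.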