Advanced Algorithms
Class Notes for Monday, October 23, 2012
Min Ye, Mingfu Shao, and Bernard Moret

Greedy Algorithms (continued)

The best-known application where the greedy algorithm is optimal is surely the construction of a minimum spanning tree (MST). Most of you are familiar with Prim's and Kruskal's algorithms, but they are just two members of a rather large family of greedy algorithms for the MST problem. We look at two more algorithms in that family and then prove that any algorithm in that family indeed returns a minimum spanning tree.

Greedy algorithms for the MST problem

Greedy algorithms use only a fraction of the information available about the problem. Bottom-up greedy methods build solutions piece by piece (starting from the empty set) by selecting, among the remaining pieces, the one that optimizes the value of the partial solution, while ensuring that the subset selected so far can be extended into a feasible solution. Top-down greedy methods (much less common) build solutions from the full set of pieces by removing one piece at a time, selecting a piece whose removal optimizes the value of the remaining collection while ensuring that this collection continues to contain feasible solutions. Thus in both cases, and indeed for any greedy algorithm, the idea is to produce the largest immediate gain while maintaining feasibility.

In the case of the MST problem, we are given an undirected graph G = (V, E) and a length (distance, weight, etc.) for each edge, d : E → R. Our aim is to find a spanning tree (a tree connecting all vertices) of minimum total weight. MSTs have two important properties, usually called the cycle property and the cut property. The cycle property says that, for any cycle X ⊆ E of the graph, if the weight of an edge e ∈ X is strictly larger than the weight of every other edge of X, then e cannot belong to any MST of the graph.
(Phrased slightly differently: for any cycle X in the graph, if the weight of an edge e ∈ X is larger than or equal to the weight of every other edge of X, then there exists an MST that does not contain e.) Recall that a cut in a graph is a partition of the vertices of the graph into two nonempty subsets, Y = {S, V − S}, with S ⊂ V and S ≠ ∅; this partition induces a set of cut edges, the cut-set C_Y = {{u, v} ∈ E | u ∈ S, v ∈ V − S}. In a connected graph (and we are always given a connected graph for the MST problem), there is a bijection between cuts and cut-sets, so that we can specify one or the other. The cut property says that, for any cut-set C_Y in G, if the weight of an edge e ∈ C_Y is strictly smaller than the weight of every other edge of C_Y, then e belongs to every MST of the graph. (Phrased slightly differently: for any cut-set C_Y in G, if the weight of an edge e ∈ C_Y is smaller than or equal to the weight of every other edge of C_Y, then there exists an MST that contains e.)

A bottom-up greedy method for the MST starts from an empty set and adds one piece at a time (an edge or a vertex, although the distinction is somewhat artificial), subject to not creating cycles, until a tree is built (or, equivalently, until no candidate piece remains). In contrast, a top-down greedy method for the MST starts from the entire graph and removes one edge at a time, subject to not disconnecting the graph, until no more edges can be removed. In each case, the choice is made on the basis of the contribution made by the chosen edge (or vertex) to the current collection, a purely local decision.

A top-down approach is the Reverse-Delete algorithm, first mentioned by Kruskal (but not to be confused with Kruskal's algorithm, which is, of course, a bottom-up approach). The Reverse-Delete algorithm starts with the original graph and deletes edges from it. The algorithm works as follows. Start with graph G, which contains a list of edges E.
Sort the edges in E in decreasing order by weight, then go through the edges one by one, from largest weight down to smallest weight. For each edge in turn, check whether deleting the edge would disconnect the graph; if not, remove that edge.

The proof of optimality for this algorithm is quite straightforward, using the cycle property. If the graph G has no cycles, then it is a tree and thus it is its own unique (and hence also optimal) spanning tree. As long as there remains a cycle in the graph, we do not have a tree and must remove at least one edge. When Reverse-Delete deletes an edge e, that edge lies on at least one cycle (since its removal does not disconnect the graph). Every edge examined before e and still present was a bridge when it was examined and remains one, since the graph only loses edges, and a bridge lies on no cycle; every other edge on a cycle through e has not yet been examined and so weighs no more than e. Hence e is an edge of largest weight on every cycle in which it appears, and the cycle property (in its weak form) ensures that the current graph has an MST that does not include e. By induction on the deletions, the current graph always contains a minimum spanning tree of the original graph, so the tree that remains at the end is a minimum spanning tree.

We can view the bottom-up methods as proceeding by coalescing equivalence classes. An equivalence class consists of a set of vertices with an associated set of edges that form a minimum spanning tree for the vertices in the class. Initially, each vertex is the sole element of its equivalence class and its associated set of edges is empty. When the algorithm terminates, only one equivalence class remains and the associated set of edges defines a minimum spanning tree. At each step of the algorithm, we select an edge with an endpoint in each of two equivalence classes and coalesce these two classes, thereby combining two trees into one larger tree. (Edges with both endpoints in the same equivalence class are permanently excluded, since their selection would create a cycle.) In order to minimize the increase in the value of the objective function, greediness dictates that the allowable edge of least cost be chosen next. This choice can be made with or without additional constraints.
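The two strategies described so far can be made concrete in a few lines. The Python sketch below is a minimal illustration under our own assumptions, not code from these notes: edges are hypothetical `(u, v, weight)` triples, `reverse_delete` rechecks connectivity from scratch for every edge (simple but slow), and the bottom-up side keeps the equivalence classes in a union-find (disjoint-set) forest and always takes the cheapest allowable edge, which is the unconstrained choice called Kruskal's algorithm in the next paragraph.

```python
from collections import defaultdict


def is_connected(vertices, edges):
    """Depth-first search: True if every vertex is reachable from an arbitrary start."""
    adj = defaultdict(list)
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(vertices)


def reverse_delete(vertices, edges):
    """Top-down: scan edges by decreasing weight, deleting every non-bridge."""
    kept = sorted(edges, key=lambda e: e[2], reverse=True)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if is_connected(vertices, trial):
            kept = trial   # edge i lies on a cycle: delete it
        else:
            i += 1         # edge i is a bridge: it must stay
    return kept


def kruskal(vertices, edges):
    """Bottom-up: coalesce equivalence classes, cheapest allowable edge first."""
    parent = {v: v for v in vertices}  # every vertex starts as its own class

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # endpoints in different classes: edge is allowable
            parent[ru] = rv   # merge the two classes (and their trees)
            tree.append((u, v, w))
    return tree


V = {"a", "b", "c", "d"}
E = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4), ("b", "d", 5)]
print(sum(w for _, _, w in reverse_delete(V, E)))
print(sum(w for _, _, w in kruskal(V, E)))
```

On this five-edge example both calls should print 7: Reverse-Delete discards the edges of weight 5 and 3, while the bottom-up scan rejects the same two edges because their endpoints already share a class.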
At one extreme, we can apply no additional constraint and always choose the shortest edge that combines two spanning trees into a larger spanning tree; at the other extreme, we can designate a special equivalence class that must be involved in every merge operation. The first approach can be viewed as selecting edges; it is known as Kruskal's algorithm, after J. B. Kruskal, who first presented it in 1956. The second is best viewed (at least for programming purposes) as selecting vertices (adding them one by one to a single partial spanning tree) and is known as Prim's algorithm, after R. C. Prim, who presented it in 1957. A third, more general, approach is in fact the oldest algorithm proposed for the MST and among the oldest algorithms formally defined in Computer Science; it is due to O. Boruvka, who first published it in 1926 as a method of constructing an efficient electrical network for the region of Moravia. Boruvka's algorithm considers all current equivalence classes (spanning trees for subsets of vertices) at once and joins each to its closest "neighbor" (where the distance is defined as the length of the shortest edge with one endpoint in each of the two equivalence classes), subject to not creating a cycle.

We can prove the correctness of all three coalescence-based algorithms (Kruskal's, Prim's, and Boruvka's) at once by proving the correctness of the more general algorithm.

Theorem 1. If, at each step of the algorithm, an arbitrary equivalence class T_i is selected, but the edge selected for inclusion is the smallest that has exactly one endpoint in T_i, then the final tree that results is a minimum spanning tree.

Proof. As G is connected, there is always at least one allowable edge at each iteration. As each iteration can proceed, irrespective of our choice of T_i, and as an iteration decreases the number of equivalence classes by one, the algorithm terminates. The proof is by induction, though we present it somewhat informally.
Let T_A be a spanning tree produced by the above algorithm and let T_M be a minimum spanning tree. We give a procedure for transforming T_M into T_A. We form a sequence of trees, each slightly different from its predecessor, with the property that all trees in the sequence have the same total edge length. Since T_A is at the far end of this sequence, it must also be a minimum spanning tree.

Label the edges of T_A by the iteration on which they entered the tree. Let e_i be the edge of lowest index that is present in T_A but not in T_M. Adding e_i to T_M forms a cycle. The length of e_i is greater than or equal to that of every other edge in the cycle: otherwise T_M would not be a minimum spanning tree, because removing any edge of the cycle produces a spanning tree, and removing an edge longer than e_i, if one existed, would reduce the total cost. Now, when e_i was added to the forest of trees that eventually became T_A, it connected an arbitrarily chosen tree, T_i, to some other tree. Traverse the cycle starting from the endpoint of e_i that was in T_i and going in the direction that does not immediately traverse e_i. At some point in this traversal, we first encounter an edge with exactly one endpoint in T_i. It might be the first edge we encounter, or it might be many edges into the traversal, but such an edge must exist since the other endpoint of e_i is not in T_i. Furthermore, this edge, call it ê, cannot be e_i. Now the length of e_i cannot exceed that of ê, because ê was an allowable edge at iteration i but was not selected; thus the two edges have equal length. We replace ê with e_i in T_M: the resulting tree has the same total length. Note that ê may or may not be an edge of T_A, but if it is, then its index is greater than i, so that our new tree first differs from T_A at some index greater than i.
Replacing T_M with this new minimum spanning tree, we continue this process until there are no differences, i.e., until T_M has been transformed into T_A.

Iterative improvement methods

Our next class of methods are those that start with a complete solution structure and proceed to refine it through successive iterations; at each iteration, an improvement is made on a local basis. This is a more powerful approach than the greedy approach: whereas a greedy approach never alters any choice it has already made, an iterative improvement approach has no problem doing so. Moreover, the number of iterations is not fixed (indeed, bounding the number of iterations is the major problem in analyzing an iterative improvement algorithm), whereas the number of steps taken by a greedy algorithm is simply the number of elements included in the solution.

Matching and flow

Matching and network flow are the two most important problems for which an iterative improvement method delivers optimal solutions. We first consider the maximum matching problem in bipartite graphs.

Maximum bipartite matching

A matching in a graph is a subset of edges no two of which share an endpoint; a maximum matching is a matching of maximum cardinality. A graph is said to be bipartite if its vertices can be partitioned into two sets in such a way that every edge of the graph has one endpoint in each set, so that the vertices of each set form an independent set. Matching in bipartite graphs is one of the fundamental optimization problems, as it is used for assignment problems, that is, problems where one wants to find the optimal way to assign, say, work crews to jobs: problems solved every day on construction sites, in factories, in airlines and railways, as well as in job scheduling for computing systems.
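Both definitions can be checked mechanically. The Python sketch below (the helper names `is_matching` and `bipartition` are our own, not from the notes) tests whether an edge set is a matching and looks for a bipartition by breadth-first 2-coloring, returning None when some edge joins two vertices of the same color, that is, when the graph contains an odd cycle.

```python
from collections import deque


def is_matching(edges):
    """True if no two edges of `edges` share an endpoint."""
    used = set()
    for u, v in edges:
        if u in used or v in used:
            return False
        used.update((u, v))
    return True


def bipartition(vertices, edges):
    """2-color each component by BFS; return (left, right), or None if
    some edge joins two vertices of the same color (odd cycle)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = {}
    for source in vertices:
        if source in color:
            continue
        color[source] = 0
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in color:
                    color[y] = 1 - color[x]
                    queue.append(y)
                elif color[y] == color[x]:
                    return None
    left = {v for v in vertices if color[v] == 0}
    return left, set(vertices) - left


# The small 4-vertex graph used in the next paragraph.
V = ["a", "b", "1", "2"]
E = [("a", "1"), ("a", "2"), ("b", "1")]
print(bipartition(V, E))
print(is_matching([("a", "2"), ("b", "1")]), is_matching([("a", "1"), ("a", "2")]))
```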
We describe an iterative improvement algorithm for this problem: an algorithm that refines the current solution through local changes, using a step that can be repeated many times, each time improving the quality of the solution. In the problem of maximum bipartite matching, in order to improve an existing matching, we must start by identifying unmatched vertices of degree at least 1; we need at least one such vertex on each side of the bipartite graph. Consider the trivial 4-vertex bipartite graph with vertex set {a, b, 1, 2} and edge set {{a, 1}, {a, 2}, {b, 1}}, with current matching M = {{a, 1}}. Clearly there exists a larger matching, namely M* = {{a, 2}, {b, 1}}. Note that, in transforming M into M*, the set of matched vertices simply gains two new members, but the set of matched edges, while larger by one, may have nothing in common with the previous set. In this trivial example, we could have started our search at vertex b, which has just one neighbor, vertex 1; but vertex 1 was already matched, so we had to unmatch it (undoing a previous decision), which left its previous "mate," vertex a, unmatched; following the matched edge back to vertex a, we found that a had an unmatched neighbor, vertex 2. We thus identified a path of three edges, the first and the last unmatched, the middle one matched; by flipping the status of each edge, from matched to unmatched and vice versa, we replaced a path with one matched edge by the same path with two matched edges. Let us formally define this type of path.

Let G = (V, E) be a graph and M be a matching. An alternating path with respect to M is a path whose edges alternate between M and E − M. If, in addition, the path is of odd length and its first and last vertices are unmatched, then the alternating path is called an augmenting path.
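These definitions can be checked directly on the 4-vertex example. In this Python sketch (the function names are our own), undirected edges are stored as frozensets so that {a, 1} and {1, a} compare equal; `is_augmenting` tests each condition of the definition in turn, and `augment` flips the status of every edge on the path by taking the symmetric difference M ⊕ P.

```python
def edge_set(pairs):
    """Represent undirected edges as frozensets, so {a, 1} == {1, a}."""
    return {frozenset(p) for p in pairs}


def is_augmenting(path, matching, edges):
    """Check that the vertex list `path` is an augmenting path w.r.t. `matching`."""
    E, M = edge_set(edges), edge_set(matching)
    matched = {v for e in M for v in e}
    steps = [frozenset(p) for p in zip(path, path[1:])]
    return (
        len(set(path)) == len(path)                    # simple path
        and all(s in E for s in steps)                 # uses edges of G
        and len(steps) % 2 == 1                        # odd length
        and path[0] not in matched and path[-1] not in matched
        # edges alternate: positions 0, 2, ... unmatched; 1, 3, ... matched
        and all((s in M) == (i % 2 == 1) for i, s in enumerate(steps))
    )


def augment(matching, path):
    """Flip the status of every edge on the path: M ⊕ P."""
    return edge_set(matching) ^ edge_set(zip(path, path[1:]))


E = [("a", "1"), ("a", "2"), ("b", "1")]
M = [("a", "1")]
P = ["b", "1", "a", "2"]
print(is_augmenting(P, M, E))
print(augment(M, P))
```

The first call should print True, and the augmented set is the larger matching M* = {{a, 2}, {b, 1}}.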
It is called an augmenting path because we can use it to augment the size of the matching: whereas an alternating path may have the same number of edges in M and in E − M, or one more in M, or one more in E − M, an augmenting path must have one more edge in E − M than in M. Moreover, because of the definition of matchings, it is safe to flip the status of every edge in an augmenting path from matched to unmatched, and vice versa: none of the vertices on the augmenting path can be the endpoint of a matched edge other than those already on the path. Augmenting paths are thus the tool we need to design an iterative improvement algorithm: in general terms, we start with an arbitrary matching (possibly the empty one), then we search for an augmenting path in the graph; if one is found, we augment the matching by flipping the status of all edges along the augmenting path; if none is found, we stop. The obvious question, at this point, is whether the absence of any augmenting path indicates just a local maximum or a global one. The answer is that it indicates a global maximum: if G has no augmenting path with respect to M, then M is a maximum matching, that is, it is optimal. We phrase this result positively.

Theorem 2. Let G be a graph, M* an optimal matching for G, and M any matching for G such that |M| < |M*|. Then G has an augmenting path with respect to M.

This result is due to the French mathematician Claude Berge and hence known as Berge's theorem. The proof is deceptively simple, but note that it is nonconstructive.

Proof. Let M ⊕ M* denote the symmetric difference of M and M*, i.e., M ⊕ M* = (M ∪ M*) − (M ∩ M*), and consider the subgraph G′ = (V, M ⊕ M*).
All vertices of G′ have degree two or less, because they have at most one incident edge from each of M and M*; moreover, every connected component of G′ is one of: (i) a single vertex; (ii) a cycle of even length, with edges drawn alternately from M and M*; or (iii) a path with edges drawn alternately from M and M*. Components of types (i) and (ii) contain equally many edges of M and M*; as the cardinality of M* exceeds that of M, there exists at least one path composed of alternating edges from M and M*, with more edges from M* than from M. This path must begin and end with edges from M*, and its endpoints are unmatched in M because the path is a connected component of G′ (an M-edge at an endpoint would belong to M − M* and would extend the component); hence this path is an augmenting path.

Berge's theorem shows that the use of augmenting paths not only enables us to improve on the quality of an initial solution, but also enables us to obtain an optimal solution. Note that the definitions of alternating paths and augmenting paths hold just as well for nonbipartite graphs as for bipartite ones, and so does Berge's theorem. The next step is to develop an algorithm for finding augmenting paths, and this is where the difference between bipartite and nonbipartite graphs will show.

In a bipartite graph, any augmenting path has odd length and so begins on one side of the graph and ends on the other. Thus a search algorithm can simply start at any unmatched vertex on one side of the graph, say the left side, and traverse any edge to the other side. If the endpoint on the right side is also unmatched, then an augmenting path, consisting of a single unmatched edge, has been found. If the other endpoint is matched, then the algorithm traverses that matched edge to the left side and follows any unmatched edge, if one exists, to an unvisited vertex on the right side. The process is repeated until either an augmenting path is found or a dead end on the left side is reached. Unmatched edges are always traversed from the left side to the right side and matched edges in the opposite direction.
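The search just described can be sketched in Python as follows. This is a minimal illustration under our own assumptions: vertex names are distinct across the two sides, the hypothetical `adj` maps each left vertex to its right-side neighbors, and `mate` records partners for both sides. The search runs breadth-first from all unmatched left vertices at once (the systematic exploration discussed next), and each search yields a single augmenting path, so the overall cost is O(|V| · |E|).

```python
from collections import deque


def find_augmenting_path(adj, left, mate):
    """Breadth-first search from every unmatched left vertex at once.

    Unmatched edges are followed left -> right, matched edges right -> left.
    Returns a shortest augmenting path as a vertex list (left end first),
    or None if the current matching admits no augmenting path."""
    parent = {}                                      # right vertex -> left vertex that reached it
    queue = deque(u for u in left if u not in mate)  # all free left vertices
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in parent:                          # right vertex already reached
                continue
            parent[v] = u
            if v not in mate:                        # free right vertex: retrace the path
                path = [v]
                while True:
                    path.append(parent[path[-1]])    # unmatched edge, back to the left
                    if path[-1] not in mate:         # reached a free left vertex
                        return path[::-1]
                    path.append(mate[path[-1]])      # matched edge, back to the right
            queue.append(mate[v])                    # forced move along the matched edge
    return None


def max_matching_simple(adj, left):
    """One augmenting path per search: at most |V|/2 searches of O(|E|) each."""
    mate = {}                                        # vertex -> partner, both sides
    while (path := find_augmenting_path(adj, left, mate)) is not None:
        for i in range(0, len(path), 2):             # flip: pair path[0]-path[1], path[2]-path[3], ...
            mate[path[i]], mate[path[i + 1]] = path[i + 1], path[i]
    return mate


adj = {"a": ["1", "2"], "b": ["1"]}                  # the 4-vertex example
mate = max_matching_simple(adj, ["a", "b"])
print({u: mate[u] for u in ["a", "b"]})
```

On the 4-vertex example, the first search matches a with 1 and the second finds exactly the path b, 1, a, 2 from the text, ending with a matched to 2 and b matched to 1.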
If a dead end is reached, we must explore other paths until we find an augmenting path or run out of possibilities. In developing an augmenting path, choices arise in only two places: in selecting an initial unmatched vertex and in selecting an unmatched edge out of a vertex on the left side. In order to examine all possibilities for augmenting paths, we need to explore these choices in some systematic way; because every augmenting path makes exactly the same contribution of one additional matched edge, we should search for the shortest augmenting paths. Thus we use a breadth-first search of the graph, starting simultaneously from every unmatched vertex on the left side. If any of the current active vertices (the frontier of the BFS, which always consists of vertices on the left side) has an unmatched neighbor on the right, we are done. Otherwise, from each neighbor on the right, we follow the matched edge of which it is an endpoint back to a vertex on the left and repeat the process. Thus the BFS increases path lengths by 2 at each iteration, because the move back to the left along matched edges is forced. The BFS takes O(|E|) time, as it cannot look at an edge more than twice (once from each end); as the number of augmenting paths we may find is O(|V|) (each one adds an edge to a matching, which has at most |V|/2 edges), the running time of this BFS augmenting strategy is O(|V| · |E|). Since the input size is Θ(|V| + |E|), the time taken is more than linear, but no more than quadratic, in the size of the input.

However, we are wasting a lot of time: each new BFS starts from scratch and, most likely, will follow many paths already followed in the previous BFS. And each BFS produces a single new augmenting path, yet there will typically be a number of augmenting paths in a graph with respect to a matching, especially if that matching is small. Instead of stopping at the first unmatched neighbor on the right, we could finish that stage of BFS, collecting all unmatched neighbors on the right.
Doing so would not increase the worst-case running time of the BFS, yet might yield multiple augmenting paths of the same length. However, we can use multiple augmenting paths only if they are vertex-disjoint, since otherwise we could create conflicting assignments of vertices or even edges. The BFS might discover that k_l of the current active vertices (on the left) have an unmatched neighbor on the right, but some of these neighbors might be shared; if there are k_r unmatched neighbors on the right, the number of disjoint augmenting paths is at most min{k_l, k_r}. The number may be smaller, however, because this sharing of vertices can occur at any stage along the alternating paths. Thus we must adjust our BFS to record backpointers, so that we can retrace paths from the right-side unmatched vertices reached in the search, and we must add a backtracing phase, which retraces at most one path for each unmatched vertex reached on the right-hand side. The backtracing is itself a graph search. Specifically, for each left-side vertex encountered during the breadth-first search, we record its distance from the closest unmatched left-side vertex, passing as before through matched right-side vertices. We use this information to run a (backward) depth-first search from each unmatched right-side vertex discovered during the BFS: during a DFS we consider only edges that take us one level closer to the unmatched left-side vertices. When we discover an augmenting path, we eliminate the vertices along this path from consideration by any remaining DFS, thereby ensuring that our augmenting paths are vertex-disjoint. We can hope that the number of augmenting paths found during each search is more than a constant, so that the number of searches (iterations) to be run is significantly decreased, preferably to o(|V|). We characterize the gain to be realized through a series of small theorems; these theorems apply equally to general graphs and bipartite graphs.
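The phase-based scheme just described can be sketched compactly; it is essentially the algorithm usually credited to Hopcroft and Karp (1973). The Python below is our own illustration, not the notes' procedure verbatim: the depth-first searches run forward from the unmatched left vertices along the BFS layers rather than backward from the right side, layers beyond the first one that reaches an unmatched right vertex are ignored so that only shortest augmenting paths are used, and a left vertex that leads to a dead end is discarded for the rest of the phase. The function names and the `(mate_l, phases)` return value are hypothetical.

```python
from collections import deque

INF = float("inf")


def hopcroft_karp(adj, left, right):
    """adj[u] lists the right-side neighbors of left vertex u.
    Returns (mate_l, phases): partner of each left vertex (or None)
    and the number of phases (searches) performed."""
    mate_l = {u: None for u in left}
    mate_r = {v: None for v in right}
    dist = {}

    def bfs():
        """Layer left vertices by alternating distance from the free left
        vertices; return the layer from which a free right vertex is first
        reachable (INF if none)."""
        queue = deque()
        for u in left:
            if mate_l[u] is None:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        shortest = INF
        while queue:
            u = queue.popleft()
            if dist[u] >= shortest:
                continue                   # deeper layers are not on shortest paths
            for v in adj[u]:
                w = mate_r[v]
                if w is None:
                    shortest = dist[u]     # BFS order: first hit is the minimum
                elif dist[w] == INF:
                    dist[w] = dist[u] + 1  # forced matched edge back to the left
                    queue.append(w)
        return shortest

    def dfs(u, shortest):
        """Extend an augmenting path from u, moving down one layer per step."""
        for v in adj[u]:
            w = mate_r[v]
            if w is None:
                ok = dist[u] == shortest
            else:
                ok = dist[w] == dist[u] + 1 <= shortest and dfs(w, shortest)
            if ok:
                mate_l[u], mate_r[v] = v, u  # flip the edges along the path
                return True
        dist[u] = INF                        # dead end: exclude u for this phase
        return False

    phases = 0
    while (shortest := bfs()) != INF:
        phases += 1
        for u in left:
            if mate_l[u] is None:
                dfs(u, shortest)
    return mate_l, phases


mate, phases = hopcroft_karp({"a": ["1", "2"], "b": ["1"]}, ["a", "b"], ["1", "2"])
print(mate, phases)
```

On the 4-vertex example the first phase matches a with 1 and the second phase finds the length-3 path b, 1, a, 2, so two phases are used. The recursive `dfs` is subject to Python's default recursion limit, so very long augmenting paths would call for an iterative rewrite.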
We begin with a sharper version of Berge's theorem, whose proof refines the earlier argument.

Theorem 3. Let M_1 and M_2 be two matchings in some graph G = (V, E), with |M_1| > |M_2|. Then the subgraph G′ = (V, M_1 ⊕ M_2) contains at least |M_1| − |M_2| vertex-disjoint augmenting paths with respect to M_2.

Proof. Recall that every connected component of G′ is one of: (i) a single vertex; (ii) a cycle of even length, with edges alternately drawn from M_1 and M_2; or (iii) a path with edges alternately drawn from M_1 and M_2. Let C_i = (V_i, E_i) be the ith connected component and define δ(C_i) = |E_i ∩ M_1| − |E_i ∩ M_2|. From our previous observations, we know that δ(C_i) must be one of −1, 0, or 1, and that it equals 1 exactly when C_i is an augmenting path with respect to M_2. Now we have

$$\sum_i \delta(C_i) = |M_1 - M_2| - |M_2 - M_1| = |M_1| - |M_2|,$$

so that, since no δ(C_i) exceeds 1, at least |M_1| − |M_2| components C_i have δ(C_i) = 1, which proves the theorem.

This tells us that many disjoint augmenting paths exist, but says nothing about their lengths, nor about how to find them. Indeed, if we take M_2 to be the empty set and M_1 to be a maximum matching, the theorem tells us that the original graph contains enough disjoint augmenting paths to go from no matching at all to a maximum matching in a single step! But these paths will normally be of various lengths, and finding such a set directly is no easier than finding a maximum matching in the first place. We will focus on finding a set of disjoint shortest augmenting paths (thus all of the same length) with respect to the current matching; such a set will normally not contain enough paths to obtain a maximum matching in one step. Our next result is intuitively obvious, but the theorem proves it and also makes it precise: successive shortest augmenting paths cannot become shorter.

Theorem 4. Let G = (V, E) be a graph, with M a non-maximum matching, P a shortest augmenting path with respect to M, and P′ any augmenting path with respect to the augmented matching M ⊕ P.
Then we have

$$|P'| \ge |P| + 2\,|P \cap P'|.$$

Proof. The matching M ⊕ P ⊕ P′ contains two more edges than M, so that, by Theorem 3, M ⊕ (M ⊕ P ⊕ P′) = P ⊕ P′ contains (at least) two vertex-disjoint augmenting paths with respect to M; call them P_1 and P_2. Thus we have |P ⊕ P′| ≥ |P_1| + |P_2|. Since P is a shortest augmenting path with respect to M, we also have |P| ≤ |P_1| and |P| ≤ |P_2|, so that |P ⊕ P′| ≥ 2|P|. Since P ⊕ P′ = (P ∪ P′) − (P ∩ P′), and the edges of P ∩ P′ are counted in both |P| and |P′|, we can write

$$|P \oplus P'| = |P| + |P'| - 2\,|P \cap P'|.$$

Substituting in the preceding inequality gives 2|P| ≤ |P| + |P′| − 2|P ∩ P′|, which is our conclusion.

An interesting corollary (especially given our BFS approach to finding shortest augmenting paths) is that two successive shortest augmenting paths have the same length only if they are disjoint: equal lengths force P ∩ P′ = ∅, and a shared vertex would force a shared edge, because every vertex of P is matched in M ⊕ P by an edge of P, and an augmenting path through a matched vertex must use that vertex's matched edge. Our new algorithm uses all disjoint shortest augmenting paths it finds, as follows.

• Begin with an arbitrary (possibly empty) matching.
• Repeatedly find a maximal set of vertex-disjoint shortest augmenting paths and use them all to augment the current matching, until no augmenting path can be found.

Now we are ready to prove the crucial result on the worst-case number of searches required to obtain a maximum matching. The result itself concerns the number of different lengths that can be found among the collection of shortest augmenting paths produced in successive searches.

Theorem 5. Let s be the cardinality of a maximum matching and let P_1, P_2, ..., P_s be a sequence of shortest augmenting paths that build on the empty matching. Then the number of distinct integers in the sequence |P_1|, |P_2|, ..., |P_s| cannot exceed 2⌊√s⌋.
The intuition here is that, as we start the first search with an empty matching (or a small one), there will be many disjoint shortest augmenting paths, and so there will be many repeated values toward the beginning of the sequence of path lengths; toward the end, however, augmenting paths are more complex, longer, and rarer, so that most values toward the end of the sequence will be distinct. The proof formalizes this intuition by using a "midpoint" in the number of distinct values that is very far along the sequence of augmenting paths: not at s/2, but at r = ⌊s − √s⌋ = s − ⌈√s⌉.

Proof. Let r = ⌊s − √s⌋ and consider M_r, the rth matching in the augmentation sequence. Since |M_r| = r and the maximum matching has cardinality s > r, we conclude (using Theorem 3, the extended form of Berge's theorem) that there exist at least s − r vertex-disjoint augmenting paths with respect to M_r. (These need not be the remaining augmenting paths in our sequence, P_{r+1}, P_{r+2}, ..., P_s.) Take s − r of them. Being vertex-disjoint, together they contain at most all r edges of M_r, so that the shortest of them contains at most ⌊r/(s − r)⌋ such edges (the maximum being reached if the edges of M_r are evenly distributed among the s − r paths) and thus at most 2⌊r/(s − r)⌋ + 1 edges in all. But a shortest augmenting path with respect to M_r is precisely the next one picked, so that we get

$$
\begin{aligned}
|P_{r+1}| &\le 2\left\lfloor \frac{\lfloor s - \sqrt{s} \rfloor}{s - \lfloor s - \sqrt{s} \rfloor} \right\rfloor + 1 \\
&\le \frac{2\,(s - \sqrt{s})}{\sqrt{s}} + 1 \\
&= 2\sqrt{s} - 1 \\
&< 2\lfloor \sqrt{s} \rfloor + 1.
\end{aligned}
$$

Since |P_{r+1}| is an odd integer (all augmenting paths have odd length), we conclude that |P_{r+1}| ≤ 2⌊√s⌋ − 1. By Theorem 4, lengths never decrease along the sequence, so each of P_1, P_2, ..., P_r also has length no greater than 2⌊√s⌋ − 1. Therefore, these r lengths are distributed among at most ⌊√s⌋ different (odd) values, and this bound can be reached only if |P_r| = |P_{r+1}| = 2⌊√s⌋ − 1. Finally, |P_{r+1}|, |P_{r+2}|, ..., |P_s| cannot contribute more than s − r = ⌈√s⌉ distinct values. If P_1, ..., P_r use all ⌊√s⌋ values, then |P_{r+1}| repeats one of them and the remaining paths add at most ⌈√s⌉ − 1 new values; otherwise P_1, ..., P_r use at most ⌊√s⌋ − 1 values. Since ⌈√s⌉ ≤ ⌊√s⌋ + 1, in either case the total number of distinct integers in the sequence does not exceed 2⌊√s⌋.
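As a numerical check of the counting in this proof, take a hypothetical maximum matching of size s = 10 (our own example, not from the notes):

```latex
\begin{aligned}
&s = 10, \qquad \lfloor \sqrt{10} \rfloor = 3, \qquad \lceil \sqrt{10} \rceil = 4, \\
&r = \lfloor 10 - \sqrt{10} \rfloor = \lfloor 6.837\ldots \rfloor = 6, \qquad s - r = 4 = \lceil \sqrt{10} \rceil, \\
&|P_7| \le 2 \left\lfloor \tfrac{6}{4} \right\rfloor + 1 = 3 \;\le\; 2 \lfloor \sqrt{10} \rfloor - 1 = 5, \\
&|P_1|, \dots, |P_6| \le |P_7| \le 3 \;\Rightarrow\; \text{at most the two values } 1, 3, \\
&|P_7|, \dots, |P_{10}| \;\Rightarrow\; \text{at most } 4 \text{ values}, \\
&\text{total} \le 2 + 4 = 6 = 2 \lfloor \sqrt{10} \rfloor .
\end{aligned}
```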
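Thus our improved algorithm iterates O(√|V|) times, as opposed to Θ(|V|) times in the worst case for the original version: each search exhausts one path length (after augmenting along a maximal set of vertex-disjoint shortest augmenting paths, every remaining augmenting path is strictly longer), and s ≤ |V|/2. This is a substantial improvement, since the worst-case cost of an iteration remains unchanged at O(|E|). For bipartite graphs, we can therefore construct a maximum matching in O(√|V| · |E|) time.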
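To see Theorems 4 and 5 at work, the hypothetical experiment below (the random graph, its seed, and the function names are our own choices) builds a maximum matching one shortest augmenting path at a time, starting from the empty matching. It records the length of each path and checks that the lengths never decrease and take at most 2⌊√s⌋ distinct values. Exact lengths depend on the random graph, so no particular output is claimed.

```python
import math
import random
from collections import deque


def shortest_augmenting_path(adj, left, mate):
    """Alternating BFS from all free left vertices; returns a shortest
    augmenting path (vertex list, left end first) or None."""
    parent = {}
    queue = deque(u for u in left if u not in mate)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in parent:
                continue
            parent[v] = u
            if v not in mate:
                path = [v]
                while True:
                    path.append(parent[path[-1]])
                    if path[-1] not in mate:
                        return path[::-1]
                    path.append(mate[path[-1]])
            queue.append(mate[v])
    return None


def path_lengths(adj, left):
    """Augment one shortest path at a time from the empty matching and
    record the length (number of edges) of each path used."""
    mate, lengths = {}, []
    while (path := shortest_augmenting_path(adj, left, mate)) is not None:
        lengths.append(len(path) - 1)
        for i in range(0, len(path), 2):
            mate[path[i]], mate[path[i + 1]] = path[i + 1], path[i]
    return lengths


# Hypothetical random bipartite graph (fixed seed, edge probability 0.06).
rng = random.Random(1)
left = [f"L{i}" for i in range(40)]
right = [f"R{j}" for j in range(40)]
adj = {u: [v for v in right if rng.random() < 0.06] for u in left}

lengths = path_lengths(adj, left)
s = len(lengths)  # one edge per augmentation, so s = maximum matching size
print(lengths)
print("nondecreasing:", all(a <= b for a, b in zip(lengths, lengths[1:])))
print("distinct:", len(set(lengths)), "bound 2*floor(sqrt(s)):", 2 * math.isqrt(s))
```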