18.438 Advanced Combinatorial Optimization
Updated April 29, 2012
Lecture 20
Lecturer: Michel X. Goemans
Scribe: Claudio Telha (Nov 17, 2009)
Given a finite set V with n elements, a function f : 2^V → Z is submodular if for all X, Y ⊆ V,
f(X ∪ Y) + f(X ∩ Y) ≤ f(X) + f(Y). Submodular functions frequently arise in combinatorial
optimization. For example, the cut function in a weighted undirected graph and the rank function
of a matroid are both submodular.
Submodular function minimization is the problem of finding the global minimum of f. It is solvable in polynomial time (assuming that f can be evaluated efficiently), as shown by Grötschel, Lovász and Schrijver [2] using the ellipsoid method. In this lecture, we describe a recent efficient combinatorial algorithm for submodular function minimization by Iwata and Orlin [1]; the description given here closely follows [1].
1 Preliminaries
Given a submodular function f such that f(∅) = 0, the base polyhedron B(f) is defined as follows:

    B(f) = {x ∈ R^V : x(S) ≤ f(S) for all S ⊆ V, and x(V) = f(V)},

where, for x ∈ R^V and S ⊆ V, x(S) denotes Σ_{v∈S} x(v).
For any linear ordering L of the elements of V, we define L(v) = {u : u ≤L v}. It can be
shown that yL(v) ≡ f(L(v)) − f(L(v) − {v}) defines a vertex of B(f). Moreover, any vertex of B(f)
can be represented in this way. We will make extensive use of this representation.
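To make the representation concrete, here is a minimal sketch (in Python; not part of the lecture) that computes yL from a linear ordering. The rank function of the uniform matroid U_{2,4} serves as an assumed example of a submodular function:

```python
# Sketch: the vertex y_L of B(f) determined by a linear ordering L,
# via y_L(v) = f(L(v)) - f(L(v) - {v}), where L(v) = {u : u <=_L v}.

def vertex_from_ordering(f, order):
    """Return {v: y_L(v)} for the linear ordering given as a list."""
    y, prefix = {}, set()
    for v in order:
        prefix.add(v)
        # marginal value of v on top of the elements preceding it in L
        y[v] = f(frozenset(prefix)) - f(frozenset(prefix - {v}))
    return y

# Example: rank function of the uniform matroid U_{2,4},
# f(S) = min(|S|, 2), which is submodular with f(empty set) = 0.
f = lambda S: min(len(S), 2)
y = vertex_from_ordering(f, ['a', 'b', 'c', 'd'])
# y == {'a': 1, 'b': 1, 'c': 0, 'd': 0}; note y(V) = 2 = f(V).
```

Observe that the marginals telescope, so y(V) = f(V) holds automatically; that x(S) ≤ f(S) holds for every S is the nontrivial part of the claim above.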
Let us define the positive part function x⁺(v) = max{x(v), 0} and the negative part function
x⁻(v) = min{x(v), 0}. The following min-max relation establishes the connection between the base
polyhedron and submodular function minimization:

    min_{W⊆V} f(W) = max_{x∈B(f)} Σ_{v∈V} x⁻(v).
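On tiny instances the relation can be checked by brute force. The sketch below (a hypothetical example, not from the lecture) enumerates all subsets for the left-hand side and all vertices yL for the right-hand side; on this particular instance the maximum of Σ_{v∈V} x⁻(v) happens to be attained at a vertex, although in general it need not be:

```python
from itertools import chain, combinations, permutations

# f(S) = min(|S|, 1) - 2*[c in S]: a modular shift of a matroid rank
# function, hence submodular, with f(empty set) = 0.
V = ('a', 'b', 'c')
f = lambda S: min(len(S), 1) - 2 * ('c' in S)

def subsets(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

min_f = min(f(frozenset(S)) for S in subsets(V))  # min_W f(W)

best = float('-inf')  # max over the vertices y_L of sum_v y_L^-(v)
for order in permutations(V):
    prefix, total = set(), 0
    for v in order:
        prefix.add(v)
        total += min(f(frozenset(prefix)) - f(frozenset(prefix - {v})), 0)
    best = max(best, total)

# Here min_f == best == -1, attained e.g. by W = {c} and the
# ordering that places c first.
```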
Iwata and Orlin's algorithm solves the maximization problem max_{x∈B(f)} Σ_{v∈V} x⁻(v), and simultaneously finds a minimizer W of f that is maximal inclusion-wise. This is how we will prove the correctness of the min-max relation. Instead of using a combinatorial algorithm like the one we describe, we could also use the ellipsoid algorithm, as we can separate efficiently over B(f) and the function Σ_{v∈V} x⁻(v) is concave (many details are missing here, including how to start and when to stop).
We assume in this lecture that f is given by a value oracle. In other words, we are only allowed
to query the value f (S) for any S ⊆ V . We express the running time of the algorithm as a function
of the number of oracle calls to f and the time to execute all the arithmetic operations.
Section 2 describes the algorithm, and Section 3 analyzes its running time.
2 Iwata and Orlin's algorithm
One question is how to certify that the x that the algorithm delivers is in B(f ). The key here is
to express it as a convex combination of a polynomial number of extreme points of B(f ) (which we
know correspond to linear orderings).
The algorithm maintains 4 objects:
• A set Λ of linear orderings of V representing vertices of B(f). Initially, Λ has a unique (arbitrary) ordering L.

• An element x ∈ B(f) defined by a convex combination x = Σ_{L∈Λ} λL yL of vertices in Λ. Initially, λL = 1 for our unique ordering L in Λ.

• Distance labels dL : V → Z⁺ for each linear ordering L ∈ Λ. Initially, dL(v) = 0 for all v ∈ V.

• A set W ⊆ V containing every minimizer of f. At the end of the algorithm, W will be the maximal minimizer, inclusion-wise. Initially, W = V.
During the execution of the algorithm, the following properties of the distance labels are preserved:
1. If x(u) ≤ 0 then dL(u) = 0 for all L ∈ Λ.

2. If u ≤L v then dL(u) ≤ dL(v). (This can be restated as: dL(v) < dL(u) implies v <L u.)

3. For all L, L′ ∈ Λ, |dL(u) − dL′(u)| ≤ 1.
We need some additional definitions. We say that the distance labels {dL}_{L∈Λ} are valid if the three properties above hold. We also define the minimum distance labels:

    dmin(u) ≡ min_{L∈Λ} dL(u).

Note that Property 3 of the distance labels ensures that dL(u) ∈ {dmin(u), dmin(u) + 1}.
We group the elements v ∈ V according to their value dmin(v), and we say that the minimum distance labels have a gap at k ≥ 1 if there exists an element v ∈ V with dmin(v) = k and there is no element u ∈ V with dmin(u) = k − 1. The importance of this gap notion lies in the following lemma, which allows us to delete all the elements of V after a gap.
Lemma 1 If the distance labels are valid and they have a gap at k ≥ 1, then any minimizer X of f satisfies X ⊆ W := {v ∈ V : dmin(v) < k}.

Proof: Consider any u ∈ W and v ∉ W. Since there is a gap at k, no element has dmin equal to k − 1, so dmin(u) ≤ k − 2. For each L ∈ Λ, we have dL(u) ≤ dmin(u) + 1 ≤ (k − 2) + 1 < dmin(v) ≤ dL(v). By Property 2, u <L v for all L ∈ Λ; thus W is a prefix of every ordering in Λ. In particular, yL(W) = Σ_{w∈W} yL(w) = f(W) for every L ∈ Λ, and therefore x(W) = f(W). Note that any element v ∈ V − W satisfies dmin(v) > 0, so by Property 1, x(v) > 0. This implies that for any set X ⊄ W, we have

    f(X ∪ W) ≥ x(X ∪ W) = x(X − W) + x(W) > x(W) = f(W),

which, by submodularity, implies

    f(X ∩ W) ≤ f(X) + f(W) − f(X ∪ W) < f(X),

so X cannot be a minimizer.
Whenever there is a gap at some level k ≥ 1, the algorithm updates W . Lemma 1 ensures that
every minimizer of f is still contained in W . If v ∈ V is removed from W , we set dL (v) = n for all
L ∈ Λ.
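As a small illustration (with hypothetical labels, not from the lecture), detecting a gap and shrinking W might look like this:

```python
# Sketch: find the smallest gap k >= 1 in the minimum distance labels,
# then shrink W to {v : dmin(v) < k} as Lemma 1 permits.

def find_gap(dmin):
    """Return the smallest k >= 1 such that level k is occupied
    but level k - 1 is empty, or None if there is no gap."""
    levels = set(dmin.values())
    for k in sorted(levels):
        if k >= 1 and (k - 1) not in levels:
            return k
    return None

dmin = {'a': 0, 'b': 0, 'c': 2, 'd': 3}    # nobody at level 1
k = find_gap(dmin)                          # k == 2
W = {v for v, d in dmin.items() if d < k}   # {'a', 'b'}
# Elements 'c' and 'd' are removed; their labels would be reset to n.
```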
We need to discuss when to stop the algorithm. We will use the value η = max_{v∈W} x(v) as a measure of the progress of the algorithm towards an optimal solution. We can stop when η ≤ 0, as this would imply equality between f(W) and Σ_{v∈V} x⁻(v). If we further assume that the submodular function takes integer values, we can stop when η < 1/n. Indeed, every possible minimizer satisfies X ⊆ W, and therefore, if η < 1/n, then f(W) = x(W) = x(X) + x(W − X) < f(X) + 1, so f(W) ≤ f(X) by integrality. Thus, W is a minimizer.
We now have all the main elements to describe the structure of the algorithm. Step 4 will be described in detail in the next subsection.
1. Choose an arbitrary order L0 and initialize Λ = {L0}, x = yL0, dL0 ≡ 0 and W = V.

2. Compute η = max_{v∈W} x(v). If η < 1/n, return W as the global minimizer of f. Otherwise, define δ = η/(4n) and find µ such that δ ≤ µ ≤ η − δ and there is no u ∈ W with µ − δ < x(u) < µ + δ.
3. Find u = arg min{dmin (v) : v ∈ W, x(v) > µ} and set l = dmin (u). Let L1 be a linear ordering
in Λ with dL1 (u) = dmin (u).
4. Using the push procedure described in Section 2.1, build a new linear ordering L2 from L1 .
Update Λ and x.
5. If there is a gap at any k ≥ 1, update W and set the distance labels dL (v) to n, for every
vertex removed and every L ∈ Λ. Go to Step 2.
2.1 The push procedure
We now describe the push procedure (Step 4). This step updates x by adding one linear order L2 to Λ and possibly removing one linear order L1 from Λ. If the current x is given by x = Σ_{L∈Λ} λL yL, the updated x′ satisfies

    x′ − x = ε(yL2 − yL1),

so this means that λL1 is decreased by ε and λL2 is set to ε. The step size ε will be set as large as possible under some conditions.
First, let us see how to choose L1. We find a value µ ∈ [δ, η − δ] such that the interval (µ − δ, µ + δ) does not contain any value in {x(v)}_{v∈W} (this µ is guaranteed to exist by the pigeonhole principle). Finally, we find u and L1 such that

    u = arg min_{u : x(u) > µ} dmin(u),    dL1(u) = dmin(u).
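The choice of µ can be sketched as follows (hypothetical values; the candidate grid is one way to realize the pigeonhole argument): with δ = η/(4n), the 2n candidates µ = (2i − 1)δ have pairwise disjoint intervals (µ − δ, µ + δ), while there are at most n values x(u), so some candidate interval is empty:

```python
# Sketch: choose mu in [delta, eta - delta] with no x(u) inside
# (mu - delta, mu + delta), scanning 2n disjoint candidate intervals.

def find_mu(x_values, eta, n):
    delta = eta / (4 * n)
    for i in range(1, 2 * n + 1):
        mu = (2 * i - 1) * delta          # candidates delta, 3*delta, ...
        if all(not (mu - delta < xv < mu + delta) for xv in x_values):
            return mu, delta
    raise AssertionError("unreachable: pigeonhole guarantees a free interval")

x_values = [0.9, 0.5, -0.2]               # hypothetical x(u) for u in W
mu, delta = find_mu(x_values, eta=0.9, n=3)
# Here delta = 0.075 and the very first interval (0, 0.15) is already
# free of x-values, so mu = 0.075 is returned.
```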
Once L1 is fixed, we define L2 . Consider the sets S = {v ∈ W : dL1 (v) = dmin (u)} and
R = {v ∈ S : x(v) > µ}. It is clear that u ∈ R ⊆ S and that the elements in S are consecutive in
L1 . We obtain L2 from L1 by preserving the position of the elements in V − S, and then moving
all the elements in R just after all the elements in S \ R, preserving the relative order in both sets.
See Figure 1 for an illustration.
[Figure omitted: L1 with the elements of R interleaved inside S, and L2 with R moved after S \ R.]

Figure 1: The elements in R are moved backwards inside of S. The order in R and S \ R is preserved.
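This reordering, together with the label update defined next, can be sketched as follows (representing an ordering as a Python list; the names are illustrative, not from the lecture):

```python
# Sketch: build L2 from L1 by moving the elements of R just after those
# of S \ R, preserving the relative order within both groups, and bump
# the distance labels of R by one.

def push_reorder(L1, S, R, d_L1):
    first = min(S, key=L1.index)          # S is consecutive in L1
    L2 = []
    for v in L1:
        if v not in S:
            L2.append(v)
        elif v == first:                  # splice the whole block in once
            L2.extend(w for w in L1 if w in S and w not in R)   # S \ R
            L2.extend(w for w in L1 if w in R)                  # then R
    d_L2 = {v: d + 1 if v in R else d for v, d in d_L1.items()}
    return L2, d_L2

L1 = ['p', 'a', 'b', 'c', 'd', 'q']
S, R = {'a', 'b', 'c', 'd'}, {'a', 'c'}
d_L1 = {'p': 0, 'a': 1, 'b': 1, 'c': 1, 'd': 1, 'q': 2}
L2, d_L2 = push_reorder(L1, S, R, d_L1)
# L2 == ['p', 'b', 'd', 'a', 'c', 'q']; the labels of 'a' and 'c'
# become 2, and the labels stay nondecreasing along L2 (Property 2).
```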
We also need to define new distance labels for L2. This actually justifies our choice of R. Indeed, observe that we can increase the label of all elements of R by 1 unit and still stay feasible (since (i) all elements of S have the same label and (ii) all elements v of R have dL1(v) = dmin(v) because of the choice of u). Thus, the following labels are valid for L2:

    dL2(v) = dL1(v) + 1  if v ∈ R,
    dL2(v) = dL1(v)      if v ∉ R.
In order to conclude the procedure, we need to define ε. We choose ε as the largest value satisfying two conditions. The first one is to keep λL1 nonnegative; this imposes the condition ε ≤ λL1. The second condition requires that x′(v) ≥ µ in R and x′(v) ≤ µ in S \ R, both of which hold for the original x. If ε = λL1, we call the update a saturating push. Otherwise, we call it a non-saturating push. Note that a saturating push replaces L1 by L2 in the convex combination defining x.
This completes the description of the algorithm. We conclude this section with a simple observation about how the vector x changes with each push. The total change is

    x′ − x = ε(yL2 − yL1).

It is not hard to see that yL2(v) = yL1(v) when v ∉ S. By submodularity we also have that yL2(v) ≤ yL1(v) when v ∈ R, and yL2(v) ≥ yL1(v) when v ∈ S \ R. It follows that

    x(v) = x′(v)  for v ∉ S,
    x(v) ≥ x′(v)  for v ∈ R,
    x(v) ≤ x′(v)  for v ∈ S \ R.

Intuitively, we are thus making progress, as we are decreasing some large x(v) (and increasing some smaller ones) and we are increasing some distance labels. In the next section we analyze the running time of the algorithm.
3 Running time
We first prove that the number of saturating and non-saturating pushes is poly(n, log M ), where M
is the maximum absolute value of f . In both cases, we use a potential function argument.
Lemma 2 The number of non-saturating pushes is O(n³ log(nM)).

Proof: Consider the potential function Φ(x) = Σ_{v∈W} (x⁺(v))². Let us prove that if x′ is an update of x using a non-saturating push, then

    Φ(x′) ≤ Φ(x) (1 − 1/(16n³)).

Let Q⁺ = {v ∈ S \ R : x′(v) > 0}. Then,

    Φ(x) − Φ(x′) = Σ_{v∈R∪Q⁺} [(x⁺(v))² − (x′(v))²]
                 = Σ_{v∈R∪Q⁺} (x⁺(v) − x′(v)) (x⁺(v) + x′(v))
                 ≥ (2µ − δ) Σ_{v∈Q⁺} (x⁺(v) − x′(v)) + (2µ + δ) Σ_{v∈R} (x(v) − x′(v))
                 = 2µ Σ_{v∈Q⁺∪R} (x⁺(v) − x′(v)) + δ [ Σ_{v∈Q⁺} (x′(v) − x⁺(v)) + Σ_{v∈R} (x(v) − x′(v)) ].

Note that x⁺(v) ≥ x′(v) for v ∈ S \ (Q⁺ ∪ R), and thus we can write:

    Φ(x) − Φ(x′) ≥ 2µ Σ_{v∈S} (x⁺(v) − x′(v)) + δ [ Σ_{v∈Q⁺} (x′(v) − x⁺(v)) + Σ_{v∈R} (x(v) − x′(v)) ].

But the first term is ≥ 0, as x⁺(v) ≥ x(v) and Σ_{v∈S} x(v) = Σ_{v∈S} x′(v). Therefore:

    Φ(x) − Φ(x′) ≥ δ [ Σ_{v∈Q⁺} (x′(v) − x⁺(v)) + Σ_{v∈R} (x(v) − x′(v)) ].

All these terms are ≥ 0 and, by construction, we know that |x⁺(v) − x′(v)| ≥ δ for some v; therefore Φ(x) − Φ(x′) ≥ δ² for a non-saturating push, and Φ(x) − Φ(x′) is nonnegative for a saturating push. Furthermore, Φ(x) ≤ nη² = 16n³δ², and thus Φ(x′) ≤ Φ(x) (1 − 1/(16n³)) for every non-saturating push.

Note that if we start the algorithm with x = x0, then Φ(x0) ≤ nM², so after O(n³ log(nM)) non-saturating pushes, the current x satisfies Φ(x) < 1/n². Since this condition implies the termination condition η < 1/n, the result follows.
Lemma 3 The number of saturating pushes is O(n⁵ log(nM)).

Proof: We introduce another potential function φ(Λ) = Σ_{L∈Λ} Σ_{v∈V} (n − dL(v)). Note that in any push operation, the distance labels of L2 are no smaller than the distance labels of L1, and at least one of the labels increases. It follows that a saturating push, which replaces L1 by L2, decreases the potential function by at least one. A non-saturating push adds a new linear order to Λ, and therefore it increases φ(Λ) by at most n². From this relation, Lemma 2, and the fact that φ(Λ) is always nonnegative, it follows that the number of saturating pushes is O(n⁵ log(nM)); otherwise the potential function would become negative.

Both kinds of pushes can be carried out in polynomial time, using O(n) oracle calls, so the algorithm runs in polynomial time in the oracle model.
References

[1] S. Iwata and J. B. Orlin. A simple combinatorial algorithm for submodular function minimization. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2009.

[2] M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.