18.438 Advanced Combinatorial Optimization
Updated April 29, 2012
Lecture 20
Lecturer: Michel X. Goemans
Scribe: Claudio Telha (Nov 17, 2009)
Given a finite set V with n elements, a function f : 2^V → Z is submodular if for all X, Y ⊆ V,
f(X ∪ Y) + f(X ∩ Y) ≤ f(X) + f(Y). Submodular functions frequently arise in combinatorial
optimization. For example, the cut function in a weighted undirected graph and the rank function
of a matroid are both submodular.
Submodular function minimization is the problem of finding the global minimum of f. It is solvable in polynomial time (assuming that f can be evaluated efficiently), as shown by Grötschel, Lovász and Schrijver [2] using the ellipsoid method. In this lecture, we describe a recent efficient combinatorial algorithm for submodular function minimization by Iwata and Orlin [1]; the description given here closely follows [1].
1 Preliminaries
Given a submodular function f such that f(∅) = 0, the base polyhedron B(f) is defined as follows:

    B(f) = {x ∈ R^V : x(S) ≤ f(S) for all S ⊆ V, and x(V) = f(V)},

where, for x ∈ R^V and S ⊆ V, x(S) denotes Σ_{v∈S} x(v).
For any linear ordering L of the elements of V, we define L(v) = {u : u ≤L v}. It can be
shown that yL(v) ≡ f(L(v)) − f(L(v) − {v}) defines a vertex of B(f). Moreover, any vertex of B(f)
can be represented in this way. We will make extensive use of this representation.
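To make the representation concrete, here is a minimal sketch (in Python; not part of the lecture) that computes yL from a linear ordering. The rank function of the uniform matroid U_{2,4} serves as an assumed example of a submodular function:

```python
# Sketch: the vertex y_L of B(f) determined by a linear ordering L,
# via y_L(v) = f(L(v)) - f(L(v) - {v}), where L(v) = {u : u <=_L v}.

def vertex_from_ordering(f, order):
    """Return {v: y_L(v)} for the linear ordering given as a list."""
    y, prefix = {}, set()
    for v in order:
        prefix.add(v)
        # marginal value of v on top of the elements preceding it in L
        y[v] = f(frozenset(prefix)) - f(frozenset(prefix - {v}))
    return y

# Example: rank function of the uniform matroid U_{2,4},
# f(S) = min(|S|, 2), which is submodular with f(empty set) = 0.
f = lambda S: min(len(S), 2)
y = vertex_from_ordering(f, ['a', 'b', 'c', 'd'])
# y == {'a': 1, 'b': 1, 'c': 0, 'd': 0}; note y(V) = 2 = f(V).
```

Observe that the marginals telescope, so y(V) = f(V) holds automatically; that x(S) ≤ f(S) holds for every S is the nontrivial part of the claim above.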
Let us define the positive part function x⁺(v) = max{x(v), 0} and the negative part function
x⁻(v) = min{x(v), 0}. The following min-max relation establishes the connection between the base
polyhedron and submodular function minimization:

    min_{W⊆V} f(W) = max_{x∈B(f)} Σ_{v∈V} x⁻(v).
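On tiny instances the relation can be checked by brute force. The sketch below (a hypothetical example, not from the lecture) enumerates all subsets for the left-hand side and all vertices yL for the right-hand side; on this particular instance the maximum of Σ_{v∈V} x⁻(v) happens to be attained at a vertex, although in general it need not be:

```python
from itertools import chain, combinations, permutations

# f(S) = min(|S|, 1) - 2*[c in S]: a modular shift of a matroid rank
# function, hence submodular, with f(empty set) = 0.
V = ('a', 'b', 'c')
f = lambda S: min(len(S), 1) - 2 * ('c' in S)

def subsets(V):
    return chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))

min_f = min(f(frozenset(S)) for S in subsets(V))  # min_W f(W)

best = float('-inf')  # max over the vertices y_L of sum_v y_L^-(v)
for order in permutations(V):
    prefix, total = set(), 0
    for v in order:
        prefix.add(v)
        total += min(f(frozenset(prefix)) - f(frozenset(prefix - {v})), 0)
    best = max(best, total)

# Here min_f == best == -1, attained e.g. by W = {c} and the
# ordering that places c first.
```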
Iwata and Orlin's algorithm solves the maximization problem max_{x∈B(f)} Σ_{v∈V} x⁻(v), and simultaneously finds a minimizer W of f that is maximal inclusion-wise. This is how we will prove the correctness of the min-max relation. Instead of using a combinatorial algorithm like the one we describe, we could also use the ellipsoid algorithm, as we can separate efficiently over B(f) and the function Σ_{v∈V} x⁻(v) is concave (many details are missing here, including how to start and when to stop).
We assume in this lecture that f is given by a value oracle. In other words, we are only allowed
to query the value f (S) for any S ⊆ V . We express the running time of the algorithm as a function
of the number of oracle calls to f and the time to execute all the arithmetic operations.
Section 2 describes the algorithm, and Section 3 analyzes its running time.
2 Iwata and Orlin's algorithm
One question is how to certify that the x that the algorithm delivers is in B(f ). The key here is
to express it as a convex combination of a polynomial number of extreme points of B(f ) (which we
know correspond to linear orderings).
The algorithm maintains 4 objects:
• A set Λ of linear orderings of V representing vertices of B(f). Initially, Λ has a unique (arbitrary) ordering L.

• An element x ∈ B(f) defined by a convex combination x = Σ_{L∈Λ} λL yL of vertices in Λ. Initially, λL = 1 for our unique ordering L in Λ.

• Distance labels dL : V → Z⁺ for each linear ordering L ∈ Λ. Initially, dL(v) = 0 for all v ∈ V.

• A set W ⊆ V containing every minimizer of f. At the end of the algorithm, W will be the maximal minimizer, inclusion-wise. Initially, W = V.
During the execution of the algorithm, the following properties of the distance labels are preserved:
1. If x(u) ≤ 0 then dL(u) = 0 for all L ∈ Λ.

2. If u ≤L v then dL(u) ≤ dL(v). (This can be restated as: dL(v) < dL(u) implies v <L u.)

3. For all L, L′ ∈ Λ, |dL(u) − dL′(u)| ≤ 1.
We need some additional definitions. We say that the distance labels {dL}_{L∈Λ} are valid if the three properties above hold. We also define the minimum distance labels:

    dmin(u) ≡ min_{L∈Λ} dL(u).

Note that Property 3 of the distance labels ensures that dL(u) ∈ {dmin(u), dmin(u) + 1}.
We group the elements v ∈ V according to their value dmin(v), and we say that the minimum distance labels have a gap at k ≥ 1 if there exists an element v ∈ V with dmin(v) = k and there is no element u ∈ V with dmin(u) = k − 1. The importance of this gap notion lies in the following lemma, which allows us to delete all the elements of V after a gap.
Lemma 1 If the distance labels are valid and they have a gap at k ≥ 1, then any minimizer X of f satisfies X ⊆ W := {v ∈ V : dmin(v) < k}.

Proof: Consider any u ∈ W and v ∉ W. Since there is a gap at k, no element has dmin equal to k − 1, so dmin(u) ≤ k − 2. For each L ∈ Λ, we have dL(u) ≤ dmin(u) + 1 ≤ (k − 2) + 1 < dmin(v) ≤ dL(v). By Property 2, u <L v for all L ∈ Λ; thus W is a prefix of every ordering in Λ. In particular, yL(W) = Σ_{w∈W} yL(w) = f(W) for every L ∈ Λ, and therefore x(W) = f(W). Note that any element v ∈ V − W satisfies dmin(v) > 0, so by Property 1, x(v) > 0. This implies that for any set X ⊄ W, we have

    f(X ∪ W) ≥ x(X ∪ W) = x(X − W) + x(W) > x(W) = f(W),

which, by submodularity, implies

    f(X ∩ W) ≤ f(X) + f(W) − f(X ∪ W) < f(X),

so X cannot be a minimizer.
Whenever there is a gap at some level k ≥ 1, the algorithm updates W . Lemma 1 ensures that
every minimizer of f is still contained in W . If v ∈ V is removed from W , we set dL (v) = n for all
L ∈ Λ.
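As a small illustration (with hypothetical labels, not from the lecture), detecting a gap and shrinking W might look like this:

```python
# Sketch: find the smallest gap k >= 1 in the minimum distance labels,
# then shrink W to {v : dmin(v) < k} as Lemma 1 permits.

def find_gap(dmin):
    """Return the smallest k >= 1 such that level k is occupied
    but level k - 1 is empty, or None if there is no gap."""
    levels = set(dmin.values())
    for k in sorted(levels):
        if k >= 1 and (k - 1) not in levels:
            return k
    return None

dmin = {'a': 0, 'b': 0, 'c': 2, 'd': 3}    # nobody at level 1
k = find_gap(dmin)                          # k == 2
W = {v for v, d in dmin.items() if d < k}   # {'a', 'b'}
# Elements 'c' and 'd' are removed; their labels would be reset to n.
```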
We need to discuss when to stop the algorithm. We will use the value η = max_{v∈W} x(v) as a measure of the progress of the algorithm towards an optimal solution. We can stop when η ≤ 0, as this would imply equality between f(W) and Σ_{v∈V} x⁻(v). If we further assume that the submodular function takes integer values, we can stop when η < 1/n. Indeed, every possible minimizer satisfies X ⊆ W, and therefore, if η < 1/n, then f(W) = x(W) = x(X) + x(W − X) < f(X) + 1, so f(W) ≤ f(X) by integrality. Thus, W is a minimizer.
We now have all the main elements to describe the structure of the algorithm. Step 4 will be described in detail in the next subsection.
1. Choose an arbitrary order L0 and initialize Λ = {L0}, x = yL0, dL0 ≡ 0 and W = V.

2. Compute η = max_{v∈W} x(v). If η < 1/n, return W as the global minimizer of f. Otherwise, define δ = η/(4n) and find µ such that δ ≤ µ ≤ η − δ and there is no u ∈ W with µ − δ < x(u) < µ + δ.
3. Find u = arg min{dmin (v) : v ∈ W, x(v) > µ} and set l = dmin (u). Let L1 be a linear ordering
in Λ with dL1 (u) = dmin (u).
4. Using the push procedure described in Section 2.1, build a new linear ordering L2 from L1 .
Update Λ and x.
5. If there is a gap at any k ≥ 1, update W and set the distance labels dL (v) to n, for every
vertex removed and every L ∈ Λ. Go to Step 2.
2.1 The push procedure
We now describe the push procedure (Step 4). This step updates x by adding one linear order L2 to Λ and possibly removing one linear order L1 from Λ. If the current x is given by x = Σ_{L∈Λ} λL yL, the updated x′ satisfies

    x′ − x = ε(yL2 − yL1),

so this means that λL1 is decreased by ε and λL2 is set to ε. The step size ε will be set as large as possible under some conditions.
First, let us see how to choose L1. We find a value µ ∈ [δ, η − δ] such that the interval (µ − δ, µ + δ) does not contain any value in {x(v)}_{v∈W} (this µ is guaranteed to exist by the pigeonhole principle). Finally, we find u and L1 such that

    u = arg min_{u : x(u) > µ} dmin(u),    dL1(u) = dmin(u).
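The choice of µ can be sketched as follows (hypothetical values; the candidate grid is one way to realize the pigeonhole argument): with δ = η/(4n), the 2n candidates µ = (2i − 1)δ have pairwise disjoint intervals (µ − δ, µ + δ), while there are at most n values x(u), so some candidate interval is empty:

```python
# Sketch: choose mu in [delta, eta - delta] with no x(u) inside
# (mu - delta, mu + delta), scanning 2n disjoint candidate intervals.

def find_mu(x_values, eta, n):
    delta = eta / (4 * n)
    for i in range(1, 2 * n + 1):
        mu = (2 * i - 1) * delta          # candidates delta, 3*delta, ...
        if all(not (mu - delta < xv < mu + delta) for xv in x_values):
            return mu, delta
    raise AssertionError("unreachable: pigeonhole guarantees a free interval")

x_values = [0.9, 0.5, -0.2]               # hypothetical x(u) for u in W
mu, delta = find_mu(x_values, eta=0.9, n=3)
# Here delta = 0.075 and the very first interval (0, 0.15) is already
# free of x-values, so mu = 0.075 is returned.
```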
Once L1 is fixed, we define L2 . Consider the sets S = {v ∈ W : dL1 (v) = dmin (u)} and
R = {v ∈ S : x(v) > µ}. It is clear that u ∈ R ⊆ S and that the elements in S are consecutive in
L1 . We obtain L2 from L1 by preserving the position of the elements in V − S, and then moving
all the elements in R just after all the elements in S \ R, preserving the relative order in both sets.
See Figure 1 for an illustration.
[Figure omitted: L1 with the elements of R interleaved inside S, and L2 with R moved after S \ R.]

Figure 1: The elements in R are moved backwards inside of S. The order in R and S \ R is preserved.
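This reordering, together with the label update defined next, can be sketched as follows (representing an ordering as a Python list; the names are illustrative, not from the lecture):

```python
# Sketch: build L2 from L1 by moving the elements of R just after those
# of S \ R, preserving the relative order within both groups, and bump
# the distance labels of R by one.

def push_reorder(L1, S, R, d_L1):
    first = min(S, key=L1.index)          # S is consecutive in L1
    L2 = []
    for v in L1:
        if v not in S:
            L2.append(v)
        elif v == first:                  # splice the whole block in once
            L2.extend(w for w in L1 if w in S and w not in R)   # S \ R
            L2.extend(w for w in L1 if w in R)                  # then R
    d_L2 = {v: d + 1 if v in R else d for v, d in d_L1.items()}
    return L2, d_L2

L1 = ['p', 'a', 'b', 'c', 'd', 'q']
S, R = {'a', 'b', 'c', 'd'}, {'a', 'c'}
d_L1 = {'p': 0, 'a': 1, 'b': 1, 'c': 1, 'd': 1, 'q': 2}
L2, d_L2 = push_reorder(L1, S, R, d_L1)
# L2 == ['p', 'b', 'd', 'a', 'c', 'q']; the labels of 'a' and 'c'
# become 2, and the labels stay nondecreasing along L2 (Property 2).
```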
We also need to define new distance labels for L2. This actually justifies our choice of R. Indeed, observe that we can increase the label of all elements of R by 1 unit and still stay feasible (since (i) all elements of S have the same label and (ii) all elements v of R have dL1(v) = dmin(v) because of the choice of u). Thus, the following labels are valid for L2:

    dL2(v) = dL1(v) + 1  if v ∈ R,
    dL2(v) = dL1(v)      if v ∉ R.
In order to conclude the procedure, we need to define ε. We choose ε as the largest value satisfying two conditions. The first one is to keep λL1 nonnegative; this imposes the condition ε ≤ λL1. The second condition requires that x′(v) ≥ µ in R and x′(v) ≤ µ in S \ R, both of which hold for the original x. If ε = λL1, we call the update a saturating push. Otherwise, we call it a non-saturating push. Note that a saturating push replaces L1 by L2 in the convex combination defining x.
This completes the description of the algorithm. We conclude this section with a simple observation about how the vector x changes with each push. The total change is

    x′ − x = ε(yL2 − yL1).

It is not hard to see that yL2(v) = yL1(v) when v ∉ S. By submodularity we also have that yL2(v) ≤ yL1(v) when v ∈ R, and yL2(v) ≥ yL1(v) when v ∈ S \ R. It follows that

    x(v) = x′(v)  for v ∉ S,
    x(v) ≥ x′(v)  for v ∈ R,
    x(v) ≤ x′(v)  for v ∈ S \ R.

Intuitively, we are thus making progress, as we are decreasing some large x(v) (and increasing some smaller ones) and we are increasing some distance labels. In the next section we analyze the running time of the algorithm.
3 Running time
We first prove that the number of saturating and non-saturating pushes is poly(n, log M ), where M
is the maximum absolute value of f . In both cases, we use a potential function argument.
Lemma 2 The number of non-saturating pushes is O(n³ log(nM)).

Proof: Consider the potential function Φ(x) = Σ_{v∈W} (x⁺(v))². Let us prove that if x′ is an update of x using a non-saturating push, then

    Φ(x′) ≤ Φ(x) (1 − 1/(16n³)).

Let Q⁺ = {v ∈ S \ R : x′(v) > 0}. Then,

    Φ(x) − Φ(x′) = Σ_{v∈R∪Q⁺} [(x⁺(v))² − (x′(v))²]
                 = Σ_{v∈R∪Q⁺} (x⁺(v) − x′(v)) (x⁺(v) + x′(v))
                 ≥ (2µ − δ) Σ_{v∈Q⁺} (x⁺(v) − x′(v)) + (2µ + δ) Σ_{v∈R} (x(v) − x′(v))
                 = 2µ Σ_{v∈Q⁺∪R} (x⁺(v) − x′(v)) + δ [ Σ_{v∈Q⁺} (x′(v) − x⁺(v)) + Σ_{v∈R} (x(v) − x′(v)) ].

Note that x⁺(v) ≥ x′(v) for v ∈ S \ (Q⁺ ∪ R), and thus we can write:

    Φ(x) − Φ(x′) ≥ 2µ Σ_{v∈S} (x⁺(v) − x′(v)) + δ [ Σ_{v∈Q⁺} (x′(v) − x⁺(v)) + Σ_{v∈R} (x(v) − x′(v)) ].

But the first term is ≥ 0, as x⁺(v) ≥ x(v) and Σ_{v∈S} x(v) = Σ_{v∈S} x′(v). Therefore:

    Φ(x) − Φ(x′) ≥ δ [ Σ_{v∈Q⁺} (x′(v) − x⁺(v)) + Σ_{v∈R} (x(v) − x′(v)) ].

All these terms are ≥ 0 and, by construction, we know that |x⁺(v) − x′(v)| ≥ δ for some v; therefore Φ(x) − Φ(x′) ≥ δ² for a non-saturating push, and Φ(x) − Φ(x′) is nonnegative for a saturating push. Furthermore, Φ(x) ≤ nη² = 16n³δ², and thus Φ(x′) ≤ Φ(x) (1 − 1/(16n³)) for every non-saturating push.

Note that if we start the algorithm with x = x0, then Φ(x0) ≤ nM², so after O(n³ log(nM)) non-saturating pushes, the current x satisfies Φ(x) < 1/n². Since this condition implies the termination condition η < 1/n, the result follows.
Lemma 3 The number of saturating pushes is O(n⁵ log(nM)).

Proof: We introduce another potential function φ(Λ) = Σ_{L∈Λ} Σ_{v∈V} (n − dL(v)). Note that in any push operation, the distance labels of L2 are no smaller than the distance labels of L1, and at least one of the labels increases. It follows that a saturating push, which replaces L1 by L2, decreases the potential function by at least one. A non-saturating push adds a new linear order to Λ, and therefore it increases φ(Λ) by at most n². From this relation, Lemma 2, and the fact that φ(Λ) is always nonnegative, it follows that the number of saturating pushes is O(n⁵ log(nM)); otherwise the potential function would become negative.

Both kinds of pushes can be carried out in polynomial time, using O(n) oracle calls, so the algorithm runs in polynomial time in the oracle model.
References

[1] S. Iwata and J. B. Orlin. A simple combinatorial algorithm for submodular function minimization. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2009.

[2] M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.