Chapter 6
Randomized Rounding
Randomized rounding is a powerful method in the design and analysis of approximation
algorithms. An approximation algorithm runs in polynomial time and determines a feasible
solution (for a usually NP-hard problem), which is provably not “too far” away from a
best-possible solution.
6.1 Basics
Many optimization problems can be formulated in terms of Integer Linear Programming (ILP). We are given an m×n-matrix A = (ai,j)_{1≤i≤m,1≤j≤n} and vectors b = (bi)_{1≤i≤m} and c = (cj)_{1≤j≤n}. Hence an instance is I = (A, b, c). Furthermore, we have a vector x = (xj)_{1≤j≤n} of integer variables, i.e., xj ∈ Z. The problem is to extremize, i.e., minimize or maximize, the linear function val(x) = ∑_{j=1}^n cj xj, subject to A · x ≥ b. val is also called the objective function and a vector x which satisfies A · x ≥ b is called a feasible solution. The set F(A, b) = {x ∈ Z^n : A · x ≥ b} is the feasible region. A feasible solution x* where the desired extremum is attained is called optimal. We often write opt(I) = val(x*).
Problem 6.1 Integer Linear Programming
Instance. Matrix A ∈ R^{m,n}, vectors b ∈ R^m, c ∈ R^n
Task. Solve the problem

    minimize   val(x) = ∑_{j=1}^n cj xj,
    subject to ∑_{j=1}^n ai,j xj ≥ bi   for i = 1, . . . , m,
               xj ∈ Z                   for j = 1, . . . , n.
As it is in general NP-hard to solve ILP, we often resort to approximation algorithms.
We consider a minimization problem w.l.o.g. here. A polynomial time algorithm alg is
called a c-approximation algorithm for Integer Linear Programming, if it produces
for any instance I = (A, b, c) a feasible solution x ∈ F (A, b) with objective value alg(I) =
val(x) satisfying
    alg(I) / opt(I) ≤ c.
If we have a maximization problem, then we require alg(I)/opt(I) ≥ c.
A common strategy for designing approximation algorithms for ILP is called LP-relaxation. If we replace each constraint xj ∈ Z by xj ∈ R, then the resulting problem can
be solved in polynomial time as it is an instance of (ordinary) Linear Programming
(LP). Let z be an optimal solution for the relaxed problem. Observe that
val(z) ≤ val(x∗ ),
if the original ILP problem was a minimization problem. However, the optimal solution z
for the LP is in general infeasible for the ILP problem. Hence we still have to “round” z to
a solution x ∈ F(A, b) (while keeping an eye on solution quality). For a c-approximation it is sufficient
to show an inequality of the form
val(x) ≤ c · val(z).
Then the resulting algorithm alg(I) = val(x) is c-approximate because
alg(I) = val(x) ≤ c · val(z) ≤ c · val(x∗ ) = c · opt(I).
The basic setting for randomized rounding is this: Suppose we have the requirements
xj ∈ {0, 1} (instead of xj ∈ Z) for j = 1, . . . , n. Suppose we replace each constraint
xj ∈ {0, 1} by xj ∈ [0, 1]. Then the values zj ∈ [0, 1] of any solution z can be interpreted
as probabilities. Thus we do the following: We randomly produce a vector X ∈ {0, 1}^n,
where Xj = 1 with probability zj and Xj = 0 otherwise. By solving the obtained LP with
solution z*, say, we then have a randomized algorithm alg with

    E[alg(I)] = E[ ∑_{j=1}^n cj Xj ] = ∑_{j=1}^n cj E[Xj] = ∑_{j=1}^n cj zj* = val(z*).
However, we still have to ensure that the solution X is also in the feasible region F (A, b)
(with a certain probability). Sometimes this task is easy, sometimes not. It depends on
the problem at hand.
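To make the setting concrete, here is a minimal sketch (assuming numpy and scipy are available; the covering-type instance is made up for illustration) that solves the [0, 1]-relaxation with linprog and rounds the fractional optimum randomly:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative instance: minimize c^T x subject to A x >= b, x in {0,1}^n.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(5, 8)).astype(float)  # hypothetical constraint matrix
A[:, 0] = 1.0                                      # make sure the relaxation is feasible
b = np.ones(5)
c = rng.uniform(1.0, 10.0, size=8)

# LP relaxation: replace x_j in {0,1} by x_j in [0,1]; linprog expects A_ub x <= b_ub,
# so A x >= b is rewritten as -A x <= -b.
res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0.0, 1.0)] * 8)
z = res.x                                          # fractional optimum z

# Randomized rounding: interpret z_j as the probability that X_j = 1.
X = (rng.random(8) < z).astype(int)
print("val(z):", c @ z, "val(X):", c @ X, "feasible:", bool(np.all(A @ X >= b)))
```

As the text explains, E[val(X)] equals val(z); feasibility of X is the part that needs problem-specific arguments.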
6.2 Minimum Set Cover
The Minimum Set Cover problem is a simple to state – yet quite general – NP-hard
problem. It is widely applicable in sometimes unexpected ways. The problem is the
following: We are given a set U (called universe) of n elements, a collection of sets S =
{S1 , . . . , Sk } where Si ⊆ U and ∪S∈S S = U , and a cost function c : S → R+ . The task is
to find a minimum cost subcollection S 0 ⊆ S that covers U , i.e., such that ∪S∈S 0 S = U .
Example 6.1. Consider this instance: U = {1, 2, 3}, S = {S1 , S2 , S3 } with S1 = {1, 2},
S2 = {2, 3}, S3 = {1, 2, 3} and cost c(S1 ) = 10, c(S2 ) = 50, and c(S3 ) = 100. These
collections cover U : {S1 , S2 }, {S3 }, {S1 , S3 }, {S2 , S3 }, {S1 , S2 , S3 }. The cheapest one is
{S1 , S2 } with cost equal to 60.
For each set S, we associate a variable xS ∈ {0, 1} that indicates whether we want to choose S or not. We may thus write solutions for Minimum Set Cover as a vector x ∈ {0, 1}^k.
With this, we write Minimum Set Cover as a mathematical program.
Problem 6.2 Minimum Set Cover
Instance. Universe U with n elements, collection S = {S1, . . . , Sk}, Si ⊆ U, a cost function c : S → R.
Task. Solve the problem

    minimize   val(x) = ∑_{S∈S} c(S) xS,
    subject to ∑_{S: e∈S} xS ≥ 1   for e ∈ U,
               xS ∈ {0, 1}         for S ∈ S.
Deterministic Rounding
Define the frequency of an element to be the number of sets it is contained in. Let f denote
the frequency of the most frequent element. The idea of the algorithm below is to include
those sets S into the cover for which the corresponding value zS in the optimal solution z
of the LP is “large enough”.
Algorithm 6.1 Simple Rounding Set Cover
Input. Universe U with n elements, collection S = {S1 , . . . , Sk }, Si ⊆ U , a cost function
c : S → R.
Output. Vector x ∈ {0, 1}^k
Step 1. Set x = 0, solve the LP relaxation below, and call the optimal solution z.

    minimize   val(x) = ∑_{S∈S} c(S) xS,
    subject to ∑_{S: e∈S} xS ≥ 1   for e ∈ U,
               xS ≥ 0              for S ∈ S.
Step 2. For each set S let xS = 1 if zS ≥ 1/f .
Step 3. Return x.
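For illustration, a compact sketch of Algorithm 6.1 (the encoding of the instance as a 0/1 element-set incidence matrix M and the use of scipy's linprog are choices made for this sketch, not part of the text):

```python
import numpy as np
from scipy.optimize import linprog

def simple_rounding_set_cover(M, cost):
    """M[e, s] = 1 iff element e lies in set S_s; cost[s] = c(S_s)."""
    n_elems, n_sets = M.shape
    f = int(M.sum(axis=1).max())                       # frequency of the most frequent element
    # LP relaxation: minimize c^T x s.t. M x >= 1, x >= 0 (rewritten for linprog).
    res = linprog(cost, A_ub=-M, b_ub=-np.ones(n_elems), bounds=[(0, None)] * n_sets)
    z = res.x
    return (z >= 1.0 / f).astype(int)                  # keep the sets with a large LP value

M = np.array([[1, 0, 1],    # element 1 lies in S1 and S3
              [1, 1, 1],    # element 2 lies in S1, S2 and S3
              [0, 1, 1]])   # element 3 lies in S2 and S3
cost = np.array([10.0, 50.0, 100.0])                   # the instance from Example 6.1
print(simple_rounding_set_cover(M, cost))              # [1 1 0], a cover of cost 60
```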
Theorem 6.2. Simple Rounding Set Cover is f -approximate for Minimum Set
Cover.
Proof. Let x be the solution returned by the algorithm and z be the optimal solution of
the LP. Consider an arbitrary element e ∈ U . Since e is in at most f sets, one of these
sets must be picked to the extent of at least 1/f in the fractional solution z. Thus e is
covered due to the definition of the algorithm and x is hence a feasible cover. We further
have xS ≤ f zS and thus
val(x) ≤ f · val(z) ≤ f · val(x∗ )
where x∗ is an optimal solution for the Minimum Set Cover problem.
Randomized Rounding
Another natural idea for rounding fractional solutions is to use randomization: For example, for the above relaxation, observe that the values zS are between zero and one. We
may thus interpret these values as probabilities for choosing a certain set S.
Here is the idea of the following algorithm: Solve the LP-relaxation optimally and call
the solution z. With probability zS include the set S into the cover.
This basic procedure yields a vector x with expected value equal to the optimal fractional solution value but might not cover all the elements. We thus repeat the procedure
“sufficiently many” times and include a set into our cover if it was included in any of
the iterations. We will show that O (log n) many iterations suffice for obtaining a feasible
cover, thus yielding an O (log n)-approximation algorithm.
Algorithm 6.2 Randomized Rounding Set Cover
Input. Universe U with n elements, collection S = {S1 , . . . , Sk }, Si ⊆ U , a cost function
c : S → R.
Output. Vector x ∈ {0, 1}^k
Step 1. Set x = 0, solve the LP relaxation below, and call the optimal solution z.

    minimize   val(x) = ∑_{S∈S} c(S) xS,
    subject to ∑_{S: e∈S} xS ≥ 1   for e ∈ U,
               xS ≥ 0              for S ∈ S.

Step 2. Repeat 2⌈ln n⌉ times: For each set S let xS = 1 with probability zS, leave xS unchanged otherwise.
Step 3. Return x.
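A sketch of Algorithm 6.2 in the same hypothetical encoding (again scipy's linprog is used for Step 1):

```python
import numpy as np
from scipy.optimize import linprog

def randomized_rounding_set_cover(M, cost, rng=np.random.default_rng()):
    """M[e, s] = 1 iff element e lies in set S_s; returns a 0/1 vector over the sets."""
    n_elems, n_sets = M.shape
    res = linprog(cost, A_ub=-M, b_ub=-np.ones(n_elems), bounds=[(0, None)] * n_sets)
    z = res.x                                          # optimal fractional cover
    x = np.zeros(n_sets, dtype=int)
    for _ in range(2 * int(np.ceil(np.log(n_elems)))): # 2*ceil(ln n) independent rounds
        x |= (rng.random(n_sets) < z).astype(int)      # include S with probability z_S
    return x                                           # may leave an element uncovered w.p. <= 1/n
```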
Theorem 6.3. Randomized Rounding Set Cover is 2⌈ln n⌉-approximate for Minimum Set Cover, in expectation.
Proof. Let z be an optimal solution for the LP. In each iteration in Step 2, for each set S
we let xS = 1 with probability zS . Then we have an increase in objective value in each
iteration of at most

    ∑_{S∈S} E[c(S) xS] = ∑_{S∈S} c(S) · Pr[xS = 1] = ∑_{S∈S} c(S) zS = val(z).
We estimate the probability that an element u ∈ U is covered in one iteration. Let u
be contained in k sets and let z1 , . . . , zk be the probabilities given in the solution z. Since
u is fractionally covered we have z1 + · · · + zk ≥ 1. With easy but tedious calculus we see
that – under this condition – the probability for u being covered is minimized when the zi
are all equal, i.e., z1 = · · · = zk = 1/k:

    Pr[u is covered] = 1 − (1 − z1) · · · (1 − zk) ≥ 1 − (1 − 1/k)^k ≥ 1 − 1/e.
Summing up this step: In each of the iterations in Step 2 the value of the solution x
constructed increases by at most val(z) in expectation. Each element is covered with
probability at least 1 − 1/e. But maybe we have not covered all elements after 2⌈ln n⌉ iterations. Here we show that, with high probability, we have covered all of them.
The probability that element u is not covered at the end of the algorithm, i.e., after 2⌈ln n⌉ iterations, is

    Pr[u is not covered] ≤ (1/e)^{2⌈ln n⌉} ≤ 1/n².

Thus the probability that there is an uncovered element is at most n/n² = 1/n.
Clearly,

    E[val(x)] ≤ 2⌈ln n⌉ · val(z) ≤ 2⌈ln n⌉ · val(x*),

where x* is an optimal solution for Minimum Set Cover. So, with high probability, the algorithm returns a feasible solution, whose expected value is at most 2⌈ln n⌉ · opt(I).
6.3 Maximum Satisfiability
The Satisfiability problem asks if a certain given Boolean formula has a satisfying assignment, i.e., one that makes the whole formula evaluate to true. There is a related
optimization problem called Maximum Satisfiability. The goal of this section is to develop a deterministic 3/4-approximation algorithm by first giving a randomized algorithm,
which will then be derandomized.
We are given the Boolean variables X = {x1, . . . , xn}, where each xi ∈ {0, 1}. A literal ℓi of the variable xi is either xi itself, called a positive literal, or its negation x̄i with truth value 1 − xi, called a negative literal. A clause is a disjunction C = (ℓ1 ∨ · · · ∨ ℓk) of literals ℓj of X; their number k is called the size of C. For a clause C let S_C^+ denote the set of its positive literals; similarly S_C^− the set of its negative literals. Let C denote the set of clauses.
A Boolean formula in conjunctive form is a conjunction of clauses F = C1 ∧ · · · ∧ Cm . Each
vector x ∈ {0, 1}^n is called a truth assignment. For any clause C and any such assignment
x we say that x satisfies C if at least one of the literals of C evaluates to 1.
The problem Maximum Satisfiability is the following: We are given a formula F
in conjunctive form and for each clause C a weight wC , i.e., a weight function w : C → N.
The objective is to find a truth assignment x ∈ {0, 1}^n that maximizes the total weight of
the satisfied clauses. As an important special case: If we set all weights wC equal to one,
then we seek to maximize the number of satisfied clauses.
Now we introduce for each clause C a variable zC ∈ {0, 1} which takes the value one if
and only if C is satisfied under a certain truth assignment x. Now we can formulate this
as an ILP as done in Problem 6.3.
The algorithm we aim for is a combination of two algorithms. One works better
for small clauses, the other for large clauses. Both are initially randomized but can be
derandomized using the method of conditional expectation, i.e., the final algorithm is
deterministic.
6.3.1 Randomized Algorithm
For each variable xi we define the random variable Xi that takes the value one with a
certain probability pi and zero otherwise. This induces, for each clause C, a random
variable ZC that takes the value one if C is satisfied under a (random) assignment and
zero otherwise.
Problem 6.3 Maximum Satisfiability
Instance. Formula F = C1 ∧ · · · ∧ Cm with m clauses over the n Boolean variables X = {x1, . . . , xn}. A weight function w : C → N.
Task. Solve the problem

    maximize   val(z) = ∑_{C∈C} wC zC,
    subject to ∑_{i∈S_C^+} xi + ∑_{i∈S_C^−} (1 − xi) ≥ zC   for C ∈ C,
               zC ∈ {0, 1}   for C ∈ C,
               xi ∈ {0, 1}   for i = 1, . . . , n.
Algorithm for Large Clauses
Consider this algorithm Randomized Large: For each variable xi with i = 1, . . . , n,
set Xi = 1 independently with probability 1/2 and Xi = 0 otherwise. Output X =
(X1 , . . . , Xn ).
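A direct sketch of Randomized Large (a clause is encoded here as a list of signed, 1-based variable indices, +i for xi and −i for x̄i; this encoding is only for illustration):

```python
import numpy as np

def randomized_large(n, clauses, weights, rng=np.random.default_rng()):
    """Flip a fair coin for every variable and report the assignment and satisfied weight."""
    X = rng.integers(0, 2, size=n)                         # Pr[X_i = 1] = 1/2, independently
    sat = lambda lit: X[abs(lit) - 1] == (1 if lit > 0 else 0)
    value = sum(w for c, w in zip(clauses, weights) if any(sat(l) for l in c))
    return X, value

# (x1 v x2) ^ (not x1 v x3) ^ (x1 v not x4) with unit weights, cf. Example 6.10.
X, value = randomized_large(4, [[1, 2], [-1, 3], [1, -4]], [1, 1, 1])
```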
Define the quantity

    αk = 1 − 2^{−k}.
Lemma 6.4. Let C be a clause. If size(C) = k then
E [ZC ] = αk .
Proof. A clause C is not satisfied, i.e., zC = 0 if and only if all its literals are set to zero.
By independence, the probability of this event is exactly 2^{−k} and thus

    E[ZC] = 1 · Pr[ZC = 1] + 0 · Pr[ZC = 0] = 1 − 2^{−k} = αk,
which was claimed.
Theorem 6.5. Randomized Large is 1/2-approximate for Maximum Satisfiability, in expectation.
Proof. By linearity of expectation, Lemma 6.4, and size(C) ≥ 1 we have
    E[val(Z)] = ∑_{C∈C} wC E[ZC] = ∑_{C∈C} wC α_{size(C)} ≥ (1/2) ∑_{C∈C} wC ≥ (1/2) val(z*),

where (x*, z*) is an optimal solution for Maximum Satisfiability. We have used the obvious bound val(z*) ≤ ∑_{C∈C} wC.
Algorithm for Small Clauses
Maybe the most natural LP-relaxation of the problem is:

    maximize   val(z) = ∑_{C∈C} wC zC,
    subject to ∑_{i∈S_C^+} xi + ∑_{i∈S_C^−} (1 − xi) ≥ zC   for C ∈ C,
               0 ≤ zC ≤ 1   for C ∈ C,
               0 ≤ xi ≤ 1   for i = 1, . . . , n.
In the sequel let (x, z) denote an optimum solution for this LP.
Consider this algorithm Randomized Small: Determine (x, z). For each variable xi
with i = 1, . . . , n, set Xi = 1 independently with probability xi and Xi = 0 otherwise.
Output X = (X1 , . . . , Xn ).
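A sketch of Randomized Small with the same hypothetical clause encoding; the LP relaxation above is set up for scipy's linprog (the variables are the xi followed by the zC):

```python
import numpy as np
from scipy.optimize import linprog

def randomized_small(n, clauses, weights, rng=np.random.default_rng()):
    """Solve the LP relaxation and set x_i = 1 with probability equal to its LP value."""
    m = len(clauses)
    c = np.concatenate([np.zeros(n), -np.asarray(weights, float)])  # maximize sum w_C z_C
    A_ub = np.zeros((m, n + m))
    b_ub = np.zeros(m)
    for j, clause in enumerate(clauses):
        for lit in clause:
            A_ub[j, abs(lit) - 1] = -1.0 if lit > 0 else 1.0
        A_ub[j, n + j] = 1.0                       # ... + z_C <= number of negative literals
        b_ub[j] = sum(1 for lit in clause if lit < 0)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * (n + m))
    x_frac = res.x[:n]
    return (rng.random(n) < x_frac).astype(int)    # biased coins from the LP optimum
```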
Define the quantity

    βk = 1 − (1 − 1/k)^k.
Lemma 6.6. Let C be a clause. If size(C) = k then
E[ZC] ≥ βk zC.
Proof. We may assume that the clause C has the form C = (x1 ∨ · · · ∨ xk ); otherwise
rename the variables and rewrite the LP.
The clause C is satisfied if X1 , . . . , Xk are not all set to zero. The probability of this
event is

    1 − ∏_{i=1}^k (1 − xi) ≥ 1 − ( (∑_{i=1}^k (1 − xi)) / k )^k = 1 − ( 1 − (∑_{i=1}^k xi) / k )^k ≥ 1 − (1 − zC/k)^k.
Above we firstly have used the arithmetic-geometric mean inequality, which states that
for non-negative numbers a1, . . . , ak we have

    (a1 + · · · + ak) / k ≥ (a1 · · · ak)^{1/k}.
Secondly the LP guarantees the inequality x1 + · · · + xk ≥ zC .
Now define the function g(t) = 1 − (1 − t/k)^k. This function is concave with g(0) = 0 and g(1) = 1 − (1 − 1/k)^k, which yields that we can bound

    g(t) ≥ t · (1 − (1 − 1/k)^k) = t · βk
for all t ∈ [0, 1].
Therefore

    Pr[ZC = 1] ≥ 1 − (1 − zC/k)^k ≥ βk zC

and the claim follows.
Theorem 6.7. Randomized Small is 1 − 1/e-approximate for Maximum Satisfiability, in expectation.
Proof. The function βk is decreasing with k. Therefore if all clauses are of size at most k,
then by Lemma 6.6

    E[val(Z)] = ∑_{C∈C} wC E[ZC] ≥ βk ∑_{C∈C} wC zC = βk val(z) ≥ βk val(z*),

where (x*, z*) is an optimal solution for Maximum Satisfiability. The claim follows since (1 − 1/k)^k < 1/e, i.e., βk > 1 − 1/e, for all k ∈ N.
3/4-Approximation Algorithm
Consider the algorithm Randomized Combine: With probability 1/2 run Randomized Large, otherwise run Randomized Small.
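Combined, reusing the two sketches above:

```python
import numpy as np

def randomized_combine(n, clauses, weights, rng=np.random.default_rng()):
    """With probability 1/2 run Randomized Large, otherwise Randomized Small."""
    if rng.random() < 0.5:
        X, _ = randomized_large(n, clauses, weights, rng)
        return X
    return randomized_small(n, clauses, weights, rng)
```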
Lemma 6.8. Let C be a clause, then

    E[ZC] ≥ (3/4) · zC.
Proof. Let the random variable B take the value zero if the first algorithm is run, one
otherwise. For a clause C let size(C) = k. By Lemma 6.4 and zC ≤ 1
E [ ZC | B = 0] = αk ≥ αk zC .
and by Lemma 6.6
E [ ZC | B = 1] ≥ βk zC .
Combining we have

    E[ZC] = E[ZC | B = 0] Pr[B = 0] + E[ZC | B = 1] Pr[B = 1] ≥ (zC/2) · (αk + βk).
It is tedious but relatively easy to see that αk + βk ≥ 3/2 for all k ∈ N.
Theorem 6.9. Randomized Combine is 3/4-approximate for Maximum Satisfiability, in expectation.
Proof. This follows from Lemma 6.8 and linearity of expectation: E[val(Z)] = ∑_{C∈C} wC E[ZC] ≥ (3/4) ∑_{C∈C} wC zC = (3/4) val(z) ≥ (3/4) val(z*).
6.3.2 Derandomization
The notion of derandomization refers to “turning” a randomized algorithm into a deterministic one (possibly at the cost of additional running time or deterioration of approximation guarantee). One of the several available techniques is the method of conditional
expectation.
We are given a Boolean formula F = C1 ∧· · ·∧Cm in conjunctive form over the variables
X = {x1 , . . . , xn }. Suppose we set x1 = 0, then we get a formula F0 over the variables
x2 , . . . , xn after simplification; if we set x1 = 1 then we get a formula F1 .
Example 6.10. Let F = (x1 ∨ x2 ) ∧ (x̄1 ∨ x3 ) ∧ (x1 ∨ x̄4 ) where X = {x1 , . . . , x4 }.
x1 = 0 : F0 = (x2) ∧ (x̄4)
x1 = 1 : F1 = (x3 )
Figure 6.1: Derandomization tree for a formula F. The root F (level 0) branches via x1 = 0 and x1 = 1 into F0 and F1 (level 1), whose subtrees are T(F0) and T(F1).
Applying this recursively, we obtain the tree T (F ) depicted in Figure 6.1. The tree
T(F) is a complete binary tree with n + 1 levels and 2^{n+1} − 1 vertices. Each vertex at level i
corresponds to a setting for the Boolean variables x1 , . . . , xi . We label the vertices of T (F )
with their respective conditional expectations as follows. Let X1 = a1 , . . . , Xi = ai ∈ {0, 1}
be the outcome of a truth assignment for the variables x1 , . . . , xi . The vertex corresponding
to this assignment will be labeled
E [ val(Z) | X1 = a1 , . . . , Xi = ai ] .
If i = n, then this conditional expectation is simply the total weight of clauses satisfied by
the truth assignment x1 = a1 , . . . , xn = an .
The goal of the remainder of the section is to show that we can find deterministically
in polynomial time a path from the root of T (F ) to a leaf such that the conditional
expectations of the vertices on that path are at least as large as E [val(Z)]. Obviously, this
property yields the desired: We can construct deterministically a solution which is at least
as good as the one of the randomized algorithm in expectation.
Lemma 6.11. The conditional expectation
E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
of any vertex in T (F ) can be computed in polynomial time.
Proof. Consider a vertex X1 = a1, . . . , Xi = ai. Let F′ be the Boolean formula obtained from F by setting x1, . . . , xi accordingly. F′ is a formula in the variables xi+1, . . . , xn.
Clearly, by linearity of expectation, the expected weight of any clause of F′ under any
random truth assignment to the variables xi+1 , . . . , xn can be computed in polynomial
time. Adding to this the total weight of clauses satisfied by x1 , . . . , xi gives the answer.
Theorem 6.12. We can compute in polynomial time a path from the root to a leaf in T (F )
such that the conditional expectation of each vertex on this path is at least E [val(Z)].
Proof. Consider the conditional expectation at a certain vertex X1 = a1 , . . . , Xi = ai for
setting the next variable Xi+1 . We have that
E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
= E [ val(Z) | X1 = a1 , . . . , Xi = ai , Xi+1 = 0] Pr [Xi+1 = 0]
+ E [ val(Z) | X1 = a1 , . . . , Xi = ai , Xi+1 = 1] Pr [Xi+1 = 1] .
64
We show that the two conditional expectations with Xi+1 cannot both be strictly smaller
than E [ val(Z) | X1 = a1 , . . . , Xi = ai ]. Assume the contrary, then we have
E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
< E [ val(Z) | X1 = a1 , . . . , Xi = ai ] (Pr [Xi+1 = 0] + Pr [Xi+1 = 1])
which is a contradiction since Pr [Xi+1 = 0] + Pr [Xi+1 = 1] = 1.
This yields the existence of such a path. And by Lemma 6.11 it can be computed in
polynomial time.
The derandomized version of a randomized algorithm now follows exactly this argument: level by level it fixes the next variable to the value that does not decrease the conditional expectation, computed with the probability distribution given by the randomized algorithm.
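As an illustration, a sketch of the derandomized version of Randomized Large (same hypothetical clause encoding as above): under the uniform distribution the conditional expectation of a clause is its weight times 1 − 2^{−f}, where f is the number of still-free literals, unless the clause is already satisfied or falsified.

```python
def cond_expectation(clauses, weights, assignment):
    """E[val(Z) | fixed variables]; assignment maps a variable index to 0/1, others are free."""
    total = 0.0
    for clause, w in zip(clauses, weights):
        free, satisfied = 0, False
        for lit in clause:
            var, want = abs(lit), (1 if lit > 0 else 0)
            if var in assignment:
                satisfied |= assignment[var] == want
            else:
                free += 1
        total += w if satisfied else w * (1.0 - 0.5 ** free)  # 0 if fixed and falsified
    return total

def derandomized_large(n, clauses, weights):
    assignment = {}
    for i in range(1, n + 1):                  # walk down the tree T(F), level by level
        e0 = cond_expectation(clauses, weights, {**assignment, i: 0})
        e1 = cond_expectation(clauses, weights, {**assignment, i: 1})
        assignment[i] = 0 if e0 >= e1 else 1   # keep the larger conditional expectation
    return assignment
```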
6.4 Integer Linear Programs
Here we give a general approach for rounding a Linear Program (LP) relaxation of an
Integer Linear Program (ILP). We especially obtain bounds on the infeasibility of
the obtained solution for the ILP. The omitted proof is a simple application of a bound
on concentration of measure, e.g., Azuma-Hoeffding.
Theorem 6.13. Let A ∈ [−α, α]^{n,n} be a matrix and let b ∈ R^n. Let z be any fractional solution for the linear program

    Ax = b,   x ∈ [0, 1]^n.

Define a vector X ∈ {0, 1}^n randomly by

    Xi = 1 with probability zi, and Xi = 0 otherwise,

for all i = 1, . . . , n. Then, with high probability, an outcome x of X satisfies

    Ax − b ∈ [−c √(n log n), c √(n log n)]^n

for a suitable constant c = c(α) > 0.
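A small numerical experiment in the spirit of Theorem 6.13 (the instance is made up: we start from a known fractional z, set b = Az so that z is feasible, round, and measure the violation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
A = rng.uniform(-1.0, 1.0, size=(n, n))   # entries in [-alpha, alpha] with alpha = 1
z = rng.random(n)                          # a fractional solution in [0, 1]^n
b = A @ z                                  # z satisfies Az = b exactly

X = (rng.random(n) < z).astype(float)      # independent randomized rounding
print(np.max(np.abs(A @ X - b)))           # typically of order sqrt(n log n)
print(np.sqrt(n * np.log(n)))
```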
6.5 Integer Quadratic Programs
Many optimization problems can be formulated in terms of an Integer Quadratic Program (IQP). We are given a matrix A ∈ R^{n,n}, a vector b ∈ R^n, and our variables are
x ∈ {0, 1}^n, see Problem 6.4.
We consider maximization, but of course one might also consider minimization. It is in
general NP-hard to solve IQPs. In this section we give a randomized sampling and rounding
algorithm which approximates those programs. The following algorithm approximates the
large class of IQPs where the entries of A are constants (like, e.g., Maximum Cut) and
the entries of b are linear in n (like dense graphs, when the minimum degree is linear in
n).
Problem 6.4 Integer Quadratic Programming
Instance. Matrix A ∈ R^{n,n}, vector b ∈ R^n
Task. Solve the problem

    maximize   val(x) = x^T A x + b^T x,
    subject to x ∈ {0, 1}^n.
Theorem 6.14. Let A ∈ [−α, α]^{n,n} and b ∈ [−βn, βn]^n, where α > 0 and β > 0 are constants. Let x* ∈ {0, 1}^n be an optimal solution for the corresponding IQP.
Then, for every ε > 0 there is an algorithm with running time O(poly(n) · n^{1/ε²}), which constructs a vector x ∈ {0, 1}^n such that

    val(x) ≥ val(x*) − ε n².
The proof is subdivided into the following principal steps:
(i) Assume for the moment that x∗ is known. Then we can reformulate the IQP into a
certain ILP (where it is at first sight not clear why we make any progress with this
step). But since we actually do not know x∗ , the objective function and the right
hand side of our ILP are unknown.
(ii) However, we will “estimate” the objective function and the right hand side of the
ILP using sampling with replacement.
(iii) Then we relax the ILP to an LP which can be solved in polynomial time.
(iv) Finally, the LP-solution will be rounded randomly.
Proof. The first step is to transform the Integer Quadratic Program (IQP) into an Integer Linear Program (ILP). Let x* be optimal for the IQP and assume for the moment that we actually know x*. Define the vector r ∈ R^n by

    r^T = x*^T A + b^T,

which now yields an ILP:

    maximize   r^T x,
    subject to x^T A + b^T = r^T,
               x ∈ {0, 1}^n.
Clearly, x∗ is an optimal solution for this system.
However, since x∗ is unknown, also r is unknown. The next step is hence to “estimate”
r with a vector s ∈ R^n. We will allow that ri and si differ by a linear error, i.e.,

    |ri − si| ≤ εn   for i ∈ I,

where I = {1, . . . , n} is the set of indices.
Recall from Section 3.4.4 that an unknown quantity can be estimated within an additive
error by sampling with replacement. We want to estimate the quantity

    ri = ∑_{j∈I} aj,i x*j + bi.

Thus consider the universe U = {aj,i x*j : j ∈ I}; we want to estimate u = ∑_{j∈I} aj,i x*j.
Now choose a set J ⊆ I of k indices with replacement at random. For those k variables xj with j ∈ J, there are 2^k possible 0-1 assignments. We choose k = Θ(log n / ε²) and hence have Θ(n^{1/ε²}) many possible assignments. These are few enough so that we can actually try all 2^k possibilities and eventually will find an assignment such that

    xj = x*j   for all j ∈ J,
which shall be assumed in the sequel. Hence, define a vector s∗ with
    s*i = (n/k) · ∑_{j∈J} aj,i x*j + bi   for i ∈ I
as an estimator for ri. Since bi is fixed and the aj,i ∈ [−α, α] we can apply Theorem 3.19 on sampling with replacement. For suitable k = Θ(log n / ε²) we have

    |ri − s*i| ≤ εn

with probability at least 1 − 2n^{−2} per constraint i ∈ I. Hence with probability at least 1 − 2n^{−1} all n constraints satisfy this inequality.
Where are we now? We can estimate r with s up to an additive error of εn. Thus we can approximate the IQP with the following ILP (where 1^T denotes the transposed all-ones vector):

    maximize   s^T x,
    subject to x^T A + b^T ≥ s^T − εn · 1^T,
               x^T A + b^T ≤ s^T + εn · 1^T,
               x ∈ {0, 1}^n.

The vector x* is a feasible solution for this system and we have

    s^T x* = r^T x* − (r^T − s^T) x* = x*^T A x* + b^T x* − (r^T − s^T) x* ≥ val(x*) − εn².
However, we still are left with an ILP. We relax it to an LP by replacing the constraint x ∈ {0, 1}^n with x ∈ [0, 1]^n. Let z be an optimal fractional solution. By this relaxation we must have

    s^T z ≥ s^T x*.

Apply the randomized rounding procedure stated above and obtain a vector x ∈ {0, 1}^n which satisfies

    x^T A + b^T ≥ s^T − (εn + O(√(n log n))) · 1^T,
    x^T A + b^T ≤ s^T + (εn + O(√(n log n))) · 1^T
with high probability.
Then, by randomized rounding, we have

    s^T x ≥ s^T z − O(n √(n log n))
          ≥ s^T x* − O(n √(n log n))
          ≥ val(x*) − εn² − O(n √(n log n)).

Now define a vector δ^T = z^T A + b^T − s^T which captures an error. By the constraints we have that |δi| ≤ εn for each i ∈ I and hence |δ^T x| ≤ εn².
We are now in position to finalize the proof by combining the derived inequalities:

    val(x) = x^T A x + b^T x
           = (x^T A + b^T) x
           = (x^T A + b^T − (z^T A + b^T)) x + δ^T x + s^T x
           ≥ −εn² − O(n √(n log n)) + val(x*) − εn² − O(n √(n log n))
           = val(x*) − (2ε + o(1)) n².
Thus the proof suggests the following algorithm Round IQP:

Step 1. Sample J ⊆ I with cardinality k = Θ(log n / ε²).

Step 2. For each of the 2^k assignments xj ∈ {0, 1} for j ∈ J do:

    (a) Compute the vector s, where si = (n/k) · ∑_{j∈J} aj,i xj + bi for i ∈ I.
    (b) Solve the LP max{s^T x : s^T − εn · 1^T ≤ x^T A + b^T ≤ s^T + εn · 1^T, x ∈ [0, 1]^n}.
        Call the optimal solution z.
    (c) Round z ∈ [0, 1]^n to x ∈ {0, 1}^n randomly.

Step 3. Return the vector x found which maximizes val.

The running time of the algorithm is dominated by trying all 2^k = O(n^{1/ε²}) assignments in the sampled set J ⊆ I and solving an LP for each. Thus we have running time O(poly(n) · n^{1/ε²}) in total.
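A condensed sketch of Round IQP (only for illustrative parameter sizes, since it enumerates all 2^k assignments; the two-sided constraints are stacked for scipy's linprog):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def round_iqp(A, b, eps, rng=np.random.default_rng()):
    """Approximately maximize x^T A x + b^T x over x in {0,1}^n."""
    n = len(b)
    k = max(1, int(np.ceil(np.log(n) / eps ** 2)))
    J = rng.choice(n, size=k, replace=True)                 # Step 1: sample with replacement
    val = lambda x: x @ A @ x + b @ x
    best, best_val = None, -np.inf
    for bits in itertools.product([0, 1], repeat=k):        # Step 2: guess x* on J
        s = (n / k) * (np.array(bits) @ A[J, :]) + b        # estimator for r^T = x*^T A + b^T
        # LP: maximize s^T x subject to |x^T A + b^T - s^T| <= eps*n, x in [0,1]^n.
        A_ub = np.vstack([A.T, -A.T])
        b_ub = np.concatenate([s + eps * n - b, -(s - eps * n - b)])
        res = linprog(-s, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * n)
        if not res.success:
            continue
        x = (rng.random(n) < res.x).astype(int)             # Step 2(c): randomized rounding
        if val(x) > best_val:
            best, best_val = x, val(x)
    return best, best_val                                   # Step 3: best assignment found
```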
6.6 Maximum Cut
Recall the problem Maximum Cut in which we are given an undirected graph G = (V, E)
and have to partition V into (L, R) with L ⊆ V and R = V − L. The objective is to
maximize the number of cut-edges, i.e., the edges e = {ℓ, r} with the property ℓ ∈ L and r ∈ R.
6.6.1 Dense Instances
Maximum Cut can be formulated as an IQP. Introduce for each vertex i ∈ V a variable xi ∈ {0, 1} indicating if i ∈ L. Let Ni = {j ∈ V : {i, j} ∈ E} denote the set of neighbours of i and di = |Ni| the degree of i. For each x ∈ {0, 1}^n notice that

    val(x) = ∑_{i∈V} xi · ∑_{j∈Ni} (1 − xj)
counts the number of cut-edges. Furthermore, let A = (ai,j)_{i,j∈V} ∈ {0, 1}^{n,n} with ai,j = 1 if and only if {i, j} ∈ E denote the adjacency matrix of G. Moreover let d = (di)_{i∈V} be the degree vector of G. Then we have

    val(x) = ∑_{i∈V} xi · ∑_{j∈Ni} (1 − xj)
           = ∑_{i∈V} xi · (di − ∑_{j∈Ni} xj)
           = ∑_{i∈V} di xi − 2 ∑_{{i,j}∈E} xi xj
           = d^T x − x^T A x.
Problem 6.5 Maximum Cut
Instance. Adjacency matrix A ∈ {0, 1}^{n,n}, degree vector d ∈ N0^n
Task. Solve the problem

    maximize   val(x) = d^T x − x^T A x,
    subject to x ∈ {0, 1}^n.
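A quick check of this identity on a random graph (the formula is compared against a direct count of the cut edges):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = A + A.T                                        # symmetric 0/1 adjacency matrix
d = A.sum(axis=1)                                  # degree vector

x = rng.integers(0, 2, size=n)                     # indicator vector of the side L
direct = sum(A[i, j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])
print(direct == d @ x - x @ A @ x)                 # True: d^T x - x^T A x counts cut edges
```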
A graph G = (V, E) is called α-dense if it has at least αn² many edges. Here we apply the above algorithm to dense instances of Maximum Cut.

Corollary 6.15. Let G be α-dense for some constant α > 0. Then, for every ε > 0 there is a (1 − ε)-approximate algorithm with running time O(poly(n) · n^{1/ε²}) for Maximum Cut.
Proof. We apply the algorithm Round IQP to yield a solution x ∈ {0, 1}^n with

    val(x) ≥ val(x*) − (εα/2) · n².

Since the graph is α-dense, i.e., it has at least αn² edges, and since assigning each xi = 1 with probability 1/2 yields a cut with at least (α/2) · n² edges in expectation, we have

    val(x*) ≥ (α/2) · n².

Thus we have

    val(x) ≥ val(x*) − (εα/2) · n² ≥ val(x*) − ε · val(x*) = (1 − ε) · val(x*)

as claimed.
6.6.2 Approximation Algorithm
In this section we give a randomized 0.878-approximation algorithm for Maximum Cut.
The algorithm makes use of Semidefinite Programming, which we shall briefly introduce here.
Semidefinite Programming
We need some notions from linear algebra and matrix theory. Recall that a matrix A ∈ R^{n,n} is symmetric if A = A^T. An eigenvector of a matrix A is a vector x ≠ 0 such that

    Ax = λx

for some λ ∈ C, in which case λ is called an eigenvalue. Observe that the eigenvalues of A are exactly the roots of the polynomial det(A − λI), where I denotes the identity matrix. The following facts are well-known:

Theorem 6.16. Let A ∈ R^{n,n} be symmetric. Then A has n (not necessarily distinct) real eigenvalues λ1 ≥ · · · ≥ λn ∈ R.
Theorem 6.17 (Rayleigh). Let A ∈ R^{n,n} be symmetric. Then we have

    λ1(A) = max_{||x||=1} x^T A x   and   λn(A) = min_{||x||=1} x^T A x.
A matrix A ∈ R^{n,n} is called

    positive semi-definite (psd) if x^T A x ≥ 0 for all x ≠ 0,
    positive definite (pd)       if x^T A x > 0 for all x ≠ 0.

Thus, A is psd if and only if λn ≥ 0; pd if and only if λn > 0. Further, consider the block matrix

    A = ( B  0 )
        ( 0  C ).

This matrix A is psd if and only if B and C are psd.
Theorem 6.18. Let A ∈ R^{n,n} be symmetric. Then the following are equivalent:
(i) A is psd.
(ii) λi ≥ 0 for i = 1, . . . , n.
(iii) There is a matrix B ∈ R^{n,n} such that A = B^T B.
The form A = B^T B is called a Cholesky decomposition (which is in general not unique). However, B can always be chosen to be an upper (or lower) triangular matrix. This yields an algorithm which computes such a decomposition in time O(n³). Therefore we can test if a symmetric matrix is psd in polynomial time. We can even do that for non-symmetric matrices.
Observation 6.19. Let A ∈ R^{n,n} and define the symmetric matrix A′ = (1/2)(A + A^T) ∈ R^{n,n}. Then we have that A′ is psd if and only if A is psd.
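A psd test along these lines (numpy's Cholesky routine needs strict positive definiteness, so the sketch checks the smallest eigenvalue with a small tolerance instead):

```python
import numpy as np

def is_psd(A, tol=1e-9):
    """Test whether a real (possibly non-symmetric) matrix is psd, via Observation 6.19."""
    A_sym = 0.5 * (A + A.T)                       # symmetrizing keeps the quadratic form
    return bool(np.linalg.eigvalsh(A_sym).min() >= -tol)

B = np.array([[1.0, 2.0], [0.0, 3.0]])
print(is_psd(B.T @ B))                            # True: every B^T B is psd
print(is_psd(np.array([[0.0, 2.0], [2.0, 0.0]]))) # False: eigenvalues are -2 and 2
```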
Let A = (a^T_{i,j})_{1≤i,j≤n} be a vector matrix whose entries are row vectors a^T_{i,j} ∈ R^n. A is symmetric if a^T_{i,j} = a^T_{j,i}. Let x ∈ R^n. Define the operation

    A • x = (a^T_{i,j} x)_{1≤i,j≤n}.

For a matrix B ∈ R^{n,n} and x ∈ R^n define

    S(x) = A • x + B.

The matrix S(x) ∈ R^{n,n} contains entries si,j(x) = a^T_{i,j} x + bi,j that are linear functions.
Example 6.20. The notation yields, e.g.,

    S(x) = ( (1, 2)    (−1, 5) ) • ( x1 ) + (  0   8 ) = ( x1 + 2x2        −x1 + 5x2 + 8 )
           ( (7, −3)   (0, 0)  )   ( x2 )   ( −3   1 )   ( 7x1 − 3x2 − 3    1            ).
With this notation we define Semidefinite Programming (SDP) as given in Problem 6.6. By Observation 6.19, the requirement that A and B are symmetric is without loss of generality. We may of course also require minimization instead of maximization. Notice the following: If A = diag(a^T_1, . . . , a^T_n) with row vectors a^T_1, . . . , a^T_n and B = diag(b1, . . . , bn), then we obtain Linear Programming as a special case of Semidefinite Programming.
Problem 6.6 Semidefinite Programming
Instance. Symmetric vector matrix A = (a^T_{i,j})_{1≤i,j≤n} with a^T_{i,j} ∈ R^n, symmetric matrix B ∈ R^{n,n}, vector c ∈ R^n
Task. Solve the problem

    maximize   val(x) = c^T x,
    subject to S(x) is psd,
               x ∈ R^n.
It is known that SDP can be solved in polynomial time up to an arbitrary error ε > 0
if we are guaranteed that there is a solution with polynomially bounded size. Such an error term cannot in general be avoided because solutions for SDP might be irrational.
For our purposes, we may assume that we can solve SDP in polynomial time.
Application to Maximum Cut
Consider the following formulation of Maximum Cut in Problem 6.7. The idea is that the
term (1 − xi xj)/2 ∈ {0, 1} is equal to one if and only if xi, xj ∈ {−1, 1} assume different values. Thus we set xi = −1 if and only if vertex i ∈ V lies in L.
Problem 6.7 Maximum Cut
Instance. Graph G = (V, E)
Task. Solve the problem

    maximize   val(x) = ∑_{{i,j}∈E} (1 − xi xj) / 2,
    subject to x ∈ {−1, 1}^n.
Since it is not known how to solve this mathematical program directly (in polynomial
time), we first relax the constraints as follows: Instead of requiring xi ∈ {−1, 1} consider
a vector xi ∈ R^n with ||xi|| = 1. Thus we have Problem 6.8.
This relaxation is not yet an SDP, but by introducing a vector x = (xi,j )1≤i<j≤n of
variables, we will obtain such a program in Problem 6.9. The reason is as follows: Since
symmetric psd matrices A can be written as A = B^T B, we can interpret ai,j as a product of two vectors bi^T bj. If ai,i = 1 we have bi^T bi = 1, which means ||bi|| = 1.

Problem 6.8 Vector Maximum Cut
Instance. Graph G = (V, E)
Task. Solve the problem

    maximize   val(x) = ∑_{{i,j}∈E} (1 − xi^T xj) / 2,
    subject to ||xi|| = 1   for i = 1, . . . , n,
               xi ∈ R^n    for i = 1, . . . , n.
Problem 6.9 Semidefinite Maximum Cut
Instance. Graph G = (V, E)
Task. Solve the problem

    maximize   val(x) = ∑_{{i,j}∈E} (1 − xi,j) / 2,

    subject to the matrix

        X := (  1      x1,2   · · ·  x1,n )
             (  x1,2   1      · · ·  x2,n )
             (  ·      ·      · · ·  ·    )
             (  x1,n   x2,n   · · ·  1    )

    is psd.
The idea of the algorithm SDP Rounding Maximum Cut is as follows: Suppose that we have solved the SDP relaxation of Maximum Cut with solution z and matrix Z. By computing the Cholesky decomposition of Z, we obtain vectors vi for i = 1, . . . , n with vi^T vj = zi,j and especially vi^T vi = 1. However, in our formulation of Maximum Cut, we require that a solution is given by x ∈ {−1, 1}^n. Hence we have to “round” the vector vi ∈ R^n to xi ∈ {−1, 1} for i = 1, . . . , n. Now consider an arbitrary hyperplane with normal vector r ∈ R^n, i.e., a set H = {x ∈ R^n : r^T x = 0}. One possible way of rounding vi is to assign xi = −1 if and only if vi and r are on different sides of H, i.e., if and only if r^T vi ≤ 0. How do we choose the normal vector r? Why not at random, with a distribution that is symmetric under rotation?
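A sketch of the rounding step (Steps 3 and 4 of the algorithm below): a standard Gaussian vector has a rotation-symmetric distribution, so it can serve as the random normal r; the unit vectors vi are assumed to come from the Cholesky factor of the SDP solution.

```python
import numpy as np

def hyperplane_rounding(V, rng=np.random.default_rng()):
    """Round unit vectors v_1, ..., v_n (the rows of V) to signs via a random hyperplane."""
    r = rng.standard_normal(V.shape[1])       # rotation-symmetric random normal vector
    return np.where(V @ r <= 0, -1, 1)        # x_i = -1 iff r^T v_i <= 0

# Toy usage with three unit vectors in the plane.
V = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(hyperplane_rounding(V))
```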
Algorithm 6.3 SDP Rounding Maximum Cut
Input. Graph G = (V, E).
Output. Vector x ∈ {−1, 1}^n
Step 1. Solve the SDP relaxation in Problem 6.9, call the solution z inducing a matrix Z.
Step 2. Compute a Cholesky decomposition of Z and obtain vectors vi ∈ R^n for i = 1, . . . , n with zi,j = vi^T vj and vi^T vi = 1.
Step 3. Let r ∈ R^n be the normal vector of a hyperplane. Let r be chosen at random with a distribution that is symmetric under rotation.
Step 4. Define xi = −1 if r^T vi ≤ 0; xi = 1 otherwise for i = 1, . . . , n and return x.

Theorem 6.21. SDP Rounding Maximum Cut is 0.878-approximate for Maximum Cut.

Proof. Let vi and vj be two of the obtained vectors. By linearity of expectation, we have to estimate the probability that vi and vj lie on different sides of the hyperplane H induced by the normal vector r. Consider the hyperplane H′ = {vi s + vj t : s, t ∈ R} defined by vi and vj. The angle α = arccos(vi^T vj) is invariant under rotation.
The probability that H is exactly H′ is zero, i.e., we may condition on H ≠ H′. Thus the hyperplanes intersect in a line g. If vi and vj shall be on different sides of H, then they must also be on different sides of g in H′. This happens if the direction vector d of g is within the angle α.
Since r is chosen symmetrically under rotation, this happens with probability

    2α / (2π) = α / π = arccos(vi^T vj) / π.
Let X denote the number of cut-edges of the obtained solution x ∈ {−1, 1}^n, then we have

    E[X] = ∑_{{i,j}∈E} arccos(vi^T vj) / π

by linearity of expectation.
Now we compare arccos(vi^T vj)/π with (1 − vi^T vj)/2, i.e., our original objective function.
By numerical evaluation of the functions arccos(v)/π and (1 − v)/2 we see that

    arccos(v)/π ≥ 0.878 · (1 − v)/2   for −1 ≤ v ≤ 1,

yielding the claim.
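The constant can be checked numerically, for instance by minimizing the ratio of the two functions on a fine grid:

```python
import numpy as np

v = np.linspace(-1.0, 1.0, 200001)[:-1]              # exclude v = 1, where both sides vanish
ratio = (np.arccos(v) / np.pi) / ((1.0 - v) / 2.0)
print(ratio.min())                                   # about 0.8786 > 0.878
```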