Chapter 6  Randomized Rounding

Randomized rounding is a powerful method in the design and analysis of approximation algorithms. An approximation algorithm runs in polynomial time and determines a feasible solution (for a usually NP-hard problem) which is provably not "too far" away from a best-possible solution.

6.1 Basics

Many optimization problems can be formulated in terms of Integer Linear Programming (ILP). We are given an m×n matrix A = (a_{i,j})_{1≤i≤m, 1≤j≤n} and vectors b = (b_i)_{1≤i≤m} and c = (c_j)_{1≤j≤n}; hence an instance is I = (A, b, c). Furthermore, we have a vector x = (x_j)_{1≤j≤n} of integer variables, i.e., x_j ∈ Z. The problem is to extremize, i.e., minimize or maximize, the linear function val(x) = Σ_{j=1}^n c_j x_j subject to A·x ≥ b. The function val is also called the objective function, and a vector x satisfying A·x ≥ b is called a feasible solution. The set F(A, b) = {x ∈ Z^n : A·x ≥ b} is the feasible region. A feasible solution x* at which the desired extremum is attained is called optimal. We often write opt(I) = val(x*).

Problem 6.1  Integer Linear Programming

Instance.  Matrix A ∈ R^{m,n}, vectors b ∈ R^m, c ∈ R^n.

Task.  Solve the problem

    minimize    val(x) = Σ_{j=1}^n c_j x_j,
    subject to  Σ_{j=1}^n a_{i,j} x_j ≥ b_i    for i = 1, ..., m,
                x_j ∈ Z                        for j = 1, ..., n.

As it is in general NP-hard to solve ILP, we often resort to approximation algorithms. We consider a minimization problem here, w.l.o.g. A polynomial-time algorithm alg is called a c-approximation algorithm for Integer Linear Programming if it produces for every instance I = (A, b, c) a feasible solution x ∈ F(A, b) whose objective value alg(I) = val(x) satisfies

    alg(I) / opt(I) ≤ c.

If we have a maximization problem, then we require alg(I)/opt(I) ≥ c.

A common strategy for designing approximation algorithms for ILP is called LP-relaxation. If we replace each constraint x_j ∈ Z by x_j ∈ R, then the resulting problem can be solved in polynomial time, as it is an instance of (ordinary) Linear Programming (LP). Let z be an optimal solution of the relaxed problem and observe that val(z) ≤ val(x*) if the original ILP was a minimization problem. However, the optimal solution z of the LP is in general infeasible for the ILP. Hence we still have to "round" z to a solution x ∈ F(A, b), keeping an eye on solution quality. It therefore suffices to show an inequality of the form val(x) ≤ c·val(z): the resulting algorithm with alg(I) = val(x) is then c-approximate because

    alg(I) = val(x) ≤ c·val(z) ≤ c·val(x*) = c·opt(I).

The basic setting for randomized rounding is this: Suppose we have the requirements x_j ∈ {0,1} (instead of x_j ∈ Z) for j = 1, ..., n, and suppose we replace each constraint x_j ∈ {0,1} by x_j ∈ [0,1]. Then the values z_j ∈ [0,1] of any solution z can be interpreted as probabilities. Thus we do the following: we randomly produce a vector X ∈ {0,1}^n, where X_j = 1 with probability z_j and X_j = 0 otherwise. Solving the obtained LP with optimal solution z*, say, we then have a randomized algorithm alg with

    E[alg(I)] = E[ Σ_{j=1}^n c_j X_j ] = Σ_{j=1}^n c_j E[X_j] = Σ_{j=1}^n c_j z_j* = val(z*).

However, we still have to ensure that the solution X also lies in the feasible region F(A, b) (with a certain probability). Sometimes this task is easy, sometimes not; it depends on the problem at hand.
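The following minimal Python sketch illustrates this generic scheme. It solves the LP relaxation of a small 0/1 covering ILP with scipy.optimize.linprog and then performs the rounding step described above; the concrete matrix, right-hand side, and costs are made up for illustration, and feasibility of the rounded vector is only checked, not guaranteed.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# A made-up covering ILP: minimize c^T x subject to A x >= b, x in {0,1}^n.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
b = np.array([1.0, 1.0, 1.0])
c = np.array([10.0, 50.0, 100.0])

# LP relaxation: replace x_j in {0,1} by x_j in [0,1].
# linprog expects "<=" constraints, so A x >= b is negated.
res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0.0, 1.0)] * len(c))
z = res.x  # fractional optimum, so val(z) <= opt of the ILP

# Randomized rounding: X_j = 1 with probability z_j.
x = (rng.random(len(z)) < z).astype(int)

print("fractional z:", z, "val(z) =", c @ z)
print("rounded x:   ", x, "val(x) =", c @ x)
print("feasible for the ILP:", bool(np.all(A @ x >= b)))
```

In expectation the rounded objective value equals val(z); whether the rounded vector is feasible has to be argued separately, which is exactly the theme of the following sections.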
6.2 Minimum Set Cover

The Minimum Set Cover problem is a simple to state – yet quite general – NP-hard problem. It is widely applicable, sometimes in unexpected ways.

The problem is the following: We are given a set U (called the universe) of n elements, a collection of sets S = {S_1, ..., S_k} with S_i ⊆ U and ∪_{S∈S} S = U, and a cost function c : S → R_+. The task is to find a minimum-cost subcollection S' ⊆ S that covers U, i.e., such that ∪_{S∈S'} S = U.

Example 6.1. Consider this instance: U = {1, 2, 3}, S = {S_1, S_2, S_3} with S_1 = {1, 2}, S_2 = {2, 3}, S_3 = {1, 2, 3}, and costs c(S_1) = 10, c(S_2) = 50, c(S_3) = 100. The collections covering U are {S_1, S_2}, {S_3}, {S_1, S_3}, {S_2, S_3}, and {S_1, S_2, S_3}. The cheapest one is {S_1, S_2} with cost 60.

With each set S we associate a variable x_S ∈ {0,1} that indicates whether we choose S or not. We may thus write solutions for Minimum Set Cover as vectors x ∈ {0,1}^k. With this, we write Minimum Set Cover as a mathematical program.

Problem 6.2  Minimum Set Cover

Instance.  Universe U with n elements, collection S = {S_1, ..., S_k} with S_i ⊆ U, a cost function c : S → R_+.

Task.  Solve the problem

    minimize    val(x) = Σ_{S∈S} c(S)·x_S,
    subject to  Σ_{S : e∈S} x_S ≥ 1    for all e ∈ U,
                x_S ∈ {0,1}            for all S ∈ S.

Deterministic Rounding

Define the frequency of an element to be the number of sets it is contained in, and let f denote the frequency of a most frequent element. The idea of the algorithm below is to include exactly those sets S in the cover whose value z_S in the optimal LP solution z is "large enough".

Algorithm 6.1  Simple Rounding Set Cover

Input.  Universe U with n elements, collection S = {S_1, ..., S_k} with S_i ⊆ U, a cost function c : S → R_+.

Output.  Vector x ∈ {0,1}^k.

Step 1.  Set x = 0, solve the LP relaxation below, and call the optimal solution z.

    minimize    val(x) = Σ_{S∈S} c(S)·x_S,
    subject to  Σ_{S : e∈S} x_S ≥ 1    for all e ∈ U,
                x_S ≥ 0                for all S ∈ S.

Step 2.  For each set S let x_S = 1 if z_S ≥ 1/f.

Step 3.  Return x.

Theorem 6.2. Simple Rounding Set Cover is f-approximate for Minimum Set Cover.

Proof. Let x be the solution returned by the algorithm and let z be the optimal solution of the LP. Consider an arbitrary element e ∈ U. Since e is contained in at most f sets and these sets fractionally cover e, at least one of them must be picked to the extent of at least 1/f in the fractional solution z. Thus e is covered by the definition of the algorithm, and x is hence a feasible cover. Furthermore, x_S ≤ f·z_S for every set S, and thus val(x) ≤ f·val(z) ≤ f·val(x*), where x* is an optimal solution for the Minimum Set Cover problem.

Randomized Rounding

Another natural idea for rounding fractional solutions is to use randomization. For the above relaxation, observe that the values z_S lie between zero and one. We may thus interpret these values as probabilities for choosing the corresponding sets. The idea of the following algorithm is this: solve the LP relaxation optimally, call the solution z, and include each set S in the cover with probability z_S. This basic procedure yields a vector x whose expected value equals the optimal fractional solution value, but it might not cover all elements. We therefore repeat the procedure "sufficiently many" times and include a set in our cover if it was included in any of the iterations. We will show that O(log n) iterations suffice to obtain a feasible cover with high probability, thus yielding an O(log n)-approximation algorithm.

Algorithm 6.2  Randomized Rounding Set Cover

Input.  Universe U with n elements, collection S = {S_1, ..., S_k} with S_i ⊆ U, a cost function c : S → R_+.

Output.  Vector x ∈ {0,1}^k.

Step 1.  Set x = 0, solve the LP relaxation below, and call the optimal solution z.
    minimize    val(x) = Σ_{S∈S} c(S)·x_S,
    subject to  Σ_{S : e∈S} x_S ≥ 1    for all e ∈ U,
                x_S ≥ 0                for all S ∈ S.

Step 2.  Repeat 2⌈ln n⌉ times: for each set S, let x_S = 1 with probability z_S; leave x_S unchanged otherwise.

Step 3.  Return x.

Theorem 6.3. Randomized Rounding Set Cover is 2⌈ln n⌉-approximate for Minimum Set Cover, in expectation.

Proof. Let z be an optimal solution for the LP. In each iteration of Step 2, for each set S we let x_S = 1 with probability z_S. Hence the increase in objective value in each iteration is in expectation at most

    Σ_{S∈S} E[c(S)·x_S] = Σ_{S∈S} c(S)·Pr[x_S = 1] = Σ_{S∈S} c(S)·z_S = val(z).

Next we estimate the probability that an element u ∈ U is covered in one iteration. Let u be contained in k sets and let z_1, ..., z_k be the corresponding probabilities given by the solution z. Since u is fractionally covered, we have z_1 + ··· + z_k ≥ 1. With easy but tedious calculus one sees that – under this condition – the probability of u being covered is minimized when the z_i are all equal, i.e., z_1 = ··· = z_k = 1/k:

    Pr[u is covered in one iteration] = 1 − (1 − z_1)···(1 − z_k) ≥ 1 − (1 − 1/k)^k ≥ 1 − 1/e.

Summing up: in each iteration of Step 2 the value of the constructed solution x increases by at most val(z) in expectation, and each element is covered with probability at least 1 − 1/e per iteration. But maybe we have not covered all elements after 2⌈ln n⌉ iterations; here we show that we will have, with high probability. The probability that an element u is not covered at the end of the algorithm, i.e., after 2⌈ln n⌉ iterations, is

    Pr[u is not covered] ≤ (1/e)^{2⌈ln n⌉} ≤ 1/n².

Thus the probability that there is an uncovered element is at most n/n² = 1/n. Clearly,

    E[val(x)] ≤ 2⌈ln n⌉·val(z) ≤ 2⌈ln n⌉·val(x*),

where x* is an optimal solution for Minimum Set Cover. So, with high probability, the algorithm returns a feasible solution, and its expected value is at most 2⌈ln n⌉ times the optimum.
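The following Python sketch mirrors Algorithm 6.2 on the instance of Example 6.1. It solves the LP relaxation with scipy.optimize.linprog (an assumed, not prescribed, solver) and performs 2⌈ln n⌉ rounding iterations; all variable names are illustrative.

```python
import math
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Instance of Example 6.1.
universe = [1, 2, 3]
sets = [{1, 2}, {2, 3}, {1, 2, 3}]
costs = np.array([10.0, 50.0, 100.0])

# LP relaxation: minimize sum c(S) x_S  s.t.  sum_{S: e in S} x_S >= 1, x_S >= 0.
# One row per element e, one column per set S; linprog uses "<=", so we negate.
A = np.array([[1.0 if e in S else 0.0 for S in sets] for e in universe])
res = linprog(costs, A_ub=-A, b_ub=-np.ones(len(universe)),
              bounds=[(0.0, None)] * len(sets))
z = res.x

# Step 2 of Algorithm 6.2: repeat 2*ceil(ln n) independent rounding rounds.
x = np.zeros(len(sets), dtype=int)
for _ in range(2 * math.ceil(math.log(len(universe)))):
    x |= (rng.random(len(sets)) < z).astype(int)

print("z =", z, " val(z) =", costs @ z)
print("x =", x, " val(x) =", costs @ x,
      " feasible cover:", bool(np.all(A @ x >= 1)))
```

On this tiny instance the LP optimum is already integral, so the rounding returns the cover {S_1, S_2}; the repetition only matters for instances with genuinely fractional optima.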
6.3 Maximum Satisfiability

The Satisfiability problem asks whether a given Boolean formula has a satisfying assignment, i.e., one that makes the whole formula evaluate to true. There is a related optimization problem called Maximum Satisfiability. The goal of this section is to develop a deterministic 3/4-approximation algorithm by first giving a randomized algorithm, which will then be derandomized.

We are given the Boolean variables X = {x_1, ..., x_n}, where each x_i ∈ {0,1}. A literal ℓ_i of the variable x_i is either x_i itself, called a positive literal, or its negation x̄_i with truth value 1 − x_i, called a negative literal. A clause is a disjunction C = (ℓ_1 ∨ ··· ∨ ℓ_k) of literals of X; their number k is called the size of C. For a clause C let S_C^+ denote the set of indices of variables that occur in C as positive literals; similarly, S_C^− denotes the indices of variables that occur as negative literals. Let C denote the set of clauses. A Boolean formula in conjunctive form is a conjunction of clauses F = C_1 ∧ ··· ∧ C_m. Each vector x ∈ {0,1}^n is called a truth assignment. For any clause C and any such assignment x we say that x satisfies C if at least one of the literals of C evaluates to 1.

The problem Maximum Satisfiability is the following: We are given a formula F in conjunctive form and, for each clause C, a weight w_C, i.e., a weight function w : C → N. The objective is to find a truth assignment x ∈ {0,1}^n that maximizes the total weight of the satisfied clauses. As an important special case: if we set all weights w_C equal to one, then we seek to maximize the number of satisfied clauses.

Now we introduce for each clause C a variable z_C ∈ {0,1} which takes the value one if and only if C is satisfied under a given truth assignment x. With this we can formulate the problem as an ILP, as done in Problem 6.3.

Problem 6.3  Maximum Satisfiability

Instance.  Formula F = C_1 ∧ ··· ∧ C_m with m clauses over the n Boolean variables X = {x_1, ..., x_n}, a weight function w : C → N.

Task.  Solve the problem

    maximize    val(z) = Σ_{C∈C} w_C·z_C,
    subject to  Σ_{i∈S_C^+} x_i + Σ_{i∈S_C^−} (1 − x_i) ≥ z_C    for C ∈ C,
                z_C ∈ {0,1}                                      for C ∈ C,
                x_i ∈ {0,1}                                      for i = 1, ..., n.

The algorithm we aim for is a combination of two algorithms. One works better for small clauses, the other for large clauses. Both are initially randomized but can be derandomized using the method of conditional expectation, i.e., the final algorithm is deterministic.

6.3.1 Randomized Algorithm

For each variable x_i we define a random variable X_i that takes the value one with a certain probability p_i and zero otherwise. This induces, for each clause C, a random variable Z_C that takes the value one if C is satisfied under the (random) assignment and zero otherwise.

Algorithm for Large Clauses

Consider the algorithm Randomized Large: for each variable x_i with i = 1, ..., n, set X_i = 1 independently with probability 1/2 and X_i = 0 otherwise. Output X = (X_1, ..., X_n).

Define the quantity α_k = 1 − 2^{−k}.

Lemma 6.4. Let C be a clause. If size(C) = k then E[Z_C] = α_k.

Proof. The clause C is not satisfied, i.e., Z_C = 0, if and only if all its literals are set to zero. By independence, the probability of this event is exactly 2^{−k}, and thus

    E[Z_C] = 1·Pr[Z_C = 1] + 0·Pr[Z_C = 0] = 1 − 2^{−k} = α_k,

which was claimed.

Theorem 6.5. Randomized Large is 1/2-approximate for Maximum Satisfiability, in expectation.

Proof. By linearity of expectation, Lemma 6.4, and size(C) ≥ 1 we have

    E[val(Z)] = Σ_{C∈C} w_C·E[Z_C] = Σ_{C∈C} w_C·α_{size(C)} ≥ (1/2)·Σ_{C∈C} w_C ≥ (1/2)·val(z*),

where (x*, z*) is an optimal solution for Maximum Satisfiability. We have used the obvious bound val(z*) ≤ Σ_{C∈C} w_C.

Algorithm for Small Clauses

Maybe the most natural LP relaxation of the problem is:

    maximize    val(z) = Σ_{C∈C} w_C·z_C,
    subject to  Σ_{i∈S_C^+} x_i + Σ_{i∈S_C^−} (1 − x_i) ≥ z_C    for C ∈ C,
                0 ≤ z_C ≤ 1                                      for C ∈ C,
                0 ≤ x_i ≤ 1                                      for i = 1, ..., n.

In the sequel let (x, z) denote an optimal solution for this LP. Consider the algorithm Randomized Small: determine (x, z); for each variable x_i with i = 1, ..., n, set X_i = 1 independently with probability x_i and X_i = 0 otherwise. Output X = (X_1, ..., X_n).

Define the quantity β_k = 1 − (1 − 1/k)^k.

Lemma 6.6. Let C be a clause. If size(C) = k then E[Z_C] ≥ β_k·z_C.

Proof. We may assume that the clause C has the form C = (x_1 ∨ ··· ∨ x_k); otherwise rename the variables and rewrite the LP. The clause C is satisfied if X_1, ..., X_k are not all set to zero. The probability of this event is

    1 − Π_{i=1}^k (1 − x_i) ≥ 1 − ( Σ_{i=1}^k (1 − x_i) / k )^k = 1 − ( 1 − Σ_{i=1}^k x_i / k )^k ≥ 1 − (1 − z_C/k)^k.

Here we have first used the arithmetic-geometric mean inequality, which states that for non-negative numbers a_1, ..., a_k we have

    (a_1 + ··· + a_k)/k ≥ (a_1 · … · a_k)^{1/k}.

Secondly, the LP guarantees the inequality x_1 + ··· + x_k ≥ z_C. Now define the function g(t) = 1 − (1 − t/k)^k. This function is concave with g(0) = 0 and g(1) = 1 − (1 − 1/k)^k, which yields the bound g(t) ≥ t·(1 − (1 − 1/k)^k) = t·β_k for all t ∈ [0,1]. Therefore

    Pr[Z_C = 1] ≥ 1 − (1 − z_C/k)^k ≥ β_k·z_C,

and the claim follows.
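As an illustration of Randomized Small, the following hedged Python sketch builds the LP relaxation above for a small made-up weighted formula, solves it with scipy.optimize.linprog, and sets each X_i to one with probability x_i. The clause encoding by signed integers and all names are illustrative; Randomized Large is the same rounding step with every probability replaced by 1/2.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

# Clauses in a DIMACS-like style: positive integer i means x_i, negative means its negation.
clauses = [[1, 2], [-1, 3], [1, -4], [-2, -3, 4]]
weights = np.array([3.0, 1.0, 4.0, 2.0])
n, m = 4, len(clauses)

# LP relaxation: variables (x_1..x_n, z_1..z_m), all in [0, 1].
# Clause constraint: sum_{i in S+} x_i + sum_{i in S-} (1 - x_i) >= z_C,
# rewritten as -sum_{S+} x_i + sum_{S-} x_i + z_C <= |S-| for linprog's "<=" form.
A_ub = np.zeros((m, n + m))
b_ub = np.zeros(m)
for c_idx, clause in enumerate(clauses):
    neg = 0
    for lit in clause:
        i = abs(lit) - 1
        A_ub[c_idx, i] = -1.0 if lit > 0 else 1.0
        neg += (lit < 0)
    A_ub[c_idx, n + c_idx] = 1.0
    b_ub[c_idx] = neg
obj = np.concatenate([np.zeros(n), -weights])   # maximize => minimize the negation
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * (n + m))
x_frac, z_frac = res.x[:n], res.x[n:]

# Randomized Small: set X_i = 1 with probability x_i.
X = (rng.random(n) < x_frac).astype(int)
satisfied = [any((lit > 0) == bool(X[abs(lit) - 1]) for lit in clause) for clause in clauses]
print("LP value:", weights @ z_frac)
print("assignment:", X, " weight satisfied:", weights @ np.array(satisfied, dtype=float))
```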
Theorem 6.7. Randomized Small is (1 − 1/e)-approximate for Maximum Satisfiability, in expectation.

Proof. The quantity β_k is decreasing in k. Therefore, if all clauses are of size at most k, then by Lemma 6.6

    E[val(Z)] = Σ_{C∈C} w_C·E[Z_C] ≥ β_k·Σ_{C∈C} w_C·z_C = β_k·val(z) ≥ β_k·val(z*),

where (x*, z*) is an optimal solution for Maximum Satisfiability. The claim follows since (1 − 1/k)^k < 1/e, i.e., β_k > 1 − 1/e, for all k ∈ N.

3/4-Approximation Algorithm

Consider the algorithm Randomized Combine: with probability 1/2 run Randomized Large, otherwise run Randomized Small.

Lemma 6.8. Let C be a clause. Then

    E[Z_C] ≥ (3/4)·z_C.

Proof. Let the random variable B take the value zero if the first algorithm is run, and one otherwise. For the clause C let size(C) = k. By Lemma 6.4 and z_C ≤ 1,

    E[Z_C | B = 0] = α_k ≥ α_k·z_C,

and by Lemma 6.6

    E[Z_C | B = 1] ≥ β_k·z_C.

Combining, we have

    E[Z_C] = E[Z_C | B = 0]·Pr[B = 0] + E[Z_C | B = 1]·Pr[B = 1] ≥ z_C·(α_k + β_k)/2.

It is tedious but relatively easy to see that α_k + β_k ≥ 3/2 for all k ∈ N.

Theorem 6.9. Randomized Combine is 3/4-approximate for Maximum Satisfiability, in expectation.

Proof. This follows from Lemma 6.8 and linearity of expectation.

6.3.2 Derandomization

The notion of derandomization refers to "turning" a randomized algorithm into a deterministic one (possibly at the cost of additional running time or a deterioration of the approximation guarantee). One of the several available techniques is the method of conditional expectation.

We are given a Boolean formula F = C_1 ∧ ··· ∧ C_m in conjunctive form over the variables X = {x_1, ..., x_n}. If we set x_1 = 0, then after simplification we obtain a formula F_0 over the variables x_2, ..., x_n; if we set x_1 = 1, we obtain a formula F_1.

Example 6.10. Let F = (x_1 ∨ x_2) ∧ (x̄_1 ∨ x_3) ∧ (x_1 ∨ x̄_4), where X = {x_1, ..., x_4}. Then

    x_1 = 0 :  F_0 = (x_2) ∧ (x̄_4),
    x_1 = 1 :  F_1 = (x_3).

[Figure 6.1: Derandomization tree T(F) for a formula F. The root F at level 0 branches on x_1 = 0 into the subtree T(F_0) and on x_1 = 1 into the subtree T(F_1) at level 1.]

Applying this recursively, we obtain the tree T(F) depicted in Figure 6.1. The tree T(F) is a complete binary tree with levels 0, ..., n and 2^{n+1} − 1 vertices. Each vertex at level i corresponds to a setting of the Boolean variables x_1, ..., x_i.

We label the vertices of T(F) with their respective conditional expectations as follows. Let X_1 = a_1, ..., X_i = a_i with a_1, ..., a_i ∈ {0,1} be the outcome of a truth assignment for the variables x_1, ..., x_i. The vertex corresponding to this assignment is labeled

    E[val(Z) | X_1 = a_1, ..., X_i = a_i].

If i = n, then this conditional expectation is simply the total weight of the clauses satisfied by the truth assignment x_1 = a_1, ..., x_n = a_n.

The goal of the remainder of this section is to show that we can find, deterministically and in polynomial time, a path from the root of T(F) to a leaf such that the conditional expectations of the vertices on that path are all at least E[val(Z)]. Obviously, this property yields the desired result: we can construct deterministically a solution which is at least as good as the expected value of the randomized algorithm.
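Anticipating Lemma 6.11 and Theorem 6.12 below, the following Python sketch implements this greedy descent for the formula of Example 6.10, with unit weights chosen for illustration. The success probabilities p_i are a parameter, so the same routine covers Randomized Large (all p_i = 1/2) as well as Randomized Small (p_i taken from the LP solution); all helper names are illustrative.

```python
def clause_sat_prob(clause, assignment, p):
    """Pr[clause satisfied | fixed prefix]. Literals are signed ints, p[i] = Pr[X_i = 1]."""
    prob_all_false = 1.0
    for lit in clause:
        i = abs(lit) - 1
        if assignment[i] is not None:          # variable already fixed
            if (lit > 0) == bool(assignment[i]):
                return 1.0                     # a fixed literal already satisfies the clause
            # otherwise the fixed literal is false and contributes nothing
        else:
            prob_all_false *= (1.0 - p[i]) if lit > 0 else p[i]
    return 1.0 - prob_all_false

def cond_expectation(clauses, weights, assignment, p):
    """E[val(Z) | fixed prefix], by linearity of expectation."""
    return sum(w * clause_sat_prob(c, assignment, p) for c, w in zip(clauses, weights))

def derandomize(clauses, weights, p):
    """Method of conditional expectation: fix x_1, ..., x_n one after the other."""
    assignment = [None] * len(p)
    for i in range(len(p)):
        values = {}
        for a in (0, 1):
            assignment[i] = a
            values[a] = cond_expectation(clauses, weights, assignment, p)
        assignment[i] = max(values, key=values.get)   # keep the larger conditional expectation
    return assignment

# Example 6.10's formula with (made-up) unit weights and the uniform distribution of Randomized Large.
clauses = [[1, 2], [-1, 3], [1, -4]]
weights = [1.0, 1.0, 1.0]
print(derandomize(clauses, weights, p=[0.5] * 4))   # satisfies all three clauses
```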
Lemma 6.11. The conditional expectation E[val(Z) | X_1 = a_1, ..., X_i = a_i] of any vertex in T(F) can be computed in polynomial time.

Proof. Consider a vertex X_1 = a_1, ..., X_i = a_i. Let F' be the Boolean formula obtained from F by setting x_1, ..., x_i accordingly; F' is a formula in the variables x_{i+1}, ..., x_n. Clearly, by linearity of expectation, the expected weight of any clause of F' under the random truth assignment to the variables x_{i+1}, ..., x_n can be computed in polynomial time. Adding to this the total weight of the clauses already satisfied by x_1, ..., x_i gives the answer.

Theorem 6.12. We can compute in polynomial time a path from the root to a leaf in T(F) such that the conditional expectation of each vertex on this path is at least E[val(Z)].

Proof. Consider the conditional expectation at a certain vertex X_1 = a_1, ..., X_i = a_i and the setting of the next variable X_{i+1}. We have

    E[val(Z) | X_1 = a_1, ..., X_i = a_i]
      = E[val(Z) | X_1 = a_1, ..., X_i = a_i, X_{i+1} = 0]·Pr[X_{i+1} = 0]
      + E[val(Z) | X_1 = a_1, ..., X_i = a_i, X_{i+1} = 1]·Pr[X_{i+1} = 1].

We show that the two conditional expectations involving X_{i+1} cannot both be strictly smaller than E[val(Z) | X_1 = a_1, ..., X_i = a_i]. Assume the contrary; then we have

    E[val(Z) | X_1 = a_1, ..., X_i = a_i] < E[val(Z) | X_1 = a_1, ..., X_i = a_i]·(Pr[X_{i+1} = 0] + Pr[X_{i+1} = 1]),

which is a contradiction since Pr[X_{i+1} = 0] + Pr[X_{i+1} = 1] = 1. This yields the existence of such a path, and by Lemma 6.11 it can be computed in polynomial time.

The derandomized version of a randomized algorithm now simply carries out the construction of these proofs, with the probability distribution given by the randomized algorithm.

6.4 Integer Linear Programs

Here we give a general approach for rounding a Linear Programming (LP) relaxation of an Integer Linear Program (ILP). In particular, we obtain bounds on the infeasibility of the obtained solution for the ILP. The omitted proof is a simple application of a concentration-of-measure bound, e.g., Azuma-Hoeffding.

Theorem 6.13. Let A ∈ [−α, α]^{n,n} be a matrix and let b ∈ R^n. Let z be any fractional solution of the linear program A·x = b, x ∈ [0,1]^n. Define a vector X ∈ {0,1}^n randomly by

    X_i = 1 with probability z_i, and X_i = 0 otherwise,

for all i = 1, ..., n. Then, with high probability, an outcome x of X satisfies

    A·x − b ∈ [−c·√(n log n), c·√(n log n)]^n

for a suitable constant c = c(α) > 0.
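The following sketch illustrates Theorem 6.13 numerically (it is not a proof): it draws a bounded random matrix A and a fractional vector z, sets b = Az so that z is feasible by construction, rounds z randomly, and compares the largest violation of A·x = b with √(n log n). The dimension and all random choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

n, alpha = 2000, 1.0
A = rng.uniform(-alpha, alpha, size=(n, n))   # entries bounded by alpha in absolute value
z = rng.uniform(0.0, 1.0, size=n)             # an arbitrary fractional "solution"
b = A @ z                                     # choose b so that A z = b holds exactly

# Randomized rounding: X_i = 1 with probability z_i.
x = (rng.random(n) < z).astype(float)

print("max_i |(A x - b)_i| :", np.max(np.abs(A @ x - b)))
print("sqrt(n log n)       :", np.sqrt(n * np.log(n)))
```

Typically the observed maximal violation is of the same order as √(n log n), in line with the theorem; the constant c(α) is not computed here.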
6.5 Integer Quadratic Programs

Many optimization problems can be formulated in terms of an Integer Quadratic Program (IQP). We are given a matrix A ∈ R^{n,n} and a vector b ∈ R^n, and our variables are x ∈ {0,1}^n; see Problem 6.4. We consider maximization, but of course one might also consider minimization. It is in general NP-hard to solve IQPs. In this section we give a randomized sampling-and-rounding algorithm which approximates these programs. The algorithm covers the large class of IQPs where the entries of A are constants (as, e.g., for Maximum Cut) and the entries of b are linear in n (as for dense graphs, where the minimum degree is linear in n).

Problem 6.4  Integer Quadratic Programming

Instance.  Matrix A ∈ R^{n,n}, vector b ∈ R^n.

Task.  Solve the problem

    maximize    val(x) = x^T A x + b^T x,
    subject to  x ∈ {0,1}^n.

Theorem 6.14. Let A ∈ [−α, α]^{n,n} and b ∈ [−βn, βn]^n, where α > 0 and β > 0 are constants. Let x* ∈ {0,1}^n be an optimal solution for the corresponding IQP. Then, for every ε > 0, there is an algorithm with running time O(poly(n)·n^{1/ε²}) which constructs a vector x ∈ {0,1}^n such that

    val(x) ≥ val(x*) − εn².

The proof is subdivided into the following principal steps:

(i) Assume for the moment that x* is known. Then we can reformulate the IQP as a certain ILP (where it is at first sight not clear why we make any progress with this step). But since we actually do not know x*, the objective function and the right-hand side of this ILP are unknown.

(ii) However, we "estimate" the objective function and the right-hand side of the ILP using sampling with replacement.

(iii) Then we relax the ILP to an LP, which can be solved in polynomial time.

(iv) Finally, the LP solution is rounded randomly.

Proof. The first step is to transform the Integer Quadratic Program (IQP) into an Integer Linear Program (ILP). Let x* be optimal for the IQP and assume for the moment that we actually know x*. Define the vector r ∈ R^n by r^T = x*^T A + b^T, which yields the ILP

    maximize    r^T x,
    subject to  x^T A + b^T = r^T,
                x ∈ {0,1}^n.

Clearly, x* is an optimal solution for this system. However, since x* is unknown, r is also unknown. The next step is hence to "estimate" r by a vector s ∈ R^n. We allow r_i and s_i to differ by a linear error, i.e.,

    |r_i − s_i| ≤ εn    for i ∈ I,

where I = {1, ..., n} is the set of indices.

Recall from Section 3.4.4 that an unknown quantity can be estimated within an additive error by sampling with replacement. We want to estimate the quantity

    r_i = Σ_{j∈I} a_{j,i}·x*_j + b_i.

Thus consider the universe U = {a_{j,i}·x*_j : j ∈ I}; we want to estimate u = Σ_{j∈I} a_{j,i}·x*_j. Now choose a set J ⊆ I of k indices at random with replacement. For the k variables x_j with j ∈ J there are 2^k possible 0-1 assignments. We choose k = Θ(log n/ε²) and hence have Θ(n^{1/ε²}) many possible assignments. These are few enough that we can actually try all 2^k possibilities and eventually find the assignment with x_j = x*_j for all j ∈ J, which is assumed in the sequel. Hence, define a vector s* with

    s*_i = (n/k)·Σ_{j∈J} a_{j,i}·x*_j + b_i    for i ∈ I

as an estimator for r_i. Since b_i is fixed and the a_{j,i} lie in [−α, α], we can apply Theorem 3.19 on sampling with replacement: for suitable k = Θ(log n/ε²) we have |r_i − s*_i| ≤ εn with probability at least 1 − 2n^{−2} per constraint i ∈ I. Hence with probability at least 1 − 2n^{−1} all n constraints satisfy this inequality.

Where are we now? We can estimate r by s up to an additive error of εn. Thus we can approximate the IQP by the following ILP (where 1^T denotes the transposed all-ones vector):

    maximize    s^T x,
    subject to  x^T A + b^T ≥ s^T − εn·1^T,
                x^T A + b^T ≤ s^T + εn·1^T,
                x ∈ {0,1}^n.

The vector x* is a feasible solution for this system and we have

    s^T x* = r^T x* − (r^T − s^T)·x* = x*^T A x* + b^T x* − (r^T − s^T)·x* ≥ val(x*) − εn².

However, we are still left with an ILP. We relax it to an LP by replacing the constraint x ∈ {0,1}^n with x ∈ [0,1]^n. Let z be an optimal fractional solution; by this relaxation we must have s^T z ≥ s^T x*. Apply the randomized rounding procedure stated above and obtain a vector x ∈ {0,1}^n which, with high probability, satisfies

    x^T A + b^T ≥ s^T − (εn + O(√(n log n)))·1^T,
    x^T A + b^T ≤ s^T + (εn + O(√(n log n)))·1^T.

Then, by randomized rounding, we have

    s^T x ≥ s^T z − O(n·√(n log n)) ≥ s^T x* − O(n·√(n log n)) ≥ val(x*) − εn² − O(n·√(n log n)).

Now define the vector δ^T = z^T A + b^T − s^T, which captures an error term. By the constraints we have |δ_i| ≤ εn for each i ∈ I and hence |δ^T x| ≤ εn².

We are now in a position to finalize the proof by combining the derived inequalities:

    val(x) = x^T A x + b^T x
           = (x^T A + b^T)·x
           = (x^T A + b^T − (z^T A + b^T))·x + δ^T x + s^T x
           ≥ −O(n·√(n log n)) − εn² + val(x*) − εn² − O(n·√(n log n))
           = val(x*) − (2ε + o(1))·n².

Thus the proof suggests the following algorithm Round IQP:

Step 1.  Sample J ⊆ I with cardinality k = Θ(log n/ε²).
Step 2.  For each of the 2^k assignments x_j ∈ {0,1}, j ∈ J, do:

    (a) Compute the vector s, where s_i = (n/k)·Σ_{j∈J} a_{j,i}·x_j + b_i for i ∈ I.

    (b) Solve the LP max{ s^T x : s^T − εn·1^T ≤ x^T A + b^T ≤ s^T + εn·1^T, x ∈ [0,1]^n } and call the optimal solution z.

    (c) Round z ∈ [0,1]^n to x ∈ {0,1}^n randomly.

Step 3.  Return the vector x found that maximizes val.

The running time of the algorithm is dominated by trying all 2^k = O(n^{1/ε²}) assignments to the sampled set J ⊆ I and solving an LP for each of them. Thus we have a running time of O(poly(n)·n^{1/ε²}) in total.

6.6 Maximum Cut

Recall the problem Maximum Cut, in which we are given an undirected graph G = (V, E) and have to partition V into (L, R) with L ⊆ V and R = V − L. The objective is to maximize the number of cut edges, i.e., the edges e = {ℓ, r} with ℓ ∈ L and r ∈ R.

6.6.1 Dense Instances

Maximum Cut can be formulated as an IQP. Introduce for each vertex i ∈ V a variable x_i ∈ {0,1} indicating whether i ∈ L. Let N_i = {j ∈ V : {i,j} ∈ E} denote the set of neighbours of i and d_i = |N_i| the degree of i. For each x ∈ {0,1}^n notice that

    val(x) = Σ_{i∈V} x_i · Σ_{j∈N_i} (1 − x_j)

counts the number of cut edges. Furthermore, let A = (a_{i,j})_{i,j∈V} ∈ {0,1}^{n,n} with a_{i,j} = 1 if and only if {i,j} ∈ E denote the adjacency matrix of G, and let d = (d_i)_{i∈V} be the degree vector of G. Then we have

    val(x) = Σ_{i∈V} x_i · Σ_{j∈N_i} (1 − x_j)
           = Σ_{i∈V} x_i · (d_i − Σ_{j∈N_i} x_j)
           = d^T x − Σ_{i∈V} Σ_{j∈N_i} x_i x_j
           = d^T x − x^T A x,

where the last step uses Σ_{i∈V} Σ_{j∈N_i} x_i x_j = 2·Σ_{{i,j}∈E} x_i x_j = x^T A x.

Problem 6.5  Maximum Cut

Instance.  Adjacency matrix A ∈ {0,1}^{n,n}, degree vector d ∈ N_0^n.

Task.  Solve the problem

    maximize    val(x) = d^T x − x^T A x,
    subject to  x ∈ {0,1}^n.

A graph G = (V, E) is called α-dense if it has at least αn² many edges. Here we apply the above algorithm to dense instances of Maximum Cut.

Corollary 6.15. Let G be α-dense for some constant α > 0. Then, for every ε > 0, there is a (1 − ε)-approximate algorithm with running time O(poly(n)·n^{1/ε²}) for Maximum Cut.

Proof. We apply the algorithm Round IQP (with accuracy parameter εα/2 in Theorem 6.14) and obtain a solution x ∈ {0,1}^n with

    val(x) ≥ val(x*) − (εα/2)·n².

Since the graph is α-dense, i.e., it has at least αn² edges, and since assigning each x_i = 1 with probability 1/2 yields a cut with at least (α/2)·n² edges in expectation, we have val(x*) ≥ (α/2)·n². Thus

    val(x) ≥ val(x*) − (εα/2)·n² ≥ val(x*) − ε·val(x*) = (1 − ε)·val(x*),

as claimed.

6.6.2 Approximation Algorithm

In this section we give a randomized 0.878-approximation algorithm for Maximum Cut. The algorithm makes use of Semidefinite Programming, which we briefly introduce here.

Semidefinite Programming

We need some notions from linear algebra and matrix theory. Recall that a matrix A ∈ R^{n,n} is symmetric if A = A^T. An eigenvector of a matrix A is a vector x ≠ 0 such that Ax = λx for some λ ∈ C, in which case λ is called an eigenvalue. Observe that the eigenvalues of A are exactly the roots of the polynomial det(A − λI), where I denotes the identity matrix. The following facts are well known:

Theorem 6.16. Let A ∈ R^{n,n} be symmetric. Then A has n (not necessarily distinct) real eigenvalues λ_1 ≥ ··· ≥ λ_n.

Theorem 6.17 (Rayleigh). Let A ∈ R^{n,n} be symmetric. Then we have

    λ_1(A) = max_{||x||=1} x^T A x    and    λ_n(A) = min_{||x||=1} x^T A x.

A matrix A ∈ R^{n,n} is called

    positive semidefinite (psd)  if x^T A x ≥ 0 for all x ≠ 0,
    positive definite (pd)       if x^T A x > 0 for all x ≠ 0.

Thus a symmetric A is psd if and only if λ_n ≥ 0, and pd if and only if λ_n > 0. Further, consider the block-diagonal matrix

    A = ( B  0
          0  C ).

This matrix A is psd if and only if B and C are psd.
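As a small numerical aside (a sketch with made-up matrices), positive semidefiniteness of a symmetric matrix can be tested via its smallest eigenvalue, matching the characterization through λ_n above; the block-diagonal example mirrors the remark about B and C.

```python
import numpy as np

def is_psd(A: np.ndarray, tol: float = 1e-10) -> bool:
    """Test a symmetric matrix for positive semidefiniteness via its smallest eigenvalue."""
    assert np.allclose(A, A.T), "intended for symmetric matrices"
    return float(np.linalg.eigvalsh(A).min()) >= -tol

B = np.array([[2.0, -1.0], [-1.0, 2.0]])   # psd: eigenvalues 1 and 3
C = np.array([[1.0, 2.0], [2.0, 1.0]])     # not psd: eigenvalues -1 and 3
block = np.block([[B, np.zeros((2, 2))], [np.zeros((2, 2)), C]])

print(is_psd(B), is_psd(C), is_psd(block))  # True False False, as the block-diagonal rule predicts
```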
Theorem 6.18. Let A ∈ R^{n,n} be symmetric. Then the following are equivalent:

(i) A is psd.

(ii) λ_i ≥ 0 for i = 1, ..., n.

(iii) There is a matrix B ∈ R^{n,n} such that A = B^T B.

The form A = B^T B is called a Cholesky decomposition (which is in general not unique). However, B can always be chosen to be an upper (or lower) triangular matrix. This yields an algorithm which computes such a decomposition in time O(n³). Therefore we can test in polynomial time whether a symmetric matrix is psd. We can even do that for non-symmetric matrices:

Observation 6.19. Let A ∈ R^{n,n} and define the symmetric matrix A' = (1/2)·(A + A^T) ∈ R^{n,n}. Then A' is psd if and only if A is psd.

Let A = (a_{i,j}^T)_{1≤i,j≤n} be a vector matrix, i.e., an n×n array whose entries are row vectors a_{i,j}^T ∈ R^n. We call A symmetric if a_{i,j}^T = a_{j,i}^T for all i, j. For x ∈ R^n define the operation A • x = (a_{i,j}^T x)_{1≤i,j≤n}. For a matrix B ∈ R^{n,n} and x ∈ R^n define S(x) = A • x + B. The matrix S(x) ∈ R^{n,n} has entries s_{i,j}(x) = a_{i,j}^T x + b_{i,j}, which are linear functions of x.

Example 6.20. The notation yields, e.g.,

    S(x) = ( (1, 2)    (−1, 5) )       ( x_1 )     ( 0    8 )     ( x_1 + 2x_2        −x_1 + 5x_2 + 8 )
           ( (7, −3)   (0, 0)  )   •   ( x_2 )  +  ( −3   1 )  =  ( 7x_1 − 3x_2 − 3    1               ).

With this notation we define Semidefinite Programming (SDP) as given in Problem 6.6. By Observation 6.19, the requirement that A and B be symmetric is without loss of generality. We may of course also require minimization instead of maximization. Notice the following: if A = diag(a_1^T, ..., a_n^T) with row vectors a_1^T, ..., a_n^T and B = diag(b_1, ..., b_n), then S(x) is psd if and only if a_i^T x + b_i ≥ 0 for all i, so we obtain Linear Programming as a special case of Semidefinite Programming.

Problem 6.6  Semidefinite Programming

Instance.  Symmetric vector matrix A with entries a_{i,j}^T ∈ R^n, symmetric matrix B ∈ R^{n,n}, vector c ∈ R^n.

Task.  Solve the problem

    maximize    val(x) = c^T x,
    subject to  S(x) = A • x + B is psd,
                x ∈ R^n.

It is known that SDP can be solved in polynomial time up to an arbitrary error ε > 0, provided we are guaranteed that there is a solution of polynomially bounded size. Such an error term can in general not be avoided, because optimal solutions of SDPs may be irrational. For our purposes, we may assume that we can solve SDP in polynomial time.

Application to Maximum Cut

Consider the formulation of Maximum Cut in Problem 6.7. The idea is that the term (1 − x_i·x_j)/2 ∈ {0,1} equals one if and only if x_i, x_j ∈ {−1,1} take different values. Thus define x_i = −1 if and only if vertex i ∈ L.

Problem 6.7  Maximum Cut

Instance.  Graph G = (V, E).

Task.  Solve the problem

    maximize    val(x) = Σ_{{i,j}∈E} (1 − x_i·x_j)/2,
    subject to  x ∈ {−1,1}^n.

Since it is not known how to solve this mathematical program directly (in polynomial time), we first relax the constraints as follows: instead of requiring x_i ∈ {−1,1}, we consider vectors x_i ∈ R^n with ||x_i|| = 1. This gives Problem 6.8. The relaxation is not yet an SDP, but by introducing a vector x = (x_{i,j})_{1≤i<j≤n} of variables we obtain such a program in Problem 6.9. The reason is as follows: since a symmetric psd matrix A can be written as A = B^T B, we can interpret each entry a_{i,j} as a product b_i^T b_j of two vectors, and if a_{i,i} = 1 we have b_i^T b_i = 1, which means ||b_i|| = 1.

Problem 6.8  Vector Maximum Cut

Instance.  Graph G = (V, E).

Task.  Solve the problem

    maximize    val(x) = Σ_{{i,j}∈E} (1 − x_i^T x_j)/2,
    subject to  ||x_i|| = 1,  x_i ∈ R^n    for i = 1, ..., n.

Problem 6.9  Semidefinite Maximum Cut

Instance.  Graph G = (V, E).

Task.  Solve the problem

    maximize    val(x) = Σ_{{i,j}∈E} (1 − x_{i,j})/2,
    subject to  the symmetric matrix

                X := ( 1         x_{1,2}   ···   x_{1,n}
                       x_{1,2}   1         ···   x_{2,n}
                       ⋮          ⋮         ⋱     ⋮
                       x_{1,n}   x_{2,n}   ···   1       )   is psd.
The idea of the algorithm SDP Rounding Maximum Cut is as follows: Suppose we have solved the SDP relaxation of Maximum Cut with solution z and induced matrix Z. By computing a Cholesky decomposition of Z, we obtain vectors v_i for i = 1, ..., n with v_i^T v_j = z_{i,j} and, in particular, v_i^T v_i = 1. However, in our formulation of Maximum Cut we require that a solution is given by x ∈ {−1,1}^n. Hence we have to "round" the vectors v_i ∈ R^n to values x_i ∈ {−1,1} for i = 1, ..., n. Now consider an arbitrary hyperplane with normal vector r ∈ R^n, i.e., a set H = {x ∈ R^n : r^T x = 0}. One possible way of rounding is to assign x_i = −1 if and only if v_i and r lie on different sides of H, i.e., if and only if r^T v_i ≤ 0. How do we choose the normal vector r? Why not at random, with a distribution that is symmetric under rotation.

Algorithm 6.3  SDP Rounding Maximum Cut

Input.  Graph G = (V, E).

Output.  Vector x ∈ {−1,1}^n.

Step 1.  Solve the SDP relaxation in Problem 6.9; call the solution z, inducing a matrix Z.

Step 2.  Compute a Cholesky decomposition of Z and obtain vectors v_i ∈ R^n for i = 1, ..., n with z_{i,j} = v_i^T v_j and v_i^T v_i = 1.

Step 3.  Let r ∈ R^n be the normal vector of a hyperplane, chosen at random with a distribution that is symmetric under rotation.

Step 4.  Define x_i = −1 if r^T v_i ≤ 0 and x_i = 1 otherwise, for i = 1, ..., n, and return x.

Theorem 6.21. SDP Rounding Maximum Cut is 0.878-approximate for Maximum Cut, in expectation.

Proof. Let v_i and v_j be two of the obtained vectors. By linearity of expectation, we have to estimate the probability that v_i and v_j lie on different sides of the hyperplane H induced by the normal vector r. Consider the plane H' = {s·v_i + t·v_j : s, t ∈ R} spanned by v_i and v_j. The angle α = arccos(v_i^T v_j) between v_i and v_j is invariant under rotation. With probability one the hyperplane H does not contain H', so we may condition on this event; then H and H' intersect in a line g. If v_i and v_j are to lie on different sides of H, then they must also lie on different sides of g within H'. This happens if and only if the direction of g falls inside the angle α between the two vectors.

[Figure: the vectors v_i and v_j enclosing the angle α.]

Since r is chosen symmetrically under rotation, this happens with probability

    2α / (2π) = α/π = arccos(v_i^T v_j) / π.

Let X denote the number of cut edges of the obtained solution x ∈ {−1,1}^n. Then, by linearity of expectation,

    E[X] = Σ_{{i,j}∈E} arccos(v_i^T v_j) / π.

Now we compare arccos(v_i^T v_j)/π with (1 − v_i^T v_j)/2, i.e., with our original objective function. By numerical evaluation of the functions arccos(v)/π and (1 − v)/2 one sees that

    arccos(v)/π ≥ 0.878 · (1 − v)/2    for −1 ≤ v ≤ 1,

yielding the claim.
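To close the section, here is an end-to-end sketch of SDP Rounding Maximum Cut on a small made-up instance (a 5-cycle). It assumes the cvxpy package for solving the SDP relaxation, which the text does not prescribe, and it replaces the Cholesky factorization of Step 2 by an eigendecomposition, since numerical Cholesky routines may reject matrices that are only positive semidefinite; either factorization yields vectors with v_i^T v_j = z_{i,j}.

```python
import numpy as np
import cvxpy as cp   # assumed SDP solver interface; any SDP solver would do

rng = np.random.default_rng(4)

# A small made-up instance: the 5-cycle.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

# Step 1: SDP relaxation (Problem 6.9): maximize sum (1 - X_ij)/2 with X psd and diag(X) = 1.
X = cp.Variable((n, n), PSD=True)
objective = cp.Maximize(sum((1 - X[i, j]) / 2 for i, j in edges))
cp.Problem(objective, [cp.diag(X) == 1]).solve()
Z = X.value

# Step 2: factor Z = V^T V via an eigendecomposition (the columns of V are the vectors v_i).
eigvals, eigvecs = np.linalg.eigh(Z)
V = (eigvecs * np.sqrt(np.clip(eigvals, 0, None))).T

# Steps 3-4: rotation-symmetric random normal vector r, then sign rounding.
r = rng.standard_normal(n)
x = np.where(r @ V <= 0, -1, 1)

cut = sum(1 for i, j in edges if x[i] != x[j])
print("SDP value:", round(float(objective.value), 3), " rounded cut:", cut)
```

On the 5-cycle an optimal cut has 4 edges; repeating Steps 3 and 4 a few times and keeping the best outcome typically recovers it.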