Near-Optimal Algorithms for Unique Games

Moses Charikar∗   Konstantin Makarychev†   Yury Makarychev‡
Princeton University

Abstract

Unique games are constraint satisfaction problems that can be viewed as a generalization of Max-Cut to a larger domain size. The Unique Games Conjecture states that it is hard to distinguish between instances of unique games where almost all constraints are satisfiable and those where almost none are satisfiable. It has been shown to imply a number of inapproximability results for fundamental problems that seem difficult to obtain by more standard complexity assumptions. Thus, proving or refuting this conjecture is an important goal.

We present significantly improved approximation algorithms for unique games. For instances with domain size k where the optimal solution satisfies a 1 − ε fraction of all constraints, our algorithms satisfy roughly k^{−ε/(2−ε)} and 1 − O(√(ε log k)) fractions of all constraints. Our algorithms are based on rounding a natural semidefinite programming relaxation for the problem, and their performance almost matches the integrality gap of this relaxation. Our results are near optimal if the Unique Games Conjecture is true, i.e. any improvement (beyond low order terms) would refute the conjecture.

1 Introduction

Given a set of linear equations over Zp with two variables per equation, consider the problem of finding an assignment to variables that satisfies as many equations (constraints) as possible. If there is an assignment to the variables which satisfies all the constraints, it is easy to find such an assignment. On the other hand, if there is an assignment that satisfies almost all constraints (but not all), it seems quite difficult to find a good satisfying assignment. This is the essence of the Unique Games Conjecture of Khot [10].

One distinguishing feature of the above problem on linear equations is that every constraint corresponds to a bijection between the values of the associated variables: for every possible value of one variable, there is a unique value of the second variable that satisfies the constraint. Unique games are systems of constraints – a generalization of the linear equations discussed above – that have this uniqueness property (first considered by Feige and Lovász [6]).

∗ moses@cs.princeton.edu. Supported by NSF ITR grant CCR-0205594, NSF CAREER award CCR-0237113, MSPA-MCS award 0528414, and an Alfred P. Sloan Fellowship.
† kmakaryc@cs.princeton.edu. Supported by a Gordon Wu fellowship. Part of this work was done at Microsoft Research.
‡ ymakaryc@cs.princeton.edu. Supported by a Gordon Wu fellowship. Part of this work was done at Microsoft Research.

Definition 1.1 (Unique Game). A unique game consists of a constraint graph G = (V, E), a set of variables xu (for all vertices u) and a set of permutations πuv on [k] = {1, . . . , k} (for all edges (u, v)). Each permutation πuv defines the constraint πuv(xu) = xv. The goal is to assign a value from the set [k] to each variable xu so as to maximize the number of satisfied constraints.

As in the setting of linear equations, instances of unique games where all constraints are satisfiable are easy to handle. Given an instance where a 1 − ε fraction of constraints is satisfiable, the Unique Games Conjecture (UGC) of Khot [10] says that it is hard to satisfy even a δ fraction of the constraints. More formally, the conjecture is the following.

Conjecture 1 (Unique Games Conjecture [10]).
For any constants ε, δ > 0, for any k > k(ε, δ), it is NP-hard to distinguish between instances of unique games with domain size k where a 1 − ε fraction of constraints is satisfiable and those where only a δ fraction of constraints is satisfiable.

This conjecture has attracted a lot of recent attention since it has been shown to imply hardness of approximation results for several important problems: MaxCut [11, 15], Min 2CNF Deletion [3, 10], MultiCut and Sparsest Cut [3, 14], Vertex Cover [13], and coloring 3-colorable graphs [5] (based on a variant of the UGC), results that seem difficult to obtain by standard complexity assumptions.

Note that a random assignment satisfies a 1/k fraction of the constraints in a unique game. Andersson, Engebretsen, and Håstad [2] considered semidefinite program (SDP) based algorithms for systems of linear equations mod p (with two variables per equation) and gave an algorithm that performs (very slightly) better than a random assignment. The first approximation algorithm for general unique games was given by Khot [10], and satisfies a 1 − O(k² ε^{1/5} √(log(1/ε))) fraction of all constraints if a 1 − ε fraction of all constraints is satisfiable. Recently Trevisan [17] developed an algorithm that satisfies a 1 − O((ε log n)^{1/3}) fraction of all constraints (this can be improved to 1 − O(√(ε log n)) [9]), and Gupta and Talwar [9] developed an algorithm that satisfies a 1 − O(ε log n) fraction of all constraints. The result of [9] is based on rounding an LP relaxation for the problem, while the previous results use SDP relaxations for unique games.

There are very few results that show hardness of unique games. Feige and Reichman [7] showed that for every positive ε there is a c such that it is NP-hard to distinguish whether a c fraction of all constraints is satisfiable, or only an εc fraction is satisfiable.

Our Results. We present two new approximation algorithms for unique games. We state our guarantees for instances where a 1 − ε fraction of constraints is satisfiable. The first algorithm satisfies a

    Ω( min(1, 1/√(ε log k)) · (1 − ε)² · k^{−ε/(2−ε)} / √(log k) )        (1)

fraction of all constraints. The second algorithm satisfies a 1 − O(√(ε log k)) fraction of all constraints and has a better guarantee for ε = O(1/ log k). We apply the same techniques to d-to-1 games as well.

In order to understand the complexity theoretic implications of our results, it is useful to keep in mind that inapproximability reductions from unique games typically use the "Long Code", which increases the size of the instance by a 2^k factor. Thus, such applications of unique games usually have domain size k = O(log n). In Figure 1, we summarize known algorithmic guarantees for unique games. In order to compare these different guarantees in the context of hardness applications (i.e. k = O(log n)), we compare the range of values of ε (as a function of k) for which each of these algorithms beats the performance of a random assignment.

    Algorithm         Guarantee for OPT = 1 − ε            Threshold ε
    Khot [10]         1 − O(k² ε^{1/5} √(log(1/ε)))        Õ(1/k^{10})
    Trevisan [17]     1 − O((ε log n)^{1/3})               O(1/k)
    Gupta-Talwar [9]  1 − O(ε log n)                       O(1/k)
    This paper        ≈ Ω(k^{−ε/(2−ε)})                    const < 1
    This paper        1 − O(√(ε log k))                    O(1/ log k)

Figure 1: Summary of results. The guarantee represents the fraction of constraints satisfied for instances where OPT = 1 − ε. The threshold represents the range of values of ε for which the algorithm beats a random assignment (computed for k = O(log n)).

Our results show limitations on the hardness bounds achievable using the UGC and stronger versions of it.
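Before turning to stronger forms of the conjecture, it may help to make Definition 1.1 concrete. The following toy Python sketch (the representation of instances is purely illustrative and not part of the paper) scores an assignment for a small unique game, with linear equations mod k as the special case discussed above; a uniformly random assignment satisfies each constraint with probability 1/k.

import random

# Toy illustration of Definition 1.1 (hypothetical representation): a unique game is a
# constraint graph whose edges carry permutations of [k], and an assignment is scored
# by the fraction of constraints pi_uv(x_u) = x_v that it satisfies.
def satisfied_fraction(edges, assignment):
    ok = sum(1 for u, v, pi in edges if pi[assignment[u]] == assignment[v])
    return ok / len(edges)

k = 5
def shift(c):
    # Linear equations x_v - x_u = c (mod k) are the special case pi(i) = i + c (mod k).
    return {i: (i + c) % k for i in range(k)}

edges = [("a", "b", shift(1)), ("b", "c", shift(2)), ("a", "c", shift(3)), ("c", "d", shift(4))]
print(satisfied_fraction(edges, {"a": 0, "b": 1, "c": 3, "d": 2}))            # 1.0: fully satisfiable
random.seed(0)
print(satisfied_fraction(edges, {u: random.randrange(k) for u in "abcd"}))    # about 1/k on average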
Chawla, Krauthgamer, Kumar, Rabani, and Sivakumar [3] proposed a strengthened form of the UGC, conjecturing that it holds for k = log n and ε = δ = 1/(log n)^{Ω(1)}. This was used to obtain an Ω(log log n) hardness for sparsest cut. Our results refute this strengthened conjecture.¹

The performance of our algorithms is naturally constrained by the integrality gap of the SDP relaxation, i.e. the smallest possible value of an integer solution for an instance with SDP solution of value (1 − ε)|E|. Khot and Vishnoi [14] constructed a gap instance for the semidefinite relaxation² for the Unique Games Problem where the SDP solution satisfies a (1 − ε) fraction of constraints, but the optimal solution can satisfy at most an O(k^{−ε/9}) fraction (one may show that their analysis can yield O(k^{−ε/4+o(ε)})). This shows that our results are almost optimal for the standard semidefinite program.

After we unofficially announced our results, Khot, Kindler, Mossel and O'Donnell [12] showed that a reduction in an earlier version of their paper [11], together with the techniques in the recently proved Majority is Stablest result of Mossel, O'Donnell, and Oleszkiewicz [15], gives lower bounds for unique games that almost match the upper bounds we obtain. They establish the following hardness results (in fact, for the special case of linear equations mod p):

Theorem 1.2 ([12], Corollary 13). The Unique Games Conjecture implies that for every fixed ε > 0, for all k > k(ε), it is NP-hard to distinguish between instances of unique games with domain size k where at least a 1 − ε fraction of constraints is satisfiable and those where only a 1/k^{ε/(2−ε)} fraction of constraints is satisfiable.

Theorem 1.3 ([12], Corollary 14). The Unique Games Conjecture implies that for every fixed ε > 0, for all k > k(ε), it is NP-hard to distinguish between instances of unique games with domain size k where at least a 1 − ε fraction of constraints is satisfiable and those where a 1 − √(2/π)·√(ε log k) + o(1) fraction of constraints is satisfiable.

Thus, our bounds are near optimal if the UGC is true – even a slight improvement of the bounds 1/k^{ε/(2−ε)} or 1 − O(√(ε log k)) (beyond low order terms) will disprove the unique games conjecture!

Our algorithms are based on rounding an SDP relaxation for unique games. The goal is to assign a value in [k] to every variable u. The SDP solution gives a collection of vectors {ui} for every variable u, one for every value i ∈ [k]. Given a constraint πuv on u and v, the vectors ui and vπuv(i) are close. In contrast to the algorithms of Trevisan [17] and Gupta and Talwar [9], our rounding algorithms ignore the constraint graph entirely. We interpret the SDP solution as a probability distribution on assignments of values to variables, and the goal of our rounding algorithm is to pick an assignment to variables by sampling from this distribution such that the values of variables connected by constraints are strongly correlated. The rough idea is to pick a random vector and examine the projections of this vector on the ui's, picking a value i for u for which ui has a large projection. (In fact, this is exactly the algorithm of Khot [10].)

¹ An updated version of [3] proposes a different strengthened form of the UGC, which is still plausible given our algorithms. They use a modified analysis to account for the asymmetry in ε and δ to obtain an Ω(√(log log n)) hardness for sparsest cut based on this.
² We use a slightly stronger SDP than they used, but their integrality gap construction works for our SDP as well.
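In its simplest form this idea takes only a few lines. The sketch below is a simplification with hypothetical data structures: it uses a single random direction and takes the index with the largest projection, rather than the thresholded procedure analyzed in Section 3.

import numpy as np

def basic_round(U, rng):
    """U: dict mapping each vertex to a (k x d) array whose rows are the SDP vectors
    u_1, ..., u_k for that vertex.  Rough idea only: pick one Gaussian vector g and
    give each vertex the index i whose normalized vector u_i has the largest projection."""
    d = next(iter(U.values())).shape[1]
    g = rng.standard_normal(d)                      # one shared random direction
    labels = {}
    for u, vecs in U.items():
        norms = np.linalg.norm(vecs, axis=1)
        tilde = np.divide(vecs, norms[:, None], out=np.zeros_like(vecs),
                          where=norms[:, None] > 0)  # normalize the nonzero u_i
        labels[u] = int(np.argmax(tilde @ g))
    return labels

rng = np.random.default_rng(0)
U = {"u": rng.standard_normal((4, 8)), "v": rng.standard_normal((4, 8))}
print(basic_round(U, rng))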
We have to modify this basic idea to obtain our results since the ui ’s could have different lengths and other complications arise. Instead of picking one random vector, we pick several Gaussian random vectors. Our first algorithm (suitable for large ε) picks a small set of candidate assignments for each variable and chooses randomly amongst them (independently for every variable). It is interesting to note that such a multiple assignment is often encountered in algorithms implicit in hardness reductions involving label cover. In contrast to previous results, this algorithm has a non-trivial guarantee even for very large ε. As ε approaches 1 (i.e. for instances where the optimal solution satisfies only a small fraction of the constraints), the performance guarantee approaches that of a random assignment. Our second algorithm (suitable for small ε) carefully picks a single assignment so that almost all constraints are satisfied. The performance guarantee of this algorithm generalizes that obtained by Goemans and Williamson [8] for k = 2. Note that a unique game of domain size k = 2 where 1 − ε fraction of constraints is satisfiable is equivalent to an instance of Max-Cut where the optimal solution cuts 1 − ε fraction of all edges. For such instances, the random hyperplane rounding algorithm of [8] gives a solution √ of value 1 − O( ε), and our guarantee can be viewed as a generalization of this to larger k. The reader might wonder about the confluence of our bounds and the lower bounds obtained by Khot et al. [12]. In fact, both arise from the analysis of the same quantity: Given √ two unit vectors with dot product 1 − ε, conditioned on the probability that one has projection Θ( log k) on a random Gaussian vector, what is the probability that the other has a large projection as well? This question arises naturally in the analysis of our rounding algorithms. On the other hand, the bounds obtained by Khot et al. [12] depend on the noise stability of certain functions. Via the results of [15], this is bounded by the answer to the above question. In Section 2, we describe the semidefinite relaxation for unique games. In Sections 3 and 4, we present and analyze our approximation algorithms. In Section 5, we apply our results to d-to-1 games. We defer some of the technical details of our analysis to the Appendix. Recently, Chlamtac, Makarychev and Makarychev [4] have combined our approach with techniques √ of metric embeddings. Their approximation algorithm for unique games satisfies 1 − O(ε log n log k) fraction of all constraints. This generalizes the result of Agarwal, Charikar, Makarychev, and Makarychev [1] for the Min UnCut Problem (i.e. the case k = 2). Note that their approximation guarantee is not comparable with ours. 2 Semidefinite Relaxation First we reduce a unique game to an integer program. For each vertex u we introduce k indicator variables ui ∈ {0, 1} (i ∈ [k]) for the events xu = i. For every u, the intended solution has ui = 1 for exactly one i. The constraint πuv (xu ) = xv can be restated in the following form: for all i ui = vπuv (i) . 4 The unique game instance is equivalent to the following integer quadratic program: ! k X 1 X 2 |ui − vπuv (i) | minimize 2 i=1 (u,v)∈E subject to ∀u ∈ V ∀i ∈ [k] ui ∈ {0, 1} ∀u ∈ V ∀i, j ∈ [k], i 6= j u i · uj = 0 ∀u ∈ V k X u2i = 1 i=1 Note that the objective function measures the number of unsatisfied constraints. The contribution of (u, v) ∈ E to the objective function is equal to 0 if the constraint πuv is satisfied, and 1 otherwise. 
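To spell out the last sentence: for a 0/1 solution we have Σ_i u_i² = Σ_i v_i² = 1, so the contribution of an edge (u, v) to the objective is

    (1/2) Σ_{i=1}^{k} (u_i − v_{πuv(i)})²  =  1 − Σ_{i=1}^{k} u_i · v_{πuv(i)},

and the remaining sum equals 1 precisely when the single index i with u_i = 1 also has v_{πuv(i)} = 1, i.e. when the constraint is satisfied, and 0 otherwise.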
The last two equations say that exactly one ui is equal to 1. We now replace each integer variable ui with a vector variable and get a semidefinite program (SDP): k 1 X X minimize |ui − vπuv (i) |2 2 (u,v)∈E i=1 subject to ∀u ∈ V ∀i, j ∈ [k], i 6= j hui , uj i = 0 k X ∀u ∈ V (2) |ui |2 = 1 (3) i=1 ∀(u, v) ∈ E i, j ∈ [k] hui , vj i ≥ 0 (4) 2 ∀(u, v) ∈ E i ∈ [k] 0 ≤ hui , vπuv (i) i ≤ |ui | (5) The last two constraints are triangle inequality constraints3 for the squared Euclidean distance: inequality (4) is equivalent to |ui − 0|2 + |vj − 0|2 ≥ |ui − vj |2 , and inequality (5, right side) is equivalent to |ui − vπuv (i) |2 + |ui − 0|2 ≥ |vπuv (i) − 0|2 . A very important constraint is that for i 6= j the vectors ui and uj are orthogonal. This SDP was studied by Khot [10], and by Trevisan [17]. Here is an intuitive interpretation of the vector solution: Think of the elements of the set [k] as states of the vertices. If ui = 1, the vertex is in the state i. In the vector case, each vertex is in a mixed state, and the probability that xu = i is equal to |ui |2 . The inner product hui , vj i can be thought of as the joint probability that xu = i and xv = j. The directions of vectors determine whether two states are correlated or not: If the angle between ui and vj is small it is likely that both events “u is in the state i” and “v is in the state j” occur simultaneously. In some sense later we will treat the lengths and the directions of vectors separately. 3 Rounding Algorithm We first describe a high level idea for the first algorithm. Pick a random Gaussian vector g (with standard normal independent components). For every vertex u add those vectors ui whose inner 3 We will use constraint 4 only in the second algorithm. 5 product with g are above some threshold τ to the set Su ; we choose the threshold τ in such a way that the set Su contains only one element in expectation. Then pick a random state from Su and assign it to the vertex u (if Su is empty do not assign any states to u). What is the probability that the algorithm satisfies a constraint between vertices u and v? Loosely speaking, this probability is equal to |Su ∩ πuv (Sv )| E ≈ E [|Su ∩ πuv (Sv )|] . |Su | · |Sv | Assume for a moment that the SDP solution is symmetric: the lengths of all vectors ui are the same and the squared Euclidean distance between every ui and vπuv (i) is equal to 2ε. (In fact, these constraints can be added to the SDP in the special case of systems of linear equations of the form xi − xj = cij (mod p).) Since we want the expected size of Su to be 1, we √ pick threshold τ such that the probability that hg, u i ≥ τ equals 1/k. The random variables hg, k · ui i i √ and hg, k · vπuv (i) i are standard normal random variables with covariance 1 − ε (note that we √ k). For such random variables if the multiplied the inner products by a normalization factor of √ √ probability of the event hg, k · u i ≥ t ≡ kτ equals 1/k, then i √ √ √ √ roughly speaking the probability of the event hg, k · ui i ≥ t ≡ k · τ and hg, k · vπuv (i) i ≥ t ≡ k · τ equals k −ε/2 · 1/k. Thus the expected size of the intersection of the sets Su and πuv (Sv ) is approximately k −ε/2 . Unfortunately this no longer works if the lengths of vectors are different. The main problem is that if, say, u1 is two times longer than u2 , then Pr (u1 ∈ Su ) is much larger than Pr (u2 ∈ Su ). One of the possible solutions is to normalize all vectors first. 
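Before addressing the length issue, it is worth recording where the k^{−ε/2} estimate in the symmetric case comes from; the following back-of-the-envelope calculation is made precise in Lemma B.1. If ξ and η are standard normal with covariance 1 − ε, then (ξ + η)/2 is normal with variance 1 − ε/2, and the event {ξ ≥ t and η ≥ t} forces (ξ + η)/2 ≥ t. Hence, ignoring factors polynomial in t,

    Pr[ξ ≥ t and η ≥ t]  ≈  Pr[N(0,1) ≥ t/√(1 − ε/2)]  ≈  e^{−t²/(2−ε)}  =  (e^{−t²/2})^{2/(2−ε)}.

Up to such polynomial factors, e^{−t²/2} is the tail probability Pr[ξ ≥ t] = 1/k, so the joint probability is roughly k^{−2/(2−ε)} = (1/k) · k^{−ε/(2−ε)} ≈ (1/k) · k^{−ε/2} for small ε, which is the estimate used above.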
In order to take into account original lengths of vectors we repeat the procedure of adding vectors to the sets Su many times, but each vector ui has a chance to be selected in the set Su only in the first su,i trials, where su,i is some integer number proportional to the original squared Euclidean length of ui . We now formally present a rounding algorithm for the SDP described in the previous section. In Appendix D, we describe an alternate approach to rounding the SDP. Theorem 3.1. There is a polynomial time algorithm that finds an assignment of variables which satisfies −ε/(2−ε) ! 1 k Ω min(1, √ ) · (1 − ε)2 · √ ε log k log k fraction of all constraints if the optimal solution satisfies (1 − ε) fraction of all constraints. Rounding Algorithm 1 Input: A solution of the SDP, with the objective value ε · |E|. Output: An assignment of variables xu . 1. Define ũi = ui /|ui | if ui 6= 0, 0 otherwise. Note that vectors ũ1 , . . . , ũk are orthogonal unit vectors (except for those vectors that are equal to zero). 2. Pick random independent Gaussian vectors g1 , . . . , gk with independent components distributed as N (0, 1). 3. For each vertex u: (a) Set sui = d|ui |2 · ke. 6 (b) For each i project sui vectors g1 , . . . , gsui to ũi : ξui ,s = hgs , ũi i, 1 ≤ s ≤ sui . Note that ξu1 ,1 , ξu1 ,2 , . . . , ξu1 ,su1 , . . . , ξuk ,1 , . . . , ξuk ,suk are independent standard normal random variables. (Since ui and uj are orthogonal if i 6= j, their projections onto a random Gaussian vector are independent). The number of random variables corresponding to each ui is proportional to |ui |2 . (c) Fix a threshold t s.t. Pr (ξ ≥ t) = 1/k, where ξ ∼ N √(0, 1)(i.e. t is the (1 − 1/k)-quantile of the standard normal distribution; note that t = Θ log k ). (d) Pick ξui ’s that are larger than the threshold t: Su = {(i, s) : ξui ,s ≥ t} . (e) Pick at random a pair (i, s) from Su and assign xu = i. If the set Su is empty do not assign any value to the vertex: this means that all the constraints containing the vertex are not satisfied. We introduce some notation. Definition 3.2. Define the distance between two vertices u and v as: k εuv 1X |ui − vπuv (i) |2 = 2 i=1 and let 1 εiuv = |ũi − ṽπuv (i) |2 . 2 If ui and vπuv (i) are nonzero vectors and αi is the angle between them, then εiuv = 1 − cos αi . For consistency, if one of the vectors is equal to zero we set εiuv = 1 and αi = π/2. Lemma 3.3. For every edge (u, v), state i in [k] and s ≤ min(sui , svπuv (i) ) the probability that the algorithm picks (i, s) for the vertex u and (πuv (i), s) for v at the step 3.e is 2/(2−εiuv ) ! √ 1 1 log k Ω min(1, p )· √ · . (6) k log k εiuv log k Proof. First let us observe that ξui ,s and ξvπuv (i) ,s are standard normal random variables with covariance cos αi = 1 − εiuv . As we will see later (Lemma B.1) the probability that ξui ,s ≥ t and ξvπuv (i) ,s ≥ t is equal to (6). Note that the expected number of elements in Su is equal to (su1 + . . . + suk )/k which is at most 2. Moreover, as we prove in the Appendix (Lemma B.3), the conditional expected number of elements in Su given the event ξui ,s ≥ t and ξvπuv (i) ,s ≥ t is also a constant. Thus by the Markov inequality the following event happens with probability (6): The sets Su and Sv contain the pairs (i, s) and (πuv (i), s) respectively and the sizes of these sets are bounded by a constant. The lemma follows. 7 Definition 3.4. For brevity, denote √ log k/k 2/(2−x) by fk (x). Remark 3.1. 
It is instructive to consider the case when the SDP solution is uniform in the following sense: √ 1. The lengths of all vectors ui are the same and are equal to 1/ k. 2. All εiuv are equal to ε. In this case all sui are equal to 1. And thus the probability that a constraint is satisfied is k times the probability (6) which is equal, up to a logarithmic factor, to k −ε/(2−ε) . Multiplying this probability by the number of edges we get that the expected number of satisfied constraints is k −ε/(2−ε) |E|. In the general case however we need to do some extra work to average the probabilities among all states i and edges (u, v). Recall that we interpret |ui |2 as the probability that the vertex u is in the state i. Suppose now that the constraint between u and v is satisfied, what is the conditional probability that u is in the state i and v is in the state πuv (i)? Roughly speaking, it should be equal to (|ui |2 + |vπuv (i) |2 )/2. This motivates the following definition. Definition 3.5. Define a measure µuv on the set [k]: µuv (T ) = X |ui |2 + |vπuv (i) |2 2 i∈T , where T ⊂ [k]. Note that µuv ([k]) = 1. This follows from constraint (3). The following lemma shows why this measure is useful. Lemma 3.6. For every edge (u, v) the following statements hold. 1. The average value of εiuv w.r.t. the measure µuv is less than or equal to εuv : k X µuv (i)εiuv ≤ εuv . i=1 2. For every i, min(sui , svπuv (i) ) ≥ (1 − εiuv )2 µuv (i)k. Proof. 1. Indeed, k X µuv (i) · εiuv = i=1 ≤ k X |ui |2 + |vπuv (i) |2 − (|ui |2 + |vπuv (i) |2 ) · cos αi i=1 k X i=1 = 2 |ui |2 + |vπuv (i) |2 − 2 · |ui | · |vπuv (i) | · cos αi 2 k X |ui − vπuv (i) |2 i=1 2 8 = εuv Note that here we used the fact that hui , vπuv (i) i ≥ 0. 2. Without loss of generality assume that |ui | ≤ |vπuv (i) |, and hence min(sui , svπuv (i) ) = sui . Due to the triangle inequality constraint (5) in the SDP |vπuv (i) | cos αi ≤ |ui |. Thus (1 − εiuv )2 µuv (i) = cos2 αi · |ui |2 + |vπuv (i) |2 ≤ |ui |2 ≤ sui /k. 2 Lemma 3.7. For every edge (u, v) the probability that an assignment found by the algorithm satisfies the constraint πuv (xu ) = xv is k 1 Ω √ · min(1, √ ) · fk (εuv ) . (7) log k εuv log k Proof. Denote the desired probability by Puv . It is equal to the sum of the probabilities obtained in Lemma 3.3 over all i in [k] and s ≤ min (sui , svπuv (i) ). In other words, ! k X 1 1 Puv = Ω )fk (εiuv ) . min (sui , svπuv (i) ) √ min(1, p i log k log k ε uv i=1 Replacing min (sui , svπuv (i) ) with (1 − εiuv )2 µuv (i) · k we get ! k k X 1 Puv = Ω √ )(1 − εiuv )2 fk (εiuv ) . µuv (i) min(1, p log k i=1 εiuv log k p i i Consider √ the set M = i ∈ [k] : εuv ≤ 2εuv . For i in M the term εuv log k is bounded from above by 2εuv log k. Thus ! X 1 k min(1, √ ) µuv (i)(1 − εiuv )2 fk (εiuv ) . Puv = Ω √ log k εuv log k i∈M The function (1 − x)2 fk (x) is convex on [0, 1] (see Lemma B.4). The average value of εiuv among i in M (w.r.t. the measure µuv ) is at most the average value of εiuv among all i, which in turn is less than εuv according to Lemma 3.6. Finally, by the Markov inequality µuv (M ) ≥ 1/2. Thus by Jensen’s inequality k 1 Puv = Ω √ min(1, √ )µuv (M )(1 − εuv )2 · fk (εuv ) log k εuv log k k 1 2 = Ω √ min(1, √ )(1 − εuv ) · fk (εuv ) . log k εuv log k This finishes the proof. We are now in position to prove the main theorem. Theorem 3.1. There is a polynomial time algorithm that finds an assignment of variables which satisfies −ε/(2−ε) ! 
1 k 2 Ω min(1, √ ) · (1 − ε) · √ ε log k log k fraction of all constraints if the optimal solution satisfies (1 − ε) fraction of all constraints. 9 Proof. Let us restrict our attention to a subset of edges E 0 = {(u, v) ∈ E : εuv ≤ 2ε}. For (u, v) in E 0 , since εuv log k ≤ 2ε log k, we have 1 k 2 min(1, √ )(1 − εuv ) · fk (εuv ) . Puv = Ω √ log k ε log k Summing this probability over all edges (u, v) in E 0 and using convexity of the function (1−x)2 fk (x) we get the statement of the theorem. 4 Almost Satisfiable Instances Suppose that ε is O(1/ log k). In the previous section we saw that in this case the algorithm finds an assignment of variables satisfying a constant fraction of constraints. But can we do better? In √ this section we show how to find an assignment satisfying 1 − O( ε log k) fraction of constraints. The main issue we need to take care of is to guarantee that the algorithm always picks only one element in the set Su (otherwise we loose a constant factor). This can be done by selecting the largest in absolute value ξui ,s (at step 3.d). We will also change the way we set sui . Denote by [x]r the function that rounds x up or down depending on whether the fractional part of x is greater or less than r. Note that if r is a random variable uniformly distributed in the interval [0, 1], then the expected value of [x]r is equal to x. Rounding Algorithm 2 Input: A solution of the SDP, with the objective value ε · |E|. Output: An assignment of variables xu . 1. Pick a number r in the interval [0, 1] uniformly at random. 2. Pick random independent Gaussian vectors g1 , . . . , g2k with independent components distributed as N (0, 1). 3. For each vertex u: (a) Set sui = [2k · |ui |2 ]r . (b) For each i project sui vectors g1 , . . . , gsui to ũi : ξui ,s = hgs , ũi i, 1 ≤ s ≤ sui . (c) Select ξui ,s with the largest absolute value, where i ∈ [k] and s ≤ sui . Assign xu = i. We first elaborate on the difference between the choice of sui in the algorithm above and that in Algorithm 1 presented earlier. Consider a constraint πuv (xu ) = xv . Projection ξui ,s generated by ui and ξvπuv (i) ,s generated by vπuv (i) are considered to be matched. On the other hand, a projection ξui ,s such that the corresponding ξvπuv (i) ,s does not exist (or vice versa) is considered to be unmatched. Unmatched projections arise when sui 6= svπuv (i) and the fraction of such projections limits the probability of satisfying the constraint. Recall that in Algorithm 1, we set sui = d|ui |2 · ke. Even if ui and vπuv (i) are infinitesimally close, it may turn out that sui and svπuv (i) differ by 1, yielding an unmatched projection. As a result, some constraints that are almost satisfied by the SDP solution (i.e. εuv is close to 0) could be satisfied with low probability (by the first rounding algorithm). In 10 h i Algorithm 2, we set sui = [2k · |ui |2 ]r . This serves two purposes: Firstly, Er |sui − svπuv (i) | can be bounded by 2k · |ui − vπuv (i) |2 , giving a small number of unmatched projections in expectation. Secondly, the number of matched projections is always at least k/2. These two properties are established in Lemma 4.3 and ensure that the expected fraction of unmatched projections is small. Our analysis of Rounding Algorithm 2 is based on the following theorem. Theorem 4.1. Let ξ1 , · · · , ξm and η1 , · · · , ηm be two sequences of standard normal random variables. 
Suppose that the random variables in each of the sequences are independent, the covariance of every ξi and ηj is nonnegative, and the average covariance of ξi and ηi is at least 1 − ε: cov(ξ1 , η1 ) + · · · + cov(ξm , ηm ) ≥ 1 − ε. m Then the probability that the largest r.v. in absolute value in the first√sequence has the same index as the largest r.v. in absolute value in the second sequence is 1 − O( ε log m). We informally sketch the proof. See Appendix C for the complete proof. It is instructive to consider the case when cov(ξi , ηi ) = 1 − ε for all i. Assume that the first variable ξ1 is the largest in absolute value among ξ1 , . .√. , ξm and its absolute value is a positive number t. Note that the typical value of t is approximately 2 log m − log log m (i.e. t is the (1−1/m)-quantile of N (0, 1)).√We want to show that η1 is the largest in absolute value among η1 , . . . , ηm with probability √ 1 − O( ε log m), or in other words the probability that any (fixed) ηi is larger than η1 is O( ε log m/m). Let us compute this probability for η2 . Since cov(η1 , ξ1 ) = 1−ε and cov(ξ2 , η2 ) = 1−ε, the random variable η1 is equal to (1−ε)ξ1 +ζ1 ; and η2 is equal to (1−ε)ξ2 +ζ2 , where ζ1 and ζ2 are normal random variables with variance, roughly speaking, 2ε. We need to estimate the probability of the event {η2 ≥ η1 } = {(1 − ε)ξ2 + ζ2 ≥ (1 − ε)ξ1 + ζ1 } = {(1 − ε)ξ2 + ζ2 − ζ1 ≥ (1 − ε)t} conditional on ξ1 = t and ξ2 ≤ t. For typical t this probability is almost equal to the probability of the event: {ξ2 + ζ ≥ t and ξ2 ≤ t} = {t − ζ ≤ ξ2 ≤ t} (8) where ζ = ζ2 − ζ1 . √ Since the variance of the random variable ζ is O(ε), can think that ζ ≈ O( ε). The density √ we √ 2 of ξ2 on the interval [t − ζ, t] √ is approximately 1/ 2πe−t /2 ≈ O( log m/m) (for typical t). Thus probability (8) is equal to O( ε log m/m). This finishes our informal “proof”. Now we are ready to prove the main lemma. Lemma 4.2. The probability that √ the algorithm finds an assignment of variables satisfying the constraint πuv (xu ) = xv is 1 − O( εuv log k). Proof. If εuv ≥ 1/8 the statement of the lemma follows trivially. So we assume that εuv ≤ 1/8. Let n o M = (i, s) : i ∈ [k] and s ≤ min(sui , svπuv (i) ) ; n o Mc = (i, s) : i ∈ [k] and min(sui , svπuv (i) ) < s ≤ max(sui , svπuv (i) ) . The set M contains those pairs (i, s) for which both ξui ,s and ξvπuv (i) ,s are defined (i.e. the matched projections); the set Mc contains those pairs for which only one of the variables ξui ,s and ξvπuv (i) ,s is defined (i.e. the unmatched projections). We will need the following lemmas. 11 Lemma 4.3. 1. The expected size of Mc is at most 4εuv k: E [|Mc |] ≤ 4εuv k. 2. The set M always contains at least k/2 elements: |M | ≥ k/2. Proof. 1. First we find the expected value of |sui − svπuv (i) | for a fixed i. This value is equal to Er [2k · |ui |2 ]r − [2k · |vπuv (i) |2 ]r = 2k · |ui |2 − |vπuv (i) |2 . Now by the triangle inequality constraint (5), 2k · |ui |2 − |vπuv (i) |2 ≤ 2k · |ui − vπuv (i) |2 . Summing over all i in [k] we finish the proof. 2. Observe that min(sui , svπuv (i) ) ≥ 2k · min(|ui |2 , |vπuv (i) |2 ) − 1 and min(|ui |2 , |vπuv (i) |2 ) ≥ |ui |2 − ||ui |2 − |vπuv (i) |2 | ≥ |ui |2 − |ui − vπuv (i) |2 . Summing over all i we get X X |M | = min(sui , svπuv (i) ) ≥ 2k · |ui |2 − 2k · |ui − vπuv (i) |2 − 1 i∈[k] i∈[k] ≥ 2k − 4kεuv − k ≥ k/2. Lemma 4.4. The following inequality holds: X 1 E εiuv ≤ 4εuv . |M | (i,s)∈M Proof. Recall that M always contains at least k/2 elements. 
The expected value of min(sui , svπuv (i) ) is equal to 2k · min(|ui |2 , |vπuv (i) |2 ) and is less than or equal to 2k · µuv (i). Thus we have " # k X X 1 1 i i εuv = Er min(sui , svπuv (i) ) · εuv Er |M | |M | (i,s)∈M i=1 ≤ k k i=1 i=1 X 2X 2k · µuv (i) · εiuv ≤ 4 µuv (i) · εiuv ≤ 4εuv . k 12 Proof of Lemma 4.2 Applying Theorem 4.1 to the sequences ξui ,s ((i, s) ∈ M ) and ξvπuv (i) ,s ((i, s) ∈ M ) we get that for given r the probability that the largest in absolute value random variables in the first sequence ξui ,s and the second sequence ξvπuv (i) ,s have the same index (i, s) is v u X 1 u 1 − O tlog |M | · εiuv . |M | (i,s)∈M √ Now by Lemma 4.4, and by the concavity of the function x, we have v u p u log |M | X i εuv ≥ 1 − O Er 1 − O t εuv log k . |M | (i,s)∈M The probability that there is a larger ξui ,s or ξvπuv (i) ,s in Mc is at most Er 4εuv k |Mc | ≤ = 8εuv . |M | k/2 Using the union bound we get that the probability of satisfying the constraint πuv (xu ) = xv is equal to p p 1 − O( εuv log k) − 8εuv = 1 − O( εuv log k). Theorem 4.5.√There is a polynomial time algorithm that finds an assignment of variables which satisfies 1 − O( ε log k) fraction of all constraints if the optimal solution satisfies (1 − ε) fraction of all constraints. Proof. Summing the probabilities obtained in Lemma 4.2 over all edges (u, v) and using the √ concavity of the function x we get that the expected number of satisfied constraints is 1 − √ O( ε log k)|E|. 5 d to 1 Games In this section we extend our results to d-to-1 games. Definition 5.1. We say that Π ⊂ [k] × [k] is a d-to-1 predicate, if for every i there are at most d different values j such that (i, j) ∈ Π, and for every j there is at most one i such that (i, j) ∈ Π. Definition 5.2 (d to 1 Games). We are given a directed constraint graph G = (V, E), a set of variables xu (for all vertices u) and d-to-1 predicates Πuv ⊂ [k] × [k] for all edges (u, v). Our goal is to assign a value from the set [k] to each variable xu , so that the maximum number of the constraints (xu , xv ) ∈ Πuv is satisfied. 13 Note that even if all constraints of a d-to-1 game are satisfiable it is hard to find an assignment of variables satisfying all constraints. We will show how to satisfy √ − √d−1+ε d+1−ε 1 k · (1 − ε)4 · √ Ω √ log k log k fraction of all constraints (the multiplicative constant in the Ω notation depends √ on d). Notice that this value can be obtained by replacing ε in formula (1) with ε0 = 1 − (1 − ε)/ d (and changing (1 − ε)2 to (1 − ε)4 ). Even though we do not require that for a constraint Πuv each i in [k] belongs to some pair (i, j) ∈ Πuv , let us assume that for each i there exists j s.t. (i, j) ∈ Πuv ; and for each j there exists i s.t. (i, j) ∈ Πuv . As we see later this assumption is not important. In order to write a relaxation for d-to-1 games introduce the following notation: X i wuv = vj . j:(i,j)∈Πuv The SDP is as follows: k X i 2 ui − wuv 1 X minimize 2 (u,v)∈E ! i=1 subject to ∀u ∈ V ∀i, j ∈ [k], i 6= j ∀u ∈ V hui , uj i = 0 k X |ui |2 = 1 (9) (10) i=1 ∀(u, v) ∈ V i, j ∈ [k] hui , vj i ≥ 0 i i 2 ∀(u, v) ∈ V i ∈ [k] 0 ≤ hui , wuv i ≤ min(|ui |2 , |wuv | ) (11) (12) 1 |2 + . . . + |w k |2 = 1, here we use the fact that for a fixed An important observation is that |wuv uv i . edge (u, v) each vj is a summand in one and only one wuv We use Algorithm 1 for rounding a vector solution. 
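As a small illustration of Definition 5.1, the following toy check (a hypothetical Python representation, not used anywhere in the analysis) verifies that a relation on [k] × [k] is a d-to-1 predicate: every left state has at most d partners and every right state has at most one.

from collections import Counter

def is_d_to_1(Pi, d):
    # Pi: a set of pairs (i, j) with i, j in [k]; Definition 5.1 requires that each i
    # appears in at most d pairs and each j appears in at most one pair.
    left = Counter(i for i, j in Pi)
    right = Counter(j for i, j in Pi)
    return all(c <= d for c in left.values()) and all(c <= 1 for c in right.values())

# A 2-to-1 predicate on {0, 1, 2, 3}: state i of u is consistent with states 2i and 2i + 1 of v.
Pi = {(i, 2 * i) for i in range(2)} | {(i, 2 * i + 1) for i in range(2)}
print(is_d_to_1(Pi, 2))   # True
print(is_d_to_1(Pi, 1))   # False: this is not a permutation (unique game) constraint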
For analysis we will need to change some notation: ( i /|w i |, if w i 6= 0; wuv uv uv i w̃uv = 0, otherwise εiuv = i |2 |ũi − w̃uv 2 εiuv 0 = 1 − 1 − εiuv √ d i |2 |ui |2 + |wuv 2 The following lemma explains why we get the new dependency on ε. µuv (i) = 14 Lemma 5.3. For every edge (u, v) and state i there exists j 0 s.t. (i, j 0 ) ∈ Πuv and |ũi − ṽj 0 |2 /2 ≤ εiuv 0 . Proof. Let u0i be the projection of the vector ũi to the linear span of the vectors vj (where (i, j) ∈ i ; and let β be the angle between ũ and u0 . Clearly, Πuv ). Let αi be the angle between ũi and wuv i i i 0 i |ui | = cos βi ≥ cos αi =√1 − εuv . Since all ṽj ((i, j) ∈ Πuv ) are orthogonal unit vectors, there exists √ ṽj 0 s.t. hṽj 0 , u0i i ≥ |u0i |/ d. Hence, hṽj 0 , ũi i = hṽj 0 , u0i i ≥ (1 − εiuv )/ d. For every edge (u, v) and state i, find j 0 as in the previous lemma and define a function4 πuv (i) = j 0 . Then replace every constraint (xu , xv ) ∈ Πuv with a stronger constraint πuv (xu ) = xv . Now we can apply the original analysis√of Algorithm 1 to the new√problem. In the proof we need to substitute εiuv 0 for εiuv , 1 − (1 − εuv )/ d for εuv , and 1 − (1 − ε)/ d for ε. The only missing step is the following lemma. Lemma 3.60 . For every edge (u, v) the following statements hold. 1. The average value of εiuv w.r.t. the measure µuv is less than or equal to εuv . 2. The average value of εiuv 0 w.r.t. the measure µuv is less than or equal to 1 − 1−ε √ uv . d 3. min(sui , svπuv (i) ) ≥ d(1 − εiuv 0 )4 µuv (i)k. i and let α0 be the angle between u and v Proof. Let αi be the angle between ui and wuv i πuv (i) . i 1. Indeed, k X µuv (i) · εiuv = i=1 ≤ = k i |2 − (|u |2 + |w i |2 ) · cos α X |ui |2 + |wuv i i uv 2 i=1 k X i=1 k X i=1 i |2 − 2 · |u | · |w i | · cos α |ui |2 + |wuv i i uv 2 i |2 |ui − wuv = εuv . 2 2. This follows from part 1 and the definition of εiuv 0 . i | cos α ≤ |u |. Thus 3. Due to the triangle inequality constraint, |wuv i i (1 − εiuv )2 µuv (i) = cos2 αi · i |2 |ui |2 + |wuv ≤ |ui |2 . 2 Similarly |vπuv (i) | cos αi0 ≤ |ui | and (1 − εiuv 0 )2 |ui |2 ≤ cos2 αi0 · |ui |2 ≤ |vπuv (i) |2 . √ Combining these two inequalities and noting that (1 − εiuv 0 ) = (1 − εiuv )/ d, we get d(1 − εiuv 0 )4 µuv (i) ≤ (1 − εiuv 0 )2 |ui |2 ≤ |vπuv (i) |2 . The lemma follows. 4 The function πuv is not necessarily a permutation. 15 We now address the issue that for some edges (u, v) and states j there may not necessarily exist i s.t. (i, j) ∈ Πuv . We call such j a state of degree 0. The key observation is that in our algorithms we may enforce additional constraints like xu = i or xu 6= i by setting ui = 1 or ui = 0 respectfully. Thus we can add extra states and enforce that the vertices are not in these states. Then we add pairs (i, j) where i is a new state, and j is a state of degree 0 (or vice-versa). Alternatively we can rewrite the objective function by adding an extra term: ! k X 1 X 2 i 0 2 ui − wuv minimize + |wuv | , 2 (u,v)∈E i=1 0 is the sum of v over j of degree 0. where wuv j Acknowledgements We would like to thank Sanjeev Arora, Uri Feige, Johan Håstad, Muli Safra and Luca Trevisan for valuable discussions and Eden Chlamtac for his comments. The second and third authors thank Microsoft Research for their hospitality. References √ [1] A. Agarwal, M. Charikar, K. Makarychev, and Y. Makarychev. O( log n) approximation algorithms for Min UnCut, Min 2CNF Deletion, and directed cut problems. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, pp. 573–581, 2005. [2] G. Andersson, L. 
Engebretsen, and J. Håstad. A new way to use semidefinite programming with applications to linear equations mod p. Journal of Algorithms, Vol. 39, pp. 162–204, 2001. [3] S. Chawla, R. Krauthgamer, R. Kumar,Y. Rabani, and D. Sivakumar. On the hardness of approximating multicut and sparsest-cut. In Proceedings of the 20th IEEE Conference on Computational Complexity, pp. 144–153, 2005. [4] E. Chlamtac, K. Makarychev and Y. Makarychev. How to play any Unique Game. manuscript, February 2006. [5] I. Dinur, E. Mossel, and O. Regev. Conditional hardness for approximate coloring. ECCC Technical Report TR05-039, 2005. [6] U. Feige and L. Lovász. Two-prover one round proof systems: Their power and their problems. In Proceedings of the 24th ACM Symposium on Theory of Computing, pp. 733–741, 1992. [7] U. Feige and D. Reichman. On systems of linear equations with two variables per equation. In Proceedings of the 7th International Workshop on Approximation Algorithms for Combinatorial Optimization, vol. 3122 of Lecture Notes in Computer Science, pp. 117–127, 2004. 16 [8] M. Goemans , D. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, vol. 42, no. 6, pp. 1115–1145, Nov. 1995. [9] A. Gupta and K. Talwar. Approximating Unique Games. In Proceedings of the 17th ACMSIAM Symposium on Discrete Algorithms, pp. 99–106, 2006. [10] S. Khot. On the power of unique 2-prover 1-round games. In Proceedings of the 34th ACM Symposium on Theory of Computing, pp. 767–775, 2002. [11] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell. Optimal inapproximability results for MAX-CUT and other two-variable CSPs? In Proceedings of the 45th IEEE Symposium on Foundations of Computer Science, pp. 146–154, 2004. [12] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell. Optimal inapproximability results for MAX-CUT and other 2-variable CSPs? ECCC Report TR05-101, 2005. [13] S. Khot and O. Regev. Vertex cover might be hard to approximate to within 2 − ε. In Proceedings of the 18th IEEE Conference on Computational Complexity, pp. 379–386, 2003. [14] S. Khot and N. Vishnoi. The unique games conjecture, integrality gap for cut problems and the embeddability of negative type metrics into `1 . In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science, pp. 53–62, 2005. [15] E. Mossel, R. O’Donnell, and K. Oleszkiewicz. Noise stability of functions with low influences: invariance and optimality. In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science, pp. 21–40, 2005. [16] Z. Šidák. Rectangular Confidence Regions for the Means of Multivariate Normal Distributions. Journal of the American Statistical Association, vol. 62, no. 318, pp. 626–633, Jun. 1967. [17] L. Trevisan. Approximation Algorithms for Unique Games. In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science, pp. 197–205, 2005. A Properties of Normal Distribution For completeness we present some standard results used in the paper. Denote the probability that a standard normal random variable is bigger than t ∈ R by Φ̃(t), in other words Φ̃(t) ≡ 1 − Φ0,1 (t) = Φ0,1 (−t), where Φ0,1 is the normal distribution function. Lemma A.1. 1. For every t > 0, √ t2 t 1 − t2 e− 2 < Φ̃(t) < √ e 2 2π(t2 + 1) 2πt 17 2. There exist positive constants c1 , C1 , c2 , C2 such that for all 0 < p < 1/3, t ≥ 0 and ρ ≥ 1 the following inequalities hold: t2 t2 c1 C1 e− 2 ≤ Φ̃(t) ≤ √ e− 2 ; 2π(t + 1) 2π(t + 1) p p c2 log (1/p) ≤ Φ̃−1 (p) ≤ C2 log (1/p); √ 3. 
There exists a positive constant C3 , s.t. for every 0 < δ ≤ 2 and t ≥ 1/δ the following inequality holds: Φ̃(δt + Proof. 1 2 ) ≥ C3 (t · Φ̃(t))δ · t−1 . δt 1. First notice that Φ̃(t) = 1 √ 2π Z ∞ 2 − x2 e t 2 − x2 1 −e dx = √ x 2π ∞ Z ∞ − x2 e 2 − dx x2 t t = √ 1 − t2 1 e 2 −√ 2πt 2π Z ∞ t Thus Φ̃(t) < √ 2 − x2 e x2 dx. 1 − t2 e 2. 2πt On the other hand 1 √ 2π Z ∞ t x2 e− 2 1 dx < √ 2 x 2πt2 Z ∞ e− x2 2 t dx = Φ̃(t) . t2 Hence Φ̃(t) > √ 1 − t2 Φ̃(t) e 2 − 2 , t 2πt and Φ̃(t) > √ t2 t e− 2 . 2π(t2 + 1) 2. This trivially follows from (1). 3. Using (2) we get δ 2 ·t2 1 1 −1 − (δt+ δt1 )2 2 ·e Φ̃(δt + ) ≥ C · 1 + δt + ≥ C 0 · (δt + 1)−1 · e− 2 δt δt 2 δ 2 − t2 00 e · tδ2 · t−1 ≥ C 000 · (t · Φ̃(t))δ2 · t−1 ≥ C t+1 18 We will use the following result of Z. Šidák [16]: Theorem A.2 (Šidák). Let ξ1 , . . . , ξk be normal random variables with mean zero, then for any positive t1 , . . . , tk , Pr (|ξ1 | ≤ t1 , |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) ≥ Pr (|ξ1 | ≤ t1 ) Pr (|ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) . Note that these random variable do not have to be independent. Corollary A.3. Let ξ1 , . . . , ξk be normal random variables with mean zero, then for any positive t1 , . . . , t k , Pr (ξ1 ≥ t1 | |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) ≤ Pr (ξ1 ≥ t1 ) . Proof. By Theorem A.2, Pr (|ξ1 | ≤ t1 | |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) ≥ Pr (|ξ1 | ≤ t1 ) . Thus Pr (ξ1 ≥ t1 | |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) = ≤ B 1 1 − Pr (|ξ1 | ≤ t1 | |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) 2 2 1 1 − Pr (|ξ1 | ≤ t1 ) = Pr (ξ1 ≥ t1 ) 2 2 Analysis of Algorithm 1 In this section we will prove some technical lemmas we used in the analysis of the first algorithm. Lemma B.1. Let ξ and η be correlated standard normal random variables, 0 < ε < 1, t ≥ 1. If cov(ξ, η) ≥ 1 − ε, then √ 2 Pr (ξ ≥ t and η ≥ t) ≥ C · min(1, ( εt)−1 ) · t−1 · (t · Φ̃(t)) 2−ε . (13) for some positive constant C. Proof. Let us represent ξ and η as follows: p p ξ = σX + 1 − σ 2 · Y ; η = σX − 1 − σ 2 · Y, where ξ+η ξ+η ξ−η ; X= ; Y = √ . σ = Var 2 2σ 2 1 − σ2 2 Note that X and Y are independent standard normal random variables; and ξ+η 1 ε 2 σ = Var = [2 + 2 cov(ξ, η) ] ≥ 1 − . 2 4 2 19 (14) Notice that 1/2 ≤ σ 2 ≤ 1. We now estimate the probability (13) as follows p Pr ξ ≥ t and η ≥ t = Pr σX ≥ t + 1 − σ 2 · |Y | σ2 t σ · Pr |Y | ≤ √ ≥ Pr X ≥ + σ t 1 − σ2 · t By Lemma A.1 (3) we get Pr ξ ≥ t and η ≥ t ≥ C · t −1 1/σ 2 · (tΦ̃(t)) σ2 · min 1, √ 1 − σ2 · t √ 2 ≥ C 0 · min(( ε · t)−1 , 1) · t−1 · (t · Φ̃(t)) 2−ε . Corollary B.2. Let ξ and η be standard normal random variables with covariance greater than or equal to 1 − ε; let Φ̃(t) = 1/k. Then − 2 ! 2−ε 1 1 k Pr (ξ ≥ t and η ≥ t) ≥ Ω min 1, √ ·√ · √ . ε log k log k log k Lemma B.3. Let ξ, η, ε, k and t be as in Corollary B.2, and let ξ1 , . . . , ξm be i.i.d. standard normal random variables and m ≤ 2k, then "m # X E I{ξi ≥t} | ξ ≥ t and η ≥ t = O(1), i=1 where I{ξi ≥t} is the indicator of the event {ξi ≥ t}. Proof. Let X qand Y be as in the proof of Lemma B.1. Put αi = cov(X, ξi ) and express each ξi as 2 ≤ 1 (since random variables ξ are ξi = αi X + 1 − αi2 · Zi . By Bessel’s Inequality α12 + · · · + αm i orthogonal). 
We now estimate p Pr(ξi ≥ t | σX ≥ t + 1 − σ 2 · |Y |) = p p Pr ξi ≥ t | σX ≥ t + 1 − σ 2 · |Y | and X ≤ 4t · Pr X ≤ 4t | σX ≥ t + 1 − σ 2 · |Y | p p + Pr ξi ≥ t | σX ≥ t + 1 − σ 2 · |Y | and X > 4t · Pr X > 4t | σX ≥ t + 1 − σ 2 · |Y | Notice that p 1 − σ 2 · |Y | and X < 4t q p = Pr αi X + 1 − αi2 · Zi ≥ t | σX ≥ t + 1 − σ 2 · |Y | and X < 4t Z 4t q p = Pr αi x + 1 − αi2 · Zi ≥ t | σx ≥ t + 1 − σ 2 · |Y | dF (x) t/σ q p 2 2 ≤ max Pr 1 − αi · Zi ≥ t − αi x | 1 − σ · |Y | ≤ σx − t x∈[t/σ,4t] q by Corollary A.3 2 ≤ max Pr 1 − αi · Zi ≥ t − αi x ≤ Pr (Zi ≥ (1 − 4αi )t) . Pr ξi ≥t | σX ≥ t + x∈[t/σ,4t] 20 It suffices to prove that m X Pr (Zi ≥ (1 − 4αi )t) = i=1 m X Φ̃((1 − 4αi )t) = O(1). i=1 Fix a sufficiently large constant c. The number of αi that are greater than 1/c is O(1). The number of αi such that log−1 k ≤ αi ≤ 1/c is O(log2 k) and for them Φ̃((1 − 4αi )t) = O(k −1/2 ) (since c is a sufficiently large constant). Finally, if αi < 1/ log k, then Φ̃((1 − 4αi )t) = O(k −1 ). This finishes the proof. Lemma B.4. The function (1 − x)2 fk (x) is convex on the interval [0, 1]. √ Proof. Let m = k/ log k. Compute the first and the second derivatives of fk : fk00 (x) 2 m− 2−x = −2 log m · = m (2 − x)2 2 m− 2−x log m = 4 log m · · −1 . (2 − x)3 2−x 2 − 2−x 00 !0 00 Now (1 − x)2 · fk (x) = (1 − x)2 · fk00 (x) − 4(1 − x)fk0 (x) + 2fk (x). Observe that fk (x) 0 is always positive, and fk (x) is always negative. Therefore, if fk00 (x) is positive, we are done: 00 (1 − x)2 · fk (x) ≥ 0. Otherwise, we have 00 (1 − x)2 · fk (x) = (1 − x)2 · fk00 (x) − 4(1 − x)fk0 (x) + 2fk (x) ≥ fk00 (x) + 2fk (x) 2 2 2 log m ≥ 4 log m · m− 2−x − 1 + 2m− 2−x = 2m− 2−x (log m − 1)2 ≥ 0. 2 C Analysis of Algorithm 2 In this section, we present the formal proof of Theorem 4.1. We will follow an informal outline of the proof sketched in the main text. We start with estimating probability (8). Lemma C.1. Let ξ and ζ be two independent random normal variables with variance 1 and σ 2 respectively (0 < σ < 1). Then for every positive t Pr (ξ ≤ t and ξ + ζ ≥ t) = O(σe 2 /2 Remark C.1. In the “typical” case e(σt+1) (σt+1)2 2 is a constant. 21 t2 · e− 2 ). Proof. We have Z ∞ Pr ξ ≤ t and ξ + ζ ≥ t = Pr (ξ ≤ t and ξ + x ≥ t) dFζ (x) 0 Z ∞ x2 1 Pr (ξ ≤ t and ξ + x ≥ t) e− 2σ2 dx =√ 2πσ 0 Z ∞ y2 1 Pr (ξ ≤ t and ξ + σ y ≥ t) e− 2 dy =√ 2π 0 Z t/σ Z ∞ y2 y2 1 1 Pr (t − σ y ≤ ξ ≤ t) e− 2 dy + √ Pr (t − σ y ≤ ξ ≤ t) e− 2 dy. =√ 2π 0 2π t/σ Let us bound the first integral. Since the density of the random variable ξ on the interval (t − σy, t) is at most √1 e 2π −(t−σy)2 2 and y ≤ ey , we have −(t−σy)2 −t2 1 σ Pr (t − σ y ≤ ξ ≤ t) ≤ σy · √ e 2 ≤ √ · e 2 · e(σt+1)y . 2π 2π Therefore, 1 √ 2π Z t/σ −t2 2 − y2 Pr (t − σ y ≤ ξ ≤ t) e dy ≤ 0 σe 2 2π Z t/σ e(σt+1)y · e− y2 2 dy 0 −t2 Z ∞ (y−(σt+1))2 (σt+1)2 σe 2 2 e− ≤ · e 2 dy 2π −∞ (σt+1)2 −t2 = O σe 2 · e 2 . We now upper bound the second integral. If t ≥ 1, then 1 √ 2π Z ∞ 2 − y2 Pr (t − σ y ≤ ξ ≤ t) e t/σ 2 Z ∞ − t2 2σ y2 1 e dy ≤ √ e− 2 dy = Φ̃(t/σ) = O t/σ + 1 2π t/σ 2 − t2 t2 σ e − = O σe 2 . = O t+σ If t ≤ 1, then Z ∞ y2 1 √ Pr (t − σ y ≤ ξ ≤ t) e− 2 dy ≤ 2π t/σ 1 √ 2π Z ∞ 2 − y2 σy · e 2 − t2 dy = O (σ) = O σ e . 0 The desired inequality follows from the upper bounds on the first and second integrals. We need a slight generalization of the lemma. Corollary C.2. Let ξ and ζ be two independent random normal variables with variance 1 and σ 2 respectively (0 < σ < 1). Then for every t ≥ 0 and 0 ≤ ε̄ < 1 ! 
2 (σ + ε̄t) · c(ε̄, σ, t) · e−t /2 Pr (ξ + ζ ≥ (1 − ε̄)t | |ξ| ≤ t) = O , 1 − 2Φ̃(t) 22 where c(ε̄, σ, t) = e (σt+1)2 +ε̄t2 2 . Remark C.2. As in the previous lemma, in the “typical” case c(ε̄, σ, t) is a constant. Proof. First note that Pr (ξ + ζ ≥ (1 − ε̄)t and ξ ≤ t) Pr (|ξ| ≤ t) Pr (ξ + ζ ≥ (1 − ε̄)t and ξ ≤ t) 1 − 2Φ̃(t) Pr (ξ + ζ ≥ (1 − ε̄)t | |ξ| ≤ t) ≤ = Now, Pr (ξ + ζ ≥ (1 − ε̄)t and ξ ≤ t) ≤ Pr (ξ + ζ ≥ t and ξ ≤ t) + Pr ((1 − ε̄)t ≤ ξ + ζ ≤ t) . By Lemma C.1, the first probability is bounded as follows: (σt+1)2 t2 − Pr (ξ + ζ ≥ t and ξ ≤ t) ≤ O σe 2 · e 2 . Since Var [ξ + ζ] ≤ 1 + σ 2 , the second probability is at most − Pr ((1 − ε̄)t ≤ ξ + ζ ≤ t) ≤ ε̄t · e ((1−ε̄)t)2 2(1+σ 2 ) ≤ ε̄t · e (2ε̄+σ 2 )t2 2 t2 · e− 2 , here we used the following inequality (1 − ε̄)2 t2 (1 − ε̄)2 (1 − σ 2 )t2 (1 − 2ε̄ − σ 2 )t2 t2 (2ε̄ + σ 2 )t2 = ≥ ≥ − . 2(1 + σ 2 ) 2(1 − σ 4 ) 2 2 2 The corollary follows. In the following lemma we formally define the random variables ζ1 and ζ2 . Lemma C.3. Let ξ1 , ξ2 , η1 and η2 be standard normal random variables such that ξ1 and ξ2 are independent; η1 and η2 are independent; and • cov(ξ1 , η1 ) ≥ 1 − ε̄ ≥ 0 and cov(ξ2 , η2 ) ≥ 1 − ε̄ ≥ 0 (for some positive ε̄); • cov(ξ1 , η2 ) ≥ 0 and cov(ξ2 , η1 ) ≥ 0. Then there exist normal random variables ζ1 and ζ2 independent of ξ1 and ξ2 with variance at most 2ε̄ such that |η1 | − |η2 | ≥ (1 − 4ε̄)|ξ1 | − (1 + 3ε̄)|ξ2 | − |ζ1 | − |ζ2 |. Proof. Express η1 as a linear combination of ξ1 , ξ2 , and a normal r.v. ζ1 independent of ξ1 and ξ2 : η1 = α1 ξ1 + β1 ξ2 + ζ1 , 23 similarly, η2 = α2 ξ1 + β2 ξ2 + ζ2 . Note that α1 = cov(η1 , ξ1 ) ≥ 1 − ε̄ and β1 = cov(η1 , ξ2 ) ≥ 0. Thus Var [ζ1 ] ≤ Var [η1 ] − α12 ≤ 1 − (1 − ε̄)2 ≤ 2ε̄. Similarly, α2 ≥ 0, β2 ≥ 1 − ε̄, and Var[ζ2 ] ≤ 2ε̄. Since η1 and η2 are independent, we have α1 α2 + β1 β2 + cov(ζ1 , ζ2 ) = cov(η1 , η2 ) = 0. Therefore (note that cov(ζ1 , ζ2 ) ≤ 0; α1 α2 ≥ 0; β1 β2 ≥ 0), p Var[ζ1 ] Var[ζ2 ] −β1 β2 − cov(ζ1 , ζ2 ) 2ε̄ α2 = ≤ ≤ . α1 1 − ε̄ 1 − ε̄ 2ε̄ Taking into account that α2 ≤ 1, we get α2 ≤ min(1, 1−ε̄ ) ≤ 3ε̄. Similarly, β1 ≤ 3ε̄. Finally, we have |η1 | − |η2 | ≥ (α1 − α2 )|ξ1 | − (β1 + β2 )|ξ2 | − |ζ1 | − |ζ2 | ≥ (1 − 4ε̄)|ξ1 | − (1 + 3ε̄)|ξ2 | − |ζ1 | − |ζ2 |. In what follows we assume that ξ1 is the largest r.v. in absolute value among ξ1 , . . . , ξm and its absolute value is t. For convenience we define three events: At = {|ξi | ≤ t for all 3 ≤ i ≤ m} ; Et = At ∩ {|ξ1 | = t and |ξ2 | ≤ t} ; [ E = {|ξ1 | ≥ |ξi | for all i} = Et . t≥0 Now we are ready to combine Corollary C.2 and Lemma C.3. Lemma C.4. Let ξ1 , · · · , ξm and η1 , · · · , ηm be two sequences of standard normal random variables. Suppose that 1. the random variables in each of the sequences are independent, 2. the covariance of every ξi and ηj is nonnegative, 3. cov(ξ1 , η1 ) ≥ 1 − ε̄ and cov(ξ2 , η2 ) ≥ 1 − ε̄, where ε̄ ≤ 1/7. Then Pr (|η1 | ≤ |η2 | | Et ) = O ! √ √ 2 ( ε̄ + ε̄t) · e−t /2 · c(7ε̄, 8ε̄, t) , 1 − 2Φ̃(t) where c(ε̄, σ, t) is from Corollary C.2. 24 (15) Proof. By Lemma C.3, we have |η1 | − |η2 | ≥ (1 − 4ε̄)|ξ1 | − (1 + 3ε̄)|ξ2 | − |ζ1 | − |ζ2 |. Therefore, Pr (|η1 | ≤ |η2 | | Et ) ≤ Pr ((1 + 3ε̄)|ξ2 | + |ζ1 | + |ζ2 | ≥ (1 − 4ε̄)|ξ1 | | Et ) ≤ Pr (|ξ2 | + |ζ1 | + |ζ2 | ≥ (1 − 7ε̄)t | Et ) X ≤ Pr (sξ2 + s1 ζ1 + s2 ζ2 ≥ (1 − 7ε̄)t | Et ) s,s1 ,s2 ∈{±1} Let us fix signs s, s1 , s2 ∈ {±1} and denote ξ = sξ2 , ζ = s1 ζ1 + s2 ζ1 , then we need to show that ! √ 2 √ e−t /2 · c(7ε̄, 8ε̄, t) . 
Pr (ξ + ζ ≥ (1 − 7ε̄)t | Et ) = O ( ε̄ + ε̄t) · 1 − 2Φ̃(t) Observe that the random variables ξ, ζ and the event At are independent of ξ1 , thus Pr (ξ + ζ ≥ (1 − 7ε̄)t | Et ) = Pr (ξ + ζ ≥ (1 − 7ε̄) t | At and |ξ1 | = t and |ξ| ≤ t) = Pr (ξ + ζ ≥ (1 − 7ε̄) t | At and |ξ| ≤ t) = Pr (ζ ≥ (1 − 7ε̄) t − ξ | At and |ξ| ≤ t) . Since ξ and At are independent, for every fixed value of ξ we can apply Corollary A.3. Thus Pr (ζ ≥ (1 − 7ε̄)t − ξ | At and |ξ| ≤ t) ≤ Pr (ζ ≥ (1 − 7ε̄)t − ξ | |ξ| ≤ t) = Pr (ξ + ζ ≥ (1 − 7ε̄)t | |ξ| ≤ t) . Finally, by Corollary C.2 (where σ 2 = Var [ζ] ≤ 8ε̄), Pr (ξ + ζ ≥ (1 − 7ε̄)t | |ξ| ≤ t) = O ! √ √ 2 ( ε̄ + ε̄t) · e−t /2 · c(7ε̄, 8ε̄, t) . 1 − 2Φ̃(t) Corollary C.5. Under assumptions of Lemma C.4, 1. if ε̄t2 ≤ 1, then √ (t + 1) · Φ̃(t) Pr (|η1 | ≤ |η2 | | Et ) = O ε̄ 1 − 2Φ̃(t) 2. if t > 1, then Pr (|η1 | ≤ |η2 | | Et ) = O Proof. 1. If ε̄t2 ≤ 1, then ε̄t ≤ √ √ ε̄ . ε̄ and c(7ε̄, √ 8ε̄, t) = e ( √ 8ε̄t+1)2 +7ε̄t2 2 25 = O(1). ! ; Notice that √ 2 ( ε̄ + ε̄t) · e−t /2 =O 1 − 2Φ̃(t) ! √ ( ε̄ + ε̄t) · (t + 1) · Φ̃(t) , 1 − 2Φ̃(t) since 2 e−t /2 t Φ̃(t) = Θ ! . 2. If ε̄ > 1/32 the statement holds trivially. So assume that ε̄ ≤ 1/32. Then √ ( 8ε̄t + 1)2 3t2 + 7ε̄t2 ≤ + O(t). 2 8 √ t2 Thus t·e− 2 ·c(7ε̄, 8ε̄, t) is upper bounded by some absolute constant. Since t ≥ 1, the denominator 1 − 2Φ̃(t) of the expression (15) is bounded away from 0. We now give a bound on the “typical” absolute value of the largest random variable. Lemma C.6. The following inequality holds: p 1 Pr |ξ1 | ≥ 2 log m | E ≤ . m Proof. Note that the probability of the event E is 1/m, since all random variables ξ1 , . . . , ξm are equally likely to be the largest in absolute value. Thus we have Pr |ξ | ≥ 2√log m p 1 1 1 1 Pr |ξ1 | ≥ 2 log m | E ≤ ≤ 2 = . Pr (E) m m m Lemma C.7. Let ξ1 , · · · , ξm and η1 , · · · , ηm be two sequences of standard normal random variables as in Theorem 4.1. Assume that cov(ξ1 , η1 ) ≥ 1 − ε̄ and cov(ξ2 , η2 ) ≥ 1 − ε̄, where ε̄ < min(1/(4 log m), 1/7). Then √ ε̄ log m Pr (|η1 | ≤ |η2 | | E) = O . m Proof. Write the desired probability as follows: p Pr (|η1 | ≤ |η2 | | E) = Pr |η1 | ≤ |η2 | and |ξ1 | ≤ 2 log m | E p + Pr |η1 | ≤ |η2 | and |ξ1 | ≥ 2 log m | E √ First consider the case |ξ1 | ≤ 2 log m. Denote by dF|ξ1 | the density of |ξ1 | conditional on E. Then √ 2 log m Z p Pr |η1 | ≤ |η2 | and |ξ1 | ≤ 2 log m | E = 0 Z Pr (|η1 | ≤ |η2 | | E and |ξ1 | = t) dF|ξ1 | (t) √ 2 log m = 0 26 Pr (|η1 | ≤ |η2 | | Et ) dF|ξ1 | (t) Now by Corollary C.5, Z √ 2 log m Z Pr (|η1 | ≤ |η2 | | Et ) dF|ξ1 | (t) = 0 √ 2 log m O 0 ! √ 2 ε̄ log m Φ̃(t) dF|ξ1 | (t). 1 − 2Φ̃(t) Let us change the variable to x = 1 − 2Φ̃(t). What is the probability density function of 1 − 2Φ̃(|ξ1 |) given E? For each i the r.v. 1−2Φ̃(|ξi |) is uniformly distributed on the interval [0, 1]. Now |ξi | > |ξj | if and only if 1 − 2Φ̃(|ξi |) > 1 − 2Φ̃(|ξj |), therefore 1 − 2Φ̃(|ξ1 |) is distributed as the maximum of m independent random variables on [0, 1] given E. Its density function is (xm )0 = mxm−1 (for x ∈ [0, 1]). We have √ Z ∞ √ 2 ε̄ log m Φ̃(t) 2 ε̄ log m Φ̃(t) dF|ξ1 | (t) ≤ dF|ξ1 | (t) 1 − 2Φ̃(t) 1 − 2Φ̃(t) 0 0 Z 1 √ Z 1 p 2 ε̄ log m · (1 − x)/2 = (1 − x)xm−2 dx · mxm−1 dx = m ε̄ log m x 0 0 √ p 1 ε̄ log m 1 = m ε̄ log m − = . m−1 m m−1 √ Now consider the case |ξ1 | ≥ 2 log m, by Corollary C.5, √ p Pr |η1 | ≤ |η2 | | E and |ξ1 | ≥ 2 log m = O ε̄ . Z √ 2 log m By Lemma C.6, p 1 Pr |ξ1 | ≥ 2 log m | E ≤ . m This concludes the proof. 
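The bound just established (and Theorem 4.1, which it leads to) can be sanity-checked numerically. The following small simulation is only a sketch with hypothetical parameter choices: it draws two coordinatewise correlated Gaussian sequences and compares the empirical probability that the indices of the largest absolute values disagree with the quantity √(ε log m).

import numpy as np

def match_probability(m, eps, trials=20000, seed=1):
    """Estimate Pr[argmax_i |xi_i| == argmax_i |eta_i|] when cov(xi_i, eta_i) = 1 - eps
    for each i and the sequences are independent across i, as in Theorem 4.1."""
    rng = np.random.default_rng(seed)
    rho = 1.0 - eps
    xi = rng.standard_normal((trials, m))
    noise = rng.standard_normal((trials, m))
    eta = rho * xi + np.sqrt(1.0 - rho ** 2) * noise   # standard normal, cov(xi_i, eta_i) = rho
    agree = np.argmax(np.abs(xi), axis=1) == np.argmax(np.abs(eta), axis=1)
    return agree.mean()

for eps in (0.001, 0.01, 0.1):
    p = match_probability(m=100, eps=eps)
    print(eps, round(1.0 - p, 4), round(float(np.sqrt(eps * np.log(100))), 4))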
Now we will prove a lemma, which differs from Theorem 4.1 only by one additional condition (4). Lemma C.8. Let ξ1 , · · · , ξm and η1 , · · · , ηm be two sequences of standard normal random variables. Let εi = cov(ξi , ηi ). Suppose that 1. the random variables in each of the sequences are independent, 2. the covariance of every ξi and ηj is nonnegative, 1 Pm i 3. m i=1 ε = ε, 4. εi ≤ min(1/(4 log m), 1/7). Then the probability that the largest r.v. in absolute value in the first√sequence has the same index as the largest r.v. in absolute value in the second sequence is 1 − O( ε log m). Proof. By Lemma C.7, √ log m p 1 2 Pr |η1 | ≤ |η2 | | |ξ1 | ≥ max |ξj | = O max (ε , ε ) . j≥2 m 27 Applying the union bound, we get √ ! m q log m X max (ε1 , εi ) Pr(|η1 | ≤ max|ηj | | |ξ1 | ≥ max |ξj |) = O i≥2 j≥2 m i=2 !! √ m √ p X √ √ by Jensen’s inequality √ log m ≤ O =O εi log m( ε1 + ε) . · m ε1 + m i=1 Since the probability that |ξi | = maxj |ξj | equals 1/m for each i, the probability that the largest r.v. in absolute value among ξi , and the largest r.v. in absolute value among ηj have different indexes is at most ! m p p √ √ √ √ 1 Xp O log m · ( εi + ε) ≤ O log m · ( ε + ε) = O ε log m . m i=1 Proof of Theorem 4.1. Denote εi = 1 − cov(ξi , ηi ). Then (ε1 + · · · + εm ) ≤ mε. We may assume that ε < min(1/(4 log m), 1/7) — otherwise, the theorem follows trivially. Consider the set I = {i : εi < min(1/(4 log m), 1/7)}. Since ε < min(1/(4 log m), 1/7), the set I is not empty. Applying Lemma C.8 to random variables {ξi }i∈I and {ηi }i∈I , we conclude that the the largest r.v. in absolute value among {ξi }i∈I has the same index as the largest r.v. in absolute value among {ξi }i∈I with probability s p X 1 1 − O log |I| · ε log m . εi = 1 − O |I| i∈I Since each ξi is the largest r.v. among ξ1 ,. . . , ξm in absolute value with probability 1/m, the probability that the largest r.v. among ξ1 ,. . . , ξm does not belong to {ξi }i∈I is m−|I| m . Similarly, the probability that the largest r.v. among η1 ,. . . , ηm does not belong to {ηi }i∈I is (m − |I|)/m. Therefore, by the union bound, the probability that the largest r.v. in absolute value among ξi , and the largest r.v. in absolute value among ηj have different indexes is at most p m − |I| 1 − O( ε log m) − 2 . m (16) We now upper bound the last term. 2 m − |I| m by the Markov inequality ε min(1/(4 log m), 1/7) ≤ 2 ≤ p 2 (4 log m + 7)ε = O(ε log m) = O( ε log m). (Here we use that ε log m < 1.) √ Plugging this bound into (16) we get that the desired probability is 1 − O( ε log m). This finishes the proof. 28 D An Alternate Approach We would like to present an alternative version of Algorithm 1. It demonstrates another approach for taking into account the lengths of vectors ui : we can choose a separate threshold tui for each ui . This algorithm achieves the same approximation ratio. The analysis uses similar ideas and we omit it here. Input: A solution of the SDP, with the objective value ε · |E|. Output: An assignment of variables xu . 1. Define ũi = ui /|ui | if ui 6= 0, 0 otherwise. 2. Pick a random Gaussian vector g with independent components distributed as N (0, 1). 3. For each vertex u: (a) For each i project the vector g to ũi : ξui = hg, ũi i. (b) Fix a threshold tui s.t. Pr (ξui ≥ tui ) = |ui |2 (i.e. tui is the (1 − |ui |2 )-quantile of the standard normal distribution). Note that tui depends on ui . (c) Pick ξui ’s that are larger than the threshold tui : Su = {i : ξui ≥ tui } . (d) Pick at random a state i from Su and assign xu = i. 29
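For concreteness, the procedure above can be written out as the following minimal sketch, assuming the SDP solution is given as a (k × d) array of vectors per vertex; the interface and the use of scipy's normal quantile function are illustration choices rather than part of the algorithm's specification.

import numpy as np
from scipy.stats import norm

def round_alternate(U, rng):
    """U: dict vertex -> (k x d) array of SDP vectors u_1, ..., u_k with sum_i |u_i|^2 = 1.
    Alternate rounding of Appendix D: project one Gaussian vector g onto each normalized
    u_i, keep i if the projection exceeds the (1 - |u_i|^2)-quantile of N(0, 1), and then
    pick one surviving index at random (leaving the vertex unassigned if none survives)."""
    d = next(iter(U.values())).shape[1]
    g = rng.standard_normal(d)
    assignment = {}
    for u, vecs in U.items():
        norms = np.linalg.norm(vecs, axis=1)
        keep = norms > 0
        tilde = np.zeros_like(vecs)
        tilde[keep] = vecs[keep] / norms[keep][:, None]
        xi = tilde @ g                                   # xi_{u,i} = <g, u_i / |u_i|>
        t = norm.isf(np.clip(norms ** 2, 1e-12, 1.0))    # Pr[N(0,1) >= t_{u,i}] = |u_i|^2
        S = np.flatnonzero(keep & (xi >= t))
        if S.size:
            assignment[u] = int(rng.choice(S))
    return assignment

rng = np.random.default_rng(0)
vecs = rng.standard_normal((4, 6))
U = {"u": vecs / np.linalg.norm(vecs), "v": vecs / np.linalg.norm(vecs)}
print(round_alternate(U, rng))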