Near-Optimal Algorithms for Unique Games
Moses Charikar∗
Konstantin Makarychev†
Yury Makarychev‡
Princeton University
Abstract
Unique games are constraint satisfaction problems that can be viewed as a generalization of
Max-Cut to a larger domain size. The Unique Games Conjecture states that it is hard to distinguish between instances of unique games where almost all constraints are satisfiable and those
where almost none are satisfiable. It has been shown to imply a number of inapproximability
results for fundamental problems that seem difficult to obtain by more standard complexity
assumptions. Thus, proving or refuting this conjecture is an important goal. We present significantly improved approximation algorithms for unique games. For instances with domain size
k where the optimal solution satisfies a 1 − ε fraction of all constraints, our algorithms satisfy
roughly a k^{−ε/(2−ε)} and a 1 − O(√(ε log k)) fraction of all constraints, respectively. Our algorithms are based on
rounding a natural semidefinite programming relaxation for the problem and their performance
almost matches the integrality gap of this relaxation. Our results are near optimal if the Unique
Games Conjecture is true, i.e. any improvement (beyond low order terms) would refute the
conjecture.
1 Introduction
Given a set of linear equations over Zp with two variables per equation, consider the problem of
finding an assignment to variables that satisfies as many equations (constraints) as possible. If
there is an assignment to the variables which satisfies all the constraints, it is easy to find such an
assignment. On the other hand, if there is an assignment that satisfies almost all constraints (but
not all), it seems quite difficult to find a good satisfying assignment. This is the essence of the
Unique Games Conjecture of Khot [10].
One distinguishing feature of the above problem on linear equations is that every constraint
corresponds to a bijection between the values of the associated variables. For every possible value
of one variable, there is a unique value of the second variable that satisfies the constraint. Unique
games are systems of constraints – a generalization of linear equations discussed above – that have
this uniqueness property (first considered by Feige and Lovász [6]).
∗ moses@cs.princeton.edu. Supported by NSF ITR grant CCR-0205594, NSF CAREER award CCR-0237113, MSPA-MCS award 0528414, and an Alfred P. Sloan Fellowship.
† kmakaryc@cs.princeton.edu. Supported by a Gordon Wu fellowship. Part of this work was done at Microsoft Research.
‡ ymakaryc@cs.princeton.edu. Supported by a Gordon Wu fellowship. Part of this work was done at Microsoft Research.
Definition 1.1 (Unique Game). A unique game consists of a constraint graph G = (V, E), a set
of variables xu (for all vertices u) and a set of permutations πuv on [k] = {1, . . . , k} (for all edges
(u, v)). Each permutation πuv defines the constraint πuv (xu ) = xv . The goal is to assign a value
from the set [k] to each variable xu so as to maximize the number of satisfied constraints.
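To make the definition concrete, here is a minimal sketch (in Python, with a purely hypothetical toy instance; the variable names and shift constants are illustrative, not part of the paper) of a unique game arising from linear equations xv − xu = cuv (mod k), where each constraint is a bijection on [k]:

```python
# A toy unique game over domain [k] = {0, ..., k-1}, written as permutation constraints.
# Each constraint pi_uv is a bijection on [k]; here it encodes x_v = x_u + c_uv (mod k).
k = 3
edges = {("u", "v"): 1, ("v", "w"): 2}          # hypothetical shift constants c_uv

def shift_permutation(c, k):
    """Permutation i -> i + c (mod k), represented as a list."""
    return [(i + c) % k for i in range(k)]

constraints = {e: shift_permutation(c, k) for e, c in edges.items()}

def satisfied(assignment, constraints):
    """Count constraints pi_uv(x_u) = x_v satisfied by an assignment."""
    return sum(1 for (u, v), pi in constraints.items() if pi[assignment[u]] == assignment[v])

print(satisfied({"u": 0, "v": 1, "w": 0}, constraints))  # prints 2: both constraints are satisfied
```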
As in the setting of linear equations, instances of unique games where all constraints are satisfiable are easy to handle. Given an instance where 1 − ε fraction of constraints are satisfiable, the
Unique Games Conjecture (UGC) of Khot [10] says that it is hard to satisfy even δ fraction of the
constraints. More formally, the conjecture is the following.
Conjecture 1 (Unique Games Conjecture [10]). For any constants ε, δ > 0, for any k >
k(ε, δ), it is NP-hard to distinguish between instances of unique games with domain size k where
1 − ε fraction of constraints are satisfiable and those where δ fraction of constraints are satisfiable.
This conjecture has attracted a lot of recent attention since it has been shown to imply hardness
of approximation results for several important problems: MaxCut [11, 15], Min 2CNF Deletion [3,
10], MultiCut and Sparsest Cut [3, 14], Vertex Cover [13], and coloring 3-colorable graphs [5] (based
on a variant of the UGC); these results seem difficult to obtain under standard complexity assumptions.
Note that a random assignment satisfies a 1/k fraction of the constraints in a unique game.
Andersson, Engebretsen, and Håstad [2] considered semidefinite program (SDP) based algorithms
for systems of linear equations mod p (with two variables per equation) and gave an algorithm
that performs (very slightly) better than a random assignment. The first approximation algorithm for general unique games was given by Khot [10], and satisfies a 1 − O(k² ε^{1/5} √(log(1/ε))) fraction of all constraints if a 1 − ε fraction of all constraints is satisfiable. Recently Trevisan [17] developed an algorithm that satisfies a 1 − O(∛(ε log n)) fraction of all constraints (this can be improved to 1 − O(√(ε log n)) [9]), and Gupta and Talwar [9] developed an algorithm that satisfies a 1 − O(ε log n)
fraction of all constraints. The result of [9] is based on rounding an LP relaxation for the problem,
while previous results use SDP relaxations for unique games.
There are very few results that show hardness of unique games. Feige and Reichman [7] showed
that for every positive ε there is c s.t. it is NP-hard to distinguish whether c fraction of all
constraints is satisfiable, or only εc fraction is satisfiable.
Our Results. We present two new approximation algorithms for unique games. We state
our guarantees for instances where 1 − ε fraction of constraints are satisfiable. The first algorithm
satisfies a

    Ω( min(1, 1/√(ε log k)) · (1 − ε)² · k^{−ε/(2−ε)} / √(log k) )          (1)

fraction of all constraints. The second algorithm satisfies a 1 − O(√(ε log k)) fraction of all constraints
and has a better guarantee for ε = O(1/ log k). We apply the same techniques for d-to-1 games as
well.
In order to understand the complexity theoretic implications of our results, it is useful to keep in
mind that inapproximability reductions from unique games typically use the “Long Code”, which
increases the size of the instance by a 2^k factor. Thus, such applications of unique games usually
have domain size k = O(log n). In Figure 1, we summarize known algorithmic guarantees for unique
games. In order to compare these different guarantees in the context of hardness applications (i.e.
k = O(log n)), we compare the range of values of ε (as a function of k) for which each of these
algorithms beats the performance of a random assignment.
    Algorithm           Guarantee for OPT = 1 − ε            Threshold ε
    Khot [10]           1 − O(k² ε^{1/5} √(log(1/ε)))        Õ(1/k^{10})
    Trevisan [17]       1 − O(∛(ε log n))                    O(1/k)
    Gupta–Talwar [9]    1 − O(ε log n)                       O(1/k)
    This paper          ≈ Ω(k^{−ε/(2−ε)})                    const < 1
    This paper          1 − O(√(ε log k))                    O(1/log k)

Figure 1: Summary of results. The guarantee represents the fraction of constraints satisfied for instances where OPT = 1 − ε. The threshold represents the range of values of ε for which the algorithm beats a random assignment (computed for k = O(log n)).
Our results show limitations on the hardness bounds achievable using the UGC and stronger
versions of it. Chawla, Krauthgamer, Kumar, Rabani, and Sivakumar [3] proposed a strengthened form of the UGC, conjecturing that it holds for k = log n and ε = δ = 1/(log n)^{Ω(1)}. This was used to obtain an Ω(log log n) hardness for sparsest cut. Our results refute this strengthened conjecture.¹
The performance of our algorithms is naturally constrained by the integrality gap of the SDP
relaxation, i.e. the smallest possible value of an integer solution for an instance with SDP solution of
value (1 − ε)|E|. Khot and Vishnoi [14] constructed a gap instance for the semidefinite relaxation² for the Unique Games Problem where the SDP satisfies a (1 − ε) fraction of constraints, but the optimal solution can satisfy at most an O(k^{−ε/9}) fraction (one may show that their analysis can yield O(k^{−ε/4+o(ε)})).
This shows that our results are almost optimal for the standard semidefinite program.
After we unofficially announced our results, Khot, Kindler, Mossel and O’Donnell [12] showed
that a reduction in an earlier version of their paper [11] together with the techniques in the recently
proved Majority is Stablest result of Mossel, O’Donnell, and Oleszkiewicz [15] give lower bounds
for unique games that almost match the upper bounds we obtain. They establish the following
hardness results (in fact, for the special case of linear equations mod p):
Theorem 1.2 ([12], Corollary 13). The Unique Games Conjecture implies that for every fixed
ε > 0, for all k > k(ε), it is NP-hard to distinguish between instances of unique games with domain
size k where at least a 1 − ε fraction of constraints are satisfiable and those where a 1/k^{ε/(2−ε)} fraction
of constraints are satisfiable.
Theorem 1.3 ([12], Corollary 14). The Unique Games Conjecture implies that for every fixed
ε > 0, for all k > k(ε), it is NP-hard to distinguish between instances of unique games with domain size k where at least a 1 − ε fraction of constraints are satisfiable and those where a 1 − √(2/π) · √(ε log k) + o(1) fraction of constraints are satisfiable.
Thus, our bounds are near optimal if the UGC is true: even a slight improvement of the results 1/k^{ε/(2−ε)} or 1 − O(√(ε log k)) (beyond low order terms) will disprove the Unique Games Conjecture!
Our algorithms are based on rounding an SDP relaxation for unique games. The goal is to
assign a value in [k] to every variable u. The SDP solution gives a collection of vectors {ui } for
¹ An updated version of [3] proposes a different strengthened form of the UGC, which is still plausible given our algorithms. They use a modified analysis to account for the asymmetry in ε and δ to obtain an Ω(√(log log n)) hardness for sparsest cut based on this.
² We use a slightly stronger SDP than they used, but their integrality gap construction works for our SDP as well.
every variable u, one for every value i ∈ [k]. Given a constraint πuv on u and v, the vectors ui and
vπuv(i) are close. In contrast to the algorithms of Trevisan [17] and Gupta and Talwar [9], our rounding
algorithms ignore the constraint graph entirely. We interpret the SDP solution as a probability
distribution on assignments of values to variables and the goal of our rounding algorithm is to pick
an assignment to variables by sampling from this distribution such that values of variables connected
by constraints are strongly correlated. The rough idea is to pick a random vector and examine the
projections of this vector on ui , picking a value i for u for which ui has a large projection. (In
fact, this is exactly the algorithm of Khot [10]). We have to modify this basic idea to obtain our
results since the ui ’s could have different lengths and other complications arise. Instead of picking
one random vector, we pick several Gaussian random vectors. Our first algorithm (suitable for
large ε) picks a small set of candidate assignments for each variable and chooses randomly amongst
them (independently for every variable). It is interesting to note that such a multiple assignment is
often encountered in algorithms implicit in hardness reductions involving label cover. In contrast to
previous results, this algorithm has a non-trivial guarantee even for very large ε. As ε approaches 1
(i.e. for instances where the optimal solution satisfies only a small fraction of the constraints), the
performance guarantee approaches that of a random assignment. Our second algorithm (suitable
for small ε) carefully picks a single assignment so that almost all constraints are satisfied. The
performance guarantee of this algorithm generalizes that obtained by Goemans and Williamson [8]
for k = 2. Note that a unique game of domain size k = 2 where 1 − ε fraction of constraints is
satisfiable is equivalent to an instance of Max-Cut where the optimal solution cuts 1 − ε fraction
of all edges. For such instances, the random hyperplane rounding algorithm of [8] gives a solution
of value 1 − O(√ε), and our guarantee can be viewed as a generalization of this to larger k.
The reader might wonder about the confluence of our bounds and the lower bounds obtained
by Khot et al. [12]. In fact, both arise from the analysis of the same quantity: given two unit vectors with dot product 1 − ε, conditioned on the event that one has projection Θ(√(log k)) on
a random Gaussian vector, what is the probability that the other has a large projection as well?
This question arises naturally in the analysis of our rounding algorithms. On the other hand, the
bounds obtained by Khot et al. [12] depend on the noise stability of certain functions. Via the
results of [15], this is bounded by the answer to the above question.
In Section 2, we describe the semidefinite relaxation for unique games. In Sections 3 and 4, we
present and analyze our approximation algorithms. In Section 5, we apply our results to d-to-1
games. We defer some of the technical details of our analysis to the Appendix.
Recently, Chlamtac, Makarychev and Makarychev [4] have combined our approach with techniques of metric embeddings. Their approximation algorithm for unique games satisfies a 1 − O(ε √(log n log k)) fraction of all constraints. This generalizes the result of Agarwal, Charikar, Makarychev, and Makarychev [1] for the Min UnCut Problem (i.e. the case k = 2). Note that their approximation guarantee is not comparable with ours.
2 Semidefinite Relaxation
First we reduce a unique game to an integer program. For each vertex u we introduce k indicator
variables ui ∈ {0, 1} (i ∈ [k]) for the events xu = i. For every u, the intended solution has ui = 1
for exactly one i. The constraint πuv (xu ) = xv can be restated in the following form:
    ui = vπuv(i)   for all i ∈ [k].
The unique game instance is equivalent to the following integer quadratic program:
    minimize   (1/2) Σ_{(u,v)∈E} Σ_{i=1}^{k} |ui − vπuv(i)|²

    subject to   ∀u ∈ V, ∀i ∈ [k]:              ui ∈ {0, 1}
                 ∀u ∈ V, ∀i, j ∈ [k], i ≠ j:    ui · uj = 0
                 ∀u ∈ V:                        Σ_{i=1}^{k} ui² = 1
Note that the objective function measures the number of unsatisfied constraints. The contribution
of (u, v) ∈ E to the objective function is equal to 0 if the constraint πuv is satisfied, and 1 otherwise.
The last two equations say that exactly one ui is equal to 1.
We now replace each integer variable ui with a vector variable and get a semidefinite program
(SDP):
    minimize   (1/2) Σ_{(u,v)∈E} Σ_{i=1}^{k} |ui − vπuv(i)|²

    subject to   ∀u ∈ V, ∀i, j ∈ [k], i ≠ j:    ⟨ui, uj⟩ = 0                          (2)
                 ∀u ∈ V:                        Σ_{i=1}^{k} |ui|² = 1                 (3)
                 ∀(u, v) ∈ E, ∀i, j ∈ [k]:      ⟨ui, vj⟩ ≥ 0                          (4)
                 ∀(u, v) ∈ E, ∀i ∈ [k]:         0 ≤ ⟨ui, vπuv(i)⟩ ≤ |ui|²             (5)
The last two constraints are triangle inequality constraints³ for the squared Euclidean distance: inequality (4) is equivalent to |ui − 0|² + |vj − 0|² ≥ |ui − vj|², and inequality (5, right side) is equivalent to |ui − vπuv(i)|² + |ui − 0|² ≥ |vπuv(i) − 0|². A very important constraint is that for i ≠ j the vectors ui and uj are orthogonal. This SDP was studied by Khot [10], and by Trevisan [17].

³ We will use constraint (4) only in the second algorithm.
Here is an intuitive interpretation of the vector solution: Think of the elements of the set [k]
as states of the vertices. If ui = 1, the vertex is in the state i. In the vector case, each vertex is
in a mixed state, and the probability that xu = i is equal to |ui|². The inner product ⟨ui, vj⟩ can
be thought of as the joint probability that xu = i and xv = j. The directions of vectors determine
whether two states are correlated or not: If the angle between ui and vj is small it is likely that
both events “u is in the state i” and “v is in the state j” occur simultaneously. In some sense later
we will treat the lengths and the directions of vectors separately.
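The relaxation above can be written down directly as an SDP over the Gram matrix of the vectors ui. The following sketch is one possible way to do this in Python, assuming the cvxpy package is available; the function name, the toy dictionary instance format used in the earlier sketch, and the solver choice are all illustrative assumptions, not part of the paper.

```python
# Sketch: the standard SDP relaxation for unique games, written over the Gram matrix
# X[(u,i),(v,j)] = <u_i, v_j>.  Assumes cvxpy is installed; the input format is the
# toy one used earlier (constraints: {(u, v): permutation list}).
import cvxpy as cp

def unique_games_sdp(vertices, constraints, k):
    n = len(vertices)
    idx = {(u, i): p for p, (u, i) in
           enumerate((u, i) for u in vertices for i in range(k))}
    X = cp.Variable((n * k, n * k), PSD=True)

    cons = []
    for u in vertices:
        # <u_i, u_j> = 0 for i != j, and sum_i |u_i|^2 = 1   (constraints (2) and (3))
        cons += [X[idx[u, i], idx[u, j]] == 0
                 for i in range(k) for j in range(k) if i != j]
        cons.append(sum(X[idx[u, i], idx[u, i]] for i in range(k)) == 1)

    obj = 0
    for (u, v), pi in constraints.items():
        for i in range(k):
            j = pi[i]
            # triangle inequality constraints (4) and (5)
            cons += [X[idx[u, i], idx[v, jj]] >= 0 for jj in range(k)]
            cons.append(X[idx[u, i], idx[v, j]] <= X[idx[u, i], idx[u, i]])
            # contribution |u_i - v_{pi(i)}|^2 / 2 to the objective
            obj += (X[idx[u, i], idx[u, i]] + X[idx[v, j], idx[v, j]]
                    - 2 * X[idx[u, i], idx[v, j]]) / 2

    prob = cp.Problem(cp.Minimize(obj), cons)
    prob.solve()          # any SDP-capable solver shipped with cvxpy will do
    return X.value, prob.value
```

To feed the rounding algorithms below one still has to recover actual vectors ui from the Gram matrix X, e.g. via a Cholesky or eigenvalue decomposition.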
3 Rounding Algorithm

We first describe a high level idea for the first algorithm. Pick a random Gaussian vector g (with standard normal independent components). For every vertex u, add those vectors ui whose inner
product with g are above some threshold τ to the set Su ; we choose the threshold τ in such a way
that the set Su contains only one element in expectation. Then pick a random state from Su and
assign it to the vertex u (if Su is empty do not assign any states to u). What is the probability that
the algorithm satisfies a constraint between vertices u and v? Loosely speaking, this probability is
equal to

    E[ |Su ∩ πuv(Sv)| / (|Su| · |Sv|) ] ≈ E[ |Su ∩ πuv(Sv)| ].
Assume for a moment that the SDP solution is symmetric: the lengths of all vectors ui are
the same and the squared Euclidean distance between every ui and vπuv (i) is equal to 2ε. (In
fact, these constraints can be added to the SDP in the special case of systems of linear equations
of the form xi − xj = cij (mod p).) Since we want the expected size of Su to be 1, we pick the threshold τ such that the probability that ⟨g, ui⟩ ≥ τ equals 1/k. The random variables ⟨g, √k · ui⟩ and ⟨g, √k · vπuv(i)⟩ are standard normal random variables with covariance 1 − ε (note that we multiplied the inner products by a normalization factor of √k). For such random variables, if the probability of the event ⟨g, √k · ui⟩ ≥ t ≡ √k · τ equals 1/k, then, roughly speaking, the probability of the event ⟨g, √k · ui⟩ ≥ t ≡ √k · τ and ⟨g, √k · vπuv(i)⟩ ≥ t ≡ √k · τ equals k^{−ε/2} · 1/k. Thus the expected size of the intersection of the sets Su and πuv(Sv) is approximately k^{−ε/2}.
Unfortunately this no longer works if the lengths of vectors are different. The main problem is
that if, say, u1 is two times longer than u2 , then Pr (u1 ∈ Su ) is much larger than Pr (u2 ∈ Su ).
One of the possible solutions is to normalize all vectors first. In order to take into account
original lengths of vectors we repeat the procedure of adding vectors to the sets Su many times,
but each vector ui has a chance to be selected in the set Su only in the first su,i trials, where su,i
is some integer number proportional to the original squared Euclidean length of ui .
We now formally present a rounding algorithm for the SDP described in the previous section.
In Appendix D, we describe an alternate approach to rounding the SDP.
Theorem 3.1. There is a polynomial time algorithm that finds an assignment of variables which
satisfies a

    Ω( min(1, 1/√(ε log k)) · (1 − ε)² · k^{−ε/(2−ε)} / √(log k) )

fraction of all constraints if the optimal solution satisfies a (1 − ε) fraction of all constraints.
Rounding Algorithm 1
Input: A solution of the SDP, with the objective value ε · |E|.
Output: An assignment of variables xu .
1. Define ũi = ui/|ui| if ui ≠ 0, and ũi = 0 otherwise.
Note that vectors ũ1 , . . . , ũk are orthogonal unit vectors (except for those vectors that are
equal to zero).
2. Pick random independent Gaussian vectors g1 , . . . , gk with independent components distributed as N (0, 1).
3. For each vertex u:
(a) Set sui = ⌈|ui|² · k⌉.

(b) For each i project sui vectors g1, . . . , gsui to ũi:

    ξui,s = ⟨gs, ũi⟩,   1 ≤ s ≤ sui.

Note that ξu1,1, ξu1,2, . . . , ξu1,su1, . . . , ξuk,1, . . . , ξuk,suk are independent standard normal random variables. (Since ui and uj are orthogonal for i ≠ j, their projections onto a random Gaussian vector are independent.) The number of random variables corresponding to each ui is proportional to |ui|².

(c) Fix a threshold t s.t. Pr(ξ ≥ t) = 1/k, where ξ ∼ N(0, 1) (i.e. t is the (1 − 1/k)-quantile of the standard normal distribution; note that t = Θ(√(log k))).
(d) Pick ξui ’s that are larger than the threshold t:
Su = {(i, s) : ξui ,s ≥ t} .
(e) Pick at random a pair (i, s) from Su and assign xu = i.
If the set Su is empty do not assign any value to the vertex: this means that all the constraints
containing the vertex are not satisfied.
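For concreteness, here is a compact sketch of Rounding Algorithm 1 in Python/NumPy. It assumes the SDP vectors are supplied as a dictionary vecs[u] holding a k×d matrix whose rows are u_1, . . . , u_k; this input format and the helper names are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

def round_alg1(vecs, k, rng=np.random.default_rng()):
    """Rounding Algorithm 1 (sketch): returns a partial assignment {vertex: label}."""
    d = next(iter(vecs.values())).shape[1]
    g = rng.standard_normal((k, d))                # Gaussian vectors g_1, ..., g_k, shared by all vertices
    t = norm.ppf(1.0 - 1.0 / k)                    # threshold: (1 - 1/k)-quantile, t = Theta(sqrt(log k))
    assignment = {}
    for u, U in vecs.items():
        S_u = []
        for i in range(k):
            norm_ui = np.linalg.norm(U[i])
            if norm_ui == 0:
                continue
            u_tilde = U[i] / norm_ui               # normalized vector \tilde{u}_i
            s_ui = int(np.ceil(norm_ui ** 2 * k))  # number of trials for label i
            for s in range(s_ui):
                if g[s] @ u_tilde >= t:            # projection xi_{u_i, s} exceeds the threshold
                    S_u.append((i, s))
        if S_u:                                    # pick a random surviving pair; else leave u unassigned
            i, _ = S_u[rng.integers(len(S_u))]
            assignment[u] = i
    return assignment
```

Note that the Gaussian vectors g_1, . . . , g_k are drawn once and shared across all vertices, which is what correlates the sets Su and Sv across an edge.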
We introduce some notation.
Definition 3.2. Define the distance between two vertices u and v as

    εuv = (1/2) Σ_{i=1}^{k} |ui − vπuv(i)|²,

and let

    εiuv = (1/2) |ũi − ṽπuv(i)|².
If ui and vπuv (i) are nonzero vectors and αi is the angle between them, then εiuv = 1 − cos αi . For
consistency, if one of the vectors is equal to zero we set εiuv = 1 and αi = π/2.
Lemma 3.3. For every edge (u, v), state i in [k] and s ≤ min(sui, svπuv(i)), the probability that the algorithm picks (i, s) for the vertex u and (πuv(i), s) for v at step 3.e is

    Ω( min(1, 1/√(εiuv log k)) · (1/√(log k)) · (√(log k)/k)^{2/(2−εiuv)} ).          (6)
Proof. First let us observe that ξui ,s and ξvπuv (i) ,s are standard normal random variables with
covariance cos αi = 1 − εiuv . As we will see later (Lemma B.1) the probability that ξui ,s ≥ t and
ξvπuv (i) ,s ≥ t is equal to (6).
Note that the expected number of elements in Su is equal to (su1 + . . . + suk )/k which is at
most 2. Moreover, as we prove in the Appendix (Lemma B.3), the conditional expected number of
elements in Su given the event ξui ,s ≥ t and ξvπuv (i) ,s ≥ t is also a constant. Thus by the Markov
inequality the following event happens with probability (6): The sets Su and Sv contain the pairs
(i, s) and (πuv (i), s) respectively and the sizes of these sets are bounded by a constant. The lemma
follows.
Definition 3.4. For brevity, denote (√(log k)/k)^{2/(2−x)} by fk(x).
Remark 3.1. It is instructive to consider the case when the SDP solution is uniform in the following sense:
1. The lengths of all vectors ui are the same and are equal to 1/√k.
2. All εiuv are equal to ε.
In this case all sui are equal to 1, and thus the probability that a constraint is satisfied is k times the probability (6), which is equal, up to a logarithmic factor, to k^{−ε/(2−ε)}. Multiplying this probability by the number of edges we get that the expected number of satisfied constraints is k^{−ε/(2−ε)}|E|.
In the general case however we need to do some extra work to average the probabilities among
all states i and edges (u, v).
Recall that we interpret |ui |2 as the probability that the vertex u is in the state i. Suppose now
that the constraint between u and v is satisfied, what is the conditional probability that u is in the
state i and v is in the state πuv (i)? Roughly speaking, it should be equal to (|ui |2 + |vπuv (i) |2 )/2.
This motivates the following definition.
Definition 3.5. Define a measure µuv on the set [k]:

    µuv(T) = Σ_{i∈T} (|ui|² + |vπuv(i)|²)/2,   where T ⊂ [k].

Note that µuv([k]) = 1. This follows from constraint (3).
The following lemma shows why this measure is useful.
Lemma 3.6. For every edge (u, v) the following statements hold.
1. The average value of εiuv w.r.t. the measure µuv is less than or equal to εuv:

    Σ_{i=1}^{k} µuv(i) εiuv ≤ εuv.

2. For every i,

    min(sui, svπuv(i)) ≥ (1 − εiuv)² µuv(i) k.
Proof. 1. Indeed,

    Σ_{i=1}^{k} µuv(i) · εiuv = Σ_{i=1}^{k} ( |ui|² + |vπuv(i)|² − (|ui|² + |vπuv(i)|²) · cos αi ) / 2
                              ≤ Σ_{i=1}^{k} ( |ui|² + |vπuv(i)|² − 2 · |ui| · |vπuv(i)| · cos αi ) / 2
                              = Σ_{i=1}^{k} |ui − vπuv(i)|² / 2 = εuv.

Note that here we used the fact that ⟨ui, vπuv(i)⟩ ≥ 0.

2. Without loss of generality assume that |ui| ≤ |vπuv(i)|, and hence min(sui, svπuv(i)) = sui. Due to the triangle inequality constraint (5) in the SDP, |vπuv(i)| cos αi ≤ |ui|. Thus

    (1 − εiuv)² µuv(i) = cos² αi · (|ui|² + |vπuv(i)|²)/2 ≤ |ui|² ≤ sui/k.
Lemma 3.7. For every edge (u, v) the probability that an assignment found by the algorithm satisfies the constraint πuv(xu) = xv is

    Ω( (k/√(log k)) · min(1, 1/√(εuv log k)) · fk(εuv) ).          (7)
Proof. Denote the desired probability by Puv. It is equal to the sum of the probabilities obtained in Lemma 3.3 over all i in [k] and s ≤ min(sui, svπuv(i)). In other words,

    Puv = Ω( Σ_{i=1}^{k} min(sui, svπuv(i)) · (1/√(log k)) · min(1, 1/√(εiuv log k)) · fk(εiuv) ).

Replacing min(sui, svπuv(i)) with (1 − εiuv)² µuv(i) · k we get

    Puv = Ω( (k/√(log k)) Σ_{i=1}^{k} µuv(i) · min(1, 1/√(εiuv log k)) · (1 − εiuv)² fk(εiuv) ).

Consider the set M = { i ∈ [k] : εiuv ≤ 2εuv }. For i in M the term √(εiuv log k) is bounded from above by √(2εuv log k). Thus

    Puv = Ω( (k/√(log k)) · min(1, 1/√(εuv log k)) · Σ_{i∈M} µuv(i) (1 − εiuv)² fk(εiuv) ).

The function (1 − x)² fk(x) is convex on [0, 1] (see Lemma B.4). The average value of εiuv among i in M (w.r.t. the measure µuv) is at most the average value of εiuv among all i, which in turn is less than εuv according to Lemma 3.6. Finally, by the Markov inequality µuv(M) ≥ 1/2. Thus by Jensen's inequality

    Puv = Ω( (k/√(log k)) · min(1, 1/√(εuv log k)) · µuv(M) (1 − εuv)² · fk(εuv) )
        = Ω( (k/√(log k)) · min(1, 1/√(εuv log k)) · (1 − εuv)² · fk(εuv) ).

This finishes the proof.
We are now in a position to prove the main theorem.

Theorem 3.1. There is a polynomial time algorithm that finds an assignment of variables which satisfies a

    Ω( min(1, 1/√(ε log k)) · (1 − ε)² · k^{−ε/(2−ε)} / √(log k) )

fraction of all constraints if the optimal solution satisfies a (1 − ε) fraction of all constraints.
Proof. Let us restrict our attention to the subset of edges E′ = {(u, v) ∈ E : εuv ≤ 2ε}. For (u, v) in E′, since εuv log k ≤ 2ε log k, we have

    Puv = Ω( (k/√(log k)) · min(1, 1/√(ε log k)) · (1 − εuv)² · fk(εuv) ).

Summing this probability over all edges (u, v) in E′ (note that by the Markov inequality |E′| ≥ |E|/2) and using convexity of the function (1 − x)² fk(x), we get the statement of the theorem.
4 Almost Satisfiable Instances

Suppose that ε is O(1/ log k). In the previous section we saw that in this case the algorithm finds an assignment of variables satisfying a constant fraction of constraints. But can we do better? In this section we show how to find an assignment satisfying a 1 − O(√(ε log k)) fraction of constraints.

The main issue we need to take care of is to guarantee that the algorithm always picks only one element in the set Su (otherwise we lose a constant factor). This can be done by selecting the ξui,s that is largest in absolute value (at step 3.d). We will also change the way we set sui.
Denote by [x]r the function that rounds x up or down depending on whether the fractional
part of x is greater or less than r. Note that if r is a random variable uniformly distributed in the
interval [0, 1], then the expected value of [x]r is equal to x.
Rounding Algorithm 2
Input: A solution of the SDP, with the objective value ε · |E|.
Output: An assignment of variables xu .
1. Pick a number r in the interval [0, 1] uniformly at random.
2. Pick random independent Gaussian vectors g1 , . . . , g2k with independent components distributed as N (0, 1).
3. For each vertex u:
(a) Set sui = [2k · |ui |2 ]r .
(b) For each i project sui vectors g1 , . . . , gsui to ũi :
    ξui,s = ⟨gs, ũi⟩,   1 ≤ s ≤ sui.
(c) Select ξui ,s with the largest absolute value, where i ∈ [k] and s ≤ sui . Assign xu = i.
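The second algorithm differs from the first only in how sui is randomized and in taking a single maximizer. A sketch under the same illustrative assumptions as before (NumPy, and the vecs[u] input format holding rows u_1, . . . , u_k) follows.

```python
import numpy as np

def round_alg2(vecs, k, rng=np.random.default_rng()):
    """Rounding Algorithm 2 (sketch): returns a full assignment {vertex: label}."""
    d = next(iter(vecs.values())).shape[1]
    r = rng.uniform()                                  # shared randomized-rounding threshold
    g = rng.standard_normal((2 * k, d))                # Gaussian vectors g_1, ..., g_{2k}, shared by all vertices
    assignment = {}
    for u, U in vecs.items():
        best, best_val = 0, -1.0
        for i in range(k):
            norm_ui = np.linalg.norm(U[i])
            if norm_ui == 0:
                continue
            u_tilde = U[i] / norm_ui
            x = 2 * k * norm_ui ** 2
            s_ui = int(np.floor(x)) + (1 if x - np.floor(x) > r else 0)   # [2k|u_i|^2]_r rounding
            for s in range(s_ui):
                val = abs(g[s] @ u_tilde)              # |xi_{u_i, s}|
                if val > best_val:
                    best, best_val = i, val
        assignment[u] = best
    return assignment
```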
We first elaborate on the difference between the choice of sui in the algorithm above and that in
Algorithm 1 presented earlier. Consider a constraint πuv (xu ) = xv . Projection ξui ,s generated by ui
and ξvπuv (i) ,s generated by vπuv (i) are considered to be matched. On the other hand, a projection ξui ,s
such that the corresponding ξvπuv (i) ,s does not exist (or vice versa) is considered to be unmatched.
Unmatched projections arise when sui ≠ svπuv(i), and the fraction of such projections limits the probability of satisfying the constraint. Recall that in Algorithm 1, we set sui = ⌈|ui|² · k⌉. Even if
ui and vπuv (i) are infinitesimally close, it may turn out that sui and svπuv (i) differ by 1, yielding an
unmatched projection. As a result, some constraints that are almost satisfied by the SDP solution
(i.e. εuv is close to 0) could be satisfied with low probability (by the first rounding algorithm). In
Algorithm 2, we set sui = [2k · |ui|²]r. This serves two purposes: Firstly, Er[ |sui − svπuv(i)| ] can
be bounded by 2k · |ui − vπuv (i) |2 , giving a small number of unmatched projections in expectation.
Secondly, the number of matched projections is always at least k/2. These two properties are
established in Lemma 4.3 and ensure that the expected fraction of unmatched projections is small.
Our analysis of Rounding Algorithm 2 is based on the following theorem.
Theorem 4.1. Let ξ1, · · · , ξm and η1, · · · , ηm be two sequences of standard normal random variables. Suppose that the random variables in each of the sequences are independent, the covariance of every ξi and ηj is nonnegative, and the average covariance of ξi and ηi is at least 1 − ε:

    (cov(ξ1, η1) + · · · + cov(ξm, ηm)) / m ≥ 1 − ε.

Then the probability that the largest r.v. in absolute value in the first sequence has the same index as the largest r.v. in absolute value in the second sequence is 1 − O(√(ε log m)).
We informally sketch the proof. See Appendix C for the complete proof. It is instructive to consider the case when cov(ξi, ηi) = 1 − ε for all i. Assume that the first variable ξ1 is the largest in absolute value among ξ1, . . . , ξm and its absolute value is a positive number t. Note that the typical value of t is approximately √(2 log m − log log m) (i.e. t is the (1 − 1/m)-quantile of N(0, 1)). We want to show that η1 is the largest in absolute value among η1, . . . , ηm with probability 1 − O(√(ε log m)), or in other words the probability that any (fixed) ηi is larger than η1 is O(√(ε log m)/m). Let us compute this probability for η2.
Since cov(η1 , ξ1 ) = 1−ε and cov(ξ2 , η2 ) = 1−ε, the random variable η1 is equal to (1−ε)ξ1 +ζ1 ;
and η2 is equal to (1−ε)ξ2 +ζ2 , where ζ1 and ζ2 are normal random variables with variance, roughly
speaking, 2ε. We need to estimate the probability of the event
{η2 ≥ η1 } = {(1 − ε)ξ2 + ζ2 ≥ (1 − ε)ξ1 + ζ1 } = {(1 − ε)ξ2 + ζ2 − ζ1 ≥ (1 − ε)t}
conditional on ξ1 = t and ξ2 ≤ t. For typical t this probability is almost equal to the probability of
the event:
{ξ2 + ζ ≥ t and ξ2 ≤ t} = {t − ζ ≤ ξ2 ≤ t}
(8)
where ζ = ζ2 − ζ1 .
Since the variance of the random variable ζ is O(ε), we can think that ζ ≈ O(√ε). The density of ξ2 on the interval [t − ζ, t] is approximately (1/√(2π)) e^{−t²/2} ≈ O(√(log m)/m) (for typical t). Thus probability (8) is equal to O(√(ε log m)/m). This finishes our informal “proof”.
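This behaviour is easy to check numerically. The following sketch (parameters chosen arbitrarily for illustration; NumPy assumed) samples correlated Gaussian sequences with cov(ξi, ηi) = 1 − ε and estimates how often the indices of the maximizers in absolute value agree; by Theorem 4.1 the disagreement probability should grow roughly like √(ε log m).

```python
import numpy as np

def argmax_agreement(m, eps, trials=20000, rng=np.random.default_rng(0)):
    """Fraction of trials in which argmax_i |xi_i| and argmax_i |eta_i| coincide,
    where eta_i = (1 - eps) * xi_i + sqrt(1 - (1 - eps)**2) * z_i."""
    rho = 1.0 - eps
    xi = rng.standard_normal((trials, m))
    z = rng.standard_normal((trials, m))
    eta = rho * xi + np.sqrt(1.0 - rho ** 2) * z      # cov(xi_i, eta_i) = rho; cross-covariances are 0
    return np.mean(np.argmax(np.abs(xi), axis=1) == np.argmax(np.abs(eta), axis=1))

for eps in (0.001, 0.01, 0.1):
    print(eps, argmax_agreement(m=100, eps=eps))       # agreement drops as eps grows
```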
Now we are ready to prove the main lemma.
Lemma 4.2. The probability that the algorithm finds an assignment of variables satisfying the constraint πuv(xu) = xv is 1 − O(√(εuv log k)).
Proof. If εuv ≥ 1/8 the statement of the lemma follows trivially. So we assume that εuv ≤ 1/8.
Let

    M  = { (i, s) : i ∈ [k] and s ≤ min(sui, svπuv(i)) };
    Mc = { (i, s) : i ∈ [k] and min(sui, svπuv(i)) < s ≤ max(sui, svπuv(i)) }.
The set M contains those pairs (i, s) for which both ξui ,s and ξvπuv (i) ,s are defined (i.e. the matched
projections); the set Mc contains those pairs for which only one of the variables ξui ,s and ξvπuv (i) ,s
is defined (i.e. the unmatched projections). We will need the following lemmas.
Lemma 4.3. 1. The expected size of Mc is at most 4εuv k:

    E[|Mc|] ≤ 4εuv k.

2. The set M always contains at least k/2 elements: |M| ≥ k/2.

Proof. 1. First we find the expected value of |sui − svπuv(i)| for a fixed i. This value is equal to

    Er[ | [2k · |ui|²]r − [2k · |vπuv(i)|²]r | ] = 2k · | |ui|² − |vπuv(i)|² |.

Now by the triangle inequality constraint (5),

    2k · | |ui|² − |vπuv(i)|² | ≤ 2k · |ui − vπuv(i)|².

Summing over all i in [k] we finish the proof.

2. Observe that

    min(sui, svπuv(i)) ≥ 2k · min(|ui|², |vπuv(i)|²) − 1

and

    min(|ui|², |vπuv(i)|²) ≥ |ui|² − | |ui|² − |vπuv(i)|² | ≥ |ui|² − |ui − vπuv(i)|².

Summing over all i we get

    |M| = Σ_{i∈[k]} min(sui, svπuv(i)) ≥ Σ_{i∈[k]} ( 2k · |ui|² − 2k · |ui − vπuv(i)|² − 1 )
        ≥ 2k − 4kεuv − k ≥ k/2.
Lemma 4.4. The following inequality holds:

    E[ (1/|M|) Σ_{(i,s)∈M} εiuv ] ≤ 4εuv.

Proof. Recall that M always contains at least k/2 elements. The expected value of min(sui, svπuv(i)) is equal to 2k · min(|ui|², |vπuv(i)|²) and is less than or equal to 2k · µuv(i). Thus we have

    Er[ (1/|M|) Σ_{(i,s)∈M} εiuv ] = Er[ (1/|M|) Σ_{i=1}^{k} min(sui, svπuv(i)) · εiuv ]
                                   ≤ (2/k) Σ_{i=1}^{k} 2k · µuv(i) · εiuv = 4 Σ_{i=1}^{k} µuv(i) · εiuv ≤ 4εuv.
Proof of Lemma 4.2
Applying Theorem 4.1 to the sequences ξui,s ((i, s) ∈ M) and ξvπuv(i),s ((i, s) ∈ M), we get that for a given r the probability that the largest (in absolute value) random variables in the first sequence and in the second sequence have the same index (i, s) is

    1 − O( √( log |M| · (1/|M|) Σ_{(i,s)∈M} εiuv ) ).

Now by Lemma 4.4, and by the concavity of the function √x, we have

    Er[ 1 − O( √( (log |M| / |M|) Σ_{(i,s)∈M} εiuv ) ) ] ≥ 1 − O( √(εuv log k) ).

The probability that there is a larger ξui,s or ξvπuv(i),s in Mc is at most

    Er[ |Mc| / |M| ] ≤ 4εuv k / (k/2) = 8εuv.

Using the union bound we get that the probability of satisfying the constraint πuv(xu) = xv is equal to

    1 − O(√(εuv log k)) − 8εuv = 1 − O(√(εuv log k)).
Theorem 4.5. There is a polynomial time algorithm that finds an assignment of variables which satisfies a 1 − O(√(ε log k)) fraction of all constraints if the optimal solution satisfies a (1 − ε) fraction of all constraints.

Proof. Summing the probabilities obtained in Lemma 4.2 over all edges (u, v) and using the concavity of the function √x, we get that the expected number of satisfied constraints is (1 − O(√(ε log k)))|E|.
5 d-to-1 Games
In this section we extend our results to d-to-1 games.
Definition 5.1. We say that Π ⊂ [k] × [k] is a d-to-1 predicate, if for every i there are at most d
different values j such that (i, j) ∈ Π, and for every j there is at most one i such that (i, j) ∈ Π.
Definition 5.2 (d to 1 Games). We are given a directed constraint graph G = (V, E), a set
of variables xu (for all vertices u) and d-to-1 predicates Πuv ⊂ [k] × [k] for all edges (u, v). Our
goal is to assign a value from the set [k] to each variable xu , so that the maximum number of the
constraints (xu , xv ) ∈ Πuv is satisfied.
Note that even if all constraints of a d-to-1 game are satisfiable, it is hard to find an assignment of variables satisfying all constraints. We will show how to satisfy a

    Ω( (1/√(log k)) · (1 − ε)⁴ · k^{−(√d − 1 + ε)/(√d + 1 − ε)} / √(log k) )

fraction of all constraints (the multiplicative constant in the Ω notation depends on d). Notice that this value can be obtained by replacing ε in formula (1) with ε′ = 1 − (1 − ε)/√d (and changing (1 − ε)² to (1 − ε)⁴).
Even though we do not require that for a constraint Πuv each i in [k] belongs to some pair
(i, j) ∈ Πuv , let us assume that for each i there exists j s.t. (i, j) ∈ Πuv ; and for each j there exists
i s.t. (i, j) ∈ Πuv . As we see later this assumption is not important.
In order to write a relaxation for d-to-1 games, introduce the following notation:

    wiuv = Σ_{j : (i,j)∈Πuv} vj.

The SDP is as follows:

    minimize   (1/2) Σ_{(u,v)∈E} Σ_{i=1}^{k} |ui − wiuv|²

    subject to   ∀u ∈ V, ∀i, j ∈ [k], i ≠ j:    ⟨ui, uj⟩ = 0                           (9)
                 ∀u ∈ V:                        Σ_{i=1}^{k} |ui|² = 1                  (10)
                 ∀(u, v) ∈ E, ∀i, j ∈ [k]:      ⟨ui, vj⟩ ≥ 0                           (11)
                 ∀(u, v) ∈ E, ∀i ∈ [k]:         0 ≤ ⟨ui, wiuv⟩ ≤ min(|ui|², |wiuv|²)   (12)

An important observation is that |w1uv|² + . . . + |wkuv|² = 1; here we use the fact that for a fixed edge (u, v) each vj is a summand in one and only one wiuv.
We use Algorithm 1 for rounding a vector solution. For the analysis we will need to change some notation:

    w̃iuv = wiuv/|wiuv| if wiuv ≠ 0, and w̃iuv = 0 otherwise;

    εiuv = |ũi − w̃iuv|²/2;

    εiuv′ = 1 − (1 − εiuv)/√d;

    µuv(i) = (|ui|² + |wiuv|²)/2.

The following lemma explains why we get the new dependency on ε.
Lemma 5.3. For every edge (u, v) and state i there exists j′ s.t. (i, j′) ∈ Πuv and |ũi − ṽj′|²/2 ≤ εiuv′.

Proof. Let u′i be the projection of the vector ũi to the linear span of the vectors vj (where (i, j) ∈ Πuv). Let αi be the angle between ũi and wiuv, and let βi be the angle between ũi and u′i. Clearly, |u′i| = cos βi ≥ cos αi = 1 − εiuv. Since all ṽj ((i, j) ∈ Πuv) are orthogonal unit vectors, there exists ṽj′ s.t. ⟨ṽj′, u′i⟩ ≥ |u′i|/√d. Hence, ⟨ṽj′, ũi⟩ = ⟨ṽj′, u′i⟩ ≥ (1 − εiuv)/√d, and therefore |ũi − ṽj′|²/2 = 1 − ⟨ṽj′, ũi⟩ ≤ εiuv′.
For every edge (u, v) and state i, find j′ as in the previous lemma and define a function⁴ πuv(i) = j′. Then replace every constraint (xu, xv) ∈ Πuv with a stronger constraint πuv(xu) = xv. Now we can apply the original analysis of Algorithm 1 to the new problem. In the proof we need to substitute εiuv′ for εiuv, 1 − (1 − εuv)/√d for εuv, and 1 − (1 − ε)/√d for ε. The only missing step is the following lemma.

Lemma 3.6′. For every edge (u, v) the following statements hold.

1. The average value of εiuv w.r.t. the measure µuv is less than or equal to εuv.

2. The average value of εiuv′ w.r.t. the measure µuv is less than or equal to 1 − (1 − εuv)/√d.

3. min(sui, svπuv(i)) ≥ d(1 − εiuv′)⁴ µuv(i) k.
Proof. Let αi be the angle between ui and wiuv, and let αi′ be the angle between ui and vπuv(i).

1. Indeed,

    Σ_{i=1}^{k} µuv(i) · εiuv = Σ_{i=1}^{k} ( |ui|² + |wiuv|² − (|ui|² + |wiuv|²) · cos αi ) / 2
                              ≤ Σ_{i=1}^{k} ( |ui|² + |wiuv|² − 2 · |ui| · |wiuv| · cos αi ) / 2
                              = Σ_{i=1}^{k} |ui − wiuv|² / 2 = εuv.

2. This follows from part 1 and the definition of εiuv′.

3. Due to the triangle inequality constraint, |wiuv| cos αi ≤ |ui|. Thus

    (1 − εiuv)² µuv(i) = cos² αi · (|ui|² + |wiuv|²)/2 ≤ |ui|².

Similarly, |vπuv(i)| cos αi′ ≤ |ui| and

    (1 − εiuv′)² |ui|² ≤ cos² αi′ · |ui|² ≤ |vπuv(i)|².

Combining these two inequalities and noting that (1 − εiuv′) = (1 − εiuv)/√d, we get

    d(1 − εiuv′)⁴ µuv(i) ≤ (1 − εiuv′)² |ui|² ≤ |vπuv(i)|².

The lemma follows.
⁴ The function πuv is not necessarily a permutation.
We now address the issue that for some edges (u, v) and states j there may not necessarily exist i s.t. (i, j) ∈ Πuv. We call such j a state of degree 0. The key observation is that in our algorithms we may enforce additional constraints like xu = i or xu ≠ i by setting ui = 1 or ui = 0, respectively. Thus we can add extra states and enforce that the vertices are not in these states. Then we add pairs (i, j) where i is a new state and j is a state of degree 0 (or vice versa). Alternatively, we can rewrite the objective function by adding an extra term:

    minimize   (1/2) Σ_{(u,v)∈E} ( Σ_{i=1}^{k} |ui − wiuv|² + |w0uv|² ),

where w0uv is the sum of the vj over j of degree 0.
Acknowledgements
We would like to thank Sanjeev Arora, Uri Feige, Johan Håstad, Muli Safra and Luca Trevisan for
valuable discussions and Eden Chlamtac for his comments. The second and third authors thank
Microsoft Research for their hospitality.
References
[1] A. Agarwal, M. Charikar, K. Makarychev, and Y. Makarychev. O(√(log n)) approximation
algorithms for Min UnCut, Min 2CNF Deletion, and directed cut problems. In Proceedings
of the 37th Annual ACM Symposium on Theory of Computing, pp. 573–581, 2005.
[2] G. Andersson, L. Engebretsen, and J. Håstad. A new way to use semidefinite programming
with applications to linear equations mod p. Journal of Algorithms, Vol. 39, pp. 162–204,
2001.
[3] S. Chawla, R. Krauthgamer, R. Kumar, Y. Rabani, and D. Sivakumar. On the hardness of
approximating multicut and sparsest-cut. In Proceedings of the 20th IEEE Conference on
Computational Complexity, pp. 144–153, 2005.
[4] E. Chlamtac, K. Makarychev and Y. Makarychev. How to play any Unique Game. manuscript,
February 2006.
[5] I. Dinur, E. Mossel, and O. Regev. Conditional hardness for approximate coloring. ECCC
Technical Report TR05-039, 2005.
[6] U. Feige and L. Lovász. Two-prover one round proof systems: Their power and their problems.
In Proceedings of the 24th ACM Symposium on Theory of Computing, pp. 733–741, 1992.
[7] U. Feige and D. Reichman. On systems of linear equations with two variables per equation. In Proceedings of the 7th International Workshop on Approximation Algorithms for
Combinatorial Optimization, vol. 3122 of Lecture Notes in Computer Science, pp. 117–127,
2004.
[8] M. Goemans and D. Williamson. Improved approximation algorithms for maximum cut and
satisfiability problems using semidefinite programming. Journal of the ACM, vol. 42, no. 6,
pp. 1115–1145, Nov. 1995.
[9] A. Gupta and K. Talwar. Approximating Unique Games. In Proceedings of the 17th ACM-SIAM Symposium on Discrete Algorithms, pp. 99–106, 2006.
[10] S. Khot. On the power of unique 2-prover 1-round games. In Proceedings of the 34th ACM
Symposium on Theory of Computing, pp. 767–775, 2002.
[11] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell. Optimal inapproximability results for
MAX-CUT and other two-variable CSPs? In Proceedings of the 45th IEEE Symposium on
Foundations of Computer Science, pp. 146–154, 2004.
[12] S. Khot, G. Kindler, E. Mossel, and R. O’Donnell. Optimal inapproximability results for
MAX-CUT and other 2-variable CSPs? ECCC Report TR05-101, 2005.
[13] S. Khot and O. Regev. Vertex cover might be hard to approximate to within 2 − ε. In
Proceedings of the 18th IEEE Conference on Computational Complexity, pp. 379–386, 2003.
[14] S. Khot and N. Vishnoi. The unique games conjecture, integrality gap for cut problems and the
embeddability of negative type metrics into ℓ1. In Proceedings of the 46th IEEE Symposium
on Foundations of Computer Science, pp. 53–62, 2005.
[15] E. Mossel, R. O’Donnell, and K. Oleszkiewicz. Noise stability of functions with low influences:
invariance and optimality. In Proceedings of the 46th IEEE Symposium on Foundations of
Computer Science, pp. 21–40, 2005.
[16] Z. Šidák. Rectangular Confidence Regions for the Means of Multivariate Normal Distributions.
Journal of the American Statistical Association, vol. 62, no. 318, pp. 626–633, Jun. 1967.
[17] L. Trevisan. Approximation Algorithms for Unique Games. In Proceedings of the 46th IEEE
Symposium on Foundations of Computer Science, pp. 197–205, 2005.
A Properties of Normal Distribution
For completeness we present some standard results used in the paper.
Denote the probability that a standard normal random variable is bigger than t ∈ R by Φ̃(t),
in other words
Φ̃(t) ≡ 1 − Φ0,1 (t) = Φ0,1 (−t),
where Φ0,1 is the normal distribution function.
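In code, Φ̃ and its inverse correspond to the survival function of the standard normal and its inverse (assuming scipy is available; the numbers below are arbitrary examples):

```python
from scipy.stats import norm

t = 2.0
print(norm.sf(t))            # \tilde{Phi}(t) = Pr(xi >= t) for xi ~ N(0, 1)
print(norm.isf(1.0 / 16))    # \tilde{Phi}^{-1}(1/16)
```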
Lemma A.1.

1. For every t > 0,

    t/(√(2π)(t² + 1)) · e^{−t²/2} < Φ̃(t) < 1/(√(2π) t) · e^{−t²/2}.

2. There exist positive constants c1, C1, c2, C2 such that for all 0 < p < 1/3, t ≥ 0 and ρ ≥ 1 the following inequalities hold:

    c1/(√(2π)(t + 1)) · e^{−t²/2} ≤ Φ̃(t) ≤ C1/(√(2π)(t + 1)) · e^{−t²/2};

    c2 √(log(1/p)) ≤ Φ̃^{−1}(p) ≤ C2 √(log(1/p)).

3. There exists a positive constant C3 s.t. for every 0 < δ ≤ √2 and t ≥ 1/δ the following inequality holds:

    Φ̃(δt + 1/(δt)) ≥ C3 (t · Φ̃(t))^{δ²} · t^{−1}.

Proof.
1. First notice that

    Φ̃(t) = (1/√(2π)) ∫_t^∞ e^{−x²/2} dx = (1/√(2π)) [ −e^{−x²/2}/x ]_t^∞ − (1/√(2π)) ∫_t^∞ (e^{−x²/2}/x²) dx
          = (1/(√(2π) t)) e^{−t²/2} − (1/√(2π)) ∫_t^∞ (e^{−x²/2}/x²) dx.

Thus

    Φ̃(t) < (1/(√(2π) t)) e^{−t²/2}.

On the other hand,

    (1/√(2π)) ∫_t^∞ (e^{−x²/2}/x²) dx < (1/(√(2π) t²)) ∫_t^∞ e^{−x²/2} dx = Φ̃(t)/t².

Hence

    Φ̃(t) > (1/(√(2π) t)) e^{−t²/2} − Φ̃(t)/t²,

and

    Φ̃(t) > t/(√(2π)(t² + 1)) · e^{−t²/2}.
2. This trivially follows from (1).
3. Using (2) we get

    Φ̃(δt + 1/(δt)) ≥ C · (1 + δt + 1/(δt))^{−1} · e^{−(δt + 1/(δt))²/2} ≥ C′ · (δt + 1)^{−1} · e^{−δ²t²/2}
                   ≥ C″ · ( e^{−t²/2}/(t + 1) )^{δ²} · t^{δ²} · t^{−1} ≥ C‴ · (t · Φ̃(t))^{δ²} · t^{−1}.
We will use the following result of Z. Šidák [16]:
Theorem A.2 (Šidák). Let ξ1 , . . . , ξk be normal random variables with mean zero, then for any
positive t1 , . . . , tk ,
Pr (|ξ1 | ≤ t1 , |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) ≥ Pr (|ξ1 | ≤ t1 ) Pr (|ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) .
Note that these random variables do not have to be independent.
Corollary A.3. Let ξ1 , . . . , ξk be normal random variables with mean zero, then for any positive
t1 , . . . , t k ,
Pr (ξ1 ≥ t1 | |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) ≤ Pr (ξ1 ≥ t1 ) .
Proof. By Theorem A.2,
Pr (|ξ1 | ≤ t1 | |ξ2 | ≤ t2 , . . . , |ξk | ≤ tk ) ≥ Pr (|ξ1 | ≤ t1 ) .
Thus

    Pr(ξ1 ≥ t1 | |ξ2| ≤ t2, . . . , |ξk| ≤ tk) = 1/2 − (1/2) Pr(|ξ1| ≤ t1 | |ξ2| ≤ t2, . . . , |ξk| ≤ tk)
                                              ≤ 1/2 − (1/2) Pr(|ξ1| ≤ t1) = Pr(ξ1 ≥ t1).

B Analysis of Algorithm 1
In this section we will prove some technical lemmas we used in the analysis of the first algorithm.
Lemma B.1. Let ξ and η be correlated standard normal random variables, 0 < ε < 1, t ≥ 1. If cov(ξ, η) ≥ 1 − ε, then

    Pr(ξ ≥ t and η ≥ t) ≥ C · min(1, (√ε t)^{−1}) · t^{−1} · (t · Φ̃(t))^{2/(2−ε)}          (13)

for some positive constant C.
Proof. Let us represent ξ and η as follows:

    ξ = σX + √(1 − σ²) · Y;    η = σX − √(1 − σ²) · Y,

where

    σ² = Var[(ξ + η)/2];    X = (ξ + η)/(2σ);    Y = (ξ − η)/(2√(1 − σ²)).

Note that X and Y are independent standard normal random variables, and

    σ² = Var[(ξ + η)/2] = (1/4)[2 + 2 cov(ξ, η)] ≥ 1 − ε/2.          (14)
Notice that 1/2 ≤ σ² ≤ 1. We now estimate the probability (13) as follows:

    Pr(ξ ≥ t and η ≥ t) = Pr(σX ≥ t + √(1 − σ²) · |Y|)
                        ≥ Pr(X ≥ t/σ + σ/t) · Pr(|Y| ≤ σ²/(√(1 − σ²) · t)).

By Lemma A.1 (3) we get

    Pr(ξ ≥ t and η ≥ t) ≥ C · t^{−1} · (t Φ̃(t))^{1/σ²} · min(1, σ²/(√(1 − σ²) · t))
                        ≥ C′ · min((√ε · t)^{−1}, 1) · t^{−1} · (t · Φ̃(t))^{2/(2−ε)}.
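The bound of Lemma B.1 can be sanity-checked by simulation. The sketch below (NumPy/scipy assumed; the parameters and sample size are arbitrary) estimates the joint tail probability for two standard normals with covariance 1 − ε and prints it next to the right-hand side of (13) without the constant C.

```python
import numpy as np
from scipy.stats import norm

def joint_tail(eps, t, trials=1_000_000, rng=np.random.default_rng(1)):
    rho = 1.0 - eps
    x = rng.standard_normal(trials)
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(trials)
    return np.mean((x >= t) & (y >= t))

eps, t = 0.2, 2.0
lower = min(1.0, 1.0 / (np.sqrt(eps) * t)) / t * (t * norm.sf(t)) ** (2.0 / (2.0 - eps))
print(joint_tail(eps, t), lower)   # the empirical probability should exceed a constant times `lower`
```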
Corollary B.2. Let ξ and η be standard normal random variables with covariance greater than or equal to 1 − ε; let Φ̃(t) = 1/k. Then

    Pr(ξ ≥ t and η ≥ t) ≥ Ω( min(1, 1/√(ε log k)) · (1/√(log k)) · (k/√(log k))^{−2/(2−ε)} ).
Lemma B.3. Let ξ, η, ε, k and t be as in Corollary B.2, and let ξ1, . . . , ξm be i.i.d. standard normal random variables with m ≤ 2k. Then

    E[ Σ_{i=1}^{m} I{ξi ≥ t} | ξ ≥ t and η ≥ t ] = O(1),

where I{ξi ≥ t} is the indicator of the event {ξi ≥ t}.
Proof. Let X and Y be as in the proof of Lemma B.1. Put αi = cov(X, ξi) and express each ξi as ξi = αi X + √(1 − αi²) · Zi. By Bessel's inequality α1² + · · · + αm² ≤ 1 (since the random variables ξi are orthogonal). We now estimate

    Pr(ξi ≥ t | σX ≥ t + √(1 − σ²) · |Y|)
      = Pr(ξi ≥ t | σX ≥ t + √(1 − σ²) · |Y| and X ≤ 4t) · Pr(X ≤ 4t | σX ≥ t + √(1 − σ²) · |Y|)
        + Pr(ξi ≥ t | σX ≥ t + √(1 − σ²) · |Y| and X > 4t) · Pr(X > 4t | σX ≥ t + √(1 − σ²) · |Y|).

Notice that

    Pr(ξi ≥ t | σX ≥ t + √(1 − σ²) · |Y| and X < 4t)
      = Pr(αi X + √(1 − αi²) · Zi ≥ t | σX ≥ t + √(1 − σ²) · |Y| and X < 4t)
      = ∫_{t/σ}^{4t} Pr(αi x + √(1 − αi²) · Zi ≥ t | σx ≥ t + √(1 − σ²) · |Y|) dF(x)
      ≤ max_{x∈[t/σ,4t]} Pr( √(1 − αi²) · Zi ≥ t − αi x | √(1 − σ²) · |Y| ≤ σx − t )
      ≤ max_{x∈[t/σ,4t]} Pr( √(1 − αi²) · Zi ≥ t − αi x )                 [by Corollary A.3]
      ≤ Pr(Zi ≥ (1 − 4αi)t).

It suffices to prove that

    Σ_{i=1}^{m} Pr(Zi ≥ (1 − 4αi)t) = Σ_{i=1}^{m} Φ̃((1 − 4αi)t) = O(1).

Fix a sufficiently large constant c. The number of αi that are greater than 1/c is O(1). The number of αi such that 1/log k ≤ αi ≤ 1/c is O(log² k), and for them Φ̃((1 − 4αi)t) = O(k^{−1/2}) (since c is a sufficiently large constant). Finally, if αi < 1/log k, then Φ̃((1 − 4αi)t) = O(k^{−1}). This finishes the proof.
Lemma B.4. The function (1 − x)² fk(x) is convex on the interval [0, 1].

Proof. Let m = k/√(log k), so that fk(x) = m^{−2/(2−x)}. Compute the first and the second derivatives of fk:

    fk′(x) = −2 log m · m^{−2/(2−x)} / (2 − x)²,

    fk″(x) = 4 log m · ( m^{−2/(2−x)} / (2 − x)³ ) · ( log m/(2 − x) − 1 ).

Now ((1 − x)² · fk(x))″ = (1 − x)² · fk″(x) − 4(1 − x) fk′(x) + 2 fk(x). Observe that fk(x) is always positive, and fk′(x) is always negative. Therefore, if fk″(x) is positive, we are done: ((1 − x)² · fk(x))″ ≥ 0. Otherwise, we have

    ((1 − x)² · fk(x))″ = (1 − x)² · fk″(x) − 4(1 − x) fk′(x) + 2 fk(x) ≥ fk″(x) + 2 fk(x)
                        ≥ 4 log m · m^{−2/(2−x)} · (log m/2 − 1) + 2 m^{−2/(2−x)} = 2 m^{−2/(2−x)} (log m − 1)² ≥ 0.

C Analysis of Algorithm 2
In this section, we present the formal proof of Theorem 4.1. We will follow an informal outline of
the proof sketched in the main text. We start with estimating probability (8).
Lemma C.1. Let ξ and ζ be two independent normal random variables with variance 1 and σ² respectively (0 < σ < 1). Then for every positive t

    Pr(ξ ≤ t and ξ + ζ ≥ t) = O( σ e^{(σt+1)²/2} · e^{−t²/2} ).

Remark C.1. In the “typical” case, e^{(σt+1)²/2} is a constant.
Proof. We have

    Pr(ξ ≤ t and ξ + ζ ≥ t) = ∫_0^∞ Pr(ξ ≤ t and ξ + x ≥ t) dFζ(x)
        = (1/(√(2π) σ)) ∫_0^∞ Pr(ξ ≤ t and ξ + x ≥ t) e^{−x²/(2σ²)} dx
        = (1/√(2π)) ∫_0^∞ Pr(ξ ≤ t and ξ + σy ≥ t) e^{−y²/2} dy
        = (1/√(2π)) ∫_0^{t/σ} Pr(t − σy ≤ ξ ≤ t) e^{−y²/2} dy + (1/√(2π)) ∫_{t/σ}^∞ Pr(t − σy ≤ ξ ≤ t) e^{−y²/2} dy.

Let us bound the first integral. Since the density of the random variable ξ on the interval (t − σy, t) is at most (1/√(2π)) e^{−(t−σy)²/2}, and y ≤ e^y, we have

    Pr(t − σy ≤ ξ ≤ t) ≤ σy · (1/√(2π)) e^{−(t−σy)²/2} ≤ (σ/√(2π)) · e^{−t²/2} · e^{(σt+1)y}.

Therefore,

    (1/√(2π)) ∫_0^{t/σ} Pr(t − σy ≤ ξ ≤ t) e^{−y²/2} dy ≤ (σ e^{−t²/2}/(2π)) ∫_0^{t/σ} e^{(σt+1)y} · e^{−y²/2} dy
        ≤ (σ e^{−t²/2}/(2π)) · e^{(σt+1)²/2} ∫_{−∞}^∞ e^{−(y−(σt+1))²/2} dy
        = O( σ e^{(σt+1)²/2} · e^{−t²/2} ).

We now upper bound the second integral. If t ≥ 1, then

    (1/√(2π)) ∫_{t/σ}^∞ Pr(t − σy ≤ ξ ≤ t) e^{−y²/2} dy ≤ (1/√(2π)) ∫_{t/σ}^∞ e^{−y²/2} dy = Φ̃(t/σ)
        = O( e^{−t²/(2σ²)} / (t/σ + 1) ) = O( σ e^{−t²/2} / (t + σ) ) = O( σ e^{−t²/2} ).

If t ≤ 1, then

    (1/√(2π)) ∫_{t/σ}^∞ Pr(t − σy ≤ ξ ≤ t) e^{−y²/2} dy ≤ (1/√(2π)) ∫_0^∞ σy · e^{−y²/2} dy = O(σ) = O( σ e^{−t²/2} ).

The desired inequality follows from the upper bounds on the first and second integrals.
We need a slight generalization of the lemma.
Corollary C.2. Let ξ and ζ be two independent normal random variables with variance 1 and σ² respectively (0 < σ < 1). Then for every t ≥ 0 and 0 ≤ ε̄ < 1

    Pr(ξ + ζ ≥ (1 − ε̄)t | |ξ| ≤ t) = O( (σ + ε̄t) · c(ε̄, σ, t) · e^{−t²/2} / (1 − 2Φ̃(t)) ),

where

    c(ε̄, σ, t) = e^{(σt+1)²/2 + ε̄t²}.
Remark C.2. As in the previous lemma, in the “typical” case c(ε̄, σ, t) is a constant.
Proof. First note that

    Pr(ξ + ζ ≥ (1 − ε̄)t | |ξ| ≤ t) ≤ Pr(ξ + ζ ≥ (1 − ε̄)t and ξ ≤ t) / Pr(|ξ| ≤ t)
                                    = Pr(ξ + ζ ≥ (1 − ε̄)t and ξ ≤ t) / (1 − 2Φ̃(t)).

Now,

    Pr(ξ + ζ ≥ (1 − ε̄)t and ξ ≤ t) ≤ Pr(ξ + ζ ≥ t and ξ ≤ t) + Pr((1 − ε̄)t ≤ ξ + ζ ≤ t).

By Lemma C.1, the first probability is bounded as follows:

    Pr(ξ + ζ ≥ t and ξ ≤ t) ≤ O( σ e^{(σt+1)²/2} · e^{−t²/2} ).

Since Var[ξ + ζ] ≤ 1 + σ², the second probability is at most

    Pr((1 − ε̄)t ≤ ξ + ζ ≤ t) ≤ ε̄t · e^{−((1−ε̄)t)²/(2(1+σ²))} ≤ ε̄t · e^{(2ε̄+σ²)t²/2} · e^{−t²/2};

here we used the following inequality:

    ((1 − ε̄)t)²/(2(1 + σ²)) = (1 − ε̄)²(1 − σ²)t²/(2(1 − σ⁴)) ≥ (1 − 2ε̄ − σ²)t²/2 = t²/2 − (2ε̄ + σ²)t²/2.

The corollary follows.
In the following lemma we formally define the random variables ζ1 and ζ2 .
Lemma C.3. Let ξ1 , ξ2 , η1 and η2 be standard normal random variables such that ξ1 and ξ2 are
independent; η1 and η2 are independent; and
• cov(ξ1 , η1 ) ≥ 1 − ε̄ ≥ 0 and cov(ξ2 , η2 ) ≥ 1 − ε̄ ≥ 0 (for some positive ε̄);
• cov(ξ1 , η2 ) ≥ 0 and cov(ξ2 , η1 ) ≥ 0.
Then there exist normal random variables ζ1 and ζ2 independent of ξ1 and ξ2 with variance at most
2ε̄ such that
|η1 | − |η2 | ≥ (1 − 4ε̄)|ξ1 | − (1 + 3ε̄)|ξ2 | − |ζ1 | − |ζ2 |.
Proof. Express η1 as a linear combination of ξ1, ξ2, and a normal r.v. ζ1 independent of ξ1 and ξ2:

    η1 = α1 ξ1 + β1 ξ2 + ζ1;

similarly,

    η2 = α2 ξ1 + β2 ξ2 + ζ2.

Note that α1 = cov(η1, ξ1) ≥ 1 − ε̄ and β1 = cov(η1, ξ2) ≥ 0. Thus

    Var[ζ1] ≤ Var[η1] − α1² ≤ 1 − (1 − ε̄)² ≤ 2ε̄.

Similarly, α2 ≥ 0, β2 ≥ 1 − ε̄, and Var[ζ2] ≤ 2ε̄. Since η1 and η2 are independent, we have

    α1 α2 + β1 β2 + cov(ζ1, ζ2) = cov(η1, η2) = 0.

Therefore (note that cov(ζ1, ζ2) ≤ 0, α1 α2 ≥ 0, and β1 β2 ≥ 0),

    α2 = (−β1 β2 − cov(ζ1, ζ2)) / α1 ≤ √(Var[ζ1] Var[ζ2]) / (1 − ε̄) ≤ 2ε̄ / (1 − ε̄).

Taking into account that α2 ≤ 1, we get α2 ≤ min(1, 2ε̄/(1 − ε̄)) ≤ 3ε̄. Similarly, β1 ≤ 3ε̄. Finally, we have

    |η1| − |η2| ≥ (α1 − α2)|ξ1| − (β1 + β2)|ξ2| − |ζ1| − |ζ2|
               ≥ (1 − 4ε̄)|ξ1| − (1 + 3ε̄)|ξ2| − |ζ1| − |ζ2|.
In what follows we assume that ξ1 is the largest r.v. in absolute value among ξ1 , . . . , ξm and its
absolute value is t. For convenience we define three events:
    At = { |ξi| ≤ t for all 3 ≤ i ≤ m };
    Et = At ∩ { |ξ1| = t and |ξ2| ≤ t };
    E  = { |ξ1| ≥ |ξi| for all i } = ∪_{t≥0} Et.
Now we are ready to combine Corollary C.2 and Lemma C.3.
Lemma C.4. Let ξ1 , · · · , ξm and η1 , · · · , ηm be two sequences of standard normal random variables.
Suppose that
1. the random variables in each of the sequences are independent,
2. the covariance of every ξi and ηj is nonnegative,
3. cov(ξ1 , η1 ) ≥ 1 − ε̄ and cov(ξ2 , η2 ) ≥ 1 − ε̄, where ε̄ ≤ 1/7.
Then
Pr (|η1 | ≤ |η2 | | Et ) = O
!
√
√
2
( ε̄ + ε̄t) · e−t /2 · c(7ε̄, 8ε̄, t)
,
1 − 2Φ̃(t)
where c(ε̄, σ, t) is from Corollary C.2.
24
(15)
Proof. By Lemma C.3, we have
|η1 | − |η2 | ≥ (1 − 4ε̄)|ξ1 | − (1 + 3ε̄)|ξ2 | − |ζ1 | − |ζ2 |.
Therefore,
Pr (|η1 | ≤ |η2 | | Et ) ≤ Pr ((1 + 3ε̄)|ξ2 | + |ζ1 | + |ζ2 | ≥ (1 − 4ε̄)|ξ1 | | Et )
≤ Pr (|ξ2 | + |ζ1 | + |ζ2 | ≥ (1 − 7ε̄)t | Et )
X
≤
Pr (sξ2 + s1 ζ1 + s2 ζ2 ≥ (1 − 7ε̄)t | Et )
s,s1 ,s2 ∈{±1}
Let us fix signs s, s1 , s2 ∈ {±1} and denote ξ = sξ2 , ζ = s1 ζ1 + s2 ζ1 , then we need to show that
!
√
2
√
e−t /2 · c(7ε̄, 8ε̄, t)
.
Pr (ξ + ζ ≥ (1 − 7ε̄)t | Et ) = O ( ε̄ + ε̄t) ·
1 − 2Φ̃(t)
Observe that the random variables ξ, ζ and the event At are independent of ξ1 , thus
Pr (ξ + ζ ≥ (1 − 7ε̄)t | Et )
= Pr (ξ + ζ ≥ (1 − 7ε̄) t | At and |ξ1 | = t and |ξ| ≤ t)
= Pr (ξ + ζ ≥ (1 − 7ε̄) t | At and |ξ| ≤ t)
= Pr (ζ ≥ (1 − 7ε̄) t − ξ | At and |ξ| ≤ t) .
Since ξ and At are independent, for every fixed value of ξ we can apply Corollary A.3. Thus
Pr (ζ ≥ (1 − 7ε̄)t − ξ | At and |ξ| ≤ t) ≤ Pr (ζ ≥ (1 − 7ε̄)t − ξ | |ξ| ≤ t)
= Pr (ξ + ζ ≥ (1 − 7ε̄)t | |ξ| ≤ t) .
Finally, by Corollary C.2 (where σ 2 = Var [ζ] ≤ 8ε̄),
Pr (ξ + ζ ≥ (1 − 7ε̄)t | |ξ| ≤ t) = O
!
√
√
2
( ε̄ + ε̄t) · e−t /2 · c(7ε̄, 8ε̄, t)
.
1 − 2Φ̃(t)
Corollary C.5. Under the assumptions of Lemma C.4,

1. if ε̄t² ≤ 1, then

    Pr(|η1| ≤ |η2| | Et) = O( √ε̄ · (t + 1) · Φ̃(t) / (1 − 2Φ̃(t)) );

2. if t > 1, then

    Pr(|η1| ≤ |η2| | Et) = O(√ε̄).

Proof. 1. If ε̄t² ≤ 1, then ε̄t ≤ √ε̄ and

    c(7ε̄, √(8ε̄), t) = e^{(√(8ε̄) t + 1)²/2 + 7ε̄t²} = O(1).

Notice that

    (√ε̄ + ε̄t) · e^{−t²/2} / (1 − 2Φ̃(t)) = O( (√ε̄ + ε̄t) · (t + 1) · Φ̃(t) / (1 − 2Φ̃(t)) ),

since Φ̃(t) = Θ( e^{−t²/2}/(t + 1) ).

2. If ε̄ > 1/32 the statement holds trivially. So assume that ε̄ ≤ 1/32. Then

    (√(8ε̄) t + 1)²/2 + 7ε̄t² ≤ 3t²/8 + O(t).

Thus t · e^{−t²/2} · c(7ε̄, √(8ε̄), t) is upper bounded by some absolute constant. Since t ≥ 1, the denominator 1 − 2Φ̃(t) of the expression (15) is bounded away from 0.
We now give a bound on the “typical” absolute value of the largest random variable.
Lemma C.6. The following inequality holds:

    Pr( |ξ1| ≥ 2√(log m) | E ) ≤ 1/m.

Proof. Note that the probability of the event E is 1/m, since all random variables ξ1, . . . , ξm are equally likely to be the largest in absolute value. Thus we have

    Pr( |ξ1| ≥ 2√(log m) | E ) ≤ Pr( |ξ1| ≥ 2√(log m) ) / Pr(E) ≤ (1/m²) / (1/m) = 1/m.
Lemma C.7. Let ξ1, · · · , ξm and η1, · · · , ηm be two sequences of standard normal random variables as in Theorem 4.1. Assume that cov(ξ1, η1) ≥ 1 − ε̄ and cov(ξ2, η2) ≥ 1 − ε̄, where ε̄ < min(1/(4 log m), 1/7). Then

    Pr(|η1| ≤ |η2| | E) = O( √(ε̄ log m) / m ).
Proof. Write the desired probability as follows:

    Pr(|η1| ≤ |η2| | E) = Pr(|η1| ≤ |η2| and |ξ1| ≤ 2√(log m) | E) + Pr(|η1| ≤ |η2| and |ξ1| ≥ 2√(log m) | E).

First consider the case |ξ1| ≤ 2√(log m). Denote by dF|ξ1| the density of |ξ1| conditional on E. Then

    Pr(|η1| ≤ |η2| and |ξ1| ≤ 2√(log m) | E) = ∫_0^{2√(log m)} Pr(|η1| ≤ |η2| | E and |ξ1| = t) dF|ξ1|(t)
                                             = ∫_0^{2√(log m)} Pr(|η1| ≤ |η2| | Et) dF|ξ1|(t).

Now by Corollary C.5,

    ∫_0^{2√(log m)} Pr(|η1| ≤ |η2| | Et) dF|ξ1|(t) = ∫_0^{2√(log m)} O( 2√(ε̄ log m) · Φ̃(t) / (1 − 2Φ̃(t)) ) dF|ξ1|(t).

Let us change the variable to x = 1 − 2Φ̃(t). What is the probability density function of 1 − 2Φ̃(|ξ1|) given E? For each i the r.v. 1 − 2Φ̃(|ξi|) is uniformly distributed on the interval [0, 1]. Now |ξi| > |ξj| if and only if 1 − 2Φ̃(|ξi|) > 1 − 2Φ̃(|ξj|), therefore 1 − 2Φ̃(|ξ1|) is distributed as the maximum of m independent random variables on [0, 1] given E. Its density function is (x^m)′ = m x^{m−1} (for x ∈ [0, 1]). We have

    ∫_0^{2√(log m)} ( 2√(ε̄ log m) · Φ̃(t) / (1 − 2Φ̃(t)) ) dF|ξ1|(t) ≤ ∫_0^∞ ( 2√(ε̄ log m) · Φ̃(t) / (1 − 2Φ̃(t)) ) dF|ξ1|(t)
        = ∫_0^1 ( 2√(ε̄ log m) · ((1 − x)/2) / x ) · m x^{m−1} dx = m √(ε̄ log m) ∫_0^1 (1 − x) x^{m−2} dx
        = m √(ε̄ log m) ( 1/(m − 1) − 1/m ) = √(ε̄ log m) / (m − 1).

Now consider the case |ξ1| ≥ 2√(log m). By Corollary C.5,

    Pr(|η1| ≤ |η2| | E and |ξ1| ≥ 2√(log m)) = O(√ε̄).

By Lemma C.6,

    Pr(|ξ1| ≥ 2√(log m) | E) ≤ 1/m.

This concludes the proof.
Now we will prove a lemma which differs from Theorem 4.1 only by one additional condition (condition 4).

Lemma C.8. Let ξ1, · · · , ξm and η1, · · · , ηm be two sequences of standard normal random variables. Let εi = 1 − cov(ξi, ηi). Suppose that

1. the random variables in each of the sequences are independent,

2. the covariance of every ξi and ηj is nonnegative,

3. (1/m) Σ_{i=1}^{m} εi = ε,

4. εi ≤ min(1/(4 log m), 1/7) for every i.

Then the probability that the largest r.v. in absolute value in the first sequence has the same index as the largest r.v. in absolute value in the second sequence is 1 − O(√(ε log m)).
Proof. By Lemma C.7,

    Pr( |η1| ≤ |η2| | |ξ1| ≥ max_{j≥2} |ξj| ) = O( (√(log m)/m) · √(max(ε1, ε2)) ).

Applying the union bound, we get

    Pr( |η1| ≤ max_{j≥2} |ηj| | |ξ1| ≥ max_{j≥2} |ξj| ) = O( (√(log m)/m) Σ_{i=2}^{m} √(max(ε1, εi)) )
        ≤ O( (√(log m)/m) · ( m √(ε1) + Σ_{i=1}^{m} √(εi) ) ) = O( √(log m) (√(ε1) + √ε) ),

where the last step is by Jensen's inequality. Since the probability that |ξi| = max_j |ξj| equals 1/m for each i, the probability that the largest r.v. in absolute value among the ξi and the largest r.v. in absolute value among the ηj have different indices is at most

    O( (1/m) Σ_{i=1}^{m} √(log m) · (√(εi) + √ε) ) ≤ O( √(log m) · (√ε + √ε) ) = O( √(ε log m) ).
Proof of Theorem 4.1. Denote εi = 1 − cov(ξi, ηi). Then ε1 + · · · + εm ≤ mε. We may assume that ε < min(1/(4 log m), 1/7); otherwise, the theorem follows trivially.

Consider the set I = {i : εi < min(1/(4 log m), 1/7)}. Since ε < min(1/(4 log m), 1/7), the set I is not empty. Applying Lemma C.8 to the random variables {ξi}i∈I and {ηi}i∈I, we conclude that the largest r.v. in absolute value among {ξi}i∈I has the same index as the largest r.v. in absolute value among {ηi}i∈I with probability

    1 − O( √( log |I| · (1/|I|) Σ_{i∈I} εi ) ) = 1 − O( √(ε log m) ).

Since each ξi is the largest r.v. among ξ1, . . . , ξm in absolute value with probability 1/m, the probability that the largest r.v. among ξ1, . . . , ξm does not belong to {ξi}i∈I is (m − |I|)/m. Similarly, the probability that the largest r.v. among η1, . . . , ηm does not belong to {ηi}i∈I is (m − |I|)/m. Therefore, by the union bound, the probability that the largest r.v. in absolute value among the ξi and the largest r.v. in absolute value among the ηj have the same index is at least

    1 − O( √(ε log m) ) − 2(m − |I|)/m.          (16)

We now upper bound the last term. By the Markov inequality,

    2(m − |I|)/m ≤ 2ε / min(1/(4 log m), 1/7) ≤ 2(4 log m + 7)ε = O(ε log m) = O( √(ε log m) ).

(Here we use that ε log m < 1.) Plugging this bound into (16) we get that the desired probability is 1 − O(√(ε log m)). This finishes the proof.
D An Alternate Approach
We would like to present an alternative version of Algorithm 1. It demonstrates another approach
for taking into account the lengths of vectors ui : we can choose a separate threshold tui for each
ui . This algorithm achieves the same approximation ratio. The analysis uses similar ideas and we
omit it here.
Input: A solution of the SDP, with the objective value ε · |E|.
Output: An assignment of variables xu .
1. Define ũi = ui /|ui | if ui 6= 0, 0 otherwise.
2. Pick a random Gaussian vector g with independent components distributed as N (0, 1).
3. For each vertex u:
(a) For each i project the vector g to ũi :
    ξui = ⟨g, ũi⟩.
(b) Fix a threshold tui s.t. Pr (ξui ≥ tui ) = |ui |2 (i.e. tui is the (1 − |ui |2 )-quantile of the
standard normal distribution). Note that tui depends on ui .
(c) Pick ξui ’s that are larger than the threshold tui :
Su = {i : ξui ≥ tui } .
(d) Pick at random a state i from Su and assign xu = i.
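For completeness, a sketch of this alternate rounding in the same illustrative format as before (NumPy/scipy assumed, with vecs[u] holding rows u_1, . . . , u_k); the per-vector threshold tui is the (1 − |ui|²)-quantile of N(0, 1).

```python
import numpy as np
from scipy.stats import norm

def round_alt(vecs, k, rng=np.random.default_rng()):
    """Alternate rounding (sketch): per-vector thresholds t_{u_i} with Pr(xi >= t_{u_i}) = |u_i|^2."""
    d = next(iter(vecs.values())).shape[1]
    g = rng.standard_normal(d)                       # a single Gaussian vector, shared by all vertices
    assignment = {}
    for u, U in vecs.items():
        S_u = []
        for i in range(k):
            norm_ui = np.linalg.norm(U[i])
            if norm_ui == 0:
                continue
            t_ui = norm.ppf(1.0 - norm_ui ** 2)      # (1 - |u_i|^2)-quantile of N(0, 1)
            if g @ (U[i] / norm_ui) >= t_ui:
                S_u.append(i)
        if S_u:                                      # else leave u unassigned
            assignment[u] = S_u[rng.integers(len(S_u))]
    return assignment
```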