An Extension of the Lovász Local Lemma, and its Applications to Integer Programming*

Aravind Srinivasan†

* Work done in parts at the National University of Singapore, at DIMACS (supported in part by NSF-STC91-19999 and by support from the N.J. Commission on Science and Technology), at the Institute for Advanced Study, Princeton, NJ (supported in part by grant 93-6-6 of the Alfred P. Sloan Foundation), and while visiting the Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany.
† Dept. of Information Systems & Computer Science, National University of Singapore, Singapore 119260, Republic of Singapore. E-mail:

Abstract
The Lovász Local Lemma (LLL) is a powerful tool in proving the existence of rare events. We present an extension of this lemma, which works well when the event to be shown to exist is a conjunction of individual events, each of which asserts that a random variable does not deviate much from its mean. We consider three classes of NP-hard integer programs: minimax, packing, and covering integer programs. A key technique, randomized rounding of linear relaxations, was developed by Raghavan & Thompson to derive good approximation algorithms for such problems. We use our extended LLL to prove that randomized rounding produces, with non-zero probability, much better feasible solutions than known before, if the constraint matrices of these integer programs are sparse (e.g., VLSI routing using short paths, problems on hypergraphs with small dimension/degree). We also generalize the method of pessimistic estimators due to Raghavan, to constructivize our packing and covering results.

1 Introduction.
The powerful Lovász Local Lemma (LLL) [13] is often used to show the existence of rare combinatorial structures by showing that a random sample from a suitable sample space produces them with positive probability; see Chapter 5 of Alon, Spencer & Erdős [4] for many such applications. Let $e$ denote the base of natural logarithms as usual. The LLL (symmetric case) shows that all of a set of "bad" events $E_i$ can be avoided under some conditions:

Lemma 1.1. ([13]) Let $E_1, E_2, \ldots, E_m$ be any events with $\Pr(E_i) \le p$ for all $i$. If each $E_i$ is mutually independent of all but at most $d$ of the other events $E_j$, and if $ep(d+1) \le 1$, then $\Pr(\bigwedge_{i=1}^{m} \overline{E_i}) > 0$.

Though the LLL is powerful, one problem is that the "dependency" $d$ is high in some cases, precluding the use of the LLL if $p$ is not small enough. We present a partial solution to this via an extension of the LLL (Theorem 3.1), which shows how to essentially reduce $d$ for a class of events $E_i$; this works well when each $E_i$ denotes a random variable deviating from its mean. We apply our LLL to improve the known integrality gap for three classes of NP-hard integer linear programming problems (ILPs): minimax, packing and covering integer programs (MIPs/PIPs/CIPs). A key technique, randomized rounding of linear relaxations, was developed by Raghavan & Thompson [27] to get approximation algorithms for these ILPs (see also Beck & Spencer [7]). We use Theorem 3.1 to prove that this technique produces, with non-zero probability, much better feasible solutions than known before, if the constraint matrices of the IPs are sparse (having "few" non-zero entries in all columns). For MIPs, this complements a result of Karp, Leighton, Rivest, Thompson, Vazirani & Vazirani [18] (see also Beck & Fiala [6]); for PIPs/CIPs, it improves on recent work [31], which in turn improved on that of [27]. Such results cannot be obtained via Lemma 1.1, as the dependency $d$, in the sense of Lemma 1.1, can be as high as $\Theta(m)$ for these problems. (Indeed, the striking applications of Lemma 1.1 arise in situations where $d$ grows as $o(m)$, e.g., if $d$ is $O(\mathrm{polylog}(m))$.)

Theorem 3.1 works well in combination with an idea that has bloomed in de-randomization and pseudorandomness results in the last decade or so: (approximately) decomposing a function of several variables into a sum of terms, each of which depends on only a few of these variables. Concretely, suppose $Z$ is a sum of random variables $Z_i$. Many tools have been developed to upper-bound $\Pr(Z - E[Z] \ge z)$ and $\Pr(|Z - E[Z]| \ge z)$ even if the $Z_i$'s are only (almost) $k$-wise independent for some "small" $k$, rather than completely independent. The idea is to bound the probabilities by considering $E[(Z - E[Z])^k]$ or similar expectations, which look at the $Z_i$'s $k$ or fewer at a time (via linearity of expectation). The main application of this has been that the $Z_i$'s can then be sampled using "few" random bits, yielding a de-randomization/pseudo-randomness result (e.g., [3, 22, 8, 23, 24, 29]). Our results show that such ideas can in fact be used to show that some structures exist! This is one of our main contributions. However, while many applications of Lemma 1.1 have been constructivized (Beck [5], Alon [1]), our MIP result is only existential. For PIPs and CIPs, we present a generalization of the powerful method of pessimistic estimators of Raghavan [26] to constructivize our bounds. Many proofs/details are omitted here for lack of space; they will be given in the full version.

Our main contributions are as follows.
(a) The LLL extension is of independent interest: it draws from a de-randomization tool, decomposing a function of many variables into a sum of terms, each of which depends on only a few of these variables. We expect further interaction between this idea and the rich set of available derandomization tools.
(b) This work shows that certain classes of sparse IPs have better solutions than known before; sparse problems abound in practice. Two such results shown later (on routing using short paths and on hypergraph partitioning) look particularly interesting.
(c) Our generalized method of pessimistic estimators should prove fruitful in other contexts also.
(d) The central theme of this work is the analysis of various types of correlations in random processes, which yields interesting results and nice open questions.

2 Our application domain and approximation results.
Let $\mathbb{Z}_+$ denote the set of non-negative integers; for any $k \in \mathbb{Z}_+$, $[k] = \{1, \ldots, k\}$. "Random variable" is abbreviated by "r.v.", and logarithms are to the base 2 unless specified otherwise.

Definition 2.1. An MIP (minimax integer program) has variables $W$ and $\{x_{i,j} : i \in [n], j \in [\ell_i]\}$, for some integers $\{\ell_i\}$. Let $N = \sum_{i \in [n]} \ell_i$ and let $x$ denote the $N$-dimensional vector of the variables $x_{i,j}$ (arranged in any fixed order). An MIP seeks to minimize $W$, an unconstrained real, subject to:
(i) Equality constraints: $\sum_{j \in [\ell_i]} x_{i,j} = 1$ for all $i \in [n]$;
(ii) a system of linear inequalities $Ax \le \vec{W}$, where $A \in [0,1]^{m \times N}$ and $\vec{W}$ is the $m$-dimensional vector with the variable $W$ in each component; and
(iii) Integrality constraints: $x_{i,j} \in \{0,1\}$ for all $i, j$.
We let $g$ denote the maximum column sum in any column of $A$, and $a$ the maximum number of non-zero entries in any column of $A$.

To see what problems MIPs model, note from constraints (i) and (iii) that, for all $i$, any feasible solution makes the set $\{x_{i,j} : j \in [\ell_i]\}$ contain precisely one 1, with all other elements being 0; MIPs thus model many "choice" scenarios. Consider, e.g., global routing in VLSI gate arrays [27]. Given are an undirected graph $G = (V, E)$, a function $f : V \to V$, and, for each $i \in V$, a set $P_i$ of paths in $G$, each connecting $i$ to $f(i)$; we must connect each $i$ with $f(i)$ using exactly one path from $P_i$, so that the maximum number of times any edge of $G$ is used is minimized. An MIP formulation is obvious, with $x_{i,j}$ being the indicator variable for picking the $j$th path in $P_i$. This problem, the vector-selection problem of [27], and the discrepancy-type problems of Section 4 are all modeled by MIPs; many MIP instances, e.g., global routing, are NP-hard.

Definition 2.2. Given $A \in [0,1]^{m \times n}$, $b \in [1, \infty)^m$ and $c \in [0,1]^n$ with $\max_j c_j = 1$, a PIP (resp. CIP) seeks to maximize (resp. minimize) $c^T x$ subject to $x \in \mathbb{Z}_+^n$ and $Ax \le b$ (resp. $Ax \ge b$). For PIPs, we also allow constraints of the form $0 \le x_j \le d_j$. If $A \in \{0,1\}^{m \times n}$, each entry of $b$ is assumed integral. We define $B = \min_i b_i$, and let $a$ be the maximum number of non-zero entries in any column of $A$. A PIP/CIP is called unweighted if $c_j = 1$ for all $j$, and weighted otherwise.

Note the parameters $g$, $a$ and $B$ of Definitions 2.1 and 2.2. Though there are usually no restrictions on the entries of $A$, $b$ and $c$ in PIPs/CIPs aside from non-negativity, the above restrictions are without loss of generality (w.l.o.g.), for the following reasons. First, we may assume that $A_{ij} \le b_i$ for all $i, j$. If this is not true for a PIP, then we may as well set $x_j := 0$; if this is not true for a CIP, we can just reset $A_{ij} := b_i$. Next, by scaling each row $i$ of $A$ so that $\max_j A_{i,j} = 1$, and by scaling $c$ so that $\max_j c_j = 1$, we get the above form for $A$, $b$ and $c$. Finally, if $A \in \{0,1\}^{m \times n}$, then for a PIP we can always reset $b_i := \lfloor b_i \rfloor$ for each $i$, and for a CIP reset $b_i := \lceil b_i \rceil$; hence the assumption on the integrality of each $b_i$ in this case.

PIPs and CIPs again model many NP-hard problems in combinatorial optimization. Recall that a hypergraph $H = (V, E)$ is a family of subsets $E$ (edges) of a set $V$ (vertices). A set $M \subseteq E$ is a matching in $H$ if no vertex occurs in more than one edge in $M$; a basic and well-known NP-hard problem is to find a maximum-cardinality matching in a given finite hypergraph. A generalization of this is $\ell$-matching, for integral $\ell \ge 1$ (Lovász [21]): each vertex can be in at most $\ell$ elements of $M$. The $\ell$-matching problem is naturally written as a PIP with 0-1 variables $x_i$ and with $B = \ell$. Similarly, CIPs model, e.g., the classical set cover problem: covering $V$ using the smallest number of edges in $E$ (such problems have natural weighted versions). The parameter $a$ for these IPs is the maximum number of vertices in any edge.

We now present a simple lemma to quantify the approximation results. Part (a) is the Chernoff-Hoeffding bound (see, e.g., Appendix A of [4]); part (b) is easily derived from the definition of $G$ (proof omitted).

Lemma 2.1. Given independent r.v.s $X_1, \ldots, X_n \in [0,1]$, let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$.
a. For any $\delta > 0$, $\Pr(X \ge \mu(1+\delta)) \le G(\mu, \delta)$, where $G(\mu, \delta) = \left( e^{\delta} / (1+\delta)^{1+\delta} \right)^{\mu}$.
b. For all $\mu > 0$ and all $p \in (0,1)$, there exists $\delta = H(\mu, p) > 0$ such that $\lceil \mu\delta \rceil \cdot G(\mu, \delta) \le p$, and such that $H(\mu, p)$ is $\Theta\left( \frac{\log(p^{-1})}{\mu \log(\log(p^{-1})/\mu)} \right)$ if $\mu \le \log(p^{-1})/2$, and is $\Theta\left( \sqrt{\log(\mu + p^{-1})/\mu} \right)$ otherwise.

Since many MIPs/PIPs/CIPs are NP-hard, we seek good approximation algorithms for them. One such approach is to start with their LP relaxation: allowing the entries of $x$ to be non-negative reals.

Definition 2.3. Given a CIP/PIP/MIP, $x^*$ and $y^*$ denote, resp., an optimal solution to, and the optimum value of, its LP relaxation. (For PIPs, constraints such as "$x_j \in \{0, 1, \ldots, d_j\}$" are relaxed to "$x_j \in [0, d_j]$".) If the optimal integral value is $C^*$, then $\max\{C^*/y^*, y^*/C^*\}$ is called the integrality gap of the LP relaxation.

Given an IP, we can solve its LP relaxation efficiently, but need to round the fractional entries of $x^*$ to integers. The idea of randomized rounding is: given a real $v > 0$, round $v$ to $\lfloor v \rfloor + 1$ with probability $v - \lfloor v \rfloor$, and round $v$ to $\lfloor v \rfloor$ with probability $1 - v + \lfloor v \rfloor$. This has the nice property that the mean outcome is $v$.

Starting with this idea, the analysis of [27] produces an integral solution of value at most $y^* + O(\min\{y^*, m\} \cdot H(\min\{y^*, m\}, 1/m))$ for MIPs (though phrased a bit differently); this is de-randomized in [26]. But this does not exploit the sparsity of $A$; the previously mentioned result of Karp et al. [18] produces an integral solution of value at most $y^* + g + 1$. For PIPs, the idea of [27] is to solve the LP relaxation, scale down the components of $x^*$ suitably, and then perform randomized rounding; for CIPs, we scale up instead. (See Section 5 for the details.) Starting with this idea, the work of [27] leads to certain approximation bounds; similar bounds are achieved through different means by Plotkin, Shmoys & Tardos [25]. Recent work of this author [31] improved upon these results by observing a "correlation" property of PIPs/CIPs, getting:
(i) For packing integer programs, an integral solution of value $\Omega(y^* (y^*/m)^{1/(B-1)})$ in general, and $\Omega(y^* (y^*/m)^{1/B})$ if $A \in \{0,1\}^{m \times n}$.
(ii) For covering, an approximation ratio of $1 + O(\max\{\ln(mB/y^*)/B, \sqrt{\ln(mB/y^*)/B}\})$.
Thus, while the work of [27] gives a general approximation bound for MIPs, the above-seen result of [18] gives good results for sparse MIPs. For PIPs/CIPs, the current best results are those of [31]; however, no better results were known for sparse PIPs/CIPs.

2.1 Improvements achieved.
For MIPs, we use the extended LLL and an idea of Éva Tardos that leads to a bootstrapping of the LLL extension, to show the existence of an integral solution of value $y^* + O(\min\{y^*, m\} \cdot H(\min\{y^*, m\}, 1/a)) + O(1)$; see Theorem 4.2. Since $a \le m$, this is always as good as the $y^* + O(\min\{y^*, m\} \cdot H(\min\{y^*, m\}, 1/m))$ bound of [27], and is a good improvement if $a \ll m$. It is also an improvement over the additive $g$ term of [18] in cases where $g$ is not small compared to $y^*$. Consider, e.g., the global routing problem and its MIP formulation, sketched above; $m$ here is the number of edges in $G$, and $g = a$ is the maximum length of any path in $\bigcup_i P_i$. To focus on a specific interesting case, suppose $y^*$, the fractional congestion, is at most one. Then, while the above-cited previous results ([27] and [18], resp.) give bounds of $O(\log m / \log\log m)$ and $O(a)$ on an integral solution, we get the improved bound of $O(\log a / \log\log a)$.
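The comparison between the $1/m$ and $1/a$ bounds can be explored numerically. The following Python sketch, which is not from the paper, evaluates $G(\mu, \delta)$ of Lemma 2.1(a) and computes by a search the smallest $\delta$ satisfying the condition $\lceil \mu\delta \rceil \cdot G(\mu, \delta) \le p$ of Lemma 2.1(b); taking that smallest $\delta$ as $H(\mu, p)$ is an assumption made here for illustration. The function names and the example numbers ($y^* = 1$, $m = 10^6$, $a = 20$) are hypothetical, and the hidden constants in the $O(\cdot)$ bounds are not modeled.

```python
import math


def log_G(mu: float, delta: float) -> float:
    """ln G(mu, delta), where G(mu, delta) = (e^delta / (1 + delta)^(1 + delta))^mu."""
    return mu * (delta - (1.0 + delta) * math.log1p(delta))


def G(mu: float, delta: float) -> float:
    """Chernoff-Hoeffding tail bound of Lemma 2.1(a)."""
    return math.exp(log_G(mu, delta))


def H(mu: float, p: float, tol: float = 1e-12) -> float:
    """Smallest delta > 0 (to within tol) with ceil(mu*delta) * G(mu, delta) <= p.

    On each interval ((k-1)/mu, k/mu] the factor ceil(mu*delta) equals k and
    G decreases in delta, so the first interval containing a valid delta is
    the one for the smallest k with k * G(mu, k/mu) <= p; bisect inside it.
    """
    if mu <= 0 or not 0 < p < 1:
        raise ValueError("need mu > 0 and 0 < p < 1")
    log_p = math.log(p)
    k = 1
    while math.log(k) + log_G(mu, k / mu) > log_p:
        k += 1
    lo, hi = (k - 1) / mu, k / mu  # the condition holds at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.log(k) + log_G(mu, mid) <= log_p:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical routing-style numbers: fractional congestion y* = 1,
    # m = 10**6 edges, maximum path length a = 20.
    y_star, m, a = 1.0, 10**6, 20
    print("additive term with p = 1/m:", y_star * H(y_star, 1 / m))
    print("additive term with p = 1/a:", y_star * H(y_star, 1 / a))
```

Since the condition only gets harder to satisfy as $p$ decreases, the computed $H(y^*, 1/a)$ is never larger than $H(y^*, 1/m)$ when $a \le m$; replacing $1/m$ by $1/a$ inside $H$ is exactly where the improvement for sparse instances comes from.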
Similar improvements over [27] and [18] are easily seen for other ranges of $y^*$ also; e.g., if $y^* = O(\log a)$, an integral solution of value $O(\log a)$ exists, improving on the previously known bounds of $O(\log m / \log(2\log m / \log a))$ and $O(a)$. Thus, routing along short paths (this is the notion of sparsity for the global routing problem) is very beneficial in keeping the congestion low. Section 4 presents a scenario where we get such improvements, for discrepancy-type problems [30, 4]. In particular, we generalize a hypergraph-partitioning result of Füredi & Kahn [15].

Please refer now to the above-seen bounds of [31] for PIPs/CIPs. Our bounds for PIPs/CIPs depend only on the set of constraints $Ax \le b$ (or $Ax \ge b$), i.e., they hold for any non-negative objective-function vector $c$. Our improvements over [31] get better as $y^*$ decreases. For PIPs, we show the existence of an $\Omega(y^*/a^{1/(B-1)})$ solution in general, and of an $\Omega(y^*/a^{1/B})$ solution if $A \in \{0,1\}^{m \times n}$. This improves on [31] if $y^* \le n/a$, i.e., for sparse weighted PIPs; in particular, it shows the existence of an $O(1)$ integrality gap for weighted hypergraph $B$-matching if $B = \Omega(\log a)$. For CIPs, we show an integrality gap of $1 + O(\max\{\ln(a+1)/B, \sqrt{\ln(a+1)/B}\})$, once again improving on [31] for weighted CIPs. This CIP bound is better than that of [31] if $y^* \le mB/a$: this inequality fails for unweighted CIPs, and is generally true for weighted CIPs, since $y^*$ can get arbitrarily small in the latter case. In particular, we generalize the result of Chvátal [10] on weighted set cover. Consider, e.g., a facility location problem on a directed graph $G = (V, A)$: given a cost $c_i \in [0,1]$ for each $i \in V$, we want a min-cost assignment of facilities to the nodes such that each node sees at least $B$ facilities in its out-neighborhood; multiple facilities at a node are allowed. If $\Delta_{in}$ is the maximum in-degree of $G$, we show an integrality gap of $1 + O(\max\{\ln(\Delta_{in}+1)/B, \sqrt{\ln(B(\Delta_{in}+1))/B}\})$. This improves on [31] if $y^* \le |V|B/\Delta_{in}$; it shows an $O(1)$ (resp., $1 + o(1)$) integrality gap if $B$ grows as fast as (resp., strictly faster than) $\log \Delta_{in}$.

Theorem 5.1 presents our packing/covering results. A key corollary of our results is that for families of instances of both PIPs and CIPs, we get a good ($O(1)$ or $1 + o(1)$) integrality gap if $B$ grows at least as fast as $\log a$. For PIPs and CIPs, we use the FKG inequality in addition to Theorem 3.1; for PIPs, we also need the powerful Janson's inequality ([16, 9]; see also Chapter 8 of [4]). Bounds on the result of a greedy algorithm for CIPs relative to the optimal integral solution are known [11, 12]. Our bound improves that of [11] and is incomparable with [12]; for any given $A$, $c$, and unit vector $b/\|b\|_2$, our bound improves on [12] if $B$ is more than a certain threshold. As it stands, randomized rounding produces such improved solutions for several PIPs/CIPs only with a very low, sometimes exponentially small, probability; thus it often does not yield a randomized algorithm. To this end, we generalize Raghavan's method of pessimistic estimators to constructivize our PIP/CIP results, in §5.2.

3 The extended LLL and an approach to large deviations.
We now present our LLL extension, Theorem 3.1. For any event $E$, define $\mathrm{Ind}(E)$ to be its indicator r.v.: 1 if $E$ holds and 0 otherwise. Suppose we have "bad" events $E_1, \ldots, E_m$ with a "dependency" $d'$ (in the sense of Lemma 1.1) that is "large". Theorem 3.1 shows how to essentially replace $d'$ by a (possibly much) smaller $d$, under some conditions. It generalizes Lemma 1.1 (define one r.v., $C_{i,1} = \mathrm{Ind}(E_i)$, for each $i$, to get Lemma 1.1); its proof is very similar to the classical proof of Lemma 1.1, and its motivation will be clarified by the applications.

Theorem 3.1. Given events $E_1, \ldots, E_m$ and any $I \subseteq [m]$, let $Z(I) := \bigwedge_{s \in I} \overline{E_s}$. Suppose that for some positive integer $d$, we can define, for each $i \in [m]$, a finite number of r.v.s $C_{i,1}, C_{i,2}, \ldots$, each taking on only non-negative values, such that:
(i) any $C_{i,j}$ is mutually independent of all but at most $d$ of the events $E_k$, $k \ne i$, and
(ii) for all $I \subseteq ([m] - \{i\})$, $\Pr(E_i \mid Z(I)) \le \sum_j E[C_{i,j} \mid Z(I)]$.
Let $p_i$ denote $\sum_j E[C_{i,j}]$; clearly, $\Pr(E_i) \le p_i$ (set $I = \emptyset$ in (ii)). Suppose that for all $i \in [m]$ we have $e p_i (d+1) \le 1$. Then $\Pr(\bigwedge_i \overline{E_i}) \ge (d/(d+1))^m > 0$.

Remark 3.1. $C_{i,j}$ and $C_{i,j'}$ can "depend" on different subsets of $\{E_k \mid k \ne i\}$; the only restriction is that these subsets be of size at most $d$. Note that we have essentially reduced the dependency among the $E_i$'s to just $d$: $e p_i (d+1) \le 1$ suffices. Another important point is that the dependency among the r.v.s $C_{i,j}$ could be much higher than $d$: all we count is the number of $E_k$ that any $C_{i,j}$ depends on.

Proof of Theorem 3.1. We prove by induction on $|I|$ that if $i \notin I$ then $\Pr(E_i \mid Z(I)) \le e p_i$, which suffices to prove the theorem since $\Pr(\bigwedge_i \overline{E_i}) = \prod_{i \in [m]} (1 - \Pr(E_i \mid Z([i-1])))$. For the base case $I = \emptyset$, $\Pr(E_i \mid Z(I)) = \Pr(E_i) \le p_i$. For the inductive step, let $S_{i,j,I} := \{k \in I \mid C_{i,j} \text{ depends on } E_k\}$ and $S'_{i,j,I} := I - S_{i,j,I}$; note that $|S_{i,j,I}| \le d$. If $S_{i,j,I} = \emptyset$, then $E[C_{i,j} \mid Z(I)] = E[C_{i,j}]$. Otherwise, letting $S_{i,j,I} = \{\ell_1, \ldots, \ell_r\}$, we have
$$E[C_{i,j} \mid Z(I)] = \frac{E[C_{i,j} \cdot \mathrm{Ind}(Z(S_{i,j,I})) \mid Z(S'_{i,j,I})]}{\Pr(Z(S_{i,j,I}) \mid Z(S'_{i,j,I}))} \le \frac{E[C_{i,j} \mid Z(S'_{i,j,I})]}{\Pr(Z(S_{i,j,I}) \mid Z(S'_{i,j,I}))}, \tag{3.1}$$
since $C_{i,j}$ is non-negative. The numerator of the last term is $E[C_{i,j}]$, by assumption. The denominator is
$$\prod_{s \in [r]} \left(1 - \Pr(E_{\ell_s} \mid Z(\{\ell_1, \ell_2, \ldots, \ell_{s-1}\} \cup S'_{i,j,I}))\right),$$
which is at least $\prod_{s \in [r]} (1 - e p_{\ell_s})$ by the induction hypothesis, i.e., at least $(1 - 1/(d+1))^r \ge (d/(d+1))^d > 1/e$. Hence, $E[C_{i,j} \mid Z(I)] \le e E[C_{i,j}]$, and thus $\Pr(E_i \mid Z(I)) \le \sum_j E[C_{i,j} \mid Z(I)] \le e p_i \le 1/(d+1)$. This concludes the proof.

The crucial point is that the events $E_i$ could have a large dependency $d'$, in the sense of the classical Lemma 1.1.
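As a small sanity check of the special case $C_{i,1} = \mathrm{Ind}(E_i)$, which reduces Theorem 3.1 to Lemma 1.1, the following Python sketch (not from the paper) enumerates a hypothetical instance: $m$ independent Bernoulli($q$) bits arranged on a cycle, with $E_i$ the event that bits $i$ and $(i+1) \bmod m$ are both 1. Each $E_i$ shares bits only with $E_{i-1}$ and $E_{i+1}$, so it is mutually independent of the remaining events, giving $d = 2$ and $p = q^2$. The function names are hypothetical; the code compares the exact value of $\Pr(\bigwedge_i \overline{E_i})$ with the bound $(d/(d+1))^m$, which applies only when $ep(d+1) \le 1$.

```python
import itertools
import math


def exact_avoid_prob(m: int, q: float) -> float:
    """Pr(no E_i occurs), by enumerating all 2^m bit vectors.

    Bits b_0..b_{m-1} are i.i.d. Bernoulli(q); E_i is the event
    "b_i = b_{(i+1) mod m} = 1" (cyclic adjacency).
    """
    total = 0.0
    for bits in itertools.product((0, 1), repeat=m):
        if any(bits[i] and bits[(i + 1) % m] for i in range(m)):
            continue  # some bad event E_i occurs
        ones = sum(bits)
        total += q ** ones * (1 - q) ** (m - ones)
    return total


def lll_lower_bound(p: float, d: int, m: int):
    """(d/(d+1))^m if the symmetric condition e*p*(d+1) <= 1 holds, else None."""
    if math.e * p * (d + 1) > 1:
        return None
    return (d / (d + 1)) ** m


if __name__ == "__main__":
    m, q = 12, 0.2
    print("exact:", exact_avoid_prob(m, q))
    print("bound:", lll_lower_bound(q * q, 2, m))
```

With $q = 0.2$ the condition holds ($e \cdot 0.04 \cdot 3 < 1$), and for $m = 12$ the exact probability is far above $(2/3)^{12}$; the bound is weak, but it is positive, which is all an existence argument needs.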
The main utility of Theorem 3.1 is that if we can "decompose" each $E_i$ into r.v.s $C_{i,j}$ that satisfy the conditions of the theorem, then there is the possibility of effectively reducing the dependency by a lot ($d'$ can be replaced by the value $d$). Concrete instances of this will be studied in later sections.

The tools behind our MIP application are our new LLL and a result of [29]. Define, for $z = (z_1, \ldots, z_n) \in \mathbb{R}^n$, a family of polynomials $S_j(z)$, $j = 0, 1, \ldots, n$, where $S_0(z) \equiv 1$ and, for $j \in [n]$,
$$S_j(z) := \sum_{1 \le i_1 < i_2 < \cdots < i_j \le n} z_{i_1} z_{i_2} \cdots z_{i_j}.$$
The relevant theorem of [29] is the following.

Theorem 3.2. ([29]) Given r.v.s $X_1, \ldots, X_n \in [0,1]$, let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. Then:
(a) For any $\delta > 0$, any nonempty event $Z$ and any non-negative integer $k \le \mu(1+\delta)$, $\Pr(X \ge \mu(1+\delta) \mid Z) \le E[Y_k \mid Z]$, where $Y_k = S_k(X_1, \ldots, X_n) / \binom{\mu(1+\delta)}{k}$.
(b) If the $X_i$'s are independent and $k = \lceil \mu\delta \rceil$, then $\Pr(X \ge \mu(1+\delta)) \le E[Y_k] \le G(\mu, \delta)$, where $G(\mu, \delta) = \left( e^{\delta}/(1+\delta)^{1+\delta} \right)^{\mu}$.

Proof. Suppose $r_1, r_2, \ldots, r_n \in [0,1]$ satisfy $\sum_{i=1}^{n} r_i \ge q$. Then a simple proof is given in [29] of the fact that for any non-negative integer $k \le q$, $S_k(r_1, r_2, \ldots, r_n) \ge \binom{q}{k}$. This clearly holds even given the occurrence of any nonempty event $Z$. Thus we get $\Pr(X \ge \mu(1+\delta) \mid Z) \le \Pr(Y_k \ge 1 \mid Z) \le E[Y_k \mid Z]$, where the second inequality follows from Markov's inequality. A proof of (b) is given in [29].

4 Approximating Minimax Integer Programs.
Suppose we are given an MIP conforming to Definition 2.1. Define $t$ to be $\max_{i \in [n]} NZ_i$, where $NZ_i$ is the number of rows of $A$ that have a non-zero coefficient corresponding to at least one variable among $\{x_{i,j} : j \in [\ell_i]\}$. Note that
$$g \le a \le t \le \min\{m, a \cdot \max_{i \in [n]} \ell_i\}. \tag{4.2}$$
Theorem 4.1 now shows how Theorem 3.1 can help for sparse MIPs, i.e., those where $t \ll m$. We will then bootstrap Theorem 4.1 to get the further improved Theorem 4.2. We start with the simple

Proposition 4.1. If $0 < \mu_1 \le \mu_2$, then for any $\delta > 0$, $G(\mu_1, \mu_2\delta/\mu_1) \le G(\mu_2, \delta)$.

Theorem 4.1. Given an MIP conforming to Definition 2.1, randomized rounding produces, with non-zero probability, a feasible solution of value at most $y^* + \min\{y^*, m\} \cdot H(\min\{y^*, m\}, 1/(et))$.

Proof. Conduct randomized rounding: independently for each $i$, randomly round exactly one $x_{i,j}$ to 1, guided by the "probabilities" $\{x^*_{i,j}\}$. We may assume that $\{x^*_{i,j}\}$ is a basic feasible solution to the LP relaxation. Hence, at most $m$ of the $x^*_{i,j}$ will be neither zero nor one, and only these variables will participate in the rounding. Thus, since all the entries of $A$ are in $[0,1]$, we assume w.l.o.g. from now on that $y^* \le m$ (and that $\max_{i \in [n]} \ell_i \le m$); this explains the $\min\{y^*, m\}$ term in our stated bounds. If $z \in \{0,1\}^N$ denotes the randomly rounded vector, then $E[(Az)_i] = (Ax^*)_i =: b_i$ by linearity of expectation, i.e., $b_i$ is at most $y^*$. Defining $k = \lceil y^* H(y^*, 1/(et)) \rceil$ and events $E_1, E_2, \ldots, E_m$ by $E_i \equiv$ "$(Az)_i \ge b_i + k$", we now show that $\Pr(\bigwedge_{i \in [m]} \overline{E_i}) > 0$ using Theorem 3.1. Rewrite the $i$th constraint of the MIP as $\sum_{r \in [n]} X_{i,r} \le W$, where $X_{i,r} = \sum_{s \in [\ell_r]} A_{i,(r,s)} x_{r,s}$; the notation $A_{i,(r,s)}$ assumes that the pairs $\{(r,s) : r \in [n], s \in [\ell_r]\}$ have been mapped bijectively to $[N]$ in some fixed way. Defining the r.v. $Z_{i,r} = \sum_{s \in [\ell_r]} A_{i,(r,s)} z_{r,s}$, we note that for each $i$, the r.v.s $\{Z_{i,r} : r \in [n]\}$ lie in $[0,1]$ and are independent. Also, $E_i \equiv$ "$\sum_{r \in [n]} Z_{i,r} \ge b_i + k$".

Theorem 3.2 suggests a suitable choice for the crucial r.v.s $C_{i,j}$ (to apply Theorem 3.1). Let $u = \binom{n}{k}$; we now define the r.v.s $\{C_{i,j} : i \in [m], j \in [u]\}$ as follows. Fix any $i \in [m]$. Identify each $j \in [u]$ with some distinct $k$-element subset $S(j)$ of $[n]$, and let
$$C_{i,j} := \frac{\prod_{v \in S(j)} Z_{i,v}}{\binom{b_i + k}{k}}.$$
We now need to show that the r.v.s $C_{i,j}$ satisfy the conditions of Theorem 3.1. For any $i \in [m]$, let $\delta_i = k/b_i$. Since $b_i \le y^*$, we have, for each $i \in [m]$,
$$G(b_i, \delta_i) \le G(y^*, k/y^*) \le G(y^*, H(y^*, 1/(et))) \le 1/(ekt),$$
where the first inequality is by Proposition 4.1, the second holds since $k/y^* \ge H(y^*, 1/(et))$ and $G$ is decreasing in $\delta$, and the last is by the definition of $H$. Now by Theorem 3.2, we get

Fact 4.1. For all $i \in [m]$ and all nonempty events $Z$, $\Pr(E_i \mid Z) \le \sum_{j \in [u]} E[C_{i,j} \mid Z]$. Also, $p_i := \sum_{j \in [u]} E[C_{i,j}] < G(b_i, \delta_i) \le 1/(ekt)$.

Next, since any $C_{i,j}$ involves (a product of) $k$ terms, each of which "depends" on at most $t - 1$ of the events $\{E_v : v \in [m] - \{i\}\}$ by definition of $t$, we see the important

Fact 4.2. For all $i \in [m]$ and $j \in [u]$, $C_{i,j} \in [0,1]$, and $C_{i,j}$ "depends" on at most $d = k(t-1)$ of the set of events $\{E_v : v \in [m] - \{i\}\}$.

From Facts 4.1 and 4.2, and by noting that $e p_i (d+1) \le e(kt - k + 1)/(ekt) \le 1$, we invoke Theorem 3.1 to see that $\Pr(\bigwedge_{i \in [m]} \overline{E_i}) > 0$, concluding the proof of Theorem 4.1.

Theorem 4.1 gives good results if $t \ll m$, but can we improve it further, say by replacing $t$ by $a$ ($\le t$) in it? As seen from (4.2), the key reason for $t \gg a^{\Theta(1)}$ is that $\max_{i \in [n]} \ell_i \gg a^{\Theta(1)}$. If we can essentially "bring down" $\max_{i \in [n]} \ell_i$ by forcing many $x^*_{i,j}$ to be zero for each $i$, then we effectively reduce $t$ ($t \le a \max_i \ell_i$; see (4.2)); this is so since only those $x^*_{i,j}$ that are neither zero nor one take part in the rounding. A way of bootstrapping Theorem 4.1 to achieve this is shown by:

Theorem 4.2. For MIPs, there is an integral solution of value at most $y^* + O(\min\{y^*, m\} \cdot H(\min\{y^*, m\}, 1/a)) + O(1)$.

Proof. We present a brief sketch for lack of space. We assume from now on that $t$ is sufficiently large (otherwise Theorem 4.2 matches Theorem 4.1). Suppose
$$(y^* \ge t^{1/7}) \text{ or } (t \le a^4) \tag{4.3}$$
holds here; then Theorem 4.2 will again match Theorem 4.1. So we may assume that (4.3) is false. Also, we shall show in the full version how the extreme case of $y^* < t^{-5}$ can be handled; so we also assume that $y^* \ge t^{-5}$ (the common situation). Next, if $y^* \le t^{-1/7}$, Theorem 4.1 anyway guarantees an integral solution of value $O(1)$, as is promised by Theorem 4.2. Thus, suppose $y^* > t^{-1/7}$. The basic idea now is, as sketched above, to set many $x^*_{i,j}$ to zero for each $i$ (without losing too much on $y^*$), so that $\max_i \ell_i$, and hence $t$, will essentially get reduced. Such an approach, whose performance will be validated by arguments similar to those of Theorem 4.1, is repeatedly applied until (4.3) holds, owing to the (continually reduced) $t$ becoming small enough to satisfy (4.3). There are two cases.

Case I: $y^* \ge 1$. Solve the LP relaxation, and set $x'_{i,j} := (y^*)^2 (\log^5 t) \, x^*_{i,j}$. Conduct randomized rounding on the $x'_{i,j}$ now, rounding each $x'_{i,j}$ independently to $z_{i,j} \in \{\lfloor x'_{i,j} \rfloor, \lceil x'_{i,j} \rceil\}$. (Note the key difference from Theorem 4.1, where for each $i$, we round exactly one $x_{i,j}$ to 1.) Next, ideas similar to those used in Theorem 4.1 show that with nonzero probability, we have:
(i) $(Az)_i \le (y^*)^3 \log^5 t \, (1 + O(1/((y^*)^{1.5} \log^2 t)))$ for all $i \in [m]$; and
(ii) $\left| \sum_j z_{i,j} - (y^*)^2 \log^5 t \right| \le O(y^* \log^3 t)$ for all $i \in [n]$.
A subtle point is that (ii) also bounds the lower tail of $\sum_j z_{i,j}$: this uses the fact that each of the $n$ events in (ii) depends on at most $t$ of the other $(m + n - 1)$ events in (i) and (ii). Suppose we have a rounding $z$ satisfying (i) and (ii). Next, for each $i \in [n]$ and $j \in [\ell_i]$, let $x''_{i,j} := z_{i,j} / \sum_{j'} z_{i,j'}$. From (i) and (ii) we deduce:
(a) Since $\sum_j z_{i,j} \ge (y^*)^2 \log^5 t \, (1 - O(1/(y^* \log^2 t)))$ for all $i$ from (ii), we have, for all $i \in [m]$,
$$(Ax'')_i \le \frac{y^* (1 + O(1/((y^*)^{1.5} \log^2 t)))}{1 - O(1/(y^* \log^2 t))} = y^* (1 + O(1/(y^* \log^2 t))). \tag{4.4}$$
(b) Importantly, since the $z_{i,j}$ are non-negative integers summing to at most $(y^*)^2 \log^5 t \, (1 + O(1/(y^* \log^2 t)))$, at most $O((y^*)^2 \log^5 t)$ values $x''_{i,j}$ are nonzero, for each $i \in [n]$.
Thus, by losing a little in $y^*$ (see (4.4)), our "scaling up, rounding, scaling down" method has given a fractional solution $x''$ with a much-reduced $\ell_i$ for each $i$: $\ell_i$ is now $O((y^*)^2 \log^5 t)$, essentially. Thus, $t$ has been reduced to $O(a (y^*)^2 \log^5 t) = O(t^{1/4 + 2/7} \log^5 t)$, since (4.3) was assumed false. Repeating this scheme $O(\log\log t)$ times makes $t$ small enough to satisfy (4.3).

Case II: $t^{-1/7} < y^* < 1$. The idea is the same, with the scaling up of the $x^*_{i,j}$ being by $(\log^5 t)/y^*$. This concludes the proof sketch for Theorem 4.2.

We now study our improvements for discrepancy-type problems, which are an important class of MIPs that, among other things, are useful in devising divide-and-conquer algorithms. Given is a set-system $(X, F)$, where $X = [n]$ and $F = \{S_1, \ldots, S_M\} \subseteq 2^X$. Given a positive integer $\ell$, the problem is to partition $X$ into $\ell$ parts so that each $S_j$ is "split well": we want a function $\chi : X \to [\ell]$ which minimizes $\max_{j \in [M], k \in [\ell]} |\{i \in S_j : \chi(i) = k\}|$. (The case $\ell = 2$ is the standard set-discrepancy problem.) To motivate this problem, suppose we have a (di)graph $(V, A)$; we want a partition of $V$ into $V_1, \ldots, V_\ell$ such that, for all $v \in V$, the values $\{|N(v) \cap V_k| : k \in [\ell]\}$ are "roughly the same", where $N(v)$ is the (out-)neighborhood of $v$. See, e.g., [2, 17] for how this aids divide-and-conquer approaches. This problem is naturally modeled by the above set-system problem.

Let $\Delta$ be the degree of $(X, F)$, i.e., $\max_{i \in [n]} |\{j : i \in S_j\}|$, and let $\Delta' := \max_{S_j \in F} |S_j|$. Our problem is naturally written as an MIP with $m = M\ell$, $\ell_i = \ell$ for each $i$, and $g = a = \Delta$, in the notation of Definition 2.1; $y^* = \Delta'/\ell$ here. The analysis of [27] gives an integral solution of value at most $y^*(1 + O(H(y^*, 1/(M\ell))))$, while [18] presents a solution of value at most $y^* + \Delta$. Also, since any $S_j \in F$ intersects at most $(\Delta - 1)\Delta'$ other elements of $F$, Lemma 1.1 shows that randomized rounding produces, with positive probability, a solution of value at most $y^*(1 + O(H(y^*, 1/(e\Delta\Delta'\ell))))$. This is the approach taken by [15] for their case of interest: $\Delta = \Delta'$, $\ell = \Delta/\log\Delta$. Theorem 4.2 shows the existence of an integral solution of value $y^*(1 + O(H(y^*, 1/\Delta))) + O(1)$, i.e., it removes the dependence on $\Delta'$. This is an improvement on all three results above. As a specific interesting case, suppose $\ell$ grows at most as fast as $\Delta'/\log\Delta$. Then we see that good integral solutions (those that grow at the rate of $O(y^*)$ or better) exist, and this was not known before.
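The randomized rounding analyzed above can be sketched for the $\ell$-partition problem. This minimal Python sketch is not the paper's procedure: it assumes the uniform fractional solution $x^*_{i,k} = 1/\ell$ (which attains $y^* = \Delta'/\ell$), so rounding assigns each element to a uniformly random part, and it evaluates the objective $\max_{j,k} |\{i \in S_j : \chi(i) = k\}|$, where $\chi(i)$ is the part assigned to element $i$. The function name and test instance are hypothetical, and the sketch evaluates a single rounding; it does not implement the LLL-based existence argument behind Theorem 4.2.

```python
import random
from math import ceil


def random_partition_load(sets, n, ell, seed=0):
    """Round the uniform fractional solution x*_{i,k} = 1/ell.

    Each element of X = {0, ..., n-1} independently picks one of ell parts
    uniformly at random (exactly one x_{i,k} is rounded to 1 per element).
    Returns (chi, load), where chi[i] is the part of element i and load is
    max over sets S_j and parts k of |{i in S_j : chi[i] = k}|.
    """
    rng = random.Random(seed)
    chi = [rng.randrange(ell) for _ in range(n)]
    load = 0
    for S in sets:
        counts = [0] * ell
        for i in S:
            counts[chi[i]] += 1
        load = max(load, max(counts))
    return chi, load


if __name__ == "__main__":
    rng = random.Random(1)
    n, ell = 200, 4
    sets = [rng.sample(range(n), 30) for _ in range(50)]  # hypothetical instance
    chi, load = random_partition_load(sets, n, ell, seed=7)
    y_star = max(len(S) for S in sets) / ell  # y* = Delta'/ell
    print("load:", load, "lower bound:", ceil(y_star))
```

On this hypothetical instance every set has 30 elements, so $\Delta' = 30$, $\ell = 4$ and $y^* = 7.5$; by pigeonhole any partition has load at least $\lceil 7.5 \rceil = 8$, and the returned load can be compared with that value.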
(The approach of [15] described above shows the existence of such good integral solutions for $\ell = O(\Delta'/\log(\max\{\Delta, \Delta'\}))$. Our bound of $O(\Delta'/\log\Delta)$ is always better than this, and especially so if $\Delta \ll \Delta'$.)

5 Approximating packing and covering integer programs.
For PIPs and CIPs, we extend some of the ideas of Theorem 3.1, and also use some "correlation" results of [31]. A crucial way in which we will need to extend Theorem 3.1 for PIPs is in allowing one event, $E_{m+1}$, to have $C_{i,j}$'s that can become negative: we use Janson's inequality [16, 9] to aid in this.

For PIPs, we solve the LP relaxation and set $x'_i := x^*_i/\alpha$ for some $\alpha > 1$ to be fixed later; this scaling down is done to boost the chance that the constraints in the PIP are all satisfied. Define a random $z \in \mathbb{Z}_+^n$, the outcome of randomized rounding, as follows. Independently for each $i$, set $z_i$ to be $\lfloor x'_i \rfloor + 1$ with probability $x'_i - \lfloor x'_i \rfloor$, and $\lfloor x'_i \rfloor$ with probability $1 - (x'_i - \lfloor x'_i \rfloor)$. We need to show that, with positive probability, all constraints are satisfied and $c^T z$ is not "much below" $y^*$; we also need to choose $\alpha$. Note that $E[(Az)_i] = (Ax')_i \le b_i/\alpha$ and $E[c^T z] = y^*/\alpha$. For some $\beta > 1$ to be fixed later, define events $E_1, \ldots, E_m$ by $E_i \equiv$ "$(Az)_i > b_i$", and let $E_{m+1} \equiv$ "$c^T z < y^*/(\alpha\beta)$". Now, $z$ is an $(\alpha\beta)$-approximate solution to the PIP if $\bigwedge_{i=1}^{m+1} \overline{E_i}$ holds. We now focus on achieving a "small" value for $\alpha\beta$ such that $\Pr(\bigwedge_{i=1}^{m+1} \overline{E_i}) > 0$ holds; in fact, we fix $\beta = 2$ for PIPs.

For every $j \in [n]$, let $s_j = \lfloor x'_j \rfloor$ and $p_j = x'_j - s_j \in [0, 1)$. Let $A_i$ denote the $i$th row of $A$. Let $X_1, X_2, \ldots, X_n \in \{0,1\}$ be independent r.v.s with $\Pr(X_j = 1) = p_j$ for all $j \in [n]$. For $i \in [m]$, it is clear that $E_i \equiv$ "$A_i X > \mu_i(1 + \delta_i)$", where $\mu_i = E[A_i X]$ and $\delta_i = (b_i - A_i s)/\mu_i - 1$. Also, $E_{m+1} \equiv$ "$c^T X < \mu_{m+1}(1 - \delta_{m+1})$", where $\mu_{m+1} = E[c^T X]$ and $\delta_{m+1} = 1 - (y^*/(2\alpha) - c^T s)/\mu_{m+1}$.

For CIPs, the idea is to solve the LP relaxation and, for an $\alpha > 1$ to be fixed later, to set $x'_j = \alpha x^*_j$ for each $j \in [n]$. We then construct a random integral solution $z$ by setting, independently for each $j \in [n]$, $z_j = \lfloor x'_j \rfloor + 1$ with probability $x'_j - \lfloor x'_j \rfloor$, and $z_j = \lfloor x'_j \rfloor$ with probability $1 - (x'_j - \lfloor x'_j \rfloor)$. Let $A_i$, $s$, and $X_1, X_2, \ldots, X_n$ be as defined above for PIPs. Here again, we want an $(\alpha\beta)$-approximation. The bad events now are $E_i \equiv$ "$A_i X < \mu_i(1 - \delta_i)$" for all $i \in [m]$, and $E_{m+1} \equiv$ "$c^T X > \mu_{m+1}(1 + \delta_{m+1})$", where $\mu_i = E[A_i X]$ and $\delta_i = 1 - (b_i - A_i s)/\mu_i$ for $i \in [m]$, $\mu_{m+1} = E[c^T X]$, and $\delta_{m+1} = (\alpha\beta y^* - c^T s)/\mu_{m+1} - 1$. Let us also define

Definition 5.1. For a matrix $D \in [0,1]^{m \times n}$, $p_{pack}(D, \alpha, B) := (e/\alpha)^{B+1}$ if $D \in \{0,1\}^{m \times n}$, and $p_{pack}(D, \alpha, B) := (e/\alpha)^{B}$ otherwise.

We also need a simple lemma from [31], which combines the Chernoff-Hoeffding bound (Lemma 2.1(a)) with some simple algebraic observations about PIPs/CIPs:

Lemma 5.1. (i) For a PIP, $\max_{i \in [m]} \Pr(E_i) < p_{pack}(A, \alpha, B)$; (ii) for a CIP, $\max_{i \in [m]} \Pr(E_i) < p_{cov} := e^{-B(\alpha - 1)^2/(2\alpha)}$.

Similarly, we can show that the "worst case" (in terms of having a large value for the integrality gap $\alpha\beta$) for our analysis of PIPs/CIPs occurs (as usual) when $s_i = 0$ for all $i$ above; in this case, $\mu_{m+1} = y^*/\alpha$ for PIPs and $\mu_{m+1} = \alpha y^*$ for CIPs. Thus, our assumptions henceforth are that
$$\mu_{m+1} = y^*/\alpha \text{ and } \delta_{m+1} = 1/2 \text{ for PIPs}; \tag{5.5}$$
$$\mu_{m+1} = \alpha y^* \text{ and } \delta_{m+1} = \beta - 1 \text{ for CIPs}. \tag{5.6}$$

5.1 Correlation properties of PIPs and CIPs.
Our focus from now on is to find "small" $\alpha, \beta > 1$ so that $\Pr(\bigwedge_{i=1}^{m+1} \overline{E_i}) > 0$ (for both PIPs and CIPs); some preliminary lemmas are in order.

Definition 5.2. (a) Events $T_1, T_2, \ldots, T_m$ are positively correlated if for every nonempty $S \subseteq [m]$, $\Pr(\bigwedge_{i \in S} T_i) \ge \prod_{i \in S} \Pr(T_i)$. (b) An event $E$ is negatively correlated with a set of events $\{T_1, T_2, \ldots, T_m\}$ if for every nonempty $S \subseteq [m]$, $\Pr(E \mid \bigwedge_{i \in S} T_i) \le \Pr(E)$.

The "intuitively clear" Lemma 5.2 is a crucial part, and an immediate consequence, of the results of [31]. Its proof follows from the FKG inequality [14, 28]; the reader is referred to [31] for the proof.

Lemma 5.2.
Let r.v.s $X_1, X_2, \ldots, X_n$ and events $E_1, E_2, \ldots, E_{m+1}$ be as above, for PIPs/CIPs. (a) The events $\{\overline{E_1}, \overline{E_2}, \ldots, \overline{E_m}\}$ are positively correlated for PIPs; similarly for CIPs. (b) For PIPs, for all $1 \le i < j \le n$, the event "$X_i = 1$" and the event "$X_i = X_j = 1$" are each negatively correlated with the set of events $\{\overline{E_1}, \ldots, \overline{E_m}\}$. Also, for any $S \subseteq [m]$, the event "$(X_i = 1) \wedge (\bigvee_{j \in S} E_j)$" is negatively correlated with the set of events $\{\overline{E_j} : j \in [m] - S\}$, for PIPs.

We now present an important lemma, to get correlation inequalities which "point" in the "direction" opposite to FKG (i.e., $\ge$ instead of $\le$, etc.); it will be a key ingredient in proving Theorem 5.1.

Lemma 5.3. (i) In a PIP, for all $i \in [n]$,
$$\Pr\Big(X_i = 1 \;\Big|\; \bigwedge_{j=1}^{m} \overline{E_j}\Big) \ge p_i\big(1 - a\,p_{\rm pack}(A, \alpha, B-1)\big).$$
(ii) In a CIP, for all $i \in [n]$,
$$\Pr\Big(X_i = 1 \;\Big|\; \bigwedge_{j=1}^{m} \overline{E_j}\Big) \le p_i/(1 - p_{\rm cov})^a.$$

Proof. (i) Fix $i$. Let $Q_i \subseteq [m]$ be the indices of the rows of $A$ that have a non-zero entry in the $i$th column, and let $Q'_i = [m] - Q_i$. Note, importantly, that
P1: "$|Q_i| \le a$" (see the definition of $a$ in Defn. 2.2), and
P2: "$X_i$ is mutually independent of the set of events $\{\overline{E_j} : j \in Q'_i\}$"
hold. Let $Z_1 \equiv (\bigwedge_{j \in Q_i} \overline{E_j})$ and $Z_2 \equiv (\bigwedge_{j \in Q'_i} \overline{E_j})$; thus, $(Z_1 \wedge Z_2) \equiv (\bigwedge_{j=1}^{m} \overline{E_j})$. Now,
$$\Pr(X_i = 1 \mid Z_1 \wedge Z_2) \ge \Pr((X_i = 1) \wedge Z_1 \mid Z_2),$$
and the right-hand side equals $\Pr(X_i = 1 \mid Z_2) - \Pr((X_i = 1) \wedge \overline{Z_1} \mid Z_2)$; this difference, in turn, is at least $\Pr(X_i = 1) - \Pr((X_i = 1) \wedge \overline{Z_1})$, by (P2) and Lemma 5.2(b). Thus we get
$$\Pr(X_i = 1 \mid Z_1 \wedge Z_2) \ge \Pr((X_i = 1) \wedge Z_1). \tag{5.7}$$
The above approach was inspired by the simple proof of Janson's inequality due to [9]. To simplify (5.7), we need to lower-bound $\Pr((X_i = 1) \wedge Z_1)$ (an upper bound is immediate, via FKG). To do this, note that $((X_i = 1) \wedge V)$ implies $((X_i = 1) \wedge Z_1)$, where $V \equiv \bigwedge_{j \in Q_i} \big(((Az)_j - A_{ji}X_i) \le (B-1)\big)$. Thus, $\Pr((X_i = 1) \wedge Z_1) \ge \Pr((X_i = 1) \wedge V) = \Pr(X_i = 1)\Pr(V)$, since "$X_i = 1$" is independent of $V$. So the quantity of interest, $\Pr((X_i = 1) \wedge Z_1)$, is at least
$$p_i \prod_{j \in Q_i} \Pr\big(((Az)_j - A_{ji}X_i) \le (B-1)\big), \tag{5.8}$$
via the FKG inequality, as in Lemma 5.2(a). By arguments similar to those of Lemma 5.1(i), we can show that for any $j \in [m]$, $\Pr(((Az)_j - A_{ji}X_i) > (B-1)) < p_{\rm pack}(A, \alpha, B-1)$. Using this, (5.8), the fact that $|Q_i| \le a$ (from (P1)), and $(1-x)^a \ge 1 - ax$ for $x \in [0,1]$, together with (5.7), we conclude the proof of part (i).

(ii) For part (ii), we use some ideas from the proofs of Lemma 1.1 and Theorem 3.1. Let $Q_i$, $Q'_i$, $Z_1$, and $Z_2$ be as in part (i); (P1) and (P2) hold now also. Now,
$$\Pr(X_i = 1 \mid Z_1 \wedge Z_2) = \frac{\Pr((X_i = 1) \wedge Z_1 \mid Z_2)}{\Pr(Z_1 \mid Z_2)}.$$
The r.h.s. is at most $\Pr(X_i = 1 \mid Z_2)/\Pr(Z_1 \mid Z_2)$, which equals $\Pr(X_i = 1)/\Pr(Z_1 \mid Z_2)$ by (P2). This fraction, in turn, is at most $\Pr(X_i = 1)/\Pr(Z_1)$ by Lemma 5.2(a), i.e., at most $\Pr(X_i = 1)/(1 - p_{\rm cov})^a$. This concludes the proof of part (ii).

Next, we state a simple conditional version of Markov's and Chebyshev's inequalities (proof omitted).

Proposition 5.1. Let $Y = \sum_{i=1}^{\ell} Y_i$ be a sum of independent r.v.s $Y_i$, each of which lies in $[0,1]$, and let $\mathbf{E}[Y] = \mu$. Then:
(i) $2\sum_{1 \le i < j \le \ell} \mathbf{E}[Y_i Y_j] \le \mu^2$.
(ii) For any $u > 0$ and any positive-probability event $Z$, $\Pr(|Y - \mu| \ge u \mid Z) \le T/u^2$, where $T$ denotes the sum $\sum_{i=1}^{\ell} \mathbf{E}[Y_i \mid Z] + 2\sum_{1 \le i < j \le \ell} \mathbf{E}[Y_i Y_j \mid Z] + \mu^2 - 2\mu \sum_{i=1}^{\ell} \mathbf{E}[Y_i \mid Z]$.
(iii) For any $u > 0$ and any positive-probability event $Z$, $\Pr(Y \ge u \mid Z) \le \big(\sum_{i=1}^{\ell} \mathbf{E}[Y_i \mid Z]\big)/u$.

Lemmas 5.2 and 5.3 and Proposition 5.1 now lead to our main theorem for PIPs/CIPs:

Theorem 5.1. (i) For PIPs, there is a fixed $c_1 > 0$ such that if $\alpha = c_1 a^{1/(B-1)}$ (or $c_1 a^{1/B}$ if $A \in \{0,1\}^{m \times n}$) and $\beta = 2$, then $\Pr(\bigwedge_{i=1}^{m+1} \overline{E_i}) > 0$ holds; i.e., there exists an integral solution of value at least $y^*/(\alpha\beta)$.
(ii) For CIPs, there is a fixed $c_2 > 0$ such that if $\alpha, \beta > 1$ are chosen as
$\alpha = c_2 \ln(a+1)/B$ and $\beta = 2$, if $\ln(a+1) \ge B$; and
$\alpha = \beta = 1 + c_2\sqrt{\ln(a+1)/B}$, if $\ln(a+1) < B$,
then there exists a feasible solution of value at most $\alpha\beta y^*$. Thus, the integrality gap is at most $1 + O(\max\{\ln(a+1)/B, \sqrt{\ln(a+1)/B}\})$.

Proof. For PIPs, let $Z \equiv \bigwedge_{i \in [m]} \overline{E_i}$; thus, $\Pr(\bigwedge_{i=1}^{m+1} \overline{E_i}) = \Pr(\overline{E_{m+1}} \mid Z)\Pr(Z)$. By Lemma 5.2(a), $\Pr(Z) \ge \prod_{i \in [m]} \Pr(\overline{E_i}) \ge (1 - p_{\rm pack})^m$. Also, $p_{\rm pack} < 1$, since $\alpha \ge c_1 > e$ for a suitable choice of $c_1$. Hence $\Pr(Z) > 0$, and thus, to show that $\Pr(\bigwedge_{i=1}^{m+1} \overline{E_i}) > 0$, it suffices to show that $\Pr(E_{m+1} \mid \bigwedge_{i \in [m]} \overline{E_i}) < 1$; similarly for CIPs.

(i) We handle PIPs first. Using (5.5) and setting $\mu := \mu_{m+1} = y^*/\alpha$, we get by Proposition 5.1(ii) (applied to $Y_i = c_i X_i$ with $u = \mu/2$) that $\Pr(E_{m+1} \mid Z)$ is at most
$$T/(\mu/2)^2, \tag{5.9}$$
where $T$ denotes the sum $\sum_{i=1}^{n} c_i \Pr(X_i = 1 \mid Z) + 2\sum_{1 \le i < j \le n} c_i c_j \Pr(X_i = X_j = 1 \mid Z) + \mu^2 - 2\mu\sum_{i=1}^{n} c_i \Pr(X_i = 1 \mid Z)$. We wish to pick $\alpha$ suitably to make (5.9) smaller than 1. Note now that for all $1 \le i < j \le n$,
$$\Pr(X_i = 1 \mid Z) \le \Pr(X_i = 1) \tag{5.10}$$
and
$$\Pr(X_i = X_j = 1 \mid Z) \le \Pr(X_i = X_j = 1), \tag{5.11}$$
by Lemma 5.2(b). More importantly, we need to lower-bound $\Pr(X_i = 1 \mid Z)$ to handle terms such as $-2\mu c_i \Pr(X_i = 1 \mid Z)$; this is done by Lemma 5.3(i). Using Lemma 5.3(i) along with (5.10), (5.11), and Proposition 5.1(i), we see that (5.9) is at most $\frac{4}{\mu^2}\big(\mu + 2\mu^2 - 2\mu^2(1 - a\,p_{\rm pack}(A, \alpha, B-1))\big)$, i.e., at most
$$4\Big(\frac{1}{\mu} + 2a\,p_{\rm pack}(A, \alpha, B-1)\Big). \tag{5.12}$$
Now, we may assume that $\mu$ is larger than a suitable constant, since there is always a feasible solution of value at least 1 for PIPs. Simple algebra then shows that choosing $\alpha > 1$ as in part (i) of the statement of the theorem makes (5.12) strictly smaller than 1, concluding the proof of part (i).

(ii) For CIPs, setting $Z \equiv \bigwedge_{i \in [m]} \overline{E_i}$ and $\mu := \mu_{m+1} = \alpha y^*$, we see from (5.6) and Proposition 5.1(iii) (with $u = \beta\mu$) that $\Pr(E_{m+1} \mid Z)$ is at most $R = \big(\sum_{i=1}^{n} c_i \Pr(X_i = 1 \mid Z)\big)/(\beta\mu)$. We again seek to make $R < 1$. Lemma 5.3(ii) shows that $R \le 1/(\beta(1 - p_{\rm cov})^a)$. It is once again simple algebra to show that choosing $\alpha, \beta > 1$ as in part (ii) of the statement of the theorem ensures that $R < 1$.

5.2 Constructivization. … implicitly defined $L \subseteq \{0,1\}^n$. Theorem 5.2 shows an approach to efficiently find some $v \in \{0,1\}^n - L$. For any function $f : [n] \to \{0,1\}$, let $f[i \mapsto j]$ be the function $g$ such that $g(k) = f(k)$ if $k \ne i$, with $g(i) = j$; for any $S \subseteq [n]$, let $Z(S, f)$ denote $\bigwedge_{i \in S} (X_i = f(i))$.

Theorem 5.2. Suppose there is an efficiently computable function $U$ that takes some $S \subseteq [n]$ and some $f : [n] \to \{0,1\}$ as inputs, produces a non-negative real output, and satisfies the following:
(a) $U(\emptyset, f) < 1$ for any $f : [n] \to \{0,1\}$;
(b) for all $S \subseteq [n]$ and all $f : [n] \to \{0,1\}$,
(b1) $U(S, f) \ge \Pr((X_1, \ldots, X_n) \in L \mid Z(S, f))$;
(b2) if $U(S, f) < 1$ but $U(S \cup \{i\}, f[i \mapsto j]) \ge 1$ holds for all $i \in [n] - S$ and all $j \in \{0,1\}$, then there is an efficiently computable $v \in \{0,1\}^n - L$.

We now present such a function for PIPs; the CIP construction will be shown in the full version. Let $X_i$, $E_i$, $p_i$, etc. be as before for PIPs, and let $\alpha$, $\beta$, and $\mu = y^*/\alpha$ be as in Theorem 5.1. We want a function $U$ as above for the $X_i$, with $L$ being the subset of $\{0,1\}^n$ corresponding to points that are infeasible for the PIP, or have objective function value smaller than $y^*/(\alpha\beta)$, or both. For $i \in [n]$, let $Q_i \subseteq [m]$ be, as before, the indices of the rows of $A$ that have a nonzero entry in the $i$th column. For $j \in Q_i$, let $V(i,j)$ denote the event "$((Az)_j - A_{ji}X_i) \le (B-1)$". The proof of Theorem 5.1 suggests a natural choice for $U$:
$$U(S, f) := 1 - F \cdot \prod_{i \in [m]} \big(1 - \Pr(E_i \mid Z(S, f))\big),$$
where $F$ is shorthand for
$$1 - \frac{4}{\mu^2}\Big((1 - 2\mu)\sum_{i \in [n]} c_i\,\mathbf{E}[X_i \mid Z(S,f)] + 2\sum_{1 \le i < j \le n} c_i c_j\,\mathbf{E}[X_i X_j \mid Z(S,f)] + \mu^2 + 2\mu\sum_{i \in [n]} c_i \sum_{j \in Q_i} \Pr(X_i = 1 \mid Z(S,f))\,\Pr(\overline{V(i,j)} \mid Z(S,f))\Big).$$
To be able to compute $\Pr(E_i \mid Z(S,f))$, $\Pr(V(i,j) \mid Z(S,f))$, etc. efficiently, we first perturb the nonzero entries of the matrix $A$ suitably, so that they all become rationals with the same denominator, which is a sufficiently large polynomial in $n$ and $m$; this can be done in a way that affects the quality of our approximation negligibly. Quantities such as $\Pr(E_i \mid Z(S,f))$ can then be computed in polynomial time using polynomial interpolation. The approach of Theorem 5.1 can be used to show that our function $U$ satisfies requirements (a) and (b1) of Theorem 5.2; that it satisfies (b2) will be shown in the full paper. Similarly, the CIP constructivization combines the approach of Theorem 5.1 with some technical results.
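The randomized rounding step for PIPs described in this section can be simulated directly. The Python sketch below is a Monte Carlo illustration, not part of the analysis: it assumes an optimal LP solution $x^*$ is already available (here a hand-checked toy instance rather than the output of an LP solver), and the function names `randomized_round`, `pip_trial`, and `estimate_success` are hypothetical. It estimates $\Pr(\bigwedge_{i=1}^{m+1} \overline{E_i})$ empirically, whereas Theorem 5.1 proves this probability is positive.

```python
import math
import random


def randomized_round(x_scaled, rng):
    """Round each x'_j to floor(x'_j) + 1 with probability equal to its fractional part."""
    z = []
    for v in x_scaled:
        s = math.floor(v)  # s_j = floor(x'_j)
        p = v - s          # p_j = x'_j - s_j, in [0, 1)
        z.append(s + (1 if rng.random() < p else 0))
    return z


def pip_trial(A, b, c, x_star, alpha, beta, rng):
    """One PIP trial: scale x* down by alpha, round, and check that no bad event occurs."""
    z = randomized_round([v / alpha for v in x_star], rng)
    y_star = sum(cj * xj for cj, xj in zip(c, x_star))
    # Bad events E_1..E_m: some packing row is violated, (Az)_i > b_i.
    feasible = all(sum(aij * zj for aij, zj in zip(row, z)) <= bi
                   for row, bi in zip(A, b))
    # Bad event E_{m+1}: the objective falls below y*/(alpha * beta).
    good_value = sum(cj * zj for cj, zj in zip(c, z)) >= y_star / (alpha * beta)
    return feasible and good_value


def estimate_success(A, b, c, x_star, alpha, beta, trials=20000, seed=0):
    """Monte Carlo estimate of Pr(no bad event) for the rounding above."""
    rng = random.Random(seed)
    wins = sum(pip_trial(A, b, c, x_star, alpha, beta, rng) for _ in range(trials))
    return wins / trials


# Toy PIP (an assumption for illustration): maximize x1+x2+x3+x4 subject to
# x1+x2 <= 1, x2+x3 <= 1, x3+x4 <= 1; an LP optimum is x* = (1/2, 1/2, 1/2, 1/2), y* = 2.
A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
b = [1, 1, 1]
c = [1, 1, 1, 1]
x_star = [0.5, 0.5, 0.5, 0.5]

print(estimate_success(A, b, c, x_star, alpha=2, beta=2))
```

For this instance with $\alpha = \beta = 2$, each $z_j$ is 1 with probability $1/4$, and the rounding succeeds exactly when the chosen coordinates form a nonempty independent set of the path on four vertices, an event of probability $135/256 \approx 0.527$; the printed estimate should be close to this value.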
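The "simple algebra" that ends each part of the proof of Theorem 5.1 can be checked numerically for concrete parameters. The sketch below evaluates $p_{\rm pack}$ (Definition 5.1), $p_{\rm cov}$ (Lemma 5.1(ii)), the quantity (5.12), and the bound $R \le 1/(\beta(1-p_{\rm cov})^a)$. The constants $c_1 = 10$ and $c_2 = 4$, the sample values of $a$, $B$, and $\mu$, and the helper names are illustrative assumptions; the theorem only asserts that suitable fixed constants exist, and this sketch does not determine them.

```python
import math


def p_pack(alpha, lam, zero_one):
    """Definition 5.1: (e/alpha)^(lam+1) if the matrix is 0/1, else (e/alpha)^lam.

    `lam` is a hypothetical name for the third argument of p_pack.
    """
    return (math.e / alpha) ** (lam + 1 if zero_one else lam)


def p_cov(alpha, B):
    """Lemma 5.1(ii): exp(-B (alpha - 1)^2 / (2 alpha))."""
    return math.exp(-B * (alpha - 1) ** 2 / (2 * alpha))


def pip_alpha(a, B, c1, zero_one):
    """Theorem 5.1(i): alpha = c1 * a^(1/(B-1)), or c1 * a^(1/B) for 0/1 matrices."""
    # For a non-0/1 matrix this formula needs B > 1.
    return c1 * a ** (1 / B if zero_one else 1 / (B - 1))


def pip_bound(mu, a, alpha, B, zero_one):
    """The quantity (5.12): 4 (1/mu + 2 a p_pack(A, alpha, B - 1))."""
    return 4 * (1 / mu + 2 * a * p_pack(alpha, B - 1, zero_one))


def cip_params(a, B, c2):
    """Theorem 5.1(ii): (alpha, beta) in the two regimes ln(a+1) >= B and ln(a+1) < B."""
    r = math.log(a + 1) / B
    if r >= 1:
        return c2 * r, 2.0
    t = 1 + c2 * math.sqrt(r)
    return t, t


def cip_bound(a, alpha, beta, B):
    """Bound on R from Lemma 5.3(ii): 1 / (beta (1 - p_cov)^a)."""
    return 1 / (beta * (1 - p_cov(alpha, B)) ** a)


alpha = pip_alpha(a=100, B=3, c1=10, zero_one=False)               # 100.0
print(pip_bound(mu=20, a=100, alpha=alpha, B=3, zero_one=False))   # about 0.79, below 1
alpha, beta = cip_params(a=100, B=20, c2=4)                         # alpha = beta, about 2.92
print(cip_bound(a=100, alpha=alpha, beta=beta, B=20))               # about 0.34, below 1
```

Both printed values fall below 1, matching the requirements "(5.12) < 1" and "$R < 1$" in the proof for these particular choices; other values of $a$ and $B$ may need larger constants.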
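Theorem 5.2 is a form of the method of conditional probabilities: the $X_i$ are fixed one at a time while the estimator is kept below 1. The Python sketch below assumes $U$ is supplied as a callable on $(S, f)$; the names `derandomize`, `exact_U`, and `in_L` are hypothetical. For testing, it uses the exact conditional probability $\Pr(X \in L \mid Z(S,f))$ computed by brute-force enumeration (exponential in $n$, so toy sizes only). This exact $U$ satisfies (a) and (b1) whenever $\Pr(L) < 1$ and, being an average over the two values of each $X_i$, never gets stuck; it is not the pessimistic estimator for PIPs defined above, and the (b2) case is not implemented.

```python
import itertools


def derandomize(n, U):
    """Fix X_1..X_n one at a time, keeping the estimator U below 1.

    U(S, f) is assumed to satisfy (a) and (b1) of Theorem 5.2; S is a frozenset
    of fixed indices and f a tuple of 0/1 values (entries outside S are ignored).
    """
    f = [0] * n
    S = frozenset()
    if U(S, tuple(f)) >= 1:
        raise ValueError("requirement (a) fails: U(empty set, f) >= 1")
    for i in range(n):
        candidates = []
        for j in (0, 1):
            g = list(f)
            g[i] = j
            candidates.append((U(S | {i}, tuple(g)), j))
        value, j = min(candidates)
        if value >= 1:
            # This is the situation covered by (b2); its procedure is not sketched here.
            raise RuntimeError("every extension has U >= 1")
        f[i] = j
        S = S | {i}
    # By (b1), U([n], f) >= Pr(f in L), which is 0 or 1; since U < 1, f is not in L.
    return tuple(f)


def exact_U(p, in_L):
    """Exact Pr(X in L | Z(S, f)) for independent X_i ~ Bernoulli(p_i), by enumeration."""
    n = len(p)

    def U(S, f):
        free = [i for i in range(n) if i not in S]
        total = 0.0
        for bits in itertools.product((0, 1), repeat=len(free)):
            x = list(f)
            weight = 1.0
            for i, bit in zip(free, bits):
                x[i] = bit
                weight *= p[i] if bit else 1 - p[i]
            if in_L(tuple(x)):
                total += weight
        return total

    return U


# Toy PIP (an assumption): rows x1+x2, x2+x3, x3+x4 <= 1, objective x1+...+x4,
# target value 0.5, and p_i = 1/4 for every i.
A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]


def in_L(x):
    infeasible = any(sum(a * xi for a, xi in zip(row, x)) > 1 for row in A)
    return infeasible or sum(x) < 0.5


U = exact_U([0.25] * 4, in_L)
v = derandomize(4, U)
print(v, in_L(v))  # prints (1, 0, 0, 0) False
```

Here $U(\emptyset, f) = \Pr(L) = 121/256 < 1$, so requirement (a) holds and the greedy loop ends at a point outside $L$.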