>> Jana Kulkarni: Welcome everyone. It's a pleasure to have David Harris from Maryland. Today he's going to tell us about a new randomized rounding algorithm based on the partial resampling lemma.

>> David Harris: This is joint work with Aravind Srinivasan on partial resampling. We're going to consider two types of integer programs in this talk. The simplest one to describe, and I'll spend most of the time on it because it's a little bit easier, is the integer covering problem. You have n integer variables and some linear covering constraints on them. All of the coefficients are positive and you have a positive right-hand side. You can scale each constraint so that the coefficients are all in the range 0 to 1. You want to satisfy these constraints and you want to minimize some linear objective function c dot x. You can think of this as a weighted generalization of set cover. In a set cover instance you are given a collection of sets and you want to find a subcollection that covers the entire ground set. Each element of the ground set gives you a covering constraint, but the coefficients are all just 0 or 1. In set cover you want the smallest cover, so the objective function is just the sum of the variables with coefficient 1. In the integer covering problem you can have other weights in the objective function and other weights in the constraints.

>>: [indiscernible] I just didn't read the whole thing.

>> David Harris: Sure. The other integer programming problem I want to consider is the assignment packing problem. You have variables x1 through xn, but these are categorical variables: each one takes on a value in some set J1 through Jn instead of being an integer. You have linear packing constraints with coefficients A_kij, where each term is multiplied by an indicator variable for whether variable i takes on value j; this is just Iverson bracket notation, 1 if variable i takes on value j and 0 otherwise. These are all packing constraints, but you also have the assignment constraint that every variable has to take on a value from its set. You want to find values for the variables that satisfy all of the constraints. There is not necessarily an objective function here; you just want a feasible solution. So you have packing constraints, but you also have assignment constraints because you need to assign every variable one value from its set. You can't solve these problems exactly, but you can approximate them, and there are a lot of different ways to talk about approximating these types of integer programs. We'll talk mostly about the integer covering problem because it's a lot simpler to describe. One scheme is that you have to satisfy all of the constraints exactly, but you want to get the objective function as close to the optimal one as possible. There are other types of approximation where you might approximately satisfy the covering constraints and so on, but we'll just talk about the variant where you satisfy the covering constraints exactly and the objective function approximately. One type of approximation algorithm is based on the LP relaxation followed by randomized rounding paradigm.
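For reference, the two problems just described can be written out as follows; the notation (a_ki, a_k for the covering system, R_k for the packing right-hand sides) is chosen to match the verbal description and is otherwise an assumption.

```latex
\textbf{Integer covering:}\quad
\min\; c \cdot x
\quad\text{s.t.}\quad
\sum_{i=1}^{n} a_{ki}\, x_i \;\ge\; a_k \quad (k = 1,\dots,m),
\qquad x_i \in \mathbb{Z}_{\ge 0},\; a_{ki} \in [0,1].

\textbf{Assignment packing:}\quad
\text{find } x_1 \in J_1, \dots, x_n \in J_n
\quad\text{s.t.}\quad
\sum_{i=1}^{n} \sum_{j \in J_i} A_{kij}\, [x_i = j] \;\le\; R_k \quad (k = 1,\dots,m).
```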
The first step is you replace the constraint that the variables have to be integers with the constraint that they have to be nonnegative real numbers. If you do that you get a linear program, which you can solve exactly, and you get a fractional solution whose value is at most the optimal integer value. The next step is to find an integer value for the variables that keeps the objective function close to the fractional value. We're actually going to do something a little more general. We're going to pick a random process with the property that for any individual variable, the expected value of x_i, the integer value, is at most some parameter beta times the fractional value x hat i. This will be true for each variable individually. In that case it's automatically true that the expected value of the objective function is at most beta times the optimum. You automatically get a beta approximation algorithm, but it's an oblivious approximation algorithm, because when you are running this algorithm you don't actually need to know what the objective function is.

>>: [indiscernible]

>> David Harris: Yes. The x_i satisfy the feasibility constraints with probability 1 and they have this expected value property, which automatically gives you an oblivious approximation algorithm, at least in expectation; by repeating the algorithm multiple times you can get very close to the expected value in actuality, so I won't dwell on that issue. Getting a good approximation ratio in expectation will be good enough for us. The simplest randomized rounding scheme is that you just draw the variables independently as Bernoulli with probability alpha times x hat i, which is slightly bigger than x hat i. I'm going to assume here that all of the values x hat i are very small, so that you can multiply them by small constant factors without worrying about them becoming bigger than 1 and no longer being probabilities. It turns out that that's the hardest case to deal with, and reducing the general case to it is kind of cumbersome, so I won't get into that now. I'll just assume that x hat i is small. If you do this, then for any individual covering constraint the expected value of the left-hand side is at least alpha times a_k, because the fractional solution satisfies that constraint with right-hand side a_k. So you have a sum of independent 0-1 variables with mean alpha a_k, and you want to know the probability that it is actually at least a_k, not just in expectation.

>>: Greater than or equal.

>> David Harris: Greater than or equal to a_k, yeah. You can use the standard Chernoff bound, and if you do this and you set the parameter alpha to be about 1 plus log m over a_min, the minimum value of a on the right-hand side, plus a square root term, then you have to remember that either a_min or m could be big. It's possible that a_min is going to infinity. In the case that a_min is very big and m is small, the square root term becomes the dominant one and you get an approximation factor close to 1. If you set alpha to this value, then all of the constraints are satisfied with high probability, and you can show that the expected value of the x_i, given that the constraints are satisfied, is still close to alpha, so you get this same kind of approximation ratio: 1 plus a term which is like log m over a_min, plus the square root of that same value. This is the standard Chernoff bound, standard randomized rounding.
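As a concrete illustration of this baseline (not something shown in the talk), here is a minimal sketch in Python, assuming scipy is available; the toy data and the exact formula used for alpha are my own assumptions based on the description above, and only one rounding trial is shown, which in the analysis succeeds with high probability.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy covering instance (made-up data): minimize c.x s.t. A x >= a, x integer >= 0,
# with coefficients scaled into [0, 1].
A = np.array([[1.0, 0.4, 0.0, 0.7],
              [0.2, 1.0, 0.6, 0.0],
              [0.0, 0.3, 1.0, 0.5]])
a = np.array([1.0, 1.2, 1.0])
c = np.array([1.0, 2.0, 1.0, 1.5])

# Step 1: LP relaxation (covering constraints A x >= a become -A x <= -a for linprog).
lp = linprog(c, A_ub=-A, b_ub=-a, bounds=[(0, None)] * A.shape[1], method="highs")
x_hat = lp.x                                    # fractional solution, value <= OPT

# Step 2: independent randomized rounding with inflated probabilities alpha * x_hat.
m, a_min = A.shape[0], a.min()
alpha = 1.0 + np.log(m) / a_min + np.sqrt(np.log(m) / a_min)   # assumed form of alpha
p = np.minimum(alpha * x_hat, 1.0)
x = (rng.random(len(p)) < p).astype(int)        # one independent Bernoulli rounding trial
print("constraints satisfied:", bool(np.all(A @ x >= a)))
print("rounded objective:", c @ x, "LP value:", lp.fun)
```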
One problem with this type of approximation algorithm is that the approximation ratio depends on the overall number of constraints, which can go to infinity as the problem size becomes big. What you would often like is a scale-free approximation ratio, one which does not depend on the overall size of the system but only on its structural properties. One very common way of getting this is when the system is column sparse, that is, every variable appears in relatively few constraints. There are two common ways you could measure how column sparse the system is: in terms of the L0 or L1 norms of the columns. The L0 norm is just the number of nonzero entries in a column and the L1 norm is the sum of the coefficients in a column. Remember we have scaled all of the coefficients into the range 0 to 1, so the L1 norm is always at most the L0 norm; it's possible to have systems where the L1 norm is much smaller, and both can be much smaller than m. So can you get an approximation ratio that is a function of these column sparsity measures, not the overall system size? There is previous work by Srinivasan which gave an approximation algorithm based on a random process analyzed using the FKG inequality. I won't get into a lot of detail here, but it gives you an approximation ratio of this form. There is an error on the slide: there should be an extra term of log of a_min over a_min, which I left off the slide. That gives this approximation ratio. The work of Srinivasan was not based on the Lovasz Local Lemma, but the Lovasz Local Lemma is another very standard technique for getting these kinds of scale-free approximation ratios, and you could use that tool to get a similar approximation ratio, although that was not the approach taken by Srinivasan. Let's review the basic form of the Lovasz Local Lemma and how it would apply to this problem. In the Lovasz Local Lemma you have bad events in some probability space. In our context a bad event would be that one of our covering constraints is violated. These bad events depend on a subset of the variables; here the variables are the integer variables which you are drawing as Bernoulli p_i, and you have a separate bad event for every covering constraint, namely that the sum of the variables is less than a_k, which it is not supposed to be. The key property in understanding the local lemma is deciding when bad events affect each other, and in the local lemma, bad events affect each other if they overlap on a variable, if there is a common variable that affects both of them. One thing you have to be careful of in the local lemma context is that this is a very binary classification of whether a variable affects a bad event or not. If the bad event is a function of that variable, then that variable affects the bad event even if it hardly ever affects it, even if the amount of the effect is very small; the local lemma just asks whether the variable affects the bad event at all. You can imagine a system where all of the coefficients are nonzero but tiny. In that case every variable is affecting every constraint and so everything overlaps with everything else. This is why, if you use the local lemma, you always get an approximation ratio that is phrased in terms of the L0 norm of a column, the number of nonzero entries.
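In code, the two column sparsity measures just described are simply the following; the names delta0 and delta1 are my shorthand for the maximum L0 and L1 column norms.

```python
import numpy as np

def column_sparsity(A):
    """Maximum L0 and L1 norms over the columns of the constraint matrix A.

    With all coefficients scaled into [0, 1], delta1 <= delta0 <= m (number of rows).
    """
    delta0 = int(np.count_nonzero(A, axis=0).max())   # most constraints any variable appears in
    delta1 = float(A.sum(axis=0).max())               # largest column sum of coefficients
    return delta0, delta1
```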
An entry which is very small but nonzero is, from the point of view of the local lemma, affecting the constraint just as much as if the coefficient were big, even though if the coefficient is very small you might think that heuristically it shouldn't really matter for that constraint. That's why you get these delta zero terms in the approximation ratio if you use the local lemma. All right. The local lemma by itself is not constructive. It only shows that there is a positive, possibly very small, probability that you satisfy all of the constraints; it's not an algorithm. You can turn it into an algorithm using the framework of Moser and Tardos, which turns almost all applications of the local lemma into constructive algorithms, and you could use it for this problem just like you could use it for everything else. It would basically work like this. You begin by drawing all of your variables from their original distribution Bernoulli p_i, and if you find some covering constraint is violated, that is, the sum of the variables is less than the right-hand side a_k, then for every nonzero coefficient a_ki you draw x_i from its original distribution again. If a_ki is zero you don't change its value; you just leave it alone. And if you set alpha to that same value, this algorithm converges and gives that approximation ratio. I just want to talk heuristically about why this algorithm, even though it is the generic way of making the local lemma constructive, doesn't really make sense for this problem. Suppose you come to a violated constraint, some covering constraint k. If x_i is equal to 1, then the algorithm says you still might need to resample that variable if the coefficient is nonzero. But why? If x_i is 1, then that variable is helping that constraint be satisfied. You are going in the opposite direction of progress if you resample it and set it to zero; that variable is helping you, so you shouldn't change it. And if x_i is zero, then x_i is probably not really at fault for violating that constraint; it probably didn't have a very big effect on it. Most of the variables were maybe expected to be equal to zero anyway, so that variable is probably not causing the constraint to be violated. You could think of the guilty variables, the ones that are causing the violation, as the difference between the actual number of variables equal to 1 and the expected number. That's why the constraint is violated: you had fewer variables being 1 than you expected. And a typical deviation from the mean is only about the square root of a_k. So you should only be resampling maybe square root of a_k of the variables, not all a_k of them. The Moser-Tardos algorithm is really resampling way too many variables per constraint. Instead of resampling all of the variables, we'll use partial resampling. This is actually a very general framework which extends the local lemma and can be applied to many problems involving Latin transversals, packet routing, et cetera. I just want to describe how it applies to the integer covering problem, where I don't really need to get into the full generality of the framework. For this particular application, here is how you would apply partial resampling. Again, you draw x1 through xn from the original distribution.
If you come to some constraint k that's violated, you do this. If x_i is equal to one, you just leave it alone; it's helping you, so don't mess with it. If x_i is equal to zero, you resample it, but you don't draw it with the original probability p_i. You draw it with a smaller probability. This probability depends linearly on the coefficient a_ki, and it is also multiplied by another scaling parameter sigma. You can see that if the coefficient is zero you never resample it, just like in the local lemma, but this smoothly interpolates between a zero coefficient and a coefficient of one. Also, you can see that the value of x_i is always increasing over time: you never change a one to a zero, only a zero to a one. So this algorithm obviously terminates, and it certainly satisfies all of the covering constraints in the end. The only question is what the expected value of x_i is at the end of the process. We will show that the probability that a variable ends up equal to one is a small multiple of its fractional value, which automatically gives us this type of approximation ratio. We're going to analyze this algorithm in a kind of strange way. If we come to a constraint k that is violated, the algorithm says you resample x_i with probability sigma times a_ki times p_i. Instead of thinking of this as drawing x_i as a Bernoulli random variable with that probability, you think of it as a two-step process. You have a set of variables, Y, and each variable i goes into this set Y with probability sigma times a_ki. Then you look at all of the variables in Y and you draw them as new Bernoulli variables with probability p_i. This two-step process is obviously equivalent, but this way of thinking about it will be very important to analyzing the algorithm, even though it's kind of weird that you are breaking it apart for no apparent reason. Our goal is to get an upper bound on the probability that x_i is equal to one. In order to do that, we are going to construct a kind of witness which explains why you set x_i equal to one. This witness is going to be a structure which is the explanation for that variable. Then you take a union bound over all possible witnesses, and the expected value of x_i is at most the sum, over all of these witnesses, of the probability of seeing that particular witness. This is the same proof strategy as for the original Moser-Tardos algorithm, but our witnesses will be much simpler than theirs.
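Here is a minimal sketch in Python of the partial resampling rounding just described, in its collapsed one-step form (a zero variable flips to one with probability sigma times a_ki times p_i). The function name, the toy default for sigma, and the feasibility assumption in the comments are my own, since the talk sets the parameters by an optimization it doesn't spell out.

```python
import numpy as np

def partial_resampling_cover(A, a, p, sigma=0.5, rng=np.random.default_rng()):
    """Round a covering instance A x >= a by partial resampling (illustrative sketch).

    p is the vector of starting probabilities (roughly alpha times the fractional
    solution). Assumes that setting x_i = 1 for every variable with p_i > 0 satisfies
    all constraints, so the monotone process below terminates with probability 1.
    """
    n = len(p)
    x = (rng.random(n) < p).astype(int)          # initial independent draw
    while True:
        violated = np.flatnonzero(A @ x < a)
        if violated.size == 0:
            return x                             # every covering constraint holds
        k = violated[0]
        for i in range(n):
            # Variables already at 1 are helping this constraint; never touch them.
            # A zero variable flips to 1 with the reduced probability sigma*a_ki*p_i.
            if x[i] == 0 and rng.random() < sigma * A[k, i] * p[i]:
                x[i] = 1
```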
Before I talk about the witnesses for this algorithm, I'm going to motivate the approach by talking about how you would do witnesses for a standard Chernoff bound, not for any resampling algorithm, just the Chernoff bound for the lower tail. Suppose you have n independent Bernoulli p variables and [indiscernible] their mean, and you want to bound the lower tail: the probability that the sum of these is at most t, where t is something smaller than the mean. Consider the following process. If the sum of the variables Z is at most t, you mark a subset of the variables. You mark them how? If z_i is equal to zero, then it gets marked independently with probability sigma; otherwise you do not mark it. Since you only mark any variables at all when the sum of the variables is at most t, you have this obvious equality: the probability that the sum is at most t is the sum, over all 2^n possible subsets v, of the probability that v is the marked set of variables. You can think of v as a witness that the sum was too small. Now consider some fixed subset v of the n variables. The following are necessary conditions for v to be the marked set. First, any variable inside v had to have z_i equal to zero, and that has probability one minus p to the size of v. Second, any i in v had to be marked; that has probability sigma to the size of v. The third condition is that any variable that is not in v but is equal to zero must have been unmarked; otherwise you would have put it into v. So for the last term you take the product of one minus sigma over all variables which were equal to zero but were not in v. The key point here is that there have to be at least n minus t minus the size of v of them, because otherwise you would not have had the sum of the z_i at most t. So this last term is at most one minus sigma to the power of n minus t minus the size of v. The overall probability that v was the marked set is at most the product of those three terms. If you sum over all v, you get that the probability that the sum of Z is at most t is at most that expression there. That bound is valid for any sigma in the range zero to one, so we can optimize it, and when we do and do some further calculus we get the following expression, which is the classical Chernoff bound. So you have given a witness-based proof of the standard Chernoff [indiscernible] bound. You should really keep this example in mind as we talk about other witness structures which are more complex. We need to give this kind of witness bound not just for the event that the initial values of the variables failed to satisfy some constraint, but for the event that the values of the variables after multiple rounds of resampling failed to satisfy some covering constraint. This is going to be a more complicated witness, but the same intuition will apply. For any constraint you can list all the resampled sets for that constraint. Remember that we have this two-step process: first we choose a set Y, and then for every variable in Y we draw the variable with probability p_i. So Y_k1, Y_k2, and so on are the resampled sets, the sets you draw in the first step of this two-step process. There are two ways you could have ended up with some variable x_i equal to one. First, in the very first step of the algorithm you could have drawn x_i equal to one; in that case your witness structure will just be the empty list, the null structure. The second way is that during the Lth resampling of some constraint k you set x_i equal to one for the first time. In that case the witness will be the list of sets Y_k1 through Y_kL, and you necessarily have variable i inside Y_kL, because otherwise the variable x_i would not have changed during that resampling. So the witness will be that list of sets. Yes?

>>: You see on the [indiscernible] k is the constraint. What is the 1 to?

>> David Harris: You might need to do multiple rounds of resampling in order to fix the constraint.

>>: [indiscernible]

>> David Harris: Yes. For simplicity, in order to explain this algorithm, let's just say L is equal to one.
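Written out, the warm-up marking argument gives the following bound for t below the mean np (a sketch; the middle equality is just the binomial theorem, and the optimization over sigma is the calculus step the talk skips):

```latex
\Pr[z_1 + \cdots + z_n \le t]
  \;\le\; \sum_{v \subseteq [n]} (1-p)^{|v|}\,\sigma^{|v|}\,(1-\sigma)^{\,n-t-|v|}
  \;=\; (1-\sigma)^{-t}\,(1-\sigma p)^{n},

\text{and optimizing over } \sigma \in (0,1)\text{:}\quad
\Pr[z_1 + \cdots + z_n \le t]
  \;\le\; \Bigl(\tfrac{np}{t}\Bigr)^{t}\Bigl(\tfrac{n(1-p)}{n-t}\Bigr)^{n-t}
  \;=\; e^{-n\,D(t/n \,\|\, p)},
```

which is the classical lower-tail Chernoff bound, with D the binary Kullback-Leibler divergence.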
We are dealing with the simplest type of structure, which is a single set for some constraint k. You have a potential witness which is just this one set. Fix some set Z_k1. What is the probability that the actual witness you generate for this variable is equal to this fixed value Z_k1? Y_k1 is a random variable; Z_k1 is just some fixed value, a subset of the variables. The following events are necessary in order to have Y_k1 equal to this fixed set Z. First, after the first resampling of that constraint k, you must set x_i equal to one. That's just what we said: the length of the witness list was defined by the time at which variable x_i became equal to one. Second, for any variable j in that set, you must have set x_j equal to zero initially, during the initial sampling of the variables. Why? Because if x_j is equal to one initially, then it will never be resampled, and in particular it cannot be resampled during the first resampling of constraint k. Third, that set Z was chosen as the first resampled set for that constraint k. Those are all necessary conditions in order to have this particular witness structure. So, writing those conditions again: the first event has probability p_i, because every time you fix a sampled set, the variables in it are drawn independently from their distributions p_i. For any variable inside Z, the second event has probability one minus p_j, and those are all independent because they are all based on the initial sampling. For the third event, you consider the current values of the variables at the time you resampled that constraint, not their initial values, but whatever values they had just when you were about to resample. Letting v_1 through v_n denote those current values: if v_j is zero, then j goes into the set Z with probability sigma a_kj, and if v_j is equal to one then it cannot go into Z. So the probability that you set Y_k1 equal to Z is this product in terms of the current values of the variables: any variable which is equal to zero goes into Z with probability sigma times a_kj, and if it's equal to one it definitely does not go into Z. Then you make the key observation that the constraint k is not currently satisfied, which is why it is being resampled. So the sum of a_kj v_j is less than a_k at the time it was resampled. If you plug that into this expression, you see that this term can be upper bounded using a factor of one minus sigma to the minus a_k. If you put this all together, you see that the total probability of encountering this witness structure is at most the probability you see here. This is just one particular type of witness structure, the simplest type, with a single set in it. Now if you sum over all possible witness structures, allowing k and L to vary, you get this expression here. You see that you have the sum of the a_ki, so the bound you can plug in is the L1 norm of that column. In the last line you just have to choose sigma and alpha carefully; you basically optimize them to minimize that expression, which is kind of involved but nothing too interesting. You get this dependence on delta one instead of delta zero because you actually have the sum of the a_ki. In fact, the constant on that first term is equal to one, so the ratio is one plus that term itself, not one plus some constant times it.
So you get this bound on the expected value of any variable, and that automatically gives you the same approximation ratio that we talked about before. Let's see if we can get any lower bounds on this approximation ratio. How close is it to optimal? For one thing, when the minimum value a_min goes to infinity, the approximation ratio is basically on the order of one plus the square root of the natural log of delta one plus one, over a_min. So how close is that? That's one half of the asymptotics. The other half is what happens when delta is very large. In that case you get an approximation ratio which is basically one minus some little-o of one term, times the natural log of delta one plus one, over a_min. That actually is optimal, by a reduction from set cover: the hardness of approximating set cover shows you that at least the first-order term is optimal, including the constant. But that kind of hardness is really vacuous when a_min goes to infinity. The hardness of set cover gives you nothing when a_min is large, because you would be saying that the hardness is some value less than one, which is meaningless; the approximation ratio can't be better than one. Previously there were no nontrivial bounds known in the regime when a_min is much larger than delta.

>>: [indiscernible]

>> David Harris: Yes. Here is actually the construction of an integrality gap for the case when a_min is large. You can consider the following integer covering system. You index both the variables i and the constraints k by vectors over GF(2) to the n, and the constraint for k is that the sum over all i which are perpendicular to k — that should say such that k dot i is equal to zero, I left that off my slide — of those x_i is at least equal to a. So you have 2 to the n constraints and 2 to the n variables, and the objective function is just the symmetric one where you sum over all of the x_i. There is a very simple fractional solution where you just set every x_i equal to a over 2^(n-1). There was a previous analysis by Vazirani, which was really only targeted at the case when a is equal to one, the simplest case, which showed that any integral solution has to satisfy that the sum of the x_i is at least equal to n. This gives you an integrality gap for these types of integer covering systems, but this analysis…

>>: [indiscernible]

>> David Harris: I don't know, sorry. This analysis is not helpful when a is large, because the integrality gap it claims is not even bigger than one. So one result we have is that any integral solution actually has to satisfy a stronger condition, which is that the sum has to be at least equal to 2a plus Omega of n. What this basically amounts to showing is that any sparse Boolean function has to have a large Fourier coefficient — Fourier coefficient meaning over [indiscernible] 2 to the n. This shows you that this covering system actually has an integrality gap of one plus something on the order of log of delta one plus one, over a_min. So our approximation algorithm is almost optimal. It has a square root instead of a linear dependence on that term, so it's close, but there is an interesting polynomial gap there. But this is the first nontrivial bound at all for what happens in the regime when a_min becomes large.
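For concreteness, here is a toy reconstruction in Python of the gap instance just described (my own code, small n only): constraint k sums the variables x_i over all i with k dot i = 0 over GF(2), and the uniform fractional solution x_i = a / 2^(n-1) can be checked to be feasible with objective value 2a.

```python
import itertools
import numpy as np

def gf2_gap_instance(n, a):
    """Covering system over GF(2)^n: constraint k asks sum_{i : <i,k> = 0 mod 2} x_i >= a."""
    vecs = list(itertools.product([0, 1], repeat=n))
    A = np.array([[1.0 if sum(ki * ii for ki, ii in zip(k, i)) % 2 == 0 else 0.0
                   for i in vecs]
                  for k in vecs])
    return A, np.full(len(vecs), float(a))

n, a = 4, 3
A, b = gf2_gap_instance(n, a)
x_frac = np.full(2 ** n, a / 2 ** (n - 1))   # uniform fractional solution
print(np.all(A @ x_frac >= b - 1e-9))        # True: every constraint is met fractionally
print(x_frac.sum())                          # fractional objective is 2a; the talk's bound
                                             # says any integral solution needs 2a + Omega(n)
```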
Another interesting issue to talk about with this algorithm is what happens when you have multiple objective functions and some method of balancing them. Let's say you have L different objective functions and you want to minimize the max of them. You can solve the fractional relaxation of that, however you decide to balance them; if you decide to minimize the max, that can still be solved using a linear program. Can you then get a solution in which all L objective functions are simultaneously close to their fractional values? This is one way in which other types of algorithms for set cover don't really extend. The other main algorithm for set cover is the greedy algorithm, where you always choose the set which increases your coverage the most in a single time step. But it's not even really clear how you define a greedy criterion if you have multiple objectives. You need to boil all of the objectives down into a single number in order to decide which variable to accept, and there is no obvious way to do that. So the greedy algorithm has a real hard time even getting started on these types of multi-objective problems. The LP-based solutions can handle this much more cleanly. We can show that not only is the expected value of c dot x close to the fractional value, but it's actually pretty concentrated, in a very similar way to how a Chernoff bound would be concentrated: concentration around the value beta times c_l dot x hat. So with high probability you have that for every individual objective, c_l dot x is close to beta times c_l dot x hat, and so with high probability all of the objective functions are simultaneously close to their means. This doesn't follow just from the expected value property of the variables. The way you do this is you show a bound on the correlations, a bound on products of monomials. We previously showed that the probability that any individual variable is equal to one is at most this term rho_i, where rho_i is equal to the probability plus some small approximation [indiscernible]. In fact, we can show that for any subset of variables, the probability that they are all simultaneously equal to one is at most the product of the rho_i's. This is basically the same as it would be if they were independently drawn as Bernoulli with probability rho_i, and in particular this type of monomial product property is enough to give you Chernoff upper tail bounds. The way you show this is that, given the set R, you build a witness for the event that all of these variables are equal to one, not just a witness that an individual variable is equal to one. And the way you do that is, for each variable that was equal to one, you find the resampling of which constraint made it equal to one, and you list all of the resampled sets for those constraints up to the time that variable became equal to one. Some of these lists might appear twice, because you might have two different variables which became equal to one on different resamplings of the same constraint, but that's okay; you just list both of them. And you can do a very similar thing where you sum over witness structures to get this joint probability property.
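Written out, the correlation bound just described has roughly the following form (a sketch based on the verbal description; the precise definition of rho_i is in the paper):

```latex
\Pr\bigl[\, x_i = 1 \ \text{for all } i \in R \,\bigr] \;\le\; \prod_{i \in R} \rho_i ,
```

which is exactly what independent Bernoulli(rho_i) variables would satisfy, and this kind of bound on every monomial is what drives the Chernoff-type upper-tail concentration for each objective c_l dot x.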
Another extension we could talk about is multiplicity constraints. In the statement of the problem I just said that x_i has to be an integer of unbounded size, but you could also consider a version in which there are constraints on the sizes of the variables: not an integer of unbounded size, but with some upper bound d_i. These are called multiplicity constraints. They can be easily incorporated into the linear relaxation. They are not very easy to incorporate into a greedy algorithm, but the LP-based approach can easily put them in. But can you still get a good approximation ratio while trying to preserve these multiplicity constraints? If you analyze the algorithm straightforwardly, you will see that the solution sizes can be much bigger than the fractional values. I was only talking about the case where the fractional values are all small, but it turns out that the reduction from the general case to the case where they are all small can lose a lot in the solution size. Kolliopoulos and Young gave an algorithm in which you don't respect them exactly: you may violate these constraints, compared to the optimal solution, by a one plus epsilon factor, and if you do so you get an approximation ratio which is basically on the order of one over epsilon squared. If you want to satisfy them exactly you can do it, but then you get an approximation ratio that doesn't depend on a at all, only on the log of delta zero. We show that if you just modify a few parameters of our algorithm — don't make any other changes, just change alpha and sigma to new values — then you can still get this expected value property, where the approximation ratio is only inflated by a factor of one over epsilon. You can see that this improves on the result of Kolliopoulos and Young in a lot of different ways: these deltas should be delta ones, so you get delta one instead of delta zero, you get one over epsilon instead of one over epsilon squared, and so on. In fact, you can show a hardness result on the order of the natural log of delta one plus one, over a_min times epsilon, so this is essentially the optimal approximation ratio for the case when you have these types of multiplicity constraints. Now, I've talked a lot about the integer covering problem. I'll talk about how to extend these types of analysis to the assignment packing problem. Recall that in the assignment packing problem you have variables which take on values in some sets J_i, and you have these constraints which are all sums of indicator variables for those variables, and you want to find some values x which nearly satisfy the constraints. The first step is that you can find a fractional solution which satisfies all of the constraints, if an integer solution exists. If you use a simple application of the local lemma plus some randomized rounding, you'll get an approximation ratio that looks like this, where you have delta zeros — again, the same issue that if a coefficient is nonzero but tiny, then to the local lemma it means that variable affects the constraint. So again, you could use the Moser-Tardos algorithm to get an algorithmic form of what the local lemma gives you, and again you have the same issue. Suppose you come to some violated constraint. The straightforward application of the Moser-Tardos algorithm would say that any variable with a nonzero coefficient has to be resampled. But that doesn't make any sense heuristically, because if the coefficient is tiny then that variable has almost no effect, so you probably shouldn't be resampling it.
And if that variable takes on a value different from the bad one, then that variable is helping you, so why are you changing it? You can use a similar type of partial resampling. If you have a violated constraint, what you should do is choose about the square root of R variables to resample, where R is the right-hand side of this assignment packing constraint. And you shouldn't choose the set uniformly at random among all sets of that cardinality; the probability of choosing any set should be proportional to the product of the corresponding coefficients. Once you choose this resampled set, you draw those variables from the original distribution. This avoids, at least heuristically, all the problems of the Moser-Tardos algorithm. Again, why about square root of R? It's the difference between what you would expect a bad event to look like and what the expectation is. The expected number of variables that are going to be true in this constraint is about R, because that's what the expected value of the sum was. And usually, if a sum of random variables is bigger than its mean, it's about a standard deviation above its mean, so only about square root of R of the variables are being worse than you'd expect. Those are the only guilty variables that are really causing the problem. That is the intuition behind why you see square root of R.

>>: [indiscernible]

>> David Harris: Yeah, but if you just think of a typical bad constraint, it's probably typically bad because it's off by about square root of R. This is just the intuition.

>>: [indiscernible] depending on the violation? More variables [indiscernible]

>> David Harris: No, you don't need to pick more depending on the violation. The number you pick is optimized for the smallest violation. If it's a bigger violation you can just ignore the extra variables. You can show that this algorithm terminates with probability one after a small number of resamplings, and you get values of x_i which don't satisfy the original constraints exactly, but with a small discrepancy term, which looks like this, where again you get the L1 norm instead of the L0 norm. If you look closely you can also see that in the second term you've got the square root of R times log delta one instead of the square root of R times log R, so you are also saving that term, which is relevant when R is much bigger than delta.
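A minimal sketch of that resampling step, with some details filled in as assumptions: I take "the corresponding coefficients" to mean the coefficients of the variables at their current values in the violated constraint, and the set of size about square root of R is chosen by brute-force enumeration, so this is only meant for small toy instances.

```python
import itertools
import math
import numpy as np

def partial_resample_assignment(x, weights, value_dists, R, rng=np.random.default_rng()):
    """One partial resampling step for a violated assignment packing constraint (sketch).

    x            -- current values of the categorical variables
    weights[i]   -- coefficient of variable i (at its current value) in the violated
                    constraint; assumed reading of "corresponding coefficients"
    value_dists  -- value_dists[i] is a list of (value, probability) pairs, the
                    original distribution variable i was drawn from
    R            -- right-hand side of the violated packing constraint
    """
    n = len(x)
    r = min(n, max(1, math.isqrt(int(R))))                 # resample about sqrt(R) variables
    sets = list(itertools.combinations(range(n), r))
    # Probability of a set is proportional to the product of its coefficients.
    probs = np.array([np.prod([weights[i] for i in s]) for s in sets])
    if probs.sum() == 0:
        return list(x)                                     # nothing with positive weight to resample
    chosen = sets[rng.choice(len(sets), p=probs / probs.sum())]
    x = list(x)
    for i in chosen:                                       # redraw from the original distribution
        values, ps = zip(*value_dists[i])
        x[i] = values[rng.choice(len(values), p=np.array(ps, dtype=float))]
    return x
```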
An example application of this is a multidimensional scheduling problem. You have some machines and jobs, and every job has a d-dimensional cost. You assign each job to some machine and you want to minimize the makespan — not just the makespan in one dimension, like time, but the makespan over all d dimensions. So you want to minimize the maximum, over all dimensions, of the total cost of the jobs assigned to any machine. A simple algorithm is that you first guess what the optimal makespan is and then form an LP which is feasible; if some job costs more than this makespan on a machine, then you can't run it on that machine, so you just set that fractional variable equal to zero, and otherwise you allow it to be some fractional value. You can plug that fractional solution almost directly into this partial resampling and you get a solution with makespan t times an extra factor which is log d over log log d. So you are getting a log d over log log d approximation to the optimal makespan. This is not the only way to get this type of approximation ratio — maybe it's not even the best — but it is certainly really simple once you have this assignment packing framework; you can basically just plug this LP directly into the assignment packing framework almost for free. Another thing this type of analysis gives you is that you can handle cases in which you have different right-hand side values. You can get an approximation ratio which trades off between a multiplicative factor and an additive standard deviation factor. This is very difficult to do with other approaches to these types of assignment packing problems. Again, you can analyze this algorithm by building witness trees. It's very similar to the Moser-Tardos analysis, but the key idea is that you only keep track of the resampled variables and the values they take on. If a variable was not resampled, you just ignore it when you are building the witness tree. If you have resampled three different constraints and selected three different resampled sets, you would build the witness tree in terms of just these resampled sets; you don't say anything about S2 if it's not in the witness tree. The key lemma is a witness tree lemma: if you have any particular witness tree whose nodes are labeled by the sampled sets, then the probability of encountering that tree can be bounded in terms of the resampling probability times the probability distribution on the variables. It's a similar proof to the covering problem. And the key point is that the expected total number of resamplings is at most this sum, over all witness trees, of the probability of the tree. By only keeping track of the resampled variables you are greatly reducing the total number of witness trees, because most variables are not being counted in them. So you are losing information, but you are also really reducing the total number of witness trees you are considering; the sum is over a much smaller set by ignoring those variables, and so the sum becomes smaller. So we are wondering whether you can continue to apply this type of partial resampling methodology to other types of integer programming problems: packing problems, mixed problems involving covering and packing. We have bounds in terms of the L1 and L0 norms, but can you also get bounds in terms of higher norms? And there is this interesting question of what the ratio is for the covering integer problem when a_min becomes large: do you have that square root term there or not? I think the linear term is the correct one, but it's a guess. That's all. Thank you.

[applause]

>>: [indiscernible] approach where you were using Chernoff concentration, and when you have variables that are very small then there are much better concentration inequalities to use, like [indiscernible] give you better estimates than just…

>> David Harris: My impression was that if you have a very large number of very small values, then the Chernoff bound is essentially correct, because the sum approaches a Poisson distribution, which is basically what the Chernoff bound gives you.

>>: The other thing is, I guess it doesn't lose [indiscernible]. It's just that you are using the Moser-Tardos algorithm. Is there a way to apply the local lemma and get the same bounds?

>> David Harris: I think Nick Harvey has an approach to a similar type of problem in which you basically quantize the coefficients. For the coefficients between a half and one you do a sampling on those, and then you get some discrepancy.
Then you look at the coefficients in the range of a quarter to a half and you get another discrepancy, but the discrepancy is smaller. And if you sum them all up you still get some small value. So you can do that, but it's kind of cumbersome to use this multistage approach and I don't know how general it is.

[applause]