>> Jana Kulkarni: Welcome everyone. It's a pleasure to have David Harris from Maryland.
Today he's going to tell us about a new randomized rounding algorithm based on the partial
resampling lemma.
>> David Harris: This is work with me and Aravind Srinivasan about partial resampling. We're
going to consider two types of integer programs in this talk. The simplest one to describe, and
the one I'll spend most of the time on because it's a little easier, is the integer covering
problem. You have n integer variables and you have some linear constraints on them, covering
constraints. All of the coefficients are positive and you have a positive right-hand side. You can
scale each constraint so that the coefficients are all in the range 0 to 1.
You want to solve these constraints and you want to minimize some linear objective function c
dot x. You can kind of think of this as a weighted generalization of set cover. In a set cover
instance you are given a collection of sets and you want to find a subset of them that covers the
entire space. You can kind of think of this as each element in your ground set gives you a
covering constraint, but the coefficients are all just 0 or 1. In the set cover instance you want
to find the smallest cover, so your objective function is just the sum with all coefficients equal
to 1. In the integer covering problem you can have other objective functions, other weights in
your objective function, and other weights in your constraints.
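For reference, here is one way to write down the covering integer program just described; this is
a hedged rendering, with notation chosen to match the coefficients a_ki and right-hand sides a_k
used later in the talk:

    \begin{aligned}
    \text{minimize}\quad & c \cdot x \\
    \text{subject to}\quad & \sum_i a_{ki}\, x_i \;\ge\; a_k \qquad \text{for each constraint } k = 1,\dots,m,\\
    & a_{ki} \in [0,1],\quad a_k > 0,\quad x_i \in \mathbb{Z}_{\ge 0}.
    \end{aligned}

Set cover is the special case where every a_ki is 0 or 1, every a_k is 1, and every c_i is 1.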
>>: [indiscernible] I just didn't read the whole thing.
>> David Harris: Sure. Another integer programming problem I want to consider is the
assignment packing problem. You have variables x1 through xn, but these are just kind of
categorical variables: they take on values in some sets J1 through Jn instead of being integers,
for example. And you have constraints that are linear, with coefficients Akij multiplying an
indicator variable for whether variable i takes on value j; this is just the Iverson bracket
notation, which is 1 if variable i takes on value j and 0 otherwise. These are all the
constraints. They are all packing constraints, but you also have the additional constraint that a
variable has to take on at least one value from its set.
And you want to find just some values for the variables that would satisfy all of the constraints.
There is not necessarily an objective function here. You just want to find a feasible solution.
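In symbols, the constraints just described are roughly the following (a hedged rendering; Akij are
the coefficients and [x_i = j] is the Iverson bracket mentioned above, while the right-hand sides
a_k are notation introduced here only for illustration):

    \begin{aligned}
    & \sum_{i} \sum_{j \in J_i} A_{kij}\,[x_i = j] \;\le\; a_k \qquad \text{for each packing constraint } k,\\
    & x_i \in J_i \quad \text{for each variable } i \qquad \text{(the assignment constraint).}
    \end{aligned}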
You have packing constraints, but you also have assignment constraints, because you need to
assign every variable one value from its set. You can't solve these problems exactly. You can
approximate them and there are a lot of different ways you can talk about approximating these
types of integer programs. We'll talk about mostly the integer covering problem because it's a
lot simpler to describe. One scheme is you need to satisfy all of the constraints exactly, but you
want to minimize the objective function. You want to get the objective function as close to the
optimal one as possible. There are other types of approximations where you might
approximately satisfy the covering constraints and so on, but we'll just talk about this variant
where you satisfy the covering constraints exactly and you satisfy the objective function
approximately. One type of approximation algorithm is based on the LP relaxation followed by
the randomized rounding paradigm. The first step is you replace the constraint that the variables
have to be integers with the constraint that they have to be real numbers. If you do that then you
get a linear program, which you can solve exactly, and you get a fractional solution whose value
is at most the optimal one. The next step is you want to find integer values for the variables
that make the objective function close to its fractional value. We're actually going to do
something a little bit more general. We're going to pick a
random process with a property that for any individual variable the expected value of xi, the
integer value xi, is at most some parameter beta times the fractional value x hat i. This will be
true for each variable individually. In this case it's automatically true that the expected value
of the objective function is at most beta times the optimal. You automatically get a beta
approximation algorithm, but it's kind of an oblivious approximation algorithm because when
you are running this algorithm, you don't actually need to know what the objective function is.
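Spelling out that one-line argument: if the rounding guarantees E[x_i] <= beta * x-hat_i for every
variable i and the costs c_i are nonnegative, then

    \mathbb{E}[c \cdot x] \;=\; \sum_i c_i\,\mathbb{E}[x_i]
    \;\le\; \beta \sum_i c_i\,\hat{x}_i \;=\; \beta\,(c \cdot \hat{x}) \;\le\; \beta \cdot \mathrm{OPT},

since the fractional optimum is at most the integer optimum. The objective never enters the
rounding itself, which is what makes the algorithm oblivious.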
>>: [indiscernible]
>> David Harris: Yes. The xi satisfies the feasibility constraints with probability 1 and they have
this expected value property, which kind of automatically gives you an oblivious approximation
algorithm. At least in expectation; by just repeating this algorithm multiple times you can get
very close in actuality to the expected value, so I won't dwell on that issue. Just getting a good
approximation ratio in expectation will be good enough for us. The simplest
randomized rounding scheme is you just draw the variables to be Bernoulli independently with
a probability which is slightly bigger than x hat i. I'm going to assume here that all of the values
of x hat i are very small, so that you can multiply them by small constant factors and you don't
have to worry about them becoming bigger than 1 and not being probabilities anymore. It
turns out that that's the hardest case to deal with. Reducing the general case to that is kind of
cumbersome, so I won't really get into that now. I'll just assume that x hat i is small. If you do
this then if you look at any individual covering constraint, the expected value of the sum of the
variables x i is at least alpha times ak, because the fractional solution satisfies the constraint
with right-hand side ak. You have a sum of independent 0-1 variables with mean at least alpha ak
and you want to know what the probability is that the sum is actually at least ak in actuality,
not just in expectation.
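As a concrete sketch of this baseline scheme (illustrative code only, not the implementation from
the talk; it assumes the fractional values are small enough that alpha times x-hat_i stays below
1, as discussed above):

    import random

    def independent_rounding(x_hat, alpha):
        """Draw each x_i as an independent Bernoulli with inflated probability
        alpha * x_hat[i], capped at 1."""
        return [1 if random.random() < min(1.0, alpha * xh) else 0 for xh in x_hat]

    def violated_constraints(A, a, x):
        """Indices k of covering constraints sum_i A[k][i] * x[i] >= a[k] that
        the rounded solution x fails to satisfy."""
        return [k for k, (row, rhs) in enumerate(zip(A, a))
                if sum(c * xi for c, xi in zip(row, x)) < rhs]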
>>: Greater than or equal.
>> David Harris: Greater than or equal to ak, yeah. You can use the standard Chernoff bound, and
if you do this and you set the parameter alpha to be about 1 plus log m over a min, the minimum
value of a on the right-hand side, plus a square root term, you have to remember that either a min
or m could be big. It's possible that a min is going to infinity. In the case that a min is very
big and m is small, the square root term becomes the dominant one and you get close to
approximation factor 1. If you set alpha to this value then all of the constraints are satisfied
with high probability, and you can show that the expected value of the xi given that the
constraints are satisfied is still close to alpha, so you get the same kind of approximation
ratio: 1 plus a term which is like log of m over a min, plus the square root of that same value.
This is the standard Chernoff bound with standard randomized rounding.
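In symbols, the choice of alpha and the resulting guarantee just described are roughly (up to
constants, as stated in the talk):

    \alpha \;\approx\; 1 + O\!\left(\frac{\ln m}{a_{\min}} + \sqrt{\frac{\ln m}{a_{\min}}}\,\right),
    \qquad
    \mathbb{E}[c \cdot x] \;\le\; \left(1 + O\!\left(\frac{\ln m}{a_{\min}} + \sqrt{\frac{\ln m}{a_{\min}}}\,\right)\right)\mathrm{OPT}.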
One problem with this type of approximation algorithm is that the approximation ratio depends on
the overall number of constraints m, which can go to infinity as the problem size becomes big.
What you would often like is a scale-free approximation ratio, one which does not depend on the
overall size
of the system, but kind of only on its structural property. One very common way of getting this
in this context is when the system is column sparse, that is, every variable appears in relatively
few constraints. Two common ways to measure how column sparse the system is are the L0 and L1
norms of the columns. The L0 norm is just the number of nonzero entries in a column and the L1
norm is just the sum of the coefficients in a column. And remember we have scaled the coefficients
so that all entries are in the range 0 to 1, so the L1 norm is always smaller than the L0 norm,
and it's possible you could have systems where the L1 norm is much smaller, and these are both
much smaller than m. So can you get an approximation ratio that is a function of these column
sparsity measures, not the overall system size? There is previous work by Srinivasan which gave an
measures, not the overall system size? There is previous work by Srinivasan which gave an
approximation algorithm and it was based on a random process and was analyzed using the
FKG inequality. I won't get into a lot of detail with it here, but it gives you an approximation
ratio that has this form. There is an error on the slide. There should be an extra term for log of
a min over a min. I left it off the slide. Giving this approximation ratio. The work of Srinivasan
was not based on the Lovasz Local Lemma, but the Lovasz Local Lemma is another very
standard technique for getting these kinds of scale free approximation ratios. You could use
that tool to get a similar approximation ratio, although that was not the approach taken by
Srinivasan. Let's review the basic form of the Lovasz Local Lemma and how it would apply to this
problem. In the Lovasz Local Lemma you have bad events in some probability space. In our
context a bad event would be that one of our covering constraints is violated. And these bad
events depend on a subset of the variables. In this case the variables are the integer variables
which you are drawing as Bernoulli pi, and you have a separate bad event for every covering
constraint, namely that the sum of the variables is less than a sub k, which it is supposed to be
at least.
And the key property in understanding the local lemma is deciding whether bad events affect each
other. In the case of the local lemma, bad events affect each other if they overlap on a variable,
if there is a common variable that affects both of them. So one
thing you have to be careful of in the local lemma context is there is a very binary classification
of whether a variable affects a bad event or not. That is, if the bad event is a function of that
variable then that variable affects that bad event even if it hardly ever does, even if the amount
of the effect is very small; the local lemma just asks whether this variable affects that bad
event. You can imagine a system where all of the coefficients are nonzero but tiny. In that case
every variable is affecting every constraint and so everything overlaps with everything else. This
is why if you use the local lemma, you always get an approximation ratio that is phrased in terms
of the L0 norm of a column, the number of nonzero entries, because an entry which is very small
but nonzero, from the point of view of the local lemma, affects the constraint just as much as if
the coefficient were big, even though if the coefficient was very small
you might think that heuristically it shouldn't really matter for that constraint. That's why you'll
get these delta zero terms in the approximation ratio if you use a local lemma. All right. The
local lemma by itself is not constructive; it only shows that there is a positive, but possibly
very small, probability that you satisfy all of the constraints. It's not an algorithm by itself.
You can turn it into an algorithm using the
framework of Moser and Tardos which turns almost all of the applications of the local lemma
into constructive algorithms. You could use it for this problem just like you could use it for
everything else. It would basically work like this. You begin by drawing all of your variables
from their original distribution Bernoulli P and if you find some covering constraint is violated,
that is the sum of variables is less than the right-hand side, a sub k, then for every nonzero
coefficient a sub ki you would draw xi from its original distribution again. If a sub ki is zero you
don't change its value. You just leave it alone. And if you set alpha to that same value as
before, this algorithm converges and achieves the same approximation ratio. I just want to talk
heuristically about why this algorithm, even though it is the generic way of making the local
lemma constructive, doesn't really make sense for this problem. Suppose you come to a violated
constraint, some covering
constraint k is violated. If x sub i is equal to 1 then the algorithm says you still might need to
resample that variable if the coefficient is nonzero. But why? If x sub i is 1 then that variable is
kind of helping that constraint be satisfied. So you are kind of going in the opposite direction of
progress if you are resampling it and setting it to zero, you are kind of messing that variable up.
That variable is helping you. You shouldn't change it. And if x sub i is zero then probably x sub i
is not really at fault for violating that constraint. You probably didn't have a very big effect on
that constraint. Most of the variables were maybe expected to be equal to zero anyway, so that
variable is probably not causing that constraint to be violated. You could think of the guilty
variables, the ones causing the violation, as the difference between the actual number of zero
variables and the expected number. That's why the constraint is violated: you had fewer variables
being 1 than you expected, and only about a square root of them account for the difference between
the mean and a typical deviation. So you should only be resampling maybe square root of a sub k of
the variables, not all a sub k of them. So the Moser Tardos algorithm is really resampling way too
many variables per constraint. Instead of resampling all of the variables, we'll use partial
resampling, and this is actually
a very general framework which extends the local lemma in a very general way that you can
apply to many problems involving Latin transversals, packet routing et cetera. I just want to
describe how this applies to the integer covering problem, integer programming problems
where I don't really need to get into the full generality of the framework. For this particular
application, here is how you would apply partial resampling. Again, you draw x1 through xn from
the original distribution. If you come to some constraint k that's violated you do this. If xi is
equal to one then you just leave it alone; it's helping you, so don't mess with it. If xi is equal
to zero you resample it, but you don't draw it with the original probability pi. You draw it with
a smaller probability. This probability depends linearly on the coefficient a sub ki and it is
also multiplied by another scaling parameter sigma. You can see that if the coefficient is zero
you never resample it, just like in the local lemma, but this smoothly interpolates between a zero
coefficient and a coefficient of one. Also, you can see that the value of xi is always increasing
over time; you never change a one to a zero, only a zero to a one. This algorithm obviously
terminates.
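Here is a minimal sketch of that partial resampling loop (illustrative code only, under the same
small-fractional-values assumption as before; p[i] plays the role of the inflated probability and
sigma is the extra scaling parameter):

    import random

    def partial_resampling_cover(A, a, p, sigma):
        """Partial resampling for a covering system sum_i A[k][i]*x[i] >= a[k].
        In a violated constraint k, each zero-valued variable is redrawn from its
        original distribution with probability sigma * A[k][i]; ones are never
        touched, so x only increases and the loop terminates with probability 1
        on a feasible instance."""
        n = len(p)
        x = [1 if random.random() < p[i] else 0 for i in range(n)]
        while True:
            bad = [k for k in range(len(a))
                   if sum(A[k][i] * x[i] for i in range(n)) < a[k]]
            if not bad:
                return x
            k = bad[0]
            for i in range(n):
                if x[i] == 0 and random.random() < sigma * A[k][i]:
                    x[i] = 1 if random.random() < p[i] else 0

Flipping x[i] on with probability sigma * A[k][i] and then redrawing Bernoulli(p[i]) is exactly the
two-step view used in the analysis below: first choose the resampled set Y, then redraw the
variables inside Y.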
So the only question is what the expected value of xi is at the end of this process, because it
certainly will satisfy all of the covering constraints. We will show that it satisfies this type
of approximation ratio: the probability that each variable ends up equal to one is a small
multiple of its fractional value, which will automatically give us our approximation ratio. We're
going to
analyze this algorithm in a kind of strange way. If we come to a constraint k that is violated, the
algorithm says you resample xi with probability sigma times a sub ki times pi. So instead of
thinking of it as drawing xi as a Bernoulli random variable with this probability, you think of it
as a two-step process. You have a set of variables, y, and each variable i goes into this set y
with probability sigma times a sub ki. And then you look at all of the variables in y and you draw
them as new Bernoulli variables with probability p sub i. This two-step process is obviously
equivalent, but this two-step way of thinking of things could be very important to
analyzing the algorithm, even though it's kind of weird that you are breaking it apart for no real
good reason. Our goal is to get an upper bound on the probability that xi is equal to one. In
order to do that we are going to construct a kind of a witness which explains why you set x sub i
equal to one. This witness is going to be a structure which is kind of the explanation for that
variable. Then you are going to take a union bound over all possible witnesses and then the
expected value of xi is at most the sum over all of these witnesses of the probability of seeing
that particular witness. This is the same proof strategy as for the original Moser and Tardos
algorithm. But our witnesses will be much simpler than theirs. Before I talk about the
witnesses for this algorithm, I'm going to try to motivate this approach by talking about how you
do witnesses for a standard Chernoff bound, not for any kind of resampling algorithm, just the
Chernoff bound for the lower tail. Suppose you have n independent Bernoulli p variables and
[indiscernible] their mean, and you want to bound the lower tail: the probability that the sum of
these is less than t, where t is something that is smaller than the mean. Consider
the following process. If the sum of the variables z is less than t you mark a subset of the
variables. You mark them how? If zi is equal to zero then it gets marked independently with
probability sigma. Otherwise you do not mark it. Since you only mark any variables at all when the
sum of the variables is less than or equal to t, you have this obvious equality: the probability
that the sum is less than or equal to t is the sum, over all 2 to the n possible subsets v, of the
probability that v is the marked set of variables. And you can think of v as a witness that the
sum was too small. Now suppose you consider some fixed subset v of the n variables. The following
are necessary conditions for v to be the marked set. First, any variable inside v had to have zi
equal to zero, and that has probability one minus p to the size of v. Second, any i in v had to be
marked; that has probability sigma to the size of v. The third condition is that any variable that
is not in v but is equal to zero must have been unmarked; otherwise you would have put it into v.
So for the last term you take the product of one minus sigma over all variables which were equal
to zero but were not in v. The key point here is that, in order to mark anything at all, you must
have had the sum of the zi less than or equal to t, so there have to be at least n minus t minus
the size of v such variables. This last term is therefore at most one minus sigma raised to the
power n minus t minus the size of v. So the overall probability that v was the marked set is at
most the product of those three terms. If you sum over all v you get that the probability that the
sum of z is less than or equal to t is at most that expression there. That bound is valid for any
sigma in the range zero to one, so we can optimize it, and when we do and carry out some further
calculus we get the following expression, which is the classical Chernoff bound. You have kind of
given a witness-based proof for the standard Chernoff [indiscernible] bound. You should really
keep this example in mind as we talk about other witnesses which are more complex.
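Filling in the algebra behind that argument (a hedged reconstruction of the calculation, not
necessarily the exact expression that appeared on the slide): with n independent Bernoulli(p)
variables, mean mu = np, and a threshold t < mu,

    \Pr\Big[\sum_i z_i \le t\Big]
    \;\le\; \sum_{v \subseteq [n]} (1-p)^{|v|}\,\sigma^{|v|}\,(1-\sigma)^{\,n-t-|v|}
    \;=\; \frac{(1-p\sigma)^n}{(1-\sigma)^{t}},

and minimizing over sigma in (0,1) (the minimizer is sigma = (mu - t)/(p(n - t))) gives

    \Pr\Big[\sum_i z_i \le t\Big]
    \;\le\; \Big(\frac{\mu}{t}\Big)^{t}\Big(\frac{n-\mu}{n-t}\Big)^{n-t}
    \;\le\; \exp\!\Big(-\frac{(\mu - t)^2}{2\mu}\Big),

which is the classical lower-tail Chernoff bound.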
We need to give a kind of witness bound not just for the event that the initial values of these
variables fail to satisfy some constraint, but for the event that the values of these variables
after multiple rounds of resampling fail to satisfy some covering constraint. This is going to be
a more complicated witness, but the same intuition will apply. For any constraint
you can list all the re-sampled sets for that constraint. Remember that we have this two-step
process. First we choose a set y and then for every variable in y we draw the variables with
probability p sub i. So y sub k1, y sub k2, and so on are the resampled sets for constraint k, the
sets you draw in the first step of this two-step process. There are two ways you could have had
some variable xi equal to one.
First of all, in the very first step of the algorithm you could have drawn x sub i equal to one. In
that case your witness structure will just be the empty list, so it will just be the null structure.
The second way is that during the Lth resampling of some constraint k you set xi equal to one for
the first time. In that case the witness will be the list of sets y sub k1 through y sub kL, and
you necessarily have variable i inside y sub kL, because otherwise the variable xi could not have
changed during that resampling. So the witness will be that list of sets.
Yes?
>>: You see on the [indiscernible] k is the constraint. What is the 1 to?
>> David Harris: These are, you might need to do multiple rounds of resampling in order to fix
the constraint.
>>: [indiscernible]
>> David Harris: Yes. For simplicity in order to explain this algorithm let's just say L is equal to
one. We are dealing with the simplest type of structure, which is just a single set for some
constraint k. You have the potential witness which is just this set. Fix some set z sub k 1. What
is the probability that the actual witness that you generate for this variable is equal to this fixed
value z sub k1? Y sub k 1 is a random variable. Z sub k1 is just some fixed value which is just a
set, a subset of the variables. So the following events are necessary in order to have y sub k1
equal to this fixed set z. After the first resampling of that constraint k you must set x of i equal
to one. That's just what we said, that was the definition of the length of the list of the witness
is the time when variable xi got equal to one. Is there any variable in that set you must have set
x sub j equal to zero initially, during the initial sampling of the variables. Why? If x sub j is
equal to one initially then it will never be resampled and in particular it cannot be resampled
during the first resampling of constraint k. Third, that set z was chosen as the first sampled set
for that constraint k. Those are all necessary conditions in order to have this particular witness
structure. So just writing those conditions again. The first event has probability p sub i because
every time you fix a sampled set the variables are drawn independently from their distribution
pi. For any variable inside z the second event has probability one minus pj and those are all
independent because they are all based on the initial sampling. And for the third event if you
consider the current value of the variables at the time you resampled that constraint, so not
their initial value, but whatever value they had just at the time you were about to resample
them. Again, if a variable had value zero, then it goes into the set z with probability sigma a
sub kj, and if vj is equal to one then it cannot go into z. Here v1 through vn denote the current
values of the variables just at the time you resample k. So the probability that you set y1
equal to z is this product in terms of the current values of the variables: any variable which is
equal to zero goes into z with probability sigma times akj, and if it's equal to one it definitely
does not go into z. Again, you make the key observation that the constraint k is not currently
satisfied, since otherwise it would not be resampled. So the sum of akj vj is less than
ak at the time it was resampled. And if you plug that in to this expression you see that this term
can be upper bounded by one minus sigma raised to the power minus ak. If you put this all together
you see that the total probability of encountering this witness structure is at most the
probability shown. That was just one particular type of witness structure, the simplest type that
has just a single set in it. Now if you sum over
all possible witness structures including allowing k and L to vary, you can get this expression
here. And you see that you have the sum of a sub ki, so the bound you can put in is the L1 norm of
that column. In the last line you just have to choose sigma and alpha carefully; you basically
optimize them in order to minimize that expression, which is kind of involved but nothing too
interesting. You get this dependence on delta one instead of delta zero because you actually have
the sum of the aki. In fact, the constant in front of that first term is actually equal to one, so
it's not one plus a constant times that term; it's actually one plus that term. So you get this
bound on the expected value of any
variable and that automatically gives you this same approximation ratio that we talked about
before. Let's see if we can get any lower bounds on this approximation ratio. How close is this
approximation ratio to optimal? For one thing, when the minimum value a min goes to infinity, the
approximation ratio is basically one plus the square root of the natural log of delta one plus one
over a min. So how close is that? That's one half of the asymptotics. The other half is what
happens when delta is very large. In that case you get an approximation ratio which is basically
one minus some little o of one term, times the natural log of delta one plus one over a min. That
actually is optimal if you reduce from set cover. The hardness of
approximating set cover shows you that at least the first-order term is optimal including the
optimal constant. But that kind of hardness ratio is really vacuous when a min goes to infinity.
So the hardness of set cover gives you nothing when a min is large because you are saying that
the hardness is some value less than one which is stupid. The approximation ratio can't be
better than one. Previously there were no nontrivial bounds that were actually known in that
regime when a min is much larger than delta.
>>: [indiscernible]
>> David Harris: Yes. Here is actually the construction of an integrality gap for the case when a
min is large. You can consider the following integer covering system. You think of both i, the set
of variables, and k, the set of constraints, as vectors over GF(2) to the n, and you consider the
system defined by: for each k, the sum over all i which are perpendicular to k, that is, such that
k dot i is equal to zero (I left that off my slide), of those xi is at least equal to a. So you
have 2 to the n constraints and 2 to the n variables. And the objective
function is just the symmetric one where you are just summing over all of xi. This is a very
simple fractional solution where you just set xi equal to a over 2 to the n minus one. And there
was a previous analysis by Vazirani which was really only targeted at the case when a is equal to
one, just the simplest case, which showed that any integral solution has to satisfy that the sum
of x of i is at least equal to n. This kind of gives you an integrality gap on these types of
integer covering systems, but this analysis…
>>: [indiscernible]
>> David Harris: I don't know, sorry. This analysis is not helpful when a is large because this
integrality gap it claims is not even bigger than one. So one result we have is that any integral
solution actually has to satisfy a stronger condition which is it has to be at least equal to 2a plus
omega of n. What this basically amounts to showing is that any sparse Boolean function has to
have a large Fourier coefficient. Fourier coefficient meaning over [indiscernible] 2 to the n.
This shows you that this covering system actually has an integrality gap of one plus something
on the order of log of delta one plus one over a min. So our approximation algorithm kind of is
almost optimal. It has a square root instead of a linear dependence on that term, so it's kind of
close but there is kind of an interesting polynomial gap there. But this is the first nontrivial
bound at all for what happens in the regime when a min becomes large. Another kind of
interesting issue to talk about with this algorithm is what happens when you have multiple
objective functions? You might have some method of balancing them. Let's say you have L
different objective functions and you want to minimize the max of them. Let's say you can
solve the fractional relaxation of that, however you decide to balance them. If you decide to
minimize the max, that can still be solved using a linear program. Can you get a
solution in which all L objective functions are simultaneously close to their fractional values?
This is one way in which other types of algorithms for set cover don't really extend. The other
main algorithm for set cover is the greedy algorithm, where you always choose the set which
increases your coverage the most in a single step.
But it's not even really clear how you define a greedy criterion if you have multiple objectives. I
mean you kind of need to boil all of the objectives down into a single number in order to make
a decision about what variable to accept and there is not any obvious way to do that. So the
greedy algorithm has a real hard time even getting started for these types of multiple objective
problems. The LP-based solutions can handle this much more cleanly. We can show that not
only is the expected value of c dot x close to the fractional value, but it's actually pretty
concentrated, in a very similar way to how a Chernoff bound would be concentrated around the value
beta times c sub l dot x hat. So with high probability you have that for every individual
objective, c sub l dot x is close to beta times c sub l dot x hat. So with high probability you
get all of the objective functions simultaneously close to their means. This doesn't follow just
from the expected value property of the individual variables. And the
way you do this is you show basically a bound on the correlation, the bound on the product of
monomials. We previously showed that the probability that any individual variable is equal to
one is at most this term rho sub i, where rho sub i is equal to the probability plus some small
approximation [indiscernible]. In fact, we can show that for any subset of variables you have
this bound on the probability that they are all simultaneously equal to one, which is the product
of the rho i's. This is basically the same as it would be if they were independently drawn as
Bernoulli with probability rho i, and in particular this type of monomial product property is
enough to give you Chernoff upper tail bounds. The way you show that is given the set R you
can build a witness for the event that all of these variables are all equal to one, not just the
witness that any individual variable is equal to one. And the way you do that is for each
variable that was equal to one you find the resampling of which constraint made it equal to one
and you list all of the resampled sets for those constraints before the last time that that
variable got equal to one. So some of these lists might appear twice because you might have
two different variables which were equal to one on different resamplings of the same
constraint. But that's okay. You just list both of them. And you can do a very similar thing
where you sum over witness structures to get this joint probability property.
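In symbols, the property being used is the following product bound (rho_i is the inflated marginal
bound described above):

    \Pr\big[\,x_i = 1 \text{ for all } i \in R\,\big] \;\le\; \prod_{i \in R} \rho_i
    \qquad \text{for every subset } R,

and a bound of this form is exactly what the standard moment-generating-function proof of the
Chernoff upper tail needs: for 0/1 variables and any delta > 0,

    \mathbb{E}\big[(1+\delta)^{\sum_i x_i}\big]
    \;=\; \sum_{S} \delta^{|S|}\,\mathbb{E}\Big[\prod_{i \in S} x_i\Big]
    \;\le\; \prod_i (1 + \delta\,\rho_i),

after which Markov's inequality gives the upper-tail bound; the weighted version for an objective
c sub l dot x is similar.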
Another extension we could talk about is if you are given multiplicity constraints. In the statement of
the problem I just said that xi has to be an integer of unbounded size. But you could also
consider a version in which there are constraints on the size of the variables, not just an integer
of unbounded size, but some upper bound on its size d sub i. These are called multiplicity
constraints. These can be easily incorporated into the linear relaxation. They are not very easy
to incorporate into a greedy algorithm, but the LP-based approach can easily put them in. But
can you still get a good approximation ratio while trying to preserve these multiplicity
constraints? If you just analyze the algorithm straightforwardly you will see that the solution
sizes can be much bigger than the fractional values. I was only talking about the case where the
fractional values were all small, but it turns out that the reduction from the general case to the
case where they are all small can possibly lose a lot in the solution size. Kolliopoulos and Young
gave an algorithm that doesn't respect them exactly: you can violate these constraints, compared
to the optimal solution, by a 1 plus epsilon factor, and if you do so you get an approximation
ratio which is basically on the order of one over epsilon squared. If you
want to satisfy them exactly you can do it, but now you get an approximation ratio that doesn't
depend on A and depends on the log of delta zero. We show that if you just modify a few
parameters of our algorithm, without making any other changes, just basically changing alpha and
sigma to new values, then you can still get this expected value property, where the approximation
ratio is only inflated by a factor of one over epsilon. And you can see that this improves on the
result of Kolliopoulos and Young in a lot of different ways. These deltas should be delta ones, so
you get delta one instead of delta zero, you get one over epsilon instead of one over epsilon
squared, and so on. In fact, you can show a hardness on the order of the natural log of delta one
plus one over a min times epsilon, so this is essentially the optimal approximation ratio for this
case when you have these types of multiplicity constraints. Now I've talked a lot about the
integer covering problem. I'll talk about how to extend these types of analysis to the
assignment packing problem. Recall the assignment packing problem: you have variables which take
on values in some sets Ji, and you have all these constraints which are sums of
indicator variables for these variables. And you want to find some values x which nearly satisfy
the constraints. The first step is you can find a fractional solution which satisfies all of the
constraints if an integer solution exists. And if you use a simple application of the local lemma
plus some randomized rounding you'll get an approximation ratio that looks like this where you
have delta zeros, again, the same issue that if a coefficient is nonzero but tiny, then to the local
lemma it means that that variable affects the constraint. So again, you could use the Moser
Tardos algorithm to solve this, to get an algorithmic form of what the local lemma gives you.
And, again, you have the same issue. Suppose you come to some violated constraint. The
straightforward application of the Moser Tardos algorithm would say that for any variable with
a nonzero coefficient you have to resample that variable. But that doesn't make any sense
heuristically because if the coefficient is tiny then that variable has almost no effect, so you
probably shouldn't be resampling it. And if that variable takes on a value different than the bad
one, then that variable is helping you so why are you changing it? You can use a similar type of
resampling, partial resampling. If you have a violated constraint, what you should do is choose
about the square root of R variables to resample, where R is the right-hand side of this
assignment packing constraint. And you shouldn't choose the set uniformly at random among
all sets of that cardinality. The probability of choosing any set should be proportional to the
product of the coefficients, the corresponding coefficients. And once you choose this
resampled set you draw the variables from the original distribution. And this avoids all the
problems, at least heuristically, of the Moser Tardos algorithm. Again, why about square root of
R? It's the difference between what you would expect a bad event to look like and what the
expectation is. The expected number of variables that are going to be true in this constraint is
about R, because that's what the expected value of the sum was. And usually when a sum of random
variables is bigger than its mean, it's bigger by about a standard deviation, so only about square
root of R of the variables are being worse than you'd expect. Those are kind of the only guilty
variables that are really causing the problem. That is kind of the intuition behind why you see
square root of R.
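One implementation detail worth making concrete is how to choose a set of a fixed size with
probability proportional to the product of its coefficients. The talk does not spell out a
procedure, so the following is just one standard way to realize that distribution, using the
dynamic program over elementary symmetric polynomials (illustrative code; it assumes strictly
positive coefficients):

    import random

    def sample_proportional_to_products(weights, s):
        """Sample a subset S of size s from range(len(weights)) with probability
        proportional to the product of weights[i] over i in S.
        e[k][m] is the elementary symmetric polynomial: the sum, over all size-k
        subsets of the first m weights, of the product of the subset."""
        n = len(weights)
        e = [[0.0] * (n + 1) for _ in range(s + 1)]
        for m in range(n + 1):
            e[0][m] = 1.0
        for k in range(1, s + 1):
            for m in range(1, n + 1):
                e[k][m] = e[k][m - 1] + weights[m - 1] * e[k - 1][m - 1]
        chosen, r = [], s
        for m in range(n, 0, -1):   # decide each item in turn, last to first
            if r == 0:
                break
            if random.random() < weights[m - 1] * e[r - 1][m - 1] / e[r][m]:
                chosen.append(m - 1)
                r -= 1
        return chosen

Once the set is chosen, the selected variables are simply redrawn from their original
distributions, as described above.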
>>: [indiscernible]
>> David Harris: Yeah, but if you just think of a typical bad constraint, it's probably only bad
by about square root of R. This is just kind of the intuition.
>>: [indiscernible] depending on the violation? More variable [indiscernible]
>> David Harris: No. You don't need to pick more depending on the violation. The number you
pick is kind of optimized for the smallest violation. If it's a bigger violation you can just kind of
ignore the extra variables. You can show that this algorithm terminates with probability one after
a small number of resamplings, and you get values of xi which don't satisfy the original
constraints exactly, but satisfy them up to a small discrepancy term, which looks like this, where
again instead of the L0 norm you get the L1 norm. If you look closely you can also see that
instead of a log in the second term you've got the square root of R times log delta one, so you
are also saving a square root of R times log R term, which is relevant when R is much bigger than
delta. So it is saving that term as well. An example application of this is a
multidimensional scheduling problem. You have some machines and jobs, every job has d dimensions
of cost, you assign each job to some machine, and you want to minimize the makespan. Not just the
makespan in one dimension, like time, but the makespan over all d dimensions: you want to minimize
the maximum, over all machines and all dimensions, of the sum of the costs of the jobs assigned to
that machine. A simple algorithm is you can first guess what the optimal makespan is and then form
an LP which is feasible for that guess; if some job costs more than this makespan on some machine,
then you can't run it on that machine, so you just set that fractional variable equal to zero, and
otherwise you allow it to be some fractional value. You can plug that fractional solution almost
directly into this partial resampling and you get a solution with makespan t times an extra factor
which is log d over log log d. So you are getting a log d over log log d approximation to the
optimal makespan. This is not the only way to get this type of approximation ratio, and maybe it's
not even the best, but it is certainly really simple once you have this assignment packing
framework. You can basically just plug this LP directly into the assignment packing framework
almost for free.
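One natural way to write that LP, hedged since the exact formulation was not shown: with a guess T
for the optimal makespan, fractional assignment variables y_ij for putting job i on machine j, and
(hypothetical notation) c_ijd for the cost of job i on machine j in dimension d,

    \begin{aligned}
    & \sum_j y_{ij} \;=\; 1 && \text{for every job } i,\\
    & \sum_i c_{ijd}\, y_{ij} \;\le\; T && \text{for every machine } j \text{ and dimension } d,\\
    & y_{ij} \;=\; 0 \ \text{ whenever } \max_d c_{ijd} > T, \qquad y_{ij} \ge 0.
    \end{aligned}

Treating the machine-and-dimension load constraints as the packing constraints and the per-job
constraints as the assignment constraints is what lets this LP be fed directly into the framework,
giving the log d over log log d factor just mentioned.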
Another thing this type of analysis gives you is that you can handle cases in which the
constraints have different right-hand side values. You can get an approximation ratio which trades
off between a multiplicative factor and an additive, standard-deviation-type factor. This is very
difficult to do with other approaches to
these types of assignment packing problems. Again, you can analyze this algorithm by building
witness trees. It's very similar to the Moser Tardos analysis but the key idea is you only keep
track of the resampled variables and the values they take on. If a variable was not resampled,
you just ignore it when you are building the witness tree. If you have resampled three different
constraints and you selected three different resampled sets you would build the witness tree in
terms of just these resampled sets. You don't say anything about S2 if it's not in the witness
tree. The key lemma for this is that you can show a witness tree lemma which is that if you
have any particular witness tree whose nodes are labeled by the sampled sets then the
probability of encountering that tree can be bounded in terms of this resampling probability
times the probability distribution on the variables. It's a similar proof to the covering problem.
And the key idea is that the total expected number of resamplings is at most equal to this sum
over all witness trees of the probability of the tree. By only keeping track of your resampled
variables you are greatly reducing the total number of witness trees because most variables are
not being counted in the witness trees. So you are losing information but you are also really
reducing the total number of witness trees you are considering, so the sum is over a much
smaller set by ignoring those variables. And so the sum becomes smaller. So one is left wondering
whether you can continue to apply this type of partial resampling methodology to other types of
integer programming problems: packing problems, mixed problems involving covering and packing. We
have bounds in terms of the L1 and L0 norms, but can you also get any bounds in terms of higher
norms? And there is also this kind of interesting question: what is the ratio for the covering
integer problem when a min is becoming large? Do you have that square root term there or not? I
think the linear term is the correct one, but it's a guess. That's all. Thank you.
[applause].
>>: [indiscernible] approach where you were using Chernoff concentration and when you have
the variables that are very small then there are much better concentration inequalities to use,
like [indiscernible] give you better estimates than just…
>> David Harris: My impression was that if you have an infinite number of very small values
then the Chernoff bound is kind of correct, because it approaches a Poisson, which is basically
what the Chernoff bound is.
>>: The other thing is, I guess it doesn't lose [indiscernible]. It's just that you are using the
Moser Tardos algorithm. Is there a way to apply the local lemma and get the same bounds?
>> David Harris: I think Nick Harvey has this approach to a similar type of problem in which you
basically kind of quantize the coefficients. The coefficients between a half and one you do a
sampling on those and then you get some discrepancy. Then you look at the coefficients in the
range of a quarter to a half and you get another discrepancy but the discrepancy is smaller.
And so if you sum them all up you still get some small value. So you can do that. It's kind of
cumbersome to use this multistage approach and I don't know how general it is. [applause].