>> Nikhil Devanur Rangarajan: Hi everyone. It's my pleasure to welcome Jelena Marasevic
from Columbia University who has worked on fair packing problems and fast distributed
algorithms. How are you Jelena?
>> Jelena Marasevic: Okay.
>> Nikhil Devanur Rangarajan: She's here until Friday in case you haven't met her yet, send an
e-mail.
>> Jelena Marasevic: Thank you for the introduction. I'm a PhD student at Columbia University.
This is joint work with Cliff Stein and Gil Zussman. The work is on fair resource allocation and I
will explain very soon what it is. Let's start first with some applications. When it comes to fair
resource allocation, the most well-known problems are those coming from network congestion
control, network rate control. Indeed, this is where fairness problems have been really
mostly studied. But there are many other applications where we care about fairness. More
recently there are problems in resource management and resource allocation in data centers.
You can pose almost any operations research problem as also a fair resource allocation
problem. I'm showing healthcare scheduling, but there are also other instances of problems,
like air traffic control or allocation [indiscernible] organs et cetera. One instance where fairness
problems arise, where it is not immediately clear that there exists a connection, is market
equilibria problems and I will actually make this relationship more clear as we get towards the
end of the talk. Let me start with the example where we care about fairness to a different
extent. This is a classical example that is very often used in classes on communication networks. The setting is we have one very long path that uses n-1 capacitated links and we have n-1 short paths, shown here in orange, that each use only one capacitated link, and each link has a unit
capacity. The question we want to ask is how do we allocate flows in this network? How do we
allocate rates to these routes? One thing is clear: if we are doing anything reasonable we would give the same rate to each of the short routes because they just see the same constraints. They see the same conditions. One objective we may want to have is just to maximize efficiency. In this case, efficiency means maximizing the sum throughput: take the most out of the network. If we were to do that we would give zero units of flow to the long route and one unit of flow to each of the short routes. This gives us n-1 efficiency, but it is very unfair, especially since the long route user
was actually paying something to get something sent through this network. On the other hand
if we really wanted to be as fair as possible, one thing that is not too difficult to see is that in
that case we would just give everyone half a unit of flow. But what happens in this case to our
efficiency is that we have lowered it down by a factor of about 2.
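To put numbers on this, here is a small sketch in Python. The function name is my own, and the third, proportionally fair allocation that comes up a bit later in the talk is included for comparison:

```python
def throughputs(n):
    """Total throughput of three allocations in the classic example:
    one long route over n-1 unit-capacity links, plus n-1 short routes
    that each use a single one of those links."""
    # Maximum efficiency: long route gets 0, each short route gets 1.
    max_efficiency = 0 + (n - 1) * 1.0
    # Max-min fair: every route, long and short, gets 1/2.
    max_min_fair = n * 0.5
    # Proportionally fair (discussed later in the talk): the long route
    # gets 1/n, each short route gets the remaining 1 - 1/n.
    proportional = 1.0 / n + (n - 1) * (1.0 - 1.0 / n)
    return max_efficiency, max_min_fair, proportional

# For n = 10 this gives 9.0, 5.0, and about 8.2: the proportionally
# fair allocation is already close to the maximum efficiency.
```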
>>: When you say maximum fairness, what is that?
>> Jelena Marasevic: I know it's a little bit vague right now, but you can see here that you're
actually using all of the links here to the maximum extent and you're giving everyone an equal
amount of flow. The third type of objective that we can think of is if we are very fair, efficiency
goes down. If we are very unfair our efficiency is better, so maybe we should want to look at
some trade-offs. Intuitively, the long route is using too many links. It uses too many resources, so maybe it should get something proportional to what it uses. In that case we could assign 1/n units of flow to the long route and (n-1)/n units of flow to each of the short routes, and our efficiency would actually be, at least asymptotically, close to the maximum efficiency. If we
were to somehow parameterize fairness on a scale from zero to infinity then talking about the
previous problems that I've shown you, the first problem should really get zero, because it has
no fairness guarantees. The second allocation should be somewhere in this vicinity because it
was as fair as possible, and the third one should be somewhere in the middle. This is just speaking intuitively. The questions of measuring fairness or measuring inequality are actually
not very new. Back in 1970 there were questions asked about measuring inequality, but in
terms of inequalities of income distributions. The problem there was to rank different income
distributions. The parameter there was called the inequality aversion parameter. The functions that
appear in this kind of work are a little bit curious and I'll tell you why on the next slide. Fast-forward 30 years, and in the realm of network congestion control there is a definition of alpha fairness. What is nice about this definition is the lemma that says there exists a family of concave
objectives that we can maximize and actually reach this alpha fair resource allocation. These
functions are the same functions as I have shown you on the last slide. They may not be very
easy to remember. The way to think about them is as functions whose derivative is 1 over x to
the alpha. A very high level intuition is that as x goes to 0 a derivative goes to infinity so you
really want to push all the allocations away from zero. As alpha gets larger you really push the
allocations away from zero to a larger extent. For the three examples that I mentioned at the
beginning, the first example is actually the case of alpha equals zero. It is called utilitarian. In
this case we only have linear objectives. The second case is known as max-min fair allocation, which is the most egalitarian way of dividing resources, and the third example is known as
proportional fairness and it happens when alpha equals one. The intuition about trading off
efficiency and fairness is not only an intuition. There has actually been work in the last couple
of years that quantifies this trade-off under different metrics. The problem that this talk is
about is alpha fair packing. It is a class of problems where the feasible region is a polytope
determined by positive linear constraints. The problems with such feasible region have been
really extensively studied for linear objectives, but not as much, at least not with convergence guarantees, for this more general objective. These objective functions are concave, so in a centralized manner we know how to solve this in polynomial time. The focus of this talk is to
look into distributed algorithms that have asynchronous updates. This is something that arises
very often in practice and even if we didn't have a really distributed setting we could parallelize
computations and get really fast algorithms. Yes?
>>: If alpha is zero this would imply maximizing the sum of the logs, not the sum, which was on the last slide.
>> Jelena Marasevic: If alpha is zero…
>>: [indiscernible]
>> Jelena Marasevic: When alpha equals one you are maximizing the sum of the logs. Alpha
equals zero is the linear objective, yes. The main result that I will talk about is an epsilon approximation algorithm that is very robust because it has many nice properties. It can run in a distributed fashion. It allows asynchronous updates. It only reacts to the current state of the network. It makes local updates. It can start from any initial state, which also means that it is fault tolerant, and it can allow for insertions or deletions of a constant number of variables and constraints. The convergence time of the algorithm is poly-logarithmic in the input size and polynomial in the accuracy parameter epsilon. I will tell you more exactly what the convergence time
is later after I introduce some notation. What I should point out here is that this is probably
asymptotically the dependence you should expect for this type of algorithm, because at
least for linear programming in this setting there are lower bounds. Looking at related work,
when it comes to maximum fairness there has been a lot of work. One thing to note here is
that when we have maximum fairness these problems are not anymore convex optimization
problems. What we get is a multi-objective problem where we are looking at the whole spectrum. These problems typically have more combinatorial structure. At the other end of our line there is work on packing linear programs. Most relevant to this talk is the work by
Awerbuch and Khandekar. But of course if we have linear programming we can only support
linear objectives. There is no straightforward extension of these results to the alpha fair
setting. In terms of work on the network congestion control there has been a lot of work in the
last almost 20 years. Most relevant to this talk is work by Kelly. What is interesting about this
line of work is that there is no guaranteed convergence time as a function of input. The
convergence time that is shown for these algorithms, they are usually continuous time
algorithms, is that you reach the optimal solution after some finite time, but there is no
guarantee that this happens in polynomial time as a function of input. More recently, there has
been work on network utility maximization. This work can solve the problem I talked about
after some scaling, but in general leads to convergence time that is at least linear in the input if
not even polynomial. Talking to Nikhil, we have actually observed that one of the cases,
alpha equals one, is equivalent to the problem of market equilibrium in Eisenberg-Gale markets
with utilities and there has been work from Stalk [phonetic] 2013. What we observed there is
the dependence on epsilon of this type of algorithm is better than this work, but the
dependence on input is worse, so it is at least linear, whereas here it is only logarithmic. I will
move now to talking about the model and some of the preliminaries. First of all, this just makes the analysis easier. It's a very standard thing to do in linear programming to scale all of the constraints. Here is how you scale them. First, you divide both sides of each constraint by the right-hand side to get one on the right-hand side. You divide all of the constraint matrix elements by the minimum nonzero element and you scale all of the variables by the same amount. What you get when you do that is that actually in the scaled problem any nonzero entry of the matrix
is at least one. What this gives you is that for any feasible solution your variables must be
between zero and one. Does everyone see this? Why can we do this? We actually showed
that this preserves the approximation guarantees. If you want to scale back you will have the
same approximation guarantees and the scaling is really just for the purpose of the analysis.
The algorithm can be run on the original instance.
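As a rough illustration of this scaling step, here is a sketch under my own reading of it (not the authors' code), assuming constraints of the form Ax ≤ b with nonnegative entries:

```python
def scale_packing(A, b):
    """Sketch of the preprocessing step described in the talk: divide each
    constraint by its right-hand side, then divide the whole matrix by its
    minimum nonzero entry. In the scaled problem every nonzero entry is at
    least 1, so any feasible (rescaled) variable lies between 0 and 1."""
    # Divide each row by its right-hand side, making all right-hand sides 1.
    A = [[a_ij / b_i for a_ij in row] for row, b_i in zip(A, b)]
    # Divide by the minimum nonzero entry so every nonzero entry is >= 1.
    a_min = min(a for row in A for a in row if a > 0)
    A = [[a / a_min for a in row] for row in A]
    # The variables are implicitly rescaled to match: if A'x' <= 1 and some
    # A'_ij >= 1, then x'_j <= 1 for any feasible x'.
    return A

A_scaled = scale_packing([[2.0, 4.0], [1.0, 3.0]], [2.0, 3.0])
# Every nonzero entry of A_scaled is now at least 1.
```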
>>: When you say approximation, do you mean the objective as well as a constraint?
>> Jelena Marasevic: What I will show later is that when the algorithm runs, the constraints
never get violated unless they were initially violated. But after polylog number of rounds they
are always satisfied. What I mean is that the guarantee on the objective function is the same,
so if multiplicative you get the same multiplicative. If additive you get the same additive. The
model of distributed computation is the following. I will talk here in terms of sellers and buyers, so for people who have worked on market problems it will be easier to capture the main idea.
For every variable we say that there is a node associated with that variable and we can call it a
buyer. For each constraint there is also a node and we can call it the seller. Between the
variable and the constraint there is an edge if the variable participates in that constraint, that is, if it has a nonzero coefficient, and for such pairs we add a weight on the edge equal to the coefficient with which the variable appears in the corresponding constraint. As for the type of information that nodes have: on the variable side, every buyer only needs upper bounds on the global problem parameters. The information that is collected in each round is the prices of the constraints. It would actually be enough just to collect the relative congestion.
The variables collect information only from those constraints in which they participate. They
don't need global information. For the constraints, each seller sets a price that is a function of the global problem parameters and of the relative slack or relative congestion. I'm calling it relative because in the non-scaled problem it would be relative; here it is just the [indiscernible] slack. Is this model clear? If we were in network congestion control problems, what this would mean is that in each round each node would need to collect the relative congestion on
each of the links it uses on their path. Important for the analysis are certain KKT conditions.
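To fix notation, the problem and the four conditions just mentioned can be written out as follows. This is a reconstruction from the talk's description, using w_j for the weights, y_i for the dual variables, and the scaled constraints:

```latex
\max_{x \ge 0} \; \sum_j w_j f_\alpha(x_j)
\quad \text{s.t.} \quad Ax \le \mathbf{1},
\qquad
f_\alpha(x) = \begin{cases} x^{1-\alpha}/(1-\alpha), & \alpha \neq 1,\\ \ln x, & \alpha = 1. \end{cases}

% KKT conditions:
% (1) primal feasibility:       Ax \le \mathbf{1}
% (2) dual feasibility:         y_i \ge 0 \;\; \forall i
% (3) complementary slackness:  y_i \big(1 - \sum_j A_{ij} x_j\big) = 0 \;\; \forall i
% (4) stationarity:             w_j x_j^{-\alpha} = \sum_i A_{ij} y_i \;\; \forall j
```

Note that the derivative of f_alpha is 1 over x to the alpha, as described earlier, and the fourth condition is the one the algorithm's updates will chase.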
Just to be clear about what the notation is: with each constraint we will associate a Lagrange multiplier, which I will just refer to as a dual variable. If you write out the KKT conditions, they are just a standard thing.
You have primal feasibility, dual feasibility, complementary slackness and the fourth condition
you get when you maximize the Lagrangian. This fourth KKT condition will actually be the most
important one for the algorithm. Let me tell you what the algorithm is, but before I get there I
want to give you some intuition. I'm writing at the top the KKT condition that I said would be
the most important one. There are two algorithms that at first will not look so similar. On the left is the algorithm by Kelly et al. from 1998. It is for a particular instance of alpha
fairness. It is for alpha equals one, known as proportional fairness. The algorithm is a
continuous time algorithm. How it works is you describe all of the updates in the network by a
system of differential equations. One thing to notice here is that the updates over the variables
are actually guided by any slack in this KKT condition. The dual variables are chosen as some
unspecified monotonically increasing function; the dual variable i is a function of the left-hand side of constraint i. In linear programming we need some proper initialization. We start with some
reasonable solution. This algorithm actually has discrete updates. What happens in each round
is that the dual variable is set as an exponential function of the relative slack of the corresponding constraint. Whenever the KKT condition at the top is not satisfied, the algorithm makes multiplicative updates to get closer to making it satisfied. This is not completely multiplicative because there is a step increase delta j when the x's [indiscernible] are very small,
because if they were zero we wouldn't be making any progress. When there is a decrease there
is a multiplicative decrease. The convergence of these two algorithms was shown, and for the
first one there was no real dependence on the input size; it's just finite time convergence. A
certain potential function was used called the [indiscernible] function. It's just a bounded
monotonically increasing function. If you look at the potential function for the linear
programming the first term looks similar and the second term not so much. In linear
programming it is quite standard to choose dual variables as an exponential function of this
constraint slack. If you use a similar idea for alpha equals one here, if you choose an
exponential function here and you plug this in to the potential function you get a function of
the same form. This was one of the first observations that we made when we started working
on this problem, and this somehow gave us the intuition that we can get good convergence with discrete updates, using something that looks similar to the linear programming algorithm. The
algorithm is indeed very similar to the linear programming algorithm. We don't need a real
initialization. We only need to restrict each variable to some domain between delta j and one.
If the variable for any reason goes outside of this domain we just put it back. The choice of the
dual is only slightly different: we have some c in front of the exponent. One difference that does not seem so important, really, is that when we make a decrease it is not always multiplicative. Because we are setting this lower threshold, we may be making a decrease that is smaller than multiplicative, whereas in linear programming it is always at least as large as multiplicative. This
actually raises one challenge in the analysis. One thing to notice here is that for linear
programming the variables are between zero and one. In the algorithm here they will be
between some delta j and one.
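As a sketch of the dynamics just described, here is my own simplified, synchronous rendering with placeholder parameter handling, not the paper's exact algorithm: each constraint prices itself exponentially in its slack, and each variable compares the two sides of the fourth KKT condition and moves multiplicatively, never leaving [delta_j, 1].

```python
import math

def one_round(x, A, w, alpha, c, gamma, theta, delta):
    """One synchronous round of a simplified alpha-fair packing dynamic.
    Duals y_i are exponential in the slack of constraint i; each x_j is
    nudged multiplicatively toward w_j / x_j^alpha = sum_i A_ij y_i,
    and is kept inside [delta_j, 1]."""
    m, n = len(A), len(x)
    # Seller side: price each constraint exponentially in its (scaled) load.
    y = [math.exp(c * (sum(A[i][j] * x[j] for j in range(n)) - 1.0))
         for i in range(m)]
    new_x = list(x)
    for j in range(n):
        price = sum(A[i][j] * y[i] for i in range(m))  # aggregate price seen by j
        target = w[j] / x[j] ** alpha                   # other side of the KKT condition
        if price < target / (1.0 + gamma):
            new_x[j] = min(1.0, x[j] * (1.0 + theta))   # multiplicative increase
        elif price > target * (1.0 + gamma):
            new_x[j] = max(delta[j], x[j] * (1.0 - theta))  # decrease, floored at delta_j
        # otherwise the condition is approximately satisfied: do nothing
    return new_x
```

Note that each x_j only needs the prices of the constraints it appears in, which is what makes the updates local, and that in the real algorithm the updates can also happen asynchronously.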
>>: Since the difference between putting the max here versus there is whether to bump it up
to delta in the current step or the next step. [indiscernible] logarithm down a little bit, but in
the next step will be back [indiscernible].
>> Jelena Marasevic: Yeah, but there is a reason why you cannot really do a step increase here.
Because your KKT condition looks different, what you can show there with the step increase is that the value of the left-hand side does not change significantly. It changes by some small multiplicative factor regardless of how the variables update. In this case you would
lose that.
>>: [indiscernible]
>> Jelena Marasevic: You will make a step increase, but your entries here are allowed to
become as small as possible. It doesn't mean they will go below delta. They will go back.
>>: [indiscernible] because in the next up they will go back?
>> Jelena Marasevic: Not necessarily. They can keep going down.
>>: [indiscernible] this is a standard one [indiscernible] right? I don't understand. I just want
to [indiscernible]. I don't know this paper by [indiscernible]. Did they give a [indiscernible]
analysis for it?
>>: [indiscernible]
>> Jelena Marasevic: They get a very robust algorithm. They get all of these properties of self
stabilization. They don't really get self stabilization, but they get statelessness. They get that the solution is always feasible as long as the algorithm runs.
>>: If you give up they will give you something about the average. Really you don't want
[indiscernible]
>>: [indiscernible]
[multiple speakers] [indiscernible]
>>: But still, that's what I'm saying. That's the basic [indiscernible] for the last 20 years.
[multiple speakers] [indiscernible]
>>: Maybe you will tell me later on why, I guess, but this…
>>: That is I think one of the things that may be was [indiscernible].
>>: [indiscernible] maybe the constraint…
>>: No.
>> Jelena Marasevic: It somehow gave the right intuition about what is going on in terms of the
potential function. I don't remember…
>>: [indiscernible]
>> Jelena Marasevic: It is not. In this case you have the scaling here for linear programming, and for a more general objective it is not really as well kept.
>>: [indiscernible] case of [indiscernible] comes from max min, right? Can you rewrite… So
everything here is, can be rewritten in terms of derivatives of the soft max of the [indiscernible]
in some sense? I mean the exponential potential subsets. Maybe we can talk about this later.
>>: I was wondering, maybe it was being [indiscernible]
[multiple speakers] [indiscernible]
>> Jelena Marasevic: It was just an intuition. The algorithm, once again, the way to think about
it is that we are trying to satisfy this KKT condition, with the duals set as exponential functions of the constraints. We look at the value of the left-hand side of the KKT condition. If it is somewhere
around the right-hand side, this is just a fraction of epsilon. If it is close enough we don't do
anything. If it is far enough, if it is much smaller, then we increase xj multiplicatively to get closer. If it is larger we decrease xj multiplicatively, unless it goes below this
threshold we have set. I will quickly tell you what the algorithm parameters are. They are a
little bit complicated. I don't think that anyone should think about them for the rest of the talk
too much, but just if you want to get the sense of what they look like. Some notation, this is
really what you would expect from the notation, so w max is the maximum weight; w min is the
minimum weight, and A max is the maximum element of the matrix. The parameters delta j are really complicated things. There is a motivation for the choice of delta j on the next slide. We actually proved some lower bound on the value that each component of the optimal allocation vector takes.
But we end up choosing something that is much looser or at least [indiscernible] looser for
technical reasons later.
>>: I just wanted, for [indiscernible] also was one, right?
>> Jelena Marasevic: It was zero. It was zero so you didn't even have this xj.
>>: So in that case what are you setting your delta j to be?
>>: Is a polynomial? What is it?
>> Jelena Marasevic: You cannot go all the way down to linear programming with this. You
need alpha to be a little bit bounded away from zero, because the lower bound goes to zero.
>>: [indiscernible] in the parameters of your input, are they exponentially like…
>> Jelena Marasevic: You are polynomially bounded, but there is a dependence on one over alpha, so alpha cannot go all the way down to zero. That's one catch. So the c that multiplies
the exponent is conveniently chosen. If you look at delta j, the only difference in terms of j is
just this wj. When you raise the whole thing to the alpha and you divide by delta j you get the
same thing. Cap is just one over epsilon times the log of the input. Gamma is epsilon over 4
and theta, which determines the multiplicative updates one plus theta and one minus theta, is conveniently chosen so that the left-hand side of the KKT condition does not change too much. It changes by a factor of one plus or minus gamma over four. We actually prove that if some x* optimally solves [indiscernible] alpha fair packing, then each element of x* is bounded from
below as a function of the input. Of course, you don't need to grasp all the letters here in
this equation. If you tried plotting this as a function of alpha, it is actually a continuous
function. I don't know if the bound is tight, but the bound changes quite dramatically between
zero and one. When alpha is greater than one there is much less change and as alpha goes to
infinity you get something that is roughly 1 over (n_i A max) squared, where n_i is the number of nonzero elements in the ith constraint. This is just about the order of A max here. Let me move to the
more fun part, convergence analysis. I'll give you a very high level overview of what happens.
The first thing we show is that if we started from an infeasible solution we get a feasible
solution fast. If we were at the feasible solution already, the algorithm will not make the
solution infeasible in any round. The second condition we get for free just by choice of the
duals. The third condition, complementary slackness, we show holds in an approximate and aggregate sense after some polylog number of rounds, and that is actually sufficient. These first three KKT conditions are in some sense preliminaries. Most of the work goes into
showing some things about this fourth one. The way the proof of convergence works is we
choose a bounded nondecreasing potential function and you won't be surprised what it is.
Then we define some stationary intervals and show that if we are in a nonstationary interval,
then the potential increases significantly, and if we are in a stationary interval, the solution is epsilon approximate. The
first lemma says if we started with a feasible solution, we remain feasible and I want to go just
quickly over the proof because it is relatively simple. It appears in a similar form in Awerbuch and Khandekar, both in this lemma and in something that will appear two slides from now. What I want to point out is that if I gave you the parameters you could go line by line and get the same proof. But
one of the challenges is really finding these parameters and making them work. The proof
works as follows. You select the first round in which the solution becomes infeasible. Some
notation: we just denote by x0 the solution right before the update that made it infeasible and by x the one right after the update. Now, by the way the algorithm works, the only constraints that we could violate are the packing constraints. This one became greater than one. For this to happen, at least one variable that appears in the constraint has got to increase. How would it otherwise have gotten larger than one? For the variable to increase, just by the way the algorithm works, we had to have this slack in the KKT condition. From one round to another, by the way the multiplicative updates are chosen, this term can increase by a factor of at most one plus gamma over four. Combined with the previous slide this gives you something that is actually even strictly less than wj. On the
other hand, if you want to bound this from below, you just select one term and then you
choose the term in which you have the dual that corresponds to the constraint that got
violated. Since we did the scaling, Aij, being nonzero, is greater than or equal to one, so we can take it out. We have that xj must be greater than or equal to delta j, again by the way the algorithm works, and we just write out the yi's the way we chose them; this is where we need the delta j not to be zero. This thing is at least wj. The constraint got violated, so this is greater than zero, so the whole thing is greater than wj and we get a contradiction.
>>: [indiscernible] delta j. That's why it's not a problem in your program because alpha is
[indiscernible]?
>> Jelena Marasevic: Yeah, because it doesn't show up there. The next lemma shows if we
started with an infeasible solution we reach a feasible solution relatively fast. I don't want to
get into all the details of the proof, but just how it works: if the x's are what violated feasibility, they become positive after at most one round. So the only things that could remain violated are the packing constraints. The things we show here are that none of the variables that appear in this constraint increase, so they cannot push it up, and that all of the variables that appear in there and are greater than one over n A max decrease. Since those variables are large enough to decrease multiplicatively, after just log base one minus theta of one over n A max rounds the constraint will get down below one. So combined with
previous lemma we get that after some point the solution is always feasible. Another thing to
point out is that you don't really have this in linear programming; it is not too difficult to show that there you need to start from a feasible solution to remain feasible. For the
complementary slackness, our third KKT condition, we show that after some polylog number of rounds it holds in an approximate and aggregate sense. The actual KKT conditions are written for each yi, so just notice that this is written over the sum of the yi's and it is approximate. So what is left to deal with is just the famous fourth KKT condition. Let
me tell you what the potential function is. This is just a reminder of what the KKT condition
that I was mentioning was, what the algorithm was and what [indiscernible] are. The potential
function is just a more general version of what we had for the algorithms that gave us intuition.
The intuition about why this potential function makes sense is that if you look at partial
derivatives with respect to the xj's and you group them conveniently, what you get is just the slack of your fourth KKT condition. That's what guides the updates. If some xj increases it must
be because this term is actually positive, so the potential function increases. If some xj
decreases, it must be because this term is actually negative and the potential function increases
once again. The idea is that whatever you do, whatever updates you make throughout the algorithm execution, the potential function never decreases. The main idea for the rest of the
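For concreteness, here is the alpha equals one (proportional fairness) case of this potential as I understand it from the talk; this is a sketch, not the paper's exact definition. Its partial derivative in x_j is exactly w_j over x_j minus the aggregate price, i.e. the slack of the fourth KKT condition:

```python
import math

def potential(x, A, w, c):
    """Candidate potential for alpha = 1: the objective sum_j w_j ln x_j
    minus (1/c) times the sum of the exponential duals. Differentiating in
    x_j gives w_j/x_j - sum_i A_ij y_i with y_i = exp(c (A_i x - 1)),
    which is the slack of the fourth KKT condition guiding the updates."""
    n = len(x)
    objective = sum(wj * math.log(xj) for wj, xj in zip(w, x))
    penalty = sum(math.exp(c * (sum(row[j] * x[j] for j in range(n)) - 1.0))
                  for row in A) / c
    return objective - penalty
```

A multiplicative increase of x_j when the slack is positive, or a decrease when it is negative, therefore moves along the positive gradient direction, which is why the potential never decreases.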
proof is since we have the fact that each xj is bounded in some interval, what you can show is
that there are bounds also for the potential function. They may be polynomially large, this gap
may be polynomially large even exponential in alpha, but it is bound. The algorithm makes
updates as long as at least one KKT condition is not approximately satisfied. If you want to
analyze this as long as algorithm makes updates it may take a very long time before algorithm
stops making updates. Actually the convergence is well the algorithm may have actually
converged before it stopped making updates. The type of the convergence that we get is that
after at most a polylog number of rounds at least one round holds an epsilon approximate solution, and the total number of rounds where we don't have an epsilon approximate solution
is bounded by the same term.
>>: [indiscernible] forever, so [indiscernible] you ran for this long and then after this
[indiscernible]?
>>: So you cannot say anything about any one particular round?
>> Jelena Marasevic: Yes. I mean you can also ask a question why don't we stop after we reach
this state. If you were running the algorithm in parallel you could, but here you don't have global coordination.
>>: What do you mean by [indiscernible]
>> Jelena Marasevic: You'll see in the next slide. For alpha less than one it's one plus epsilon
multiplicative in this many rounds. We just need to make epsilon small enough. For alpha
equals one it is W, the sum of weights, times epsilon, in this many rounds. And why is it one minus epsilon to the alpha here? In this case the objective is actually always negative. I intentionally didn't put
alpha in the convergence time bound. You will see in the next slide that you don't really want
alpha to be very large or at least you shouldn't expect for a very large alpha to have a very fast
algorithm. If you look at what these functions look like for different values of alpha, this is why
there are three proofs for these three cases of alpha. When alpha is zero it is just a linear
function. As alpha increases to one this function, you know, it becomes a little bit more curved.
The gradient becomes larger and larger close to zero. As you get really close to one it goes all the way up to infinity. So it has the same shape as alpha equals one, but it is translated all the way up to infinity. For alpha greater than one, again, you have the same shape of the function as at alpha equals one, but as x approaches zero the function goes all the way down to minus infinity. The video will show alpha increasing from
something close to 1 to 100 in steps of one. What happens is this function becomes really,
really steep really, really fast. When alpha equals 100 it almost looks like a step function. This
is the reason why at least using some of the conventional methods you wouldn't really expect
to have a very fast algorithm, because at least for first-order methods you need either that
the gradient is bounded or that the gradient doesn't change too much. I'll do a quick proof
sketch for alpha equals one. I should have mentioned that some of the preliminary results are
on arXiv. They don't contain this part; this is in part the reason I am talking about this.
But they will be posted relatively soon. A heads up, I will assume I start from a properly
initialized solution. I don't really need this. It is possible to extend the proof to start from any
initial solution, but I just don't want to complicate things too much. Since the potential never decreases, what we start from is actually the minimum potential. Delta j is really small, so you have a huge slack in each of the dual variables. You will get something on the order of the sum of the weights times a log. The maximum potential is bounded from above by the value at x equal to one, and what you get in that case is just zero. The total increase in the potential is the sum of the weights times a log. To get the convergence bound we need
to show that between each stationary interval the increase, we will actually here have
stationary rounds, that the increase is this w possibly times some polynomial in epsilon and
possibly over some polylog of the input. This is what we are shooting for. One thing that we
need to show is the following; I won't go over the proof, but I mentioned at the beginning that there is a problem when we are not making multiplicative updates. I will call the variables that are close to delta_j small, and the other variables large. If we were to decrease a small variable, we wouldn't do it by a full multiplicative one minus theta factor. What the next lemma says is that the increase in the potential due to the decrease of small variables is dominated by the increase in the potential due to the decrease of large variables; so whatever the small variables do is dominated by what the large variables do. Is this clear? I'm stating just this one lemma, out of at least two, to give you the idea. For the increase in the potential we show the following result: x-plus is just those x_j's that increase, and we have the same notation for values before and after an update. We show this result, so let me just tell you that if we have a large gap in our KKT condition, a gap of at least one plus two gamma, we get gamma times this increase. We will see it soon. Let me remind you that the sum of the weights is capital
W. We define a stationary round in this way. Why? One thing we show is that nonstationary rounds are going to give a large potential increase. A round is nonstationary if either part of the definition is violated; we need both to hold for a round to be stationary. If the first part is violated, we get that the potential increase is at least W over tau; I will remind you what tau is in a bit. If the second condition doesn't hold, then just by combining these two things we get that the increase is actually gamma times W. Let me remind you that tau is just on the order of log squared over epsilon squared. So in a nonstationary round we indeed have a large increase in the potential, and if you recall what the whole increase in the potential was, combining these two things you will actually get a bound that is polylog in the input over a polynomial in epsilon. This is just the main idea. Is it clear? Another thing
that I'll tell you is how we show that the solution is actually epsilon-approximate, by looking at the duality gap: we show that in each stationary round it is bounded by some constant times epsilon times capital W. The right-hand-side term is bounded using approximate complementary slackness, which was one of the preliminary lemmas, together with the second part of the stationary round definition. The left-hand-side term is bounded using part one of the stationary round definition and a lower bound on this term that I won't go into, but that is the main idea. In the rest of the time, I guess I have only a few
more minutes, so I just want to get to the relation between these sorts of problems and markets. I expect most of you know what Fisher markets are, but I will nevertheless go over it. In Fisher markets we have buyers on one side and goods on the other. Buyers are indexed by j and goods by i, and x_ij is the amount of good i allocated to buyer j. Every buyer has some money, gets some utility over the bundle allocated to them, and there are prices on the goods. The market equilibrium of these problems is captured by the Eisenberg-Gale convex program, which looks kind of similar to our alpha equals one case. Eisenberg-Gale
markets were introduced in 2007 by Vijay Vazirani. They are a generalization of Fisher markets where a buyer may be interested only in a subset of goods, and may want some goods in specific ratios. These are exactly the markets that solve an Eisenberg-Gale-type convex program, which is just a more general convex program than the previous one. If you want to interpret the alpha equals one case as a market, you could do it by
choosing linear utilities, where each buyer wants only a specific subset of goods, and in specific ratios. I will borrow an interpretation from Vazirani, which is building a product. There are a number of goods, and one buyer maybe wants to make a cake. The buyer needs the goods in specific ratios: maybe one third should be flour, one fourth eggs, one fourth sugar, whatever is left is cherries, and the buyer doesn't want eggplant at all. They want to make as many cakes as possible, but they need the goods in those specific ratios. Of course, there are other buyers more interested in other subsets of goods. For me the easiest way to look at the
connections between these three problems is by looking at network flow problems, because this is actually where the alpha equals one case originally came from. If you want to interpret these problems as network flow problems, the problem is as follows: for each variable you have a source-sink pair and a fixed set of paths, and you fix the flow over the paths in specific ratios.
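[Editorial aside: as a minimal numeric instance of the alpha equals one (proportionally fair) objective in this flow setting, consider the degenerate case of several flows sharing one unit-capacity link. The closed form and the KKT check below are standard; the code itself is an illustration, not from the talk.]

```python
def proportional_fair_single_link(weights, capacity):
    """Maximize sum_j w_j * log(x_j) subject to sum_j x_j <= capacity.
    On a single shared link the optimum has the closed form
    x_j = capacity * w_j / sum(w): each flow gets a weight-proportional share."""
    total = sum(weights)
    return [capacity * w / total for w in weights]

shares = proportional_fair_single_link([1.0, 2.0, 1.0], capacity=1.0)
print(shares)  # → [0.25, 0.5, 0.25]

# KKT check: at the optimum, w_j / x_j is the same for every j
# (it equals the dual "price" of the link's capacity constraint).
prices = [w / x for w, x in zip([1.0, 2.0, 1.0], shares)]
print(prices)  # → [4.0, 4.0, 4.0]
```

[The identical per-unit "prices" are what connects this to the market-equilibrium view that follows.]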
The allocation is the total flow between the source-sink pair. Eisenberg-Gale markets are a more general version of this where, again, you have source-sink pairs, but you can split the flow over the paths any way you like. For linear utilities you would be fair in terms of the total flows, but you can have other utilities as well. Fisher markets are a special case of Eisenberg-Gale markets, but not of the problem above: once again you can split the flows any way you like over the paths, but only one edge per path is capacitated, whereas in the other two cases arbitrarily many edges can be capacitated. To summarize, I have
talked about a fast, distributed, and very robust algorithm for the class of alpha-fair packing problems, a problem that arises in many applications. One open problem is whether these techniques could apply to online Eisenberg-Gale markets; this could be important, for example, for the development of automated online markets. Another question is whether some of these techniques apply to, or extend to, other types of convex problems. So that's it.
[applause]. Are there any more questions? I think I'm exactly on time, even though we started 3 minutes late. No? Okay. Thank you.