>> Kamal Jain: Hello. It's my pleasure to introduce Rohit Khandekar. He's from IBM.
He's been a researcher there for the last couple of years. He took his Ph.D. from IIT Delhi,
from where I did my undergrad, so he got more out of it there. And at IBM, I just did
an internship there. And here he is.
>> Rohit Khandekar: Thank you, Kamal. Really glad to be here.
So I'll be talking about some recent work that I did with Baruch Awerbuch. It's on
stateless distributed gradient descent for positive linear programs. And for those who
don't like complicated titles, I'm essentially going to give a very simple and
fast-converging algorithm for solving certain families of linear programs.
So let me start with a motivating example. It's one of distributed flow control. So you're
given a flow network: it's a graph with edge capacities. And you're given a bunch of
commodities, where each commodity corresponds to a fixed path in the network. So the
red commodity wants to flow along this fixed path, from this node to that node; the blue
commodity wants to flow along the blue path; and so on. And your objective is to
maximize the total profit you accrue by routing the flow, subject to the edge capacity
constraints.
More precisely, think of p_P as the profit you accrue by sending unit flow along path P, and
f_P as the flow you're going to send along path P. So you want to maximize the total
profit subject to the constraint that the total flow through any edge is at most its capacity.
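In symbols, the LP being described is roughly the following (a sketch; p_P, f_P and c_e are assumed names for the per-unit profit, the flow on path P, and the capacity of edge e, not necessarily the slide's notation):

```latex
\max \sum_{P} p_P \, f_P
\qquad \text{subject to} \qquad
\sum_{P \ni e} f_P \le c_e \;\; \forall \text{ edges } e,
\qquad f_P \ge 0 \;\; \forall \text{ paths } P .
```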
Now, this is an instance of a packing linear program. It's a linear program, and in fact,
when it is given explicitly, there are many algorithms to solve it. But the model that we are
going to look at today is somewhat more restrictive, and let me begin by describing it.
So we assume that there is an agent associated with every path. And that agent decides
how much flow is going to get routed along that path. But these agents have limited
information about the instance. They don't know the entire network, they don't know the
entire -- they don't know how many other paths there are in the network, they don't know
what their individual flows are. All that they can see at any point is the edge congestion
on the edges on their own path. And just based on this information they are supposed to
update their flow values.
So our question is --
>>: But what is the goal of the individual agents?
>> Rohit Khandekar: Okay. So these individual agents are going to follow some
prescribed protocol that you give them, and we want to design a simple protocol such that,
when these agents act according to the protocol, they together converge to a global
optimum solution.
So there are no [inaudible] issues here. Right. So these agents are not greedy, they don't
have their own objective function.
>>: [inaudible] to follow the protocol.
>> Rohit Khandekar: Right. So that would be another -- another question. Yeah. But here
we are not looking at [inaudible] issues. We just want to -- these agents faithfully
follow whatever you tell them to.
The main question is --
>>: Can you tell me where we can find these agents?
>> Rohit Khandekar: We'll take that -- we'll discuss that offline.
So the question is can these agents achieve these global optimum solution just by looking
at this local feedback that they get from the system. And, moreover, we want the
protocol or the algorithm to be stateless and self-stabilizing. So we really want the
routing -- the decisions that these agents take -- to depend only on the current flow
situation. They should not store any history about the past execution.
And such a thing is useful, especially in a distributed system, where things are not robust.
If the edge capacities change over time or the instance changes because some new agents
come into the system, you don't want the algorithm to start from zero or start from a
well-defined state. You just want to start from the current situation, and these agents
should be able to update just from the current solution and converge to a near optimum
solution to the new instance.
Now, as I said, the flow control problem was really an instance of a packing linear
program, and we can in fact talk in more generality about packing linear
programs.
So here is a linear program where we want to maximize a linear function C dot X subject
to some packing constraints, AX less than or equal to B. And all the coefficients in A, C, and
B are nonnegative. And we work with the same model. In particular, we assume that
there is an agent J associated with every variable XJ, and that agent knows the column
corresponding to that variable, he knows the column J in the constraint matrix, and he
also knows its coefficient in the objective function, CJ.
And this agent is allowed to get some feedback from the constraints that he's present in.
So, for instance, in this picture this agent corresponding to this variable is present in these
four constraints. So at any point he's allowed to read off the so-called congestion values,
namely AIX over BI, for the constraints for which AIJ is nonzero.
And the question is again the same: Just based on this local feedback, can they update the
XJ values such that these agents together converge to a near optimum solution quickly?
So we'll be interested in a fast converging stateless algorithm, and that's the -- that's the
main topic of this talk. So in the rest of the talk hopefully I'll be able to give you enough
intuition and enough details about our algorithm.
So before I start describing the algorithm, let me make some simplification. This is just
to simplify the notation and make the presentation a little easier to follow.
So this was the original LP that we were working with. It's maximize C dot X subject to
some packing constraints. I'm going to assume that the vector C and B are vectors of all
1s. All of the entries are 1. And that can be done without loss of generality by scaling.
To make all Bs 1, you have to scale the rows; to make all Cs 1, you have to scale the
columns.
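As a sketch of that scaling (my notation, not the slide's): substituting x'_j = c_j x_j and dividing the ith constraint by b_i turns max c.x subject to Ax <= b into the all-ones form:

```latex
\max \; \mathbf{1}\cdot x'
\qquad \text{subject to} \qquad
\sum_j \frac{A_{ij}}{b_i\, c_j}\, x'_j \;\le\; 1 \;\;\forall i,
\qquad x' \ge 0 .
```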
And we also assume that all the nonzero entries in A lie in the range 1 to A max, where A
max is some real number. And we assume that these agents know the values of M, N,
and A max. M is the number of constraints; N is the number of variables. They need not
know the exact values, they may -- it's enough to know some upper bounds. These
values are used to set some parameters in the algorithm.
Okay. So our algorithm is based on the primal and dual complementary slackness conditions
and how they imply near optimality. So let me briefly mention that -- so this is the
primal LP that we're trying to solve. Let's consider its dual. So we associate a variable
YI with every ith constraint. And the objective in the dual is to minimize 1 dot Y such that A
transpose Y is at least 1.
Now, let's recall the primal and dual complementary slackness conditions. So the primal
complementary slackness conditions state that if the Jth primal variable is positive, then the
Jth dual constraint is tight. And similarly, the dual complementary slackness conditions state
that if the ith dual variable is positive, then the ith primal constraint is tight.
And it's very easy to prove that if X and Y are feasible solutions to the primal and dual
respectively, and they satisfy both the complementary slackness
conditions, then they in fact form optimum solutions to the corresponding linear
programs.
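Written out in the normalized form, the two programs and the two complementary slackness conditions being referred to are:

```latex
\text{Primal: } \max\; \mathbf{1}\cdot x \;\;\text{s.t.}\;\; Ax \le \mathbf{1},\; x \ge 0
\qquad\qquad
\text{Dual: } \min\; \mathbf{1}\cdot y \;\;\text{s.t.}\;\; A^{\top}y \ge \mathbf{1},\; y \ge 0
```

```latex
x_j > 0 \;\Rightarrow\; (A^{\top}y)_j = 1 \quad \text{(primal CS)}
\qquad\qquad
y_i > 0 \;\Rightarrow\; (Ax)_i = 1 \quad \text{(dual CS)}
```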
So our algorithm is going to try and make the solutions X and Y satisfy the complementary
slackness conditions. So that's the general goal. And eventually we'll be able to show
that they converge to near optimum solutions.
Okay. So with this introduction, let me describe the algorithm. The algorithm is really
very simple to -- it fits on a single slide and it's very simple to follow.
So we start with any solution X that satisfies the primal constraints. So start with any X
greater than or equal to 0 such that AX is less than or equal to 1. And now every agent is going to repeat the following execution.
So in every round we associate dual variables YI corresponding to the primal constraints,
and essentially YI captures how tightly the corresponding primal constraint is satisfied.
So if the ith constraint is very tightly satisfied -- namely, AIX is very close to 1 -- then the
value of YI is going to be large.
>>: [inaudible]
>> Rohit Khandekar: So let's understand this. So let's, on the other hand, assume that
AIX is very small as compared to 1. If the ith constraint is very well satisfied, then this
quantity is going to be negative. And I'll give you the values later, but mu is going to
be -- mu is a large constant. So if this quantity is negative, then YI is going to be very
close to zero.
On the other hand, if AIX is equal to 1, it's tight, then this is 0, so YI is going to be 1. So
the whole point here is that YI captures how tightly the corresponding constraint is
satisfied in the primal. The larger the YI, the tighter the constraint.
So you can also think of YI as the importance of the primal constraint. So if some constraint
is very tight, then it's very important for me to focus on it. So YI gives a high weight to
that constraint.
And then every agent J is going to do some simple update on its variable. So what agent J
does is as follows. So it computes the value of AJ transpose Y. Remember, the Jth dual
constraint is AJ transpose Y greater than or equal to 1. So the Jth agent is
essentially computing the value of the Jth dual constraint. And it's going to compare that
value to 1.
>>: [inaudible] the row or the column?
>> Rohit Khandekar: So the primal agents are indexed by J. And so we have a primal
variable XJ, and there is an agent J --
>>: So you have agents for each --
>> Rohit Khandekar: Each column --
>>: [inaudible]
>> Rohit Khandekar: No. We have agents just for the primal variables. There are no agents
for the duals. We only get some feedback from the system corresponding to the constraints.
>>: [inaudible] does the agents -- the ith agent know the ith row or the [inaudible]
column of the matrix?
>> Rohit Khandekar: So the ith agent knows the ith column of matrix A.
>>: Don't -- aren't you using both the column and the rows there?
>> Rohit Khandekar: No. So -- okay. So AJ transpose Y really indicates the Jth entry
in A transpose Y. So this is -- AJ transpose -- sorry. AJ is really the Jth
column of A.
>>: It's like a dual constraint corresponding to --
>> Rohit Khandekar: Right. So that's the dual constraint corresponding to the Jth primal
variable. So if you -- let's go back to the --
>>: AIX is also -- AIX is not the ith row times X?
>> Rohit Khandekar: AIX denotes -- yes. So there is -- yeah. So AIX denotes the ith -- AI
denotes the ith row --
>>: [inaudible] AIX [inaudible] know something about [inaudible].
>> Rohit Khandekar: Right. Right. So he knows the values of AIX over BI for the
constraints that he's present in. And he knows the -- he knows the Jth column.
All right. So what does agent J do? He computes the value of the Jth constraint, the Jth dual
constraint; namely, AJ transpose Y. So this is essentially the dot product of the Jth column
of A with Y. And he's going to compare that with 1. If this value is much smaller than 1,
then he's going to increase his variable by a small multiplicative factor.
On the other hand, if this value is much larger than 1, then he's going to decrease his
value by a small multiplicative factor. And we need this somewhat -- we need to have
some additive change here, because if XJ is zero to begin with, then just a multiplicative
change will not be enough to change XJ. So just to bootstrap the multiplicative change,
we have this additive small delta.
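As a concrete illustration, here is a minimal centralized simulation of this update rule in Python. In the actual setting each agent j only reads the YI of its own constraints; the parameter names mu, alpha, beta, delta are placeholders (the paper sets them from m, n, A_max and epsilon), and the exact decrease rule is my best guess from the description above, not necessarily the paper's verbatim rule.

```python
import numpy as np

def stateless_packing_lp(A, x, mu, alpha, beta, delta, rounds):
    """Sketch of the stateless update for  max 1.x  s.t.  A x <= 1, x >= 0,
    where every nonzero A[i, j] lies in [1, A_max].

    x is any starting point with A x <= 1.  In each round, agent j only needs the
    y-values of the constraints it appears in (computed centrally here for brevity).
    """
    for _ in range(rounds):
        # Penalty / dual variables: y_i = exp(mu * ((A x)_i - 1)) grows sharply
        # as constraint i approaches tightness.
        y = np.exp(mu * (A @ x - 1.0))
        # Value of the j-th dual constraint, (A^T y)_j, as seen by agent j.
        dual_val = A.T @ y
        grow = dual_val < 1.0 - alpha     # dual constraint loose: push harder
        shrink = dual_val > 1.0 + alpha   # dual constraint violated: back off
        x = np.where(grow, x * (1.0 + beta) + delta,
                     np.where(shrink, x * (1.0 - beta), x))
    return x
```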
So what's happening here? Let's try to understand. In the flow maximization problem,
we have agents associated with paths. And YIs are associated with the edges. So AIX
denotes the total flow on edge I. And 1 corresponds to the capacity of edge E -- edge I.
So if the flow is very close to capacity, then the dual variable associated with an edge is
going to be very large. So think of YI as the length of edge I. All right. So if the flow
is very close to the capacity, then the length of edge I is going to be large. Right?
Now, what does the agent corresponding to path J do? He computes the length of path J
under this length metric, and if this length is small, then he thinks that there is enough
space for pushing more flow. So in that case he increases his flow by a multiplicative factor.
On the other hand, if the length of his own path is much larger than the target length of 1,
then there is perhaps too much congestion on that path, so he reduces his own flow.
So just -- it's easy to verify that an agent J just needs to know the values of YI for the
constraints that he's present in, for the constraints for which AIJ is nonzero. Because he
just needs to know the value of this product, AJ transpose Y.
>>: Do you maintain that physical flow during the process [inaudible]?
>> Rohit Khandekar: Yes. We will prove in a minute that the flow always remains
feasible. So under these updates, for appropriate values of these constants mu, [inaudible],
delta, the solution X, the primal solution X, always remains feasible. You always satisfy
AX less than or equal to 1.
So that's the entire algorithm. So start from any feasible point and just make these
updates. And we will show some nice properties about this algorithm. So any questions
regarding the algorithm?
Okay. So what do we show? What is the main result of our paper? We show that,
starting from any feasible solution X, the algorithm always
maintains feasibility, primal feasibility, AX less than or equal to 1, and converges to a 1 plus
epsilon approximation in a number of rounds that is polylogarithmic in the size of the LP
and polynomial in 1 over epsilon. Yes.
>>: The math looks very much like a summarization of Sutherland's paper on a futures
market in computer time. Is it?
>> Rohit Khandekar: I'm not aware of that [inaudible].
>>: It's a classic. He divided up time on a PDP-1 computer, and guess how long
ago [inaudible].
>> Rohit Khandekar: Um-hmm.
>>: By playing with prizes per hour.
>> Rohit Khandekar: I see.
>>: Look it up on the Web.
>> Rohit Khandekar: Sure. Thank you.
All right. So starting from any feasible solution --
>>: Didn't prove anything about that. He didn't prove anything, but in one university
where I put this article on the boss's desk, it was implemented on Monday and by
Wednesday it was as stable as a rock. Of course, that's anecdotal evidence. It didn't
prove anything. Sorry.
>> Rohit Khandekar: So we show that this algorithm converges to a near optimum
solution in polylog number of [inaudible]. And as you can see, this algorithm just
depends on this local feedback that every agent needs to know from the system, and that's
why it can be distributed -- it can be implemented in a distributed manner, if there is such
a feedback available.
And, moreover, this is self-stabilizing, because we start from any feasible solution. So as
soon as the instance changes, there's a very easy rule to make the current solution feasible
for the new instance by -- in a single step. And from then on you can just run with this
algorithm and converge to a near optimum solution quickly.
>>: So if you're nonfeasible, you cannot -- you have to do a [inaudible]?
>> Rohit Khandekar: Yes. It's actually very easy to do a preprocessing step. So let's say if
you know -- if any variable is present in an infeasible constraint, then it can just set its
value to zero in one step.
>>: Or zero is feasible.
>> Rohit Khandekar: Right.
>>: No, but then [inaudible].
>> Rohit Khandekar: Right. You don't want to start from [inaudible] zero. So you
should set your value to zero only if you're part of an infeasible constraint. So that's why -- I
mean, if -- the whole point here is that if the instance changes in some part of the system,
you don't want everyone to start from zero, you want the system to kind of adjust
itself, and the effects should travel to the rest of the system slowly.
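A hedged sketch of that one-step repair rule (centralized only for readability; each agent can check its own constraints locally, and the function name is hypothetical):

```python
import numpy as np

def repair_after_change(A, x):
    """If the instance changed and A x <= 1 no longer holds, only the agents that
    appear in a violated constraint reset their variable to zero; the rest keep
    their current values, and the algorithm resumes from there."""
    violated = (A @ x) > 1.0                      # now-infeasible constraints
    in_violated = (A[violated] > 0).any(axis=0)   # agents present in any of them
    return np.where(in_violated, 0.0, x)
```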
>>: [inaudible] how you get delta?
>> Rohit Khandekar: Delta is set to -- delta is sufficiently small that increasing variables
by [inaudible] delta should not affect the system too much.
And to set these parameters, the agents need to know the values of M, N, and A max.
That's the only place where they use the information about size of the network.
So I like to think of this algorithm as a [inaudible] interior point method, because starting
from any point inside the feasible region, the algorithm takes an internal path and
converges to a near optimum solution in polylog number of [inaudible].
Okay. So let me say a few words about previous work on [inaudible] algorithms for
solving linear programs. So there has been a lot of work on designing [inaudible]
algorithms for packing and covering linear programs. And most of those algorithms are
not stateless. So, for instance, in Plotkin, Shmoys, Tardos [inaudible], the algorithm
starts all the variables from zero and then increments some -- it maintains some global
information and increments some selected set of variables.
The only work that I know of which is truly stateless is the work of Garg and Young
[FOCS 2002], where they designed an algorithm for a multi-commodity flow
problem with fixed paths, so essentially the flow control problem that I mentioned. And
they thought of flow as injecting -- they thought of flow as a stream of packets injected at
the source. And the only feedback that they could receive was the total packet loss rate
end to end.
So the packets traveled through the network, and if some edge is more congested than it
should be, then a random fraction of the packets were dropped. And the agents can read
off the values of the total end-to-end drop rate. And just based on that information, they
updated the injection rate, the packet injection rate.
So they showed convergence to 1 plus epsilon approximate solutions in time proportional
to C max, where C max is the maximum objective coefficient. And it was not clear how
to generalize this algorithm for arbitrary positive linear programs. So we can think of our
algorithm as somewhat extending this work to arbitrary linear programs, and also
improving the convergence rate.
>>: Feedback [inaudible].
>> Rohit Khandekar: Feedback --
>>: [inaudible]
>> Rohit Khandekar: Feedback is somewhat related, because there -- so the end-to-end
drop rate is actually related to the total length of your path under the associated exponential
length function. So there is some connection, although it's not -- it's not an explicit --
>>: As you stated, you need to get feedback from each edge [inaudible].
>> Rohit Khandekar: As I -- no. Actually a path just needs to know the total length of
the path. We don't need separate lengths from the edges. Because we compare the total
length of the path with 1, and if the path length is small, we increase the flow. If path
length is larger than 1, we decrease the flow. So we just need to know the aggregate path
length.
All right. So on this slide I've briefly mentioned the key differences from previous work.
So most of the previous algorithms, as I mentioned, start from zero. And they only
increase the variables. And they maintain some global threshold information and
increase the variables that are better than the threshold.
On the other hand, we start from arbitrary feasible solutions. And since we start from
arbitrary solutions, we need to have an ability to both increase and decrease the variables
if we want to achieve near optimality. And we are stateless. We don't maintain any state
information and we don't need any initialization. But they have a better convergence
dependence on epsilon. I think it's 1 over epsilon squared, while our convergence is only -- we
could show only a 1 over epsilon to the 5 convergence.
Okay. So let me go on and give you some intuition about the analysis. So let me quickly
argue that the algorithm always maintains feasibility, primal feasibility, throughout the
execution. So to prove this, we show that all the YIs are at most 1. And if all the YIs are
at most 1, then it follows immediately that AIX is less than or equal to 1. Now suppose not.
Suppose some YI increases to more than 1. Now, for that to happen, some XJ with AIJ
greater than zero must increase to make YI larger than 1. But for XJ to increase, the value
of AJ transpose Y must be less than 1 minus alpha from the definition of the algorithm.
Only then can XJ increase.
But all the nonzero entries in A are at least 1. And in particular AIJ is at least 1.
Therefore, YI must be less than 1 minus alpha before this increase. Right? So YI is
sufficiently smaller than 1.
And now comes the trick. So we set the parameters [inaudible] namely, the step length
parameter, the multiplicative increase by which we update the variables [inaudible] small
enough so that any YI changes by a very small multiplicative factor. More precisely, we
prove that YI changes by at most a factor of alpha over 4 in any single round. So if YI is
less than 1 minus alpha to begin with --
>>: [inaudible]
>> Rohit Khandekar: No. Multiplicative. 1 plus alpha over 4. Right. It changes by at
most a multiplicative factor of 1 plus alpha over 4. So if YI is less than 1 minus alpha, then in a
single round it cannot become more than 1.
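Putting those two facts together (with alpha the threshold in the update rule, and using that every nonzero AIJ is at least 1, so YI is at most AJ transpose Y before the increase):

```latex
y_i \;\le\; (A^{\top}y)_j \;<\; 1-\alpha
\quad\Longrightarrow\quad
y_i \;\le\; (1-\alpha)\Bigl(1+\tfrac{\alpha}{4}\Bigr) \;<\; 1
\quad\text{after the round.}
```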
So, in a sense, the point here is that we have to set the step length parameter small enough
if we want to maintain feasibility. But we should not set it too small, because we want
fast convergence. So that will come next.
Now, another quick point that we should verify before proving that the algorithm
converges to near optimality is that the fixed points of this algorithm are [inaudible] near
optimum. Because if the theorem is true, the fixed points must be near optimum.
So let's quickly show that. So let's take a pair of solutions X and Y. X is the solution and
Y is the corresponding setting of the dual variables. And assume that (X, Y) is
a fixed point of our algorithm. We'll show that X and Y both correspond to near
optimum solutions to the primal and dual respectively.
So first let's show approximate feasibility. Since X is a fixed point, no XJ is changing;
therefore the value of AJ transpose Y for every J is between 1 minus alpha and 1 plus
alpha. So all the dual constraints are not only approximately satisfied, they are very tight.
All the dual constraints are tightly satisfied. And in fact in the previous slide, we already
showed that X is always feasible. So primal feasibility is already established. So we
have that both X and Y are approximately feasible.
Now we argue that the complementary slackness -- both the complementary slackness
conditions are satisfied in an approximate sense. More precisely, if XI -- if XJ is large, then
the Jth dual constraint is approximately tight. Well, we just argued that all the dual
constraints are approximately tight, so the primal complementary slackness conditions are
trivially satisfied.
On the other hand, if YI is large, then from the definition of YI, the ith primal constraint
is approximately tight. Because, recall, YI was a very fast growing function of
how tightly the corresponding constraint was satisfied.
>>: Is this true for all solutions [inaudible]?
>> Rohit Khandekar: Yes. Yes. This point is actually true for all X and Y. And if you
put these two things together, we can argue that X and Y actually form near optimum
solutions. So the fixed points are actually near optimum solutions to both primal and
dual.
But since our goal is to not just analyze fixed points but to show fast convergence, we
need some measure of some potential that we can track and prove that we are making
significant progress toward optimality in every step. So, to this end, we --
>>: The only place where you use that this is a fixed point is the -- is feasibility of the dual, or [inaudible]
where is the property that it is a fixed point [inaudible] as being the solution?
>> Rohit Khandekar: I guess we only use it here. That's right. Approximate dual
feasibility. That's right.
So to show fast convergence, we have to have some [inaudible] property of the algorithm.
Because the variables are going up and down, it's very hard to track what's happening
in the system. So that's why we looked at this particular potential function, which is
essentially the primal objective function, the summation of the XJs, minus a penalty for violating
the primal constraints. So recall YI was -- think of YI as the penalty for violating the ith
constraint. So you get profit by routing more flow, but you get penalized for violating the
capacity constraints.
And, in fact, to give some intuition about this potential, I tried to plot -- so let's consider a
very simple linear program, which has two variables, X1 and X2, and the linear program
is maximize X1 plus X2, subject to the constraint that X1 is less than equal to 1, and X2
is less than equal to 1. So what's happening here is that I have plotted -- so this is X1,
this is X2, and along the Z axis I have plotted the potential function.
So the potential function increases with X1 and X2. But as we come close to the
boundary of the [inaudible] the penalty shoots up and the potential function dips to minus
infinity. So as you get closer and closer to the boundary, the values of YI increase so
rapidly that you -- that it starts penalizing. So the maximum point of the potential
function, which is here, is actually very close to the optimum solution, which is at this
vertex. Because as long as you are far from the boundary, you are doing the same thing
as the objective function does. But when you go close to the boundary, your potential
function dips.
So what we show in our analysis is that our algorithm is actually, in hindsight, working
with this potential function, and this potential function only [inaudible] increases during
the algorithm.
So analytically this potential function has some nice properties. And this is what the next
equation shows. So first of all, note that this potential function is a differentiable
function of the variables XJ. It's not only continuous, but since the YI are exponential functions
of AIX minus 1, it's actually very smooth and you can differentiate it.
So if you look at the partial [inaudible] of the potential function with respect to the Jth
variable, and if you plug in this expression and simplify, then it turns out to be equal to 1
minus AJ transpose Y. So this is the gap -- this captures how tightly the Jth dual constraint is
satisfied.
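Concretely, with YI the exponential penalty defined earlier, the potential being described should be of the following form (a sketch; the exact 1/mu scaling of the penalty term is my reading of "YI are exponential functions of AIX minus 1", chosen so the stated derivative comes out):

```latex
\Phi(x) \;=\; \sum_j x_j \;-\; \frac{1}{\mu}\sum_i e^{\mu\left((Ax)_i - 1\right)},
\qquad
\frac{\partial \Phi}{\partial x_j} \;=\; 1 \;-\; \sum_i A_{ij}\, e^{\mu\left((Ax)_i - 1\right)}
\;=\; 1 - (A^{\top}y)_j .
```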
And if you recall, the decision of whether to increase XJ or decrease XJ was actually
based on this quantity. So if you approximate the change in the potential by the first
order terms -- namely, the change in XJ values times the derivatives -- then you can
show that this [inaudible] is in fact always positive, nonnegative. Because you increase
XJ only if this is positive. And you decrease XJ only if this is negative. So this is the
basic reason why the potential only increases.
And, moreover, we can say something stronger. Recall that we increase XJ only if
this is significantly larger than zero. This is at least alpha. And we decrease XJ if this is
at most minus alpha. So overall the increase in the potential is at least alpha times the
absolute change in the X values. So if X values are changing rapidly, no matter whether
they're going up or down, they're changing rapidly, you're making progress with respect
to the potential. And that's why you can think of this algorithm as doing distributed
gradient ascent on this potential.
Okay. So how do we show fast convergence? So we just notice that a significant change
in X values leads to a significant increase in the potential. So, similarly, you can show
that a significant change in Y values leads to significant increase in the potential.
Because, intuitively speaking, Y values are [inaudible] by X values. So if Y values are
changing rapidly, then X values are also changing. And that's why the potential function
increases.
But the potential function is bounded. It cannot always increase. I mean, it cannot
increase without bound. So such a thing cannot happen forever. So it cannot happen that
X and Y values are changing a lot for a long time. And then we show this important
[inaudible] that says that if X and Y values do not change significantly in an interval of
appropriate length, then we have near optimality.
>>: [inaudible]
>> Rohit Khandekar: Epsilon is the error parameter that -- so we have [inaudible] 1 plus
epsilon approximation. So more precisely if you take any interval of logarithmic number
of rounds, and if you notice that throughout the interval the X and Y values did not
change by too much, then we show that X and Y -- X, solution X throughout the interval
is actually a near optimum solution to the primal.
So let's call this a stationary interval, an interval of logarithmic length where X and Y
values do not change. And on this slide I'm -- I've tried to show you an intuition of the
proof why a stationary interval implies near optimality. So consider a stationary interval and
consider any X in that interval. And if X is not near optimal, then we show that there
exist a variable XJ such that AJ transpose Y is consistently less than 1 minus alpha
throughout the interval.
And if such a thing holds, then from the definition of the algorithm, XJ will be increased
multiplicatively. But since length of the interval is logarithmic, XJ will become more
than 1 soon, which would contradict the feasibility that we have already shown.
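Quantitatively (with beta the multiplicative step and delta the additive bootstrap, both names assumed): after the first such round XJ is at least delta, and it keeps being multiplied by 1 + beta, so over an interval of T rounds

```latex
x_j \;\ge\; \delta\,(1+\beta)^{\,T-1} \;>\; 1
\qquad\text{once}\qquad
T \;\gtrsim\; \frac{\ln(1/\delta)}{\ln(1+\beta)} \;=\; O\!\left(\frac{\log(1/\delta)}{\beta}\right),
```

which already violates AX less than or equal to 1, since every nonzero AIJ is at least 1.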
So the key point here is that, assuming that we are not near optimal and assuming that the
X and Y values are not changing rapidly, there exists an XJ for which this property
holds. And the proof of that fact is also easy. It's essentially four lines. But the
intuitive reason is that if we're very far away from optimality, then there must be some
variable XJ that the optimum solution is exploiting and that you are missing out on. And that is the
variable that -- that is the variable that you should focus on. So that's the intuition.
But, more precisely, let me just walk you through this chain of equations. So let's say X
star is the optimum solution. From our assumption that X is not near optimum, we know
that 1 dot X star is much larger than 1 dot X. These are the objective values.
Now, since X values are stationary, all the AJ transpose Ys are approximately equal to 1.
So this is roughly equal to this. But since Y is a fast growing function, most of the mass
of Y is concentrated on those constraints for which AIX is roughly equal to 1. So that's
why this is approximately equal to this.
And since A X star is at most 1 -- X star is the optimum solution, so it satisfies the
constraints -- this is at least Y transpose A X star.
So in a nutshell what we get is that 1 dot X star is much larger than Y transpose A X star.
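Reconstructing the chain on the slide (x* is the optimum, and the approximations hide the 1 +/- alpha and concentration errors the speaker mentions):

```latex
\mathbf{1}\cdot x
\;\approx\; \sum_j x_j\,(A^{\top}y)_j
\;=\; y^{\top}(Ax)
\;\approx\; \mathbf{1}\cdot y
\;\ge\; y^{\top}(Ax^{*})
\;=\; \sum_j x^{*}_j\,(A^{\top}y)_j .
```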
Now, think of this quantity as an average of values AJ transpose Y under the weight X
star J. So this divided by this is the average value of AJ transpose Y, which is much
larger -- much less than 1. So in particular there will be a variable XJ for which AJ
transpose Y is much less than 1. So of course this is just an outline. The details are given
in the paper.
So let me summarize. So we just saw a very simple stateless and gradient descent based
algorithm for positive -- for packing linear programs. And a very similar algorithm also
works for covering linear programs, where the objective is to minimize C dot X subject to
AX greater than or equal to B. And for mixed packing and covering, the positive linear programs
where we have both packing and covering constraints.
And some convex programs where the objective is -- let's say we want to maximize some
concave function of the Xs subject to some packing constraints. Like in flow control, instead
of maximizing the total flow, if you want to maximize a fair objective function, like the
log of the flows, then you can use similar ideas.
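For reference, the covering variant and the fair (log-utility) flow objective just mentioned are of the following form (again my notation):

```latex
\min\; c\cdot x \;\;\text{s.t.}\;\; Ax \ge b,\; x \ge 0,
\qquad\qquad
\max\; \sum_P \log f_P \;\;\text{s.t.}\;\; \sum_{P\ni e} f_P \le c_e,\; f \ge 0 .
```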
The main open question that I would like to know the answer to is what happens if the
linear programs are not given explicitly. Let's say in multi-commodity flows, if the paths
are not fixed, then it's an implicitly given LP, because there is one variable for every path.
So can you implement this algorithm in a compact way so that the running time remains
polynomial? And of course, look for other applications of this simple update rule -- let's say
from market equilibrium questions: can we compute market equilibria using such
algorithms?
I'll stop here. Thank you for your attention.
[applause]
>>: So you talked about the [inaudible] convex program [inaudible] the objective
functions enough for additive, objective function is multiplicative.
>> Rohit Khandekar: So what we use crucially in this analysis is the positiveness of the
packing linear programs. So, for instance, I don't know how to generalize this to
arbitrary linear programs even.
>>: No, but the constraint [inaudible] packing constraints.
>> Rohit Khandekar: Are packing constraints.
>>: Yeah. The only thing different is the objective function. Instead of the objective
function being summation AI XI, it is summation AI log XI, or you can interpret it as a
product of the XI [inaudible].
>> Rohit Khandekar: Right. So if the objective function is -- so we are trying to
maximize a concave objective function.
>>: [inaudible]
>> Rohit Khandekar: Subject to packing constraints. Yes. So I think for that it should
work too. Although, we have to be careful regarding what kind of approximate solutions
we are computing, because we may be computing a weak approximate [inaudible].
>>: Let's say you're approximating the convex program.
>> Rohit Khandekar: Right. I think some algorithm like this should work. Because all
that we're really using is convexity of some potential function. And basically doing
gradient descent on that.
>>: [inaudible] taking a constraint optimization and converting to [inaudible].
>> Rohit Khandekar: Right. Right. Adding a penalty function for violating constraints.
>>: But the other part is somehow you're -- by doing this you're also getting this
[inaudible].
>> Rohit Khandekar: Right. So I really think that that is good for extending this work to
the more general mathematical programs.
>>: Could you go back one slide.
>> Rohit Khandekar: This?
>>: No, no. Yeah. The open [inaudible]. So what does it mean [inaudible]?
>> Rohit Khandekar: So here -- so our algorithm maintained a separate agent for every
variable. And we are -- and an agent is updating his own variable value. So here you
would like to associate an agent with a commodity, a source-sink pair, and not with every
path.
So if you -- if you try to -- if you kind of try to implicitly update all the flow variables by
not doing this exponential amount of work, but just try to do [inaudible] amount of work,
then at least the initial calculations run into some problems.
So another question is can we modify the update rule to make it more smooth and then
give an implicit way of updating all the flows along the exponential paths implicitly.
>>: [inaudible] your update rule different on the length of the [inaudible] so you can
[inaudible] --
>>: Or could the exponential number of paths --
>> Rohit Khandekar: But there are an exponential number of paths, so -- and you don't want to
maintain flow values for every path.
So we have been able to show some results in this direction. More precisely, if the path
lengths are bounded -- if let's say I allow you to send flows along paths of length at most
H, then we can do -- we can get convergence which is polynomial in H. Instead of
polylog convergence, we can show convergence in something like H squared number of
[inaudible].
So there you don't have to maintain -- you just maintain the flow, not the flow path
decomposition. And then you update the flow along short paths in your flow. But then --
but I don't know how to show this for general -- and, more importantly, to get
convergence polylogarithmic in N.
>>: [inaudible] since you allow X to start from [inaudible] then we should be able to
move X quickly to the new optimum. Did you look at that?
>> Rohit Khandekar: That's a good question. Yeah. I've been asked that question
before. I don't know how to show fast -- I mean, convergence which is better than this
for starting from a near optimum solution. All right. That's a nice -- interesting question.
I don't know how to show that here.
>> Kamal Jain: Let's thank the speaker.
>> Rohit Khandekar: Thank you.
[applause]