>>: Okay. Good morning, everyone. I've been a fan of electrical
flows for a long time in their probabilistic context as have others
here. And it's delightful to see they're flowering in the algorithmic
realm. So today we're happy to welcome Aleksander Madry from MIT.
>> Aleksander Madry: So hello. Thank you for the introduction, for
the invitation. So what I plan on doing today is describing a new tool
for graph algorithms that stems from exploiting the connection between the
notion of electrical flows and a certain type of linear systems called
Laplacian systems.
So the way I intend to describe this tool is by describing to you how
it applies to a particular problem, namely the maximum flow problem,
and then by talking briefly about the broader context.
But before I do all that, let me talk a little bit about the broader
research motivation that motivated this work in particular and the
research I intend to do in the future.
So, well, this motivation stems from the fact that given the recent
explosion in various types of services and applications, we are faced
with the reality that we live in a world of huge graphs.
It is not that these graphs are just out there. We really want to get
our hands on them. We want to analyze them and understand them.
And, in particular, the tasks of interest that we might think about
pursuing are, for instance, graph partitioning, because we would like to
perform clustering or community detection; or connectivity analysis,
because we'd like to do congestion estimation in these networks or
analyze resilience to link failures; or network design, where we would
like to find a way of supporting on this graph some information
infrastructure that is reliable and efficient. So these are some example
tasks. And, of course, they're not really new to us; we have studied
these kinds of problems for quite a long time already. But what has
changed is that, given the size of these graphs, running time now
becomes the hard constraint. So now our algorithms, whatever they are
doing, need to be extra efficient.
And in particular, in this regime, having just a polynomial-time
algorithm is not really enough; we need something much, much faster.
So we see that the nature of the challenges we are facing has evolved;
but, fortunately, so did our tools.
And, in particular, in recent years, we have seen quite remarkable
development in various areas of algorithmic graph theory and in
scientific computing and optimization methods, and this development
equipped us with new and very exciting tools that we didn't really have
before.
So what my research is mostly about is trying to look at these new
tools that we are now equipped with and build a diverse algorithmic
toolkit that will help us in addressing the challenges that I've just
mentioned.
In particular, by employing this algorithmic toolkit, I managed to
advance the state of the art on various basic graph problems, ranging
from various flow problems, through partitioning problems, to, well,
the TSP and the generation of random spanning trees.
So what I intend to do today is just describe one of these results,
namely, how some of the tools I just mentioned apply to the maximum
flow problem.
Okay. So let's start by defining formally what the maximum flow
problem is. In this problem we are given a directed graph G, we have a
special vertex s, which is the source, and a special vertex t, which
is the sink. Additionally, for each arc we are given an integer
capacity.
Now the task that we are shooting for is finding a feasible s-t flow of
maximum value. So what does it mean? It means that we want to have a
flow, which we can view as an assignment of numbers to the arcs,
and we want two conditions to be satisfied. The first condition is that
if we look at any vertex other than s and t, we would like the total flow
into this vertex to be equal to the total flow out of it. So we'd
like to have flow conservation constraints.
The second constraint is that if you look at an arc, then the flow on
this arc does not exceed its capacity. Now, given these constraints,
our task is to maximize the value of the flow, which is just the net
flow into t -- which, by the flow conservation constraints, is equal to
the net flow out of s.
So in this example we have a flow of value 7, but the actual max
flow value of this graph is 10; and here is an example of a flow that
achieves this value.
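(To make the two constraints concrete, here is a minimal sketch -- not part
of the talk; the graph encoding and function names are my own illustration.)

```python
from collections import defaultdict

def is_feasible_st_flow(arcs, flow, s, t):
    """Check the two constraints above.

    arcs: dict mapping (u, v) -> integer capacity
    flow: dict mapping (u, v) -> flow assigned to that arc
    """
    net = defaultdict(float)  # net outflow of each vertex
    for (u, v), cap in arcs.items():
        f = flow.get((u, v), 0.0)
        if f < 0 or f > cap:              # capacity constraint
            return False
        net[u] += f                       # flow out of u
        net[v] -= f                       # flow into v
    # conservation: zero net outflow everywhere except s and t
    return all(abs(x) < 1e-9 for v, x in net.items() if v not in (s, t))

def flow_value(flow, t):
    """Value of the flow: the net flow into the sink t."""
    return (sum(f for (_, v), f in flow.items() if v == t)
            - sum(f for (u, _), f in flow.items() if u == t))
```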
Okay. So this is the problem. And now a natural question that we should
ask is: why do we care about this problem? In this case the answer is
pretty easy. Well, one can just say that it is one of the fundamental
optimization problems, and there are many reasons for that.
So the study of this problem dates back to the 1930s, and it's one of
the problems that is extremely broadly applied in practice. There are
many reasons for this. One of them is its direct applicability to
transportation problems; but also, through various reductions, it turns
out that maximum flow computation corresponds to tasks like scheduling,
graph partitioning, or image segmentation.
So it's really broadly applied in practice, but that's not the only
reason why we care about it. Namely, this is one of those problems
that, when we were trying to understand it, shaped our understanding
of combinatorial algorithms in general.
So many tools that were developed just to understand this problem
turned out to be useful in a broader context. And in particular it seems
that somehow this problem captures some part of algorithmic graph
theory that we'd like to understand. Namely, once again, techniques
aimed at making progress on this problem turned out to be broadly
useful in other contexts as well.
So this is why we care about this problem. And now let's talk a
little bit about what is known about it.
So well as you might imagine, given the long history of the problem,
there is a lot of previous work. So let me just -- well, in the
interests of time, let me just not talk about this previous work at
all. I hope you will forgive me.
All I will mention is the current state-of-the-art algorithm, which is
the seminal algorithm of Goldberg and Rao, whose running time can be
expressed by this formula. But as we care about the big picture in this
talk, let me introduce notation that suppresses all the logarithmic
factors.
So if I write O-tilde, that means that there are some logarithmic factors
lingering, but usually they will not be too big. Here we have the
running time of Goldberg-Rao, and one of these terms is bigger depending
on the sparsity of the graph -- depending on the ratio of the number of
edges to the number of vertices.
Again, let me make one more simplification: throughout this talk, let's
assume that we are dealing with a sparse graph, so a graph whose
number of edges is not that much bigger than its number of vertices.
Once again, there are a couple of reasons why we might do that. The
first reason is that huge graphs tend to be sparse. And the second
reason is that we already have some limited ways of dealing with
density: there are techniques that allow us to slightly reduce the
density of the graph when it is dense. But still, the regime in which
we really don't know too much is when the graph is sparse to begin
with; understanding algorithms in this regime seems to be the benchmark
of our understanding of the problem.
Okay. So from now on let's focus on sparse graphs. Now, if we focus on
sparse graphs, the running time of Goldberg-Rao is n times n to the
one-half, so n to the three-halves. Probably one of the most essential
questions in algorithmic graph theory is whether we can improve this
running time.
Unfortunately, as much as we would like to know how, well, we don't.
And in some sense it's even more embarrassing than that. Namely, even
if you look at the baby case of this question, where we restrict
ourselves to graphs with all capacities being 1, this running time of
n to the three-halves has been known for 35 years and has not been
improved yet.
So we are really stuck here. Because of that, what people did over time
was formulate a slightly -- well, hopefully -- simpler challenge that
can be viewed as a first step toward understanding the complexity of
max flow. Namely, what people set out to do is obtain an algorithm that
computes a (1 minus epsilon)-approximate maximum flow in undirected
graphs in better than n-to-the-three-halves running time.
Just a remark here: if we look at this challenge, it seems that we are
giving up two things. First of all, we settle for approximation, and
then we settle for undirected graphs only. But actually, max flow has
an interesting feature: if we are able to solve max flow exactly on
undirected graphs, then we are able to solve max flow on directed
graphs as well.
So just giving up the directedness of the graph is not enough; it
doesn't really make the problem simpler. That's why we also need to
introduce the approximation.
So this is the slightly simpler -- or maybe much simpler -- challenge.
But still, even for this challenge, the best algorithm, even if all the
capacities are one, is just to take Goldberg-Rao and run it. So we
don't know how to improve this running time.
What I want to talk about today is the first improvement towards this
challenge, namely the result due to Paul Christiano, Jon Kelner, Dan
Spielman, Shang-Hua Teng, and me, which computes a (1 minus
epsilon)-approximate max flow in undirected graphs in time which is
roughly m times n to the one-third. So for a sparse graph this is n to
the four-thirds. This is the result that I want to focus on for the
rest of the talk.
So let's start with a discussion of the general approach we are taking.
So as you probably recall, the Goldberg-Rao algorithm is based on the
augmenting paths framework. A one-minute description of this framework:
it is based on iteratively finding s-t paths in residual graphs.
First off, if all the capacities are 1, then what happens is: we start
with our original graph, we find some s-t path, and then we just flip
all the edges on this s-t path; this way we get a residual graph
corresponding to augmenting the flow along this path.
Now that we have this residual graph, we proceed to looking for an s-t
path in this new graph. We repeat this procedure as long as we can,
until we cannot find any more s-t paths; and there is an easy procedure
that allows us to extract the final flow out of the last residual graph
that we compute.
So this is the way this framework works. And as you can see, it's
purely combinatorial. So the flow is built path by path and it's
greedy.
>>: Just for clarification, what's the difference between this and
blocking flows?
>> Aleksander Madry: No, this is -- like, this framework was developed
by Ford and Fulkerson.
>>: You compute the blocking flow, which --
>> Aleksander Madry: Yes, but, like --
>>: It's really path by path.
>> Aleksander Madry: It is path by path, but done efficiently. The way
you can view blocking flow is as a computation of a greedy sequence of
s-t paths, done in such a way that you continue until you bottleneck
the graph.
The only difference -- and that is the beauty of blocking flows -- is
that you can do it faster, by looking at the whole graph at once each
time instead of finding one path at a time. But in the end you can
conceptually think about it as finding a sequence of s-t paths; you
just do it faster than finding them one by one. Ford-Fulkerson is
essentially doing this -- they didn't know that this was the right way
to look at it, but Ford-Fulkerson is just doing it in a less efficient
manner; blocking flows do the same thing but do it faster, because they
allow finding many paths simultaneously. But conceptually it's still
the same framework and still the same algorithm.
Okay. So this is the framework. Due to its simplicity and elegance, it
was the basis for a lot of beautiful algorithms; in particular,
Goldberg-Rao is based on it. So the tempting approach would be just to
look at Goldberg-Rao, assume that we are okay with approximation and
that we're dealing only with undirected graphs, and try to improve
something.
But, unfortunately, that's exactly the barrier that we have been stuck
at for some time already. And it's really unclear how to get a speed-up
on this route.
So in our result we try to attack the problem from a different angle.
The one-sentence explanation of what this angle is: we try to probe the
global flow structure of the graph by solving linear systems.
And now the obvious question is how you can relate something as
combinatorial as flow structure to something as linear-algebraic as
linear systems. As you might guess from the title of the talk, the
answer is electrical flows.
So let me talk a little bit about what electrical flows are. Probably
the easiest definition of electrical flows is the physics-101
definition. We have an undirected graph G, we have a source and a sink,
and we also have resistances assigned to each of the edges.
And now the simple recipe for getting the electrical flow corresponding
to these resistances is as follows. We treat each edge e as a resistor
of resistance r_e. Then we treat the whole graph as an electrical
circuit and just connect a battery to s and t, so there is some current
induced that flows from s to t. And this current is exactly the
electrical flow we are looking for.
So, as simple as that. Unfortunately, even though it's a very intuitive
definition, it's not really the easiest definition to work with when
you want to prove anything about this electrical flow.
So the definition that we will actually use in this talk will be
slightly different, but equivalent. Namely: if we have a graph with
resistances and a source and a sink, then the electrical flow of value
F is the unique s-t flow of value F in this graph that minimizes the
energy over all possible s-t flows of value F.
Energy here is just -- well, the heat dissipation of this current on
the edges; it is just the sum over edges of r_e times the flow on the
edge, squared.
So this is the definition that we are actually using, and it's
equivalent to the one that you get by using Kirchhoff's law and
Ohm's law. So this is the electrical flow; this is the definition.
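(In symbols -- my notation, not a slide from the talk -- the definition
reads:)

```latex
f^{\mathrm{elec}} \;=\; \operatorname*{arg\,min}_{f \,:\, s\text{-}t \text{ flow of value } F} \; \mathcal{E}_r(f),
\qquad
\mathcal{E}_r(f) \;=\; \sum_{e} r_e\, f_e^{2}.
```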
Now, the question is why we care about electrical flows from the point
of view of algorithms.
Okay. So the natural question that comes to mind now is how you
actually compute an electrical flow. And the remarkable thing is that
all you really need to do to get the electrical flow corresponding to a
graph is to solve a linear system.
It's even better than that, because it's not just an arbitrary linear
system; it's what is called a Laplacian linear system, meaning the
constraint matrix is the Laplacian of the graph.
I will not define what the Laplacian is -- it's not important for
understanding the rest of the talk. But I'll say that it's an extremely
important matrix associated with graphs, and there's a whole very nice
field called spectral graph theory whose sole purpose is inferring
properties of the graph G from the properties of its Laplacian.
So we see that to compute an electrical flow, we need to solve a linear
system; and the bottom line here is that we actually know that these
types of systems, Laplacian linear systems, can be solved very
efficiently.
We can solve them in essentially linear time -- this is the result due
to Spielman and Teng that was recently simplified by Koutis, Miller,
and Peng. I say "essentially" because we don't solve them exactly, we
solve them approximately; but since the dependence on the error is good
enough, for the sake of this application we can assume we can do it
exactly.
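(As a concrete, deliberately slow sketch of this primitive: the vertex
potentials phi solve the Laplacian system L phi = F(chi_s - chi_t), and the
flow on an edge is the potential difference divided by the resistance. The
dense pseudo-inverse below merely stands in for the fast Spielman-Teng or
Koutis-Miller-Peng solvers; the encoding is my own.)

```python
import numpy as np

def electrical_flow(n, edges, resistances, s, t, F):
    """Electrical s-t flow of value F on an undirected graph.

    edges: list of (u, v) pairs with vertices in 0..n-1
    resistances: list of r_e > 0, one per edge
    Solves L phi = F * (chi_s - chi_t), then f_e = (phi_u - phi_v) / r_e.
    """
    L = np.zeros((n, n))
    for (u, v), r in zip(edges, resistances):
        w = 1.0 / r                      # conductance of the edge
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    b = np.zeros(n)
    b[s], b[t] = F, -F
    phi = np.linalg.pinv(L) @ b          # stand-in for a fast Laplacian solver
    return [(phi[u] - phi[v]) / r for (u, v), r in zip(edges, resistances)]
```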
Given this result, we see that electrical flow is a new near-linear-time
primitive. We have a very fast algorithm that computes an electrical
flow for us, and now the question we want to ask is how we can employ
this primitive to actually say something about the maximum flow of the
graph.
>>: It's like a [inaudible] algorithm.
>> Aleksander Madry: Excuse me?
>>: Like I was wondering how [inaudible].
>> Aleksander Madry: That's an excellent question. A couple of answers
to it. First of all, the Spielman-Teng solver is a beautiful
algorithmic result, but absolutely impractical. The Koutis-Miller-Peng
result is what I would call very likely to be practical. Both of them
are near-linear; but, first of all, Koutis-Miller-Peng is much, much
simpler, and its running time is linear, meaning n times log squared,
while for Spielman-Teng it's n times log to the seventeenth or so --
like, there is no officially established upper bound on this polylog.
It's known to be some constant, and it's not a huge constant. But even
if you just disregard this, it turns out that solving Laplacian linear
systems is a well-known practical problem, and people also have
heuristic algorithms that do it very efficiently. So, depending on
which side you want to look at it from, it seems that from both the
theoretical and the practical point of view, it's a practical
primitive.
From the practical point of view, that wasn't the case when
Spielman-Teng came out.
>>: This is a technical detail -- maybe you'll get into this later. But
when you map from the traditional s-t flow with capacities to an
electrical flow, do you basically have to replace the resistances with
conductances? Because the larger the conductance, the bigger the flow.
>> Aleksander Madry: I'll come to that in a second. You'll see how the
resistances -- or, equivalently, the inverses of the conductances --
are set; it depends on the way you want to think about it. I prefer to
think about resistances -- but it will actually be on the next slide.
So this is the primitive that we are getting, and what I would like to
talk about now is how you can employ such a primitive to solve the max
flow problem. While I do this, let me make some simplifications.
Let me assume that all capacities are one. Let me also assume that I
actually know the value of the maximum flow -- so not what the maximum
flow is, but what its value is. If you're concerned about this second
assumption, it is actually one you can make essentially without loss of
generality, because if we have an algorithm that works given this
value, we can just use an appropriate binary search to get this value
to sufficient accuracy.
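(A minimal sketch of that binary-search reduction; approx_max_flow_at_value
is a hypothetical stand-in for an algorithm that needs the target value,
and its success/failure behavior is an assumption of the sketch.)

```python
def find_flow_value(approx_max_flow_at_value, F_hi, eps=0.01):
    """Binary search for the max-flow value F*.

    approx_max_flow_at_value(F) is assumed to return a flow whenever
    F is at most (1 - eps) * F*, and None when F exceeds F*.
    F_hi is any easy upper bound on F*, e.g. the capacity leaving s.
    """
    lo, hi = 0.0, float(F_hi)
    best = None
    while hi - lo > eps * max(hi, 1.0):
        mid = (lo + hi) / 2.0
        f = approx_max_flow_at_value(mid)
        if f is not None:
            best, lo = f, mid   # mid is achievable; try a larger value
        else:
            hi = mid            # mid is too ambitious; try smaller
    return best
```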
So we shouldn't be bothered by that; these are our assumptions. And
now, probably the first way you would think of using electrical flows
to solve max flow would be just the following algorithm: you start by
setting all resistances to 1, you compute the corresponding electrical
flow of value F star, and you hope for the best -- you just output this
flow, hoping it is close to the actual max flow.
So the question is whether this will work, and probably you are very
skeptical -- and you should be, because here is a simple example where
it already fails. Consider a graph G consisting of just two s-t paths,
one of length one and one of length seven. If you look at what the max
flow will do, it will just send one unit of flow on this side and one
unit of flow on this side.
And, well, that would be the max flow. So the value of the max flow is
two. When you look at the corresponding electrical flow, then it will
favor this shortcutting edge much more than this long path because of
the differences in the resistances. And what we will end up with is
that, well, this is far from being a maximum flow, even approximately.
And by repeating this kind of phenomenon on a bigger scale, you can go
farther and farther from the max flow. So it's not really as simple as
that; just outputting the electrical flow does not suffice.
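(Numerically, with unit resistances the current divides between the two
parallel paths inversely to their resistances, so the short path carries
far more than its unit capacity. A standalone check of this -- the numbers
are just the parallel-resistor rule:)

```python
# Two parallel s-t paths with unit resistances per edge: a single
# short edge (total resistance 1) and a 7-edge path (total resistance 7).
# Send F = 2 units, the max-flow value of this graph.
F = 2.0
R_short, R_long = 1.0, 7.0
# Current splits inversely to path resistance (parallel resistors):
f_short = F * R_long / (R_short + R_long)   # 1.75 units
f_long = F * R_short / (R_short + R_long)   # 0.25 units
print(f_short, f_long)  # the short edge carries 1.75 > its capacity of 1
```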
Then, well, the question is how we can fix it. And probably the fix,
which also seems to be a very natural one, would be just the following.
So instead -- once we compute this electrical s-t flow, we don't output
it right away. We just look at it and see whether there are some edges
which flow much more than they should.
What we do then is increase the resistances of the corresponding edges,
because we want to discourage the electrical flow from flowing too much
on these edges.
And once we increase the corresponding resistances, we just repeat the
process: we compute the electrical flow again, and we hope that after
not too many iterations this will converge, and we will be happy and
get some approximation to the maximum flow.
The surprising thing is that this actually can be made to work. By
tuning this appropriately, this can work, and in particular this is the
outline of our algorithm. The only slightly non-obvious thing that we
are doing is that, at the end, the final flow that we output is
obtained not by taking the last flow we computed, but by taking the
average of all the flows computed over time.
And it will be clear in a second why we do that. Okay. So this is the
general outline of our algorithm, and what I intend to do in the rest
of the talk is fill in some of the blanks that are still left here.
>>: At the end, do you scale so that all capacity constraints are
preserved? How do you make sure that --
>> Aleksander Madry: I will talk in a second about taking the
average -- if I take an average of flows of value F star, then I will
get a flow of value F star. Now, of course, I have to prove that the
capacities will somehow be all right, and I will explain in a second
how one can go about it. Okay. So that's what I'll do now: I'll be more
precise about what this convergence condition is, what the resistance
update is, and so on.
And let us assume from now on that all capacities are one, our graph is
sparse, and we know the value of the max flow, F star. Okay?
So the first observation that allows us to fill in some of the blanks
is the following. Let us fix some resistances r_e -- arbitrary ones --
and let us look at the corresponding electrical flow of value F star in
our graph.
Now, of course, we don't expect this flow to obey all the capacity
constraints; we have just seen an example where this is not the case.
But there is an interesting feature of this electrical flow that
relates it to the maximum flow, even though it's not a maximum flow by
itself. Namely, what happens is that if one looks at the expected flow
on the edges -- what I mean by that is: I take an expectation over all
the edges, where each edge is weighted proportionally to its resistance
-- then this expected flow will be at most one. So the way you can
interpret this is that on the weighted average, when the weights are
given by the r_e, this flow is feasible.
Okay? So it's not feasible everywhere -- it's not the case that every
edge carries flow at most one. But if you take this particular average,
then the average will be at most one; the flow will not overflow on
average.
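(Written out, in my notation, the claim is that the resistance-weighted
average congestion of the electrical flow of value F star is at most one:)

```latex
\frac{\sum_{e} r_e \,\bigl|f^{\mathrm{elec}}_e\bigr|}{\sum_{e} r_e} \;\le\; 1 .
```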
And now the reason why this is important is that it gives us a very
fast algorithm that solves the following algorithmic task: we are given
weights w_e, and we are able to compute a good flow. By good, I mean a
flow of value F star such that, when you look at the weighted average
flow on the edges, it is at most 1. The thing is that the weights have
to be specified in advance, and then the algorithm gives us such a flow
in return.
So now the key point, the reason this is important, is that we already
know a tool that allows us to take such an algorithm -- one that
returns flows that are feasible on average -- and harden it into an
algorithm that outputs an approximation of the max flow that is
feasible everywhere.
So we know how to turn this being good on average into being good
everywhere. The method that achieves this is the multiplicative weights
method; it stems from the work on boosting and Lagrangian relaxation
and was cleaned up and cast into a framework by Arora, Hazan, and Kale.
Roughly speaking, what this method does is call this crude algorithm --
the one that is good on average -- repeatedly, with different weights,
and in the end it outputs the average of all the returned flows, and,
well, there is --
>>: The crude algorithm would be --
>> Aleksander Madry: This is just the one-sentence version; I will make
it more precise in a second. So the way it works is that you treat this
crude algorithm as a black box. You just feed it with different weights
that will evolve over time.
Each call will give you a flow, and you take the average of all the
flows you get. The question is how you make these weights evolve, and
why this really works. Let me be more precise about how this method
works.
The way it starts is that it associates a weight with each edge;
initially this weight is one for every edge.
As I said, it will repeatedly call our crude algorithm with different
weights that evolve over time, and in the end it will return the
average.
The only question is: how do these weights evolve over time?
They evolve via multiplicative updates -- hence the name. What this
multiplicative update does is increase each weight proportionally to
the flow that the edge suffered in the last returned flow.
And, well, the update is proportional to epsilon, which is our desired
accuracy parameter, and inversely proportional to rho. What rho is, is
the width of the crude algorithm: a global upper bound on the largest
edge overflow that can occur in any of the returned flows. We just want
a general upper bound ensuring that, no matter what weights we feed to
this crude algorithm, there will be no edge that overflows by more than
rho.
In some sense, you can view rho as a worst-case estimate of how close
the flows produced by the crude algorithm are to the actual max flow.
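(A minimal sketch of this loop as just described; the exact update form and
constants are my paraphrase of the talk, not the paper's precise choices.)

```python
def mw_max_flow(crude_flow, m, rho, eps):
    """Multiplicative-weights driver around a 'crude' flow oracle.

    crude_flow(weights) is assumed to return a list of edge flows of
    value F* whose weighted average congestion is at most 1; rho is
    the width: no returned flow ever exceeds rho on any edge.
    """
    T = int(rho / eps**2) + 1        # number of iterations ~ rho / eps^2
    w = [1.0] * m                    # one weight per edge, initially 1
    avg = [0.0] * m
    for _ in range(T):
        f = crude_flow(w)
        for e in range(m):
            # grow the weight in proportion to the congestion suffered,
            # scaled by eps / rho so each step lies between 0 and eps
            w[e] *= 1.0 + eps * abs(f[e]) / rho
            avg[e] += f[e] / T       # final output: average of all flows
    return avg
```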
>>: You say "crude algorithm" -- what is the crude algorithm?
>> Aleksander Madry: I didn't say that yet. This is just an abstraction
-- this is the way we are using the crude algorithm. By crude
algorithm, I mean just an algorithm that, given the weights, returns a
flow such that the weighted average of the flow is at most one.
So it is the one that returns a flow that is good on average.
>>: Just finding the electrical flow?
>> Aleksander Madry: Yes, that's what we will end up doing, yes. But
this is more general than that -- the method just needs this property.
Okay. And the reason why we divide by rho is that we want to make sure
that this part of the update is always between zero and epsilon. In
this way we ensure that the weights evolve smoothly enough that we can
actually keep track of the evolution.
Okay. So this is the way the algorithm works, and the underlying
dynamics are just the following. When an edge e suffers a large
overflow, then by this multiplicative update its weight grows rapidly.
On the other hand, we know that the crude algorithm returns flows that
are good on average, and this we can use to prove that the sum of all
the weights doesn't grow too fast -- it grows slowly. And by comparing
the growth of one weight to the growth of the sum of the weights, we
conclude that there is no single edge that suffers a large overflow too
often.
And now, if we take the average at the end, then for every edge this
average will average out the few bumps that can occur. This is why you
can be sure that the average taken at the end will actually be close to
the max flow, even though each individual flow can be quite far from
it.
>>: Do you have examples where the final flow is bad and you really
need this averaging? Or --
>> Aleksander Madry: I would say it's not -- if you just run this
algorithm, it might happen that each individual flow is bad. But I
don't have a specific example -- like --
>>: The natural intuition is that you're roughly improving overall. So
your flows at the end would be good.
>> Aleksander Madry: I would really like to be able to prove such a
thing.
>>: But you probably ran examples.
>> Aleksander Madry: Well, we looked at examples. And we think -- we
have a process very similar to this one for which it seems that the
flow converges to the actual max flow, as opposed to having to look at
the average; but we can't prove anything. I think that if you were able
to prove it, you would get a much faster algorithm.
>>: But there's no example where it's --
>> Aleksander Madry: No, there's no example where there's an
oscillation that always -- yeah. Okay. So this is the intuition about
the dynamics. And the actual formal theorem that one can prove is: if
we want to get a (1 minus epsilon)-approximation of the max flow using
this algorithm, then the number of iterations of this procedure should
be roughly proportional to rho over epsilon squared. We need this
dependence on rho to make sure we can average out the bumps that can
happen -- bumps of size up to rho.
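(In formula form -- my transcription of the statement:)

```latex
T \;=\; \tilde{O}\!\left(\frac{\rho}{\varepsilon^{2}}\right) \text{ iterations}
\quad\Longrightarrow\quad
\text{total time } \tilde{O}\!\left(\frac{m\,\rho}{\varepsilon^{2}}\right)
\text{ for a } (1-\varepsilon)\text{-approximate max flow.}
```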
So this is the theorem. And this is the description of the
multiplicative weight update method.
And now, the reason we care about this is that, exactly as Yuval said,
we already know an instance of this crude algorithm: we can just use
the electrical flow computation. When you substitute the electrical
flow computation into this picture, you'll see it's very similar to the
template that we just presented.
Namely, once you do the fitting, the algorithm that you end up with is
one that works with weights; the convergence condition is that it just
runs for this rho-over-epsilon-squared number of iterations; and the
way the weights evolve -- which corresponds to the way the resistances
evolve -- is via this multiplicative update rule.
So the way the electrical s-t flow computation works here is: it looks
at the current weights, sets the resistances equal to the weights, and
computes the flow. Okay. And since this is an instantiation of our
template, we know that this algorithm works: after roughly rho over
epsilon squared iterations, it will return the desired approximation to
the max flow. And this results in roughly m times rho over epsilon
squared running time.
So now we know everything except one single thing, namely: what is rho?
What worst-case bound can we impose on the overflow of the electrical
flows that we are returning?
This is the one thing that we still don't know, so let's think about
it. Let's start with the simple case, corresponding to the beginning,
when all the resistances are 1. In this case, what one can actually
prove is that no edge flows more than square root of m -- which, for a
sparse graph, is roughly square root of n.
And the proof is quite simple, actually, so I will present it. It
consists of two steps. First, we want to show that the energy of the
electrical flow corresponding to these resistances is at most m.
The way we prove it is quite simple again. Namely, we look at one
particular s-t flow of value F star: the max flow itself. We have no
idea what the max flow looks like, but we know that on every edge its
flow is at most one. So when we look at its energy, we have m terms,
each of them at most one; hence the energy of the max flow is at most
m.
Now, all we really use is the fact that we defined the electrical flow
to be the minimizer of the energy over all s-t flows of value F star.
From this we see that an upper bound on the energy of the max flow has
to be an upper bound on the energy of the electrical flow as well. This
concludes step one.
And for step two, all we need to realize is that if we look at any
particular edge e, its contribution to the energy can clearly be at
most the total energy. But we know that the energy is at most m. So,
taking square roots of both sides, we get that the flow on the edge is
at most square root of m -- which, for a sparse graph, is roughly
square root of n. That's all; it's a very simple proof.
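(The two steps, written out for unit resistances:)

```latex
\mathcal{E}\bigl(f^{\mathrm{elec}}\bigr) \;\le\; \mathcal{E}(f^{*}) \;=\; \sum_{e} (f^{*}_e)^{2} \;\le\; m,
\qquad
\bigl(f^{\mathrm{elec}}_e\bigr)^{2} \;\le\; \mathcal{E}\bigl(f^{\mathrm{elec}}\bigr) \;\le\; m
\;\Longrightarrow\;
\bigl|f^{\mathrm{elec}}_e\bigr| \;\le\; \sqrt{m}.
```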
So we know that if all resistances are 1, then rho is roughly square
root of n. The problem is that if the resistances are not all 1 -- if
they're not very uniform -- then this argument doesn't really work.
There might be one edge with very small resistance, and then the
electrical flow is okay with putting a really large flow on it.
And, well, that's a problem. The way we fix it is by changing our
algorithm slightly: we just want to limit the non-uniformity of the
resistances. The way we do it is that, whenever we compute an
electrical flow, we always mix a uniform component into the
resistances.
What I mean by that is that whenever we do the computation, we add this
term, where the factor here is a normalization that makes sure both
parts of the sum contribute on the right scale.
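(One natural way to write this mixing, with unit capacities -- this is my
recollection of the paper's choice, so treat the exact normalization as an
assumption:)

```latex
r_e \;=\; w_e \;+\; \frac{\varepsilon}{m} \sum_{e'} w_{e'} .
```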
And now, with this mixing-in of the uniform measure, a reasoning
analogous to the one I just showed can convince us that rho is never
bigger than square root of n over epsilon. So we get an additional
epsilon here.
And now, once we establish this, it results in a (1 minus
epsilon)-approximation algorithm that runs in time which is n to the
three-halves over epsilon cubed.
Okay. So this is the way we can get such a running time. But, of
course, what we were after here was a faster running time -- something
better than n to the three-halves. What we're after is an
n-to-the-four-thirds running time.
So how do we do that? Let me sketch it. Since electrical flow
computations are the basic operation that we are doing here, we would
like to reduce the number of these computations that we perform. A
tempting way of doing that would be to improve our bound on the width:
after all, the argument compares the contribution of one edge against
the total energy, and there seems to be no way this bound could be
tight. But unfortunately it is.
And the example that shows it is the following. Our graph G will
consist of roughly square root of n parallel s-t paths, and each of
these paths, except the last one, is of length roughly square root of
n. Okay? So the value of the max flow here is roughly square root of n,
and it corresponds to sending one unit of flow over every path.
However, if you look at the electrical flow, you will see that roughly
half of the flow goes over the single short path, so that edge indeed
suffers the square-root-of-n congestion. So our bound is tight, even
though it didn't look like it.
So that's a problem. But besides showing that we have a problem, this
example also gives us hope of fixing things. Namely, what we notice is
that if you just remove this one shortcutting edge, then the value of
the max flow will not change much: the value is square root of n, and
we just removed one edge, which carries a tiny fraction of the max
flow. But once we remove this edge and look at the electrical flow in
the resulting graph, the electrical flow will be much better behaved --
it will actually be the max flow of the resulting graph. Now, the
obvious question is whether we can turn this observation, made for this
particular example, into an algorithmic technique.
And, as you might probably guess, the answer is yes. The way we do it
is that we change our crude algorithm and make it self-enforce a
smaller width, rho prime. So we'd like the width to be smaller than rho
prime, and the way we do it is very naive in some sense. Previously,
given the weights, the algorithm computed the electrical flow
corresponding to the resistances and output this electrical flow. What
it does now is: after the computation of the electrical flow, it looks
at what it computed. If there is an edge that flows more than rho
prime, then it removes that edge -- and once the edge is removed, it is
removed forever -- and it tries the computation again on the graph
without this edge.
It just repeats this procedure until it gets a flow that satisfies this
constraint, or until something bad happens -- namely, it disconnects s
and t. So clearly, by the definition of this crude algorithm, we know
that if it always successfully terminates, then its width is rho prime,
because the algorithm made sure this is the case.
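(A sketch of this modified oracle; solve is assumed to be an
electrical-flow routine like the earlier one, and removing all offending
edges at once is my simplification of the procedure described.)

```python
def width_limited_oracle(n, edge_list, s, t, F, rho_prime, solve):
    """Crude oracle that self-enforces width rho_prime by deleting edges.

    edge_list: mutable list of (u, v, resistance) triples; deletions
    are permanent, as in the talk.  solve(n, edges, resistances, s, t, F)
    is assumed to return the electrical flow on each edge.
    """
    while True:
        es = [(u, v) for (u, v, _) in edge_list]
        rs = [r for (_, _, r) in edge_list]
        f = solve(n, es, rs, s, t, F)
        over = [i for i, fe in enumerate(f) if abs(fe) > rho_prime]
        if not over:
            return f                    # width constraint satisfied
        for i in reversed(over):
            del edge_list[i]            # removed forever
        # (if the deletions ever disconnect s and t, a real
        # implementation would detect that and report failure)
```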
But of course, now the question is: what value of rho prime should we
choose to make sure that the algorithm always successfully terminates?
Because we have two things fighting each other here.
Obviously we want rho prime to be as small as possible, because the
smaller it is, the fewer calls to the crude algorithm we make in our
multiplicative-updates routine. So that's good. But the smaller you
make rho prime, the more likely you are to remove edges, and each
removal causes us to redo the computation of the flow inside the crude
algorithm. So we need to balance these two things out, and it turns out
that the way to do that is to set rho prime to be n to the one-third.
The one-sentence explanation of how you arrive at this conclusion is
the following: if you remove an edge that carries a lot of flow in the
electrical flow, then its removal increases the energy of the
electrical flow significantly; but from the point of view of the max
flow, this edge doesn't really matter much -- each edge is equal from
the point of view of the maximum flow. Once you work out what this
significant increase of the energy means, and you do the computation,
it turns out that this is the setting that makes sure everything always
works out the way you would like.
And, well, this setting of rho prime gives you the desired
n-to-the-four-thirds algorithm. Okay. So this is the way the algorithm
works. Let me now talk a bit about future work: where do I plan to go
from here?
So this is our result regarding max flow: we have this m times
n-to-the-one-third algorithm. And here, note that I didn't tell you how
to get this running time when m might be much bigger than n -- so I
didn't tell you what to do when the graph is dense -- but it's not
hard. The natural question here is whether we can make this approach
give us a near-linear-time algorithm for max flow -- and actually for
exact max flow, not only approximate. There are two questions that
immediately stem from this. The first is whether, even if you just look
at undirected graphs and approximation, you can make a variant of this
algorithm run in near-linear time.
This goes sort of back to your question. In particular, we have no
example showing that if you set this rho prime to be polylogarithmic,
anything breaks down. We have no such example, and I don't really
believe that one exists; we just can't prove that this is the case. So
one question here would be whether you can actually do better in the
analysis of this multiplicative-update scheme and get a better bound on
rho prime, because this would immediately give you a better running
time. And what I'm actually planning to do soon, hopefully, if I can't
prove it yet, is to at least run experiments and see what happens --
because I really don't believe that you need to set rho prime so high.
So this is the first question.
But the second question is, in some sense, an even more important one:
whether we can make these ideas work for directed graphs.
At first it might look very -- well, very crazy -- to hope for that,
because the notion of electrical flows seems to be very tied to
undirected graphs. But as I mentioned, there is a reduction that allows
you to reduce the question for directed graphs to the question for
undirected graphs.
So in order to get an exact algorithm for the directed case, you don't
even need to leave the world of undirected graphs. All you really need
to do is design an approximation algorithm for undirected graphs where
all that changes is the dependence on the error: if you are able to
make it logarithmic instead of polynomial, then we are done.
So that's the second question that I want to pursue here. But that's
only about max flow. As a more general theme, something I really would
like to do in the future is look at all the basic problems in
algorithmic graph theory and try to understand whether we can solve
each of them in near-linear time. What I mean by this is that I would
like to have algorithms that are extremely fast -- essentially as fast
as they can be -- and that still offer us a solution of good quality.
Maybe this quality doesn't need to be exactly the best possible -- the
best that we know is achievable in polynomial time -- but it should be
comparable.
As an example of what I mean by that, we can focus our attention on the
generalized sparsest cut problem, which I probably don't need to
introduce here. It is known that, using our max flow result together
with the framework of Sherman, what we can get is a
square-root-of-log-n approximation to sparsest cut in essentially
n-to-the-four-thirds time. And this square root of log n is
significant: it is the best approximation that we know how to achieve
for sparsest cut in polynomial time at all, not only if you want to be
fast.
This is the ARV approximation, and we have no idea how to get anything
better, even if you are allowed n-to-the-hundredth time. So this is the
fastest approximation algorithm for sparsest cut when we want to
have --
>>: Sparsest [inaudible].
>> Aleksander Madry: [inaudible]. So if we want to get exactly square
root of log n, this is the fastest we can get. But what I showed is
that if you are willing to cut yourself a little bit of slack --
namely, instead of a square-root-of-log-n approximation guarantee you
settle for just a polylogarithmic approximation guarantee -- then what
you can get is a running time that's close to linear. That's
essentially linear.
And, by the way, I should also mention that there is another very, very
efficient algorithm for partitioning, for sparsest cut -- namely
spectral partitioning -- but the problem with spectral partitioning is
that if the conductance of the graph is very bad, then the
approximation guarantee is very bad as well; it can be square root of n
in the worst case. That's why I didn't mention it here.
Now, it turns out that the framework I introduced here works not only
for the sparsest cut problem but also for the generalized sparsest cut
problem, which was not known to have any fast algorithm -- the fastest
algorithm was n squared. And actually it's even broader than that: it
works for a lot of other graph partitioning problems. So the question
that I try to pursue here is: what other classes of problems admit this
kind of result, where you give up a little bit of approximation ratio
and get a very, very fast algorithm? Because if you want to think about
working with these big graphs, as I said, the running time seems to be
the hard constraint.
So that was about the problems I want to pursue; now I want to say
something about the tools that I think are interesting and should be
pursued as well. The question I want to ask here is: where else can
bridging linear algebra and algorithms be useful? I say "where else"
because we know a lot of instances where this is useful. This is one
example, and there's an even more prominent one, namely the success
story of the eigenvalue connection: when we understood how the second
smallest eigenvalue relates to graph partitioning, this gave results in
partitioning and an understanding of random walks. So this is a huge
success of bridging linear algebra and combinatorial algorithms, and
the question I'm after here is: can we take spectral graph theory
beyond lambda two, beyond the second smallest eigenvalue? In
particular, when you look at it, the Laplacian seems to be a much
richer object than just something having a second smallest eigenvalue.
It has a lot of other eigenvalues, and in particular it has electrical
flows -- in some sense, it describes the electrical flows of the graph.
So my question here is whether we can employ electrical flows for other
combinatorial problems and make them helpful there.
So that's it. Let me just conclude with a statement that I think is
very important to realize: in the last decade our community developed
really a lot of exciting tools -- really a lot of exciting tools -- and
I strongly feel that this is the time for us to take these tools and
re-examine all the classical problems that we have been studying for a
couple of decades now.
I think that these new tools will give us new insights, and I really
think that this pursuit will be very fruitful in the future.
Thank you.
[applause]
>>: Questions or comments?
>>: The very last thing you mentioned, do you have some examples in
mind?
>> Aleksander Madry: Of this?
>>: Which -- connecting algebraic graph theory: where does it really
work, what do you get? Do you see examples?
>> Aleksander Madry: Yes -- like, well, obviously in the whole world of
flow problems there are many more problems than just max flow. But,
well, that's already sort of implied by the max flow result; this is
the direction to go.
I still think that for graph partitioning we could do something like
spectral partitioning, but even better, where we look at electrical
flows instead of the eigenvector corresponding to the second smallest
eigenvalue. As I said already, we can use maximum flow to do this
partitioning, but maybe if we don't go through this reduction and
instead try to get something directly, that might be useful. And also
-- like, this is hard for me to tell, because if I knew how to apply
it, I would be talking about those results right now.
But the other thing is that, in general, I think these linear-algebraic
tools are much better suited for obtaining local algorithms. What I was
saying here is that we have huge graphs and we want as fast an
algorithm as possible, because we cannot really afford anything slower
than near-linear. But sometimes even near-linear is not good enough.
Sometimes we would like an algorithm whose running time is proportional
only to the piece of the graph we care about: say you want to cut out a
chunk that is well separated from the rest of the graph; then what we
might hope for is an algorithm whose running time is not proportional
to the size of the whole graph but only to the size of the piece we are
taking out. These are called local algorithms, and Yuval also has some
results in this vein -- probably the best ones currently, yes. And the
thing is that you can ask more questions of this type, where you would
like your answer to depend only on a very small piece of the graph and
your running time to be proportional to that. So, for instance, local
maximum flow computation. The reason I'm bringing this up is that it
seems, at least to me, that linear algebra is better suited to tackle
these kinds of questions. The combinatorial methods we have don't
really seem to go in this direction; I don't know how to do it with
them.
But I still think that there is a much richer theory to be discovered
here. I don't know what it is yet; I will try to figure it out at some
point.
>>: Can anything be done with directed electrical flows?
>> Aleksander Madry: Like, you can always define -- sort of, you can
always define the electrical flow as the directed flow that minimizes
the energy. The thing is that the reason we are introducing electrical
flows is their unique feature that you can compute them fast, and I
don't think there is an efficient way of computing directed electrical
flows. If there were, you could use this framework and get directed
results.
>>: Linear programming.
>> Aleksander Madry: Like, the thing is, it then seems to become a
linear program, not a linear system. That sort of --
>>: Does it have enough structure that maybe --
>> Aleksander Madry: I don't know if there is. I don't know how to
exploit it. Like, really, when you look into it, the reason why we can
solve electrical flows so fast seems to be extremely, extremely special
-- it is the minimizing of a sum of squares; sort of, the reason we can
do it, essentially, is that this flow is a potential flow. And this is
what underlies all of it. It seems like an extremely special structure,
and it doesn't really seem to be applicable to directed graphs.
>>: When you said that, from your point of view, you preferred the
version in which you minimize the energy -- this is only used in your
proof? Since one can show that you can find the flow using linear
systems. This is just --
>> Aleksander Madry: We use this in two ways. First of all, yes -- we
use it because we can do it fast. But -- I'm sorry -- the other thing
is that when you do the bounding of the width, you use the fact that
this is a square, not linear. For instance, what you can think about is
using multiplicative weights to solve the maximum flow problem using L1
-- a flow that minimizes the L1 norm, which is a shortest-path
computation. And the trouble there is that you don't have any a priori
bound on the width. If the resistances are one and you look at L1
minimization, your flow might put -- and it will put -- everything on
the shortest path, so an edge can suffer congestion proportional to the
value of the flow; and in general the value of the flow may be
exponential in the size of the problem. So you lose something -- it
seems like there is more to it than just the fact that you can do it
fast.
Of course, just using this framework, if you were able to efficiently
minimize, I don't know, the L4 norm, then it would immediately give you
a better algorithm, by just the same reasoning.
But we don't know a way of efficiently doing that. So there are two
things: first, we can do this particular minimization efficiently; and
second, we need this power higher than 1 to make the width bounding
nontrivial -- to make sure you spread out your flow and don't
concentrate it on one path.
>>: Okay. So Aleksander is here until tomorrow. More questions?
Some of us are scheduled for meetings but if someone would like to meet
with him, we can still arrange something. So let's thank Aleksander
again.
[applause]