>>: Okay. Good morning, everyone. I've been a fan of electrical flows for a long time in their probabilistic context, as have others here, and it's delightful to see them flowering in the algorithmic realm. So today we're happy to welcome Aleksander Madry from MIT.

>> Aleksander Madry: Hello. Thank you for the introduction and for the invitation. What I plan to do today is describe a new tool for graph algorithms that stems from exploiting the connection between the notion of electrical flows and a certain type of linear systems called Laplacian systems. The way I intend to describe this tool is by showing how it applies to a particular problem, namely the maximum flow problem, and then by talking briefly about the broader context. But before I do all that, let me talk a little bit about the broader motivation behind this research in particular and the research I intend to do in the future. This motivation stems from the fact that, given the recent explosion in various types of services and applications, we are faced with the reality that we live in a world of huge graphs. And it is not that these graphs are just out there; we really want to get our hands on them. We want to analyze them and understand them. In particular, the tasks of interest we might think about pursuing are, for instance, graph partitioning, because we would like to perform clustering or community detection; or connectivity analysis, because we'd like to do congestion estimation in these networks, analyze resilience to link failures, or do network design: we would like to find a way of supporting on these graphs an information infrastructure that is reliable and efficient. So these are some example tasks, and of course they're not really new to us; we have studied these kinds of problems for quite a long time already. But what has changed is that, given the size of these graphs, time now becomes the hard constraint. Our algorithms, whatever they are doing, need to be extra efficient, and in this regime just having a polynomial-time algorithm is not really enough; we need something much, much faster. So the nature of the challenges we are facing has evolved, but fortunately, so did our tools. In recent years we have seen quite remarkable developments in various areas of algorithmic graph theory, in scientific computing, and in optimization methods, and these developments equipped us with new and very exciting tools that we didn't really have before. What my research is mostly about is looking at these new tools and building a diverse algorithmic toolkit that will help us in addressing the challenges I've just mentioned. In particular, by employing this algorithmic toolkit, I managed to advance the state of the art on various basic graph problems, ranging from various flow problems through partitioning problems to TSP problems and the generation of random spanning trees. What I intend to do today is describe just one of these results: namely, how some of the tools I just mentioned apply to the maximum flow problem. Okay. So let's start by defining formally what the maximum flow problem is. In this problem we are given a directed graph G, with a special vertex s, the source, and a special vertex t, the sink. Additionally, for each arc we are given an integer capacity.
Now, the task we are shooting for is finding a feasible s-t flow of maximum value. What does that mean? It means that we want a flow, which we can view as an assignment of numbers to the arcs, satisfying two conditions. The first condition is that for any vertex other than s and t, we would like the total flow into this vertex to be equal to the total flow out of it; these are the flow conservation constraints. The second constraint is that the flow on each arc does not exceed that arc's capacity. Given these constraints, our task is to maximize the value of the flow, which is just the net flow into t; by the flow conservation constraints, this equals the net flow out of s. So in this example we have a flow of value 7, but the actual max flow value of this graph is 10, and here is an example of a flow that achieves this value. Okay. So this is the problem, and a natural question we should ask is: why do we care about this problem? In this case the answer is pretty easy: one can just say it is one of the fundamental optimization problems, and there are many reasons for that. The study of this problem dates back to the 1930s, and it's one of the problems that is extremely broadly applied in practice. One of the reasons is its [inaudible] applicability to transportation problems; but also, through various reductions, it turns out that maximum flow computation corresponds to tasks like scheduling, graph partitioning, or image segmentation. So it's really broadly applied in practice, but that's not the only reason we care about it. Namely, this is one of those problems that, as we were trying to understand it, shaped our understanding of combinatorial algorithms in general. Many tools that were developed just to understand this problem turned out to be useful in a broader context; it seems that somehow this problem captures some part of algorithmic graph theory that we'd like to understand. Once again, techniques aimed at making progress on this problem turned out to be broadly useful in other contexts as well. So this is why we care about this problem. Now let's talk a little bit about what is known about it. As you might imagine, given the long history of the problem, there is a lot of previous work; in the interest of time, let me not talk about this previous work at all. I hope you will forgive me. All I will mention is the current state-of-the-art algorithm, the seminal algorithm of Goldberg and Rao, whose running time is Õ(m * min(m^{1/2}, n^{2/3})). Since we care about the big picture in this talk, let me introduce notation that suppresses all the logarithmic factors: if I write Õ, that means there are some logarithmic factors lingering, but usually they will not be too big. So here we have the running time of Goldberg-Rao; which of the two terms inside is smaller depends on the sparsity of the graph, on the ratio of the number of edges to the number of vertices. Let me make one more simplification: throughout this talk, let's assume we are dealing with a sparse graph, a graph whose number of edges is not that much bigger than its number of vertices. Once again, there are a couple of reasons why we might do that. The first reason is that huge graphs tend to be sparse.
And the second reason is that we already have some limited ways of dealing with density: there are techniques that allow us to reduce, to some extent, the density of a graph that is dense. But the regime in which we really don't know much is when the graph is sparse to begin with; understanding algorithms in this regime seems to be the benchmark of our understanding of the problem. Okay, so from now on let's focus on sparse graphs. If we focus on sparse graphs, the running time of Goldberg-Rao is Õ(n^{3/2}), and probably one of the most essential questions in algorithmic graph theory is whether we can improve this running time. Unfortunately, as much as we would like to know how, we don't, and in some sense it's even more embarrassing than that. Namely, even if you look at the baby case of this question, that is, if we restrict ourselves to graphs with all capacities equal to 1, this Õ(n^{3/2}) running time has been known for 35 years and has not been improved yet. So we are really stuck here. Because of that, what people did over time is formulate a hopefully simpler challenge that can be viewed as a first step toward understanding the complexity of max flow. Namely, what people set out to do is obtain an algorithm that (1-epsilon)-approximates the maximum flow in undirected graphs in better than Õ(n^{3/2}) running time. Just a remark here: looking at this challenge, it seems we are giving up two things. First of all, we settle for approximation, and then we settle for undirected graphs only. But actually, max flow has an interesting feature: if we are able to solve max flow exactly on undirected graphs, then we are actually able to solve max flow on directed graphs as well. So just giving up the directedness of the graph is not enough; it doesn't really make the problem simpler. That's why we also need to introduce the approximation. So this is the slightly, or maybe much, simpler challenge. But still, even for this challenge, the best algorithm, even when all the capacities are one, is just to take Goldberg-Rao and run it. We don't know how to improve on this running time. What I want to talk about today is the first improvement on this challenge: namely, the result due to Paul Christiano, Jonathan Kelner, Daniel Spielman, Shang-Hua Teng, and me, which shows how to compute a (1-epsilon)-approximate max flow in undirected graphs in time roughly Õ(m * n^{1/3}); so, for sparse graphs, Õ(n^{4/3}). This is the result I want to focus on for the rest of the talk. So let's start with a discussion of the general approach we are taking. As you probably recall, the Goldberg-Rao algorithm is based on the augmenting-paths framework. A one-minute description of this framework: it is based on iteratively finding s-t paths in residual graphs. If all the capacities are 1, we start with our original graph and find some s-t path. Then we just flip all the edges on this s-t path, and this way we get a residual graph corresponding to augmenting the flow along this path.
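[Editor's note: to make the augmenting-paths framework concrete, here is a minimal Python sketch of the unit-capacity case just described. It is an illustration only, not the Goldberg-Rao algorithm, which is far more refined; the function name and representation are the editor's.]

    from collections import deque

    def max_flow_unit_capacities(n, arcs, s, t):
        """Max flow when all capacities are 1, via repeated augmenting paths.
        n: number of vertices; arcs: list of directed (u, v) pairs."""
        # residual[u][v] = remaining capacity of arc (u, v)
        residual = [dict() for _ in range(n)]
        for u, v in arcs:
            residual[u][v] = residual[u].get(v, 0) + 1
            residual[v].setdefault(u, 0)   # reverse arc starts at capacity 0
        value = 0
        while True:
            # BFS for an s-t path in the current residual graph
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, cap in residual[u].items():
                    if cap > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return value   # no augmenting path left: the flow is maximum
            # push one unit along the path by flipping residual capacities
            v = t
            while parent[v] is not None:
                residual[parent[v]][v] -= 1
                residual[v][parent[v]] += 1
                v = parent[v]
            value += 1

[Each loop iteration is one "find an s-t path and flip its edges" step of the framework.]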
Now that we have this residual graph, we proceed to look for an s-t path in it. We repeat this procedure until we cannot find any more s-t paths, and there is an easy procedure that allows us to extract the final flow out of the last residual graph we compute. So this is the way this framework works, and as you can see, it's purely combinatorial: the flow is built path by path, and it's greedy.

>>: Just for intuition, what's the difference between this and [inaudible]?

>> Aleksander Madry: No, this is -- this framework was developed by Ford-Fulkerson.

>>: You compute the blocking flow, which --

>> Aleksander Madry: Yes, but --

>>: It's really path by path.

>> Aleksander Madry: It is path by path; it's just done efficiently. The way you can view a blocking flow is as the computation of a greedy sequence of s-t paths, done in such a way that you bottleneck the graph. The beauty of blocking flows is that you can do this faster, by looking at the whole graph at once each time instead of finding one path at a time; but in the end you can conceptually think of it as finding a sequence of s-t paths, just faster than finding them one by one. Ford-Fulkerson is essentially doing this in a less efficient manner, and blocking flows still do the same thing but do it faster, because they find many paths simultaneously. Conceptually it's still the same framework and still the same algorithm. Okay. So this is the framework, and due to its simplicity and elegance it was the basis for a lot of beautiful algorithms; in particular, Goldberg-Rao is based on it. So a tempting approach would be to look at Goldberg-Rao, assume that we are okay with approximation and with dealing only with undirected graphs, and try to improve something. Unfortunately, that's exactly the barrier we have been stuck at for some time already, and it's really unclear how to get a speed-up on this route. So in our result we try to attack the problem from a different angle, and the one-sentence explanation of this angle is: we try to probe the global flow structure of the graph by solving linear systems. Now, the obvious question is how you can relate something as combinatorial as flow structure to something as linear-algebraic as linear systems, and as you might guess from the title of the talk, the answer is electrical flows. So let me talk a little bit about what electrical flows are. Probably the easiest definition of electrical flows is the physics-101 definition. We have an undirected graph G, a source and a sink, and some resistances assigned to the edges. The simple recipe for getting the electrical flow corresponding to these resistances is as follows: we treat each edge e as a resistor with the resistance r_e assigned to it, we treat the whole graph as an electrical circuit, and we just connect a battery to s and t, so that some current is induced and flows from s to t. This current is exactly the electrical flow we are looking for. As simple as that. Unfortunately, even though this is a very intuitive definition, it's not the easiest definition to work with when you want to prove anything about electrical flows.
So the definition that we will actually use in this talk is slightly different, but equivalent. It says that if we have a graph with resistances, a source, and a sink, then the electrical flow of value F in this graph is the unique s-t flow of value F that minimizes the energy over all possible s-t flows of value F. The energy here is just the heat dissipation of this current on the edges: the sum over edges of r_e times the square of the flow on that edge, E(f) = sum_e r_e * f_e^2. This is the definition we will actually use, and it's equivalent to the one you get by using Kirchhoff's law and Ohm's law. So this is the electrical flow; this is the definition. Now, why do we care about electrical flows from the point of view of algorithms? The first question that comes to mind is how you actually compute an electrical flow, and the remarkable thing is that all you really need to do to get the electrical flow corresponding to a graph is to solve a linear system. It's even better than that, because it's not an arbitrary linear system: it's what is called a Laplacian linear system, meaning that the constraint matrix is the Laplacian of the graph. I will not define what the Laplacian is; it's not important for understanding the rest of the talk. But I'll say it's an extremely important matrix associated with a graph, and there is a whole very nice field called spectral graph theory whose sole purpose is inferring properties of the graph G by analyzing the properties of its Laplacian. So we see that to compute an electrical flow, we need to solve a linear system, and the bottom line is that we actually know that these types of systems, Laplacian linear systems, can be solved very efficiently: essentially in near-linear time. This is a result due to Spielman and Teng that was recently simplified by Koutis, Miller, and Peng. Strictly speaking, we don't solve these systems exactly, we solve them approximately; but since the dependence on the error is good enough, for the sake of this application we can assume we can do it exactly. Given this result, we see that the electrical flow is a new near-linear-time primitive: we have a very fast algorithm that computes an electrical flow for us. And now the question we want to ask is how we can employ this primitive to actually say something about the maximum flow of the graph.

>>: It's like a [inaudible] algorithm.

>> Aleksander Madry: Excuse me?

>>: Like, I was wondering how [inaudible].

>> Aleksander Madry: That's an excellent question. A couple of answers to it. First of all, the Spielman-Teng solver is a beautiful algorithmic result, but absolutely impractical. The Koutis-Miller-Peng result is what I would call very likely to be practical. Both of them are near-linear, but, first of all, Koutis-Miller-Peng is much, much simpler, and its running time is near-linear meaning n times log squared; for Spielman-Teng it's n times log to the 17th, or so -- there is no officially established upper bound on this polylog. It's known to be some constant, and not a huge one. But even if you disregard this, it turns out that solving Laplacian linear systems is a well-known practical problem, and people also have heuristic algorithms that do it very efficiently. So from whichever side you want to look at it, from both the theoretical and the practical point of view, it's a practical primitive.
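[Editor's note: here is a minimal sketch of the primitive just described: computing an electrical flow by solving the Laplacian system. The dense pseudo-inverse solve below takes cubic time and is purely for illustration; the whole point of the Spielman-Teng and Koutis-Miller-Peng results is that this system can be solved in near-linear time instead. Names are the editor's.]

    import numpy as np

    def electrical_flow(n, edges, resistances, s, t, F):
        """Electrical s-t flow of value F on an undirected graph.
        Solves the Laplacian system L*phi = demand (Kirchhoff's law),
        then reads off the flow via Ohm's law.
        edges: list of (u, v) pairs on vertices 0..n-1."""
        L = np.zeros((n, n))
        for (u, v), r in zip(edges, resistances):
            c = 1.0 / r                      # conductance = 1 / resistance
            L[u, u] += c; L[v, v] += c
            L[u, v] -= c; L[v, u] -= c
        demand = np.zeros(n)
        demand[s], demand[t] = F, -F         # inject F at s, extract F at t
        phi = np.linalg.pinv(L) @ demand     # vertex potentials (L is singular)
        # flow on (u, v) is the potential drop divided by the resistance
        return np.array([(phi[u] - phi[v]) / r
                         for (u, v), r in zip(edges, resistances)])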
From the practical point of view, that wasn't yet the case when Spielman-Teng first came out.

>>: This is a technical detail; maybe you'll get into this later. When you map from the traditional s-t flow with capacities to the electrical flow, do you basically have to replace the capacities with conductances? Because the larger the conductance, the bigger the flow.

>> Aleksander Madry: I'll come to that in a second. You'll see how resistances, or equivalently inverses of conductances, are [inaudible]; it depends on how you want to think about it. I prefer to think in terms of resistances. But it will actually be on the next slide. So this is the primitive we are given, and now what I would like to talk about is how you can employ such a primitive to solve the max flow problem. While I do this, let me make some simplifications. Let me assume that all capacities are one. Let me also assume that I know the value of the maximum flow: not the maximum flow itself, but its value. If you're concerned about this second assumption, it's actually an assumption you can make essentially without loss of generality, because if we have an algorithm that works given this value, we can use an appropriate binary search to find the value to sufficient accuracy. So we shouldn't be bothered by that. These are our assumptions. Now, probably the first way you would think of using electrical flows to solve max flow is the following algorithm: you start by setting all resistances to 1, you compute the corresponding electrical flow of value F*, and you hope for the best; you just output this flow, hoping it must be close to the actual max flow. So, will this work? You are probably very skeptical, and you should be, because here is a simple example where it already fails. Consider a graph G consisting of just two s-t paths, one of length one and one of length seven. The max flow will just send one unit of flow along each path, so the value of the max flow is two. But if you look at the corresponding electrical flow, it will favor the shortcutting edge much more than the long path, because of the difference in resistances: with unit resistances, 7/8 of the current takes the short path, so that edge carries 1.75 units, well over its capacity of one. So what we end up with is far from being a maximum flow, even approximately, and by repeating this kind of phenomenon at a bigger scale we can get farther and farther from the max flow. So it's not as simple as that; this is not what we can do. Then the question is how we can fix it, and the fix also seems to be a very natural one. Once we compute this electrical s-t flow, we don't output it right away; we look at it and see whether there are some edges that carry much more flow than they should. If so, we increase the resistances of the corresponding edges, because we want to discourage the electrical flow from flowing too much on these edges. And once we have increased the corresponding resistances, we just repeat the process: we compute the electrical flow again, and we hope that after not too many iterations this will converge, and we will be happy and will get some approximation to the maximum flow. The surprising thing is that this actually can be made to work.
By setting this up appropriately, this can be made to work, and in fact this is the outline of our algorithm. The only slightly non-obvious thing we do is that at the end, the final flow we output is not the last flow we computed, but the average of all the flows computed over time. It will become clear in a second why we do that. Okay. So this is the general outline of our algorithm, and what I intend to do in the rest of the talk is fill in some of the blanks that are still left here.

>>: At the end, do you scale so that all capacity constraints are preserved? How do you make sure that --

>> Aleksander Madry: I will talk in a second about taking the average. If I take an average of flows of value F*, then I get a flow of value F*; now, of course, I have to prove that the capacities will be all right, and I will explain in a second how one can go about that. Okay. So now let me be more precise about what the convergence condition is, what the resistance update is, and so on. Let us assume from now on that all capacities are one, our graph is sparse, and we know the value F* of the max flow. Okay? The first observation that allows us to fill in some of the blanks is the following. Let us fix some resistances r_e, arbitrary ones, and look at the corresponding electrical flow of value F* in our graph. Of course, we don't expect this flow to obey all the capacity constraints; we have just seen an example where this is not the case. But there is an interesting feature of this electrical flow that relates it to the maximum flow, even though it's not a maximum flow by itself. Namely, if one looks at the expected flow on the edges -- by which I mean: I take an expectation over the edges, with each edge weighted proportionally to its resistance -- then this expected flow is at most one. The way you can interpret this is that on a weighted average, with weights given by the r_e, this flow is feasible. Okay? It's not feasible everywhere; it's not true that every edge carries at most one unit of flow. But if you take this particular average, it will be at most one. So it does not overflow on average. The reason this is important is that it gives us a very fast algorithm that solves the following algorithmic task: we are given weights w_e, and we are able to compute a good flow. By good, I mean a flow of value F* such that the weighted average flow on the edges is at most 1. Note that we have to specify the weights in advance, and only then do we get the flow in return. Now, the key point is that we already know a tool that allows us to take such an algorithm, one returning flows that are feasible on average, and harden it into an algorithm that outputs an approximation of the max flow that is feasible everywhere. So we know how to turn being good on average into being good everywhere.
The method that achieves this is the multiplicative weights method. It stems from work on boosting and Lagrangian relaxations, and it was cleaned up and cast into a framework by Arora, Hazan, and Kale. Roughly speaking, what this method does is call this crude algorithm -- the one that is only good on average -- repeatedly with different weights, and in the end it outputs the average of all the returned flows.

>>: The crude algorithm would be --

>> Aleksander Madry: This was just the one-sentence version; I will make it more precise in a second. You treat this crude algorithm as a black box: you feed it different weights that evolve over time, each call gives you a flow, and you take the average of all the flows you get. The questions are how you make these weights evolve, and why this actually works. So let me be more precise about how this method works. It starts by associating a weight with each edge; initially this weight is one for every edge. As I said, it repeatedly calls our crude algorithm with different weights that evolve over time, and in the end it returns the average. The only question is how these weights evolve, and they evolve via multiplicative updates; hence the name. The multiplicative update increases each weight proportionally to the flow that the edge suffered in the last returned flow: roughly, w_e gets multiplied by (1 + epsilon * |f_e| / rho). The update is proportional to epsilon, our desired accuracy parameter, and inversely proportional to rho. What is rho? It is the width of the crude algorithm: a global upper bound on the largest edge overflow that can occur in any of the flows f_i. We want a general upper bound so that we are always sure that, no matter what weights we feed to this crude algorithm, no edge will overflow by more than rho. In some sense, you can view rho as a worst-case estimate of how far the flows returned by the crude algorithm can be from the actual max flow.

>>: What is the -- you said, what is the crude algorithm?

>> Aleksander Madry: I haven't said yet; this is just an assumption. This is a way of lifting any crude algorithm. By a crude algorithm I mean just an algorithm that, given the weights, returns a flow whose weighted average flow is at most one: the one that is good on average.

>>: Just finding the electrical flow?

>> Aleksander Madry: Yes, that's what we will end up doing. But this is more general than that; the method just needs this property. Okay. The reason we divide by rho is that we want to make sure this part of the update is always between zero and epsilon; this way we ensure that the weights evolve smoothly enough that we can actually keep track of the evolution. Okay. So this is the way the algorithm works, and the underlying dynamics is the following. When an edge e suffers a large overflow, then by this multiplicative update its weight grows rapidly. On the other hand, we know that the crude algorithm returns flows that are good on average, and this we can use to prove that the sum of all the weights doesn't grow too fast; it grows slowly.
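[Editor's note: a minimal sketch of the multiplicative-weights template just described. The oracle interface and the iteration count are the editor's simplifications; the exact constants and update rule in the paper differ in lower-order details.]

    import numpy as np

    def mwu_average_flow(m, crude_oracle, rho, eps, iterations):
        """Lift a 'good on average' flow oracle into a nearly feasible flow.
        crude_oracle(w) must return a flow vector (one entry per edge) whose
        w-weighted average congestion is at most 1 and whose worst
        single-edge congestion is at most rho (the width)."""
        w = np.ones(m)
        flows = []
        for _ in range(iterations):          # roughly rho / eps**2 rounds
            f = crude_oracle(w)
            flows.append(f)
            # multiplicative update: an edge that carried a lot of flow gets
            # a heavier weight, discouraging the next flow from using it
            w *= 1.0 + eps * np.abs(f) / rho
        return np.mean(flows, axis=0)        # the average is what we output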
By comparing the growth of any single weight to the growth of the sum of the weights, we conclude that no single edge suffers a large overflow too often. And if we take the average at the end, then for every edge this average will average out the few overflows that can occur. This is why you can be sure that the average you take at the end is actually close to the max flow, even though each individual flow can be quite far from it.

>>: Do you have examples where the final flow is bad and you really need this averaging? Or --

>> Aleksander Madry: I would say -- if you just run this algorithm, it might happen that each individual flow is bad. But I don't have a specific example --

>>: The natural intuition is that you're roughly improving overall, so your flows at the end would be good.

>> Aleksander Madry: I would really like to be able to prove such a thing.

>>: But you probably ran examples.

>> Aleksander Madry: Well, we looked at examples. We have a process very similar to this one where it seems to converge to the actual max flow, as opposed to having to look at the average, but we can't prove anything. I think that if you were able to prove it, you would get a much faster algorithm.

>>: But there's no example where it's --

>> Aleksander Madry: No, there's no example where there's an oscillation that always -- yeah. Okay. So this is the intuition about the dynamics. The formal theorem one can prove is that if we want to get a (1-epsilon)-approximation of the max flow using this algorithm, then the number of iterations of this procedure should be roughly proportional to rho/epsilon^2. We need this dependence on rho to make sure we can average out the bumps that can happen: bumps of size up to rho. So this is the theorem, and this is the description of the multiplicative weight update method. The reason we care about it is that, exactly as Yuval said, we already know an instance of this crude algorithm: the electrical flow computation. When you substitute the electrical flow computation into this picture, you see it fits exactly the template we just presented. Namely, once you do the fitting, the algorithm you end up with works with weights; the convergence condition is just doing roughly rho/epsilon^2 iterations; and the way the weights evolve, which corresponds to the way the resistances evolve, is via this multiplicative update rule. The electrical flow subroutine looks at the current weights, sets the resistances equal to the weights, and computes the flow. Okay. And since, by the way we obtained this instantiation of our template, we know that this algorithm works -- namely, after roughly rho/epsilon^2 iterations it returns the desired approximation to the max flow -- this results in an Õ(m * rho / epsilon^2) running time. So now we know everything except one single thing: what is rho? What is the worst-case bound we can impose on the overflow of the electrical flows we are returning? This is the one thing we still don't know, so let's think about it. Let's start with the simple case corresponding to the beginning, when all the resistances are 1. In this case one can actually prove that no edge carries more than roughly sqrt(m) units of flow. And the proof is quite simple, actually.
So I will present it. It consists of two steps. First, we want to show that the energy of the electrical flow corresponding to these resistances is at most m. The way we prove it is quite simple again. We look at a particular s-t flow of value F*: the max flow itself. We have no idea what the max flow looks like, but we know that on every edge its flow is at most one. So in the expression for its energy we have m terms, each of them at most one, and hence the energy of the max flow is at most m. Now, all we really use is the fact that we defined the electrical flow to be the minimizer of the energy over all s-t flows of value F*; so an upper bound on the energy of the max flow must be an upper bound on the energy of the electrical flow as well. This concludes step one. For step two, all we need to realize is that if we look at any particular edge e, its contribution to the energy can be at most the total energy, clearly, and we know the energy is at most m. By taking square roots of both sides, we get that the flow on the edge is at most sqrt(m), which for sparse graphs is roughly sqrt(n). That's all; it's a very simple proof. So we know that if all the resistances are 1, then rho is roughly sqrt(m). The problem is that when the resistances are not 1 -- when they are not very uniform -- this argument doesn't really work. There might be one edge with a very small resistance, and then the electrical flow is okay with putting a really large flow on it. And, well, that's a problem. The way we fix it is by changing our algorithm slightly: we want to limit this non-uniformity of the resistances. Whenever we compute an electrical flow, we always mix a uniform component into the resistances. What I mean by that is that whenever we do the computation, we add to each weight a uniform term with a normalization factor that makes sure both parts of the sum contribute on the right scale (roughly, r_e is set to w_e plus epsilon times the total weight divided by m). With this mixing-in of the uniform measure, a reasoning analogous to the one I just showed convinces us that rho is never bigger than sqrt(m)/epsilon; we pick up an additional factor of epsilon here. And once we establish this, it results in a (1-epsilon)-approximation algorithm that runs in time Õ(m^{3/2}/epsilon^3). Okay. So this is how we can get such a running time. But of course what we were after was a faster running time, something better than Õ(n^{3/2}); what we're after is an Õ(n^{4/3}) running time. So how do we do that? Let me sketch it. Since electrical flow computations are the basic operation we perform here, we would like to reduce the number of these computations, and a tempting way of doing that would be to improve our bound on the width. After all, the argument compares the contribution of one edge against the total energy; there seems to be no way this bound should be tight. But unfortunately, it is.
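[Editor's note: in symbols, the width bound being discussed is the following chain (an editor's recap of the two-step argument above), using that the max flow f* is feasible and that the electrical flow f minimizes energy among s-t flows of value F*:]

    \mathcal{E}(f) \;=\; \sum_e r_e f_e^2 \;\le\; \mathcal{E}(f^*) \;=\; \sum_e (f^*_e)^2 \;\le\; m
    \qquad (\text{all } r_e = 1,\ |f^*_e| \le 1),

    \text{and for each edge } e:\quad f_e^2 \;\le\; \mathcal{E}(f) \;\le\; m
    \;\Longrightarrow\; |f_e| \;\le\; \sqrt{m}.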
The example that shows this is the following: our graph G consists of roughly sqrt(n) parallel s-t paths, where every path except the last one has length roughly sqrt(n), and the last one is a single shortcutting edge. The value of the max flow here is roughly sqrt(n), and it corresponds to sending one unit of flow over every path. However, if you look at the electrical flow, roughly half of the flow goes over the single edge, which thus indeed suffers the sqrt(n) width. So our bound is tight, even though it didn't look like it. That's a problem. But this example doesn't only show that we have a problem; it also gives us hope of fixing things. Namely, what we notice is that if you just remove this one shortcutting edge, the value of the max flow does not change much: the value is sqrt(n), and we removed only one edge, a tiny fraction of the max flow. But once we remove this edge and look at the electrical flow in the resulting graph, the electrical flow is much better behaved; it actually is the max flow of the resulting graph. Now the obvious question is whether we can turn this observation, made on this particular example, into an algorithmic technique, and as you can probably guess, the answer is yes. The way we do it is by changing our crude algorithm so that it self-enforces a smaller width rho'. The way we do this is very naive in some sense. Previously, given the weights, the crude algorithm computed the electrical flow corresponding to the resulting resistances and output this electrical flow. What it does now is: after computing the electrical flow, it looks at what it produced, and if there is an edge carrying more than rho' units of flow, it removes that edge -- and once removed, the edge is removed forever -- and tries the computation again in the graph without this edge. It repeats this procedure until it gets a flow that satisfies the constraint, or until something bad happens, namely it disconnects s and t. Clearly, by the definition of this crude algorithm, if it always successfully terminates, then its width is rho', because it made sure this is the case. But now the question is what value of rho' we should choose to make sure it always does successfully terminate, because we have two forces fighting each other here. Obviously we want rho' to be as small as possible, because the smaller it is, the fewer calls to the crude algorithm we make in our multiplicative-update routine; that's good. But the smaller you make rho', the more likely you are to remove edges, and each removal forces an additional computation of the flow inside the crude algorithm. So we need to balance these two things, and it turns out the way to do that is to set rho' to be roughly n^{1/3}. The one-sentence explanation of how you arrive at this conclusion is the following: if you remove an edge that carries a lot of flow in the electrical flow, then its removal increases the energy of the electrical flow significantly; but from the point of view of the max flow, the removal doesn't really matter much, since each single edge contributes little to the maximum flow. Once you work out what this significant increase of energy means and do the computation, it turns out that this is the setting that makes sure everything always works out the way you want. And this setting of rho' gives you the desired Õ(m * n^{1/3}) algorithm. Okay.
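[Editor's note: a minimal sketch of the modified crude algorithm, reusing electrical_flow from the earlier sketch. The resistance formula and the threshold rho_prime ~ n**(1/3) are simplifications of what the paper does; a real implementation would also detect s-t disconnection.]

    import numpy as np

    def width_limited_oracle(n, edges, s, t, F, rho_prime, eps):
        """Crude oracle that self-enforces width rho_prime by permanently
        deleting any edge whose electrical flow exceeds it."""
        alive = list(range(len(edges)))          # indices of surviving edges

        def oracle(w):
            while True:
                es = [edges[i] for i in alive]
                # mix a uniform component into the resistances
                rs = [w[i] + eps * w.sum() / len(edges) for i in alive]
                f = electrical_flow(n, es, rs, s, t, F)
                worst = int(np.argmax(np.abs(f)))
                if abs(f[worst]) <= rho_prime:   # width respected: done
                    full = np.zeros(len(edges))
                    for i, fi in zip(alive, f):
                        full[i] = fi
                    return full
                del alive[worst]                 # remove the edge forever

        return oracle

[Plugging the returned oracle into mwu_average_flow above, with rho = rho_prime, yields the overall scheme described in the talk.]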
So this is the way the algorithm works. Let me now talk a bit about future work: where do I plan to go from here? So we have this result for max flow, an Õ(m * n^{1/3}) algorithm. Note that I didn't tell you how to get this running time when m might be much bigger than n, that is, when the graph is not sparse; but it's not hard. The natural question here is whether we can make this approach give us a near-linear-time algorithm for max flow, and actually for exact max flow, not only approximate. Two questions immediately stem from this. The first question is whether, even if you look only at undirected graphs and approximation, you can make a variant of this algorithm run in near-linear time. This goes back to your earlier question: in particular, we have no example showing that anything breaks if you set rho' to be polylogarithmic. We have no such example; I don't really believe there is one. We just can't prove that this is the case. So one question is whether you can do better in the analysis and get a better bound on rho', because this would immediately give you a better running time. What I'm actually planning to do soon, hopefully, if I can't prove it, is to at least run experiments and see what happens, because I really don't believe that you need to set rho' that high. So this is the first question. The second question is, in some sense, even more important: whether we can make these ideas work for directed graphs. At first it might look crazy to hope for that, because the notion of electrical flows seems to be tied to undirected graphs. But as I mentioned, there is a reduction from the question for directed graphs to the question for undirected graphs. So in order to get an exact algorithm for the directed case, you don't even need to leave the world of undirected graphs: all you really need to do is design an approximation algorithm for undirected graphs whose dependence on the error is logarithmic instead of polynomial, and then we are done. So that's the second question I want to pursue. But that's only about max flow. As a more general theme, what I would really like to do in the future is look at all the basic problems of algorithmic graph theory and try to prove that we can solve them in near-linear time. What I mean is that I would like algorithms that are extremely fast -- essentially as fast as they can be -- and that still offer a good-quality solution. Maybe this quality doesn't need to be exactly the best that we know is possible to achieve in polynomial time, but it should be comparable. As an example of what I mean, we can focus our attention on the generalized sparsest cut problem, which I probably don't need to introduce here.
And it is known that, using our max flow result together with the framework of Sherman, one can get an O(sqrt(log n))-approximation to sparsest cut in essentially Õ(n^{4/3}) time. The significance of sqrt(log n) is that this is the best approximation we know how to achieve for sparsest cut in polynomial time at all, not only if you want to be fast: this is the ARV approximation guarantee, and we have no idea how to get anything better even if you allow, say, n to the hundredth time. So this is the fastest approximation algorithm for sparsest cut that we have --

>>: Sparsest [inaudible].

>> Aleksander Madry: [inaudible]. So if we just want to get exactly sqrt(log n), this is the fastest we can get. But what I showed is that if you are willing to cut yourself a little bit of slack -- namely, instead of a sqrt(log n) approximation guarantee you settle for just a polylogarithmic approximation guarantee -- then what you can get is a running time that is close to linear, essentially linear. And by the way, I should also mention that there is another very, very efficient algorithm for sparsest cut, namely spectral partitioning; but the problem with spectral partitioning is that if the conductance of the graph is very bad, then the approximation guarantee is very bad as well. It can be sqrt(n) in the worst case. That's why I didn't mention it here. Now, it turns out that the framework I introduced works not only for the sparsest cut problem but also for the generalized sparsest cut problem, which was not known to have any fast algorithm; the fastest known algorithm was Õ(n^2). Actually, it's even broader than that: it works for a lot of other graph partitioning problems. So the question I am trying to pursue here is what other classes of problems admit this kind of result, where you give up a little bit in the approximation ratio and get a very, very fast algorithm. Because if you want to think about working with these big graphs, as I said, the running time seems to be the hard constraint. So that is about the problems I want to pursue; now I want to say something about the tools that I think are interesting and should be pursued as well. The question I want to ask here is: where else can bridging linear algebra and combinatorial algorithms be useful? I say "where else" because we already know a lot of instances where this is useful. Our result is one example, and there is a more prominent one, namely the success story of the eigenvalue connection: when we understood how the second smallest eigenvalue relates to graph partitioning and to the understanding of random walks. This was a huge success of bridging linear algebra and combinatorial algorithms, and the question I'm after here is: can we take spectral graph theory beyond lambda_2, beyond the second smallest eigenvalue? When you look at it, the Laplacian seems to be a much richer object than just a matrix with a second smallest eigenvalue. It has a lot of other eigenvalues, and in particular it has electrical flows; in some sense it describes the electrical flows of the graph. So my question here is whether we can employ electrical flows for other combinatorial problems and make them helpful there. So that's it.
And let me just conclude with a statement that I think is very important to realize: in the last decade, our community developed really a lot of exciting tools, and I strongly feel that it is time for us to take these tools and re-examine all the classical problems we have been studying for a couple of decades now. I think these new tools will give us new insight, and I really think this pursuit will be very fruitful in the future. Thank you.

[applause]

>>: Questions or comments?

>>: The very last thing you mentioned: do you have some examples in mind?

>> Aleksander Madry: Of this?

>>: Which -- connecting algebraic graph theory -- where does it really work, what do you get? Do you see examples?

>> Aleksander Madry: Yes. Well, obviously, in the whole world of flow problems there are many more problems than just max flow, and that is already a direction implied by the max flow result. But I also think that for graph partitioning we could do something like spectral partitioning, only even better, by looking at electrical flows rather than at the eigenvector corresponding to the second smallest eigenvalue. As I said already, we can use maximum flow to do this partitioning, but maybe if we don't go through this reduction and instead try to get something directly, that might be useful. The other thing -- and this is hard for me to tell, because if I knew how to apply these tools I would be talking about those results right now -- is that, in general, I think these linear-algebraic tools are much better suited for obtaining local algorithms. What I was saying here is that we have huge graphs and we want algorithms that are as fast as possible, because we cannot really afford anything slower than linear. But sometimes even near-linear is not good enough. Sometimes, if you want to cut out a chunk that is well separated from the rest of the graph, what you might hope for is an algorithm whose running time is proportional not to the size of the whole graph, but only to the size of the piece you are taking out. These are called local algorithms, and Yuval also has some results in this direction -- probably the best ones currently, yes. And you can ask more questions of this kind, where the answer depends only on a very small piece of the graph and you would like the running time to be proportional to that piece: for instance, local maximum flow computation. The reason I bring this up here is that, at least to me, linear algebra seems better suited to tackle these kinds of questions; the combinatorial methods we have don't really seem to go in this direction, and I don't know how to do it with them. But I still think there is a much richer subject to be discovered here. I don't know what it is yet; I will try to figure it out at some point.

>>: Can anything be done with directed electrical flows?

>> Aleksander Madry: You can always define them -- you can always define the directed electrical flow as the directed flow that minimizes the energy. The thing is that the reason we introduced electrical flows is their unique feature that you can compute them [inaudible].
So I don't think there is an efficient way of computing directed electrical flows. If there were, you could use this framework and get results for directed graphs.

>>: Linear programming.

>> Aleksander Madry: The thing is that this seems to be a linear program, not a linear system. That sort of --

>>: Does it have enough structure that maybe --

>> Aleksander Madry: I don't know if there is; I don't know how to exploit it. When you really look into it, the reason we can solve electrical flows so fast seems to be extremely special: essentially, the reason we can do it is that this flow is a potential flow, and that is what underlies all of this. It seems like an extremely special structure, and it doesn't really seem applicable to directed graphs.

>>: You said that from your point of view you prefer the version where you minimize the energy; but is this only used in your proofs? Since you can find the flow using linear systems, this is just --

>> Aleksander Madry: We use this in two ways. First of all, we use it because we can do it fast. But -- I'm sorry -- the other thing is that when we do the bounding of the width, we use the fact that the energy is quadratic, not linear. For instance, you can think about using multiplicative weights to solve the maximum flow problem via L1 minimization, using a flow that minimizes the L1 norm, which is a shortest-path computation. The trouble there is that you don't have any a priori bound on the width: if the resistances are one and you do L1 minimization, it will put everything on the shortest path, so an edge can suffer congestion proportional to the value of the flow, and the value of the flow may be exponential in the size of the problem. So it seems there is more to it than just the fact that we can do it fast. Of course, within this same framework, if you were able to efficiently minimize, say, the L4 norm, then by exactly the same reasoning you would immediately get a better algorithm; but we don't know a way of doing that efficiently. So there are two things: first, we can do it efficiently; and second, we need this higher-than-one power to make the width bounding nontrivial, to make sure the flow spreads out and doesn't concentrate on one path.

>>: Okay. So Aleksander is here until tomorrow; more questions? Some of us are scheduled for meetings, but if someone would like to meet with him, we can still arrange something. So let's thank Aleksander again.

[applause]