>> Yuval Peres: Okay. So we're delighted to have Shayan tell us about approximating the expansion profile.

>> Shayan Gharan: Hello, everybody. Today I'm going to talk about approximating the expansion profile of a graph and about local graph clustering algorithms. This is joint work with Luca Trevisan. I'm going to start by motivating the problem and then I'll talk about the results. So today I will talk about clustering problems. Suppose we are given an undirected graph G that may, for example, represent friendships in a social network, and we want to partition it into good clusters. Okay. Throughout the talk I'm going to assume that the graph is d-regular, but the results generalize to non-regular graphs. I'm going to define this rigorously in a moment, but informally, by a good partitioning, a good clustering, I mean that a good cluster is highly connected inside and loosely connected to the outside.

So let me now define how we're going to measure the quality of a cluster. We're going to use expansion. For a set S, the expansion of S is defined as the ratio of the number of edges leaving S over the sum of the degrees of the vertices in S. Since we assume the graph is d-regular, the denominator is just d times the size of S. So, for example, here in this picture the expansion of this set S is one fifth. The expansion is always between 0 and 1, and a smaller expansion means a better cluster. You can also think of the expansion as the probability that a random walk started at a uniformly random vertex of S leaves it in one step. All right. Is this clear? This is the main definition throughout the talk. The expansion of the graph is defined as the worst expansion of any set in the graph; this is also called the sparsest cut of the graph.

The sparsest cut problem has been very well studied and there are several lines of algorithms, but let me talk about the Cheeger type inequalities, which are what interest us today. Cheeger type inequalities characterize the sparsest cut of a graph in terms of the second eigenvalue of the normalized adjacency matrix. Alon proved that phi of G, for any graph, is at least one half of 1 minus lambda 2 and at most the square root of 2 times 1 minus lambda 2. And in fact the proof is constructive, so you can actually find a set of this expansion. This means you can obtain an order of 1 over square root of phi approximation to the sparsest cut of the graph. Why is this important? Because this guarantee is independent of the size of the graph. Nowadays, since we have to deal with massive graphs, it would be very good to have approximation algorithms whose approximation factor is independent of the size of the graph. And, of course, this rounding algorithm is very simple and can be implemented in near linear time. All you need to do is compute the second eigenvector of the normalized adjacency matrix, embed the graph on the line based on the values of the vertices in this eigenvector, sweep this line from left to right, and find the best cut.

So in practice people use this algorithm to partition the graph, and they use it as a recursive procedure: first they find an approximation of the sparsest cut and divide the graph in two, then they recurse on each of the two sides until the graph is partitioned into good clusters.
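As a concrete illustration of the expansion definition and of the sweep rounding just described, here is a small sketch for a d-regular graph given as a 0/1 adjacency matrix. The function names are my own, and for readability it recomputes the expansion of every prefix from scratch rather than maintaining the cut incrementally as a near-linear-time implementation would.

```python
import numpy as np

def expansion(adj, S, d):
    """phi(S) = (# edges leaving S) / (d * |S|) for a d-regular graph."""
    mask = np.zeros(adj.shape[0], dtype=bool)
    mask[list(S)] = True
    return adj[mask][:, ~mask].sum() / (d * mask.sum())

def spectral_sweep(adj, d):
    """Cheeger rounding: embed the vertices on the line using the eigenvector of
    the second largest eigenvalue of the normalized adjacency matrix A/d, sweep
    from left to right, and return the best prefix cut (smaller side)."""
    n = adj.shape[0]
    _, vecs = np.linalg.eigh(adj / d)    # eigenvalues in ascending order, so ...
    v2 = vecs[:, -2]                     # ... column -2 is the second eigenvector
    order = np.argsort(v2)               # the line embedding, left to right
    best_S, best_phi = None, np.inf
    for k in range(1, n):
        S = order[:k] if k <= n - k else order[k:]   # take the smaller side
        phi = expansion(adj, S, d)
        if phi < best_phi:
            best_S, best_phi = set(S.tolist()), phi
    return best_S, best_phi
```

By the Cheeger inequality quoted above, the cut returned by such a sweep has expansion at most the square root of 2 times (1 minus lambda 2), which is within an O(1 over square root of phi of G) factor of the sparsest cut.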
But this may not do well in practice, for two reasons. The first is that although this algorithm is near linear time, you may end up spending a quadratic amount of time running it, because there is no guarantee on the size of the output set. It could be that after running for near linear time you just end up separating three or four vertices from the rest of the graph, so you may have to run this algorithm order of n times to partition the graph. And, on the other hand, it could be that many of the small communities are misclassified by this algorithm, because, for example, once you separate a giant component, you may end up misclassifying a small community: which side of the cut you put it on barely changes the value of the cut you're selecting.

So in order to overcome these deficiencies, Spielman and Teng suggested the following interesting problem, called the local graph clustering problem. Suppose we are given a vertex u of interest in a massive graph, and we want to find a non-expanding cluster around u in time proportional to the size of that cluster. So, for example, think about the Facebook graph, the whole massive graph, and say you want to find some particular community as a cluster in this graph, and we are given, for example, the Facebook page of Yuval. All we can do is go to the Facebook page of Yuval and look at his friends, then go to the Facebook pages of his friends, and so on. But we are not allowed to jump around in this network. So basically the question is: how should we explore the neighborhood of u to find this cluster?

And before talking about the results, note that if we can solve this problem, we get rid of both deficiencies. We can cluster the whole graph by running this algorithm over and over, and we know that every time the amount of time spent is proportional to the size of the cluster that we're going to get, so the running time remains near linear. And, on the other hand, we know that if we start from a small community, then hopefully we're going to find something very close to it.

Okay. So Spielman and Teng answered this question and proved the following nice guarantees. I should also mention that they use this algorithm in their near linear time Laplacian solver. They prove the following: for any target set A, if we are given a random vertex u inside A, then the algorithm finds a set S mostly in A with the following guarantees. The expansion of S is at most the square root of phi of A times log cubed of n, and the running time is of the order of the size of the output times 1 over phi of A squared, times poly log n. We call this 1 over phi of A squared poly log n the work to volume ratio; it is basically the ratio of the running time of the algorithm to the size of the output. So ideally, if this ratio is constant, then by applying this algorithm we can have a near linear time partitioning of the graph.

>>: Phi here is phi of A?

>> Shayan Gharan: Yes, phi is phi of A.

>>: Mostly in A means epsilon?

>> Shayan Gharan: Yes, mostly in A means a 1 minus epsilon fraction, for some constant epsilon. All right. And also this algorithm is randomized, so they can do this with some constant probability. In fact, you cannot prove that this is true for every vertex in A; they can prove it for half of the vertices in A.

So this result has been improved. Andersen, Chung and Lang improved the approximation guarantee to the square root of phi of A poly log n.
And the work per volume ratio to 1 over phi of A poly log n. And more recently Andersen and Peres improved the work per volume ratio to 1 over the square root of phi of A poly log n. The underlying idea of all these results is to run a random walk from u, but they use very different techniques, as you can see here: Spielman and Teng used truncated random walks, Andersen, Chung and Lang used approximate PageRank vectors, and Andersen and Peres used the evolving set process. Today I will talk about the evolving set process; we will use that as well.

Okay. So what do I want to show you from this table? The thing is, if you look at the approximation guarantee of all these algorithms, unlike the Cheeger inequality, here there is a dependency on the size of the graph. So it has been an open problem whether one can find a local graph clustering algorithm without any dependency on the size of the graph. That would essentially give you a local variant of the Cheeger inequality, because it would give you the same guarantee as the Cheeger inequality. This is basically the subject of this talk, and we kind of solved this problem. We prove the following: for any target set A, if we're given a random vertex u in A, then we can find a set S such that the expansion of S is at most the square root of the expansion of A over epsilon, and the work per volume ratio is at most the volume of A to the epsilon, times 1 over the square root of phi of A, times poly log n. So this indeed provides a local variant of the Cheeger inequality, because both the approximation guarantee and the running time are independent of the size of the graph; they just depend on the size of the cluster that you're trying to find.

>>: Can you explain --

>> Shayan Gharan: Yes, up to this poly log. Yes, I will. One more thing is that this algorithm generalizes the previous results, in the sense that here you can choose epsilon however you want. If you choose epsilon to be 1 over the log of the volume of A, we can replicate the previous result. In fact, there is a trade-off between the running time and the approximation guarantee: by choosing larger values of epsilon you obtain a better approximation guarantee with a worse running time, and by choosing smaller values of epsilon you get a better running time and a worse approximation guarantee. Any questions? Good.

So before talking about the proofs or the algorithm, I want to talk about an implication of this result which is interesting to the theory community. There is a close connection between this problem and the small set expansion problem. Let me first define the expansion profile of the graph. For any parameter mu between 0 and n over 2, phi of mu is defined as the worst expansion of any set of size at most mu. So, for example, phi of n over 2 is just the sparsest cut of the graph. The small set expansion problem asks whether you can find an approximation algorithm for phi of mu whose guarantee is independent of the size of the graph. By the Cheeger inequality, we know we can do this for mu equal to n over 2. So basically the question is whether you can do that for every value of mu.

>>: For mu which is some constant times the size, for any constant? Because mu depends on this, so I'm not sure --

>> Shayan Gharan: For every mu. It cannot depend on mu.

>>: On mu at once?

>> Shayan Gharan: Hmm?

>>: Mu at once.

>> Shayan Gharan: The mu is given as input, and I give you an approximation algorithm.

>>: On n.

>> Shayan Gharan: Yeah, it could depend on n.

>>: If it's n, it could depend on G. n is the size of G, right?
>> Shayan Gharan: Mu may depend on the size of G, but I want the approximation guarantee to not depend on the size of G. So the approximation guarantee cannot depend on mu, for example. It may depend on phi of mu, but not on mu itself. Phi of mu is always between 0 and 1, so the guarantee may depend on phi of mu; it could be, for example, some crazy function of phi of mu.

So why is this problem of interest to us? First of all, Raghavendra and Steurer conjectured that this is an NP hard problem. They conjecture that for every value of rho there exists some delta, much, much smaller than rho, such that it is NP hard to distinguish whether phi of delta n is close to 1 or phi of delta n is close to 0. And, more importantly, they proved that this conjecture implies the unique games conjecture. So, in other words, if you want to design an algorithm for unique games you should start from the small set expansion problem, because it is an easier problem than unique games: if you design a polynomial time algorithm for unique games, you obtain a polynomial time algorithm for the small set expansion problem.

>>: So, in other words, you're using these conjectures?

>> Shayan Gharan: Yeah, in other words, if you want to refute this, you should start by refuting this.

>>: Because if you prove it --

>> Shayan Gharan: No, no, not for the proof. It could be that you can design an algorithm for this one but not for unique games.

Okay. So our result as a corollary implies an approximation algorithm for this problem. We can prove that for any value of mu and epsilon, we can find a set of size at most mu to the 1 plus epsilon and expansion at most the square root of phi of mu over epsilon. So this does not refute the conjecture, because the size of the set that we find is larger than mu. If you could prove, for example, that the size of S is within a constant factor of mu, then you would have refuted the conjecture.

>>: What's the running time, though?

>> Shayan Gharan: The running time, again, if I have a vertex of the target set, is going to be proportional to the size of the set. It's basically a corollary of the local algorithm. As I said previously, our local graph clustering algorithm can ensure that the output set has a large intersection with the target set; in fact, a mu to the minus epsilon fraction of the output set lies in the target set.

>>: The definition of phi of mu was global --

>>: What?

>>: In your algorithm you're supposed to start with the target set?

>> Shayan Gharan: No. We can run our algorithm from every vertex in the graph and just take the best set we find; we don't need to start from the target set. I mean, then the running time would be worse. Yeah, it depends; here we don't want to optimize the running time, we just want to find an approximation to the problem.

>>: These are of different natures; the issue is not locality. The algorithm will start with a vertex, [inaudible].

>> Shayan Gharan: So the idea is that if you have a local algorithm, there is a good hope that you can solve this problem, because a local algorithm starts from the neighborhood of a vertex and then expands over all of the vertices. So you may hope that along the way you find a small non-expanding set. That is why these two problems are related to each other.
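Collecting the definitions and guarantees from the last few paragraphs in one place, as I transcribed them from the talk; the constants and the exact polylog factors are my own hedging, not the paper's precise statement:

```latex
% Expansion of a set and the expansion profile (d-regular graph G):
\[
\phi(S) \;=\; \frac{|E(S,\overline S)|}{d\,|S|}, \qquad
\phi(\mu) \;=\; \min_{0 < |S| \le \mu} \phi(S), \qquad \phi(G) \;=\; \phi(n/2).
\]
% Local clustering guarantee (given a random vertex of a target set A):
\[
\phi(S) \;\le\; O\!\left(\sqrt{\tfrac{\phi(A)}{\varepsilon}}\right), \qquad
\frac{\text{work}}{\operatorname{vol}(S)} \;\le\;
\operatorname{vol}(A)^{\varepsilon}\,\sqrt{\tfrac{1}{\phi(A)}}\;\operatorname{polylog}.
\]
% Small set expansion corollary (for any mu and epsilon):
\[
|S| \;\le\; \mu^{1+\varepsilon}, \qquad
\phi(S) \;\le\; O\!\left(\sqrt{\tfrac{\phi(\mu)}{\varepsilon}}\right).
\]
```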
For example, it could be -- we cannot prove it -- that if you get rid of the volume of A to the epsilon factor, you could solve the small set expansion problem. Okay. This result was also proved independently by Kwok and Lau using different techniques. So if there's no question, let me go on to talk about the algorithm and the analysis.

>>: Can you go back? You said it's still possible, even there, that the size blow-up to the epsilon could be removed?

>> Shayan Gharan: Yeah, it is possible. And in fact even the algorithm that I'm going to present may do that, but we cannot prove it. The current analysis cannot prove it, and we don't have any counterexample for it. It could be that it refutes this kind of conjecture; we don't know.

Okay. So in this talk I'll put the main focus on this theorem. I'm not going to go through all the details of the running time; I'm going to show you how we can approximate the small set expansion problem, and I'll also give you some ideas about the running time.

Okay. So let me now talk about the algorithm for a couple of minutes -- ten minutes, I think. We use the same machinery, the evolving set process, that Andersen and Peres used in their work. I will define the evolving set process in a minute, but, as I will show you later, the algorithm is again very simple: we just run this process and we return the best set that we find. So what is the process? The process is a Markov chain on subsets of the vertices of the graph. Let me tell you how we define this Markov chain. For a set S, we first choose a random threshold U uniformly from 0 to 1, and the next set includes all the vertices v for which the probability of going from v into S in one step of the lazy random walk is more than U. Right. So, for example, if U is less than one half, then the new set will always be a superset of the old one. Why? Because this is a lazy random walk, so the probability of going from every vertex inside S into S is more than one half. And if U is more than one half, the new set will be a subset of the old one.

So let me show you an example; it will be clearer. Let's say we run this process on this cycle, and we start from this single vertex. Say the first threshold is .2. So we include the vertices that have probability more than .2 of going to that single vertex; these neighbors have probability one quarter. So we include these two vertices. Then the next threshold is .7, which is not going to change the set.

>>: Perhaps [inaudible].

>> Shayan Gharan: Well, it will have probability one.

>>: One?

>> Shayan Gharan: For staying inside.

>>: No, before, when the threshold was .2.

>>: [inaudible].

>> Shayan Gharan: So it's above the threshold.

>>: Sorry.

>> Shayan Gharan: So then the next one is .1; you grow again. And at .8 now you shrink, because these guys have probability only one quarter of going into S. So you decrease the set, and you keep decreasing until you reach the empty set. So it's not hard to see that this process has two absorbing states: either the empty set or the full vertex set. Since we want to run this process for a sufficiently long time to find a good set, we would like the process to get absorbed in the full set. So in order to avoid getting absorbed in the empty set, we condition the process to get absorbed in the full set, and we call this new process the volume biased evolving set process.

>>: How do you choose the thresholds?

>> Shayan Gharan: Uniformly at random between 0 and 1.
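Here is a small sketch of the plain process as just described, for a d-regular graph given by a 0/1 adjacency matrix. The function names and the choice of ">= U" rather than "> U" at the threshold are my own, not from the paper.

```python
import numpy as np

def lazy_walk_matrix(adj, d):
    """Lazy random walk on a d-regular graph: stay put with probability 1/2,
    otherwise move to a uniformly random neighbour."""
    n = adj.shape[0]
    return 0.5 * np.eye(n) + 0.5 * adj / d

def evolving_set_step(P, S, rng):
    """One step of the (plain) evolving set process: draw a threshold
    U ~ Uniform(0, 1) and keep every vertex v with P(v, S) >= U."""
    indicator = np.zeros(P.shape[0])
    indicator[list(S)] = 1.0
    p_into_S = P @ indicator                 # P(v, S) for every vertex v
    U = rng.uniform(0.0, 1.0)
    return frozenset(np.flatnonzero(p_into_S >= U).tolist())

# Example: the process on a 6-cycle, started from a single vertex.
if __name__ == "__main__":
    n = 6
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    P = lazy_walk_matrix(adj, d=2)
    rng = np.random.default_rng(0)
    S = frozenset([0])
    while 0 < len(S) < n:                    # until absorbed (empty or full set)
        S = evolving_set_step(P, S, rng)
        print(sorted(S))
```

If U is at most one half the new set contains the old one, since every vertex of S stays put with probability at least one half; in this toy run you can watch the set grow and shrink until it is absorbed in the empty or full set.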
So here is the Markov kernel for the volume biased evolving set process. We just need to multiply the original Markov chain kernel by the ratio of the volume of the new set over the volume of the old set. That this is a valid kernel follows simply from the fact that the size of the set is a martingale in the evolving set process; the volume biased process is its Doob h-transform. So this is indeed a Markov chain. For the rest of the talk, whenever I use this hat notation it means the volume biased process, and whenever I don't, it's the original one.

>>: This probability would never be [inaudible]; this one could be --

>> Shayan Gharan: It could be large.

>>: Larger than one?

>> Shayan Gharan: The new set could be larger than S, but the probability would never be larger than one; this follows from the Doob h-transform. And, for example, you can see that this process will never get absorbed in the empty set, because if S1 is empty this ratio is going to be 0. So for sure it never gets absorbed in the empty set, only in the full set. And it's also not hard to see that this is indeed a Markov chain, so these are genuine transition probabilities.

Okay. So let me tell you about the close connection between this process and the random walk. In fact, Diaconis and Fill first studied this process in order to bound the mixing time of random walks. In their paper they prove a very nice coupling between this process and the lazy random walk. They define a coupling (X_T, S_T) with the following properties. First of all, X_T is always inside S_T, so the random walk is always inside the evolving set. And moreover, conditioned on S_T being some particular set S, the random walk is uniformly distributed in that set. So, as an example, note that once the process is absorbed in the full set, the random walk is mixed. What this means is that if you want to bound the mixing time of the random walk, you can just run this process and bound the absorption time of this process.

>>: Upper bound.

>> Shayan Gharan: Yeah, an upper bound, by the absorption time of this process.

>>: [inaudible].

>> Shayan Gharan: No. Also, Andersen and Peres in their paper used this coupling to prove that one can simulate the evolving set process efficiently. They prove that any sample path S0, S1, up to ST can be simulated in time essentially proportional to the size of ST -- the size of ST times root T poly log n. Roughly, what they do is look at the random walk vertex, do one step of the random walk, and then condition the new set on containing this new vertex and see how the set should change. So the upshot is that you can run this process for a sufficiently long time without worrying about the running time. We just need to prove that the process is good, that it's going to give us a small non-expanding set; there's no problem with the running time.

So, coming back to the algorithm, that's exactly what we do. For the small set expansion problem we just run this process from every vertex of the graph, we run each copy for a sufficiently long time, at least of the order of log of mu over phi of mu steps, and if any of the copies finds a small non-expanding set, we just return it and we're done. Very simple: we run the process from each vertex and return the best set we find.

Okay. So let me now talk about the analysis. I'm going to show you that there exists a vertex v such that if you run this process from v, then with some non-zero probability you're going to find a small non-expanding set.
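Before the analysis, here is the algorithm just described as a toy, direct simulation. The volume-biased kernel weights each outcome S' of a plain step by vol(S') over vol(S), which on a small graph can be realized by rejection sampling; that rejection step, the function names, and the stopping rule are my own choices and not the efficient Andersen-Peres simulation used in the paper.

```python
import numpy as np

def expansion(adj, S, d):
    """phi(S) = (# edges leaving S) / (d |S|), as defined at the start of the talk."""
    mask = np.zeros(adj.shape[0], dtype=bool)
    mask[list(S)] = True
    return adj[mask][:, ~mask].sum() / (d * mask.sum())

def esp_step(P, S, U):
    """Plain evolving-set step for a given threshold U: keep v with P(v, S) >= U."""
    ind = np.zeros(P.shape[0])
    ind[list(S)] = 1.0
    return frozenset(np.flatnonzero(P @ ind >= U).tolist())

def volume_biased_step(P, S, rng):
    """One step of the volume-biased process by rejection sampling: the biased
    kernel weights each outcome S' by vol(S')/vol(S), so accept a plain step
    with probability |S'| / n (proportional to vol(S') on a regular graph)."""
    n = P.shape[0]
    while True:
        S_new = esp_step(P, S, rng.uniform(0.0, 1.0))
        if rng.uniform(0.0, 1.0) < len(S_new) / n:
            return S_new

def local_cluster(adj, d, v, T, seed=0):
    """Toy version of the algorithm in the talk: run the volume-biased evolving
    set process from vertex v for T steps and return the lowest-expansion set
    seen along the sample path."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    P = 0.5 * np.eye(n) + 0.5 * adj / d      # lazy random walk
    S = frozenset([v])
    best, best_phi = S, expansion(adj, S, d)
    for _ in range(T):
        S = volume_biased_step(P, S, rng)
        if len(S) < n:                       # the full set has expansion 0, skip it
            phi = expansion(adj, S, d)
            if phi < best_phi:
                best, best_phi = S, phi
    return best, best_phi
```

The analysis below says, roughly, that for the right starting vertex and T of the order of epsilon log of the volume of A over phi of A, the best set along the path has expansion about the square root of phi of A over epsilon; the toy version above is only meant to make the process itself concrete.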
So before going into the mathematical notation, let me give you a high level overview of the proof. The proof uses the following two observations. The first is that the evolving set process grows rapidly on expanding sets. The second is that the process kind of gets trapped in non-expanding sets; it cannot leave them very easily. Putting these together, what you would like to say is: if this is the target set, the process grows very fast until it hits my target set, and then it has to spend some time there, so I should be able to find this target set.

So, for example, think about this dumbbell graph: two expanders connected by an edge, and we start the process from this vertex. You can get some intuition by looking at the random walk, since the process behaves much like the random walk. We know that the random walk, in the first few steps, very rapidly mixes in this expander. The process kind of does the same thing: it very rapidly expands and covers the whole left set. And then, for quite a large amount of time, it would just add a few vertices; it stays very close to A, and we should be able to find it.

>>: You were going over the edge --

>> Shayan Gharan: Yeah.

>>: So once you get someone over there -- there's a difference between the random walk and the process. I agree the random walk may take a lot of time to pass this vertex, but the process may grow and then it would decrease.

>>: Why won't it then expand into the other set?

>> Shayan Gharan: The reason is that from this point of view, at this time, you can think of it as running the evolving set process from a single vertex on the right. But if you run the evolving set process from a single vertex, with high probability it dies out; from this vertex's point of view the process isn't volume biased anymore, because it always has this large mass on the left. So, for example, if I include this vertex's neighbors, the size barely changes. So it's the same as running the non-volume-biased process from this vertex, and with high probability you're going to --

>>: The fraction of its neighbors [inaudible] you said is very small. So it only grows if, next time, the uniform threshold happens to be below this very small ratio.

>>: You're at a point where there's a chance of stepping into the set.

>>: Above the threshold, right?

>> Shayan Gharan: Yes.

>>: It includes a fraction of its neighbors in the current set [inaudible].

>>: Right. So this vertex is very unlikely to be in the set, in the [inaudible].

>>: It's tough, because it's --

>>: That and the objects, to say where, you mean.

>>: I was thinking of its neighbors.

>>: [inaudible] it's an exit.

>>: Right. That's probably enough.

>> Shayan Gharan: If you run it a couple of times, then it wouldn't do this.

>>: But it won't grow. For those guys' sake --

>>: I don't see why the neighbors won't go into this.

>>: For each neighbor, for it to go in is very unlikely.

>> Shayan Gharan: I think the right way to think about it is this: suppose I just ran the evolving set process, not the volume biased one, from a single vertex. Then the size of the process is a martingale, so with probability 1 minus 1 over n the process gets absorbed in the empty set, and only with probability 1 over n does it reach the whole set.
My claim is that once you get here, this new process, from the point of view of the right expander, behaves just as if you ran a non-volume-biased process, because you have a giant component on the left, so this volume ratio is always essentially one.

Okay. So let me now tell you how we make this quantitative. We're going to prove the following two statements, and I'm going to show you that they are sufficient for the proof. The first one is the following: for any time T, if you start the process from any vertex v, then with high probability the minimum of the expansion squared over all the sets in the sample path is at most of the order of log of the volume of S_T divided by T. So, in other words, if these expansions are all large, then you should have a large set by this time. Again, the theme is that if you're going through expanding sets, your sets should grow really fast. The second statement says that, for quite a long time, the set cannot grow very fast; it should remain small. Here we're going to show that if you choose T fairly large, something like epsilon times log of mu over phi of mu, then with reasonable probability all of the sets in the sample path have volume at most mu to the 1 plus epsilon.

Okay. So these two basically work against each other. One says that either you find a good non-expanding set along the way, and you're done, or your sets grow very fast. But the other says the sets cannot grow very fast, because the process gets trapped near the target non-expanding set. So with some probability the process grows slowly, and you should be able to find a set of small expansion.

So let me tell you why this proves the theorem. It's very easy. Just plug alpha equal to mu into the first statement. Then by a union bound both of the statements hold together with some probability like mu to the minus epsilon over 2. And what do you get? From the second one, all the sets are small: they have volume at most mu to the 1 plus epsilon. From the first one, there exists a set in the path with expansion squared at most of the order of log of mu to the 1 plus epsilon over T. Plugging in T equal to epsilon log of mu over phi of mu, the whole thing is at most of the order of phi of mu over epsilon, so some set in the path has expansion at most the square root of phi of mu over epsilon.

Okay. So now let me tell you how we prove each of the statements. I'm going to start with the first one. First I'm going to show you that the process grows rapidly in one step. This is an observation due to Morris and Peres. They show that for any set S, the expectation of the square root of the change of the volume in one step is at most 1 minus phi of S squared, up to a constant. So, for example, think of phi as being a constant; this means that in one step your new set is a constant factor larger than the old one, in expectation.

>>: [inaudible].

>> Shayan Gharan: Hmm. We should have that.

>>: With respect to which measure --

>> Shayan Gharan: Yeah, so this is with respect to the volume biased process. You can write it in terms of the non-volume-biased process by multiplying by the ratio of the new volume over the old one, and you get this. And now this is not hard to prove: the proof follows from the fact that the size of the set in the evolving set process is a martingale. The reason is that if your threshold is below one half, your new set is about 1 plus phi times the old one in expectation, and if it is above one half it is about 1 minus phi times the old one in expectation. This follows for the same reason that the probability that the random walk remains inside the set S is 1 minus phi.
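For reference, here are the one-step lemma and the two statements as I understood them from the talk; the constants and the exact high-probability quantifiers are hedged, and the hat denotes expectation under the volume-biased process.

```latex
% One-step growth lemma (Morris-Peres style), for a universal constant c > 0:
\[
\widehat{\mathbf{E}}\!\left[\sqrt{\tfrac{\operatorname{vol}(S_{t})}{\operatorname{vol}(S_{t+1})}}\;\middle|\;S_t\right]
\;=\;
\mathbf{E}\!\left[\sqrt{\tfrac{\operatorname{vol}(S_{t+1})}{\operatorname{vol}(S_{t})}}\;\middle|\;S_t\right]
\;\le\; 1 - c\,\phi(S_t)^2 .
\]
% Statement 1 (growth): for every start vertex and every time T, with high probability
\[
\min_{t \le T} \phi(S_t)^2 \;\le\; O\!\left(\frac{\log \operatorname{vol}(S_T)}{T}\right).
\]
% Statement 2 (trapping): for T of order \varepsilon \log(\mu)/\phi(\mu), with
% probability roughly \mu^{-\varepsilon}, every set in the sample path satisfies
\[
\operatorname{vol}(S_t) \;\le\; \mu^{1+\varepsilon}, \qquad t = 0, \dots, T .
\]
```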
So, having this in hand, you can prove the one-step bound using Jensen's inequality. All right. So now, how can we use this to prove that the set grows rapidly overall? Let me call this ratio psi of S; it is a quantity that depends only on S. Now, the idea is to define a martingale. We define N_T as 1 over the square root of the volume of S_T, times the product of 1 over psi of S_0 up to 1 over psi of S_{T-1}. It is not hard to see that this is a martingale: the expectation of N_T given S_0 up to S_{T-1} can be written as N_{T-1} times the conditional expectation of the square root of the volume of S_{T-1} over the volume of S_T, times 1 over psi of S_{T-1}; and that conditional expectation is exactly psi of S_{T-1}, so the whole thing is just N_{T-1}. It's easy to check; don't worry about it. So in expectation N_T is 1, after normalizing by the starting volume, and by an application of Markov's inequality it is at most alpha with probability at least 1 minus 1 over alpha. Plugging in the values, we get the following: with probability 1 minus 1 over alpha, the product of the terms 1 over 1 minus phi of S_t squared is at most alpha times the square root of the volume of S_T. And if you just take logarithms of both sides, you get the first statement very easily.

So let me now talk about the second part of the proof. Here we want to show that the set gets trapped in non-expanding sets. Instead of proving that all the sets are small, let's just prove that the last one is small: because the process is growing, if the last one is small, essentially all of them should be small. So how are we going to prove that the last one is small? I'm going to prove that, in the sample path, the volume of S_T is at most mu to the 1 plus epsilon with probability at least about mu to the minus epsilon, for some very large T. Here the idea is to use the coupling between the process and the random walk. My claim is that this is equivalent to the following: I just need to prove that there exist some set A and some vertex v such that the random walk started from v is inside A after T steps with some non-zero probability, about mu to the minus epsilon. Why is it equivalent? Just use the coupling. Recall what the coupling says: conditioned on the evolving set process being equal to some particular set, the random walk is uniformly distributed on that set. So the probability that the random walk is in some set A is equal to the expected fraction of S_T that lies in A: to go from the evolving set process to the random walk, I take the set distribution, put the uniform distribution on the current set, project it onto A, and take the average. Now, the left-hand side is at least mu to the minus epsilon, and the right-hand side is at most the expectation of the ratio of the volume of A over the volume of S_T. So, because I take A to have volume at most mu, with some probability the volume of S_T is at most mu to the epsilon times the volume of A, which is at most mu to the 1 plus epsilon.
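In symbols, the coupling step just described goes like this; this is my transcription, the expectation is over the volume-biased process we actually run, and the factor of 2 is only there to make the averaging argument explicit.

```latex
% Diaconis-Fill coupling: conditioned on S_T, the walk X_T is uniform on S_T, so
\[
\Pr[X_T \in A] \;=\;
\widehat{\mathbf{E}}\!\left[\frac{\operatorname{vol}(S_T \cap A)}{\operatorname{vol}(S_T)}\right]
\;\le\;
\widehat{\mathbf{E}}\!\left[\frac{\operatorname{vol}(A)}{\operatorname{vol}(S_T)}\right].
\]
% If the left-hand side is at least \mu^{-\varepsilon}, then, since the random
% variable vol(A)/vol(S_T) is at most 1, it is at least \mu^{-\varepsilon}/2 with
% probability at least \mu^{-\varepsilon}/2, and on that event
\[
\operatorname{vol}(S_T) \;\le\; 2\,\mu^{\varepsilon}\operatorname{vol}(A) \;\le\; 2\,\mu^{1+\varepsilon}.
\]
```

So it remains to lower bound the probability that the walk started inside A is still in A at time T, which is exactly the next step.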
So I just need to prove this inequality, and now we can forget about the evolving set process; it's just about the random walk. I just need to prove that there exists some set A for which this holds. You might guess that the set A will be my target set: I want to prove that for my target set there exists a vertex such that the walk is in that set after T steps with probability at least mu to the minus epsilon. I'm going to prove something stronger: instead of proving that the walk is in A at time T, I'm going to prove that the walk never leaves A up to time T, and I'm going to prove it with this probability. But these two are essentially the same: if you take T to be epsilon log of mu over phi of mu, the two bounds are exactly the same.

So we just want to prove a property of the random walk: for every set A there exists a vertex v such that the random walk from v never leaves A with probability at least 1 minus phi of A, to the power T. And, again, I'm going to make it even stronger: I'm going to prove that if you choose v uniformly at random from A, the walk never leaves A with this probability. Now this should recall the definition of phi of A. What was the definition? Phi of A is the probability that the random walk leaves A in one step, so 1 minus phi of A is the probability that the random walk remains inside A, starting from a uniformly random vertex of A. So intuitively this should hold, because if the events that the walk leaves the set A at each time step were independent of each other, you would get exactly 1 minus phi of A to the T; we want to say it's even better than that.

Here is an extreme example where you get equality. Remember that each vertex in A has on average d times phi of A edges going outside. Now let's say that is the case for all vertices: every vertex has exactly this many edges going out. Then if you do one step of the random walk, the distribution, restricted to A, remains uniform inside A, because everybody had the same number of edges going out.

>>: The distribution conditioned on staying in A?

>>: Staying in A.

>> Shayan Gharan: Yeah. You do one step of the random walk and look at the distribution restricted to A: it remains uniform.

>>: So here you're -- say it again, what are you saying?

>> Shayan Gharan: I'm saying: start from the uniform distribution on A and do one step of the random walk. From each vertex exactly the same amount of mass goes outside, so if you project the probability back onto the set A, you get the uniform distribution again -- conditioned on being in the set, the conditional probability is uniform.

>>: Some vertices might have more?

>> Shayan Gharan: Not here; yeah, exactly.

>>: When you say --

>>: In this case.

>> Shayan Gharan: In this case.

>>: In general, going forward?

>> Shayan Gharan: In general, some vertices might have more. What we want to say is that, intuitively, if some vertices have more edges going to the outside, then the probability of being on them should only decrease, because if you are on those vertices you leave the set faster. So after some number of steps your probability mass is more concentrated on the vertices with fewer edges going out, so you should have a higher probability of remaining inside.

So how can we prove this? It's again very simple. Let P be the transition matrix of the lazy random walk. Then the claim is equivalent to the following; let me parse it for you. u of A is the uniform distribution on A: you start from the uniform distribution on A, do one step of the random walk, and project it back. I of A is the identity matrix restricted to the set A.
You project the walk back onto the set, do one more step, and so on up to time T, and then add up the probability mass; that is exactly the left-hand side. The right-hand side is the same expression with T equal to 1, raised to the power T, because 1 minus phi of A is exactly the probability that one step of the walk stays inside A. So we just have to prove this inequality. And I can reduce it even further. Let x be the square root of the uniform distribution, entrywise, and define the matrix Q to be I of A times P times I of A. Then the left-hand side is just x transpose Q to the T x; I can split the uniform distribution as square root of x times square root of x.

>>: All right.

>> Shayan Gharan: And the right-hand side is x transpose Q x, to the power T. So we just want to prove this. And this is also simple: you just need to diagonalize Q and use the fact that it is a positive semidefinite matrix, so you can write x transpose Q x --

>>: Is the matrix symmetric?

>> Shayan Gharan: P times I of A is not symmetric, but I can make it symmetric by putting an I of A on the other side. So write x transpose Q to the T x as the summation over i of lambda i to the T times x transpose v i squared, where v 1 up to v n are the eigenvectors of Q and lambda 1 up to lambda n the eigenvalues. The right-hand side is the summation of lambda i times x transpose v i squared, the whole thing to the power T. So you just want to prove the first is at least the second, and this is again simple, because it is just Jensen's inequality: the eigenvalues are non-negative, and the coefficients x transpose v i squared add up to one because x is a unit vector. This is Jensen's inequality.

All right. So I'm almost done -- thank you. So let me finish the talk. Again, we proved that for any mu and epsilon the algorithm can find a set of size at most mu to the 1 plus epsilon and expansion at most the square root of phi of mu over epsilon. This is the first approximation algorithm for the expansion profile without a loss in the expansion. Previously there have been many works, including by people in this room, that give an approximation algorithm preserving the size of the target set but losing in the expansion.

>>: The log [inaudible]?

>> Shayan Gharan: This one? Yeah. No, it's log of mu over -- log of n.

>>: Log n squared?

>> Shayan Gharan: No, this log n is not in the denominator.

>>: Squared n?

>> Shayan Gharan: Yeah. Yeah.

>>: Isn't that case [inaudible]?

>> Shayan Gharan: A bracket here.

And also we proved that a Cheeger cut can be computed in time almost linear in the size of the target set, and that gives a local variant of the Cheeger inequality.

So let me give you some open problems and finish the talk. Perhaps the main remaining open problem is whether one can find an approximation algorithm for phi of mu that is independent of the size of the graph. In particular, here is a nice question: can you prove that, with some inverse polynomial probability in mu, all of the sets in the sample path of the volume biased evolving set process are of order mu? Here we proved that they are at most mu to the 1 plus epsilon, and that is where the size mu to the 1 plus epsilon of the output set comes from. If you could prove order mu, you would refute the small set expansion conjecture. Another open problem: the previous works on this problem all use semidefinite programming relaxations, not random walks, and it would be interesting if one could replicate our results using a semidefinite programming relaxation.
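For completeness, here is the spectral step from a moment ago written out in one chain, as I reconstructed it; the factor of 1/2 coming from laziness is glossed over here, as it is in the talk.

```latex
% x = entrywise square root of the uniform distribution on A (a unit vector
% supported on A); Q = I_A P I_A is symmetric and positive semidefinite for the
% lazy walk, with eigenvalues \lambda_i \ge 0 and orthonormal eigenvectors v_i.
\[
\Pr_{X_0 \sim u_A}\big[X_1,\dots,X_T \in A\big]
\;=\; x^{\top} Q^{T} x
\;=\; \sum_i \lambda_i^{T}\,\langle x, v_i\rangle^2
\;\ge\; \Big(\sum_i \lambda_i\,\langle x, v_i\rangle^2\Big)^{\!T}
\;=\; \big(x^{\top} Q x\big)^{T},
\]
% by Jensen's inequality, since t \mapsto t^T is convex and the weights
% \langle x, v_i\rangle^2 sum to \|x\|^2 = 1; and x^\top Q x is the probability
% that a single step from a uniform vertex of A stays inside A, i.e. 1 - \phi(A)
% up to the laziness factor.
```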
Let me also highlight some other directions. In terms of the local graph clustering problem, one interesting question is whether one can generalize these results to directed graphs; currently we don't know how to do any of this for directed graphs. The other one is that if you use the traditional spectral clustering algorithm, as I said, you may misclassify the small communities. But here, since with local graph clustering you can run the algorithm from every vertex in the graph, you can hope to find overlapping communities in a network. Although there is quite a lot of practical interest in this problem, there hasn't been much theoretical work, and it could be a very interesting direction to work on. Okay. Thanks.

[applause]

Questions?

>>: What do you make of this using relaxation -- [inaudible] relaxation?

>> Shayan Gharan: Yeah.

>>: So in this result they use an SDP relaxation, and if you don't want the set to be large, you can just plug in mu to be n over 2 and you get the Cheeger inequality. But then there is no guarantee on the size of the set; it could be very large.

>>: So didn't you say at the beginning you don't need regular graphs?

>> Shayan Gharan: We don't need regular, but we need unweighted graphs.

>>: Multi-graphs? Or does it have to be a simple graph?

>> Shayan Gharan: It needs to be a simple graph.

>>: Needs to be a simple graph.

>> Shayan Gharan: Yeah.

>>: [inaudible].

>> Shayan Gharan: I mean, it's not clear that it should. All these algorithms kind of depend on the number of edges in the graph. But maybe you have a better answer?

>>: Interpret the weights as conductances; you just interpret them as multiplicities of edges and run the very same --

>> Shayan Gharan: You can run the algorithm, but can you bound the running time?

>>: The bounds will involve these volumes, involve the weights.

>> Shayan Gharan: Yeah. I mean, the point is that it might be that the analysis works; it's just that nobody has done it. And I don't know whether there is a real difficulty in the problem or not.

>>: If the graphs are directed, that's a different story.

>> Shayan Gharan: If the graphs are directed, it's much more difficult.

>>: So you mean for weighted graphs, if the weights are small, it's the same?

>>: Then it's exactly the same. You can just interpret it as multiple edges and run the algorithm exactly; there's no difference. If the weights can be very large, the algorithm still works; it's just a matter of the analysis --

>> Shayan Gharan: Yeah, you may have very large weights versus very small weights. I don't know; it could be that the analysis works.

>>: It would only affect the running time.

>> Shayan Gharan: Yeah, the running time; but the approximation guarantees would still hold, of course.

>> Yuval Peres: Any other questions? Let's thank the speaker.

[applause]