>> Yuval Peres: Good morning, everyone. We're happy to welcome Shayan, who will tell us about multi-way spectral partitioning.

>> Shayan Oveis Gharan: Hello, everyone. It's very good to be back here. Thanks very much. So I'm going to talk about spectral partitioning and higher order Cheeger inequalities. The main part of this talk is based on joint work with James Lee and Luca Trevisan, but I'll talk about some newer results as well. In the talk, since most of the people are from the theory group, I'll try not to be wishy-washy and give you almost the full proof.

So I want to talk about the K-clustering problem. Suppose we are given an undirected graph G and an integer K. We want to find K good disjoint clusters in the graph. This graph G may represent the friendships in a social network; in that case a good cluster would represent a community in the social network. Or the graph may come from a set of data points, where the edges could be weighted and the weight of an edge would represent the similarity between the data points. In that case a cluster in the graph would represent a cluster of the data. In the talk, I'm going to assume that the graph is unweighted and regular for simplicity, but all of the results generalize to non-regular, weighted graphs.

Okay. Now let me tell you how I'm going to measure the quality of the clustering. I'm going to use the notion of expansion. Suppose we have a set S of vertices. The expansion of the set is defined as the ratio of the number of edges leaving the set to the sum of the degrees of the vertices in S. Because we assume the graph is d-regular, the denominator is exactly d times |S|. For example, in this graph the expansion of the set is one-fifth, because three edges are leaving the set and the sum of the vertex degrees is 15. There are other parameters that measure the quality of a cluster, such as the diameter and the clustering coefficient, but for those parameters you can find examples where a natural clustering has bad quality. So I'm going to work with this notion of quality. The expansion parameter is always between 0 and 1, and the closer to 0, the better the cluster.

Ideally we'd like to find a clustering of the graph such that every cluster has small expansion, very close to 0. That is the quality of one set; to measure the quality of the whole clustering, I'm going to look at the maximum expansion over all of the sets. So I want to find a clustering such that the maximum expansion over all the sets is as small as possible. Our benchmark, the optimum solution, is the clustering into K disjoint sets that achieves this, and I'm going to use the parameter phi of K to denote the optimum value.

>>: [inaudible] of the S_i's?

>> Shayan Oveis Gharan: Yes. So next I'm going to characterize phi of K in terms of the eigenvalues of the graph. It turns out that there is an interesting connection between the algebraic connectivity of a graph and phi of 2. So let L be the normalized Laplacian of the graph, defined as the identity minus the adjacency matrix divided by the degree. It's easy to see that the normalized Laplacian is a positive semidefinite matrix, so the eigenvalues are nonnegative, and moreover they are at most 2.
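As a concrete illustration (a minimal NumPy sketch, not shown in the talk; the 6-cycle and the set {0, 1, 2} are made up for illustration), here are the two definitions so far: the expansion of a vertex set in a d-regular graph, and the normalized Laplacian I - A/d.

    import numpy as np

    def cycle_adjacency(n):
        A = np.zeros((n, n))
        for u in range(n):
            A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
        return A

    def expansion(A, S, d):
        # phi(S) = (# edges leaving S) / (d * |S|)
        S = list(S)
        rest = [v for v in range(A.shape[0]) if v not in S]
        return A[np.ix_(S, rest)].sum() / (d * len(S))

    A = cycle_adjacency(6)                # a 2-regular graph
    L = np.eye(6) - A / 2                 # normalized Laplacian I - A/d
    print(expansion(A, {0, 1, 2}, d=2))   # 2 edges leave, volume 2*3 = 6 -> 1/3
    print(np.linalg.eigvalsh(L))          # eigenvalues lie in [0, 2]; the smallest is 0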
So let lambda 1 be the first eigenvalue and lambda 2 the second, and say they are in increasing order. Lambda 1 will always be 0. There's a basic fact in algebraic graph theory which says that the number of connected components of G is exactly equal to the multiplicity of 0 as an eigenvalue of L. What this implies is the following: let the spectral gap be the difference of the first and second eigenvalues, which, since lambda 1 is 0, is just lambda 2. This fact says that the spectral gap is 0 if and only if the graph is disconnected. On the other hand, the graph is disconnected if and only if phi of 2 is 0: if the graph is disconnected, I can just choose my two clusters to be two connected components and they have expansion 0. Putting these together, the spectral gap is 0 if and only if phi of 2 is 0. In other words, by knowing whether or not the spectral gap is 0, I can say whether or not phi of 2 is 0. Is this clear?

Now, Cheeger-type inequalities provide a robust version of this fact, meaning that phi of 2 is very well characterized by lambda 2: it's at least one half of lambda 2 and at most root of 2 lambda 2. You can think of this as saying that a graph is barely connected if and only if the spectral gap, lambda 2, is very close to 0. Okay? The importance of this inequality is that it is independent of the size of the graph: no matter how large G is, you have the same characterization of phi of 2 in terms of the eigenvalues. And the proof is algorithmic. It gives you the so-called spectral partitioning algorithm, which I'm going to talk about later; it's a very simple, linear time algorithm to find the two clusters.

Miclo conjectured that the same characterization must generalize to higher eigenvalues, in the sense that we should be able to characterize phi of K in terms of lambda K without any dependence on the size of the graph. And this is the subject of this talk. Okay? I will try to answer that question.

So now let me tell you our main result. We prove that for any graph G, phi of K is at least one half of lambda K and at most order of K squared times root lambda K. The left side of the inequality is easy and was known before; the main part is the right side.

Let me show you an example to understand this better. Suppose our graph is just a cycle. How would you partition a cycle into K clusters of small expansion? The best way is to just take K paths, each of length roughly N over K. The expansion of each of these paths is about K over N, up to a constant factor. So phi of K for a cycle is about K over N. But we know that lambda K for the cycle is about (K over N) squared. Plugging this in above, we see that for the cycle, phi of K is about root lambda K, even without the dependence on K squared. Note that this inequality answers Miclo's question: although we have a dependence on K on the right-hand side, it is independent of the size of the graph. Another nice thing is that the proof is algorithmic as well. So not only can we characterize phi of K in terms of eigenvalues, we can give an algorithm that finds a K-clustering with the corresponding quality.
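To make the cycle example concrete, here is a small NumPy check (my own sketch, not from the talk; n = 120 and k = 6 are arbitrary): partition an n-cycle into k arcs, compute the worst expansion, and compare it with root lambda K of the normalized Laplacian.

    import numpy as np

    n, k = 120, 6
    A = np.zeros((n, n))
    for u in range(n):
        A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
    lam = np.linalg.eigvalsh(np.eye(n) - A / 2)     # eigenvalues of I - A/2, sorted

    def expansion(S):
        rest = [v for v in range(n) if v not in S]
        return A[np.ix_(list(S), rest)].sum() / (2 * len(S))

    arcs = [range(i * n // k, (i + 1) * n // k) for i in range(k)]
    phi_k = max(expansion(S) for S in arcs)         # each arc: 2 edges leave, volume 2n/k

    print(phi_k, k / n)                             # both 0.05: phi_k = k/n for the cycle
    print(np.sqrt(lam[k - 1]))                      # sqrt(lambda_k) is also Theta(k/n)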
>>: Lambda K over K minus 1 squared -- that's for small K? Does that hold even for K close to N, or bigger?

>>: The asymptotic version [inaudible].

>> Shayan Oveis Gharan: Here, or --

>>: Big N.

>> Shayan Oveis Gharan: The top one? I think it holds for large eigenvalues as well, up to constant factors. But my main interest is small K here. If K is very large this is not going to give a very good estimate.

>>: You're saying K can depend on that?

>> Shayan Oveis Gharan: Yeah, K can depend on that. Okay. So is the result --

>>: [inaudible] order 1?

>> Shayan Oveis Gharan: Lambda is never more than 1. Okay. So we can improve the dependence on K exponentially if I am allowed to use higher eigenvalues -- instead of lambda K, lambda 2K. I can show that phi of K is at most order of root of lambda 2K log K. And furthermore, if the graph is low dimensional -- like a planar graph, a graph of bounded genus, or a graph excluding a fixed minor -- then we can completely get rid of the dependence on K and show phi of K is at most order of root lambda 2K.

>>: Do you know that this dependence is needed, or --

>> Shayan Oveis Gharan: Right. So both of these are tight. The bottom one is tight for the example I just showed you, the cycle. The middle result is tight for a graph called the noisy hypercube, which is a generalization of the hypercube. The top result is not necessarily tight. It's still open -- I will talk about it at the end -- it would be interesting to see if we can improve the dependence on K to, say, a polylogarithmic function.

>>: Some kind of [inaudible] algorithm -- these are the K parts and 2K parts and you get a better approximation, or something? You said the top result is logarithmic.

>>: So what's algorithmic here -- you take 2K parts instead of K?

>> Shayan Oveis Gharan: I'm going to break into K parts, but I'm going to use lambda 2K to measure the quality. This 2 can also be replaced by any constant greater than one.

>>: Is it easy to change the dependence on K? Can you just completely remove the K squared -- is that possible or not?

>> Shayan Oveis Gharan: As I said, the middle one is tight, right? So you have to have some log K dependence, but we don't know if the dependence should be polylogarithmic or polynomial.

>>: Is there an example where lambda K and lambda 2K differ substantially?

>> Shayan Oveis Gharan: Well, it's easy to construct graphs where lambda 2K is much larger.

>>: [inaudible].

>> Shayan Oveis Gharan: Yeah. Is the result clear? So I'm going to focus mainly on the first result, and if I have time I'll tell you a little bit about the other two.

So here's the outline of the talk. First I'm going to give an overview of the special cases of our problem and show you how we can deal with them. Then I'll present the ideas of the proof. And lastly, I will talk about new results based on the tools and techniques that I discuss in this talk. Okay?

We're going to use the Rayleigh quotient as the spectral relaxation of expansion. Suppose we have a function F from the vertices to the reals. Then the Rayleigh quotient of F is defined as the ratio of the summation over edges of (F(u) minus F(v)) squared to D times the summation of F(u) squared. To understand this, let's plug in a 0-1 function. If F is a 0-1 function, then the Rayleigh quotient is exactly the expansion of the support of F -- the support of F being the vertices with value 1. Why? Because the numerator is just the number of edges leaving the support for such an F, and the denominator is just D times the size of the support.

>>: Are you assuming the degree is constant in all of this?

>> Shayan Oveis Gharan: The degree is D.

>>: That's throughout?

>> Shayan Oveis Gharan: Yes, throughout the talk, but everything can be generalized.
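Here is a quick numerical check of that identity (a minimal sketch of my own, not from the talk; the 3-cube and the set S are made up): for the 0-1 indicator of a set S, the Rayleigh quotient equals the expansion of S.

    import numpy as np

    def rayleigh_quotient(A, d, f):
        # sum over edges of (f(u) - f(v))^2, divided by d * sum_u f(u)^2
        n = len(f)
        num = sum(A[u, v] * (f[u] - f[v]) ** 2
                  for u in range(n) for v in range(u + 1, n))
        return num / (d * np.sum(f ** 2))

    # 3-regular example: the 3-dimensional hypercube (edges flip one bit)
    n, d = 8, 3
    A = np.array([[1 if bin(u ^ v).count('1') == 1 else 0 for v in range(n)]
                  for u in range(n)])

    S = [0, 1, 2, 3]                                   # a face of the cube
    f = np.array([1.0 if u in S else 0.0 for u in range(n)])
    rest = [v for v in range(n) if v not in S]
    print(rayleigh_quotient(A, d, f))                  # 1/3
    print(A[np.ix_(S, rest)].sum() / (d * len(S)))     # expansion of S, also 1/3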
So if you could, for example, find the 0-1 function minimizing the Rayleigh quotient, that would give us the best non-expanding set -- it would be the sparsest cut of the graph. But that's an NP-hard problem; we cannot do that. This is why we think of the Rayleigh quotient as a continuous relaxation: we cannot solve the discrete version, so we deal with the continuous relaxation.

Now, it turns out that we know the optimizers of this continuous relaxation: they are the eigenfunctions, or eigenvectors, of the normalized Laplacian matrix. For example, the function that minimizes the Rayleigh quotient is F1. Let me remind you that the first eigenfunction of the Laplacian is a constant function, and for that the numerator is 0. So R of F1 is 0, which is lambda 1. And in general, R of F_i is lambda_i. So F1 is the function that minimizes the Rayleigh quotient; F2 is the function that minimizes the Rayleigh quotient over all functions orthogonal to F1; F3 is the one that minimizes it over all functions orthogonal to F1 and F2; and so on. In fact, F1 up to FK give you the best K-dimensional subspace minimizing the Rayleigh quotient.

So you can think of what I'm going to prove as a rounding algorithm: starting from this continuous relaxation of the K-clustering problem, round it into K clusters. Again, if F1 to FK happened to be 0-1 functions, then they would be the optimal solution of our problem. So I want to talk about the rounding problem: how to start from F1 to FK and round them into a K-clustering, K disjoint sets.

>>: In the definition of the Rayleigh quotient -- if you minimize over all F, that just gives you F1, right?

>> Shayan Oveis Gharan: Yes. And if you minimize over those that are orthogonal to F1, that gives you F2.

>>: To get F2 you need a constraint, say based on F1 -- it's orthogonal to the first one, and you get --

>> Shayan Oveis Gharan: Right.

>>: And you didn't normalize the --

>> Shayan Oveis Gharan: Yeah, I just need to say the function is not all 0, not just a constant.

>>: No, I mean if you minimize over all functions F.

>> Shayan Oveis Gharan: That are not 0?

>>: You can --

>> Shayan Oveis Gharan: Then F1 would be optimal.

>>: If you add a huge constant to F2 to make this ratio very small --

>>: The numerator is here.

>> Shayan Oveis Gharan: Think of the denominator as just the norm of F.

>>: If you take a non-[inaudible] and add a huge constant, you make this ratio arbitrarily small.

>> Shayan Oveis Gharan: No, it doesn't become arbitrarily small. If you add a huge constant you just make it closer and closer to F1.

>>: The Rayleigh quotient of F1 is --

>> Shayan Oveis Gharan: It gets smaller and smaller, but F1 is the minimizer.

>>: Okay. You count in the constant.

>>: That's the orthogonality constraint.

>>: Okay. Fine.

>> Shayan Oveis Gharan: Okay. So now let me tell you how we can do this rounding. I'm going to start with the simplest case -- sorry, before that: for the rest of the talk, all I'm going to use about F1 up to FK is that they are orthonormal and they have a small Rayleigh quotient. I'm not going to care whether or not they are actually eigenvectors or eigenfunctions.
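Here is a small check of these facts (a sketch of mine, not from the talk; the 10-cycle is an arbitrary example): the eigenvectors of the normalized Laplacian of a d-regular graph are orthonormal, the first one is constant, and the Rayleigh quotient of the i-th one is lambda_i.

    import numpy as np

    n, d = 10, 2                                    # a 10-cycle, 2-regular
    A = np.zeros((n, n))
    for u in range(n):
        A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
    lam, F = np.linalg.eigh(np.eye(n) - A / d)      # columns of F are f_1, ..., f_n

    def rayleigh(f):
        num = sum(A[u, v] * (f[u] - f[v]) ** 2
                  for u in range(n) for v in range(u + 1, n))
        return num / (d * (f @ f))

    print(np.allclose(F.T @ F, np.eye(n)))          # orthonormal
    print(np.allclose(F[:, 0], F[0, 0]))            # f_1 is a constant function
    print(np.allclose([rayleigh(F[:, i]) for i in range(n)], lam))   # R(f_i) = lambda_i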
So let me tell you how we can do this rounding. If R of F1 and R of F2 are 0, how can we show that phi of 2 is 0? There are two observations. The first one is: if R of F is 0 for a function F, then for each adjacent pair of vertices, F(u) equals F(v), and vice versa. Why? Just plug it into the definition. By repeated application of this, you can see that every connected component has the same value in F. On the other hand, F1 is orthogonal to F2, so one of them is not constant; it has at least two different values. So the graph must be disconnected. Just two observations.

Now we can generalize this in two ways. One is to generalize 2 to K: say you have K functions of Rayleigh quotient 0, how can we show phi of K is 0? The second is to generalize 0 to some small number delta: say R of F1 and R of F2 are at most delta, how can we show phi of 2 is at most order of root of 2 delta? Let's start with the left one and then go to the right.

So, again, I want to assume R of F1 up to R of FK are 0 and show phi of K is 0. And I'm going to use the spectral embedding. This will be used throughout the talk, so it's important to remember it. The spectral embedding of the graph is an embedding of the graph in a K-dimensional space where the image of each vertex u, capital F of u, is just the K-dimensional vector (F1(u), F2(u), ..., FK(u)). Here you see an embedding of a cycle based on its first three eigenfunctions, F1, F2 and F3. If you remember, F1 is constant, and F2 and F3 just give you the cycle.

Now let's see how we can use this to prove our claim. The first observation is that since all these Rayleigh quotients are 0, R of capital F will be 0 as well. In fact, the Rayleigh quotient of capital F is always at most the maximum of the Rayleigh quotients of the F_i's; that's very easy to see. What this means is, again, that any adjacent pair of vertices must be mapped to the same point in this high dimensional embedding.

>>: [inaudible] also defining R of F for vector values --

>>: That's why the norm was there earlier.

>> Shayan Oveis Gharan: Right.

>>: And it's the usual one, or?

>> Shayan Oveis Gharan: It's the sum over edges of the squared L2 distance, over D times the sum of the squared L2 norms.

So what this means is that each connected component of the graph is mapped to a single point in this K-dimensional embedding. So to prove that phi of K is 0, I just need to show there are at least K distinct points in this embedding. Why is that true? Again, orthonormality. Remember that F1 up to FK are orthonormal. Construct the matrix whose rows are F1 up to FK; then the columns of this matrix are exactly my embedding. Because F1 to FK are orthonormal, the matrix has rank K. So there are K linearly independent columns, hence K distinct points in the embedding.

>>: Can you say that again?

>> Shayan Oveis Gharan: So construct the matrix with rows F1 to FK. Because these are orthonormal, the matrix has rank K. Now look at the columns of this matrix -- the columns are exactly the embedding. There are K linearly independent columns, which means there are K distinct points in the embedding.

>>: So you're just saying that, in particular, linearly independent implies distinct?

>> Shayan Oveis Gharan: Yeah. So there are at least K connected components.
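A concrete illustration of this argument (my own sketch, not from the talk; three disjoint 4-cycles are an arbitrary disconnected example): embed each vertex u as F(u) = (f_1(u), ..., f_k(u)). Every component collapses to one point, and there are exactly k distinct points because the matrix with rows f_1, ..., f_k has rank k.

    import numpy as np

    k, m = 3, 4                                     # 3 disjoint 4-cycles: disconnected, 2-regular
    n = k * m
    A = np.zeros((n, n))
    for c in range(k):
        for i in range(m):
            u, v = c * m + i, c * m + (i + 1) % m
            A[u, v] = A[v, u] = 1

    lam, F = np.linalg.eigh(np.eye(n) - A / 2)
    print(np.round(lam[:k], 8))                     # lambda_1 = ... = lambda_k = 0
    emb = F[:, :k]                                  # row u is the embedding F(u) in R^k

    print(len(np.unique(np.round(emb, 6), axis=0))) # exactly k distinct embedded points
    print(np.linalg.matrix_rank(emb))               # rank k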
Now let's see how we can prove the other generalization. Say R of F1 and R of F2 are not 0, but very small, some small number delta. I want to show that phi of 2 is order of root delta. This is, by the way, the Cheeger inequality. Let me not prove it completely, just give you the idea. First of all, since F1 and F2 are orthogonal, one of them is not constant; say F2 is not constant, by what we said before.

Now, because R of F2 is not 0, we cannot say adjacent vertices are mapped to the same points, but because it's small we can say they're mapped to close values. So the idea is to map the vertices to a line based on their values in F2, sweep this line from left to right, consider all the cuts, and just choose the best one. This is exactly the spectral partitioning algorithm -- very simple, just using the second eigenfunction. The intuitive reason this works is the following: think of a random cut, a random threshold in this region. If two points are close, the probability that the threshold separates them is very small. But we know that adjacent vertices are mapped to close values, so on average we cut very few edges, and we get a sparse, non-expanding cut.

So now, our main theorem generalizes both of these special cases. It says that if you have K functions of small Rayleigh quotient, at most delta, then phi of K is small: a polynomial in K times root delta. By what I said so far, our proof must have two main elements. The first is that we have to use the fact that the Rayleigh quotient of capital F is small, and argue that adjacent vertices are mapped to close points in this high dimensional embedding. To understand this, look at the cycle. The second is that we have to use the orthonormality of the vectors to argue that the embedding spreads out in the space and is not concentrated in a few places. This is what we did in the special cases. In particular, we use the first observation to choose our non-expanding sets from clusters of close points in this high dimensional embedding, and we use the second observation to argue that we can actually find K disjoint clusters. Okay? Is there any question?

>>: What does it mean that it spreads?

>> Shayan Oveis Gharan: I'm going to make it rigorous. But the intuitive idea is that it's spread out in the space; it's not -- see the cycle?

Okay. So now, for the next 10 to 15 minutes, I'm going to talk about the ideas of the proof. There are three main ingredients. The first is the radial projection distance; this is a particular distance we use in the proof, and I'm going to define it on the next slide. The second is a spreading property: we show that the vertices spread out in the space with respect to this particular metric. And the last one is that we show that to prove the theorem it is sufficient to find K regions in the space, each containing a large mass of points, such that they are pairwise well separated. We call such regions dense and well separated.

So let me first define what I mean by the radial projection metric. If you have two points U and V, the radial projection distance is defined as follows: first we project the points onto the unit sphere around the origin, and then we compute the Euclidean distance of the projected points. So this is a Euclidean metric. Why is it useful? It helps us reduce the problem to the simpler case where all the vertices are mapped onto the unit sphere -- they all have the same distance to the origin. So for simplicity, for the rest of the proof, you can assume that this is the case, that the vertices all have the same distance from the origin. I'll point out what would be different if you don't want to assume this.
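In code, the radial projection distance is just the Euclidean distance after normalizing each embedded point onto the unit sphere. A minimal sketch (mine, not from the talk), with two made-up points:

    import numpy as np

    def radial_projection_distance(Fu, Fv):
        # project both points onto the unit sphere around the origin,
        # then take the Euclidean distance of the projections
        return np.linalg.norm(Fu / np.linalg.norm(Fu) - Fv / np.linalg.norm(Fv))

    u, v = np.array([3.0, 0.0]), np.array([0.0, 0.5])
    print(np.linalg.norm(u - v))              # ordinary Euclidean distance, about 3.04
    print(radial_projection_distance(u, v))   # sqrt(2): only the directions matter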
Now let me tell you the proof plan. There are two main steps. In the first step we find K regions in the space, X1 up to XK, such that each has a 1 over K fraction of the total L2 mass and their pairwise distance is at least 2 epsilon. What do I mean by L2 mass? The L2 mass of a region is just the summation of the squared norms of the vertices in it. If you assume that the vertices are all at the same distance from the origin, you can replace the L2 mass with the number of vertices; so think of the L2 mass of X as the number of vertices in X.

In the second step, we show that if you have K well separated regions, we can turn them into K disjoint non-expanding sets, each supported on the epsilon neighborhood of one of the regions. By the epsilon neighborhood I mean the points at distance at most epsilon. Because we assume that the regions are at distance 2 epsilon, the sets will be disjoint.

>>: [inaudible] subset of --

>> Shayan Oveis Gharan: Yeah, the support is a subset of the epsilon neighborhood of the region.

>>: Is it a partitioning -- does it include all the points, all the Xs? Not necessarily the points here, some for every --

>> Shayan Oveis Gharan: Yeah. I'm not going to say -- S1 up to SK will be disjoint. I'm not going to say they form a partitioning, but we can make it into one: if you have a clustering into K disjoint sets, you can make it a partitioning by adding the remaining vertices to the largest set.

So, okay. We say a region is dense if it has a 1 over K fraction of the mass, and we say the regions are well separated if their pairwise distance is at least 2 epsilon. So again, the plan is to first find K dense, well separated regions and then turn them into K disjoint non-expanding sets. Epsilon will be a function of K. Okay, is the plan clear? Let me start with the second step and then talk about the first step.

So suppose we have a region that contains, let's say, a 1 over K fraction of the mass. We want to round it into a non-expanding set such that the set is a subset of the epsilon neighborhood of the region and the expansion is small -- the square root of a polynomial function of K times the Rayleigh quotient of capital F. How are we going to do that? This is essentially a generalization of the Cheeger inequality. The idea is to choose a random threshold in the epsilon neighborhood of X: we consider all the balls around X of radius at most epsilon and choose the best one. Now, observe that each of these balls contains a 1 over K fraction of the total mass. On the other hand, the probability of cutting each edge is only its length over epsilon. Putting these two together proves the statement: there exists a set of expansion at most the summation, over the edges, of the distance between F(u) and F(v) divided by epsilon, divided by the 1 over K fraction of the mass, and by a simple application of Cauchy-Schwarz we get what we want.
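Here is a minimal sketch of that rounding step (my own illustration, not the exact procedure from the paper; `emb` is the spectral embedding and `X` a dense region, both assumed given): sweep a ball radius r up to epsilon around X, measured in the radial projection distance, and keep the cut with the smallest expansion.

    import numpy as np

    def sweep_ball(A, d, emb, X, eps):
        # Try every ball {u : dist(u, X) <= r} with r <= eps and return the best cut.
        U = emb / np.linalg.norm(emb, axis=1, keepdims=True)   # radial projection
        n = len(U)
        dist_to_X = np.array([min(np.linalg.norm(U[u] - U[x]) for x in X)
                              for u in range(n)])
        best_set, best_phi = None, np.inf
        for r in sorted(set(dist_to_X)):
            if r > eps:
                break
            S = [u for u in range(n) if dist_to_X[u] <= r]
            rest = [u for u in range(n) if dist_to_X[u] > r]
            if not rest:
                break
            phi = A[np.ix_(S, rest)].sum() / (d * len(S))      # expansion of the ball
            if phi < best_phi:
                best_set, best_phi = S, phi
        return best_set, best_phi

The random-threshold argument in the talk says that the expected expansion over a random radius r is already small, so the best radius found by this sweep can only be better.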
Okay. Now let me tell you how we can find K dense, well separated regions. Remember that because we want to find these regions with respect to the radial projection distance, the regions will look like narrow cones around the origin: if you have one vertex here and one vertex there in the same cone, their distance is very small, because we project onto the sphere. So the regions will look like narrow cones. Let's see how we can find them.

The proof has two steps. The first step is to prove the spreading property that I promised to talk about. The spreading property says the following: if you have a region of small diameter -- say constant diameter one-half -- then it cannot have more than roughly a 1 over K fraction of the points. In other words, think of a region of constant diameter as a sparse region; it has only a small fraction of the mass. So that's the first thing we prove.

>>: The projection -- the radial projection?

>> Shayan Oveis Gharan: Yes.

>>: You're projecting on a sphere of radius one, so the diameter of the whole space is two? So you say constant as in -- constant with respect to that?

>> Shayan Oveis Gharan: Yeah. Yeah, it's one-half out of two, of course.

The second step is the following: we use the literature on random partitioning of metric spaces to partition the space into well separated regions covering almost all of the points. Now, to see why this proves the theorem, think of the following. Suppose we obtain this random partitioning: we get regions that are well separated, and because they have small diameter, one-half, each of them has only a small fraction of the mass. What we can do is simply merge them to make them dense. Because they were originally well separated, after merging they are still well separated, and we get K dense, well separated regions.

>>: [inaudible].

>> Shayan Oveis Gharan: By dense I mean it really has a 1 over K fraction. Here each region has less than a 1 over K fraction; if a region already has 1 over K, we have nothing to do, and if it has less, we merge. Okay.

So let's start with the first step and see how we can prove the spreading property. Again, assume we have a region of some diameter delta. I want to show that the L2 mass inside this region is at most 1 over K times 1 over (1 minus delta squared) of the L2 mass of all the vertices. Okay? For example, if delta is one-half, it's going to be at most 4 over 3K of the total mass. How are we going to prove this?

This is one place where we use the radial projection distance. It's important here, because if you don't use the radial projection distance, a region of diameter one-half around the origin could contain almost all of the points -- you can show that on average the points are at distance about root of K over N from the origin. So it's important to use this.

The idea is to use the isotropy property. What this says is that our embedding is, in a sense, symmetric -- it's not skewed in any direction. Mathematically it says that for any unit vector Z, if you project all of the points onto Z and take the summation of the squared norms of the projections, you get exactly 1. This simply follows from the fact that our embedding comes from an orthonormal set of vectors; any embedding built from an orthonormal set of vectors has this property. It's very easy to prove.

Now let's see how we can use this to prove the claim. I'm going to choose my vector Z to point inside my region X. Think of delta as being very small, very close to 0. Then for the vertices in X, their norm and their norm after the projection onto Z are very close. So the summation of the squared projections is lower bounded by roughly the L2 mass of X -- you lose the factor of (1 minus delta squared) from trigonometric inequalities, but if delta is very small you can essentially bound it by the L2 mass of X. This says that the L2 mass of X is at most 1 over (1 minus delta squared).

Now, on the other hand, we know that the L2 mass of V, the total mass of the vertices, is exactly K. Why? Because our embedding comes from K orthonormal vectors: if you sum up the squared norms, you can rewrite the sum in terms of the norms of the functions F1 up to FK and get K. Putting these together proves the claim: the L2 mass of X is at most about a 1 over K fraction of the total. So again, what we've proved is that any region of, let's say, diameter one-half has at most essentially a 1 over K fraction of the L2 mass.
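The isotropy property and the total-mass computation are easy to verify numerically (a sketch of mine, not from the talk; the 12-cycle and k = 4 are arbitrary):

    import numpy as np

    n, d, k = 12, 2, 4                               # a 12-cycle, first 4 eigenvectors
    A = np.zeros((n, n))
    for u in range(n):
        A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
    _, F = np.linalg.eigh(np.eye(n) - A / d)
    emb = F[:, :k]                                   # row u is F(u) in R^k

    z = np.random.randn(k)
    z /= np.linalg.norm(z)                           # an arbitrary unit direction
    print(np.sum((emb @ z) ** 2))                    # isotropy: 1 (up to floating point)
    print(np.sum(np.linalg.norm(emb, axis=1) ** 2))  # total L2 mass: k = 4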
Now let's see how we can use this to find dense, well separated regions. As I said, I want to use the random partitioning of metric spaces to partition the space into well separated regions of constant diameter, and then merge them to get dense, well separated regions.

I start by putting a grid on the space, based on the radial projection distance, such that the diameter of each cell of the grid is a constant, one-half. A grid with respect to the radial projection distance looks like this. Then I randomly shift the grid, delete the points that are close to the boundary, and choose each region to be the points completely inside a cell, far from the boundary. Now, it's very easy to see that if I choose epsilon to be roughly 1 over K, each point is far from the boundary with constant probability, and we can make that probability close to 1. So essentially this says that I can find regions of points that are well separated -- they have distance at least 2 epsilon -- and they cover almost all of the points. And since I started from a grid with cells of diameter one-half, these regions have diameter one-half as well. So by the spreading property, each has small mass, at most about 1 over K. Now I can merge them -- maybe I merge this region with that one -- until each has the required mass, and I get K dense, well separated regions.

>>: Choose a random rotation --

>> Shayan Oveis Gharan: Yes.

>>: Hyperplanes instead of -- like, rotating this also works, because they're roughly orthogonal.

>>: Why are we choosing -- you should rotate the grid, right?

>>: But instead of this, let's say you choose log K random hyperplanes and look around each one.

>> Shayan Oveis Gharan: You don't want to start from the grid, or --

>>: No, but then intuitively they're almost orthogonal, you get -- this is related to previous work, not the talk -- sorry about that.

>> Shayan Oveis Gharan: But that is different.

>>: I know it's different. I can ask you later.

>> Shayan Oveis Gharan: So this is the final algorithm. First we embed the vertices using the spectral embedding, the eigenfunctions. Then we consider a random partitioning into regions of diameter one-half such that only a few of the points are within distance epsilon, roughly 1 over K, of the boundary. Then we remove the points that are close to the boundary and keep the regions that are completely in the interior of each cell. We merge the regions to get K dense, well separated regions. And finally we apply the spectral rounding to each of these dense regions to get a non-expanding set inside its epsilon neighborhood.

This algorithm that I just described is very similar to the spectral clustering algorithms that people use in practice. The paper of Ng and Jordan suggested this algorithm, with the one difference that instead of using a random partitioning they use K-means; but they do use, for example, the radial projection distance, and similarly all the other steps. And here's a nice application of spectral clustering to images, to detect parts of objects. So essentially our work gives a theoretical justification for these spectral clustering algorithms.

>>: One at a time, or [inaudible]?

>> Shayan Oveis Gharan: I guess I'd better --

>>: [inaudible] the objects.

>> Shayan Oveis Gharan: Right. So as I said, for the image application you have to start from the image and then construct a similarity graph based on how similar the data points are.
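For completeness, here is one common way to build such a similarity graph from data points, using a Gaussian kernel on pairwise distances; this is a generic sketch of standard practice, not a construction given in the talk, and the bandwidth sigma is a made-up parameter.

    import numpy as np

    def similarity_graph(points, sigma):
        # w(u, v) = exp(-||x_u - x_v||^2 / (2 sigma^2)), a common similarity weight
        X = np.asarray(points)
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-sq / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)                   # no self-loops
        return W

    points = np.random.rand(50, 2)                 # e.g., 50 feature vectors
    W = similarity_graph(points, sigma=0.2)
    deg = W.sum(axis=1)
    L = np.eye(50) - W / np.sqrt(np.outer(deg, deg))   # one standard normalized Laplacian for weighted graphs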
>>: [inaudible] -- so the vertices are elements of an image?

>> Shayan Oveis Gharan: The vector --

>>: [inaudible] [cross talk].

>>: This is applied to each --

>> Shayan Oveis Gharan: Yes. To start from an image, you construct the graph and then apply the spectral clustering algorithm. There's a vast literature on how you can start from an image or other data and turn it into a graph. This work, for example, is very new; they have novel techniques for how to construct these graphs.

>>: Somehow -- maybe we don't need to dwell on it too long, but is spectral partitioning better than clustering for this application?

>> Shayan Oveis Gharan: What do you mean?

>>: Partitioning is a method for clustering.

>>: Yes, but maybe spectral partitioning is better than clustering so as to minimize the expansion -- whatever the goal is.

>> Shayan Oveis Gharan: No, the point is that each of the parts of these objects would have a small expansion, essentially. For example, look at these green areas here; they would have very small expansion.

>>: Also, what if you just took a little bit of that --

>> Shayan Oveis Gharan: How much time do I have?

>>: Seven minutes.

>> Shayan Oveis Gharan: Seven minutes. So let me talk about new results based on the techniques that I covered here. Let's zoom back a little and see what we've done. We've used this particular embedding called the spectral embedding. This embedding is an isotropic embedding, and among all isotropic embeddings it has minimum energy. What do I mean by energy? The energy of an embedding is just the summation, over the edges, of the squared distances between adjacent pairs of vertices. It's very easy to see that for the spectral embedding the energy is upper bounded by K times lambda K. And, of course, it has the isotropy property: for any direction, the summation of the squared inner products is exactly one. For example, here you see the spectral embedding of a cycle and of a hypercube. These two properties were in fact the main properties we used in the proof; everything followed from them.

Now, by extracting this out and thinking more about it, we can prove a couple of other results. This is joint work with Kwok, Lau, Lee and Trevisan. We prove an improved Cheeger inequality. Recall that the Cheeger inequality says phi of 2 is at most root lambda 2. We show that for any graph and any K, phi of 2 is at most order of K times lambda 2 over root lambda K.

>>: [inaudible] when you start the sentence with "for any graph" --

>> Shayan Oveis Gharan: Sorry, I have a typo: any K. Right. Sorry.

>>: Then K.

>> Shayan Oveis Gharan: Any K.

>>: But I thought the inequality [inaudible].

>> Shayan Oveis Gharan: So lambda 2 over root lambda K is always better than root lambda 2. In particular, you can say phi of 2 is at most order of lambda 2 over root lambda 3, which is always at least as good as root lambda 2, and the constant is not huge -- it's 10.

Okay. More interestingly, the analysis shows that the spectral partitioning algorithm -- the one that uses the second eigenvector -- satisfies this. You don't need to use some fancy algorithm; you can use plain spectral partitioning. And although the algorithm doesn't know anything about higher eigenvectors or eigenvalues, it still matches this quality. For example, it shows that if lambda K is a constant for some constant K, we get a constant factor approximation for phi of 2, the sparsest cut of the graph.
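Since the improved bound is achieved by plain spectral partitioning, here is a short sketch of that algorithm (the standard sweep cut over the second eigenvector; a minimal implementation of my own, assuming a d-regular adjacency matrix, with a 20-cycle as a made-up example):

    import numpy as np

    def spectral_partition(A, d):
        # Sort vertices by the second eigenvector of I - A/d and return the best sweep cut.
        n = A.shape[0]
        lam, F = np.linalg.eigh(np.eye(n) - A / d)
        order = np.argsort(F[:, 1])                  # sweep along f_2
        best_S, best_phi = None, np.inf
        for i in range(1, n):                        # every prefix of the sorted order
            S, rest = order[:i], order[i:]
            cut = A[np.ix_(S, rest)].sum()
            phi = cut / (d * min(i, n - i))          # expansion of the smaller side
            if phi < best_phi:
                best_S, best_phi = S.tolist(), phi
        return best_S, best_phi

    n = 20                                           # a 20-cycle
    A = np.zeros((n, n))
    for u in range(n):
        A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
    S, phi = spectral_partition(A, d=2)
    print(sorted(S), phi)                            # roughly half the cycle; expansion 2/(2*10) = 0.1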
This in fact happens in many applications of spectral clustering. By using this new result and putting it together with what I just talked about, you can even improve the results I presented. Instead of bounding phi of K by a polynomial in K times root lambda K, you can upper bound it by a polynomial function of K times lambda K over root lambda L, for any L larger than K. For example, if lambda L is a constant for some L greater than K, then you essentially get a very tight inequality.

Okay. Now let me tell you another result. We use the spectral embedding technique to lower bound the eigenvalues of the Laplacian of a graph, or equivalently to upper bound the eigenvalues of the random walk matrix. We show that for any unweighted graph G, lambda K is at least omega of K cubed over N cubed, and if the graph is regular it's at least omega of K squared over N squared.

>>: Is it an absolute constant?

>> Shayan Oveis Gharan: An absolute constant. And this generalizes to infinite graphs and to vertex transitive graphs. You can use it to bound many properties of random walks, like mixing times and return probabilities. Here's a nice algorithmic application: we can use this theorem to design a fast, local algorithm for approximating the number of spanning trees of a graph.

Okay, let me wrap up. Traditionally we knew a lot about the second eigenvalue, the spectral gap. We use it to analyze Markov chains, we use it to give an approximation algorithm for the sparsest cut problem, we use it in partitioning and clustering. But we knew very little about higher order eigenvalues. In all the works I talked about today, you can see that by understanding higher eigenvalues we can improve many of these traditional results. We can provide a generalization of the Cheeger inequality to higher eigenvalues and get a K-partitioning instead of a 2-partitioning. We can get an improved Cheeger inequality without even changing the current algorithms. Or we can get a new framework for analyzing reversible Markov chains and the mixing time of random walks. I also said that our results give a theoretical justification for the spectral clustering algorithms that use the spectral embedding to partition a graph.

Here are some open problems. The first one, as I said, is to find the right dependence on K for phi of K -- for example, whether it is possible to find K disjoint clusters, each of expansion polylog K times root lambda K. Another interesting question is the connection of higher order Cheeger inequalities to the unique games conjecture and the small set expansion problem. For example, it's an open problem whether, for some large K -- and this time it's important that K is large; think of K as being, say, N to the 1 over log log N -- it's possible to find a set much smaller than N with expansion at most order of root lambda K. Note that using our result, we can show that for any graph there exists a set of size about N over K with expansion of order root of lambda K log K. The difference here is that you don't want the expansion to depend on K: you're allowed to take the set much larger than N over K -- it just needs to be sublinear in N -- but you don't want any dependence on K in the expansion. This would have a huge impact on the unique games conjecture as well as the small set expansion problem.
The last thing is to analyze different partitioning methods for spectral clustering, such as K-means, instead of the random partitioning that I talked about. We have some partial results for this. For example, we can show that if lambda K plus 1 is much larger than lambda K -- if there is a large gap between lambda K and lambda K plus 1 -- then the spectral embedding looks like K separated clouds of points, and in fact K-means would work. But we still don't know a robust version of this statement. And I'll stop here.

[applause]

>> Yuval Peres: Questions? Sam will be here at some point tomorrow. We're scheduled, but he'll find some time. And also we have a dinner today at 6:30 -- 6:30 plus epsilon. So anyone interested, please come talk to me.