>> Yuval Peres: Good morning. We're happy to have Shayan Oveis Gharan from Stanford to tell us about spectral graph algorithms.
>> Shayan Oveis Gharan: Hello. Hello everyone. Thanks for inviting me here. It's very good to be here. I want to talk about a new analysis of spectral graph algorithms through higher eigenvalues. Okay?
Suppose we have a set of data points, okay? And we want to cluster them. The way I want to do the clustering is to first construct a graph based on these data points and then use a graph partitioning algorithm. Okay? How can we construct a graph? First we put a vertex for each of the data points. Now here, I connected two vertices if their distance is less than a threshold epsilon. Okay?
There are also other ways to construct a graph. For example, you may construct a complete graph, a complete weighted graph, where the weight between each pair of vertices is a function of their distance. For example, you may use this function, e to the minus the distance squared, normalized by some constant. Okay? So once we have constructed the graph, we want to partition it.
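For concreteness, here is a minimal sketch of the two constructions just described, assuming the data points are the rows of a numpy array X; the names eps and sigma are illustrative parameters, not anything fixed in the talk.

```python
import numpy as np
from scipy.spatial.distance import cdist

def epsilon_graph(X, eps):
    """Unweighted graph: connect two points if their distance is below eps."""
    D = cdist(X, X)                       # pairwise Euclidean distances
    A = (D < eps).astype(float)           # adjacency matrix
    np.fill_diagonal(A, 0.0)              # no self-loops
    return A

def gaussian_graph(X, sigma):
    """Complete weighted graph with weight exp(-dist^2 / sigma^2) on each pair."""
    D = cdist(X, X)
    A = np.exp(-D**2 / sigma**2)
    np.fill_diagonal(A, 0.0)
    return A
```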
So how can we partition this graph? We begin with the spectral graph partitioning algorithm.
Okay? So what we do is compute multiple eigenvectors of the graph, say V1 to VK, and then we use these eigenvectors to embed the graph in a K dimensional space. So basically what we do is construct this matrix F where the columns of the matrix are the vectors V1 up to VK.
Now, the rows of this matrix give me the coordinates of the vertices. The coordinates of vertex I would be V1 of I up to VK of I. Okay? So this gives me a mapping of the graph into this K dimensional space. Maybe for the graph I showed in the last slide, this is the embedding. Now I can use one of the heuristics, for example K-means, to cluster the vertices in this high dimensional embedding, and that basically gives me a partitioning of my graph.
So putting it all together, we get the spectral clustering algorithm. Okay, so let me summarize it again. We start from data points, then we construct a graph, then we look at the eigenvectors of the graph and embed it in a high dimensional space, then we apply K-means to partition the vertices, and we use that to get a partitioning of the original data points; we just map the clusters back to the original data points. Okay? Spectral clustering has been used for over 20 years. It has a lot of applications. I chose this particular paper, by Ng, Jordan, and Weiss, which is one of the very famous papers in this direction. Okay? So is this clear?
Good.
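Putting the pipeline together, a rough sketch in the spirit of the algorithm just summarized (not a faithful reproduction of the Ng-Jordan-Weiss paper) might look like this, assuming an adjacency matrix A from one of the constructions above and scikit-learn's KMeans:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    d = A.sum(axis=1)                                  # degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # columns of F are the k eigenvectors with the smallest eigenvalues;
    # row i of F is the embedding of vertex i in k-dimensional space
    _, F = eigh(L, subset_by_index=[0, k - 1])
    # cluster the embedded vertices with k-means and map labels back to the data
    return KMeans(n_clusters=k, n_init=10).fit_predict(F)
```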
In general, spectral graph algorithms are simple heuristics that explore graph structure using several eigenvectors of the graph. Okay? These algorithms are widely used in practice for the following reasons. First of all, they are very simple to implement: if you have a linear algebra library, you can simply implement them. They provide very high quality solutions, and they run very fast, in near linear time. Okay?
Here are some applications. For example, I talked about applications in data clustering; you can also use them in image segmentation, community detection, VLSI design, and many other fields. For example, in image segmentation, what you do is put a vertex in a graph for each of the pixels in the image, and then the weight of the edge between a pair of vertices is a function of the distance of the pixels and how close their colors or intensities are. And then you partition this graph and you get a segmentation of the image. Okay?
Now let me tell you what we know about spectral graph algorithms in theory. Classical analyses of spectral graph algorithms only exploit the first or last two eigenvalues or eigenvectors of the graph. Okay? Here are some examples: bounds on the chromatic number of the graph, Cheeger's Inequality, algorithms for finding an edge separator of the graph, even algorithms for finding a maximum cut; there are many others that I won't list here. But all these algorithms only use the first or last two eigenvalues.
Okay?
I should mention that in some random or semi-random models, we know algorithms that use matrix perturbation theory and multiple eigenvectors, but here I do not assume anything about the graph; the results are unconditional. Okay?
So let me summarize what I said so far. Let me compare theory and practice, okay. So I said that in practice we typically use multiple eigenvectors, and more eigenvectors give better solutions. In theory, we can analyze two eigenvectors or two eigenvalues and show that they normally give good results. Okay? Let me tell you what we do. Basically, we analyze or study spectral graph algorithms through the lens of higher eigenvalues. Okay? So you can see our results as a bridge between theory and practice.
So let me just give an overview of the results without getting into details at this point. For example, we relate the K-th eigenvalue of a graph to K way partitionings. We can use higher eigenvalues to justify the performance of spectral algorithms in practice, and we can also use higher eigenvalues to give faster spectral algorithms.
Okay, now if you want to see the actual quantitative versions of these results, you should give me some time so that I can set up some notation, and then I'll give the details. Okay? So here is what I'm going to do. I'm going to start by talking about Cheeger's Inequality; here I'm going to set up some notation, and then I'll talk about these three results.
And finally I will very briefly talk about other aspects of my research and some future directions.
Is there any question at this point? Okay.
So let me start by talking about Cheeger's Inequality. For the next five minutes, I'm going to define Cheeger's Inequality for you and set up some notation, okay? Let L be a normalization of the adjacency matrix of the graph. This is also known as the normalized Laplacian matrix. Let me not define it rigorously; the point is that we normalize the entry corresponding to each edge of the graph by the degrees of its endpoints. Okay? We use this normalization to basically normalize the eigenvalues so that they fit in the constant range between zero and two. Okay. This gives us a normalization of the eigenvalues.
So, throughout the talk I'm going to use lambda 1 up to lambda N as the eigenvalues of this normalized Laplacian matrix. Okay? Lambda one is always zero, and lambda N is always at most two. Okay?
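For reference, the standard definition being alluded to, with A the adjacency matrix and D the diagonal matrix of degrees, is the following:

```latex
\mathcal{L} = I - D^{-1/2} A D^{-1/2},
\qquad
\mathcal{L}_{ij} =
\begin{cases}
  1 & \text{if } i = j \text{ and } d_i > 0,\\
  -\dfrac{1}{\sqrt{d_i d_j}} & \text{if } \{i,j\} \in E,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \le 2 .
```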
Now, there is a basic fact in algebraic graph theory which says that lambda two is equal to zero if and only if the graph is disconnected. Okay? Now, Cheeger's Inequality gives a robust version of this fact. Okay, you can imagine what it would say: it would say lambda two is very close to zero if and only if the graph is barely connected. But to give the actual quantitative version, I need to define a robust version of connectivity. Okay? So let me do that first, and then give you Cheeger's Inequality. Okay?
So what is the robust version of connectivity? I'm going to use conductance as a robust version of connectivity. The conductance of a set S of vertices is the ratio of the number of edges leaving the set to the sum of the degrees of the vertices in S. Okay, for example, in this picture the conductance of this set is one eighth, because two edges are leaving the set and the sum of the degrees is 16.
Now, conductance is always a quantity between zero and one; you can easily show that. And the point is that the closer it is to zero, the better the cluster. Okay, for example, if the edges of the graph represent friendships in a social network, then a set with small conductance would represent a community in the social network. Or if the edges of the graph represent similarity between data points, then a set with small conductance would represent a cluster of data. Okay?
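A small helper matching this definition; A is a (possibly weighted) adjacency matrix and S a list of vertex indices. On the example just mentioned it would return 2/16 = 1/8.

```python
import numpy as np

def conductance(A, S):
    """Conductance of S: (weight of edges leaving S) / (sum of degrees inside S)."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()    # total weight of edges leaving S
    vol = A[mask].sum()              # sum of (weighted) degrees of vertices in S
    return cut / vol
```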
So the sparsest cut of the graph is the set with the smallest conductance. We are going to choose it among all of the sets that have at most half of the vertices, or half of the vertex degrees. Okay. So I'm going to write phi of G for the smallest conductance of any such set in the graph. Okay?
Now I'm ready to tell you Cheeger's Inequality. Cheeger's Inequality for graphs was proved by Alon and Milman, and it says the following: for every graph G, phi of G is very well characterized by lambda two. It is at least one-half of lambda two, and it is at most the square root of two lambda two. Okay? So let me tell you how you should read this. You should read this as follows: G is barely connected if and only if lambda two is very close to zero.
The importance of this inequality is that it is independent of the size of the graph. So no matter how large your graph is, you still get the same characterization.
The proof of this inequality gives you a simple linear time algorithm. In fact, it shows that the spectral partitioning algorithm works when K is equal to two: if you just want to partition your graph into two sets, use the second eigenvector; it works, and it gives you a set of conductance at most on the order of the square root of phi of G.
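The rounding that comes out of the Cheeger proof is the "sweep cut": sort vertices by the second eigenvector and take the best prefix. Here is a naive sketch of it, reusing the conductance helper above; a careful implementation maintains the cut incrementally to run in near linear time, which this simple version does not.

```python
import numpy as np

def sweep_cut(A, v2):
    """v2: second eigenvector of the normalized Laplacian of A."""
    d = A.sum(axis=1)
    # sweep in the D^{-1/2}-rescaled order, one common convention
    order = np.argsort(v2 / np.sqrt(np.maximum(d, 1e-12)))
    best_set, best_phi = None, np.inf
    for t in range(1, len(order)):
        S = order[:t]
        if d[S].sum() > d.sum() / 2:     # keep the volume at most half
            break
        phi = conductance(A, S)
        if phi < best_phi:
            best_set, best_phi = S, phi
    return best_set, best_phi
```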
Okay? Good. So, let me tell you some applications of this inequality. Cheeger's Inequality is one of the fundamental results in spectral graph theory. It has applications in various fields of computer science. For example, in algorithm design, many approximation algorithms use this inequality, or algorithms that look for a separator of a graph.
There's a whole literature in probability theory that uses Cheeger's Inequality to analyze the mixing time of random walks, and you can use this inequality to design algorithms that can sample from very sophisticated distributions. There are also many applications in complexity theory and cryptography, for example in constructing expander graphs or in error-correcting codes. Okay?
Okay. Now that we all understand Cheeger's Inequality, let me tell you our results, our contributions. As I said, we analyzed spectral graph algorithms through the lens of higher eigenvalues. In a joint work with Lee and Trevisan, we prove a higher order variant of Cheeger's Inequality relating the conductance of cuts in K way partitionings to lambda K, the K-th eigenvalue of the normalized Laplacian. This provides a rigorous justification of the spectral clustering algorithm that I described at the beginning of the talk.
In a joint work with Kwok, Lee, Lau, and Trevisan, we managed to improve Cheeger's
Inequality using higher eigenvalues of the graph. And this gives us a rigorous justification of the great performance of spectral partitioning algorithms in practice.
Also, in a joint work with Trevisan, we prove results on fast local algorithms for finding small communities in social networks. This basically gives an algorithm with the same guarantee as spectral partitioning, but with the advantage that it can be run in sublinear time. Okay? So I'm going to basically talk about these three results. I'll spend most of the time talking about the first result; I'll give you some ideas and then I'll talk about the last two. Okay?
So next I'm going to talk about the higher order Cheeger's Inequality. Okay, so for the next ten to fifteen minutes, I'm going to talk about this result, and then talk about the last two.
Okay. So here, we want to study the following problem, the K clustering problem, okay. We are given an undirected graph. It can be weighted, but let's assume it's unweighted to keep the notation simpler. We want to find K disjoint clusters S1 up to SK of small conductance. All right? So for example, if this is our graph, maybe we find these clusters, and the conductances are as follows.
Now, the quality of this clustering is just the worst conductance of these sets, the maximum conductance of the sets; so here it's the largest of these values. All right? So in general, if we find a clustering S1 up to SK, we define its quality as the maximum conductance, and our goal is to find a K clustering whose maximum conductance is as small as possible. Okay? So the optimum is the clustering that achieves this, and I'm going to write phi of K for the optimum. Okay, so again, phi of K is the value of a K clustering whose maximum conductance is as small as possible.
So let me tell you what we prove. In joint work with Lee and Trevisan, we show that phi of K is very well characterized by lambda K. This proves a conjecture of Miclo. The point is that, similarly to Cheeger's Inequality, there is no dependence on the size of the graph, so no matter how large or small the graph is, you still get the same characterization. There is a dependence on the size of the clustering, K, but not on the size of the graph.
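Up to the exact constants, and with the O(K squared) dependence on K that is mentioned a bit later in the talk, the quantitative form of this characterization reads:

```latex
\frac{\lambda_k}{2} \;\le\; \phi(k) \;\le\; O(k^2)\,\sqrt{\lambda_k}.
```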
Okay. Let me show you an example to understand this better. Say our graph is just a cycle. Okay? What is the K clustering of a cycle? Basically, what you want to do is find K paths, each of length about N over K. Okay? This is the best you can do. And therefore, phi of K will be about K over N, because the conductance of each of these sets will be about K over N, or two K over N.
Okay, so phi of K will be about K over N, and lambda K will be about K squared over N squared for a cycle. So putting these two together, you see phi of K is on the order of root lambda K; we don't even have a dependence on K in the right answer. Okay? Is this clear?
So what else do we prove? We show that if you are allowed to use the two K-th eigenvalue -- lambda two K instead of lambda K -- then we can significantly improve the dependence on K, from K squared to root log K. And if the graph is low dimensional, say a planar or bounded genus graph, we can completely get rid of the dependence on K.
Okay? So a weaker version of the second result was proved independently by Louis, Raghavendra, Tetali, and Vempala. Both the second and third results are optimal up to constant factors; for the first result we don't know, it's still an open problem. Okay?
>>: [inaudible] is bounded by constants?
>> Shayan Oveis Gharan: Yeah, the genus is bounded by --
>>: So what is the dependence on [indiscernible]?
>> Shayan Oveis Gharan: Logarithmic.
>>: Okay.
>>: [inaudible] lambda two K but it's at lambda half K?
>> Shayan Oveis Gharan: Yeah, we can do that. Yeah. For any constant greater than one.
>>: But does [inaudible]?
>> Shayan Oveis Gharan: Yeah. If you do one plus epsilon, then you lose some function of this. In addition to this, our proof is constructive. It basically gives you an algorithm to find S1 up to SK such that the maximum conductance is at most O of K squared times the square root of phi of K. All of this is constructive. Okay?
So the proof also provides a rigorous justification for the spectral clustering algorithm; for example, it shows that your data or your graph has a good K clustering if and only if lambda K is very close to zero. Okay? Let me tell you a little bit about the proof. Okay. Since I have limited time, I'll try to give you the main ideas. Okay? So for the next five to seven minutes, I'm going to talk about the proof ideas.
Before getting to the ideas, let me first define a continuous relaxation of conductance, and then tell you the ideas. Okay? So I'm going to use the Rayleigh quotient as a continuous relaxation of the conductance. Okay? For a vector X, the Rayleigh quotient is this quantity, okay? You don't need to understand it in detail; the point is that if our vector X is a zero-one vector, then this quantity is exactly equal to the conductance of the support of X. You can easily check that in this case the numerator is exactly the number of edges leaving the support, and the denominator is exactly the sum of the degrees of the vertices in the support. Okay? So basically, the point is that if you could, for example, find the zero-one vector minimizing this Rayleigh quotient, that would give us the sparsest cut of the graph; but you cannot do that, it's a hard problem.
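The quantity on the slide is the standard degree-weighted Rayleigh quotient; for a vector x indexed by the vertices it reads

```latex
\mathcal{R}(x) \;=\; \frac{\sum_{\{u,v\}\in E} (x_u - x_v)^2}{\sum_{v\in V} d_v\, x_v^2},
```

and when x is the 0/1 indicator of a set S, the numerator counts the edges leaving S and the denominator is the volume of S, so R(x) equals the conductance of S.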
Now let's see what the optimizers of this continuous relaxation are. The nice thing is that the optimizers, the minimizers, are exactly the eigenvectors of the normalized Laplacian matrix. Well, you need to do some normalization by the degrees, but after that normalization the optimizers are exactly the eigenvectors. In fact, V1 is the minimizer of this, and R of V1 is lambda one; V2 is the minimizer of the Rayleigh quotient over all vectors that are orthogonal to V1, and so on and so forth; V3 is the one that minimizes over all vectors orthogonal to V1 and V2, and so on. Okay? And R of V1 is lambda one, R of V2 is lambda two, and R of VK is lambda K.
So basically, we can think of V1 up to VK as a K dimensional orthonormal basis minimizing the Rayleigh quotient. In other words, you can think of our problem as a rounding problem: this K dimensional basis gives us a solution to the continuous relaxation of our K clustering problem, and we want to round it into an integer solution.
So how are we going to use this? We're going to use the continuous relaxation by embedding our graph in a high dimensional space. This is exactly what I did at the very beginning of the talk. Okay, so basically I map each vertex I to the vector V1 of I, V2 of I, up to VK of I.
Okay. So let me show you some examples to understand this better. For example, if the graph just has K connected components, then this spectral embedding maps each connected component to a separate point in this high dimensional embedding; each component maps to a single point. You can see that clustering will be very easy. Okay? If the graph is a cycle and K is equal to three, then the spectral embedding gives you exactly the cycle.
So what happens in general? In general, this embedding has two important properties which will be crucial for the proof. The first one is that the mapping spreads out in the space: the vertices cannot be concentrated in a few places; they have to spread over the whole space. Okay? Think of the cycle, for example. The second one says that adjacent vertices map to close points in this high dimensional embedding.
>>: [inaudible]
>> Shayan Oveis Gharan: I'm going to give the quantitative version -- yeah. So we're going to use the first statement, the first property, to argue that we can choose K disjoint clusters: because the points are spread out in the space, we can choose K disjoint clusters. We use the second property to choose our clusters from groups of close points. Okay? You see this on the cycle, for example.
So basically these are the two components of our proof. What I'm going to do in the next two slides is tell you about each of these components separately. Okay? So let me first start with the first component and then tell you about the second.
The first component is called the spreading property. We prove the following statement: each narrow cone through the origin contains at most, essentially, a one over K fraction of the vertices. Okay? Here is the exact quantitative version, but you do not need to understand it. The point is that if you look at any cone, it has only a one over K fraction, so there cannot be too many vertices in one direction; you have to spread over the space. Okay?
Now, the proof of this goes through an isotropy property, which shows that if you choose any unit vector in this high dimensional embedding and project all of the points onto this vector, the mass after the projection is exactly one over K of the mass before the projection.
So basically, all directions look the same: in each direction, you see only a one over K fraction of the mass. And the proof of this only uses the fact that our embedding comes from an orthonormal set of vectors. So we don't use anything special about eigenvectors; any embedding you give me from orthonormal vectors would satisfy this.
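As a quick numerical sanity check of this isotropy property, under the common convention that the embedding sends vertex i to (f_1(i), ..., f_k(i)) with f_j = D^{-1/2} v_j for orthonormal eigenvectors v_j of the normalized Laplacian, the degree-weighted second moment matrix of the embedding is exactly the identity, so every unit direction carries a one over K fraction of the total mass K:

```python
import numpy as np
from scipy.linalg import eigh

def isotropy_check(A, k):
    d = A.sum(axis=1)                     # assumes no isolated vertices
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, V = eigh(L, subset_by_index=[0, k - 1])
    F = d_inv_sqrt[:, None] * V           # row i is the embedding of vertex i
    # sum_i d_i F(i) F(i)^T should be the k x k identity; its trace, the total
    # mass sum_i d_i ||F(i)||^2, equals k
    M = (F * d[:, None]).T @ F
    return np.allclose(M, np.eye(k), atol=1e-8)
```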
Okay, so this was the first component. Now let me tell you about the second component, which is how we choose these disjoint clusters, the K disjoint clusters. Okay? In the previous component I wanted to argue that the vertices cannot be concentrated in a few places, right? Now, in this case, if the vertices are concentrated in K places, I can just return those K clouds, right? And that would give very good solutions.
The hard instances in this case are those where you have a kind of uniform distribution of the vertices over the whole space, because there you have to partition these vertices and separate vertices that are very, very close to each other, and that makes the problem hard. Okay? So in order to argue that we don't cut too many edges, we use random partitioning of metric spaces. So basically, we randomly partition the space, and because we do it at random, each edge is cut with some very small probability.
Okay? So here we use a vast literature on random partitioning of metric spaces, using the works of Charikar, Chekuri, Goel, Guha, Plotkin, Gupta, Krauthgamer, and Lee. Okay? Now, these random partitionings incur a loss that is a polynomial function of the dimension. Okay, and since we are in K dimensions, we get a loss polynomial in K.
Now, if you want to get a better loss, if you want to decrease the loss, the idea, as you may guess, is to do dimension reduction: to go from a K dimensional space to a log K dimensional space. But this wasn't an easy task. The reason is that, because K can be much smaller than N, dimension reduction will not preserve many of the pairwise distances.
Okay. Nonetheless, we show that it preserves the spreading property and the average edge length with very high probability, and that is what is essential for our proof.
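The dimension-reduction step can be pictured as a generic Johnson-Lindenstrauss-style random projection of the K dimensional spectral embedding down to roughly log K dimensions; the content of the claim in the talk is that such a projection preserves the spreading property and the average edge length, not all pairwise distances. A bare sketch:

```python
import numpy as np

def random_projection(F, target_dim, seed=0):
    """Project the n-by-k embedding F onto target_dim random Gaussian directions."""
    rng = np.random.default_rng(seed)
    k = F.shape[1]
    G = rng.standard_normal((k, target_dim)) / np.sqrt(target_dim)
    return F @ G   # row i is the projected image of vertex i

# e.g. target_dim on the order of log(k), an illustrative choice:
# F_low = random_projection(F, max(1, int(np.ceil(np.log2(F.shape[1])))))
```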
Okay. So before finishing this part of the talk, let me compare what we did with the proof of Cheeger's Inequality. In Cheeger's Inequality, as I said, we just use the second eigenvector. We basically map the vertices to the real line based on their values in the second eigenvector. Because this is just a one dimensional embedding, you can test all cuts, okay, and choose the best of them.
Here, instead of choosing one eigenvector, we use multiple eigenvectors and map the vertices into a high dimensional space. Now you can see where the difficulty comes from: we cannot test all of the cuts, there are exponentially many. So we use random partitioning to avoid cutting too many edges. So you can see our proof as a high dimensional variant of the proof of Cheeger's Inequality.
Okay. So let me conclude this part of the talk. Cheeger's Inequality is one of the fundamental results in spectral graph theory. It has applications in various fields of computer science. We managed to generalize this inequality to higher dimensions -- sorry, to K way partitionings.
Our proof gives a rigorous justification for the spectral clustering algorithms that use multiple eigenvectors. In addition to that, our proof introduces new components that can be used to possibly improve the quality of spectral clustering algorithms.
So here's an example that I tried. Say you want to cluster these data points, okay? Now, I used the original spectral clustering algorithm and this is what I get; then I applied the dimension reduction, and this significantly improved the quality. Okay? I'm not claiming that dimension reduction always gives a better answer, I'm just saying it may help in some cases to get better quality solutions. Okay? One reason for that is basically that partitioning in a lower dimensional space is easier.
Any questions? Okay. So I'm done with the higher order Cheeger's Inequality. Now let me talk about the improved Cheeger's Inequality. First, let me tell you about the tightness of Cheeger's Inequality. Okay? It turns out that both sides of Cheeger's Inequality are tight. The left side is tight for the hypercube, and the right side is tight for a cycle. For a cycle, for example, lambda two is about one over N squared, and phi of G is about one over N. Okay?
So if it is tight, how can we improve it? Let's look at the K-th eigenvalue of a cycle. It's about K squared over N squared, yeah? Now, you can see that K lambda two over root lambda K for a cycle is about root lambda two. So although root lambda two was tight, this expression is also tight.
So basically we show that this wasn't a coincidence for the cycle; it actually holds for any graph. For every graph G, phi of G is at most O of K lambda two over root lambda K.
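In symbols, the improved inequality says that for every graph G and every K,

```latex
\phi(G) \;\le\; O(k)\,\frac{\lambda_2}{\sqrt{\lambda_k}},
```

and, as noted next, this bound is achieved by the plain spectral partitioning algorithm that only looks at the second eigenvector.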
The importance of this is that the guarantee is achieved by the spectral partitioning algorithm that uses only the second eigenvector. This was very surprising to us, because the spectral partitioning algorithm knows nothing about higher eigenvalues and higher eigenvectors. Okay, but still you can analyze it using the higher eigenvalues and prove guarantees.
So let me say this differently. In general, you may have graphs where lambda two is very close to zero, say lambda two is one over N. Then the original analysis of spectral partitioning would tell you that the solution of spectral partitioning can be many times worse than the optimum; it could be very bad. Okay, but here what we say is that although lambda two could be very, very small, if I know that lambda K is much, much larger, I can still show that this algorithm gives a very good answer.
This is actually what happens sometimes in practice. Say you want to do image segmentation. Typically in image segmentation, you have, say, K or K minus one objects. Now you construct a graph based on these objects, right? You put a vertex for each pixel and connect them.
Now, it turns out that the K-th eigenvalue of the graph you construct would typically be very large, very close to one. The reason is that this graph would look like a union of K expanders.
Now, by our analysis we can argue that phi of G is at most on the order of K lambda two; therefore, the solution of the spectral partitioning algorithm is only on the order of K times worse than the optimum.
So it's actually very close to the optimum. And that is exactly why people use this algorithm in practice and get very good results: because many of the practical applications satisfy this property.
Okay. So this was what I wanted to say about this result; basically, you can segment this image and get these objects. Let me tell you very briefly, in two to three minutes, about the last result: finding small communities in large social networks. First, let me tell you why I'm interested in finding small communities, then tell you what we do.
So basically, here are the three reasons. The first one is that Leskovec, Lang, Dasgupta, and Mahoney did a practical, empirical analysis and argued that the best communities in large social networks are very, very small; they have size only about a hundred. Okay? The second reason is that typically a small community represents a group of people with similar interests, whereas large communities would correspond to larger scale factors, such as age or ethnicity. Okay.
And lastly, if you can find a partitioning into small communities, that would also give you a partitioning into large ones. This is because if you have two communities of small conductance, their union has small conductance as well. So basically you can just combine the small communities and get a partitioning into large communities.
So what do we prove? We show the following: for any small community, for any target set T that you would like to find, we can find a set S which is only slightly larger than T, and we can guarantee that the conductance of S is at most the square root of the conductance of T. And we can do it very fast, in time proportional to the output size; S could be much smaller than the size of the graph.
So this basically improves upon the results of Spielman and Teng, Andersen, Chung, and Lang, and Andersen and Peres. Basically, these results show that you can do the same thing, but their guarantee has a dependence on the size of the graph: it has a root log N dependence on the size of the graph. So we managed to get rid of this dependence and get a guarantee very similar to what you get from spectral partitioning and Cheeger's Inequality. Also, a weaker version of this was proved independently by Kwok and Lau.
So what we show basically implies that you can get an algorithm with almost the same guarantee as spectral partitioning, but with the advantage that it can be run in sublinear time. Our proof also gives improved lower bounds on the mixing time of random walks, which I'm not going to get into.
Okay. So let me conclude what I said during this talk. I talked about spectral graph algorithms through the lens of higher eigenvalues. I talked about these three generalizations and strengthenings of Cheeger's Inequality and the spectral partitioning algorithm. Our results use developments in high dimensional geometry. They also develop new techniques, new tools, that have been used elsewhere to get new results. For example, we used these techniques to give universal bounds on higher eigenvalues of graphs, giving improved approximation algorithms for max cut, min bisection, and so on. We also proved a new regularity lemma. Okay? I'm not going to talk about these results here; you can ask me offline.
All right. So I have worked on the approximability of various problems, and I won't have time here -- I can only give one talk, I cannot tell you about all of this -- so let me tell you about approximating the traveling salesman problem. Okay?
So let me remind you of the traveling salesman problem. Say this may have happened to many of you: you go for a short visit to a new place, maybe you go to a conference in Seattle. And perhaps on the second day of the conference you get tired, so you decide to visit some places. Maybe in this case you want to visit Parker Street, Madison Park, the University of Washington, Green Lake, and the Space Needle. I [indiscernible] where it was; a little bit farther.
So basically, you don't have an unlimited amount of time; you want to head back to the conference as soon as possible. The question is: what is the fastest route to visit all these places? And maybe in this case, this is the fastest route. This problem is known as the traveling salesman problem. So again, you have a set of places; you want to visit all of them and return to the starting point.
This problem has many variants. These are, I would say, three of the most important ones. The most famous one is symmetric TSP, where we assume that the distance function is symmetric. Okay. Going from here to, for example, U Dub has the same cost as going from U Dub to here.
Then there's the asymmetric version of TSP, where we don't assume this symmetry property; the distances could be different. This is a generalization of symmetric TSP. And then there's the Euclidean version, where we assume the points are embedded in the plane and the distance between a pair of points is the Euclidean distance between the corresponding points in the plane. Okay?
So let me tell you what was the state of the art before our work. There is a 1.5 approximation algorithm by Christofides that hasn't been improved for over 35 years. There is a log N approximation by Frieze, Galbiati, and Maffioli; although many people have tried, they only managed to improve the constant in front of the log N. And there was a PTAS, a polynomial time approximation scheme, for Euclidean TSP, which means that you can get an approximation ratio very close to one in polynomial time. Okay?
So what did we do? Here is just a summary. In a joint work with Saberi and Singh, we managed to improve Christofides to a 1.5 minus epsilon approximation for a canonical, important special case of TSP called graphic TSP.
In a joint work with Asadpour, Goemans, Madry, and Saberi, we managed to break the log N barrier for asymmetric TSP and improve it to log N over log log N.
Also, in a joint work with Saberi, we managed to give a constant approximation for asymmetric TSP on planar graphs. Okay? In addition to that, the proofs of these results develop a new technique in algorithm design called rounding by sampling. It has been used for many other problems, many other applications.
All right. So in the very limited amount of time that remains, let me very briefly tell you about the future. Let me tell you some interesting problems. Here is a theoretical problem: there is an inherent connection between spectral algorithms and some very hard conjectures in theoretical computer science, like the unique games conjecture.
So let me remind you that I showed that, by the higher order Cheeger's Inequality, we know phi of K is at most O of root log K times root lambda two K. I told you that this root log K is necessary; you cannot get rid of it, it's tight. Now, Arora, Barak, and Steurer showed that this root log K is not necessary if K is very large. So although for small K it is necessary, if K is polynomial in N, it is not necessary; you can get rid of it. And using that, we managed to get some improvements.
Now the point is, if you can improve this result a little bit, from K being N to the epsilon to, say, two to the root log N, then that would refute the unique games conjecture. It could possibly imply many advances in designing approximation algorithms for many problems, like max cut, vertex cover, and so on.
Okay, here is another problem. I'm also very interested in online optimization, and I've done some work in this direction -- you can ask me offline -- but here is an interesting problem. Let me remind you that in online optimization, the difficulty is that we don't have full access to the input. Okay, the input arrives online, and we have to decide irrevocably once we see a new element. Okay?
So what we usually do is compare ourselves with the optimum offline algorithm. Okay? I call this the information god: this is the algorithm that has full knowledge of the input. Okay. Now, usually there is an information theoretic barrier; because we don't have full information about the input, we cannot get very close to this information god. Okay?
Now, ideally, we would like to compare ourselves to the optimum online algorithm. This is the algorithm with the same knowledge as us, but with the advantage that it has unlimited computational power. I call this one the computation god. Okay? Now, it's very easy to see that the computation god is always less powerful than the information god, because it has less information; it has the same information as us. So competing with it is easier, and you can hope to get past the information theoretic barrier.
Okay. Let me be more specific. In particular, say we have some online optimization problem, and we are given the distribution of the arriving inputs a priori, so we know the distribution. Once you know the distribution, you can compute the optimum online algorithm in exponential time: you can write an exponential time dynamic program and compute it. Now the question is: how can you approximate this optimum online algorithm in polynomial time? The answer to this question can have a lot of applications in stochastic optimization, for example in online advertising or in flight scheduling, and many other fields.
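As one concrete (and deliberately tiny) instance of the kind of dynamic program being referred to: suppose items arrive one by one, each taking a value drawn from a known finite distribution, and we may irrevocably accept at most one item. Backward induction then computes the optimum online policy, the "computation god" for this toy problem; for richer problems the state space of such a program blows up, which is exactly why approximating it in polynomial time is the interesting question. The setting and names here are illustrative, not from the talk.

```python
def optimal_online_value(dists):
    """dists[i] is a list of (value, probability) pairs for arrival i.
    Returns the expected value of the optimal online (threshold) policy."""
    future = 0.0                          # expected value if everything remaining is skipped
    for dist in reversed(dists):
        # at this step, accept a realized value v iff v >= value of continuing
        future = sum(p * max(v, future) for v, p in dist)
    return future

# Example: two arrivals, each uniform on {0, 1, 2} -> optimal online value 4/3.
print(optimal_online_value([[(0, 1/3), (1, 1/3), (2, 1/3)]] * 2))
```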
Okay. So let me finish up with this last slide. As a theoretician, I usually think about how my research can impact practical applications, and I try to have this in mind when I choose my direction of research. So here are two ways in which I think a theoretician can be helpful in practice. The first one is to design new tools and new techniques that can help other people design algorithms and heuristics in practice. The second one is to provide rigorous justifications for many of the heuristics that people use in practice but don't have any idea why they are working.
Okay? Okay. Let me stop here, and I would be happy to take your questions.
>>: [applause]
>>: Questions?
>>: So, like [indiscernible] higher eigenvalues, like how much time does it take if I want to get, say, the first K eigenvectors? Is it K times --
>> Shayan Oveis Gharan: K times N you can do.
>>: -- N.
>> Shayan Oveis Gharan: Theoretically, you need to use these nearly linear time Laplacian solvers, and using that you can do it in about K times N.
>>: [inaudible]
>> Shayan Oveis Gharan: Yeah [inaudible].
>>: [inaudible] to the improved Cheeger Inequality, in what way you want to use it?
>> Shayan Oveis Gharan: So the point is you can use this to get very fast algorithms for low threshold rank graphs. These graphs are a generalization of expanders. In expanders, you have the property that the second eigenvalue is very close to one; here, the first K eigenvalues could be small, but the K plus one eigenvalue is very close to one.
Now, there are many other papers that study algorithms on low threshold rank graphs. You can use sophisticated SDPs or the Lasserre hierarchy to study them.
But here, what we want to say is that you can use this very simple algorithm, which just computes the second eigenvalue, the second eigenvector, to get very good performance for these problems on low threshold rank graphs.
>>: Any more questions?
>>: In spectral clustering, I just tried to use it in practice a few times, and I'm always a little bit confused about, you know, how to pick the right type of Laplacian matrix and how [indiscernible] graph the eigenvectors. Are there any -- can you give me guidance on those kinds of questions, or are there people working on those kinds of questions? This type of problem?
>> Shayan Oveis Gharan: Right, so here we use one particular Laplacian, the normalized Laplacian, and we use the eigenvectors of this. In the literature I've seen people use different eigenvectors and different normalizations. There is a very nice survey by von Luxburg, I think, that basically gathers the different normalizations. You can use the eigenvectors of the normalized Laplacian, or of the random walk matrix, or in fact of the adjacency matrix without any normalization. All three are studied.
>>: I think part of the question is, even when we start with this data, how do we choose which graph to look at?
>>: Right. Yeah. [indiscernible]
>>: [indiscernible]
>> Shayan Oveis Gharan: Yeah, so, yeah, if you want to -- yeah. So one question is how you construct the graph; the second one is, once you construct it, which normalization do you use? That was the question, and that was the answer about the normalization. Now, for constructing the graph, again, there are many ways; I described two of them in the first slide. One of them was to pick some threshold and connect the vertices that are close to each other. The second one was to construct a complete weighted graph where the weight of each edge is some function of the distance, for example an exponential function of minus the distance squared.
>>: Right, yeah. Well, both of those things have a magic number, right? One is the threshold, and one is [inaudible] --
>> Shayan Oveis Gharan: Right, so --
>>: -- to the exponent.
>> Shayan Oveis Gharan: So, for example, in this paper of Ng, Jordan, and Weiss, they argued that what you should do is basically try very different thresholds and see which one gives you the best answer.
>>: So there's no somehow --
>> Shayan Oveis Gharan: If, for example, you assume your data comes from some Gaussian distribution and you have some prior knowledge or bounds on the variances of the Gaussians, then you can use that.
>>: [inaudible] but isn't this just guessing --
>> Shayan Oveis Gharan: Yeah, you can do it that way, but it takes more time -- yeah.
>>: Any other questions?
>>: [applause]