>> Yuval Peres: Okay. So we're delighted to have Dan Spielman from Yale with us, and he'll tell us about graph approximation and local clustering, with applications to the solution of diagonally dominant linear systems. >> Dan Spielman: Thanks. Okay. So, yeah, it used to be the other way around. A couple of years ago Shang-Hua Teng and I started looking at how to solve linear equations quickly, and we started following up on an approach introduced by Pravin Vaidya that I'll explain to you in a few minutes. Our goal was to find nearly linear time algorithms for solving linear equations, at least from reasonable families, using tools from graph theory. In the end we wound up using a heck of a lot of tools from graph theory to do it. And I think this is one of those rare instances where the parts are actually worth more than the whole. I really got into the graph theory and the graph algorithms involved in this, and while I really like solving linear equations, I think that for most audiences the graph theory is more interesting. So that's what I'm going to focus on in this talk. In particular, I'll talk about local clustering, which I'll explain in a minute, and about the various notions of what it means to approximate one graph by another that we wind up using here. Usually we're going to approximate a graph by a sparser graph. I should also say: feel free to stop me at any time with questions or whatever. This is a small audience and I'm happy to go off on tangents if we need to. So why don't I tell you first about the tools we're going to use. The first one is a notion of what it means to approximate one graph by another graph. It's wrong to say we invented it; it's really forced on us by numerical analysis, and I will explain exactly how and why in a few minutes. Then we're going to look at sparsification, which is taking an arbitrary graph and approximating it by a sparse graph.
Now, there are different notions of sparse. The first one that comes up is approximating a graph by a graph whose number of edges is linear in the number of vertices. As you'll see later, I'll say an expander graph is the right approximation of the complete graph; I'll explain that first. Then we're going to talk about approximating graphs by even sparser graphs, say by trees. And then the question is what the right notion is here. It turns out the right notion is something called a low stretch spanning tree, and I will explain to you why and tell you a little about those. What I didn't put on this slide is that what we really use is a combination of these two ideas, which gets us a notion of approximating our graph by something that looks like a tree plus a few extra edges added back in, or at least a sublinear number of extra edges. That's really what we'll wind up using. As I said, we were interested in nearly linear time algorithms, so once we had these notions, the first question was how to come up with them quickly. And to do sparsification, it turned out we needed a lot of other tools. We needed some fast algorithms for graph partitioning. Now, I should say there are a heck of a lot of algorithms out there for graph partitioning. Some we can prove theorems about, some we use in practice, and the two occasionally meet. But we wanted to prove a theorem and have something that we could prove would run really quickly. The way we got that was through a clustering algorithm, and we came up with a problem that we call local graph clustering. So I'll explain this problem to you this way. Let's imagine that you have a giant graph, like, you know, the graph of everyone who's called everyone in the United States; someone probably has that graph. And then you have a node of interest.
And you want to say: tell me something about this node, like, you know, what's the small cluster around or near this node? So what we wanted was an algorithm that wouldn't look at the whole graph; it would start from that node, explore the graph around that node, and then try to output something useful, in our case a cluster. And we want to find the cluster in time proportional to the size of the cluster, or just a little bit more. I'll state this problem more precisely later in the talk. I won't actually tell you how we did it; I will tell you about an improvement upon how we did it by Andersen, Chung and Lang, which some of you may have heard about, because Reid Andersen was here until recently, I believe until a few months ago. So those are the tools. Okay. Let me go back to my motivation, which is solving linear equations. It's something I feel is often underappreciated by a lot of my generation, especially in computer science, even though a lot of computer science departments started out with this as their central problem. The reason I'm very interested in solving linear equations is that I find it to be the bottleneck computation in a lot of problems: in almost all of optimization and in quite a good deal of machine learning. It dominates the cost of a lot of computations. One reason is that a lot of people just do it naively; they use backslash in MATLAB, which, while it's great, you can always speed up for any particular application. But also there are a lot of improvements in the field that haven't made their way into backslash. So my belief is that solving linear equations should be like sorting. It should take time nearly linear in the number of bits you need to specify the linear equations, or the number of nonzeros in your matrix, and you shouldn't hesitate to do it.
You know, just as you don't hesitate to sort your data, my belief is you shouldn't hesitate to solve linear equations. Now, are we ever going to reach this world? Probably not for general linear equations, but I believe that in any particular application domain where your equations have some characteristic structure, yes, this is achievable. What we're starting with is the first family people almost always study when they're studying such things in numerical analysis: symmetric diagonally dominant systems of equations. And I'll explain what those are in a minute. The reason the folks in numerical analysis first studied these is that, aside from having many nice properties, they come up when you're doing things like solving elliptic partial differential equations by the finite element method, so they come up in scientific computing. Not everyone knows they come up in optimization. If you're trying to solve the maximum flow problem and you apply an interior point algorithm, basically what you're doing is iteratively solving diagonally dominant linear equations. They also come up in spectral graph theory, as the Laplacians of graphs, and that's where I will start. So rather than telling you all about diagonally dominant linear equations, let me just tell you about Laplacians, and we'll focus on them for this talk. So how many of you have seen the Laplacian matrix of a graph before? Okay. A lot, not everyone. So there's an isomorphism between Laplacian matrices and weighted graphs. If I have a weighted graph like this one here, the off-diagonal entries of its Laplacian matrix are the negatives of the edge weights. So for example, I have an edge of weight four between nodes two and three, so both the (2,3) and (3,2) entries of the matrix are minus four.
The diagonal entries of the matrix are the weighted degrees. That's set up so that if I take the sum of the entries in any row or column, I get zero. This is diagonally dominant, which just means that each diagonal entry is at least the sum of the absolute values of the other entries in its row. The matrix itself might not be that illuminating. What's more illuminating is the quadratic form associated with the matrix. Here you think of x as a vector assigning a real number to every vertex in the graph, so it lives in R^V. If you take x^T L x, which is one of the standard things to do with a matrix, what you get is a sum over all edges in the graph, with one term for each edge: you take the weight of the edge times the square of the discrepancy between the value of the vector at node i and at node j. That is, x^T L x = sum over edges (i, j) of w_ij (x_i - x_j)^2. That's what this quadratic form is, and it's usually what we'll deal with. It's very naturally associated with the graph, and it's what we'll use to define a notion of approximation: when I talk about approximating one graph by another graph, it's going to be in terms of this quadratic form. This may be a good place to ask if there are any questions about this quadratic form, because, well, you'll see other concepts, but this is the key one. Okay. So how did I get into this business? Well, I heard about this paper written by Pravin Vaidya, and the sound bite I heard about it was that he was going to solve systems of linear equations in Laplacian matrices by preconditioning with a maximum spanning tree. Now, when I heard about this, I had no idea what it meant, but I was completely amazed that a maximum spanning tree would come up at all in the problem of solving linear equations. Unfortunately, Vaidya's paper was rejected from a conference, and his grant proposal was rejected.
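The correspondence between a weighted graph, its Laplacian, and the quadratic form can be checked with a short sketch. This is plain Python on a small made-up graph (only the weight-4 edge echoes the slide; vertices are numbered from 0, so nodes two and three become vertices 1 and 2). The helper names are illustrative, not from the talk.

```python
def laplacian(n, edges):
    """Dense Laplacian (list of lists) of a weighted graph on vertices 0..n-1.
    edges: list of (i, j, w) with i != j and w > 0."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w  # off-diagonal entries are minus the edge weights
        L[j][i] -= w
        L[i][i] += w  # diagonal entries are the weighted degrees
        L[j][j] += w
    return L


def quadratic_form(L, x):
    """x^T L x computed directly from the matrix."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def edge_sum(edges, x):
    """The same quantity written as a sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


# Hypothetical example graph: 4 vertices, one edge of weight 4 between vertices 1 and 2.
edges = [(0, 1, 1.0), (1, 2, 4.0), (2, 3, 2.0), (0, 3, 1.0)]
L = laplacian(4, edges)
x = [1.0, 3.0, 0.0, -2.0]
print(quadratic_form(L, x), edge_sum(edges, x))  # 57.0 57.0
```

Every row of `L` sums to zero, and the two ways of evaluating the form agree term by term, which is why the rest of the talk can reason about edges instead of matrix entries.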
We know about it essentially because he sent a manuscript to John Gilbert before he went off and started a company selling linear equation solvers, and unfortunately I think because of this we lost a really brilliant mind from academia. He'd solved a number of other great problems beforehand. It was a visionary paper, written around 1990. A couple of people bit the bullet and actually, you know, wrote up papers explaining the ideas in it. >>: [inaudible]. >> Dan Spielman: Yes. Still in business, anyway. When I checked a few years ago, there were some people paying him a couple million bucks a year to do this; in the scientific computing business, there are a lot of people who want to solve linear equations quickly. He, however, hasn't talked to us much about what we're doing now. Anyway, who knows; I don't know what he's doing in his company. These papers are listed in order of when they were written, as opposed to when they were published, and I have a web page dedicated to all this stuff if you want to find it. So let me go back to explaining what Vaidya was talking about. First of all, how many of you have seen the preconditioned conjugate gradient method for solving linear equations? Okay. Very few. So let me tell you about it, because it's what we're going to use and it's really an amazing algorithm. We want to solve linear equations Ax = b, where A needs to be positive semidefinite. The first idea is that we take A and approximate it by another matrix, B, called the preconditioner. Then there's an iterative algorithm, and in each iteration you solve one system of linear equations in B and do one multiplication of a vector by A. Now, that's a very natural idea in a lot of computer science: to solve a problem, you find a related simpler problem and solve that one instead.
What's different about preconditioned conjugate gradient is that you can make your solution to Ax = b as accurate as you want, while only solving systems in B. The number of iterations is the square root of the condition number times log(1/ε), where ε is the accuracy; the key point is the log(1/ε). The condition number, which I'll explain in a moment, is a measure of how well B approximates A and vice versa. After that many iterations you get a solution of accuracy ε, meaning that if you take the x you get minus the actual solution you want and look at the norm of that vector (it's the A-norm, but let's not worry about that), it's at most ε times the norm of what you're searching for, which is as good as you could hope for. Again, I find it amazing that by solving an approximate problem you can leverage as good a solution as you want to the original problem. I keep looking for that paradigm somewhere else, and I haven't been able to find it anywhere else, but it's a remarkable thing you can do. I mention preconditioned [inaudible], just another variant, which we actually use in the paper. Okay. So what this meant is that Vaidya was approximating the Laplacian of a graph by the Laplacian of a tree, and he took a maximum spanning tree of the graph. Now let me tell you how we're going to measure approximation. Here's one definition of the condition number of A with respect to B that will be familiar to you if you do a lot of linear algebra. It's the maximum over vectors x of the quadratic form in A over the quadratic form in B, times the maximum of the reverse ratio. These ratios are what are called generalized Rayleigh quotients. The first maximum is also the largest eigenvalue of A times B inverse if they're both positive semidefinite, and the second is the largest eigenvalue of B times A inverse. So that's one definition.
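To make the iteration concrete, here is a minimal textbook preconditioned conjugate gradient loop in Python with NumPy; it is an illustration, not code from the talk. Each iteration does one multiplication by A and one solve with the preconditioner B, as described. The test problem is an assumption made for the example: A is a graph Laplacian plus 0.1 on the diagonal (so it is symmetric, diagonally dominant and positive definite), and B is the same construction on a spanning path of that graph, in the spirit of Vaidya's tree preconditioners. The B-solve uses a dense `np.linalg.solve` only for brevity; with a tree one would use the linear-time elimination the talk describes.

```python
import numpy as np


def laplacian(n, edges):
    """Dense Laplacian of a weighted graph; edges are (i, j, w) triples."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def pcg(A, b, solve_B, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for symmetric positive definite A.
    solve_B(r) returns the solution z of B z = r for the preconditioner B."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = solve_B(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p                      # the one multiplication by A
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = solve_B(r)                  # the one solve with B
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


# Hypothetical test problem: a path (spanning tree) plus 13 extra chords.
n = 60
tree = [(i, i + 1, 1.0) for i in range(n - 1)]
extra = [(i, (i + 7) % n, 1.0) for i in range(0, n, 5)] + [(n - 1, 0, 1.0)]
A = laplacian(n, tree + extra) + 0.1 * np.eye(n)
B = laplacian(n, tree) + 0.1 * np.eye(n)
b = np.arange(n, dtype=float)

x, iters = pcg(A, b, lambda r: np.linalg.solve(B, r))
print(iters, np.linalg.norm(A @ x - b))
```

Because B differs from A only in the 13 extra edges, B^(-1)A is the identity plus a term of rank at most 13, so in exact arithmetic the loop would finish in at most 14 iterations; in floating point it still stops far short of n.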
I find this definition awkward to work with, so let me give you another definition that amounts to the same thing. We want a notion of an inequality on graphs: what does it mean for one graph to be greater than or equal to another graph? Here's the definition that comes from linear algebra. I'll say the graph H is less than or equal to the graph G, and I'll give you two equivalent definitions. One is that the Laplacian of G minus the Laplacian of H is positive semidefinite. More concretely, for all x, the quadratic form in H is less than or equal to the quadratic form in G. If that's true for all x, then we will say that H is less than or equal to G. For example, to check that you're following: if you go back to the formula for the quadratic form, it should be easy to observe that if H is a subgraph of G, then H is less than or equal to G, because these quadratic forms are just sums over the edges of squares. So if you drop some of the edges, the quadratic form gets smaller. Okay. So we have that notion of inequality -- yes? >>: [inaudible] equivalent to saying it's a weighted subgraph? >> Dan Spielman: Yes, that's one way to get it: if you just decrease the weights on edges, you get something less than or equal. Okay. Given a notion of what it means for one graph to be less than or equal to another, we can make a notion of approximation. It turns out that if you can sandwich G between H and κ times H, then you've got a κ-approximation: the condition number is at most κ. Actually, constants here don't matter. If I can sandwich G between c times H and c times κ times H for some constant c, then H is a good approximation of G, and the quality is κ. >>: [inaudible]. >> Dan Spielman: Sorry. Actually it does go both ways, if both of the graphs are connected. There's a caveat or two.
Or if they have the same connected components; then yes, it's if and only if. I will always work with this latter definition. Well, probably I'll use this one in my proofs, because it's easier for me to take subgraphs; I'll always, or almost always, approximate graphs by subgraphs. Let me give you an example. Let's take the complete graph, and let's just look at vectors orthogonal to the all-ones vector, because the all-ones vector is in the null space. For the complete graph, on this space the Laplacian is N times the identity: x transpose times the Laplacian of the complete graph times x is always equal to N. So what's a good approximation of the complete graph? It's called an expander graph. Actually, how many of you are familiar with expanders? Good. Pretty much everyone. For those who aren't, they are the most useful thing abstract mathematicians have ever invented, in my personal view. People teach courses about them; I can't begin to say enough. Let me just say that some of the most famous examples are the Ramanujan expanders, which you can characterize this way. They're d-regular graphs, and the property they have is that for all vectors x, the quadratic form in the Laplacian of the expander is very tightly controlled: it always lies between d - 2√(d - 1) and d + 2√(d - 1). So the eigenvalues of the Laplacians of Ramanujan expanders are very tightly concentrated. >>: [inaudible]. >> Dan Spielman: Oh, I forgot to say x has norm one. Yes. Thank you. I knew there was something I took off one of these slides when I gave them last and forgot to put back on. Yeah, x has norm one. That's right. Thanks. Okay. So what does this tell you? For the complete graph the quadratic form is very, very concentrated, and for Ramanujan expanders it's pretty darn concentrated.
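The graph inequality and the condition number can both be checked numerically on small examples. The sketch below (Python with NumPy; the helper names are illustrative, not from the talk) tests H ≤ G by checking that L_G - L_H is positive semidefinite, and computes κ as the ratio of the largest to the smallest generalized eigenvalue of the pair (L_G, L_H) on the space orthogonal to the all-ones vector, assuming both graphs are connected. It also confirms the complete-graph fact: apart from the zero eigenvalue for the all-ones vector, every eigenvalue of the Laplacian of K_n equals n.

```python
import numpy as np


def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def loewner_leq(L_H, L_G, tol=1e-9):
    """H <= G iff L_G - L_H is positive semidefinite."""
    return np.linalg.eigvalsh(L_G - L_H).min() >= -tol


def relative_condition_number(L_G, L_H):
    """max_x (x^T L_G x / x^T L_H x) * max_x (x^T L_H x / x^T L_G x),
    over x orthogonal to the all-ones vector; assumes G and H are connected."""
    n = L_G.shape[0]
    # Orthonormal basis P for the complement of the all-ones vector.
    Q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
    P = Q[:, 1:]
    G, H = P.T @ L_G @ P, P.T @ L_H @ P
    # Generalized eigenvalues of (G, H) are the eigenvalues of H^{-1/2} G H^{-1/2}.
    w, V = np.linalg.eigh(H)
    H_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    ev = np.linalg.eigvalsh(H_inv_sqrt @ G @ H_inv_sqrt)
    return ev.max() / ev.min()


n = 8
path = [(i, i + 1, 1.0) for i in range(n - 1)]
cycle = path + [(n - 1, 0, 1.0)]
L_path, L_cycle = laplacian(n, path), laplacian(n, cycle)
print(loewner_leq(L_path, L_cycle))   # True: the path is a subgraph of the cycle
print(loewner_leq(L_cycle, L_path))   # False
print(relative_condition_number(L_cycle, L_path))  # approximately 8

complete = [(i, j, 1.0) for i in range(n) for j in range(i + 1, n)]
print(np.linalg.eigvalsh(laplacian(n, complete)))  # 0 once (all-ones vector), then 8 seven times
```

The cycle-versus-path value of 8 matches the sandwich picture: the path is below the cycle, and the cycle is below 8 times the path, because the one missing edge is bounded by its path of length 7 plus itself.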
This means that if I take this Ramanujan expander and multiply it by N over d, then all the eigenvalues will be near N, and the quadratic form will always be near N. What does that mean? It just means re-weighting the edges: multiply the weight of every edge by N over d. Then I get this nice approximation. Another way of saying it is that the complete graph is between something times the Ramanujan expander and something just a little bit more times the Ramanujan expander. So for me, expanders are the best approximations of the complete graph. Okay. Our goal in sparsification is to be able to do this for any graph. As for the all-ones vector: it just maps to zero, because it's in the null space of the Laplacian, so adding it in really doesn't affect anything, and this does hold for all x. Good question. Okay. So here is what we want to do. I said one of our first goals is taking any graph and approximating it by a sparse graph. I've put up here what that looks like in the quadratic form sense, and we want to be able to do this for any graph, not just the complete graph. One thing that helped us is that Benczur and Karger already did something very similar many years ago, in '96, studying cut problems. They did this for all vectors in {0,1}^V, so every x_i is zero or one. These are the characteristic vectors of cuts. So you take this quadratic form and put in a zero-one vector, the characteristic vector of a set. Whenever i and j are both in the set you get zero; if neither is in the set, you get zero; when an edge goes from inside the set to outside the set, you get the weight of that edge. So the idea of Benczur and Karger's paper was that they wanted to take any graph and get another graph which is sparse, so that for every single set of vertices, the weight of the edges leaving it would be approximately the same.
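The cut case is easy to verify directly. In this small Python sketch (the graph is made up for illustration), plugging the characteristic vector of a vertex set S into the quadratic form returns exactly the total weight of edges leaving S, for every subset S. These zero-one vectors are the only test vectors Benczur and Karger's sparsifiers must respect; the spectral notion asks for the same approximation for all real vectors x.

```python
from itertools import combinations


def quadratic_form(edges, x):
    """x^T L x = sum over edges of w * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


def cut_weight(edges, S):
    """Total weight of edges with exactly one endpoint in S."""
    return sum(w for i, j, w in edges if (i in S) != (j in S))


n = 4
edges = [(0, 1, 1.0), (1, 2, 4.0), (2, 3, 2.0), (0, 3, 1.0), (1, 3, 3.0)]
for size in range(n + 1):
    for S in map(set, combinations(range(n), size)):
        chi = [1.0 if v in S else 0.0 for v in range(n)]  # characteristic vector of S
        assert quadratic_form(edges, chi) == cut_weight(edges, S)
print(cut_weight(edges, {0, 1}))  # 8.0: edges (1,2), (0,3) and (1,3) cross the cut
```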
And this enabled them to solve cut problems, or max flow, faster, in time roughly proportional to the number of vertices. They showed how to do this in nearly linear time, with essentially n log n / ε² edges, via a random sampling procedure. What we needed was the same thing, but our x's don't necessarily live in {0,1}: they are arbitrary real vectors. Still, we can take a lot of inspiration from what they did. So I'll tell you briefly what I know about this problem. >>: [inaudible]. >> Dan Spielman: Oh, we want the size of F. Oops. Thank you. That should be the size of F. >>: For any [inaudible]. >> Dan Spielman: Yeah. For any graph. Can I even edit that? I won't worry about it; I'll do it later. You want the number of edges to be small. Okay. What Shang-Hua Teng and I first did was get an algorithm where the number of edges is small, but not really small: it looks like n/ε² times some power of log n. At least we can do that in time nearly linear in n. And I've had a team of grad students and undergrads implement heuristics for it, so we have a lot of heuristics. None of the fast heuristics works for all graphs, unfortunately, but if you have a family of graphs, we can get you something that will work. More recently, one of my grad students, Nikhil Srivastava, and I came up with an algorithm that gets a number of edges like Benczur and Karger's algorithm. I'll tell you one sentence about the algorithm; people can ask me more about it later if you want, because I'll be here for a week. The algorithm is insanely simple. It's this: compute the effective resistance of every edge in your graph, then choose n log n / ε² edges, each with probability proportional to its effective resistance.
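Here is a small-scale sketch of that sampling scheme in Python with NumPy, under stated simplifications. It computes the effective resistance R_e of edge (i, j) as (e_i - e_j)^T L^+ (e_i - e_j) from a dense pseudoinverse, which is fine for a toy graph but is exactly the expensive step that, in practice, has to be replaced by fast linear solves. For weighted graphs the sampling probability is taken proportional to w_e R_e, which reduces to R_e for unweighted graphs, and each sampled edge is reweighted by w_e / (q p_e) so that the expected Laplacian of the sample equals L_G. The sample size `q` is a free parameter here rather than the exact n log n / ε² setting, and the function names are hypothetical.

```python
import numpy as np


def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def effective_resistances(n, edges):
    """R_e = (e_i - e_j)^T L^+ (e_i - e_j), using a dense pseudoinverse (toy sizes only)."""
    Lp = np.linalg.pinv(laplacian(n, edges))
    return np.array([Lp[i, i] + Lp[j, j] - 2 * Lp[i, j] for i, j, _ in edges])


def sample_sparsifier(n, edges, q, rng):
    """Draw q edges independently with probability p_e proportional to w_e * R_e and give
    each sampled edge weight (times drawn) * w_e / (q * p_e), so that E[L_H] = L_G."""
    w = np.array([wt for _, _, wt in edges])
    p = w * effective_resistances(n, edges)
    p = p / p.sum()
    counts = rng.multinomial(q, p)
    return [(i, j, c * wt / (q * pe))
            for (i, j, wt), c, pe in zip(edges, counts, p) if c > 0]


n = 6
K6 = [(i, j, 1.0) for i in range(n) for j in range(i + 1, n)]
R = effective_resistances(n, K6)
print(R[:3])    # every edge of K_n has effective resistance 2/n
print(R.sum())  # Foster's theorem: the sum of w_e * R_e is n - 1 for a connected graph
H = sample_sparsifier(n, K6, q=100000, rng=np.random.default_rng(0))
```

On a graph this small the sample keeps every edge; the point is only that the reweighted sample's Laplacian concentrates around L_G. On large dense graphs the same procedure keeps far fewer edges than the original.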
For those who haven't seen effective resistance: you treat every single edge in your graph as a resistor, and you measure the resistance between the two endpoints. Then just do random sampling with probability proportional to effective resistance. The only reason this doesn't give us a speed improvement is the question of how you compute the effective resistance of every single edge in your graph. We can actually do that in nearly linear time, but it requires solving some diagonally dominant linear equations, about log n of them, so the time is log n times the time to do a linear solve. So it doesn't help us for solving linear equations quickly. We then got a high-powered undergraduate, Joshua Batson, to join us (I believe he was here last summer, so some of you may have encountered him), and we got it down to a linear number of edges. So we now know these things exist with order n/ε² edges, and we have a polynomial time algorithm for constructing them. We have no clue yet how to make it fast, but at least we know they exist, which gives us a lot of confidence that one day someone will get this. Okay. So that's a little about sparsification. What about solving linear equations? We need two things from the approximation: we need a good approximation of our original problem, but we also need to be able to solve linear equations in our approximate problem quickly. So let me tell you a little bit about what we know about solving linear equations. First, completely forget about Gaussian elimination. That's nice if you want to solve the same linear equations a zillion times. If you want to solve a linear equation once, there's the conjugate gradient algorithm, which can be used as a direct method, running in time order mn. So if m is the number of nonzeros and n is your dimension, you can solve linear equations in that much time.
That, at least for sparse matrices, destroys anything based on trying to compute an inverse. If your matrix has structure, you can solve linear equations really quickly. As many people probably remember, if you have a tridiagonal matrix, Gaussian elimination done right runs in time order n for an n by n matrix. That corresponds to a path graph. The same thing is true of trees: if your graph is a tree, you can solve the linear equations in time order n. This basically looks like dynamic programming, where you propagate stuff from the leaves of the tree to the root and back down, if you look at what an LU factorization is doing. I'll tell you more about that in a moment. I can't resist mentioning that one of the classic results in the area is that of Lipton, Rose and Tarjan for solving linear equations in planar graphs. They have some preprocessing that takes time order n^(3/2), but thereafter each solve takes time n log n. This is based on finding graph separators: if you can quickly find small separators in a graph and in its subgraphs, then you can solve linear equations quickly. And some of the fastest code for solving linear equations is based on ideas like this; METIS has very nice ordering code that works that way. What we will actually use is sort of a combination of these. We're going to look at linear equations that correspond to a tree plus a few edges, and for now I'll just say we're going to solve these linear equations quickly; I'll be more precise in a minute. Let me tell you why, and what's going on. For this I need to say a little about what happens to a graph when you do Gaussian elimination. If I have the Laplacian matrix of a graph, it's a symmetric matrix, so you don't want to do Gaussian elimination the plain old-fashioned way, eliminating only on rows; you want to preserve symmetry.
This is what's called Cholesky factorization, where every time you do an elimination on a row you also do an elimination on the same column. And if you do this, there's a very nice property: the submatrices you keep getting are all Laplacian matrices if you started out with a Laplacian matrix. So you're always within the realm of graph theory, and you can understand graph-theoretically what happens. When I take a node and eliminate its row and column, I eliminate that vertex from the graph and I wind up putting a clique on its neighbors. There are some weights on the clique, but it's a weighted clique. So you can understand Gaussian elimination on a graph as getting rid of nodes and putting cliques on their neighbors. I want to point out two great examples. If a node has degree one, like a leaf of a tree, and you eliminate it, you put a clique on its neighbors, but it has only got one neighbor, so that's a self loop, and that disappears. So getting rid of nodes of degree one is essentially free. Getting rid of nodes of degree two is essentially free as well: when I get rid of such a node, I put a clique on its two neighbors, which is just an edge between them. And if you look at the running time of the linear algebra algorithms for doing this, the time for an actual solve is the sum of the degrees of the vertices at the time you eliminate them, and there's preprocessing whose cost is the sum of the squares of those degrees. This is why you get a linear time algorithm for trees: you'll always have a node of degree one. So that is the graph theory of classic Gaussian elimination. Okay. So now you know that you can solve systems in trees quickly. And I told you I can approximate graphs by something like expanders, so it's not clear yet how that's going to help. So let me give you some trees that help.
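The graph picture of symmetric elimination can be checked with a few lines of NumPy. This sketch (the function name `eliminate` is ours, not from the talk) removes vertex v by taking the Schur complement, which is one step of Cholesky factorization, and then inspects the result. Eliminating the center of a star puts a weighted clique on the leaves, with weight w_a w_b / (sum of the center's edge weights) on each new edge; eliminating a degree-two vertex leaves a single edge whose weight is the series combination w_a w_b / (w_a + w_b); and eliminating a leaf leaves nothing behind.

```python
import numpy as np


def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def eliminate(L, v):
    """Eliminate row and column v symmetrically (one Cholesky step).
    Returns the Schur complement on the remaining vertices, which is again a Laplacian."""
    keep = [k for k in range(L.shape[0]) if k != v]
    col = L[keep, v]
    return L[np.ix_(keep, keep)] - np.outer(col, col) / L[v, v]


# Star with center 0 and leaves 1, 2, 3 of weights 1, 2, 3: a clique appears on the leaves.
S = eliminate(laplacian(4, [(0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0)]), 0)
print(np.round(S, 4))  # off-diagonals are -w_a * w_b / 6; every row sums to zero

# Degree-two vertex 1 on the path 0 - 1 - 2 with weights 2 and 3: series combination.
P = eliminate(laplacian(3, [(0, 1, 2.0), (1, 2, 3.0)]), 1)
print(P[0, 1])  # -1.2, i.e. an edge of weight 2 * 3 / (2 + 3)

# A leaf: eliminating vertex 0 of a single edge leaves a 1x1 zero matrix (no edges).
print(eliminate(laplacian(2, [(0, 1, 1.0)]), 0))
```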
So let's go back to what Vaidya did. He used maximum spanning trees; we're going to use the right thing for this problem, which turns out to be something called a low stretch spanning tree. He can be excused for not knowing about it: it was invented by Alon, Karp, Peleg and West in '91, a year after his initial paper. But people didn't realize its relevance until Boman and Hendrickson wrote this paper in '01. They said that if you take a tree and try to approximate a graph by it, the graph is less than or equal to the tree times the stretch of the tree with respect to the graph. Let me tell you what the stretch is. The stretch of a tree with respect to G we define this way. Let's say here I've got my spanning tree in black and a couple of other edges in purple. For every single edge in the graph, there's a unique path in the tree between its endpoints. Look at how long that path is; in this case, it's length six. That's what we call the stretch of this edge with respect to the tree. The stretch of a spanning tree is just the sum over all edges of the original graph of the lengths of their paths in the tree. This is the measure of the quality of approximation by the tree. So Boman and Hendrickson, as I say, pointed out that the graph is less than or equal to the tree times the sum of the stretches. >>: What if there's an edge in the graph which is not in the tree and has a huge weight? >> Dan Spielman: There's a question about how to do this with weights. Sorry, I'm giving you the example without weights [inaudible]. To do it right with weights, yeah, there's a weighted version of the formula. That is a good question. The short answer is you darn well better include that edge. >>: [inaudible]. >> Dan Spielman: You'll see in a moment how to compensate for that, by the way. So let me tell you what's known.
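For unweighted graphs the total stretch is easy to compute, and the Boman-Hendrickson bound, G ≤ (total stretch) · T, can be checked numerically. The Python sketch below is illustrative only: it finds each tree path length with a breadth-first search per edge, which takes quadratic time and is nothing like the fast constructions in the talk, and it uses NumPy's symmetric eigenvalue routine for the positive semidefiniteness check. The two examples are made up: a cycle with the spanning path that omits one edge, and the complete graph K_5 with a star as its spanning tree.

```python
from collections import deque

import numpy as np


def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:            # unweighted: every edge has weight 1
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L


def tree_distance(n, tree, u, v):
    """Number of tree edges on the unique tree path from u to v (breadth-first search)."""
    adj = [[] for _ in range(n)]
    for a, b in tree:
        adj[a].append(b)
        adj[b].append(a)
    dist = {u: 0}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist[v]


def total_stretch(n, graph, tree):
    """Sum over all graph edges (tree edges included) of their tree path lengths."""
    return sum(tree_distance(n, tree, u, v) for u, v in graph)


def graph_leq_scaled_tree(n, graph, tree, c, tol=1e-9):
    """Check G <= c * T, i.e. that c * L_T - L_G is positive semidefinite."""
    return np.linalg.eigvalsh(c * laplacian(n, tree) - laplacian(n, graph)).min() >= -tol


cycle = [(i, (i + 1) % 8) for i in range(8)]
path = cycle[:-1]
k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
star = [(0, j) for j in range(1, 5)]
for n, G, T in [(8, cycle, path), (5, k5, star)]:
    st = total_stretch(n, G, T)
    print(st, graph_leq_scaled_tree(n, G, T, st))  # 14 True, then 16 True
```

The bound is not tight: for the cycle the best constant is 8, not 14, which is consistent with the talk's remark that the stretch is an upper bound on the weights that actually arise in the proof.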
Well, Alon, Karp, Peleg and West showed that for every graph there is a spanning tree with stretch m^(1+o(1)), so that's pretty good; it's near optimal. m log n would be optimal. And by the way, this was in a paper on the k-server problem and competitive analysis, not about graph theory at all; sort of a miracle got extracted from it. But this has been improved: a couple of years ago, with Elkin and Emek, we got it down to about m log² n, and at the next FOCS there's going to be a paper getting this down to about m log n log log n. Probably you can get to m log n, who knows? I don't see any good reason you shouldn't be able to. This already gets you something pretty good in terms of solving linear equations, because if you now plug the tree into preconditioned conjugate gradient, each linear system in the tree can be solved in order n time, each multiplication by A takes order m time, and we only need about the square root of the stretch many iterations. So that's going to be about m times the square root of m, up to lower-order factors, and that gets us an improvement in solving linear equations. Let me show you a proof of this theorem of Boman and Hendrickson. I'm going to give a very combinatorial proof; their proof played with matrix norms, which is not as intuitive. To prove it, you just need to know one inequality on graphs, and then we'll add it up. If I just have an edge between two nodes, let's number them one and eight, and it has weight one, I compare it to a path with seven edges between nodes one and eight. The edge of weight one, you can see, is less than or equal to seven times the path of length seven. Generally speaking, an edge of weight one is less than or equal to k times the path of length k. You can think of that as taking the path and blowing up the weight of every single edge by k, or seven in this case. That is the one inequality we will leverage.
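The one inequality used in the proof, that an edge of weight one is at most k times the path of length k joining its endpoints, can be sanity-checked numerically. This NumPy sketch (an illustration, not a proof) builds both Laplacians on the path's k + 1 vertices, using k = 7 as in the example of nodes one and eight, and confirms that k · L_path - L_edge is positive semidefinite while (k - 0.01) · L_path - L_edge is not, so the factor k cannot be improved. For reference, a standard Cauchy-Schwarz argument gives it directly: x_u - x_v is the sum of the k differences along the path, so (x_u - x_v)^2 ≤ k times the sum of the squared differences.

```python
import numpy as np


def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def is_psd(M, tol=1e-9):
    return np.linalg.eigvalsh(M).min() >= -tol


k = 7                                             # path with 7 edges on vertices 0..7
L_path = laplacian(k + 1, [(i, i + 1, 1.0) for i in range(k)])
L_edge = laplacian(k + 1, [(0, k, 1.0)])          # single edge joining the path's ends
print(is_psd(k * L_path - L_edge))                # True:  edge <= k * path
print(is_psd((k - 0.01) * L_path - L_edge))       # False: the factor k is tight
```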
Almost everyone learned this inequality in high school at one point: it is mathematically the same statement as saying that if I take k unit resistors in series, they have resistance k. That might not be obvious yet, but once you put in the right electrical interpretation, it is the same statement, and that's how you figure out how to generalize this stuff to weighted graphs. This is what people in the random walks literature talk about as [inaudible] inequalities and exploit that way. So this is what we'll use. What are we going to do? We're just going to take these inequalities and sum them up. So here's what happens. Again, when I -- yes? >>: [inaudible]. >> Dan Spielman: You can get it from Cauchy-Schwarz. It's a few lines from Cauchy-Schwarz, I think, unless you know a fancier derivation than I do. I need an induction around it. But we'll see. Yeah. Okay. Let me think about that. You would be amazed at some of the complicated proofs of this statement. Anyway, let's say I have a graph G and a spanning tree T, and let's compute some constant so that that constant times T is greater than or equal to G (the other direction is free, since a subgraph is always less than or equal to the graph). What we do is take an edge in G that's not in T. I have this inequality: this edge, say, is less than or equal to three times the path of length three between its endpoints. Keep the other vertices around; you just don't have edges on them, but keep the vertices. The way the proof goes is we sum up these inequalities. I take every single edge in the graph G and get one inequality for each. I ignore the inequalities for the edges that are also in the tree, because those just have a factor of one, and you sum them all up.
What I get in the end when I sum up these inequalities is that my graph is less than or equal to some weighted graph that looks like my spanning tree, but with different weights on all the edges. We just take the maximum of these weights, and certainly the stretch of the tree is an upper bound on that maximum. So that's what we will use. And that essentially is the proof that if I take the total stretch times the spanning tree, that's at least the graph. Okay. Well, fortunately for us, Vaidya's maximum spanning trees weren't nearly as good as low-stretch spanning trees. Why is that fortunate for us? When he did all this work with maximum spanning trees, the actual algorithm he got wasn't any better than conjugate gradient. So he needed to come up with another idea. His second idea was to take a tree and add a couple of edges to it. >>: When you say maximum [inaudible]. >> Dan Spielman: That was -- well, if it's unweighted, all spanning trees are the same. But, yeah, he was more worried about the weighted case -- which is where you can see why some things go wrong. So we're going to use the same idea but apply it to low-stretch trees. I've deleted the slides on exactly how, but I can explain them later if people want. But essentially what we do is combine a low-stretch spanning tree with the sparsifier I told you about before, the one with a linear number of edges, to get something we call an ultra-sparsifier. As I said, it looks like a tree plus a couple of edges. Here's one we did for this airfoil mesh. The way I think of it is N edges plus roughly N over K extra edges, where K is the parameter I'm going to control, and there's some polylog term there. We still don't know the constant; it sort of lowers every few weeks now. People keep coming up with better constructions of low-stretch spanning trees and sparsifiers and so on, so we don't know what it is yet.
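A small numerical check of the bound just derived, as a sketch with our own hypothetical helper names: for an unweighted graph G and spanning tree T, the stretch of an edge is the length of the tree path between its endpoints, and summing the path inequalities gives $L_G \preceq \mathrm{st}_T(G)\, L_T$. So the largest generalized eigenvalue of the pair $(L_G, L_T)$, computed here on the space orthogonal to the all-ones vector, should be at most the total stretch. The dense eigensolver is only for illustration.

```python
from collections import deque

import numpy as np

def laplacian(n, edges):
    """Dense Laplacian of an unweighted graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

def tree_distances(n, tree_edges, source):
    """Hop distances from source along the tree (breadth-first search)."""
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] < 0:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def total_stretch(n, graph_edges, tree_edges):
    """Sum over graph edges (u, v) of the tree-path length from u to v.

    Tree edges contribute 1 each. Recomputes BFS per edge: fine for tiny examples.
    """
    return sum(tree_distances(n, tree_edges, u)[v] for u, v in graph_edges)

def max_generalized_eigenvalue(LA, LB):
    """Largest lambda with LA x = lambda LB x for x orthogonal to the all-ones vector.

    Assumes LA and LB are Laplacians of connected graphs on the same vertices.
    """
    n = LA.shape[0]
    X = np.eye(n)
    X[:, 0] = 1.0
    Q = np.linalg.qr(X)[0][:, 1:]            # orthonormal basis of the complement of ones
    C = np.linalg.cholesky(Q.T @ LB @ Q)     # Q^T LB Q = C C^T is positive definite
    Cinv = np.linalg.inv(C)
    return float(np.linalg.eigvalsh(Cinv @ (Q.T @ LA @ Q) @ Cinv.T).max())

n = 8
cycle = [(i, (i + 1) % n) for i in range(n)]
path = [(i, i + 1) for i in range(n - 1)]    # spanning tree: drop edge (7, 0)
st = total_stretch(n, cycle, path)
lam = max_generalized_eigenvalue(laplacian(n, cycle), laplacian(n, path))
print(st, lam)
```

On the 8-cycle with the path as its tree, the dropped edge has stretch 7, the total stretch is 14, and the largest eigenvalue is 8, comfortably under the bound.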
What you get when you do this is an approximation where the graph is less than or equal to K times the ultra-sparsifier. So basically, think of taking a tree plus N over K edges; the polylog term will probably become one in the end, or LogN. You get an approximation of order K. And that's what we actually use. Actually, all of this low-stretch spanning tree and sparsifier machinery really goes into making these things we call ultra-sparsifiers, and if someone can give me a direct construction of these, I would be incredibly happy. There probably is one, but we don't know it yet, so we're going through all this other machinery. We do this in nearly linear time. Before I tell you about other fancy things in the talk, why don't I just tell you why this gives us an algorithm for solving linear equations. Take a look at this graph, which looks like a tree plus a couple of edges, and make K big enough to swamp out the polylog term, so that the number of extra edges is less than N. That means you have a bunch of degree-one and degree-two vertices. If I have a bunch of degree-one and degree-two vertices, I can eliminate them like I showed you earlier, and that only takes linear time, and when I'm done, I get a system of linear equations with fewer vertices. K is big enough to make this smaller than N. And then we use recursion to solve that system. So you approximate your large system by a smaller system, and then you use recursion. And that's how we get nearly linear time algorithms. So here is roughly the statement: for any symmetric, diagonally dominant matrix A and any B, you get a solution of the linear equations in nearly linear time. And I always point out for the numerical analysts that there are no other assumptions: you don't care whether there's a differential equation or anything physical underlying it or not; it works on social network graphs or what have you. Okay. I don't need to save time.
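A sketch of the elimination step just described, under our own assumptions: the graph is stored as a dict of dicts of positive edge weights, and we only track how the graph changes (a real solver would also record the elimination so it can update the right-hand side and back-substitute). Eliminating a degree-one vertex just deletes it; eliminating a degree-two vertex whose edges have weights a and b joins its two neighbors by an edge of weight ab/(a+b), the series-resistor rule from earlier. The function names are hypothetical.

```python
from collections import deque

def eliminate_low_degree(graph):
    """Repeatedly eliminate vertices of degree 1 or 2 (partial Cholesky on a Laplacian).

    graph: dict mapping vertex -> dict(neighbor -> positive weight), symmetric.
    Returns (reduced_graph, elimination_order). Isolated vertices are kept.
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}   # don't mutate the input
    queue = deque(u for u in g if len(g[u]) <= 2)
    order = []
    while queue:
        v = queue.popleft()
        if v not in g or not 1 <= len(g[v]) <= 2:
            continue  # already eliminated, or isolated (nothing left to eliminate)
        nbrs = list(g.pop(v).items())
        for u, _ in nbrs:
            del g[u][v]
        if len(nbrs) == 2:
            (a, wa), (b, wb) = nbrs
            w = wa * wb / (wa + wb)           # two conductances in series
            g[a][b] = g[a].get(b, 0.0) + w    # add to any existing parallel edge
            g[b][a] = g[b].get(a, 0.0) + w
        order.append(v)
        for u, _ in nbrs:
            if len(g[u]) <= 2:
                queue.append(u)
    return g, order

def add_edge(g, u, v, w=1.0):
    """Insert an undirected weighted edge into a dict-of-dicts graph."""
    g.setdefault(u, {})[v] = w
    g.setdefault(v, {})[u] = w

# K4 with edge (0, 1) replaced by a path 0 - "x" - "y" - 1 of three unit edges.
g = {}
for u, v in [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, "x"), ("x", "y"), ("y", 1)]:
    add_edge(g, u, v)
reduced, order = eliminate_low_degree(g)
print(order, reduced[0][1])
```

Here the two subdivision vertices are eliminated and the three unit edges in series collapse to a single edge of weight 1/3 between vertices 0 and 1; a tree, by contrast, collapses all the way down to one isolated vertex.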
So let me tell you now about how we construct sparsifiers. To do that, I have to tell you about graph partitioning and the measure I will use for it, and the measure we use is the one called conductance. So how many of you are familiar with the conductance of a cut? Many. Maybe half. Okay. So this is, for me, one of the most fundamental notions in theoretical computer science. By the way, I actually go around and give talks to numerical analysts and explain to them why this is the same thing as the condition number, almost, up to Cheeger's inequality, which for them is the most fundamental thing. So if I have a set of vertices S, the conductance of the cut is, for me, the number of edges leaving the set divided by the sum of the degrees on the smaller side, or the number of edges touching the smaller side. I'll always consider the smaller side, you know. Who cares about the bigger side? The smaller side is what's dominating. So I take the ratio of how many edges are leaving versus the sum of the degrees, or how many edges touch the smaller side. This is sort of a measure of how much of a cluster S is. If there are many more edges inside than leaving, then you say S looks more like a nice cluster, because you want it to be more richly connected to itself than to the outside world. We're also interested in measuring the conductance of a graph, which is the minimum of this quantity over all sets. So you often want to find the set of smallest conductance. Now, the reason the conductance is my favorite quantity in a graph is that no matter what its value is, you can do something useful. If the conductance is large, you have an expander, or for our purposes, more importantly, we know that if the conductance of a graph is large, then we can do a random sampling procedure to sparsify it.
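As a sketch, assuming an unweighted graph stored as a dict of neighbor sets with no isolated vertices, the definition above translates directly. The brute-force minimum over all sets is only there to make the conductance of a graph concrete on tiny examples, since it is exponential in the number of vertices. The function names and the example graph are ours.

```python
from fractions import Fraction
from itertools import combinations

def conductance(graph, S):
    """Edges leaving S divided by the smaller of vol(S) and vol(V - S).

    graph: dict mapping each vertex to a set of neighbors (unweighted, undirected,
    no isolated vertices). Returns an exact Fraction.
    """
    S = set(S)
    leaving = sum(1 for u in S for v in graph[u] if v not in S)
    vol_S = sum(len(graph[u]) for u in S)
    vol_rest = sum(len(nbrs) for nbrs in graph.values()) - vol_S
    return Fraction(leaving, min(vol_S, vol_rest))

def graph_conductance(graph):
    """Minimum conductance over all proper nonempty subsets (exponential: tiny graphs only)."""
    vertices = list(graph)
    return min(conductance(graph, S)
               for size in range(1, len(vertices))
               for S in combinations(vertices, size))

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(barbell, {0, 1, 2}), graph_conductance(barbell))
```

Each triangle has one edge leaving and volume 7, so it is the best cluster here, and both calls give 1/7.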
Our original proof of this was based on a variation of the argument of Füredi and Komlós for analyzing eigenvalues of random graphs -- or random matrices -- and sort of inspired by work of [inaudible] McSherry, and to make the connection from conductance to these sorts of spectral arguments you need to go through Cheeger's inequality and [inaudible]. But we know random sampling works. So very naive things work. When your graph has high conductance, it is wonderful. When it has low conductance, you're also in really good shape: it means you can remove not too many edges and partition the graph into two parts. And if I only remove a few edges, I don't have to worry about them; those can sort of go in my sparsifier. So fundamentally, the way we're going to sparsify is by decomposing the graph. We partition the vertex set so that we remove not too many edges and each part has high conductance. And you can prove you can do that. There always exists such a partition, at least if the graph has a lot of edges (if it has few edges we don't worry about it; it's already sparse): a decomposition where each cluster itself has high conductance and not too many edges cross between the parts. And if we have such a thing, then we can build sparsifiers. So actually I'm going to talk about graph partitioning just for the purpose of building sparsifiers. What you would do then is random sampling on the edges inside the clusters, and the edges between clusters we treat recursively, building a sparsifier on those. So you do random sampling on these, and the others you recursively sparsify, because there are fewer of them, and it doesn't take too many rounds. But the big problem is I have to do the graph partitioning. So now I'll tell you how. That was a problem because, you know, chopping up a graph like this is a little nontrivial.
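A minimal sketch of "very naive" random sampling, under an assumption the talk does not spell out: every edge is kept with the same probability p and given weight 1/p, so each edge's expected weight, and therefore the expected Laplacian, is unchanged. The talk only says sampling works when the conductance is high; it does not give the probabilities actually used, and this uniform choice, like the function name, is ours.

```python
import random
from itertools import combinations

def sample_edges(edges, p, seed=None):
    """Keep each edge independently with probability p, reweighted to 1 / p.

    Every edge's expected weight stays 1, so the sampled graph's Laplacian equals
    the original Laplacian in expectation.
    """
    rng = random.Random(seed)
    return [(u, v, 1.0 / p) for u, v in edges if rng.random() < p]

complete = list(combinations(range(50), 2))   # K_50: 1225 edges, high conductance
sparse = sample_edges(complete, p=0.3, seed=1)
print(len(complete), len(sparse), sum(w for _, _, w in sparse))
```

On the complete graph roughly 30 percent of the edges survive and the total weight stays close to 1225; on a graph with a sparse cut, uniform sampling like this can easily drop the few edges crossing it, which is why the decomposition into high-conductance pieces comes first.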
So this is what led us to the local clustering problem. Let me describe it again. You start with a vertex V that's interesting in some large graph. You want to find a cluster near it; the cluster might not actually include V. And you want to do it in time proportional to the cluster size, just by exploring from the vertex. So you look at some of its neighbors, some of their neighbors. We're going to worry about the conductance: we want the conductance of the cluster to be small. The first question I'll ask you is how you should explore from a vertex. I think if I had asked this question 15 years ago, people would have said you do sort of breadth-first search, shortest paths. You know, you look at the node, you look at its neighbors, you look at their neighbors, you look at their neighbors. We're now very aware that for most real-world graphs, once you do that for a few levels, you'll have looked at every vertex in the graph. So that option is out. You need a more intelligent way of looking at nodes that are near V. You might do a little bit more work, but hopefully you don't load in the whole graph. The way that Shang-Hua and I initially did this was by looking at nodes that were likely to occur in random walks starting at V. We analyzed this using an analysis of mixing of random walks due to Lovász and Simonovits that appeared in a paper on volume computation. We wrote an algorithm we called Nibble that does this, but I won't tell you about our algorithm; I'll tell you about a better algorithm by Andersen, Chung, and Lang, which also uses the Lovász-Simonovits analysis, but theirs is based on PageRank, or something called the personalized PageRank vector, which was mentioned in the initial paper of Brin and Page. Let me give you an idea of this algorithm. The idea, as I'd characterize it, is that vertices hold chips, or money. But chips is probably a better idea.
They're going to pass them to their neighbors. For those who have seen chip-firing games, this is natural. So vertices hold chips, and they're going to pass them to their neighbors. But every time you pass some chips to your neighbors, you stick some in your pocket. It's capitalism. You stick a P fraction in your pocket. Did I put that here? Yes. So any time you pass chips to your neighbors, you stick a P fraction in your pocket. How do you decide how to do it? Well, of your remaining chips, you save half for yourself in the pool that's not in your pocket, and you distribute the other half uniformly among your neighbors. But you only do this if you have enough chips to make it worthwhile. There's some parameter epsilon governing that: you only do this if you have more than epsilon times your degree number of chips. Okay. So here's an example. This node has .14 chips. It's going to stick one fifth in its pocket, .028. Then, of the remainder, it keeps half for itself and distributes the other half equally among its neighbors. These two parameters, P and epsilon, control fundamental things. The smaller epsilon is, the more nodes you'll actually examine in this process. That's because you only send chips to your neighbors if you have more than epsilon times your degree, so as you decrease epsilon, you're more likely to send chips to your neighbors. Otherwise you just stop. The smaller P is, the lower the conductance of the cut you're looking for. That's a less obvious connection. So the suggestion of this paper is essentially that you explore the graph by starting with all of your chips on one vertex and running this process. What I love about this process is it doesn't require any assumptions of synchrony or anything. You can make the vertices do this operation, which they call a firing or push operation, at any time you want, as long as they're above threshold. That's unlike the operation Shang-Hua and I used.
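A sketch of the chip-passing (push) process described above, with the pocket fraction P written `alpha` and the threshold written `eps`; the data layout and function names are our assumptions, not code from Andersen, Chung, and Lang. A node fires when its pool is at least `eps` times its degree (the talk says "more than"; the boundary case doesn't matter for the idea), and because firing order is arbitrary, a plain FIFO queue is enough.

```python
from collections import deque

def push(graph, pocket, pool, v, alpha):
    """Fire node v: pocket an alpha fraction, keep half the rest, spread half to neighbors."""
    chips = pool[v]
    pocket[v] = pocket.get(v, 0.0) + alpha * chips
    pool[v] = (1 - alpha) * chips / 2
    share = (1 - alpha) * chips / (2 * len(graph[v]))
    for u in graph[v]:
        pool[u] = pool.get(u, 0.0) + share

def local_push(graph, start, alpha, eps):
    """Start with one chip on `start`; fire nodes until every pool is below eps * degree.

    graph: dict mapping vertex -> list of neighbors. Only nodes that receive chips
    are ever touched, so the work stays local.
    """
    pocket, pool = {}, {start: 1.0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if pool.get(v, 0.0) < eps * len(graph[v]):
            continue  # stale queue entry: v is already below threshold
        push(graph, pocket, pool, v, alpha)
        for u in [v, *graph[v]]:  # only these pools changed
            if pool[u] >= eps * len(graph[u]):
                queue.append(u)
    return pocket, pool

star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
pocket, pool = {}, {0: 1.0}
push(star, pocket, pool, 0, alpha=0.2)   # one firing of a node with four neighbors
barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
pk, pl = local_push(barbell, 0, alpha=0.2, eps=0.02)
print(pocket, sorted(pk.items()))
```

The single firing on the star reproduces the first step of the talk's walkthrough: 0.2 in the center's pocket, 0.4 kept in its pool, and 0.1 on each neighbor. Every firing pockets at least `alpha * eps` times the node's degree and the total is one chip, so the loop must stop.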
So this means you can optimize for cache performance and do all other sorts of fun things. So let me give you an example for those who haven't seen this. Here's my graph. Well, here's part of my graph. I have one node we're going to start at. It's got all the chips. I know this part of the graph. Here's some sort of boundary edge. And who the heck knows what's out here? Let's assume this is the rest of the people in the United States over here. Now, if you think about what you'd like to do, before you start talking with the rest of the people in the United States, you'd like to make sure that you've explored this portion of the graph. And that's what will happen. So if we take a look at this, this node starts out; as I said, it's going to put a fifth in its pocket, keep half of the rest in its pool, and move the other half to its neighbors. So it's got .2 in its pocket, and they've each got .1. Now we can actually pick any node to do this with, as long as it's above threshold. My threshold is one fiftieth times the degree. So I could do any of these nodes, but I may as well do the middle node again. Actually I could do all the operations on the middle node at once, but it gets low enough. Let's say we pick this node. The amount it has in its pool is above threshold, so it will move some into its pocket, some to its neighbors, and some back to here. It's a little bit annoying. You're doing some extra work because you're going back to that node again. But, you know, we're doing some computation to try to save exploration. It keeps going. We could do that node up top. Eventually you might do the node in the middle again. Well, you sort of have to eventually, because until it's below threshold, you'll want to do this operation. >>: [inaudible]. >> Dan Spielman: Actually it doesn't matter. All you need is that any node that is above threshold does this operation eventually. You don't need any real guarantee beyond that. You don't need to synchronize it; you don't need any global control.
As long as you do this, you're guaranteed not to explore too far and go too far afield. The annoying thing is you do keep coming back to some nodes many times, and one of the big problems in the area is how to stop that, because it's frustrating. Okay. So here I took us to step 15. As of step 15, you'll notice that any node that has done this firing operation has put something in its pocket. And every single node has, except for this node here and this node up here. An interesting question to ask yourself is whether this node can do the firing operation. Well, to pass the threshold it has to have at least degree times epsilon, which would be degree times one fiftieth, and it doesn't. It's degree three. So that one isn't going to be able to fire. But this one could still fire; this one can't fire. Hey, that one can just barely fire. Okay, I think. Anyway, if I take this all the way to step 24, where it ended up, you wind up having explored only these nine nodes. It took 24 steps, which might be a little bit annoying. But at least you didn't go into the rest of the graph. You never actually fired this one and never touched this node. That means the process actually stayed local. If you decided you had to explore further, what you do is lower epsilon. And this is how you wind up exploring a graph, according to this idea. Now, how do we wind up partitioning? Well, okay, if you just did this, there's a very natural idea for partitioning: take all the nodes that fired. But that turns out to be a little too crude, though it's good here, because here you just cut right there. Generally, what you do at this point is order the nodes by how many chips they stuck in their pocket. So this node had the most in its pocket, and you can see the numbers going around. And then you look for a cut among the prefixes of this order. So let's say I first take the node that has the most; that's not a very good cut.
Then you take the top two, the top three, the top four, and so on. None of them are very good cuts until a little further on. Once I take the top eight, this isn't such a bad cut. I wind up cutting this edge, this edge, and that edge. Sometimes the outputs are going to look like that. In this case, I of course rigged the example so the best cut is this one, and this is the cut of lowest conductance you get. But you do it by taking the vertices in this order and taking a prefix. So that's how it works. They proved some theorems about it. I'll give you a rough idea. There are many parameters in these theorems, so here's one of them. If there is a set S of conductance less than one over log to the fourth N, and you start at a random vertex of that set, then you probably output a set C with conductance that small, one over LogN, let's say. The nice thing is that the set C mostly lives in S, and you do it in time proportional to the volume of that set times log to the fourth N. The high powers of logs are a little annoying. I don't know yet whether they're an artifact of the analysis or necessary. The one thing I can say for the algorithm is that you can guarantee the number of nodes it looks at is at most four times the number of nodes it's going to output. So even though it's doing more work here, this log to the fourth, it doesn't actually examine too many nodes, which is probably the dominant cost, just pulling things into memory. We use this for an approximate cut algorithm: you do this, and you do it again and again, removing chunks until you can't do it anymore. And what you get is basically the algorithm ApproxCut: if it outputs a cut, you have a guarantee that the conductance of the cut it outputs is at most phi.
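A sketch of the sweep over prefixes described above, tracking the cut size and volume incrementally so each prefix costs only the degree of the node being added. One assumption here: by default we sort by pocket amount divided by degree, the degree-normalized ordering commonly used in PageRank sweep cuts; the talk describes ordering by raw pocket amount, which `normalize=False` gives. The function name and example are ours.

```python
def sweep_cut(graph, pocket, normalize=True):
    """Return (best_set, best_conductance) over prefixes of the pocket ordering.

    graph: dict vertex -> list of neighbors (unweighted, undirected).
    pocket: dict vertex -> chips pocketed; only vertices with chips are scanned.
    """
    def score(v):
        return pocket[v] / len(graph[v]) if normalize else pocket[v]

    order = sorted((v for v in pocket if pocket[v] > 0), key=score, reverse=True)
    total_vol = sum(len(nbrs) for nbrs in graph.values())
    inside, cut, vol = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for v in order:
        for u in graph[v]:
            # An edge back into the prefix becomes internal; any other edge now leaves it.
            cut += -1 if u in inside else 1
        inside.add(v)
        vol += len(graph[v])
        denom = min(vol, total_vol - vol)
        if denom == 0:
            break  # the prefix is the whole graph
        phi = cut / denom
        if phi < best_phi:
            best_set, best_phi = set(inside), phi
    return best_set, best_phi

# Two triangles joined by the edge 2-3, with chips concentrated on the first triangle.
barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
S, phi = sweep_cut(barbell, {0: 0.3, 1: 0.3, 2: 0.25, 3: 0.05})
print(S, phi)
```

The prefixes have conductance 1, 1/2, 1/7, and 1/2, so the sweep returns the first triangle, the same answer as the by-hand example.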
And we also know that if the cut is highly unbalanced, if we output a small set, then the complement is contained in a graph of large conductance. It's a little bit worse: you get something like phi squared. For comparison, if you took the optimal cut, the sparsest cut, and you knew it was small, then you could prove that the complement is an expander, or has high conductance. Here we can't do that. But we can prove at least that the complement is contained in a graph which is an expander, which is about as good as we can hope for right now. Okay. So that's enough about graph partitioning. I don't have that much time left, so why don't I just give you some highlights of other things that have been done in this area. First, as I said, I mentioned interior point methods earlier. If you solve a family of linear programs like maximum flow and multicommodity flow, what have you, with interior point algorithms, the linear systems you get are always very special. If you're solving maximum flow, you always get diagonally dominant linear systems. Same thing with min cost flow. So Sam Daitch, who is a graduate student working with me, and I took a look at interior point methods and asked what happens if we solve these systems approximately. It took a little bit of retooling of the interior point algorithms, but with a little work you see that they work with only approximate linear solvers, and you get fast algorithms this way. So for maximum flow and min cost flow we get order M to the three halves, well, there are some log terms. Asymptotically, that's almost as good as the best max flow algorithm, of Goldberg and Rao, and it beats the previous best min cost flow algorithm. We also looked at generalized flow. This is a flow problem where the amount that flows into an edge is different from the amount that flows out. You've got some gain or loss factor.
And then it turned out the linear systems were all these things called M-matrices, which have been studied for about 50 years in numerical analysis, and they come up in economics and a bunch of other places. We figured out -- or Sam figured out -- how to solve those, and that gets us a fast algorithm for generalized flow. My hope is that people will extend this chart. I think that for a lot of problems people solve using linear programming, if you study the linear equations you get, you can get faster algorithms for them, and then, by throwing in an interior point method, get faster algorithms for the original problem. My next hope is that someone will do this for multicommodity flow, which would be very interesting. Other fancy things have been used, too. First Gremban and Miller, and then a paper of Maggs, Miller, and many other authors, and a couple of others, looked at actually adding vertices to your graph and approximating it. For example, they take an initial graph and then build a tree on top of it. There's a notion of approximation by a graph with more vertices that makes sense here, and you sometimes get faster algorithms. There are two ways to say what you get. One of them is that you get the right quadratic form just on the original set of vertices: you always choose the values for the other vertices that minimize the quadratic form. Formally, you can say it's what you get by doing partial Cholesky factorization and eliminating those nodes. That induces a graph on the rest, and then you can measure quadratic forms. But when you do that, your system with these extra vertices might be much simpler. They looked at using trees, they were inspired by work of Harald Räcke to do that, and they first used Räcke trees in one of those papers.
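A small NumPy sketch of "eliminating those nodes": the Schur complement of a Laplacian onto the kept vertices is again a Laplacian, and its quadratic form equals the original one minimized over the eliminated values. The function names and the dense formula are ours, for illustration only; it assumes a connected graph and at least one kept vertex, so the eliminated block is invertible.

```python
import numpy as np

def laplacian(n, edges):
    """Dense Laplacian of an unweighted graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

def schur_complement(L, keep):
    """Eliminate every vertex not in `keep` from the Laplacian L (partial Cholesky)."""
    keep = list(keep)
    elim = [i for i in range(L.shape[0]) if i not in keep]
    A = L[np.ix_(keep, keep)]
    B = L[np.ix_(keep, elim)]
    C = L[np.ix_(elim, elim)]          # invertible when every eliminated piece touches `keep`
    return A - B @ np.linalg.solve(C, B.T)

path = laplacian(4, [(0, 1), (1, 2), (2, 3)])
print(schur_complement(path, [0, 3]))      # a single edge of weight 1/3
star = laplacian(4, [(0, 1), (0, 2), (0, 3)])
print(schur_complement(star, [1, 2, 3]))   # a triangle with edge weights 1/3
```

Eliminating the middle of a path gives the series rule again, while eliminating the center of a star creates a triangle, which is why elimination only stays cheap for low-degree vertices.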
Also Miller and Koutis looked at a case where they used fewer vertices: they sort of cluster vertices together, and with some work they get a linear time algorithm for solving Laplacian systems on planar graphs. So that gets rid of all the nasty log factors that we had. It's a nice result. Shklarski and Toledo looked at splitting vertices. They talked about taking a vertex and splitting it, and then its edges get shared among the copies. They introduced this to try to solve the linear equations that occur when you're doing structural dynamics, at least for two-dimensional truss structures. And Sam Daitch and I finally did the analysis of using this with preconditioners like the ones I showed you before, and we can speed up solving the linear equations for truss structures, at least two-dimensional ones. We get to like N to the five fourths. Three-dimensional truss structures -- there are some combinatorics there we don't yet know how to handle. The algebra I think we can do. Okay. So why don't I just give you a bunch of questions that arise from this work. There are a lot of things that need to be done. First: what else can we do with local algorithms, or what else do we want to do with them? Everyone's got massive graphs, and I'm sure many people have nodes of interest. For me the first thing I ask is whether you can find a cluster around that node. Reid Andersen -- you should look at his home page -- has done some good work on finding communities and other things. But there must be a bunch of other problems you might want to solve this way. I don't even know what they are. Probably people at Microsoft have a better idea than I do. But I think it's a very good paradigm for designing algorithms. It makes sense, and there are some interesting tools for it. The next thing I want is some fast and practical sparsification.
I've had a team of students try to implement all sorts of different heuristics for graph sparsification. The best of them is now here at Microsoft as a software developer. I'm still trying to drag him to graduate school, because I think he's a brilliant algorithms engineer. He came up with a lot of heuristics, and usually, for any graph, one of them works, but we don't know how to pick it in advance. So it doesn't help us yet. Next, faster low-stretch spanning trees. The constructions of these things right now take something like N log squared N or N log cubed N time, and it turns out that's a bottleneck for us, at least if you want to solve linear systems under about a million or two million vertices. So we use other heuristics right now. Next, direct constructions of ultra-sparsifiers: if we could get that, it would be really handy. I could drop the previous two things. And if people want, I can tell them how we build them, but it's a bit hairy. It turns out that for a lot of our applications, unless we get to linear systems on about 10 million vertices, it's not a win, and we can substitute something simpler. Above 10 million vertices we really need these ultra-sparsifiers, and we just don't have a quick algorithm for them. Let's see. What other types of linear equations can we solve with these ideas? There are a lot of them. I'm sure people get different types of linear equations in different areas. I will say the one thing we need is some combinatorics. It has to come from a well-structured problem, meaning you have to give me a good reason that the system is not singular or degenerate. Like with Laplacian linear equations: if the graph is connected, we know what the null space is. Or we need to understand the null space combinatorially.
With stiffness matrices for truss structures we know what the null spaces are, and with interior point methods there are combinatorial reasons why these systems are not degenerate. If you can't tell me why your system cannot be degenerate, then I cannot help you solve it. But if you can give me some combinatorial reason that it's not, then that's something we can work with. And finally, the last question -- this is less the topic of this talk than of some of my others -- is what we can do towards making all this linear equation machinery practical. I've done a decent amount of implementation work, and we beat a heck of a lot of our competitors, all the ones we've been able to compile. There are some we can't compile that we know beat us, you know, stuff built by folks at national labs that runs on their machines. But there's a lot to do here, and I think it's an interesting problem, especially for the sort of linear equations that might come up if you're trying to analyze social networks. And I briefly mentioned one application we can do for you: by solving a couple of linear equations, we can compute the effective resistance of every single edge in a graph, or actually, for every pair, we'll get you a data structure you can use to query the effective resistance between a pair of nodes. That seems like a useful thing, because it's a very handy measure of how connected two nodes are. To do that we only need to solve a couple of linear equations. So if people want to do that, you know, we're happy to gear up and do it for them, or at least take a look at the types of graphs they have and see if we can give an advantage on them. And I think I'll stop there for any questions. [applause] >> Dan Spielman: Yes? >>: Do you have any nice lower bounds on the stretch? >> Dan Spielman: Yeah. Oh, actually you can't do better than N LogN. An expander gives the lower bound on that.
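For a small connected graph, effective resistances can be read off a pseudoinverse of the Laplacian, $R(u,v) = (e_u - e_v)^\top L^{+} (e_u - e_v)$. The sketch below computes all pairs with a dense inverse; it only illustrates the quantity and is not the fast method from the talk, which gets the values from solving a few linear systems. Function names are ours.

```python
import numpy as np

def laplacian(n, edges):
    """Dense Laplacian of an unweighted graph on vertices 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return L

def effective_resistances(n, edges):
    """All-pairs effective resistance of a connected unweighted graph (dense, O(n^3)).

    For a connected graph, inv(L + J/n) equals pinv(L) + J/n, and the J/n part
    cancels in R(u, v) = M[u, u] + M[v, v] - 2 M[u, v].
    """
    J_over_n = np.ones((n, n)) / n
    M = np.linalg.inv(laplacian(n, edges) + J_over_n)
    d = np.diag(M)
    return d[:, None] + d[None, :] - 2 * M

R_path = effective_resistances(4, [(0, 1), (1, 2), (2, 3)])
R_cycle = effective_resistances(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(R_path[0, 3], R_cycle[0, 1], R_cycle[0, 2])
```

The path's endpoints are three unit resistors apart (resistance 3); on the 4-cycle, adjacent vertices see one unit resistor in parallel with three (resistance 3/4), and opposite vertices see two paths of resistance 2 in parallel (resistance 1).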
So if you have an expander and you try to get a tree, you're going to have total stretch at least N LogN. And, I mean, even in that nice example, not every edge will get roughly LogN. There are also examples where some edges have to have high stretch. For example, if you take a cycle and try to get a tree, there's one edge you drop out, and it's going to have stretch N. >>: Can you give a specific example of how [inaudible]. >> Dan Spielman: Okay. >>: Suppose you took a binary tree, a regular tree. >> Dan Spielman: Sure. >>: [inaudible] crossed with a cycle. >> Dan Spielman: Cross -- take the product with the cycle? >>: [inaudible]. >> Dan Spielman: Interesting question. I'm not sure. But we can check. That can't be hard. Well, okay, when I say that can't be hard: there's also a lower bound for the grid of N LogN, by Gary Miller and I forget who now. I'm embarrassed to say. So it might be that you can't do better than N LogN, because, you know, the grid is just a path crossed with a path. If you give me a tree crossed with a path, I don't know that we're going to do any better. But there's a small chance you can do better. >>: [inaudible] and so are there any graphs where you think [inaudible] the lower bound would be bigger than [inaudible]. >> Dan Spielman: No, I actually think N LogN is the lower bound, and I have good reasons to [inaudible]. >>: [inaudible]. >> Dan Spielman: I mean, sorry. Well, it's a lower bound and an upper bound, right. It's an upper bound, yes. I have good reasons to think we can get there, but I can't prove them yet. >>: [inaudible]. >> Dan Spielman: No, I don't -- I mean, we don't know. The best we know is this N LogN log log N; we don't know that for every graph there exists a tree with stretch N LogN. The best we have is this result of [inaudible] right now. Yeah? >>: Why don't you just ask for tightness of the distribution of the stretches [inaudible]. >> Dan Spielman: It doesn't help you.
Just by linearity of expectation, if the distribution of the stretches comes out right, then there's one tree that's expected to come out right. Or, well, okay, so if you want every stretch to be expected LogN, okay, I'm sorry -- >>: [inaudible] distribution, so you look at a typical pair of neighbors in the original graph. >> Dan Spielman: Yes. Oh, you're looking at the distribution -- okay. So you take a look at a pair of neighbors. >>: [inaudible]. >> Dan Spielman: Oh, if you want a typical -- oh, I understand what you're asking. >>: I don't know if that's useful to you or not; it's relevant. >> Dan Spielman: For a typical pair of neighbors, you can make the stretch smaller. If you take a look at the analysis of the construction that we did, and that Bartal et al. [phonetic] improved, for almost all neighbors the stretch between them winds up being lower. It's just that there could be a couple of exceptional ones that are big. I can explain later how the construction goes, and then it would be fairly clear. >>: [inaudible] lower bounds [inaudible]. >> Dan Spielman: Oh, lower bounds than that. >>: [inaudible] special cases [inaudible]. >> Dan Spielman: No. But we could think about them. I mean, yeah, sit down and give me some examples. Some of the lower bounds are tricky to make. The lower bound for the grid was nontrivial, but we could probably exploit it. Yes? >>: This may be a basic question, but since you gave this definition of approximation, that kind of implies a [inaudible] definition of distance. >> Dan Spielman: Yes. Okay. >>: I wondered whether that definition of distance you gave is the standard one, or [inaudible]. >> Dan Spielman: Okay. So that is a good question. Let me remember the right answer to this. If you take the log of the condition number, that's a reasonable notion of distance. I mean, the condition number itself isn't, because it's always at least one.
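To make "log of the condition number" concrete, here is a sketch under our own assumptions: for Laplacians A and B of connected graphs on the same vertices, take the generalized eigenvalues of the pair on the space orthogonal to the all-ones vector, let κ(A, B) be the largest divided by the smallest, and use log κ as the distance. The triangle inequality follows from κ(A, C) ≤ κ(A, B) κ(B, C), since the largest (smallest) eigenvalue of a chain of comparisons is at most (at least) the product of the individual ones; the code checks this on random weighted complete graphs. All names are hypothetical.

```python
import numpy as np

def weighted_laplacian(W):
    """Laplacian of the graph with symmetric nonnegative weight matrix W (zero diagonal)."""
    return np.diag(W.sum(axis=1)) - W

def generalized_eigenvalues(LA, LB):
    """Eigenvalues of LA x = lambda LB x for x orthogonal to the all-ones vector."""
    n = LA.shape[0]
    X = np.eye(n)
    X[:, 0] = 1.0
    Q = np.linalg.qr(X)[0][:, 1:]                # orthonormal complement of ones
    Cinv = np.linalg.inv(np.linalg.cholesky(Q.T @ LB @ Q))
    return np.linalg.eigvalsh(Cinv @ (Q.T @ LA @ Q) @ Cinv.T)

def log_condition_distance(LA, LB):
    """log of (largest / smallest) generalized eigenvalue of the pair."""
    lam = generalized_eigenvalues(LA, LB)
    return float(np.log(lam.max() / lam.min()))

rng = np.random.default_rng(0)

def random_complete_laplacian(n):
    """Complete graph on n vertices with random edge weights in [0.5, 2)."""
    W = np.triu(rng.uniform(0.5, 2.0, size=(n, n)), 1)
    return weighted_laplacian(W + W.T)

A, B, C = (random_complete_laplacian(6) for _ in range(3))
d_ab = log_condition_distance(A, B)
d_bc = log_condition_distance(B, C)
d_ac = log_condition_distance(A, C)
print(d_ab, d_bc, d_ac)
```

Note that this distance is zero for any positive multiple of the same graph, so it compares graphs only up to scaling.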
But the log of the condition number is always at least zero, and it obeys the triangle inequality. I once proposed this as a definition of distance, and my former graduate student John Kelner [phonetic] explained to me why it's the wrong notion of distance. If you ask me later, I can call him up and find out why. I vaguely remember that if you look locally around a matrix, there is a right notion of distance, something that locally looks sort of Euclidean, and this was not it. So I forget why, though. But there is a right notion. This is a reasonable notion of distance; I don't remember why it's the wrong one. But there is something you can express that, at least locally on matrices, is supposed to be the right distance, at least on positive semidefinite matrices. Someone else may be able to help me with the right notion on positive semidefinite matrices. But there is -- >>: [inaudible]. And wouldn't you be getting the lower bound [inaudible]. >> Dan Spielman: Oh, you mean for the -- >>: [inaudible]. >> Dan Spielman: Oh, for -- no, there are things much better than breadth-first search for constructing these low-stretch spanning trees. The breadth-first search trees don't help us much. Actually, the best way I compute them empirically is finding separators in the graph, decomposing the graph into two parts, recursively finding a tree on each, and then using dynamic programming to choose the best edge between the two of them. Empirically that works, but you can't prove much about it, and I know a lot of people have tried. And then you have to resort to using graph partitioning code, which slows it down a little. As for breadth-first search trees, there are some graphs on which they're good, but, well, the problem is there are too many of them.
If I had -- you can get ones that look like, you know, on the grid, if you're unfortunate, something like this, which has very high stretch. Whereas the right thing on a grid sort of looks like this tree with Hs in it, and it goes in a sort of fractal way like that. And we don't know anything better. >>: [inaudible] starting from each node and eventually you have come to this [inaudible]. >> Dan Spielman: I don't -- well, it's not clear to me, but maybe if you hook them up the right way. You have to be a little careful how you hook them. I will tell you the original algorithm of Alon, Karp, Peleg and West did work by growing breadth-first search trees out to a small radius, then condensing those down together and hooking them up, and then recursively repeating this process. It was a bottom-up process: doing local clustering, then grabbing these clusters and clustering them again. And you can see how that would get you something like that. But they had to be careful to restrict the growth so as not to go too far in any one breadth-first search tree. And then there were problems of these things overlapping and biting off different chunks of the graph, and -- but that is sort of how the first algorithm went. The algorithms that seem to work best, at least so far, use a top-down approach. They do some sort of graph partitioning, then do something in each part, and then add edges to hook them up. But we don't just partition the graph into two; we start with a somewhat fancier partitioning of the graph. >>: [inaudible]. >> Dan Spielman: I'm sorry. Say that again? >>: [inaudible]. >> Dan Spielman: Oh, no. LAPACK we use. The fancier things I can't compile -- well, in theory by the time I get back one of my graduate students will have this compiled, but the algebraic multigrid codes we've had trouble getting to work. There are a number of them.
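A rough way to see why an unlucky breadth-first search tree is bad on the grid: the sketch below builds one particular BFS tree of the k-by-k grid rooted at a corner, a "comb" whose spine is column 0 with every row hanging off it. This comb is chosen here for illustration and is not necessarily the tree drawn in the talk; `grid_edges`, `comb_tree`, and `total_stretch` are hypothetical names, and unit weights are assumed. Counting by hand, the vertical edge in column j has stretch 2j + 1, so the total stretch is k^3 - k over 2k(k - 1) edges, an average of (k + 1)/2, which grows like the square root of n = k^2.

```python
from collections import deque


def grid_edges(k):
    """Edges of the k-by-k grid graph; vertex (i, j) is row i, column j."""
    edges = []
    for i in range(k):
        for j in range(k):
            if j + 1 < k:
                edges.append(((i, j), (i, j + 1)))
            if i + 1 < k:
                edges.append(((i, j), (i + 1, j)))
    return edges


def comb_tree(k):
    """An 'unlucky' BFS tree rooted at corner (0, 0): column 0 is the spine and
    every row hangs off it, so the tree depth of (i, j) equals its grid distance i + j."""
    spine = [((i, 0), (i + 1, 0)) for i in range(k - 1)]
    rows = [((i, j), (i, j + 1)) for i in range(k) for j in range(k - 1)]
    return spine + rows


def total_stretch(edges, tree_edges):
    """Sum over graph edges of the tree-path length between their endpoints."""
    adj = {}
    for a, b in tree_edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    def path_length(u, v):
        # BFS restricted to tree edges; in a tree this finds the unique path.
        dist = {u: 0}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            if x == v:
                return dist[x]
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        raise ValueError("tree does not span the graph")

    return sum(path_length(a, b) for a, b in edges)


for k in (4, 8, 16, 32):
    edges = grid_edges(k)
    avg = total_stretch(edges, comb_tree(k)) / len(edges)
    print(k, avg)  # average stretch is (k + 1) / 2
```

The fractal H-shaped tree described above is not reproduced here.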
But all the packages seem to require something that you either have to twist or [inaudible] to install, or else they conflict with each other. So LAPACK we've got working nicely, thankfully. But, yeah, LAPACK only goes so far. So there's this other approach called algebraic multigrid, which is an approach to preconditioning that does the multilevel thing. It looks a lot like ours, and it works very well for systems that come from partial differential equations. It seems not to work as well for systems coming from things like social network graphs, as far as I can tell; at least, I can think of no reason that it should. People tell me they have trouble with it in various cases, so one of my students spent last summer at Argonne National Laboratory trying to fix things up. But as I said, we can't run all of the code that he used there on our machines yet, so we don't know the full story yet. [applause]