>> Eyal Lubetzky: It's a great pleasure to have Nikhil Srivastava here, a student of Dan Spielman
at Yale. And he'll be talking about twice-Ramanujan sparsifiers.
>> Nikhil Srivastava: Can you hear me? You can probably hear me anyway. But anyway, so this
is joint work with Josh Batson and Dan Spielman. So let me begin by stating the problem. So the
objective of sparsification is to be able to approximate any given graph G by a sparse graph H for
some natural or useful notion of approximation. So what we would like to happen is sort of shown
here in this picture. You're given an undirected graph G on N vertices.
This could have up to N squared edges; it could be dense. And you would like to find a graph H on the same set of vertices which is sparse, so it has order N or order N log N edges. And we'd like H to be close to G in some rigorous and well-defined sense.
We're going to allow that the weights of edges in H are different from each other and from their
weights in G. In general, G is some weighted undirected graph. H is some other weighted
undirected graph that may have different weights.
So there are essentially two reasons why you would want to prove something like this. The first is that it's just an interesting graph-theoretic statement, if you can show every graph is close to a sparse graph in whatever sense. And the more practical kind of motivation is that H is a lot faster to store in memory and to compute with than G, since it has few edges.
So if it's close to G in some appropriate notion, then depending on the problem you can use H as
a proxy for G computations and do things more quickly. Okay.
So one of the first constructions of sparsifiers was given by Benczur and Karger in 1996, for the purpose of finding cuts.
For Benczur and Karger, H approximates G if for every cut, that is every subset of vertices S, the total weight of edges leaving S is approximately equal in H and G, up to a 1 plus or minus epsilon factor.
What we'd like to happen is shown in this picture. On the left you have G, you have some cut S, a
whole bunch of edges crossing it. On the right you have the corresponding cut S in H, and this
has far fewer edges, since H is sparse. But the weights have been blown up. So the total weight
is approximately equal up to this factor.
And we would like this to hold for all cuts. Okay.
So what Benczur and Karger showed is that every graph G has a sparsifier H. Actually a subgraph: the set of edges of H is a subset of the edges of G, with order N log N over epsilon squared edges, if you want a 1 plus or minus epsilon quality approximation, which is a cut sparsifier.
And they gave a random sampling algorithm to find this H in nearly linear time in M, the number of edges of G. And this was used as, you know, a preprocessing step for all kinds of algorithms for cut optimization, like min cut, max cut or sparse cut, whatever, because if you only care about cuts, G and H are essentially equivalent and H has far fewer edges.
If your runtime depends on M, it can be significantly reduced by doing this first.
>>: M is the number of edges?
>>: Of G, yes.
>>: And does it correspond in some sense to -- does it preserve matchings also? Approximate matching?
>> Nikhil Srivastava: It doesn't -- so I guess, the number of matchings? It has far fewer edges.
>>: The size of the matching is what?
>>: Yeah. I think so.
>>: So that could give a faster algorithm for matching, if you're doing --
>> Nikhil Srivastava: Now these edges have weights. So I'm not sure if -- it depends on whether you're okay with having weights, basically. Okay.
So to introduce the notion of approximation that I'm going to talk about, I will need to show you a matrix associated with the graph, which is the graph Laplacian, which I'll review quickly.
So if G is an undirected graph on N vertices then the Laplacian L is a symmetric N by N matrix given by D minus A, where D is a diagonal matrix containing the degrees of the vertices and A is the adjacency matrix.
And it's well known that the Laplacian can also be written as a sum of outer products over the edges: it's a sum over edges IJ in E of (delta I minus delta J)(delta I minus delta J) transpose, where delta I is the vector with a 1 in position I.
And this sum of outer products is weighted by CIJ if the edges of the graph are weighted by CIJ.
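To make those two descriptions concrete, here is a minimal numpy sketch -- the tiny example graph is made up for illustration, it's not from the talk -- that builds the Laplacian both as D minus A and as a sum of outer products over the edges and checks that the two agree.

```python
import numpy as np

# A small unweighted example graph on 4 vertices (edges chosen arbitrarily).
n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]

# Laplacian as D - A.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# Laplacian as a sum of outer products (delta_i - delta_j)(delta_i - delta_j)^T.
L_outer = np.zeros((n, n))
for i, j in edges:
    b = np.zeros(n)
    b[i], b[j] = 1, -1            # b_e = delta_i - delta_j
    L_outer += np.outer(b, b)     # would be weighted by c_ij for a weighted graph

assert np.allclose(L, L_outer)
```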
What we're really interested in in this work is the quadratic form of the Laplacian, which is defined as follows.
If X is a potential on the vertices, that is if X is a real function on the vertices, then the quadratic form X transpose L X is simply the sum of squared differences, (XI minus XJ) squared, over all edges IJ.
This is weighted by CIJ if the edges of the graph are weighted. It's immediate from the outer product expansion. And it immediately gives us two basic facts about the Laplacian which we'll need. The first is that it's positive semidefinite, because this thing is a sum of squares.
And the second is that if the graph is connected then the kernel is exactly the span of the all-ones vector, because if all these differences are zero then X must be constant across all edges, hence constant everywhere.
And I say this because all the vectors we are going to deal with in the talk are orthogonal to the
one vector. I'll talk about L inverse in a few places and that's okay.
Okay. So let me go back to cuts. So it turns out if I evaluate the quadratic form of the graph at a zero-one vector, the characteristic vector of a cut S, I simply get the total weight of edges crossing the cut.
This is easy to see because in the sum of squares the only non-zero terms are the ones that have a one in position I and a zero in position J, and this corresponds to edges with one endpoint inside S and one endpoint outside S.
So in this language, what the Benczur-Karger theorem says is that for every G you can find a sparse H so that the quadratic form is preserved on zero-one vectors, up to a factor of 1 plus or minus epsilon.
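As a quick sanity check of that identity -- continuing the toy graph from the sketch above, so again just an illustration -- the quadratic form at the 0/1 indicator vector of a set S equals the number of edges crossing the cut:

```python
import itertools
import numpy as np

n, edges = 4, [(0, 1), (1, 2), (2, 3), (0, 2)]
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1

# For every subset S, x_S^T L x_S equals the number of edges leaving S.
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        x = np.zeros(n)
        x[list(S)] = 1.0
        crossing = sum(1 for i, j in edges if (i in S) != (j in S))
        assert np.isclose(x @ L @ x, crossing)
```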
What we do in this work is construct sparsifiers that satisfy a stronger notion. We want this approximation guarantee to hold for all real vectors. This is strictly stronger; it's the notion introduced by Spielman and Teng in 2004.
So why would we want to do this? There are a few reasons. So the first is that by the Courant-Fischer theorem, the quadratic form of a matrix determines its eigenvalues. So if I'm able to preserve the quadratic form, I get a multiplicative approximation for all the eigenvalues. And theorems in spectral graph theory say the eigenvalues imply a lot of combinatorial properties of graphs. So H inherits several natural properties of G by virtue of having approximately the same eigenvalues.
Also, if I'm interested in some spectral algorithm, like finding cuts using the eigenvector of the smallest eigenvalue, something like that, then again H and G behave very similarly, and I can use H for, like, spectral clustering or something.
A few more reasons. So the quadratic form X transpose L X is really sort of -- it's a natural notion of energy on a graph. It corresponds to the notion of energy if I look at electrical potentials on the graph.
It determines the behavior of electrical flows on a graph, which in turn are closely related to random walks on the graph and related things.
It's also the case that for the notion of approximation that we consider, the distortion of the quadratic form exactly corresponds to the relative condition number in linear algebra.
So it's a useful notion of approximation just for linear operators.
In particular, it's the one you want if you're interested in preconditioning systems of linear
equations.
And it turns out this is what Spielman and Teng originally used it for: if you can do this kind of sparsification fast, you can use it to solve linear equations really fast.
So all I want you to take away from this is it's a pretty strong notion of approximation. Okay.
So let me show you a couple of examples to give you a better idea of what these things look like.
So the first example is something everyone has probably seen in a slightly different context.
So suppose the graph G that I would like to sparsify is the complete graph on N vertices.
I know that all the non-zero Laplacian eigenvalues of G are exactly equal to N. Now, suppose I take H to be a d-regular Ramanujan graph. So a Ramanujan graph is a very good kind of expander graph: it's a d-regular graph on N vertices and all the non-zero Laplacian eigenvalues are approximately equal to D. In particular, they're in the range D plus or minus 2 root (D minus 1).
So anyway, if I take H to be a d-regular Ramanujan graph, then, well, it's sparse because it's d-regular, if I take D to be a constant. All the eigenvalues are about equal to D.
So if I multiply it by N over D -- if I reweight all the edges by N over D -- the eigenvalues will, roughly speaking, be about N. This ratio will be sort of around one, because the numerator will be around N and the denominator will be exactly N. And so what I want you to take away from this is that expander graphs are good sparsifiers of the complete graph, if you blow up the weights.
Slightly more complicated.
>>: Do the edges of this have weights -- are they weighted by N over D?
>> Nikhil Srivastava: Yes, the edges are weighted by N over D.
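Here is a hedged numerical illustration of that example. A true Ramanujan graph takes work to construct, so the sketch below substitutes a random d-regular graph (a good expander with high probability, though not literally Ramanujan; it assumes networkx is available) and checks that, after reweighting every edge by N over D, its nonzero Laplacian eigenvalues are all close to N, which are exactly the nonzero eigenvalues of the complete graph.

```python
import numpy as np
import networkx as nx

n, d = 200, 10
# Stand-in for a Ramanujan graph: a random d-regular graph.
H = nx.random_regular_graph(d, n, seed=0)
L_H = nx.laplacian_matrix(H).toarray().astype(float)

# Reweight every edge by n/d and look at the nonzero eigenvalues; for the
# complete graph K_n they would all be exactly n.
evals = np.sort(np.linalg.eigvalsh((n / d) * L_H))[1:]   # drop the zero eigenvalue
print(evals.min() / n, evals.max() / n)   # both close to 1, within roughly 2*sqrt(d-1)/d
```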
This is a slightly more complicated example, the dumbbell.
If you have two copies --
>>: It doesn't have to be Ramanujan, right? It just has to have the right eigenvalues.
>> Nikhil Srivastava: It doesn't have to be Ramanujan, right. This is just the best possible approximation. But any expander will give you a constant.
So here's another example, the dumbbell. It's two complete graphs joined by a single edge. And I can sparsify this.
I can take two copies of my Ramanujan graph, joined by a single edge, and this will work because the quadratic form is linear in the Laplacian, so the sum of the sparsifiers is a sparsifier of the sum.
One thing to note about this example is that the weights are radically different. The weight of the edge at the center is one and the rest of the edges have weight N over D, which is huge. You can prove this is actually necessary in this case.
The other thing is that any sparsifier for this had better include the center edge, otherwise you'll disconnect the graph and send one of the eigenvalues to zero.
>>: So D is set to be a constant?
>> Nikhil Srivastava: Think of D as a constant.
>>: Does it really matter that it's a constant? For these?
>> Nikhil Srivastava: Depends on how good an approximation you want. If you take D bigger you'll get a better approximation.
>>: [indiscernible]
>> Nikhil Srivastava: You can take it to be not a constant, you will get an even better approximation, but it won't be as sparse.
Okay. So having shown you the examples, I can now tell you what our result is. And what we show is that you can do this, what I did in the last slide, for any graph G, up to a factor of two -- hence the title.
You can achieve the same trade-off between sparsity and approximation that Ramanujan graphs achieve for the complete graph, for any undirected weighted graph G, if you use twice as many edges.
So, you know, the statement that Ramanujan graphs exist is equivalent to saying there are very sparse graphs H that look like the complete graph.
By very sparse, I mean degree D. By look like the complete graph, I mean this ratio of the quadratic form of H to that of the complete graph is at most (D plus 2 root (D minus 1)) over (D minus 2 root (D minus 1)), which can be shown to be optimal.
What we show here is that there are very sparse graphs that look like any undirected weighted
graph G. So here look like is exactly the same thing. So ratio of the quadratic forms is bounded
by this quantity.
But by sparse, now we have average degree 2 D instead of degree D. So we have twice as
many edges. It's not regular anymore. You can show that for general graphs you can't
approximate them by regular graphs.
And the other thing is that the H that we construct has weights.
So the edges have weights. It's a weighted subgraph of G in the sense that the set of edges is a subset of the edges of G, but the weights are different.
And we give a deterministic slow but polynomial time algorithm for constructing such a graph.
Okay. So --
>>: Is it supposed to imply one graph looks like any other graph?
>> Nikhil Srivastava: No.
For every graph we can find a graph that looks like it. I know, in English this is bad. That's not
true.
But [chuckling].
>>: How big is the maximum degree?
>> Nikhil Srivastava: The maximum degree, you mean combinatorially?
>>: You said the average is 2D; how large can the largest degree be? In your construction, can it be more than a constant?
>> Nikhil Srivastava: No, no, it could be pretty bad.
>>: It could be linear?
>> Nikhil Srivastava: Yeah, probably, we have just actually no control over it. So I actually -- I
don't know. I imagine it could be pretty bad. Okay.
So how does this compare to previous work? You can think of it as some generalization of expander graphs: you can approximate arbitrary graphs.
It's deterministic, so you could even say it's explicit. But the edges have weights, so it may not be useful in complexity theory.
It improves the previous result, in which we showed that you can do it with order N log N edges. So it gets rid of the log N.
In fact, it's optimal up to like a factor of four, because you can't do better than Ramanujan graphs.
It's also the first deterministic algorithm for any notion of sparsification of graphs.
Okay. So I'll show you the proof of this, and it's completely elementary; it can fit in this talk. And for simplicity I'll show you how to find, for an unweighted graph G, a weighted subgraph H with six N edges, so that the quadratic form is preserved up to a factor of 13. The techniques are exactly the same if you want the tighter bounds.
But, yeah, okay.
Okay. So I'll start by reducing this to some like pretty clean problem in linear algebra, and then
we'll just do that, okay. So recall that this is what I wanted to do. I have a graph G.
I want to select a sparse weighted subgraph H. So the quadratic form is approximated up to this
factor of 13.
So I'm going to rewrite this in a slightly different way which will make life a little easier for me.
And I'm going to do that by recalling the outer product expansion.
If G is an unweighted graph, then the Laplacian is the sum of these outer products over edges IJ. And instead of writing delta I minus delta J each time, I'm going to call these vectors B E, indexed by the edges E, just for notation. So the Laplacian of G is the sum of B E times B E transpose. The nice thing is, if H is a weighted subgraph of G, that is if the edge set of H is a subset of the edges of G, then it has exactly the same outer product expansion, but now the terms are weighted by scalars S E, the weights of the edges in H.
If it's a sparse subgraph then most of these S E are going to be zero. The rest will be whatever the weight of the corresponding edge is in H.
So it has the same outer product expansion. And with this in hand I can start rewriting this. So I wanted to show G has a sparse H whose quadratic form ratio is between 1 and 13.
So I can multiply by L G to the minus one half on both sides, and I can easily see this is equivalent to showing that there's a sparse H for which the matrix L G to the minus one half times L H times L G to the minus one half has all its eigenvalues between one and 13.
>>: Inverse on the sub space?
>> Nikhil Srivastava: Yes. Inverse on the sub space. Everything is happening on the sub space.
So we're ignoring the one vector everywhere. Okay. So now I'm going to expand the H in terms of this outer product expansion. So I have the sum over edges E of the original graph of S E times this L G to the minus one half B E, times its transpose. And again I'm just going to define V E to be the vector L G to the minus one half B E.
And so now my goal is just to show that there's a sparse set of scalars S E for which the sum of S E V E V E transpose has eigenvalues between one and 13, given vectors V E obtained in this manner from the graph G.
Okay. So let me look a little more closely at the V E's and see what I know about them. So the V E's are indexed by the edges of G; there are M of them, if there are M edges in G. They live in an (N minus 1)-dimensional space, because that's the rank of the Laplacian. And the most important thing about them, and the only thing we'll use, is that they form a decomposition of the identity, in the sense that the sum of the outer products V E V E transpose is equal to the identity on this (N minus 1)-dimensional subspace.
This is simply by definition: V E is L G to the minus one half B E, I take the sum inside, this thing is just equal to the Laplacian, and you get the identity.
So what this means is that, you know, for every unit vector U the sum of squared projections onto the V E's is equal to one. So the moment ellipsoid of these vectors is a sphere. Okay. So this is my graph G now.
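Continuing the toy graph from before (again just an illustrative sketch), here is the reduction in code: form V E = L_G^{-1/2} B E using the pseudo-inverse square root, and check that the outer products of the V E sum to the identity on the subspace orthogonal to the all-ones vector.

```python
import numpy as np

n, edges = 4, [(0, 1), (1, 2), (2, 3), (0, 2)]
L = np.zeros((n, n))
B = []
for i, j in edges:
    b = np.zeros(n); b[i], b[j] = 1, -1
    B.append(b)
    L += np.outer(b, b)

# Pseudo-inverse square root of L via its eigendecomposition (the zero
# eigenvalue, i.e. the all-ones direction, is left alone).
w, U = np.linalg.eigh(L)
inv_sqrt = np.where(w > 1e-9, 1.0 / np.sqrt(np.maximum(w, 1e-9)), 0.0)
L_inv_half = U @ np.diag(inv_sqrt) @ U.T

V = [L_inv_half @ b for b in B]           # v_e = L^{-1/2} b_e
S = sum(np.outer(v, v) for v in V)

P = np.eye(n) - np.ones((n, n)) / n       # identity on ones^perp
assert np.allclose(S, P)
```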
I'm going to look at my graph G as a bunch of vectors satisfying this property from now on. And
let me return to what I wanted to do. I wanted to select the sparse subgraph H. This will
correspond to choosing a small set of edges, choosing a small subset of vectors VE possibly with
stretching factors SE. That will be my subgraph.
And my approximation guarantee is now exactly that the eigenvalues of the sum of S E V E V E transpose are between 1 and 13.
Or, sort of more generally, the moment ellipsoid of this thing is approximately a sphere, in the sense that the ratio of the longest to shortest axis is at most 13.
Okay. So I mean this is what I have achieved. I haven't really done anything special. I've just
translated this problem of sparsifying graphs and preserving the quadratic form into this other
form: I'm given a decomposition of the identity; find a small weighted subset which is nearly a decomposition of the identity.
And so in particular I've shown that it suffices to prove this theorem, if I want to prove my theorem about graphs: given any decomposition of the identity in N dimensions, there are at most six N nonzero scalars S E for which this sum of S E V E V E transpose has eigenvalues between 1 and 13. And so for the rest of the talk I'm just going to prove this theorem. And you can just forget
about graphs and it's just going to be some vectors.
And what we actually show is that we can do this with these parameters: if you want D N non-zeros, you can get this (D plus 1 plus 2 root D) over (D plus 1 minus 2 root D) factor. And this immediately gives the twice-Ramanujan result for graphs.
But in the talk I'm just going to show you this. Okay. So let me give you some intuition for the
proof before actually showing you what it is. So the broad outline is we're going to build this
approximation in steps. We're going to build it iteratively. We're going to start with zero. And in
every step we're going to choose a vector VE and add some amount of it. That is, we're going to
increase one of the scalars S E. Okay. This is going to change the eigenvalues of what I have at the moment in some way, and I'm going to control this process so that after a small number of steps, my eigenvalues are in the right place. Because in each step I added only one vector, at the end I've only added a small number of vectors and everything else is zero. Since I'm going to do this iteratively, the basic question I need to answer, and the thing I need to get a grip on, is what happens to a matrix when I add a vector to it, that is, when I add a rank one matrix to it. So the answer to this is very well known. So I have a matrix A. It has some eigenvalues lambda I, the blue ellipses on the line.
What happens when I add VV transpose to A, when I add a vector V? The eigenvalues shift forward, in a way that the new eigenvalues interlace the eigenvalues that were there before. And, you know, roughly, I mean vaguely speaking, the shift in any eigenvalue is largely influenced by the projection of the vector V onto the corresponding eigenvector. So maybe this one moved a lot because, I don't know, V had a lot of projection on it, or something like that. I mean, that's not rigorously true, but I'll make it rigorous in the next slide. Okay. So this is roughly what happens. Let's see what happens -- and yes, it splits them. So to get a better
grip on exactly where the new eigenvalues are, we can look at what happens to the characteristic polynomial under a rank one update. The characteristic polynomial of A is the determinant of X I minus A. Okay, and if I now add a vector VV transpose, I can easily compute what happens to the characteristic polynomial by the matrix determinant lemma; it's easy to see how the determinant behaves under rank one updates. I see that the characteristic polynomial of A plus VV transpose is simply the characteristic polynomial of A times this thing here: one plus the sum, over the eigenvalues I of A, of the projection of the vector V that I'm adding onto the eigenvector U I, squared, divided by lambda I minus X. Okay. So it turns out that this equation has an interpretation. So the new shifted eigenvalues are the roots of this thing in the parentheses here. And it turns out that this thing has a very nice physical interpretation, which gives us a lot of intuition for where the eigenvalues go.
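Before the physical picture, here is a quick numerical check of that determinant-lemma identity, on a random symmetric matrix (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                      # random symmetric matrix
v = rng.standard_normal(n)

lam, U = np.linalg.eigh(A)             # eigenvalues lambda_i, eigenvectors u_i
proj2 = (U.T @ v) ** 2                 # <v, u_i>^2

def charpoly(B, x):
    return np.linalg.det(x * np.eye(n) - B)

# p_{A + vv^T}(x) = p_A(x) * (1 + sum_i <v, u_i>^2 / (lambda_i - x))
for x in [7.3, -4.1, 11.0]:            # test points away from the spectrum
    lhs = charpoly(A + np.outer(v, v), x)
    rhs = charpoly(A, x) * (1 + np.sum(proj2 / (lam - x)))
    assert np.isclose(lhs, rhs)
```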
And the physical model is this.
>>: Can you go back to that original --
>> Nikhil Srivastava: Yes.
>>: Where the argument of P is X.
>> Nikhil Srivastava: Is X. Yes. I should have put it here.
>>: So that's the X down there.
>> Nikhil Srivastava: That's the X down there. Okay. So this is the characteristic -- so we're
interested in the roots, where this is zero and this corresponds to, well I'll show you the following
simple physical model. So the real line is a slope, like an inclined plane or something like that.
The N eigenvalues of A are unit charges sitting on the slope against some massless, chargeless barriers, whose locations on the slope correspond to the locations of the eigenvalues. So there are N barriers, and against each of them there's some charge sitting there.
And there's gravity. So the charge is, like, not falling down the slope; it's sort of sitting snugly against its barrier. So these are the eigenvalues of A.
So what happens when I add VV transpose? Well, when I look at A plus VV transpose I'm going to put charges on the barriers in the following manner. The amount of charge on the Ith barrier, which corresponds to the Ith eigenvalue, will be equal to the squared projection of the vector V that I'm adding onto the Ith eigenvector. So in this example, suppose my V has projections in only three directions: I'm going to put charges on one, two and N, and I'm not going to put any charge on three; let's say V is orthogonal to the third eigenvector. Now there are going to be charges proportional to the squared projections sitting on the barriers.
Okay. And now, okay, so the eigenvalues whose eigenvectors are orthogonal to the vector which I'm adding, those eigenvalues are not going to move. So I'm just going to forget about them. And now I have a bunch of charged particles and a bunch of charged barriers and they're going to repel each other.
And the repulsion is going to have an inverse law. So the repulsion between a barrier and a charge is going to be proportional to the charge on the barrier, and inversely proportional to the distance from the barrier. Not distance squared as in real life.
But okay. So this is going to happen. The eigenvalues are going to move around. So the first
one is going to get pushed up by the first barrier. It's going to get pushed down by the second
one. It's going to go up a little and at some point gravity is going to balance everything out and
it's going to stop somewhere. Same thing will happen to the second one and it will move up and
at some point it will reach equilibrium with gravity.
And these are where the new eigenvalues lie. And now it becomes clear what this has to do with
the equation I showed you earlier. This quantity in the sum here is simply the net -- so if X is the position of an eigenvalue, and I'm trying to calculate the equilibrium of the eigenvalues, well, the net repulsive force due to the charges is exactly this quantity. Because the amount of charge I put on a barrier is equal to the projection squared, and the repulsion is inversely proportional to the distance.
>>: Not just the neighboring --
>> Nikhil Srivastava: Which one?
>>: All the barriers act --
>> Nikhil Srivastava: So the particles don't repel each other. This is not that realistic, but they
don't repel each other. The barriers repel. All the barriers repel each particle. Okay? So this is
the net. This is let's say the net upward force due to repulsions and that's gravity. And so this is
the roots are going to be where the equilibrium of this are.
>>: So is that a routine --
>> Nikhil Srivastava: Yeah, I mean --
>>: It's the motivation.
>> Nikhil Srivastava: Yeah. Okay. Okay. Now I can put back the eigenvalues that didn't move
and this is where the new eigenvalues are. So this will give me some intuition about how this
works. So let me show you a couple of quick examples. So in the first example, suppose I add a
vector that has all the weight on the first eigenvector. Well, this is just none of the other
eigenvalues move. This one's going to shoot way out because I added a lot of charge and this is
where the new eigenvalues are.
And the second example, suppose I put equal weight on the first two eigenvectors. Well, okay.
So the first eigenvalue is going to move up, but it's not going to go too far because it's going to be
pushed down by the second barrier. And then the second one, the second eigenvalue is going to
move way out, because it's being repelled upward by both these barriers, which are sort of close.
And you know it's going to settle somewhere beyond there. What I want you to take away from
this is I could put a hell of a lot of charge on this barrier and this is still not necessarily going to go
very far because it's going to be pushed down.
>>: What happened to your interlacing --
>> Nikhil Srivastava: Right. So okay. So my interlacing picture which I gave earlier -- interlacing still holds. But the reason it's a little confusing is because it's not really correct to identify eigenvalues in this manner and say that, oh, this is this eigenvalue that moved forward. The new eigenvalues do interlace the old ones.
>>: You're doing your picture across --
>>: Putting some close to the barrier?
>>: So even this first case, I'm guessing the first one is way up?
>>: Yes.
>> Nikhil Srivastava: Okay. Here's another way -- if you're concerned about this we're going to
apply a small perturbation so all the barriers receive some tiny amount of charge and then
nothing crosses the barriers.
And that -- I mean that would lead to approximately the same positions. And then you'll have
interlacing. And you can send the charges back to zero if you want.
>>: Trying to understand your picture. When one shoots out others come down?
>> Nikhil Srivastava: In this picture? So in this one, nobody comes down. So this picture is -- so
the shady thing I'm doing here I'm forgetting about the eigenvalues that don't move. That's
probably what's making it seem -- so after I forget about the eigenvalues that don't move, nothing
jumps across barriers.
So I agree that, okay, in the first example, because none of these move, this thing can go way out as far as it wants, up here and across a bunch of barriers. But I mean, nothing is crossing the barriers that actually have charge. Is this what --
>>: Sounds like [indiscernible].
>> Nikhil Srivastava: We're just forgetting about the uncharged barriers. I mean, this is --
>>: You're forgetting, but does the matrix forget?
>> Nikhil Srivastava: Yes. Okay. Okay. Here is a way to see this. So I'm interested in the roots
of this. The roots of this are either roots of this or I mean they're either roots of this or they're not.
If they're not roots of this, then this thing better be zero. And essentially this case analysis is
what I'm doing when I'm forgetting about some of the barriers.
>>: So the picture with the arrows is misleading.
>> Nikhil Srivastava: I shouldn't put the picture with the arrows. You're right. The picture with
the arrows is misleading, because this is not -- the new positions interlace the old positions but
this is not actually what happens. Fair enough.
Okay. Okay. So this is actually what happens. And I guess what I wanted you to take away from
this example is that if I have a bunch of eigenvalues close together, and I put some charge on
them, then like one of them is going to shoot out really far. Like that.
>>: Because if they have a barrier it's a very small amount of charge. Like eigen [indiscernible]
and it shows a perturbation of no charge, across.
>> Nikhil Srivastava: Well, they're going to end up in the same place. Like the crossing is -- I
mean, the crossing problem is arising because we're identifying eigenvalues with, we're saying oh
this is the first eigenvalue that like, okay, so in this case if I wanted to do this with perturbations,
what would actually happen is this eigenvalue doesn't move forward. But what would happen is
this moves here. This moves here. This moves here. Something like that.
>>: Eigenvalues are -- don't have identity.
>> Nikhil Srivastava: They don't have identities. Okay. So maybe this doesn't -- okay. Anyway,
the reason I talked about this in the first place is because I'm interested in the case when I add a
vector that has equal weight on all the eigenvectors.
And what I sort of expect to happen is that they all sort of drift forward in some steady sort of
manner. If the barriers are equally spaced I don't expect anything too whacky to happen. I don't
expect anything to shoot out way far or anything like that.
And the reason I'm interested in this case is because it's sort of related to what happens when I
add a random vector from my original set of vectors. So I'm going to try -- I'm going to look at this
case a little more carefully by looking at what happens to the characteristic polynomial. So
suppose I add a vector V like a balanced vector that has equal projections on all the
eigenvectors.
Then, okay, the new characteristic polynomial is this. Each of the numerator terms -- they're all equal, say they're all one, it doesn't really matter. Okay. And so the new characteristic polynomial is simply this. But when I multiply P by this thing here, since the lambdas are the roots of P, I just get P minus its derivative.
I get P minus P prime, because P is a product of (X minus lambda I) terms, and multiplying it by the sum of 1 over (lambda I minus X) gives exactly minus P prime.
So what this is saying is: if I add a balanced vector, a vector that has equal projections on all the eigenvectors, then it's the same thing as subtracting the derivative off the characteristic polynomial.
Okay. So what does this have to do with what I'm doing? So suppose I have vectors V E whose outer products V E V E transpose sum up to the identity, a decomposition of the identity. Then for every eigenvector U I -- or for every unit vector U, actually, but in particular for every eigenvector U I -- the sum of squared projections onto U I is equal to one, because it's a decomposition of the identity. So in particular, if I look at the expected squared projection on any eigenvector, it's exactly 1 over M, where M is the number of vectors.
So if I choose a random vector from the decomposition of the identity, then the expected squared projection on every eigenvector is the same. It doesn't mean there are actually real vectors that have the same projection on all the eigenvectors.
But suppose for a moment that it did mean that. Then what kind of proof would we be able to have? So let's look at the following process. We start with A equal to zero and keep track of the characteristic polynomial, initially X to the N, so all the roots are zero. So we add -- so, okay, we have this condition that a random vector has the same balanced expected projection on all the eigenvectors. Now, suppose that something magic happened and we could actually find a vector V that has this property, the same projection on all the eigenvectors. The eigenvalues would shift forward and we would just subtract the derivative from the characteristic polynomial to get the characteristic polynomial of the updated matrix. Suppose we were in some dream state and at every step we could actually find this balanced ideal vector with the expected behavior. Then we'd just take X to the N and keep recursively subtracting derivatives off.
And, you know, by the picture that I showed you earlier, we have some idea that nothing too whacky should happen and these things should sort of drift forward steadily. We hope that if we do this for a very, very long time, they end up in a place where the ratio of the largest to the smallest is a 13 or something.
So the really fortunate thing, and like the punch line of this, is that this process of repeatedly subtracting the derivative from a polynomial generates a classical family of orthogonal polynomials, which are the associated Laguerre polynomials, and everything is known about their roots. And in particular, after DN steps the ratio of the largest to smallest root is bounded by this quantity, (D plus 1 plus 2 root D) over (D plus 1 minus 2 root D) -- actually it's bounded by something that's a little bit better. But it's bounded by this quantity, okay.
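Here is a small numerical sketch of that ideal process (the limiting ratio is the quantity stated in the talk; the parameters below are just an illustration): start from x^n, subtract the derivative d*n times, and compare the ratio of the extreme roots with (d + 1 + 2*sqrt(d)) / (d + 1 - 2*sqrt(d)). For finite n the ratio comes out a bit below that limit, as the speaker says.

```python
import numpy as np
from numpy.polynomial import Polynomial

n, d = 8, 4

p = Polynomial([0.0] * n + [1.0])      # p_0(x) = x^n, all roots at zero
for _ in range(d * n):
    p = p - p.deriv()                  # the "add a balanced vector" step

roots = np.sort(p.roots().real)
print("ratio of extreme roots:", roots[-1] / roots[0])
print("asymptotic bound:      ", (d + 1 + 2 * np.sqrt(d)) / (d + 1 - 2 * np.sqrt(d)))
```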
And so if I were able to find vectors that have this behavior, I would be totally done, because in DN steps I would have DN vectors and perfect roots.
But, of course, this is completely, I mean, not true. So we have to do something to find actual vectors that match this behavior. That's what we'll do in the actual proof.
So here's the actual proof. I'm going to do this by an iterative process. I'm going to start with my matrix A at zero -- which for some reason is written as the empty set here. I'm going to keep track of two real numbers called barriers, a lower barrier and an upper barrier. Initially I'll have the lower barrier at minus N and the upper at N. And I'm going to take steps, and in each step I'm going to do two things. One thing is I'm going to choose a vector from my collection to add to my matrix; I'm going to get A plus VV transpose. This is going to move the eigenvalues forward.
And simultaneously I'm going to shift the barriers forward by fixed constants. So I'm going to shift
the lower barrier forward by a third and upper barrier forward by two. These are fixed constants.
And what I'm going to guarantee, and the whole point of this, is that the eigenvalues of the matrix always lie between the barriers. So I'm going to prove that I can always choose a vector that moves the eigenvalues in such a way that they still lie between the shifted barriers.
Now, notice that these barriers are doing different things to the eigenvalues. The lower barrier is saying, okay, I want all the eigenvalues to be bigger than me. So as it moves forward it's tightening a constraint on the eigenvalues; it's pushing the eigenvalues forward. The upper barrier is also moving forward, but this is actually loosening its constraint: it's saying I now allow the eigenvalues to be something bigger. But it's moving forward slowly enough that it doesn't let the eigenvalues get too huge.
So suppose I could prove that I can do this. Then I could finish the theorem in the following way. I start off at zero; everything is between the barriers, the eigenvalues are between the barriers. I add some vector, move the barriers forward by a third and by two, and I keep doing this. And I do this for 6N steps, let's say. After 6N steps the lower barrier is at minus N plus 6N times a third, which is N, and the upper barrier is at N plus 6N times two, which is 13N. And all my eigenvalues are in this range, so the ratio is at most 13 and I'm done.
So this is a broad outline of the proof. And the main thing we need to show is you can take such
steps. So at any stage, if your eigenvalues are between the barriers, then there's a vector which
you can add. Unfortunately, this invariant, this induction hypothesis we're maintaining having the
eigenvalues between the barriers is not strong enough to prove this. This statement is actually
just false.
What we need is a better way to measure the quality of the eigenvalues of a matrix; we need a stronger induction hypothesis. We're going to do this with potential functions, which are inspired by the physical model I showed you earlier. Let me talk about the upper barrier for now. I want all my eigenvalues to be less than U. I'm going to define the upper potential, Phi upper U of A, to be the trace of (U times I minus A) inverse. It's the sum over the eigenvalues of 1 over (U minus lambda I). So it's easy to see that, assuming all the eigenvalues are on the correct side, this function blows up as any of the eigenvalues gets close to U.
So in particular, you know, if I know that this is bounded, then I know that none of the eigenvalues are very close to U. If I know it's bounded by 1, then none of the eigenvalues of A are within distance 1 of U. But, I mean, the whole point of this, the reason it's better than looking at the max and min, is that it says something about all the eigenvalues. It says that no two eigenvalues are within distance two of U, no three are within distance three, and so on. There's no accumulation of eigenvalues near U.
It actually says something about all the eigenvalues, and in terms of the physical model, this is
just -- this would just be the total repulsion between the eigenvalues which are unit charges and a
unit charge placed at U on a flat line. Okay. And so intuitively it makes sort of sense that if the total repulsion is small, not too many eigenvalues can be very bad.
Okay. So this is what I'm going to use to keep track of the eigenvalues not being too big. I can define something analogous for the lower barrier: the lower potential is just the trace of (A minus L times I) inverse.
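A literal transcription of the two potentials into code (a sketch, using the definitions just given):

```python
import numpy as np

def upper_potential(A, u):
    """Phi^u(A) = tr((uI - A)^{-1}) = sum_i 1/(u - lambda_i); finite and small
    only if no eigenvalue is close to u (and none accumulate near it)."""
    return np.trace(np.linalg.inv(u * np.eye(len(A)) - A))

def lower_potential(A, l):
    """Phi_l(A) = tr((A - lI)^{-1}) = sum_i 1/(lambda_i - l)."""
    return np.trace(np.linalg.inv(A - l * np.eye(len(A))))

# At the start of the process A = 0 and the barriers are at -n and n,
# so both potentials equal n * (1/n) = 1.
n = 6
A = np.zeros((n, n))
print(upper_potential(A, n), lower_potential(A, -n))   # 1.0 and 1.0
```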
And with these two things in hand I can actually complete the proof. So here's how it goes. I start at zero, with barriers at minus N and N.
The upper and lower potentials are both equal to 1, because each eigenvalue is at distance N from each barrier, you take the reciprocal, and there are N of them.
Okay, so my potentials are equal to 1. So what I'm going to show is that in any situation where my eigenvalues are between the barriers and the upper and lower potentials are bounded by 1, there exists some multiple of a vector that I can add: there exists some scalar S and some vector V E from my set, some S times V E V E transpose, that you can add to A and that shifts the eigenvalues in such a way that these potentials don't increase.
And after I prove this lemma, then I can do the same thing and I can finish proving the theorem.
So all that remains to be done is to prove this lemma. So given any matrix A with upper and lower potentials bounded by 1, and some vectors in isotropic position -- I'm sorry, vectors forming a decomposition of the identity, same thing -- I can always choose some scalar S, choose some amount of some vector to add, so that both potentials don't increase when I shift the barriers by the constants, a third and two.
>>: These are chosen from your collection.
>> Nikhil Srivastava: The V is chosen from the collection. I guess that's the point. Okay. So I'm going to prove this lemma. So the first question to ask is: which vector should we add? It's not easy to search for vectors. The right question to ask is: given a fixed vector, how much can we add? What's the right range of S? So let's look at the upper barrier for now. So I have A, its eigenvalues are less than U, the upper potential is bounded by 1, and I'm interested in moving U forward by 2 to U prime. And I want to find out -- I have some vector V, I want to figure out how much VV transpose I can add to A. So how much VV transpose can I add
without blowing up the potential? Well, the new potential -- the potential of the updated matrix A plus S VV transpose at the shifted barrier U prime -- is just the trace of (U prime I minus A minus S VV transpose) inverse. And the reason these potential functions are nice is that I can exactly compute what this potential is using the Sherman-Morrison formula, which tells me what happens to the inverse of a matrix under rank one updates. With this standard fact I can write the new potential as the potential of the old matrix at the shifted barrier, Phi of U prime of A, plus a correction term whose numerator is some quadratic form in V and whose denominator is 1 over S minus another quadratic form in V. So what did I want to do? I wanted to find out what S would have to be for this new potential to be at most the old potential. Right? And I can do that just by rearranging this inequality, and I can get a condition on S. So I immediately get that I can add S of a vector V without increasing the potential under this shift if and only if 1 over S is at least some quadratic form of V. So I'll look at this more carefully later.
So this is a lower bound on 1 over S, which is an upper bound on S, and that makes sense because I'm asking how much of V I can add without blowing up the eigenvalues.
And I'm going to call this quantity in the parentheses U sub A of V; it comes from a matrix U sub A. What I want you to take away from this is that I've derived a lower bound on 1 over S that is linear in the outer product VV transpose. So the condition is just that 1 over S is at least some matrix dotted with VV transpose.
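The exact expression isn't written out above, so the sketch below reconstructs it from the Sherman-Morrison computation just described; treat the formula as an assumption of this sketch rather than a quote from the slides. It checks numerically that adding exactly 1 over U_A(V) of VV transpose leaves the upper potential no larger than it was before the barrier shift.

```python
import numpy as np

rng = np.random.default_rng(1)

def phi_upper(A, u):
    return np.trace(np.linalg.inv(u * np.eye(len(A)) - A))

# A toy matrix whose eigenvalues sit below the upper barrier u.
n, u, delta_u = 6, 10.0, 2.0
A = np.diag(rng.uniform(0.0, 5.0, size=n))
u_new = u + delta_u
v = rng.standard_normal(n)

M = np.linalg.inv(u_new * np.eye(n) - A)          # (u'I - A)^{-1}
slack = phi_upper(A, u) - phi_upper(A, u_new)     # potential gained by shifting the barrier
U_of_v = (v @ M @ M @ v) / slack + v @ M @ v      # lower bound on 1/s

s = 1.0 / U_of_v                                  # add the maximum allowed amount
assert phi_upper(A + s * np.outer(v, v), u_new) <= phi_upper(A, u) + 1e-9
```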
Okay. So I can do the same thing for the lower barrier, actually, in almost exactly the same way. And here I get an upper bound on 1 over S, which is a lower bound on S, which also makes sense because it's telling me how much of V I must add in order for the eigenvalues to move forward enough so that they're still bigger than the new shifted lower barrier. And here I've got an upper bound on 1 over S which is some quantity L sub A dotted with VV transpose -- again, linear in the outer product. So now I wanted to show that I can always find some vector which allows me to respect both barriers, keep both potentials under control. And this just reduces to showing that there is a vector where this lower bound on 1 over S is at most the upper bound on 1 over S. So this one corresponds to the amount that I can add, and this one corresponds to the amount I must add.
And all I want to show is that there is a vector for which the amount I must add is at most the amount I can add. Once I show this, I can squeeze the scaling factor 1 over S in between, add that much of that vector, and that proves the lemma.
So I'm going to show this. And I can just show it by taking an average. So I'm going to take the sum of the left-hand side over all my vectors V E, which form a decomposition of the identity. Now, because this is linear in the outer product VV transpose, I can take the sum inside the outer product, and I just have this matrix dot-producted with the sum of V E V E transpose, which I know is the identity -- this is the only place where I'm using that fact. So the sum is just the trace of this matrix.
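That averaging step is just the fact that a quantity linear in VV transpose, summed over a decomposition of the identity, collapses to a trace; a short illustrative check:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 50, 6
Q, _ = np.linalg.qr(rng.standard_normal((m, k)))   # rows form a decomposition of the identity
M = rng.standard_normal((k, k))                    # any matrix, e.g. the one defining U_A

# sum_e v_e^T M v_e = <M, sum_e v_e v_e^T> = <M, I> = tr(M)
assert np.isclose(sum(v @ M @ v for v in Q), np.trace(M))
```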
Okay. So now I'm going to look more closely at what this quantity is. And bound its trace. And
that will finish the proof of the theorem. Okay. So note that this is the step in which we are sort of
transferring to the behavior of this balanced ideal vector, which didn't exist. So in this whole thing
we're showing if allowed to use weights there's some vector you can add. But here when we're
taking this average over all our vectors, we're essentially doing this computation for our balanced
ideal vector, which has the same projection in all directions.
So you can think of this as the step where we're relating actual vectors to the ideal one from earlier. Okay.
So let's look at this quantity; we're interested in its trace. The second term here is simply the potential of A at U prime, which we know is at most the potential of A at U, because moving the upper barrier forward only reduces the potential; so by the induction hypothesis it's at most 1.
The numerator up there is the derivative of the potential function with respect to the position of the barrier, evaluated at U prime. Okay. And the denominator is, by convexity, at least as large as what I would get if I took a linear approximation. So what is the denominator? It's the difference between the potential of A at U and the potential of A at U prime. I've moved U to U prime, so the potential has gone down.
And what I'm saying is that if I approximate this difference using the derivative, then by convexity this quantity is at least the shift in the barrier, delta U, times that same derivative. So these things cancel, and I get that the trace is at most 1 over delta U plus 1. So the point is that this doesn't depend on A, it doesn't even depend on where the barrier is, it doesn't depend on the vectors. It only depends on the shift in the barrier.
Similarly, I can derive a bound for the sum of this L sub A, and it is at least one over delta L minus one. Again, the one comes from the induction hypothesis.
And now it becomes clear why I chose two and a third. Because if I set delta U to 2, then the first quantity is at most one half plus one, which is three halves; and if I set delta L to one third, then the second quantity is at least three minus one, which is two; and two is bigger than three halves. So there exists some vector.
And you can prove this lemma and you can just keep doing this and that proves the theorem.
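Putting the pieces together, here is a compact sketch of the whole iterative procedure for the parameters in the talk (shift the lower barrier by 1/3 and the upper by 2, run 6n steps). It is written for an abstract decomposition of the identity rather than a graph, and the specific way it picks the vector and the scalar -- maximizing L_A minus U_A and taking 1/s halfway between them -- is just one valid choice, not necessarily what the authors implement.

```python
import numpy as np

def sparsify(V, d_l=1/3, d_u=2.0):
    """V: m x k array whose rows v_e satisfy sum_e v_e v_e^T = I_k.
    Returns scalars s (at most 6k nonzero) such that the eigenvalues of
    sum_e s_e v_e v_e^T stay trapped between the moving barriers."""
    m, k = V.shape
    A, s = np.zeros((k, k)), np.zeros(m)
    l, u = -float(k), float(k)
    I = np.eye(k)
    for _ in range(6 * k):
        phi_u_old = np.trace(np.linalg.inv(u * I - A))
        phi_l_old = np.trace(np.linalg.inv(A - l * I))
        l, u = l + d_l, u + d_u                       # shift both barriers
        Bu = np.linalg.inv(u * I - A)                 # (u'I - A)^{-1}
        Bl = np.linalg.inv(A - l * I)                 # (A - l'I)^{-1}
        phi_u_new, phi_l_new = np.trace(Bu), np.trace(Bl)
        VBu, VBl = V @ Bu, V @ Bl
        # Lower bound U_e on 1/s_e (upper barrier) and upper bound L_e (lower barrier).
        U = np.einsum('ij,jk,ik->i', VBu, Bu, V) / (phi_u_old - phi_u_new) \
            + np.einsum('ij,ij->i', VBu, V)
        L = np.einsum('ij,jk,ik->i', VBl, Bl, V) / (phi_l_new - phi_l_old) \
            - np.einsum('ij,ij->i', VBl, V)
        e = int(np.argmax(L - U))          # the averaging argument guarantees L >= U somewhere
        t = 2.0 / (U[e] + L[e])            # any t with U[e] <= 1/t <= L[e] works
        A += t * np.outer(V[e], V[e])
        s[e] += t
    return s, A, l, u

# A random decomposition of the identity: rows of an m x k matrix with
# orthonormal columns satisfy sum_e v_e v_e^T = I_k.
rng = np.random.default_rng(0)
m, k = 400, 10
Q, _ = np.linalg.qr(rng.standard_normal((m, k)))

s, A, l, u = sparsify(Q)
w = np.linalg.eigvalsh(A)
print(np.count_nonzero(s), "nonzero scalars out of", m)
print("eigenvalue ratio:", w[-1] / w[0], "(final barriers", l, "and", u, ", ratio 13)")
```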
So if you do this carefully, and you fix things so that you're taking DN steps, you get the bound (D plus one plus two root D) over (D plus one minus two root D), which corresponds to the ratio of roots of Laguerre polynomials, which means you can exactly get the behavior we got from the Laguerre -- from the ideal case. So this is not actually the (D plus 2 root (D minus 1)) over (D minus 2 root (D minus 1)) quantity that appears in the Ramanujan case. This thing is actually better than that when D is bigger than the golden ratio. So I mean, if you're interested, that's what it is. So it's only twice Ramanujan, in fact, if the average degree is bigger than twice the golden ratio.
Right. So, I mean, essentially why I'm saying this is that the origin of this bound seems to be somewhat different from the origin of the D plus or minus 2 root (D minus 1) bound which appears in the Ramanujan graph case. This happens to be at most twice that, but the origin is somewhat different.
I mean, this has a relation to random matrices, because if you take -- if you take a random
Wishart matrix, then its eigenvalues lie in this range. Anyway, so that finishes the proof of the
theorem. And there's some open questions left by this.
So one is: can you do something to actually get closer to -- can you get rid of the factor of two?
The second thing is, since this produces weighted graphs, can you produce unweighted graphs, even for some special family? Can you even produce unweighted sparsifiers of the complete graph, which would really be expander graphs without weights? Is there a way to speed it up? It's N to the fourth right now, not useful for any of the linear system solving or any of that kind of stuff; it's mostly just interesting to see that these exist.
And the last thing is that this is related to this old conjecture called the Kadison-Singer conjecture, which actually comes down to the question: given a graph G, if I know that all the effective resistances of the edges are bounded, can I break it into two graphs which are both sparsifiers for it? They don't have to be sparse; I just need to break it into two sets, but each one has to approximate the original graph. I mean, this is sort of the dream thing you would like to prove as far as getting rid of weights goes. And that's the end of the talk.
[applause]
>>: Question.
>> Nikhil Srivastava: Yes.
>>: The N to the fourth complexity is for the deterministic algorithm -- if you allow randomization in your procedure, can you speed it up?
>> Nikhil Srivastava: So the only thing I know how to do right now is -- okay. So with these parameters, at this moment, I don't think so. So, okay, the running time is D M N to the 3, right? If you want to -- if you're giving yourself a little bit of sparsity and M is order N, that's N to the fourth, which is not very impressive.
The problem is -- the N to the third comes from the matrix inversion, and currently there doesn't seem to be a way to avoid that. And --
>>: N to the 2 plus.
>> Nikhil Srivastava: Well, one hope is to show that right now we've shown that there is a vector
which there's some vector which works, but I think if you're able to loosen the parameters a little.
You might be able to show -- I haven't been able to show this, but it's conceivable that there are
enough vectors at work that instead of iterating over all of them you could choose some of them
and just check it on that. But that would save you one factor event, I don't see a way to get
around the matrix inversion.
>> Eyal Lubetzky: Good. So let's thank the speaker again.
[applause]