>> Mohit Singh: Okay, hi, everyone. It's a pleasure to introduce Sasho Nikolov. Sasho
is doing a postdoc here this year. He has done a lot of nice work in many areas, including
discrepancy, differential privacy and a lot of things like convex geometry, as well, and
he's going to tell us something in that area.
>> Sasho Nikolov: Thanks. It's good to give a talk here. So I'll tell you about an
approximation algorithm for a geometric problem, and I'll start with the problem, because
I think it's a natural problem. Please stop me at any time if there are any questions. So
here's the problem. So the input is -- let me see if I can do this. No. The input is
n points in d-dimensional space, and our goal is to find the largest j-dimensional simplex in the
convex hull. So here, those red dots are my points, and you imagine this is three-dimensional space. This cube I've drawn is supposed to be the convex hull, and our goal
is to find the largest simplex of some given dimension inside that convex hull, largest I'm
measuring by basically volume. Okay, cool. All right, so we have points, n points in d
dimensions. We're looking at the convex hull, and we're looking for the largest volume
simplex. So a simplex, a j-dimensional simplex is just the convex hull of j plus 1 points.
So a two-dimensional simplex is a triangle. A three-dimensional simplex is a
tetrahedron, and so on. And we are measuring largeness in terms of volume, so when
we're looking for a d-dimensional simplex and our points are in d dimensions, we really
are just looking for the d-dimensional simplex of the largest volume. If j is less than the
full dimension, less than d, it's kind of the obvious thing, so we would look at the flat in
which the simplex lies, so the affine span, and we would look at the [indiscernible]
measure on that flat. So in this case, we are in three dimensions. If we are looking for
the largest two-dimensional simplex, we're really looking for the triangle that has the
largest area. Make sense? Okay, and to make the problem at least sound more
combinatorial, let's notice that we can always assume that an optimal simplex is the
convex hull of some subset of the input points. Right, because there
might be some other optimal simplex, but you can always just move the points one by
one until they snap to an input point, without changing -- there's always a way to do that
without changing the objective, or at least without decreasing it. This makes sense? So
we're really looking for a subset of j plus 1 points from the input whose convex hull is
the largest-volume simplex. Okay, so that's the
problem. Good? Cool. All right, so I'm claiming it's a natural problem in computational
geometry. It definitely has been studied for some time. You can see it as one example
from a general class of problems: approximating some complicated object, say a convex
body, with a simpler one contained inside of it. And you can see it as a sort of a
polyhedral analog of the largest volume ellipsoid problem. The largest volume ellipsoid
problem -- that's also called the John ellipsoid problem -- is the same kind of problem, but you're looking
to find the largest ellipsoid contained in the convex hull of your points. It's also a much
harder problem in the representation in which I'm working, but that doesn't matter. Okay.
Right, and there are some applications of this in low-rank matrix approximation and also
in discrepancy theory, and I'll tell you a little about the second application.
>>: Also, does it make sense to ask the lower-dimensional version of that problem? Is it
also defined in a lower dimension, with its own ellipsoid?
>> Sasho Nikolov: I think you can define it the same way. I'm not sure if people have
looked at it. That's all I can say. Okay. All right, so let me tell you about this connection
with discrepancy theory, and it's a little bit of a loaded slide, so I'll go slowly over it. So I
want to tell you about something called linear discrepancy. That's a measure of how well you
can round in a certain sense, which I'll define -- so the linear discrepancy of a set of
vectors V1 to Vn in R^d is some measure of how well you can round with respect to
these points, in the following way. All right, so we are given some real weights, X1 to
Xn. Those are real coefficients that correspond to each one of the vectors V1 to Vn, and
our goal is to round them. So they're in the interval between minus 1 and plus 1, but within that they could be anything.
They could be square root of two over two. But our goal is to round these things to plus,
minus 1, so to the endpoints, to the integer endpoints, so that the linear combination with
the given real coefficients is close to the linear combination with the rounded
coefficients, and so we are measuring closeness in Euclidean norm. So what linear
discrepancy is, is for a fixed set of vectors, what's the worst-case error over all possible
linear combinations for rounding? Does this make sense? Okay, so that's something that
comes up in approximation algorithms, for example, where you solve some linear
program, you want to round a solution to linear program to an integer solution without
losing too much. Okay. And what did I write here? So here is the solution to this largest
volume simplex problem, so let me denote by cursive V-sub-j the maximal volume of a j-dimensional simplex in the convex hull of the points. Okay, so here is a theorem. One
side, so the lower bound is due to Lovasz, Spencer, and Vesztergombi. The upper bound
is due to Matoušek from a few years ago. It doesn't matter exactly what this formula is,
but it tells you that basically this linear discrepancy is up to log factors equal to some
function of these V-sub-js. In particular, you can approximate this if you can -- if you
know some approximate values for all these V-sub-js. And also notice that we are
normalizing here, so this is j-dimensional volume, so we are raising to the power 1/j, so
this kind of gives the right scaling factor, and this will be nice. You'll see on one slide
why it's nice for this result. Make sense?
>>: Is this the reason you want us to [indiscernible]?
>> Sasho Nikolov: That's definitely one reason why I'm interested. I also -- so I saw a
recent SODA paper on that -- this is the honest answer. And I realized I already had an
improvement, but I just didn't know people cared about this problem. So it has been
studied in computational geometry for some time, I guess. You could argue it's natural
on its own. For me, as far as applications, this is maybe the main application I'm
interested in.
>> But it's natural to join the ellipsoid in [indiscernible] applications?
>> Sasho Nikolov: Yes, I think it's a natural problem. I could dig out more applications.
The connection to low-rank approximations is also interesting, but it would have taken a
little more time to kind of define what it is, exactly.
>> This log d, in the upper bound, do you know if that's optimal?
>> Sasho Nikolov: Probably we know that it -- I need to think about it a little bit for
exactly this measure. I think it's not known to be optimal, but some poly log factor is
necessary.
>>: [Indiscernible].
>> Sasho Nikolov: Yes, it's a good question. So it's a little hard to answer without going
into a proof, so the lower bound, at least, is some covering argument, and that's kind of
what comes out, so you're covering one object with simplices, basically, or something
like that. When you compute the volumes, that's what comes out. It's some function of
the volume of a unit ball and the volume of a sort of regular simplex. I need to think
about it a little bit, when you normalize the volume to the 1 over dimension. Maybe it's
the ratio of the two, which is what comes out when you just do the obvious volume lower
bound for covering. Does this make any sort of sense? I can think more about it and
come back to you. Okay, so it's not the ratio of the two, but it's some function of those
two things. So let me move to my result, unless there are more questions. Okay, so here
is what I want to prove. I mean, here's what I prove in this work and what I'll talk to you
about. So I'll show there exists a deterministic polynomial time algorithm that
approximates the volume of the largest j-dimensional simplex up to a factor -- okay, so
the analysis exactly gives this j to the j over j factorial under square root, and this by
Stirling is sort of like e to the j/2. Okay, and notice that here, we always take Vj to the
1/j, so this -- that observation and this approximation factor gives us a constant factor of
approximation to this linear discrepancy problem. All right. Sorry, it gives a constant
factor of approximation to this volume lower bound for the linear discrepancy problem.
It gives us log approximation to the actual linear discrepancy problem. Also, this e to the
-- this constant to the dimension sort of approximation factor is the optimal kind of
approximation you can get, in the sense that we know that the approximation cannot be
better than exponential in the dimension. So there exists some constant bigger than 1
such that this problem is NP hard to approximate better than the constant to the j, to the
dimension. Okay, this constant is really much closer to 1 than e, so it's still not clear
what the right constant is there, but at least we'll get the right order of dependence, and
this NP hardness is true as long as -- so, obviously, j is constant, you can just brute-force
the problem. So this NP hardness holds as long as j is sort of big enough. Again,
polynomial time here means polynomial in both dimension and number of points. This is
not the usual computational geometry setting where we are in the plane, because then the
problem becomes too easy. And it's a significant improvement on what we knew before,
so previously, for general j, we knew j to the j/2 approximation. I'm losing some constant
to the j here. This was due to Khachiyan and Packer. And very recently -- I forgot the
reference. Okay, the [indiscernible], Fritz Eisenbrand and a few other people. That's
embarrassing, I'm sorry. I apologize to Fritz and everybody else. So they improved this
only for the case where you're looking for the largest d-dimensional simplex, so only for
the full-dimensional case. They get something which is like log d to the d/2. This is the
SODA 2015 paper, the last SODA. And so we improve on those two results, I think
significantly. And this is my result. Any questions? Okay. So I want to first tell you
about the d-dimensional case, because that's easier. It contains a lot of the intuition, and I
want to give you as much of a full proof as I can in the time that I have for this case,
because I think it's simple enough. So I'll go slow. Please stop me, again, if you have
questions. So here are some simple reductions that we can do to the problem to make it
slightly easier. There are a bunch of them, so I'll go slow over them. So let me use delta,
because it looks like a triangle, to be the optimal simplex. So this is the convex hull of
some subset of d plus 1 points from the input. Let's guess the first vertex. We can
always do that. There are only n options. In reality, what we will do is we will just kind
of run over all the options. We will run the algorithm once per every option, but for now,
let's assume we just know the first vertex, someone tells us what it is. And let's replace
each point Vi, so by replace I mean let's just move each point. Yes, let's just subtract the
first vertex, Vi1 from each point Vi. So this just kind of shifts the whole set of points, so
it obviously doesn't change volumes, but now we know that one of the optimal points is
the origin. It's just convenient -- you'll see in a second why -- but now, after we
do this, the new problem is to find not d plus 1 but only d points from the input, so a
subset of d points from the input, so that the volume of the convex hull of these d points
in the origin is maximized. Okay, and that's convenient, just because it lets us make the
problem a linear algebra problem as opposed to a geometry problem. Okay, so this is
equivalent to we have our points, so let me write something. Okay, so we have our
points, and let's think of this matrix -- maybe A is not the right name, sorry about that --
matrix V, which is just the matrix whose columns are the points, except the 0
point, which we don't care about anymore. Okay. So V2 and so on, okay, and now I'm
claiming that this problem of finding these d points whose convex hull together with 0 has the
largest volume is the same as finding d by d -- so here, this dimension is d and this
dimension is n. Okay, so that problem is equivalent to the problem of finding a d by d
sub-matrix of this that has the largest determinant in absolute value. Why? Because the
volume of this thing is just equal to the value of the corresponding sub-matrix times a
scaling factor that doesn't -- sorry. The volume of this is equal to the absolute value of
the determinant of the corresponding sub-matrix times 1/d-factorial, and that's -- yes.
Let's just take that on faith. I think that's convincing for everyone, right? Okay, cool.
All right, so now we have this problem. We just have this n by d matrix, and we want to
find a d by d sub-matrix of it that has the largest determinant in absolute value. That's
our problem right now. Let's just do one more simplification, and then we're almost
done. Let's just add also the inverse -- I mean, right. So after doing this first
transformation, now let's also add minus Vi for every Vi. What this lets us have is that it
makes the set of points symmetric around the origin, and that's just kind of convenient. It
also obviously doesn't change the optimal solution. Why? Because, let's see -- so if we
take a sub-matrix that has both Vi and minus Vi, the determinant is 0. So okay, so we
always -- so for a nonzero solution, we always take either Vi or minus Vi, but which one
we take just changes the sign, and we're taking absolute values, so it doesn't really change
the value of the optimal solution.
>>: [Indiscernible] decrease.
>> Sasho Nikolov: Okay, so remember, we kind of changed the problem so that we
always take the origin to be one of the vertices. So say we're just looking for the optimal
two dimensional.
>>: Or three-dimensional.
>> Sasho Nikolov: It's kind of -- yes.
>>: So two points are very close to each other.
>> Sasho Nikolov: That's just two-dimensional, right? So we look at this and this.
>>: Two dimensions, it's just --
>> Sasho Nikolov: So my point is that this triangle and this triangle have the same area.
>>: So they went to -- so if we have some kind of base and one -- we only care about the
height, the distance.
>> Sasho Nikolov: Right, so why are they the same? Because this length and this length
are the same, and the height doesn't do this, and you have the same for larger dimensions.
You always have like base times height. Okay? Cool. All right, so now those are the
basic transformations. Now I want to move towards the actual algorithm a little more.
So we are approximating a hard maximization problem, so the usual game is to find an
upper bound on the optimal solution, and then we compare how well our algorithm does
against that upper bound. I am worried that Mohit doesn't seem convinced. Okay, cool.
Okay, so here's the basic upper bound we will use, with some modification. So here is
one. Here is one upper bound that we have available, and remember, we have this
problem now of finding the largest determinant of a square sub-matrix. Okay, so I'm
claiming that one thing that's true is that if all columns of this matrix have length at most R,
then the determinant of any square sub-matrix is at most R^d in absolute value -- that's
Hadamard's inequality, a basic statement about determinants of matrices. All right, so that's clearly
not a good upper bound, so our points could look like -- well, like this I guess, but our
points could look like this, so we could have maybe a few long points -- a few long
vectors that are far from the origin and a bunch of really tiny ones. And say we just have
one big guy. So now, this upper bound would be huge because of just this guy, but you
obviously can't get something that's comparably large in terms of an actual simplex, so
it's not a good upper bound itself, but we will fix it somehow.
>>: [Indiscernible].
>> Sasho Nikolov: Product of what?
>>: Do you know [indiscernible]?
>>: But he doesn't know which one, right?
>>: I see, but you can maximize. A better program is to maximize over all subsets of
size d, the product of the Vis.
>> Sasho Nikolov: Can you compute that? I guess you could compute that. Yes, that's
probably also not very good.
>>: You just take the log, right?
>> Sasho Nikolov: That's probably also not very good, because they could be pretty
close together and it will still be a problem. You would have a bunch of long ones, but
they're very close. All right, so here's something we will do to try to use this upper
bound. So I'll introduce one more object, which is the Lowner ellipsoid of this set of
points, so the columns of this matrix. And by that, I mean the minimum volume ellipsoid
that contains all the points. And because we did this transformation to make sure that
our set of points is symmetric around the origin, we can also see, by sort of a standard
argument, that the minimum volume ellipsoid is centered at zero, as well. So it will be
centered at zero. So how do we use that? Okay. So the point of this is that it fits tightly
around our set of points, so we want to kind of use that. So here's what we will do. So
this is what it looks like. We have our upper bound, which is somehow in terms of L2
norms, so in terms of containment in a sphere, so we want to kind of map this whole
thing to a sphere, so what we do is the following. So we have this ellipsoid EL, which is
the minimum volume ellipsoid that contains our set of points. We can always compute a
linear map that maps this ellipsoid to a ball, to a standard Euclidean ball, and we can also
do that with a linear map that actually has determinant equals 1, so it doesn't change
volume. I mean, it just doesn't change volume.
>>: So the linear volume [indiscernible]?
>> Sasho Nikolov: The what now?
>>: The volumes of the sphere should be the same.
>> Sasho Nikolov: Yes, so the volume of the sphere would also not change, yes. The
volume of the sphere would also not change, and the volume of any full-dimensional
simplex would also not change, because the determinant of this linear map, say, as a
matrix is 1. Okay, so that means that R in particular has to be the product of the axis
lengths of this ellipsoid to the power 1/d. Okay. This makes sense. I thought that
something was wrong with that statement. Okay, so what we do is that we compute the
smallest volume ellipsoid. That's a convex problem. We can do it up to any
approximation. I believe that approximations are fine for us. It's not immediately
obvious, but they are. And right then, we'll compute this linear map, or in other words,
this ellipsoid will be given by either this map or its inverse. All right, and then we just do
this transformation where we apply this map to each point in the input, each Vi, and now
we have that the Lowner ellipsoid is a ball of some radius R, and now our upper bound
will be the radius to the power d. Because we just applied this linear map that doesn't
change volume, and it's a valid upper bound here, after the transformation, and the
transformation didn't change the value of any solution. This is a valid upper bound. This
makes sense? Questions?
>>: So it's just the product of the eigenvalues.
>> Sasho Nikolov: In other words, it's just the product of the eigenvalues, yes.
>>: Across the scale.
>> Sasho Nikolov: It is just the -- for the determinant, actually, it is just the product of
the eigenvalues, and that's another. Or the product of the axis lengths of this ellipsoid, so
that's another way to see this upper bound. So not sure which is easier. Maybe what
you're saying is easier, actually. Okay, kind of one reason why I did this transformation
is I wanted to give some intuition why it's a good upper bound, and the point is, again,
well, the smallest-volume ellipsoid somehow has to fit tightly around the set of points.
Why? Because if it doesn't, so if our points don't sort of touch in some vague sense the
ellipsoid in every direction, so if there is some direction where there's big gaps, you can
kind of squeeze the ellipsoid and get a smaller volume. This is sort of the rough intuition
why this does give you some information about how -- well, remember, the bad example
for the upper bound was these thin bodies, so we want to avoid that, and this is why this
helps. And so one nice thing about the Lowner ellipsoid that was used in previous
algorithms but is too weak for the kind of result I proved is that -- so one thing we know
is that, okay, if we have some convex body whose Lowner ellipsoid is a ball of some
radius R, then if you scale the radius by -- oh, and it's a symmetric body, as well. If you
scale the radius by root d, then you will get a ball that's guaranteed to be contained in
the convex body. It's not guaranteed to be contained as tightly as it is here, but it's always
going to be contained. Okay, so that's one thing that tells you that the upper bound is
good in the sense that, well, this convex hull has to contain large things, because it
contains this relatively large ball inside of it. And this was used by previous algorithms,
but it's not quite good enough for us. But what I'll use is the fact from convex
optimization that lets us prove something like that. In other words, I'll use the dual of the
Lowner ellipsoid. So this is what we get from convex duality about this Lowner
ellipsoid. This is sort of a dual characterization of what the ellipsoid is, so you can say
this is -- I guess this is John's theorem, in some restatement, and here's what it says. It
says that if the smallest volume-enclosing ellipsoid of -- it says that the smallest volume-enclosing ellipsoid of the convex hull of points V1 to Vn is a ball of some radius if and
only if there exists non-negative weights that you assign to the points so that the sum of
Ci times these outer product matrices -- I mean, these rank 1 Vi, Vi transposed matrices
has to be equal -- so the smallest volume. So the Lowner ellipsoid is a ball if and only if
there exist weights such that this thing is equal to the radius squared times the identity
matrix. Okay, so the intuition is that this is what comes -- and the other thing is that the
weights have to sum up to d. This is just what you get from writing the conditions for the
smallest volume containing ellipsoid problem. That's one way to look at it. If you want
some intuition except just the manipulations to get the dual, this sort of says that these
Vis have to point in all kinds of directions in order for you to be able to do something like
that, in order for you to be able to decompose the identity in this way. Okay. Okay, so
this is what we'll use, and the main idea of the actual approximation algorithm is to treat
the Cis in some sense as probability weights, and let me show you. That's vague, so let
me show you immediately what actually the algorithm is, because this is sort of simple.
So the algorithm to actually find approximately largest simplex, d-dimensional simplex,
or rather a d by d sub-matrix of this matrix, would be to sample d columns from the
matrix. Each column is sampled independently with replacement, and column I is
sampled with probability proportional to Ci, and the Ci is nonnegative, and they sum up
to d, so the probability of sampling I is just Ci/d. That's the algorithm. So we compute
this Lowner ellipsoid. We can also get these weights. Actually, some of the algorithms
that compute the Lowner ellipsoid first compute these weights, so it's already sort of for
free, and then we use these weights to run the sampling algorithm. This is what I call
randomized value. Okay, and let's now try to -- so let me now slowly go through the
analysis. This is just a couple of slides. So why is this good? So let's compute the
expectation of the squared value of a sampled solution. It turns out to be more
convenient to work with the squares for some reason. You will see why that's okay. All
right, so let's just write that out. Okay, all right. So I said I sample with replacement, but
if I sample the same column twice, I obviously get determinant 0, so those don't
contribute. So in my expectation, I only have contribution from the terms -- well, from
the cases when I sampled d distinct things. So for each such set I could have sampled, I
could sample it in d-factorial ways. They're all distinct, so there are d-factorial ways to
sample it, and the probability of sampling exactly those elements is exactly the product of
the problems, because I sample independently. So this is the probability of sampling the
set S. I can sample in d-factorial ways, and the value of the set is actually this, so just the
square. I mean, for me, right now, value is the squared value, so it's just the squared
determinant. All right, so a few more manipulations, so again, Pi is just Ci/d, and all
these sets are of size d, so this gives me a d to the d-factor up front, and there is also this
d-factorial that doesn't depend on S, so just things that don't depend on S I have brought
upfront, so I have this. Remember, this is actually the inverse of the approximation factor
I'm shooting for, so I want to somehow show that this compares with my upper bound,
which is this radius. For the squared value, the upper bound is the radius of the Lowner
ellipsoid -- which we made sure is a ball -- to the power 2d now. Okay. Right,
and I did just one more thing. So this is -- I just brought the constants in, so what I have
is some sub-matrix, and I'm taking the product of Ci for i in the index set of the columns of the
sub-matrix, times the determinant of -- let me call Vs the sub-matrix. So what I'm doing,
instead of having these two things separately, I can just multiply the i'th column in the
sub-matrix by its coefficient, but it's not -- right. But because I'm taking squared
determinants, I'm multiplying by the square root of the coefficient of the column so that I
get the same thing. Does this make sense? Okay. So this last step is just sort of bringing
those things in. Okay.
>>: Why did that [indiscernible]?
>> Sasho Nikolov: Why is that what I want to show? So this is the expectation of the
squared value of my algorithm, of this randomized algorithm. All right, and I said it's
equal to this big summation times this factor. So this factor is the approximation factor
I'm going to be shooting for, and now I just want to show that this is at least as big as my
upper bound on the optimal value. And my upper bound was this radius of the containing
ball. To the power d was my upper bound for the biggest absolute value of the
determinant, but I'm squaring everything, so I just added squares on both sides. So this is
my approximation factor, and this term I want to show is at least as big as my upper
bound in the optimal solution.
>>: Well, enough to show.
>> Sasho Nikolov: Yes. Fine. It is enough to -- yes. Well, for this approximation -- okay, yes, fine. Yes, yes, I agree. I agree. Okay. So let's do that. Right. So we have
this summation, so what is this? I have basically, in my matrix, I've multiplied every
column by square-root Ci, so square-root C1, square-root C2 and so on. And now this
summation is the sum over all d by d sub-matrices of the squared determinant of the sub-matrix. Okay, so I want to relate this to this radius using John's theorem. And this turns
out to just be -- to follow immediately from the Binet-Cauchy identity for determinants.
Okay, this is something -- yes, right. So it's one of these things you just write out the
formula for determinant, it kind of follows. Who has seen the Binet-Cauchy formula
before? Not really. Yes, one, two? Two people, cool. Okay, so let me sort of say in a
little more generally, so what it says is that say you have some matrix A, and say this
matrix looks like this, has this shape. I mean, it actually doesn't matter, but it makes
more sense in this way. And I'm interested in the determinant of the matrix times its
transpose, so the determinant of A times A transpose. Okay, so if A is just like a single
row, it's just the inner product, and it's just the sum over all 1 by 1 sub-matrices of this
row squared, right? So if it's just a single A times A transpose, the determinant of this is
just -- these are just the numbers, which is the number, and this is just equal to sum of Ai
squared. In general, this is equal to the sum over -- so say this is, again, d by n, sum of s,
subset of n, size of s equals d, of the determinant of A-sub-s squared. A-sub-s is just the
sub-matrix determined by this set, so it's sort of a generalization of this thing. Does this
make sense? All I can say is it makes sense for the one-dimensional case. It also makes
sense for the case where this is a square matrix, because then it's just the trivial identity.
All right. So what this gives us in this case, what we have here is exactly this summation,
where A is this matrix, and what it tells us is that this is equal to the determinant of this
matrix times its transpose. This is exactly the summation of all the products. Okay, and
of course, the two square root Cis we can bring together. Now, we have this summation
of Ci times Vi, Vi transpose. That's exactly what John's theorem tells us, that this is
equal to R-squared times the identity, and obviously the determinant of this is just R^2d,
so this kind of completes basically the analysis, so what we derived was that the
expectation of the squared value of the solution is at least this factor of d over d to the d,
times R^2d, which was our upper bound to the square of the optimal solution. And again,
you can use Stirling to see that this is basically e to the minus d, minus [indiscernible] d.
Right, so this tells us that this randomized sampling algorithm in expectation gives us
something that's good. And good here means large enough. The problem is, I think there
are cases where this can give you horrible things, except with exponentially low
probability. Because while you only get an exponential approximation factor here, so I
think it's totally possible that with high probability it doesn't give you anything good, so
let's not run that algorithm. Let's instead de-randomize it, and you can de-randomize it
using standard conditional expectations machinery. So once you de-randomize it, the
guarantee you get is that you get a deterministic algorithm that has a guarantee at least as
good as this expectation guarantee, and that's enough for us. And because it's
deterministic, it just works. We don't care about probabilities anymore. And de-randomizing is a little technical, but it's sort of standard. You just have to compute the
actual conditional expectations. Okay? Okay, so that's the algorithm for the full
dimensional case. And I want to tell you, a little more vaguely, what happens when j is
less than d, so when we're looking for less than a full-dimensional simplex. That turns
out to be a little more technical, but similar ideas work. You just have to work harder.
Okay. So first, give you a second. Okay, so switching contexts, we're back to the
beginning, so we can do basically the same transformations as before. If we can guess
the first vertex of an optimal j-dimensional simplex, again, shift everything, so subtract
Vi1, the guessed vertex, from every input point, and also take the symmetric point around
the origin. So now we transformed our problem to -- now, because we took both of these
guys, now we have a problem that -- input that's symmetric around the origin. And now
we're looking not for j plus 1 points, but only for j points, so that the simplex they make
together with the origin is as large as possible in volume, and again, you can transform
this to a determinant problem. Now, it's a little -- so again, we have this matrix,
basically. Let me now put this here. And we have this matrix. But now, the problem is
to find -- okay. So d is always the ambient dimension, but we're looking for a d by j
sub-matrix that has the largest determinant. Well, the problem is what is the determinant
of a d by j sub-matrix, and you can see that the right thing for the problem is to look at
the sub-matrix transposed times itself -- that's a square matrix -- look at the
determinant of that and take the square root. This up to sort of a normalizing constant is
the same as this volume of the j-dimensional simplex. What is probably an easier way to
think of this is that what this is actually just -- I guess it would be natural to also define
that as the determinant of a rectangular matrix. It's just a product of the singular values.
Okay. All right. Where is this? Right, so this is our problem now. We're looking to find
a d by j sub-matrix of our big input matrix, so that the sub-matrix has a large determinant,
and by determinant, I mean the product of singular values for this thing. And you could
again use the same algorithm as before. It's well defined. You just sample j times instead
of d times, and it does give you an approximation factor, but the factor looks like j
factorial over d^d, so if j is pretty big, it's not too bad. It's not too far off from what we
are shooting for, but in general, it's not good enough. At least, it's not the right order of
magnitude. Okay, so we'll need a different upper bound, basically. It kind of makes
sense that this sort of full dimensional -- maybe it's not quite clear why it doesn't work,
but I'm pretty sure that the Lowner ellipsoid upper bound doesn't work immediately, so
I'm going to modify it to something that I know works. Okay. So here is the upper
bound I'm going to use. I don't know why I said upper bound ID. It's just the upper
bound. So one thing we know, it's not very hard to show, is that if all the points, all the
columns of this matrix, are contained in some ellipsoid E, and because our point set is symmetric
around the origin, we can always assume this is centered at zero, and say the major axes
of this ellipsoid are A1 to Ad, and I've ordered them by length. So one upper bound that's
true is the following. So this is my value of the optimal solution. That's what I call d-sub-j, so the
maximum determinant of a d by j sub-matrix. We know that this is bounded by the
product of the j longest major axes for any ellipsoid that contains the points. It's not very
hard to show. It's the right analog, say, of this [indiscernible] inequality plus Lowner
ellipsoid upper bound. Okay, so let me call this a j Lowner ellipsoid. Sorry, what I call a
j Lowner ellipsoid is the ellipsoid that minimizes this upper bound over all ellipsoids that
contain the points.
>>: [Indiscernible].
>> Sasho Nikolov: Lowner, like a V? Right, should I pronounce the umlaut like an O
umlaut? Lowner? I'm not sure. So the j Lowner ellipsoid, let's say, it minimizes this
upper bound, so this product of the j longest axes over all ellipsoids that contain the input.
So this will be -- so this achieved minimum will be this upper bound that we're going to
use for this problem. And I also want to sort of give you an algorithm that competes
against this upper bound. Let me -- because I can't give you too many details, maybe as
motivation -- I'm not sure it's helpful, but let me talk through the most trivial case, where
we're just looking for the longest basically column of that matrix. So it's clearly a trivial
problem. We just go over the columns, take the longest one, so I give you an exact
algorithm for this problem, using the same machinery that I used for the general case, and
I shouldn't touch this problem. All right, so the optimal there is the length of the longest
column, and my ellipsoid, my optimal ellipsoid -- okay, it doesn't have to be a ball, but
you can see that always an optimal solution is a ball in this trivial case, so we're just
looking for an ellipsoid whose longest axis -- an ellipsoid that contains all the columns of
this matrix, and its longest axis is minimized. Okay, so in other words, you can just look
for a ball that contains all the points and the radius is minimized. And the upper bound is
just the radius. Okay, and the way this generally is going to work is I'm going to write
this as a convex program. I'm going to again take a look at the conditions, take the dual.
What the dual is going to tell me is that wherever the optimal R is, there exists
nonnegative coefficients C1 to Cn that sum up to 1, so this 1 is this 1. It's the dimension
1. Remember, in the full-dimensional case, this was d. And what I know about these
things is that I again look at the same matrix, summation CiViVi transpose. Now I know
that the trace of this thing is equal to the optimal R^2. But the trace of this thing, well,
trace is linear, you can just take it inside, and the trace of ViVi transpose is just the norm
of Vi squared. So what this tells me is that there exist weights so that the summation of Ci times
the length of Vi squared is equal to the optimal solution, which is my upper bound. Okay, so now if I
sample Vi with probability Ci, this is the expectation, and it's optimal. So this is an optimal
algorithm. Take this program, write the dual, solve the dual. Take the weights, sample
according to the weights, and that actually in expectation works. So that's a very bad way
to compute a very simple thing. But the thing is, it generalizes for general j, so this is
basically what we're going to do. We're going to write the dual of the ellipsoid
minimization problem, and the dual is going to tell us that there exists non-negative
weights that sum up to j, where j is the dimension, so that some function that's
complicated -- I'll tell you what the function is, but for now, take it as a black box. So
that some well-defined but complicated function of this same old matrix is actually equal
to the optimal upper bound that we get from the ellipsoid minimization problem. Again,
this, all I'm saying here is that we know this is an upper bound on the optimal solution.
And what I mean here is the challenge in deriving this whole thing is that for the j
Lowner ellipsoid, when you write it as the solution of a minimization problem, the
objective is not -- it's convex, but it's not differentiable, so that makes deriving the dual
and writing the [indiscernible] conditions a little more complicated. It's not a big deal,
but it's just a little uglier, and it also makes this function a little more complicated, but the
algorithm will be the same. So we'll compute these dual weights, and then we'll scale
them down by j, so that they give us a probability distribution. We'll sample. I'm sorry,
this should be j. We'll sample j times independently, with replacement, using these
weights, and this will be the randomized algorithm. Again, we'll de-randomize it at the
end when we analyze this.
>>: Why do you sample with replacement?
>> Sasho Nikolov: I guess all I want to say is probably sampling without replacement is
-- I think it can only do better. It's just I can't really analyze it better. I cannot give a
better analysis for that. Okay, and why I keep pointing it out, maybe just to be clear that
that's what I'm analyzing.
>>: Actually, it's almost the same, because for all practical purposes, actually, you
replace every point --
>> Sasho Nikolov: That's actually true.
>>: -- given points, and divide the weight.
>> Sasho Nikolov: That's probably true, yes. I think you're right. Okay. So in general,
how does the analysis work? So again, I'll compute. This is the expectation of the product
of squared singular values of the matrix that I get. This is this squared determinant. Okay, so
again, I have -- right. So if I sample something twice, that gives me no contribution
again, because right? Well, I'll get zero. So I only get a contribution from the cases
when I sample a set of j elements, of j distinct elements. Each such set I can sample in
j-factorial ways, and the probability of sampling it is the product of the probabilities, by
independence. And again, I have this j-factorial. I should have put it up here maybe,
but Pi is Ci/j, so I get this j^j, as well, because I take a product of j terms. And this is my
expectation. Now, before I had this magic -- okay. Sorry. Before I had some sort of
magic formula that told us what this was in the full-dimensional case, well, now there is
some analog of this, which tells us what? It tells us that this big summation is equal to
the degree j elementary symmetric polynomial, evaluated with the eigenvalues of this
matrix that we've seen before. Okay, so this, you have to believe me for this. This is part
of the technical detail, but it's not very hard once you know it. It's a useful fact, basically.
So this is the sum. So what's the degree j elementary symmetric polynomial -- okay, I'm
almost done. Okay, so you have all the eigenvalues here. You just take every subset of j
eigenvalues, take their product, take the sum over all of these things. Okay, so this is the
expectation, and once again, this is the approximation factor that I'm going to prove, so
what I need to show is that this elementary symmetric polynomial is at least as large as
my upper bound, and by duality, I got that my upper bound is actually this function,
whatever that is. So I need to compare basically this and this. I need to show that this is
at least this. Not a need. It's sufficient to show that this is at least this. It's enough. I
don't necessarily need it, but I really like -- I would really like it if I can do that. All
right, so that you don't complain that I haven't shown you this, this is the function. I
guess maybe it doesn't look so bad. So this is what the function looks like. So again, the
lambda or the eigenvalues of this PSD matrix, and it turns out to be you take the product
of the top few, top k, and then you take this kind of like an average to the power j minus
k, and k is well defined by this. K is the unique number that satisfies this. So I had to
show -- at least for me it wasn't obvious that this is ever satisfied, but it turns out to be
satisfied for unique k, and this is the function. It's a little strange.
>>: It's not [indiscernible].
>> Sasho Nikolov: Because it's not differentiable, so you get this. In the duality, you get
some -- what's the word? -- sub-differential of something is equal to something else. It
gets a little -- okay. But what I can show is that -- is that the right thing? Sorry, this is
reversed. This is the wrong way. So what I can show is that this function is always
bounded above by this elementary symmetric polynomial, and I'll just give you kind of
the keyword there, is that this follows by staring at it and using Schur convexity of
elementary symmetric polynomials. So Schur convexity is a very general tool for
proving inequalities. It turns out to be useful here. Okay, all right, so that's it. So in
summary, I talked about the first polynomial time algorithm to approximate the
maximum j-dimensional simplex problem within a factor, which is exponential in the
dimension, which is the right kind of dependence for the approximation factor. It gives a
constant factor approximation to this determinant lower bound for linear discrepancy, and
so there are a bunch of open problems. What's the right constant in the hardness is one
problem. For these ellipsoid upper bounds, is this the tight analysis? Is that all they can
give? Maybe that's an easy problem, but I haven't thought about it too much. Right, and
is the approximation factor optimal? Maybe it is. It's natural. It's e to the something.
Okay, so this is for Mohit and whoever else is interested, but that's what we've talked
about a little bit. So for this -- so I'll tell you just very quickly. So the following function
is submodular -- it's a submodular set function -- so the sets are subsets of the columns of this
matrix, and the function is log of the determinant of the sub-matrix, and again, the
determinant, what I mean by determinant is the product of the singular values, so that's a
submodular function. And our problem is really maximizing this submodular function with
the constraint that we need to get a set of size j. So the constraint is -- okay, there we go.
Sorry, just a second. So the constraint is that the set you pick should be a basis of a
uniform matroid. The problem is that the function that is submodular is the log of the
objective function, so to get an approximation guarantee for the actual objective function,
I need an additive guarantee for the submodular maximization problem, and I'm not sure -- maybe you can get that with standard machinery, but at least using black-box results, I
didn't quite see how to get it. Does this make sense as far as connection to submodular
maximization? So if you can get it using just submodularity somehow, maybe you can
generalize to something else. That's basically what I'm asking. It might also give some
intuition what's happening. That's it. Thank you.
>> Mohit Singh: Is there like -- okay, so you always present this dual form, which I
guess is implicit in this upper bound that you have, but can we write a convex relaxation
directly, which is a relaxation for the problem? Maybe [indiscernible].
>> Sasho Nikolov: That's a good question. So an SDP for a volume problem or determinant
problem seems a little --
>> Mohit Singh: Or just some convex problem.
>> Sasho Nikolov: Some convex problem, yes. So for the full-dimensional case, so
from the dual, you get that. It's just that for the less than full-dimensional case, it's a little
bit strange. So for the full-dimensional case, it's just maximizing log of determinant of
summation CiViVi transpose. That's convex. That's --
>> Mohit Singh: Maybe versus Ci.
>> Sasho Nikolov: The variables are the Ci. Subject to the summation Ci equals d,
Ci bigger than 0 for all i. So that's -- okay, you can also see immediately that that's a
convex relaxation. It's just not how I came up with this, and it's also not -- it doesn't
allow me to easily generalize it to general j case. Yes. Because as you saw, it turns out
that, well, at least if I go the ellipsoid way, the dual that I get for the general case has a
much more complicated objective function that I would have never come up with just by
looking at the dual. This you could come up with, the full-dimensional case you could
come up with. Why can you see it's a relaxation? Just the usual way. Just take the Cis to
be the indicator of an optimal set. That's feasible, so yes. Maybe there's a more general
relaxation is what I'm saying. Maybe there is a more natural relaxation that you can
come up with without looking at ellipsoids. Yes, more questions?
>>: Okay, that's [indiscernible].
>> Sasho Nikolov: Thanks.