>> Mohit Singh: Okay, hi, everyone. It's a pleasure to introduce Sasho Nikolov. Sasho is doing a postdoc here this year. He has done a lot of nice work in many areas, including discrepancy, differential privacy, and convex geometry as well, and he's going to tell us something in that area.

>> Sasho Nikolov: Thanks. It's good to give a talk here. So I'll tell you about an approximation algorithm for a geometric problem, and I'll start with the problem itself, because I think it's a natural problem. Please stop me at any time if there are any questions.

So here's the problem. The input is n points in d-dimensional space, and our goal is to find the largest j-dimensional simplex in their convex hull. So here, those red dots are my points, and you should imagine this is three-dimensional space. This cube I've drawn is supposed to be the convex hull, and our goal is to find the largest simplex of some given dimension inside that convex hull, where largest is measured by volume.

All right, so we have n points in d dimensions, we're looking at their convex hull, and we're looking for the largest-volume simplex. A j-dimensional simplex is just the convex hull of j plus 1 affinely independent points: a two-dimensional simplex is a triangle, a three-dimensional simplex is a tetrahedron, and so on. We are measuring largeness in terms of volume, so when we're looking for a d-dimensional simplex and our points are in d dimensions, we really are just looking for the d-dimensional simplex of largest volume. If j is less than the full dimension d, we do the obvious thing: we look at the flat in which the simplex lies, its affine span, and use the Lebesgue measure on that flat. So in this picture we are in three dimensions, and if we are looking for the largest two-dimensional simplex, we're really looking for the triangle with the largest area. Make sense?

Okay, and to make the problem at least sound more combinatorial, notice that we can always assume an optimal simplex is the convex hull of some subset of the input points. There might be some other optimal simplex, but you can always move its vertices one by one until they snap to input points, and there's always a way to do that without decreasing the objective. So we're really looking for a subset of j plus 1 input points whose convex hull is the largest simplex. Okay, so that's the problem.

All right, so I'm claiming this is a natural problem in computational geometry, and it has definitely been studied for some time. You can see it as one example of a general class of problems: approximating some complicated object, say a convex body, by a simpler one, say one contained inside of it. And you can see it as a sort of polyhedral analog of the largest-volume ellipsoid problem, the John ellipsoid problem, which is the same problem except that you're looking for the largest ellipsoid contained in the convex hull of your points. That's also a much harder problem in the representation in which I'm working, but that doesn't matter here. Right, and there are applications of this in low-rank matrix approximation and also in discrepancy theory, and I'll tell you a little about the second application.
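To fix notation for what follows, here is the problem as I understand it from the description above, restated in LaTeX (my own phrasing; the cursive V-sub-j notation matches the one used later in the talk):

```latex
% Largest j-dimensional simplex in the convex hull of a point set (restated).
% Input: points v_1,\dots,v_n \in \mathbb{R}^d and a dimension 1 \le j \le d.
\mathcal{V}_j \;=\; \max_{\substack{S \subseteq [n] \\ |S| = j+1}}
  \operatorname{vol}_j\!\bigl(\operatorname{conv}\{v_i : i \in S\}\bigr),
\qquad
\operatorname{vol}_j \;=\; \text{$j$-dimensional volume on the affine span.}
```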
>>: Also, does it make sense to ask the lower-dimensional version of that problem -- the same thing in a lower dimension, with its own ellipsoid?

>> Sasho Nikolov: I think you can define it the same way. I'm not sure if people have looked at it. That's all I can say.

Okay. All right, so let me tell you about this connection with discrepancy theory. It's a little bit of a loaded slide, so I'll go slowly over it. I want to tell you about something called linear discrepancy, a measure of how well you can round with respect to a given set of vectors, which I'll define now. So the linear discrepancy of vectors V1 to Vn in R^d is a measure of how well you can round with respect to these points, in the following sense. We are given real weights X1 to Xn in the interval from minus 1 to plus 1; these are real coefficients, one for each of the vectors V1 to Vn, and our goal is to round them. They could be anything in that interval -- they could be square root of two over two -- but our goal is to round them to plus or minus 1, to the integer endpoints, so that the linear combination with the rounded coefficients is close to the linear combination with the given real coefficients, and we measure closeness in the Euclidean norm. So the linear discrepancy is, for a fixed set of vectors, the worst-case rounding error over all possible coefficient vectors. Does this make sense? Okay, so this is something that comes up in approximation algorithms, for example: you solve some linear program and you want to round the fractional solution to an integer solution without losing too much.

Okay. And what did I write here? Here is where the largest-volume simplex problem comes in: let me denote by cursive V-sub-j the maximal volume of a j-dimensional simplex in the convex hull of the points. Here is a theorem. One side, the lower bound, is due to Lovasz, Spencer, and Vesztergombi; the upper bound is due to Matousek from a few years ago. It doesn't matter exactly what this formula is, but it tells you that the linear discrepancy is, up to log factors, equal to some function of these V-sub-j's. In particular, you can approximate the linear discrepancy if you know approximate values for all the V-sub-j's. Also notice that we are normalizing here: this is j-dimensional volume, so we raise it to the power 1/j, which gives the right scaling, and you'll see in one slide why that's nice for this result. Make sense?

>>: Is this the reason you want us to [indiscernible]?

>> Sasho Nikolov: That's definitely one reason why I'm interested. The honest answer is that I saw a recent SODA paper on this problem and realized I already had an improvement; I just didn't know people cared about the problem. It has been studied in computational geometry for some time, and you could argue it's natural on its own, but as far as applications go, this is maybe the main one I'm interested in.

>>: But it's natural, like the John ellipsoid, in [indiscernible] applications?

>> Sasho Nikolov: Yes, I think it's a natural problem. I could dig out more applications. The connection to low-rank approximation is also interesting, but it would have taken a little more time to define exactly what it is.

>>: This log d in the upper bound -- do you know if that's optimal?

>> Sasho Nikolov: Probably we know that it -- I need to think about it a little bit for exactly this measure.
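For reference, here is the definition of linear discrepancy described above, written out (my transcription; as the speaker does, I leave the exact right-hand side of the theorem unstated beyond saying it is, up to log factors, a function of the normalized volumes V_j^{1/j}):

```latex
\operatorname{lindisc}(v_1,\dots,v_n)
  \;=\; \max_{x \in [-1,1]^n} \;\min_{\varepsilon \in \{-1,+1\}^n}
        \Bigl\| \sum_{i=1}^{n} (x_i - \varepsilon_i)\, v_i \Bigr\|_2 .
```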
I think it's not known to be optimal, but some polylog factor is necessary.

>>: [Indiscernible].

>> Sasho Nikolov: Yes, that's a good question. It's a little hard to answer without going into the proof. The lower bound, at least, is some covering argument, and that's what comes out: you're covering one object with simplices, basically, and when you compute the volumes, that's what you get -- some function of the volume of the unit ball and the volume of a sort of regular simplex. I need to think about what happens when you normalize the volume to the power 1 over the dimension. Maybe it's the ratio of the two, which is what comes out when you do the obvious volume lower bound for covering. Does this make any sort of sense? I can think more about it and come back to you. Okay, so it's not exactly the ratio of the two, but it's some function of those two quantities.

So let me move to my result, unless there are more questions. Okay, so here is what I prove in this work and what I'll talk to you about. There exists a deterministic polynomial-time algorithm that approximates the volume of the largest j-dimensional simplex up to a factor -- the analysis gives exactly this square root of j to the j over j factorial, which by Stirling is roughly e to the j/2. And notice that we always take Vj to the power 1/j, so that observation together with this approximation factor gives us a constant-factor approximation -- sorry, a constant-factor approximation to the volume lower bound for the linear discrepancy problem, and a log-factor approximation to the actual linear discrepancy.

Also, this constant-to-the-dimension approximation factor is the right kind of approximation, in the sense that we know the approximation cannot be better than exponential in the dimension: there exists some constant bigger than 1 such that the problem is NP-hard to approximate better than that constant to the j, to the dimension. That constant is really much closer to 1 than e, so it's still not clear what the right constant is, but at least we get the right order of dependence. And this NP-hardness holds as long as j is big enough -- obviously, if j is constant, you can just brute-force the problem. Again, polynomial time here means polynomial in both the dimension and the number of points. This is not the usual computational geometry setting where we are in the plane, because there the problem becomes too easy.

And it's a significant improvement on what we knew before. Previously, for general j, we knew a j to the j/2 approximation -- I'm losing some constant to the j here -- due to [Khachiyan] and Packer. And very recently -- I forgot the full reference; it's by [indiscernible], Fritz Eisenbrand, and a few other people, and I apologize to Fritz and everybody else -- they improved this only for the case where you're looking for the largest d-dimensional simplex, so only for the full-dimensional case, and they get something like log d to the d/2. That's the SODA 2015 paper, the last SODA. So we improve on those two results, I think significantly, and that's my result. Any questions? Okay. So I want to first tell you about the d-dimensional case, because that's easier.
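The approximation factor from the theorem, together with the Stirling estimate mentioned above (my transcription of the claim):

```latex
\text{approximation factor} \;=\; \sqrt{\frac{j^{\,j}}{j!}}
  \;\approx\; \frac{e^{j/2}}{(2\pi j)^{1/4}} \;\le\; e^{j/2},
\qquad\text{hence}\qquad
\bigl(\text{ALG}_j\bigr)^{1/j} \;\ge\; e^{-1/2}\,\mathcal{V}_j^{1/j}.
```

After the 1/j normalization this is only a constant-factor loss, which is exactly what the linear discrepancy application needs.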
It contains a lot of the intuition, and I want to give you as much of a full proof as I can in the time that I have, because I think it's simple enough. So I'll go slowly, and please stop me again if you have questions.

Here are some simple reductions that make the problem slightly easier. There are a bunch of them, so I'll go over them slowly. Let me use delta, because it looks like a triangle, for the optimal simplex; it is the convex hull of some subset of d plus 1 points from the input. First, let's guess the first vertex. We can always do that: there are only n options, and in reality the algorithm just runs once per option, but for now let's assume someone tells us the first vertex. Next, let's replace each point Vi -- by replace I mean move: let's subtract the first vertex, Vi1, from each point Vi. This just shifts the whole point set, so it obviously doesn't change volumes, but now we know one of the vertices of the optimal simplex is the origin, which is convenient; you'll see in a second why. After this, the new problem is to find not d plus 1 but only d points from the input, such that the volume of the convex hull of these d points together with the origin is maximized. That's convenient because it lets us turn the problem into a linear algebra problem rather than a geometry problem.

So this is equivalent to the following. We have our points, and let's form the matrix V whose columns are the points, except the zero point, which we don't care about anymore. So this dimension is d and this dimension is n. I'm claiming that the problem of finding d points whose convex hull together with 0 has the largest volume is the same as finding a d by d submatrix of V with the largest determinant in absolute value. Why? Because the volume of such a simplex is equal to the absolute value of the determinant of the corresponding submatrix, times 1 over d factorial. Let's take that on faith; I think that's convincing for everyone, right? Okay, cool.

All right, so now we have this problem: we have a d by n matrix, and we want to find a d by d submatrix with the largest determinant in absolute value. Let's do one more simplification, and then we're almost done. After the first transformation, let's also add minus Vi for every Vi. This makes the set of points symmetric around the origin, which is just convenient, and it obviously doesn't change the optimal solution. Why? Because if we take a submatrix that contains both Vi and minus Vi, its determinant is 0, so a nonzero solution always takes either Vi or minus Vi but not both, and which one we take only changes the sign. Since we take absolute values, the value of the optimal solution doesn't change.
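The reduction just described, in formulas (my reconstruction): after guessing the first vertex, translating it to the origin, and writing the remaining points as the columns of the d by n matrix V,

```latex
\operatorname{vol}_d\!\bigl(\operatorname{conv}(\{0\} \cup \{v_i : i \in S\})\bigr)
  \;=\; \frac{\bigl|\det V_S\bigr|}{d!}
\qquad\text{for every } S \subseteq [n],\ |S| = d,
```

where V_S is the d by d submatrix with columns indexed by S. So maximizing the simplex volume is the same as maximizing |det V_S|, and adding -v_i for every v_i symmetrizes the point set without changing that optimum.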
>>: [Indiscernible] decrease.

>> Sasho Nikolov: Okay, so remember, we changed the problem so that we always take the origin to be one of the vertices. So say we're just looking for the optimal two-dimensional simplex.

>>: Or three-dimensional.

>> Sasho Nikolov: It's kind of -- yes.

>>: So two points are very close to each other.

>> Sasho Nikolov: That's just two-dimensional, right? So we look at this point and this point.

>>: In two dimensions, it's just --

>> Sasho Nikolov: So my point is that this triangle and this triangle have the same area.

>>: So if we have some kind of base, we only care about the height, the distance.

>> Sasho Nikolov: Right, so why are they the same? Because this length and this length are the same, and the height doesn't change, and the same thing happens in higher dimensions: you always have base times height. Okay? Cool.

All right, so those are the basic transformations. Now I want to move toward the actual algorithm. We are approximating a hard maximization problem, so the usual game is to find an upper bound on the optimal solution and then compare how well our algorithm does against that upper bound. I am worried that Mohit doesn't seem convinced. Okay, cool.

So here's the basic upper bound we will use, with some modification. Remember, our problem now is to find the largest determinant of a square submatrix. One thing that's true is that if all columns of the matrix have length at most R, then any determinant of a d by d submatrix is at most R to the d in absolute value; this is just a basic inequality about determinants (it's written out below). Clearly, on its own that's not a good upper bound. Our points could look like this: a few long vectors far from the origin and a bunch of really tiny ones -- say we just have one big guy. Then the upper bound would be huge because of that one vector, but you obviously can't get a comparably large simplex, so it's not a good upper bound by itself. We will fix it somehow.

>>: [Indiscernible].

>> Sasho Nikolov: Product of what?

>>: Do you know [indiscernible]?

>>: But he doesn't know which one, right?

>>: I see, but you can maximize. A better bound is to maximize, over all subsets of size d, the product of the lengths of the Vi's.

>> Sasho Nikolov: Can you compute that? I guess you could compute that. That's probably also not very good.

>>: You just take the log, right?

>> Sasho Nikolov: It's probably still not very good, because the vectors could be pretty close to each other and that would still be a problem: you could have a bunch of long ones that are all very close together.

All right, so here's something we will do to try to use this upper bound. I'll introduce one more object, the Lowner ellipsoid of this set of points -- of the columns of this matrix. By that I mean the minimum-volume ellipsoid that contains all the points. Because we did the transformation that makes our point set symmetric around the origin, a standard argument shows that the minimum-volume ellipsoid is centered at zero as well. How do we use it? The point is that it fits tightly around our set of points, and we want to exploit that. So here's what we will do.
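The basic determinant bound referred to above is the following standard fact (Hadamard's inequality, in my phrasing):

```latex
\|v_i\|_2 \le R \ \text{ for every column of } V_S
\quad\Longrightarrow\quad
\bigl|\det V_S\bigr| \;\le\; \prod_{i \in S} \|v_i\|_2 \;\le\; R^{\,d}.
```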
So this is what it looks like. We have our upper bound in terms of L2 norms, in terms of the points being contained in a sphere, so we want to map the whole thing to a sphere. What we do is the following. We have this ellipsoid E, the minimum-volume ellipsoid that contains our set of points. We can always compute a linear map that maps this ellipsoid to a ball, a standard Euclidean ball, and we can do it with a linear map that has determinant equal to 1, so it doesn't change volumes.

>>: So the linear volume [indiscernible]?

>> Sasho Nikolov: The what now?

>>: The volume of the sphere should be the same.

>> Sasho Nikolov: Yes, the volume of the sphere would also not change, and the volume of any full-dimensional simplex would also not change, because the determinant of this linear map, as a matrix, is 1. That means, in particular, that the radius R of the ball has to be the product of the axis lengths of the ellipsoid, to the power 1/d.

So here's what we do: we compute the smallest-volume enclosing ellipsoid. That's a convex problem, we can solve it up to any desired approximation, and I believe approximations are fine for us -- it's not immediately obvious, but they are. Then we compute this linear map; in other words, the ellipsoid is given by either this map or its inverse. Then we apply the map to each input point Vi, and now the Lowner ellipsoid of the transformed points is a ball of some radius R, and our upper bound becomes this radius to the power d. Because we applied a volume-preserving linear map, the bound is valid after the transformation, and the transformation didn't change the value of any solution, so it's a valid upper bound. Does this make sense? Questions?

>>: So it's just the product of the eigenvalues.

>> Sasho Nikolov: In other words, it's just the product of the eigenvalues, yes.

>>: Across the scale.

>> Sasho Nikolov: For the determinant, it is just the product of the eigenvalues, or the product of the axis lengths of the ellipsoid, so that's another way to see this upper bound. I'm not sure which is easier; maybe what you're saying is easier, actually.

Okay, one reason I did this transformation is that I wanted to give some intuition for why this is a good upper bound, and the point is, again, that the smallest-volume ellipsoid has to fit tightly around the set of points. Why? Because if it doesn't -- if the points don't, in some vague sense, touch the ellipsoid in every direction, if there is some direction with big gaps -- you can squeeze the ellipsoid and get a smaller volume. That's the rough intuition for why this gives you information. Remember, the bad example for the upper bound was these thin point sets, and this is how we avoid that.

One nice property of the Lowner ellipsoid, which was used in previous algorithms but is too weak for the kind of result I proved, is the following: if we have a convex body -- a symmetric one -- whose Lowner ellipsoid is a ball of some radius R, then the following holds.
If you scale the radius down by root d, you get a ball that's guaranteed to be contained in the convex body. It's not guaranteed to be contained as tightly as in this picture, but it's always contained. That tells you the upper bound is reasonable: the convex hull has to contain large things, because it contains this relatively large ball. This was used by previous algorithms, but it's not quite good enough for us.

What I'll use instead is the fact from convex optimization that lets us prove statements like that: the dual of the Lowner ellipsoid problem. This is what convex duality gives us about the Lowner ellipsoid, a dual characterization of what the ellipsoid is -- you can see it as a restatement of John's theorem -- and here's what it says. The smallest-volume enclosing ellipsoid of the convex hull of points V1 to Vn is a ball of radius R if and only if there exist non-negative weights Ci, one per point, such that the sum of Ci times the rank-one matrices Vi Vi transpose is equal to R squared times the identity matrix, and the weights sum up to d. This is just what you get from writing the optimality conditions for the smallest-volume enclosing ellipsoid problem; that's one way to look at it. If you want some intuition beyond the manipulations that give the dual, it says that the Vi's have to point in all kinds of directions in order for you to be able to decompose the identity in this way.

Okay, so this is what we'll use, and the main idea of the approximation algorithm is to treat the Ci's, in some sense, as probability weights. That's vague, so let me show you immediately what the algorithm actually is, because it's simple. The algorithm to find an approximately largest d-dimensional simplex -- or rather a d by d submatrix of our matrix -- is to sample d columns from the matrix. Each column is sampled independently, with replacement, and column i is sampled with probability proportional to Ci; the Ci's are non-negative and sum up to d, so the probability of sampling i is just Ci over d. That's the algorithm. We compute the Lowner ellipsoid and we also get these weights -- in fact, some of the algorithms that compute the Lowner ellipsoid compute these weights first, so we get them essentially for free -- and then we use the weights to run the sampling algorithm. This is the randomized algorithm.

Okay, let me now slowly go through the analysis; it's just a couple of slides. Why is this good? Let's compute the expectation of the squared value of a sampled solution -- it turns out to be more convenient to work with squares, and you'll see why that's okay. So let's just write that out. I said I sample with replacement, but if I sample the same column twice, I obviously get determinant 0, so those samples don't contribute.
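Before continuing with the analysis, here is a minimal Python sketch of the sampling step just described. It assumes the John decomposition weights c_i (from the dual of the minimum-volume enclosing ellipsoid) have already been computed by some other routine, and the function name is mine; the algorithm in the talk is the derandomized version of this.

```python
import numpy as np

def sample_candidate_simplex(V, c, rng=None):
    """Randomized rounding step (a sketch, not the full algorithm).

    V : (d, n) array whose columns are the translated, symmetrized input points.
    c : length-n nonnegative John-decomposition weights with sum(c) == d, i.e.
        sum_i c[i] * v_i v_i^T = R^2 * I once the Lowner ellipsoid of the points
        is a ball of radius R.  Computing c is assumed to be done elsewhere.

    Returns the sampled column indices and |det| of the resulting d x d submatrix.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, n = V.shape
    p = np.asarray(c, dtype=float) / d               # probabilities c_i / d
    idx = rng.choice(n, size=d, replace=True, p=p)   # d i.i.d. draws, with replacement
    sub = V[:, idx]                                  # repeated columns give determinant 0
    return idx, abs(np.linalg.det(sub))

# In expectation E[det^2] = (d!/d^d) * R^(2d), i.e. within roughly e^d of the optimum;
# the talk then derandomizes this sampler by the method of conditional expectations.
```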
So in the expectation, I only get contributions from the cases where I sampled d distinct columns. Each such set could have been sampled in d-factorial ways -- the draws are all distinct, so there are d-factorial orderings -- and the probability of sampling exactly those elements is the product of their probabilities, because I sample independently. So this is the probability of sampling the set S, it can be sampled in d-factorial ways, and the value of the set -- for me, right now, value means the squared value -- is just the squared determinant.

A few more manipulations: again, Pi is just Ci over d, and all these sets have size d, so that gives me a d to the d factor up front, and there is also the d-factorial, which doesn't depend on S; I've brought the things that don't depend on S up front. Remember, this factor is the inverse of the approximation factor I'm shooting for, so I want to show that the remaining sum compares with my upper bound, which is the radius of the Lowner ellipsoid -- which we made sure is a ball -- and for the squared value the bound is that radius to the power 2d.

I also did one more thing: I brought the constants inside. What I have is a sum over submatrices of the product of Ci, for i in the index set of the columns, times the squared determinant of the submatrix, which I'll call Vs. Instead of keeping those two things separate, I can multiply the i'th column of the submatrix by its coefficient -- or rather, because I'm taking squared determinants, I multiply each column by the square root of its coefficient, so that I get the same thing. Does this make sense? So this last step just brings those factors inside.

>>: Why is that [indiscernible]?

>> Sasho Nikolov: Why is that what I want to show? So this is the expectation of the squared value of my randomized algorithm, and I said it's equal to this big summation times this factor. The factor is the approximation factor I'm shooting for, and now I want to show that the summation is at least as big as my upper bound on the optimal value. My upper bound was the radius of the containing ball to the power d, for the largest absolute value of a determinant, but I'm squaring everything, so I just square both sides. So this is my approximation factor, and this term is what I want to show is at least as big as my upper bound on the optimal solution.

>>: Well, it's enough to show that.

>> Sasho Nikolov: Yes, fine, it is enough to show that -- yes, I agree.

Okay, so let's do that. We have this summation; what is it? In my matrix, I've multiplied every column by square root Ci -- square root C1, square root C2, and so on -- and the summation is the sum, over all d by d submatrices, of the squared determinant of the submatrix. I want to relate this to the radius using John's theorem, and that turns out to follow immediately from the Binet-Cauchy identity for determinants. It's one of those things where you write out the formula for the determinant and it just follows. Who has seen the Binet-Cauchy formula before? Not really. One, two? Two people, cool.
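The expectation just computed, written out as I reconstruct it from the description; here C^{1/2} is the diagonal matrix with entries sqrt(c_i), and V_sample is the d by d matrix of sampled columns:

```latex
\mathbb{E}\bigl[(\det V_{\mathrm{sample}})^2\bigr]
  \;=\; d! \sum_{|S| = d} \Bigl(\prod_{i \in S} \frac{c_i}{d}\Bigr) (\det V_S)^2
  \;=\; \frac{d!}{d^{\,d}} \sum_{|S| = d} \det\!\bigl((V C^{1/2})_S\bigr)^{2}.
```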
Okay, so let me state the Binet-Cauchy formula a little more generally. Say you have some matrix A with this shape, wide rather than tall -- it actually doesn't matter, but it makes more sense this way -- and I'm interested in the determinant of the matrix times its transpose, the determinant of A times A transpose. If A is just a single row, this is the inner product of the row with itself, which is the sum of the squares of the entries -- the sum over all 1 by 1 submatrices of the row, squared. In general, if A is d by n, the determinant of A times A transpose equals the sum, over all subsets S of the columns of size d, of the determinant of A-sub-S squared, where A-sub-S is the submatrix determined by S. So it's a generalization of the single-row case. It also makes sense when A is a square matrix, because then it's just the trivial identity. Does this make sense?

So here's what this gives us: the summation we have is exactly of this form, where A is our rescaled matrix, and the formula tells us that the sum equals the determinant of that matrix times its transpose. The two square roots of Ci come together, and we get the determinant of the summation of Ci times Vi Vi transpose. John's theorem tells us exactly that this summation is equal to R squared times the identity, and the determinant of that is just R to the 2d. So this basically completes the analysis: we derived that the expectation of the squared value of the solution is at least d-factorial over d to the d, times R to the 2d, which was our upper bound on the square of the optimal solution. And again, you can use Stirling to see that this factor is basically e to the minus d.

Right, so this tells us that the randomized sampling algorithm gives us something good in expectation, where good means large enough. The problem is, I think there are cases where it gives you horrible results except with exponentially small probability -- after all, you only get an exponential approximation factor in expectation, so it's entirely possible that with high probability it gives you nothing good. So let's not run that algorithm; let's de-randomize it instead, which you can do using the standard conditional-expectations machinery. Once you de-randomize, you get a deterministic algorithm whose guarantee is at least as good as the expectation guarantee, and that's enough for us. Because it's deterministic, it just works; we don't care about probabilities anymore. De-randomizing is a little technical but fairly standard: you just have to compute the actual conditional expectations.

Okay, so that's the algorithm for the full-dimensional case. Now I want to tell you, a little more vaguely, what happens when j is less than d, when we're looking for a less-than-full-dimensional simplex. That turns out to be more technical, but similar ideas work; you just have to work harder. Okay, so switching contexts, we're back to the beginning, and we can do basically the same transformations as before.
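(Before moving on to the j < d case, here is the Binet-Cauchy step and the resulting bound from the full-dimensional analysis, written out as I reconstruct it.)

```latex
\sum_{|S| = d} \det\!\bigl((V C^{1/2})_S\bigr)^{2}
  \;=\; \det\!\bigl((V C^{1/2})(V C^{1/2})^{\top}\bigr)
  \;=\; \det\Bigl(\sum_{i=1}^{n} c_i\, v_i v_i^{\top}\Bigr)
  \;=\; \det(R^2 I) \;=\; R^{2d},
```

so the expectation above equals (d!/d^d) R^{2d}, which is at least (d!/d^d) times the square of the largest subdeterminant, the quantity we are trying to approximate.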
If we can guess the first vertex of an optimal j-dimensional simplex, then again we shift everything -- subtract the guessed vertex Vi1 from every input point -- and we also add the reflection of each point through the origin. After taking both of these steps, the input is symmetric around the origin, and we're looking not for j plus 1 points but only for j points, such that the simplex they form together with the origin has the largest possible volume. Again, you can transform this into a determinant problem. We have the same matrix as before, but now the problem is to find a d by j submatrix -- d is always the ambient dimension -- with the largest determinant. Of course, we have to say what the determinant of a d by j submatrix means, and the right notion for this problem is to take the submatrix transposed times itself, which is a square matrix, take its determinant, and take the square root. Up to a normalizing constant, this equals the volume of the j-dimensional simplex. Probably an easier way to think of it: it would be natural to define the determinant of a rectangular matrix as the product of its singular values, and that's exactly what this is.

Right, so this is our problem now: find a d by j submatrix of the big input matrix with the largest determinant, where by determinant I mean the product of the singular values. You could again use the same algorithm as before -- it's well defined, you just sample j times instead of d times -- and it does give you an approximation factor, but the factor looks like j-factorial over d to the d. So if j is pretty big it's not too bad, not too far off from what we're shooting for, but in general it's not good enough; it's not the right order of magnitude. So we'll need a different upper bound. Maybe it's not quite clear why the previous one fails, but I'm fairly sure the Lowner ellipsoid upper bound doesn't work immediately, so I'm going to modify it to something that I know works.

So here is the upper bound I'm going to use. One thing we know, and it's not very hard to show, is this: suppose all the points -- all the columns of the matrix -- are contained in some ellipsoid E, which, because our point set is symmetric around the origin, we can assume is centered at zero, and say the axes of this ellipsoid have lengths A1 down to Ad, ordered from longest to shortest. Then the value of the optimal solution -- what I call D-sub-j, the maximum determinant of a d by j submatrix -- is bounded by the product of the j longest axis lengths, and this holds for any ellipsoid that contains the points. It's not very hard to show, and it's the right analog of the determinant inequality plus Lowner ellipsoid upper bound from before. So let me define what I call the j-Lowner ellipsoid: the ellipsoid that minimizes this upper bound over all ellipsoids that contain the points.

>>: [Indiscernible].

>> Sasho Nikolov: Lowner, with the W like a V? And should I pronounce the umlaut, like an O umlaut? Löwner? I'm not sure.
So the j-Lowner ellipsoid, let's say, minimizes this upper bound -- the product of the j longest axis lengths -- over all ellipsoids that contain the input, and that achieved minimum is the upper bound we're going to use for this problem. I also want to give you an algorithm that competes against this upper bound.

I can't give you too many details, so maybe as motivation -- I'm not sure it's helpful, but let me talk through the most trivial case, where we're just looking for the longest column of the matrix. It's clearly a trivial problem: we just go over the columns and take the longest one. But I'll give you an exact algorithm for it using the same machinery that I use for the general case, even though I shouldn't need any machinery for this problem. The optimum here is the length of the longest column. My optimal ellipsoid doesn't have to be a ball, but you can see that in this trivial case there is always an optimal solution that is a ball: we're looking for an ellipsoid that contains all the columns and whose longest axis is minimized, so in other words we can just look for a ball of minimal radius that contains all the points, and the upper bound is the radius.

The way this is generally going to work is that I write this as a convex program, look at the optimality conditions, and take the dual. The dual tells me that, at the optimal R, there exist non-negative coefficients C1 to Cn that sum up to 1 -- this 1 is the dimension, 1; remember, in the full-dimensional case it was d -- and that if I look at the same matrix as before, the summation of Ci Vi Vi transpose, its trace is equal to the optimal R squared. But trace is linear, so I can take it inside, and the trace of Vi Vi transpose is just the norm of Vi squared. So the dual tells me there exist weights such that the summation of Ci times the length of Vi squared is equal to the optimal value, which is my upper bound. So if I sample Vi with probability Ci, that sum is exactly the expectation, and it's optimal. So that's the algorithm: take the program, write the dual, solve the dual, take the weights, sample according to the weights, and in expectation that works. It's a very roundabout way to compute a very simple thing, but it generalizes to general j, so this is basically what we're going to do.

We're going to write the dual of the ellipsoid minimization problem, and the dual is going to tell us that there exist non-negative weights that sum up to j, where j is the dimension, such that some function -- a complicated but well-defined function, which I'll show you, but for now take it as a black box -- of this same old matrix is equal to the optimal upper bound from the ellipsoid minimization problem. Again, all I'm saying here is that we know this is an upper bound on the optimal solution. The challenge in deriving all of this is that for the j-Lowner ellipsoid, when you write it as the solution of a minimization problem, the objective is convex but not differentiable, and that makes deriving the dual and writing the optimality conditions a little more complicated.
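For reference, the upper bound and the j = 1 warm-up just described, in formulas (my reconstruction; D_j denotes the maximum "determinant", i.e. product of singular values, of a d by j submatrix):

```latex
D_j \;\le\; a_1 a_2 \cdots a_j
\quad\text{for any ellipsoid } E \supseteq \{v_1,\dots,v_n\}
  \text{ centered at } 0 \text{ with semi-axes } a_1 \ge \cdots \ge a_d ,
% and the j-Lowner ellipsoid minimizes a_1 \cdots a_j over all such E.
% Warm-up, j = 1: the minimizer can be taken to be a ball of radius R, and the dual gives
c_i \ge 0, \qquad \sum_{i} c_i = 1, \qquad
\operatorname{tr}\Bigl(\sum_{i} c_i\, v_i v_i^{\top}\Bigr)
  \;=\; \sum_{i} c_i\, \|v_i\|_2^{2} \;=\; R^{2},
% so sampling column i with probability c_i has expected squared value R^2 = D_1^2, the optimum.
```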
That's not a big deal, but it's a little uglier, and it also makes this function a little more complicated; the algorithm, though, is the same. We compute these dual weights, then scale them down by j so that they give a probability distribution, and we sample j times independently, with replacement, using these weights. That's the randomized algorithm, and again we'll de-randomize it at the end, after we analyze it.

>>: Why do you sample with replacement?

>> Sasho Nikolov: I guess all I want to say is that sampling without replacement can probably only do better; it's just that I can't give a better analysis for it. I keep pointing it out mainly to be clear about what I'm actually analyzing.

>>: Actually, it's almost the same, because for all practical purposes you can replace every point --

>> Sasho Nikolov: That's actually true.

>>: -- by several copies, and divide the weight.

>> Sasho Nikolov: That's probably true, yes. I think you're right.

Okay. So in general, how does the analysis work? Again, I compute the expectation of the squared value of the matrix I get -- the squared product of its singular values, which is this determinant. If I sample something twice, that gives no contribution, because I get zero, so I only get contributions from the cases where I sample a set of j distinct elements. Each such set can be sampled in j-factorial ways, and the probability of sampling it is the product of the probabilities, by independence. Again, I have this j-factorial, and since Pi is Ci over j -- I should have put that up here, maybe -- I also get a j to the j, because I take a product of j terms. So this is my expectation.

Before, I had a magic formula that told us what this sum was in the full-dimensional case; now there is an analog of it, which tells us that this big summation is equal to the degree-j elementary symmetric polynomial, evaluated at the eigenvalues of this matrix that we've seen before. You'll have to take my word for that; it's part of the technical details, but it's not very hard once you know it -- it's a useful fact. What is the degree-j elementary symmetric polynomial? You take all the eigenvalues, take every subset of j of them, take the product over the subset, and sum over all such subsets.

Okay, so this is the expectation, and once again this factor in front is the approximation factor that I'm going to prove, so what I need to show is that this elementary symmetric polynomial is at least as large as my upper bound, and by duality my upper bound is actually this function, whatever it is. So I need to compare this and this; I need to show that this is at least that. Not need, exactly -- it's sufficient, and I would really like it if I can do that. All right, and so that you don't complain that I haven't shown it to you, here is the function. Maybe it doesn't look so bad. So this is what the function looks like.
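Before the formula for that function, here is the general-j expectation as I reconstruct it, with e_j the degree-j elementary symmetric polynomial and lambda_1, ..., lambda_d the eigenvalues of the matrix sum_i c_i v_i v_i^T:

```latex
\mathbb{E}\Bigl[\det\bigl(V_{\mathrm{sample}}^{\top} V_{\mathrm{sample}}\bigr)\Bigr]
  \;=\; \frac{j!}{j^{\,j}}\; e_j(\lambda_1,\dots,\lambda_d),
\qquad
e_j(\lambda) \;=\; \sum_{\substack{T \subseteq [d] \\ |T| = j}} \ \prod_{i \in T} \lambda_i ,
```

and the remaining step is to show that e_j(lambda) is at least the dual value, the function shown next.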
So the lambdas here are the eigenvalues of this PSD matrix, and the function turns out to be: you take the product of the top few eigenvalues, the top k, and then you take this kind of average raised to the power j minus k, where k is determined by this condition -- k is the unique number that satisfies it. At least for me it wasn't obvious that the condition is ever satisfied, but it turns out to be satisfied for a unique k, and this is the function. It's a little strange.

>>: It's not [indiscernible].

>> Sasho Nikolov: Because the objective is not differentiable, so in the duality you get some -- what's the word -- some subdifferential of something being equal to something else. It gets a little -- okay. But what I can show is -- is that the right thing? Sorry, this is reversed on the slide, it's the wrong way around -- what I can show is that this function is always bounded above by the elementary symmetric polynomial, and I'll just give you the keyword: it follows by staring at it and using Schur convexity of the elementary symmetric polynomials. Schur convexity is a very general tool for proving inequalities, and it turns out to be useful here.

Okay, so that's it. In summary, I talked about the first polynomial-time algorithm to approximate the maximum j-dimensional simplex problem within a factor that is exponential in the dimension, which is the right kind of dependence for the approximation factor, and which gives a constant-factor approximation to the determinant lower bound for linear discrepancy. And there are a bunch of open problems. What's the right constant in the hardness result is one. For these ellipsoid upper bounds, is this analysis tight -- is that all they can give? Maybe that's an easy question, but I haven't thought about it too much. And is the approximation factor optimal? Maybe it is; it's a natural-looking factor, e to the something.

Okay, and this last slide is for Mohit and whoever else is interested; it's something we've talked about a little bit. I'll tell you very quickly. The following set function is submodular: the ground set is the columns of this matrix, and the function is the log of the determinant of the submatrix, where again by determinant I mean the product of the singular values. So that's a submodular function, and our problem is really maximizing this submodular function under the constraint that we pick a set of size j -- the set you pick should be a basis of a uniform matroid. The problem is that the submodular function is the log of the objective function, so to get a multiplicative approximation guarantee for the actual objective, I need an additive guarantee for the submodular maximization problem, and I'm not sure -- maybe you can get that with standard machinery, but at least using black-box results, I didn't quite see how. Does this make sense as far as the connection to submodular maximization goes? If you can get this using just submodularity somehow, maybe you can generalize it to something else; that's basically what I'm asking. It might also give some intuition about what's happening. That's it. Thank you.

>> Mohit Singh: Is there like -- okay, so you always present this dual view, which I guess is implicit in this upper bound that you have, but can we write a convex relaxation directly, a relaxation for the problem? Maybe [indiscernible].
>> Sasho Nikolov: That's a good question. An SDP for a volume or determinant problem seems a little --

>> Mohit Singh: Or just some convex program.

>> Sasho Nikolov: Some convex program, yes. For the full-dimensional case you get one from the dual; it's just that for the less-than-full-dimensional case it's a little strange. For the full-dimensional case, it's just maximizing the log of the determinant of the summation Ci Vi Vi transpose. That's convex. That's --

>> Mohit Singh: Maybe with variables Ci.

>> Sasho Nikolov: The variables are the Ci, subject to the summation of the Ci being equal to d, and Ci at least 0 for all i. You can also see immediately that that's a convex relaxation. It's just not how I came up with this, and it also doesn't let me easily generalize to the general-j case. Because, as you saw, at least if I go the ellipsoid route, the dual I get for the general case has a much more complicated objective function, one I would never have come up with just by looking for a relaxation directly. The full-dimensional one you could come up with. Why is it a relaxation? Just the usual way: take the Ci's to be the indicator of an optimal set; that's feasible. Maybe there's a more general relaxation, is what I'm saying -- maybe there's a more natural relaxation that you can come up with without looking at ellipsoids. Yes, more questions?

>>: Okay, that's [indiscernible].

>> Sasho Nikolov: Thanks.
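For reference, the full-dimensional convex relaxation described in this last exchange, as I understood it (a sketch, not a formula from the slides):

```latex
\max_{c \in \mathbb{R}^{n}} \ \ \log\det\Bigl(\sum_{i=1}^{n} c_i\, v_i v_i^{\top}\Bigr)
\qquad\text{subject to}\qquad
\sum_{i=1}^{n} c_i = d, \qquad c_i \ge 0 \ \ \text{for all } i .
```

Taking c to be the indicator vector of an optimal d-set S gives a feasible point with objective log det(V_S V_S^T) = 2 log |det V_S|, which is why it relaxes (the square of) the optimum.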