>> Ofer Dekel: Our next speaker is Hariharan Narayanan, and the talk is entitled Testing the Manifold Hypothesis.

>> Hariharan Narayanan: Okay. Hi. Thank you for the invitation to speak here. This is joint work with Charles Fefferman and Sanjoy Mitter. As we all know, nowadays we often encounter data with many dimensions, and often the dimension is comparable to or larger than the number of samples we can analyze. The analysis of such data is associated with some specific phenomena. One of them is well known: the curse of dimensionality. Among its aspects is the fact that the sample complexity of function approximation grows exponentially in the dimension, and there is a comparable algorithmic cost that also grows exponentially with the dimension. Certain blessings of high-dimensional data analysis are also sometimes mentioned, but these are not so much blessings as ways of getting around the curse, and they generally show up in the analysis rather than in the actual bounds. It becomes easier to analyze certain phenomena because of laws of large numbers that arise in asymptotic analysis, and there is also the phenomenon of concentration of measure, where in high dimensions random quantities start behaving like deterministic quantities, which is also beneficial for analysis.

One way of getting around the curse of dimensionality is what is called manifold learning. This is based on the manifold hypothesis: the hypothesis that high-dimensional data actually lie in the vicinity of a low-dimensional manifold. If this were true, we would often be able to prove bounds that do not depend on the ambient dimension of the data but rather on the dimension of the manifold. This hypothesis has gained a lot of traction over the past one or two decades, and there is a great deal of work based on it. For example, in Isomap you construct maps that reduce the dimensionality of the data, assuming it actually comes from a low-dimensional surface embedded in high dimensions. But the question of whether or not data actually lie near a manifold is less well understood, and that is what I would like to talk about today. I would like to come up with a test that says whether or not high-dimensional data lie in the vicinity of a low-dimensional manifold.

One comment that should be made when addressing this question is that it depends very much on the representation of the data. There may be representations where the data lie on a manifold and representations where they don't. For example, if you have a scene and you move a camera around it in a smooth manner, and you parameterize the image you see by the position of the camera, that is smooth and it would lie on a manifold; the position and orientation of the camera lie on a manifold. But if the actual scene is cluttered and you compare two views that are shifted by a little bit using something like the L2 distance, that is not going to be smooth. So there may be representations where the data lie on a manifold and other representations where they don't. Here we will be talking about a fixed representation. Now I'll move on to the statistical and algorithmic aspects of the question of testing whether data lie near a manifold.
Before I move on, I need to tell you what I mean by a manifold in some quantitative sense, because otherwise you can always fit a very complicated curve through any number of points. There has to be a trade-off between how complicated the curve is and how many points you have. So let's introduce a few quantities. We define the reach of a set to be the largest number tau such that any point at distance less than tau from the set has a unique nearest point in the set. For example, if you have a straight line, the reach is infinite, because every point has a unique nearest point on the line. But if you take a circle of radius tau, the reach of the circle is exactly tau, because the center of that circle does not have a unique nearest point: every point of the circle is equally close. Here is a curve of large reach. Roughly, the reach is the radius to which you can thicken the curve into a tube before the boundary of that tube starts behaving badly, which happens at a point like this. And here is a curve with much smaller reach. We define G(d, V, tau) to be the family of d-dimensional submanifolds of the unit ball with d-dimensional volume at most V and reach at least tau.

Here is our formalization of the question. We assume the data are drawn i.i.d. from some unknown probability distribution P on a Hilbert space; x1, x2, and so on are i.i.d. samples, and based on these samples you have to come up with an algorithm that answers yes or no: is there a manifold or is there not? Given an error epsilon, a dimension d, a volume V, a reach tau, and a confidence 1 - delta, our question is whether there is an algorithm that takes some number of i.i.d. samples from P and outputs yes or no depending on whether there is a manifold in this class -- that is, with bounded volume and reach -- such that the expected squared distance of a random point drawn from P to the manifold is less than epsilon. We are asking for the mean squared distance to the manifold to be less than epsilon; that is our notion of nearness.

This question has two aspects. First, how much data do you need before you can even answer the question in a statistical sense? If you have too little data and ask whether it lies on a very complicated curve, then on the basis of that data alone you would probably always be able to fit such a curve, but it would tell you nothing about the underlying probability distribution; as you got more and more points, your answer might well turn out to be false. This is the issue of generalization error. We have a result that says the sample complexity depends only on the intrinsic dimension, volume, and reach of the manifolds you are fitting, but not on the ambient dimension of the Hilbert space where the data lie. The number of samples you need to answer the question really does not depend on the ambient dimension, and that is good news. Second, there is the algorithmic question: given a certain number of data points, how do we test whether or not there is a manifold that lies close to these data points in the mean-squared sense?
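To make this notion of mean-squared nearness concrete, here is a minimal sketch -- my own illustration, not the authors' procedure. It approximates a candidate manifold by a dense point cloud (much as the sample-complexity argument later does) and checks whether the empirical mean squared distance of the samples to that cloud falls below epsilon. The function name `mean_squared_distance` and the toy circle example are assumptions of this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_squared_distance(samples, manifold_points):
    """Mean squared Euclidean distance from each sample to its nearest
    point on a dense point-cloud approximation of a candidate manifold."""
    tree = cKDTree(manifold_points)
    dists, _ = tree.query(samples)          # nearest-neighbor distances
    return np.mean(dists ** 2)

# Toy example: noisy samples near the unit circle, a 1-d manifold with reach 1.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=500)
samples = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((500, 2))

grid = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
circle = np.c_[np.cos(grid), np.sin(grid)]  # dense point cloud standing in for the manifold

eps = 0.01
mse = mean_squared_distance(samples, circle)
print("empirical mean squared distance:", mse)
print("answer 'yes' at this epsilon:", mse < eps)
```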
Here we allow ourselves some slack, to make the problem easier: we do not insist that the volume be at most V; any volume at most CV is acceptable to us, although it would be nice to make that constant 1 + epsilon. Now I'll talk about the results we have on the sample complexity and then move on to the algorithm.

Here is some more notation. Let L(M, P) be the expected squared distance to M of a random point drawn from P. We also define the empirical loss L_emp(M) = (1/s) * sum_{i=1}^{s} d(x_i, M)^2. This is parallel to the distinction between true error and empirical error: the loss L(M, P) corresponds to the true error and L_emp(M) to the empirical error. We define the sample complexity to be the smallest number s such that there exists some rule A which, given x1, x2, ..., xs drawn i.i.d. from P, produces a manifold whose loss is, with high probability, not much worse than that of the best possible manifold in the class. This is the usual formulation from statistical learning theory. It is well known that to get upper bounds on the sample complexity, it is enough to get uniform bounds relating the empirical loss and the true loss. The reason is that if I could guarantee that, with s samples, the empirical loss is uniformly close to the true loss over all manifolds in the class, then instead of optimizing the true loss, which is what I want to do, I could optimize the empirical loss and get an answer that is pretty close. So proving a uniform bound of this kind is sufficient for sample complexity bounds.

We are able to prove such uniform bounds over the space of manifolds: if s is larger than, roughly, (V (1/epsilon + 1/tau)^d + log(1/delta)) / epsilon^2, then with probability at least 1 - delta the loss of every manifold in the class of interest is close to its empirical loss on the samples. This is a uniform bound over the space of manifolds. The way we prove it is by approximating manifolds by point clouds, viewing a point cloud as the set of centers in a k-means problem, and then proving a uniform bound for k-means, which is of independent interest. The idea is that you can approximate a manifold at a sufficiently small scale by a discrete set of points and declare that the manifold is not the original manifold but this discrete set; distances to the discrete set then put you exactly in the setting of k-means. Instead of proving a uniform bound over manifolds, you can just as well prove a uniform bound over these point clouds, and this is in fact a stronger statement, because every manifold can be approximated by a point cloud but the converse is not true. So one is proving the harder uniform bound when one proves it for k-means.
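As a small illustration of the reduction just described -- my own sketch, not the authors' code -- if the candidate "manifold" is a point cloud, the empirical loss L_emp is exactly the k-means objective with those points as centers, divided by the number of samples. The sample sizes and the use of scikit-learn below are arbitrary choices of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 50))   # s = 1000 samples in ambient dimension 50

k = 32
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# km.inertia_ is the sum over samples of the squared distance to the nearest
# center, so dividing by s gives the empirical loss
# L_emp(M) = (1/s) * sum_i d(x_i, M)^2 for the discrete "manifold" M = {centers}.
empirical_loss = km.inertia_ / X.shape[0]
print("k-means empirical loss:", empirical_loss)
```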
After a little linear algebra, the question for k-means becomes one of proving a uniform bound over functions of the form min_i <a_i, x>. This is because the k-means loss is min_i ||a_i - x||^2, the squared distance to the nearest center; when you expand the square, you can drop the ||x||^2 term, because it is common to all i, and absorb the ||a_i||^2 term, because it is fixed for each center, and you are left with something of this form. However, this is still in high dimensions, because we are working in a Hilbert space whose dimension we have no control over. To bring it down to a dimension we do have control over, we use the Johnson-Lindenstrauss lemma and randomly project to a much lower dimension; by the Johnson-Lindenstrauss lemma all the essential geometric characteristics are preserved, and we can do the analysis in the lower-dimensional space. That is essentially how we get a sample complexity bound of order k/epsilon^2. The relevant notion of complexity of this function class is the fat-shattering dimension; we can bound the fat-shattering dimension in terms of k and epsilon for k-means, and this leads to a sample complexity of k/epsilon^2 up to logarithmic terms. This uses a number of results from empirical process theory.

For k-means, a lower bound of roughly k/epsilon^2, plus a log(1/delta)/epsilon^2 term, was already known [inaudible]. What it says is that you really need this many samples before running k-means on the sample can be expected to give something meaningful; with fewer samples, running k-means on them is not going to give you something good in the long run. An upper bound of k^2/epsilon^2 was proven by [inaudible]: if you have that many samples and run k-means on them, you can expect a reasonable estimate. Our bound, which is k/epsilon^2 up to a logarithmic factor, improves on this whenever the logarithmic factor is smaller than k. So we get something new for k-means as well.

Now let me move on to the algorithmic part. A consequence of the uniform bound is that instead of fitting a manifold to the probability distribution, which is what we were asked to do, we can simply sample the distribution and fit a manifold to the sample. This allows a certain dimension reduction: you can in fact search for manifolds within the affine span of the samples, and you do not have to search the entire Hilbert space. So now to the algorithmic question itself. Given n points x1, ..., xn, is there a manifold with bounded dimension, volume, and reach such that the average squared distance of these points to M is less than epsilon? We have to optimize over a space of manifolds, given n explicit points; there is no randomness now. We have a theorem that says yes, you can do this, with an exponential dependence on all the parameters of the manifold.
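Here is a rough sketch of the random-projection step described a moment ago -- my own illustration with arbitrary sizes, not the construction used in the proof. It projects the data with a Gaussian random map and checks that the k-means objective, which depends only on Euclidean distances, is approximately preserved.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 5000))   # data in a high ambient dimension

k = 16
loss_full = KMeans(n_clusters=k, n_init=5, random_state=0).fit(X).inertia_ / len(X)

# Johnson-Lindenstrauss style random projection to a much lower dimension;
# pairwise squared distances, and hence the k-means objective,
# are approximately preserved.
proj = GaussianRandomProjection(n_components=200, random_state=0)
Y = proj.fit_transform(X)
loss_proj = KMeans(n_clusters=k, n_init=5, random_state=0).fit(Y).inertia_ / len(Y)

print("k-means loss in ambient space:   ", loss_full)
print("k-means loss after JL projection:", loss_proj)
```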
Returning to the algorithm: it will output yes if there is a manifold in the class of interest, and it will output no if there is no suitable manifold even in a somewhat larger class of manifolds. Here is an outline of how the algorithm goes. First, you observe that any manifold whose dimension is bounded by d, volume by V, and reach from below by tau is almost contained in an affine subspace of dimension N_P, where N_P is a packing number of the manifold. The reason is that you can take a fine net of the manifold -- lots of points on it -- and take the affine span of those points; the manifold will almost lie in that affine span. This reduces the dimension to N_P. Now we are working in a relatively low-dimensional space, and we exhaustively search, in some sense, among all candidate manifolds. The space of candidate manifolds is of course a continuum, so you cannot enumerate it on a computer, but you can examine a representative from every small part of it, and that is the idea.

So you first take an evenly spaced set of points -- you will need a lot of these -- which serve as coarse approximations of the manifold we hope to find. Then, in a neighborhood of such an approximation, you define a vector bundle which assigns to every point of the neighborhood an affine subspace of dimension n - d, where d is the dimension of the manifold and n is the dimension of the ambient space. These subspaces are meant to be normal to the manifold, so they have complementary dimension, and there is one subspace for every point of the neighborhood. This is a vector bundle over the tubular neighborhood: the base space is the tube, and the fibers play the role of the normal spaces. Then we define the putative manifold as the zero set of a specific section of this vector bundle. A section of a vector bundle is a function that assigns to every point a value in the corresponding fiber, and the points where the section vanishes form its zero set. This is a specific construction. Once you have this putative, or guessed, manifold obtained from the zeros of the section, you try to adjust it to obtain the actual manifold. There, locally, we use function approximation -- optimization over functions -- to obtain what are called local sections, objects defined locally on the manifold. This involves certain inequalities, which I won't go into. Finally, you patch all of these local sections together to get a manifold: here is one local section, here is another, and unfortunately they don't agree -- that is the whole point -- but you use the vector bundle to average them, and the averaging gives you a global manifold. That is the algorithm.
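To give a flavor of the first reduction step and of fitting local pieces, here is a hypothetical sketch -- a very loose analogue, not the authors' vector-bundle construction. It projects the data onto a low-dimensional span via PCA, then fits d-dimensional local patches by local PCA and inspects the residuals; all function names, sizes, and the toy data are assumptions of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def local_patches(X, d, n_neighbors=20):
    """For each sample, fit a d-dimensional affine patch (local PCA) to its
    neighborhood and record the mean squared residual -- a rough stand-in
    for the 'local sections' in the talk's construction."""
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    _, idx = nn.kneighbors(X)
    residuals = []
    for nbrs in idx:
        patch = X[nbrs]
        pca = PCA(n_components=d).fit(patch)
        recon = pca.inverse_transform(pca.transform(patch))
        residuals.append(np.mean(np.sum((patch - recon) ** 2, axis=1)))
    return np.array(residuals)

# Toy data near a 2-d manifold (a sphere) embedded in 100 dimensions.
rng = np.random.default_rng(3)
Z = rng.standard_normal((1500, 3))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)          # points on the 2-sphere
embed = rng.standard_normal((3, 100)) / np.sqrt(3)     # linear embedding into R^100
X = Z @ embed + 0.01 * rng.standard_normal((1500, 100))

# Step 1: reduce to a low-dimensional span of the samples, here via PCA.
X_low = PCA(n_components=10).fit_transform(X)

# Step 2 (very loosely): check that 2-dimensional local patches fit well.
res = local_patches(X_low, d=2)
print("mean local squared residual:", res.mean())
```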
Some concluding remarks. We have an algorithm, which is more of a theoretical framework, for testing the manifold hypothesis, and we have improved sample complexity bounds for k-means. We would like to make this practical, test it on real data, and understand how to make it nonparametric, varying the reach and volume in some intelligent, data-driven way. We would also like to understand how to improve the efficiency, for example with better optimization over sections of the vector bundle, and how to understand the role of topology in this optimization question. Thank you. [applause]

>> Ofer Dekel: Time for questions.

>>: The condition you define, the reach -- how is it different from curvature?

>> Hariharan Narayanan: The reach involves global aspects as well. For example, if you take two parallel lines at distance R from each other, the reach of that set is R/2, because there are points at distance R/2 that have two nearest points in the set. So it is not just local; it also measures nearness to self-intersection.

>> Ofer Dekel: Any other questions? I had a question, actually. It would be really cool -- you described fitting the manifold. If I have some classification task, or some other decision-theoretic task, and I want to exploit the assumption that the data lie on a manifold, can I take this approach and couple it with, let's say, a classification algorithm and take advantage of that?

>> Hariharan Narayanan: Hopefully, yes. At the present moment the guessed, or putative, manifold is defined implicitly as the zero set of a function, so to actually get onto the manifold you would need to do some iterative optimization. But I think -- I hope so, too, yes.

>>: In the question of testing the distance to the manifold you had both the reach and epsilon. It seems that the error and the reach trade off somehow -- that relaxing the reach could be traded against the error.

>> Hariharan Narayanan: That is true. But that trade-off, I think, depends on the ambient dimension, and I don't want to bring in the dimension of the Hilbert space, because all of these results work in countably infinite dimension, too. So I agree with you, but if you want to fill space in a Hilbert space, you really need a lot of volume. So, yes, I agree with you, but the bounds would then depend on the ambient dimension.

>> Ofer Dekel: Let's thank the speaker. Sorry, one more question.

>>: Just one brief question. Since you have these constants C that enter into the sample size, can you say how big they are? If they are like 10 to the 30th, it is not practical; if it is 5, then it is.

>> Hariharan Narayanan: Those constants depend on the intrinsic dimension of the manifold, and they are exponential in it.

>>: If we go back to some of the original Isomap papers, where they were showing two-dimensional manifolds in 3-D space -- so for d equal to 2.

>> Hariharan Narayanan: I hope it would not be too bad. The constants are explicit; they are things you know where they come from.

>>: Right. It would be nice to know how big that would be.

>> Ofer Dekel: Okay. Let's thank the speaker one more time. [applause]