>> Sing Bing Kang: Good morning, everybody. It's my pleasure to welcome back Daniel Keren. He last visited us about nine years ago. Daniel got his Ph.D. from Hebrew University in '91, and since '94 he's been with Haifa University. He has a wide variety of interests, his main interests being machine learning, regularization theory, and also distributed systems. But he has also done a lot of work in vision and graphics. So, Daniel.

>> Daniel Keren: Okay. Thank you very much, Sing Bing. It's definitely been a while, and I realized when I got back that only Sing Bing and Greg Zielinski are still here from the original group from when I last visited, so I'm definitely below the [inaudible]. But it's very nice to come back. As Sing Bing said, my perspective is more mathematical, vision and machine learning oriented, so I hope I will not misrepresent any other work in graphics. I'm not really a graphics person. But the work does seem relevant to computer graphics. It's published in a graphics journal, and it's joint work with Craig Gotsman, whom I assume many people now know, and our joint graduate student Roi Poranne. So feel free to interrupt and ask any questions in any way you want.

So this is an example of what we're doing. MaD is Mahalanobis distance, and I assume that even the younger crowd is familiar with Alfred E. Neuman. So the input to our algorithm, the one which I will describe, is a set of points, either in 2D or 3D. And the output is a function which is constructed so as to be small on the data and large everywhere else. So, as opposed to the common wisdom of implicit fitting, in which the function attempts to obtain a value of zero on the data and nonzero elsewhere -- and typically the nonzero values would have a different sign inside and outside the curve -- this is different. The functions which we fit are everywhere positive. And the set which we try to approximate is characterized by the fact, as I said, that the function attempts to obtain a minimal value on the data, and I hope to be able to convince you that this has some advantages.

So this is the overview of the talk. We will talk -- ah, [inaudible], nice to see you. So: a little background, our function, the surface extraction -- and I apologize in advance, I will not talk too much about that -- results, and an extension which we hope to pursue, which puts a very, very different, probabilistic spin on this scheme. And it has the potential, we believe, to very broadly generalize all these ideas.

So the input is the same input as in many works with which a lot of you must be familiar. You're given a bunch of points, and you want to reconstruct a nice surface. I already chose this -- I hope it's a common example. And you want to end up with a surface. So this is the problem. Some of the background I will really just glance over. 3D point clouds can come -- if you're doing vision, they can come from stereo, or they can come from 3D scanners. Of course you have the Kinect. There are many, many sources for that. The famous Michelangelo project. And these are things which you all know. Another example. So we're given this huge set of points in 3D and we want to fit a surface. And the question is also of interest in 2D. This is a very well-known problem. A great many papers. Voronoi based. Other papers which you must know.
Implicit functions are something which has always interested me, and I actually have some previous work, some of it very old, some of it not as old, including with Craig Gotsman. And this may resemble it, but, as I said, it will be different. A very typical solution is the implicit approach -- and, by the way, when I say implicit, the model can be anything. It can be an implicit polynomial, it can be something which is built from radial basis functions. It doesn't matter. The principle is the same. The guiding idea is the same. You have a function defined from Euclidean space to the reals and you try for it to satisfy the following: that it obtains a negative value outside of the shape, a positive value inside, and zero on the shape. And then you extract the surface as the set of points at which that function equals zero.

And then immediately you will run into some problems, and I will make a [inaudible] example. Suppose that your set in 2D looks like this. Okay. And this is not a mistake. This is part of the set. Okay. It's not easy to see what function you will build, a reasonable function which will have the property that it obtains the value of zero on this set, which is not only disconnected, but which is not closed. Usually these things tend to close the shape, and very often you will run into extraneous components. It will somehow try to close this by adding an artificial part, which you don't want to be there. So this is one of the things we aim to solve.

Another example, which is more difficult to do, although I will show it in the experiments, is: suppose that you have a wire in 3D. And you have this wire in 3D and you want to describe it. Okay. Typically, if you have a function from R3 to R, a reasonable function, and you look at F inverse of 0, okay, all the points which go to 0, this will usually have co-dimension 1. And as we will show, when you try to use radial basis functions, et cetera, to describe a wiry thing, a 1D entity which lives in 3D, it gives you a huge extraneous set. Okay? It gives you an entire surface which does contain this wire, but most of it is redundant. You don't want it. And this is also a problem which we can solve using our approach. So this was the main motivation, at least when we started, and I will concentrate on that.

So you extract the 0 set and -- all of you know about the [inaudible]. You use marching cubes, et cetera. And now let's talk about how you construct the function. So typically you use radial basis functions, which are good because they're local. I and some other people did a lot of work on implicit polynomials, but they have some problems because it's a global model. So radial basis functions are definitely very popular and quite appropriate.

So this is how it looks. Okay. You have your points. Okay. And you build something like this, which is a combination of the radial basis functions. These are the centers. The centers may or may not equal the entire set of sample points you have. You may take fewer. If you have a million points, you may definitely want to take fewer centers. But that doesn't really change anything; it only makes your function simpler. And you have the coefficients. And here you add something which is usually constant or linear. Okay. And this is of course very, very well known. And, by the way, I'm mentioning it because I hope to make further connections later.
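To make the form of such a function concrete, here is a minimal Python/NumPy sketch of an RBF implicit function of the kind just described. The Gaussian basis, the width sigma, and all constants here are illustrative assumptions, not values from the talk.

import numpy as np

def rbf_implicit(x, centers, alphas, sigma, lin, const):
    # f(x) = sum_i alpha_i * exp(-||x - c_i||^2 / (2 sigma^2)) + <lin, x> + const
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances from x to every center
    return alphas @ np.exp(-d2 / (2.0 * sigma ** 2)) + lin @ x + const

# toy usage: 5 random centers in 2D and arbitrary coefficients
centers = np.random.rand(5, 2)
alphas = np.random.randn(5)
print(rbf_implicit(np.array([0.3, 0.7]), centers, alphas, 0.2, np.zeros(2), 0.0))

The reconstructed curve or surface is then the zero level set of such an f.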
If you think about an SVM, a kernel classifier, it also looks like this. Very often you have the function, which looks exactly like this, where this is typically a Gaussian, and you classify things. For example, the set of images or text or documents or whatever I want to recognize are those on which this is positive, and the ones which I want to reject are those on which this is negative. So there's a very nice and simple relation between surface fitting and classification. And Bernhard Schölkopf has done a lot of very nice work in this direction. And I will also hope to [inaudible] relate this to recognition. And then you extract the zero set after you do this, using various algorithms. Now, the question is of course how to find this function --

>>: I have a question [inaudible] that doesn't give you a function that's positive on one side, negative on the other side.

>> Daniel Keren: Well, it depends on the signs of the alphas. Or maybe I misunderstood the question. But the alphas can be negative of course. So -- okay?

>>: And it's still negative on both sides?

>> Daniel Keren: No. No, no. The alphas, some of them can be positive; some of them can be negative. Okay. I will talk about that. Although our functions will not have this problem. They're all very positive. But in the common wisdom, this is [inaudible]. I mean, you can think about it as a polynomial. If you're looking at a simple case, which is an implicit polynomial, then assume that all your points lie in the positive quadrant. Then still some of the coefficients will be positive, some of the coefficients will be negative, and you will get a function which can be positive, negative, and [inaudible]. So, yeah, that's possible. Okay.

Okay. So how do we find this function? You have your point cloud, X, and you want to satisfy this equation. Okay. You want to satisfy this equation, which means that your function is zero on the set of points. Okay. So you have your sample. I'm doing things of course in 2D, but it generalizes immediately to 3D. You have this. And you're looking for a function which obtains a value of zero on this, which is pretty easy, of course -- you just take the zero function, which is not interesting. So you solve a linear system. You just substitute the points into the function, and you demand that the result at every point will be zero. This gives you a set of equations in these coefficients. Now, it's sometimes confusing to people: the equations are linear. The fact that the functions phi are not linear, that's not a problem. The system is of course linear in the coefficients. And you solve it, but this of course yields nothing. Because, for example, you will simply get the zero solution. Okay. So you obviously have to add some constraints. And this is where it starts to be more difficult.

Very typical constraints are the following. Okay? You look at this. Listen, if I just want to obtain a value of zero here, what does it mean? I mean, usually I will not get anything meaningful. So people start doing things like this. They add anchor points. So they throw a bunch of points here, and they demand that the value of the function at these points will, for example, be negative. And then they throw a bunch of points here, and they demand that it will be positive. And then you're guaranteed to have a function which, firstly, is nonzero, it's not trivial; second, you're saying, I will force it to obtain a value of zero only on the sample points. Okay.
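A small sketch of that standard anchor-point construction, as I understood it: zero targets on the samples, plus or minus one at the anchors, and a linear solve for the coefficients. The Gaussian kernel, the target values, and the ridge term are my own choices for the illustration, not the talk's.

import numpy as np

def fit_rbf_with_anchors(samples, inside, outside, sigma=0.1, ridge=1e-8):
    # Constraint points and their target values: 0 on the samples,
    # -1 at the "inside" anchors, +1 at the "outside" anchors.
    pts = np.vstack([samples, inside, outside])
    targets = np.concatenate([np.zeros(len(samples)),
                              -np.ones(len(inside)),
                              np.ones(len(outside))])
    # Basis matrix A[i, j] = exp(-||p_i - c_j||^2 / (2 sigma^2)), centers = the samples,
    # plus one constant column; the system is linear in the coefficients.
    d2 = np.sum((pts[:, None, :] - samples[None, :, :]) ** 2, axis=2)
    A = np.hstack([np.exp(-d2 / (2 * sigma ** 2)), np.ones((len(pts), 1))])
    coeffs = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ targets)
    return coeffs   # last entry is the constant term

# toy usage: points on a circle, one inside anchor, a few outside anchors
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
samples = np.column_stack([np.cos(t), np.sin(t)])
coeffs = fit_rbf_with_anchors(samples, np.array([[0.0, 0.0]]),
                              2.0 * samples[::10], sigma=0.3)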
And if I demand that it be positive outside and negative inside, it means that it will go through the points. Okay. And it will hopefully obtain a value of zero only on the points. Of course, usually you still have many problems. One problem is of course that you have to provide these anchor points, which is not necessarily trivial, because your input is just a huge cloud of points, a messy cloud of points in 2D or 3D. Secondly, let's go back to this example. Okay. What will you do here? You want your function to be zero only on this patch, which is not closed. It doesn't have an inside and an outside. So you cannot apply this trick. So it's definitely not obvious what you will do. I mean, you can just say, okay, I'll just throw a zillion points all around here and demand the function to be positive everywhere. But it's not really clear how to do it. Okay. This method, which is very common, the implicit fitting, really works better -- is really suited to the cases in which you want to fit something which is closed. Okay.

And I skipped something. Sometimes if you have normal information, if you have the normals of the surface, it of course helps you to find these points. Okay. If you want to find a point inside, you go along the normal in the correct direction, assuming you can find it. It's not trivial. And if you go in the opposite direction you find points outside. But then you need normals, and normals are not always trivial to obtain. Okay. So you get these kinds of results, and you can try to improve them, and if you don't do it well, you get something like this. And if you do a better job and you really very carefully compute the normals, you get something like that. Okay. But it requires a lot of hard work. And, as I said, it still will not work for this case, or for the case where the co-dimension I mentioned is larger than 1. Okay.

A different approach -- by the way, this is not what we did, this is old stuff -- is simply to say that I want to exclude the zero solution. Okay. In order to exclude the zero solution -- when I say the zero solution, I mean of course to this set of equations -- I would simply do a very standard thing and demand that, in addition to satisfying the set of equations, the norm of the solution will be equal to 1. Okay. And this has some advantages, but it still has a lot of problems. Firstly, it's a heuristic. There's really no theory behind it. It's just a heuristic which may work and may not. And still you run into these spurious components. Okay. By the way, often in the graphics literature you don't see these spurious components because they are hidden by the surface itself. And some people in the literature have acknowledged it. You see it when you look at the simple case of 2D. In 2D you cannot hide anything behind the surface and you see these extraneous components very vividly. Okay.

So now we want to generalize this. Are there any questions so far? Okay. So we look at the set of basis functions as defining a map: for example, if we are trying to fit a curve, X lies in the plane, and we project these points into a much higher dimensional space. Now, this is actually quite simple. You have your basis functions, okay, which are many functions in two variables. A point is simply projected into this higher dimensional space. The definition is very simple.
You take your, for example, 100 basis functions, and then a point is projected into R100, with the ith coordinate simply being the value of the ith basis function at your point. Now, this is of course a very old concept. When you do kernel SVM, this is what you do. Right? You're saying, I cannot do linear SVM in Euclidean space. It's not good enough. It's not rich enough. I throw the points to a higher dimensional space, and I classify them using a linear classifier in the higher dimensional space. And this is the very famous kernel trick which Vapnik invented quite a few years ago.

So this is how it looks. Okay. And what you do then, you find, exactly as in the case I mentioned for machine learning, a hyperplane in the higher dimensional space. And there's a hyperplane which goes exactly through all the points, right, and therefore it has a linear equation which describes it. Right? A hyperplane is described by one linear equation. And then your equation for the original space is constructed through the high dimensional space. Okay. You take the points, you throw them to a very high dimensional space, you find a linear equation in this space. The linear equation there is of course nonlinear here, because the basis functions are nonlinear. But still you can do it quite easily. But it's not good enough, for the reasons I mentioned before. So there's still nothing new here. This is just standard stuff, right? Okay.

So this is the idea behind this. Okay. If you think about it, the coefficients which we set out to find are simply the normal to this hyperplane, right, because the equation of a hyperplane is simply defined of course by the normal, by the normal vector, and it's very simple. And you add this of course because it doesn't have to go through the origin, so this is quite simple. Okay. And this is the matrix you get. And it's exactly the same. So far nothing new.

So what are we going to do to extend this idea? Okay. What we're going to do is we're going to take this bunch of points in the high dimensional space and obtain a tighter approximation to them. Okay. And why are we doing this? I will try to make a simplistic sketch here to explain. Okay. Although it does appear on the slides, I like to write things. So suppose that this is your bunch of points. And you project them to a very high dimensional space. And you find the linear equation in this high dimensional space which satisfies the following: all the points here go into this hyperplane. The problem is of course that many, many other points here will also go into this hyperplane. So your solution in the original space will include these points. And this is of course not good. This is the source of all these extraneous components which we want to get rid of.

So what we're going to do is we're going to take this so-called hyperplane, which is usually just a bunch of points in the high dimensional space, which we sketch here in the Euclidean space to make it visible, and we're going to approximate them not with one hyperplane, but with many hyperplanes. Okay? And usually you can do it. Right? Because this is a tiny set in a very, very high dimensional space. Usually you can find many hyperplanes which go through it, which contain it or which at least approximate it very, very tightly. So why not use all of them?
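Here is a small sketch of that feature-map view: map the samples through the basis functions and recover the "one hyperplane" solution as the direction of smallest scatter. The toy arc, the Gaussian basis, and the choice of centers are my own assumptions for illustration.

import numpy as np

def feature_map(points, centers, sigma):
    # Phi(x)_j = exp(-||x - c_j||^2 / (2 sigma^2)): maps 2D points into R^m.
    d2 = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

t = np.linspace(0, 3, 200)
samples = np.column_stack([np.cos(t), np.sin(t)])   # an open arc, not a closed curve
centers = samples[::10]                             # 20 of the samples used as centers
Phi = feature_map(samples, centers, 0.2)
mean = Phi.mean(axis=0)
S = (Phi - mean).T @ (Phi - mean)                   # scatter matrix in the mapped space
w, V = np.linalg.eigh(S)                            # eigenvalues in ascending order
normal = V[:, 0]                                    # normal of the best-fitting hyperplane
# f(x) = <normal, Phi(x) - mean> is the unit-norm, single-hyperplane (RBF-like) solution;
# it is nearly zero on the samples, but typically also on extraneous points.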
If you use all of them, you'll have a much better chance of ridding yourself of the extraneous components, because now you have a much harsher condition. Let me try to demonstrate. This is the optimal hyperplane which RBF gives you. Still, any point of the plane, as I said, which goes into this hyperplane will be misclassified. If you have a point in the plane here, and it goes into that hyperplane, it will be misclassified, because the algorithm thinks that it belongs to your original set. So let's try to get rid of it. How shall we get rid of it? Very simple. We'll find more hyperplanes. All these hyperplanes, don't forget, exist in the high dimensional space, which can be, by the way, very high. It can be thousands. It can be hundreds of thousands. But still they exist. Now we're going to demand of a point in the original space that it be mapped not only into this hyperplane but also into this hyperplane. Why does it make sense? It makes sense because we know that the original points, the points which were given as an input to the problem, do satisfy this condition: they're mapped into this one and into this one. By the way, when I say into, it can also mean very close to it; that's good enough. So now, by demanding to accept only the points which are mapped into these many hyperplanes, we're making it less and less likely for extraneous components to occur, right? Question?

>>: Yeah, I mean, you may be addressing this later on, but what happens at the points of [inaudible]?

>> Daniel Keren: Oh, we have an example in which we added a lot of noise. Yes. Yes. Yes. Noise will usually not be a problem. It may be a problem in principle if the function which throws everything to this very high dimensional space is a very, very nasty function with very large derivatives, in which case a small noise in the input will create a lot of noise in the output. But usually you don't use these functions. You use radial basis functions, and they're pretty well behaved. Question?

>>: This seems like [inaudible] if the points occupy some significant lower dimensional subspace of the higher dimensional space. I mean, you want all these planes to approximate each set, then there's some -- they must have some -- the intersection is going to essentially be the same [inaudible].

>> Daniel Keren: Yes. And the point is that for any reasonable example, this is exactly what will happen. Because this space is really --

>>: Meaning the actual surface [inaudible].

>> Daniel Keren: It doesn't have to be a nice clean surface. The point is that you're mapping things into a very, very high dimension. Okay. So let's think about the following question. You have a bunch of points in, say, 5,000-dimensional Euclidean space. Okay. What is the probability, quote marks -- I'm not saying that I can really compute it, maybe we can, but never mind, just waving hands -- what is the probability that you cannot find two orthogonal hyperplanes which go through this bunch of points? Well, we know the answer is very simple. You compute the scatter matrix and just do a PCA decomposition. Now, you have to be extremely unlucky, okay, for all of the eigenvalues of this huge matrix to be large. If all of them are large, then you will only have a single hyperplane of co-dimension 1. But usually this is not what happens. Usually many of the eigenvalues are very small. And if they are small, it means that you can find many hyperplanes.
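To see that point about many small eigenvalues numerically, here is a tiny experiment on toy data of my own (not data from the paper): map a 1D curve in the plane through a few dozen Gaussian basis functions and look at the spectrum of the scatter matrix.

import numpy as np

t = np.linspace(0, 2 * np.pi, 400)
curve = np.column_stack([np.cos(t), np.sin(2 * t)])          # a toy 1D curve in the plane
centers, sigma = curve[::8], 0.3                             # 50 Gaussian centers

d2 = np.sum((curve[:, None, :] - centers[None, :, :]) ** 2, axis=2)
Phi = np.exp(-d2 / (2 * sigma ** 2))                         # the curve mapped into R^50
C = Phi - Phi.mean(axis=0)
eigvals = np.linalg.eigvalsh(C.T @ C)                        # ascending order
print(np.sum(eigvals < 1e-6 * eigvals[-1]))                  # number of near-zero directions,
                                                             # i.e. hyperplanes that (almost)
                                                             # contain the mapped curve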
I will relate these two concepts --

>>: I'm not quite sure where you're going with this. But is that sort of -- that's where you're going here, you take this kind of PCA and get some lower dimensional subspace [inaudible]?

>> Daniel Keren: Exactly. And I'm actually going to weigh these hyperplanes according to the eigenvalues. Exactly. Yes. Yes. Okay. Yes. So now suppose that we can find these hyperplanes. And this is just a technical and not very complicated point. Now we're looking for a function which satisfies the following: it tries to be small only on the points which belong to all these hyperplanes. So think about it as a very simple question in geometry. You have a bunch of hyperplanes in very high dimension. And you're looking for a function which is either zero or very small on all of them. If you do a direct extension of the RBF -- RBF does exactly that. The problem is that it takes only one hyperplane, and that's the reason for this great redundancy and the many extraneous components of it. But we can typically use many, many hyperplanes.

So how would you define such a function? There are many ways to define it. The simplest definition, which we work with, with a slight modification, is the following. We want a function which is equal to zero exactly on the intersection of the two hyperplanes. It's very easy to define. You just take the distance from this guy, the distance from this guy, square them and add the squares. And then of course your function will be zero only on the intersection. And this is what we do. This is the algorithm. That's it.

There are all kinds of small modifications. One modification, and this directly relates to your question: it may well be -- at least typically -- that this hyperplane is the best approximation. It really passes exactly through the points. And this one has a slight error. It's very close to the -- by the way, of course you have to enforce them to be different [inaudible]. So usually the first one will be best, the second one will be slightly less good. So when you add these squares, you simply weigh them by the goodness of fit to the points, which is just measured very simply by the eigenvalues of the scatter matrix. By the way, when I say scatter matrix, I mean of course in the high dimension. Okay. So does this make sense? This is the crucial point -- okay.

So this will give us a function which tries to be small only on the data. The nice thing visually is that you can look at each of them separately, because it's just a weighted average of what? Of a function which is defined by this hyperplane and another one by this. And suppose you have -- I think I have an example with nine. So the nice thing you see is that each of them is not good. Each of them has extraneous components. Okay. An extraneous component here shows up in the fact that you get a small value at a bad point, a point which is not in your original set. But when you add them, okay, then each extraneous component appears only in one or two of these sums. And when you add all of them, you get something nice and you rid yourself of these components. So this is it. The entire idea of the talk here, except for some extensions I will talk about at the end, is this. That's it. Okay. So you just do it, and the algorithm proceeds quite similarly, actually, in the beginning, to the standard RBF. You take a bunch of centers.
I will skip this because it is really very simple stuff about centering the points -- you just [inaudible] the average of it. But this is a trivial thing. And then eventually you define your function like this. I hope you see that I try to say in words something which is in line with the equations. This is simply the sum of squares of the distances from these hyperplanes. And it is simply equal to this one, where alpha i is of course the normal to the ith hyperplane in the high dimensional space. Okay.

And this is the correction I mentioned, and this was exactly your question. To correct it, we weigh them by the goodness of fit. Okay. Suppose that the tenth hyperplane is really not that good. It approximates the points, but not as well as the first one. So obviously you want to give it a smaller weight. Fortunately, you don't have to rack your brains over how to compute this weight. It simply comes out automatically from the scatter matrix, okay, of the points in the high dimensional space. Okay. This guy will of course get a high weight, a large weight, so the distance from this one, squared, will be multiplied by a large coefficient. And okay. Where am I going? What happened? Oh. Okay. Probably went backwards. And this is of course a very bad fit. So it will be entered into the function, but with a very small coefficient, a much smaller coefficient. And you can see what the coefficients will be, and they are simply the eigenvalues, okay, of this distribution. It's very straightforward to -- question? Yes.

>>: [inaudible] seems like the smallest eigenvalues wouldn't correspond to any [inaudible] is there something different? If the hyperplanes you select are the ones with the smallest eigenvalues, doesn't that automatically provide [inaudible]?

>> Daniel Keren: Oh, yes. Yeah. Yes.

>>: So what is the [inaudible]?

>> Daniel Keren: Oh, no. This is not a good fit. Ah. What happens -- okay. Okay. Okay. I think I understand the -- okay. You choose the ones with the smallest eigenvalues. But the ones with the smallest eigenvalues, when you build your function, are given a higher weight. Because they're better. Am I making sense? So at some point you will invert that matrix. Okay. So the good ones are the ones with the smaller eigenvalues. But when you build your combination, the smaller the eigenvalue, the better fit it is, for which it should be given a higher weight. Okay. And all this is encoded, by the way, in one very simple formula. So when I get there, I hope -- but that's actually a correct point. Yeah. Smaller eigenvalues are a better fit. So you weigh them: the smaller the eigenvalue, the higher weight it gets. Okay.

So this may remind you of PCA, it may not, and it should remind people who are interested in machine learning of kernel PCA. This is really a [inaudible] kind of PCA. And then you do the computations and you get a very, very simple equation. This is how it looks. Of course, this may be a nasty equation to compute, because this may be a very large matrix. Remember that the size of this matrix, okay, is of course the number of basis functions. And if you have to compute this creature at every point, it may take a lot of time. But you can do something very simple, and this is what we do to save time. Okay. You look at this, and what you do is just an SVD. You do an SVD decomposition on this one, and you take only the larger eigenvalues.
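Putting the pieces together, here is a compact sketch of the whole construction as I understood it from the talk: map the samples, eigendecompose the scatter matrix, keep the k smallest eigen-directions as hyperplane normals, and weigh their squared distances by the inverse eigenvalues. The Gaussian basis, k, sigma, the regularizer eps, and the toy data are all my own choices.

import numpy as np

def build_mad(samples, centers, sigma=0.2, k=10, eps=1e-8):
    # Map the samples into the space of basis-function values.
    d2 = np.sum((samples[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    Phi = np.exp(-d2 / (2 * sigma ** 2))
    mean = Phi.mean(axis=0)
    S = (Phi - mean).T @ (Phi - mean)            # scatter matrix in the mapped space
    w, V = np.linalg.eigh(S)                     # eigenvalues in ascending order
    normals = V[:, :k]                           # k best-fitting hyperplane normals
    weights = 1.0 / (w[:k] + eps)                # smaller eigenvalue -> larger weight

    def mad(x):
        # Weighted sum of squared distances from the k hyperplanes, evaluated at x.
        phi_x = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * sigma ** 2)) - mean
        return np.sum(weights * (normals.T @ phi_x) ** 2)

    return mad

# toy check: the function is small on the samples and larger away from them
t = np.linspace(0, 3, 200)
samples = np.column_stack([np.cos(t), np.sin(t)])
f = build_mad(samples, samples[::5])
print(f(samples[0]), f(np.array([0.0, 0.0])))

With k = 1 and no weighting this reduces to the single-hyperplane (RBF-like) case mentioned below.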
So this is what I said, and I hope it answers your question: because of the inverse here, the smaller eigenvalues are given higher weight. Exactly. Okay. And eventually -- usually we take 10 percent or 5 percent, it depends, but of course we don't take all of them into account. We take into account only the good ones. Okay.

Now, if you take only the first one and you drop the square root, okay, you just get ordinary RBF. So we can think about the standard RBF as a degenerate subcase of this approach, in which you take only the best, quote mark, hyperplane. Okay. And this is how the function looks. So you see that this is the RBF function. And, by the way, this is not the curve; this is the function itself. And here you can see very, very clearly these extraneous components of the RBF which I mentioned. You see, here yellow is small. So you see it is obviously small on the set. But it also has these extraneous components. And this of course took me back many years, to when we were fitting polynomials and running into all these extraneous components, and we had very few heuristics to get rid of them. But this is much, much more rigorous than -- yes.

>>: So if you took all the points here and projected them onto your hyperplane, and then you took your extraneous points and projected them onto the hyperplane as well, are they separated well? I mean, could you actually just do like a Gaussian mixture model on the hyperplane space to kind of say where --

>> Daniel Keren: You could --

>>: Is there a reason why we're doing this linear plane?

>> Daniel Keren: What you're saying is correct. It's a different approach, which was used by Schölkopf and his group. And it is possible. This is a simpler solution, and I will later show how it can be extended to something which encompasses both [inaudible]. But yes, you're right. You're right. Yes. The problem is -- I mean, your question touches exactly on what I tried to get at before, looking at this as a classification problem. You can think about fitting as a classification problem: instead of text about soccer and text about basketball, which you are to classify using kernel SVM, you have a bunch of points here, and all the rest are the bad points. The problem is that, as a classification problem -- if you try to directly apply standard kernel SVM to it, it's not an easy problem, because one set is degenerate, with a very low dimension, and the other set is everything else, the entire plane. So you can think about it as a classification problem in which class A is this degenerate thing, one dimensional, and the other class is all the rest of the plane. So this is not easy for kernel SVM to handle. But what you said is correct, you could try to do that. But usually when you do machine learning you try to separate two things, each of which has volume, kind of nonzero volume. You have a chunk here, a chunk here, and you pass something in the middle. And this is a different case. But that's a good point. Yes. Okay.

Another example, and this is why I was talking about things which don't close, is the spiral. Okay. You look at the spiral and the points. You can see them here, of course. And this is our function. Okay. Our function does miss a little here, but you can see that it sticks quite tightly to the data, by which I mean it is small only on the data, and it increases as you move away from it.
And the RBF has this problem that it always tries to close the shape, and this relates to what I said before: you have a function, a nice analytic simple function, and you look at all the points which are mapped to zero. The inverse image will typically be a closed curve. And this is why we chose a spiral for the demonstration, because it's not closed. Okay. If you just fit something very simple, RBF will do a better job. Okay. So --

Now I'm going to completely skip the surface or curve extraction. Because I'm running out of time, I'm just going to say one word on it: after we get a function, okay, we have to extract the curve. The curve here is simply all the points at which the function is small. So there are many ways to extract it. It's different from RBF, because with RBF it's not a question of being small; you extract all the points at which the function is zero. This is different. So Roi, the graduate student, did a lot of work on it and eventually converged on something which works very nicely in one and two and three dimensions, using the watershed transform. There are many algorithms, he fiddled with a few, and the watershed worked best. But, as I said, I'm going to be unfair to Roi and I'm going to skip that. But just bear in mind that maybe the major technical difference here from other work on surface fitting is that this is an unsigned distance. There's no inner part and outer part to the curve or to the surface. And the function is positive everywhere. So, as I said, you cannot apply many techniques which are used to extract a zero set. You cannot apply them here. It's a different thing. And as I said, he used the watershed, and some examples.

And as I said, the zero set of a signed function will always have co-dimension one. This is what I meant when I was talking about fitting this wire in 3D; our approach allows us to do it. And this is one example. So here it took a bunch of points like this, and this is the RBF fit. Okay. Down in Israel I would say that this is a [inaudible]. This reminds me of some sweet thing that you buy in Israel. It has exactly this shape and color. And maybe you have something like that here too. And you see you have this curve, but because of this deficiency of the RBF, it gives you this set, which does, as I said, contain this curve. So this relates to the questions you're asking: look at the point here -- why does it appear here? It appears here because this projection to the high dimensional space also takes it into the bad -- into where it shouldn't go, into the same hyperplane. So therefore in the original space it is classified as part of the set, but obviously it's not. And this is our result. We have improved this. You see, it's still not easy. It's not easy to extract a 1D thing in a 3D environment, but our results now are better. So -- okay.

And these are other examples. And this is what I promised to show you. You can think about our function as the sum of squared distances from many hyperplanes. If you look at the squared distances from each of them, here are the first nine hyperplanes. Okay, by the way, you remember of course this is the function which measures the distance from the hyperplane in the high dimension, but it's drawn in the original dimension, in the low dimension. It's very strange. It doesn't look like a distance function because it's defined in [inaudible].

>>: [inaudible]?

>> Daniel Keren: How many we take?
>>: We don't know how many points there are, so consider [inaudible].

>> Daniel Keren: Oh, yeah. We take far fewer than the number of points.

>>: [inaudible].

>> Daniel Keren: We did a lot of work -- I mean, we did all the experiments, of course, and it's not really sensitive to the exact --

>>: Just to give me a sense, is it like 20, 30?

>> Daniel Keren: Yes. 10. 10. I would say that 10 is typically [inaudible]. Of course, if you have -- okay. Suppose that we go crazy and we want to fit a wiry thing in 500 dimensions, then obviously we'll need many more, right, because you'll have a huge number of dimensions for these extraneous components, and you will have to chop out all of them. Okay. But this is not something which you usually do. We didn't go beyond co-dimension two, because in graphics applications that's the highest co-dimension you can have. So 10, 20 is a good number. By the way, when you go above a certain number, the performance doesn't change. The limit on the number is simply to make the computation more efficient. I mean, you cannot take too many. That's what I'm saying. If 10 is good and you use 30, you still get a very good result; you will just do three times more computation.

>>: [inaudible] some of these higher --

>> Daniel Keren: Yes. So it's quick and easy to decide where to chop that. But the good news is that, like I said, if you take too many, you're only wasting computation time. You're not gaining performance. Yeah. Yeah.

>>: [inaudible] some indication from the PCA of whether the input set of points [inaudible] is best fit by a surface versus a curve?

>> Daniel Keren: Yes. Yes. You would expect to see a different rate of decrease of the eigenvalues. Everything is encoded in these eigenvalues, and the speed by which they decrease, the rate by which they decrease.

>>: But in the previous example of this, the spiral increases [inaudible] you could detect that it's much improved?

>> Daniel Keren: Yes, yes. Because you would have two very, very small and nearly identical eigenvalues. Yes. Yes. And the RBF will take only one of them. So, yes, exactly. Yes. You definitely see it. Yes. Yes. Okay.

So what happens here is that each of these by itself is not good. It does have extraneous components, like RBF has. Remember, this is just RBF. Okay. When you add all of them, you get rid of all these extraneous components. Now, I began by criticizing RBF for not having a theory. So you may rightly ask, okay, this is very nice, but do you have a theory? The answer is yes. I can prove it, but it will take too much time. We can prove that if you take more and more basis functions, then in the limit this function will have a very strong property. It will be bounded, by a constant which we can determine, on the set, on the sample which you get. And it will actually diverge to infinity at any other point. So this is guaranteed in the limit to work. In the limit, you will not have any extraneous components whatsoever. Of course you don't need to go to the limit. But still it's good that this theory is there. But I cannot prove it here because it will take me too much time. So of course I'll be happy to discuss it. Okay.

So finally the examples. And Roi did a lot of work. He took many examples and he programmed them. Now, as I said, I'm not an expert on graphics, but the results were good compared to some other algorithms. Okay. This is ours.

>>: Sorry.
Do you have any guess why the [inaudible] looks so blurry? Is it just bad tuning of parameters?

>>: It looks like, if you go to the next slide, [inaudible] is not as dense as the [inaudible] on the right.

>> Daniel Keren: Uh-huh. Yeah. I think we got some feedback from the reviewers on the initial submission, and then he went and he improved it. But I'm going to deflect any [inaudible] to the graduate student. But we did get a lot of feedback from the reviewers on how did you run the other algorithms. So maybe this is an earlier version. But he did take it seriously, and he did --

>>: [inaudible] the upper right?

>> Daniel Keren: I have to confess that I don't remember.

>>: Because the results there look remarkably dicey. There's that little black part, which is bad, but it's really doing [inaudible].

>> Daniel Keren: Well, but here you get this very sharp.

>>: That's true. You do get sharp.

>> Daniel Keren: But -- yeah. Yeah. For some reason I --

>>: [inaudible] sharpness and --

>> Daniel Keren: Yes, yes, yes.

>>: That's true, both the adaptive things make more sense.

>> Daniel Keren: Okay. So let's go to the other examples. Okay. This example -- as I understand, all these are standard sets that people use, and here the result was pretty good. And, again, there were comments from the reviewers on the application of the other methods.

>>: What were they? I can see the triangles from here [inaudible].

>> Daniel Keren: Here there are no -- when you say triangles, what exactly -- I mean, because we don't do any triangulation.

>>: My question is [inaudible] on the right, you see the [inaudible] of the mesh. It's extremely coarse. It's not very refined.

>> Daniel Keren: I owe you -- I definitely owe you a reply on the implementation of the other algorithms. So I'll definitely look into that. Okay. I'll be sure to show the examples of the co-dimension I mentioned too, et cetera, in which -- but, yeah.

>>: Are you going to talk about the computational complexity of your algorithm?

>> Daniel Keren: The computational complexity is usually about 10 times more than RBF. If we take, for example, the first 10 --

>>: That's fine. But these are RBF, and yours are above [inaudible] because you're computing all the --

>> Daniel Keren: Well, we don't have to compute all of them.

>>: Okay.

>> Daniel Keren: I mean, when you do the SVD, you have an efficient version of the SVD that tells you: just give me the ten dominant eigenvalues.

>>: No, but the first thing, where you kernelize -- when you kernelize things, you're taking X minus X [inaudible] applying the kernel centered at each point.

>> Daniel Keren: Um-hmm. Yes.

>>: And so you're --

>> Daniel Keren: Oh, whoa, whoa. You're talking about the preparation of the matrix.

>>: Yes.

>> Daniel Keren: That is expensive. That is expensive. You're right. You're right.

>>: [inaudible] linear in the number of points because you [inaudible] it's linear -- it's, yeah, linear in the number of points.

>> Daniel Keren: Okay. So two replies. Actually, we certainly don't make any claims about the computational efficiency. The point is that after you do all this preparation, when you have to compute the function, that is relatively cheap. But there is some offline preparation involved. You are absolutely right about that. But it did run pretty quickly for thousands of points in R3. So I assume that it's reasonable.

>>: So computation of the function [inaudible]?

>> Daniel Keren: Yes. Ah. But we take only a small number. We take only 10.
So to compute the function is about 10 or 15 times at most more expensive than RBF. But you're definitely right about the preprocessing. Yes.

>>: You say it's 10 times more expensive than RBF, but the main RBF method itself, to evaluate a point, involves a summation of all the Gaussians [inaudible]?

>> Daniel Keren: Oh, yes, yes. So.

>>: So that's a very expensive [inaudible] operation.

>> Daniel Keren: Oh, yes. But you can -- I mean, I would assume, as a nonexpert in graphics, that since these are Gaussians, then, if we're talking just about the RBF, you can do some very simple space partitioning. And when you -- you know, suppose that the centers of the RBF are here, and you have to compute the value at this point, you will simply do some very simple space subdivision and take only those which are here. Because the radius of the Gaussians is usually much, much, much smaller than the extent of the entire shape. Definitely. So you could do that. But here we'll deflect the blame to RBF, not to the [inaudible]. So I will always find somebody to blame. [Inaudible] but you're right. It may take time. Yes. Yes. Okay.

And we were also asked by the reviewers to check noise and outliers. I don't know if in graphics applications you would expect such enormous noise, but anyway, he programmed it and he ran it. And for these outliers, it performed reasonably well. And, again, you're the graphics experts; you should tell me whether we should really expect -- I mean, as opposed to computer vision, would you expect such very high noise in it. But this is just what the reviewers requested. So this was done.

And the results -- again, I'm here deferring to the reviewers, but the results which they liked, for example: when you go and you sparsify the points more and more, okay, you get here where really you see a relatively small percentage of the points. And I'm not talking about noise. I'm talking about a much smaller number of points. Then as the number of points decreased, our algorithm still did pretty well. Here, for example, you see it lost this arm. But this is really a very, very severe undersampling, and it did relatively well compared to the other methods he tested. Here he took this cloud of points and he took some cuts through it. Okay. You see these are slices, the circles here and here. And, again, I keep apologizing for not being a graphics person, but this was the result which the reviewers liked the most. So, again, you're much better qualified than me to say whether this is a nice example. But they really liked this example of the undersampling.

I will also skip this. He also did some testing to see how well it reconstructs normals. Okay. You can look at your function and you can try to reconstruct the behavior of the curve, for example to find the normal, by all kinds of methods, or you can do a PCA. And he used it to compute normals and it did well. Oops. Okay. I'm sorry for the false -- oh, no, it is here. Sorry. Sorry. And the estimation of the normals using our function was better on average than other methods. It especially did better, not surprisingly, when the curve behaves like this. Okay. When you use PCA, et cetera, and you try to compute the normal here, what do people do? You take a sphere, and you take the points of the curve which fall inside this sphere, and you fit a plane to them. And after you fit a plane, you can find the normal.
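A minimal sketch of that local plane-fitting procedure (the radius and the toy data are my own choices; this is the baseline estimator being compared against, not the MaD-based one):

import numpy as np

def pca_normal(cloud, query, radius):
    # Take the points inside a sphere around the query, fit a plane by PCA,
    # and return the plane normal: the eigenvector of the smallest eigenvalue.
    nbrs = cloud[np.linalg.norm(cloud - query, axis=1) < radius]
    cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
    w, V = np.linalg.eigh(cov)
    return V[:, 0]

# toy usage: a noisy plane z = 0 in 3D
cloud = np.random.rand(1000, 3)
cloud[:, 2] = 0.01 * np.random.randn(1000)
print(pca_normal(cloud, cloud[0], 0.2))   # should be close to (0, 0, +/-1)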
The normal is then just the normal to that plane. But the problem of course in locations like this is that you get this part here. This will be included. So when you compute the normal at this point, these points will interfere and give you a result which is absolute garbage. So when you look at the average improvement, it's a bit misleading. It would be more accurate to say that our method and other methods did more or less the same for, I don't know, 90, 95 percent of the points. But in regions like this, we did much better. So this is obscured when you look just at the average quality. So, just to mention that.

Okay. Now suppose that you do have normals. Suppose that you somehow -- and there are many, many algorithms for computing normals. If you have normals, you can take your basis functions to look like this. Okay. And then the function is of course even better than before. You see that here the function is really, really concentrated, with a small value only on the curve itself, and it increases very rapidly as you move away from the curve. Okay. And actually he was supposed to do it for the three-dimensional example too, but I don't remember whether he did. Because there the improvement would probably be more substantial. I mean, it would force your basis functions to be basically one dimensional, and then you would get a much better result. But this is just, you know -- okay.

So this is the conclusion. You can think about what we did as a generalized RBF. It behaved well under noise and outliers. For some cases the improvement over other methods was very good, especially with the horse image. I did [inaudible] check with Roi that, as I said, maybe this is an earlier version, but he did go and check very carefully the results of the other methods. But in some cases, it was quite good. There is a computational price, as you asked. It's definitely slower than RBF. The good news in that regard is that when you compute the function itself, which is the crucial thing, it's only slowed by a constant factor, which is about 10 to 20. So it's not too bad in that regard.

Now the question is whether I have five more minutes to show you a completely different spin on all this, something which looks absolutely different and converges to the same result, but in a manner which can allow us to extend it in a very strong way. And this is what we're doing now. So the question is whether I have these five minutes. Okay. I will really -- I apologize. I will really run through this.

What we did here is build a function which attempts to be small on the data and large when you move away from the data. And as I said, we can prove that it does satisfy this condition in the limit. So you can think about it as the opposite of an indicator function. An indicator function, like we study in calculus, is something which is equal to 1 on the data and 0 everywhere else. This is the opposite. It's characterized by being small on the data. And we can put a probabilistic spin on it, and we can do this. This is, I hope, not too scary. You look at the function. Suppose you have a family of functions and you have a bunch of points Xi. And you're trying to interpret what we did in a probabilistic setting. We put a probability on every function. By the way, the functions here can be, for example, all the functions which are spanned by the radial basis functions. You look at this, and look at this very simple definition.
You see that if your function is larger on the points, it will be assigned a smaller probability, right, because of the minus sign here. Okay. So it's clear that this is something in the same spirit in which I was talking so far. You want functions which are small on the data. And this is a very simple definition. Okay.

Now you're saying: yes, but I want to use all the functions. I hope I convinced you that there's no unique best function. Taking one is what RBF does, and very often it fails. So we do something very simple. I'm looking for a function which is positive everywhere and attempts to be small on the data. I will do something very simple: I will take the value at the point and define a G, where G of X is the sum of the squares of all the functions in the subspace. So it's an integral, of course -- an integral, by the way, over the subspace of functions spanned by the radial basis functions. And I will of course multiply by the probability of F. Okay. And since the probability on F is defined -- of course we define it here as a Gaussian -- you can calculate it. I mean, as we all know, the only integrals which you have any hope of calculating in higher dimension, except for the favorite ones, are Gaussians. Okay.

Now, amazingly enough -- it's a Gaussian integral which is not trivial, but still you can calculate it. And, incredibly enough, it gives exactly our function. The MaD function which I showed, with the strange definition with a covariance matrix [inaudible], it's exactly equal to this. Exactly. So this is a completely different way to look at it. But it gives you the exact same result.

Now, why are we very energetically trying to pursue this direction? Because once you have this probabilistic formulation and you see that it works well -- if you take a very simple probability, it gives you a nice fit, this MaD function -- we can start throwing more and more things into the probability. For example, we can throw negative examples into it. We can demand the fitted function to behave nicely, to be smooth. We can add information along the normals. If we want to take this over to recognition, we can use some prior information on the background. We can do many, many things. This probabilistic formulation is much more powerful.

There is a slight technical problem -- and I will talk about that for 20 seconds and then conclude -- which is that if you want to enhance the probability structure, you may want to add things here. The simplest example: suppose that you have a point which you really want to exclude from the curve. Okay. Suppose you have something like this. Suppose [inaudible] and it really nearly touches. So probably whatever method you use, this area will be a problem. So you have to go to it and you have to tell it: exclude this point, penalize this point. It's quite straightforward to do. You will look at this expression, okay, and you will add here: you will have plus F squared at that point. Okay. You will penalize the probability for the function obtaining a small value at this point. And this is how you will rid yourself of that point. Slight problem: once you start adding things here, what happens? This becomes non-positive definite. If it's not positive definite, as we all know, you cannot compute the integral. It's like trying to compute the integral over R2 of E to the minus X squared plus 0.2 Y squared, I don't know, whatever. And this of course diverges.
You cannot compute the integral. The solution we found -- which at the beginning seemed like it would be hopeless in terms of the numerics of computing it -- is to do the thing which, actually, even without this point which I mentioned, is obviously the correct solution: not to integrate over the subspace of all functions, the linear subspace, but to limit the coefficients over the subspace to be of unit norm. And if I had ten more minutes, which I really don't, I would convince you that, even without this problem, as I said, it's the correct thing to do. And now we run into the problem of -- by the way, this is programmed and it's running, it's not just an idea, and it can be done even relatively easily. So we ran into the problem of computing the integral of a Gaussian not over the entire space, for which we have a closed form equation, but over the unit sphere in this high dimensional space. The good thing of course is that the unit sphere, never mind the dimension, is a compact set. On a compact set you can integrate everything. Okay. I mean, it doesn't matter if the quadratic form in the Gaussian is not positive definite. You can still integrate it. Now, there's no closed form expression, but fortunately there's a very neat trick which allows you to compute the integral over the unit sphere. You can reduce it to a one-dimensional integral which you have to do numerically. Fortunately, it's only one dimensional. Second, and this is very important, it involves preprocessing that you do once. Then to compute the function at any point, you don't have to redo it. You just do the preprocessing once; it may take some time. But it only involves one-dimensional integrals. And with today's machines, computing a one-dimensional integral numerically is not too difficult. So the idea is to carry [inaudible]. We already have some examples, but they are a bit artificial, in 2D, so I didn't bring them, in which we exclude points on the curve [inaudible] where other methods would glue this together, we can rid ourselves of them.

And that's it. This is just what I've been saying so far. So, to summarize, RBF is a degenerate case of what we did, and both of them are degenerate subcases of a much, much broader -- at least we hope it will indeed turn out to be a more general probabilistic approach. So I've really taken too much of your time, so thank you very much.

[applause]

>> Sing Bing Kang: Any other questions? So a quick question. You mentioned the horse as being one example, but these usually involve many connected components.

>> Daniel Keren: Yes.

>> Sing Bing Kang: So do you want it -- is it that you separate all of them first and compute each of them separately, or can you just compute all of them at the same time, all the surfaces?

>> Daniel Keren: I'm not sure -- are you referring to the original set?

>> Sing Bing Kang: [inaudible].

>> Daniel Keren: Ah. Okay. No, no, no. The nice thing is that this is done automatically. You don't have to separate. Because this formalism contains everything. You don't have to separate and find one function for that and one function for that. No, no. It's done automatically. And, by the way, you can mix shapes. For example, suppose that you have a surface and a wiry thing coming out of it. You do everything together. You just throw everything into this machine and it works. You don't have to separate. You don't have to do anything manually. Yeah. Okay. Yeah. Sure.
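To make the probabilistic formulation above a bit more concrete, here is a tiny Monte Carlo sketch of the idea as described: draw coefficient vectors on the unit sphere, weight each resulting function by exp of minus the sum of its squared values on the data, and average F(x) squared. This is only an illustration under assumptions of this write-up (toy data, Gaussian basis, brute-force sampling), not the closed-form, one-dimensional-integral computation used in the actual work.

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 3, 100)
samples = np.column_stack([np.cos(t), np.sin(t)])
centers, sigma = samples[::5], 0.2

def basis(points):
    d2 = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

Phi_data = basis(samples)                       # basis values on the data points

def G(x, n_draws=20000):
    # Coefficient vectors on the unit sphere; weight each by exp(-sum_i F(x_i)^2).
    a = rng.standard_normal((n_draws, centers.shape[0]))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    F_data = a @ Phi_data.T                     # each row: one function evaluated on the data
    weights = np.exp(-np.sum(F_data ** 2, axis=1))
    F_x = a @ basis(x[None, :])[0]              # value of each function at the query point
    return np.sum(weights * F_x ** 2) / np.sum(weights)

print(G(samples[0]), G(np.array([0.0, 0.0])))   # small on the data, larger off it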
>>: Earlier, when you explained that, to prevent a degenerate solution of all 0s, [inaudible] additional points that were just placed off of the surface?

>> Daniel Keren: This is done in the RBF, yes.

>>: So how do you prevent the degenerate solution in your approach?

>> Daniel Keren: Ah, how do we prevent it? Because you're working with that matrix, okay, you're using the eigenvectors. And the eigenvectors have norm 1, so they cannot be degenerate. So it cannot yield a degenerate solution.

>>: So for a single hyperplane, then, a solution of all zeros would satisfy the linear equation.

>> Daniel Keren: Yes. But you don't allow that, because you force your solution to be -- in this case it would be the eigenvector of the smallest eigenvalue, and the eigenvectors you always take to be of unit length. So when we look at only the first part of our solution, with one vector, it will be exactly equal to what RBF gives you when you enforce the condition of unit norm on the solution of RBF. It will be the same. You don't have to force it. It follows automatically from the formalism.

>>: So there's a linear system that is solved [inaudible] the solution.

>> Daniel Keren: Actually, you don't solve any system. You just have a big matrix. And, as Rick mentioned, it does take time to prepare it, I admit that. But you just have a matrix and you just invert it or do an [inaudible] on it. You don't have to force this additional constraint of unit length. Sure.

>>: One thing I'm not quite sure of is where the algorithm switches from reconstructing a 1D thing, like your spiral, to basically reconstructing the full surface. The place where I was wondering that was the example of the horse, where it just [inaudible] reconstruct all the wires where you cut them through, and the full horse model, and sort of like how -- where was the [inaudible] control that?

>> Daniel Keren: You can control it, yes. What you're saying is -- I would go even to the more extreme case. Suppose that you're going to the limiting case. You have a discrete bunch of points. And you can think about this -- it's exactly the same conceptually as this being the entire horse and this being the slices. Now, if you go to the limit, if you take a number of basis functions which are not just the regular basis functions at your points, but any family of functions whose union in the limit spans all the functions -- and then there's the notion of a [inaudible] space, they span all the functions -- then eventually it will start going to infinity. As I said, I could prove it, but not now of course. It would start to grow more and more between the points. But of course you don't want to do that computationally, and obviously it will not work. So there's a very wide range, okay, of the number of basis functions for which it will still be small at the points and be a very nice and smooth function. So it will be small on the points, or on the cross sections of the horse, and it will not increase rapidly when you move away from them. Okay.

But suppose, you know, you're looking at it from a totally abstract mathematics point of view and you say, I mean, why should I limit myself just to the radial basis functions at the points? Why not take more and more functions -- take all the -- you know, look at the Fourier basis of all functions from R2 to the reals.
Eventually you will get something which is too good. It's small on the points, but it goes to infinity everywhere else. But of course this would also be a problem if you applied this idea of taking too many functions to RBF; you would run into the same problem. Okay. So --

>>: The same thing could be said about the comparison between the helix and the [inaudible].

>> Daniel Keren: Yeah.

>>: If you took the RBF to be more severe, then you get the helix. The surface that you get if the function is less than a certain value, right?

>> Daniel Keren: Actually, I don't think that you'll get the helix. Because you're looking at one function, at one single function, and looking at all the points at which it equals zero. Okay. So -- how should I say that. I mean, you're trying to interpolate a bunch -- taking more and more complicated functions. I'll try to convince you in ten seconds. Suppose you're taking a bunch of points, okay, and you're trying to construct a function which is not trivial, of course, so you enforce some norm on the coefficients, the norm is 1 or whatever, and you force it to be 0 here. And you take more and more and more, for example, Fourier coefficients. Say I will go up to a frequency of 1 million. It will be zero at these points, but it will be zero at many, many, many other points. We can prove that with our solution this will not happen. This will eventually, in the limit, be small only at the points. Of course, if you take too many -- first, it's absolutely not necessary. There's obviously some kind of degeneracy in taking more functions than points. So it will not happen. So I hope I -- so the problem will really be if you take too many functions. In our case we did not have to tailor them, because we never went above the number of points. The number of basis functions is always bounded by the number of points. It can be smaller, but it's never larger. So you don't run into these nasty problems of -- yeah.

>>: The classical problem of overfitting.

>> Daniel Keren: Definitely. Definitely.

>> Sing Bing Kang: We can --

>> Daniel Keren: Okay. I'll be happy to.

>> Sing Bing Kang: Let's thank our speaker once more.

[applause]