
>> Sing Bing Kang: Good morning, everybody. It's my pleasure to welcome back Daniel
Keren. He last visited us about nine years ago. So Daniel got his Ph.D. from Hebrew University
in '91, and since '94 he's been with Haifa University.
He has a wide variety of interests, his main interests being machine learning, regularization
theory, and distributed systems. But he has also done a lot of work in vision and graphics.
So Daniel.
>> Daniel Keren: Okay. Thank you very much, Sing Bing. It's definitely been a while, and I
realized when I got back that only Sing Bing and Greg Zielinski are still here from the original
group, which was when I last visited, so I'm definitely below the [inaudible]. But it's very nice to
come back.
As Sing Bing said, my perspective is more mathematical, vision and machine learning oriented,
so I hope I will not misrepresent any other work in graphics. I'm not really a graphics person.
But the work does seem relevant to computer graphics. It's published in a graphics journal and
it's joint work with Craig Gotsman, whom I assume many of you know, and our joint graduate
student Roi Poranne.
So feel free to interrupt and ask any questions in any way you want.
So this is an example of what we're doing. MaD stands for Mahalanobis distance, and of course I
assume that even the younger crowd is familiar with Alfred E. Neuman.
So the input to our algorithm, the one which I will describe, is a set of points, either in 2D or 3D.
And the output is a function which is constructed so as to be small on the data and large
everywhere else.
This is as opposed to the common wisdom of implicit fitting, in which the function attempts to
obtain a value of zero on the data and nonzero elsewhere, and typically the nonzero values have
different signs inside and outside the curve. This is different. The functions which we fit are
everywhere positive. And the set which we try to approximate is characterized by the fact that,
as I said, the function attempts to obtain a minimal value on the data, and I hope to be able to
convince you that this has some advantages.
So this is the overview of the talk. We will talk -- ah, [inaudible] nice to see you. So a little
background. Our function, the surface extraction. And I apologize in advance, I will not talk too
much about that.
Results. And an extension which we hope to pursue, which puts a very, very different,
probabilistic spin on this scheme. And it has the potential, we believe, to very broadly generalize
all these ideas.
So the input is the same input as in many works which a lot of you must be familiar with. You're given a
bunch of points, and you want to reconstruct a nice surface. I already chose this. I assume this is
a -- I hope it's a common example. And you want to end up with a surface. And so this is the
problem. And some of the background I will really just glance over.
3D point clouds can come -- if you're doing vision, they can come from stereo or they can come
from 3D scanners. Of course you have the Kinect. There are many, many sources for that.
The famous Michelangelo project. And these are things which you all know. Another example.
So we're given this huge set of points in 3D and we want to fit a surface. And the question is
also of interest in 2D.
This is a very well-known problem. A great many papers. Voronoi based. Other papers which
you must know. Implicit functions, which is something which has always
interested me, and I actually have some previous work, some of it very old, some of it not as old,
including with Craig Gotsman. And so -- and this may resemble it, but, as I said, it will be
different.
A very typical solution for the implicit approach -- and, by the way, when I say implicit, the
model can be anything. It can be an implicit polynomial, it can be something which is built from
radial basis functions. It doesn't matter. The principle is the same. The guiding idea is the same.
You have a function defined from Euclidean space to the reals and you try for it to satisfy
the following: that it obtains a negative value outside of the shape, a positive value inside, and
zero on the shape. And then you extract the surface as the set of points at which that function
equals zero.
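As a minimal illustration of that sign convention (a toy sketch in Python, not the speaker's code), take the unit circle as the shape:

    # Signed implicit function for the unit circle: F(x, y) = 1 - x^2 - y^2.
    # F > 0 inside the curve, F < 0 outside, F = 0 exactly on the curve,
    # so the shape is recovered as the zero level set of F.
    def F(x, y):
        return 1.0 - x * x - y * y

    print(F(0.0, 0.0))   #  1.0 -> inside
    print(F(2.0, 0.0))   # -3.0 -> outside
    print(F(1.0, 0.0))   #  0.0 -> on the curve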
And then immediately you will run into some problems, and I will make a [inaudible] example.
Suppose that your set in 2D looks like this. Okay. And this is not a mistake. This is part of the
set. Okay. It's not easy to see what reasonable function you could build which would
have the property that it obtains the value of zero on this set, which is not only disconnected,
but which is not closed.
Usually these things tend to close the shape and very often you will run into extraneous
components. It will somehow try to close this by adding an artificial part, which is not -- which
you don't want to be there. So this is one of the things we aim to solve.
Another example, which is more difficult to do, although I will show it in the experiments, is
suppose that you have a wire in 3D. And you have this wire in 3D and you want to describe it.
Okay. Typically, if you have a reasonable function F from R3 to R and you
look at F^-1(0), okay, all the points which map to 0, this will almost always have
co-dimension 1.
And as we would show, when you try to use radial basis function, et cetera, to describe a wiry
thing, a 1D entity which lives in 3D, it gives you a huge extraneous set. Okay? It gives you an
entire surface which does contain this wire, but it contains -- most of it is redundant. You don't
want it. And this is also a problem which we can solve using our approach. So this is the -- was
the main motivation at least when we started it. And I will concentrate on that.
So you extract the 0 set and you -- all of you know about the [inaudible]. You use marching
cubes, et cetera. And now let's talk about how you construct the function.
So typically you use radial basis functions, which are good because they're local. I and some
other people, we -- I did a lot of work on implicit polynomials, but they have some problems
because it's a global model. So radial basis functions are definitely very popular and quite
appropriate.
So this is how it looks. Okay. You have your points. Okay. And you build something like this,
which is a combination of the radial basis functions. These are the centers. The centers may or
may not equal the entire set of sample points you have. You may take less. If you have a
million points, you may definitely want to take less centers.
But that doesn't really change anything; only makes your function simpler. And you have the
coefficients. And here you add something which is usually constant or linear.
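A sketch in Python of the generic form just described, assuming a Gaussian basis and using toy centers, coefficients and a constant term (all placeholders, not values from the talk):

    import numpy as np

    # f(x) = sum_i alpha_i * phi(||x - c_i||) + constant, with Gaussian phi.
    def rbf_value(x, centers, alphas, sigma=0.5, const=0.0):
        d2 = np.sum((centers - x) ** 2, axis=1)            # squared distances to the centers
        return alphas @ np.exp(-d2 / (2 * sigma ** 2)) + const

    centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    alphas = np.array([1.0, -0.5, 0.3])                    # note: coefficients can be negative
    print(rbf_value(np.array([0.2, 0.1]), centers, alphas))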
Okay. And this is of course very, very well known. And, by the way, I'm mentioning it because
I hope to make further connections later. If you think about an SVM, a kernel classifier, it also
looks like this. Very often you have the function, which looks exactly like this, where this is
typically a Gaussian, and you classify things.
For example, the set of images or text or documents or whatever that I want to
recognize are those on which this is positive, and the ones which I want to reject are those on
which this is negative. So there is a very nice and simple relation between surface fitting and
classification.
And Bernhard Schölkopf has done a lot of work in -- a lot of very nice work in this direction.
And I will also hope to [inaudible] relate this to recognition. And then you extract the
zero set after you do this using various algorithms.
Now, the question is of course how to find this function --

>>: I have a question [inaudible]. That doesn't give you a function that's positive on one side,
negative on the other side.
>> Daniel Keren: Well, it depends on the sign of the alphas. Or maybe I misunderstood the
question. But the alphas can be negative of course. So -- okay?
>>: And it's still negative on both sides?
>> Daniel Keren: No. No, no. The alphas, some of them can be positive; some of them can be
negative. Okay. I will talk about that. Although, our functions will not have this problem.
They're all very positive.
But in the common wisdom, this is [inaudible]. I mean, you can think about it as a polynomial.
If you're looking at a simple example, which is an implicit polynomial, then assume that all your
points lie in the positive quadrant. Then still some of the coefficients will be positive, some of the
coefficients will be negative, and you will get a function which can be positive, negative, and
[inaudible]. So, yeah, that's possible.
Okay. Okay. So how do we find this function? You have your point cloud, X, and you -- you
want to satisfy this equation. Okay. You want to satisfy this equation, which means that your
function is zero on the set of points. Okay.
So you have your sample. I'm doing things of course in 2D, but it generalizes immediately to
3D. You have this. And you're looking for a function which obtains a value of zero on this,
which is pretty easy, of course -- you could just take the zero function, [inaudible] interesting. So you
solve a linear system. You just substitute the points into the function, and you demand that the
result at every point will be zero. This gives you a set of equations in these coefficients.
Now, it's sometimes confusing to people. The equations are linear. The fact that the functions
phi are not linear, that's not a problem. The system is of course linear in the coefficients.
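A sketch of that linear system, under the same Gaussian-basis assumption (toy data, not the talk's):

    import numpy as np

    # Substituting every sample point x_i into f(x) = sum_j alpha_j * phi(||x - c_j||)
    # and asking for f(x_i) = 0 gives A @ alpha = 0: linear in alpha even though phi is not.
    def collocation_matrix(points, centers, sigma=0.5):
        d2 = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        return np.exp(-d2 / (2 * sigma ** 2))              # A[i, j] = phi(||x_i - c_j||)

    points = np.random.rand(50, 2)                         # the input point cloud
    centers = points[::5]                                  # fewer centers than points
    A = collocation_matrix(points, centers)
    print(A.shape)                                         # (50, 10)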
And you solve it, but this of course yields nothing. Because, for example, you will simply get
the zero solution. Okay. So this is -- you obviously have to add some constraints. And this is
where it starts to be more difficult.
Very typical constraints are the following. Okay? You look at this. Listen, if I just want to
obtain a value of zero here, what does it mean? I mean, usually I will not get anything
meaningful. So people start doing things like this. They add anchor points. So they throw a
bunch of points here, and they demand that the value of the function at these points will, for
example, be negative.
And then they throw a bunch of points here, and they demand that it will be positive. And then
you're guaranteed to have a function which, firstly, is nonzero, it's not trivial; second, you're
saying I will force it to obtain a value of zero only on the sample points. Okay. And if I demand
that it be positive outside and negative inside, it means that it will go through the points. Okay.
And it will hopefully obtain a value of zero only on the points.
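A sketch of that anchor-point trick (the anchors, the signs and the Gaussian basis are assumptions for illustration):

    import numpy as np

    # Besides f = 0 on the samples, ask for f = +1 at a few interior anchors and
    # f = -1 at a few exterior anchors, then solve in the least-squares sense.
    def fit_with_anchors(points, inside, outside, centers, sigma=0.5):
        def basis(X):
            d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
            return np.exp(-d2 / (2 * sigma ** 2))
        A = np.vstack([basis(points), basis(inside), basis(outside)])
        b = np.concatenate([np.zeros(len(points)),
                            np.ones(len(inside)),          # interior anchors -> positive
                            -np.ones(len(outside))])       # exterior anchors -> negative
        alphas, *_ = np.linalg.lstsq(A, b, rcond=None)
        return alphas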
Of course usually you still have many problems. One problem is of course that you have to
provide these anchor points, which is not necessarily trivial, because your input is just a huge
cloud of points, a messy cloud of points in 2D or 3D.
Secondly, let's go back to this example. Okay. What will you do here? You want your function
to be zero only on this patch which is not closed. It doesn't have an inside and an outside. So
you cannot apply this trick. So it's definitely not obvious how -- what you will do. I mean, you
can just say, okay, I'll just throw a zillion points all around here and demand the function to be
positive everywhere.
But it's not really clear how to do it. Okay. This method, which is very common, the implicit
fitting, really works better -- is really suited to the cases in which you want to fit
something which is closed.
Okay. And I skipped something. Sometimes if you have normal information, if you have the
normals to the surface, it of course helps you to find these points. Okay. If you want to find a
point inside, you go along the normal in the correct direction, assuming you can find it. It's not
trivial. And if you go in the opposite direction you find points outside. But then you need normals,
and normals are not always trivial to obtain.
Okay. So you get these kinds of results, and you can try to improve them, and if you don't do it
well, you get something like this. And if you do a better job and you really very carefully
compute the normals, you get something like that.
Okay. But it requires a lot of hard work. And, as I said, it still will not work for this case or for
the case where the co-dimension I mentioned is larger than 1. Okay.
A different approach which is -- by the way, oh, this is not what we did, this is old stuff -- is
simply you say that I don't want -- I want to exclude the zero solution. Okay. In order to
exclude the zero solution -- when I say the zero solution, I mean of course to this set of equations --
I would simply do a very standard thing and demand that, in addition to satisfying the set of
equations, the norm of the solution will be equal to 1. Okay.
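A sketch of that unit-norm heuristic: minimize ||A @ alpha|| subject to ||alpha|| = 1, which is the right singular vector of A with the smallest singular value (collocation_matrix is the earlier sketch):

    import numpy as np

    def unit_norm_fit(A):
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        return Vt[-1]                                      # unit-norm coefficient vector

    # alphas = unit_norm_fit(collocation_matrix(points, centers))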
And this has some advantage, but it still has a lot of problems. Firstly, it's a heuristic. There's
really no theory behind it. It's just a heuristic which may work and may not. And still you run
into these spurious components. Okay.
By the way, often in the graphics literature you don't see these spurious components because
they are hidden by the surface itself. And some people in the literature have acknowledged it.
You see it when you look at the simple case of 2D. Then in 2D you cannot hide anything behind
the surface and you see these extraneous components very vividly. Okay. So now we want to
generalize this.
Are there any questions so far? Okay.
So we look at the set of basis functions as defining a map: if we
are trying to fit a curve, X lies in the plane, and we project these points into a much higher
dimensional space.
Now, this is actually quite simple. You have your basis functions, okay, which are many
functions in two variables. Every point is simply projected into this higher
dimensional space. The definition is very simple. You take your, for example, 100 functions,
and then the point is projected into R100, with the ith coordinate simply the value of the ith basis
function at your point.
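A sketch of that lifting map (Gaussian basis and random centers are placeholders):

    import numpy as np

    # A 2D point x is sent to the vector of all basis function values at x,
    # so with 100 basis functions it lands in R^100.
    def lift(x, centers, sigma=0.5):
        d2 = np.sum((centers - x) ** 2, axis=1)
        return np.exp(-d2 / (2 * sigma ** 2))              # i-th coordinate = phi_i(x)

    centers = np.random.rand(100, 2)
    print(lift(np.array([0.3, 0.7]), centers).shape)       # (100,)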
Now, this is of course a very old concept. When you do kernel SVM, this is what you do.
Right? You're saying I cannot do linear SVM in Euclidean space. It does not -- it's not good
enough. It's not rich enough. I throw the points to a higher dimensional space, and I classify them
using a linear classifier in the higher dimensional space. And this is the very famous kernel trick
which Vapnik invented quite a few years ago.
So this is how it looks. Okay. And what you do then, you find exactly as in the case I mentioned
for machine learning: you try to find a hyperplane in the higher dimensional case -- okay, in the
higher dimensional space, I'm sorry, not case. And then there's a hyperplane which goes
exactly through all the points, right, and therefore there is a linear equation which describes it. Right?
A hyperplane is described by one linear equation.
And then your equation for the original space is constructed through the high dimensional space.
Okay. You take the points, you throw them into a very high dimensional space, you find a linear
equation in this space. The linear equation there is of course nonlinear here because the basis
functions are nonlinear. But still you can do it quite easily. But it's not good enough for the
reasons I mentioned before. But this is just -- there's still nothing new here. This is just standard,
obvious, right?
Okay. So and this is the idea behind this. Okay. So now if you think about it, your coefficients,
the coefficients which we set out to look for or to find are simply the normal to this hyperplane,
right, because the equation of a hyperplane is simply defined of course by the normal, by the
normal vector, and it's very simple. And you add this of course because it doesn't have to go
through the origin, so this is quite simple.
Okay. And you can -- and this is the matrix you get. And it's exactly the same. So so far
nothing new. So what are we going to do to extend this idea? Okay. What we're going to do is
we're going to take this bunch of points at the high dimensional space and obtain a type of
approximation to them. Okay. And why are we doing this? I will try to make a simplistic
sketch here to explain. Okay. Although it does appear on the slides, but I like to write things.
So suppose that this is your bunch of points. And you project them to a very high dimensional
space. And you find the linear equation in this high dimensional space which satisfies the
following. All the points here go into this hyperplane. The problem is of course that many,
many other points here will also go into this hyperplane.
So your solution in the original space will include these points. And this is of course not good.
This is where -- this is the source of all these extraneous components which we want to get rid
of.
So what we're going to do is we're going to take this so-called hyperplane, which is usually just a
bunch of points in the high dimensional space, which we sketch here in the Euclidean space to
make it visible, and we're going to approximate them not with one hyperplane, but with many
hyperplanes. Okay? And usually you can do it. Right? Because this is a tiny set in a very, very
high dimensional space. Usually you can find many hyperplanes which go through it, which
contain it or which at least approximate it very, very tightly.
So why not use all of them? If you use all of them, you'll have a much better chance of ridding
yourself of the extraneous components, because now you have a much harsher condition.
Let me try to demonstrate. This is the optimal hyperplane which RBF gives you. Still any point
of the plane, as I said, which goes into this hyperplane will be misclassified. If you have a point
in the plane here, it goes into that hyperplane, it will be misclassified because the algorithm
thinks that it belongs to your original set.
So let's try to get rid of it. How shall we get rid of it? Very simple. We'll find more
hyperplanes. All these hyperplanes, don't forget, exist in the high dimensional
space, which can be, by the way, very high. It can be thousands. It can be hundreds of
thousands. But still they exist.
Now we're going to demand from a point in the original space to be mapped not only to this
hyperplane but also to this hyperplane. Why does it make sense? It makes sense because we
know that the original points, the points which were given as an input to the problem, do satisfy
this condition, now mapped into this one and this one.
By the way, when I say into, can also mean of course very close to it is good enough.
So now we'll -- by demanding, okay, but to accept only the points which are mapped into these
many hyperplanes, we're making it less and less likely for extraneous components to occur,
right? Question?
>>: Yeah, I mean, you may be addressing this later on, but what happens at the points of
[inaudible]?
>> Daniel Keren: Oh, we have an example in which we added a lot of noise. Yes. Yes. Yes.
Noise will usually not be a problem unless -- it may be a problem in principle if the function
which throws everything to this very high dimensional space is a very, very nasty function
with very large derivatives, in which case a small noise in the input will create a lot of noise in
the output. But usually you don't use these functions. You use radial basis functions, and they're
pretty well behaved.
Question?
>>: This seems like [inaudible] if the points occupy some significant lower dimensional
subspace of the higher dimensional space. I mean, you want all these planes to approximate each
set, then there's some -- they must have some -- the intersection is going to essentially be the
same [inaudible].
>> Daniel Keren: Yes. And the point is that for any reasonable example, this is exactly what
will happen. Because this space is really --

>>: Meaning the actual surface [inaudible].
>> Daniel Keren: It doesn't have to be a nice clean surface. The point is that you're mapping
things into a very, very high dimension. Okay. So let's think about the following question. You
have a bunch of points in R -- in 5,000-dimensional Euclidean space. Okay. What is the probability, quote
marks, I'm not saying that I can really -- I mean, we can maybe compute it, but never mind, just
waving hands, what is the probability that you cannot find two orthogonal hyperplanes which
go through this bunch of points?
Well, we know the answer is very simple. You compute the scatter matrix and just do a PCA
decomposition. Now, you have to be extremely unlucky, okay, with the eigenvalues of
this huge matrix. Only if all of them are large will you have just a single hyperplane of co-dimension 1.
But usually this is not what happens. Usually many of the eigenvalues are very small. And if
they are small, it means that you can find many hyperplanes. I will relate these two concepts --

>>: I'm not quite sure where you're going with this. But is that sort of -- that's where you're
going here, you take this kind of PCA and get some lower dimensional subspace [inaudible]?
>> Daniel Keren: Exactly. And I'm actually going to weigh these hyperplanes according to the
eigenvalue. Exactly. Yes. Yes. Okay. Yes.
So now suppose that we can find these hyperplanes. And this is just a technical and not
very complicated point. And now we're looking for a function which satisfies the following:
it tries to be small only on the points which belong to all these hyperplanes.
So think about it as a very simple question in geometry. You have a bunch of hyperplanes in
very high dimension. And you're looking for a function which is either zero or very small on all
of them. If you do a direct extension of the RBF, RBF does -- it does exactly that. The problem
is that it takes only one hyperplane, and that's the reason for this great redundancy and many
extraneous components of it. But we can typically use many, many hyperplanes.
So how would you define such a function? There are many ways to define it. The simplest
definition, which we work with, with a slight modification, is the following. We want a function
which is equal to zero exactly on the intersection of the two hyperplanes. It's very easy to
define. You just take the distance from this guy, the distance from this guy, square them and add
the squares. And then of course your function will be zero only on the intersection.
And this is what we do. This is the algorithm. That's it. There are all kinds of small
modifications. One modification, and this directly relates to your question: it may well be -- I
mean, it will always be of course, at least typically, that this hyperplane is the best
approximation. It really passes exactly through the points.
And this one has a slight error. It's very close to it -- by the way, of course you have to enforce
them to be different [inaudible]. So usually the first one will be best, the second one will be
slightly less good. So when you add these squares, you simply weigh them by the goodness of
fit to the points, which is just measured very simply by the eigenvalues of the scatter matrix.
By the way, when I say scatter matrix, I mean of course in the high dimension.
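A sketch of the resulting function (the weighting by 1/(eigenvalue + eps) is my reading of the description; the paper's exact normalization may differ):

    import numpy as np

    # Weighted sum of squared distances from the lifted point to each hyperplane;
    # hyperplanes that fit the data better (smaller eigenvalue) get a larger weight.
    # lift() and (normals, eigvals, mean_phi) come from the earlier sketches.
    def mad_value(x, centers, mean_phi, normals, eigvals, sigma=0.5, eps=1e-8):
        phi_c = lift(x, centers, sigma) - mean_phi         # centered lifted point
        dists = normals.T @ phi_c                          # distance to each hyperplane
        return float(np.sum(dists ** 2 / (eigvals + eps))) # small only near the data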
Okay. So does this make sense? This is the crucial point on -- okay. So this will give us a
function which tries to be small only on the data. The nice thing visually, you can look at each
of them separately, because it's just a weighted average of what? Of a function which is defined
by this hyperplane and another one by this. And suppose you have -- I think I have an example
with nine.
So the nice thing you see is that each of them is not good. Each of them has extraneous
components. Okay. An extraneous component here shows up as the function obtaining a small value
at a bad point, a point which is not in your original set.
But when you add them, okay, then each extraneous component appears only in one or two of
these sums. And when you add all of them, you get something nice and you rid yourself of these
components.
So this is it. The entire idea of the talk here, except for some extensions I will talk about at the
end, is this. That's it. Okay. So you just do it, and the algorithm proceeds, in the beginning,
quite similarly to RBF, the standard RBF, actually. You take a bunch of
centers. I will skip this because this is really very simple stuff about centering the points; you
just [inaudible] the average of it. But this is a trivial thing.
And then eventually you define your function like this. I hope you see -- I try to say in words
something which aligns with the equations. This is simply the sum of squares of the distances from
these hyperplanes. And it is simply equal to this one, where alpha_i is of course the normal to the
ith hyperplane in the high dimensional space.
Okay. And this is the correction I mentioned, and this was exactly your question.
To correct it we weigh them by the goodness of fit. Okay. Suppose that the tenth hyperplane is
really not that good. It approximates the points, but not as well as the first one. So obviously
you want to give it a smaller weight.
Fortunately, you don't have to work hard to figure out how to compute this weight. It simply
comes out automatically from the scatter matrix, okay, of the points in the high dimensional space.
Okay. This guy will get of course a high weight, a large weight, so the distance from this one
squared will be multiplied by a large coefficient.
And okay. Where am I going? What happened? Oh. Okay. Probably went backwards. And
this is of course a very bad fit. So it will be entered into the function, but with a very small
coefficient, a much smaller coefficient. And you can see what the coefficients will be, and they
are simply the eigenvalues, okay, of this distribution. It's very straightforward to -- question?
Yes.
>>: [inaudible] seems like the smallest eigenvalues wouldn't correspond to any [inaudible] is
there something different? If the hyperplanes you select are the ones with the smallest
eigenvalues, doesn't that automatically provide [inaudible]?
>> Daniel Keren: Oh, yes. Yeah. Yes.
>>: So what is the [inaudible]?
>> Daniel Keren: Oh, no. This is not a good fit. Ah. What happens -- okay. Okay. Okay. I
think I understand the -- okay.
You choose the one with the smallest eigenvalues. But the one with the smallest eigenvalues
when you build your function are given a higher weight. Because they're better. Am I making
sense? So at some point you will invert that matrix. Okay. So the good ones are the ones with
the smaller eigenvalues. But when you build your combination, the smaller the eigenvalue, the
better fit it is, for which it should be given a higher weight. Okay.
And all this is encoded, by the way, in one very simple formula. So when I get there I hope --
but that's actually a correct point. Yeah. Smaller eigenvalues are a better fit. So you weigh
them. The smaller the eigenvalue, the higher weight it gets. Okay.
So this is -- this may remind people who are interested in
machine learning of kernel PCA. This is really a [inaudible] version of kernel PCA.
And after -- you do the computations and you get a very, very simple equation. This is
how it looks. Of course, this may be a nasty equation to compute because this may of course be
a very large matrix. Remember that the size of this matrix, okay, is of course the
number of basis functions.
And if you have to compute this creature at every point, it may take a lot of time. But you can do
something very simple, and this is what we do to save time. Okay. You look at this, and what
you do is just an SVD. You do an SVD decomposition on this one, and you take only the
larger eigenvalues. So this is what I said, and I hope it will answer your question: because of the
inverse here, the smaller eigenvalues are given higher weight.
Exactly. Okay. And so you can eventually if you take -- usually we take 10 percent or 5
percent, it depends, but we don't of course take all of them into account. We take into account
only the good ones. Okay.
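In symbols, a plausible form of that equation, consistent with the description (the exact constants and normalization in the paper may differ): with \Phi(x) the lifted point, \bar{\Phi} the mean of the lifted samples, and (\lambda_i, v_i) the k smallest eigenpairs of the scatter matrix,

    \mathrm{MaD}(x) \;=\; \sqrt{\sum_{i=1}^{k} \frac{\langle v_i,\, \Phi(x) - \bar{\Phi} \rangle^{2}}{\lambda_i}}

i.e. a truncated Mahalanobis distance in the feature space, which is why the inversion gives the smaller eigenvalues the higher weights.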
Now, if you take only the first one and you drop the square root, okay, you just get ordinary
RBF. So we can think about the standard RBF as a degenerate subcase of this approach, in
which you take only the best, quote mark, hyperplane.
Okay. And this is how the function looks. So you see that this is the RBF function. And, by the
way, this is not the curve; this is the function itself. And here you can see very, very clearly
these extraneous components of the RBF which I mentioned. You see here yellow is small. So
you see it is obviously small on the set. But it also has the extraneous components. And this of
course took me back many years, to when we were fitting polynomials and running into all
these extraneous components, and we had very few heuristics to get rid of them. But this is
much, much more rigorous than -- yes.
>>: So if you took all the points here and projected them onto your hyperplane, and then you
took your extraneous points and projected them onto the hyperplane as well, are they separated
well? I mean, could you actually just do like a Gaussian mixture model on the hyperplane space
to kind of say where --

>> Daniel Keren: You could --

>>: Is there a reason why we're doing this linear plane?
>> Daniel Keren: What you're saying is correct. It's a different approach, which was used by Schölkopf and his
group. And it is possible.
This is a simpler solution, and I will later show how it can be extended to something which
encompasses both [inaudible]. But yes, you're right. You're right. Yes.
The problem is of course -- I mean, your question touches exactly on what I
tried to hit on before, looking at this as a classification problem. You can think about fitting as a
classification problem: instead of text about soccer and text about basketball, which you are to
classify using kernel SVM, you have a bunch of points here, and all the rest are the bad points.
The problem is that, as a classification problem -- if you try to directly apply standard kernel
SVM to it, it's not an easy problem, because one set is degenerate with a very low dimension and
the other set is everything else, the entire plane. So you can think about it as a classification
problem in which class A is this degenerate thing, one dimensional, and the other class is all the
rest of the plane. So this is not easy for kernel SVM to handle.
But what you said is correct, you could try to do that. But usually, when you do machine learning,
you try to separate two things, each of which has volume, kind of nonzero volume.
You have a chunk here, a chunk here, and you pass something in the middle. And this is a
different case. But that's a good point. Yes.
Okay. Another example, and this is why I was talking about things which don't close, is the
spiral. Okay. You look at the spiral and the points. You can see them here, of course. And this
is our function. Okay. Our function does miss a little here, but you can see that it sticks quite
tightly to the data, by which I mean it is small only on the data, and it increases as you move
away from it.
And the RBF has this problem that it always tries to close the shape, and this relates to what I
said before: you have a function, a nice analytic simple function, and you look at all the points
which are mapped to zero. The inverse image will typically be a closed curve. And this is why we
chose a spiral for the demonstration, because it's not closed. Okay. If you just fit something
very simple, RBF will do a better job. Okay. So --
Now I'm going to completely skip the surface or curve extraction. I'm just going to say -- because
I'm running out of time, I'm going to say one word on it, which is that after we get a function, okay,
we have to extract the curve. So the curve here is simply all the points on which the function is
small.
So there are many ways to extract it. It's different from RBF, because RBF, it's not a question of
being small, you extract all the points in which the function is zero. This is different.
So Roi, the graduate student, did a lot of work on it and eventually converged on something
which works very nicely in one and two and three dimensions, using the watershed transform.
There are many algorithms; he experimented with a few, and the watershed worked best. But, as I said,
I'm going to be unfair to Roi and I'm going to skip that.
But just bear in mind maybe the major difference here technically from other work on surface
fitting is that this is an unsigned distance. There's no inner part and outer part to the curve or to
the surface. And the function is positive everywhere. So that's, as I said -- so you cannot apply
many techniques which are used to extract a zero set. You cannot apply them here. It's a
different thing.
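A crude stand-in for that extraction step (the actual pipeline described uses a watershed transform; here a plain threshold on a grid, with arbitrary domain, resolution and threshold):

    import numpy as np

    # Evaluate the unsigned function on a 2D grid and keep the cells where it is
    # below a threshold; mad_value() is the earlier sketch.
    def extract_low_set(centers, mean_phi, normals, eigvals, res=200, thresh=0.05):
        xs = np.linspace(-1.0, 1.0, res)
        grid = np.array([[mad_value(np.array([x, y]), centers, mean_phi, normals, eigvals)
                          for x in xs] for y in xs])
        return grid < thresh                               # boolean mask approximating the curve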
And as I said, he used the watershed, and here are some examples. And as I said, the zero set of a signed
function will always have co-dimension one. This is what I meant when I was talking about
fitting this wire in 3D; our approach allows us to do it. And this is one example. So here it took a
bunch of points like this, and this is the RBF fit. Okay. Down in Israel I would say that this is a
[inaudible]. This reminds me of some sweet thing that you buy in Israel. It has exactly this shape
and color.
And what it does, you maybe have something not here too. And you see you have this curve, but
because of this deficiency of the RBF, it gives you this set, which does, as I said, contain this
curve. So this relates to the questions you're asking, look at the point here, why does it appear
here. It appears here because this projection to high dimensional space also takes it into the
bad -- into where it shouldn't go, into the same hyperplane.
So therefore in the original space it is classified as part of the set, but obviously it's not. And this
is what our result is. We have improved this. You see, it's still not easy. It's not easy to extract a
1D thing in a 3D environment, but our results now are better. So -- but I will -- okay.
And these are other examples. And this is what I promised to show you. You can think about
our function as the sum of square distances from many hyperplanes. If you look at the square
distances for each of them, here are the first nine hyperplanes.
Okay, by the way, you remember of course this is the function which measures the distance from
the hyperplane in the high dimension, but it's shown here in the original dimension, in the low
dimension. It's very strange. It doesn't look like a distance function because it's defined in
[inaudible].
>>: [inaudible]?
>> Daniel Keren: How many we take?
>>: We don't know how many points there are, so consider [inaudible].
>> Daniel Keren: Oh, yeah. We take far less than the number of points.
>>: [inaudible].
>> Daniel Keren: We did a lot of work -- I mean, we did all the experiments, of course, and it's
not really sensitive to the exact --

>>: Just to give me a sense, is it like 20, 30?
>> Daniel Keren: Yes. 10. 10. I would say that 10 is typically [inaudible]. Of course, if you'll
have -- okay. Suppose that we go crazy and we want to fit a wiry thing in 500 dimensions, then
obviously we'll need much more, right, because you'll have a huge number of dimensions for
these extraneous components, and you will have to chop out all of them.
Okay. But this is not something which you usually do. We didn't go beyond co-dimension two,
because in graphics applications, that's the highest co-dimension you can have. So 10, 20 is a
good number.
By the way, when you go above a certain number, the performance doesn't change. The
limitation on the number is simply to make the computation more efficient. It doesn't matter -- I
mean, you cannot take too many. That's what I'm saying. If you take -- if 10 is good, and you
use 30, you still get a very good result or you will just do three times more computation.
>>: [inaudible] some of these higher --

>> Daniel Keren: Yes. So it's quick and easy to decide where to chop that. But the good news
is that, like I said, if you take too many, you're only wasting computation time. You're not
gaining performance. Yeah. Yeah.
>>: [inaudible] some indication from the PCA of whether the input set of points [inaudible] is
best fit by a surface versus a curve?
>> Daniel Keren: Yes. Yes. You would expect to see a different rate of decrease of the
eigenvalues. Everything is encoded in these eigenvalues, and the speed by which they decrease,
the rate by which they decrease.
>>: But in the previous example of this, the spiral increases [inaudible] you could detect that it's
much improved?
>> Daniel Keren: Yes, yes. Because you would have two very, very small and nearly identical
eigenvalues. Yes. Yes. And the RBF will take only one of them. So, yes, exactly. Yes. You
definitely see it. Yes. Yes.
Okay. So what happens here is that each of these by itself is not good. It does have extraneous
components, like RBF has. Remember this is just RBF. Okay. But when you add all of them,
you rid yourself of all these extraneous components.
Now, I began by criticizing RBF for not having a theory. So you may rightly ask, okay, this
is very nice, but do you have a theory? The answer is yes. I can prove it, but it would take too much
time. But we can prove that if you take more and more basis functions, in the limit,
this function will have a very strong property. It will be bounded by a constant which we can
determine on the set, on the sample which you get. And it will actually diverge to infinity at any
other point.
So this is guaranteed at the limit to work. At the limit, you will not have any extraneous
components whatsoever. Of course you don't need to go to the limit. But still it's good that this
theory is there. But I cannot prove it because it will take me too much time. So of course I'll be
happy to discuss it.
Okay. So finally the examples. And Roi did a lot of work. He took many examples and he
programmed them. Now, as I said, I'm not an expert on graphics, but the results were good as
compared to some cases -- I'm sorry, to some other algorithms.
Okay. This is ours.
>>: Sorry. Do you have any guess why the [inaudible] looks so blurry? Is it just bad tuning of
parameters?
>>: Looks like, if you go to the next slide [inaudible] is not as dense as the [inaudible] on the
right.
>> Daniel Keren: Uh-huh. Yeah. I think we got some feedback from the reviewers on the
initial submission, and then he went and he improved it. But I'm going to deflect any [inaudible]
to the graduate student. But we did get a lot of feedback from the reviewers on how did you
do -- how did you run the other algorithms. So maybe this is an earlier version. But he did take
it seriously, and he did --

>>: [inaudible] the upper right?
>> Daniel Keren: I have to confess that I don't remember.
>>: Because the results there look remarkably dicey. There's that little black part, which is bad,
but it's really doing [inaudible].
>> Daniel Keren: Well, but here you get this very sharp.
>>: That's true. You do get sharp.
>> Daniel Keren: But -- yeah. Yeah. For some reason I --

>>: [inaudible] sharpness and --

>> Daniel Keren: Yes, yes, yes.
>>: That's true both the adaptive things make more sense.
>> Daniel Keren: Okay. So let's go to the other examples. Okay. This example -- as I
understand, all these are standard sets that people use, and here the result was pretty good. And,
again, there were comments from the reviewers on the application of the other sets.
>>: What were they? I can see the triangles from here [inaudible].
>> Daniel Keren: Here there are no -- when you say triangles, what exactly -- I mean, because
we don't do any triangulation.
>>: My question is [inaudible] on the right, you see the [inaudible] of the match. It's extremely
coarse. It's not very refined.
>> Daniel Keren: I owe you -- I definitely owe you a reply on the implementation of the other
algorithms. So I'll definitely look into that. Okay. I'll be sure to show the examples of the code I
mentioned too, et cetera, in which -- but, yeah.
>>: Are you going to talk about the computational complexity of your algorithm?
>> Daniel Keren: The computational complexity is usually about 10 times more than RBF. If
we take, for example, the first 10 --

>>: That's fine. But these are RBF, and yours are above [inaudible] because you're computing
all the --

>> Daniel Keren: Well, we don't have to compute all of them.
>>: Okay.
>> Daniel Keren: I mean, you can do -- when you do the SVD, you have an efficient version of
the SVD that tells you just give me the ten dominant eigenvalues.
>>: No, but the first thing, where you compute the kernel -- when you kernelize things, you're
taking X minus X [inaudible] and apply the kernel centered at each point.
>> Daniel Keren: Um-hmm. Yes.
>>: And so you're --

>> Daniel Keren: Oh, whoa, whoa. You're talking about the preparation of the matrix.
>>: Yes.
>> Daniel Keren: That is expensive. That is expensive. You're right. You're right.
>>: [inaudible] linear in the number of points because you [inaudible] it's linear -- it's, yeah, linear
in the number of points.
>> Daniel Keren: Okay. So two replies. Actually we certainly don't make any claims about the
computational efficiency. The point is that after you do all this preparation, when you have to
compute the function, that is relatively cheap. But there is some offline preparation involved.
You are absolutely right about that. And -- but it did run pretty quickly for thousands
of points in R3. So I assume that it's reasonable.
>>: So computation of the function [inaudible]?
>> Daniel Keren: Yes. Ah. But we take only a small number. We take only 10. So to compute
the function is about 10, or 15 at most, times more expensive than RBF. But you're definitely
right about the preprocessing. Yes.
>>: You say it's 10 times more expensive than RBF, but the main RBF method itself to evaluate
a point involves a summation of all the Gaussians [inaudible]?
>> Daniel Keren: Oh, yes, yes. So.

>>: So that's a very expensive [inaudible] operation.

>> Daniel Keren: Oh, yes. But you can -- I mean, I would assume as a nonexpert in graphics that since these
are Gaussians, then if we're talking just about the RBF, you can do some very simple space
partitioning. And when you -- you know, suppose that the centers of the RBF are here, and you
have to compute the value at this point, you will simply do some very simple space
subdivision and take only those which are here.
Because, I mean, the radius of the Gaussians is usually much, much, much smaller than the
extent of the entire shape. Definitely. So you could do that. But here we'll deflect the blame to
RBF, not to the [inaudible]. So I will always find somebody to blame. [Inaudible] but you're
right. It may take time. Yes. Yes.
Okay. And we were also asked by the reviewers to check noise and outliers, and it did again -- I
don't know if all graphics applications -- how -- if you would expect such enormous noise, but
anyway, he programmed it and he ran it. And for these outliers, it performed reasonably well.
And, again, you're the graphics experts; you should tell me whether we should really expect -- I
mean, as opposed to computer vision, would you expect such very high noise in it. But this is
just what the reviewers requested. So this was done.
And the results -- again, I'm here deflecting to the reviewers, but the results which they liked, for
example, when you go and you sparsify the points more and more, okay, you get here where
really you see a very -- I mean, a relatively small percentage of the points. And I'm not talking
about noise. I'm talking about a much smaller number of points. Then as the number of points
decreased, our algorithm still did pretty well.
Here, for example, you see it lost one of -- lost this arm. But this is really a very, very severe
undersampling, but it did relatively well compared to the other methods he tested.
This [inaudible] -- here he took this point cloud, this cloud of points, and he took some cuts through it.
Okay. You see these are slices, the circles here and here. And, again, I keep apologizing for not
being a graphics person, but this was the result which the reviewers liked the most. So, again,
you're much better qualified than me to say whether this is a nice example. But they really liked
this example of the undersampling.
I will also skip this. It just -- he also did some testing to see how well it reconstructs normals.
Okay. You can look at your function and you can try to reconstruct the behavior of the curve,
for example, to find the normal by all kinds of methods, or you can do a PCA.
And he used it to compute normals and it did. Oops. Okay. I'm sorry for the false -- oh, no, it is
here. Sorry. Sorry.
And the estimation of the normals using our function was better on the average than other
methods. It especially did better, not surprisingly, when the curve behaves like this. Okay.
When you use PCA, et cetera, and you try to compute the normal here, what do people do?
They -- you take a sphere and you take the points of the curve which fall inside this sphere, and
you fit a plane to them.
And after you fit a plane, you can find the normal. It's just a normal to the plane. But the
problem of course in locations like this is that you get this here. This will be included. So when
you compute the normal to this point, these points will interfere and give you a result which is
absolute garbage.
So when you look at the average improvement, it's a bit misleading. It would be more accurate
to say that our method and other methods did more or less the same for, I don't know, 90, 95
percent of the points. But in regions like this, we did much better. So this is obscured when you
look just at the average quality. So just to mention that.
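A sketch of the standard local-PCA normal estimate being described (toy radius; it needs a few neighbors inside the sphere to be meaningful):

    import numpy as np

    # Collect the samples inside a small sphere around the query, do PCA, and take
    # the direction of least variance as the normal. As noted above, unrelated nearby
    # parts of the shape that fall inside the sphere corrupt this estimate.
    def pca_normal(points, query, radius=0.1):
        nbrs = points[np.linalg.norm(points - query, axis=1) < radius]
        centered = nbrs - nbrs.mean(axis=0)
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        return Vt[-1]                                      # direction of smallest variance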
Okay. Now suppose that you do have normals. Suppose that you somehow -- and there are
many, many algorithms for computing normals. If you have normals, you can take your basis
functions to look like this. Okay. And then the function is of course even better than before.
You see that here the function is really, really concentrated as a small value only on the curve
itself, and it increases very rapidly as you move away from the curve.
Okay. And actually he was supposed to do it for the three-dimensional example too, but I don't
remember whether he did it. Because there the improvement would probably be more
substantial. I mean, it would force your basis functions to be basically one dimensional, and then
you would get a much better result.
But this is just, you know -- okay. So this is the -- a conclusion. You can think about what we
did as a generalized RBF. It behaved well under noise and outliers. For some cases the
improvement over other methods was very good, especially with the horse image.
I did [inaudible] check with Roi -- as I said, maybe this is not the final version -- but he did go
and check very carefully the results of the other methods. But in some cases, it was quite good.
There is a computational price, as you asked. It's definitely slower than RBF. The good news in
that regard is that it's -- when you compute the function itself, which is the crucial thing, it's only
slowed by a constant factor, which is about 10 to 20. So it's not too bad in that regard.
Now the question is whether I have five more minutes to show you a completely different spin
on all this, something which looks absolutely different and converges to the same result, but in a
manner which allows extending it in a very strong way.
And this is what we're doing now. So the question is whether I have these five minutes. Okay. I
will really -- I apologize. I will really run through this.
What we did here is build a function which attempts to be small on the data and large when you
move away from the data. And as I said, we can prove that it does satisfy this condition in the
limit. So you can think about it as the opposite of an indicator function.
An indicator function, like we study in calculus, is something which is equal to 1 on the data and
0 everywhere else. This is the opposite. It's characterized by being small on the data.
And we can put a probabilistic spin on it, and we can do this. This is I hope not too scary. You
look at the function. Suppose you have a family of functions and you have a bunch of points XI.
And you're trying to interpret what we did in a probabilistic setting.
We put a probability on every function. By the way, functions here can be, for example, all the
functions which are spanned by the radial basis functions.
You look at this, and look at this very simple definition. You see that if your function is larger
on the points, it will be assigned a smaller probability, right, because of the minus sign here.
Okay. So it is clear that this is something in the same spirit in which I have been talking
so far. You want to have functions which are small on the data. And this is a very simple
definition. Okay.
Now you're saying yes, but I want to use all functions. I hope I convinced you that there's no
unique best function. This is what RBF does, and very often it fails. So I do something very
simple. I'm looking for a function which is positive everywhere and attempts to be small on the
data. I will do something very simple. I will square all the points -- I'm sorry -- I will take the
value at the point and define G, G of X, to be the sum of the squares of all the functions in the
subspace.
So it's an integral of course. It's an integral, by the way, over the subspace of functions spanned
by the radial basis functions. And I will of course multiply by the probability of F.
Okay. So this is just -- since the probability is defined as F, of course we define it here as a
Gaussian. So you can calculate it. I mean, as we all know, the only integrals which you have
any hope of calculating in higher dimension, except for the favorite ones, are Gaussians.
Okay. Now, amazingly enough -- it's a Gaussian integral which is not trivial,
but still you can calculate it. And, incredibly enough, it gives exactly our function:
the MaD function which I showed, with its strange definition with a covariance matrix, [inaudible]
is exactly equal to this. Exactly. So this is a completely different way to look at it. But it gives
you the exact same result.
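Written out, the construction just described looks roughly like this (the notation is a reconstruction; the paper's exact normalization may differ):

    p(F) \;\propto\; \exp\Big(-\sum_i F(x_i)^2\Big), \qquad
    G(x) \;=\; \int_{\mathcal{F}} F(x)^2\, p(F)\, dF,

where \mathcal{F} is the subspace spanned by the radial basis functions. Writing F = \sum_j \alpha_j \phi_j makes the exponent a quadratic form in the coefficients \alpha, so the integral is a Gaussian one with a closed form, and the claim is that it reproduces the MaD function exactly.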
Now, why are we very energetically trying to pursue this direction? Because once you have this
probabilistic formulation and you see that it works well, if you take a very simple probability, it
gives you a nice fit, this MaD function. We can start throwing more and more things into the
probability.
For example, we can throw negative examples into it. We can demand the fitted
function to behave nicely, to be smooth. We can add information along the normals. If we want
to take this over to recognition, we can use some prior information on the background. We can
do many, many things. This probabilistic formulation is much more powerful.
There is a slight technical problem, and I will talk about that for 20 seconds and then conclude,
which is that if you want to enhance the probability structure, you may want to add things here. The
simplest example is: suppose that you have a point which you really want to exclude from the
curve. Okay. Suppose you have something like this. Suppose [inaudible] and it really nearly
touches.
So probably whatever method you use, this area will be a problem. So you have to go to it
and you have to tell it: exclude this point, penalize this point. It's quite straightforward to do.
You will look at this expression, okay, and you will add something here. You will have plus F squared at
that point. Okay. You will penalize the probability of the function obtaining a small value at
this point. And this is how you will rid yourself of that point.
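In the same notation (again a reconstruction of what is being described), excluding a point x_neg flips the sign of its term in the exponent:

    p(F) \;\propto\; \exp\Big(-\sum_i F(x_i)^2 \;+\; F(x_{\mathrm{neg}})^2\Big),

so functions which are small at x_neg get a lower probability.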
Slight problem. Once you start adding things here, what happens? This becomes non-positive
definite. If it's non-positive definite, as we all know, you cannot compute the integral. It's like
trying to compute the integral over R2 of e^(-x^2 + 0.2 y^2).
I don't know. Whatever. And this of course diverges. You cannot compute the integral.
The solution we found, which at the beginning seemed like it would be hopeless in terms of
the numerics of computing it, is to do the thing which actually, even without this
point which I mentioned, is obviously the correct solution: not to integrate over the subspace
of all functions, the linear subspace, but to limit the coefficients of the subspace to be of unit
norm.
And if I had ten more minutes, but I really don't, I would convince you that even, as I
said, without this problem, it's the correct thing to do. And now we run into the problem of -- by
the way, this is programmed and it's running; it's not just an idea. But it can be done. It can be done
even relatively easily.
So we ran into the problem of computing the integral of a Gaussian not over the entire
space, for which we have a closed form equation, but over the unit sphere in this high dimensional
space. The good thing of course is that the unit sphere, never mind the dimension, is a compact
set. On a compact set you can integrate everything. Okay. You don't -- I mean, it doesn't matter if
the quadratic form in the Gaussian is not negative definite. You can still integrate it.
Too bad, but fortunately there's a very neat trick which allows you to compute the integral over
the unit sphere. There's no closed form expression. But you can reduce it to a one-dimensional
integral which you have to do numerically. Fortunately, it's only one dimensional.
Second, and this is very important, it involves preprocessing that you do once. And then to
compute the function at any point, you don't have to do it. You just do the preprocessing once, it
may take some time. But it only involves one-dimensional integrals.
And with today's machines to compute numerically a one-dimensional integral is not too
difficult.
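Not the one-dimensional reduction being described, but a crude Monte Carlo sketch of the kind of integral involved, just to illustrate why compactness saves you (Q here is a toy indefinite matrix):

    import numpy as np

    # Average of exp(u^T Q u) over the unit sphere, estimated by sampling uniform
    # directions. It is finite even when Q is not negative definite.
    def sphere_average(Q, n_samples=200_000, seed=0):
        rng = np.random.default_rng(seed)
        u = rng.normal(size=(n_samples, Q.shape[0]))
        u /= np.linalg.norm(u, axis=1, keepdims=True)      # uniform on the sphere
        return np.exp(np.einsum('ij,jk,ik->i', u, Q, u)).mean()

    Q = np.diag([-1.0, 0.2])       # indefinite: the integral over all of R^2 diverges
    print(sphere_average(Q))        # but over the circle it is finite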
So the idea is to carry [inaudible]. We already have some examples, but they are a bit artificial
in 2D, so I didn't bring them, in which we exclude points on the curve [inaudible]; where other
methods would glue things together, we can rid ourselves of them.
And that's it. This is just what I've been saying so far.
So, to summarize, RBF is a degenerate case of what we did, and both of them are degenerate
subcases of a much, much broader -- at least we hope it will indeed turn out to be more general --
probabilistic approach. So I've really taken too much of your time, so thank you very much.
[applause]
>> Sing Bing Kang: Any other questions?
So a quick question. You mentioned the horse as being one example, but these usually involve many
connected components.
>> Daniel Keren: Yes.
>> Sing Bing Kang: So do you want it -- is it that you separate all of them first before you
compute each of them separately, or you can just compute all at the same time, all the surfaces?
>> Daniel Keren: I'm not sure -- are you referring to the original set?
>> Sing Bing Kang: [inaudible].
>> Daniel Keren: Ah. Okay. No, no, no. The nice thing, this is done automatically. You don't
have to separate. Because this formalism contains everything. You don't have to separate and
find one function for that and one function for that. No, no. It's done automatically.
And, by the way, you can do shapes. For example, suppose that you have a surface and a wiry
thing coming out of it. You do everything together. You just throw everything into this machine
and it works. You don't have to separate. You don't have to do anything manually. Yeah.
Okay. Yeah. Sure.
>>: Earlier when you explained that to prevent a degenerate solution of all 0s, [inaudible]
additional points that were just placed off of the surface?
>> Daniel Keren: This is done in the RBF, yes.
>>: So how do you prevent the degenerate solution in your approach?
>> Daniel Keren: Ah, if, for example, you -- ah, how do we prevent it? Ah, because you're
working with that matrix, okay, then you're using the eigenvectors. And the eigenvectors have
norm 1, so they cannot be degenerate. So it cannot yield a degenerate solution.
>>: So for a single hyperplane, then a solution of all zeros would satisfy the linear equation.
>> Daniel Keren: Yes. But you don't allow that, because you enforce -- you force your solution
to include -- in this case it would be the smallest eigenvector, the eigenvector of the smallest
eigenvalue, and the eigenvectors are always taken to be of unit length.
So when we look at only the first part of our solution with one vector, it will be exactly equal to
what RBF gives you, when you enforce the condition of unit norm in the solution of RBF, this is
exactly what it will give you. It will be the same. You don't have to force it. It follows
automatically from the formalism.
>>: So there's a linear system that is solved [inaudible] the solution.
>> Daniel Keren: Actually, you don't solve any system. You just have a big matrix. And as
Rick mentioned, it does take time to prepare it. I admit to that. But you just have a matrix and
you just invert it or do an [inaudible] on it. You don't have to force this additional constraint of
unit length. Sure.
>>: One thing I'm not quite sure about is where the algorithm switches from reconstructing a
1D thing like your spiral to basically reconstructing the full surface. The place where I was
wondering that was the example of the horse, where it just [inaudible] reconstructs
all the wires where you cut through it, and also the full horse model, and sort of like how --
where was the [inaudible] control that?
>> Daniel Keren: You can control it, yes. What you're saying is I would go even to the more
extreme case. Suppose that you're going to the limiting case. Of all these -- of -- you have a
discrete bunch of points. And you can think about these -- you can think it's exactly the same
conceptually as this being the entire horse and this being the slices.
Now, if you go to the limit, if you take a number of basis functions -- not just the regular
basis functions at your points, but any family of functions whose union in the limit spans all
the functions, in the sense of a [inaudible] space -- they span all the functions.
Then eventually it will start going to infinity, as I said, I could prove it, but not now of course, it
would start to grow more and more between the points. But of course you don't want to do that
computationally and obviously it will not work.
So there's a very wide range, okay, of the number of basis functions for which it will still be small
at the points, and it's a very nice and smooth function. So it is small on the points, or on the
cross sections of the horse, and it will not increase rapidly when you move away from them.
Okay. But suppose, you know, you're looking at it from a totally abstract mathematics point of
view and say, I mean, why should I limit myself just to the radial basis functions at the points?
Why not take more and more functions -- take all the -- you know, look at the Fourier basis of all
functions from R2 to the reals.
Eventually you will get something which is too good. It's small on the points, but it goes to
infinity everywhere. But of course -- this would also be a problem if you apply this idea of
taking too many functions to RBF; you will run into the same problem. Okay. So --

>>: The same thing could be said about the comparison between the helix and the [inaudible].
>> Daniel Keren: Yeah.
>>: If you took the RBF to be more severe, then you get the helix. The surface that you get if
the function is less than a certain value, right?
>> Daniel Keren: Actually, I don't think that you'll get the helix. Because you're looking at one
function, at one single function, and looking at all the points in which it equals zero. Okay.
So -- how should I say this. I mean, you're trying to interpolate a bunch -- taking more and more
complicated functions. I'll try to convince you in ten seconds.
Suppose you're taking a bunch of points, okay, and you're trying to construct a function which is
not trivial, of course, so you enforce some norm constraint on the coefficients, the norm is 1 or
whatever, and you force it to be 0 here. And you take more and more and more, for example, Fourier coefficients.
Say I will go up to a frequency of 1 million. It will be zero at these points, but it will be zero in
many, many, many other points.
We can prove that with our solution this will not happen. This will eventually, in the limit, be small
only at the points. Of course, if you take too many, first it's absolutely not necessary. There's
obviously some kind of degeneracy in taking more functions than points. So it will not happen.
So I hope I -- so the problem will be really if you take too many functions.
In our case we did not have to tailor them because we never went above the number of points.
The number of basis functions is always bounded by the number of points. Can be smaller, but
it's never larger. So you don't run into these nasty problems of -- yeah.
>>: The classical problem of overfitting.
>> Daniel Keren: Definitely. Definitely.
>> Sing Bing Kang: We can --

>> Daniel Keren: Okay. I'll be happy to.
>> Sing Bing Kang: Let's thank our speaker once more.
[applause]