>> Eyal Ofek: Good morning. It is my pleasure to invite Simon Korman to give a talk here at Microsoft
Research. Simon is finishing his Ph.D. at Tel-Aviv University under the advisement of Professor Shai
Avidan. He was also doing an internship here at Microsoft Research, so Simon.
>> Simon Korman: Okay, thanks very much for having me. This is the topic of the talk today Inverting
RANSAC. This is recent work that will appear in CVPR, joint work with Roee Litman, Alex Bronstein, and
Shai Avidan.
Before I get into the details here, I’d like to give some context and just say something about other things I’ve
been doing during my Ph.D. The problem I first started looking into is computing an Approximate
Nearest Neighbor Field between a pair of images. That is, you need to find a dense mapping between
the entire set of patches of one image and the entire set of patches of the other image.
This is a problem of high complexity because the number of patches in each image is like the number of
pixels. You need to do some approximation here. You probably know of the PatchMatch method that
came out in two thousand and nine, which really made this task possible in a couple of seconds, I’d say.
We came up with this approach, Coherency Sensitive Hashing, where we incorporate a hashing scheme into
this problem. We managed to find a matching which is more accurate and faster than what PatchMatch
can do.
Later on, and this is joint work with Eyal here, we looked into the same problem, but when we have the
additional information of a depth channel for the image. We asked ourselves whether we can do
better, so again just an RGB matching between patches, if we’re given this additional depth information.
It turned out that the answer is yes. We have a way of using the depth information to rectify the
patches in some way and reduce the problem to one similar to the original. As an application we
focused here on single image de-noising. We showed that when we use patch based de-noising methods
that are fed with our patch matches, we get results that are significantly better than the state of the art
in single image de-noising. Compared to methods that do not use depth, we really managed to show
that using depth makes a difference here.
Now, this is what I think of as the main line of my work. You could put this under the frame of Global
Model Detection. These are four different problems; I’ll just go through them quickly. The first one is
about template matching under affine transformations. The second one is about finding global rigid
symmetries of 3D shapes.
The next one, and this is the topic of today’s talk, is about registering a pair of images given a set of
matching interest points which contain noise and outliers. Finally, there’s this work here about depth
extension, which is done by designing a special kind of template matching that is specific to that task;
this is still work in progress.
What is common to all these problems, or to the way that we tackle them, is that there is some very large
search space in which we want to find a model. What we show in different ways here is that if you have
an understanding of the error function, the way it changes across the search space, you can do a kind
of efficient exhaustive search that gives you global guarantees about the result you obtain.
Let me get into the work. I like to start with this very simple canonical example in robust estimation,
which is fitting a line to a set of points. In this case the line is the model. Typically what you have is a
set of inlier points, these are the red points, which have some amount of noise around the model. In
addition you have outlier points, these are the blue ones here. If we count the number of inliers in this
example, we can say that the inlier rate, which we mark by p star, is in this case eight out of seventeen.
There are eight inliers out of a total of seventeen points.
Now what would be the standard way to find such a model behind a set of points? You would solve the
problem of consensus set maximization, and the standard way to do that is RANSAC. You’re looking for
a model that has the highest support in the data. In this case what does RANSAC do? It’s an iterative
method. At each iteration it picks at random a minimal subset of the data that can generate a model.
In this case it picks two points at random, looks at the line that goes through them, and measures the
amount of support. How does it do that? It’s given an inlier threshold, which we mark by r star, and it
counts the number of points within that amount of error.
Okay, so this is not a very successful choice; it gives only two inliers. But repeating this process you can
get better lines, like four inliers here. For instance for this line the two points that were chosen were
actually inliers, but the line that went through them is still not perfect; it’s close to the true model but
not there yet. This one gives five inliers. Eventually you might find this line that explains the complete
set of inliers in this example.
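As a rough sketch of the loop just described, here is a minimal vanilla-RANSAC line fit in Python; the function name, the fixed iteration count, and the (point, unit normal) line representation are illustrative assumptions rather than the exact procedure shown on the slide.

import numpy as np

def ransac_line(points, r_star, n_iters=1000, seed=0):
    # points: (N, 2) array; r_star: inlier threshold (distance to the line).
    rng = np.random.default_rng(seed)
    best_line, best_support = None, -1
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        norm = np.hypot(d[0], d[1])
        if norm == 0:
            continue                              # degenerate sample, skip
        n = np.array([-d[1], d[0]]) / norm        # unit normal of the candidate line
        dists = np.abs((points - p) @ n)          # point-to-line distances
        support = int((dists <= r_star).sum())    # consensus for this candidate
        if support > best_support:
            best_line, best_support = (p, n), support
    return best_line, best_support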
Now our method works in a very different way. We call it GMD, Global Model Detection. What we start
with, as I said or suggested before, is a sample of the complete space of solutions. In this case we’re
looking at a set of lines; for instance we take this sampling of horizontal lines, and we look at different
orientations.
Once we have this sample of lines, the second step is to directly estimate the inlier rate p star, okay.
Now assume we managed to do that. We somehow know now that p star is eight out of seventeen. We
can go to the third step, which is to find the best line for this specific inlier rate p star. We can look at
each of the lines in our set. For instance for this line here we look at the eighth closest point; it gives
some amount of error. We can measure this over the entire set. I will show that if we were to return
the best line that we found, we have a guarantee on the error that it gives compared to the best line in
the continuous space.
This is something we can do. But we go further and we can employ a branch and bound scheme here.
Again, using these exact guarantees we can get as good an approximation as we want; we can actually
find the optimal line for this specific inlier rate p star. This is just an example of another line. Finally we
will find this line, which has this error, which is optimal for this p star equal to eight out of seventeen.
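A minimal sketch of this third step for the line example, assuming each candidate line is given as a (point on the line, unit normal) pair; the helper names are hypothetical.

import numpy as np

def kth_error(line, points, k):
    # Error of a candidate line for a fixed inlier count k:
    # the distance of the k-th closest point to the line.
    p, n = line
    dists = np.abs((points - p) @ n)
    return np.partition(dists, k - 1)[k - 1]

def best_line_for_inlier_count(candidate_lines, points, k):
    # Among the sampled lines, pick the one minimizing the k-th error.
    errors = [kth_error(line, points, k) for line in candidate_lines]
    j = int(np.argmin(errors))
    return candidate_lines[j], errors[j]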
I will talk about the guarantees that allow us to do this. But the main part of this work is actually going
back to step two, which is: how do we figure out what the inlier rate p star is? Getting back to step two,
we noticed this very interesting phenomenon. If we look at the best line for the true p star, what we
notice is that the error it obtains is very rare. What do I mean by that? I mean that if we look at
different lines in the space, it turns out that there are very few lines, or rather the portion of lines that
have an error similar to this one is extremely small.
Okay, so you can imagine that if you take different lines, different orientations or locations, there would
be very few lines that get this kind of error for the eighth closest point. But what if we had a wrong
estimate of p star? For instance, if we had an under-estimate like five out of seventeen. This is actually
the best possible line for the fifth point. But here it turns out that there are many other lines, like these
ones here, that have an error that is very close to the optimal one that I just showed.
These lines are close to the real model, but actually it might even happen that a line like this one, which
is very far, can also have an error very close to the optimal one. If you take an over-estimate like eleven
inliers you see the same behavior. This is maybe the best line for eleven, but there are many lines for
eleven points that would have a similar error.
What we get overall, and this is just an illustration, but this is the behavior we see: if we look on the x
axis at different possible inlier rates, then for each one we count the number of what we call good lines,
or good models. Good lines are ones that have an error that is not much worse than the best one
possible for that specific inlier rate.
We see this kind of behavior: this curve has a clear minimum at the true inlier rate p star. We call this
our IRE measure; I’ll define it more precisely. We basically compute it for different percentiles and take
p star to be the one with the minimum value.
>>: To compute the inlier rate for a given line you find the n closest points? Like you find the eight
closest points and then measure the sum of squared differences or something?
>> Simon Korman: We actually work with the distance of the eighth closest point, not the average or
anything like that. We just take the specific percentile of errors with respect to the model. In this case,
this simple case, it would be the distance of the eighth closest point.
>>: The distance of the eighth closest point, okay, right, okay.
>> Simon Korman: Right, yeah.
>>: You basically for a given candidate line you can just tabulate all of the distances. In essence sort
them if you need to. But maybe you don’t need to sort.
>> Simon Korman: Exactly.
>>: Just basically compute what the thing is, okay, I get it.
>> Simon Korman: Right, that’s exactly what we do. Yes?
>>: Is this curve very sensitive to how densely you’re sampling your candidate lines?
>> Simon Korman: I’ll get to that. We have a theoretical analysis of this phenomenon; we give conditions
under which we can prove that it actually happens. But in fact it’s so strong that it happens even with a
very coarse sampling of the space. I’ll show real examples. We saw this behavior across different kinds
of data and transformations. It happens; it’s a very strong thing.
Getting back to the problem that we’re really looking into: we have a pair of images, and our model is
some transformation that maps the first image to the second one. In this case we’re going to work with
homographies. We’re given a set of point matches, that is, pairs of matching interest points between
the two images. Each match m is a pair of points: q one in image one and q two in image two.
In this example, so you can see the true transformation. That’s the one that maps image one to this
pink quadrilateral. We colored the matching points. The blue ones are inliers. The red ones are
outliers. This is a real example that we obtained by matching SIFT points.
Okay, and just as before, where we had a point and a line and the distance was the error, in this case we
define the error of a match m with respect to a transformation t. This is given down here below: if we
have the match, q one and q two, and some transformation t, the error is exactly this distance in the
second image. This is the standard thing that RANSAC would also work with, okay.
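In symbols, assuming the Euclidean distance in the second image, the error just described is:

err(m, t) = \| t(q_1) - q_2 \|, \quad m = (q_1, q_2)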
Now if we look at the ground truth transformation, we can look at each and every one of the errors of
the matches that we have. Say we have two hundred matching pairs here, and say we sort these errors
from the smallest to the largest. It’s very interesting to look at the distribution of these errors. This is
what we do; this is just an illustration.
But we plot here a curve which is the CDF of this distribution, starting with match errors that are close
to zero down here, and the higher you let the match error go, the more matches you accumulate, up to
a hundred percent of them up here, okay. This is also a typical shape for this CDF.
There would be very few perfect matches, since even the inliers suffer from some noise. Usually you do
see at a certain point this kind of plateau in the CDF, which tells you that even if you let the match error
grow significantly you won’t find any more matches. This is usually the area of the true inlier rate.
Just to compare RANSAC to our approach using this illustration: what RANSAC does is it first assumes,
or is usually given, an inlier threshold. This is this point down here. Then it tries to find a transformation
that maximizes the inlier rate with respect to this specific threshold, that is, a transformation with as
many matches as possible with up to this error. If it succeeds it will find a point that’s at least at the
height of this CDF. You can run RANSAC with different thresholds; if you’re more generous and give it a
higher threshold it will find a higher number of inliers.
Now our approach really starts, as I said, with estimating the percentile of matches of interest. Then we
find a transformation that minimizes the error with respect to that specific percentile of matches, okay.
Now I’m back to this real result. This is what we get. Down here at the bottom, this blue curve is exactly
the IRE measure that I was talking about; the y axis is shown in log scale. It’s actually a very sharp
minimum here. In this example we find a minimum at thirty-seven percent of the matches. Once we
have that, this is our search scheme for thirty-seven percent, and it finds a transformation with this
amount of error. You can see here also the RANSAC results on this example. Okay, so this is basically
the idea.
Again, to talk about some prior work: most methods tackle the Consensus Set Maximization problem,
which is, given a definition of what it is to be an inlier, to find a model with maximal consensus, and
RANSAC is the main technique used here. Starting from the original scheme, these are only a few of the
many improvements and extensions there have been over the years, which actually made it much more
robust and made it run faster with various very good heuristics.
We will be looking at this recent work from PAMI two thousand thirteen called Universal RANSAC
(USAC), which incorporates all the good extensions that have been proposed over the years. It also
comes with very efficient source code, so we looked at that.
I should mention that for this Consensus Set Maximization problem there are also global optimization
techniques. These are methods that actually guarantee that you find the globally best transformation,
the one that achieves the real maximum number of inliers for a given threshold. These methods are
interesting, but as you go higher up in search space size, like with homographies, they’re not very
practical.
Now let me get into our method. Basically we have these three steps: we sample the homography
space; using that sample we estimate the inlier rate p star; and finally, having p star, we can find the best
transformation for that specific p star.
How do we sample homography space? I’ll start with some definitions. We’ve already seen the error of
a match m with respect to a transformation t; as I showed before, it is this distance in the target image.
Because we’re now going to sample the space of transformations, the space of homographies, we need
to define a distance between homographies, or transformations.
What we use is this l infinity kind of distance, where if we have transformations t one and t two, we go
over all the points in image one and the distance is the maximal difference in the target locations in
image two. Okay, so this is an l infinity kind of distance between transformations.
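In symbols, with the maximum taken over all points p of image one:

d(t_1, t_2) = \max_{p \in I_1} \| t_1(p) - t_2(p) \|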
One more thing, this is the Sampson error. It’s the standard way of measuring the following: when you
come up with some transformation, you don’t only want to see how many inliers you have; you
sometimes want to see how close it is to the ground truth transformation. The Sampson error is the
standard way of measuring this.
What you do is you take a large set of points, or maybe the entire set of points in image one, you map
them with both the ground truth and your transformation t, and you take the average distance between
the target locations in the second image. This is the Sampson error.
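As described here, this comes down to an average transfer distance over a large set P of points from image one (the exact formula used in the paper may differ):

E(t, t^*) = \frac{1}{|P|} \sum_{p \in P} \| t(p) - t^*(p) \|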
Okay, so…
>>: Why not choose the maximum just like before?
>> Simon Korman: The maximum kind of won’t tell you the whole story of what the difference is. The
maximum we use because we are going to use it to bound how much the error can change; it’s a worst
case, to give worst-case bounds on the change in the error. But in the end you’d like to know, over the
whole image, how close you are to the ground truth.
Now, how do we sample the homography space? We build this sampling that we call S epsilon, with a
precision parameter epsilon. We do it as follows: if we’re looking at images I one and I two, we impose a
two dimensional grid of points over the space of image two, where the diagonal distance is epsilon. The
transformations that we allow are exactly the homographies that map the four corners of image I one
onto four points on this grid.
For example, you can see this green quadrilateral is a legal one because all its corners are on grid points.
On the other hand, these two red ones are not, because they each have at least one corner that is not on
a grid point. But there are transformations in our sample that are very close to these ones that are not
in the sample.
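A minimal sketch of constructing such a sampling, assuming the grid covers only image two (no margin) and omitting the scale limiting mentioned later; cv2.getPerspectiveTransform is just one convenient way to solve for the homography from the four corner correspondences, and the area check is a crude stand-in for filtering out degenerate corner choices.

import itertools
import numpy as np
import cv2  # used only for getPerspectiveTransform; a direct DLT solve would do as well

def grid_points(w2, h2, eps):
    # 2D grid over image 2 with diagonal spacing eps (so step = eps / sqrt(2)).
    step = eps / np.sqrt(2)
    xs = np.arange(0, w2 + step, step)
    ys = np.arange(0, h2 + step, step)
    return [np.float32([x, y]) for x in xs for y in ys]

def quad_area(corners):
    # Shoelace area, used to skip (near-)degenerate corner configurations.
    x, y = corners[:, 0], corners[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def sample_homographies(w1, h1, w2, h2, eps, min_area=1.0):
    # S_eps sketch: homographies mapping the four corners of image 1 onto
    # grid points of image 2.  The number of candidates grows as (#grid points)^4.
    corners1 = np.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]])
    pts = grid_points(w2, h2, eps)
    for choice in itertools.product(pts, repeat=4):
        corners2 = np.float32(choice)
        if quad_area(corners2) < min_area:
            continue
        yield cv2.getPerspectiveTransform(corners1, corners2)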
Okay, so here are two. Formally, what we get is an epsilon cover of the space of homographies. It
means that any homography in the space has an epsilon-close sample in this set that we constructed.
Okay, and the main point that we get from this covering is the following: say we look at a single match
between the two images and at some transformation t.
We ask ourselves what happens when you change the transformation to some neighboring
transformation, some other transformation whose corners are at most epsilon away. What you easily
get is that the error of the match, when you move between the two transformations, cannot change by
more than epsilon. This is kind of trivial, but it’s very important.
Furthermore, if we don’t only look at one match but at all the matches, we can look at different statistics
of the matches, like the average over all the matches, a certain percentile, or the mean of a certain
quantile. All these measures will also change by at most epsilon when you move between neighboring
samples. These facts are very important for us in order to give the guarantees on the error and to allow
a branch and bound scheme that is, again, fully guaranteed.
Okay, one other thing, maybe a nice thing about this sampling, is that it’s rather uniform in the space of
homographies. When I say uniform I mean uniform with respect to this distance between
transformations that I defined. You can also measure the cover radius and the packing radius of this
cover, and they are not very far apart. This is important, for instance, because we have this IRE measure
that we would like to measure in the continuous space; since we do this through the sampling, we would
like the sampling to be kind of uniform in the space, to represent what happens in the continuous space.
Okay, so that was the sampling. Now, technically, how do we use this to estimate the inlier rate? We’re
given a list of matches M, and we constructed this sampling of transformations S epsilon.
>>: You construct the complete, if there are m by n grid points, right. M, n grid points you construct the
two to the m, n potential homographies?
>> Simon Korman: Yes, but again the idea will be to start with a very coarse sampling.
>>: Okay.
>> Simon Korman: Yeah, it does go to the power of eight for homographies like you have to choose
each of the four corners…
>>: Right, so it’s eight to the, oh, right, okay.
>> Simon Korman: Yeah, yeah, it’s roughly the number of grid points to the fourth. Not exactly, I mean,
you want to make sure that it’s a valid homography, and we kind of limit the scale of the homography.
But after that it grows.
>>: Yep, okay.
>> Simon Korman: Yeah, so we build this two dimensional error matrix E where the entry (i, j) is just the
error of the match m i under the transformation t j. Okay, so you can think of maybe a couple of
hundred matches typically, and many thousands, tens or hundreds of thousands, of transformations.
We compute this matrix. Again, each row corresponds to a match, each column to a transformation.
The first thing we do is, for each column, sort the errors in ascending order. What we get as a result is
that if we look at a certain percentile, a row at a certain percentile p, we get the p-th match-error for
each of the different transformations.
Having that, for each such row of errors we can find the minimal error. We store this in a vector r min.
R min holds, for each percentile, the best possible error for that percentile over all the different
transformations.
Now we can count, in each such row, how many of the errors are at most epsilon larger than the smallest
error we found, that is, how many of the entries are at most epsilon larger than the minimal one we
stored in r min. This is the IRE measure that I was talking about, that blue curve that we saw. We simply
find the minimum of this vector, and the corresponding row is the percentile p star that we choose.
Okay, so it’s pretty simple.
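A minimal sketch of this estimation step, assuming matches are given as pairs of 2D points and transformations as 3-by-3 homography matrices; the names and return values are illustrative.

import numpy as np

def apply_h(H, q):
    # Apply a 3x3 homography to a 2D point.
    v = H @ np.array([q[0], q[1], 1.0])
    return v[:2] / v[2]

def estimate_inlier_rate(matches, homographies, eps):
    # E[i, j] = error of match i under homography j.
    E = np.array([[np.linalg.norm(apply_h(H, q1) - q2) for H in homographies]
                  for (q1, q2) in matches])
    E.sort(axis=0)                     # sort each column's errors in ascending order
    r_min = E.min(axis=1)              # best p-th error over all transformations, per row
    # IRE(p): how many transformations have a p-th error within eps of the best one.
    ire = (E <= r_min[:, None] + eps).sum(axis=1)
    k_star = int(np.argmin(ire))       # row (percentile index) with the fewest "good" transformations
    p_star = (k_star + 1) / len(matches)
    return p_star, k_star, r_min, ire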
Now, once we did this we can continue to find the best transformation for p star. Starting from the same
point here, and giving a little more detail: this is the row that we’re working with. Say this was the best
error we found, so we stored it here in r min. Say in this example there were three errors, one, two, and
three, which were at most epsilon worse than the best one. I marked with a red cross all the errors that
were more than epsilon higher than the optimal one.
What can we do at this stage? Again, we have to keep in mind the construction of the sampling that we
had. One thing is that if we just want to return the transformation that gave the best error, we know
that its error is at most epsilon higher than the optimal error in the continuous space. But what we
actually do is employ a branch and bound scheme here. To improve the approximation we would like to
move to a finer sampling, S epsilon over two.
What we can safely do is discard any of these red transformations from the search space, because from
the sampling we know that even if we open finer resolutions around such a transformation, the error
cannot change by more than epsilon; therefore it cannot become even as good as the error that we
already have. So we can focus on these three surviving transformations and open the finer sampling
only around those. We can repeat this in a branch and bound manner. Okay, so this is the idea.
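A sketch of one pruning and refinement step under these guarantees; refine is a hypothetical helper that enumerates the epsilon-over-two grid refinements around a surviving transformation.

def branch_and_bound_step(transforms, kth_errors, eps, refine):
    # kth_errors[j] is the p*-th error of transforms[j] at the current resolution.
    r_best = min(kth_errors)
    # A transformation more than eps above r_best cannot drop below r_best
    # within its eps-cell, so it is safe to discard it.
    survivors = [t for t, e in zip(transforms, kth_errors) if e <= r_best + eps]
    # Expand only the survivors at the finer resolution.
    return [t2 for t in survivors for t2 in refine(t, eps / 2)]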
Now I’d like to relate back again to this Inlier Rate Estimation measure. Really the two main insights
were: first, we claim that this measure obtains a minimum at the true inlier rate p star; second, a very
interesting and practical fact, as we just saw, is that the branch and bound scheme works best and most
efficiently when it is given the true inlier rate p star.
Just going back here, as a reminder we chose the row to work with, the one with the minimal value in
this vector. Our IRE measure is actually a count of the number of surviving transformations for the
branch and bound scheme. If we look at this real example again, and this is log scale, if we had chosen
wrongly and worked not with this minimum point but with a different point, the number of
transformations that we would have to open in the branch and bound scheme would be much larger.
It’s kind of a lucky situation here.
A lot of the paper tries to deal with understanding this minimum phenomenon. We try to analyze this
behavior and give some kind of explanation of why it happens. We did this under several different
assumptions, and I guess there’s a very large gap between the setting in which we’re able to prove that
this works and what we actually see in our experiments. We believe that it’s a much more general thing.
But at the moment the proof is quite difficult even in a very limited setting. I’ll just tell you about the
basic idea. We move to continuous spaces of transformations and matches, and we use a generative
process for creating matches, under which we analyze this measure.
What is this generative model? It’s, I think, the standard way of thinking about this. How do we
generate a single match m, which is a pair of points? In the background we have the ground truth
transformation t star and some inlier rate p star. We’re also working with an amount of noise r star
which we allow the inlier matches to suffer from; the inlier matches also have some amount of noise, of
magnitude r star.
For the moment assume we take just an arbitrary point q one in the first image. We can look at the
target location in the second image; this would be where a perfect inlier point q two would be. But
what we actually do is we open a radius of r star around it. With probability p star we generate the
point q two inside this circle, as an inlier. With probability one minus p star it will be in the rest of the
image.
This is the basic model. It’s kind of the standard way that, say, RANSAC papers generate synthetic
examples. This is what we have. What we get is this kind of probability distribution in image two over
where the location of the point q two will be.
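A sketch of this generative process, with the simplifying assumption that an outlier is drawn uniformly over the whole of image two rather than strictly outside the inlier disc.

import numpy as np

def generate_matches(n, t_star, w1, h1, w2, h2, p_star, r_star, seed=0):
    # t_star: callable mapping an image-1 point to its true image-2 location.
    rng = np.random.default_rng(seed)
    matches = []
    for _ in range(n):
        q1 = rng.uniform([0.0, 0.0], [w1, h1])
        if rng.random() < p_star:                   # inlier
            r = r_star * np.sqrt(rng.random())      # uniform over the disc of radius r_star
            theta = rng.uniform(0.0, 2.0 * np.pi)
            q2 = t_star(q1) + r * np.array([np.cos(theta), np.sin(theta)])
        else:                                       # outlier
            q2 = rng.uniform([0.0, 0.0], [w2, h2])
        matches.append((q1, q2))
    return matches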
Then if we look at any transformation t, wherever it maps q one to, we can look at a radius r around that
point. What we get here is again that if the point q two is within this red circle, it means that the error
of the match is at most r; if it’s outside, the error of the match is more than r.
We actually look at probabilities here. We can look at the probability of the error of the match being at
most r. This is exactly the amount of the distribution in image I two that this red circle captures. This is
what we’ll be doing: in our analysis, when we count transformations, what we actually count is such red
circles of a certain radius that capture a certain amount of the distribution.
This is how we analyze the measure. Again, the radii here correspond to match error thresholds, and
probabilities correspond to percentiles: if you generate many matches, what happens with a certain
probability will happen at a certain percentile of the matches.
The limiting assumptions we make here are first of all regarding the noise of both inliers and outliers.
An inlier is mapped into the green circle with uniform distribution; an outlier is mapped to the rest of
the image, again uniformly. Regarding the inliers, we think we could instead use a Gaussian distribution
around the center point, which is maybe more realistic.
What this allows us to do is, when we look at any kind of area in the second image, to calculate the
probability of a point landing in that area; we just have to measure areas inside the circle and outside
the circle to get the probability. Also, we assume only 2D translation, because this allows us not to
assume anything about the distribution of the point q one, and to focus only on what happens in the
second image.
What we end up getting is these equations that tie the IRE measure, which we call here V, to the
different parameters r star, p star, epsilon, and p. What we do is we differentiate this function around p
star to show that under certain conditions a minimum does exist. We can see here this D-shaped area in
the space of p star and r star, the inlier rate and the inlier noise; in this whole D-shaped area we show
that the minimum must exist under these assumptions. This is the theoretical analysis; there are many
details in the paper.
Now I’d like to move to some of the experiments that we did. This is one of the experiments that tries
to validate the IRE measure. What I’m showing here are two extreme cases of this experiment. It’s a
synthetic experiment; we generated point matches exactly according to the model that we defined.
What you can see here are images I one and I two. This is a very extreme case in terms of the inlier rate,
which is very low: p star is only one percent of the matches. Only the red matches here are inliers; the
rest are random outliers. The true model is the one that takes I one to this black parallelogram. Here
are the IRE measures that we obtain for this example. You can see that for different values of epsilon
the minimum happens to be quite close to one percent, which is the true inlier rate.
Again, this is another extreme example, this one in the sense of the amount of noise that the inliers
have. Eight percent of the image size which is quite large. You can see here again that the different
curves of the IRE measure obtain a minimum around the true inlier rate, which is sixteen percent in this
example. Yeah?
>>: Just to clarify, to see if I understand. In the example on the left, since p star is one percent, you were
kind of bound to take a low r star because of the D shape, right?
>> Simon Korman: Yeah.
>>: Otherwise it just wouldn’t work, right?
>> Simon Korman: Right, so these are two extreme cases. This one in terms of p star and this one in
terms of r star. If you take them both to be extremely low, we’ll see real examples that are in that
direction, but, yeah.
>>: Okay and can I ask one more question?
>> Simon Korman: Yeah.
>>: You kind of alluded to the fact that you start with a pretty coarse epsilon. I can assume that you then
do something hierarchical, or somehow you know…
>> Simon Korman: Yeah.
>>: Use sequential epsilons which are lower in specific areas of the solution space. But I don’t see how
you do it.
>> Simon Korman: Again, the branch and bound scheme is only in the second stage for finding the
transformation. Basically the estimation of the…
>>: P star.
>> Simon Korman: Inlier rate p star is from the, we usually use…
>>: [inaudible]
>> Simon Korman: Initial epsilon. Practically, in some cases, if we see that there isn’t a clear minimum,
we can try again with a finer epsilon. But we can’t refine epsilon too much because the sample grows a
lot.
>>: Epsilon is the grid-spacing parameter, in other words?
>> Simon Korman: In this case epsilon is not the sampling parameter, but the threshold for what we
defined as a good transformation: one that has at most the optimal error plus epsilon. This is a synthetic
example, so we could still sample it very finely because it’s 2D affine.
But I’ll show the real results now. For homographies we worked with a very coarse epsilon. Now to
move to the real data: we use these two datasets, the Mikolajczyk dataset from Oxford and the pairs of
images that come with the USAC method.
To talk about the runtime, it really depends on how well the branch and bound scheme works. But
typically I’d say it’s around five to ten seconds for an image pair. We haven’t done much to improve
this; it’s quite a basic implementation. But again, the advantage is that it works on the full range of
possible inlier rates and amounts of noise that the inliers suffer from, unlike RANSAC. So this is what we
have.
Specifically for homographies, we use an initial resolution of epsilon of around thirty percent of the
dimension of image two, which is extremely coarse. But still we’re able to find the inlier rate correctly in
most cases, even with this very sparse resolution.
>>: That’s a three by three grid, or is it larger than that?
>> Simon Korman: It’s a bit more than three by three because we also have to take points outside the
image because of scale. But say maybe five by five, six by six. But it still gives a huge number of
transformations because it’s homographies, but, yeah.
We did make some very small changes to try to make things more efficient. When you have a branch
and bound scheme there are many ways to run it correctly and efficiently, and some heuristics can
actually help you better manage the way the tree expands. We added here a very degenerate kind of
DFS search, where we take the best current transformation and just drill down locally, hoping it will
improve the bound a little when we move to the next stage.
There are better ways of doing this, but this is what we did at this stage. Also, once we find the final
transformation we can do a Re-weighted Least Squares optimization on the final set of inliers, because
even when you find the set of inliers you would like to concentrate on the inliers that have a smaller
amount of noise. This is something that Lo-RANSAC does, for instance, so we have this as an option.
Let’s see some of the results. These are real images from the USAC dataset. We have three pairs of
images here. For each, in the middle column, you can see the inlier rate measure; again, the y axis is
log scale. You see the minimum that we find here: forty-six percent inliers, thirty-six percent, and in this
case seventeen percent. Here you see the results overlaid on the ground truth CDF. You can see
RANSAC, USAC, and our result.
>>: How do you interpret these green curves with the red dashes when you say RANSAC results? These
are RANSAC results for different inlier thresholds?
>> Simon Korman: Right, yeah.
>>: What’s, how do you tell that the algorithm is doing well? I mean these are just numbers of matches.
Presumably the homography that’s estimated could be very similar between twenty-five matches and
thirty-seven matches, right?
>> Simon Korman: Right, right, so…
>>: What’s significant here?
>> Simon Korman: It is maybe quite confusing, because even when you run RANSAC here with its
standard threshold for SIFT, two pixels of error, we find this amount of inliers, and if you use thirty pixels
of error you find this amount of inliers. But actually most of these searches will find maybe the same
transformation or a similar one.
>>: Okay.
>> Simon Korman: You just count things here in a different way. This doesn’t really tell you how good
the result is; it’s just an interesting way of getting an idea, compared to the ground truth CDF. But what
I will show on the other dataset is what the Sampson errors look like.
>>: Okay and what are the typical runtimes for you know the recommended USAC algorithm?
>> Simon Korman: For USAC there’s no one runtime I could give, because they have these variants; you
know, they have PROSAC, which is one improvement, and Lo-RANSAC. In some cases standard RANSAC
would have to evaluate, say, for a certain inlier rate, fifty thousand transformations to get a set of
inliers.
>>: Right.
>> Simon Korman: But they have these heuristics where in some cases, like PROSAC, you rank the
matches by their quality.
>>: Right.
>> Simon Korman: Sometimes they evaluate ten transformations and find a perfect one. It can take
fractions of a second, milliseconds, to find the transformation.
>>: Okay.
>> Simon Korman: But in some cases, especially when the noise is higher and the inlier rate is lower, it
can take much longer and even fail. That’s what we see.
The more interesting dataset I think is the Mikolajczyk one, because you have these five homography
sequences. Each sequence has frames one to six, so you can kind of see, as the difficulty increases, to
what stage the methods still work. This for example is frames one and two from one of the sequences.
These sequences are mostly viewpoint change. From frame one to two there’s a slight viewpoint
change. You can see the ground truth map; it’s the dashed thick green quadrilateral. The pink
quadrilateral is our result overlaid on that. You can see that it looks right in this example. But note that
in this example all the matches, I think a hundred percent of them, are inliers, because SIFT just worked
very well here. This was very easy.
But as you go forward in this sequence, I hope you can see that the percentage of inliers drops
drastically. What you can’t see is that even the inliers that you still have here also carry a considerable
amount of noise; they’re not localized exactly. We were kind of surprised, because people think of SIFT
as if you could control how large the localization error can be, but we found that it’s much more than
what people usually work with.
These are some other examples: frames one and four from three of the different sequences, and again
the same kind of results that I showed before. Now if I move from frames one and four to frames one
and five, these are frames one and five, you can see that the inlier rates drop here. In this example it’s
only thirteen percent inliers.
These are frames one and six; actually this is one of the only examples where we totally failed. This is
due to the fact that there are only two correct matching SIFT points in this example. SIFT totally broke
down here, so we couldn’t do much.
Really looking at the numbers: these are the Sampson errors of both our method, GMD, and USAC, over
the five different sequences and, in each sequence, the five different image pair combinations. You can
see here the Sampson errors, which are in pixels. If you look carefully, in most cases I think the methods
are comparable; they do quite well. I think even the ground truth transformation maybe doesn’t have
sub-pixel accuracy, so being around one pixel of error I think is just fine.
But the interesting thing I think is looking at these two sequences where at a certain stage USAC fails
earlier than we do. In these two examples we still manage to find a transformation which is not too bad,
compared to their failure. So let’s just look into these two examples.
Now these are the two sequences, graffiti and graffiti five. We compare the two methods in terms of the
inlier rate that they find and the amount of noise that these inliers have. This is just an example, but in
the two cases where they failed and we still succeeded, you can see that they’re very extreme, both in
terms of p star, which is only twelve percent here and nine percent there, and also in terms of the inlier
noise, which is related to r star. This r star is also very high, eight point five with a standard deviation of
fourteen, and similarly in the other example.
Down at the bottom here we can see, for these two sequences, the ground truth CDFs of the errors with
our result overlaid on each one of them. The red one is frames one and two, then frames one and
three, and so on. The two difficult examples are this one down here and this one here, where you see
from the CDF that there are very few matches that have an error below, say, even ten pixels. These
examples are really very challenging, and we still manage to get a result.
>>: [inaudible] the use of epsilon was higher than that error?
>> Simon Korman: Yeah, we ran USAC with different steps of thresholds, up to, I think, forty or fifty. But
the problem with USAC, and again, I think it’s a mistake in the basic analysis of RANSAC, is that you
assume there is a certain amount of inliers and your only goal is to manage to randomly pick a set of
points that are purely inliers.
But what happens, and I showed an example before, is that the set of inliers that you pick still has some
amount of noise, and you’re not guaranteed, using that, to find the true model. If the amount of noise
in the inliers is high you won’t always find the true model. I think this is what happens in these two
examples here.
>>: Right, so you don’t have to find a minimal set. You can just find the inliers and then do a least
squares or weighted least squares.
>> Simon Korman: Yeah, yeah, that’s what we do. Okay, so I think I’m done with that. I’d like to say a
few words about the Fast-Match work and how different it is compared to this one, just in a couple of
words, because I think it’s interesting. There the problem looks very similar, because you’re trying to
search in transformation space, mapping the first image into the second one.
We worked with affine transformations, but we actually generalized to homographies, just like we have
here. It looks very similar, but what is very different is the error function that you work with. This here
was a purely geometric problem, because you work only with the locations of the matches.
In template matching you work with the pixel intensities themselves. We work with the sum of absolute
differences when you map the template into the image: you take each pixel and look at the difference
to the target pixel it’s mapped to.
Again we have this idea of sampling the space. But things are much simpler, because we didn’t handle
the outlier problem that we had here; we consider all the pixels in the template as inliers, so we’re
working with everything. In that sense it’s much easier than this problem.
But on the other hand, with the sampling we worked with here, the main point was that when you move
between neighboring samples the error does not change by more than epsilon. That was trivial to get
here. But when you do that with a sum of absolute differences it’s not true: even if you change your
transformation by one pixel the error can go anywhere. We don’t have that guarantee.
But what we did manage to show there is that when you change your transformation, the error actually
cannot change arbitrarily; it can change as a function of how much the transformation changed, and
also depending on the smoothness of the template. It turns out that the smoother the template, or the
image, is, the smaller the change in the error; the more textured the template is, the faster the error can
change.
We managed to prove exact bounds on this relation, on how much the error can change as a function of
the change in transformation and the smoothness of the template. Based on that, we can determine
the exact density of the sample of transformations that we use, and we have the same kind of
guarantees there.
Again, we do the same branch and bound scheme, and we manage to get results. I’ll just skip these,
sorry. These are the kind of results that we get. This is an image: we extract a random template and
then search for it back inside the image. It works very well. These are examples with a rather large
template with respect to the image.
But we looked at templates of different sizes, going down to very small template dimensions. What we
show, comparing our method to an interest point based method, like a SIFT based matching method, is
that as the templates get smaller, feature based methods just break down.
When you work with small templates, like these, which are twenty-five percent of the image size,
they’re just shown larger here, but this is actually where they come from: this is twenty-five percent in
these examples, then fifteen percent, and when we go down even to ten percent of the image, you can
see that when you take templates of this size it’s almost impossible to think of trying to match feature
points between the two images.
Actually, I don’t know of any other way to tackle this problem, especially as the templates get smaller. I
think it’s a very effective method for doing that.
>>: What about scale between the template and the image?
>> Simon Korman: Scaling?
>>: First you have to assume that there is scaling and how far can you…
>> Simon Korman: Again, we worked in this setting with affine transformations. We limit the scale to,
say, up to times three and one over three. Yeah, you have to limit your search space in some way, but
still it manages to do things quite quickly.
Here you can see one more special thing that we use here. Instead of computing the full error when you
map the template into each of the thousands of locations in the image, because you’re willing to suffer
this approximation of, say, epsilon, you can choose only a random set of pixels, instead of mapping the
entire set of pixels, and evaluate the error only on those, and still get the same kind of guarantee.
We actually look at a constant number of pixels; it doesn’t depend on the size of the template, and we
evaluate the error only on those specific pixels. You can see these pixels here. This, together with the
branch and bound, really enables speeding up the whole thing. It works quite nicely.
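A sketch of this sampled error evaluation, assuming grayscale image arrays and nearest-neighbor lookup; the warp interface and the out-of-image penalty are illustrative choices rather than the paper's.

import numpy as np

def sampled_sad(template, image, warp, n_samples=256, seed=0):
    # Approximate the sum-of-absolute-differences error of a warp using a
    # fixed-size random subset of template pixels instead of all of them.
    rng = np.random.default_rng(seed)
    h, w = template.shape
    xs = rng.integers(0, w, n_samples)
    ys = rng.integers(0, h, n_samples)
    total = 0.0
    for x, y in zip(xs, ys):
        u, v = warp((float(x), float(y)))           # map template (x, y) into the image
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < image.shape[1] and 0 <= vi < image.shape[0]:
            total += abs(float(template[y, x]) - float(image[vi, ui]))
        else:
            total += 1.0                            # arbitrary out-of-image penalty
    return total / n_samples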
Just to summarize now, and these are some other examples: we believe that various different large
search spaces can be exhaustively searched if you have an understanding of the behavior of the error
function. If you do, you get global guarantees on an approximation, and you can then usually also use a
branch and bound scheme to get a very good approximation.
There are other domains where we’re already looking at applying this technique. Also, we think the
robust estimation with the inlier rate measure would be interesting to look at in other domains, even in
machine learning, where you specifically want to take into account the fact that there are inliers and
outliers that you’d like to totally reject, and not just use some robust measures that can limit how much
they can ruin your results, but actually try to isolate the inliers of the problem.
We’re thinking in those directions too. That’s all, I think. Any questions?
[applause]
>>: Very nice work.
>>: Do you have, will there be an implementation that you can play with…
>> Simon Korman: For all my work there ends up being an implementation available, and for most of
these there already is one. This latest one with the inlier rate is not available yet, but hopefully in a
couple of months it will be, yeah.
>>: When you are looking at the space of affine transformations you said, for instance, that you will
eliminate scales beyond three x and one over three x.
>> Simon Korman: Yeah.
>>: Are there, could you have kind of say probabilistic priors on the affine transformations? Would that
help at all if you knew which ones were more likely to look at before others?
>> Simon Korman: So like…
>>: Let’s say, you know, let’s say a one x scaling is twice as likely as a two x scaling for instance, right;
you could imagine the distribution.
>> Simon Korman: I can imagine how you could do things to get your result faster. But in terms of, if
you still want to keep the guarantee on the global search…
>>: Right.
>> Simon Korman: Probably not, but practically yeah I guess that you would prefer to sample more
densely to start with in the areas that are more likely.
>>: By estimating the rate of inliers. If someone gave you, if you started checking with some guesses
which are close to the minimum, you would find it faster, right? Then you could maybe order the branch
and bound according to what’s expected.
>> Simon Korman: I’m not sure I exactly get how that can be done. But one of the things is that we do
the inlier rate estimation first, only at the first level, and then continue with that. There should be a way
of kind of running things together, like maybe getting a coarse idea of what the inlier rate is from the
first level, and then refining that as we move to the next stages, so possibly that could be done.
>> Eyal Ofek: Great, thanks.
>> Simon Korman: Thank you.