>> Rick Szeliski: Good afternoon. It's my pleasure to welcome Hongzhi Wang
to give us a talk this afternoon. Hongzhi just defended his PhD thesis at the
Stevens Institute of Technology, which is just across the Hudson River from
Manhattan. And he's been doing a lot of work with his advisor, John Oliensis, on
Perceptual Organization. He's going to tell us about some very interesting ways
to do segmentations, in particular, to deal with the uncertainty associated with
segmentations.
>> Hongzhi Wang: All right. Thank you. Thank you for attending this talk. In this talk I will describe the major work in my thesis, which is using image segmentations to do shape matching. I will show that this work can be applied both to image segmentation and to shape matching, and I will describe applications in both areas.
So first, I will give you a brief introduction to this work. The recognition problem is a very interesting topic. Given an image, we want to know what objects, what scene, and what activities are going on inside the image. But recognition is a very tough problem.
Just like the example I show here, even though these two images have very similar contents, with similar objects inside, the appearance of the two images can be quite different. And many factors can contribute to these appearance variations: for example, the viewpoint of the object, the scale, illumination, and deformations. All these factors make recognition a very challenging problem.
When we talk about recognition, there are some fundamental issues involved. The first one is the representation issue: the so-called local versus global dilemma.
Globally, the whole image contains rich information, which is sufficient for recognition. But it is very hard to work with because of the huge appearance variation. In other words, it is very hard for us to find correct matches at the global image level.
On the other hand, if we use a local image patch, it is much easier to work with because the appearance variation is much smaller. But it may not contain sufficient information for the recognition task. So that's the issue.
Another important issue is the features we can use for recognition. So far the most popular features are textures and edges, and recently image segmentations have also been applied to object recognition.
So next I will give a brief discussion of these techniques. Actually, I'm sure you are already familiar with this, but I just, you know... okay.
So since the global image is not very easy to work with, currently most approaches use local features. So we have to find some way to resolve the local ambiguity somehow. The bags-of-features technique tends to use as many features as possible, and the good thing about it is that it can be learned very efficiently using the support vector machine technique.
Another type of approach also considers the spatial relations between the local features. This is more effective at resolving the local ambiguity, but the learning becomes harder, because to find the optimum solution we have to search over a combinatorial space. Right. And recently there are some techniques that try to combine these two, for example, the fragments approach.
Compared to texture features, edges are usually more robust to appearance changes caused by illumination change or viewpoint change. And more importantly, just like this example shows, edges tell you the shape of the object, which is a very distinctive feature for recognition.
So next, let's take a close look at the shape matching problem. Again, when we try to compare shapes we face the global versus local dilemma. The global shape is very distinctive, but it's very hard to work with because of the huge number of possible deformations. Right?
On the other hand, the local edge fragments are easier to work with, but are not informative enough. So currently, for reliable shape matching, you usually need to find point-to-point edge correspondences, which is very expensive because it requires searching over a combinatorial space.
>> Question: (Inaudible) -- scale and perspective?
>> Hongzhi Wang: Yes. Ideally you should account for scale, but in this talk we assume the shapes we are trying to match already have a similar scale and a similar orientation, just like the example shows here. It's a simplified shape matching task. Yeah.
And recently there are some techniques that try to combine the advantages of global and local shape matching approaches. For example, this one: instead of using just the edge fragments over here, they try to group the edge fragments first. After grouping, the shapes become more distinctive, but they are still simple enough for us to work with. So it's kind of an intermediate solution between the global and the local approaches.
The limitation of this method is that boundary grouping is not reliable, so they have to rely on training images and learning techniques. So it is not very good for comparing arbitrary shapes when you don't have training data for the problem.
Another interesting technique uses histograms of edges, or histograms of edge orientations, to represent a shape. This method is very simple, it doesn't require point-to-point edge correspondences, and it is robust enough to overcome small deformations. It is one of the most successful techniques available so far. The major limitation is that histograms are a lossy compression technique, so basically they sacrifice shape description accuracy. Just like the example shows here: if we just take the histogram over the region over here, these two shapes actually have the same histogram, but they are different shapes.
So after a brief review of these techniques, now we come to the work I did. For the shape matching problem, our work is based on the intuition shown over here. When we try to compare two shapes, we can transfer edge consistency into region consistency. By region consistency I mean how well the two regions overlap each other, just like the example shows here.
This way, you know, you can detect any small shape difference, and it still doesn't require edge correspondences between the two shapes. And since regions are global features and span large areas, it is robust to small shape variations.
So these are the advantages of using this strategy for shape matching; here is the list over here. When we try to use regions for shape matching, there are also some issues we have to take care of. The major issue is that since regions are global features, they are very hard to represent and compare precisely. And more importantly, it is usually very hard to retrieve good regions from images directly. We need to take care of all these issues, and in my work I try to address them to make this idea more practical and more efficient.
So here is a brief overview of the solution for each issue. For the representation problem, we propose to use image segmentation to represent and compare shapes. For matching, we use mutual information to do the comparison. And to address the unreliability of the segmentations, we give a solution that averages over all the possible segmentations.
So next I will give you the details of each part. The first part is using image segmentations. The first thing I want to mention is that image segmentation has been used for recognition before, but mostly to represent appearance. In this work, the major novelty is that we propose to use the entire shape encoded in the segmentation to represent shape, and as I will show you, this enables very efficient shape matching algorithms.
Just like the example I show here, good segmentations usually contain sufficient boundaries for shape matching. Using segmentations to represent shape also has some other advantages. The major one is that image segmentation algorithms usually use global image information to compute the boundaries, so we have a better chance of finding the true boundaries than local edge detectors do.
Another major advantage is shown in the images: the image segmentation reveals the global shape structure of the image, which is more distinctive than local shapes. But the limitation of segmentation is that it can be unreliable. Just like the examples show here, sometimes an image segmentation does not have sufficient boundaries for us to do the shape matching. Over-segmentation can partially address this problem, because it has a better chance of locating the true boundaries.
But the point here is that over-segmentation is not sufficient, because it adds too many false boundaries to the image, which makes the overall shape description less accurate. We will come back to this issue later; actually, addressing this problem is the major novelty of this work.
And -- okay. The next point is: once we have the shapes of the segmentations, how can we compare them? The intuition we use to compare the shapes of segmentations is shown in this image. Suppose we want to compare this segmentation with the segmentation in the middle. The red column shows the joint segmentation for these two matching problems.
The joint segmentation is simply the overlay of the boundaries coming from the two segmentations. Right. The intuition here is that if the two segmentations have similar shapes, then in their joint segmentation their boundaries tend to overlap each other, so this will give you some large segments in the joint segmentation, just like the body of the car.
On the other hand, if they don't have similar shapes, then in their joint segmentation their boundaries don't overlap each other, so this will give you many small segments. So that's the intuition we use. And the good thing about this matching strategy is that the intuition is well captured by the mutual information concept. So the idea here is that we want to use mutual information to measure the similarity between two shapes. All right.
So before we go ahead to show how to measure the mutual information between images, between segmentations, we first need to know how to compute the entropy of a segmentation. For this purpose, suppose you have a segmentation like this. If you just randomly pick one pixel out of this segmentation, then p_i represents the probability that this pixel comes from the i-th segment. Right? So p_i is actually equal to that segment's area as a proportion of the whole image, as shown in this image. For this one we have six segments. Right?
So basically the p_i represent the probability distribution given by the segment sizes of this segmentation. Then, using the standard entropy definition, we can compute the entropy of the segmentation. All right. We call this the structure entropy, because it actually measures the structural complexity of the segmentation. Just like this example shows: these are two segmentations of one image. The left one has fewer segments and fewer boundaries, so measured by the structure entropy it is much smaller than the other one.
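Here is a minimal sketch of the structure entropy just described, assuming a segmentation is given as an integer label map; the function and variable names are mine, not the speaker's.

```python
import numpy as np

def structure_entropy(labels):
    """Entropy of the segment-size distribution of a label map."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size            # p_i = segment area / image area
    return -np.sum(p * np.log2(p))      # standard entropy definition

# Fewer, larger segments give lower structure entropy, as in the example:
coarse = np.array([[0, 0], [0, 1]])     # two segments
fine   = np.array([[0, 1], [2, 3]])     # four segments
print(structure_entropy(coarse))        # ~0.81
print(structure_entropy(fine))          # 2.0
```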
All right. Okay. So after knowing how to compute the entropy of a segmentation, it is very straightforward to use the standard mutual information definition to compare the similarity between two segmentations. So here is a toy example of using the mutual information.
Suppose A and B are the two segmentations we want to compare. This segmentation shows their joint segmentation, and below it shows the structure entropy for each segmentation above. Right. For this case, A and B contain (inaudible) information, so they don't tell anything about each other. So the mutual information between them is 0, which means they are totally different. All right.
By the way, are there any questions? Okay.
Also, in the context of recognition, it's usually very useful to normalize the mutual information. In our case we normalize the mutual information by the joint entropy. After the normalization, the metric is between 0 and 1: 0 means totally different, 1 means identical shapes. All right.
So here is another example of using this metric to compare shapes. These two segmentations were labeled by different human subjects for the image over here, and this one shows their joint segmentation. As you can see, the two are pretty consistent, but there are still some minor differences in this section and this section. And using our metric we can successfully capture this fact. All right.
All right. At this point I will take a few minutes to give more discussion of this segmentation-based mutual information metric. As we know, mutual information on intensity images has been available for a very long time, mostly for image registration applications. Here we are trying to use mutual information on image segmentations. So what is the major difference between these two techniques? The major difference between a segmentation and an intensity image is that for an image segmentation, if two pixels come from one segment, they have to be spatially connected by pixels coming from the same segment.
But for intensity images there is no such constraint. So that is the spatial-constraint difference between the two. And because of this difference, using image segmentations the matching result is more robust to small shape variations, while using intensity images is more sensitive to such changes.
So here is an experiment I did to show this property. For this test, what I did is compare the similarity between a segmentation and a shifted version of the same segmentation. As I shift the segmentation away, I want to see how the similarity changes. The X axis gives the shift in pixels between the two images; the Y axis shows the normalized similarity between the two images. Right.
And for this one I used the mutual information on the segmentations and the mutual information on the intensity images. All right. As we can see, using the intensity image is very sensitive to the spatial shift: with the first pixel being moved, it (inaudible) very largely. While using segmentations, we have pretty good robustness in this test.
So basically this means that intensity-based mutual information is better for registration, because for registration we expect the two shapes to have very close matches. But for recognition it's a different story: for recognition it's very rare to find two identical shapes, so we want the matching technique to be more robust to this kind of variation.
All right. So the last part is how to address the unreliability of segmentations. Since any specific segmentation can be very unreliable, our idea here is that we don't want to base our decisions on any one specific segmentation. Instead we want to use all the segmentations, weighted by their probabilities. So basically this is the idea, and here is how to use it.
Okay. Recall that given a segmentation, its structure entropy is defined over here. Now, given an image, since we want to use all the segmentations of this image, we define the average structure entropy for this image, which is defined over here. The capital S is one particular segmentation, the P gives the probability of that segmentation for this image, and SEG represents the set containing all the possible segmentations. So basically this is a very straightforward definition: we use all the segmentations, weighted by their probabilities.
To show how to use this definition, here is a toy example. For this simple example I just consider an image with four pixels. For this case we can enumerate all the possible segmentations over here; there are 15 possible segmentations for this image. For example, this one: if two pixels are in one segment, there is an edge between them. So for this one we have four segments, for this one we have three, and for this one we have two.
To compute the average structure entropy, we need to know the probability of each segmentation. For this case, if we assume each segmentation has the same probability, then we can compute the average structure entropy for this example. So here is the procedure. Since each segmentation has the same probability, the P here is a constant, 1/15. So all we need to know is the structure entropy for each segmentation over here. Here is one example of the computation, and here is another.
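A small sketch of this toy computation, assuming uniform probabilities; it enumerates the 15 set partitions of four pixels and averages their structure entropies.

```python
import numpy as np

def partitions(items):
    """Yield all set partitions of a list (Bell-number many of them)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):              # put `first` in an existing block
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part                  # or in a block of its own

def structure_entropy(part, n):
    p = np.array([len(block) / n for block in part])
    return -np.sum(p * np.log2(p))

parts = list(partitions([0, 1, 2, 3]))
print(len(parts))                               # 15 possible segmentations
print(np.mean([structure_entropy(p, 4) for p in parts]))   # P = 1/15 each
```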
But unfortunately this definition is not directly usable for computing the average structure entropy of real images, because first, it is very hard to compute the probability of each segmentation, and second, it is impossible to enumerate all the possible segmentations of a real image; there are too many of them.
So in the major part of this work we give a solution for efficiently approximating this computation. Our approximation is based on the observation over here: when we compute the structure entropy, we can consider the contribution from each pixel separately.
So here is what I mean. Recall the structure entropy is defined over here: it is based on the segment sizes, and the sum is over the segments. The observation means we can rewrite this definition in another form, which considers the contribution from each pixel separately; that is the sum over here, H(S) = -(1/N) × Σ_n log(A(S^(n)) / N). The contribution from one pixel is defined over here: N is the image size, S^(n) is the segment containing pixel n, and A gives the area of that segment. So the whole thing inside the log actually corresponds to the p_i over here. Yeah.
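A quick numerical check of this per-pixel rewrite, on my reading of the formula above: summing -log(A(S^(n))/N) over pixels and dividing by N gives the same value as the segment-wise entropy.

```python
import numpy as np

labels = np.array([0, 0, 0, 1, 2, 2])           # a 6-pixel "image"
N = labels.size
segs, counts = np.unique(labels, return_counts=True)

h_segments = -np.sum((counts / N) * np.log2(counts / N))    # sum over segments
area = dict(zip(segs, counts))                  # A(S^(n)): area of pixel n's segment
h_pixels = -np.mean([np.log2(area[l] / N) for l in labels]) # sum over pixels
print(np.isclose(h_segments, h_pixels))         # True
```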
>> Question: So why don't you use the segmentation of brightness distribution
in the segment?
>> Hongzhi Wang: The segment for --
>> Question: (Inaudible) -- in the segment in the pixels there (inaudible), then you will have different (inaudible), right? But you are not doing that.
>> Hongzhi Wang: Oh, you mean using the distribution of the intensity values for --
>> Question: The segment measure.
>> Hongzhi Wang: The measure of what?
>> Question: Segmentation.
>> Hongzhi Wang: Oh, that is a possible way to do it, but I think this way is more straightforward, because it is based on the probability of the segmentation directly; using the intensity distribution is just one way to evaluate the probability of the segmentations. Right. And in this work we don't really take care of how you (inaudible) the probability of each segmentation. But once you have that evaluation, here is the way to combine all these evaluations to get more reliable results. You know, so it's kind of a different concern. All right. Okay.
So this means that when you compute the structure entropy of a segmentation, all you need to know is the size of the segment containing each pixel. This observation is not very useful for computing the structure entropy of a single segmentation, but it is very useful for computing the average structure entropy of an image.
Okay. So now we can transfer this observation to the average structure entropy. Again, we can consider the contribution from each pixel separately, and the contribution from one pixel to the average structure entropy is given here. It is actually the average of this pixel's contributions to all the segmentations, combined with the probability of each segmentation.
Recall that the contribution from this pixel to one segmentation is defined over here. If you put the probability term inside the log, what we get looks pretty nasty, but it has a very simple meaning: it gives the geometric mean size of the segment containing this pixel, across all the possible segmentations.
All right. So here is a summary of the observations we are using for the approximation. To compute the structure entropy, all we need to know is the size of the segment containing each pixel. Similarly, to compute the average structure entropy, all we need to know is the geometric mean segment size containing each pixel, across all the possible segmentations. All right.
The geometric mean is pretty hard to compute. But we are more familiar with the arithmetic mean, which is also easier to compute. So our idea here is to use the arithmetic mean to approximate the geometric mean. So here comes the approximation part. All right.
Fortunately there is a standard approximation for this purpose. Here is the geometric mean, this is the arithmetic mean, and here is the variance of the segment size containing each pixel. Basically this approximation is derived from the Taylor series by ignoring all the higher-order terms; we are only using the first two terms. All right.
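On my reading, this is the usual second-order Taylor expansion of the logarithm around the arithmetic mean, log GM ≈ log AM - Var/(2·AM²); here is a small numeric illustration with made-up segment sizes.

```python
import numpy as np

sizes = np.array([40.0, 55.0, 60.0, 45.0])      # hypothetical segment sizes
am, var = sizes.mean(), sizes.var()

log_gm_exact  = np.mean(np.log(sizes))          # log of the geometric mean
log_gm_approx = np.log(am) - var / (2 * am**2)  # first two Taylor terms only
print(log_gm_exact, log_gm_approx)              # ~3.8993 vs ~3.8995
```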
>> Question: (Inaudible) -- is large or -- I'm sorry. So you say that it is a Taylor expansion and you say that the higher-order terms drop off.
>> Hongzhi Wang: Yes.
>> Question: I was just wondering, when do they actually drop off? Is it when the number of segments becomes large, or --
>> Hongzhi Wang: Uh, for this part, I don't really know, actually. I didn't study this part. But in my experiments I evaluated the significance of the second term compared to the first term, and its contribution is already much smaller than the first term. You know, that gives you some insight: for the higher-order terms the contribution will be much smaller (inaudible).
>> Question: Okay. Just what makes the higher order terms drop off so
quickly? That's where I'm (inaudible).
>> Hongzhi Wang: Yeah. Um, another way to think about it is to consider the variation of the segment size compared to the segment size itself. Basically the variance term here is not very large, which means the variance divided by the squared mean segment size is not very huge. The variance term is related to the -- I will show you the way to compute the arithmetic mean of the areas; it is based on the image affinities between the pixels. If two pixels have a very high affinity, close to one, or a very low affinity, close to 0, this term is usually very small. But if the affinity is not very certain, say around a half, 0.5, this term becomes larger. So it's related to how confident the segmentation is; maybe later it will become clearer.
All right. To compute the arithmetic mean, what we can do is use the affinity matrix. The affinity between two pixels can actually be interpreted as the probability that these two pixels belong to one segment. So when we try to compute the arithmetic mean segment size for one pixel, what we can do is just sum the affinities between this pixel and all the other pixels. That is the arithmetic mean segment size.
So how can we compute the affinities? Actually the process is not very hard. For example, we can use a Gaussian function on the intensity values to evaluate the affinity between two pixels. If two pixels are very close to each other and have similar intensity values, they should have a high affinity; otherwise they should have a small affinity.
The next question is how to compute the variance. Computing the variance is a little bit difficult, so we make an assumption: we assume the pairwise affinities are independent of each other. This way we can ignore all the covariances between the affinities, so the variance can be simply computed as over here.
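A sketch of the variance under that assumption, on the reading that each affinity W[i, j] acts as an independent Bernoulli "same segment" indicator, so the covariances vanish and the Bernoulli variances just add up.

```python
import numpy as np

W = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.5],
              [0.1, 0.5, 1.0]])                 # hypothetical affinity matrix

var_size = (W * (1.0 - W)).sum(axis=1)          # Var = sum_j W_ij * (1 - W_ij)
print(var_size)                                 # [0.18 0.34 0.34]
```

Note how the uncertain affinity of 0.5 contributes the most, matching the earlier remark that affinities near one half make the variance term larger.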
>> Question: I mean, these affinities, do they have to be normalized properly to be an actual probability function?
>> Hongzhi Wang: Yes. At least they should have a probability meaning. So they should be between 0 and 1, because 1 means the pixels are definitely in one segment. You know, 0 means, yeah, something like that. Yeah.
Even though the independence assumption is pretty common in the image segmentation community -- for example, normalized cuts actually computes a maximum a posteriori solution based on this assumption -- it is still a very strong assumption. So our concern here is how much accuracy we actually lose when we use this assumption.
To investigate this problem, I did the following experiment. In this experiment I used real images from the Berkeley segmentation benchmark, you know, the segmentation database.
Given real images, for each pixel we can evaluate the geometric mean with and without using the independence assumption. So we want to compare how different these two approximations are. For this purpose we use the ratio between the two approximations. Right? If the ratio is close to 1, it means the independence assumption doesn't really make much difference, so we can safely use it.
This shows the empirical distribution of the ratio over all the test images from the Berkeley segmentation dataset. As you can see, for most of the cases, over 90%, this ratio is very close to 1 -- larger than 0.8. Okay. Which means that in practice the independence assumption doesn't really make much difference to the approximation, so we can simply use this assumption to do the approximation.
All right. So coming back to the toy example we saw before, for which we know the exact value, given over here: using our approximation without the independence assumption, the approximation is very close to the real one. All right. Using the independence assumption, the error is larger, but it is still pretty close to the real one. All right.
So now it's time to show how to compare this -- yeah?
>> Question: Is the approximation an over- or underestimate (inaudible)?
>> Hongzhi Wang: For the first one it's over, but for the second one if you
ignore the second term it's a little bit under. Yeah. For this example. Okay.
Okay. So to show how to compare the similarity between two images based on their shapes, here are the equations. Given two images, we want to compare the mutual information between their segmentations over here. But to address the unreliability of the segmentations, we average over the segmentations of both images over here.
So this is the definition. Again, the definition itself is not directly usable, but the good thing is we can rewrite it into another form. In this new form, the first two terms are the average structure entropies of the two images, which we already know how to approximate. The last term is the average joint structure entropy of the two images, which can be approximated in the same way but using a different affinity matrix: the joint affinity matrix is simply the product of the affinities coming from the two images. That's it. Okay.
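Here is a sketch of that rewritten form built from the pieces so far, keeping only the first Taylor term for brevity; `shape_similarity` and the binary example matrices are illustrative, not from the talk.

```python
import numpy as np

def avg_structure_entropy(W):
    n = W.shape[0]
    mean_size = W.sum(axis=1)                   # arithmetic mean segment size
    return -np.mean(np.log2(mean_size / n))     # first-order term only

def shape_similarity(W1, W2):
    Wj = W1 * W2                                # joint affinity: element-wise product
    h1, h2, hj = map(avg_structure_entropy, (W1, W2, Wj))
    return (h1 + h2 - hj) / hj                  # normalized mutual information

W_a = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
W_b = np.array([[1, 0, 0], [0, 1, 1], [0, 1, 1]], dtype=float)
print(shape_similarity(W_a, W_a))               # 1.0 (identical groupings)
print(shape_similarity(W_a, W_b))               # ~0.16 (dissimilar groupings)
```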
So here is the final approximation for comparing the shape similarity between two images. The numerator corresponds to the average segment size in the joint image. The denominator -- yeah.
>> Question: (Inaudible) --
>> Hongzhi Wang: So overall it is a measure of the statistical dependence between the two images.
>> Question: Let me see if I'm still following this. So M1 and M2, you take two pixels in image one and you compare the gray levels; if the gray levels are close, you say it's close to 1, and if the gray levels are far apart, you say it's close to 0. Right? You pass them through a (inaudible) --
>> Hongzhi Wang: Yeah.
>> Question: You can do that. For image two you can pick the same pair of pixels. But it is a different image, so the colors will be different, right?
>> Hongzhi Wang: Yeah.
>> Question: Now for the top you're saying we just basically multiply those two
numbers, right? Just -- okay.
>> Hongzhi Wang: And basically this is just one way to compute the affinities, using the Gaussian on the intensities. There are some other ways. Also, one way using the Gaussian on intensities also enforces a spatial constraint: if the two pixels are close to each other, we use this Gaussian to compute it; otherwise we just consider the affinity to be 0 or something.
>> Question: So basically if the pixels are very similar in both images, then which means the --
>> Hongzhi Wang: The images, they also have a high probability between, yeah --
>> Question: But if one image is similar and the other image is different --
>> Hongzhi Wang: Yeah.
>> Question: -- then something is different and that's more likely (inaudible).
>> Hongzhi Wang: Yeah.
>> Question: Okay.
>> Hongzhi Wang: Basically it encourages the two affinities to have a similar, you know, response.
>> Question: Yeah. Okay.
>> Hongzhi Wang: All right. So here is a summary for this part. This averaging over segmentations lets us do segmentation-based matching: it has all the good properties of using segmentations for matching, but we don't suffer from (inaudible) segmentations too much anymore. And even better, we don't need to compute an actual segmentation for shape matching at this point.
>> Question: When you say compute segmentations, do you mean normalized
cuts or which -- if you were to compute segmentations, what class of algorithms
would you be running?
>> Hongzhi Wang: Um, any one. Any one you could use. Normalized cuts, mean shift.
>> Question: So mean shift is also defined in terms of affinity matrices?
>> Hongzhi Wang: Um, you know, in a way it is, because it is based on kernels. But it doesn't really require the segmentation algorithm to have the same interpretation as we formulated. You know, when you use image segmentation to represent and compare shapes, whatever the segmentation algorithm is, we only care about the segmentation itself.
>> Question: But you're making the assumption that the affinities capture important information --
>> Hongzhi Wang: Yeah.
>> Question: Yet the only thing you ever look at is the affinities. So you must be implicitly comparing yourself against algorithms that segment based on those affinities.
>> Hongzhi Wang: Yes.
>> Question: Totally different -- segmentation algorithm out of a hat you wouldn't -- your approximation wouldn't be (inaudible) --
>> Hongzhi Wang: Yeah, that makes sense.
>> Question: So I'm curious what is the range of segmentation algorithms that exploit the kinds of --
>> Hongzhi Wang: Actually, as far as I know, most of the available -- at least the leading segmentation algorithms are based on this assumption, using the affinities between pixels to do the grouping.
>> Question: (Inaudible) algorithms?
>> Hongzhi Wang: For example, mean shift, normalized cuts (inaudible), and also some graph-theory-based segmentation algorithms. They all, implicitly or explicitly, use affinities between the pixels (inaudible). Yeah.
All right. So next I will show some applications of these approaches. First I will show how to use the algorithm for several (inaudible): for image segmentation and for image smoothing.
For the image segmentation application, our motivation is that low-level image segmentation is a very ambiguous process. Just like this example shows, these are the ground-truth segmentations labeled by human subjects for one image. As you can see, there are big variations across the different segmentations. This huge variation shows that the plausible image segmentations usually have a flat distribution.
For a random variable with a flat distribution, the mean estimator statistically should be more robust than the maximum a posteriori estimator. The point is that our approximation technique allows us to compute a mean estimator for images, to compute the image segmentations.
All right. So let's first recall the definition of the mean estimator: it should give the least variance for the random variable with respect to some distance metric. Formally it is defined over here, and this represents the distance metric for the random variable. Okay.
So in our case we can define the central segmentation, which gives the least average distance to all the other segmentations. Formally it is defined over here. The capital V represents the distance metric for image segmentations: it is the variation of information. Basically it is a variant of mutual information; actually, it's equivalent to mutual information, in that minimizing this distance is equivalent to maximizing the mutual information between the two segmentations.
As you may already see, this is actually not an exact mean estimator, because here we are using the distance instead of the distance squared. But this is kind of trivial: we can show that the central segmentation is the mean estimator with respect to the distance metric square root of V. All right.
The point is that our approximation approach allows us to compute this cost function, so we can minimize it to search for the optimum segmentation. So here is one way to search for the optimum segmentation: we can use greedy merging.
All right. So basically, the initial segmentation is the trivial one, where each pixel is an individual segment. We keep merging neighboring segments as long as the merge decreases the cost function, and we stop merging when we cannot decrease the cost function anymore. All right. So this is just one way to do the optimization based on the approximation. For general image segmentation we can use a gradient descent optimization procedure. So here are the demos to show the process.
So at each round of the optimization, we just use the derivative of the cost function to update the segmentation, until we converge to a local minimum. All right. Okay.
For quantitative evaluation of the segmentation results, we propose a new criterion to evaluate the quality of a segmentation, which is the average distance from this segmentation to the ground-truth segmentations. All right.
Here is an example showing how to use this criterion. All these segmentations are labeled by humans, so they are considered ground-truth segmentations. Below each segmentation is the average distance from this segmentation to all the other ground-truth segmentations. The criterion states that the smaller the average distance is, the better the segmentation is.
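A sketch of this criterion, assuming the variation of information from before as the distance: VI(A, B) = H(A) + H(B) - 2·I(A; B), averaged over the human-labeled segmentations.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return -np.sum(p * np.log2(p))

def variation_of_information(a, b):
    joint = a.astype(np.int64) * (b.max() + 1) + b      # overlay the label maps
    mi = entropy(a) + entropy(b) - entropy(joint)
    return entropy(a) + entropy(b) - 2 * mi             # 0 means identical

def average_distance(seg, ground_truths):
    """Smaller is better: the proposed quality score for one segmentation."""
    return np.mean([variation_of_information(seg, g) for g in ground_truths])
```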
>> Question: Here we are comparing two segmentations. You are using that
very early definition.
>> Hongzhi Wang: Yeah. So you don't have to average over segmentations, because the segmentation in this case is very definite.
>> Question: Overlaying the two segmentations, taking the intersection of the
regions and then computing (inaudible) --
>> Hongzhi Wang: Exactly.
>> Question: Okay.
>> Hongzhi Wang: So using this criterion, this segmentation is considered the best. It also has a meaning: on average, this segmentation is the most similar to all the other ground-truth segmentations. All right.
So using this criterion, we applied the segmentation algorithm to the Berkeley test images and compared it with mean shift, normalized cuts, and the efficient graph-based method developed by Felzenszwalb. Here, for each test image we segment the image into different numbers of segments: 10, 20, 30, 40 and 70. And this gives the mean of the average distance over the test images. As you can see, our approach is more consistent than the competing methods; overall it performs much better than the competing approaches.
Okay. Another application of this method is image smoothing. Since our approach measures the similarity between two images in terms of their segmentations, we can ask one question: given an image I, what is the most similar image to I? Formally it is defined over here. Right?
So we want to find the optimum image, which should be the most similar image to I. The answer is that it is actually different from I: it is a smoothed image, but the boundaries of the image are preserved. So here are two demos to show the smoothing process. The first row is the original image; we try to find the optimum image starting from the original image, maximizing the similarity between the lower image and the upper image. And you can see this value here shows the mutual information actually increasing; it is very small, as you can see here.
The point is that as the similarity increases, the image becomes quite smooth, with the boundaries, the major structure, preserved. The reason for this is that when we compute the central segmentation, that segmentation is the most similar segmentation to the whole image. For this optimum image, the distribution of segmentations should be tightly clustered around the central segmentation. So as you optimize the image, the distribution gradually clusters around the central segmentation, and in this way the image becomes smooth while its structure stays very close to the central segmentation. Yes.
>> Question: So you are saying the smooth image, when run through a segmentation algorithm, will have a segmentation that is most similar to the --
>> Hongzhi Wang: The central segmentation (inaudible).
>> Question: Central segmentation for?
>> Hongzhi Wang: For the original image.
>> Question: The original image. So you have your algorithm (inaudible) image
and over all possible segmentations will compute the one that is most central.
>> Hongzhi Wang: Yeah.
>> Question: For some range of segmentations, some algorithm that generates
segmentations?
>> Hongzhi Wang: No, it doesn't actually work that way, because the algorithm is based on the affinities, right? Basically, after you compute the affinities for an image, they implicitly define the distribution of segmentations for this image, and it computes the central segmentation based on that distribution. Yeah. All right.
Okay. So the next application is using this for shape matching. The first one is object detection. For this case, given an image, we want to see if there is an object inside the image and where it is. Currently, to address the global appearance variations, most approaches use local features. As I will show you, using our approach with global template matching, we can still achieve excellent results.
The detection procedure is: we first need a template for the class, then we scan a window across the image and use a threshold to detect the object. As I said before, we assume the objects already have similar orientations and scales. The first database we use is the (inaudible) database. For this database we already have some training examples, so we use the average affinity matrix as our template. To give you a rough idea about the shape structure of the template, this image shows the rough structure: the intensity in this image corresponds to the arithmetic mean segment size computed from the average affinity matrix. If it is bright, it means the pixel sits in a large segment; if it is dark, it means the pixel is close to a boundary or in a small segment.
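A bare-bones sketch of this scanning-window detection loop, reusing the earlier `gaussian_affinity` and `shape_similarity` sketches passed in as callables; the window size and threshold are hypothetical.

```python
import numpy as np

def detect(image, template_W, window, similarity, affinity, threshold=0.5):
    """Slide a window over the image; keep positions whose similarity beats threshold."""
    h, w = window
    hits = []
    for r in range(image.shape[0] - h + 1):
        for c in range(image.shape[1] - w + 1):
            patch = image[r:r + h, c:c + w].ravel()     # pixels inside this window
            if similarity(affinity(patch), template_W) > threshold:
                hits.append((r, c))
    return hits
```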
The other two databases we use are the human face and the (inaudible) databases. All right. So here are some detection results. The first methods we compare against are the similarity template approach and the probability syntax (phonetic) approach. The reason we compare against these two is that these two methods also compare image similarity based on image affinities. The only difference is that our approach is derived from image segmentation and mutual information, so our metric has a better statistical meaning than the competing metrics. As we can see, our method performs better than the competing approaches.
Here we compare with two shape matching techniques that are based on local shape features. These two methods use a similar training and detection strategy: given training images, they first try to figure out which edge fragments are most informative for this class, then use the learned classifier to label the object in new test images.
Again, our method performs better than the local shape matching approaches. This shows the importance of using global shape for shape matching. All right.
So the last application is shape-based tracking. The reason we want to use shape is that shape is a more specific representation than an appearance histogram. Currently most shape trackers use snakes, which only use the silhouette of the object. If you want to include the inner shape, if you want to use the complete shape of the object, you have to use some training algorithms. Here we show that using our global template matching approach, we can use all the shape details.
All right. For this tracking algorithm, given the first frame, you label the object you want to track, and then you find the most similar one in the next frame. All right. So here are some tracking videos to show the demos.
Again, this (inaudible) shows the rough shape structure of the template being used now, computed the same way as before.
To handle occlusion we just, you know, average -- smooth the template across time, so this way we can handle the occlusion part. All right.
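On my reading, that temporal smoothing is a running average of the template; a minimal sketch, where alpha is a hypothetical smoothing weight not given in the talk.

```python
import numpy as np

def update_template(template_W, current_W, alpha=0.9):
    """Blend the tracked region's affinities into the template each frame."""
    return alpha * template_W + (1.0 - alpha) * current_W
```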
And here we compare the shape matching algorithm with the pyramid histogram of gradients. The pyramid histogram of gradients is based on object detection and is also considered a global shape matching algorithm. The test here is on a video sequence with large lighting changes.
As we can see, since the pyramid histogram of gradients is based on local edge detection, it is very sensitive to illumination changes. But our approach is based on averaging over all the possible segmentations, so it is more robust to this type of change. Here is the rough structure of the shape, and the actual tracking results.
>> Question: So that image in the lower left corner, what is that process?
>> Hongzhi Wang: This one is the rough shape structure for the template being used now.
>> Question: And the shape structure is getting visualized by the gray level encoding the size of the region, the average size --
>> Hongzhi Wang: Yes.
>> Question: -- of the region, like you showed us for (inaudible).
>> Hongzhi Wang: Yes. All right. So here is the conclusion. The major points of this work: first, we propose to use image segmentations to represent and compare the complete shape in an image, and we use mutual information to do that. Second, to address the unreliability of image segmentations, we give a solution that averages over all the possible segmentations. This approach can be applied to shape matching and to image segmentation. And pretty much that's it; that's the work. So any questions?
>>: Thank you for your talk.
>> Hongzhi Wang: Thank you.
(applause)