>> Rick Szeliski: Good afternoon. It's my pleasure to welcome Hongzhi Wang to give us a talk this afternoon. Hongzhi just defended his PhD thesis at the Stevens Institute of Technology, which is just across the Hudson River from Manhattan. He has been doing a lot of work with his advisor, John Oliensis, on perceptual organization. He's going to tell us about some very interesting ways to do segmentation, in particular, to deal with the uncertainty associated with segmentations.

>> Hongzhi Wang: All right. Thank you. Thank you for attending this talk. In this talk I will describe the major work in my thesis, which is using image segmentations to do shape matching. I will show that this work can be applied both to image segmentation and to shape matching, and I will describe applications in both areas.

First, a brief, compact introduction to this work. Recognition is a very interesting topic: given an image, we want to know what objects, what scene, and what activities are inside it. But recognition is a very tough problem. As the example here shows, even though these two images have very similar content, with similar objects inside, the appearance of the two images can be quite different. Many factors contribute to these appearance variations: for example, the viewpoint of the object, the scale, the illumination, and deformations. All these factors make recognition a very challenging problem.

When we talk about recognition, there are some fundamental issues. The first is the representation issue, the so-called local versus global dilemma. Globally, the whole image contains rich information, which is sufficient for recognition, but it is very hard to work with because of the huge appearance variation. In other words, it is very hard to find correct matches at the global image level. On the other hand, a local image patch is much easier to work with, because its appearance variation is much smaller, but it may not contain sufficient information for the recognition task. So that's the issue. Another important issue is the features we use for recognition. So far the most popular features are textures and edges, and recently image segmentations have also been applied to object recognition. Next I will give a brief discussion of these; I'm sure you are already familiar with them.

Since the global image is not easy to work with, most current approaches use local features, so we have to find some way to resolve the local ambiguity. The bag-of-features technique tends to use as many features as possible, and the good thing about it is that it can be learned very efficiently using the support vector machine technique. Another type of approach also considers the spatial relations between the local features. This is more effective at resolving local ambiguity, but the learning becomes harder, because finding the optimal solution requires searching over a combinatorial space. And recently some techniques try to combine these two, for example the fragments approach. Compared to bags of features, edges are usually more robust to appearance changes caused by illumination or viewpoint change.
And more importantly, as this example shows, edges tell you the shape of the object, which is a very distinctive feature for recognition.

So next, let's take a closer look at the shape matching problem. Again, when we compare shapes we face the global versus local dilemma. The global shape is very distinctive, but it is very hard to work with because of the huge space of possible deformations. On the other hand, local edge fragments are easier to work with, but they are not informative enough. So currently, reliable shape matching requires finding point-to-point edge correspondences, which is very expensive because it requires searching over a combinatorial space.

>> Question: (Inaudible) -- scale and perspective?

>> Hongzhi Wang: Yes. Ideally you should account for scale, but in this talk we assume the shapes we are trying to match already have a similar scale and a similar orientation, just like the example shown here. It's a simplified version of the shape matching task.

Recently, some techniques try to combine the advantages of the global and local shape matching approaches. For example, this one: instead of using just the edge fragments, they group the edge fragments first. After grouping, the shapes become more distinctive but are still simple enough to work with, so it's an intermediate solution between the global and local approaches. The limitation of this method is that boundary grouping is not reliable, so it has to rely on training images and learning techniques, which makes it less suited to comparing arbitrary shapes when you don't have training data for the problem. Another interesting technique uses histograms of edges, or histograms of edge orientations, to represent a shape. This method is very simple: it doesn't require point-to-point edge correspondences and it is robust to small deformations. It is one of the most successful techniques available so far. Its major limitation is that histograms are a lossy compression technique, so they sacrifice shape description accuracy. As the example here shows, if we just take the histogram over this region, these two shapes have the same histogram, but they are actually different shapes.

So after this brief review of the techniques, here comes the work I did. For the shape matching problem, our work is based on the observation shown here: when we compare two shapes, we can transfer edge consistency into region consistency. By region consistency I mean how well the two regions overlap each other, just like the example shows. This way you can detect any small shape difference, and you no longer require edge correspondences between the two shapes. And since regions are global features that span large areas, the comparison is robust to small shape variations. These are the advantages of this strategy for shape matching; here is the list. Using regions for shape matching also raises some issues we have to take care of. The major issue is that, since regions are global features, they are very hard to represent and compare precisely. And more importantly, it is very hard to retrieve good regions from images directly.
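To make the region-consistency idea concrete, here is a minimal sketch of one simple overlap measure, intersection-over-union of two binary region masks. This is only an illustrative stand-in for "how well two regions overlap"; the measure the talk actually builds up to is mutual information between segmentations.

```python
import numpy as np

def region_overlap(mask_a, mask_b):
    """Region consistency as overlap: intersection-over-union of two
    boolean region masks (1 = identical regions, 0 = disjoint)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union
```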
So these are the issues we need to take care of, and in my work I try to address them to make this idea more practical and more efficient. Here is a brief overview of the solutions for each issue. For the representation problem, we propose to use image segmentations to represent and compare shapes. For matching, we use mutual information to do the comparison. To address the unreliability of segmentations, we give a solution that averages over all possible segmentations. Next I will give the details of each part.

The first part is using image segmentations. The first thing I want to mention is that image segmentation has been used for recognition before, but mostly to represent appearance. The major novelty of this work is that we propose to use the entire shape encoded in the segmentation to represent shape, and as I will show, this enables very efficient shape matching algorithms. As the example here shows, good segmentations usually contain sufficient boundaries for shape matching. Using segmentations to represent shape has some other advantages. The major one is that image segmentation algorithms usually use global image information to compute the boundaries, so we have a better chance of finding the true boundaries than with local edge detectors. Another major advantage, as shown in the images, is that the segmentation reveals the global shape structure of the image, which is more distinctive than local shapes.

But the limitation of segmentation is that it can be unreliable. As the examples here show, sometimes a segmentation does not have sufficient boundaries for us to do the shape matching. Over-segmentation can partially address this problem, because it has a better chance of locating the true boundaries. But over-segmentation by itself is not sufficient, because it adds too many false boundaries to the image, which makes the overall shape description less accurate. We will come back to this issue later; addressing it is actually the major novelty of this work.

The next point is: once we have the shapes of segmentations, how can we compare them? The intuition we use is shown in this image. Suppose we want to compare this segmentation with the segmentation in the middle. The red column shows the joint segmentation for these two matching problems. The joint segmentation is formed by simply overlaying the boundaries coming from the two segmentations. The intuition is: if the two segmentations have similar shapes, then in their joint segmentation their boundaries tend to overlap each other, which gives you some large segments, like the body of the car. On the other hand, if they don't have similar shapes, then in their joint segmentation their boundaries don't overlap, which gives you many small segments. That's the intuition we use. The good thing about this matching strategy is that this intuition is well captured by the mutual information concept. So the idea is to use mutual information to measure the similarity between two shapes.
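Here is a minimal sketch of the joint-segmentation overlay just described, in NumPy (the talk gives no code, so the names are illustrative). Each joint segment is the intersection of a segment from A with a segment from B; a full implementation would additionally split spatially disconnected intersections into separate segments.

```python
import numpy as np

def joint_segmentation(seg_a, seg_b):
    """Overlay two label maps: each joint segment is the intersection
    of one segment from A and one segment from B."""
    # Encode the (label_a, label_b) pair at every pixel, then re-index
    # the distinct pairs as joint segment ids.
    pairs = seg_a.astype(np.int64) * (int(seg_b.max()) + 1) + seg_b
    _, joint = np.unique(pairs.ravel(), return_inverse=True)
    return joint.reshape(seg_a.shape)
```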
Before we can measure the mutual information between segmentations, we first need to know how to compute the entropy of a segmentation. For this purpose, suppose you have a segmentation like this. If you randomly pick one pixel from this segmentation, then p_i is the probability that this pixel comes from the i-th segment. So p_i is equal to the segment's area as a proportion of the whole image, as shown in this image; for this one we have six segments. Basically, the p_i form the probability distribution defined by the segment sizes of this segmentation. Then, using the standard entropy definition, we can compute the entropy of this segmentation. We call this the structure entropy, because it measures the structural complexity of the segmentation. As this example shows, these are two segmentations of one image: the left one has fewer segments and fewer boundaries, so measured by the structure entropy it is much smaller than the other one.

After knowing how to compute the entropy of a segmentation, it is very straightforward to use the standard mutual information definition to compare the similarity between two segmentations. Here is a toy example. Suppose A and B are the two segmentations we want to compare; this segmentation shows their joint segmentation, and below each segmentation is its structure entropy. In this case A and B don't share any information; they don't tell you anything about each other, so the mutual information between them is 0, which means they are totally different. By the way, are there any questions? Okay.

Also, in the context of recognition, it is usually very useful to normalize the mutual information. In our case we normalize by the joint entropy, so after normalization the metric is between 0 and 1: 0 means totally different, 1 means identical shapes. Here is another example of using this metric to compare shapes. These two segmentations were labeled by different human subjects for the image over here, and this one shows their joint segmentation. As you can see, the two are pretty consistent, but there are still some minor differences in this section and this section, and our metric successfully captures this fact.

At this point I will take a few minutes to discuss this segmentation-based mutual information metric a bit more. As we know, mutual information on intensity images has been available for a very long time, mostly for image registration applications. Here we are using mutual information on image segmentations, so what is the major difference between the two techniques? The major difference is that for an image segmentation, if two pixels come from one segment, they have to be spatially connected by pixels from the same segment; for intensity images there is no such constraint. Because of this spatial-constraint difference, matching with image segmentations is more robust to small shape variations, while matching with intensities is more sensitive to such changes.
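The structure entropy and the normalized similarity can be sketched directly from these definitions (reusing joint_segmentation from above; label maps are assumed to hold non-negative integer segment ids):

```python
import numpy as np

def structure_entropy(seg):
    """Structure entropy: p_i is the fraction of the image covered by
    segment i, and H(S) = -sum_i p_i * log(p_i)."""
    sizes = np.bincount(seg.ravel())
    p = sizes[sizes > 0] / seg.size
    return -np.sum(p * np.log(p))

def normalized_similarity(seg_a, seg_b):
    """Mutual information normalized by the joint entropy:
    (H(A) + H(B) - H(A,B)) / H(A,B), so 0 = totally different shapes
    and 1 = identical shapes."""
    h_a, h_b = structure_entropy(seg_a), structure_entropy(seg_b)
    h_ab = structure_entropy(joint_segmentation(seg_a, seg_b))
    return (h_a + h_b - h_ab) / h_ab
```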
Here is an experiment I did to show this property. In this test I compared the similarity between a segmentation and shifted versions of the same segmentation. As I shift the segmentation away, I want to see how the similarity changes. The X axis gives the shift in pixels between the two images; the Y axis shows the normalized similarity between them. For this test I used the structural mutual information on the segmentations and ordinary mutual information on the intensity images. As we can see, using the intensity image is very sensitive to spatial shift: with the first pixel of shift, the similarity already drops very sharply. Using segmentations, we have pretty good robustness in this test. Basically this means intensity-based mutual information is better for registration, because in registration we expect the two shapes to match very closely. But recognition is a different story: in recognition it is very rare to find two identical shapes, so we want the matching technique to be more robust to this kind of variation.

The last part is how to address the unreliability of the segmentations. Since any specific segmentation can be very unreliable, our idea is not to base our decisions on any one specific segmentation. Instead we want to use all the segmentations, weighted by their probabilities. That's the idea, and here is how to use it. Recall that given a segmentation, its structure entropy is defined over here. Now, given an image, since we want to use all the segmentations of this image, we define the average structure entropy of the image, defined over here. The capital S is one particular segmentation, P(S) gives the probability of that segmentation for this image, and SEG represents the set containing all the possible segmentations. Basically this is a very straightforward definition: use all the segmentations, weighted by their probabilities.

To show how to use this definition, here is a toy example. For this simple example I consider an image with only four pixels. In this case we can enumerate all the possible segmentations: there are 15 of them. For example, in this one, if two pixels are in one segment there is an edge between them; so this one has four segments, this one has three, and this one has two. To compute the average structure entropy, we need to know the probability of each segmentation. If we assume each segmentation has the same probability, then we can compute the average structure entropy for this example. Here is the procedure: since each segmentation has the same probability, the P here is a constant, 1/15, so all we need is the structure entropy of each segmentation. Here is one example of the computation, and here is another.
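The four-pixel toy example can be reproduced with a few lines of code: enumerate all 15 set partitions of the four pixels (spatial connectivity is ignored here, as in the example), give each probability 1/15, and average the structure entropies. This is the exact value that the approximation later in the talk is trying to match.

```python
import numpy as np

def partitions(items):
    """Enumerate all set partitions of `items` (15 for four pixels)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block, or into its own block.
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part

def block_entropy(blocks, n):
    p = np.array([len(b) for b in blocks]) / n
    return -np.sum(p * np.log(p))

segs = list(partitions([0, 1, 2, 3]))
avg_entropy = np.mean([block_entropy(s, 4) for s in segs])
print(len(segs), avg_entropy)   # 15 segmentations, uniform weight 1/15
```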
Unfortunately, this definition is not very useful for computing the average structure entropy of real images, because first, it is very hard to compute the probability of each segmentation, and second, it is impossible to enumerate all the possible segmentations of a real image; there are too many of them. So the major part of this work is a solution for efficiently approximating this computation. Our approximation is based on the following observation: when we compute the structure entropy, we can consider the contribution from each pixel separately. Here is what I mean. Recall the structure entropy is defined over here; it is a sum over segments, based on the segment sizes. The observation means we can rewrite this definition in another form, where the sum is over pixels and the contribution from each pixel is considered separately. The contribution from one pixel is defined over here: N is the image size, S_m is the segment containing pixel m, and a(S_m) gives the area of that segment. The whole thing inside the log corresponds to the p_i from before.

>> Question: So why don't you use the brightness distribution in the segment?

>> Hongzhi Wang: The segment for --

>> Question: (Inaudible) -- in the segment in the pixels there (inaudible) then you will have different (inaudible), right? But you are not doing that.

>> Hongzhi Wang: Oh, you mean using the distribution of the intensity values for --

>> Question: The segment measure.

>> Hongzhi Wang: The measure of what?

>> Question: Segmentation.

>> Hongzhi Wang: Oh, that is a possible way to do it, but I think this way is more straightforward, because it is based on the probability of the segmentation directly; using the intensity distribution is just one way to evaluate the probability of the segmentations. In this work we don't really deal with how you evaluate the probability of each segmentation; once you have that evaluation, we show how you can combine all these evaluations to get more reliable results. So they are different concerns.

All right. So this means that to compute the structure entropy of a segmentation, all we need to know is the size of the segment containing each pixel. This observation is not very useful for computing the structure entropy of a single segmentation, but it is very useful for computing the average structure entropy of an image. Now we can transfer the observation to the average structure entropy. Again, we can consider the contribution from each pixel separately; the contribution from one pixel to the average structure entropy is given here. It is the average of this pixel's contribution over all the segmentations, weighted by the probability of each segmentation. Recall that the contribution from this pixel to one segmentation is defined over here. If you pull the probability term inside the log, what you get looks pretty nasty, but it has a very simple meaning: it is the geometric mean segment size containing this pixel across all the possible segmentations.

So here is a summary of the observations we are using. To compute the structure entropy, all we need to know is the size of the segment containing each pixel. Similarly, to compute the average structure entropy, all we need to know is the geometric mean segment size containing each pixel across all the possible segmentations. The geometric mean is pretty hard to compute, but the arithmetic mean is more familiar and also easier to compute, so our idea is to approximate the geometric mean with the arithmetic mean.
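The per-pixel rewriting is easy to verify in code: H(S) = -(1/N) * sum_m log(a(S_m)/N), where a(S_m) is the area of the segment containing pixel m. A minimal sketch:

```python
import numpy as np

def structure_entropy_per_pixel(seg):
    """The same structure entropy, written as a sum over pixels instead
    of a sum over segments: H(S) = -(1/N) * sum_m log(a(S_m) / N)."""
    n = seg.size
    sizes = np.bincount(seg.ravel())
    area_at_pixel = sizes[seg.ravel()]    # a(S_m) for every pixel m
    return -np.mean(np.log(area_at_pixel / n))
```

Averaging this per-pixel term over segmentations, with the probability weight pulled inside the log, is exactly what turns the quantity into a geometric mean segment size per pixel.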
Here comes the approximation part. Fortunately, there is a standard approximation for this purpose: here is the geometric mean, this is the arithmetic mean, and here is the variance of the segment size containing each pixel. Basically, this approximation comes from the Taylor series, ignoring all the higher-order terms; we only use the first two terms.

>> Question: (Inaudible) -- is large or -- I'm sorry. So you say that it is a Taylor expansion and you say that the higher-order terms drop off.

>> Hongzhi Wang: Yes.

>> Question: I was just wondering when they actually drop off. Is it when the number of segments becomes large, or --

>> Hongzhi Wang: For this part, I don't really know, actually. I didn't study it. But in my experiments I evaluated the significance of the second term compared to the first term, and its contribution is already much smaller. That gives you some insight: for the higher-order terms the contribution will be much smaller (inaudible).

>> Question: Okay. Just what makes the higher-order terms drop off so quickly? That's where I'm (inaudible).

>> Hongzhi Wang: Yeah. Another way to think about it is to consider the variation of the segment size relative to the mean segment size. Basically, the variance divided by the squared mean is not very large. The variance term is related to -- I will show you in a moment how we compute the arithmetic mean of the areas; it is based on the image affinities between the pixels. If two pixels have a very high affinity, close to one, or a very low affinity, close to zero, this term is usually very small. But if the affinity is uncertain, say around 0.5, this term becomes larger. So it is related to how confident the segmentation is; maybe this will become clearer later.

To compute the arithmetic mean, we use the affinity matrix. The affinity between two pixels can be interpreted as the probability that the two pixels belong to one segment. So to compute the arithmetic mean segment size for one pixel, we just sum the affinities between this pixel and all the other pixels: that is the arithmetic mean segment size. And how can we compute the affinities? That is not very hard. For example, we can use a Gaussian function of the intensity values: if two pixels are close to each other and have similar intensity values, they should have a high affinity; otherwise they should have a small affinity. The next question is how to compute the variance. Computing the variance is a little more difficult, so we make an assumption: we assume the pairwise affinities are independent of each other. This way we can ignore all the covariances between the affinities, and the variance can be computed simply as shown here.

>> Question: These affinities, do they have to be normalized properly to be an actual probability function?

>> Hongzhi Wang: Yes. At least they should have a probability meaning, so they should be between 0 and 1, because 1 means the two pixels are definitely in one segment, and 0 means they definitely are not, something like that.
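Here is a minimal sketch of the affinity computation and the two-term approximation, following the quantities just described: the arithmetic mean segment size at pixel m is mu_m = sum_n W[m, n], the variance under the pairwise-independence assumption is var_m = sum_n W[m, n] * (1 - W[m, n]), and the log geometric mean is approximated as log(mu_m) - var_m / (2 * mu_m^2). The Gaussian width and the dense n-by-n matrix are illustrative choices; intensities are assumed in [0, 1], and a real implementation would use a sparse, spatially windowed affinity matrix.

```python
import numpy as np

def gaussian_affinities(img, sigma=0.1):
    """Affinity between two pixels, interpreted as the probability that
    they lie in one segment, from a Gaussian on intensity difference.
    Dense n x n matrix -- only practical for small images."""
    v = img.ravel().astype(float)
    d = v[:, None] - v[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def avg_structure_entropy(W):
    """Approximate average structure entropy from an affinity matrix:
    -(1/N) * sum_m log(geometric mean segment size at m / N), with the
    geometric mean approximated by the first two Taylor terms."""
    n = W.shape[0]
    mu = W.sum(axis=1)                    # arithmetic mean segment size
    var = (W * (1.0 - W)).sum(axis=1)     # variance under independence
    log_geo = np.log(mu) - var / (2.0 * mu ** 2)
    return -np.mean(log_geo - np.log(n))
```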
Even though the independence assumption is pretty common in the image segmentation community -- for example, normalized cuts actually computes a maximum a posteriori solution based on this assumption -- it is still a very strong assumption, so our concern is how much accuracy we actually lose by using it. To test this, we did the following experiment. I used real images from the Berkeley segmentation benchmark. Given real images, for each pixel you can evaluate the geometric mean with and without the independence assumption, and we want to compare how different the two approximations are. For this purpose we use the ratio between the two approximations: if the ratio is close to 1, the independence assumption doesn't make much difference and we can safely use it. This plot shows the empirical distribution of the ratio over all the test images from the Berkeley segmentation dataset. As you can see, in most cases, over 90%, the ratio is very close to 1, larger than 0.8. So in practice the independence assumption doesn't change the approximation much, and we can simply use it.

Coming back to the toy example we saw before, for which we know the exact value, given over here: using our approximation without the independence assumption, the result is very close to the real value. Using the independence assumption, the error is larger, but it is still pretty close to the real value.

>> Question: Is the difference an over- or an under-estimate (inaudible)?

>> Hongzhi Wang: For the first one it's over, but for the second one, if you ignore the second term, it's a little bit under. For this example, yeah.

Okay. So now it's time to show how to compare the similarity between two images based on their shapes. Here are the equations. Given two images, we want to compute the mutual information between their segmentations, over here. To address the unreliability of the segmentations, we average over the segmentations of both images, over here. This is the definition. Again, the definition by itself is not very useful, but the good thing is we can rewrite it in another form. In this new form, the first two terms are the average structure entropies of the two images, which we already know how to approximate. The last term is the average joint structure entropy of the two images, which can be approximated in the same way but with a different affinity matrix: the joint affinity matrix is obtained by simply multiplying the affinities coming from the two images. That's it. So here is the final approximation for comparing the shape similarity between two images. The numerator corresponds to the average segment size in the joint image. The denominator -- yeah.

>> Question: (Inaudible) --

>> Hongzhi Wang: So overall it measures the statistical dependence between the two images.
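Putting the pieces together, a sketch of the final approximation for two same-size images, where the joint affinity matrix is the elementwise product of the two affinity matrices (reusing the helpers above):

```python
import numpy as np

def shape_similarity(img_a, img_b, sigma=0.1):
    """Approximate normalized mutual information between the
    segmentation distributions of two images of equal size."""
    Wa = gaussian_affinities(img_a, sigma)
    Wb = gaussian_affinities(img_b, sigma)
    h_a = avg_structure_entropy(Wa)
    h_b = avg_structure_entropy(Wb)
    h_ab = avg_structure_entropy(Wa * Wb)   # joint affinities
    return (h_a + h_b - h_ab) / h_ab
```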
>> Question: Let me see if I'm still following this. So for pixels m and n: you take two pixels in image one and you compare the gray levels; if the gray levels are close you say the affinity is close to 1, and if the gray levels are far apart you say it's close to 0, right? You pass them through a (inaudible) --

>> Hongzhi Wang: Yeah.

>> Question: You can do that. For image two you can pick the same pair of pixels. But it is a different image, so the colors will be different, right?

>> Hongzhi Wang: Yeah.

>> Question: Now for the top you're saying we just basically multiply those two numbers, right? Just -- okay.

>> Hongzhi Wang: And basically this is just one way to compute the affinities, using a Gaussian on the intensities. There are other ways. Also, one variant of the Gaussian on intensities enforces a spatial constraint: if the two pixels are close to each other we use the Gaussian to compute the affinity, otherwise we just set the affinity to 0 or something.

>> Question: So basically if the pixels are very similar in both images, then that means the --

>> Hongzhi Wang: Then they also have a high probability of being in one segment in both, yeah --

>> Question: But if one image is similar and the other image is different --

>> Hongzhi Wang: Yeah.

>> Question: -- then something is different and that's more likely (inaudible).

>> Hongzhi Wang: Yeah.

>> Question: Okay.

>> Hongzhi Wang: Essentially it encourages the two affinities to have similar responses.

>> Question: Yeah. Okay.

>> Hongzhi Wang: All right. So here is a summary of this part. This averaging over segmentations is used throughout the segmentation matching. It keeps all the good properties of using segmentations for matching, but we no longer suffer so much from unreliable segmentations, and even better, we don't need to compute an actual segmentation for shape matching at all.

>> Question: When you say compute segmentations, do you mean normalized cuts, or which -- if you were to compute segmentations, what class of algorithms would you be running?

>> Hongzhi Wang: Any one. Any one you could use: normalized cuts, mean shift.

>> Question: So mean shift is also defined in terms of affinity matrices?

>> Hongzhi Wang: In a way it is, because it is based on kernels. But it doesn't really require the segmentation algorithm to have the same interpretation as we formulated. When you use image segmentation to represent and compare shape, whatever the segmentation algorithm is, we only care about the segmentation itself.

>> Question: But you're making the assumption that the affinities capture important information --

>> Hongzhi Wang: Yeah.

>> Question: Yet the only thing you ever look at is the affinities. So you must be implicitly comparing yourself against algorithms that segment based on those affinities.

>> Hongzhi Wang: Yes.

>> Question: If you pulled a totally different segmentation algorithm out of a hat, your approximation wouldn't be (inaudible) --

>> Hongzhi Wang: Yeah, that makes sense.

>> Question: So I'm curious what the range of segmentation algorithms is that exploits the kinds of --

>> Hongzhi Wang: Actually, as far as I know, most of the available -- at least the leading -- segmentation algorithms are based on this assumption; they use the affinities between pixels to do the grouping.

>> Question: (Inaudible) algorithms?

>> Hongzhi Wang: For example, mean shift, normalized cuts (inaudible), and also some graph-theory-based segmentation algorithms. They all, implicitly or explicitly, use affinities between the pixels (inaudible). Yeah. All right. So next I will show some applications of these approaches.
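For completeness, here is a sketch of the spatially constrained affinity variant mentioned in the exchange above: a Gaussian on intensity difference for pixels within a small radius of each other, and affinity 0 for distant pairs (the radius and sigma values are illustrative):

```python
import numpy as np

def spatial_gaussian_affinities(img, sigma=0.1, radius=5):
    """Gaussian intensity affinity restricted to nearby pixel pairs;
    distant pairs get affinity 0."""
    h, w = img.shape
    v = img.ravel().astype(float)
    n = h * w
    ys, xs = np.divmod(np.arange(n), w)
    W = np.zeros((n, n))
    for m in range(n):
        near = (np.abs(ys - ys[m]) <= radius) & (np.abs(xs - xs[m]) <= radius)
        d = v[near] - v[m]
        W[m, near] = np.exp(-d ** 2 / (2 * sigma ** 2))
    return W
```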
First, we show how to use the algorithm for several applications: image segmentation and image smoothing. For the image segmentation application, our motivation is that low-level image segmentation is a very ambiguous process. As this example shows, these are the ground-truth segmentations labeled by human subjects for one image; as you can see, there are big variations across the different segmentations. This huge variation suggests that plausible image segmentations usually have a flat distribution, and for a random variable with a flat distribution, the mean estimator is statistically more robust than the maximum a posteriori estimator. The point is that our approximation technique allows us to compute a mean estimator for image segmentations.

Let's first recall the definition of the mean estimator: it is the estimate that gives the least variance for the random variable with respect to some distance metric. Formally it is defined over here, where this term represents the distance metric for the random variable. In our case we define the central segmentation, which gives the least average distance to all the other segmentations. Formally it is defined over here; the capital V represents the distance metric for image segmentations, the variation of information. Basically it is a variant of mutual information -- in fact, minimizing this distance is equivalent to maximizing the mutual information between the two segmentations. As you may have noticed, this is not exactly the mean estimator, because we are using the distance instead of the distance squared. But this is a technicality: we can show that the central segmentation is the mean estimator with respect to the distance metric given by the square root of V.

The point is that our approximation approach allows us to compute this cost function, so we can minimize it to search for the optimal segmentation. Here is one way to search for the optimal segmentation, using greedy merging (see the sketch a little further below). Initially the segmentation is the trivial one, where each pixel is an individual segment. We keep merging neighboring segments as long as the merge decreases the cost function, and we stop when we cannot decrease the cost function anymore. This is just one way to do the optimization based on the approximation; for general image segmentation we can use a gradient descent optimization procedure. Here are demos showing the process: in each round of the optimization we use the derivative of the cost function to update the segmentation, until we converge to a local minimum.

For quantitative evaluation of the segmentation results, we propose a new criterion to evaluate the quality of a segmentation: the average distance from the segmentation to the ground-truth segmentations. Here is an example of how to use this criterion. All these segmentations were labeled by humans, so they are considered ground-truth segmentations. Below each segmentation is the average distance from it to all the other ground-truth segmentations. The criterion states that the smaller the average distance, the better the segmentation.

>> Question: Here we are comparing two segmentations.
You are using that very early definition.

>> Hongzhi Wang: Yeah. And you don't have to average over segmentations here, because the segmentations in this case are definite.

>> Question: Overlaying the two segmentations, taking the intersection of the regions and then computing (inaudible) --

>> Hongzhi Wang: Exactly.

>> Question: Okay.

>> Hongzhi Wang: Using this criterion, this segmentation is considered the best. It also has a meaning: on average, this segmentation is the most similar to all the other ground-truth segmentations. Using this criterion, we applied our segmentation algorithm to the Berkeley test images and compared it with mean shift, normalized cuts, and the efficient graph-based method developed by Felzenszwalb. For each test image we segment the image into different numbers of segments: 10, 20, 30, 40 and 70. This gives the mean of the average distance over the test images. As you can see, our approach is more consistent than the competing methods, and overall it performs much better than the competing approaches.

Another application of this method is image smoothing. Since our approach measures the similarity between two images in terms of their segmentations, we can ask one question: given an image I, what is the image most similar to I? Formally it is defined over here: we want to find the optimal image, the one most similar to I. The answer is that it is actually different from I: it is a smoothed image, but with the boundaries of the image preserved. Here are two demos of the smoothing process. The first row is the original image; starting from the original image, we try to find the optimal image by maximizing the similarity between the lower image and the upper image. You can see that this value here, the mutual information, is actually increasing; the change is very small, as you can see. The point is that as the similarity increases, the image becomes quite smooth, with the boundaries, the major structure, preserved. The reason is that the central segmentation we compute is the segmentation most similar to the whole distribution of segmentations for the image. For the optimal image, the distribution of segmentations is tightly clustered around the central segmentation. So as you optimize the image, the distribution gradually clusters around the central segmentation, and in this way the image becomes smooth while its structure stays very close to the central segmentation. Yes?

>> Question: So you are saying the smoothed image, when run through a segmentation algorithm, will have a segmentation that is most similar to the --

>> Hongzhi Wang: The central segmentation (inaudible).

>> Question: Central segmentation for?

>> Hongzhi Wang: For the original image.

>> Question: The original image. So you have your algorithm (inaudible) image and over all possible segmentations will compute the one that is most central.

>> Hongzhi Wang: Yeah.

>> Question: For some range of segmentations, some algorithm that generates segmentations?

>> Hongzhi Wang: No, it doesn't work that way, because the algorithm is based on the affinities. Basically, after you compute the affinities for an image, they implicitly define the distribution of segmentations for this image, and we compute the central segmentation based on that distribution. Yeah. All right. Okay.
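Going back to the central segmentation, here is a sketch of one plausible reading of the greedy merging search mentioned earlier. Using V(S, C) = 2 H(S, C) - H(S) - H(C), the expected distance from a candidate C to the random segmentation S is E[V(S, C)] = 2 E[H(S, C)] - E[H(S)] - H(C); the E[H(S)] term is constant in C and can be dropped, and E[H(S, C)] can be approximated with the joint affinity, i.e. W masked to the pixel pairs that C keeps together. This reuses avg_structure_entropy from the earlier sketch, and the adjacency list of candidate neighboring pixel pairs is an assumed input.

```python
import numpy as np

def label_entropy(labels):
    sizes = np.bincount(labels)
    p = sizes[sizes > 0] / labels.size
    return -np.sum(p * np.log(p))

def central_cost(labels, W):
    """~ E[V(S, C)] up to a constant: 2 * E[H(S, C)] - H(C), where the
    joint of random S with fixed C has affinity W * [same segment in C]."""
    same = (labels[:, None] == labels[None, :]).astype(float)
    return 2.0 * avg_structure_entropy(W * same) - label_entropy(labels)

def greedy_central_segmentation(W, adjacency):
    """Start with one segment per pixel; merge a neighboring pair of
    segments whenever it decreases the cost; stop when no merge helps."""
    labels = np.arange(W.shape[0])
    cost = central_cost(labels, W)
    improved = True
    while improved:
        improved = False
        for a, b in adjacency:          # candidate neighboring pixel pairs
            la, lb = labels[a], labels[b]
            if la == lb:
                continue
            trial = np.where(labels == lb, la, labels)
            c = central_cost(trial, W)
            if c < cost:
                labels, cost, improved = trial, c, True
    return labels
```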
So the next application is using this for shape matching. The first one is object detection: given an image, we want to see whether there is an object of the class inside the image, and where it is. Currently, to address global appearance variations, most approaches use local features. As I will show, with our approach, using global template matching, we can still achieve excellent results. The detection procedure is: we first need a template for the class; then we run a scanning window across the image; then we use a threshold to detect the object. As I said before, we assume the objects already have similar scales. The first database we use is the (inaudible) database. For this database we already have some training examples, so we use the average affinity matrix as our template. To give you a rough idea of the shape structure of the template, this image shows the rough structure: the intensity at each pixel corresponds to the arithmetic mean segment size for that pixel, computed from the average affinity matrix. If it is bright, the pixel sits in a large segment; if it is dark, it is close to a boundary in most segmentations. The other two databases we use are the human face and the (inaudible) databases.

So here are some detection results. The first methods we compare against are the similarity template approach and the probability syntax approach. The reason we compare against these two is that they also compare image similarity based on image affinities. The only difference is that our approach is derived from image segmentation and mutual information, so our metric has a better statistical meaning. As we can see, our method performs better than the competing approaches. Here we compare with two shape matching techniques that are based on local shape features. These two methods use a similar training and detection strategy: given training images, they first figure out which edge fragments are most informative for the class, then use the learned classifier to label the object in new test images. Again, our method performs better than the local shape matching approaches, which shows the importance of using global shape for shape matching.

The last application is shape-based tracking. The reason we want to use shape is that shape is more specific than an appearance histogram as a representation. Currently most shape trackers use snakes, which only use the silhouette of the object; if you want to include the inner shape, if you want to use the complete shape of the object, you have to use some training algorithms. Here we show that using our global template matching approach we can use all the shape details. The tracking algorithm is: given the first frame, you label the object you want to track, and we find the most similar one in the next frame. Here are some videos to demo the tracking. Again, this (inaudible) shows the rough shape structure of the template being used, computed the same way as before. To handle occlusion, we just smooth the template over time by averaging, so this way we can handle the occlusion part. All right.
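Here is a sketch of the scanning-window detection loop just described: each window is scored against an averaged template affinity matrix with the approximate normalized mutual information, and windows above a threshold are reported (the window size, step, and threshold values are illustrative, and the helpers come from the earlier sketches):

```python
import numpy as np

def scan_for_object(image, template_W, win_h, win_w, step=4, thresh=0.6):
    """Global template matching by scanning window: template_W is an
    averaged affinity matrix of shape (win_h*win_w, win_h*win_w)."""
    h_t = avg_structure_entropy(template_W)
    hits = []
    H, W = image.shape
    for y in range(0, H - win_h + 1, step):
        for x in range(0, W - win_w + 1, step):
            window = image[y:y + win_h, x:x + win_w]
            Ww = gaussian_affinities(window)
            h_w = avg_structure_entropy(Ww)
            h_j = avg_structure_entropy(Ww * template_W)
            score = (h_w + h_t - h_j) / h_j     # approximate NMI
            if score > thresh:
                hits.append((y, x, score))
    return hits
```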
And here we compare the shape matching algorithm with the pyramid histogram of gradients. The pyramid histogram of gradients comes from object detection and is also considered a global shape matching algorithm. The test here is on a video sequence with great lighting changes. As we can see, since the pyramid histogram of gradients is based on local edge detection, it is very sensitive to illumination changes. But our approach is based on averaging over all the possible segmentations, so it is more robust to this type of change. Here is the rough structure of the shape, and here are the actual tracking results.

>> Question: So that image in the lower left corner, what is that process?

>> Hongzhi Wang: This one is the rough shape structure of the template being used now.

>> Question: And the shape structure is being visualized by the gray level encoding the size of the region, the average size --

>> Hongzhi Wang: Yes.

>> Question: -- of the region, like you showed us for (inaudible).

>> Hongzhi Wang: Yes. All right. So here is the conclusion. The major points of this work: first, we propose to use image segmentations to represent and compare the complete shape in the image, and we use mutual information to do that. Second, to address the unreliability of image segmentations, we give a solution that averages over all the possible segmentations. This approach can be applied to shape matching and to image segmentation. And pretty much that's it; that's the work. So, any questions?

>>: Thank you for your talk.

>> Hongzhi Wang: Thank you. (applause)