>> Eyal Ofek: Good afternoon. It's my pleasure to invite George Leifman from the Technion in Israel to give a talk. George is finishing his PhD under the supervision of Ayellet Tal. He also works as a core architect at Intel. Thank you, George.
>> George Leifman: Thank you, Eyal. As you've been told, I am from the Technion. I'm finishing my PhD now, and my research mainly focuses on geometric processing problems in both computer graphics and computer vision. The common topic of all my works is similarity for shape analysis, and I would like to talk about this topic. Let me start by discussing each of the components of the title: what shape is, what shape analysis is, and what similarity is and how it is related to shape analysis. So let's start with the definition of shape. According to the Oxford dictionary, shape is defined as the external form, contours or outline of someone or something. When considering images, the outline, or shape, is a curve. My research focuses on 3-D objects, whose shape is given by the object boundary, which is a surface. The most common representation of surfaces today is a triangular mesh, which consists of vertices and faces, as you see here. Shape analysis aims at developing computational tools for understanding the object's shape. It is used in many application fields. In archaeology, for example, we want to find similar objects or missing parts. In architecture, we would like to identify objects that fit a specific space. There are also many other fields, like medical imaging, security, the entertainment industry, computer-aided design and manufacturing, and others. Let me mention a few specific applications that are also related to my research and show examples of my work in these areas. Let's start with segmentation. It is probably one of the most popular problems in graphics over the last 10 or 20 years. In segmentation the goal is to divide a given surface into meaningful components. Here you can see on the left a result of our hierarchical segmentation algorithm from 2005. You can see the sumo warrior segmented into the limbs, the head and the torso, and then each of these parts is further segmented into more semantic parts, like the feet, the fingers and the nails, and you can see the accurate hierarchy of the segments. You can see a very partial list of papers that were published in this area in the last decade. Another shape analysis topic is matching and retrieval. Here, for example, you can see a screenshot of our 3-D retrieval engine. This engine supports text- and shape-based queries and also lets the user provide relevance feedback. I call it Georgle, a combination of Google and George, hoping to have the same number of users, but we're still waiting for them; let's see. Another topic is composition. In composition algorithms the goal is to construct a 3-D object by reusing parts of existing objects. You can see an example of work I was involved in. This is our composition tool, which gets as input two objects, in this case a human and a horse, and by applying only a few quick operations you get a centaur, here on the right. And finally, let me discuss the third component of the title,
similarity. Similarity measures the resemblance between objects or between parts of objects. For all of the problems I mentioned and many others, determining similarity is essential. In some cases object similarity is needed, and in other cases vertex similarity is necessary. For object similarity, each object is represented by a descriptor that characterizes the object semantically, geometrically and topologically, and then we measure the similarity between two objects by measuring the distance between their descriptors. So in this case, in order to say that the horse is similar to the camel, we just compare the corresponding descriptors. In the case of vertex similarity, it is done by representing each vertex by a descriptor and then comparing the descriptors in the same way. Many descriptors have been proposed for both object and vertex similarity. I will not go through all of them, but let me mention a few. For global object descriptors, this is a partial list; the light field descriptor from 2003, for example, was known to outperform other descriptors for many years. It considers two objects similar if they look similar from all viewing angles. To build a local vertex descriptor, a vertex is characterized by a histogram or a feature vector that captures the geometry of the neighborhood of the vertex. You can see once again a partial list of descriptors. These are popular vertex descriptors, and I will discuss some of them later when I make use of them in my work.
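To make the descriptor-based pipeline concrete, here is a minimal sketch of object retrieval by descriptor comparison. The descriptor values, the tiny database and the plain Euclidean distance are my own illustrative placeholders; the actual systems discussed in the talk use descriptors such as the light field descriptor and more robust distances.

```python
import numpy as np

def retrieve_similar(query_desc, database):
    """Rank database objects by descriptor distance to the query.

    query_desc: 1-D numpy array, the query object's global descriptor.
    database:   list of (name, descriptor) pairs.
    Euclidean distance is a placeholder; any descriptor distance works.
    """
    scored = [(name, np.linalg.norm(query_desc - desc))
              for name, desc in database]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical usage, echoing the horse/camel example from the talk.
db = [("camel", np.array([0.9, 0.2, 0.4])),
      ("teapot", np.array([0.1, 0.8, 0.7]))]
print(retrieve_similar(np.array([0.85, 0.25, 0.45]), db))
```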
My research essentially addresses shape analysis problems with shape similarity at their basis, as I said before. I can classify my work into four main fields: matching and retrieval, reconstruction and composition, segmentation and colorization, and saliency detection. I will start by briefly describing my latest work on reconstruction and colorization. Then, in the second part of my talk, I will focus on one problem, which is saliency detection on 3-D objects. So let's start with the reconstruction of relief objects from line drawings. This was joint work with Kolomenkin, Shimshoni and Tal. Given the line drawing of an object, as you can see here on the left, our goal was to automatically produce a 3-D object from it, as you can see rotating here on the right. Our main application was the
reconstruction of archaeological artifacts, based on the line drawings in archaeological reports. While many findings may have been lost or destroyed over the years, the illustrations remain in the archaeological reports, and therefore often the only thing available for reconstruction is the line drawing. This problem is challenging for the following reasons. First, the lines are usually very sparse, and therefore the object is not fully constrained by the input. Second, line drawings are often ambiguous: a line can have different geometric meanings. For example, lines can indicate 3-D discontinuities, ridges, valleys, surface creases and so on. And finally, the input may consist of a very large number of strokes, and the algorithm must be efficient enough to handle them in reasonable time. We partition the reconstruction problem into two subproblems. First, we reconstruct the base, and then, atop this base, we reconstruct the relief, as you can see here. The main idea of the base reconstruction, which is also based on similarity, is that given the silhouette of the object, we look only at the outline and find in a database the 3-D object whose silhouette is most similar to that of the drawing. Then, once we have the most similar base object from the database, we try to deform it to better match the drawing. The second part, the relief estimation, is based on the idea of computing the relief by reducing the problem to topological sorting of a graph. On the way you also get… Yes?
>>: For every input do you always get this side and bottom? Is that what we're looking at here?
>> George Leifman: It depends on the archaeological report. In some cases we have only one view; in most of the cases we have two views, and in some very complex cases we have three views in the drawings.
>>: But in all of them do you have something like this, some [indiscernible] wherever, and the task is [indiscernible] the bottom. Like do you ever have ones where it's a cup or a vase or a reconstructed side?
>> George Leifman: [indiscernible] I should show some examples later, but the algorithm works with these inputs. It can work with only one input, and it can work with two and even three.
>>: Okay. So when you say base you don't mean bottom; you mean…
>> George Leifman: No. When I say base, I am talking about this, the smooth object that doesn't contain any relief. So essentially what we do is take these two drawings and extract only the outline of each of them. According to that, we find in the database the most similar object. After we deform it to match the original drawing even better, we have this smooth base; then we take the second step, the relief estimation, which gives us a relief, and we put it on top of that smooth base. Okay?
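Since the talk only says that the relief estimation reduces to topological sorting of a graph, here is a deliberately simplified sketch of that idea, with made-up region names and unit height steps: regions separated by the detected lines become graph nodes, each line contributes a "higher than" constraint, and a topological order yields consistent relative heights. The actual method in the paper is considerably more involved.

```python
from collections import defaultdict, deque

def relative_heights(constraints):
    """Assign relative heights to relief regions from 'a is higher than b'
    constraints via topological sorting (Kahn's algorithm)."""
    above = defaultdict(list)   # b -> regions directly above b
    indeg = defaultdict(int)
    nodes = set()
    for hi, lo in constraints:
        above[lo].append(hi)
        indeg[hi] += 1
        nodes.update((hi, lo))
    height = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)  # lowest regions first
    while queue:
        n = queue.popleft()
        for m in above[n]:
            height[m] = max(height[m], height[n] + 1)  # one step up per constraint
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return height

# Hypothetical constraints: each pair means (higher region, lower region).
print(relative_heights([("horse_relief", "base"), ("mane", "horse_relief")]))
```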
I just wanted to mention that last week [indiscernible] a similar work on reconstructing 3-D objects from images, by [indiscernible] from Michigan. They also used some kind of prior and tried to deform it to get a better 3-D object; in our case, I think that by using the database it should be even easier to get accurate results. I'm not going into the details; if you want, you can ask me or see the paper, but I will show a few results. For example, here is part of a Roman oil lamp. You can see here the outline of the object and then the relief, and both… you can see that also the object and the…
>>: [indiscernible] and see, this is very gentle, probably [indiscernible] final relief, and I was wondering why. Is it because the geometry is not fine enough, or what is the relief [indiscernible] what is at the bottom of the flower? This bunch is not represented anywhere [indiscernible]
>> George Leifman: This one?
>>: Yes, this bunch. It's not represented…
>> George Leifman: We have small stuff here…
>>: So without going into this, I'm just curious if this is because the resolution of the geometry is not fine enough to create these, or…
>> George Leifman: It's probably because of the pre-processing we do on the image. The image here is pretty…
>>: So you lose that in the pre-processing?
>> George Leifman: Probably. You can see here there are two or three of them, but during the processing we need to get all of the edges, so we just run an edge detector on the image to understand where the lines are. Some of them, like this one, got lost in the process. But in most of the cases you will see the fine details. For example here; also here, some of them are smoothed. Another result is the full Roman oil lamp, with the relief of the horse and the outline of the object, and both are pretty similar to the original one. This one…
>>: [indiscernible]
>> George Leifman: It's done with drawing tools. The only thing that remains is in the archaeological report.
>>: No artifacts, where you have both the artifact and the drawing, to see if this…
>> George Leifman: We really don't have both of them. In some of the cases, along with the drawing you have a picture, a photo of the object, and we tried to compare; but most of the comparison was done from memory, and it looks pretty similar. Even now we're working on follow-up work on how to use both the drawing and the photo together, to enhance the results and maybe also to put a texture on the object, but that was not part of this specific work. Here we can see another case, a relief on a vase; this relates to the question you asked about the views of the top and the bottom, and in this case the outline of the object is similar.
>>: [indiscernible] how did the original database object…
>> George Leifman: We have it in the paper; I don't think there will be time in the presentation to show you, but the paper has all of the steps. In this case, I think the base object was a very smooth vase without any rough edges, just a goblet with a smooth bottom, and on this…
>>: [indiscernible]
>> George Leifman: No, in this case we didn't use this assumption. If you use it, you get better results. To tell you the truth, when you look at this object from the top, you will see that the circle here is not a complete circle, but it's very similar to one, only slightly distorted, since we didn't use the assumption that the object is a surface of revolution. If you push it inward a little bit, it becomes a complete circle. In this case you have a lot of lines; I think we had 500 or 600 lines that needed to be accurately reconstructed in the relief, and all the lines are interconnected, which has its challenges. Okay. This was the first topic. Now let's briefly describe our colorization work;
we have two papers on this topic. In general, colorization is a computer-assisted process of adding color to black-and-white images or movies. In the past, many colorization methods were proposed in computer graphics and computer vision; here you can see a pretty result from Levin et al. in 2004, where by adding only a few scribbles on the image, as you see on the left, and running the algorithm, you get the nice colorful image on the right. We extended this approach to surfaces in three dimensions. Usually surfaces are textured with images, but some applications don't really need rich textures; they require colorization with only a few colors. An example of such an application is the reconstruction of ancient cultures. Here you can see the painted result of the reconstruction of the statue of Caligula: essentially, they made a replica of the original statue, took a brush, and painted different colors on the object. We propose an algorithm that can perform this task fully automatically on the computer. Here you can see an example of applying our algorithm: a statue of a Greek [indiscernible] being colorized using our algorithm. We start by scribbling two colors, on the left, and the system propagates the colors to the rest of the mesh, and the hair is colored in brown. Now, to make the figure more alive, we add red to the hair band and pink to the lips, and you can see, in less than a few minutes, the result on the right.
>>: So how do you diffuse…? In the original colorization work, if you use the…
>> George Leifman: This one?
>>: They diffuse the changes in saturation and value according to the change in intensity. So how do you guys, how do you change the…
>> George Leifman: I will talk about that. The extension from images to meshes is not straightforward, due to the following issues. First, the choice of points that get the same color: in 2-D, as we said, it was done according to intensity, so if you have two neighboring pixels with the same intensity, most likely they also have the same RGB color. In the case of 3-D meshes, we don't have any intensity channel, only the geometry. The second challenge is color bleeding, which occurs also in images; in images the technique is to use edge detection, and by using the edges, to avoid the color bleeding. On meshes, most 3-D edge detection algorithms produce broken curves, and that cannot be used directly. So to overcome the first problem, in order to choose the neighboring vertices that should have the same color, we use vertex similarity. Essentially, each vertex is characterized by a descriptor that captures the geometry of its neighborhood, and if two vertices are similar from the point of view of geometry, we say that these vertices should have the same color. This assumption leads to an optimization problem, similar to the one in Levin's work, whose solution is the colorization itself.
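As a rough sketch of what such an optimization can look like, here is a minimal Levin-style least-squares colorization on a vertex graph. The Gaussian weighting of descriptor distances and the tiny hand-made graph are my own illustrative assumptions, not the exact formulation of the paper.

```python
import numpy as np

def colorize(n_vertices, edges, desc_dist, scribbles, sigma=0.5):
    """Propagate scribbled colors over a mesh graph.

    Minimizes sum_ij w_ij (c_i - c_j)^2 subject to the scribble constraints,
    where w_ij decays with the geometric descriptor distance of the vertices.
    edges:     list of (i, j) vertex pairs.
    desc_dist: dict mapping (i, j) -> descriptor distance.
    scribbles: dict mapping vertex index -> color value (one channel).
    """
    A = np.zeros((n_vertices, n_vertices))
    b = np.zeros(n_vertices)
    for i, j in edges:
        w = np.exp(-desc_dist[(i, j)] ** 2 / sigma ** 2)
        A[i, i] += w; A[j, j] += w
        A[i, j] -= w; A[j, i] -= w
    for v, color in scribbles.items():   # pin the scribbled vertices
        A[v] = 0.0
        A[v, v] = 1.0
        b[v] = color
    return np.linalg.solve(A, b)         # solve once per color channel

# Hypothetical 4-vertex chain; vertices 0 and 3 are scribbled.
edges = [(0, 1), (1, 2), (2, 3)]
dists = {(0, 1): 0.1, (1, 2): 2.0, (2, 3): 0.1}   # a sharp feature between 1 and 2
print(colorize(4, edges, dists, {0: 1.0, 3: 0.0}))
```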
The second issue is color bleeding avoidance; here we propose a novel direction field on the mesh. We can see here, for example, color bleeding from one region to another, since there was a gap in the 3-D edge. So we propose to use a direction field: we direct the vectors of the direction field to have opposite orientations when they reside on different sides of an edge. You can see here that the constraints are simple, and we just want the field to be smooth, so the green lines are generated automatically; this direction field is then used to penalize the similarity measure between vertices across edges. As a result, we don't have any spilling of the color.
>>: [indiscernible] generate diffusion [indiscernible]
>> George Leifman: Diffusion. In general, those are constraints: along the feature lines you have the blue or the red lines and arrows, and then we want the whole field to be smooth, and as a result we have the green lines. Once again, I'm not going into the details, but let me show a couple of results. This one is a statue of David and Goliath; with only a small number of strokes, you get the colorization you see on the right. Another example here is a pretty complex mesh: you can see, for example, the fingers and the way they are separated, and it works nicely. Another challenge is the dress: it has multiple folds, in other words many convexities and concavities, and still a couple of strokes across it managed to colorize it without any problem. Another example. This one is the most complex: here we can see many characters in different poses, dressed in a variety of ways, holding each other, and all the features are highly interlaced; still, with around 50 or 60 strokes, our algorithm colorizes the whole scene very accurately. Okay. So the main drawback of this
algorithm was the fact that when we had patterns, in order to colorize the patterns, we needed to add color on each instance of the pattern. For example, here, to colorize this octopus we needed to put color on each of the suction cups. It's possible, but it took me around half an hour to put the color on each one of them. So in the follow-up work [indiscernible] we proposed an algorithm that handles this problem automatically. Here you can see a vase with different patterns, and scribbling only once, on a single instance of each pattern and around it, is enough to colorize the whole surface according to the patterns. Note, by the way, that not all the instances of the same pattern are identical: we have reflections, we have small-scale variations, and in some of the cases even non-rigid deformations, and still the algorithm handles these cases.
>>: Do you get a sense of how invariant the algorithm is to which pattern you select?
>> George Leifman: What do you mean?
>>: Well, say your input is all the ones in the center, which are also…
>> George Leifman: We could also have gone around [indiscernible] so why was I able to scribble in the center and not on the side?
>>: No, my question is more like, do you have any sense of when this starts to break down? How different do the patterns have to be?
>> George Leifman: I will show some of the limitations at the end, if the pattern varies [indiscernible]. The algorithm is fully robust to translation, reflection and rotation. In principle the algorithm is fully robust to scale as well, but the implementation, in order to run in reasonable time, is robust only to small changes of scale, since it searches specific windows, and more than that was…
>>: If the patterns were on a more concave surface, would that still color?
>> George Leifman: Yeah, it would still color, most likely, yes. So let me talk about the main steps of the algorithm. Given the 3-D object, and here I zoomed into the previous surface, the user scribbles only on one instance and around it, and then we produce a set of automatic strokes. To produce the automatic strokes, we first characterize each vertex with a descriptor, and then we classify these descriptors according to the pattern they belong to. The result of the classification is these automatic strokes. As you can see, not the whole surface gets labels, only part of it; it is enough to keep only the results we are fully confident in to get the colorization. The last stage is applying the colorization from the previous paper with these automatic strokes, and we get this result.
>>: [indiscernible]
>> George Leifman: It's in the paper, of course. Not here, but I can describe it in a couple of words. First of all, for each vertex we perform a [indiscernible] segmentation by using the original colorization: we run from each vertex the colorization algorithm whose input is the yellow strokes and the brown ones, and as a result you can see that the region of the goblet is fully colorized in brown and the rest is not. We perform this for all of the vertices, and as a result we have for each vertex some kind of region that describes it, and this region is very similar to the original pattern. Then we produce two descriptors: one describes the region itself, and the second characterizes the curve that bounds the region. For the region itself we use the PFH descriptor, from 2008 I think. Do I have a slide? No, I don't have a slide about it. Essentially, the descriptor is a histogram which records, for each pair of vertices, the angular relations between their normals: for each pair of vertices with their corresponding normals, instead of using the full coordinates of both vertices and normals, twelve numbers, we use only three angles between the pair. We use this descriptor to describe the region, and to describe the boundary we use [indiscernible] to characterize the boundary curve. And then we start with the classification process.
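For reference, here is a small sketch of the PFH-style pair features just described, following Rusu et al.'s Point Feature Histograms formulation as I recall it: each pair of oriented points is reduced to three angles defined in a local Darboux frame. Treat the exact frame conventions as an assumption and check the original paper before relying on them.

```python
import numpy as np

def pair_features(p_s, n_s, p_t, n_t):
    """Reduce an oriented point pair to three PFH-style angles.

    p_s, p_t: 3-D points; n_s, n_t: their unit normals.
    Returns (alpha, phi, theta); a PFH descriptor histograms these
    angles over all point pairs in a neighborhood.
    """
    d = p_t - p_s
    dist = np.linalg.norm(d)
    d = d / dist
    u = n_s                          # Darboux frame anchored at the source point
    v = np.cross(u, d)
    v = v / np.linalg.norm(v)
    w = np.cross(u, v)
    alpha = np.dot(v, n_t)
    phi = np.dot(u, d)
    theta = np.arctan2(np.dot(w, n_t), np.dot(u, n_t))
    return alpha, phi, theta

# Hypothetical pair of oriented points.
print(pair_features(np.array([0., 0., 0.]), np.array([0., 0., 1.]),
                    np.array([1., 0., 0.5]), np.array([0., 1., 1.]) / np.sqrt(2)))
```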
The problem here with classification is, first, that the number of examples is very small. Enriching the number of positive foreground examples, the brown ones here, was pretty simple: we can just use the colorization and determine which of the points are positive, foreground examples. But the negative examples are problematic, since we don't know where the next instance of the pattern starts, so we can't just do a flat fit here. Because of this, we use bootstrapping to enrich the number of examples: we run a one-class SVM on the region descriptor, and by that we find more positive examples, more brown labels. But the problem with using only a one-class SVM, which uses only positive examples as training data, is that it also finds many false positives. For example, in this case, in addition to labeling every brown pattern on the goblet, it also added some brown dots on these flowers. So we then use the second descriptor, the one describing the boundary of the shape, to filter out the cases which were incorrectly classified by the one-class classifier, and as a result we keep only the correct positive examples, right here. Okay. Another property of our setting, as I said, is that misclassification is totally unacceptable: even one misclassified input stroke can eventually lead to the wrong colorization of the whole surface. To overcome this problem, we don't classify all vertices of the surface; we classify only the ones we are really confident in. In the case of an SVM, for example, we can use only the vertices that are very far from the separation plane, and in these vertices we are very confident.
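Here is a minimal sketch of this bootstrapping step with scikit-learn's OneClassSVM, keeping only high-confidence detections via the decision function. The threshold value and the toy descriptors are my own placeholders; the actual method additionally filters candidates with the boundary descriptor.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Toy region descriptors: known positives plus unlabeled candidates.
positives = rng.normal(loc=0.0, scale=0.1, size=(30, 8))
candidates = np.vstack([rng.normal(0.0, 0.1, (20, 8)),    # same pattern
                        rng.normal(2.0, 0.1, (20, 8))])   # a different pattern

# Train on positive examples only, then score the unlabeled candidates.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(positives)
scores = clf.decision_function(candidates)

# Keep only candidates far on the positive side of the separating surface;
# the margin threshold is a placeholder to be tuned.
confident = candidates[scores > 0.05]
print(f"{len(confident)} of {len(candidates)} candidates confidently labeled")
```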
Okay. So once again, this was only a brief description of the algorithm, and I'm not going into the details, but you can see here some of the results. This one is a 3-D object of stars on a knot, and you can see that two strokes around a star and two strokes on a star give this result, where pretty much all the stars were correctly colorized.
>>: [indiscernible]
>> George Leifman: To run this one? This example is around, I think, 100 vertices, and it runs in 20 or 30 seconds. This example is more complex: you can see that this piece of jewelry is symmetric and has repeating reliefs on it, and I scribbled a couple of colors on one eighth of the symmetric parts and the whole object was…
>>: [indiscernible] same geometric shapes differentiated by scale [indiscernible] so you would prefer to have the same scale; if there are contradictions that can cause [indiscernible]
>> George Leifman: There are two issues here. First, as I said before, the algorithm is scale invariant in principle, but the way we implemented it, the search window is fixed; for example, if you take this little ball and the large ones, the implementation will differentiate between these cases. But in some cases we had very similar circles that were colored in different colors, and the reason is enabled by this specific pattern. I'm not sure I mentioned it before, but when I'm talking about the region, as I described it, it describes the region of a specific pattern together with its immediate surroundings. So in this case, for example, you can see that this green ball and this green ball are the same, but the surroundings are different; to make them the same color, we needed to put the color on both of them. So don't look just at the balls or the circles; look also at the surroundings, because for the circles to get the same color, the surroundings and the pattern itself should all match. In some cases this can be an advantage; used wrongly, in some cases it is a disadvantage. Another example is this chandelier, where once again only a few scribbles give this result and all the identical parts are colorized. In this case all of the parts are completely identical, so it was a pretty simple example. Okay. In the paper there was another example of the famous statue of three elephants and three creatures and [indiscernible]. You used it in your paper, I think.
>>: [indiscernible]
>> George Leifman: This specific object was a scanned object and it had noise. I don't have an example to show you, but say one instance here is noisy and not completely similar: in most of the cases, up to some threshold, the algorithm can handle it automatically, but sometimes you need to put a color on one more instance, and then the algorithm learns from two instances. Say, for example, you have ten instances of the same pattern and it varies, and you take two different ones of them and colorize them with the same color; the algorithm will learn from both of them, and as a result it will colorize all the other eight correctly. So for this specific statue I needed to scribble twice or three times on the same pattern to get the results, because it was scanned.
>>: What do you do with [indiscernible]
>> George Leifman: I don't have any problem with [indiscernible]
>>: [indiscernible]
>> George Leifman: If it's not connected [indiscernible]
>>: [indiscernible]
>> George Leifman: If it's not connected [indiscernible], the descriptor will just assume, for example, that the parts that are not [indiscernible] should be different patterns.
>>: How do you do it with the topology?
>> George Leifman: I do use the topology, but if you have an object with a specific pattern that is split into two disconnected components, it won't work as one pattern; in all such cases it identifies the pattern as two parts and colors each of them consistently.
>>: So it looks like this [indiscernible] how robust it is when you use it [indiscernible] so I believe in [indiscernible]
>> George Leifman: We tried a lot of descriptors and then…
>>: [indiscernible] why did you end up with [indiscernible]
>> George Leifman: I ended up using PFH. PFH was proposed for the robotics community, I think, and the main reason we use it and not another descriptor is that it was very robust to noise; and specifically, if I'm not mistaken, PFH was designed to work on point clouds, without any mesh connectivity.
>>: A question this raises: there are descriptors that are very good at handling noise, and usually there are descriptors that are very good at theoretically representing the geometry. There is also the case, especially with reliefs, that most of the information would, from a robotics point of view, be considered noise, because the relief is so small relative to the surface that visually, if you aren't [indiscernible] of some kind, the [indiscernible] will be too small to [indiscernible]
>> George Leifman: Yes, the point is right, but I always use a search window scaled to the size of the feature I am working on. For example, in the simplest case here, the size of the search window is twice the bounding box of the instance. So you are right if you are looking at the whole object, but if you are looking at something small, the whole scale will be different, so each case…
>>: [indiscernible] for the subject there is a window…
>> George Leifman: No, automatically. Here I can take the bounding box of the [indiscernible] object and multiply it by two or by 1.5 or by…
>>: Assume the user [indiscernible] two of them [indiscernible]
>> George Leifman: The user, by drawing the input, defines the size of the search window for the subject. So essentially you are right, but the instruction to the user is to draw on the instance and around it. So in this case you can say that he could put the red stroke here and the yellow one here.
>>: No. [indiscernible]
>> George Leifman: It wouldn't work? The instruction for the user is to put the strokes of the background around the instance, and from that the algorithm finds the search window. So this was the colorization part. Now let me recap the talk so far. I showed an algorithm for reconstruction from line drawings, producing this object with a relief. Then I showed the colorization algorithm that takes the reconstructed object and, with a couple of strokes, gives this nice colorful object. And now I will talk in a bit more detail about the last topic, which is saliency detection and saliency-based viewpoint selection: given an object, find a viewpoint from which we see the most informative view of this specific object. Okay, so this is where I connect all my works. This was at CVPR last year. Detecting interesting or salient regions of an object has attracted a lot of attention in computer graphics and computer vision in the last decade. Most of the work has concentrated on images and videos; in this work I focused on saliency detection on surfaces, and then I will discuss the second part, viewpoint selection using this saliency. So please take a look at this object. As a human observer, which part of the object would you consider interesting? According to our algorithm, the red regions are the most interesting ones, and the blue represents the least interesting regions.
Many problems in computer vision and computer graphics can benefit from the detection of salient or interesting regions of a surface. Examples are similarity (it's your work), icon generation, viewpoint selection, simplification, and I could mention more. What are these interesting regions? I said that a lot of work was done on images and movies; for 3-D objects I am aware of only three works, and in the last year [indiscernible] was the fourth work, I think, which is not in this presentation. Lee et al. defined a measure of surface saliency based on the mean curvature. Other works, including Gal and Cohen-Or, detect regions where the curvature of the surface is inconsistent with its immediate surroundings, and thereby take into account the human tendency to be drawn to differences. In our case we also look for region distinctness, but unlike prior approaches we try to focus not only on local distinctness but also consider global distinctness. Another interesting approach was proposed by Shilane and Funkhouser; in this case the distinctness is based on the similarity between the given surface and similar objects in its class. For example, in the case of the animals, the shape of the ears of the Stanford bunny is unique to rabbits, and that's what distinguishes the bunny from the other animals, whereas the shape of the body is not very distinctive; therefore you can see that the ears are red and the rest of the body is green or blue. So our contribution is twofold. First, we propose an algorithm for detecting the salient regions of a surface, and then we present an algorithm for viewpoint selection whose goal is to make visible as much as possible of the regions of interest. Our approach considers three main principles. First, since most people are drawn to differences, we say that a region is interesting if it differs from other regions of the mesh, both locally and globally; therefore, we look for mesh vertices that are distinct in their appearance. For example, the vertices on the pointing index finger of this mesh are very distinctive. Second, it was found by [indiscernible] that shape extremities are also considered salient; for example, vertices on the tips of the hairdo and the tips of the feet should also be considered salient. And finally, we consider the human tendency to group close items together; therefore, regions that are close to a focus of attention are more interesting than regions that are far away from it. I will discuss each of these components and the way we realize them in our algorithm. So let's start with the first principle, distinctness. According to this principle, we look for vertices whose geometry is unique. First, we need a descriptor that characterizes the geometry of the vertex; then we look for an efficient similarity measure comparing the descriptors; and finally, we declare a vertex distinct if its descriptor is dissimilar to all other vertex descriptors in the mesh. There are many ways to realize each of these steps; we can choose different vertex descriptors and different similarity measures, and now I will describe our choices and why we use them.
We examined a lot of descriptors and found that the one giving the best results is spin images, from '99. A spin image is essentially a 2-D histogram that encodes the density of the vertices. More specifically, given a vertex v with normal n, two cylindrical coordinates are defined with respect to v and n: the radial distance r and the elevation e, as you can see on the screen. Then the spin image is created by quantizing e and r into bins and counting the number of vertices in each bin. As a result, we have a 2-D histogram of the vertex density.
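Here is a compact sketch of that computation, following Johnson and Hebert's definition as I understand it; the bin counts, ranges and the toy point set are placeholders.

```python
import numpy as np

def spin_image(v, n, points, n_bins=8, r_max=1.0, e_max=1.0):
    """2-D histogram of vertex density around v in cylindrical coordinates.

    v: base vertex (3,); n: its unit normal (3,); points: (N, 3) mesh vertices.
    For each point p: elevation e = n . (p - v); radius r = distance of p
    from the line through v along n.
    """
    d = points - v
    e = d @ n                                        # signed elevation along the normal
    r = np.linalg.norm(d - np.outer(e, n), axis=1)   # radial distance to the axis
    hist, _, _ = np.histogram2d(r, e, bins=n_bins,
                                range=[[0, r_max], [-e_max, e_max]])
    return hist / max(len(points), 1)                # normalize by point count

# Hypothetical toy cloud around the origin.
pts = np.random.default_rng(1).normal(scale=0.3, size=(500, 3))
img = spin_image(np.zeros(3), np.array([0.0, 0.0, 1.0]), pts)
print(img.shape)  # (8, 8)
```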
After choosing the descriptor, we look for a similarity measure between the descriptors. Following Ling and Okada, the similarity should be robust to small changes in the mesh, like noise or [indiscernible], so we use the diffusion distance, which models the difference between two histograms as a temperature field. Specifically, given two histograms h_{v_i} and h_{v_j}, we define the diffusion distance d by constructing a Gaussian pyramid over the difference of the histograms: the bottom level is d_0 = h_{v_i} - h_{v_j}, each next level is obtained by Gaussian smoothing and downsampling, and the final distance is the sum of the L1 norms of all the levels, as you can see here. This distance is considered a cross-bin distance between histograms, and it is pretty fast compared to EMD, for example.
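A minimal sketch of that distance, assuming a simple 1-D Gaussian smoothing kernel and three pyramid levels (both placeholders):

```python
import numpy as np

def diffusion_distance(h1, h2, levels=3):
    """Cross-bin distance of Ling and Okada: sum of L1 norms over a
    Gaussian pyramid built on the histogram difference."""
    kernel = np.array([0.25, 0.5, 0.25])      # small Gaussian placeholder
    d = np.asarray(h1, float) - np.asarray(h2, float)
    total = np.abs(d).sum()                   # level 0
    for _ in range(levels):
        d = np.convolve(d, kernel, mode="same")[::2]  # smooth, then downsample
        total += np.abs(d).sum()
    return total

# Two similar histograms whose peaks sit in neighboring bins score closer
# than histograms whose peaks sit far apart:
a = np.array([0, 1, 0, 0, 0, 0, 0, 0.0])
b = np.array([0, 0, 1, 0, 0, 0, 0, 0.0])
c = np.array([0, 0, 0, 0, 0, 0, 1, 0.0])
print(diffusion_distance(a, b), diffusion_distance(a, c))  # nearby < faraway
```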
>>: [indiscernible]
>> George Leifman: You have histograms, and in the case of histograms you always have two approaches. One is a bin-to-bin distance, which is fast but has problems when mass moves between neighboring bins because of noise and meshing differences. The other is a cross-bin distance, for example the earth mover's distance, which is very robust to all these small changes in the histogram, but EMD is very slow. The diffusion distance can also be considered a cross-bin distance, it averages over all the pyramid levels, and it's pretty fast. Finally, given the dissimilarity value d between each pair of vertices, we compute the distinctness value of each vertex: we take the vertex and compare it to all of the other vertices in the mesh, as you can see here. If the distance between the descriptor of this vertex and all other descriptors is high, the vertex is distinct, as you can see here, and it is colored in red; otherwise, if the distance is not high enough, it is non-distinct, and it is colored in blue. However, in our case, since we want global distinctness, this consideration alone is insufficient. We claim that a vertex is distinct when the vertices similar to it are nearby, and less distinct when the resembling vertices are far away. This is because when the similar vertices are far away, it can indicate a 3-D texture. Hence, the dissimilarity measure should be proportional to the difference in appearance and inversely proportional to the geodesic distance between the vertices. And finally, we say that a vertex v_i is distinct when it is highly dissimilar to all other vertices.
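A sketch of this distinctness computation, with a plausible form for the geodesic weighting; the exact weighting function, constants and the final squashing in the paper may differ.

```python
import numpy as np

def distinctness(desc_dist, geo_dist, c=3.0):
    """Per-vertex distinctness from pairwise descriptor distances.

    desc_dist, geo_dist: (N, N) matrices of descriptor (e.g. diffusion)
    distances and normalized geodesic distances between vertices.
    Appearance difference is attenuated by geodesic distance, so a vertex
    whose look-alikes are far away (a 3-D texture) scores low.
    """
    dissim = desc_dist / (1.0 + c * geo_dist)
    np.fill_diagonal(dissim, 0.0)
    mean_dissim = dissim.sum(axis=1) / (dissim.shape[0] - 1)
    return 1.0 - np.exp(-mean_dissim)   # squash to [0, 1)

# Hypothetical 3-vertex example.
dd = np.array([[0.0, 2.0, 1.5], [2.0, 0.0, 0.2], [1.5, 0.2, 0.0]])
gd = np.array([[0.0, 0.5, 1.0], [0.5, 0.0, 0.3], [1.0, 0.3, 0.0]])
print(distinctness(dd, gd))
```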
>>: [indiscernible] scales?
>> George Leifman: [indiscernible] all the computation [indiscernible]
>>: So if it's distinct in at least one scale, it's distinct?
>> George Leifman: Everywhere, in this case. If it's distinct in one of them, if it has a high score in one of them, [indiscernible] even after averaging it's going to be a high score [indiscernible]. We tried different approaches, also taking the maximum [indiscernible].
Okay. So until now we defined the distinctness of a vertex, which was the first principle; it took into account the dissimilarity to other vertices, weighted by the geodesic distance. The second consideration is shape extremities. As you can see here on the slide, the red dots are the shape extremities; the question is how to find these red dots. We use the following approach. Given the object, we first transform it to a pose-invariant representation, as you can see here: given this dog, we transform it into [indiscernible] representation. Then we determine whether it is a limb-like or a non-limb-like object, and finally we find its extremities. Let me discuss each of the steps. The first step is multidimensional scaling in 3-D: MDS transforms the mesh such that the Euclidean distances between points on the transformed mesh become similar to the geodesic distances between the corresponding points on the input mesh. As a result, you can see that all the parts of this monkey are straightened out, like the folded tail and [indiscernible]. So here we have a pose-invariant representation of this monkey; it fully ignores the pose. Now, to decide whether the object has a limb-like structure, we note that the volume of a round shape is similar to that of its convex hull, while the volume of a limb-like object differs from the volume of its convex hull. Therefore, we use a very simple procedure which works well in practice: we compute the ratio between the volume of the convex hull of the MDS-transformed mesh and the object volume, and if this ratio is high enough, we say that our object is limb-like; if it's low, close to one, we say that our object is not limb-like. It works nicely. Now, for the objects that are limb-like, we want to find the extremities. Given a limb-like object, we say that a vertex is extreme if it satisfies two conditions: first, it resides on the convex hull of the MDS-transformed mesh, as shown here on the left; and second, it should be a local maximum of the sum of geodesic distances from [indiscernible], as shown here. This definition yields an algorithm for computing the extreme vertices: given the mesh [indiscernible], first we compute the convex hull of its MDS-transformed mesh, and then, among all the vertices that reside on the convex hull, we find only those that satisfy the second condition, and as a result we have these extremities, marked with the red dots. Okay?
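A rough sketch of the limb-likeness test and the extremity search, assuming the geodesic distance matrix, the mesh volume and the vertex adjacency are given. scikit-learn's MDS and scipy's ConvexHull stand in for whatever the paper actually uses, and the 1-ring local-maximum test and the threshold value are my simplifications.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.manifold import MDS

def extremities(geo_dist, neighbors, mesh_volume, ratio_threshold=1.5):
    """Extreme vertices of a limb-like object (empty list if round).

    geo_dist:    (N, N) geodesic distance matrix of the input mesh.
    neighbors:   dict vertex -> adjacent vertices (mesh 1-rings).
    mesh_volume: volume enclosed by the input mesh.
    """
    # Pose-invariant embedding: Euclidean distances approximate geodesic ones.
    X = MDS(n_components=3, dissimilarity="precomputed").fit_transform(geo_dist)
    hull = ConvexHull(X)
    if hull.volume / mesh_volume < ratio_threshold:
        return []                         # volumes similar: round, not limb-like
    total_geo = geo_dist.sum(axis=1)      # sum of geodesic distances per vertex
    # Extremity: on the hull AND a local maximum of total geodesic distance.
    return [v for v in hull.vertices
            if all(total_geo[v] >= total_geo[u] for u in neighbors[v])]
```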
So until now we calculated the distinct and the extreme vertices. The final step of our approach takes into account the fact that a visual form may possess one or several centers of gravity about which the form is organized; therefore, regions close to a focus of attention should be more interesting, or more salient, than faraway regions. We model the focus as follows. We define the fraction of vertices with the highest distinctness values as the focus points, as shown on the left here, and the association a of a vertex is defined as a function of the distinctness of its closest focus point and of the distance to this point, as you can see here; you can picture it as small Gaussians around each of the focus points. Finally, we integrate all of the components together by applying this formula: the degree of interest of a vertex v_i is defined as the maximum of the distinctness d and the extremity e, taking into account the association a. The result is shown on the right. Let me show some results of the saliency of the object.
Again, the regions colored in red are the most interesting, most salient, and the blue ones are the least. You can see that in most of the cases the expected salient regions are found. For example, for the dog, the algorithm finds that the facial features, the feet and the tail are interesting. For the chess set, the chess pieces are more interesting than the board, and the unique pieces, like the Knight, the King and the Queen, are more interesting than the pawns, since there are many identical pawns and fewer of the unique pieces.
>>: [indiscernible] like in those eyes introduces [indiscernible]
>> George Leifman: This is some of them [indiscernible]. One point on the second [indiscernible] and you sum it over the whole mesh.
>>: In the previous example [indiscernible] you have [indiscernible] hair that has curls; something more interesting geometrically will steal the attention. The face wasn't really that interesting; it was interesting for us because we know the semantics [indiscernible]
>> George Leifman: Our algorithm has nothing to do with high-level features, only the low-level ones. But, for example, in the case of the hair, since we can see a 3-D pattern on the hair, we reduce its saliency, even though it is very distinct from a geometric point of view: if you have a pattern, then because of the regularization by dividing by the geodesic distances, we reduce the saliency of the hair. It's treated as a pattern.
Okay. So here you can see some comparisons. It's a good thing that Gal is not here; I can say his results were not very nice. For example, for the frog, you can see that the facial features are correctly found as salient, and also the features on the limbs; similar results you can see on the camel. This slide compares to Lee et al., 2005. Once again, since they use only the curvature, their results are much noisier, and every curve is salient; our results are more concentrated on specific areas. For example, here the body of this dinosaur is less salient in our case. So until now we demonstrated how to compute the saliency; now let me demonstrate the utility of the salient regions in viewpoint selection. The goal here, given a surface, is to select the camera position which gives the most informative view. In fact, we do even more than this: we automatically find the minimal set of views which jointly describes the surface. For example, here are two views.
The key idea is to maximize the visible ROI, region of interest; as demonstrated here, we select viewpoints that jointly cover the different regions of interest of the object. In this case, two views were sufficient to describe the whole object. I'm not going into all the details of the algorithm, but it is based on three main principles. First, we wish a viewpoint to reveal as much as possible of the salient regions; for that, we sample a set of candidate viewpoints on a bounding sphere, evaluate the quality of each viewpoint according to the size of the region of interest it reveals, and keep the best candidates. Second, the viewing angle is also important, so the contribution of each region is weighted according to the angle between the surface normal and the viewing direction; the weight is high if the area the region occupies in the projection is high. And finally, we should account for symmetry.
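A minimal sketch of this scoring, assuming per-face visibility is given; the sphere sampling, visibility test and the greedy selection of a minimal view set are all placeholders for whatever the paper actually does.

```python
import numpy as np

def view_score(view_dir, face_normals, face_areas, face_saliency, visible):
    """Score one candidate viewpoint.

    view_dir:     unit vector from the object toward the camera.
    face_normals: (F, 3) unit normals; face_areas, face_saliency: (F,).
    visible:      boolean mask of faces visible from this viewpoint.
    Each visible face contributes its saliency times its projected area
    (area foreshortened by the angle to the viewing direction).
    """
    cos = np.clip(face_normals @ view_dir, 0.0, None)   # back-facing faces -> 0
    return float((face_saliency * face_areas * cos)[visible].sum())

def best_view(candidates, face_normals, face_areas, face_saliency, vis_masks):
    """Pick the candidate direction revealing the most region of interest."""
    scores = [view_score(c, face_normals, face_areas, face_saliency, m)
              for c, m in zip(candidates, vis_masks)]
    return int(np.argmax(scores))
```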
There is no point in showing two viewpoints of a symmetric region. For example, in this case the first viewpoint selected by the algorithm is the front view of the statue, which reveals the maximum interest, and then it adds a view from the back, which, jointly with the previous one, reveals most of the regions of interest of this specific object. Okay. For this Buddha statue, we select three viewpoints to fully describe the object, and you can see that each of these viewpoints reveals new information; in this case we needed three. And finally, only one viewpoint was generated for this little teapot, which captures both the top and the side. Here a single viewpoint is sufficient, since the bottom of the object is not interesting; it also illustrates the consideration of symmetry: since the teapot is perfectly symmetric, there is no point in showing it from the other side. Here we compare on some of the statues with Laga's results; you can see that our results, on the top, are much more natural than the results on the bottom in all the cases.
>>: Can you, so, same question: the top ones are your method?
>> George Leifman: Yes.
>>: So take [indiscernible]
>> George Leifman: Okay. Worth mentioning, in Tal's algorithm we find the viewpoint selection, the rotation and [indiscernible]. So the only thing the algorithm finds is the viewpoint you are looking from, and the rotation is [indiscernible] projection is [indiscernible]
>>: So when you compare to Laga, which is using only the selection of the view direction, is it using the same measure of importance, or…
>> George Leifman: They don't use importance [indiscernible]
>>: Saliency.
>> George Leifman: They don't use saliency for viewpoint selection. In this case Laga uses, maybe they… I don't remember the details of how…
>>: [indiscernible] I remember the [indiscernible]
>> George Leifman: It was a paper only…
>>: What did they try [indiscernible]
>> George Leifman: I think that they use caricatures…
>>: For what? [indiscernible]
>> George Leifman: Yes, for viewpoint selection: [indiscernible] one viewpoint, and they use caricatures. I can't say that it's similar to this result. And this slide compares our result to three other results. As you can see, we think that our result is better, since it is a three-quarter view, which is popular in art, and reveals more information than only this side. And so,
in addition to the qualitative comparison, we also tried to evaluate more quantitatively. There was no available ground truth for the best viewpoint, so we conducted a user study. The goal of the user study was to evaluate our results against the views considered informative by human observers. We conducted the study as follows: for each of 79 objects, we produced 12 images, taken from different viewpoints placed uniformly on a sphere, and we asked evaluators to pick the most informative views of the object; each evaluator could mark more than one view. We had 200 evaluators, obtaining 68 evaluations on average per object. For example, for this object there were three views that were considered informative by the evaluators, two of them symmetric. To assess our result, we compared the view we found against these views; note that the evaluators could only choose among the 12 viewpoints. Our result was considered correct if it was closest to one of the viewpoints selected by the users, and for 75 out of 79 objects it was correct. Let me show you an example of the four cases where we missed. In this case it was a tank: the most salient features are indeed here, on the top of the tank, but most of the evaluators wanted to put the tank in its natural position. That reveals less interesting parts, but we all know the tank should be on the ground.
>>: So this is [indiscernible]
>> George Leifman: So this is the importance map, and the viewpoints were still symmetric, and this is the one from the user study.
>>: And the wheels are not considered [indiscernible]
>> George Leifman: Yeah, since there are five on each side [indiscernible]
>>: [indiscernible]
>> George Leifman: [indiscernible]
>>: You would want to show at least one of them in a case like that, right?
>>: Not necessarily.
>>: I mean, it could be something that appears a lot in the object, so it's not interesting in the sense that you see it many times; but even though you see it many times, you should see it at least once, I think.
>>: You are right, I agree with you, but I am saying this is what we did…
>>: No. [indiscernible]
>> George Leifman: It is a philosophical question, but in our case, something that appears a lot of times, like the wheels, we did not consider interesting, and we did not show it in this case.
So let me conclude this specific work, and then I will conclude the whole talk. We introduced a novel algorithm for detecting surface regions of interest. This algorithm is based on three considerations: local and global distinctness, shape extremities and association. We showed how all these components are realized and how to pull them together, and then we showed that the regions of interest can be a significant input to many computer vision applications: we showed the viewpoint selection application, showed that it is better than most of the state-of-the-art results, and reinforced this by a user study that we conducted. And finally, let me recap the whole talk. I showed that similarity is an important aspect of many shape analysis problems. First, I showed how to reconstruct a 3-D object from a line drawing; for this, we used global object similarity to retrieve a smooth base from the database, and then reconstructed the relief on top of it. Then I described a couple of colorization algorithms that employ vertex similarity to decide which regions should get the same color. And finally, I showed a saliency-based viewpoint selection algorithm which uses similarity to compute the distinctness, and we showed the results. This concludes my talk; thank you for your attention. [applause]
>> Eyal Ofek: Okay. Not sure we have more questions.
>> George Leifman: If there are more questions on the details, I have some backup slides I can show you. [laughter]