>> Zhengyou Zhang: It's my pleasure to introduce Pascal Fua. Pascal and I
actually both got our Ph.D.s from (inaudible) and (inaudible), but he spent most
of his time at SRI, and now he's a professor at EPFL in Switzerland, in
Lausanne. He will talk about deformable object modeling today. Pascal.
>> Pascal Fua: Thank you. So, yeah, I've actually retained an interest from our
days in 3D geometry, so today it's going to be the 3D geometry of deformable
surfaces. And the game is going to be to try to recover the shape of a deforming
surface, preferably using a single video camera.
And the reason we got into this is the America's Cup. As you know, Switzerland
is a big maritime nation... well, it has nevertheless won the last America's Cup
twice now, and the idea is that we are supposed to help them keep it.
And to do that, what we've been asked to do is to model the shape of the
spinnakers from video. And that's actually a very good example of where doing
things from video is good, because these things are huge, 100 feet tall. You
could maybe scan them using other devices, but it wouldn't be all that
convenient; video is what's needed. And another very important point, which is
why I usually show this slide: you want to know what the competitor is doing as
well.
So you could conceivably have sensors on your own boat, but it's unlikely that
you'll have sensors on the competitor's boat. So it's really a case where video is
good, and we've been working on algorithms to do this. The reason, of course,
being that what the sailors and the designers want to know is: do their sails
behave the way they should? They have simulated them, but they want to know
whether what they have simulated is really what happens and how they could
improve it.
>>: (Inaudible).
>> Pascal Fua: I'm sorry?
>>: Are you talking about the frame? Because I'm not familiar with (inaudible).
>> Pascal Fua: The spinnaker is the big bell-shaped one.
>>: (Inaudible) the sail.
>> Pascal Fua: The shape of the sail. So we want to recover the 3D shape of
the sail. And right now, I'll talk about that a little bit later, we gave them a system
to do this offline but now in the current phase of this project, we are working on
this boat and another one we want to go realtime. And the idea is in realtime you
have a readout of the shape of the sails and how you should change your
trimming of the boat to improve performance.
>>: And that's legal according to the sailing rules?
>> Pascal Fua: That's legal. Actually it's a self-defeating thing. The reason
video is allowed is that people assume you can't do anything with it. Once we
demonstrate that you actually can do something with it, it will be banned, but
that's okay, the student will have his Ph.D. by then.
(Laughter).
And actually we have other problems up our sleeve, and that's another project
EPFL is interested in, which is called Solar Impulse. Solar Impulse is a plane
that's powered by four electric engines with a bunch of solar cells on the wings,
and the dream is that some day it will go around the world, full circle, on solar
power, no fuel. And in this day and age, I guess that's a good kind of project to
have. It turns out that to do this, what you need is a plane that's very light with
very big wings, so that you can have lots of cells. And if you say big wings and
very light, you say very flexible.
So the wings will deform. Actually, the wings of all planes deform, but the wings
of this one will deform even more. And it's the same kind of technology: you
want to be able, during the flight tests, to understand what's truly happening.
And okay so I've given you two examples which I like because in the university
settings they are great projects, I mean they get the students motivated. It's not
hard to get the students to work hard on something like this.
But this has lots of other applications in more mundane matters. For example,
think of a crash test: you send a car against a wall, you want to know how it
deforms, and there too you have deforming surfaces that you want to measure.
>>: People have been doing this for years.
>> Pascal Fua: Yes, they have, sure, but with markers. And they get
information only where the markers are. You don't get the fine information about
what happens between the markers.
>>: Well, okay. You talk to the automotive companies; Ford or everybody has a
big office, a big team, (inaudible) I mean (inaudible) deformed surface. And then
they do the crash test.
>> Pascal Fua: Sure. But if you do that, you look at the cars, they have targets.
>>: The real cars. I'm talking about scanning.
>> Pascal Fua: I'm talking about the real crash tests. You go to a car company
and it's really heavy; it can be done, but it's a heavy process where you have to
very carefully put all these markers on. It's like motion capture of people: you
can do Vicon, where you put markers on and you get what people do, but you
get only very discrete measurements. If you want finer measurements, then
video is useful in between.
And then there are all sorts of other applications; I'll get back to that. Things like
modeling how clothing behaves. In the medical case, our insides are full of
deformable surfaces. The list of things this could apply to is long.
And okay, so what I'm going to do is present a bunch of techniques to handle
these problems using different assumptions on the nature of the (inaudible) we
are trying to track. In all cases I am going to be in a fairly standard framework,
the framework of deformable models, where what we have is an objective
function that we want to minimize with respect to camera pose and some shape
parameters. And that objective function is going to be, as is often the case, a
sum of two terms: a data term, usually based on correspondences between a
model image, which is this one, a model image in which we know the shape,
and an input image, which is the image we get in our video, in which the shape
is unknown. And we're going to assume we can establish correspondences
between the two.
Plus, of course, as we all know, correspondences are noisy, so not all of them
will be good. So you typically need to add a second term, which is a
regularization term, and whether this works well or not is usually predicated on
how good your regularization term is.
And what makes this challenging is that if you have only one view it's highly
ambiguous, because given one projection, there are many deformable shapes
that could have produced it. The most obvious one is the scale ambiguity,
where two shapes, a big shape far away and a small shape close up, give you
the same projection. But there are many others, and this one is an interesting
one, the bas-relief ambiguity, which says that if you have a convex shape and a
concave shape, the projections are not exactly the same, but they are very
close. And given the fact that your correspondences are noisy, it's very hard to
distinguish one from the other.
And there are in fact many more than this. So what we're trying to solve is a
highly ambiguous, underconstrained problem, and we are going to need some
form of shape model or deformation model to solve it.
And the challenge we've tried to address in this work is, well, how can we
design our models, our regularization terms, so that we make as few
assumptions as possible on the deformations? So in this talk I will distinguish
essentially four cases. Either I have a very textured surface, which means I
have lots of correspondences, in which case I will show that you can actually
get away with very, very weak assumptions on the shape, so that you can
model things that deform really strongly, like cloth. Or you have poorly textured
surfaces, in which case you have far fewer correspondences. Instead of
correspondences you could of course use template matching or any of these
techniques, but it would be essentially the same, and then you do need some
form of deformation model. I'll talk about global deformation models, which
assume smooth deformations, and I will extend them to local models that can
also handle complicated deformations such as those of cloth.
So let's start with the case where we have a well-textured surface we want to
model, which means we have lots of correspondences. And here we are going
to pay homage to Olivier, our common thesis advisor, and go back to projective
geometry. This is '90s stuff, but it still works. The problem we are now facing is
that we have a model image in which we know the shape, we have an image
we are now looking at in which we have correspondences, and we are going to
model the surface as a mesh, as shown in the image.
So on that image we have points for which we know correspondences. These
points on the 3D surface belong to a facet, and they can be defined in terms of
their barycentric coordinates in the facet to which they belong.
It turns out that if you look at this, every correspondence gives you one linear
equation that relates the 3D coordinates of the vertices to the 2D projections.
And when you do this, it's the equivalent of doing a DLT, which means that if
you have a whole bunch of correspondences, you can create the vector Y,
which you get by taking the 3D coordinates of all the vertices and concatenating
them into one great big vector, and this set of linear constraints can be written
in matrix form as MY = B, where M is the matrix and B is the vector given by
the correspondences. So once you have written that, you could say, great, it's
done. Except it's not, even if you have infinitely many correspondences,
because if you look at the eigenvalues or the rank of this matrix M, it looks like
this. And what's to be noted in this graph is that there's a very sharp drop here.
Two-thirds of the eigenvalues are clearly nonzero, and the last third is not
exactly zero; in fact only one, which is due to the scaling, is exactly zero, but
they are very small. And this is actually fairly easy to see. What's happening is
that the correspondences constrain vertices of the mesh to be along particular
lines of sight, which constrains two of the degrees of freedom but leaves the
third one poorly constrained, and that is exactly what you see here. And that's
a way of formalizing the ambiguities I've been talking about.
So if you solve it just like that, you give that to Matlab, you get essentially junk
as a result, because with those small eigenvalues, a little bit of noise in your
correspondences gives you crazy results. So what you need to do to solve this
is to increase the rank of the matrix M. And the question we've tried to answer
is how you can do that without making lots of assumptions on the surface, for
example without assuming that it deforms smoothly, because it doesn't
necessarily do that.
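To make that rank argument concrete, here is a small numpy sketch: a
hypothetical 3x3 toy mesh with synthetic noise-free correspondences, not the
actual system. It builds M from barycentric correspondences and shows the
sharp drop after two-thirds of the singular values, with only the scale direction
exactly zero.

```python
import numpy as np

# Toy 3x3 vertex grid (9 vertices, 27 unknowns) seen by a camera at the
# origin looking down +z; the mesh is small and far away (narrow field of
# view), which makes the depth ambiguities pronounced.
xs = np.linspace(-0.1, 0.1, 3)
V = np.array([[x, y, 10.0 + 0.02 * np.sin(3 * x + 5 * y)]
              for y in xs for x in xs])            # 9 x 3 vertex coordinates

# Triangulate each grid cell into two facets (vertex index triples).
facets = []
for row in range(2):
    for col in range(2):
        i = 3 * row + col
        facets += [(i, i + 1, i + 3), (i + 1, i + 4, i + 3)]

# Each correspondence: a surface point with known barycentric coordinates
# in its facet, observed at (u, v) = (x/z, y/z).  Each one yields two
# linear equations in the 27 vertex coordinates.
barys = [(0.2, 0.3, 0.5), (0.6, 0.2, 0.2), (1/3, 1/3, 1/3),
         (0.1, 0.8, 0.1), (0.4, 0.4, 0.2)]
rows = []
for (i, j, k) in facets:
    for (a, b, c) in barys:
        p = a * V[i] + b * V[j] + c * V[k]
        u, v = p[0] / p[2], p[1] / p[2]
        r1, r2 = np.zeros(27), np.zeros(27)
        for idx, w in ((i, a), (j, b), (k, c)):
            r1[3 * idx] = w; r1[3 * idx + 2] = -u * w      # x - u*z = 0
            r2[3 * idx + 1] = w; r2[3 * idx + 2] = -v * w  # y - v*z = 0
        rows += [r1, r2]
M = np.array(rows)                                  # 80 x 27 system

sv = np.linalg.svd(M, compute_uv=False)
# Sharp drop after 2/3 of the singular values: depths are barely constrained.
print(sv[17] / sv[18])    # large ratio at the two-thirds boundary
print(sv[-1] / sv[0])     # ~0: the one exact null direction is global scale
```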
And it turns out that one good way of doing this is to replace the spatial
smoothness that's often used by a temporal smoothness, by saying, well, I don't
know exactly how the surface deforms, but I assume that the vertices don't
jump crazily from one frame to the next. So I'm going to put constraints on how
much they move from one frame to the next, and I'm going to do my
reconstruction not on a single image but on the whole sequence, because I'm
dealing with video sequences anyway. Which means that if I have capital T
images, I have one of these systems for each image. I can express that in
matrix form by putting a (inaudible) matrix here, and here I put all my Y vectors
and concatenate them. If I just do that, I haven't gained anything, because I
haven't linked the frames.
But where I can win is by saying, well, I'm going to add a bunch of constraints
that say I cannot move more than a certain amount from one -- each vertex
cannot move more than a certain amount from one frame to the next. Or
alternatively and almost equivalently, I'm going to minimize the amount -- I'm
going to find the solution that minimizes the amount of motion from one frame to
the next.
And it turns out that these additional constraints allow you to add lines to your
big matrix and to increase its rank until eventually it becomes full rank minus
one, and the minus one is the scale ambiguity, which you can't get rid of.
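The effect of those temporal rows can be sketched on a toy system. Random
matrices stand in for the per-frame correspondence matrices, and a shared null
direction plays the role of the scale ambiguity; everything here is fabricated for
illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                   # unknowns per frame (toy size)

def orthonormal(vecs):
    # Gram-Schmidt: turn the given vectors into an orthonormal set.
    basis = []
    for v in vecs:
        for b in basis:
            v = v - (v @ b) * b
        basis.append(v / np.linalg.norm(v))
    return basis

# 's' is the null direction shared by every frame (the scale ambiguity);
# each frame also has one extra null direction of its own.
s, n1, n2 = orthonormal([rng.normal(size=n) for _ in range(3)])

def frame_matrix(extra_null):
    # Random measurement matrix whose null space is span{s, extra_null}.
    P = np.eye(n) - np.outer(s, s) - np.outer(extra_null, extra_null)
    return rng.normal(size=(8, n)) @ P

M1, M2 = frame_matrix(n1), frame_matrix(n2)

# Stack the per-frame systems block-diagonally and append rows penalizing
# vertex motion between consecutive frames: lam * (Y1 - Y2) = 0.
lam = 1.0
Z = np.zeros((8, n))
big = np.vstack([np.hstack([M1, Z]),
                 np.hstack([Z, M2]),
                 np.hstack([lam * np.eye(n), -lam * np.eye(n)])])

print(np.linalg.matrix_rank(M1))    # 4: each frame alone is rank deficient
print(np.linalg.matrix_rank(big))   # 11 = 2n - 1: full rank minus the scale
```

The stacked system is full rank minus one because the only vector killed by
both frame matrices and the coupling rows is the shared scale direction.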
>>: So you do assume (inaudible) points?
>> Pascal Fua: Yes, I'm assuming I'm tracking the points in 2D. So I have
correspondences in all the frames.
>>: Why do you have a scale? I mean you know you know for example the
length of the boom or the length of the wing or things like that, right?
>> Pascal Fua: It would go away. But here I'm not making an assumption, I'm
assuming nothing, so it could shrink. At this point in this case I'm not explicitly
saying that it does not shrink.
>>: Okay.
>> Pascal Fua: I will do that later, but here I'm trying to even not do that.
>>: (Inaudible) the configuration of the very first frame is also fixed somehow?
>> Pascal Fua: Yes. In the first frame it is fixed. And then you get it to be full
rank. And what's interesting about this is that if you now look at the
eigenvalues of this big matrix, here is the kind of graph you get. All the
eigenvalues are nonzero; it's full rank except for one, the last one. And the
eigenvalues for the first two-thirds have roughly the same shape as before,
which is one way of saying that at least these modes have not been perturbed
by adding your constraints; in other words, you haven't assumed anything, or
very little, about how it deforms.
And that's actually pretty nice, because it allows you to recover a deformation
like this one, which is fairly complex, it has creases, and we still get the
deformation with those constraints, which you couldn't do if you were using a
standard smoothness constraint.
Of course, and I'll come back to that, this is extremely textured, and that is not
an accident. Yes?
>>: You assume that each (inaudible) is visible in all frames.
>> Pascal Fua: Yes. We're not handling occlusions here. In some sense this is
more a theoretical proof of concept than a working algorithm. Okay. So this
works, with one little caveat: as I've said, what we essentially do here is a DLT,
a direct linear transform, in spirit at least. And as you know, when you do that,
you lose accuracy, because what you are minimizing is not the true
reprojection error, it's something else. So typically, when you compute
fundamental matrices and the rest using DLT, you do that and then you do a
bundle adjustment, or something, to make it more precise if precision is
needed.
It turns out that in this case you can actually also minimize the true
reprojection error if you use the SOCP (inaudible) that (inaudible) and Richard
Hartley introduced a while ago. It's been known for a while in other disciplines,
but it was introduced in our field a couple of years ago, I think. And what is
this? SOCP stands for Second Order Cone Programming, and what it is: if you
can formulate your problem in those terms, find the vector Y such that this kind
of equation is satisfied for M such equations, and you have to satisfy them,
there are ways of solving that. It is a convex problem, and when you've said
convex, you've said it can be solved efficiently. There are good solvers for
that.
And it turns out that our reconstruction problem can be formulated in those
terms. So let's go back. I have said that every correspondence gives us a
linear equation of that sort. So we can just rewrite it; it becomes this. This is
what we are trying to minimize. And it turns out that minimizing this for each
correspondence can be expressed as, well, you take this, you send this to the
other side, you multiply by this, and what we want to say is that for each
correspondence we want the reprojection error to be smaller than some
gamma. And that is an SOCP problem, so there are good solvers for it. Now,
gamma is the maximum reprojection error you are willing to tolerate, so it's not
known a priori. What you do is you solve several times and you find the
smallest gamma for which there is a solution. And because it's convex
optimization, it doesn't cost you that much to do this several times.
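The "solve several times, keep the smallest feasible gamma" strategy can be
sketched as a bisection driver. In the real system the feasibility test is an
SOCP handed to an off-the-shelf solver; here a one-unknown toy problem with
a closed-form feasibility check stands in for it so the sketch is self-contained.

```python
import numpy as np

# Toy stand-in for the SOCP feasibility test: one scalar unknown y and
# "reprojection errors" |a_i * y - b_i|.  Feasible for a given gamma iff
# the intervals [(b_i - gamma)/a_i, (b_i + gamma)/a_i] all intersect.
a = np.array([1.0, 1.0, 1.0])
b = np.array([0.0, 1.0, 4.0])

def feasible(gamma):
    low = ((b - gamma) / a).max()
    high = ((b + gamma) / a).min()
    return low <= high

# Bisection: find the smallest gamma for which a solution exists.
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if feasible(mid):
        hi = mid
    else:
        lo = mid
print(hi)   # -> ~2.0, the best achievable maximum error for this data
```

Because each feasibility test is convex (here trivially so), repeating it inside
the bisection loop stays cheap, which is the point made above.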
The one problem with this, and people have complained about it, is that if you
just do it this way, it is not robust to wrong correspondences, because as soon
as you have outliers, it will produce a bad result. However, this can be solved
fairly easily, because the correspondences that satisfy the constraints worst
are the outliers. So the way you handle this is you solve your problem with all
your correspondences, you look at the ones that satisfy the constraints least
well, you remove them, and you run it again until you find a gamma that is
within the threshold you fixed.
>>: (Inaudible).
>> Pascal Fua: You start with all your correspondences, you run the algorithm,
you look at the ones for which the inequality is exactly, or almost exactly, an
equality. You get rid of them and you rerun it. And again, I'm assuming I'm in a
world where I have lots of correspondences, so we do that and in the end there
are enough left that you get an answer.
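That trimming loop might look like this in a toy setting, with an ordinary
least-squares line fit standing in for the SOCP solve; the data and the
threshold are made up for illustration.

```python
import numpy as np

# Fit a line to correspondences, some of which are outliers; repeatedly
# drop the worst-fitting point until the maximum residual is acceptable.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0                 # exact inliers on y = 2x + 1
y[3] += 5.0                       # two gross outliers
y[7] -= 4.0

tol = 0.1                         # analog of the reprojection threshold
keep = np.ones(len(x), dtype=bool)
while True:
    A = np.stack([x[keep], np.ones(keep.sum())], axis=1)
    coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
    res = np.abs(A @ coef - y[keep])
    if res.max() <= tol:
        break
    # Remove the correspondence that satisfies the fit least well.
    worst = np.flatnonzero(keep)[np.argmax(res)]
    keep[worst] = False

print(coef)        # -> close to [2, 1]: outliers removed, inliers kept
```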
So that works, but of course it's not enough, because we still have these
ambiguity problems. For example, if I just do this, given this image, here's what
I get. And that's actually interesting: this shape reprojects to that. In projection
it looks perfect, but of course in terms of shape it's absurd. Again, it's because
we have this ambiguity problem. It turns out that we can introduce constraints
that are similar to the ones we introduced in the first case, which say that
between consecutive frames the orientation of the edges can change, but not
infinitely so. We can essentially put thresholds on how much we allow them to
move from one frame to the next, and these constraints can be expressed as
SOCP inequalities, so they fit in the framework.
>>: Can you go back? So you mentioned that (inaudible) from one frame or from --
>> Pascal Fua: From one frame. So now we are in one single frame. As I said,
if you did it on multiple frames it would go away.
>>: (Inaudible).
>> Pascal Fua: Okay. So these constraints, which are of the same nature as
the ones before, can be expressed in the SOCP framework, so that it will
actually handle something like this, where this is a nice smooth deformation
and this is not. And what's interesting is that I haven't changed the
smoothness parameters in the algorithm; the algorithm is the same in both
cases, it's just the (inaudible) that tells it to do one thing or the other.
And here is a fairly complicated deformation of cloth, and again it does the right
thing. So what you have here is the 3D reconstruction; here we just show the
instantaneous curvature as a color image, and here we draw the level lines as
though this were a map.
>>: How many points do you have, how many trackers?
>> Pascal Fua: We have about how many? This has about 100 facets, two or
three points per facet, so it's about 300.
>>: That's good.
>> Pascal Fua: Okay. So just to summarize up to now, this works on this, but
again I have two or three points per facet, which means I have a very textured
surface. The real world, unfortunately, is not like that, or not usually like that.
So the next step is to try to deal with poorly textured surfaces.
And well, the first idea, which is a known idea, would be to use physics-based
models, in the sense in which they were first introduced in the '80s.
The problem with those is that they are physics based, but it's usually a much
simplified physics, because if you try to model the full physics it becomes
awfully complicated very, very quickly. So you can use the simplified
physics-based models, or, since machine learning now seems to be the
answer to everything, in other words when you can't model you statistically
learn, you can try to learn possible deformations. Typically you create a
database of possible deformations, you learn a low-dimensional model from it,
and you do your fitting to your data in terms of the parameters of this
low-dimensional model instead of the original high-dimensional one.
The problem is that creating the database is not always trivial. For example,
one of the well-known examples in the field is the morphable models
(inaudible) a while ago. It's a wonderful model, but getting the data was hard
work, and it's not clear that you can do that for everything. So what I'd like to
talk about now is how you go about creating models in a maybe slightly more
practical way, if you are not in a position to scan thousands of deformable
things. The spinnaker again is a good example: that thing is 100 feet tall, and if
I had to create a database of all its possible shapes depending on the wind
conditions, et cetera, et cetera, it would not be easy. It's probably doable, but
not easy.
>>: The implication of the (inaudible) textured surface, is it just having...
>> Pascal Fua: Fewer correspondences.
>>: (Inaudible) becomes more important.
>> Pascal Fua: Both. When I told you the matrix was full rank, so the
ambiguities were gone, implicit was the assumption that I have at least one
correspondence per facet, and that of course goes. And as you'll see in what I
show, we start using the edges; we use whatever texture we have in the
middle, but there are going to be large areas that are blank. And in some
sense, again, what I'm presenting here is not meant to be the end product. If
you (inaudible) texture, for example, you could use shading. What I'm talking
about today is how far you can go simply with texture information and
deformation models. But of course, and maybe I'll mention that later, shading
is important too, so at some point we want to use it as well.
Okay. So for something like a spinnaker, which when inflated deforms
relatively smoothly, we've found a way that works very well to create a
low-dimensional model. We take the mesh that defines the object, in this case
I'm showing you a regular mesh but it could be a mesh of any shape, and it
turns out that if you assume that the mesh is inextensible, so this time I'm
explicitly making the assumption, a mesh like this one has far fewer degrees of
freedom than it would have if you did not assume that the edges are of
constant length. This is because, for example, for a vertex like this one, once
you know where this vertex, this vertex and this vertex are, this one has to be
at the intersection of three spheres whose radii are given by the lengths of the
edges.
So you don't have to specify the position of all the vertices to completely
specify the shape of the mesh. In fact, it turns out that you can (inaudible) a
mesh like this one by specifying a relatively small number of joint angles, which
are the angles between a subset of the facets. And you can show easily that
this number is almost directly proportional to the number of edges at the
border, which is in general far smaller than the total number of vertices. Which
means that because you have this (inaudible), you can sample it. So you can
artificially create a database of possible shapes and then run a dimensionality
reduction technique to create your low-dimensional model.
And in fact, in this case the method that works best is the simplest, which is
PCA. You take this, you sample possible shapes, you run PCA, and you get
deformation modes, so that you can express your shape as a rest shape,
which actually does not have to be flat, and in the case of the spinnaker it isn't,
it's a bell-shaped thing, plus a weighted sum of deformation modes. And your
optimization variables become the weights that you assign to the deformation
modes.
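Here is a minimal sketch of that sample-then-PCA idea, using a 2D chain of
unit-length segments as a stand-in for the angle-parameterized inextensible
mesh; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def chain(angles):
    # An "inextensible" toy surface: a 2D chain of unit-length segments
    # parameterized by joint angles (the analog of the facet angles).
    theta = np.cumsum(angles)
    pts = np.vstack([[0.0, 0.0],
                     np.cumsum(np.stack([np.cos(theta),
                                         np.sin(theta)], axis=1), axis=0)])
    return pts.ravel()            # 2*(n_segments+1) coordinates

# Artificially sample plausible shapes by sampling joint angles, then run
# PCA on the resulting coordinate vectors to get deformation modes.
samples = np.array([chain(rng.normal(0.0, 0.4, size=6)) for _ in range(200)])
mean = samples.mean(axis=0)
_, svals, Vt = np.linalg.svd(samples - mean, full_matrices=False)
modes = Vt[svals > 1e-10]         # deformation modes (principal directions)

# A new shape is expressed as rest shape + weighted sum of modes; the
# weights become the low-dimensional optimization variables.
new = chain(rng.normal(0.0, 0.4, size=6))
weights = modes @ (new - mean)
recon = mean + weights @ modes
print(np.abs(recon - new).max())  # tiny: the modes span the shape space
```

In a real fit the weights would of course be found by minimizing reprojection
error rather than by projecting a known shape, but the parameterization is the
same.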
So for example, in the case of relatively smooth deformations, and this actually
runs in realtime, you can wave a piece of paper in front of the camera and it will
reconstruct the 3D shape. Or you can stretch the thing and again it will get the
deformation. And of course this works because underneath here there is a
logo, which means there is some texture. That's actually the case here: the
system we delivered to the sailing team is based on that idea. So we have the
spinnaker, we have a model of the spinnaker in its rest shape, which is the
mold that was used by the sail maker to actually cut the panels the thing is
made of, which means we could compute deformation modes and fit them to
our videos. So the yellow dots that you see here are reprojections of the 3D
model onto the spinnaker.
And in this case, of course, that's an interesting case: we have some texture
here, some texture here, and of course we use the contours as well to get this.
And something I think would be an interesting extension, which we have not
done yet, is to look at the shading. To us as humans, the shading tells a lot of
things. The problem is that if you introduce shading in this context, you cannot
assume a point light source. This is a good example: yes, the sun is in the sky,
but the water reflects the light, so it's actually lit from everywhere. And just to
make things a little bit more interesting, the spinnaker is somewhat translucent.
So it's a very interesting shading problem, but it's not an easy one.
>>: This example you give (inaudible).
>> Pascal Fua: Well, it's dealing with self-occlusion, which is easy in this case
because we have a 3D model.
>>: It looks like there are some errors down there by the bottom; is that
because the sail actually stretches?
>> Pascal Fua: Here, yeah. Well, it's because, okay, the main deformation is
here, right, so there's a little bit of constraint here. It's probably that the edge
here is not good. If you look at the contrast here, it's very, very weak. So I
would suspect that's what's happening.
>>: Okay.
>> Pascal Fua: We did validate this in terms of what they want, the measure
they want. They don't care about pretty pictures, of course. What they want
are the measures of curvature at various levels of the sail. And so we actually
ran some simulations: we create images, and with one camera we get within, I
think, three or four percent of the true value. So we get that with one camera.
With this technique you can actually use more cameras if you want, so you get
better results in a case like that: if you have one camera that looks from one
side and another camera that looks from the other side, we get it down to one
or two percent with more than one camera.
>>: (Inaudible).
>> Pascal Fua: I'm sorry?
>>: (Inaudible)?
>> Pascal Fua: We don't need to, but we improve it. If I remember the curve
correctly, two cameras improve things noticeably, three cameras too, and after
that it doesn't help that much to add additional ones. But they don't have to be
in a narrow baseline arrangement. Actually, it doesn't help if they're in a
narrow baseline arrangement; it's much better if they are in a wide baseline
arrangement, so that they see different sides of the sail.
>>: So you don't (inaudible) area?
>> Pascal Fua: No, we don't.
>>: (Inaudible) the model you use, you took it from the guys who designed it?
>> Pascal Fua: Yes.
>>: That means for each different sail you have to use a different (inaudible).
>> Pascal Fua: I'm coming to that. That's the next -- yeah, you're right. This is
the weakness of that technique.
>>: (Inaudible).
>> Pascal Fua: It works. In any case, these are cases where, if you think of
the wing of an aircraft, you have the model, but I'll get to that in a moment.
Okay. One more example. They have different sails, so we do have different
models loaded in the system. And one thing we've actually recently realized
about this is that what you've seen here is essentially a tracking algorithm,
because we get the shape in one image and then we get the shape in the next
image by optimizing, starting from the previous position. But it turns out that
there is something very nice about this linear formulation, which is that you can
actually find your solution in closed form. Which means you don't need
tracking; you can have automated initialization of the algorithm, if you assume
the mesh is inextensible. So why is that?
You remember that I've said that these correspondences give you an equation
of the form MY = B, where M is given by the correspondences. If you express
everything in the referential of the camera, then the B vanishes and it's really a
simple MY = 0. Y is your unknown, and M is this matrix which is not very well
conditioned, which has a lot of very small eigenvalues. So one way to say that
is to say that Y is a weighted sum of the eigenvectors that correspond to the
very small eigenvalues. Once you've said that, your unknowns are these
betas, the weights you give to those eigenvectors.
If you now add the constraint that the mesh is inextensible, it means that the
distances between vertices are preserved, and that gives you a bunch of
(inaudible) constraints of that form. And you actually have more constraints of
that sort than you have betas. So in theory you can solve this, in closed form.
So a system of quadratic equations can be solved in theory, except that if you
just do it with Y being the vector of the vertices of your mesh, it's a large
number of equations and it explodes; the solver dies a miserable death. So if
you don't want the solver to die a miserable death, what you can use is the fact
that you can express all this in terms of the linear combination of modes, and
this is linear as well. The shape depends linearly on the modes, and the
equation is linear in terms of the shape, so it's still all linear, and you can
express what you are trying to solve: your shape Y, which was the rest shape
plus this matrix representing the modes, times A, which are your unknowns,
the weights you give to your modes, has to satisfy this equation, which is
equivalent to the previous one. And we've added one thing here, this matrix
W, which is a diagonal matrix that says you penalize some modes more than
others; you weight the modal decomposition, which is what you normally do.
Again, it means that your vector of weights A has to be in the kernel of that
matrix, and can therefore be expressed as a weighted sum of eigenvectors of
this bigger matrix than the previous one. But you now have far fewer variables,
because your variables are not the X, Y, Zs of every vertex; they are the
weights you give to the modes. And it's still a set of polynomial equations in
terms of those modes, and that you can now solve in closed form. Which
means that in one image you have a bunch of correspondences, you have
your model image in which you knew the shape, and in closed form you can
compute the shape in a new image. This is very useful in practice, because if
you do tracking, you lose track sometimes; you always fail at some point. With
this algorithm, you can get the shape from a single image without prior
(inaudible), which is absolutely essential if this is ever going to be a realtime
working system.
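The closed-form idea can be sketched in a stripped-down case where the
kernel of M is one-dimensional, so the inextensibility constraints only have to
fix the scale. In the real method there are several betas and a whole system of
quadratic constraints; everything below is a fabricated toy.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ground-truth coordinates of 4 mesh vertices, flattened into one vector Y.
V_true = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.2],
                   [0.0, 1.0, 5.1], [1.0, 1.0, 5.3]])
y_true = V_true.ravel()

# Fabricate a correspondence matrix M with M @ y_true = 0: random rows
# projected orthogonal to y_true, so the kernel is exactly span{y_true}
# (the scale ambiguity the talk mentions).
u = y_true / np.linalg.norm(y_true)
G = rng.normal(size=(30, 12))
M = G - np.outer(G @ u, u)

# Closed form, step 1: the solution lies in the kernel of M ...
_, _, Vt = np.linalg.svd(M)
v = Vt[-1]                         # eigenvector of the ~zero eigenvalue

# ... step 2: inextensibility fixes the remaining unknown: the template
# gives the true length of edge (0, 1), which pins down the weight beta.
W = v.reshape(4, 3)
beta = np.linalg.norm(V_true[1] - V_true[0]) / np.linalg.norm(W[1] - W[0])
y = beta * v
if y[2] < 0:                       # pick the sign that puts depth in front
    y = -y
print(np.abs(y - y_true).max())    # ~0: shape recovered without tracking
```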
And so here are examples of how this works. What's interesting about this is
that what you see on the right is done on each frame independently. We have
not -- on purpose, to demonstrate that it works -- enforced temporal
consistency. So it shows you it's reasonably stable. Of course, if you want
precision you would want to adjust after that, but even without doing it, it's
already reasonable.
Okay. So the last topic -- and that's the point you made -- the technique I've
just shown works if you have a global model of the surface you want to model.
But that's not always available. So how are you going to go about creating a
model for a shape that is made of, say, a known material but has an arbitrary
shape? Well, one way to do this is to observe that, assuming the surface you
are looking at is homogeneous -- has homogeneous properties -- you can actually
learn local models, by which I mean deformations of little patches of that
surface.
And the way we do that is we take a piece of material, we put Vicon markers on
it at regular intervals, we wave it in front of the Vicon system, and we get 3D
tracks for these patches. And every patch can be considered as an example of
what that surface can do. So each of them goes into a training database, and
because we take all the patches from the surface, even with relatively short
sequences we get a lot of training examples.
So we put that in the database, we learn a local deformation model for how a
patch can deform, and then to form the global model we simply use a
product-of-experts framework: we take the log-likelihood to be the sum of the
log-likelihoods of the individual patches.
The particular model we use for the local deformations is Gaussian process
latent variable models, mainly because we've used them a lot in the past, so
we've kept on using them. I believe that for those local models you could use
PCA just as well; it would probably give about the same result. But the
important part is that we get a prior for local deformations, and we can
assemble them into a global prior for the whole thing.
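The product-of-experts prior described here can be sketched as follows. This is a rough stand-in, not the speaker's code: it uses PCA for the local patch model (the talk uses a GPLVM, but notes PCA would probably do about as well), and the training and test patches are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: tracked 5x5 patches of the material,
# each flattened to 5 * 5 vertices * 3 coordinates = 75 numbers.
train = rng.standard_normal((500, 75))
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
basis = Vt[:10]      # keep a few principal deformation modes
noise = 0.1          # assumed variance of the unexplained residual

def patch_log_likelihood(p):
    """Gaussian log-likelihood of one flattened 5x5 patch: penalize
    the part of it that the learned modes cannot explain."""
    r = p - mean
    residual = r - basis.T @ (basis @ r)
    return -0.5 * (residual @ residual) / noise

def global_log_prior(patches):
    """Product of experts: the global log-prior is the sum of the
    per-patch log-likelihoods over all overlapping patches."""
    return sum(patch_log_likelihood(p) for p in patches)

# 20 overlapping patches extracted from some mesh (placeholders here).
patches = rng.standard_normal((20, 75))
print(global_log_prior(patches))
```

Because the local likelihoods multiply (their logs add), the same local experts can be reassembled on any mesh of the same material, whatever its global shape or topology.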
>>: How big a range is your (inaudible).
>> Pascal Fua: Five by five.
>>: Five by five.
>> Pascal Fua: So on something like this, the patch is five by five. And
they're overlapping, so we take all the overlapping patches. And what we get is
something like this, where now we have something with really little texture. In
this case, if you look at this thing, it has some texture here -- we use the
edges -- but there are large areas in which we have absolutely no information.
And nevertheless, because it has a decent prior, it can get this fairly
complicated deformation. And this speaks to your earlier question: now let's
take something made of the same material but with a different shape. And a
different shape because we've specifically cut a hole in there, so we also
change the topology of the object.
It doesn't matter; we can still compute our product of experts on this mesh.
It's the same local experts, but the topology is different, so we get a
different global model, and it works in this case as well.
So that gets around having to learn a new model for every new object. We learn
a model for a specific material.
>>: If we change the deformation I think (inaudible) just change (inaudible)
blowing air, (inaudible)?
>> Pascal Fua: That's a good question. Essentially you want to know how
flexible the material is locally, and then I -- okay. It's an experiment we
have not done and should do. But I think the global property is given once the
patches are assembled. I'm not sure -- I mean, do you really think it would
behave differently? Because how much you can stretch is basically a property of
the material, not so much a property of the -- okay. If you pull hard on it --
>>: (Inaudible).
>> Pascal Fua: Right.
>>: (Inaudible).
>> Pascal Fua: You will learn that. Okay.
>>: But (inaudible).
>> Pascal Fua: Right. Okay. So.
>>: (Inaudible).
>> Pascal Fua: There is dependence here, too, right, because the patches are
overlapping.
>>: (Inaudible).
>> Pascal Fua: But we don't know the -- I mean, we don't want them to be too
strong, right? Again, what we have in the back of our minds is: what are the
minimum constraints we can have and still have the thing work?
>>: (Inaudible) I'm just saying that (inaudible).
>> Pascal Fua: I think that -- okay, you have a point. As soon as you have
statistical learning in there, you learn what you have trained for, so you kind
of have to know: if you are in a case where people are really going to be
stretching the thing, you'd better have that kind of motion in your training
data.
Okay. So, just to tell you that this is not only for sailboat racing and other
amusing things -- these techniques have practical applications. And this is an
example, again for wings, but in a much more industrial context, where these
people were interested in knowing the effect of putting winglets on the wing:
depending on the size of the winglet and its angle with respect to the wing,
and depending on flight conditions, the wing is going to deform in various
ways, and you want to measure that. And as you point out, you can of course do
it by putting targets on it, which actually we did.
But if you could do that without the targets, it would help. And one thing
that's also very interesting about this is that these kinds of techniques give
you a way of closing the loop: the engineers are of course finding methods to
simulate all this, but you want to check that what you simulate is what truly
happens.
And that's actually some of what we are working on for Solar Impulse. So this
plane is actually not built yet; it's being built as we speak. However, what we
have in the lab is a two-meter-wingspan scaled-down model.
And the idea here is we want to know whether we could get something working in
a real-world setting. The scenario we have in mind is that these people don't
want us to touch their airplane in any way, shape, or form. So the simplest
thing would be to put cameras on the plane itself, but they don't want to hear
about that: with cameras we would have to touch the airplane, cameras need
electricity, and they are already short on electricity. We have to stay away
from it. So the idea is: this is a slow-flying plane, so we can imagine, at
least for the initial flight tests, having a car in front of it. And it flies
low, so the car looks at it a bit from below, which is kind of this view.
So we've simulated that in the lab; that's about what you would see. Okay, so
the deformations here are not incredible -- they are grad-student-induced
deformations -- but the green thing, the outline of the wing, is what the
algorithm computes by following the outline of the wings, and the curves on the
right side are measures of, I think, the height of the wing tip with respect to
some reference.
And we've actually tried to validate this, and our simulations tell us that on
the real plane, when you scale it up to what the real dimensions would be --
it's 60 meters long, it's big -- in that configuration you could get three
centimeters in height at the wing tip and probably better than 0.5 degrees in
twist angle, which is what the engineers need during their flight tests.
And if you want more than that, you can get it: this is done with a single
camera, but you could do better with more than one. You could use additional
visual cues -- for example, we are not using the contours here, it's purely
based on correspondences, so the contours would help. And we could incorporate
true deformation models, which we don't really have here; but in this case it's
a physical thing, we would have the plans, the CAD/CAM model exists. We could
incorporate those constraints if we wanted to go further than that.
So I think it's actually a very viable technology for a very real problem.
>>: (Inaudible). When the airplane flies in the air, so what --
>> Pascal Fua: You have a chase plane then. They will have a chase helicopter.
>>: Okay.
>> Pascal Fua: So the way this is going to happen is: there is a long military
runway not far from where we are. So the first flight tests will be on this
very long runway -- just little hops. And for hops, the camera on the car will
work. If that is successful, then of course they will start flying for real,
and then the plane would probably be followed by a chase helicopter.
>>: In the camera itself (inaudible).
>> Pascal Fua: Sorry?
>>: (Inaudible).
>> Pascal Fua: I don't -- I hope it doesn't really matter, because we don't
care that much about the global motion; we discount it. What we want is the
deformations.
And now, okay, for a completely different kind of application. As (inaudible)
mentioned, I worked for a long time at SRI in Menlo Park, and these people work
with unnamed three-letter agencies. I talked about this to the head of the
center, and he said: this is cool, we have an important problem. Our military
would like to be able to have automated reading of banners -- you know, in some
parts of the world people are chanting nasty slogans -- and to do OCR on a
banner you want to unfurl it first. So could you use these techniques to, say,
compute the deformation, straighten out the banner, and then run OCR on that?
>>: (Inaudible) another already easier (inaudible).
>> Pascal Fua: Of course (inaudible).
>>: (Inaudible) Pascal, right.
>> Pascal Fua: Right. We could. But actually, do I have time? Because -- yeah,
we have time. So I can show -- well, I'll show you after. We've done it, so
I'll show you afterwards if you're interested. We didn't put "Pascal," but we
put other things.
But okay, there are two points to this. I'm really glad to be in a school where
we can work on projects like the plane or the sailboat rather than for
three-letter agencies. However, I think that for these kinds of techniques
there's a wide range of applications besides those I've thought about. And if
you have good ideas -- since you at Microsoft make products -- I'm of course
really interested.
And also, I hope I've shown that this is potentially a generic paradigm for
modeling 3D deformable surfaces from one or more cameras. And the way this
could go is: you need models to represent deformations in general, and you can
obtain them using textured versions of the surfaces you want to represent or
capture. You can learn the models and then apply them to surfaces with far less
texture. And that's a fairly easy method to deploy. So the idea is you would
have a piece of material, you would wave it once in front of your webcam, make
sure it's textured enough, learn the model, and store it, so that the next time
you see this kind of material it will work.
And actually you could probably do a hybrid version of this: many kinds of
material -- or surfaces -- are partially textured and partially not. So you
could imagine a smarter algorithm that would track the parts of the surface
that are textured, learn the model from that, and then track the rest. In other
words, I think there's quite a lot of fun stuff to do on this in the future.
Thank you.
(Applause)
>>: So the advance texture of the material (inaudible) based on the factorization.
So can you comment (inaudible) factorization (inaudible).
>> Pascal Fua: Well to the best of my knowledge, the factorization methods
work for rigid stuff.
>>: (Inaudible).
>> Pascal Fua: There's some.
>>: (Inaudible) the camera and the structure and also deformation.
>> Pascal Fua: Okay. So the one that comes to mind is something by (inaudible)
and a few others. In some sense it's related, because they use PCA, so they
learn modes; in that sense it's related. But the PCA was essentially a small
part of what I've shown -- what I've tried to explain is how you go beyond
that.
>>: So how many PCA (inaudible) do you use?
>> Pascal Fua: 40. Typically for the sail, 40.
>>: The sailboat?
>> Pascal Fua: 40.
>>: 40. That's not that many.
>> Pascal Fua: Not many, no. Which is why we can get the realtime behavior.
>>: Maybe (inaudible) too much?
>> Pascal Fua: No, the sail actually, once it's inflated, doesn't deform that
much. Where it becomes really hairy is when a poorly trimmed sail starts
flapping in funny ways.
>>: (Inaudible) a little wind --
>> Pascal Fua: If it flaps in the wind, it becomes more difficult, but for this
particular application it's not important, because I don't care about what
happens then -- then it's just badly trimmed. They care what happens when it's
correctly trimmed.
>>: Yeah. So for an airplane application (inaudible) really small. I mean
airplane wings are quite rigid, right?
>> Pascal Fua: Actually, it's not as rigid as you would think. On an Airbus,
the wingtips I think flap by a meter or two.
>>: A meter.
>> Pascal Fua: Which is actually a good thing, because it acts as a --
>>: (Inaudible).
>> Pascal Fua: No, no, as a shock absorber. If they were truly rigid, the ride
would be far bumpier.
>>: Right.
>>: So have you compared the local model to the, have you applied the local
model to the (inaudible)?
>> Pascal Fua: Yeah. Yes. To the sail, no, but to smooth deformations, and it
works almost as well.
>>: (Inaudible).
>> Pascal Fua: If you look at it, we had essentially a piece of cardboard.
>>: Okay.
>> Pascal Fua: And on cardboard it works fine. It's still slightly less
constrained. I think where you lose is that it's actually a bit heavier in
terms of computation -- you have more variables. So we haven't managed to make
that real-time yet.
>>: Okay.
>>: (Inaudible) sailboat -- you imagine there's no texture in the center. You
mentioned (inaudible). I wonder if you have (inaudible) to model. You have
(inaudible). And then suppose you have no cues at all, just shading and such --
more shading -- I wonder (inaudible).
>> Pascal Fua: Okay. With some luck, that will be our submission next year.
>>: Okay.
>> Pascal Fua: It's a natural idea.
>>: (Inaudible).
>> Pascal Fua: Because you want to formulate -- okay. So the difficulty -- and
we haven't -- I mean, I'm not sure it will be our submission next year, because
of this -- is that if you assume a simple point-light model, then the
constraints are linear: the shading can be linearized and it fits this
paradigm. The problem is that point light sources are too strong an assumption
to be realistic. So the difficulty is --
>>: But you have extended a base model (inaudible) I mean you have
(inaudible) similar to your linear model and then on the (inaudible).
>> Pascal Fua: Right.
>>: (Inaudible).
>> Pascal Fua: That is a possibility as well.
>>: So then you don't have to (inaudible).
>> Pascal Fua: Yeah.
>>: The shading is a cue for the dress on the --
>> Pascal Fua: The dress on --
>>: The shading gives you the rough shape; you don't care about the actual --
if you model the connection between the two triangles, right?
>>: The shading can give you that.
>> Pascal Fua: Yes.
>>: (Inaudible).
>> Pascal Fua: Right.
>> Pascal Fua: That's -- I mean, that's an interesting lead as well: we have
this ambiguity -- say we have a simple mesh with two facets; this and that
don't look that different in projection. But if you know something about the
shading, you should be able to tell one from the other. And I think that's also
an interesting thing, worth looking at.
>> Zhengyou Zhang: Okay. Thank you again.
(Applause)