>>: Rick Szeliski: Good morning, everyone. It is my great pleasure to introduce
Bill Freeman to give our talk this morning. Bill is a Professor of Electrical
Engineering and Computer Science at the Massachusetts Institute of Technology
where he's been on the faculty since 2001.
Before that, he was a research scientist at Mitsubishi Electric Labs just across
the street. He obtained his Ph.D. from MIT in 1992, and before that he also
worked at Polaroid. So he's alternated between industry and academia. Bill is
extremely well known in the fields of computer vision and computer graphics, having
done a lot of very seminal work. And today he's going to be giving us a little tour
through time and photography.
>> Bill Freeman: Thanks a lot. It's a pleasure to be here. So yeah, this is a
really fun talk to make and I'm glad to share it with you.
So I'm interested in photography and how photography tells stories over different
time scales.
So I thought it would be illuminating to just organize it all and go through from
practically the shortest possible photograph you could take to practically the
longest possible one.
And talk about how photography tells stories over these different time scales. So
let's start off with the shortest possible photo you could take. And that would be
a picture of light itself.
So this is a photograph of a very short pulse of laser light. You can see it here.
And it comes through a diffraction grating and splits into the primary lobe, the
first-order diffraction lobes, and then the second-order lobes.
And so how do you take a picture of light? Well, this was done about ten years
ago using holography. So you have the pulse you're going to photograph going
wherever you want it to go -- in this case, across a piece of ground glass.
Then there's a second reference beam which exposes the photographic plate
and makes a hologram at the same time. And they're both very short pulses.
The photographic film only records the coherent beats of the two waves when
this one passes over the holographic plate. So this acts as sort of a gate for the
photograph. And then if you take the developed hologram and look at it and
move your viewpoint across space, you get a picture over time of this light wave
progressing.
And that's quite remarkable. Recently there's been another remarkable thing of
actually recording photographs of light using electronic means, not just
holography. And I wasn't going to talk about this because it's sort of
unpublished work, but people pointed out to me, look, it's on YouTube. It's not my
work; it's by Ramesh Raskar at the Media Lab.
And let's see, what do they do? They have an extremely short laser pulse, and
then they have a special sensor, which is just a one-D sensor where, I guess, an
electron beam is scanned very quickly and it's only sensitive to light as the beam
scans it. So you can get temporal resolution on the order of ten to the minus
12 seconds, but you can only record one horizontal row at a time. So the
photograph they're going to show you of light traveling actually took hours to
expose, because they had to repeat the laser pulse over and over again, and
they recorded one row and then the next row and the next row and so forth.
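A minimal sketch of that acquisition loop, just to make the row-at-a-time idea concrete; fire_pulse and read_streak_row are hypothetical stand-ins for the actual hardware interface, which isn't described in the talk:

```python
import numpy as np

def capture_streak_frame(n_rows, n_time_bins, fire_pulse, read_streak_row):
    """Build a 2-D picture of a repeating light pulse, one row per repetition.

    Each call to fire_pulse() repeats the identical laser pulse, and
    read_streak_row(row) returns one spatial row of picosecond-resolved
    intensities from the (hypothetical) streak sensor.
    """
    frame = np.zeros((n_rows, n_time_bins))
    for row in range(n_rows):
        fire_pulse()                       # repeat the same pulse again
        frame[row] = read_streak_row(row)  # record just this row, resolved in time
    return frame
```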
So let me show you these. These are by Ramesh Raskar's group.
So here's a light beam passing through --
>>: Should we turn the lights down?
>> Bill Freeman: For this one maybe we should.
>>: Could you turn them all off for one second.
>> Bill Freeman: It's going to go through it a couple times. And I don't know if
you -- I don't know if you can see him. But there's Ramesh down there
describing his work at the International Conference on Computational Photography.
So again, they can record this by measuring, with very fine temporal precision,
one row at a time. Now, this next one: they're going to overlay what we just saw
on top of a static photograph of the scene that was there. So
now you can see it all in context and this light beam comes through the Coke
bottle, could have picked a better subject, but anyway you can see it progress.
So this is just -- you know, this just blows my mind actually. And you can go to
that Web page and look at the videos yourself.
So that's in the very, very fast range, on the order of --
>>: I think that was the darkest video. So we'll bring the lights back up.
>> Bill Freeman: On the order of ten to the minus 12 seconds. Now let's slow it
down by a factor of a thousand, to like ten to the minus nine seconds. What
sort of photographs can you make there, and is it useful to make them there?
So this is a realm that's useful for time-of-flight depth imaging. So there's a
camera which I bought for my lab eight years ago, made by 3DV Systems. I think
Microsoft might own it now. I'm not sure. But they send a slab of light, about 50
centimeters long, out to the object, and then this bounces back. 50 centimeters is
on the order of ten to the minus nine seconds' worth of light travel.
It bounces back, and now the things that hit an object which is close to you are
ahead of the things which hit objects which are further away. So you've distorted
this wave form. And then at the camera end they go and quench the detection
with a very fast shutter.
So the camera only receives this amount of light. So now you've traded off
depth coding for intensity coding: the intensity at each position is proportional
to the depth away from you. And you can make a depth image this way.
So.
>>: Proportional modulate by [inaudible].
>> Bill Freeman: So they actually have to take -- they use two exposures to get
this, right.
And so here's the RGB version and here's the depth camera version. It's pretty
noisy, but it's serviceable as a depth camera.
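A minimal sketch of how that gated, two-exposure readout might be turned into depth. The constants and the normalization are illustrative assumptions on my part, not the 3DV camera's actual processing:

```python
import numpy as np

C = 3e8              # speed of light, m/s
PULSE_LEN = 1.7e-9   # roughly a 50 cm slab of light, as in the talk

def depth_from_gated_pair(i_gated, i_full, eps=1e-6):
    """Estimate depth from two exposures of the same scene.

    i_gated: image exposed only during the fast shutter (gating) window.
    i_full:  image exposed for the whole returning pulse, used to normalize
             away surface reflectance.
    Returns a depth map in meters, up to an offset fixed by the gate timing.
    """
    frac = np.clip(i_gated / (i_full + eps), 0.0, 1.0)
    # The later the pulse arrives (i.e. the farther the surface), the more of it
    # is cut off by the gate, so a smaller gated fraction means greater depth.
    return (1.0 - frac) * (C * PULSE_LEN / 2.0)
```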
Okay. So that's the very, very fast. Now we're going to slow it down again by
another factor of a thousand roughly to the one to two microseconds range of
time.
And this is the range of high-speed flash photography and ordinary flash
photography. So of course the way you take a picture this way is by keeping the
aperture open, having a dark room, and then just for this brief moment exposing the
subject with a flash of that duration, and you can capture events that happen over
that sort of time scale.
So how do you make a flash that short? Well, there are two ways. You can do it
chemically; these people got a Nobel Prize for the fast chemical reactions
they studied.
You can also make it electronically. Harold "Doc" Edgerton, the well-known MIT
professor, pioneered these electronic flashes. In preparing this talk I discovered
a wonderful Web page that has his lab notes online. So here's a page of his
lab notebook showing the design for this very fast strobe: 1930, Harold Edgerton,
scope tests.
He was a master of just taking beautiful photographs with these high speed
strobes. So here's water as it cascades down. Here's a little assortment of
photos.
A bullet going through an apple. Many photos over time of a diver. Schlieren
photography of a bullet going through a candle. A bullet going through
a card. This tells you when these photos were taken: this is a footballer of the
day kicking a football.
This is one I really like. It actually tells a story over a number of different time
scales. Here's the bullet, and here are three different balloons that the bullet has
passed through. So the exposure is about ten microseconds, but at the same
time we get a picture, on the order of the two milliseconds it took the bullet to
travel this length, by having the identical structure repeated. Now we sort of code
time over space, and we see the progression of the destruction of the balloon as
a function of time, displayed over space.
And, of course, now high speed flashes are commonplace and anyone can do
them. So there's Flickr groups on flash photography. We have the updated
version of the bullet going through the Yoplait yogurt. There's the bullet there.
I find them so delightful because they relate to everyday experience but they let
you see it in an entirely new way. Here's somebody putting their finger in water
and you see it in a way that you've never seen it before.
Everyone likes bullets going through things. Here's a bullet going through
a cookie, which looks like a cookie sneezing.
>>: [inaudible].
>> Bill Freeman: Yeah.
>>: Just kind of wondering how society at large related to that work, did they
sort of --
>> Bill Freeman: I think they embraced it. I think he was like -- at least at MIT
they tell us he was like this hero, really well known.
>>: That's the dogma, but I don't know if that's the truth, though.
>> Bill Freeman: Well, he took part in the war effort, too. He made these --
there's a photograph of, like, Cambridge from up in the air at night. He just had a
humongous flash that lit up the entire city, and used these in World War II for
reconnaissance.
Okay. So that's flash. Now, let's slow it down again. Now finally we're in the
realm of sort of conventional photography. Say 1/5,000th of a second to 1/25th
of a second ballpark. And of course there's billions of photographs one could
show that were taken over this time range. It's sort of interesting to look at the
very first photographs that were taken or movies that were taken over these sort
of time scales.
So, of course, then we have the names Marey and Muybridge and Edison who,
in the late 1800s, made motion pictures and photographs in these kind of time
frames.
So here's a photographic rifle that Marey made; it shoots 12 frames a second.
Marey was fascinating: trained as a doctor, interested in circulation and how
things move, studying hummingbirds. He came to photography as a way to study
his passion for how birds fly and animals move.
So he took a lot of photographs of birds in flight, and it recorded these 12
frames around the circle of this photographic film. So here's one of his shots.
And then he made these beautiful sculptures. I don't know if it's taking off or
landing, but a bird in flight, and also this photograph of a pelican landing, which
again reveals a whole new viewpoint on an everyday thing.
He made bronze and plaster sculptures of these. These are photos of people
hammering, jumping, and of course Muybridge was a contemporary and they
talked to each other.
So Muybridge addressed the question of the day, which was when a horse
gallops, is there ever a moment when all the hooves are off the ground at the
same time? And this wasn't known before these photographs were taken.
And now you can see that there is a moment when all the hooves are in the air.
So now let's slow it down yet again, to telling stories over the time scale of 1/25th
of a second down to one second. So this is sort of the realm of photographic
blur. And blur tells a story.
Here are blurry photos that tell you a story of what's going on over that
sort of a time scale.
And there's a Web page I want to point you to: Ernst Haas's photographs. A whole
series of blurred photographs, gorgeous, artistically done. Here's a bullfighter, a
bird in flight.
And on a blog I was pointed to a Web page by this anonymous Flickr
photographer called Just Big Feet, who made delightful photos of a marathon,
taking exposures on the order of a second long. And you get these beautiful
stories of runners.
So this talk is mostly not my own work, but there are insertions of my work, like
product placements in a movie. Like in a movie, when you see a product placement,
you're like, why are they focusing on that Coke can so much? Here's the first product
placement. This is joint work with Ayan Chakrabarti, a graduate student at Harvard,
and Todd Zickler. And you can use blur to help you learn about the image, use it
to help segment out the blurred from non-blurred objects, if you have a carefully
designed prior model for how images ought to look and how blurred images
ought to look.
You can make a one-dimensional search for all possible motions and find the
most probable speed of any one region and segment out the individual region
according to how it was blurred. Here's a segmentation of the blurred runner.
>>: Why are the shoes blurred with the foot attached to it?
>> Bill Freeman: Why do they? Because they're on the ground.
>>: No, there's one shoe off the ground.
>> Bill Freeman: Oh, this one. I see; that could be a number of things. There
are a lot of steps in this segmentation algorithm, and it could well be because
the contrast between the shoe and the grass is not as strong as contrast
between the skin and the grass.
>>: Or it could be measuring horizontal --
>> Bill Freeman: Yes, we were. Yes.
>>: Even the curve of the foot and shoe go together.
>> Bill Freeman: That's true, they do go together. As I said, this is actually, the
local blur is just one input to this segmentation algorithm. And there are other
inputs as well.
So now let's go down, slow it down even more, down to the time frame of
seconds to hours. So this, of course, is the world we all live in. This is the world
that movies are in.
And there's a lot of wonderful ways to tell stories over these time scales, too.
Typically you often have time one and then time two, maybe, several hours later,
minutes or seconds later.
And okay, so there are a number of ways you can describe what's going on over
this time. You might say that what's going on between these two time frames is a
stationary process, and you just want to describe what's this constant process
that's happening between these two times.
So it might make sense to average images over these times, and I'll show you
some of those.
You can also, again, assume it's a constant process, but select particular frames
that stand out in some way. And so there's useful work to do with what you would
call image selection.
And then finally, you might say there's a process that's actually somewhat
changing over this time, and you might analyze how things have changed
over that time.
So let's go through these one at a time. First let's kind of describe what's going
on between these two times separated by seconds or hours through averaging.
And now you get to another artist. Jason Salavon, who has made these
wonderful pictures that are averages of the, well averages of many things, but
these are averages of the late night talk show hosts. So here's an average of
many images of Jay Leno. Can you tell who this one is? Conan O'Brien, and
Dave Letterman. And they're each different in their own way. And I can recognize
them.
Here's another averaging photo: an eight-hour photograph by the artist Atta Kim,
which again tells a nice story about urban life, really.
So that's averaging. And then, as I mentioned, there's selection. So how are you
going to select which frames to show, out of the many frames you might collect
between these two times? Well, one way to do it is by which one is closest
to you -- which pixel is closest to you at each time. So this brings up something
called shape-time photography, which is our second product placement, I should
say.
So this is something I was involved with. So here's the deal. You take stereo
images of something, many times a second, and so you record both depth and
image information. And you can use that to tell the story of things moving over
time. So, for example, suppose you have five frames from a video of the death
rattle of a quarter on a table. How might you composite those together to
describe the action of the coin rattling on the table?
Well, you can just average them all together with the averaging method, and that
kind of works. It tells you let's just see what happened. But there are a number
of problems with it.
You can't get any sense of the depth of things or the temporal order of things, and
you have reduced contrast where things are averaged together.
Oftentimes computer vision methods are used to extract out the foreground thing
from the table. And then you can layer by time. You can put the first things in
first and then layer on top of that, the thing that happened next, and so forth.
So that fixes the contrast problem. But now you've got another problem. The
shapes are all wrong, the thing that's on the bottom is actually on the top in this
photograph so it doesn't really tell you the story of how the shapes relate to each
other.
So instead you can make your selection of which pixel to show from each time
according to shape. At every pixel, you show the intensity corresponding to the
thing that was closest to you out of all the frames. So this gives you sort of an
approximation to what you would have seen if you looked at the union of all
those shapes at the same time.
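In code, that per-pixel selection rule might look something like the following; a sketch assuming registered color frames plus per-pixel depth maps from the stereo rig, with array shapes chosen by me for illustration:

```python
import numpy as np

def shape_time_composite(images, depths):
    """images: list of HxWx3 arrays; depths: list of HxW arrays (smaller = closer).

    At each pixel, show the frame whose surface was closest to the camera,
    approximating the union of the shapes seen at the same time.
    """
    depth_stack = np.stack(depths)             # T x H x W
    image_stack = np.stack(images)             # T x H x W x 3
    closest = np.argmin(depth_stack, axis=0)   # which frame was nearest, per pixel
    h, w = closest.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return image_stack[closest, rows, cols]    # H x W x 3 composite
```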
And we call this shape time photography. And you can use it to tell little stories
of short-term things. So here's three photos of my wife sewing something, and
then here's the little composite picture telling you how to sew.
And here are two pictures of my brother-in-law's head and I'm sure you're all
wondering what would his head look like if it were in the same place at the same
time, it would look like that.
And this relates more generally to another selection method, called lucky
imaging.
So here the story is that there's this process going on over this time. But maybe
there's something obscuring it at some moments and not at others. Or maybe
what you really are looking for occurs at some moments but not at other
moments. So you want to select out those lucky shots.
And this is used in astronomy with great success. So here's a single exposure of
a distant astronomical object, obviously under very noisy conditions. And here's
another exposure, when the atmospheric turbulence was a little bit better, and
you can see the object a little bit better.
And it's enough better that you can actually measure, out of all your photos, that
this is a good one and that's not a good one. You could imagine looking at perhaps
the local variance or something.
So if you take an average of 50,000 of these, you get this. And if you take an
average of just 500 of these selected good ones, then you get this. And so this is
called lucky imaging, and you are just grabbing those moments when the
turbulence happens to line up just right to give you a better view of things, and
either show those or average over those to get your lucky picture.
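A minimal sketch of that select-then-average idea; the Laplacian-variance sharpness score is my stand-in for whatever selection criterion the astronomers actually use:

```python
import numpy as np
from scipy.ndimage import laplace

def lucky_average(frames, keep_fraction=0.01):
    """Keep only the sharpest frames and average them.

    frames: list of 2-D grayscale exposures of the same object.
    keep_fraction: fraction of the "luckiest" frames to retain (e.g. 500 of 50,000).
    """
    # Variance of the Laplacian as a crude sharpness / seeing-quality proxy.
    scores = [laplace(f.astype(float)).var() for f in frames]
    n_keep = max(1, int(len(frames) * keep_fraction))
    best = np.argsort(scores)[-n_keep:]        # indices of the best-scoring frames
    return np.mean([frames[i] for i in best], axis=0)
```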
>>: It's turbulence, because the noise you'd expect to be fairly --
>> Bill Freeman: Yeah, I think that's how we put it. Yeah. And Microsoft
Researchers have exploited this. So Michael in the back really should be telling
the story. But I'll just say it anyway.
So this is work he and Neil did in Rick's group looking at Mount Rainier. So, first
of all, I'm told just getting a picture of it at all is one lucky thing right there.
But on top of that -- so let's see, here's an input, hazy, you can apply image
processing techniques to it and get this dehazed version. So this is pretty good.
>>: Actually very clear day.
>> Bill Freeman: This one was.
>>: So far away.
>> Bill Freeman: Clear day, right. Clear day, and image processing makes it look
even clearer. Not quite there yet; it's still rather noisy.
So let's take an average over many photographs of these dehazed images and
we get this. But again we're subject to the atmospheric turbulence that slightly
disturbs the light paths over that long distance.
Instead, what they did was a local shifting of each patch, to get the ones that
they're averaging over to line up much better, by making local adjustments
rather than a single global adjustment.
And so by doing that, that sort of local lucky imaging and adjustment, they can
make a photograph like this of Mount Rainier.
Did I miss my lines?
>>: One more after this. This is without the lucky -- this is just the line. There's
one more that's aligned with all of them. Anyway, I'll show you those after.
>> Bill Freeman: Thank you.
>>: One more slide or --
>> Bill Freeman: I'll get it right. It's very helpful to have the author here. So
here's another one, again with the same author. This is, again, a form of lucky
imaging, although in a different kind of context.
So again, this is Michael's work. You have a group shot. And again, this is sort of,
if you will, a continuous process over a lot of time where people randomly smile.
But they don't all randomly smile together. You can't get that one shot that you
want.
And you want to combine the different locations at different times to give you a
single composite shot where everyone is smiling and everyone looks good. This
is back in the days when Michael was still in the witness protection program so
we don't see his face here.
But this was, I guess, to protect the anonymity of a submission. So that's another
form of lucky imaging. And then there's a form of lucky imaging that I always
wanted to do: go to a large plaza and get a movie of just the countless people
walking along, and then construct a single composite image where everybody
was walking on their left foot, as if they're all marching together.
Well, it turns out an artist has done this, and of course I'm sure done it much
better than I could have. There's an artist named Peter Funch who has
wonderful photos with just that idea in mind.
So here's one where everybody's in the air, except this one guy here who is, like,
wondering what's going on. And, again -- so he doesn't say how he made this
photo, but I assume it was a form of lucky imaging: he stood there with a tripod,
photographed the many people who came across, noted whenever anybody was in
the air, and composited all the appropriate photos together to get this single
collage, although it doesn't look like a collage, because I'm sure it was all taken
from the same place.
There's just a whole series of these. So here's another one: everybody carrying a
manila envelope. And this one's really nice. This is everybody in Times Square
taking a picture.
And can you tell what the story is here? World of children. Everyone's young.
So those are, again, lucky imaging, but done by an artist. There's another artist
whose work I really like, and she makes found animations. So here's a photo of
a horse. She went and collected lots of photos of horses and made a movie out
of them. So you can think of it as lucky imaging; the time in the movie spans many
hours, and I'm sure the photos were taken over many months, but they're all
composited to make a single story.
And then she has another one. This next one doesn't fit in with pictures over
time, but it fits in with the story of random selection, and I like it so much I'm just
going to insert it into the talk here. So we're going to take a two-slide break from
the talk about time and show you two other images from Cassandra C. Jones.
So there's one. I just love it. It's random selection. Here's lightning forming a
little bunny rabbit.
And here's lightning forming a little squirrel, a chipmunk. So these are, again,
random selection. They're not telling you a story about time. But they're telling
you a story about --
>>: Does her process go all the way --
>> Bill Freeman: I'm sure it's not. Okay.
>>: A lot of modifications to be moving the shapes around to make something.
>> Bill Freeman: Yeah.
>> Bill Freeman: So you can also show how things have changed. So now we're on
to describing changes over time. Again, showing things by one of my favorite
scientists, Michael, and collaborators. So this is selecting images from a short
sequence and compositing them together to tell a story of changes over time:
daughter on the swing set, monkey bars.
This is another one that, just like the balloons by Edgerton, makes a story of time
separated by spatial position. So here's a sequence of photos of a building being
blown up, but here they've composited them together with the latest ones on the
left and earlier ones on the right, so you now code time spatially and you get to
see, in one photo of the whole spatial structure, how it deforms over time.
Then just to show off, they went and did it in the other direction. Now the later
times are on the right and the earlier ones on the left.
In case there are a few of you in the room who haven't seen this, I'd like to show
you this thing we call motion magnification, which analyzes small motions and
exaggerates them. This is my wife on the swing behind our house. We track
feature points carefully to avoid occlusion artifacts, and then we cluster them
according to similar tracks over time.
And then we group them to get a layered representation of the motion. And
the user says, take the red stuff and amplify its motion by a factor of 40. So we'll
make a motion microscope; we'll let you see small motions. You put the pixels
back in and push them around that way. But now we have holes where we didn't
see data before, so you use texture synthesis methods to fill in the missing holes.
Now we have a motion microscope that lets us see the small motions as they
would have appeared if they were amplified by some factor.
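The amplification step itself is simple. A sketch of pushing tracked points away from their rest positions; the array shapes are my assumption for illustration, not the paper's actual data structures, and the re-rendering and hole filling described above are not shown:

```python
import numpy as np

def amplify_tracks(tracks, reference, alpha=40.0):
    """Exaggerate small motions of tracked feature points.

    tracks:    T x N x 2 array of point positions over T frames.
    reference: N x 2 rest positions (e.g. the mean position of each track).
    alpha:     amplification factor, like the factor of 40 in the talk.
    """
    return reference[None, :, :] + alpha * (tracks - reference[None, :, :])
```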
She looks at this and asks if I felt the video makes her look heavy. [laughter].
And here's the motion that a structure undergoes as someone balances upside down
and pushes down on the wooden and aluminum supports, and it actually moves.
Oftentimes parents of perfectly healthy newborns wonder, is the baby breathing?
So now we can tell that.
And you can actually use it for science. So here are small deformations of a
membrane as an acoustic wave goes across it, in the ear of some small animal.
Magnified, you can see the deformation, but in the original you really can't see
the deformation at all. So we're publishing a paper on that membrane.
So that's sort of taking something and exaggerating its difference relative to zero
motion. Recently we've gone a little further with that and tried to exaggerate the
difference of one motion from another motion.
So here, this is the same car going over the same speed bump, once with
its trunk empty and once with its trunk full. You might imagine who would
sponsor this -- a defense department sponsor -- but here it is again.
And so we track each one carefully. And now we're going to exaggerate not the
motion relative to zero, but the motion of the full-trunk car relative to the
empty-trunk car, to see if we can see a difference in how it moves.
And indeed you can. You can sort of see an exaggerated version of those
differences in the motion. So now we're going to slow it down even further, down
to hours, to months.
So this is in the realm of time-lapse photography. So let me start with a nice time
lapse. This is from Planet Earth, the BBC documentary. This is a beautiful time
lapse, and great care was taken to make that, I think, because it's really hard to get
it so smooth. Of course there's camera motion, very slowly, at the same time.
You've got these two different temporal processes. It feels like just a conventional
pan. But that panning took a long time to do, of course.
>>: Several days?
>> Bill Freeman: The same time scale as it took for these flowers to open up.
>>: It may have been shot -- like there's a -- so there's a -- well, so there may
have been a pausing going on there, because there's a making-of thing, a
different sequence, where they actually talk about the fact that you have
[inaudible] and the sun comes in; it's possible to do it slowly; they actually bring
all the flowers into the studio.
>> Bill Freeman: Great, this is very helpful. I'll look at that.
>>: It's well worth watching. [inaudible] BBC. They have this wonderful tracking
shot, going through, I think, like a glen or something like that. And it's astounding,
but it's all about time.
>> Bill Freeman: Great. Good. That fits in with my point, actually. That there's
really a lot of room for computational photography to make an impact here.
Typically with time lapse, there are all these things happening over this period of
time. Maybe the lighting's changing, maybe the object is moving around in a way
you don't want. So you've really got to understand the sort of higher level -- you
want these controls over things which are normally difficult to control in a
photograph, such as changing the lighting or changing the position exactly.
And so if we have better computational understanding of those things, we can do
a better job of recording events over these long time scales.
So a first attempt at this was made by researchers at Harvard and at MERL. I
guess they're now at Harvard. [inaudible]. So they worked with time lapses and
first developed a method to remove cast shadows from the time lapse -- you can
identify them because the intensity goes way down. With the cast shadows
removed, they did a low-rank factorization of the sequence to separate it
into different components: time-of-day and lighting components. So here they're
re-rendering the sequence without shadows. You can also then manipulate their
low-rank decomposition further.
So this is kind of a first step at this sort of thing you want to do. Of course it
requires a stationary camera and stationary subject for this case.
And a key piece of making good photographs over long time sequences I believe
is tracking.
And the better we can do tracking, the more flexibility we have at rendering
things sharply that are moving over time, and so forth. Just to address the point of
what the state of the art is in computer vision tracking, here's what we think it
is.
So we took a good candidate for the best tracker, Brox and Malik's tracker,
reimplemented by my extremely good graduate student Michael Rubinstein. This
is kind of a picture of what the state of the art is in the published literature, as
opposed to whatever Brox and Malik have in code they haven't released.
So here it is. Okay. So each track is coded with the color of when the track was
lost. And so you can tell what the color code means by looking at the
color of these things as they slide off the end.
Let me play it again. So these last all the way until the red stuff slides off the
end, but others not quite as long. Ideally, all the dots here would
be the same red color throughout the whole tracking of the cheetah.
Is that clear? This color coding? It's demonstrating that these tracks are actually
much more short-term than you'd like them to be. You'd like them to be stuck on for
the whole length of time that the cheetah is in view, and even remembering when
they pass through occluders.
>>: The features are not -- it's an adaptation in the feature tracks?
>> Bill Freeman: Right. I don't believe there is an appearance model that
changes over time for this one. And of course that's what we're working on.
So just another piece of work that we've been involved with: in a time lapse,
there are things going on at all these different time scales. So here are some
sprouts growing, and you've got the short time scale of the sprout ends flickering
back and forth, and you've got the longer time scale process of the plants growing
themselves. And I think you'd like to be able to make a photo that took those
things and treated them separately. You'd like to be able to just see the
long-term effect by itself, and that would maybe clarify this time lapse for you.
So without actually tracking we've made what we call a motion denoiser that
addresses that problem. So again let me just go through this in a little bit more
detail. The game is we want to make new video that's going to use the pixels
only from this video and just reorganize them in space and in time.
So we're not going to use a pixel we've never seen before. We're just going to
put it into a different position. So the desired output of our algorithm is a warped
map which tells us where we've grabbed each pixel from that we're rendering.
So what do we want the warp map to look like, so W is a function of the position
P. So we have several terms that tell us what a good work map is. Number one
is it more or less respects the original video. So the intensity of the warped video
minus the intensity of the non-warped is small.
But that would just give us the original video back if we didn't do anything else.
So we're also going to say that we want the warped map to change very little
over time. The warped map video, the output video to be pretty slowly changing
over time.
So the output at one time should equal the output at another time. And finally we
want this warped map where we grab our pixels from to be spatially and
temporally smooth. So there's another term there. And these three terms define
a Markov random field and you can find the optimal -- find an approximation to
the optimal warp map which gives you the optimizes this objective function we've
created. You can do it a number of different ways.
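Written out, the three terms just described might look something like this; the notation is a reconstruction from the talk, so the exact norms and weights in the published formulation may differ:

\[
E(w) \;=\; \sum_{p} \big| I(p + w(p)) - I(p) \big|
\;+\; \alpha \sum_{x,y,t} \big| \hat{I}(x,y,t+1) - \hat{I}(x,y,t) \big|
\;+\; \beta \sum_{p} \sum_{q \in N(p)} \big\| w(p) - w(q) \big\|,
\qquad \hat{I}(p) = I(p + w(p)).
\]

The first term keeps the rendered pixels close to pixels actually observed, the second makes the output video change slowly over time, and the third makes the warp field smooth over the space-time neighbors N(p).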
Iterated conditional modes is one solution method. Graph cuts is another. And
loopy belief propagation is another. We've tried all three, and for this particular
problem, we believe loopy belief propagation worked best. This, then, is the
approximation to the optimal warp according to this objective function we've
made. Here's the original video and here's our motion-denoised video output.
So what we're trying to do is show only the long-term effects and not the
short-term. And here's a little story that tells us how this video was made. Here's
the spatial displacement at every position, and here's a color code telling how
the color displayed here corresponds to a spatial displacement from the
center of this figure.
And here's a color map showing the temporal displacement of every pixel. So
you take these pixels and you grab across space and across time according to
this map and you get this output video. Let me just play it again.
And the thing you might notice is it looks pretty good, but we've clipped off some
of the ends. That's because in this whole thing, the state space that you're trying
to solve for -- the translated pixel position in space or time -- there are many
different translations we have to consider. And so that slows down solving this
thing, and we had to only consider a relatively small volume in space and time.
If we just take a little crop of this video and allow ourselves a bigger search
space in space and time, then we get output frames which look much closer to
the desired ones.
So this is just a matter of computational time to fix that artifact. But now you
can take your input video and separate it into the long-term, kind of low-frequency,
motion components and the high-frequency, short-term components.
>>: Is it possible to get some sort of, like, a temporal moiré effect?
>> Bill Freeman: Sure. That would come in actually right at the time lapse itself.
And so that might mask a high-frequency thing as something that's low-frequency,
and then we would treat it as low-frequency and not smooth it out, that's true.
Here's just a comparison of different ways you might do this motion denoising
problem. Here's the source. Here's just taking the average at each
position -- the average value over the sequence over some temporal window -- here's
taking the median value over the temporal window, and here's our motion-denoised
output. As you might expect, taking the average over time gives you something
that's kind of smooth but blurry. The median's a little bit better,
and the motion denoising is a little bit better still.
Here on the bottom we show a space-time drawing of this, so you can kind of
see better what's going on. Here's one scan line displayed over multiple times.
So in the original it just goes scccr, and it wiggles back and forth, as we saw in the
original video. The mean and median are blurred out somewhat, and here's the
motion-denoised version. So, just to show you a few more of these -- and this will
appear in CVPR in June -- here's a source, the long-term components, the
short-term components. Here's a swimming pool being dug.
And you can see in the output this grill cover is stabilized and you can again
separate into the long-term and the short-term components of the video.
And this isn't perfect, but it's taking steps in the direction that I think
computational photography should go for long time scale events of giving you
kind of independent controls over these different components of the video.
Here we're looking at the kind of low frequency motions and the high frequency
motions.
You can also imagine -- there's a beautiful set of time-lapse images of glaciers
made by the Extreme Ice Survey. Here's the original time lapse of a glacier. It's
really noisy in many ways, but here we've applied our method to pull out just the
long-term and the short-term components of it, and we think it gives a better
rendering of it.
So now let's go beyond time lapse, up to years and centuries. How do you tell
stories with photography over years and centuries? One way to do it is to
just look at photographs from very long ago. I mean, that tells a story over time.
So I just want to show you these photos I like so much. These are some of the
earliest color photos made. They were separations, made with a sort of temporal
multiplexing to obtain the color: you make a black-and-white photo through three
different colors of filters, and if you combine them together you can get a color
photo. Now, with digital methods, we can combine them to get much richer color
pictures than they could have seen when they took them back then.
And now we have some of the world's first color artifacts, from the fact that the
moving waves, of course, didn't stay stationary over the three different exposures.
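A minimal sketch of combining the three separations digitally, with a brute-force integer alignment; the search range and correlation score are illustrative choices, not the method used for the actual restorations:

```python
import numpy as np

def best_shift(channel, reference, search=15):
    """Find the integer (dy, dx) shift that best aligns channel to reference,
    scored by normalized cross-correlation over a small search window."""
    best, best_score = (0, 0), -np.inf
    ref = (reference - reference.mean()) / (reference.std() + 1e-6)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            moved = np.roll(np.roll(channel, dy, axis=0), dx, axis=1)
            mov = (moved - moved.mean()) / (moved.std() + 1e-6)
            score = (ref * mov).mean()
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

def combine_separations(blue, green, red):
    """Align the green and red plates to the blue one, then stack into an RGB image.
    Real restorations would add pyramids, cropping, and subpixel refinement."""
    gy, gx = best_shift(green, blue)
    ry, rx = best_shift(red, blue)
    g = np.roll(np.roll(green, gy, axis=0), gx, axis=1)
    r = np.roll(np.roll(red, ry, axis=0), rx, axis=1)
    return np.dstack([r, g, blue])
```

Anything that moved between the three exposures, like water, still ends up with the color fringes mentioned above, since no shift can align it in all three plates at once.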
Another way to tell stories over years and centuries is to compare photographs
that were taken a long time ago with photographs that you take now. And so
there's a really nice book called "New York Changing": rephotographs
of old photos. So a person went and retook photographs from
many different locations. The original set were taken in the mid-1930s and
the new set were taken around 2000.
So it's very illuminating to compare the two. So here's the Manhattan bridge
looking up. It changes very little over the course of time.
Here's looking from on the bridge, 1935. And how do you think it's going to
change by 2000?
>>: Wider.
>> Bill Freeman: Pardon?
>>: I was going to say wider.
>> Bill Freeman: But actually this is the pedestrian part of it. Constraints on
what you can do. Fences. And here's an old photo of a street corner, 1936,
and 2001. And even something as simple as this would make a nice sort of
master's thesis computational photography project. You'd like to have these two
as inputs and get a separate picture of what was there in one picture but not in
the other, and there in the other but not in the one, and do it in a nice artifact-free
way. I think it would be nontrivial and I think it would be nicely useful.
Another way you can apply computational techniques to these problems is to use
the computer to help you with making these rephotographs.
So this is work by my colleague Frédo Durand, Aseem Agarwala, and his student
Soonmin Bae. They use computer vision methods to help you line up your camera
at the right position to match an input photo that you want to make a
rephotograph of. This uses the kind of computer vision methods you might
expect: local feature detectors and knowledge of the geometry to tell you how to
move the camera, how to adjust the focus.
So this is a test, here's a reference photo, and rephotographing using their
method, and rephotographing kind of naively trying to make it match.
>>: How many years ago did they do this work?
>> Bill Freeman: No more than four.
>>: The only reason I'm asking is that back then they might have had to do it with
a laptop and a camera. Now it could be an iPhone app.
>> Bill Freeman: True. Definitely I know it wasn't just on the iPhone. And so
here's some real test cases. Here are -- let's see, reference photographs. Okay.
Each row is the same place.
Reference photo. Other reference photo. Their rephotography results, and a
comparison against a professional photographer who also rephotographed
the same thing.
You can see they get it slightly better, although it's a shame they had that car
blocking the view there. But maybe that's part of the story, too.
And then again another way to tell stories over hundreds of years is not just to
compare individual photos but to compare an aggregate of the photos. So here
again is the artist Salavon. This time showing an average of high school
yearbook portraits from 1967 and from 1988. And again this is just kind of
delightful that you can sort of see a story in these average pictures and then see
how the averages change over time.
One of my favorite, kind of forehead-slapper, wish-I-had-thought-of-that things
for telling stories over long periods of time is this Picasa face movie application.
So here's a YouTube video of it. It's really simple.
>>: I think this is coming out as a SIGGRAPH paper this summer; it was on the
preview submitted yesterday.
>> Bill Freeman: Let me just finish this and I'll take your question. So you take
all your pictures, put them in a shoebox, as it were -- I guess you have to tell what
the ordering is. And then the only computer vision technology really is identifying
the face and lining things up so that the transitions work well. But it's such a
compelling story over many years, telling the story of this woman growing up.
And anyway, again, it's a nice use of computational methods to help tell stories
over long periods of time. Yes?
>>: Two questions. One, there's a series of four sisters over, I think, 25 years -- the
Nixon sisters, I'm not sure -- an artist photographed them and then they tried to,
what do you call it, keep it canonical, the same sister in the same position. It's the
something sisters; Nixon, maybe not.
>> Bill Freeman: I haven't seen that.
>>: Has anybody tried to rephotograph the original building that was
photographed in the 1820s?
>> Bill Freeman: Which ones in the 1820s?
>>: In the 1820s, I believe, the first photograph, by Niépce. Maybe it's a steeple?
>> Bill Freeman: I'm not aware. Okay. Now finally I'm going to slow it down
again. I'm going to go beyond centuries. How can you take photos beyond
centuries? There are two ways you might look at it. One way is, let's take photos
over a centuries-long time scale of human-scale things. Instead of
a time capsule, can we make a time capsule camera that records things over
hundreds or thousands of years?
And this becomes as much a hardware project as a computational project. It
reminds me of the 10,000-year clock. I don't know if they're still working
on that project or if they've launched it or what, but the goal was to make a clock
that would keep accurate time for 10,000 years.
So if you want to make a camera like that, I think you'd have similar obstacles to
work against.
But then the second way to make photographs of very,
very long ago is to give up on the notion of taking pictures of things at a human
scale, and go back to relying on the finite travel time of light and look at
astronomical images. Again, if we give up on taking the pictures ourselves, this is
from 5,000 years in the past, because it's an astronomical object 5,000 light
years away.
Let me run through a little bit of this. Let me jump ahead here. This is 50 million
years in the past. This is 200 million years in the past. And this is 3.8 billion
years in the past. There was actually an event there that occurred over the course
of just a day or something.
They think some star slipped into a black hole and made this huge
gamma ray burst, which astronomers then directed their telescopes at and
said, here's a picture of the area where this catastrophe occurred
3.8 billion years ago, from our frame of reference.
>>: How did they date how long ago the light left this galaxy or how far it is?
>> Bill Freeman: I believe there are a number of ways. There are sort of these
reference galaxies where they think they know how far away they are and how fast
they're moving away from us, so from the redshift you can tell how far away they
are. I don't know all the details, but tricks like that are used. And you're not going
to get a photograph of something much older than this, because that's, you
know, less than an order of magnitude away from the age of the universe itself.
So that's the other extreme of the photographs you can take. So we've
covered the gamut then, from the very, very shortest -- taking a photograph of
the fastest anything can be moving -- to a photograph of as long ago as we
can see. And photography lets us take pictures anywhere in between, really.
As far as where the research is to be pushed, I just don't think we're going
to beat the short-time edges and the beauty of these photographs by Edgerton
and others as well. But I do think there's a lot to be done in the area of long time
frame photography, because again, the things that you want to remove are
lighting effects or changes in position, and handling those properly is a computer
vision problem. And I think we can -- there's a lot to be done in this
realm of the problem. So that's it. Thanks.
[applause]
>>: Rick Szeliski: Before we take questions I want to say one thing that I forgot
to say at the beginning which is Bill is here for the whole week. Bill is a
consulting researcher with us. So you can talk to him about anything you're
doing in the company. He's signed nondisclosure agreements. So please either
drop by his office, which is in our hallway, or send an e-mail. He's got an interim
e-mail account, or he reads his MIT e-mail. So please take advantage of his visit to
chat with him.
So are there any questions now on the talk?
>> Bill Freeman: I think you've been good about asking questions during the
talk.
>>: Can I ask you very fast -- because you've recorded directly in film, two beams
of light can be way more than black, but the rest can be done --
>> Bill Freeman: I think that's the story. I mean, also it's the fact that these two
beams of light -- it's not just a nonlinear interaction, it's the fact they're coherent
with respect to each other. So that would give you a different signal than just
averaging each one by itself.
>>: Rick Szeliski: Any other questions? Okay. Thanks a lot, Bill.
[applause]