>> Andy Wilson: So today we have a guest speaker from the University of
Kentucky who happens to be my older brother, Hank Dietz. He got his Ph.D.
from Polytechnic University, which many of you may know is now part of NYU.
He probably is best known for his work on compilers, the PCCTS toolkit which
some of you may have used, and did a lot of early work on parallelizing Linux.
But today he's going to talk to us about his hobby in photography. He's always
been into photography. In fact, we just sold our parents' house which had a
darkroom from our childhood, and, of course, nobody has any idea what to do with
a darkroom in this day and age.
Anyhow, so with that, I give it to Hank.
>> Hank Dietz: Thank you.
So I guess my brother has already said, this is kind of a hobby for me. So let me
just say if I say some things that sound kind of wrong, they probably are, but, you
know, that's okay. I've basically been playing around with this for a while. I'm not
an image processing guy in any way, shape, or form. Not that I have anything
against them, but I'm happy that they're image processing guys and I'm not.
I am a computer engineering systems guy, and the kind of thing that I'm usually
looking at is not just compilers or hardware architecture or operating systems, I'm
looking at how you integrate the different pieces to make the system work better.
And if you think about it, even though I've been doing that mostly with parallel
super computers, that works just as well for embedded processors, and, after all,
what is something like this but a little embedded processor with a really cool IO
device on it. Right?
So essentially my background in photography goes way back before I was a
computer guy. So essentially back in the 1970s I was photo editor of things like
Broadway Magazine at Columbia University. I published photos as a
professional photographer, did a lot of work that way.
Then I kind of got down to doing serious engineering and I kind of forgot about all
that stuff, and eventually back around '84-'85 I got sucked into doing some digital
halftoning work for DuPont. So I got involved in doing X-ray image halftoning,
and I guess some of that stuff actually got picked up by IBM and William
Pedibaker's [phonetic] group there, so that kind of was a little foray back to image
processing-ish photography related things, but very short foray. Didn't really stick
with it.
Then back in 1996 I had this wonderful little scenario where I'm the guy who built
the world's first Linux PC cluster supercomputer. Now, back in '94 when I did
that, we needed some way of showing that we could actually keep the machines
tightly synced. So as early as '94 we started building video walls.
Well, back in '96 we had a 30-megapixel video wall. Where do you get
30-megapixel images to put on a video wall in 1996?
So I had all these Landsat images and I had stuff that -- from NASA, images of
things like Io and what have you that were stitched together. But I want to be
able to take my own images, and I want to be able to do things, say, maybe
200-megapixel images and play around with that, and you just couldn't do that with
any cheap commodity technology.
So ever since then I've been trying to make digital cameras be able to do these
kinds of things, and basically I've been doing things that I think of as
computational photography since 1999, but I'm not sure that it's exactly what the
term computational photography is really associated with now.
So just to give you a quick background, these are all weird camera things that
I've done. So you can see, yes, that actually is a USB plug coming out of that old
4-by-5. So I hacked the sensors on that.
This was back in -- I think it was 1999. I did a couple of autonomous robots that
had tethered Nikon 950s with the 185 degree fish-eye lenses on them, and I was
doing realtime stitching and doing spherical pan and zoom on the video wall tied
to a cluster wirelessly transmitting things.
I have fleets of these little guys all over the place. That's actually a peephole
being used as a fish-eye. And you go, well, what a horrible way to try and abuse
a lens, right? Well, it turns out I was actually working on a project called
Firescape where we're trying to make a way that people could basically go in to
fight fires with some knowledge of what their surroundings were. So we had a
helmet-mounted system that we were working on, and we needed something that
could actually go into a fire and give them a wide angle point of view and was
basically disposable.
Well, turns out those stupid little door peepholes cost 4 bucks and they're fire
rated. So they're disposable. They're good. They're not good optically, but, hey,
what can we say?
That's actually another one of the Nikon 950s from years ago that's still been
clicking away in one of my supercomputer machine rooms. That particular
camera has actually taken over 3 million images as a tethered system over a
period of more than 10 years.
This is actually a little thing that we did with CHDK. We're basically using the
Canon Hack Development Kit. I had an undergraduate senior project team do
some things with spherical -- excuse me, with 360-degree images and in the
camera directly creating a 360-degree video view. Very nice that you can do that
in those.
This is another one of these mutant things. This is actually kind of from the
Firescape stuff.
So you can see I've been doing a lot of camera hacking, and in terms of the stuff
that I've done, certainly, as I say, I'm pretty well known for doing a lot of video
wall stuff very early on. This thing here was the 30-megapixel display that I was
telling you about. And, yes, that is 32 Pentium IIs running it.
You'll see I've had lots and lots of video walls. And we did our own video wall
libraries, things like that. I have the dubious honor of having been the first guy to
actually have an MPEG player that would run on a Linux video wall.
This is another thing that I was doing with a couple of my colleagues where this
is actually a handprint, and the interesting thing is that we're actually doing
structured light 3D capture. So I'm the camera guy in that little crowd.
And this is something that I did quite a while back. This is from 2001 where
basically I was able to extract the near infrared image directly from a
conventional camera without any extra filtering, just [inaudible] and software. So
I was doing a lot of little multi-spectral tricks.
And in case you're wondering what this is, this is actually the infrared remote
control, and you can see there's the infrared light from that separated out.
So enough kind of hand-wavy background. You can see I've been playing
around with this stuff a lot, but I'm not a very formal guy about this.
So here's my definition of computational photography which may or may not
agree with the kinds of things that most people would call computational
photography.
I think of cameras as computing systems. And obviously for me that makes it a
very reasonable thing for me to be involved in. And so what I'm trying to do is I'm
trying to use computation to enhance the camera abilities and/or to process the
captured data.
And I don't think of it as an image necessarily. I think of it as data. The kinds of
things that I've been doing, new camera sensor and processing models. Some
of that involves some things that I've been kind of playing around with at the
nanotechnology center over at UK.
Intelligent computer control of capture. Obviously I've been doing a lot of
computer-controlled capture over the years. Detection and manipulation of image
properties. And that's really what I'm talking about this time.
Basically I've done a lot of things with CHDK. How many of you are aware of
CHDK? Has this been used a lot or -- so CHDK basically -- some people in the
former Soviet Union with way too much time on their hands decided they would
hack the Canon cameras and figure out if they could actually reprogram them.
Well, basically what it boils down to is the thing that makes a Canon camera, a
little PowerShot, into a camera is that after it runs autoexec.bat it runs camera.exe.
So it's actually a very accessible interface. And it turns out that what they did
was they managed to come up with a way of having an alternative operating
system load with the camera thinking that it's actually just starting up the
firmware boot loader.
And so it doesn't actually override anything in the camera. You can put whatever
code you want in there. It's a full C programming environment which you can use
with GCC to generate the code for it. So you can run arbitrary code inside of
these cameras with full access to all the camera features.
And that's pretty cool because here we're talking about hundred dollar little
commodity cameras, and they're actually quite reasonable. They're a little slow
processing-wise, but it's still nice to have everything contained.
Okay. Well, I've had a lot of, as I said, graduate student projects involving
CHDK and doing interesting things with cameras that way. And back in spring of
2009, as an undergraduate EE senior project, I had these three students working
on the idea of modifying a Canon PowerShot to capture depth maps directly in
the camera: when you press the shutter, it quickly fires off a sequence at
different focus distances, analyzes where each pixel is sharpest, and basically
interpolates to get a full depth map out.
Pretty cool stuff, and they did a really nice job of doing it. They essentially used
CHDK with some custom C code to measure the blur and combine the images,
and they did a pretty nice job. They really did a very serious job. In fact, they
actually won an award that year for best senior project.
The theory behind this is, of course, contrast detect auto focus. If you take a look
at how cameras normally are focusing, assuming it's not an SLR, what they're
normally doing is they're trying to detect when the image is in focus by basically
looking at local contrast, and when the local contrast is highest, presumably the
image is in focus.
There are lots of different algorithms in the literature for that, and my students
were pretty diligent about that, came up with about 30 or 40 different techniques
that they actually tried. And it turned out Sobel happened to work best. No big
surprise on that. No big thrill either.
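A minimal sketch of that depth-from-focus idea, in Python with NumPy and SciPy (an illustration of the general technique, not the students' in-camera C code): compute a Sobel-based sharpness map for each frame of the focus sweep, then take, per pixel, the index of the sharpest frame as a crude depth label.

    import numpy as np
    from scipy import ndimage

    def sharpness_map(gray):
        # Local contrast as Sobel gradient magnitude.
        mag = np.hypot(ndimage.sobel(gray, axis=1), ndimage.sobel(gray, axis=0))
        # Smooth so isolated noisy pixels don't win.
        return ndimage.uniform_filter(mag, size=9)

    def depth_from_focus(frames):
        # frames: grayscale images, one per focus step, ordered near to far.
        # Returns, per pixel, the index of the focus step that was sharpest.
        stack = np.stack([sharpness_map(f) for f in frames])
        return np.argmax(stack, axis=0)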
But the interesting thing is there were some quirks about this which I'll talk about
in a moment. They actually did the processing on the raw sensor data, and they
eliminated the red and blue channels because they were noisier and really didn't
improve the quality.
It was limited to what they had as far as the resources in the camera. And the
big one on that was the camera didn't have enough main memory to keep all the
images around, so there were some neat little processing tricks that they had to
do. Kind of cool. Not really important to this talk.
Here's what's important to this talk. I realize this is a little hard to look at, kind of
a dark slide in here, but this is basically a bunch of cards kind of lined up on a
table, and you can see this is actually the depth map image that was generated
in camera using their particular project.
And you can see it's pretty good. Close stuff, further and further away, far away
stuff. Cool. Right?
So I was very impressed with this. Everybody else was very impressed with this.
And then we started looking at it a little bit more closely.
And here's the problem. If you take a look at this, the depths of the edges of the
cards are perfect. They're absolutely great. But around the edges we've got
these weird little echoes of the edges that say that it's in sharpest focus way far
away from where it really should be.
And basically it's not just way far away, it's way far away in either direction. So
when we were trying to do things like ah-hah, let's interpolate to fill this in so that
we know what the depths are in between, this screwed up everything.
And so that really bothered me, because I looked at that and I went, geez, you
know, why is this so far off? And so I've spent two years basically answering that
question.
And basically, as I say, it's wrong by a lot, it's not wrong by a little bit, which is
kind of an interesting property already. It's wrong in both directions, and as I say,
it echoes the edges. It's something where, as you take a look at these things here,
you see it's not actually just random spots. It's a very simple small distance away from
each of the edges.
So what went wrong? Well, basically it comes down to understanding what
happens to a point that's out of focus. So we take a point light source,
basically a bright spot in the sky or whatever. Looks kind of like this when it's in
focus. It's just a nice, tight little dot. Right?
If we take a look at what happens to that point, basically what happens to the
point spread function, what happens to this shape as we bring this image out of
focus -- and, by the way, I'll explicitly say right now I'm not sure that anyone
would be approving the fact that I'm talking about out-of-focus point spread
functions because usually people like to talk about them as being in focus, but,
hey, nothing's ever really in focus anyway, so I think I'm still justified.
So here's what people think happens to a point spread function when it's out of
focus. Basically they think you get a Gaussian blur. Right?
And in fact this is what people really, really, really desperately want to have
happen. And this is what the vast majority of the image processing code out
there assumes you're getting in order to do things like contrast detect auto focus.
Well, here's what actually is there. What actually happens is that essentially
when you're out of focus, you don't get a blurry image at all. What you actually
get is you get a larger spot for that point spread function. And that larger spot
happens to have a nice sharp edge, and that nice sharp edge was the highest
sharpness thing that it saw when it wasn't at the real edges of the image, and
there's our false echoing of the distances.
When it was too far out of focus close, too far out of focus far away, it was
hallucinating these sharp edges in areas that really had no edges. Right?
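That effect is easy to reproduce with a toy experiment (my own sketch, not from the talk): blur a single point with a uniform disk, as a real lens does out of focus, and with the Gaussian everyone assumes, then ask where the Sobel response is strongest. The disk puts its strongest, sharpest response on its rim, a full blur radius away from the true point, which is exactly the false echo the depth map picked up.

    import numpy as np
    from scipy import ndimage

    def disk_kernel(radius):
        y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        k = (x * x + y * y <= radius * radius).astype(float)
        return k / k.sum()

    # A single point light source in the middle of the frame.
    img = np.zeros((101, 101))
    img[50, 50] = 1.0

    disk = ndimage.convolve(img, disk_kernel(20))   # realistic out-of-focus PSF
    gauss = ndimage.gaussian_filter(img, sigma=10)  # the assumed Gaussian blur

    def sobel_mag(a):
        return np.hypot(ndimage.sobel(a, axis=1), ndimage.sobel(a, axis=0))

    for name, a in [("disk", disk), ("gaussian", gauss)]:
        s = sobel_mag(a)
        r, c = np.unravel_index(np.argmax(s), s.shape)
        # The disk's peak response sits on its rim (~20 px out) and is far
        # stronger than the Gaussian's gentle, centered gradient.
        print(name, "peak at radius", np.hypot(r - 50.0, c - 50.0), "value", s.max())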
So why does this happen? Well, first of all, this is just basic physics, but let's talk
about how this really gets to be an interesting thing.
Well, when we talk about the point spread function, it's really the response of an
imaging system to a point source which is basically the impulse response.
Normally people like to talk about the resolution of a system as the modulation
transfer function. This is the spatial domain representation of the modulation
transfer function, so you can kind of use this to get back to what the resolution is
and so forth. I'm not really concerned with that, but that's the standard way of
looking at these things.
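For reference, the relationship being glossed over there is just a Fourier transform: the MTF is the normalized magnitude of the transform of the PSF. A two-function sketch, assuming psf is a small 2-D array:

    import numpy as np

    def mtf_from_psf(psf):
        # The optical transfer function is the Fourier transform of the
        # PSF (normalized so the DC term is 1); the MTF is its magnitude.
        otf = np.fft.fft2(psf / psf.sum())
        return np.abs(np.fft.fftshift(otf))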
An image is really the sum of the point spread functions of all points of light in the
scene. Okay. I'm lying. There's all this nasty wave stuff that really happens,
right?
But if we play our cards right and we're careful, we can avoid the nasty wave
stuff, just like I'm a computer engineer, I know that these circuits are really
analog, but I do my best to avoid that. I'm much happier if I can ignore that.
The point spread function grows in proportion to how far out of focus the point is,
for most lenses. And let's take a look at what the point spread functions really look like.
Well, I showed you before this point spread function. This is actually from a
Pentax Takumar 135 millimeter f/2.5 lens, basically a nice, old, telephoto lens.
And it's a very good lens. And in fact this is almost a perfect point spread
function for an out-of-focus point spread function. This is what you would expect.
This is what physics predicts, a nice, evenly shaded disk.
In case you're wondering what these little spots are there, this is a 50-year-old
lens. It has dust in it. So when you have dust in the lens, you actually see those
dust defects basically as interference patterns in this. And in fact, as I'll talk
about in a little bit, this is one of the cool things that you can do. You can actually
identify lenses because you can actually get a very clear picture of what the
defects are in the lens.
The optical formula affects the point spread function. So if you have two lenses
that have different lens formulas even if they're the same focal length, et cetera,
you actually see different point spread functions for them. The out-of-focus point
spread functions are different.
Retro focal lenses, basically if you have an SLR, a single lens reflex, there's this
nasty little mirror that has to be able to swing out of the way. It would be bad if that
slapped into the back element of your lens. Right?
So essentially you've got about a 45 to 50 millimeter gap between the focal plane
and the flange mount on most SLR cameras, so you need to have the focal
length of the lens be larger than that. Otherwise you're probably going to have
some elements extending back into the area where they will be slapped by the
mirror.
All right. The way that that's gotten around dates from back in the 1950s, when
basically it was discovered that, ah-hah, you can build a wide-angle lens
by taking essentially a conventional wide-angle lens and an inverted telephoto to
project the image, and that way you can get this larger back distance.
The down side of that is that retrofocus lenses, retrofocal lenses like that
basically have twice as many pieces of glass in them, and they kind of have,
shall we say, the compounding of two sets of defects. So what you, not
surprisingly, often see is you see a bright edge -- bad stuff, because it looks a lot
less like Gaussian blur than what we'd like -- and you also see a little bit of a
bright center. It's not real obvious in this one, but this is actually slightly brighter
than around the edges.
Recognize this? Classic mirror lens, right? People either love these or hate
them because you get doughnuts all over the place. Well, basically, yeah, if you
take a look at the out-of-focus image of a point light source, what you get on this
500 millimeter f/6.3 mirror lens is the classic doughnut shape. And the reason is
because literally you have light coming in around the edges and then it bounces
off the rear mirror and it bounces off another mirror that is blocking out the center
of the light and, not surprisingly, there's your doughnut. Okay. Obviously easy to
identify whether it came from a mirror lens or not.
It turns out that you'll find most modern lenses may not have so much dust and
so forth in them, but they actually almost always have aspheric elements. And
aspheric elements are very cheap to make now, but they're also kind of not so
good. Basically they're made in ways that leave them slightly imperfect, and
those slight imperfections actually leave a mark on the point spread function. So
we can actually identify the slight aspheric abnormalities when we look at the
out-of-focus point spread function.
So, for example, this is the Sony 18 to 70 millimeter kit lens at 18 millimeters,
f/3.5, wide open. And you can see here a couple of things. Basically you can
see this actually has a fairly nasty-looking distorted character to it. The little
ripples that you see and so forth, classic of the minor imperfections you get in
aspherical elements.
And you'll notice something else is wrong with this. Anybody notice what else is
wrong? It's slightly decentered. This lens wasn't quite assembled right, which is
also very common in modern lenses. Less common when lenses were made out
of solid metal like this thing that I have here, more common now that they're very
light plastic because the auto focus has to be able to move these things cheaply
and easily and you don't want big hunks of metal with tight tolerances to do that,
you want loose plastic that's light mass, et cetera.
So here comes the fun part. Right? Even compact cameras actually have an
out-of-focus point spread function that is significant and you can actually
measure the properties. Here's, for example, an Olympus C5050Z. Nothing
particularly spectacular. Kind of an old camera.
Interesting thing, though, you'll notice it happens to have this dark spot in the
middle. Basically what you're seeing is the point spread function is still there, it
still has all these interesting properties, but you're starting to see some of the
wave-like interference. So basically a little bit more complicated structure there,
but still a structure that we can actually recognize and understand.
All right. Bokeh, the wonderful little generic ethereal term for the out-of-focus
properties of an image. Everyone wants beautiful Bokeh. It's a
Japanese-derived word, and you'll hear people talking about this as though
there's some Oriental mystique about how you have good Bokeh.
Basically forget that. It's really just a matter of what the point spread functions
look like. It's really just a matter of what the out-of-focus point spread function's
structure is.
Good Bokeh basically comes from Gaussian-blur-like point spread functions. If you have an
out-of-focus point spread function that has that nice blur, you get beautiful Bokeh.
If you have something that looks more like what I was showing you for the mirror
lens where you have a very sharp edge and it's got a bright line to it, you get the
worst of all which is what's called Nisen Bokeh or broken Bokeh. Basically
double line artifacts.
So let's take a look at a Bokeh legend. The old SMC Takumar 50 millimeter
f/1.4. This is, again, another one of those 50-year-old lenses but a very nice lens in
terms of the out-of-focus smoothness. And if you take a look at why, you'll notice
the edge is not very bright and the center is relatively bright. This is pretty close to
being a Gaussian blur. So this is one of those reasons why people have been
seeking out this particular lens because it happens to generate this nice pattern.
By the way, you'll notice it's not really symmetric. Anyone know why?
The reason it's not symmetric is because -- it's not a decentering problem or
anything like that. It's because this was actually mounted on a camera. And you
go what? Well, the camera has a mirror box, and it has a relatively shiny sensor
in it, and what you're seeing here is this is actually getting a reflection from the
sensor that's causing it to blur out a little bit that way. So you can actually tell the
orientation of the sensor by this too.
Yeah?
>>: So this has no back coating at all?
>> Hank Dietz: When you're talking about 50-year-old lenses, yeah, they didn't
expect you to have a shiny sensor behind that. And between you and me, it's not
even the back coatings that are the big issue. The big issue is a lot of old lenses
will have very flat surfaces in them, and the very flat surfaces, even if they're
coated, are just a bad piece of news for that. Right?
By the way -- and I feel very much confirmed by this -- that's also why I have never
liked using just plain glass filters in front of lenses.
So I'm always going commando on all my equipment, right? Because I don't
want the extra reflections because a flat piece of glass is going to do all sorts of
evil things.
So the long and the short of it is, yeah, that is definitely generating a pretty
pattern. It looks more like a Gaussian blur, but it still is very easily recognizable.
It still is a very specific pattern.
Remember I was telling you before that the dust and so forth is very clear? Well,
this is a lens that I got on eBay, as most of these, and -- ever bought a lens on
eBay? Yeah? I've bought like 100 lenses on eBay, average maybe 40 years old
each, and, yeah, I've -- I haven't seen too many with fungus, thank God, but
basically here's the fun part.
This looks awful, right? This looks like this is -- this looks like somebody just,
like, ran it through, like, dead meat that was laying on the street or something.
Just terrible fungus infection. The interesting thing about this is that fungus is not
visible with the naked eye unless you actually take a bright light and go looking
for it.
This is a very, very, very, very minor fungal infection in the lens. The reason it's
so obvious is because basically you're looking at the interference pattern for it,
not the actual fungus. So this makes it a really great thing for detecting specific
characters of dust, defects, et cetera, in the lens because they're amplified.
They're very easy to spot.
In fact, I had a bunch of people after -- I actually returned this lens along with a
copy of the point spread function and I got a very embarrassed email back from
the person I bought it from, oh, gosh, I never realized it was that bad. I'm sorry.
Gee, everyone should be required to ship the point spread function with their lens
when they're doing it on eBay. So, at any rate, the long and the short of it is
clearly we can recognize this.
Now, here comes a bit of a surprise. Remember I was saying that optical
formula actually makes this a little bit different? Well, it turns out not only can
you recognize the differences consistently for different optical formulas but you
can also recognize differences between particular copies of the same lens even if
there isn't any dirt in these. These are pretty much brand new Sony 18 to 70s.
There's not really any dust in here. But I think it's fairly easy to distinguish these
two, mostly because one of them has a decentering problem. Actually, they're
both slightly decentered, but I think it's very clear which one is a little bit more
decentered. Right?
So another thing that people talk about, and especially, as I said before, I was
doing a lot of work with some colleagues, especially Larry Hassbrook [phonetic],
who is very well known for structured light 3D capture, things like that. One of the
big things that people always talk about as the advantage for structured light is,
well, you have these ambiguities whenever you're trying to detect
depth by looking at blur and so forth. And it's ambiguous before and after the
focus point.
Well, actually, it's not. It turns out for most lenses, it's not ambiguous at all. The
point spread function, the out-of-focus point spread function, is essentially kind of
turned inside out when you go to the other side of the focus range.
So, for example, if you had blue fringe on the image points that were past the
focus point you're going to have basically yellow-ish fringe when you're on the
other side of the focus point. This allows you to actually disambiguate.
Things in front block the point spread function of things behind. This sounds so
obvious, but I haven't seen anybody else actually making use of this. Maybe I
just haven't found it, but I haven't seen it.
If you have a big, fat circle that's coming from some particular out-of-focus point
and you put some object in front of part of that, what you'll see is the sharp
outline of that object will cut into the point spread function so you can actually tell
that, gee, this thing's in front, right?
And I'll actually show you a little example of that, an image, in just a few
moments.
Now, when things are almost in focus, okay, all bets are off. That's the normal
point spread function stuff that people talk about, and there are all sorts of
issues. Different colors focus at different depths, so that kind of messes it up a
little bit. There's this inversion of the pattern that I was talking about before and
after the focus points. When you're near focus, some things -- some portions of
the pattern invert before others so you get all sorts of weird structural changes.
And there are also these wave properties when you start looking at things that
are very small points. So basically things don't really just add when you look at
the focal plane.
All right. So here's the classic thing that people talk about as the big difference
between before and after the focus point. If I have a lens with spherical
aberration, and most lenses have some amount of spherical aberration, basically
if I over correct it versus undercorrecting it, whether I'm too near or too far from
the focus point, you can see that what I get is I get a very different pattern.
Basically I get these sharp, bright edges if I undercorrect and I'm near, and I get
a nice smooth thing if I overcorrect and I'm near. And in far I've got the opposite
thing.
So essentially what you have is you can actually tell the difference between the
near and far just by looking at, again, how the structure has actually changed.
So spherical aberration is one of the ways that kind of inverses as you go before
or after the focus point. And I will say I got this image from Wikipedia, which we
all know is the ultimate source of knowledge, right? But it's kind of right anyway
except for all the wave interference.
Yes?
>>: If you want to use this to do before and after [inaudible] is this something
that has to be calibrated per camera or is there [inaudible]?
>> Hank Dietz: So the quick answer is what I'm actually advocating is having it
calibrated not just per camera but actually having it calibrated at the point of
recognizing exactly what it is for that lens at that aperture setting at
approximately that focus distance at each point in the entire image, which I know
sounds horrifically complicated, but as I'll tell you later, I use a GA for that.
So it's not a calibration procedure. It's more -- basically the way that I'm
analyzing these things is by trying to create them rather than trying to match
them. All right?
Another thing you have: axial chromatic aberrations. Remember I was talking
before about blue shift becoming yellow? Well, that nice little bird was not sitting
on a tree where I painted some of the branches blue and some yellow. If you take
a look at the blue and the yellow, that's basically the before and after focus point
issue. It's essentially chromatic aberrations. Very common on most fast lenses.
So, again, I think it's fairly easy to unambiguously say whether you were before
or after the focus point because it's just a matter of, well, which color is it, right?
Okay. Cat's eye/swirl vignetting. A lot of people, again, in the world of talking
about Bokeh, this is one of those magical features. Oh, I take this picture and
there's this swirly pattern behind it. It's really impressive. What that really comes
from is basically artificial vignetting. Essentially what's happening is near the
center of the frame I've got a nice circular spread function. Out here, that ain't a
circle anymore, is it? And the reason is it's basically a circle that's had an edge
clipped off because the lens is not infinitely thin. It actually
has a thickness. So essentially you're internally vignetting some of the rays.
You're basically clipping the structure.
So that's the primary thing I have on this slide, but let me just point out --
remember I said before I'd show you another slide? Notice this one? That one's
kind of interesting. Because what you see is you see this nice big blurred spread
function, and look at how it's clipped there. It actually is clipped by the fact that
the edge of this chair rung happens to be in the path, so it's giving me a nice
sharp edge on that even though the chair rung is out of focus too. Kind of an
interesting property, right?
I'm not going to pursue this too much right now, but let me say you can kind of
almost look behind things using this. Just not very much behind them.
Okay. Computational photography using point spread functions. So what did I
do? Well, after playing around with this stuff and convincing myself that, well,
gee, there really is a lot that I don't know about lenses and optics and I'd like to
find out more, I basically went out, as I said, bought basically about 100 lenses
on eBay and essentially went and characterized them. These are the point
spread functions and all that.
And this is what I've basically been able to do with my nice little database of 100
lenses. So one thing is that you can do depth-from-focus or defocus much more
easily. I also can do refocus or converting things to all-in-focus very easily
because, again, I have a pretty good model of point spread function.
I can diagnose lens defects like contamination or fabrication flaws like
decentering, for example. You can forensically identify the type of lens. This is a
little bit of a surprise. If you look in the field of forensics for cameras, they're usually
looking at things like sensor pattern noise, issues having to do with the, oh, for
example, [inaudible] information, JPEG tagging, things like that. I haven't really
seen people looking at the out-of-focus portions to try to recognize structure
there.
You can also identify the specific lens because, as I said, again, dust, dirt, what
have you, or even aspherical anomalies can actually mark the thing.
Point spread function substitution. Basically, in general, if I want to, I can always
change the point spread function to something else. So you want a Gaussian
blur? Hey, no real lens is going to generate a Gaussian blur. I can make it a
Gaussian blur. I recognize this and I go and replace it with my nice blur.
I can also do things with structured apertures and apodization, and that's part of
what I'm going to be talking about today.
But before I get into some of the additional theory that I have to go to to the main
thing that I'm talking about today, let me just take a little break and give you some
low-hanging fruit.
How am I doing time-wise here?
>> Andy Wilson: You're doing fine.
>> Hank Dietz: Okay. Good.
So it turns out Minolta and now Sony are known for this thing called the STF lens,
the Smooth Trans Focus lens. It's a 135 millimeter f/2.8, which is a T4.5. It's
basically not as transmissive as a normal lens. T-stops are basically an F-stop
adjusted by how much light is actually being passed through, what the
transmissivity is.
And the reason that everybody loves this lens, it's absolutely the Bokeh king, is
because when you take a look at the lens, it actually has this very funny little
element in it. If you look at this, this is basically a flat piece of glass. And, in fact,
this piece and this piece are made of exactly the same optical index, et cetera. It
has exactly the same properties.
The difference? This one is kind of smoke-colored. It's basically gray. Since
that is now machined in a very precise way as this spherical curve, what you
really get is essentially there's less of the gray in the middle, there's more here,
you get this beautiful smooth Bokeh. Very cool idea, right?
So Minolta did this. It's still available. I think it's about $1200 to buy this lens.
It's a cool lens. But now here's the fun part.
Just before Minolta went out of the film/camera business they produced this
camera, the Maxxum 7. And basically this was kind of the ultimate achievement for
Minolta. This was their highest-end camera in terms of the sophistication of the
control logic and such in the camera.
And it turns out among the custom modes in this camera, custom mode 25-2 --
yeah, that's not buried somewhere, right -- so custom mode 25-2 -- and, yeah,
you really had to go looking for this to find it. It wasn't something they really
advertised. It turns out what it does is it fakes the smooth trans focus
mechanism by taking multiple exposures varying the aperture.
So essentially, if you think about it, if I have the aperture
relatively small for a relatively long fraction of the exposure, that's going to make
a bright center; then opening up a little adds a little bit of a diffuse larger circle around
that, then a little bit more, and such. So basically by taking something like seven
multiple exposures, they've constructed something that looked kind of like a
Gaussian blur point spread function.
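The synthesized PSF is easy to model: each sub-exposure contributes a uniform disk whose radius follows the aperture and whose weight is that sub-exposure's share of the total light, and summing them gives a stepped approximation to a smooth falloff. A rough sketch of that summation (my own illustration; the radii and weights below are made up, not Minolta's or any CHDK settings):

    import numpy as np

    def disk(radius, size=65):
        y, x = np.mgrid[:size, :size] - size // 2
        d = (x * x + y * y <= radius * radius).astype(float)
        return d / d.sum()

    def stf_psf(radii, weights, size=65):
        # Sum of disk PSFs, one per sub-exposure: the small-radius,
        # heavily weighted disks keep the center bright, the larger,
        # lighter ones add a fading halo, approximating a smooth
        # (apodized) out-of-focus spot.
        w = np.asarray(weights, float)
        w = w / w.sum()
        return sum(wi * disk(r, size) for r, wi in zip(radii, w))

    # e.g. seven exposures, aperture opening up as the light share drops:
    psf = stf_psf(radii=[6, 10, 14, 18, 22, 26, 30],
                  weights=[0.30, 0.22, 0.16, 0.12, 0.09, 0.07, 0.04])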
Actually, it wasn't really so much Gaussian blur, but here's the fun thing. Oh, I
guess I should say one more thing about that. One of the executives at Minolta
was asked at some point exactly why they don't do this in any of their digital
cameras. And he had a very nice little statement that I should kind of frame that
basically said, oh, because it doesn't apply to digital cameras because they don't
do multiple exposures. Good. Since he said that, now I'm free to do it for digital,
right?
So I took the good old CHDK, right, and on an A640, which is actually -- it's not
this one, but, you know, just to show, a little PowerShot like this. Basically stuck
that in, and here's the native point spread function at f/4, and basically by doing a
controlled varying of the aperture during exposure I get this with a Gaussian blur
approximation. And I think we can agree that's kind of miserable, right?
Gaussian blur is too hot in the middle.
And this is basically closer to the spherical pattern that is actually what they did in
the STF lens. So believe it or not, the spherical actually tends to look better than
the Gaussian that everybody thinks it should look like.
All right. Well, that was cool, right? So we've got some low-hanging fruit already.
And keep in mind, you can do that on just about any camera where you can control
the aperture. And it's left to the reader to find out if Minolta is going to sue people
if they do that.
>>: What was the low-hanging fruit there? The fact that you can go [inaudible]?
>> Hank Dietz: The fruit is that I can basically get myself a nice, smooth,
out-of-focus blur pattern with lenses that don't naturally create that by
dynamically varying the aperture while I'm taking my pictures.
>>: Thanks.
>> Hank Dietz: Which is a huge thing.
Okay. So -- well, let's put it another way. It's a huge thing to photographers. I'm
not sure it's a huge thing to everybody, but there's at least a sizable
market that cares about that.
So let's talk about apertures a little bit more. Okay. The aperture of a lens
represents the area through which the light is admitted by the lens, and I'm not
going to get into the admitted -- excuse me, the point at which the lens has its
admission and so forth. We don't need to talk about those details.
For a circular opening, the F number is basically the focal length divided by the
diameter of the circular opening. If it's not quite circular, you really should adjust
this, but people usually don't.
Light is admitted in proportion to 1 over the F number squared. So basically if we
start at 1, .7 is going to give you twice as much light, 1.4 is half as much, 2 is half
as much as that, again, and so forth.
The effective F number for a given transmission is the T number. So you just
basically divide the F number by the square root of the transmissivity, which is a
fraction.
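Spelled out as arithmetic (the ~39% transmission below is just a number I picked so that the f/2.8 STF example above works out to roughly T4.5):

    def relative_light(f_number):
        # Light admitted scales as 1 / N^2: f/1 -> 1.0, f/0.7 -> ~2.0,
        # f/1.4 -> ~0.5, f/2 -> 0.25, and so on.
        return 1.0 / f_number ** 2

    def t_number(f_number, transmittance):
        # T-stop: the F number adjusted for the fraction of light the
        # glass actually passes, T = N / sqrt(transmittance).
        return f_number / transmittance ** 0.5

    print(t_number(2.8, 0.39))   # ~4.48, i.e. roughly the T4.5 quoted above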
So here's an aperture on a nice old lens, 20 blades. You don't see them like this
anymore. Why not? Because if you try to open and close that fast, there's way
too much friction. Can't really do that with wimpy little motors that you're using
now.
On the other hand, when it's a thing you're turning like this, no problem. Right?
Okay. Aperture can be implemented by the lens barrel. That's what you get
when the lens is wide open. Nice circular lens barrel, very easy to make.
Movable blades. An iris. That's what most of us are used to. And it may or may
not be circular, but, you know, it's got the familiar hexagonal or pentagonal or
whatever shapes.
It can also be done with a Waterhouse stop, which is basically just a hole in a
plate. Right?
It could also be done with liquid crystal or other controllable opacity materials.
So you can basically make yourself a variable aperture that way.
The aperture cannot be placed arbitrarily. Basically if you put it in the wrong
place it vignettes rather than becoming an aperture.
The correct placement for an aperture makes it form a stop. Now, I'm not going
to go get into the details of how you figure out where that is. Let me just say,
though, that there's more than one place that is okay in most lenses to have the
aperture appear. It's not just that one spot in the middle.
And in fact, traditionally it wasn't in the middle. It was very often in the front
instead.
Usually there's more than one place that it can go, as I said, depending upon the
optical formula of the lens. In front of the lens is really great because I don't have
to open up the lens. In other words, I can make my own aperture without actually
having to modify the camera or lens in any way. Very nice feature. It means
users can add and remove the modification harmlessly on any camera.
Inside the lens. This is really what you see in most cameras now, and it works
great in a lot of ways, but, frankly, this is really awkward for us to get at. Right?
This is not someplace that you really want to be messing around with.
Behind the lens. A few camera bodies did this so that they didn't have to have
an aperture in every lens, they could just have it in the body. This is kind of
problematic in some ways, but suffice it to say it can work too.
Where you put the aperture does change the optical aberrations that you see on
most lenses, and I'm not going to get into whether it's a really good thing or a bad
thing to place it in different spots. Just suffice it to say at least you can and it will
kind of work in multiple spots.
To be a stop, the aperture effectively must be no larger than any other stop. So if
you have multiple apertures that you're trying to treat as stops, whichever one is
the smallest is basically going to act as the real aperture.
Now, vignetting. Remember when we talked before about the little cat's eye
stuff? There we go. How do you get the cat's eye vignetting? Take a lens, and
if you look through the lens sideways, what you see is no longer a circle, right?
Well, when you're projecting out to the sides, guess what? It's like looking through the
lens sideways. That's why it vignettes. Because essentially it's not a lens with
an infinitely thin material, it's a lens that has thickness and you're bumping into
other portions of that thickness that are blocking the light. All right?
This is a really bad thing. Artificial vignetting for the stuff that I'm talking about
here is a killer. It makes some of the stuff that I'm talking about not work.
On the other hand, for recognizing the actual shapes and the point spread
function in an image, it actually helps on that because it's, again, distinctive. It's
another distinctive feature.
It is annoying, by the way, because that also is a feature that, again, it varies
over all the different points in the image. So you have to actually take a look at
that.
On the other hand, that also means that you can do things like, for example, you
can tell the image has been cropped, right? Because the vignetting will be
different.
Natural vignetting. This is the cosine-fourth fall-off. I really don't care. It's bad. It
causes the corners to be dark. It's a bigger problem with wide-angle lenses,
especially non-inverted-telephoto type things, non-retrofocal lenses, but the
long and the short of it is this is there, you live with it, no big deal.
Mechanical vignetting. This is the one that everyone knows is bad, right? You
decide, gee, wouldn't it be fun if I put this filter on the front of my lens, ah,
let's put another five filters there, and the next thing you know you're doing
something that looks like you're looking through a little keyhole, right? Because
everything is all shadowed on the edges.
Basically mechanical vignetting you want to stay away from too, but it's kind of a
matter of common sense. Most people are not going to accidentally have
mechanical vignetting and leave it there. It's very easy to recognize that. The
artificial vignetting is the killer.
Okay. So remember before I said that the point spread function gets larger as
we get more out of focus? Why is that?
Well, the point spread function echoes the shape of the aperture. It's clipped by
the shape of the aperture. That's a really important point.
So when a point is in focus, all the rays from that point pass through the lens and
end at the same spot on the film or sensor. But when it's not in focus, they
don't really land at the same spot. That's why the point spread function gets
larger.
So out of focus, the rays are going to different positions, and what we're really
doing is we're actually separating the different views through that single lens.
So let's take a look at it here. So here's some object far away that we're focusing
on, and basically if it's in focus, everything's all at one point. If it's not, we're
going to get a bigger image because we're basically separating out the views
from the different points in the lens.
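If you want a number for how big that spot gets, simple thin-lens geometry gives it directly. A sketch, with all distances in millimeters and the usual thin-lens idealization (ignoring pupil magnification and the wave effects mentioned earlier):

    def blur_circle_mm(focal_mm, f_number, focus_mm, subject_mm):
        # Diameter of the out-of-focus spot on the sensor for a point at
        # subject_mm when the lens is focused at focus_mm:
        #   c = A * |s2 - s1| / s2 * f / (s1 - f),  with A = f / N.
        aperture = focal_mm / f_number
        return (aperture * abs(subject_mm - focus_mm) / subject_mm
                * focal_mm / (focus_mm - focal_mm))

    # The 135mm f/2.5 focused at 3 m, point light at 10 m: ~1.8 mm spot.
    print(blur_circle_mm(135, 2.5, 3000, 10000))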
Different points of view is how you do stereo vision. That's how you do
plenoptics. All that information is already there in an ordinary everyday image
that has an out-of-focus region. It's just a matter of can we recover it efficiently.
All right. So the aperture blocks some of the rays. Right? And that's why we
see its shape being imposed on the point spread function.
So if we make that shape easy to recognize, that should help us. Right? And
you'll see that's generally referred to in astronomy as coded apertures, and in the
computational photography field you'll see it usually talked about as that, but it's
also known as structured or shaped apertures. Same concept. It's deliberately
making the aperture have some structure that makes it easy to recognize or
causes some other desirable property. In astronomy it's actually used because if
you shape the aperture properly, you can basically resolve stars that are too
close otherwise.
Okay. So what kinds of aperture shaping has been done in the past? Well,
there's the good old famous Imagon and this Fujinon lens that has this built-in
sink strainer-like thing. These are photos from MF Lenses and from M42.org
because I am not about to buy these lenses. They're way too expensive. But
suffice it to say, what does this really do? Well, these are supposed to be soft-focus
lenses. What they're actually doing is really just superimposing a whole bunch
of images from slightly different points of view.
That's really what we're getting for all of these things in the out-of-focus regions.
And that does actually give a fairly interesting character to it because it gives you
some sharpness structure, but it's not really a sharp sharpness structure. It
basically gives you these very fine artifacts that tend to be more textural rather
than some disturbing bright pattern.
So this is actually somewhat effective, but I have to say personally I think it's a
sort of disappointing effect. I don't think it's really that great a way of doing soft
focus.
The cool thing is, though, by the way, if you do soft focus like this, what happens to
the image in focus? It's still in focus. Right? This only affects the things that are
actually in the out-of-focus region.
Okay. How many of you have seen this kind of thing? There's a whole pile of
instructables talking about this sort of thing. There's all sorts of stuff out there.
And basically what it is is you cut some pretty picture, you know, some shape
into an aperture and basically, voila, your out-of-focus points now take that on
and the in-focus stuff still looks kind of in focus and you can do all these cute little
things that are probably really cool to use for one day and then you never want to
do that again.
Okay. Then there's also the more serious stuff in the computational photography
community where they're doing shaped apertures that look like this, structured
apertures that basically have relatively easy-to-recognize patterns of one kind
or another. And, again, these are images from MIT. They're not from me.
I'll make a public scary statement and see if anyone calls me on it. I think this is
dumb. And the reason I think it's dumb is because they have square corners.
Because the square corners cause it to not behave in that nice, wonderful way of
actually having the light add that basically causes interference patterns. So
when I make mine, I do them with rounded corners.
But almost everything I've seen out there has square corners on everything. So
we'll see if the community out there tells me that I'm an idiot and I'm screwing up
on this.
But, at any rate, the point is you can computationally recognize these. They're
usually doing it in the frequency domain for matching these patterns. It's pretty
straightforward stuff.
Okay. Point spread function substitution. Replacing a bad point spread function
with a good one. Good one being defined as what I want rather than what I got.
Commonly attempted for image refocus, right? And there is that wonderful little
camera that's come out now that is doing plenoptics for refocus. There may or
may not be a market. I guess we'll see soon.
You can improve the image Bokeh by replacing the native PSF with a Gaussian
blur or whatever other function. I would say you probably actually want a
spherical blur instead of a Gaussian.
And you can directly synthesize 3D stereo pairs and enhance their apparent
depth. Sorry about that. More next lecture. I merged slides and I didn't catch
myself doing that.
All right. So why would we bother doing this? Well, soft focus effects, we said.
Cool Bokeh effects? Yeah. Recognizing point spread functions in a captured
image means that we know the depth of the points in the scene. Anything that
you can do with plenoptics you can actually do if you can properly match the
out-of-focus point spread functions, and you get to do it with basically a stock
camera, and you get to do it with higher resolution.
Okay. So how do you recognize these? Well, deconvolution. That's the
standard technique, or at least my impression is that's the standard technique.
Usually in a frequency domain. So you're basically trying to find these patterns,
right?
By the way, that doesn't really work because remember the patterns can be
partially occluded by things that were in front of them and so forth. So there are
all sorts of issues. But at least this gives you a pretty good estimate as a starting
point.
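For reference, the frequency-domain starting point being described looks roughly like Wiener deconvolution: divide the image spectrum by each candidate PSF's spectrum with a small noise term, and see which candidate gives the most plausible result. A sketch, assuming img and psf are same-shaped grayscale arrays with the PSF centered:

    import numpy as np

    def wiener_deconvolve(img, psf, k=1e-2):
        # Frequency-domain deconvolution with a simple noise constant k;
        # a candidate PSF close to the true blur yields the least-ringing
        # reconstruction, which is one way to score candidates.
        otf = np.fft.fft2(np.fft.ifftshift(psf / psf.sum()))
        filt = np.conj(otf) / (np.abs(otf) ** 2 + k)
        return np.real(np.fft.ifft2(np.fft.fft2(img) * filt))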
What I'm actually doing is trying to generate them. So I'm a big
fan of genetic algorithms.
Let me put this a slightly different way. I'm a super-computing weenie. I have
two machine rooms full of super-computing hardware and they're just sitting there
waiting to do whatever I feel like doing. I have no problem with running really
expensive GAs and so forth to just try things out, see how well I can make it
work.
So basically what I've been doing is instead of trying to recognize these
structures, what I've been doing is I've been trying to use a genetic algorithm to
essentially evolve the pattern that, when I go through the transmission, generates
the same image.
So basically I'm doing a search for the parameters for each individual pixel. And
you can see this is really a much more powerful technique because any weird
things that happen spatially over the frame or any other kinds of issues can be
accurately modeled this way, whereas trying to get a closed-form solution thing,
forget it. You're not going to be able to model these things.
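Just to make the shape of that search concrete, here is a bare-bones version of the generate-and-compare loop (a toy evolutionary search over two made-up PSF parameters for a single patch; the real thing is richer, per-pixel, and is what the supercomputers are for):

    import numpy as np

    rng = np.random.default_rng(0)

    def render_psf(radius, rim_gain, size=33):
        # Toy parametric PSF: a uniform disk with an adjustable bright rim.
        y, x = np.mgrid[:size, :size] - size // 2
        r = np.hypot(x, y)
        psf = (r <= radius).astype(float)
        psf += rim_gain * ((r <= radius) & (r >= radius - 1.5))
        return psf / psf.sum()

    def evolve_psf(observed, pop=60, gens=200):
        # observed: a 33x33 patch containing one out-of-focus point.
        # Keep the parameter sets whose rendered PSF best matches the
        # observed patch, then mutate them and repeat.
        genomes = np.column_stack([rng.uniform(2, 15, pop),   # radius
                                   rng.uniform(0, 2, pop)])   # rim gain
        for _ in range(gens):
            errs = np.array([np.sum((render_psf(*g) - observed) ** 2)
                             for g in genomes])
            elite = genomes[np.argsort(errs)[:pop // 4]]
            kids = elite[rng.integers(0, len(elite), pop - len(elite))]
            kids = kids + rng.normal(0.0, [0.3, 0.05], kids.shape)
            genomes = np.vstack([elite, np.clip(kids, [1, 0], [16, 3])])
        return genomes[0]   # best parameters found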
I can still use this, by the way, as a starting point to speed this up. Why did I start
doing it this way? Well, there's a guy named Jim Allenbach [phonetic] back at
Purdue University, and I was faculty at Purdue for 13 years before I went to UK
11 years ago. And basically I was helping him with some diffractive optic design
stuff using direct binary search, and essentially he was doing better diffractive
optic design by doing something very similar to this, essentially trying to reverse
the direction of the design process. And so that's why I thought, hey, this is a
reasonable thing to try. And for the record, yeah, it's a reasonable thing to try,
and right now it's really, really, really, really expensive, but I don't care because
I've got supercomputers.
You can also use spectral color coding on the point spread function. That makes
it easy. I can recognize colors. The only problem with this -- and I thought I was
so clever at coming up with this -- is that it was actually patented in 1973. Don't
you hate it when you find things like that? Of course, 1973 is a good year for it to
have been patented in because that's gone now, right? But basically Songer
[phonetic] had this wonderful little patent on doing single-shot anaglyphs with a
conventional lens that was modified only in that it had a funny looking aperture
stuck in the middle of it.
And the funny looking aperture -- he was very specific about exactly what the
shapes were. This was his preferred shape. Oh, well. Didn't get everything
right. Then these were the other choices.
We got one good shape there, right? If you take a look at these, these don't
really line up very well on top of each other because, remember, if you're trying to
fuse them for stereo vision, these shapes are going to be superimposed.
So when you look at the images generated from something like this, they look
kind of blurry and yucky, and it's not a big surprise, but basically this was used in
the old Vivitar Qdos lens. How many of you have heard of that lens? Yeah.
Very famous lens. It was not in production for very long. They didn't sell a whole
lot of them. And mainly the reason they didn't sell them is because if you take a
look at the images that it generates, they look really kind of sad. They're very
blurry.
But the reason is really because they had a very fancy internal filter like this that was
split in two parts -- actually, it's split in four parts -- and you turn it and it
would come together kind of like an iris to form this dual filter. Very complex
mechanical mechanism. Works very well mechanically, but optically it kind of
mucks it up.
So here's what I do. It turns out there are problems with the color choice and
with all sorts of other issues about this. So I basically went, okay, I'm interested
in this as a digital computing system. I'm not interested in this as just, you know,
some optical thing. I don't care whether I'm generating an image to be viewed as
an anaglyph or not. I'm trying to use anaglyphs as a way to capture information
that I can reprocess, like you reprocess a plenoptic image, for example, right?
So what I'm doing is basically I'm encoding the left and right views by color. And,
in fact, I'm using green and magenta. And it's very important that I'm using green
and magenta. It doesn't work with red and cyan, which is kind of weird, but, you
know -- I'll even give you the little sneak preview.
Think about what the Bayer color filters look like. Very bad choice to have red
singled out. Also think about how JPEG compression works. Really, really bad
idea to try and separate red and blue on different channels. Because JPEG
screws that up.
So conversion from an anaglyph to a full color stereo pair, it turns out, does
not require stereo matching. Let me say that again.
You would expect that if I was trying to take this one image that basically had the
left and right views color coded and try and separate that out into two full color
images, I would have to do stereo matching to figure out what colors each thing
was. Turns out that's not necessary, and in fact, when I tried it, that wasn't even
the technique that worked best.
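The crudest version of the separation is literally a channel split, since the green channel only saw the scene through one half of the stop and red plus blue only through the other. A minimal sketch (which half is left versus right depends on how you orient the stop, and recovering full color per side, the actual trick, is the part not shown here):

    import numpy as np

    def split_green_magenta(img):
        # img: H x W x 3 RGB frame shot through a green/magenta split stop.
        # The green channel is the view through one half of the aperture,
        # red+blue the view through the other, so a rough per-eye
        # separation is just a channel split (grayscale views).
        green_view = img[..., 1].astype(float)
        magenta_view = 0.5 * (img[..., 0].astype(float) + img[..., 2].astype(float))
        return green_view, magenta_view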
Of course, because I can actually put my aperture anywhere so long as I'm
careful about it -- well, I shouldn't say anywhere, but I can put it in any of multiple
places, including in front of the lens, guess what? I get to do this with a nice
user-addable, user-removable stop in front of the lens. So I can do this with just about any
camera.
All right. So what are the big issues? Well, the color choices. I gave you a little
bit of a hint about that. It's not about how it looks when you view it as an
anaglyph. I don't care what it looks like when you view it as an anaglyph other
than it's kind of nice to be able to check it out by just putting on the funny
glasses, right?
And, by the way, I do have a box of the funny glasses that I can pass
around and show you. I have some anaglyph images here.
You want to balance the average value for the pixels. Why? Most cameras have
the kind of bold assumption in them that all the pixels are seeing about the same
amount of light even though they're filtered to basically get different portions of
the spectrum.
So if you screw that up, basically what you end up with is higher noise levels on
certain color channels and it messes things up.
Color isolation. As I said, the Bayer matrix and JPEG encoding both say you do
not want to be using red and cyan. They mess that up horrifically.
So it turns out green and magenta happen to have a nice property that way
because green is basically your luminance channel and it's pretty much
preserved independent of the color information, which is basically the magenta
pair of channels.
You also want the same pixel count per spectral band for the left and right sides.
Guess what? Bayer filter has two greens for every red and blue. So we balance
them out again.
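As a rough illustration of the balancing idea -- a toy sketch only, assuming a defocused calibration shot of a neutral target, not the speaker's actual calibration procedure:

import numpy as np

def channel_balance_gains(flat_frame):
    """Estimate gains from a defocused shot of a neutral target taken through
    the green/magenta stop (hypothetical calibration frame).

    flat_frame: HxWx3 float array in linear units.
    The gains equalize the mean signal of the green side (G) and the magenta
    side (R+B), which is the 'balance the averages' point from the talk.
    The factor of 2 reflects the RGGB Bayer layout having two green sites
    for every red and every blue one.
    """
    mean_r = flat_frame[..., 0].mean()
    mean_g = flat_frame[..., 1].mean()
    mean_b = flat_frame[..., 2].mean()

    green_side = 2.0 * mean_g          # two G sites per RGGB cell
    magenta_side = mean_r + mean_b     # one R + one B site per cell
    target = 0.5 * (green_side + magenta_side)

    return {
        "green_gain": target / green_side,
        "magenta_gain": target / magenta_side,
    }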
Point spread function shaping and location of the stop. Well, locating the stop, I
actually have a little web tool that I've put up for that called anaperture that, given
the parameters for the lens and so forth, designs what the stop is, tells you how
to do that, and basically generates an image that you can then feed to a laser
cutter or to a paper cutter -- which is what I use
because I have no budget -- and it will cut these stops out for
you nice and precisely.
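For illustration only -- this is not the anaperture tool, and the two-hole layout and all the parameters below are assumptions -- here is a sketch of the kind of geometry such a tool has to work out, emitting an SVG outline that a craft or laser cutter could take:

def stop_svg(thread_diameter_mm, focal_length_mm, f_number, separation_mm):
    """Emit a toy SVG outline for a two-hole front stop.

    Each hole diameter comes from focal_length / f_number, and the two holes
    sit separation_mm apart, centered inside a disc sized to drop into the
    filter thread. One hole would be covered with green filter film, the
    other with magenta. All values are things you would pick for your own
    lens; the real tool also accounts for vignetting and point spread shape.
    """
    hole_d = focal_length_mm / f_number
    if separation_mm + hole_d > thread_diameter_mm:
        raise ValueError("holes do not fit inside the filter thread")
    r_outer = thread_diameter_mm / 2.0
    r_hole = hole_d / 2.0
    cx = cy = thread_diameter_mm / 2.0
    dx = separation_mm / 2.0
    return f"""<svg xmlns="http://www.w3.org/2000/svg"
     width="{thread_diameter_mm}mm" height="{thread_diameter_mm}mm"
     viewBox="0 0 {thread_diameter_mm} {thread_diameter_mm}">
  <circle cx="{cx}" cy="{cy}" r="{r_outer}" fill="none" stroke="black" stroke-width="0.2"/>
  <circle cx="{cx - dx}" cy="{cy}" r="{r_hole}" fill="none" stroke="black" stroke-width="0.2"/>
  <circle cx="{cx + dx}" cy="{cy}" r="{r_hole}" fill="none" stroke="black" stroke-width="0.2"/>
</svg>"""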
The shaping -- where was I here? The shaping, of course: if you actually go through
and analyze it, you really want the shape to be the same on the left and
right sides, and essentially a circle is really good.
So circular shapes seem to be the winner here, especially if you
ever want to view it as an anaglyph straight out of the camera, or perhaps
even on the display on the back of the camera.
Vignetting. Artificial, natural, and mechanical. As I said before, the natural and
mechanical ones, natural I don't really care much. Mechanical, you definitely
don't want to ever have problems with that. Artificial actually causes big
problems. Why? Because if part of the view at an angle is clipped, guess what?
That says I don't have the same depth on the angular measurements because
it's basically saying I don't have those views there. All right? It's literally
removing some of the views.
So I only really get good stereo information from the portion of the lens's aperture
that isn't clipped. That's actually part of what's incorporated in figuring out the
shape, sizing, and placement in the anaperture tool.
Depth ambiguity versus the aperture width. Basically, any aperture is going to
contain different views. The wider the
aperture, the more different views I have in there, and the more ambiguity in the depth.
Same problem, right? Because an aperture inside of an aperture is still an
aperture. It still has the same properties.
Depth of focus. People usually like to see things in stereo views with very large
depth of focus. So we have a little bit of an issue there. Depending upon how
we choose the aperture sizes, et cetera, you can have different amounts of depth
of focus. Although, of course, since you're recognizing the patterns anyway, you
can always substitute a different point spread function.
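A sketch of the standard thin-lens geometry behind both points -- this is textbook geometry, not the speaker's model, and the numbers at the bottom are made up: the blur-disc diameter grows with the full aperture width, and the left/right shift between the color-coded views grows the same way with the sub-aperture separation.

def image_distance(f_mm, subject_mm):
    """Thin-lens image distance for a subject at subject_mm (both in mm)."""
    return f_mm * subject_mm / (subject_mm - f_mm)

def defocus_and_disparity(f_mm, aperture_mm, baseline_mm, focus_mm, object_mm):
    """Blur-disc diameter and left/right image shift on the sensor.

    aperture_mm is the full width of the opening, baseline_mm is the
    center-to-center separation of the two color-coded sub-apertures.
    Both scale with the same defocus factor, which is why a wider opening
    buys depth discrimination at the cost of depth of focus.
    """
    v_focus = image_distance(f_mm, focus_mm)    # where the sensor sits
    v_obj = image_distance(f_mm, object_mm)     # where this object focuses
    defocus = abs(v_obj - v_focus) / v_obj
    return aperture_mm * defocus, baseline_mm * defocus

# Example with made-up numbers: 50 mm lens focused at 1.5 m, subject at 3 m,
# 20 mm wide sub-apertures spaced 15 mm apart.
blur, disp = defocus_and_disparity(50.0, 20.0, 15.0, 1500.0, 3000.0)
print(f"blur disc ~{blur:.2f} mm, left/right shift ~{disp:.2f} mm on sensor")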
Bokeh shape issues. Basically, again, if you're viewing it, you want things that
line up naturally.
Okay. So basically here's what I did. As I say, here's the tool. It does look like
this. And to be very physical about it -- did that get shuffled just enough to lose
it?
Well, I will have to dig around a little bit. But I actually have a couple of the
apertures that I cut here. But they look like this. They're basically just a simple
circle with a little tab on it so that you can literally stick that inside of the filter
thread. And I'm just cutting them out of black cardboard. There's nothing really
tricky here.
And what are these? Basically you take a pair of these glasses and you cut out
the film from them and you take a little Scotch Tape and you basically -- and I
know you're going, ugh! Yeah, I know. Low budget, okay? If you want to build a
real filter that does this, by all means, do. I did too, but, you know, I only did, like,
one or two of them that are real filters like that whereas I can do these without
even trying. Very cheap. Costs less than a buck to make these filters.
Okay. So this is really accessible, right? So last year I figured, okay, I haven't
really got a publication record in doing computational photography stuff, and, you
know, as the phrase goes, I'm tenured, so, you know, I'm established in another
field. I've got something like 200 publications in high-performance computing,
that kind of thing. So I figured, okay, well, let's try and get this thing out and see
whether the public thinks this is a big deal, see if there's really interest in this.
So I wrote an instructable telling people how to do 3D anaglyphs as a single shot
this way. And as you can see, it's had about 20,000 readers. And, more
importantly, there are about 7,000 people who have used the design tool to
design the apertures for their lenses. So this tells me that people are actually
trying it.
So at this point, time for me to throw out some of the funny-looking glasses,
right? And for anybody who is conveniently hidden behind a computer monitor
watching this online, they get to do this without looking funny in public. Assuming
they have the glasses, right?
Of course, I have my special one that has the Linux penguin in the middle.
>> Andy Wilson: Do you have one more?
>> Hank Dietz: I have plenty more, yes. Have some more. Because we must
have the entire audience looking funny, right?
Now, I will say I did not really clean up any of these images. These are basically
raw from the camera. And, remember, there is some internal vignetting, some
artificial vignetting, that causes some defects on these. That is kind of a small
image, so, here, here's a big one.
And this was shot, again, with an unmodified camera, just sticking that stupid
filter in front. I haven't even reprocessed this. This is just the raw stuff.
Here's another one. I think you can agree, this is really pretty good depth from a
camera that hasn't been modified in any permanent way, and the total cost of the
modification is well under a dollar.
>>: Which camera did you say it is?
>> Hank Dietz: The majority of the stuff that I've been shooting has been shot
with either a Sony A350, a Sony A55, or a Sony NEX-5.
>>: Are these point-and-shoot class cameras?
>> Hank Dietz: No. So these are -- the stuff that I'm usually using is like this. So
basically a little bit higher-end camera. It does work for point-and-shoots and it
also works for things like camcorders, but the precision with which you need to
make the apertures gets touchy. So for something that's being homemade and
cut on a paper cutter, a craft cutter -- actually, I'm cutting them on a Craft ROBO, if
you've ever heard of that particular model; it's about a $200 paper cutter -- it really
pushes the technology to be doing it at those finer sizes.
Keep in mind also that the stereo separation is going to be relatively small. So if
you have a point-and-shoot -- here, I'll give you another. Here. If you have a
point-and-shoot camera, if it's got a high enough resolution, you're still fine. But
keep in mind your ability to resolve the depths and to separate the views is going
to be dependent on having a lot of pixels there.
So in some sense we do have that similarity to plenoptics, right? If you want to
get a high resolution light field, you really need a lot of pixels that you're kind of
wasting to get the plenoptic field for each point. We're kind of doing the same
thing here except they're not really completely wasted. We still get relatively high
resolution. The catch is that we've kind of lost some of the color information.
>>: When you say a lot of pixels, I mean, 20 megapixels? What is a lot of
pixels?
>> Hank Dietz: Let's put it this way. It ain't gonna work with a web cam. All
right? If you have something that is typical and little, like the PowerShots or
something like that, your typical hundred-dollar camera, or perhaps even some of
the latest versions of cell phone cameras, you should be fine. If you're at 6 or 8
megapixels, you're probably going to be okay. What it really depends on is the
physical size constraints of how large the lens is and all of that, so it gets into a
lot of the raw optics design. And I guess you're more of an optics person. Can
you kind of say something about that or -- yeah, basically what it really boils
down to is it's very specific to the properties of the optics that you have and the
sensor size and all of that, but you do need a relatively high resolution sensor to
be able to get good information when you have relatively small stereo separation.
And relatively high resolution, you know, 8 megapixel is probably more than you
would need for almost anything.
>>: The effect of color matching with respect to the spectral properties would
presumably impact the clarity, but you probably wouldn't have to do much
more than that in order to get all the information out.
>> Hank Dietz: So, again, yeah, I'm doing a really crappy job here, right?
Because I'm having you look at this on an uncalibrated projector. Everything's
uncalibrated here. But I don't care because, again, I'm not really doing this for
people to look at with these funny glasses, I'm doing this for me to reprocess into
full stereo pairs and things like that.
So I think a good way of thinking of it is if you were shooting plenoptics, not only
would you be stuck with the lower resolution kind of no matter what, but you
would also have the problem that basically if you're doing it with a plenoptic, how
do you view that? Do you just ignore all the other view information? It gets kind
of funny.
Whereas, here there happens to be this convenient way that you can get at least
an approximate view. And I will say, by the way, keep in mind this works on live
view. So it may look funny when I'm taking the pictures, but I can actually view it
that way while I'm taking the pictures too.
It doesn't work so well with an electronic viewfinder, because I can't get two eyes into
an electronic viewfinder. But for live view on the back of a camera, it's fine. And
I'll even give a completely unwarranted, unjustifiable plug: I love the Sony
NEX cameras. So I hope Sony recovers and continues building them.
But, at any rate, may the floods recede.
Okay. Let's go through a few more here. These, by the way, are not really great
images, but I kind of threw together stuff. These are actually some of the older
images. These are mostly the ones that I had when I published the instructable a
year ago.
You can see they're all pretty good. They're pretty convincing.
This I actually shot a few days ago. And I see it conveniently had Seattle-type
weather. These are actually in my office.
Am I allowed to show a Linux penguin? Never mind. Okay. I know I can show
that, right? Because that's a UK thing. So just remember, UK, University of
Kentucky! Woo!
All right. So, again, why am I dealing with anaglyphs? It's not because I really
want to be viewing anaglyphs, although, I admit, it is a cool thing to be able to
look out at an audience that has actually all put on the stupid glasses. What
power I have up here, right?
So why am I doing that? I'm doing it because single-shot anaglyph capture is
easy and it gets me stereo information. It gets me more than stereo information.
There are lots of anaglyph images out there.
Ah. Now here comes the fun little thing. Let me say this very specifically.
I don't care how you got your anaglyph. So, for example, if it's already an
existing anaglyph movie, guess what? Now I can reprocess that. I can do the
whole thing on that too. I don't care whether it was shot through two matched
lenses or if it was shot through pieces of one lens, although I will admit, one lens
is usually fairly well matched to itself. Right? Not as well matched as you might
think because you're looking through edge portions of it, but it's still fairly well
matched. Right?
So this is really a potential win because there are a lot of stereo TVs out there
and such, and there aren't really a whole lot of stereo things that they might want
to watch, but there are a fair number of anaglyphs out there.
So I'm trying to use anaglyphs basically like people use light fields. So I can do
refocusing, I can do point spread function substitutions, I can get full color stereo
pairs. That's the sort of thing that I'm really doing here.
And how far along am I? Well, I'm going to show you some kind of old stuff, but,
you know -- let's take the basic one, getting a stereo pair from this.
So remember I was saying you don't have to do stereo matching to get the stereo
pair full color? Well, one thing that people have tried is just doing blurring and
masking. It's cheap. It doesn't work. You get all sorts of horrible artifacts if you
try and blur and mask to restore the colors from one channel to the other.
Stereo matching, basically you have trouble finding matches. Why? Because
unless you're photographing a gray scene, there are going to be marked
differences in the features that you find in the different color channels. So you
really don't have something to align it on that's all that great.
Modified superpixel or shape matching worked okay. So I have about 15
different ways that I tried of doing superpixels modified to follow shapes, and I
also tried just doing literal shape matching directly.
And this has some promise. This actually in some ways works better, but it's
much more computationally complicated than the stuff that I'm going to be
pointing at having to do with colors.
Point spread function matching in theory could work, but, again, really matching
them is a little bit computationally complex right now. So believe it or not, what
actually worked best was this weird color analysis.
And now I point to another thing that's not one of my pictures, but how many of
you have seen this before?
So this is from the cover of Scientific American in 1959 -- November of 1959,
actually, the month I was born -- and this was done by Land. And what it is,
basically, is you see there's a projector there. It's projecting two monochrome
slides, one with white light and one with red light. Now, how many of you see
blue and yellow and green? What's up with that?
Basically this inspired a whole pile of work from Land. He spent about 20 years
trying to explain what the heck was going on with this, did not really fully
succeed. One of the artifacts from this is something called the retinex theory, the
retinex algorithm.
Basically what I'm doing is very akin to what he was doing with this. What I'm
doing is essentially I'm saying, okay, based on the color adjacency properties
that I see in the image, what I'm doing is I'm saying, okay, let me take the side
that has two different colors and try and make a guess at what the third color
should be for each of those points based on nothing but the color analysis.
Then what I do is I take that solution and I plug that in to refine the color estimate
on the other channel. Then I take that solution and I plug that back in to refine
the estimate on the first channel, and I flip it back and forth a bunch of times and
that's basically how I'm getting my colors, which is kind of weird because you go,
where'd they come from? They came out of thin air. Right?
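A toy sketch of that flip-back-and-forth idea -- emphatically not the actual, pre-publication algorithm -- assuming the left view is coded in green and the right view in red plus blue, and using local box-filtered channel ratios as a crude stand-in for the color-adjacency analysis:

import numpy as np
from scipy.ndimage import uniform_filter

def box(x, size=15):
    """Local mean (simple box filter) used as a stand-in for 'color adjacency'."""
    return uniform_filter(x, size=size, mode="nearest")

def refine_views(anaglyph, iterations=5, eps=1e-4):
    """Alternate-and-refine split of a green/magenta anaglyph (illustration only).

    anaglyph: HxWx3 float array in [0, 1]; G carries the left view,
    R and B carry the right view.
    """
    a = anaglyph.astype(np.float64)
    g_left = a[..., 1]                         # known left channel
    r_right, b_right = a[..., 0], a[..., 2]    # known right channels

    # Crude starting guesses: borrow the other view's channels outright.
    r_left, b_left = r_right.copy(), b_right.copy()
    g_right = g_left.copy()

    for _ in range(iterations):
        # Left view: rescale the borrowed magenta channels so their local
        # ratios to green match the current right-view estimate.
        ratio_r = box(r_right) / (box(g_right) + eps)
        ratio_b = box(b_right) / (box(g_right) + eps)
        r_left = np.clip(g_left * ratio_r, 0.0, 1.0)
        b_left = np.clip(g_left * ratio_b, 0.0, 1.0)

        # Right view: plug the refined left estimate back in to update green.
        ratio_g = box(g_left) / (0.5 * box(r_left + b_left) + eps)
        g_right = np.clip(0.5 * (r_right + b_right) * ratio_g, 0.0, 1.0)

    left = np.stack([r_left, g_left, b_left], axis=-1)
    right = np.stack([r_right, g_right, b_right], axis=-1)
    return left, right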
The thing that people keep forgetting is color is not a property of wavelength.
Color is a perceptual thing. Right?
And basically -- by the way, it's even more disturbing because, notice, this is a
color photograph of this thing and it even photographs, right? So it's perceptual,
but it's actually something that you can photograph with an ordinary camera.
I'm not pretending that I've solved in the past couple of years this problem that
Land worked on for 20 years and didn't solve, but at least I was smart enough to
notice, hmm, there's something to this, and so that's why I started doing these
little weird things in color space.
>>: [inaudible]
>> Hank Dietz: At least as far as I'm concerned, yeah. I mean, is it? Because, I
mean, I have never seen anyone who had a really great explanation of it.
Actually, the best explanation of this that I have seen is, oddly enough, at the
website of a woman named Wendy Carlos, who's better known for her music.
And it turns out she got into this. She's a true artist. She does all sorts of things.
And she actually ended up simulating this where what she did was she took two
images -- well, I should actually say one color image, split it up such that she
basically created two gray images from different color bands for it, projected one
with white light and one with red, by basically making line patterns where
one line was gray and the next line was red -- actually, what the heck.
I'll show you. I have it in here, but I wasn't sure whether I should show you. So
I'll show you because it's coming up. It's this slide.
Oh, that's not working. Unfortunately, it's getting scaled.
It doesn't give you the good effect -- well, actually, you can still see it. This still
looks kind of yellow, right? It's not as good because this is being interpolated
and reworked in all sorts of weird ways.
But to make a long story short, if you actually do alternating stripes of red and
monochrome with just two monochrome source images that are from different
color bands, even if they're taken as close as something like two yellows that are
only 10 nanometers apart, you actually see full color, which is really freaky.
And, again, that actually was primarily out of Land's work, but it's kind of
interesting that you can digitally do this.
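A minimal sketch of that digital version of the demo -- the file names and band choices here are arbitrary assumptions:

import numpy as np
from PIL import Image

def land_two_color_stripes(rgb_path, out_path="land_stripes.png"):
    """Digitally mimic the two-projector Land demo described above: take one
    color photo, derive two grayscale 'records' from different spectral bands,
    then interleave rows -- even rows show the long-wave record rendered in
    pure red, odd rows show the other record rendered as neutral gray.
    """
    img = np.asarray(Image.open(rgb_path).convert("RGB"), dtype=np.float64) / 255.0

    long_record = img[..., 0]       # 'red-filter' record
    short_record = img[..., 1]      # 'white-light' record (green band)

    out = np.zeros_like(img)
    out[0::2, :, 0] = long_record[0::2]            # red-lit stripes
    out[1::2, :, :] = short_record[1::2, :, None]  # gray (white-lit) stripes

    Image.fromarray((out * 255).astype(np.uint8)).save(out_path)
    return out_path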
And so the long and the short of it is that colors have much more stable properties
than people think. So it is possible to recover them.
Now, this is not the world's best image. You can see there's a fair amount of
vignetting impacting this. You've got some fringing on the two sides, but it's not
terrible.
Here's another one. It's not particularly great, but it's not terrible.
Well, here are the stereo pairs generated from those. Only by looking at the
color.
And you can see they're not perfect. There's that shading there that was really
the defect from the vignetting. And I can tell you, I've done a lot better than this.
These are kind of old images, but you can see this actually works.
And if you have a better set of anaglyphs to begin with, obviously it works a lot
better.
So the long and the short of it is -- I guess I should say one more thing on that
too.
These are not great, and I'm not giving you the full algorithms or anything like
that, but in January at the Electronic Imaging conference, I'm going to be
presenting some of this. So it's a little bit pre-publication that I'm talking about
this. That's why I'm not giving you too many of the details. But at least I can kind
of whet your appetites and also hopefully find out if I'm doing anything bad before
I do it in front of the audience there.
Right. So basically, in conclusion, there's a lot more information in out-of-focus
image regions than people have been using in general -- much more than people
have been using. And there is low-hanging fruit here: there are things that you
can trivially, easily do.
Dynamic aperture control using point spread function shaping, anaglyph capture
and reprocessing. These are not hard things to do. These are definitely feasible.
The reprocessing, okay, it might be another six months or a year before I have
something that I really like that can run at video frame rates, but, you know what?
I will. Absolutely confident of that now.
More accurate models are needed if I'm going to be doing this point spread
function recognition and substitution and all of these nasty things. And in fact, I
can actually do it, believe it or not, on a completely ordinary image. If you just
give me an image that was taken with an ordinary camera, no filters, nothing, no
a priori information about it, I can actually get all the same information out of it,
but basically when I've tried it even on simple test images, I'm talking about a
week plus -- actually something more on the order of a month on a
supercomputer to process one image.
So not really a useful technology yet, but, you know, give me time. Or, even
better, give me time and funding [laughter].
But in any case -- so this has a lot of promise, but there's still work to be done.
And as I said, I'm really just starting to publish in this field. I'm
known as a parallel supercomputing guy, a compiler guy, and an architecture guy. So
this is a new branch for me. And because of that, I'm, as I say, quite confident
that I don't really know what I'm talking about here. And if any of you have
anything nice to contribute that way, in terms of telling me, you know, that thing you
said on the fifth slide, that's totally wrong -- please, go right ahead and tell me,
because it's better to find that out now, before I've actually published a whole bunch
of papers, than later.
So thank you very much.
[applause]
>> Hank Dietz: Yes?
>>: Your photographs of the point spread functions for the different lenses look
very much like shooting wide open at a diffuse point source. But how did you
actually set those up?
>> Hank Dietz: Okay. Now, setting them up is actually harder than it sounds
because you really need a good point source. And it turns out that -- so, okay,
you can sample a point spread function out of focus past the focus point or
before the focus point. And you'd really like to sample both to make sure that the
inversion is really correct and all of that.
Sampling them before the focus point is very hard because any minor detail in
that point source, any flaw in that point source is going to show up in the point
spread function.
And to be honest with you, I haven't found a really satisfactory way of doing
those yet. What I've been playing with is I've been using some very fine
fiberoptic fibers and using that as my point sources, but I'm very open to
suggestion on that.
>>: [inaudible].
>>: Lasers?
>> Hank Dietz: No, a laser won't have the same properties because you'll get all
the -- yeah. So I think that the fiberoptic cable is not a bad way to do it, but I'm
not happy with it yet.
The ones that I have been using, the ones that I've shown you -- most of those
are really easy, because once you get far enough away, if you have a relatively
clean light source, it's not that hard to make something that's a good
approximation to a point source. So my standard thing that I've been doing is at
a distance of 10 meters with the camera focused basically at something like five
feet, so like a meter, a meter and a half, something like that. I forget what I had
exactly.
Actually, I have a web page that gives the guidelines for how to do that, because I
was trying to get other people to sample lenses that I didn't feel like paying for -- yeah.
At any rate -- did I say low budget? Yeah. Okay.
So basically what I've found is that, believe it or not, if you choose wisely, one of
the little white LED pen lights can actually be a reasonable point source.
>>: So the LED's got enough spectral breadth to it that it doesn't -- the
wave effects are the ones you'd be concerned about, right, to get a clean
spectral source?
>> Hank Dietz: So the wave effects are really not going to be that relevant
so long as I have a relatively large aperture, et cetera. So basically it's not -- it's
not touchy in that way. It's that if there's structure there, that structure gets
magnified in much the same way that the lens defects get magnified.
So basically if you have something that's a relatively clean white LED source -- don't get me wrong, I'm not saying it's a good source; I obviously need a better
source. But that's good enough that I've been able to repeatedly see the same
patterns on different lenses multiple times over periods of months. It's been good
enough that way.
Let me say the --
>>: [inaudible]
>> Hank Dietz: I did that -- well, actually I tried using halogen light sources
through the fiberoptic from a distance, and that wasn't too bad. But the LED pen
lights, believe it or not -- just at 10 meters, the LED pen light, if you choose wisely
which LED pen light, actually works pretty well.
And if you think about it, it makes sense. Because if you think about the physics
of how the LED pen lights are actually making the light, you really do have a pretty
clean point source.
So it works better than you would originally think, but when you think about it, it
really is a microstructure that's quite accurate.
The issue that I actually ran into that I wasn't expecting but I obviously should
have expected this, as I've gotten all these old lenses and I'm changing lenses all
the time, I'm having to clean my sensor like three times a day. And if you've ever
cleaned a sensor, this is not fun.
So basically there's always dust on every sensor, and I'm not doing this in clean
room circumstances, I'm honestly doing it in my basement because that's the
only place I could get it dark enough that had a long enough shot.
And, by the way, just to tell you how bad it is, I'm doing it in my basement and the
target is actually in my shop area where I have woodworking tools and all that.
All right?
And, by the way, I do have a lab that's long enough at school, but it's got
windows the whole length of it. So I can't make it dark. So, at any rate, the long
and the short of it is that the dust has been a problem. And the way that I've
gotten around that is for many of the point spread functions that I've captured and
that I've shown you, I actually go through a procedure where I essentially capture
the point spread function multiple times and then basically rotate the camera to
essentially eliminate the defects from some of the sensor dust. And if you rotate
or move the camera slightly, you can basically compensate.
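A toy sketch of that compensation idea, assuming several PSF captures taken with the camera aimed slightly differently, so the PSF image moves across the sensor while the dust shadows stay put:

import numpy as np

def centroid(img):
    """Intensity-weighted centroid (row, col) of a monochrome PSF capture."""
    total = img.sum()
    rows = np.arange(img.shape[0])[:, None]
    cols = np.arange(img.shape[1])[None, :]
    return (rows * img).sum() / total, (cols * img).sum() / total

def combine_psf_captures(captures):
    """Shift every capture so its PSF centroid lands at the frame center,
    then take a per-pixel median; fixed-position dust shadows no longer
    line up after the shift and get voted out.

    captures: list of HxW float arrays (hypothetical, same exposure).
    Note np.roll wraps around edges, which is fine as long as the PSF is
    framed well inside the frame.
    """
    h, w = captures[0].shape
    aligned = []
    for img in captures:
        r, c = centroid(img)
        dr, dc = int(round(h / 2 - r)), int(round(w / 2 - c))
        aligned.append(np.roll(np.roll(img, dr, axis=0), dc, axis=1))
    return np.median(np.stack(aligned), axis=0)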
So the marks that you've seen in those I'm fairly confident are the marks on the
lens, but it gets convolved with the dust on the sensor and that sort of thing.
That's been more of a problem than the quality of the point light source.
>>: Well, some of the cameras have an automatic dedusting thing, which I
presume is -- is it good for this, or is that probably not worthwhile?
>> Hank Dietz: Some of the cameras have a marketing device.
>>: [inaudible]
>> Hank Dietz: Well, so like the Sonys, what they do is they do ultrasonic
vibration of the glass over the sensor.
And, by the way, I should be precise. We're never touching the sensor. The
glass is on top of the -- but at any rate, so Sony actually does this vibration, and
on the NEX-5 it's supposedly ultrasonic. On some of the others it's actually
they use their image stabilization thing. They just literally shake it violently. And
there are a whole bunch of other techniques, some of them using ultrasonic
pieces of plastic that basically try and projectile vomit the dust away. I don't
know how else to put it.
But from what I've seen, none of these techniques really work. And do they get
the dust off? Yes. But the dust that they get off -- there's another device that is
really high tech that I can show you that gets that dust off really well. It's called
this.
So the dust that comes off with this is the dust that they tend to be really good at
removing. The dust that is stickier or whatever and, you know -- I'm not proud
here, right? So, again, I get no money from any of these people, which is an
unfortunate theme here, but at any rate, this is a lens pen. And I'm using this to
clean my sensor all the time.
And even so, I don't know how many of you have interchangeable lens cameras,
but if you ever want to really be truly disgusted, just take your camera, set it on a
nice slow shutter speed, set it to overexpose a little bit, point it at a blank wall,
and basically take the shortest focal length lens you have, stop it down as far as
it will go, just move the camera violently while you're taking the picture so that
there's total blur, there's nothing there for it to actually interfere with the dust
pattern, and then take a look at the image and you're going to see just -- it's
incredibly scary. There's just tons and tons of dust there.
By the way, you usually don't see it, because when you stop down you're getting
very narrow shafts of light, and that casts nice sharp shadows even of dust that's
sitting some distance above the surface of the actual sensor.
But if you're wide open, the shadows aren't sharp, so you don't really see it. So, yeah,
dust is much more of a problem than I ever anticipated.
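If you want to turn that blank-wall test into an actual dust map, here is a simple flat-fielding sketch; the parameters are guesses, not anything prescribed above:

import numpy as np
from scipy.ndimage import gaussian_filter

def dust_map(test_frame, blur_sigma=50, drop=0.03):
    """Make the dust from the blank-wall test easier to see.

    test_frame: HxW float array from the stopped-down, deliberately blurred
    wall shot (single channel). Dividing by a heavily blurred copy removes
    the smooth illumination falloff, so dust shadows show up as small dips
    below 1.0; anything dipping by more than 'drop' is flagged.
    """
    frame = test_frame.astype(np.float64)
    flat = frame / (gaussian_filter(frame, blur_sigma) + 1e-6)
    return flat < (1.0 - drop)        # boolean map of likely dust shadows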
I'll also tell you too, it really hurts, because in terms of doing things like
macrophotography and such, I've actually found that good old bellows actually
work better than a lot of macro lenses. The catch is bellows are like the ultimate
dust infuser because -- yeah.
So to make a long story short, I obviously need a clean room, don't I? But it's
worked out okay. And as I said, I really go by how repeatable the measurements
are that I take. If the measurements are repeatable over a period of months and
so forth using a similar setup but with minor changes, then I feel like I must have
actually compensated pretty well.
>>: In your paper would you be going into the actual PSF reshaping?
>> Hank Dietz: So --
>>: That's -- I don't understand how you even go about doing that, frankly.
>> Hank Dietz: Oh, it's very simple. So you recognize the structure. Basically
what you're going to do is you're going to take the -- well, I'll tell you directly what
the idea is. My representation, the thing that the GA is trying to come up with, is
basically for each pixel, what the total energy is at that pixel in the scene, you
know, corresponding to that pixel in the scene, in each of the color channels and
basically a positive or a negative diameter for the spread at that point based on
whatever the point spread function for the lens is.
So what I'm really looking at is a red, a green, a blue, and a diameter for each pixel. And the
spreads don't actually have to be circular, because if I know that the pixel is
physically off to the side, and I know what the point spread function is for that lens, I
can recreate that.
So all that I'm really doing is when I have that set of data for the image, when I
have the red, green, blue and basically that spread diameter, that point spread
function diameter, that is sufficient for me to then perform any kind of
transformation I want to change the diameter systematically -- for example,
enlarge them and then re-render with a wider stereo separation with a point
spread function that looks like two dots instead of one or a dot that's offset for the
left side or a dot that's offset for the right side, so two separate images.
So it's basically really all I need -- the gold image for me is really that red, green,
blue, and point spread function diameter with negative before the focal point and
positive after the focal point. And if I know that, I can reconstruct everything.
And, by the way, the reconstruction is more complicated than you might think in
that it is a true reconstruction. So what I'm actually doing is, for example, when
something is in front of something else, it clips the point spread function for the
thing behind. So that actually happens. Basically you solve that by just doing a
rendering order. It's really very simple.
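A naive sketch of that representation and of the rendering-order idea -- a circular disc stands in for the lens's measured point spread function, and everything else here is an assumption, not the actual reconstruction code:

import numpy as np

def rerender(rgb, diameter, scale=1.0):
    """Re-render from a per-pixel (R, G, B, signed PSF diameter) description.

    rgb:      HxWx3 float array of per-point values.
    diameter: HxW float array of signed PSF diameters in pixels (negative in
              front of the focal plane, positive behind it).
    scale:    multiply every diameter, e.g. to simulate a different aperture.

    The signed diameter doubles as a crude depth ordering: splat from the most
    positive (far) to the most negative (near), so nearer points paint over --
    i.e. clip -- the spread of points behind them, as described above.
    """
    h, w, _ = rgb.shape
    out = np.zeros_like(rgb)
    order = np.argsort(-diameter, axis=None)          # far (positive) first
    ys, xs = np.unravel_index(order, diameter.shape)

    for y, x in zip(ys, xs):
        r = max(abs(diameter[y, x]) * scale / 2.0, 0.5)
        y0, y1 = max(0, int(y - r)), min(h, int(y + r) + 1)
        x0, x1 = max(0, int(x - r)), min(w, int(x + r) + 1)
        yy, xx = np.mgrid[y0:y1, x0:x1]
        disc = (yy - y) ** 2 + (xx - x) ** 2 <= r * r
        # Painter's-order overwrite; a real renderer would also spread the
        # point's energy over the disc area and use the measured PSF shape.
        out[y0:y1, x0:x1][disc] = rgb[y, x]
    return out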
>> Andy Wilson: Okay. I think we're just about out of time. So I think Hank will
be around today and again for another talk this afternoon, so he's available if you
want to chat with him.
>> Hank Dietz: Right. And if you want, I guess I can say one more thing. So I
have a research exhibit at the supercomputing conference, which is how I could
swing being here today. Actually, I've had a major research exhibit at the
supercomputing conference -- well, my group
has had its own research exhibit ever since 1994, when we first showed off
a whole bunch of Linux PC clusters. As I say, we were the first people to build
one, eight months before the [inaudible] guys. And I'm the author of the Parallel
Processing HOWTO for the Linux Documentation Project, which, ironically, even though
I've published like 200 technical papers, there are 30 million plus copies of the
parallel processing how-to in circulation. It doesn't count as a peer-reviewed
publication. But it's obviously the biggest impact document I've ever written.
So there's all this weirdness, but, at any rate, yes, I will be at
supercomputing, and we're Booth 202. So I think we're three over from where
Hewlett Packard is.
>> Andy Wilson: Thank you.
>> Hank Dietz: Thank you.
[applause]