>> Richard Szeliski: So it's my pleasure to welcome Zeev Farbman, who is a graduate student at
the Hebrew University in Jerusalem. And he's here at Microsoft as an intern this summer, and
he's also giving a talk at SIGGRAPH next week about some research he did with us last
summer. So he's going to present a dry run of the talk. We're not going to take any questions for
the first 20 minutes while he's speaking. After that we'll have 10 minutes for questions and
debugging comments. Go ahead.
>> Zeev Farbman: Hello everybody. Today I would like to tell you about some recent work that
we have been doing on edge-preserving decomposition for multi-scale tone and detail
manipulation. It's joint work with Raanan Fattal, Dani Lischinski, and Rick Szeliski.
So let me start by reminding what is an edge-preserving decomposition. This is a process where
you take an image and apply an edge-preserving smoothing filter to obtain a piecewise-smooth
version of the image where small details have been removed but the strong edges still remain.
Now, if you subtract the smoothed version from the original, you obtain a residual image containing all those fine details that were removed by the smoothing operation.
These two images are often referred to as the base and detail layers. Such decompositions are used by dozens of recent computational photography techniques, such as tone mapping, high dynamic range image editing, image abstraction, et cetera. Most of these techniques rely on
the bilateral filter to do the edge-preserving smoothing.
In this paper we're advocating an alternative approach based on the weighted least squares optimization
framework. We'll see that the filter based on this framework gives us better control over the
spatial scale of the features that end up in the detail layer. Also, in many applications we would
like to operate on details at a variety of scales rather than at a single scale. And here's an
example for such a scenario. Suppose we have this nice landscape photo and would like to
make it (inaudible) and emphasize the details in it. So by manipulating details at the finest scale, we get this effect, but we can get a very different effect by manipulating details at some other, coarser scale.
In practice, a photographer would most likely combine the results of detail manipulation
at all those scales to get maybe something like this result.
And this type of manipulation is exactly the kind of thing that our multi-scale decompositions are
good for.
Now that we have covered the motivation for our work, we can go a bit deeper into this whole
issue of edge-preserving filtering. And let's start by reminding ourselves what are our
expectations from an ideal edge-preserving operator?
What we need to do is to be able to smooth the signal while preserving the shape of the significant changes in the signal, and then cleanly extract the details at a given scale, which can then be manipulated and recombined with the base to yield the result.
When performing the smoothing, it is important to avoid blurring of the strong edges since this
introduces ringing into the detail layer and may result in halos once the base and details are
recombined.
Trying to use a piecewise-constant segmentation to extract the detail layer is also not such a good idea, as it will oversharpen the edges and may introduce gradient reversal artifacts near the strong edges.
There's an extensive amount of previous work in the field of edge-preserving filters, so we will address it only briefly by mentioning the methods that were, or still are, used for computational photography tasks that require base-detail separation.
And one of the earlier edge-preserving filters was anisotropic diffusion. Originally, it was not devised as a base-detail separation method but rather as a technique to facilitate early vision algorithms such as edge detection and image segmentation. In that context oversharpening may actually be considered a desirable feature, but it makes the method less suitable for detail extraction.
Also, its iterative nature results in somewhat slower (inaudible), and for this reason anisotropic diffusion has not been used much in computational photography. And currently
the bilateral filter is the most common edge-preserving filter in computer graphics.
Okay. So let's remind ourselves how it works. Applying the bilateral filter on image I at point P
basically amounts to computing a weighted average of the surrounding pixels Q. The actual weight of each surrounding pixel Q is determined by both the spatial distance of pixel Q from P and the difference in intensity. Because of this, the image is filtered with a spatially varying
kernel which is able to preserve strong edges.
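In symbols (the notation here is mine, not taken from the slides), the filter output at pixel P is the normalized weighted average

    \[
    BF[I]_p = \frac{1}{W_p} \sum_{q} G_{\sigma_s}\!\left(\lVert p - q \rVert\right)\, G_{\sigma_r}\!\left(\lvert I_p - I_q \rvert\right) I_q,
    \qquad
    W_p = \sum_{q} G_{\sigma_s}\!\left(\lVert p - q \rVert\right)\, G_{\sigma_r}\!\left(\lvert I_p - I_q \rvert\right),
    \]

where G_sigma denotes a Gaussian with standard deviation sigma.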
The shape of the kernel is controlled by two parameters: the spatial falloff parameter sigma S and the range falloff parameter sigma R. This works reasonably well as long as the smoothing is fairly mild. For example, this is what we get with a fairly small set of parameters. You can see that the texture has become smoother but not the salient edges, which is good.
Now if we try to remove larger scale details, for example, if we try to get rid of the texture on the
statues, we'll find it's quite difficult to do. As you can see in the sequence here, where we increase only the spatial support of the filter, sigma S, some small-scale details simply refuse to go away and some things even reappear, especially near the strong edges. This may appear counterintuitive at first, but the reason is that as we increase sigma S, each pixel finds more pixels with similar values, and those pixels dominate the weighted average.
So in order to achieve a more aggressive smoothing, it is necessary to increase the range
parameter sigma R as well, but this automatically causes even some of the stronger edges to
become blurry and yet some small scale details still remain unfiltered.
And, of course, in the limit, increasing sigma R makes the bilateral filter behave increasingly more
like a linear filter. So to summarize, in order to produce progressively coarser images with a
bilateral filter, we need to increase sigma R which tends to blur some of the edges that we
probably would like to preserve.
And as we already mentioned, blurring introduces halos. So, for example, suppose we want to
boost the details in this image. If we extract the medium-scale details with the bilateral filter and try to boost them, we get an image with halo artifacts. And this is something that photographers really don't like to see in their images, at least without having control over it.
So one possible way to overcome the problems with the bilateral filter is to build upon the fact that
it's quite effective when the kernels are small and conservative. And instead of trying to do all the smoothing in one iteration, we can apply it in an iterative manner, at each step applying a small kernel to the result of the previous iteration.
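As an illustrative sketch only (this is not the authors' code; it uses OpenCV's bilateralFilter, and the parameter values are arbitrary examples), the iterated scheme looks roughly like this:

    import cv2
    import numpy as np

    def iterated_bilateral(img, iterations=5, diameter=5, sigma_color=0.1, sigma_space=3.0):
        # img: float32 image with values in [0, 1]; each pass applies a small,
        # conservative bilateral kernel to the result of the previous pass.
        out = img.astype(np.float32)
        for _ in range(iterations):
            out = cv2.bilateralFilter(out, diameter, sigma_color, sigma_space)
        return out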
So this was the main idea in Fattal et al.'s work from last SIGGRAPH. And let's just look at the results of this method. It produces a more effective image coarsening. But notice that some small details remain even in the coarsest levels. And it also oversharpens some of the edges, which, as already mentioned, can cause thin gradient reversal artifacts near the edges.
Another alternative is the trilateral filter. It is governed by a single parameter, sigma C, and let's just see what happens when we change this parameter. It manages to filter out most of the fine
scale detail but introduces a variety of strong artifacts near the edges.
Finally, after this very long prologue, we get to the approach that we advocate, which is the use of the weighted least squares optimization framework to perform edge-preserving smoothing.
Informally speaking, what we do is the following: Given an input image G, we seek a new image
U, which, on one hand, is as close as possible to G, especially near the strong edges and at the
same time is as smooth as possible elsewhere.
Let's try to make it a little bit more formal. For simplicity, let's switch to one dimension for now. First, the similarity between the original signal G and the result U can be obtained by minimizing the following expression: the square of U minus G at each point P. Clearly the result here is simply U
equals G. And in order to get edge-preserving smoothing we need to add the smoothness
requirement as well.
A simple smoothness requirement would be to minimize the squares of the first derivative of U. So let's revise our problem accordingly. This gives us a much smoother result but not a very edge-preserving one, since the smoothness requirement applies everywhere.
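Written out (again, the notation is my own), the combined objective at this stage is

    \[
    \min_{u} \sum_{p} \left[ (u_p - g_p)^2 + \left( \frac{\partial u}{\partial x} \right)_p^{2} \right],
    \]

which smooths everywhere, including across edges.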
So in order to make it work, we should add the last part of our requirement, of our intuition, and it
is the fact we do not need U to be smooth everywhere. Across significant edges in G, we can
relax this requirement. Our way to convey this intuition is to add a per-pixel weight map A, which should have a high weight away from the edges and a low weight across the edges.
The map A could take many different forms. Probably one of the simplest is to make it equal to the inverse of the derivative of G raised to some power alpha. The epsilon simply takes care of division by zero.
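In symbols (my notation for the 1-D case just described):

    \[
    a_p = \left( \left| \frac{\partial g}{\partial x}(p) \right|^{\alpha} + \varepsilon \right)^{-1}.
    \]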
So now in places where the derivative of G is large, we'll get a low weight on the smoothness
term. And U is forced to be more similar to G at this point. Okay. So that's our function. Now by
minimizing it, we hope to achieve an edge-preserving smoothing.
Okay. Now let's switch back from 1-D signal to 2-D images. Things stay pretty much the same
except that, instead of minimizing the derivative of U, we would like to minimize the partial
derivatives in both the X and Y directions at each pixel P. We also would like to add a factor lambda that balances the smoothness requirement against the similarity requirement. By
increasing the lambda, we can make the result progressively smoother.
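Putting it all together, the two-dimensional objective (in my notation, following the 1-D version above) is

    \[
    \sum_{p} \left( (u_p - g_p)^2
      + \lambda \left[ a_{x,p}(g)\left( \frac{\partial u}{\partial x} \right)_p^{2}
                     + a_{y,p}(g)\left( \frac{\partial u}{\partial y} \right)_p^{2} \right] \right),
    \]

where a_{x,p}(g) and a_{y,p}(g) are defined from the partial derivatives of G as in the 1-D case.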
The exact connection between the parameter lambda and the degree of smoothing can be hard
to analyze because we can't immediately apply linear filter theory, since our operator is spatially variant. But using the fact that away from the strong edges our operator very much resembles a spatially invariant operator, we can work out some interesting connections, and more details on this can be found in the paper.
So in order to find U, we need to minimize this expression. And since this is a quadratic form in
U, we can compute U by solving this linear system, where L of A is an inhomogeneous Laplacian matrix whose coefficients depend on the input image. In other words, we have a closed-form
solution for U.
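In matrix form the minimizer satisfies (I + lambda L_A) u = g. As a concrete illustration only, here is a minimal Python/SciPy sketch of this construction and solve (my own re-implementation, not the authors' code; the default parameter values are just examples):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def wls_smooth(g, lam=1.0, alpha=1.2, eps=1e-4):
        # g: 2-D grayscale image (log-luminance also works), values roughly in [0, 1].
        h, w = g.shape
        n = h * w

        # Smoothness weights: inverse of the absolute derivative raised to alpha;
        # eps avoids division by zero.
        ax = np.zeros_like(g, dtype=float)
        ay = np.zeros_like(g, dtype=float)
        ax[:, :-1] = 1.0 / (np.abs(np.diff(g, axis=1)) ** alpha + eps)
        ay[:-1, :] = 1.0 / (np.abs(np.diff(g, axis=0)) ** alpha + eps)
        ax, ay = ax.ravel(), ay.ravel()

        # Inhomogeneous Laplacian L_A as a 5-point sparse matrix: off-diagonals
        # carry -a for each neighboring pair, the diagonal carries their sum.
        diag = ax + ay + np.roll(ax, 1) + np.roll(ay, w)
        L = sp.diags([diag, -ax[:-1], -ax[:-1], -ay[:-w], -ay[:-w]],
                     [0, 1, -1, w, -w], shape=(n, n), format="csc")

        # Closed-form solution of (I + lambda * L_A) u = g.
        u = spsolve(sp.identity(n, format="csc") + lam * L, g.ravel())
        return u.reshape(h, w)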
That pretty much sums up the exposition of the edge-preserving operator. Now, the decomposition is constructed in a way that is very similar to that of the regular Laplacian pyramid. We can either apply our operator to the original image multiple times with a different lambda parameter each time, or apply the smoothing operator iteratively.
The latter scheme is useful when we are interested in aggressive smoothing and detail removal and are not too concerned about oversharpening. Regardless of how we computed the coarsened versions of the image, the detail layers are simply defined as the differences or ratios between two subsequent levels.
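Sketched in code, building on the wls_smooth sketch above (the lambda values here are only illustrative, and the difference-based variant is shown):

    def wls_decompose(g, lambdas=(0.125, 0.5, 2.0)):
        # Progressively coarser versions of the original image, obtained by
        # increasing lambda; details are differences between subsequent levels.
        levels = [g]
        for lam in lambdas:
            levels.append(wls_smooth(g, lam=lam))
        details = [levels[i] - levels[i + 1] for i in range(len(levels) - 1)]
        base = levels[-1]
        return base, details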
And this is how the final multi-scale decomposition may look. Of course, the exact settings are very application-dependent. Now that we've got the decomposition, let's see what kind of stuff we can do with it.
So we implemented a number of simple tools that use our decomposition for things like contrast manipulation, high dynamic range tone mapping, detail enhancement, and image abstraction. The purpose of these tools is to demonstrate that one can easily improve the results of existing applications by simply switching to weighted least squares-based filtering.
Okay. Let's start with multi-scale contrast manipulation. And here's our simple tool for manipulating local contrast at different scales based on our multi-scale decomposition. Once the decomposition has been computed, the different layers can be manipulated at interactive rates.
Note that the manipulation range is very wide; it takes an extreme manipulation to cause artifacts to appear. Manipulating the medium detail layer changes the appearance of larger features in the image, while coarse detail manipulation is closer to a global contrast adjustment. And, of course, we can combine adjustments of all detail layers into a visually pleasing final result.
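As a rough illustration of such a recombination (assuming numpy and the decomposition sketch above; the per-layer gains here are arbitrary, and the actual tool described in the paper uses a sigmoid-shaped boosting curve rather than plain linear gains):

    def boost_details(base, details, gains=(2.0, 1.5, 1.2)):
        # Recombine the base with per-layer boosted (or attenuated) details.
        out = base.copy()
        for d, gain in zip(details, gains):
            out = out + gain * d
        return np.clip(out, 0.0, 1.0)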
Those of you familiar with Adobe Photoshop or other similar tools know you can achieve a local contrast adjustment with unsharp masking, which is essentially detail boosting where a regular low-pass filter is used for the base-detail separation.
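For reference, classic unsharp masking can be written (my notation) as

    \[
    I_{\text{sharp}} = I + \gamma \left( I - G_{\sigma} * I \right),
    \]

where G_sigma * I is the Gaussian-blurred, low-pass version of the image and gamma controls the amount of boosting.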
So it's interesting to compare between these two and on this slide you can see the result of the
unsharp masking. And here's the result of weighted least squares-based contrast enhancement. You can probably see that away from the strong edges things look pretty much similar, but in the vicinity of the strong edges, the unsharp masking is prone to halos, which we can avoid with the weighted least squares smoothing.
And another application is detail combination from the multi-light image collections paper by Fattal et al. The idea is to combine details from a collection of images which have been taken under different lighting conditions in order to create a single image rich with details.
Upon close inspection of the original results, we can find thin gradient reversal artifacts that can be attributed to oversharpening by the filter. Those can be avoided by simply switching to weighted least squares filtering. Another thing: while the original results were produced by combining the detail layers of a number of images, here we demonstrate that in some cases we can generate a richly detailed result even from a single input image rather than three multi-light images as in Fattal et al.'s work.
Weighted least squares smoothing also makes it easy to perform a detail-preserving compression of high dynamic range images. For example, we can simply replace the bilateral filter in the tone mapping algorithm by Durand and Dorsey with the weighted least squares-based smoothing and avoid the mild halo artifacts that are sometimes visible in their results.
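A rough sketch of that pipeline with the WLS smoother swapped in (this is my paraphrase of the Durand and Dorsey approach, not the authors' code; it assumes the wls_smooth sketch above, and the target contrast and scaling details are simplified):

    def tonemap_wls(hdr_rgb, lam=1.0, target_contrast=5.0):
        # Work on log luminance; keep the detail layer, compress only the base.
        lum = hdr_rgb @ np.array([0.2126, 0.7152, 0.0722])
        log_lum = np.log10(lum + 1e-6)
        base = wls_smooth(log_lum, lam=lam)       # edge-preserving base layer
        detail = log_lum - base                   # detail layer, left untouched
        scale = np.log10(target_contrast) / (base.max() - base.min())
        out_lum = 10.0 ** (base * scale + detail)
        out = hdr_rgb * (out_lum / (lum + 1e-6))[..., None]   # restore color ratios
        return np.clip(out / out.max(), 0.0, 1.0)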
So in their original result, you may spot some halos near the picture frames and the light fixture.
Those pretty much disappear when you simply switch the underlying filtering from the bilateral filter to weighted least squares. Let's see it again.
Here's the bilateral filter smoothing, and here's the weighted least squares smoothing. Another option we experimented with was to take the tone mapping algorithm proposed by Tumblin and Turk and replace its LCIS-based multi-scale decomposition with our weighted least squares decomposition.
Here, the goal was to achieve a rather flat image with exaggerated local contrasts. Again, upon close inspection you can see that with the weighted least squares you have fewer artifacts.
And here we went after a more photographic look by maintaining stronger contrasts at coarser scales. Of course, we can also use the weighted least squares framework for image abstraction, where the details are suppressed rather than enhanced. Doing this at different scales produces different degrees of abstraction, which may be combined together in a spatially varying manner to provide more detail in areas of interest.
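One way such a spatially varying combination could look in code (an illustrative sketch only, assuming the wls_smooth sketch above; mask is a hypothetical user-supplied map in [0, 1] marking the areas of interest, and the lambda values are arbitrary):

    def spatially_varying_abstraction(img, mask, lam_fine=0.5, lam_coarse=4.0):
        # Keep a mildly abstracted version in areas of interest and a strongly
        # abstracted one elsewhere, blended by the mask.
        fine = wls_smooth(img, lam=lam_fine)
        coarse = wls_smooth(img, lam=lam_coarse)
        return mask * fine + (1.0 - mask) * coarse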
And here's another example demonstrating progressive image abstraction with the weighted least squares framework.
So to sum up, the multi-scale edge-preserving decomposition based on the weighted least squares framework escapes some of the drawbacks of the bilateral filter and other approaches: it allows small features to gracefully fade in magnitude without introducing significant blurring. In future work we'd like to investigate more sophisticated schemes for setting the smoothness coefficients in the weighted least squares formulation, in order to further improve the ability to extract details while preserving the edges.
And another important issue that must be tackled is better handling of color. Our tone management tool currently uses the CIELAB color space, and we have noticed that after strong manipulations the perceived color can be quite different.
Okay. So that's all and thank you.
>> Richard Szeliski: Thank you.
(Applause)