>> Richard Szeliski: So it's my pleasure to welcome Zeev Farbman, who is a graduate student at the Hebrew University in Jerusalem. He's here at Microsoft as an intern this summer, and he's also giving a talk at SIGGRAPH next week about some research he did with us last summer. So he's going to present a dry run of the talk. We're not going to take any questions for the first 20 minutes while he's speaking. After that we'll have 10 minutes for questions and debugging comments. Go ahead.
>> Zeev Farbman: Hello everybody. Today I would like to tell you about some recent work that we have been doing on edge-preserving decompositions for multi-scale tone and detail manipulation. It's joint work with Raanan Fattal, Dani Lischinski and Rick Szeliski. So let me start by reminding you what an edge-preserving decomposition is. This is a process where you take an image and apply an edge-preserving smoothing filter to obtain a piecewise smooth version of the image, where small details have been removed but the strong edges still remain. Now, if you subtract the smooth version from the original, you get a residual image containing all those fine details that have been removed by the smoothing operation. These two images are often referred to as the base and detail layers. Such decompositions are used by dozens of recent computational photography techniques, such as tone management, high dynamic range image editing, image abstraction, et cetera. Most of these techniques rely on the bilateral filter to do the edge-preserving smoothing. In this paper we're advocating an alternative approach based on the weighted least squares optimization framework. We'll see that the filter based on this framework gives us better control over the spatial scale of the features that end up in the detail layer. Also, in many applications we would like to operate on details at a variety of scales rather than just at a single scale. And here's an example of such a scenario.
Suppose we have this nice landscape photo and would like to make it (inaudible) and emphasize the details in it. By manipulating details at the finest scale, we get this effect, but we can get a very different effect by manipulating details at some other scale, from medium through coarse. In practice, a photographer would most likely combine the results of detail manipulation at all those scales to get maybe something like this result. And this type of manipulation is exactly the kind of thing that our multi-scale decompositions are good for. Now that we have covered the motivation for our work, we can go a bit deeper into this whole issue of edge-preserving filtering. Let's start by reminding ourselves what our expectations are from an ideal edge-preserving operator. What we need is to be able to smooth the signal while preserving the shape of the significant changes in the signal, and then cleanly extract the details at a given scale, which can then be manipulated and recombined with the base to yield the result. When performing the smoothing, it is important to avoid blurring the strong edges, since this introduces ringing into the detail layer and may result in halos once the base and details are recombined. Trying to use a piecewise constant segmentation in order to extract the detail layer is also not such a good idea, as it will oversharpen the edges and may cause gradient reversal artifacts near the strong edges. There's an extensive amount of previous work in the field of edge-preserving filters, and we will address it really briefly by mentioning the methods that were or still are used for computational photography tasks that require base-detail separation. One of the earlier edge-preserving filters was anisotropic diffusion. Originally, it was not devised as a base-detail separation method but more as a technique to facilitate early vision algorithms such as edge detection and image segmentation.
In that context oversharpening may actually be considered a desirable feature, but it makes the method less suitable for detail extraction. Also, it's an iterative scheme, which results in somewhat slower (inaudible), and for this reason anisotropic diffusion has not been used much in computational photography. Currently the bilateral filter is the most common edge-preserving filter in computer graphics. Okay. So let's remind ourselves how it works. Applying the bilateral filter to image I at point P basically amounts to computing a weighted average of the surrounding pixels Q. The actual weight of each surrounding pixel Q is determined by both the spatial distance of pixel Q from P and the difference in their intensities. Because of this, the image is filtered with a spatially varying kernel which is able to preserve strong edges. The shape of the kernel is controlled by two parameters: the spatial falloff parameter sigma S and the range falloff parameter sigma R. This works reasonably well as long as the smoothing is fairly mild. For example, this is what we get with a fairly small set of parameters. You can see that the texture has become smoother but not the salient edges, which is good. Now if we try to remove larger scale details, for example, if we try to get rid of the texture on the statues, we'll find it's quite difficult to do. As you can see in the sequence here, if we increase only the spatial support of the filter, sigma S, some small scale details simply refuse to go away and some things even reappear, especially near the strong edges. This may appear counterintuitive at first, but the reason is that as we increase sigma S, each pixel manages to find more pixels with close values, and those pixels dominate the weighted average.
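The weighted average just described can be sketched on a 1-D signal. This is only an illustrative sketch under stated assumptions: the function name `bilateral_filter_1d`, the Gaussian form of both falloffs, and the window radius of three `sigma_s` are choices made here, not details given in the talk.

```python
import math


def bilateral_filter_1d(signal, sigma_s=2.0, sigma_r=0.1, radius=None):
    """Bilateral filter on a 1-D list of floats (hypothetical helper).

    Each output sample is a weighted average of its neighbours q, where the
    weight falls off with spatial distance |p - q| (sigma_s) and with the
    intensity difference |I_p - I_q| (sigma_r).
    """
    if radius is None:
        radius = int(math.ceil(3 * sigma_s))  # assumed truncation of the kernel
    n = len(signal)
    out = []
    for p in range(n):
        num = 0.0
        den = 0.0
        for q in range(max(0, p - radius), min(n, p + radius + 1)):
            w_space = math.exp(-((p - q) ** 2) / (2.0 * sigma_s ** 2))
            w_range = math.exp(-((signal[p] - signal[q]) ** 2) / (2.0 * sigma_r ** 2))
            w = w_space * w_range
            num += w * signal[q]
            den += w
        out.append(num / den)  # den > 0 because q == p always has weight 1
    return out


# A noisy step: small texture on both sides of one strong edge.
noisy_step = [0.0, 0.02, -0.02, 0.01, 0.0, 1.0, 1.02, 0.98, 1.01, 1.0]
smoothed = bilateral_filter_1d(noisy_step, sigma_s=2.0, sigma_r=0.1)
```

With a small `sigma_r`, samples across the step get a range weight close to zero, so the step survives while the small texture on each side is averaged out. Raising `sigma_r` lets samples from the other side of the edge leak into the average, which is the blurring discussed next.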
So in order to achieve more aggressive smoothing, it is necessary to increase the range parameter sigma R as well, but this automatically causes even some of the stronger edges to become blurry, and yet some small scale details still remain unfiltered. And, of course, in the limit, increasing sigma R makes the bilateral filter behave more and more like a linear filter. So to summarize, in order to produce progressively coarser images with a bilateral filter, we need to increase sigma R, which tends to blur some of the edges that we would probably like to preserve. And as we already mentioned, blurring introduces halos. So, for example, suppose we want to boost the details in this image. If we extract the medium scale details with the bilateral filter and try to boost them, we get an image with halo artifacts. And this is something that photographers really don't like to see in their images, at least not without having control over it. So one possible way to overcome the problems with the bilateral filter is to build upon the fact that it's quite effective when the kernels are small and conservative. Instead of trying to do all the smoothing in one iteration, we can try to apply it in an iterative manner, at each step applying a small kernel to the result of the previous iteration. This was the main idea in Fattal et al.'s work from the last SIGGRAPH. Let's just look at the results of this method. It produces more effective image coarsening. But notice that some small details remain even at the coarsest levels. It also oversharpens some of the edges, which, as already mentioned, can cause thin gradient reversal artifacts near the edges. Another alternative is the trilateral filter. It is governed by a single parameter, sigma C, and let's just see what happens when we change this parameter. It manages to filter out most of the fine scale detail but introduces a variety of strong artifacts near the edges.
Finally, after this very long prologue, we get to the approach that we advocate, which is the use of the weighted least squares optimization framework to perform edge-preserving smoothing. Informally speaking, what we do is the following: given an input image G, we seek a new image U which, on one hand, is as close as possible to G, especially near the strong edges, and at the same time is as smooth as possible elsewhere. Let's try to make it a little bit more formal. For simplicity, let's switch to one dimension for now. First, the similarity between the original signal G and the result U can be obtained by minimizing the following expression: the square of U minus G, summed over all points P. Clearly the result here is simply U equals G. In order to get edge-preserving smoothing we need to add the smoothness requirement as well. A simple smoothness requirement would be to minimize the squares of the first derivative of U. So let's revise our problem accordingly. This gives us a much smoother result but not a very edge-preserving one, since the smoothness requirement applies everywhere. So in order to make it work, we should add the last part of our intuition, which is the fact that we do not need U to be smooth everywhere. Across significant edges in G, we can relax this requirement. Our way to convey this intuition is to add a per-pixel weight map A, which should have a high weight away from the edges and a low weight across the edges. The map A could take many different forms. Probably one of the simplest is to make it equal to the inverse of the derivative of G raised to some power alpha. Epsilon simply takes care of division by zero. So now, in places where the derivative of G is large, we'll get a low weight on the smoothness term, and U is forced to be more similar to G at those points. Okay. So that's our functional. Now by minimizing it, we hope to achieve edge-preserving smoothing. Okay. Now let's switch back from 1-D signals to 2-D images.
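The 1-D formulation above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `wls_smooth_1d`, the use of forward differences, the default values of `alpha` and `eps`, and the tridiagonal (Thomas) solver are all choices made here for illustration. The balancing factor `lam` anticipates the lambda introduced for the 2-D case.

```python
def wls_smooth_1d(g, lam=1.0, alpha=1.2, eps=1e-4):
    """Minimise sum_p (u_p - g_p)^2 + lam * sum_p a_p * (u_{p+1} - u_p)^2,
    with a_p = 1 / (|g_{p+1} - g_p|**alpha + eps)  (hypothetical helper).

    Setting the gradient to zero gives the tridiagonal linear system
    (I + lam * D^T A D) u = g, solved here with the Thomas algorithm.
    """
    n = len(g)
    if n < 2:
        return list(g)

    # Smoothness weights: large in flat regions, small across strong edges.
    a = [1.0 / (abs(g[i + 1] - g[i]) ** alpha + eps) for i in range(n - 1)]

    # Assemble the symmetric tridiagonal matrix.
    diag = [1.0] * n
    off = [0.0] * (n - 1)  # entries (i, i+1) and (i+1, i)
    for i in range(n - 1):
        w = lam * a[i]
        diag[i] += w
        diag[i + 1] += w
        off[i] = -w

    # Thomas algorithm: forward elimination ...
    c = [0.0] * (n - 1)
    d = [0.0] * n
    c[0] = off[0] / diag[0]
    d[0] = g[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (g[i] - off[i - 1] * d[i - 1]) / denom

    # ... and back substitution.
    u = [0.0] * n
    u[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


# Small texture on both sides of one strong edge.
g = [0.0, 0.1, 0.0, 0.1, 0.0, 10.0, 10.1, 10.0, 10.1, 10.0]
u = wls_smooth_1d(g, lam=1.0)
detail = [gi - ui for gi, ui in zip(g, u)]  # residual detail layer
```

Because the weight across the large jump is small, the smoothness term barely couples the two sides, so the texture is flattened while the step is kept. With `lam=0` the system reduces to U equals G, matching the first expression in the talk.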
Things stay pretty much the same except that, instead of minimizing the derivative of U, we would like to minimize the partial derivatives in both the X and Y directions at each pixel P. We would also like to add a factor lambda, which will balance between the smoothness requirement and the similarity requirement. By increasing lambda, we can make the result progressively smoother. The exact connection between the parameter lambda and the degree of smoothing can be hard to analyze, because we can't immediately apply linear filter theory, since our operator is spatially variant. But using the fact that away from the strong edges our operator very much resembles a spatially invariant operator, we can figure out some interesting connections, and more details on this can be found in the paper. So in order to find U, we need to minimize this expression. And since this is a quadratic form in U, we can compute U by solving this linear system, where L of A is an inhomogeneous Laplacian matrix whose coefficients depend on the input image. In other words, we have a closed form solution for U. That pretty much sums up the exposition of the edge-preserving operator. Now, the decomposition is constructed in a way that is very similar to that of the regular Laplacian pyramid. We can either apply our operator to the original image multiple times, with a different lambda parameter each time, or apply the smoothing operator iteratively. The iterative scheme is useful when we are interested in aggressive smoothing and detail removal and are not too concerned about over-sharpening. Regardless of how we computed the coarsened versions of the image, the detail layers are simply defined as the differences or ratios between two subsequent levels. And this is what the final multi-scale decomposition can look like. Of course, the exact setup is very application-dependent. Now we've got the decomposition. Let's see what kind of stuff we can do with it.
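The slides are not part of the transcript, so the expression and the linear system referred to above are reconstructed here from the spoken description only. The notation is an assumption: $D_x$ and $D_y$ denote discrete difference operators, and $A_x$ and $A_y$ are diagonal matrices holding the weights $a_{x,p}$ and $a_{y,p}$.

```latex
\begin{align}
u^{*} &= \arg\min_{u} \sum_{p} \left( (u_p - g_p)^2
  + \lambda \left( a_{x,p}(g) \left( \frac{\partial u}{\partial x} \right)_p^{2}
  + a_{y,p}(g) \left( \frac{\partial u}{\partial y} \right)_p^{2} \right) \right), \\
a_{x,p}(g) &= \left( \left| \frac{\partial g}{\partial x}(p) \right|^{\alpha} + \varepsilon \right)^{-1},
\qquad
a_{y,p}(g) = \left( \left| \frac{\partial g}{\partial y}(p) \right|^{\alpha} + \varepsilon \right)^{-1}, \\
(I + \lambda L_{a})\, u &= g,
\qquad
L_{a} = D_x^{\top} A_x D_x + D_y^{\top} A_y D_y .
\end{align}
```

The last line follows from setting the gradient of the quadratic objective with respect to $u$ to zero, which is why a single sparse linear solve yields the smoothed image.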
We implemented a number of simple tools that use our decomposition for things like contrast manipulation, high dynamic range tone mapping, detail enhancement and image abstraction. The purpose of these tools is to demonstrate that one can easily improve the results of existing applications by simply switching to weighted least squares-based filtering. Okay. Let's start with multi-scale contrast manipulation. Here's our simple tool for manipulating local contrast at different scales based on our multi-scale decomposition. Once the decomposition has been computed, the different layers can be manipulated at interactive rates. Note that the manipulation range is very wide. It takes an extreme manipulation to cause artifacts to appear. Manipulating the medium detail layer can add depth to the image, while coarse detail manipulation is closer to a global contrast adjustment. And, of course, we can combine adjustments of all the detail layers into a visually pleasing final result. Those of you familiar with Adobe Photoshop or other similar tools know you can achieve a local contrast adjustment with unsharp masking, which can actually be thought of as detail boosting where a regular low-pass filter is used for the base-detail separation. So it's interesting to compare the two, and on this slide you can see the result of the unsharp masking. And here's the result of the weighted least squares-based contrast enhancement. You can probably see that away from the strong edges things are pretty much similar, but in the (inaudible) strong edges, the unsharp masking is prone to halos, which we can avoid with the weighted least squares smoothing. Another application is detail exaggeration from the (inaudible) paper. The idea is to combine details from a collection of images which have been taken under different lighting conditions in order to create a single image rich in detail.
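Both the multi-scale contrast tool and unsharp masking come down to the same three steps: split the signal into a base and detail layers, scale the layers, and add them back. The sketch below shows this on a 1-D signal. It is only an illustration under stated assumptions: the function names are hypothetical, the layers are differences rather than ratios, and the stand-in smoother `box_smooth` is a plain moving average, so it corresponds to the unsharp-masking case rather than to the edge-preserving filter from the talk.

```python
def box_smooth(signal, radius):
    """Stand-in low-pass smoother (moving average); NOT edge-preserving."""
    n = len(signal)
    out = []
    for p in range(n):
        window = signal[max(0, p - radius):min(n, p + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def decompose(signal, smoothers):
    """Apply each smoother to the original signal, finest first.

    Returns (base, details), where details[i] is the difference between
    two subsequent levels and base is the coarsest level.
    """
    levels = [list(signal)]
    for smooth in smoothers:
        levels.append(smooth(signal))
    details = [
        [fine - coarse for fine, coarse in zip(levels[i], levels[i + 1])]
        for i in range(len(smoothers))
    ]
    return levels[-1], details


def recombine(base, details, gains):
    """Add the detail layers back to the base, each scaled by its gain."""
    out = list(base)
    for layer, gain in zip(details, gains):
        out = [o + gain * d for o, d in zip(out, layer)]
    return out


signal = [0.0, 0.1, 0.0, 0.1, 0.0, 1.0, 1.1, 1.0, 1.1, 1.0]
smoothers = [lambda s: box_smooth(s, 1), lambda s: box_smooth(s, 3)]
base, details = decompose(signal, smoothers)
unchanged = recombine(base, details, [1.0, 1.0])  # gains of 1 restore the input
boosted = recombine(base, details, [2.0, 1.0])    # boost only the finest layer
```

Any smoother with the same call shape can be passed in, so swapping the moving average for an edge-preserving one changes only the `smoothers` list. With the moving average, the boosted signal overshoots on both sides of the step, which is the 1-D analogue of the halos mentioned for unsharp masking.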
Upon close inspection of the original (inaudible) results, we can find thin gradient reversal artifacts that can be attributed to over-sharpening by the filter. Those can be avoided by simply switching to weighted least squares filtering. Another thing: while the original results are produced by combining the detail layers of a number of images, here we demonstrate that in some cases we can generate highly exaggerated detail even from a single input image, rather than three multi-light images as in Fattal et al.'s work. Weighted least squares smoothing also makes it easy to perform detail-preserving compression of high dynamic range images. For example, we can simply replace the bilateral filter in the tone mapping algorithm by Durand and Dorsey with the weighted least squares-based smoothing and avoid the mild halo artifacts that are sometimes visible in their results. So in their original result, you may spot some halos near the picture frames and the light fixture. Those pretty much disappear when you simply switch the underlying filtering from the bilateral filter to weighted least squares. Let's see it again. Here's the bilateral filter smoothing, and here's the weighted least squares smoothing. Another option we experimented with is to use the tone mapping algorithm proposed by Tumblin and Turk and replace the LCIS-based multi-scale decomposition with our weighted least squares decomposition. Here, the goal was to achieve a rather flat image with exaggerated local contrasts. Again, upon close inspection you can see that with the weighted least squares you have fewer artifacts. And here we went after a more photographic look by maintaining stronger contrasts at the coarser scales. Of course, we can also use the weighted least squares framework for image abstraction, where the details are suppressed rather than enhanced.
Doing this at different scales produces different degrees of abstraction, which may be combined together in a spatially varying manner to provide more detail in areas of interest. And here's another example to demonstrate progressive image abstraction with the weighted least squares framework. So to sum up, the multi-scale edge-preserving decomposition based on the weighted least squares framework addresses some of the drawbacks of the bilateral filter and other approaches. It allows small features to gracefully fade in magnitude without introducing significant blurring. In future work we'd like to investigate more sophisticated schemes for setting the smoothness coefficients for the weighted least squares formulation, in order to further improve the ability to extract details while preserving the edges. Another important issue that must be tackled is better handling of color. Our tone management tool currently uses the LAB color space, and we have noticed that after strong manipulation the perceived color is quite different. Okay. So that's all, and thank you.
>> Richard Szeliski: Thank you. (Applause)