>> Jin Li: Okay. It's a great pleasure to have Professor Xiaolin Wu from McMaster University visit Microsoft Research and give us a talk. I have known Professor Wu for a long time; he has been active in many areas. The earliest paper I saw of Professor Wu's was his color quantization work, basically converting images into different color indices. Lately Professor Wu's claim to fame has been his excellent work on lossless image coding, the CALIC codec, which has been a long-time benchmark for lossless image compression. Professor Wu has also been active in a number of international image compression standards. Today let's hear his work on standard compliant multiple description image coding by spatial multiplexing and constrained least-squares restoration. Without further ado, let's hear what Professor Wu has to say.

>> Xiaolin Wu: Thanks, Jin. It is nice to see familiar faces in this small group. The topic of multiple description coding has been studied for the last, say, one decade, mostly for multimedia streaming. Here I emphasize standard compliance. As I will point out, up to now the existing techniques mostly present quite a departure from current practice. What I propose is something very much in touch with the reality of today's codecs and infrastructure, and I hope you will appreciate that this technique can be quite practical. Here is the outline of the talk. First I give a little introduction, and then I go to the technique called spatial multiplexing for multiple description coding, particularly for image coding; this can be generalized to video coding as well. So the shorthand for this is SMMD, okay. After I give the architecture of the technique, this paradigm, I'll go into some detail about how to generate the side descriptions, how to conduct side decoding, and then of course the central decoding. Then I'll present some experimental results and conclude.

Okay. So multiple description coding has been promoted as a methodology for multimedia streaming over lossy networks, particularly packet-switched networks, for which we have to deal with packet losses and network errors. That is because in those applications, particularly real-time multimedia communications, lossless transmission is not possible, because of stringent delay requirements or bandwidth economy considerations. So we resort to a best-effort basis, right, like UDP, and multiple description coding can tolerate various degrees of loss. In other words, we can have quality of service scalable to network conditions: the reconstruction quality is proportional to the number of packets we receive. Okay. I am not going to give a comprehensive review of this class of techniques; instead I just highlight some of the common methods for multiple description coding of multimedia. Probably the oldest and most studied MD coding techniques are based on multiple description quantization, so we have the multiple description scalar quantizer and the multiple description vector quantizer. Here we create multiple descriptions by quantizing samples, or sample vectors, into different representations; this is a generalization of quantization from a single description to multiple descriptions. And then of course we have correlating transforms to produce correlated multiple descriptions.
And then we have another class of techniques, also studied at great length by various researchers, based on unequal error protection of scalable code streams. In other words, we apply erasure codes, like Reed-Solomon codes, to pack a scalable code stream into multiple layers with varying degrees of protection, so we can guarantee that the reconstruction quality is proportional to the number of packets received. Okay. And here, of course, we need explicit forward error correction codes plus a scalable source code stream. As you can see, none of those techniques is really standard compliant, right? They represent quite a departure from current practice. Okay. The question is: can we produce multiple descriptions for media while working with current compression standards? That is the main motivation of this research.

Okay. So what I propose is a very simple, straightforward approach based on spatial multiplexing. The input source here is two dimensional; it could be three dimensional, or even one-dimensional audio. What you do is multiplex in the spatial domain, splitting the signal space into subsets of the sample grid. And I deliberately split into regular sample grids, because each of them is an image by itself, so each can be compressed by any existing standard. Okay. Now, the question is: if this simple-minded scheme were to work, how efficient can it be, and how does it compare against the other MD techniques on the previous slide?

>>: What exactly do we mean by standard compliance?

>> Xiaolin Wu: Standard --

>>: Each description can be decoded correctly by the (inaudible) decoder.

>> Xiaolin Wu: Yeah. Okay. Let me show you the next slide; maybe this will answer your question. So this is the input image, right? I do some prefiltering, a preprocessing, first, then I split the original image into multiple parts, say two parts like a checkerboard. Each is a smaller image, so I can use any third-party encoder and decoder; this box is a standard or any other existing technique.

>>: Well, without the (inaudible) multiplexer you're not going to get the full resolution image back, okay? You get a subset of (inaudible) subsets.

>> Xiaolin Wu: Okay. So this multiplexer, you know, I don't think that can be --

>>: That's the decoder side. On the decoder side, without the green multiplexer of some sort, you will not get a full resolution image back.

>> Xiaolin Wu: True. Yeah. But this is the channel, right? I mean, the reason we have standards is because we want to have this, you know, the --

>>: There are all different levels of standards. And unless you -- I mean, even MPEG is not interoperable at some level; it depends on your packaging. You need the file system.

>>: Do they make the same argument with other approaches, or can you make them work on (inaudible) subset of those can be combined?

(Brief talking over)

>>: For example, Apple uses (inaudible) for the icons, but nobody else can -- I mean, it's proprietary as far as anybody else (inaudible), because it's not interoperable.

>>: Yes, for example --

>>: Right. So I think we should just take it for what it is, no more and no less. It's not really interoperable at some level.

>> Xiaolin Wu: Right. Right. Yeah, here --

>>: The operation is (inaudible) -- the operations are minimal.
>> Xiaolin Wu: And also it's modularized, right? It's completely detached from the channel portion of the whole system.

>>: (Inaudible) because there's like hardware communications which make it (inaudible).

>>: But I think in general I'd (inaudible) take it with a big grain of salt, because it doesn't necessarily mean that it's interoperable with anything.

>> Xiaolin Wu: But in this architecture, this part is totally independent of what you do with the encoder and the decoder and the channel, right? Whatever you get back at the decoder you can use as is, and then of course it's not optimal performance, but as I'll explain, you can merge those multiple descriptions, in case you receive more, and achieve the central decoding. And of course the performance can scale according to your complexity at the decoder side. Okay. So, yeah, of course, each of the descriptions can be coded by an existing standard. I don't have to do a special type of MDQ, multiple description quantization; I don't have to do a correlating transform, or use error correction codes to generate extra packets.
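A minimal sketch of this spatial multiplexing step (NumPy assumed; the function names are illustrative, and any third-party standard codec stands in for the encoder and decoder boxes in the diagram):

    import numpy as np

    def split_descriptions(img):
        # Uniform spatial multiplexing: the four polyphase components of
        # a 2x2 down-sampling lattice. Each component is a regular image
        # in its own right, so any existing standard codec can compress it.
        return [img[0::2, 0::2], img[0::2, 1::2],
                img[1::2, 0::2], img[1::2, 1::2]]

    def merge_descriptions(received, shape):
        # Central de-multiplexing: mosaic whatever components arrived back
        # onto the full-resolution grid. Missing samples stay NaN and are
        # left to the restoration stage described later in the talk.
        out = np.full(shape, np.nan)
        offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
        for desc, (r, c) in zip(received, offsets):
            if desc is not None:
                out[r::2, c::2] = desc
        return out

The two-description checkerboard used later in the talk corresponds to keeping the (0, 0) and (1, 1) components.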
Okay. So the side descriptions, again, are generated by uniform spatial down sampling. However, before this down sampling we perform some adaptive directional low-pass prefiltering, for the following purposes. First, we use this prefiltering mechanism to build a bridge between the encoder and the decoder; in other words, I expect the decoder to collaborate with the encoder to perform MDC decoding, and this filtering operation also builds some correlation between the descriptions. And because it is a directional prefiltering, we preserve (inaudible) in the presence of edges, which is, by the way, very important for visual quality.

Okay. So this is the design principle and the technique. First we set the design goal of this directional prefilter: we want to preserve the maximum 2D bandwidth without aliasing due to down sampling. Once we down sample we reduce the sampling frequency and risk aliasing, so the question is how to get the maximum pass region in two dimensions without aliasing, and that is what I mean by these diagrams. I can have directional filters, and those boxes represent the pass region in the 2D spectral space, the frequency in each direction: this is for the horizontal and vertical angles, this is for the diagonal angles, and this is for tangent two or one half. We have those six cases, plus some other symmetric cases. And it is easy to show that there are only eight directional filters that achieve the maximum pass region: the maximum region you can pass without aliasing is pi squared, and there are only eight directions that achieve it. It is difficult to give a closed-form representation for those cases, so I use this table.

>>: I guess I missed it. So you're going to prefilter before you down sample?

>> Xiaolin Wu: Down sample, yes.

>> Jin Li: You get a prefilter in the direction theta?

>> Xiaolin Wu: Yes. This is the prefilter design based on different angles.

>>: Okay. You're going to be image adaptive somehow?

>> Xiaolin Wu: Yeah, yeah, that's what I mean by adaptive prefilter.

>>: To the content.

>> Xiaolin Wu: To the local (inaudible).

>>: And then you're going to filter again at the decoding, the reconstruction side?

>> Xiaolin Wu: At the reconstruction side I'll do the (inaudible).

>>: (Inaudible) that means you are going to use four different angles?

>> Xiaolin Wu: Okay. So here, right? This is an example: I can produce four descriptions, or I can produce two descriptions, for instance. Let's say in each of those cases the sampling rate is reduced by half, right?

>>: Okay.

>> Xiaolin Wu: So the frequency is reduced by half. So let's say I reduce the sampling frequency by half; then the question is, what is the best prefilter that does not introduce any aliasing?

>>: Then are you going to filter along the edge or across the edge?

>> Xiaolin Wu: Along the edge, right. Okay.

>>: Low frequency, like along the edge, right?

>> Xiaolin Wu: Yeah. So --

>>: Why do you --

>> Xiaolin Wu: Because you want to preserve the edge content; in that case you achieve maximum energy compaction, right?

>>: But there's already low frequency along that direction, right?

>> Xiaolin Wu: So that's why you want to preserve it. Because in the gradient direction, against the edge, you are going to introduce aliasing anyway, right?

>>: Right. That's why I'm not quite understanding. You have aliasing on edges anyway, right? Even without a prefilter.

>> Xiaolin Wu: Yeah, so that's why --

>>: He's not even prefiltering in that direction; he's prefiltering along the low-pass direction.

>> Xiaolin Wu: Yeah.

>>: Zero frequency. So what is there to be gained by --

>> Xiaolin Wu: Because in the gradient direction, the maximum-variation direction, there is aliasing. You don't want that, so you want to cut it off. So you prefilter --

>>: But you're not doing any filtering in that direction, right? Or you are?

>> Xiaolin Wu: I am.

>>: (Inaudible). Can you give some examples?

>> Xiaolin Wu: Yeah. For example in this case, this is the --

>>: The edge goes like that?

>> Xiaolin Wu: Right. So you see, the frequency against the edge, the pass band.

>>: This is the filter?

>> Xiaolin Wu: Yeah, the filter response. Right. So you allow a very high bandwidth in the low frequency direction, because you know you are not going to suffer from aliasing there.

>>: Okay. All right. So that's what I meant. So you're going to do a low-pass filtering that kind of smooths out the edges?

>> Xiaolin Wu: Yeah, yeah.

>>: Okay.

>> Xiaolin Wu: Right. And across the edges I smooth, because I know --

>>: Okay.

>> Xiaolin Wu: You know, this down sampling scheme will introduce error, right? In other words, I want to preserve as much as possible, so I want to design the pass region to be free of aliasing. Then the question is, what is the maximum possible region? It is pi squared, right? Because if you take the periodic tiles of the spectrum, you don't want those tiles to overlap; then you can see this is the best you can do at those angles. So there are only eight angles: zero, that is horizontal, and vertical, then tangent one half and tangent two, each with plus and minus, and the two diagonals. For those special angles you can achieve the maximum region of pi squared without overlapping, that is, without introducing aliasing.
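As a rough illustration of this directional prefilter design (a sketch only: the kernel support, window choice and normalization here are assumptions, not the paper's exact design), one can sample a separable sinc in coordinates rotated to the edge direction and taper it with a window, as the talk describes next:

    import numpy as np

    def directional_lowpass(theta, w_lo, w_hi, size=15):
        # Approximate the ideal directional low-pass filter whose pass
        # region is a w_lo-by-w_hi rectangle (full spectral widths, with
        # w_lo * w_hi = pi^2) whose wide side w_hi lies along the edge
        # direction theta.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        u = x * np.cos(theta) + y * np.sin(theta)    # along the edge
        v = -x * np.sin(theta) + y * np.cos(theta)   # across the edge
        # Ideal rectangle in the rotated frequency frame -> separable
        # sinc in the rotated spatial frame; np.sinc(t) = sin(pi t)/(pi t).
        h = (w_hi / (2 * np.pi)) * np.sinc(w_hi * u / (2 * np.pi)) \
          * (w_lo / (2 * np.pi)) * np.sinc(w_lo * v / (2 * np.pi))
        window = np.hamming(size)
        h *= window[None, :] * window[:, None]  # tame the ringing
        return h / h.sum()                      # unit DC gain

    # e.g. the tangent-one-half direction from the table:
    h = directional_lowpass(np.arctan(0.5), np.pi / np.sqrt(5),
                            np.sqrt(5) * np.pi)

The kernel would then be applied point by point (for example with scipy.signal.convolve2d) using the locally estimated gradient direction, quantized to the eight admissible angles.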
>>: So the (inaudible) you're referring to?

>> Xiaolin Wu: Here?

>>: Yes.

>> Xiaolin Wu: That's the bandwidth: the bandwidth in the low frequency direction and the bandwidth in the high one. And this is theta, the parameter for the direction. Okay. So for instance take this entry, for the horizontal or vertical direction: this is two pi in width, so your bandwidth is pi, right. And in this case the bandwidth in the low-pass range is much narrower, so the region is not symmetric: you have W_L equal to pi over root 5, and W_H is root 5 times pi.

>>: So (inaudible) the area of the prefilter pass region -- as long as it's pi squared, then we're guaranteed to have (inaudible)?

>> Xiaolin Wu: That is, under the no-aliasing condition, the maximum region you can pass is pi squared, right? Then the question is which directional filters can achieve that area, and there are only eight cases.

>>: Eight (inaudible).

>> Xiaolin Wu: Right. The pass region has to be no larger than pi squared to avoid aliasing. So that's what I was saying: I want to achieve the maximum preservation of frequency content in this down sampling scheme without aliasing, and there are only eight angles that allow me to achieve that. That's why I designed those eight angle prefilters. Of course in practice I have to quantize the edge angles into those eight cases; for angles in between, the maximum is not achievable. Yes, that is true. Okay. So of course this is the ideal filter; in practice you have to use real filters to approximate it. For instance we can use the sinc function to realize those directional prefilters, and you can apply a window function to reduce the ringing effect.

Okay. So here is the preprocessing: prior to down sampling you apply adaptive prefiltering, and this is signal adaptive -- point by point we try to estimate the direction of the gradient, then apply the filter. After that we down sample and then we compress. At the decoder side we have this decompressed subsampled image -- I should say decompressed, prefiltered, subsampled image -- and our goal is to reconstruct the original from this down sampled, compressed image. Okay. So let's look at one side description: we get only one quarter of the samples, like this, and this is the reconstruction problem. To achieve this we apply a so-called windowed autoregressive model. In other words, we assume the image is piecewise stationary, so in a small window we can apply this autoregressive model: each pixel can be written as a linear combination of its neighbors plus some innovation noise. Okay. The regression coefficients form a two-dimensional process as well, which we also assume is piecewise stationary; in other words, the signal alpha changes slowly. Under this assumption we can learn the model in a local window, where we try to estimate the model parameters alpha in an overlapped-window fashion.

Okay. So this is the technique, the algorithm. Let me introduce some symbols, okay. X represents the pixels in the original image I, and Y the pixels in the down sampled image. W is a local window, and I use superscripts on x_i to denote the four-neighbor sets of a particular pixel x_i.
i is the two-dimensional index for the pixel position; the plus superscript, x^+, denotes the four axially connected neighbors of a particular pixel x_i, and likewise x^× denotes the four diagonally connected neighbors, so together they give the eight-connected neighborhood. Okay. And the task is to estimate a block of high resolution pixels, the original samples X in this local window, using this autoregressive model -- two models, okay, model A and model B. Okay. Or maybe I should use a diagram to show this. So this is the current pixel x_i: these are its four diagonal neighbors, the set x^×, with offsets 0, 1, 2, 3, and this is the other spatial configuration, the current sample in relation to its four axially connected neighbors, north, south, east, west. Okay. So I use two models in two spatial configurations to estimate the center pixel x_i. The reason I use two models instead of a single order-eight model is that I want to avoid data overfitting, because I have to estimate those parameters a_1, a_2, those regression coefficients, locally; if the order is too high, I may not have enough samples to do it properly. Okay. So I split it into two, and I have two models to fit the pixel x_i, okay.

>>: (Inaudible).

>> Xiaolin Wu: t indexes the neighbors, right.

>>: (Inaudible).

>> Xiaolin Wu: t is the index of the neighbors. So here, 0, 1, 2, 3. And of course I have a diagonal model and an axial model, and I need to optimally blend them, so there are two weights to be determined, for x^× and x^+, and I'll explain how those weights can be computed.

Okay. So this is the overall objective function with which we try to estimate the original pixels X in a local window W, okay. You can see this is a difficult problem, because there are relationships between the unknowns, right? Okay. And there are a few loose ends to tie up: first, how to determine the coefficients A and B, and also the weights. Suppose we have them, right? Then we still have an ill-posed problem, because this cost function admits a trivial solution: if I set all the X estimates to a flat signal, the objective is minimized, right? If I make the two-dimensional signal in this window flat. So therefore I need a constraint, right. So what do we know? What we know is the decoded image Y, okay: Y gives the pixels in the decoded down sampled image. Therefore X, convolved with this prefilter and down sampled, must agree with the decoded image; that's what I receive. Right. And of course, remember Y is compressed, so I still have reconstruction error on Y, bounded by a function of R, where R is the rate of the compression scheme, right.

>>: (Inaudible).

>> Xiaolin Wu: Quantization error. Yeah. This is the compression distortion, right. Okay. So I want the best model fit subject to this constraint, and the constraint is given by the system, right? This is the prefilter used at the encoder side, this is the received compressed image, and this is the quantization error estimate. I want --

>>: But wait -- are you only receiving some --

>> Xiaolin Wu: I only receive one side description.

>>: One description.

>> Xiaolin Wu: One description, which is a down sampled, compressed version of I, and the Y values alone are all I can observe. This is the received image.
But for this Y, I know that the original image, convolved with that directional filter and down sampled, minus Y, should be the quantization loss.

>>: The convolution includes the down sampling?

>> Xiaolin Wu: Yes, prefiltering then down sampling, the cascade of those two operations. And I know this difference should be the quantization loss. Okay. So I try to get the best model fit subject to this constraint, and the constraint is due to the pipeline of filtering, down sampling and compression, right. Okay.

So now we have to solve these problems: how to estimate A and B, and how to determine the weights. Okay. This is the way we compute the coefficients, the model parameters, for the diagonal model and for the axial model: again we just solve a least-squares fit to estimate the model parameters, right. We take a small window and fit, and here of course we can only use observables, the decompressed pixels, right.

>>: The decompression (inaudible) you don't have that (inaudible); you have the Y_i, you have the cross and the plus.

>> Xiaolin Wu: Yes; after this down sampling I still have a square lattice, right? I still have the diagonal and the axial correlations to rely on. But you have a point, yes: in this case the scale has changed, because it's down sampled, so in terms of distance it's doubled in the spatial domain. So here we have to make the assumption that the covariance matrix does not change under down sampling, which is a big assumption, right? But for large-scale edges this may be reasonable: if the edges or structures are of large scale, for instance a straight line, the directional correlation does not change. That is why, as you will see later, this technique has some merit -- it reconstructs edges very nicely -- because those model parameters can be satisfactorily estimated from the down sampled signal Y. So this is how we build the model, right?

>>: (Inaudible) minimize the joint --

>> Xiaolin Wu: So here, right?

>>: The function (inaudible) parameters -- what you minimize here and there are not the same, right? So there is a step to go (inaudible).

>> Xiaolin Wu: Right. So this is the objective function for the entire task, right? And here we have to know the model parameters A and B; the next slide just shows how to estimate A and B.

>>: Actually, (inaudible) minimized in --

>> Xiaolin Wu: Okay. Yeah, that's an excellent point. If I wanted to minimize this objective function by treating X, A and B, all of them, as optimization variables, I could do that, right? But then it becomes a nonlinear problem, which is much harder. Right. So I adopt a separation approach: we first estimate A and B, then plug the estimates A-hat and B-hat into this objective function and solve, and the problem becomes a linear least-squares problem, which has an efficient solution, right? But yes, we have also tried the nonlinear approach, where the minimization is not only with respect to X but also searches over A and B. But --

>>: (Inaudible) next slide.

>> Xiaolin Wu: Okay.

>>: So you're training the As on your (inaudible), but those are down sampled, right?

>> Xiaolin Wu: Right. So (inaudible) asked the same question, yes.
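A sketch of that coefficient estimation, fitting the diagonal and axial models by ordinary least squares on the decoded low-resolution lattice (window size, boundary handling and variable names are assumptions):

    import numpy as np

    DIAG = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # x^x neighbor offsets
    AXIAL = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # x^+ neighbor offsets

    def fit_ar_models(Y, i, j, half=3):
        # Fit the 4-tap diagonal coefficients a and 4-tap axial
        # coefficients b in a (2*half+1)^2 window of the decoded image Y,
        # relying on the assumption that second-order statistics survive
        # the down-sampling. Assumes (i, j) is an interior pixel. Returns
        # the coefficients and the residual fitting errors, reused later
        # as the blending weights.
        rows_d, rows_a, targets = [], [], []
        for r in range(max(i - half, 1), min(i + half + 1, Y.shape[0] - 1)):
            for c in range(max(j - half, 1), min(j + half + 1, Y.shape[1] - 1)):
                rows_d.append([Y[r + dr, c + dc] for dr, dc in DIAG])
                rows_a.append([Y[r + dr, c + dc] for dr, dc in AXIAL])
                targets.append(Y[r, c])
        A_d, A_a, t = map(np.asarray, (rows_d, rows_a, targets))
        a, res_d, *_ = np.linalg.lstsq(A_d, t, rcond=None)
        b, res_a, *_ = np.linalg.lstsq(A_a, t, rcond=None)
        e_diag = res_d[0] if len(res_d) else 0.0
        e_axial = res_a[0] if len(res_a) else 0.0
        return a, b, e_diag, e_axial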
>>: There's a mismatch, then: it's not minimizing over A and B in the other one. You're taking an approximation, assuming they are known, but minimizing over X.

>> Xiaolin Wu: Right, right. So ideally, yes, this A and B should be estimated on X, right? But that becomes a nonlinear problem, so what we do is do it in two steps.

>>: Okay. But it's not even at the same resolution, right?

>> Xiaolin Wu: Right.

>>: But A is supposed to be a predictor for the high resolution.

>> Xiaolin Wu: For the high resolution, yes. But in this formula A and B are estimated in the low resolution. Yeah. But you know, computationally -- this is a chicken-and-egg situation, right? Without X, how can I estimate A and B? And without A and B, how can I know X? Because X will be (inaudible) on the Y.

>>: Sure. Well, another approach might be: you train on high resolution images, you send the coefficients over to the decoder, and you reconstruct, maybe with a fixed filter.

>> Xiaolin Wu: Right, but I think that is highly signal dependent; it depends on how much (inaudible) you want to send, right?

>>: This is not (inaudible).

>> Xiaolin Wu: Yes, this is a point-by-point estimate.

>>: (Inaudible).

>> Xiaolin Wu: Too much side information. Of course, another option is to use a training set at design time, and then you don't send anything. But okay, anyway, those are the, you know --

>>: If you have a signal X and you down sample it by 4, how do the statistics compare to the ones before you down sampled?

>>: Yeah, exactly.

>>: I can accept that it's probably not too far off. The signal (inaudible).

>> Xiaolin Wu: Yeah, yeah. This is the basic assumption we have to make, right? If the signal structures are of large scale, then this is okay, as you said: the covariance matrix stays.

>>: Okay.

>> Xiaolin Wu: I mean, those things, the second order statistics, don't change that much if the original sampling rate is high enough.

>>: (Inaudible). The low frequency (inaudible).

>> Xiaolin Wu: Yeah. So this is how we determine A and B, and then this becomes a linear least-squares problem, right? And then of course we have to compute those weights. Those weights can be -- by the way, I have to say something about this important term, the constraint, right? Because the original estimation problem, as you can see, is ill-posed: many, many solutions can satisfy it. Therefore the constraint plays an important role. The constraint here I already explained, right? This is the cascade of prefiltering and down sampling, this is the original (inaudible) for the estimate, and this is the compressed version we receive. So this is the system constraint, okay? Now, because the prefilter is adaptive, changing from point to point, we have to know the prefilter kernel. But because we use a directional filter, we do not need to send side information: the decoder estimates the direction at the same spot and thereby guesses the prefilter the encoder used. It turns out this estimate is quite good, right, without any side information. Okay.
So the weights of the two models are determined by the classic least-squares weighting, right: those error terms, e_× and e_+, are the fitting errors of the two objective functions used to estimate A and B, so they represent how well the two models fit in a local neighborhood. Assuming the errors are independent, we have these optimal weights. Okay. So now we can put all the pieces together and state the constrained least-squares problem: once we have the weights, the model parameters and the constraint, this constrained optimization problem can be solved in Lagrangian form, like this. Okay. Another interesting interpretation of this scheme is that you can consider it adaptive (inaudible) predictive decoding, because in that objective function we try to predict each unknown pixel from its neighbors, and the predictor is trained point by point, so it is adaptive, right. I believe this is the reason we get very good performance, as we will see later. So you can consider those autoregressive models as adaptive (inaudible) predictors.

Okay. So that is the side decoder. Now, if we get more than one description, we have to perform the so-called central decoding. Take two descriptions as an example; again, the encoder and the decoder can be any third-party scheme, so you can consider the MDC decoder independent of each side description scheme, okay. What we do is de-multiplexing: we can simply make a mosaic, right, combining those samples onto one sampling grid, okay. And then, because we now have much more information to work with, we can probably do much better than just a mosaic; we'll see how the central decoder can rely on all the pieces of information available to it.

>>: (Inaudible).

>> Xiaolin Wu: Yes?

>>: I mean, we can see this framework is a (inaudible) restoration framework. I wonder if you can consider other possible frameworks for the solution (inaudible). One of the models I have seen in the past is the model (inaudible). Then the descriptions you receive are constraints, which basically (inaudible) descriptions; you just try to solve this combined optimization problem.

>> Xiaolin Wu: Using a conditional (inaudible), you mean -- you do a maximum likelihood estimate using --

>>: Basically you try to estimate the lowest energy point for the constraint, and then basically look back at the constraint to see, according to the descriptions you receive, if it is outside, and just truncate it back to the --

>> Xiaolin Wu: To the -- yeah, to the boundary, and then --

>>: Boundary, and then (inaudible).

>> Xiaolin Wu: Yes.

>>: Using basically (inaudible). What do you think about that?

>> Xiaolin Wu: We haven't tried that, but I think it's an excellent suggestion. Here we use just a classic least-squares approach: we assume an autoregressive process, we estimate the parameters, so we take an adaptive prediction approach. But this kind of relationship could be a Markov random field, right? So yes, in this framework you could have a different reconstruction approach, using the decoded image as a constraint, and instead of minimizing squared error you could, for instance, do (inaudible); it becomes a different estimation technique, right. But nevertheless, this framework will accommodate that, right? The only question is how to use this information. This is the only thing we have: we know the original source went through this kind of degradation and was mapped to Y. That's the only thing we have.
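Putting those pieces together, a sketch of the blending weights and the Lagrangian-form constrained least-squares solve; here C and d are assumed to stack the weight-scaled AR prediction equations over the window, G and y encode the prefilter-plus-down-sampling constraint against the decoded samples, and lam stands in for the multiplier implied by the codec's quantization error budget:

    import numpy as np

    def blend_weights(e_diag, e_axial, eps=1e-9):
        # Classic least-squares blending under the independence
        # assumption: the model with the smaller fitting error gets the
        # larger weight.
        total = e_diag + e_axial + eps
        return e_axial / total, e_diag / total   # w_diag, w_axial

    def solve_window(C, d, G, y, lam):
        # Lagrangian form of the constrained restoration,
        #   min_x ||C x - d||^2 + lam * ||G x - y||^2,
        # solved as one stacked linear least-squares problem.
        A = np.vstack([C, np.sqrt(lam) * G])
        rhs = np.concatenate([d, np.sqrt(lam) * y])
        x, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        return x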
>>: How big is W?

>> Xiaolin Wu: How big is W? 5 by 5, 7 by 7.

>>: So I'm a little confused. Are you tiling the image into these regions?

>> Xiaolin Wu: No, the windows overlap.

>>: Okay. So if they're overlapped, then you're just determining what, the center pixel?

>> Xiaolin Wu: Yes, indeed.

>>: But the minimization doesn't really --

>> Xiaolin Wu: Minimize on a single point, right.

>>: Yeah.

>> Xiaolin Wu: Right. The minimization is performed on a block.

>>: Okay. So you find the X in that block and then what, just --

>> Xiaolin Wu: Fix the center point X, then move on.

>>: Okay.

>> Xiaolin Wu: Right. And then -- that's a good point -- if you do this, then for every single pixel you have multiple estimates, right?

>>: Right.

>> Xiaolin Wu: A pixel could be a center, it could be a corner, right? If you do this overlapping business, eventually you have multiple estimates everywhere. Right. So that's why, as Jin said, you could model this as a Markov random field; then you could consider this interplay using some kind of maximum likelihood estimate.

>>: So what would happen if you just estimated a single pixel?

>> Xiaolin Wu: Oh, then the (inaudible).

>>: I mean, the objective would be the single --

>> Xiaolin Wu: Pixel. And then --

>>: But of course, you're still using your neighborhood and all that stuff.

>> Xiaolin Wu: Yeah, for instance, if you want.

>>: The constraints are still in the neighborhood.

>> Xiaolin Wu: The constraint is still the neighborhood, and once you have the parameters A and B estimated, it becomes simply -- simple interpolation, right?

>>: (Inaudible) one pixel (inaudible) minus one then is (inaudible) one pixel at a time (inaudible).

>> Xiaolin Wu: Yeah. But then of course we don't have to fix just a single point; we could fix a small patch, the interior. This is just the implementation; the most expensive variant is that I only fix a single point in the middle. Because, you see, here we make a lot of assumptions -- we assume these things are piecewise stationary. What if this point is right on the border of a (inaudible), an edge? Then this window may not be.

>>: Yeah. So I see how this is leading back toward Jin's Markov (inaudible).

>> Xiaolin Wu: Right.

>>: I guess I was assuming: supposing you know all the other Xs around, then you could reformulate this as, find the point in the middle that minimizes some prediction error subject to the constraints.

>> Xiaolin Wu: Yes.

>>: But you don't know all those others.

>> Xiaolin Wu: Right. All those Xs are unknown as well.

>>: Yes. So you have to do it, instead of doing it all jointly somehow, which would be done in, say, an iterative mode, or --

>> Xiaolin Wu: Using message passing, that kind of trick, right?

>>: Yeah. You're just doing it this way.

>> Xiaolin Wu: Right. And this way is almost equivalent, right? You estimate a block instead of a single point and then you move on, so you take the context of a large neighborhood. Okay.
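The overlapped-window side decoding he describes would then look roughly like this; build_system is a hypothetical helper that assembles the fitted models and the constraint around each high-resolution position, and solve_window is the routine sketched above:

    import numpy as np

    def side_decode(Y, factor=2):
        # Slide a window over every high-resolution position, solve the
        # block estimate, but commit only the centre pixel before moving
        # on, so each pixel is estimated in the context of a full block.
        H, W = Y.shape[0] * factor, Y.shape[1] * factor
        X = np.zeros((H, W))
        for i in range(H):
            for j in range(W):
                C, d, G, y, lam = build_system(Y, i, j)  # hypothetical helper
                x_block = solve_window(C, d, G, y, lam)
                X[i, j] = x_block[len(x_block) // 2]     # centre pixel only
        return X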
So then of course if you have multiple descriptions, and you know their spatial configurations, the same problem can be posed and solved. Okay. So those are the details, I suppose. Okay. I don't think we need to go into all of them, but one thing I can explain is why the central decoder is more powerful: because now I have more constraints, right? If I have two descriptions, I have this checkerboard -- the black spots and the white spots -- and for every such point I can establish a constraint. Also, the model parameters will be better estimated, since I have more samples when I have more descriptions. And remember the (inaudible) information about the filter at the encoder: with more samples the direction estimate will be better, so I get a better estimate of the kernel. So I can improve the quality of all the input data for that optimization problem, and I have a better solution at the central decoder.

Okay. So I will show some experiments. We compare the scheme against some recent MDC techniques. One is by (inaudible), who used to be here, right? And he's now --

>>: (Inaudible).

>> Xiaolin Wu: He's now at (inaudible). In fact I will visit him tomorrow; I will drive to Vancouver tomorrow, right. They published a paper in this year's DCC; they use filter banks for (inaudible). And then there is another recent paper on MD coding. Those two are among the best performing methods. Okay. So here is an illustration. Okay. Let me explain. This is (inaudible), this is bike, okay, and those three solid lines are the three competing methods: the one with triangles -- ours is the circle -- is the technique of reference one, and the (inaudible) is the (inaudible) technique from reference two. Okay. The x axis is the bit rate per side description: the number of bits per sample in each side description. Okay.

>>: For the number of bits per sample of a side description, do you convert it back to the whole image?

>> Xiaolin Wu: No. No. So in other words, here the side description is coded by JPEG 2000 at .88 --

>>: (Inaudible).

>> Xiaolin Wu: Yeah, at the smaller resolution. Right. Okay. Yeah. So this is a more complex image, bike.

>>: And what are the dotted line and the solid line?

>> Xiaolin Wu: Okay. The dotted line is the side, right, and this is the central. So there are two descriptions, and you can see there is a gap of up to 3 dB.

>>: (Inaudible)

>> Xiaolin Wu: Yes. We haven't implemented that. We have only implemented two descriptions, and the two descriptions are shifted by half a pixel diagonally, the optimal shift, so they form a checkerboard. Right. Yeah, you can see roughly 3 dB from the side to the central. And the interesting thing is that our side and central curves are both above the competing methods, right? Okay. But I didn't tell the whole story: when the rate gets higher, those curves will cross. So our scheme has excellent performance at low to medium bit rates. That is understandable, because we have done down sampling, right, and basically discarded the high-frequency signal components; when the rate gets much higher, this down sampling scheme is suboptimal. So here is another example, the images flower and fruits. Okay.
So here is the visual comparison: method one, method two, and our method, with the side in the top row and the central in the bottom row. So maybe just to reduce the --

>>: One should be (inaudible).

>> Xiaolin Wu: So this is at the bit rate of a quarter bit per pixel. Here, right. Okay. It wraps around; there is a delay. Okay. So I think the visual quality improvement is even more significant than the PSNR numbers suggest; for instance if you look at the side, right, this is quite significant, not as much in the central. Okay. So this is a dark image. So this is a competing method's side description, and here you can see -- yeah, good, thank you. All right. So this is the proposed method, SMMD, at 0.4 bit per pixel; this is the MDC technique of reference one at the same bit rate. This is the side; this is the central, with two descriptions.

>>: How do these compare to a single description?

>> Xiaolin Wu: Okay, I'll come to that point. A single description, say (inaudible), right? Yeah.

>>: (Inaudible) the other question is that here I see two technologies: one is the (inaudible) prefiltering, the other is the (inaudible) estimation. Can we see results for each of the portions? Let's say we just apply prefiltering.

>> Xiaolin Wu: Oh, non-adaptive prefiltering, right?

>>: Let's say we do apply adaptive prefiltering but use a very simple postprocess, just bilinear (inaudible) at the end. And the other is, let's say, don't do the prefiltering. (Inaudible) for example the SMMD approach is more appealing than the other approach. I want to see where the gain comes from: which part of the gain is provided by the prefilter, and which part by the (inaudible).

>> Xiaolin Wu: So without prefiltering, you lose something from 0.3 dB to 0.6 dB, in that range. I just had a discussion with my student this morning over the phone -- I mean, about the ongoing project.

>>: So the prefiltering provides more gain for the --

>> Xiaolin Wu: No, I think the restoration is more important. Say you replace this adaptive restoration, as you said, by bicubic; then the deterioration would be more than one dB, right.

>>: I mean, this restoration thing -- they have been using this sort of thing for super resolution.

>> Xiaolin Wu: Yeah, yeah. That's how we found this application; we started with super resolution.

>>: I mean, like Allen Gersho (phonetic) would put in, just say (inaudible).

>> Xiaolin Wu: I didn't know Allen Gersho did this.

>>: I think the way you've done it is probably unique, but this sort of approach, where you predict the pixel given the (inaudible) based on a conditional expectation model, and you model it in a particular way so it's adaptive across the whole image, is kind of a common technique. So what's more unusual, maybe, to me is the prefiltering part. I don't know which contributes most to the scheme, but --

>> Xiaolin Wu: I think the prefiltering contributes, as you say, the 0.5 dB or so; it's not that much. In fact, you can just do non-adaptive prefiltering. Okay. Remember we are talking about MDC, right? So there is a trade-off between the side and the central distortion, and in this case our side description has fairly good quality compared with some other schemes. Right?
That is because the two sides have a fairly high correlation, right? So when you get the second description you don't get too much new information, and the central distortion does not drop that much, particularly at high rates. The prefiltering actually controls how much correlation you introduce between the two descriptions, because they are interlocked, right: with a larger filter kernel you correlate the two better; with impulse sampling the two are more or less independent, and you favor the central decoder. So that is another aspect unique to MDC. Right. And with the prefiltering we also embed a scheme for the collaboration between encoder and decoder, as I said earlier, because those are the constraints we use. For super resolution those things are (inaudible): the point spread function is given to you, and that's what you work with; here you have the freedom to choose, to design. But I agree, the framework is the same. The restoration problem has been studied for a long time, yes -- for instance (inaudible) super resolution, interpolation, yes -- but I haven't seen this approach used in compression, right. So, yeah, let's go there. Here you can see SMMD has much better visual quality than the others, even though the PSNR number is actually a little bit lower in this case, right? That is due to the particular model we use: we use the directional prefiltering, and we use this autoregressive model where, remember, we have to estimate the model parameters from the low resolution image. That estimate is good if you have large edges, large-scale structures, because then the signal statistics do not change with scale; if that is the case we do a good job. And in this direction, of course, you have high frequency, but because we do prefiltering like that, our model learns the directional correlation well, so we reconstruct those edges much better than the other techniques. This is an advantage we want to emphasize. Okay. And this is from the side to the central, so you see the improvement in visual detail. All right. So now, if you get additional packets -- you can consider this a scalable code as well, right, a low (inaudible) scalable scheme.

Okay. So, a remark on rate-distortion performance. The question is: if you do this down sampling business, do you necessarily suffer in rate-distortion performance? Here is my argument, just by a (inaudible) principle. We know that if this inequality holds -- in other words, if whatever I discard, the discarded energy beyond my cutoff frequency, is less than the distortion at the given rate, the distortion-rate function of the source -- then I am okay: I can safely discard, right? So for (inaudible) signals, you know, the (inaudible) I suppose --

>>: (Inaudible) this approach is more okay at low bit rates -- safer at a low bit rate than --

>> Xiaolin Wu: Yes, that is the argument, yes. Indeed, all the experimental results show we have very nice performance at low rates.

>>: So how about higher rates?

>> Xiaolin Wu: At higher rates we start to lose, right.
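One way to write the condition sketched here, with E_hf denoting the spectral energy the prefilter discards and D(R) the distortion-rate function of the source (my notation, not the slide's):

    E_{\mathrm{hf}} \;=\; \frac{1}{4\pi^2} \iint_{\omega \notin B} \lvert \hat{I}(\omega) \rvert^2 \, d\omega \;\le\; D(R)

where B is the retained pass region of area pi squared. When the inequality holds, down sampling discards only what the codec would have lost at rate R anyway, which is why the scheme wins at low rates and loses once R grows.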
So this is the single description case, right? Phil just asked this question. In its own right this scheme is a superior low-rate code; in fact it has a higher (inaudible), you know, a very good low-rate code, right? The way we do it is (inaudible) JPEG 2000 with this down sampling at the encoder side; you treat decoding as an inverse problem and solve it using some prior knowledge. And so, yeah, this is a table showing some experimental results. This is just a single side description, which is basically a combination with the codec, if you don't get any additional descriptions. Right. And here are the rates, for the images (inaudible), leaves, flower and bike -- the images I've shown you, right. So this is bike.

>>: (Inaudible).

>> Xiaolin Wu: Even with one description it's better.

>>: One description I can understand.

>> Xiaolin Wu: With two descriptions, at the total rate, of course it's worse, right. Yeah, yeah.

>>: Okay, let's say you work at (inaudible). That point 3 (inaudible) -- does that mean you're asking JPEG 2000 to work at 0.075 (inaudible)?

>> Xiaolin Wu: I think this is with respect to the (inaudible) size.

>>: Okay.

>> Xiaolin Wu: So in other words, for our side, the bit rate for the single description -- the bit rate is 0.6.

>>: 1.2 (inaudible).

>> Xiaolin Wu: This rate is already converted to the original size.

>>: Okay.

>> Xiaolin Wu: Right, the original size.

>>: (Inaudible) one description, right?

>> Xiaolin Wu: Yes, one description. Okay. For the side I only have one quarter of the samples, so relative to the original it is, yes, 1.2. Yes, 1.2, yes.

>>: One quarter. So now you have four descriptions?

>> Xiaolin Wu: No. This is JPEG 2000 at the original resolution, right? It is 0.3 bit per pixel. This side is only one quarter of the original size.

>>: A quarter of the original.

>> Xiaolin Wu: Size in terms of number of samples: I drop every other row and column, right?

>>: Okay. I thought you -- but now you have four descriptions.

>> Xiaolin Wu: I only have two descriptions.

>>: But if you're dropping -- if you only have one quarter of the samples --

>> Xiaolin Wu: I can produce four descriptions, right, but if I get only one --

>>: I know you have only one here. But you're producing four descriptions. In the previous result you had --

>> Xiaolin Wu: Two descriptions.

>>: And now you have four.

>> Xiaolin Wu: Okay. So here, let's go back to the --

>>: But here at most you use just --

>> Xiaolin Wu: I can have four.

>>: In the previous result you were using a checkerboard, and here you have four descriptions.

>> Xiaolin Wu: Yes. So here I can split into four.

>>: Right.

>> Xiaolin Wu: But in all the results I presented, I only reconstruct from two.

>>: At most.

>> Xiaolin Wu: The black and the shaded ones, right? So I only reconstruct --

>>: Descriptions one and two.

>> Xiaolin Wu: One and two, yes. One and two.

>>: One essentially, two --

>>: Okay. So, let's see.

(Brief talking over)

>>: Where you're comparing reference one and reference two --

>> Xiaolin Wu: They have two descriptions.

>>: Meaning description one and description two?

>> Xiaolin Wu: Description one and description two only. Reference one and reference two are different techniques, right? So we only compare at the same rate. Right. We only implemented our central decoder for two descriptions.

>>: Okay. So I was pretty confused. I thought a central --

>> Xiaolin Wu: Decoder.

>>: -- decoder would take all the descriptions.

>> Xiaolin Wu: Right.
But I mean, here we just say this scheme can split into four, but --

>>: I misunderstood. I thought you were doing a checkerboard. You said you did a checkerboard.

>> Xiaolin Wu: This is a checkerboard: this is shaded and this is black.

>>: (Inaudible).

>> Xiaolin Wu: Okay, I see, that's where the confusion is. In other words, the two descriptions form a (inaudible), right? They form the two halves of a checkerboard. That's why. Right. But each description is, you know, one quarter --

>>: (Inaudible).

>> Xiaolin Wu: I'm almost done. Right. So in other words, each side is only one quarter of the original size, right? So if we spend 0.3 bit per pixel relative to the original for a side, we spend 1.2 bit per pixel on the small image, and that gives very good quality. And then of course when we build it up, we have very good side information to work with.

>>: (Inaudible).

>> Xiaolin Wu: Yeah. All the loss is due to the cutoff by half; that is quite significant. So yes, indeed, you can see when the rate gets up here we actually lose, right? Because then the bits are not well spent: those bits should go to the high-frequency coefficients, and our scheme does not allow that. So that's an inherent drawback. But at low rates you can see we gain, right, for various images.

Okay. So here you can see the visual comparison. This is (inaudible), this is bicubic, right, and this is our side, at the same rate, 0.2. This is very low, right? For this kind of image, even at 0.2, JPEG 2000 doesn't do a good job for things like (inaudible); you see some ringing artifacts around the diagonal edges. That is understandable, because the transform it uses is not adaptive. This is our reconstruction; this is bicubic.

>>: Is that in the JPEG 2000 (inaudible)? I mean, it could (inaudible).

>> Xiaolin Wu: I think it's due to the transform. So, for instance, people have now found that with curvelets, or wavelets with directional lifting, those drawbacks can be somewhat corrected. So here the improvement in visual quality is even more prominent, right. And here, of course, we deliver those structures that JPEG 2000 or bicubic lose in (inaudible), right? They show severe artifacts at this low rate. Now the rate gets higher.

>>: (Inaudible) .35 is for the total image?

>> Xiaolin Wu: The total image. Yeah. Yeah. For this (inaudible) it only gets this.

>>: This is (inaudible), right? I mean basically (inaudible).

>> Xiaolin Wu: Yeah, part of that.

>>: I recall from the tables that .3 is something like 24 for the bike, right?

>> Xiaolin Wu: No, I think this is for the leaf; probably cropped, cropped.

>>: Okay. Okay.

>> Xiaolin Wu: So it's, you know, in the 32 dB range, right. So I wouldn't say that bit rate is too low; it's medium. At this medium bit rate some of the techniques, for instance JPEG 2000, don't do a good job, and we can have a significant improvement. So this is our side versus theirs -- this is a single description now, right, just as a combination with a codec. Yeah. This is also -- okay. And by the way, we know this is an interpolation kind of thing, right? So we can do edge enhancement after the restoration, to improve the visual quality. This is an edge-enhanced version of those pictures, for the two different methods. Okay.
You can see that for JPEG 2000, because of the presence of those artifacts, if you do edge enhancement things get ugly, but our technique is somewhat smoother, right? So after edge enhancement you can have a sharp image without such bad artifacts. So this is the comparison. Yeah, we also compared this technique against the DCT-based old JPEG, and at those bit rates you see even blocking artifacts. So this is JPEG 2000, or (inaudible), and this is the current technique, okay. So I'm done.

(Applause)

>> Jin Li: Thank you very much for the interesting presentation.

>> Xiaolin Wu: Thank you for your time.