>> Jin Li: Okay. It's a great pleasure to have Professor Xiaolin Wu from McMaster University visit Microsoft Research and give us a talk. I have known Professor Wu for a long time; he has been active in many areas. The earliest paper I saw of Professor Wu's is his (inaudible) paper on, basically, quantizing images into different color indices. More recently, Professor Wu's claim to fame is his excellent work on lossless image coding, the CALIC codec, which has long served as the benchmark for lossless image compression. Professor Wu has also been active in a number of international standards for image compression.
Today let's hear his work on standard-compliant multiple description image coding by spatial multiplexing and constrained least-squares restoration. Without further ado, let's hear what Professor Wu has to say.
>> Xiaolin Wu: Thanks, Jin. It is nice to see familiar faces in this small group. The topic of multiple description coding has been studied for the last decade or so, mostly for multimedia streaming. Here I emphasize standard compliance: as I will point out, up to now the existing techniques mostly represent quite a departure from current practice.
What I propose is something very much in touch with the reality of today's codecs and infrastructure, and I hope you will appreciate that this technique can be quite practical. Here is the outline of the talk. First I give a little introduction, and then I go to the technique called spatial multiplexing for multiple description coding, particularly for image coding; this can be generalized to video coding as well.
So the shorthand for this is SMMD, okay. After I give the architecture of this technique, this paradigm, I'll go into some detail about how to generate the side descriptions, how to conduct side decoding, and then, of course, the (inaudible) and central decoding. Okay. Then I'll present some experimental results and conclude.
Okay. So multiple description coding has been promoted as a methodology for multimedia streaming over lossy networks, particularly packet-switched networks, for which we have to deal with packet losses and network errors. Okay. And that is because in those applications, particularly real-time multimedia communications, lossless transmission is not possible, due to stringent delay requirements or bandwidth economy considerations. Okay. So we resort to best-effort delivery, right, like UDP, and multiple description coding can tolerate, you know, various degrees of loss. Okay. So in other words, we get quality of service that scales with network conditions: the reconstruction quality is proportional to the number of packets we receive.
Okay. I am not going to give a comprehensive review of this class of techniques; instead I'll just highlight some of the common methods of multiple description coding for multimedia. Probably the oldest and most (inaudible) studied multiple description coding techniques are based on multiple description quantization, so we have multiple description scalar quantizers and multiple description vector quantizers. Here we create multiple descriptions by quantizing samples, or sample vectors, into different representations.
This is a generalization of the single-description quantization approach to multiple descriptions, and then of course we have correlating transforms to produce correlated multiple descriptions. And then we have another class of techniques, also studied at great length by various researchers, that is based on unequal error protection of scalable code streams. In other words, we apply codes like (inaudible) codes and protect the scalable code stream's multiple layers with varying degrees of protection, so we can guarantee that the reconstruction quality is proportional to the number of packets received. Okay. And here, of course, we need explicit forward error correction codes plus a scalable source code stream.
Okay, and as we can see, none of those techniques is really standard compliant, right? They represent quite a departure from current practice. Okay. The question is: can we produce multiple descriptions for media while still working with current compression standards? That is the main motivation of this research.
Okay. So what I propose is a very simple, straightforward approach based on spatial multiplexing. The input source here is two dimensional; it could be three dimensional, or even one-dimensional audio, right? What you do is multiplex in the spatial domain, splitting the sample space into subsets of the sample grid. Here I deliberately split it into regular sub-grids, because then each subset is an image by itself, so each can be compressed by any existing standard. Okay.
Now, the question is, if this simple-minded scheme were to work, how efficient can it be, and how does it compare against the other MD techniques on the previous slide?
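To make the multiplexing step concrete, here is a minimal numpy sketch of the uniform spatial split just described; the function names are illustrative, not from the talk.

```python
# Minimal sketch of spatial multiplexing: a polyphase split of one image
# into four quarter-size sub-images, and the inverse interleave.
import numpy as np

def split_four_descriptions(img):
    """Each description keeps every other row and every other column."""
    return [img[0::2, 0::2],   # description 1
            img[0::2, 1::2],   # description 2
            img[1::2, 0::2],   # description 3
            img[1::2, 1::2]]   # description 4

def merge_four_descriptions(parts, shape):
    """Re-interleave the four sub-images onto the original grid."""
    img = np.zeros(shape, dtype=parts[0].dtype)
    img[0::2, 0::2], img[0::2, 1::2] = parts[0], parts[1]
    img[1::2, 0::2], img[1::2, 1::2] = parts[2], parts[3]
    return img
```

Each sub-image is an ordinary quarter-size image, so any standard codec can compress it unchanged; keeping only descriptions 1 and 4 gives the checkerboard (quincunx) pair used later in the talk.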
>>: What exactly do we mean by standard compliance?
>> Xiaolin Wu: Standard --
>>: Each description can be decoded correctly by the (inaudible) decoder.
>> Xiaolin Wu: Yeah. Okay. Let me show you the next slide; maybe this will answer your question. So this is the input image, right? I do some prefiltering, a preprocessing step, first; then I split the original image into multiple parts, right? Let's say two parts, like a checkerboard, right? And each is a smaller image, right?
So therefore I can use any third-party encoder and decoder. So this box can be a standard codec or any other existing technique.
>>: Well, without the (inaudible) multiplexer you're not going to get the full
resolution image back, okay? You get a subset of (inaudible) subsets.
>> Xiaolin Wu: Okay. So this multiplexer, you know, I don't think that can be --
>>: That's the decoder side. On the decoder side, without the green multiplexer of some sort, you will not get a full-resolution image back.
>> Xiaolin Wu: True. Yeah. But this is the channel, right? I mean the reason we have standards is because we want to have this, you know, the --
>>: There are all different levels of standards. And unless you -- I mean, even MPEG is not interoperable at some level, you know; it depends upon your packaging. You need the file system.
>>: Do they make the same argument with other approaches, or can you make them work so that (inaudible) subsets of those can be combined?
(Brief talking over)
>>: For example, Apple, you know, they use (inaudible) for the icons, but nobody else can -- I mean, it's proprietary as far as anybody else (inaudible), because it's not interoperable.
>>: Yes, for example --
>>: Right. So I think maybe we should just take it for what it is, you know, no more, no less. It's not really -- it's not interoperable at some level.
>> Xiaolin Wu: Right. Right. Yeah, here --
>>: The operation is (inaudible), the operations are minimal.
>> Xiaolin Wu: And also it's modularized, right? It's completely detached from the channel portion of this whole system.
>>: (Inaudible) because there's like hardware communications which make it
(inaudible).
>>: But I think in general I (inaudible) take it with a big grain of salt because it
doesn't really necessarily mean that it's interoperable with anything.
>> Xiaolin Wu: But in this architecture, though, this part, right, is totally independent of what you do with the encoder, the decoder and the channel, right? Whatever you get back at the decoder, you can use as-is; then of course it's not optimal performance, but you can do as I will explain, merging those multiple descriptions in case you receive more, and achieve the central decoding. And of course the performance can be scaled according to your complexity budget at the decoder side.
Okay. So, yeah, of course, here each of the descriptions can be coded by an existing standard. I don't have to do a special type of MDQ, multiple description quantization, right; I don't have to do a correlating transform; and I don't have to do error correction coding to generate extra packets.
Okay. So the (inaudible) again is generated by uniform spatial down-sampling, right. However, before this down-sampling we perform some adaptive directional low-pass prefiltering, for the following purposes. First, we use this prefiltering mechanism to build a bridge between the encoder and the decoder; in other words, I expect the decoder to collaborate with the encoder to perform MDC decoding. This filtering operation also builds some correlation between the (inaudible). And because it's a directional prefiltering, we preserve (inaudible) in the presence of edges, which is, by the way, very important for visual quality.
Okay. So this is the design principle and the technique. First we set up the design goal of this directional prefilter: we want to preserve the maximum 2D bandwidth without aliasing due to down-sampling. When we down-sample we reduce the cutoff frequency, and beyond it we introduce aliasing, right. So the question is how we can obtain the maximum pass-band region in two dimensions without aliasing, and that is what I mean by those diagrams. We can have directional filters, and those boxes represent the pass regions in the 2D spectral space, right; this is the frequency in each direction. So this is for the angle around zero, this is for the diagonal angle, and these are for tangent two and tangent one half. We have those six cases, plus some other symmetric cases.
Okay. And it's easy to show that there are only eight directional filters that have the maximum pass region. The maximum region you can pass without aliasing has area pi squared, and there are only eight directions that can achieve this maximum region, okay. It is difficult to give a closed-form representation for those cases, so I use this table.
>>: I guess I missed it. So you're going to prefilter before you down --
>> Xiaolin Wu: Down sample, yes.
>> Jin Li: You get a prefilter in the direction beta?
>> Xiaolin Wu: Yes. This is the prefilter design based on different angles.
>>: Okay. You're going to be like image adaptive somehow?
>> Xiaolin Wu: Yeah, yeah, yeah; that's why I say adaptive prefilter.
>>: To the content.
>> Xiaolin Wu: To the local (inaudible).
>>: And then you're going to filter again afterwards, at the decoding, the reconstruction side?
>> Xiaolin Wu: At the reconstruction side I'll do the (inaudible).
>>: (Inaudible) that means you are going to use four different angles?
>> Xiaolin Wu: Okay. So here, right? This is an example. I can produce four descriptions; I can also produce two descriptions, for instance. Let's say in each of those cases the sampling rate is reduced by half, right?
>>: Okay.
>> Xiaolin Wu: So the frequency is reduced by half. So let's say I reduce the sampling frequency by half; then the question is, what is the best prefilter that does not introduce any aliasing, right?
>>: Then you're going to filter along the edge or across the edge?
>> Xiaolin Wu: Along the edge, right. Okay.
>>: Low frequency like along the edge, right?
>> Xiaolin Wu: Yeah. So --
>>: Why do you --
>> Xiaolin Wu: Because you want to preserve the edge content; in that case you achieve maximum energy compaction, right?
>>: But there's already -- there's already low frequency along that direction,
right?
>> Xiaolin Wu: So that's why you want to preserve that. I mean, yeah, because across the edge, in the gradient direction, you are going to introduce aliasing anyway, right?
>>: Right. That's why I'm not quite understanding. You have aliasing on edges anyway, right? Even without a prefilter.
>> Xiaolin Wu: Yeah, so that's why --
>>: He's not even prefiltering in that direction; he's prefiltering along the low-pass direction.
>> Xiaolin Wu: Yeah.
>>: Zero frequency. So what is there to be gained by --
>> Xiaolin Wu: Because, you know, across the edge, in the maximum-gradient direction, there's aliasing. You don't want that, right, so you want to cut that off. So you prefilter --
>>: But you're not doing any filtering in that direction, right? Or you are?
>> Xiaolin Wu: I am.
>>: (Inaudible). Can you give some examples?
>> Xiaolin Wu: Yeah. For example in this case, this is the --
>>: The edge goes like that?
>> Xiaolin Wu: Right. So you see, I mean, the frequency across the edge, the pass band.
>>: This is the filter?
>> Xiaolin Wu: Yeah, the filter response. Yeah. Right. So therefore you allow a very high bandwidth, right, in the low-frequency direction, because you know you are not going to suffer from aliasing there.
>>: Okay. That's -- all right. So that's what I meant. Okay. So you're going to do a low-pass filtering to kind of smooth out the edges?
>> Xiaolin Wu: Yeah, yeah.
>>: Okay.
>> Xiaolin Wu: Right. And across the edges I smooth, right, because I know --
>>: Okay.
>> Xiaolin Wu: You know, this down-sampling scheme will introduce error, right. So, in other words, I want to preserve as much as possible, right? So I want to design the pass region without aliasing. Then the question is, what is the maximum possible region? It is pi squared, right? Because if you take the periodic tiles of the spectrum, you don't want those tiles to overlap, and then you can see this is the best you can do at those angles. Okay. So there are only eight angles: zero, that is, the horizontal and the vertical, tangent one half and tangent two with plus and minus signs, and the two diagonals.
So for those special angles you can achieve a maximum region of area pi squared without overlapping -- that is, without introducing aliasing.
>>: So the (inaudible) you're referring to?
>> Xiaolin Wu: Here?
>>: Yes.
>> Xiaolin Wu: That's the bandwidth: the bandwidth in the low-frequency direction and the bandwidth in the high, right. And this zeta is the parameter for the direction. Okay.
So for instance, let's use this entry. For the horizontal or vertical direction, right, this is two pi in width, so your bandwidth is pi, right. And in this case the bandwidth is much larger in the low-pass direction, so it is not symmetric: you have W_L equal to root 5 times pi and W_H equal to pi over root 5.
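As a math aside, here is my reading of the arithmetic on this slide; the half-width assignments are inferred from the talk and may differ from the slide's exact notation.

```latex
% Down-sampling by two in each dimension keeps 1/4 of the samples, so the
% alias-free pass region can have area at most
W_L \, W_H \;\le\; \frac{(2\pi)^2}{4} \;=\; \pi^2,
\qquad \text{e.g. for } \tan\theta = \tfrac{1}{2}:\quad
W_L = \sqrt{5}\,\pi, \quad W_H = \frac{\pi}{\sqrt{5}} .
```

The eight special directions are then theta = 0, pi/2, plus or minus pi/4, arctan(plus or minus 1/2) and arctan(plus or minus 2): the only angles whose pi-squared-area rectangles tile the spectrum without overlap.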
>>: So (inaudible) as long as the area of the prefilter pass region is pi squared, then we're guaranteed to have (inaudible).
>> Xiaolin Wu: That is, without aliasing, under that condition, the maximum region you can pass is pi squared, right? Now, the question is which directional filters can achieve that area. There are only eight cases.
>>: Eight (inaudible)
>> Xiaolin Wu: Right. The passable region has to be no larger than pi squared to avoid aliasing, right. So that's what I was saying: in this down-sampling scheme I want to achieve the maximum preservation of frequency content without aliasing, and there are only eight angles that allow me to achieve that. So that's why I designed those eight directional prefilters, right. Of course, in (inaudible) I have to quantize the edge angles into those eight cases. Right. And some angle in between is not achievable. Yes. That is true.
Okay. So, you know, of course this is the ideal filter, right. In practice you have to use real filters to approximate it; for instance we can use the sinc function to realize those directional prefilters, and you can apply a window function to reduce the ringing effect.
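As an illustration, here is a hedged numpy sketch of one such realization -- a rotated ideal low-pass (a product of sincs) tapered by a window. This is only my reading of "a sinc plus a window"; the exact filter in the work may differ.

```python
# Sketch of a directional low-pass kernel: ideal rectangular pass region
# in a frame rotated to the edge angle theta, tapered to reduce ringing.
# Bandwidths w_lo (along the edge) and w_hi (across it) are assumed to
# satisfy w_lo * w_hi = pi**2, per the design rule above.
import numpy as np

def directional_lowpass(theta, w_lo, w_hi, size=15, beta=4.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates: u runs along the edge, v across it.
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    # Ideal low-pass in each rotated axis -> product of sinc functions.
    h = (w_lo / np.pi) * np.sinc(w_lo * u / np.pi) \
      * (w_hi / np.pi) * np.sinc(w_hi * v / np.pi)
    # Kaiser-style radial taper to suppress truncation ringing.
    r = np.sqrt(x ** 2 + y ** 2) / (half + 1)
    h *= np.i0(beta * np.sqrt(np.clip(1 - r ** 2, 0, 1))) / np.i0(beta)
    return h / h.sum()  # unit DC gain
```

For theta = 0 this degenerates to an ordinary separable low-pass; the directional gain comes from letting w_lo be much wider than w_hi along an edge.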
Okay. Now, here is the preprocessing: prior to down-sampling you apply adaptive prefiltering, and this is signal adaptive, right; point by point we try to estimate the direction of the gradient and then apply the filter. Okay. After that we down-sample and then compress, right. At the decoder side we have this decompressed subsampled image -- I should say decompressed, prefiltered, subsampled image, right? And then our goal is to reconstruct the original from this down-sampled, compressed image, okay.
And so let's look at one side description. We get only one quarter of the samples, like this, and then this is the reconstruction problem. To achieve this, we apply a so-called windowed autoregressive model; in other words, we assume the image is piecewise stationary, so in a small window we can apply this autoregressive model, and each pixel can be written as a linear combination of its neighbors plus some (inaudible). Okay. The regression coefficients form a two-dimensional process as well, and we assume that it is piecewise stationary too; in other words, this signal alpha changes slowly. Under this assumption we can learn this model in a local window, right, where we try to estimate the model parameters alpha in an overlapped-window fashion. Okay.
So this is the technique, the algorithm. Let me introduce some symbols, okay. X here represents the pixels in the original image I, and Y the pixels in the down-sampled image. W is a local window, and I use this symbol, i diamond t, for the diagonal neighbors of a particular pixel x_i; i is the two-dimensional index of the pixel position, and the plus denotes the four axially connected neighbors of a particular pixel x_i. So X-cross represents the diagonal neighbors and X-plus the axial neighbors of a particular pixel. Okay. And the task is to estimate a block of high-resolution pixels, the original samples X in this local window, using this autoregressive model -- two models, okay, model A and model B. Okay. Or maybe I should use the diagram to show this. Okay.
So this is the pixel x_i. These are its four diagonal neighbors, from the eight-connected neighborhood, and that set is X-cross, right, with respect to i. So the offsets are 0, 1, 2, 3. And this is the other spatial configuration: the current sample in relation to its four axially connected neighbors -- north, south, east, west -- like that. Okay.
So I use two models, in two spatial configurations, to estimate the center pixel x_i. The reason I use two order-four models instead of a single order-eight model is that I want to avoid overfitting, because I have to estimate those parameters a_1, a_2, those regression coefficients, locally, right. If the order is too high, I may not have enough samples to do this properly.
Okay. So I split that into two. So I have two models to fit the pixel XI, okay.
>>: (Inaudible).
>> Xiaolin Wu: t indexes the neighbors, right.
>>: (Inaudible).
>> Xiaolin Wu: t is the neighbor index. So here: 0, 1, 2, 3, like that. And of course I have a diagonal model and an axial model, right, and I need to optimally blend them. So there are two weights to be determined, for the cross and the plus configurations, and I'll explain how those weights can be computed. Okay. So this is the overall objective function with which we try to estimate the original pixels X in a local window W, okay. And you can see this is a hard problem, right, because here I actually have relationships among the unknowns, right?
Okay. And then there are a few loose ends to tie up: first, how to determine those coefficients a and b, and also those weights.
Suppose we have them, right? Then, you know, we still have an ill-posed problem, because this cost function may allow a trivial solution: if I set all those X estimates to a flat signal, this will be minimized, right? That is, if I make the two-dimensional (inaudible) in this window flat. So therefore I need a constraint, right. So what do we know? What we know is this decoded image Y, okay. Y contains the pixels of the decoded down-sampled image, right? So therefore X, convolved with this prefilter and down-sampled, should agree with the decoded image; that's what I receive. Right. Then, of course, remember Y is compressed, so I still have reconstruction error on Y: epsilon of R, where R is the rate of this compression scheme, right.
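Collecting the pieces, my transcription of the formulation (the blending weights $w_\diamond, w_+$ are defined two slides later):

```latex
\min_{x}\;\sum_{i \in W}\Big[
  w_\diamond \Big( x_i - \sum_{t} a_t\, x_{i \diamond t} \Big)^{2}
  + w_+ \Big( x_i - \sum_{t} b_t\, x_{i + t} \Big)^{2}
\Big]
\quad \text{s.t.} \quad
\big\| (h * x)\!\downarrow \; - \; y \big\|_2^2 \;\le\; \epsilon(R),
```

where $h$ is the directional prefilter, $\downarrow$ the down-sampling, $y$ the decoded sub-image, and $\epsilon(R)$ the quantization distortion expected at rate $R$.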
>>: (Inaudible).
>> Xiaolin Wu: Quantization error. Yeah. So this is the -- yes, this is the compression distortion, right. Okay. So I want to have the best model fit subject to this constraint, and this constraint is given by the system, right? This is the prefilter I used at the encoder side, right, this is the received compressed image, and this is the quantization error estimate. Okay. I want --
>>: But where you're only -- this is for -- are you only receiving some --
>> Xiaolin Wu: I only receive one side description.
>>: One description.
>> Xiaolin Wu: One description, which is a down-sampled, compressed version of I, right, and the Y values alone are the only thing I can observe. This is the received image. But for this Y, right, I know that the original image, convolved with that directional filter and down-sampled, minus Y, should be the --
>>: The convolution includes the down sampling.
>> Xiaolin Wu: Yes, prefiltering then down-sampling, yeah, the cascade of those two operations. And I know, right, this difference should be the quantization loss. Okay. So I try to get the best model subject to this constraint; this constraint is due to the pipeline of the filtering, down-sampling and compression, right.
Okay. So now we have to solve those problems: how to estimate a and b, and how to determine those weights. Okay. So this is the way we compute the coefficients, the model parameters, for the diagonal model and for the axial model. Again we just solve this (inaudible) least-squares fit to estimate the model parameters, right.
So we take a small window and fit, and here of course we can only use observables: those are the decompressed pixels, right.
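A hedged sketch of that local least-squares fit on the decoded low-resolution image; the helper names, offsets and window size are mine, for illustration only.

```python
# Fit the diagonal (model-A) coefficients by least squares over a small
# window of the decoded low-resolution image y. The working assumption,
# as in the talk, is that second-order statistics survive down-sampling.
import numpy as np

DIAG = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # diagonal neighbor offsets
AXIAL = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # axial neighbor offsets

def fit_ar_coeffs(y, ci, cj, offsets=DIAG, half=3):
    rows, rhs = [], []
    for i in range(ci - half, ci + half + 1):   # (2*half+1)^2 equations
        for j in range(cj - half, cj + half + 1):
            rows.append([y[i + di, j + dj] for di, dj in offsets])
            rhs.append(y[i, j])
    A, b = np.asarray(rows), np.asarray(rhs)
    coeffs, res, *_ = np.linalg.lstsq(A, b, rcond=None)
    # The residual sum of squares doubles as the model-fit error that
    # later weights the two models against each other.
    err = float(res[0]) if res.size else float(np.sum((A @ coeffs - b) ** 2))
    return coeffs, err
```

The same call with offsets=AXIAL gives the b coefficients, and the two residuals feed the weight computation shown a couple of slides later.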
>>: In the decompressed (inaudible) you don't have that (inaudible); you have the y_i, the cross and the plus.
>> Xiaolin Wu: Yes. I mean, after down-sampling I still have a square lattice, right? I still have the cross and the axial correlations to exploit. But you've got a point, yes: in this case the scale has changed, right, because it's down-sampled, so in terms of distances things are doubled in the spatial domain.
So here, of course, we have to make the assumption that the (inaudible) matrix does not change under down-sampling, which is a big assumption, right?
But for large-scale edges this may be a reasonable assumption, right? If the edge or structure has a large scale -- for instance a straight line -- then the directional correlation does not change. That's why, as you will see later, this technique has some merit: it reconstructs edges very nicely. That is because those model parameters can be satisfactorily estimated from the down-sampled signal Y. So this is how we build the model, right?
>>: (Inaudible) minimize the joint.
>> Xiaolin Wu: So here, right?
>>: The objective function (inaudible); the parameters you just minimized here are not the same, right, so there's a step to go (inaudible).
>> Xiaolin Wu: Right. So this is the objective function for the entire task, right? And here we have to know the model parameters a and b. The next slide just shows how to estimate a and b.
>>: Actually (inaudible) minimized in.
>> Xiaolin Wu: Okay. Yeah, that's an excellent point. If I said I wanted to minimize this objective function by treating X, a and b, all of them, as optimization variables, I could do that, right? But then it becomes a (inaudible) problem which is much harder. Right. So what I adopt is a separation approach: we first estimate a and b, then plug the estimates a-hat and b-hat into this objective function and try to solve this problem; the problem then becomes a linear least-squares problem, which has an efficient solution, right? But (inaudible) -- yes, we have also tried a nonlinear approach, in other words, where the minimization is not only with respect to X, right, but we also search over a and b. But --
>>: (Inaudible) next slide.
>> Xiaolin Wu: Okay.
>>: So you've got -- you're training the a's on your (inaudible), but those are down-sampled, right?
>> Xiaolin Wu: Right. So (inaudible) asked the same question, yes.
>>: There's a mismatch, but it's not minimizing over A and B in the other one. You're taking an approximation, assuming they are known, and minimizing over X.
>> Xiaolin Wu: Right, right. So I mean ideally, yes, this A and B should be estimated on X, right? From (inaudible). But that becomes a nonlinear problem, right, so all we do is do it in two steps.
>>: Okay. But it's not even on the same resolution, right?
>> Xiaolin Wu: Right.
>>: But A is supposed to be a predictor for high resolution.
>> Xiaolin Wu: For high resolution, yes. But in this formula a and b are estimated in the low resolution. Yeah. But, you know, computationally you can -- but then this is a chicken-and-egg situation, right? Without X, how can I estimate a and b? And without a and b, how can I know X, right? Because X is (inaudible) on the Y.
>>: Sure. Well, you could do -- I mean another approach might be you train on
higher resolution images and you send the coefficients over to the decoder and
you reconstruct maybe with a fixed filter.
>> Xiaolin Wu: Right, but that -- I think that is highly signal dependent, right. So you -- I mean it depends on how much (inaudible) you want to send, right?
>>: This not (inaudible).
>> Xiaolin Wu: Yes. So this is point by point estimate.
>>: (Inaudible).
>> Xiaolin Wu: Too much side information. Of course, another option is to use a training set at design time, and then you don't send anything. But okay, anyway, those are the, you know --
>>: You have a signal X and you down-sample that by 4; how does that (inaudible) compare to the one before you downsize?
>>: Yeah, exactly.
>>: I take it that it's probably not too far. The signal (inaudible).
>> Xiaolin Wu: Yeah, yeah, yeah.
>>: Like if I get closer to you, maybe the statistics don't change all that much.
>> Xiaolin Wu: Yeah. This is, you know, the basic assumption we have to make, right? I mean, if the signal structures are large scale, then this is okay, right? As you said. I mean, the covariance matrix stays put.
>>: Okay.
>> Xiaolin Wu: I mean those things don't change; the second-order statistics don't change that much, right, if the original sampling rate is high enough.
>>: (Inaudible). The low frequency (inaudible).
>> Xiaolin Wu: Yeah. And so this is how we determine a and b, and then this becomes a linear least-squares problem, right? And then of course we have to compute those weights. Those weights can be -- by the way, I have to say something about this important term, this constraint, right? Because the original estimation problem, as you see, is ill posed, and many, many solutions can satisfy it, so the constraint plays an important role. The constraint here I already explained, right? This is the cascade of prefiltering and down-sampling, this is the original (inaudible) for the estimate, and this is the compressed version we receive, right? So this is the system constraint, okay?
Okay. So here we have to know this prefilter kernel, because the prefilter is adaptive, right, changing from point to point. But because we use a directional filter, we don't need to send side information: the decoder will estimate the direction at the same spot and then guess the prefilter that the encoder used. It turns out this estimate is quite good, right, with no side information.
Okay. So the weights of those two models are determined by just, you know, the classic least-squares weighting, right. Those error terms, e-cross and e-plus, are the fitting errors of the two objective functions that we use to estimate a and b, right? So they represent how well the two models fit in a local neighborhood, right; assuming those errors are independent, we then have these optimal weights.
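Concretely, this is the usual inverse-error blending; in my notation:

```latex
w_\diamond = \frac{e_+}{e_\diamond + e_+}, \qquad
w_+ = \frac{e_\diamond}{e_\diamond + e_+},
```

so the model that fits the local window better (smaller residual) receives the larger weight.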
Okay. So now we can put all those pieces together and state the constrained least-squares problem: once we have the weights, the model parameters and the constraint, right, this Lagrangian form of the constrained optimization problem can be solved like this. Okay. Another interesting interpretation of this scheme is that you can consider it adaptive (inaudible) predictive decoding, right? Because in that objective function, you know, we try to predict each unknown pixel from its neighbors -- this is a (inaudible), right -- and the predictor is trained point by point, so it's adaptive, right. I believe this is the reason we get very good performance, as we will see later. Okay.
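For concreteness, a generic numerical sketch of such a constrained least-squares solve via its Lagrangian; this is standard machinery under my reading of the slide, not necessarily the authors' exact solver.

```python
# Solve  min ||P x||^2  subject to  ||D x - y||^2 <= eps  through the
# Lagrangian normal equations  (P'P + lam D'D) x = lam D'y,  sweeping the
# multiplier lam until the constraint holds with near-equality.
# P stacks the weighted AR prediction-residual rows for the window;
# D is the prefilter-plus-down-sampling operator restricted to it.
import numpy as np

def constrained_ls(P, D, y, eps, lam_lo=1e-4, lam_hi=1e6, iters=50):
    def solve(lam):
        return np.linalg.solve(P.T @ P + lam * (D.T @ D), lam * (D.T @ y))
    for _ in range(iters):                    # geometric bisection on lam
        lam = np.sqrt(lam_lo * lam_hi)
        x = solve(lam)
        if np.sum((D @ x - y) ** 2) > eps:
            lam_lo = lam                      # fit too loose: raise lam
        else:
            lam_hi = lam
    return solve(lam_hi)
```

A larger multiplier pulls the solution toward the measurement constraint; a smaller one trusts the adaptive predictor more.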
So you can consider those autoregressive models as (inaudible) predictors, adaptive ones. Okay. And so this is the side decoder. Now, if we get more than one description, then we have to perform the so-called central decoding. Take two descriptions as an example; again, the encoder and the decoder can be any third-party scheme, right, so you can consider the MDC decoder independent of each side description scheme, okay.
What we do is demultiplexing. In other words, we can simply form a mosaic, right, combining those samples into one sampling grid, okay.
And then of course, because now we have much more information to work with, we can probably do much better than just forming a mosaic, right? So now we'll see how the central decoder can exploit all the pieces of information available to it.
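The baseline merge is just the inverse of the multiplexing; a minimal sketch for the two-description checkerboard case (names illustrative), before any joint restoration:

```python
# Naive central decoding for the quincunx (checkerboard) pair: interleave
# the two decoded quarter-size images onto their lattice sites, leaving
# the missing half to be restored by the same constrained least-squares
# machinery, now with twice as many constraints.
import numpy as np

def mosaic_two_descriptions(d0, d1, shape):
    img = np.full(shape, np.nan)
    img[0::2, 0::2] = d0          # description 1: the "black" squares
    img[1::2, 1::2] = d1          # description 2: the "shaded" squares
    return img                    # NaNs mark the pixels still to estimate
```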
>>: (Inaudible).
>> Xiaolin Wu: Yes?
>>: I mean, we can see this framework is a (inaudible) restoration framework. I wonder if you can consider other possible frameworks for the solution. (Inaudible) some of the models I've seen in the past, the model (inaudible). Then the descriptions you receive are constraints, which basically (inaudible) descriptions; you just try to solve this combined optimization problem.
>> Xiaolin Wu: Using conditional (inaudible) even, you know -- I mean you do this maximum likelihood estimate using --
>>: Basically you try to estimate the lowest-energy point for the constraint, and then basically look back into the constraint to see, according to the descriptions you received, if it's outside, and just truncate it back onto the --
>> Xiaolin Wu: To the -- yeah. To the boundary, and then.
>>: Boundary and then (inaudible).
>> Xiaolin Wu: Yes.
>>: Using basically (inaudible) what do you think about that?
>> Xiaolin Wu: We haven't tried that, but I think that's an excellent suggestion. So for instance here we use just a classic least-squares approach: we assume an autoregressive process and we estimate the parameters. I mean, so here we take an adaptive prediction approach. But, you know, this kind of relationship could be a Markov field, right? So yes, in this framework you could have a different reconstruction approach, using the decoded image as a constraint, and instead of minimizing squared error you do (inaudible), right; it becomes a different estimation technique, right.
But nevertheless, you know, this framework will accommodate that, right? I mean, the only thing here, of course, is how to use this. This is the only hard information, the only fact we have: we know the original source went through this kind of degradation, right? It's mapped to Y. That's the only thing we have.
>>: How big is W?
>> Xiaolin Wu: How big is W? 5 by 5, 7 by 7.
>>: So I'm a little confused. Are you tiling the image into these regions?
>> Xiaolin Wu: No, it's overlap.
>>: Okay. So if they're overlapped, then you're just determining what, the center
pixel?
>> Xiaolin Wu: Yes, indeed.
>>: But the minimization doesn't really.
>> Xiaolin Wu: Minimize on a single point, right?
>>: Yeah.
>> Xiaolin Wu: Right. So minimization is performed on a block.
>>: Okay. So you find the X in that block and then what, just.
>> Xiaolin Wu: Fix the point X. Then move on.
>>: Okay.
>> Xiaolin Wu: Right. And then of course -- that's a good point -- if you do this, then every single pixel gets multiple estimates, right?
>>: Right.
>> Xiaolin Wu: It could be a center, it could be a corner, right? But then you can (inaudible); in fact you can. If you do this overlapping business, eventually you have multiple estimates for every pixel. Right. So then -- so that's why, you know, as Jin said, you could model this as a Markov field. Then you can consider this interplay using some kind of maximum likelihood estimate.
>>: So what would happen if you just had your -- just estimated a single pixel?
>> Xiaolin Wu: Oh, then the (inaudible).
>>: I mean based -- the objective would be the single --
>> Xiaolin Wu: Pixel. And then --
>>: But of course -- I mean you're still using your neighborhood and all that stuff.
>> Xiaolin Wu: Yeah. For instance if you want.
>>: The constraints are still in the neighborhood.
>> Xiaolin Wu: The constraint is still the neighborhood, and then once you have those parameters a and b estimated, it becomes simply just -- simple interpolation, right?
>>: (Inaudible) one pixel (inaudible) minus one then is (inaudible) one pixel at a
time (inaudible).
>> Xiaolin Wu: Yeah. But then of course we don't have to fix just a single point; we could fix the center part, right, the interior. So this is just the implementation; the most expensive version is where I only fix the single point in the middle. Because, you see, here we make a lot of assumptions: we treat this thing as piecewise stationary. I mean, what if this point is right on the border of a (inaudible), an edge? Then this window may not work.
>>: Yeah. So I see how this is leading back toward Jin's Markov (inaudible).
>> Xiaolin Wu: Right.
>>: I guess I was assuming -- supposing you know all the other X's around; then you could reformulate this as, you know, find the X in the middle -- the point in the middle -- that minimizes some prediction error subject to these constraints.
>> Xiaolin Wu: Yes.
>>: But you don't know all those other.
>> Xiaolin Wu: Right. I mean all those X's are unknown as well.
>>: Yes. So you have to do it -- instead of doing it all jointly somehow, which would be done in, say, an iterative mode or --
>> Xiaolin Wu: Using message passing, that kind of trick to do it, right?
>>: Yeah. You're just doing it this way.
>> Xiaolin Wu: Right. I mean, this way is almost equivalent, right? You estimate a block instead of a single point and then you move on, so you take the context of a large neighborhood.
Okay. So then of course, if you have multiple descriptions and, you know, other spatial configurations, the same problem can be posed, right, and solved.
Okay. So those are the details, I suppose. Okay. And, yeah, I don't think we need to go into the details, but one thing we can explain is why the central decoder will be more powerful: it's because now I have more constraints, right? If I have two descriptions, I have this checkerboard, right: I have the black spots and the white spots, and for every one of them, for every basis point, I can (inaudible) a constraint.
And also the model parameters will be better estimated: I have more samples, right, if I have more descriptions. And also, remember, with those (inaudible) information about the filter at the encoder: if I have more samples, the direction estimate will be better, so I can get a better estimate of the kernel. So all in all I can improve the quality of all the input data for that optimization problem, and I get a better solution at the central decoder.
Okay. So I will show some experiments. We compare the scheme against some recent MDC techniques. One is by (inaudible), who used to be here, right? And he's now --
>>: (Inaudible).
>> Xiaolin Wu: He's now at (inaudible). In fact I'll visit him tomorrow; I will drive to Vancouver tomorrow, right. So they published a paper in this year's DCC; they use filter banks for (inaudible). And then there's another recent paper on MDC coding. Those two are (inaudible) the best-performing methods.
Okay. So here is an illustration. Okay. Let me explain. This is (inaudible), this is Bike, okay, and those three solid lines correspond to the three competing methods, okay: the one with triangles -- ours is with circles -- is the technique of (inaudible), reference one, and the (inaudible) is another (inaudible) technique from reference two. Okay. And the axis is the bit rate per side description: the number of bits per sample in each side description. Okay.
>>: Number of bits per sample in the side description, or do you convert it back to the whole image?
>> Xiaolin Wu: No. No. So in other words, here the side description is coded by JPEG 2000 at .88 --
>>: (Inaudible).
>> Xiaolin Wu: Yeah. And at the smaller resolution. Right. Okay. Yeah. So this is a more complex image, Bike.
>>: And what is the difference between the dotted line and the solid line?
>> Xiaolin Wu: Okay. The dotted line is the side, right, and this is the central. So there are two descriptions. So you can see there's a gap of up to 3 dB.
>>: (Inaudible)
>> Xiaolin Wu: Yes. We haven't implemented that; we've only implemented the two descriptions, and the two descriptions are shifted by half a pixel -- half a pixel, the optimal shift. So it's a checkerboard. Right. Yeah, you can see here roughly 3 dB from the side to the central.
And the interesting thing is, you know, our side and our central curves are both above the competing methods, right? Okay. But I didn't tell the whole story: when the rate gets higher, those curves cross. So our scheme has excellent performance at low to medium bit rates. I mean, that is understandable, because we have done down-sampling, right, and we basically disregarded the high-frequency signal components. If the rate gets much higher, then this down-sampling scheme is suboptimal. So here is another example, the images Flower and Fruits. Okay. So here is the visual comparison: this is method one, method two and our method; the top row is the side and the bottom row is the central. So maybe just kind of reduce the --
>>: One should be (inaudible).
>> Xiaolin Wu: So this is at the bit rate of a quarter bit per pixel. Here, right. Okay. It's a wrap-around; there is a delay. Okay. So I think the visual quality improvement is even more significant than the PSNR numbers suggest. So for instance if you (inaudible) at the side, right, this is quite significant -- not as much in the central. Okay. So this is a dark image. So this is a competing method's side description, you know; here you can see -- yeah, good, thank you. All right. So this is the proposed method, SMMD, at 0.4 bits per pixel. This is the MDC technique of reference one at the same bit rate. This is the side; this is the central, with two descriptions.
>>: How do these compare to the single description?
>> Xiaolin Wu: Okay. I'll come to that point. The single description, say (inaudible), right? Yeah.
>>: (Inaudible) the other question is that here I see two technologies: one is the (inaudible) prefiltering, the other is the (inaudible) estimation. Can we see results for each of the portions? Let's say we just apply prefiltering.
>> Xiaolin Wu: Oh, non-adaptive prefiltering, right?
>>: Let's say we apply adaptive prefiltering but use a very simple post-process, just bilinear or (inaudible), at the end. And the other case is, basically, let's say, don't do the prefiltering. (Inaudible) for example the SMMD approach is more appealing than the other approach. I want to see where the gain comes from -- I mean, which part of the gain is provided by the prefilter, and which part of the gain is provided by (inaudible).
>> Xiaolin Wu: So without prefiltering, right, you lose something from 0.3 dB to 0.6 dB, in that range. I just had a discussion with my student this morning over the phone; I mean, I'm talking about the ongoing project.
>>: So prefiltering provides more gains for the --
>> Xiaolin Wu: No, I think the restoration is more important. Let's say you replace this adaptive restoration, as you said, by bicubic, right; then the deterioration would be more than one dB, right, if it's --
>>: I mean this restoration thing is like -- I mean, they've been using this sort of thing for super-resolution.
>> Xiaolin Wu: Yeah, yeah. That's how we, you know, found this application. I mean, we started with super-resolution.
>>: I mean like Allen Gershow (phonetic) would put in just say (inaudible).
>> Xiaolin Wu: I didn't know Allen Gershow did this.
>>: I think the way you've done it is probably unique, you know, but this sort of approach -- where you predict using a conditional expectation, you predict the pixel up there given the (inaudible), based on the conditional expectation model somehow, and you've modeled it in a particular way so it's adaptive sort of across the whole image -- is kind of a common technique. So it seems like what's more unusual, maybe, to me is the prefiltering part. I don't know which contributes most to the scheme, but --
>> Xiaolin Wu: I think the prefiltering contributes, as you say, the 0.5 dB; it's not that much. In fact, you can just do non-adaptive prefiltering. Okay. Prefiltering, yeah, yeah. Okay. Remember we are talking about MDC, right? So there's a trade-off between the side and the central, right? So in this case, you know, our side description has fairly good quality compared with some other schemes. Right? That is because the two sides have a fairly high correlation, right? So when you get the second description you don't get too much new information, so the central distortion does not drop that much, right? Particularly at high rates.
So the prefiltering actually can (inaudible) how much correlation you introduce between the two descriptions, right, because they are interlocked, right. So if you use, you know, a larger filter kernel, then you correlate the two better. If you have, you know, impulse sampling, then the two are more or less independent, so you favor the central, right? So that is another consideration -- that's unique to MDC. Right.
So, you know, in the prefiltering we also embed some scheme for the collaboration between the (inaudible) and the decoder, as I said earlier, because those are the constraints we use. For super-resolution, though, those things are (inaudible), right? There the point spread function is given; the point spread function is what you work with. I mean, here you have the freedom to choose, to design it.
But then I agree, I mean, the framework is the same. I mean, the restoration problem has been studied for a long time, yes -- for instance (inaudible) super-resolution, right. Interpolation, yes.
But I haven't seen this approach used in compression, right. So, yeah, and later -- yeah, let's go there. I mean, so here you can see SMMD has much better visual quality than the others, even though the (inaudible) number is actually a little bit lower in this case, right? Okay. So this is due to the particular model we use, right: we use the directional prefiltering and then we use this autoregression model, which has, you know -- remember, we have to estimate the model parameters from the low-resolution image. That estimate is good if you have, you know, large edges or large-scale structures, right? In that case the signal statistics don't change with scale, and if that's the case we do a good job.
And also, I mean, in this direction of course, you know, you have high frequency, but because we do prefiltering like that, our model will learn this directional correlation well. So we reconstruct those edges much better than those other techniques do. So this is an advantage we want to emphasize.
Okay. And this is from the side to the central, so you see the improvement in visual detail. All right. So now, if you get additional packets, then -- so you can consider this as a scalable code as well. Right. You know, a low-(inaudible) scalable scheme.
Okay. So, a remark on rate-distortion performance. Now the question is, if you do this down-sampling business, do you necessarily suffer in terms of rate-distortion performance, right? So here is my argument: just by (inaudible) principle, we know that if this inequality holds -- in other words, if whatever I discard, this discarded energy beyond my cutoff frequency, is less than the distortion at the given rate, the distortion-rate function of the source -- then I'm okay. I can, you know, safely discard it, right?
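In inequality form, my paraphrase of the argument:

```latex
E_{\text{cut}} \;\le\; D(R)
\quad\Longrightarrow\quad
\text{discarding the high band costs nothing at rate } R,
```

where $E_{\text{cut}}$ is the energy removed by the prefilter cutoff and $D(R)$ is the distortion-rate function of the source. $D(R)$ is large at low rates, so the condition holds there and fails as $R$ grows, matching the crossover seen in the curves.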
So, you know, for (inaudible) signals, you know, the (inaudible), I suppose, you know --
>>: (Inaudible) so this approach is more okay at low bit rates; (inaudible) it's safer at a low bit rate than --
>> Xiaolin Wu: Yes, that's the argument, yes. Indeed, all the experimental results show we have very nice performance at low rates.
>>: So how about higher rates?
>> Xiaolin Wu: At higher rates we start to lose, right. So this is the single description, right? Phil just asked this question. So in its own right this scheme is a superior low-rate code. In fact, it has a higher (inaudible), you know -- a very good low-rate code, right? And the way we do it is (inaudible) JPEG 2000 with this down-sampling at the encoder side, right, and then you treat it as an inverse problem and solve it using some prior knowledge. And so, yeah, this is a table with some experimental results. So this is just a single side description, which is, you know, basically a combination codec, if you don't get any additional descriptions. Right. And here are the rates, for (inaudible), Leaves, Flower and Bike -- I mean, the images I've shown you, right. So this is Bike.
>>: (Inaudible).
>> Xiaolin Wu: Even with one description it's better.
>>: One description I can understand.
>> Xiaolin Wu: For two descriptions at the total rate, of course, it's worse, right. Yeah, yeah.
>>: Okay, let's say you work at (inaudible). At that point 3 (inaudible), does that mean you're asking JPEG 2000 to work at 0.075 (inaudible)?
>> Xiaolin Wu: I think this is with respect to the (inaudible) size.
>>: Okay.
>> Xiaolin Wu: So in other words, for our side, the bit rate for the single -- the bit rate is 0.6.
>>: 1.2 (inaudible).
>> Xiaolin Wu: This rate is already converted to the original size.
>>: Okay.
>> Xiaolin Wu: Right. Original size.
>>: (Inaudible) one description, right.
>> Xiaolin Wu: Yes, one description. Okay. So for the side I only have one quarter of the samples; so relative to the original it is, yes, 1.2. Yes, 1.2, yes.
>>: One quarter. So now you have four descriptions?
>> Xiaolin Wu: No. This is JPEG 2000 at the original resolution, right? It is 0.3 bits per pixel. The side is only one quarter of the original size.
>>: A quarter of the original.
>> Xiaolin Wu: Size in terms of the number of samples, you know. I drop every other row and column, right?
>>: Okay. I thought you -- but now you have four descriptions.
>> Xiaolin Wu: I only have two descriptions.
>>: But if you're dropping -- if you only have one quarter of the samples.
>> Xiaolin Wu: I can produce four descriptions, right, but if I get only one --
>>: I know you have only one here. But you're producing four descriptions. In the previous result you had --
>> Xiaolin Wu: Two descriptions.
>>: And now you have four.
>> Xiaolin Wu: Okay. So here. Let's go back to the --
>>: But here at most you use just --
>> Xiaolin Wu: I can have four.
>>: What were you using -- in the previous result you were using a checkerboard, and here you have four descriptions.
>> Xiaolin Wu: Yes. So here I can split that into four.
>>: Right.
>> Xiaolin Wu: But in all the results I presented, I only reconstruct from two.
>>: At most.
>> Xiaolin Wu: Just the black and the shaded, right? So I only reconstruct --
>>: Description one and two.
>> Xiaolin Wu: One and two, yes. One and two.
>>: One essentially, two --
>>: Okay. So one, let's see.
(Brief talking over)
>>: Where you're comparing reference one and reference two.
>> Xiaolin Wu: They have two descriptions.
>>: Meaning the description one and description two?
>> Xiaolin Wu: Description one and description two only. Reference one and reference two are different techniques, right. So we only compare at the same rate.
Right. So we could -- we only implemented our central decoder for two
descriptions.
>>: Okay. So I was pretty confused. I thought a central.
>> Xiaolin Wu: Decoder.
>>: Decoder would take all the description.
>> Xiaolin Wu: Right. But I mean here we just say this scheme can split into four, but --
>>: I misunderstood. I thought you were doing a checkerboard. You said you did a checkerboard.
>> Xiaolin Wu: This is a checkerboard: this is shaded and this is black.
>>: (Inaudible).
>> Xiaolin Wu: Okay. I see. That's where the confusion is. So in other words, the two descriptions form a (inaudible), right? They form half of a checkerboard. That's why. Right. But each description is, you know, one quarter --
>> Xiaolin Wu: I'm almost done. Right. So in other words, now you see each side is only one quarter of the original size, right? So this is just one quarter, so if (inaudible) spends 0.3 bits per pixel, for this side we spend 1.2 bits per pixel, right. I mean, that's a very good quality. And then of course, you know, when we build it up we have very good side information to work with.
>>: (Inaudible).
>> Xiaolin Wu: Yeah. All the loss is due to the cutoff by half. I mean, that is quite significant. So yes, indeed, you can see when the rate gets up here we actually lose, right. Because then the bits are not well spent, right: those bits should go to the high-frequency coefficients, and our scheme does not allow that. So that's an inherent drawback. But at low rates you can see we gain, right, and for various images, as you can see. Okay.
Okay. So here you can see the visual comparison. This is (inaudible), this is bicubic, right, and this is our side, at the same rate, 0.2. This is very low, right, and for this image, of course, even at 0.2, JPEG 2000 doesn't do a good job for things like (inaudible), right; you see some ringing artifacts around the diagonal edges. That is understandable, because, you know, the transform it uses is not adaptive. This is our reconstruction. This is bicubic.
>>: Is that in the JPEG 2000 (inaudible) location? I mean, it could (inaudible).
>> Xiaolin Wu: I think it's due to the transform. So, for instance, people have now found that with curvelets, right, or wavelets with directional lifting, those drawbacks can be somewhat corrected.
So I mean, here the improvement in visual quality is even more prominent, right. And here. Of course we deliver those structures, while in JPEG 2000 or bicubic they are lost in (inaudible), right? So, you know, there are severe artifacts at this point, at this low rate. Now the rate gets higher.
>>: (Inaudible) .35 -- is that just the side or for the total image?
>> Xiaolin Wu: The total image. Yeah. Yeah. For this (inaudible) the side only gets this.
>>: This is (inaudible) right? I mean basically (inaudible).
>> Xiaolin Wu: Yeah part of that.
>>: I recall from the (inaudible) tables, .3 is something like 24 for the bike, right?
>> Xiaolin Wu: This is for the -- no, I think this is for the Leaf. Probably cropped, or cut off.
>>: Okay. Okay.
>> Xiaolin Wu: So it's, you know, around 32 dB, right. So, yeah, I wouldn't say that bit rate is too low, right? It's medium. I mean, you are at a medium bit rate, and some of the techniques, for instance JPEG 2000, don't do a good job, and we can have a significant improvement. So this is our side versus theirs -- this is a single description now, right, just as a combination codec. Yeah. This is also. Okay.
So this is -- yeah. And by the way, we know this is an interpolation scheme, right? And we can do edge enhancement after the restoration, right, to improve the visual quality; this is the edge-enhanced version of those pictures, for two different methods. Okay. You can see that for JPEG 2000, because of the presence of those artifacts, if you do edge enhancement things get ugly, but with our technique, you know, it's somewhat smoother, right? So then the result, after edge enhancement, is a sharp image without such bad artifacts. So this is the comparison. Yeah, we also compared this technique against the DCT-based old JPEG, and at those bit rates you see even blocking artifacts. So this is JPEG 2000 or (inaudible), and this is the current technique, okay. So I'm done.
(Applause)
>> Jin Li: Thank you very much for the interesting presentation.
>> Xiaolin Wu: Thank you for your time.