March 2, 2004
ISO/IEC JTC1/SC29/WG11
MPEG04/ M10569/S15
March 2004, Munich
by
Yongjun Wu, Abhijeet Golwelkar, and John W. Woods
1. We already provided this part on the DVD-R disk.
2. A description of the number of bits which were used by the decoder to produce each submitted test sequence.
NOTE: The tables below list the number of bytes used; multiply each entry by 8 to obtain the number of bits.
Kbit/s   CITY        CREW        HARBOUR     ICE
64       79,483      79,468      79,476      63,634
128      159,462     159,465     159,469     127,625
192      239,465     239,464     239,473     191,621
384      479,793     479,792     479,791     383,858
750      937,290     937,291     937,286     749,855
1500     1,846,154   1,845,227   1,845,612   1,477,166
3000     3,746,828   3,744,887   3,745,860   2,988,125
6000     7,497,206   7,495,185   7,495,655   5,997,698
Table 1: The number of bytes used by the decoder in the required cases for SD sequences
Kbit/s   MOBILE      FOOTBALL
64       79,404      68,358
128      159,742     138,190
256      319,717     276,870
512      639,717     554,186
1024     1,280,051   1,109,358
Table 2a: The number of bytes used by the decoder in the required cases for CIF sequences
Kbit/s   FOREMAN     BUS
48       58,861      28,973
64       79,459      39,702
128      159,449     79,704
256      319,441     159,689
512      640,024     320,012
Table 2b: The number of bytes used by the decoder in the required cases for CIF sequences
3. A technical description sufficient for conceptual understanding and generation of equivalent performance results by experts and for conveying the degree of optimization required to duplicate the performance. This description should include all data processing paths and individual data processing components used to generate the bitstreams. It need not include bitstream format or implementation details.
For the regeneration of equivalent performance results by experts, we can provide the executable encoder, extractor and decoder in binary format, which run on Windows XP. We can also provide the parameter files for each sequence. Each item in the parameter file is explained in our manual for MC-EZBC. The manual and parameter files accompany this document.
Please see the website ftp.cipr.rpi.edu/personal/wuy2/mpeg-competition/regeneration/ to get the parameter files, executable encoder, extractor and decoder, and the manual for each sequence. In the same directory, there is the sub-directory /highest-bitstream. We provided the highest bit-streams, corresponding command files and decoder for each sequence there. You can verify your results with them or decode the bit-streams to verify our results.
On the website ftp.cipr.rpi.edu/personal/wuy2/mpeg-competition/, there are also the anchors and our results for the MPEG competition. Please feel free to take a look at them. Note that the anchors are two or three frames shorter than actually required by the MPEG committee, for reasons unknown to us.
For the conceptual understanding of our coder, please refer to our Annex B.
4. Participants should state the programming language in which the software is written, e.g. C/C++ and platforms on which the binaries were compiled.
Our MC-EZBC is written in C/C++ and built with VC++.
5. Description of how their technology behaves in terms of random access to any frame within the sequence. A description of the GOP structure and the maximum number of frames that must be decoded to access any frame, and how this would vary according to spatial, temporal and quality levels, is recommended.
In general, our encoded bit-stream is accessed GOP by GOP if decoded frames are wanted at full resolution and full frame rate. In the temporal scalability case, however, the frames in the lowest temporal level can be accessed directly; for other frames, we must decode the bit-stream up to the corresponding temporal level. For example, to obtain the frames in temporal level 3 of a 16-frame GOP, we must decode 2 frames.
We chose the best GOP size for each sequence. In addition, the adaptive GOP function splits the GOP automatically when rapid motion or a scene change occurs.
Current GOP size is as follows:
Sequence GOP size
Foreman 16 frames
Bus 16 frames
Mobile 32 frames
Football 16 frames
City 64 frames
Harbour 64 frames
Ice 64 frames
Crew 64 frames
6. Expected Encoding and Decoding delay characteristics of their technology, including the expected performance penalty when subject to a total encoding-decoding delay constraint of 150ms. Stating the measured drop in PSNR (versus the case where no delay constraints were imposed) due to this delay constraint is recommended.
We need all the frames of one GOP in the buffer when encoding, and likewise all the frames of one GOP when decoding. The end-to-end delay is therefore (2 x GOPsize - 1)/frame-rate.
Generally, there is a penalty for a smaller GOP size: as reported, PSNR performance drops as the GOP size becomes smaller. If this were a problem, a hybrid coder stage could be introduced at the lowest temporal level, but we have not done this.
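As a rough worked example of the delay formula above, the following Python sketch evaluates (2 x GOPsize - 1)/frame-rate for the GOP sizes listed in item 5. The frame rate of 30 frames/s and the function name are assumptions made only for illustration; they are not taken from the proposal.

```python
def end_to_end_delay_seconds(gop_size, frame_rate):
    """Delay (2 * GOPsize - 1) / frame_rate, in seconds, per the formula above."""
    return (2 * gop_size - 1) / frame_rate


# Assumed frame rate, for illustration only.
ASSUMED_FPS = 30.0

for gop in (16, 32, 64):
    delay = end_to_end_delay_seconds(gop, ASSUMED_FPS)
    print(f"GOP {gop:2d} frames: {delay:.3f} s")
```

Under this assumed frame rate, even a 16-frame GOP gives a delay of roughly one second, well above the 150 ms constraint named in the question.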
7. Description of the complexity of their technology, including the following, as applicable:
Motion Estimation (ME)/Motion Compensation (MC): number of reference pictures, sizes of frame memories, pixel accuracies (e.g. 1/2, ¼ pel), block size, interpolation filter.
Motion estimation is done with Hierarchical Variable Size Block Matching (HVSBM), with block sizes varying from 64x64 down to 4x4. Motion compensation is done with Overlapped Block Motion Compensation (OBMC). The number of reference frames is two: one future frame and one previous frame. Motion vectors have 1/8 pixel accuracy. The interpolation filter is an 8-tap separable filter.
Spatial/texture transform: use of integer/floating points precision, transform characteristics (such as length of the filter/block size)
Spatial transform is done with Daubechies 9/7 filter banks.
Description of the decoder complexity and its scalability features. The relation between complexity and frame size, frame rate and /or bit rate should be documented.
After optimization by Yufeng Shan, the decoder currently runs in real time for QCIF format at 15 fps.
Currently, complexity in the decoder is not scalable. The main complexity lies in the synthesis of the Motion Compensated Temporal Filter (MCTF). As the resolution goes down one step, complexity decreases to 1/4 of its previous value. As the frame rate goes down one step, complexity decreases to 1/2 of its previous value. As the bit rate goes down with resolution and frame rate unchanged, complexity remains the same.
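The proportionality stated above can be written down as a small Python sketch. It only encodes the two scaling factors from the text (1/4 per spatial step, 1/2 per temporal step); the function name is hypothetical, and the result is a relative estimate, not a measurement.

```python
def relative_decoder_complexity(spatial_steps_down, temporal_steps_down):
    """Decoder complexity relative to full resolution and full frame rate.

    Each spatial step down multiplies complexity by 1/4 and each temporal
    step down by 1/2, as stated in the text; bit rate has no effect.
    """
    return (0.25 ** spatial_steps_down) * (0.5 ** temporal_steps_down)


# One resolution step and one frame-rate step down.
print(relative_decoder_complexity(1, 1))
```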
8. Description of rate control spatiotemporal window e.g. whether rate control is applied on 1s, one GOP (Group of Pictures) or the number of spatiotemporal subbands.
Our rate control is applied on all the spatiotemporal subbands. We interleave the bitstreams and pull out a sub-bitstream in the following way: take the first (most important) sub-bitplanes, subband by subband, for all available temporal levels and all GOPs (all frames); then take the next sub-bitplane, subband by subband, for all available temporal levels and all GOPs (all frames); and so on until the whole bit budget is used up. In this way, we get the most important parts of the subbands and keep constant quality from frame to frame. Currently, we apply rate control across all the frames in the clip. Other versions of MC-EZBC have a frame number constraint, i.e. a time constraint, so that near-constant quality is obtained only over a GOP, etc.
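The extraction order described above can be sketched in Python as follows. This is a minimal illustration, not the MC-EZBC extractor: the data layout (one list of byte chunks per GOP/temporal-level/subband unit, most important sub-bitplane first), the function name, and the rule of stopping at the first chunk that does not fit are all assumptions, and the motion vector bits are left out.

```python
def extract_sub_bitstream(units, budget_bytes):
    """Pull sub-bitplane chunks pass by pass until the byte budget is used up.

    units: list of coded units (hypothetically one per GOP, temporal level
           and subband, in scan order); each unit is a list of bytes objects,
           ordered from the most to the least important sub-bitplane.
    Returns the (unit index, pass index) pairs kept and the bytes used.
    """
    kept = []
    used = 0
    max_passes = max((len(chunks) for chunks in units), default=0)
    for pass_index in range(max_passes):            # next sub-bitplane
        for unit_index, chunks in enumerate(units):  # all units, in order
            if pass_index >= len(chunks):
                continue                             # unit already exhausted
            size = len(chunks[pass_index])
            if used + size > budget_bytes:
                return kept, used                    # budget reached
            kept.append((unit_index, pass_index))
            used += size
    return kept, used


units = [[b"aaaa", b"bb"], [b"ccc", b"d"]]
print(extract_sub_bitstream(units, 8))
```

Because every unit contributes its most important sub-bitplane before any unit contributes its second one, truncation at any budget cuts all frames at a similar significance level, which is the mechanism the text credits for near-constant quality.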
9. Granularity of scalability levels (spatial, temporal, SNR)
Spatial scalability is down to 88x72 (resolution of the Y component). CIF sequences have 3 levels of spatial scalability, and SD sequences have 4 levels.
Temporal scalability is related to GOP size. A GOP of 16 frames has 4 levels of temporal scalability, and a GOP of 32 frames has 5 levels.
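The level counts above follow from repeated halving, which the short Python sketch below makes explicit. The luma sizes 352x288 for CIF and 704x576 for SD are assumptions introduced for the example (the text only states the 88x72 minimum), and the function names are hypothetical.

```python
def temporal_levels(gop_size):
    """Number of temporal levels for a power-of-two GOP size (log2)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("GOP size must be a power of two")
    return gop_size.bit_length() - 1


def spatial_ladder(width, height, min_width=88, min_height=72):
    """Luma resolutions obtained by halving down to the 88x72 minimum."""
    ladder = [(width, height)]
    while width // 2 >= min_width and height // 2 >= min_height:
        width, height = width // 2, height // 2
        ladder.append((width, height))
    return ladder


print(temporal_levels(16), temporal_levels(32))
print(spatial_ladder(352, 288))   # assumed CIF luma size
print(spatial_ladder(704, 576))   # assumed SD luma size
```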
SNR scalability generally is only constrained by the minimum bit-rate of the motion vector bit-stream. We can get any bit rate video beyond that minimum constraint.
10. Specify whether a “base-layer” is used and whether it is compliant with an existing standard.
There is no “base-layer” in our coder.
11. State whether there are any circumstances where the proposal could produce a bitstream compliant with an existing standard (e.g. JPEG2000)
The frame data coder EZBC in our video coder MC-EZBC, which is an enhanced image coder based on SPECK, is similar to JPEG2000. After small changes, the bit-stream should be compatible with JPEG2000. Also, one could substitute JP2K for EZBC, as we have proposed earlier.
No. Requirement: Comments
1. Spatial scalability: Our coder can provide a minimum resolution of 88x72 (Y component). CIF sequences have 3 levels of spatial scalability, and SD sequences have 4 levels.
2. Temporal scalability: This is related to GOP size. A GOP of 16 frames gives 4 levels of temporal scalability; a GOP of 32 frames gives 5 levels.
3. SNR scalability: We can pull out sub-bitstreams from the near-lossless bitstream at any rate above the lower bound given by the smallest set of motion vectors.
4. Complexity scalability: The main computation lies in motion estimation. Assume the current computation is 2 units. We can do Y HVSBM instead of colour HVSBM to save 0.5 unit of computation, and we can omit the OBMC iteration to save another 0.5 unit.
5. Region of interest scalability: Currently we do not provide this function.
6. Object based scalability: Currently we do not provide this function.
7. Combined scalability: Combinations are possible, like resolution scaling followed by rate scaling, etc. This version of the program does not have the meta data for that though.
8. Robustness to different types of transmission errors: An MD-FEC version has been developed that is very robust to lost packets. This proposal version does not have such a feature though.
9. Graceful degradation: Yes, but only for the MD-FEC version.
10. Robustness under “best-effort” networks: Yes, but only for the MD-FEC version.
11. Specially in the presence of server and path diversity: Yes, for an MD-FEC version currently in development.
12. Colour depth: 8 bits
13. Coding efficiency performance: Highly efficient, especially at high spatial resolutions.
14. Base-layer compatibility: No base layer.
15. Low complexity codecs: Can reduce the GOP size and can reduce the motion estimation complexity.
16. End-to-end delay: 2xGOPsize-1
17. Random access capability: As described above.
18. Support for coding interlaced material: Currently we do not support coding of interlaced material, but it is considered a ‘solved problem,’ and so should not be hard to include.
19. System interface to support quality selection: ?
20. Multiple adaptations: This has been demonstrated in a DIA context, but is not included with the proposal version.
[Figure: block diagram with the blocks: frame input (YUV format); color HVSBM with pruning; scalable motion vector coding based on CABAC; IBLOCK detection; OBMC iteration; EZBC coder; output bit-stream. Block processing order in MCTF with OBMC: DEFAULT, REVERSE, PBLOCK, IBLOCK.]
Fig. 1 Functional block diagram for MC-EZBC
The main functional diagram is shown in Fig. 1. The original frames in one GOP are input to MC-EZBC (typically there are 16 frames in one GOP). In the following we describe each functional block in some detail.
[Figure: motion vectors per temporal level. Level 0: $V_{00}$, $V'_{00}$, $V_{01}$, $V'_{01}$, $V_{02}$, $V'_{02}$, $V_{03}$. Level 1: $V_{10}$, $V'_{10}$, $V_{11}$. Level 2: $V_{20}$. Level 3.]
Fig. 2 Motion estimation in a GOP with size 8 frames. The $V'_{xx}$ are estimated for motion concatenation but not coded. $V_{10}$ is estimated starting from $V'_{00} + V_{01}$, and similarly for $V_{20}$.
Motion estimation is first done through Hierarchical Variable-Size Block Matching (HVSBM), with block sizes varying from 64x64 to 4x4, for each pair of frames as shown in Fig. 2. The set of motion vectors $V'_{xx}$ is used only for the prediction of motion vectors in the next temporal level [1]. For example, the additional motion vectors $V'_{00}$ are estimated but not coded. $V'_{00} + V_{01}$ is used as the starting point for the estimation of $V_{10}$, which is estimated in a reduced search range: the search starts from $V'_{00} + V_{01}$ and covers about half of the original HVSBM search range in each dimension.
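The use of a concatenated vector as the search start can be sketched as below. This is only an illustration under simplifying assumptions that are not in the text: integer-pel vectors, a square search window given as a plus/minus extent, and a plain per-block sum of the two vectors. The function names are hypothetical.

```python
def concatenated_seed(v_prime, v_next):
    """Starting point for the next temporal level, e.g. V'_00 + V_01 for V_10."""
    return (v_prime[0] + v_next[0], v_prime[1] + v_next[1])


def reduced_search_candidates(seed, full_range):
    """Candidate vectors around the seed, using half the original range.

    full_range is the assumed +/- extent of the original HVSBM search
    in each dimension; the reduced search uses +/- full_range // 2.
    """
    half = full_range // 2
    return [(seed[0] + dx, seed[1] + dy)
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)]


seed = concatenated_seed((1, -2), (3, 4))
candidates = reduced_search_candidates(seed, 4)
print(seed, len(candidates))
```

Halving the extent in each dimension cuts the number of candidate positions to roughly a quarter, which is where the saving from seeding the search comes from.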
The motion estimation is bi-directional, using one previous reference frame and one future reference frame. Thus, the number of reference frames is two. After we form the full motion vector quad-tree, if the algorithm in [2] detects more than 50% “unconnected” pixels in a given block, the block is classified as a REVERSE block and a good match for it is searched for in the past frame. We also search the past frame for good matches for those blocks with large distortion after motion compensation from the future frame, and they are classified as PBLOCK after HVSBM. The better of the two matches is chosen as the reference block, and the block mode is changed accordingly. When this bi-directional match results in too many “unconnected” pixels, we decide to discontinue further temporal analysis for that frame pair based on a threshold value. This feature gives rise to the adaptive GOP size.
Now we also use chrominance data U and V besides luminance data Y in order to get a more stable motion field. For this color HVSBM [3], we use original or sub-sampled U and V frames. The computation time for color HVSBM is 1.5 times that of HVSBM running on the luminance data alone.
Each dimension of U and V is half that of Y in the chosen 4:2:0 color space, and we need sub-pixel accuracy motion vectors. We nevertheless keep the full accuracy for the U and V motion vectors, saturating this accuracy at one-eighth pixel.
After we form the full motion vector quad-tree with bi-directional color HVSBM, we prune it back in the sense of MV rate versus error distortion optimization, using a fixed Lagrange multiplier $\lambda$. We introduced a rough scalable motion vector coder [4] based on the Context-based Adaptive Binary Arithmetic Coder (CABAC) [5], scalable with respect to temporal, SNR and resolution, as will be described in the following.
[Figure: block modes DEFAULT, PBLOCK, REVERSE and I-BLOCK (new), with CONNECTED and UNCONNECTED pixels, across the frames $B_{t-1}$, $A_t$ / $H_t$ and $B_t$ / $L_t$.]
Fig. 3 The block modes in high temporal frame for MC-EZBC. $H_t$ is temporally aligned with $A_t$, and $L_t$ is temporally aligned with $B_t$. IBLOCK is for the block employing spatial interpolation/prediction.
Since there are inevitable areas with occlusions and irregular motion, we classify the pruned blocks into three categories in MC-EZBC, as shown in Fig. 3. DEFAULT blocks have a motion vector between the current frame $A_t$ and the next frame $B_t$; for them we do the lifting predict step in $A_t$ and the update step for “connected” blocks in $B_t$. REVERSE blocks have a motion vector between $A_t$ and $B_{t-1}$; for them we do motion compensation from a corresponding block in $B_{t-1}$. A PBLOCK has a motion vector between $A_t$ and $B_t$; we do a predict step in $A_t$ and omit the update step for “unconnected” pixels in $B_t$. PBLOCKs originate from multiply connected pixels as well as from singly connected pixels with too large a prediction error, as determined by a threshold of 0.5 between the original block’s variance and the motion-compensated block's MSE.
Since PBLOCKs are classified partially from DEFAULT blocks with a threshold of 0.5, they can include poorly matched blocks between frames $A_t$ and $B_t$. We regard such blocks of size 8x8 or 4x4 as potential candidates for IBLOCKs. Similarly, we also look for IBLOCK candidates in the set of REVERSE blocks. We detect IBLOCKs among the candidates and do spatial interpolation/prediction for them [6].
After IBLOCK detection, we apply overlapped block motion compensation (OBMC) to the variable size blocks as shown in Fig. 1. In our OBMC framework, we view any data received by the decoder prior to the decoding of frame k as a source of information about the true prediction scheme finally employed. For simplicity, we limit the valid information for each block to be the two horizontal, two vertical direct neighbors and itself [7], and we assume stationarity of the image and block motion field.
We use a modified 2-D bilinear window, whose performance is only a little worse than the iterated optimal window design (OWD) [8]. Each block size has its corresponding weighting window. Since a large block probably has a different motion vector from its small neighbor, a shrinking scheme is introduced between the different sized blocks, in order to reduce smoothing at a motion discontinuity. We do OBMC for all the types of blocks classified above, but we use a different weighting window for IBLOCKs to emphasize the continuity between motion compensation and spatial interpolation/prediction.
To further optimize OBMC, we employ an iterative procedure and minimize the mean absolute difference (MAD). Since bi-directional color HVSBM now runs on both luminance and chrominance data, it follows naturally that the OBMC iterations should be applied to Y, U and V simultaneously. U and V are sub-sampled frame data obtained after a transform from RGB data, so the weighting windows used for U and V are also sub-sampled versions of those used for Y [9].
As shown in Fig. 1, we do OBMC in a lifting implementation for DEFAULT blocks, i.e. with the prediction and update steps in normal order to reduce the noise in the area of good motion. The specific equations for OBMC in our lifting implementation are as follows [9],
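$$H[m,n] = \frac{1}{\sqrt{2}}\, B[m,n] - \frac{1}{\sqrt{2}} \sum_k h_k[m,n]\, \tilde{A}[m - d_{mk},\, n - d_{nk}] \qquad (1)$$

$$L[m - \bar{d}_m,\, n - \bar{d}_n] = \tilde{H}[m - \bar{d}_m + d_m,\, n - \bar{d}_n + d_n] + \sqrt{2}\, A[m - \bar{d}_m,\, n - \bar{d}_n] \qquad (2)$$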
The OBMC approach treats the displacement field $(d_m, d_n)$ as a random process, meaning that pixel $B[m,n]$ in frame $B$ has motion vector $(d_{mk}, d_{nk})$ with probability $h_k[m,n]$ from its corresponding probability window, and is compensated by the weighted average of the predicted sub-pixels. In these equations, $(\bar{d}_m, \bar{d}_n)$ is the nearest integer to $(d_m, d_n)$. Although the form of the low temporal frame seems the same as that without OBMC, OBMC actually affects both high and low temporal frames. Also, importantly, the low temporal frames from OBMC are visually preferred and more suitable for further stages of MCTF (i.e. nearly artifact free).
For a PBLOCK, there is only a prediction step (1) from a future frame, and for a REVERSE block, there is only a prediction step (1) from a previous frame. An IBLOCK is predicted by the weighted average of spatial prediction/interpolation from spatial neighbors in the corresponding frame.
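To make the predict step (1) and update step (2) concrete for DEFAULT blocks, the Python sketch below applies them to a 1-D signal and checks that the inverse lifting steps recover the input. It is a deliberately simplified illustration, not the MC-EZBC implementation: it assumes a single integer displacement `d` for the whole signal, a single OBMC window of weight 1 (so the weighted sum collapses to one term), circular indexing so that every pixel is connected, and no sub-pixel interpolation. All names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def mctf_analysis(a, b, d):
    """One lifting stage on frames a and b with integer displacement d."""
    n = len(a)
    # Predict step, as in (1): H[i] = (B[i] - A[i - d]) / sqrt(2).
    h = [(b[i] - a[(i - d) % n]) / SQRT2 for i in range(n)]
    # Update step, as in (2): L[i - d] = H[i] + sqrt(2) * A[i - d].
    low = [0.0] * n
    for i in range(n):
        low[(i - d) % n] = h[i] + SQRT2 * a[(i - d) % n]
    return low, h


def mctf_synthesis(low, h, d):
    """Invert the two lifting steps in reverse order."""
    n = len(low)
    a = [0.0] * n
    for i in range(n):
        a[(i - d) % n] = (low[(i - d) % n] - h[i]) / SQRT2
    b = [SQRT2 * h[i] + a[(i - d) % n] for i in range(n)]
    return a, b


a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 1.0, 2.0, 3.0]          # a shifted by one sample
low, h = mctf_analysis(a, b, 1)
print(h)                           # high band vanishes for perfect motion
print(mctf_synthesis(low, h, 1))
```

With zero displacement the two steps reduce to H = (B - A)/sqrt(2) and L = (A + B)/sqrt(2), the orthonormal Haar pair, which is a quick sanity check on the signs and scale factors.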
After careful arrangement of the bit-stream for motion vectors, namely grouping the bit-stream of motion vectors in each temporal layer with the bit-stream of frame data in that temporal level, temporal scalability of motion vectors follows naturally. But the bit-stream for motion vectors was still not scalable with respect to SNR and resolution. We propose herein a new scalable motion vector coder based on the Context-based Adaptive Binary Arithmetic Coder (CABAC) for MC-EZBC. A layered structure for motion vector coding and Alphabet General Partition (AGP) of motion vector symbols are employed for SNR and resolution scalability of the motion vector bit-stream [4]. With these two new features and the careful arrangement of the motion vector bit-stream in MC-EZBC, we obtain rough temporal, SNR and resolution scalability for motion vectors. This significantly improves both visual and objective results at low rates and low resolution, with a slight PSNR loss (about 0.05 dB) and unnoticeable visual loss at high rates. Initial results on scalable motion vector coding for MC-EZBC were obtained in [12].
5.1 Scan/spatial Prediction for Motion Vectors
[Figure: a block with motion vector mv and its three direct spatial neighbors with motion vectors mv1, mv2 and mv3.]
Fig. 4 Spatial prediction from three direct neighbors for the motion vector in some block
As shown in Fig. 4, the motion vector mv in a block is predicted from its three direct spatial neighbors mv1, mv2 and mv3, similar to [1]. As shown in our previous work, this spatial prediction scheme predicts the motion vector more efficiently. However, MC-EZBC currently has four kinds of blocks: DEFAULT blocks, with prediction and update steps in the lifting implementation between the current and next frames; PBLOCKs, with a prediction step from the next frame only; REVERSE blocks, with prediction from the previous frame only; and directional IBLOCKs, with spatial prediction from neighboring pixels in the same frame. The motion vectors in DEFAULT blocks and PBLOCKs are between the current and next frames (defined as normal motion vectors), and those in REVERSE blocks are between the current and previous frames (defined as reverse motion vectors). IBLOCKs have no motion vectors, only a spatial prediction mode.
We found that the characteristics of normal motion vectors and reverse ones are quite different. So we predict and code the two sets of motion vectors separately, which improves the prediction and coding efficiency, as shown by our experiments. There are therefore two loops for motion vector coding: the first loop codes the normal motion vectors, and the second loop codes the reverse motion vectors.
For the motion vector mv in some block, if there is at least one motion vector in mv1, mv2 and mv3 with the same type (normal or reverse) as that in this block, we predict from spatial neighbors for it. If there is no motion vector with the same type in the three neighbors, we predict it from the previous motion vector with the same type in quad-tree scan order. We consistently use this combined spatial and scan-order prediction in the following sections. The prediction residual will be coded by CABAC.
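The selection logic of this combined spatial and scan-order prediction can be sketched in Python as below. The text does not say how the same-type neighbors are combined into one predictor; the component-wise lower median used here is purely an assumption for illustration, as are the function name, the type labels and the zero-vector fallback when no earlier vector of the same type exists.

```python
import statistics


def predict_mv(neighbors, block_type, previous_same_type):
    """Predict a motion vector from mv1, mv2, mv3 or from scan order.

    neighbors: list of (mv, mv_type) for the three spatial neighbors, where
               mv is an (x, y) tuple and mv_type is "normal" or "reverse";
               an entry is None if that neighbor has no motion vector.
    previous_same_type: last vector of the same type in quad-tree scan
               order, or None if there is none yet.
    """
    same = [entry[0] for entry in neighbors
            if entry is not None and entry[1] == block_type]
    if same:
        # ASSUMPTION: component-wise lower median of same-type neighbors.
        xs = [mv[0] for mv in same]
        ys = [mv[1] for mv in same]
        return (statistics.median_low(xs), statistics.median_low(ys))
    if previous_same_type is not None:
        return previous_same_type      # scan-order fallback from the text
    return (0, 0)                      # ASSUMPTION: nothing to predict from


neighbors = [((1, 1), "normal"), ((3, 5), "reverse"), ((2, 0), "normal")]
print(predict_mv(neighbors, "normal", (9, 9)))
print(predict_mv([None, ((3, 5), "reverse"), None], "normal", (9, 9)))
```

Only the residual between the actual vector and this prediction would then be passed to the entropy coder.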
5.2 Alphabet General Partition of Motion Vector Symbols
In MC-EZBC, motion estimation is done at 1/8 pixel accuracy. Although this can reduce the MSE after motion compensation, the quarter- and eighth-pixel bits of the motion vectors are quite random due to camera noise and quantization noise. This is because the MSE after motion compensation is already near the total noise variance after quarter or eighth pixel accuracy motion compensation. So we model the motion vector as follows (without loss of generality we present the one-dimensional case, although a motion vector has two components):
$$r_k = s_k + n_k$$
Here $r_k$ is the estimated $k$th motion vector, $s_k$ is the true $k$th motion vector, and $n_k$ is the noise in the motion vector due to the inaccuracies in the frame data. All three components are at 1/8 pixel accuracy. Since the noise in the frame data is quite small, we believe it can only contaminate the quarter-pixel and eighth-pixel accuracy of the estimated motion vector during motion estimation. We therefore divide the estimated motion vector $r_k$ into two symbols,
$$r_k = r_{k1} + r_{k2}$$
where $r_{k1}$ is the major symbol up to 1/2 pixel accuracy, and $r_{k2}$ is the sub-symbol for quarter and eighth pixel accuracy. For example, if the estimated motion vector is $r_k = -1.625$, then $r_{k1} = -1.5$ and $r_{k2} = -0.125$.
We predict the major symbol with the scan/spatial prediction scheme described above and code the prediction residual with CABAC, but code the sub-symbol as a binary sequence.
We do not actually need to code the sign of the sub-symbol, because it can be inferred from the major symbol. The only exception is for motion vectors in the range [-0.375, +0.375]. For those motion vectors, the major symbol is $r_{k1} = 0$, so the sign of the sub-symbol $r_{k2}$ is unknown, and we need one additional bit to indicate whether the sub-symbol is positive or negative.
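The partition of one motion vector component into a major symbol and a sub-symbol can be illustrated with a short Python sketch. It works in integer eighth-pixel units and truncates toward zero, which reproduces the example in the text (-1.625 splits into -1.5 and -0.125). The function name is hypothetical, and treating the extra sign bit as unnecessary when the sub-symbol is zero is an assumption not stated in the text.

```python
def split_mv_component(r_pixels):
    """Split a 1/8-pel motion vector component into (major, sub, sign_bit).

    major: part up to half-pixel accuracy, truncated toward zero.
    sub:   remaining quarter/eighth-pixel part, magnitude at most 0.375.
    sign_bit: True when the sign of sub cannot be inferred from major.
    """
    r8 = round(r_pixels * 8)                 # eighth-pixel integer units
    sign = -1 if r8 < 0 else 1
    major8 = sign * (abs(r8) // 4) * 4       # multiples of 1/2 pixel
    sub8 = r8 - major8                       # same sign as r8, |sub8| <= 3
    # ASSUMPTION: no sign bit is spent when the sub-symbol is zero.
    needs_sign_bit = (major8 == 0 and sub8 != 0)
    return major8 / 8, sub8 / 8, needs_sign_bit


print(split_mv_component(-1.625))
print(split_mv_component(0.25))
```

Dropping the sub-symbol and sign bit at low rates then amounts to keeping only the first element of the returned tuple, i.e. a half-pixel-accurate vector.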
At high rates and full resolution, we transmit all three parts of the motion vector bit-stream (major symbol, sub-symbol and sign), and the decoder receives lossless motion vectors with which to reconstruct the frames. At low rates, we can throw away the sub-symbol and sign parts, so the decoder obtains lossy motion vectors, and within the same bit budget more bits are available for the residual (frame) data. Although the motion vectors are lossy, the loss can be compensated by the more accurate frame data: as shown by our experiments, the total performance is improved by throwing away some motion vector bits. At low resolution, the motion vectors in MC-EZBC are scaled down, so we do not need the same motion vector accuracy as at full resolution. There too, the sub-symbol and sign parts are thrown away and the saved bits are spent on frame data.
An R-D model for scaling the motion vector fields is under development [10]. In our future work, we will try to partition the motion vector symbols further, i.e. partition not only quarter and eighth accuracy but also half or integer accuracy. Moreover, we will introduce a context model for the sub-symbols instead of binary coding. We can then have more levels of scalability for motion vector accuracy. Right now, the match between scaled motion vectors and bit-rate/resolution/frame-rate is determined by trial.
5.3 Layered Structure of Motion Vector Coding
After one spatial level down, the block size is halved, i.e. 16x16, 8x8 and 4x4 blocks become 8x8, 4x4 and 2x2 blocks. Obviously, after one or two spatial levels down, we do not need the same number of motion vectors as at full resolution. Moreover, the motion vectors are also scaled down by a factor of 2 after one spatial level down. If two adjacent motion vectors differ by less than 2 pixels at full resolution, then at half resolution they differ by less than 1 pixel. We can then replace these two motion vectors by either one of them.
Assume four grouped blocks (children) in quad-tree structure have similar motion vectors. We can replace the four motion vectors by one representative after one spatial level down. Now we just choose the first motion vector in quad-tree scan order as the representative for the four motion vectors in the children.
We also notice the normal motion vectors are between current frame and next frame, but reverse motion vectors are between current frame and previous frame. They have different characteristics. So we should reserve up to 2 representatives for the four children, one for normal motion vectors and the other for reverse motion vectors in the four children blocks.
We use a bottom-up sub-sample selection scheme to form the layered structure for motion vector coding. In each layer, we still use the scan/spatial prediction as stated above, and we continue using the context model from layer to layer. Since we simply choose the first normal and the first reverse motion vector in quad-tree scan order as the representatives of the four children, the number of coded motion vectors is the same as in non-layered coding. We then code the prediction residuals in bigger blocks as the base layer of motion vectors, and those in smaller blocks as enhancement layers. When coding enhancement layers, we use all the information up to that layer, i.e. the updated motion field and updated context models from previous layers.
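The choice of representatives for four grouped children can be sketched as follows. This is a minimal illustration of the rule stated above (first normal and first reverse vector in quad-tree scan order); the function name, the type labels and the tuple layout are assumptions made for the example.

```python
def choose_representatives(children):
    """Pick up to two representative motion vectors for four child blocks.

    children: the four children in quad-tree scan order, each a pair
              (mv, mv_type) with mv_type "normal" or "reverse", or
              (None, "iblock") for a block without a motion vector.
    Returns a dict holding the first vector found of each type.
    """
    representatives = {}
    for mv, mv_type in children:
        if mv_type in ("normal", "reverse") and mv_type not in representatives:
            representatives[mv_type] = mv
    return representatives


children = [((2, 1), "normal"), ((-3, 0), "reverse"),
            ((2, 2), "normal"), (None, "iblock")]
print(choose_representatives(children))
```

Because each representative is one of the children's own vectors, the enhancement layer only has to code the remaining children, which is why the total number of coded vectors stays the same as in non-layered coding.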
The rough layered structure for motion vector coding shows quite good results. However, the main assumption behind layered structure coding may frequently be invalid. In our future work, we will apply the layered structure selectively, instead of the current method, which assumes that the assumption always holds. The criteria for the selective layered structure will consider the average difference or the biggest difference among the four grouped motion vectors. More importantly, we should consider the local gradient of the compensated frame data. Namely, if the area is very smooth, then differences among the motion vectors will not affect the reconstruction distortion much. However, if the area is full of texture, then a small difference among the motion vectors may cause large distortion. This also involves a rate-distortion model similar to the one above [10].
After MCTF analysis with OBMC, a set of high temporal frames and typically one low temporal frame, as shown in Fig. 2, are input to the EZBC coder. The EZBC coder codes the frame data with scalability with respect to SNR, temporal and resolution, by interleaving the bitstreams of the spatial and temporal subbands. Please refer to [11] for the details of the EZBC coder for fine scalable video compression.
[1] A. Golwelkar and J. W. Woods, Improved Motion Vector Coding for the Sliding Window (SW-) EZBC Video Coder, JTC1/SC29/WG11/M10415, Hawaii, USA, Dec. 2003.
[2] P. Chen and J. W. Woods, Bi-directional MC-EZBC with lifting implementation, IEEE Trans. on Circuits and Systems for Video Technology, to appear.
[3] Y. Wu and J. W. Woods, Recent Improvements in the MC-EZBC Video Coder, ISO/IEC/JTC1/SC29/WG11/M10158, Hawaii, USA, Dec. 2003.
[4] Y. Wu and J. W. Woods, Scalable Motion Vector Coding for MC-EZBC, report in CIPR, Feb. 8, 2004, to be submitted.
[5] D. Marpe, H. Schwarz and T. Wiegand, Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, July 2003.
[6] Y. Wu and J. W. Woods, Directional Spatial IBLOCK for the MC-EZBC Video Coder, accepted by ICASSP 2004.
[7] Y. Wang, J. Ostermann and Y. Zhang, Video Processing and Communications, Prentice Hall, pp. 296-300, 2002.
[8] M. T. Orchard and G. J. Sullivan, Overlapped block motion compensation: an estimation-theoretical approach, IEEE Trans. on Image Processing, vol. 3, pp. 693-699, Sept. 1994.
[9] Y. Wu, R. A. Cohen and J. W. Woods, An Overlapped Block Motion Estimation for MC-EZBC, ISO/IEC/JTC1/SC29/WG11/M10158, Brisbane, AU, Oct. 2003.
[10] Y. Wu and J. W. Woods, Rate-distortion Model for Scalable Motion Vector Coding in MC-EZBC, in preparation.
[11] Shih-Ta Hsiang, Highly Scalable Subband/Wavelet Image and Video Coding, Rensselaer Polytechnic Institute, Troy, NY, May 2002.
[12] S. S. Tsai, H.-M. Hang, and T. Chiang, Motion Information Scalability for MC-EZBC: Response to Call for Evidence on Scalable Video Coding, ISO/IEC JTC1/SC29/WG11 MPEG03/M9756.