Video compression with 1-D directional transforms in H.264/AVC Please share

advertisement
Video compression with 1-D directional transforms in
H.264/AVC
The MIT Faculty has made this article openly available. Please share
how this access benefits you. Your story matters.
Citation
Kamisli, Fatih, and Jae S. Lim. “Video Compression with 1-D
Directional Transforms in H.264/AVC.” IEEE International
Conference on Acoustics Speech and Signal Processing, 2010.
738–741. © Copyright 2010 IEEE
As Published
http://dx.doi.org/10.1109/ICASSP.2010.5495034
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Version
Final published version
Accessed
Thu May 26 10:32:17 EDT 2016
Citable Link
http://hdl.handle.net/1721.1/72132
Terms of Use
Article is made available in accordance with the publisher's policy
and may be subject to US copyright law. Please refer to the
publisher's site for terms of use.
Detailed Terms
VIDEO COMPRESSION WITH 1-D DIRECTIONAL TRANSFORMS IN H.264/AVC
Fatih Kamisli and Jae S. Lim
Research Laboratory of Electronics
Massachusetts Institute of Technology
ABSTRACT
Typically the same transforms, such as the 2-D Discrete Cosine
Transform (DCT), are used to compress both images in image compression and prediction residuals in video compression. However,
these two signals have different spatial characteristics. In [1], we
analyzed the difference between these two signals and proposed
1-D directional transforms for prediction residuals. In this paper,
we provide further experimental results using these transforms in
the H.264/AVC codec and present other related information which
can provide insights in understanding the use of these transforms in
video coding applications.
Index Terms— Discrete cosine transforms, Motion compensation, Video coding
1. INTRODUCTION
An important component of image and video compression systems
is a transform. A transform is used to transform image intensities. A transform is also used to transform prediction residuals of
image intensities, such as the motion-compensation residual (MCresidual), the resolution-enhancement residual in scalable video coding, or the intra-prediction residual in H.264/AVC. Typically, the
same transform is used to transform both image intensities and prediction residuals. For example, the 2-D Discrete Cosine Transform
(2-D DCT) is used to compress image intensities in the JPEG standard and MC-residuals in many video coding standards. However,
prediction residuals have different spatial characteristics from image
intensities [1, 2]. Therefore, it is of interest to study if transforms
better than those used for image intensities can be developed for prediction residuals.
Recently, new transforms have been developed that can take advantage of locally anisotropic features in images [3, 4, 5, 6]. A
conventional transform, such as the 2-D DCT or the 2-D Discrete
Wavelet Transform (2-D DWT), is carried out as a separable transform by cascading two 1-D transforms in the vertical and horizontal dimensions. This approach does not take advantage of locally
anisotropic features present in images because it favors horizontal
or vertical features over others. The new transforms adapt to locally anisotropic features in images by performing the ſltering along
the direction where image intensity variations are smaller. This is
achieved, for example, by directional lifting implementations of the
DWT [6]. Even though most of the work is based on the DWT, similar ideas have been applied to DCT-based image compression [4].
Inspection of prediction residuals shows that locally anisotropic
features are also present in prediction residuals. However, unlike in
image intensities, a large number of pixels in prediction residuals
have zero amplitude. Pixels with nonzero amplitude concentrate in
regions which are difſcult to predict, such as moving object boundaries, edges, or texture regions. Therefore a major portion of the
978-1-4244-4296-6/10/$25.00 ©2010 IEEE
738
signal in prediction residuals concentrates along such object boundaries and edges, forming 1-D structures along them. As a result,
anisotropic features in prediction residuals typically manifest themselves as locally 1-D structures at various orientations. This is in
contrast to image intensities, which have 2-D anisotropic structures.
This difference can be quantiſed using an auto-covariance analysis
[1].
We proposed 1-D directional block transforms for the MCresidual in [1] and showed potential gains achievable with one
example set of such 1-D transforms on 8x8-pixel blocks within the
H.264/AVC codec. In this paper, we provide further experimental
results that show achievable gains using 1-D transforms deſned
on both 8x8-pixel blocks and 4x4-pixel blocks, which are available transform sizes in H.264/AVC. We also provide other useful
information obtained from these experiments. These include the
comparison of the achievable gains at relatively lower and higher
picture qualities, the fraction of the bitrate used to code the selected
transform for each block, and the average number of times each
transform is selected.
The remainder of the paper is organized as follows. In Section
2, we discuss the 1-D directional transforms that were used in the
experiments. Some details regarding the integration of these transforms into the H.264/AVC codec (JM reference software 10.2) are
discussed in Section 3. We then present the experimental results and
insights obtained from these results in Section 4. We conclude the
paper in Section 5.
2. 1-D DIRECTIONAL TRANSFORMS
As described in Section 1, a large number of local regions in prediction residuals consist of 1-D structures, which follow object boundaries or edges present in the original image. It appears that using
2-D transforms with basis function that have 2-D support is not the
best choice for such regions. Therefore we propose to use transforms
with basis functions whose support follow the 1-D structures of the
prediction residuals. Speciſcally, we propose to use 1-D directional
transforms for prediction residuals. In our initial paper [1], we used
1-D transforms on 8x8-pixel blocks for the experiments. In this paper, we use 1-D transforms on both 8x8-pixel and 4x4-pixel blocks
since both block sizes are available in H.264/AVC. We note that the
idea of 1-D transforms for prediction residuals can also be extended
to wavelet transforms [7].
The 1-D directional transforms that we use in our experiments
are shown in Figure 1. Figure 1-a shows the ſrst ſve 1-D block
transforms deſned on 8x8-pixel blocks. The remaining eleven are
symmetric versions of these ſve and can be easily derived. Figure
1-b shows the ſrst three 1-D block transforms deſned on 4x4-pixel
blocks. The remaining ſve are symmetric versions of these three and
can be easily derived.
ICASSP 2010
(a) First ſve out of sixteen 1-D transforms are shown.
Remaining eleven are symmetric versions.
(b) First three out of eight 1-D transforms are shown.
Remaining ſve are symmetric versions.
Fig. 1. 1-D directional transforms deſned on (a) 8x8-pixel blocks and (b) 4x4-pixel blocks. Each transform consists of a number of 1-D
DCT’s deſned on groups of pixels shown with arrows.
(a) Scans for 1-D transforms shown in Figure 1-a.
(b) Scans for 1-D transforms shown in Figure 1-b.
Fig. 2. Scans used in coding the quantized coefſcients of 1-D transform deſned on (a) 8x8-pixel blocks and (b) 4x4-pixel blocks.
Each of the 1-D block transforms consists of a number of 1D patterns which are all roughly directed at the same angle. For
example, all 1-D patterns in the ſfth 1-D block transform deſned on
8x8-pixel blocks or the third 1-D block transform deſned on 4x4pixel blocks are directed towards south-east. The angle is different
for each of the 1-D block transforms and alltogether they cover 180◦ ,
for both 8x8-pixel blocks and 4x4-pixel blocks. Each 1-D pattern
in any 1-D block transform is shown with arrows in Figure 1 and
deſnes a group of pixels over which a 1-D DCT is performed.
3. INTEGRATION OF 1-D TRANSFORMS INTO THE
H.264/AVC CODEC
To integrate these transforms into a codec, a number of related aspects need to be carefully designed. These include implementation
of the transforms, quantization of the transform coefſcients, coding of the quantized coefſcients, and coding of the side information
which indicates the selected transforms for each block.
In H.264/AVC, transform and quantization are merged together
so that both of these steps can be implemented with integer arithmetic using addition, subtraction and bitshift operations. This has
many advantages including the reduction of the computational complexity [8]. The computational complexity is not our concern in this
paper, and we use ƀoating point operations for these steps (including
the implementation of 2-D DCT.) This does not change the results.
We note that it is possible to merge the transform and quantization
steps of our proposed 1-D transforms so that these steps can also be
implemented with integer arithmetic.
designs. For the experiments in this paper, however, we use the
same method of H.264/AVC with the exception of the scan. We use
different scans for each of the 1-D transforms.
Figure 2-b shows the scans for the 1-D transforms deſned on
4x4-pixel blocks shown in Figure 1-b. These scans were designed
so that coefſcients less likely to be quantized to zero are closer to
the front of the scan and coefſcients more likely to be quantized to
zero are closer to the end of the scan. Scans for the remaining 1-D
transforms deſned on 4x4 blocks are symmetric versions of those in
Figure 2-b.
For transforms deſned on 8x8-pixel blocks, H.264/AVC generates four length-16 scans instead of one length-64 scan, when entropy coding is performed in CAVLC mode. Figure 2-a shows the
four length-16 scans for each of the 1-D transforms deſned on 8x8pixel blocks shown in Figure 1-a. These scans were designed based
on two considerations. The ſrst is that coefſcients less likely to
be quantized to zero are closer to the front of the scan and coefſcients more likely to be quantized to zero are closer to the end of the
scan. The second consideration is that neighbouring 1-D patterns
are grouped into one scan. The 1-D structures in prediction residuals are typically concentrated in one region of the 8x8-pixel block
and the 1-D transform coefſcients representing them will therefore
be concentrated in a few neighbouring 1-D patterns. Hence, grouping neighbouring 1-D patterns into one scan enables capturing those
1-D transform coefſcients in as few scans as possible. More scans
that consist of all zero coefſcients can lead to more efſcient overall
coding of coefſcients.
3.2. Coding of side information
3.1. Coding of 1-D transform coefſcients
We use CAVLC (context-adaptive variable-length codes) mode to
perform entropy coding. To code the quantized transform coefſcients in this mode, H.264/AVC zigzag-scans these coefſcients.
It uses a run-length coding algorithm to code the positions of the
nonzero quantized coefſcients and a context-adaptive method to
code the amplitudes. Note that these methods are tailored to the
statistics of the transform coefſcients which are obtained from an
approximation of the 2-D DCT. Ideally, it would be best to design
a new method which is adapted to the characteristics of the coefſcients of the proposed 1-D transforms. We are working on such
739
The selected transform for each block is encoded as side information using variable-length codes (VLC). We performed experiments
to estimate the probabilities of selection of transforms. From these
experiments and the effect of the side information on the increase
of the overall bitrate, we decided to use the following codeword assignments. If a macroblock uses 8x8-pixel transforms, then for each
8x8-pixel block, the 2-D DCT is represented with a 1-bit codeword,
and each of the sixteen 1-D transforms is represented with a 5-bit
codeword.
If a macroblock uses 4x4-pixel transforms, then for each 4x4pixel block, the 2-D DCT can be presented with a 1-bit codeword
25
BDŦbitrate savings (%)
20
15
10
5
4. EXPERIMENTAL RESULTS
AVG
0
bridgeŦcŦ
qcif
carphoneŦ
qcif
claireŦ
qcif
containerŦ
qcif
foremanŦ
qcif
highwayŦ
qcif
missŦaŦ
qcif
motherŦdŦ
qcif
salesmanŦ
qcif
suzieŦ
qcif
trevorŦ
qcif
flowerŦ
cif
basketŦ
cif
foremanŦ
cif
mobileŦ
cif
parkrunŦ
720p
and each of the eight 1-D transforms can be represented with a 4-bit
codeword. Alternatively, four 4x4-pixel blocks within a single 8x8pixel block can be forced to use the same transform, which allows
to represent the selected transforms for these four 4x4-pixel blocks
with a single 4-bit codeword. This reduces the average bitrate for
the side information but will also reduce the ƀexibility of transform
choices for 4x4-pixel blocks. We use the alternative method in our
experiments because it usually gives slightly better results.
(a) BD-bitrate savings of 8x8-1D w.r.t. 8x8-dct.
4.1. Bjontegaard-Delta (BD) bitrate results
To present the gains achievable with the 1-D transforms, we compare the above encoders using the BD-bitrate metric [9]. This metric
measures the average bitrate savings of one encoder with respect
to another encoder within the picture quality range covered by four
quantization parameters, which in our case roughly corresponds to
30dB to 40dB.
The BD-bitrate savings results are shown in Figure 3. We compare several encoders which have access to same block size transforms. Figure 3-a compares 8x8-dct to 8x8-1D. This is what was
reported in [1]. Figure 3-b compares 4x4-dct to 4x4-1D and Figure 3-c compares 4x4-and-8x8-dct to 4x4-and-8x8-1D. The average
bitrate savings are 11.4% for Figure 3-a, 4.1% for Figure 3-b, and
4.8% for Figure 3-c.
The bitrate saving is largest when comparing encoders which
have access to only 8x8-pixel block transforms and smallest when
comparing encoders which have access to only 4x4-pixel block
transforms. This is in part because the distinction between 2-D
transforms and 1-D transforms becomes less when block-size is reduced. For example, for 2x2-pixel blocks, the distinction would be
even less, and for the extreme case of 1x1-pixel blocks, there would
be no difference at all.
740
BDŦbitrate savings (%)
25
20
15
10
5
AVG
bridgeŦcŦ
qcif
carphoneŦ
qcif
claireŦ
qcif
containerŦ
qcif
foremanŦ
qcif
highwayŦ
qcif
missŦaŦ
qcif
motherŦdŦ
qcif
salesmanŦ
qcif
suzieŦ
qcif
trevorŦ
qcif
flowerŦ
cif
basketŦ
cif
foremanŦ
cif
mobileŦ
cif
parkrunŦ
720p
0
(b) BD-bitrate savings of 4x4-1D w.r.t. 4x4-dct.
25
BDŦbitrate savings (%)
20
15
10
5
AVG
0
bridgeŦcŦ
qcif
carphoneŦ
qcif
claireŦ
qcif
containerŦ
qcif
foremanŦ
qcif
highwayŦ
qcif
missŦaŦ
qcif
motherŦdŦ
qcif
salesmanŦ
qcif
suzieŦ
qcif
trevorŦ
qcif
flowerŦ
cif
basketŦ
cif
foremanŦ
cif
mobileŦ
cif
parkrunŦ
720p
We present experimental results to illustrate the performance of the
proposed transforms within the H.264/AVC codec (JM reference
software 10.2). We use 9 QCIF resolution sequences at 30 framesper-second (fps), 4 CIF resolution sequences at 30 fps, and one 720p
resolution sequence at 60 fps. We encode 20 frames for the 720p
sequence and 180 frames for all other sequences.
Some of the important encoder parameters are as follows. The
ſrst frame is encoded as an I-frame, and all remaining frames are
encoded as P-frames. Quarter-pixel-resolution full-search motionestimation with all available block-sizes are used. Entropy coding
is performed in CAVLC mode. Rate-distortion (RD) optimization is
done in the high-complexity mode, which encodes all possible macroblock coding options and chooses the best. Selection of the best
transform for each block is also performed in an RD optimized way
by encoding each block with every available transform and choosing
the transform with the smallest RD cost.
We encode all sequences at four different picture quality levels
(with quantization parameters 24, 28, 32 and 36 ) using 6 different
encoders, each having access to a different set of transforms. These
encoders are
• 4x4-dct
• 4x4-1D (includes 4x4-dct)
• 8x8-dct
• 8x8-1D (includes 8x8-dct)
• 4x4-and-8x8-dct
• 4x4-and-8x8-1D (includes 4x4-and-8x8-dct).
(c) BD-bitrate savings of 4x4-and-8x8-1D w.r.t. 4x4-and-8x8-dct.
Fig. 3. Comparison of several encoders using BD-bitrate savings.
4.2. Bitrate savings at high and low picture qualities
The BD-bitrate metric gives savings averaged over a wide range of
picture qualities. To determine bitrate savings for a speciſc range,
Figure 4 shows RD curves of several encoders for the foreman sequence at QCIF resolution. It can be observed that bitrate savings
are higher at high picture qualities than at low picture qualities. For
the comparison between 4x4-and-8x8-dct and 4x4-and-8x8-1D, the
savings are 2.2% at 32dB, 8.7% at 38dB and 6.3% when averaged
over the range of 32dB to 40dB. This trend, which implies larger
savings at higher picture qualities and smaller savings at lower picture qualities, is present in many sequences. This is in part because
at higher picture qualities the bitrate used for coding the side information becomes a smaller fraction of the entire bitrate.
4.3. Bitrate used for coding the side information
The average fraction of the total bitrate used for coding the side information is 3.6% for 4x4-1D, 5.9% for 8x8-1D and 4.4% for 4x4and-8x8-1D. These are averages obtained from all sequences at all
picture qualities. The lowest fraction is used by 4x4-1D and the
highest fraction is used by 8x8-1D. The reason is that 4x4-1D uses
a 1-bit (2-D DCT) or a 4-bit (1-D transforms) codeword for every
four 4x4-pixel blocks with coded coefſcients, and the 8x8-1D uses
a 1-bit or a 5-bit codeword for every 8x8-pixel block with coded coefſcients. In addition, the probability of using a 1-D transform is
35
40
Probability (%)
30
36
34
20
15
10
(a) RD-curves of encoders with 4x4-pixel block transforms.
35
40
8x8 1ŦD tr#16
8x8 1ŦD tr#14
8x8 1ŦD tr#15
8x8 1ŦD tr#13
8x8 1ŦD tr#8
8x8 1ŦD tr#9
8x8 1ŦD tr#10
8x8 1ŦD tr#11
8x8 1ŦD tr#12
8x8 1ŦD tr#7
8x8 1ŦD tr#5
8x8 1ŦD tr#6
8x8 1ŦD tr#4
350
8x8 1ŦD tr#2
8x8 1ŦD tr#3
300
4x4 1ŦD tr#8
250
8x8 2ŦD DCT
8x8 1ŦD tr#1
150
200
Bitrate (kb/s)
4x4 1ŦD tr#6
4x4 1ŦD tr#7
100
4x4 1ŦD tr#5
50
4x4 1ŦD tr#3
4x4 1ŦD tr#4
0
4x4 1ŦD tr#2
0
4x4 2ŦD DCT
4x4 1ŦD tr#1
5
8x8Ŧdct
8x8Ŧ1D
32
(a) Probabilities at low picture qualities (QP is 36).
Probability (%)
30
38
36
34
25
20
15
10
8x8 1ŦD tr#16
8x8 1ŦD tr#14
8x8 1ŦD tr#15
8x8 1ŦD tr#13
8x8 1ŦD tr#9
8x8 1ŦD tr#10
8x8 1ŦD tr#11
8x8 1ŦD tr#12
8x8 1ŦD tr#8
350
8x8 1ŦD tr#6
8x8 1ŦD tr#7
300
8x8 1ŦD tr#5
250
8x8 1ŦD tr#3
8x8 1ŦD tr#4
150
200
Bitrate (kb/s)
8x8 1ŦD tr#2
100
4x4 1ŦD tr#8
50
8x8 2ŦD DCT
8x8 1ŦD tr#1
0
4x4 1ŦD tr#6
4x4 1ŦD tr#7
0
4x4 1ŦD tr#5
32
4x4 1ŦD tr#2
5
4x4ŦandŦ8x8Ŧdct
4x4ŦandŦ8x8Ŧ1D
4x4 1ŦD tr#3
4x4 1ŦD tr#4
PSNR (dB)
25
4x4 2ŦD DCT
4x4 1ŦD tr#1
PSNR (dB)
38
(b) RD-curves of encoders with 4x4- and 8x8-pixel block transforms.
(b) Probabilities at high picture qualities (QP is 24).
Fig. 4. Rate-distortion curves of several encoders for the foreman
sequence at QCIF resolution. Bitrate savings at higher bitrates are
larger than at smaller bitrates.
Fig. 5. Average probability of selection for each transform at (a) low
picture qualities and (b) high picture qualities for the 4x4-and-8x81D encoder.
6. REFERENCES
higher in 8x8-1D than in 4x4-1D.
4.4. Probability of selection of transforms
Information about how often each transform is selected is presented
in Figure 5. These numbers depend on the encoded sequence and
picture qualities. Figure 5 shows the probabilities obtained from all
sequences for the 4x4-and-8x8-1D encoder at low and high picture
qualities. It can be observed that the 2-D DCT’s are chosen more often than the other transforms. A closer inspection reveals that using
a 1-bit codeword to represent the 2-D DCT and a 4-bit codeword (5bit in case of 8x8-pixel transforms) to represent the 1-D transforms
is consistent with the numbers presented.
At low picture qualities, the probability of selection is 58% for
both 2-D DCT’s, and 42% for all 1-D transforms. At high picture
qualities, the probabilities are 38% for both 2-D DCT’s, and 62%
for all 1-D transforms. The 1-D transforms are chosen more often at
higher picture qualities. Choosing the 2-D DCT costs 1-bit, and any
of the 1-D transforms 4-bits (5-bits for 8x8-pixel block transforms).
This is a smaller cost for 1-D transforms at high bitrates relative to
the available bitrate.
5. CONCLUSIONS
We report experimental results that show achievable gains when 1-D
directional transforms are used in addition to the 2-D DCT within the
state-of-the-art H.264/AVC codec. Experimental results also provide
other useful information in understanding the use of 1-D transforms
within the codec, such as the bitrate savings at higher and lower picture qualities, the fraction of bitrate used to code the side information
which indicates selected transforms, and probabilities of selection
for all transforms. Future research efforts focus on designing more
efſcient coefſcient coding methods adapted to the characteristics of
1-D transforms, and on applying 1-D transforms on other prediction
residuals.
741
[1] F. Kamisli and J.S. Lim, “Transforms for the motion compensation residual,” Acoustics, Speech and Signal Processing, 2009.
ICASSP 2009. IEEE International Conference on, pp. 789–792,
April 2009.
[2] K.-C. Hui and W.-C. Siu, “Extended analysis of motioncompensated frame difference for block-based motion prediction error,” Image Processing, IEEE Transactions on, vol. 16,
no. 5, pp. 1232–1245, May 2007.
[3] E. Le Pennec and S. Mallat, “Sparse geometric image representations with bandelets,” Image Processing, IEEE Transactions
on, vol. 14, no. 4, pp. 423–438, April 2005.
[4] Bing Zeng and Jingjing Fu, “Directional discrete cosine transforms for image coding,” Multimedia and Expo, 2006 IEEE
International Conference on, pp. 721–724, 9-12 July 2006.
[5] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P.L.
Dragotti, “Directionlets: anisotropic multidirectional representation with separable ſltering,” Image Processing, IEEE Transactions on, vol. 15, no. 7, pp. 1916–1933, July 2006.
[6] C.-L. Chang and B. Girod, “Direction-adaptive discrete wavelet
transform for image compression,” Image Processing, IEEE
Transactions on, vol. 16, no. 5, pp. 1289–1302, May 2007.
[7] F. Kamisli and J.S. Lim, “Directional wavelet transforms for
prediction residuals in video coding,” to appear in the International Conference on Image Processing, ICIP, 2009.
[8] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra,
“Overview of the h.264/avc video coding standard,” Circuits
and Systems for Video Technology, IEEE Transactions on, vol.
13, no. 7, pp. 560–576, July 2003.
[9] G. Bjontegaard, “Calculation of average psnr differences between rd-curves,” VCEG Contribution VCEG-M33, April 2001.
Download