Optimizing the Deblocking Algorithm for
H.264 Decoder Implementation
Ashwin Alapati (alapati@wisc.edu), Anandnayan Jayaraman (jayaraman2@wisc.edu)
1. Overview
This project aims at an efficient implementation of a deblocking loop filter on the
PLX subword-parallel architecture [4]. Deblocking filters are widely used to
improve the visual quality of decoded frames in the H.264 standard. The project
requires identifying the portions of the algorithm in which parallel processing can
be exploited. We propose to compare the performance of a C implementation
of a deblocking algorithm with that of its PLX counterpart.
2. Motivation
Deblocking filters are used to improve the visual quality of decoded frames in the H.264
video coding standard. These filters attempt to remove the artifacts produced by
block-based operations, chiefly the DCT and motion-compensated prediction.
Although deblocking filters substantially improve the subjective and
objective quality of the output frames, they are generally computationally intensive. In
fact, even after considerable effort in optimizing the speed of these filtering
algorithms, the filter can easily account for one third of the computational complexity of
a decoder [1]. This complexity stems mainly from the high adaptivity of the filter, which
requires conditional processing at the block-edge and sample levels. Such conditional
operations are time consuming and are also a challenge for parallel processing in DSP
hardware.
3. Prior Art
A number of deblocking algorithms have been proposed for reducing the blocking
artifacts in block-DCT-based compressed images with minimal smoothing of true
edges. Three of the most popular techniques are the Projection onto Convex Sets
(POCS) based iterative algorithm, the Weighted Sum of Symmetrically Aligned Pixels, and
the Adaptive Deblocking Filter. The POCS based iterative algorithm [2] is implemented as a
two-stage process. The first stage band-limits the image by low-pass
filtering. The image is then transformed to obtain the transform coefficients, which
are subjected to quantization constraints. In the Weighted Sum of Symmetrically
Aligned Pixels [3], the value of each pixel in the picture is recomputed as a weighted
sum of itself and the other pixel values that are symmetrically aligned with respect to
the block boundaries. In the Adaptive Deblocking Filter algorithm [1], the deblocking
process is separated into two stages. In the first stage, each edge is assigned a
boundary strength based on the pixels along the normal to the edge. In the second stage,
a different filtering scheme is applied according to the strength obtained in stage one. The
flow of each of these algorithms is highly iterative at the pixel, block, or
edge level.
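The two stages of the adaptive approach can be illustrated with a minimal C sketch.
It is not the filter specified in [1] or in the H.264 standard: the BlockInfo structure
and the function names are hypothetical, the thresholds alpha, beta and tc are assumed to
be supplied by the caller (in a real decoder they depend on the quantization parameter),
and only the two samples nearest the edge are modified.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-block side information; field names are illustrative only. */
typedef struct {
    int is_intra;    /* block was intra coded                    */
    int has_coeffs;  /* block carries coded residual coefficients */
    int mv_x, mv_y;  /* motion vector in quarter-sample units     */
} BlockInfo;

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* floor(v / 8) for any sign, avoiding right shifts of negative values */
static int floor_div8(int v) { return v >= 0 ? v / 8 : -((-v + 7) / 8); }

/* Stage 1: classify the edge between blocks p and q (0 = no filtering). */
int boundary_strength(const BlockInfo *p, const BlockInfo *q, int on_mb_edge)
{
    if (p->is_intra || q->is_intra)
        return on_mb_edge ? 4 : 3;
    if (p->has_coeffs || q->has_coeffs)
        return 2;
    if (abs(p->mv_x - q->mv_x) >= 4 || abs(p->mv_y - q->mv_y) >= 4)
        return 1;
    return 0;
}

/* Stage 2: filter one line of samples p1 p0 | q0 q1 across the edge.
 * Returns 1 if p0 and q0 were modified, 0 if the line was left untouched. */
int filter_edge_line(int p1, unsigned char *p0, unsigned char *q0, int q1,
                     int bs, int alpha, int beta, int tc)
{
    int a = *p0, b = *q0;
    assert(bs >= 0 && bs <= 4);

    /* Large steps are assumed to be true image edges and are preserved. */
    if (bs == 0 || abs(a - b) >= alpha || abs(p1 - a) >= beta || abs(q1 - b) >= beta)
        return 0;

    if (bs == 4) {
        /* strongest setting: short smoothing filter on both sides */
        *p0 = (unsigned char)((2 * p1 + a + q1 + 2) / 4);
        *q0 = (unsigned char)((2 * q1 + b + p1 + 2) / 4);
    } else {
        /* weaker settings: bounded correction of the step across the edge */
        int delta = clip3(-tc, tc, floor_div8((b - a) * 4 + (p1 - q1) + 4));
        *p0 = (unsigned char)clip3(0, 255, a + delta);
        *q0 = (unsigned char)clip3(0, 255, b - delta);
    }
    return 1;
}
```

The data-dependent branches in filter_edge_line are the kind of sample-level conditional
processing that makes the filter hard to parallelize.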
4. Approach
We plan to implement the deblocking filter algorithm in the C language and
compile it for the Intel architecture. This will be followed by an implementation of the
same algorithm using the PLX architecture instruction set. Both implementations will be
based on the Adaptive Deblocking Filter algorithm [1]. A detailed performance comparison will
determine the effective computational speedup obtained by exploiting parallelism.
5. Expected Results
It is expected that the implementation of the deblocking algorithm on the PLX
subword-parallel architecture will be considerably more efficient than the
implementation on the general-purpose architecture. This is primarily due to the iterative
nature of the deblocking filter algorithm, which applies the same operations to many
pixels and therefore maps well onto subword parallelism.
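The idea behind subword parallelism can be sketched in portable C by packing four 8-bit
pixels into one 32-bit word and averaging all four pairs at once. This is only an
illustration written for this proposal: the function names are hypothetical and no PLX
instruction is used or assumed. A subword-parallel ISA would typically provide such an
operation as a single packed instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: average four byte pairs one at a time (rounding down). */
uint32_t scalar_avg_u8x4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFFu;
        uint32_t y = (b >> (8 * i)) & 0xFFu;
        r |= ((x + y) >> 1) << (8 * i);
    }
    return r;
}

/* Packed version: all four averages in a handful of word-wide operations.
 * Uses x + y = 2*(x & y) + (x ^ y); the mask stops bits from crossing
 * from one byte lane into its neighbour when shifting right. */
uint32_t packed_avg_u8x4(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}
```

Both functions return the same word for any inputs; the packed version replaces the
four-iteration loop with three logical operations, one shift and one addition.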
6. Task Planning
We propose to follow these steps to complete the project in an organized
manner.
1. Writing C code to implement the deblocking filter algorithm.
2. Determining the computationally intensive portions of the algorithm that can be
parallelized.
3. Reviewing the PLX instruction set to determine the instructions that will be
needed to implement the deblocking algorithm.
4. Implementing the deblocking algorithm using the PLX instruction set.
5. Analyzing the results obtained and comparing the performance of the two implementations.
References
[1] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, “Adaptive
deblocking filter,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 13, no. 7, pp. 614-619, Jul. 2003.
[2] A. Zakhor, “Iterative procedures for reduction of blocking effects in transform image
coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, no. 1,
pp. 91-95, Mar. 1992.
[3] A. Z. Averbuch, A. Schclar, and D. L. Donoho, “Deblocking of block-transform
compressed images using weighted sums of symmetrically aligned pixels,” IEEE
Transactions on Image Processing, vol. 14, no. 2, pp. 200-212, Feb. 2005.
[4] R. B. Lee and A. M. Fiskiran, “PLX: An instruction set architecture and
testbed for multimedia information processing,” Journal of VLSI Signal Processing,
vol. 40, pp. 85-108, 2005.