Optimizing the Deblocking Algorithm for H.264 Decoder Implementation

Ashwin Alapati (alapati@wisc.edu), Anandnayan Jayaraman (jayaraman2@wisc.edu)

1. Overview

This project aims to implement a deblocking loop filter efficiently on the PLX subword-parallel architecture [4]. Deblocking filters are widely used to improve the visual quality of decoded frames in the H.264 video coding standard. The project requires identifying the portions of the algorithm in which parallel processing can be exploited. We propose to compare the performance of a C implementation of a deblocking algorithm with that of its PLX counterpart.

2. Motivation

Deblocking filters attempt to remove the artifacts produced by block-based operations, chiefly the DCT and motion-compensated prediction. Although these filters greatly improve both the subjective and the objective quality of the output frames, they are generally computationally intensive. In fact, even after considerable effort has been spent on optimizing the speed of these filtering algorithms, the filter can easily account for one third of the computational complexity of a decoder [1]. This complexity stems mainly from the high adaptivity of the filter, which requires conditional processing at the block-edge and sample levels. Such conditional operations are known to be very time consuming and are also quite a challenge for parallel processing in DSP hardware.

3. Prior Art

A number of deblocking algorithms have been proposed for reducing the block artifacts in block-DCT-based compressed images with minimal smoothing of true edges. Three of the most popular techniques are the Projection onto Convex Sets (POCS) based iterative algorithm, the Weighted Sum of Symmetrically Aligned Pixels, and the Adaptive Deblocking Filter.

The POCS-based iterative algorithm [2] is implemented as a two-stage process.
The first stage band-limits the image by low-pass filtering. In the second stage, the image is transformed to obtain the transform coefficients, which are then subjected to quantization constraints.

In the Weighted Sum of Symmetrically Aligned Pixels algorithm [3], the value of each pixel in the picture is recomputed as a weighted sum of itself and the other pixel values that are symmetrically aligned with respect to the block boundaries.

In the Adaptive Deblocking Filter algorithm [1], the deblocking process is separated into two stages. In the first stage, each edge is classified into one of several boundary strengths, evaluated across the pixels along the normal to the edge. In the second stage, a different filtering scheme is applied according to the strength obtained in the first stage.

The control flow of each of these algorithms is highly iterative at the pixel, block, or edge level.

4. Approach

We plan to implement the deblocking filter algorithm in C and compile it for the Intel architecture. We will then implement the same algorithm using the PLX instruction set. Both implementations will be based on the Adaptive Deblocking Filter algorithm [1]. A detailed performance comparison will determine the computational speedup obtained by exploiting parallelism.

5. Expected Results

We expect the implementation of the deblocking algorithm on the PLX subword-parallel architecture to be considerably more efficient than the implementation on the general-purpose architecture. This is primarily because the deblocking filter repeats the same operations over many pixels, blocks, and edges, which is the kind of regular, iterative work that subword parallelism can accelerate.

6. Task Planning

We propose to follow these steps for an organized approach to completing the project.

1. Write C code that implements the deblocking filter algorithm.
2. Determine the computationally intensive portions of the algorithm that can be parallelized.
3. Review the PLX instruction set to determine which instructions are needed to implement the deblocking algorithm.
4. Implement the deblocking algorithm using the PLX instruction set.
5. Analyze the results and compare the performance of the two implementations.

References

[1] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, “Adaptive deblocking filter,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614-619, Jul. 2003.
[2] A. Zakhor, “Iterative procedures for reduction of blocking effects in transform image coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 2, no. 1, pp. 91-95, Mar. 1992.
[3] A. Z. Averbuch, A. Schlar, and D. L. Donoho, “Deblocking of block-transform compressed images using weighted sums of symmetrically aligned pixels,” IEEE Transactions on Image Processing, vol. 14, no. 2, pp. 200-212, Feb. 2005.
[4] R. B. Lee and A. M. Fiskiran, “PLX: An instruction set architecture and testbed for multimedia information processing,” Journal of VLSI Signal Processing, vol. 40, pp. 85-108, 2005.
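
As an illustration of the two-stage adaptive deblocking filter summarized in Section 3, and of the kind of C baseline proposed in Section 4, the following is a minimal sketch rather than the project's implementation. It makes several assumptions: the BlockInfo structure and all function names are hypothetical; the boundary-strength rules are a simplified reading of [1] in which the strength is derived from coding modes; the thresholds alpha, beta, and tc are passed in by the caller instead of being looked up from the quantization parameter; only the two samples nearest the edge are modified; and the stronger filter used for boundary strength 4 is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-block summary; field names are illustrative only. */
typedef struct {
    int is_intra;    /* nonzero if the block is intra coded */
    int has_coeffs;  /* nonzero if the block has coded residual coefficients */
    int ref_frame;   /* index of the reference frame used for prediction */
    int mv_x, mv_y;  /* motion vector, assumed to be in quarter-sample units */
} BlockInfo;

/* Stage 1 (simplified): classify the edge between blocks p and q.
 * Returns 0 (no filtering) up to 4 (strongest filtering). */
static int boundary_strength(const BlockInfo *p, const BlockInfo *q,
                             int is_mb_edge)
{
    if (p->is_intra || q->is_intra)
        return is_mb_edge ? 4 : 3;
    if (p->has_coeffs || q->has_coeffs)
        return 2;
    /* A difference of 4 quarter-samples is one full luma sample. */
    if (p->ref_frame != q->ref_frame ||
        abs(p->mv_x - q->mv_x) >= 4 || abs(p->mv_y - q->mv_y) >= 4)
        return 1;
    return 0;
}

/* Clamp v to the inclusive range [lo, hi]. */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Stage 2 (simplified): filter the two samples adjacent to the edge.
 * p1 p0 | q0 q1 are the samples along the normal to the edge.
 * Only boundary strengths 1..3 are handled; the stronger filter for
 * strength 4 is not shown in this sketch. */
static void filter_edge_samples(int bs, int alpha, int beta, int tc,
                                int p1, int *p0, int *q0, int q1)
{
    int delta;

    assert(tc >= 0);
    if (bs < 1 || bs > 3)
        return;
    /* Large steps are assumed to be true image edges and are left alone. */
    if (abs(*p0 - *q0) >= alpha ||
        abs(p1 - *p0) >= beta || abs(q1 - *q0) >= beta)
        return;

    /* Assumes an arithmetic right shift for negative values. */
    delta = clip3(-tc, tc, (((*q0 - *p0) * 4) + (p1 - q1) + 4) >> 3);
    *p0 = clip3(0, 255, *p0 + delta);
    *q0 = clip3(0, 255, *q0 - delta);
}
```

The data-dependent early returns in filter_edge_samples are an instance of the conditional processing that Section 2 identifies as the main obstacle to parallelization, so a loop over edges built from these two functions is a natural place to start looking for subword-parallel work.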