International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 12, December 2013 ISSN 2319 - 4847

IMPLEMENTATION OF DEBLOCKING FILTER ALGORITHM USING RECONFIGURABLE ARCHITECTURE

C. Karthikeyan¹ and Dr. Rangachar²

¹ Assistant Professor, Department of ECE, MNM Jain Engineering College, Chennai; Part Time Research Scholar, Hindustan University, Chennai, Tamilnadu, India
² Senior Professor, Dean for School of Electrical Science, Hindustan University, Chennai, Tamilnadu, India

ABSTRACT
H.264 is an international standard for the compression of video images. Blocking artifacts are among the artifacts introduced by video and image compression coding, and they reduce the picture quality of the reconstructed images and video. Deblocking filters are used to remove these artifacts and improve the quality of the received picture. Several algorithms have been proposed by researchers; this paper presents a deblocking algorithm to remove the artifacts and proposes a hardware implementation of the same algorithm. Clock gating is introduced to reduce the power consumption of the hardware implementation. Clock gating achieves a 30% power reduction at the cost of a 2.3% increase in hardware and a 5.8% reduction in clock speed.

Keywords: Deblocking filter, blocking artifacts, FPGA, loop filter

1. INTRODUCTION
The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG has finalized a standard for the compression of natural video images, known as H.264 or MPEG-4 Part 10, "Advanced Video Coding" [4,10]. This standard offers a significant improvement in coding efficiency compared to other compression standards such as MPEG-2. The basic functional blocks of an H.264/AVC encoder are shown in Figure 1.
Figure 1 H.264 Encoder block (blocks shown: Video Source, Intra/Inter selection, Transform, Quantization, Coefficient Scanning, Entropy Coding, Bitstream, Motion Estimation, Motion Compensation, Motion Vector, Inverse Quantization, Inverse Transform, Intra Frame Prediction, In-Loop Filter, Frame Buffer)

Figure 2(b) shows the visible discontinuities along the block boundaries caused by low bit rate quantization, motion compensation and block-based transformation. Figure 2(a) shows the original image before quantization and Figure 2(b) shows the compressed image. In the motion-compensated prediction process, artificial discontinuities also appear in the inner part of the blocks. The quality of the picture may be improved by removing the blocking artifacts.

Figure 2 (a) The original image (b) The highly compressed image

Various deblocking algorithms have been proposed to remove the blocking artifacts. They fall into four types: 1) in-loop filtering, 2) pre-processing, 3) post-processing and 4) overlapped block methods. The H.264/AVC video codec contains an in-loop deblocking filter in both the encoder and the decoder. Post-processing applies lowpass filters after the video image has been decoded in order to improve the pixel and video quality. Pre-processing algorithms are also used to improve the quality of the image. The overlapped block methods include the lapped orthogonal transform (LOT), whose transform bases overlap one another, and overlapped block motion compensation (OBMC), which considers the neighbouring blocks for motion estimation and motion compensation in video coding [1]. Because a post filter operates outside the coding loop, it does not improve the quality of the reference pictures used for prediction.
In order to improve the quality of the picture, the deblocking filter process is included in the coding loop, so that the past reference frames are filtered versions of the reconstructed image [6]. The deblocking filter thereby improves the coding performance of H.264. Each 16x16 macroblock is split into 4x4 sub-blocks, and the filter is applied to the horizontal and vertical edges of the 4x4 blocks [3]. The adaptive deblocking filter achieves a high degree of content adaptivity at several levels, based on the motion vectors, the inter or intra mode of the macroblock, the pixel values and the quantization parameter [3]. The deblocking filter adjusts adaptively to the quantization step, so the artifacts are reduced without affecting the sharpness of the image.

Section 2 explains the in-loop filtering algorithm used to remove the artifacts. Section 3 describes the hardware implementation of in-loop filtering. Section 4 discusses the implementation of the in-loop filter on an FPGA and its results. Section 5 contains the conclusions.

Figure 3 Horizontal and Vertical Edges of 4 x 4 Blocks in a Macroblock

2. DEBLOCKING FILTER ALGORITHM
The deblocking process can be separated into two stages. In the first stage, the edges are classified into different edge strengths according to the pixel values along the normal to the edges. In the second stage, different filtering schemes are applied according to the strengths obtained in stage one. In [2,9], the edges are classified into five types: no filter; weak 1, 2 and 3, which use a 4-tap filter; and strong, which uses 3-, 4- and 5-tap filters. The thresholds used in the filters depend on the quantization parameters of the corresponding blocks. In order to reduce the computational complexity, the filtering is applied only to the pixels adjacent to the edges.
The filter is strong if the blocks on the sides of the edge contain high detail. Edges across high-detail blocks are filtered as the threshold increases with the quantization parameter.

The deblocking filter takes in the boundary strength (BS), certain threshold values and the pixels that are to be filtered. Each 4×4 sub-block inside a macroblock has its vertical and horizontal edges filtered [2]. To filter each edge, eight pixels are required (see Figure 5): four current pixels (q0, q1, q2, q3) and four reference pixels (p0, p1, p2, p3). Based on the pixel, threshold and boundary strength values, pixels p0 to p2 and q0 to q2 may be modified. Due to the way the filtering process is defined, pixels p3 and q3 remain unfiltered. A pixel can be filtered as many as four times because of the overlap in filtering between edges, and between the vertical and horizontal filters. Chroma samples are filtered in the same manner as luma [5]. Figure 4 shows the luma and chroma components. The basic filtering order, as defined for H.264, is shown in Table 1.

Figure 4 (a) Luma component (b) Chroma component

Table 1 Basic Filtering Orders

Sl.No | BS value   | Operation
1     | 0          | No filtering
2     | 1, 2 and 3 | 4-tap filter is applied, producing p0, q0 and possibly p1 and q1 (depending on α and β)
3     | 4          | 3-, 4- or 5-tap linear filter may be applied, producing p0, q0 and possibly p1 and q1 (depending on α and β)

Figure 5 One-dimensional visualization of a block edge in a typical situation where the filter would be turned on [3]

When BS is equal to 1, 2 or 3, two additional threshold values are calculated, tc and tc0. tc0 is a threshold value defined by the H.264 standard, and tc is then calculated from tc0.
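Read as a dispatch on the BS value, Table 1 can be sketched in Python as below. This is an illustrative sketch rather than the authors' VHDL: the names `FilterMode`, `select_filter_mode` and `split_edge_pixels` are hypothetical, and the sample ordering p3 p2 p1 p0 | q0 q1 q2 q3 along the line crossing the edge is an assumption.

```python
from enum import Enum


class FilterMode(Enum):
    """Filter classes of Table 1 (names are illustrative)."""
    NONE = "no filtering"
    NORMAL = "4-tap filter"
    STRONG = "3-, 4- or 5-tap filter"


def select_filter_mode(bs: int) -> FilterMode:
    """Map a boundary strength (0..4) to the filter class of Table 1."""
    if not 0 <= bs <= 4:
        raise ValueError("boundary strength must be in the range 0..4")
    if bs == 0:
        return FilterMode.NONE
    if bs == 4:
        return FilterMode.STRONG
    return FilterMode.NORMAL  # BS = 1, 2 or 3


def split_edge_pixels(line):
    """Split eight samples crossing an edge into the p and q sides.

    Assumed order of `line`: p3 p2 p1 p0 | q0 q1 q2 q3.
    Returns (p, q) with index 0 nearest to the edge, so p[0] is p0.
    """
    if len(line) != 8:
        raise ValueError("an edge line must hold exactly eight samples")
    p = list(reversed(line[:4]))
    q = list(line[4:])
    return p, q
```

For example, `split_edge_pixels([10, 11, 12, 13, 20, 21, 22, 23])` gives p = [13, 12, 11, 10] and q = [20, 21, 22, 23], so the index of each list is the distance of the sample from the edge.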
The threshold tc is calculated as

$$t_c = t_{c0} + x$$

where, in the H.264 standard, x counts the interior-smoothness conditions that hold for luma samples and is fixed at 1 for chroma samples:

$$x = \begin{cases} [\,|p_2 - p_0| < \beta\,] + [\,|q_2 - q_0| < \beta\,], & \text{luma} \\ 1, & \text{chroma} \end{cases}$$

Here $[\cdot]$ is 1 when the condition holds and 0 otherwise.

Once tc and tc0 are calculated, the filtered samples p0' and q0' are calculated as

$$p_0' = \mathrm{Clip1}(p_0 + \Delta), \qquad q_0' = \mathrm{Clip1}(q_0 - \Delta)$$

where Δ is defined as

$$\Delta = \mathrm{Clip3}\big(-t_c,\ t_c,\ (((q_0 - p_0) \ll 2) + (p_1 - q_1) + 4) \gg 3\big)$$

Clip1 and Clip3 are clipping functions that limit the range of the filtered samples so that too much filtering does not occur on a boundary. If the change in intensity is low on either or both sides, stronger filtering is applied, resulting in a smoother final image. If sharp changes occur on the ends, less filtering is required, preserving image sharpness. For 8-bit samples, the two functions are

$$\mathrm{Clip1}(z) = \mathrm{Clip3}(0,\ 255,\ z)$$

$$\mathrm{Clip3}(a, b, c) = \begin{cases} a, & c < a \\ b, & c > b \\ c, & \text{otherwise} \end{cases}$$

Filtered samples p1' and q1' are calculated in a similar manner. For the 4-tap filter to be applied to sample q1, condition 2.1 must be satisfied. Similarly, for the 4-tap filter to be applied to sample p1, condition 2.2 must be satisfied.

$$|q_2 - q_0| < \beta \tag{2.1}$$

$$|p_2 - p_0| < \beta \tag{2.2}$$

If 2.2 is satisfied and luma samples are present, then p1' is calculated according to 2.3. Otherwise, if 2.2 is not met or chroma samples are present, p1' is calculated according to 2.4.

$$p_1' = p_1 + \mathrm{Clip3}\big(-t_{c0},\ t_{c0},\ (p_2 + ((p_0 + q_0 + 1) \gg 1) - (p_1 \ll 1)) \gg 1\big) \tag{2.3}$$

$$p_1' = p_1 \tag{2.4}$$

Similarly, if 2.1 is satisfied and luma samples are present, then q1' is calculated according to 2.5. Otherwise, if 2.1 is not met or chroma samples are present, q1' is calculated according to 2.6.

$$q_1' = q_1 + \mathrm{Clip3}\big(-t_{c0},\ t_{c0},\ (q_2 + ((p_0 + q_0 + 1) \gg 1) - (q_1 \ll 1)) \gg 1\big) \tag{2.5}$$

$$q_1' = q_1 \tag{2.6}$$

The values of q2' and p2' are set to the incoming values of q2 and p2 respectively.

When filtering with a BS of 4, two filters may be used depending on the sample content. For luma pixels, a very strong 4- or 5-tap filter, which modifies the edge values and two interior samples, is used if the condition |p0 − q0| < α/4 + 2 is met.
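The BS = 1, 2 or 3 path described above can be sketched in Python as follows. This is a minimal sketch built on the H.264 formulas, not the authors' hardware description: the function names, the list layout (index 0 nearest to the edge) and the 8-bit default are assumptions, and the caller is assumed to have already decided from α and β that the edge should be filtered.

```python
def clip3(lo, hi, x):
    """Clip x to the closed range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def clip1(x, bit_depth=8):
    """Clip x to the valid sample range (0..255 for 8-bit samples)."""
    return clip3(0, (1 << bit_depth) - 1, x)


def filter_normal(p, q, tc0, beta, luma=True):
    """Filter one line of samples across an edge with BS = 1, 2 or 3.

    p and q hold the samples on each side, index 0 nearest to the edge
    (p[0] is p0, q[0] is q0). Luma needs at least three samples per side,
    chroma at least two. Returns new (p, q) lists; inputs are not modified.
    """
    if luma:
        smooth_p = abs(p[2] - p[0]) < beta  # condition 2.2
        smooth_q = abs(q[2] - q[0]) < beta  # condition 2.1
        tc = tc0 + int(smooth_p) + int(smooth_q)
    else:
        smooth_p = smooth_q = False
        tc = tc0 + 1

    # Python's >> on negative integers is an arithmetic shift, matching
    # the two's-complement shift assumed by the formulas.
    delta = clip3(-tc, tc, (((q[0] - p[0]) << 2) + (p[1] - q[1]) + 4) >> 3)

    new_p, new_q = list(p), list(q)
    new_p[0] = clip1(p[0] + delta)
    new_q[0] = clip1(q[0] - delta)

    avg = (p[0] + q[0] + 1) >> 1
    if smooth_p:  # equation 2.3, otherwise p1 passes through (2.4)
        new_p[1] = p[1] + clip3(-tc0, tc0, (p[2] + avg - (p[1] << 1)) >> 1)
    if smooth_q:  # equation 2.5, otherwise q1 passes through (2.6)
        new_q[1] = q[1] + clip3(-tc0, tc0, (q[2] + avg - (q[1] << 1)) >> 1)
    return new_p, new_q
```

As a worked case, with p = [60, 58, 57, 56], q = [70, 72, 73, 74], tc0 = 2 and β = 6, both smoothness conditions hold, so tc = 4 and Δ = 3. The step of 10 across the edge shrinks to 4 (p0' = 63, q0' = 67), while p1 and q1 each move by the tc0 limit of 2 (p1' = 60, q1' = 70).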
The strong-filter condition is referred to as 2.7:

$$|p_0 - q_0| < \alpha/4 + 2 \tag{2.7}$$

If conditions 2.1 and 2.7 are not both met, a 3-tap filter is used to calculate q0', and the values of q1 and q2 pass through the filter. Similarly, if 2.2 and 2.7 are not both met, a 3-tap filter is used to calculate p0', and p1 and p2 pass through the filter. These calculations are

$$p_0' = (2p_1 + p_0 + q_1 + 2) \gg 2, \qquad p_1' = p_1, \qquad p_2' = p_2$$

$$q_0' = (2q_1 + q_0 + p_1 + 2) \gg 2, \qquad q_1' = q_1, \qquad q_2' = q_2$$

If the conditions of equations 2.1, 2.2 and 2.7 are met, the filtered values of p0' to p2' and q0' to q2' are calculated as

$$p_0' = (p_2 + 2p_1 + 2p_0 + 2q_0 + q_1 + 4) \gg 3$$

$$p_1' = (p_2 + p_1 + p_0 + q_0 + 2) \gg 2$$

$$p_2' = (2p_3 + 3p_2 + p_1 + p_0 + q_0 + 4) \gg 3$$

$$q_0' = (p_1 + 2p_0 + 2q_0 + 2q_1 + q_2 + 4) \gg 3$$

$$q_1' = (p_0 + q_0 + q_1 + q_2 + 2) \gg 2$$

$$q_2' = (2q_3 + 3q_2 + q_1 + q_0 + p_0 + 4) \gg 3$$

3. FPGA IMPLEMENTATION OF DEBLOCKING FILTER
Eight pixels enter the filter hardware unit, which is built using an FPGA. From there, the pixels are sent to the calculation modules. Two calculation modules exist in the filter core: one for edges with boundary strengths between one and three, and one for edges with a boundary strength of four. A bypass path is used when the filter is disabled or no filtering is needed. Two additional modules are present in the core; they generate the alpha, beta, tc0 and boundary strength (BS) parameters. The filter calculation module carries out filtering on a row of eight pixel values. The amount of filtering that takes place depends on the input BS, alpha, beta and the input pixels. The output of this logic block is eight filtered pixels. The interface is the same for both the horizontal and vertical filtering cores.

Several techniques have been proposed to address the power issue. Among these techniques, clock gating is one of the most effective. Logic gates inserted into the clock path switch off the clock to circuits while they are not in use. There are two types of clock gating: register-based and module-based. Figure 6 illustrates these two types of clock gating.

Figure 6 Clock gating

4. EXPERIMENTAL RESULTS
The deblocking filter algorithm is developed in the VHDL hardware description language, and the functional verification of the filter is done by simulation using ModelSim. The algorithm is implemented on an FPGA using the Xilinx EDA tool. The FPGA used for the hardware implementation is a Spartan-3E with a capacity of 500K gates. Figure 7 shows the simulation result of the deblocking filter. The RTL views of the algorithm, obtained using Xilinx, are shown in Figures 8 and 9.

Figure 7 Simulation result of Deblocking filter Algorithm

Figure 8 RTL view Deblocking filter Algorithm

Figure 9 Detailed RTL view Deblocking filter Algorithm

The hardware implementation of the deblocking filter algorithm without power reduction techniques consumes 263 of the 4656 slices in the Spartan-3E FPGA. The power reduction implementation consumes 269 of the 4656 slices. The maximum operating frequency of the hardware implementation is 80.563 MHz; the power reduction implementation lowers it by 4.672 MHz. Table 2 compares the hardware implementation with and without power reduction.
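As a quick arithmetic check, the percentage variations can be recomputed from the reported figures: the slice counts and frequencies quoted above, and the LUT, flip-flop and power values from the comparison table. The helper `percent_change` and the dictionary layout are illustrative choices, not part of the original work.

```python
def percent_change(before, after):
    """Signed change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100.0


# (without clock gating, with clock gating), as reported in the paper
results = {
    "slices": (263, 269),
    "luts": (501, 523),
    "flip_flops": (163, 171),
    "max_freq_mhz": (80.563, 75.891),
    "power_mw": (81.0, 56.7),
}

changes = {name: round(percent_change(before, after), 2)
           for name, (before, after) in results.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.2f} %")
```

The recomputed values agree with the reported ones to the stated precision: roughly +2.3% slices, +4.39% LUTs, +4.91% flip-flops, a 5.8% drop in maximum frequency and a 30% drop in power.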
Table 2 Comparison of hardware implementation with and without power reduction

Description                | Without power reduction | With power reduction using clock gating | % variation
No. of slices              | 263 out of 4656         | 269 out of 4656                         | 2.3% increased
Number of 4-input LUTs     | 501 out of 9312         | 523 out of 9312                         | 4.39% increased
Number of slice flip-flops | 163 out of 9312         | 171 out of 9312                         | 4.91% increased
Number of bonded IOBs      | 148 out of 232          | 148 out of 232                          | nil
Maximum frequency          | 80.563 MHz              | 75.891 MHz                              | 5.8% decreased
Power consumption          | 81 mW                   | 56.7 mW                                 | 30% decreased

Figure 10 compares the area, speed and power of the hardware implementation of the deblocking filter algorithm with and without power reduction.

Figure 10 Comparison chart for Area, Speed, Power

5. CONCLUSION
In this paper, a deblocking filter algorithm for real-time H.264 video encoding and decoding is discussed. The algorithm is implemented using a field programmable gate array. During implementation, hardware blocks are used efficiently to reduce the power consumption of the deblocking filter, and clock gating, a low-power technique, is used to reduce it further. With these techniques, the power reduction implementation incurs a small increase in area and a slight reduction in speed. Clock gating achieves a 30% power reduction at the cost of a 2.3% increase in hardware and a 5.8% reduction in clock speed.

References
[1] Shen-Yu Shih, Cheng-Ru Chang and Youn-Long Lin, "A Near Optimal Deblocking Filter for H.264 Advanced Video Coding", IEEE international conference, 2006.
[2] Bolla Leela Naresh, N.V. Narayana Rao and Addanki Purna Ramesh, "FPGA Implementation of Deblocking Filter Custom Instruction Hardware on Nios-II Based SoC", International Journal of VLSI design & Communication Systems (VLSICS), Vol. 2, No. 4, December 2011.
[3] Peter List, Anthony Joch, Jani Lainema, Gisle Bjøntegaard and Marta Karczewicz, "Adaptive Deblocking Filter", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, July 2003.
[4] Mustafa Parlak and Ilker Hamzaoglu, "A Low Power Implementation of H.264 Adaptive Deblocking Filter Algorithm", Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007).
[5] Brian Dickey, "Hardware Implementation of a High Speed Deblocking Filter for the H.264 Video Codec", MS thesis, 2012.
[6] Tsu-Ming Liu, Wen-Ping Lee and Chen-Yi Lee, "An In/Post-Loop Deblocking Filter With Hybrid Filtering Schedule", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 7, July 2007.
[7] Kyu-Yeul Wang, Byung-Soo Kim, Sang-Seol Lee, Young-Jun Kim, Bo-Keun Choi and Duck-Jin Chung, "3 Stage Pipelined Deblocking Filter for H.264/AVC", World Academy of Science, Engineering and Technology 38, 2010.
[8] S. Vijay, C. Chakrabarti and L.J. Karam, "Parallel Deblocking Filter for H.264 AVC/SVC", IEEE international conference, 2010.
[9] Jung-Ah Choi and Yo-Sung Ho, "Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding", Gwangju Institute of Science and Technology (GIST), 2008, pp. 138–147.
[10] Iain E.G. Richardson, "H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia", John Wiley & Sons, Jan. 2004.

C. Karthikeyan received the B.E. degree from Bharathiyar University and the M.E. (Applied Electronics) from Anna University in 2007. He is now pursuing a Ph.D. programme at Hindustan University, India, in the area of low power technologies for video codec VLSI design.