IMPLEMENTATION OF DEBLOCKING FILTER ALGORITHM USING RECONFIGURABLE ARCHITECTURE

International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 12, December 2013
ISSN 2319 - 4847
C. Karthikeyan¹ and Dr. Rangachar²
¹ Assistant Professor, Department of ECE, MNM Jain Engineering College, Chennai,
Part Time Research Scholar, Hindustan University, Chennai, Tamilnadu, India
² Senior Professor, Dean for School of Electrical Science,
Hindustan University, Chennai, Tamilnadu, India
ABSTRACT
H.264 is an international standard for the compression of video images. Blocking artifacts are among the artifacts
introduced by video and image compression coding, and they reduce the picture quality of the reconstructed images and
video. Deblocking filters are used to remove these artifacts and improve the quality of the received picture. Several
deblocking algorithms have been proposed by researchers; this paper presents one such algorithm and proposes a hardware
implementation of it. To reduce the power consumption of the hardware implementation, clock gating is introduced.
Clock gating achieved a 30% power reduction at the cost of a 2.3% increase in hardware and a 5.8% decrease in
clock speed.
Keywords: Deblocking filter, Blocking artifacts, FPGA, Loop filter
1. INTRODUCTION
The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG has finalized a new standard for the compression of
natural video images and it is known as H.264 and MPEG-4 Part 10, “Advanced Video Coding” [4,10]. This new
standard offers a significant improvement on coding efficiency compared to other compression standards such as
MPEG-2. The basic functional blocks of the H.264/AVC encoder are shown in Figure 1.
Figure 1 H.264 Encoder block (blocks shown: Video Source, Intra/Inter, Transform, Quantization, Coefficient Scanning,
Entropy Coding, Bitstream, Motion Estimation, Motion Compensation, Motion Vector, Inverse Quantization, Inverse
Transform, Intra Frame Prediction, In-Loop Filter, Frame Buffer)
Figure 2(a) shows the original image before quantization and Figure 2(b) shows the compressed image. Figure 2(b)
exhibits visible discontinuities along the block boundaries, caused by low bit rate quantization, motion compensation
and block based transformation. In the motion-compensated prediction process, artificial discontinuities also appear in
the inner part of the blocks. The quality of the picture may be improved by removing the blocking artifacts.
Figure 2 (a) The original image (b) The highly compressed image
Various deblocking algorithms have been proposed to remove the blocking artifacts. They fall into four types:
1) in-loop filtering, 2) pre-processing, 3) post-processing and 4) overlapped block methods. The video codec used in
H.264/AVC contains an in-loop filtering algorithm, with a deblocking filter in both the encoder and the decoder.
Post-processing methods apply lowpass filters after the video image has been decoded to improve the pixel and video
quality. Pre-processing algorithms are also used to improve the quality of the image. The overlapped block methods
include the lapped orthogonal transform (LOT), whose transform bases overlap each other, and overlapped block motion
compensation (OBMC), which considers the neighbouring blocks for motion estimation and motion compensation in video
coding [1].
Post filtering operates outside the coding loop, so it cannot improve the reference frames used for prediction. To
improve the quality of the picture further, the deblocking filter process is included in the coding loop, so that the
past reference frames are filtered versions of the reconstructed image [6].
The deblocking filter improves the coding performance of H.264. Each 16x16 macroblock is split into 4x4 sub-blocks,
and the filter process is applied to the horizontal and vertical edges of the 4x4 blocks [3].
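The edge grid described above can be enumerated with a few lines of Python. This is only an illustrative sketch: the function name `block_edges` and the (direction, offset) representation are assumptions made for this example, not part of the paper's design.

```python
MB_SIZE = 16     # a luma macroblock is 16x16 samples
BLOCK_SIZE = 4   # filtering works on a 4x4 grid

def block_edges(mb_size=MB_SIZE, block_size=BLOCK_SIZE):
    """Return (vertical, horizontal) edge offsets inside one macroblock.

    Offset 0 is the macroblock boundary itself; the remaining offsets
    are the internal 4x4 block edges.
    """
    offsets = list(range(0, mb_size, block_size))
    vertical = [("V", x) for x in offsets]    # edge between columns x-1 and x
    horizontal = [("H", y) for y in offsets]  # edge between rows y-1 and y
    return vertical, horizontal

vertical, horizontal = block_edges()
print(vertical)    # [('V', 0), ('V', 4), ('V', 8), ('V', 12)]
print(horizontal)  # [('H', 0), ('H', 4), ('H', 8), ('H', 12)]
```

Each macroblock therefore contributes four vertical and four horizontal luma edges, each 16 samples long.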
The adaptive deblocking filter achieves content adaptivity at several levels, depending on the motion vectors, the inter
or intra mode of the macroblock, the pixel values and the quantization parameter [3]. The deblocking filter adjusts
itself adaptively according to the quantization step. As a result, the artifacts are reduced without affecting the
sharpness of the image.
Section 2 explains the in-loop filtering algorithm used to remove the artifacts. Section 3 describes the hardware
implementation of in-loop filtering. Section 4 discusses the implementation of the in-loop filter in FPGA and the
results. Section 5 contains the conclusions.
Figure 3 Horizontal and Vertical Edges of 4 x 4 Blocks in a Macroblock
2. DEBLOCKING FILTER ALGORITHM
The deblocking process can be separated into two stages. In the first stage, the edges are classified into different edge
strengths according to the pixel values along the normal to the edges. In the second stage, different filtering schemes are
applied according to the strengths obtained in stage one. In [2,9], the edges are classified into five types: no
filtering, three weak types (1, 2 and 3), which use a 4-tap filter, and one strong type, which uses a 3-, 4- or 5-tap
filter.
The thresholds used in the filters depend on the quantization parameters of the corresponding blocks. To reduce the
computational complexity, the filtering is applied only to the samples on either side of an edge. The filter will be
strong if the sides of the edge contain high detail blocks. Edges across high detail blocks will be filtered as the
threshold increases with the quantization parameter.
The deblocking filter takes in the boundary strength (BS), certain threshold values and the pixels that are to be
filtered. Each 4×4 sub-block inside a macroblock has its vertical and horizontal edges filtered [2]. To filter
each edge, eight pixels are required (see figure 2.2): four current pixels (q0, q1, q2, q3) and four reference pixels
(p0, p1, p2, p3). Based on the pixel, threshold and boundary strength values, pixels p0−p2 and q0−q2 may be modified.
Due to the way the filtering process is defined, pixels p3 and q3 remain unfiltered.
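As an illustration of the eight-pixel window, the following Python sketch gathers p0..p3 and q0..q3 for one row across a vertical edge. The helper name and the convention that the p samples lie to the left of the edge and the q samples to the right are assumptions made for this example.

```python
def samples_across_vertical_edge(frame, row, edge_x):
    """Return (p, q) for one row across the vertical edge at column edge_x.

    p[0] is the sample nearest the edge on the left (reference) side and
    q[0] is the nearest sample on the right (current) side.
    """
    p = [frame[row][edge_x - 1 - i] for i in range(4)]  # p0..p3
    q = [frame[row][edge_x + i] for i in range(4)]      # q0..q3
    return p, q

# Toy 4x16 frame whose sample value equals its raster index.
frame = [list(range(r * 16, r * 16 + 16)) for r in range(4)]
p, q = samples_across_vertical_edge(frame, row=0, edge_x=4)
print(p, q)  # [3, 2, 1, 0] [4, 5, 6, 7]
```

For a horizontal edge the same indexing applies with rows and columns swapped.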
Pixels can be filtered as many as four times due to overlap in filtering between edges, and between vertical and horizontal
filters. Chroma samples are filtered in the same manner as luma [5]. Figure 4 shows the luma and chroma components.
The basic filtering order, as defined for H.264, is shown in Table 1.
Figure 4 (a) Luma component (b) Chroma component
Table 1 Basic Filtering Orders
Sl.No | BS value   | Operation
1     | 0          | No filtering
2     | 1, 2 and 3 | 4-tap filter is applied, producing p0, q0 and possibly p1 and q1 (depending on α and β)
3     | 4          | 3-, 4- or 5-tap linear filter may be applied, producing p0, q0 and possibly p1 and q1 (depending on α and β)
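The selection in Table 1 amounts to a simple dispatch on the boundary strength. The sketch below is a hypothetical software view of that selection; the mode names "none", "normal" and "strong" are labels chosen for this example.

```python
def filter_mode(bs):
    """Map a boundary strength (0..4) to the operation listed in Table 1."""
    if bs == 0:
        return "none"      # no filtering
    if 1 <= bs <= 3:
        return "normal"    # 4-tap filter
    if bs == 4:
        return "strong"    # 3-, 4- or 5-tap filter
    raise ValueError(f"boundary strength out of range: {bs}")

print([filter_mode(bs) for bs in range(5)])
# ['none', 'normal', 'normal', 'normal', 'strong']
```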
Figure 5 One-dimensional visualization of a block edge in a typical situation where the filter would be turned on [3]
When BS is equal to 1, 2 or 3, two additional threshold values are used, tc and tco. tco is a threshold value defined by
the H.264 standard, and tc is calculated from tco as follows:
tc = tco + x
where x is defined, following the H.264 standard, as the number of the conditions |p2−p0| < β and |q2−q0| < β that
are satisfied (0, 1 or 2) for luma samples, and x = 1 for chroma samples.
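The threshold calculation can be sketched in Python as follows. The sketch assumes the definition of x used by the H.264 standard (for luma, the number of the conditions |p2−p0| < β and |q2−q0| < β that hold; for chroma, x = 1); the function name and argument layout are hypothetical.

```python
def compute_tc(tc0, p, q, beta, is_luma=True):
    """Return tc = tc0 + x for p = [p0, p1, p2, p3] and q = [q0, q1, q2, q3].

    tc0 corresponds to tco in the text.
    """
    if not is_luma:
        return tc0 + 1                      # chroma: x is always 1
    x = int(abs(p[2] - p[0]) < beta) + int(abs(q[2] - q[0]) < beta)
    return tc0 + x

# |p2 - p0| = 2 < 4 holds, |q2 - q0| = 10 < 4 does not, so x = 1.
print(compute_tc(2, [60, 61, 62, 63], [70, 70, 80, 80], beta=4))  # 3
```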
Once tc and tco are known, the filtered samples p0' and q0' are calculated as
p0' = Clip1(p0 + ∆) and q0' = Clip1(q0 − ∆),
where ∆ is defined as
∆ = Clip3(−tc, tc, (((q0 − p0) << 2) + (p1 − q1) + 4) >> 3)
Clip1 and Clip3 are clipping functions that are used to specify a maximum range for the filtered samples so too much
filtering does not occur on a boundary. If the change in intensity is low on either or both sides, stronger filtering is
applied, resulting in a smoother final image. If sharp changes are occurring on the ends, less filtering is required,
preserving image sharpness. These two functions are
Clip1(z) = Clip3(0, 255, z) for 8-bit samples
and
Clip3(a, b, c) = a if c < a; b if c > b; c otherwise
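A minimal Python sketch of the two clipping functions and of the p0/q0 update for BS 1 to 3 is given below. It assumes 8-bit samples and the ∆ expression of the H.264 standard; the function names are chosen for this example and are not taken from the paper's VHDL.

```python
def clip3(a, b, c):
    """Clamp c to the range [a, b]."""
    return a if c < a else b if c > b else c

def clip1(z, bit_depth=8):
    """Clamp z to the valid sample range (0..255 for 8-bit video)."""
    return clip3(0, (1 << bit_depth) - 1, z)

def filter_p0_q0(p0, p1, q0, q1, tc):
    """Normal-mode (BS 1..3) update of the two samples nearest the edge."""
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip1(p0 + delta), clip1(q0 - delta)

# Raw delta is (40 - 10 + 4) >> 3 = 4, which is clipped to tc = 3.
print(filter_p0_q0(p0=60, p1=60, q0=70, q1=70, tc=3))  # (63, 67)
```

The clipping to ±tc is what limits the correction on a strong step, so genuine image edges are not smoothed away.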
Filtered samples p1' and q1' are calculated in a similar manner. For a 4-tap filter to be applied to sample q1,
equation 2.1 must be satisfied. Similarly, for a 4-tap filter to be used on sample p1, equation 2.2 must be satisfied.
|q2−q0| < β    (2.1)
|p2−p0| < β    (2.2)
If 2.2 is satisfied and luma samples are present then p1' is calculated according to 2.3. Otherwise, if 2.2 is not met or
chroma samples are present then p1' is calculated according to 2.4.
p1' = p1 + Clip3(−tco, tco, (p2 + ((p0 + q0 + 1) >> 1) − (p1 << 1)) >> 1)    (2.3)
p1' = p1    (2.4)
Similarly, if 2.1 is satisfied and luma samples are present then q1' is calculated according to 2.5. Otherwise, if 2.1
is not met or chroma samples are present then q1' is calculated according to 2.6.
q1' = q1 + Clip3(−tco, tco, (q2 + ((p0 + q0 + 1) >> 1) − (q1 << 1)) >> 1)    (2.5)
q1' = q1    (2.6)
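The p1'/q1' rules can be sketched in Python as shown below. The Clip3 argument follows the H.264 standard, with `tc0` standing for tco in the text; the function names and the `is_luma` flag are assumptions made for this example.

```python
def clip3(a, b, c):
    """Clamp c to the range [a, b]."""
    return max(a, min(b, c))

def filter_p1(p0, p1, p2, q0, tc0, beta, is_luma=True):
    """Return p1' (equation 2.3 if condition 2.2 holds for luma, else 2.4)."""
    if is_luma and abs(p2 - p0) < beta:
        return p1 + clip3(-tc0, tc0, (p2 + ((p0 + q0 + 1) >> 1) - (p1 << 1)) >> 1)
    return p1

def filter_q1(p0, q0, q1, q2, tc0, beta, is_luma=True):
    """Return q1' (equation 2.5 if condition 2.1 holds for luma, else 2.6)."""
    if is_luma and abs(q2 - q0) < beta:
        return q1 + clip3(-tc0, tc0, (q2 + ((p0 + q0 + 1) >> 1) - (q1 << 1)) >> 1)
    return q1

# (59 + 65 - 116) >> 1 = 4 is clipped to tc0 = 2, so p1' = 58 + 2.
print(filter_p1(p0=60, p1=58, p2=59, q0=70, tc0=2, beta=4))  # 60
# |q2 - q0| = 10 is not below beta, so q1 passes through unchanged.
print(filter_q1(p0=60, q0=70, q1=70, q2=80, tc0=2, beta=4))  # 70
```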
The values of q2' and p2' are set to the incoming values of q2 and p2 respectively.
When filtering with BS of 4, two filters may be used depending on sample content. For luma pixels, a very strong 4- or
5-tap filter, which modifies the edge values and two interior samples, is used if the condition
|p0−q0| < α/4 + 2    (2.7)
is met.
If equations 2.1 and 2.7 are not both met, then a 3-tap filter is used to calculate q0' and the values of q1 and q2 pass
through the filter. Similarly, if 2.2 and 2.7 are not both met, then a 3-tap filter is used to calculate p0' and p1 and
p2 pass through the filter. These calculations are
p0' = (2·p1 + p0 + q1 + 2) >> 2,
p1' = p1, and
p2' = p2;
and
q0' = (2·q1 + q0 + p1 + 2) >> 2,
q1' = q1, and
q2' = q2.
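If the conditions of equations 2.1, 2.2 and 2.7 are met, the filtered values of q0'−q2' and p0'−p2' are calculated as
p0' = (p2 + 2·p1 + 2·p0 + 2·q0 + q1 + 4) >> 3,
p1' = (p2 + p1 + p0 + q0 + 2) >> 2, and
p2' = (2·p3 + 3·p2 + p1 + p0 + q0 + 4) >> 3;
and
q0' = (p1 + 2·p0 + 2·q0 + 2·q1 + q2 + 4) >> 3,
q1' = (p0 + q0 + q1 + q2 + 2) >> 2, and
q2' = (2·q3 + 3·q2 + q1 + q0 + p0 + 4) >> 3.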
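Because the p-side and q-side equations for BS = 4 are mirror images of each other, one routine can serve both sides. The Python sketch below uses the filter taps of the H.264 standard; the function names and the "own side / opposite side" argument layout are assumptions made for this example.

```python
def strong_condition(p0, q0, alpha):
    """Condition 2.7: the step across the edge is small relative to alpha."""
    return abs(p0 - q0) < (alpha >> 2) + 2

def filter_side_bs4(a, b0, b1, alpha, beta, is_luma=True):
    """Filter one side of a BS = 4 edge and return [a0', a1', a2'].

    a      = [a0, a1, a2, a3], samples on the side being filtered (nearest first)
    b0, b1 = the two nearest samples on the opposite side
    """
    a0, a1, a2, a3 = a
    if is_luma and abs(a2 - a0) < beta and strong_condition(a0, b0, alpha):
        # Strong 4- and 5-tap filters: three samples are modified.
        return [
            (a2 + 2 * a1 + 2 * a0 + 2 * b0 + b1 + 4) >> 3,
            (a2 + a1 + a0 + b0 + 2) >> 2,
            (2 * a3 + 3 * a2 + a1 + a0 + b0 + 4) >> 3,
        ]
    # 3-tap filter: only the sample nearest the edge is modified.
    return [(2 * a1 + a0 + b1 + 2) >> 2, a1, a2]

p = [60, 60, 60, 60]
q = [64, 64, 64, 64]
print(filter_side_bs4(p, q[0], q[1], alpha=20, beta=4))  # [62, 61, 61]
print(filter_side_bs4(q, p[0], p[1], alpha=20, beta=4))  # [63, 63, 64]
print(filter_side_bs4(p, q[0], q[1], alpha=4, beta=4))   # [61, 60, 60]
```

In the first two calls the flat 60/64 step is spread over three samples on each side; in the third call condition 2.7 fails, so only p0 changes.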
3. FPGA IMPLEMENTATION OF DEBLOCKING FILTER
Eight pixels enter the filter hardware unit, which is built using an FPGA. From here, the pixels are sent to the
calculation modules. Two calculation modules exist in the filter core: one for handling edges with boundary strengths
between one and three, and one for edges with a boundary strength of four. A bypass path exists for when the filter is
either disabled or no filtering is needed. Two additional modules are present in the core. These modules are used for
the creation of the alpha, beta, tco and boundary strength (BS) parameters.
The filter calculation module is designed to carry out filtering on a row of eight pixel values. The amount of filtering
that takes place depends on the input BS, alpha, beta and the input pixels. The output of this logic block is eight
filtered pixels. The interface is the same for both the horizontal and vertical filtering cores.
Several techniques have been proposed to address the power issue. Among these techniques, clock gating is one of the
most effective. Logic gates inserted into the clock path turn off the clock to parts of the circuit for some time. There
are two types of clock gating: register-based and module-based. Figure 6 illustrates these two types of clock gating.
Figure 6 Clock gating
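The effect of gating can be illustrated with a small behavioural model in Python. This is not the paper's VHDL and it does not estimate power; it only counts how many clock events reach a register when the clock is gated by an enable signal. The class name and the activity pattern are hypothetical.

```python
class GatedRegister:
    """Register whose clock is gated by an enable signal."""

    def __init__(self):
        self.value = 0
        self.clock_events = 0   # clock edges that actually reach the register

    def tick(self, enable, data):
        """Model one cycle of the source clock."""
        if enable:              # gated clock = clk AND enable
            self.clock_events += 1
            self.value = data   # the register captures data only when clocked

reg = GatedRegister()
# Hypothetical activity pattern: the filter core is busy 3 cycles out of 10.
enables = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
for cycle, en in enumerate(enables):
    reg.tick(en, data=cycle)

print(reg.clock_events, len(enables))  # 3 10
print(reg.value)                       # 2
```

In this toy pattern the register sees 3 of the 10 source clock edges, which is the kind of reduction in switching activity that clock gating aims for.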
4. EXPERIMENTAL RESULTS
The deblocking filter algorithm is described in the VHDL hardware description language, and the functional verification
of the filter is done by simulation using ModelSim. The algorithm is implemented in FPGA using the Xilinx EDA tool. The
FPGA used for the hardware implementation is a Spartan-3E with a capacity of 500K gates. Figure 7 shows the
simulation result of the deblocking filter. The RTL views of the algorithm are obtained using Xilinx and are shown in
Figures 8 and 9.
Figure 7 Simulation result of Deblocking filter Algorithm
Figure 8 RTL view Deblocking filter Algorithm
Figure 9 Detailed RTL view Deblocking filter Algorithm
The hardware implementation of the deblocking filter algorithm without power reduction techniques consumes 263 slices
out of 4656 slices in the Spartan-3E FPGA. The power reduction implementation consumes 269 slices out of 4656 slices.
The operating frequency of the hardware implementation is 80.563 MHz. The power reduction implementation reduces the
frequency by 4.672 MHz, to 75.891 MHz. Table 2 shows the comparison of the hardware implementation with and without
power reduction.
Table 2 Comparison of Hardware implementation with and without power reduction
Description                | Without power reduction | With power reduction using clock gating | % variation
No of slices               | 263 out of 4656         | 269 out of 4656                         | 2.3 % increased
Number of 4 input LUTs     | 501 out of 9312         | 523 out of 9312                         | 4.39 % increased
Number of Slice Flip Flops | 163 out of 9312         | 171 out of 9312                         | 4.91 % increased
Number of bonded IOBs      | 148 out of 232          | 148 out of 232                          | nil
Maximum Frequency          | 80.563 MHz              | 75.891 MHz                              | 5.8 % decreased
Power consumption          | 81 mW                   | 56.7 mW                                 | 30 % decreased
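The percentage column can be checked from the reported before/after values with a short calculation. The Python sketch below uses only the figures reported in the table; the helper name `percent_change` is chosen for this example.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

# (without power reduction, with clock gating), as reported in the table
rows = {
    "slices": (263, 269),
    "4-input LUTs": (501, 523),
    "slice flip-flops": (163, 171),
    "max frequency (MHz)": (80.563, 75.891),
    "power (mW)": (81.0, 56.7),
}
for name, (before, after) in rows.items():
    print(f"{name}: {percent_change(before, after):+.2f} %")
```

The results agree with the table: about +2.3 % slices, +4.39 % LUTs, +4.91 % flip-flops, −5.8 % frequency and −30 % power.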
Figure 10 shows the comparison chart of area, speed and power for the hardware implementation of the deblocking filter
algorithm with and without power reduction.
Figure 10 Comparison chart for Area, Speed, Power
5. CONCLUSION
In this paper, a deblocking filter algorithm for real-time video encoding and decoding in H.264 is discussed. The
algorithm is implemented using a field programmable gate array. During implementation, hardware blocks are used
efficiently to reduce the power consumption of the deblocking filter, and clock gating, a low power technique, is used
to reduce it further. The power reduction implementation incurs a small increase in area and a slight reduction in
speed. Clock gating achieved a 30% power reduction at the cost of a 2.3% increase in hardware and a 5.8% decrease in
clock speed.
References
[1] Shen-Yu Shih, Cheng-Ru Chang and Youn-Long Lin, "A Near Optimal Deblocking Filter for H.264 Advanced Video
Coding", IEEE international conference, 2006.
[2] Bolla Leela Naresh, N. V. Narayana Rao and Addanki Purna Ramesh, "FPGA Implementation Of Deblocking Filter
Custom Instruction Hardware On Nios-II Based SoC", International Journal of VLSI design & Communication
Systems (VLSICS), Vol. 2, No. 4, December 2011.
[3] Peter List, Anthony Joch, Jani Lainema, Gisle Bjøntegaard and Marta Karczewicz, "Adaptive Deblocking Filter",
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, July 2003.
[4] Mustafa Parlak and Ilker Hamzaoglu, "A Low Power Implementation of H.264 Adaptive Deblocking Filter
Algorithm", Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007).
[5] Brian Dickey, "Hardware Implementation of a High Speed Deblocking Filter for the H.264 Video Codec", MS
thesis, 2012.
[6] Tsu-Ming Liu, Wen-Ping Lee and Chen-Yi Lee, "An In/Post-Loop Deblocking Filter With Hybrid Filtering Schedule",
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 7, July 2007.
[7] Kyu-Yeul Wang, Byung-Soo Kim, Sang-Seol Lee, Young-Jun Kim, Bo-Keun Choi and Duck-Jin Chung, "3 Stage
Pipelined Deblocking Filter for H.264/AVC", World Academy of Science, Engineering and Technology 38, 2010.
[8] S. Vijay, C. Chakrabarti and L. J. Karam, "Parallel Deblocking Filter For H.264 AVC/SVC", IEEE international
conference, 2010.
[9] Jung-Ah Choi and Yo-Sung Ho, "Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding",
Gwangju Institute of Science and Technology (GIST), 2008, pp. 138–147.
[10] Iain E. G. Richardson, "H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia",
John Wiley & Sons, Jan. 2004.
C. Karthikeyan received the B.E. degree from Bharathiyar University and the M.E. (Applied Electronics) degree from
Anna University in 2007. He is now pursuing the Ph.D. programme at Hindustan University, India, in the area of
development of low power technologies for video codec VLSI design.