Design and Implementation of Multi-Standard Video Encoder Supporting Different Coding Standards

International Journal of Engineering Trends and Technology (IJETT) – Volume 14 Number 3 – Aug 2014, ISSN: 2231-5381, http://www.ijettjournal.org
Design and Implementation of Multi-Standard Video
Encoder Supporting Different Coding Standards
Karthika Sudersanan#1, R. Ramya*2
#1 Student, *2 Associate Professor,
Department of Electronics and Communication,
Sree Buddha College of Engineering, University of Kerala, Kerala, India
Abstract- In this paper we develop the VLSI architecture of a
multi-standard video encoder which supports different video
coding standards such as MPEG 2/4, VC-1, H.264/AVC and
AVS. A multi-standard transform unit with reduced path delay
generates transform coefficients for the different video coding
standards through static reconfiguration. It is suitable for the
real-time processing of videos with little motion, such as video
calls. The architecture is synthesised in the Xilinx ISE Design Suite
and implemented on a Virtex-5 FPGA board.
Keywords— Motion estimation, Video coding standards
I. INTRODUCTION
In a computer, a still image is represented as an array of
integers, which is usually two dimensional (2-D) for a black
and white image and three dimensional (3-D) for a colour
image. Each number in the array represents an intensity value
at a particular location in the image, called a picture element
or pixel for short. The pixel values are non-negative integers,
which range between 0 and 255 for an 8 bit representation of
the pixel value. This means that each pixel of a black and white
image occupies 1 byte of computer memory and the image
has a gray scale resolution of 8 bits per pixel (bpp). On the
other hand, a colour image has a triplet of values for each
pixel, one each for the red, green, and blue primary colours.
Hence, it needs 3 bytes of storage space for each pixel.
A video signal is composed of a series of still images,
referred to as frames, as shown in Fig 1.
Fig 1: Series of still images representing “scenes in motion”
The number of frames displayed per second is referred to as the
frame rate, which determines how smoothly the changes between
frames are perceived. The human visual system can
process 10 to 12 separate images per second, perceiving each
image individually. At higher rates the images are perceived as
motion, so the higher the frame rate, the smoother the moving
picture appears. A video source may produce 30 or more frames
per second, in which case the raw data rate is very high.
Practical communication channels generally cannot provide such a
large transmission bandwidth. Efficient data compression
schemes are therefore required to bring the raw video data
rates down to manageable values, so that practical communication
channels can carry the data to the desired
destinations in real time.
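As a rough illustration of the raw data rate discussed above, the following sketch multiplies frame size, bytes per pixel and frame rate. The function name and the 640×480 resolution are assumptions chosen for the example; they do not come from the paper.

```python
def raw_bit_rate(width, height, bytes_per_pixel, frames_per_second):
    """Return the uncompressed video data rate in bits per second."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * frames_per_second


# Hypothetical source: 640x480 colour video (3 bytes per pixel) at 30 frames/s.
rate = raw_bit_rate(640, 480, 3, 30)
print(f"{rate / 1e6:.1f} Mbit/s")  # prints "221.2 Mbit/s"
```

Even this modest resolution needs over 200 Mbit/s uncompressed, which is why compression is applied before transmission or storage.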
Compression is the process of reducing the size of the data
sent, thereby reducing the bandwidth required for the digital
representation of a signal. Because less data is transmitted,
compression reduces transmission time. It also decreases
storage requirements. Text files, pictures, voice, video and in fact
any data that contains redundancy can be made smaller by
employing compression. Currently there exist many video
coding standards such as MPEG-2, MPEG-4, H.264/AVC,
VC-1, AVS and HEVC. Communication between video devices
that use different standards is inconvenient, so video codecs
that support multiple standards are more useful and more
attractive.
In this paper we develop a 2-D multi-standard transform
unit with reduced path delay which generates transform
coefficients through static reconfiguration for different video
coding standards such as MPEG 2/4, VC-1, H.264/AVC and
AVS. Using this transform unit, a multi-standard video
encoder is developed which generates a compressed bit stream
for transmission or storage.
The rest of this paper is organized as follows. Section II
reviews important contributions of research in the field
of video coding units. Section III discusses the architecture of
the multi-standard video encoder. Section IV deals with the
architecture of the multi-standard transform unit, and
Section V presents experimental results. The last
section concludes the work.
II. RELATED WORK
In [1], Kim and Koh presented an area-efficient VLSI
architecture of a transform coding module for an MPEG-2 video
encoder. This design and implementation are applicable to
MPEG-2 video encoders. Malvar et al. [2] presented an
overview of the transform and quantization designs in H.264.
Unlike the popular 8×8 discrete cosine transform used in
previous standards, the 4×4 transforms in H.264 can be
computed exactly in integer arithmetic, thus avoiding inverse
transform mismatch problems.
Lee and Cho [3] presented an area-efficient architecture of
a VLSI circuit that can perform various DCT-based
transforms for a video decoder supporting multiple standards
such as JPEG, MPEG-4, VC-1 and H.264. The proposed
architecture uses a novel concept of a delta coefficient matrix
and shares resources such as adders and shifters (instead of
multipliers) as much as possible.
Qi, Huang and Gao [4] presented a low-cost very large scale
integration (VLSI) architecture for multi-standard
inverse transform. The architecture is used in a multi-standard
decoder for MPEG-2, MPEG-4 ASP, H.264/AVC and
VC-1. Two circuit-sharing strategies, factor share (FS) and
adder share (AS), are applied to the inverse transform
architecture to save circuit resources.
Jignesh Patel et al. [5] presented a VHDL implementation
of the H.264 video coding standard. Depending upon the application,
H.264 defines Profiles and Levels that specify restrictions
on bit streams, like some of the previous video standards.
The design implements transform and quantization, scaling
the input video and converting it to generate a byte
stream of the input video. Kanwen Wang and Jialin Chen [6]
presented a reconfigurable VLSI architecture designed
for a multi-transform codec covering the video coding standards
MPEG-2/4, VC-1, H.264/AVC and AVS.
III. ARCHITECTURE OF MULTI-STANDARD VIDEO ENCODER
Video compression exploits three kinds of data
redundancy within a video scene. In a standard lossy video
compression scheme such as H.264, the temporal redundancy between
adjacent frames is the most important redundancy in video-type
data, and a large proportion of data dependency can be
reduced through motion estimation. Spatial redundancy,
which can be exploited within a frame, is reduced via
transform techniques. It also constitutes a significant portion
of redundancy since there is usually a high correlation
between neighbouring pixels. Lastly, statistical redundancy,
which occurs in any kind of data source, is reduced by an
entropy coder in the last stage of the encoder.
Every compression system involves complementary
units, an encoder (compression unit) and a decoder
(decompression unit). The encoder exploits the redundancy
in the given data and converts it to a compressed data
stream. The decoder interprets the compressed data stream
and restores it to the original format. The encoder/decoder
pair is often described as a codec (coder/decoder). The
complete architecture of a multi-standard video encoder is
shown in Fig. 2.
Fig. 2 Multi-standard video encoder
Initially the video sequence is converted into frames
whose pixel values are fed serially to the input buffer
section, which temporarily stores the pixel values of a frame,
so that 64 pixels form a frame. From the input buffer the
first frame and every 8th frame thereafter are sent as
reference frames to the motion-estimation block, and the
remaining frames are fed directly to the motion-estimation
block.
In video compression, the discrete cosine transform
(DCT) is widely used because it concentrates signal
information in a few low-frequency components. However,
implementing the DCT with low error requires arithmetic with
a long word length and a floating-point implementation,
which makes the implementation complex and in
turn makes the coder hardware more expensive. Thus, a
transform without floating-point multiplication, namely the integer
transform, is adopted in the multi-standard transform
unit discussed in Section IV. It is similar, but not identical, to
the DCT. The transformed data are called transform coefficients
and they are passed to the quantization unit.
Most modern multimedia codecs (both encoder and
decoder) employ a transform-quantization pair. The objective is
to minimize the number of bits which must be transmitted to
the decoder. Reduced quantization accuracy reduces the
number of bits which need to be transmitted to represent a
given transform coefficient.
The entropy encoding unit is the last stage in a video
compression system. The motion vectors output by the motion
estimation unit and the quantized transform coefficients from the
transform unit are accepted in this stage to produce the
compressed bit stream that can be transmitted or stored. After
quantization the 8×8 blocks of transform coefficients are
scanned in a zigzag manner to turn the 2-D array into a serial
sequence of quantized coefficients. This orders the
transform coefficients from low to high spatial frequencies,
which improves compression. The next step in the entropy
encoder is the run length encoder, the basic form of
lossless compression. The run length encoder replaces
consecutive repeating occurrences of a symbol by one
occurrence of the symbol followed by the number of
occurrences, i.e., (run, length). Huffman encoding
assigns a variable-length code word to each input data
item; that is, symbols are not coded with a fixed number of bits.
Shorter code words are assigned to more frequently
occurring inputs.
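The zigzag scan and run length steps described above can be sketched as follows. This is a minimal illustration and not the encoder's hardware implementation: the function names are hypothetical, the scan order follows the common JPEG-style zigzag convention, and the small 4×4 block is made-up data used only to keep the output short.

```python
def zigzag_order(n):
    """Return (row, col) index pairs of an n x n block in zigzag order."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd anti-diagonals run top-to-bottom, even ones bottom-to-top.
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Turn a square 2-D block into a 1-D list, low frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length_encode(values):
    """Replace each run of equal symbols by a (symbol, count) pair."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [tuple(pair) for pair in encoded]


# Made-up quantized block: non-zero values cluster at low frequencies.
block = [
    [9, 3, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(run_length_encode(zigzag_scan(block)))
# prints [(9, 1), (3, 1), (2, 1), (1, 1), (0, 12)]
```

Because the zigzag scan places the high-frequency zeros next to each other, the twelve trailing zeros collapse into a single pair.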
IV. MULTI-STANDARD TRANSFORM UNIT
A. Design of Multi-standard Transform unit
The transform unit reduces spatial redundancy within a
picture. Its input is the residue picture calculated by the
motion estimation unit. The 1-D 8-point forward transform
coefficient matrix is described as follows:

$$
C_{tran8} =
\begin{bmatrix}
a & a & a & a & a & a & a & a \\
b & c & d & e & -e & -d & -c & -b \\
f & g & -g & -f & -f & -g & g & f \\
c & -e & -b & -d & d & b & e & -c \\
a & -a & -a & a & a & -a & -a & a \\
d & -b & e & c & -c & -e & b & -d \\
g & -f & f & -g & -g & f & -f & g \\
e & -d & c & -b & b & -c & d & -e
\end{bmatrix}
$$

Ctran8 is the 8×8 transform coefficient matrix. In total, there
are 64 coefficients in this matrix, with 7 different values, a
to g. For different video coding standards the positions of the
coefficients a to g are the same, but their values change with each
video coding standard. By using a fast algorithm, an
8-point forward transform matrix can be decomposed into two
4-point forward transform matrices, which are,

$$
U_{tran4} =
\begin{bmatrix}
a & a & a & a \\
f & g & -g & -f \\
a & -a & -a & a \\
g & -f & f & -g
\end{bmatrix},
\qquad
V_{tran4} =
\begin{bmatrix}
b & c & d & e \\
c & -e & -b & -d \\
d & -b & e & c \\
e & -d & c & -b
\end{bmatrix}
$$

Note that the 4-point U matrix is also used in 4-point
transform coding. The 1-D 8-point transform is illustrated as,

$$ Y_8 = C_8 \times X_8 $$

where

$$ Y_8 = [y_0 \; y_1 \; y_2 \; y_3 \; y_4 \; y_5 \; y_6 \; y_7]^T, \qquad X_8 = [x_0 \; x_1 \; x_2 \; x_3 \; x_4 \; x_5 \; x_6 \; x_7]^T $$

and C8 is the 8-point transform coefficient matrix. The 1-D
8-point forward transform can be decomposed as,

$$
\begin{bmatrix} y_0 \\ y_2 \\ y_4 \\ y_6 \end{bmatrix}
= U_{tran4}
\begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix},
\qquad
\begin{bmatrix} y_1 \\ y_3 \\ y_5 \\ y_7 \end{bmatrix}
= V_{tran4}
\begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix}
$$

In order to get the results of the matrix calculation, many
multiplications and additions are required. The positions and
signs of coefficients a to g are the same, although each
standard has its own coefficient values in the transform matrix.
The coefficients for the different video coding standards are given
in Table I.

Table I
Coefficients among different video coding standards

Matrix     Coefficient   MPEG 2/4   8 Point VC-1   8 Point H.264   4 Point VC-1   4 Point H.264   AVS
U Matrix   a             362        12             8               17             1               8
U Matrix   f             473        16             8               22             2               10
U Matrix   g             196        6              4               10             1               4
V Matrix   b             502        16             12              -              -               10
V Matrix   c             426        15             10              -              -               9
V Matrix   d             284        9              6               -              -               6
V Matrix   e             100        4              3               -              -               2
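The decomposition of the 8-point transform into two 4-point products can be checked numerically. The sketch below is illustrative only: it uses the 8-point VC-1 coefficients from Table I, the function names are hypothetical, and the last row of the 8×8 matrix is taken from the fourth row of Vtran4, as the decomposition implies.

```python
# 8-point VC-1 coefficients from Table I.
a, b, c, d, e, f, g = 12, 16, 15, 9, 4, 16, 6

C8 = [
    [a,  a,  a,  a,  a,  a,  a,  a],
    [b,  c,  d,  e, -e, -d, -c, -b],
    [f,  g, -g, -f, -f, -g,  g,  f],
    [c, -e, -b, -d,  d,  b,  e, -c],
    [a, -a, -a,  a,  a, -a, -a,  a],
    [d, -b,  e,  c, -c, -e,  b, -d],
    [g, -f,  f, -g, -g,  f, -f,  g],
    [e, -d,  c, -b,  b, -c,  d, -e],  # implied by the fourth row of V4
]
U4 = [[a, a, a, a], [f, g, -g, -f], [a, -a, -a, a], [g, -f, f, -g]]
V4 = [[b, c, d, e], [c, -e, -b, -d], [d, -b, e, c], [e, -d, c, -b]]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def direct_transform(x):
    """Y8 = C8 * X8, using all 64 products."""
    return mat_vec(C8, x)


def butterfly_transform(x):
    """Same result from sums/differences and two 4x4 products (32 products)."""
    sums = [x[i] + x[7 - i] for i in range(4)]
    diffs = [x[i] - x[7 - i] for i in range(4)]
    y = [0] * 8
    y[0::2] = mat_vec(U4, sums)   # y0, y2, y4, y6
    y[1::2] = mat_vec(V4, diffs)  # y1, y3, y5, y7
    return y


x = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up input row
print(direct_transform(x) == butterfly_transform(x))  # prints True
```

The butterfly form halves the number of products per 8-point transform, which is what the pre-processing block in the architecture exploits.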
B. Architecture of Multi-standard Transform unit
The architecture of the multi-standard transform unit is given
in Fig. 2. It mainly consists of the U matrix calculation
block, the V matrix calculation block, the pre-processing
block, and the adder tree block. The matrix calculation is
multiplierless: it is made of only adders and shifters.
Two kinds of constant multipliers are used to calculate each
term of the matrix product. That is, an AFG constant
multiplier is in charge of the U matrix calculations, and a BCDE
constant multiplier is in charge of the V matrix calculations. The
constant multipliers calculate the coefficient products in
parallel and are reconfigured to support different standards
depending upon the input at sel_standards.
In the proposed multi-standard transform unit, the ripple
carry adder in the design is replaced by a carry select adder, by
which the combinational delay and the memory usage of the
design can be reduced. The pre-processing block
realizes the butterfly structure of the forward transform. The adder
tree block obtains the sum of the matrix product.
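The idea of a multiplierless constant multiplier can be sketched in software: each constant is written as a sum or difference of powers of two, so a product needs only shifts and additions or subtractions. The decompositions below are assumptions chosen for illustration and are not the paper's AFG circuit, which shares add/sub modules and multiplexers across standards; the names `SHIFT_ADD_TERMS` and `constant_multiply` are hypothetical.

```python
# Table I values of the U-matrix coefficients for three of the standards.
TABLE_I_AFG = {
    "VC-1 8-point":  {"a": 12, "f": 16, "g": 6},
    "H.264 8-point": {"a": 8,  "f": 8,  "g": 4},
    "AVS":           {"a": 8,  "f": 10, "g": 4},
}

# Illustrative recipes (not the paper's wiring): each constant is a list of
# (sign, shift) pairs, meaning sign * (x << shift), summed together.
SHIFT_ADD_TERMS = {
    "VC-1 8-point": {
        "a": [(1, 3), (1, 2)],    # 12 = 8 + 4
        "f": [(1, 4)],            # 16
        "g": [(1, 3), (-1, 1)],   # 6 = 8 - 2 (uses a subtraction)
    },
    "H.264 8-point": {
        "a": [(1, 3)],            # 8
        "f": [(1, 3)],            # 8
        "g": [(1, 2)],            # 4
    },
    "AVS": {
        "a": [(1, 3)],            # 8
        "f": [(1, 3), (1, 1)],    # 10 = 8 + 2
        "g": [(1, 2)],            # 4
    },
}


def constant_multiply(x, standard, coefficient):
    """Multiply x by a Table I constant using only shifts and add/sub."""
    terms = SHIFT_ADD_TERMS[standard][coefficient]
    return sum(sign * (x << shift) for sign, shift in terms)


print(constant_multiply(5, "VC-1 8-point", "a"))  # prints 60
```

Selecting the recipe by standard name plays the role that the select lines play in hardware: the same adders and shifters are reused, and only the routing changes.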
Fig. 2 1-D Multi-standard Transform VLSI architecture
In the proposed design the structure of “standardsel_afg”
consists of a select_function unit and an afg_multiplier unit, as shown in
Fig 3. The select_function unit generates seven select lines (m0,
m1, s0, s1, s2, s3, s4), with which the a, f, g coefficients of the different
standards are generated. The generated select lines are fed to the
“afg_multiplier” unit. Similarly, “bcde_coefficient” consists of a
select_function unit and a bcde_multiplier unit. For the bcde_multiplier,
the select_function unit generates six select lines.
Fig 3 Standardsel_afg unit
Fig 4 AFG constant multiplier
Fig 4 shows the AFG multiplier unit. The select lines
m0 and m1 are used for the add/sub modules m0 and m7
respectively. When m0 and m1 are 1, the add/sub module acts as
a subtractor; otherwise it acts as an adder. The select lines s0 and
s1 are for the 2:1 MUXes m5 and m9 respectively. The select lines s4, s3
and s2 are used as the common select lines for all three 8:1
multiplexers.
Fig 5 shows the BCDE multiplier unit. With the help
of six select lines (s0, s1, s2, s3, s4, s5), the b, c, d, e
coefficients for the different standards are generated. The select
lines s0 and s1 are used as common select lines for the four
4:1 multiplexers. s4 and s5 are the select lines for the 2:1 MUXes
m7 and m9 respectively. s2 and s3 are the select lines for the
4:1 MUX (m6).
Fig 5 BCDE constant multiplier
The 2-D multi-transform can be implemented with
the 1-D multi-transform and a transpose memory in a
row–column decomposition manner. Initially the multi-transform is
applied to the rows of the 8×8 image block. Then the obtained
transform coefficients are stored temporarily in the transpose
memory, and the multi-transform is applied to the columns of
the result.
V. EXPERIMENTAL RESULTS
A. Performance Evaluation
The main performance evaluation parameter considered for
this architecture is the combinational path delay.
B. Simulation and Implementation setup
The proposed multi-standard video encoder is
designed in the hardware description language Verilog using the
Xilinx ISE Design Suite. The design is simulated using the ISim
simulator and implemented on a Virtex-5
FPGA board.
Throughout the design the value of the code stands
for different standards as follows.
Code – 0: MPEG 2
Code – 1: MPEG 4
Code – 2: 8 Point VC-1
Code – 3: 8 Point H.264
Code – 4: 4 Point VC-1
Code – 5: 4 Point H.264
Code – 6: AVS
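A software model of how the standard code could select the coefficients and drive the row–column 2-D transform of Section IV is sketched below. It is a hedged illustration rather than the Verilog design: the function names are hypothetical, only the 8-point codes are modelled (the 4-point codes 4 and 5 are omitted), no scaling or rounding is applied between the two passes, and the result is transposed back at the end for readability.

```python
# (a, b, c, d, e, f, g) from Table I, keyed by the standard code.
COEFFS_BY_CODE = {
    0: (362, 502, 426, 284, 100, 473, 196),  # MPEG 2
    1: (362, 502, 426, 284, 100, 473, 196),  # MPEG 4
    2: (12, 16, 15, 9, 4, 16, 6),            # 8 Point VC-1
    3: (8, 12, 10, 6, 3, 8, 4),              # 8 Point H.264
    6: (8, 10, 9, 6, 2, 10, 4),              # AVS
}


def transform_matrix(code):
    """Build the 8x8 coefficient matrix for an 8-point standard code."""
    a, b, c, d, e, f, g = COEFFS_BY_CODE[code]
    return [
        [a,  a,  a,  a,  a,  a,  a,  a],
        [b,  c,  d,  e, -e, -d, -c, -b],
        [f,  g, -g, -f, -f, -g,  g,  f],
        [c, -e, -b, -d,  d,  b,  e, -c],
        [a, -a, -a,  a,  a, -a, -a,  a],
        [d, -b,  e,  c, -c, -e,  b, -d],
        [g, -f,  f, -g, -g,  f, -f,  g],
        [e, -d,  c, -b,  b, -c,  d, -e],
    ]


def transform_1d(code, x):
    """1-D forward transform of eight samples x0..x7 into y0..y7."""
    return [sum(m * v for m, v in zip(row, x)) for row in transform_matrix(code)]


def transform_2d(code, block):
    """Row pass, transpose (the transpose memory), then column pass."""
    row_pass = [transform_1d(code, row) for row in block]
    transposed = [list(col) for col in zip(*row_pass)]
    column_pass = [transform_1d(code, col) for col in transposed]
    return [list(row) for row in zip(*column_pass)]  # back to row-major


flat_block = [[1] * 8 for _ in range(8)]  # made-up constant 8x8 block
result = transform_2d(3, flat_block)
print(result[0][0])  # prints 4096
```

For a constant block only the first (lowest-frequency) output is non-zero, which matches the energy-compaction behaviour the transform is used for.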
Figure 6 shows the simulation result of the forward
transform applied to the first row of the 8×8 image block. ‘x0’
to ‘x7’ are the inputs, i.e., the first row of the image block. Code
denotes the video standard. ‘y0’ to ‘y7’ are the
transformed outputs.
Fig 6 Simulation result of 1 D forward transform unit
Figure 7 shows the simulation results of the multi-standard
video encoder. ‘out_sig’ is the bit stream generated
after the compression. This bit stream can be either
transmitted or stored. ‘out_rdy’ is high for the bit stream of
each pixel.
Fig. 7 Multi-standard video encoder
D. Comparison of timing summary
Comparing the existing multi-transform
architecture with the proposed architecture, the maximum
combinational path delay is reduced from 24.118 ns to 21.846 ns,
a reduction of about 9.4%. Table II gives
the comparison.
Table II
Comparison of Timing summary

Area of comparison                  Multi-transform architecture   Proposed architecture
Maximum combinational path delay    24.118 ns                      21.846 ns
No. of slice registers              5%                             5%

VII. CONCLUSION
In this paper, the design and architecture of a multi-standard
video encoder are developed using the proposed multi-standard
transform unit, which generates transform coefficients
for different video standards through static reconfiguration.
ACKNOWLEDGEMENTS
I would like to thank Ms. R. Ramya for providing the
required assistance and extending all the facilities for carrying
out my work successfully.
REFERENCES
[1] K. Kim and J. S. Koh, “An area efficient DCT architecture for MPEG-2
video encoder,” IEEE Trans. Consum. Electron., vol. 45, no. 1, pp. 62–67,
Feb. 1999.
[2] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky,
“Low complexity transform and quantization in H.264/AVC,”
IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7,
pp. 598–603, Jul. 2003.
[3] S. Lee and K. Cho, “Architecture of transform circuit for video decoder
supporting multiple standards,” Electronics Letters, vol. 44, no. 5, Feb. 2008.
[4] H. Qi, Q. Huang, and W. Gao, “A low-cost very large scale integration
architecture for multistandard inverse transform,” IEEE Trans. Circuits Syst.
II, Exp. Briefs, vol. 57, no. 7, pp. 551–555, Jul. 2010.
[5] J. Patel, H. Suthar, and J. Gadit, “VHDL implementation of
H.264 video coding standard,” International Journal of Reconfigurable and
Embedded Systems (IJRES), vol. 1, no. 3, pp. 95–102, ISSN: 2089-4864,
Nov. 2012.
[6] K. Wang and J. Chen, “A reconfigurable multi transform VLSI
architecture supporting video codec design,” IEEE Trans. Circuits Syst.
II, vol. 58, no. 7, pp. 432–436, Jul. 2011.