International Journal of Engineering Trends and Technology (IJETT) – Volume 23 Number 8- May 2015
Implementation of Low Power SAD Architecture
for Motion Estimation
Sunitha S Patil¹, Narendra C.P²
¹M.Tech student, Department of ECE, Bangalore Institute of Technology, Bengaluru, India
²Assistant professor, Department of ECE, Bangalore Institute of Technology, Bengaluru, India
Abstract—Multimedia video applications embrace a variety of
advanced features in consumer devices and mobile phones, to the
point that they have become default features in a wide range of
electronic gadgets. With the scaling of VLSI chip design technology,
a careful trade-off between quality metrics such as area, power and
performance is essential, depending on the application. To meet
bandwidth constraints and achieve a fast transfer rate, video must
be compressed before transmission. In a video compression
algorithm, Motion Estimation is the critical block because of its
intensive computation. Motion Estimation predicts frames and
identifies motion vectors so that redundancy can be exploited by
not transferring similar information between successive frames.
One of the simplest and most efficient techniques for estimating
the motion vectors is the Sum of Absolute Difference (SAD)
algorithm. In this paper a low power SAD architecture is
implemented in the ASIC domain. A new low power 1-bit full adder
cell is used in the proposed SAD architecture, which improves
leakage power (38.43%), area (5.92%) and performance (9.94%)
compared to the architecture using a conventional full adder.
Keywords—SAD, Low Power, VLSI, H.264/AVC, Adder, ASIC.
I. INTRODUCTION
In the ever-expanding digital era, innovations in VLSI technology
and the advent of smartphones have created a huge demand for
multimedia applications in fields such as HDTV, live video
streaming, video conferencing, electronic patient recording and so
on. Multimedia integrates many types of data such as text,
graphics, sound and video. This data must be represented so that
it can be stored and transmitted for effective communication [1].
Compression of video is essential to meet technological demands
such as low power, less memory and a fast transfer rate across
different ranges of devices. Apart from spatial redundancy within
a frame, there is similarity between the successive frames of a
video, called temporal redundancy. In general, only a small
portion of each frame, not the whole frame, is involved in the
motion of a particular video sequence, for example the motion of a
person or vehicle in a movie scene.
Consider a scene of duration 3 seconds: at a refresh rate of 60
frames per second, it comprises 180 frames in total. Because of
the similarity between successive frames, we can send only the
information related to the segments in which movement occurs, so
that significant bandwidth savings are made by exploiting the
temporal redundancy [2]. The technique used to achieve greater
compression ratios is Motion Estimation [3-6].
Motion estimation exploits the high correlation between
successive frames to predict the next frames of a video, sometimes
from a combination of both preceding and succeeding frames. Here
the differences between successive frames are sent instead of the
actual source information. The accuracy of prediction depends on
how accurately the motion vectors capture the movement.
The motion vector is expressed in the format (X, Y), where X is
the number of pixels moved in the horizontal direction and Y is
the number of pixels moved in the vertical direction.
A simple metric used to exploit the similarities between video
frames is the SAD algorithm, in which the absolute differences
between corresponding pixels are added up.
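As an illustration of the metric just described, the following minimal sketch computes the SAD of two equally sized pixel blocks. It is a software model only; the function name `sad` and the sample pixel values are assumptions made for this example and do not come from the implemented hardware.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks.

    Each block is a list of rows; each row is a list of integer pixel values.
    """
    if len(block_a) != len(block_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)
    ):
        raise ValueError("blocks must have the same dimensions")
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(block_a, block_b)
        for pa, pb in zip(row_a, row_b)
    )


# Hypothetical 2x2 blocks used only to exercise the function.
current = [[10, 12], [200, 50]]
reference = [[11, 12], [190, 55]]
print(sad(current, reference))
```

A SAD of zero means the two blocks are identical; larger values indicate a poorer match.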
There are several video coding standards in video processing
systems; the latest video coding standard used is H.264/AVC [8].
This standard uses Variable Block Size Motion Estimation (VBSME),
and its computational requirements are much higher than those of
previous video coding standards such as H.263/MPEG-4 and MPEG-2.
In H.264/AVC, each picture frame is divided into macroblocks, and
each macroblock is further subdivided into several sub-blocks as
shown in “fig 1”.
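To make the idea of variable block sizes concrete, the sketch below assumes a 16x16 macroblock and computes one SAD per 4x4 sub-block, then sums those partial SADs to obtain the SAD of larger partitions. Reusing 4x4 SADs in this way is a common VBSME approach, but it is an assumption of this example and not a description of the architecture implemented in this paper; the function names and test frames are likewise hypothetical.

```python
def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid of SADs, one per 4x4 sub-block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge(grid, top, left, rows, cols):
    """SAD of a larger partition, given in units of 4x4 sub-blocks."""
    return sum(
        grid[r][c]
        for r in range(top, top + rows)
        for c in range(left, left + cols)
    )


# Synthetic macroblocks: every reference pixel differs from the current one by 1.
cur = [[x + y for x in range(16)] for y in range(16)]
ref = [[x + y + 1 for x in range(16)] for y in range(16)]

grid = sad_4x4_grid(cur, ref)
sad_8x8_top_left = merge(grid, 0, 0, 2, 2)   # 8 rows x 8 columns of pixels
sad_16x8_top = merge(grid, 0, 0, 2, 4)       # 16 wide, 8 tall
sad_16x16 = merge(grid, 0, 0, 4, 4)          # whole macroblock
print(sad_8x8_top_left, sad_16x8_top, sad_16x16)
```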
In this paper we implement a low power sum of absolute
difference algorithm in order to determine the effective motion
vectors, i.e. those which give the least SAD value. Based on the
design specifications, the SAD algorithm is implemented in Verilog
HDL, functionally verified (simulated) using the ModelSim
simulation tool, and synthesized using the Cadence RC Compiler
tool.
The remaining part of the paper is organized as follows:
Section II describes work related to the implementation of
different types of SAD architecture to determine the effective
motion vectors. Section III describes the implementation of the
regular and proposed 1-bit full adder architectures and presents
the details of the sum of absolute difference architecture.
Section IV describes the results and discussion, and finally
Section V includes the conclusion.
Fig.1 Different block sizes of motion estimation in H.264.
II. RELATED WORK
Different types of SAD implementations for the H.264 video
codec have been proposed in the literature, considering the
trade-off between parameters like area, performance and power.
The SAD algorithm is the simplest metric, which considers all
the pixels in a macroblock with one-to-one mapping between the
template and search image [1].
Motion estimation techniques mainly exploit the temporal
redundancy between successive video frames, achieving
significant bandwidth savings, as presented in [3-7].
SAD algorithms can be implemented in various domains to
achieve the desired parameters for various applications. The work
in [9] implemented SAD on FPGA. The SAD algorithm to determine
the motion vectors can be modelled in VHDL and implemented on
FPGA [10]. The SAD algorithm can also be implemented in Matlab
for stereo matching in computer vision applications [11].
The work presented in [12] proposed a SAD architecture which
employs 1024 SAD processing units for an H.264/AVC encoder,
evaluated in terms of gate count and operational frequency. The
work in [13] presents SAD architectures targeting gate count and
delay optimization, which are not suitable for low power devices.
The work proposed in this paper highlights improvements in both
power dissipation and performance.
III. IMPLEMENTATION
An adder is a digital circuit used to perform the addition of
two numbers. There are different types of adders, among which the
ripple carry adder is the simplest.
A. Ripple carry adder
A ripple carry adder is the logic circuit formed by cascading
multiple 1-bit full adders in order to add N-bit binary numbers,
where each carry bit is rippled, or propagated, to the next
stage. Since no carry-in is required at the first stage, the
first full adder can be replaced by a half adder to use resources
efficiently. The block diagram of the ripple carry adder is shown
in “fig 2”.
Fig.2 4-Bit binary ripple adder circuit.
The main advantages of the ripple carry adder are that it has
the smallest area and consumes the lowest power, at the cost of
the longest delay.
B. Existing architecture
Fig.3 1-Bit regular full adder architecture.
This is the basic full adder architecture which is generally
used. The full adder plays a very important role in the
construction of basic binary adders. The basic existing full
adder architecture, which consists of 2 XOR gates, 2 AND gates
and 1 OR gate, is shown in “fig 3”.
Some of the limitations of the existing architecture are as
follows:
• It consists of a larger number of cells, which increases the
total cell area.
• More interconnects increase glitches, which leads to increased
power consumption.
• The carry path partially depends on the sum path.
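The regular full adder and the ripple carry chain described in this section can be modelled at the bit level as follows. This is a behavioural sketch for illustration only, not the Verilog used in this work; the function names and the default 4-bit width are assumptions. Note that the carry output reuses the first XOR result, which reflects the partial dependency of the carry path on the sum path listed above.

```python
def full_adder(a, b, cin):
    """Regular 1-bit full adder built from 2 XOR, 2 AND and 1 OR gate."""
    p = a ^ b                     # XOR 1 (shared by sum and carry paths)
    s = p ^ cin                   # XOR 2 produces the sum bit
    cout = (a & b) | (p & cin)    # AND 1, AND 2 and OR produce the carry
    return s, cout


def ripple_carry_add(x, y, width=4):
    """Add two unsigned integers by chaining `width` full adders."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i          # place the sum bit at position i
    return result, carry          # final carry is the carry-out bit


total, carry_out = ripple_carry_add(9, 8)
print(total, carry_out)
```

Each stage must wait for the carry of the previous stage, which is why the delay grows with the bit width.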
C. Proposed architecture
There are different types of full adders that are
implemented earlier, but here is the proposed new full adder
architecture which is well optimized in terms of several
parameters like area, delay and power and this proposed
architecture consists of a complex cell called AOI222 from
TSMC 180nm technology, which helps in reducing the overall
power consumption which is as shown in “fig 4”.
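The gate-level netlist of the proposed cell is given only in the figure, so the following is one plausible mapping, assumed purely for illustration: the carry is formed as the majority function using an AOI222 (2-2-2 AND-OR-INVERT) cell followed by an inverter, and the sum is formed with XOR gates. The function names are hypothetical, and the actual wiring of the proposed architecture may differ.

```python
def aoi222(a1, a2, b1, b2, c1, c2):
    """AND-OR-INVERT 2-2-2 cell: NOT((a1 AND a2) OR (b1 AND b2) OR (c1 AND c2))."""
    return 1 - ((a1 & a2) | (b1 & b2) | (c1 & c2))


def aoi_full_adder(a, b, cin):
    """Full adder whose carry path is independent of the sum path (assumed mapping)."""
    s = a ^ b ^ cin                          # sum path: XOR gates only
    cout_n = aoi222(a, b, b, cin, a, cin)    # inverted majority of a, b, cin
    cout = 1 - cout_n                        # inverter restores the true carry
    return s, cout


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, aoi_full_adder(a, b, cin))
```

In this mapping the carry is computed directly from the three inputs, so it does not wait for any intermediate signal of the sum path.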
Fig.4 1-Bit proposed full adder architecture.
Advantages of the proposed cell
• The complex cell reduces the gate count and the delay
optimizations required for the implementation.
• Reduced interconnects reduce glitches, and hence the associated
power consumption is also reduced.
• Independent carry path (the carry path does not depend on the
sum path).
This proposed architecture is used in the main sum of absolute
difference architecture to perform the basic arithmetic
operations, addition and subtraction, in order to obtain improved
results.
D. SAD architecture
There are different types of block matching algorithms used for
determining the motion vectors, of which the SAD algorithm is the
simplest and most efficient. It is the most repeated block in the
block matching algorithm within the motion estimation subsystem.
The SAD algorithm is mainly used for determining the motion
vectors by measuring the similarity between images/frames. It
takes the absolute differences between each pair of corresponding
pixels of the template image and the search image, and the
obtained absolute difference values are summed by an adder to
obtain the result for the similarity block. It involves just two
basic arithmetic operations, i.e. addition and subtraction. The
block diagram of the sum of absolute difference algorithm is
shown in “fig 5”.
Fig.5 Block diagram of Sum of Absolute Difference.
The SAD algorithm is one of the simplest metrics: it considers
every pixel in the block/frame individually in the computation,
and it is one of the most time-efficient algorithms, since groups
of pixels are compared together, which makes its implementation
easy, fast and parallel. It is used in many applications like
block motion estimation, object recognition etc.
This paper proposes a new low power SAD architecture which is
well optimized in terms of resource sharing and which consumes
less area and power.
The hierarchy of the SAD architecture is shown in “fig 6”.
Fig.6 The hierarchy of sum of absolute difference architecture.
The absolute difference values obtained are added together
using an adder and the results are fed to a comparator. The
comparator compares them and selects the block with the minimum
summed absolute difference value, which finally gives the
similarity block.
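The adder-plus-comparator flow described above can be sketched in software as an exhaustive search over candidate positions, keeping the candidate with the minimum SAD and reporting its displacement as the motion vector (X, Y). The exhaustive search strategy, the `search_range` parameter, the function names and the synthetic frame are all assumptions made for this example; the paper does not specify the search pattern used.

```python
def block_sad(frame, top, left, template):
    """SAD between `template` and the same-sized block of `frame` at (top, left)."""
    n = len(template)
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(n)
        for c in range(n)
    )


def full_search(ref_frame, template, top, left, search_range):
    """Return ((X, Y), min_sad) for a template located at (top, left)."""
    n = len(template)
    height, width = len(ref_frame), len(ref_frame[0])
    best_mv, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + n > height or l + n > width:
                continue                      # candidate falls outside the frame
            cost = block_sad(ref_frame, t, l, template)
            if best_sad is None or cost < best_sad:   # comparator keeps the minimum
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad


# Synthetic 8x8 reference frame in which every 2x2 block is distinct.
ref_frame = [[8 * y + x for x in range(8)] for y in range(8)]
# Template copied from row 3, column 4 of the reference frame.
template = [row[4:6] for row in ref_frame[3:5]]
mv, cost = full_search(ref_frame, template, top=2, left=2, search_range=2)
print(mv, cost)
```

Here X is the horizontal displacement and Y the vertical displacement, matching the (X, Y) convention used for motion vectors in the Introduction.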
IV. RESULTS & DISCUSSION
Here we propose a new low power sum of absolute difference
architecture for motion estimation in the H.264 video coding
standard.
The existing and proposed parallel 8x8 SAD architectures were
implemented in ASIC methodology. The existing architecture uses
EX-OR gates, AND gates and an OR gate for the 1-bit full adder,
which is replaced in the adder part of the SAD architecture.
TABLE I shows the results of the existing architecture with
respect to parameters like area, delay and power when synthesized
in the Cadence RC Compiler tool.

TABLE I
Synthesis results of 1-bit regular full adder architecture
Parameters            Regular Full Adder
Area                  93
Delay(ps)             315
Leakage power(nW)     4.372
Dynamic power(nW)     6549.499
Total power(nW)       6553.871

The proposed architecture uses a complex cell called AOI222,
which gives better optimized results than the regular full adder
architecture. When the proposed 1-bit full adder architecture is
synthesized in RC Compiler, we obtain the results tabulated in
TABLE II.

TABLE II
Synthesis results of 1-bit full adder proposed architecture
Parameters            Proposed Full adder
Area                  86
Delay(ps)             253
Leakage power(nW)     2.620
Dynamic power(nW)     6100.844
Total power(nW)       6103.463

When the proposed full adder architecture is compared with the
existing full adder architecture, we obtain improved results in
terms of several parameters like area, performance and power
(both leakage and dynamic power). The percentage gain is given in
TABLE III.

TABLE III
Synthesis results of regular and proposed 1-bit full adder architecture
Technology Library=180nm
SL No  Parameters          Regular Full Adder      Proposed Full adder        %Gain
                           (EXOR, AND, OR gates)   (EXOR gate & AOI222 cell)
1      Area                93                      86                         7.527
2      Delay(ps)           315                     253                        19.683
3      Leakage power(nW)   4.372                   2.62                       40.073
4      Dynamic power(nW)   6549.499                6100.844                   6.850
5      Total power(nW)     6553.871                6103.463                   6.873

The proposed architecture gives a 7.52% improvement in area
because of the reduced number of gates compared to the existing
architecture, and the delay is also reduced by 19.68% due to the
non-interdependency of the carry and sum paths, which allows them
to be computed in parallel. The leakage power is significantly
improved, by 40.07%, against its counterpart architecture because
of the use of the higher transistor stack complex cell AOI222,
where the on-resistance between the supply rails is high.
When the existing and proposed 1-bit full adder architectures
are implemented in the adder part of the SAD architecture, we
obtain a 5.92% improvement in area, a 9.94% improvement in
performance and a 38.43% improvement in leakage power. The
compared results are tabulated in TABLE IV.

TABLE IV
Synthesis results of Existing and proposed SAD architecture
Technology Library=180nm
SL No  Parameters          Existing SAD   Proposed SAD   %Gain
1      Area                495657         466278         5.927
2      Delay(ps)           7522           6774           9.944
3      Leakage power(µW)   20.137         12.398         38.430
4      Dynamic power(µW)   112174.65      114023.16      -1.648
5      Total power(µW)     112194.79      114035.56      -1.641
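The %Gain figures reported in the tables of this section appear consistent with the relation (existing − proposed) / existing × 100. This formula is inferred from the tabulated values rather than stated in the text, so the short sketch below should be read as a cross-check under that assumption. The function name `percent_gain` and the dictionary layout are introduced only for this example; the numbers are copied from the synthesis results.

```python
def percent_gain(existing, proposed):
    """Relative reduction of `proposed` with respect to `existing`, in percent."""
    return (existing - proposed) / existing * 100.0


# (existing, proposed) pairs taken from the 1-bit full adder results.
full_adder_results = {
    "Area": (93, 86),
    "Delay(ps)": (315, 253),
    "Leakage power(nW)": (4.372, 2.62),
    "Dynamic power(nW)": (6549.499, 6100.844),
}

# (existing, proposed) pairs taken from the SAD architecture results.
sad_results = {
    "Area": (495657, 466278),
    "Delay(ps)": (7522, 6774),
    "Leakage power(uW)": (20.137, 12.398),
    "Dynamic power(uW)": (112174.65, 114023.16),
}

for title, rows in (("Full adder", full_adder_results), ("SAD", sad_results)):
    for name, (existing, proposed) in rows.items():
        print(f"{title} {name}: {percent_gain(existing, proposed):.3f}%")
```

A negative value under this convention means the proposed design is larger in that parameter than the existing one, as is the case for the dynamic power of the SAD architecture.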
V. CONCLUSION
In this paper we implemented the sum of absolute difference
algorithm for motion estimation. The proposed concept improves
area by 5.92%, delay by 9.94% and leakage power by 38.43% at the
SAD level. When the datapath architecture is optimized, improved
results are therefore also obtained at the system level, and the
proposed datapath architecture can be extended to any required
n-bit width. When these datapath optimizations for the SAD
architecture are implemented in an ASIC design flow and
synthesized using Cadence RC Compiler, improved results are
obtained compared to the existing architecture. Carrying this
work further at the circuit level would result in higher
efficiency and give greater control over the datapath
architectures, so that they can be designed to the constraints of
the application.
ACKNOWLEDGMENT
The authors would like to express their deep gratitude to the
Department of Electronics and Communication Engineering of
Bangalore Institute of Technology for its support and
encouragement during this work.
REFERENCES
[1] D.V. Manjunatha, G. Sainarayanan, “Power Efficient Sum of Absolute
Difference Algorithms for Video Compression”, IOSR Journal of VLSI and
Signal Processing (IOSR-JVSP), Mar.-Apr. 2013.
[2] Fred Halsall, “Multimedia Communications: Applications, Networks,
Protocols and Standards”, Pearson, 2001.
[3] Y. Wang et al., “Hilbert scanning search algorithm for motion estimation”,
IEEE Transactions on Circuits and Systems for Video Technology, vol. 9,
issue 5, pp. 683-691, Aug. 1999.
[4] S. Lee et al., “New motion estimation algorithm using adaptively quantized
low bit-resolution image and its VLSI architecture for MPEG2 video
encoding”, IEEE Transactions on Circuits and Systems for Video Technology,
vol. 8, issue 6, pp. 734-744, Oct. 1998.
[5] M. Pickering et al., “An adaptive search algorithm for block matching
motion estimation”, IEEE Transactions on Circuits and Systems for Video
Technology, vol. 7, issue 6, pp. 906-912, Dec. 1997.
[6] J. Y. Tham et al., “A novel unrestricted center biased diamond search
algorithm for block motion estimation”, IEEE Transactions on Circuits and
Systems for Video Technology, vol. 8, issue 4, pp. 369-377, Aug. 1998.
[7] Chandana Pandey, Deependra Pandey, “Implementation of Novel Threshold
Diamond Search (TDS) Algorithm for Fast Motion Estimation”, International
Journal of Engineering Trends and Technology (IJETT), V23(5), 268-274, May
2015. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense
Research Group.
[8] Dharmendra Jha, Fremin Kannampuzha, Justin Joseph, Stevew Possa, Dr.
Deepak Jayaswal, Santosh Chapaneri, “Motion Estimation Algorithms for
Baseline Profile of H.264 Video Codec”, International Journal of Engineering
Trends and Technology (IJETT), V4(4), 727-733, Apr. 2013. ISSN: 2231-5381.
www.ijettjournal.org. Published by Seventh Sense Research Group.
[9] Stephan Wong, Stamatis Vassiliadis and Sorin Cotofana, “A Sum of
Absolute Differences Implementation in FPGA Hardware”, International
Journal of Electrical and Computer Engineering, 4:9, 2009.
[10] Joaquin Olivares, Ignacio Benavides et al., “Minimum Sum of
Absolute Differences implementation in a single FPGA device”, Dept. of
Electro-technics and Electronics, University of Cordoba, Spain.
[11] Hamza R.A, Rahim R.A and Noh Z.M, Fkekk, Utem, Ayer Keroh,
Malaysia, “Sum of Absolute Difference Algorithm in Stereo Correspondence
Problem for Stereo Matching in Computer Vision Application”, Computer
Science and Information Technology (ICCSIT), 2010 3rd IEEE International
Conference on.
[12] T. C. Chen et al., “Analysis and Architecture Design of an HDTV720p
30 Frames/s H.264/AVC Encoder”, IEEE TCSVT, v. 16, no. 6, Jun. 2006, pp.
673-688.
[13] J. Vanne, E. Aho, T. D. Hamalainen and K. Kuusilinna, “A
High-Performance Sum of Absolute Difference Implementation for Motion
Estimation”, IEEE TCSVT, v. 16, n. 7, Jul. 2006, pp. 876-883.
AUTHORS PROFILE
Sunitha S Patil received her B.E degree in Electronics and Communication
Engineering from S.K.S.V.M.A.C.E.T, Lakshmeshwar, Gadag. Currently
she is pursuing her Master degree in Digital Electronics and Communication
Engineering at Bangalore Institute of Technology, Bengaluru, India. Her
areas of interest include VLSI Communications and Digital Signal Processing.
Mr. Narendra C. P is Assistant Professor in the Department of Electronics and
Communication Engineering, Bangalore Institute of Technology, Bangalore.
He received his B.E. degree in Instrumentation and Electronics from Bangalore
University. His Master degree specialization was Digital Electronics and
Communication, from NMAMIT, Nitte, Visvesvaraya Technological
University (VTU), Belgaum, Karnataka. He has published 2 papers and is
currently pursuing a Ph.D. His research interests include Digital Signal
Processing, Digital Image Processing and VLSI Signal Processing.
ISSN: 2231-5381
http://www.ijettjournal.org