International Journal of Engineering Trends and Technology (IJETT) – Volume 23 Number 8- May 2015 Implementation of Low Power SAD Architecture for Motion Estimation Sunitha S Patil1, Narendra C.P2 1 2 M.Tech student, Department of ECE, Bangalore Institute of Technology, Bengaluru, India Assistant professor, Department of ECE, Bangalore Institute of Technology, Bengaluru, India Abstract—Multimedia video applications embraces a variety of advanced features in consumer devices/mobile phones such that it has become default features in a wide range of electronic gadgets and devices. Due to the scaling of VLSI chip design technology careful trade-off between the quality metrics like area, power and performance is essential based onthe application. To meet the bandwidth criteria and to achieve fast transfer rate there is a need to compress the video before transmission. In video compression algorithm Motion Estimation is critical block because of its intense computations. Motion Estimation operation involves predicting the frames and identifying motion vectors so that redundancy can be exploited by eliminating the transfer of similar information between successive frames.The most efficient and simple technique to estimate the motion vectors is Sum of Absolute Difference(SAD) algorithm. In this paper low power SAD architecture is implemented in ASIC domain. A new low power 1-bitfull adder cell is used in proposed SAD architecture which gives improvements in Leakage Power (38.43%), area(5.92%) and performance(9.94%) when compared to the architecture having conventional full-adder. Keywords—SAD, LOW POWER, VLSI, H.264/AVC, ADDER, ASICetc. I. INTRODUCTION In the ever increasing world of digital era, due to innovations in the VLSI technologyand with the advent of smartphones there is a huge demand for multimedia applications in all the fields such as HDTV, live streaming of video, video conferencing, Electronic patient recording and so on. Multimedia integrates many types of data such as text, graphic, sound and video. There is a need to represent this data to store and transmit for effective communication[1]. Compression in video is very essential to meet the technological demands such as low power, less memory and fast transfer rate for different range of devices. Apart from spatial redundancy in a frame, there will be a similarity present between the successive frames of a video called temporal redundancy. In general, not the whole frame but just a small portion of each frame will be involved in motion of a particular video sequence. For example motion of a person or vehicle in a scene of a movie. A scene in a video of duration 3 seconds, assuming a refresh rate of 60 frames per second which totally forms 180 frames related to that duration of a video. Because of the similarity present between the successive frames we can send ISSN: 2231-5381 only the information related to the segments that defines the movement associated with them so that significant bandwidth savings can be made by exploiting the temporal redundancy[2]. The technique used to achieve greater compression ratios is Motion Estimation[3-6]. Motion estimation exploits high correlation between successive frames predicting the next sequences of a video, sometimes combination of both preceding and succeeding frames. Here the difference associated between each successive frames are sent instead of sending actual source information. The accuracy of prediction depends on the movement of motion vectors. The motion vector is expressed in the format of (X, Y), where X represents the number of pixels that moves in horizontal direction, while Y represents the number of pixels that moves in vertical direction. The simple metric system used to exploit the similarities between the video frames is the SAD algorithm, where the absolute difference values between the corresponding elements/pixels are added up. There are several video coding standards in the video processing systems; the modern/latest video coding standard used is H.264/AVC [8]. This video coding standard uses the Variable Block Size Motion Estimation (VBSME), and the computational requirements are much higher compared to the previous video coding standards such as H.263/MPEG-IV, MPEG-2. In H.264/AVC, each picture frame is divided into different macro blocks.And further each macro block is subdivided into several sub-blocks as shown in“fig 1”. In this paper we are implementing the low power sum of absolute difference algorithm in order to determine the effective motion vectors which gives the least sad value, Based on the design specifications, the SAD algorithm is implemented using Verilog HDL coding which is functionally verified(simulated) using the Modelsim simulation tool and the design is synthesized using Cadence RC compiler tools. The remaining part of the paper includes the following Sections: Section II describes the work related to the Implementation of different types of SAD architecture to determine the effective motion vectors. Implementation of regular and proposed 1-bit full adder architecture has beendescribed in Section III. The details of sum of absolute difference architecture have been presented in Section IV. http://www.ijettjournal.org Page 376 International Journal of Engineering Trends and Technology (IJETT) – Volume 23 Number 8- May 2015 Section V describes the results and discussion, and finally Section VI includes the conclusion. which employs 1024 SAD processing units forH.264/AVC encoder in terms of gate count and operational frequency.Work proposed in [13] presents SAD architectures targeting gate count and delay optimization which is not suitable for low power devices. The proposed work in this paper highlights both the improvements in power dissipation as well as the performance. III. IMPLEMENTATION An adder is a digital circuit that is mainly used to perform the addition of two numbers/quantities. There are different types of adders among which the ripple carry adder is the simplest one. A. Ripple carry adder Cascading the 1-bit multiple full adders in parallel in order to add N-bit binary numbers that forms the logical circuit is called a ripple carry adder. Where each carry bit is rippled or propagated to the next stage/next adder. Since no carry-in is required at the 1st stage, therefore the 1st full adder can be replaced by the half adder to utilize the resources efficiently.The block diagram of ripple carry adder is as shown in “fig 2”. Fig.1 Different block sizes of motion estimation in H.264. II. RELATED WORK Different types of SAD implementations for H.264 video codec has been proposed in the literature, considering the tradeoff between the parameters like area, performance and power parameters. The SAD algorithm is the simplest metric which considers all the pixels in a macroblock with one to one mapping between template and search image[1]. Fig.2 4-Bit binary ripple adder circuit. The main advantages of ripple carry adder is that, it has the smallest area, longest delay and consumes the lowest power. B. Existing architecture Motion estimation techniques mainly exploit the temporal redundancy between the successive video frames, achieving significant bandwidth savings, which is presented by work [3-7]. SAD algorithms can be implemented in various domains to achieve desired parameters for various applications. The work in [9] implemented SAD on FPGA. SAD algorithm to determine the motion vectors can be modelled using VHDL and implemented on FPGA [10].SAD algorithm can be implemented in Matlab for stereo matching in computer vision applications [11]. The work presented in [12] proposed the SAD algorithm ISSN: 2231-5381 Fig.3 1-Bit regularfull adder architecture. This is the basic full adder architecture which is generally used. The full adder architecture plays a very important role in the construction of basic binary adders. The basic existing full adder architecture which mainly consists of 2 XOR gates, 2 AND gates, and 1 OR gate is shown in “fig 3”. http://www.ijettjournal.org Page 377 International Journal of Engineering Trends and Technology (IJETT) – Volume 23 Number 8- May 2015 Some of the limitations of the existing architecture are as follows: It consists of more number of cells which increases the total cell area. More interconnects increases the glitches and which leads to the increased power consumption. Interdependency of Carry path on the sum path partially. C. Proposed architecture There are different types of full adders that are implemented earlier, but here is the proposed new full adder architecture which is well optimized in terms of several parameters like area, delay and power and this proposed architecture consists of a complex cell called AOI222 from TSMC 180nm technology, which helps in reducing the overall power consumption which is as shown in “fig 4”. differences between each pair of corresponding pixels of template image and search image and the obtained absolute difference values are summed up together by means of an adder to obtain the result in the similarity block. It mainly involves just two basic arithmetic operations i.e, addition and subtraction. The block diagram of sum of absolute difference algorithm is shown in “fig 5”. The SAD algorithm is one of the simplest metric which considers all the pixels in the block/frame for the computation that too separately and it is one of the most time efficient algorithm since it compares the group of pixels together which makes its implementation easier, faster and parallel. And it is used in many of the applications like block motion estimation, object recognition etc. This paper proposes the new low power SAD architecture which is well optimized in terms of resource sharing and which consumes less area and power. Advantages of the proposed cell Complex cell reduces the gate counts and delay optimizations required for the implementation. Reduced interconnects reduces the glitches and hence the associated power consumption is also reduced. Individual carry path (non-interdependency of carry path on the sum path). This proposed architecture is implemented in the main sum of absolute difference architecture in order to perform the basic arithmetic operations like addition and subtraction. To get the improved results. Fig.5 Block diagram of Sum of Absolute Difference. The hierarchy of SAD architecture is shown in “fig 6”. Fig.4 1-Bit proposed full adder architecture. D. SAD architecture There are different types of block matching algorithms used for predicting/determining the motion vectors out of which the SAD algorithm is the simplest and most efficient one. It is the most repeated block in the block matching algorithm within the motion estimation subsystem. Sad algorithm is mainly used for determining the motion vectors by predicting the similarities between the Images/frames which is determined by taking the absolute ISSN: 2231-5381 Fig.6 The hierarchy of sum of absolute difference architecture. http://www.ijettjournal.org Page 378 International Journal of Engineering Trends and Technology (IJETT) – Volume 23 Number 8- May 2015 The absolute difference values which are obtained are added up together using an adder and the obtained results are fed to comparator and the comparator compares and selects the block with the minimum absolute difference value and finally we get the similarity block. reduced to 19.68% due to the non-interdependency of the carry and the sum path and hence we obtain the parallel computations. The Leakage power has been significantly improved by 40.03% against its counterpart architecture because of the use of higher transistor stack complex cell called AOI222 where the on-resistance between the supply rails is high. IV. RESULTS & DISCUSSION Here we are proposing a new low power sum of absolute difference algorithm for the H.264 video coding standard for determining the motion estimation. The existing & proposed parallel 8X8 SAD architectures were implemented in ASIC methodology. The Existing architecture includes EX-OR gates, AND gates, and OR gate for the 1-bit full adder architecture which is replaced in the adder part of SAD architecture. The following TABLE Ishow the results of existing architecture with respect to various parameters like area, delay and power when synthesized in Cadence RC compiler tool. TABLE I Synthesis results of 1-bit regular full adder architecture TABLE III Synthesis results of regular and proposed 1-bit full adder architecture SL No RegularFull Adder Area Delay(ps) Leakage power(nW) Dynamic power(nW) Total power(nW) 93 315 4.372 6549.499 6553.871 The Proposed architecture uses a complex cell called AOI222, which gives the better optimized results compared to the Regular full adder architecture. When this 1-bit proposed full adder architecture is synthesized in RC Compiler we obtain the following results which are tabulated as follows in %Gain 1 Area 93 86 7.527 2 315 253 19.683 4.372 2.62 40.073 4 Delay(ps) Leakage power(nW) Dynamic power(nW) 6549.499 6100.844 6.850 5 Total power(nW) 6553.871 6103.463 6.873 3 Parameters Technology Library=180nm Regular Proposed Full Adder Full adder which uses which uses Parameters EXOR, EXOR gate AND,OR & AOI222 gates Cell When the existing and proposed 1-bit full adder architectures are implemented in the adder part of the SAD architecture we obtain 5.92% improvement in the area, 9.94% improvement in the performance and 38.43% improvement in the leakage power, the following compared results are tabulated below in TABLE IV. TABLEIV Synthesis results of Existing and proposed SAD architecture TABLE II. TABLE II Synthesis results of 1-bit full adder proposed architecture Technology Library=180nm Existing Proposed Parameters SAD SAD %Gain Parameters Proposed Full adder SL No Area 86 1 Area 495657 466278 5.927 Delay(ps) 253 2 7522 6774 9.944 Leakage power(nW) 2.620 Dynamic power(nW) 6100.844 20.137 12.398 38.430 Total power(nW) 6103.463 Delay(ps) Leakage power(µW) Dynamic power(µW) Total power(µW) 112174.65 114023.16 -1.648 112194.79 114035.56 -1.641 3 4 When the Proposed full adder architecture is compared with the Existing full adder architecture we obtain the improved, optimized results in terms of several parameters like area, performance and power (both leakage and dynamic power). The percentage change in gain is also estimated below in TABLE III. The proposed architecture gives 7.52% improvement in area because of the use of reduced number of gates when compared to the Existing architecture, and the delay is also ISSN: 2231-5381 5 V. CONCLUSION In this paper we implemented sum of absolute difference algorithm for motion estimation.The proposed concept gives significant improvements in the results. So when the datapath architecture is optimized we get the improved results at the system level also and this proposed http://www.ijettjournal.org Page 379 International Journal of Engineering Trends and Technology (IJETT) – Volume 23 Number 8- May 2015 datapath architecture can be designed and extended to any required constraint ofn-bit width. When these datapath optimizations for SAD architecture are implemented in ASIC design flow and synthesized using Cadence RC Compiler, better improved results are obtained when compared to the existing architecture.Further carrying out this work in circuit level results in higher efficiency and facilitates greater control over the datapath architectures to design as per the constraint of the applications. ACKNOWLEDGMENT Authors like to express their deep gratitude towards the Department of Electronics and Communication Engineering of Bangalore Institute of Technology for their support and encouragement during this work. REFERENCES [1] D.V. Manjunatha, G. Sainarayanan “Power Efficient Sum of Absolute Difference Algorithms for video Compression”IOSR Journal of VLSI and Signal Processing (IOSR-JVSP)(Mar. – Apr. 2013) [12] T. C. Chen, et al., “Analysis and Architecture Design of anHDTV720p 30 Frames/s H.264/AVC Encoder”, IEEETCSVT, v. 16, no. 6, Jun. 2006, pp. 673-688. [13] J. Vanne, E Aho, T D Hamalainen and K Kuusilinna, “AHighPerformance Sum of Absolute DifferenceImplementation for Motion Estimation”, IEEE TCSVT, v. 16,n. 7, Jul. 2006, pp. 876-883. AUTHORS PROFILE Sunitha S Patilreceived her B.E degree in Electronics and Communication Engineering from S.K.S.V.M.A.C.E.T, Lakshmeshwar, 33Gadag. Currently she ispursuing her Master degree in Digital Electronics and Communication Engineering from Bangalore Institute of Technology, Bengaluru, India. Her area of interests include VLSI Communications, Digital signal processing. Mr. Narendra C. P is Assistant Professor in the Department of Electronics and communication Engineering, Bangalore Institute of Technology, Bangalore. Received his B.E. degree in Instrumentation and Electronics from Bangalore University. The specialization in Master degree was Digital Electronics and Communication from NMAMIT, Nitte,Visvesvaraya Technological University (VTU), Belgaum, Karnataka and published 2 papers and currently pursuing Ph.D. His research interests include Digital Signal Processing, Digital Image Processing and VLSI Signal Processing. [2] FRED HALSALL: “Multimedia Communications Applications, Networks, Protocols and standards”(PEARSON, 2001). [3] Y. Wang et al. “Hilbert scanning search algorithm formotion estimation,” IEEE transactions on circuits and systemsfor video technology, vol. 9, issue 5 pp. 683-691, Aug. 1999. [4] S. Lee et al. “New motion estimation algorithm usingadaptively quantized low bit-resolution image and its VLSIarchitecture for MPEG2 video encoding,” IEEE transactionson circuits and systems for video technology, vol. 8, issue 6,pp 734 -744, Oct. 1998. [5] M. Pickering et al. “An adaptive search algorithm for blockmatching motion estimation,” IEEE transactions on circuitsand systems for video technology, vol. 7, issue 6, pp 906-912,Dec. 1997. [6] J. Y. Tham et al. “A novel unrestricted center biaseddiamond search algorithm for block motion estimation,” IEEEtransactions on circuits and systems for video technology, vol.8, issue 4, pp 369-377, Aug. 1998. [7] Chandana Pandey, DeependraPandey"Implementationof Novel Threshold Diamond Search (TDS) Algorithm for Fast Motion Estimation", International Journal of Engineering Trends and Technology (IJETT), V23(5),268-274 May 2015. ISSN:2231-5381. www.ijettjournal.org. published by seventh sense research group. [8] DharmendraJha ,FreminKannampuzha , Justin Joseph , StevewPossa , Dr. Deepak Jayaswal , Santosh Chapaneri. "Motion Estimation Algorithms for Baseline Profile of H.264 Video Codec".International Journal of Engineering Trends and Technology (IJETT).V4(4):727-733 Apr 2013. ISSN:2231-5381. www.ijettjournal.org. published by seventh sense research group. [9] Stephan Wong, StamatisVassiliadis, and SorinCotofana “A Sum of Absolute Differences Implementation in FPGA Hardware”, International Journal of Electrical and Computer Engineering 4:9 2009 [10]Joaquin Olivares, Ignacio Benavides and et. al., “Minimum Sum of Absolute Differences implementation in a single FPGA device”, Dept. of Electro-technics and Electronics, University of Cordoba, Spain. [11] Hamza R.A, Rahim R.A and Noh Z.M, Fkekk, Utem, Ayer Keroh, Malaysia “ Sum of Absolute Difference Algorithm in Stereo Correspondence Problem for Stereo Matching in Computer Vision Application”, Computer Science and Information Technology (ICCSIT). 2010 3rd IEEE International conference of, Computer Science and Information Technology(ICCSIT). ISSN: 2231-5381 http://www.ijettjournal.org Page 380