Implementation and Comparative Analysis

International Journal of Engineering Trends and Technology, Volume 3, Issue 3, 2012

Implementation and Comparative Analysis between Devices for Double Precision Interval Arithmetic Radix 4 Wallace Tree Multiplication

Krutika Ranjankumar Bhagwat#1, Dr. Tejas V. Shah*2, Prof. Deepali H. Shah#3
# Instrumentation & Control Engineering Department, L. D. College of Engineering, Ahmedabad-380015, Gujarat, India
* S.S College of Engineering, Bhavnagar-364060, Gujarat, India

Abstract— This paper presents a comparative analysis between two devices for the design of a radix 4 Wallace tree multiplier that performs interval multiplication. The 64 bit multiplier requires Booth partial product selection logic and a [27,2] compressor. Interval arithmetic gives a reliable result because the rounding error of the floating point multiplier is enclosed by the computed bounds. The design requires slightly more area than a conventional floating point unit. It performs better than a software approach because the dedicated hardware logic avoids the overhead of function calls and of error and range checking. The input and output registers are each 64 bits wide, and two multiplexers with control signals tx and ty are used [10].

Keywords— Double Precision, Interval Multiplication, Booth partial product selection logic.

I. INTRODUCTION

The IEEE 754 standard defines the double precision format as 1 sign bit, 11 exponent bits and 53 bits of significand precision, of which 52 are explicitly stored. The significand has an implicit integer bit of value 1, unless the stored exponent field is all zeros, and only the 52 fraction bits appear in the memory format. The total precision is therefore 53 bits, which corresponds to about 16 decimal digits (53 log10(2) ≈ 15.955) [4].

Fig. 1 Interval multiplier

The sign logic computes the sign of the result by performing the exclusive-OR of the sign bits of the input operands.
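As an illustration only (the paper's design is dedicated hardware, not software), the double precision layout and the sign logic described above can be modelled with a short Python sketch. The function names `decode_double`, `significand` and `product_sign` are hypothetical and do not come from the paper.

```python
import math
import struct

def decode_double(x):
    """Split a binary64 value into (sign, biased exponent, 52-bit fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # raw 64-bit pattern
    sign = bits >> 63                   # 1 sign bit
    exponent = (bits >> 52) & 0x7FF     # 11 exponent bits (biased)
    fraction = bits & ((1 << 52) - 1)   # 52 explicitly stored fraction bits
    return sign, exponent, fraction

def significand(exponent, fraction):
    """53-bit significand: implicit leading 1 unless the exponent field is all zeros."""
    hidden = 0 if exponent == 0 else 1
    return (hidden << 52) | fraction

def product_sign(a, b):
    """Sign of a product: exclusive-OR of the two operand sign bits."""
    return decode_double(a)[0] ^ decode_double(b)[0]

# Decimal digits carried by a 53-bit significand.
decimal_digits = 53 * math.log10(2)
```

For example, `decode_double(1.0)` returns the fields of a value whose biased exponent equals the bias itself, and `product_sign(-1.5, 2.0)` models the single XOR gate of the sign logic.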
The exponent adder performs an 11 bit addition of the two exponents and subtracts the exponent bias of 1023 [10].

II. INTERVAL MULTIPLICATION

Multiplication of the intervals x = [xl, xu] and y = [yl, yu], where xl and yl are the lower endpoints and xu and yu the upper endpoints, is defined as:

Z = x * y = [min(xl*yl, xl*yu, xu*yl, xu*yu), max(xl*yl, xl*yu, xu*yl, xu*yu)]

The interval multiplier shown in Fig. 1 has input and output registers, sign logic, an exponent adder, and a significand multiplier with rounding and normalization logic. The significand multiplier performs a 53 bit by 53 bit radix 4 Wallace tree multiplication. If the most significant bit of the product is one, the normalization logic shifts the product right by one bit and increments the exponent. The rounding logic rounds the product to 53 bits based on the rounding mode (rm), which is round to nearest even [10].

ISSN: 2231-5381 http://www.internationaljournalssrg.org

III. RADIX 4 WALLACE TREE MULTIPLIER

A. Booth multiplication

Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied, which halves the number of partial products. Instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take only every second column and multiply by ±1, ±2, or 0 to obtain the same result [16]. Halving the number of partial products gives a substantial performance advantage. To Booth recode the multiplier term, we consider the bits in blocks of three such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier because a zero is assumed to the right of the LSB [16].

Fig. 2 Grouping of bits from the multiplier term

Fig. 3 Booth partial product selector logic

Figure 2 shows the grouping of bits from the multiplier term for use in modified Booth encoding [16]. Each block is decoded to generate the correct partial product.
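The overlapping grouping described above can be sketched in software as follows. This is a minimal model that assumes the standard modified Booth (radix 4) digit mapping; the names `BOOTH_DIGIT`, `booth_radix4_digits` and `booth_multiply` are hypothetical, and the unsigned multiplier is zero-extended to an even width so that the top block always produces a non-negative digit.

```python
# Standard modified Booth (radix 4) mapping from a 3-bit block to a signed digit.
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def booth_radix4_digits(y, width):
    """Recode an unsigned `width`-bit multiplier into radix 4 Booth digits, LSB first."""
    if y < 0 or y >> width:
        raise ValueError("y must fit in `width` unsigned bits")
    # Zero-extend to an even number of bits with at least one leading zero.
    n = width + 1 if width % 2 else width + 2
    # Append the assumed 0 to the right of the LSB, so the first block
    # contains only two real multiplier bits.
    shifted = y << 1
    # Each block of three bits overlaps the previous block by one bit.
    return [BOOTH_DIGIT[(shifted >> (2 * i)) & 0b111] for i in range(n // 2)]

def booth_multiply(x, y, width):
    """Add the partial products (digit * x), each weighted by 4**i."""
    digits = booth_radix4_digits(y, width)
    return sum((d * x) << (2 * i) for i, d in enumerate(digits))
```

With `width=53` the recoding produces 27 digits, so a 53 bit multiplier generates 27 partial products instead of 53.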
The encoding of the multiplier Y using the modified Booth algorithm generates the following five signed digits: -2, -1, 0, +1, +2. Each encoded digit in the multiplier performs a certain operation on the multiplicand X, as illustrated in Table 1. The Booth partial product selector logic is shown in Fig. 3. Booth recoding is used to reduce the number of partial products from 53 to 27. A (27,2) Wallace tree is implemented to add the 27 partial products [16].

TABLE I
OPERATION ON THE MULTIPLICAND

Block   Recoded digit   Operation on X
000      0               0X
001     +1              +1X
010     +1              +1X
011     +2              +2X
100     -2              -2X
101     -1              -1X
110     -1              -1X
111      0               0X

B. [27,2] Compressor

A [27,2] compressor is made of three (9,2) blocks and one (6,2) block, as shown in Fig. 4. A [9,2] compressor is made of three (3,2) blocks and one (6,2) block, as shown in Fig. 5. A [6,2] compressor is made of two (3,2) blocks and one (4,2) block, as shown in Fig. 6. A [4,2] compressor is made of two full adders, as shown in Fig. 7. Since 27 is a multiple of 9, the (9,2) building block gives a very simple global routing structure for this multiplier, but it would not be appropriate for a 16 or 32 bit multiplier [13].

Fig. 4 [27,2] Compressor

Fig. 5 [9,2] Compressor

Fig. 6 [6,2] Compressor

Fig. 7 [4,2] Compressor

IV. COMPARATIVE ANALYSIS

Tables 2 and 3 give a comparative analysis of the interval arithmetic double precision radix 4 Wallace tree multiplication between the Virtex-E family and the Spartan-3A and Spartan-3AN family, using devices XCV3200E and XC3S1400AN in packages FG1156 and FGG676 respectively.

TABLE II
POWER ANALYSIS

v3200efg1156-8                        I(mA)   P(mW)
Total estimated power consumption     -       367
Quiescent Vccint 1.80V                200     360
Quiescent Vcco33 3.30V                2       7

xc3s1400an-5-fgg676                   I(mA)   P(mW)
Total estimated power consumption     -       97
Quiescent Vccint 1.20V                31      37
Quiescent Vccaux 3.30V                18      60
Quiescent Vcco25 2.50V                0       1

Fig. 8 Area Analysis (total memory usage in kilobytes for v3200efg1156-8 and xc3s1400an-5-fgg676)

TABLE III
COMPARATIVE ANALYSIS
Device comparison for 64 bit interval arithmetic based radix 4 Wallace tree multiplier

Parameter                                      v3200efg1156-8   xc3s1400an-5-fgg676

AREA ANALYSIS
Number of Slices                               333              317
Number of Slice Flip Flops                     143              117
Number of 4 input LUTs                         648              608
Number of bonded IOBs                          498              608
Number of IOs                                  498              498
IOB Flip Flops                                 214              214
Number of GCLKs                                2                2
Total memory usage (kilobytes)                 237984           183188

SPEED ANALYSIS
Minimum period (ns)                            9.069            8.295
Maximum frequency (MHz)                        110.266          120.548
Minimum input arrival time before clock (ns)   14.289           11.534
Maximum output required time after clock (ns)  11.919           8.200
Maximum combinational path delay (ns)          19.712           13.469

TIMING CONSTRAINTS
Worst case slack, hold (ns)                    1.632            1.324
Best case achievable, set up (ns)              9.948            9.265
Total REAL time to Xst completion (secs)       16.00            21.00

THERMAL SUMMARY
Estimated junction temperature (C)             30               27
Ambient temperature (C)                        25               25
Case temperature (C)                           29               26
Theta J-A range (C/W)                          13               18

CLOCK REPORT
Fanout le                                      214              214
Fanout clk                                     126              112
Net skew le (ns)                               0.263            0.140
Net skew clk (ns)                              0.139            0.176
Max delay le (ns)                              1.435            1.06
Max delay clk (ns)                             1.167            1.063

CLOCK SIGNAL (LOAD)
clk                                            143              117
le                                             214              214

Fig. 9 Timing Constraints Analysis (worst case slack, hold, and best case achievable set up, in ns, for both devices)

Fig. 10 Power Analysis

Fig. 11 Speed Analysis (minimum period, minimum input arrival time before clock, maximum output required time after clock and maximum combinational path delay, in ns, for both devices)

V. CONCLUSION

Interval arithmetic provides reliability and accuracy by computing a lower and an upper bound between which the result is guaranteed to reside. A carry look ahead scheme is used for the 11 bit exponent adder, which reduces the delay. The radix 4 Wallace tree interval arithmetic multiplier implemented on Virtex-E uses more logic and has a longer delay than the implementation on Spartan-3A and Spartan-3AN. The Virtex-E implementation requires 237984 kilobytes of memory and 367 mW of power, with a maximum combinational path delay of 19.712 ns, whereas the Spartan-3A and Spartan-3AN implementation requires only 183188 kilobytes of memory and 97 mW of power, with a maximum combinational path delay of 13.469 ns.

REFERENCES

[1] Josh Milthorpe and Alistair Rendell, "Learning to live with errors: A fresh look at floating-point computation", Australian National University, Computing Conference 2005.
[2] Ruchir Gupte, "Interval arithmetic logic unit for DSP and control applications", Electrical and Computer Engineering, Raleigh, 2006.
[3] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", ISBN 81-297-0092-1, © 2003 Sun Microsystems.
[4] "IEEE Standard 754 for Binary Floating Point Arithmetic", ANSI/IEEE Standard No. 754, American National Standards Institute, Washington DC, 1985.
[5] Behrooz Parhami, "Computer Arithmetic: Algorithms and Hardware Designs", 2nd Edn, Oxford, 2011.
[6] C. N. Marimuthu and P. Thangaraj, "Low Power High Performance Multiplier", Anna University, Tamil Nadu, India, ICGST-PDCS, Volume 8, Issue 1, December 2008.
[7] Michael J. Schulte and Earl E. Swartzlander Jr., "A Performance Comparison Study on Multiplier Designs", IEEE Transactions on Computers, May 2000.
[8] Yong Dou, S. Vassiliadis, G. K. Kuzmanov and G. N. Gaydadjiev, "64-bit Floating-Point FPGA Matrix Multiplication", National Laboratory for Computer Engineering, FPGA'05, Monterey, California, USA, February 2005.
[9] Anane Nadjia, Anane Mohamed, Bessalah Hamid, Issad Mohamed and Messaoudi Khadidja, "Hardware Algorithm for Variable Precision Multiplication on FPGA", © 2009 IEEE.
[10] James E. Stine and Michael J. Schulte, "A Combined Interval and Floating Point Multiplier", Computer Architecture and Arithmetic Laboratory, Electrical Engineering and Computer Science Department, Lehigh University, Bethlehem, PA 18015.
[11] Sparc Architecture Manual.
[12] Prof. Loh, CS3220 Processor Design, "Carry-Save Addition", Spring, February 2005.
[13] "Carry Save Adder Trees in Multipliers", ECEN 6263 Advanced VLSI Design, November 3, 1999.
[14] C. N. Marimuthu and P. Thangaraj, "Low Power High Performance Multiplier", Anna University, Tamil Nadu, India.
[15] Steve Kilts, "Advanced FPGA Design: Architecture, Implementation, and Optimization", Wiley-Interscience, John Wiley & Sons, ISBN 978-0-470-05437-6, © 2007 IEEE.
[16] P. Assady, "A New Multiplication Algorithm Using High-Speed Counters", Islamic Azad University, Varameen branch, Iran, © EuroJournals Publishing, Inc. 2009.
