International Journal of Engineering Trends and Technology- Volume3Issue3- 2012
Implementation and Comparative Analysis between Devices for Double Precision Interval Arithmetic Radix 4 Wallace Tree Multiplication
Krutika Ranjankumar Bhagwat #1, Dr. Tejas V. Shah *2, Prof. Deepali H. Shah #3
# Instrumentation & Control Engineering Department, L. D. College of Engineering, Ahmedabad-380015, Gujarat, India
* S.S College of Engineering, Bhavnagar-364060, Gujarat, India
Abstract— This paper presents a comparative analysis of two devices for the design of a radix 4 Wallace tree multiplier that performs interval multiplication. This 64 bit multiplier uses Booth partial product selection logic and a [27,2] compressor. Interval arithmetic gives accurate results because the rounding error of the floating point multiplier is accounted for: the true result is guaranteed to lie between the computed lower and upper bounds. The design requires slightly more area than a conventional floating point unit. Because it is implemented as dedicated hardware logic, it gives a definite performance improvement over a software approach, which incurs the overhead of function calls, error checking, range checking and so on. The input and output registers are each 64 bits wide, and two multiplexers with control signals tx and ty are used [10].
Keywords— Double Precision, Interval Multiplication, Booth Partial Product Selection Logic.
I. INTRODUCTION
The IEEE 754 standard defines the double precision format as 1 sign bit, 11 exponent bits and 53 bits of significand precision (52 explicitly stored). The significand has an implicit integer bit of value 1 unless the stored exponent is all zeros, and only the 52 fraction bits of the significand appear in the memory format. The total precision is therefore 53 bits, which corresponds to about 16 decimal digits, since $53 \log_{10} 2 \approx 15.955$ [4].
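A minimal Python sketch of the field layout described above, assuming the standard library `struct` module is used to reinterpret a `float` as its 64-bit pattern. The helper name `decode_double` is illustrative only; it restores the implicit integer bit exactly as the text describes.

```python
import math
import struct

def decode_double(value: float):
    """Split an IEEE 754 double into sign, biased exponent, fraction and 53-bit significand."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)      # 52 explicitly stored bits
    # The implicit integer bit is 1 unless the stored exponent is all zeros.
    hidden = 0 if exponent == 0 else 1
    significand = (hidden << 52) | fraction
    return sign, exponent, fraction, significand

sign, exponent, fraction, significand = decode_double(-6.5)   # -1.625 * 2**2
print(sign, exponent - 1023, hex(significand))   # 1 2 0x1a000000000000
print(round(53 * math.log10(2), 3))              # 15.955
```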
Fig. 1 Interval multiplier
The sign logic computes the sign of the result by performing the exclusive-OR of the sign bits of the input operands. The exponent adder performs an 11-bit addition of the two exponents and subtracts the exponent bias of 1023 [10].
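The sign and exponent paths can be sketched on the raw field values as follows. The names `product_sign` and `product_exponent` are hypothetical, Python's `+` stands in for the hardware exponent adder, and overflow, underflow and special values are ignored.

```python
BIAS = 1023  # double precision exponent bias

def product_sign(sign_x: int, sign_y: int) -> int:
    """Sign logic: exclusive-OR of the operand sign bits."""
    return sign_x ^ sign_y

def product_exponent(exp_x: int, exp_y: int) -> int:
    """Exponent adder: sum of the biased exponents minus one bias.

    Overflow, underflow and special values are ignored; the normalization
    step may still add 1 to this result.
    """
    return exp_x + exp_y - BIAS

# 6.0 = +1.5 * 2**2 (biased exponent 1025), -0.75 = -1.5 * 2**-1 (biased 1022)
print(product_sign(0, 1), product_exponent(1025, 1022) - BIAS)  # 1 1
```

In this example the significand product 1.5 * 1.5 = 2.25 is at least 2, so the normalization logic would then raise the exponent from 1 to 2, giving -4.5 = -1.125 * 2**2.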
II. INTERVAL MULTIPLICATION
Multiplication of the intervals $x = [x_l, x_u]$ and $y = [y_l, y_u]$ is defined as:
$$z = x \times y = \left[\min(x_l y_l,\ x_l y_u,\ x_u y_l,\ x_u y_u),\ \max(x_l y_l,\ x_l y_u,\ x_u y_l,\ x_u y_u)\right]$$
III. RADIX 4 WALLACE TREE MULTIPLIER
The significand multiplier performs a 53-bit by 53-bit radix 4 Wallace tree multiplication. If the most significant bit of the product is one, the normalization logic shifts the product right by one bit and increments the exponent. The rounding logic then rounds the product to 53 bits according to a rounding mode (rm), here round to nearest even [10].
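A sketch of the normalization and round-to-nearest-even step, assuming both significands are 53-bit integers with the hidden bit set and that only normal numbers occur (no subnormals, overflow or sticky-bit hardware details). Instead of literally shifting the product right, it chooses how many low bits to discard from the position of the product's MSB, which has the same effect. Both function names are hypothetical.

```python
import struct

def significand_and_exponent(value: float):
    """Return the 53-bit significand (hidden bit set) and biased exponent of a normal double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return (1 << 52) | (bits & ((1 << 52) - 1)), (bits >> 52) & 0x7FF

def normalize_and_round(sig_x: int, sig_y: int, exponent: int):
    """Multiply two 53-bit significands, normalize the 106-bit product and
    round it to 53 bits with round to nearest even."""
    product = sig_x * sig_y              # 105 or 106 significant bits
    if product >> 105:                   # MSB of the 106-bit product is one:
        exponent += 1                    # normalize (shift right one bit)
        shift = 53                       # 106 - 53 bits to discard
    else:
        shift = 52                       # 105 - 53 bits to discard
    kept = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1                        # round up; exact ties go to even
    if kept >> 53:                       # rounding carried into bit 53
        kept >>= 1
        exponent += 1
    return kept, exponent

(sa, ea), (sb, eb) = significand_and_exponent(1.5), significand_and_exponent(1.5)
print(normalize_and_round(sa, sb, ea + eb - 1023) == significand_and_exponent(2.25))  # True
```

Since Python's float multiplication is correctly rounded to nearest even, comparing against `significand_and_exponent(a * b)` is a convenient way to check the sketch.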
The interval multiplier shown in Figure 1 has input and output registers, sign logic, an exponent adder and a significand multiplier with rounding and normalization logic.
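At the interval level, the datapath of Figure 1 evaluates the endpoint products of the definition in Section II. A minimal sketch, assuming Python's `fractions.Fraction` for exact endpoints. The float variant widens each bound outward by one unit in the last place with `math.nextafter` (Python 3.9+); that outward widening is an assumption made here so the enclosure still holds after round-to-nearest, not the rounding scheme of [10]. Both function names are hypothetical.

```python
import math
from fractions import Fraction

def interval_mul(x, y):
    """Interval product [min, max] of the four endpoint products (exact for Fractions)."""
    xl, xu = x
    yl, yu = y
    products = (xl * yl, xl * yu, xu * yl, xu * yu)
    return (min(products), max(products))

def interval_mul_outward(x, y):
    """Float version: move each rounded bound one ulp outward so that the
    exact product of the float endpoints stays inside the interval."""
    lo, hi = interval_mul(x, y)
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

print(interval_mul((Fraction(-1), Fraction(2)), (Fraction(3), Fraction(4))))
# (Fraction(-4, 1), Fraction(8, 1))
```

All four products are needed because, once an interval straddles zero, either endpoint product can become the minimum or the maximum.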
ISSN: 2231-5381 http://www.internationaljournalssrg.org
A. Booth multiplication
Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied. With radix 4 (modified) Booth recoding, the number of partial products is halved: instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, only every second column is taken, and the multiplicand is multiplied by ±1, ±2 or 0 to obtain the same result [16]. Halving the number of partial products gives a substantial performance advantage.
To Booth recode the multiplier term, its bits are considered in blocks of three such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier, the third being an implicit zero below the LSB [16].
Fig. 2 Grouping of bits from the multiplier term
Fig. 3 Booth partial product selector logic
Figure 2 shows the grouping of bits from the multiplier term for use in modified Booth encoding [16].
Each block is decoded to generate the correct partial product. Encoding the multiplier Y with the modified Booth algorithm generates the five signed digits -2, -1, 0, +1 and +2. Each encoded digit performs a certain operation on the multiplicand X, as illustrated in Table I. The Booth partial product selector logic is shown in Figure 3. Booth recoding reduces the number of partial products from 53 to 27, so a [27,2] Wallace tree is implemented to add the 27 partial products [16].
A [27,2] compressor is made of three [9,2] blocks and one [6,2] block, as shown in Figure 4. A [9,2] compressor is made of three [3,2] blocks and one [6,2] block, as shown in Figure 5. A [6,2] compressor is made of two [3,2] blocks and one [4,2] block, as shown in Figure 6. A [4,2] compressor is made of two full adders, as shown in Figure 7.
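The compressor hierarchy above can be modelled at word level by treating each partial product row as a Python integer. This is a sketch under that assumption: a [3,2] block is modelled as a row of full adders (a carry-save adder), and fixed bit widths, sign extension and wiring are ignored. The function names are hypothetical.

```python
def csa(a: int, b: int, c: int):
    """[3,2] block: a row of full adders; a + b + c == s + cy."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy

def compress_4_2(r):
    """[4,2]: two full-adder rows in series."""
    s, c = csa(r[0], r[1], r[2])
    return csa(s, c, r[3])

def compress_6_2(r):
    """[6,2]: two [3,2] blocks feeding one [4,2] block."""
    s1, c1 = csa(*r[0:3])
    s2, c2 = csa(*r[3:6])
    return compress_4_2([s1, c1, s2, c2])

def compress_9_2(r):
    """[9,2]: three [3,2] blocks feeding one [6,2] block."""
    outs = []
    for i in range(0, 9, 3):
        outs.extend(csa(*r[i:i + 3]))
    return compress_6_2(outs)

def compress_27_2(r):
    """[27,2]: three [9,2] blocks feeding one [6,2] block."""
    outs = []
    for i in range(0, 27, 9):
        outs.extend(compress_9_2(r[i:i + 9]))
    return compress_6_2(outs)

rows = [3, 5, 7, 11] * 6 + [13, 17, 19]   # 27 example rows
s, c = compress_27_2(rows)
print(s + c == sum(rows))                 # True
```

Every block preserves the arithmetic sum of its inputs, so the two outputs of the [27,2] tree always add up to the sum of the 27 rows; a final carry-propagate adder would then form the product. Python integers behave as unbounded two's complement values, so negative Booth partial products work in this model as well.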
B. [27,2] Compressor
Since 27 is a multiple of 9, the [9,2] building block gives a very simple global routing structure for this multiplier. However, it would not be appropriate for a 16 or 32 bit multiplier [13].
TABLE I
OPERATION ON THE MULTIPLICAND
Block | Re-coded digit | Operation on X
000 | 0 | 0X
001 | +1 | +1X
010 | +1 | +1X
011 | +2 | +2X
100 | -2 | -2X
101 | -1 | -1X
110 | -1 | -1X
111 | 0 | 0X
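A sketch of radix 4 Booth recoding that follows Table I, assuming an unsigned multiplier of `n_bits` bits; `booth_recode` and `booth_partial_products` are hypothetical names. One extra block is taken so that the top block always sees a 0 above the MSB and the recoded digits represent the unsigned value. For a 53-bit significand this gives the 27 partial products mentioned above.

```python
# Table I: block (y[2i+1], y[2i], y[2i-1]) -> recoded digit
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def booth_recode(y: int, n_bits: int):
    """Radix 4 Booth recoding of an unsigned n_bits multiplier.

    Blocks of three bits overlap by one bit; the first block uses an
    implicit 0 below the LSB, so it only draws two bits from y.
    """
    padded = y << 1                      # append the implicit y[-1] = 0
    digits = []
    for i in range(n_bits // 2 + 1):     # extra block keeps unsigned y positive
        block = (padded >> (2 * i)) & 0b111
        digits.append(BOOTH_DIGIT[block])
    return digits

def booth_partial_products(x: int, y: int, n_bits: int):
    """Partial products d_i * X * 4**i, selected per Table I."""
    return [(d * x) << (2 * i) for i, d in enumerate(booth_recode(y, n_bits))]

print(booth_recode(0b0111, 4))                         # [-1, 2, 0]
print(sum(booth_partial_products(0b1011, 0b0111, 4)))  # 77
print(len(booth_recode((1 << 53) - 1, 53)))            # 27
```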
Fig. 4 [27,2] Compressor
TABLE II.
POWER ANALYSIS
Supply | v3200efg1156-8 I (mA) | v3200efg1156-8 P (mW) | xc3s1400an-5-fgg676 I (mA) | xc3s1400an-5-fgg676 P (mW)
Total estimated power consumption | 0 | 367 | 0 | 97
Quiescent Vccint 1.80 V | 200 | 360 | - | -
Quiescent Vccint 1.20 V | - | - | 31 | 37
Quiescent Vccaux 3.30 V | - | - | 18 | 60
Other quiescent supplies (xc3s1400an-5-fgg676: Vcco25 2.50 V, Vcco33 3.30 V) | 2 | 7 | 0 | 1
Fig. 5 [9,2] Compressor
Fig. 6 [6,2] Compressor
Fig. 7 [4,2] Compressor
Fig. 8 Area Analysis (total memory usage in kilobytes for v3200efg1156-8 and xc3s1400an-5-fgg676)
IV. COMPARATIVE ANALYSIS
Tables II and III give a comparative analysis of double precision interval arithmetic radix 4 Wallace tree multiplication between the Virtex-E and the Spartan-3A/Spartan-3AN families, using the XCV3200E and XC3S1400AN devices in the FG1156 and FGG676 packages, respectively.
TABLE III.
COMPARATIVE ANALYSIS
Device comparison for 64 bit interval arithmetic based radix 4 Wallace tree multiplier
Parameter | v3200efg1156-8 | xc3s1400an-5-fgg676
AREA ANALYSIS
Number of Slices | 333 | 317
Number of Slice Flip Flops | 143 | 117
Number of 4 input LUTs | 648 | 608
Number of bonded IOBs | 498 | 608
Number of IOs | 498 | 498
IOB Flip Flops | 214 | 214
Number of GCLKs | 2 | 2
Total memory usage | 237984 kilobytes | 183188 kilobytes
SPEED ANALYSIS
Minimum period | 9.069 ns | 8.295 ns
Maximum frequency | 110.266 MHz | 120.548 MHz
Minimum input arrival time before clock | 14.289 ns | 11.534 ns
Maximum output required time after clock | 11.919 ns | 8.200 ns
Maximum combinational path delay | 19.712 ns | 13.469 ns
TIMING CONSTRAINTS
Worst case slack (hold) | 1.632 ns | 1.324 ns
Best case achievable setup | 9.948 ns | 9.265 ns
Total real time to XST completion | 16.00 s | 21.00 s
THERMAL SUMMARY
Estimated junction temperature | 30 °C | 27 °C
Ambient temperature | 25 °C | 25 °C
Case temperature | 29 °C | 26 °C
Theta J-A range | 13 °C/W | 18 °C/W
CLOCK REPORT
Fanout le | 214 | 214
Fanout clk | 126 | 112
Net skew le | 0.263 ns | 0.140 ns
Net skew clk | 0.139 ns | 0.176 ns
Max delay le | 1.435 ns | 1.06 ns
Max delay clk | 1.167 ns | 1.063 ns
CLOCK SIGNAL (LOAD)
clk | 143 | 117
le | 214 | 214
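The speed and power figures can be cross-checked with simple arithmetic: the maximum frequency is the reciprocal of the minimum period. The sketch below uses only values reported in Tables II and III; the dictionary layout and helper names are illustrative.

```python
# Values taken from Table III (period, delay) and Table II (total power).
devices = {
    "v3200efg1156-8": {"min_period_ns": 9.069, "comb_delay_ns": 19.712, "power_mw": 367},
    "xc3s1400an-5-fgg676": {"min_period_ns": 8.295, "comb_delay_ns": 13.469, "power_mw": 97},
}

def max_frequency_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency implied by a minimum period given in ns."""
    return 1000.0 / min_period_ns

def reduction_percent(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old

for name, d in devices.items():
    print(f"{name}: {max_frequency_mhz(d['min_period_ns']):.2f} MHz")
virtex, spartan = devices["v3200efg1156-8"], devices["xc3s1400an-5-fgg676"]
print(f"power reduction: {reduction_percent(virtex['power_mw'], spartan['power_mw']):.1f} %")
print(f"delay reduction: {reduction_percent(virtex['comb_delay_ns'], spartan['comb_delay_ns']):.1f} %")
```

This prints about 110.27 MHz and 120.55 MHz, which agree with the reported 110.266 MHz and 120.548 MHz to within about 0.01 MHz (the small difference is presumably due to rounding in the reported values), together with a 73.6 % power reduction and a 31.7 % reduction in maximum combinational path delay for the Spartan device.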
Fig. 9 Timing Constraints Analysis (worst case slack (hold) and best case achievable setup, in ns, for v3200efg1156-8 and xc3s1400an-5-fgg676)
Fig. 10 Power Analysis
Fig. 11 Speed Analysis (minimum period, minimum input arrival time before clock, maximum output required time after clock and maximum combinational path delay, in ns, for v3200efg1156-8 and xc3s1400an-5-fgg676)
V. CONCLUSION
Interval arithmetic provides reliability and accuracy by computing a lower and an upper bound within which the result is guaranteed to reside. A carry look-ahead scheme is used for the 11-bit exponent adder, which reduces the delay. The radix 4 Wallace tree interval arithmetic multiplier uses more logic (333 slices and 648 4-input LUTs versus 317 and 608) and has a longer delay on Virtex-E than on Spartan-3A/Spartan-3AN.
The Virtex-E implementation requires 237984 kilobytes of memory and 367 mW of power, with a maximum combinational path delay of 19.712 ns, whereas the Spartan-3A/Spartan-3AN implementation requires only 183188 kilobytes of memory and 97 mW of power, with a maximum combinational path delay of 13.469 ns.
REFERENCES
[1] Josh Milthorpe and Alistair Rendell, “Learning to live with errors: A fresh look at floating-point computation”, Australian National University, Computing Conference 2005.
[2] Ruchir Gupte, “Interval arithmetic logic unit for DSP and control applications”, Electrical and Computer Engineering, Raleigh, 2006.
[3] Samir Palnitkar, “Verilog HDL: A Guide to Digital Design and Synthesis”, ISBN 81-297-0092-1, © 2003 Sun Microsystems.
[4] “IEEE Standard 754 for Binary Floating-Point Arithmetic”, ANSI/IEEE Standard No. 754, American National Standards Institute, Washington DC, 1985.
[5] Behrooz Parhami, “Computer Arithmetic: Algorithms and Hardware Designs”, 2nd Edn, Oxford, 2011.
[6] C. N. Marimuthu and P. Thangaraj, “Low Power High Performance Multiplier”, Anna University, Tamil Nadu, India, ICGST-PDCS, Volume 8, Issue 1, December 2008.
[7] Michael J. Schulte and Earl E. Swartzlander Jr., “A Performance Comparison Study on Multiplier Designs”, IEEE Transactions on Computers, May 2000.
[8] Yong Dou, S. Vassiliadis, G. K. Kuzmanov and G. N. Gaydadjiev, “64-bit Floating-Point FPGA Matrix Multiplication”, National Laboratory for Computer Engineering, FPGA’05, Monterey, California, USA, February 2005.
[9] Anane Nadjia, Anane Mohamed, Bessalah Hamid, Issad Mohamed and Messaoudi Khadidja, “Hardware Algorithm for Variable Precision Multiplication on FPGA”, © 2009 IEEE.
[10] James E. Stine and Michael J. Schulte, “A Combined Interval and Floating Point Multiplier”, Computer Architecture and Arithmetic Laboratory, Electrical Engineering and Computer Science Department, Lehigh University, Bethlehem, PA 18015.
[11] SPARC Architecture Manual.
[12] Prof. Loh, CS3220 Processor Design, “Carry-Save Addition”, Spring, February 2005.
[13] “Carry Save Adder Trees in Multipliers”, ECEN 6263 Advanced VLSI Design, November 3, 1999.
[14] C. N. Marimuthu and P. Thangaraj, “Low Power High Performance Multiplier”, Anna University, Tamil Nadu, India.
[15] Steve Kilts, “Advanced FPGA Design: Architecture, Implementation, and Optimization”, Wiley-Interscience, John Wiley & Sons, ISBN 978-0-470-05437-6, © 2007 IEEE.
[16] P. Assady, “A New Multiplication Algorithm Using High-Speed Counters”, Islamic Azad University, Varameen Branch, Iran, © EuroJournals Publishing, Inc. 2009.