Microprocessors and Microsystems 72 (2019) 102930
Low power single precision BCD floating-point Vedic multiplier

V. Ramya a,∗, R. Seshasayanan b

a Department of Electronics and Communication Engineering, Anna University, India
b Department of Electronics and Communication Engineering, Meenakshi College of Engineering, Chennai, India
Article info
Article history:
Received 16 March 2019
Revised 18 October 2019
Accepted 24 October 2019
Available online 26 October 2019
Keywords:
Binary floating-point multiplier (BFPM)
BCD floating point multiplier (BCD-FPM)
Urdhva-Tiryakbhyam (UT) sutra
Kogge stone adder (KSA)
Binary to BCD converter (B2BCD)
BCD to binary converter (BCD2B)
Abstract
In this paper, the Binary coded decimal floating-point multiplier (BCD-FPM) and Binary floating-point
multiplier (BFPM) with binary to BCD (B2BCD) converter are proposed using Urdhva-Tiryakbhyam (UT)
sutra. Two methods are proposed for BCD-FPM and comparison is made between BCD-FPM and BFPM
with B2BCD converter. The designs are modelled in Verilog HDL and synthesized based on the 90nm
standard cell library in Cadence EDA Tool. Comparisons are based on the synthesis report generated by
the Cadence RTL Compiler and implemented in the Encounter RTL-to-GDSII system. The results show that the BCD-FPM has better performance in terms of delay and power. The power for Method II is reduced by 59.47% and 73.40% when compared with Method I and the BFPM with B2BCD converter, respectively. The delay for Method II is reduced by 6.9% relative to Method I and by 30.37% relative to the BFPM with B2BCD converter. A pipelined architecture is designed for Method II, as it is more efficient than the other multipliers; its delay is reduced by 65.82% after pipelining.
© 2019 Published by Elsevier B.V.
1. Introduction
The multiplier is a major unit in the arithmetic and logic unit. Since it consumes considerable power and area, there is a need for multipliers that are efficient in terms of area, power, and latency. Floating-point arithmetic is commonly used in most digital signal processors. Large errors cannot be tolerated in application-oriented sectors such as banking, commerce, science, accounting, insurance, and other user-related functions. Binary arithmetic is widely used in digital circuits to perform arithmetic and logic operations owing to its simple numerical properties and easy implementation in digital systems. Fixed-point arithmetic tends to lose accuracy in user-oriented applications, and eventually the error can reach a peak. To eliminate this error and increase the range of representation, the floating-point technique is used. Fixed-point multiplication can suffer from truncation, which degrades precision. The truncation error is reduced by slightly modifying the partial product of Booth multiplication, and an error-compensation technique is implemented [1,21,23,24]. A modified Booth multiplier is used for the multiplication process, reducing the mean square error to 12.3% and 6.3% for 16-bit and 8-bit fixed-point multiplication, respectively [1,2]. The error characteristics are studied in [3], which states
that the error can be further reduced when the approximate modified Booth multiplication technique is applied. Two techniques are implemented using MBE along with a Wallace tree, which reduce the error probability to 12.5% and 25%, respectively.

∗ Corresponding author. E-mail address: ramya.viswalingam@gmail.com (V. Ramya).
https://doi.org/10.1016/j.micpro.2019.102930
0141-9331/© 2019 Published by Elsevier B.V.

Fixed-point multiplication has
3 stages: partial product generation, partial product reduction, and
result. The latency is predominantly evident in the second stage.
The Vedic technique can efficiently reduce the delay when compared to the modified Booth algorithm [4]. The partial product and
their sum are produced in a single step by using a Vedic multiplier [5,14]. The Q15 and Q31 [5] format multipliers are proposed
using 8×8 and 16×16 Urdhva-Tiryakbhyam (UT) multipliers. Larger values cannot be processed using fixed point, which may lead to inaccuracy. Recently, floating-point arithmetic has been considered in many research areas. A 32-bit binary Vedic multiplier is designed and simulated in Xilinx ISE 13.4; compared with it, the conventional binary multiplier uses more LUTs and I/Os and has comparatively more delay [6]. Further, 4-bit and 8-bit Vedic multipliers
are proposed using Urdhva-Tiryakbhyam (UT) sutra [7,15,16]. This
architecture is realized in 45nm CMOS technology in the Cadence
EDA tool and it is conceived that the proposed designs are efficient
in terms of power, area, and speed. The Vedic multiplier in combination with the Kogge stone adder has been implemented [8] and
it is shown to be the fastest. The Vedic sutra along with 4:2 and 7:2 compressors for addition is explained in [9,20] and compared with the conventional multiplier; the results show that the implemented design is efficient in terms of area and delay. A single- and double-precision floating-point multiplier [10] using the UT sutra with a carry-save adder achieves higher speed than the conventional multiplier. The single-precision floating point [18,19] is discussed using an RCA and it is
compared with a different floating-point multiplier. The design is
modelled in Verilog HDL and designed based on the TSMC 180nm
standard cell library. The comparison shows that the Vedic multiplier reduces power by 26% compared with the modified Booth multiplier. Different adders for the binary floating-point multiplier using
Vedic sutra have been studied and the prefix Sklansky adder has
better performance [22].
Decimal multiplication is preferred in many real-time applications [13,25]. Not every decimal number can be exactly represented in binary with a finite number of bits; it is either rounded or truncated. Many applications cannot tolerate the errors that result from the conversion between binary and decimal formats. Thus, to overcome these problems, binary-coded decimal is used. BCD multiplication is carried out for fixed point in [11], where the proposed partial-product-generator architecture is claimed to save 30% of the area. In [17], the Vedic sutra is utilized for multiplication after converting two 4-bit binary numbers to BCD, and the resultant binary product is converted back to BCD using a binary-to-BCD converter. A 32-digit binary-coded-decimal multiplier is proposed using a novel binary counter for addition, a BCD full adder and a binary-to-BCD converter [12].
This paper is organized as follows. In Section 2, the binary
floating-point format and the architecture are briefly reviewed. The
study on Vedic maths is presented in Section 3. Section 4 describes the proposed BCD floating-point multiplier. The pipelined
BCD-FPM is discussed in Section 5. In Section 6 the comparison
and the performance metrics analysis are discussed. The conclusion is described in Section 7 and the future work is described in
Section 8.
Table 1
Floating-point format.

Precision                     Sign   Exponent   Mantissa
Single precision (32 bits)    1      8          23+1 (implicit bit)
Double precision (64 bits)    1      11         52+1 (implicit bit)
2. Binary floating-point representation
Fig. 1. Binary floating-point architecture.
A number is generally represented as a fixed number of significant digits and scaled using an exponent in some fixed base. A fixed-point operation can produce a result whose width exceeds that of the operands; in the case of multiplication, the result width is the sum of the operand widths. To fit the result into the same size as the operands, truncation and rounding come into the picture. In this case there is a possibility of information loss, which leads to accuracy issues. The problem is that fixed point is prone to loss of precision when large numbers are evaluated. The other major issue is that integer fixed point is tedious to use in a processor due to overflow conditions.
In a floating-point number, the decimal point can be shifted to the right or left of a fixed position. Compared with a fixed-point number, floating point can represent both very large and very small numbers, thereby expanding the range of representation. A floating-point number is typically represented by a sign (S), an exponent (e) and a mantissa (M). The standard format for a floating-point number is depicted as

(−1)^S × b^e × M        (1)
The IEEE standard for floating point arithmetic (IEEE754) was
established in 1985 by the Institute of Electrical and Electronics Engineers for floating-point computation. Half, single, double, extended and quad are the precisions formulated by the IEEE 754
standard. Out of these the single precision and double precision
are dominantly used. Table 1 shows the bit format of single and
double precision.
The single-precision binary floating-point number is represented in 32 bits, comprising the MSB as the sign bit, an 8-bit exponent and a 23-bit mantissa. For double precision, the MSB is the sign bit, the exponent is 11 bits and the mantissa is 52 bits. An implicit leading bit extends the mantissa to 24 and 53 bits, respectively. The standard format for representing a single-precision floating-point number is

(−1)^s × 2^E × b0.b1 b2 b3 … b(p−1)        (2)
The fractional part and the exponent are given as

f = b0.b1 b2 b3 … b(p−1),   e = E + 127        (3)
Fig. 1 gives the architecture of BFPM. The multiplication process involves the computation of sign, exponent and the mantissa
part. The sign bit is expressed as 0 if the number is positive and 1
if the number is negative. XOR operation is carried out to get the
MSB bit which is the resultant sign bit. The exponent term is manipulated using the Kogge stone adder. The 8-bit exponent of both
the operands is summed up and it is biased to 127 for single precision. The mantissa part is computed using the Vedic multiplication
technique.
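The field-wise flow just described (sign XOR, biased exponent addition, mantissa multiplication and normalization) can be modeled in software. The following Python sketch is illustrative only: it uses an ordinary integer product in place of the UT-sutra mantissa multiplier, truncates rather than rounds, and ignores the exceptional cases of Table 2; the function names are ours, not the paper's.

```python
import struct

def fields(x: float):
    """Unpack an IEEE 754 single-precision number into (sign, exponent, mantissa)."""
    w = struct.unpack('>I', struct.pack('>f', x))[0]
    return w >> 31, (w >> 23) & 0xFF, w & 0x7FFFFF

def fp32_mul(a: float, b: float) -> float:
    """Field-wise single-precision multiply: XOR of signs, biased exponent
    addition, 24x24-bit mantissa product, then normalization.
    Truncating model; overflow/underflow/NaN handling is omitted."""
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    s = sa ^ sb                                  # resultant sign bit
    m = ((1 << 23) | ma) * ((1 << 23) | mb)      # 48-bit product of 24-bit mantissas
    e = ea + eb - 127                            # remove the duplicated bias
    if m & (1 << 47):                            # leading bit high: shift, bump exponent
        mant, e = (m >> 24) & 0x7FFFFF, e + 1
    else:                                        # leading bit low: take from (n-2)
        mant = (m >> 23) & 0x7FFFFF
    w = (s << 31) | (e << 23) | mant
    return struct.unpack('>f', struct.pack('>I', w))[0]

print(fp32_mul(-19.0, 9.5))   # -180.5, matching the Table 3 example
```

The -19.0 × 9.5 case reproduces the simulated BFPM result of Table 3 (sign 1, exponent 10000110, mantissa 01101001…).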
The Urdhva-Tiryakbhyam sutra is used for the calculation of the mantissa, which results in a 48-bit-wide product, as both operands are 24 bits long. As the result is twice the length of the operands, normalization is performed by eliminating the leading one in the mantissa region and adjusting the exponent accordingly. The exponent is increased by one if the leading bit of the mantissa is high, and the mantissa is expressed as the 23 bits starting from the succeeding bit. If the foremost bit of the mantissa is zero, the mantissa is taken from the (n−2) bit position and the exponent is left unchanged. When the exponent is between 1 and 254, the number is either a positive or a negative floating-point number, as determined by the MSB of the result. The binary result is then converted to BCD using a B2BCD converter. The value after the decimal point cannot always be inferred correctly: the conversion from binary to decimal introduces an error,
and when larger numbers are used these errors can further increase. To avoid these kinds of errors, the BCD-FPM is proposed. The flow of the binary floating-point operation is shown in Fig. 2.

Fig. 2. Flowchart for binary floating point.

The IEEE 754 standard provides for special cases such as overflow and underflow conditions. When the exponent is too large to be represented in the exponent field, the checker unit indicates an overflow condition; when the negative exponent becomes too large, it indicates an underflow condition. The checker unit checks for five exceptional cases, namely zero, negative zero, positive infinity, negative infinity and Not-a-Number (NaN). The different cases are summarized in Table 2. A number is zero if every bit in its representation is zero. NaN is a value that does not make sense, such as a non-real number or the result of an operation like infinity times zero. A number is infinite if all bits of the exponent are 1 and the mantissa is zero; positive and negative infinity are distinguished by the sign bit. Any non-zero number that is smaller than the smallest normal number is a denormalized number. The production of a denormalized number is sometimes called gradual underflow, because it allows a calculation to lose precision slowly when the result is small.

Table 2
Exceptional cases in the IEEE 754 standard.

Sign   Exponent   Mantissa   Representation
0/1    1-254      Anything   Positive/negative floating point
0/1    0          0          Positive/negative zero
0/1    0          Non-zero   Denormalized number
0/1    255        0          Positive/negative infinity
0/1    255        Non-zero   Not-a-Number

2.1. Algorithm for binary floating-point multiplier

Input: two binary floating-point numbers A and B
Binary output: C
BCD output: D

SINGLE PRECISION:
M[47:0] ← {implicit bit, A[22:0]} ∗ {implicit bit, B[22:0]}
C[22:0] ← normalized mantissa
C[31] ← A[31] ^ B[31]
C[30:23] ← A[30:23] + B[30:23] − 127 (bias)
D ← B2BCD(C)

DOUBLE PRECISION:
M[103:0] ← {implicit bit, A[51:0]} ∗ {implicit bit, B[51:0]}
C[51:0] ← normalized mantissa
C[63] ← A[63] ^ B[63]
C[62:52] ← A[62:52] + B[62:52] − 1023 (bias)
D ← B2BCD(C)

2.2. Simulation result

The simulation is performed in Cadence SimVision for the BFPM. The applied inputs and the corresponding output for the BFPM are listed in Table 3, and the corresponding waveform is given in Fig. 3.

3. Vedic maths

Vedic mathematics is an ancient mathematical technique that was rediscovered from the Vedas (1911-1918) by Sri Bharati Krishna Tirthaji Maharaj (1884-1960). Vedic is a Sanskrit word derived from 'Vedas', which means 'knowledge'. Regular mathematical methods consume more time and are sometimes complex in operation. Vedic mathematics is a collection of sutras used to solve arithmetic calculations in a simple, efficient and fast way. It has 16 sutras, or aphorisms, and 13 sub-sutras [27]. Of these, 3 sutras and 2 sub-sutras are used for multiplication, as listed below:

1. Urdhva-Tiryakbhyam
2. Nikhilam Navatashcaramam Dashatah
3. Anurupvena
4. Ekanyunena Purvena
5. Antyavordasake'pi
Among all these methods, Urdhva-Tiryakbhyam (vertical and crosswise) is the universally adopted one, as it is suitable for both binary and decimal number systems. The Nikhilam sutra specifies the subtraction of a number from the nearest power of 10; it is not suitable as a general method because at least one operand has to be near a power of 10. The Anurupvena sutra is another Vedic multiplication trick for when both numbers are not close to a power of 10 but are close to multiples of 10 and close to each
Table 3
Inputs and output in the simulated result.

BFPM               S   E          M
Input A (-19.0)    1   10000011   00110000000000000000000
Input B (9.5)      0   10000010   00110000000000000000000
Output (-180.5)    1   10000110   01101001000000000000000
Fig. 3. BFPM waveform.
other. Ekanyunena Purvena is applicable only when the multiplier consists entirely of 9s, so it is not suitable for all types of numbers. Antyavordasake'pi can be applied when the last digits of the two numbers total 10. Except for Urdhva-Tiryakbhyam, all the other sutras are specific multiplication methods: they can be applied only when the numbers satisfy certain conditions, such as both numbers being close to a power of 10, the numbers being close to each other, or the last digits of both numbers summing to 10. The multiplication sutra used for the BCD-FPM and BFPM is Urdhva-Tiryakbhyam, as it is a general method that suits all types of numbers. Table 4 defines the conditions for the different sutras used for multiplication.
Fig. 4. Line diagram for 3 bit UT sutra.
3.1. Urdhva-Tiryakbhyam
The Vedic multiplier technique employed is Urdhva-Tiryakbhyam, which originates from Sanskrit words meaning "vertical" and "crosswise". This method is preferable as it can be applied to all types of numbers. The major advantage of the UT sutra is that all the partial products are generated concurrently. This
multiplication technique is faster and efficient when compared
with conventional multipliers [18]. The line diagram for two
3-bits; a2a1a0 and b2b1b0 using UT sutra is given in Fig. 4. The
final result of the 3-bit multiplier is c4s4s3s2s1s0. Fig. 5 gives the
example of UT sutra using two decimal digits.
Considering the numbers A = a2 a1 a0 and B = b2 b1 b0:

s0 = a0 b0
c1 s1 = a1 b0 + a0 b1
c2 s2 = c1 + a1 b1 + a0 b2 + a2 b0
c3 s3 = c2 + a1 b2 + a2 b1
c4 s4 = c3 + a2 b2        (4)

Fig. 5. Example for UT sutra.
The algorithm for UT sutra is given as follows (for 3 bit),
Step 1: Arrange the numbers vertically. For the two unequal
operands in terms of number of digits, prefix the lesser digit
operand with zeros until it becomes equal to the number of
digits in another operand.
Step 2: Consider the vertical column from left to right. Vertical
multiplication is done for the leftmost column.
Table 4
Vedic sutras applicable for multiplication.

Name                                Meaning                                  Conditions
Nikhilam Navatashcaramam Dashatah   All from 9 and the last from 10          Applicable when the number is near a power of 10
Urdhva Tiryakbhyam                  Vertically and crosswise                 Applicable to all types of numbers
Anurupye Shunyamanyat               If one is in ratio, the other is zero    Applicable when the numbers are close to multiples of 10
Ekanyunena Purvena                  By one less than the previous one        Applicable when the digits are all 9s
Antyavordasake'pi                   Last totaling 10                         Applicable when the last digits of both numbers sum to 10
Fig. 6. BCD Floating point format.
Step 3: Crosswise multiplication is carried out for the first two
columns from the left and it is summed up.
Step 4: Vertical multiplication is done for the centre column, crosswise multiplication is performed for the remaining two columns, and the results are aggregated.
Step 5: Crosswise multiplication is performed for the two
columns from the right and the result is summed up.
Step 6: Vertical multiplication is done for the rightmost column.
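The steps above, applied to decimal operands, follow the same vertical-and-crosswise pattern as Eq. (4), with ordinary digit carries rippling between columns. A small Python sketch of the 3-digit case (an illustration of the algorithm, not the hardware; the function name is ours):

```python
def ut_multiply_3digit(a: int, b: int) -> int:
    """Urdhva-Tiryakbhyam product of two 3-digit decimal numbers.
    All column cross-products can be formed concurrently; each column
    then adds the carry coming in from the column to its right."""
    a2, a1, a0 = a // 100, (a // 10) % 10, a % 10
    b2, b1, b0 = b // 100, (b // 10) % 10, b % 10
    c1, s0 = divmod(a0 * b0, 10)                           # vertical, rightmost column
    c2, s1 = divmod(c1 + a1 * b0 + a0 * b1, 10)            # crosswise, two right columns
    c3, s2 = divmod(c2 + a1 * b1 + a0 * b2 + a2 * b0, 10)  # centre column
    c4, s3 = divmod(c3 + a1 * b2 + a2 * b1, 10)            # crosswise, two left columns
    top = c4 + a2 * b2                                     # vertical, leftmost column
    return top * 10**4 + s3 * 1000 + s2 * 100 + s1 * 10 + s0

print(ut_multiply_3digit(123, 456))  # 56088, equal to 123 * 456
```

The same pattern extends to the binary 24×24-bit mantissa multiplier, where each "digit" is a single bit.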
4. Proposed method
Many applications use BCD floating point format as there is a
need for high precision. BCD float can be used with float (32-bits),
double (48-bits) and long double (56-bits). Each digit in BCD is
represented by a fixed number of bits, commonly 4 or 8 bits. Usually, BCD can be represented in packed and unpacked formats. In
packed format, each digit is represented in 4-bits (i.e. 9 = 1001)
but in case of unpacked 1 byte (i.e. 9 = 0 0 0 010 0 0) is needed to
store a single digit. As a result, there is wastage of space in the
case of unpacked which is not preferable. BCD floating point format for single precision [26] is illustrated in Fig. 6.
The MSB is the sign bit, as in the binary floating-point representation. M represents the 24-bit mantissa value in BCD. E represents the binary-coded-decimal exponent value, which is 6 bits wide. N indicates that the given number is BCD. Multiplication of BCD floating point can be done in two ways. In the first, the BCD mantissa is converted to binary and processed as a binary number, which is converted back to BCD at the end. In the second, the BCD mantissa is kept as such and the computation is carried out directly; but the multiplication of two BCD numbers results in a binary number, which has to be converted to BCD.
4.1. Method I
In this method, the input is given as a 32-bit BCD word composed of the MSB as sign, followed by the bit which indicates that the number is BCD. The next 6 bits express the exponent, followed by the mantissa. Here the BCD M and E are not converted to binary; instead, computation is done on the BCD values directly. The product of any two BCD numbers is binary, and the same is true for their sum, so the result is converted to BCD using a binary-to-BCD converter (B2BCD). The sign bit is calculated by performing XOR on the MSB bits. The adder used for exponent addition is a Kogge-Stone adder. The architecture proposed for Method I is portrayed in Fig. 7.
4.1.1. Simulation result
The simulation is performed in Cadence SimVision for the BCD-FPM (Method I). The applied inputs and the corresponding output for Method I are listed in Table 5, and the corresponding waveform is given in Fig. 8.
4.2. Method II
The 32-bit BCD format is applied as input to the multiplier. The 24-bit mantissa and 6-bit exponent are converted to binary using the BCD2B converter. A Kogge-Stone adder is used for the summation of the binary exponents, as it reduces the latency and power
Fig. 7. Architecture for Method I.
Table 5
Inputs and output in the simulated result.

Method I & II       S   N   E        M
A (19.5)            0   1   001000   000110010101000000000000
B (-82.5)           1   1   001000   100000100101000000000000
Output (-1608.75)   1   1   010110   000101100000100001110101
Table 6
Binary to BCD conversion.

10's   1's    Binary   Operation
0000   0000   101000   -
0000   0001   01000    Shift left
0000   0010   1000     Shift left
0000   0101   000      Shift left
0000   1000   000      Add 3
0001   0000   00       Shift left
0010   0000   0        Shift left
0100   0000   -        Shift left
consumption. It generates carries in O(log n) time, is considered among the fastest adders, and is widely used to achieve high performance in arithmetic circuits. The Kogge-Stone adder has three stages: pre-processing, the carry look-ahead network and post-processing. The pre-processing stage involves the computation of
generate and propagate signal corresponding to each bit of both
operands. The second stage involves the computation of carries
corresponding to each bit and the final stage is the computation
of sum bits.
p_i = A_i ^ B_i
g_i = A_i & B_i
P_{i:j} = P_{i:k+1} & P_{k:j}
G_{i:j} = G_{i:k+1} | (P_{i:k+1} & G_{k:j})
S_i = p_i ^ C_{i−1}        (5)
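A behavioral Python sketch of the three stages in Eq. (5) follows (illustrative only; signal names and the bit width parameter are ours, and a real Kogge-Stone network computes all prefix levels in parallel rather than in a software loop):

```python
def kogge_stone_add(a: int, b: int, n: int = 8) -> int:
    """n-bit Kogge-Stone addition: per-bit propagate/generate,
    a log2(n)-level parallel-prefix carry network, then sum bits."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # p_i = A_i xor B_i
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # g_i = A_i and B_i
    P, G = p[:], g[:]
    d = 1
    while d < n:                         # carry look-ahead network, doubling span
        P2, G2 = P[:], G[:]
        for i in range(d, n):
            G2[i] = G[i] | (P[i] & G[i - d])   # G_{i:j} = G_{i:k+1} | P_{i:k+1} G_{k:j}
            P2[i] = P[i] & P[i - d]            # P_{i:j} = P_{i:k+1} P_{k:j}
        P, G = P2, G2
        d *= 2
    s = 0
    for i in range(n):                   # post-processing: S_i = p_i xor C_{i-1}
        c_in = G[i - 1] if i > 0 else 0  # carry into bit i (no external carry-in)
        s |= (p[i] ^ c_in) << i
    return s | (G[n - 1] << n)           # append the carry-out

print(kogge_stone_add(200, 100))  # 300
```

After the prefix network, G[i] is the group-generate of bits i..0, i.e. the carry into bit i+1, which is exactly what the post-processing stage consumes.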
The Vedic multiplier using the UT sutra is used for the calculation of the mantissa part. The obtained result is converted to BCD using the binary-to-BCD converter (B2BCD), and the mantissa is normalized after the conversion. The normalizer rounds the mantissa to a length of 24 bits by considering the range from the MSB. The architecture proposed for Method II is shown in Fig. 9. The sign of the product is decided by an XOR operation performed on the sign bits of the two operands. N is the one-bit value used to indicate that the given number is BCD. The final 32-bit result is in BCD format.
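As a back-of-the-envelope model of this data path, the sketch below multiplies the BCD significands of the Table 5 example. The conversion and multiplier blocks here are software stand-ins for the BCD2B, UT-multiplier and B2BCD hardware (the string/hex tricks are valid only for well-formed BCD digits 0-9), and the function names are ours:

```python
def pack_bcd(n: int) -> int:
    """Pack a non-negative decimal integer into 4-bit-per-digit packed BCD."""
    return int(str(n), 16)

def unpack_bcd(b: int) -> int:
    """Decode packed BCD back to an integer (assumes all digits are 0-9)."""
    return int(hex(b)[2:])

def method2_multiply(a_bcd: int, b_bcd: int) -> int:
    """Method II data path model: BCD operands are converted to binary
    (BCD2B), multiplied as plain binary integers (standing in for the
    UT-sutra multiplier), and the product is converted back (B2BCD)."""
    a = unpack_bcd(a_bcd)          # BCD2B converter
    b = unpack_bcd(b_bcd)
    product = a * b                # binary multiplier stage
    return pack_bcd(product)       # B2BCD converter

print(hex(method2_multiply(0x195, 0x825)))  # 0x160875: the digits of 19.5 x 82.5 = 1608.75
```

The 195 × 825 = 160875 digit string matches the output mantissa reported for Methods I and II in Table 5.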
Fig. 8. BCD-FPM waveform (Method I).
Fig. 9. Architecture for Method II.

Table 7
BCD to binary conversion.

Operation     10's   1's    Binary
-             0010   0100   -
Shift right   0001   0010   0
Shift right   0000   1001   00
Sub 3         0000   0110   00
Shift right   0000   0011   000
Shift right   0000   0001   1000
Shift right   0000   0000   11000
4.2.1. Simulation result

The input and output values applied for BCD multiplication (Method II) are given in Table 5, and its waveform is shown in Fig. 10.

4.3. Binary to BCD converter

The architecture for a 6-bit binary-to-BCD converter is given in Fig. 11.
The product term obtained after the mantissa calculation is converted to BCD digits. The technique used for conversion is Shift-and-Add-3. The binary and BCD values are the same for numbers up to 9; beyond that, 6 must be added to the number to obtain the corresponding BCD value. Checking the number and adding 6 whenever it exceeds 9 would produce 5-bit digits, so to avoid extra bits the Shift-and-Add-3 technique is used (Table 6). Shifting the number to the left multiplies it by 2; 3 is added beforehand whenever a 4-bit digit is equal to or greater than 5. The process continues until all the bits have been shifted left. For example, if 6 bits are considered, there will be a tens place and a ones place, each of 4 bits. The algorithm for Shift-and-Add-3 is as follows,
Algorithm for Shift-and-Add-3

Initialization
    Input: binary number A
Result
    Output: BCD number C
Process
    1. Clear all bits of C to zero.
    2. Shift A by one bit left.
    3. Check whether the shifted number is greater than 4:
       if (the shifted number is greater than 4)
           add 3 to the shifted number
       else
           go to 2.
    4. Repeat till all the bits are shifted.
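The algorithm above can be modeled directly in Python (a sketch of the shift-and-add-3, or "double dabble", conversion, not the synthesized converter; the digit-count heuristic is ours):

```python
def bin2bcd(value: int, nbits: int) -> int:
    """Shift-and-Add-3 binary-to-BCD conversion: before each left shift,
    any 4-bit BCD digit that is 5 or more has 3 added, so the doubled
    digit carries correctly into the next decade."""
    ndigits = len(str((1 << nbits) - 1))        # enough BCD digits for nbits
    bcd = 0
    for i in range(nbits - 1, -1, -1):
        for d in range(ndigits):                # adjust every 4-bit digit
            if (bcd >> (4 * d)) & 0xF >= 5:
                bcd += 3 << (4 * d)
        bcd = (bcd << 1) | ((value >> i) & 1)   # shift in the next binary bit
    return bcd

print(hex(bin2bcd(0b101000, 6)))  # 0x40 -> BCD digits 4,0 (decimal 40), as in Table 6
```

The 6-bit input 101000 reproduces the worked example of Table 6, where a single add-3 fires when the ones digit reaches 5.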
4.4. BCD to binary converter

The exponent and the mantissa are converted into binary in Method II, and the data processing is then carried out. The technique
preferred for BCD2B conversion is Shift-and-Sub-3 (Table 7). Here the bits are shifted towards the right, and each 4-bit digit is checked to see whether its value is greater than 7; if it is, 3 is subtracted from it. The architecture for the 6-bit BCD-to-binary converter is shown in Fig. 12.
Algorithm for Shift-and-Sub-3

Initialization
    Input: BCD number A
Result
    Output: binary number C
Process
    1. Clear all bits of C to zero.
    2. Shift A by one bit right.
    3. Check whether the shifted number is greater than 7:
       if (the shifted number is greater than 7)
           subtract 3 from the shifted number
       else
           go to 2.
    4. Repeat till all the bits are shifted.
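The reverse conversion can be modeled the same way (again an illustrative sketch; the digit-count parameter and names are ours, not the paper's):

```python
def bcd2bin(bcd: int, ndigits: int) -> int:
    """Shift-and-Sub-3 BCD-to-binary conversion: after each right shift,
    any 4-bit digit greater than 7 has 3 subtracted, undoing the add-3
    step of the forward (binary-to-BCD) conversion."""
    nbits = ndigits * 4              # result fits in at most 4 bits per digit
    out = 0
    for i in range(nbits):
        out |= (bcd & 1) << i        # shifted-out LSB becomes the next binary bit
        bcd >>= 1
        for d in range(ndigits):     # adjust every 4-bit digit
            if (bcd >> (4 * d)) & 0xF > 7:
                bcd -= 3 << (4 * d)
    return out

print(bcd2bin(0x24, 2))  # 24: reproduces the Table 7 example (BCD 0010 0100 -> 11000)
```

Running the BCD digits 2,4 through the loop fires exactly one Sub-3 (when the shifted ones digit becomes 1001), matching Table 7.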
When the two methods are compared, it is noted that Method II is more efficient than Method I in terms of power and delay. To further reduce the delay of Method II, a pipelined architecture is proposed.
5. Pipelined BCD-FPM architecture
The pipelining technique decomposes the function into consecutive sub-functions called stages. Each stage performs the specific
operation and produces the intermediate result. Pipelining exploits
parallelism by overlapping the execution process. The clock is connected to the latch and at each clock pulse, every stage transfers
Fig. 10. BCD-FPM waveform (Method II).
Fig. 11. 6 bit binary to BCD converter architecture.
Fig. 12. 6 bit BCD to binary converter architecture.
its intermediate result to the input latch of the next stage. The final result is obtained once the data have been passed through the
entire pipeline. The period of the clock pulse should be sufficient
for the data to traverse through each stage. The pipelining technique is used to improve the resource utilization and decrease the
delay thereby increasing the throughput. This can be achieved by
using latches between the logic blocks. A latch-based system gives
significantly more flexibility in implementing a pipelined system
and offers higher performance. The 3-stage pipeline is done so that
the latency approximately gets reduced to one third but the core
area and the power utilized get increased. The pipeline is done for
Method II so that the delay is further reduced. The pipeline for
Method II proposed in Fig 13.
Fig. 13. Pipelined architecture.
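The latch-and-stage behavior described above can be illustrated with a small Python model. It is purely behavioral: it mimics how latches pass intermediate results forward on each clock tick and how operands overlap in execution, not the actual Method II stages, and the function name is ours.

```python
def pipeline_run(stages, inputs):
    """Latch-based pipeline model: one loop iteration is one clock cycle.
    Each stage reads its input latch and writes its output latch; latches
    are updated back-to-front so they behave like clocked registers."""
    latches = [None] * (len(stages) + 1)         # latches[0] = input, [-1] = output
    results = []
    feed = list(inputs) + [None] * len(stages)   # extra cycles to drain the pipe
    for item in feed:
        for s in range(len(stages) - 1, -1, -1):
            latches[s + 1] = stages[s](latches[s]) if latches[s] is not None else None
        latches[0] = item                        # next operand enters the pipe
        if latches[-1] is not None:
            results.append(latches[-1])          # one result per cycle once full
    return results

stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(pipeline_run(stages, [1, 2, 3]))  # [1, 3, 5]: three overlapped computations
```

Once the pipe is full, a new result emerges every cycle even though each individual operand still traverses all three stages, which is exactly the throughput gain the pipelined BCD-FPM exploits.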
6. Results and discussions
6.1. Synthesis report
The proposed schemes are synthesized using the Cadence RTL Compiler and implemented in the Encounter RTL-to-GDSII system using 90 nm technology. Table 8 shows the synthesized results for the BFPM, Method I and Method II using the KSA without pipelining.

The delay for Method II is reduced by 6.9% and 30.37% when compared with Method I and the BFPM with B2BCD converter, respectively. The area increases for the proposed BCD multipliers relative to the BFPM as the gate and cell counts increase. When
Fig. 14. Overall comparison chart.
Fig. 15. Comparison between pipelined and non-pipelined BCD-FPM.
Fig. 16. Area chart.
Table 8
Comparison of synthesized results without pipelining.

Parameters             BFPM with B2BCD converter (90 nm)   Method I (90 nm)   Method II (90 nm)
Cells                  4697                                5317               8004
Area (μm²)             29829                               32693.5            53568.1
Leakage power (nW)     158970.79                           184986.22          303370.20
Switching power (nW)   1313332.38                          787357.05          88204.79
Total power (nW)       1472303.175                         966343.275         391574.990
Delay (ps)             21558                               16138              15009
Table 9
Comparison of synthesized results for pipelined and non-pipelined architecture.

Parameters             Method II with pipelining (90 nm)   Method II without pipelining (90 nm)
Cells                  10932                               8004
Area (μm²)             80969                               53568.1
Leakage power (nW)     485447.207                          303370.20
Switching power (nW)   254553.965                          88204.79
Total power (nW)       740001.172                          391574.990
Delay (ps)             5130                                15009
PDP (pJ)               3.324                               5.877149
Fig. 17. Power chart.
Method II is compared with Method I and the BFPM, the total power is reduced by 59.47% and 73.40%, respectively. As a result, the power-delay product (PDP) for Method II decreases by 62.31% relative to Method I and by 81.48% relative to the BFPM. Overall, Method II is more efficient than Method I for the BCD-FPM.

Table 9 shows the comparison between the non-pipelined and pipelined Method II. The pipelined result shows that the delay is reduced by 65.82% compared to the non-pipelined BCD-FPM, but the core area and the power consumption increase because latches are inserted between the logic stages.
Fig. 18. Delay chart.
6.2. Comparison metrics

The comparison metrics for the BFPM using RCA and KSA are shown in Table 10. The overall comparison is shown in Fig. 20.

(i) Area comparison: The area of the proposed BFPM architecture using RCA is reduced by 9.136% when compared to [19]. When the KSA is used instead of the RCA, the area is reduced by 7.8% compared to [19], though the proposed BFPM using KSA occupies 1.39% more area than the proposed BFPM using RCA. Fig. 16 shows the area comparison chart.

(ii) Power comparison: The BFPM architecture proposed using RCA reduces the power by 61.38% when compared to [19]. When the RCA is replaced by the KSA, the power decreases by 65.5% compared with [19]; the power of the proposed BFPM using KSA is lower than that of the proposed BFPM using RCA. The power chart is given in Fig. 17.

(iii) Delay comparison: The delay increases in the proposed BFPM architecture using RCA when compared with [19]. When the RCA is replaced by the KSA, the delay further increases by 4.3%. Fig. 18 gives the delay comparison chart.

(iv) PDP comparison: The PDP for the proposed BFPM architecture using RCA is reduced by 16.85% when compared to [19]. The PDP for the proposed BFPM architecture with KSA is reduced by 22.53% compared to [19]. When the RCA is replaced by the KSA in the proposed BFPM architecture, the PDP decreases by 6.83%. Fig. 19 shows the PDP chart.
Table 10
Comparison of performance metrics of BFPM.

                                                         Delay (ns)   Area (μm²)   P.D (mW)   PDP (pJ)
(1) BFPM using RCA (180 nm) [18]                         8.0          71946        20.305     162.44
(2) BFPM using RCA (180 nm) [19]                         9.2          65109        15.024     138.22
(3) Proposed BFPM using RCA (binary result) (180 nm)     19.807       59160        5.802      114.92
(4) Proposed BFPM using KSA (binary result) (180 nm)     20.659       59995        5.183      107.07
7. Conclusion

A low-power and delay-efficient BCD floating-point multiplier (BCD-FPM) for single precision is designed using the UT sutra. Two architectures have been proposed for the BCD-FPM: Method I and Method II. To compare the results, a BFPM is also designed using the KSA. The results show that the BCD-FPM Method II outperforms the BFPM in terms of power by 73.41% and delay by 30.37%, and the BCD-FPM Method I in terms of power by 59.48% and delay by 6.9%. To further enhance the performance, BCD-FPM Method II is pipelined. Though the area increases for the pipelined structure, the power-delay product improves by 43.44% when compared with the architecture without pipelining. Thus, it can be concluded that the pipelined BCD-FPM Method II has better performance metrics than the BCD-FPM Method I and the BFPM.

8. Future work

The proposed BCD-FPM can be implemented in arithmetic unit designs, and the KSA in the architecture can be replaced by Vedic adders. The multiplier can also be extended to double precision.

Declaration of Competing Interest

None.

CRediT authorship contribution statement

V. Ramya: Conceptualization, Data curation, Writing - original draft, Writing - review & editing. R. Seshasayanan: Conceptualization, Data curation, Writing - original draft, Writing - review & editing.

Acknowledgment

The authors extend their sincere thanks to the Centre for Research, Anna University, Chennai, India, for supporting this research work under the Anna Centenary Research Fellowship (ACRF).

Fig. 19. PDP chart.
Fig. 20. Overall comparison chart.

References
[1] J.P. Wang, S.R. Kuang, S.C. Liang, High-accuracy fixed-width modified Booth
multipliers for lossy applications, IEEE Trans. Very Large Scale Integr. VLSI Syst.
19 (1) (2011) 52–60.
[2] Y.H. Seo, D.W. Kim, A new VLSI architecture of parallel multiplier–accumulator
based on the Radix-2 modified Booth algorithm, IEEE Trans. Very Large Scale
Integr. VLSI Syst. 18 (2) (2010) 201–208.
[3] W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, F. Lombardi, Design of approximate
radix-4 Booth multipliers for error-tolerant computing, IEEE Trans. Comput.
(2017).
[4] A. Mittal, A. Nandi, D. Yadav, Comparative study of 16-order FIR filter design using different multiplication techniques, IET Circuits Devices Syst. (2017).
[5] M. Ashwath, B.S. Premananda, The signed fixed-point multiplier for DSP using
the vertically and crosswise algorithm, in: Computing, Communications and
Networking Technologies (ICCCNT), 2013 Fourth International Conference on,
IEEE, 2013, July, pp. 1–6.
[6] A. Bisoyi, M. Baral, M.K. Senapati, Comparison of a 32-bit Vedic multiplier with
a conventional binary multiplier, in: Advanced Communication Control and
Computing Technologies (ICACCCT), 2014 International Conference on, IEEE,
2014, May, pp. 1757–1760.
[7] S. Tripathy, L.B. Omprakash, S.K. Mandal, B.S. Patro, Low power multiplier architectures using Vedic mathematics in 45nm technology for high-speed computing, in: Communication, Information & Computing Technology (ICCICT),
2015 International Conference on, IEEE, 2015, January, pp. 1–6.
[8] R. Anjana, B. Abishna, M.S. Harshitha, E. Abhishek, V. Ravichandra, M.S. Suma,
Implementation of Vedic multiplier using Kogge-stone adder, in: Embedded
Systems (ICES), 2014 International Conference on, IEEE, 2014, July, pp. 28–31.
[9] R. Gupta, R. Dhar, K.L. Baishnab, J. Mehedi, Design of high performance 8-bit
Vedic Multiplier using compressor, in: Advances in Engineering and Technology (ICAET), 2014 International Conference on, IEEE, 2014, May, pp. 1–5.
[10] S.S. Mahakalkar, S.L. Haridas, Design of high-performance IEEE754 floating
point multiplier using Vedic mathematics, in: Computational Intelligence and
Communication Networks (CICN), 2014 International Conference on, IEEE,
2014, November, pp. 985–988.
[11] G. Jaberipur, A. Kaivani, Binary-coded decimal digit multipliers, IET Comput.
Digital Tech. 1 (4) (2007) 377–381.
[12] S. Veeramachaneni, M.B. Srinivas, Novel high-speed architecture for 32-bit binary coded decimal (BCD) multiplier, in: Communications and Information Technologies, 2008. ISCIT 2008. International Symposium on, IEEE, 2008, October, pp. 543–546.
[13] S. Gonzalez-Navarro, C. Tsen, M.J. Schulte, Binary integer decimal-based floating-point multiplication, IEEE Trans. Comput. 62 (7) (2013) 1460–1466.
[14] S.S. Saokar, R.M. Banakar, S. Siddamal, High speed signed multiplier for digital
signal processing applications, in: Signal Processing, Computing and Control
(ISPCC), 2012 IEEE International Conference on, IEEE, 2012, March, pp. 1–6.
[15] M. Ramalatha, K.D. Dayalan, P. Dharani, S.D. Priya, High-speed energy efficient
ALU design using Vedic multiplication techniques, in: Advances in Computational Tools for Engineering Applications, 2009. ACTEA’09. International Conference on, IEEE, 2009, July, pp. 600–603.
[16] S. Patil, D.V. Manjunatha, D. Kiran, Design of speed and power efficient multipliers using Vedic mathematics with VLSI implementation, in: Advances
in Electronics, Computers, and Communications (ICAECC), 2014 International
Conference on, IEEE, 2014, October, pp. 1–6.
[17] A.K. Mehta, M. Gupta, V. Jain, S. Kumar, High-performance Vedic BCD multiplier and modified binary to BCD converter, in: India Conference (INDICON),
2013 Annual IEEE, IEEE, 2013, December, pp. 1–6.
[18] B. Sharma, R. Mishra, Comparison of single precision floating point multiplier using different multiplication algorithm, Int. J. Electr. Electron. Data Commun. 3 (2015) 106–109, ISSN 2320-2084.
[19] B. Sharma, A. Bakshi, Design and implementation of an efficient single precision floating multiplier using Vedic multiplication.
[20] Y. Bansal, C. Madhu, A novel high-speed approach for 16 × 16 Vedic multiplication with compressor adders, Comput. Electr. Eng. 49 (2016) 39–49.
[21] S. Anjana, C. Pradeep, P. Samuel, Synthesize of high-speed floating-point
multipliers based on Vedic mathematics, Procedia Comput. Sci. 46 (2015)
1294–1302.
[22] K.V. Gowreesrinivas, P. Samundiswary, Comparative study on the performance
of a single precision floating point multiplier using Vedic multiplier and different types of adders, in: Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2016 International Conference on, IEEE, 2016,
December, pp. 466–471.
[23] S. Havaldar, K.S. Gurumurthy, Design of Vedic IEEE 754 floating point multiplier, in: Recent Trends in Electronics, Information & Communication Technology (RTEICT), IEEE International Conference on, IEEE, 2016, May, pp. 1131–1135.
[24] A. Jais, P. Palsodkar, Design and implementation of a 64-bit multiplier using
the Vedic algorithm, in: Communication and Signal Processing (ICCSP), 2016
International Conference on, IEEE, 2016, April, pp. 0775–0779.
[25] A. Vazquez, E. Antelo, P. Montuschi, Improved design of high-performance parallel decimal multipliers, IEEE Trans. Comput. 59 (5) (2010) 679–693.
[26] http://www.fsinc.com/reference/html/com9anm.htm.
[27] Jagadguru Swami Sri Bharati Krisna Tirthaji Maharaja, Vedic Mathematics Sixteen Simple Mathematical Formulae from the Veda, 1965.
V. Ramya received the B.Tech. degree in Electronics and Communication Engineering from Sri Manakula Vinayagar Engineering College, Pondicherry University, Puducherry, in 2015, and the M.E. degree in VLSI Design from the College of Engineering Guindy, Anna University, Chennai, in 2017. She is currently pursuing the Ph.D. in VLSI Design at the Department of Electronics and Communication Engineering, College of Engineering Guindy, Anna University, Chennai. She receives the Anna Centenary Research Fellowship from the Centre for Research, Anna University, Chennai. Her areas of interest include digital circuit design and VLSI design.
R. Seshasayanan received his M.E. degree and Ph.D. from Anna University, India, in 1983 and 2008, respectively. He is presently working as an Associate Professor in the Department of Electronics and Communication, Anna University, India. His areas of interest include MIMO system architecture, modulation and coding, multi-user communication techniques, and reconfigurable architectures.