Novel, High-Speed 16-Digit BCD Adders Conforming to IEEE 754r

Novel, High-Speed 16-Digit BCD Adders Conforming to IEEE 754r Format
Sreehari Veeramachaneni, M.Kirthi Krishna, Lingamneni Avinash, Sreekanth Reddy P, M.B. Srinivas
Centre for VLSI and Embedded System Technologies (CVEST)
International Institute of Information Technology
Hyderabad, India.
srihari@research.iiit.ac.in, {kirthikrishna, avinashl, sreekanthp}@students.iiit.ac.in, srinivas@iiit.ac.in
IEEE Computer Society Annual Symposium on VLSI(ISVLSI'07)
0-7695-2896-1/07 $20.00 © 2007
Abstract
In view of the increasing prominence of commercial,
financial and internet-based applications that process
data in decimal format, there is a renewed interest in
providing hardware support to handle decimal data. In
this paper, a new architecture for efficient 1-digit decimal
addition of binary coded decimal (BCD) operands, which
is the core of high speed multi-operand adders and
floating decimal-point arithmetic, is proposed. Based on
this 1-digit BCD adder, novel architectures for higher
order (n-digit) BCD adders such as ripple carry adder
and carry look-ahead adder are derived. The proposed
circuits are compared (both qualitatively and
quantitatively) with the existing circuits in the literature
and are shown to perform better. Simulation results show
that the proposed 1-digit BCD adder achieves an
improvement of 40% in delay. The 16-digit BCD
look-ahead adder using prefix logic is shown to perform at
least 80% faster than the existing ripple carry one.
1. Introduction
Due to the growing importance of decimal arithmetic in
commercial, financial and internet-based applications,
which cannot tolerate errors of conversion between binary
and decimal formats, hardware support for decimal
arithmetic is receiving increased attention. Recently,
specifications for decimal floating point arithmetic have
been added to the draft revision of the IEEE-754r
standard for floating point arithmetic [1]. Despite the
widespread use of binary arithmetic, decimal computation
remains essential for many applications. Not only is it
required whenever numbers are presented for human
inspection, but it is also often a necessity when fractions
are involved. Decimal fractions are pervasive in human
endeavors, yet most cannot be represented by binary
fractions. The value 0.1, for example, requires an
infinitely recurring binary number. If a binary
approximation is used instead of an exact decimal
fraction, results can be incorrect even if subsequent
arithmetic is correct [2].
It is anticipated that once the IEEE-754r standard is
finally approved, hardware support for decimal floating
point arithmetic will be incorporated on processors for
various applications. Still, the major consideration while
implementing BCD arithmetic will be to enhance its
speed as much as possible, which is addressed in
this paper.
This paper introduces and analyses various techniques
for high speed addition of higher order BCD numbers,
which form the core of other arithmetic operations such
as multi-operand addition [5, 9], multiplication [6] and
division [7]. A novel architecture for 1-digit BCD
addition is proposed, based on which architectures for
higher order adders such as ripple carry adder and carry
look-ahead adder are derived.
The rest of the paper is organized as follows: Section 2
provides a brief mathematical background of BCD while
section 3 describes the proposed algorithm for BCD
addition. The proposed circuit for 1-digit BCD addition is
given in section 4. In section 5, novel architectures for
higher order BCD adders such as ripple carry adder and
carry look-ahead adder are presented. Simulation results
for the proposed and existing circuits are given in section
6 and discussed in detail.
2. BCD Arithmetic – A Quick Overview
BCD is a decimal representation of a number directly
coded in binary, digit by digit. For example, the number
(9527)10 = (1001 0101 0010 0111)BCD. It can be seen that
each digit of the decimal number is coded in binary and
then concatenated to form the BCD representation of the
decimal number.
To use this representation all the arithmetic and logical
operations need to be defined. As the decimal number
system contains 10 digits, at least 4 bits are needed to
represent a BCD digit. Considering a decimal digit A, the
BCD representation is given by A4A3A2A1 where
all Ak ∈ {0, 1}. The only point of note is that the
maximum value that can be represented by a BCD digit is
9. The representation of (10)10 in BCD is (0001 0000).
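The digit-by-digit coding described above can be illustrated with a minimal Python sketch. The helper names `to_bcd` and `from_bcd` are hypothetical and are not part of the paper; the sketch only handles non-negative integers.

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer as space-separated 4-bit BCD digits."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # Each decimal digit is coded in binary on its own, then concatenated.
    return " ".join(format(int(d), "04b") for d in str(n))


def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit groups back to an integer."""
    digits = [int(group, 2) for group in bits.split()]
    if any(d > 9 for d in digits):
        raise ValueError("a group above 1001 is not a valid BCD digit")
    return int("".join(str(d) for d in digits))


print(to_bcd(9527))  # 1001 0101 0010 0111
print(to_bcd(10))    # 0001 0000
```

Note that the six codes 1010 to 1111 are never produced by the encoder, which is why a plain binary adder needs a correction step when it is applied to BCD digits.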
Addition in BCD can be explained by considering two
decimal digits A and B with BCD representations as
A4A3A2A1 and B4B3B2B1 respectively. In the conventional
algorithm, these two numbers are added using a 4-bit
binary adder. It is possible that the resultant sum can
exceed 9 which results in an overflow. If the sum is
greater than 9, the binary equivalent of 6 is added to the
resultant sum to obtain the exact BCD representation.
This can be illustrated with the following example:
A      = 0110 (6)
B      = 0101 (5)
Sum    = 1011 (11)
Add      0110 (6)
BCD    = 1 0001 (11 in BCD)
Answer = (0001 0001)
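The conventional add-then-correct procedure above can be sketched in a few lines of Python. This is a behavioural model only, not a gate-level description, and the function name `bcd_digit_add_conventional` is an assumption made for illustration.

```python
def bcd_digit_add_conventional(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Add two BCD digits (0..9) plus a carry-in; return (carry_out, sum_digit)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and cin in (0, 1)):
        raise ValueError("operands must be BCD digits and cin must be 0 or 1")
    raw = a + b + cin      # first 4-bit binary adder, including its carry-out
    overflow = raw > 9     # overflow detection
    if overflow:
        raw += 6           # correction stage: add 0110
    return int(overflow), raw & 0xF


carry, digit = bcd_digit_add_conventional(6, 5)
print(carry, format(digit, "04b"))  # 1 0001
```

The printed result reproduces the worked example: 6 + 5 gives 1011, the correction adds 0110, and the answer is (0001 0001).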
3. Proposed Algorithm for BCD Addition
The existing algorithm for addition of two BCD
digits performs many redundant calculations, leading to an
inefficient design. After overflow is detected, the entire
number 0110 is added to the resultant sum (S4S3S2S1),
which is implemented using an entire 4-bit binary adder.
But on careful observation, it can be seen that S1 is just
being added to a 0, which doesn’t require any extra
hardware. S2 just needs to be inverted as it is being added
to a 1, and it produces a carry when S2 is a 1. S3 is also
being added to a 1, which means that it needs to be
inverted only if there is no carry from S2, that is, only if
S2 is a 0. Using similar logic, S4 needs to be inverted only
if either of S3 or S2 is a 1.
Hence the correct sum can be selected by a set of
multiplexers with the select signal as the overflow bit.
Therefore in terms of hardware, instead of a complete
4-bit binary adder, a set of 2 multiplexers arranged in
parallel is needed to compute the corrected sum and
another 3 multiplexers to select the appropriate one. Also
the number of inverters can be minimized, as the inverted
output can be obtained by using the complement of the
actual output, which is generated in the CMOS
implementation of the multiplexer [8, 12] in the FA of the
first stage. The logical derivation of the overflow bit
which selects the appropriate output is shown below:
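If the resultant sum from the first 4-bit binary adder
is S4S3S2S1 with a carry-output C, then for this number to
be greater than 9:
\[
\begin{aligned}
\text{Overflow} &= (C \oplus S_4) \cdot \overline{S_4\,\overline{S_3}\,\overline{S_2}} \\
&= (C \oplus S_4) \cdot (\overline{S_4} + S_3 + S_2) \\
&= C \cdot \overline{S_4} + \overline{C} \cdot (S_3 + S_2) \cdot S_4
\end{aligned}
\]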
The digital logic which implements the above
algorithm is used in the proposed 1-digit BCD adder
discussed in the following section.
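As a sanity check on the algorithm of this section, the following Python sketch models the multiplexer-based correction and the overflow expression bit by bit, and can be compared against ordinary integer addition for every pair of BCD digits. It is a behavioural sketch under stated assumptions: the function names are hypothetical, and the bit-level conditions are derived from adding 0110 to S4S3S2S1 by hand rather than taken from the schematics.

```python
def overflow_bit(c, s4, s3, s2):
    """Overflow = C.~S4 + ~C.(S3 + S2).S4, modelled as a 2:1 mux selected by S4."""
    when_s4_is_1 = (1 - c) & (s3 | s2)
    return when_s4_is_1 if s4 else c


def corrected_sum(s4, s3, s2, s1):
    """Bits of S4S3S2S1 + 0110 (modulo 16) without a second adder."""
    o1 = s1                            # added to 0: unchanged
    o2 = 1 - s2                        # added to 1: always inverted
    o3 = s3 if s2 else 1 - s3          # inverted only when S2 gives no carry
    o4 = 1 - s4 if (s3 | s2) else s4   # inverted when S3 or S2 is 1
    return o4, o3, o2, o1


def bcd_digit_add(a, b, cin=0):
    """Return (carry_out, sum_digit) for BCD digits a, b and carry-in cin."""
    raw = a + b + cin                  # first-stage binary sum, 0..19
    c, s4, s3, s2, s1 = [(raw >> k) & 1 for k in (4, 3, 2, 1, 0)]
    ovf = overflow_bit(c, s4, s3, s2)
    # Conditional sum selection: the overflow bit picks one of two candidates.
    o4, o3, o2, o1 = corrected_sum(s4, s3, s2, s1) if ovf else (s4, s3, s2, s1)
    return ovf, 8 * o4 + 4 * o3 + 2 * o2 + o1


print(bcd_digit_add(6, 5))  # (1, 1)
```

Because both candidate sums are available before the overflow bit settles, the selection costs only one multiplexer delay in this model.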
4. One-Digit BCD Full Adder
A 1-digit BCD adder is a circuit that adds two BCD
digits in parallel and produces the sum digit in BCD.
A BCD adder must also include the correction logic
described in section 2 [3, 4].
4.1 Existing Architectures
The conventional implementation of the addition as
per the algorithm described above is shown in Fig.1 [10].
It can be seen that there is a 4-bit binary adder at the
beginning to add the two BCD digits (each digit
expressed using 4 bits) and a Carry-input. Then comes the
overflow detection (to check whether the sum of the BCD
digit has exceeded 9) which is designed using two AND
gates and a 3-input OR gate. The output of this logic
determines whether to add 6 (0110) or not. After this, in
the critical path, comes another 4-bit binary adder which
adds 0110 if the overflow logic is ‘High’ and 0000 if the
overflow logic is ‘Low’. This is the correction stage.
Thus the critical path in this circuit consists of a 4-bit
binary adder, the overflow logic and a second 4-bit binary
adder. Assuming that each 4-bit binary adder is a ripple
carry adder, gate-level analysis shows that it contributes
5 gates to the critical path. It can be observed from Fig. 1
that the overflow detection circuit starts functioning only
after the first 4-bit binary adder and contributes 2 gates to
the critical path, giving 5 + 2 + 5 = 12 gates in total, as
listed in Table 1.
Fig. 1. Block Diagram of Conventional 1-digit BCD FA
The conventional implementation can be made
more efficient by removing those gates which are
completely redundant in their operation. This modified
implementation is given in Fig. 2. Since either 0110 or
0000 is added in the second stage, the FA (full adder) for
the LSB is unnecessary: the LSB is unchanged in either
case. Thus in the modified implementation, the FA used
for the LSB is removed. Similarly, the FA for the MSB
can be replaced with a HA (half adder), since the MSB of
the addend is always 0.
This results in a shorter critical path. The first stage
and the overflow detection stage are the same as in the
conventional implementation. In the third stage the delay
of a 4-bit binary adder (in the conventional design) is
reduced to 2 FA + 1 XOR (modified design).
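The trimmed correction stage described above can be modelled with explicit full-adder and half-adder functions. This is only a sketch of the idea: the wiring is inferred from the text (LSB passed through, two full adders in the middle, a half adder at the MSB), and the function names are hypothetical.

```python
def full_adder(x, y, cin):
    """1-bit full adder; returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def half_adder(x, y):
    """1-bit half adder; returns (sum, carry_out)."""
    return x ^ y, x & y


def trimmed_correction(s4, s3, s2, s1, overflow):
    """Second stage of the modified design: adds 0110 or 0000 to S4S3S2S1."""
    o1 = s1                                # LSB: always added to 0, no adder
    o2, c2 = full_adder(s2, overflow, 0)   # added to the overflow bit
    o3, c3 = full_adder(s3, overflow, c2)  # added to the overflow bit and carry
    o4, _ = half_adder(s4, c3)             # MSB: only the incoming carry
    return o4, o3, o2, o1


# 1011 (11) with overflow set becomes 0001, the low digit of 11.
print(trimmed_correction(1, 0, 1, 1, 1))  # (0, 0, 0, 1)
```

The carry out of the MSB half adder is discarded, since the overflow bit already serves as the decimal carry-output.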
Fig. 2. Block Diagram of Modified Conventional 1-digit
BCD FA
The modified implementation of the conventional 1-digit
BCD adder can be made faster by using carry look-ahead
circuits, which predict the carry faster than the ripple
carry adder [11]. This is shown in Fig. 3 (a), named NCLA
(New Carry Look-ahead adder), and implemented in Fig. 3
(b). The overflow bit generation circuit is similar to the
conventional one.
Fig. 3. (a) Block diagram of NCLA (b) Block Diagram of
Lookahead-based 1-digit BCD FA
4.2 New BCD Adder Architectures
The schematic of a new 1-digit BCD full adder
architecture is shown in Fig. 4. The inputs for the circuit
are the two BCD digits A4A3A2A1 and B4B3B2B1. The
output of the circuit is the Sum and the output Carry. The
complete circuit can be divided into three parts similar to
the previous implementations: the first being the 4-bit
binary adder stage, the second being the overflow
detection stage and the third being the correction stage.
Fig. 4. Block Diagram of Proposed 1-digit BCD FA
As shown in Fig. 4, the first 4-bit binary stage is
implemented using 4-bit prefix look-ahead logic. This
prefix logic is implemented using the Carry Merge (CM)
blocks shown in the diagram. The schematic of the
CM block is shown in Fig. 5 (a). These CM blocks take
propagate, generate (PG) and Cin (carry-input) bits as
inputs and compute Cout (carry-output) as output. Thus
CM1 takes the PG bits of A1, B1 and A2, B2 and then
computes the carry-input for A3, B3. The total critical
path delay for the first stage can be analyzed by
substituting each FA block shown in Fig. 4 by the actual
circuit diagram of the FA shown in Fig. 5 (b).
First Stage Delay = 1 XOR + 1 CM + 1 MUX + 1 MUX
= 4 gate delays
After this, the overflow detection logic lies in the
critical path. As described in section 3, overflow takes
place when the sum of the first stage exceeds 9. The logic
for the overflow detection consists of two 2-input NOR
gates and a multiplexer with the select bit as S4
(intermediate sum bit). But only the multiplexer which
generates the final overflow bit is present in the critical
path, because all the previous computations take place in
parallel with the first stage. Thus the overflow logic adds
only a 1-gate delay to the critical path.
This overflow bit is given as input to the conditional
sum generator as shown in Fig. 5 (c). As can be seen from
the diagram, it consists of 3 multiplexers which select
either of their inputs based on the overflow bit. The first
set of inputs for each multiplexer is the actual sum bits
when there is no overflow. The second set of inputs is the
sum bits with 0110 added, which need not be computed
explicitly again in hardware because of the logic
explained in Section 3.
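The role of the Carry Merge (CM) blocks in the first 4-bit binary stage can be sketched as follows. The Boolean form used for `carry_merge` is the standard two-level carry look-ahead expression and is an assumption here, since the paper gives the CM block only as a schematic; the function names and bit ordering are likewise hypothetical.

```python
def carry_merge(g_hi, p_hi, g_lo, p_lo, cin):
    """Carry out of a 2-bit group, from its PG bits and the carry into it."""
    return g_hi | (p_hi & (g_lo | (p_lo & cin)))


def first_stage(a, b, cin=0):
    """4-bit binary addition of two nibbles; returns (carry_out, 4-bit sum)."""
    abits = [(a >> k) & 1 for k in range(4)]   # index 0 corresponds to A1
    bbits = [(b >> k) & 1 for k in range(4)]
    p = [x ^ y for x, y in zip(abits, bbits)]  # propagate bits
    g = [x & y for x, y in zip(abits, bbits)]  # generate bits
    c = [cin, 0, 0, 0]                         # carry into each bit position
    c[1] = g[0] | (p[0] & cin)
    c[2] = carry_merge(g[1], p[1], g[0], p[0], cin)   # carry-input for A3, B3
    c[3] = g[2] | (p[2] & c[2])
    cout = carry_merge(g[3], p[3], g[2], p[2], c[2])
    s = sum((p[k] ^ c[k]) << k for k in range(4))
    return cout, s


cout, s = first_stage(0b0110, 0b0101)
print(cout, format(s, "04b"))  # 0 1011
```

In this model the carry into the third bit comes directly from one merge of the two lower PG pairs, rather than rippling through two full adders.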
Thus the final output is the actual BCD representation
of the sum of the two input digits, and the overflow bit is
the carry-output of the 1-digit BCD full adder.
The critical path delay of the circuit is
= 4 (4-bit binary adder with prefix logic) + 1 (overflow
computation) + 1 (conditional sum generator)
= 6 gate delays
Thus theoretically the proposed circuit is 40% faster than
the fastest existing one (6 gate delays against 10 for the
look-ahead based adder) with a lower gate count (24
against 45). The theoretical comparison of the
proposed architecture with the existing ones is shown in
Table 1.
Fig. 5. (a) Schematic of the Carry-Merge (CM) block (b)
Block Diagram of the 1-bit binary FA (c) Block Diagram of
Conditional Sum Generator
5. Higher Order (N-digit) BCD adders
Though 1-digit BCD adders have been analyzed in
detail in the previous section, it is not practical to use
them exclusively. The most common application of BCD
addition, floating point addition, uses 16-digit
BCD adders. The focus of this section is to propose and
analyze a novel 16-digit BCD adder.
Directly generalizing the new 1-digit BCD FA, one
obtains the BCD ripple carry adder shown in Fig. 6. The
inputs to each FA in this diagram are shown as a pair of
4-bit lines, indicating two BCD digits, and a carry input.
The diagram shows only a 4-digit BCD adder
but it can be extended to any number of digits. The
critical path delay of an N-digit ripple carry adder
is N*(delay of a BCD FA) = N*(6 gate delays).
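A behavioural sketch of the ripple carry arrangement follows: each digit position waits for the carry of the position below it, which is why the delay grows linearly with N. The helper names (`bcd_digit_fa`, `bcd_ripple_add`, `to_digits`, `from_digits`) are hypothetical, and digits are stored least significant first as an assumption of this sketch.

```python
def bcd_digit_fa(a, b, cin):
    """Behavioural 1-digit BCD full adder; returns (carry_out, sum_digit)."""
    raw = a + b + cin
    cout = 1 if raw > 9 else 0
    return cout, (raw + 6 * cout) & 0xF


def bcd_ripple_add(x_digits, y_digits, cin=0):
    """Add two equal-length digit lists (least significant first)."""
    if len(x_digits) != len(y_digits):
        raise ValueError("operands must have the same number of digits")
    carry, out = cin, []
    for a, b in zip(x_digits, y_digits):
        carry, digit = bcd_digit_fa(a, b, carry)  # carry ripples to next digit
        out.append(digit)
    return carry, out


def to_digits(n, width):
    """Split n into `width` decimal digits, least significant first."""
    return [(n // 10 ** k) % 10 for k in range(width)]


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10 ** k for k, d in enumerate(digits))


carry, digits = bcd_ripple_add(to_digits(9527, 4), to_digits(1685, 4))
print(carry, from_digits(digits))  # 1 1212
```

With the 6 gate delays per digit quoted above, a 16-digit instance of this structure would have a critical path of 16 * 6 = 96 gate delays.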
Table 1. Theoretical Comparison of the Proposed and
Existing 1-digit BCD Full Adders
Architecture | Critical Path count | Total Gate Count
Conventional (Fig. 1) | 1*XOR + 4*MUX + 1*AND + 1*OR + 1*XOR + 4*MUX = 12 gates | 12 + 3 + 12 = 27 gates
Modified conventional (Fig. 2) | 1*XOR + 4*MUX + 1*AND + 1*OR + 2*XOR + 2*MUX = 11 gates | 12 + 3 + 6 + 1 = 22 gates
Lookahead-based (Fig. 3) | 1*XOR + 1*4-input-AND + 1*4-input-OR + 1*MUX + 1*AND + 1*3-input-OR + 1*XOR + 1*4-input-AND + 1*4-input-OR + 1*MUX = 10 gates | (12+9) + 3 + (12+9) = 45 gates
Proposed adder (Fig. 4) | 1*XOR + 1*CM + 4*MUX = 6 gates | 14 + 3 + 2 + 5 = 24 gates
Fig. 6. Block Diagram of a BCD ripple carry adder
A typical look-ahead block can be added to the
ripple carry adder to optimize the propagation time. But
to do this, two functions namely Propagate (Carry-output
= Carry-input) and Generate (Carry-output = 1) need to
be defined. In any number system a 1-digit full adder is
said to propagate if the sum is R-1 (where the base is R)
and generate when the sum is equal to or greater than R.
Therefore in BCD, propagate is when the sum is equal to
9 and generate when the sum is greater than or equal to
10. This logic is performed by the circuit shown in Fig.7.
This circuit is similar to the proposed BCD FA but,
instead of the conditional sum generator, it has the logic
to generate the PG (propagate and generate) bits. The
overflow bit serves as the generate bit, and to check for
equality to 9 (1001) the following is computed:
\[ P = S_4 \cdot \overline{S_3} \cdot \overline{S_2} \cdot S_1 \]
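The propagate and generate definitions above, and the way they allow all carries to be computed without rippling through the digit adders, can be sketched in Python. The Kogge-Stone-style scan used in `prefix_carries` is an assumption chosen for brevity and is not claimed to be the network used in the paper; all function names are hypothetical, and digits are stored least significant first.

```python
def bcd_pg(a, b):
    """Digit-level (generate, propagate) for BCD digits a and b."""
    raw = a + b
    return int(raw >= 10), int(raw == 9)


def combine(hi, lo):
    """Merge the (G, P) pair of a higher group with that of the group below."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_carries(pg, cin=0):
    """Carries into digits 0..N (entry N is the carry-out) via a prefix scan."""
    n, span, dist = len(pg), list(pg), 1
    while dist < n:  # about log2(N) levels
        span = [combine(span[i], span[i - dist]) if i >= dist else span[i]
                for i in range(n)]
        dist *= 2
    # span[i] now holds the group (G, P) covering digits 0..i.
    return [cin] + [g | (p & cin) for g, p in span]


def bcd_prefix_add(x_digits, y_digits, cin=0):
    """Add two equal-length digit lists using precomputed carries."""
    pg = [bcd_pg(a, b) for a, b in zip(x_digits, y_digits)]
    carries = prefix_carries(pg, cin)
    digits = [(a + b + c) % 10 for a, b, c in zip(x_digits, y_digits, carries)]
    return carries[-1], digits


def to_digits(n, width):
    return [(n // 10 ** k) % 10 for k in range(width)]


def from_digits(digits):
    return sum(d * 10 ** k for k, d in enumerate(digits))


carry, digits = bcd_prefix_add(to_digits(9999999999999999, 16), to_digits(1, 16))
print(carry, from_digits(digits))  # 1 0
```

In the example every digit above the lowest one has a sum of 9, so each of them propagates the carry generated once the carry-in from the lowest digit arrives, and the scan resolves the whole chain in four levels for N = 16.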
Fig. 7. Block Diagram of Proposed 1-digit BCD Propagate
Generate Block
This look-ahead logic is further extended by using
prefix logic, the fastest adder logic to date, as shown
in Fig. 8. The first set of BCD PG blocks computes the
PG bits for all the significant stages. These are given as
input to a prefix network. There is no restriction on the
prefix network that is used, because only the final
carry-output bits are necessary; these are sent to the
1-digit BCD FAs for the computation of the sum.
The critical path in this circuit is
= 4 (4-bit binary adder with prefix logic) + 1*MUX + 4
(log2N = prefix network delay for N = 16) + 6 (1-digit BCD
FA) = 15 gate delays for N = 16.
Therefore the entire circuit has a delay of (log2N + 11)
= O(log N).
The total gate count of the circuit (assuming an N-digit
adder) is
= N*19 (for BCD PG Block) + Prefix Network
(Sklansky adder has 32 gates for N = 16) + N*24 (for
BCD FA).
Fig. 8. Block Diagram of Proposed 16-digit BCD Prefix
Adder
6. Simulation Results
6.1 Simulation Environment
All the simulations have been carried out using
Cadence Tools 5.10.41. Power and delay have been
calculated using the virtual analog simulation tool already
integrated into Cadence Tools. All the schematics have
been analyzed using 0.18µm CMOS technology and
simulations carried out under various voltages ranging
from 0.9V to 3.3V with a load capacitance of 10 fF. All
inputs are fed at frequencies ranging from 100MHz to
1GHz. The glitch power has also been taken into
consideration while calculating the power.
6.2 Results and Discussion
The existing and proposed architectures have been
simulated and the results for delay, power and
power-delay product are shown in Fig. 9 and 10.
Fig. 9. Comparisons between proposed and existing
architecture for 1-digit BCD Full Adders (a) Delay (b) Power
(c) Power-Delay product.
(Plots of Delay (ns), Power (nW) and Power-Delay Product
(nW*ns) against supply voltage from 0.9V to 3.3V for the
Conventional, Modified Conventional, Look-ahead Based and
Proposed adders.)
Fig. 10. Comparisons between proposed and existing
architecture for 16-digit BCD Adders (a) Delay (b) Power (c)
Power-Delay product.
(Plots of Delay (ns), Power (nW) and Power-Delay Product
(nW*ns) against supply voltage from 0.9V to 3.3V for the
Modified Conventional Ripple Carry, Look-ahead Based Ripple
Carry, Proposed Ripple Carry and Proposed Prefix based adders.)
It can be seen from the simulation results shown above
that the proposed 1-digit BCD full adder is 41% faster
than the fastest one to date. In terms of power
consumption the proposed adder consumes a little more
power than the modified conventional adder (Fig. 2)
while being 52% faster. The trade-off can be better
observed in the power-delay product simulations. When
the new BCD 1-digit full adder is used with prefix logic,
the resultant 16-digit BCD adder is 80% faster than
the existing ripple carry adder (a ripple carry
adder using the look-ahead based 1-digit BCD FA). Although
the prefix logic adder constructed using the new BCD FA
(Fig. 8) consumes a little more power than the existing
ripple carry adder (Fig. 3), the delay is greatly reduced.
Also, the efficiency of the new 16-digit BCD adder is
reflected by the very small power-delay product, as can be
seen from Fig. 10 (c).
7. Conclusions
Existing and proposed architectures for 1-digit BCD
full adders have been presented, simulated and compared.
A novel way of implementing the correction logic is
explained. Simulations have been performed over a wide
range of voltages and frequencies in 0.18µm CMOS
technology for circuits designed for 16-digit BCD
operation. The proposed 1-digit BCD FA has been found
to be 40% faster than the fastest one to date. The
extended 16-digit BCD prefix-logic based adder is more
than 80% faster than the existing ripple carry adder. This
proposed architecture can be easily extended to comply
with the IEEE 754r Floating Point format.
8. References
1. Draft IEEE Standard for Floating-Point Arithmetic.
New York: IEEE, Inc., 2004,
http://754r.ucbtest.org/drafts.
2. Michael F. Cowlishaw, “Decimal Floating Point:
Algorithm for Computers,” Proceedings of the 16th
IEEE Symposium on Computer Arithmetic (ARITH
’03), pp. 104-111, June 2003.
3. M.S. Schmookler and A.W. Weinberger, “High
Speed Decimal Addition,” IEEE Transactions on
Computers, vol. C-20, pp. 862-867, August 1971.
4. W. Bultmann, W. Haller, H. Wetter, and A. Worner,
“Binary and Decimal Adder Unit,” U.S. Patent
#6,292,819, September 2001.
5. R.D. Kenney and M.J. Schulte, “Multioperand
Decimal Addition,” Proc. IEEE CS Ann. Symp. VLSI,
pp. 251-253, Feb. 2004.
6. M.A. Erle and M.J. Schulte, “Decimal Multiplication
via Carry-Save Addition,” Proc. IEEE 14th Int’l
Conf. Application-Specific Systems, Architectures,
and Processors, pp. 348-358, June 2003.
7. B. Parhami, Computer Arithmetic: Algorithms and
Hardware Designs. New York: Oxford Univ. Press,
2000.
8. R. Zimmermann and W. Fichtner, “Low-power logic
styles: CMOS versus pass-transistor logic,” IEEE J.
Solid-State Circuits, vol. 32, pp. 1079–1090, July
1997.
9. R.D. Kenney and M.J. Schulte, “High-speed
multioperand decimal adders,” IEEE Transactions
on Computers, vol. 54, no. 8, pp. 953-963,
Aug. 2005.
10. Morris Mano, “Digital Design,” Third Edition,
Prentice Hall.
11. H. Thapliyal, S. Kotiyal, and M.B. Srinivas, “Novel
BCD adders and their reversible logic
implementation for IEEE 754r format,” Proceedings
of the 19th International Conference on VLSI Design,
3-7 Jan. 2006.
12. Sreehari Veeramachaneni, Kirthi Krishna M.,
Lingamneni Avinash, Sreekanth Reddy P.,
M.B. Srinivas, “Novel Architectures for High-speed
and Low-power 3-2, 4-2 and 5-2 Compressors,”
Proceedings of the 20th IEEE/ACM International
Conference on VLSI Design and Embedded Systems,
Bangalore, India, January 2007.
13. J. Sklansky, “Conditional-sum addition logic,” IRE
Trans. Electronic Computers, vol. EC-9, pp. 226-231,
June 1960.