Novel, High-Speed 16-Digit BCD Adders Conforming to IEEE 754r Format

Sreehari Veeramachaneni, M. Kirthi Krishna, Lingamneni Avinash, Sreekanth Reddy P, M.B. Srinivas
Centre for VLSI and Embedded System Technologies (CVEST)
International Institute of Information Technology, Hyderabad, India
srihari@research.iiit.ac.in, {kirthikrishna, avinashl, sreekanthp}@students.iiit.ac.in, srinivas@iiit.ac.in

IEEE Computer Society Annual Symposium on VLSI(ISVLSI'07) 0-7695-2896-1/07 $20.00 © 2007

Abstract

In view of the increasing prominence of commercial, financial and internet-based applications that process data in decimal format, there is renewed interest in providing hardware support for decimal data. This paper proposes a new architecture for efficient 1-digit decimal addition of binary coded decimal (BCD) operands, which is the core of high-speed multi-operand adders and decimal floating-point arithmetic. Based on this 1-digit BCD adder, novel architectures for higher order (n-digit) BCD adders, such as the ripple carry adder and the carry look-ahead adder, are derived. The proposed circuits are compared, both qualitatively and quantitatively, with existing circuits in the literature and are shown to perform better. Simulation results show that the proposed 1-digit BCD adder achieves an improvement of 40% in delay. The 16-digit BCD look-ahead adder using prefix logic is shown to be at least 80% faster than the existing ripple carry one.

1. Introduction

Due to the growing importance of decimal arithmetic in commercial, financial and internet-based applications, which cannot tolerate errors of conversion between binary and decimal formats, hardware support for decimal arithmetic is receiving increased attention. Recently, specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754r standard for floating-point arithmetic [1]. Despite the widespread use of binary arithmetic, decimal computation remains essential for many applications. Not only is it required whenever numbers are presented for human inspection, but it is also often a necessity when fractions are involved. Decimal fractions are pervasive in human endeavors, yet most cannot be represented by binary fractions. The value 0.1, for example, requires an infinitely recurring binary number. If a binary approximation is used instead of an exact decimal fraction, results can be incorrect even if subsequent arithmetic is correct [2].

It is anticipated that once the IEEE-754r standard is finally approved, hardware support for decimal floating-point arithmetic will be incorporated in processors for various applications. Still, the major consideration while implementing BCD arithmetic will be to enhance its speed as much as possible, which is the issue addressed in this paper. This paper introduces and analyses various techniques for high-speed addition of higher order BCD numbers, which forms the core of other arithmetic operations such as multi-operand addition [5, 9], multiplication [6] and division [7]. A novel architecture for 1-digit BCD addition is proposed, based on which architectures for higher order adders such as the ripple carry adder and the carry look-ahead adder are derived.

The rest of the paper is organized as follows: Section 2 provides a brief mathematical background of BCD, while Section 3 describes the proposed algorithm for BCD addition. The proposed circuit for 1-digit BCD addition is given in Section 4. In Section 5, novel architectures for higher order BCD adders such as the ripple carry adder and the carry look-ahead adder are presented. Simulation results for the proposed and existing circuits are given in Section 6 and discussed in detail.

2. BCD Arithmetic – A Quick Overview

BCD is a decimal representation of a number directly coded in binary, digit by digit. For example, the number (9527)10 = (1001 0101 0010 0111)BCD. It can be seen that each digit of the decimal number is coded in binary and the codes are then concatenated to form the BCD representation of the decimal number. To use this representation, all the arithmetic and logical operations need to be defined. As the decimal number system contains 10 digits, at least 4 bits are needed to represent a BCD digit. Considering a decimal digit A, the BCD representation is given by A4A3A2A1, where each Ak ∈ {0, 1}. The only point of note is that the maximum value that can be represented by a BCD digit is 9. The representation of (10)10 in BCD is (0001 0000).

Addition in BCD can be explained by considering two decimal digits A and B with BCD representations A4A3A2A1 and B4B3B2B1 respectively. In the conventional algorithm, these two numbers are added using a 4-bit binary adder. The resultant sum can exceed 9, which results in an overflow. If the sum is greater than 9, the binary equivalent of 6 is added to the resultant sum to obtain the exact BCD representation. This can be illustrated with the following example:

    A        0110   (6)
    B        0101   (5)
    Sum      1011   (11)
    Add      0110   (6)
    BCD    1 0001   (11 in BCD)

    Answer = (0001 0001)

3. Proposed Algorithm for BCD Addition

The existing algorithm for addition of two BCD digits performs many redundant calculations, leading to an inefficient design. After overflow is detected, the entire number 0110 is added to the resultant sum (S4S3S2S1), which is implemented using a complete 4-bit binary adder. On careful observation, however, S1 is just being added to a 0, which does not require any extra hardware. S2 just needs to be inverted, as it is being added to a 1; this addition produces a carry equal to S2. S3 is added to a 1 together with that carry, so it needs to be inverted only if S2 is a 0. Using similar logic, S4 is added to a 0 together with the carry out of the S3 position, so it needs to be inverted only if either S3 or S2 is a 1.
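The bit-level observation above can be checked exhaustively. The following is a minimal Python sketch, not the authors' circuit: the function names `add6_without_adder`, `to_bits` and `add_six` are hypothetical, and the rule used for S3 (flipped exactly when S2 is 0, because the added 1 and the carry from the S2 position cancel otherwise) follows from binary arithmetic.

```python
def to_bits(value):
    """Split a 4-bit value into (s4, s3, s2, s1), with s4 as the MSB."""
    return (value >> 3) & 1, (value >> 2) & 1, (value >> 1) & 1, value & 1


def add6_without_adder(s4, s3, s2, s1):
    """Return (carry, o4, o3, o2, o1) for S4S3S2S1 + 0110 without a full adder chain."""
    o1 = s1                  # added to 0: unchanged
    o2 = s2 ^ 1              # added to 1: always inverted, carry out equals s2
    o3 = s3 ^ 1 ^ s2         # added to 1 plus carry s2: inverted only when s2 == 0
    o4 = s4 ^ (s3 | s2)      # added to 0 plus carry (s3 OR s2)
    carry = s4 & (s3 | s2)   # carry out of the most significant position
    return carry, o4, o3, o2, o1


def add_six(value):
    """Reassemble the corrected bits into an integer for easy checking."""
    carry, o4, o3, o2, o1 = add6_without_adder(*to_bits(value))
    return (carry << 4) | (o4 << 3) | (o3 << 2) | (o2 << 1) | o1


if __name__ == "__main__":
    # Every 4-bit input must give the same result as an arithmetic +6.
    assert all(add_six(v) == v + 6 for v in range(16))
    print(format(add_six(0b1011), "05b"))  # 11 + 6 = 17, printed as 10001
```

Because both the corrected and the uncorrected bits are simple functions of S4..S1, a circuit only has to choose between them, which is what motivates the multiplexer-based selection.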
Hence the correct sum can be selected by a set of multiplexers with the overflow bit as the select signal. Therefore, in terms of hardware, instead of a complete 4-bit binary adder, a set of 2 multiplexers arranged in parallel is needed to compute the corrected sum and another 3 multiplexers to select the appropriate one. The number of inverters can also be minimized, because the inverted output is available as the complement of the actual output generated in the CMOS implementation of the multiplexer [8, 12] in the FA of the first stage.

The logical derivation of the overflow bit, which selects the appropriate output, is shown below. If the resultant sum from the first 4-bit binary adder is S4S3S2S1 with carry-output C, then for this number to be greater than 9:

\[
\begin{aligned}
\text{Overflow} &= (C \oplus S_4) \cdot \overline{S_4\,\overline{S_3}\,\overline{S_2}} \\
&= (C \oplus S_4) \cdot (\overline{S_4} + S_3 + S_2) \\
&= C \cdot \overline{S_4} + \overline{C} \cdot (S_3 + S_2) \cdot S_4
\end{aligned}
\]

The digital logic which implements the above algorithm is used in the proposed 1-digit BCD adder discussed in the following section.

4. One-Digit BCD Full Adder

A 1-digit BCD adder is a circuit that adds two BCD digits in parallel and produces the sum digit in BCD. A BCD adder must also include the correction logic mentioned in Section 2 [3, 4].

4.1 Existing Architectures

The conventional implementation of the addition, as per the algorithm described above, is shown in Fig. 1 [10]. There is a 4-bit binary adder at the beginning to add the two BCD digits (each digit expressed using 4 bits) and a carry-input. Then comes the overflow detection (to check whether the sum of the BCD digits has exceeded 9), which is designed using two AND gates and a 3-input OR gate. The output of this logic determines whether to add 6 (0110) or not.
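As a sanity check on the overflow logic, the sketch below compares two formulations over every reachable first-stage result. It is an illustrative Python model under stated assumptions: `overflow_conventional` is the usual textbook form built from two AND gates and a 3-input OR, and `overflow_section3` reads the last line of the Section 3 derivation as C·S4' + C'·(S3 + S2)·S4, with the complement bars assumed where the printed equation is ambiguous. All function names are hypothetical.

```python
from itertools import product


def first_stage(a, b, cin):
    """Binary sum of two BCD digits and a carry-in, returned as (c, s4, s3, s2, s1)."""
    total = a + b + cin
    return total >> 4, (total >> 3) & 1, (total >> 2) & 1, (total >> 1) & 1, total & 1


def overflow_conventional(c, s4, s3, s2):
    """Two AND gates feeding a 3-input OR: C + S4.S3 + S4.S2."""
    return c | (s4 & s3) | (s4 & s2)


def overflow_section3(c, s4, s3, s2):
    """C.(not S4) + (not C).(S3 + S2).S4, the assumed reading of the derivation."""
    return (c & (1 - s4)) | ((1 - c) & (s3 | s2) & s4)


def check_all():
    """Both forms must flag exactly the sums greater than 9."""
    for a, b, cin in product(range(10), range(10), (0, 1)):
        c, s4, s3, s2, _ = first_stage(a, b, cin)
        expected = int(a + b + cin > 9)
        assert overflow_conventional(c, s4, s3, s2) == expected
        assert overflow_section3(c, s4, s3, s2) == expected
    return True


if __name__ == "__main__":
    print(check_all())
```

The two expressions differ only on bit patterns that cannot occur (sums above 19), so either can drive the correction stage.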
After this, in the critical path, comes another 4-bit binary adder, which adds 0110 if the overflow logic is 'High' and 0000 if it is 'Low'. This is the correction stage. Thus the critical path in this circuit consists of a 4-bit binary adder, the overflow logic and a second 4-bit binary adder. Assuming that the 4-bit binary adder is a ripple carry adder, gate-level analysis shows that it contributes 5 gates to the critical path. It can be observed from Fig. 1 that the overflow detection circuit starts functioning only after the 4-bit binary adder and contributes 2 gates to the critical path.

Fig. 1. Block Diagram of Conventional 1-digit BCD FA

The conventional implementation can be made more efficient by removing those gates which are completely redundant in their operation. This modified implementation is given in Fig. 2. Since either 0110 or 0000 needs to be added in the second stage, there is no need for the FA (full adder) at the LSB, as that bit is unchanged in either case. Thus, in the modified implementation, the FA used for the LSB is removed. The FA for the MSB can also be replaced with an HA (half adder) by similar reasoning. This results in a shorter critical path. The first stage and the overflow detection stage are the same as in the conventional implementation. In the third stage, the delay of a 4-bit binary adder (conventional design) is reduced to 2 FA + 1 XOR (modified design).

Fig. 2. Block Diagram of Modified Conventional 1-digit BCD FA

The modified implementation of the conventional 1-digit BCD adder can be made faster by using carry look-ahead circuits, which predict the carry faster than a ripple carry adder [11]. The look-ahead block, named NCLA (New Carry Look-ahead Adder), is shown in Fig. 3(a), and the adder built from it is shown in Fig. 3(b).
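To make the look-ahead idea concrete, here is a small Python sketch of the standard textbook 4-bit carry look-ahead equations. It is offered only as background: it is not claimed to match the NCLA block of [11], and the function name `cla4` is hypothetical.

```python
def cla4(a, b, cin=0):
    """4-bit carry look-ahead addition; returns (sum_bits_as_int, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]   # generate: both inputs are 1
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # propagate: exactly one input is 1

    # Each carry is written directly in terms of g, p and cin, so no carry
    # has to wait for the previous one to ripple through.
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))

    carries = [cin, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4


if __name__ == "__main__":
    for a in range(16):
        for b in range(16):
            for cin in (0, 1):
                s, cout = cla4(a, b, cin)
                assert s + 16 * cout == a + b + cin
    print(cla4(0b0110, 0b0101))
```

The wide AND and OR terms in `c3` and `c4` correspond to the multi-input gates that such a block trades for reduced carry delay.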
The overflow bit generation circuit is similar to the conventional one.

Fig. 3. (a) Block diagram of NCLA (b) Block Diagram of Lookahead-based 1-digit BCD FA

4.2 New BCD Adder Architectures

The schematic of a new 1-digit BCD full adder architecture is shown in Fig. 4. The inputs to the circuit are the two BCD digits A4A3A2A1 and B4B3B2B1. The outputs of the circuit are the sum and the output carry. As in the previous implementations, the complete circuit can be divided into three parts: the 4-bit binary adder stage, the overflow detection stage and the final correction stage.

As shown in Fig. 4, the first 4-bit binary stage is implemented using 4-bit prefix look-ahead logic. This prefix logic is implemented using the Carry Merge (CM) blocks shown in the diagram. The schematic of the CM block is shown in Fig. 5(a). These CM blocks take the propagate and generate (PG) bits and Cin (carry-input) as inputs and compute Cout (carry-output) as output. Thus CM1 takes the PG bits of A1, B1 and A2, B2 and computes the carry-input for A3, B3. The total critical path delay of the first stage can be analyzed by substituting each FA block in Fig. 4 with the circuit diagram of the FA shown in Fig. 5(b):

First Stage Delay = 1 XOR + 1 CM + 1 MUX + 1 MUX = 4 gate delays

After this, the overflow detection logic lies in the critical path. As described in Section 3, overflow takes place when the sum of the first stage exceeds 9. The logic for overflow detection consists of two 2-input NOR gates and a multiplexer with S4 (the intermediate sum bit) as the select bit. Only the multiplexer which generates the final overflow bit is in the critical path, because all the preceding computations take place in parallel with the first stage. Thus the overflow logic adds only a 1-gate delay to the critical path.

This overflow bit is given as input to the conditional sum generator shown in Fig. 5(c). As can be seen from the diagram, it consists of 3 multiplexers, each of which selects one of its inputs based on the overflow bit. The first input of each multiplexer is the actual sum bit, used when there is no overflow. The second input is the result of adding 0110 to the original bits, which need not be computed explicitly again in hardware because of the logic explained in Section 3. Thus the final output is the actual BCD representation of the sum of the two input digits, and the overflow bit is the carry-output of the 1-digit BCD full adder.

The critical path delay of the circuit is 4 (4-bit binary adder with prefix logic) + 1 (overflow computation) + 1 (conditional sum generator) = 6 gate delays. Thus, theoretically, the proposed circuit is 40% faster than the fastest existing one, with a lower gate count. The theoretical comparison of the proposed architecture with the existing ones is shown in Table 1.

Fig. 4. Block Diagram of Proposed 1-digit BCD FA

Fig. 5. (a) Schematic of the Carry-Merge (CM) block (b) Block Diagram of the 1-bit binary FA (c) Block Diagram of Conditional Sum Generator

5. Higher Order (N-digit) BCD Adders

Though 1-digit BCD adders have been analyzed in detail in the previous section, it is not practical to use them exclusively.
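As a reference point before building wider adders, the three stages of the 1-digit adder of Section 4.2 (binary sum, overflow multiplexer, conditional sum) can be modelled behaviourally. This is a hedged Python sketch, not a gate-accurate netlist: the first stage is modelled with integer addition instead of prefix logic, the mapping of the two NOR gates is one plausible reading of the description, and the name `bcd_digit_add` is hypothetical.

```python
def bcd_digit_add(a, b, cin=0):
    """Add two BCD digits (0-9) and a carry-in; return (sum_digit, carry_out)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and cin in (0, 1)):
        raise ValueError("operands must be BCD digits and cin must be 0 or 1")

    # Stage 1: 4-bit binary addition (behavioural stand-in for the prefix adder).
    total = a + b + cin
    c = total >> 4
    s4, s3, s2, s1 = (total >> 3) & 1, (total >> 2) & 1, (total >> 1) & 1, total & 1

    # Stage 2: overflow bit. Assumed gate mapping: NOR(S3, S2), then NOR with C,
    # and a 2:1 multiplexer selected by S4.
    nor_s3_s2 = 1 - (s3 | s2)
    when_s4_high = 1 - (c | nor_s3_s2)   # equals (not C) and (S3 or S2)
    overflow = when_s4_high if s4 else c

    # Stage 3: conditional sum. The "+6" candidates come from inversions only.
    plus6 = (s4 ^ (s3 | s2), 1 - (s3 ^ s2), 1 - s2)
    o4, o3, o2 = plus6 if overflow else (s4, s3, s2)

    return (o4 << 3) | (o3 << 2) | (o2 << 1) | s1, overflow


if __name__ == "__main__":
    for a in range(10):
        for b in range(10):
            for cin in (0, 1):
                digit, cout = bcd_digit_add(a, b, cin)
                assert digit + 10 * cout == a + b + cin
    print(bcd_digit_add(6, 5))  # (1, 1): digit 1 with carry, i.e. 11
```

Chaining this function digit by digit, with each carry-out feeding the next carry-in, gives the behaviour of a ripple carry arrangement.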
The most common application of BCD addition, namely floating-point addition, uses 16-digit BCD adders. The focus of this section is to propose and analyze a novel 16-digit BCD adder.

Directly generalizing the new 1-digit BCD FA, one obtains the BCD ripple carry adder shown in Fig. 6. The inputs to each FA in this diagram are shown as a pair of 4-bit lines, indicating two BCD digits, and a carry input. The diagram shows only a 4-digit adder, but it can be extended to any number of digits. The critical path delay of an N-digit ripple carry adder is N*(delay of a BCD FA) = N*(6 gate delays).

Table 1. Theoretical Comparison of the Proposed and Existing 1-digit BCD Full Adders

Architecture | Critical Path | Total Gate Count
Conventional (Fig. 1) | 1*XOR + 4*MUX + 1*AND + 1*OR + 1*XOR + 4*MUX = 12 gates | 12 + 3 + 12 = 27 gates
Modified conventional (Fig. 2) | 1*XOR + 4*MUX + 1*AND + 1*OR + 2*XOR + 2*MUX = 11 gates | 12 + 3 + 6 + 1 = 22 gates
Lookahead-based (Fig. 3) | 1*XOR + 1*4-input-AND + 1*4-input-OR + 1*MUX + 1*AND + 1*3-input-OR + 1*XOR + 1*4-input-AND + 1*4-input-OR + 1*MUX = 10 gates | (12 + 9) + 3 + (12 + 9) = 45 gates
Proposed adder (Fig. 4) | 1*XOR + 1*CM + 4*MUX = 6 gates | 14 + 3 + 2 + 5 = 24 gates

Fig. 6. Block Diagram of a BCD ripple carry adder

A typical look-ahead block can be added to the ripple carry adder to reduce the propagation time. To do this, two functions, namely Propagate (Carry-output = Carry-input) and Generate (Carry-output = 1), need to be defined. In any number system with base R, a 1-digit full adder is said to propagate if the sum is R-1 and to generate if the sum is equal to or greater than R. Therefore, in BCD, propagate occurs when the sum is equal to 9 and generate when the sum is greater than or equal to 10. This logic is performed by the circuit shown in Fig. 7.
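The digit-level propagate and generate definitions can be exercised in software. The sketch below is a behavioural Python illustration under stated assumptions: digits are stored least significant first, the carry recurrence is evaluated sequentially (a hardware look-ahead network would evaluate the same recurrence in parallel), and all function names are hypothetical.

```python
def bcd_pg(a, b):
    """Digit-level (propagate, generate) for base 10, evaluated without carry-in."""
    s = a + b
    return int(s == 9), int(s >= 10)


def bcd_add_lookahead(a_digits, b_digits, cin=0):
    """Add two equal-length BCD digit lists (least significant digit first)."""
    if len(a_digits) != len(b_digits):
        raise ValueError("operands must have the same number of digits")
    carries = [cin]
    for a, b in zip(a_digits, b_digits):
        p, g = bcd_pg(a, b)
        carries.append(g | (p & carries[-1]))   # c[i+1] = G[i] + P[i].c[i]
    # With every carry known, each sum digit depends only on its own inputs.
    sums = [(a + b + c) % 10 for a, b, c in zip(a_digits, b_digits, carries)]
    return sums, carries[-1]


def to_digits(value, n):
    """Decimal digits of value, least significant first, padded to n digits."""
    return [(value // 10 ** i) % 10 for i in range(n)]


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    x, y = 1234567890123456, 8765432109876543
    sums, cout = bcd_add_lookahead(to_digits(x, 16), to_digits(y, 16))
    assert from_digits(sums) + cout * 10 ** 16 == x + y
    print(from_digits(sums), cout)
```

The second test case in the usage block is a worst case for carry propagation: every digit position propagates, so a carry-in of 1 would travel through all 16 digits.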
This circuit is similar to the proposed BCD FA, but instead of the conditional sum generator it has logic to generate the PG (propagate and generate) bits. The overflow bit serves as the generate bit, and to check for equality to 9 the following is computed:

\[
P = S_4 \cdot \overline{S_3} \cdot \overline{S_2} \cdot S_1
\]

Fig. 7. Block Diagram of Proposed 1-digit BCD Propagate Generate Block

This look-ahead logic is further extended by using the fastest adder logic to date, namely prefix logic, as shown in Fig. 8. The first set of BCD PG blocks computes the PG bits for all the digit positions. These are given as input to a prefix network. There is no restriction on the prefix network that is used, because only the final carry-output bits are needed; these are sent to the 1-digit BCD FAs for the computation of the sum. The critical path in this circuit is 4 (4-bit binary adder with prefix logic) + 1*MUX + 4 (log2N = prefix network delay for N = 16) + 6 (1-digit BCD FA) = 15 gate delays for N = 16. Therefore the entire circuit has a delay of (log2N + 11) = O(log N). The total gate count of the circuit (assuming an N-digit adder) is N*19 (for the BCD PG blocks) + the prefix network (a Sklansky adder [13] has 32 gates for N = 16) + N*24 (for the BCD FAs).

Fig. 8. Block Diagram of Proposed 16-digit BCD Prefix Adder

6. Simulation Results

6.1 Simulation Environment

All the simulations have been carried out using Cadence Tools 5.10.41. Power and delay have been calculated using the virtual analog simulation tool already integrated into Cadence Tools. All the schematics have been analyzed using 0.18µm CMOS technology, and simulations have been carried out at supply voltages ranging from 0.9V to 3.3V with a load capacitance of 10 fF. All inputs are fed at frequencies ranging from 100MHz to 1GHz. Glitch power has also been taken into consideration while calculating the power.

6.2 Results and Discussion

The existing and proposed architectures have been simulated, and the results for delay, power and power-delay product are shown in Fig. 9 and Fig. 10.

[Figure 9 plots: delay (ns), power (nW) and power-delay product (nW*ns) of 1-digit BCD adders versus supply voltage (0.9V, 1.2V, 1.8V, 2.5V, 3.3V) for the Conventional, Modified Conventional, Look-ahead Based and Proposed designs.]

Fig. 9. Comparisons between proposed and existing architecture for 1-digit BCD Full Adders (a) Delay (b) Power (c) Power-Delay product.

[Figure 10 plots: delay (ns), power (nW) and power-delay product (nW*ns) of 16-digit BCD adders versus supply voltage (0.9V, 1.2V, 1.8V, 2.5V, 3.3V) for the Modified Conventional Ripple Carry, Look-ahead Based Ripple Carry, Proposed Ripple Carry and Proposed Prefix based designs.]

Fig. 10. Comparisons between proposed and existing architecture for 16-digit BCD Adders (a) Delay (b) Power (c) Power-Delay product.

It can be seen from the simulation results shown above that the proposed 1-digit BCD full adder is 41% faster than the fastest one to date. In terms of power consumption, the proposed adder consumes a little more power than the modified conventional adder (Fig. 2) while being 52% faster. The trade-off can be better observed in the power-delay product simulations. When the new 1-digit BCD full adder is used with prefix logic, the resultant 16-digit BCD adder is 80% faster than the existing ripple carry adder (the ripple carry adder using the look-ahead based 1-digit BCD FA). Although the prefix logic adder constructed using the new BCD FA (Fig. 8) consumes a little more power than the existing ripple carry adder (Fig. 3), the delay is greatly reduced. The efficiency of the new 16-digit BCD adder is also reflected in its very small power-delay product, as can be seen from Fig. 10(c).

7. Conclusions

Existing and proposed architectures for 1-digit BCD full adders have been presented, simulated and compared. A novel way of implementing the correction logic is explained. Simulations have been performed over a wide range of voltages and frequencies in 0.18µm CMOS technology for circuits designed for 16-digit BCD operation. The proposed 1-digit BCD FA has been found to be 40% faster than the fastest one to date. The extended 16-digit BCD prefix-logic based adder is more than 80% faster than the existing ripple carry adder. The proposed architecture can be easily extended to comply with the IEEE 754r floating-point format.

8. References

1. Draft IEEE Standard for Floating-Point Arithmetic. New York: IEEE, Inc., 2004, http://754r.ucbtest.org/drafts.
2. M.F. Cowlishaw, “Decimal Floating-Point: Algorism for Computers,” Proceedings of the 16th IEEE Symposium on Computer Arithmetic (ARITH ’03), pp. 104-111, June 2003.
3. M.S. Schmookler and A.W.
Weinberger, “High Speed Decimal Addition,” IEEE Transactions on Computers, vol. C-20, pp. 862-867, August 1971.
4. W. Bultmann, W. Haller, H. Wetter, and A. Worner, “Binary and Decimal Adder Unit,” U.S. Patent #6,292,819, September 2001.
5. R.D. Kenney and M.J. Schulte, “Multioperand Decimal Addition,” Proc. IEEE CS Ann. Symp. VLSI, pp. 251-253, Feb. 2004.
6. M.A. Erle and M.J. Schulte, “Decimal Multiplication via Carry-Save Addition,” Proc. IEEE 14th Int’l Conf. Application-Specific Systems, Architectures, and Processors, pp. 348-358, June 2003.
7. B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. New York: Oxford Univ. Press, 2000.
8. R. Zimmermann and W. Fichtner, “Low-power logic styles: CMOS versus pass-transistor logic,” IEEE J. Solid-State Circuits, vol. 32, pp. 1079-1090, July 1997.
9. R.D. Kenney and M.J. Schulte, “High-speed multioperand decimal adders,” IEEE Transactions on Computers, vol. 54, no. 8, pp. 953-963, Aug. 2005.
10. M. Morris Mano, Digital Design, Third Edition, Prentice Hall.
11. H. Thapliyal, S. Kotiyal, and M.B. Srinivas, “Novel BCD adders and their reversible logic implementation for IEEE 754r format,” Proceedings of the 19th International Conference on VLSI Design, 3-7 Jan. 2006.
12. Sreehari Veeramachaneni, M. Kirthi Krishna, Lingamneni Avinash, Sreekanth Reddy P, and M.B. Srinivas, “Novel Architectures for High-speed and Low-power 3-2, 4-2 and 5-2 Compressors,” Proceedings of the 20th IEEE/ACM International Conference on VLSI Design and Embedded Systems, Bangalore, India, January 2007.
13. J. Sklansky, “Conditional-sum addition logic,” IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231, June 1960.