Conversion between DPD and RBCD for on-line arithmetic computation

Sonia González, Carlos García, Julio Villalba

Summary: In recent years decimal arithmetic has gained renewed interest with the ratification of the IEEE 754-2008 Floating-point Standard. The standard specifies formats for Decimal Floating-point (DFP) numbers and uses the Densely Packed Decimal (DPD) encoding to store the significand of a DFP number. However, to perform decimal arithmetic operations, DPD must first be converted to Binary Coded Decimal (BCD). On-line arithmetic additionally requires a redundant number representation, which prevents carry propagation and allows the computation to start from the most significant digit (MSD). In this paper we consider the Redundant Binary Coded Decimal (RBCD) encoding and present the design of a DPD to RBCD converter for decimal on-line arithmetic units. The direct conversion proposed in this paper (DPD to RBCD) improves on the two-step conversion required by a regular computation (DPD to BCD, then BCD to RBCD).

Keywords: conversion, decimal floating-point, on-line arithmetic, densely packed decimal, redundant binary coded decimal.

I. Introduction

Decimal arithmetic is receiving renewed attention thanks to the ratification of the IEEE 754-2008 Floating-point Standard. In recent years there has been considerable activity in the design of dedicated decimal arithmetic units. In fact, processors such as the IBM Power6, Power7, z9 and z10 [10], [5], [11] include decimal floating-point units. The standard includes two basic formats for Decimal Floating-point (DFP) numbers and specifies two encodings for DFP significands, known as the decimal and the binary encoding. The decimal encoding uses the Densely Packed Decimal (DPD) [3] encoding to encode the significand. The main drawback of DPD encoding is that it is not easy to perform computations with it.
To resolve this problem, a DPD number is converted to Binary Coded Decimal (BCD) representation and the operations are carried out in that representation. Most of the recently proposed decimal arithmetic units based on DPD encoding [4], [14], [15], [16], [8], [9], [7] are designed assuming this conversion.

On the other hand, on-line arithmetic is based on serial computation starting from the Most Significant Digit (MSD). To avoid carry chains, on-line arithmetic uses a redundant representation of the numbers [6]. Redundant Binary Coded Decimal (RBCD) [13] is a redundant decimal representation in which the BCD digits from 0 to 9 are represented with the digit set {-7,...,7}. To work with decimal on-line units, two steps are needed to convert from DPD to RBCD: a conversion from DPD to BCD and a conversion from BCD to RBCD. Although the DPD to BCD conversion is fast in hardware, the BCD to RBCD conversion requires a carry to be chained between digits. In this paper we propose a direct DPD to RBCD conversion, obtained by fusing the tables and equations involved in the two-step conversion, which yields a faster algorithm.

The rest of the paper is organized as follows. Section II describes the DFP formats specified in the IEEE 754-2008 standard. Section III deals with the Redundant Binary Coded Decimal numbers and the on-line arithmetic requirements. Section IV presents the direct conversion from DPD to RBCD. Section V examines the implementation results and, finally, Section VI presents the summary and conclusions of this work.

1 Dept. Computer Architecture, University of Málaga, email: sonia,cgarcia,julio@ac.uma.es

II. Decimal Floating-Point format

Due to the importance of DFP arithmetic, the IEEE revised its standard for floating-point arithmetic to include specifications for DFP formats and operations [1].
In IEEE 754-2008, the value of a finite DFP number x is

$$(-1)^{S_x} \times C_x \times 10^{E_x - bias}$$

where $S_x$ is the sign bit, $E_x$ is a biased exponent, $bias$ is a constant that makes $E_x$ non-negative, and $C_x$ is the significand, also referred to as the coefficient. IEEE 754-2008 defines two basic DFP formats, decimal64 and decimal128, with encoding lengths of 64 and 128 bits, respectively. These formats are used to represent a finite subset of the real numbers, together with signed infinities and two types of Not-a-Number (qNaN and sNaN). In addition, the standard specifies two encodings for DFP significands: (1) a binary encoding, known as Binary Integer Decimal (BID), and (2) a decimal encoding, known as Densely Packed Decimal (DPD). With the BID encoding, the significand is represented as an unsigned binary integer. With the DPD encoding, the significand is represented as an unsigned decimal integer in which three decimal digits are encoded using ten bits [3]. With either encoding, the significand of a DFP number is not normalized, which means that a single DFP number may have multiple representations. More details on the DFP formats and operations are provided in [1].

III. Redundant Binary Coded Decimal numbers

On-line arithmetic defines algorithms for serial arithmetic operators that receive the inputs and generate the output starting from the most significant digit (MSD first). The serial approach is advantageous because of the simplicity of the hardware and the reduction in the number and length of connections among modules. Moreover, the MSD-first alternative allows the implementation of operations such as division and square root, which are difficult to implement least-significant digit first. The drawback of the serial approach is the number of cycles required; however, this can be compensated by overlapping the execution of dependent operations. Thanks to all these characteristics, on-line arithmetic is suitable for VLSI implementation.

TABLE I
Obtaining (abcd)
(¬ denotes the logical complement.)

vwxst                        p·q·r   (abcd)                                   Cout
0----, 100--, 101--, 11110   0       a = 0, (bcd) = (pqr) + Cin0              0
0----, 100--, 101--, 11110   1       a = 1, b = 1, c = Cin0, d = ¬Cin0        1
110--, 11100, 11101, 11111   -       a = b = c = ¬(r · Cin0), d = r ⊕ Cin0    1

TABLE II
Obtaining (efgh)
(¬ denotes the logical complement.)

vwxst                        p·q·u   s·t·u   (efgh)                                   Cout
0----, 100--, 110--          -       0       e = 0, (fgh) = (stu) + Cin1              0
11101                        0       -       e = 0, (fgh) = (pqu) + Cin1              0
0----, 100--, 110--          -       1       e = 1, f = 1, g = Cin1, h = ¬Cin1        1
11101                        1       -       e = 1, f = 1, g = Cin1, h = ¬Cin1        1
101--, 11100, 11110, 11111   -       -       e = f = g = ¬(u · Cin1), h = u ⊕ Cin1    1

Fig. 1. Implementation obtaining abcd.

Fig. 2. Implementation obtaining efgh.

To deal with on-line arithmetic it is necessary to have a number representation system with no carry propagation. In this way, it is possible to perform the computation starting from the Most Significant Digit (MSD). This is achieved by carry-save or signed-digit representations. Therefore, decimal on-line arithmetic requires a redundant decimal number system. The BCD code involved in the DPD format does not fulfill this condition. Thus, a conversion step from BCD to a redundant decimal system is needed. A code that meets the required condition and is directly related to the BCD code is the Redundant Binary Coded Decimal (RBCD) defined in [2]. An RBCD number is composed of 4-bit digits, each representing one of the 15 values in the set {−7, −6, ..., 0, ..., 6, 7}. It is a signed-digit representation in which a non-negative digit is coded in natural binary and a negative digit is coded in two's complement.
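The digit encoding just described can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this illustration, and the rejection of the pattern 1000 is an inference from the 15-value digit set (it would decode to −8 in two's complement), not something stated in [2].

```python
def rbcd_digit_value(nibble: int) -> int:
    """Decode one 4-bit RBCD digit (two's complement) into an integer in [-7, 7]."""
    if not 0 <= nibble <= 0b1111:
        raise ValueError("an RBCD digit is a 4-bit pattern")
    if nibble == 0b1000:
        # Assumption: -8 is the one 4-bit pattern outside the 15-value digit set.
        raise ValueError("1000 does not encode a digit in {-7, ..., 7}")
    return nibble - 16 if nibble & 0b1000 else nibble


def rbcd_digit_encode(digit: int) -> int:
    """Encode a digit in [-7, 7] as a 4-bit two's-complement pattern."""
    if not -7 <= digit <= 7:
        raise ValueError("RBCD digits lie in {-7, ..., 7}")
    return digit & 0b1111


def rbcd_value(nibbles) -> int:
    """Value of an RBCD digit string given MSD first (radix 10, signed digits)."""
    value = 0
    for nibble in nibbles:
        value = 10 * value + rbcd_digit_value(nibble)
    return value
```

The redundancy shows up directly: `rbcd_value([0b0000, 0b0111])` and `rbcd_value([0b0001, 0b1101])` both evaluate to 7, the second one as 10 − 3.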
This code allows decimal addition [2], subtraction, multiplication and division [12] to be computed with no carry propagation. The conversion from BCD to RBCD can be performed with no carry propagation, whereas the opposite conversion involves a borrow propagation. Fortunately for on-line arithmetic computation, the most critical conversion is BCD to RBCD, since the MSD is required as soon as possible. The DPD code is only used for storage purposes, and the conversion from RBCD to DPD is performed only when the on-line processing has finished.

The conversion from BCD to RBCD is performed by a two-step algorithm [2]. In the first step we detect whether a digit is greater than or equal to 7 and, if so, add 6 to it, which produces an output carry. In the second step we add the input carry to the result of the first step. Let (abcd) be a generic digit of a BCD-coded number, and let $c_{in}$ and $c_{out}$ be its input and output carries, respectively. The condition for carry generation is $c_{out} = a \,\|\, (b \cdot c \cdot d)$ (see [2]), where the symbol ‖ denotes the logical OR and · the logical AND (notice that the carry depends only on the bits of the current digit). Thus, the conversion is:

First step:

$$c_{out} = a \,\|\, (b \cdot c \cdot d) \tag{1}$$

$$(abcd) = \begin{cases} (abcd) & \text{if } c_{out} = 0 \\ (abcd) + (0110) & \text{if } c_{out} = 1 \end{cases} \tag{2}$$

Second step:

$$(abcd) = (abcd) + c_{in} \tag{3}$$

IV. Direct conversion from DPD to RBCD

Let (pqrstuvwxy) be the ten bits of a DPD code. This code is converted to three BCD digits (abcd)(efgh)(ijkm), in such a way that each bit of the three BCD digits is obtained as a Boolean function of the DPD bits. A conversion table is provided in [1]. The conversion from BCD to RBCD, in turn, is performed by implementing equations (1) through (3). We propose to combine the table and the equations into a direct conversion table, which is presented in Tables I, II and III. The resulting RBCD code is composed of three digits, (abcd)(efgh)(ijkm), which are obtained directly from the DPD code (pqrstuvwxy). In the tables the symbol · denotes the logical AND operation, the symbol ⊕ the logical EXOR operation, and the symbol + the arithmetic addition.

TABLE III
Obtaining (ijkm)
(¬ denotes the logical complement.)

vwxst                        w·x·y   s·t·y   p·q·y   (ijkm)                                   Cout
0----                        0       -       -       i = 0, (jkm) = (wxy) + Cin2              0
101--                        -       0       -       i = 0, (jkm) = (sty) + Cin2              0
110--, 11100                 -       -       0       i = 0, (jkm) = (pqy) + Cin2              0
0----                        1       -       -       i = 1, j = 1, k = Cin2, m = ¬Cin2        1
101--                        -       1       -       i = 1, j = 1, k = Cin2, m = ¬Cin2        1
110--, 11100                 -       -       1       i = 1, j = 1, k = Cin2, m = ¬Cin2        1
100--, 11101, 11110, 11111   -       -       -       i = j = k = ¬(y · Cin2), m = y ⊕ Cin2    1

From these tables we can see that only logical operations are required plus, in some cases, one level of 3-bit arithmetic addition to add the input carry. For example, in the first case of Table I the bits (bcd) are obtained by adding the carry to the bits (pqr), whereas a = 0; notice that the value being incremented is at most 6 in this case, so adding the carry never produces an output carry. In contrast, the BCD to RBCD conversion proposed in [2] involves two additions.

The implementation of the direct conversion is shown in Fig. 1, Fig. 2 and Fig. 3, which correspond to Tables I, II and III, respectively. The implementation of Table I requires only two multiplexers and one 3-bit adder, the implementation of Table II uses four multiplexers and two 3-bit parallel adders, and the implementation of Table III uses six multiplexers and three 3-bit parallel adders. Fig. 4 shows the global structure of the full conversion. Cin0 is the carry input coming from the previous conversion, and Cout3 is the carry output produced by the current conversion.

Fig. 3. Implementation obtaining ijkm.

Fig. 4. Global structure of the conversion.

V. Experimental results

The DPD to RBCD design presented in this paper has been implemented in Verilog, simulated using ModelSim 6.0, and synthesized using Synopsys Design Compiler and the TSMC 65nm library, in which one cell unit has an area equal to 1 µm². We have also implemented the conversion using two steps (conversion from DPD to BCD [1] followed by conversion from BCD to RBCD [2]). Table IV shows the implementation results. Our approach is close to 27% faster than the two-step algorithm. On the other hand, our design requires about 58% more area than the two-step processing. The improvement in the time of our algorithm is due to the fact that we use only one 3-bit parallel addition, compared with the two serial 4-bit additions required by the standard conversion. Notice that the table conversion from DPD to BCD involves only logical operations, so the additions have a high influence on the total computation time.

TABLE IV
Implementation results

DPD to RBCD    Time      Area
Two steps      0.0744    913
Our design     0.0546    1449

VI. Summary and Conclusion

In this paper we have presented a direct conversion between DPD and RBCD which makes computation in an on-line arithmetic system possible. The proposed system obtains the RBCD digits directly from a DPD data stream, starting from the MSD. The fusion of the two steps into one significantly reduces the computation time of the conversion, with a moderate increase in hardware. The fast conversion proposed in this paper can benefit all potential decimal on-line arithmetic algorithms that involve IEEE 754-2008 decimal floating-point numbers.

References

[1] American National Standards Institute and Institute of Electrical and Electronics Engineers. IEEE standard for floating-point arithmetic. IEEE Standard, Std 754-2008, 2008.
[2] B. Shirazi, D. Y. Y. Yun, and C. N. Zhang. RBCD: redundant binary coded decimal adder. IEE Proceedings Computer and Digital Techniques, 136:156–160, March 1989.
[3] M. F. Cowlishaw. Densely packed decimal encoding. IEE Proceedings - Computers and Digital Techniques, 149:102–104, May 2002.
[4] M. F. Cowlishaw. Decimal floating-point: Algorism for computers. In Proceedings of the 16th IEEE Symposium on Computer Arithmetic, pages 104–111, June 2003.
[5] A. Y. Duale, M. H. Decker, H.-G. Zipperer, M. Aharoni, and T. J. Bohizic. Decimal floating-point in z9: An implementation and testing perspective. IBM Journal of Research and Development, 51(1/2), 2007.
[6] M. D. Ercegovac and T. Lang. Digital Arithmetic. Morgan Kaufmann, 2004.
[7] E. M. Schwarz and S. R. Carlough. Power6 decimal divide. In Proceedings of the 18th IEEE Symposium on Application-specific Systems, Architectures and Processors, 2007.
[8] M. A. Erle, M. J. Schulte, and B. J. Hickmann. Decimal floating-point multiplication via carry-save addition. In Proceedings of the 18th IEEE Symposium on Computer Arithmetic, 2007.
[9] B. Hickmann, A. Krioukov, M. A. Erle, and M. Schulte. A parallel IEEE P754 decimal floating-point multiplier. In International Conference on Computer Design, pages 296–303, October 2007.
[10] J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, and S. R. Carlough. IBM POWER6 accelerators: VMX and DFU. IBM Journal of Research and Development, 51:1–21, November 2007.
[11] E. M. Schwarz, J. S. Kapernick, and M. F. Cowlishaw. Decimal floating-point support on the IBM System z10 processor. IBM Journal of Research and Development, 53(1):4:1–4:10, 2009.
[12] S. Gorgin and G. Jaberipur. Fully redundant decimal arithmetic. In Proc. of 19th IEEE Symposium on Computer Arithmetic (ARITH 2009). IEEE Computer Society Press, 2009.
[13] B. Shirazi, D. Y. Y. Yun, and C. N. Zhang. RBCD: redundant binary coded decimal adder. Computers and Digital Techniques, IEE Proceedings E, 136(2):156–160, March 1989.
[14] L.-K. Wang and M. J. Schulte. Decimal floating-point square root using Newton-Raphson iteration. In Proceedings of IEEE International Conference on Application-Specific Systems, Architectures and Processors, pages 309–315, July 2005.
[15] L.-K. Wang and M. J. Schulte. Decimal floating-point adder and multifunction unit with injection-based rounding. In Proceedings of the 18th IEEE Symposium on Computer Arithmetic, Montpellier, France, June 2007.
[16] L.-K. Wang and M. J. Schulte. A decimal floating-point divider using Newton-Raphson iteration. The Journal of VLSI Signal Processing, pages 727–739, 2007.
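As a software illustration of the direct conversion of Section IV, the following Python sketch applies the rows of Tables I to III to a 10-bit DPD code and places the two-step path through equations (1) to (3) next to it for comparison. It is a sketch under stated assumptions rather than the authors' Verilog design: the function names (`bits_of`, `select`, `fused_digit`, `dpd_to_rbcd_direct`, `dpd_to_rbcd_two_step`) are hypothetical, the complemented terms are derived from equations (1) to (3), and the three carry inputs are passed in explicitly because the wiring of carries between digits (Fig. 4) is not modelled here.

```python
def bits_of(declet):
    """Split a 10-bit DPD code into the bits (p, q, r, s, t, u, v, w, x, y), p first."""
    return tuple((declet >> (9 - i)) & 1 for i in range(10))


def select(declet):
    """Per output digit, return three bits (digit 0..7) or a single bit (digit 8..9).

    The case split on (v, w, x, s, t) is the standard DPD decoding that the
    first column of Tables I to III is built on.
    """
    p, q, r, s, t, u, v, w, x, y = bits_of(declet)
    if not v:
        return (p, q, r), (s, t, u), (w, x, y)
    if (w, x) == (0, 0):
        return (p, q, r), (s, t, u), (y,)
    if (w, x) == (0, 1):
        return (p, q, r), (u,), (s, t, y)
    if (w, x) == (1, 0):
        return (r,), (s, t, u), (p, q, y)
    if (s, t) == (0, 0):
        return (r,), (u,), (p, q, y)
    if (s, t) == (0, 1):
        return (r,), (p, q, u), (y,)
    if (s, t) == (1, 0):
        return (p, q, r), (u,), (y,)
    return (r,), (u,), (y,)


def fused_digit(sel, cin):
    """One row of Tables I to III: return (4-bit RBCD digit, carry-out)."""
    if len(sel) == 1:                       # digit 100b, i.e. 8 or 9
        (b,) = sel
        n = 1 - (b & cin)                   # complement of (b AND cin)
        return (n << 3) | (n << 2) | (n << 1) | (b ^ cin), 1
    b2, b1, b0 = sel
    if b2 & b1 & b0:                        # digit 7: bits 1, 1, cin, NOT cin
        return 0b1100 | (cin << 1) | (cin ^ 1), 1
    # Digit 0..6: a single 3-bit addition; the sum is at most 7, so the top bit stays 0.
    return ((b2 << 2) | (b1 << 1) | b0) + cin, 0


def dpd_to_rbcd_direct(declet, cin0=0, cin1=0, cin2=0):
    """Direct conversion: ((abcd, efgh, ijkm), (cout1, cout2, cout3))."""
    pairs = [fused_digit(sel, cin)
             for sel, cin in zip(select(declet), (cin0, cin1, cin2))]
    return tuple(d for d, _ in pairs), tuple(c for _, c in pairs)


def dpd_to_rbcd_two_step(declet, cin0=0, cin1=0, cin2=0):
    """Reference path: DPD to BCD, then equations (1) to (3) on each BCD digit."""
    digits, couts = [], []
    for sel, cin in zip(select(declet), (cin0, cin1, cin2)):
        bcd = (8 | sel[0]) if len(sel) == 1 else (sel[0] << 2) | (sel[1] << 1) | sel[2]
        a, b, c, d = (bcd >> 3) & 1, (bcd >> 2) & 1, (bcd >> 1) & 1, bcd & 1
        cout = a | (b & c & d)                           # equation (1)
        first = (bcd + 0b0110) & 0xF if cout else bcd    # equation (2)
        digits.append((first + cin) & 0xF)               # equation (3)
        couts.append(cout)
    return tuple(digits), tuple(couts)
```

For instance, `dpd_to_rbcd_direct(0x0FF)` (the DPD code of 999, with all carry inputs at 0) returns the digits `(0b1111, 0b1111, 0b1111)`, that is −1 in each position, with all three carry-outs set. The direct path needs one 3-bit addition per digit at most, whereas the reference path performs the two additions of equations (2) and (3).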