A 32-bit Decimal Floating-Point Logarithmic Converter

Dongdong Chen¹, Yu Zhang¹, Younhee Choi¹, Moon Ho Lee², Seok-Bum Ko¹,²

¹ Department of Electrical and Computer Engineering, University of Saskatchewan, Campus Drive 57, Saskatoon, SK, Canada, seokbum.ko@usask.ca
² Institute of Information and Communication, Chonbuk National University, Jeonju 561-756, Korea, moonho@chonbuk.ac.kr

Abstract

This paper presents a new design and implementation of a 32-bit decimal floating-point (DFP) logarithmic converter based on the digit-recurrence algorithm. The converter calculates accurate logarithms of 32-bit DFP numbers as defined in the IEEE 754-2008 standard. The redundant digit $e_1$ is obtained from a look-up table in the first iteration, and the remaining redundant digits $e_j$ are selected by rounding the scaled remainder during the succeeding iterations. The sequential architecture of the proposed 32-bit DFP logarithmic converter is implemented on a Xilinx Virtex-II Pro P30 FPGA device and then synthesized with the TSMC 0.18-um standard cell library. The implementation results indicate that the maximum frequency of the proposed architecture is 47.7 MHz on the FPGA and 107.9 MHz in TSMC 0.18-um technology. Faithful 32-bit DFP logarithm results are obtained in 18 cycles.

Keywords: Decimal Logarithmic Converter, Decimal Floating-Point, Digit-Recurrence Algorithm, Selection by Rounding.

1. Introduction

There are many commercial demands for DFP arithmetic in areas such as financial analysis, tax calculation, phone billing, currency conversion, Internet-based applications, and e-commerce [3]. This trend gives rise to further development of DFP arithmetic units, which can perform more accurate calculations than binary floating-point (BFP) arithmetic units. Due to the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic [1] includes it in its specifications.
The decimal arithmetic unit, as a main part of a decimal processor, is attracting more and more attention from researchers. Decimal-encoded formats and arithmetic have been implemented in IBM's POWER6 [5], System z9 [4] and z10 [12] processors. The logarithm, as one of the elementary functions, is useful in many areas of science and engineering. Some applications, such as the logarithmic number system (LNS) and digital signal processing, use a logarithmic unit in place of conventional computer arithmetic. Moreover, the decimal logarithm operation is defined in the IEEE 754-2008 standard [1]. Building on the improvement of basic decimal arithmetic units, more complex DFP elementary operations such as logarithm, exponential and trigonometric functions would be the next useful building blocks.

Muller [10] presents both software-oriented and hardware-oriented algorithms to compute elementary functions. While most elementary functions are implemented by software-oriented methods, which can use large look-up tables and provide more accurate results, these methods are usually too slow for numerically intensive and real-time applications. Hardware-oriented methods have been developed as a high-speed alternative. A CORDIC-like BKM algorithm is presented in [7] for fast computation of complex exponentials and logarithms. The digit-recurrence algorithm is another hardware-oriented method, attractive for its low area requirements, especially in high-precision computations. Selection by rounding was introduced for high-radix binary division and square root in [6], [8] and for logarithm in [11]. This method reduces the cost of implementation and, in particular, the complexity of the digit selection function.
In this paper, a radix-10 digit-recurrence algorithm with selection by rounding is proposed to implement a 32-bit DFP logarithmic converter. The paper is organized as follows. Section 2 describes the basic DFP standard and the 32-bit DFP logarithmic calculation. Section 3 presents a radix-10 fixed-point (FXP) logarithm operation by the digit-recurrence algorithm with selection by rounding, and constructs the related architecture. In Section 4, the architecture of the proposed 32-bit DFP logarithmic converter is presented with an example and then verified on a function verification platform. In Section 5, the implementation results of the proposed converter are analyzed first; then the hardware performance of the decimal logarithmic converter is compared with that of a binary logarithmic converter [11]; finally, we analyze how the proposed 32-bit DFP logarithmic converter can be scaled to a 64-bit or 128-bit converter. Section 6 gives conclusions.

Authorized licensed use limited to: University of Saskatchewan. Downloaded on January 27, 2010 at 02:06 from IEEE Xplore. Restrictions apply.

2. A 32-bit DFP Logarithm

2.1. DFP Standard

The IEEE 754-2008 standard specifies three interchange DFP formats: decimal32, decimal64 and decimal128, encoded in 32, 64 and 128 bits respectively. Figure 1 shows the basic DFP format specified in IEEE 754-2008.

[Figure 1. DFP Number Format. Fields from left to right: Sign S (1 bit), Combination Field G (5 bits), Exponent Continuation E (w bits; 6 bits in decimal32), Coefficient Continuation C (j bits, 3j/10 digits; 20 bits / 6 digits in decimal32).]

The sign is a 1-bit field and indicates the sign of the number in the same way as for BFP numbers. The combination field is a 5-bit field that encodes the two most significant bits (MSBs) of the exponent and the most significant digit (MSD) of the coefficient. Not-a-Number (NaN) and infinity (Inf) are also indicated in the combination field. The exponent field (w+2 bits) is formed by appending the w-bit exponent continuation as a suffix to the 2 MSBs derived from the combination field. The whole encoded exponent is an unsigned binary integer with the largest unsigned value. The value of the exponent is calculated by subtracting an exponent bias from the value of the encoded exponent, so that both negative and positive exponents can be represented. The coefficient field (j+4 bits) is formed by appending the decoded continuation digits (j bits) as a suffix to the MSD derived from the combination field. The j-bit coefficient continuation consists of a whole number of 10-bit groups, with the most significant group on the left. Each 10-bit group represents three decimal digits using Densely Packed Decimal (DPD) encoding [2] and can be decoded to a 12-bit binary-coded decimal (BCD) representation. The coefficient has $q = 3j/10 + 1$ digits in total.¹ In the IEEE 754-2008 DFP standard, the value of the coefficient is a non-normalized unsigned decimal fraction of the form $d_0.d_1 d_2 \ldots d_6$, $0 \le d_i < 10$. In decimal computer arithmetic, the coefficient is usually represented as an integer.

¹ Note that w = 6, 8 and 12; j = 20, 50 and 110; exponent bias = 101, 398 and 6176; q = 7, 16 and 34 respectively in the decimal32, decimal64 and decimal128 formats.

The value of a 32-bit DFP number, compliant with the decimal32 format, is represented as:

$$X = (-1)^{s} \times 10^{e} \times \mathit{coefficient} \tag{1}$$

In (1), $e$ is in the range $(E_{min} - 6) \le e \le (E_{max} - 6)$ with $E_{max} = 96$ and $E_{min} = -95$, and the coefficient field is represented as an integer. If the absolute value of a DFP number is larger than the largest DFP number ($|X_{max}| = 9999999 \times 10^{90}$), overflow occurs. Similarly, if it is less than the smallest 32-bit DFP number ($|X_{min}| = 10^{-101}$), underflow occurs. When the absolute value of a DFP number is less than $1000000 \times 10^{-101}$ and larger than $0000001 \times 10^{-101}$, the number is subnormal.

2.2. Calculation of 32-bit DFP Logarithm

A valid 32-bit DFP logarithmic calculation is defined as:

$$R = \log_{10}(X) = \log_{10}(10^{e}) + \log_{10}(\mathit{coefficient}) \tag{2}$$

In (2), the exponent is in the range $-101 \le e \le 90$, and the coefficient is a q-digit (q = 7) non-normalized integer in the range $0000001 \le \mathit{coefficient} \le 9999999$. Some exceptional cases need to be dealt with during a 32-bit DFP logarithmic calculation. First of all, X must be a positive floating-point number (S = 0); otherwise the logarithmic converter simply returns NaN. Moreover, if X is NaN or zero, the logarithmic converter returns NaN, and if X is infinite, the logarithmic converter returns Inf. The inexact logarithm results need to be rounded and normalized to q-digit logarithm results. Since the maximum and minimum logarithm results are $\log_{10}(|X_{max}|) = 96.99999$ and $\log_{10}(|X_{min}|) = -101$ respectively, subnormal results, overflow and underflow are not produced during logarithmic calculations. The calculation of $\log_{10}(\mathit{coefficient})$ is a 7-digit FXP decimal logarithm operation. Since the coefficient of a DFP number is defined as a non-normalized integer, it should be adjusted into the range [0.1, 1) before the calculation. Therefore, (3) is obtained:

$$R = \log_{10}(X) = e + k + \log_{10}(m) \tag{3}$$

Here $k$ is the characteristic of the logarithm and can be easily obtained by a leading-zero detector (LZD), $1 \le k \le 7$; $m$ is a decimal fraction that consists of all q digits (q = 7) of the coefficient part of the 32-bit DFP number, $0.1 \le m < 1$. Since the target is a 32-bit DFP calculation, the 7-digit FXP logarithm calculation must achieve enough accuracy to guarantee faithful rounding. A straightforward approach is to guarantee a precision of at least 2q digits (14 digits), so that the inexact result can still be rounded after a left shift of up to q digits (7 digits).
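The decomposition in (3) can be illustrated with a short software sketch. It is only a model under stated assumptions: the helper names `split_operand` and `log10_via_eq3` are hypothetical, Python's `decimal` module stands in for the hardware datapath, and the digit count of the integer coefficient plays the role of the leading-zero detector.

```python
from decimal import Decimal, getcontext

getcontext().prec = 30  # working precision for this check only (assumption)


def split_operand(coefficient, exponent):
    """Return (e, k, m) such that coefficient * 10**e == m * 10**(e + k).

    k is the number of significant digits of the integer coefficient
    (the quantity the leading-zero detector provides), and m lies in [0.1, 1).
    """
    if not 1 <= coefficient <= 9_999_999:
        raise ValueError("coefficient must be a non-zero 7-digit integer")
    k = len(str(coefficient))             # 1 <= k <= 7
    m = Decimal(coefficient).scaleb(-k)   # exact shift of the decimal point
    return exponent, k, m


def log10_via_eq3(coefficient, exponent):
    """Evaluate R = e + k + log10(m), following equation (3)."""
    e, k, m = split_operand(coefficient, exponent)
    return e + k + m.log10()
```

For the operand $9999999 \times 10^{-7}$ this gives $e = -7$, $k = 7$ and $m = 0.9999999$, so the integer part $e + k$ is zero and the whole result comes from $\log_{10}(m)$.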
3. Decimal FXP Logarithmic Converter

3.1. Overview of Algorithm

A digit-recurrence algorithm to calculate $\log_{10}(m)$, $0.1 \le m < 1$, is summarized as follows.

$$\log_{10}(m) = \log_{10}\Bigl(m \prod_{i=1}^{j} f_i\Bigr) - \log_{10}\Bigl(\prod_{i=1}^{j} f_i\Bigr) \tag{4}$$

If the following condition is satisfied:

$$\lim_{j \to \infty} \Bigl\{ m \prod_{i=1}^{j} f_i \Bigr\} \to 1 \tag{5}$$

then

$$\lim_{j \to \infty} \Bigl\{ \log_{10}\Bigl(m \prod_{i=1}^{j} f_i\Bigr) \Bigr\} \to 0 \tag{6}$$

and finally

$$\log_{10}(m) = 0 - \sum_{j=1}^{\infty} \log_{10}(f_j) \tag{7}$$

$f_j$ is defined as $f_j = 1 + e_j 10^{-j}$, so that $m$ is transformed to 1 by successive multiplication. This form of $f_j$ allows the use of a shift-and-add implementation. The corresponding recurrences for transforming $m$ and computing the logarithm are presented in (8) and (9), where $j \ge 1$, $E[1] = m$ and $L[1] = 0$.

$$E[j+1] = E[j]\,(1 + e_j 10^{-j}) \tag{8}$$

$$L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j}) \tag{9}$$

The digits $e_j$ are selected so that $E[j+1]$ converges to 1; one digit of accuracy of the result is therefore obtained in each iteration. After the last iteration of the recurrence, the results are:

$$E[N+1] \approx 1 \tag{10}$$

$$L[N+1] \approx \log_{10}(m) \tag{11}$$

To obtain the selection function for $e_j$, a scaled remainder is defined as:

$$W[j] = 10^{j}\,(1 - E[j]) \tag{12}$$

Thus,

$$E[j] = 1 - W[j]\,10^{-j} \tag{13}$$

Substituting (13) in (8) yields

$$W[j+1] = 10\,(W[j] - e_j + e_j W[j]\,10^{-j}) \tag{14}$$

According to (14), the digits $e_j$ are selected as a function of the leading digits of the scaled residual in such a way that the residual $W[j]$ remains bounded.

3.2. Selection by Rounding

The redundant digits are selected by rounding the residual to its integer part, as indicated in (15), where $e_j \in \{-9, -8, -7, \ldots, 0, \ldots, 7, 8, 9\}$.

$$e_j = \mathrm{round}(W[j]) \tag{15}$$

Then (16) is obtained:

$$-0.5 \le W[j] - e_j \le 0.5 \tag{16}$$

Since $|e_{j+1}| \le 9$,

$$-9.5 < W[j+1] < 9.5 \tag{17}$$

Equation (14) can be written as:

$$W[j+1] = 10\,(W[j] - e_j) + e_j 10^{1-j}\,(W[j] - e_j + e_j) \tag{18}$$
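The recurrences (8), (9) and (12) with the rounding rule (15) can be exercised in software. The following is a minimal sketch under stated assumptions, not the hardware algorithm itself: it uses Python's `decimal` module at a generous working precision, it assumes the input is already in the range $0.5 \le m \le 1$, and the first digit is chosen by a brute-force helper `pick_e1` (a hypothetical stand-in for a first-digit look-up table). The function names are illustrative only.

```python
from decimal import Decimal, getcontext, ROUND_HALF_EVEN

getcontext().prec = 40  # generous working precision (assumption, not the hardware width)


def pick_e1(m):
    """Stand-in for a first-digit table: the digit e1 in 0..9 that brings
    m * (1 + e1/10) closest to 1."""
    return min(range(10), key=lambda e: abs(1 - m * (1 + Decimal(e) / 10)))


def log10_digit_recurrence(m, iterations=15):
    """Approximate log10(m) for 0.5 <= m <= 1 with recurrences (8), (9), (12), (15)."""
    E, L = m, Decimal(0)          # E[1] = m, L[1] = 0
    digits = []
    for j in range(1, iterations + 1):
        if j == 1:
            e = pick_e1(m)
        else:
            W = (Decimal(10) ** j) * (1 - E)                       # eq. (12)
            e = int(W.to_integral_value(rounding=ROUND_HALF_EVEN))  # eq. (15)
        f = 1 + Decimal(e) * Decimal(10) ** (-j)                    # f_j = 1 + e_j * 10^-j
        E = E * f                                                   # eq. (8)
        L = L - f.log10()                                           # eq. (9)
        digits.append(e)
    return L, E, digits
```

Calling `log10_digit_recurrence(Decimal("0.7654321"))` returns the accumulated logarithm, the final value of $E$ (which should be very close to 1), and the list of selected digits, which makes it easy to observe that the digits stay within $\{-9, \ldots, 9\}$.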
According to (16), (17) and (18), the numerical analysis proceeds as follows:

$$-0.5 \times 10 + e_j 10^{1-j}\,(-0.5 + e_j) > -9.5 \tag{19}$$

$$0.5 \times 10 + e_j 10^{1-j}\,(0.5 + e_j) < 9.5 \tag{20}$$

The numerical analysis shows that conditions (19) and (20) are satisfied if and only if $j \ge 3$. Selection by rounding is therefore valid only for iterations $j \ge 3$, and $e_1$ and $e_2$ could only be obtained from look-up tables. However, using two look-up tables for $j = 1, 2$ would significantly increase the hardware cost. Therefore, a restriction on $e_1$ is defined so that $e_2$ can be obtained by selection by rounding and one look-up table is saved. Because $W[1] = 10(1 - m)$ by (12), $W[2]$ is obtained from (14) as:

$$W[2] = 100 - 100\,m - 10\,e_1 m \tag{21}$$

When $j = 2$, (19) and (20) are satisfied if $e_2$ is in the range $-7 < e_2 < 7$. Substituting $-7 < e_2 < 7$ into (16) gives (22):

$$-6.5 < W[2] < 6.5 \tag{22}$$

From (21) and (22), we conclude that when the input FXP decimal number is in the range $0.5 \le m \le 1$, $e_2$ can be obtained by selection by rounding. The look-up table for the selection of $e_1$ is shown in Table 1. Because $m$ is in the range $0.1 \le m < 1$, an input number in the range $0.1 \le m < 0.5$ needs to be adjusted by multiplying it by 2, 3 or 5. The adjusted numbers $m'$, which are in the range $0.5 \le m' \le 1$, are then processed with selection by rounding. Finally, the logarithm results $\log_{10}(m')$ are adjusted by subtracting the constant ($\log_{10}(2)$, $\log_{10}(3)$ or $\log_{10}(5)$) to obtain the final logarithm results $\log_{10}(m)$.

Table 1. Look-up Table for e1 Selection

The range of m'    e1 (BCD)
[0.96, 1.00)       0 (0000)
[0.88, 0.95]       1 (0001)
[0.81, 0.87]       2 (0010)
[0.76, 0.80]       3 (0011)
[0.70, 0.75]       4 (0100)
[0.66, 0.69]       5 (0101)
[0.62, 0.65]       6 (0110)
[0.59, 0.61]       7 (0111)
[0.56, 0.58]       8 (1000)
[0.50, 0.55]       9 (1001)

3.3. Approximation of Logarithm

The logarithm result can be obtained by accumulating the values of $-\log_{10}(1 + e_j 10^{-j})$ in each iteration. These values are stored in another look-up table, look-up table II. As the number of iterations increases, however, the size of the table becomes prohibitively large. A method for reducing the size of the table, and thereby the overall hardware requirement, is therefore necessary. A series expansion of the logarithm function $\log_{10}(1 + x)$ is given in (23):

$$\log_{10}(1 + x) = \Bigl(x - \frac{x^{2}}{2} + \ldots\Bigr) / \ln(10) \tag{23}$$

After iteration $j = k$, the values of $\log_{10}(1 + e_j 10^{-j})$ can be approximated by $e_j 10^{-j} / \ln(10)$. Since 14-digit accuracy needs to be guaranteed in this study, the series approximation can be used in the iterations in which the constraint $\frac{x^{2}}{2 \ln(10)} < 10^{-16}$ is met, where $x = e_j 10^{-j}$:

$$\frac{e_j^{2}\,10^{-2j}}{2 \ln(10)} < 10^{-16} \tag{24}$$

The numerical analysis of (24) shows that after $k = 8$ iterations the values of $-\log_{10}(1 + e_j 10^{-j})$ no longer need to be stored in the table; the values of $-e_j 10^{-j} / \ln(10)$ are used as an approximation instead.

3.4. Algorithm Summary

• First iteration (j = 1): $e_1$ is obtained from look-up table I under the restriction $0.5 \le m' \le 1$; numbers in the range $0.1 \le m < 0.5$ need to be adjusted.

• Iterations j = 2 to j = 15: convergence is achieved with selection by rounding, and the redundant digits $e_j$ are obtained.

• Iterations j ≤ 8: the logarithms are obtained from look-up table II, in which the values of $-\log_{10}(1 + e_j 10^{-j})$ are stored. In the iterations j > 8, the logarithm results are approximated by $-e_j 10^{-j} / \ln(10)$.

• The values of $-\log_{10}(1 + e_j 10^{-j})$ in iterations j ≤ 8 and $-e_j 10^{-j} / \ln(10)$ in iterations j > 8 are accumulated to obtain $\log_{10}(m')$, which is adjusted by subtracting the constant (0, $\log_{10}(2)$, $\log_{10}(3)$ or $\log_{10}(5)$) to obtain an FXP decimal logarithm result with 14-digit accuracy.

3.5. Error Analysis and Evaluation

The errors in the proposed algorithm are produced in four ways. The first is the inherent error of the algorithm, $\varepsilon_i$, which results from the difference between the logarithm results obtained from a finite number of iterations and the exact results obtained from infinitely many iterations. The second is the approximation error, $\varepsilon_a$, produced by approximating the values of $-\log_{10}(1 + e_j 10^{-j})$ with the values of $-e_j 10^{-j} / \ln(10)$. The third is the quantization error, $\varepsilon_q$, which results from the finite precision of the intermediate values in the hardware-oriented algorithm. The fourth is the final output rounding error, $\varepsilon_r$, whose maximum value is 1/2 unit in the last place (ulp). In order to obtain a logarithm result with 14-digit accuracy, the following condition must be satisfied:

$$E_{absolute} = \varepsilon_i + \varepsilon_a + \varepsilon_q + \varepsilon_r \le 10^{-14} \tag{25}$$

3.5.1 Inherent Error of Algorithm

Since each FXP decimal logarithm result is obtained after the 15th iteration, $\varepsilon_i$ can be defined as:

$$\varepsilon_i = -\sum_{j=16}^{\infty} \log_{10}(1 + e_j 10^{-j}) \tag{26}$$

In order to use the static error analysis method, we choose the worst cases ($e_j = 9$ or $-9$) to analyze the maximum $\varepsilon_i$:

$$\varepsilon_i = -\sum_{j=16}^{\infty} \log_{10}(1 \pm 9 \times 10^{-j}) \tag{27}$$

According to (27), the maximum $\varepsilon_i$ is in the range:

$$-4.34 \times 10^{-16} \le \varepsilon_i \le 4.34 \times 10^{-16} \tag{28}$$

3.5.2 Approximation Error

We use the approximate value $e_j 10^{-j} / \ln 10$ to estimate $\log_{10}(1 + e_j 10^{-j})$ from the 9th to the 15th iteration. According to the series expansion of the logarithm function in (23), this approach produces an approximation error, $\varepsilon_a$:

$$\varepsilon_a = \sum_{j=9}^{15} \Bigl(-\frac{(e_j 10^{-j})^{2}}{2} + \frac{(e_j 10^{-j})^{3}}{3} - \ldots\Bigr) / \ln(10) \tag{29}$$

Since

$$\sum_{j=9}^{15} \Bigl(\frac{(e_j 10^{-j})^{3}}{3} - \ldots\Bigr) / \ln(10) \ll 10^{-16} \tag{30}$$

we keep $-(e_j 10^{-j})^{2} / (2 \ln(10))$ to analyze $\varepsilon_a$:

$$\varepsilon_a \approx \sum_{j=9}^{15} \Bigl(-\frac{(e_j 10^{-j})^{2}}{2}\Bigr) / \ln(10) \tag{31}$$

Considering the worst cases ($e_j = 9$ or $-9$) in (31), we obtain the maximum $\varepsilon_a$:

$$\varepsilon_a \le 1.78 \times 10^{-17} \tag{32}$$

3.5.3 Quantization Error

Since only intermediate values with finite precision are operated on in the hardware-oriented algorithm, three quantization errors occur. First, the logarithm results are obtained by accumulating the 16-digit rounded values of $-\log_{10}(1 + e_j 10^{-j})$ from the 1st to the 8th iteration. In each iteration, the maximum rounding error of $-\log_{10}(1 + e_j 10^{-j})$ is $0.5 \times 10^{-16}$; therefore the maximum $\varepsilon_{q1}$ is:

$$\varepsilon_{q1} \le \sum_{j=1}^{8} 0.5 \times 10^{-16} = 4 \times 10^{-16} \tag{33}$$

Second, the logarithm results are obtained by accumulating the 16-digit values of $-e_j 10^{-j} / \ln(10)$ from the 9th to the 15th iteration. Since the maximum quantization error of the value $1/\ln(10)$ is $0.5 \times 10^{-14}$, when $e_j = 9$ or $-9$ the maximum $\varepsilon^{1}_{q2}$ is:

$$\varepsilon^{1}_{q2} \le \sum_{j=9}^{15} \pm 9 \times 10^{-j} \times 0.5 \times 10^{-14} \ll 10^{-16} \tag{34}$$

Another quantization error, $\varepsilon^{2}_{q2}$, is produced by truncating the value of $-e_j 10^{-j} / \ln(10)$ to a finite 16-digit precision. In each iteration, the maximum truncation error of $-e_j 10^{-j} / \ln(10)$ is $1 \times 10^{-16}$; therefore the maximum $\varepsilon^{2}_{q2}$ is:

$$\varepsilon^{2}_{q2} \le \sum_{j=9}^{15} 1 \times 10^{-16} = 7 \times 10^{-16} \tag{35}$$

Third, the logarithm result $\log_{10}(m')$ is adjusted by a finite 16-digit rounded constant (0, $\log_{10}(2)$, $\log_{10}(3)$ or $\log_{10}(5)$) in the last iteration, so the quantization error $\varepsilon_{q3}$ occurs. The maximum $\varepsilon_{q3}$ is:

$$\varepsilon_{q3} \le 0.5 \times 10^{-16} \tag{36}$$

Therefore, the maximum quantization error, $\varepsilon_q$, is:

$$\varepsilon_q \le \varepsilon_{q1} + \varepsilon^{1}_{q2} + \varepsilon^{2}_{q2} + \varepsilon_{q3} \approx 1.15 \times 10^{-15} \tag{37}$$

3.5.4 Error Evaluation

Since the final logarithm result has 14-digit accuracy, the maximum final rounding error is 1/2 ulp, $\varepsilon_r = 0.5 \times 10^{-14}$. With $\varepsilon_i$, $\varepsilon_a$ and $\varepsilon_q$ from (28), (32) and (37) respectively,

$$E_{absolute} = \varepsilon_i + \varepsilon_a + \varepsilon_q + \varepsilon_r = 0.660 \times 10^{-14} \tag{38}$$

$E_{absolute}$ satisfies condition (25), so the proposed algorithm can guarantee faithful rounding for 14-digit precision decimal logarithm results. Moreover, a MATLAB simulation model that is completely consistent with the hardware implementation of the proposed 7-digit FXP logarithmic converter was set up. The MATLAB simulation model shows that at least 14-digit precision must be kept for $W[j]$ to obtain correct $e_j$ during the 15 iterations. Furthermore, both 10,000 7-digit decimal operands close to 1.0 and 100,000 random decimal operands in the range [0.1, 1) were simulated as test vectors in the MATLAB model. All the logarithm results obtained from the simulation model have 14-digit accuracy.

3.6. Architecture

Figure 2 shows a sequential architecture of the proposed 7-digit FXP decimal logarithmic converter. The hardware implementation of this logarithmic converter includes two stages. Stage 1 in Figure 2 obtains $e_j$ with selection by rounding. After $e_j$ is obtained, the logarithm results are produced in stage 2. Finally, for input decimal numbers in the range $0.1 \le m < 0.5$, the corresponding logarithm results are adjusted.

3.6.1 Main Features of Architecture

All variables in this architecture are represented in the 10's complement number system. Each digit of a positive FXP decimal number is represented by a 4-bit BCD code, whereas negative numbers are represented in 10's complement format. The reason for choosing the 10's complement format is the same as for the binary 2's complement format: all digits, including the sign digit, participate in add or subtract operations. Moreover, a decimal subtraction can be replaced by a decimal addition in 10's complement format.
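The replacement of subtraction by addition in 10's complement format can be illustrated with a small sketch. It is a software model only: numbers are held as plain integers of a fixed digit width rather than BCD digits, the function names are hypothetical, and the signed interpretation in `to_signed` (upper half of the range read as negative) is an assumption made for illustration.

```python
def tens_complement(value, digits):
    """10's complement of a non-negative `digits`-digit number."""
    modulus = 10 ** digits
    nines = (modulus - 1) - value      # digit-wise 9's complement
    return (nines + 1) % modulus       # add 1 and discard the carry-out


def subtract(a, b, digits):
    """Compute a - b as a + (10's complement of b), discarding the carry-out."""
    return (a + tens_complement(b, digits)) % 10 ** digits


def to_signed(value, digits):
    """Read a `digits`-digit pattern as a signed value (assumed convention:
    patterns in the upper half of the range are negative)."""
    modulus = 10 ** digits
    return value - modulus if value >= modulus // 2 else value
```

For example, with 3 digits, `subtract(7, 101, 3)` yields the pattern 906, which reads as −94 under the assumed convention, and adding 094 to 906 gives 000 once the carry-out is discarded.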
The architecture of this logarithmic converter includes two look-up tables. Look-up table I is a $2^{4} \times 4$ ROM in which the values of $e_1$ are stored, as shown in Table 1. Look-up table II² stores all the 16-digit values of $\log_{10}(1 + e_j 10^{-j})$ used to obtain the logarithm results; here $j$ is in the range $1 \le j \le 8$, because after 8 iterations the logarithm results can be obtained from the series-expansion approximation of the logarithm function. Furthermore, the 16-digit adjustment parameters $\log_{10}(2)$, $\log_{10}(3)$ and $\log_{10}(5)$ are stored in this table. The size of look-up table II is $2^{8} \times 64$.

² Note that the proposed architecture can be transformed into a decimal base-e logarithmic converter by storing the values of $\ln(1 + e_j 10^{-j})$, $\ln(5)$, $\ln(3)$ and $\ln(2)$ in look-up table II.

[Figure 2. Architecture of FXP Decimal Logarithmic Converter. Blocks shown: Reg1 to Reg6, Mult1, Mult2, Mult3, look-up tables I and II, Mux1 to Mux9, shifters (×10, ×100, ×10^-j), 9's complement units, detector, two 14-digit decimal CLA adders and rounding logic in stage 1, and a 16-digit decimal CLA adder in stage 2; the critical path is marked.]

Mult1, Mult2 and Mult3 in the architecture are multiple logics that generate multiples of their input values. Mult1 produces $m$, $2m$, $3m$ and $5m$; Mult2 and Mult3 produce $e_j m$ and $e_j / \ln(10)$. Here $e_j$ is in the range $-9 \le e_j \le 9$, so the multiple logic produces results from $-9m$ to $9m$.
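The contents of look-up table II and the choice of 8 stored iterations can be cross-checked offline. The sketch below is only an illustration under assumptions: it uses Python's `decimal` module, stores the negated values $-\log_{10}(1 + e_j 10^{-j})$ as in Section 3.3, rounds them to 16 fractional digits, and uses hypothetical helper names; it says nothing about the actual ROM addressing.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50            # working precision for table generation (assumption)
LN10 = Decimal(10).ln()
Q16 = Decimal("1e-16")            # 16 fractional digits


def table2_entry(e, j):
    """16-digit value of -log10(1 + e * 10^-j), the quantity accumulated in (9)."""
    x = Decimal(e) * Decimal(10) ** (-j)
    return (-(1 + x).log10()).quantize(Q16)


def series_term(e, j):
    """First-order replacement -e * 10^-j / ln(10) used once the table ends."""
    return (-Decimal(e) * Decimal(10) ** (-j) / LN10).quantize(Q16)


def first_series_iteration(bound=Decimal("1e-16")):
    """Smallest j for which the worst case (|e| = 9) of constraint (24) holds."""
    j = 1
    while Decimal(81) * Decimal(10) ** (-2 * j) / (2 * LN10) >= bound:
        j += 1
    return j


# One entry per digit value and stored iteration (digit range taken as -9..9).
TABLE2 = {(e, j): table2_entry(e, j) for j in range(1, 9) for e in range(-9, 10)}
```

Under these assumptions `first_series_iteration()` returns 9, which agrees with storing table entries only for iterations 1 to 8.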
In this paper, Mult1, Mult2 and Mult3 are implemented based on the partial product generation logic described in [9]. The 10's complement decimal CLA adder is implemented based on the 1-digit decimal carry-look-ahead (CLA) adder described in [15]. To achieve a higher speed, the 16-digit and 14-digit decimal numbers are divided into four groups, each with a separate CLA adder. The subtraction operations in the algorithm are carried out by this CLA adder, owing to the 10's complement decimal format used in this architecture.

3.6.2 Cycle Process

In the first clock cycle, the 7-digit FXP decimal number $m$ is obtained from Reg1. Input numbers in the range $0.1 \le m < 0.5$ are adjusted in Mult1. The corresponding input $m'$ (selected from $m$, $2m$, $3m$ and $5m$) and $e_1$ (obtained from look-up table I) are sent to Reg2. In the first iteration (2nd clock cycle), $m'$ and $e_1$ are selected by Mux1 and Mux2 respectively to compute $e_1 m'$ in Mult2. At the same time, $m'$ and 1 are chosen by Mux3 and Mux4 to obtain $1 - m'$ in the 14-digit CLA adder. Then $e_1 m'$ is shifted left by 1 digit to obtain $10 e_1 m'$, and $1 - m'$ is shifted left by 2 digits to obtain $100(1 - m')$. Finally, $W[2]$ is obtained by adding $-10 e_1 m'$ and $100(1 - m')$ in the 14-digit CLA adder. Then $W[2]$ is rounded to an integer in the Rounding logic to obtain $e_2$. At the same time, $e_1$ is chosen by Mux7 and sent to stage 2, so the value of $\log_{10}(1 + e_1 10^{-1})$ is obtained from look-up table II. This value is selected by Mux8, and the adjustment constant ($\log_{10}(2)$, $\log_{10}(3)$, $\log_{10}(5)$ or 0) is chosen by Mux9. Finally, the logarithm result $L[2]$ is obtained in the 16-digit CLA adder in stage 2. From the second to the eighth iteration (3rd to 9th clock cycle), $W[j]$ is chosen by Mux1, and $e_j$, obtained in the previous iteration, is selected by Mux2; $e_j W[j]$ is then obtained in Mult2. Meanwhile, $-e_j$ and $W[j]$ are chosen by Mux3 and Mux4 to obtain $W[j] - e_j$ in the 14-digit CLA adder.
Then, $e_j W[j]$ from Mult2 is shifted right by $(j-1)$ digits to obtain $e_j W[j] 10^{-(j-1)}$, and $W[j] - e_j$ is shifted left by 1 digit to obtain $10(W[j] - e_j)$. Finally, $W[j+1]$ is obtained by adding $e_j W[j] 10^{-(j-1)}$ and $10(W[j] - e_j)$ in the 14-digit CLA adder. Then $W[j+1]$ is rounded to an integer in the Rounding logic to obtain $e_{j+1}$. At the same time, $e_j$ is chosen by Mux7 and sent to stage 2. The values of $\log_{10}(1 + e_j 10^{-j})$ are read from look-up table II and chosen by Mux8. The logarithm result of the previous iteration is chosen by Mux9, and the two are added in the 16-digit CLA adder to obtain $L[j+1]$.

From the ninth to the fifteenth iteration (10th to 16th clock cycle), $e_{j+1}$ is obtained by the same process as in the previous iterations. However, the logarithm results are approximated by $e_j 10^{-j} / \ln(10)$ instead of being read from look-up table II. After 15 iterations, the final logarithm results are obtained. This 7-digit FXP decimal logarithmic converter takes 16 clock cycles to produce a decimal logarithm result with 14-digit accuracy.

4. A 32-bit DFP Logarithmic Converter

4.1. Architecture

The architecture of the 32-bit DFP logarithmic converter is shown in Figure 3. First of all, the 32-bit DFP number is sent to an IEEE-754 decoder, which unpacks the 32-bit DFP format into an 8-bit exponent, a 28-bit coefficient and a 2-bit signal that represents the exception cases. Second, the 8-bit binary unsigned exponent is converted to a 12-bit BCD representation by a combinational Bin-to-BCD converter, which is implemented with a shift-and-add algorithm. The Leading-Zero-Detector adjusts the 7-digit integer coefficient into the range [0.1, 1). Meanwhile, the characteristic $k$ minus the bias ($k - bias$) is added to the BCD exponent to form the integer part of the decimal logarithm result. The 7-digit adjusted coefficient is then processed by the FXP logarithmic converter, and the 16-digit logarithm is saved for the subsequent faithful rounding. If the integer field of the decimal logarithm result is positive, 1 is subtracted from it and it is combined with the 10's complement of the 16-digit FXP logarithm result to obtain a decimal logarithm result. Otherwise, the integer field is directly combined with the 16-digit FXP logarithm result to obtain an inexact logarithm result. The Shift register shifts the inexact logarithm result to obtain the exact 36-bit coefficient part, the 8-bit binary unsigned exponent field and the 1-bit sign field. The Rounding logic rounds the 36-bit coefficient to a 28-bit faithful coefficient part using the round-half-even algorithm. Finally, the 1-bit sign field, the 8-bit exponent field, the 28-bit coefficient field and the 2-bit signal for exceptional cases are coded in the IEEE-754 coder to pack a faithful 32-bit DFP logarithm result.

[Figure 3. 32-bit DFP Logarithmic Converter. Blocks shown: IEEE 754 decoder with input register (unpacking of sign, combination field, exponent continuation and coefficient continuation), Bin-to-BCD converter, Leading-Zero-Detector, fixed-point decimal log converter, 3-digit decimal CLA adder, 10's complement unit, Mux 1 and Mux 2, shift register, rounding logic, and IEEE 754 coder with output register (packing). Exception cases: "01" Infinite, "10" NaN, "00" Normal.]

We choose the DFP number $(-1)^{0} \times 9999999 \times 10^{-7}$ as an example to illustrate the data flow of the proposed 32-bit DFP logarithmic architecture. The 32-bit DFP format of this number, represented in hexadecimal, is “6DE3FCFF”. First, the IEEE-754 decoder decomposes the 32-bit DFP format into an 8-bit unsigned binary exponent “01011110” and a 28-bit coefficient “9999999” in BCD code. Second, the 8-bit unsigned binary exponent is converted to a 12-bit decimal BCD exponent “094” in the Bin-to-BCD converter; the adjusted 28-bit coefficient “0.9999999” and the 3-digit characteristic $k$ minus the bias ($k - bias$), “906”, are obtained in the Leading-Zero-Detector. Third, the integer part of the logarithm result, “000”, is obtained by adding the BCD exponent “094” to the ($k - bias$) value “906” in the 3-digit decimal CLA adder. Meanwhile, the 16-digit FXP logarithm result “0.0000000434294461” is obtained in the FXP decimal logarithmic converter. Fourth, since the integer part of the logarithm result is “000”, which is not positive, it is directly combined with the 16-digit FXP logarithm result to obtain the inexact logarithm result “000.0000000434294461”. Fifth, the exact 36-bit integer coefficient “434294461”, the 8-bit binary unsigned exponent “01011110” and the 1-bit sign ‘1’ are obtained by the Shift register. Finally, the exact 36-bit integer coefficient is rounded to the 28-bit faithful integer coefficient “4342945” in the Rounding logic, and the 32-bit DFP format of the logarithm result, “B1770ACD” in hexadecimal, is obtained in the IEEE-754 coder.

4.2. Function Verification

This section presents the function verification platform for verifying the proposed 32-bit DFP logarithmic converter. The platform is implemented in the Xilinx University Program Virtex-II Pro Development System [14] with the Embedded Development Kit (EDK). This system includes a Virtex-II Pro P30 FPGA [14]. The proposed verification method is written in C and runs on the PowerPC.
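The rounded coefficient in the worked example of Section 4.1, and the kind of benchmark comparison used for verification, can be reproduced with a small software reference. This is a hedged sketch, not the authors' C program: it uses Python's `decimal` module as a high-precision reference in place of a double-precision benchmark, the function names are hypothetical, and "faithful" is taken here to mean an error below one unit in the last place of the 7-digit result.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

HIGH = Context(prec=50)  # high-precision reference context (assumption)


def reference_log10(coefficient, exponent, digits=7):
    """log10(coefficient * 10**exponent), rounded half-even to `digits` digits."""
    if coefficient <= 0:
        raise ValueError("logarithm is only defined here for positive operands")
    exact = HIGH.log10(Decimal(f"{coefficient}E{exponent}"))
    return Context(prec=digits, rounding=ROUND_HALF_EVEN).plus(exact)


def is_faithful(candidate, coefficient, exponent, digits=7):
    """True if `candidate` is within one ulp (at `digits` digits) of the exact value."""
    exact = HIGH.log10(Decimal(f"{coefficient}E{exponent}"))
    ulp = Decimal(1).scaleb(exact.adjusted() - digits + 1)
    return abs(candidate - exact) < ulp
```

For the operand $9999999 \times 10^{-7}$, `reference_log10(9999999, -7)` has a negative sign, the coefficient 4342945 and the decimal exponent −14, which matches the rounded coefficient “4342945” and the sign bit quoted in the example.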
First, the valid DFP test vectors, coded in the 32-bit DFP format, are sent to the 32-bit DFP logarithmic converter. The logarithm results calculated by the converter are sent back to the PowerPC. Meanwhile, the same test vectors are evaluated on the PowerPC to obtain accurate double-precision BFP results as benchmarks. Finally, the logarithm results calculated by the 32-bit DFP logarithmic converter are compared with the accurate results obtained on the PowerPC. If they are not identical, the corresponding 32-bit DFP test vector is displayed on a personal computer for debugging the 32-bit DFP logarithmic converter. Verifying all the test vectors ($192 \times 10^{7}$) is impractical because of the prohibitive processing time on this verification platform, so 10,000 special cases (NaN, Infinite, Zero, Subnormal) and 100,000 random test vectors are chosen and sent to the platform. The verification results show that all the 32-bit DFP logarithm results calculated by the proposed 32-bit DFP logarithmic converter are correct.

5. Experimental Results and Analysis

5.1. Implementation Results

The proposed 32-bit DFP logarithmic converter is modeled in VHDL and implemented in a Virtex-II Pro P30 FPGA. It is synthesized with XST and placed and routed with Xilinx ISE 9.1. It occupies 1 out of 16 GCLK I/O blocks, 66 out of 644 I/O blocks, and 2,842 out of 13,696 slices. The maximum clock frequency and latency are 47.7 MHz and 18 clock cycles respectively.

Table 2. Critical Path of the Proposed 32-bit DFP Logarithmic Converter.

Block       Delay (ns)
Reg2        1.188
Mux2        1.564
Mult2       9.347
Shifter     1.438
Mux5        1.350
CLA         5.519
Rounding    0.566
Total       20.97
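The reported clock rate can be cross-checked against the delays listed in Table 2 with a few lines of arithmetic. The sketch below only restates numbers from the text; the variable and function names are illustrative, and the end-to-end latency it computes is a derived figure that assumes the converter runs at the critical-path-limited clock.

```python
# Per-block delays of the critical path, in nanoseconds, as listed in Table 2.
CRITICAL_PATH_NS = {
    "Reg2": 1.188,
    "Mux2": 1.564,
    "Mult2": 9.347,
    "Shifter": 1.438,
    "Mux5": 1.350,
    "CLA": 5.519,
    "Rounding": 0.566,
}


def total_delay_ns(path):
    """Sum of the block delays along the path."""
    return sum(path.values())


def max_frequency_mhz(delay_ns):
    """Clock frequency whose period equals the given delay."""
    return 1000.0 / delay_ns


def latency_ns(cycles, delay_ns):
    """Total latency for a given number of clock cycles at that period."""
    return cycles * delay_ns
```

The block delays add up to about 20.97 ns, which corresponds to roughly 47.7 MHz; at that period, 18 clock cycles take a little under 380 ns.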
The critical path of the proposed architecture lies in stage 1 of the FXP decimal logarithmic converter, which is highlighted in Figure 2 (dotted line); its details are given in Table 2. Furthermore, the proposed 32-bit DFP logarithmic converter is synthesized with the TSMC 0.18-um standard cell library, and the implementation results indicate that its maximum frequency and area are 107.9 MHz and 221589.66 units.

Since there is no comparable DFP logarithmic converter, we compare the proposed decimal FXP logarithmic converter with the radix-8 binary FXP logarithmic converter [11] for two cases of different precision (Case 1: 7-digit and 24-bit; Case 2: 16-digit and 53-bit), because 1) they have a similar dynamic range for the normalized coefficients ($2^{23} < 10^{7} < 2^{24}$ for case 1 and $2^{53} < 10^{16} < 2^{54}$ for case 2); 2) they are implemented by the same digit-recurrence algorithm with selection by rounding; and 3) radix 10 is close to radix 8. For the purpose of comparison, the proposed decimal FXP logarithmic converter is synthesized with a TSMC 0.18-um standard cell library [13]. The synthesis results show that the worst-case path delay and area are 8.25 ns and 145772.82 units for the 7-digit decimal FXP logarithmic converter, and 9.28 ns and 236164.33 units for the 16-digit decimal FXP logarithmic converter. Since the timing and area in [11] are expressed in units of τ and fa (1 τ = the delay of a 1-bit full adder, 1 fa = the area of a 1-bit full adder), we use the same units to represent the delay and area of the decimal FXP logarithmic converter in this paper (footnote 3).

Table 3 shows the comparison for cases 1 and 2: the proposed 7-digit architecture is 2.73 times slower and 2.51 times larger than the 24-bit radix-8 binary FXP logarithmic converter in case 1, and the proposed 16-digit architecture is 2.38 times slower and 1.44 times larger than the 53-bit radix-8 binary FXP logarithmic converter in case 2.
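The slowdown and area factors quoted above follow directly from the entries of Table 3, with latency taken as cycle time multiplied by the number of cycles. The following sketch recomputes them; the `Design` class and the constant names are hypothetical helpers introduced only for this illustration, and the numbers are copied from Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One column of Table 3 (area in fa, cycle time in tau)."""
    area_fa: int
    cycle_time_tau: int
    cycles: int

    @property
    def latency_tau(self) -> int:
        # Latency = cycle time x number of cycles, as tabulated in Table 3.
        return self.cycle_time_tau * self.cycles


# Values copied from Table 3.
DEC_7 = Design(area_fa=1630, cycle_time_tau=17, cycles=9)     # 7-digit decimal
DEC_16 = Design(area_fa=2640, cycle_time_tau=19, cycles=18)   # 16-digit decimal
BIN_24 = Design(area_fa=647, cycle_time_tau=7, cycles=8)      # 24-bit radix-8 [11]
BIN_53 = Design(area_fa=1829, cycle_time_tau=8, cycles=18)    # 53-bit radix-8 [11]


def compare(decimal: Design, binary: Design) -> tuple:
    """Return (latency ratio, area ratio) of the decimal over the binary design."""
    return (decimal.latency_tau / binary.latency_tau,
            decimal.area_fa / binary.area_fa)


if __name__ == "__main__":
    for label, dec, bin_ in (("case 1", DEC_7, BIN_24), ("case 2", DEC_16, BIN_53)):
        slower, larger = compare(dec, bin_)
        print(f"{label}: {slower:.3f}x latency, {larger:.3f}x area")
```

The latency ratios come out at about 2.73 and 2.38 and the case 2 area ratio at about 1.44, matching the text. The case 1 area ratio computed from the table entries is about 2.52, marginally above the quoted 2.51.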
The decimal converter is slower and larger for two reasons: 1) numbers encoded in BCD, as used in the proposed architecture, are less efficient than the binary numbers in the radix-8 binary FXP logarithmic converter and need more resources to implement; 2) the latency of the decimal arithmetic blocks, such as the decimal CLA adder and the Multiple logic in Figure 2, is larger than that of the signed-digit (SD) binary adder and the SD Multiple logic in the architecture of the radix-8 binary FXP logarithmic converter.

Footnote 3: Note that τ and fa are the delay and area of a 1-bit full adder (ADFULD4) in the TSMC 0.18-um standard cell library [13].

Table 3. Hardware Performance Comparison.

                   Radix-10 Decimal Log Converter   Radix-8 Binary Log Converter [11]
Precision          7-digit      16-digit            24-bit      53-bit
Area               1630 fa      2640 fa             647 fa      1829 fa
Cycle time         17 τ         19 τ                7 τ         8 τ
Number of cycles   9            18                  8           18
Latency            153 τ        342 τ               56 τ        144 τ

5.2. Scale to Decimal64 and Decimal128

Note that while decimal32 is only a storage format in the IEEE 754-2008 standard, decimal64 and decimal128 are the more accurate formats for decimal calculation. To explain how the proposed 32-bit DFP logarithmic converter scales to 64-bit and 128-bit converters compliant with the decimal64 and decimal128 formats, we mainly discuss the transformation of the core part, the decimal FXP logarithmic converter. The 7-digit coefficient field of the decimal32 format is extended to 16 digits and 34 digits in the decimal64 and decimal128 formats respectively, so the decimal FXP logarithmic converter must produce results accurate to 32 digits and 68 digits in order to guarantee faithful rounding of the 64-bit and 128-bit DFP logarithm results. The main alterations of the decimal FXP logarithmic converter for decimal64 and decimal128 are:

1) The digit width of Mult1 in stage 1 of the decimal FXP logarithmic converter (refer to Figure 2) needs to be extended to 16 digits and 34 digits.

2) At least 32-digit and 68-digit precision must be kept for $W[j]$ in order to obtain the correct $e_j$ during the 33 and 69 iterations; therefore the digit width of the CLA adder, Mult2 and the other blocks in stage 1 of the decimal FXP logarithmic converter needs to be extended to 32 digits and 68 digits.

3) The decimal FXP logarithm results are obtained by accumulating the values of $-\log_{10}(1 + e_j 10^{-j})$ in iterations $j \le k$ and $-e_j 10^{-j} / \ln(10)$ in iterations $j > k$, where $k = 17$ for 32-digit accurate results and $k = 35$ for 68-digit accurate results (footnote 4). The digit width of the CLA adder, Mult3 and the other blocks in stage 2 of the decimal FXP logarithmic converter needs to be extended to 34 digits and 70 digits.

4) Look-up table I, where the values of $e_1$ are stored, remains the same. However, look-up table II needs to store the 34-digit and 70-digit values of $\log_{10}(1 + e_j 10^{-j})$ for $1 \le j \le 17$ and $1 \le j \le 35$ in order to achieve 32-digit and 68-digit accurate logarithm results. Furthermore, the 34-digit and 68-digit adjustment constants $\log_{10}(2)$, $\log_{10}(3)$ and $\log_{10}(5)$ are stored in this table too. The size of look-up table II in the decimal FXP logarithmic converter needs to be extended to $2^{9} \times 136$ and $2^{10} \times 280$ for the decimal64 and decimal128 formats respectively.

6. Conclusions

In this paper, we first present the 32-bit DFP format and its related logarithm operation. Second, we develop a decimal digit-recurrence algorithm with selection by rounding to achieve the radix-10 fixed-point (FXP) logarithm operation. Third, we construct the architecture of the 32-bit DFP logarithmic converter, which is implemented and verified on an FPGA.
Finally, we analyze the implementation results of the proposed architecture and compare the proposed decimal FXP logarithmic converter with a radix-8 binary FXP logarithmic converter for two cases. The comparison shows that the decimal FXP logarithmic converter is slower and occupies more area than the binary FXP logarithmic converter. The presented architecture, however, can be optimized to achieve a higher speed or occupy a smaller area.

Footnote 4: Note that $e_j^{2} 10^{-2j} / (2\ln(10)) < 10^{-34}$ for $j \ge 17$ in decimal64, and $e_j^{2} 10^{-2j} / (2\ln(10)) < 10^{-70}$ for $j \ge 35$ in decimal128.

References

[1] IEEE Standard 754-2008. IEEE Standard for Floating-Point Arithmetic. IEEE Computer Society, Aug 2008.
[2] M. F. Cowlishaw. Densely Packed Decimal Encoding. IEEE Computers and Digital Techniques, pp. 102-104, May 2002.
[3] M. F. Cowlishaw. Decimal Floating-Point: Algorism for Computers. IEEE Symp. on Computer Arithmetic, pp. 104-111, Jun 2003.
[4] A. Y. Duale, M. H. Decker, H.-G. Zipperer, M. Aharoni, and T. J. Bohizic. Decimal Floating-Point in z9: An Implementation and Testing Perspective. J. IBM Res. and Dev., Jan 2007.
[5] L. Eisen, J. W. W. III, H.-W. Tast, N. Mading, J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, and S. R. Carlough. IBM POWER6 Accelerators: VMX and DFU. J. IBM Res. and Dev., Nov 2007.
[6] M. D. Ercegovac, T. Lang, and P. Montuschi. Very High-Radix Division with Selection by Rounding and Prescaling. IEEE Trans. on Computers, pp. 909-918, May 1994.
[7] L. Imbert, J. Muller, and F. Rico. A Radix-10 BKM Algorithm for Computing Transcendentals on Pocket Computers. J. VLSI Signal Processing, pp. 179-186, Jun 2000.
[8] T. Lang and P. Montuschi. Very-High Radix Square Root with Prescaling and Rounding and a Combined Division/Square Root Unit. IEEE Trans. on Computers, pp. 827-841, May 1999.
[9] T. Lang and A. Nannarelli. A Radix-10 Combinational Multiplier. IEEE Asilomar Conference on Signals, Systems and Computers, pp. 313-317, Oct 2006.
[10] J. Muller. Elementary Functions, Algorithms and Implementation. Birkhauser.
[11] A. Piñeiro, M. D. Ercegovac, and J. D. Bruguera. High-Radix Logarithm with Selection by Rounding: Algorithm and Implementation. J. VLSI Signal Processing, pp. 109-123, May 2005.
[12] E. M. Schwarz, J. S. Kapernick, and M. F. Cowlishaw. Decimal Floating-Point Support on the IBM System z10 Processor. J. IBM Res. and Dev., Jan 2009.
[13] Virtual Silicon Technology Inc. Native-18 Standard Cell Library 0.18V TSMC Process, Sep 1999.
[14] Xilinx Inc. Xilinx University Program Virtex-II Pro Development System, Hardware Reference Manual.
[15] Y. You, Y. Kim, and J. Choi. Dynamic Decimal Adder Circuit Design by using the Carry Lookahead. IEEE Design and Diagnostics of Electronic Circuits and Systems, pp. 242-244, Apr 2006.

Authorized licensed use limited to: University of Saskatchewan. Downloaded on January 27, 2010 at 02:06 from IEEE Xplore. Restrictions apply.
