A 32-bit Decimal Floating-Point Logarithmic Converter

Dongdong Chen1 , Yu Zhang1 , Younhee Choi1 , Moon Ho Lee2 , Seok-Bum Ko1,2
Department of Electrical and Computer Engineering, University of Saskatchewan1
Campus Drive 57, Saskatoon, SK, Canada, seokbum.ko@usask.ca
Institute of Information and Communication, Chonbuk National University2
Jeonju 561-756, Korea, moonho@chonbuk.ac.kr
Abstract
This paper presents a new design and implementation of
a 32-bit decimal floating-point (DFP) logarithmic converter
based on the digit-recurrence algorithm. The converter
can calculate accurate logarithms of 32-bit DFP numbers
which are defined in the IEEE 754-2008 standard. The redundant digit e1 is obtained from a look-up table in the first iteration, and the remaining redundant digits ej are selected by rounding the scaled remainder during the succeeding iterations.
The sequential architecture of the proposed 32-bit DFP logarithmic converter is implemented on a Xilinx Virtex-II Pro
P30 FPGA device and then synthesized with the TSMC 0.18-um
standard cell library. The implementation results indicate
that the maximum frequency of the proposed architecture is
47.7 MHz on the FPGA and 107.9 MHz in TSMC 0.18-um technology. Faithful 32-bit DFP logarithm results are
obtained in 18 cycles.
Keywords: Decimal Logarithmic Converter, Decimal
Floating-Point, Digit-Recurrence Algorithm, Selection by
Rounding.
1. Introduction
Nowadays, there are many commercial demands for DFP
arithmetic operations in areas such as financial analysis, tax calculation, phone billing, currency conversion, Internet-based applications, and e-commerce [3]. This trend gives rise to further development of DFP arithmetic units, which can perform
more accurate decimal calculations than binary floating-point (BFP) arithmetic
units. Due to the significance of DFP arithmetic, the IEEE
754-2008 standard for floating-point arithmetic [1] includes
it in its specifications. The decimal arithmetic unit, as a main
part of a decimal processor, is attracting more and more
researchers’ attention. Decimal-encoded formats and
arithmetic have been implemented in IBM’s POWER6 [5],
System z9 [4] and z10 processors [12].
The logarithm operation, as one of the elementary functions, is a useful arithmetic concept in many areas of science
and engineering. Some applications, such as the logarithmic
number system (LNS) and digital signal processing, are implemented by using a logarithmic unit to replace normal computer arithmetic. Moreover, the decimal logarithm
operation is defined as a decimal arithmetic operation in
the new IEEE 754-2008 standard [1]. Building on the improvement of basic decimal arithmetic units, more complex
DFP elementary operations such as logarithm, exponential,
and trigonometric functions would be the next useful building blocks.
Muller [10] presents both software-oriented and hardware-oriented algorithms to compute elementary functions.
While most elementary functions are implemented by
software-oriented methods due to their advantage of using
large look-up tables and providing more accurate results,
these methods are usually too slow for numerically intensive and real-time applications. Hardware-oriented methods with high-speed solutions have been developed as an
alternative. A CORDIC-like BKM algorithm is presented
in [7] for fast computation of complex exponentials and
logarithms. The digit-recurrence algorithm is another hardware-oriented method, and it is an interesting alternative due to its low
area requirements, especially for high-precision computations. Selection by rounding was introduced for high-radix binary division and square root in [6],[8] and for logarithm
in [11]. This method can efficiently decrease the cost of
implementation and, in particular, the complexity of the
selection function for the digits. In this paper, a radix-10
digit-recurrence algorithm with selection by rounding
is proposed to implement a 32-bit DFP logarithmic converter.
This paper is organized as follows: In Section 2, the basic DFP standard and the 32-bit DFP logarithmic calculation are
described. Section 3 presents a radix-10 fixed-point (FXP)
logarithm operation based on the digit-recurrence algorithm with selection by rounding, and the related architecture is constructed. In Section 4, the architecture of the proposed 32-bit DFP logarithmic converter is presented with an example
Authorized licensed use limited to: University of Saskatchewan. Downloaded on January 27, 2010 at 02:06 from IEEE Xplore. Restrictions apply.
and then verified by a function verification platform. In Section 5, first, the implementation results of the proposed converter are analyzed; then, we compare the hardware performance of decimal logarithmic converter with a binary logarithmic converter [11]; finally we analyze how we can scale
the proposed 32-bit DFP logarithmic converter to a 64-bit
or 128-bit converter. Section 6 gives conclusions.
2. A 32-bit DFP Logarithm
2.1. DFP Standard
The IEEE 754-2008 standard specifies three interchange
DFP formats: decimal32, decimal64 and decimal128 encoded in 32, 64 and 128 bits respectively. Figure 1 shows
the basic DFP format specified in IEEE 754-2008.
[Figure 1. DFP Number Format. Fields, from most to least significant: Sign S (1 bit); Combination Field G (5 bits); Exponent Continuation E (w bits, 6 bits in decimal32); Coefficient Continuation C (j bits encoding 3j/10 digits, 20 bits / 6 digits in decimal32).]
The sign is a 1-bit field and indicates the sign of the number in the same way as in BFP numbers. The combination field
is a 5-bit field that encodes the two most significant bits (MSBs)
of the exponent and the most significant digit (MSD) of the
coefficient. Not-a-Number (NaN) and infinity
(Inf) are also indicated in the combination field. The exponent
field (w+2 bits) is formed by appending the w-bit
exponent continuation as a suffix to the 2 MSBs derived
from the combination field. The whole encoded exponent is
an unsigned binary integer with the largest unsigned value.
The value of the exponent is calculated by subtracting an
exponent bias from the value of the encoded exponent, so that
both negative and positive exponents can be represented.
The coefficient field (j+4 bits) is formed by appending the
decoded continuation digits (j bits) as a suffix to the most
significant digit (MSD) derived from the combination field.
The j-bit coefficient continuation is a multiple of 10 bits, and
the most significant group is on the left. Each 10-bit group
represents three decimal digits using Densely Packed Decimal (DPD) encoding [2] and can be decoded to a 12-bit
binary-coded decimal (BCD) representation. The total number of coefficient digits is q = 3j/10 + 1¹.
In the IEEE 754-2008 DFP standard, the value of the coefficient is a non-normalized unsigned decimal fraction in the
form d0.d1d2...d6, 0 ≤ di < 10. In decimal computer
arithmetic, the coefficient is usually represented as an integer. The value of a 32-bit DFP number, compliant with the
decimal32 format, is represented as:
$$X = (-1)^{s} \times 10^{e} \times \mathit{coefficient} \tag{1}$$
In (1), e is in the range (Emin − 6) ≤ e ≤ (Emax − 6)
(Emax = 96, Emin = −95) and the coefficient field is represented as an integer. If the absolute value of a DFP number
is larger than the largest DFP number (|Xmax| =
9999999 × 10^90), overflow occurs. Similarly, if it is
less than the smallest 32-bit DFP number (|Xmin| = 10^−101),
underflow occurs. When the absolute value of a DFP
number is less than 1000000 × 10^−101 and larger than
0000001 × 10^−101, the number is subnormal.
¹ Note that w = 6, 8 and 12; j = 20, 50 and 110; exponent bias = 101,
398 and 6176; q = 7, 16, and 34 respectively in the decimal32, decimal64 and
decimal128 formats.
2.2. Calculation of 32-bit DFP Logarithm
A valid 32-bit DFP logarithmic calculation is defined as:
$$R = \log_{10}(X) = \log_{10}(10^{e}) + \log_{10}(\mathit{coefficient}) \tag{2}$$
In (2), the exponent is in the range −101 ≤ e ≤ 90,
and the coefficient is a q-digit (q = 7) non-normalized integer in the range 0000001 ≤ coefficient ≤ 9999999.
Some exceptional cases need to be dealt with during a 32-bit DFP logarithmic calculation. First of all, X
must be a positive floating-point number (S = 0); otherwise
the logarithmic converter simply returns NaN. Moreover, if
X is NaN or zero, the logarithmic converter returns NaN; if X is infinite, the logarithmic converter
returns Inf. The inexact logarithm results need to
be rounded and normalized to q-digit logarithm results. Since the maximum and minimum logarithm results
are log10(|Xmax|) = 96.99999 and log10(|Xmin|) = −101
respectively, subnormal results, overflow and underflow will
not be produced during logarithmic calculations.
The calculation of log10(coefficient) is a 7-digit FXP
decimal logarithm operation. Since the coefficient of a DFP number is defined as a non-normalized integer, it should
be adjusted into the range [0.1, 1) before the calculation.
Therefore, (3) is obtained:
$$R = \log_{10}(X) = e + k + \log_{10}(m) \tag{3}$$
Here k is the characteristic of the logarithm and can be easily
obtained with a leading-zero detector (LZD), 1 ≤ k ≤ 7; m
is a decimal fraction that consists of all q digits (q = 7) of
the coefficient part of the 32-bit DFP number, 0.1 ≤ m < 1.
Since the target is a 32-bit DFP calculation, the 7-digit FXP
logarithm calculation should achieve enough accuracy to guarantee faithful rounding. A straightforward approach is
to guarantee a precision of at least 2q digits (14 digits), so that rounding remains valid after a left
shift of up to q digits (7 digits).
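The decomposition in (3) can be illustrated with a short software sketch. It is not part of the proposed hardware: the function names are hypothetical, Python's `decimal` module stands in for the FXP logarithm unit, and k is obtained by counting significant digits rather than with a leading-zero detector.

```python
from decimal import Decimal, getcontext

getcontext().prec = 30  # working precision of this software sketch only

def split_operand(coefficient: int, e: int, q: int = 7):
    """Return (e, k, m) with coefficient = m * 10**k and 0.1 <= m < 1."""
    if not 0 < coefficient < 10**q:
        raise ValueError("coefficient must be a positive integer of at most q digits")
    k = len(str(coefficient))              # characteristic, 1 <= k <= q
    m = Decimal(coefficient).scaleb(-k)    # decimal fraction in [0.1, 1)
    return e, k, m

def log10_dfp(coefficient: int, e: int) -> Decimal:
    """Evaluate R = e + k + log10(m), following equation (3)."""
    e, k, m = split_operand(coefficient, e)
    return Decimal(e + k) + m.log10()

# Smallest and largest positive decimal32 values discussed in Section 2.1
r_min = log10_dfp(1, -101)
r_max = log10_dfp(9999999, 90)
```

For the smallest operand the sketch gives k = 1 and m = 0.1, so R = −101; for the largest it gives a value just below 97, in line with the bounds quoted above.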
3. Decimal FXP Logarithmic Converter
3.1. Overview of Algorithm
A digit-recurrence algorithm to calculate log10(m) is
summarized as follows (0.1 ≤ m < 1).
$$\log_{10}(m) = \log_{10}\Bigl(m \prod_{j} f_j\Bigr) - \sum_{j} \log_{10}(f_j) \tag{4}$$
If the following condition is satisfied:
$$\lim_{j \to \infty} \Bigl\{ m \prod_{j} f_j \Bigr\} \to 1 \tag{5}$$
Then
$$\lim_{j \to \infty} \Bigl\{ \log_{10}\Bigl(m \prod_{j} f_j\Bigr) \Bigr\} \to 0 \tag{6}$$
Finally
$$\log_{10}(m) = 0 - \sum_{j=1}^{\infty} \log_{10}(f_j) \tag{7}$$
fj is defined as $f_j = 1 + e_j 10^{-j}$, by which m is transformed
to 1 through successive multiplication. This form of fj allows
the use of a shift-and-add implementation.
The corresponding recurrences for transforming m and
computing the logarithm are presented in (8) and (9), where
j ≥ 1, E[1] = m and L[1] = 0.
$$E[j+1] = E[j]\,(1 + e_j 10^{-j}) \tag{8}$$
$$L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j}) \tag{9}$$
The digits ej are selected so that E[j + 1] converges to 1;
one digit of accuracy of the result is therefore obtained in each iteration. After performing the last iteration
of the recurrence, the results are:
$$E[N+1] \approx 1 \tag{10}$$
$$L[N+1] \approx \log_{10}(m) \tag{11}$$
To obtain the selection function for ej, a scaled remainder is
defined as:
$$W[j] = 10^{j}\,(1 - E[j]) \tag{12}$$
Thus,
$$E[j] = 1 - W[j]\,10^{-j} \tag{13}$$
Substituting (13) in (8) yields
$$W[j+1] = 10\,(W[j] - e_j + e_j W[j]\,10^{-j}) \tag{14}$$
According to (14), the digits ej are selected as a function of the
leading digits of the scaled residual in such a way that the residual
W[j] remains bounded.
3.2. Selection by Rounding
The redundant digits are selected by
rounding the residual to its integer part, as indicated in (15),
where ej ∈ {−9, −8, −7, ..., 0, ..., 7, 8, 9}.
$$e_j = \mathrm{round}(W[j]) \tag{15}$$
Then (16) is obtained:
$$-0.5 \le W[j] - e_j \le 0.5 \tag{16}$$
Since |ej+1| ≤ 9,
$$-9.5 < W[j+1] < 9.5 \tag{17}$$
Equation (14) can be written as:
$$W[j+1] = 10\,(W[j] - e_j) + e_j 10^{1-j}\,(W[j] - e_j + e_j) \tag{18}$$
According to (16), (17) and (18), the numerical analysis
proceeds as follows:
$$-0.5 \times 10 + e_j 10^{1-j}\,(-0.5 + e_j) > -9.5 \tag{19}$$
$$0.5 \times 10 + e_j 10^{1-j}\,(0.5 + e_j) < 9.5 \tag{20}$$
The numerical analysis shows that conditions (19) and (20) are satisfied for all digits ej if and only if j ≥ 3.
Selection by rounding is therefore only valid for iterations j ≥ 3, and
e1 and e2 could only be obtained from look-up tables. However, using two look-up tables for j = 1, 2 would significantly
increase the overall hardware cost. Therefore,
a restriction on e1 is defined so that e2 can be obtained by
selection by rounding and one look-up table is saved.
Writing m′ for the (adjusted) input of the recurrence, W[1] = 10(1 − m′) by (12), and W[2] is obtained as:
$$W[2] = 100 - 100\,m' - 10\,e_1 m' \tag{21}$$
When j = 2, e2 must be in the
range −7 < e2 < 7 for (19) and (20) to be satisfied.
Substituting −7 < e2 < 7 into (16) gives (22):
$$-6.5 < W[2] < 6.5 \tag{22}$$
From (21) and (22), we conclude that e2 can be obtained by selection by rounding when the input
FXP decimal number is in the range 0.5 ≤ m′ ≤ 1. The look-up
table for the selection of e1 is shown in Table 1. Because m
is in the range 0.1 ≤ m < 1, an input number in the
range 0.1 ≤ m < 0.5 needs to be adjusted by multiplying it
by 2, 3 or 5. The adjusted numbers m′, which are
in the range 0.5 ≤ m′ ≤ 1, are then processed with selection
by rounding. Finally, the logarithm results log10(m′) are
adjusted by subtracting the constant (log10(2), log10(3) or
log10(5)) to obtain the final logarithm results log10(m).
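The numerical analysis of (19) and (20) can be reproduced with a brute-force check over all digits. The sketch below is only an illustration with a hypothetical helper name; it evaluates the two inequalities exactly as stated and reports the iterations for which every digit in {−9, ..., 9} is admissible, as well as the digits admissible at j = 2.

```python
def bounds_hold(j: int, e: int) -> bool:
    """Evaluate conditions (19) and (20) for iteration j and digit e."""
    scale = 10.0 ** (1 - j)
    lower = -0.5 * 10 + e * scale * (-0.5 + e)   # left-hand side of (19)
    upper = 0.5 * 10 + e * scale * (0.5 + e)     # left-hand side of (20)
    return lower > -9.5 and upper < 9.5

DIGITS = range(-9, 10)

# Iterations in which every redundant digit satisfies both conditions
valid_iterations = [j for j in range(1, 8) if all(bounds_hold(j, e) for e in DIGITS)]

# Digits that satisfy both conditions in the second iteration
admissible_e2 = [e for e in DIGITS if bounds_hold(2, e)]
```

Under these inequalities the first fully valid iteration is j = 3, and the digits admissible at j = 2 are −6 to 6, which matches the restriction −7 < e2 < 7 used to derive (22).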
Table 1. Look-up Table for e1 Selection
Range of m′      e1 (BCD)
[0.96, 1.00)     0 (0000)
[0.88, 0.95]     1 (0001)
[0.81, 0.87]     2 (0010)
[0.76, 0.80]     3 (0011)
[0.70, 0.75]     4 (0100)
[0.66, 0.69]     5 (0101)
[0.62, 0.65]     6 (0110)
[0.59, 0.61]     7 (0111)
[0.56, 0.58]     8 (1000)
[0.50, 0.55]     9 (1001)
3.3. Approximation of Logarithm
The logarithm result is obtained by accumulating the
values of $-\log_{10}(1 + e_j 10^{-j})$ in each iteration. These values
are stored in a second look-up table,
table II. As the number of iterations increases, however, the
size of the table becomes prohibitively large. Therefore, a method for reducing the size of the table, which can
achieve a significant reduction in the overall hardware requirement, is necessary. A series expansion of the logarithm
function log10(1 + x) is given in (23):
$$\log_{10}(1 + x) = \Bigl(x - \frac{x^{2}}{2} + \dots\Bigr) / \ln(10) \tag{23}$$
After iteration j = k, the values of $\log_{10}(1 + e_j 10^{-j})$ can
be approximated by $e_j 10^{-j} / \ln(10)$. Since 14-digit accuracy needs to be guaranteed in this study, the series approximation can be used in the iterations in which the constraint
$x^{2} / (2\ln(10)) < 10^{-16}$ is met, where $x = e_j 10^{-j}$:
$$\frac{e_j^{2}\,10^{-2j}}{2\ln(10)} < 10^{-16} \tag{24}$$
The numerical analysis of (24) shows that after k = 8 iterations the values of $-\log_{10}(1 + e_j 10^{-j})$
do not need to be stored in the table; the values of
$-e_j 10^{-j} / \ln(10)$ are used as an approximation instead.
3.4. Algorithm Summary
• First iteration (j = 1): e1 is obtained from look-up table
I under the restriction 0.5 ≤ m′ ≤ 1; numbers
in the range 0.1 ≤ m < 0.5 need to be adjusted first.
• Iterations j = 2 to j = 15: convergence is achieved with
selection by rounding, and the redundant digits ej are
obtained.
• Iterations j ≤ 8: the logarithms are obtained from
look-up table II, in which the values of $-\log_{10}(1 + e_j 10^{-j})$
are stored. In the iterations j > 8, the logarithm results are approximated by $-e_j 10^{-j} / \ln(10)$.
• The values of $-\log_{10}(1 + e_j 10^{-j})$ in iterations j ≤ 8 and
$-e_j 10^{-j} / \ln(10)$ in iterations j > 8 are accumulated to
obtain log10(m′), which is adjusted by subtracting the
constant (0, log10(2), log10(3) or log10(5)) to obtain an
FXP decimal logarithm result with 14-digit accuracy.
3.5. Error Analysis and Evaluation
The errors in the proposed algorithm are produced in
four ways. The first is the inherent error of the algorithm,
εi, which results from the difference between the logarithm results obtained from a finite number of iterations and the exact results obtained from infinitely many iterations. The second is the approximation error, εa, produced by approximating the values
of $-\log_{10}(1 + e_j 10^{-j})$ with the values of $-e_j 10^{-j} / \ln(10)$.
The third is the quantization error, εq, which results from the finite precision of the intermediate values in the hardware-oriented algorithm. The fourth is the final output rounding
error, εr, whose maximum value is 1/2 unit in the last place
(ulp). In order to obtain a logarithm result with 14-digit accuracy, the following condition must be satisfied:
$$E_{absolute} = \varepsilon_i + \varepsilon_a + \varepsilon_q + \varepsilon_r \le 10^{-14} \tag{25}$$
3.5.1 Inherent Error of Algorithm
Since each FXP decimal logarithm result is obtained after
the 15th iteration, εi can be defined as:
$$\varepsilon_i = -\sum_{j=16}^{\infty} \log_{10}(1 + e_j 10^{-j}) \tag{26}$$
In order to use the static error analysis method, we choose
the worst cases (ej = 9 or −9) to analyze the maximum εi:
$$\varepsilon_i = -\sum_{j=16}^{\infty} \log_{10}(1 \pm 9 \times 10^{-j}) \tag{27}$$
According to (27), the maximum εi is in the range:
$$-4.34 \times 10^{-16} \le \varepsilon_i \le 4.34 \times 10^{-16} \tag{28}$$
3.5.2 Approximation Error
We use the approximate value $e_j 10^{-j} / \ln(10)$ to estimate
$\log_{10}(1 + e_j 10^{-j})$ from the 9th to the 15th iteration. According to the series expansion of the logarithm function in (23), this
approach produces an approximation error, εa:
$$\varepsilon_a = \sum_{j=9}^{15} \Bigl(-\frac{(e_j 10^{-j})^{2}}{2} + \frac{(e_j 10^{-j})^{3}}{3} - \dots\Bigr) / \ln(10) \tag{29}$$
Since
$$\sum_{j=9}^{15} \Bigl(\frac{(e_j 10^{-j})^{3}}{3} - \dots\Bigr) / \ln(10) \ll 10^{-16} \tag{30}$$
we keep $-(e_j 10^{-j})^{2} / (2\ln(10))$ to analyze εa:
$$\varepsilon_a \approx \sum_{j=9}^{15} \Bigl(-\frac{(e_j 10^{-j})^{2}}{2}\Bigr) / \ln(10) \tag{31}$$
Considering the worst cases (ej = 9 or −9) in (31), we obtain the maximum εa:
$$\varepsilon_a \le 1.78 \times 10^{-17} \tag{32}$$
3.5.3 Quantization Error
Since only intermediate values with finite precision are operated on in the hardware-oriented algorithm, three
quantization errors occur. First, the logarithm results are
obtained by accumulating the 16-digit rounded values of
$-\log_{10}(1 + e_j 10^{-j})$ from the 1st to the 8th iteration. In each
iteration, the maximum rounding error of $-\log_{10}(1 + e_j 10^{-j})$
is $0.5 \times 10^{-16}$, therefore the maximum εq1 is:
$$\varepsilon_{q1} \le \sum_{j=1}^{8} 0.5 \times 10^{-16} = 4 \times 10^{-16} \tag{33}$$
Second, the logarithm results are obtained by accumulating
the 16-digit values of $-e_j 10^{-j} / \ln(10)$ from the
9th to the 15th iteration. Since the maximum quantization
error of the value 1/ln(10) is $0.5 \times 10^{-14}$, when ej = 9 or
−9 the maximum $\varepsilon^{1}_{q2}$ is:
$$\varepsilon^{1}_{q2} \le \sum_{j=9}^{15} \pm 9 \times 10^{-j} \times 0.5 \times 10^{-14} \ll 10^{-16} \tag{34}$$
Another quantization error, $\varepsilon^{2}_{q2}$, is produced by truncating the value of $-e_j 10^{-j} / \ln(10)$ to a finite 16-digit precision. In each
iteration, the maximum truncation error of $-e_j 10^{-j} / \ln(10)$
is $1 \times 10^{-16}$, therefore the maximum $\varepsilon^{2}_{q2}$ is:
$$\varepsilon^{2}_{q2} \le \sum_{j=9}^{15} 1 \times 10^{-16} = 7 \times 10^{-16} \tag{35}$$
Third, the logarithm result log10(m′) is adjusted by a finite 16-digit rounded constant (0, log10(2), log10(3) or
log10(5)) in the last iteration, so a quantization error, εq3,
occurs. The maximum εq3 is:
$$\varepsilon_{q3} \le 0.5 \times 10^{-16} \tag{36}$$
Therefore, the maximum quantization error, εq, is:
$$\varepsilon_q \le \varepsilon_{q1} + \varepsilon^{1}_{q2} + \varepsilon^{2}_{q2} + \varepsilon_{q3} \approx 1.15 \times 10^{-15} \tag{37}$$
3.5.4 Error Evaluation
Since the final logarithm result has 14-digit accuracy, the
maximum final rounding error is 1/2 ulp, $\varepsilon_r = 0.5 \times 10^{-14}$.
With εi, εa and εq from (28), (32) and (37) respectively,
$$E_{absolute} = \varepsilon_i + \varepsilon_a + \varepsilon_q + \varepsilon_r = 0.660 \times 10^{-14} \tag{38}$$
Eabsolute satisfies condition (25), so the proposed algorithm can guarantee faithful rounding for 14-digit precision
decimal logarithm results. Moreover, a MATLAB simulation model that is completely consistent with the hardware implementation of the proposed 7-digit FXP logarithmic converter was set up. The MATLAB simulation model
shows that at least 14-digit precision must be kept
for W[j] to obtain the correct ej during the 15 iterations. Furthermore, 10,000 7-digit decimal operands close to
1.0 and 100,000 random decimal operands in the range
[0.1, 1) were simulated as test vectors in the MATLAB
model. All the logarithm results obtained from the simulation
model achieve 14-digit accuracy.
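The worst-case error budget of Section 3.5 can be re-evaluated numerically. The sketch below is an independent check in double-precision Python, not the authors' MATLAB model; the variable names are chosen here for illustration, and the infinite sum in (27) is truncated at j = 39, which is an assumption that leaves the result unchanged at the precision shown.

```python
import math

LN10 = math.log(10)

# Constraint (24): first iteration in which the worst-case digit |e_j| = 9
# allows the series approximation
first_series_iteration = next(
    j for j in range(1, 16) if (9 * 10.0**-j) ** 2 / (2 * LN10) < 1e-16
)

# (27)/(28): inherent error, worst case e_j = -9 (log1p keeps precision near 1)
eps_i = sum(-math.log1p(-9 * 10.0**-j) / LN10 for j in range(16, 40))

# (31)/(32): approximation error magnitude, worst case |e_j| = 9
eps_a = sum((9 * 10.0**-j) ** 2 / (2 * LN10) for j in range(9, 16))

# (33) to (37): quantization error contributions
eps_q1 = 8 * 0.5e-16
eps_q2_const = sum(9 * 10.0**-j * 0.5e-14 for j in range(9, 16))
eps_q2_trunc = 7 * 1e-16
eps_q3 = 0.5e-16
eps_q = eps_q1 + eps_q2_const + eps_q2_trunc + eps_q3

eps_r = 0.5e-14  # final rounding error, 1/2 ulp at 14 digits

e_absolute = eps_i + eps_a + eps_q + eps_r  # compare with (38)
```

With these inputs the first iteration that satisfies (24) is j = 9, consistent with k = 8, and the total stays below the 10^−14 limit of (25).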
3.6. Architecture
Figure 2 shows a sequential architecture of the proposed
7-digit FXP decimal logarithmic converter. The hardware
implementation of this logarithmic converter includes two
stages. Stage 1 in Figure 2 obtains ej by
selection by rounding. After ej is obtained, the logarithm
results are produced in stage 2. Finally, for input
decimal numbers in the range 0.1 ≤ m < 0.5, the
corresponding logarithm results are adjusted.
3.6.1 Main Features of Architecture
All variables in this architecture are represented in the 10’s
complement number system. Each digit of a positive FXP
decimal number is represented by a 4-bit BCD code, whereas
a negative number is represented in
10’s complement format. The reason for choosing the 10’s complement format is the same as for the binary 2’s complement format:
all digits, including the sign digit, participate in add and subtract operations. Moreover, a decimal subtraction can be replaced by a decimal addition in 10’s complement format.
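The replacement of subtraction by addition in 10’s complement format can be sketched on plain integers. The helper names below are hypothetical and the sketch ignores the BCD encoding of the individual digits; it only shows the arithmetic identity on an n-digit word.

```python
def tens_complement(value: int, digits: int) -> int:
    """10's complement of a non-negative value on a fixed number of decimal digits."""
    modulus = 10 ** digits
    return (modulus - value) % modulus

def subtract_by_adding(a: int, b: int, digits: int) -> int:
    """Compute a - b on an n-digit word using only addition; the carry out is dropped."""
    return (a + tens_complement(b, digits)) % 10 ** digits

# 3-digit examples: -94 is held as 906, and 094 + 906 wraps around to 000
minus_94 = tens_complement(94, 3)
difference = subtract_by_adding(94, 94, 3)
```

A result whose leading digit is 5 or more is read as negative in this convention, in the same way the sign bit is read in 2’s complement.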
The architecture of this logarithmic converter includes two
look-up tables. Look-up table I is a
24×4 ROM in which the values of e1 are stored as shown in
Table 1. Look-up table II² stores all the 16-digit values
of log10(1 + ej 10^−j) needed to obtain the logarithm results; here
j is in the range 1 ≤ j ≤ 8 because the logarithm results can be obtained from the series-expansion approximation of the logarithm function after 8 iterations. Furthermore,
the 16-digit adjustment parameters log10(2), log10(3) and
log10(5) are stored in this table. The size of look-up table
II is 28 × 64.
² Note that the proposed architecture can be transformed into a decimal
base-e logarithmic converter by storing the values of ln(1 + ej 10^−j),
ln(5), ln(3) and ln(2) in look-up table II.
[Figure 2. Architecture of FXP Decimal Logarithmic Converter. Blocks shown: Reg 1 to Reg 6, Mult1 (m, 2m, 3m, 5m), Mult2, Mult3 (1/ln(10)), Table I, Table II, Mux 1 to Mux 9, shifters (×10^−j, ×10, ×100), 9's complement units, two 14-digit decimal CLA adders, a 16-digit decimal CLA adder, a detector, rounding logic and the adjusted constants (0, log10(5), log10(2), log10(3)), grouped into Stage 1 and Stage 2 with the critical path marked.]
Mult1, Mult2 and Mult3 in the architecture are multiple-generation logic blocks that produce multiples of their inputs. Mult1
produces m, 2m, 3m and 5m; Mult2 and Mult3
produce ej m′ and ej / ln(10). Here ej is a
value in the range −9 ≤ ej ≤ 9, so the multiple logic
produces the results from −9m′ to 9m′. In this paper,
Mult1, Mult2 and Mult3 are implemented based on the
partial product generation logic described in [9].
The 10’s complement decimal CLA adder is implemented based on the 1-digit decimal carry-look-ahead (CLA) adder
described in [15]. To achieve higher speed,
the 16-digit and 14-digit decimal numbers are divided into
four groups, each with a separate CLA adder.
The subtraction operations in the algorithm are carried
out by this CLA adder owing to the 10’s complement decimal
format used in this architecture.
3.6.2 Cycle Process
In the first clock cycle, the 7-digit FXP decimal number is read from Reg1. Input numbers in the range
0.1 ≤ m < 0.5 are adjusted in Mult1. The corresponding input m′ (selected from m, 2m, 3m and 5m) and e1 (obtained from look-up table I) are sent to Reg2. In the first
iteration (2nd clock cycle), m′ and e1 are selected by Mux1
and Mux2 respectively to obtain e1 m′ in Mult2. At
the same time, m′ and 1 are chosen by Mux3 and Mux4
to obtain 1 − m′ in the 14-digit CLA adder. Then, e1 m′ is
shifted left by one digit to obtain 10 e1 m′, and 1 − m′ is shifted
left by two digits to obtain 100(1 − m′). Finally, W[2] is obtained
by adding −10 e1 m′ and 100(1 − m′) together in the 14-digit
CLA adder. Then, W[2] is rounded to an integer in the Rounding
logic to obtain e2. At the same time, e1 is chosen by Mux7
and sent to stage 2, so that the value of log10(1 + e1 10^−1) is obtained from look-up table II. This value is selected by Mux8,
and the adjustment constant (log10(2), log10(3), log10(5) or 0)
is chosen by Mux9. Finally, the logarithm result L[2] is
obtained in the 16-digit CLA adder in stage 2.
From the second to the eighth iteration (3rd to 9th clock
cycle), W[j] is chosen by Mux1, and ej, obtained from the
previous iteration, is selected by Mux2; then ej W[j] is
obtained in Mult2. Meanwhile, −ej and W[j] are chosen
by Mux3 and Mux4 to obtain W[j] − ej in the 14-digit CLA
adder. Then, ej W[j] from Mult2 is shifted right by (j−1) digits to obtain ej W[j] 10^−(j−1), and W[j] − ej is shifted
left by one digit to obtain 10(W[j] − ej). Finally, W[j + 1] is
obtained by adding ej W[j] 10^−(j−1) and 10(W[j] − ej) together in the 14-digit CLA adder. Then, W[j + 1] is rounded
to an integer in the Rounding logic to obtain ej+1. At the same
time, ej is chosen by Mux7 and sent to stage 2. The values
of log10(1 + ej 10^−j) are read from look-up table II and
chosen by Mux8. The logarithm result of the previous iteration is chosen by Mux9, and the two are added together in the
16-digit CLA adder to obtain L[j].
From the ninth to the fifteenth iteration (10th to 16th
clock cycle), ej+1 is obtained by the same process as in the
previous iterations. However, the logarithm results are approximated by ej 10^−j / ln(10) instead of being read from look-up table II. After 15 iterations, the final logarithm results are
obtained. This 7-digit FXP decimal logarithmic converter
takes 16 clock cycles to produce a decimal logarithm result
with 14-digit accuracy.
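The iteration schedule described above can be mimicked in software to see the recurrence converge. The following is a minimal sketch, not a model of the hardware: it uses 40-digit `decimal` arithmetic instead of the 14-digit and 16-digit datapaths, computes the table II entries on the fly, and the split of [0.1, 0.5) among the multipliers 2, 3 and 5 in `pick_multiplier` is an assumption, since the exact ranges are not stated in the text.

```python
from decimal import Decimal, ROUND_HALF_EVEN, getcontext

getcontext().prec = 40  # much wider than the hardware datapath (assumption)

LN10 = Decimal(10).ln()
# Table 1 as (lower bound of the two leading digits of m', e1)
E1_TABLE = [(96, 0), (88, 1), (81, 2), (76, 3), (70, 4),
            (66, 5), (62, 6), (59, 7), (56, 8), (50, 9)]

def pick_multiplier(m: Decimal) -> int:
    """Hypothetical range split that maps m into 0.5 <= m' < 1."""
    if m >= Decimal("0.5"):
        return 1
    if m >= Decimal("0.25"):
        return 2
    if m >= Decimal("0.2"):
        return 3
    return 5

def fxp_log10(m: Decimal, iterations: int = 15, table_iterations: int = 8) -> Decimal:
    """Digit-recurrence log10(m) for 0.1 <= m < 1 with selection by rounding."""
    c = pick_multiplier(m)
    E = m * c                                   # E[1] = m'
    e1 = next(e for low, e in E1_TABLE if int(E * 100) >= low)
    L = Decimal(0)                              # L[1] = 0
    for j in range(1, iterations + 1):
        if j == 1:
            e = e1                              # look-up table I
        else:
            W = (1 - E).scaleb(j)               # W[j] = 10^j (1 - E[j]), eq. (12)
            e = int(W.to_integral_value(rounding=ROUND_HALF_EVEN))  # eq. (15)
        x = Decimal(e).scaleb(-j)               # e_j * 10^-j
        E *= 1 + x                              # eq. (8)
        # exact term for j <= 8 (table II), series approximation afterwards
        L -= (1 + x).log10() if j <= table_iterations else x / LN10
    return L - Decimal(c).log10()               # undo the input adjustment
```

For m = 0.9999999 the sketch returns a value that agrees with log10(m) to well within 10^−14; the same holds for operands that need the ×2, ×3 or ×5 adjustment.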
4. A 32-bit DFP Logarithmic Converter
4.1. Architecture
The architecture of the 32-bit DFP logarithmic converter
is shown in Figure 3. First of all, the 32-bit DFP number is
sent to an IEEE-754 decoder, which unpacks the 32-bit DFP
format into an 8-bit exponent, a 28-bit coefficient and a 2-bit signal that represents the exception cases (“00” normal, “01” infinite, “10” NaN). Second, the 8-bit binary unsigned exponent is converted to a 12-bit BCD representation
by a combinational Bin-to-BCD converter, which is implemented based on shift-and-add algorithms. The Leading-Zero-Detector is used to bring the 7-digit integer coefficient into the range [0.1, 1). Meanwhile, the value of the
characteristic k minus the bias (k − bias) is added to the BCD
exponent to form the integer part of the decimal logarithm
result. The 7-digit adjusted coefficient is then processed
by the FXP logarithmic converter, and the 16-digit logarithm
is saved for the subsequent faithful rounding.
If the integer field of the decimal logarithm result is positive, it is decremented by 1 and combined with the 10’s complement of the 16-digit FXP logarithm result to obtain a decimal logarithm result. Otherwise, the integer field is directly
combined with the 16-digit FXP logarithm result to obtain
an inexact logarithm result. The Shift register shifts the
inexact logarithm result to obtain the exact 36-bit coefficient part, the 8-bit binary unsigned exponent field and the 1-bit
sign field. The Rounding logic rounds the 36-bit coefficient to a 28-bit faithful coefficient part using the round-half-even algorithm. Finally, the 1-bit sign field, the 8-bit exponent
field, the 28-bit coefficient field and the 2-bit signal for exceptional cases are encoded in the IEEE-754 coder to pack a faithful
32-bit DFP logarithm result.
[Figure 3. 32-bit DFP Logarithmic Converter. Data flow: X (32 bits: sign, combination field, exponent continuation, coefficient continuation) → IEEE 754 decoder with input register (unpacking) → Bin-to-BCD converter, Leading-Zero-Detector (k − bias) and fixed-point decimal log converter → 3-digit decimal CLA adder (log_int) and 10's complement (log_fra) → Mux 1 and Mux 2 → combined log_int.log_fra → Shift register → Rounding → IEEE 754 coder with output register (packing) → R = log10(X).]
We choose the DFP number (−1)^0 × 9999999 × 10^−7
as an example to illustrate the data flow of the proposed
32-bit DFP logarithmic architecture. The 32-bit DFP format of this number, represented in hexadecimal,
is “6DE3FCFF”. First, the IEEE-754 decoder decomposes
the 32-bit DFP format into an 8-bit unsigned binary exponent
“01011110” and a 28-bit coefficient “9999999” in
BCD code. Second, the 8-bit unsigned binary exponent is converted to a 12-bit decimal BCD exponent “094”
in the Bin-to-BCD converter; the adjusted 28-bit coefficient
“0.9999999” and the 3-digit characteristic k minus the bias
(k − bias) “906” are obtained in the Leading-Zero-Detector.
Third, the integer part of the logarithm result “000” is obtained by adding the BCD exponent “094” to the (k − bias)
“906” in the 3-digit decimal CLA adder. Meanwhile, the result of the 16-digit FXP logarithm, “0.0000000434294461”,
is obtained in the FXP decimal logarithmic converter. Fourth,
since the integer part of the logarithm result is “000”, which is
not positive, it is directly combined with the 16-digit
FXP logarithm result to obtain the inexact logarithm result “000.0000000434294461”. Fifth, the exact 36-bit integer coefficient “434294461”, the 8-bit binary unsigned exponent “01011110”, and the 1-bit sign ‘1’ are obtained by the
Shift register. Finally, the exact 36-bit integer coefficient is
rounded to the 28-bit faithful integer coefficient “4342945” in the
Rounding logic, and the 32-bit DFP format of the logarithm
result, “B1770ACD” in hexadecimal, is obtained
in the IEEE-754 coder.
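The unpacking step of this example can be reproduced in software. The sketch below is an illustrative decoder for finite decimal32 words, written from the field layout of Section 2.1 and the standard DPD decoding rules; the function names are hypothetical and it does not describe the internal structure of the IEEE-754 decoder block.

```python
def decode_declet(declet: int) -> int:
    """Decode one 10-bit DPD group into an integer in the range 0..999."""
    p, q, r, s, t, u, v, w, x, y = ((declet >> (9 - i)) & 1 for i in range(10))
    pq, st = 4 * p + 2 * q, 4 * s + 2 * t
    if v == 0:                       # three small digits (0..7)
        d = (pq + r, st + u, 4 * w + 2 * x + y)
    elif (w, x) == (0, 0):           # only the right digit is 8 or 9
        d = (pq + r, st + u, 8 + y)
    elif (w, x) == (0, 1):           # only the middle digit is 8 or 9
        d = (pq + r, 8 + u, st + y)
    elif (w, x) == (1, 0):           # only the left digit is 8 or 9
        d = (8 + r, st + u, pq + y)
    elif (s, t) == (0, 0):           # left and middle digits are 8 or 9
        d = (8 + r, 8 + u, pq + y)
    elif (s, t) == (0, 1):           # left and right digits are 8 or 9
        d = (8 + r, pq + u, 8 + y)
    elif (s, t) == (1, 0):           # middle and right digits are 8 or 9
        d = (pq + r, 8 + u, 8 + y)
    else:                            # all three digits are 8 or 9
        d = (8 + r, 8 + u, 8 + y)
    return 100 * d[0] + 10 * d[1] + d[2]

def decode_decimal32(word: int):
    """Return (sign, biased exponent, integer coefficient) of a finite decimal32 word."""
    sign = (word >> 31) & 1
    g = (word >> 26) & 0x1F          # 5-bit combination field
    exp_cont = (word >> 20) & 0x3F   # 6-bit exponent continuation
    coeff_cont = word & 0xFFFFF      # 20-bit coefficient continuation
    if g in (0b11110, 0b11111):
        raise ValueError("infinity or NaN")
    if g >> 3 == 0b11:               # MSD is 8 or 9
        exp_msbs, msd = (g >> 1) & 0b11, 8 + (g & 1)
    else:                            # MSD is 0..7
        exp_msbs, msd = g >> 3, g & 0b111
    biased = (exp_msbs << 6) | exp_cont
    coefficient = (msd * 10**6
                   + decode_declet(coeff_cont >> 10) * 1000
                   + decode_declet(coeff_cont & 0x3FF))
    return sign, biased, coefficient

operand = decode_decimal32(0x6DE3FCFF)
result = decode_decimal32(0xB1770ACD)
```

Applied to the operand word, the sketch yields sign 0, biased exponent 94 (binary 01011110) and coefficient 9999999, as in the example. Applied to the result word, it yields sign 1 and coefficient 4342945 with biased exponent 87, that is −4342945 × 10^−14.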
4.2. Function Verification
This section presents the function verification platform
for verifying the proposed 32-bit DFP logarithmic converter. The function verification platform is implemented
on the Xilinx University Program Virtex-II Pro Development
System [14] with the Embedded Development Kit (EDK). This
system includes a Virtex-II Pro P30 FPGA [14]. The proposed verification method is written in
C and runs on the PowerPC. First, valid DFP test vectors, encoded in the 32-bit DFP format, are sent to the 32-bit
DFP logarithmic converter. The logarithm results calculated
by this converter are sent back to the PowerPC. Meanwhile,
the same test vectors are evaluated by the PowerPC to obtain
accurate double-precision BFP results as benchmarks.
Finally, the logarithm results calculated by the 32-bit DFP logarithmic converter are compared with the accurate results
obtained by the PowerPC. If they are not identical, the corresponding 32-bit DFP test vector is displayed
on a personal computer for debugging the 32-bit DFP logarithmic converter. Verifying all the test vectors
(192 × 10^7) is impractical because of the prohibitive processing time on this verification platform, so 10,000 special cases (NaN, Infinite,
Zero, Subnormal) and 100,000 random test vectors are chosen and sent to the verification platform. The verification
results show that all the 32-bit DFP logarithm results calculated by the proposed 32-bit DFP logarithmic converter
are correct.
Table 2. Critical Path of the Proposed 32-bit DFP Logarithmic Converter.
Component      Delay (ns)
Reg2           1.188
Mux2           1.564
Mult2          9.347
Shifter        1.438
Mux5           1.350
CLA            5.519
Rounding       0.566
Total delay    20.97
5. Experimental Results and Analysis
5.1. Implementation Results
The proposed 32-bit DFP logarithmic converter is modeled in VHDL and implemented on a Virtex-II Pro P30
FPGA. It is synthesized with XST and placed and routed with
Xilinx ISE 9.1. It occupies 1 out of 16 GCLK I/O blocks, 66
out of 644 I/O blocks, and 2,842 out of 13,696 slices. The
maximum clock frequency is 47.7 MHz and the latency is 18
clock cycles. The critical path of the proposed
architecture is in stage 1 of the FXP decimal logarithmic
converter, which is highlighted in Figure 2 (dotted line);
its details are given in Table 2. Furthermore, the proposed 32-bit DFP logarithmic converter is synthesized with the
TSMC 0.18-um standard cell library, and the implementation results indicate that its maximum frequency and area
are 107.9 MHz and 221589.66 units, respectively.
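The relation between the critical path in Table 2 and the reported FPGA clock rate can be checked with a few lines of arithmetic. This is only an illustrative sketch: the delay values are copied from Table 2, while the function names are hypothetical and the calculation assumes the clock period equals the summed critical-path delay.

```python
# Block delays along the critical path, in ns, as listed in Table 2.
CRITICAL_PATH_NS = {
    "Reg2": 1.188,
    "Mux2": 1.564,
    "Mult2": 9.347,
    "Shifter": 1.438,
    "Mux5": 1.350,
    "CLA": 5.519,
    "Rounding": 0.566,
}
LATENCY_CYCLES = 18  # cycles needed for one 32-bit DFP logarithm


def total_delay_ns(delays_ns: dict) -> float:
    """Sum of the block delays, taken here as the minimum clock period."""
    return sum(delays_ns.values())


def max_frequency_mhz(delays_ns: dict) -> float:
    """Maximum clock frequency implied by the critical path (1000 / period in ns)."""
    return 1000.0 / total_delay_ns(delays_ns)


def latency_ns(delays_ns: dict, cycles: int) -> float:
    """Total time for one conversion at the minimum clock period."""
    return cycles * total_delay_ns(delays_ns)


if __name__ == "__main__":
    print(f"period  : {total_delay_ns(CRITICAL_PATH_NS):.3f} ns")
    print(f"f_max   : {max_frequency_mhz(CRITICAL_PATH_NS):.1f} MHz")
    print(f"latency : {latency_ns(CRITICAL_PATH_NS, LATENCY_CYCLES):.1f} ns")
```

The summed delay reproduces the 20.97 ns total, and its reciprocal agrees with the 47.7 MHz figure; Mult2 and the CLA adder together account for roughly 70% of the path.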
Since there is no comparable DFP logarithmic
converter, we compare the proposed decimal FXP logarithmic converter with the radix-8 binary FXP logarithmic
converter [11] for two cases with different precisions (Case 1: 7-digit and 24-bit; Case 2: 16-digit and 53-bit), because 1)
they have similar dynamic ranges for the normalized coefficients ($2^{23} < 10^7 < 2^{24}$ for case 1 and
$2^{53} < 10^{16} < 2^{54}$ for case 2); 2) they are implemented with the same digit-recurrence
algorithm with selection by rounding; and 3)
radix 10 is close to radix 8. For the purpose of comparison,
the proposed decimal FXP logarithmic converter is synthesized with a TSMC 0.18-um standard cell library [13].
The synthesis results show that the worst-case path delay
and area of the 7-digit decimal FXP logarithmic converter
are 8.25 ns and 145772.82 units, and those of the 16-digit decimal
FXP logarithmic converter are 9.28 ns and 236164.33 units.
Since the timing and area evaluation units in [11] are τ and
fa (1 τ = the delay of a 1-bit full adder, 1 fa = the area of a
1-bit full adder), we use the same units to represent the delay and area of the decimal FXP logarithmic converter in this
paper (footnote 3). Table 3 shows the comparison results for cases 1 and
2: the proposed 7-digit architecture is 2.73 times
slower and 2.51 times larger than the 24-bit radix-8 binary
FXP logarithmic converter in case 1, and the proposed 16-digit architecture is 2.38 times slower and 1.44 times larger
than the 53-bit radix-8 binary FXP logarithmic converter in
case 2. There are two reasons: 1) numbers in BCD
code, as used in the proposed architecture, are stored less efficiently than the
binary numbers in the radix-8 binary FXP logarithmic converter and need more resources to implement; 2) the
delay of the decimal arithmetic blocks, such as the decimal CLA adder
and the Multiple logic in Figure 2, is larger than that of the signed-digit
(SD) binary adder and the SD Multiple logic in the architecture
of the radix-8 binary FXP logarithmic converter.
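The slowdown and area factors quoted above follow from the Table 3 entries, with latency taken as cycle time multiplied by the number of cycles. The sketch below recomputes them; the values are copied from Table 3, and the dictionary keys and function names are hypothetical labels introduced for illustration.

```python
# (area in fa, cycle time in tau, number of cycles), as listed in Table 3.
TABLE3 = {
    "decimal 7-digit": (1630, 17, 9),
    "binary 24-bit": (647, 7, 8),
    "decimal 16-digit": (2640, 19, 18),
    "binary 53-bit": (1829, 8, 18),
}


def latency_tau(name: str) -> int:
    """Latency in tau: cycle time multiplied by the number of cycles."""
    _, cycle_time, cycles = TABLE3[name]
    return cycle_time * cycles


def compare(decimal_name: str, binary_name: str) -> tuple:
    """Return (latency ratio, area ratio) of the decimal over the binary design."""
    latency_ratio = latency_tau(decimal_name) / latency_tau(binary_name)
    area_ratio = TABLE3[decimal_name][0] / TABLE3[binary_name][0]
    return latency_ratio, area_ratio


if __name__ == "__main__":
    cases = (("decimal 7-digit", "binary 24-bit"),
             ("decimal 16-digit", "binary 53-bit"))
    for dec, binary in cases:
        slow, large = compare(dec, binary)
        print(f"{dec} vs {binary}: {slow:.3f}x latency, {large:.3f}x area")
```

With the rounded table entries, the latency ratios come out at about 2.73 and 2.38 and the area ratios at about 2.52 and 1.44, in line with the factors quoted in the text.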
5.2. Scale to Decimal64 and Decimal128
Note that while decimal32 is only a storage format in the
IEEE 754-2008 standard, decimal64 and decimal128 are
more accurate formats for decimal calculation. To explain how the proposed 32-bit DFP logarithmic
converter scales to 64-bit and 128-bit converters compliant with the
decimal64 and decimal128 formats, we mainly discuss the
transformation of the core part, the decimal FXP logarithmic converter. The 7-digit coefficient field of the decimal32
format is extended to 16 digits and 34 digits in the decimal64
and decimal128 formats respectively, so the decimal FXP
logarithmic converter must produce results accurate to 32 digits
and 68 digits in order to guarantee faithful
rounding of the 64-bit and 128-bit DFP logarithm results.
Table 3. Hardware Performance Comparison.

                    Radix-10 Decimal Log Converter    Radix-8 Binary Log Converter [11]
Precision           7-digit        16-digit           24-bit         53-bit
Area                1630 fa        2640 fa            647 fa         1829 fa
Cycle time          17 τ           19 τ               7 τ            8 τ
Number of cycles    9              18                 8              18
Latency             153 τ          342 τ              56 τ           144 τ

Footnote 3: τ and fa are the delay and area of the 1-bit full adder (ADFULD4) in the TSMC 0.18-um standard cell library [13].

The main alterations of the decimal FXP logarithmic
converters for decimal64 and decimal128 are: 1) The digit
width of Mult1 in stage 1 of the decimal FXP logarithmic converter (refer to Figure 2) needs to be extended to
16 digits and 34 digits. 2) At least 32-digit
and 68-digit precision must be kept for $W[j]$ in order to obtain correct $e_j$
during 33 and 69 iterations; therefore, the digit widths of the CLA
adder, Mult2 and the other blocks in stage 1 of the decimal
FXP logarithmic converter need to be extended to 32 digits
and 68 digits. 3) The decimal FXP logarithm results are
obtained by accumulating the values of $-\log_{10}(1 + e_j 10^{-j})$
in iterations $j \le k$ and $-e_j 10^{-j}/\ln(10)$ in iterations $j > k$,
where $k = 17$ for 32-digit accurate results and $k = 35$ for
68-digit accurate results (footnote 4). The digit widths of the CLA adder,
Mult3 and the other blocks in stage 2 of the decimal FXP
logarithmic converter need to be extended to 34 digits and
70 digits. 4) Look-up table I, where the values of $e_1$
are stored, remains the same. However, look-up table II needs to store the 34-digit and 70-digit values of
$\log_{10}(1 + e_j 10^{-j})$ for $1 \le j \le 17$
and $1 \le j \le 35$ in order to achieve 32-digit and 68-digit accurate
logarithm results. Furthermore, the 34-digit and 68-digit
adjustment constants $\log_{10}(2)$, $\log_{10}(3)$ and $\log_{10}(5)$ are
stored in this table too. The size of look-up table II in the
decimal FXP logarithmic converter needs to be extended to
$2^9 \times 136$ and $2^{10} \times 280$ for the decimal64 and decimal128 formats
respectively.
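The switch from table values to the linear term $-e_j 10^{-j}/\ln(10)$ after iteration $k$ can be checked numerically. The sketch below is a software illustration using Python's `decimal` module, not part of the proposed hardware. It assumes the digit set $|e_j| \le 9$ (the parameter `max_digit` is hypothetical) and takes the error bounds $10^{-34}$ and $10^{-70}$ from the 34-digit and 70-digit widths of stage 2.

```python
from decimal import Context, Decimal

CTX = Context(prec=120)  # working precision well beyond 70 digits


def linearization_error(e: int, j: int) -> Decimal:
    """|log10(1 + e*10**-j) - e*10**-j / ln(10)| evaluated at high precision."""
    x = Decimal(e).scaleb(-j)                      # exact value of e * 10**-j
    exact = CTX.log10(CTX.add(Decimal(1), x))      # value a table entry would hold
    approx = CTX.divide(x, CTX.ln(Decimal(10)))    # linear term used when j > k
    return CTX.subtract(exact, approx).copy_abs()


def approximation_is_safe(k: int, result_digits: int, max_digit: int = 9) -> bool:
    """Check the first approximated iteration, j = k + 1, for e = +/- max_digit.

    Assumption: |e_j| <= max_digit; larger j only makes the error smaller.
    """
    bound = Decimal(1).scaleb(-result_digits)
    return all(linearization_error(e, k + 1) < bound
               for e in (-max_digit, max_digit))


if __name__ == "__main__":
    print("decimal64  (k = 17):", approximation_is_safe(17, 34))
    print("decimal128 (k = 35):", approximation_is_safe(35, 70))
```

Under these assumptions the neglected term is dominated by $e_j^2 10^{-2j}/(2\ln(10))$, which is why look-up table II only needs entries up to $j = k$. The table widths of 136 and 280 are also consistent with 4 bits per BCD digit ($34 \times 4$ and $70 \times 4$), although that reading is an inference rather than something stated in the text.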
6. Conclusions
In this paper, we first present the 32-bit DFP format and
its related logarithm operation. Second, we develop a decimal digit-recurrence algorithm with selection by rounding
to perform the radix-10 fixed-point (FXP) logarithm operation. Third, we construct the architecture of the 32-bit DFP
logarithmic converter, which is implemented and verified on
an FPGA. Finally, we analyze the implementation results of the
proposed architecture and compare the proposed decimal
FXP logarithmic converter with a radix-8 binary FXP logarithmic converter for two cases. The comparison results show
that the decimal FXP logarithmic converter is slower and
occupies more area than the binary FXP logarithmic converter. The presented architecture, however, can be
optimized to achieve a faster speed or occupy a smaller area.
Footnote 4: Note that $e_j^2 10^{-2j}/(2\ln(10)) < 10^{-34}$ for $j \ge 17$ (decimal64), and
$e_j^2 10^{-2j}/(2\ln(10)) < 10^{-70}$ for $j \ge 35$ (decimal128).
References
[1] IEEE standard 754-2008. IEEE standard for floating-point
arithmetic. IEEE Computer Society, Aug 2008.
[2] M. F. Cowlishaw. Densely Packed Decimal Encoding. IEEE
Computers and Digital Techniques, pp. 102-104, May 2002.
[3] M. F. Cowlishaw. Decimal Floating-Point: Algorism for
Computers. IEEE Symp. on Computer Arithmetic, pp. 104-111, Jun 2003.
[4] A. Y. Duale, M. H. Decker, H.-G. Zipperer, M. Aharoni, and
T. J. Bohizic. Decimal Floating-Point in z9: An Implementation and Testing Perspective. J. IBM Res. and Dev., Jan
2007.
[5] L. Eisen, J. W. W. III, H.-W. Tast, N. Mading, J. Leenstra,
S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, and S. R.
Carlough. IBM POWER6 Accelerators: VMX and DFU. J.
IBM Res. and Dev., Nov 2007.
[6] M. D. Ercegovac, T. Lang, and P. Montuschi. Very High-Radix Division with Selection by Rounding and Prescaling.
IEEE Trans. on Computers, pp. 909-918, May 1994.
[7] L. Imbert, J. Muller, and F. Rico. A Radix-10 BKM Algorithm for Computing Transcendentals on Pocket Computers.
J. VLSI Signal Processing, pp. 179-186, Jun 2000.
[8] T. Lang and P. Montuschi. Very-High Radix Square
Root with Prescaling and Rounding and a Combined Division/Square Root Unit. IEEE Trans. on Computers, pp.
827-841, May 1999.
[9] T. Lang and A. Nannarelli. A Radix-10 Combinational Multiplier. IEEE Asilomar Conference on Signals, Systems and
Computers, pp. 313-317, Oct 2006.
[10] J. Muller. Elementary Functions, Algorithms and Implementation. Birkhauser.
[11] A. Piñeiro, M. D. Ercegovac, and J. D. Bruguera. High-Radix Logarithm with Selection by Rounding: Algorithm
and Implementation. J. VLSI Signal Processing, pp. 109-123, May 2005.
[12] E. M. Schwarz, J. S. Kapernick, and M. F. Cowlishaw. Decimal Floating-Point Support on the IBM System z10 Processor. J. IBM Res. and Dev., Jan 2009.
[13] Virtual Silicon Technology Inc. Native-18 Standard Cell
Library 0.18V TSMC Process, Sep 1999.
[14] Xilinx Inc. Xilinx University Program Virtex-II Pro Development System, Hardware Reference Manual.
[15] Y. You, Y. Kim, and J. Choi. Dynamic Decimal Adder Circuit Design by using the Carry Lookahead. IEEE Design
and Diagnostics of Electronic Circuits and Systems, pp. 242-244, Apr 2006.