Conversion between DPD and RBCD for on

advertisement
Conversion between DPD and RBCD for
on-line arithmetic computation
Sonia González, Carlos Garcı́a, Julio Villalba.
Summary— In recent years decimal arithmetic has
gained renewed interest with the ratification of the
IEEE 754-2008 Floating-point Standard. It specifies
formats for Decimal Floating-point (DFP) numbers
and uses Densely Packed Decimal (DPD) encoding to
store the significand of a DFP number. However, to
perform decimal arithmetic operations, DPD conversions to Binary Coded Decimal (BCD) are needed. In
order to deal with on-line arithmetic it is necessary
to use redundant number representation which prevents the carry propagation and allows the computation starting from the most significant digit (MSD). In
this paper we consider the Redundant Binary Coded
Decimal (RBCD) encoding and presents the design of
a DPD converter to RBCD representation for decimal on-line arithmetic units. The direct conversion
proposed in this paper (DPD to RBCD) supposes an
improvement over the two steps conversion required
by a regular computation (DPD to BCD and BCD to
RBCD).
Keywords— conversion, decimal floating-point, online arithmetic, densely packed decimal, redundant
binary coded decimal.
I. Introduction
D
ECIMAL arithmetic is present nowadays
thanks to the ratification of the IEEE 754-2008
Floating-point Standard. During last years there
have been a lot of activities in the design of specific
decimal arithmetic units. In fact, processors such
as IBM Power6, Power7, z9 and z10 [10], [5], [11]
include decimal floating-point units.
The standard includes two basic formats for Decimal Floating-point (DFP) numbers and specifies two
encodings for DFP significands known as the decimal
and the binary encoding. The decimal encoding uses
the Densely Packed Decimal (DPD) [3] encoding to
encode the significand. The main drawback of DPD
encoding is that it is not easy to perform computations with it. To resolve this problem, a DPD
number is converted to Binary Coded Decimal representation (BCD) and the operations are carried out
using this representation. Most of the recent proposed decimal arithmetic units based on DPD encoding [4], [14], [15], [16], [8], [9], [7] are designed
assuming this conversion. On the other hand, online arithmetic is based on serial computation starting from the Most Significant Digit (MSD). To avoid
the chain of carries a redundant representation of the
numbers is used in on-line arithmetic [6]. Redundant
Binary Coded Decimal (RBCD) [13] is a redundant
decimal representation where the BCD digits from 0
to 9 are represented with the digit set is {-7,...,7}.
In order to work with decimal on-line units two
steps are needed to convert from DPD to RBCD:
1 Dept. Computer Architecture, University of Málaga, email: sonia,cgarcia,julio@ac.uma.es
1
a conversion from DPD to BCD and from BCD to
RBCD. Although the DPD to BCD conversion is fast
in hardware, the BCD to RBCD conversion implies
the chain of a carry between digits. In this paper,
we propose a direct conversion DPD to RBCD by the
fusion of the tables and equations involved by the two
steps conversion, achieving a faster algorithm.
The rest of the paper is organized as follows. Section II describes the DFP formats specified in the
IEEE 754-2008 standard. Section III deals with the
Redundant Binary Coded Decimal numbers and the
on–line arithmetic requirements. Section IV presents
the direct conversion from DPD to RBCD. Section
V examines the implementation results, and finally,
Section VI presents the summary and conclusions of
this work.
II. Decimal Floating-Point format
Due to the importance of DFP arithmetic, IEEE
developed its standard for floating-point arithmetic
[1] by including specifications for DFP formats and
operations [1]. With IEEE 754-2008, the value of a
finite DFP number, x, is:
(−1)Sx × Cx × 10Ex −bias
where Sx is the sign bit, Ex is a biased exponent,
bias is a constant value that makes Ex non-negative,
and Cx is the significand, which is also referred to
as the coefficient. IEEE 754-2008 defines two basic DFP formats, decimal64 and decimal128, with
encodings lengths of 64 and 128 bits, respectively.
These formats are used to represent a finite subset of
real numbers including finite numbers, signed infinities and two different types of Not-a-Numbers (qNaN
and sNaN). In addition, the Standard specifies two
encodings for DFP significands; (1) a binary encoding, known as Binary Integer Decimal (BID), and
(2) a decimal encoding, known as Densely Packed
Decimal (DPD). With the BID encoding, the significand is represented using an unsigned binary integer.
With the DPD encoding the significand is represented using an unsigned decimal integer, in which three
decimal digits are encoding using ten bits [3]. With
either encoding, the significand of a DFP number is
not normalized, which means that a single DFP number may have multiple representations. More details
on the DFP formats and operations are provided in
[1].
III. Redundant Binary Coded Decimal
numbers
On–line arithmetic defines algorithms for serial
arithmetic operators that receive the inputs and ge-
vwxst p·q·u s·t·u
(ef gh)
Cout
0---- - 0
e=0
100-- - 0
0
(f gh) = (stu) + Cin1
110-- - 0
e=0
11101 0 0
(f gh) = (pqu) + Cin1
0---- - 1
100-- - 1
e = 1, f = 1
1
110-- - 1 g = Cin1 , h = Cin1
11101 1 101-- - e = u · Cin1
11100 - f = u · Cin1
1
11110 - g = u · Cin1
11111 - h = u ⊕ Cin1
nerate the output starting from the most-significant
digit (MSD first). The serial approach is advantageous because of the simplicity of the hardware and
the reduction in number and length of connections
among modules. Moreover, the MSD first alternative allows the implementation of operations, such as
division and square root, which are difficult to implement least-significant digit first. The drawback of
the serial approach is the number of cycles required;
however, this can be compensated by the overlap of
the execution of dependent operations. Thanks to
all these characteristics, on-line arithmetic is suitable for VLSI implementation.
vwxstp·q·r
(abcd)
Cout
0---- 0
100-- 0
a=0
0
101-- 0
(bcd) = (pqr) + Cin0
11110 0
0---- 1
100-- 1
a = 1, b = 1
1
101-- 1
c = Cin0 , d = Cin0
11110 1
110-- 11000 - a = r · Cin0 , b = r · Cin0 1
11101 - c = r · Cin0 , d = r ⊕ Cin0
11111 -
TABLE II
Obtaining (ef gh)
0 0 Cin1 p q u
0 0 Cin1 s t u
+
+
+
0
0
s
t
u
+
1 1 Cin1
1 1 Cin1
p
q
u
mux
mux
4
4
x
v
mux
v
u Cin1
V
TABLE I
Obtaining (abcd)
4
X
u
Cin1
V
mux
T
W
0 0 Cin0 p q r
+
S
V
+
1 1 Cin0
Cout2
0
p
q
r
efgh
mux
r Cin0
V
Fig. 2. Implementation obtaining efgh
4
W
r
Cin0
V
mux
V
S
T
X
Cout1
abcd
Fig. 1. Implementation obtaining abcd
To deal with on-line arithmetic it is necessary to
have a number representation system with no carry
propagation. In this way, it is possible to perform
the computation starting from the Most Significant
Digit (MSD). This is achieved by carry-save or signed
digit representations.
Therefore, to deal with decimal on-line arithmetic
a decimal redundant number systems is required.
The BCD code involved in the DPD format does not
fulfill this condition. Thus, a conversion step from
BCD to a redundant decimal system is needed. A
code that meets the required condition and which is
directly related to BCD code is the Redundant Binary Coded Decimal (RBCD) defined in [2].
A RBCD number is composed by digits of 4
bits which represent 15 numbers in the range
{−7, −6, ...0...6, 7}. It is a signed digit representation such as a positive number is coded as natural
binary whereas a negative number is coded as two‘s
complement. This code allows the computation with
no carry propagation for the decimal addition [2],
substraction, multiplication and division [12].
The conversion between BCD and RBCD can be
performed with no carry propagation whereas the
opposite conversion involves a borrow propagation.
Fortunately for the on–line arithmetic computation,
the most critical conversion is BCD to RBCD since
the MSD is required as soon as possible. DPD code
is only used for storage purposes and the conversion
from RBCD to DPD is performed only when the on–
line processing has finished.
The conversion from BCD to RBCD is performed
by a two steps algorithm [2]. In the first step we
detect if a number is greater or equal to 7 and we
(ijkm)
Cout
i=0
0
(jkm) = (wxy) + Cin2
i=0
0
(jkm) = (sty) + Cin2
i=0
0
(jkm) = (pqy) + Cin2
vwxst w·x·y s·t·y p·q·y
0---- 0
-
-
101-- -
0
-
110-11100
0---101-110-11100
100-11101
11110
11111
1
-
0
0
1
1
-
1
-
i = 1, j = 1
k = Cin2 , m = Cin2
1
1
TABLE III
Obtaining (ijkm)
+
0 0 Cin2 s t y
+
+
0
0
w
x
y
+
1 1 Cin2
1 1 Cin2
s
t
y
mux
0 0 Cin2 p q y
mux
+
0
p
q
y
v
+
1 1 Cin2
4
4
mux
4
mux
S
W
V
T
X
mux
V
W
y Cin2
4
V
W
X
y
Cin2
V
mux
V
S
T
X
Cout3
ijkm
Fig. 3. Implementation obtaining ijkm
add the amount of 6 in such a case. This provokes
an output carry. In the second step we add the input
carry to the result of the previous operation.
Let (abcd) a generic BCD digit of a BCD-coded
number, and Cin and Cout the input and output
carries respectively. The condition for a carry
generation is cout = ak(b · c · d) (see [2]), where the
symbol k means the logic OR operation and · is the
logical AND (notice that the carry only depends on
the current digit bits). Thus, the conversion is:
Cin0
Cin1
abcd
Cout1
Cin2
efgh
ijkm
Cout2
cout = ak(b · c · d)
(abcd)
if
(abcd) =
(abcd) + (0110) if
(1)
cout = 0
cout = 1
(2)
Second step:
(abcd) = (abcd) + cin
(3)
IV. Direct conversion from DPD to RBCD
i = y · Cin2
j = y · Cin2
k = y · Cin2
m = y ⊕ Cin2
0 0 Cin2 w x y
First step:
Cout3
Fig. 4. Global structure of the conversion
Let (pqrstuvwxy) the ten bits corresponding to a
DPD code. This code is converted to three BCD
digits (abcd)(efgh)(ijkm), in such a way that each
bit of the three BCD digits is obtained as a boolean
function of the DPD bits. In [1] a table conversion is
provided. On the other hand, conversion from BCD
to RBCD is performed by implementation of equations (1) through (3). What we propose is the combination of the table and the equations to provide a
direct table conversion, which is presented in tables
I, II and III. The resulting BCD code is composed by
three digits namely (abcd)(efgh)(ijkm) and they are
directly obtained from the DPD code (pqrstuvwxy).
In the tables the symbol · means the logical AND
operation, the symbol ⊕ corresponds to the logical
EXOR operation and the symbol + is the arithmetic
addition.
From these tables we can see that only logical operations are required as well as, for some cases, one
level of 3-bit arithmetic addition to add the input
carry (for example, in table I for the first case, the
bits (bcd) are obtained by the addition of the bits
(pqr) and a carry, whereas the bit a=0. Notice that
the maximum value of (bcd) is 6 and thus the addition of a carry never provokes an output carry). Nevertheless, the BCD to RBCD conversion proposed
in [2] involves two additions.
The implementation of the direct conversion is
shown in Fig. 1,Fig. 2 and Fig. 3 which are related to Tables I, II and III respectively. The implementation of Table I requires the use of only two
multiplexers and one 3-bit adder, while the implementation of Table II uses four multiplexers and two
3-bit parallel adders, and the implementation of Table III uses six multiplexers and three 3-bit parallel
adders.
Fig. 4 shows the global structure of the full conversion. The Cin0 is the carry input coming from
the previous conversion, and the Cout3 is the carry
output produced by the current conversion.
V. Experimental results
The DPD to RBCD design presented in this paper
have been implemented in Verilog, simulated using
ModelSim 6.0, and synthesized using Synopsys Design Compiler and the TSMC 65nm library in which
one cell unit has an area equal to 1 µm2 . Also,
we have implemented the conversion using two steps
(conversion from DPD to BCD [1] plus conversion
from BCD to RBCD [2]). Table IV shows the implementation results. Our approach is close to 27%
faster than the two steps algorithm. Nevertheless,
our design requires about 58% more area than the
two step processing.
The improvement in the time of our algorithm is
due to the fact that we use only one 3-bit parallel
addition operation in comparison with the two serial
4-bit additions required by the standard conversion.
Notice that the table conversion from DPD to BCD
involves only logical operations, in such a way that
addition has a high influence in the total computation time.
Two steps
Our design
DPD to RBCD
Time
Area
0.0744
913
0.0546
1449
TABLE IV
Implementation results
VI. Summary and Conclusion
In this paper we have presented a direct conversion
between DPD and RBCD which makes the computation in an on–line arithmetic system possible. The
proposed system obtains directly the RBCD digits
from a DPD data stream starting from the MSD.
The fusion of the two steps into one reduces significatively the computation time of the conversion with a
moderate increase of hardware.
The fast conversion proposed in this paper can
benefit to all the potential decimal on–line arithmetic
algorithms if these algorithms involve IEEE 754-2008
decimal floating point numbers.
References
[1]
American National Standards Institute and Institute of
Electrical and Electronic Engineers. 754-2008 IEEE standard for floating-point arithmetic,. IEEE Standard, Std
754-2008, 2008.
[2] D.Y.Y. Yun B. Shirazi and C.N. Zhang. RBCD: redundant binary coded decimal adder. IEE Proceedings Computer and Digital Techniques, 136:156–160, March 1989.
[3] M. F. Cowlishaw. Densely packed decimal encoding. In
IEE Proceedings - Computers and Digital Techniques,
volume 149, pages 102–104, May 2002.
[4] M. F. Cowlishaw. Decimal floating-point: Algorism for
computers. In Proceedings of the 16th IEEE Symposium
on Computer Arithmetic, pages 104–111, June 2003.
[5] A. Y. Duale, M. H. Decker, H.-G. Zipperer, M. Aharoni,
and T. J. Bohizic. Decimal floating-point in z9: An implementation and testing perspective. IBM Journal of
Research and Development, 51(1/2), 2007.
[6] M.D. Ercegovac and T. Lang. Digital Arithmetic. Morgan
Kaufmann, 2004.
[7] Steven R. Carlough Eric M. Schwarz. Power6 decimal
divide. In Proceedings of the 18th IEEE Symposium on
Application-specific Systems, Architectures and Processors, 2007.
[8] M. A. Erle, M. J. Schulte, and B. J. Hickmann. Decimal
floating-point multiplication via carry-save addition. In
Proceedings of the 18th IEEE Symposium on Computer
Arithmetic, 2007.
[9] B. Hickmann, A. Krioukov, M. A. Erle, and M. Schulte.
A parallel ieee p754 decimal floating-point multiplier. In
International Conference on Computer Designs, pages
296–303, October 2007.
[10] J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M.
Schwarz, and S. R. Carlough. Ibm power6 accelerators:
[11]
[12]
[13]
[14]
[15]
[16]
Vmx and dfu. IBM Journal of Research and Development, 51:1–21, November 200u.
E. M. Schwarz, J. S. Kapernick, and M. F. Cowlishaw.
Decimal floating-point support on the ibm system z10
processor. IBM Journal of Research and Development,
53(1):4:1 –4:10, 2009.
S.Gorgin and G. Jaberipur. Fully redundant decimal
arithmetic. In Proc. of 19th IEEE Symposium on Computer Arithmetic (ARITH 2009). IEEE Computer Society Press, 2009.
B. Shirazi, D.Y.Y. Yun, and C.N. Zhang. Rbcd: redundant binary coded decimal adder. Computers and Digital
Techniques, IEE Proceedings E, 136(2):156 – 160, March
1989.
L.-K. Wang and M. J. Schulte. Decimal floating-point
square root using Newton-Raphson iteration. In Proceedings of IEEE International Conference on ApplicationSpecific System, Architectures and Processors, pages
309–315, July 2005.
L.-K. Wang and M. J. Schulte. Decimal floating-point
adder and multifunction unit with injection-based rounding. In Proceedings of the 18th IEEE Symposium on
Computer Arithmetic, Montpellier, France, June 2007.
L.-K. Wang and M. J. Schulte. A decimal floating-point
divider using Newton-Raphson iteration. The Journal of
VLSI Signal Processing, pages 727–739, 2007.
Download