International Journal of Engineering Trends and Technology (IJETT) – Volume 8 Number 4 – Feb 2014, ISSN: 2231-5381
Floating Point Engine using VHDL
Najib Ghatte#1, Shilpa Patil#2, Deepak Bhoir#3
Fr. Conceicao Rodrigues College of Engineering
Fr. Agnel Ashram,
Bandstand, Bandra (W), Mumbai: 400 050, India
Abstract— Floating Point arithmetic is by far the most used way
of approximating real number arithmetic for performing
numerical calculations on modern computers. For a long time, each computer had a different arithmetic: bases, significand and exponent sizes, formats, etc. Each company implemented its own model, which hindered portability between different equipment, until the IEEE 754 standard appeared, defining a single and universal standard.
This paper deals with the implementation of a single-precision as well as a double-precision floating-point adder and multiplier according to the IEEE 754 standard, using the hardware description language VHDL.
The VHDL code so designed is simulated for various sets of inputs, and the desired results are obtained. The code is synthesized for the XC3S5000 device in the FG900 package of the Spartan®-3 FPGA family.
Keywords— Single precision, double precision, FPGA, Spartan,
IEEE-754, Floating-point arithmetic, VHDL, Xilinx
I. INTRODUCTION
This section briefly introduces the topics that will be encountered and described in the later parts of the paper. It starts with the representation of real numbers in the fixed-point format and then presents the IEEE-754 standard for expressing numbers in single-precision (32-bit) and double-precision (64-bit) format. It also discusses the reasons for using floating-point numbers in computation, floating-point arithmetic, and the cost involved in implementing higher precision than the existing floating-point hardware, and explains the differences between the fixed-point and floating-point representations of numbers.
A. Computer Representation of Real Numbers
Real numbers can represent quantities with infinite precision and are used to measure continuous quantities. Almost all computations in physics, chemistry, mathematics and other scientific fields involve operations on real numbers. Computers can only approximate real numbers, most commonly as fixed-point or floating-point numbers. In a fixed-point representation, a real number is represented by a fixed number of digits before and after the radix point. Since the radix point is fixed, the range of fixed-point numbers is limited: due to this fixed window of representation, very small or very large numbers outside the available range cannot be represented accurately. A better way of representing real numbers is the floating-point representation. Floating-point numbers represent real numbers in scientific notation. They employ a sort of sliding window of precision, or number of digits, suited to the scale of a particular number, and hence can represent a much wider range of values accurately.
Most processors designed for consumer applications, such as Graphical Processing Units (GPUs) and CELL processors, promise and deliver outstanding floating-point performance for scientific applications while using single-precision floating-point arithmetic hardware [1]. Because video games rarely require higher accuracy in floating-point operations, the high cost of the extra hardware needed for its implementation is not justified. The hardware cost of higher-precision arithmetic is much greater than that of single-precision arithmetic. For example, one double-precision (64-bit) floating-point pipeline has approximately the same cost as two to four 32-bit floating-point pipelines. Most applications use 64-bit floating point to avoid losing precision in a long sequence of operations, even though the final result may not be accurate to more than 32-bit precision. The extra precision is used so that the application developer does not have to worry about having enough precision.
B. Fixed-Point Representation of Real Numbers
In computing, a fixed-point number representation is a real
data type for a number that has a fixed number of digits after
(and sometimes also before) the radix point (after the decimal
point '.' in English decimal notation) as shown in Fig. 1.
Fixed-point number representation can be compared to the
more complicated (and more computationally demanding)
floating-point number representation.
Fixed-point numbers are useful for representing fractional
values, usually in base 2 or base 10, when the executing
processor has no floating point unit (FPU) or if fixed-point
provides improved performance or accuracy for the
application at hand.
Most low-cost embedded microprocessors and microcontrollers do not have an FPU [2].
A value of a fixed-point data type is essentially an integer
that is scaled by a specific factor determined by the type. For
example, the value 1.25 can be represented as 1250 in a fixed-point data type with a scaling factor of 1/1000, and the value 1,250,000 can be represented as 1250 with a scaling factor of 1000. Unlike floating-point data types, the scaling factor is the same for all values of the same type and does not change during the entire computation.
Fig. 1 8-bit sign-magnitude fixed-point representation
If the set of all numbers representable by this method is placed on a number line, there is an equal interval between adjacent points, as shown in Fig. 2.
Fig. 2 Fixed-point representation on number line
The scaling factor is usually a power of 10 (for
conventional convenience) or a power of 2 (for computational
efficiency). However, other scaling factors may be used
occasionally, e.g. a time value in hours may be represented as
a fixed-point type with a scale factor of 1/3600 to obtain
values with one-second accuracy.
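The scaled-integer view described above can be made concrete with a short Python sketch (an illustration, not part of the paper's VHDL; the names and the 1/1000 scaling factor follow the example in the text):

```python
# A value of a fixed-point type is an integer scaled by a constant factor.
SCALE = 1000  # scaling factor 1/1000: three decimal digits of fraction

def to_fixed(x):
    """Encode a real number as a scaled integer (rounded to nearest)."""
    return round(x * SCALE)

def from_fixed(n):
    """Decode a scaled integer back to a real number."""
    return n / SCALE

# As in the text: 1.25 is stored as the integer 1250.
assert to_fixed(1.25) == 1250
assert from_fixed(1250) == 1.25
```

The scaling factor is fixed at compile time, which is precisely why the representable range is a fixed window.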
Fig. 3 shows the fixed-point representation of 32-bit numbers (the 1st bit reserved for the sign, 15 bits for the integer part and the remaining 16 bits for the fractional part) and 64-bit numbers (the 1st bit reserved for the sign, 31 bits for the integer part and the remaining 32 bits for the fractional part). The highest value that can be represented by the 32-bit representation is 32767.9999 and that by the 64-bit representation is 2147483647.9999, as shown.
Fig. 3 Fixed-point representation: 32-bit and 64-bit
Limitations:
Since fixed-point operations can produce results that have more bits than the operands, there is a possibility of information loss. For instance, the result of a fixed-point multiplication could have as many bits as the sum of the numbers of bits in the two operands. In order to fit the result into the same number of bits as the operands, the answer must be rounded or truncated, and in that case the choice of which bits to keep is very important. When multiplying two fixed-point numbers with the same format, for instance with I integer bits and Q fractional bits, the answer could have up to 2I integer bits and 2Q fractional bits.
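The bit growth and truncation described above can be sketched in Python for a hypothetical Q4.4 format (4 integer bits, 4 fractional bits; the format choice is illustrative, not from the paper):

```python
Q = 4  # fractional bits in the assumed Q4.4 format

def q_mul(a, b):
    """Multiply two Q4.4 fixed-point numbers held as scaled integers.
    The raw product carries 2Q = 8 fractional bits; shifting right by Q
    truncates it back to the operand format, discarding low-order bits."""
    full = a * b          # product with 8 fractional bits
    return full >> Q      # truncate back to 4 fractional bits

# 1.5 is 0b0001_1000 (24) and 2.25 is 0b0010_0100 (36) in Q4.4.
p = q_mul(24, 36)         # raw product 864; truncated result 54
assert p / (1 << Q) == 3.375   # 1.5 * 2.25 = 3.375 (exact in this case)
```

Here the truncation happens to be lossless; with operands whose product needs more than 4 fractional bits, the discarded bits would cause the information loss discussed above.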
C. Floating-Point Representation of Real Numbers
The term floating point implies that there is no fixed number of digits before and after the decimal point; i.e. the decimal point can float. Floating-point representations are slower and less accurate than fixed-point representations, but can handle a larger range of numbers [3].
Because mathematics with floating-point numbers requires a great deal of computing power, many microprocessors come with a chip, called a floating-point unit (FPU), specialised for performing floating-point arithmetic. FPUs are also called math coprocessors or numeric coprocessors.
II. IEEE-754 FLOATING-POINT STANDARD
Hardware supporting different floating-point precisions and various formats has been adopted over the years. The MasPar MP-1 supercomputer performed floating-point operations using 4-bit slice operations on the mantissa with special normalisation hardware, and supported the 32-bit and 64-bit IEEE 754 formats [4].
Floating-point representation has a complex encoding
scheme with three basic components: mantissa, exponent and
sign. Usage of binary numeration and powers of 2 resulted in
floating point numbers being represented as single precision
(32-bit) and double precision (64-bit) floating-point numbers.
Both single- and double-precision numbers are defined by the IEEE 754 standard. According to the standard, a single-precision number has one sign bit, 8 exponent bits and 23 mantissa bits, whereas a double-precision number comprises one sign bit, 11 exponent bits and 52 mantissa bits.
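The field widths above can be checked with a short Python illustration (not part of the paper's VHDL) that unpacks the raw bits of a value:

```python
import struct

def fields32(x):
    """Split a single-precision float into (sign, exponent, mantissa) fields."""
    bits, = struct.unpack('>I', struct.pack('>f', x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fields64(x):
    """Split a double-precision float into (sign, exponent, mantissa) fields."""
    bits, = struct.unpack('>Q', struct.pack('>d', x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

# -1.0: sign 1, biased exponent 127 (single) / 1023 (double), mantissa 0.
assert fields32(-1.0) == (1, 127, 0)
assert fields64(-1.0) == (1, 1023, 0)
```

The biased exponents 127 and 1023 correspond to a true exponent of 0, consistent with the 8-bit and 11-bit exponent fields described above.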
A. Special Values
IEEE 754 reserves exponent field values of all 0s and all 1s to denote special values in the floating-point scheme, as explicated in [5]: +0, −0, +∞, −∞, positive denormalised numbers, negative denormalised numbers and Not a Number (NaN).
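These reserved encodings can be sketched as a Python classifier over the single-precision fields (an illustrative sketch, not the paper's implementation):

```python
def classify32(bits):
    """Classify a 32-bit IEEE-754 pattern by its reserved exponent values."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    mant = bits & 0x7FFFFF
    if exp == 0xFF:                      # exponent all 1s
        return 'NaN' if mant else ('-inf' if sign else '+inf')
    if exp == 0:                         # exponent all 0s
        if mant == 0:
            return '-0' if sign else '+0'
        return 'denormalised'
    return 'normal'

assert classify32(0x7F800000) == '+inf'
assert classify32(0xFF800000) == '-inf'
assert classify32(0x7FC00000) == 'NaN'
assert classify32(0x80000000) == '-0'
assert classify32(0x00000001) == 'denormalised'
```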
B. Exceptions in Floating-point Number Representation
An arithmetic exception arises when an attempted atomic
arithmetic operation has no result that would be acceptable
universally. IEEE 754 defines five basic types of floating
point exceptions: invalid operation, division by zero, overflow,
underflow and inexact as described in [6] and tabulated in
Table I.
C. Rounding Modes
Precision is not infinite and sometimes rounding a result is necessary. IEEE-754 supports four rounding modes: round to nearest, round towards zero, round towards +∞ and round towards −∞ [7].
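Python's decimal module offers analogues of these four modes (ROUND_HALF_EVEN for round to nearest even, ROUND_DOWN for towards zero, ROUND_CEILING and ROUND_FLOOR for towards ±∞), which can illustrate how a single value rounds differently under each mode:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR

x = Decimal('-1.5')
one = Decimal('1')  # quantise to an integer number of units

assert x.quantize(one, rounding=ROUND_HALF_EVEN) == Decimal('-2')  # nearest, tie to even
assert x.quantize(one, rounding=ROUND_DOWN) == Decimal('-1')       # towards zero
assert x.quantize(one, rounding=ROUND_CEILING) == Decimal('-1')    # towards +inf
assert x.quantize(one, rounding=ROUND_FLOOR) == Decimal('-2')      # towards -inf
```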
TABLE I
IEEE-754 FLOATING-POINT EXCEPTIONS

Exception         | Example                   | Default Result
Invalid           | 0/0, 0×∞                  | NaN
Division by zero  | x/0                       | ±∞
Overflow          | > max                     | ±∞
Underflow         | < min                     | Subnormals or zero
Inexact           | nearest(x op y) ≠ x op y  | Rounded result
III. FLOATING POINT ARITHMETIC
Floating Point arithmetic is by far the most used way of
approximating real number arithmetic for performing
numerical calculations on modern computers.
A. Floating Point Adder
Following the plan established in [8], the procedure for the addition operation is set out. This section also explains why these steps are necessary, in order to make the explanation of the code in the next section clearer and easier.

(F1 × 2^E1) + (F2 × 2^E2) = F × 2^E
The different steps are as follows:
1. Compare exponents. If not equal, shift smaller
fraction to right and add 1 to exponent (repeat).
2. Add fractions.
3. If result is 0, adjust for proper 0.
4. If fraction overflow, shift right and add 1.
5. If un-normalized, shift left and subtract 1 (repeat).
6. Check for exponent overflow.
7. Round fraction. If not normalized, go to step 4.
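The steps above can be modelled in Python on small (fraction, exponent) pairs. This is an illustrative sketch of the align-add-normalise flow for positive operands (rounding and exponent-overflow checking, steps 6 and 7, are omitted); it is not the paper's VHDL state machine:

```python
def fp_add(f1, e1, f2, e2, bits=4):
    """Add f1*2**e1 + f2*2**e2, where f1 and f2 are 'bits'-wide
    non-negative integer significands. Mirrors steps 1-5 of the text."""
    # Step 1: compare exponents; shift smaller fraction right, add 1 (repeat).
    while e1 < e2:
        f1 >>= 1; e1 += 1
    while e2 < e1:
        f2 >>= 1; e2 += 1
    # Step 2: add fractions.
    f, e = f1 + f2, e1
    # Step 3: if the result is 0, adjust for a proper zero.
    if f == 0:
        return 0, 0
    # Step 4: fraction overflow -> shift right and add 1 to the exponent.
    while f >= (1 << bits):
        f >>= 1; e += 1
    # Step 5: un-normalised -> shift left and subtract 1 (repeat).
    while f < (1 << (bits - 1)):
        f <<= 1; e -= 1
    return f, e

# 3.0 = 0b1100 * 2**-2 and 5.0 = 0b1010 * 2**-1 in a 4-bit significand.
assert fp_add(0b1100, -2, 0b1010, -1) == (0b1000, 0)   # 8.0 = 0b1000 * 2**0
```

Note that the right shift in step 1 silently drops low-order bits; the guard/round bits and rounding of step 7 exist precisely to control that loss in real hardware.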
1) VHDL Code for Addition of IEEE-754 Single Precision
Numbers
VHDL code for the addition of single-precision (32-bit) numbers was developed and then simulated using ModelSim SE Plus 6.5.
The VHDL code was broken down into seven possible states, where each state represents a step of the algorithm used to carry out the addition. The St signal is asserted to start the process; overflow and underflow are indicated by an exception signal. As soon as the addition is complete, the done signal is asserted to indicate the end of the operation.
Various sets of inputs are fed to the block to obtain the results. The following part of the document deals with the simulation and synthesis results.
A. ModelSim Simulation
Consider the addition of two decimal numbers, 12.3 and 1250.36, fed as FPinput1 = 12.3 (0 10000010 10001001100110011001101) and FPinput2 = 1250.36 (0 10001001 00111000100101110000101) to the algorithm, giving the desired output FPsum = 1262.66 (0 10001001 00111011101010100011110), as shown in Fig. 4.
B. Xilinx ISE Synthesis
VHDL code for the addition of IEEE-754 single-precision (32-bit) numbers is then synthesized for the XC3S5000 device in the FG900 package of the Spartan®-3 FPGA family. From the datasheet [9], this device has the attributes shown in Table II.
TABLE II
SPARTAN®-3 FPGA FG900 XC3S5000 ATTRIBUTES

Device                             | XC3S5000
System Gates                       | 5M
Equivalent Logic Cells             | 74,880
CLB Array (One CLB = Four Slices)  |
  Rows                             | 104
  Columns                          | 80
  Total CLBs                       | 8,320
Total Slices                       | 33,280
Max. User I/O                      | 633
Dedicated Multipliers              | 104
Table III shows the device utilisation summary for the VHDL code. It is observed that the number of device resources used is very small; hence, an optimum device utilisation is obtained.
TABLE III
FLOATING POINT ADDITION (SINGLE PRECISION):
DEVICE UTILISATION SUMMARY
From the timing report obtained, it is found that the maximum frequency at which the design can operate is 67.925 MHz (a minimum clock period of 14.722 ns). The minimum input arrival time before the clock (input setup time) is 9.323 ns and the maximum required time after the clock for the output is 7.1 ns.
2) VHDL Code for Addition of IEEE-754 Double Precision
Numbers
VHDL code for the addition of double-precision (64-bit) numbers was developed and then simulated using ModelSim SE Plus 6.5.
The VHDL code was broken down into seven possible states, where each state represents a step of the algorithm used to carry out the addition. The St signal is asserted to start the process; overflow and underflow are indicated by an exception signal. As soon as the addition is complete, the done signal is asserted to indicate the end of the operation.
Various sets of inputs are fed to the block to obtain the results. The following part of the document deals with the simulation and synthesis results.
A. ModelSim Simulation
Consider the addition of two decimal numbers, 1.25 and 6500.12, fed as FPinput1 = 1.25 (0 01111111111 0100000000000000000000000000000000000000000000000000) and FPinput2 = 6500.12 (0 10000001011 1001011001000001111010111000010100011110101110000101) to the algorithm, giving the desired output FPsum = 6501.37 (0 10000001011 1001011001010101111010111000010100011110101110000101), as shown in Fig. 5.
B. Xilinx ISE Synthesis
VHDL code for the addition of IEEE-754 double-precision (64-bit) numbers is then synthesized for the XC3S5000 device in the FG900 package of the Spartan®-3 FPGA family. From the datasheet [9], this device has the attributes shown in Table II.
Table IV shows the device utilisation summary for the VHDL code. It is observed that the number of device resources used is very small; hence, an optimum device utilisation is obtained.
TABLE IV
FLOATING POINT ADDITION (DOUBLE PRECISION):
DEVICE UTILISATION SUMMARY
From the timing report obtained, it is found that the maximum frequency at which the design can operate is 59.912 MHz (a minimum clock period of 16.691 ns). The minimum input arrival time before the clock (input setup time) is 9.601 ns and the maximum required time after the clock for the output is 7.242 ns.
B. Floating Point Multiplier
Following the plan established in [10], the procedure for the multiplication operation is set out. This section also explains why these steps are necessary, in order to make the explanation of the code in the next section clearer and easier.
The different steps are as follows:
1. Multiply the significands; i.e. (F1 × F2).
2. Place the decimal point in the result.
3. Add the exponents; i.e. (E1 + E2 − Bias).
4. Obtain the sign; i.e. S1 xor S2.
5. Normalize the result; i.e. obtain a 1 at the MSB of the result's significand.
6. Round the result to fit in the available bits.
7. Check for underflow/overflow occurrence.
1) VHDL Code for Multiplication of IEEE-754 Single
Precision Numbers
VHDL code for the multiplication of single-precision (32-bit) numbers was developed and then simulated using ModelSim SE Plus 6.5.
The VHDL code is broken down into small components that handle normalisation, rounding, packing and unpacking respectively. The operands are tested for special numbers such as zero, infinity and NaN. The exponents are added and the significands are multiplied; the sign bit is computed with an XOR operation.
Various sets of inputs are fed to the block to obtain the results. The following part of the document deals with the simulation and synthesis results.
A. ModelSim Simulation
Consider the multiplication of two decimal numbers, 102.3 and 1.253, fed as FP_A = 102.3 (0 10000101 10011001001100110011010) and FP_B = 1.253 (0 01111111 01000000110001001001110) to the algorithm, giving the desired output FP_Z = 128.1819 (0 10000110 00000000010111010010001), as shown in Fig. 6.
B. Xilinx ISE Synthesis
VHDL code for the multiplication of IEEE-754 single-precision (32-bit) numbers is then synthesized for the XC3S5000 device in the FG900 package of the Spartan®-3 FPGA family. From the datasheet [9], this device has the attributes shown in Table II.
Table V shows the device utilisation summary for the VHDL code. It is observed that the number of device resources used is very small; hence, an optimum device utilisation is obtained.
TABLE V
FLOATING POINT M ULTIPLICATION (SINGLE PRECISION):
DEVICE UTILISATION SUMMARY
From the timing report obtained, it is found that the maximum combinational path delay is 39.507 ns. The maximum combinational path delay applies only to paths that start at an input to the design and go to an output of the design without being clocked along the way.
2) VHDL Code for Multiplication of IEEE-754 Double
Precision Numbers
VHDL code for the multiplication of double-precision (64-bit) numbers was developed and then simulated using ModelSim SE Plus 6.5.
The VHDL code is broken down into small components that handle normalisation, rounding, packing and unpacking respectively. The operands are tested for special numbers such as zero, infinity and NaN. The exponents are added and the significands are multiplied; the sign bit is computed with an XOR operation.
Various sets of inputs are fed to the block to obtain the results. The following part of the document deals with the simulation and synthesis results.
A. ModelSim Simulation
Consider the multiplication of two decimal numbers, 1.25 and 6500.12, fed as FP_A = 1.25 (0 01111111111 0100000000000000000000000000000000000000000000000000) and FP_B = 6500.12 (0 10000001011 1001011001000001111010111000010100011110101110000101) to the algorithm, giving the desired output FP_Z = 8125.15 (0 10000001011 1111101111010010011001100110011001100110011001100110), as shown in Fig. 7.
B. Xilinx ISE Synthesis
VHDL code for the multiplication of IEEE-754 double-precision (64-bit) numbers is then synthesized for the XC3S5000 device in the FG900 package of the Spartan®-3 FPGA family. From the datasheet [9], this device has the attributes shown in Table II.
Table VI shows the device utilisation summary for the VHDL code. It is observed that the number of device resources used is very small; hence, an optimum device utilisation is obtained.
TABLE VI
FLOATING POINT M ULTIPLICATION (DOUBLE PRECISION):
DEVICE UTILISATION SUMMARY
From the timing report obtained, it is found that the maximum combinational path delay is 40.608 ns. The maximum combinational path delay applies only to paths that start at an input to the design and go to an output of the design without being clocked along the way.
IV. CONCLUSION
The importance and usefulness of the floating-point format nowadays is beyond discussion: any computer or electronic device that operates on real numbers implements this type of representation and these operations. Throughout the paper, the IEEE-754-compliant floating-point representation has been explained, together with floating-point arithmetic, namely addition and multiplication.
The VHDL code has been implemented so that all the operations are carried out with combinational logic, which achieves a faster response because there are no sequential devices such as flip-flops to delay the execution.
VHDL code for the addition and multiplication of single-precision (32-bit) and double-precision (64-bit) numbers has been developed and simulated with ModelSim SE Plus 6.5. The code was tested with special numbers such as zero, infinity (positive as well as negative) and NaN. The desired results were obtained, and thereby an efficient design was developed.
The code was synthesized for the XC3S5000 device in the FG900 package of the Spartan®-3 FPGA family, and synthesis results were obtained by means of Xilinx ISE 14.5. The device utilisation summaries specify the number of IOBs and LUTs utilised; the maximum frequency at which the design can operate and the static power dissipation are also estimated.
REFERENCES
[1] William R. Dieter, Akil Kaveti, Henry G. Dietz, "Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy", March 2007.
[2] Pat Kusbel, "Control and Computing System for the Compact Microwave Radiometer for Humidity Profiling", B.Tech thesis, Department of Electrical and Computer Engineering, Colorado State University, March 2006.
[3] Alex N. D. Zamfirescu, "Floating Point Type for Synthesis", CA, USA, 2000.
[4] John J. G. Savard (2012), Floating-Point Formats [Online]. Available: http://www.quadibloc.com/comp/cp0201.htm
[5] Steve Hollasch (2005), IEEE Standard 754 Floating Point Numbers [Online]. Available: http://steve.hollasch.net/cgindex/coding/ieeefloat.html
[6] Sun Microsystems, "Numerical Computation Guide", Ch. 4, Exceptions and Exception Handling.
[7] Nathan Whitehead, Alex Fit-Florea, "Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs", 2011.
[8] Arturo Barrabés Castillo, "Design of Single Precision Float Adder (32-Bit Numbers) According to IEEE 754 Standard Using VHDL", M.Tech thesis, Slovenská Technical University, Apr. 2012.
[9] Spartan-3 FPGA Family datasheet, Xilinx, 2013.
[10] R. Sai Siva Teja, A. Madhusudhan, "FPGA Implementation of Low-Area Floating Point Multiplier Using Vedic Mathematics", International Journal of Emerging Technology and Advanced Engineering, Dec 2013.
MODELSIM SIMULATION RESULTS
Fig. 4 Floating Point Addition (Single Precision): Timing Diagram
12.3 + 1250.36 = 1262.66
Fig. 5 Floating Point Addition (Double Precision): Timing Diagram
1.25 + 6500.12 = 6501.37
Fig. 6 Floating Point Multiplication (Single Precision): Timing Diagram
102.3 × 1.253 = 128.1819
Fig. 7 Floating Point Multiplication (Double Precision): Timing Diagram
1.25 × 6500.12 = 8125.15