18 - FPS 1

Floating Point Arithmetic – Part I
Motivation



Floating point representation and manipulation are
considered a key aspect in computer design
FLOPS (floating point operations per second) gives a
rough estimate of the performance of computers that must
perform precise mathematical operations
Floating point operations are inherently more complex
than integer operations


In addition or subtraction, the exponents must be made equal
before the operation
In multiplication the exponents are added together (in division
they are subtracted), and the result is normalized




All floating point arithmetic can be performed by
treating individual parts of the representation as
integers
The IEEE FPS is a widely accepted standard and
will be the representation used in this lecture.
A hardware implementation of floating point
arithmetic provides circuits that perform the
operations.
A software implementation requires less
hardware and uses program code to perform the
operations.
Addition and Subtraction
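A floating point number can be expressed as N where
$N = (-1)^s \cdot m \cdot 2^e$
Here n is the number of fraction bits (23 in single precision).
Conversion to S, F, E:
$S = s$
$E = e + 127$
$F = (m - 1) \cdot 2^n$
Conversion to s, m, e:
$s = S$
$e = E - 127$
$m = 1 + F / 2^n$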
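The two conversions can be sketched in Python as follows. This is only an illustration: the helper names (`unpack_fields`, `to_sme`, `to_SEF`) are made up for this example, single precision is assumed, and only normalized numbers are handled.

```python
import struct

FRACTION_BITS = 23   # n in the formulas above (single precision assumed)
BIAS = 127

def unpack_fields(x):
    """Return the stored fields (S, E, F) of x encoded as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    S = bits >> 31
    E = (bits >> FRACTION_BITS) & 0xFF
    F = bits & ((1 << FRACTION_BITS) - 1)
    return S, E, F

def to_sme(S, E, F):
    """Stored fields -> (s, m, e). Valid for normalized numbers only."""
    return S, 1 + F / 2**FRACTION_BITS, E - BIAS

def to_SEF(s, m, e):
    """(s, m, e) -> stored fields. Assumes 1 <= m < 2."""
    return s, e + BIAS, round((m - 1) * 2**FRACTION_BITS)

S, E, F = unpack_fields(2.25)
print(S, E, hex(F))          # 0 128 0x100000
print(to_sme(S, E, F))       # (0, 1.125, 1)
```

The round trip `to_SEF(*to_sme(S, E, F))` returns the original fields for any normalized value, since m is exactly representable in a Python float.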




To add two floating point numbers A and B, we must
first align their radix points. Let A be a number such that
its exponent is smaller than B’s.
Aligning the radix points means shifting the fraction
corresponding to the smaller exponent.
We have to increment A’s exponent until it is equal to B’s.
At the same time, the mantissa of A, including the hidden bit,
must be shifted to the right by the same number of positions
that the exponent of A was incremented.
We then add the mantissas of A and B.
Example 12.1
2.25 + 134.0625
Operands (A = 2.25, B = 134.0625):
0 1000 0000 (1)001 0000 0000 0000 0000 0000
0 1000 0110 (1)000 0110 0001 0000 0000 0000
After aligning A (exponent incremented by 6, mantissa shifted right 6 positions):
0 1000 0110 (0)000 0010 0100 0000 0000 0000
0 1000 0110 (1)000 0110 0001 0000 0000 0000
Sum (136.3125):
0 1000 0110 (1)000 1000 0101 0000 0000 0000
Note that this is already normalized
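The alignment step can be sketched with plain integers, treating each mantissa as a 24-bit value that includes the hidden bit. The function name `align` is hypothetical, and the sketch simply discards the bits shifted out on the right (no rounding), which is an assumption rather than something the lecture specifies.

```python
def align(E_a, M_a, E_b, M_b):
    """Shift the mantissa with the smaller exponent right until both
    operands share the larger exponent. Returns (E, M_a, M_b)."""
    if E_a < E_b:
        M_a >>= E_b - E_a      # bits shifted out are simply dropped
        E_a = E_b
    elif E_b < E_a:
        M_b >>= E_a - E_b
    return E_a, M_a, M_b

# Operands of Example 12.1: 2.25 and 134.0625
E, M_a, M_b = align(128, 0x900000, 134, 0x861000)
print(E, hex(M_a), hex(M_b))   # 134 0x24000 0x861000
print(hex(M_a + M_b))          # 0x885000
```

The sum 0x885000 has its hidden bit (bit 23) set and no carry beyond it, so no normalization is needed in this case.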




In general, when adding two positive
mantissas, the range of the resulting mantissa
is
$1 \le m < 4$
If $m < 2$, it is already normalized. If $m \ge 2$,
then it must be normalized.
Note that only a single shift is required since
it cannot be as large as four.
To normalize, simply add one to the
exponent of the result and shift the mantissa
to the right 1 bit position
Example 12.2
255.0625 + 134.0625
The exponents are already equal:
0 1000 0110 (1)111 1111 0001 0000 0000 0000
0 1000 0110 (1)000 0110 0001 0000 0000 0000
Sum of the mantissas, with a carry out of the hidden bit (“overflow”):
0 1000 0110(11)000 0101 0010 0000 0000 0000
To normalize: add 1 to the exponent and shift
the mantissa 1 bit to the right. The answer
(389.125) is:
0 1000 0111 (1)100 0010 1001 0000 0000 0000
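A minimal sketch of addition for two positive operands, including the single right shift used for normalization, might look like this. `add_positive` and `HIDDEN_BIT` are illustrative names; mantissas are 24-bit integers with the hidden bit included, shifted-out bits are dropped without rounding, and exponent overflow (E reaching 255) is not checked.

```python
HIDDEN_BIT = 1 << 23   # weight of the hidden bit in a 24-bit mantissa

def add_positive(E_a, M_a, E_b, M_b):
    """Add two positive normalized numbers given as (E, 24-bit mantissa)."""
    # Align: shift the mantissa that belongs to the smaller exponent.
    if E_a < E_b:
        M_a >>= E_b - E_a
        E = E_b
    else:
        M_b >>= E_a - E_b
        E = E_a
    total = M_a + M_b
    # A carry out of the hidden-bit position means 2 <= m < 4.
    if total >= 2 * HIDDEN_BIT:
        total >>= 1            # one right shift is always enough
        E += 1
    return E, total

# 255.0625 + 134.0625 = 389.125
E, M = add_positive(134, 0xFF1000, 134, 0x861000)
print(E, hex(M))               # 135 0xc29000
```

The result 0xC29000 corresponds to a mantissa of 1.100 0010 1001... with exponent field 135, that is 1.52001953125 × 2^8 = 389.125.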





The exponents can be positive or negative.
If both exponents are negative, the “smaller
exponent” means the more negative one.
In a biased-127 representation, the more negative
exponent always has the smaller value of E. Note
that E is unsigned.
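As a small illustration of this point, the sketch below maps true exponents to the stored field E and compares the fields as unsigned integers. The function name `biased` and the sample exponents are chosen only for this example.

```python
BIAS = 127

def biased(e):
    """Return the stored exponent field E for a true exponent e."""
    if not -126 <= e <= 127:
        raise ValueError("exponent outside the normalized single-precision range")
    return e + BIAS

# Two negative exponents: -10 is "smaller" than -3.
print(biased(-10), biased(-3))        # 117 124
print(biased(-10) < biased(-3))       # True
```

Because the bias is added to every exponent, an unsigned comparison of the E fields gives the same ordering as a signed comparison of the true exponents.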
Negative mantissas can also be handled by the
same algorithm.
To add a negative mantissa, first convert it to
2’s complement and add. Then convert the result
back to sign-magnitude.
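Example 12.3
2.25 + (-134.0625)
Operands after alignment:
0 1000 0110 (0)000 0010 0100 0000 0000 0000
1 1000 0110 (1)000 0110 0001 0000 0000 0000
Mantissas, sign extended to 32 bits (the negative one in 2’s complement):
0000 0000 0000 0010 0100 0000 0000 0000
1111 1111 0111 1001 1111 0000 0000 0000
Sum (negative, in 2’s complement):
1111 1111 0111 1100 0011 0000 0000 0000
Magnitude of the sum (2’s complement of the line above; sign = 1):
0000 0000 1000 0011 1101 0000 0000 0000
Answer (-131.8125):
1 1000 0110 (1)000 0011 1101 0000 0000 0000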
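The sign-magnitude to 2's complement round trip can be sketched as below. Python integers are unbounded, so a 32-bit word is emulated with a mask; the names `to_twos`, `from_twos` and `add_signed_aligned` are hypothetical, and the mantissas are assumed to be aligned already.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def to_twos(sign, magnitude):
    """Sign-magnitude mantissa -> 32-bit 2's complement word."""
    return (-magnitude) & MASK32 if sign else magnitude

def from_twos(word):
    """32-bit 2's complement word -> (sign, magnitude)."""
    if word & SIGN32:
        return 1, (-word) & MASK32
    return 0, word

def add_signed_aligned(S_a, M_a, S_b, M_b):
    """Add two aligned mantissas of either sign; returns (sign, magnitude)."""
    word = (to_twos(S_a, M_a) + to_twos(S_b, M_b)) & MASK32
    return from_twos(word)

# Aligned mantissas of 2.25 and -134.0625 (exponent field 134)
print(hex(to_twos(1, 0x861000)))                      # 0xff79f000
sign, mag = add_signed_aligned(0, 0x024000, 1, 0x861000)
print(sign, hex(mag))                                 # 1 0x83d000
```

The magnitude 0x83D000 still has its hidden bit set, so the result keeps exponent field 134 and represents -131.8125.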




Subtraction can be achieved by simply adding
the additive inverse of a number
The exponents are aligned and the mantissas are
converted to 2’s complement.
The mantissas are then added.
The result, if there is a need, is normalized.
Example 12.4
135.901 - 135.861
Operands (the second one negated):
0 1000 0110 (1)000 0111 1110 0110 1010 1000
1 1000 0110 (1)000 0111 1101 1100 0110 1010
Mantissas (the negative one in 2’s complement):
0000 0000 1000 0111 1110 0110 1010 1000
1111 1111 0111 1000 0010 0011 1001 0110
Unnormalized result:
0 1000 0110 (0)000 0000 0000 1010 0011 1110
Normalized result (12 subtracted from the exponent, mantissa shifted left 12 positions):
0 0111 1010 (1)010 0011 1110 0000 0000 0000
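Normalization after cancellation can be sketched as a loop that shifts left until the hidden bit is set. `normalize_left` is an illustrative name; the sketch assumes a nonzero mantissa below 2^24 and does not check for exponent underflow.

```python
HIDDEN_BIT = 1 << 23   # weight of the hidden bit in a 24-bit mantissa

def normalize_left(E, M):
    """Shift M left until its hidden bit is 1, decrementing E once per
    shift. Returns (E, M, number_of_shifts). M must be nonzero."""
    if M == 0:
        raise ValueError("a zero mantissa cannot be normalized")
    shifts = 0
    while not (M & HIDDEN_BIT):
        M <<= 1
        E -= 1
        shifts += 1
    return E, M, shifts

# Unnormalized result of 135.901 - 135.861
E, M, shifts = normalize_left(134, 0xA3E)
print(E, hex(M), shifts)       # 122 0xa3e000 12
```

Exponent field 122 is binary 0111 1010, and 0xA3E000 is the mantissa (1)010 0011 1110 followed by zeros.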



If the two numbers being subtracted are identical,
the subtraction yields a mantissa of zero.
No shifting can move a one into the hidden
bit position, thus this condition must be
explicitly detected and E = F = 0 is set.
In subtraction, if the exponents of the
numbers differ by 24 or more (the precision of
the mantissa), the alignment shift moves every bit
of the smaller number out, leaving it as zero
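Both special cases can be sketched together. The names `finish` and `aligned_mantissa` are hypothetical; returning a sign of 0 for an exactly cancelled result is an assumption of this sketch, and the input mantissa is assumed to be below 2^24.

```python
PRECISION = 24                       # mantissa bits including the hidden bit
HIDDEN_BIT = 1 << (PRECISION - 1)

def finish(S, E, M):
    """Turn a sign, exponent field and unnormalized mantissa into the
    stored fields (S, E, F)."""
    if M == 0:
        # No left shift can ever produce a hidden bit, so the zero
        # encoding E = F = 0 is forced explicitly.
        return 0, 0, 0               # assumption: exact cancellation gives +0
    while not (M & HIDDEN_BIT):
        M <<= 1
        E -= 1
    return S, E, M & (HIDDEN_BIT - 1)   # drop the hidden bit

def aligned_mantissa(M, exponent_gap):
    """Mantissa of the smaller operand after the alignment shift."""
    return M >> exponent_gap

print(finish(0, 134, 0))                        # (0, 0, 0)
print(aligned_mantissa(0xFFFFFF, 23))           # 1
print(aligned_mantissa(0xFFFFFF, 24))           # 0
```

Even the largest 24-bit mantissa becomes zero once the exponent gap reaches 24, so the larger operand passes through unchanged in this truncating sketch.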