# Floating Point Numbers ```Floating Point Numbers

In the decimal system, a decimal point
(radix point) separates the whole numbers
from the fractional part

Examples:
37.25 ( whole=37, fraction = 25)
123.567
10.12345678
Floating Point Numbers
For example, 37.25 can be analyzed as:
101
100
10-1
10-2
Tens Units
Tenths Hundredths
3
7
2
5
37.25 = 3 x 10 + 7 x 1 + 2 x 1/10 + 5 x 1/100
Binary Equivalent
The binary equivalent of a floating point
number can be computed by computing
the binary representation for each part
separately.


whole part: subtraction or division
Fractional part: subtraction or multiplication
Binary Equivalent
In the binary representation of a floating
point number the column values will be as
follows:
… 26 25 24 23 22 21 20 . 2-1 2-2 2-3
2-4 …
… 64 32 16 8 4 2 1 . 1/2 1/4 1/8 1/16 …
… 64 32 16 8 4 2 1 . .5 .25 .125 .0625 …
Finding Binary Equivalent of fraction part
Converting .25 using Multiplication method.
Step 1 : multiply fraction by 2 until fraction becomes 0
.25
x2
0.5
x2
1.0
Step 2 Collect the whole parts and place them after the radix point
64 32 16 8 4 2 1 . .5 .25 .125 .0625
. 0
1
Finding Binary Equivalent of fraction part
Converting .25 using subtraction method.
Step 1: write positional powers of two and column
values for the fractional part
. 2-1 2-2 2-3 2-4 2 -5
. &frac12; &frac14; 1/8 1/16 1/32
. .5 .25 .125 .0625 0.03125
Finding Binary Equivalent of fraction part
Converting .25 using subtraction method.
Step 2: start subtracting the column values from left to right,
place a 0 if the value cannot be subtracted or 1 if it can
until the fraction becomes .0 .
.25
2 1 . .5 .25 .125 .0625
- .25
. 0 1
.0
Binary Equivalent of FP number
Given 37.25, convert 37 and .25 using subtraction method.
64 32 16 8 4 2 1 . .5 .25 .125 .0625
26 25 24 23 22 21 20 . 2-1 2-2 2-3 2-4
1 0 0 1 0 1 . 0 1
37
.25
- 32
- .25
5
.0
-4
1
-1
0
37.2510 = 100101.012
So what is the Problem?
Given the following binary representation:
37.2510 = 100101.012
7.62510 = 111.1012
0.312510 = 0.01012
How we can represent the whole and
fraction part of the binary rep. in 4 bytes?
Solution is Normalization
Every binary number, except the one
corresponding to the number zero, can be
normalized by choosing the exponent so that the
radix point falls to the right of the leftmost 1 bit.
37.2510 = 100101.012 = 1.0010101 x 25
7.62510 = 111.1012 = 1.11101 x 22
0.312510 = 0.01012 = 1.01 x 2-2
So what Happened ?
After normalizing, the numbers now have
different mantissas and exponents.
37.2510 = 100101.012 = 1.0010101 x 25
7.62510 = 111.1012 = 1.11101 x 22
0.312510 = 0.01012 = 1.01 x 2-2
IEEE Floating Point Representation
Floating point numbers can be represented
by binary codes by dividing them into three
parts:
the sign, the exponent, and the mantissa.

1
2
9
10
32
IEEE Floating Point Representation

The first, or leftmost, field of our floating point
representation will be the sign bit:
 0 for a positive number,
 1 for a negative number.
IEEE Floating Point Representation


The second field of the floating point number will be the
exponent.
Since we must be able to represent both positive and
negative exponents, we will use a convention which uses
a value known as a bias of 127 to determine the
representation of the exponent.



An exponent of 5 is therefore stored as 127 + 5 or 132;
an exponent of -5 is stored as 127 + (-5) OR 122.
The biased exponent, the value actually stored, will
range from 0 through 255. This is the range of values that
can be represented by 8-bit, unsigned binary numbers.
IEEE Floating Point Representation

The mantissa is the set of 0’s and 1’s to the
left of the radix point of the normalized
(when the digit to the left of the radix point
is 1) binary number.


ex:1.00101 X 23
The mantissa is stored in a 23 bit field,
Converting decimal floating point values to stored IEEE
standard values.
Example: Find the IEEE FP representation of
40.15625.
Step 1.
Compute the binary equivalent of the whole part
and the fractional part. ( convert 40 and .15625.
to their binary equivalents)
Converting decimal floating point values to stored IEEE
standard values.
40
- 32
8
- 8
0
Result:
101000
.15625
- .12500
.03125
- .03125
.0
So: 40.1562510 = 101000.001012
Result:
.00101
Converting decimal floating point values to stored IEEE
standard values.
Step 2. Normalize the number by moving
the decimal point to the right of the
leftmost one.
101000.00101 = 1.0100000101 x 25
Converting decimal floating point values to stored IEEE
standard values.
Step 3. Convert the exponent to a biased
exponent
127 + 5 = 132
==&gt;
13210 = 100001002
Converting decimal floating point values to stored IEEE
standard values.
Step 4. Store the results from above
Sign
0
Exponent (from step 3)
10000100
Mantissa ( from step 2)
01000001010 .. 0
Covert 40.15625 to IEEE 32-bit format
40.15625
Step 1
Find binary of whole number
40-32-8
Binary value of 40
32 16 8 4 2 1
1 0 1 0 0 0
Step 1b
Find binary of fraction number
.15625 -.1250 - .03125
Binary value of .15625
.5 .25 .125 .0625 .03125
0 0
1
0
1
Step 2a
Move decimal (Radix point) to the
right of the left most 1 to come up with
the exponent
.
101000.00101
Step 2b
Normalize by multipling the number by 2
and the exponent from step 3
Equals
101000.00101 x 25 = 1.0100000101
Step 3a
convert the exponent to a biased exponent
by adding the bias of 127 to the exponent
127 + 5 = 13210
Step 3b
Convert the biased exponent to its binary
value
Binary value of 132
132-128-4
64 32 16 8 4 2 1
1 0 0 0 1 0 0
Step 4a
Piece together values.
Step 4c. Left pad exponent and right
pad mantissa to determine the binary
equivalent of IEEE
Sign Exponent
0
10000100
0 010000100
Mantissa
0100000101
01000001010…0
Sign Exponent
0 010000100
Mantissa
01000001010…0
Step 4b
Move to mantissa
without the 1.
Converting decimal floating point values to stored IEEE
standard values.
Ex : Find the IEEE FP representation of –24.75
Step 1. Compute the binary equivalent of the whole
part and the fractional part.
24
.75
- 16 Result:
- .50 Result:
8
11000
.25
.11
- 8
- .25
0
.0
So: -24.7510 = -11000.112
Converting decimal floating point values to stored IEEE
standard values.
Step 2.
Normalize the number by moving the
decimal point to the right of the leftmost
one.
-11000.11 = -1.100011 x 24
Converting decimal floating point values to stored IEEE
standard values.
Step 3. Convert the exponent to a biased exponent
127 + 4 = 131
==&gt; 13110 = 100000112
Step 4. Store the results from above
Sign
Exponent
mantissa
1
10000011
1000110..0
Converting from IEEE format to the decimal floating
point values.

Do the steps in reverse order

In reversing the normalization step move
the radix point the number of digits equal
to the exponent. if exponent is +ve move to
the right, if –ve move to the left.
Converting from IEEE format to the decimal floating
point values.
Ex: Convert the following 32 bit binary numbers to
their decimal floating point equivalents.
a.
Sign
Exponent
Mantissa
1
01111101
010..0
Converting from IEEE format to the decimal floating
point values.
Step 1 Extract exponent (unbias exponent)
biased exponent = 01111101 = 125
exponent: 125 - 127=
-2
Converting from IEEE format to the decimal floating
point values.
Step 2 Write Normalized number
Exponent
1 . ____________ x 2
mantissa
-1. 01 x 2 –2
----
Converting from IEEE format to the decimal floating
point values.
Step 3: Write the binary number (denormalize value
from step2)
-0.01012
Step 4: Convert binary number to FP equivalent (
-0.01012 = - ( 0.25 + 0.0625) = -0.3125
Converting from IEEE format to the decimal floating
point values.
Ex: Convert the following 32 bit binary
numbers to their decimal floating point
equivalents.
Sign
0
Exponent
10000011
Mantissa
1101010..0
Converting from IEEE format to the decimal floating
point values.
Step 1 Extract exponent (unbias exponent)
biased exponent = 10000011 = 131
exponent: 131 - 127= 4
Converting from IEEE format to the decimal floating
point values.
Step 2 Write Normalized number
Exponent
1 . ____________ x 2
mantissa
1. 110101 x 2 4
----
Converting from IEEE format to the decimal floating
point values.
Step 3 Write the binary number (denormailze value
from step 2)
11101.012
Step 4 Convert binary number to FP equivalent ( add
column values)
11101.012 = 16 + 8 + 4 + 1 + 0.25 = 29.2510
Convert
0 10000100 01000001010…0 back to IEEE
Sign Exponent
0 010000100
Step 1a
Determine the decimal of the binary
number
Step 1b
Unbias the number by Subtracting 127
from the decimal number to determine
the exponent
Mantissa
01000001010…0
128 32 16 8 4 2 1
1 0 0 0 1 0 0
128+4
Binary value = 132
Step 2
Denormalize by multipling the number by 2 and
the exponent from step 2.
Set up format 1 mantissa x 2 exponent
132 – 127 = 5
1.0100000101 x 25
.
Step 3a
Move decimal (Radix point) to the right of
the left most 1 to come up with the exponent
Equals
101000 . 001012
Step 3b
Convert binary number to FP equvalent
Step 4a
Find whole number of exponent
32 16 8 4 2 1
1 0 1 0 0 0
.5 .25 .125 .0625 .03125
0 0
1
0
1
32+8 = 40
.1250 + .03125 = .15625
Step 4c
Add together values. Make sure to include
the sign if it is a negative value
40.15625
Step 4b
Find fractional numberof
the mantissa
```