Floating Point Presentation

Data Representation: Floating Point

13/04/2015

Learning Objectives

• Demonstrate an understanding of floating point representation of a real binary number.
• Normalise a real binary number.
• Discuss the trade-off between accuracy and range when representing numbers.

Binary Range

The range is limited by the number of bits used to represent a number. More bits means a wider range. But even using 4 bytes (32 bits) to represent an unsigned whole number means that 4,294,967,295 (2^32 - 1) is the largest number which can be held.

Fixed Point Binary

A number with a decimal point is known (strangely!) as a real number, as opposed to an integer, which is a whole number. We can extend the binary system to represent real numbers by reserving some bits for the fractional part.

Place value:  8    4    2    1   .   ½    ¼   1/8  1/16
Bit:          0    1    1    0   .   1    1    0    0

6.75 = 0110.1100

Fixed Point Binary Range

The range is now even more limited, as some bits are reserved for the fractional part and so are no longer being used to hold higher numbers! The range is more limited still if we also wish to represent negative numbers, as the first (most significant) bit will need to be a sign bit.

Fixed Point Binary - Decimal Converter

Try using it to ‘play’ with fixed point binary.

Fixed Point Binary Precision

Also, the fractional part can only hold 4 places. Any places after the first 4 will be either rounded or truncated, so precision will be lost. This might at first appear to be accurate enough for most purposes. However, each binary digit after the point is worth half of the previous one, not 1/10 of it as in decimal. For example:

110.1  = 6.5
110.11 = 6.75

With two binary places we have missed out 6.51 to 6.74! This means accuracy is poor.

Floating Point (Fractional Real Numbers)

This increases the possible range of stored real numbers but not the accuracy (accuracy is only improved by using more bits), e.g.
Standard form (also referred to as “scientific notation”):

1,400,000,000,000 (decimal) = 1.4 * 10^12

1.4 = mantissa, 12 = exponent

A number is therefore held in two parts:

• Mantissa: some websites state that this is more correctly known as the significand, and that the mantissa is the number before the decimal point. But Cambridge exams have always used the term mantissa for the “whole digit string”, e.g. 14 in the example above (at least up to now).
• Exponent

The number could be represented as 14 12 if it was understood that the first part is the mantissa and the second part is the exponent.

Mantissa & Exponent: 1 byte each

Most exam questions appear to use 8 bits for the mantissa and 8 bits for the exponent. Try ‘playing’ with the Floating Point Binary - Decimal Converter. Also use it whenever you need to for the rest of this presentation.

Mantissa

Represents the magnitude of the number and is the fractional part of the representation. The place value of the MSB is -1 and the other bits are ½, ¼ and so on.

Exponent

Represents the power of 2 by which the mantissa must be multiplied to give the original value.

Positive Mantissa & Positive Exponent: Denary -> Floating Point Binary

6.5

6.5 = 6½ = 110.1000 (fixed point binary)
    = 0.1101 * 2^3
    = 0.1101 * 2^11 (exponent in binary)

Add 0’s to the right of the mantissa and to the left of the exponent (between the sign bit and the existing digits).

Mantissa: 0 1101 000
Exponent: 0 00000 11

The leading 0 of each byte is its sign bit.

Try this independently first.

Using an 8-bit byte for the mantissa and another 8-bit byte for the exponent, show 1.75 as a 2-byte floating point number in two’s complement form.

1.75

1.75 = 1 + ½ + ¼ = 1.11 (binary, fixed point)
     = 0.111 * 2^1
     = 0.111 * 2^00000001
     = 01110000 (mantissa) 00000001 (exponent)

Positive Mantissa & Positive Exponent: Floating Point Binary -> Denary

01101000 00000011

Exponent: 00000011 = 3
Mantissa: 0.1101000 (assumed binary point between the sign bit and the 2nd bit)
0.1101000 * 2^3 = 110.1
= 6.5

Try this independently first.

Using 8 bits for the mantissa and 8 bits for the exponent, and storing the mantissa and the exponent in two’s complement form, give the denary number which would have 01011000 00000011 as its binary floating point representation.

01011000 00000011

Exponent: 00000011 = 3
Mantissa: 0.1011000 (assumed binary point between the sign bit and the 2nd bit)
0.1011000 * 2^3 = 101.1000 = 5.5

Positive Mantissa & Negative Exponent: Denary -> Floating Point Binary

0.125

0.125 = 1/8 = 0.001 (binary, fixed point)
      = 0.1 * 2^-2

Exponent: -2 = -00000010; in two’s complement = 1 1111110

Result: 01000000 (mantissa) 11111110 (exponent)

Try this independently first.

Using an 8-bit byte for the mantissa and another 8-bit byte for the exponent, show 0.375 as a 2-byte floating point number in two’s complement form.

0.375

0.375 = ¼ + 1/8 = 0.011 (binary, fixed point)
      = 0.11 * 2^-1

Exponent: -1 = -00000001; in two’s complement = 1 1111111

Result: 0 1100000 (mantissa) 1 1111111 (exponent)

Positive Mantissa & Negative Exponent: Floating Point Binary -> Denary

01000000 11111110

Exponent: 11111110 is negative; undo two’s complement = -00000010 = -2
0.1000000 * 2^-2 = 0.001 (binary, fixed point) = 1/8 = 0.125

Try this independently first.

Using 8 bits for the mantissa and 8 bits for the exponent, and storing the mantissa and the exponent in two’s complement form, give the denary number which would have 01100000 11111111 as its binary floating point representation.

01100000 11111111

Exponent: 11111111; undo two’s complement = -00000001 = -1 (denary)
0.1100000 * 2^-1 = 0.01100000 = ¼ + 1/8 = 0.25 + 0.125 = 0.375

Negative Mantissa & Positive Exponent: Denary -> Floating Point Binary

-1.5

1.5 = 1.1 (binary)
-1.1 = -0.11 * 2^1
     = 1.01 * 2^00000001 (mantissa in two’s complement)
     = 1 0100000 (mantissa) 0 0000001 (exponent)

Try this independently first.
Using an 8-bit byte for the mantissa and another 8-bit byte for the exponent, show -1.25 as a 2-byte floating point number in two’s complement form.

-1.25

-1.25 = -(1 + ¼) = -1.01 (binary, fixed point)
      = -0.101 * 2^1
      = 1.011 * 2^00000001 (mantissa in two’s complement)
      = 1 0110000 (mantissa) 0 0000001 (exponent)

Negative Mantissa & Positive Exponent: Floating Point Binary -> Denary

11101000 00000011

Exponent: 00000011 = 3
Mantissa: 1.1101000; undo two’s complement = -0.0011000
-0.0011000 * 2^3 = -0001.1 = -1.5

• You may notice that, as shown previously, -1.5 can also be shown as 1 0100000 0 0000001.
• This is because 11101000 00000011 is not normalised, which is something we will look at later.

Try this independently first.

Using 8 bits for the mantissa and 8 bits for the exponent, and storing the mantissa and the exponent in two’s complement form, give the denary number which would have 10111010 00000011 as its binary floating point representation.

10111010 00000011

Exponent: 0 0000011 = 3
Mantissa: 1.0111010; undo two’s complement = -0.1000110
-0.1000110 * 2^3 = -0100.0110 = -(4 + ¼ + 1/8) = -4.375

Negative Mantissa & Negative Exponent: Denary -> Floating Point Binary

-0.125

-0.125 = -1/8 = -0.001 (binary, fixed point)
       = -0.1 * 2^-2
       = -0.1 * 2^-00000010
       = 1.1000000 * 2^11111110 (both parts in two’s complement)
       = 1 1000000 (mantissa) 11111110 (exponent)

Try this independently first.

Using an 8-bit byte for the mantissa and another 8-bit byte for the exponent, show -0.25 as a 2-byte floating point number in two’s complement form.
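Answers to exercises like this one can be checked by decoding the two bytes back to denary. The following is a minimal Python sketch, assuming the 8-bit mantissa and 8-bit exponent format used in this presentation; the function names `twos_complement` and `decode` are illustrative and not part of the original slides.

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # sign bit set, so the value is negative
        value -= 1 << len(bits)
    return value


def decode(mantissa: str, exponent: str) -> float:
    """Return the denary value of mantissa * 2^exponent.

    The binary point is assumed to sit just after the mantissa's sign bit,
    so the mantissa integer is scaled down by 2^(number of bits - 1).
    """
    m = twos_complement(mantissa) / (1 << (len(mantissa) - 1))
    e = twos_complement(exponent)
    return m * 2.0 ** e


# Worked examples taken from the slides above
print(decode("01101000", "00000011"))  # 6.5
print(decode("01000000", "11111110"))  # 0.125
print(decode("11101000", "00000011"))  # -1.5
print(decode("10111010", "00000011"))  # -4.375
```

Because the sign bit of the mantissa carries a place value of -1, the same function handles positive and negative mantissas without a separate "undo two’s complement" step.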
-0.25

-0.25 = -¼ = -0.01 (binary, fixed point)
      = -0.1 * 2^-1
      = -0.1000000 * 2^-00000001
      = 1.1000000 * 2^11111111 (both parts in two’s complement)
      = 11000000 (mantissa) 11111111 (exponent)

Negative Mantissa & Negative Exponent: Floating Point Binary -> Denary

10000000 11111101

Exponent: 11111101; undo two’s complement = -00000011 = -3
Mantissa: 1.0000000; undo two’s complement = -1.0000000
-1.0000000 * 2^-3 = -0.001 = -1/8 = -0.125

Note that the mantissa looks the same in two’s complement form as in non-two’s complement form because the last (rightmost) 1 is at the beginning.

Try this independently first.

Using 8 bits for the mantissa and 8 bits for the exponent, and storing the mantissa and the exponent in two’s complement form, give the denary number which would have 10000000 11111110 as its binary floating point representation.

10000000 11111110

Exponent: 11111110; undo two’s complement = -00000010 = -2
Mantissa: 1.0000000; undo two’s complement = -1.0000000
-1.0000000 * 2^-2 = -0.01 = -0.25

• You may notice that, as shown previously, -0.25 can also be shown as 1 1000000 1 1111111.
• This is because 10000000 11111110 is the normalised form and 11000000 11111111 is not. Normalisation is something we will look at next.

Denary -> Floating Point

1. Convert the fractional part of the denary number to fractions.
2. Convert to fixed point binary (keep the - sign if there is one).
3. Move the binary point to the left-hand side of the first 1, counting how many places it moves and noting the direction.
4. Write the result as 0.number * 2^(number of places from step 3), with the exponent in denary.
5. Write the result as 0.number * 2^(number of places from step 3), with the exponent in binary.
   • If the point was moved right, the exponent is negative: use a - sign and then convert it to two’s complement.
6. First binary number: remove the binary point (keeping the leading 0) and add any necessary 0’s to the right (to make 8 bits). Convert to two’s complement if negative and remove the - sign. This is the mantissa.
7. Second binary number: add any necessary 0’s to the left (to make 8 bits, including the sign bit). This is the exponent.

Floating Point -> Denary

1. Convert the exponent to denary.
2. If its sign bit = 1, the exponent is negative, so convert it from two’s complement.
3. Write the number as mantissa * 2^exponent (exponent in denary).
   • Convert the mantissa from two’s complement if its sign bit = 1 and insert, for our benefit, a - sign.
   • Insert the assumed binary point after the sign bit.
4. Move the binary point the exponent number of places (right if the exponent is positive, left if it is negative).
5. Convert to denary as fixed point binary.

Decimal Normalisation

34,568,000 = 3456.8 * 10^4
           = 0.34568 * 10^8
           = 3.4568 * 10^7

The last way is more efficient and is the typical “correct” way to use scientific notation. This form is called the normalised form.

Floating Point Binary Normalisation

In binary the normalised form is used to maximise efficiency and to have only one way to represent a number. The mantissa is said to be normalised if the first two bits are different.

• For positive numbers, the first bit is always 0 and the second is always 1.
• For negative numbers, the first bit is always 1 and the second is always 0.

Normalising Floating Point Numbers

1. Convert the exponent to denary.
2. Shift the mantissa (not the sign bit) as many places to the left as necessary to achieve a leading 1 (if positive, i.e. sign bit = 0) or a leading 0 (if negative, i.e. sign bit = 1).
3. Subtract the number of places that were necessary from the exponent and convert back to binary.

0 0001101 00000010

1. The exponent 00000010 = 2.
2. The mantissa 0 0001101 has to be shifted 3 places left to achieve a leading 1 (not including the sign bit), i.e. 0 1101000.
3. So the exponent should be 2 - 3 = -1 = -00000001 = 1 1111111 in two’s complement.

So normalised: 01101000 11111111

1 1111001 00000011

1. The exponent 00000011 = 3.
2. The mantissa 1 1111001 has to be shifted 4 places left to achieve a leading 0 (not including the sign bit), i.e. 1 0010000.
3. So the exponent should be 3 - 4 = -1 = -00000001 = 1 1111111 in two’s complement.

So normalised: 10010000 11111111

Try this independently first.

Normalise these floating point binary numbers:

• 11101000 00000011
• 11000000 11111111

11101000 00000011

1. The exponent 00000011 = 3.
2. The mantissa 11101000 has to be shifted 2 places left to achieve a leading 0 (not including the sign bit), to make 10100000.
3. So the exponent should be 3 - 2 = 1 = 00000001.

So normalised: 1 0100000 0 0000001

11000000 11111111

1. The exponent 11111111 = -00000001 = -1 (denary).
2. The mantissa 11000000 has to be shifted 1 place left to make the 2nd bit 0 and achieve 10000000.
3. So the exponent should be -1 - 1 = -2 = -00000010 = 11111110 in two’s complement.

So normalised: 10000000 11111110

If you are asked to give the floating point binary form of a denary number and to make sure it is normalised, convert as practised and then normalise if necessary.

Numbers are held in floating point form with one byte for the mantissa (fraction) and one byte for the exponent (characteristic). All values are held in two’s complement form and the mantissa is normalised. Using this format, write down the binary floating point values and the denary values of

(i) the largest magnitude positive number;
(ii) the smallest magnitude positive number;
(iii) the largest magnitude negative number;
(iv) the smallest magnitude negative number.

(The denary values may be left as a product of a power of 2.)

Floating Point Binary - Decimal Converter

Either use your own understanding or the Floating Point Binary - Decimal Converter to help you do the last slide independently first.

The largest magnitude positive number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent:

0 1111111 * 2^01111111 = 127/128 * 2^127

The smallest magnitude positive number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent:

0 1000000 * 2^10000000 = 0.5 * 2^-128

The largest magnitude negative number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent:
1 0000000 * 2^01111111 = -1 * 2^127

The smallest magnitude negative number that can be held in a floating point system using 8 bits for the mantissa and 8 bits for the exponent:

1 0111111 * 2^10000000 = -65/128 * 2^-128

Improving Accuracy of Binary Floating Point Numbers

If we want to improve accuracy without using more bits overall, we must use more bits for the mantissa by reducing the number of bits for the exponent.

• More digits can then be represented after the binary point.

However, the range would be decreased, as the exponent could not be as large as before.

• So the largest power of two by which the mantissa can be multiplied is decreased.

Representing Zero

Using the Floating Point Binary - Decimal Converter:

• Try representing 0 as a non-normalised binary floating point number.
• Now try representing 0 as a normalised floating point number. Can you? Why?

Representing Zero

A normalised value must have the first two bits of the mantissa different. Therefore one of them must be a 1, which represents either -1 or +½, so the mantissa can never be zero.

Floating Point Binary

You may now be thinking ‘If the range is so large, why don’t we use floating point binary representation for all numbers (including integers)?’ However, it is more complicated to perform arithmetic on floating point numbers than on integers, and so they are slower to work with. Because of this, floating point representation is only used with real fractional numbers or with integers outside the range of about -2 billion to +2 billion (which is roughly the limit for a signed 4-byte integer).

Plenary

Give the denary number which would have 01000000 00000000 as its binary floating point representation in this computer.

Plenary

½ or 0.5

Plenary

Show

• 10½
• -10½

as 2-byte, normalised, floating point numbers.
Plenary

10½  = 01010100 00000100
-10½ = 10101100 00000100

Plenary

Explain the effect on the

• range
• accuracy

of the numbers that can be stored if the number of bits in the exponent is reduced.

Plenary

• Range is decreased, because the largest power of two by which the mantissa can be multiplied is decreased.
• Accuracy is increased (assuming the freed bits are given to the mantissa), because more digits are represented after the binary point.
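The denary -> floating point steps and the normalising procedure from this presentation can be sketched in Python as follows. This is a minimal sketch, assuming the 8-bit mantissa and 8-bit exponent format used throughout; the names `to_bits`, `encode` and `normalise` are illustrative and not part of the original slides. `encode` uses the standard library function `math.frexp` as a shortcut for "move the binary point and count the places" rather than reproducing the manual steps, and it rejects values that do not fit exactly into 8 mantissa bits instead of rounding them.

```python
import math


def to_bits(value: int, width: int = 8) -> str:
    """Two's complement bit string of a (possibly negative) integer."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def encode(x: float) -> tuple[str, str]:
    """Return (mantissa, exponent) bit strings for x in normalised form."""
    if x == 0:
        raise ValueError("zero has no normalised representation")
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    if m == -0.5:                   # normalised negatives start 10..., so use -1
        m, e = -1.0, e - 1
    scaled = m * 128                # 7 bits after the binary point
    if scaled != int(scaled):
        raise ValueError("value needs more than 8 mantissa bits")
    if not -128 <= e <= 127:
        raise ValueError("exponent does not fit in 8 bits")
    return to_bits(int(scaled)), to_bits(e)


def normalise(mantissa: str, exponent: str) -> tuple[str, str]:
    """Shift the mantissa left until its first two bits differ."""
    if "1" not in mantissa:
        raise ValueError("zero cannot be normalised")
    e = int(exponent, 2) - (256 if exponent[0] == "1" else 0)
    while mantissa[0] == mantissa[1]:
        # Keep the sign bit, drop the bit after it, pad a 0 on the right
        mantissa = mantissa[0] + mantissa[2:] + "0"
        e -= 1
    if e < -128:
        raise ValueError("exponent does not fit in 8 bits")
    return mantissa, to_bits(e)


print(encode(10.5))                       # ('01010100', '00000100')
print(encode(-10.5))                      # ('10101100', '00000100')
print(normalise("00001101", "00000010"))  # ('01101000', '11111111')
print(normalise("11111001", "00000011"))  # ('10010000', '11111111')
```

Refusing to encode zero mirrors the point made on the Representing Zero slides: a normalised mantissa always contains a 1 in one of its first two bits.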
