Lecture 4: Data Representation Floating Point Representation

Representing Floating Point Numbers
● A fixed point notation (e.g. two's complement) allows a range of positive and negative integers centered around 0 to be represented
● By assuming a fixed binary or radix point, this format would allow numbers with a fractional component to be represented
● But this approach has limitations
  – very large numbers or very small fractions cannot be represented
● Floating point representation allows us to represent very large numbers and very small fractions
● Computers are equipped with specialised hardware that performs floating point arithmetic (FPU – Floating Point Unit)

Scientific Notation
● Very large or very small numbers can be represented using scientific notation
    n = ± s x b^e
● defined by
  – ± the sign of the number
  – s the significand or mantissa
  – e the exponent
  – b the base
● examples
  – 0.00000000000000034 = 3.4 x 10^-16
  – 976 000 000 000 000 = 9.76 x 10^14

Fixed Width
● Assume that
  – s has three digits (base 10) and 0.1 <= |s| <= 0.999
  – e has two digits (base 10)
  – and both have a sign
● We can represent numbers between -0.999 x 10^99 and 0.999 x 10^99
  – with a magnitude that ranges from 0.100 x 10^-99 to 0.999 x 10^99
  – with just 5 digits and two signs!
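To make the fixed-width decimal format concrete, here is a minimal Python sketch that rewrites a number as s x 10^e with a three-digit significand in the range 0.1 <= |s| < 1 and a two-digit exponent. The function name to_fixed_width is hypothetical (it is not from the lecture), and the choice of round-half-even when a number has more than three significant digits is an assumption; the slides do not say how extra digits are handled.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN


def to_fixed_width(text):
    """Return (s, e) such that the value is approximately s x 10**e,
    with 0.1 <= |s| < 1, s holding at most three digits, -99 <= e <= 99.

    Hypothetical helper for illustration only.
    """
    # Assumption: round to three significant digits using round-half-even.
    ctx = Context(prec=3, rounding=ROUND_HALF_EVEN)
    d = ctx.create_decimal(text)
    if d.is_zero():
        raise ValueError("zero has no normalised form in this toy format")
    # adjusted() is the power of ten of the leading digit; adding 1 moves
    # the decimal point so that the leading digit sits just after it.
    e = d.adjusted() + 1
    if not -99 <= e <= 99:
        raise OverflowError("exponent does not fit in two decimal digits")
    s = d.scaleb(-e)
    return s, e


print(to_fixed_width("976000000000000"))      # (Decimal('0.976'), 15)
print(to_fixed_width("0.00000000000000034"))  # (Decimal('0.34'), -15)
```

Note that 0.976 x 10^15 and 0.34 x 10^-15 are the same values as the scientific notation examples 9.76 x 10^14 and 3.4 x 10^-16; only the position of the decimal point in the significand differs.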
Floating Point Representation

Field layout (widths in bits):
    1     8            23
  | S | Exponent | Significand / mantissa |

● The left-most bit stores the sign of the number (0 = positive, 1 = negative)
● The exponent is stored using excess or biased notation
  – a typical value (usually 2^(k-1) - 1, or 2^(k-1)), known as the bias, is subtracted from the field to obtain the true exponent value (where k is the number of bits in the exponent)
● The significand is stored in the remaining 23 bits

Excess or Biased Representation for n bits
● To convert a decimal number to n-bit excess or biased representation
  – add the bias (2^(n-1) or 2^(n-1) - 1) to the decimal number
  – convert the result to (n-bit) binary
● To convert n-bit excess or biased values to decimal
  – convert the n-bit excess value to its decimal equivalent
  – subtract the bias (2^(n-1) or 2^(n-1) - 1) from the result

Excess or Biased Representation for 4 bits
● The sign of the exponent is encapsulated in the left-most bit of the number (positive numbers 1, negative 0)

  Binary   Decimal   Bias 2^(k-1)   Bias 2^(k-1) - 1
  0000      0        -8             -7
  0001      1        -7             -6
  0010      2        -6             -5
  0011      3        -5             -4
  0100      4        -4             -3
  0101      5        -3             -2
  0110      6        -2             -1
  0111      7        -1              0
  1000      8         0              1
  1001      9         1              2
  1010     10         2              3
  1011     11         3              4
  1100     12         4              5
  1101     13         5              6
  1110     14         6              7
  1111     15         7              8

Normalisation
● Floating point numbers are normalised in order to simplify operations
  – a normalised number is one in which the most significant digit of the significand is nonzero (i.e. 1 for base two)
  – the typical convention is that there is one bit to the left of the radix point
      ± 1.bbb...b x 2^(±e)
  – where b is either binary digit (0 or 1)
  – because the most significant bit is always one, it is unnecessary to store this bit (this bit is implicit)
● a 23-bit field can therefore store a 24-bit significand with a value in the half open interval [1, 2)

Examples

Example – Converting Floating Point to Decimal
● What is the decimal value of the following floating point number?
Assume the bias is 2^(n-1) - 1 (for 8 bits, 2^7 - 1 = 127)

    0 10010011 10100000000000000000000

  – Determine the sign: the sign bit is 0, so the number is positive
  – Calculate the exponent
    ● Convert the exponent field to decimal: 10010011 (excess) = 2^7 + 2^4 + 2^1 + 2^0 = 147
    ● Subtract the bias: 147 - 127 = 20, so the exponent is 20
  – Add the implicit "1." bit in front of the significand: 1.101
  – Convert the result 1.101_2 x 2^20 to decimal
    ● 1.101_2 x 2^20 = 1.625 x 2^20 = 1 703 936

Changing Binary Fractions to Base 10 Fractions
● The integer is dealt with in the normal way
  – 101_2 = 5_10 so 101.1101_2 = 5.?????_10
● To sort out the fraction
  – read the figures of the fraction as an integer and convert to base 10
    ● 1101_2 = 13_10
  – divide that number by 2 to the power of the number of fraction columns
    ● 13 / 2^4 = 13 / 16 = 0.8125
● Reassemble the result
  – 101.1101_2 = 5 + 13/16 or 5.8125_10

Changing Binary Fractions to Base 10 Fractions – an Alternative Method
● The integer is dealt with in the normal way
  – 101_2 = 5_10 so 101.1101_2 = 5.?????_10
● To sort out the fraction use the base 2 column headings

    2^-1    2^-2    2^-3    2^-4
    ½       ¼       ⅛       1/16
    0.5     0.25    0.125   0.0625

  – so 0.1101_2 = 0.5 + 0.25 + 0.0625 = 0.8125_10
● Reassemble the result
  – 101.1101_2 = 5.8125_10

Example – Converting Decimal to Floating Point
● What is the floating point representation of -1.25 x 2^-10? Assume the bias is 2^(n-1) - 1 (for 8 bits, 2^7 - 1 = 127)
  – The sign is negative so the sign bit is 1
  – Convert the number to binary
    ● 1.25_10 = 1_10 + 0.25_10 = 1.01_2
  – Extract the significand (remove the implicit "1." bit and pad with 0s to 23 bits)
    ● significand = 0100 0000 0000 0000 0000 000
  – Convert the exponent -10 to biased or excess notation
    ● add the bias to the exponent: -10 + 127 = 117_10
    ● convert the result to binary: 117_10 = 01110101 (8-bit excess)
  – -1.25 x 2^-10 = 1 01110101 01000000000000000000000

Changing Base 10 Fractions to Binary Fractions
● The integer is dealt with in the normal way
  – 6_10 = 110_2 so 6.375_10 = 110.????_2
● To sort out the fraction, e.g. 0.375
  – double the fraction and note the integer part of the result
  – repeat the process by doubling the fraction part of the result until you have a whole number (or until you run out of space)
  – read the integer parts from top to bottom and place them after the binary point
    ● 0.375 x 2 = 0.75 (integer part 0)
    ● 0.75 x 2 = 1.5 (integer part 1)
    ● 0.5 x 2 = 1.0 (integer part 1)
  – the fraction part is 0.011_2
● Reassemble the number
  – 6.375_10 = 110.011_2

Expressible Numbers using a 32-bit word
● Using two's complement integer representation: -2^31 to 2^31 - 1
● For the previous example floating-point format (1 bit for the sign, 8 bits for the exponent, 23 bits for the significand) the following ranges of numbers are possible
  – negative numbers between -(2 - 2^-23) x 2^128 and -2^-127
  – positive numbers between 2^-127 and (2 - 2^-23) x 2^128
  – only some numbers in these regions can be represented

Numbers which cannot be represented
● Negative overflow
  – negative numbers less than -(2 - 2^-23) x 2^128
● Negative underflow
  – negative numbers greater than -2^-127 (but less than zero)
● Zero
● Positive underflow
  – positive numbers less than 2^-127 (but greater than zero)
● Positive overflow
  – positive numbers greater than (2 - 2^-23) x 2^128

Density
● It is important to note that we are not representing more individual values with floating-point notation
  – the maximum number of different values which can be represented is 2^n where n is the number of bits
● Numbers that are represented using floating-point notation are not spaced evenly along the number line
  – the possible values get closer together near the origin (i.e. 0) and further apart as they move away from the origin

Example (1)
● A representation with b = 2, a 1-bit sign, a 2-bit exponent e, and a 2-bit significand s, has 32 normalised numbers (16 positive and 16 negative values). The 16 positive values are:

    1.00 x 2^-1   1.01 x 2^-1   1.10 x 2^-1   1.11 x 2^-1
    1.00 x 2^0    1.01 x 2^0    1.10 x 2^0    1.11 x 2^0
    1.00 x 2^1    1.01 x 2^1    1.10 x 2^1    1.11 x 2^1
    1.00 x 2^2    1.01 x 2^2    1.10 x 2^2    1.11 x 2^2

Example (2)
● The same 16 values written out in binary, one column per exponent:

    significand   x 2^-1   x 2^0   x 2^1   x 2^2
    1.00          0.100    1.00    10.0    100
    1.01          0.101    1.01    10.1    101
    1.10          0.110    1.10    11.0    110
    1.11          0.111    1.11    11.1    111

Range and Precision
● The size of the exponent determines the range of numbers that can be represented
  – the range of expressible numbers can be expanded by increasing the number of bits that are used to represent the exponent
  – for a fixed word size, this will decrease precision
● The size of the significand determines the precision of the numbers that can be represented
  – precision can be increased by increasing the number of bits that are used to represent the significand
  – for a fixed word size, this will decrease the range
● The only way to increase both range and precision is to use more bits
  – single-precision numbers, double-precision numbers

IEEE 754 Single Precision Floating Point Format

Field layout (widths in bits):
    1     8            23
  | S | Exponent | Significand / mantissa |

● IEEE 754 single precision floating point format contains
  – 1 sign bit (s), 8 bit exponent, 23 bit significand
● the exponent is stored using biased representation
  – the bias is (2^(n-1) - 1) where n is the number of bits in the exponent
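
The two worked conversion examples can be checked with a short Python sketch of the 1 / 8 / 23 layout with bias 127. The function names decode_bits and encode_bits are illustrative and not from the lecture. decode_bits follows the normalised-number rules from these slides only (it deliberately ignores the exponent patterns that the full IEEE 754 standard reserves for zero, subnormals, infinities and NaN), while encode_bits relies on Python's struct module, which packs the "f" format as IEEE 754 binary32.

```python
import struct

EXPONENT_BITS = 8
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 127, as in the slides


def decode_bits(bits):
    """Decode a 'S EEEEEEEE FFF...F' bit string (spaces optional).

    Assumption: the value is normalised, so an implicit leading 1 is added.
    """
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2) - BIAS          # remove the bias
    significand = 1 + int(bits[9:], 2) / 2 ** 23  # restore the implicit 1.
    return sign * significand * 2.0 ** exponent


def encode_bits(value):
    """Return the single-precision bit pattern of value as 'S E F'."""
    # Pack as a big-endian binary32, then reread the 4 bytes as an integer.
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    b = f"{word:032b}"
    return f"{b[0]} {b[1:9]} {b[9:]}"


print(decode_bits("0 10010011 10100000000000000000000"))  # 1703936.0
print(encode_bits(-1.25 * 2 ** -10))  # 1 01110101 01000000000000000000000
```

Both results agree with the hand calculations: 1.625 x 2^20 = 1 703 936, and -1.25 x 2^-10 has sign bit 1, exponent field 01110101 (117) and a 23-bit significand field starting 01.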
