Floating Point Representation
Major: All Engineering Majors
Authors: Autar Kaw, Matthew Emmons
http://numericalmethods.eng.usf.edu
Numerical Methods for STEM undergraduates
7/12/2016

Floating Decimal Point: Scientific Form
$256.78$ is written as $+2.5678 \times 10^{2}$
$0.003678$ is written as $+3.678 \times 10^{-3}$
$-256.78$ is written as $-2.5678 \times 10^{2}$

Example
The form is
$\text{sign} \times \text{mantissa} \times 10^{\text{exponent}}$ or $\sigma \times m \times 10^{e}$
Example: For $-2.5678 \times 10^{2}$
$\sigma = -1$, $m = 2.5678$, $e = 2$

Floating Point Format for Binary Numbers
$y = \sigma \times m \times 2^{e}$
$\sigma$ = sign of number (0 for +ve, 1 for -ve)
$m$ = mantissa, $(1)_2 \le m < (10)_2$
The leading 1 is not stored as it is always given to be 1.
$e$ = integer exponent

Example
9-bit hypothetical word
• the first bit is used for the sign of the number,
• the second bit for the sign of the exponent,
• the next four bits for the mantissa, and
• the next three bits for the exponent
$(54.75)_{10} = (110110.11)_2 = (1.1011011)_2 \times 2^{5} \approx (1.1011)_2 \times 2^{(101)_2}$
We have the representation as
Sign of the number:   0
Sign of the exponent: 0
Mantissa:             1 0 1 1
Exponent:             1 0 1

Machine Epsilon
Defined as the measure of accuracy and found by the difference between 1 and the next number that can be represented.

Example
Ten-bit word
• Sign of number
• Sign of exponent
• Next four bits for exponent
• Next four bits for mantissa
0 0 0000 0000 $= (1)_{10}$
Next number:
0 0 0000 0001 $= (1.0001)_2 = (1.0625)_{10}$
$\epsilon_{mach} = 1.0625 - 1 = 2^{-4}$

Relative Error and Machine Epsilon
The absolute relative true error in representing a number will be less than the machine epsilon.
Example
$(0.02832)_{10} \approx (1.1100)_2 \times 2^{-6} = (1.1100)_2 \times 2^{-(0110)_2}$
10-bit word (sign, sign of exponent, 4 for exponent, 4 for mantissa)
Sign of the number:   0
Sign of the exponent: 1
Exponent:             0 1 1 0
Mantissa:             1 1 0 0
$(1.1100)_2 \times 2^{-(0110)_2} = 0.02734375$
$|\epsilon_a| = \left| \dfrac{0.02832 - 0.02734375}{0.02832} \right| = 0.034472 < 2^{-4} = 0.0625$

IEEE 754 Standards for Single Precision Representation

IEEE-754 Floating Point Standard
• Standardizes representation of floating point numbers on different computers in single and double precision.
• Standardizes representation of floating point operations on different computers.

One Great Reference
What every computer scientist (and even if you are not) should know about floating point arithmetic!
http://www.validlab.com/goldberg/paper.pdf

IEEE-754 Format Single Precision
32 bits for single precision
Sign (s):             1 bit
Biased Exponent (e'): 8 bits
Mantissa (m):         23 bits
$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$

Example #1
1 10100010 10100000000000000000000
Sign (s) | Biased Exponent (e') | Mantissa (m)
$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$
$= (-1)^{1} \times (1.10100000)_2 \times 2^{(10100010)_2 - 127}$
$= -1 \times 1.625 \times 2^{162 - 127}$
$= -1 \times 1.625 \times 2^{35} = -5.5834 \times 10^{10}$

Example #2
Represent $-5.5834 \times 10^{10}$ as a single precision floating point number.
? ???????? ???????????????????????
Sign (s) | Biased Exponent (e') | Mantissa (m)
$-5.5834 \times 10^{10} = (-1)^{1} \times (1.?)_2 \times 2^{?}$

Exponent for 32 Bit IEEE-754
8 bits would represent $0 \le e' \le 255$
Bias is 127; so subtract 127 from the representation:
$-127 \le e \le 128$

Exponent for Special Cases
Actual range of $e'$: $1 \le e' \le 254$
$e' = 0$ and $e' = 255$ are reserved for special numbers
Actual range of $e$: $-126 \le e \le 127$

Special Exponents and Numbers
$e' = 0$: all zeros
$e' = 255$: all ones
s        e'           m            Represents
0        all zeros    all zeros    $0$
1        all zeros    all zeros    $-0$
0        all ones     all zeros    $\infty$
1        all ones     all zeros    $-\infty$
0 or 1   all ones     non-zero     NaN

IEEE-754 Format
The largest number by magnitude
$(1.1\ldots1)_2 \times 2^{127} = 3.40 \times 10^{38}$
The smallest normalized number by magnitude
$(1.00\ldots0)_2 \times 2^{-126} = 1.18 \times 10^{-38}$
Machine epsilon
$\epsilon_{mach} = 2^{-23} = 1.19 \times 10^{-7}$
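The single precision formula, Value = (-1)^s x (1.m)_2 x 2^(e' - 127), can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names decode_single and float_to_bits are hypothetical, and the decoder assumes a normalized number (biased exponent between 1 and 254), so the special cases reserved for e' = 0 and e' = 255 are rejected rather than handled.

```python
import struct


def decode_single(bits: str) -> float:
    """Evaluate (-1)^s * (1.m)_2 * 2^(e' - 127) for a normalized 32-bit pattern."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    s = int(bits[0], 2)            # sign bit
    e_biased = int(bits[1:9], 2)   # 8-bit biased exponent e'
    m = int(bits[9:], 2)           # 23-bit mantissa field
    if not 1 <= e_biased <= 254:
        raise ValueError("e' = 0 and e' = 255 are reserved for special numbers")
    # The leading 1 is implicit, so the mantissa value is 1 + m / 2^23.
    return (-1) ** s * (1 + m / 2**23) * 2.0 ** (e_biased - 127)


def float_to_bits(x: float) -> str:
    """Return the 32-bit single precision pattern of x (cross-check via struct)."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{n:032b}"


# Bit pattern from Example #1: sign | biased exponent | mantissa
example1 = "1 10100010 10100000000000000000000"
value = decode_single(example1)
print(value)  # -55834574848.0, i.e. about -5.5834e10

# Limits of the normalized range and machine epsilon
largest = decode_single("0" + "11111110" + "1" * 23)
smallest_normal = decode_single("0" + "00000001" + "0" * 23)
eps_mach = decode_single("0" + "01111111" + "0" * 22 + "1") - 1.0
print(largest, smallest_normal, eps_mach)
```

Packing the decoded value back with struct reproduces the original bit pattern, which is a quick way to confirm that the field boundaries (1, 8, and 23 bits) were read correctly.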
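The ten-bit hypothetical word (sign of number, sign of exponent, four bits for exponent, four bits for mantissa) is small enough to decode by hand, and the same arithmetic can be sketched in Python. This is an illustrative sketch only: the function name decode_word is hypothetical, and the field layout is assumed to be exactly the one described for the ten-bit word, with the leading 1 of the mantissa not stored.

```python
def decode_word(bits: str) -> float:
    """Decode the 10-bit hypothetical word:
    sign | sign of exponent | 4 exponent bits | 4 mantissa bits."""
    bits = bits.replace(" ", "")
    if len(bits) != 10 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 10 binary digits")
    sign = -1 if bits[0] == "1" else 1
    exp_sign = -1 if bits[1] == "1" else 1
    exponent = exp_sign * int(bits[2:6], 2)
    # The leading 1 is implicit: mantissa = (1.xxxx)_2
    mantissa = 1 + int(bits[6:], 2) / 2**4
    return sign * mantissa * 2.0 ** exponent


# Machine epsilon: difference between 1 and the next representable number
one = decode_word("0 0 0000 0000")
next_after_one = decode_word("0 0 0000 0001")
eps_mach = next_after_one - one
print(eps_mach)  # 0.0625

# Relative error in storing 0.02832 as (1.1100)_2 x 2^-(0110)_2
x = 0.02832
stored = decode_word("0 1 0110 1100")
rel_err = abs(x - stored) / abs(x)
print(stored)              # 0.02734375
print(rel_err < eps_mach)  # True
```

The relative error comes out near 0.0345, below the machine epsilon of 2^-4 = 0.0625, which agrees with the bound stated in the relative error example.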