Common Computer Arithmetic Limits

For many floating point number systems a number is represented as

$$\pm (d_1.d_2 d_3 \ldots d_t)_\beta \times \beta^{\text{exponent}}, \qquad d_1 \ne 0$$

where
$\beta$ is the base,
$t$ is the number of digits in the mantissa,
$L$ is the smallest value of the exponent,
$U$ is the largest value of the exponent.
So $L \le \text{exponent} \le U$.

Key values

| precision | single | double |
| --- | --- | --- |
| bits | 32 | 64 |
| Range (order of magnitude) | $10^{-38} \le |x| \le 10^{38}$ | $10^{-308} \le |x| \le 10^{308}$ |
| Significant digits (base 10) | about 7 | almost 16 |
| $\varepsilon$ | $\approx 10^{-7}$ | $\approx 10^{-16}$ |

$A$ is the underflow limit (for "normal" numbers), $A = \beta^{L}$
$B$ is the overflow limit, $B \approx \beta^{U+1}$
$\varepsilon$ = machine epsilon or relative machine precision = $\frac{1}{2}\beta^{1-t}$
The number of significant digits (base 10) is $|\log_{10} \varepsilon|$

Notes:
(1) NaN stands for not a number (ex: 0 / 0), $\infty$ is infinity (ex: 1 / 0) and $-\infty$ is minus infinity (ex: -1 / 0).
(2) Extended precision is usually only available inside the registers on the computer's CPU.
(3) Matlab's command eps returns twice the value listed below. Matlab's definition of eps is slightly different from ours. You might also look at realmax and realmin in Matlab.
(4) See the web page link to the IEEE standard for information on "denormalized" numbers, if you are interested.

If $\hat{x}$ is any (normal) number stored on the computer and $x$ is the corresponding real world value, then $\hat{x} \approx x$. $A$ is the (normal) number closest to 0, $B$ is the largest number and $\varepsilon$ is a bound on the relative error in storing $x$. We have $A \le |\hat{x}| \le B$, and $A \le |x| \le B$ is called the range of the computer.

If one tries to store $|x| > B$, an overflow error results. The computer might stop the program or simply let $\hat{x} = \pm\infty$ and continue calculating.

If $|x| < A$, an underflow error results and most computers will simply let $\hat{x} = 0$ and continue calculating.

If $x$ in the range can't be represented exactly, a roundoff error (sometimes called rounding, storage or representation error) results. The nearest floating point value $\hat{x}$ to $x$ is used. The relative error $|\hat{x} - x| / |x|$ is bounded by $\varepsilon$.

The table below lists these values for single, double and extended precision.
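The overflow, underflow and roundoff behavior described above can be checked directly in double precision. The following is a minimal Python sketch, assuming that Python's float is an IEEE double (true on typical platforms, but an assumption here). The variable names are illustrative only. Note that Python raises an exception for 1 / 0 rather than returning infinity, so the sketch produces infinity by overflow instead.

```python
import math
import sys
from fractions import Fraction

B = sys.float_info.max      # largest finite double (overflow limit)
A = sys.float_info.min      # smallest positive normal double (underflow limit)
eps = Fraction(1, 2**53)    # (1/2) * 2**(1 - t) with t = 53

# Overflow: a product beyond B is replaced by infinity.
overflowed = B * 10.0

# Underflow: a result far below A (and below the denormal range) becomes 0.
underflowed = A * 1e-20

# NaN: infinity minus infinity is "not a number".
not_a_number = overflowed - overflowed

# Roundoff: 1/10 has no exact binary representation.
x = Fraction(1, 10)         # the real world value
x_hat = Fraction(0.1)       # the exact value of the stored double
rel_err = abs(x_hat - x) / x

print("overflow gives infinity:", math.isinf(overflowed))
print("underflow gives zero:   ", underflowed == 0.0)
print("inf - inf is NaN:       ", math.isnan(not_a_number))
print("relative error <= eps:  ", rel_err <= eps)
```

Using Fraction keeps the relative error computation exact, so the comparison with the bound is not itself affected by roundoff.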
| | Single precision (32 bits) | Double precision (64 bits) | Extended precision (at least 80 bits) |
| --- | --- | --- | --- |
| Sign bit | 1 | 1 | 1 |
| Mantissa bits | 23 | 52 | 64 (at least) |
| Exponent bits | 8 | 11 | 15 (at least) |
| $t$ | 24 = 23 + 1 using "hidden bit trick" | 53 = 52 + 1 using "hidden bit trick" | 64 (at least), hidden bit often not used |
| $L$ | -126 | -1022 | -16382 |
| $U$ | 127 | 1023 | 16383 |
| $A$ or underflow limit | $1.2 \times 10^{-38}$ | $2.2 \times 10^{-308}$ | $3.4 \times 10^{-4932}$ |
| $B$ or overflow limit | $3.4 \times 10^{38}$ | $1.8 \times 10^{308}$ | $1.2 \times 10^{4932}$ |
| $\varepsilon$ or machine epsilon | $5.96 \times 10^{-8}$, about $10^{-7}$ | $1.1 \times 10^{-16}$, about $10^{-16}$ | $5.4 \times 10^{-20}$ |
| Signif. digits | 7.2 or about 7 | 15.95 or about 16 | 19.3 or about 19 |
| Special exponent values | all 1's: NaN and $\pm\infty$; all 0's: "denormal" | all 1's: NaN and $\pm\infty$; all 0's: "denormal" | all 1's: NaN and $\pm\infty$; all 0's: "denormal" |
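The entries for A, B, machine epsilon and the significant digits follow from t, L and U through A = β^L, B ≈ β^(U+1), ε = (1/2) β^(1-t) and digits = |log10 ε|. The following Python sketch recomputes them as a check. The helper names `limits` and `sci` are hypothetical. The sketch works with base-10 logarithms because the extended precision limits do not fit in a double, and it takes U = 16383 for extended precision.

```python
import math
import sys

def limits(t, L, U, beta=2):
    """Return (log10 A, log10 B, log10 eps, significant digits)."""
    log_beta = math.log10(beta)
    log_A = L * log_beta                            # A = beta**L
    log_B = (U + 1) * log_beta                      # B ~ beta**(U + 1)
    log_eps = math.log10(0.5) + (1 - t) * log_beta  # eps = (1/2) beta**(1 - t)
    return log_A, log_B, log_eps, abs(log_eps)

def sci(log_value):
    """Format 10**log_value as 'm x 10^e' with a three-digit mantissa."""
    e = math.floor(log_value)
    m = 10 ** (log_value - e)
    return f"{m:.2f} x 10^{e}"

# (t, L, U) for each precision
formats = {
    "single": (24, -126, 127),
    "double": (53, -1022, 1023),
    "extended": (64, -16382, 16383),
}

for name, (t, L, U) in formats.items():
    log_A, log_B, log_eps, digits = limits(t, L, U)
    print(f"{name:9s} A = {sci(log_A)}  B = {sci(log_B)}  "
          f"eps = {sci(log_eps)}  digits = {digits:.2f}")

# Cross-check the double precision row against Python's own float limits.
log_A, log_B, log_eps, _ = limits(*formats["double"])
print(math.isclose(10 ** log_A, sys.float_info.min, rel_tol=1e-9))
print(math.isclose(2 * 10 ** log_eps, sys.float_info.epsilon, rel_tol=1e-9))
```

The last check uses twice ε because `sys.float_info.epsilon`, like Matlab's eps, is the gap between 1 and the next larger double, which is 2ε in the notation used here.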