Floating Point Arithmetic

CMPE12c, Gabriel Hugh Elkaim

Floating Point Format

What do floating-point numbers represent?
• Rational numbers whose expansion in the given base terminates within the available fraction bits, and whose exponent lies within the specified range.
• They cannot exactly represent repeating rationals (such as 1/3, or 0.1 in binary), irrational numbers, or numbers too small or too large in magnitude.

IEEE Double Precision FP
• IEEE double precision (DP) is similar to single precision (SP):
  – 52-bit M (fraction), giving 53 bits of precision with the hidden bit
  – 11-bit E, excess 1023, representing exponents from -1022 to 1023
  – One sign bit
• Always use DP unless memory or file size is important.
  – SP covers roughly 10^-38 to 10^38
  – DP covers roughly 10^-308 to 10^308
• Be very careful of these ranges in numeric computation.

Floating Point Arithmetic

Floating-point operations include:
• Addition
• Subtraction
• Multiplication
• Division

They are more complicated than integer operations because the sign, exponent, and fraction are stored separately: operands may have to be aligned first, and the result usually has to be normalized and rounded.

Floating Point Addition

Decimal review: 9.997 × 10^2 + 4.631 × 10^-1. How do we do this?

1. Align the decimal points:
       9.997    × 10^2
     + 0.004631 × 10^2
2. Add:
      10.001631 × 10^2
3. Normalize the result.
   • Often it is already normalized.
   • Otherwise move one digit: 1.0001631 × 10^3
4. Round the result (to four significant digits): 1.000 × 10^3

Example: 0.25 + 100 in SP FP.

First step: convert both operands to SP FP if they are not already.

            S E        F
     0.25 = 0 01111101 00000000000000000000000
     100  = 0 10000101 10010000000000000000000

Or, with the hidden bit (HB) written out:

            S E        HB F
     0.25 = 0 01111101 1  00000000000000000000000
     100  = 0 10000101 1  10010000000000000000000

Second step: align the radix points.
   – Shifting F left by 1 bit decreases e by 1.
   – Shifting F right by 1 bit increases e by 1.
   – Shift F right, so that only the least significant bits fall off.
   – Which of the two numbers should we shift? The one with the smaller exponent.

Shift 0.25 to increase its exponent until it matches that of 100:
     0.25's e: 01111101 - 1111111 (127) = -2
     100's e:  10000101 - 1111111 (127) = 6
So shift 0.25 right by 6 - (-2) = 8 places.
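The first two steps can be checked in software. The following is a minimal Python sketch, assuming the standard `struct` module is used to reinterpret a float's 32 bits; the helper names `sp_fields` and `show` are hypothetical. It extracts S, E, and F for 0.25 and 100, computes the true exponents, and shifts 0.25's significand (hidden bit restored) right until its exponent matches that of 100.

```python
import struct

BIAS = 127  # SP exponent is stored excess-127

def sp_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, F) of x after rounding it to IEEE single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def show(s: int, e: int, hb: int, f: int) -> str:
    """Format the fields like the slides: S E HB F."""
    return f"{s} {e:08b} {hb} {f:023b}"

s_a, e_a, f_a = sp_fields(0.25)
s_b, e_b, f_b = sp_fields(100.0)
print(show(s_a, e_a, 1, f_a))  # 0 01111101 1 00000000000000000000000
print(show(s_b, e_b, 1, f_b))  # 0 10000101 1 10010000000000000000000

# True exponents: e = E - bias
print(e_a - BIAS, e_b - BIAS)  # -2 6
shift = (e_b - BIAS) - (e_a - BIAS)

# Restore the hidden bit, then shift right; low-order bits would fall off
sig_a = (1 << 23) | f_a
aligned = sig_a >> shift
print(show(s_a, e_a + shift, aligned >> 23, aligned & 0x7FFFFF))
# 0 10000101 0 00000001000000000000000
```

Shifting the operand with the smaller exponent to the right means that only its least significant bits can be lost, never its most significant ones.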
Easier method: the bias cancels when the stored exponents are subtracted, so the shift amount comes straight from the E fields:

       10000101   100's E
     - 01111101   0.25's E
       00001000   (shift by 8)

Carefully shifting 0.25's fraction:

     S E        HB F
     0 01111101 1  00000000000000000000000   (original value)
     0 01111110 0  10000000000000000000000   (shifted by 1)
     0 01111111 0  01000000000000000000000   (shifted by 2)
     0 10000000 0  00100000000000000000000   (shifted by 3)
     0 10000001 0  00010000000000000000000   (shifted by 4)
     0 10000010 0  00001000000000000000000   (shifted by 5)
     0 10000011 0  00000100000000000000000   (shifted by 6)
     0 10000100 0  00000010000000000000000   (shifted by 7)
     0 10000101 0  00000001000000000000000   (shifted by 8)

Third step: add the fractions, including the hidden bit:

       0 10000101 1 10010000000000000000000   (100)
     + 0 10000101 0 00000001000000000000000   (0.25)
       0 10000101 1 10010001000000000000000   (100.25)

Fourth step: normalize the result.
• Get a '1' back in the hidden bit.
• The result is already normalized most of the time.
• Remove the hidden bit, and the operation is finished.

Normalization example (3-bit E, 4-bit F):

       S E   HB F
       0 011 1  1100
     + 0 011 1  1011
       0 011 11 0111

The sum has two bits in front of the radix point, so shift right by one, increasing E by 1, so that only a single 1 remains in the HB spot:

       0 100 1  1011   (the 1 shifted out is discarded)

Floating Point Example
• 0xD4F80000 + 0x56B00000

Another SP FP Example
• 0xD5D00000 + 0x54600000

Floating Point Subtraction
• Mantissas are sign-magnitude.
• Watch out when the numbers are close:
       1.23456 × 10^2
     - 1.23455 × 10^2
     = 0.00001 × 10^2 = 1.00000 × 10^-3
• A many-digit normalization is possible.
This is why FP addition is in many ways more difficult than FP multiplication.

Steps to do subtraction:
1. Align the radix points.
2. Perform a sign-magnitude operand swap if needed:
   • Compare magnitudes (with the hidden bit).
   • Change the sign bit if the order of the operands is changed.
3. Subtract.
4. Normalize.
5. Round.

Simple example:

       S E   HB F
       0 011 1  1011   (smaller)
     - 0 011 1  1101   (bigger)

Switch the order and make the result negative:

       0 011 1  1101   (bigger)
     - 0 011 1  1011   (smaller)
       1 011 0  0010   (switched sign)

Normalize by shifting left three places, decreasing E by 3:

       1 000 1  0000

Floating Point Multiplication

Decimal example: (3.0 × 10^1) × (5.0 × 10^2). How do we do this?
1. Multiply the mantissas: 3.0 × 5.0 = 15.00
2. Add the exponents: 1 + 2 = 3
3. Combine: 15.00 × 10^3
4. Normalize if needed: 1.50 × 10^4

Multiplication in binary (4-bit F):

       S E        F
       0 10000100 0100
     × 1 00111100 1100

Step 1: Multiply the mantissas (put the hidden bit back first!):

             1.0100
           × 1.1100
              00000
             00000
            10100
           10100
       +  10100
         1000110000   = 10.00110000

Step 2: Add the exponents, then subtract the extra bias (each biased exponent contains one copy of 127, so their sum contains two):

       10000100
     + 00111100
       11000000
     - 01111111   (127)
       01000001

Step 3: Set the sign and renormalize, correcting the exponent. The sign of the product is the XOR of the operand signs (0 XOR 1 = 1):

       1 01000001 10.00110000
     becomes
       1 01000010 1.000110000

Step 4: Drop the hidden bit:

       1 01000010 000110000

Before storing, this fraction would still have to be rounded to the 4 available F bits.

Multiply these SP FP numbers together:
       0x49FC0000
     × 0x4BE00000

Another SP FP Example
• 0xC9F4 × 0x484F

Floating Point Division
• True division
  – Unsigned, full-precision division on the mantissas
  – Much more costly than multiplication (e.g., 4×)
  – Subtract the exponents (and add the bias back once)
• Faster division
  – Use Newton's method to find the reciprocal of the divisor
  – Multiply the dividend by that reciprocal
  – May not yield an exactly rounded result without some extra work
  – Similar speed to multiplication
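The addition and subtraction steps (align, sign-magnitude swap, add or subtract, normalize) can be sketched in Python on raw 32-bit patterns. This is a simplified illustration, not a full IEEE 754 adder: it assumes both inputs are normalized, nonzero, finite numbers, it truncates the bits that fall off instead of rounding, and it ignores overflow, underflow, and special values. The names `unpack`, `pack`, and `sp_add` are hypothetical.

```python
def unpack(bits: int) -> tuple[int, int, int]:
    """Split a 32-bit SP pattern into (S, E, significand with hidden bit)."""
    return bits >> 31, (bits >> 23) & 0xFF, (1 << 23) | (bits & 0x7FFFFF)

def pack(s: int, e: int, sig: int) -> int:
    """Drop the hidden bit and reassemble S, E, F into a 32-bit pattern."""
    return (s << 31) | (e << 23) | (sig & 0x7FFFFF)

def sp_add(x: int, y: int) -> int:
    """Add two normalized SP bit patterns (truncating, no special cases)."""
    sx, ex, mx = unpack(x)
    sy, ey, my = unpack(y)
    # 1. Align radix points: shift the operand with the smaller exponent right.
    if ex < ey:
        sx, ex, mx, sy, ey, my = sy, ey, my, sx, ex, mx
    my >>= ex - ey
    e = ex
    # 2./3. Sign-magnitude: add magnitudes if the signs match; otherwise
    #       subtract the smaller from the bigger and keep the bigger's sign.
    if sx == sy:
        s, sig = sx, mx + my
    else:
        if mx < my:
            sx, mx, sy, my = sy, my, sx, mx
        s, sig = sx, mx - my
    if sig == 0:
        return 0  # exact cancellation gives +0
    # 4. Normalize: leave exactly one 1 in the hidden-bit position (bit 23).
    while sig >= 1 << 24:
        sig >>= 1
        e += 1
    while sig < 1 << 23:  # may take many steps after close subtraction
        sig <<= 1
        e -= 1
    return pack(s, e, sig)

print(hex(sp_add(0x3E800000, 0x42C80000)))  # 0.25 + 100 -> 0x42c88000
print(hex(sp_add(0xD4F80000, 0x56B00000)))  # 0x56a08000
print(hex(sp_add(0xD5D00000, 0x54600000)))  # 0xd5b40000
```

For operands like the two hex exercises, the exact sum fits in 24 bits, so truncation loses nothing. In general, hardware adders keep extra guard, round, and sticky bits during alignment so that the discarded bits can still influence rounding.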
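The multiplication steps (XOR the signs, add the exponents and subtract one bias, multiply the significands with the hidden bit restored, renormalize, drop the hidden bit) can be sketched the same way. This is a minimal Python illustration under the same assumptions: normalized, finite inputs, truncation instead of rounding, and no overflow or underflow handling. The names `unpack` and `sp_mul` are hypothetical.

```python
def unpack(bits: int) -> tuple[int, int, int]:
    """Split a 32-bit SP pattern into (S, E, significand with hidden bit)."""
    return bits >> 31, (bits >> 23) & 0xFF, (1 << 23) | (bits & 0x7FFFFF)

def sp_mul(x: int, y: int) -> int:
    """Multiply two normalized SP bit patterns (truncating, no special cases)."""
    sx, ex, mx = unpack(x)
    sy, ey, my = unpack(y)
    s = sx ^ sy            # sign of product = XOR of operand signs
    e = ex + ey - 127      # add exponents, subtract the extra bias
    prod = mx * my         # 24 x 24 -> up to 48 bits; value = prod / 2**46
    if prod >= 1 << 47:    # significand product in [2, 4): renormalize
        sig = prod >> 24
        e += 1
    else:                  # significand product already in [1, 2)
        sig = prod >> 23
    return (s << 31) | (e << 23) | (sig & 0x7FFFFF)  # drop the hidden bit

print(hex(sp_mul(0x49FC0000, 0x4BE00000)))  # 0x565c8000
# The 4-bit binary example, widened to SP (F padded with zeros):
print(hex(sp_mul(0x42200000, 0x9E600000)))  # 0xa10c0000
```

The second call reproduces the binary worked example: 0xa10c0000 is 1 01000010 followed by the fraction 000110000 and zeros, matching the result of Step 4.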
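The faster, Newton's-method division can be sketched with Python floats. This is only an illustration: `math.frexp` and `math.ldexp` separate the significand from the exponent, and the linear starting guess `1.5 - m/2` is an assumption chosen for simplicity. Each Newton step r = r(2 - m·r) roughly squares the relative error. For m in [1, 2) this guess starts within 12.5%, so four steps get close to double precision, but the final product is not guaranteed to be correctly rounded. The function names are hypothetical, and the divisor is assumed to be finite and nonzero.

```python
import math

def newton_reciprocal(d: float, steps: int = 4) -> float:
    """Approximate 1/d with Newton's method on d's significand."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    m, e = math.frexp(abs(d))  # abs(d) = m * 2**e with 0.5 <= m < 1
    m, e = 2 * m, e - 1        # rescale so the significand m is in [1, 2)
    r = 1.5 - m / 2            # linear seed, exact at m = 1 and m = 2
    for _ in range(steps):
        r = r * (2 - m * r)    # Newton step for f(r) = 1/r - m
    return math.copysign(math.ldexp(r, -e), d)  # 1/d = (1/m) * 2**-e

def newton_divide(a: float, b: float) -> float:
    """Divide by multiplying the dividend by the divisor's reciprocal."""
    return a * newton_reciprocal(b)

print(newton_divide(10.0, 4.0))  # 2.5
print(newton_divide(1.0, 3.0))   # close to 1/3, not necessarily bit-exact
```

Only multiplications and subtractions appear in the loop, which is why this approach can run at a speed similar to multiplication.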
