12.1 Rounding Modes

Rounding: the process of obtaining the best possible floating-point representation for a given real value.

ANSI/IEEE standard: round to the nearest floating-point number. When the value lies exactly midway between two adjacent floating-point numbers, choose the one whose significand has an LSB of 0 (of two adjacent floating-point numbers, the significand of one must end in 0 and that of the other in 1). This is called round-to-nearest-even. For example, 3.5 and 4.5 are both rounded to 4, the closest even number.

• Other rounding methods
  – Round inward (toward 0): choose the one of the two possible values that is closer to 0.
  – Round upward (toward +∞): choose the larger of the two possible values.
  – Round downward (toward −∞): choose the smaller of the two possible values.

Example 12.1 Rounding to the nearest integer
a. Consider the function rtnei(x), which rounds a real signed-magnitude number x to the nearest integer, choosing the even integer when x is midway between two integers. Plot this round-to-nearest-even-integer function for x in the range [−4, 4].
b. Repeat part a for the function rtni(x), that is, the round-to-nearest-integer function, where the midway values are always rounded up.

Example 12.2 Directed rounding
a. Consider the inward-directed rounding of a real signed-magnitude number x, expressed as the function ritni(x). Plot this round-inward-to-nearest-integer function for x in the range [−4, 4].
b. Repeat part a for the round-upward-to-nearest-integer function rutni(x).

Figure 12.3 Two directed round-to-nearest-integer functions for x in [−4, 4].
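Since the plots asked for in Examples 12.1 and 12.2 are step functions, a minimal Python sketch can tabulate the four functions on [−4, 4] instead of plotting them. The names rtnei, rtni, ritni, and rutni come from the examples; the tie rule used for rtni is an assumption: because x is a signed-magnitude number, "rounded up" is read here as "rounded up in magnitude" (away from 0). Python's built-in round() already rounds ties to even.

```python
import math

def rtnei(x: float) -> int:
    """Round to nearest integer, ties to even (Python's round() does exactly this)."""
    return round(x)

def rtni(x: float) -> int:
    """Round to nearest integer, midway values rounded up.
    Assumption: 'up' means up in magnitude (away from 0) for signed-magnitude x."""
    m = math.floor(abs(x) + 0.5)
    return -m if x < 0 else m

def ritni(x: float) -> int:
    """Round inward, i.e. toward 0 (chop the fractional part)."""
    return math.trunc(x)

def rutni(x: float) -> int:
    """Round upward, i.e. toward +infinity."""
    return math.ceil(x)

if __name__ == "__main__":
    # Tabulate instead of plotting: sample [-4, 4] in steps of 0.25 (all exact in binary).
    print(f"{'x':>6} {'rtnei':>6} {'rtni':>6} {'ritni':>6} {'rutni':>6}")
    for k in range(-16, 17):
        x = k / 4
        print(f"{x:6.2f} {rtnei(x):6d} {rtni(x):6d} {ritni(x):6d} {rutni(x):6d}")
```

In the resulting table, rtnei and rtni differ only at x = ±0.5 and ±2.5, where the even neighbor lies toward 0; ritni and rutni coincide for negative x and differ by 1 at every non-integer positive x.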
12.2 Special Values and Exceptions
• Five special values in the ANSI/IEEE floating-point standard
  – ±0: biased exponent = 0, significand = 0 (no hidden 1)
  – ±∞: biased exponent = 255 (short) or 2047 (long), significand = 0
  – NaN: biased exponent = 255 (short) or 2047 (long), significand ≠ 0

12.3 Floating-Point Addition
Consider the addition of $\pm 2^{e_1} s_1$ and $\pm 2^{e_2} s_2$, where $e_1 > e_2$. The significand of the operand with the smaller exponent is first aligned by shifting it right by $e_1 - e_2$ positions:
$$(\pm 2^{e_1} s_1) + (\pm 2^{e_2} s_2) = \pm 2^{e_1}\left(s_1 \pm s_2 / 2^{e_1 - e_2}\right)$$

Figure 12.5 Simplified schematic of a floating-point adder.

12.4 Other Floating-Point Operations
Multiplication of $\pm 2^{e_1} s_1$ and $\pm 2^{e_2} s_2$ (exponents add, significands multiply):
$$(\pm 2^{e_1} s_1) \times (\pm 2^{e_2} s_2) = \pm 2^{e_1 + e_2}\left(s_1 \times s_2\right)$$
Division of $\pm 2^{e_1} s_1$ by $\pm 2^{e_2} s_2$ (exponents subtract, significands divide):
$$(\pm 2^{e_1} s_1) / (\pm 2^{e_2} s_2) = \pm 2^{e_1 - e_2}\left(s_1 / s_2\right)$$

Figure 12.6 Simplified schematic of a floating-point multiply/divide unit.

12.5 Floating-Point Instructions
10 floating-point arithmetic instructions (5 different operations, add, subtract, multiply, divide, and negate, each for single and double operands):
  add.s $f0,$f8,$f10    # set $f0 to ($f8) + ($f10)
  add.d $f0,$f8,$f10    # set $f0,$f1 to ($f8,$f9) + ($f10,$f11)
Single operands can be in any of the floating-point registers. Double operands must be specified using even-numbered registers, because each double occupies an even/odd register pair.

Figure 12.7 The common floating-point instruction format for MiniMIPS and components for arithmetic instructions. The extension (ex) field distinguishes single (* = s) from double (* = d) operands.

6 format conversion instructions: integer to single/double, single to double, double to single, and single/double to integer:
  cvt.s.w $f0,$f8    # set $f0 to single(integer $f8)
  cvt.d.w $f0,$f8    # set $f0,$f1 to double(integer $f8)
  cvt.d.s $f0,$f8    # set $f0,$f1 to double($f8)
  cvt.s.d $f0,$f8    # set $f0 to single($f8,$f9)
  cvt.w.s $f0,$f8    # set $f0 to integer($f8)
  cvt.w.d $f0,$f8    # set $f0 to integer($f8,$f9)
Figure 12.8 Floating-point instructions for format conversion in MiniMIPS.

6 data transfer instructions: load/store word to/from coprocessor 1, move single/double from one FP register to another, and move (copy) between FP registers and CPU general registers:
  lwc1  $f8,40($s3)    # load mem[40+($s3)] into $f8
  swc1  $f8,A($s3)     # store ($f8) into mem[A+($s3)]
  mov.s $f0,$f8        # load $f0 with ($f8)
  mov.d $f0,$f8        # load $f0,$f1 with ($f8,$f9)
  mfc1  $t0,$f12       # load $t0 with ($f12)
  mtc1  $f8,$t4        # load $f8 with ($t4)
Figure 12.9 Instructions for floating-point data movement in MiniMIPS.

2 branch and 6 comparison instructions. The FP unit has a flag that is set to true (T) or false (F) by 3 comparisons (equal, less than, and less than or equal), each available for single and double data types:
  bc1t L               # branch on FP flag true
  bc1f L               # branch on FP flag false
  c.eq.* $f0,$f8       # if ($f0) = ($f8), set flag to true
  c.lt.* $f0,$f8       # if ($f0) < ($f8), set flag to true
  c.le.* $f0,$f8       # if ($f0) ≤ ($f8), set flag to true
Figure 12.10 Floating-point branch and comparison instructions in MiniMIPS.

Table 12.1 The 30 MiniMIPS floating-point instructions; because the op field contains 17 for all but two of the instructions (49 for lwc1 and 50 for swc1), it is not shown.

12.6 Result Precision and Errors
• FP arithmetic can be quite dangerous and must be used with proper care, because the results of FP computations are inexact.
• Why?
  – Many real numbers (for example, 0.1) do not have an exact binary representation within a finite word format. This is referred to as representation error.
  – Even for values that are exactly representable, FP arithmetic produces inexact results. For example, the product of 2 short FP numbers has a 48-bit significand that must be rounded to 24 bits (23 fraction bits plus the hidden 1). This is called computation error.

Example 12.4 The associative law of addition does not hold in general in FP arithmetic. For example, with
$a = -2^{5} \times (1.10101011)_{\text{two}}$, $b = 2^{5} \times (1.10101110)_{\text{two}}$, $c = -2^{-2} \times (1.01100101)_{\text{two}}$,
is $(a + b) + c = a + (b + c)$?

Figure 12.11 Algebraically equivalent computations may yield different results with floating-point arithmetic.

• Guard digits help avoid excessive error. For example, in a 10-digit calculator, 1/3 is represented as 0.333 333 333 3, and multiplying it by 3 yields 0.999 999 999 9 rather than 1.
However, in a calculator with 2 guard digits, 1/3 is represented internally as 0.333 333 333 333 (while still displayed as 0.333 333 333 3); multiplying it by 3 yields 0.999 999 999 999, which rounds to 1 when displayed with 10 digits.

Figure 12.12 Function evaluation by table lookup and linear interpolation.
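The special-value encodings of Section 12.2 can be checked against real bit patterns. The minimal Python sketch below uses the standard struct module to obtain the short (32-bit) or long (64-bit) encoding of a number and classifies it by its biased exponent and fraction fields. The function name classify and the format labels are illustrative choices, not part of any standard API. It also reports the one exponent-0 case the section does not list, a nonzero fraction, which IEEE 754 uses for subnormal (denormalized) numbers.

```python
import struct

# struct codes (float, same-width unsigned int), exponent bits, fraction bits
FORMATS = {
    "short": (">f", ">I", 8, 23),    # single precision, 32 bits
    "long": (">d", ">Q", 11, 52),    # double precision, 64 bits
}

def classify(x: float, fmt: str = "short") -> str:
    """Classify x by the fields of its IEEE encoding in the given format."""
    fcode, icode, ebits, fbits = FORMATS[fmt]
    bits = struct.unpack(icode, struct.pack(fcode, x))[0]
    sign = "-" if bits >> (ebits + fbits) else "+"
    biased_exp = (bits >> fbits) & ((1 << ebits) - 1)
    fraction = bits & ((1 << fbits) - 1)
    max_exp = (1 << ebits) - 1           # 255 (short) or 2047 (long)
    if biased_exp == 0:
        # fraction 0 encodes +-0; nonzero fraction encodes a subnormal number
        return sign + ("0" if fraction == 0 else "denormal")
    if biased_exp == max_exp:
        # fraction 0 encodes +-infinity; nonzero fraction encodes NaN
        return (sign + "inf") if fraction == 0 else "NaN"
    return sign + "normal"

for v in (0.0, -0.0, float("inf"), float("-inf"), float("nan"), 1.5, 1e-40):
    print(repr(v), classify(v))
```

Note that struct.pack raises OverflowError for a finite value too large for the short format, so the sketch only classifies values that fit.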
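The alignment formula of Section 12.3 and the multiplication and division formulas of Section 12.4 can be sketched on a hypothetical toy format whose significand has 8 fraction bits (as in Example 12.4). Here a number is a pair (e, s) standing for 2^e × s with 1 ≤ |s| < 2, or s = 0. The sketch computes each exact intermediate result with fractions.Fraction and then normalizes and rounds once (round-to-nearest-even). Real hardware instead keeps a few extra guard, round, and sticky bits, so this models the correctly rounded result rather than the adder or multiply/divide datapath. All names are illustrative.

```python
from fractions import Fraction

F = 8  # hypothetical toy format: significand 1.f with F = 8 fraction bits

# A number is a pair (e, s) meaning 2**e * s, with 1 <= |s| < 2 (or s == 0).
Num = tuple[int, Fraction]

def normalize_round(e: int, s: Fraction) -> Num:
    """Normalize s into [1, 2) in magnitude, then round it to F fraction bits (ties to even)."""
    if s == 0:
        return (0, Fraction(0))
    while abs(s) >= 2:          # significand overflow: shift right, increment exponent
        s /= 2
        e += 1
    while abs(s) < 1:           # leading zeros (e.g., after cancellation): shift left
        s *= 2
        e -= 1
    s = Fraction(round(s * 2**F), 2**F)   # round() on a Fraction rounds half to even
    if abs(s) == 2:             # rounding carried out of the significand
        s /= 2
        e += 1
    return (e, s)

def fp_add(x: Num, y: Num) -> Num:
    (e1, s1), (e2, s2) = x, y
    if e1 < e2:                 # make (e1, s1) the operand with the larger exponent
        (e1, s1), (e2, s2) = (e2, s2), (e1, s1)
    # Alignment: 2**e1 * s1 + 2**e2 * s2 = 2**e1 * (s1 + s2 / 2**(e1 - e2))
    return normalize_round(e1, s1 + s2 / 2**(e1 - e2))

def fp_mul(x: Num, y: Num) -> Num:
    (e1, s1), (e2, s2) = x, y
    return normalize_round(e1 + e2, s1 * s2)   # exponents add, significands multiply

def fp_div(x: Num, y: Num) -> Num:
    (e1, s1), (e2, s2) = x, y
    return normalize_round(e1 - e2, s1 / s2)   # raises ZeroDivisionError if s2 == 0

def value(x: Num) -> Fraction:
    e, s = x
    return s * Fraction(2) ** e
```

For example, fp_add((3, Fraction(3, 2)), (1, Fraction(5, 4))) adds 12 and 2.5 and returns (3, Fraction(29, 16)), that is, 14.5. By contrast, fp_div((3, Fraction(3, 2)), (1, Fraction(5, 4))) returns (2, Fraction(307, 256)), slightly below the exact 4.8, because 6/5 has no finite binary expansion.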
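Example 12.4 can be replayed with exact rational arithmetic. In this Python sketch each addition is computed exactly and then rounded once to a significand with 8 fraction bits using round-to-nearest-even; the 8-bit fraction width is an assumption read off the operands' 1.xxxxxxxx significands, and the helper names round_fp and fp_add are hypothetical.

```python
from fractions import Fraction

FRAC_BITS = 8  # assumed: significands of the form 1.xxxxxxxx, as in Example 12.4

def round_fp(x: Fraction, frac_bits: int = FRAC_BITS) -> Fraction:
    """Round x to the nearest value 2**e * 1.f with frac_bits fraction bits (ties to even)."""
    if x == 0:
        return Fraction(0)
    m = abs(x)
    # Find e with 2**e <= m < 2**(e + 1).
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if m < Fraction(2) ** e:
        e -= 1
    q = round(m / Fraction(2) ** e * 2**frac_bits)   # round() on a Fraction: half to even
    r = Fraction(q, 2**frac_bits) * Fraction(2) ** e
    return r if x > 0 else -r

def fp_add(x: Fraction, y: Fraction) -> Fraction:
    """Exact sum followed by a single rounding step."""
    return round_fp(x + y)

a = -Fraction(0b110101011, 2**8) * 2**5    # -2^5  x 1.10101011
b = Fraction(0b110101110, 2**8) * 2**5     #  2^5  x 1.10101110
c = -Fraction(0b101100101, 2**8) / 2**2    # -2^-2 x 1.01100101

left = fp_add(fp_add(a, b), c)
right = fp_add(a, fp_add(b, c))
print("(a + b) + c =", left)       # 27/1024
print("a + (b + c) =", right)      # 0
print("exact sum   =", a + b + c)  # 27/1024
```

With these operands, (a + b) + c comes out as 27/1024 = 2^−6 × (1.1011)_two, which is also the exact sum, because a + b cancels exactly. In a + (b + c), rounding b + c discards the low-order bits of c and produces exactly −a, so the final sum is 0.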
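The guard-digit calculator example can be reproduced with Python's standard decimal module, which performs decimal arithmetic at a chosen precision. In this sketch the internal precision is the display width plus the guard digits, and results are rounded to the display width only when shown; the function name third_times_three is hypothetical.

```python
from decimal import Context, Decimal

def third_times_three(digits: int, guard: int = 0) -> tuple[Decimal, Decimal]:
    """Compute 1/3 and then (1/3) * 3 with digits + guard significant digits internally.
    Returns both values as they would be displayed, rounded to `digits` digits."""
    work = Context(prec=digits + guard)   # internal precision, including guard digits
    display = Context(prec=digits)        # precision of the calculator's display
    third = work.divide(Decimal(1), Decimal(3))
    product = work.multiply(third, Decimal(3))
    return display.plus(third), display.plus(product)   # plus() rounds to display precision

print(third_times_three(10))      # no guard digits
print(third_times_three(10, 2))   # 2 guard digits
```

Both runs display 1/3 as 0.3333333333, but only the version with 2 guard digits displays the product as 1 (Decimal('1.000000000')); without guard digits it stays 0.9999999999.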