Arithmetic for Computers
Chapter 3, Sections 3.1 – 3.5 & 3.8
Appendix C.1 – C.3, C.5 – C.6
Dr. Iyad F. Jafar

Outline
• Addition and Subtraction
• Overflow Detection
• Faster Addition
• The 1-Bit ALU
• The 32-bit MIPS ALU
• Shift Operations
• Multiplication
• Division
• Floating Point Numbers
• Fallacies and Pitfalls

Addition and Subtraction
• Add corresponding bits, including the sign bit, and ignore the carry out of the MSB
• For subtraction, add the negative

     4   0100        -4   1100         4     0100
   + 3   0011       + 3   0011      + -3     1101
   = 7   0111      = -1   1111      =  1  (1)0001   (carry out of the MSB ignored)

   -4 - (-3):  1100 - 1101  ->  -4 + 3:  1100 + 0011 = 1111  (-1)

Detecting Overflow
• When do we get overflow?
  • When we add two positive numbers and get a negative number
  • When we add two negative numbers and get a positive number
• Investigate the sign bit!

   Operand signs   Result sign   Sign bits (A, B)   Cin   Cout   Overflow?
   +  +            +             0  0               0     0      No
   +  +            -             0  0               1     0      Yes
   -  -            +             1  1               0     1      Yes
   -  -            -             1  1               1     1      No
   +  -            -             1  0               0     0      No
   +  -            +             1  0               1     1      No

• Overflow occurs when the carry into the sign bit does not equal the carry out of it:
  Overflow = Cin XOR Cout

Addition and Subtraction
• How to perform addition in hardware?
  • Design a 32-bit adder (two 32-bit inputs!)
  • Cell design: the 1-bit full adder

[Figure: 1-bit full adder with inputs A, B, CarryIn and outputs Sum, CarryOut]

   A  B  Cin | Cout  Sum
   0  0  0   |  0     0
   0  0  1   |  0     1
   0  1  0   |  0     1
   0  1  1   |  1     0
   1  0  0   |  0     1
   1  0  1   |  1     0
   1  1  0   |  1     0
   1  1  1   |  1     1

   Cout = A·B + B·Cin + A·Cin
   Sum  = A XOR B XOR Cin

Addition and Subtraction
• 32-bit ripple-carry adder
  • Cascade 32 copies of the full adder and wire them up through Cin and Cout

[Figure: chain of full adders (FA) with inputs A31 B31 ... A0 B0, outputs S31 ... S0, carry out C32, and carry in 0 at bit 0]

• How long does it take to get the result?

Addition and Subtraction
• 32-bit ripple-carry subtractor
  • Subtraction is addition of the negative!
  • Compute the 2s complement = 1s complement + 1

[Figure: chain of full adders with each Bi inverted and carry in 1 at bit 0; outputs D31 ... D0 and B32]

Addition and Subtraction
• 32-bit ripple-carry adder/subtractor
  • Redundancy in hardware!! Subtraction is addition of the negative!
  • Use one adder and configure the second input
  • Remember X XOR 1 = X' and X XOR 0 = X

[Figure: chain of full adders in which an Add/Sub control is XORed with each Bi and also drives the carry in of bit 0; outputs S31 ... S0 and C32. Add/Sub = 0 -> add, Add/Sub = 1 -> subtract]

Faster Addition
• The ripple-carry adder is slow!
  • We have to wait until the carry has propagated to the final position before we can read out the addition or subtraction result.
  • Carry generation takes two levels of gates at each bit position:
    Cout_i = Ai·Bi + Ai·Cin_i + Bi·Cin_i
  • Total delay = gate delay x 2 x number of bits
  • Example: for a 16-bit adder the delay is 32 gate delays
• Can we go faster?
  • What if we generate the carries in parallel?

Faster Addition
• The carries can be expressed in terms of the adder's inputs and c0 exclusively!
• Add separate hardware to compute the carries in parallel!
• Carry-lookahead adder

[Figure: carry-lookahead logic taking A31 – A0, B31 – B0 and c0 and producing the carries c1, c2, c3, c4, ...]

Faster Addition
• In a 4-bit adder, the equations of the carries are
    c1 = (b0·c0) + (a0·c0) + (a0·b0)
    c2 = (b1·c1) + (a1·c1) + (a1·b1)
    c3 = (b2·c2) + (a2·c2) + (a2·b2)
    c4 = (b3·c3) + (a3·c3) + (a3·b3)
• By substitution
    c2 = (a1·a0·b0) + (a1·a0·c0) + (a1·b0·c0)
       + (b1·a0·b0) + (b1·a0·c0) + (b1·b0·c0) + (a1·b1)
    c3 = (b2·a1·a0·b0) + (b2·a1·a0·c0) + (b2·a1·b0·c0)
       + (b2·b1·a0·b0) + (b2·b1·a0·c0) + (b2·b1·b0·c0) + (b2·a1·b1)
       + (a2·a1·a0·b0) + (a2·a1·a0·c0) + (a2·a1·b0·c0)
       + (a2·b1·a0·b0) + (a2·b1·a0·c0) + (a2·b1·b0·c0) + (a2·a1·b1)
       + (a2·b2)
    c4 = ...
• All carries require two gate delays!
• However, imagine the equations (and the cost) if the adder is 32 bits wide!

Faster Addition
• We can reduce the logic cost by a simple simplification
    c(i+1) = (ai·bi) + (bi·ci) + (ai·ci)
           = (ai·bi) + (ai + bi)·ci
           = gi + pi·ci
  • gi = ai·bi : carry generate
  • pi = ai + bi : carry propagate
• Carry equations for a 4-bit adder
    c1 = g0 + p0·c0
    c2 = g1 + p1·c1 = g1 + (p1·g0) + (p1·p0·c0)
    c3 = g2 + p2·c2 = g2 + (p2·g1) + (p2·p1·g0) + (p2·p1·p0·c0)
    c4 = g3 + p3·c3 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0) + (p3·p2·p1·p0·c0)
• The delay to generate c4 is 3 gate delays (one for gi and pi, two for the sum of products)
• The cost is still high for large adders!

Faster Addition
• 2nd level of abstraction
  • Example: a 16-bit adder. Assume that we have four 4-bit carry-lookahead adders
  • These 4-bit adders are designed to produce super generate (G) and super propagate (P) signals
    • P -> the four bits propagate a carry to the next four bits
    • G -> the four bits generate a carry to the next four bits
  • The super carry signals are fed to a separate carry generation unit

[Figure: 4-bit CLA block with inputs A3-A0, B3-B0 and c0, internal signals g0, p0 and c1 ... c3, outputs S3-S0, G0, P0 and c4]

Faster Addition
• We need to generate the carry propagate and generate signals at a higher level
• Think of each 4-bit adder block as a single unit that can either generate or propagate a carry.

[Figure: 16-bit adder built from four 4-bit CLA blocks (A15-A12/B15-B12 ... A3-A0/B3-B0, outputs S15-S12 ... S3-S0). Each block sends Gi, Pi to a carry generation unit, which receives C0 and returns C1, C2, C3 to the blocks and produces C4]

Faster Addition
• Super propagate signals
    P0 = p3·p2·p1·p0    (how can the first 4-bit adder propagate c0?)
    P1 = p7·p6·p5·p4
    P2 = p11·p10·p9·p8
    P3 = p15·p14·p13·p12
• Super generate signals
    G0 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0)
    G1 = g7 + (p7·g6) + (p7·p6·g5) + (p7·p6·p5·g4)
    G2 = g11 + (p11·g10) + (p11·p10·g9) + (p11·p10·p9·g8)
    G3 = g15 + (p15·g14) + (p15·p14·g13) + (p15·p14·p13·g12)
• The carry signals at the higher level are
    C1 = G0 + (P0·c0)
    C2 = G1 + (P1·G0) + (P1·P0·c0)
    C3 = G2 + (P2·G1) + (P2·P1·G0) + (P2·P1·P0·c0)
    C4 = G3 + (P3·G2) + (P3·P2·G1) + (P3·P2·P1·G0) + (P3·P2·P1·P0·c0)

Faster Addition
• Each super carry signal is a two-level implementation in terms of Pi and Gi
• Pi is one level of gates, while Gi is two levels, both expressed in terms of pi and gi
• pi and gi are one level of gates
• Total delay is 2 + 2 + 1 = 5 gate delays
• The 16-bit CLA is ~6 times faster than the 16-bit ripple-carry adder (32 gate delays)

Designing the ALU
• We want to design an ALU that
  • Supports logic operations
  • Supports arithmetic operations
  • Supports the set-on-less-than instruction
  • Supports test for equality
• With special handling for
  • sign extension
  • zero extension
  • overflow detection

[Figure: ALU block with 32-bit inputs A and B, a 4-bit operation input m, a 32-bit result output, and 1-bit zero and ovf outputs]

Designing the ALU
• We start with a 1-bit ALU
• Starting with logical operations is easier since they map directly to hardware
• Two operands, two results (A·B and A+B). We need only one result, so use a 2-to-1 MUX
• The Operation input comes from logic that looks at the opcode

[Figure: AND gate and OR gate on A and B feeding a 2-to-1 MUX selected by Operation]

   Function   Operation
   A and B    0
   A or B     1

Designing the ALU
• How about addition?
  • Add an adder
  • Connect Cin (from the previous bit) and Cout (to the next bit)
  • Expand the MUX to 3-to-1 (Operation is now 2 bits)

[Figure: 1-bit ALU with AND, OR and full adder outputs feeding a 3-to-1 MUX]

   Function   Operation
   A and B    00
   A or B     01
   A + B      10

Designing the ALU
• How about subtraction?
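A minimal sketch of the ripple-carry adder/subtractor described earlier, assuming operands are passed as unsigned bit patterns of a chosen width. Each B bit is XORed with the Add/Sub control, the same control drives the carry in of bit 0, and overflow is flagged when the carry into the sign bit differs from the carry out. The names full_adder, add_sub and the bits parameter are illustrative and do not come from the slides.

```python
def full_adder(a, b, cin):
    """1-bit full adder: Sum = A xor B xor Cin, Cout = AB + BCin + ACin."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def add_sub(a, b, subtract=False, bits=32):
    """Ripple-carry add/subtract on `bits`-wide two's-complement patterns.

    Returns (result_pattern, carry_out, overflow).
    """
    control = 1 if subtract else 0
    carry = control            # Cin of bit 0 is 1 for subtraction (the "+1")
    carry_into_msb = 0
    result = 0
    for i in range(bits):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ control   # X xor 1 = X', X xor 0 = X
        if i == bits - 1:
            carry_into_msb = carry      # remember the carry into the sign bit
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    overflow = carry_into_msb != carry  # Cin xor Cout of the sign bit
    return result, carry, overflow


if __name__ == "__main__":
    print(add_sub(0b0100, 0b0011, bits=4))                 # 4 + 3
    print(add_sub(0b1100, 0b1101, subtract=True, bits=4))  # -4 - (-3)
    print(add_sub(0b0111, 0b0001, bits=4))                 # 7 + 1 overflows in 4 bits
```

With bits=4 the second call reproduces the -4 - (-3) = -1 (1111) example, and the third shows two positive numbers producing a negative pattern, which is reported as overflow.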
[Figure: 1-bit ALU in which a 2-to-1 MUX controlled by BInvert selects B or its complement before the adder]

• Use the same adder for subtraction
• Depending on the operation, choose whether to compute the 2s complement of B or not (MUX or XOR)
• For the 2s complement, define the BInvert signal and set the Cin of the LSB to 1

   Function   Operation   BInvert   Cin
   A and B    00          0         x
   A or B     01          0         x
   A + B      10          0         0
   A - B      10          1         1

Designing the ALU
• Can we add the NOR instruction?
  • No need to add a NOR gate!
  • Use De Morgan's theorem (A nor B = A'·B'), an inverter and a 2-to-1 MUX
  • Define the AInvert signal

[Figure: 1-bit ALU with AInvert and BInvert MUXes on the A and B inputs]

   Function   Operation   BInvert   Cin   AInvert
   A and B    00          0         x     0
   A or B     01          0         x     0
   A + B      10          0         0     0
   A - B      10          1         1     0
   A nor B    00          1         x     1

Designing the ALU
• Building the 32-bit ALU
  • We simply wire up 32 copies of the 1-bit ALU designed earlier, with special care for the LSB ALU
  • The Cin and BInvert signals of the LSB are always the same, so tie them together into one signal: BNegate

[Figure: LSB 1-bit ALU with AInvert, BNegate and 2-bit Operation inputs]

Designing the ALU
• Building the 32-bit ALU
  • Note that the Cin and BNegate of the LSB are the same signal, in order to compute the 2s complement in the case of subtraction

[Figure: ALU0 ... ALU31 chained through Cout/Cin, with inputs A0 B0 ... A31 B31, shared BNegate and Operation signals, and outputs Result0 ... Result31 and Cout]

Designing the ALU
• Supporting the SLT instruction
  • Expand the multiplexer by one more input (Less).
  • Subtract the two registers and feed the sign bit (the result of bit 31) back to the Less input of the LSB ALU
  • The Less inputs of the remaining ALUs are 0.

Designing the ALU
• The second version of the 32-bit ALU
  • For the SLT instruction, the MSB is fed back to the LSB while the other bits are set to zero!
  • The operation is basically subtraction

[Figure: ALU0 ... ALU31 with Less inputs; Less of ALU1 ... ALU31 is 0, the Set output of ALU31 feeds the Less input of ALU0, and ALU31 also produces Overflow]

Designing the ALU
• Supporting branch instructions
  • Basically, subtract two registers!
  • However, we need to generate a signal that indicates whether the result is zero or not.
  • Simply OR the result bits and take the complement.
  • This signal will be used to make the selection between the branch address and the PC.

[Figure: example of using the Zero signal to select the address for the BEQ instruction]

Designing the ALU
• The 32-bit ALU

[Figure: the complete 32-bit ALU: ALU0 ... ALU31 with BNegate, Operation, Less, Set, Zero and Overflow signals]

Designing the ALU
• The 32-bit ALU: list of supported operations

   Function   Operation   BNegate   AInvert
   A and B    00          0         0
   A or B     01          0         0
   A + B      10          0         0
   A - B      10          1         0
   A nor B    00          1         1
   SLT        11          1         0
   BEQ        10          1         0
   BNE        10          1         0

Shift Operations
• Shift operations are commonly needed!
• The MIPS ISA specifies three shift instructions
  • Two logical shift instructions
      sll $rd, $rt, shift_amount   # R[rd] = R[rt] << shift_amount
      srl $rd, $rt, shift_amount   # R[rd] = R[rt] >> shift_amount (zero filled)
  • One arithmetic shift instruction
      sra $rd, $rt, shift_amount   # R[rd] = R[rt] >> shift_amount (sign bit replicated)
• What is the difference?
  • Unlike SRL, the SRA instruction preserves the sign of the number!
• Encoding: R-type

   op   rs   rt   rd   shamt   funct
   6    5    5    5    5       6      (bits)

Shift Operations
• Example 1.
  1. You need to extract the 2nd byte of a 4-byte word in $t1

       $t1   0010 0011 0111 0110 1010 1111 0000 1101
       srl  $t1, $t1, 8
       $t1   0000 0000 0010 0011 0111 0110 1010 1111
       mask  0000 0000 0000 0000 0000 0000 1111 1111
       andi $t1, $t1, 0x00FF
       $t1   0000 0000 0000 0000 0000 0000 1010 1111

  2. You want to multiply $t3 by 8 (note: 8 equals 2^3)

       $t3   0000 0000 0000 0000 0000 0000 0000 0101   (equals 5)
       sll  $t3, $t3, 3        # move 3 places to the left
       $t3   0000 0000 0000 0000 0000 0000 0010 1000   (equals 40)

Shift Operations
• How are these instructions implemented?
  • Outside the ALU
  • Shift registers -> slow; shifting by one bit requires one cycle!
  • Barrel shifters
    • A digital circuit that can shift a data word by a specified number of bits in one clock cycle, if the cycle is long enough!
    • Simply a set of multiplexors!

Shift Operations
• Example 2. 4-bit barrel shifter (rotate to the left by 0, 1, 2, or 3 bits)

[Figure: 4-bit barrel shifter with 4-bit input D, select inputs S1 S0 and 4-bit output Y, built from four 4-to-1 multiplexors: Y0 selects among D0, D3, D2, D1; Y1 among D1, D0, D3, D2; Y2 among D2, D1, D0, D3; Y3 among D3, D2, D1, D0]

   Shift value     Output
   S1  S0          Y3  Y2  Y1  Y0
   0   0           D3  D2  D1  D0
   0   1           D2  D1  D0  D3
   1   0           D1  D0  D3  D2
   1   1           D0  D3  D2  D1

Multiplication
• Multiplying two 3-digit numbers A and B

     Multiplicand        421
     Multiplier        x 123
                       -----
                        1263
                        842
                     + 421
                       -----
                       51783

  • n partial products, where B is n digits long
  • n - 1 additions
• In binary (6 x 5 equals 30)

           110
         x 101
         -----
           110
          000
       + 110
         -----
         11110

  • Each partial product is either 110 (A*1) or 000 (A*0)
  • Note: the product may take as many as two times the number of bits!

Multiplication
• Multiplication steps (110 x 101; partial products 110, 0000 and 11000 add up to 11110)
  • Step 1: LSB of multiplier is 1 -> add a copy of the multiplicand
  • Step 2: Shift the multiplier right to reveal the new LSB; shift the multiplicand left to multiply it by 2
  • Step 3: LSB of multiplier is 0 -> add zero
  • Step 4: Shift the multiplier right, the multiplicand left
  • Step 5: LSB of multiplier is 1 -> add a copy of the multiplicand
  • Step 6: Add the partial products. Done!
• Thus, we need hardware to:
  1. Hold the multiplier (32 bits) and shift it right
  2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
  3. Hold the product (result) (64 bits)
  4. Add the multiplicand to the current result

Multiplication
• Multiplication hardware
  1. Hold the multiplier (32 bits) and shift it right
  2. Hold the multiplicand (32 bits) and shift it left (requires 64 bits)
  3. Hold the product (result) (64 bits)
  4. Add the multiplicand to the current result
  5. Control the whole process

[Figure: 64-bit Multiplicand register (shift left), 32-bit Multiplier register (shift right, LSB to Control), 64-bit ALU, 64-bit Product register (write), and Control]

Multiplication
• Example 3. (4-bit multiplication)

   Step                                            Multiplicand   Multiplier   Product
   Initial values                                  xxxx1101       0101         00000000
   1 -> add multiplicand to product, then shift    xxx11010       0010         00001101
   0 -> do nothing, then shift                     xx110100       0001         00001101
   1 -> add multiplicand to product, then shift    x1101000       0000         01000001
   0 -> do nothing, then shift                     11010000       0000         01000001

   ("shift" means: shift the multiplicand left and the multiplier right)

[Figure: 8-bit Multiplicand register (shift left), 4-bit Multiplier register (shift right), 8-bit ALU, 8-bit Product register (write), and Control]

Multiplication
• A cheaper implementation
  • Even though we're only adding 32 bits at a time, we need a 64-bit adder
  • Instead, hold the multiplicand still and shift the product register right!
  • Now we're only adding 32 bits each time

[Figure: 32-bit Multiplicand register, 32-bit ALU with an extra bit for the carry out, 64-bit Product register split into LH Product and RH Product (shift right, write), 32-bit Multiplier register (shift right), and Control]

Multiplication
• A cheaper than the cheaper implementation
  • Note that we're shifting bits out of the multiplier and into the product
  • Why not put these together into the same register?
  • As space opens up in the multiplier, overwrite it with the product bits

[Figure: 32-bit Multiplicand register, 32-bit ALU, 64-bit register holding LH Product and Multiplier (shift right, write, LSB to Control), and Control]

Multiplication
• Fast multiplication
  • Use 31 32-bit adders to compute the partial products
  • One input is the multiplicand ANDed with a multiplier bit, and the other is the partial product from the previous step.
• Question: show the multiplication tree to compute 5 x 3. Assume unsigned numbers represented using 3 bits and that we have 4-bit ALUs.
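The sequential shift-and-add procedure traced in Example 3 can be sketched in a few lines. This is only an illustration of the first hardware version (multiplicand shifted left, multiplier shifted right, double-width product), assuming unsigned operands; the function name shift_add_multiply and its bits parameter are made up for this sketch.

```python
def shift_add_multiply(multiplicand, multiplier, bits=32):
    """Unsigned shift-and-add multiplication, one multiplier bit per step."""
    operand_mask = (1 << bits) - 1
    product_mask = (1 << (2 * bits)) - 1   # product register is 2*bits wide
    mcand = multiplicand & operand_mask    # lives in a 2*bits-wide register
    mplier = multiplier & operand_mask
    product = 0
    for _ in range(bits):
        if mplier & 1:                     # LSB of multiplier is 1 -> add
            product = (product + mcand) & product_mask
        mcand = (mcand << 1) & product_mask  # shift multiplicand left
        mplier >>= 1                         # shift multiplier right
    return product


if __name__ == "__main__":
    # Example 3: 1101 x 0101 with 4-bit operands and an 8-bit product
    print(format(shift_add_multiply(0b1101, 0b0101, bits=4), "08b"))
```

Running it with the operands of Example 3 gives the same final product register contents, 01000001 (13 x 5 = 65).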
Multiplication
• MIPS multiplication
  • Two multiplication instructions
      mult  $s0, $s1     # hi||lo = $s0 * $s1 (signed)
      multu $s0, $s1     # hi||lo = $s0 * $s1 (unsigned)
  • Encoding: R-type

   op   rs   rt   rd   shamt   funct
   6    5    5    5    5       6      (bits)

  • The result is 64 bits and is stored in two special registers
    • LO -> holds the lower 32 bits of the result
    • HI -> holds the upper 32 bits of the result
  • The contents of these registers can be read using two special instructions
      mfhi $t5           # move Hi to register $t5
      mflo $t6           # move Lo to register $t6

Multiplication
• MIPS multiplication (notes)
  • Both multiplication instructions ignore overflow!
  • It is the responsibility of the software to check whether the result fits into 32 bits!
    • For MULTU, there is no overflow if hi is 0
    • For MULT, there is no overflow if hi is the replicated sign of lo
• Question: modify the designed multiplier to support signed multiplication.

Division
• Dividend = Divisor * Quotient + Remainder

            3221        quotient                    01110       (14)
          ------                                 --------
     15 ) 48323         dividend            101 ) 1001001       (73)
         -45                                     -000
           33                                     1001
          -30                                     -101
            32                                     1000
           -30                                     -101
             23                                      110
            -15                                     -101
              8         remainder                     11
                                                    -000
                                                      11        (3)

• Idea: repeatedly subtract the divisor. Shift as appropriate.

Division
• Looking at the alignment a little differently:

      dividend 01001001, divisor 0101, quotient 01110

        01001001
      - 01010000    does not fit, restore   (quotient bit 0)
        01001001
      - 00101000                            (quotient bit 1)
        00100001
      - 00010100                            (quotient bit 1)
        00001101
      - 00001010                            (quotient bit 1)
        00000011
      - 00000101    does not fit, restore   (quotient bit 0)
        00000011    remainder

• Make the dividend 8 bits and the divisor 4 bits by filling in with 0s
• In each iteration, re-express the entire remainder as 8 bits
• Note: at any step, the dividend = divisor * quotient + current remainder
• Try subtracting the divisor from the current remainder each time; if it doesn't fit, restore the remainder

Division
• Division hardware
  1. Hold the divisor (32 bits) and shift it right (requires 64 bits)
  2. Hold the remainder (64 bits)
  3. Hold the quotient (result) (32 bits) and shift it left
  4. Subtract the divisor from the current remainder
  5. Control the whole process

[Figure: 64-bit Divisor register (shift right), 64-bit ALU, 32-bit Quotient register (shift left), 64-bit Remainder register (write), and Control]

• Algorithm

      initialize registers (divisor in the left half of its register, dividend in remainder);
      for (i = 0; i < 33; i++) {
          remainder -= divisor;
          if (remainder < 0) {
              remainder += divisor;          /* restore */
              left shift quotient 1, LSB = 0
          } else {
              left shift quotient 1, LSB = 1
          }
          right shift divisor 1
      }

Division
• Read pages 236 – 242

Division
• MIPS division
  • Two division instructions
      div  $s0, $s1      # lo = $s0 / $s1, hi = $s0 mod $s1 (signed)
      divu $s0, $s1      # lo = $s0 / $s1, hi = $s0 mod $s1 (unsigned)
  • Encoding: R-type

   op   rs   rt   rd   shamt   funct
   6    5    5    5    5       6      (bits)

  • As with multiply, divide ignores overflow, so software must determine whether the quotient is too large.
  • Software must also check the divisor to avoid division by 0
• Signed division
  • Remember the signs of the dividend and the divisor and use them to determine the sign of the quotient
  • The sign of the remainder is always the same as that of the dividend
  (Check for yourself the division 5/2 using different combinations of the signs of the dividend and the divisor)

Floating Point Numbers
• The numbers used so far are 32-bit integers!
• How about larger and smaller values? How about fractions?
  • 4,600,000,000 or 4.6 x 10^9
  • 0.0000000000000000000000000166 or 1.6 x 10^-27
  • 3.5, -0.0213
• The IEEE 754 FP standard!
  • Uses 32 bits (single precision) or 64 bits (double precision) to represent numbers
  • Any number is represented by 3 parts: sign, significand, and exponent
  • Used in most computers

Floating Point Numbers
• The IEEE 754 FP standard
• Single precision (32 bits)

   Sign    Exponent   Fraction
   1 bit   8 bits     23 bits

  • Normalized representation (no leading zeros and one nonzero bit to the left of the binary point in the significand)
  • Since the bit to the left of the binary point is always 1, it is implied and not stored in the fraction (WHY!)

   Value = (-1)^sign x (1 + Fraction) x 2^Exponent

  • Smallest positive normalized number is 1.175494350822288e-038
  • Largest number is 3.402823466385289e+038

Floating Point Numbers
• The IEEE 754 FP standard
• Double precision (64 bits)

   Sign    Exponent   Fraction
   1 bit   11 bits    52 bits

  • Normalized representation (no leading zeros and one nonzero bit to the left of the binary point in the significand)
  • Since the bit to the left of the binary point is always 1, it is implied and not stored in the fraction (WHY!)

   Value = (-1)^sign x (1 + Fraction) x 2^Exponent

  • Smallest positive normalized number is 2.225073858507201e-308
  • Largest number is 1.797693134862316e+308

Floating Point Numbers
• The IEEE 754 FP standard!
• The way numbers are represented simplifies sorting of floating point numbers using integer comparison
  • The fraction is sign-magnitude
  • The exponent is placed before the significand
  • The exponent is signed, but it is stored biased rather than in 2s complement
    • A constant value is added so that all exponents are represented by non-negative numbers
    • In single precision, the bias is 127
      • Exponent -3 is represented as -3 + 127 = 124
      • Exponent 5 is represented as 5 + 127 = 132
    • In double precision, the bias is 1023
• So in biased notation

   Value = (-1)^sign x (1 + Fraction) x 2^(Exponent - Bias)

Floating Point Numbers
• Example 4. Show the IEEE 754 representation of -0.75 using the single and double precision formats
  • (0.75)ten = (0.11)two
  • (-0.75)ten = (-0.11)two (we use sign and magnitude)
  • In binary scientific notation: -0.11two x 2^0
  • In normalized binary scientific notation: -1.1two x 2^-1
  • Add the bias to the exponent
    • In single precision add 127 -> -1.1two x 2^126
    • In double precision add 1023 -> -1.1two x 2^1022
  • Convert the exponent into binary
    • 126 = (01111110)2
    • 1022 = (01111111110)2
  • Drop the 1 on the left of the binary point and fill the corresponding fields
    • Single precision: sign 1 | exponent 01111110 | fraction 100 0000 0000 0000 0000 0000
    • Double precision: sign 1 | exponent 01111111110 | fraction 1000 followed by 48 zeros

Floating Point Numbers
• Example 5. What is the value represented by the following IEEE 754 number?

[Figure: single precision bit pattern; from the computation below, sign = 1, exponent = 129, fraction = 0.25]

   N = (-1)^S x (1 + Fraction) x 2^(Exponent - Bias)
     = (-1)^1 x (1 + 0.25) x 2^(129 - 127)
     = -1 x 1.25 x 2^2
     = -1.25 x 4
     = -5

Floating Point Numbers
• Special numbers in the IEEE 754 standard

   Single precision       Double precision       Object represented
   E (8)    F (23)        E (11)    F (52)
   0        0             0         0            true zero (0)
   0        nonzero       0         nonzero      ± denormalized number
   1-254    anything      1-2046    anything     ± floating point number
   255      0             2047      0            ± infinity
   255      nonzero       2047      nonzero      not a number (NaN)

Floating Point Numbers
• Addition of floating point numbers
  • Analogous to adding decimal numbers in scientific notation
  • Example: 9.999 x 10^1 + 1.610 x 10^-1 (using four digits)
• Steps to perform (F1 x 2^E1) + (F2 x 2^E2) = F3 x 2^E3
  • Step 0: Restore the hidden bit in F1 and in F2
  • Step 1: Align the fractions by right shifting F2 by E1 - E2 positions (assuming E1 ≥ E2)
  • Step 2: Add the resulting F2 to F1 to form F3
  • Step 3: Normalize F3 (so it is in the form 1.XXXXX ...) and check for overflow/underflow in the exponent
  • Step 4: Round F3 and possibly normalize F3 again
  • Step 5: Rehide the most significant bit of F3 before storing the result

Floating Point Numbers
• Example 6. Show how to add 0.625 and -0.125 using the floating point binary representation
  • In normalized scientific notation this is equivalent to
      1.010 x 2^-1 + -1.000 x 2^-3
  • Align the exponents
      1.010 x 2^-1 + -0.010 x 2^-1
  • Add the significands
      1.000 x 2^-1
  • Normalize the sum (if necessary) and check for overflow/underflow
  • Round the sum and normalize again (if necessary)

Floating Point Numbers
• Addition hardware for floating point numbers

[Figure: floating point addition hardware]

Floating Point Numbers
• Accurate arithmetic
  • In arithmetic we are restricted by the number of bits. Thus we may need to truncate the operand with the smallest power to fit into the available bits
  • The IEEE 754 standard defines two extra bits to the right of the number: the guard and round bits.
  • Decimal example: 2.56 x 10^0 + 2.34 x 10^2
    • Assume the significand is represented in 3 digits only
    • Without guard and round digits (two digits are truncated)
        (2.34 + 0.02) x 10^2 = 2.36 x 10^2
    • With guard and round digits, we don't have to truncate the small number when it is shifted to the right to match the large number
        (2.3400 + 0.0256) x 10^2 = 2.3656 x 10^2 = 2.37 x 10^2 (after rounding)
  • Sticky bit!

Floating Point Numbers
• MIPS floating point support
  • The MIPS ISA defines a separate floating point register file
    • Registers $f0 - $f31 (each is 32 bits)
    • Registers are combined in pairs for double precision arithmetic
  • Some instructions
      lwc1  $f1, 54($s2)       # $f1 = Memory[$s2+54]
      swc1  $f1, 58($s4)       # Memory[$s4+58] = $f1
      add.s $f2, $f4, $f6      # $f2 = $f4 + $f6
      add.d $f2, $f4, $f6      # $f2||$f3 = $f4||$f5 + $f6||$f7

Floating Point Numbers
• MIPS floating point support
  • Compare instructions (x stands for a comparison such as eq, lt or le)
      c.x.s $f2, $f4           # if ($f2 x $f4) cond=1; else cond=0
      c.x.d $f2, $f4           # if ($f2||$f3 x $f4||$f5) cond=1; else cond=0
  • Branch instructions
      bc1t 25                  # if (cond==1) go to PC+4+100
      bc1f 25                  # if (cond==0) go to PC+4+100

Fallacies and Pitfalls
• Fallacy 1. Only theoretical mathematicians care about floating point accuracy (the Pentium bug, 1994)
• Pitfall 1. Just as a left shift instruction can replace an integer multiply by a power of 2, a right shift is the same as an integer division by a power of 2.
• Pitfall 2. The MIPS instruction addiu sign-extends its 16-bit immediate
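A small illustration of why the right-shift claim is a pitfall, under the assumption that the division in question truncates toward zero (consistent with the earlier statement that the remainder takes the sign of the dividend). An arithmetic right shift rounds toward negative infinity instead, so the two disagree for negative numbers that are not exact multiples of the power of 2. The helper names shift_divide and truncating_divide are illustrative only; Python's >> on integers behaves as an arithmetic shift.

```python
def shift_divide(x, k):
    """'Divide' by 2**k with an arithmetic right shift (rounds toward -infinity)."""
    return x >> k


def truncating_divide(x, k):
    """Divide by 2**k, truncating toward zero as signed integer division does."""
    q = abs(x) // (1 << k)
    return -q if x < 0 else q


if __name__ == "__main__":
    for value in (5, -5, -8, -1):
        print(value, shift_divide(value, 1), truncating_divide(value, 1))
```

For 5 and -8 the two columns agree, but for -5 the shift yields -3 while the division yields -2, and for -1 the shift yields -1 while the division yields 0.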
