Chapter 6. Arithmetic

Outline
A basic operation in all digital computers is the addition or subtraction of two numbers.
- ALU operations: AND, OR, NOT, XOR
- Unsigned/signed numbers
- Addition/subtraction
- Multiplication
- Division
- Floating-point operations

Adders

Addition of Unsigned Numbers - Half Adder
The four possible cases of adding two bits x and y, producing carry c and sum s:

  x  y | c  s
  0  0 | 0  0
  0  1 | 0  1
  1  0 | 0  1
  1  1 | 1  0

(a) The four possible cases  (b) Truth table  (c) Circuit  (d) Graphical symbol of the half adder (HA)

Addition and Subtraction of Signed Numbers
Truth table for one stage of binary addition (inputs xi, yi, carry-in ci; outputs sum si, carry-out ci+1):

  xi  yi  ci | si  ci+1
  0   0   0  | 0   0
  0   0   1  | 1   0
  0   1   0  | 1   0
  0   1   1  | 0   1
  1   0   0  | 1   0
  1   0   1  | 0   1
  1   1   0  | 0   1
  1   1   1  | 1   1

With x' denoting NOT x:
  si   = xi' yi' ci + xi' yi ci' + xi yi' ci' + xi yi ci = xi XOR yi XOR ci
  ci+1 = yi ci + xi ci + xi yi

Example: X = 0111 (7), Y = 0110 (+6), Z = X + Y = 1101 (13).
Figure 6.1. Logic specification for a stage of binary addition (truth table and legend for stage i).

Addition and Subtraction of Signed Numbers
A full adder (FA) implements one stage: it takes xi, yi and ci and produces si and ci+1.
[Figure 6.2(a): logic for a single stage.]

n-bit ripple-carry adder
- n full adders are cascaded, the carry-out of each stage feeding the carry-in of the next, from the least significant bit (LSB) position to the most significant bit (MSB) position.
- How is overflow detected?
[Figure 6.2(b): an n-bit ripple-carry adder.]

kn-bit ripple-carry adder
- k n-bit adders can be cascaded to add kn-bit operands.
[Figure 6.2(c): cascade of k n-bit adders.]
Figure 6.2. Logic for addition of binary vectors.

Addition/subtraction logic unit
[Figure 6.3. Binary addition-subtraction logic network: an n-bit adder whose y inputs and carry-in c0 are conditioned by an Add/Sub control signal.]

Make Addition Faster

Ripple-Carry Adder (RCA)
- Straightforward design
- Simple circuit structure
- Easy to understand
- Most power-efficient
- Slowest (the critical path through the carries is too long)

Adders
We can view addition in terms of generate, G[i], and propagate, P[i], signals.

Carry-lookahead Logic
- Carry generate Gi = Ai Bi: the stage must generate a carry when Ai = Bi = 1.
- Carry propagate Pi = Ai XOR Bi: the carry-in will equal the carry-out of this stage.
- Sum and carry can be re-expressed in terms of generate/propagate/Ci:
  Si = Ai XOR Bi XOR Ci = Pi XOR Ci
  Ci+1 = Ai Bi + Ai Ci + Bi Ci = Ai Bi + Ci (Ai + Bi) = Ai Bi + Ci (Ai XOR Bi) = Gi + Ci Pi

Carry-lookahead Logic
Re-express the carry logic as follows:
  C1 = G0 + P0 C0
  C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 C0
  C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
  C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
Each of the carry equations can be implemented in a two-level logic network whose variables are the adder inputs and the carry-in to stage 0.

Carry-lookahead Implementation
- In an adder stage with propagate and generate outputs, Pi and Gi are each available after 1 gate delay, and Si after 2 gate delays once Ci is available.
- The two-level networks for C1, C2, C3 and C4 require increasingly complex logic.
[Figure: adder stage with propagate and generate outputs, and the two-level carry networks.]

Carry-lookahead Logic
Cascaded carry lookahead for a 4-bit adder: the lookahead logic generates the individual carries, so the sums are computed much faster. With the gate-delay annotations in the figure, S0 is available at 2 gate delays, the carries C1-C4 at 3 gate delays, and S1-S3 at 4 gate delays.
[Figure: 4-bit adder with cascaded carry-lookahead logic.]
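Before extending the lookahead hierarchically to 16 bits, the two-level carry equations above are easy to check in software. The following is a minimal Python sketch (not from the slides; the function name and LSB-first bit ordering are my own) that evaluates C1-C4 and the sum bits exactly as written above.

```python
def cla_4bit(a, b, c0):
    """4-bit carry-lookahead addition; a and b are lists of bits, index 0 = LSB."""
    G = [ai & bi for ai, bi in zip(a, b)]   # Gi = Ai Bi
    P = [ai ^ bi for ai, bi in zip(a, b)]   # Pi = Ai xor Bi
    # Two-level carry equations from the slide:
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    C = [c0, c1, c2, c3]
    S = [P[i] ^ C[i] for i in range(4)]     # Si = Pi xor Ci
    return S, c4

# 1011 (11) + 0110 (6) = 10001 (17): sum bits 1000 (LSB first), carry-out 1
print(cla_4bit([1, 1, 0, 1], [0, 1, 1, 0], 0))
```

Because every carry is a two-level function of the inputs and c0, none of the carries waits on a lower-order sum, which is the whole point of the scheme.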
Carry-lookahead Logic
[Figure 6.5. 16-bit carry-lookahead adder built from 4-bit adders (see Figure 6.4): each 4-bit adder produces group generate and propagate outputs (GI, PI), and a second-level carry-lookahead block forms c4, c8, c12 and c16 from c0.]

Carry-lookahead Logic
- The 4-bit adders use internal carry lookahead; a second-level lookahead carry unit extends the lookahead to 16 bits.
- Group propagate: P = P3 P2 P1 P0
- Group generate: G = G3 + G2 P3 + G1 P3 P2 + G0 P3 P2 P1
- With the gate-delay annotations in the figure, the group carries come out of the lookahead carry unit early enough that the most significant sum bits are available after about 8 gate delays.
[Figure: 16-bit adder built from four 4-bit adders with internal carry lookahead plus a lookahead carry unit.]

Unsigned Multiplication

Manual Multiplication Algorithm

        1 1 0 1          (13)  Multiplicand M
      × 1 0 1 1          (11)  Multiplier Q
        -------
            1 1 0 1
          1 1 0 1
        0 0 0 0
      1 1 0 1
      ---------------
      1 0 0 0 1 1 1 1    (143)  Product P

(a) Manual multiplication algorithm

Array Multiplication
[(b) Array implementation: partial product 0 (PP0) is formed from the multiplicand bits m3..m0 ANDed with q0; each typical cell ANDs mj with qi, adds it to the bit of the incoming partial product PPi with a full adder (FA), and passes a carry and a bit of the outgoing partial product PP(i+1) onward; the outputs p7, p6, ..., p0 form the product.]

The 4 × 4 array of partial-product bits that must be summed:

  Row 0:                X3Y0 X2Y0 X1Y0 X0Y0
  Row 1:           X3Y1 X2Y1 X1Y1 X0Y1
  Row 2:      X3Y2 X2Y2 X1Y2 X0Y2
  Row 3: X3Y3 X2Y3 X1Y3 X0Y3
  Sum:   P7   P6   P5   P4   P3   P2   P1   P0

Each row i is the multiplicand ANDed with Yi, shifted left i positions; the column sums give the product bits P7..P0.

Another Version of 4×4 Array Multiplier
[Figure: an alternative organization of the 4 × 4 array multiplier.]

Array Multiplication
- What is the critical path (the worst-case signal propagation delay path)?
- Assuming two gate delays from the inputs to the outputs of a full-adder block, the path has a total of 6(n - 1) - 1 gate delays, including the initial AND-gate delay in all cells, for an n × n array.
- Any advantages/disadvantages?

Sequential Circuit Binary Multiplier
[(a) Register configuration: register A (initially 0) and the multiplier register Q are shifted right together each cycle; the control sequencer examines q0 and, through the Add/Noadd control and a MUX, either adds the multiplicand M into A with an n-bit adder or adds 0; C holds the carry-out of the addition.]
[(b) Multiplication example for M = 1101 and Q = 1011: the four cycles perform Add/Shift, Add/Shift, No add/Shift, Add/Shift, leaving the product 10001111 in A and Q.]

Signed Multiplication

Signed Multiplication
- Considering 2's-complement signed operands, what will happen to (-13) × (+11) if we follow the same method as for unsigned multiplication?
- Each shifted version of the negative multiplicand must be sign-extended (shown in blue in the figure) before the summands are added; the result is then the correct product 1101110001 (-143).
Figure 6.8. Sign extension of the negative multiplicand.

Signed Multiplication
- For a negative multiplier, a straightforward solution is to form the 2's-complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier.
- This is possible because complementation of both operands does not change the value or the sign of the product.
- A technique that works equally well for both negative and positive multipliers: the Booth algorithm.

Booth Algorithm
Consider a multiplication in which the multiplier is positive, say 0011110. How many appropriately shifted versions of the multiplicand are added in the standard procedure? (See the sketch below.)
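To answer that question concretely, here is a small Python sketch of the plain shift-and-add procedure (the function name and example operands are illustrative, not from the slides). It adds one shifted copy of the multiplicand per 1 bit in the multiplier, so for 0011110 it performs four additions.

```python
def shift_and_add_multiply(multiplicand, multiplier, n):
    """Standard unsigned shift-and-add multiplication of two n-bit numbers.
    Returns the 2n-bit product and the number of shifted multiplicands added."""
    product = 0
    additions = 0
    for i in range(n):
        if (multiplier >> i) & 1:          # a 1 in bit i selects +M, shifted i places
            product += multiplicand << i
            additions += 1
    return product, additions

# For the multiplier 0011110 (decimal 30), four shifted versions of M are added.
print(shift_and_add_multiply(0b0101101, 0b0011110, 7))   # (1350, 4): 45 * 30 = 1350
```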
[Worked example: multiplying 0101101 (45) by 0011110 (30) in the standard way adds four shifted versions of the multiplicand, one for each 1 in the multiplier, giving 010101000110 (1350).]

Booth Algorithm
- Since 0011110 = 0100000 - 0000010, what happens if we use the expression on the right instead?
- The multiplier is recoded as 0 +1 0 0 0 -1 0: one summand is the multiplicand shifted left five positions, and the other is the 2's-complement of the multiplicand shifted left one position. Only two summands are needed, and the product is the same, 010101000110.

Booth Algorithm
- In general, in the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.
[Figure 6.10. Booth recoding of a multiplier.]

Booth Algorithm
Example: 01101 (+13) × 11010 (-6). The multiplier recodes to 0 -1 +1 -1 0, and adding the correspondingly shifted, sign-extended multiples of the multiplicand gives 1110110010 (-78).
Figure 6.11. Booth multiplication with a negative multiplier.

Booth Algorithm

  Multiplier bits      Version of multiplicand
  Bit i   Bit i-1      selected by bit i
   0       0              0 × M
   0       1             +1 × M
   1       0             -1 × M
   1       1              0 × M

Figure 6.12. Booth multiplier recoding table.

Booth Algorithm
- Best case: a long string of 1's (the recoding skips over them).
- Worst case: 0's and 1's alternating.

  0101010101010101   Worst-case multiplier
  recodes to +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 (a nonzero digit in every position)

  1100010110111100   Ordinary multiplier
  recodes to 0 -1 0 0 +1 -1 +1 0 -1 +1 0 0 0 -1 0 0

  A "good" multiplier with long blocks of 1's recodes to only a few nonzero digits, e.g. 0 0 0 +1 0 0 0 0 -1 0 0 0 +1 0 0 -1.

Fast Multiplication

Bit-Pair Recoding of Multipliers
- Bit-pair recoding halves the maximum number of summands (versions of the multiplicand).
- It is derived from Booth recoding by pairing the Booth digits, using sign extension and the implied 0 to the right of the LSB.
(a) Example of bit-pair recoding derived from Booth recoding: the multiplier 11010 (sign-extended) has Booth digits 0 -1 +1 -1 0, which pair up into the selectors 0, -1, -2.

Bit-Pair Recoding of Multipliers

  Multiplier bit-pair    Bit on the right    Multiplicand selected
  i+1    i               i-1                 at position i
  0      0               0                    0 × M
  0      0               1                   +1 × M
  0      1               0                   +1 × M
  0      1               1                   +2 × M
  1      0               0                   -2 × M
  1      0               1                   -1 × M
  1      1               0                   -1 × M
  1      1               1                    0 × M

(b) Table of multiplicand selection decisions

Bit-Pair Recoding of Multipliers
Example: 01101 (+13) × 11010 (-6). Booth recoding (0 -1 +1 -1 0) uses one summand position per multiplier bit, while bit-pair recoding (0, -1, -2) needs only n/2 summands; both give 1110110010 (-78).
Figure 6.15. Multiplication requiring only n/2 summands.
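Before turning to carry-save addition of the summands, here is a minimal Python sketch (helper names are mine, not from the slides) that applies the Booth recoding table of Figure 6.12 and the bit-pair selection table above, and checks both against the (+13) × (-6) = -78 example.

```python
def booth_recode(multiplier_bits):
    """Booth recoding (Figure 6.12): multiplier_bits is a list of 0/1, index 0 = LSB,
    two's complement.  Returns one digit in {-1, 0, +1} per bit position,
    using the implied 0 to the right of the LSB."""
    prev = 0
    digits = []
    for bit in multiplier_bits:
        digits.append(prev - bit)          # 01 -> +1, 10 -> -1, 00/11 -> 0
        prev = bit
    return digits

def bitpair_recode(multiplier_bits):
    """Bit-pair recoding: one selector in {-2, -1, 0, +1, +2} per pair of bits."""
    bits = list(multiplier_bits)
    if len(bits) % 2:                      # sign-extend to an even length
        bits.append(bits[-1])
    selectors = []
    prev = 0                               # "bit on the right" of the current pair
    for i in range(0, len(bits), 2):
        selectors.append(prev + bits[i] - 2 * bits[i + 1])
        prev = bits[i + 1]
    return selectors                       # selector k scales M by 4**k

def recoded_value(digits, weight_base, multiplicand):
    return sum(d * multiplicand * weight_base**i for i, d in enumerate(digits))

M = 13                                     # multiplicand +13
Q = [0, 1, 0, 1, 1]                        # multiplier 11010 = -6, LSB first
print(booth_recode(Q), recoded_value(booth_recode(Q), 2, M))       # [0,-1,1,-1,0] -78
print(bitpair_recode(Q), recoded_value(bitpair_recode(Q), 4, M))   # [-2,-1,0] -78
```

The bit-pair selectors are just consecutive Booth digit pairs combined, which is why the summand count drops to n/2 without changing the product.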
Carry-Save Addition of Summands
(The manual multiplication example, 1101 (13) × 1011 (11) = 10001111 (143), and the Figure 6.6 array implementation are repeated here for reference.)
[(a) Ripple-carry array (the Figure 6.6 structure): each row of full adders ripples its carries along the row.]
Carry-save addition (CSA) speeds up the addition process.
[(b) Carry-save array: each row passes its carries down to the next row instead of rippling them, and only the final row performs a conventional carry-propagating addition.]
Figure 6.16. Ripple-carry and carry-save arrays for the multiplication operation M × Q = P for 4-bit operands.

Carry-Save Addition of Summands
- The delay through the carry-save array is somewhat less than the delay through the ripple-carry array, because the S and C vector outputs from each row are produced in parallel, in one full-adder delay.
- Considering the addition of many summands, we can:
  - group the summands in threes and perform carry-save addition on each of these groups in parallel, generating a set of S and C vectors in one full-adder delay;
  - group all of the resulting S and C vectors into threes and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delay;
  - continue with this process until only two vectors remain;
  - add the final two vectors in an RCA or CLA to produce the desired product.

Carry-Save Addition of Summands
Example: 101101 (45) × 111111 (63) = 101100010011 (2,835). The six shifted versions of the multiplicand are labelled A, B, C, D, E, F.
Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.

[Figure 6.18. The multiplication example from Figure 6.17 performed using carry-save addition: successive carry-save levels reduce the summands A-F to the vector pairs (S1, C1) and (S2, C2), then to (S3, C3), then to (S4, C4), and the final pair is added with a conventional adder to give the product.]

[Figure 6.19. Schematic representation of the carry-save addition operations in Figure 6.18.]

Carry-Save Addition of Summands
- When the number of summands is large, the time saved is proportionally much greater.
- Some issues omitted here: sign extension, the computation width of the final CLA/RCA, and combining carry-save addition with bit-pair recoding.

Integer Division

Manual Division
Decimal: 274 / 13 = 21, remainder 1.  Binary: 100010010 / 1101 = 10101, remainder 1.
Figure 6.20. Longhand division examples.

Longhand Division Steps
- Position the divisor appropriately with respect to the dividend and perform a subtraction.
- If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
- If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.
(A small sketch of these steps follows.)
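The following is a small Python sketch of the longhand steps listed above (function name mine; this is the bit-at-a-time trial-subtraction idea, not yet the register-level algorithm that follows). It reproduces the binary example of Figure 6.20.

```python
def longhand_divide(dividend, divisor, n_bits):
    """Bit-at-a-time binary division following the longhand steps above:
    bring down the next dividend bit, try a subtraction, and record a
    quotient bit of 1 (keep the difference) or 0 (restore) at each step."""
    remainder = 0
    quotient = 0
    for i in range(n_bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # extend by next bit
        quotient <<= 1
        if remainder >= divisor:          # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1                 # quotient bit = 1
        # otherwise the quotient bit stays 0 (the "restore" case)
    return quotient, remainder

# Figure 6.20 example: 100010010 (274) / 1101 (13) = 10101 (21) remainder 1
q, r = longhand_divide(0b100010010, 0b1101, 9)
print(bin(q), r)    # 0b10101 1
```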
Circuit Arrangement
[Figure 6.21. Circuit arrangement for binary division: registers A and Q (holding the dividend and, eventually, the quotient) are shifted left together; an (n+1)-bit adder adds or subtracts the divisor M under an Add/Subtract control from the control sequencer, and the quotient bit q0 is set according to the sign of A.]

Restoring Division
- Shift A and Q left one binary position.
- Subtract M from A, and place the answer back in A.
- If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1.
- Repeat these steps n times.

Example
Dividing 1000 (8) by 11 (3): each of the four cycles shifts, subtracts, sets q0, and restores A whenever the subtraction makes A negative. The result is quotient 0010 (2) with remainder 10 (2).
Figure 6.22. A restoring-division example.

Nonrestoring Division
Avoid the need for restoring A after an unsuccessful subtraction. Any idea how?
- Step 1 (repeat n times): If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A. Then, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
- Step 2: If the sign of A is 1, add M to A.

Example
The same division, 1000 (8) by 11 (3), performed without restoring after each unsuccessful subtraction; a single final addition of M ("restore remainder") corrects the remainder. The result is again quotient 0010 with remainder 10.
Figure 6.23. A nonrestoring-division example.

Floating-Point Numbers and Operations

Floating-Point Numbers
- So far we have dealt with fixed-point numbers (what are they?) and have considered them as integers.
- The same fixed-point format can hold fractions if the binary point is placed just to the right of the sign bit:
  B = b0 . b-1 b-2 ... b-(n-1)
  F(B) = -b0 × 2^0 + b-1 × 2^-1 + b-2 × 2^-2 + ... + b-(n-1) × 2^-(n-1)
  where the range of F is -1 <= F <= 1 - 2^-(n-1).
- In floating-point numbers, the position of the binary point is variable and is automatically adjusted as computation proceeds.

Floating-Point Numbers
What is needed to represent a floating-point decimal number?
- Sign
- Mantissa (the significant digits)
- Exponent to an implied base (scale factor)
"Normalized": the decimal point is placed to the right of the first (nonzero) significant digit.

IEEE Standard for Floating-Point Numbers
- Think about this number (all digits are decimal): ±X1.X2X3X4X5X6X7 × 10^±Y1Y2
- It is possible to approximate this mantissa precision and scale-factor range in a binary representation that occupies 32 bits: a 24-bit mantissa (including 1 sign bit for the signed number) and an 8-bit exponent.
- Instead of the signed exponent E, the value actually stored in the exponent field is an unsigned integer E' = E + 127, the so-called excess-127 format.

IEEE Standard
(a) Single precision, 32 bits: sign S (0 signifies +, 1 signifies -), 8-bit exponent E' in excess-127 representation, 23-bit mantissa fraction M.
    Value represented = ±1.M × 2^(E' - 127)
(b) Example of a single-precision number: S = 0, E' = 00101000, M = 001010...0.
    (00101000)2 = 40 and 40 - 127 = -87, so the value represented is +1.001010...0 × 2^-87.
(c) Double precision, 64 bits: sign S, 11-bit excess-1023 exponent, 52-bit mantissa fraction.
    Value represented = ±1.M × 2^(E' - 1023)
Figure 6.24. IEEE standard floating-point formats.

IEEE Standard
- For the excess-127 format, 0 <= E' <= 255. However, 0 and 255 are used to represent special values, so in practice 1 <= E' <= 254, which means -126 <= E <= 127.
- Single precision uses 32 bits; its normal values range in scale from 2^-126 to 2^+127.
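A minimal Python sketch (using the standard struct module; the function name is mine) that unpacks the sign, excess-127 exponent field E', and 23-bit fraction of a single-precision value, as described above. It applies only to normal numbers; the special cases E' = 0 and E' = 255 are listed later.

```python
import struct

def decode_ieee_single(x):
    """Split a value stored in IEEE single precision into its sign bit,
    8-bit excess-127 exponent field E', and 23-bit fraction M."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    e_prime = (bits >> 23) & 0xFF          # stored exponent, E' = E + 127
    mantissa = bits & 0x7FFFFF             # fraction bits of 1.M (implied leading 1)
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (e_prime - 127)
    return sign, e_prime, e_prime - 127, value

# Example: -6.5 = -1.101 * 2^2, so E' = 2 + 127 = 129
print(decode_ieee_single(-6.5))    # (1, 129, 2, -6.5)
```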
- Double precision uses 64 bits; its normal values range in scale from 2^-1022 to 2^+1023.

Two Aspects
If a number is not normalized, it can always be put in normalized form by shifting the fraction and adjusting the exponent.
(a) Unnormalized value: S = 0, excess-127 exponent field 10001000, fraction 0010110... (with no implicit 1 to the left of the binary point).
    (10001000)2 = 136 and 136 - 127 = 9, so the value represented is +0.0010110 × 2^9.
(b) Normalized version: S = 0, exponent field 10000101, fraction 0110....
    6 + 127 = 133 = (10000101)2, so the value represented is +1.0110 × 2^6.
Figure 6.25. Floating-point normalization in IEEE single-precision format.

Two Aspects
- As computations proceed, a number that does not fall in the representable range of normal numbers might be generated.
- It may require an exponent less than -126 (underflow) or greater than +127 (overflow). Both are exceptions that need to be considered.

Special Values
- The end values 0 and 255 of E' are used to represent special values.
- When E' = 0 and M = 0, the exact value 0 is represented (±0).
- When E' = 255 and M = 0, the value ∞ is represented (±∞).
- When E' = 0 and M ≠ 0, denormal numbers are represented; the value is ±0.M × 2^-126.
- When E' = 255 and M ≠ 0, the result is Not a Number (NaN).

Exceptions
- A processor must set exception flags if any of the following occur while performing operations: underflow, overflow, divide by zero, inexact, invalid.
- When an exception occurs, the result is set to one of the special values.

Arithmetic Operations on Floating-Point Numbers
Add/Subtract rule
- Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
- Set the exponent of the result equal to the larger exponent.
- Perform addition/subtraction on the mantissas and determine the sign of the result.
- Normalize the resulting value, if necessary.
Multiply rule
- Add the exponents and subtract 127.
- Multiply the mantissas and determine the sign of the result.
- Normalize the resulting value, if necessary.
Divide rule
- Subtract the exponents and add 127.
- Divide the mantissas and determine the sign of the result.
- Normalize the resulting value, if necessary.

Guard Bits and Truncation
- During the intermediate steps, it is important to retain extra bits, often called guard bits, to yield the maximum accuracy in the final results.
- Removing the guard bits to generate a final result requires truncation of the extended mantissa. How?

Guard Bits and Truncation
In the examples below, a 6-bit fraction 0.b-1 b-2 b-3 b-4 b-5 b-6 is truncated to 3 bits (a small sketch of the three methods follows).
- Chopping: simply discard the removed bits. All fractions from 0.b-1b-2b-3000 to 0.b-1b-2b-3111 are truncated to 0.b-1b-2b-3. Biased; the error ranges from 0 to 1 at the LSB of the retained bits.
- Von Neumann rounding: if any of the bits to be removed are 1, the LSB of the retained bits is set to 1. All 6-bit fractions with b-4 b-5 b-6 not equal to 000 are truncated to 0.b-1b-21. Unbiased; the error ranges from -1 to +1 at the LSB. (Why is unbiased rounding better when many operands are involved?)
- Rounding: a 1 is added to the LSB position of the bits to be retained if there is a 1 in the MSB position of the bits being removed; round to the nearest number, or to the nearest even number in case of a tie. For example, 0.b-1b-20000 is truncated to 0.b-1b-20, while 0.b-1b-21100 is rounded to 0.b-1b-21 + 0.001. Unbiased; the error ranges from -1/2 to +1/2 at the LSB. Rounding gives the best accuracy but is the most difficult to implement.
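Below is a small Python sketch (helper names mine) contrasting the three truncation methods just described. It treats the 6-bit fraction as an integer count of 2^-6 units and keeps 3 bits, discarding k = 3 guard bits.

```python
def chop(v, k):
    """Chopping: discard the k low-order (guard) bits."""
    return v >> k

def von_neumann(v, k):
    """If any discarded bit is 1, force the LSB of the retained bits to 1."""
    kept = v >> k
    return kept | 1 if v & ((1 << k) - 1) else kept

def round_nearest_even(v, k):
    """Add 1 at the retained LSB when the MSB of the discarded bits is 1;
    on an exact tie (discarded bits = 100...0) round to the nearer even value."""
    kept, dropped = v >> k, v & ((1 << k) - 1)
    half = 1 << (k - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1
    return kept

# Truncating 6-bit fractions (units of 2**-6) to 3 bits (units of 2**-3):
for v in (0b010100, 0b010101, 0b010110, 0b010111):
    print(bin(v), chop(v, 3), von_neumann(v, 3), round_nearest_even(v, 3))
```

Running the loop makes the bias visible: chopping always moves the value down, while the other two methods move it both ways around the retained LSB.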
Implementing Floating-Point Operations
- Hardware/software.
- In most general-purpose processors, floating-point operations are available at the machine-instruction level, implemented in hardware.
- In high-performance processors, a significant portion of the chip area is assigned to floating-point operations.

Addition/subtraction circuitry
[Figure 6.26. Floating-point addition-subtraction unit for 32-bit operands A : SA, EA, MA and B : SB, EB, MB. An 8-bit subtractor forms n = EA - EB; a SWAP network sends the mantissa of the number with the smaller exponent to a SHIFTER, which shifts it n bits to the right; the mantissa adder/subtractor combines the mantissas under a combinational Add/Sub control network; a leading-zeros detector and a second 8-bit subtractor drive the normalize-and-round stage, producing the 32-bit result R : SR, ER, MR = A + B.]

Requirements for Homework 6
- 5.6 (a): 3 credits
- 5.6 (b):
  - Draw a figure to show how program words are mapped onto the cache blocks: 4
  - Sequence of reads from the main memory blocks into cache blocks: 4
  - Total time for reading the blocks from the main memory into the cache: 4
  - Executing the program out of the cache:
    - Outer loop excluding the inner loop: 4
    - Inner loop: 4
    - End section of the program: 4
    - Total execution time: 3
- Due time: class on Oct. 18

Hints for Homework 6
- Assume that consecutive addresses refer to consecutive words. The cycle time is for one word.
- Assume this problem does not use load-through: when a read miss occurs, the block of words that contains the requested word is copied from the main memory into the cache, and only after the entire block has been loaded is the requested word forwarded to the processor.
- Total time for reading the blocks from the main memory into the cache: (number of reads) × 128 × 10.
- Executing the program out of the cache: (MEM word size for the instructions) × (loop count) × 1.
  - Outer loop excluding the inner loop: (outer loop word size - inner loop word size) × 10 × 1
  - Inner loop: (inner loop word size) × 20 × 10 × 1
- MEM word size from MEM 23 to 1200 is 1200 - 22; MEM word size from MEM 1200 to 1500 (end) is 1500 - 1200.

Homework 7
1. Addition and Subtraction of Signed Numbers, slides 5-9, Oct. 20 (Barret, Felix, Washington)
2. Carry-lookahead Addition, slides 11-18, Oct. 20 (Kyle White, Jose Jo)
3. Unsigned Multiplication, slides 20-25, Oct. 20 (Tannet Garrett, Garth Gergerich, Gabriel Graderson)
4. Signed Multiplication, slides 26-28 (Shen)
5. Booth Algorithm, slides 29-34, Oct. 25 (Ashraf Hajiyer)
6. Fast Multiplication
   1. Bit-Pair Recoding of Multipliers, slides 36-38, Oct. 25 (Alex, Suzanne, Scott)
   2. Carry-Save Addition of Summands, slides 39-47, Oct. 25 (Jason, Jordan, Chris)
7. Integer Division
   1. Restoring Division, slides 49-52, Oct. 27 (Kyle, Brandan, Alex Shipman)
   2. Nonrestoring Division, slides 53-55, Oct. 27 (Zach, Eric, Chase)
Each presentation is limited to 15 minutes, including 2 minutes for questions.

Exercise for Oct. 23
- Read "Booth's algorithm and Bit-Pair Recoding" in the textbook (Sections 6.4.1 and 6.5.1).
- Calculate the 2's-complement multiplication (+4) × (-7) using Booth's algorithm and Bit-Pair Recoding. (Booth's algorithm and Bit-Pair Recoding will be introduced on Oct. 25.)
- You don't need to hand in this exercise.
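Returning to the add/subtract rule and the unit of Figure 6.26 above, here is a toy Python sketch (names and simplifications mine) of the align-add-normalize sequence on (sign, exponent, mantissa) triples; guard bits and rounding are omitted, so it is only an outline of the dataflow, not the full unit.

```python
def fp_add(sign_a, exp_a, frac_a, sign_b, exp_b, frac_b, frac_bits=23):
    """Toy version of the add/subtract rule behind Figure 6.26.
    Each operand is (sign, unbiased exponent E, integer mantissa of 1.M,
    i.e. frac = 1.M * 2**frac_bits).  Guard bits and rounding are ignored."""
    # 1. Swap so that operand A has the larger exponent, then align B.
    if exp_a < exp_b:
        sign_a, exp_a, frac_a, sign_b, exp_b, frac_b = sign_b, exp_b, frac_b, sign_a, exp_a, frac_a
    frac_b >>= (exp_a - exp_b)             # shift the smaller operand's mantissa right
    # 2/3. Add the signed mantissas; the result exponent starts as the larger one.
    exp = exp_a
    mag = (-1) ** sign_a * frac_a + (-1) ** sign_b * frac_b
    sign = 1 if mag < 0 else 0
    mag = abs(mag)
    # 4. Normalize: bring the mantissa back into [1, 2) and adjust the exponent.
    while mag >= 2 << frac_bits:
        mag >>= 1
        exp += 1
    while mag and mag < 1 << frac_bits:
        mag <<= 1
        exp -= 1
    return sign, exp, mag

# 2.5 + (-0.5) = 2.0:  2.5 = +1.25 * 2^1, 0.5 = +1.0 * 2^-1
print(fp_add(0, 1, int(1.25 * 2**23), 1, -1, int(1.0 * 2**23)))   # (0, 1, 8388608)
```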