Chapter 6-2 Multiplier Multiplier Next Lecture Divider Floating Point Numbers Multiplication of Positive Numbers using usual algorithm for multiplying integers Algorithm applies to unsigned numbers and to positive numbers Result of the product of two n-digit numbers can be accommodated in 2n digits Binary multiplication of positive operands can be implemented in a purely combinational, two dimensional logic array 1101 1011 (13) Multiplicand M (11) Multiplier Q Partial Products (143) Product P 2 Formal Representation Z = ·· X Y M + N– 1 = Zk 2 k=0 N – 1 i M – 1 = Xi 2 i=0 j = 0 = Xi Yj 2 with M –1 = Xi 2 i=0 N– 1 Y = Y j2 j= 0 3 i =0 j= 0 X j Yj 2 M – 1 N – 1 i j k i + j Multiplier Implementation Multiplicand Partial product 0 (PP0) m3 0 m2 0 m1 0 m0 q0 0 PP1 q1 0 PP2 q2 0 PP3 q3 0 p7 p6 p5 p4 p0 p1 p2 PP4 = p7 , p6, ...p0 = Product p3 Bit of incoming partial product PPi mj qi Typical cell Carry-out FA Carry-in Bit of outgoing partial product PP(i+1) 4 Array Multiplier m3 P7 5 m2 m1 m0 q 1 P0 m3 m2 m1 m0 HA FA FA HA m3 m2 m1 m0 FA FA FA HA m3 m2 m1 m0 FA FA FA HA P6 P5 P4 q3 P3 q2 P2 P1 q0 Ripple-Carry Array Multiplier For the multiplication operation M Q = P for 4-bit operands M: m3m2m1m0 Q: q3q2q1q0 P: p7p6p5p4p3p2p1p0 0 m3q1 miqj = mi·qj FA m3q2 FA m3q3 FA p7 6 p6 p5 m2q0 m1q1 m1q0 m0q1 FA FA FA m2q2 FA m2q3 FA m3q0 m2q1 m1q2 FA p4 0 m0q2 FA m1q3 m0q0 FA 0 m0q3 FA p3 0 p2 p1 p0 The MxN Array Multiplier Critical Path FA HA FA FA FA FA HA HA Critical Path 1 Critical Path 2 FA FA FA HA Critical Path 1 & 2 Dmult=[(M-1)+(N-2)]Dcarry +(N-1)Dsum+1Dand 7 Multiplier Implementation 8 The main component in each cell is an adder circuitry Each AND gate determines whether a multiplicand bit mj is added to the incoming partial product bit, based on the value of the multiplier bit qj For each row i ( 0 ≤ i ≤ 3) where qi = 1, adds the multiplicand appropriately shifted, to the incoming partial product, PPi, to generate PPi+1 If qi = 0, PPi is passed vertically downward unchanged PP0 is all 0s PP4 is the desired product The multiplicand is shifted left one position per row by the diagonal signal path Another Method of Multiplier Design 9 The previous algorithm may be impractical for large numbers because it uses many gates Multiplication can be performed using a mixture of combinational array techniques and sequential techniques that require less combinational logic In early computers, because of the cost of logic gates, the adder circuitry in the ALU was used to perform multiplication sequentially Called sequential circuit binary multiplier Register A (initially 0) Shift right an - 1 C a0 qn - 1 q0 Multiplier Q Add/Noadd control n-bit adder Control sequencer MUX 0 0 mn - 1 m0 Multiplicand M M 1 1 0 1 Initial configuration 0 C 0 0 0 0 A 1 0 1 1 Q 0 0 1 1 0 1 0 1 1 0 1 0 1 1 1 1 0 1 Add Shift First cycle 1 0 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1 0 Add Shift Second cycle 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 1 1 No add Shift Third cycle 1 0 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 1 Add Shift Fourth cycle Product Sequential Circuit Binary Multiplier 11 This circuit performs multiplication by using a single adder n times to implement the spatial addition performed by the n rows of ripple carry adders Registers A and Q combined hold PPi while multiplier bit qi generates the signal Add/Noadd Add/Noadd controls the addition of the multiplicand M to PPi to generate PPi+1 The product is computed in n cycles The partial product grows in length 1 bit per cycle from the initial vector PP0 of n 0s in register A The carry-out from the adder is stored in Flip-Flop C At the start, the multiplier is loaded into register Q, the multiplicand into register M, and C as well as A are cleared to 0 Sequential Circuit Binary Multiplier 12 At the end of each cycle, C, A and Q are shifted right by one bit position to allow for the growth of the partial product as the multiplier is shifted out of register Q Because of this shifting, multiplier bit qi appears in the LSB position of Q to generate the Add/Noadd signal at the correct time, starting with q0 during the first cycle, q1 during the second cycle, etc... If the adder has a delay of 10 ns The control setting and the shift operations take another 10ns each A hardwired multiply in a 32-bit word-length computer would take about 640ns Multiply instructions took much longer to execute than Add instructions in early computers Signed Operand Multiplication 13 Multiplication of signed operands generates a double length product in the 2's complement number system Consider the case of a positive multiplier and a negative multiplicand When we add a negative multiplicand to a partial product, we must extend the sign bit value of the multiplicand to the left as far as the product will extend The previous hardware can be used for negative multiplicands if it provides for sign extension of the partial products Sign Extension of Negative Multiplicand Sign extension is shown in blue 14 1 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 ( - 13) ( + 11) ( - 143) Negative number must be the multiplicand and the positive number is the multiplier Booth Algorithm 15 A powerful algorithm for signed-number multiplication treats positive and negative numbers uniformly So far, the number of additions equals the number of 1s in the multiplier Consider a multiplication in which the multiplier is positive and has a single block of 1s (e.g., 00111102 = 3010) To derive the product, we could add four appropriately shifted versions of the multiplicand (i.e., for four 1s) We can reduce the number of operations by regarding the multiplier as the difference between two numbers, i.e., 3210-210 or 01000002-00000102 This suggests that the product can be generated by adding 25 times the multiplicand to the 2's complement of 21 times the multiplicand The sequence of required operations can be recoded as 0+1000-10 Booth Algorithm 16 -1 times the shifted multiplicand is selected when changing multiplier from 0 to 1 +1 times the shifted multiplicand is selected when changing multiplier from 1 to 0 The multiplier is scanned form right to left Normal and Booth Multiplication Schemes Normal 0 Booth 17 0 0 1 0 1 0 0 0 1 0 1 1 0 0 +1 +1 + 1+1 1 0 0 0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 1 0 +1 0 0 1 0 1 0 0 -1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 2's complement of the multiplicand Booth Recoding of a Multiplier 0 0 1 0 0 +1 -1 +1 18 1 1 0 - 1 0 0 1 1 1 0 1 0 0 +1 0 0 - 1 +1 - 1 + 1 1 1 0 0 0 - 1 0 0 When the least significant bit is 1 , assume an implied 0 lies to its right Booth Multiplication with a Negative Multiplier 0 1 1 0 1 1 1 0 1 0 ( + 13) (- 6) 0 1 1 0 1 0 - 1 +1 - 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 0 19 ( - 78) Handles both positive and negative multipliers uniformly Correctness of Booth Technique for Negative Multipliers Let the leftmost zero of a negative number, X, be at bit position k X = 11…10xk-1….x0 The value of X is given by V(X)= -2k+1 + xk-12k-1 +….+x020 -2k+1 = X= 20 Example V(X) 11000 (-8) 11001 (-7) = -23 = -23 + 1 For example, 1101102(-1010) is recoded as 0-1+10-10 -24+23-2 = -1010 Booth Multiplier Recoding Scheme Multiplier 21 Version of multiplicand selected by bit i Bit i Bit i -1 0 0 0M 0 1 +1M 1 0 1M 1 1 0M Booth Recoded Multipliers 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 Worst-case multiplier +1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1 1 1 0 0 0 1 0 1 1 0 1 1 1 1 0 0 Ordinary multiplier 0 -1 0 0 +1 - 1 +1 0 - 1 +1 0 0 0 -1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 +1 0 0 0 0 -1 0 0 0 +1 0 0 -1 Good multiplier 22 Achieves some efficiency in the number of additions required when the multiplier has a few large blocks of 1s Fast Multiplication Bit-pair recoding of multipliers Halves the maximum number of summands Derived from the booth algorithm (+1 -1) is equivalent to (0 +1) Because (+1 -1) is (+102 + -12) = +2M + -M = +M = (0 +1) Instead of adding +1×M at position i+1 to -1 times the multiplicand M at a shift position i The same result can be obtained by adding +1×M at position i (+1 0) is equivalent to (0 +2) (-1 +1) is equivalent to (0 -1) The booth-recoded multiplier is examined two bits at a time, starting from the right 23 Multiplier Bit-Pair Recoding Sign extension Implied 0 to right of LSB 1 1 0 Multiplier bit-pair 1 0 1 0 0 1 +1 1 0 0 1 2 0 Multiplier bit on the right (a) Example of bit-pair recoding derived from Booth recoding Multiplicand selected at position i i +1 i i 1 0 0 0 0 M 0 0 1 + 1 M 0 1 0 + 1 M 0 1 1 + 2 M 1 0 0 2 M 1 0 1 1 M 1 1 0 1 M 1 1 1 0 M (b) Table of multiplicand selection decisions 24 Example 0 1 1 0 1 ( + 13) 1 1 0 1 0 (- 6 ) 0 1 1 0 1 0 - 1 +1 - 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 0 ( - 78) Multiplication Requiring only n/2 Summands 0 1 1 0 1 0 -1 -2 1 1 1 1 1 0 0 1 1 0 1 1 1 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 0 0 1 0 Ripple-Carry Array Disadvantage 26 Multiplication requires many additions Using Ripple-Carry Array is slow Consider the addition of three n-bit numbers W, X, Y to produce the sum Z We can first add W to X to generate a number A Then we can add A to Y to produce Z This can be done by using two ripple carry adders A Different Approach 27 Instead of adding W to X to produce A in the upper ripple carry adder, let’s introduce the bits of Y into the inputs This generates the vectors S and the saved carries C as the outputs In the second row, S and C are added in a a ripple carry adder to produce Z Carry save addition can speedup this process Carry Save Array For the multiplication operation M Q = P for 4-bit operands M: m3m2m1m0 Q: q3q2q1q0 P: p7p6p5p4p3p2p1p0 0 m3q0 m2 q1 m3q1 m3 q2 m2 q3 m3q3 FA p7 28 p6 FA m2 q0 m2 q2 FA m1q3 FA FA FA FA FA p5 p4 m1 q1 m1 q2 p3 m0 q2 FA m0 q3 FA m1 q0 m0 q1 FA m0 q0 0 0 FA 0 p2 Q: Do you see any saving here? p1 p0 Example Ripple-Carry vs. Carry-Save 29 Carry-Save Addition Approach 1 0 1 1 0 1 (45) M 1 1 1 1 1 1 (63) Q 1 0 1 1 0 1 A 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 1 1 0 0 0 X 1 30 B C D E F 1 0 0 1 1 (2,835) Product Complete Example 0 1 0 1 1 0 1 M x 1 1 1 1 1 1 Q 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 0 1 1 1 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 1 0 0 F S 1 2 C 2 0 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 1 0 1 1 1 0 1 0 0 1 + 0 1 0 1 0 1 0 0 0 0 0 0 0 1 S1 1 C 1 S2 0 0 1 E 0 1 1 C D 0 1 S 1 0 1 0 B C 1 1 A S 1 3 C3 C2 0 0 1 S4 1 C 4 1 Product Schematic Representation of C.S.A. F E D C B A Level 1 CSA C2 S2 C1 S1 Level 2 CSA C2 C3 S3 Level 3 CSA C4 S4 Final addition + Product 1.7log2k – 1.7 steps, where k is the number of summands 32 Example Ripple-Carry vs. Carry-Save 33 Carry-save addition transforms W, X and Y into S and C Advantages: all bits of S and C are produced in a short fixed amount of time after W, X, and Y are applied Each row approximately takes one full-adder delay Carry propagation takes place only in the last row Carry lookahead adder could be used effectively to add the S and C vectors because all bits of S and C are available in parallel Consider the addition of many summands We can group the summands in threes and perform the carry save addition on each of these groups in parallel to generate S and C Next, group all the S and C vectors into threes and perform carry save addition on them Continue this process until there are only two vectors remaining These remaining vectors can be added in a ripple carry or a carry lookahead adder to produce the sum