Integer Multipliers

Multipliers
• A must-have circuit in most DSP applications
• A variety of multipliers exists; the choice depends on the required performance
• Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...

[Figure: 16x16 multiplier with converters, input registers RA and RB, output register RC, and enable (en) and reset inputs]

Multiplication Algorithm
X = X_{n-1} X_{n-2} ... X_0 (multiplicand)
Y = Y_{n-1} Y_{n-2} ... Y_0 (multiplier)
Row i (i = 0 .. n-1), shifted left by i positions: Y_{n-1}X_i  Y_{n-2}X_i  Y_{n-3}X_i  ...  Y_1X_i  Y_0X_i
Product: P_{2n-1} P_{2n-2} P_{2n-3} ... P_2 P_1 P_0

1. Multiplication Algorithms
Implementing the multiplication of binary numbers boils down to how the additions are done. Consider two 8-bit numbers A and B that generate the 16-bit product P. First generate the 64 partial products, then add them up.
Operands: A7 A6 A5 A4 A3 A2 A1 A0 and B7 B6 B5 B4 B3 B2 B1 B0
Row j (j = 0 .. 7), shifted left by j positions: A7.Bj  A6.Bj  A5.Bj  A4.Bj  A3.Bj  A2.Bj  A1.Bj  A0.Bj
Product: P15 P14 P13 P12 P11 P10 P9 P8 P7 P6 P5 P4 P3 P2 P1 P0
The equation is:
\[ P(m+n) = A(m)\,B(n) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} a_i\, b_j\, 2^{i+j} \]

Multiplier Design
[Figure: Storage, REG IN, MU (16x16 Multiplier Unit), REG OUT, Control Unit]

4-bit example, animation slides 1 to 6
X: x3x2x1x0, Y: y3y2y1y0
Si: the ith bit of the final result; Ci: the only carry from column i
Input sequence for G1 (the rightmost symbol enters first):
  x: 0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0
  y: 0 0y3y3y3y3 0y2y2y2y2 0y1y1y1y1 0y0y0y0y0
  Reset: 0 10000 10000 10000 10000
Blocks shown: gates G1 and G2, a 1-bit adder with a 1-bit register (REG), and a serial register; clocks CLK and CLK/(N+1).
Slide | Reset | G1 output | Adder operands | Result
1 | 0 | x0y0 | 0 + x0y0 | S0
2 | 0 | x1y0 | 0 + x1y0 | x1y0 (S0 is in the serial register)
3 | 0 | x2y0 | 0 + x2y0 | x2y0
4 | 0 | x3y0 | 0 + x3y0 | x3y0
5 | 1 | 0 | 0 + 0 | 0 (S0 leaves the serial register)
6 | 0 | x0y1 | x1y0 + x0y1 | S1, carry C1
4-bit example, animation slides 7 to 21
X: x3x2x1x0, Y: y3y2y1y0
Si: the ith bit of the final result; Ci: the only carry from column i
Sij: the jth partial sum for column i; Cij: the jth partial carry from column i
Input sequence for G1 (the rightmost symbol enters first):
  x: 0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0 0x3x2x1x0
  y: 0 0y3y3y3y3 0y2y2y2y2 0y1y1y1y1 0y0y0y0y0
  Reset: 0 10000 10000 10000 10000
Slide | Reset | G1 output | Adder operands | Result
7 | 0 | x1y1 | x2y0 + x1y1 + C1 | S20, carry C20
8 | 0 | x2y1 | x3y0 + x2y1 + C20 | S30, carry C30
9 | 0 | x3y1 | 0 + x3y1 + C30 | S40, carry C40
10 | 1 | 0 | 0 + 0 + C40 | S50, C50 = 0 (S1 leaves the serial register)
11 | 0 | x0y2 | S20 + x0y2 | S2, carry C21
12 | 0 | x1y2 | S30 + x1y2 + C21 | S31, carry C31
13 | 0 | x2y2 | S40 + x2y2 + C31 | S41, carry C41
14 | 0 | x3y2 | S50 + x3y2 + C41 | S51, carry C51
15 | 1 | 0 | 0 + 0 + C51 | S60, C60 = 0 (S2 leaves the serial register)
16 | 0 | x0y3 | S31 + x0y3 | S3, carry C32
17 | 0 | x1y3 | S41 + x1y3 + C32 | S4, carry C42
18 | 0 | x2y3 | S51 + x2y3 + C42 | S5, carry C52
19 | 0 | x3y3 | S60 + x3y3 + C52 | S6, carry C61
20 | 1 | 0 | 0 + 0 + C61 | S7 (S3 leaves the serial register)
21 | 0 | 0 | 0 + 0 | the serial register holds S7 S6 S5 S4 S3 S2 S1 S0

Second animation, slides 1 to 4: the multiplier bits y0 to y3 are applied in parallel, and x is shifted in one bit per clock through the delay elements (D).
Si: the ith bit of the final result; Ci: the only carry from column i
Sij: the jth partial sum for column i; Cij: the jth partial carry from column i
Slide | x input | Products summed in the column | Output bit
1 | x0 | x0y0 | S0
2 | x1 | x1y0, x0y1 | S1 (carry C1)
3 | x2 | x2y0, x1y1, x0y2 | S2 (partial sum S20; carries C20, C21)
4 | x3 | x3y0, x2y1, x1y2, x0y3 | S3 (partial sums S30, S31; carries C30, C31, C32)
Second animation, slides 5 to 8
Slide | x input | Products summed in the column | Output bit
5 | 0 | x3y1, x2y2, x1y3 | S4 (partial sums S40, S41; carries C40, C41, C42)
6 | 0 | x3y2, x2y3 | S5 (partial sum S51; carries C50, C51)
7 | 0 | x3y3 | S6 (carry C6)
8 | 0 | none | S7; the result is S7 S6 S5 S4 S3 S2 S1 S0

Shift Add Multiplier Design Implementation
[Figure: inputs Ain (7 downto 0) and Bin (7 downto 0), REGA, MUX (REGA or 0), 8-bit adder, REGC holding Result (15 downto 8), REGB holding Result (7 downto 0), CLOCK]

Synchronous Shift and Add Multiplier Controller
The multiplication process has 5 states: Idle, Init, Test, Add, and Shift&Count.
Idle: the process starts when the Start signal is received.
Init: the multiplicand and the multiplier are loaded into a load register and a shift register, respectively.
Test: the LSB of the shift register that holds the multiplier is tested to decide the next state.
Add: if the LSB is '1', the new partial product is added to the accumulated result, and the state machine then moves to the Shift&Count state.
Shift&Count: if the LSB is '0', this state is entered directly from Test. The two shift registers shift their contents one bit to the right and the counter counts up by one. The state machine then returns to the Test state. When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.
Idle: in the Idle state, a Done signal is asserted to indicate the end of the multiplication.
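The controller described above can be sketched in software. This is a minimal illustration under stated assumptions, not the slide authors' design: the function name `shift_add_multiply`, the register names `reg_a`, `reg_b`, and `acc`, and the returned state trace are all hypothetical, and the operands are treated as unsigned.

```python
def shift_add_multiply(multiplicand, multiplier, n=8):
    """Unsigned n-bit shift-and-add multiply that walks the five controller states.

    Returns (product, trace), where trace lists the states visited.
    Register names are illustrative only.
    """
    mask = (1 << n) - 1
    state, trace = "Idle", []
    reg_a = reg_b = acc = count = 0
    done = False
    while not done:
        trace.append(state)
        if state == "Idle":
            # Start signal received
            state = "Init"
        elif state == "Init":
            # Load the multiplicand (load register) and multiplier (shift register)
            reg_a, reg_b = multiplicand & mask, multiplier & mask
            acc, count = 0, 0
            state = "Test"
        elif state == "Test":
            # The multiplier LSB selects the next state
            state = "Add" if reg_b & 1 else "Shift&Count"
        elif state == "Add":
            # acc may temporarily need n+1 bits (the adder carry-out)
            acc += reg_a
            state = "Shift&Count"
        else:  # Shift&Count
            # Shift the pair {acc, reg_b} right by one bit and count
            reg_b = (reg_b >> 1) | ((acc & 1) << (n - 1))
            acc >>= 1
            count += 1
            if count == n:
                done = True  # Stop: back to Idle, Done asserted
            else:
                state = "Test"
    return (acc << n) | reg_b, trace


product, trace = shift_add_multiply(0b1011, 0b1101, n=4)
print(product, trace.count("Add"))
```

The upper half of the product accumulates in `acc` and the lower half shifts into `reg_b`, so no separate 2n-bit adder is needed; only the n-bit adder plus one carry bit is exercised.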
n-bit Multiplier
Q0 = 1: the multiplicand is added to register A and the result is stored in register A; then registers C, A, Q are shifted one bit to the right.
Q0 = 0: registers C, A, Q are shifted one bit to the right.
[Figure: Multiplicand register, n-bit adder, Shift and Add Control Logic (Add, Shift Right), carry bit C, register A (A_{n-1} ... A_1 A_0), register Q holding the multiplier (Q_{n-1} ... Q_1 Q_0)]

Example: 4-bit Multiplier (Multiplicand = 1011, Multiplier = 1101)
Step | Add | Shift Right | C | A | Q
Initial values | - | - | 0 | 0000 | 1101
First cycle, add | 1 | 0 | 0 | 1011 | 1101
First cycle, shift | 0 | 1 | 0 | 0101 | 1110
Second cycle, shift | 0 | 1 | 0 | 0010 | 1111
Third cycle, add | 1 | 0 | 0 | 1101 | 1111
Third cycle, shift | 0 | 1 | 0 | 0110 | 1111
Fourth cycle, add | 1 | 0 | 1 | 0001 | 1111
Fourth cycle, shift | 0 | 1 | 0 | 1000 | 1111
The product is the contents of A and Q after the last shift: 1000 1111.

4*4 Synchronous Shift and Add Multiplier Design: Layout Design
[Figure: floor plan of the 4*4 synchronous shift and add multiplier]

Comparison between Synchronous and Asynchronous Approaches
[Figure]
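The register contents of the 4-bit example above can be reproduced with a few lines of code. This is a sketch under the assumption that C is simply the carry-out of the n-bit adder; the function name `caq_trace` and the row labels are hypothetical.

```python
def caq_trace(multiplicand, multiplier, n=4):
    """Return the (label, C, A, Q) register contents after each add or shift step."""
    mask = (1 << n) - 1
    c, a, q = 0, 0, multiplier & mask
    rows = [("initial", c, a, q)]
    for cycle in range(1, n + 1):
        if q & 1:
            # Q0 = 1: add the multiplicand to A; C catches the carry-out
            total = a + (multiplicand & mask)
            c, a = total >> n, total & mask
            rows.append((f"cycle {cycle} add", c, a, q))
        # Shift C, A, Q one bit to the right as a single chain
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
        rows.append((f"cycle {cycle} shift", c, a, q))
    return rows


for label, c, a, q in caq_trace(0b1011, 0b1101):
    print(f"{label:14s} C={c} A={a:04b} Q={q:04b}")
```

The fourth add is the only step where C becomes 1, which is why the carry bit has to take part in the right shift.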
Example (simulated by Ovais Ahmed, Fall_03, project)
Multiplicand = 10001001₂ = 89₁₆
Multiplier = 10101011₂ = AB₁₆
Expected Result = 101101110000011₂ = 5B83₁₆

Array Multiplier
• Regular structure based on the add and shift algorithm.
• Addition is mainly done by the carry save algorithm.
• Sign bit extension results in a higher capacitive load and slows down the circuit.

Addition with CLA
A = a3a2a1a0, B = b3b2b1b0
[Figure: the rows a3a2a1a0·b0 to a3a2a1a0·b3 are added by three cascaded four-bit adders (Cin = 0, Cout feeding the next stage); output: Product (A*B)]

Array Multiplier with CSA
Pij = Ai·Bj, 0 ≤ i ≤ 3, 0 ≤ j ≤ 3; a total of 16 gates.
[Figure: 4x4 carry-save array of full adders (F.A) with inputs A3..A0, B0..B3 and partial products P00..P33; outputs R7 R6 R5 R4 R3 R2 R1 R0]

Critical Path with Array Multipliers
[Figure: two of the possible critical paths through the FA/HA cells of the ripple-carry based 4*4 multiplier]
Area = (N*N) AND gates + (N-1)N full adders
Delay = τ_HA + (2N-1) τ_FA

Wallace Tree
[Figure: Wallace tree reducing the partial products x_i y_j (i, j = 0 .. 4) to the product bits P9 ... P0]

Array Multiplier + Wallace Tree
[Figure]

Background: Baugh-Wooley Algorithm
\[ X \cdot Y = \Big(-x_{k-1} 2^{k-1} + \sum_{i=0}^{k-2} x_i 2^{i}\Big)\Big(-y_{k-1} 2^{k-1} + \sum_{i=0}^{k-2} y_i 2^{i}\Big) \]
\[ = x_{k-1} y_{k-1} 2^{2k-2} + \sum_{i=0}^{k-2} \sum_{j=0}^{k-2} x_i y_j 2^{i+j} - x_{k-1} \sum_{i=0}^{k-2} y_i 2^{k-1+i} - y_{k-1} \sum_{i=0}^{k-2} x_i 2^{k-1+i} \]
• Convert negative partial products to positive representation
• No sign-extension required
4/13/2015, Concordia VLSI Lab
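The Baugh-Wooley expansion (one positive sign-bit term, a positive double sum over the lower bits, and two negative cross terms) can be checked numerically. The sketch below only verifies that four-term decomposition for k = 5; it does not model the later step in which the negative terms are replaced by complemented bits and constants. The function name `baugh_wooley_product` is hypothetical.

```python
def baugh_wooley_product(x, y, k=5):
    """Evaluate the four-term expansion of X*Y for k-bit two's complement operands."""
    # Python's >> on negative ints yields two's complement bits
    xb = [(x >> i) & 1 for i in range(k)]
    yb = [(y >> i) & 1 for i in range(k)]
    sign_term = (xb[k - 1] * yb[k - 1]) << (2 * k - 2)
    lower_terms = sum((xb[i] * yb[j]) << (i + j)
                      for i in range(k - 1) for j in range(k - 1))
    x_sign_cross = xb[k - 1] * sum(yb[i] << (k - 1 + i) for i in range(k - 1))
    y_sign_cross = yb[k - 1] * sum(xb[i] << (k - 1 + i) for i in range(k - 1))
    return sign_term + lower_terms - x_sign_cross - y_sign_cross


# Exhaustive check over all 5-bit two's complement operand pairs
mismatches = [(x, y) for x in range(-16, 16) for y in range(-16, 16)
              if baugh_wooley_product(x, y) != x * y]
print(len(mismatches))
```

Only the two cross terms carry a negative weight, which is why the hardware array needs complemented inputs in just the rows and columns that involve a sign bit.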
[Figure: the schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier, built from full adders (FA) fed by the terms a_i b_j, the complemented terms a4b0', a4b1', a4b2', a4b3', a3'b4, a2'b4, a1'b4, a0'b4, and the extra inputs a4', b4', a4, b4 and 1; outputs P9 ... P0]

Squarer: partial products of a7 a6 a5 a4 a3 a2 a1 a0 * a7 a6 a5 a4 a3 a2 a1 a0
Row j (j = 0 .. 7), shifted left by j positions: a7*aj  a6*aj  a5*aj  a4*aj  a3*aj  a2*aj  a1*aj  a0*aj
Because a_i*a_j appears twice (once in row i and once in row j) and a_i*a_i = a_i, the matrix can be folded. First row after folding:
a7*a6  a7*a5  a7*a4  a7*a3  a7*a2  a7*a1  a7*a0  a6*a0  a5*a0  a4*a0  a3*a0  a2*a0  a1*a0  '0'  a0

Example of an 8-bit squarer, N*N, N = 8 bits
[Figure: folded partial-product array with the terms a_i a_j (i > j), the single bits a7 ... a0 and '0' fill-ins; outputs S15 ... S0]

Array Multiplier
[Figure: 32-bit by 32-bit multiplier]

Booth (Radix-4) Multiplier
Radix-4 (3-bit recoding) halves the number of partial products to be added, which gives a large saving in area and an increase in speed.
\[ A = -a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + a_{n-3} 2^{n-3} + \dots + a_1 2 + a_0 \]
\[ B = -b_{n-1} 2^{n-1} + b_{n-2} 2^{n-2} + b_{n-3} 2^{n-3} + \dots + b_1 2 + b_0 \]
The base-4 redundant signed-digit representation of B is
\[ B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i \]
K_i is calculated by the following equation:
\[ K_i = -2 b_{2i+1} + b_{2i} + b_{2i-1}, \qquad i = 0, 1, 2, \dots, (n-2)/2 \]
Three bits of the multiplier B (b_{2i+1}, b_{2i}, b_{2i-1}) are examined and the corresponding K_i is calculated. B is always appended on the right with a zero (b_{-1} = 0), and n is always even (B is sign extended if needed). The product AB is then obtained by adding n/2 partial products:
\[ AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A \]

Booth Algorithm
Decoding of the multiplier to generate signals for hardware use:
Xi+1 | Xi | Xi-1 | OP | NEG | ZERO | TWO
0 | 0 | 0 | 0 | 0 | 1 | 0
1 | 0 | 0 | 2 | 1 | 0 | 1
0 | 1 | 0 | 1 | 0 | 0 | 0
1 | 1 | 0 | 1 | 1 | 0 | 0
0 | 0 | 1 | 1 | 0 | 0 | 0
1 | 0 | 1 | 1 | 1 | 0 | 0
0 | 1 | 1 | 2 | 0 | 0 | 1
1 | 1 | 1 | 0 | 1 | 1 | 0

Booth Algorithm
A Booth recoded multiplier examines three bits of the multiplier at a time. It determines whether to add zero, 1, -1, 2, or -2 times the multiplicand at that rank. The operation to be performed is based on the current two bits of the multiplier and the previous bit.
Xi+1 | Xi | Xi-1 | Zi/2
0 | 0 | 0 | 0
0 | 0 | 1 | 1
0 | 1 | 0 | 1
0 | 1 | 1 | 2
1 | 0 | 0 | -2
1 | 0 | 1 | -1
1 | 1 | 0 | -1
1 | 1 | 1 | 0

Bit weights 2^1, 2^0, 2^-1 (labelled Xi, Xi+1, Xi+2 on the slide):
2^1 | 2^0 | 2^-1 | Operation | M is multiplied by
0 | 0 | 0 | add zero (no string) | +0
0 | 0 | 1 | add multiplicand (end of string) | +X
0 | 1 | 0 | add multiplicand (a string) | +X
0 | 1 | 1 | add twice the multiplicand (end of string) | +2X
1 | 0 | 0 | subtract twice the multiplicand (beginning of string) | -2X
1 | 0 | 1 | subtract the multiplicand (-2X and +X) | -X
1 | 1 | 0 | subtract the multiplicand (beginning of string) | -X
1 | 1 | 1 | subtract zero (center of string) | -0

Booth Algorithm: a higher radix multiplication
[Figure: dot diagram with multiplicand A, multiplier B, the partial product rows (B1B0)₂·A·4^0 and (B3B2)₂·A·4^1, and the product P]

Example
The following example shows how the calculation is done.
Multiplicand X = 000011
Multiplier Y = 011101; with a 0 appended on the right: 0 1 1 1 0 1 0
After Booth decoding, Y is recoded as the digits +2, -1, +1. X is multiplied by each digit separately, each successive partial product is shifted left by two more bits, and the partial products are added together.
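The recoding rule K_i = -2·b_{2i+1} + b_{2i} + b_{2i-1} with b_{-1} = 0 can be sketched as follows. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical, the multiplier is assumed to be an n-bit two's complement value with n even, and the code models the arithmetic only, not the encoder hardware.

```python
def booth_radix4_digits(b, n):
    """Radix-4 Booth digits K_0 .. K_{n/2-1} of an n-bit two's complement value b."""
    assert n % 2 == 0, "n is assumed even (sign extend the multiplier if needed)"
    # bits[0] is the appended b_{-1} = 0, bits[k + 1] is b_k
    bits = [0] + [(b >> i) & 1 for i in range(n)]
    return [-2 * bits[2 * i + 2] + bits[2 * i + 1] + bits[2 * i]
            for i in range(n // 2)]


def booth_multiply(a, b, n):
    """Sum of n/2 partial products K_i * A, each weighted by 4**i."""
    return sum(k * a * 4 ** i for i, k in enumerate(booth_radix4_digits(b, n)))


digits = booth_radix4_digits(0b011101, 6)   # Y = +29, least significant digit first
print(digits, booth_multiply(3, 0b011101, 6))
```

The digits are listed least significant first, so the +2, -1, +1 of the example appears as `[1, -1, 2]`. Every digit is in the set {-2, -1, 0, 1, 2}, so each partial product needs only a shift and an optional negation of the multiplicand.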
X * +1:  000000000011
X * -1:  1111111101    (shifted left 2 bits)
X * +2:  00000110      (shifted left 4 bits)
------------------------
Sum:     000001010111

Sign Extension
[Figure]

Sign extension: traditional sign-extension scheme
• Segment the input operands based on the size of the embedded blocks
• Multiply the segmented inputs and extend the sign bit of each partial product
• Sum all partial products
[Figure: segmented input operands, sign-extended partial products, final result]

Booth Algorithm: Example 1
000011 (+3) × 011101 0 (+29); recoded digits +2, -1, +1
+1:  000000000011
-1:  1111111101
+2:  00000110
Sum: 000001010111 (+87)

Booth Algorithm: Example 2 (notice the sign extensions)
111101 (-3) × 011101 0 (+29); recoded digits +2, -1, +1
+1:  111111111101
-1:  0000000011    (2's complement of the multiplicand)
+2:  11111010
Sum: 111110101001 (-87)

Booth Algorithm: Example 3 (notice the sign extensions)
111101 (-3) × 100011 0 (-29); recoded digits -2, +1, -1
-1:  000000000011
+1:  1111111101
-2:  00000110      (shifted 2's complement)
Sum: 000001010111 (+87)

Comparison of Booth and parallel multiplier, shift and add
[Figure]

Template to reduce sign extensions for Booth Algorithm (for hardware implementation)
Please note that each operand is 17 bits wide, i.e. the 17th bit is the sign bit. Negative numbers are entered in 1's complement, which is why the S has to be added on the right-hand side of the diagram.
If 2's complement is used, the S's on the right side of the diagram can be removed.

Comparison of Template and the sign extension
[Figure: the sign template (a leading 1 and a single sign bit per row) next to conventional sign extension, where the sign bits S1, S2, S3, S4 are repeated across the upper bit positions of each row]

[Figure: partial product matrix generated for a 16 * 16 bit multiplication, using Booth and the template given in the previous slide; bit positions 32 down to 0, eight partial product rows with S and 1 entries on the left]

Example of using the template
25 * -35, with -35 as the multiplier, using 8-bit representation.
25 = 00011001; -35 with a 0 appended on the right = 110111010
Partial products with the template applied (figure annotations: "sign bit", "add SS", "add inverted S", "add inverted sign and add 1", "add inverted sign bit", "no sign bit"):
* 1:   10000011001
* -1:  1011100111
* 2:   100110010
* -1:  1100111
Sum:   11110010010101
This is a negative number. Converting it gives 00001101101011 = 512 + 256 + 64 + 32 + 8 + 2 + 1 = 875.

Booth Multiplier Components
[Figure: the multiplier feeds the Booth Encoder; the encoder outputs and the multiplicand feed the PPU (partial products unit); the PPA (partial products adding unit) produces the product]

Wallace Tree and Ripple Carry Adder Structure of 8*8 Multiplier with Pipeline
[Figure: partial products PP0, PP1, PP2 (15 downto 0) and PP3 (15 downto 0) reduced by rows of adders, a pipeline register, and a final ripple carry adder; critical path marked; outputs P16 ... P0]

Hardware implementation of Booth with shift and add
[Figure: block diagram with an FSM, Counter20, shift registers (reg2right17, reg_2left32), 2's complement and *2 shifter blocks, sign expansion, a 4-to-1 32-bit multiplexer, a 37-bit adder, Mux37 and Register37 holding Result, with control signals Start, Init, Shift, Doubleshift, Mulbegin, Mulend, Stop and Finish]

Simulation Plan
[Figure: two 32-bit signal generators drive A[31:0] and B[31:0] into both a behavioral multiplier (A*B, result P[63:0]) and the multiplier under test ("My Multiplier", result My_P[63:0]); a 64-bit comparator counts the failed comparisons. Multipliers under test: Array Multiplier, Modified Booth Multiplier, Wallace Tree Multiplier, Modified Booth-Wallace Tree Multiplier, Twin Pipe Serial-Parallel Multiplier]

Testing the Design
[Figure]

Simulation for Parallel Multipliers
[Figure: waveforms for signed numbers and for unsigned numbers]

Simulation for Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the D flip-flop delays.

[Figure: FPGA after implementation, with the programmed areas shown clearly]

[Figure: another implementation of the above after pipelining; place and route has placed the design in different locations]
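The simulation plan (random operands fed to a behavioral reference and to the design under test, with a comparator counting mismatches) can be mirrored in a small software harness. This is a sketch under assumptions: `array_multiply_unsigned` is a hypothetical stand-in for the design under test that sums AND-gate partial products, only unsigned operands are covered, and the vector count and seed are arbitrary.

```python
import random


def array_multiply_unsigned(a, b, n=32):
    """Stand-in design under test: sum of the n*n AND partial products a_i*b_j*2^(i+j)."""
    return sum((((a >> i) & 1) * ((b >> j) & 1)) << (i + j)
               for i in range(n) for j in range(n))


def count_failures(dut, n=32, vectors=200, seed=0):
    """Compare dut(a, b) against the behavioral product a*b on random n-bit operands."""
    rng = random.Random(seed)          # the two "signal generators"
    failed = 0
    for _ in range(vectors):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        if dut(a, b) != a * b:         # the "comparator"
            failed += 1
    return failed


print(count_failures(array_multiply_unsigned))
```

A fixed seed keeps a failing vector reproducible; corner cases such as all-zeros and all-ones operands would normally be added to the random set.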
Spartacus FPGA board
[Figure]

Testing the multiplication system
[Figure]

Comparison of Multipliers
Columns: Area = total CLBs (#); D = maximum delay (ns); P = total dynamic power (W); DP = delay·power product (ns W); AP = area·power product (# W); AD = area·delay product (# ns); AD2 = area·delay² product (# ns²)
Multiplier | Area | D | P | DP | AP | AD | AD2
Array | 3076.50 | 35.78 | 7.52 | 268.98 | 23128.20 | 1.10E+05 | 3.94E+06
Modified Booth | 2649.50 | 24.43 | 6.33 | 154.64 | 16771.60 | 6.47E+04 | 1.58E+06
Wallace Tree | 3325.50 | 18.93 | 7.46 | 141.14 | 24793.93 | 6.30E+04 | 1.19E+06
Modified Booth-Wallace Tree | 2672.50 | 18.53 | 6.41 | 118.76 | 17127.79 | 4.95E+04 | 9.18E+05
Twin Pipe Serial-Parallel | 490.00 | 107.52 (3.36x32) | 0.28 | 30.62 | 139.54 | 5.27E+04 | 5.66E+06
Behavioral | 2993.50 | 49.33 | 6.24 | 307.58 | 18665.07 | 1.48E+05 | 7.28E+06
Table 7. Performance comparison for two's complement multipliers. By Chen Yaoquan, M.Eng. 2005

Comparison of Multipliers
Multiplier | Area | D | P | DP | AP | AD | AD2
Array | 3280.50 | 37.23 | 7.57 | 281.88 | 24837.98 | 1.22E+05 | 4.55E+06
Modified Booth | 2800.00 | 25.33 | 6.66 | 168.77 | 18656.40 | 7.09E+04 | 1.80E+06
Wallace Tree | 3321.50 | 18.93 | 7.32 | 138.60 | 24319.36 | 6.29E+04 | 1.19E+06
Modified Booth-Wallace Tree | 2845.50 | 18.33 | 6.66 | 122.13 | 18959.57 | 5.22E+04 | 9.56E+05
Twin Pipe Serial-Parallel | 487.00 | 107.52 | 0.29 | 30.66 | 138.89 | 5.24E+04 | 5.63E+06
Behavioral | 3003.00 | 44.50 | 6.26 | 278.53 | 18795.78 | 1.34E+05 | 5.95E+06
Table 7. Performance comparison for unsigned multipliers. By Chen Yaoquan, M.Eng. 2005

Comparison of Multipliers
Effect of changing the value of "set_max_delay" in the script file, for the behavioral multiplier:
set_max_delay (ns) | 0 | 10 | 20 | 30 | 40 | 50 | 60 | >60
Area (#) | 3014.5 | 3013.0 | 3110.0 | 3193.5 | 3019.5 | 2999.5 | 2978.5 | 2978.5
Power (W) | 6.6499 | 6.6470 | 7.5683 | 8.1878 | 8.0645 | 8.0419 | 8.0156 | 8.0156
Delay (ns), eight values in extraction order (one 31.98 entry was displaced, so the column alignment is uncertain): 31.98, 30.93, 30.08, 39.93, 49.88, 59.63, 59.63, 31.98
[Figure: the relation of area and delay for the behavioral multiplier, the "banana curve"; x axis Delay (ns), 0 to 80; y axis Area (#), 2950 to 3250]

Comparison of Multipliers
Multiplier | Area | Critical Delay | Power Consumption | Complexity | Implementation
Array | Medium | Medium | Large | Simple | Easy
Modified Booth | Small | Fast | Medium | Complex | Medium
Wallace Tree | Large | Very Fast | Large | More Complex | Difficult
Modified Booth-Wallace Tree | Small | Fastest | Medium | More Complex | Difficult
Twin Pipe Serial-Parallel | Smallest | Very Large | Smallest | Simple | Easy
Behavioral | Medium | Large | Medium | Simplest | Easiest
By Chen Yaoquan, M.Eng. 2005

Pipelining Simulation
[Figure]

Synthesis for Signed Multipliers
[Figure: synthesis results for Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]

Synthesis for Unsigned Multipliers
[Figure: synthesis results for Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]

Conclusion
• Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
• Wallace Tree has the best performance, but it is hard to implement.
• Booth-algorithm-based multipliers have the lowest area among the parallel multipliers.
• For behavioral multipliers, the area increases as the delay decreases.
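The composite figures of merit used in the comparison tables (DP, AP, AD, AD²) are plain products of area, delay, and power. A minimal sketch, assuming the hypothetical helper name `figures_of_merit` and using the rounded area, delay, and power values of the array multiplier from the two's complement table:

```python
def figures_of_merit(area_clb, delay_ns, power_w):
    """Composite metrics as named in the tables: DP, AP, AD and AD^2."""
    return {
        "DP": delay_ns * power_w,          # ns W
        "AP": area_clb * power_w,          # CLB W
        "AD": area_clb * delay_ns,         # CLB ns
        "AD2": area_clb * delay_ns ** 2,   # CLB ns^2
    }


# Array multiplier, two's complement table: 3076.50 CLBs, 35.78 ns, 7.52 W
merits = figures_of_merit(3076.50, 35.78, 7.52)
for name, value in merits.items():
    print(f"{name}: {value:.3e}")
```

With these inputs AD and AD² come out close to the tabulated 1.10E+05 and 3.94E+06. The DP and AP values differ from the table in the last digits, which would be expected if the table was computed from unrounded measurements; that explanation is an assumption, not something the slides state.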
Comparison

Metric | Array Multiplier | Modified Booth Multiplier | Wallace Tree Multiplier | Modified Booth & Wallace Tree Multiplier | Twin Pipe Serial-Parallel Multiplier
Area, total CLBs (#) | 1165 | 1292 | 1659 | 1239 | 133
Maximum delay (ns) | 187.87 | 139.41 | 101.14 | 101.43 | 22.58 (722.56 = 32 × 22.58)
Power consumption at highest speed (mW) | 16.6506 (at 188 ns) | 23.136 (at 140 ns) | 30.95 (at 101.14 ns) | 30.862 (at 101.43 ns) | 2.089 (at 722.56 ns)
Delay-power product, DP (ns mW) | 3128.15 | 3225.39 | 3130.28 | 3130.33 | 1509.42
Area-power product, AP (# mW) | 19.397 × 10^3 | 29.891 × 10^3 | 51.346 × 10^3 | 38.238 × 10^3 | 277.837
Area-delay product, AD (# ns) | 218.868 × 10^3 | 180.118 × 10^3 | 167.791 × 10^3 | 125.671 × 10^3 | 96.101 × 10^3
Area-delay^2 product, AD2 (# ns^2) | 41.119 × 10^6 | 25.110 × 10^6 | 16.970 × 10^6 | 12.747 × 10^6 | 69.438 × 10^6

NOTICE
The rest of these slides are for extra information only and are not part of the lecture.

Array Addition
[Figure: array addition.]

Addition of 8 binary numbers using the Wallace tree principle
[Figure: Wallace tree addition of 8 binary numbers.]

[Figure: block diagram. Inputs A, B, CLK, RESET, START; blocks MULT320, INVERTER, AND_2, COUNTER20, Adder37 and REGSTER37; signals BEGIN0, END0, FINISH0, CLR; outputs Done, RESULT and LAST_RESULT; bus widths 32 and 37 are marked.]

Baugh-Wooley two's complement multiplier
[Figure: The schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier. An array of full adders (FA) sums the partial products listed below and produces P9 to P0.]

Baugh-Wooley two's complement multiplier: partial-product array
Operands: A = a4 a3 a2 a1 a0, B = b4 b3 b2 b1 b0. A prime marks a complemented bit.
row b0: a4b0' a3b0 a2b0 a1b0 a0b0
row b1: a4b1' a3b1 a2b1 a1b1 a0b1
row b2: a4b2' a3b2 a2b2 a1b2 a0b2
row b3: a4b3' a3b3 a2b3 a1b3 a0b3
row b4: a4b4 a3'b4 a2'b4 a1'b4 a0'b4
correction terms: a4', b4', a4, b4 and a constant 1
product: p9 p8 p7 p6 p5 p4 p3 p2 p1 p0
Each row is shifted one position to the left of the row above it. The terms a4 and b4 enter at weight 2^4, a4' and b4' at weight 2^8, and the constant 1 at weight 2^9.
[Figure: worked 5-bit examples in binary, including 13 × (−5) = −65, 13 × 5 = 65 and (−13) × (−5) = 65.]

Cluster Multipliers
Divide the multiplier into smaller multipliers.

Cluster Multipliers
[Figure: 8-bit cluster low power multiplier. The multiplicand and multiplier pass through 8-bit latches with /CLR and enable inputs EN0 to EN3, feed four 4-bit multipliers, and the four results go through 8-bit latches into a final addition stage that produces the 16-bit product P. A separate circuit generates the enable signals.]

Cluster Multipliers
• Dividing the multiplication circuit into clusters (blocks) of smaller multipliers
• Applying clock gating techniques to disable the blocks that are producing a zero result
• Features
– Low power (claims 13.4% savings)

Multiplexer-Based Array Multipliers
$$P = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^{j}, \qquad Z_j = x_j Y_j + y_j X_j, \qquad X_j = x_{j-1} x_{j-2} \ldots x_0$$
(Y_j is defined in the same way from the bits of Y.)
[Figure: triangular array of the bits Z_ij (Z10; Z20, Z21; Z30 to Z32; Z40 to Z43) together with the diagonal terms x_j y_j.]

Multiplexer-Based Array Multipliers
Two types of cells:
Cell 1: produces the terms Z_ij 2^j and includes a full adder of the carry-save adder array
Cell 2: produces the terms x_j y_j 2^{2j} and includes a full adder of the carry-save adder array

Multiplexer-Based Array Multipliers
• Characteristics
– Faster than Modified Booth
– Unlike Booth, does not require encoding logic
– Requires approximately N^2/2 cells
– Has a zigzag shape, thus not layout-friendly

Multiplexer-Based Array Multipliers
• Improvement
– More rectangular layout
– Saves up to 40 percent area without penalties
– Outperforms the modified Booth multiplier in both speed and power by 13% to 26%

Gray-Encoded Array Multiplier
2's complement hybrid coding:

Dec | Hyb | Dec | Hyb | Dec | Hyb | Dec | Hyb
0 | 0000 | 4 | 0100 | -8 | 1100 | -4 | 1000
1 | 0001 | 5 | 0101 | -7 | 1101 | -3 | 1001
2 | 0011 | 6 | 0111 | -6 | 1111 | -2 | 1011
3 | 0010 | 7 | 0110 | -5 | 1110 | -1 | 1010

– Consecutive values within each group of four differ in a single bit
– This reduces the number of transitions, and thus power (for highly correlated streams)

Gray-Encoded Array Multiplier
[Figure: An 8-bit wide 2's complement radix-4 array multiplier.]

Gray-Encoded Array Multiplier
• Characteristics
– Uses Gray code to reduce the switching activity of the multiplier
– Uses 45.6% less power than Modified Booth
– Uses 26.4% more area than Modified Booth

Ultra-high Speed Parallel Multiplier
• How is ultra-high speed achieved?
– Based on the Modified Booth algorithm and a tree structure (column compression)
– Chooses efficient counters (3:2 and 5:3)
– Uses a new compressor (20% faster)
– Uses the First Partial product Addition (FPA) algorithm (reducing the bits of the CLA by 50%)

Ultra-high Speed Parallel Multiplier
[Figure: Calculation process using parallel counters in the 16x16 case.]
– Divide into groups of 3 rows or 5 rows only (most efficient).
– Calculate the partial products as soon as possible.
– The final CLA is only 16-bit instead of 32-bit.
– Total delay is reduced by about 30%.

ULLRLF Multiplier
• ULLRLF stands for Upper/Lower Left-to-Right Leapfrog.
• It combines the following techniques:
– Signal flow optimization in the [3:2] adder array for partial product reduction,
– Left-to-right leapfrog (LRLF) signal flow,
– Splitting of the reduction array into upper/lower parts.

ULLRLF Multiplier
1) Signal flow optimization in the [3:2] adder array
[Figure: PPij is always connected to pin A; Sin/Cin are connected to B/C, and most Sin signals are connected to C.]
-- For n = 32, the delay is reduced by 30 percent.
-- Power is also saved.

ULLRLF Multiplier
2) Left-to-Right Leapfrog (LRLF) structure
[Figure: The sum signals skip over alternate rows.]
-- The signal delays are better balanced.
-- Low power.
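The [3:2] adder array named on the ULLRLF slides rests on one arithmetic identity: three rows can be replaced by a sum row and a carry row without propagating any carry. The sketch below is a minimal behavioural model of that step, assuming unsigned operands and one carry-save row per multiplier bit. The function names `csa` and `array_multiply` are illustrative and do not come from the slides, and the model says nothing about pin assignment, leapfrog wiring or the upper/lower split.

```python
def csa(x, y, z):
    """One [3:2] carry-save step on whole rows held as integers.

    Returns (sum_row, carry_row) with x + y + z == sum_row + carry_row.
    No carry ripples along the row: each bit position is handled independently.
    """
    sum_row = x ^ y ^ z                              # per-bit sum of a full adder
    carry_row = ((x & y) | (x & z) | (y & z)) << 1   # per-bit majority, moved up one weight
    return sum_row, carry_row


def array_multiply(a, b, n=32):
    """Unsigned n-bit multiply (assumption: unsigned, one CSA row per multiplier bit)."""
    sum_row, carry_row = 0, 0
    for j in range(n):
        # Partial product row j: the multiplicand shifted by j if bit j of b is set.
        pp = (a << j) if (b >> j) & 1 else 0
        sum_row, carry_row = csa(sum_row, carry_row, pp)
    # A single carry-propagating addition at the end plays the role of the final adder.
    return sum_row + carry_row


if __name__ == "__main__":
    print(csa(5, 3, 6))
    print(array_multiply(13, 5, n=8))
```

The only carry-propagating addition is the last line of `array_multiply`; everything before it keeps the running total in redundant sum/carry form, which is why the row-to-row wiring matters so much for delay.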
ULLRLF Multiplier
3) Upper/Lower split structure
[Figure: upper and lower reduction arrays; the final stage is only n+2 bits wide.]
-- The long data path is broken into parallel short paths, which saves power.
-- The delay of partial product reduction is reduced.

ULLRLF Multiplier
• ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area.
• With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.
[Figure: Floorplan of ULLRLF (n = 32).]

Signed Array Multiplier
[Figure: 32*32-Bit Array Multiplier for Signed Number. Rows B0 to B31 of AND terms on A31 to A0; each row is one stage of carry save adder; stages 4 to 30 each include 32 AND gates, 31 full adders, 1 half adder and 1 NOT gate; a 32-bit carry look ahead adder produces the upper product bits, giving P63 to P0.]

Unsigned Array Multiplier
[Figure: 32*32-Bit Array Multiplier for Unsigned Number. Same organisation as the signed version; stages 4 to 30 each include 32 AND gates, 31 full adders and 1 half adder; a 32-bit carry look ahead adder produces the upper product bits, giving P63 to P0.]

Signed Modified Booth Multiplier
[Figure: 32*32-bit Booth Multiplier for Signed Number. Dot diagram of 16 rows of partial products over bit positions 63 to 0; the multiplier is scanned from LSB to MSB in overlapping groups B i-1, B i, B i+1.]
E = the inversion of the sign bit in each row
S = the B i+1 bit in the three encoded bits

Signed Modified Booth Multiplier
[Figure: 32*32-Bit Modified Booth Multiplier for Signed Number. Booth encoders on B[1:0]0, B[3:1], B[5:3] and so on up to B[31:...] generate X1[n], X2[n] and INVERT n for each row of partial product selectors (SEL) on A31 to A0; stages 3 to 15 each include 33 PP selectors, 31 full adders, 1 half adder and 1 NOT gate; a 64-bit carry look ahead adder produces P63 to P0.]

Unsigned Modified Booth Multiplier
[Figure: 32*32-bit Booth Multiplier for Unsigned Number. Dot diagram of 17 rows of partial products over bit positions 63 to 0; the multiplier is scanned from LSB to MSB in overlapping groups B i-1, B i, B i+1.]
S = the B i+1 bit in the three encoded bits
S' = the inversion of S

Unsigned Modified Booth Multiplier
[Figure: 32*32-Bit Modified Booth Multiplier for Unsigned Number. Booth encoders on B[1:0]0, B[3:1], B[5:3], in general B[i+1, i, i-1], and finally 00B[31] generate X1[i], X2[i] and S[i] for i = 0 to 16; each row uses SEL and SEL_END partial product selectors on A31 to A0; stages 3 to 15 each include 33 PP selectors, 32 full adders, 1 half adder and 1 NOT gate; a 64-bit carry look ahead adder produces P63 to P0.]

Wallace Tree multipliers
[Figure: A[31:0] and B[31:0] form 32 partial products, which are added in a Wallace tree adder; its outputs C[63:0] and S[63:0] feed a 64-bit carry look-ahead adder that produces P[63:0].]

Wallace Tree multipliers
[Figure: dot diagram of the 32 partial product rows, with the 2:2 counter (two inputs, Sum and Carry outputs) and the 3:2 counter (three inputs, Sum and Carry outputs).]
• Uses 3:2 counters and 2:2 counters
• Number of levels: 8 (the estimate log(32/2) / log(3/2) gives about 6.8; because each level can only compress whole groups of three rows, reducing 32 rows to 2 takes 8 levels)
• Irregular structure
• Fast

Wallace Tree multipliers
[Figure: 64-Bit Carry Look Ahead Adder, 2-level hierarchical. A carry propagate/generate unit takes A63 to A0, B63 to B0 and Cin and produces P63 to P0 and G63 to G0; eight 8-bit BCLA blocks produce the carries C63 to C0 and the group signals PM0 to PM7 and GM0 to GM7; a further 8-bit BCLA combines the group signals into C8, C16, ..., C56; a 64-bit summation unit produces C64 and S63 to S0.]

Modified Booth-Wallace Tree Multipliers
[Figure: Modified Booth-Wallace tree multiplier.]

Modified Booth-Wallace Tree Multipliers
• Uses 3:2 counters and 2:2 counters
• Number of levels: 6 (the estimate log(16/2) / log(3/2) gives about 5.1; reducing 16 rows to 2 takes 6 levels)
• Irregular structure
• Fast
• Less area
[Figure: PP Dot Matrix of Booth-Wallace Multiplier for Signed Number, reduction levels 1 to 6.]

Twin pipe serial-parallel multipliers
[Figure: Block diagram of 32*32-bit signed twin pipe serial-parallel multiplier with serial/parallel conversion logic. The even multiplier bits B30, B28, ..., B2, B0 and the odd bits B31, B29, ..., B3, B1 are loaded into two parallel-in serial-out shift registers; A31 to A0 enter the 32-bit twin pipe serial-parallel multiplier unit in parallel; two serial-in parallel-out shift registers collect the even product bits P62, P60, ..., P2, P0 and the odd product bits P63, P61, ..., P3, P1. Control signals: Result_ready, Load/Shift, Reset, Clock, Sign.]

Signed twin pipe serial-parallel multipliers
[Figure: 32*32-bit twin pipe serial-parallel multiplier for signed number. Two pipes of full adders (FA), half adders (HA) and D registers, one fed with the even data bits (..., B2, B0, 0) and one with the odd data bits (..., B3, B1, 0), with registers marked rising_edge and falling_edge; 28 further units repeat the pattern; a product MUX merges the even product and the odd product. Inputs A31 to A0, Sign, Reset, Clock.]
• Includes the “Sign” control line and the sign-change hardware

Unsigned twin pipe serial-parallel multipliers
[Figure: 32*32-bit twin pipe serial-parallel multiplier for unsigned number. Same two-pipe organisation as the signed version, with inputs A31 to A0, Reset, Clock.]
• Does not need the “Sign” control line or the sign-change hardware
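The level counts on the Wallace tree and Booth-Wallace slides can be checked with a few lines of arithmetic. The sketch below assumes a row-based reduction in which every complete group of three rows is compressed to two at each level and leftover rows pass through unchanged. The function names are illustrative, not taken from the slides, and a column-by-column tree built from 3:2 and 2:2 counters may be organised differently.

```python
import math


def wallace_levels(rows):
    """Count reduction levels until only two rows remain.

    Assumption: at each level every complete group of three rows becomes
    two rows (a sum row and a carry row); leftover rows are passed through.
    """
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def levels_estimate(rows):
    """Closed-form estimate log(rows/2) / log(3/2), ignoring rounding to whole groups."""
    return math.log(rows / 2) / math.log(3 / 2)


if __name__ == "__main__":
    for rows in (32, 17, 16):
        print(rows, wallace_levels(rows), round(levels_estimate(rows), 2))
```

With 32 rows (plain Wallace tree) the loop counts 8 levels against an estimate of about 6.8, and with 16 rows (modified Booth, signed) it counts 6 against about 5.1. The gap comes from rows that cannot be grouped in threes at a given level.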