EE 5324 – VLSI Design II
Part III: Multipliers and Shifters
Kia Bazargan
University of Minnesota
Spring 2006 EE 5324 - VLSI Design II - © Kia Bazargan

References and Copyright
• Textbooks referenced
  - [WE92] N. H. E. Weste, K. Eshraghian, “Principles of CMOS VLSI Design: A System Perspective”, Addison-Wesley, 2nd Ed., 1992.
  - [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective”, Prentice Hall, 1996.
  - [Par00] B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs”, Oxford University Press, 2000.

References and Copyright (cont.)
• Slides used (modified by Kia when necessary)
  - [©Hauck] © Scott A. Hauck, 1996-2000; G. Borriello, C. Ebeling, S. Burns, 1995, University of Washington
  - [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]. http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html
  - [©Oxford U Press] © Oxford University Press, New York, 2000. Slides for [Par00], with permission from the author. http://www.ece.ucsb.edu/Faculty/Parhami/files_n_docs.htm

Why Multipliers?
• Used in a lot of DSP applications
  - Vector product, matrix multiplication
  - Convolution
  - Filtering (tap filters, FIR, …)
  - ...
“At least one good reason for studying multiplication and division is that there is an infinite number of ways of performing these operations and hence there is an infinite number of PhDs (or expense-paid visits to conferences in USA) to be won from inventing new forms of multiplier”
Alan Clements, The Principles of Computer Hardware, 1986 [Par00]

Outline
• Serial Multiplier
• Multiplier arrays
• Carry save adder (CSA) and multiple operand addition
• Booth encoding
• Pipelined multipliers
• Wallace tree
• Signed multiplication
• Shifters

Multiplication Example
• Example: 12 × 5

      Multiplicand:          1 1 0 0     (12)
      Multiplier:          × 0 1 0 1     (5)
                             1 1 0 0
                           0 0 0 0
                         1 1 0 0
                       0 0 0 0           (4 partial products)
                       0 1 1 1 1 0 0     (60)

• The partial products can be generated using an array of AND gates

Sequential Multiplier
• Shift register
  - Originally holds multiplicand
  - Shifts it left for each partial product
• One bit of multiplier at a time presented to the AND gates
[Figure: 2N-bit shift register initialized with the mcand and shifted left each cycle; one bit of the mplier applied each cycle; adder; result register.] [©Hauck]

Sequential Multiplier – Resource Requirements
• Adder: 2N-bit
• Registers: 2N-bit wide
• Better design: shift the result register to the right
  - Uses N AND gates
  - Uses N-bit adder
[Figure: the two datapaths side by side, each with register, adder and shift register.] [©Hauck]

Combinational Multiplier: Idea
• Use an array of AND gates to generate the partial products in parallel
[Figure: AND-gate array fed by the multiplicand and multiplier bits, LSBs marked.] [©Hauck]

Combinational Multiplier: Adding PProds
[Figure: 4×4 array multiplier with inputs X3..X0 and Y3..Y0, rows of HA/FA cells, and outputs Z7..Z0.] [WE92] p. 547, [Rab96] p. 409

Combinational Multiplier: Critical Path(s)
• Many critical paths with the same delay (AND gates not shown)
[Figure: M×N multiplier array of HA/FA cells with Critical Path 1 and Critical Path 2 marked.]
$\text{Delay} = (M+N-2)\,t_{carry} + (N-1)\,t_{sum} + t_{AND}$ [Rab96] p. 410

Combinational Multiplier: Layout
• Better floorplan for compact layout:
  - Send partial products diagonally
  - Results in better area
[Figure: rectangular array of HA/FA cells; AND gates and hence the first row not shown.] [WE92] p. 548, [Rab96] p. 412

Carry-Save Adder: the Idea
• When adding k n-bit numbers, there is no need to optimize the carry chain of each row
[Figure: the old-style ripple-adder array of HA/FA cells.]

Carry-Save Adder: Structure
• Postpone the “carry propagation” operation to the last stage
$\text{Delay} = N\,t_{carry} + t_{and} + t_{merge}$
[Figure: CSA rows of HA/FA cells followed by the vector merging stage.] [Rab96] p. 411

Carry-Save Adder: Details
[Figure: carry-save array of full-adder (F) and half-adder (H) cells.]

CSA: Intermediate FA Cells
• Better to have the same sum and carry delays (both contribute to the critical path)
[Figure: FA cell with inputs A, B, Ci, a setup stage producing the propagate signal P, and outputs S and Co.] [Rab96] p. 410

Booth Multiplier: an Introduction
• Recode each 1 in the multiplier as “+2-1”: a 1 in position i becomes +1 in position i+1 and -1 in position i
  - Converts sequences of 1's to 10…0(-1)
  - Might reduce the number of 1's

      Multiplier:    0   0   1   1   1   1   1   1   0   0   0   0
      Recoded:       0  +1   0   0   0   0   0  -1   0   0   0   0

Booth Multiplier: Recoding (Encoding) Example

      Multiplier:    0   1   1   0   1   1   1   0   0   0   1   0
      (each 1 is replaced by the pair (+1 -1); adjacent +1 and -1 in the same position cancel)
      Recoded:      +1   0  -1  +1   0   0  -1   0   0  +1  -1   0

• If you use the last row in multiplication, you should get exactly the same result as using the first row (after all, they represent the same number!)
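The claim that the recoded row represents the same number can be checked with a minimal Python sketch. It assumes the multiplier is unsigned and supplied MSB first with a leading 0; the function names `booth_recode` and `digits_value` are hypothetical and chosen only for illustration.

```python
def booth_recode(bits):
    """Radix-2 Booth recoding of an MSB-first list of 0/1 bits.

    Assumes an unsigned multiplier with a leading 0. Each digit is
    d_i = b_(i-1) - b_i, which is what replacing every 1 by the pair
    (+1 -1) and cancelling overlapping terms produces.
    """
    digits = []
    prev = 0  # implicit b_(-1) = 0 to the right of the LSB
    for b in reversed(bits):  # walk from LSB to MSB
        digits.append(prev - b)
        prev = b
    return digits[::-1]  # back to MSB-first order


def digits_value(digits):
    """Value of an MSB-first radix-2 digit string; digits may be -1, 0, +1."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


multiplier = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
recoded = booth_recode(multiplier)
print(recoded)  # [0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0]
print(digits_value(multiplier), digits_value(recoded))  # 1008 1008
```

The run of six 1's collapses to one +1 and one -1, so only two nonzero partial products remain; a multiplier with alternating bits would gain nonzero digits instead, which is why the benefit is only potential.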
Booth Recoding: Multiplication Example
• 6 × 14 = 84; the negative digit needs sign extension

      Multiplicand:      0 0 1 1 0     (6)
      Multiplier:        0 1 1 1 0     (14)
      Recoded:          +1 0 0 -1 0

      Partial products (sign-extended to 8 bits):
        digit -1, weight 2^1:   1 1 1 1 0 1 0 0     (-6) × 2 = -12
        digit +1, weight 2^4:   0 1 1 0 0 0 0 0     (+6) × 16 = +96
        Sum:                    0 1 0 1 0 1 0 0     84

Booth Recoding: Advantages and Disadvantages
• Depends on the architecture
  - Potential advantage: might reduce the # of 1's in the multiplier
• In the multipliers that we have seen so far:
  - Doesn't save in speed (still have to wait for the critical path, e.g., the shift-add delay in the sequential multiplier)
  - Increases area: recoding circuitry AND subtraction

Modified Booth Multiplier: Idea
• Group the recoded digits in pairs, leaving -2, -1, 0, 1, 2
  - Grouping reduces the # of partial products by half
• Booth recoding gets rid of 3's (sequences of 1's in general)

      Multiplier:    0   1 |  1   0 |  1   1 |  1   0 |  0   0 |  1   0
      Recoded:      +1   0 | -1  +1 |  0   0 | -1   0 |  0  +1 | -1   0
      Radix-4:        +2   |   -1   |    0   |   -2   |   +1   |   -2

[©Hauck]

Modified Booth Multiplier: Idea (cont.)
• Can encode the digits by looking at three bits at a time
• Booth recoding table:

      i+1   i   i-1    add
       0    0    0     0*M
       0    0    1     1*M
       0    1    0     1*M
       0    1    1     2*M
       1    0    0    -2*M
       1    0    1    -1*M
       1    1    0    -1*M
       1    1    1     0*M

• Must be able to add the multiplicand times -2, -1, 0, 1 and 2
• Since Booth recoding got rid of 3's, generating partial products is not that hard (shifting and negating)
[©Hauck]

Modified Booth Multiplier: Idea (cont.)
• Interpretation of the Booth recoding table:

      i+1   i   i-1    add     Explanation
       0    0    0     0*M     No string of 1's in sight
       0    0    1     1*M     End of a string of 1's
       0    1    0     1*M     Isolated 1
       0    1    1     2*M     End of a string of 1's
       1    0    0    -2*M     Beginning of a string of 1's
       1    0    1    -1*M     End one string, begin new one
       1    1    0    -1*M     Beginning of a string of 1's
       1    1    1     0*M     Continuation of string of 1's

[Par00] p. 160

(Modified) Booth Multiplier: Example
• Retire two bits per shift operation
• Addition: signed
  - Sign extend 2 bits if adding two partial products at a time
• 13 × (-6), using the recoding table above on bit triples (i+1, i, i-1):

      Multiplicand:      0 0 1 1 0 1     (13)
      Multiplier:        1 1 1 0 1 0     (-6)
      Recoded (radix 4):   0   -1   -2

      Partial products (sign-extended to 12 bits):
        digit -2, weight 4^0:   1 1 1 1 1 1 1 0 0 1 1 0     (-26)
        digit -1, weight 4^1:   1 1 1 1 1 1 0 0 1 1 0 0     (-52)
        digit  0, weight 4^2:   0 0 0 0 0 0 0 0 0 0 0 0
        Sum:                    1 1 1 1 1 0 1 1 0 0 1 0     (-78)

Modified Booth Recoding: Summary
• Grouping multiplier bits into pairs
  - Orthogonal idea to the Booth recoding
  - Reduces the number of partial products to half
  - If Booth recoding is not used, must be able to multiply by 3 (hard: shift+add)
• Applying the grouping idea to Booth: Modified Booth Recoding (Encoding)
  - We already got rid of sequences of 1's, so no multiplication by 3
  - Just negate, shift once or twice

Modified Booth Multiplier: Summary (cont.)
• Uses high radix to reduce the number of intermediate addition operands
  - Can go higher: radix-8, radix-16
  - Radix-8 should implement *3, *-3, *4, *-4
  - Recoding and partial product generation become more complex
• Can automatically take care of signed multiplication (we will see why)

Pipelined Multipliers
• Insert registers (latches) between rows
• Insert registers for bits of the multiplier
  - Schedule MSB bits to arrive later
[Figure: array of HA/FA rows with registers between rows.]

Pipelined Multiplier: Example
[Figure: pipelined array multiplier with inputs a4..a0 and x0..x4, sum/carry path, and product bits p9..p0; each cell is an FA with AND gate and latches (for ai, intermediate sum and carry).] [Par00] p. 186 [©Oxford U Press]

Wallace Tree: Idea
• Idea: divide & conquer
• Why add the k numbers one by one?
  - Tree structure gives logarithmic depth
[Figure: dot-notation diagram of the operands being reduced level by level.] [Par00] p. 131

Wallace Tree Example
• Delay = 4 CSA + 1 CLA
[Figure: Wallace tree of CSAs feeding a final CLA.] [Par00] p. 130 [©Oxford U Press]

Wallace Tree: Structure for 7 k-bit Numbers
[Figure: seven operands with bit range [0,k-1] reduced by k-bit CSAs (intermediate ranges [1,k], [2,k+1]) and a final k-bit CPA; output bits [k+2], [2,k+1], [1], [0].] [Par00] p. 131

Wallace Tree: Timing
• At each step, the # of operands reduces to 2/3:
$n \;\to\; \tfrac{2}{3}\,n \;\to\; \left(\tfrac{2}{3}\right)^{2} n \;\to\; \dots \;\to\; \left(\tfrac{2}{3}\right)^{h} n = 2$
[Figure: n k-bit numbers entering h levels of CSAs.]

Wallace Tree: Timing (cont.)
• Delay depends on height h
• $h = O(\log n)$: logarithmic delay
• Max # N of k-bit numbers that can be added using a Wallace tree of height h:

       h     N        h     N        h      N
       0     2        7     28      14     474
       1     3        8     42      15     711
       2     4        9     63      16    1066
       3     6       10     94      17    1599
       4     9       11    141      18    2398
       5    13       12    211      19    3597
       6    19       13    316      20    5395

[Par00] p. 132 [©Oxford U Press]

Multiplying Signed Numbers
• Coding of the numbers
  - Signed-magnitude: trivial
  - 2's complement?
• 2's complement
  - Mplier positive, Mcand +/-:
    o Sign extend the partial products when adding up
    o Example:

            0 1 0 1     (+5)                    1 0 1 1     (-5)
          × 0 0 1 1     (+3)                  × 0 0 1 1     (+3)
      0 0 0 0 1 0 1                       1 1 1 1 0 1 1
      0 0 0 1 0 1 0                       1 1 1 0 1 1 0
      0 0 0 0 0 0 0                       0 0 0 0 0 0 0
      0 0 0 0 0 0 0                       0 0 0 0 0 0 0
      0 0 0 1 1 1 1     (+15)             1 1 1 0 0 0 1     (-15)

Multiplying Signed Numbers (cont.)
• 2's complement (cont.)
  - Mplier negative, Mcand +/-:
    o Ad-hoc solution: convert the negative Mplier to positive, do the multiplication, negate the result
    o Example: (-5) × (-3) = 1011 × 1101
      1. Negate the mplier: 1101 (-3) becomes 0011 (+3)
      2. Multiply: (-5) × (+3) = 1110001 (-15)
      3. Negate the result: 0001111 (+15)

Multiplying Signed Numbers: Efficient Method
• Using almost the same architecture, we can do signed mult w/o negating the result
• Idea: “What if we had negated the mplier?”
  - Example: M = 0101 = +5, mplier 1101 = -3
• Write the negative mplier as [1x] (sign bit 1 followed by the low k-1 bits x) and its negation as [0y]. Consider x and y as positive magnitudes (forget about the 2's complement convention for now)
• We want to use x · M in the computation
  - Previously, we negated [1x] to get [0y], then computed y · M and negated it
  - Negate: 1101 = -3 becomes 0011 = +3

Multiplying Signed Numbers: Efficient Method
• The negation process (bit positions k-1, k-2, ..., 1, 0), reading the bit strings as unsigned numbers:
$[0y] = 2^{k} - [1x] = 2^{k} - (2^{k-1} + x) = 2^{k} - 2^{k-1} - x = (2^{k} - 2^{k-1}) - x = 2^{k-1} - x$
so $y = 2^{k-1} - x$.

Multiplying Signed Numbers: Efficient Method
• Machine's understanding: [1x] = -[0y], e.g. 1101 = -(0011)
• Our interpretation: $y = 2^{k-1} - x$, e.g. $3 = 2^{3} - 5$
• We used to compute -([0y] · M):
$-y \cdot M = -(2^{k-1} - x) \cdot M = -2^{k-1} \cdot M + x \cdot M$
  - $-2^{k-1} \cdot M$: subtract the mcand for the last bit
  - $x \cdot M$: normal mult for the first k-1 bits

Multiplying Signed Numbers: Example
• Normal mult for the first k-1 bits
• Use a subtractor for the last pproduct

            0 1 0 1     (+5)
          × 1 1 0 1     (-3)
      0 0 0 0 1 0 1
      0 0 0 0 0 0 0
      0 0 1 0 1 0 0
      1 0 1 1 0 0 0     last bit: add (-5) shifted left by 3, i.e. subtract the mcand
      1 1 1 0 0 0 1     (-15)

Booth Recoding: Signed Numbers
• For unsigned numbers, increase the bit-width of mplier & mcand (add 0 to the left)
[Figure: Booth recoding of a mplier whose MSB is 1; the extra 0 on the left produces a leading +1 digit in the recoded string.]
• If dealing with signed numbers, discard the extra bit
  - Why does it work? With the k-bit string [1x] read as an unsigned number on the right-hand side:
$M \cdot ([1x] - 2^{k}) = -M \cdot (2^{k} - [1x]) = -M \cdot [0y]$
  ([0y] is the positive 2's complement of [1x])

Booth Recoding: Signed Mult Example
• (-10) × (-11) = +110, using Booth recoding of the mplier; each -1 digit adds (+10), the negated mcand, at the corresponding shift
• Note: the column which has ‘1111’ generates a carry of ‘10’ if calculating by hand
[Figure: partial-product layout for the example.]

Multiplier: Summary
• Goals different than addition
  - In some structures, sum and carry delay equal
  - Analysis more difficult: multiple critical paths
• Different levels of optimization
  - Data encoding (Booth)
  - Architecture-level: Wallace tree
  - Gate-level: pipelining
  - Transistor-level: equal sum, carry delays
• More to cover:
  - Constant multiplication
  - Floating point, precision

Shift and Rotate Operations
• Used in:
  - Microprocessors
  - Encryption algorithms
• If fixed shift, simply wire the inputs to the correct output positions
• Variable shift
  - One-bit shifter
  - Barrel shifter
  - Logarithmic shifter

One-bit Shifter
[Figure: bit-slice i with inputs Ai, Ai-1, outputs Bi, Bi-1, and control lines Right, NOP, Left.] [©Prentice Hall]

Simple n-bit Shifter
• Quadratic number of transistors
• One switch per path
[Figure: switch matrix from in1..in4 to out1..out4.] [©Hauck]

Barrel Shifter
• Area dominated by wiring
[Figure: 4-bit barrel shifter with data wires A3..A0 to B3..B0 and control wires Sh0..Sh3; labels: “Data Wire”, “Control Wire”, “Bit 3 wrapped around”.] [©Prentice Hall]

Barrel Shifter: Layout Example
[Figure: layout with inputs A3..A0, control lines Sh0..Sh3, and output buffers.] [©Prentice Hall]

Logarithmic Shifter
• Simplified structure but more stages (greater delay)
[Figure: inputs i1..i4, stages controlled by S1/S1' and S2/S2', outputs o1..o4.] [©Hauck]

Logarithmic Shifter: Layout
[Figure: layout of the logarithmic shifter.] [©Prentice Hall]

Shift: Summary
• Trade-off between area and delay
  - Barrel shifter: fastest, O(1) delay, $n^2$ transistors
  - Logarithmic shifter: O(log n) delay, n log n transistors
  - One-bit shifter: O(n) delay, n transistors
• Barrel shifter: wire-dominated circuit
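The staged structure of the logarithmic shifter can be illustrated with a short behavioural sketch in Python. It is a software model only, under several assumptions that the slides do not fix: a logical right shift, an 8-bit default width, and the hypothetical function name `log_shift_right`. Stage s either passes its input through or shifts it by 2^s, controlled by bit s of the shift amount.

```python
def log_shift_right(value, shamt, width=8):
    """Behavioural model of a logarithmic shifter (logical right shift).

    The data passes through ceil(log2(width)) stages. Stage s shifts by
    2**s when bit s of shamt is 1 and passes the data unchanged otherwise,
    so an n-bit shifter needs about log2(n) stages of n selectors each.
    """
    mask = (1 << width) - 1
    value &= mask
    stages = (width - 1).bit_length()  # 3 stages for width = 8
    for s in range(stages):
        if (shamt >> s) & 1:           # control signal for this stage
            value = (value >> (1 << s)) & mask
    return value


# Shift by 5 = 0b101: stage 0 shifts by 1, stage 1 passes, stage 2 shifts by 4.
print(bin(log_shift_right(0b10110000, 5)))  # 0b101
```

A barrel shifter would instead select the final shift amount in a single level of switches, which is where its O(1) delay and quadratic switch count come from.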