Introduction to CMOS VLSI Design Lecture 12: Datapath Functional Units David Harris Harvey Mudd College Spring 2004 Outline Comparators Shifters Multi-input Adders Multipliers 12: Datapath Functional Units CMOS VLSI Design Slide 2 Comparators 0’s detector: A = 00…000 1’s detector: A = 11…111 Equality comparator: A = B Magnitude comparator: A < B 12: Datapath Functional Units CMOS VLSI Design Slide 3 1’s & 0’s Detectors 1’s detector: N-input AND gate 0’s detector: NOTs + 1’s detector (N-input NOR) A7 A6 A5 A4 A3 A2 A3 A2 allones allzeros A1 A0 A1 A0 A7 A6 A5 A4 A3 A2 allones A1 A0 12: Datapath Functional Units CMOS VLSI Design Slide 4 Equality Comparator Check if each bit is equal (XNOR, aka equality gate) 1’s detect on bitwise equality B[3] A[3] B[2] A[2] A=B B[1] A[1] B[0] A[0] 12: Datapath Functional Units CMOS VLSI Design Slide 5 Magnitude Comparator Compute B-A and look at sign B-A = B + ~A + 1 For unsigned numbers, carry out is sign bit A B C B3 A B N A3 B2 A2 B1 Z A=B A1 B0 A0 12: Datapath Functional Units CMOS VLSI Design Slide 6 Signed vs. Unsigned For signed numbers, comparison is harder – – – – C: carry out Z: zero (all bits of A-B are 0) N: negative (MSB of result) V: overflow (inputs had different signs, output sign B) 12: Datapath Functional Units CMOS VLSI Design Slide 7 Shifters Logical Shift: – Shifts number left or right and fills with 0’s • 1011 LSR 1 = ____ 1011 LSL1 = ____ Arithmetic Shift: – Shifts number left or right. Rt shift sign extends • 1011 ASR1 = ____ 1011 ASL1 = ____ Rotate: – Shifts number left or right and fills with lost bits • 1011 ROR1 = ____ 1011 ROL1 = ____ 12: Datapath Functional Units CMOS VLSI Design Slide 8 Shifters Logical Shift: – Shifts number left or right and fills with 0’s • 1011 LSR 1 = 0101 1011 LSL1 = 0110 Arithmetic Shift: – Shifts number left or right. Rt shift sign extends • 1011 ASR1 = 1101 1011 ASL1 = 0110 Rotate: – Shifts number left or right and fills with lost bits • 1011 ROR1 = 1101 1011 ROL1 = 0111 12: Datapath Functional Units CMOS VLSI Design Slide 9 Funnel Shifter A funnel shifter can do all six types of shifts Selects N-bit field Y from 2N-bit input – Shift by k bits (0 k < N) 2N-1 N-1 B 0 C offset + N-1 offset Y 12: Datapath Functional Units CMOS VLSI Design Slide 10 Funnel Shifter Operation Computing N-k requires an adder 12: Datapath Functional Units CMOS VLSI Design Slide 11 Funnel Shifter Operation Computing N-k requires an adder 12: Datapath Functional Units CMOS VLSI Design Slide 12 Funnel Shifter Operation Computing N-k requires an adder 12: Datapath Functional Units CMOS VLSI Design Slide 13 Funnel Shifter Operation Computing N-k requires an adder 12: Datapath Functional Units CMOS VLSI Design Slide 14 Funnel Shifter Operation Computing N-k requires an adder 12: Datapath Functional Units CMOS VLSI Design Slide 15 Simplified Funnel Shifter Optimize down to 2N-1 bit input 12: Datapath Functional Units CMOS VLSI Design Slide 16 Simplified Funnel Shifter Optimize down to 2N-1 bit input 12: Datapath Functional Units CMOS VLSI Design Slide 17 Simplified Funnel Shifter Optimize down to 2N-1 bit input 12: Datapath Functional Units CMOS VLSI Design Slide 18 Simplified Funnel Shifter Optimize down to 2N-1 bit input 12: Datapath Functional Units CMOS VLSI Design Slide 19 Simplified Funnel Shifter Optimize down to 2N-1 bit input 12: Datapath Functional Units CMOS VLSI Design Slide 20 Funnel Shifter Design 1 N N-input multiplexers – Use 1-of-N hot select signals for shift amount – nMOS pass transistor design (Vt drops!) k[1:0] left Inverters & Decoder s3 s2 s1 s0 Y3 Y2 Z6 Y1 Z5 Y0 Z4 Z3 Z2 Z1 12: Datapath Functional Units Z0 CMOS VLSI Design Slide 21 Funnel Shifter Design 2 Log N stages of 2-input muxes – No select decoding needed k1 k0 left Z0 Z1 Z2 Z3 Y0 Y1 Y2 Y3 Z4 Z5 Z6 12: Datapath Functional Units CMOS VLSI Design Slide 22 Multi-input Adders Suppose we want to add k N-bit words – Ex: 0001 + 0111 + 1101 + 0010 = _____ 12: Datapath Functional Units CMOS VLSI Design Slide 23 Multi-input Adders Suppose we want to add k N-bit words – Ex: 0001 + 0111 + 1101 + 0010 = 10111 12: Datapath Functional Units CMOS VLSI Design Slide 24 Multi-input Adders Suppose we want to add k N-bit words – Ex: 0001 + 0111 + 1101 + 0010 = 10111 Straightforward solution: k-1 N-input CPAs – Large and slow 0001 0111 1101 0010 + + 10101 + 10111 12: Datapath Functional Units CMOS VLSI Design Slide 25 Carry Save Addition A full adder sums 3 inputs and produces 2 outputs – Carry output has twice weight of sum output N full adders in parallel are called carry save adder – Produce N sums and N carry outs X4 C4 S4 Y4 Z4 X3 Y3 Z3 X2 C3 S3 C2 S2 Y2 Z2 X1 Y 1 Z1 C1 S1 XN...1 YN...1 ZN...1 n-bit CSA CN...1 12: Datapath Functional Units SN...1 CMOS VLSI Design Slide 26 CSA Application Use k-2 stages of CSAs – Keep result in carry-save redundant form Final CPA computes actual result 0001 0001 0111 1101 0010 0111 +1101 1011 4-bit CSA 0101_ 0101_ 1011 0101_ 1011 5-bit CSA +0010 X Y Z S C X Y Z S C + A B S 12: Datapath Functional Units CMOS VLSI Design Slide 27 CSA Application Use k-2 stages of CSAs – Keep result in carry-save redundant form Final CPA computes actual result 0001 0001 0111 1101 0010 0111 +1101 1011 4-bit CSA 0101_ 0101_ 1011 0101_ 1011 5-bit CSA +0010 01010_ 00011 00011 01010_ + 01010_ + 00011 12: Datapath Functional Units X Y Z S C X Y Z S C A B S CMOS VLSI Design Slide 28 CSA Application Use k-2 stages of CSAs – Keep result in carry-save redundant form Final CPA computes actual result 0001 0001 0111 1101 0010 0111 +1101 1011 4-bit CSA 0101_ 0101_ 1011 0101_ 1011 5-bit CSA +0010 01010_ 00011 00011 01010_ + 01010_ 10111 + 00011 10111 12: Datapath Functional Units X Y Z S C X Y Z S C A B S CMOS VLSI Design Slide 29 Multiplication Example: 12: Datapath Functional Units 1100 : 1210 0101 : 510 CMOS VLSI Design Slide 30 Multiplication Example: 12: Datapath Functional Units 1100 : 1210 0101 : 510 1100 CMOS VLSI Design Slide 31 Multiplication Example: 12: Datapath Functional Units 1100 : 1210 0101 : 510 1100 0000 CMOS VLSI Design Slide 32 Multiplication Example: 1100 : 1210 0101 : 510 1100 0000 1100 12: Datapath Functional Units CMOS VLSI Design Slide 33 Multiplication Example: 1100 : 1210 0101 : 510 1100 0000 1100 0000 12: Datapath Functional Units CMOS VLSI Design Slide 34 Multiplication Example: 1100 : 1210 0101 : 510 1100 0000 1100 0000 00111100 : 6010 12: Datapath Functional Units CMOS VLSI Design Slide 35 Multiplication Example: 1100 : 1210 0101 : 510 1100 0000 1100 0000 00111100 : 6010 multiplicand multiplier partial products product M x N-bit multiplication – Produce N M-bit partial products – Sum these to produce M+N-bit product 12: Datapath Functional Units CMOS VLSI Design Slide 36 General Form Y = (yM-1, yM-2, …, y1, y0) X = (xN-1, xN-2, …, x1, x0) Multiplicand: Multiplier: N 1 M 1 j i P y j 2 xi 2 j0 i0 Product: p11 y5 y4 y3 y2 y1 y0 x5 x4 x3 x2 x1 x0 x0y5 x0y4 x0y3 x0y2 x0y1 x0y0 x1y5 x1y4 x1y3 x1y2 x1y1 x1y0 x2y5 x2y4 x2y3 x2y2 x2y1 x2y0 x3y5 x3y4 x3y3 x3y2 x3y1 x3y0 x4y5 x4y4 x4y3 x4y2 x4y1 x4y0 x5y5 x5y4 x5y3 x5y2 x5y1 x5y0 p10 p9 p8 p7 p6 p5 12: Datapath Functional Units p4 p3 p2 N 1 M 1 i0 xi y j 2 i j j0 multiplicand multiplier partial products p1 CMOS VLSI Design p0 product Slide 37 Dot Diagram Each dot represents a bit x0 partial products multiplier x x15 12: Datapath Functional Units CMOS VLSI Design Slide 38 Array Multiplier y3 y2 y1 y0 x0 x1 CSA Array x2 x3 CPA p7 p6 p5 p4 p3 p2 p1 p0 A B critical path Sin A Cin B Sin B Cout = Cout Cout A Sout 12: Datapath Functional Units Cin Cin Sout A = Cout B Cin Sout Sout CMOS VLSI Design Slide 39 Rectangular Array Squash array to fit rectangular floorplan y3 y2 y1 y0 x0 p0 x1 p1 x2 p2 x3 p3 p7 12: Datapath Functional Units p6 p5 p4 CMOS VLSI Design Slide 40 Fewer Partial Products Array multiplier requires N partial products If we looked at groups of r bits, we could form N/r partial products. – Faster and smaller? – Called radix-2r encoding Ex: r = 2: look at pairs of bits – Form partial products of 0, Y, 2Y, 3Y – First three are easy, but 3Y requires adder 12: Datapath Functional Units CMOS VLSI Design Slide 41 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 42 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 43 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 44 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 45 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 46 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 47 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 48 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 12: Datapath Functional Units CMOS VLSI Design Slide 49 Booth Hardware Booth encoder generates control lines for each PP – Booth selectors choose PP bits yj yj-1 Xi x2i-1 x2i 2Xi x2i+1 Mi Booth Encoder Booth Selector PPij 12: Datapath Functional Units CMOS VLSI Design Slide 50 Sign Extension Partial products can be negative – Require sign extension, which is cumbersome – High fanout on most significant bit s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s PP0 PP1 PP2 s s s 0 x-1 x0 PP3 PP4 multiplier x s s s s s s s s PP5 PP6 PP7 PP8 12: Datapath Functional Units CMOS VLSI Design x15 0 x16 0 x17 Slide 51 Simplified Sign Ext. Sign bits are either all 0’s or all 1’s – Note that all 0’s is all 1’s + 1 in proper column – Use this to reduce loading on MSB s 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 s 1 1 1 1 1 s 1 1 1 s 1 s s s s s s s s PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8 12: Datapath Functional Units CMOS VLSI Design Slide 52 Even Simpler Sign Ext. No need to add all the 1’s in hardware – Precompute the answer! s s s 1 s s PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8 CMOS VLSI Design Slide 53 s 1 s s 1 s s 1 s s 1 s s 1 s s s 12: Datapath Functional Units s Advanced Multiplication Signed vs. unsigned inputs Higher radix Booth encoding Array vs. tree CSA networks 12: Datapath Functional Units CMOS VLSI Design Slide 54