Digital Integrated Circuits: A Design Perspective
Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic

Arithmetic Circuits
January, 2003

EE141 © Digital Integrated Circuits 2nd, Arithmetic Circuits

A Generic Digital Processor
[Figure: block diagram with INPUT-OUTPUT, MEMORY, CONTROL, and DATAPATH blocks]

Building Blocks for Digital Architectures
Arithmetic unit
- Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.)
Memory
- RAM, ROM, Buffers, Shift registers
Control
- Finite state machine (PLA, random logic)
- Counters
Interconnect
- Switches
- Arbiters
- Bus

An Intel Microprocessor
[Figure: layout of one integer execution unit with a 1000 um scale marker. Labeled blocks: 9-1 Mux, 5-1 Mux, 2-1 Mux, CARRYGEN, SUMGEN + LU, SUMSEL, REG. Labeled signals: a, b, g64, ck1, node1, s0, s1, sum, sumb (to Cache)]
LU: Logical Unit
Itanium has 6 integer execution units like this

Bit-Sliced Design
[Figure: bit slices Bit 3 to Bit 0, each with Register, Adder, Shifter, and Multiplexer between Data-In and Data-Out, driven by shared Control]
Tile identical processing elements

Bit-Sliced Datapath
[Figure: bit slices 0 to 63. From register files / Cache / Bypass, through Multiplexers, Shifter, Adder stage 1, Wiring, Adder stage 2, Wiring, Adder stage 3, and Sum Select, to register files / Cache, with Loopback Buses]

Itanium Integer Datapath
[Figure: die photo]
Fetzer, Orton, ISSCC'02

Adders

Full-Adder
[Figure: full adder block with inputs A, B, Cin and outputs Sum, Cout]

The Binary Adder
[Figure: full adder block with inputs A, B, Cin and outputs Sum, Cout]
$S = A \oplus B \oplus C_i = A\bar{B}\bar{C}_i + \bar{A}B\bar{C}_i + \bar{A}\bar{B}C_i + ABC_i$
$C_o = AB + BC_i + AC_i$

Express Sum and Carry as a function of P, G, D
Define three new variables which ONLY depend on A, B:
Generate (G) = $AB$
Propagate (P) = $A \oplus B$
Delete (D) = $\bar{A}\bar{B}$
Can also derive expressions for S and Co based on D and P.
Note that we will sometimes use an alternate definition: Propagate (P) = $A + B$

The Ripple-Carry Adder
Worst-case delay is linear in the number of bits:
$t_d = O(N)$
$t_{adder} = (N-1)\,t_{carry} + t_{sum}$
Goal: Make the fastest possible carry path circuit

Complementary Static CMOS Full Adder
[Figure: transistor schematic]
28 Transistors

Inversion Property
[Figure: a full adder with inputs A, B, Ci and outputs S, Co, and the same full adder with all inputs and outputs inverted]
$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$
$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$

Minimize Critical Path by Reducing Inverting Stages
Exploit Inversion Property

A Better Structure: The Mirror Adder
[Figure: transistor schematic with Kill, "0"-Propagate, "1"-Propagate, and Generate paths driving Co and S]
24 transistors

Mirror Adder Stick Diagram
[Figure: stick diagram between VDD and GND with inputs A, B, Ci and outputs Co, S]

The Mirror Adder
• The NMOS and PMOS chains are completely symmetrical. A maximum of two series transistors can be observed in the carry-generation circuitry.
• When laying out the cell, the most critical issue is the minimization of the capacitance at node Co. The reduction of the diffusion capacitances is particularly important.
• The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell.
• The transistors connected to Ci are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.
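
The adder slides above define sum and carry through generate and propagate signals, chain full adders into a ripple-carry adder, and state the inversion property. The following is a minimal behavioural sketch in Python of those three ideas. The function names are illustrative and do not come from the slides, and the model covers logic only, not transistor-level delay or sizing.

```python
from itertools import product

def full_adder(a, b, ci):
    """One-bit full adder expressed through generate/propagate signals."""
    g = a & b            # generate: carry out is 1 regardless of ci
    p = a ^ b            # propagate (XOR definition): carry out equals ci
    s = p ^ ci           # sum bit
    co = g | (p & ci)    # carry out, equivalent to AB + BCi + ACi
    return s, co

def ripple_carry_add(x, y, n, ci=0):
    """Add two n-bit integers by chaining n full adders.

    Returns (n-bit sum, final carry out). The carry visits every bit
    position in turn, which is why the delay grows linearly with n.
    """
    total = 0
    carry = ci
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

def check_inversion_property():
    """Check that inverting all inputs of a full adder inverts both outputs."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
        if (s_inv, co_inv) != (1 - s, 1 - co):
            return False
    return True
```

For instance, `ripple_carry_add(15, 1, 4)` returns `(0, 1)`: the 4-bit sum wraps to zero and the carry ripples out of the most significant bit, which is the worst-case path the slides refer to.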
Transmission Gate Full Adder
[Figure: transmission-gate full adder schematic with Setup, Sum Generation, and Carry Generation sections; signals A, B, P, Ci, S, Co]

Manchester Carry Chain
[Figure: two single-bit carry-chain stage circuits between Ci and Co, with control signals Pi, Gi, and Di]

Manchester Carry Chain
[Figure: four-bit chain with propagate signals P0 to P3, generate signals G0 to G3, carry input Ci,0, and carries C0 to C3]

Manchester Carry Chain Stick Diagram
[Figure: stick diagram between VDD and GND with a Propagate/Generate Row and an Inverter/Sum Row; signals Pi, Gi, Pi+1, Gi+1, Ci-1, Ci, Ci+1]

Carry-Bypass Adder
Also called Carry-Skip
[Figure: four full adders with inputs P0 to P3 and G0 to G3, carry input Ci,0, and carries Co,0 to Co,3, shown first as a plain ripple chain and then with an output multiplexer controlled by BP]
$BP = P_0 P_1 P_2 P_3$
Idea: If ($P_0$ and $P_1$ and $P_2$ and $P_3$ = 1) then $C_{o,3} = C_{i,0}$, else "kill" or "generate".

Carry-Bypass Adder (cont.)
$t_{adder} = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + (M - 1)\,t_{carry} + t_{sum}$
(N: total number of bits; M: number of bits per bypass stage)

Carry Ripple versus Carry Bypass
[Figure: plot of tp versus N for the ripple adder and the bypass adder; the curves cross at about N = 4..8]

Carry-Select Adder
[Figure: one block of a carry-select adder. Setup produces P, G; a "0" Carry Propagation chain and a "1" Carry Propagation chain feed a Multiplexer selected by Co,k-1, which outputs Co,k+3 and the Carry Vector for Sum Generation]

Carry Select Adder: Critical Path
[Figure: critical path through the carry-select blocks]

Linear Carry Select
[Figure: four equal blocks for Bit 0-3, Bit 4-7, Bit 8-11, and Bit 12-15, each with Setup, "0" Carry, "1" Carry, Multiplexer, and Sum Generation; carry input Ci,0; outputs S0-3, S4-7, S8-11, S12-15]
Time-step annotations: setup at (1); carry chains ready at (5) in every block; multiplexers at (6), (7), (8), (9); sum at (10).

Square Root Carry Select
[Figure: five blocks of increasing size for Bit 0-1, Bit 2-4, Bit 5-8, Bit 9-13, and Bit 14-19, each with Setup, "0" Carry, "1" Carry, Multiplexer, and Sum Generation; carry input Ci,0; outputs S0-1, S2-4, S5-8, S9-13, S14-19]
Time-step annotations: setup at (1); carry chains ready at (3), (4), (5), (6), (7); multiplexers at (4), (5), (6), (7), (8); sum at (9).

Adder Delays - Comparison
[Figure: delay comparison plot]

LookAhead - Basic Idea
$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$

Look-Ahead: Topology
Expanding Lookahead equations:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$
All the way:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$

Logarithmic Look-Ahead Adder
[Figure: eight inputs A0 to A7 combined into F by a linear chain, with $t_p \sim N$, and by a tree, with $t_p \sim \log_2(N)$]

Carry Lookahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically.

Tree Adders
16-bit radix-2 Kogge-Stone tree
[Figure: prefix tree with inputs (A0, B0) to (A15, B15) and outputs S0 to S15]

Tree Adders
16-bit radix-4 Kogge-Stone Tree
[Figure: prefix tree with inputs (a0, b0) to (a15, b15) and outputs S0 to S15]

Sparse Trees
16-bit radix-2 sparse tree with sparseness of 2
[Figure: prefix tree with inputs (a0, b0) to (a15, b15) and outputs S0 to S15]

Tree Adders
Brent-Kung Tree
[Figure: prefix tree with inputs (A0, B0) to (A15, B15) and outputs S0 to S15]

Example: Domino Adder
[Figure: two domino gates clocked by Clk, one for Propagate and one for Generate]
Propagate: $P_i = a_i + b_i$
Generate: $G_i = a_i b_i$

Example: Domino Adder
[Figure: two domino gates clocked by $Clk_k$, one for Propagate and one for Generate]
Propagate: $P_{i:i-2k+1} = P_{i:i-k+1}\,P_{i-k:i-2k+1}$
Generate: $G_{i:i-2k+1} = G_{i:i-k+1} + P_{i:i-k+1}\,G_{i-k:i-2k+1}$

Example: Domino Sum
[Figure: domino sum circuit]

Multipliers

The Binary Multiplication
$Z = X \times Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left(\sum_{i=0}^{M-1} X_i 2^i\right)\left(\sum_{j=0}^{N-1} Y_j 2^j\right) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} X_i Y_j 2^{i+j}$
with
$X = \sum_{i=0}^{M-1} X_i 2^i, \quad Y = \sum_{j=0}^{N-1} Y_j 2^j$

The Binary Multiplication
[Figure: binary multiplication example]

The Array Multiplier
[Figure: array multiplier schematic]

The MxN Array Multiplier: Critical Path
[Figure: array of FA and HA cells with Critical Path 1, Critical Path 2, and their shared segment (Critical Path 1 & 2) marked]

Carry-Save Multiplier
[Figure: rows of HA and FA cells followed by a Vector Merging Adder]

Multiplier Floorplan
[Figure: 4x4 multiplier floorplan with inputs X3 to X0 and Y0 to Y3 and outputs Z0 to Z7, built from HA Multiplier Cells, FA Multiplier Cells, and Vector Merging Cells, each producing C and S]
X and Y signals are broadcast through the complete array.

Wallace-Tree Multiplier
[Figure: partial-product reduction]

Wallace-Tree Multiplier
[Figure: partial-product reduction]

Wallace-Tree Multiplier
[Figure: two arrangements of full adders (FA) reducing inputs y0 to y5, with carries Ci-1 and Ci passed between neighbouring bit slices, to the outputs C and S]

Multipliers: Summary
• Optimization Goals Different Vs Binary Adder
• Once Again: Identify Critical Path
• Other possible techniques
- Logarithmic versus Linear (Wallace Tree Mult)
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION

Shifters

The Binary Shifter
[Figure: bit-slice i of a one-bit shifter with control signals Right, nop, and Left, inputs Ai and Ai-1, and outputs Bi and Bi-1]

The Barrel Shifter
[Figure: transistor array with data inputs A3 to A0, data outputs B3 to B0, and control signals Sh0 to Sh3; the legend distinguishes Data Wires from Control Wires]
Area Dominated by Wiring

4x4 barrel shifter
[Figure: layout with inputs A3 to A0, control signals Sh0 to Sh3, and output Buffers]
$\text{Width}_{barrel} \approx 2\,p_m M$
($p_m$: metal pitch; M: maximum shift width)

Logarithmic Shifter
[Figure: three cascaded stages controlled by Sh1, Sh2, and Sh4 and their complements, with inputs A3 to A0 and outputs B3 to B0]

0-7 bit Logarithmic Shifter
[Figure: layout with inputs A3 to A0 and outputs Out3 to Out0]
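
The logarithmic shifter slides above build a shift of 0 to 7 positions from three cascaded stages that shift by 1, 2, and 4 positions, each enabled by one control signal (Sh1, Sh2, Sh4). The following is a behavioural sketch in Python. It assumes a right shift with zero fill and a hypothetical `width` parameter, since the extracted slide text does not fix the direction, the fill value, or the word width; the function name is illustrative.

```python
def log_shift_right(value, shamt, width=8):
    """Shift `value` right by 0..7 positions using three stages.

    Stage k shifts by 2**k positions when bit k of `shamt` is set,
    mirroring the Sh1, Sh2, Sh4 control signals of the slides.
    Zero fill and the right-shift direction are assumptions.
    """
    mask = (1 << width) - 1
    value &= mask                      # keep only `width` bits
    for stage in range(3):             # stages shift by 1, 2, 4
        if (shamt >> stage) & 1:       # control bit for this stage
            value >>= 1 << stage
    return value
```

A shift by 5, for example, enables the shift-by-1 and shift-by-4 stages and bypasses the shift-by-2 stage. The number of stages grows with the logarithm of the maximum shift width, in contrast to the barrel shifter, where every shift amount has its own control wire.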