Chapter 14 Arithmetic Circuits Rev. 1.0 05/12/2003 Rev. 2.0 06/05/2003 1 EE141 Arithmetic Circuits A Generic Digital Processor INPUT-OUTPUT MEM ORY CONTROL DATAPATH 2 EE141 Arithmetic Circuits Building Blocks for Digital Architectures Arithmetic and Unit - Bit-sliced datapath (adder, multiplier, shifter, comparator, etc.) Memory - RAM, ROM, Buffers, Shift registers Control - Finite state machine (PLA, random logic.) - Counters Interconnect - Switches - Arbiters - Bus 3 EE141 Arithmetic Circuits a g64 CARRYGEN SUMSEL node1 2-1 Mux 9-1 Mux ck1 b SUMGEN + LU REG 5-1 Mux 9-1 Mux Intel Microprocessor sum sumb to Cache s0 s1 LU : Logical Unit 1000um Itanium has 6 integer execution units like this 4 EE141 Arithmetic Circuits Bit-Sliced Design Control Bit 2 Bit 1 Data-Out Multiplexer Shifter Adder Register Data-In Bit 3 Bit 0 Tile identical processing elements 5 EE141 Arithmetic Circuits Itanium Integer Datapath EE141 Fetzer, Orton, ISSCC’02 6 Arithmetic Circuits Adders 7 EE141 Arithmetic Circuits Full-Adder (FA) A Cin B Full adder Cout Sum Generate (G) = AB Propagate (P) = A B Delete = A B 8 EE141 Arithmetic Circuits Boolean Function of Binary Full-Adder A Cin S A B Ci B Full adder Cout Sum ABC i ABC i ABCi ABCi CO AB BCi Ci A CO AB Ci ( A B) S ABCi C i ( A B Ci ) CMOS Implementation 9 EE141 Arithmetic Circuits Express Sum and Carry as a function of P, G, D Define 3 new variable which ONLY depend on A, B Generate (G) = AB Propagate (P) = A B Delete = A B Can also derive expressions for S and Co based on D and P Note that we will be sometimes using an alternate definition for Propagate (P) = A B 10 EE141 Arithmetic Circuits Ripple-Carry Adder A0 B0 Ci,0 A1 B1 Co,0 FA S0 ( Ci,1) A2 B2 Co,1 A3 B3 Co,2 Co,3 FA FA FA S1 S2 S3 Critical Path Worst-case delay is linear with the number of bits tadder = (N-1)tcarry + tsum td = O(N) •Propagation delay (or critical path) is the worst-case delay over all possible input patterns •A= 0001, B=1111, trigger the worst-case delay •A: 0 1, and B= 1111 fixed to set up the worstcase delay transition. 11 EE141 Arithmetic Circuits Complimentary Static CMOS Full Adder VDD VDD A Ci A B 28 Transistors B A B B Ci A X Ci VDD Ci CO S A Ci A B B VDD A Co B Ci A B •Logic effort of Ci is reduced to 2 (c.f., A and B signals) •Ci is late arrival signal near the output signal 12 •Co needs to be inverted Slow down the ripple propagate EE141 Arithmetic Circuits Inversion Property A Ci A B FA S Co Ci B FA Co S S A B C i = S A B Ci C A B C = C A B C o i o i 13 EE141 Arithmetic Circuits Minimize Critical Path by Reducing Inverting Stages Even cell A0 B0 Ci,0 A1 B1 Co,0 A2 Odd cell B2 Co,1 A3 B3 Co,2 Co,3 FA FA FA FA S0 S1 S2 S3 •Exploit Inversion Property •Reduce One inverter delay in each Full-adder (FA) unit 14 EE141 Arithmetic Circuits A Better Structure: The Mirror Adder VDD VDD A B VDD A B B Ci A B Kill "0"-Propagate A Ci Co Ci S Ci A "1"-Propagate Generate A B B A B Ci A B 24 transistors Exploring the “Self-Duality” of the Sum and Carry functions 15 EE141 Arithmetic Circuits Mirror Adder: Stick Diagram VDD A B Ci B A Ci Co Ci A B Co S GND 16 EE141 Arithmetic Circuits Mirror Adder Design •The NMOS and PMOS chains are completely symmetrical •A maximum of two series transistors can be observed in the carry-generation circuitry for good speed. •When laying out the cell, the most critical issue is the minimization of the capacitance at node Co. •The capacitance at node Co is composed of four diffusion capacitances, two internal gate capacitances, and six gate capacitances in the connecting adder cell . •The transistors connected to Ci are placed closest to the output. 17 EE141 Arithmetic Circuits Transmission-Gate Full Adder (24T) P A B VDD P Ci A A P B VDD Ci A P Ci S Sum Generation Ci P B Setup VDD A P Co Carry Generation Ci A S P Ci PC i PCi P A VDD CO P Ci P A P •Same delay for Sum and Carry Multiplier design 18 EE141 Arithmetic Circuits Manchester Carry-Chain Adder VDD Pi VDD Pi Co Ci Co Ci Gi Gi Di Pi Static Circuits Dynamic Circuits CO ( P Ci Gi ) Di 19 EE141 Arithmetic Circuits Manchester Carry Chain VDD P0 P1 P2 P3 C3 Ci,0 G1 G0 G3 G2 C0 C1 C2 C3 20 EE141 Arithmetic Circuits Manchester Carry-Chain Adder Propagate/Generate Row VDD Pi Ci - 1 i t P 0.69 Ci R j i 1 j 1 N ( N 1) 0.69 RC 2 where R j R, Ci C N Gi Pi + 1 Gi + 1 Ci GND Ci + 1 Inverter/Sum Row 21 EE141 Arithmetic Circuits Carry-Bypass Adder G1 Ci,0 P0 G1 C o,0 P0 FA P2 FA G2 Co,1 FA G3 Co,3 FA G1 C o ,0 P3 Co,2 FA P0 G1 G2 C o ,1 FA Ci,0 P2 P3 G3 Also called Carry-Skip BP=P oP1 P2 P3 C o,2 FA FA Multiplexer P0 Co,3 Idea: If (P0 and P1 and P2 and P3 = 1) then Co3 = C0, else “kill” or “generate”. 22 EE141 Arithmetic Circuits Carry-Bypass Adder (cont.) Bit 0–3 Bit 4–7 Setup tsetup Setup tbypass Bit 8–11 Bit 12–15 Setup Setup Carry propagation Carry propagation Carry propagation Carry propagation Sum Sum Sum tsum Sum M bits tadder = tsetup + Mtcarry + (N/M-1)tbypass + (M-1)tcarry + tsum M bits form a Section (N/M) Bypass Stages 23 EE141 Arithmetic Circuits Carry Ripple versus Carry Bypass tp ripple adder bypass adder 4..8 N Wordlength (N) > 4~8 is better for Bypass Adder 24 EE141 Arithmetic Circuits Carry-Select Adder Setup P,G Co,k-1 "0" "0" Carry Propagation "1" "1" Carry Propagation Multiplexer Co,k+3 Carry Vector Sum Generation 25 EE141 Arithmetic Circuits Carry Select Adder: Critical Path Bit 0–3 Bit 4–7 Bit 8–11 Bit 12–15 Setup Setup Setup Setup 0 0-Carry 0 0-Carry 0 0-Carry 0 0-Carry 1 1-Carry 1 1-Carry 1 1-Carry 1 1-Carry Ci,0 Multiplexer Co,3 Multiplexer Co,7 Multiplexer Co,11 Multiplexer Sum Generation Sum Generation Sum Generation Sum Generation S0–3 S4–7 S8–11 S12–15 Co,15 26 EE141 Arithmetic Circuits Linear Carry Select Bit 0-3 Bit 4-7 Setup Setup Bit 8-11 Bit 12-15 Setup Setup (1) "0" Carry "0" "0" "0" Carry "0" "0" Carry "0" "0" Carry (1) "1" Carry "1" "1" Carry "1" (5) (5) (5) (6) Multiplexer "1" Carry "1" (5) (7) Multiplexer "1" Carry "1" (5) (8) Multiplexer Ci,0 Multiplexer (9) Sum Generation S0-3 Sum Generation Sum Generation S 4-7 S8-11 Sum Generation S 12-15 (10) 27 EE141 Arithmetic Circuits Square Root Carry Select Bit 0-1 Bit 2-4 Setup Setup Bit 5-8 Bit 9-13 Setup Setup Bit 14-19 (1) "0" Carry "0" "0" "0" Carry "0" Carry "0" "0" "0" Carry (1) "1" Carry "1" (3) "1" Carry "1" (3) "1" Carry "1" (4) (4) Multiplexer "1" Carry "1" (5) (5) (6) Multiplexer Multiplexer (7) (6) (7) Multiplexer Mux Ci,0 (8) Sum Generation S0-1 Sum Generation Sum Generation S2-4 S5-8 Sum Generation S9-13 Sum S14-19 (9) N-bit adder with P stages, 1st stage adds M bits EE141 N P2 / 2 P 2N 28 Arithmetic Circuits Adder Delays - Comparison 50 Ripple adder tp (in unit delays) 40 30 Linear select 20 10 0 Square root select 0 20 40 60 N 29 EE141 Arithmetic Circuits Look-ahead Adder - Basic Idea A0, B0 Ci,0 A1, B1 P0 Ci,1 S0 P1 S1 AN-1, BN-1 ••• Ci, N-1 ••• PN-1 SN-1 CO ,k f ( Ak , Bk , CO ,k 1 ) Gk Pk CO ,k 1 CO ,k Gk Pk (Gk 1 Pk 1 CO ,k 2 ) CO ,k Gk Pk (Gk 1 Pk 1 ( P1 (G0 P0Ci , 0 ))) 30 EE141 Arithmetic Circuits Look-Ahead: Topology Expanding Lookahead equations: VDD C o k = Gk + Pk Gk – 1 + Pk – 1Co k – 2 G3 G2 All the way: G1 C o k = Gk + Pk Gk – 1 + P k – 1 + P1 G0 + P0 Ci 0 G0 Ci,0 Co,3 P0 P1 P2 P3 31 EE141 Arithmetic Circuits Multipliers 32 EE141 Arithmetic Circuits Binary Multiplication Z X Y M N 1 k Z 2 k k 0 N 1 M 1 N 1 M 1 i j i j X i 2 Y j 2 X iY j 2 i 0 j 0 i 0 j 0 with M 1 N 1 i 0 j 0 X X i 2i , Y Y j 2 j 33 EE141 Arithmetic Circuits Binary Multiplication 1 0 1 0 1 0 x 1 0 1 1 Multiplicand Multiplier 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 + Partial products 1 0 1 0 1 0 1 1 1 0 0 1 1 1 0 Result 34 EE141 Arithmetic Circuits Array Multiplier X3 Z7 X2 X1 X0 Y1 Z0 X3 X2 X1 X0 HA FA FA HA X3 X2 X1 X0 FA FA FA HA X3 X2 X1 X0 FA FA FA HA Z6 Z5 Z4 Y3 Y2 Y0 Z1 Z2 Z3 35 EE141 Arithmetic Circuits MxN Array Multiplier — Critical Path FA HA FA FA FA FA HA HA Critical Path 1 Critical Path 2 FA FA FA HA Critical Path 1 & 2 36 EE141 Arithmetic Circuits Carry-Save Multiplier HA HA HA HA HA FA FA FA HA FA FA FA FA FA HA HA Vector Merging Adder 37 EE141 Arithmetic Circuits Multiplier Floorplan X3 X2 X1 X0 Y0 Y1 C S C S C S C S HA Multiplier Cell Z0 FA Multiplier Cell Y2 Y3 Z7 C S C S C S C S C S C S C S C S C C C C S Z6 S Z5 S Z4 Z1 Vector Merging Cell Z2 X and Y signals are broadcasted through the complete array. ( ) S Z3 38 EE141 Arithmetic Circuits Wallace-Tree Multiplier First stage Partial products 6 5 3 4 2 1 0 6 5 4 1 0 Bit position 2 1 0 Final adder Second stage 5 2 (b) (a) 6 3 4 3 2 1 0 6 5 4 3 HA FA (c) (d) 39 EE141 Arithmetic Circuits Wallace-Tree Multiplier 40 EE141 Arithmetic Circuits Wallace-Tree Multiplier y0 y1 y2 y0 y1 y2 Ci-1 FA y3 Ci y3 y4 y5 FA Ci-1 FA FA Ci Ci-1 Ci Ci-1 y4 FA Ci Ci-1 FA Ci Ci-1 y5 FA Ci FA C C S S 41 EE141 Arithmetic Circuits Multipliers —Summary •Identify Critical Paths •Other Possible techniques: •Data Encoding (Booth) •Logarithmic v.s. Linear (Wallace Tree Multiplier) •Pipelining 42 EE141 Arithmetic Circuits Shifters 43 EE141 Arithmetic Circuits The Binary Shifter Right nop Ai Left Bi Ai-1 Bi-1 Bit-Slice i ... 44 EE141 Arithmetic Circuits The Barrel Shifter A3 B3 Sh1 A2 B2 : Data Wire Sh2 A1 B1 : Control Wire Sh3 A0 B0 Sh0 Sh1 Sh2 Sh3 Area Dominated by Wiring 45 EE141 Arithmetic Circuits 4x4 barrel shifter A3 A2 A1 A0 Sh0 Sh1 Sh2 Sh3 Buffer Widthbarrel ~ 2 pm M 46 EE141 Arithmetic Circuits Logarithmic Shifter Sh1 Sh1 Sh2 Sh2 Sh4 Sh4 A3 B3 A2 B2 A1 B1 A0 B0 47 EE141 Arithmetic Circuits 0-7 bit Logarithmic Shifter A A A A 3 Out3 2 Out2 1 Out1 0 Out0 48 EE141 Arithmetic Circuits Summary Datapath designs are fundamentals for highspeed DSP, Multimedia, Communication digital VLSI designs. Most adders, multipliers, division circuits are now available in Synopsys Designware under different area/speed constraint. For details, check “Advanced VLSI” notes, or “Computer Arithmetic” textbooks 49 EE141 Arithmetic Circuits