EE 5324 – VLSI Design II
Part II: Adders
Kia Bazargan, University of Minnesota
Spring 2006
EE 5324 - VLSI Design II - © Kia Bazargan

References and Copyright
• Textbooks referenced
  – [WE92] N. H. E. Weste, K. Eshraghian, “Principles of CMOS VLSI Design: A System Perspective”, Addison-Wesley, 2nd Ed., 1992.
  – [Rab96] J. M. Rabaey, “Digital Integrated Circuits: A Design Perspective”, Prentice Hall, 1996.
  – [Par00] B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs”, Oxford University Press, 2000.

References and Copyright (cont.)
• Slides used
  – [©Hauck] © Scott A. Hauck, 1996-2000; G. Borriello, C. Ebeling, S. Burns, 1995, University of Washington
  – [©Prentice Hall] © Prentice Hall 1995, © UCB 1996. Slides for [Rab96]: http://bwrc.eecs.berkeley.edu/Classes/IcBook/instructors.html
  – [©Oxford U Press] © Oxford University Press, New York, 2000. Slides for [Par00], with permission from the author: http://www.ece.ucsb.edu/Faculty/Parhami/files_n_docs.htm

Outline
• One-bit adder, basic ripple-carry adder
• Carry-Lookahead adders (CLA)
• Manchester carry chain
• Carry bypass
• Carry select adder
• Brent-Kung adder

Why Adders?
• Addition: a fundamental operation
  – Basic block of most arithmetic operations
  – Address calculation
• Faster, faster and faster
• How?
  – Architectural level optimization
  – Gate-level optimization
  – Speed/area trade-off

Adding Two One-bit Operands
• One-bit Half Adder (HA): inputs A, B; outputs Sum, Cout
  Sum = A ⊕ B
  Cout = A.B

  A B | Sum Cout
  0 0 |  0   0
  0 1 |  1   0
  1 0 |  1   0
  1 1 |  0   1

• One-bit Full Adder (FA): inputs A, B, Cin; outputs Sum, Cout
  Sum = A ⊕ B ⊕ Cin
  Cout = A.B + B.Cin + A.Cin

  Cin A B | Sum Cout
   0  0 0 |  0   0
   0  0 1 |  1   0
   0  1 0 |  1   0
   0  1 1 |  0   1
   1  0 0 |  1   0
   1  0 1 |  0   1
   1  1 0 |  0   1
   1  1 1 |  1   1

N-Bit Ripple-Carry Adder: Series of FA Cells
• To add two n-bit numbers
[Figure: chain of n FA cells; inputs An-1 Bn-1 ... A0 B0, carry-in C0, carry-out Cn, sums Sn-1 ... S0]
• Note: adder delay = Tc * n
• Tc = (Cin:Cout delay) of one FA cell

4-bit Ripple Carry Addition: Example
A=0011, B=0101, C0=0
[Figure: four FA cells with inputs A3 B3 ... A0 B0, carries C4 ... C0, sums S3 ... S0]

  T | C4 C3 C2 C1 | S
  0 |  0  0  0  0 | 0000
  1 |  0  0  0  1 | 0110
  2 |  0  0  1  1 | 0100
  3 |  0  1  1  1 | 0000
  4 |  0  1  1  1 | 1000

One-bit Full Adder Implementation
• Direct gate implementation
  Sum = A ⊕ B ⊕ Cin
  Cout = A.B + B.Cin + A.Cin = A.B + Cin.(A+B)
[Figure: gate-level schematic of Sum and Cout from A, B, Cin]
• 32 transistors used
[WE92] p516

One-Bit Full Adder: Share Logic
• An observation
  – Almost always, sum = NOT carry
  Sum = A.B.Cin + (A+B+Cin).Cout’
  – The first term includes input 111; the second term excludes input 000
  – (Full-adder truth table repeated: Sum = NOT Cout in every row except 000 and 111)

One-Bit Full Adder: Transistor Implementation
  Cout = A.B + C.(A+B)
  Sum = A.B.C + (A+B+C).Cout’
[Figure: transistor-level schematic of the Cout and Sum gates]
• 28 transistors
[WE92] p517, [Rab96] p390
  – Use inverters to get Cout and Sum
  – C transistors close to output
  – Cout delay: 2 inverting stages (1-stage possible?)
  – Sum delay: 3 inverting stages (not an issue, though)

One-Bit Full Adder: Inverted Inputs
• An observation
  – Invert inputs => outputs invert
[Figure: an FA cell with true inputs and outputs next to an FA cell with inverted inputs and outputs]
  – (Full-adder truth table repeated: complementing A, B and Cin complements both Sum and Cout)
• Exploit this property:
  – Get rid of the inverter on the carry critical path

Ripple Carry Adder: Inverting Property
[Figure: 4-bit ripple-carry chain of FA’ cells; inputs A3 B3 ... A0 B0, carries C4 ... C0, sums S3 ... S0]
• FA’ is similar to FA, but with no inverters on the outputs
  – Much faster (1-stage)
  – Disadvantage: not regular data path

Summary: Ripple-Carry Adder
• Basic ripple carry: AND-OR gates
  – Area: 32 transistors (per bit position)
  – Delay: 2 stages of inverting logic (per bit position)
• Direct CMOS logic, share Cout’
  – Area: 28 transistors
  – Delay: 2 stages
• Use “inverting” property
  – Area: 27 (odd bits: 26, even bits: 28)
  – Delay: ~1 stage
• So far: transistor/logic manipulation
• Is that all we can do?!!

Carry-Lookahead Adder: Idea
• New look: carry propagation
• Idea: Try to “predict” Ck earlier than Tc*k
  – Instead of passing through k stages, compute Ck separately using 1-stage CMOS logic
• Carry propagation: an example
[Worked example: an 8-bit addition showing the Carry, A, B and Sum rows for bit positions 7 to 0; the individual bit values did not survive extraction]

Carry-Lookahead Adder (CLA): One Bit
• What happens to the propagating carry in bit position k?

  A B | Cout
  0 0 | 0   (kill)
  0 1 | Cin (propagate)
  1 0 | Cin (propagate)
  1 1 | 1   (generate)

[Figure: carry cell with pass paths labelled 0-propagate, 1-propagate, kill and generate]
  p = A+B (or A ⊕ B)
  g = A.B
[Rab96] p391

CLA: Propagation Equations
• If C4=1, then either:
  – g3: generated at bit pos 3
  – g2.p3: generated at bit pos 2, propagated 3
  – g1.p2.p3: generated at bit pos 1, propagated 2,3
  – g0.p1.p2.p3: generated at bit pos 0, propagated 1,2,3
  – Cin.p0.p1.p2.p3: input carry, propagated 0,1,2,3
• C4 = g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3
  – Implement C4 as a one-stage CMOS logic
  – delay=1 (or is it?)

CLA: Static Logic Implementation
[Figure: static CMOS gate computing C4 from Cin, p0–p3 and g0–g3]
[Rab96] p405 [©Hauck]

CLA: Dynamic Logic Implementation
• Dynamic gate implementation:
  C4 = g3 + p3.(g2 + p2.(g1 + p1.(g0 + p0.Cin)))
  – 6 transistors in series
[Figure: dynamic gate for C4 with inputs p0–p3, g0–g3, Cin]
[WE92] p529 [©Hauck]

CLA: Dynamic Logic Implementation
• Can we reuse logic?
  – Can we get C1, C2 and C3 from the same circuit?
[Figure: the C4 dynamic gate with its internal nodes marked C1?, C2?, C3?]
• No!
  – C1, C2 and C3 may be floating (not precharged)
  – Charge sharing problem
[©Hauck]

CLA: Dynamic Logic Implementation
[Figure: four separate dynamic gates, one each for C1 (p0, g0, Cin), C2 (p0–p1, g0–g1, Cin), C3 (p0–p2, g0–g2, Cin) and C4 (p0–p3, g0–g3, Cin)]
[WE92] p529

CLA: Basic Block (4 Bits) Architecture
• Block of 4-bit p, g, Cout
[Figure: four p,g cells on inputs A3 B3 ... A0 B0 feeding a carry generator; signals p3 g3 ... p0 g0, carries C0–C4, sums S3–S0]

CLA: N-Bit Architecture
• Put it all together:
[Figure: 4-bit p,g groups on A7 B7 ... A0 B0, each with its own Carry Generator; carries C0, C4, C8; sums S7–S0]

CLA: 12-Bit Example
  A = 1101 1001 1010
  B = 0111 0110 1101
  C0 = 0
[Figure: three 4-bit p,g groups with Carry Generators producing C4, C8, C12 and sums S11–S0]

  T | C12 S11-S8 | C8 S7-S4 | C4 S3-S0
  0 |  0   0000  |  0  0000 |  0  0000
  2 |  1   0100  |  0  1111 |  1  0111
  3 |  1   0100  |  1  0000 |  1  0111
  4 |  1   0101  |  1  0000 |  1  0111

Summary: Carry Lookahead Adder
• CLA compared to ripple-carry adder:
  – Faster (“4 times”?), but delay still linear (w.r.t.
    # of bits)
  – Larger area
    o P, G signal generation
    o Carry generation circuits
    o Carry generation ckt for each bit position (no re-use)
• Limitation: cannot go beyond 4 bits of look-ahead
  – Large p,g fan-out slows down carry generation
• Next: Manchester carry chains
  – Tries to reuse logic by pre-charging each carry position

Recap: Carry Look-Ahead
• Charge sharing problem
[Figure: the dynamic C4 gate (p0–p3, g0–g3, Cin) with internal nodes marked C1?, C2?, C3?]

Manchester Carry Chain: First Shot
• Improvement over CLA:
  – Precharge internal nodes to avoid charge-sharing problem
[Figure: pass-transistor chain Cin, C1, C2, C3, C4 gated by p0–p3, with pull-downs g0–g3]
• Fastest way to do small adders
  – 6 transistors on the critical path
[©Hauck]

Manchester Carry Chain: Sizing
[Figure: RC model of the discharge path; resistances R1–R6 and node capacitances C1–C6 along the discharge transistor MC and chain transistors M0–M4 to Out]
  $t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$
[Plots versus sizing factor k (1 to 3.0): speed (normalized by 0.69RC) and area (in minimum size devices)]
(“k” is the sizing factor)
[© Prentice Hall]

Manchester Carry Chain: An Improvement
• Problem: Cin arrives late => move it closer to output
• Use bypass logic:
[Figure: carry chain p0–p3, g0–g3 from Cin to C4, plus a bypass path from Cin to C4 through series p0, p1, p2, p3]
[©Hauck]

Manchester Carry Chain: the Improvement
• Direct implementation
[Figure: chain from Cin through p0–p3 with g0–g3, producing C1, C2, C3, C4]
• Carry bypass circuitry
[Figure: bypass path from Cin to C4 through series p0, p1, p2, p3]
• Advantages of the carry bypass circuitry
  – Only 5 series transistors
  – Less capacitance in internal nodes
  – Cin close to the output
[©Hauck]
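The Elmore delay expression on the sizing slide can be evaluated numerically to see why a long chain of series pass transistors gets slow. The following is a minimal sketch, not taken from the slides: the function names `elmore_delay` and `uniform_chain_delay` are hypothetical, and the uniform case assumes every stage has the same resistance and capacitance, which a sized chain would not.

```python
def elmore_delay(resistances, capacitances):
    """Return t_p = 0.69 * sum_i C_i * (sum_{j<=i} R_j) for an RC ladder.

    resistances[i] is the series resistance feeding node i and
    capacitances[i] is the capacitance at node i (illustrative units).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance per node capacitance")
    total = 0.0
    upstream_r = 0.0  # running sum of R_1 .. R_i
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        total += c * upstream_r
    return 0.69 * total


def uniform_chain_delay(n, r=1.0, c=1.0):
    """Delay of n identical stages (assumption: equal R and C per stage)."""
    return elmore_delay([r] * n, [c] * n)


# With equal stages the double sum collapses to R*C*n*(n+1)/2,
# so doubling the chain length roughly quadruples the delay.
for n in (2, 4, 8, 16):
    print(n, uniform_chain_delay(n) / 0.69)
```

Under the equal-stage assumption the normalized delay grows as n(n+1)/2, which is the quadratic growth that motivates buffering the chain every few bits.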

Manchester Carry Chain: Summary
• Compared to CLA:
  – Smaller area
    o Pre-charge internal nodes
    o Reuse logic for intermediate carry signals
  – Cin close to the output
• Carry chain can be any length
  – Series propagate is slow (O(n²) delay) => buffer every 4 bits
• Compact adder: good for up to 16 bits
• Using carries to compute sum slows down MCC
  – Use two carry chains: one for sum, one for carry propagation
[©Hauck]

Carry Bypass Adder: Idea
• The “bypass” idea is general
  – Not just for Manchester carry chain
  – The local carry chain could be “ripple carry adder”
• Structure
[Figure: block for bits i to i+k with Setup, Local Carry Chain and Sum stages and a bypass path; carry-in Ci, carry-out Ci+k+1]
  – Could be static, dynamic, pass transistor
  – Carry and sum paths shown in different colors
  – Bypass logic determines: “pass” or “kill/generate”?

Carry Bypass Adder: Cell Examples
• Static implementation, using ripple carry adder
[Figure: local carry chain of four FA cells with a bypass multiplexer]
• Dynamic, Manchester (mux=wire!)
[Figure: Manchester chain p0–p3, g0–g3 from Cin to C4, with a bypass path through p0, p1, p2, p3 enabled by p0.p1.p2.p3]
[Rab96] p398

Carry Bypass Adder: Cell Examples (cont.)
• Static (pass transistor logic), Manchester
  T1 = (p0.p1.p2).p3
  T2 = p3
  T3 = p0.p1.p2.p3
  (terms as extracted; any complement bars on the original slide did not survive)
[Figure: pass-transistor chain from C0 to C4 through p0, p1, p2 with g0–g3, and bypass transistors controlled by T1, T2, T3]
[WE92] p531

Carry Bypass Adder: the Structure and Timing
• Timing (critical path shown in different color):
  1-Setup
  2-Local carry generate/kill, MUX select line ready
  3-C0-C16 carry propagate (if applicable)
[Figure: four blocks, Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15, each with Setup, Local Carry Chain and Sum; C0 enters at the first block]
[Rab96] p.399

Carry Bypass Adder: Timing of a Sub-block
• For an intermediate stage, after setup:
  – If in pass mode
    o Local carry vector computes intermediate carries (possibly incorrectly)
    o At the same time, mux selection set to pass
    o When input carry arrives, intermediate carries might be recomputed
    o Meanwhile, input carry is sent to Cout
  – If not pass mode (assume bit 10 generates)
    o Local carry vector computes intermediate carries (bits 10, 11 correct)
    o At the same time, mux selection set to local
    o Meanwhile, output carry is sent to Cout correctly
    o When input carry arrives, intermediate carries C8 and C9 (S8, S9, S10) will be recomputed correctly
[Figure: the Bit 8-11 block (Setup, Local Carry Chain, Sum) drawn once for each mode]

Carry Bypass Adder: Timing
  Delay = tsetup + max{tselect, 4 x tFA} + 3 x tmux_pass + 3 x tFA + tsum
[Figure: four blocks, Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15, each with Setup, Local Carry Chain and Sum; C0 enters at the first block]

Carry Bypass Adder: Pros and Cons
• Speed:
  – Faster than ripple adder
  – Still linear!
• Area overhead:
  – Mux (setup?)
  – Not worth it for small adders (N<8)
  – 10-20% for large adders
[Plot, “Carry Bypass Adder: Pros and Cons”: propagation delay versus number of bits for the ripple adder and the bypass adder; the curves cross at about 4..8 bits]
[Rab96] p.399

Carry Select Adder: the Idea
• Similar to bypass
[Figure: k-bit block with Setup (p,g), a 0-Carry propagation chain, a 1-Carry propagation chain, Multiplexers selected by Ci, the Carry Vector, Sum Generation, and carry-out Ci+k]
• Instead of “waiting” for the input carry, “precompute” the carry output
  – Compute Ci+k for both cases Ci=0 and Ci=1
  – When Ci arrives, select the appropriate result
• Sum computed in one step after the intermediate carry signals are ready
[Rab96] p.400

Linear Carry Select Adder: Structure
[Figure: four blocks, Bits 0-3, Bits 4-7, Bits 8-11, Bits 12-15, each with Setup, a 0-Carry chain, a 1-Carry chain, a multiplexer and Sum; carries C0, C4, C8, C12, C16]
[Rab96] p.401

Linear Carry Select Adder: Timing
[Figure: the same four-block structure annotated with arrival times]
• Delay = 3 + 1 + 1 + 1 + 1 = 7 (16 bits)
[Rab96] p.401

Square Root Carry Select Adder: the Idea
• Later stages have to wait for the multiplexers in the earlier stages
• Why not give them bigger chunks of data to compute?
  – Balances the delay paths
  – Sub-linear delay (we will see why)

Square Root Carry Select Adder: the Structure
• Assuming the following delays: Setup=1, carry propagate=1/bit, mux=1
[Figure: five blocks of growing width, Bits 0-1, Bits 2-4, Bits 5-8, Bits 9-13, Bits 14-19; carry labels C0, C2, C5, C9, C14, C19; timing labels 3, 4, 5, 6, 7 along the carry path]
• Delay from all paths = 8 (20 bits)
[Rab96] p.402

Square Root Carry Select Adder: Delay
• Assume N-bit adder
  – P stages (delay directly depends on P)
  – First stage computes M bits
  $N = M + (M+1) + (M+2) + \cdots + (M+P-1) = MP + \frac{P(P-1)}{2} = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right)$
• For M<<N (e.g. N=64, M=2)
  – The first term dominates: $N \approx P^2/2$, so $P \approx \sqrt{2N}$ (about 11 stages for N=64)

Carry Select Adder: Trade-offs
• Area overhead:
  – An additional carry path and a multiplexer (not the whole adder)
  – About 30% more than a ripple-carry
• Delay
  – Sub-linear (we can beat that too!)
[Plot: delay (0.0 to 40.0) versus number of bits (0 to 60) for ripple adder, linear select and square root select]
[© Prentice Hall]

Binary Carry-Lookahead or Brent-Kung Adder
• Idea: use binary tree for carry propagation => logarithmic delay
[Figure: function F applied to A0–A7 as a linear chain, tp ~ N, versus as a binary tree, tp ~ log2(N)]
[© Prentice Hall]

Brent-Kung Adder
• Basic component
[Figure: cell combining (gleft, pleft) from the MSB side with (gright, pright) from the LSB side into (g, p)]
• Concatenation
  g = gleft + pleft • gright
  p = pleft • pright
[©Hauck]

Brent-Kung Adder: Structure
• Define (Gi, Pi): generate and propagate for the least significant bits, positions 0 through i
  (G0, P0) = (g0, p0)
  gi = Ai.Bi
  pi = Ai ⊕ Bi
  (Gi, Pi) = (gi, pi) • (Gi-1, Pi-1)
           = (gi, pi) • (gi-1, pi-1) • . . .
             . . . • (g1, p1) • (g0, p0)
  (this chain expands (Gi, Pi) = (gi, pi) • (Gi-1, Pi-1), for i>0)
• Key to Brent-Kung adder – use tree structure to perform concatenations
[Figure: binary tree over bit positions 7 to 0; the first level forms groups 7-6, 5-4, 3-2, 1-0; the next forms 7-4 and 3-0; the last forms 7-0]
• C5? No! Doesn’t know about C0-3 yet!
[©Hauck]

Brent-Kung: the Complete Tree
[Figure: complete tree on inputs (g0,p0) ... (g7,p7) producing carries C0 ... C7]
• tadd ~ log2(N)
[© Prentice Hall]

Brent-Kung: Timing
[Figure: 16-bit Brent-Kung network; inputs x15 ... x0, levels 1 to 6, outputs s15 ... s0]
[Par00] p.102 [©Oxford U Press]

Brent-Kung Adder: Summary
• Area
  – On average, twice as large as ripple adder
  – Layout of the cells is very compact
• Delay
  – Logarithmic time
  – Once carry signals are ready, sum bits derived in const time
  – Good for wide adders

Comparing Adder Designs
[Plots: area (mm2) and tp (sec) versus number of bits (0 to 20) for the static, mirror, manchester, bypass, select and Brent-Kung adders]
[© Prentice Hall]

Combining Different Adders
[Figure only; labels did not survive extraction]
[Par00] p.103 [©Oxford U Press]

Combining Different Adders
• Two-level carry skip adder
  – Delay = 8 cycles
  – Number of bits: 30
[Figure: blocks A to F between Cin (t=0) and Cout (t=8), with second-level skip units S2; {Tproduce, Tassimilate} labels {3, 8}, {4, 5}, {5, 4}, {6, 3}, {7, 2}, {8, 1} on blocks bA to bF]
[Par00] p.113 [©Oxford U Press]

Combining Different Adders
[Figure: 64 Bit Adder built from a 40 Bit Carry Select Adder on RA(63:24), RB(63:24) (MSB side) and a 24 Bit Differential Carry Lookahead Adder on RA(23:0), RB(23:0) (LSB side), linked by cout23; outputs EA(63:24) and EA(23:0)]
[Figure labels, continued: Data Cache, TLB, Compare, real_add(40:0), Compare, hit/miss/data]
© Dan Stasiak, IBM Rochester, 2001

Combining Different Adders
[Figure: 40 Bit Adder Section producing EA(24:63) and 24 Bit Adder Section producing EA(0:23) & EA_L(0:23)]
© Dan Stasiak, IBM Rochester, 2001

Combining Different Adders
(Should appear before the “Two-level carry skip adder” slide, slide 126 in the original deck)
• Ripple+skip adder: delay=8. Max adder width?
  – Assume: p,g, ripple, skip signal, skipping: 1 unit delay
  – Carry signals
    o Pass mode: ready at time x through skip logic => limit # blocks
    o Local gen mode: blocks can process y bits and still have time to deliver locally generated carry by time x for the next block
  – Sum signals
    o If in local generation mode, y is OK
    o If in pass mode, y not OK for left bits (e.g., bE receives cin at x=5, can process at most z=3 bits to meet the delay bound of 8 on the sum bits)
[Figure: single-level ripple+skip adder with blocks bA to bG and skip units S between Cin (t=0) and Cout; the numeric timing labels are garbled in extraction]
[Par00] p.112 [©Oxford U Press]

CLA Static Logic: Trimmed Down
(Should appear before the “CLA: Static Logic Implementation” slide, slide 86 in the original deck)
[Figure: the static gate reduced to C1, built from g0, p0 and Cin]
[Rab96] p405 [©Hauck]
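The carry equations used throughout the deck can be cross-checked in software. The sketch below is an illustration under stated assumptions, not the deck's circuits: it models the per-bit recurrence C(i+1) = gi + pi.Ci, the flat C4 lookahead expansion, and the Brent-Kung concatenation operator, and all function names are hypothetical. It uses p = A ⊕ B so that the same p also yields the sum bit, and it applies the concatenation operator serially, so it checks the operator but does not model the log-depth tree.

```python
def pg_bits(a, b, n):
    """Per-bit generate g_i = A_i.B_i and propagate p_i = A_i xor B_i."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    return g, p


def ripple_carries(g, p, cin):
    """Reference model: C_{i+1} = g_i + p_i.C_i, one bit after another."""
    carries = [cin]
    for gi, pi in zip(g, p):
        carries.append(gi | (pi & carries[-1]))
    return carries


def lookahead_c4(g, p, cin):
    """Flat expansion: C4 = g3 + g2.p3 + g1.p2.p3 + g0.p1.p2.p3 + Cin.p0.p1.p2.p3."""
    return (g[3]
            | (g[2] & p[3])
            | (g[1] & p[2] & p[3])
            | (g[0] & p[1] & p[2] & p[3])
            | (cin & p[0] & p[1] & p[2] & p[3]))


def concat(left, right):
    """Concatenation: g = g_left + p_left.g_right, p = p_left.p_right."""
    g_left, p_left = left
    g_right, p_right = right
    return (g_left | (p_left & g_right), p_left & p_right)


def prefix_carries(g, p, cin):
    """Carries from group signals (G_i, P_i) = (g_i, p_i) concat (G_{i-1}, P_{i-1}).

    Evaluated serially here; a Brent-Kung adder would combine the same
    pairs in a tree.
    """
    carries = [cin]
    group = None
    for gi, pi in zip(g, p):
        group = (gi, pi) if group is None else concat((gi, pi), group)
        big_g, big_p = group
        carries.append(big_g | (big_p & cin))
    return carries


def add(a, b, n, cin=0):
    """n-bit add returning (sum, carry_out), with S_i = p_i xor C_i."""
    g, p = pg_bits(a, b, n)
    c = prefix_carries(g, p, cin)
    total = sum((p[i] ^ c[i]) << i for i in range(n))
    return total, c[n]
```

For the 4-bit ripple example, `add(0b0011, 0b0101, 4)` returns `(8, 0)`, and for the 12-bit CLA example, `add(0xD9A, 0x76D, 12)` returns `(0x507, 1)`, matching the final rows of those slides' tables.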