Carry-Lookahead Addition
CS 352 : Computer Organization and Design
University of Wisconsin-Eau Claire
Dan Ernst

• Ripple-Carry Adder
  – Current design uses a “ripple-carry” adder technique
      • Cout propagates into the Cin of the next adder
  – What is the associated electrical delay for this scheme?
      • Assume each gate (AND/OR only) has a delay of T units
  – Two-level logic implementation of a single FA:
      • Delay of 2T to compute Cout
  [Figure: two-level AND/OR circuit computing Cout from the input pairs (A, B), (Cin, A), and (Cin, B)]

• Carry-Lookahead Adder
  – A 16-bit ripple-carry adder has 15 · 2T + T = 31T total delay to compute the sum!
      • Grows linearly with the size of the adder
  – Is there a faster way to add? Yes.
  – Faster design uses a “carry-lookahead” adder technique
  – Real ALUs use this style
  – Idea is to compute the needed carry-in to a bit position with only a very small delay (smaller than in the ripple-carry case)

• Generating a Carry
  – An adder will “generate” a carry-out on the sum of the bits a_i and b_i if a_i · b_i = 1 (i.e. a_i and b_i are both 1)
        Define g_i = a_i · b_i        (generate)
  – Hence: cout_i = cin_{i+1} = 1 if g_i = 1
  – Let c_i = “carry-in to position i”
  – Note c_{i+1} = carry-in to position i+1 = carry-out from position i
  – Delay to compute each g = 1T

• Propagating a Carry
  – An adder will “propagate” a carry-in (c_i) through the sum of the bits a_i and b_i if c_i = 1 and a_i + b_i = 1 (i.e. c_i is 1 and at least one of a_i or b_i is 1)
        Define p_i = a_i + b_i        (propagate)
  – Hence: cout_i = c_{i+1} = g_i + p_i · c_i
  – A carry-out occurs from position i if either
      • a carry is generated by position i, or
      • a carry-in is propagated by position i
  – Delay to compute each p = 1T

• Propagate / Generate
  – Ex: using 4 bits
        c_0 = initial carry-in
        c_1 = g_0 + p_0·c_0
        c_2 = g_1 + p_1·c_1
            = g_1 + p_1·(g_0 + p_0·c_0)
            = g_1 + p_1·g_0 + p_1·p_0·c_0
        c_3 = g_2 + p_2·c_2
            = g_2 + p_2·(g_1 + p_1·g_0 + p_1·p_0·c_0)
            = g_2 + p_2·g_1 + p_2·p_1·g_0 + p_2·p_1·p_0·c_0
  – Delay to compute each c = 2T fixed delay!
      • I am assuming that g and p are pre-computed
      • They take a total of 1T to pre-compute

• An Abstraction of Propagate / Generate
  – These equations require large gate “fan-in” to implement in 2T delay, so stop the expansion at 4 bits as above
  – Delay to compute each c = 2T fixed delay!
      • Pre-computed: 1T for each p and g (in parallel!)
      • 1T for the AND to create the subgroups (product terms)
      • 1T for the OR of all subgroups
  – Each sum bit S_i can now be computed in 3T delay:
  [Figure: full adder with inputs A_i, B_i, C_i and output S_i; annotation: “2T delay to compute”]

• 4-Bit Carry-Lookahead Adder
  – Combine these ideas to design a 4-bit adder with 3T delay for the entire 4-bit sum (assuming p & g are pre-computed)
  [Figure: “4-Bit C.L. Adder” block with inputs Cin, A0 B0, A1 B1, A2 B2, A3 B3 and outputs S0 S1 S2 S3, P_0 (super-propagate), G_0 (super-generate)]
  – What are P_0 and G_0?
      • P_0 = p_3·p_2·p_1·p_0
      • G_0 = g_3 + p_3·g_2 + p_3·p_2·g_1 + p_3·p_2·p_1·g_0
  – Note: the device has no “carry-out”, only P_0 and G_0

• Super-Generate / Super-Propagate
        P_0 = p_3·p_2·p_1·p_0                                        (super-propagate)
        G_0 = g_3 + p_3·g_2 + p_3·p_2·g_1 + p_3·p_2·p_1·g_0          (super-generate)
  – P_0 represents the propagate for the entire 4-bit unit
      • P_0 takes 1T delay units to compute
  – G_0 represents the generate for the entire 4-bit unit
      • G_0 takes 2T delay units to compute
  – P_0 and G_0 represent a higher level of hardware abstraction of propagation and generation

• 16-bit C.L. Adder
  [Figure: four “4-Bit C.L. Adder” blocks, each with a Cin input and four A/B bit pairs, producing S0–S3, S4–S7, S8–S11, and S12–S15 together with (P_0, G_0), (P_1, G_1), (P_2, G_2), (P_3, G_3); these feed a shared “Carry-Lookahead Logic” block that drives the Cin of each unit]
  Carry-Lookahead Logic implements:
  – Cin(0) = c_0 (initial carry-in)
  – Cin(1) = G_0 + P_0·c_0
  – Cin(2) = G_1 + P_1·G_0 + P_1·P_0·c_0
  – Cin(3) = G_2 + P_2·G_1 + P_2·P_1·G_0 + P_2·P_1·P_0·c_0
  Delay for Cin is (2T + 2T) = 4T
  Delay for Sum = 4T + 3T (per unit) + 1T (for pre-computation of p & g) = 8T
  Compare to 31T for the ripple-carry adder

• 16-bit C.L. Adder Example (1)
        A:  0110 0011 1101 0101
        B:  1110 1101 1000 0011
        g:  0110 0001 1000 0001
        p:  1110 1111 1101 0111                     1T

        P_0: 0·1·1·1 = 0
        P_1: 1·1·0·1 = 0
        P_2: 1·1·1·1 = 1
        P_3: 1·1·1·0 = 0                            1T (2T total)

        G_0: 0 + 0·0 + 0·1·0 + 0·1·1·1 = 0
        G_1: 1 + 1·0 + 1·1·0 + 1·1·0·0 = 1
        G_2: 0 + 1·0 + 1·1·0 + 1·1·1·1 = 1
        G_3: 0 + 1·1 + 1·1·1 + 1·1·1·0 = 1          2T (3T total)

• 16-bit C.L. Adder Example (2)
  – Computing the actual sum (red bits only; here bit 6):
        A:  0110 0011 1101 0101
        B:  1110 1101 1000 0011
        g:  0110 0001 1000 0001
        p:  1110 1111 1101 0111
        P:  0100        (P_3 P_2 P_1 P_0)
        G:  1110        (G_3 G_2 G_1 G_0)

        a_6 ⊕ b_6 ⊕ c_6 = 1 ⊕ 0 ⊕ (g_5 + p_5·g_4 + p_5·p_4·Cin(1))
                        = 1 ⊕ 0 ⊕ (0 + 0·0 + 0·1·(G_0 + P_0·c_0))
                        = 1 ⊕ 0 ⊕ (0 + 0·0 + 0·1·(0 + 0·0))
                        = 1
  – Delay to compute S6 = 1T + (2T + 2T) + (2T + 1T) = 8T

• Test Yourself
  – Compute sum bit S10 (red bits only):
        A:  0110 1001 1001 0101
        B:  1011 0101 1000 1011
        g:
        p:
        P:
        G:
        a_10 ⊕ b_10 ⊕ c_10 =
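
• Checking the Examples in Code
  – The hand computations above can be checked with a short program. What follows is a minimal Python sketch, not part of the original slides: the function names (bits_lsb_first, group_signals, cla16_sum_bit) are made up for illustration, and it models only the logic equations (g, p, P, G, Cin), not the gate delays. Lists are indexed by bit position, so P and G come out in the order P_0..P_3 and G_0..G_3, the reverse of how the slides print them.

```python
def bits_lsb_first(value, width=16):
    """Return the bits of value as a list where index i is bit position i."""
    return [(value >> i) & 1 for i in range(width)]


def group_signals(a, b):
    """Compute per-bit g, p and per-group P, G for two 16-bit operands."""
    a_bits, b_bits = bits_lsb_first(a), bits_lsb_first(b)
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate:  g_i = a_i AND b_i
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate: p_i = a_i OR b_i
    P, G = [], []
    for j in range(4):  # four 4-bit units; unit 0 holds bits 0..3
        p0, p1, p2, p3 = p[4 * j:4 * j + 4]
        g0, g1, g2, g3 = g[4 * j:4 * j + 4]
        P.append(p3 & p2 & p1 & p0)                                      # super-propagate
        G.append(g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0))  # super-generate
    return g, p, P, G


def cla16_sum_bit(a, b, k, c0=0):
    """Sum bit k of a + b + c0, using the two-level lookahead equations."""
    g, p, P, G = group_signals(a, b)
    # Carry-lookahead logic: carry-in of each 4-bit unit
    cin = [
        c0,
        G[0] | (P[0] & c0),
        G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0),
        G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0),
    ]
    unit, r = divmod(k, 4)  # which unit, and which bit inside that unit
    base, c = 4 * unit, cin[unit]
    # Carry into each bit position inside the unit (expanded, no rippling)
    carries = [
        c,
        g[base] | (p[base] & c),
        g[base + 1] | (p[base + 1] & g[base]) | (p[base + 1] & p[base] & c),
        g[base + 2] | (p[base + 2] & g[base + 1])
        | (p[base + 2] & p[base + 1] & g[base])
        | (p[base + 2] & p[base + 1] & p[base] & c),
    ]
    return ((a >> k) & 1) ^ ((b >> k) & 1) ^ carries[r]


# Operands from the 16-bit C.L. Adder Example slides
A = 0b0110_0011_1101_0101
B = 0b1110_1101_1000_0011
_, _, P, G = group_signals(A, B)
print(P, G)                    # P_0..P_3 and G_0..G_3
print(cla16_sum_bit(A, B, 6))  # sum bit S6
```

  – Calling cla16_sum_bit with k = 10 and the Test Yourself operands is one way to check a hand-computed answer for S10.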