17: Adders CMOS VLSI Design CMOS VLSI Design 4th Ed.

advertisement
Lecture 17:
Adders
Outline









Datapath
Computer Arithmetic Principles
Single-bit Addition
Carry-Ripple Adder
Carry-Skip Adder
Carry-Lookahead Adder
Carry-Select Adder
Carry-Increment Adder
Tree Adder
17: Adders
CMOS VLSI Design 4th Ed.
2
A Generic Digital Processor
17: Adders
CMOS VLSI Design 4th Ed.
3
Building Blocks for Digital Architectures
 Arithmetic unit
– Bit sliced data path – adder, multiplier, shifter,
comparator, etc.
 Memory
– RAM, ROM, buffers, shift registers
 Control
– Finite state machine (PLA, random logic)
– Counters
 Interconnect
– Switches, arbiters, bus
17: Adders
CMOS VLSI Design 4th Ed.
4
An Intel Microprocessor
17: Adders
CMOS VLSI Design 4th Ed.
5
Bit-Sliced Design
17: Adders
CMOS VLSI Design 4th Ed.
6
Bit-Sliced Datapath
17: Adders
CMOS VLSI Design 4th Ed.
7
Itanium Integer Datapath
17: Adders
CMOS VLSI Design 4th Ed.
8
Motivation
 Arithmetic units are, among others, core of every data path and
addressing unit.
 Data path is at the core of
– microprocessors (CPU)
– signal processors (DSP)
– data processing application specific IC’s (ASIC) and
programmable IC’s (FPGA)
 Standard arithmetic units available from libraries
 Design of arithmetic units necessary for
– non-standard operations
– high performance components
– library development
17: Adders
CMOS VLSI Design 4th Ed.
9
Naming Conventions
 Signal busses: A (1-D), Ai, (2-D), ai:k (sub-bus, 1-D)
 Signals: a, ai (1-D), ai,k (2-D), Ai:k (group signal)
 Circuit complexity measures: A (Area), T (cycle time,
delay), AT (area-time product), L (latency, number of
cycles).
 Arithmetic operators: +, -, •, /, log (=log2)
 Logic operators: OR, AND, XOR, NOT, …
17: Adders
CMOS VLSI Design 4th Ed.
10
Circuit Complexity Measures
 Unit gate model
– Inverter, buffer: A = 0, T = 0
– Simple monotonic 2-input gates (AND, OR,
NAND, NOR): A = 1, T = 1
– Simple non-monotonic 2-input gates (XOR,
XNOR): A = 2, T = 2
– Simple m-input gates: A = m – 1, T = log m
– Wiring not considered
– Only for estimation purposes

17: Adders
CMOS VLSI Design 4th Ed.
11
Recursive Function Evaluation
 Given: inputs ai, outputs zi, function f (graph sym. •)
 Non-recursive functions (n.)
– Output zi is a function of input ai
zi  f ai, x; i  0,...,n 1
– Parallel structure

A  On, T  O1

17: Adders
CMOS VLSI Design 4th Ed.
12
Recursive Function Evaluation
 Recursive functions (r.)
– Output zi is a function of all inputs ak, k ≤ i
• with a single output z = zn-1 (r.s.):
ti  f ai ,t i1; i  0,...,n 1
t1  0/1, z  t n 1
– f is non-associative (r.s.n)
» serial structure

A  On, T  On
– f is associative (r.s.a)
» serial or single-tree structure

17: Adders
A  On, T  Ologn
CMOS VLSI Design 4th Ed.
13
Recursive Function Evaluation
– Output zi is a function of all inputs ak, k ≤ i
• multiple outputs zi (r.m.) (=> prefix problem)
zi  f ai,zi1; i  0,...,n 1, z1  0/1
– f is non-associative (r.m.n)
» serial structure
A  On, T  On


– f is associative (r.m.a)
» Serial or multi-tree structure
A  On 2 , T  Ologn
» Shared tree structure

17: Adders
A  Onlogn, T  Ologn
CMOS VLSI Design 4th Ed.
14
Arithmetic Operations
 Overview
17: Adders
CMOS VLSI Design 4th Ed.
15
Overview of Arithmetic Operations
 Direct implementation of dedicated units
– always: 1 – 5
– in most cases: 6
– sometimes: 7, 8
 Sequential implementation using simpler units and several
clock cycles (decomposition)
– sometimes: 6
– in most cases: 7, 8, 9
 Table look-up techniques using ROMs
– universal: simple application to all operations
– efficient only for single-operand operations of high
complexity (8 - 12) and small word length.
17: Adders
CMOS VLSI Design 4th Ed.
16
Overview of Arithmetic Operations
 Approximation using simpler units: 7 – 12
– Taylor series expansion
– polynomial and rational approximations
– convergence of recursive equation systems
– CORDIC (COordinate Rotation DIgital Computer)
17: Adders
CMOS VLSI Design 4th Ed.
17
Binary Number Systems
 Radix-2, binary number system (BNS): irredundant,
weighted, positional, monotonic.
 n-bit number is an ordered sequence of bits (binary
A  an1,an2,...,a0 2, ai 0,1
digits)
 Simple and efficient implementation in digital circuits
 MSB/LSB
(most/least significant bit): an-1/a0

 Represents an integer or fixed point number, exact.
 Fixed point numbers:
am1,...,a0 . a1,...,amn 
m-bit integer
17: Adders

n-m bit fraction
CMOS VLSI Design 4th Ed.
18
Binary Number Systems
 Unsigned: positive or natural numbers
n 1
– Value:
A  an 12n 1  ... a1 2  a0   ai 2i
i0
– Range: 0, 2n 1
Two’s (2’s) complement: standard representation of
 or integer numbers
signed
n 2
– Value
n 1
i
A  an 1 2
– Range
17: Adders
2
  ai 2
i0
, 2n 1 1
n 1
CMOS VLSI Design 4th Ed.
19
Binary Number Systems
– Complement: A  2n  A  A 1, where A  an 1,an 2,...,a0 
– Sign: an-1
– Properties:
 asymmetric range, compatible with
unsigned numbers in many arithmetic operations.
(same treatment of positive and negative
numbers)
 One’s (1’s) complement: similar to 2’s complement
n 2
– Value: A  a 2n 1 1  a 2i
n 1
– Range:

17: Adders


 
i
i0

2 n 1 1, 2 n 1 1
CMOS VLSI Design 4th Ed.
20
Binary Number Systems
n
– Complement: A  2  A 1  A
– Sign: an-1
– Properties: double representation of zero,
symmetric
range, modulo (2n-1) number system.

 Sign-magnitude: alternative representation of signed
numbers
n 2
a
– Value: A  1   ai 2i
n1
i0


n 1
n 1

2
1
,
2
1


– Range:
– 
Complement: A  a ,a ,...,a 
n1 n2
0
17: Adders
CMOS VLSI Design 4th Ed.
21
Binary Number Systems
 Sign: an-1
 Properties: double representation of zero, symmetric
range, different treatment of positive and negative
numbers in arithmetic operations, no MSB toggles at
sign changes around 0 (=> low power)
17: Adders
CMOS VLSI Design 4th Ed.
22
Gray Numbers
 Gray numbers (code): binary, irredundant, nonweighted, non-monotonic.
– Property: unit-distance coding. Exactly one-bit
toggles between adjacent numbers.
– Applications: counters with low output toggle rate
(low power busses), representation of continuous
signals for low-error sampling (no false numbers
due to switching of different bits at different
times).
– Non-monotonic numbers: difficult arithmetic
operations (addition, comparison).
17: Adders
CMOS VLSI Design 4th Ed.
23
Gray Numbers
– Binary - Gray conversion
gi  bi1  bi , bn  0;
i  0,...,n 1 (n.)
– Gray – binary conversion

bi  bi1  gi , bn  0
i  n 1,...,0 (r.m.a)

17: Adders
CMOS VLSI Design 4th Ed.
24
Redundant Number Systems
 Non-binary, redundant, weighted number systems.
 Digit set larger than radix (typically radix 2) => multiple
representations of the same number => redundancy.
 No carry propagation in adders => more efficient
implementation of adder-based units (multipliers, dividers, etc.)
 Redundancy => no direct implementation of relational
operators => conversion to irredundant numbers.
 Several bits used to represent one digit => higher storage
requirements.
 Expensive conversion to irredundant numbers. Not necessary if
redundant input operators are allowed.
17: Adders
CMOS VLSI Design 4th Ed.
25
Delayed-Carry Representation
 Delayed-carry or half adder representation
ri  0,1,2 , c i ,si ,ai ,bi  0,1 ,
ri  c i1,si   2c i1  si  ai  bi , c i1si  0
n 1
R   ri 2 i  C,S   C  S  A  B
i0
 1 digit holds the sum of 2 bits (no carry out)
Example: 01 + 01 = (0,0) (1,0) = 2
17: Adders
CMOS VLSI Design 4th Ed.
26
Carry-Save Representation
ri  0,1,2,3 , c i ,si ,ai ,bi ,di  0,1 ,
ri  c i1,si   2c i1  si  ai  bi  di  ai  ri
n 1
R   ri 2 i  C,S   C  S  A  R
i0
 One digit holds the sum of 3 bits or 1 digit and 1 bit.
No carry-out digit, carry is saved.

 Standard redundant number system for fast addition.
17: Adders
CMOS VLSI Design 4th Ed.
27
Signed-Digit Representation
 Signed-digit (SD) or redundant digit (RD) number
representation.
n 1
ri,si ,ti  1,0,1  1,0,1 , R   ri 2i
i0
 No carry propagation in S = R + T

ri  t i  c i1,ui   2c i1  ui , c i1,ui  1,0,1
c i1,ui  is redundant (e.g.,0 1  01  1 1
i c i ,ui  s.t. si  1,0,1
 One digit holds the sum of two digits. No carry-out.

17: Adders
CMOS VLSI Design 4th Ed.
28
Signed-Digit Representation
 Minimal SD representation: minimal number of nonzero digits.
0111
1010001 0
– Applications: sequential multiplication (less
cycles), filters with constant coefficients (less
 hardware).
– Example:
7  0111, 11 11 , 101 1 , 1001 , 11 111, 
minimal

17: Adders
CMOS VLSI Design 4th Ed.
29
Signed-Digit Representation
 Canonical SD representation: minimal SD. Not two
non-zero digits in sequence.
01
1 101001 0
 SD -> binary: carry propagation necessary => adder.
 Other applications: high speed multipliers.
 to carry-save, simple use for signed
 Similar
numbers.
17: Adders
CMOS VLSI Design 4th Ed.
30
Residue Number Systems
 Non-binary, irredundant, non-weighted number
system.
 Carry-free and fast additions and multiplications.
 Complex and slow other arithmetic operations (e.g.
comparison, sign, and overflow detection) because
digits are not weighted. Conversion to weighted
mixed-radix or binary system required.
 Codes for error correction and detection.
 Possible applications (but hardly used)
– Digital filters
– Error detection and correction
17: Adders
CMOS VLSI Design 4th Ed.
31
Residue Number Systems
 Base is n-tuple of integers (mn-1, mn-2, …, m0),
residues (or moduli). These mi are pairwise prime.
A  an 1,an 2,,a0 m
n1 ,m ns
,,m 0
ai  0,1,,mi 1
n 1
Range : M   mi , anywhere inZ
i0
ai  Amodmi  A m i , A  mi  qi  ai
 Arithmetic operations: each digit computed
separately.

17: Adders
CMOS VLSI Design 4th Ed.
32
Residue Number Systems
 
z i  Z m i  f A m  f A m i
i
A  B mi  A mi  B mi
A B m i  A m i  B m i
ai m  m i  ai m
i
ai1 m  aim i 2 m
i
mi
mi
 f ai 
mi
 ai  bi m
 ai  bi m
mi
i
i
i
(Fermats' T heorem)
i
 Best moduli mi are 2k and 2k – 1.
– High storage efficiency with k bits.

– Simple modular addition k bit adder without cout
17: Adders
CMOS VLSI Design 4th Ed.
33
Residue Number Systems
 Example:
m1,m0   3,2, M  6
5 6  A  a1,a0   5 3 , 5 2   2,1
4  5 6  1,0  2,1  1 2 3 , 0 1 2   0,1  3 6
4 5 6  1,0 2,1  1 2 3 , 0 1 2   2,0  2 6

17: Adders
CMOS VLSI Design 4th Ed.
34
Floating-Point Numbers
 Larger range, smaller precision than fixed-point
representation, inexact, real numbers.
 Double-number form => discontinuous precision.
 S | biased exponent E | unsigned norm mantissa M
F  1  M E  1  M 2E bias
S
S
 Basic arithmetic operations
 A B  1SA SB M A  M B   E A E B



A  B  1 A  M A  1 B  M B  E A  E B    E A
17: Adders

S
S
CMOS VLSI Design 4th Ed.
35
Floating-Point Numbers
 Basic arithmetic operations based in fixed point add,
multiply, and shift operations. Post-normalization
required.
 Applications:
– Processors: real floating point formats (e.g. IEEE
standard), large range due to universal use.
– ASICs: usually simplified floating-point formats
with small exponents, smaller range. Used for
range extension of normal fixed-point numbers.
 IEEE floating point format:
17: Adders
CMOS VLSI Design 4th Ed.
36
Logarithmic Number System
 Alternative representation to floating point (mantissa
+ integer exponent -> only fixed point exponent).
 Single number form => continuous precision =>
higher accuracy, more reliable.
S
|
biased fixed point exponentE
L  1  E  1  2E bias
S
S
 Basic arithmetic operations:
– (A
 < B) = (EA < EB) additionally consider sign
– A + B by approximation or addition in
conventional number system and double
conversion.
17: Adders
CMOS VLSI Design 4th Ed.
37
Logarithmic Number System
 Basic arithmetic operations
A B  1 A
S SB
A  1  
y
SA
 EA EB
yE A
,
y
A  1  
SA
EA
y
– Simpler multiplication, exponentiation. More
complex addition.

– Expensive conversion: (anti)logarithms probably
by table look-up.
– Applications: real-time digital filters.
17: Adders
CMOS VLSI Design 4th Ed.
38
Antitetrational Number System
 Tetration (t.x = 2{ 2
2...
2
and antitetration (a.t.x)
x times
 Larger range, but smaller precision than logarithmic
representation. Otherwise, analogous.
 Note that
 all these systems can be mixed in
composite arithmetic.
 Choice of number representation should be hidden
from the user. The compiler should handle it.
 Rational numbers can also be represented in
floating slash notation.
17: Adders
CMOS VLSI Design 4th Ed.
39
Round-Off Schemes
 Intermediate results with d additional lower bits. This
results in higher accuracy. A  an1,,a0,a1,,ad 
 Rounding: keeping error e small during final word
length reduction: R  rn1,,r0   A  e
 Trade-off: numericalaccuracy vs implementation
cost.
 Truncation
 RTRUNC  an1,,a0 
– bias  1  1
= average error e
d 1
2 2
 to nearest (normal rounding)
 Round

17: Adders
RROUND  an 1,, a0  , A A 
1
 A  0.12
2
CMOS VLSI Design 4th Ed.
40
Round-Off Schemes
 Round to nearest
1
– The error is bias d 1 nearly symmetric
2
– + 0.12 can often be included in a previous operation.
 Round to nearest even/odd

RROUND if a1,, ad   00
 RROUND EVEN  

an 1,, a1,0 otherwise
– bias = 0 (symmetric)
– Mandatory in IEEE floating-point standard
 3guard bits for rounding after floating point operations: guard
bit G (postnormalization), round bit R (round to nearest ), sticky
bit S (round to nearest even)
17: Adders
CMOS VLSI Design 4th Ed.
41
Addition
17: Adders
CMOS VLSI Design 4th Ed.
42
Single-Bit Addition
A
Half Adder
S  A B
B
Full Adder
S  A B C
Cout
Cout  A B
S
A
B
Cout
Cout  MAJ ( A, B, C )
C
S
A
B
Cout
S
A
B
C
Cout S
0
0
0
0
0
0
0
0
0
0
1
0
1
0
0
1
0
1
1
0
0
1
0
1
0
0
1
1
1
1
0
0
1
1
1
0
1
0
0
0
1
1
0
1
1
0
1
1
0
1
0
1
1
1
1
1
17: Adders
CMOS VLSI Design 4th Ed.
43
1-Bit Adders
 Add up m bits of same magnitude
 Output the sum as a k-bit number (k  log m 1)
 Or count 1’s at inputs => (m,k) counter –
combinational counter.

 A half adder is a (2,2) counter
cout ,s  2cout  s  a  b

s  a  b (sum)
c out  ab (carry- out)
A  3 , T  21


17: Adders
CMOS VLSI Design 4th Ed.
44
1-Bit Adders
17: Adders
CMOS VLSI Design 4th Ed.
45
1-Bit Adders
 A full-adder is a (3,2) counter.
cout ,s  2cout  s  a  b  cin
g  ab
A  7 , T  42
(generat e) c 0  ab
p  a  b (propagate) c1  a  b

s  a  b  c in  p  c in

c out  ab acin  bcin  ab a  bc in
 g  pcin  pg  pcin  pa  pcin
 c inc 0  c inc1

17: Adders
CMOS VLSI Design 4th Ed.
46
PGK
 For a full adder, define what happens to carries
(in terms of A and B)
– Generate: Cout = 1 independent of C
• G=A•B
– Propagate: Cout = C
• P=AB
– Kill: Cout = 0 independent of C
• K = ~A • ~B
17: Adders
CMOS VLSI Design 4th Ed.
47
Full Adder Design I
 Brute force implementation from eqns
S  A B C
Cout  MAJ ( A, B, C )
A
A
A
S
C
A
B
B
C
C
MAJ
C
B
C
B
B
B
C
A
C
A
A
A
17: Adders
C
B
S
Cout
B
A
B
A
B
C
A
B
C
B
B
Cout
B
A
CMOS VLSI Design 4th Ed.
48
Full Adder Design II
 Factor S in terms of Cout
S = ABC + (A + B + C)(~Cout)
 Critical path is usually C to Cout in ripple adder
MINORITY
A
B
C
Cout
S
S
Cout
17: Adders
CMOS VLSI Design 4th Ed.
49
Full Adder Design II
 Same circuit with sized transistors
17: Adders
CMOS VLSI Design 4th Ed.
50
Layout
 Clever layout circumvents usual line of diffusion
– Use wide transistors on critical path
– Eliminate output inverters
17: Adders
CMOS VLSI Design 4th Ed.
51
Full Adder Design III
 Complementary Pass Transistor Logic (CPL)
– Slightly faster, but more area
B
B
B
C
B
C
B
C
B
C
A
S
B
C
B
C
B
C
B
C
Cout
A
A
S
A
Cout
B
B
17: Adders
CMOS VLSI Design 4th Ed.
52
Full Adder Design III
 Transmission gates
17: Adders
CMOS VLSI Design 4th Ed.
53
Full Adder Design IV
 Dual-rail domino
– Very fast, but large and power hungry
– Used in very fast multipliers

C_h
A_h
B_h
A_h
C_l
B_h
A_l
C_h
B_h
A_h
C_l
B_l
Cout _l
A_l
B_l
B_l

S_l
17: Adders

Cout _h
S_h
C_h
B_h
A_l
CMOS VLSI Design 4th Ed.
54
(m,k) Counters
k 1
m 1
j 0
i0
sk 1,,s0    s j 2 j   ai
 Usually built from full-adders.
  Associativity of addition allows conversion from
linear to tree structure => faster at the same number
of FAs.
log m
A  7 m2 k   7m  logm
k 1
TLIN  4m  2logm , TTREE  4 log3 m   2logm
17: Adders

CMOS VLSI Design 4th Ed.
55
(7,3) Counter
 Example
A  28 , T 14

A  28 , T 10

17: Adders
CMOS VLSI Design 4th Ed.
56
Carry Propagate Adders
 Add two n-bit operands A and B and an optional
carry in cin by performing carry propagation.
 Sum (cout, S) is an irredundant (n+1) bit number
cout ,S  cout 2n  S  A  B  cin
2c i1  si  ai  bi  c i
i  0,1,,n 1
c 0  c in , c out
 c n (r.m.a)

17: Adders
CMOS VLSI Design 4th Ed.
57
Carry Propagate Adders
 N-bit adder called CPA
– Each sum bit depends on all previous carries
– How do we compute all these carries quickly?
AN...1 BN...1
Cout
Cout
+
SN...1
17: Adders
Cin
Cin
00000
1111
+0000
1111
CMOS VLSI Design 4th Ed.
Cout
11111
1111
+0000
0000
Cin
carries
A4...1
B4...1
S4...1
58
Ripple-Carry Adder(RCA)
 Serial arrangement of n full adders.
 Simplest, smallest, and slowest CPA structure.
A  7n , T  2n , AT  14 n 2

17: Adders
CMOS VLSI Design 4th Ed.
59
Carry-Ripple Adder
 Simplest design: cascade full adders
– Critical path goes from Cin to Cout
– Design full adder to have fast carry delay
A4
B4
Cout
B3
C3
S4
17: Adders
A3
A2
B2
C2
S3
A1
B1
Cin
C1
S2
CMOS VLSI Design 4th Ed.
S1
60
Carry Ripple Adder
 Note that worst case delay is linear with number of
bits.
tadder  N 1tcarry  tsum
 Goal: Make the fastest possible carry path circuit.

17: Adders
CMOS VLSI Design 4th Ed.
61
A Full Adder Circuit
17: Adders
CMOS VLSI Design 4th Ed.
62
Inversion Property
17: Adders
CMOS VLSI Design 4th Ed.
63
Inversions
 Critical path passes through majority gate
– Built from minority + inverter
– Eliminate inverter and use inverting full adder
A4
B4
Cout
B3
C3
S4
17: Adders
A3
A2
B2
C2
S3
A1
B1
Cin
C1
S2
S1
CMOS VLSI Design 4th Ed.
64
Mirror Adder
17: Adders
CMOS VLSI Design 4th Ed.
65
Mirror Adder
17: Adders
CMOS VLSI Design 4th Ed.
66
Mirror Adder
 The NMOS and PMOS chains are completely
symmetrical. A maximum of two series transistors
can be observed in the carry generation circuit.
 When laying out the cell, the most critical issue is
the minimization of the capacitance at node Co. The
reduction of the diffusion capacitances is particularly
important.
 The capacitance at node Co is composed of four
diffusion capacitances, two internal gate
capacitances, and six gate capacitances in the
connecting adder cell.
17: Adders
CMOS VLSI Design 4th Ed.
67
Mirror Adder
 The transistors connected to Ci are placed closest to
the input.
 Only the transistors in the carry stage have to be
optimized for optimal speed. All transistors in the
sum stage can be minimal size.
17: Adders
CMOS VLSI Design 4th Ed.
68
Transmission Gate FA
17: Adders
CMOS VLSI Design 4th Ed.
69
Carry Propagation Speed-up
 Concatenation of partial CPA’s with fast cin -> cout.
 Fast carry look-ahead logic for entire range of bits.
17: Adders
CMOS VLSI Design 4th Ed.
70
Generate / Propagate
 Equations often factored into G and P
 Generate and propagate for groups spanning i:j
Gi: j  Gi:k  Pi:k
Pi: j  Pi:k
Gk 1: j
Pk 1: j
 Base case
Gi:i  Gi  Ai
Bi
Pi:i  Pi  Ai  Bi
0 GCP
0:00:0 in
G0:0  G0  Cin
P0:0  P0  0
 Sum:
Si  Pi  Gi 1:0
17: Adders
CMOS VLSI Design 4th Ed.
71
PG Logic
A4
B4
A3
B3
A2
B2
A1
B1
Cin
1: Bitwise PG logic
G4
P4
G3
P3
G2
P2
G1
P1
G0
P0
2: Group PG logic
G3:0
G2:0
G1:0
G0:0
C3
C2
C1
C0
3: Sum logic
C4
Cout
17: Adders
S4
S3
S2
S1
CMOS VLSI Design 4th Ed.
72
PG Logic
17: Adders
CMOS VLSI Design 4th Ed.
73
Carry-Ripple Revisited
Gi:0  Gi  Pi
Gi 1:0
A4
B4
G4
P4
A3
B3
G3
P3
A2
B2
G2
P2
A1
B1
G1
P1
Cin
G0
G3:0
G2:0
G1:0
G0:0
C3
C2
C1
C0
P0
C4
Cout
17: Adders
S4
S3
S2
S1
CMOS VLSI Design 4th Ed.
74
Carry-Ripple PG Diagram
Bit Position
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
tripple  t pg  ( N  1)tAO  txor
Delay
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
75
PG Diagram Notation
Black cell
i:k
Gray cell
k-1:j
i:k
i:j
Gi:k
Pi:k
Gk-1:j
Pk-1:j
17: Adders
Buffer
k-1:j
i:j
i:j
i:j
Gi:j
Gi:k
Pi:k
Gk-1:j
Pi:j
CMOS VLSI Design 4th Ed.
Gi:j
Gi:j
Gi:j
Pi:j
Pi:j
76
Manchester Carry Chain
17: Adders
CMOS VLSI Design 4th Ed.
77
Manchester Carry Chain
17: Adders
CMOS VLSI Design 4th Ed.
78
Manchester Carry Chain
17: Adders
CMOS VLSI Design 4th Ed.
79
Carry-Skip Adder
 Carry-ripple is slow through all N stages
 Carry-skip allows carry to skip over groups of n bits
– Decision based on n-bit propagate signal
Cout
A16:13 B16:13
A12:9 B12:9
A8:5 B8:5
A4:1
P16:13
P12:9
P8:5
P4:1
1
0
C12
+
S16:13
17: Adders
1
0
C8
+
S12:9
1
0
C4
+
S8:5
CMOS VLSI Design 4th Ed.
B4:1
1
0
+
Cin
S4:1
80
Carry-Skip Adder
17: Adders
CMOS VLSI Design 4th Ed.
81
Carry-Skip Adder
17: Adders
CMOS VLSI Design 4th Ed.
82
Carry-Skip Adder
17: Adders
CMOS VLSI Design 4th Ed.
83
Carry-Skip PG Diagram
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
16:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
For k n-bit groups (N = nk)
tskip  t pg   2  n  1  ( k  1)  t AO  txor
17: Adders
CMOS VLSI Design 4th Ed.
84
Variable Group Size
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
16:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
Delay grows as O(sqrt(N))
17: Adders
CMOS VLSI Design 4th Ed.
85
Carry-Skip Adder
 Partial CPA with fast ck -> ci
c i  Pi1:kci  Pi1:kc k (bit groupai1,,ak )
Pi1:k  pi1 pi2 pk (group propagate)
 If Pi-1:k = 0 : ck does not become c’i and c’i is
selected, becoming ci.

 If Pi-1:k = 0 : ck becomes c’i, but c’i is skipped.
 Path ck -> c’i -> ci never sensitized => fast ck -> ci
 False path => inherent logic redundancy =>
problems in circuit optimization, timing analysis, and
testing.
17: Adders
CMOS VLSI Design 4th Ed.
86
Carry-Skip Adder
 Variable group sizes are faster.
– Use larger groups in the middle
– Minimize delays a0 -> ck -> si-1 and ak -> ci -> sn-1
 Partial CPA type is RCA or CSKA (multilevel CSKA)
 Medium speed-up at small hardware overhead (+
AND/bit +MUX/group)
A  8n , T  4n
1
2
, AT  32n
3
2

17: Adders
CMOS VLSI Design 4th Ed.
87
CSKA + Manchester
17: Adders
CMOS VLSI Design 4th Ed.
88
Carry-Select Adder
 Trick for critical paths dependent on late input X
– Precompute two possible outputs for X = 0, 1
– Select proper output when X arrives
 Carry-select adder precomputes n-bit sums
– For both possible carries into n-bit group
A16:13 B16:13
0
+
Cout
1
+
B8:5
C8
1
+
CMOS VLSI Design 4th Ed.
B4:1
C4
+
Cin
0
S12:9
A4:1
0
+
1
0
1
0
1
S16:13
A8:5
0
+
C12
1
+
17: Adders
A12:9 B12:9
S8:5
S4:1
89
Carry-Select Adder
 Partial CPA with fast ck -> ci and ck -> si-1:k
0
si1:k  c k si1:k
 c k s1i1:k
c i  c kc i0  c kc1i
 Two CPA’s compute two possible results (cin = 0/1),
group carry-in ck selects correct one afterwards.
 Variable
 group sizes are faster; use larger groups at
end (MSB). Balance delays a0 -> ck and ak -> ci0
 Partial CPA type is RCA, CSLA (multilevel CSLA) or
CLA.
17: Adders
CMOS VLSI Design 4th Ed.
90
Carry-Select Adder
 High speed-up at high hardware overhead.
– + MUX/bit + (CPA + MUX)/group
A 14n , T  2.8n
1
2
, AT  39n
3
2

17: Adders
CMOS VLSI Design 4th Ed.
91
Carry-Select Adder
17: Adders
CMOS VLSI Design 4th Ed.
92
Carry-Select Adder
17: Adders
CMOS VLSI Design 4th Ed.
93
Linear Carry-Select
17: Adders
CMOS VLSI Design 4th Ed.
94
Square-Root Carry-Select
17: Adders
CMOS VLSI Design 4th Ed.
95
Delay Comparison
17: Adders
CMOS VLSI Design 4th Ed.
96
Carry-Increment Adder
 Partial CPA with fast ck -> ci and ck -> si-1:k
si1:k  si1:k  c k , c i  ci  Pi1:kc k
Pi1:k  pi1 pi2 pk (group propagate)
 Result is incremented after addition if ck = 1
 Variable group sizes are faster, use larger groups at
end (MSB). Balance delays a0 -> ck and ak -> c’i
 Partial CPA could be RCA, CIA (multilevel CIA) or
CLA.
 High speed-up at medium hardware overhead
(+AND/bit + (incrementer + AND/OR)/group).
 Logic of CPA and incrementer could be merged.
17: Adders
CMOS VLSI Design 4th Ed.
97

Carry-Increment Adder
A 10n , T  2.8n
17: Adders
2
1
, AT  28n
CMOS VLSI Design 4th Ed.
3
2
98
Carry-Increment Adder
 Example: gate-level schematic of carry-increment
adder (CIA)
– Only two different logic cells (bit-slices): IHA and
IFA
17: Adders
CMOS VLSI Design 4th Ed.
99
Carry-Increment Adder
 Factor initial PG and final XOR out of carry-select
15
14
13
12
11
10
13:12
8
7
6
9:8
14:12
15:12
9
4
3
2
1
0
5:4
10:8
11:8
5
6:4
7:4
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
tincrement  t pg   n  1  (k  1)  t AO  txor
17: Adders
CMOS VLSI Design 4th Ed.
100
Variable Group Size
 Also buffer
noncritical
signals
15
14
13
12
11
10
9
12:11
8
7
6
8:7
13:11
4
5:4
9:7
14:11
5
3
2
1
0
3:2
6:4
10:7
15:11
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
15
14
13
12
11
10
9
12:11
7
6
8:7
13:11
14:11
8
9:7
10:7
5
5:4
6:4
4
3
3:2
2
1
0
1:0
3:0
6:0
15:11
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
101
Conditional-Sum Adder
 Optimized multilevel CSLA with logn levels
0
1
 Correct sum bits si1:k or sik:1 are conditionally
selected through logn levels of multiplexers.
 Bit groups of size 2l at level l.
 Higher parallelism,
more balanced signal paths.

 Highest speed-up at highest hardware overhead
(2RCA + more than logn MUX/bit)
17: Adders
CMOS VLSI Design 4th Ed.
102

Conditional-Sum Adder
A  3n logn , T  2logn , AT  6n log2 n
17: Adders
CMOS VLSI Design 4th Ed.
103
Conditional-Sum Adder
17: Adders
CMOS VLSI Design 4th Ed.
104
Conditional-Sum Adder
17: Adders
CMOS VLSI Design 4th Ed.
105
Carry-Lookahead Adder
 Carries look ahead before sum bits are computed
c 0  c0
c1  g0  p0c0
c 2  g1  p1g0  p1 p0c0
c 3  g2  p2 g1  p2 p1g0  p2 p1 p0c0
g3  g3  p3 g2  p3 p2 g1  p3 p2 p1g0
p3  p3 p2 p1 p0
1
 Hierarchical arrangement using logn levels: g3, p3 
2
passed up, c’ passed down between
levels.
0
 High 
speed-up at medium hardware overhead.

17: Adders
CMOS VLSI Design 4th Ed.

106
Carry-Lookahead Adder
A  14n , T  4 log n , AT  56n log n

17: Adders
CMOS VLSI Design 4th Ed.
107
Carry-Lookahead Adder
17: Adders
CMOS VLSI Design 4th Ed.
108
Carry-Lookahead Adder
 Carry-lookahead adder computes Gi:0 for many bits
in parallel.
 Uses higher-valency cells with more than two inputs.
A16:13 B16:13
Cout
G16:13
P16:13
+
S16:13
17: Adders
C12
A12:9 B12:9
G12:9
P12:9
+
S12:9
A8:5 B8:5
C8
A4:1
C4
G8:5
P8:5
B4:1
G4:1
P4:1
+
+
S8:5
S4:1
CMOS VLSI Design 4th Ed.
Cin
109
CLA PG Diagram
17: Adders
CMOS VLSI Design 4th Ed.
110
Carry-Lookahead
17: Adders
CMOS VLSI Design 4th Ed.
111
Lookahead Tree
17: Adders
CMOS VLSI Design 4th Ed.
112
Lookahead Tree
17: Adders
CMOS VLSI Design 4th Ed.
113
Higher-Valency Cells
i:k k-1:l l-1:m m-1:j
i:j
Gi:k
Pi:k
Gk-1:l
Pk-1:l
Gl-1:m
Pl-1:m
Gm-1:j
Gi:j
Pi:j
Pm-1:j
17: Adders
CMOS VLSI Design 4th Ed.
114
Higher Valency PG Diagram
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
16:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
115
Tree Adder
 If lookahead is good, lookahead across lookahead!
– Recursive lookahead gives O(log N) delay
 Many variations on tree adders
17: Adders
CMOS VLSI Design 4th Ed.
116
Parallel Prefix Adders
 Universal adder architecture comprising RCA, CIA,
CLA, and more (entire range of area-delay trade-offs
from slowest RCA to fastest CLA).
 Preprocessing, carry-lookahead, and postprocessing
step.
 Carries calculated using parallel-prefix algorithms
– High regularity: suitable for synthesis and layout
– High flexibility: special adders, other arthmetic
operations, exchangeable prefix algorithms.
– High performance: smallest and fastest adders
17: Adders
CMOS VLSI Design 4th Ed.
117
Parallel Prefix Adders
A  5n  3A , T  4  2T

17: Adders
CMOS VLSI Design 4th Ed.
118
Prefix Problem
 Inputs (xn-1,…,x0) outputs (yn-1,…,y0), associative
binary operator •
yn 1,, y0   xn 1   x0,, x1  x0, x0 
or
y 0  x 0 , y i  x i  y i1 ; i 1,,n 1 (r.m.a)
 Associativity of • => tree structures for evaluation





x 3  x 2  x1  x 0  x 3  x 2  x1  x 0 
14 2 43  14 2 43 14 2 43

1
1
1
y1 Y1:0
Y3:2
y1 Y1:0


1 4 44 2 4 4 43
1 4 4 2 4 43
2
2
y 3 Y3:0
y 2 Y2:0
1 4 44 2 4 4 43
3
y 3 Y3:0
17: Adders
CMOS VLSI Design 4th Ed.
119
Prefix Problem
l
 Group variables Yi:k : covers bits (xk,…,xi) at level l.
 Carry-propagation is prefix problem: Yi:kl  Gi:kl ,Pi:kl 
G
G

0
i:i
,Pi:i0   gi , pi 
l
i:k




l 1
l 1
,Pi:kl   Gi:l 1j 1,Pi:l 1
; k ji
j 1  G j:k ,P j:k
l 1
l 1 l 1
 Gi:l 1j 1  Pi:l 1
G
,P
j 1 j:k
j:k P j:k


m
c i1  Gi:0
; i  0,,n 1 , l  1,,m
 Parallel-prefix algorithms:
– Multi-tree structures T = O(n) -> O(logn)
– Sharing subtrees A = O(n2) -> O(nlogn)
– Different algorithms trading area vs delay. Also consider
wirng and fanout.

17: Adders
CMOS VLSI Design 4th Ed.
120
Prefix Algorithms
 Algorithms visualized by directed acyclic graphs
(DAG) with array structure (n bits x m levels).
 Graph vertex symbols
 Performance measures:
– A• : graph size (number of black nodes)
– T• : graph depth (number of black nodes on
critical path)
17: Adders
CMOS VLSI Design 4th Ed.
121
Prefix Algorithms
 Serial prefix algorithm (RCA)
A  n 1 , T  n 1 , FOmax  2

17: Adders
CMOS VLSI Design 4th Ed.
122
Prefix Algorithms
 Sklansky parallel-prefix algorithm (PPA-SK)
– Tree-like collection, parallel redistribution of
carries
1
1
A  nlogn , T  logn , FOmax  n
2
2

17: Adders
CMOS VLSI Design 4th Ed.
123
Sklansky
15 14 13 12 11 10
15:14
13:12
11:10
15:12 14:12
15:8
14:8
11:8 10:8
13:8
9
9:8
8
7
6
7:6
7:4
5
5:4
6:4
4
3
2
3:2
3:0
1
0
1:0
2:0
12:8
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
124
Prefix Algorithms
 Brent-Kung parallel-prefix algorithm (PPA-BK)
– Traditional CLA is PPA-BK with 4-bit groups
– Tree-like redistribution of carries (fan-out tree)
A  2n  logn  2 , T  2logn  2
FOmax  logn

17: Adders
CMOS VLSI Design 4th Ed.
125
Brent-Kung
15 14 13 12 11 10
15:14
13:12
15:12
11:10
9
9:8
11:8
8
7
7:6
6
5
5:4
7:4
15:8
4
3
3:2
2
1
0
1:0
3:0
7:0
11:0
13:0
9:0
5:0
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
126
Prefix Algorithms
 Kogge-Stone parallel-prefix algorithm (PPA-KS)
– very high wiring requirements
A  n log n  n 1 , T  log n , FOmax  2

17: Adders
CMOS VLSI Design 4th Ed.
127
Kogge-Stone
15:14 14:13 13:12 12:11 11:10 10:9
9:8
8:7
7:6
6:5
5:4
4:3
3:2
2:1
3:0
2:0
15:12 14:11 13:10
12:9
11:8 10:7
9:6
8:5
7:4
6:3
5:2
4:1
13:6
12:5
11:4 10:3
9:2
8:1
7:0
6:0
5:0
4:0
15:8
14:7
1
2
3
4
5
6
7
8
9
15 14 13 12 11 10
0
1:0
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
128
Prefix Algorithms
 Carry-increment parallel-prefix algorithm
1
1
A  2n 1.4n 2 , T 1.4n 2 , FOmax 1.4n
1
2

17: Adders
CMOS VLSI Design 4th Ed.
129
Prefix Algorithms
 Mixed serial/parallel-prefix algorithm (RCA+PPA)
– Linear size-depth trade-off using parameter k:
0  k  n  2log n  2

– k = 0 : serial prefix graph
– k  n  2log n 1 : Brent-Kung parallel-prefix

graph
– Fills the gap between RCA and PPA-BK (CLA) in
steps of single •-operations.
17: Adders
CMOS VLSI Design 4th Ed.
130

Prefix Algorithms
A  n 1 k , T  n 1  k , FOmax  var.
17: Adders
CMOS VLSI Design 4th Ed.
131
Prefix Algorithms
 Example: 4-bit PPA-SK
– Efficient AND-OR-prefix circuit for the generate
and AND-prefix circuit for the propagate signals
– Optimization: alternatingly AOI/OAI- resp. NAND/NOR-gates (inverting gatesare smaller and
faster).
– Can also be realized using two MUX-prefix
circuits
17: Adders
CMOS VLSI Design 4th Ed.
132
Prefix Algorithms
17: Adders
CMOS VLSI Design 4th Ed.
133
Prefix Algorithms
 Prefix adders can be synthesized by human or
computer as well.
 Starting from a serial structure, one can use
compression rules and expansion rules to obtain
new graphs.
 Can generate all previous graphs except PPA-KS.
 Universal adder synthesis approach.
17: Adders
CMOS VLSI Design 4th Ed.
134
Tree Adder Taxonomy
 Ideal N-bit tree adder would have
– L = log N logic levels
– Fanout never exceeding 2
– No more than one wiring track between levels
 Describe adder with 3-D taxonomy (l, f, t)
– Logic levels:
L+l
– Fanout:
2f + 1
– Wiring tracks:
2t
 Known tree adders sit on plane defined by
l + f + t = L-1
17: Adders
CMOS VLSI Design 4th Ed.
135
Tree Adder Taxonomy
l (Logic Levels)
3 (7)
Brent-Kung
f (Fanout)
2 (6)
Sklansky
3 (9)
1 (5)
2 (5)
1 (3)
0 (2)
0 (4)
0 (1)
1 (2)
2 (4)
Kogge-Stone
3 (8)
t (Wire Tracks)
17: Adders
CMOS VLSI Design 4th Ed.
136
Han-Carlson
15 14 13 12 11 10
9
8
7
6
5
4
3
15:14
13:12
11:10
9:8
7:6
5:4
3:2
15:12
13:10
11:8
9:6
7:4
5:2
3:0
15:8
13:6
11:4
9:2
7:0
5:0
2
1
0
1:0
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
137
Knowles [2, 1, 1, 1]
15 14 13 12 11 10
9
8
7
6
5
4
3
2
15:14 14:13 13:12 12:11 11:10 10:9
9:8
8:7
7:6
6:5
5:4
4:3
3:2
2:1
15:12 14:11 13:10
3:0
2:0
15:8
14:7
13:6
12:9
11:8 10:7
9:6
8:5
7:4
6:3
5:2
4:1
12:5
11:4 10:3
9:2
8:1
7:0
6:0
5:0
4:0
1
0
1:0
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
138
Ladner-Fischer
15 14 13 12 11 10
15:14
13:12
15:12
11:10
9
9:8
11:8
15:8
13:8
15:8
13:0
8
7
7:6
5
5:4
7:4
7:0
11:0
6
4
3
3:2
2
1
0
1:0
3:0
5:0
9:0
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
CMOS VLSI Design 4th Ed.
139
Taxonomy Revisited
(f)Ladner-Fischer
(b) Sklansky
15
15 14 13 12 11 10
9
8
7
6
5
4
3
2
1
14
15:14
15:14
13:12
11:10
9:8
7:6
5:4
3:2
13
12
15:8
14:8
11:8 10:8
13:8
7:4
6:4
3:0
BrentKung
LadnerFischer
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
LadnerFischer
f (Fanout)
13:12
11:10
4
3
2
15:14 14:13 13:12 12:11 11:10 10:9
9:8
8:7
7:6
6:5
5:4
4:3
3:2
2:1
15:12 14:11 13:10
9:6
8:5
7:4
6:3
5:2
4:1
3:0
2:0
12:9
11:8 10:7
1
15:8
13:8
15:8
13:0
15:14
0 (2)
0
1:0
0 (4)
0 (1)
13:12
11:10
15:12
14:7
13:6
12:5
11:4 10:3
9:2
8:1
7:0
6:0
5:0
4
3
2
7:6
5:4
3:2
7:0
Knowles
[4,2,1,1]
9:0
8:0
7:0
6:0
5:0
4:0
7
6
5
4
3
2
3:0
2:0
9
8
1
0
HanCarlson
9:8
7:6
5:4
3:2
7:4
15:8
1:0
3:0
7:0
11:0
9:0
5:0
2 (4)
(d) Han-Carlson
15 14 13
(c) Kogge-Stone
15 14 13 12 11 10
13:6
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
HanCarlson
Knowles
[2,1,1,1]
14:7
9
8
7
6
5
4
3
2
9:8
8:7
7:6
6:5
5:4
4:3
3:2
2:1
12:9
11:8 10:7
9:6
8:5
7:4
6:3
5:2
4:1
3:0
2:0
12:5
11:4 10:3
9:2
8:1
7:0
6:0
5:0
4:0
1
12 11 10
9
8
7
6
5
4
3
2
1
0
0
1:0
Kogge3 (8)
Stone
15:14
13:12
11:10
9:8
7:6
5:4
3:2
15:12
13:10
11:8
9:6
7:4
5:2
3:0
15:8
13:6
11:4
9:2
7:0
5:0
1:0
t (Wire Tracks)
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
17: Adders
0:0
5:0
New
(1,1,1)
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
15:8
1:0
1:0
1 (2)
15:12 14:11 13:10
0
3:0
4:0
15:14 14:13 13:12 12:11 11:10 10:9
1
9:0
11:8
13:0
15:8
5
7:4
11:0
15 14 13 12 11 10
1 (3)
5
9:8
11:8
15:0 14:0 13:0 12:0 11:0 10:0
2 (5)
6
6
1 (5)
(e) Knowles [2,1,1,1]
7
7
(a) Brent-Kung
2 (6)
3 (9)
8
8
3 (7)
Sklansky
9
9
l (Logic Levels)
2:0
12:8
15 14 13 12 11 10
10
1:0
15:12
15:12 14:12
11
0
15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
CMOS VLSI Design 4th Ed.
140
More Adder Issues




Multilevel adders
– Multilevel versions of adders possible
• CSKA, CSLA, CIA
Hybrid adders
– Arbitrary combination of speed-up techniques possible.
– Often used combinations: CLA – CSLA
Transistor level adders
– Influence of logic styles (dynamic logic, pass transistor logic)
– Efficient transistor level implementation of ripple-carry chains
(Manchester chain)
– Combinations of speed-up techniques make sense.
• Much higher design effort
– Many efficient implementations exist in the literature.
Higher valency (radix) also possible.
17: Adders
CMOS VLSI Design 4th Ed.
141
More Adder Issues
 Higher valency is a poor choice in static CMOS logic
since each stage has higher delay.
 However, if the stages are built using domino logic, it
could prove to be an advantage.
 Nodes with large fanouts or long wires could use
buffers.
 The prefix trees can also be internally pipelined.
17: Adders
CMOS VLSI Design 4th Ed.
142
Transistor Level
17: Adders
CMOS VLSI Design 4th Ed.
143
Transistor Level
17: Adders
CMOS VLSI Design 4th Ed.
144
Transistor Level
17: Adders
CMOS VLSI Design 4th Ed.
145
Higher Valency Adders
17: Adders
CMOS VLSI Design 4th Ed.
146
Sparse Trees
 Building a prefix tree to compute carries in every bit
is expensive in terms of power.
 An alternative is to compute carries into short groups
such as s = 2,3,8, or 16 bits.
 Meanwhile, pairs of s-bit adders precompute the
sums assuming both carries-in of 0 and 1 to each
group.
 It is a hybrid between a prefix adder and carry select
adder.
17: Adders
CMOS VLSI Design 4th Ed.
147
Valency-3 BK Adder
 Sparse tree adder with s = 3
17: Adders
CMOS VLSI Design 4th Ed.
148
Carry-Select Implementation
17: Adders
CMOS VLSI Design 4th Ed.
149
Sparse Tree Adders
 Intel Valency-2 Sklansky sparse tree adder with s=4
17: Adders
CMOS VLSI Design 4th Ed.
150
Sparse Tree Adders
 Valency-3 Kogge-Stone sparse tree adder with s=3
17: Adders
CMOS VLSI Design 4th Ed.
151
Ling Adders
 Ling discovered a technique to remove one series
transistor from the critical group generate path at the
expense of another XOR gate in the sum
precomputation.
 Define a pseudo-generate Hi:j = Gi + Gi-1:j This is a
simpler computation.
Ki Hi: j  KiGi  KiGi1: j  Gi  KiGi1: j  Gi: j
 Define a pseudo-propagate signal I that is a shifted
version of propagate. I i: j  K i 1: j 1

H i: j  H i: k  I i: k H k 1: j
I i: j  I i: k I k 1: j
17: Adders
CMOS VLSI Design 4th Ed.
152
Ling Adders
 Finally, the sums are computed by
Si  Pi  K i1H i1:0 
Si  H i1:0 Pi  K i1 H i1:0 Pi 

17: Adders
CMOS VLSI Design 4th Ed.
153
Ling Adders
17: Adders
CMOS VLSI Design 4th Ed.
154
Comparison
 Standard-cell implementation, 0.8mm technology
17: Adders
CMOS VLSI Design 4th Ed.
155
Comparison
17: Adders
CMOS VLSI Design 4th Ed.
156
Summary
Adder architectures offer area / power / delay tradeoffs.
Choose the best one for your application.
Architecture
Classification Logic
Levels
Max
Tracks
Fanout
Cells
Carry-Ripple
N-1
1
1
N
Carry-Skip n=4
N/4 + 5
2
1
1.25N
Carry-Inc. n=4
N/4 + 2
4
1
2N
Brent-Kung
(L-1, 0, 0)
2log2N – 1
2
1
2N
Sklansky
(0, L-1, 0)
log2N
N/2 + 1
1
0.5 Nlog2N
Kogge-Stone
(0, 0, L-1)
log2N
2
N/2
Nlog2N
17: Adders
CMOS VLSI Design 4th Ed.
157
E vs Delay Trade-off
17: Adders
CMOS VLSI Design 4th Ed.
158
E vs Delay Tradeoff
90nm 64 bit domino KS Ling adder with various valency and s
17: Adders
CMOS VLSI Design 4th Ed.
159
Area vs Delay
Synthesized Adders
17: Adders
CMOS VLSI Design 4th Ed.
160
Download