CMOS VLSI Design 4th Ed.

Lecture 18: Datapath Functional Units
Outline





Multi-input Adders
Comparators
Shifters
Multipliers
More complex operations
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
2
Carry-Save Adders (CSA)
 A carry-save adder adds three n-bit operands A0, A1, and A2 without performing any carry propagation:
$(C, S) = 2C + S = A_0 + A_1 + A_2$
$2c_{i+1} + s_i = a_{0,i} + a_{1,i} + a_{2,i}, \quad i = 0, 1, \ldots, n-1$
 It can also add one n-bit operand to an n-digit carry-save operand:
$(C, S)_{out} = A + (C, S)_{in}$
 The result is in carry-save format.
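A minimal Python sketch of one carry-save stage, modelling each n-bit word as a Python integer (an assumption for illustration; `carry_save_add` is a hypothetical name). Every bit position is an independent full adder, so no carry moves sideways within the stage:

```python
def carry_save_add(a, b, c):
    """Add three non-negative integers without carry propagation.

    Per bit: sum = a xor b xor c, carry = majority(a, b, c).
    The carry word is returned shifted left by one because it has weight 2.
    """
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return carry << 1, s


C, S = carry_save_add(0b0001, 0b0111, 0b1101)
print(bin(C), bin(S))   # the redundant (carry, sum) pair: 0b1010 0b1011
print(C + S)            # one carry-propagate add resolves it: 21
```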

Carry-Save Adders
 Parallel arrangement of full-adders => constant
delay.
$A = 7n, \quad T = 4$
 Multi-operand carry-save adders also possible (m>3)
– Array or tree arrangement.

Multi-Operand Adders
 Add three or more (m>2) n-bit operands.
 Yield n  logm-bit result in irredundant number
representation
 Array adders
 – Linear arrangement of CPAs
– Linear arrangement of CSAs and a final CPA
• The final CPA has to be fast. If it is an RCA,
the performances of the two alternatives are
equal.
4-Operand CPA Array
4-Operand CSA Array
Multi-input Adders
 Suppose we want to add k N-bit words
– Ex: 0001 + 0111 + 1101 + 0010 = 10111
 Straightforward solution: k-1 N-bit CPAs
– Large and slow
0001 + 0111 = 1000
1000 + 1101 = 10101
10101 + 0010 = 10111
Carry Save Addition
 A full adder sums 3 inputs and produces 2 outputs
– Carry output has twice weight of sum output
 N full adders in parallel form a carry-save adder
– Produce N sums and N carry outs
[Figure: 4-bit carry-save adder in which full adders take (Xi, Yi, Zi) and produce Ci, Si for i = 1 … 4; symbol: n-bit CSA with inputs XN...1, YN...1, ZN...1 and outputs CN...1, SN...1]
CSA Application
 Use k-2 stages of CSAs
– Keep result in carry-save redundant form
 Final CPA computes actual result
 Example: 0001 + 0111 + 1101 + 0010
– 4-bit CSA: 0001 + 0111 + 1101 → C = 0101_, S = 1011
– 5-bit CSA: 0101_ + 1011 + 0010 → C = 01010_, S = 00011
– CPA: 01010_ + 00011 = 10111
(_ marks the 0 appended when a carry word is shifted one position left)
[Figure: CSA symbol with inputs X, Y, Z and outputs S, C; CPA symbol with inputs A, B and output S]
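The k-2 CSA stages plus one final CPA can be sketched in Python as follows, under the assumption that operands are non-negative integers; `csa` and `multi_operand_add` are hypothetical names, and Python's `+` stands in for the final carry-propagate adder:

```python
def csa(x, y, z):
    """One carry-save stage: returns (carry word shifted left by one, sum word)."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def multi_operand_add(operands):
    """Add k >= 3 non-negative operands.

    k - 2 CSA stages keep the running total as a redundant (carry, sum) pair;
    a single carry-propagate add at the end (here Python's +) gives the result.
    """
    carry, total = csa(operands[0], operands[1], operands[2])
    stages = [(carry, total)]
    for op in operands[3:]:
        carry, total = csa(carry, total, op)
        stages.append((carry, total))
    return carry + total, stages


result, stages = multi_operand_add([0b0001, 0b0111, 0b1101, 0b0010])
for c, s in stages:
    print(f"C={c:06b}  S={s:06b}")
print(f"result={result:05b}")
```

For the slide's operands, the two stages give C = 01010, S = 1011 and then C = 010100, S = 00011, and the final add yields 10111.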
(m,2)-Compressors
m 1
m 4
 m 4 
l
2c  c out
 s   ai  c inl
 l 0 
i0
l 0
 1-bit adders. Similar to (m,k)-counters.
 
Compresses m bits down to 2 by forwarding (m-3)
intermediate carries to next higher position.
 No horizontal carry propagation.
 Built from full adders ((3,2) compressors) or (4,2)
compressors arranged in linear or tree structures,
(m,2)-Compressors
 Example: 4-operand adder using (4,2) compressors.
$A = 7(m-2), \quad T_{LIN} = 4(m-2), \quad T_{TREE} = 6(\log m - 1)$
(m,2)-Compressors
 Structure of a (4,2) compressor
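The figure for this slide did not survive extraction. One common construction, assumed here rather than taken from the figure, chains two full adders; this Python sketch (hypothetical names) checks it against the (m,2) equation with m = 4:

```python
def full_adder(a, b, c):
    """(3,2) compressor: returns (carry, sum) with a + b + c == 2*carry + sum."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c


def compressor_4_2(a0, a1, a2, a3, cin):
    """(4,2) compressor built from two chained full adders (assumed structure).

    cout goes to the next higher bit position and does not depend on cin,
    so a row of these compressors has no horizontal carry ripple.
    Returns (carry, cout, s) with
    a0 + a1 + a2 + a3 + cin == 2 * (carry + cout) + s.
    """
    cout, s1 = full_adder(a0, a1, a2)
    carry, s = full_adder(s1, a3, cin)
    return carry, cout, s
```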
(m,2)-Compressors
 Advantages of (4,2)-compressors over FAs for
realizing (m,2)-compressors:
– Higher compression rate.
– Less deep and more regular trees.
 Example: (8,2) compressor.
Tree adders (Wallace Tree)
 Adder tree: n-bit m-operand carry-save adder
composed of n tree structured (m,2) compressors.
 Fastest multi-operand adders using an adder tree
and a fast final CPA.
$A = A_{(m,2)} \cdot n + A_{CPA} = O(mn + n \log n)$
$T = T_{(m,2)} + T_{CPA} = O(\log m + \log n)$

Adder Arrays and Trees
 Some FAs can often be replaced by HAs or eliminated
altogether.
 Number of FAs does not depend on adder structure, but
number of HAs does.
 An m-operand adder accommodates (m - 1) carry inputs.
 Adder trees (T = O(log m)) are faster than adder arrays (T = O(m)) at the same amount of gates (A = O(mn)).
 Adder trees are less regular and have more complex routing
than adder arrays => larger area, difficult layout => limited use
in layout generators.
Sequential Adders
 Bit-serial adder
$A = A_{FA} + A_{FF}, \quad T = T_{FA} + T_{FF}, \quad L = n$

 Accumulators
– With CPA
$A = A_{CPA} + A_{REG}, \quad T = T_{CPA} + T_{REG}, \quad L = m$
– With CSA and final CPA


$A = A_{CSA} + A_{CPA} + 4A_{REG}, \quad T = T_{CSA} + T_{REG}, \quad L = m$
• Allows higher clock rates
• Final CPA too slow
– Pipelining or multiple cycles
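The bit-serial adder above can be sketched cycle by cycle in Python. This is a behavioural model under the assumption that operands arrive LSB first; `bit_serial_add` is a hypothetical name:

```python
def bit_serial_add(a, b, n):
    """Add two n-bit unsigned numbers one bit per clock cycle, LSB first.

    Hardware modelled: one full adder plus one flip-flop that holds the carry
    between cycles, so the latency is L = n cycles.
    Returns (n-bit sum, final carry out).
    """
    carry_ff = 0                       # carry flip-flop, cleared at start
    result = 0
    for cycle in range(n):             # one loop iteration = one clock cycle
        ai = (a >> cycle) & 1
        bi = (b >> cycle) & 1
        result |= (ai ^ bi ^ carry_ff) << cycle                   # full-adder sum
        carry_ff = (ai & bi) | (ai & carry_ff) | (bi & carry_ff)  # next carry
    return result, carry_ff
```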
Complement and Subtraction
 2’s complement
$-A = \bar{A} + 1$
 2’s complement subtracter
$A - B = A + (-B) = A + \bar{B} + 1$
 2’s complement adder/subtracter
$A \pm B = A + (-1)^{sub} B = A + (B \oplus sub) + sub$
 1’s complement adder
$(A + B) \bmod (2^n - 1) = A + B + c_{out}$ (end-around carry)
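Both adder variants reduce to a few lines of Python. In this sketch (hypothetical function names, n-bit words modelled as integers), `sub` drives both the XOR gates on B and the carry in, and the 1's complement adder feeds its carry out back into the LSB:

```python
def add_sub(a, b, sub, n):
    """2's complement adder/subtracter: A + (B xor sub) + sub  (mod 2^n).

    sub = 0 gives A + B; sub = 1 gives A + ~B + 1 = A - B.
    """
    mask = (1 << n) - 1
    b_eff = b ^ (mask if sub else 0)   # one XOR gate per bit of B
    return (a + b_eff + sub) & mask    # sub also serves as the carry in


def ones_complement_add(a, b, n):
    """(A + B) mod (2^n - 1) using an end-around carry."""
    mask = (1 << n) - 1
    total = a + b
    cout = total >> n
    return (total & mask) + cout       # carry out re-enters at the LSB
```

With the end-around carry, the all-ones word is the 1's complement "negative zero", which is why the test compares results modulo 2^n - 1.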
Subtraction
Increment/Decrement
 Adds a single bit cin to an n-bit operand A
$(c_{out}, Z) = c_{out} 2^n + Z = A + c_{in}$
$z_i = a_i \oplus c_i$
$c_{i+1} = a_i c_i, \quad i = 0, \ldots, n-1$
$c_0 = c_{in}, \quad c_{out} = c_n$ (r.m.a.)
 Corresponds to addition with B = 0 (FA -> HA)
 Example:
 Ripple-carry incrementer using half adders
$A = 3n, \quad T = n + 1, \quad AT \approx 3n^2$
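The half-adder recurrence above translates directly into a Python sketch (hypothetical name `ripple_increment`):

```python
def ripple_increment(a, cin, n):
    """Ripple-carry incrementer built from half adders.

    z_i = a_i xor c_i and c_{i+1} = a_i and c_i, with c_0 = cin.
    Returns (cout, Z) with cout * 2^n + Z == A + cin.
    """
    c = cin
    z = 0
    for i in range(n):
        ai = (a >> i) & 1
        z |= (ai ^ c) << i     # half-adder sum
        c = ai & c             # half-adder carry ripples to the next bit
    return c, z
```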

Increment/Decrement
 Or, use incrementer slices
 Prefix problem $C_{i:k} = C_{i:j+1} C_{j:k}$ => AND-prefix structure
$A = \tfrac{1}{2} n \log n + 2n, \quad T = \log n + 2, \quad AT \approx \tfrac{1}{2} n \log^2 n$
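A Python sketch of a prefix incrementer, assuming a Sklansky-style AND-prefix network (the same structure named on the fast-incrementer slide; the function names are hypothetical). Each carry is the AND of cin and all lower operand bits:

```python
def sklansky_and_prefix(bits):
    """Inclusive AND-prefix: out[i] = bits[0] & bits[1] & ... & bits[i].

    Sklansky structure: ceil(log2 n) levels; at each level every position in
    the upper half of a block ANDs in the last position of the lower half.
    """
    p = list(bits)
    n = len(p)
    span = 1
    while span < n:
        for i in range(n):
            if i & span:                              # upper half of a 2*span block
                p[i] &= p[(i // span) * span - 1]     # last bit of the lower half
        span <<= 1
    return p


def prefix_increment(a, cin, n):
    """Incrementer whose carries c_i = cin & a_0 & ... & a_{i-1} come from the prefix network."""
    terms = [cin] + [(a >> i) & 1 for i in range(n)]
    c = sklansky_and_prefix(terms)
    z = 0
    for i in range(n):
        z |= (((a >> i) & 1) ^ c[i]) << i             # z_i = a_i xor c_i
    return c[n], z                                    # (cout, Z)
```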

Increment/Decrement
 Decrementer
$(c_{out}, Z) = A - c_{in}$

 Incrementer-decrementer: $(c_{out}, Z) = A \pm c_{in}$

Fast Incrementers
 4-bit incrementer using multi-input gates
 8-bit parallel-prefix incrementer (Sklansky-AND
prefix structure)
Gray Incrementer
 Increments in Gray number system
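$c_0 = a_{n-1} \oplus a_{n-2} \oplus \cdots \oplus a_0$ (parity)
$c_{i+1} = \bar{a}_i c_i, \quad i = 0, \ldots, n-3$ (r.m.a.)
$z_0 = a_0 \oplus \bar{c}_0$
$z_i = a_i \oplus a_{i-1} c_{i-1}, \quad i = 1, \ldots, n-2$
$z_{n-1} = a_{n-1} \oplus c_{n-2}$
 Prefix problem => AND-prefix structure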
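A Python sketch of these Gray-increment equations (hypothetical name `gray_increment`, n >= 2 assumed). The test checks it against the usual binary-to-Gray conversion x xor (x >> 1):

```python
def gray_increment(a, n):
    """Increment an n-bit Gray-code word (n >= 2) using the equations above."""
    bits = [(a >> i) & 1 for i in range(n)]
    parity = 0
    for b in bits:
        parity ^= b
    c = [parity] + [0] * (n - 2)            # c_0 ... c_{n-2}
    for i in range(n - 2):
        c[i + 1] = (1 - bits[i]) & c[i]      # c_{i+1} = ~a_i & c_i
    z = [0] * n
    z[0] = bits[0] ^ (1 - c[0])             # flip a_0 when parity is even
    for i in range(1, n - 1):
        z[i] = bits[i] ^ (bits[i - 1] & c[i - 1])
    z[n - 1] = bits[n - 1] ^ c[n - 2]       # MSB also handles the wraparound
    return sum(zi << i for i, zi in enumerate(z))
```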

Counting
 Count clock cycles => counter
 Divide clock frequency => frequency divider (cout)
 Binary counter
– Sequential incrementer/decrementer
– Incrementer speed-up techniques applicable
– Down and up/down counters using incrementers or incrementer-decrementers
Example
 Ripple-carry up-counter using counter slices
(HA+FF), cin is count enable.
Synchronous Counters
Synchronous Counters
Asynchronous Counters
 Uses toggle flip-flops.
– Lower toggle rate => lower power
Gray Counter
 Counter using Gray incrementers
Fast Counters
Fast Counters
Ring Counters
 Shift register connected in a ring
 State is not encoded => n FF for counting n states.
 Must be initialized correctly.
 Applications:
– Fast dividers (no logic between FF)
– State counter for one-hot coded FSMs
Johnson Counter
 Inverted feedback
 n FF for counting 2n states.
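A behavioural Python sketch of both counters, assuming the ring counter is initialized to the one-hot state 100...0 and the Johnson counter to 00...0 (initial states and function names are illustrative choices):

```python
def ring_counter(n, cycles):
    """n-bit ring counter: a shift register whose output feeds its input.

    The register must start in a valid one-hot state (here 100...0);
    it then visits n distinct states.
    """
    state = [1] + [0] * (n - 1)
    states = []
    for _ in range(cycles):
        states.append(tuple(state))
        state = [state[-1]] + state[:-1]        # shift by one, wrap around
    return states


def johnson_counter(n, cycles):
    """Johnson (twisted-ring) counter: same shift register, inverted feedback.

    Starting from 00...0 it visits 2n distinct states.
    """
    state = [0] * n
    states = []
    for _ in range(cycles):
        states.append(tuple(state))
        state = [1 - state[-1]] + state[:-1]    # feed back the inverted last bit
    return states
```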
3-bit LFSR
3-bit LFSR
Cycle  Q0  Q1  Q2/Y
0      1   1   1
1      0   1   1
2      0   0   1
3      1   0   0
4      0   1   0
5      1   0   1
6      1   1   0
7      1   1   1
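The LFSR diagram itself is lost, so this Python sketch infers the feedback from the table: each cycle Q1 takes Q0, Q2 takes Q1, and Q0 takes Q1 xor Q2. That tap choice is an assumption read off the table, not the original schematic:

```python
def lfsr3(cycles, state=(1, 1, 1)):
    """3-bit LFSR: Q1 <- Q0, Q2 <- Q1, Q0 <- Q1 xor Q2 (taps inferred from the table).

    Returns rows (cycle, Q0, Q1, Q2); Q2 is the output Y.
    """
    q0, q1, q2 = state
    rows = []
    for cycle in range(cycles):
        rows.append((cycle, q0, q1, q2))
        q0, q1, q2 = q1 ^ q2, q0, q1
    return rows
```

Starting from 111, it visits all seven nonzero states before repeating, which matches the table's period of 7.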
8-bit LFSR
Comparators
 Comparison operations
$EQ = (A = B)$ (equal)
$NE = (A \neq B) = \overline{EQ}$ (not equal)
$GE = (A \geq B)$ (greater or equal)
$LT = (A < B) = \overline{GE}$ (less than)
$GT = (A > B) = GE \cdot \overline{EQ}$ (greater than)
$LE = (A \leq B) = \overline{GT} = \overline{GE} + EQ$ (less or equal)

Comparators




 0’s detector: A = 00…000
 1’s detector: A = 11…111
 Equality comparator: A = B
 Magnitude comparator: A < B
1’s & 0’s Detectors
 1’s detector: N-input AND gate
 0’s detector: NOTs + 1’s detector (N-input NOR)
[Figure: 4-bit allones and allzeros detectors (inputs A3 … A0) and an 8-bit allones detector (inputs A7 … A0) built as gate trees]
Equality Comparison
$EQ = (A = B)$
$eq_{i+1} = \overline{(a_i \oplus b_i)}\, eq_i = (a_i b_i + \bar{a}_i \bar{b}_i)\, eq_i, \quad i = 0, \ldots, n-1$
$eq_0 = 1, \quad EQ = eq_n$ (r.s.a.)

Equality Comparator
 Check if each bit is equal (XNOR, aka equality gate)
 1’s detect on bitwise equality
[Figure: 4-bit equality comparator; XNOR gates compare A[i] and B[i] for i = 3 … 0 and an AND tree produces A=B]
Magnitude Comparison
$GE = (A \geq B)$
$ge_{i+1} = (a_i > b_i) + (a_i = b_i)\, ge_i = a_i \bar{b}_i + \overline{(a_i \oplus b_i)}\, ge_i, \quad i = 0, \ldots, n-1$
$ge_0 = 1, \quad GE = ge_n$ (r.s.a.)
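The two ripple recurrences, and the derived flags from the comparison-operations slide, can be sketched in Python for unsigned operands (hypothetical function names):

```python
def ripple_eq_ge(a, b, n):
    """Ripple comparators processed from LSB to MSB for unsigned A, B.

    eq_{i+1} = xnor(a_i, b_i) & eq_i,                   eq_0 = 1
    ge_{i+1} = (a_i & ~b_i) | (xnor(a_i, b_i) & ge_i),  ge_0 = 1
    Returns (EQ, GE) = (eq_n, ge_n).
    """
    eq = ge = 1
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        same = 1 - (ai ^ bi)                 # XNOR, the equality gate
        eq = same & eq
        ge = (ai & (1 - bi)) | (same & ge)   # higher bits override lower ones
    return eq, ge


def comparison_flags(a, b, n):
    """All six comparison results derived from EQ and GE."""
    eq, ge = ripple_eq_ge(a, b, n)
    gt = ge & (1 - eq)
    return {"EQ": eq, "NE": 1 - eq, "GE": ge, "LT": 1 - ge, "GT": gt, "LE": 1 - gt}
```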

Magnitude Comparator
 Compute B – A and look at sign
 B – A = B + ~A + 1
 For unsigned numbers, the carry out acts as the sign: C = 1 means B ≥ A, C = 0 means B < A
A B
C
B3
A B
N
A3
B2
A2
B1
Z
A=B
A1
B0
A0
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
44
Comparators
 Subtractor (A-B)
$GE = c_{out}$
$EQ = P_{n-1:0}$
$A_{RCA} = 7n, \quad T_{RCA} = 2n$, or
$A_{PPA\text{-}KS} = \tfrac{3}{2} n \log n, \quad T_{PPA\text{-}KS} = 2 \log n$
 Optimized comparator
– Removing redundancies in subtractor (unused $s_i$)
– Single-tree structure => speed up at no cost
$A = 6n, \quad T_{LIN} = 2n, \quad T_{TREE} = 2 \log n$
Comparators
 Example: ripple comparator using comparator slices
Signed vs. Unsigned
 For signed numbers, comparison is harder
– C: carry out
– Z: zero (all bits of A – B are 0)
– N: negative (MSB of result)
– V: overflow (inputs had different signs, output sign ≠ sign of B)
– S: N xor V (sign of result)
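A Python sketch of the flag computation, assuming the comparator forms B - A = B + ~A + 1 as on the magnitude-comparator slide (`compare_flags` is a hypothetical name). C answers the unsigned question and S the signed one:

```python
def compare_flags(a, b, n):
    """Compute B - A as B + ~A + 1 on n-bit words and derive C, Z, N, V, S."""
    mask = (1 << n) - 1
    total = b + (~a & mask) + 1
    result = total & mask
    c = total >> n                          # carry out: 1 iff B >= A (unsigned)
    z = int(result == 0)                    # all bits of the difference are 0
    neg = result >> (n - 1)                 # N: MSB of the result
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    v = int(sign_a != sign_b and neg != sign_b)   # overflow
    s = neg ^ v                             # true sign: 1 iff B < A (signed)
    return c, z, neg, v, s
```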
Decoder
 Decodes binary number $A_{n-1:0}$ to vector $Z_{m-1:0}$ ($m = 2^n$)
$z_i = 1$ if $A = i$, else $0$; $i = 0, \ldots, m-1$; i.e. $Z = 2^A$
$A = (n-1) 2^n, \quad T = \log n$

Encoder
 Encodes vector $A_{m-1:0}$ to binary number $Z_{n-1:0}$ ($m = 2^n$), assuming $\exists i$: $a_k = 1$ if $k = i$, else $a_k = 0$
$Z = i$ if $a_i = 1$; $i = 0, \ldots, m-1$; i.e. $Z = \log_2 A$
$A = n(2^{n-1} - 1), \quad T = n - 1$

Detection Operations
 All-zeroes detection: $z = \overline{a_{n-1} + a_{n-2} + \cdots + a_0}$
 All-ones detection: $z = a_{n-1} a_{n-2} \cdots a_0$
$A = n, \quad T = \log n$
 Leading-zeroes detection
– For scaling, normalization, priority encoding (e.g., 0101010 has one leading zero)
$A = 2n, \quad T = n$

Shift,Extension,Saturation
Shift,Extension,Saturation
 Applications
– Adaptation of magnitude or word length of operands
– Multiplication/division by powers of 2
– Logic bit/byte operations
– Scaling of numbers for word length reduction
– Reducing error after under-/overflow
 Implementation of shift/extension/rotation by
– Constant values: hard-wired
– Variable values: multiplexers
– n possible values: n-by-n barrel-shifter/rotator
Shifters
 Logical Shift:
– Shifts number left or right and fills with 0’s
• 1011 LSR 1 = 0101
• 1011 LSL 1 = 0110
 Arithmetic Shift:
– Shifts number left or right; right shift sign-extends
• 1011 ASR 1 = 1101
• 1011 ASL 1 = 0110
 Rotate:
– Shifts number left or right and fills with lost bits
• 1011 ROR 1 = 1101
• 1011 ROL 1 = 0111
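The six 4-bit examples can be reproduced with a short Python sketch, assuming a fixed word width N = 4 and shift amounts 0 ≤ k < N (function names follow the mnemonics above):

```python
N = 4
MASK = (1 << N) - 1


def lsr(x, k):
    return x >> k                                   # zeros enter from the left


def lsl(x, k):
    return (x << k) & MASK                          # zeros enter from the right


def asr(x, k):
    fill = (MASK << (N - k)) & MASK if x >> (N - 1) else 0   # copies of the sign bit
    return (x >> k) | fill


def asl(x, k):
    return lsl(x, k)                                # identical to a logical left shift


def ror(x, k):
    return ((x >> k) | (x << (N - k))) & MASK       # bits lost on the right re-enter on the left


def rol(x, k):
    return ((x << k) | (x >> (N - k))) & MASK       # bits lost on the left re-enter on the right
```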
Funnel Shifter
 A funnel shifter can do all six types of shifts
 Selects N-bit field Y from a (2N–1)-bit input Z
– Shift by k bits (0 ≤ k < N)
– Logically involves N N:1 multiplexers
Funnel Source Generator
Rotate Right
Logical Right
Arithmetic Right
Rotate Left
Logical/Arithmetic Left
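The source-generator table lost its columns in extraction. The Python sketch below uses one assignment that works with Y = Z[k+N-1 : k]; it is derived for illustration and may not match the original table cell for cell. Right operations use offset k; left operations place A in the upper N bits and use offset N - 1 - k, which equals the bitwise complement of k when N is a power of two:

```python
def funnel_shift(a, k, op, n):
    """Funnel shifter: build a (2n-1)-bit source Z, then select Y = Z[off+n-1 : off].

    op is one of "ror", "lsr", "asr", "rol", "lsl", "asl"; requires 0 <= k < n.
    """
    assert 0 <= k < n
    mask = (1 << n) - 1
    low = (1 << (n - 1)) - 1                       # n-1 ones
    if op == "ror":
        z, off = ((a & low) << n) | a, k           # upper part: A[n-2:0]
    elif op == "lsr":
        z, off = a, k                              # upper part: zeros
    elif op == "asr":
        fill = low if (a >> (n - 1)) & 1 else 0    # upper part: copies of the sign bit
        z, off = (fill << n) | a, k
    elif op == "rol":
        z, off = (a << (n - 1)) | (a >> 1), n - 1 - k   # lower part: A[n-1:1]
    elif op in ("lsl", "asl"):
        z, off = a << (n - 1), n - 1 - k           # lower part: zeros
    else:
        raise ValueError(f"unknown operation {op!r}")
    return (z >> off) & mask                       # the n:1 selection for each output bit
```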
Array Funnel Shifter
 N N-input multiplexers
– Use 1-of-N hot select signals for shift amount
– nMOS pass transistor design (Vt drops!)
[Figure: 4-bit array funnel shifter; k[1:0] and left pass through inverters and a decoder to produce one-hot selects s3 … s0, which pick outputs Y3 … Y0 from Z6 … Z0]
Logarithmic Funnel Shifter
 Log N stages of 2-input muxes
– No select decoding needed
32-bit Logarithmic Funnel
 Wider multiplexers reduce delay and power
 Operands > 32 bits introduce datapath irregularity
Barrel Shifter
 Barrel shifters perform right rotations using wraparound wires.
 Left rotations are right rotations by $N - k = \bar{k} + 1$ bits.
 Shifts are rotations with the end bits masked off.
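A Python sketch of a logarithmic right rotator and the left-rotation identity, assuming N is a power of two so that N - k = ~k + 1 (mod N). The "+1" is modelled as a fixed pre-rotation by one, in the spirit of the preshift mentioned on the 32-bit barrel slide; function names are hypothetical:

```python
def log_rotate_right(a, k, n):
    """Logarithmic rotator: stage j rotates right by 2^j when bit j of k is 1.

    n is assumed to be a power of two and 0 <= k < n.
    """
    mask = (1 << n) - 1
    stage = 0
    while (1 << stage) < n:
        amt = 1 << stage
        if (k >> stage) & 1:
            a = ((a >> amt) | (a << (n - amt))) & mask   # wraparound wires
        stage += 1
    return a


def rotate_left(a, k, n):
    """Left rotate by k == right rotate by N - k = ~k + 1 (mod N):
    a fixed pre-rotation by 1 followed by a right rotation by ~k."""
    k_bar = ~k & (n - 1)            # bitwise complement of k in log2(N) bits
    return log_rotate_right(log_rotate_right(a, 1, n), k_bar, n)
```

Masking off the end bits after rotating then turns these rotations into shifts.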
4-by-4 Barrel Rotator
Logarithmic Barrel Shifter
[Figure panels: Right shift only; Right/Left shift; Right/Left Shift & Rotate]
32-bit Logarithmic Barrel
 Datapath never wider than 32 bits
 First stage preshifts by 1 to handle left shifts
Binary Shifter
Barrel Shifter
Area dominated
by wiring
4x4 barrel shifter
Logarithmic Shifter
0-7 bit Logarithmic Shifter
Addition Flags
Adder with Flags




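 C, N: for free
 V: fast, since $c_n$ and $c_{n-1}$ are computed by the PPA => very cheap ($V = c_n \oplus c_{n-1}$)
 Z with $c_{in} = 1$ (subtraction): $Z = (A = B) = P_{n-1:0}$ of the PPA
 Z with $c_{in}$ = 0/1:
$Z = \overline{s_{n-1} + s_{n-2} + \cdots + s_0}$ (r.s.a.)
$A = A_{CPA} + n, \quad T_Z = T_{CPA} + \log n$
 Faster without final sum:
$z_0 = \overline{a_0 \oplus b_0 \oplus c_{in}}$
$z_i = \overline{(a_i \oplus b_i) \oplus (a_{i-1} + b_{i-1})}, \quad i = 1, \ldots, n-1$
$Z = z_{n-1} z_{n-2} \cdots z_0$ (r.s.a.)
$A = A_{CPA} + 3n, \quad T_Z = 4 + \log n$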
Condition Flags
 Signed and unsigned addition/subtraction differ only
with respect to condition flags
ALU
 Arithmetic Logic Unit (ALU)
ALU Operations
Multiplication
 Example:
      1100 : 12₁₀   multiplicand
      0101 :  5₁₀   multiplier
      1100          partial products
     0000
    1100
   0000
  00111100 : 60₁₀   product
 M x N-bit multiplication
– Produce N M-bit partial products
– Sum these to produce M+N-bit product
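A Python sketch of the partial-product view for unsigned operands (hypothetical names): row i is the multiplicand ANDed with multiplier bit x_i and shifted left by i, and the product is the sum of the rows:

```python
def partial_products(y, x, n):
    """Partial products of multiplicand y and n-bit multiplier x:
    row i is y AND x_i, shifted left by i."""
    return [(y if (x >> i) & 1 else 0) << i for i in range(n)]


def multiply(y, x, n):
    """Sum the partial products (in hardware, a multi-operand adder)."""
    return sum(partial_products(y, x, n))


print([f"{pp:08b}" for pp in partial_products(0b1100, 0b0101, 4)])
print(multiply(0b1100, 0b0101, 4))
```

For the slide's example, the rows are 1100, 0000, 110000, 0000 and the product is 60.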
General Form
 Multiplicand: $Y = (y_{M-1}, y_{M-2}, \ldots, y_1, y_0)$
 Multiplier: $X = (x_{N-1}, x_{N-2}, \ldots, x_1, x_0)$
 Product: $P = \left(\sum_{j=0}^{M-1} y_j 2^j\right)\left(\sum_{i=0}^{N-1} x_i 2^i\right) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}$
[Figure: 6 × 6 example; row i of the partial products holds x_i y_5 … x_i y_0 shifted left by i positions, and the columns sum to the product bits p11 … p0]
Binary Multiplication
Dot Diagram
 Each dot represents a bit
[Figure: dot diagram of the partial products for a 16-bit multiplier x (rows x0 … x15)]
Array Multiplier
Array Multiplier
Carry Save Array Multiplier
Array Multiplier
[Figure: 4 × 4 array multiplier; inputs y3 … y0 and x0 … x3 feed a CSA array followed by a CPA that produces p7 … p0; the critical path is highlighted, and the cell detail shows a full adder with inputs A, B, Sin, Cin and outputs Sout, Cout]
Rectangular Array
 Squash array to fit rectangular floorplan
[Figure: rectangular 4 × 4 array multiplier with inputs y3 … y0 and x0 … x3; product bits p0 … p3 leave along the side and p7 … p4 along the bottom]
Multiplier Floorplan
Sequential Multipliers
 Partial products generated and added sequentially
using an accumulator.
$A = O(n), \quad T = O(\log n), \quad L = n$

Array Multipliers
 Partial products generated and added
simultaneously in linear array using array adder.
$A = O(n^2), \quad T = O(n)$

Multiplication Algorithm
 Generation of partial products
 Adding up partial products
– Sequentially (sequential shift and add)
– Serially (combinational shift and add)
– In parallel
 Speed-up techniques
– Reduce the number of partial products
– Accelerate addition of partial products
Parallel Multipliers
 Partial products generated in parallel and added
subsequently in multi-operand adder (using tree
adder)
$A = O(n^2), \quad T = O(\log n)$
Signed Multipliers
 What about signed multiplication?
– Complement operands before and result after
multiplication => unsigned multiplication
– Direct implementation (dedicated signed multipliers)
 Unsigned array multiplier using CSA and a final CPA
is sometimes called Braun multiplier.
 The unit gate model yields for a CPA of type RCA
$A = 8n^2 - 11n$
$T = 6n - 9$
Modified Braun Multiplier
 For multiplying two’s complement numbers
 Sometimes called Pezaris multiplier
 Subtract bits with negative weight => special FAs
1 neg. bit: $-a + b + c_{in} = 2c_{out} - s$
2 neg. bits: $-a - b + c_{in} = -2c_{out} + s$
 Otherwise, exactly the same structure and complexity as the Braun multiplier => efficient and flexible
$A = -a_7 2^7 + \sum_{i=0}^{6} a_i 2^i$
$B = -b_7 2^7 + \sum_{i=0}^{6} b_i 2^i$
Modified Braun Multiplier
Modified Braun Multiplier
 Type-1 adder has one negative input, the sum
output is negative.
 Type 2 adder has two negative inputs, the carry
output is negative.
 You can also design an adder with three negative
inputs and two negative outputs (Type 3 adder), but
it is never used.
 Type 0 and Type 3 adders are identical.
 Type 1 and Type 2 adders are identical.
$s = a \oplus b \oplus c_{in}$
$c_{out} = ab + a c_{in} + b c_{in}$
Baugh-Wooley Multiplier
$P = A \cdot B = \left(-a_4 2^4 + \sum_{i=0}^{3} a_i 2^i\right)\left(-b_4 2^4 + \sum_{j=0}^{3} b_j 2^j\right)$
$= \underbrace{a_4 b_4 2^8}_{\text{MSB}} + \underbrace{\sum_{i=0}^{3} \sum_{j=0}^{3} a_i b_j 2^{i+j}}_{\text{ordinary multiplication}} \underbrace{- a_4 2^4 \sum_{j=0}^{3} b_j 2^j - b_4 2^4 \sum_{i=0}^{3} a_i 2^i}_{\text{extra terms}}$
$= a_4 b_4 2^8 + \sum_{i=0}^{3} \sum_{j=0}^{3} a_i b_j 2^{i+j} + a_4 2^4 \left(-2^4 + \sum_{j=0}^{3} \bar{b}_j 2^j + 1\right) + b_4 2^4 \left(-2^4 + \sum_{i=0}^{3} \bar{a}_i 2^i + 1\right)$
$= a_4 b_4 2^8 + \sum_{i=0}^{3} \sum_{j=0}^{3} a_i b_j 2^{i+j} + (\bar{a}_4 - 1 + \bar{b}_4 - 1) 2^8 + (a_4 + b_4) 2^4 + \sum_{i=0}^{3} (a_4 \bar{b}_i + \bar{a}_i b_4) 2^{i+4}$
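A quick Python check of the last line of the derivation for every pair of 5-bit two's complement operands (`bw_product` is a hypothetical name). In hardware the -2 · 2^8 = -2^9 part is congruent to +2^9 modulo 2^10, so it becomes a constant 1 in column 2^9:

```python
def bw_product(a, b):
    """Evaluate the final Baugh-Wooley expression for 5-bit patterns a, b (0..31)."""
    ai = [(a >> i) & 1 for i in range(5)]
    bj = [(b >> j) & 1 for j in range(5)]
    total = ai[4] * bj[4] * 2**8                                   # MSB term
    total += sum(ai[i] * bj[j] * 2**(i + j)                         # ordinary array
                 for i in range(4) for j in range(4))
    total += ((1 - ai[4]) - 1 + (1 - bj[4]) - 1) * 2**8            # (~a4 - 1 + ~b4 - 1) 2^8
    total += (ai[4] + bj[4]) * 2**4                                # (a4 + b4) 2^4
    total += sum((ai[4] * (1 - bj[i]) + (1 - ai[i]) * bj[4]) * 2**(i + 4)
                 for i in range(4))                                # complemented terms
    return total
```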
Baugh-Wooley Multiplier
[Figure: 5 × 5 Baugh-Wooley partial-product array over columns 2^9 … 2^0: terms a_i b_j (i, j ≤ 3), complemented terms a_4 b̄_i and ā_i b_4, plus a_4 b_4, ā_4, b̄_4, a_4, b_4, and a constant 1; product bits s9 … s0]
Baugh-Wooley Multiplier
Baugh-Wooley Multiplier
Fewer Partial Products
 Array multiplier requires N partial products
 If we looked at groups of r bits, we could form N/r
partial products.
– Faster and smaller?
– Called radix-$2^r$ encoding
 Ex: r = 2: look at pairs of bits
– Form partial products of 0, Y, 2Y, 3Y
– First three are easy, but 3Y requires an adder
Booth Encoding
 Let us first try out base 4 encoding
n 1
n 2
2
i0
i0
B  bi 2i  c i 4 i
 The ci have to be 0,1,2, or 3. However, 3 is
problematic.

 Try the following:
n2
2
B    b2i 22i
for even bits
i 0
n2
2
B    b2i1 22i1
for odd bits
i 0
B  B   B 
Booth Encoding
 Numerical example:
1810  1 24  0 23  0 22 1 21  0 20  100102
1810  0 20  0 22 1 24  1 21  0 23  0 25 
 Reordering terms,



1
B  B 2
{ B  2 2  B
14 2 43
shift left
shift right
c i  2b2i1  b2i  b2i1
c i  2,1,0,1,2

Booth Encoding
 The ci can be written as
$c_0 = b_{-1} + b_0 - 2b_1$
$c_1 = b_1 + b_2 - 2b_3$
$c_2 = b_3 + b_4 - 2b_5$
$c_3 = b_5 + b_6 - 2b_7$
$c_4 = b_7 + b_8 - 2b_9$
 Take $b_{-1}$ as 0. For an 8-bit unsigned number, take $b_8$ and $b_9$ as 0 as well.
Booth Encoding
 Take 18 as a numerical example again
1810  100102
c 0  0 1 0 11 2  2
c1  1 1 0 1 0 2  1
c 2  0 11 1 0 2  1
1810  112   2 4 0 1 41 1 4 2  18
 For two’s complement signed numbers, extension to
the leftside should not be used.
1810  101101 1  101110
 1 02   2 4 0  0 41  14 2  18
Booth Encoding
 Note that Booth notation is redundant.
024  124  2
 However, the method shown above always yields
the same representation for the same binary
numbers. 
Booth Encoding
 Instead of 3Y, try –Y, then increment next partial
product to add 4Y
 Similarly, for 2Y, try –2Y + 4Y in next partial product
Booth Encoding
Booth Hardware
 Booth encoder generates control lines for each PP
– Booth selectors choose PP bits
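A behavioural Python sketch of a radix-4 Booth multiplier (hypothetical name). The "encoder" turns each overlapping bit triple of the multiplier into a digit in {-2, ..., 2}, and the "selector" picks 0, Y, or 2Y and negates it when needed, so only nbits/2 partial products are summed and 3Y is never formed:

```python
def booth_multiply(y, x, nbits):
    """Radix-4 Booth multiplication of integer y by x, where x is an
    nbits-wide two's complement bit pattern (nbits even).

    Returns (product, number_of_partial_products).
    """
    x_ext = x << 1                                  # append b_{-1} = 0
    product = 0
    count = 0
    for i in range(nbits // 2):
        triple = (x_ext >> (2 * i)) & 0b111         # bits b_{2i+1}, b_{2i}, b_{2i-1}
        b_lo, b_mid, b_hi = triple & 1, (triple >> 1) & 1, triple >> 2
        digit = b_lo + b_mid - 2 * b_hi             # Booth encoder: -2 .. 2
        magnitude = (0, y, 2 * y)[abs(digit)]       # Booth selector: 0, Y or 2Y
        pp = -magnitude if digit < 0 else magnitude # negate for negative digits
        product += pp << (2 * i)                    # weight 4^i
        count += 1
    return product, count
```

Real hardware implements the negation as bit inversion plus a 1 added in the partial product's LSB column; Python's signed integers hide that detail.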
Booth Multipliers
 Applicable to sequential, array, and parallel
multipliers.
 Additional recoding logic and more complex partial
product generation (+8n in terms of area and +7 in
terms of delay)
 Adder array/tree cut in half.
• Considerably smaller (array and tree)
• Twice as fast for adder arrays
• Slightly faster for adder trees.
Booth Multipliers
 Negative partial products require sign extension.
 Suited for signed multiplication.
 Radix 8 (3-bit recoding) possible.
– Reduces the number of partial products by a factor of 3.
– Pre-computing 3B, … is difficult.
– Sometimes used.
Sign Extension
 Partial products can be negative
– Require sign extension, which is cumbersome
– High fanout on most significant bit
[Figure: dot diagram of nine Booth partial products PP0 … PP8 for a 16-bit multiplier x (x-1 = 0, x16 = x17 = 0), each extended to the full product width with copies of its sign bit s]
Simplified Sign Ext.
 Sign bits are either all 0’s or all 1’s
– Note that all 0’s is all 1’s + 1 in proper column
– Use this to reduce loading on MSB
[Figure: PP0 … PP8 with each sign extension replaced by a sign term s and a run of 1’s]
Even Simpler Sign Ext.
 No need to add all the 1’s in hardware
– Precompute the answer!
[Figure: PP0 … PP8 after the constant 1’s have been precomputed, leaving only a few 1’s and sign terms s on each row]
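The arithmetic identity behind precomputing the 1’s can be checked in Python. This is a sketch of one common formulation, assumed here (the exact bit placement in the figures may differ): a w-bit two's complement row with sign bit s satisfies -s·2^(w-1) = (1 - s)·2^(w-1) - 2^(w-1), so each row can be added as an unsigned number with its sign bit inverted, and all the -2^(w-1) terms fold into one constant. Function names are hypothetical:

```python
def sum_sign_extended(rows):
    """Reference: add the rows as signed integers (i.e. fully sign-extended).
    rows is a list of (value, shift) pairs."""
    return sum(v << s for v, s in rows)


def sum_with_constant(rows, width, out_bits):
    """Same sum without any sign-extension bits.

    Each row is a width-bit two's complement value. Invert each row's sign
    bit, add the rows as unsigned numbers, and add one precomputed constant
    -sum(2^(width-1+shift)) modulo 2^out_bits.
    """
    mod = 1 << out_bits
    constant = -sum(1 << (width - 1 + s) for _, s in rows) % mod   # precomputed
    total = constant
    for v, s in rows:
        pattern = (v & ((1 << width) - 1)) ^ (1 << (width - 1))    # invert sign bit
        total += pattern << s
    return total % mod
```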
Advanced Multiplication
 Signed vs. unsigned inputs
 Higher radix Booth encoding
 Array vs. tree CSA networks
Tree Addition
 Wallace Trees.
 Very irregular tree.
– Irregular wiring and/or layout
– Non-uniform bit-arrival times at the final adder.
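A simplified tree-reduction sketch in Python, in the spirit of a Wallace tree rather than Wallace's exact scheme (it uses full adders only, no half adders): each level compresses every group of three bits in a column into a sum in the same column and a carry in the next, until at most two bits remain per column for the final CPA. The function name is hypothetical:

```python
def wallace_multiply(x, y, n):
    """Unsigned n x n multiplication by column-wise 3:2 reduction plus a final add.

    Returns (product, number_of_reduction_levels).
    """
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((x >> i) & (y >> j) & 1)   # partial-product bit x_i y_j
    levels = 0
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            full, _ = divmod(len(col), 3)
            for g in range(full):                          # one full adder per group of 3
                a, b, c = col[3 * g: 3 * g + 3]
                nxt[w].append(a ^ b ^ c)                        # sum stays in column w
                nxt[w + 1].append((a & b) | (a & c) | (b & c))  # carry moves to w + 1
            nxt[w].extend(col[3 * full:])                  # 0-2 leftover bits pass through
        cols = nxt
        levels += 1
    # At most two bits per column remain: one carry-propagate add finishes the job
    return sum(bit << w for w, col in enumerate(cols) for bit in col), levels
```

Each full adder removes one bit while preserving the weighted sum, so the loop always terminates, and the number of levels grows roughly logarithmically with n.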
Wallace Tree Multiplier
Wallace Tree Multiplier
Dot Diagram for Array Multiplier
Dot Diagram for Tree Multiplier
4:2 Tree Multiplier
4:2 Compressor
Carry-Save Adder
4:2 Compressors
4:2 Compressors
16x16 Booth Encoded Multipliers
TDM Multipliers
Vertical Compressor Slice
CPA Prefix Network
 Nonuniform arrival times
Multiplier Implementations
 Sequential Multipliers
– Low performance, small area, resource sharing
 Braun or Baugh-Wooley Multiplier (array multiplier)
– Medium performance, high area, high regularity
– Layout generators => data paths and macro cells
– Simple pipelining, faster CPA => higher speed
 Booth-Wallace Multiplier
– High performance, high area, low regularity
– Custom multipliers, netlist generators
– Often pipelined (between CSA and CPA)
Composition from Smaller Multipliers
 (2n x 2n)-bit multiplier can be composed from
4 (n x n)-bit multipliers (can be repeated recursively).
A B  AH 2 n  AL  B H 2 n  BL 
 AH B H 2 2n  A H BL  AL BH 2 n  AL BL
 This requires 4 (n x n)-bit multipliers and (2n)-bit
CSA and (3n)-bit CPA.
 Less efficient in terms of area and speed.
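A Python sketch of the recursive composition for unsigned operands, assuming the width is a power of two and recursing all the way down to 1-bit multipliers (AND gates); the name is hypothetical:

```python
def recursive_multiply(a, b, width):
    """Multiply two unsigned width-bit numbers (width a power of two) by
    splitting each into high and low halves and using four half-width products."""
    if width == 1:
        return a & b                       # a 1 x 1-bit multiplier is an AND gate
    n = width // 2
    mask = (1 << n) - 1
    ah, al, bh, bl = a >> n, a & mask, b >> n, b & mask
    hh = recursive_multiply(ah, bh, n)
    hl = recursive_multiply(ah, bl, n)
    lh = recursive_multiply(al, bh, n)
    ll = recursive_multiply(al, bl, n)
    return (hh << (2 * n)) + ((hl + lh) << n) + ll
```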

Squaring
2
 Squaring is actually multiplication: $P = A^2 = A \cdot A$
 Multiplier optimizations possible.
[Figure: 4-bit squarer array; the products a_i a_j (i, j = 0 … 3) are arranged in columns p7 … p0, and each symmetric pair a_i a_j + a_j a_i can be replaced by a single bit a_i a_j one column to the left]
 n/2 + 1 partial products => optimized squarer better than multiplier. Table lookup (ROM) less efficient.
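A Python sketch of the folded squaring array (hypothetical name): a_i a_i = a_i stays in column 2i, and each symmetric pair a_i a_j + a_j a_i (i < j) becomes one bit in column i + j + 1. The function also reports the tallest column, i.e. the number of partial-product rows:

```python
def square_folded(a, n):
    """Square an n-bit unsigned number with the folded partial-product array.

    Returns (value, tallest column height).
    """
    bits = [(a >> i) & 1 for i in range(n)]
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        cols[2 * i].append(bits[i])                        # a_i * a_i = a_i
        for j in range(i + 1, n):
            cols[i + j + 1].append(bits[i] & bits[j])      # a_i a_j + a_j a_i = 2 a_i a_j
    value = sum(bit << w for w, col in enumerate(cols) for bit in col)
    return value, max(len(col) for col in cols)
```

For n = 8 the tallest column holds 5 bits, i.e. n/2 + 1 rows, compared with 8 rows for a general 8 × 8 multiplier.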


Division
 Division basics
$\frac{A}{B} = Q + \frac{R}{B}, \quad A = Q \cdot B + R, \quad R < B$
 Conditions on values:
$A \in [0, 2^{2n} - 1], \quad B, Q, R \in [0, 2^n - 1], \quad B \neq 0$
 Algorithms
– Subtract and shift
 – Sequential, recursive, non-associative
Division
 Basic Algorithm
– Compare and conditionally subtract
– Expensive comparison and CPA
 Restoring Division
– Subtract and conditionally restore
– Expensive CPA and restoring
 Non-restoring division
– Detect sign, subtract/add, and correct by next
steps.
– Expensive CPA
Division
 SRT Division
– Estimate range, subtract/add (CSA), correct by
next steps.
– Inexpensive CSA
Restoring Division
 Put x in register A, d in register B, 0 in register P,
and perform n divide steps (n is the quotient
wordlength).
 Each step consists of
– Shift the register pair (P,A) one bit left.
– Subtract the contents of B from P, put the result
back in P.
– If the result is negative, set the low order bit of A
to 0, otherwise to 1.
– If the result is negative, restore the old value of P
by adding the contents of B back into P.
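The register-level steps above translate into a Python sketch for unsigned operands, assuming an n-bit dividend x in A, a nonzero divisor d in B, and P starting at 0 (the function name is hypothetical; Python's unbounded integers stand in for the (n+1)-bit P register):

```python
def restoring_divide(x, d, n):
    """Restoring division of an n-bit unsigned x by d > 0.

    Returns (quotient, remainder), taken from registers A and P.
    """
    A, B, P = x, d, 0
    mask = (1 << n) - 1
    for _ in range(n):
        P = (P << 1) | (A >> (n - 1))      # shift the pair (P, A) one bit left
        A = (A << 1) & mask
        P -= B                             # subtract B from P
        if P < 0:
            P += B                         # restore; low bit of A stays 0
        else:
            A |= 1                         # low bit of A becomes 1
    return A, P
```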
Restoring Division
Restoring Division
Non-restoring Division
 A variant that skips the restoring step and instead
works with negative residuals.
 If P is negative,
– Shift the register pair (P,A) one bit left.
– Add the contents of register B to P.
 Otherwise,
– Shift the register pair (P,A) one bit left.
– Subtract the contents of register B from P.
 If P is negative, set the low-order bit of A to 0, otherwise set it to 1.
 After n cycles,
– The quotient is in A
– If P is nonnegative, it is the remainder; otherwise it has to be restored (add B to it).
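A Python sketch of non-restoring division under the same register assumptions as the restoring case (hypothetical name). A negative P is carried into the next step and corrected there by adding B instead of subtracting it; only the final remainder may need one restoring addition:

```python
def nonrestoring_divide(x, d, n):
    """Non-restoring division of an n-bit unsigned x by d > 0.

    Returns (quotient, remainder).
    """
    A, B, P = x, d, 0
    mask = (1 << n) - 1
    for _ in range(n):
        negative = P < 0
        P = (P << 1) | (A >> (n - 1))      # shift the pair (P, A) one bit left
        A = (A << 1) & mask
        P = P + B if negative else P - B   # add B after a negative residual
        if P >= 0:
            A |= 1                         # quotient bit 1; otherwise it stays 0
    if P < 0:
        P += B                             # final restore of the remainder only
    return A, P
```

After every step, P equals the trial difference the restoring algorithm would compute, so the quotient bits agree with those of the restoring version.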
Non-restoring Division
Non-restoring Division
$A = (n+1) A_{CPA} = O(n^2)$ or $O(n^2 \log n)$
$T = (n+1) T_{CPA} = O(n^2)$ or $O(n \log n)$

Signed Division
 Example: Signed non-restoring array divider.
 B>0, final correction step omitted
$A = 9n^2, \quad T = 2n^2 + 4n$

SRT Division
 Sweeney, Robertson, Tocher
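$q_i = \begin{cases} 1 & \text{if } B 2^i \le R_{i-1} \\ 0 & \text{if } -B 2^i \le R_{i-1} < B 2^i \\ -1 & \text{if } R_{i-1} < -B 2^i \end{cases}$
 If B is normalized, $2^{n-1} \le B < 2^n$, then $B 2^i \ge 2^{n+i-1}$ and the comparisons can use a power of two:
$q_i = \begin{cases} 1 & \text{if } 2^{n+i-1} \le R_{i-1} \\ 0 & \text{if } -2^{n+i-1} \le R_{i-1} < 2^{n+i-1} \\ -1 & \text{if } R_{i-1} < -2^{n+i-1} \end{cases}$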
SRT Division
 Only 3 MSB are compared
– qi’ are estimated
– CSA instead of CPA can be used
 Correction in the following steps + final correction
step.
 Redundant representation of qi’ (SD representation),
final conversion necessary (CPA).
 Highly regular and fast O(n) SRT array dividers
– Only slightly slower/larger than array multipliers
SRT Division
$A = n A_{CSA} + 2A_{CPA} = O(n^2)$
$T = n T_{CSA} + T_{CPA} = O(n)$

SRT Division
 Pre-normalization of divisor ½ ≤ d ≤ 1 and dividend
x<d.
SRT Division
 The quotient digit set plays a crucial role in the
complexity of implementation.
 Restoring algorithm: 0 ≤ qi ≤ r-1
 Non-restoring algorithm: $q_i \in \{-1, 1\}$
 SRT: quotient digit selection function
$q_{i+1} = \begin{cases} 1 & \text{if } \tfrac{1}{2} \le 2w_i \\ 0 & \text{if } -\tfrac{1}{2} \le 2w_i < \tfrac{1}{2} \\ -1 & \text{if } 2w_i < -\tfrac{1}{2} \end{cases}$
 SRT division is very fast in the case of consecutive
zeros in q.
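A Python sketch of radix-2 SRT division using the selection function above, assuming a pre-normalized divisor 1/2 ≤ d < 1 and |x| < d. Exact `Fraction` arithmetic stands in for the carry-save residual, and the function name is hypothetical:

```python
from fractions import Fraction


def srt_divide(x, d, k):
    """Radix-2 SRT division producing k quotient digits in {-1, 0, 1}.

    The digit is chosen by comparing 2w with +-1/2 only (independent of d).
    Invariant: x == d * sum(q_i 2^-i) + w * 2^-k, with |w| <= d.
    Returns (digits, final partial remainder w).
    """
    w = Fraction(x)
    d = Fraction(d)
    half = Fraction(1, 2)
    digits = []
    for _ in range(k):
        two_w = 2 * w
        if two_w >= half:
            q = 1
        elif two_w < -half:
            q = -1
        else:
            q = 0                      # no add/subtract needed for this digit
        w = two_w - q * d
        digits.append(q)
    return digits, w
```

The signed-digit quotient still needs a final conversion to binary (a CPA), as noted on the earlier SRT slide.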

SRT Division
SRT Division
High Radix Division
 Radix b
$b = 2^m, \quad q_i \in \{-(b-1), \ldots, -1, 0, 1, \ldots, b-1\}$
 m quotient bits per step => fewer, but more complex, steps.
 Suitable for SRT algorithm => faster
 Complex comparisons and decisions
 Table look-up (Pentium bug)
Pentium Bug
 March 1993: Intel introduces the Pentium
 June 1994: Prof. Thomas Nicely, Lynchburg College, reports errors in calculating reciprocals of twin primes.
 October 1994: After considerable background
discussion, word starts circulating on the Internet.
Others confirm error and find more instances.
 November 1994: Tim Coe, of Vitesse Semiconductor, proposes a [substantially correct] software model to explain the cause. An Intel internal report analyzes a flaw in the Pentium FDIV instruction.
Pentium bug
 Intel CEO Andy Grove responds (Nov. 24, 1994):
– Minor bug known at Intel since mid-94.
– All micros have bugs.
– “Average user” will never see the problem
(MTBE: 27,000 years).
– Most applications do fewer than 1,000 divisions a
day (?!).
– FDIV error rate is about 1.5 × 10⁻⁹
– Error conditions guarantee small errors.
Pentium Bug
 Response (continued)
– Many applications (e.g. graphics) can tolerate
occasional small errors.
– Offers replacement for justified need.
 Popular press generally accepts Intel’s claims about
“obscure error.”
 Intel confirms 2 million defective chips have been
shipped.
Pentium Bug
 December 1994: IBM disagrees (MTBE: 24 days);
stops shipment of Pentium-based PCs.
– Even casual spreadsheet users may do about
4.2 × 10⁶ divides per day.
– The error distribution is not uniform.
– Under some reasonable conditions FDIV error
rate can approach 10⁻².
 Some question IBM’s motives.
 A flurry of Internet communication condemns Intel’s
attitude and questions its evaluation of the problem.
Pentium Bug
 Intel revises its replacement policy. The policy is hard to
interpret but easy to exercise in practice. 2% of
home users and 10% of businesses eventually get replacements.
 Intel (Andy Grove) admits to mishandling the
problem, but stands by its evaluation.
 Public perception is that Intel was responsive ⇒
positive publicity.
 March 1995: Coe, et al. article appears in IEEE
Journ. Computational Sci. and Eng.
 May 1995: Lamport article appears at TAPSOFT.
Pentium Bug
 Kahan posts should-have-known SRT test article.
 1996: Intel establishes the world’s largest verification
division, dominating industrial research through
20??.
 Reported cost of the Pentium affair: $450
million; $15/$16 billion market in 1996. Intel
Marketing Rep: “. . . wrote it off to advertising.”
 1997–2000: All major μprocessor manufacturers
adopt formal verification.
 Surge in CAD industry tool offerings.
Pentium Bug
 Significant research results appear in floating point
verification.
 2000–2002: Articles, conference panel sessions on
verification “culture.”
 IC technology roadmap: looming “design crisis.”
 Nice discussion of SRT and the bug in
http://www.eng.utah.edu/~cs5830/handouts/lec-SRT.pdf
Pentium Humor
 Q: How many Pentium designers does it take to
screw in a light bulb?
– A: 1.99904274017, but that’s close enough for
non-technical people.
 Q: What’s another name for the “Intel Inside” sticker
they put on Pentiums?
– A: The warning label.
 Q: What do you call a series of FDIV instructions on
a Pentium?
– A: Successive approximations.
Pentium Humor
– Q: Why didn’t Intel call the Pentium the 586?
• A: Because they added 486 and 100 on the
first Pentium and got 585.999983605.
– Q: According to Intel, the Pentium conforms to
the IEEE standards 754 and 854 for floating point
arithmetic. If you fly in aircraft designed using a
Pentium, what is the correct pronunciation of
“IEEE”?
– A: Aaaaaaaiiiiiiiiieeeeeeeeeeeee!
Division by Multiplication
 Division by convergence
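$$Q = \frac{A}{B} = \frac{A \cdot R_0 R_1 \cdots R_{m-1}}{B \cdot R_0 R_1 \cdots R_{m-1}} \rightarrow \frac{Q}{1}, \qquad R_0 R_1 \cdots R_{m-1} \rightarrow \frac{1}{B}$$
$$B_{i+1} = B_i R_i = \underbrace{2^{n}(1-y)}_{B_i}\,\underbrace{(1+y)}_{R_i} = 2^{n}(1-y^2)$$
$$y = 1 - B_i 2^{-n}, \qquad R_i = 2 - B_i 2^{-n} = 1 + y \approx 2^{n} B_i^{-1}$$
 Algorithm:
$$R_i = 2 - B_i 2^{-n}, \quad B_{i+1} = B_i R_i, \quad A_{i+1} = A_i R_i, \qquad i = 0, \ldots, m-1$$
$$A_0 = A, \quad B_0 = B, \quad Q = A_m$$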
Division by Multiplication
 Quadratic convergence: y is squared in every step, so the number of correct bits roughly doubles per iteration
$$L \sim \log n$$
Division by Reciprocation
 Use the reciprocal
$$Q = \frac{A}{B} = A \cdot \frac{1}{B}$$
 How to find the reciprocal?
– Newton-Raphson iteration method: find $f(X) = 0$ by the recursion
$$X_{i+1} = X_i - \frac{f(X_i)}{f'(X_i)}$$
$$f(X) = \frac{1}{X} - B, \qquad f'(X) = -\frac{1}{X^2}, \qquad f\!\left(\frac{1}{B}\right) = 0$$
Division by Reciprocation
 Algorithm:
X i1  X i  2  B X i  ; i  0,K ,m 1
X0  B , Q  Xm
 Quadratic convergence L = O (log n)
 Speed-up: first approximation of X0 from table.
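A floating-point sketch of division by reciprocation, assuming B is normalized to [½, 1). The iteration X_{i+1} = X_i(2 − B·X_i) squares the error e_i = 1 − B·X_i in every step and converges for 0 < X_0 < 2/B. The default start X_0 = B follows the algorithm above; the optional `x0` argument stands in for the table-based first approximation. Both function names are hypothetical.

```python
def reciprocal_newton(b, m, x0=None):
    """Newton-Raphson reciprocal sketch: X_{i+1} = X_i * (2 - B * X_i).

    Assumes B normalized to [1/2, 1). Starts from X_0 = B unless a better
    first approximation x0 (e.g. from a small table) is supplied.
    Converges for 0 < X_0 < 2/B; the error e_i = 1 - B*X_i is squared each step.
    """
    if not 0.5 <= b < 1.0:
        raise ValueError("b must be normalized to [1/2, 1)")
    x = b if x0 is None else x0
    for _ in range(m):
        x = x * (2.0 - b * x)
    return x


def divide_by_reciprocation(a, b, m, x0=None):
    """Q = A * (1/B), with 1/B from the Newton-Raphson iteration."""
    return a * reciprocal_newton(b, m, x0)


print(divide_by_reciprocation(0.3, 0.75, 8))   # close to 0.4
print(reciprocal_newton(0.5, 4, x0=1.9))       # table-style seed: close to 2.0
```

With X_0 = B the initial error is 1 − B², up to ¾, so several extra iterations are needed. A seed already accurate to a few bits (here e_0 = 0.05) reaches double precision in four steps, which is the speed-up the table provides.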
Divider Implementations
 Iterative dividers (through multiplication):
– Resource sharing of existing components
(multiplier)
– Medium performance, medium area
– High efficiency if components are shared
 Sequential dividers (restoring, non-restoring, SRT)
– Resource sharing of existing components (e.g.
adder)
– Low performance, low area
Divider Implementations
 Array dividers
– Dedicated hardware component
– High performance, high area
– High regularity -> layout generators, pipelining
– Square root extraction possible by minor changes
– Combination with multiplication and/or square
root
 No parallel dividers exist that are comparable to parallel
multipliers.
– Sequential nature of division.
Square Root
 Algorithm:
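$$Q = \sqrt{A - R}, \qquad A = Q^2 + R$$
$$A \in [0, 2^{2n} - 1], \qquad Q \in [0, 2^{n} - 1]$$
$$Q_i = Q_{i+1} + q_i 2^i = (q_{n-1}, \ldots, q_i, 0, \ldots, 0)$$
$$Q_i^2 = (Q_{i+1} + q_i 2^i)^2 = Q_{i+1}^2 + q_i 2^i (2 Q_{i+1} + q_i 2^i)$$
$$q_i = \begin{cases} 1 & \text{if } R_{i+1} \ge 2^i (2 Q_{i+1} + 2^i) \\ 0 & \text{otherwise} \end{cases}, \qquad Q_i = Q_{i+1} + q_i 2^i$$
$$R_i = R_{i+1} - q_i 2^i (2 Q_{i+1} + q_i 2^i), \qquad i = n-1, \ldots, 0$$
$$R_n = A, \quad Q_n = 0, \quad R = R_0, \quad Q = Q_0$$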
Square Root
 Implementation:
– Similar to division -> same algorithms applicable
– Restoring, non-restoring, SRT, high radix
 Combination with division in same component
possible
 Only a triangular array required
$$A \approx \tfrac{1}{2} A_{\mathrm{DIV}}, \qquad T \approx T_{\mathrm{DIV}}$$
Elementary Functions
 Exponential function: $e^x$, exp(x)
 Logarithm function: ln x, log x
 Trigonometric functions: sin x, cos x, tan x
 Inverse trigonometric functions: arcsin x, arccos x, arctan x
 Hyperbolic functions: sinh x, cosh x, tanh x
Algorithms
 Table lookup
– Inefficient for large word lengths
 Taylor series expansion
– Complex implementation
 Polynomial and rational approximations
 Shift and add algorithms
 Convergence algorithms
– Similar to division by convergence
– Two (or more) recursive formulas: one formula
converges to a constant, the other to the result.
Algorithms
 Coordinate rotation (CORDIC)
– 3 equations for x-, y- coordinate, and angle
– Computes all elementary functions by proper
input settings and choice of modes and outputs
– Simple, universal hardware, small look-up table.
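A rotation-mode CORDIC sketch for sin and cos, just one of the modes mentioned above. It iterates the three equations (x, y, and angle z) using only shifts (modeled with `math.ldexp`), additions, and a small arctangent table; the constant gain K = ∏√(1 + 2^{-2i}) is pre-compensated in the starting x. Float arithmetic and the name `cordic_sin_cos` are assumptions; hardware would use fixed-point registers.

```python
import math


def cordic_sin_cos(theta, iterations=40):
    """CORDIC in rotation mode: returns (cos(theta), sin(theta)).

    Sketch with floats; ldexp(v, -i) stands in for an arithmetic right shift.
    Valid for |theta| <= sum(atan(2**-i)), about 1.743 rad.
    """
    angles = [math.atan(math.ldexp(1.0, -i)) for i in range(iterations)]  # look-up table
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + math.ldexp(1.0, -2 * i))
    x, y, z = gain, 0.0, theta            # pre-scaled so no final correction is needed
    for i in range(iterations):
        sigma = 1.0 if z >= 0 else -1.0   # rotate so that z is driven toward 0
        x, y = (x - sigma * math.ldexp(y, -i),   # x-equation
                y + sigma * math.ldexp(x, -i))   # y-equation (uses the old x)
        z -= sigma * angles[i]                   # angle equation
    return x, y


c, s = cordic_sin_cos(0.5)
print(c, s)   # close to math.cos(0.5) and math.sin(0.5)
```

Driving y to zero instead of z (vectoring mode) yields arctan and vector magnitude, and hyperbolic variants of the same equations give sinh, cosh, exp, and ln, which is why one datapath can cover the whole function list.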
Design Levels
 Transistor level design
– Circuit and layout designed by hand (full custom)
– Low design efficiency
– High circuit performance
– High flexibility: choice of architecture and logic
style
– Transistor level circuit optimizations
• Logic style (static/dynamic logic,
complementary CMOS/pass-transistor logic)
• Special arithmetic circuits that perform better than
gate-level implementations.
Design Levels
 Gate level design
– Cell based design techniques: standard cells,
gate-array/sea-of-gates, field programmable gate
array (FPGA)
– Circuit implemented by hand or synthesis (library)
– Layout implemented by automated place and
route
– Medium to high design efficiency
– Medium to low circuit performance
– Medium to low flexibility: full choice of
architecture.
Design Levels
 Block level design
– Layout blocks and netlists from parameterized
automatic generators or compilers
– High design efficiency
– Medium to high circuit performance
– Low flexibility (limited choice of architectures)
Design Levels
 Block level design
– Implementations:
• Data-path: bit-sliced, bus oriented layout,
implementation of entire data paths, medium
performance, medium diversity
• Macro-cells: tiled layout, fixed/single operation
components, high performance, small diversity
• Portable netlists: gate level design
Synthesis
 High-level synthesis
– Synthesis from abstract, behavioral hardware
description (e.g., data dependency graphs) using
e.g. VHDL
– Involves architectural synthesis and arithmetic
transformations
– High-level synthesis still not fully mature
Synthesis
 Low-level synthesis
– Layout and netlist generators
– Included in libraries and synthesis tools
– Low-level synthesis is state of the art
– Basis for efficient ASIC design
– Limited diversity and flexibility of library
components
Synthesis
 Circuit optimization
– Efficient optimization of random logic is state of
the art.
– Optimization of entire arithmetic circuits is not
feasible
• Only local optimizations possible
– Logic optimization cannot replace the synthesis of
efficient arithmetic circuit structures using
generators.
Low Power
 High glitching activity due to high bit dependencies
and large logic depth
 Reduce the switched capacitance by choosing an
area efficient circuit architecture
 Allow for lower supply voltage by speeding up the
circuitry
Low Power
 Reduce the transition activity
– Apply stable inputs when circuit not in use
• Disable circuits
– Reduce glitching transitions by balancing signal
paths (partly done by speed-up techniques,
otherwise difficult to realize)
– Reduce glitching transitions by reducing logic
depth
– Take advantage of correlated data streams
– Choose appropriate number representations (e.g.
Gray codes for counters)
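To make the Gray-code suggestion concrete, this sketch counts register bit transitions (a rough proxy for switched capacitance in the counter flip-flops) for a plain binary counter and a binary-reflected Gray counter stepping through all 2ⁿ states. Function names are hypothetical, and glitches inside the incrementer logic are not modeled.

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def count_toggles(seq):
    """Total number of bit transitions between consecutive counter states."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))


def compare_counters(bits):
    """(binary toggles, Gray toggles) for counting 0 .. 2**bits - 1."""
    states = range(1 << bits)
    binary = list(states)
    gray = [to_gray(s) for s in states]
    return count_toggles(binary), count_toggles(gray)


print(compare_counters(4))   # (26, 15)
```

The binary counter needs 2ⁿ⁺¹ − n − 2 transitions (almost two per step), while the Gray counter changes exactly one bit per step, 2ⁿ − 1 in total.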
Testability
 Testability goal: high fault coverage with few test
vectors that are easy to generate/apply.
 Random test vectors: easy to generate and
apply/propagate, few vectors give high (but not
perfect) fault coverage for most arithmetic circuits.
 Special test vectors: sometimes hard to generate
and apply, required for coverage of hard-detectable
faults which are inherent in most arithmetic circuits.
Testability
 Hard-detectable faults are found in:
– Circuits of arithmetic operations with inherent
special cases (arithmetic exceptions): detectors,
comparators, incrementers, and counters
(MSBs), adder flags.
– Circuits using redundant number representations
(≠ redundant hardware): dividers (Pentium bug!)
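A small exhaustive fault-simulation sketch of why detectors are hard to test with random vectors. An n-bit zero detector (such as an adder's zero flag) whose output is stuck at 0 is exposed only by the all-zero input, so a random vector detects it with probability 2⁻ⁿ. The same fault on a parity output is caught by half of all vectors. The single stuck-at-0 output fault and all function names are illustrative assumptions.

```python
from itertools import product


def zero_flag(bits):
    """Fault-free n-bit zero detector: 1 iff all input bits are 0."""
    return int(not any(bits))


def parity(bits):
    """Fault-free parity (XOR tree), for comparison."""
    return sum(bits) % 2


def stuck_at_0(bits):
    """Faulty circuit: output line stuck at 0, whatever the inputs."""
    return 0


def detection_probability(n, good, faulty):
    """Fraction of all 2**n input vectors whose output differs under the fault."""
    vectors = list(product((0, 1), repeat=n))
    detecting = sum(good(v) != faulty(v) for v in vectors)
    return detecting / len(vectors)


print(detection_probability(8, zero_flag, stuck_at_0))   # 0.00390625 (= 2**-8)
print(detection_probability(8, parity, stuck_at_0))      # 0.5
```

For a 32-bit flag, about 2³² random vectors would be needed on average, which is why such faults call for a dedicated test vector.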