Lecture 2

ELECT 90X
Programmable Logic Circuits:
Computer Arithmetic: Introduction
Dr. Eng. Amr T. Abdel-Hamid
Slides based on slides prepared by:
• B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K. Peters, Natick, MA, 2002.
Fall 2009
What is Computer Arithmetic?
Pentium Division Bug (1994-95): Pentium's radix-4 SRT algorithm occasionally gave an incorrect quotient
First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes:
1/5 + 1/7 + 1/11 + 1/13 + . . . + 1/p + 1/(p + 2) + . . .
Worst-case example of division error in Pentium:
c = 4 195 835 / 3 145 727
  = 1.333 820 44...   Correct quotient
  = 1.333 739 06...   circa 1994 Pentium double FLP value; accurate to only 14 bits (worse than single!)
A Motivating Example
Using a calculator with $\sqrt{\ }$, $x^2$, and $x^y$ functions, compute:
$u = \sqrt{\sqrt{\cdots\sqrt{2}}}$ = 1.000 677 131   "1024th root of 2"
$v = 2^{1/1024}$ = 1.000 677 131
Save u and v; if you can't save, recompute values when needed
$x = (((u^2)^2)\ldots)^2$ = 1.999 999 963
$x' = u^{1024}$ = 1.999 999 973
$y = (((v^2)^2)\ldots)^2$ = 1.999 999 983
$y' = v^{1024}$ = 1.999 999 994
Perhaps v and u are not really the same value
$w = v - u = 1 \times 10^{-11}$   Nonzero due to hidden digits
$(u - 1) \times 1000$ = 0.677 130 680   [Hidden ... (0) 68]
$(v - 1) \times 1000$ = 0.677 130 690   [Hidden ... (0) 69]
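The experiment can be imitated in software. The following is only a rough sketch: the 10-digit precision is an assumption chosen to resemble a pocket calculator (the slide does not give the calculator's internal precision), so the digits it prints need not match the slide. The helper names are hypothetical.

```python
from decimal import Decimal, getcontext

getcontext().prec = 10  # assumed calculator precision, not stated on the slide

def repeated_sqrt(value, times):
    """Press the square-root key `times` times."""
    for _ in range(times):
        value = value.sqrt()
    return value

def repeated_square(value, times):
    """Press the x^2 key `times` times."""
    for _ in range(times):
        value = value * value
    return value

u = repeated_sqrt(Decimal(2), 10)   # 1024th root of 2 via ten square roots
x = repeated_square(u, 10)          # would be exactly 2 without rounding
x_prime = u ** 1024                 # same quantity via the x^y key

print(u, x, x_prime)
```

Each rounding error in u is magnified by roughly a factor of 1024 on the way back to 2, which is why the last few digits of x and x' are unreliable.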
Finite Range Can Lead to Disaster
Example: Explosion of Ariane Rocket (1996 June 4)
Unmanned Ariane 5 rocket of the European Space Agency veered off its flight path, broke up, and exploded only 30 s after lift-off (altitude of 3700 m)
The $500 million rocket (with cargo) was on its first voyage after a decade of development costing $7 billion
Cause: "software error in the inertial reference system"
Problem specifics: A 64-bit floating-point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer
An SRI* software exception arose during conversion because the 64-bit floating-point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767)
*SRI = Inertial Reference System
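The range problem can be illustrated with a small sketch. This is not the flight software; the function names are hypothetical and only show two common ways of guarding a float-to-16-bit conversion: raising an exception or clamping to the representable range.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value):
    """Convert a float to a 16-bit signed integer, raising if it does not fit."""
    truncated = int(value)  # truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated

def to_int16_saturating(value):
    """Convert a float to a 16-bit signed integer, clamping to the range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))

print(to_int16_checked(1234.9))      # 1234
print(to_int16_saturating(40000.0))  # 32767
```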
Encoding Numbers in 4 Bits
[Figure: number line from –16 to 16 marking the values representable in each 4-bit format]
Number formats shown:
• Unsigned integers
• Signed-magnitude
• 3 + 1 fixed-point, xxx.x
• Signed fraction, ±.xxx
• 2's-compl. fraction, x.xxx
• 2 + 2 floating-point, $s \times 2^e$, e in [–2, 1], s in [0, 3]
• 2 + 2 logarithmic (log = xx.xx)
Some of the possible ways of assigning 16 distinct codes to represent numbers.
The Binary Number System
• In conventional digital computers, integers are represented as binary numbers of fixed length n
• An ordered sequence $(x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$ of binary digits
• Each digit $x_i$ (bit) is 0 or 1
• The above sequence represents the integer value $X = \sum_{i=0}^{n-1} x_i 2^i$
• Upper case letters represent numerical values or sequences of digits
• Lower case letters, usually indexed, represent individual digits
Radix of a Number System
• The weight of the digit $x_i$ is the $i$th power of 2
• 2 is the radix of the binary number system
• Binary numbers are radix-2 numbers - allowed digits are 0, 1
• Decimal numbers are radix-10 numbers - allowed digits are 0, 1, 2, …, 9
• Radix indicated in subscript as a decimal number
• Example:
  • $(101)_{10}$ - decimal value 101
  • $(101)_2$ - decimal value 5
Range of Representations
• Operands and results are stored in registers of fixed length n, so only a finite number of distinct values can be represented within an arithmetic unit
• $X_{min}$; $X_{max}$ - smallest and largest representable values
• $[X_{min}, X_{max}]$ - range of the representable numbers
• A result larger than $X_{max}$ or smaller than $X_{min}$ is incorrectly represented
• The arithmetic unit should indicate that the generated result is in error - an overflow indication
Example - Overflow in Binary System
• Unsigned integers with 5 binary digits (bits)
• $X_{max} = (31)_{10}$ - represented by $(11111)_2$
• $X_{min} = (0)_{10}$ - represented by $(00000)_2$
• Increasing $X_{max}$ by 1 gives $(32)_{10} = (100000)_2$
• 5-bit representation - only the last five digits retained, yielding $(00000)_2 = (0)_{10}$
• In general, a number X not in the range $[X_{min}, X_{max}] = [0, 31]$ is represented by X mod 32
• If X+Y exceeds $X_{max}$, the result is S = (X+Y) mod 32
• Example:
     X     10001    17
    +Y     10010    18
         1 00011     3 = 35 mod 32
• Result has to be stored in a 5-bit register - the most significant bit (with weight $2^5 = 32$) is discarded
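The wraparound in this example can be reproduced with a few lines of code. This is a minimal sketch; `add_unsigned` is a hypothetical helper name, and the overflow flag stands in for the overflow indication that an arithmetic unit should provide.

```python
def add_unsigned(x, y, n_bits=5):
    """Add two n-bit unsigned integers; return (stored sum, overflow flag)."""
    total = x + y
    stored = total % (1 << n_bits)      # keep only the n least significant bits
    overflow = total >= (1 << n_bits)   # the discarded bit of weight 2^n was set
    return stored, overflow

stored, overflow = add_unsigned(0b10001, 0b10010)   # 17 + 18
print(format(stored, "05b"), stored, overflow)      # 00011 3 True
```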
Fixed Radix Systems
• r - the radix of the number system
• Conventional number systems are also called fixed-radix systems
• With no redundancy: $0 \le x_i \le r-1$
• Allowing $x_i \ge r$ introduces redundancy into the fixed-radix number system (how?)
• If $x_i \ge r$ is allowed, there are two machine representations for the same value: $(\ldots, x_{i+1}, x_i, \ldots)$ and $(\ldots, x_{i+1}+1, x_i - r, \ldots)$
• Both have the same value because $(x_{i+1}+1)\, r^{i+1} + (x_i - r)\, r^i = x_{i+1} r^{i+1} + x_i r^i$
Representation of Mixed Numbers
• A sequence of n digits in a register does not necessarily represent an integer
• It can represent a mixed number with a fractional part and an integral part
• The n digits are partitioned into two - k in the integral part and m in the fractional part (k+m=n)
• The value of an n-tuple with a radix point between the k most significant digits and the m least significant digits, $(x_{k-1} \cdots x_1 x_0 \,.\, x_{-1} \cdots x_{-m})_r$, is $X = \sum_{i=-m}^{k-1} x_i r^i$
Fixed Point Representations
• Radix point not stored in register - understood to be in a fixed position between the k most significant digits and the m least significant digits
• These are called fixed-point representations
• Programmer not restricted to the predetermined position of the radix point
• Operands can be scaled - same scaling for all operands
• Add and subtract operations are correct: $aX \pm aY = a(X \pm Y)$ (a - scaling factor)
• Corrections required for multiplication and division: $aX \times aY = a^2 (X \times Y)$; $aX / aY = X / Y$
• Commonly used positions for the radix point:
  • rightmost side of the number (pure integers - m=0)
  • leftmost side of the number (pure fractions - k=0)
ULP - Unit in Last Position
• Given the length n of the operands, the weight $r^{-m}$ of the least significant digit indicates the position of the radix point
• Unit in the last position (ulp) - the weight of the least significant digit
• $ulp = r^{-m}$
• This notation simplifies the discussion
• No need to distinguish between the different partitions of numbers into fractional and integral parts
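The scaling rules for fixed-point operands, and the role of the ulp, can be sketched with ordinary integers. The choice of m = 4 fractional bits and the helper names `to_fixed` and `from_fixed` are assumptions made for illustration.

```python
from fractions import Fraction

M = 4                      # assumed number of fractional bits (radix 2)
ULP = Fraction(1, 2 ** M)  # weight of the least significant digit, r^-m

def to_fixed(value):
    """Scale a value by 2^M and round to the nearest integer code."""
    return round(Fraction(value) * 2 ** M)

def from_fixed(code):
    """Interpret an integer code as a multiple of ulp."""
    return code * ULP

a = to_fixed(Fraction(5, 2))   # 2.5  is stored as 40
b = to_fixed(Fraction(3, 4))   # 0.75 is stored as 12

total = a + b                  # same scaling on both operands, no correction
product = (a * b) >> M         # product is scaled by 2^(2M); shift back by M (truncating)

print(from_fixed(total), from_fixed(product))   # 13/4 15/8
```

Addition needs no correction because both codes carry the same scale factor, while the product carries the factor twice and has to be rescaled once.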
Representation of Negative Numbers
• Fixed-point numbers in a radix r system
• Two ways of representing negative numbers:
  • Sign and magnitude representation (or signed-magnitude representation)
  • Complement representation with two alternatives:
    • Radix complement (two's complement in the binary system)
    • Diminished-radix complement (one's complement in the binary system)
Signed-Magnitude Representation
• Sign and magnitude are represented separately
• First digit is the sign digit, remaining n-1 digits represent the magnitude
• Binary case - sign bit is 0 for positive, 1 for negative numbers
• Non-binary case - 0 and r-1 indicate positive and negative numbers
• Only $2r^{n-1}$ out of the $r^n$ possible sequences are utilized
• Two representations for zero - positive and negative
• Inconvenient when implementing an arithmetic unit - when testing for zero, the two different representations must be checked
Disadvantage of the Signed-Magnitude Representation
• Operation may depend on the signs of the operands
• Example - adding a positive number X and a negative number -Y: X+(-Y)
• If Y>X, final result is -(Y-X)
• Calculation:
  • switch order of operands
  • perform subtraction rather than addition
  • attach the minus sign
• A sequence of decisions must be made, costing excess control logic and execution time
• This is avoided in the complement representation methods
Complement Representations of Negative Numbers
• Two alternatives:
  • Radix complement (called two's complement in the binary system)
  • Diminished-radix complement (called one's complement in the binary system)
• In both complement methods, positive numbers are represented as in the signed-magnitude method
• A negative number -Y is represented by R-Y where R is a constant
• This representation satisfies -(-Y)=Y since R-(R-Y)=Y
Advantage of Complement Representation
• No decisions made before executing addition or subtraction
• Example: X-Y=X+(-Y)
• -Y is represented by R-Y
• Addition is performed by X+(R-Y) = R-(Y-X)
• If Y>X, -(Y-X) is already represented as R-(Y-X)
• No need to interchange the order of the two operands
Two’s Complement
• r=2, k=n=4, m=0, $ulp = 2^0 = 1$
• Radix complement (called two's complement in the binary case) of a number X = $2^4 - X$
• It can instead be calculated by $\bar{X} + 1$
• 0000 to 0111 represent positive numbers $0_{10}$ to $7_{10}$
• The two's complement of 0111 is 1000+1=1001; it represents the value $(-7)_{10}$
• The two's complement of 0000 is 1111+1=10000=0 mod $2^4$ - single representation of zero
• Each positive number has a corresponding negative number that starts with a 1
• 1000 representing $(-8)_{10}$ has no corresponding positive number
• Range of representable numbers is $-8 \le X \le 7$
The Two’s Complement Representation
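The 4-bit two's-complement code can be tabulated with a short sketch. The function names `encode`, `decode`, and `negate` are hypothetical; `negate` uses the invert-and-add-1 rule and `encode` uses the $2^n - Y$ definition.

```python
N = 4  # word length used on the slides

def encode(value, n=N):
    """Return the n-bit two's-complement code of value as an unsigned integer."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise OverflowError(f"{value} is outside the {n}-bit two's-complement range")
    return value % (1 << n)          # a negative -Y maps to 2^n - Y

def decode(code, n=N):
    """Return the signed value of an n-bit two's-complement code."""
    return code - (1 << n) if code & (1 << (n - 1)) else code

def negate(code, n=N):
    """Two's complement of a code: invert every bit, then add 1 (mod 2^n)."""
    mask = (1 << n) - 1
    return ((code ^ mask) + 1) & mask

for code in range(1 << N):
    print(format(code, "04b"), decode(code))
```

Note that `negate(0b1000)` returns 1000 again, which reflects the fact that -8 has no positive counterpart in four bits.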
Example - Addition in Two’s complement
• Calculating X+(-Y) with Y>X - 3+(-5)
      0011     3
    + 1011    -5
      1110    -2
• Correct result represented in the two's complement method - no need for preliminary decisions or post corrections
• Calculating X+(-Y) with X>Y - 5+(-3)
      0101     5
    + 1101    -3
    1 0010     2
• Only the four least significant digits are retained, yielding 0010
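Both additions can be checked with a small sketch. `add_twos_complement` is a hypothetical helper; it does not detect overflow and simply discards the carry out of the top bit, as in the examples.

```python
def add_twos_complement(x, y, n=4):
    """Add two signed values through their n-bit two's-complement codes.

    Returns (signed result, result bit string).
    """
    mask = (1 << n) - 1
    code_sum = ((x & mask) + (y & mask)) & mask   # discard the carry out of bit n-1
    signed = code_sum - (1 << n) if code_sum >> (n - 1) else code_sum
    return signed, format(code_sum, f"0{n}b")

print(add_twos_complement(3, -5))   # (-2, '1110')
print(add_twos_complement(5, -3))   # (2, '0010')
```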
One’s Complement in Binary System
• r=2, k=n=4, m=0, $ulp = 2^0 = 1$
• Diminished-radix complement (called one's complement in the binary case) of a number X = $(2^4 - 1) - X = \bar{X}$
• As before, the sequences 0000 to 0111 represent the positive numbers $0_{10}$ to $7_{10}$
• The one's complement of 0111 is 1000, representing $(-7)_{10}$
• The one's complement of zero is 1111 - two representations of zero
• Range of representable numbers is $-7 \le X \le 7$
Comparing the Three Representations in a Binary System
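A comparison of the three representations for 4-bit words can be generated directly from their definitions. This is a sketch with hypothetical function names, not a reproduction of the slide's figure.

```python
def signed_magnitude(code, n=4):
    """Sign bit plus (n-1)-bit magnitude."""
    sign, magnitude = code >> (n - 1), code & ((1 << (n - 1)) - 1)
    return -magnitude if sign else magnitude

def ones_complement(code, n=4):
    """Negative codes represent code - (2^n - 1)."""
    return code - ((1 << n) - 1) if code >> (n - 1) else code

def twos_complement(code, n=4):
    """Negative codes represent code - 2^n."""
    return code - (1 << n) if code >> (n - 1) else code

print("code  sign-mag  1's-compl  2's-compl")
for code in range(16):
    print(f"{code:04b}  {signed_magnitude(code):8d}  "
          f"{ones_complement(code):9d}  {twos_complement(code):9d}")
```

Codes 1000 (signed-magnitude) and 1111 (one's complement) both print as 0, which are the second representations of zero mentioned earlier.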
5.1 Bit-Serial and Ripple-Carry Adders
Half-adder (HA): Truth table and block diagram
Inputs    Outputs
x  y      c  s
--------------
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0
Full-adder (FA): Truth table and block diagram
Inputs        Outputs
x  y  cin     cout  s
---------------------
0  0  0       0     0
0  0  1       0     1
0  1  0       0     1
0  1  1       1     0
1  0  0       0     1
1  0  1       1     0
1  1  0       1     0
1  1  1       1     1
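The two truth tables correspond to the following bit-level functions. This is a minimal sketch; building the full-adder from two half-adders and an OR gate is one of several possible realizations.

```python
def half_adder(x, y):
    """Return (carry, sum) for two input bits."""
    return x & y, x ^ y

def full_adder(x, y, c_in):
    """Return (carry_out, sum) using two half-adders and an OR gate."""
    c1, s1 = half_adder(x, y)
    c2, s = half_adder(s1, c_in)
    return c1 | c2, s

for x in (0, 1):
    for y in (0, 1):
        for c_in in (0, 1):
            print(x, y, c_in, *full_adder(x, y, c_in))
```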
Half-Adder Implementations
[Figure: (a) AND/XOR half-adder. (b) NOR-gate half-adder. (c) NAND-gate half-adder with complemented carry.]
Three implementations of a half-adder.
Full-Adder Implementations
[Figure: (a) Built of half-adders. (b) Built as an AND-OR circuit. (c) Suitable for CMOS realization.]
Possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates.
Full-Adder Details
Logic equations for a full-adder:
$s = x \oplus y \oplus c_{in}$   (odd parity function)
$\phantom{s} = x y c_{in} \lor \bar{x} \bar{y} c_{in} \lor \bar{x} y \bar{c}_{in} \lor x \bar{y} \bar{c}_{in}$
$c_{out} = x y \lor x c_{in} \lor y c_{in}$   (majority function)
[Figure: (a) CMOS transmission gate: circuit and symbol. (b) Two-input mux built of two transmission gates.]
CMOS transmission gate and its use in a 2-to-1 mux.
Simple Adders Built of Full-Adders
[Figure: (a) Bit-serial adder. (b) Ripple-carry adder.]
Using full-adders in building bit-serial and ripple-carry adders.
Critical Path Through a Ripple-Carry Adder
$T_{ripple\text{-}add} = T_{FA}(x, y \to c_{out}) + (k - 2)\, T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$
[Figure: k full-adders connected in a carry chain from $c_0$ to $c_k$]
Critical path in a k-bit ripple-carry adder.
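A ripple-carry adder can be modelled in software by chaining full-adders, with each stage's carry-out feeding the next stage. The sketch below is only a functional model (the names are hypothetical) and says nothing about timing.

```python
def full_adder(x, y, c_in):
    """Return (carry_out, sum) for three input bits."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)
    return c_out, s

def ripple_carry_add(x, y, k, c_in=0):
    """Add two k-bit unsigned integers bit by bit; return (sum, carry_out)."""
    carry, total = c_in, 0
    for i in range(k):                       # the carry ripples from bit 0 upward
        carry, s = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(0b1011, 0b0110, 4))   # (1, 1), i.e. 11 + 6 = 17 = 1 0001
```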
Binary Adders as Versatile Building Blocks
• Set one input to 0: $c_{out}$ = AND of other inputs
• Set one input to 1: $c_{out}$ = OR of other inputs
• Set one input to 0 and another to 1: s = NOT of third input
[Figure: full-adder truth table and a four-bit adder whose stages produce $c_1 = xy$, $c_2 = xyz$, $c_3 = c_4 = w \lor xyz$, and the sum bit $\overline{w \lor xyz}$ in position 3]
Four-bit binary adder used to realize the logic function f = w + xyz and its complement.
Conditions and Exceptions
[Figure: k-bit ripple-carry adder with Overflow, Negative, and Zero outputs derived from $c_k$, $c_{k-1}$, and the sum bits $s_{k-1}, \ldots, s_0$]
Two’s-complement adder with provisions for detecting conditions and exceptions.
$overflow_{2's\text{-}compl} = c_k \oplus c_{k-1} = c_k \bar{c}_{k-1} \lor \bar{c}_k c_{k-1}$
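The overflow rule can be checked in software by computing the carry into and out of the sign position separately. This is a sketch under the assumption that operands are given as k-bit codes; `add_with_flags` is a hypothetical name.

```python
def add_with_flags(x, y, k):
    """Add two k-bit two's-complement codes; return (sum code, flags)."""
    mask = (1 << k) - 1
    low_mask = mask >> 1
    # Carry into the sign position: add everything below the sign bit
    c_k_minus_1 = ((x & low_mask) + (y & low_mask)) >> (k - 1)
    total = (x & mask) + (y & mask)
    c_k = total >> k                 # carry out of the sign position
    s = total & mask
    flags = {
        "overflow": bool(c_k ^ c_k_minus_1),
        "negative": bool(s >> (k - 1)),
        "zero": s == 0,
    }
    return s, flags

print(add_with_flags(0b0011, 0b1011, 4))   # 3 + (-5): no overflow, negative
print(add_with_flags(0b0111, 0b0001, 4))   # 7 + 1: overflow
```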
Manchester Carry Chains and Adders
Sum digit in radix r:      $s_i = (x_i + y_i + c_i) \bmod r$
Special case of radix 2:   $s_i = x_i \oplus y_i \oplus c_i$
Computing the carries $c_i$ is thus our central problem
For this, the actual operand digits are not important
What matters is whether in a given position a carry is (expressions for binary addition):
  generated,                $g_i = x_i y_i$
  propagated, or            $p_i = x_i \oplus y_i$
  annihilated (absorbed)    $a_i = \bar{x}_i \bar{y}_i = \overline{x_i \lor y_i}$
It is also helpful to define a transfer signal:
$t_i = g_i \lor p_i = \bar{a}_i = x_i \lor y_i$
Using these signals, the carry recurrence is written as
$c_{i+1} = g_i \lor c_i p_i = g_i \lor c_i g_i \lor c_i p_i = g_i \lor c_i t_i$
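The recurrence can be exercised with a short sketch that derives the generate, propagate, and transfer signals from the operand bits and checks that the two forms of the recurrence agree. `add_via_signals` is a hypothetical name.

```python
def add_via_signals(x, y, k, c0=0):
    """Add two k-bit unsigned integers using g, p, t signals; return (sum, c_k)."""
    carry, total = c0, 0
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g, p, t = xi & yi, xi ^ yi, xi | yi
        total |= (p ^ carry) << i             # s_i = p_i XOR c_i
        next_carry = g | (carry & p)          # c_{i+1} = g_i OR c_i p_i
        assert next_carry == g | (carry & t)  # the transfer form gives the same carry
        carry = next_carry
    return total, carry

print(add_via_signals(0b0101, 0b0011, 4))   # (8, 0)
```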
Carry Network is the Essence of a Fast Adder
$g_i$  $p_i$   Carry is:
0    0     annihilated or killed
0    1     propagated
1    0     generated
1    1     (impossible)
$g_i = x_i y_i$,  $p_i = x_i \oplus y_i$
[Figure: carry network taking $(g_i, p_i)$ for i = 0 to k–1 together with $c_0$ and producing $c_1, \ldots, c_k$; each sum bit $s_i$ is formed from $p_i$ and $c_i$. Network options: Ripple; Skip; Lookahead; Parallel-prefix]
The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.
Ripple-Carry Adder Revisited
The carry recurrence: $c_{i+1} = g_i \lor p_i c_i$
Latency of k-bit adder is roughly 2k gate delays:
  1 gate delay for production of p and g signals, plus
  2(k – 1) gate delays for carry propagation, plus
  1 XOR gate delay for generation of the sum bits
[Figure: chain of carry cells with inputs $(g_i, p_i)$, from $c_0$ to $c_k$]
The carry propagation network of a ripple-carry adder.
The Complete Design of a Ripple-Carry Adder
[Figure: the generic adder structure ($g_i = x_i y_i$, $p_i = x_i \oplus y_i$, carry network, sum bits $s_i$) with the ripple-carry propagation chain substituted for the carry network]
Unrolling the Carry Recurrence
Recall the generate, propagate, annihilate (absorb), and transfer signals:
Signal   Radix r                              Binary
$g_i$    is 1 iff $x_i + y_i \ge r$           $x_i y_i$
$p_i$    is 1 iff $x_i + y_i = r - 1$         $x_i \oplus y_i$
$a_i$    is 1 iff $x_i + y_i < r - 1$         $\bar{x}_i \bar{y}_i = \overline{x_i \lor y_i}$
$t_i$    is 1 iff $x_i + y_i \ge r - 1$       $x_i \lor y_i$
$s_i$    $(x_i + y_i + c_i) \bmod r$          $x_i \oplus y_i \oplus c_i$
The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation
$c_i = g_{i-1} \lor c_{i-1} p_{i-1}$
$\phantom{c_i} = g_{i-1} \lor (g_{i-2} \lor c_{i-2} p_{i-2})\, p_{i-1}$
$\phantom{c_i} = g_{i-1} \lor g_{i-2} p_{i-1} \lor c_{i-2} p_{i-2} p_{i-1}$
$\phantom{c_i} = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$\phantom{c_i} = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor g_{i-4} p_{i-3} p_{i-2} p_{i-1} \lor c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$\phantom{c_i} = \ldots$
Full Carry Lookahead
[Figure: four-bit adder in which each sum bit $s_0, \ldots, s_3$ is computed directly from the inputs $x_i$, $y_i$, and $c_{in}$ that affect it]
Theoretically, it is possible to derive each sum digit directly from the inputs that affect it
Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits
Four-Bit Carry-Lookahead Adder
Full carry lookahead is quite practical for a 4-bit adder
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
$c_4 = g_3 \lor g_2 p_3 \lor g_1 p_2 p_3 \lor g_0 p_1 p_2 p_3 \lor c_0 p_0 p_1 p_2 p_3$
Complexity reduced by deriving the carry-out indirectly
[Figure: gate-level carry network with inputs $g_0, p_0, \ldots, g_3, p_3$ and $c_0$, and outputs $c_1, \ldots, c_4$]
Four-bit carry network with full lookahead.
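The four lookahead equations can be transcribed almost literally into code. The sketch below is a functional model only (the name `cla4` is hypothetical); in hardware all four carries would be evaluated in parallel as two-level logic.

```python
def cla4(x, y, c0=0):
    """4-bit carry-lookahead addition; return (sum, c4)."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(4)]   # generate signals
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]   # propagate signals

    # Each carry is written directly in terms of g, p, and c0
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3]) | (c0 & p[0] & p[1] & p[2] & p[3]))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))   # s_i = p_i XOR c_i
    return total, c4

print(cla4(0b1001, 0b0111))   # (0, 1), i.e. 9 + 7 = 16
```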
Carry Lookahead Beyond 4 Bits
Consider a 32-bit adder
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
. . .
$c_{31} = g_{30} \lor g_{29} p_{30} \lor g_{28} p_{29} p_{30} \lor g_{27} p_{28} p_{29} p_{30} \lor \ldots \lor c_0 p_0 p_1 p_2 p_3 \cdots p_{29} p_{30}$
The last product term requires a 32-input AND, and the terms are combined by a 32-input OR
High fan-ins necessitate tree-structured circuits
Solutions to the Fan-in Problem
• Multilevel lookahead
• Block Adders
• High-radix addition (i.e., radix $2^h$): Increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)
Example: 16-bit addition
  Radix-16 (four digits)
  Two-level carry lookahead (four 4-bit blocks)
Either way, the carries $c_4$, $c_8$, and $c_{12}$ are determined first
[Figure: the carries $c_{16}$ ($c_{out}$) down to $c_0$ ($c_{in}$), with $c_{12}$, $c_8$, and $c_4$ marked as the ones to be determined first]
Block Ripple Adder
Larger Carry-Lookahead Adder Design
Block generate and propagate signals
$g_{[i,i+3]} = g_{i+3} \lor g_{i+2} p_{i+3} \lor g_{i+1} p_{i+2} p_{i+3} \lor g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
• If all 4 bits in a block propagate, the block propagates a carry.
• If at least one of the 4 bits generates a carry and it can be propagated to the MSB, the block generates a carry.
[Figure: 4-bit lookahead carry generator with inputs $g_i, p_i, \ldots, g_{i+3}, p_{i+3}$ and $c_i$, and outputs $c_{i+1}$, $c_{i+2}$, $c_{i+3}$, $g_{[i,i+3]}$, $p_{[i,i+3]}$]
A Building Block for Carry-Lookahead Addition
[Figure: gate-level diagram of the lookahead carry generator, with a block signal generation part (producing $p_{[i,i+3]}$ and $g_{[i,i+3]}$) and an intermediate carries part (producing $c_{i+1}$, $c_{i+2}$, $c_{i+3}$), shown next to the four-bit adder carry network]
Four-bit lookahead carry generator.
Combining Block g and p Signals
Combining of g and p signals of four blocks of arbitrary widths into the g and p signals for the overall block
A Two-Level Carry-Lookahead Adder
[Figure: a 16-bit carry-lookahead adder built around a 4-bit lookahead carry generator that takes $g_{[0,3]}, p_{[0,3]}, \ldots, g_{[12,15]}, p_{[12,15]}$ and $c_0$ and produces $c_4$, $c_8$, $c_{12}$; a second-level 4-bit lookahead carry generator takes $g_{[0,15]}, p_{[0,15]}, \ldots, g_{[48,63]}, p_{[48,63]}$, produces $c_{16}$, $c_{32}$, $c_{48}$, and outputs $g_{[0,63]}$, $p_{[0,63]}$]
Building a 64-bit carry-lookahead adder from 16 4-bit adders and 5 lookahead carry generators.
Ling Adder and Related Designs
Consider the carry recurrence and its unrolling by 4 steps:
$c_i = g_{i-1} \lor c_{i-1} t_{i-1}$
$\phantom{c_i} = g_{i-1} \lor g_{i-2} t_{i-1} \lor g_{i-3} t_{i-2} t_{i-1} \lor g_{i-4} t_{i-3} t_{i-2} t_{i-1} \lor c_{i-4} t_{i-4} t_{i-3} t_{i-2} t_{i-1}$
Ling's modification: Propagate $h_i = c_i \lor c_{i-1}$ instead of $c_i$
$h_i = g_{i-1} \lor h_{i-1} t_{i-2}$
$\phantom{h_i} = g_{i-1} \lor g_{i-2} \lor g_{i-3} t_{i-2} \lor g_{i-4} t_{i-3} t_{i-2} \lor h_{i-4} t_{i-4} t_{i-3} t_{i-2}$
          Gates      Max fan-in      Total gate inputs
CLA:      5 gates    max 5 inputs    19 gate inputs
Ling:     4 gates    max 5 inputs    14 gate inputs
The advantage of $h_i$ over $c_i$ is even greater with wired-OR:
          Gates      Max fan-in      Total gate inputs
CLA:      4 gates    max 5 inputs    14 gate inputs
Ling:     3 gates    max 4 inputs    9 gate inputs
Once $h_i$ is known, however, the sum is obtained by a slightly more complex expression compared with $s_i = p_i \oplus c_i$:
$s_i = (t_i \oplus h_{i+1}) \lor h_i g_i t_{i-1}$
Carry Determination as Prefix Computation
[Figure: blocks B' (signals g', p') and B" (signals g", p") combined by the carry operator ¢ into the merged block B (signals g, p)]
$g = g'' \lor g' p''$
$p = p' p''$
Formulating the Prefix Computation Problem
The problem of carry determination can be formulated as:
Given   $(g_0, p_0)$   $(g_1, p_1)$   . . .   $(g_{k-2}, p_{k-2})$   $(g_{k-1}, p_{k-1})$
Find    $(g_{[0,0]}, p_{[0,0]})$   $(g_{[0,1]}, p_{[0,1]})$   . . .   $(g_{[0,k-2]}, p_{[0,k-2]})$   $(g_{[0,k-1]}, p_{[0,k-1]})$
        which correspond to the carries $c_1$, $c_2$, . . . , $c_{k-1}$, $c_k$
The desired pairs are found by evaluating all prefixes of
$(g_0, p_0)$ ¢ $(g_1, p_1)$ ¢ . . . ¢ $(g_{k-2}, p_{k-2})$ ¢ $(g_{k-1}, p_{k-1})$
The carry operator ¢ is associative, but not commutative
$[(g_1, p_1)$ ¢ $(g_2, p_2)]$ ¢ $(g_3, p_3)$ = $(g_1, p_1)$ ¢ $[(g_2, p_2)$ ¢ $(g_3, p_3)]$
Prefix sums analogy:
Given   $x_0$   $x_1$   $x_2$   . . .   $x_{k-1}$
Find    $x_0$   $x_0+x_1$   $x_0+x_1+x_2$   . . .   $x_0+x_1+\ldots+x_{k-1}$
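The prefix formulation maps directly onto a scan with the carry operator. The sketch below uses a serial scan via `itertools.accumulate` for clarity; a parallel prefix network evaluates the same prefixes in fewer levels. The names `carry_op` and `prefix_carries` are hypothetical, and $c_0 = 0$ is assumed.

```python
from itertools import accumulate

def carry_op(low, high):
    """Carry operator: combine (g', p') of the lower block with (g", p") of the higher one."""
    g_low, p_low = low
    g_high, p_high = high
    return g_high | (g_low & p_high), p_low & p_high

def prefix_carries(x, y, k):
    """Return [c_1, ..., c_k] for c_0 = 0 by scanning the (g_i, p_i) pairs."""
    pairs = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(k)]
    return [g for g, _ in accumulate(pairs, carry_op)]

print(prefix_carries(0b0111, 0b0001, 4))   # [1, 1, 1, 0]
```

Swapping the two arguments of `carry_op` generally changes the result, which is the non-commutativity noted above.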
Example Prefix-Based Carry Network
[Figure: a four-input prefix sums network built of + cells, shown next to the corresponding four-bit carry lookahead network built of ¢ cells with inputs $(g_0, p_0)$ to $(g_3, p_3)$; the scan order is indicated]
Outputs of the carry lookahead network:
$(g_{[0,3]}, p_{[0,3]}) = (c_4, --)$
$(g_{[0,2]}, p_{[0,2]}) = (c_3, --)$
$(g_{[0,1]}, p_{[0,1]}) = (c_2, --)$
$(g_{[0,0]}, p_{[0,0]}) = (c_1, --)$
Alternative Parallel Prefix Networks
[Figure: inputs $x_{k-1} \ldots x_{k/2}$ and $x_{k/2-1} \ldots x_0$ feed two k/2-input prefix sums networks; a row of adders combines their results to produce $s_{k-1} \ldots s_{k/2}$, while $s_{k/2-1} \ldots s_0$ come directly from the lower network]
Parallel prefix sums network built of two k/2-input networks and k/2 adders. (Ladner-Fischer)
Brent-Kung Recursive Construction
[Figure: inputs $x_{k-1} \ldots x_0$ are combined pairwise by adders, fed to a k/2-input prefix sums network, and followed by a second row of adders producing $s_{k-1} \ldots s_0$]
Parallel prefix sums network built of one k/2-input network and k – 1 adders.
Brent-Kung Carry Network (8-Bit Adder)
[Figure: network of ¢ cells taking the pairs [7, 7] down to [0, 0], forming the intermediate blocks [6, 7], [4, 5], [2, 3], [0, 1], then [4, 7] and [0, 3], and producing the outputs [0, 7] down to [0, 0]; for example, $(g_{[1,1]}, p_{[1,1]})$ and $(g_{[0,0]}, p_{[0,0]})$ combine into $(g_{[0,1]}, p_{[0,1]})$]
Brent-Kung Carry Network (16-Bit Adder)
[Figure: six-level graph (levels 1 to 6) with inputs $x_{15} \ldots x_0$ and outputs $s_{15} \ldots s_0$; one path is annotated "Reason for latency"]
Brent-Kung parallel prefix graph for 16 inputs.
Kogge-Stone Carry Network (16-Bit Adder)
[Figure: graph with inputs $x_{15} \ldots x_0$ and outputs $s_{15} \ldots s_0$; $\log_2 k$ levels (minimum possible)]
Kogge-Stone parallel prefix graph for 16 inputs.
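The Kogge-Stone schedule can be modelled as repeated combining at distances 1, 2, 4, and so on. The following sketch (hypothetical function name, $c_0 = 0$ assumed) also counts levels and carry-operator cells so the result can be compared with the cost figures quoted for 16 inputs.

```python
def kogge_stone_carries(x, y, k):
    """Return ([c_1, ..., c_k], levels, cells) using a Kogge-Stone prefix schedule."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(k)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(k)]
    distance, levels, cells = 1, 0, 0
    while distance < k:
        # Position i combines its block with the block ending at i - distance
        new_g = [g[i] | (g[i - distance] & p[i]) if i >= distance else g[i] for i in range(k)]
        new_p = [p[i] & p[i - distance] if i >= distance else p[i] for i in range(k)]
        g, p = new_g, new_p
        cells += k - distance      # one carry-operator cell per combining position
        distance *= 2
        levels += 1
    return g, levels, cells

carries, levels, cells = kogge_stone_carries(0xFFFF, 0x0001, 16)
print(carries, levels, cells)
```

For k = 16 the model uses 4 levels and 15 + 14 + 12 + 8 = 49 cells.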
Speed-Cost Tradeoffs in Carry Networks
Method            Delay    Cost
Ladner-Fischer    ?        $(k/2) \log_2 k$
Kogge-Stone       ?        $k \log_2 k - k + 1$
Brent-Kung        ?        $2k - 2 - \log_2 k$
Hybrid B-K/K-S Carry Network (16-Bit Adder)
[Figure: three 16-input parallel prefix graphs (Brent-Kung, Kogge-Stone, and a hybrid whose first and last levels follow Brent-Kung with Kogge-Stone levels in between), each with inputs $x_{15} \ldots x_0$ and outputs $s_{15} \ldots s_0$]
Brent-Kung:    6 levels, 26 cells
Kogge-Stone:   4 levels, 49 cells
Hybrid:        5 levels, 32 cells
A Hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs.
Simple Carry-Skip Adders
[Figure: (a) Ripple-carry adder made of four 4-bit blocks of ripple-carry stages, with carries $c_0$, $c_4$, $c_8$, $c_{12}$, $c_{16}$. (b) Simple carry-skip adder: the same blocks with skip logic (2 gates) per block, controlled by $p_{[0,3]}$, $p_{[4,7]}$, $p_{[8,11]}$, $p_{[12,15]}$]
Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.
Another View of Carry-Skip Addition
[Figure: carry chain from $c_{4j}$ to $c_{4j+4}$ through the cells $(g_{4j}, p_{4j})$ to $(g_{4j+3}, p_{4j+3})$, labeled "One-way street", with the skip path labeled "Freeway"]
Street/freeway analogy for carry-skip adder.
Multilevel Carry-Skip Adders
[Figure: blocks between $c_{in}$ and $c_{out}$, each with a first-level skip circuit S1]
One-level carry-skip adder.
[Figure: the same blocks with an additional second-level skip circuit S2]
Example of a two-level carry-skip adder.
[Figure: the two-level design with fewer S1 circuits]
Two-level carry-skip adder optimized by removing the short-block skip circuits.
Using Two-Operand Adders
Some applications of multioperand addition
[Figure: dot-notation diagrams. Left: multiplication of a by x, where the partial products $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$ are added to form p. Right: seven operands $p^{(0)}$ to $p^{(6)}$ added to form s]
Multioperand addition problems for multiplication or inner-product computation in dot notation.
Serial Implementation with One Adder
[Figure: the k-bit input $x^{(i)}$ and the partial sum register feed an adder; the register is $k + \log_2 n$ bits wide and holds $\sum_{j=0}^{i-1} x^{(j)}$]
Serial implementation of multi-operand addition with a single 2-operand adder.
Pipelined Implementation for Higher Throughput
[Figure: pipelined adder fed by $x^{(i)}$ and $x^{(i-1)}$ through delay elements; labels include $x^{(i)} + x^{(i-1)}$, $x^{(i-4)} + x^{(i-5)}$, $x^{(i-6)} + x^{(i-7)}$, $x^{(i-8)} + x^{(i-9)} + x^{(i-10)} + x^{(i-11)}$, "Ready to compute", and the output $s^{(i-12)}$]
Serial multi-operand addition when each adder is a 4-stage pipeline.
Parallel Implementation as Tree of Adders
[Figure: binary tree of n – 1 adders arranged in $\log_2 n$ adder levels; the inputs are k bits wide and the adder outputs are k+1, k+2, and k+3 bits wide at successive levels]
Adding 7 numbers in a binary tree of adders.
Carry-Save Adders
A ripple-carry adder turns into a carry-save adder if the carries are saved (stored) rather than propagated.
[Figure: a row of full-adders connected from $c_{in}$ to $c_{out}$ (carry-propagate adder), and the same row with the carry connections cut (carry-save adder)]
Carry-save adder (CSA), or (3; 2)-counter, or 3-to-2 reduction circuit
Carry-propagate adder (CPA) and carry-save adder (CSA) functions in dot notation.
Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.
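A carry-save adder reduces three operands to two without propagating carries. The sketch below models this on unbounded non-negative Python integers and then uses a simple serial chain of CSAs (not a balanced tree) to reduce several operands before a single carry-propagate addition. The function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3-to-2 reduction: return (sum word, carry word) with a + b + c == sum + carry."""
    sum_word = a ^ b ^ c                              # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # per-bit carry, one position higher
    return sum_word, carry_word

def multioperand_add(operands):
    """Reduce operands with CSAs until two remain, then do one carry-propagate addition."""
    values = list(operands)
    while len(values) > 2:
        s, c = carry_save_add(values.pop(), values.pop(), values.pop())
        values.extend([s, c])
    return sum(values)   # stands in for the final carry-propagate adder

print(carry_save_add(5, 6, 7))      # (4, 14)
print(multioperand_add([63] * 7))   # 441
```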
Multioperand Addition Using Carry-Save Adders
[Figure, left: a single CSA whose outputs are held in a sum register and a carry register and fed back together with the input; a carry-propagate adder (CPA) produces the output]
Serial carry-save addition using a single CSA.
[Figure, right: five CSAs arranged in a tree, followed by a carry-propagate adder]
Tree of carry-save adders reducing seven numbers to two.
Example Reduction by a CSA Tree
Representing a seven-operand addition in tabular form (each entry is the number of dots in that bit position):
 8  7  6  5  4  3  2  1  0    Bit position
          7  7  7  7  7  7    6×2 = 12 FAs
       2  5  5  5  5  5  3    6 FAs
       3  4  4  4  4  4  1    6 FAs
    1  2  3  3  3  3  2  1    4 FAs + 1 HA
    2  2  2  2  2  1  2  1    7-bit adder
 ------ Carry-propagate adder ------
 1  1  1  1  1  1  1  1  1
Total cost = 7-bit adder + 28 FAs + 1 HA
Addition of seven 6-bit numbers in dot notation.
A full-adder compacts 3 dots into 2 (compression ratio of 1.5)
A half-adder rearranges 2 dots (no compression, but still useful)
Width of Adders in a CSA Tree
[Figure: tree of five k-bit CSAs with index pairs such as [0, k–1], [1, k], [1, k+1], and [2, k+1] on their inputs and outputs, ending in a k-bit CPA; the result occupies bit positions k+2 down to 0]
The index pair [i, j] means that bit positions from i up to j are involved.
Adding seven k-bit numbers and the CSA/CPA widths required.
Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels
Wallace Tree Multiplier
DADDA Tree Multiplier
Saturating Adders
Saturating (saturation) arithmetic:
When a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format
Example – In 8-bit 2's-complement format, we have:
120 + 26 → –110 (wraparound, since 146 – 256 = –110); 120 +sat 26 → 127 (saturating)
Saturating arithmetic is desirable in many DSP applications
Designing saturating adders
  Unsigned (quite easy)
  Signed (slightly harder)
[Figure: the adder output and the saturation value feed a 2-to-1 mux (inputs 0 and 1) selected by the overflow signal]
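The difference between wraparound and saturation can be seen in a short functional sketch. The function names are hypothetical, and clamping with `min`/`max` stands in for the overflow-controlled mux of a hardware design.

```python
def wraparound_add_signed(x, y, n=8):
    """n-bit two's-complement addition that discards the carry out."""
    total = (x + y) & ((1 << n) - 1)
    return total - (1 << n) if total >> (n - 1) else total

def saturating_add_signed(x, y, n=8):
    """n-bit signed addition that clamps to the most positive or most negative value."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, x + y))

def saturating_add_unsigned(x, y, n=8):
    """n-bit unsigned addition that clamps to 2^n - 1."""
    return min(x + y, (1 << n) - 1)

print(wraparound_add_signed(120, 26))    # -110
print(saturating_add_signed(120, 26))    # 127
print(saturating_add_unsigned(200, 100)) # 255
```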
Readings:
• Main reference for the above slides:
  • Chapters 5, 6, 7, & 8, B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.