Adder Design and Performance Optimization - ECE

ENEE759T Project Report
Adder Design and Performance Optimization
Ren Mao
School of Electrical and Computer Engineering
University of Maryland
Email: neroam@umd.edu
Abstract—Adders are widely used in most computing
systems and processors, such as arithmetic logic units and
address calculation. This article describes how to implement and verify basic schematics for a 4-bit ripple-carry
adder (RCA) and a 4-bit carry-lookahead adder (CLA) in
static CMOS. It measures and compares adder performance in terms of delay and power consumption. It also
proposes a more efficient 4-bit adder, which combines RCA
and CLA, and a variant of the RCA that
uses an optimized full-adder design for better performance.
Finally, it gives the simulation results of verification and
performance measurement and analyzes the efficiency of each design.
I. INTRODUCTION
Adders are widely used in most computing systems and processors, for example in arithmetic logic units
and address calculation. To arrive at the best adder
design, we need to consider both the
speed and the power consumption of the circuit. In
this project, I investigate various ways to implement
adders for 4-bit inputs in order to design a more efficient adder with respect to delay and power consumption.
Specifically, I examine schematics for the two most
popular types: the ripple-carry adder (RCA) and the carry-lookahead adder (CLA). The ripple-carry adder is a
basic design that cascades multiple full adders to
add N-bit numbers, where each carry bit "ripples"
to the next full adder. The layout of a ripple-carry
adder is simple, which allows for fast design time;
however, the ripple-carry adder is relatively slow,
since each full adder must wait for the carry bit
to be calculated by the previous full adder. The
carry-lookahead adder is an alternative approach
that reduces the computation time by reducing
the time required to determine the carry
bits. It calculates one or more carry bits before the
sum, which reduces the wait time to calculate the
result of the more significant bits. In exchange, it requires
a more complex schematic, which increases power
consumption and design time.
Taking both delay and energy into consideration,
I quantify the efficiency of an adder using the
product of energy consumption and worst-case
delay (EDP). This product
represents the design tradeoff between propagation delay and power consumption. Having implemented the two basic designs and compared
their performance, I find that a smaller delay indeed
requires greater power, which can result in a large
energy-delay product. Therefore, I design and
test two optimized adders: a hybrid adder
and a ripple-carry-opt adder. The hybrid adder is a
composite of RCA and CLA. It cascades two 2-bit adders to add 4-bit numbers, which
is similar to the RCA, and each 2-bit adder is designed
as a CLA, which has a small propagation delay. In
this way, the delay of the whole 4-bit adder can be
smaller than that of the RCA and its energy consumption
can be smaller than that of the CLA, which gives
better performance in terms of EDP. The ripple-carry-opt adder is almost the same as the RCA, except that
the full adder is optimized with a mirror CMOS structure,
so that the total delay of the adder can
be smaller than that of the basic RCA.
This article i) describes how to implement
and verify the two basic designs, RCA and CLA,
ii) measures the delay and energy performance of
the adders, and iii) proposes two optimized designs
that can be more efficient in terms of EDP. The
article is organized as follows: Section II describes
how these four adders are designed, the intuition behind each design, and the behavior I expect
to see from the schematics. Section III provides the
EDPs in a table and discusses the efficiency of all
the designs. Section IV concludes the
project.
II. DESIGNS
To implement and test the different adders, I create
schematics in Virtuoso for all four kinds of 4-bit adders
using static CMOS logic. All the designs share
three input signals, In A (4-bit operand), In B (4-bit operand), and C in (1-bit carry in), and two output
signals, S out (4-bit) and C out (1-bit).
Fig. 3. Schematic of 4-bit Carry Lookahead Adder (CLA)
A. Ripple-carry Adder
The top-level schematic of the RCA is shown in Fig. 1, and
the schematic of the full adder used in this design is shown in Fig. 2.
The design cascades multiple full adders to add 4-bit numbers, with each carry out connected to
the carry in of the next full adder.
Fig. 1. Schematic of 4-bit Ripple Carry Adder (RCA)
Fig. 2. Schematic of Fulladder
Fig. 4. Schematic of Propagation Fulladder
This ripple-carry adder works in the same way as
the pencil-and-paper method of addition. Starting from
the least significant bit, the two corresponding bits
are added and the carry is obtained. This carry is
the input to the second bit addition, which produces
another sum and carry. The carry propagates in this way
from one full adder to the next, and the final result is available at the
last full adder. The schematic/layout of the ripple-carry
adder is simple, which allows for fast design time.
The propagation delay can easily be calculated by
inspection of the full adder circuit. The worst-case
delay is along the carry propagation path,
which is relatively long because each full adder
must wait for the carry bit to be calculated by
the previous full adder. On the other hand, since
the circuit is simple, the power consumption of this
adder should be relatively small.
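The ripple behavior described above can be illustrated with a small bit-level model. This is only a functional sketch in Python, not the CMOS schematic from the report; the function names `full_adder` and `ripple_carry_add` are hypothetical, and the full adder uses the usual Boolean equations rather than the gate arrangement of Fig. 2.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out) for bits a, b, cin."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(in_a, in_b, c_in=0, width=4):
    """Add two width-bit integers by chaining full adders from the LSB up."""
    carry = c_in
    s_out = 0
    for i in range(width):
        a_i = (in_a >> i) & 1
        b_i = (in_b >> i) & 1
        # Each stage needs the carry produced by the previous stage.
        s_i, carry = full_adder(a_i, b_i, carry)
        s_out |= s_i << i
    return s_out, carry
```

The loop makes the dependency explicit: stage i cannot finish until stage i-1 has produced its carry, which is why the carry chain sets the worst-case delay.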
B. Carry-lookahead Adder
To reduce the computation time, another way to
implement the 4-bit adder is to use a carry-lookahead
unit that generates the carries for all bit positions in parallel. The top-level schematic of the CLA is shown in
Fig. 3. It generates the carry
propagate and generate signals (P and G) for each full adder and calculates all the carries simultaneously.
The schematics of the propagation full adder and the carry-lookahead unit are shown in Fig. 4 and Fig. 5.
Fig. 5. Schematic of 4-bit Carry Look Ahead Unit
Fig. 6. Schematic of 4-bit Hybrid Adder
Carry-lookahead logic uses the concepts of generating and propagating carries. In binary
addition, a bit position generates a carry if and only if both of
its inputs are 1, $G = AB$; it propagates a carry if and only
if at least one of its inputs is 1, $P = A + B$. Given
these definitions, a position produces a
carry precisely when it either generates one
or propagates the carry coming from the next less significant bit:
$C_{i+1} = G_i + P_i C_i$. For each bit pair in the binary
numbers to be added, the carry-lookahead logic
determines whether that bit pair will generate
a carry or propagate a carry. This allows the circuit
to pre-process the two numbers being added and
determine the carries ahead of time. Then, when the
actual addition is performed, there is no delay from
waiting for the carry to ripple. Specifically, for the
4-bit CLA, the carries are calculated as follows:
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + G_0 P_1 + C_0 P_0 P_1$
$C_3 = G_2 + G_1 P_2 + G_0 P_1 P_2 + C_0 P_0 P_1 P_2$
$C_4 = G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3 + C_0 P_0 P_1 P_2 P_3$
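The four carry equations above can be checked with a short functional model. The following Python sketch evaluates each carry directly from the G and P signals and the input carry, with no carry depending on another computed carry. The names `cla_carries` and `cla_add` are hypothetical, and forming the sum bits as A xor B xor C is an assumption about how the propagation full adder of Fig. 4 produces its sum.

```python
def cla_carries(in_a, in_b, c0=0):
    """Return [C0, C1, C2, C3, C4] for two 4-bit operands using the
    expanded lookahead equations (G = A AND B, P = A OR B)."""
    a = [(in_a >> i) & 1 for i in range(4)]
    b = [(in_b >> i) & 1 for i in range(4)]
    g = [a[i] & b[i] for i in range(4)]
    p = [a[i] | b[i] for i in range(4)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))
    return [c0, c1, c2, c3, c4]


def cla_add(in_a, in_b, c0=0):
    """4-bit addition: sum bit i is A_i xor B_i xor C_i (assumed)."""
    c = cla_carries(in_a, in_b, c0)
    s_out = 0
    for i in range(4):
        a_i = (in_a >> i) & 1
        b_i = (in_b >> i) & 1
        s_out |= (a_i ^ b_i ^ c[i]) << i
    return s_out, c[4]
```

Because every carry is a two-level expression of the inputs, all carries are available at roughly the same time, at the cost of wider gates for the higher bits.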
With this carry-lookahead unit, the CLA
can achieve a smaller worst-case delay than the RCA.
However, because of its more complex logic, it
should consume more power than the RCA.
C. Hybrid Adder
Faster digital circuits usually require greater
power, so propagation delay and power consumption generally form a design tradeoff. The
RCA has a larger delay and smaller power, while
the CLA has a smaller delay and larger power. To
implement a more efficient 4-bit adder, I combine
RCA and CLA to get a "hybrid" adder,
which cascades two 2-bit adders to add 4-bit
numbers, with each 2-bit adder designed as a carry-lookahead adder. The top-level schematic of this
adder is shown in Fig. 6 and the 2-bit carry-lookahead unit
is shown in Fig. 7.
Fig. 7. Schematic of 2-bit Carry Lookahead Unit
In this circuit, since the adder is partly carry
lookahead and partly ripple carry, its worst-case
delay should lie between the delays of the RCA and
the CLA. And since its logic is simpler than that of the CLA,
its power consumption should also lie between those of the RCA
and the CLA. Therefore, its overall performance in terms of
combined delay and energy could be better than that of either
the RCA or the CLA.
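The structure of the hybrid adder can be sketched functionally as two 2-bit lookahead blocks with the carry rippling between them. This Python model is an illustration under assumptions: `cla2_block` and `hybrid_add` are hypothetical names, and the sum bits are formed as A xor B xor C, which may differ from the gate-level details of Fig. 6 and Fig. 7.

```python
def cla2_block(a, b, c_in):
    """2-bit carry-lookahead block: returns (2-bit sum, carry out)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    g0, p0 = a0 & b0, a0 | b0
    g1, p1 = a1 & b1, a1 | b1
    # Both carries are computed directly from G, P and the block carry in.
    c1 = g0 | (p0 & c_in)
    c2 = g1 | (g0 & p1) | (c_in & p0 & p1)
    s0 = a0 ^ b0 ^ c_in
    s1 = a1 ^ b1 ^ c1
    return s0 | (s1 << 1), c2


def hybrid_add(in_a, in_b, c_in=0):
    """4-bit hybrid adder: lookahead inside each 2-bit block,
    ripple carry between the two blocks."""
    low, c_mid = cla2_block(in_a & 0b11, in_b & 0b11, c_in)
    high, c_out = cla2_block((in_a >> 2) & 0b11, (in_b >> 2) & 0b11, c_mid)
    return low | (high << 2), c_out
```

Only one carry (`c_mid`) ripples between blocks, and the widest product term has three inputs instead of the five needed for the full 4-bit lookahead unit.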
D. Ripple-carry Optimized Adder
Knowing that the worst-case delay of the ripple-carry
adder is determined by the delay from carry in to carry out
of each full adder, I investigate an optimized
full-adder design to see whether it gives better
performance. The top level of the ripple-carry adder
is unchanged, while the full-adder schematic is
different, as shown in Fig. 8.
Fig. 8. Schematic of optimized fulladder
Fig. 9. Matlab Results of Functionality Verification
Fig. 10. Worst case delay of different adders with vdd = 3.
Figure panels: (a) RCA, (b) CLA, (c) hybrid adder, (d) Opt-RCA
TABLE I
DELAY, ENERGY AND EDP OF ADDERS UNDER DIFFERENT VDD
Vdd (V)  Adder    Delay (ns)  Energy (10^-10)  EDP (10^-9)
1        RCA      5.46        0.84             0.46
1        CLA      5.16        0.91             0.47
1        hybrid   5.24        0.84             0.44
1        Opt-RCA  3.04        0.94             0.28
2        RCA      2.40        0.34             0.82
2        CLA      2.01        0.38             0.76
2        hybrid   2.10        0.35             0.73
2        Opt-RCA  1.10        0.37             0.41
3        RCA      1.55        0.78             1.21
3        CLA      1.51        0.88             1.33
3        hybrid   1.53        0.80             1.23
3        Opt-RCA  0.91        0.86             0.79
The intuition behind this full-adder optimization is to
minimize the delay from carry in to carry out of the
full adder, which is the critical path of the ripple-carry
adder. As the schematic shows, the carry out is calculated at the first CMOS stage, which is much faster
than in the basic design built from standard gates. At the
same time, this design changes the CMOS structure
of the full adder in a way that keeps the power consumption
small, so its combined delay and
energy performance could be better than that of the two basic designs,
RCA and CLA.
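The idea of computing the carry out first can be sketched at the logic level. The Python model below assumes the textbook mirror-adder formulation, in which the first stage produces the inverted carry and the second stage reuses it to form the sum; the exact transistor-level schematic of Fig. 8 is not reproduced here, and the name `mirror_full_adder` is hypothetical.

```python
def mirror_full_adder(a, b, cin):
    """Logic-level model of a mirror-style full adder (assumed textbook form).

    Stage 1 produces the inverted carry; stage 2 reuses that signal to
    produce the inverted sum. Returns (sum, carry_out).
    """
    # Stage 1: carry out depends only on the inputs, so it is ready first.
    cout_n = 1 - ((a & b) | (cin & (a | b)))
    # Stage 2: sum = A*B*Cin + (A + B + Cin) * NOT(Cout), computed inverted.
    sum_n = 1 - ((a & b & cin) | ((a | b | cin) & cout_n))
    return 1 - sum_n, 1 - cout_n
```

In this formulation the carry path passes through a single complex gate per bit, which is consistent with the shorter carry chain delay the section describes.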
III. RESULTS AND DISCUSSION
To verify the functionality of these designs, I use
Spectre to run the simulation and check the outputs
in Matlab. The inputs are varied over every bit of
In A and In B, assuming the carry in is always 0.
The frequency of In A0 is set to 10 MHz in
order to ensure correct functionality of all adders.
It can be seen from Fig. 9 that all these
designs work correctly.
To quantify the efficiency of these designs, I
use the product of energy consumption ($E$) and
worst-case delay ($t_{wc}$): $EDP = E \times t_{wc}$. For $t_{wc}$,
the worst-case propagation delay is given by the
transition of the inputs from In A=1111, In B=1111
to In A=0000, In B=0000, assuming carry in is
always 0. This delay is obtained manually
by viewing and measuring the output waveforms
for this transition in Spectre, as in Fig. 10. For the
energy calculation, I export the power of the outputs
and use Matlab to calculate the total energy of the
simulation. Since the source voltage influences
both delay and energy, I calculate the EDP under
different voltages, vdd = 1, 2, 3, in order to test all these designs in different cases. The detailed EDPs
are given in Table I.
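The energy and EDP calculation can be sketched as follows. The report does not say how the exported power is integrated in Matlab, so the trapezoidal rule used here is an assumption, the function names `total_energy` and `edp` are hypothetical, and the sample waveform is invented purely for illustration.

```python
def total_energy(times, power):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        energy += 0.5 * (power[k] + power[k - 1]) * dt
    return energy


def edp(energy, t_wc):
    """Energy-delay product: EDP = E * t_wc."""
    return energy * t_wc


# Hypothetical power samples, not taken from the report's simulations.
times = [0.0, 1e-9, 2e-9, 3e-9]
power = [0.0, 2e-3, 2e-3, 0.0]
energy = total_energy(times, power)
print(energy, edp(energy, 1.55e-9))
```

A lower EDP indicates a better balance between speed and energy, which is why it is used as the single figure of merit when comparing the four designs.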
From the EDP results, we can observe the following:
• The RCA has the largest delay among these designs because of the waiting time for carry propagation.
As the input size grows, its delay will become much
larger than that of the others, but its power consumption
should remain smaller than that of the others.
• Among the RCA, CLA and hybrid adder, the CLA has the
largest power consumption because of the complex structure
of its carry-lookahead unit, and the smallest delay
because of
its carry pre-processing. As the input size grows,
its delay will increase more slowly than that of the others, but
its power consumption could increase severely.
• The hybrid adder is slightly better than the two basic designs, RCA and CLA, because of a good
tradeoff between delay and energy. The reason
that it shows little improvement under vdd = 3 is
that the adder has only 4 bits, so the
difference in delay between RCA and CLA is
small and power becomes the key factor in
the performance. If the input size grows,
this design could be much better in EDP because of a smaller delay than the RCA and a smaller
energy consumption than the CLA.
• The optimized RCA has the best performance
among these four designs because of the special
design of its full adder. Since the CMOS structure
of its full adder is optimized to minimize the carry propagation delay, it can
be much more efficient than the basic RCA
and even better than the CLA. However, as the
input size grows, its delay will increase
in proportion to the number of bits. Therefore,
it will not be better than the hybrid adder if the
input size is large enough.
• The EDP changes if the source voltage
is changed. With a larger vdd, the energy will be
larger and the delay will be smaller (since the
CMOS charging time will be shorter). It can
be seen that at some source voltages the RCA
is better than the CLA, while at others the
CLA is better than the RCA.
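The comparison across supply voltages can be reproduced directly from the EDP column of Table I. The Python sketch below only re-sorts the reported values; the dictionary layout and the helper name `rank_by_edp` are hypothetical.

```python
# EDP values as listed in Table I (units of 10^-9), keyed by Vdd in volts.
EDP_TABLE = {
    1: {"RCA": 0.46, "CLA": 0.47, "hybrid": 0.44, "Opt-RCA": 0.28},
    2: {"RCA": 0.82, "CLA": 0.76, "hybrid": 0.73, "Opt-RCA": 0.41},
    3: {"RCA": 1.21, "CLA": 1.33, "hybrid": 1.23, "Opt-RCA": 0.79},
}


def rank_by_edp(vdd):
    """Return the adder names ordered from lowest (best) to highest EDP."""
    row = EDP_TABLE[vdd]
    return sorted(row, key=row.get)


for vdd in sorted(EDP_TABLE):
    print(vdd, rank_by_edp(vdd))
```

With these values the optimized RCA ranks first at every voltage, while the RCA has a lower EDP than the CLA at vdd = 1 and vdd = 3 and the CLA has a lower EDP than the RCA at vdd = 2.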
IV. CONCLUSION
In this project, I implement and verify four different designs of 4-bit adders: RCA, CLA, hybrid
adder and optimized RCA. By measuring and comparing
adder performance in terms of delay and power
consumption, I find that the best design for a 4-bit
adder is the optimized RCA, because the input size is
relatively small. As the input size increases, the
difference in delay and energy between RCA and
CLA should grow and the hybrid adder should
become better than the others. The calculated
EDPs are consistent with the analysis of the schematics,
which gives guidance for designing a
performance-optimized adder.