Hardware Efficient VLSI Adder For Low Power Applications

advertisement
International Journal of Engineering Trends and Technology- Volume3Issue5- 2012
Hardware Efficient VLSI Adder For Low Power
Applications
Nune.Radika1, K.Rajasekhar2
1. Department of E.C.E, Akula Sree Ramulu Engineering College, Tetali
2. H.O.D, Department of E.C.E, Akula Sree Ramulu Engineering College, Tetali
Abstract—
Carry Select Adder (CSLA) is one of the fastest
adders used in many data-processing processors to perform fast
arithmetic functions. From the structure of the CSLA, it is clear
that there is scope for reducing the area and power consumption
in the CSLA. This work uses a simple and efficient gate-level
modification to significantly reduce the area and power of the
CSLA. Based on this modification 8-, 16-, 32-, and 64-b squareroot CSLA (SQRT CSLA) architecture have been developed and
compared with the regular SQRT CSLA architecture. The
proposed design has reduced area and power as compared with
the regular SQRT CSLA with only a slight increase in the delay.
This work evaluates the performance of the proposed designs in
terms of delay, area, power, and their products by hand with
logical effort and through custom design and layout in 0.18- m
CMOS process technology. The results analysis shows that the
proposed CSLA structure is better than the regular SQRT
CSLA. A simple approach is proposed in this paper to reduce the
area and power of SQRT CSLA architecture. The reduced
number of gates of this work offers the great advantage in the
reduction of area and also the total power. The compared results
show that the modified SQRT CSLA has a slightly larger delay
(only 3.76%), but the area and power of the 64-b modified SQRT
CSLA are significantly reduced by 17.4% and 15.4%
respectively. The power-delay product and also the area-delay
product of the proposed design show a decrease for 16-, 32-, and
64-b sizes which indicates the success of the method and not a
mere tradeoff of delay for power and area.
Keywords— Modelsim 6.4c, Xilinx ISE 10.1i,Binary to excess3
converter, carry look ahead adder
The CSLA is used in many computational systems to
alleviate the problem of carry propagation delay by
independently generating multiple carries and then select a
carry to generate the sum [1]. However, the CSLA is not area
efficient because it uses multiple pairs of Ripple Carry Adders
(RCA) to generate partial sum and carry by considering
carry .Then the final sum and carry are selected by the
multiplexers (mux).
The basic idea of this work is to use Binary to Excess-1
Converter (BEC) instead of RCA in the regular CSLA to
achieve lower area and power consumption [2]–[4]. The main
advantage of this BEC logic comes from the lesser number of
logic gates than the 3-bit Full Adder (FA) structure. The
details of the BEC logic are discussed .
This brief is structured as follows. Section II deals with the
delay and area evaluation methodology of the basic adder
blocks. Here we present the detailed structure and the function
of the BEC logic. The SQRT CSLA has been chosen for carry
bits before the sum, which reduces the wait time to calculate
the result of the larger value bits. We will start by explaining
the operation of one-bit full adder which will be the basis for
Comparison with the proposed design as it has a more
balanced delay, and requires lower power and area [5], [6].
A .BEC
As stated above the main idea of this work is to use BEC
instead of the RCA in order to reduce the area and power
consumption of the regular CSLA. To replace the 4-bit RCA,
I. INTRODUCTION
Design of area- and power-efficient high-speed data path logic
an 4-bit BEC is required. A structure and the function table of
systems are one of the most substantial areas of research in
illustrates how the basic function of the CSLA is obtained by
VLSI system design. In digital adders, the speed of addition is
using the 4-bit BEC together with the mux. One input of the
limited by the time required to propagate a carry through the
8:4 mux gets as it input (B3, B2, B1, and B0) and another
adder. The sum for each bit position in an elementary adder is
input of the mux is the BEC output. This produces the two
generated sequentially only after the previous bit position has
possible partial results in parallel and the mux is used to select
been summed and a carry propagated into the next position.
either the BEC output or the direct inputs according to the
a 4-b BEC are shown in Fig. and Table II, respectively. Fig. 3
control signal Cin. The importance of the BEC logic stems
ISSN: 2231-5381
http://www.internationaljournalssrg.org
Page 650
International Journal of Engineering Trends and Technology- Volume3Issue5- 2012
from the large silicon area reduction when the CSLA with
The structure of the 16-b regular SQRT CSLA is shown in Fig.
large number of bits are designed. The Boolean expressions of
4. It has five groups of different size RCA. The delay and area
the 4-bit BEC is listed as (note the functional symbols NOT,
evaluation of each group are shown in Fig. 5, in
& XOR)
which the numerals within [] specify the delay values, e.g.,
sum2 requires 10 gate delays. The steps leading to the
evaluation are as follows Except for group2, the arrival time
of mux selection input is always greater than the arrival time
of data outputs from the RCA’s. Thus, the delay of group3 to
group5 is determined.
Fig.4-bit BEC.
B. METHODOLOGY OF THE BASIC ADDER BLOCKS
The AND, OR, and Inverter (AOI) implementation of an
XOR gate is shown in Fig.The gates between the dotted lines
are performing the operations in parallel and the numeric
representation of each gate indicates the delay contributed by
that gate. The delay and area evaluation methodology
considers all gates to be made up of AND, OR, and Inverter,
each having delay equal to 1 unit and area equal to 1 unit. We
then add up the number of gates in the longest path of a logic
block that contributes to the maximum delay. The area
evaluation is done by counting the total number of AOI gates
required for each logic block. Based on this approach, the
CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are
evaluated
Fig. Regular 16-b SQRT CSLA.
C. METHODOLOGY OF MODIFIED 16-B SQRT CSLA
The structure of the proposed 16-b SQRT CSLA using BEC
for RCA optimize the area and power is shown in Fig. 6. We
again split the structure into five groups. The delay and area
estimation of each group are shown in Fig. The steps leading
to the evaluation are given here. Instead of another 2-b RCA
with a 3-b BEC is used which adds one to the output from 2-b
RCA.,Thus, the sum3 and final
depending
(output from mux) are
mux and partial (input to mux) and mux,
respectively. The sum2 depends mux. 2) For the remaining
group’s the arrival time of mux selection input is always
greater than the arrival time of data inputs from the BEC’s.
Thus, the delay of the remaining groups depends on the arrival
time of mux selection input and the mux delay.
Fig BEC with 8:4 mux.
C. METHODOLOGY OF REGULAR 16-B SQRT CSLA
ISSN: 2231-5381
http://www.internationaljournalssrg.org
Page 651
International Journal of Engineering Trends and Technology- Volume3Issue5- 2012
After this step, a 40MHz clock signal is applied to
the clock port of the root module, and the synthesis tool is
programmed not to modify the clock tree during the
optimization phase. In addition, an arbitrary input delay of
5ns with respect to the clock port is applied to all input and
output ports (except the clock port itself) to set a safe margin
by considering any unintended source of delay such as the
delay associated with driving module/modules.
Then, the design is constrained with hypothetical
maximum area equal to zero to force the tool to make the gate
level netlist as compact as possible.
Fig. The parallel RCA with BEC.
II IMPLEMENTATION OF ALGORITHM
A primary objective of this project was to develop a
synthesizable model for the AES128 encryption algorithm.
Synthesis is the process of converting the register transfer
level (RTL) representation of a design into an optimized gatelevel netlist. This is a major step in ASIC design flow that
takes an RTL model closer to a low-level hardware
In the next steps, the tool is programmed to
consider a unique design for each cell instance by
removing the multiply-instantiated hierarchy in the current
design. Then, the synthesis script removes the boundaries
from all the components in the design hierarchy and removes
all levels of hierarchy.
Finally, the tool compiles the design with high effort
and reports any warning related the mapping and final
optimization step. At the end, the tool generates reports for
the optimized gate level netlist area, the worst combinational
path timing, and any violated design constraint.
implementation.
Synthesis consists of three main steps. The first step
is the “Translation”, which involves converting the RTL
description of a design into a non-optimized intermediate
representation that is used by the synthesis tool. The second
step is the “logic optimization”, which optimizes the internal
representation by removing redundant logic and performing
Boolean logic optimizations.
The third step is called
“technology mapping & optimization” which maps the
internal representation to an optimized gate level
representation using the technology library cells based on
design constraints.[3]
A. SYNTHESIS METHODOLOGY:
The first step in the synthesis process is to read all
the components in the design hierarchy. There are three
components in the 3-level design hierarchy that needs to be
synthesized.
Since the RTL model utilizes a Verilog
“Package”, then the synthesis tool needs to enable the
semantics of a package. In addition, the synthesis tool needs
to know if there are multiple instances of calling an automatic
function in the design, to preserve separate values for each
instance.
After reading the design files, they are “Analyzed”
and “Elaborated” through which the RTL code is converted
into the Synopsys Design Compiler(SDC) internal format. [6]
The intermediate results are stored in the defined “working
library”.
ISSN: 2231-5381
Figure RTL Schematic report
B.SYNTHESIS TIMING RESULT:
The synthesis tool optimizes the combinational paths in a
design. In General, four types of combinational paths can
exist in any design: [3]
Input port of the design under test to input of one internal
flip-flip
http://www.internationaljournalssrg.org
Page 652
International Journal of Engineering Trends and Technology- Volume3Issue5- 2012
Output of an internal flip-flip to input of another flip-flip
Output of an internal flip-flip to output port of the design
under test
A combinational path connecting the input and output ports
of the design under test
The last DC command in the script developed in
previous section, instructs the tool to report the path with the
worst timing. In this case, the path with the worst timing is a
combinational path of type two. The delay associated with
this path is the summation of delays of all combinational gates
in the path plus the Clock-To-Q delay of the originating flipflop, which was calculated as 24.09ns. By considering the
setup time of the destination flip-flop in this path, which is
0.85ns, the 40MHz clock signal satisfies the worst
combinational path delay. The delays of combinational gates,
setup time of flip-flops and Clock-To-Q values are derived
from the LSI_10k library file that was used for the mapping
step during synthesis. The synthesis timing report is shown
below:
combinational and sequential areas. The synthesis area report
is shown below:
III CONCLUSION
In this paper we proposed new adder in order to
reduce the area and power of SQRT CSLA architecture. The
reduced number of gates of this work offers the great
advantage in the reduction of area and also the total power.
The compared results show that the modified CSLA low area
and power as compared to SQRT CSLA. The power-delay
product and also the area-delay product of the proposed
design show a decrease for 16-, 32-, and 64-b sizes which
indicates the success of the method and not a mere tradeoff of
delay for power and area. The modified CSLA architecture is
therefore, low area, low power, simple and efficient for VLSI
hardware implementation.
REFERENCES
[1]
FIGURE. SIMULATED OUTPUT.
C SYNTHESIS AREA RESULT:
The synthesis area report shows the total number of cells and
nets in the netlist. It also uses the area parameter associated
with each cell in the LSI_10K library file, to calculate the total
combinational and sequential area of the netlist. The total
area of the gate level netlist is unknown since it depends on
total area of the interconnects, which itself is a function of the
wiring load model used in physical design. The total cell area
in the netlist is reported as 22978 units, which is the sum of
ISSN: 2231-5381
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[1] A. Avizienis, “Signed Digit Number Representation for Fast
Parallel
Arithmetic,” IRE Trans. Electron. Computers, vol. EC-10, pp.389400, 1961.
[2] R. I. Hartley, “Subexpression sharing in filters using canonic signed
digit multipliers,” IEEE Trans. Circuits Syst. II, vol. 43, no. 10, pp.
677–688, Oct. 1996.
[3] T. Solla and O. Vainio, “Comparison of programmable FIR filter
architectures for low power,” in Proc. 28th Eur. Solid-State Circuits
Conf., Firenze, Italy, pp. 759–762, Sep. 2002.
[4] K. H. Chen and T. D. Chiueh, “A low-power digit-based
reconfigurable FIR filter,” IEEE Trans. Circuits Syst. II, vol. 53, no.
8, pp. 617–621, Aug. 2006.
[5] T. Zhangwen, J. Zhang, and H. Min, “A high-speed, programmable,
CSD coefficient FIR filter,” IEEE Trans. Consumer Electron., vol. 48,
no. 4, pp. 834–837, Nov. 2002.
[6] J. Park, et al., "Computation Sharing Programmable FIR Filter for
Low-Power and High-Performance Applications", IEEE J. Solid state
Cir. Sys., vol.39, no.2, pp.348-357, Feb. 2004.
[7] R. Mahesh and A. P. Vinod, "New Reconfigurable Architectures
for
Implementing FIR Filters with Low Complexity", IEEE Trans. CAD
of Int. Cir., vol. 29, no.2, pp.275-288, Feb. 2010.
[8] K. Dabbagh-Sadeghipour, A. Aghagolzadeh, "A New Hardware
Efficient, Low Power Fir Digital Filter Implementation Approach",
Proc. IEEE ICECS,
http://www.internationaljournalssrg.org
Page 653
Download