Implementation of Constant Delay Model Implementation of Low

advertisement
1
Implementation of Constant Delay Model
Implementation of Low Power Constant Delay
Model
T. Leela Krishna 1, Dr.J. Selva Kumar 2.
1. SRM University, Kattankulathur, Tamil Nadu 603203, INDIA, e-mail: thota.leelakrishna@gmail.com ,
2. SRM University, Kattankulathur, Tamil Nadu 603203, INDIA, e-mail: selvakumar.j@ktr.srmuniv.ac.in
ABSTRACT
This paper describes a unique logic style description named Constant Delay Logic (CDL) style,
targeting at full custom implementation of high speed & low power applications. Pre evaluation of
output before the arrival of inputs from the preceding stages is ready becomes an added advantage of
CDL style. Besides adjusting the width of timing window, clock allocation & its distribution are
considered as important design factors. Power consumption is significantly reduced, but the precharge propagation path delay affects the speed performances and limits the energy–delay product
(EDP) improvements. Using 45-nm general purpose CMOS technology, CDL is evaluated for single
cycle multi-staged circuit block. Simulation results unveil that the CDL achieves better performance
and is more energy efficient than the other logic styles for the implementation on several circuits.
Keywords:
Constant delay logic model (CDL), contention, glitch, pre – evaluation.
1. INTRODUCTION
Ever growing demand of Low Power energy efficient high
performance very large scale integration (VLSI) can be
addressed at different design levels mentioned as the
architectural, algorithmic, circuit, layout, and the process
technology level [2]. At the circuit design level, power saving
can be done by means of selecting proper choice of a logic
style for implementing arbitrary circuits. This is because all
the important parameters governing power dissipation like
switching capacitance in the entire circuit, transition activity,
and short-circuit currents which are strongly influenced by the
logic style selected. Application dependency, implementation
methodology of the circuit, and the design technique used,
various performance aspects become vital, disallowing the
articulation of universal rules for optimal logic styles.
With ever increasing clock frequency, energy efficient logic
style which can be useful in the design of low power
consuming circuits became a pre-requisite. The power
dissipation characteristics of various existing logic styles are
compared qualitatively and quantitatively by actual logic gate
implementations & simulations are carried under realistic
circuit arrangements and operating conditions [2].
However, the performance enhancement comes with several
costs, including a reduced noise margin, a problem of chargesharing and higher power dissipation due to a higher data
activity. Several variations of the dynamic domino logic,
namely NP domino (NORA domino) [2], zipper domino [4],
and data-driven dynamic logic (D3L) [5], [6], have been
proposed but they are never widespread in the VLSI industry
[7], [8].
A tremendous effort on research has been dedicated to explore
new logic styles that go beyond dynamic domino & static
logic’s. In specific, source-coupled logic (SCL) [12]
demonstrated victor performances that are difficult to achieve
by any other logic styles. However, it lacks due to its
sufferings from high power dissipation due to a constant
current draw, and its differential nature requires
complementary signals.
Pseudo-NMOS logic uses a single pull-up PMOS transistor,
provides both high speed and less transistor count at the
expense of high static power consumption as well as reduced
output voltage swing. Output prediction logic (OPL) [11] has
also shown superior performance in high-speed adders [6].
Yet, OPL requires the generation and distribution of
multiphase clock signals with small timing separations and
low skews, which are hard to achieve. Though numerous highspeed logic styles have been proposed, dynamic and CDL still
remain the enthralling choices when performance is the
primary concern.
In recent years, a new way of logic operation named feed
through logic (FTL) [12], has been proposed, which has
demonstrated its high-performance potentiality.
2
Implementation of Constant Delay Model
The rest of the paper is organised as follows. Section 2 gives
an introduction to the most important existing static &
dynamic logic styles and compares them qualitatively. Section
3 provides the importance of FTL & introduction of CDL.
Section 4 renders the design of CDL & measurement of
unwanted glitch voltage using first order approximation of
Taylor’s series. Section 5 dissects the impact of timing
window width on CDL & it’s comparison with other logic
style’s using different logic expressions. Section 6 concludes
with some final comments.
A optimum robust operation of CDL is achieved as long as
clock signals reaches earlier than the input signals. In this
paper, we exhibit that:
1) CDL surpasses other logic styles as it has better energy
efficiency which is well suited for high-performance digital
blocks &
2) Robustness under extreme process and temperature
variations (PVT).
2. LOGIC STYLES
3.1. FTL Logic
The logic style employed in logic gates basically
influences the speed, wiring complexity, power dissipation,
and the size of a circuit.
FTL [13] [Figure 2] in CMOS technology basic operation is as
follows:
Consider dynamic domino logic [Figure 1.], the critical path
consists of NMOS logic transistors. In FTL, the part of the
clock & logic transistor is replaced & the clock transistor itself
is included in critical path.
3. EXPLOITATION OF CDL
When CLK is high, the pre discharge period occurs and Out is
pulled down to GND through M2. When CLK is low, M1 is
on, M2 is off, and the gate moves into the evaluation period.
Figure 2: FTL
Figure 1: Schematic of Dynamic Domino Logic
with Footer transistor
A high performance capable feed through logic (FTL) came
into existence. The first era of FTL discloses many defects
which includes excessive power dissipation & reduced noise
margin.
To alleviate these problems proposed new high performance
logic is Constant Delay Logic (CDL).CDL renders a local
window technique & a self-reset circuit where robust logic
operation is enabled with minimum power consumption
besides maintaining speed advantage of FTL.
The uniqueness of CDL from previously existing logic styles
is delay which is not affected by the logic expression on firstorder approximation,. CDL does not need complementary
signals & can be easily incorporated with static as well as
dynamic domino logics. CDL is exempted to suffer from
constant static power dissipation which usually occurs in
Pseudo NMOS. Moreover, timing requirement of the clock in
CDL is not as rigorous as OPL.
If inputs (IN) are at logic “1,” Out enters into the contention
mode where M1 and transistors in the NMOS pull-down
network (PDN) are conducting current simultaneously. If PDN
is off, then the output quickly rises to logic “1.” In this case,
FTL’s critical path is always a single PMOS transistor.
Contempting the performance advantage, FTL is affected from
reduced noise margin, excess direct path current and output
voltage which is nonzero nominally low. It is because of
contention between M1 and NMOS PDN during the
evaluation period. Furthermore, cascading multiple FTL
stages together to perform complicated logic evaluations is not
practical. Consider a chain of inverters implemented in FTL
cascaded together and driven by the same clock, as shown in
Figure 3.
When CLK is low, M1 of every stage turns on, and the output
of every stage begins to rise. This will result in false logic
evaluations at even numbered (i.e., 2, 4, 6, etc.,) stages since
initially there is no contention between M1 and NMOS PDN
because all inputs to NMOS transistors are reset to logic “0”
during the reset period.
3
Implementation of Constant Delay Model
Figure 5: CDL Basic Buffer
Figure 3: Inverter Chain with FTL
In order to resolve the above problems, CDL is proposed with
a schematic shown in Figure 4.
Figure 6: Timing diagram
4. DESIGN CONSIDERATIONS
4.1 CDL Sizing:
The sizing of INV1-3 and M3–M6 in Figure 5 is close to the
minimum size which will not create any effect on area.
Modification of the length of INV1-3 provides the required
timing window duration based on requirement.
Figure 4: CDL Block Diagram
Timing block (TB) creates an adjustable window period in
order to minimise the static power dissipation. Logic Block
(LB) helps to eliminate the unwanted glitch and also makes
cascading CDL feasible. A buffer implemented in CDL with
schematics of TB and LB is shown in Figure 5
4.1.1 CDL Versus Pseudo-NMOS:
Though pseudo-NMOS and CDL are ratioed circuits, they
depend on the correct PMOS to NMOS strength ratio to
perform correct logic operations[.3] PMOS transistor width is
adjusted one-half the strength of the NMOS PDN as a tradeoff between noise margin and speed in pseudo-NMOS [1]. On
the other hand, CDL always discharges X to GND when CLK
is high, therefore, CDL can be optimized for low-to-high
transition only. In order to provide more speed up, PMOS
clock transistors in CDL can be upsized larger until the output
glitch is maintained at an acceptable level.
4
Implementation of Constant Delay Model
4.2 Output Glitch Measurement:
Figure 7 provides a simplified schematic of CDL during the
contention mode, where both transistors P1 and N1 are on
simultaneously and induce a glitch voltage
Figure 7 again transistor N2 operates in the subthreshold
region while P2 is working in the linear mode. Equating the
two current equations yields
π‘Šπ‘›2
𝐼𝐷 =
𝐼𝑒
𝐿𝑛2 1
𝑉
− 𝑑𝑠𝑛2
𝑉𝑔𝑠𝑛2 −𝑉𝑑𝑛
(1−𝑒 𝑉𝑇 )
πœ‚π‘‰π‘‡
π‘Šπ‘2
(𝑉𝑑𝑠𝑝2 )
= πœ‡π‘ πΆπ‘œπ‘₯
[(𝑉𝑔𝑠𝑝2 − 𝑉𝑑𝑝 )𝑉𝑑𝑠𝑝2 −
𝐿𝑝2
2
Where,𝐼𝑑
ΔV1, which in turn generates another smaller glitch ΔV2. By
design, ΔV1 should be small [<VT]. Hence, P1 operates in the
saturation region while N1 is in the linear region. The current
equation is given as
= πœ‡π‘› πΆπ‘œπ‘₯
𝐿𝑛1
[ (𝑉𝑔𝑠𝑛1− 𝑉𝑑𝑛 )𝑉𝑑𝑠𝑛1 −
2
= (𝑉𝐷𝐷 − βˆ†π‘‰1 − 𝑉𝑑𝑝 )βˆ†π‘‰2 −
(−𝑉𝑑𝑠𝑛2 )
⁄𝑉
𝑇
√(𝑉𝐷𝐷 − β–³ 𝑉1 − 𝑉𝑑𝑝 ) 2 − 𝐴𝑒
(1)
]
(β–³ 𝑉1 )2
π‘Šπ‘1
(𝑉𝐷𝐷 − 𝑉𝑑𝑝 ) = (𝑉𝐷𝐷 − 𝑉𝑑𝑛 ) β–³ 𝑉𝑑 −
(2)
4π‘Šπ‘›1
2
𝐴=
△𝑉1 − 𝑉𝑑𝑛
πœ‚π‘‰π‘‡
π‘Šπ‘1
2
β–³ 𝑉𝑑 = (𝑉𝐷𝐷 − 𝑉𝑑𝑛 ) − √((𝑉𝐷𝐷 − 𝑉𝑑𝑛 )2 −
(𝑉 − 𝑉𝑑𝑝 )
2π‘Šπ‘›1 𝐷𝐷
(4)
Assuming Vtp ≈ Vtn, (3) can be approximated as
2
π‘Šπ‘1 (𝑉𝐷𝐷 − 𝑉𝑑𝑝 )
β–³ 𝑉1 ≈ 𝑉𝐷𝐷 − 𝑉𝑑𝑛 − (𝑉𝐷𝐷 − 𝑉𝑑𝑛 ) +
4π‘Šπ‘›1 (𝑉𝐷𝐷 − 𝑉𝑑𝑛 )
4π‘Šπ‘›1
(5)
β–³ 𝑉2 = (𝑉𝐷𝐷 − β–³ 𝑉1 − 𝑉𝑑𝑝 )
𝐴𝑒
△𝑉1 − 𝑉𝑑𝑛
πœ‚π‘‰π‘‡
)
2(𝑉𝐷𝐷 − β–³ 𝑉1 − 𝑉𝑑𝑝 )
𝐴𝑒
is found
through a similar approach. Consider
(11)
△𝑉1 − 𝑉𝑑𝑛
πœ‚π‘‰π‘‡
2(𝑉𝐷𝐷 − △𝑉1 − 𝑉𝑑𝑝 )
(12)
(3)
Equations (5) and (12) provide several first-order design for
CDL. For a given ΔV1, designers can quickly estimate the
required Wp1 to Wn1 ratio. Moreover, ΔV1 is linearly
proportional to the shift of Vt and transistor width in the
presence of process variations.
4.3 Power Consumption
Data activity measures how frequent signals toggle and is
defined as
π·π‘Žπ‘‘π‘Ž 𝐴𝑐𝑑𝑖𝑣𝑖𝑑𝑦 =
# π‘π‘œ π‘œπ‘“ π‘ π‘–π‘”π‘›π‘Žπ‘™ π‘‡π‘Ÿπ‘Žπ‘›π‘ π‘–π‘‘π‘–π‘œπ‘›π‘ 
ΔV2
(10)
Applying Taylor expansion
βˆ†π‘‰2 ≈
By Taylor expansion, it is approximated in first order as
𝑑
𝑑2
𝑑3
𝑑
−
+
βˆ™βˆ™βˆ™βˆ™βˆ™βˆ™βˆ™ ≈ 𝑁 +
2𝑁 8𝑁 3 16𝑁 5
2𝑁
(9)
2π‘Šπ‘›2 𝐼𝑑
4π‘Šπ‘›2
≈
(𝑉 )2 𝑒 1.8
πœ‡π‘ πΆπ‘œπ‘₯ π‘Šπ‘2
π‘Šπ‘2 𝑑
− ((𝑉𝐷𝐷 − β–³ 𝑉1 − 𝑉𝑑𝑝 ) −
ΔV1 can be found by solving the quadratic equation
π‘Šπ‘1 (𝑉𝐷𝐷 −𝑉𝑑𝑝 )
(βˆ†π‘‰2 )2
(8)
2
since (𝑉𝐷𝐷 − β–³ 𝑉2 ) ≫ 𝑉𝑇
≈0
2
≈
(7)
Solving β–³ 𝑉2 = (𝑉𝐷𝐷 − β–³ 𝑉1 − 𝑉𝑑𝑝 ) −
where μp and μn are the hole and electron mobility of PMOS
and NMOS transistors, respectively, Cox is the oxide
capacitance, W and L are the transistor width and length,
respectively, and Vgs and Vds are the transistor gate-to-source
and drain-to-source voltages, respectively. Assuming same
length devices are used, and μp ≈ 0.5 μn, rearranging (1) gives
√𝑁 2 + 𝑑 = 𝑁 +
, 𝑉𝑇 = πΎπ‘‡π‘ž
βˆ†π‘‰1 −βˆ†π‘‰π‘‘π‘›
π‘Šπ‘›2 𝐼1 𝑒 πœ‚π‘‰π‘‘
πœ‡π‘ πΆπ‘œπ‘₯ π‘Šπ‘2
Where 𝑒
π‘Šπ‘1
1
𝐼𝐷 = πœ‡π‘ 𝐢0π‘₯
(𝑉
− 𝑉𝑑𝑝 )
2
𝐿𝑝1 𝑔𝑠𝑝1
(𝑉𝑑𝑠𝑛1 ) 2
3π‘‡π‘œπ‘₯
π‘Šπ‘‘π‘š
(6)
where μo is the zero bias mobility, η is the sub threshold swing
coefficient, Wdm is the maximum depletion layer width, VT is
the thermal voltage, K is the Boltzmann constant, T is the
temperature in kelvin, and q is the electron charge. In the case
of NMOS transistors, μo is simply μn. Rearranging (6) gives
Figure 7: CDL during Contention Mode
π‘Šπ‘›1
= πœ‡0 𝐢0π‘₯ (𝑉𝑇 )2 𝑒1.8 , πœ‚ = 1 +
2
# π‘π‘œ.π‘œπ‘“ π‘ π‘–π‘”π‘›π‘Žπ‘™π‘  𝑋 # π‘π‘œ.π‘œπ‘“ π‘π‘™π‘œπ‘π‘˜ 𝑐𝑦𝑐𝑙𝑒𝑠
(13)
5
Implementation of Constant Delay Model
CDL is still an attractive choice in a high-performance fullcustom design because:
1) CDL is only intended to replace the critical path
2) Power management techniques such as clock gating [10],
where the clock connection to idle module is turned off
(gated), will significantly reduce CDL dynamic power
consumption.
The window duration (width) is defined as the 50%
point of the falling edge of CLK to the 50% point of the rising
edge of node Y. The delay is measured at the 50% switching
point of either the CLK or data to the 50% switching point of
the latest output.
5. PERFORMANCE ANALYSIS
Delay (C-Q & D-Q),Power, Energy , Power Delay Product
(PDP),Energy Delay Product (EDP) are measured & plotted
for different logic styles in comparison.
5.1 Ripple Carry Adder
The clock timing is designed in such a way that all the CDL
gates except the first stage are operated in the D–Q mode.
5.2 32-bit Carry Lookahead Adder (CLA)
The 32-bit CLA uses eight 4-bit FAs with dedicated circuitry
to facilitate carry generation[9]. The energy-efficient FA used
in this analysis utilizes pass transistor logic styles with only 24
transistors for sum generation [23]. For the carry generation,
only the critical path is replaced with different logic style. In
this case, the 4-bit critical carry generation path of CLA is
G3:0 = G3 + P3 (G2 + P2(G1(G0)))
(15)
where G and P are the generate (A.B) and propagate (A⊕B)
signals, respectively. CDL and CCD logic are implemented to
reduce the number of fan-ins.
When expression 15 is rearranged
𝐺1:0 = 𝐺1 + 𝑃1 𝐺0 ,
𝑃3:2 = 𝑃3 𝑃2
𝐺3:2 = 𝐺3 + 𝑃3 𝐺2 , 𝐺3:0 = 𝐺3:2 (𝑃3:2 + 𝐺1:0 ) (16)
Figure 9: 32 bit CLA Block Diagram
Figure 8: RCA Block Diagram
The main purpose of this 8-bit RCA is to demonstrate CDL
performance advantage and to discuss the design
considerations that should be taken into account when using
CDL A more energy-efficient pass-transistor FA design [13]
will be implemented in the subsequent analysis to provide a
more realistic comparison. Only the timing-critical carry
generation is replaced with dynamic and CDL, while
noncritical sum computation remains static in all three RCAs.
VI. PERFORMANCE ANALYSIS
VI. PERFORMANCE ANALYSIS
Figure 10: Graph between Power Vs Data
Activity for different Logic Style
6
Implementation of Constant Delay Model
Table 1.RCA Performance Comparison:
Data Activity (%)
Delay (ps)
Power (μW)
Energy (fJ)
PDP (fJ)
EDP (fJ.ps)
100
18.152
145.5
331.33
2642.2
Dynamic
50
18.16
18.168
72.76
330.33
1321.32
25
12.5
100
18.176
36.37
329.3
660.47
18.19
18.17
329.69
330.14
10.583
10.57
330.82
264.25
50
25
10.586
5.273
330.25
131.82
CDL
25
12.5
10.33
2.625
390.5
65.625
10.353
1.301
390.65
32.52
CDL
25
12.5
9.187
69.01
187.23
7047.4
6.753
10.078
137.62
210.96
Table 2. CLA 32 bit Performance Comparison:
Data Activity (%)
Delay (ps)
Power (μW)
Energy (fJ)
PDP (fJ)
EDP (fJ.ps)
100
16.046
127.3
450.72
3577.13
Dynamic
50
28.1
15.848
239.7
445.32
6735.57
25
12.5
100
15.835
4.607
444.96
129.45
15.75
0.916
442.57
25.73
10.38
135.9
211.54
2769.64
6. CONCLUSION
Implementation of 8 bit RCA using CDL provides an area
improvement of 41.7% when compared with dynamic logic
style & an area improvement of 22.3% when compared with
static logic. Performance analysis of 32-bit CLAs reveals that
CDL is 35.2% faster than dynamic domino logic. PDP & EDP
is measured for the simulation circuits where a drastic
improvement is observed.
REFERENCES
1. R. Zimmermann and W. Fichtner, “Low-power logic styles:
CMOS versus pass-transistor logic,” IEEE J. Solid-State
Circuits, vol. 32, no. 7, pp. 1079–1090, Jul. 1997.
2. N. Goncalves and H. De Man, “NORA: A racefree dynamic
CMOS technique for pipelined logic structures,” IEEE J.
Solid-State Circuits, vol. 18, no. 3, pp. 261–266, Jun. 1983.
3. C. Lee and E. Szeto, “Zipper CMOS,” IEEE Circuits Syst.
Mag., vol. 2,no. 3, pp. 10–16, May 1986.
4. R. Rafati, S. Fakhraie, and K. Smith, “A 16-bit barrelshifter implemented in data-driven dynamic logic (D3L),”
IEEE Trans. Circuits Syst.I, Reg. Papers, vol. 53, no. 10,
pp. 2194–2202, Oct. 2006.
5. F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P.
Corsonello, “Lowpower split-path data-driven dynamic
logic,” Circuits Dev. Syst. IET,vol. 3, no. 6, pp. 303–312,
Dec. 2009.
6. N. Weste and D. Harris, CMOS VLSI Design: A Circuits
and Systems Perspective, 4th ed. Reading, MA: Addison
Wesley, Mar. 2010.
7. K. H. Chong, L. McMurchie, and C. Sechen, “A 64 b adder
using selfcalibrating differential output prediction logic,”
in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers,
San Francisco, CA, Feb. 2006, pp. 1745–1754.
8. S. Mathew, R. K. Krishnamurthy, M. A. Anders, R. Rios,
K. R. Mistry, and K. Soumyanath, “Sub-500-ps 64-b
ALUs in 0.18-μm SOI/bulk CMOS: Design and scaling
50
23.8
9.2
345.8
187.49
1406.42
trends,” IEEE J. Solid-State Circuits, vol. 36, no. 11, pp.
318–319, Nov. 2001.
9. S. Mathew, M. Anders, R. Krishnamurthy, and S. Borkar,
“A 4 GHz 130 nm address generation unit with 32-bit
sparse-tree adder core,” in VLSI Circuits Dig. Tech. Papers
Symp., 2002, pp. 126–127.
10. S. K. Mathew, M. A. Anders, B. Bloechel, T. N.
Krishnamurthy, and S. Borkar, “A 4-GHz 300-mW 64-bit
integer execution ALU with dual supply voltages in 90-nm
CMOS,” IEEE J. Solid-State Circuits, vol. 40,no. 1, pp.
44–51, Jan. 2005.
11. I. Sutherland, R. F. Sproull, and D. Harris. (Feb. 1999).
Logical Effort: Designing Fast CMOS Circuits [Online].
Available: http://amazon.com/o/ASIN/1558605576/
12. S. Kiaei, S.-H. Chee, and D. Allstot, “CMOS sourcecoupled logic for mixed-mode VLSI,” in Proc. IEEE Int.
Circuits Syst. Symp., New Orleans, LA, May 1990, pp.
1608–1611.
13. L. McMurchie, S. Kio, G. Yee, T. Thorp, and C. Sechen,
“Output prediction logic: A high-performance CMOS
design technique,” in Proc. Comput. Des. Int. Conf.,
Austin, TX, 2000, pp. 247–254.
T. Leela Krishna received the B.Tech in
Electronics & Communication Engineering
from SVCET, Andhra Pradesh, India in 2010.
He is currently pursuing the M.Tech. in VLSI
Design from SRM .University. His current
research interests include low-power highperformance digital CMOS circuits.
Dr. J Selva Kumar, received the B.E in
Electronics & Communication Engineering
from MADRAS University, India in 1999.
He received his M.Tech from Anna
University, 2003 & Ph.D. degree from SRM
University, 2013 respectively. His current
research interests are Low power &
Reconfigurable VLSI architecture Design.
Download