Sizing of Dual-VT Gates for Sub-VT Circuits

Babak Mohammadi, S.M. Yasser Sherazi, and Joachim Neves Rodrigues
Electrical and Information Technology, Lund University, 22100 Lund, Sweden
{babak.mohammadi, yasser.sherazi, joachim.rodrigues}@eit.lth.se
Abstract—This paper presents a novel method to improve the
performance of sub-threshold (sub-VT) gates in 65-nm CMOS
technology. Faster transistors with a lower threshold voltage are
introduced in the weaker network of a gate. It is shown that
the employed method significantly enhances the reliability and
performance of the gate, with the additional advantage of a lower
area cost compared to traditional transistor sizing. Extensive
Monte-Carlo simulations are carried out to verify the proposed
optimization technique. The simulation results predict that the
NAND3 and NOR3 testbench shows a 98% higher noise margin.
Furthermore, the inverter and NAND3 gates show a speed
improvement of 48% and 97%, respectively.
I. INTRODUCTION
Scaling down the supply voltage (VDD) to the sub-threshold
(sub-VT) regime is well known as an effective method for energy
reduction [1]. Unfortunately, the exponential dependence
of the sub-VT currents on process parameters such as the threshold
voltage (VT) makes transistor performance and functionality
extremely vulnerable to process variations [2]. Thus,
transistor performance in terms of delay and reliability is
considerably degraded compared to super-VT operation [3].
This reduces the maximum attainable throughput and
adds energy overhead to the design. Sub-VT optimized
designs are often realized with full-custom cells (FCL) [4], [5].
In this case, the impact of process variations is combated by
transistor up-sizing. Transistor sizing improves the timing, i.e.,
it equalizes rise and fall times, and increases noise margins at the
cost of higher area and energy [5].
Modern sub-micron CMOS technologies are offered with
different threshold options, which gives designers the opportunity
to address leakage energy by employing high-VT
gates, whereas performance is improved by using low-VT
devices [6]. However, this method is mainly employed at
gate level. The advantages of using different threshold options
at schematic level, i.e., inside gates, are not well explored in
the literature.
Contribution: In this work, the performance and reliability
degradation of gates operated in the sub-VT regime is addressed.
To speed up the performance bottlenecks in gates
and balance the driving strength of the pull-up and pull-down
networks (PUN and PDN), selected transistors are replaced by
their lower-VT equivalents. This method is referred to as dual-VT
(DVT) in this study. The performance gain of the proposed
optimization technique is analyzed by means of extensive
Monte-Carlo (MC) simulations of an inverter, a NAND3, and
a NOR3 gate.
[Figure 1: ION-NMOS/ION-PMOS versus VDD [V], plotted for PMOS widths of 1 Wp, 2 Wp, 5 Wp and 10 Wp.]
Fig. 1. The ratio of active currents of HVT-NMOS and HVT-PMOS in
sub-VT. VT of transistors ∼700 mV. WP is the minimum size allowed in the
technology. The NMOS transistor has the minimum width.
The remainder of the paper is structured as follows: Sec. II
describes the theory behind the method employed. In Sec. III,
the dual-VT approach is presented for basic combinational
gates and evaluated by means of MC simulations. Finally,
conclusions are drawn in Sec. IV.
II. THEORY
CMOS processes are designed and optimized for super-VT
operation. Consequently, all optimization techniques used
in super-VT need to be carefully analyzed for their efficiency
in the sub-VT domain. Transistor strength balancing is
one of these techniques, and it has an important effect on a
design’s performance and reliability. The driving balance of
a circuit depends on several process parameters, i.e., the
primary process parameter VT and the secondary parameters
drain-induced barrier lowering (DIBL) and subthreshold slope. The
traditional method to correct an imbalance is transistor
sizing. In the super-VT regime, this is done with relatively low
PMOS-to-NMOS size ratios. However, the required transistor
size ratios become very large in the sub-VT domain, see
Fig. 1. The peak current ratio between PMOS and NMOS is
found in the sub-VT regime. Furthermore, it is observed that
even upsizing the PMOS transistor by 10× does not achieve
balanced strength. The imbalance between the PUN and PDN
worsens even more when transistor stacking is required to form
a complex gate (architectural dependency), as this reduces the
current driving capability. In this case, the transistor size ratios
reach impractically large values.
Balanced strength improves the gate’s stability and
robustness, as the switching threshold voltage (Vm) moves to
its ideal value (VDD/2), which increases the noise margins (NM).
An unbalanced switching threshold and low NMs are among the
main sources of functionality and stability failures in the sub-VT
regime. Therefore, designing the gates for the maximum possible
NM (NM_L = NM_H) is of vital importance.
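The link between driving balance and Vm can be illustrated with a first-order sketch. It assumes the textbook sub-VT current model, in which the current depends exponentially on gate overdrive, with the same slope factor for NMOS and PMOS and with DIBL neglected. None of these simplifications, nor the parameter values below, come from the paper; the function and argument names are hypothetical.

```python
import math

def switching_threshold(vdd, vt_n, vt_p, strength_ratio=1.0,
                        slope_factor=1.5, v_thermal=0.02587):
    """First-order sub-VT switching threshold Vm of an inverter.

    Assumed model (not from the paper):
      I_n ~ k_n * exp((Vin - vt_n) / (n * vT))
      I_p ~ k_p * exp((vdd - Vin - vt_p) / (n * vT))
    with vt_p given as a magnitude and strength_ratio = k_p / k_n.
    Setting I_n = I_p at Vin = Vm and solving for Vm gives the line below.
    """
    n_vt = slope_factor * v_thermal
    return 0.5 * (vdd + vt_n - vt_p + n_vt * math.log(strength_ratio))

# Matched thresholds and matched strength place Vm at VDD/2.
vm_balanced = switching_threshold(0.3, 0.45, 0.45, strength_ratio=1.0)

# A pull-up network that is 10x weaker pulls Vm below VDD/2.
vm_weak_pun = switching_threshold(0.3, 0.45, 0.45, strength_ratio=0.1)

print(f"balanced Vm = {vm_balanced * 1e3:.1f} mV, weak-PUN Vm = {vm_weak_pun * 1e3:.1f} mV")
```

In this simplified picture, a lower VT in the weaker network shifts Vm directly, whereas a width change only acts through the logarithm of the strength ratio.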
In this study, the use of different threshold options in the
PUN and PDN is investigated as an alternative to traditional
transistor sizing. Lower-VT transistors are introduced in the
slowest network, either PUN or PDN, to improve its driving
capability. The best VT option for PUN and PDN transistors
depends on VDD and the architecture of the gate. However,
finding the best VT option for each VDD and architecture is
outside the scope of this study. The effectiveness of the proposed
method is demonstrated on three logic gates at 300 mV: an inverter,
a NAND, and a NOR gate. To consider more extreme cases
with stacked transistors in the PUN or PDN, 3-input NAND
and NOR gates are used in the simulations, since they contain
3 stacked transistors in the PDN and PUN, respectively.
III. RESULTS
The driving imbalance has a strong exponential dependency
on the mismatch of process parameters (VT, slope factor and
DIBL coefficient) between PMOS and NMOS transistors, whereas the
transistor width has only a linear influence in the sub-VT regime [7].
Consequently, small differences in the exponential process parameters
require large changes in the linear current coefficient, i.e., the
device dimensions, to compensate. Therefore, traditional transistor
sizing has a large area penalty.
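The exponential-versus-linear argument can be made concrete with a small calculation. The sketch below assumes the standard first-order sub-VT model, in which current scales as W · exp(−VT / (n · kT/q)). The slope factor of 1.5 and the example threshold gaps are illustrative assumptions, not values from the paper, and the function names are hypothetical.

```python
import math

BOLTZMANN_OVER_Q = 8.617333262e-5  # k/q in volts per kelvin

def thermal_voltage(temp_kelvin=300.15):
    """Thermal voltage kT/q; the default corresponds to 27 degrees C."""
    return BOLTZMANN_OVER_Q * temp_kelvin

def width_factor_for_vt_gap(delta_vt, slope_factor=1.5, temp_kelvin=300.15):
    """Width multiplier that offsets a threshold gap delta_vt (in volts).

    Assumes I ~ W * exp(-VT / (n * kT/q)), so the width must grow by
    exp(delta_vt / (n * kT/q)) to restore the same current.
    """
    return math.exp(delta_vt / (slope_factor * thermal_voltage(temp_kelvin)))

for gap_mv in (25, 50, 100):  # hypothetical effective threshold gaps
    factor = width_factor_for_vt_gap(gap_mv * 1e-3)
    print(f"{gap_mv} mV gap -> width x{factor:.1f}")
```

Under these assumed values, the factors come out at roughly 1.9×, 3.6× and 13×, which illustrates why a modest threshold difference translates into a large width penalty.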
Fig. 2(a) shows the NMOS and PMOS sizes required for an ideal
Vm in inverters implemented with different threshold options
at 300 mV. The process used in this study has 3 different
threshold options: 1) high-VT (HVT) with a VT
of ∼700 mV, 2) standard-VT (SVT) with a VT of ∼560 mV,
and 3) low-VT (LVT) with a VT of ∼450 mV. It is observed
that in the LVT and HVT inverters, the required minimum
transistor width is ∼8× Wmin (the minimum allowed transistor
width in the technology), whereas for the SVT inverter
the ratio is ∼13×.
The voltage transfer characteristic (VTC) of a balanced
inverter based on the DVT method is shown in Fig. 2(b). It is
observed that Vm of the pure LVT and SVT inverters is 127 mV.
However, by replacing the PMOS transistor in the SVT inverter
with an LVT transistor, an ideal Vm (VDD/2) is achieved. To
obtain the same VTC with pure HVT, SVT and LVT transistors, the
PMOS transistors need to be upsized by 6.8×, 12.4× and
7.7×, respectively, while keeping the NMOS at minimum size.
Thus, the area cost of the proposed technique is much lower
than that of traditional transistor sizing.
A commonly used functionality metric for static
logic is the static noise margin (SNM). This metric
is mainly used in SRAM stability analysis; however, it is
shown in [8] that the SNM of two back-to-back gates is
equal to the maximum noise that can be applied to a long chain
of the same gates. Fig. 2(c) shows the benchmark used for
SNM analysis. NAND and NOR gates are selected because
they give the worst input low voltage (VIL)
and input high voltage (VIH). Three-input NAND
and NOR gates are used to consider the worst-case output
swing, as each gate has 3 stacked transistors, which reduces the
driving capability. The DVT approach for NAND3 and NOR3
at 300 mV in the process used in this study is shown in
Fig. 2(c). The best balanced Vm for NAND3 is obtained with SVT
transistors in the PUN and LVT transistors in the PDN. The best
balanced Vm for the NOR3 gate is obtained with LVT in the PUN and
HVT in the PDN. The transistor sizes of DVT-NAND3 and DVT-NOR3
are the same in the PUN and PDN. However, the NOR3 gate
in a standard-cell library (SCL) employs 85% and 25% wider
transistors in the PUN and PDN, respectively.
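One common way to extract the SNM from a butterfly plot is to find the largest square that fits in each lobe and take the smaller of the two. The paper does not state its extraction procedure, so the sketch below is only an illustration of that criterion. The logistic VTC is a hypothetical stand-in for simulated curves, and the Vm and gain values are assumptions chosen for demonstration.

```python
import math

def make_vtc(vm, vdd=0.3, gain=8.0):
    """Hypothetical smooth inverting VTC with switching threshold vm.

    Not a device model: a logistic curve whose slope at vm equals -gain.
    """
    def vtc(vin):
        return vdd / (1.0 + math.exp(4.0 * gain * (vin - vm) / vdd))
    return vtc

def largest_square(f_top, f_left, vdd=0.3, steps=600):
    """Side of the largest square in the lobe bounded by y = f_top(x)
    (upper boundary) and x = f_left(y) (left boundary)."""
    best = 0.0
    for i in range(steps + 1):
        y0 = vdd * i / steps
        x0 = f_left(y0)            # bottom-left corner lies on the left curve
        if f_top(x0) <= y0:        # corner is outside this lobe
            continue
        lo, hi = 0.0, vdd          # bisect for the side s where the
        for _ in range(50):        # top-right corner meets the upper curve
            mid = 0.5 * (lo + hi)
            if f_top(x0 + mid) > y0 + mid:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best

def static_noise_margin(vtc_a, vtc_b, vdd=0.3):
    """SNM of two cross-coupled gates: the smaller of the two lobe squares."""
    return min(largest_square(vtc_a, vtc_b, vdd),
               largest_square(vtc_b, vtc_a, vdd))

balanced = static_noise_margin(make_vtc(0.15), make_vtc(0.15))
skewed = static_noise_margin(make_vtc(0.19), make_vtc(0.11))
print(f"balanced SNM = {balanced * 1e3:.0f} mV, skewed SNM = {skewed * 1e3:.0f} mV")
```

With these assumed curves, the pair whose switching thresholds sit at VDD/2 yields the larger SNM, while the skewed pair is limited by its smaller lobe. This is the effect that balancing Vm is meant to exploit.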
To consider local variations, 1000-point Monte-Carlo
simulations at 300 mV and 27 °C are performed. The same
settings and simulations are applied to the benchmark built from
the NAND3 and NOR3 gates in the SCL. The simulation results are
shown in Fig. 3. Comparing the butterfly curves, it is observed that
the DVT approach, despite having narrower transistors in
the NOR3 gate, has more symmetrical curves and larger SNM
windows. Comparing the SNM distributions in
Fig. 3, it is concluded that the DVT approach, with a mean
SNM of 105 mV, is ∼47% better than the single-threshold
gates. Furthermore, the SNM variation is ∼82% lower than
in the other cases, and the worst-case SNM is found at 90 mV, which
on average is 96% higher than in the single-VT approaches.
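The quoted percentages can be checked against the statistics annotated in Fig. 3. The text does not say which reference the comparison uses. Averaging over the three single-VT cases is an assumption, and it reproduces the ∼47% and ∼82% figures. The variable names are hypothetical.

```python
from statistics import mean

# Values annotated in the Fig. 3 panels (mean SNM in mV, sigma/mu in %).
single_vt_mean_snm_mv = [66.3, 69.7, 79.0]
single_vt_spread_pct = [11.9, 11.6, 10.7]
dvt_mean_snm_mv = 105.0
dvt_spread_pct = 2.08

def relative_change(new, reference):
    """Percentage change of `new` with respect to `reference`."""
    return (new - reference) / reference * 100.0

# Assumption: the reference is the average over the single-VT cases.
snm_gain = relative_change(dvt_mean_snm_mv, mean(single_vt_mean_snm_mv))
spread_change = relative_change(dvt_spread_pct, mean(single_vt_spread_pct))

print(f"mean SNM change: {snm_gain:+.1f} %")
print(f"sigma/mu change: {spread_change:+.1f} %")
```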
The same gates as in the SNM testbench, i.e., NAND3 and
NOR3, are used for timing analysis. Fig. 4 shows the
delay distributions of a NAND3 gate with different threshold
options. All inputs in the benchmark are toggled simultaneously.
As expected, the mean fall delay of the HVT-NAND3 gate
is 35× higher than its rise delay, and the worst-case rise delay
is 93× longer. In contrast, as shown in Fig. 4(d), the mean rise
and fall delays of the DVT gate are almost equal. The rise delays
of the SVT and LVT gates are shorter than that of the DVT gate,
but since their fall delay is equal to or higher than that of DVT,
there is no overall performance gain. It is observed that by employing
DVT, the performance is boosted to LVT gate levels, while the
static energy dissipation remains low at HVT levels.
Similar behaviour is observed for NOR3 gates.
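The long tails and worst-case markers in the delay histograms are what an exponential current model predicts under local VT variation. The sketch below is a toy Monte-Carlo experiment, not a substitute for the circuit simulations in the paper. It assumes that delay is inversely proportional to a sub-VT on-current and that VT is normally distributed. The 20 mV sigma, the slope factor and all names are hypothetical.

```python
import math
import random
from statistics import mean, median

def sample_normalized_delays(n_samples=1000, vdd=0.3, vt_nominal=0.45,
                             sigma_vt=0.02, slope_factor=1.5,
                             v_thermal=0.02587, seed=1):
    """Draw delays normalized to the nominal-VT delay.

    Assumed model: delay ~ 1 / I_on, I_on ~ exp((vdd - VT) / (n * vT)),
    VT ~ Normal(vt_nominal, sigma_vt). sigma_vt is an illustrative value.
    """
    rng = random.Random(seed)
    n_vt = slope_factor * v_thermal
    nominal_current = math.exp((vdd - vt_nominal) / n_vt)
    delays = []
    for _ in range(n_samples):
        vt = rng.gauss(vt_nominal, sigma_vt)
        on_current = math.exp((vdd - vt) / n_vt)
        delays.append(nominal_current / on_current)
    return delays

delays = sample_normalized_delays()
print(f"mean = {mean(delays):.2f}, median = {median(delays):.2f}, "
      f"worst case = {max(delays):.2f}")
```

Because a Gaussian VT maps to a log-normal delay in this model, the mean lies above the median and the worst case lies far out in the tail. This is consistent with the skewed distributions in Fig. 4.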
IV. CONCLUSION
In this study, it is shown that a dual-VT approach at
schematic level results in both higher performance and higher
reliability. The SNM of the NAND3 and NOR3 gates shows an
improvement of 47% over the same setting with SCL gates.
The overall performance gain of the DVT-inverter and DVT-NAND3
gates is 45% and 67%, respectively, compared to the
gates in the SCL. Furthermore, the MC simulations confirm
a lower worst-case delay and improved noise margins. Additionally, the
proposed technique is highly area efficient.
ACKNOWLEDGMENT
This work was kindly supported by the Swedish Vetenskapsrådet (621-2011-4540), and Swedish VINNOVA Industrial Excellence Centre (SOS).
REFERENCES
[1] A. Wang et al., “Optimal supply and threshold scaling for subthreshold
CMOS circuits,” in IEEE ISVLSI, 2002, pp. 5–9.
[2] A. Tajalli et al., “Design trade-offs in ultra-low-power digital nanoscale
CMOS,” IEEE Transactions on Circuits and Systems I: Regular Papers,
vol. 58, no. 9, pp. 2189–2200, Sept. 2011.
[3] P. Friedberg et al., “Modeling within-die spatial correlation effects for
process-design co-optimization,” in ISQED, Mar. 2005, pp. 516–521.
[4] P. Meinerzhagen et al.
[5] S. Luetkemeier et al., “A 200 mV 32b subthreshold processor with
adaptive supply voltage control,” in IEEE ISSCC Papers, Feb. 2012, pp.
484–486.
[6] D. Bol et al., “A 25 MHz 7 uW/MHz ultra-low-voltage microcontroller
SoC in 65nm LP/GP CMOS for low-carbon wireless sensor nodes,” in
IEEE ISSCC, Feb. 2012, pp. 490–492.
[7] J. Kwong et al., “Variation-driven device sizing for minimum energy
subthreshold circuits,” in Proceedings of ISLPED, Oct. 2006, pp. 8–13.
[8] J. Lohstroh et al., “Worst-case static noise margin criteria for logic circuits
and their mathematical equivalence,” IEEE JSCC, vol. 18, no. 6, pp.
803–807, Dec. 1983.
[Figure 2: (a) normalized PMOS width versus normalized NMOS width for the HVT, SVT and LVT inverters; (b) VOUT [V] versus VIN [V], with Vm marked at 127 mV and 150 mV; (c) schematic of the SNM benchmark.]
Fig. 2. a) Required NMOS and PMOS widths for having ideal Vm in the HVT, SVT and LVT inverters at 300 mV, b) Voltage transfer curves (VTC) of
SVT, HVT and DVT inverters, c) 3-input DVT-NAND and DVT-NOR gates in the benchmark used for static noise margin (SNM) extraction.
[Figure 3: four panels (a) to (d), each with a butterfly plot (VOUT versus VIN) and an SNM histogram (occurrences versus SNM distribution [V]). Plot annotations (µ, σ, σ/µ): 66.3 mV, 7.87 mV, 11.9%; 69.7 mV, 8.08 mV, 11.6%; 79 mV, 8.49 mV, 10.7%; 105 mV, 2.19 mV, 2.08%.]
Fig. 3. Butterfly curves and SNM distribution of a) LVT, b) SVT, c) HVT, d) DVT inverters from 1000-point Monte-Carlo simulation for local variations
in typical-typical (TT) corner at 300 mV and 27 °C.
[Figure 4: four histograms (a) to (d) of occurrences versus normalized delay, with worst-case (W.C.) markers. Plot annotations, rise delay (µ, σ): 11.2 ns, 1.9 ns; 66.2 ns, 37 ns; 6.57 ns, 1.11 ns; 66.4 ns, 35.9 ns. Fall delay (µ, σ): 0.201 µs, 54.3 ns; 62.3 ns, 16.8 ns; 2.38 µs, 0.744 µs; 66 ns, 16.9 ns.]
Fig. 4. Delay variations of NAND3 with different configurations at 300 mV and 27 °C for 1000-point MC simulations in TT process corner. All inputs are
toggling simultaneously. The fan-out at the output of all cases is 4. The down-pointing arrows show the worst-case delays. a) LVT, b) SVT, c) HVT, d) DVT