Sizing of Dual-VT Gates for Sub-VT Circuits

Babak Mohammadi, S.M. Yasser Sherazi, and Joachim Neves Rodrigues
Electrical and Information Technology, Lund University, 22100 Lund, Sweden
{babak.mohammadi, yasser.sherazi, joachim.rodrigues}@eit.lth.se

Abstract: This paper presents a novel method to improve the performance of sub-threshold (sub-VT) gates in 65-nm CMOS technology. Faster transistors with a lower threshold voltage are introduced in the weaker network of a gate. It is shown that the employed method significantly enhances the reliability and performance of the gate, with the additional advantage of a lower area cost compared to traditional transistor sizing. Extensive Monte-Carlo simulations are carried out to verify the proposed optimization technique. The simulation results predict that the NAND3 and NOR3 testbench shows a 98% higher noise margin. Furthermore, the inverter and NAND3 gates show a speed improvement of 48% and 97%, respectively.

I. INTRODUCTION

Scaling down the supply voltage (VDD) to the sub-threshold (sub-VT) regime is well known as an effective method for energy reduction [1]. Unfortunately, the exponential dependence of the sub-VT currents on process parameters such as the threshold voltage (VT) makes transistor performance and functionality extremely vulnerable to process variations [2]. Thus, the transistor's performance in terms of delay and reliability is considerably degraded compared to super-VT operation [3]. This reduces the maximum attainable throughput and adds extra energy overhead to the design.

Sub-VT optimized designs are often realized by full-custom cells (FCL) [4], [5]. In this case, the impact of process variations is combated by transistor up-sizing. Transistor sizing improves the timing, i.e., it equalizes the rise and fall times, and increases the noise margins at the cost of higher area and energy [5]. Modern sub-micron CMOS technologies are offered with different threshold options, which gives designers the opportunity to address the leakage energy by employing high-VT gates, whereas performance is improved by using low-VT devices [6]. However, this method is mainly employed at the gate level. The advantages of using different threshold options at the schematic level, i.e., inside gates, are not well explored in the literature.

Contribution: In this work the performance and reliability degradation of gates operated in the sub-VT regime is addressed. To speed up the performance bottlenecks in gates and to balance the driving strength of the pull-up and pull-down networks (PUN and PDN), selected transistors are replaced by their lower-VT equivalents. This method is referred to as dual-VT (DVT) in this study. The performance gain of the proposed optimization technique is analyzed by means of extensive Monte-Carlo (MC) simulations of an inverter, a NAND3, and a NOR3 gate.

The remainder of the paper is structured as follows: Sec. II describes the theory behind the employed method. In Sec. III, the dual-VT approach is presented for basic combinational gates and evaluated by means of MC simulations. Finally, conclusions are drawn in Sec. IV.

[Figure 1: ION-NMOS/ION-PMOS versus VDD [V], plotted for PMOS widths of 1 Wp, 2 Wp, 5 Wp, and 10 Wp.]
Fig. 1. The ratio of active currents of HVT-NMOS and HVT-PMOS in sub-VT. VT of the transistors is ∼700 mV. Wp is the minimum width allowed in the technology. The NMOS transistor has the minimum width.

II. THEORY

CMOS processes are designed and optimized for super-VT operation. Consequently, all optimization techniques from the super-VT domain need to be carefully analyzed for their efficiency in the sub-VT domain. Transistor strength balancing is one of these techniques, and it has an important effect on a design's performance and reliability. The driving balance of a circuit depends on several process parameters, namely the primary process parameter VT and the secondary parameters drain-induced barrier lowering (DIBL) and subthreshold slope.
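These dependencies can be made explicit with the first-order sub-VT drain-current model commonly used in the literature. The expression below is not given in the source text; the symbols I_0 (technology-dependent current factor), n (slope factor), \eta (DIBL coefficient), and \phi_t (thermal voltage) are introduced here only for illustration.

```latex
I_{D} = I_{0}\,\frac{W}{L}\,
        \exp\!\left(\frac{V_{GS} - V_{T} + \eta V_{DS}}{n\,\phi_{t}}\right)
        \left(1 - \exp\!\left(-\frac{V_{DS}}{\phi_{t}}\right)\right),
\qquad \phi_{t} = \frac{kT}{q}
```

In this model the width W scales the current only linearly, whereas V_T, \eta, and n act through the exponent, so a small V_T difference between NMOS and PMOS devices already translates into a large current ratio.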
The traditional method to equalize the imbalance is transistor sizing. In the super-VT regime this is achieved with relatively low size ratios between PMOS and NMOS. However, the required size ratios become very large in the sub-VT domain, see Fig. 1. The peak current ratio between PMOS and NMOS is found in the sub-VT regime. Furthermore, it is observed that even upsizing the PMOS transistor by 10× does not achieve strength balancing. The imbalance between the PUN and PDN worsens further when transistor stacking is required to form a complex gate (architectural dependency), as stacking reduces the current driving capability. In this case the transistor size ratios reach impractically large values.

Balanced strength improves the gate's stability and robustness, as the switching threshold voltage (Vm) moves to its ideal value (VDD/2) and the noise margins (NM) increase. An unbalanced switching threshold and low NMs are among the main sources of functionality and stability failures in the sub-VT regime. Therefore, designing the gates with the maximum possible NM (NML = NMH) is of vital importance.

In this study, the use of different threshold options in the PUN and PDN is investigated as an alternative to traditional transistor sizing. Lower-VT transistors are introduced in the slower network, either PUN or PDN, to improve its driving capability. The best VT option for the PUN and PDN transistors depends on VDD and the architecture of the gate. However, finding the best VT option for each VDD and architecture is outside the scope of this study.

The effectiveness of the proposed method is demonstrated on three logic gates, an inverter, a NAND, and a NOR gate, at 300 mV. To consider more extreme cases with stacked transistors in the PUN or PDN, 3-input NAND and NOR gates are used in the simulations, since they contain three stacked transistors in the PDN and PUN, respectively.

III. RESULTS

The driving imbalance has a strong exponential dependency on the mismatch of process parameters (VT, slope factor, and DIBL coefficient) between PMOS and NMOS transistors, whereas the transistor width has a linear relation to the current in the sub-VT regime [7]. Consequently, small differences in the exponential process parameters require large changes in the linear current coefficient, i.e., the device dimensions, to compensate. Therefore, transistor sizing has a large area penalty.

Figure 2(a) shows the NMOS and PMOS sizes for an ideal Vm in inverters implemented with different threshold options at 300 mV. The process used in this study has three threshold options: 1) high-VT (HVT) with a VT of ∼700 mV, 2) standard-VT (SVT) with a VT of ∼560 mV, and 3) low-VT (LVT) with a VT of ∼450 mV. It is observed that in the LVT and HVT inverters the required minimum PMOS width is ∼8× Wmin (the minimum transistor width allowed in the technology), whereas for the SVT inverter the ratio is ∼13×.

The voltage transfer characteristic (VTC) of a balanced inverter based on the DVT method is shown in Fig. 2(b). It is observed that the Vm of the pure LVT and SVT inverters is 127 mV; however, by replacing the PMOS transistor in the SVT inverter with an LVT transistor, an ideal Vm (VDD/2) is achieved. To obtain the same VTC with pure HVT, SVT, and LVT transistors, the PMOS transistors need to be upsized by 6.8×, 12.4×, and 7.7×, respectively, while keeping the NMOS at minimum size. Thus, the area cost of the proposed technique is much lower than that of traditional transistor sizing.

A commonly used functionality metric for static logic is the static noise margin (SNM). This metric is mainly used in SRAM stability analysis; however, it is shown in [8] that the SNM of two back-to-back gates is equal to the maximum noise that can be applied to a long chain of the same gates. Fig. 2(c) shows the benchmark used for the SNM analysis.
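The SNM of two back-to-back gates is commonly read from the butterfly plot as the side of the largest square that fits inside the smaller lobe. The source does not state how the SNM was extracted, so the following is only a minimal sketch under that assumption. The smooth `vtc` model, its `gain` value, and the helper name `snm_lobes` are hypothetical, and an inverter-like VTC stands in for the NAND3/NOR3 pair of the actual testbench.

```python
import numpy as np

def vtc(v_in, vdd=0.3, vm=0.15, gain=8.0):
    """Hypothetical smooth VTC: tanh step centred on vm with slope -gain at vm."""
    k = 2.0 * gain / vdd
    return 0.5 * vdd * (1.0 - np.tanh(k * (v_in - vm)))

def snm_lobes(f, g, vdd=0.3, n=4001):
    """Side of the largest square in each butterfly lobe.

    Gate 1 is drawn as y = f(x); gate 2 is drawn mirrored as x = g(y).
    A square nested in a lobe has its diagonal on a line y = x + c, so for
    every offset c both curves are intersected with that line.
    """
    v = np.linspace(0.0, vdd, n)
    # Curve A: points (v, f(v)); the offset c = y - x falls monotonically with v.
    c_a, x_a = f(v) - v, v
    # Curve B: points (g(v), v); the offset c = y - x rises monotonically with v.
    c_b, x_b = v - g(v), g(v)
    c = np.linspace(-vdd, vdd, n)
    xa = np.interp(c, c_a[::-1], x_a[::-1])  # np.interp needs increasing xp
    xb = np.interp(c, c_b, x_b)
    side = xa - xb                  # signed horizontal extent on each diagonal
    upper = side[c > 0].max()       # lobe above the line y = x
    lower = (-side[c < 0]).max()    # lobe below the line y = x
    return upper, lower

if __name__ == "__main__":
    balanced = min(snm_lobes(vtc, vtc))
    skewed_vtc = lambda v: vtc(v, vm=0.127)   # Vm quoted for the pure LVT/SVT inverters
    skewed = min(snm_lobes(skewed_vtc, skewed_vtc))
    print(f"SNM with Vm = VDD/2: {1e3 * balanced:.1f} mV")
    print(f"SNM with Vm = 127 mV: {1e3 * skewed:.1f} mV")
```

With identical, point-symmetric gates the two lobes are equal when Vm sits at VDD/2; moving Vm away from VDD/2 shrinks the usable square, which is the motivation for balancing the PUN and PDN.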
NAND and NOR gates are selected because they give the worst-case input low voltage (VIL) and input high voltage (VIH). Three-input gates are chosen to consider the worst-case output swing, as each gate has three stacked transistors, which reduces the driving capability. The DVT configuration of the NAND3 and NOR3 gates at 300 mV for the process used in this study is shown in Fig. 2(c). The best-balanced Vm for the NAND3 gate is obtained with SVT transistors in the PUN and LVT transistors in the PDN. The best-balanced Vm for the NOR3 gate is obtained with LVT transistors in the PUN and HVT transistors in the PDN. The transistor sizes of the DVT-NAND3 and DVT-NOR3 gates are the same in the PUN and PDN. However, the NOR3 gate in a standard-cell library (SCL) employs 85% and 25% wider transistors in the PUN and PDN, respectively.

To account for local variations, 1000-point Monte-Carlo simulations at 300 mV and 27 °C are performed. The same settings and simulations are applied to the NAND3 and NOR3 gates in the SCL. The simulation results are shown in Fig. 3. By comparing the butterfly curves, it is observed that the DVT approach, despite having narrower transistors in the NOR3 gate, yields more symmetrical curves and larger SNM windows. By comparing the SNM distributions in Fig. 3, it is concluded that the DVT approach, with a mean SNM of 105 mV, is ∼47% better than the single-threshold gates. Furthermore, the SNM variation is ∼82% lower than in the other cases, and the worst-case SNM is found at 90 mV, which on average is 96% higher than in the single-VT approaches.

The gates of the SNM testbench, i.e., NAND3 and NOR3, are also used for the timing analysis. Fig. 4 shows the delay distributions of a NAND3 gate with different threshold options. All inputs in the benchmark are toggled simultaneously. As expected, the mean fall delay in the HVT-NAND3 gate is 35× higher than its rise delay and worst-case rise delay is 93× longer. However, as shown in Fig. 4(d), the mean rise and fall delays of the DVT gate are almost equal.
The rise delays of the SVT and LVT gates are shorter than that of the DVT gate, but since their fall delays are equal to or higher than that of the DVT gate, there is no overall performance gain. It is observed that by employing DVT, the performance is boosted to LVT gate levels, while the static energy dissipation remains low, at HVT levels. Similar behaviour is observed for the NOR3 gates.

IV. CONCLUSION

In this study, it is shown that a dual-VT approach at the schematic level results in both higher performance and higher reliability. The SNM of the NAND3 and NOR3 gates shows an improvement of 47% over the same setting with SCL gates. The overall performance gain of the DVT-inverter and DVT-NAND3 gates is 45% and 67%, respectively, compared to the gates in the SCL. Furthermore, the MC simulations confirm a lower worst-case delay and higher worst-case noise margins. Additionally, the proposed technique is highly area efficient.

ACKNOWLEDGMENT

This work was kindly supported by the Swedish Vetenskapsrådet (621-2011-4540) and the Swedish VINNOVA Industrial Excellence Centre (SOS).

REFERENCES

[1] A. Wang et al., "Optimal supply and threshold scaling for subthreshold CMOS circuits," in IEEE ISVLSI, 2002, pp. 5–9.
[2] A. Tajalli et al., "Design trade-offs in ultra-low-power digital nanoscale CMOS," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, no. 9, pp. 2189–2200, Sept. 2011.
[3] P. Friedberg et al., "Modeling within-die spatial correlation effects for process-design co-optimization," in ISQED, Mar. 2005, pp. 516–521.
[4] P. Meinerzhagen et al.
[5] S. Luetkemeier et al., "A 200 mV 32b subthreshold processor with adaptive supply voltage control," in IEEE ISSCC Papers, Feb. 2012, pp. 484–486.
[6] D. Bol et al., "A 25 MHz 7 uW/MHz ultra-low-voltage microcontroller SoC in 65nm LP/GP CMOS for low-carbon wireless sensor nodes," in IEEE ISSCC, Feb. 2012, pp. 490–492.
[7] J. Kwong et al., "Variation-driven device sizing for minimum energy subthreshold circuits," in Proceedings of ISLPED, Oct. 2006, pp. 8–13.
[8] J. Lohstroh et al., "Worst-case static noise margin criteria for logic circuits and their mathematical equivalence," IEEE JSCC, vol. 18, no. 6, pp. 803–807, Dec. 1983.

[Figure 2: (a) normalized PMOS width versus normalized NMOS width, curves for SVT, HVT, and LVT; (b) VOUT [V] versus VIN [V], curves labelled Dual-VT, LVT, and SVT, with Vm marked at 127 mV and 150 mV; (c) schematic of the back-to-back DVT-NAND3 (SVT ×3 in the PUN, LVT in the PDN) and DVT-NOR3 (LVT in the PUN, HVT ×3 in the PDN) gates with noise sources VN.]
Fig. 2. a) Required NMOS and PMOS widths for having ideal Vm in the HVT, SVT and LVT inverters at 300 mV, b) Voltage transfer curves (VTC) of SVT, HVT and DVT inverters, c) 3-input DVT-NAND and DVT-NOR gates in the benchmark used for static noise margin (SNM) extraction.

[Figure 3: for each panel, butterfly curves (VOUT versus VIN) and a histogram of occurrences versus SNM [V]. Panel statistics: (a) µ = 66.3 mV, σ = 7.87 mV, σ/µ = 11.9 %; (b) µ = 69.7 mV, σ = 8.08 mV, σ/µ = 11.6 %; (c) µ = 79 mV, σ = 8.49 mV, σ/µ = 10.7 %; (d) µ = 105 mV, σ = 2.19 mV, σ/µ = 2.08 %.]
Fig. 3. Butterfly curves and SNM distribution of a) LVT, b) SVT, c) HVT, d) DVT inverters from 1000-point Monte-Carlo simulation for local variations in the typical-typical (TT) corner at 300 mV and 27 °C.

[Figure 4: for each panel, histograms of occurrences versus normalized delay for rise and fall delay, with the worst case (W.C.) marked. Panel statistics: (a) rise µ = 6.57 ns, σ = 1.11 ns, fall µ = 62.3 ns, σ = 16.8 ns; (b) rise µ = 11.2 ns, σ = 1.9 ns, fall µ = 0.201 µs, σ = 54.3 ns; (c) rise µ = 66.2 ns, σ = 37 ns, fall µ = 2.38 µs, σ = 0.744 µs; (d) rise µ = 66.4 ns, σ = 35.9 ns, fall µ = 66 ns, σ = 16.9 ns.]
Fig. 4. Delay variations of NAND3 with different configurations at 300 mV and 27 °C for 1000-point MC simulations in the TT process corner. All inputs are toggling simultaneously. The fan-out at the output is 4 in all cases. The down-pointing arrows show the worst-case delays. a) LVT, b) SVT, c) HVT, d) DVT
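The SNM improvements quoted in Sec. III can be cross-checked against the statistics printed in the panels of Fig. 3. The sketch below assumes that the ∼47% and ∼82% figures are unweighted averages over the three single-VT cases, which the text does not state explicitly. The assignment of the values to LVT, SVT, and HVT follows the caption order and does not affect the averages.

```python
# Statistics as printed in Fig. 3: (mean SNM in mV, sigma/mu in %).
single_vt = {
    "LVT": (66.3, 11.9),
    "SVT": (69.7, 11.6),
    "HVT": (79.0, 10.7),
}
dvt_mean, dvt_spread = 105.0, 2.08

def mean_pct(values):
    """Unweighted average of fractional changes, expressed in percent."""
    values = list(values)
    return 100.0 * sum(values) / len(values)

# Relative gain in mean SNM of DVT over each single-VT case, then averaged.
snm_gain = mean_pct(dvt_mean / m - 1.0 for m, _ in single_vt.values())
# Relative reduction in sigma/mu of DVT versus each single-VT case, then averaged.
spread_cut = mean_pct(1.0 - dvt_spread / s for _, s in single_vt.values())

print(f"mean SNM gain:      {snm_gain:.1f} %")
print(f"sigma/mu reduction: {spread_cut:.1f} %")
```

Under this averaging assumption the two results round to 47% and 82%, in line with the values reported in Sec. III.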