 Motivation

 Background

 Contributions of This Work

 Future Work

 Summary


Demand for energy-constrained designs, such as portable electronics, medical electronics and sensors, has increased tremendously

 Minimum energy operation typically occurs in the sub-threshold region [1]

 Leakage currents become an increasing problem as technology scales down

 Dual-Vth technique is an effective approach to suppress leakage but has not been explored in the sub-threshold region

[1] A. Wang, B. H. Calhoun and A. P. Chandrakasan, Sub-threshold design for ultra low-power systems, Springer, 2006



Demonstrate the effectiveness of dual-Vth method on energy per cycle (EPC) reduction

Minimum EPC design of sub-threshold circuits by dual-Vth method




 Background

Sub-threshold Circuit Applications

Analog circuits such as amplifiers and oscillators in the 1970s and 1980s


Energy-constrained circuits, portable devices, gyroscopes

Wrist watches in the 1970s

Micro sensors and pacemakers since the 1990s [8-9]

Digital CMOS circuits:

DLMS filter, sensor processor, FFT processor and microcontroller since the 2000s [10-13]


Sub-threshold Circuits: Vdd < Vth

 Lower power and energy consumption than above-threshold circuits

 Minimum EPC typically occurs in the sub-threshold range

HSPICE simulation of an 8-bit Ripple Carry Adder

Minimum energy is achieved when dynamic energy is equal to leakage energy


Kim: 16-bit Ripple Carry Adder (RCA)




 K. Kim, Ultra Low Power CMOS Design, PhD Dissertation, Auburn University, May 2011.

 Need dual voltage supply, level converters, . . .

 Can we get more saving with dual threshold voltages?


MOSFET Sub-threshold Operation

Vgs < Vth

 Sub-threshold operation or Weak inversion operation

Transistor is NOT completely OFF

A small sub-threshold current still flows: in an NMOS transistor, electrons move from source to drain, so current flows from drain to source


MOSFET Sub-threshold Operation

Vgs < Vth

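 Sub-threshold current I_sub is dominant [1]:

$$ I_{sub} = \mu C_{ox} \frac{W}{L} (n-1) V_t^2 \exp\left(\frac{V_{gs} - V_{th} + \eta V_{ds}}{n V_t}\right) \left(1 - \exp\left(-\frac{V_{ds}}{V_t}\right)\right) $$

When Vds > 3Vt, I_sub can be further simplified to

$$ I_{sub} \approx \mu C_{ox} \frac{W}{L} (n-1) V_t^2 \exp\left(\frac{V_{gs} - V_{th} + \eta V_{ds}}{n V_t}\right) $$

Note: μ is effective mobility, Cox is oxide capacitance, W is transistor width, L is transistor length, Vgs is gate-source voltage, Vds is drain-source voltage, Vt is thermal voltage (about 26 mV at 300 K), Vth is threshold voltage, n is sub-threshold slope factor, η is DIBL effect coefficient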
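A minimal numerical sketch of the sub-threshold current model, assuming the standard weak-inversion form from [1]. The lumped prefactor `i0` (standing in for μ Cox (W/L)(n-1)Vt^2) and the default values of `n` and `eta` are illustrative assumptions, not PTM 32nm parameters. It shows how little is lost by dropping the Vds-dependent factor once Vds exceeds a few thermal voltages.

```python
import math

VT_THERMAL = 0.02585  # thermal voltage kT/q at 300 K, in volts


def i_sub(vgs, vds, vth, n=1.5, eta=0.1, i0=1e-6, vt=VT_THERMAL):
    """Sub-threshold drain current, full weak-inversion expression.

    i0, n and eta are hypothetical placeholder values for illustration.
    """
    exponent = (vgs - vth + eta * vds) / (n * vt)
    return i0 * math.exp(exponent) * (1.0 - math.exp(-vds / vt))


def i_sub_simplified(vgs, vds, vth, n=1.5, eta=0.1, i0=1e-6, vt=VT_THERMAL):
    """Same model with the (1 - exp(-vds/vt)) factor dropped (vds > 3*vt)."""
    return i0 * math.exp((vgs - vth + eta * vds) / (n * vt))


# Compare both forms for an NMOS device at Vgs = Vds = 0.2 V, Vth = 0.328 V.
full = i_sub(0.2, 0.2, 0.328)
approx = i_sub_simplified(0.2, 0.2, 0.328)
relative_error = (approx - full) / full
print(f"full = {full:.3e} A, simplified = {approx:.3e} A, "
      f"relative error = {relative_error:.2e}")
```

At exactly Vds = 3Vt the simplified form overestimates the current by 1/(e^3 - 1), roughly 5%, and the error shrinks quickly as Vds grows.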


MOSFET Sub-threshold Operation

Vgs < Vth


HSPICE simulation results of drain current I_D vs. gate-source voltage V_GS for a PTM 32nm bulk CMOS technology NMOS transistor with Wn=5L, Vth = 0.329V at Vdd = 0.9 V

Sub-threshold Inverter

 Circuits function correctly in sub-threshold region but come with large delay ( 500x larger )

Supply voltage Vdd (V)    0.2    0.3    0.4    0.5    0.7    0.9
Delay (ns)







HSPICE simulation results of Voltage Transfer Curve of an inverter in PTM 32nm bulk CMOS technology at Vdd=0.2V with varying transistor sizing ratio β = Wp / Wn


HSPICE simulation results of Inverter delay under varying supply voltages in PTM 32nm bulk CMOS technology with Wn = 5L and Wp = 12L, fan-out is one inverter



 Contributions of This Work

Single-Vth design

Dual-Vth minimum EPC design

Single-Vth Design of Sub-threshold Circuits

EPC is independent of Vth

Increasing Vth cannot reduce EPC

 EPC for single low Vth and single high Vth designs remains the same

 High Vth design reduces leakage power but increases delay

The two effects cancel out


Model       NMOS Vth    PMOS Vth
HS model    0.328 V     -0.291 V
LP model    0.549 V     -0.486 V

Threshold voltage of PTM 32nm models calculated in HSPICE at Vdd = 0.9 V

HSPICE simulations of EPC for 32-bit RCA single-Vth designs in PTM 32nm bulk CMOS technology with Wn=5L and Wp=12L. Each design runs at its maximum operating frequency

Single-Vth Design of Sub-threshold Circuits

On current Ion with Vgs = Vdd

Off current Ioff with Vgs = 0

Gate delay D

C is gate capacitance of a characteristic inverter


Single-Vth Design of Sub-threshold Circuits

Circuit delay Tc

Vth factor is canceled out

C is gate capacitance of a characteristic inverter, Ceff is average switched capacitance per clock cycle in the circuit, l is the length of critical path in terms of a characteristic inverter
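A sketch of why Vth cancels, assuming the simplified sub-threshold current model from [1] and a gate delay proportional to C·Vdd/Ion. The lumped current prefactor I_0, the delay fitting constant K and the effective leakage multiplier N_eff are symbols introduced here for illustration; they are not taken from the slides.

```latex
\begin{align}
I_{on}  &= I_0 \exp\!\left(\frac{V_{dd} - V_{th} + \eta V_{dd}}{n V_t}\right)
        && (V_{gs} = V_{dd}) \\
I_{off} &= I_0 \exp\!\left(\frac{-V_{th} + \eta V_{dd}}{n V_t}\right)
        && (V_{gs} = 0) \\
D       &= \frac{K\, C\, V_{dd}}{I_{on}}, \qquad T_c = l\, D \\
E_{leak} &= N_{eff}\, I_{off}\, V_{dd}\, T_c
          = N_{eff}\, K\, C\, l\, V_{dd}^{2}\, \frac{I_{off}}{I_{on}}
          = N_{eff}\, K\, C\, l\, V_{dd}^{2} \exp\!\left(-\frac{V_{dd}}{n V_t}\right) \\
EPC     &= C_{eff} V_{dd}^{2} + E_{leak}
\end{align}
```

Under these assumptions the ratio Ioff/Ion depends only on Vdd, so neither the dynamic nor the leakage term of EPC contains Vth: raising Vth lowers leakage power and stretches the cycle time by the same exponential factor.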


General Dual-Vth Design Procedure

 Low Vth gate is fast but more leaky; used on critical paths to maintain high speed

High Vth gate is slow but less leaky; used on non-critical paths to reduce leakage

 Normally, start with assigning low Vth to all gates and switch as many gates as possible to high Vth to reduce leakage [2]

[2] D. Flynn, R. Aitken, A. Gibbons and K. Shi, Low Power Methodology Manual: For System-on-Chip Design, New York: Springer, 2007


Dual-Vth Minimum EPC Design

 Dual Vth design reduces EPC by inserting high Vth gates to reduce leakage power while keeping the operating frequency unchanged

 This is the maximum operating frequency obtained for the single low Vth design

 For a given circuit netlist, the proposed framework uses the gate slack based algorithm to generate the optimum dual-Vth design with minimum EPC, determine the optimum Vdd and optimum high Vth level, and estimate the EPC



 Assuming each gate has one unit time (t0) of gate delay, gate 9 is regarded as a non-critical path gate. However, if gate 9 is a high Vth gate with 4 t0 delay, a new critical path would be created. The critical path delay would change from 6 t0 to 8 t0



Gate Slack Based Dual-Vth Algorithm *

Name            Definition
Tpi (i)         The longest time for an event to arrive from a PI to gate i
Tpo (i)         The longest time for an event to reach a PO from gate i
D (i)           Gate delay of gate i
Dp (i)          The path delay of the longest path through gate i: Dp (i) = Tpi (i) + Tpo (i) + D (i)
Tc              Critical path delay of the whole circuit: Tc = Max { Dp (i) }
S (i)           Gate slack: S (i) = Tc - Dp (i)
Dh (i), Dl (i)  Gate delay of gate i with high Vth or low Vth
Delta (i)       Gate delay difference for gate i: Delta (i) = Dh (i) - Dl (i)
Su              Upper boundary for slack: Su = ( k-1 ) / k * Tc and k = Tc' / Tc
Sl              Lower boundary for slack: Sl = Min { Delta (i) }

* Note: Algorithm is modified for dual-Vth design based on previous work in [14-17]
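The slack definitions above can be sketched as a forward and a backward pass over the netlist. This is a minimal illustration, not the authors' tool: the netlist encoding (`fanin` dictionary), the function names and the four-gate example are assumptions made here.

```python
from collections import defaultdict


def topological_order(fanin):
    """Return gates so that every gate appears after all of its drivers."""
    order, seen = [], set()

    def visit(gate):
        if gate in seen:
            return
        seen.add(gate)
        for driver in fanin[gate]:
            visit(driver)
        order.append(gate)

    for gate in fanin:
        visit(gate)
    return order


def gate_slacks(fanin, delay):
    """Compute Tpi, Tpo, Dp, Tc and S for every gate.

    fanin maps each gate to the gates driving it (primary inputs omitted);
    delay maps each gate to its gate delay D(i).
    """
    fanout = defaultdict(list)
    for gate, drivers in fanin.items():
        for driver in drivers:
            fanout[driver].append(gate)

    order = topological_order(fanin)
    tpi, tpo = {}, {}
    for g in order:                      # forward pass: PI -> gate input
        tpi[g] = max((tpi[d] + delay[d] for d in fanin[g]), default=0.0)
    for g in reversed(order):            # backward pass: gate output -> PO
        tpo[g] = max((tpo[s] + delay[s] for s in fanout[g]), default=0.0)

    dp = {g: tpi[g] + tpo[g] + delay[g] for g in order}
    tc = max(dp.values())
    slack = {g: tc - dp[g] for g in order}
    return tpi, tpo, dp, tc, slack


# Hypothetical netlist: chain g1 -> g2 -> g3, with g4 as a side input to g3.
fanin = {"g1": [], "g2": ["g1"], "g3": ["g2", "g4"], "g4": []}
delay = {g: 1.0 for g in fanin}          # one unit delay (t0) per gate
tpi, tpo, dp, tc, slack = gate_slacks(fanin, delay)
print(tc, slack)
```

In this example only g4 lies off the critical path, so it is the only gate with non-zero slack.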

Gate Slack Based Dual-Vth Algorithm

 Step 1: Library Characterization

 Construct high Vth gate by applying different reverse body bias voltages on PTM HS model

Low Vth Gate zero bias


High Vth Gate reverse bias = 0.1 V

Body bias       NMOS Vth    PMOS Vth
zero bias       0.328 V     -0.291 V
bias = 0.1 V    0.348 V     -0.309 V
bias = 0.2 V    0.367 V     -0.327 V
bias = 0.3 V    0.385 V     -0.344 V
bias = 0.4 V    0.402 V     -0.360 V
bias = 0.5 V    0.419 V     -0.375 V
bias = 0.6 V    0.435 V     -0.389 V
bias = 0.7 V    0.450 V     -0.403 V
bias = 0.8 V    0.465 V     -0.417 V

Threshold voltage of PTM 32nm bulk CMOS technology HS models with varying reverse bias voltages calculated by HSPICE at Vdd = 0.9 V

Gate Slack Based Dual-Vth Algorithm

 Step 1: Library Characterization

 Calculate gate delay, power consumption and nodal capacitance of basic logic gates under varying Vdd, Vth and fan-out conditions

 Step 2: Initialization

Assign each gate to low Vth initially

 Step 3: First Round of Selection

Run Static Timing Analysis (STA),

If S (i) > Su, gate i can directly switch to high Vth

If S (i) < Sl, gate i can never switch to high Vth

If S (i) > Delta (i), gate i can possibly switch
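A minimal sketch of the first-round selection rules. It assumes that Tc' in k = Tc'/Tc is the critical path delay with every gate at high Vth, and that gates with S(i) not greater than Delta(i) stay at low Vth (this covers the S(i) < Sl case, since Sl is the smallest Delta). The function name and the example numbers are hypothetical.

```python
def first_round(slack, d_low, d_high, tc_low, tc_high):
    """Classify gates after the first STA run (all gates at low Vth).

    Returns (direct, candidates, fixed_low):
      direct     - S(i) > Su, switch to high Vth without further checks
      candidates - S(i) > Delta(i), must still pass verification
      fixed_low  - all remaining gates, kept at low Vth
    """
    k = tc_high / tc_low                       # assumed meaning of Tc' / Tc
    s_upper = (k - 1.0) / k * tc_low           # Su
    delta = {g: d_high[g] - d_low[g] for g in slack}

    direct, candidates, fixed_low = [], [], []
    for gate, s in slack.items():
        if s > s_upper:
            direct.append(gate)
        elif s > delta[gate]:
            candidates.append(gate)
        else:
            fixed_low.append(gate)
    return direct, candidates, fixed_low


# Hypothetical values in units of t0: high-Vth gates are 4x slower.
slack = {"g1": 0.0, "g2": 2.0, "g3": 3.5, "g4": 5.0}
d_low = {g: 1.0 for g in slack}
d_high = {g: 4.0 for g in slack}
direct, candidates, fixed_low = first_round(slack, d_low, d_high,
                                            tc_low=6.0, tc_high=24.0)
print(direct, candidates, fixed_low)
```

Here Su = 4.5 t0 and every Delta is 3 t0, so g4 switches directly, g3 goes on to verification, and g1 and g2 remain low Vth.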



Gate slack analysis for 8-bit Ripple Carry Adder





Gate Slack Based Dual-Vth Algorithm

 Step 4: Verification

For any gate j selected in step 3, switch it to high Vth and re-run STA to calculate the circuit delay Tc_new

If the newly calculated Tc_new does not exceed the original Tc, gate j can switch to high Vth

Step 5: Results

Generate the dual-Vth design, estimate EPC, and find the optimum Vdd and high Vth level with the lowest EPC
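The verification step can be sketched as a greedy loop that re-runs a simple longest-path STA after each trial switch. The netlist, the delay values (3.5 t0 for a high-Vth gate) and the function names are assumptions for illustration; the order in which candidates are tried is also a choice made here, not something the slides specify.

```python
def critical_path_delay(fanin, delay):
    """Longest PI-to-PO path delay of a combinational netlist."""
    arrival = {}

    def arrive(gate):                    # arrival time at the gate output
        if gate not in arrival:
            latest_input = max((arrive(d) for d in fanin[gate]), default=0.0)
            arrival[gate] = latest_input + delay[gate]
        return arrival[gate]

    return max(arrive(g) for g in fanin)


def verify_candidates(fanin, d_low, d_high, candidates):
    """Accept each candidate only if switching it keeps Tc unchanged."""
    delay = dict(d_low)                  # start from the all-low-Vth design
    tc = critical_path_delay(fanin, delay)
    high_vth = []
    for gate in candidates:
        delay[gate] = d_high[gate]       # trial switch to high Vth
        if critical_path_delay(fanin, delay) > tc:
            delay[gate] = d_low[gate]    # timing violated: revert
        else:
            high_vth.append(gate)
    return high_vth, tc


# Hypothetical netlist: six-gate critical chain plus a two-gate side path.
fanin = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"], "c5": ["c4"],
         "c6": ["c5", "s2"], "s1": [], "s2": ["s1"]}
d_low = {g: 1.0 for g in fanin}
d_high = {g: 3.5 for g in fanin}
high_vth, tc = verify_candidates(fanin, d_low, d_high, ["s1", "s2"])
print(high_vth, tc)
```

Both side-path gates have enough slack individually, but once s1 is switched the shared slack is used up, so s2 is rejected. This is the situation the re-run of STA is meant to catch.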


EPC estimation


Ceff (i) = α (i) * C (i) = the product of gate output activity and nodal capacitance

C (i) and Pleak (i) are obtained from HSPICE simulations of basic logic gates under varying conditions, α (i) is obtained from Modelsim simulations with real gate delays
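A minimal sketch of the EPC estimate, assuming EPC is the sum of dynamic energy (Vdd^2 times the total Ceff) and leakage energy (total leakage power times the clock period, taken here as the critical path delay Tc). Whether a factor of 1/2 belongs in the dynamic term depends on how α(i) counts transitions; that choice and the example values are assumptions.

```python
def estimate_epc(vdd, tc, gates):
    """Estimate energy per cycle in joules.

    gates is a sequence of (alpha, cap, p_leak) tuples: output activity,
    nodal capacitance in farads and leakage power in watts per gate.
    """
    gates = list(gates)
    c_eff = sum(alpha * cap for alpha, cap, _ in gates)   # sum of Ceff(i)
    e_dynamic = c_eff * vdd ** 2
    e_leakage = sum(p_leak for _, _, p_leak in gates) * tc
    return e_dynamic + e_leakage


# Hypothetical two-gate circuit at Vdd = 0.3 V with a 1 us clock period.
example_gates = [(0.5, 1e-15, 1e-9), (0.25, 2e-15, 2e-9)]
epc = estimate_epc(vdd=0.3, tc=1e-6, gates=example_gates)
print(f"EPC = {epc:.3e} J")
```

Sweeping `vdd` and the high Vth level with per-gate values taken from the characterized library would then locate the minimum-EPC point.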

Implementation Results

32-bit RCA

 Single-Vth design

Min EPC = 2.268E-014 J

Optimum Vdd = 0.31V

Frequency = 3.99 MHz

 Dual-Vth design

Min EPC = 1.610E-014J

Optimum Vdd = 0.24V

Optimum Bias = 0.3V

Frequency = 0.82 MHz

Min EPC reduction: 29%

HSPICE simulations of EPC for 32-bit RCA single and dual-Vth designs in PTM 32nm bulk CMOS technology with Wp=12L and Wn=5L


Implementation Results

32-bit RCA

High Vth vs. normalized minimum EPC points from single-Vth and dual-Vth designs (labeled points: single low Vth design, single high Vth design)

High Vth vs. optimal Vdd points from single-Vth and dual-Vth designs (labeled points: single low Vth design, single high Vth design, bias = 0.3V)

Implementation Results


 Minimum EPC reduction ranges from 10.8% for the 4-by-4 multiplier to 29% for the 32-bit RCA















Circuit              Single-Vth min EPC   Single-Vth opt. Vdd   Dual-Vth min EPC   Dual-Vth opt. Vdd   Min EPC reduction
4-by-4 multiplier    7.59E-15 J           0.26 V                6.77E-15 J         0.21 V              10.8%
C432                 7.21E-15 J           0.28 V                6.32E-15 J         0.26 V              -
C499                 2.13E-14 J           0.27 V                1.85E-14 J         0.26 V              -
C880                 1.43E-14 J           0.25 V                1.06E-14 J         0.22 V              25.9%
C1355                1.98E-14 J           0.26 V                1.73E-14 J         0.24 V              12.28%
C1908                3.14E-14 J           0.27 V                2.68E-14 J         0.25 V              14.52%
C2670                5.09E-14 J           -                     3.71E-14 J         -                   -
32-bit RCA           2.26E-14 J           0.31 V                1.610E-14 J        0.24 V              29%
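The reported reductions can be cross-checked from the minimum EPC values listed above. This is a small arithmetic sketch; the dictionary layout is an assumption, the first entry is taken to be the 4-by-4 multiplier, and the 32-bit RCA single-Vth value uses the 2.268E-14 J figure from the earlier slide.

```python
# (single-Vth min EPC, dual-Vth min EPC) in joules
min_epc = {
    "4-by-4 multiplier": (7.59e-15, 6.77e-15),
    "C432": (7.21e-15, 6.32e-15),
    "C499": (2.13e-14, 1.85e-14),
    "C880": (1.43e-14, 1.06e-14),
    "C1355": (1.98e-14, 1.73e-14),
    "C1908": (3.14e-14, 2.68e-14),
    "C2670": (5.09e-14, 3.71e-14),
    "32-bit RCA": (2.268e-14, 1.610e-14),
}


def reduction_percent(single, dual):
    """Relative minimum-EPC reduction of the dual-Vth design, in percent."""
    return 100.0 * (single - dual) / single


reductions = {name: reduction_percent(single, dual)
              for name, (single, dual) in min_epc.items()}

for name, value in reductions.items():
    print(f"{name:18s} {value:5.1f}%")
```

Because the listed EPC values are rounded to three digits, the recomputed percentages can differ from the reported ones in the last digit or so.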


Implementation Results

Estimation Accuracy

 HSPICE Simulation

Min EPC = 1.61E-014J

Optimum Vdd = 0.24V

 Estimation

Min EPC = 1.77E-014J

Optimum Vdd = 0.25V

 The average error between estimation and simulation is 6.99%

HSPICE simulations vs. estimation of EPC for 32-bit RCA dual-Vth design at bias = 0.3V in PTM 32nm bulk CMOS technology


Results Analysis

 Minimum EPC occurs when dynamic energy is equal to leakage energy

 Minimum EPC reduction comes from Vdd reduction

Dynamic energy and leakage energy analysis for 32-bit RCA single-Vth and dual-Vth designs


 Reduction of leakage energy comes from leakage power reduction and unchanged circuit period

Results Analysis

 Theoretical analysis to verify the observed 29% minimum EPC reduction on 32-bit RCA

 Step 1: Leakage energy is characterized as 3rd-degree polynomials of Vdd, based on HSPICE simulation results for the leakage power of the 32-bit RCA with single low Vth or single high Vth (bias = 0.3V) and for the circuit delay with single low Vth

$$ E_{leak,low}(V_{dd}) = p_1 V_{dd}^3 + p_2 V_{dd}^2 + p_3 V_{dd} + p_4 $$

where p1 = -2.9 E-12, p2 = 3.46 E-12, p3 = -1.4 E-12 and p4 = 1.95 E-13

$$ E_{leak,high}(V_{dd}) = h_1 V_{dd}^3 + h_2 V_{dd}^2 + h_3 V_{dd} + h_4 $$

where h1 = -3.4 E-13, h2 = 4.19 E-13, h3 = -1.75 E-13 and h4 = 2.54 E-14


RMSE and regression coefficient R-squared analysis of polynomial fit for leakage energy

Results Analysis

 Step 2: Dynamic energy is characterized as a 2nd-degree polynomial, based on HSPICE simulation results for total energy and leakage energy of the 32-bit RCA with single low Vth

$$ E_{dyn}(V_{dd}) = a V_{dd}^2 + b $$

where a = 1.65 E-13 and b = -2.1 E-16

 Step 3: Single-Vth design

$$ E_{single}(V_{dd}) = p_1 V_{dd}^3 + (p_2 + a) V_{dd}^2 + p_3 V_{dd} + p_4 + b $$

Optimal Vdd = 0.305 V

Results Analysis

 Step 4: Dual-Vth design

x = fraction of high Vth gates in the circuit and 1 - x = fraction of low Vth gates

$$ E_{dual}(V_{dd}) = K_1 V_{dd}^3 + K_2 V_{dd}^2 + K_3 V_{dd} + K_4 $$

where K1 = x * h1 + (1-x) * p1

K2 = x * h2 + (1-x) * p2 + a

K3 = x * h3 + (1-x) * p3

K4 = x * h4 + (1-x) * p4 + b

x = 198/288 in optimal dual-Vth design

Optimal Vdd = 0.254 V

where p1 = -2.9 E-12, p2 = 3.46 E-12, p3 = -1.4 E-12, p4 = 1.95 E-13, a = 1.65 E-13, b = -2.1 E-16, h1 = -3.4 E-13, h2 = 4.19 E-13, h3 = -1.75 E-13 and h4 = 2.54 E-14
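The optimal Vdd values can be checked numerically, assuming the total energy is the cubic E(Vdd) = K1 Vdd^3 + K2 Vdd^2 + K3 Vdd + K4 with the K coefficients defined above (x = 0 gives the single low Vth design). The function names are chosen here for illustration. Since the printed coefficients are rounded and the polynomial terms nearly cancel, the energy saving computed this way may differ noticeably from the reported figure.

```python
import math

P = (-2.9e-12, 3.46e-12, -1.4e-12, 1.95e-13)    # low-Vth leakage fit p1..p4
H = (-3.4e-13, 4.19e-13, -1.75e-13, 2.54e-14)   # high-Vth leakage fit h1..h4
A, B = 1.65e-13, -2.1e-16                        # dynamic energy fit a, b
X_OPT = 198 / 288                                # fraction of high-Vth gates


def total_energy_coeffs(x):
    """Return (K1, K2, K3, K4) for a high-Vth gate fraction x."""
    k1 = x * H[0] + (1 - x) * P[0]
    k2 = x * H[1] + (1 - x) * P[1] + A
    k3 = x * H[2] + (1 - x) * P[2]
    k4 = x * H[3] + (1 - x) * P[3] + B
    return k1, k2, k3, k4


def energy(k, vdd):
    """Evaluate the cubic total-energy polynomial at vdd."""
    k1, k2, k3, k4 = k
    return k1 * vdd ** 3 + k2 * vdd ** 2 + k3 * vdd + k4


def optimal_vdd(k):
    """Solve dE/dVdd = 3*K1*V^2 + 2*K2*V + K3 = 0 for the local minimum."""
    k1, k2, k3, _ = k
    root = math.sqrt((2 * k2) ** 2 - 12 * k1 * k3)
    stationary = [(-2 * k2 + sign * root) / (6 * k1) for sign in (1, -1)]
    # Keep the stationary point with positive second derivative.
    return next(v for v in stationary if 6 * k1 * v + 2 * k2 > 0)


k_single = total_energy_coeffs(0.0)
k_dual = total_energy_coeffs(X_OPT)
vdd_single = optimal_vdd(k_single)
vdd_dual = optimal_vdd(k_dual)
saving = 1 - energy(k_dual, vdd_dual) / energy(k_single, vdd_single)
print(f"single-Vth Vdd = {vdd_single:.3f} V, dual-Vth Vdd = {vdd_dual:.3f} V, "
      f"saving = {100 * saving:.1f}%")
```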


Results Analysis

 Step 5: Calculate the minimum EPC saving between the single low Vth and dual-Vth designs

 Theoretical results show that the minimum EPC saving is 33.4%

 The blue curve expresses only a lower bound on the energy saving

In practice, circuit delay increases as Vth increases in a single-Vth design


HSPICE simulation results vs. theoretical analysis of energy ratio of 32-bit RCA dual-Vth design with bias = 0.3V and single-Vth design (plot labels: Low Vth, High Vth, Dual Vth)





Future Work

Robust Dual-Vth Design

 Why do we need robust design?

Process variation causes variance in circuit performance and lowers yield

 The process variation issue gets worse in sub-threshold circuits due to the exponential relation between I_sub and Vth


Future Work

Combine Dual-Vth with Different Low Power Design Methods

 Dual-Vth only reduces leakage energy

Dual-Vth and Dual-Vdd: Reduce both dynamic and leakage energy

Dual-Vth and Transistor Sizing: Reduce both dynamic and leakage energy



 Conclusion




 EPC of single-Vth design is independent of Vth

 Dual-Vth approach is effective to suppress leakage and reduce minimum EPC

 For a given circuit, the proposed framework uses the gate slack based algorithm to generate the optimum dual-Vth design with minimum EPC, determine the optimum Vdd and optimum high Vth level, and estimate the EPC

 For the 32-bit RCA, minimum EPC is reduced by 29% by the dual-Vth approach; for the 4-by-4 multiplier, minimum EPC is reduced by 10.8%; for the ISCAS85 benchmark circuits, the energy saving falls within this range



