Exploiting Body Biasing For Leakage Reduction: A Case Study

2013 IEEE Computer Society Annual Symposium on VLSI
Andrea Manuzzato (1,2), Fabio Campi (1), Davide Rossi (1), Valentino Liberali (2), and Davide Pandini (1)
(1) Central CAD and Design Solutions, STMicroelectronics, Agrate Brianza, Italy
(2) Dipartimento di Fisica, Università degli Studi di Milano, Milano, Italy
Abstract—In modern System-on-Chip (SoC) designs, the
fulfillment of power constraints is one of the most important and
challenging tasks. In this framework, the use of both voltage
scaling and body biasing techniques is a mainstream strategy
largely used for leakage power reduction. This work presents a
case study to evaluate the impact of these techniques on an
industrial microprocessor-based design. We analyze the impact
of body biasing in terms of area penalties and routing efforts.
Furthermore, a complete analysis flow is proposed to evaluate the
achievable leakage reduction and the expected performance
degradation. In order to overcome the limited spectrum of
operating configurations covered by a given library set, we
propose a practical and effective methodology based on a
standard digital design and characterization flow. By using this
procedure, a designer can efficiently evaluate the most
appropriate leakage/timing trade-offs, and consequently
determine the best supply voltage and biasing configurations to
implement the design. The experimental results on our testcase
demonstrate that body biasing leads to a leakage reduction up to
six times with respect to the standard reference supply voltage
configuration.
Keywords-Body biasing; leakage reduction; voltage scaling;
power dissipation; low power; timing analysis
I. INTRODUCTION
As semiconductor technology evolves, meeting the power
budget has become one of the most important objectives in
SoC design. In particular, one of the major concerns about
power dissipation is related to the dramatic increase of the
leakage current component. As forecast by the International
Technology Roadmap for Semiconductors (ITRS) [1], with
the device feature size steadily shrinking into the nanometer
range, the relative contribution of leakage to the overall power
consumption is consistently growing. Indeed, ITRS has
predicted that this kind of static power dissipation will exceed
the dynamic power component in the most advanced
technologies. This is a severe issue, especially
for applications with short processing time and long stand-by
periods, such as sensor and biomedical applications.
Figure 1. Triple-well process cross-section
978-1-4799-1331-2/13/$31.00 ©2013 IEEE
Body biasing is an appealing technique that is encountering
significant success in modern IC design [4]. By means of an
appropriate bias voltage applied to the silicon substrate, it is
possible to alter the electrical characteristics of the transistors,
i.e., the threshold voltage, thus modifying the device leakage
dissipation and timing performance. Typically, in triple-well
processes (Figure 1), P-type and N-type substrates are biased
symmetrically with respect to the reference power supply
levels (VDD for N-type substrate and GND for P-type
substrate). With this manufacturing process, it is possible to
set the bias voltages independently for the nMOS and pMOS
devices. The power/ground supply lines for bulk biasing are
usually named VDDS (pMOS devices, N-type substrate) and
GNDS (nMOS devices, P-type substrate). In particular, a
forward body biasing (FBB) of the substrate (VDDS < VDD,
GNDS > GND) would allow a faster timing behavior of the
library standard cells, while introducing higher leakage
consumption. In contrast, a reverse body biasing (RBB) of the
substrate (VDDS > VDD, GNDS < GND) reduces leakage
consumption, while increasing the switching delays of the standard cells. As
a consequence, body biasing can be used as a powerful tool to
finely tune the speed/leakage ratio depending on the design
constraints. The designer may determine his/her own ideal
trade-off by selecting the precise VDD/VDDS combination
(and for symmetry the corresponding GND/GNDS pair).
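As an illustration of the symmetric biasing convention described above, the following minimal Python sketch maps a supply voltage and a signed bias offset to the corresponding VDDS/GNDS rail voltages. The function name `bias_rails` and the sign convention (positive offset for FBB, negative offset for RBB) are assumptions introduced here for illustration only; they are not part of any tool or library.

```python
def bias_rails(vdd: float, bias: float, gnd: float = 0.0) -> tuple[float, float]:
    """Return (VDDS, GNDS) for a symmetric body bias offset in volts.

    bias > 0: forward body bias  (VDDS < VDD, GNDS > GND)
    bias < 0: reverse body bias  (VDDS > VDD, GNDS < GND)
    bias = 0: zero-bias          (VDDS = VDD, GNDS = GND)
    """
    # Rounding only hides floating-point noise such as 0.9000000000000001.
    vdds = round(vdd - bias, 3)
    gnds = round(gnd + bias, 3)
    return vdds, gnds


# Rails for a 1.1 V supply at the bias offsets considered in this work.
for bias in (0.4, 0.2, 0.0, -0.2, -0.4):
    print(bias, bias_rails(1.1, bias))
```

With a 1.1 V supply, an offset of +0.2 V gives VDDS = 0.9 V and GNDS = 0.2 V (FBB), while an offset of -0.4 V gives VDDS = 1.5 V and GNDS = -0.4 V (RBB).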
This paper focuses on the extensive application of body
biasing to a standard cell-based digital design. Our goal is to
enable the designer to explore and select the best trade-off
between speed and leakage dissipation that meets the design
constraints, i.e., to deliver the required timing performance
with the lowest possible leakage. However, a standard digital
design flow typically does not support the user with sufficient
information. Power and timing figures are stored in a Liberty
file that contains the results of SPICE-level characterization
for each library cell in different states and corners [2]. We
define a design corner as the set of all process, voltage, and
temperature (PVT) conditions that impact the cell behavior.
Since the library characterization procedure is computationally
expensive and time-consuming, the number of available
corners for each cell library is limited.
Table 1. Body biasing configurations
Configuration   VDDS (V)   GNDS (V)
FBB             0.90        0.20
FBB             0.70        0.40
Zero-bias       1.10        0.00
RBB             1.30       -0.20
RBB             1.50       -0.40
The introduction of body biasing represents in practice a
fourth parameter in the corner space. While some of the most
recent cell libraries include a few extra bias points in the set of
available corners, an exhaustive coverage of VDD/VDDS (and
GND/GNDS) pairs is impractical, due to the high
characterization cost. For the same reason, during logic
synthesis the designer cannot explore all possible VDD/VDDS
combinations to determine the optimal leakage/timing trade-off.
Moreover, even if the optimal configuration can be selected at
synthesis, it may no longer be optimal after the physical design
step, due to the parasitic loads extracted from the interconnection
lines and to the cell resizing imposed by place&route and
clock-tree synthesis (CTS). In the end, the final design
implementation might exhibit unnecessarily high
leakage consumption that could be easily reduced by
switching to a more suitable VDD/VDDS configuration
post-design, while still meeting the timing constraints.
In this paper, we propose an efficient methodology to
perform leakage recovery on a microprocessor-based design,
by allowing the designer to select a VDD/VDDS configuration
that will preserve the timing constraints at the minimal
feasible leakage power dissipation. In particular, in our work
we considered a static biasing configuration, i.e., the bias
voltage is kept at a fixed value during circuit operation. First,
we demonstrate that the additional circuitry necessary to
implement body biasing during the back-end flow is minimal.
After verifying the negligible impact on die area, we show
how it is possible to evaluate the benefit/penalty introduced by
body biasing. In order to overcome the limited operating
configurations covered by the available library set, we
characterize the cell leakage consumption by means of a fast
characterization methodology based on the Apache RedHawk
tool [3], which is our reference tool for power integrity
analysis. In this way, we can quickly compute the power
consumption of our testcase at different power supply and
substrate bias voltages. This approach allowed us to extend the
configuration exploration beyond the typical supply voltage
range covered by the given cell libraries. Furthermore, to
evaluate the impact of body biasing on timing, we extracted
the SPICE-level netlists of the critical paths, and we simulated
the propagation delay as a function of the power supply and
biasing conditions.
The paper is structured as follows. Section II introduces
some fundamental concepts of body biasing. Section III
describes the case study and evaluates the impact
of body biasing on area and timing. The complete
leakage/timing evaluation flow is presented in Section IV,
while Section V discusses experimental results showing a
significant achievable leakage reduction with respect to a
nominal zero-bias VDD/VDDS condition. Finally, Section VI
summarizes our concluding remarks.
II. BACKGROUND: BODY BIASING TECHNIQUES
Body biasing adapts the voltage level of the transistor bulk
to different operating conditions. Controlling the bulk biasing
has a strong impact on both timing performance and leakage.
In order to have a symmetric effect on N-type and P-type
transistors, biasing must be consistently applied to both device
types, so that P-type and N-type substrates have opposite bias
voltages. This is possible only in a triple-well process depicted
in Figure 1. Usually, the device junctions are zero-biased: i.e.,
the nMOS bulk is grounded and the pMOS bulk is connected
to VDD. Biasing the device bulk at different voltages has the
effect of changing the threshold voltage, which is given by [5]:
$$ V_{th} = V_{th0} + \gamma \cdot \left( \sqrt{2 \cdot \Phi_F - V_{BS}} - \sqrt{2 \cdot \Phi_F} \right), $$
where Vth0 is the threshold voltage for the zero-bias condition,
γ is the body effect coefficient, and ΦF is the Fermi potential.
Hence, the transistor threshold voltage can be adjusted by
changing the bulk-source voltage VBS. The threshold voltage
plays a very important role on power dissipation. In particular,
leakage current strongly depends on the transistor Vth. The
relationship between sub-threshold current and threshold
voltage can be expressed as [8][9]:
$$ I_{sub} = I_0 \cdot e^{\frac{V_{GS} - V_{th0} - \lambda \cdot V_{SB} + \eta \cdot V_{DS}}{n \cdot V_T}} \cdot \left( 1 - e^{-\frac{V_{DS}}{V_T}} \right), \qquad (1) $$
where I0 is the drain-source current (IDS) when VGS = Vth0, VT is
the thermal voltage (k·T/q), η is the drain-induced barrier lowering
(DIBL) coefficient, λ is the linearized body effect coefficient, and n is the
sub-threshold slope factor. In this work, we consider the
sub-threshold current as the major contribution to the leakage
current. This is a simplification: other leakage
components exist, such as the band-to-band tunneling currents and
leakage through the gate oxide [12]. However, in our target
technology, the subthreshold current is the dominant leakage
source for the considered range of VDD/VDDS
configurations. Moreover, in a given technology, the
capabilities to control/reduce other current components at
design level are quite limited. Furthermore, the propagation
delay for a CMOS gate strongly depends on the threshold
voltage and can be expressed as [10]:
$$ t_p \propto \frac{C_L \cdot V_{DD}}{I_{DS}} \approx \frac{C_L \cdot V_{DD}}{A \cdot (V_{DD} - V_{th})}, $$
where CL is the fan-out load capacitance, and A is a constant
factor. Hence, the lower the threshold voltage, the better the
timing performance. However, such performance boost comes
with a penalty in terms of leakage increase.
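To make the trade-off expressed by the three relations above more concrete, the following Python sketch evaluates the threshold voltage shift, the relative off-state sub-threshold current, and the relative propagation delay as a function of the bulk-source voltage. All device parameters (`VTH0`, `GAMMA`, `PHI_F`, `N`, `V_T`) are hypothetical values chosen only for illustration and do not describe the technology used in this work. The sketch also assumes that the threshold voltage obtained from the square-root expression is inserted directly into the exponent of the sub-threshold current, and that the DIBL and drain-source terms do not change with the bias.

```python
import math

# Hypothetical device parameters (illustration only, not the target technology).
VTH0 = 0.35    # zero-bias threshold voltage (V)
GAMMA = 0.25   # body effect coefficient (V^0.5)
PHI_F = 0.45   # Fermi potential (V)
N = 1.5        # sub-threshold slope factor
V_T = 0.0259   # thermal voltage k*T/q near room temperature (V)


def vth(vbs: float) -> float:
    """Threshold voltage for a bulk-source voltage vbs (vbs > 0 is forward bias)."""
    return VTH0 + GAMMA * (math.sqrt(2 * PHI_F - vbs) - math.sqrt(2 * PHI_F))


def rel_leakage(vbs: float) -> float:
    """Off-state (VGS = 0) sub-threshold current relative to the zero-bias value."""
    return math.exp(-(vth(vbs) - VTH0) / (N * V_T))


def rel_delay(vbs: float, vdd: float = 1.1) -> float:
    """Propagation delay relative to zero bias; C_L and A cancel in the ratio."""
    return (vdd - VTH0) / (vdd - vth(vbs))


for vbs in (-0.4, -0.2, 0.0, 0.2, 0.4):
    print(f"VBS={vbs:+.1f} V  Vth={vth(vbs):.3f} V  "
          f"leakage x{rel_leakage(vbs):.2f}  delay x{rel_delay(vbs):.3f}")
```

Because the threshold voltage enters the leakage exponentially but the delay only through the (VDD - Vth) term, the leakage ratio varies much more than the delay ratio over the same bias range in this simplified model.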
We define FBB as the configuration in which the source-bulk
junctions are forward biased both for pMOS and nMOS
transistors. Similarly, we define RBB as the configuration
where both junctions are reverse biased. Table 1 shows the
voltage configurations for biasing a circuit operating at a
nominal voltage of 1.1 V. Transistor behavior in the two body
biasing configurations is very different. FBB decreases the
threshold voltage, thus decreasing the propagation delays
while increasing power dissipation due to the higher leakage
current component. In contrast, RBB increases the threshold
voltage, thus reducing the power dissipated by leakage current
while increasing the propagation delay. Furthermore, there are
some limitations in the allowable bias voltages. In FBB, a
forward bias is applied to a diode (the source-bulk and
drain-bulk junctions), so the bias must stay low enough to
avoid significant junction conduction. In RBB, the band-to-band tunneling
current increases as the reverse bias voltage is raised [9], thus
limiting the overall benefit obtained from the reduction of Isub in (1).
Leakage reduction by means of RBB is a very appealing
option for power dissipation control and reduction strategies.
However, it is worth noticing that previous works such as
[14][15] reported a decreasing effectiveness of leakage
reduction with RBB in more scaled technologies.
Even in relatively mature industrial technologies such as
65/55 nm, the leakage power dissipation has become a critical
concern, since it accounts for about 50% of the total power
budget in microcontrollers and microprocessor-based designs.
At the same time, the very tight constraints imposed on silicon
area by increasingly competitive market conditions limit the
economically viable design techniques that can be effectively
used to reduce the leakage component. Therefore, SoC
designers are working hard to achieve this target, and RBB has
emerged as one of the most interesting techniques that can be
exploited in a standard design flow on an industrial scale.
Finally, for the sake of completeness, dynamic
applications of body biasing have also been proposed as an
effective technique to mitigate the threshold voltage spread
due to process variations [6][7], and to reduce the power
consumption by adapting the threshold voltage to different
workloads [11]. The application of run-time adaptive body
biasing during circuit operation is based on on-chip
monitors (such as ring oscillators) and a body biasing controller
that can dynamically adjust the threshold voltage to keep the
circuit behavior closer to the nominal conditions.
III. CASE STUDY DESCRIPTION
Table 2. Testcase statistics
                                Zero-bias   Biased
Design area (μm²)               21351       21571
Hierarchical port count         537         537
Leaf cell count                 9231        9251
Buf/inv cell count              5460        5480
Clock-tree buf/inv cell count   14          14
Figure 2. Power grid layout
As a case study, we considered an internal testcase that is a
sub-circuit of the Manyac system described in [13]. The 20k-gate
design was implemented using an industrial 40 nm triple-well
CMOS technology with a nominal supply voltage of
1.1 V. Synthesis was performed with Synopsys Design
Compiler exploiting the topographical capability and targeting
a 250 MHz operating frequency. In order to assess the cost of
body biasing, we implemented the same design both with and
without the additional power distribution network circuitry
that supports body biasing.
If the designer chooses to apply either RBB or FBB, the
chip floorplan needs to be designed differently, as a separate
power/ground (P/G) grid must be routed to distribute the P-type
and N-type bulk biasing across the die. This is normally
done by inserting specific well taps in each standard cell row.
These components are library cells with no logic functionality,
explicitly designed to connect the substrate. VDDS/GNDS
metal grids are then routed on top of the VDD/GND power
distribution network to carry the voltage supply to well taps.
The substrate P/G grids are similar to the primary power
supply grids, with narrower metal widths, since they do not
have to distribute large current loads. Typically, these currents
are at least two orders of magnitude smaller than those
flowing in the primary power distribution network. Figure 2
shows a snapshot of the P/G grid layout, representing the M6
layer, with the additional bias power stripes and the well-tap
connections to the substrate. The initial RTL description for
the biased and zero-bias versions is the same, but the floorplan
passed to the topographical synthesis differs, leading to
slightly different synthesized gate-level netlists. The
differences are quite small: usually the synthesis with the
body-biased floorplan uses more cells to compensate for a smaller
available routing space due to the congestion induced by the
additional VDDS/GNDS grids. In our testcase, the
difference amounted to only five cells. The entire implementation
phase (place&route, CTS, and post-routing optimization) was
performed using Synopsys IC Compiler. As reported in Table
2, for this design the area overhead due to routing the additional
power/ground stripes is quite small: ~1%. The additional
effort for routing the biased version (expressed as the number
of timing violations before final optimization) was negligible
as well.
Figure 3. Propagation delay as a function of the supply/bias voltage
IV. BODY BIASING ANALYSIS
In order to assess the post-routing version of the testcase
implemented with body biasing, we applied the following
evaluation methodology to explore the feasible timing vs.
leakage trade-offs. We analyzed different operating power
supply voltages and biasing configurations:
• Primary power supply values (VDD): 0.9 V, 1.1 V, and
1.3 V;
• For every operating voltage, we considered the following
body biasing configurations (VDDS/GNDS): zero-bias,
±0.4 V, ±0.2 V.
A. Leakage Characterization
To investigate leakage variations, we used Apache
RedHawk for a fast assessment of the design leakage
consumption. In particular, we exploited the Apache Power
Library (APL) CDEV characterization [3], setting the bias and
supply voltages at the target values. With this approach, we
extended the analysis to voltage combinations not covered by
the available library data set. Similarly to Liberty files, the
APL/CDEV characterization performs a SPICE-level
simulation of the design library cells in the desired corners. It
is important to notice that the APL/CDEV characterization is
design-dependent; hence, it only targets the standard cells used
in the design, which is typically a much smaller subset of the
original library. Moreover, by focusing only on leakage as
opposed to full timing and power characterization, the
simulation and the definition of the test vectors are static and
drastically simplified, since they do not take into account input
slew rates and output load capacitances of the cells.
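The design-dependent nature of this characterization can be sketched as follows: only the cell types actually instantiated in the design need a leakage value at the target corner, and the design leakage is the instance-weighted sum. This is a simplified illustration, not the RedHawk APL flow itself. The cell names and leakage values below are hypothetical, and the sketch assumes a single leakage number per cell, ignoring the dependence on the input state.

```python
from collections import Counter


def design_leakage(instances, cell_leakage_nw):
    """Total leakage (nW) of a design at one supply/bias corner.

    instances:        iterable with the library cell name of every instance
    cell_leakage_nw:  per-cell leakage (nW) characterized at that corner
    """
    counts = Counter(instances)
    # Only cells used in the design must be characterized.
    missing = set(counts) - set(cell_leakage_nw)
    if missing:
        raise KeyError(f"cells not characterized at this corner: {sorted(missing)}")
    return sum(n * cell_leakage_nw[cell] for cell, n in counts.items())


# Hypothetical netlist content and hypothetical per-cell leakage at one corner.
netlist_cells = ["INV_X1"] * 3 + ["NAND2_X1"] * 2 + ["DFF_X1"]
corner_leakage = {"INV_X1": 1.0, "NAND2_X1": 2.0, "DFF_X1": 5.0}

print(sorted(set(netlist_cells)))                      # cells to characterize
print(design_leakage(netlist_cells, corner_leakage))   # instance-weighted sum
```

Repeating the same sum with a leakage table for each supply/bias corner gives one design-level leakage figure per configuration.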
B. Timing Performance
To investigate the timing variations induced by different
bias voltages, we considered the critical path obtained from a
nominal condition analysis. However, the critical path could
change across different VDD/VDDS configurations. As far as timing
is concerned, we assumed that the 10 worst paths are
significant. Hence, we focused on the timing degradation of
these max-delay paths to assess the overall performance
degradation of the design. We carried out SPICE-level
simulations in the relevant VDD/GND corners using Mentor
Graphics Eldo, while the critical path netlists were extracted
with Synopsys PrimeTime and post-processed with an
internal tool to make them readable by Eldo. We used
the SPEF file extracted with Synopsys StarRC to improve the
simulation accuracy. In this way, the true load capacitances of
the standard cells were taken into account.
Figure 4. Leakage power as a function of the supply/bias voltage
V. LEAKAGE AND TIMING RESULTS
In order to evaluate the trade-offs introduced by body
biasing, the central zero-bias VDD/VDDS point was
considered as the reference. A graphical representation of the
timing analysis results is presented in Figure 3, while Table 3
reports the same data normalized as percent deviations with
respect to the central zero-bias VDD/VDDS reference. Table 4
and Table 5 report the numerical results for both leakage and
total power consumption as a function of the voltage
configurations. Furthermore, the leakage as a function of the
supply and substrate voltages is plotted in Figure 4, where the
negative bias values indicate an RBB configuration while the
positive values represent an FBB configuration.
The experimental results confirm that leakage reduction is
strongly dependent on the RBB/FBB configuration.
Considering a leakage recovery strategy, our testcase can
reduce the leakage power dissipation by up to 84% with respect to
the zero-bias VDD/VDDS reference, by means of the joint
contribution of RBB and voltage scaling (Table 4); the total power
dissipation at the same point decreases by 34% (Table 5). As expected, Table 3
reports a strong timing degradation at the lowest leakage
points, where propagation delays can roughly double (+108%) with
respect to the reference. If design requirements dictate
stringent performance constraints, other supply/bias
configurations can be considered. For example, for the
reference VDD/VDDS, an RBB of 200 mV can lead to a leakage
reduction of about 50% (49% in Table 4), with only 9% degradation in propagation
delay.
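The selection of an operating point discussed above can be sketched as a simple search over the characterized grid: among all supply/bias points whose delay deviation stays within a given budget, pick the one with the lowest leakage (or the lowest total power). The data below are transcribed from Tables 3, 4, and 5, with the reference point entered as 0%. The function name `best_config` and the choice of a single delay budget as the only constraint are assumptions made for illustration.

```python
# Percent deviations from the zero-bias 1.1 V reference (Tables 3, 4 and 5).
# Columns follow BIASES: negative values are RBB, positive values are FBB.
BIASES = (-0.4, -0.2, 0.0, 0.2, 0.4)
DELAY = {0.9: (108, 83, 60, 39, 20),
         1.1: (18, 9, 0, -8, -17),
         1.3: (-16, -21, -25, -30, -34)}
LEAKAGE = {0.9: (-84, -70, -39, 41, 677),
           1.1: (-72, -49, 0, 127, 980),
           1.3: (-54, -18, 59, 250, 1365)}
TOTAL_POWER = {0.9: (-34, -34, -33, -30, -9),
               1.1: (-1, -1, 0, 4, 32),
               1.3: (36, 37, 40, 46, 83)}


def best_config(max_delay_pct, cost=LEAKAGE):
    """Return (VDD, bias, cost %) of the lowest-cost point whose delay
    deviation does not exceed max_delay_pct, or None if no point qualifies."""
    feasible = [
        (cost[vdd][i], vdd, bias)
        for vdd, delays in DELAY.items()
        for i, bias in enumerate(BIASES)
        if delays[i] <= max_delay_pct
    ]
    if not feasible:
        return None
    value, vdd, bias = min(feasible)
    return vdd, bias, value


print(best_config(10))                    # lowest leakage within +10% delay
print(best_config(10, cost=TOTAL_POWER))  # lowest total power within +10% delay
```

With a 10% delay budget, minimizing leakage alone selects VDD = 1.3 V with 0.4 V of RBB (-54% leakage), whereas minimizing total power selects the reference supply with 0.2 V of RBB, because raising the supply increases the total power. The cost metric therefore has to match the application profile.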
The impact of leakage dissipation on the overall power
consumption depends on many factors, mainly technology and
application. For our testcase, the switching activity file used for power
calculation described a high-activity workload. For applications with
longer stand-by and shorter processing times, the impact of leakage would
be even more marked. From Figure 3 and Figure 4, it is worth
noticing the strong asymmetry observed between timing and
leakage behavior. The leakage dissipation has a much larger
variation range than the propagation delay. In particular, an FBB
configuration induces a large and non-constant increase in leakage, as
shown in Figure 4, so the speed gained with FBB is small compared
with its leakage cost. For example, at VDD = 1.1 V an FBB of 0.4 V
reduces the propagation delay by 17% (Table 3) while increasing
leakage by 980% (Table 4).
A few details have to be considered regarding the
implementation step, because a change in the VDD/VDDS
configuration implies a variation in the design timing
behavior. Therefore, we must guarantee the correct timing
functionality at the target operating voltage to prevent hold
and setup timing violations. If the designer decides to use a
VDD/VDDS combination with timing characterization not
covered by the available libraries, then some issues may arise
during place&route. The implementation flow is multi-corner:
usually hold violations are fixed by considering the fastest cell
library, while setup violations are fixed in the slowest corner.
Hold violations occur when signals change too fast with
respect to the active clock edge. If the target VDD/VDDS
configuration is slower than the fastest corner used during the
physical implementation, then the timing functionality is safe.
In contrast, if it is faster, the newly characterized library
should be added to the library set used during the
implementation. On the other hand, setup violations occur
when signals arrive too late with respect to the active clock
edge. In our testcase, these violations are covered by the
timing closure obtained at the maximum target operating
frequency.
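The corner-coverage reasoning above can be summarized in a small check. This is a sketch under stated assumptions: delays are expressed as percent deviations from the reference (lower means faster), the function name `extra_corners_needed` is hypothetical, and the symmetric treatment of the setup case (comparing against the slowest corner used during implementation) is an assumption added here for completeness. The corner values in the example are made up for illustration.

```python
def extra_corners_needed(target_delay_pct, fastest_corner_pct, slowest_corner_pct):
    """List the checks for which the target configuration is not covered.

    All arguments are delay deviations (%) from the reference; lower is faster.
    """
    needed = []
    if target_delay_pct < fastest_corner_pct:
        # Target is faster than the fastest library used for hold fixing:
        # a newly characterized library should be added to the flow.
        needed.append("hold")
    if target_delay_pct > slowest_corner_pct:
        # Target is slower than the slowest library used for setup fixing.
        needed.append("setup")
    return needed


# Hypothetical implementation corners: fastest at -25%, slowest at +60%.
print(extra_corners_needed(9, -25, 60))     # covered by both corners
print(extra_corners_needed(-34, -25, 60))   # faster than the hold corner
```

A configuration that returns an empty list is bounded by the corners already used during place&route; any other result indicates that an additional characterized library is needed before sign-off.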
VI. CONCLUSION
In this work, a detailed case study to evaluate the impact of
body biasing on timing and leakage is presented. In particular,
we detailed a complete analysis flow to assess the achievable
trade-offs in terms of leakage reduction and timing
performance on a routed design, even though our solution can
be used at different steps of the implementation phase
(post-synthesis, post-placement, post-CTS).
Results obtained on our testcase reported a negligible
overhead in terms of area and effort to implement the body
biasing technique. Exploration of feasible VDD/VDDS
operating conditions is based on Apache RedHawk APL
characterizations and SPICE-level simulations. The proposed
approach provided a practical and efficient method to expand
the exploration space to supply/bias voltage configurations
not covered by the available cell library set. Such a flow can
deliver relevant information to designers, in order to determine
the most appropriate configuration for leakage reduction,
while guaranteeing the correct timing functionality. As far as
leakage is concerned, we achieved a reduction to about one sixth
of the value of a reference zero-bias implementation, with a
propagation delay roughly doubled. Furthermore, some details
concerning the proper implementation to preserve the correct
design timing functionality have also been discussed.
ACKNOWLEDGEMENT
This work was partially supported by the Dote di Ricerca
Applicata of Region Lombardy.
REFERENCES
[1] Semiconductor Industry Association, “International Technology
Roadmap for Semiconductors,” 2012 update, http://www.itrs.net
[2] Liberty User Guides and Reference Manual Suite, Version 2012.06,
http://www.opensourceliberty.org
[3] Apache RedHawk User Manual, Release 12.2, Jan. 2013,
http://www.apache-da.com
[4] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, and V.
De, “Dynamic sleep transistor and body bias for active leakage power
control of microprocessors,” IEEE Journal of Solid-State Circuits, vol.
38, pp. 1838-1845, Nov. 2003.
[5] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolić, Digital integrated
circuits, 2nd edition. Upper Saddle River, NJ, Prentice Hall, 2003.
[6] J. W. Tschanz, S. G. Narendra, R. Nair, and V. De, “Effectiveness of
adaptive supply voltage and body bias for reducing impact of parameter
variations in low power and high performance microprocessors,” IEEE
Journal of Solid-State Circuits, vol. 38, pp. 826-829, May 2003.
[7] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A.
P. Chandrakasan, and V. De, “Adaptive body bias for reducing impacts
of die-to-die and within-die parameter variations on microprocessor
frequency and leakage,” IEEE Journal of Solid-State Circuits, vol. 37,
pp. 1396-1402, Nov. 2002.
[8] J. Kao, S. G. Narendra, and A. P. Chandrakasan, “Subthreshold leakage
modeling and reduction techniques,” in Proc. ICCAD, pp. 141-148, Nov.
2002.
[9] A. Agarwal, S. Mukhopadhyay, C. H. Kim, A. Raychowdhury, and K.
Roy, “Leakage power analysis and reduction: models, estimation and
tools,” IEEE Proc. Computers and Digital Techniques, vol. 152, pp.
353-368, May 2005.
[10] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J.
Yamada, “1-V power supply high-speed digital circuit technology with
multithreshold-voltage CMOS,” IEEE Journal of Solid-State Circuits,
vol. 30, pp. 847-854, Aug. 1995.
[11] S. M. Martin, K. Flautner, T. Mudge, and D. Blaauw, “Combined
dynamic voltage scaling and adaptive body biasing for lower power
microprocessors under dynamic workloads,” in Proc. ICCAD,
pp. 721-725, Nov. 2002.
[12] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage
current mechanisms and leakage reduction techniques in
deep-submicrometer CMOS circuits,” Proc. IEEE, vol. 91, pp. 305-327, Feb.
2003.
[13] D. Rossi, F. Campi, S. Spolzino, S. Pucillo, and R. Guerrieri, “A
heterogeneous digital signal processor for dynamically reconfigurable
computing,” IEEE Journal of Solid-State Circuits, vol. 45,
pp. 1615-1626, Aug. 2010.
[14] A. Keshavarzi, S. G. Narendra, S. Borkar, C. Hawkins, K. Roy, and V.
De, “Technology scaling behavior of optimum reverse body bias for
standby leakage power reduction in CMOS IC's,” in Proc. International
Symposium on Low Power Electronics and Design, pp. 252-254, Aug.
1999.
[15] A. Keshavarzi, S. Ma, S. G. Narendra, B. Bloechel, K. Mistry, T. Ghani,
S. Borkar, and V. De, “Effectiveness of reverse body bias for leakage
control in scaled dual Vt CMOS ICs,” in Proc. International Symposium
on Low Power Electronics and Design, pp. 207-212, Aug. 2001.
Table 3. Timing vs. voltage configurations: propagation delay variations
(values normalized to a central zero-bias VDD/VDDS configuration)
Supply Voltage (V)   RBB -0.4 V   RBB -0.2 V   Zero-bias   FBB 0.2 V   FBB 0.4 V
0.9                  108%         83%          60%         39%         20%
1.1                  18%          9%           Reference   -8%         -17%
1.3                  -16%         -21%         -25%        -30%        -34%
Table 4. Leakage dissipation vs. voltage configurations
(values normalized to a central zero-bias VDD/VDDS configuration)
Supply Voltage (V)   RBB -0.4 V   RBB -0.2 V   Zero-bias   FBB 0.2 V   FBB 0.4 V
0.9                  -84%         -70%         -39%        41%         677%
1.1                  -72%         -49%         Reference   127%        980%
1.3                  -54%         -18%         59%         250%        1365%
Table 5. Total power dissipation vs. voltage configurations
(values normalized to a central zero-bias VDD/VDDS configuration)
Supply Voltage (V)   RBB -0.4 V   RBB -0.2 V   Zero-bias   FBB 0.2 V   FBB 0.4 V
0.9                  -34%         -34%         -33%        -30%        -9%
1.1                  -1%          -1%          Reference   4%          32%
1.3                  36%          37%          40%         46%         83%