2013 IEEE Computer Society Annual Symposium on VLSI Exploiting Body Biasing For Leakage Reduction: A Case Study Andrea Manuzzato1,2, Fabio Campi1, Davide Rossi1, Valentino Liberali2, and Davide Pandini1 1 Central CAD and Design Solutions, STMicroelectronics, Agrate Brianza, Italy 2 Dipartimento di Fisica, Università degli Studi di Milano, Milano, Italy Abstract—In modern System-on-Chip (SoC) designs, the fulfillment of power constraints is one of the most important and challenging tasks. In this framework, the use of both voltage scaling and body biasing techniques is a mainstream strategy largely used for leakage power reduction. This work presents a case study to evaluate the impact of these techniques on an industrial microprocessor-based design. We analyze the impact of body biasing in terms of area penalties and routing efforts. Furthermore, a complete analysis flow is proposed to evaluate the achievable leakage reduction and the expected performance degradation. In order to overcome the limited spectrum of operating configurations covered by a given library set, we propose a practical and effective methodology based on a standard digital design and characterization flow. By using this procedure, a designer can efficiently evaluate the most appropriate leakage/timing trade-offs, and consequently determine the best supply voltage and biasing configurations to implement the design. The experimental results on our testcase demonstrate that body biasing leads to a leakage reduction up to six times with respect to the standard reference supply voltage configuration. Keywords-Body biasing; leakage reduction; voltage scaling; power dissipation; low power; timing analysis I. INTRODUCTION As semiconductor technology evolves, meeting the power budget has become one of the most important objectives in SoC design. In particular, one of the major concerns about power dissipation is related to the dramatic increase of the leakage current component. 
As forecasted by the International Technology Roadmap for Semiconductors (ITRS) [1], with the device feature size steadily shrinking into the nanometer range, the relative contribution of leakage to the overall power consumption is consistently growing. Indeed, ITRS has predicted that this kind of static power dissipation will exceed the dynamic power component in the more advanced technologies. This is a severe issue, especially for applications with short processing times and long stand-by periods, such as sensor and biomedical applications. Body biasing is an appealing technique that is encountering significant success in modern IC design [4]. By means of an appropriate bias voltage applied to the silicon substrate, it is possible to alter the electrical characteristics of the transistors, i.e., the threshold voltage, thus modifying the device leakage dissipation and timing performance. Typically, in triple-well processes (Figure 1), P-type and N-type substrates are biased symmetrically with respect to the reference power supply levels (VDD for N-type substrate and GND for P-type substrate). With this manufacturing process, it is possible to set the bias voltages independently for the nMOS and pMOS devices. The power/ground supply lines for bulk biasing are usually named VDDS (pMOS devices, N-type substrate) and GNDS (nMOS devices, P-type substrate). In particular, a forward body biasing (FBB) of the substrate (VDDS < VDD, GNDS > GND) makes the library standard cells faster, while introducing higher leakage consumption. In contrast, a reverse body biasing (RBB) of the substrate (VDDS > VDD, GNDS < GND) yields a lower leakage consumption, while imposing higher delays on standard cell switching. As a consequence, body biasing can be used as a powerful tool to finely tune the speed/leakage trade-off depending on the design constraints.

Figure 1. Triple-well process cross-section

978-1-4799-1331-2/13/$31.00 ©2013 IEEE
The designer may determine his/her own ideal trade-off by selecting the precise VDD/VDDS combination (and for symmetry the corresponding GND/GNDS pair). This paper focuses on the extensive application of body biasing to a standard cell-based digital design. Our goal is to enable the designer to explore and select the best trade-off between speed and leakage dissipation that meets the design constraints, i.e., to deliver the required timing performance with the lowest possible leakage. However, a standard digital design flow typically does not support the user with sufficient information. Power and timing figures are stored in a Liberty file that contains the results of SPICE-level characterization for each library cell in different states and corners [2]. We define a design corner as the set of all process, voltage, and temperature (PVT) conditions that impact the cell behavior. Since the library characterization procedure is computationally expensive and time-consuming, the number of available corners for each cell library is limited. The introduction of body biasing represents in practice a fourth parameter in the corner space. While some of the most recent cell libraries include a few extra bias points in the set of available corners, an exhaustive coverage of VDD/VDDS (and GND/GNDS) pairs is impractical, due to the high characterization cost. For the same reason, during logic synthesis the designer cannot explore all possible VDD/VDDS combinations to determine the optimal leakage/timing trade-off. Moreover, even if the optimal configuration can be identified at this stage, it will be jeopardized after the physical design step by the parasitic loads extracted from the interconnection lines and by the cell resizing imposed by place&route and clock-tree synthesis (CTS).

Table 1. Body biasing configurations

             VDDS (V)   GNDS (V)
FBB          0.90       0.20
             0.70       0.40
Zero-bias    1.10       0.00
RBB          1.30       -0.20
             1.50       -0.40
In the end, the final design implementation might exhibit an unnecessarily redundant leakage consumption that could be easily adjusted by switching to a more suitable VDD/VDDS configuration in post-design, while guaranteeing timing constraints. In this paper, we propose an efficient methodology to perform leakage recovery on a microprocessor-based design, by allowing the designer to select a VDD/VDDS configuration that will preserve the timing constraints at the minimal feasible leakage power dissipation. In particular, in our work we considered static biasing configuration, i.e., the bias voltage is kept at a fixed value during circuit operations. First, we demonstrate that the additional circuitry necessary to implement body biasing during the back-end flow is minimal. After verifying the negligible impact on die area, we show how it is possible to evaluate the benefit/penalty introduced by body biasing. In order to overcome the limited operating configurations covered by the available library set, we characterize the cell leakage consumption by means of a fast characterization methodology based on the Apache RedHawk tool [3], which is our reference tool for power integrity analysis. In this way, we can quickly compute the power consumption of our testcase at different power supply and substrate bias voltages. This approach allowed us to extend the configuration exploration beyond the typical supply voltage range covered by the given cell libraries. Furthermore, to evaluate the impact of body biasing on timing, we extracted the SPICE-level netlists of the critical paths, and we simulated the propagation delay as a function of the power supply and biasing conditions. The paper is structured as follows. Section II introduces some fundamental concepts on body biasing. Section III provides a description of the case study evaluating the impact of body biasing on area and timing. 
The complete leakage/timing evaluation flow is presented in Section IV, while Section V discusses experimental results showing a significant achievable leakage reduction with respect to a nominal zero-bias VDD/VDDS condition. Finally, Section VI summarizes our concluding remarks.

II. BACKGROUND: BODY BIASING TECHNIQUES

Body biasing adapts the voltage level of the transistor bulk to different operating conditions. Controlling the bulk biasing has a strong impact on both timing performance and leakage. In order to have a symmetric effect on N-type and P-type transistors, biasing must be consistently applied to both device types, so that P-type and N-type substrates have opposite bias voltages. This is possible only in a triple-well process such as the one depicted in Figure 1. Usually, the device junctions are zero-biased, i.e., the nMOS bulk is grounded and the pMOS bulk is connected to VDD. Biasing the device bulk at different voltages has the effect of changing the threshold voltage, which is given by [5]:

$$V_{th} = V_{th0} + \gamma \cdot \left( \sqrt{2 \cdot \Phi_F - V_{BS}} - \sqrt{2 \cdot \Phi_F} \right),$$

where Vth0 is the threshold voltage for the zero-bias condition, γ is the body effect coefficient, and ΦF is the Fermi potential. Hence, the transistor threshold voltage can be adjusted by changing the bulk-source voltage VBS. The threshold voltage plays a very important role in power dissipation. In particular, leakage current strongly depends on the transistor Vth. The relationship between sub-threshold current and threshold voltage can be expressed as [8][9]:

$$I_{sub} = I_0 \cdot e^{\frac{V_{GS} - V_{th0} - \lambda \cdot V_{SB} + \eta \cdot V_{DS}}{n \cdot V_T}} \cdot \left( 1 - e^{-\frac{V_{DS}}{V_T}} \right), \qquad (1)$$

where I0 is the drain-source current (IDS) when VGS = Vth0, VT is the thermal voltage (k·T/q), η is the drain-induced barrier lowering (DIBL) coefficient, λ is the linearized body effect coefficient, and n is the sub-threshold slope factor. In this work, we consider the subthreshold current as the major contribution to the leakage current.
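As a rough numerical illustration of the two expressions above, the following Python sketch evaluates the body-effect threshold voltage and the sub-threshold current of equation (1) for an nMOS device at a few bulk-source voltages. All device parameters (Vth0, γ, ΦF, λ, η, n, I0) and all function names are placeholder assumptions chosen only for illustration; they are not values from the target technology.

```python
import math

# Hypothetical device parameters (illustrative assumptions, not from the paper).
VTH0 = 0.35             # zero-bias threshold voltage (V)
GAMMA = 0.25            # body effect coefficient (V^0.5)
PHI_F = 0.40            # Fermi potential (V)
LAMBDA = 0.15           # linearized body effect coefficient of equation (1)
ETA = 0.08              # DIBL coefficient
N = 1.5                 # sub-threshold slope factor
I0 = 1e-7               # current at VGS = Vth0 (A)
K_OVER_Q = 8.617333e-5  # Boltzmann constant over electron charge (V/K)


def threshold_voltage(v_bs):
    """Vth = Vth0 + gamma * (sqrt(2*PhiF - VBS) - sqrt(2*PhiF))."""
    arg = 2.0 * PHI_F - v_bs
    if arg < 0.0:
        raise ValueError("forward bias too large for this simple model")
    return VTH0 + GAMMA * (math.sqrt(arg) - math.sqrt(2.0 * PHI_F))


def subthreshold_current(v_gs, v_ds, v_sb, temp_k=300.0):
    """Sub-threshold current following the form of equation (1)."""
    v_t = K_OVER_Q * temp_k  # thermal voltage k*T/q
    exponent = (v_gs - VTH0 - LAMBDA * v_sb + ETA * v_ds) / (N * v_t)
    return I0 * math.exp(exponent) * (1.0 - math.exp(-v_ds / v_t))


if __name__ == "__main__":
    # Negative VBS is reverse body bias, positive VBS is forward body bias.
    for v_bs in (-0.4, -0.2, 0.0, 0.2, 0.4):
        vth = threshold_voltage(v_bs)
        i_off = subthreshold_current(v_gs=0.0, v_ds=1.1, v_sb=-v_bs)
        print(f"VBS={v_bs:+.1f} V  Vth={vth:.3f} V  Ioff={i_off:.3e} A")
```

With these placeholder numbers, a reverse bias raises the threshold voltage and lowers the off-state current, while a forward bias does the opposite. The sketch keeps only the sub-threshold term, in line with the assumption stated above that it is the major contribution to the leakage current.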
This is not completely true: there are various other leakage components, such as the band-to-band tunneling currents and the leakage through the gate oxide [12]. However, in our target technology, the subthreshold current is the dominant leakage source for the considered range of VDD/VDDS configurations. Moreover, in a given technology, the capabilities to control/reduce the other current components at design level are quite limited. Furthermore, the propagation delay of a CMOS gate strongly depends on the threshold voltage and can be expressed as [10]:

$$t_p \propto \frac{C_L \cdot V_{DD}}{I_{DS}} \approx \frac{C_L \cdot V_{DD}}{A \cdot (V_{DD} - V_{th})},$$

where CL is the fan-out load capacitance, and A is a constant factor. Hence, the lower the threshold voltage, the better the timing performance. However, such a performance boost comes with a penalty in terms of leakage increase. We define FBB as the configuration in which the source-bulk junctions are forward biased both for pMOS and nMOS transistors. Similarly, we define RBB as the configuration where both junctions are reverse biased. Table 1 shows the voltage configurations for biasing a circuit operating at the nominal voltage of 1.1 V. Transistor behavior in the two body biasing configurations is very different. FBB decreases the threshold voltage, thus decreasing the propagation delays while increasing power dissipation due to the higher leakage current component. In contrast, RBB increases the threshold voltage, thus reducing the power dissipated by leakage current while increasing the propagation delay. Furthermore, there are some limitations in the allowable bias voltages. In FBB we are applying a forward bias voltage to a diode (the source-bulk and drain-bulk junctions), whose current grows rapidly as the bias approaches the junction turn-on voltage. In RBB the band-to-band tunneling current increases as the reverse bias voltage is raised [9], thus limiting the overall benefit obtained from the reduction of Isub in (1). Leakage reduction by means of RBB is a very appealing option for power dissipation control and reduction strategies. However, it is worth noticing that previous works such as [14][15] reported a decreasing effectiveness of leakage reduction with RBB in more scaled technologies. Even in relatively mature industrial technologies such as 65/55 nm, the leakage power dissipation has become a critical concern, since it accounts for about 50% of the total power budget in microcontrollers and microprocessor-based designs. At the same time, the very tight constraints imposed on silicon area by increasingly competitive market conditions limit the economically viable design techniques that can be effectively used to reduce the leakage component. Therefore, SoC designers are working hard to achieve this target, and RBB has emerged as one of the most interesting techniques that can be exploited in a standard design flow on an industrial scale. Finally, for the sake of completeness, dynamic applications of body biasing have also been proposed as an effective technique to mitigate the threshold voltage spread due to process variations [6][7], and to reduce the power consumption by adapting the threshold voltage to different workloads [11]. The application of run-time adaptive body biasing during circuit operation is based on on-chip monitors (like ring oscillators) and a body biasing controller that can dynamically adjust the threshold voltage to keep the circuit behavior closer to the nominal conditions.

Table 2. Testcase statistics

                                 Zero-bias   Biased
Design area (μm²)                21351       21571
Hierarchical port count          537         537
Leaf cell count                  9231        9251
Buf/inv cell count               5460        5480
Clock-tree buf/inv cell count    14          14

Figure 2. Power grid layout

III. CASE STUDY DESCRIPTION

As a case study, we considered an internal testcase that is a sub-circuit of the Manyac system described in [13]. The 20-kgate design was implemented using an industrial 40 nm triple-well CMOS technology with a nominal supply voltage of 1.1 V.
Synthesis was performed with Synopsys Design Compiler exploiting the topographical capability and targeting a 250 MHz operating frequency. In order to verify the cost of body biasing, we implemented the same design both with and without the additional power distribution network circuitry that supports body biasing. If the designer chooses to apply either RBB or FBB, the chip floorplan needs to be designed differently, as a separate power/ground (P/G) grid must be routed to distribute the P-type and N-type bulk biasing across the die. This is normally done by inserting specific well taps in each standard cell row. These components are library cells with no logic functionality, explicitly designed to connect the substrate. VDDS/GNDS metal grids are then routed on top of the VDD/GND power distribution network to carry the voltage supply to the well taps. The substrate P/G grids are similar to the primary power supply grids, with narrower metal widths, since they do not have to distribute large current loads. Typically, these currents are at least two orders of magnitude smaller than those flowing in the primary power distribution network. Figure 2 shows a snapshot of the P/G grid layout, representing the M6 layer, with the additional bias power stripes and the well-tap connections to the substrate. The initial RTL description for the biased and zero-bias versions is the same, but the floorplan passed to the topographical synthesis differs, leading to slightly different synthesized gate-level netlists. The differences are quite small: usually the synthesis with the body biased floorplan uses more cells to compensate for a smaller available routing space due to the congestion induced by the additional VDDS/GNDS grids. In the end, in our testcase the difference amounted to only five cells. The entire implementation phase (place&route, CTS, and post-routing optimization) was performed using Synopsys IC Compiler.
As reported in Table 2, for this design the area overhead due to routing the additional power/ground stripes is quite small: ~1%. The additional effort for routing the biased version (expressed as the number of timing violations before final optimization) was negligible as well.

Figure 3. Propagation delay as a function of the supply/bias voltage

IV. BODY BIASING ANALYSIS

In order to assess the post-routing version of the testcase implemented with body biasing, we applied the following evaluation methodology to explore the feasible timing vs. leakage trade-offs. We analyzed different operating power supply voltages and biasing configurations:
• Primary power supply values (VDD): 0.9 V, 1.1 V, and 1.3 V;
• For every operating voltage, we considered the following body biasing configurations (VDDS/GNDS): zero-bias, ±0.2 V, ±0.4 V.

A. Leakage Characterization

To investigate leakage variations, we used Apache RedHawk for a fast assessment of the design leakage consumption. In particular, we exploited the Apache Power Library (APL) CDEV characterization [3], setting the bias and supply voltages at the target values. With this approach, we extended the analysis to voltage combinations not covered by the available library data set. Similarly to the characterization behind Liberty files, the APL/CDEV characterization performs a SPICE-level simulation of the design library cells in the desired corners. It is important to notice that the APL/CDEV characterization is design-dependent; hence, it only targets the standard cells used in the design, which are typically a much smaller subset of the original library. Moreover, by focusing only on leakage as opposed to full timing and power characterization, the simulation and the definition of the test vectors are static and drastically simplified, since they do not take into account the input slew rates and output load capacitances of the cells.

B. Timing Performance

To investigate the timing variations induced by different bias voltages, we considered the critical path obtained from a nominal condition analysis. However, the critical path could change across different VDD/VDDS configurations. As far as timing is concerned, we therefore assumed that the 10 worst paths are significant. Hence, we focused on the timing degradation of the max-delay paths to assess the overall performance degradation of the design. We carried out SPICE-level simulations in the relevant VDD/GND corners using Mentor Graphics Eldo, while the critical path netlists were extracted with Synopsys PrimeTime and post-processed with an internal tool to make them readable by Eldo. We considered the SPEF file extracted with Synopsys StarRC to improve the simulation accuracy. In this way, the true load capacitances of the standard cells were taken into account.

Figure 4. Leakage power as a function of the supply/bias voltage

V. LEAKAGE AND TIMING RESULTS

In order to evaluate the trade-offs introduced by body biasing, the central zero-bias VDD/VDDS point was considered as the reference. A graphical representation of the timing analysis results is presented in Figure 3, while Table 3 reports the same data normalized as percent deviations with respect to the central zero-bias VDD/VDDS reference. Table 4 and Table 5 report the numerical results for both leakage and total power consumption as a function of the voltage configurations. Furthermore, the leakage as a function of the supply and substrate voltages is plotted in Figure 4, where the negative bias values indicate an RBB configuration while the positive values represent an FBB configuration. The experimental results confirm that leakage reduction is strongly dependent on the RBB/FBB configuration.
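The sign convention used for the bias axis (negative values for RBB, positive values for FBB) can be made concrete with a short Python sketch that lists the supply/bias points explored in Section IV. It assumes the symmetric biasing described in Section I, i.e., VDDS = VDD - bias and GNDS = GND + bias with GND = 0 V, which reproduces Table 1 at VDD = 1.1 V. The function and variable names are illustrative assumptions.

```python
from itertools import product

VDD_VALUES = (0.9, 1.1, 1.3)               # primary supply values (V)
BIAS_VALUES = (-0.4, -0.2, 0.0, 0.2, 0.4)  # < 0: RBB, 0: zero-bias, > 0: FBB


def bias_rails(vdd, bias):
    """Return (VDDS, GNDS) for a symmetric bias; GND is assumed to be 0 V."""
    vdds = round(vdd - bias, 2)  # pMOS bulk (N-type substrate)
    gnds = round(0.0 + bias, 2)  # nMOS bulk (P-type substrate)
    return vdds, gnds


def label(bias):
    """Name the biasing regime from the sign of the bias value."""
    if bias < 0:
        return "RBB"
    if bias > 0:
        return "FBB"
    return "zero-bias"


def configurations():
    """All (VDD, bias, VDDS, GNDS) points of the exploration grid."""
    return [
        (vdd, bias, *bias_rails(vdd, bias))
        for vdd, bias in product(VDD_VALUES, BIAS_VALUES)
    ]


if __name__ == "__main__":
    for vdd, bias, vdds, gnds in configurations():
        print(f"VDD={vdd:.1f} V  {label(bias):9s}  VDDS={vdds:.2f} V  GNDS={gnds:+.2f} V")
```

The grid contains 15 points, of which only a subset is typically covered by the characterized library corners.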
Considering a leakage recovery strategy, our testcase can reduce the leakage power dissipation by up to 84% with respect to the zero-bias VDD/VDDS reference, by means of the joint contribution of RBB and voltage scaling. As expected, Table 3 reports a strong timing degradation at the lowest leakage points, where propagation delays can be almost doubled with respect to the reference. If design requirements dictate stringent performance constraints, other supply/bias configurations can be considered. For example, at the reference VDD, an RBB of 200 mV can lead to a leakage reduction of about 50%, with only 9% degradation in propagation delay. The impact of leakage dissipation on the overall power consumption depends on many factors: mainly technology and application. For our testcase, the activity file used for power calculation had a high switching activity. In case of applications with longer stand-by periods and shorter processing times, the impact would be even more marked. From Figure 3 and Figure 4, it is worth noticing the strong asymmetry observed between timing and leakage behavior. The leakage dissipation has a larger variation range than the one observed for the propagation delay. In particular, an FBB configuration may induce a greater and non-constant increase in leakage, as shown in Figure 4. From the leakage dissipation point of view, we can observe that in FBB the ratio between timing and leakage is strongly unbalanced. A few details have to be considered regarding the implementation step, because a change in the VDD/VDDS configuration implies a variation in the design timing behavior. Therefore, we must guarantee the correct timing functionality at the target operating voltage to prevent hold and setup timing violations. If the designer decides to use a VDD/VDDS combination with timing characterization not covered by the available libraries, then some issues may arise during place&route.
The implementation flow is multi-corner: usually hold violations are fixed by considering the fastest cell library, while setup violations are fixed in the slowest corner. Hold violations occur when signals change too fast with respect to the active clock edge. If the target VDD/VDDS configuration is slower than the fastest corner used during the physical implementation, then the timing functionality is safe. In contrast, if it is faster, the newly characterized library should be added to the library set used during the implementation. On the other hand, setup violations occur when signals arrive too late with respect to the active clock edge. In our testcase, these violations are covered by the timing closure obtained at the maximum target operating frequency.

VI. CONCLUSION

In this work, a detailed case study to evaluate the impact of body biasing on timing and leakage is presented. In particular, we detailed a complete analysis flow to assess the achievable trade-offs in terms of leakage reduction and timing performance on a routed design, even though our solution can be used at different steps of the implementation phase (post-synthesis, post-placement, post-CTS). Results obtained on our testcase reported a negligible overhead in terms of area and effort to implement the body biasing technique. Exploration of feasible VDD/VDDS operating conditions is based on Apache RedHawk APL characterizations and SPICE-level simulations. The proposed approach provided a practical and efficient method to expand the exploration space for supply/bias voltage configurations not covered by the available cell library set. Such a flow can deliver relevant information to designers, in order to determine the most appropriate configuration for leakage reduction, while guaranteeing the correct timing functionality. As far as leakage is concerned, we achieved a reduction to about one sixth of the leakage of a reference zero-bias implementation (up to 84%), with a propagation delay almost doubled. Furthermore, some details concerning the proper implementation to preserve the correct design timing functionality have also been discussed.

ACKNOWLEDGEMENT

This work was partially supported by the Dote di Ricerca Applicata of Region Lombardy.

REFERENCES

[1] Semiconductor Industry Association, “International Technology Roadmap for Semiconductors,” 2012 update, http://www.itrs.net
[2] Liberty User Guides and Reference Manual Suite Version 2012.06, http://www.opensourceliberty.org
[3] Apache RedHawk User Manual. Release 12.2, Jan. 2013, http://www.apache-da.com
[4] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, and V. De, “Dynamic sleep transistor and body bias for active leakage power control of microprocessors,” IEEE Journal of Solid-State Circuits, vol. 38, pp. 1838-1845, Nov. 2003.
[5] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolić, Digital integrated circuits, 2nd edition. Upper Saddle River, NJ, Prentice Hall, 2003.
[6] J. W. Tschanz, S. G. Narendra, R. Nair, and V. De, “Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors,” IEEE Journal of Solid-State Circuits, vol. 38, pp. 826-829, May 2003.
[7] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, “Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 1396-1402, Nov. 2002.
[8] J. Kao, S. G. Narendra, and A. P. Chandrakasan, “Subthreshold leakage modeling and reduction techniques,” in Proc. ICCAD, pp. 141-148, Nov. 2002.
[9] A. Agarwal, S. Mukhopadhyay, C. H. Kim, A. Raychowdhury, and K. Roy, “Leakage power analysis and reduction: models, estimation and tools,” IEEE Proc. Computers and Digital Techniques, vol. 152, pp. 353-368, May 2005.
[10] S. Mutoh, T. Douseki, Y.
Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, “1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS,” IEEE Journal of Solid-State Circuits, vol. 30, pp. 847-854, Aug. 1995.
[11] S. M. Martin, K. Flautner, T. Mudge, and D. Blaauw, “Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads,” in Proc. ICCAD, pp. 721-725, Nov. 2002.
[12] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proc. IEEE, vol. 91, pp. 305-327, Feb. 2003.
[13] D. Rossi, F. Campi, S. Spolzino, S. Pucillo, and R. Guerrieri, “A heterogeneous digital signal processor for dynamically reconfigurable computing,” IEEE Journal of Solid-State Circuits, vol. 45, pp. 1615-1626, Aug. 2010.
[14] A. Keshavarzi, S. G. Narendra, S. Borkar, C. Hawkins, K. Roy, and V. De, “Technology scaling behavior of optimum reverse body bias for standby leakage power reduction in CMOS IC's,” in Proc. International Symposium on Low Power Electronics and Design, pp. 252-254, Aug. 1999.
[15] A. Keshavarzi, S. Ma, S. G. Narendra, B. Bloechel, K. Mistry, T. Ghani, S. Borkar, and V. De, “Effectiveness of reverse body bias for leakage control in scaled dual Vt CMOS ICs,” in Proc. International Symposium on Low Power Electronics and Design, pp. 207-212, Aug. 2001.

Table 3. Timing vs. voltage configurations: propagation delay variations (values normalized to the central zero-bias VDD/VDDS configuration)

Supply Voltage (V)   RBB -0.4 V   RBB -0.2 V   Zero-bias   FBB 0.2 V   FBB 0.4 V
0.9                  108%         83%          60%         39%         20%
1.1                  18%          9%           Reference   -8%         -17%
1.3                  -16%         -21%         -25%        -30%        -34%

Table 4. Leakage dissipation vs. voltage configurations (values normalized to the central zero-bias VDD/VDDS configuration)

Supply Voltage (V)   RBB -0.4 V   RBB -0.2 V   Zero-bias   FBB 0.2 V   FBB 0.4 V
0.9                  -84%         -70%         -39%        41%         677%
1.1                  -72%         -49%         Reference   127%        980%
1.3                  -54%         -18%         59%         250%        1365%

Table 5. Total power dissipation vs. voltage configurations (values normalized to the central zero-bias VDD/VDDS configuration)

Supply Voltage (V)   RBB -0.4 V   RBB -0.2 V   Zero-bias   FBB 0.2 V   FBB 0.4 V
0.9                  -34%         -34%         -33%        -30%        -9%
1.1                  -1%          -1%          Reference   4%          32%
1.3                  36%          37%          40%         46%         83%
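The normalized values of Table 3 and Table 4 lend themselves to a simple lookup of the kind a designer might perform when choosing a configuration. The Python sketch below picks the lowest-leakage supply/bias point whose delay variation stays within a given budget. The table data are copied from Table 3 and Table 4; the function name and the budget parameter are illustrative assumptions, and the selection deliberately ignores total power (Table 5) and the hold/setup considerations discussed in Section V.

```python
BIAS = (-0.4, -0.2, 0.0, 0.2, 0.4)  # V; < 0: RBB, > 0: FBB

# Percent variations with respect to the 1.1 V zero-bias reference.
DELAY_PCT = {  # Table 3
    0.9: (108, 83, 60, 39, 20),
    1.1: (18, 9, 0, -8, -17),
    1.3: (-16, -21, -25, -30, -34),
}
LEAKAGE_PCT = {  # Table 4
    0.9: (-84, -70, -39, 41, 677),
    1.1: (-72, -49, 0, 127, 980),
    1.3: (-54, -18, 59, 250, 1365),
}


def lowest_leakage(max_delay_pct):
    """Lowest-leakage point with delay variation <= max_delay_pct, or None."""
    candidates = [
        (LEAKAGE_PCT[vdd][i], DELAY_PCT[vdd][i], vdd, bias)
        for vdd in DELAY_PCT
        for i, bias in enumerate(BIAS)
        if DELAY_PCT[vdd][i] <= max_delay_pct
    ]
    if not candidates:
        return None
    # Tuples compare on leakage first, so min() returns the lowest-leakage point.
    leakage, delay, vdd, bias = min(candidates)
    return {"vdd": vdd, "bias": bias, "delay_pct": delay, "leakage_pct": leakage}


if __name__ == "__main__":
    for budget in (10, 20, 108):
        print(budget, lowest_leakage(budget))
```

With a 10% delay budget the sketch returns the 1.3 V supply with a -0.4 V bias, which lowers leakage by 54% but, according to Table 5, raises total power by 36%. A practical selection would therefore weigh total power as well as leakage.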