An Overview of Static Power Dissipation

Jayanth Srinivasan

1 Introduction

Power consumption is an increasingly important issue in general-purpose processors, particularly in the mobile computing segment. In present processors, most of the power dissipation is dynamic power dissipation, which arises from signal transitions. Various techniques have been studied and implemented to reduce dynamic power dissipation, including clock gating, cache sub-banking, voltage scaling, and eliminating needless computation (these techniques are directly relevant to computer architects). However, as transistors become smaller and faster, static power dissipation (also called leakage power) will become increasingly significant.

Technology scaling is increasing both the absolute and the relative contribution of static power dissipation. Current technology trends show static power dissipation growing at a faster rate than dynamic power dissipation; within a few processor generations, the two curves will intersect. Using scaling theory, Borkar predicts that leakage power increases by a factor of 5 every generation, while active power remains roughly constant. Because leakage current flows through every transistor that is powered on, static power will become a significant part of total power as die sizes and integration levels increase.

2 Sources of Static Power Consumption

There are three sources of power dissipation in digital CMOS circuits, summarized in the following equation:

$$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{static} = \alpha C_L V_{dd}^2 f + I_{sc} V_{dd} + I_{leakage} V_{dd}$$

$P_{switching}$ is the dynamic component of power, where $C_L$ is the load capacitance, $f$ is the clock frequency, and $\alpha$ is the node transition activity factor. This equation also assumes that the voltage swing equals the supply voltage, $V_{dd}$.
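To make the three terms concrete, here is a minimal Python sketch that evaluates the power equation above. All numeric values in the usage example (activity factor, capacitance, currents) are hypothetical placeholders chosen for illustration, not measurements of any real processor, and the sketch treats $I_{sc}$ and $I_{leakage}$ as fixed inputs even though in real devices both depend on $V_{dd}$.

```python
from typing import NamedTuple


class PowerBreakdown(NamedTuple):
    switching: float      # alpha * C_L * Vdd^2 * f   (W)
    short_circuit: float  # I_sc * Vdd                (W)
    static: float         # I_leakage * Vdd           (W)

    @property
    def total(self) -> float:
        return self.switching + self.short_circuit + self.static


def average_power(alpha: float, c_load: float, vdd: float, freq: float,
                  i_sc: float, i_leakage: float) -> PowerBreakdown:
    """Evaluate P_avg = alpha*C_L*Vdd^2*f + I_sc*Vdd + I_leakage*Vdd.

    Units: c_load in farads, vdd in volts, freq in hertz, currents in amperes.
    The currents are taken as given; their own dependence on Vdd is ignored.
    """
    return PowerBreakdown(
        switching=alpha * c_load * vdd ** 2 * freq,
        short_circuit=i_sc * vdd,
        static=i_leakage * vdd,
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    p = average_power(alpha=0.1, c_load=1e-9, vdd=1.0, freq=1e9,
                      i_sc=0.01, i_leakage=0.05)
    print(f"switching={p.switching:.3f} W, short-circuit={p.short_circuit:.3f} W, "
          f"static={p.static:.3f} W, total={p.total:.3f} W")
```

In this simplified model, halving `vdd` cuts the switching term by 4x but the static term by only 2x, which illustrates why the relative share of static power grows as supply voltages scale down.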
$P_{short\text{-}circuit}$ is due to the direct-path short-circuit current $I_{sc}$, which arises when the NMOS and PMOS transistors are on simultaneously, conducting current directly from supply to ground. Significant short-circuit power dissipation can be avoided if the output rise/fall time of a gate is much longer than the input rise/fall time.

$P_{static}$ is due to the leakage current $I_{leakage}$, which has five components:

1. Reverse-biased pn junction current

Diode leakage occurs when a transistor is turned off and another active transistor charges its drain up or down with respect to the former's bulk potential. For example, consider an inverter with a high input voltage. The output is low and the NMOS transistor is on. The PMOS transistor is turned off, but its drain-to-bulk junction is reverse biased by $V_{dd}$, since the output is at 0 V and the PMOS bulk is at $V_{dd}$. The leakage current of this reverse-biased diode is given by:

$$I_D = I_S \left(e^{V/V_T} - 1\right)$$

where $I_S$ is the reverse saturation current, $V$ is the diode voltage, and $V_T = kT/q$ is the thermal voltage. With the junction reverse biased ($V = -V_{dd}$, where $V_{dd} \gg V_T$), the current saturates at approximately $-I_S$. This current is especially significant for an application that spends much of its time idle, since this power is dissipated even when there is no switching.

2. Sub-threshold leakage

This occurs when the gate-source voltage $V_{gs}$ has exceeded the weak inversion point but is still below the threshold voltage $V_{th}$. In this region, the MOSFET behaves similarly to a bipolar transistor, with exponential characteristics. The current in the sub-threshold region is given by:

$$I_{SUB} = K \frac{W}{L} e^{(V_{gs} - V_{th})/(n V_T)} \left(1 - e^{-V_{ds}/V_T}\right)$$

where $n$ and $K$ are technology parameters and $V_{ds}$ is the drain-source voltage. Scaling down the supply voltage in CMOS also requires scaling down the threshold voltage $V_{th}$ to maintain the performance of the scaled logic. The equation above shows that reducing $V_{th}$ increases the sub-threshold leakage current exponentially.
Sub-threshold leakage and reverse-biased pn junction current are currently the most important components of leakage current.

3. Gate-induced drain leakage (GIDL)

Gate-induced drain leakage current ($I_{GIDL}$) arises from the high electric field under the gate/drain overlap region, which causes deep depletion. GIDL occurs at low $V_G$ and high $V_D$ and generates carriers into the substrate and drain from surface traps or by band-to-band tunneling.

4. Punchthrough

Punchthrough occurs when the drain and source depletion regions approach each other and electrically "touch" deep in the channel. Punchthrough current ($I_{PT}$) varies quadratically with drain voltage.

5. Gate tunneling

Gate oxide tunneling current ($I_G$) flows when the electric field across the gate oxide is high enough for carriers to tunnel through it. This phenomenon is common in scaled-down devices with reduced oxide thickness.

3 Impact of technology scaling on static current

Butts et al. explain the impact of technology scaling on static current using the constant field scaling methodology. The primary constraint on device scaling is the process technology (e.g., lithography). To keep up with Moore's law while maintaining chip reliability, chip designers use constant field scaling, which reduces the supply voltage by the same factor as the device dimensions so that electric fields stay the same across technology generations. One of the metrics used is:

$$t = \frac{C_{gate} V_{dd}}{I_{Dsat}}$$

where $t$ is a single-transistor delay, $C_{gate}$ is the gate capacitance per unit width, and $I_{Dsat}$ is the maximum saturation drain current that can flow through the transistor. Under constant field scaling, if the supply voltage is reduced by some factor $S$, the delay must also be reduced by the same factor $S$. For this, it is sufficient to keep $C_{gate}/I_{Dsat}$ constant. $C_{gate}$ is proportional to the channel length and inversely proportional to the oxide thickness.
Since both of these dimensions are reduced by $S$, $C_{gate}$ remains constant. Hence, to achieve the expected performance improvement under scaling, the drive current $I_{Dsat}$ must remain constant. $I_{Dsat}$ is a function of many variables, including $V_{dd} - V_{th}$. To maintain a constant $I_{Dsat}$ while $V_{dd}$ is reduced by $S$, $V_{th}$ has to be reduced by a factor greater than $S$. By the sub-threshold current equation in Section 2, this leads to exponentially increasing leakage currents.

4 Estimating Leakage Power

Various research groups have developed models for estimating leakage power dissipation. However, most of these models operate at the transistor level and are not feasible for efficient architectural power simulation. Current publicly available power estimation tools for general-purpose architectures either ignore leakage power dissipation or assign it a fixed fraction of the dynamic power dissipation. Although such approximations may be acceptable with current process parameters, better leakage power estimation should be incorporated into power estimation tools.

Butts et al. have proposed a relatively simple static power model for architects. They model the leakage power as:

$$P_{leakage} = V_{dd} \, N \, k_{design} \, I_{leak}$$

where $P_{leakage}$ is the static power consumption at the architectural level, $N$ is the number of transistors, $k_{design}$ is a design-dependent parameter, and $I_{leak}$ is a technology-dependent parameter. This equation separates the contributions that architects and circuit designers can make to reducing leakage power: $I_{leak}$ depends on technology parameters such as $V_{th}$, while $k_{design}$ depends on design parameters such as the fraction of transistors that are on at any time.

(Note: I'm not sure how easy it would be to incorporate a leakage power model into Wattch for RSIM. We would have to estimate transistor counts for the different blocks, and I'm not sure we can directly use the $k_{design}$ values in the Butts paper.)
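The per-block estimate anticipated in the note above might look like the following minimal Python sketch. It makes two assumptions that are not from Butts et al.: (1) the technology term $I_{leak}$ is approximated by the off-state ($V_{gs} = 0$, $V_{ds} = V_{dd}$) sub-threshold current from Section 2, ignoring the other leakage components; (2) the block names, transistor counts, $k_{design}$ values, and device parameters ($K$, $W/L$, $n$) are hypothetical placeholders, not published values.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_k: float) -> float:
    """V_T = kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE


def subthreshold_current(k: float, w_over_l: float, vgs: float, vth: float,
                         vds: float, n: float, temp_k: float = 300.0) -> float:
    """I_SUB = K (W/L) exp((Vgs - Vth)/(n V_T)) (1 - exp(-Vds/V_T))."""
    vt = thermal_voltage(temp_k)
    return k * w_over_l * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))


def leakage_power(vdd: float, n_transistors: int, k_design: float, i_leak: float) -> float:
    """Architectural estimate: P_leakage = Vdd * N * k_design * I_leak."""
    return vdd * n_transistors * k_design * i_leak


def total_leakage(vdd: float, i_leak: float, blocks: dict[str, tuple[int, float]]) -> float:
    """Sum the model over blocks given as {name: (transistor count, k_design)}."""
    return sum(leakage_power(vdd, n, k, i_leak) for n, k in blocks.values())


if __name__ == "__main__":
    VDD = 1.0

    def i_off(vth: float) -> float:
        # Hypothetical device parameters; an off transistor has Vgs = 0, Vds = Vdd.
        return subthreshold_current(k=1e-7, w_over_l=2.0, vgs=0.0, vth=vth, vds=VDD, n=1.5)

    # Hypothetical per-block transistor counts and k_design values.
    blocks = {"icache": (2_000_000, 0.3), "dcache": (2_000_000, 0.3), "alu": (200_000, 0.5)}

    for vth in (0.30, 0.20):
        print(f"Vth={vth:.2f} V: P_leakage={total_leakage(VDD, i_off(vth), blocks):.4e} W")
```

Lowering $V_{th}$ from 0.30 V to 0.20 V in this sketch multiplies every block's leakage by $e^{0.1/(n V_T)}$, which is more than 10x for $n = 1.5$ at 300 K. This is the exponential growth that the scaling argument of Section 3 predicts.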
5 Reducing Static Power

Many circuit- and device-level techniques have been developed to reduce static power dissipation. The leakage power equation developed by Butts et al. also suggests some obvious ways to reduce leakage power.

5.1 Input selection for standby mode

Studies have shown that the input vectors applied to logic gates have a large impact on leakage current. Chen et al. have developed a genetic-algorithm-based technique to estimate the standby leakage power in CMOS circuits.

5.2 Steeper sub-threshold swing

Sub-threshold swing, the change in gate voltage needed to change the sub-threshold current by one decade, is the metric used to evaluate sub-threshold leakage. From the sub-threshold current equation in Section 2, the swing is $n V_T \ln 10$ (about 60 mV/decade for $n = 1$ at room temperature), which is proportional to the absolute temperature since $V_T = kT/q$. Hence, one option is to operate the circuit at liquid nitrogen temperature; this is expensive, though, and not practical for mobile applications. Another option is to use silicon-on-insulator (SOI) circuits: the standby leakage current of an SOI device has been found to be much lower than that of a bulk silicon device with the same threshold voltage.

5.3 Multiple supply voltages

Since dynamic power dissipation decreases quadratically with the supply voltage while delay increases only roughly linearly, it is possible to use a high supply voltage on the critical paths of a design to achieve the required performance, while the off-critical paths use a lower supply voltage to achieve low power dissipation. By partitioning the circuit into several domains operating at different supply voltages, both static and dynamic savings are possible. However, level shifter circuits are required for inter-domain communication.

Another way to reduce the supply voltage without impacting performance is to emphasize high-IPC designs. However, this should not come at the cost of added circuitry, as the extra leakage current might offset the savings due to the lower voltage.
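The critical/off-critical split described above can be sketched as a simple slack check. This minimal Python sketch assumes a first-order delay model in which a path's delay scales as $1/V_{dd}$ (so moving it from $V_{high}$ to $V_{low}$ stretches its delay by $V_{high}/V_{low}$), and that each path's dynamic power scales with $V_{dd}^2$ at fixed activity and frequency. The `Path` type, path names, delays, capacitance shares, and voltages are hypothetical; level-shifter overhead and static-power savings are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    delay_ns: float   # path delay when run at vdd_high
    cap_share: float  # fraction of total switched capacitance on this path


def assign_supplies(paths: list[Path], clock_ns: float,
                    vdd_high: float, vdd_low: float) -> dict[str, float]:
    """Put a path on vdd_low only if its stretched delay still meets the clock."""
    stretch = vdd_high / vdd_low  # first-order model: delay proportional to 1/Vdd
    return {p.name: (vdd_low if p.delay_ns * stretch <= clock_ns else vdd_high)
            for p in paths}


def relative_dynamic_power(paths: list[Path], assignment: dict[str, float],
                           vdd_high: float) -> float:
    """Dynamic power relative to running every path at vdd_high (alpha, f fixed)."""
    return sum(p.cap_share * (assignment[p.name] / vdd_high) ** 2 for p in paths)


if __name__ == "__main__":
    # Hypothetical paths for a 1 ns clock.
    paths = [Path("critical", 0.95, 0.40),
             Path("offpath_a", 0.50, 0.35),
             Path("offpath_b", 0.60, 0.25)]
    assignment = assign_supplies(paths, clock_ns=1.0, vdd_high=1.2, vdd_low=0.9)
    print(assignment)
    print(f"relative dynamic power: {relative_dynamic_power(paths, assignment, 1.2):.4f}")
```

With these placeholder numbers, only the critical path stays at 1.2 V, and dynamic power drops to roughly 74% of the single-supply design. In a real design, the level shifters needed between domains would eat into that saving.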
5.4 Multiple threshold voltages

Threshold voltage is one of the most important parameters for device and circuit design. In active mode, a low $V_{th}$ is preferred because of its higher performance; in standby mode, a high $V_{th}$ is useful for reducing leakage power. Hence, if different threshold voltages could be used during the different modes of operation, large reductions in leakage power are possible without sacrificing speed. Different threshold voltages can be set during fabrication.

Transistors of different speeds may be used in different ways. One method is to employ fast devices along critical timing paths and slower, higher-$V_{th}$ devices in non-critical parts of the circuit. A second technique determines which functional units require the lowest latencies and allocates the budget of fast, leaky devices to those units only.

5.5 Reducing the number of devices

One obvious technique to reduce static power is to reduce the total number of transistors in the circuit. However, it is difficult to find opportunities to reduce the device count enough to impact power. Since a large number of devices must be removed to have a noticeable impact, units with replication make obvious targets: cache size, the number of functional units, and issue/retire bandwidth may all be reduced, with varying degrees of difficulty and performance impact. Another beneficial task for architects would be to equalize utilization.

Power gating can achieve the same benefit as reducing the number of devices without actually removing any. It is analogous to clock gating: sections of the circuit are turned off when not in use in order to reduce leakage power. However, additional circuitry is required to monitor when shutting off can be done and to implement the power-down, and this circuitry itself dissipates extra power.
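The power-gating overhead described above can be framed as a break-even calculation: gating an idle unit pays off only if the leakage energy saved during the idle interval exceeds the energy spent on the control circuitry and on recharging the unit's supply nodes at wake-up. This minimal Python sketch assumes a single lumped overhead energy per gating event and a hypothetical `residual_fraction` for leakage that still flows while gated; the function names and all numeric values are illustrative placeholders.

```python
def break_even_idle_time(p_leak_w: float, e_overhead_j: float,
                         residual_fraction: float = 0.0) -> float:
    """Idle time (s) beyond which gating saves more energy than it costs."""
    saved_power = p_leak_w * (1.0 - residual_fraction)
    if saved_power <= 0.0:
        raise ValueError("gating saves no leakage power")
    return e_overhead_j / saved_power


def net_energy_saved(idle_s: float, p_leak_w: float, e_overhead_j: float,
                     residual_fraction: float = 0.0) -> float:
    """Leakage energy avoided during idle_s minus the per-event overhead (J)."""
    return p_leak_w * (1.0 - residual_fraction) * idle_s - e_overhead_j


def should_gate(predicted_idle_s: float, p_leak_w: float, e_overhead_j: float,
                residual_fraction: float = 0.0) -> bool:
    """Gate only if the predicted idle period exceeds the break-even time."""
    return predicted_idle_s > break_even_idle_time(p_leak_w, e_overhead_j,
                                                   residual_fraction)


if __name__ == "__main__":
    # Hypothetical unit: 50 mW of idle leakage, 1 uJ of overhead per gating event.
    t_be = break_even_idle_time(p_leak_w=0.05, e_overhead_j=1e-6)
    print(f"break-even idle time: {t_be * 1e6:.1f} us")
    for idle in (1e-5, 1e-4):
        print(idle, should_gate(idle, 0.05, 1e-6), net_energy_saved(idle, 0.05, 1e-6))
```

The `predicted_idle_s` input is where a predictor, such as the speculation discussed in Section 5.7, would plug in. This energy-only sketch ignores the wake-up latency, which is a separate performance cost.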
The other major problem with power gating is the latency required for units to turn on after they have been powered down. Because of the large capacitance on the power supply nodes of a unit, several clock cycles are needed for the supply to reach its operating level. Solutions to this problem involve stalling or predicting when units will be required.

5.6 Using more efficient circuits

$k_{design}$ offers few direct opportunities for static power reduction. Power-efficient circuitry can be used if performance is maintained within required limits.

5.7 Power reduction with speculation

Speculation can be an important tool for architects when designing power-efficient architectures. It provides an opportunity to use slower devices without proportionally impacting performance. Fast circuitry is used for the performance-critical speculation circuitry, while slower circuitry can be used for the relatively simple verification. Thus, the verification circuitry may use high-threshold devices, a lower supply voltage, a lower frequency, etc., resulting in both static and dynamic power savings. DIVA is a good example of an architecture in which such devices can be used.

Another application of speculation is predicting when certain circuitry will be needed, in order to bring it out of a power-gated state ahead of time. Speculation can thus be used both to power down parts of the circuit and to power them up again.

6 Related Work in the Architecture Community

Many research groups have proposed and developed architectural techniques to reduce dynamic power dissipation. However, very little work has addressed static power dissipation from an architectural perspective.

Recent work by Powell et al. combines circuit and architectural techniques to reduce the power consumption in a processor's cache. The cache miss rate is used to determine the working set size of the application relative to that of the cache. Power is then removed from the unused portions of the cache using gated-Vdd transistors.
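The miss-rate-driven resizing described above can be sketched as a periodic controller. At the end of each sensing interval, the miss count is compared against a miss bound. The number of powered sets is halved when misses fall below the bound and doubled when they exceed it, within fixed limits. This Python sketch is a simplified illustration, not Powell et al.'s implementation. The parameter names (`miss_bound`, `min_sets`, `max_sets`) and the per-interval miss counts in the example are hypothetical, and the miss counts are taken as given rather than simulated at each cache size.

```python
def resize_step(active_sets: int, misses: int, miss_bound: int,
                min_sets: int, max_sets: int) -> int:
    """One resizing decision at the end of a sensing interval."""
    if misses < miss_bound and active_sets // 2 >= min_sets:
        return active_sets // 2   # working set fits: gate off half of the active sets
    if misses > miss_bound and active_sets * 2 <= max_sets:
        return active_sets * 2    # too many misses: double the number of powered sets
    return active_sets            # at a limit, or exactly at the bound: keep the size


def run_controller(miss_counts: list[int], miss_bound: int,
                   min_sets: int, max_sets: int) -> list[int]:
    """Return the number of powered sets after each interval, starting fully on."""
    active = max_sets
    history = []
    for misses in miss_counts:
        active = resize_step(active, misses, miss_bound, min_sets, max_sets)
        history.append(active)
    return history


if __name__ == "__main__":
    # Hypothetical 1024-set cache with a 100-miss bound per interval.
    history = run_controller([10, 20, 30, 250, 90], miss_bound=100,
                             min_sets=64, max_sets=1024)
    print(history)
    print(f"fraction powered at end: {history[-1] / 1024:.3f}")
```

Choosing the miss bound trades leakage savings against extra misses. Those misses cost performance and dynamic energy of their own.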
Recent work by Kaxiras et al. also attacks static power dissipation in the cache. They discuss policies and implementations for reducing cache leakage by invalidating and turning off cache lines when they hold data that is not likely to be reused, which yields power savings in the cache.

The device community has been looking at the problem of static leakage for much longer, and has developed several device techniques as well as several low-static-power transistor families (such as MTCMOS).

7 Future Work

Most of the current work on reducing static power dissipation lies in the domain of circuit and device engineers, who target lower-power circuits by tweaking the design at the fabrication level. Reducing the number of devices in order to save static power is difficult: most of the devices that could be removed are redundant and can already be removed by design algorithms. There is more scope in power gating. Exploiting speculation is probably the best way for architects to deal with static power consumption.

Our current adaptive framework for multimedia applications can be modified to take static power consumption into account. This could lead to different results than the case where we only consider dynamic power: a more aggressive architecture, though efficient from the point of view of dynamic power, might cause too much leakage when idle. Speculation could be used to predict the need for functional units. Issue width and instruction window size could impact static power differently than dynamic power; based on this, voltage can be scaled appropriately. Different sections of the circuit can be identified and allotted different $V_{dd}$ and $V_{th}$ values.

Soft errors in multimedia applications can be exploited to further reduce static power. An architecture like DIVA can be further optimized for static power consumption when certain soft errors are allowed.
That would allow us to further reduce the speed of the verification circuit (resulting in a higher $V_{th}$ and a lower $V_{dd}$) and also to reduce the dynamic power consumption of the entire circuit.

[1, 11, 7, 2, 8, 5, 3, 9, 6, 4, 10]

References

[1] T. M. Austin. DIVA: A reliable substrate for deep submicron microarchitecture design. In Proc. of the 32nd Annual Intl. Symp. on Microarchitecture, 1999.
[2] S. Borkar. Design challenges of technology scaling. IEEE Micro, 1999.
[3] J. A. Butts and G. S. Sohi. A static power model for architects. In Proc. of the 33rd Annual Intl. Symp. on Microarchitecture, 2000.
[4] A. P. Chandrakasan and R. W. Brodersen. Low power digital CMOS design.
[5] S. Kaxiras, Z. Hu, and M. Martonosi. Cache decay: Exploiting generational behavior to reduce cache leakage power. In Proc. of the 28th Annual Intl. Symp. on Comp. Architecture, 2001.
[6] W. Nebel and J. Mermet. Low power design in deep submicron electronics.
[7] J. P. Halter and F. N. Najm. A gate-level leakage power reduction method for ultra-low-power CMOS circuits. In Proc. IEEE Custom IC Conference, 1997.
[8] M. D. Powell, S.-H. Yang, B. Falsafi, K. Roy, and T. N. Vijaykumar. Gated-Vdd: A circuit technique to reduce leakage in deep-submicron cache memories. In Proc. of the Intl. Symposium on Low Power Electronics and Design, 2000.
[9] K. Roy and S. C. Prasad. Low-power CMOS VLSI circuit design.
[10] A. S. Sedra and K. C. Smith. Microelectronic circuits.
[11] S.-H. Yang, M. D. Powell, B. Falsafi, K. Roy, and T. N. Vijaykumar. An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches. In Proc. of the 7th Intl. Symp. on High-Perf. Comp. Architecture, 2001.