Published in the Proceedings of the 20th International Conference on Computer Design (ICCD),
September 16-18, 2002, Freiburg, Germany
Impact of Scaling on The Effectiveness of Dynamic Power Reduction Schemes
D. Duarte‡, G. McFarland
Intel Corporation
david.e.duarte@intel.com, grant.mcfarland@intel.com
N. Vijaykrishnan, M.J. Irwin, H-S Kim
Department of CSE, Penn State University
{vijay, mji, hykim}@cse.psu.edu
Abstract
Power is considered to be the major limiter to the
design of faster and more complex processors in the near
future. To address this challenge, a combination
of process, circuit design and micro-architectural
changes is required. Consequently, to focus the
optimization efforts in the right direction, the models
proposed and studies performed in this work are a first
step for understanding the relative importance of leakage
and dynamic energy in future technologies. Further, we
analyze the effectiveness of two energy reduction
mechanisms that employ voltage scaling, namely, supply
and threshold voltage selection. We consider the impact
of imminent technology changes and packaging
improvements, and show that neglecting the impact
of temperature may lead to underestimating the power
savings by up to 19.5%.
1. Introduction
Energy dissipation has become an important design
consideration, which can be attributed to the proliferation
of battery-driven mobile systems and concerns about
circuit reliability and packaging costs. In fact, power is
widely considered to be the major impediment for more
powerful high-performance processors.
For CMOS circuits, the major sources of power
consumption are dynamic and leakage power, with the
latter becoming more significant as threshold voltages
scale down with technology. To devise new solutions
to address the increasingly important power problem, it is
essential for circuit designers and architects to have a
mechanism to analyze future trends accurately and
understand the relative importance of these components.
A large body of literature deals with the impact
of technology scaling on various aspects of VLSI
circuit design [3, 4], and this paper does not intend to
repeat that perspective. Here, we go a step
further by providing a systematic approach to analyzing
the dynamic and leakage energy trends. Further, we
evaluate the anticipated effectiveness of supply voltage
scaling that is widely used for energy optimization in
current processors and compare it to a threshold voltage
scaling approach. This is done considering the impact of
technology and packaging improvements, as well as the
key role of the operating temperature.
2. Effects of scaling on power consumption
The dynamic power consumption of a given design
is usually estimated as:
$$P_{act} = N_t \, C_{avg} \, V_{dd}^2 \, (Act) \, f_{clock}$$
where Nt is the number of transistors in the design,
Cavg is the average capacitive load, Vdd is the supply voltage,
fclock is the operating frequency and Act is the activity
factor, which accounts for the fraction of devices that are
actually switching. We calculate Cavg = Cgate_avg + Cdrain_avg
+ Cwire_local, with gate and diffusion capacitances
estimated in the usual way [9] for an average-size device. The
interconnect component is calculated as Cwire_local =
Cwire/um · Llocal, where Cwire/um is the wire
capacitance per unit length and Llocal is set to 10
times the minimum feature size (λ), as connections are only to
neighboring cells. Please refer to [1] for more details on
the extraction of Cwire/um.
The contribution of short-circuit currents will
become of lesser importance for deep-submicron
technologies, in particular since the threshold voltage
(Vth) scales down at a slower rate than Vdd [9]. The cycle
time is estimated as:
$$T_{cycle} = \frac{1}{f_{clock}} = \frac{LD \cdot C_{avg} V_{dd}}{I_{on}}$$
$$P_{act} = \frac{N_t V_{dd} I_{on} (Act)}{LD}$$
where Ion is the drive current for an average-size
device and LD is the logic depth (i.e., number of gate
delays) of the slowest pipeline stage. The result obtained
after substituting fclock in the power equation is also given
above. Starting from a reference value given in [1]
for a 0.6um technology, we have scaled down LD by a
constant factor up to the point where deeper pipelining is
no longer feasible because the latching time cost becomes
comparable to the evaluation time of the logic between
the registers. Similarly, we have scaled up the activity
factor as a way to capture architectural improvements for
enhanced Instruction Level Parallelism (ILP). In [8], a
Acknowledgements: This work was supported in part by GSRC grant 98-DT-660, NSF Career Awards 0093085 and 0082064. ‡ D. Duarte was with the
Department of Electrical Engineering, Pennsylvania State University while developing this work.
value of Act = 0.015 is used and we choose it as the base
value for the 0.6um design. This number may seem very
low but it captures the impact of aggressive clock gating,
which is standard in current designs. The scaling factor of
about 0.75 for LD was selected for consistency with
industry data. Besides LD and Act, all other factors in the
power equation given above scale with technology at a
predictable rate depending on the scaling laws followed.
We have used the scaling models presented in [1] and we
have found a fairly good agreement of the main
technology parameters with those presented by the ITRS
roadmap [7].
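The two forms of the dynamic power equation given above can be cross-checked numerically. The following is a minimal sketch; the function names and all numeric inputs except the activity factor of 0.015 are hypothetical illustration values, not data from this work.

```python
import math


def cycle_time(logic_depth, c_avg, v_dd, i_on):
    """T_cycle = LD * C_avg * V_dd / I_on, in seconds."""
    return logic_depth * c_avg * v_dd / i_on


def dynamic_power_from_frequency(n_t, c_avg, v_dd, act, f_clock):
    """P_act = N_t * C_avg * V_dd^2 * Act * f_clock, in watts."""
    return n_t * c_avg * v_dd ** 2 * act * f_clock


def dynamic_power_from_current(n_t, v_dd, i_on, act, logic_depth):
    """P_act = N_t * V_dd * I_on * Act / LD, in watts."""
    return n_t * v_dd * i_on * act / logic_depth


# Hypothetical inputs chosen only to exercise the formulas.
N_T = 50e6      # number of transistors
C_AVG = 2e-15   # average capacitive load per device (F)
V_DD = 1.0      # supply voltage (V)
I_ON = 300e-6   # drive current of an average-size device (A)
ACT = 0.015     # activity factor (base value quoted from [8])
LD = 20         # logic depth of the slowest pipeline stage

f_clock = 1.0 / cycle_time(LD, C_AVG, V_DD, I_ON)
p_freq = dynamic_power_from_frequency(N_T, C_AVG, V_DD, ACT, f_clock)
p_curr = dynamic_power_from_current(N_T, V_DD, I_ON, ACT, LD)
print(f"f_clock = {f_clock:.3e} Hz")
print(f"P_act (frequency form) = {p_freq:.3f} W")
print(f"P_act (current form)   = {p_curr:.3f} W")
```

Both forms agree because C_avg and one factor of V_dd cancel when fclock is replaced by Ion / (LD · Cavg · Vdd).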
We have assumed that short-channel effects (SCE)
dominate and the effect of Drain Induced Barrier
Lowering (DIBL) is captured. The number of transistors
(Nt) is estimated by dividing the total die area by the area
of an average-size device with individual contacts and
some spare area around it. This approach attempts to
balance the effect of very compact structures (such as
memories) and less regular structures (such as
datapaths). Two cases are considered: a constant (80 mm²)
and a variable die size, with the latter assuming an
increase of 14% in die size from one technology
generation to the next [3]. The first case can be seen as a
low-end or embedded design, where simple clocking
mechanisms are desirable, while the second one can be
regarded as a high-performance design.
Now, to estimate leakage power consumption due to
subthreshold currents we use the following expression [2]:
$$P_{leak} = N_t \, V_{dd} \, I_{off} \, K_{design}$$
Where Kdesign is a factor that accounts for the
distribution/sizing of P and N devices, the stacking effect,
the idleness of the devices and the design style used. This
factor is defined empirically, and there is no analytical
expression for it. In [2], experiments show that Kdesign for
logic is around 10, while for memory structures it is about
1. Based on the area used by logic and memory
structures, we estimate an average Kdesign of 2, as the area
used for memory structures tends to be 85% in nanometer
technologies. For details about how the logic and memory
areas were determined, please refer to [6].
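As an illustration of how an average Kdesign of about 2 can arise, the sketch below assumes a simple area-weighted average of the logic and memory values. The weighting scheme is our assumption for this example (the actual derivation is in [6]), and the function names are hypothetical.

```python
def average_k_design(memory_area_fraction, k_memory=1.0, k_logic=10.0):
    """Area-weighted K_design (assumed weighting, not confirmed by [6])."""
    logic_area_fraction = 1.0 - memory_area_fraction
    return memory_area_fraction * k_memory + logic_area_fraction * k_logic


def leakage_power(n_t, v_dd, i_off, k_design):
    """P_leak = N_t * V_dd * I_off * K_design, in watts."""
    return n_t * v_dd * i_off * k_design


# 85% memory area, K_design of about 1 for memory and about 10 for logic.
k_avg = average_k_design(0.85)
print(f"area-weighted K_design = {k_avg:.2f}")
```

Under this assumption the weighted value is 0.85 · 1 + 0.15 · 10 = 2.35, which is of the same order as the rounded value of 2 used in the text.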
Subthreshold conduction is not the only leakage
mechanism but it has by far the largest impact, which is
worsened when DIBL effects are considered. The
subthreshold current was estimated as [1]:
$$I_{sub} = \frac{\mu C_{ox} W}{L_{eff}} (\eta - 1) \phi_T^2 \exp\left(\frac{V_{gs} - V_{th}}{\eta \phi_T}\right)$$
where
$$\eta = 1 + \frac{C_{dep}}{C_{ox}}$$
In the above equations, µ is the carrier mobility, φT is
the thermal voltage (= kT/q) and Cdep is the capacitance of
the channel depletion region. The gate leakage estimates
based on direct oxide tunneling effects were found to be
almost completely negligible for the technologies studied.
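The subthreshold current expression above can be evaluated directly. The sketch below is only illustrative: the device parameters (mobility, oxide capacitance, geometry and the Cdep/Cox ratio) are hypothetical values in SI units, not the ones used in this study.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin):
    """phi_T = kT/q, in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current(mu, c_ox, width, l_eff, c_dep, v_gs, v_th, temp_kelvin):
    """I_sub as given in the text, with eta = 1 + C_dep / C_ox."""
    eta = 1.0 + c_dep / c_ox
    phi_t = thermal_voltage(temp_kelvin)
    prefactor = mu * c_ox * width / l_eff * (eta - 1.0) * phi_t ** 2
    return prefactor * math.exp((v_gs - v_th) / (eta * phi_t))


def subthreshold_swing(c_ox, c_dep, temp_kelvin):
    """Gate-voltage change per decade of current: eta * phi_T * ln(10)."""
    eta = 1.0 + c_dep / c_ox
    return eta * thermal_voltage(temp_kelvin) * math.log(10.0)


# Hypothetical device parameters (SI units), for illustration only.
MU = 0.03            # carrier mobility (m^2 / V s)
C_OX = 0.015         # gate oxide capacitance per unit area (F / m^2)
C_DEP = 0.4 * C_OX   # depletion capacitance per unit area (F / m^2)
WIDTH = 0.2e-6       # device width (m)
L_EFF = 0.05e-6      # effective channel length (m)
TEMP = 373.15        # 100 degrees C in kelvin

swing = subthreshold_swing(C_OX, C_DEP, TEMP)
i_base = subthreshold_current(MU, C_OX, WIDTH, L_EFF, C_DEP, 0.0, 0.20, TEMP)
i_high_vth = subthreshold_current(MU, C_OX, WIDTH, L_EFF, C_DEP, 0.0, 0.20 + swing, TEMP)
print(f"swing = {swing * 1e3:.1f} mV/dec")
print(f"I_sub ratio after raising Vth by one swing = {i_base / i_high_vth:.2f}")
```

Raising Vth by one subthreshold swing lowers the off current by one decade, which is the property exploited by the threshold voltage scaling studied later.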
Figure 1. Active and leakage power (constant die
size)
Figure 2. Active and leakage power (increasing
die size).
Figures 1 and 2 illustrate how the estimates of
dynamic and leakage power (obtained with the equations
given) vary across the technologies considered. Note that
we have only captured the influence of subthreshold
currents, as they are the dominant leakage mechanism.
Additionally, the effect of temperature has also been
taken into account and, from the plots, it is clear that it has
a strong impact on the way that power (leakage power, in
particular) behaves. For more details about the modeling
of these effects, please refer to [6].
3. Impact of technology and packaging
There are two technology improvements that are
expected to become standard in mainstream CMOS
products within the next five years [10]. The first
is the replacement of SiO2 with high-permittivity (high-K)
materials. The thickness of the inversion layer beneath the
oxide makes the apparent electrical thickness significantly
larger than the actual physical thickness, with deviations
in the range of 0.5nm to 1.0nm [11]. It now seems very
likely that in the 0.1um generation and later, gate oxides
will be fabricated with high-K materials such that the
physical thickness will remain approximately constant
while the electrical thickness is reduced. These materials
are also expected to dramatically reduce gate leakage due
to a higher oxide energy barrier (φB).
The second improvement is the replacement of bulk
CMOS with SOI (Silicon On Insulator). SOI has a
significant impact on power by virtually eliminating
diffusion capacitance and allowing for steeper
subthreshold slopes (ST). In particular, in bulk CMOS, ST
is approximately 100mV/dec, while in SOI ST becomes
75mV/dec, at 100 °C. It should be noted that the former
effect (elimination of diffusion capacitances) is beneficial
but yields a limited gain, since interconnect capacitance
takes over as the second-largest contributor to the total parasitic
capacitance for technologies where SOI is expected to
become standard (0.1um and beyond).
We found that, once both technology improvements
are incorporated, subthreshold currents decrease due to
the use of SOI, while the use of high-K
dielectrics helps keep the impact of gate leakage to
a minimum. Figure 3 captures the impact of these
improvements on the total system power,
estimated with the equations given earlier. We assume
that dynamic power remains the same as the bulk CMOS
case, following assumptions made earlier. In the optimum
case (when DIBL effects are effectively minimized by
SOI), leakage is always less than active power for the
technologies considered. But as process variations
continue to influence the device parameters, the actual
effect is not ideal but translates into delaying the surge of
leakage power by one technology generation (i.e., for this
study, from 0.035um to 0.025um, as shown in Figure 3).
Figure 3. Impact of SOI and high-K dielectrics in
leakage system power (constant die size).
In parallel with technology improvements, the impact
of packaging and cooling mechanisms should be
accounted for. In fact, the ITRS roadmap has stated that
power consumption will be strongly determined by how
effectively heat is removed from the die. The following
equation shows how the total power and the die
temperature are related to each other [5]:
$$T_j - T_a = \theta_{ja} \cdot Power$$
Where θja is the thermal resistance and Tj and Ta are
the junction and ambient temperatures, respectively. The
thermal resistance captures the thermal behavior of the
CPU package, interfaces, heat sink and any forced air
mechanisms, if present. Typical heat-sink thermal
resistances vary with the geometry of the sink. For mobile
devices, extruded heat sinks are in the order of 1-1.5 °C/W
while vapor-chamber folded-fin sinks are in the
order of 0.2-0.4 °C/W. For further details, please refer to
[5]. We have used the above equation to determine what
would be the required θja values to maintain the junction
temperature down to safe levels. The ITRS roadmap has
defined that for mobile designs (constant die size) Tj and
Ta should be 100 °C and 55 °C, while for high
performance designs (increasing die size) Tj and Ta
should be 85 °C and 45 °C, respectively.
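The thermal relation above can be inverted to obtain the thermal resistance needed for a given power budget. The following is a minimal sketch with hypothetical function names; it uses the mobile temperature limits quoted above and, as an example input, the 41 W total power listed for the 0.07um constant-die-size case in Table 1.

```python
def required_theta_ja(t_junction, t_ambient, power_watts):
    """Thermal resistance (degC/W) that keeps the junction at t_junction."""
    return (t_junction - t_ambient) / power_watts


def junction_temperature(t_ambient, theta_ja, power_watts):
    """T_j = T_a + theta_ja * Power, in degC."""
    return t_ambient + theta_ja * power_watts


# Mobile (constant die size) limits: Tj = 100 degC, Ta = 55 degC.
theta_mobile = required_theta_ja(100.0, 55.0, 41.0)
print(f"required theta_ja = {theta_mobile:.3f} degC/W")
print(f"resulting Tj = {junction_temperature(55.0, theta_mobile, 41.0):.1f} degC")
```

Any increase in total power at a fixed θja raises Tj proportionally, which is why θja has to decrease from one generation to the next.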
The bars in Figure 4 show how θja must change to
guarantee the Tj given above for the two design cases.
This behavior can be analytically described by average
reductions in θja of 33% and 48% per generation for the
low-end and high-performance cases, respectively. This
estimate, however, proved more aggressive than necessary. We found that
average reductions of 26% and 43% per generation
work well until leakage power becomes significant, as
shown by the lines in Figure 4. It must be highlighted that
thermal resistance depends strongly on the cost of all
associated components and also on the volume of the heat
sink [5]. For the study that follows, our default case
assumes DIBL effects and an operating temperature of
100 °C, as technology improvements and limitations of
efficient cooling mechanisms compensate each other.
Figure 4. Thermal resistance and non-ideal
temperature behavior.
4. Reducing power and temperature
The chosen techniques for this study are based on
dynamic adjustment, at runtime, of some basic operating
parameters (such as Vdd and Vth). Since these run-time
techniques adversely affect performance, smart policies
must be devised in order to apply them wisely in real
designs. Moreover, due to the strong relationship of
leakage power with temperature, it is important to
accurately model any temperature change associated with
the application of a given technique such that a better
estimate is obtained.
4.1. Supply voltage dynamic scaling
Reducing the nominal supply voltage gives a
significant reduction in power consumption at the
expense of performance, as the drive current capability
(Ion) decreases and the operating frequency must be reduced
as well. Thus, Dynamic Voltage Scaling (DVS) schemes
must be applied whenever the system operating
requirements allow it. We now explore whether such
schemes would be as useful in future technologies and
whether DVS should be implemented in parallel with
supply gating schemes as leakage power becomes
dominant.
We consider three base technologies, which were
selected to provide three different power consumption
scenarios. These are summarized in Table 1. We consider
the case where the die size has not been scaled up, which
can be viewed as an initial step towards lowering power
consumption. The results are easily extendable to the case
where die scaling takes place.
Table 1. Technologies used for evaluation.
Tech (µm)   Total Power (W)   Dynamic Power (%)   Leakage Power (%)
0.07        41                78                  22
0.05        64                56                  44
0.035       126               33                  67
Figure 5. Power variation as Vdd changes.
Figures 5 and 6 present the expected power and
performance changes (as estimated with the equation for
Tcycle in Section 2) as the nominal Vdd is scaled down by up
to about 40%. Figure 5 shows two cases; the dashed lines
represent the instantaneous power savings after the
change is applied (short-term policy). If the temperature
is allowed to settle (long-term policy), the device leakage
current decreases, causing a further reduction in the power
consumed, which ends up reaching a stable point given the
linear relationship between temperature and power and the
exponential dependence of leakage on temperature. The
threshold that separates a long-term policy from a
short-term policy depends on how effectively the heat is
removed from the die, such that its temperature follows
closely any change in power consumption. It should be
noted that, in the long-term case, all technologies
basically converge to the same curve in terms of power
reduction and temperature (the minimum temperature
reached was 58 °C). The figures also show that, even
though the attainable power reduction is almost linear
with the change in Vdd, the negative effect on
performance is accentuated for the three cases considered
when the decrease in Vdd is larger than about 20%.
Figure 6. Delay variation as Vdd changes.
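The difference between the short-term and long-term policies can be illustrated with a small fixed-point iteration over the thermal relation of Section 3. This is a sketch under simplifying assumptions of ours, not the model used to produce Figure 5: leakage is assumed to double every 20 °C (a hypothetical rule of thumb), dynamic power is assumed to scale with Vdd squared at fixed frequency and leakage linearly with Vdd. The starting point is the 0.07um row of Table 1 with the mobile temperature limits.

```python
def settle_temperature(p_dyn, p_leak_ref, theta_ja, t_ambient,
                       t_ref=100.0, doubling_interval=20.0,
                       t_start=100.0, tol=1e-6, max_iter=1000):
    """Iterate T = Ta + theta_ja * (P_dyn + P_leak(T)) to a fixed point.

    p_leak_ref is the leakage power at t_ref; leakage is ASSUMED to double
    every `doubling_interval` degC (illustrative, not from the paper).
    Returns (settled temperature in degC, total power in W).
    """
    temp = t_start
    for _ in range(max_iter):
        p_leak = p_leak_ref * 2.0 ** ((temp - t_ref) / doubling_interval)
        p_total = p_dyn + p_leak
        new_temp = t_ambient + theta_ja * p_total
        if abs(new_temp - temp) < tol:
            return new_temp, p_total
        temp = new_temp
    raise RuntimeError("temperature iteration did not converge")


# 0.07um row of Table 1: 41 W total, 78% dynamic, 22% leakage at 100 degC.
T_AMBIENT = 55.0
THETA_JA = (100.0 - T_AMBIENT) / 41.0
P_DYN = 0.78 * 41.0
P_LEAK = 0.22 * 41.0

# Hypothetical 20% Vdd reduction at fixed frequency.
VDD_SCALE = 0.8
p_dyn_scaled = P_DYN * VDD_SCALE ** 2
p_leak_scaled = P_LEAK * VDD_SCALE

p_short = p_dyn_scaled + p_leak_scaled          # short-term: still at 100 degC
t_long, p_long = settle_temperature(p_dyn_scaled, p_leak_scaled,
                                    THETA_JA, T_AMBIENT)
print(f"short-term power: {p_short:.2f} W at 100.0 degC")
print(f"long-term power:  {p_long:.2f} W at {t_long:.1f} degC")
```

In this sketch the long-term power is lower than the short-term value because the die cools and leakage drops with it, which is the temperature feedback effect described above.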
There are some problems associated with Vdd scaling.
In memory structures, as cell capacitances decrease, the
amount of charge they can store reduces and makes them
more susceptible to soft errors. Another problem is
increased threshold variation in very short channel
devices due to random dopant variation in the channel,
which affects the cell stability during read processes.
These two conditions worsen with Vdd scaling. The latter
phenomenon might be fixed by increasing the beta ratio
of the cell (the ratio of the NMOS pulldown to the NMOS
pass device), which unfortunately prevents the memory
cells from taking full advantage of process scaling. Thus,
it is likely that memory arrays in processors implemented
in 0.1um processes and beyond will need a separate
power supply, higher than that used by the processor core
or they will simply not be able to be scaled as the core,
resulting in non-optimum area utilization.
4.2. Threshold voltage impact
Threshold scaling by substrate biasing has been
proposed and used as an effective way to reduce leakage
power consumption. Although this technique has been
applied to reduce leakage only when a unit or the
whole system is idle, we now explore the feasibility of
applying body bias control at run-time and system-wide.
The results of this section can also be used in assessing
the impact of implementing a design in a Dual-Vth
process.
Figure 7. Power variation as Vth changes.
Figure 8. Delay variation as Vth changes.
Figures 7 and 8 present the expected power and
performance changes as the nominal Vth is scaled up by up to
70%, which directly impacts the average value of Ion.
Larger increments in Vth are possible when the technique
is applied to idle units. We observe that increments in Vth
for overall power reduction become more effective as
technology scales, at the expense of an increased
performance penalty. As before, the impact of
temperature is significant. For instance, to achieve a 20%
reduction in power in a 0.035um design, short-term
policies will require an 11% change in Vth while long-term
policies only require a 5% change in Vth. The gap
between the two cases decreases for larger changes in Vth
and less aggressive technologies. This effect is enhanced
by a lower operating temperature, which at the highest
threshold voltage setting was reported to be 78, 67 and
56 °C for 0.07, 0.05 and 0.035um processes, respectively.
The figures also show that, although the negative
effect on performance is almost linear with the increase
in Vth, the attainable power reduction shows a steeper
rate of change for the three cases considered for increases
in Vth of up to about 20%. It was found that the required
body bias voltage that will change Vth by 70% is lower
than the operating voltage of each technology. In
threshold voltage selection, it must be guaranteed that Vdd
> 3Vth, so that enough current drive is available and
performance is not dramatically harmed.
4.3. Supply and threshold voltage scaling
The following experiments assume variations in Vdd
and Vth, according to the relative contributions of
dynamic and leakage power to the total power,
respectively. Figures 9 and 10 present the results obtained
when both Vth and Vdd are scaled for a total of 14 steps,
with a maximum performance penalty of 16%. The
starting Vdd values were the nominal ones and they were
lowered by steps of 15mV, 10.5mV and 5.5mV such that
final variations of 23%, 21% and 14% at step 14 were
reached, for 0.07, 0.05 and 0.035um technologies,
respectively. Similarly, the base Vth value was the
nominal one and steps of 1.9mV, 2.5mV and 2.4mV were
used in order to reach final variations at step 14 of 14%,
21% and 23%.
Figure 9. Power variation as Vdd and Vth change.
Figure 10. Delay variation as Vdd and Vth change.
If a short-term policy is implemented, we observe
that the attainable power savings converge to a common
trend, as shown in Figure 9. But the trend changes in the
case of long-term policies, where the savings are largest
for the 0.035um technology and decrease for less
aggressive processes. The effect is enhanced by a lower
operating temperature, which at step 14 was found to be
68, 64 and 61 °C for 0.07, 0.05 and 0.035um processes,
respectively.
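The step schedule quoted in Section 4.3 can be tabulated with a few lines of arithmetic. In the sketch below, the step sizes and final percentage variations are the ones given in the text; the nominal values printed are only what those rounded figures imply (total change divided by final percentage) and are not numbers stated in this paper. All names are hypothetical.

```python
STEPS = 14

# tech (um): (Vdd step in mV, final Vdd variation, Vth step in mV, final Vth variation)
SCHEDULE = {
    "0.07": (15.0, 0.23, 1.9, 0.14),
    "0.05": (10.5, 0.21, 2.5, 0.21),
    "0.035": (5.5, 0.14, 2.4, 0.23),
}


def total_change_mv(step_mv, steps=STEPS):
    """Cumulative change after `steps` equal steps, in mV."""
    return step_mv * steps


def implied_nominal_mv(step_mv, final_fraction, steps=STEPS):
    """Nominal value implied by the rounded percentages (derived, approximate)."""
    return total_change_mv(step_mv, steps) / final_fraction


for tech, (vdd_step, vdd_frac, vth_step, vth_frac) in SCHEDULE.items():
    print(f"{tech}um: dVdd = {total_change_mv(vdd_step):.1f} mV "
          f"(implied nominal ~{implied_nominal_mv(vdd_step, vdd_frac):.0f} mV), "
          f"dVth = {total_change_mv(vth_step):.1f} mV "
          f"(implied nominal ~{implied_nominal_mv(vth_step, vth_frac):.0f} mV)")
```

The table makes the allocation visible: the technology dominated by dynamic power (0.07um) receives the largest relative Vdd change, while the one dominated by leakage (0.035um) receives the largest relative Vth change.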
5. Concluding remarks
We have presented a complete framework for
estimating the impact of technology scaling on the
power behavior of future designs. It also accounts for
changes in architecture design and optimizations, an aspect
that we have called ‘architectural scaling’.
We have used the mentioned framework to evaluate
the effectiveness of various power reduction techniques.
It was found that supply voltage scaling becomes less
effective in providing power savings as leakage power
becomes larger, which is reasonable given the quadratic
dependence of the dynamic power on Vdd in contrast
with the linear dependence of the leakage power. On the
other hand, power savings obtained by increasing the
threshold voltage are more significant as leakage power
becomes dominant. Again, this is reasonable given
the exponential dependence of the leakage power on Vth,
in contrast with the linear dependence of the dynamic
power. An integrated scheme that uses both supply and
threshold voltage scaling will provide the highest savings
for the least amount of change in the controllable
parameters.
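The first observation can be illustrated with the power breakdown of Table 1. The sketch below is a deliberately simplified first-order model and does not reproduce Figure 5: it assumes a fixed clock frequency, an off current that does not change with Vdd and no temperature change (short-term policy). The function name is hypothetical.

```python
# tech (um): (dynamic power fraction, leakage power fraction), from Table 1.
TABLE_1 = {
    "0.07": (0.78, 0.22),
    "0.05": (0.56, 0.44),
    "0.035": (0.33, 0.67),
}


def relative_power_after_vdd_scaling(dyn_fraction, leak_fraction, vdd_scale):
    """Power relative to nominal: dynamic ~ Vdd^2, leakage ~ Vdd (assumed)."""
    return dyn_fraction * vdd_scale ** 2 + leak_fraction * vdd_scale


VDD_SCALE = 0.8  # hypothetical 20% supply reduction
savings = {
    tech: 1.0 - relative_power_after_vdd_scaling(dyn, leak, VDD_SCALE)
    for tech, (dyn, leak) in TABLE_1.items()
}
for tech, saving in savings.items():
    print(f"{tech}um: {100.0 * saving:.1f}% power saving")
```

Under these assumptions the same supply reduction saves less power as the leakage share grows, in line with the trend reported above.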
Table 2. Additional percentage power savings
provided by temperature feedback.
Tech (um)   Additional Savings %   Vdd Scaling   Vth Scaling   Vdd / Vth Scaling
0.07        Average                5.4           1.2           3.0
0.07        Maximum                7.4           2.0           4.2
0.05        Average                9.8           3.9           6.5
0.05        Maximum                13.5          6.2           8.5
0.035       Average                14.4          8.9           11.4
0.035       Maximum                19.5          13.4          16.7
It was found, however, that the above observations
change significantly if a given scheme is applied
for a relatively long time (which we called a long-term
policy). In such a case, the decrease in power
consumption causes a decrease in temperature, which in
turn will reduce leakage power significantly (temperature
feedback). Table 2 shows the additional percentage
savings that can be obtained if the die is allowed to cool
down after a power reduction scheme is applied, which
can be as high as 19.5%. It is clear that additional savings
increase as leakage becomes more important. This result
emphasizes the importance of including runtime
parameters, such as temperature, if accurate estimations
are to be obtained. Also, design time optimizations such
as technology and packaging improvements should be
accounted for, as discussed in Section 3.
We hope that the framework proposed here can be used
so that the goal is no longer simply to have the
highest performance, but instead the highest performance
within a particular market segment's power budget, while
considering the physical aspects of the real design. For
instance, with the estimates given here, it will be possible
to weigh the benefits of using high-threshold devices
in a low-leakage process running at the highest possible
frequency at full Vdd against using faster but leakier
devices which require more voltage scaling in order to
reach the desired power budget.
Cases like these might lead the design team to select
some optimum percentage of total power to be from
leakage, which would be a function of the power budget
being targeted. In the extreme case, if a process increases
leakage greatly such that Vdd has to be reduced to the
extent of making the design slower than the previous
generation, then this is clearly a bad choice. It is possible
that analysis like the one presented here will lead to the
definition of Leff, Vdd, Tox, and Vth that will keep leakage
power near its optimum percentage for a given processor.
6. References
[1] McFarland, G., “CMOS Technology Scaling and Its
Impact on Cache Delay”, PhD. Thesis, Stanford
University, June 1997, http://umunhum.stanford.edu/~farland/.
[2] Butts, J. and Sohi, G., “A Static Power Model for
Architects”, Proceedings of the 33rd Annual IEEE–
MICRO 2000, pp. 223-234.
[3] Borkar, S., “Design Challenges of Technology Scaling”,
IEEE Micro, July-August 1999, pp. 23-29.
[4] Sylvester, D., et al., “Future Performance Challenges in
Nanometer Design”, Proc. of the 38th DAC, pp. 3-8.
[5] Viswanath, R., et al., “Thermal Performance Challenges
from Silicon to Systems”, Intel Technology Journal, 3rd
quarter, 2000.
[6] Duarte D., “Clock Network and Phase-Locked Loop
Power Estimation and Experimentation”, PhD. Thesis,
Penn State University, May 2002.
[7] ITRS Roadmap, http://public.itrs.net.
[8] Chen, Z., Diaz, C., et al., “0.18um Dual Vt MOSFET
Process and Energy-delay Measurement”, International
Electronic Devices Meeting, 1996, pp.851-854.
[9] Rabaey, J., Chandrakasan, A. and Nikolic, B., “Digital
Integrated Circuits: A Design Perspective”, 2nd Ed.,
Prentice-Hall International, NJ, 2002 (draft).
[10] Intel Corporation, “Intel Announces Breakthrough In Chip
Transistor Design”,
http://www.intel.com/pressroom/archive/releases/20011126tech.htm.
[11] Hu, C., "Gate Oxide Scaling Limits and Projection",
IEDM, 1996, pp. 319-322.