Turbo Boost vs. Overclocking

Turbo Boost and Overclocking
© Intel Corp.
Architecture and Early Performance Results of Turbo Boost Technology
on Intel® Core™ i7 Processor and Intel Xeon® Processor 5500 Series (2009)
Markus Mattwandel, Todd Baird, Jorge Garcia, Seongwoo Kim, Herbert Mayer*
Abstract We survey the Turbo Boost Technology on the new Intel® Core™ i7 multi-core, multi-threaded microprocessor. Turbo Boost Technology dynamically increases the frequency of processor cores for the benefit of higher performance, while operating within thermal design limits and maintaining safe conditions on the physical chip. This paper outlines how much the core frequency can be raised as a function of the number of currently active cores and of other electrical and thermal parameters. We explain the conditions under which such boosts are possible, depending on the instantaneous current, on overall power consumption and the resulting heat generation, and on the actual temperature of the core(s) being boosted. We contrast Turbo with overclocking, another method of boosting frequency and improving performance, and discuss the pros and cons of Turbo versus thermal throttling. Since Turbo Boost Technology is implemented in silicon on the Core i7, in both single-socket desktops and dual-socket servers, we include actual performance data ranging from average to ideal cases. Core i7 is implemented in 45 nm High-K silicon; it launched in late 2008 as a high-end desktop platform with one processor, and in 2009 as a server with two processors, each having 4 cores and 2 hardware threads per core. We conclude with conjectures about the future and a list of references.
Keywords: Multi-Core; Turbo Mode; Overclocking; Simultaneous Multi-Threading (SMT); Parallel System; Logical Core; Green Computing
1. Introduction
Turbo Boost Technology (Turbo, for short) dynamically enables a temporary performance boost on the new Intel® Core™ i7 multi-core, multi-threaded microprocessor, depicted in Figure 2.1. Turbo Boost Technology increases the core clock of a processor in defined, discrete frequency steps (also known as bins) for the benefit of higher performance, whenever conditions on the physical chip allow this without endangering the microprocessor. This survey outlines how much the core frequency can be raised as a function of the number of active cores and of other parameters.
Section 2 describes the design goals of Turbo Boost Technology on Core i7 and contrasts the new method with an older, legacy Turbo method implemented on earlier Intel silicon. It discusses the pros and cons of Turbo vs. overclocking, both of which boost frequency to increase performance, albeit with different goals and conditions. It also compares Turbo with thermal throttling. Section 3 summarizes how much Turbo boosting is theoretically possible, as set by predefined system parameters. In Section 4 we list the costs, shortcomings, and dangers of Turbo. Since Turbo Boost Technology is implemented in silicon on the Core i7, in both single-socket desktops and dual-socket servers, Section 5 includes detailed, actual performance data on client and server platforms, from average to ideal cases. Section 6 contrasts Turbo with other performance boost ideas, while Sections 7 and 8 conclude with a look into the future and the references.
The physical Core i7 microprocessor is realized
by Intel in 45 nm High-K Silicon technology, launched
in late 2008 as a High-End Desktop platform with a
single socket, and in 2009 as a server with 2 sockets.


2. Description of Turbo Boost Technology
Why Turbo Boost? Intel Turbo Boost Technology, introduced on Intel’s flagship Core i7 and Core i7 Extreme Edition processors in Q3’2008, allows processor cores to automatically run faster than their base operating frequency when they are operating below the limits of a defined envelope of power, current, and temperature (the specification limits). The amount of additional frequency each core actually achieves depends on the number of active cores, i.e., cores executing the processes (threads) that a workload has spawned, and on the operating environment, which includes current (bounded by the thermal design current, or TDC), power consumption (bounded by the thermal design power, or TDP), and temperature. Turbo Boost engages when the OS power scheme is set for performance and the processor package is operating below its critical constraints. The core frequency is then dynamically adjusted within the defined limits as the operating conditions change.
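To make the eligibility rule above concrete, here is a minimal Python sketch, assuming a simplified model in which Turbo is permitted only when the OS policy requests performance and power, current, and temperature are all below their limits. The Envelope class, the turbo_eligible function, the policy strings, and the TDC and temperature limits are hypothetical illustrations; only the 95 W TDP figure comes from the measurements in Section 5. The real decision is made by on-chip hardware and firmware, not by code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical specification limits for one processor SKU."""
    tdp_w: float    # thermal design power limit, in watts
    tdc_a: float    # thermal design current limit, in amperes
    t_max_c: float  # maximum allowed core temperature, in degrees Celsius


def turbo_eligible(power_w: float, current_a: float, temp_c: float,
                   env: Envelope, os_policy: str) -> bool:
    """Return True if this simplified model would allow a Turbo boost.

    Turbo is considered only when the OS power scheme asks for performance
    and the package is below every limit of the envelope.
    """
    if os_policy != "performance":
        return False
    return (power_w < env.tdp_w
            and current_a < env.tdc_a
            and temp_c < env.t_max_c)


# Illustrative limits: 95 W TDP as in Section 5; the TDC and temperature values are made up.
ENV = Envelope(tdp_w=95.0, tdc_a=80.0, t_max_c=95.0)

if __name__ == "__main__":
    print(turbo_eligible(70.0, 50.0, 65.0, ENV, "performance"))  # below all limits
    print(turbo_eligible(96.0, 50.0, 65.0, ENV, "performance"))  # above TDP
    print(turbo_eligible(70.0, 50.0, 65.0, ENV, "power_saver"))  # OS policy blocks Turbo
```

Each limit is checked independently, so exceeding any one of them (power, current, or temperature) is enough to withhold the boost.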
Figure 2.1 High-Level Nehalem Architecture (diagram labels: Core 0, Core 1, Core 2, Core 3; Last Level Cache; IMC with DDR3 DRAMs; QPI links; Pwr & Clk; Uncore; Frequency & Voltage Independent Interface)
Corresponding author: herb.g.mayer@intel.com
SPEC, SPECint and SPECfp are copyright of SPEC
Thermal throttling comes from the other end, taking a greedy approach to performance enhancement. Thermal throttling assumes that the microprocessor generally runs in some steady state of execution, but acknowledges that temporary hot spots are possible. These occur when the typical mix of IO-bound and compute-bound execution is replaced by purely compute-bound execution, generating more heat than is safe. As with the protective action taken in Turbo, the frequency is then throttled, resulting in less current, less heat, and less delivered performance.
A microprocessor architect must decide which safe performance-boosting technology to realize in silicon: one, the other, or both. On the Core i7, Intel decided to provide both methods.
2.1. Turbo Boost Technology vs. Enhanced Dynamic Acceleration Technology: Prior to the introduction of Turbo Boost in Core i7, Intel’s previous-generation Core 2 Duo processors introduced the first generation of Turbo technologies, known as Enhanced Dynamic Acceleration Technology (EDAT). This technology allows processor cores to automatically run faster than their base operating frequency if one or more cores are idle; in that event, the operating frequency of the other cores is increased. Note that the increase is influenced by the number of active hardware threads and by various electrical and thermal parameters, and a clock boost is granted only within the product constraints. Turbo Boost and EDAT are also “Green” technologies: they provide performance on demand while keeping power consumption to a minimum when the additional processor performance is not needed, as judged by the current load.
3. Ideal Performance Speedup with Turbo
A number of dynamic parameters dictate the upper limit of Turbo Boost speedup. These include the temperature of the core being boosted, the overall current and momentary power, and the number of active cores.
Each Turbo Boost frequency step is 133.33 MHz. For each SKU, fuse values are set in a small internal table during chip manufacturing to define an upper bound on how many of these frequency steps a core can safely climb. The table parameters d-c-b-a mean: if 1 core is active, that core’s frequency may increase by a bins; else if 2 cores are active, these cores can increase by b bins; and so on, up to d bins when all 4 cores are active.
Applying the same encoding, but starting from the other end (4 active cores), the table entry 1-1-4-8 means that when 3 or 4 cores are active, the frequency may increase by just 1 step. If only 2 cores are busy, the speed may grow by up to 4 steps, and if only a single core is active, that core may grow by 8 steps, amounting to about 1.07 GHz (8 × 133.33 MHz) of additional clock speed.
However, this boost may decrease if, for any reason, the predefined envelope of maximum allowable current or temperature is exceeded. The decrease is designed not only to protect the microprocessor from thermal stress, but also to save power and run “greener”. Similarly, as the sample 1-1-4-8 bound shows, when other cores become active, a currently high boost must decrease, again to protect the processor and save power.
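The fuse-table encoding described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions: a 4-core part, the d-c-b-a string format used in the text, and a base frequency supplied by the caller; the function names are illustrative. The result is only an upper bound, since the boost may still be reduced when the current, power, or temperature envelope is exceeded.

```python
BIN_MHZ = 133.33  # one Turbo frequency step (bin), as stated in the text


def max_turbo_bins(fuse_table: str, active_cores: int) -> int:
    """Maximum Turbo bins for a given number of active cores.

    fuse_table uses the d-c-b-a encoding: the first value applies when
    4 cores are active, the last when only 1 core is active.
    """
    d, c, b, a = (int(x) for x in fuse_table.split("-"))
    bins_by_active = {1: a, 2: b, 3: c, 4: d}
    if active_cores not in bins_by_active:
        raise ValueError("this sketch models a 4-core part (1 to 4 active cores)")
    return bins_by_active[active_cores]


def max_turbo_mhz(base_mhz: float, fuse_table: str, active_cores: int) -> float:
    """Upper bound on core frequency before any current, power, or thermal limit applies."""
    return base_mhz + max_turbo_bins(fuse_table, active_cores) * BIN_MHZ


if __name__ == "__main__":
    for n in range(1, 5):
        bins = max_turbo_bins("1-1-4-8", n)
        print(f"{n} active core(s): up to {bins} bin(s), +{bins * BIN_MHZ:.2f} MHz")
```

With the 1-1-4-8 entry, waking a second core cuts the single-core bound from 8 bins to 4, which is the kind of decrease the preceding paragraph describes.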
2.2. Turbo Boost vs. Overclocking
Turbo is quite distinct from overclocking. First of all, overclocking increases the clock frequency by running the part outside its specification, while Turbo operates completely within spec. Turbo does not change the reliability or durability of a part.
Overclocking occurs when the clock rate of the processor is manually and statically increased, so that the processor runs outside its specified, and thus safe, limits. Conversely, Turbo technologies run the processor within specification and aim to take advantage of whatever thermal headroom is available during under-utilized conditions. Overclocking is not a “Green” technology, since it increases processor power consumption continuously, without regard to actual demand.
             | Starting clock                 | Heat protection | Protective action  | Application mechanism
Turbo Boost  | Base op frequency              | Yes             | Decrease clock     | Automatic, based on sys. conditions
Overclocking | Base op frequency, set by user | Yes             | Thermal throttling | Manual, user driven by brute force
2.3. Turbo Execution vs. Thermal Throttling
Turbo Boost Technology is a conservative performance enhancement method: it increases the clock rate only after the microprocessor recognizes that an increase in clock speed is safe, and it is understood that the processor was already operating safely before the boost. When the thermal parameters change, or when the number of active cores increases, the prior clock increase is reversed, which not only protects the chip from possible damage but also saves power.
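The contrast between the two policies can be sketched as two small Python functions that trace the clock over a sequence of sampling intervals. This is an illustrative model only, assuming one-bin steps per interval and a single boolean "safe" or "too hot" signal per interval; the function names and example frequencies are hypothetical, and real hardware uses richer inputs. Both policies protect the part the same way, by decreasing the clock; they differ in where they start.

```python
from typing import Iterable, List

BIN_MHZ = 133.33  # one frequency step, as in Section 3


def turbo_trace(base_mhz: float, max_bins: int, safe: Iterable[bool]) -> List[float]:
    """Conservative (Turbo-like) policy.

    Start at the base clock; add one bin per interval while conditions are
    safe, and remove one bin when they are not, never going below base.
    """
    bins, trace = 0, []
    for ok in safe:
        bins = min(bins + 1, max_bins) if ok else max(bins - 1, 0)
        trace.append(base_mhz + bins * BIN_MHZ)
    return trace


def throttle_trace(rated_mhz: float, max_bins: int, hot: Iterable[bool]) -> List[float]:
    """Greedy (throttling-like) policy.

    Start at the high rated clock; remove one bin per interval while the part
    is too hot, and restore one bin once it cools, never exceeding the rated clock.
    """
    bins_down, trace = 0, []
    for too_hot in hot:
        bins_down = min(bins_down + 1, max_bins) if too_hot else max(bins_down - 1, 0)
        trace.append(rated_mhz - bins_down * BIN_MHZ)
    return trace


if __name__ == "__main__":
    print(turbo_trace(2660.0, 2, [True, True, True, False]))
    print(throttle_trace(3200.0, 2, [False, True, True, False]))
```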
                 | Starting clock     | Heat protection | Protective action | Arch. driven
Turbo Boost      | Low, to run safely | Yes             | Decrease clock    | Yes
Thermal Throttle | High, to run fast  | Yes             | Decrease clock    | Yes
4. Technology Investment for Turbo Boost
Although the goal of improving performance with Intel Turbo Boost Technology is worth pursuing, the long-term investments and shorter-term costs must be weighed against the gains: performance for the user and business benefit for the manufacturer.
4.1. Engineering Investment
The up-front engineering costs to design and implement Turbo Boost Technology were noticeable but contained, despite the existence of related prior technology at Intel, e.g., the Enhanced SpeedStep Technology. Design costs included a new mini-controller, called the Power Control Unit (PCU), and associated microcode. The cost of validation was also significant, because new methods had to be developed to ensure that the feature worked properly without the validation itself interfering with the feature's operation. The manufacturing flow was also updated to support testing of the PCU, which added another minor development cost.
4.2. End User Costs
When Turbo Boost Technology promotes cores to a higher frequency, the processor draws more current than it would at nominal frequency. The end user will incur an incremental cost for the additional electrical power consumed in this mode; however, this cost is very minor compared to the power used by the system as a whole. If necessary, users may manually adjust the balance between performance and power consumption through the OS power policies.
A final, theoretical cost to note is the introduction of a variable-frequency processor into an environment that has largely been able to depend on a constant processor frequency. Some applications may attempt to synchronize events in time based on the assumption that frequency does not change over time, although Intel has not yet found any such application. Computer users may also become alarmed when their frequency reporting tools begin to show dynamic frequency changes.
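The timing concern above can be illustrated with a hypothetical application that calibrates a busy-wait loop by counting iterations at one clock frequency. This is a minimal Python model, assuming a loop whose cost is a fixed number of core cycles per iteration; all names and numbers are illustrative, and, as noted above, Intel has not found a real application that behaves this way.

```python
def delay_loop_us(iterations: int, cycles_per_iteration: float, core_mhz: float) -> float:
    """Elapsed time of a purely CPU-bound delay loop, in microseconds.

    One MHz is one cycle per microsecond, so time = total cycles / core_mhz.
    """
    return iterations * cycles_per_iteration / core_mhz


# Hypothetical application: calibrates a 1000-microsecond busy-wait at 2660 MHz.
CYCLES_PER_ITERATION = 4.0   # assumed loop cost in core cycles
CALIBRATION_MHZ = 2660.0
calibrated_iterations = round(1000 * CALIBRATION_MHZ / CYCLES_PER_ITERATION)

if __name__ == "__main__":
    base = delay_loop_us(calibrated_iterations, CYCLES_PER_ITERATION, CALIBRATION_MHZ)
    boosted = delay_loop_us(calibrated_iterations, CYCLES_PER_ITERATION,
                            CALIBRATION_MHZ + 2 * 133.33)  # two Turbo bins
    print(f"{base:.1f} us at base, {boosted:.1f} us with two bins of Turbo")
```

In this model, the "1000-microsecond" delay shrinks by roughly 9% once two Turbo bins are granted, which is exactly the kind of silent drift an application assuming a constant clock would not expect.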
Figure 5.1 Cinebench 10
Allowing single-threaded workloads to run on any hardware thread incurs performance penalties, because each time a thread migrates, the OS needs to save and restore its state to preserve deterministic execution.
5. Actual Performance Data with Turbo Boost
We selected workloads known to be CPU-centric and focused further on single- and multi-threaded workloads for our Turbo performance measurements. We began by running at three baseline frequencies without enabling Turbo: 2.66 GHz, 2.8 GHz, and 2.93 GHz, to simulate the lower and upper bounds of the workload. Initial results were mixed because the OS scheduler was allowing single-threaded workloads to run on multiple CPUs. By setting affinity manually, forcing these workloads to run on a single CPU, we were able to obtain the maximum benefit from Turbo. Affinity here means associating a particular thread with a dedicated core or hyper-thread. We then applied this lesson, setting affinity manually, to all single-threaded workloads.
Figure 5.2 Cinebench 9.5
Figures 5.1 and 5.2 show Cinebench obtaining the highest Turbo upside when affinity is set, because it can then run on a single core for the whole test duration. The rendering benchmarks also show a positive Turbo effect; rendering performance is conventionally measured in time units, so smaller is better.
5.1. Turbo Speedup on UP Client
Setting processor affinity is the process by which an application manually tells the OS scheduler where to run; in other words, it restricts the set of hardware threads on which the workload may run. For instance, setting Affinity = P3 tells the OS scheduler to run only on Processor 3. Setting Affinity = P0, P2, P3 allows an application to run on hardware thread 0, 2, or 3.
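On Linux, the same restriction can be expressed from Python with the standard os.sched_setaffinity and os.sched_getaffinity calls, which take a process ID (0 meaning the calling process) and a set of logical CPU numbers. This is a minimal sketch assuming a Linux host; the pin_to_cpus helper is an illustrative name of our own, and the measurements in this paper did not necessarily use this mechanism (on Windows, the Win32 SetProcessAffinityMask function serves a similar purpose).

```python
import os


def pin_to_cpus(cpus: set) -> set:
    """Restrict the calling process to the given logical CPUs (Linux only).

    pin_to_cpus({0, 2, 3}) corresponds to "Affinity = P0, P2, P3" in the text.
    Returns the affinity set the kernel reports after the change.
    """
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    single = {max(allowed)}        # like "Affinity = P3": one logical CPU we may use
    print("pinned to", pin_to_cpus(single))
    pin_to_cpus(allowed)           # restore the original affinity
```

Pinning a single-threaded benchmark to one logical CPU keeps the other cores idle, which is what lets Turbo grant the larger single-core boost.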
Figure 5.6 Estimated Individual SPEC CPU2000
Score, 4-Users
Figure 5.3 Rendering Workloads
Figures 5.5 and 5.6 display various components of CPU2000 visibly benefiting from Turbo. These workloads span a range of diverse disciplines and do not all scale linearly with core frequency; thus, some workloads do not reach the full theoretical Turbo benefit. Estimated individual SPEC CPU 2000 scores are based on measurements on Intel internal development platforms and may differ from measurements on production platforms available later in 2009. For more information about the benchmarks, see [4].
Figure 5.3 shows 3DStudioMax and MainConcept H.264 reaching nearly ideal Turbo speedup, because these workloads are CPU-centric, can run on specific cores, and incur no other overhead.
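One way to see why workloads do not all reach the full theoretical benefit is a simple Amdahl-style model, which is our own illustrative assumption rather than the authors' analysis: a fraction f of the baseline run time scales with core clock, while the rest (for example, time spent waiting on DRAM) does not. Raising the clock by a ratio r then gives a speedup of 1 / ((1 - f) + f / r), which equals r only when f = 1. The function name and the 9% example ratio below are illustrative.

```python
def frequency_speedup(f_scaling: float, freq_ratio: float) -> float:
    """Amdahl-style speedup when only part of the run time scales with core clock.

    f_scaling:  fraction (0 to 1) of baseline run time that is core-bound.
    freq_ratio: new core frequency divided by old core frequency.
    The remaining (1 - f_scaling) of the run time is assumed not to change.
    """
    if not 0.0 <= f_scaling <= 1.0:
        raise ValueError("f_scaling must lie in [0, 1]")
    if freq_ratio <= 0.0:
        raise ValueError("freq_ratio must be positive")
    return 1.0 / ((1.0 - f_scaling) + f_scaling / freq_ratio)


if __name__ == "__main__":
    ratio = 1.09  # an illustrative 9% clock increase
    for f in (1.0, 0.75, 0.5, 0.25):
        print(f"core-bound fraction {f:.2f}: speedup {frequency_speedup(f, ratio):.4f}")
```

Only a fully core-bound workload (f = 1) converts the whole frequency increase into speedup; every memory-bound share dilutes it.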
5.2. Turbo Speedup on DP Server
Table 5.1 summarizes our setup for the DP Turbo experiments. We used an engineering validation board called Green City in an open bench-top configuration. This is certainly a different thermal condition from a typical end-user environment in a standard chassis. However, based on pre-silicon studies and other post-silicon experiments, we learned that the thermal impact on Turbo performance is second-order. As shown in Figure 5.7, each processor has an individual heatsink with active fans attached. In addition, four external fans are placed on the side to cool the memory, voltage regulators, etc. All fans ran at a constant speed. If the workload does not hit Turbo constraints, the core frequency can increase dynamically up to 3.33 GHz, depending on the number of active cores.
Figure 5.4 Arithmetic and Multi-Media Workloads
Figure 5.4 shows Sandra measurements of arithmetic and multimedia applications with multi-threaded workloads.
Figure 5.5 Estimated Individual SPEC CPU 2000
Score, 4-Users
Table 5.1 Experimental Setup
We first tested the SPEC CPU2000 benchmark, compiled with Intel Compiler 11.0, for multiple cases of interest. The baseline configuration had Turbo mode turned off, and its benchmark scores were compared with those of the Turbo mode cases. Since Turbo is designed to operate within predetermined TDC, TDP, and thermal constraints, it may not always run at the maximum Turbo frequency. To assess its efficiency, we compared actual performance against the maximum performance without these constraints. This unconstrained case would give the same performance as overclocking the processors to the Turbo frequency in non-Turbo mode, e.g., 3.20 GHz for a workload with multiple active cores, unless Turbo itself causes some overhead. Note that we used the maximum score among several samples of each experiment in the comparison, because the system could occasionally generate exceptionally low scores due to certain abnormal transient conditions at the beginning of tests. Based on our previous experience, we believe this water-marking approach to sampling is effective when dealing with a pre-production platform prior to fine tuning.
Figure 5.8 presents the Turbo speedup for 16-user SPECintRate along with the level of run-to-run variation. The red line indicates the amount of frequency increase between P1 and P0, i.e., slightly more than 9%. On average, Turbo mode brings about a 5.8% performance upside compared to non-Turbo, and this extra boost stays within the thermal design envelope. For example, bzip2 and gcc reach the ideal Turbo performance target. On the other hand, multiple components fall below the ideal level; our analysis of workload profiles from the power control unit showed that these workloads hit the TDP limit. The variation is represented by the standard deviation over the mean for 9 different trials of each benchmark component. The variability is mainly due to sub-optimal memory usage by the OS under the non-uniform memory configuration (NUMA) between the two separate processors (two separate sockets on a server platform), and it is the main explanation for the cases where the two ideal performance bars mismatch.
Figure 5.8 Turbo Speedup for SPEC CPU 2000
Integer Rate IC11.0 – 16-user
Figure 5.7 NHM-EP System with external fans
Figure 5.9 shows the observed speedup for SPECfpRate. The average performance benefit from Turbo is 3.3% out of a 3.5% goal, which is less than observed in the integer suite. One reason is that some floating-point components depend on activity that does not scale with frequency, e.g., DRAM accesses. Even though some components directly take advantage of a faster core clock, e.g., sixtrack, they often hit the TDP constraint throughout execution. Some bars look inconsistent with one another; run-to-run variation must be factored in to explain them. Although we present the variation only for the Turbo case here, its level was not dramatically different in non-Turbo cases. There is no empirical evidence so far suggesting that Turbo introduces additional run-to-run variation on a given system.
Figure 5.9 Turbo speedup for SPEC CPU 2000
Floating-point Rate IC11.0 – 16-user.
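The quantities used in this section can be computed as follows. This is a minimal Python sketch; the function names are illustrative, we assume P1 = 2.93 GHz (one of the baseline frequencies used in Section 5) and P0 = 3.20 GHz (the multi-core Turbo frequency mentioned above), and we assume the sample standard deviation, since the text does not say which estimator was used. With these values the ideal gain is about 9.2%, consistent with the "slightly more than 9%" red line, and the measured 5.8% average corresponds to roughly 63% of the ideal.

```python
from statistics import mean, stdev
from typing import Sequence


def ideal_gain(p0_ghz: float, p1_ghz: float) -> float:
    """Relative frequency increase from the base (non-Turbo) clock P1 to the Turbo clock P0."""
    return p0_ghz / p1_ghz - 1.0


def turbo_efficiency(observed_gain: float, ideal: float) -> float:
    """Fraction of the ideal frequency gain that shows up as performance."""
    return observed_gain / ideal


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """Run-to-run variation as standard deviation over mean (sample stdev assumed)."""
    return stdev(scores) / mean(scores)


def watermark(scores: Sequence[float]) -> float:
    """Keep the best sample, discarding low outliers from transient start-up conditions."""
    return max(scores)


if __name__ == "__main__":
    gain = ideal_gain(3.20, 2.93)
    print(f"ideal gain {gain:.1%}, efficiency {turbo_efficiency(0.058, gain):.0%}")
```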
Since we observed that TDP was the only limiter in our tests on this platform with 95 W processors, one may wonder how much additional performance a bit more power headroom would yield, whether obtained through process enhancements or through other system implementation factors, e.g., voltage regulator accuracy. To address this question, we experimented with additional cases by artificially raising the TDP limit. Figure 5.10 illustrates the performance impact of 4 W and 8 W of additional TDP budget for selected benchmark components that are core-bound and power constrained. The performance gain is clearly measurable for these workloads.
Figure 5.10 Performance impact with Additional
Power Headroom
We also evaluated the SPEC JBB 2005 benchmark. As shown in Figure 5.11, we compared the impact of simultaneous multi-threading (SMT) under Turbo mode. Turbo provides an upside both with and without SMT, while the best performance for this workload is achieved with SMT and Turbo together. In this case the benchmark rarely hit the TDP limit, as indicated by the unconstrained Turbo case.
Figure 5.11 Turbo Performance for SPEC JBB 2005
The JBB2005 results are not to be interpreted as official results by Intel; instead, they are presented as we found them in 2008 on our development platform, in line with Section 5.0 of the JBB run rules.
6. Related Work
If “EIST (Enhanced Intel SpeedStep Technology)” (see Jeff Reilly comment) is different from EDAT, explain and provide reference.
7. Conclusion and a Look Ahead
In this survey we provided a high-level explanation of the Turbo Boost mechanism on Core i7: how it can accelerate some applications, and how the degree of speedup depends on the activity factors of cores on the same physical processor and on electrical and thermal conditions. We compared Turbo, a built-in, dynamic, automatic boosting feature, with overclocking, which is initiated by the end user at the user’s own risk. We also contrasted Turbo, which speeds up the clock from its standard rate, with thermal throttling, which slows the clock down from the standard rate for the sake of component protection.
Whether future processors, which will exceed 1 billion transistors per part, will continue to provide both Turbo Boost and thermal throttling remains to be seen. But it will be a natural evolutionary step to let the number of cores grow beyond the 4 in the current Core i7. Whether such future cores will have sibling hyper-threads, or whether architects will instead use those same transistors for even more cores, is an open question.
8. References and Referees
We wish to thank the anonymous reviewers … and our colleagues at Intel, Ronak Singhal and Jeff Reilly, who suggested crucial improvements and contributed clarifications.
[1] Intel White Paper, “Intel® Turbo Boost Technology in Intel Core™ Microarchitecture (Nehalem) Based Processors,” November 2008. http://download.intel.com/design/processor/applnots/320354.pdf?iid=tech_tb+paper
[2] POD Tech website, “Turbo Boost Technology,” November 8, 2008. http://www.podtech.net/home/search/Turbo+Boost+Technology
[3] Intel website, “Intel® Core™ i7 Microprocessors, The Best Processor on The Planet,” November 3, 2008. http://download.intel.com/pressroom/kits/corei7/pdf/Intel%C2%AE%20Core%E2%84%A2%20i7_Overview.pdf
[4] SPEC, general website. http://www.spec.org
[5] SPEC website for the integer component of SPEC CPU2006, August 2006. http://www.spec.org/cpu2006/CINT2006/
[6] SPEC website for the floating-point component of SPEC CPU2000, October 2003. http://www.spec.org/cpu2000/CFP2000/
[7] Intel® Turbo Boost Technology. http://www.intel.com/technology/turboboost/