Variation Aware Application Scheduling for Chip Multi-Processors
Lavanya Subramanian, Aman Kumar
Carnegie Mellon University
{lsubrama, amank}@andrew.cmu.edu
Abstract
Variations in chip multiprocessors are fast becoming a major
concern with nanometer scaling. The within-die variation, in
particular, is gaining significance in sub-65 nanometer
technologies. Techniques are being explored to make use of the
variability information to achieve better performance and energy
efficiency. We propose a unified approach for application
scheduling that attacks performance and energy efficiency
simultaneously, using information on variability.

1. Introduction
Variations in chip multiprocessors are a major concern. There are
two components to this: the die-to-die component and the within-die
component. The die-to-die component of variation is being addressed
by speed binning, and there has been quite some work on these
techniques and methodologies. The within-die component, however,
has been gaining attention lately. At the transistor/device level,
these are variations in Leff and Vth. These variations in Leff and
Vth translate into frequency and leakage current variations at the
micro-architecture level.
The perspective of a chip multiprocessor as consisting of several
homogeneous cores is not valid anymore. A CMP has to be looked at
afresh, as a collection of heterogeneous cores with different
frequencies and power profiles. These variations can be profiled or
modelled in terms of per-core leakage and frequency parameters.
This information, coupled with the characteristics of the
applications/workloads that run on the CMP, could be exploited to
get better energy efficiency and performance out of it.

2. Related Work
There has been some work in this direction. [1] presents a set of
algorithms, intended either towards power or performance. The basic
power efficiency inclined algorithm (VarP) tries to map
applications onto the least leaky cores. The enhanced version of
this (VarP+AppP) tries to map the highest dynamic power consuming
applications onto the least leaky cores. Similarly, the performance
centric algorithms map applications onto the fastest cores.

3. Motivation
[1] presents power and performance optimized algorithms. However,
these are oriented solely towards power reduction or performance
enhancement. We aim at looking at these in a unified fashion,
motivated by the following observation: For cores that can operate
at a specific maximum frequency, there is a wide variation in the
leakage profiles. Similarly, for cores that have a certain leakage
power, there is a wide spread in the maximum frequency
characteristics [3]. It is on the basis of this observation that we
propose to enhance the schemes presented by [1].

4. Proposed Scheme
As mentioned earlier, the previous work has focussed either on
power or performance. One possible heuristic for the unified scheme
is as follows:
1. Rank the cores in the order of the maximum frequencies that they
   can run up to.
2. Obtain the static leakage power number for each core (profiled
   statically at a nominal temperature).
3. Rank the applications in the order of dynamic power (obtained by
   static profiling on a core).
4. For each application, starting from the highest dynamic power
   one, map the application onto the core with the highest
   frequency, with the least leakage. This could be achieved by
   sorting the cores in frequency and leakage levels/bins.
We plan to analyze the power/performance gains from using this
heuristic and possibly tweak it based on the results we obtain.
The variability model in [2] will be used to model the frequency
and leakage variability information.

5. Technical Description
The infrastructure needed to run and analyze our heuristic against
other algorithms requires the following steps to build:

5.1 Static Profiling
This is the first step in the power macro modelling in the BLESS
(CMP) simulator. We use a single core simulator, Sim-GALS, to
obtain these. This simulator is intended for a locally synchronous
and globally asynchronous system. We make all the local frequencies
the same; the main purpose of using this simulator is the
reasonably accurate leakage modelling present as part of this tool.
The technology models we use are 45nm. We simulate SPEC 2000
benchmarks on this simulator and obtain per instruction dynamic
power numbers for memory and non-memory instructions and per cycle
leakage numbers. The rationale behind obtaining these numbers is
that BLESS does not model the core in great detail. It just
distinguishes between memory and non-memory instructions. The power
numbers from the static profiling are presented in the Preliminary
Results section. We use the average of these numbers in BLESS.

5.2 Variation map generation
The next step is to generate variation maps to characterize the
variation of the leakage power and frequencies of the different
cores. We obtain these maps at the per core granularity. We use the
Varimap tool developed by Sebastian to generate variation maps for
Leff (gate length). We model the leakage power's variation with
Leff as follows: We simulate an inverter in HSPICE, varying the
gate length, and plot the variation of the inverter's leakage power
with gate length. We fit this data using MATLAB and obtain the
following relationship for leakage power.

LeakageVar = exp(0.051 ∆Leff^2 - 0.6 ∆Leff - 0.062) × Leakage

where
  ∆Leff is the gate length variation from the nominal,
  Leakage is the nominal leakage power,
  LeakageVar is the leakage power with variation accounted for.

The frequency variation is modelled as the delay variation being
directly proportional to the gate length variation.
We use these models to come up with a 4 x 4 variation map. This
states the leakage power and frequencies for each core in a 4 x 4
CMP.

5.3 Power/Variation Macro modelling in BLESS
The next step is to take the variation accounted for
power/frequency models/numbers into BLESS, the CMP simulator. We
read in frequency and leakage maps generated by the variation
modelling. We use the per instruction dynamic power numbers for the
memory and non-memory instructions, scaled by the frequency of
operation of the corresponding core, for the dynamic power numbers.
We use the per cycle leakage numbers for each core (from the
variation map) for the leakage power computation. We put together
all of these and finally report the power and performance (MIPS)
for the different cores. We look at the variation of the power and
performance across the different processors, to get a rough feel of
the variation behaviour. We present this in the Preliminary Results
section.
We now have the basic infrastructure: a CMP simulator with power
and variability models. The next step is to build a mock scheduler,
to perform the application migration between the different cores,
at scheduling intervals. Then, we'll be all set to compare the
different algorithms proposed in [1] and our heuristic.

6. Preliminary Results

6.1 Static Profiling results from Sim-GALS
These are our static profiling results from Sim-GALS for SPEC 2000
benchmarks. They list the memory and non-memory instruction dynamic
powers and the core per cycle leakage powers. We obtain average
numbers from all benchmarks to use in BLESS.

Application   NMIDP* (Watt)   MIDP* (Watt)   ACLP/cycle* (Watt)
ammp          4.856           3.6018         0.1272
gzip          2.514           1.3364         0.0897
vpr           4.0125          2.9914         0.1569
mesa          2.6177          1.5051         0.1261
art           3.7089          2.8037         0.1719
mcf           3.3925          2.5841         0.1716
parser        2.6258          1.7255         0.1529
vortex        3.8746          2.8734         0.1536
bzip2         2.4704          1.3382         0.0854
Average       3.341377778     2.306622222    0.137255556
Table 1: Sim-GALS results (45nm)
*NMIDP – Non-memory instruction Dynamic Power
*MIDP – Memory Instruction Dynamic Power
*ACLP/cycle – Avg. Core Leakage Power per Cycle

6.2 Results after power/variation macro-modelling in BLESS
We picked two applications: perlbench, a compute intensive
application, and mcf, a memory intensive application. We mapped a
copy of perlbench onto all cores and studied the MIPS and power
with and without variation. We repeated the same thing for mcf. The
results are interesting.

Application   Variation   Mean    Standard Deviation
Perlbench     Without     6273    20
Perlbench     With        6077    133
Mcf           Without     1970    99
Mcf           With        1909    103
Table 2: MIPS comparison

Application   Variation   Mean (Watt)   Standard Deviation
Perlbench     Without     5.9           0.0179
Perlbench     With        5.7           0.1720
Mcf           Without     1.9462        0.0906
Mcf           With        1.8669        0.1258
Table 3: Avg Power per Cycle (Watt) comparison

The sigma of the MIPS for perlbench, the compute intensive
application, is much bigger when variation is accounted for.
However, the sigma increase of the MIPS for mcf is small, as it is
memory intensive and variations in processor frequency do not
affect its performance as much. However, the sigma for average
power per cycle for both perlbench and mcf is quite large (though
perlbench's sigma variation is larger than mcf's), as compared to
the no variation case. This can be explained by the fact that
non-memory instructions also consume 2.3 Watt per cycle in the core
and hence this component is affected by variation too.

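As a rough illustration of how the fitted leakage relation (Section 5.2) and the mapping heuristic (Section 4) could fit together, here is a minimal Python sketch. It is not the BLESS scheduler, which has not been built yet. The function names, the `freq_bin_ghz` bin width and the per-core frequency and leakage values are hypothetical and chosen only for illustration; the application dynamic powers are the NMIDP numbers from Table 1. The units of ∆Leff are not stated in the text, so the sketch passes the value through unchanged.

```python
import math


def leakage_with_variation(nominal_leakage, delta_leff):
    """Fitted relation from Section 5.2:
    LeakageVar = exp(0.051*dLeff^2 - 0.6*dLeff - 0.062) * Leakage.
    The unit of delta_leff is not given in the text (assumption: same
    unit as used in the original MATLAB fit)."""
    exponent = 0.051 * delta_leff ** 2 - 0.6 * delta_leff - 0.062
    return math.exp(exponent) * nominal_leakage


def map_apps_to_cores(apps, cores, freq_bin_ghz=0.25):
    """Greedy mapping sketch of the Section 4 heuristic.

    apps:  {app_name: dynamic_power_watt}
    cores: {core_id: (max_freq_ghz, leakage_watt)}
    freq_bin_ghz is a hypothetical bin width; the text only says that
    cores are sorted into frequency and leakage levels/bins.
    Returns {app_name: core_id}.
    """
    def core_key(item):
        core_id, (freq, leak) = item
        freq_bin = math.floor(freq / freq_bin_ghz)
        # Highest frequency bin first, then least leakage within a bin.
        return (-freq_bin, leak, core_id)

    ranked_cores = [cid for cid, _ in sorted(cores.items(), key=core_key)]
    # Highest dynamic power application first.
    ranked_apps = sorted(apps, key=lambda name: (-apps[name], name))
    # Pair the i-th ranked application with the i-th ranked core.
    return dict(zip(ranked_apps, ranked_cores))


# Dynamic powers (Watt) are the NMIDP values from Table 1.
apps = {"ammp": 4.856, "gzip": 2.514, "mcf": 3.3925}
# Hypothetical per-core (max frequency in GHz, leakage in Watt).
cores = {0: (3.0, 0.15), 1: (3.1, 0.10), 2: (2.6, 0.08), 3: (2.5, 0.20)}

mapping = map_apps_to_cores(apps, cores)
print(mapping)
```

With these made-up core values, cores 0 and 1 fall in the same frequency bin, so the highest dynamic power application (ammp) goes to core 1, the less leaky of the two, mcf goes to core 0, and gzip goes to core 2. Binning frequencies before comparing leakage is one way to read step 4; a weighted score of frequency and leakage would be an alternative design.
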
7. Original Plan
Milestone 1:
Static profiling of applications to obtain dynamic powers, on
SimpleScalar with Wattch.
Build variability information into the BLESS CMP simulator.
Milestone 2:
Build a scheduler into or on top of the CMP simulator.
Milestone 3:
Implement and analyze the proposed scheme against the baseline
algorithms.

8. Progress
We have stuck to our original plan/schedule so far. We have
achieved Milestone 1, as promised in the proposal. Building a
scheduler and analyzing the different algorithms is what is left to
be done.
Lavanya worked on the static profiling and incorporation in BLESS
part. Aman worked on the variation modelling/map generation aspect.

9. Conclusion
We observe that there is definitely a difference in the
power/performance of different cores, even when they run the same
applications. A large part of this difference is a result of
process variability. Hence, we believe that our original plan of
building scheduling algorithms that use this process variability is
further bolstered by these observations.

10. Project Website
http://www.cs.cmu.edu/~amank/

11. References
[1] R. Teodorescu and J. Torrellas. Variation-aware application
scheduling and power management for chip multiprocessors. In
ISCA'08: Proceedings of the 35th Annual International Symposium on
Computer Architecture, 2008.
[2] Y. Abulafia and A. Kornfeld. Estimation of FMAX and ISB in
microprocessors. IEEE Transactions on VLSI Systems, 13(10), Oct
2006.
[3] Borkar, S., Karnik, T., Narendra, S., Tschanz, J., Keshavarzi,
A., and De, V. 2003. Parameter variations and impact on circuits
and microarchitecture. In Proceedings of the 40th Annual Design
Automation Conference (Anaheim, CA, USA, June 02-06, 2003). DAC
'03. ACM, New York, NY, 338-342.
Fig Variability per core modelled