Phase Detection

Jonathan Winter
Casey Smith
CS 612
04/05/05
Motivation
• Large-scale phases exist (order of millions of
instructions)
– For many programs, if we look at any interesting metric (cache
misses, IPC, etc.), we see repeating behavior
– Call the regions with similar behavior “phases”
• Knowledge of phase-based behavior can be used for
adaptive optimization
– Current hardware doesn’t exploit phase behaviors
• For instance
– A region of execution may only need a small cache—save
power/increase performance by shrinking
– A region of execution may benefit from data structure
reorganization
Basic Methodology
1) Identify phase boundaries
2) Classify phases
3) Determine what optimizations to perform
for each phase
When can each step be performed?
Run time, compile time, offline
Overview
• We’ll focus on two papers on phase detection
– Sherwood, Sair, and Calder, “Phase Tracking
and Prediction,” ISCA 2003
– Shen, Zhong, and Ding, “Locality Phase
Prediction,” ASPLOS 2004
Sherwood et al. 2003
• Classifies the behavior of a program into
phases based on code execution
• Finds strong correlations between code
execution phases and important performance
and energy metrics
• Simulates hardware for real-time detection and
prediction of phases
• Demonstrates usefulness through a variety of
optimization techniques made possible by
phase detection
Definition of a Phase
• Previously (stemming from Denning 1972),
a phase was defined as an interval of
execution where a measured program
metric stayed relatively constant.
• Sherwood et al. consider all sections of
code with similar values for the program
metric to be part of the same phase even if
the intervals are spread out over the
course of the program’s execution.
Key Program Metrics
• Instructions per cycle (IPC), energy, branch
prediction accuracy, data cache misses,
instruction cache misses, and L2 cache misses
are all vital statistics for optimizing speed
and power consumption
Single Unified Metric
• Goal: find a single metric that
– Uniquely distinguishes phases
– Guides optimization and policy decisions
• Need some section of code on which to measure
this metric—pick 10M instructions
– Much longer time span than typical architectural
techniques handle
– Long enough to capture large-scale behavior
– Short enough to capture detailed phase behavior
– Size of an OS timeslice
Metric for Classification
• Based on Basic Blocks
– A basic block is a section of code with one entry
point and one exit point
• Basic Block Vector
– Count the number of times each basic block is
executed in the 10M interval
– Entries in the vector are the product of the number
of times each basic block is executed and the block
length (BB1*L1, BB2*L2, BB3*L3, …)
– This vector is a signature of the phase which
correlates well with other metrics of interest: IPC,
cache misses, etc.
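As a rough illustration of the vector described above, the sketch below counts executions per basic block in one interval and weights each count by the block length. The function name, the block ids, and the tiny trace are made up for the example; a real interval would cover 10M instructions.

```python
from collections import Counter

def basic_block_vector(trace, block_lengths):
    """Return {block id: executions * block length} for one interval.

    trace: block ids in execution order (illustrative input format).
    block_lengths: number of instructions in each block.
    """
    counts = Counter(trace)
    return {bb: n * block_lengths[bb] for bb, n in counts.items()}

# Toy interval: B2 runs three times and is the longest block,
# so it dominates the signature.
interval = ["B1", "B2", "B2", "B3", "B2"]
lengths = {"B1": 4, "B2": 10, "B3": 2}
bbv = basic_block_vector(interval, lengths)
print(bbv)
```

The weighting means each entry is the number of instructions executed inside that block, which is why the signature leans toward frequently executed code.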
Advantages of BBVs
• Independent of architectural measures
and thus unaffected by optimizations
• Weighting biases the signatures to more
frequently executed instructions
• Creates unique signatures for intervals which
execute the same code but in different proportions
Hardware Implementation
• Don’t want to store and examine the whole
vector: compress to a 32-entry vector (footprint)
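The slide does not say how the compression is done. One plausible way, sketched here as an assumption, is to hash each basic block's address into one of 32 counters and accumulate the weighted counts there. The hash used below (drop the low two address bits, then take the remainder modulo 32) is hypothetical and chosen only to keep the example short.

```python
NUM_BUCKETS = 32

def footprint(weighted_counts, num_buckets=NUM_BUCKETS):
    """Fold a sparse basic block vector into a fixed-size footprint.

    weighted_counts: {block start address: executions * block length}.
    """
    buckets = [0] * num_buckets
    for address, weight in weighted_counts.items():
        # Hypothetical hash, not taken from the paper.
        buckets[(address >> 2) % num_buckets] += weight
    return buckets

# Two of the three blocks collide in bucket 0; that loss of
# information is the price of a fixed-size vector.
fp = footprint({0x400000: 4, 0x400010: 30, 0x400080: 2})
print(fp[0], fp[4])
```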
Visualization of the Footprints
Footprints for different intervals of gzip
What do we do with our footprint?
• Store a small sample of representative
footprints as phase signatures
• Compare the current footprint to previously
stored footprints
• If we have a close enough match, we
classify them as the same phase
• If not, we store the new footprint as the
representative member of a new phase
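The four steps above can be sketched as a small table of signatures. This is a minimal software model, not the simulated hardware: the class name is invented, the footprints have 4 entries instead of 32, the distance measure is the sum of absolute differences, and the threshold of 3 is arbitrary. It also returns the first stored signature that is close enough rather than the closest one.

```python
def distance(a, b):
    """Sum of absolute differences between two footprints."""
    return sum(abs(x - y) for x, y in zip(a, b))

class PhaseTable:
    def __init__(self, threshold):
        self.threshold = threshold
        self.signatures = []  # list index doubles as the phase id

    def classify(self, fp):
        """Return the phase id for fp, creating a new phase if needed."""
        for phase_id, signature in enumerate(self.signatures):
            if distance(fp, signature) < self.threshold:
                return phase_id
        self.signatures.append(list(fp))
        return len(self.signatures) - 1

table = PhaseTable(threshold=3)
intervals = [[5, 0, 1, 0], [5, 1, 1, 0], [0, 4, 0, 2], [5, 0, 0, 0]]
phase_ids = [table.classify(fp) for fp in intervals]
print(phase_ids)
```

The fourth interval is assigned to phase 0 again even though a different phase ran in between, which matches the definition of a phase used here: similar intervals belong together wherever they occur.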
Comparing Footprints
• To save space, only store the top 6 bits of each
entry in the 32-vector
– Counters are saturating 24-bit counters
– The maximum entry is smallest when all 10M
instructions are distributed evenly across the
32 entries
– In that case, keeping only the top six bits maps a
counter value of 10M/32 to a value of 1
• Distance between footprints is defined as the
Manhattan distance: the sum of the absolute
differences between corresponding entries in two
vectors
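The arithmetic behind the six-bit claim can be checked directly. The sketch below assumes the kept bits are simply the six most significant bits of a 24-bit saturating counter, as the slide describes; the helper names are illustrative.

```python
COUNTER_BITS = 24
KEPT_BITS = 6

def saturate(count):
    """Clamp a count to the largest value a 24-bit counter can hold."""
    return min(count, (1 << COUNTER_BITS) - 1)

def top_bits(count):
    """Keep only the six most significant bits of the counter."""
    return saturate(count) >> (COUNTER_BITS - KEPT_BITS)

def manhattan(a, b):
    """Sum of absolute differences between corresponding entries."""
    return sum(abs(x - y) for x, y in zip(a, b))

even_share = 10_000_000 // 32          # 312500 instructions per entry
print(top_bits(even_share))            # even spread still registers
print(top_bits(10_000_000))            # every instruction in one entry
print(manhattan([1, 0, 38], [0, 0, 36]))
```

Because 2^18 = 262144 is just below 10M/32 = 312500, an evenly spread interval still leaves a nonzero value in the stored bits.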
Finding a Match
• If the Manhattan distance is less than a
threshold, two footprints are classified as being
in the same phase
• Determine the threshold by comparing false
positives/false negatives against an offline
oracle tool
• Threshold of 220 chosen
Opportunity
• These classification methods are
oversimplified
• Opportunity to apply better machine
learning techniques
Within Phase Homogeneity
• Within a phase, architectural metrics have nearly
constant values (this is what we were aiming for)
Phase Prediction
• Once we’ve been through an interval, we
can identify the phase easily
• But we want to know what phase we’re
going to go to next
• We need to know what phase we will be in
before the interval starts in order to
perform useful optimizations (such as
changing the cache size)
Simple Prediction
• We could just predict that the next phase
would be the same as the current phase
• The program tends to change phases
more slowly than our 10M intervals, so this
actually gives reasonable accuracy
• However, we can do better
• Note: standard hardware predictors have
not been tried (branch prediction, memory
disambiguation, etc.)
Markov Model Predictor
• Phase changes depend on the set of previous
phases and the duration of their execution
• Phases tend to last many intervals, therefore
studying recent previous history doesn’t provide
more information than the current state
• Need to encode how long we’ve been in the
current state
• Predict the length of a phase to be the same
length it was previously
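A minimal software sketch of this idea follows. It keys a table on the pair (current phase, number of intervals spent in it so far) and remembers which phase came next the last time that pair was seen. When the pair is unknown it falls back to predicting no change. The class and method names are invented, and details such as table size and replacement in the simulated hardware are not modeled.

```python
class RunLengthPredictor:
    def __init__(self):
        self.table = {}    # (phase, run length) -> phase that followed
        self.phase = None  # phase of the most recent interval
        self.run = 0       # consecutive intervals spent in that phase

    def observe(self, phase):
        """Record the phase id of the interval that just finished."""
        if phase == self.phase:
            self.run += 1
            return
        if self.phase is not None:
            # The previous phase ended after self.run intervals.
            self.table[(self.phase, self.run)] = phase
        self.phase, self.run = phase, 1

    def predict(self):
        """Predict the next interval's phase (default: no change)."""
        return self.table.get((self.phase, self.run), self.phase)

predictor = RunLengthPredictor()
for p in "AAABBAA":
    predictor.observe(p)
mid_run = predictor.predict()   # A has only run 2 intervals so far
predictor.observe("A")
end_of_run = predictor.predict()  # A ran 3 intervals last time
print(mid_run, end_of_run)
```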
Run Length Encoding
Opportunity
• RLE Markov model is overly simple
• Better prediction techniques exist
• Make use of the order of previous states
rather than just the length of the current
state
Prediction Accuracy
Applications
• Frequent Value Locality
– Certain data values form the bulk of loads
• Compress to save energy
• Specialize code segments to common values
• Dynamic cache size adaptation
– Shrink cache size to save energy
• Dynamic processor width adaptation
– Fetch/Decode/Issue fewer instructions per
cycle when IPC will be low anyway
Frequent Value Locality
Cache Size Adaptation
Processor Width Adaptation
Summary of BBV method
• Divide program into 10M instruction intervals
• Characterize each interval by footprint
approximation to basic block vector
• Classify intervals as phases based on footprint
• Predict future phases based on RLE Markov
predictor
• Use information about phases to improve
frequent value locality and optimize cache size
and processor width for performance/energy
Bottom Line
• Classifying phases based on the
frequency of executed basic blocks is
effective at partitioning the program into
regions of homogeneous architectural
behavior
• Significant energy savings with small
performance degradation can be achieved
by applying phase specific optimizations.
Shen et al. 2004
• Defines phases in a totally different way
• Phases have variable lengths (not 10M
intervals)
• Detects phases by finding likely phase
boundaries
• Uses offline analysis of programs on test
inputs to predict behavior on other inputs
Metric of Interest
• For optimizing cache size, what we really
care about is the locality of reference
• Measure the locality directly, and classify
phases based on that
• Independent of optimizations performed:
the phases recovered are independent of the
hardware the program runs on.
Reuse Distance
• Define the reuse distance as the number of
distinct data elements (locations in memory)
touched between two consecutive references to
the same element.
• The distance is recorded at the second
reference
• Example: a b c b b a c
           - - - 1 0 2 2
• Also called LRU Stack Distance
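The definition can be checked against the slide's example with a few lines of code. This is a deliberately naive quadratic version for illustration; first references, which have no reuse distance, are reported as None.

```python
def reuse_distances(trace):
    """Reuse distance of every access, or None for a first access."""
    last_seen = {}
    result = []
    for i, element in enumerate(trace):
        if element in last_seen:
            # Distinct elements touched strictly between the two uses.
            between = trace[last_seen[element] + 1 : i]
            result.append(len(set(between)))
        else:
            result.append(None)
        last_seen[element] = i
    return result

distances = reuse_distances("abcbbac")
print(distances)
```

Real traces are far too long for this approach, which is one reason the method described later samples the trace.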
Overview
• Simulate a test run and record reuse
distance throughout the program
• Use this to separate the program into
“phases”
• Insert phase markers into binary code
• Predict when phase changes will occur
• Use information about phases to adjust
cache size or other hardware parameters
New Definition of Phase
• Here, a phase is a unit of repeating
behavior, rather than a unit of nearly
uniform behavior
• A phase change is an abrupt change in the
data reuse pattern
Reuse Trace
Why Offline Analysis?
• Compilers cannot fully analyze data
locality in programs with indirect
referencing or dynamic structures
• Hardware methods like the one presented
earlier require many severe
approximations for real-time analysis
• Solution: take method offline and analyze
program behavior on test inputs.
Phase Detection Process
1) Record reuse trace
2) Apply signal processing techniques to
extract useful information from the trace
3) Use the extracted information to find
good places for phase transitions
1) Record Reuse Trace
• Nontrivial programs access data locations so
many times that an actual full trace would be
overwhelming
• Just sample a representative set of memory
locations/reuse distances
• Threshold to reduce trace size and remove
irrelevant data
– Throw out short distances (C[i] = C[i] + 2)
– Throw out references to nearby memory locations
2) Signal Processing
• Use wavelet filtering to find abrupt changes in
reuse distance for each recorded memory
location
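The slides do not name the wavelet family or the filtering thresholds, so the following is only a sketch of the general idea. It uses the finest-scale Haar detail coefficients in undecimated form, which are scaled differences of neighboring samples, and flags positions where the coefficient magnitude exceeds a threshold. The sample data and the threshold of 10 are made up.

```python
import math

def haar_detail(signal):
    """Finest-scale undecimated Haar detail coefficients."""
    return [(b - a) / math.sqrt(2) for a, b in zip(signal, signal[1:])]

def abrupt_changes(signal, threshold):
    """Indices of the first sample after each large jump."""
    details = haar_detail(signal)
    return [i + 1 for i, d in enumerate(details) if abs(d) > threshold]

# Reuse distances of one sampled memory location over time:
# small values, a burst of long distances, then small values again.
samples = [3, 3, 4, 3, 40, 41, 40, 5, 4]
changes = abrupt_changes(samples, threshold=10)
print(changes)
```

Small fluctuations produce coefficients near zero and are filtered out, while the two jumps survive as candidate phase-change points.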
3) Phase Partitioning
• Now we have points representing locations of
abrupt changes in reuse distance for individual
memory locations
• Want to divide the list with two things in mind:
– Maximize phase length
– Minimize repetitions of memory locations within a
phase (no multiple abrupt changes)
• Example:
abcdeefabdfccabef
abcde efabdfc cabef
Missing Link
• So now we have locations of phase
transitions.
• How do we detect which regions are the same
phase? The paper doesn’t say.
• Missing section in paper?
• Assume we can somehow classify the
regions into phases
Phase Markers
• We know how often a phase occurs and
approximately where its boundaries are
• Goal: find markers that tell us when we’re
entering a particular phase
• For each phase, look for a basic block that occurs
once near the beginning of every instance of the
phase, and nowhere else.
• Use that basic block as a marker to tell when the
program enters that phase
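The marker search can be sketched as a filter over the basic block trace of the test run. The function below is illustrative only: the window size standing in for "near", the trace format, and the block names are all assumptions.

```python
def find_markers(block_trace, phase_starts, window):
    """Blocks seen exactly once near every phase start and nowhere else."""
    near = set()
    for start in phase_starts:
        lo = max(0, start - window)
        hi = min(len(block_trace), start + window + 1)
        near.update(range(lo, hi))

    markers = []
    for block in sorted(set(block_trace)):
        positions = [i for i, b in enumerate(block_trace) if b == block]
        if any(p not in near for p in positions):
            continue  # also executes away from the phase boundaries
        if all(
            sum(1 for p in positions if abs(p - start) <= window) == 1
            for start in phase_starts
        ):
            markers.append(block)
    return markers

# Toy trace: the phase of interest begins at positions 1 and 9.
trace = ["init", "m", "x", "y", "x", "y", "n", "z", "z",
         "m", "x", "y", "n", "z"]
markers = find_markers(trace, phase_starts=[1, 9], window=1)
print(markers)
```

Block "x" also sits next to both boundaries but executes in the middle of the phase too, so only "m" qualifies.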
Using Phases
• Now we know what basic blocks signal
phase entry points
• Run the program with new input
• When we enter a phase for the first time,
we record how long it lasts and its locality
properties
• Assume that these properties will hold for
all subsequent executions of the same
phase
Phase Prediction Performance
Negative Examples
• Not all programs have phases of repeating
behavior that can be identified from test runs
Applications
• Adaptive Cache Resizing
– Potential performance increase
– Potential power savings
• Memory Remapping
– Reorder data in memory to speed up
execution
Adaptive Cache Resizing
• Shrink cache without increasing miss ratio
• Phases have repeating behavior, not
uniform behavior
• Divide phases into 10K intervals
• The first couple of times we execute a phase,
experiment with the cache size in each interval
• Apply those cache sizes to subsequent
executions of the phase
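To make "shrink without increasing the miss ratio" concrete, the sketch below picks the smallest candidate size for one interval from its reuse distances. It assumes an idealized fully associative LRU cache measured in data elements, where an access hits exactly when its reuse distance is smaller than the capacity. The candidate sizes and function names are hypothetical, and real set-associative hardware would behave differently.

```python
def misses(reuse_distances, capacity):
    """Misses in an idealized fully associative LRU cache.

    None marks a first access, which always misses.
    """
    return sum(1 for d in reuse_distances if d is None or d >= capacity)

def smallest_cache(reuse_distances, sizes):
    """Smallest size with no more misses than the largest size."""
    sizes = sorted(sizes)
    baseline = misses(reuse_distances, sizes[-1])
    for size in sizes:
        if misses(reuse_distances, size) <= baseline:
            return size

interval = [None, None, 1, 0, 2, 2, 1]
chosen = smallest_cache(interval, sizes=[1, 2, 4, 8, 16])
print(chosen)
```

Running this once per interval during the first executions of a phase gives a per-interval size table that later executions of the same phase can reuse.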
Cache Size Reductions
Cache Size Reductions with 5% Miss Increase
Memory Remapping
• Reorder data in memory to speed up
execution
• For example, we might interleave arrays
that tend to be accessed together.
• Options:
– Analyze whole program to find array affinities
– Analyze by phase and reorganize data during
execution (should take into account cost of
remapping, but the authors don’t)
Memory Remapping
Summary of the locality-based method
• Record a sampled version of the reuse distance trace on
test input
• Process the trace
• Find phase boundaries
• Find basic block markers for each phase
• Run the program on new data.
• When we see a new phase marker, record how long it
lasts and experiment with optimization parameters for
10K intervals
• Assume subsequent executions of the phase will have
the same length and locality profile, so we can use the
determined optimization parameters
Bottom Line
• Many programs have long repeating
patterns of data reuse separated by abrupt
changes
• These repeating patterns can be detected
by analyzing the reuse trace
• Characterizing these patterns can lead to
significant energy savings and
performance enhancement through cache
resizing and memory remapping
Overall Conclusions
• Many programs exhibit large-scale phase
behavior which can be classified and predicted
• Characterization of the phases can lead to
energy savings and performance enhancement
through cache resizing and other techniques
– But no well-done analysis of just how much power is
saved
• Some of this can be done at compile time
(identifying many phase markers), but interval
type analysis and phase characterization must
be done at runtime
Opportunities
• More intelligent classification
• More sophisticated prediction
• Account for the cost of changing the cache
size in the energy/performance analysis
• Compare results of phase-based
adjustments to actual optimal adjustments
• Examine potential for using compilers for
different parts of the analysis