Analyzing Branch Mispredictions Bernardo Toninho, Ligia Nistor and Filipe Militão Correlation Analysis

advertisement
15-740 Computer Architecture
Fall 2011 Class Project
Analyzing Branch Mispredictions
Bernardo Toninho, Ligia Nistor and Filipe Militão
Analysis and Results
Introduction
Branch Prediction: fundamental component of
modern pipelined architectures, keeps pipeline full in
presence of changes in control flow.
System Performance is highly sensitive to
predictor accuracy. Mispredictions require flushing
the entire pipeline!
Clustered Mispredictions
Definition: when a predictor is wrong in consecutive guesses.
How large and frequent are misprediction clusters?
Determine the sets of correlated predictions
Correlations are linked to collisions in table index.
We decomposed GShare into:
Base: BHR⊕ PC % TABLE_SIZE
Branch Prediction in a Nutshell
Filter a program’s global branch behavior to exploit
branch correlations (eg. common history, path).
Train a prediction mechanism (eg. 2-bit counter,
perceptron) with the filtered behavior stream.
Correlation Analysis
How can we use this to improve prediction rates?
Keep track of past consecutive mispredictions
“Flip” prediction when a cluster is found
Problem: when to stop flipping?
• ideal/optimal: without making errors
• worst case: after causing a misprediction
Implementations (invert after 2 wrongs):
• gshare+: stop when first error is made
• gshare++: track cluster size average and stop after sum of
average and std. deviation or after mistake
Analyzed Predictors
Unlimited: BHR ⊕ PC
no TABLE_SIZE ambiguity
No collisions: (BHR , PC)
no XOR ambiguity
Path history: (path, BHR , PC)
no path ambiguity
We decomposed Local into:
Limited: (BHR[PC % TABLE_SIZE] ⊕ PC) % TABLE_SIZE
Unlimited: (BHR[PC] ⊕ PC)
no TABLE_SIZE ambiguity
No collisions: (BHR[PC] , PC)
no XOR ambiguity
BHR = Branch History Register
For MM01 (10K predictions)
avg. set
predictor
mispred.
# sets
size
9.09
1.46
base
no collision
11.69
1.105
unlimited
11.69
1.105
11.69
1.095
path
best mix 5.99 (3.10)
For MM01 (10K Predictions)
predictor
base
GSHARE: global history, combined with the PC to
train a table of 2-bit counters.
LOCAL: local history of a branch combined with the
PC to train a table of 2-bit counters.
TAGE: variable length global history combined with
PC indexes into several cache-like prediction tables.
Questions
How do indexing collisions affect predictions?
Are there problematic classes of branches?
Are the captured correlations useful?
Are mispredictions isolated or clustered?
Contributions
In-depth analysis of “state of the practice” predictors:
 Global history, 2-bit counter (GShare)
 Local history, 2-bit counter (PAg-like, Local)
• New analysis methods to measure:
- Correlation (decomposing into correlation sets)
- Training stream characteristics (2-bit counter adequacy)
• New predictor to reduce clustered mispredictions
[1] An Analysis of Correlation and Predictability: What makes two-level branch
predictors work, M. Evers et al., 1998.
[2] Two-level Adaptive Training Branch Prediction, Yeh and Patt, 1991.
[3] Combining Branch Predictors, S. McFarling, 1993.
[4] The L-TAGE predictor, André Seznec, 2007.
base
no collision unlimited
path
682
94.8
1.89
0
0
906
906
914
1.19
0
0
0
0
0
mispred.
avg. set size
# sets
base
no collision
unlimited
9.69
3.34
2988
94.63
0.3
0.01
2.4
2.36
4153
4232
2.04
0
0
no collision
12.62
12.63
unlimited
best mix 8.47 (0.02!)
Misprediction Rates
correlation set
training
set
base
no collision
unlimited
path
training
set
base
no
collision
unlimited
Training Stream Analysis
1-bit counter is a last value predictor
N-bit counter is a last value predictor with a slack of N
Tolerates alternating sequences
Exploits repeating values
Analyzed stream characteristics:
Length of consecutive repeating values (bursts)
Transition (noise) length between bursts
Burst value switches
Warm-up value sensitivity (how long until initial value is irrelevant?)
Branch Classification
Classify branches according to
their behavior (static, single
execution, alternating,
loops, block patterns,
complex patterns, others):
Conclusions
Simple methods to exploit misprediction clusters yield small gains.
Cluster distribution causes high inversion penalty.
Performance gains due to decreasing indexing collisions are not
significant overall
Irregular/random patterns are generally the main cause of
mispredictions (despite being a small class of branches)
Training Stream/Correlation Analysis:
 We observed some instances of destructive interference
 Speculative predictions due to deep pipelines and superscalar
execution cause errors that could have been easily predicted if
the updates were committed sooner
 Initial values affect predictions for a short time
 2-bit counter adequately models most patterns
Download