2D-Profiling Detecting Input-Dependent Branches with a Single Input Data Set M. Aater Suleman

advertisement
2D-Profiling
Detecting Input-Dependent Branches
with a Single Input Data Set
Hyesoon Kim
M. Aater Suleman
Onur Mutlu
Yale N. Patt
HPS Research Group
The University of Texas at Austin
Motivation
 Profile-guided code optimization has
become essential for achieving good
performance.
 Run-time behavior  profile-time behavior: Good!
 Run-time behavior ≠ profile-time behavior: Bad!
2
Motivation
 Profiling with one input set is not enough!
 Because a program can show different behavior with
different input data sets
 Example: Performance of predicated execution is
highly dependent on the input data set
 Because some branches behave differently with
different input sets
3
Input-dependent Branches
 Definition
 A branch is input-dependent if its misprediction
rate differs by more than some Δ over different
input data sets.
Inp. 1
Inp. 2
Inp.1 – Inp. 2
Misprediction
rate of Br. X
30%
29%
1%
Misprediction
rate of Br. Y
30%
5%
25%
Input-dependent branch
 Input-dependent br.
 hard-to-predict br.
4
An Example Input-dependent Branch
 Example from Gap (SPEC2K):
Type checking branch
TypHandle Sum (A,B)
If ((type(A) == INT) && (type(B) == INT)) {
Result = A + B;
Return Result;
//input-dependent br
}
Return SUM(A, B);


Train input set: A&B are integers 90% of the time
 misprediction rate: 10%
Reference input set: A&B are integers 42% of the time
 misprediction rate: 30% (30%-10%)>Δ
5
Predicated Execution
(normal branch code)
(predicated code)
A
if (cond) {
b = 0;
}
else {
b = 1;
}
T
N
C
B
A
B
C
D
A
B
C
D
p1 = (cond)
branch p1, TARGET
A
mov b, 1
jmp JOIN
B
TARGET:
mov b, 0
C
p1 = (cond)
(!p1) mov b, 1
(p1) mov b, 0
Eliminate hard-to-predict branches
but fetch blocks B and C all the time
6
Predicated Code Performance vs.
Branch Misprediction Rate
Predicated code performs better
Execution time (cycles)
8
predicated code
7
normal branch code
6
5
4
run-time (input B) profile-time (input A)
3
2
0
1
2
3
4
5
6
X7
8
9
10
11
12
13
14
15
Branch misprediction rate (%)
Normal branch code performs better
 Converting a branch to predicated code could hurt
performance if run-time misprediction rate is lower
than profile-time misprediction rate
7
Exec. time normalized to non-predicated code
Predicated Code Performance vs. Input Set
1.2
run-time: input-A
run-time: input-B
run-time: input-C
-16%
-4%
9%
1%
non-predicated
1
0.8
0.6
Predicated execution loses performance because of input-dependent branches
0.4
0.2
0
gzip
vpr
gcc
mcf
crafty parser perlbmk gap
vortex bzip2
twolf
Measured on an Itanium-II machine
8
If We Know a Branch is Input-Dependent
 May not convert it to predicated code.
 May convert it to a wish branch.
[Kim et al. Micro’05]
 May not perform other compiler optimizations
or may perform them less aggressively.
 Hot-path/trace/superblock-based optimizations
[Fisher’81, Pettis’90, Hwu’93, Merten’99]
9
Our Goal
Identify input-dependent branches by
using a single input set for profiling
10
Talk Outline
 Motivation
 2D-profiling Mechanism
 Experimental Results
 Conclusion
11
Key Insight of 2D-profiling
Branch Prediction Accuracy
Branch
Phase behavior in prediction accuracy
is a good indicator of input dependence
phase 2
1
0.8
phase 1
0.6
0.4
input-dependent
phase 3
input-independent
0.2
0
0
500
1000
1500
Time (in terms of number of executed instructions x 100M)
12
brA
pr. Acc
Traditional Profiling
MEAN pr.Acc(brA)
brB
pr. Acc
time
MEAN pr.Acc(brB)
time
MEAN pr.Acc(brA)  MEAN pr.Acc(brB)
behavior of brA  behavior of brB
13
brA
pr. Acc
2D-profiling
MEAN pr.Acc(brA)
brB
pr. Acc
time
STD pr.Acc(brA)
MEAN pr.Acc(brB)
STD pr.Acc(brB)
time
MEAN pr.Acc(brA)  MEAN pr.Acc(brB)
STD pr.Acc(brA) ≠ STD pr.Acc(brB)
behavior of brA ≠ behavior of brB
A: input-dependent br, B: input-independent br
14
2D-profiling Mechanism

The profiler collects branch prediction accuracy information
for every static branch over time
slice size = M instructions
Slice 1
Slice 2
…
Slice N
time
mean Pr.Acc(brA,s1) mean Pr.Acc(brA,s2) ...
mean Pr.Acc(brA,sN)
mean Pr.Acc(brB,s1)
mean Pr.Acc(brB,s2)
...
...
...
mean Pr.Acc(brB,sN)
...
PAM:50%
brA
Calculate MEAN (brA,mean
brB, brA
…),
Standard deviation (brA, brB, …),
brB …)
PAM:Points Above Mean (brA, brB,
mean brB
PAM:0%
15
Input-dependence Tests
 STD&PAM-test: Identify branches that have large variations
in accuracy over time (phase behavior)


STD-test (STD > threshold): Identify branches that have large
variations in the prediction accuracy over time
PAM-test (PAM > threshold): Filter out branches that pass STDtest due to a few outlier samples
 MEAN&PAM-test: Identify branches that have low prediction
accuracy and some time-variation in accuracy


MEAN-test (MEAN < threshold): Identify branches that have low
prediction accuracy
PAM-test (PAM > threshold): Identify branches that have some
variation in the prediction accuracy over time
 A branch is classified as input-dependent if it passes either
STD&PAM-test or MEAN&PAM-test
16
Talk Outline
 Motivation
 2D-profiling Mechanism
 Experimental Results
 Conclusion
17
Experimental Methodology
 Profiler: PIN-binary instrumentation tool
 Benchmarks: SPEC 2K INT
 Input sets

Profiler: Train input set

Input-dependent Branches: Reference input set and train/other
extra input sets
 Input-dependent branch: misprediction rate of the branch
changes more than Δ = 5% when input data set changes

Different Δ are examined in our TechReport [reference 11].
 Branch predictors

Profiler: 4KB Gshare, Machine: 4KB Gshare

Profiler: 4KB Gshare, Machine: 16KB Perceptron (in paper)
18
Evaluation Metrics
 Coverage and Accuracy for input-dependent
branches
Correctly Predicted Input-dependent br.
Predicted Input-dependent br. (2D-profiler)
A
B
AB
Correctly Predicted
COV 

A
Actual Input - dependent
Actual Input-dependent br.
ACC 
AB
Correctly Predicted

B
Predicted Input - dependent
19
% of branches that are input-dependent
Input-dependent Branches
100%
2 input sets
3 input sets
4 input sets
5 input sets
6 input sets
7 input sets
8 input sets
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
bzip2
gzip
twolf
gap
crafty
gcc
20
2D-profiling Results
1
88%
93%
0.9
0.8
Fraction
0.7
76%
2 input sets
63%
3 input sets
0.6
4 input sets
0.5
5 input sets
0.4
6 input sets
0.3
7 input sets
0.2
8 input sets
0.1
0
coverage
accuracy
input-dependent branches
coverage
accuracy
input-independent branches
Phase behavior and input-dependence are strongly correlated!
21
The Cost of 2D-profiling?
Normalized execution time
1.8
54%1%
1.6
1.4
1.2
1
0.8
0.6
Edge
0.4
Gshare
0.2
2D+Gshare
0
bzip2

gzip
twolf
gap
crafty
gcc
AVG
2D-profiling adds only 1% overhead over modeling the branch
predictor in software

Using a H/W branch predictor [Conte’96]
22
Conclusion
 2D-profiling is a new profiling technique to find inputdependent characteristics by using a single input data set
for profiling
 2D-profiling uses time-varying information instead of just
average data
 Phase behavior in prediction accuracy in a profile run 
input-dependent
 2D-profiling accurately identifies input-dependent branches
with very little overhead (1% more than modeling the
branch predictor in the profiler)
 Applications of 2D-profiling are an open research topic
 Better predicated code/wish branch generation algorithms
 Detecting other input-dependent program characteristics
23
Questions?
Download