Wish Branches
Combining Conditional Branching and Predication
for Adaptive Predicated Execution
Hyesoon Kim
Onur Mutlu
Jared Stark*
Yale N. Patt
Electrical and Computer Engineering, The University of Texas at Austin
*Oregon Microarchitecture Lab, Intel Corporation
Talk Outline
• Problem
• Wish Branches
• Experimental Methodology
• Results
• Conclusion
Predicated Execution
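Source code:
    if (cond) {
        b = 0;
    }
    else {
        b = 1;
    }

(normal branch code)
A:  p1 = (cond)
    branch p1, TARGET
B:  mov b, 1
    jmp JOIN
C:  TARGET:
    mov b, 0
D:  JOIN:
    add x, b, 1

(predicated code)
A:  p1 = (cond)
B:  (!p1) mov b, 1
C:  (p1) mov b, 0
D:  add x, b, 1

[Control-flow graphs: in normal branch code, A branches to C when taken (T) or falls through to B when not taken (N), and both paths join at D; in predicated code, A, B, C, D execute in sequence.]

Convert control flow dependency to data dependency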
Pro: Eliminate hard-to-predict branches
Cons: (1) Fetch blocks B and C all the time
(2) Wait until p1 is resolved
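
The conversion on this slide can be mimicked in a few lines of Python. This is a minimal sketch rather than the authors' code: the function names `branch_version` and `predicated_version` are hypothetical, and Python conditional expressions stand in for the predicated `mov` instructions.

```python
def branch_version(cond: bool) -> int:
    """Normal branch code: only one of blocks B or C executes."""
    if cond:            # branch p1, TARGET
        b = 0           # block C
    else:
        b = 1           # block B
    return b + 1        # block D: add x, b, 1


def predicated_version(cond: bool) -> int:
    """Predicated code: B and C are both executed; p1 decides which write lands."""
    p1 = cond                   # block A: p1 = (cond)
    b = -1                      # placeholder for the previous value of b
    b = 1 if not p1 else b      # block B: (!p1) mov b, 1
    b = 0 if p1 else b          # block C: (p1) mov b, 0
    return b + 1                # block D: add x, b, 1 (waits for p1 to resolve)
```

Both functions return the same value for either outcome of `cond`. The predicated version has no branch to mispredict, but it always pays for both `mov` operations and its result depends on `p1`, which is the overhead listed under Cons.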
The Overhead of Predicated Execution
(Predicated code)
A:  p1 = (cond)
B:  (!p1) mov b, 1
C:  (p1) mov b, 0
D:  add x, b, 1

[Figure: Normalized execution time relative to non-predicated code for three configurations (PREDICATED CODE, NO-DEPENDENCY, NO-DEPENDENCY + NO-FETCH) on gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, and their average (AVG). Labels on the chart: -2%, 16%, 13%; one off-scale bar is marked 2.02.]
If all overhead is ideally eliminated, predicated execution would
provide 16% improvement in average execution time
The Problem
• Due to the predication overhead, predicated execution sometimes reduces performance
• Branch misprediction characteristics depend on run-time behavior: input set, control-flow path, and phase behavior
• The compiler cannot accurately estimate the run-time behavior of branches
Wish Branches
• A new type of control flow instruction
  - 3 types: wish jump, wish join, and wish loop
• The compiler generates code (with wish branches) that can be executed either as predicated code or as non-predicated code (normal branch code)
• The hardware decides at run time whether to execute predicated code or normal branch code, based on the confidence of the branch prediction
  - Easy to predict: normal branch code
  - Hard to predict: predicated code
Wish Jump/Join

(normal branch code)
A:  p1 = (cond)
    branch p1, TARGET
B:  mov b, 1
    jmp JOIN
C:  TARGET:
    mov b, 0
D:  JOIN:

(predicated code)
A:  p1 = (cond)
B:  (!p1) mov b, 1
C:  (p1) mov b, 0
D:

(wish jump/join code)
A:  p1 = (cond)
    wish.jump p1, TARGET
B:  (!p1) mov b, 1
    wish.join !p1, JOIN
C:  TARGET:
    (p1) mov b, 0
D:  JOIN:

High confidence: the wish jump is predicted like a normal branch, so only the predicted path (B or C) is fetched.
Low confidence: execution falls through A, B, C, D as predicated code.
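
The two modes of the wish jump/join code can be illustrated with a small model of which blocks the front end fetches. This is a sketch under stated assumptions: the helper names `fetched_blocks` and `needs_flush` are hypothetical, the block labels A to D follow the slide, and the model assumes that low-confidence (predicated) execution needs no pipeline flush because both B and C are executed under their predicates.

```python
from typing import List


def fetched_blocks(high_confidence: bool, predicted_taken: bool) -> List[str]:
    """Blocks fetched for one wish jump/join hammock (labels from the slide)."""
    if high_confidence:
        # Handled like a normal branch: follow only the predicted direction.
        return ["A", "C", "D"] if predicted_taken else ["A", "B", "D"]
    # Low confidence: fall through both sides and rely on the predicates.
    return ["A", "B", "C", "D"]


def needs_flush(high_confidence: bool, predicted_taken: bool, actual_taken: bool) -> bool:
    """Assumed recovery rule: only a wrong high-confidence prediction flushes."""
    if high_confidence:
        return predicted_taken != actual_taken
    return False
```

For example, `fetched_blocks(False, True)` returns all four blocks, while `needs_flush(True, True, False)` is true because a confident prediction turned out to be wrong.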
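Wish Loop

Source code:
    do {
        a++;
        i++;
    } while (i<N);

[Control-flow graph: H enters loop body X; the branch at the end of X returns to X when taken (T) and exits to Y when not taken (N).]

(normal backward branch code)
H:
X:  LOOP:
    add a, a, 1
    add i, i, 1
    p1 = (i<N)
    branch p1, LOOP
Y:  EXIT:

(wish loop code)
H:  mov p1, 1
X:  LOOP:
    (p1) add a, a, 1
    (p1) add i, i, 1
    (p1) p1 = (cond)
    wish.loop p1, LOOP
Y:  EXIT:

High confidence: executed like the normal backward branch code.
Low confidence: executed as predicated code, with every instruction in X guarded by p1.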
Mispredicted Case 1: Early-Exit
(Low confidence)
Correct execution: H X1 X2 X3 Y   (T T N)
Early-exit:        H X1 X2 Y … Flush pipeline … X3 Y   (T N, then N after the flush)
Compared to normal branch code:
predicate data dependency and one extra instruction (-)
Mispredicted Case 2: Late-Exit
(Low confidence)
Correct execution: H X1 X2 X3 Y   (T T N)
Late-exit:         H X1 X2 X3 X4 X5 Y …   (T T T T N)
Compared to normal branch code:
pro: reduce flush penalty (+++)
cons: predicate data dependency and one extra instruction (-)
Mispredicted Case 3: No-Exit
(Low confidence)
Correct execution: H X1 X2 X3 Y   (T T N)
No-exit:           H X1 X2 X3 X4 X5 X6 … Flush pipeline … Y   (T T T T T T)
Compared to normal branch code:
predicate data dependency and one extra instruction (-)
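
The three low-confidence misprediction cases can be told apart by comparing how many loop iterations the front end fetched with how many the loop really executes. The sketch below is illustrative only: `classify_wish_loop` and `needs_flush` are hypothetical names, and passing `None` for the fetched count is an assumed way to model a front end that is still fetching the loop when the misprediction is detected.

```python
from typing import Optional


def classify_wish_loop(actual_iters: int, fetched_iters: Optional[int]) -> str:
    """Classify a low-confidence wish loop outcome.

    fetched_iters is the number of iterations fetched before the front end
    left the loop, or None if it had not left the loop yet.
    """
    if fetched_iters is None:
        return "no-exit"
    if fetched_iters == actual_iters:
        return "correct"
    if fetched_iters < actual_iters:
        return "early-exit"
    return "late-exit"


def needs_flush(case: str) -> bool:
    """Per the slides, early-exit and no-exit flush; late-exit does not."""
    return case in ("early-exit", "no-exit")
```

With the slide's example of three real iterations (X1 X2 X3), fetching two iterations is an early-exit, fetching five is a late-exit, and never leaving the loop is a no-exit. Only the late-exit case avoids the flush, which is where the wish loop gains over a normal backward branch.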
Advantages/Disadvantages of Wish Branches
• Advantages compared to predicated execution
  - Reduce the overhead of predication
  - Increase the benefits of predicated code by allowing the compiler to generate more aggressively-predicated code
  - Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops)
  - Make predicated code less dependent on machine configuration (e.g., branch predictor)
Advantages/Disadvantages of Wish Branches
• Disadvantages compared to predicated execution
  - Extra branch instructions use machine resources
  - Extra branch instructions increase the contention for branch predictor table entries
  - May constrain the compiler's scope for code optimizations
Wish Branch Support
• ISA Support
  - Predicated execution, wish branch instructions
• Compiler Support
  - Wish branch generation algorithms: the compiler needs to decide which branches are predicated, which are converted to wish branches, and which stay as normal branches
• Hardware Support
  - Confidence estimator
  - Front-end and branch misprediction detection/recovery module
Experimental Infrastructure
Source Code -> IA-64 Compiler (ORC) -> IA-64 Binary -> Trace generation module -> IA-64 Trace -> Micro-op Translator -> µops -> Micro-op Simulator
• IA-64 provides full support for predication
• Convert IA-64 traces to micro-ops to simulate an out-of-order superscalar processor model
Simulation Methodology
• Nine SPEC 2000 integer benchmarks
• Baseline Processor Configuration
  - Front End
    - Large and accurate branch predictor (64KB hybrid branch predictor: gshare + local)
    - Minimum 30-cycle branch misprediction penalty
    - 64KB, 2-cycle latency I-cache
  - Execution Core
    - 8-wide out-of-order processor
    - 512-entry instruction window
  - Confidence Estimator
    - 1KB tagged 16-bit history JRS confidence estimator (Jacobsen et al., MICRO-29)
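
A JRS-style confidence estimator is usually described as a table of resetting counters indexed by a hash of the branch address and the branch history: a counter counts consecutive correct predictions and is reset on a misprediction. The sketch below illustrates that idea only. The class name, table size, counter width, and threshold are assumptions chosen for illustration, and the tags mentioned on the slide are left out for brevity.

```python
class ResettingCounterEstimator:
    """Untagged table of resetting counters (illustrative parameters)."""

    def __init__(self, index_bits: int = 10, counter_bits: int = 4,
                 history_bits: int = 16, threshold: int = 15) -> None:
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.counter_max = (1 << counter_bits) - 1
        self.threshold = threshold
        self.table = [0] * (1 << index_bits)

    def _index(self, pc: int, history: int) -> int:
        # XOR the branch address with the recent branch history bits.
        return (pc ^ (history & self.history_mask)) & self.index_mask

    def is_high_confidence(self, pc: int, history: int) -> bool:
        return self.table[self._index(pc, history)] >= self.threshold

    def update(self, pc: int, history: int, prediction_correct: bool) -> None:
        i = self._index(pc, history)
        if prediction_correct:
            # Saturating increment on a correct prediction.
            self.table[i] = min(self.table[i] + 1, self.counter_max)
        else:
            # Reset on a misprediction.
            self.table[i] = 0
```

In this model a wish branch would run as normal branch code when `is_high_confidence` returns true and as predicated code otherwise; a higher threshold sends more branches to predicated execution.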
Performance Improvement

[Figure: Normalized execution time relative to non-predicated code for four configurations (SELECTIVE-PREDICATION, AGGRESSIVE-PREDICATION, wish jump/join, wish jump/join/loop) on gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, AVG, and AVGnomcf. One off-scale bar is marked 2.02. Other percentage labels on the chart and in the overlapping callouts (including one for aggressive-predication, w/o mcf): 24%, -4%, 14%, 8%, 12%, 7%.]

SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis
AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated

Wish branches improve average execution time by
• 16% over conditional branch prediction
• 13% over selective-predication and 11% (w/o mcf)
Conclusion
• New control flow instructions: wish branches (jump/join/loop)
• Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture
  - Compiler: analyzes the control-flow graph and generates code
  - Microarchitecture: makes run-time decision to use predication
• Wish branches provide significant performance benefits
  - 16% compared to conditional branch prediction
  - 13% compared to selectively predicated code
• Wish branches can make predicated execution more viable and effective in high performance processors
  - By enabling adaptive and aggressive predicated execution