Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution

Hyesoon Kim, Onur Mutlu, Jared Stark*, Yale N. Patt
The University of Texas at Austin, Electrical and Computer Engineering
*Oregon Microarchitecture Lab, Intel Corporation

Talk Outline
- Problem
- Wish Branches
- Experimental Methodology
- Results
- Conclusion

Predicated Execution

Source:
    if (cond) { b = 0; } else { b = 1; }

Control-flow graph: A goes to C if the branch is taken (T) or to B if it is not taken (N); B and C both rejoin at D.

Normal branch code:
    A           p1 = (cond)
                branch p1, TARGET
    B           mov b, 1
                jmp JOIN
    C  TARGET:  mov b, 0
    D  JOIN:    add x, b, 1

Predicated code:
    A  p1 = (cond)
    B  (!p1) mov b, 1
    C  (p1) mov b, 0
    D  add x, b, 1

- Predication converts a control flow dependency into a data dependency
- Pro: eliminates hard-to-predict branches
- Cons: (1) blocks B and C are fetched all the time; (2) dependent instructions wait until p1 is resolved

The Overhead of Predicated Execution

[Figure: execution time normalized to non-predicated code for gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, and AVG. Series: PREDICATED CODE, NO-DEPENDENCY, NO-DEPENDENCY + NO-FETCH. Bar annotations: -2%, 16%, 13%; one off-scale bar labeled 2.02. The accompanying listing repeats the predicated code above with the predicates replaced by their known values (0) and (1).]

- If all overhead is ideally eliminated, predicated execution would provide a 16% improvement in average execution time

The Problem
- Due to the predication overhead, predicated execution sometimes reduces performance
- Branch misprediction characteristics depend on run-time behavior: input set, control-flow path, and phase behavior
- The compiler cannot accurately estimate the run-time behavior of branches

Wish Branches
- A new type of control flow instruction
- 3 types: wish jump, wish join, and wish loop
- The compiler generates code (with wish branches) that can be executed either as predicated code or as non-predicated code (normal branch code)
- The hardware decides at run time whether to execute predicated code or normal branch code, based on the confidence of the branch prediction
  - Easy to predict: normal branch code
  - Hard to predict: predicated code

Wish Jump/Join

Normal branch code:
    A           p1 = (cond)
                branch p1, TARGET
    B           mov b, 1
                jmp JOIN
    C  TARGET:  mov b, 0
    D  JOIN:

Predicated code:
    A  p1 = (cond)
    B  (!p1) mov b, 1
    C  (p1) mov b, 0
    D

Wish jump/join code:
    A           p1 = (cond)
                wish.jump p1, TARGET
    B           (!p1) mov b, 1
                wish.join !p1, JOIN
    C  TARGET:  (p1) mov b, 0
    D  JOIN:

- High confidence: the wish jump/join code runs as normal branch code (the slide overlays the predicates with (1))
- Low confidence: the wish jump/join code runs as predicated code, with both B and C fetched

Wish Loop

Source:
    do {
        a++;
        i++;
    } while (i<N);

Normal backward branch code:
    H
    X  LOOP:  add a, a, 1
              add i, i, 1
              p1 = (i<N)
              branch p1, LOOP
    Y  EXIT:

Wish loop code:
    H         mov p1, 1
    X  LOOP:  (p1) add a, a, 1
              (p1) add i, i, 1
              (p1) p1 = (cond)
              wish.loop p1, LOOP
    Y  EXIT:

- High confidence: the predicates are overlaid with (1), as in the wish jump/join case
- Low confidence: the loop body runs predicated on p1

Mispredicted Case 1: Early-Exit
- Correct execution: H X1 X2 X3 Y (loop branch outcomes T T N)
- Early-exit (low confidence): H X1 X2 Y ... (predicted T N), flush pipeline, then X3 Y
- Compared to normal branch code: predicate data dependency and one extra instruction (-)

Mispredicted Case 2: Late-Exit
- Correct execution: H X1 X2 X3 Y (loop branch outcomes T T N)
- Late-exit (low confidence): H X1 X2 X3 X4 X5 Y ... (predicted T T T T N)
- Compared to normal branch code:
  - Pro: reduces the flush penalty (+++)
  - Cons: predicate data dependency and one extra instruction (-)

Mispredicted Case 3: No-Exit
- Correct execution: H X1 X2 X3 Y (loop branch outcomes T T N)
- No-exit (low confidence): H X1 X2 X3 X4 X5 X6 ... (predicted T T T T T T), flush pipeline, then Y
- Compared to normal branch code: predicate data dependency and one extra instruction (-)

Advantages/Disadvantages of Wish Branches

Advantages compared to predicated execution:
- Reduce the overhead of predication
- Increase the benefits of predicated code by allowing the compiler to generate more aggressively predicated code
- Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops)
- Make predicated code less dependent on machine configuration (e.g., branch predictor)

Disadvantages compared to predicated execution:
- Extra branch instructions use machine resources
- Extra branch instructions increase the contention for branch predictor table entries
- May constrain the compiler's scope for code optimizations

Wish Branch Support
- ISA support
  - Predicated execution, wish branch instruction
- Compiler support
  - Wish branch generation algorithms
  - The compiler needs to decide which branches are predicated, which are converted to wish branches, and which stay as normal branches
- Hardware support
  - Confidence estimator
  - Front-end and branch misprediction detection/recovery module

Experimental Infrastructure

Source Code -> IA-64 Compiler (ORC) -> IA-64 Binary -> Trace generation module -> IA-64 Trace -> Micro-op Translator -> µops -> Micro-op Simulator

- IA-64 provides full support for predication
- IA-64 traces are converted to micro-ops to simulate an out-of-order superscalar processor model

Simulation Methodology
- Nine SPEC 2000 integer benchmarks
- Baseline processor configuration
  - Front end
    - Large and accurate branch predictor (64KB hybrid branch predictor: gshare + local)
    - Minimum 30-cycle branch misprediction penalty
    - 64KB, 2-cycle latency I-cache
  - Execution core
    - 8-wide out-of-order processor
    - 512-entry instruction window
- Confidence estimator
  - 1KB tagged 16-bit history JRS confidence estimator (Jacobsen et al., MICRO-29)

Performance Improvement

[Figure: execution time normalized to non-predicated code for gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, AVG, and AVGnomcf. Series: SELECTIVE-PREDICATION, AGGRESSIVE-PREDICATION, wish jump/join, wish jump/join/loop. Bar annotations: 24%, -4%, 14%, 8%; off-scale bar labeled 2.02. Overlapping callouts also give figures without mcf and versus aggressive-predication (values 14%, 12%, 11%, 7%).]

- 16% over conditional branch prediction
- 13% over selective-predication
- AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated
- SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis

Conclusion
- New control flow instructions: wish branches (jump/join/loop)
- Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture
  - Compiler: analyzes the control-flow graph and generates code
  - Microarchitecture: makes the run-time decision to use predication
- Wish branches provide significant performance benefits
  - 16% compared to conditional branch prediction
  - 13% compared to selectively predicated code
- Wish branches can make predicated execution more viable and effective in high-performance processors
  - By enabling adaptive and aggressive predicated execution
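
Sketch: wish jump/join mode selection

The wish jump/join idea from the talk (one binary, with the hardware choosing between normal branch behavior and predicated behavior based on prediction confidence) can be illustrated with a small block-level model. This is only a sketch under stated assumptions: it models the A/B/C/D blocks of the `b` example, ignores pipeline timing, and treats a high-confidence misprediction as a flush followed by a refetch of the correct path. The names `Outcome`, `execute`, and `run_wish_jump_join` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Outcome:
    b: int          # final value of b after block D is reached
    fetched: list   # blocks on the path that finally retires
    flushed: bool   # True if a misprediction forced a pipeline flush


def execute(blocks, p1):
    """Run the predicated instructions of the fetched blocks.

    B holds `(!p1) mov b, 1` and C holds `(p1) mov b, 0`; an instruction
    whose predicate is false leaves b untouched.
    """
    b = None
    for block in blocks:
        if block == "B" and not p1:
            b = 1
        elif block == "C" and p1:
            b = 0
    return b


def run_wish_jump_join(cond, predicted_taken, high_confidence):
    """Model one execution of the wish jump/join code (assumed behavior)."""
    p1 = cond
    if high_confidence:
        # Treated like a normal branch: fetch only the predicted side.
        fetched = ["A", "C", "D"] if predicted_taken else ["A", "B", "D"]
        flushed = predicted_taken != p1
        if flushed:
            # Assumed recovery: flush, then refetch the correct side.
            fetched = ["A", "C", "D"] if p1 else ["A", "B", "D"]
    else:
        # Treated like predicated code: fetch both sides, never flush.
        fetched = ["A", "B", "C", "D"]
        flushed = False
    return Outcome(execute(fetched, p1), fetched, flushed)


if __name__ == "__main__":
    for cond, pred, conf in product([False, True], repeat=3):
        print(cond, pred, conf, run_wish_jump_join(cond, pred, conf))
```

In this model both modes always produce the same value of `b`; they differ only in cost. The low-confidence path always fetches both B and C but never flushes, while the high-confidence path fetches one side and flushes only when the prediction is wrong.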