FSU DEPARTMENT OF COMPUTER SCIENCE Improving Performance by Branch Reordering by Minghui Yang and David Whalley Florida State University and Gang-Ryung Uh Lucent Technologies 1 FSU DEPARTMENT OF COMPUTER SCIENCE Outline of Presentation • Motivation • Detecting a Reorderable Sequence • Selecting the Sequence Ordering • Applying the Transformation • Results • Future Work 2 FSU DEPARTMENT OF COMPUTER SCIENCE Example Sequence of Comparisons with the Same Variable while ((c=getchar()) != EOF) if (c == ’\n’) X; else if (c == ’ ’) Y; else Z; (a) Original Code Segment while (1) { while (1) { c = getchar(); c = getchar(); if (c == ’ ’) if (c > ’ ’) Y; goto def; else if (c == ’\n’) else if (c == ’ ’) X; Y; else if (c == EOF) else if (c == ’\n’) break; X; else else if (c == EOF) Z; break; } else def: Z; (b) Conventional Reordering } (c) Improved Reordering 3 FSU DEPARTMENT OF COMPUTER SCIENCE Overview of Compilation Process for Branch Reordering C first source program training input compilation profile second data compilation data test input data executable instrumented for profiling executable with branches reordered 4 FSU DEPARTMENT OF COMPUTER SCIENCE Ranges and Corresponding Range Conditions Form 1 2 3 4 Range c..c MIN..c c..MAX c1..c2 Range Condition v == c v <= c v >= c c1 <= v && v <= c2 5 FSU DEPARTMENT OF COMPUTER SCIENCE Requirements for a Sequence to Be Reorderable • All the ranges in the sequence are nonoverlapping. • The sequence can only be entered through the first range condition. • The sequence has no side effects. • Each range condition can only contain comparisons and branches. 6 FSU DEPARTMENT OF COMPUTER SCIENCE Example of Detecting Range Conditions if (c>=’a’ && c<=’z’ || c>=’A’ && c<=’Z’) T1; else if (c==’_’) T2; else if (c<=’˜’) T3; else T4; (a) C Code Segment T T c < 97 F c <= 122 F c < 65 F c > 90 F T1 1 2 3 T 4 T 5 c != 95 6 F 7 T T2 c > 126 8 F 9 T T3 T4 10 (b) Control Flow 7 FSU DEPARTMENT OF COMPUTER SCIENCE Example of Detecting Range Conditions (cont.) Blocks 1,2 3,4 6 8 Range [97..122] [65..90] [95..95] [127..MAX] Target T1 T1 T2 T4 8 FSU DEPARTMENT OF COMPUTER SCIENCE Explicit and Default Ranges • An explicit range is a range that is checked by a range condition. • A default range is a range that is not checked by a range condition. P R1 F R2 T T T1 [c1..c2] T2 [c3..c4] F TD [MIN..c1-1] [c2+1..c3-1] [c4+1..MAX] 9 FSU DEPARTMENT OF COMPUTER SCIENCE Example of Reordering Range Conditions P P R1 F R2 T T T1 [c1..c2] R1 F T2 [c3..c4] R2 F F TD [MIN..c1-1] [c2+1..c3-1] [c4+1..MAX] (a) Original Sequence R3 F R4 F R5 T T T T T T1 [c1..c2] T2 [c3..c4] [MIN..c1-1] TD [c2+1..c3-1] [c4+1..MAX] (b) Equivalent Original Sequence 10 FSU DEPARTMENT OF COMPUTER SCIENCE Example of Reordering Range Conditions (cont.) P R5 F R1 F R2 F R3 F R4 T T T T T [c4+1..MAX] P R5 T1 [c1..c2] F R1 T2 [c3..c4] F R2 [MIN..c1-1] T T T [c4+1..MAX] T1 [c1..c2] T2 [c3..c4] F TD [MIN..c1-1] TD [c2+1..c3-1] (c) Reordered Sequence [c2+1..c3-1] (d) Equivalent Reordered Sequence 11 FSU DEPARTMENT OF COMPUTER SCIENCE Sequence Cost Equations pi is the probability that Ri will exit the sequence. ci is the cost of testing Ri. Explicit_Cost([R1 , . . . , R n ]) = p1 c 1 + p2 (c 1 + c 2 ) + . . . + p n (c 1 + c 2 + . . . + c n ) The optimal order of a sequence of explicit range conditions is achieved by sorting them in descending order of pi/ci. Cost([R1 , . . . , R n ]) = E xplicit_Cost([R1 , . . . , R n ]) + (1 − ( p1 + . . . + p n ))(c 1 + . . . + c n ) 12 FSU DEPARTMENT OF COMPUTER SCIENCE Selecting the Sequence Ordering • We need to select one of t targets as the default. • A potential default target having m m-1 combinations ranges could have 2 of ranges that do not have to be explicitly checked. • We used the ordering p1/c1 ≥ ... ≥ pm/cm to select the lowest cost from only m combinations of default range conditions for each target. {Rm}, {Rm-1,Rm}, ..., {R1, ..., Rm} • The minimum cost among the t tar- gets is selected. • Only the cost of n sequences are con- sidered, where n is the total number of ranges for all of the targets. 13 FSU DEPARTMENT OF COMPUTER SCIENCE Applying the Reordering Transformation P1 P1 P1 R3’ F TD F T1 R1’ R1 ... T F F S1 R2’ S1 P2 T2 F T R3’ R2 T F F T2 S1 S2 S2 S1 T S2 R3 T3 TD F TD (a) Original Sequence (b) After Duplicating the Sequence (c) After Eliminating Intervening Side Effects ... R1 F S1 T T1 ... ... P2 P3 R2 T F T2 S2 R3 F TD T T3 P3 R1 F S1 T T1 T R1’ ... F S1 P2 F T R2’ R2 T T2 F F S2 S2 R3 F TD T T3 T T ... P3 T 14 FSU DEPARTMENT OF COMPUTER SCIENCE Applying the Reordering Transformation (cont.) P1 ... P3 R1 F S1 T R2 T F S2 R3 F TD ... S1 T2 P2 T1 T2 S1 S2 T T3 R4 F T R2’ F T R1’ F T R3’ F T S1 S2 TD (d) After Reordering Range Conditions P1 ... P3 S2 R3 F TD ... S1 T2 P2 T1 T2 S1 S2 T T3 R4 F T R2’ F T R1’ F T R3’ F T S1 S2 TD (e) After Dead Code Elimination 15 FSU DEPARTMENT OF COMPUTER SCIENCE Heuristics Used for Translating switch Statements Term n m Heuristic Set I II III Definition Number of cases in a switch statement. Number of possible values between the first and last case. Indirect Jump Binary Search Linear Search n ≥ 4 && m ≤ 3n n ≥ 16 && m ≤ 3n never !indirect_jump && n ≥ 8 !indirect_jump && n ≥ 8 never !indirect_jump && !binary_search !indirect_jump && !binary_search always 16 FSU DEPARTMENT OF COMPUTER SCIENCE Dynamic Frequency Measurements Switch Translation Heuristics Set I Set II Set III Reordered Original Program awk cb cpp ctags deroff grep hyphen join lex nroff pr ptx sdiff sed sort wc yacc average average average Insts Insts Branches 13,611,150 17,100,927 18,883,104 71,889,513 15,460,307 9,256,749 18,059,010 3,552,801 10,005,018 25,307,809 73,051,342 20,059,901 14,558,535 14,229,310 23,146,400 25,818,199 25,127,817 23,477,465 23,510,571 24,556,842 -2.02% -7.65% -0.13% -9.10% -1.53% -3.60% +3.42% -1.68% -4.56% -2.48% -16.25% -9.18% -16.09% -1.16% -47.20% -15.05% -0.25% -7.91% -8.37% -12.72% -4.19% -15.46% -0.19% -14.72% -2.63% -8.31% +3.40% -2.12% -10.39% -6.35% -29.96% -13.28% -37.03% -2.03% -57.38% -26.26% -0.44% -13.37% -14.30% -20.75% 17 FSU DEPARTMENT OF COMPUTER SCIENCE Execution Time Machine SPARC IPC SPARC 20 SPARC Ultra I Heuristic Set I I II Average Execution Time -4.94% -5.57% -2.88% 18 FSU DEPARTMENT OF COMPUTER SCIENCE Future Work • Using Binary Search Instead of Linear Search • Contrasting Various Semi-static Search Methods — Linear Search — Binary Search — Jump Table — Combinations of Methods • Reordering Branches with a Common Successor 19