Florida State University Exhaustive Optimization Phase Order Space Exploration Prasad A. Kulkarni David B. Whalley Gary S. Tyson Jack W. Davidson Symposium on Code Generation and Optimization - 2006 Florida State University Optimization Phase Ordering • Optimizing compilers apply several optimization phases to improve the performance of applications. • Optimization phases interact with each other. • Determining the best order of applying optimization phases has been a long standing problem in compilers. Symposium on Code Generation and Optimization - 2006 2 Florida State University Exhaustive Phase Order Enumeration... is it Feasible ? • A obvious approach to address the phase ordering problem is to exhaustively evaluate all combinations of optimization phases. • Exhaustive enumeration is difficult • compilers typically contain many different optimization phases • optimizations may be successful multiple times for each function / program Symposium on Code Generation and Optimization - 2006 3 Florida State University Optimization Space Properties • Phase ordering problem can be made more manageable by exploiting certain properties of the optimization search space • optimization phases might not apply any transformations • many optimization phases are independent • Thus, many different orderings of optimization phases produce the same code. Symposium on Code Generation and Optimization - 2006 4 Florida State University Re-stating the Phase Ordering Problem • Rather than considering all attempted phase sequences, the phase ordering problem can be addressed by enumerating all distinct function instances that can be produced by combination of optimization phases. • We were able to exhaustively enumerate 109 out of 111 functions, in a few minutes for most. Symposium on Code Generation and Optimization - 2006 5 Florida State University Outline • Experimental framework • Algorithm for exhaustive enumeration of the phase order space • Search space enumeration results • Optimization phase interaction analysis • Making conventional compilation faster • Future work and conclusions Symposium on Code Generation and Optimization - 2006 6 Florida State University Experimental Framework • We used the VPO compilation system • established compiler framework, started development in 1988 • comparable performance to gcc –O2 • VPO performs all transformations on a single representation (RTLs), so it is possible to perform most phases in an arbitrary order. • Experiments use all the 15 available optimization phases in VPO. • Target architecture was the StrongARM SA-100 processor. Symposium on Code Generation and Optimization - 2006 7 Florida State University VPO Optimization Phases ID b c d g h i j k Optimization Phase branch chaining common subexpr. elim. remv. unreachable code loop unrolling dead assignment elim. block reordering minimize loop jumps register allocation ID l n o q r s u Optimization Phase loop transformations code abstraction eval. order determin. strength reduction reverse branches instruction selection remv. useless jumps Symposium on Code Generation and Optimization - 2006 8 Florida State University Disclaimers • Did not include optimization phases normally associated with compiler front ends • no memory hierarchy optimizations • no inlining or other interprocedural optimizations • Did not vary how phases are applied. • Did not include optimizations that require profile data. Symposium on Code Generation and Optimization - 2006 9 Florida State University Benchmarks • Used one program from each of the six MiBench categories. • Total of 111 functions. Category Program Description auto network telecomm consumer security office bitcount dijkstra fft jpeg sha stringsearch test processor bit manipulation abilities Dijkstra’s shortest path algorithm fast fourier transform image compression / decompression secure hash algorithm searches for given words in phrases Symposium on Code Generation and Optimization - 2006 10 Florida State University Outline • Experimental framework • Exhaustive enumeration of the phase order space. • Search space enumeration results • Optimization phase interaction analysis • Making conventional compilation faster • Future work and conclusions Symposium on Code Generation and Optimization - 2006 11 Florida State University Naïve Optimization Phase Order Space Exploration • All combinations of optimization phase sequences are attempted. L0 a d c b L1 a b c d a b c d a b c d a b c d L2 Symposium on Code Generation and Optimization - 2006 12 Florida State University Eliminating Consecutively Applied Phases • A phase just applied in our compiler cannot be immediately active again. L0 a c b d L1 b c d a c d a b d a b c L2 Symposium on Code Generation and Optimization - 2006 13 Florida State University Eliminating Dormant Phases • Get feedback from the compiler indicating if any transformations were successfully applied in a phase. L0 a c b d L1 b c d a c d a b d L2 Symposium on Code Generation and Optimization - 2006 14 Florida State University Detecting Identical Function Instances • Some optimization phases are independent • example: branch chaining & register allocation • Different phase sequences can produce the same code r[2] = 1; r[3] = r[4] + r[2]; ⇒instruction selection r[3] = r[4] + 1; r[2] = 1; r[3] = r[4] + r[2]; ⇒constant propagation r[2] = 1; r[3] = r[4] + 1; ⇒dead assignment elimination r[3] = r[4] + 1; Symposium on Code Generation and Optimization - 2006 15 Florida State University Detecting Equivalent Function Instances sum = 0; for (i = 0; i < 1000; i++ ) sum += a [ i ]; Source Code r[10]=0; r[12]=HI[a]; r[12]=r[12]+LO[a]; r[1]=r[12]; r[9]=4000+r[12]; L3 r[8]=M[r[1]]; r[10]=r[10]+r[8]; r[1]=r[1]+4; IC=r[1]?r[9]; PC=IC<0,L3; r[11]=0; r[10]=HI[a]; r[10]=r[10]+LO[a]; r[1]=r[10]; r[9]=4000+r[10]; L5 r[8]=M[r[1]]; r[11]=r[11]+r[8]; r[1]=r[1]+4; IC=r[1]?r[9]; PC=IC<0,L5; Register Allocation before Code Motion Code Motion before Register Allocation r[32]=0; r[33]=HI[a]; r[33]=r[33]+LO[a]; r[34]=r[33]; r[35]=4000+r[33]; L01 r[36]=M[r[34]]; r[32]=r[32]+r[36]; r[34]=r[34]+4; IC=r[34]?r[35]; PC=IC<0,L01; After Mapping Registers Symposium on Code Generation and Optimization - 2006 16 Florida State University Resulting Search Space • Merging equivalent function instances transforms the tree to a DAG. L0 a c b L1 c d a d a d L2 Symposium on Code Generation and Optimization - 2006 17 Florida State University Efficient Detection of Unique Function Instances • Even after pruning there may be tens or hundreds of thousands of unique instances. • Use a CRC (cyclic redundancy check) checksum on the bytes of the RTLs representing the instructions. • Used a hash table to check if an equivalent function instance already exists in the DAG. Symposium on Code Generation and Optimization - 2006 18 Florida State University Techniques to Make Searches Faster • Kept a copy of the program representation of the unoptimized function instance in memory to avoid repeated disk accesses. • Also kept the program representation after each active phase in memory to reduce the number of phases applied for each sequence. • Reduced search time by at least a factor of 5 to 10. • Out of 111 functions in our benchmark suite we were able to completely enumerate all instances for 109 functions. Symposium on Code Generation and Optimization - 2006 19 Florida State University Outline • Experimental framework • Exhaustive enumeration of the phase order space. • Search space enumeration results • Optimization phase interaction analysis • Making conventional compilation faster • Future work and conclusions Symposium on Code Generation and Optimization - 2006 20 Florida State University Search Space Statistics Function start_inp...(j) parse_swi...(j) start_inp...(j) start_inp...(j) start_inp...(j) fft_float(f) main(f) sha_trans...(h) read_scan...(j) LZWRea...(j) main(j) dijkstra(d) .... Insts 1,371 1,228 1,009 971 795 680 624 541 480 472 465 354 .... Blk Loop Instances 88 2 74,950 198 1 200,397 72 1 39,152 82 1 64,571 63 1 7,018 45 4 N/A 50 5 N/A 33 6 343,162 59 2 34,270 44 2 49,412 40 1 33,620 30 3 86,370 .... .... .... Phases 1,153,279 2,990,221 597,147 999,814 106,793 N/A N/A 5,119,947 511,093 772,864 515,749 1,361,960 .... Len 20 18 16 18 15 N/A N/A 26 15 20 17 20 .... CF 153 53 18 47 37 N/A N/A 95 57 41 12 18 .... Leaves 587 2365 324 591 52 N/A N/A 2964 540 159 153 1168 .... average 166.7 16.9 381,857.7 12 27.5 182.9 0.9 25,362.6 Symposium on Code Generation and Optimization - 2006 21 Florida State University Outline • Experimental framework • Exhaustive enumeration of the phase order space. • Search space enumeration results • Optimization phase interaction analysis • Making conventional compilation faster • Future work and conclusions Symposium on Code Generation and Optimization - 2006 22 Florida State University Weighted Function Instance DAG • Each node is weighted by the number of paths to a leaf node. 5 a 2 b 1 a b 1 [bc] c [a] [abc] c 2 [c] c a 1 [d] [ab] b 1 d 1 1 Symposium on Code Generation and Optimization - 2006 23 Florida State University Enabling Interaction Between Phases • b enables a along the path a-b-a. 5 a 2 b 1 a b 1 [bc] c [a] [abc] c 2 [c] c a 1 [d] [ab] b 1 d 1 1 Symposium on Code Generation and Optimization - 2006 24 Florida State University Enabling Probabilities Ph b c d g h i j k l n o q r s u St b c 0.62 0.01 1.00 0.02 d g h i 0.15 0.06 0.23 0.14 j l n o q r s u 0.12 0.99 0.72 0.38 0.33 1.00 0.05 0.32 0.01 0.06 0.7 0.61 0.01 0.01 0.03 0.01 0.01 0.59 0.06 0.42 0.04 0.87 0.16 0.45 0.02 1.00 0.29 0.73 k 0.18 0.01 0.03 0.02 0.01 0.46 0.61 0.13 0.11 0.02 0.01 0.22 0.01 0.04 0.15 0.16 0.23 0.03 0.81 0.06 0.03 0.05 0.03 0.01 0.08 0.03 0.01 0.05 0.01 0.97 0.53 0.2 0.01 1.00 Symposium on Code Generation and Optimization - 2006 25 Florida State University Disabling Interaction Between Phases • b disables a along the path b-c-d. 5 a 2 b 1 a b 1 [bc] c [a] [abc] c 2 [c] c a 1 [d] [ab] b 1 d 1 1 Symposium on Code Generation and Optimization - 2006 26 Florida State University Disabling Probabilities Ph b b c d g h i j k l n o q r s u 1.00 c d g h 0.15 1.00 i j 0.02 k l n o q 0.08 r s 0.05 0.02 u 0.31 0.15 1.00 0.35 1.00 0.01 0.19 0.02 0.03 1.00 0.08 0.06 0.04 0.71 0.33 0.49 1.00 0.01 1.00 0.14 0.13 1.00 0.14 0.49 1.00 0.30 1.00 0.07 0.31 0.53 1.00 0.02 1.00 1.00 1.00 1.00 0.07 0.09 0.25 1.00 0.73 0.33 0.21 0.12 1.00 0.05 0.01 0.03 0.53 1.00 0.11 0.08 0.55 0.14 1.00 1.00 0.20 Symposium on Code Generation and Optimization - 2006 1.00 27 Florida State University Disabling Probabilities Ph b b c d g h i j k l n o q r s u 1.00 c d g h 0.15 1.00 i j 0.02 k l n o q 0.08 r s 0.05 0.02 u 0.31 0.15 1.00 0.35 1.00 0.01 0.19 0.02 0.03 1.00 0.08 0.06 0.04 0.71 0.33 0.49 1.00 0.01 1.00 0.14 0.13 1.00 0.14 0.49 1.00 0.30 1.00 0.07 0.31 0.53 1.00 0.02 1.00 1.00 1.00 1.00 0.07 0.09 0.25 1.00 0.73 0.33 0.21 0.12 1.00 0.05 0.01 0.03 0.53 1.00 0.11 0.08 0.55 0.14 1.00 1.00 0.20 Symposium on Code Generation and Optimization - 2006 1.00 28 Florida State University Outline • Experimental framework • Exhaustive enumeration of the phase order space. • Search space enumeration results • Optimization phase interaction analysis • Making conventional compilation faster • Future work and conclusions Symposium on Code Generation and Optimization - 2006 29 Florida State University Faster Conventional Compiler • We modified the VPO compiler to use enabling and disabling probabilities to decrease compilation time. # p[i] - current probability of phase i being active # e[i][j] - probability of phase j enabling phase i # d[i][j] - probability of phase j disabling phase i For each phase i do p[i] = e[i][st]; While (any p[i] > 0) do Select j as the current phase with highest probability of being active Apply phase j If phase j was active then For each phase i, where i != j do p[i] += ((1-p[i]) * e[i][j]) - (p[i] * d[i][j]) p[j] = 0 Symposium on Code Generation and Optimization - 2006 30 Florida State University Probabilistic Compilation Results Function start\_inp...(j) parse\_swi...(j) start\_inp...(j) start\_inp...(j) start\_inp...(j) fft\_float(f) main(f) sha\_trans...(h) read\_scan...(j) LZWReadByte(j) main(j) dijkstra(d) .... average Old Compilation Prob. Compilation Attempted Active Attempted Active 233 16 55 14 233 14 53 12 270 15 55 14 233 14 49 13 231 11 53 12 463 28 99 25 284 20 73 18 284 17 67 16 233 13 43 10 268 12 45 11 270 12 57 14 231 9 43 9 .... .... .... .... 230.3 8.9 47.7 9.6 Time 0.469 0.371 0.353 0.420 0.436 0.451 0.550 0.605 0.342 0.325 0.375 0.409 .... 0.297 Prob. / Old Size Speed 1.014 N/A 1.016 0.972 1.010 N/A 1.003 N/A 1.004 1.000 1.012 0.974 1.007 1.000 0.965 0.953 1.018 N/A 1.014 N/A 1.007 1.000 1.010 1.000 .... .... 1.015 Symposium on Code Generation and Optimization - 2006 1.005 31 Florida State University Outline • Experimental framework • Exhaustive enumeration of the phase order space. • Search space enumeration results • Optimization phase interaction analysis • Making conventional compilation faster • Future work and conclusions Symposium on Code Generation and Optimization - 2006 32 Florida State University Future Work • Study methods to find more equivalent performing function instances to further reduce the optimization phase order space. • Evaluate approaches to find the dynamically optimal function instance. • Improve non-exhaustive searches of the phase order space. • Study additional methods to improve conventional compilers. Symposium on Code Generation and Optimization - 2006 33 Florida State University Conclusions • First work to show that the optimization phase order space can often be completely enumerated (at least for the phases in our compiler). • First analysis of the entire phase order space to capture various phase probabilities. • Used phase interaction information to achieve a much faster compiler that still generates comparable code. Symposium on Code Generation and Optimization - 2006 34 Florida State University Optimization Phase Independence • a-c and c-a are independent. 5 a 2 b 1 a b 1 [bc] c [a] [abc] c 2 [c] c a 1 [d] [ab] b 1 d 1 1 Symposium on Code Generation and Optimization - 2006 35 Florida State University Optimization Phase Independence • b-c and c-b are not independent. 5 a 2 b 1 a b 1 [bc] c [a] [abc] c 2 [c] c a 1 [d] [ab] b 1 d 1 1 Symposium on Code Generation and Optimization - 2006 36 Florida State University Independence Probabilities Ph b c d g h i j k l n o q r s u b c d g h i 0.84 0.94 0.96 0.91 0.84 0.96 0.91 0.94 0.97 0.45 0.95 0.44 0.82 0.65 0.12 0.98 0.96 0.99 0.22 0.95 0.98 0.84 0.98 0.84 0.98 0.79 0.97 0.96 0.95 0.96 0.98 0.88 0.59 j k l n o q r s u 0.97 0.95 0.82 0.96 0.95 0.45 0.44 0.65 0.12 0.98 0.99 0.22 0.96 0.98 0.79 0.95 0.88 0.59 0.98 0.97 0.96 0.87 0.81 0.30 0.87 0.78 0.45 0.81 0.78 0.58 0.30 0.45 0.58 0.98 0.71 0.97 0.99 0.96 0.96 0.82 0.45 0.61 0.39 0.5 0.98 0.97 0.96 0.98 0.96 0.71 0.5 0.97 0.98 0.99 0.82 0.97 0.45 0.61 0.39 0.89 0.94 0.89 0.94 Symposium on Code Generation and Optimization - 2006 37 Florida State University Independence Probabilities Ph b c d g h i j k l n o q r s u b c d g h i 0.84 0.94 0.96 0.91 0.84 0.96 0.91 0.94 0.97 0.45 0.95 0.44 0.82 0.65 0.12 0.98 0.96 0.99 0.22 0.95 0.98 0.84 0.98 0.84 0.98 0.79 0.97 0.96 0.95 0.96 0.98 0.88 0.59 j k l n o q r s u 0.97 0.95 0.82 0.96 0.95 0.45 0.44 0.65 0.12 0.98 0.99 0.22 0.96 0.98 0.79 0.95 0.88 0.59 0.98 0.97 0.96 0.87 0.81 0.30 0.87 0.78 0.45 0.81 0.78 0.58 0.30 0.45 0.58 0.98 0.71 0.97 0.99 0.96 0.96 0.82 0.45 0.61 0.39 0.5 0.98 0.97 0.96 0.98 0.96 0.71 0.5 0.97 0.98 0.99 0.82 0.97 0.45 0.61 0.39 0.89 0.94 0.89 0.94 Symposium on Code Generation and Optimization - 2006 38 Florida State University VPO Optimization Phases (cont...) • Register assignment (assigning pseudo registers to hardware registers) is implicitly performed before the first phase that requires it. • Some phases are applied after the sequence • fixing the entry and exit of the function to manage the run-time stack • exploiting predication on the ARM • performing instruction scheduling Symposium on Code Generation and Optimization - 2006 39