Evaluating Heuristic Optimization Phase Order Search Algorithms by Prasad A. Kulkarni David B. Whalley Gary S. Tyson Jack W. Davidson Computer Science Department, Florida State University, Tallahassee, Florida Computer Science Department, University of Virginia, Charlottesville, Virginia 2007 International Symposium on Code Generation and Optimization (CGO) / 32 Compiler Optimizations • Optimization phases require enabling conditions – need specific patterns in the code – many also need available registers • Phases interact with each other • Applying optimizations in different orders generates different code 2007 International Symposium on Code Generation and Optimization (CGO) 2 / 32 Phase Ordering Problem • To find an ordering of optimization phases that produces optimal code with respect to possible phase orderings • Evaluating each sequence involves compiling, assembling, linking, execution and verifying results • Best optimization phase ordering depends on – source application – target platform – implementation of optimization phases • Long standing problem in compiler optimization!! 2007 International Symposium on Code Generation and Optimization (CGO) 3 / 32 Addressing Phase Ordering • Exhaustive phase order space evaluation [CGO ’06, LCTES ’06] – possible for most functions – long search times for larger functions • Heuristic approaches – commonly employed, extensively studied – allow faster searches – no guarantees on solution quality 2007 International Symposium on Code Generation and Optimization (CGO) 4 / 32 Survey of Heuristic Algorithms • Cost and performance comparison – with optimal – with other heuristic searches • Analyze phase order space properties – sequence length – leaf sequences • Improving heuristic search algorithms – propose new algorithms 2007 International Symposium on Code Generation and Optimization (CGO) 5 / 32 Outline • Experimental setup • Local search techniques – distribution of local minima – local search algorithms • Exploiting properties of leaf sequences – genetic search algorithm • Conclusions 2007 International Symposium on Code Generation and Optimization (CGO) 6 / 32 Outline • Experimental setup • Local search techniques – distribution of local minima – local search algorithms • Exploiting properties of leaf sequences – genetic search algorithm • Conclusions 2007 International Symposium on Code Generation and Optimization (CGO) 7 / 32 Experimental Framework • We used the VPO compilation system – established compiler framework, started development in 1988 – comparable performance to gcc –O2 • VPO performs all transformations on a single representation (RTLs), so it is possible to perform most phases in an arbitrary order • Experiments use all the 15 re-orderable optimization phases in VPO • Target architecture was the StrongARM SA-100 processor 2007 International Symposium on Code Generation and Optimization (CGO) 8 / 32 VPO Optimization Phases ID Optimization Phase ID Optimization Phase b branch chaining l c n code abstraction common subexpr. elim. loop transformations d remv. unreachable code o eval. order determin. g loop unrolling q strength reduction h dead assignment elim. r reverse branches i block reordering s instruction selection j minimize loop jumps u remv. useless jumps k register allocation 2007 International Symposium on Code Generation and Optimization (CGO) 9 / 32 Benchmarks • 12 MiBench benchmarks; 88 functions Category Program bitcount qsort dijkstra network patricia fft telecomm adpcm jpeg consumer tiff2bw sha security blowfish stringsearch office ispell auto Description test processor bit manipulation abilities sort strings using quicksort sorting algorithm Dijkstra’s shortest path algorithm construct patricia trie for IP traffic fast fourier transform compress 16-bit linear PCM samples to 4-bit image compression and decompression convert color .tiff image to b&w image secure hash algorithm symmetric block cipher with variable length key searches for given words in phrases fast spelling checker 2007 International Symposium on Code Generation and Optimization (CGO) 10 / 32 Terminology • Active phase – an optimization phase that modifies the function representation • Dormant phase – a phase that is unable to find any opportunity to change the function • Function instance – any semantically, syntactically, and functionally correct representation of the source function (that can be produced by our compiler) 2007 International Symposium on Code Generation and Optimization (CGO) 11 / 32 Terminology (cont...) • Attempted sequence – phase sequence comprising of both active and dormant phases • Active sequence – phase sequence only comprising active phases • Batch sequence – active sequence applied by the default (batch) compiler 2007 International Symposium on Code Generation and Optimization (CGO) 12 / 32 Setup for Analyzing Search Algorithms • Exhaustively evaluate optimization phase order space – represent phase order space as DAG • For each search algorithm – use algorithm to generate next optimization phase sequence – lookup performance in DAG 2007 International Symposium on Code Generation and Optimization (CGO) 13 / 32 Phase Order Search Space DAG • Performance evaluation of each phase order is traversal in the DAG – a-b-d = 52 100 a b 79 b c 60 c c 54 66 d d 52 46 2007 International Symposium on Code Generation and Optimization (CGO) 99 a b 96 14 / 32 Outline • Experimental setup • Local search techniques – distribution of local minima – local search algorithms • Exploiting properties of leaf sequences – genetic search algorithm • Conclusions 2007 International Symposium on Code Generation and Optimization (CGO) 15 / 32 Local Search Techniques • Consecutive sequences differ in only one position • if m phases and a sequence length of n, then will have n(m-1) neighbors bseq neighbors a b c a a a a a a b b b a c b b b b c c a a a a a a a b c c c c a b c 2007 International Symposium on Code Generation and Optimization (CGO) c 16 / 32 Local Search Space Properties • Analyze local search space to – study distribution of local and global minima – study importance of sequence length up to 100 attempts to generate seq. of length n is new sequence? Y get perf. of seq. & (m-1)n neighbors record if node is minima mark node as seen N exit 2007 International Symposium on Code Generation and Optimization (CGO) 17 / 32 Distribution of Minima % (num. minima / total samples) % (num global minima / total minima) 100% 80% 60% 40% 20% 0% 1 1.5 2 2.5 3 3.5 4 4.5 Multiple of Batch Sequence Length 2007 International Symposium on Code Generation and Optimization (CGO) 18 / 32 Hill Climbing • Steepest decent – compare all successors of base – exit on local minima randomly generate sequence of length n get perf. of seq. & (m-1)n neighbors is seq. minima? N select best neighbor as new base seq. Y record local minima 2007 International Symposium on Code Generation and Optimization (CGO) 19 / 32 Hill Climbing Results (cont...) % best perf. distance from optimal avg. steps to local minimum 5 4 3 2 1 9 9. 5 10 10 .5 8 8. 5 7 7. 5 6 6. 5 5 5. 5 4 4. 5 3 3. 5 2 2. 5 1 1. 5 0 Multiple of Batch Sequence Length 2007 International Symposium on Code Generation and Optimization (CGO) 20 / 32 Local Search Conclusions • Phase order space consists of few minima, but significant percentage of local minima can be optimal • Selecting appropriate sequence length is important – smaller length results in bad performance – larger length is expensive 2007 International Symposium on Code Generation and Optimization (CGO) 21 / 32 Outline • Experimental setup • Local search techniques – distribution of local minima – local search algorithms • Exploiting properties of leaf sequences – genetic search algorithm • Conclusions 2007 International Symposium on Code Generation and Optimization (CGO) 22 / 32 Leaf Sequence Properties • Leaf function instances are generated when no additional phases can be successfully applied – sequences leading to leaf function instances are leaf sequences • Leaf sequences result in good performance – at least one leaf instance represents an optimal phase ordering for over 86% of functions – significant percentage of leaf instances among optimal 2007 International Symposium on Code Generation and Optimization (CGO) 23 / 32 Focusing on Leaf Sequences • Modify phase order search algorithms to only produce leaf sequences – no need to guess appropriate sequence length – likely to result in optimal or close to optimal performance – leaf function instances comprise only 4.2% of the total instances 2007 International Symposium on Code Generation and Optimization (CGO) 24 / 32 Genetic Algorithm • A biased sampling search method – evolves solutions by merging parts of different solutions Create initial population of optimization sequences Evaluate fitness of each sequence in the population Terminate cond. ? Y Output the best sequence found N crossover & mutation to create new generation 2007 International Symposium on Code Generation and Optimization (CGO) 25 / 32 Modified Genetic Algorithm • Only generate leaf sequences Create initial population of optimization sequences Evaluate fitness of each sequence in the population Terminate cond. ? Y Output the best sequence found N adjust sequences to leaf sequences crossover & mutation to create new generation 2007 International Symposium on Code Generation and Optimization (CGO) 26 / 32 Genetic Algorithm – Performance % Performance from Optimal All Sequences Leaf Sequences 10 8 6 4 2 0 1 5 1.7 2.5 5 3.2 4 5 4.7 5.5 6.25 7 5 7.7 8.5 5 9.2 10 .75 10 Multiple of Batch Sequence Length 2007 International Symposium on Code Generation and Optimization (CGO) 27 / 32 Genetic Algorithm – Cost Leaf Sequences 120 80 40 10 .8 10 9. 25 8. 5 7. 75 7 6. 25 5. 5 4. 75 4 3. 25 2. 5 1. 75 0 1 Number of Generations All Sequences Multiple of Batch Sequence Length 2007 International Symposium on Code Generation and Optimization (CGO) 28 / 32 Leaf Search Conclusion • Benefits of restricting searches to leaf sequences – no need for apriori knowledge of appropriate sequence length – near-optimal performance – low cost 2007 International Symposium on Code Generation and Optimization (CGO) 29 / 32 Outline • Experimental setup • Local search techniques – distribution of local minima – local search algorithms • Exploiting properties of leaf sequences – genetic search algorithm • Conclusions 2007 International Symposium on Code Generation and Optimization (CGO) 30 / 32 Conclusions • First study to compare heuristic search solutions with optimal orderings • Analyzed properties of phase order search space – few local and global minima • Illustrated importance of choosing the appropriate sequence length • Demonstrated importance of leaf sequences – achieve near-optimal performance at low cost 2007 International Symposium on Code Generation and Optimization (CGO) 31 / 32 Questions ? 2007 International Symposium on Code Generation and Optimization (CGO) 32 / 32 Simulated Annealing • A worse solution is accepted with prob = exp (-δf/T) – δf Æ diff. in perf. randomly generate sequence of length n get perf. of seq. & (m-1)n neighbors Y – T Æ current temp. • Annealing schedule – initial temperature – cooling schedule is seq. minima? Y N select best neighbor as new base seq. 2007 International Symposium on Code Generation and Optimization (CGO) select best with prob p? N record local minima 33 / 32 Simulated Annealing (cont...) • Simulated annealing parameters – sequence length Æ 1.5 times batch length – initial temperature Æ 0.5 to 0.95 – cooling schedule Æ 0.5 to 0.95, steps 0.5 • Experimental results – perf. 0.15% from optimal, std. dev. of 0.13% – avg. perf. 15.95% worse, std. dev. of 0.55% – 41.06% iter. reach optimal, std. dev. of 0.81% 2007 International Symposium on Code Generation and Optimization (CGO) 34 / 32 Random Algorithm • Random sampling used for search spaces that are discrete and sparse Randomly initialize (leaf) sequence Lookup performance No improve > 100? Y Output best sequence found & exit N Record best performance 2007 International Symposium on Code Generation and Optimization (CGO) 35 / 32 All Sequences Leaf Sequences 25 20 15 10 5 10 .8 10 9. 25 8. 5 7. 75 7 6. 25 5. 5 4. 75 4 3. 25 2. 5 1. 75 0 1 % Performance from Optima Random Algorithm – Performance Multiple of Batch Sequence Length 2007 International Symposium on Code Generation and Optimization (CGO) 36 / 32 Random Algorithm – Cost Leaf Sequences 10 .7 5 10 9. 25 8. 5 7. 75 7 6. 25 5. 5 4. 75 4 3. 25 2. 5 1. 75 180 170 160 150 140 130 120 110 100 1 Number of Attempts All Sequences Multiple of Batch Sequence Length 2007 International Symposium on Code Generation and Optimization (CGO) 37 / 32