Efficient Program Compilation through Machine Learning Techniques Gennady Pekhimenko IBM Canada Angela Demke Brown University of Toronto Motivation My cool program Unroll Unroll Compiler -O2 Executable Inline Inline few seconds Peephole Peephole DCE DCE But what to do if executable is slow? Replace –O2 with –O5 New Fast Executable Unroll Unroll Unroll Unroll Unroll Unroll Optimization 100 1-10 minutes Motivation (2) Compiler -O2 Executable 1 hour Too slow! Our cool Operating System New Executable Compiler -O5 20 hours We do not have that much time Why did it happen? Basic Idea Unroll Unroll Unroll Optimization 100 Do we need all these optimizations for every function? Probably not. Compiler writers can typically solve this problem, but how ? 1. Description of every function 2. Classification based on the description 3. Only certain optimizations for every class Machine Learning is good for solving this kind of problems Overview Motivation System Overview Experiments and Results Related Work Conclusions Future Work Initial Experiment 3X difference on average Initial Experiment (2) Time, secs SPEC2000 execution time at –O3 and –qhot –O3 500 400 300 200 100 0 bzip2 crafty eon gap gzip mcf vortex vpr ammp applu art equake facerec fma3d galgel lucas mesa mgrid sixtrack swim wupwise "-O3" "-qhot -O3" Benchmarks Our System Gather Training Data Prepare • extract features • modify heuristic values • choose transformations • find hot methods Online Offline Best feature settings Learn Deploy TPO/XL Compiler set heuristic values Measure run time Compile Logistic Regression Classifier Classification parameters Data Preparation Three key elements: Feature extraction Heuristic values modification Target set of transformations • Total # of insts • Loop nest level • # and % of Loads, Stores, Branches • Loop characteristics • Float and Integer # and % • Existing XL compiler is missing functionality • Extension was made to the existing Heuristic Context approach • Unroll • Wandwaving • If-conversion • Unswitching • CSE • Index Splitting …. Gather Training Data Try to “cut” transformation backwards (from last to first) Late Inlining Unroll Wandwaving If run time not worse than before, transformation can be skipped Otherwise we keep it We do this for every hot function of every test The main benefit is linear complexity. Learn with Logistic Regression Function Descriptions Classifier Input Best Heuristic Values • Logistic Regression • Neural Networks • Genetic Programming Compiler + Heuristic Values .hpredict files Output Deployment Online phase, for every function: Calculate the feature vector Compute the prediction Use this prediction as heuristic context Overhead is negligible Overview Motivation System Overview Experiments and Results Related Work Conclusions Future Work Experiments Benchmarks: SPEC2000 Others from IBM customers Platform: IBM server, 4 x Power5 1.9 GHz, 32GB RAM Running AIX 5.3 Results: compilation time 2x average speedup Oracle Classifer bzip2 crafty eon gap gzip mcf vortex vpr ammp applu art equake facerec fma3d galgel lucas mesa mgrid sixtrack swim wupwise GeoMean Normalized Time 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 Benchmarks bzip2 crafty eon gap gzip mcf vortex vpr ammp applu art equake facerec fma3d galgel lucas mesa mgrid sixtrack swim wupwise Results: execution time Time, secs 350 300 Baseline 250 Oracle 200 Classifer 150 100 50 0 Benchmarks New benchmarks: compilation time Normalized Time 1 0.8 Classifier 0.6 0.4 0.2 0 Benchmarks New benchmarks: execution time Time, secs 350 300 Baseline 250 Classifer 200 150 4% speedup 100 50 0 apsi parser twolf Benchmarks dmo argonne Overview Motivation System Overview Experiments and Results Related Work Conclusions Future Work Related Work Iterative Compilation Pan and Eigenmann Agakov, et al. Single Heuristic Tuning Calder, et al. Stephenson, et al. Multiple Heuristic Tuning Cavazos, et al. MILEPOST GCC Conclusions and Future Work 2x average compile time decrease Future work Execution time improvement -O5 level Performance Counters for better method description Other benefits Heuristic Context Infrastructure Bug Finding Thank you Raul Silvera, Arie Tal, Greg Steffan, Mathew Zaleski Questions?