Efficient Program Compilation through Machine Learning Techniques
Gennady Pekhimenko, IBM Canada
Angela Demke Brown, University of Toronto
Motivation
[Diagram: "My cool program" → Compiler (-O2: Unroll, Inline, Peephole, DCE) → Executable, in a few seconds]
But what should we do if the executable is slow?
Replace -O2 with -O5.
[Diagram: Compiler (-O5: Unroll, ..., Optimization 100) → New Fast Executable, in 1-10 minutes]
Motivation (2)
[Diagram: "Our cool Operating System" → Compiler (-O2) → Executable, 1 hour. Too slow!]
[Diagram: Compiler (-O5) → New Executable, 20 hours]
We do not have that much time.
Why did it happen?
Basic Idea
[Diagram: optimization sequence from Unroll to Optimization 100]
Do we need all these optimizations for every function?
Probably not.
Compiler writers can typically solve this problem, but how?
1. Describe every function
2. Classify functions based on the description
3. Apply only certain optimizations to each class
Machine learning is well suited to this kind of problem.
Overview
• Motivation
• System Overview
• Experiments and Results
• Related Work
• Conclusions
• Future Work
Initial Experiment
3X difference on average
Initial Experiment (2)
[Figure: SPEC2000 execution time at -O3 and -qhot -O3. Y-axis: Time, secs (0 to 500). X-axis: Benchmarks (bzip2, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, applu, art, equake, facerec, fma3d, galgel, lucas, mesa, mgrid, sixtrack, swim, wupwise). Series: "-O3" and "-qhot -O3".]
Our System
Offline:
• Prepare: extract features, modify heuristic values, choose transformations, find hot methods
• Gather Training Data: compile, measure run time
• Learn: Logistic Regression Classifier, trained on the best feature settings, produces classification parameters
Online:
• Deploy: TPO/XL Compiler, set heuristic values
Data Preparation
Three key elements:
• Feature extraction
  • Total # of insts
  • Loop nest level
  • # and % of Loads, Stores, Branches
  • Loop characteristics
  • Float and Integer # and %
• Heuristic values modification
  • Existing XL compiler is missing functionality
  • Extension was made to the existing Heuristic Context approach
• Target set of transformations
  • Unroll
  • Wandwaving
  • If-conversion
  • Unswitching
  • CSE
  • Index Splitting ….
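The feature list above can be illustrated with a minimal sketch. The `Instruction` type, the opcode names, and the rule that every non-float instruction counts as integer are assumptions made for this example; they are not the XL compiler's internal representation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instruction:
    """Hypothetical, simplified instruction record (not the XL compiler IR)."""
    opcode: str            # e.g. "load", "store", "branch", "add", "fadd"
    is_float: bool = False


def extract_features(insts: List[Instruction], loop_nest_level: int) -> Dict[str, float]:
    """Build a per-function feature vector: counts and percentages by category."""
    total = len(insts)

    def count(opcode: str) -> int:
        return sum(1 for inst in insts if inst.opcode == opcode)

    def pct(n: int) -> float:
        return 100.0 * n / total if total else 0.0

    loads, stores, branches = count("load"), count("store"), count("branch")
    floats = sum(1 for inst in insts if inst.is_float)
    integers = total - floats  # simplifying assumption: non-float means integer

    return {
        "total_insts": total,
        "loop_nest_level": loop_nest_level,
        "loads": loads, "pct_loads": pct(loads),
        "stores": stores, "pct_stores": pct(stores),
        "branches": branches, "pct_branches": pct(branches),
        "floats": floats, "pct_floats": pct(floats),
        "integers": integers, "pct_integers": pct(integers),
    }


example = [
    Instruction("load"),
    Instruction("fadd", is_float=True),
    Instruction("store"),
    Instruction("branch"),
]
features = extract_features(example, loop_nest_level=2)
```

Loop characteristics (trip counts, strides, and so on) are left out because the slides do not say which ones are used.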
Gather Training Data
• Try to "cut" transformations backwards (from last to first)
[Diagram: transformation sequence, e.g. Late Inlining, Unroll, Wandwaving]
• If the run time is not worse than before, the transformation can be skipped
• Otherwise we keep it
• We do this for every hot function of every test
The main benefit is linear complexity: each transformation is tried once, instead of every combination.
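The backward "cut" procedure can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: `measure_run_time` is a hypothetical callback standing in for "compile with this set of transformations and time the result", and the toy cost model exists only to make the example runnable.

```python
from typing import Callable, FrozenSet, List


def prune_transformations(
    transformations: List[str],
    measure_run_time: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Walk the transformation list from last to first, dropping each one
    whose removal does not make the measured run time worse."""
    enabled = frozenset(transformations)
    best_time = measure_run_time(enabled)          # baseline: everything on

    for name in reversed(transformations):
        trial = enabled - {name}
        trial_time = measure_run_time(trial)
        if trial_time <= best_time:                # not worse: skip it
            enabled = trial
            best_time = trial_time
        # otherwise the transformation stays enabled

    return [t for t in transformations if t in enabled]


# Toy cost model (an assumption for the demo): only Unroll helps run time.
calls = 0

def toy_run_time(enabled: FrozenSet[str]) -> float:
    global calls
    calls += 1
    return 10.0 - (3.0 if "Unroll" in enabled else 0.0)


kept = prune_transformations(["Late Inlining", "Unroll", "Wandwaving"], toy_run_time)
```

For n transformations this performs n + 1 measurements, which is the linear complexity the slide refers to; an exhaustive search over subsets would need 2^n.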
Learn with Logistic Regression
Input: Function Descriptions, Best Heuristic Values
Classifier:
• Logistic Regression
• Neural Networks
• Genetic Programming
Output: .hpredict files (Compiler + Heuristic Values)
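As a rough illustration of the learning step, the sketch below trains a binary logistic regression with plain batch gradient descent to predict whether one transformation should stay enabled for a function. The two features, the labels, the learning rate, and the epoch count are all made up for the example; the slides do not specify the training procedure or the `.hpredict` file format, so neither is reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean logistic loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_probability(weights, bias, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Hypothetical training set: [has_loop (0/1), fraction of loads] -> keep Unroll?
X = [[0.0, 0.1], [0.0, 0.3], [1.0, 0.2], [1.0, 0.4]]
y = [0, 0, 1, 1]
weights, bias = train_logistic_regression(X, y)
```

In the system described here, one such classifier per heuristic value would be trained offline from the gathered data.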
Deployment
Online phase, for every function:
• Calculate the feature vector
• Compute the prediction
• Use this prediction as heuristic context
Overhead is negligible
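A minimal sketch of the online step, assuming one already-trained logistic model per transformation. The model weights, the feature layout, and the 0.5 threshold are hypothetical values chosen for the example, not parameters from the actual system.

```python
import math
from typing import Dict, List, Tuple

# One (weights, bias) pair per transformation; values here are made up.
Model = Tuple[List[float], float]


def predict_heuristic_context(
    features: List[float], models: Dict[str, Model], threshold: float = 0.5
) -> Dict[str, bool]:
    """Return, for each transformation, whether the classifier enables it."""
    context = {}
    for name, (weights, bias) in models.items():
        z = bias + sum(w * x for w, x in zip(weights, features))
        probability = 1.0 / (1.0 + math.exp(-z))
        context[name] = probability >= threshold
    return context


models = {
    "Unroll": ([4.0, 0.0], -2.0),
    "If-conversion": ([0.0, 3.0], -2.0),
}
function_features = [1.0, 0.2]   # hypothetical feature vector for one function
context = predict_heuristic_context(function_features, models)
```

Each prediction is a single dot product and one exponential per transformation, which is consistent with the claim that the online overhead is negligible.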
Experiments
Benchmarks:
• SPEC2000
• Others from IBM customers
Platform:
• IBM server, 4 x Power5 1.9 GHz, 32 GB RAM
• Running AIX 5.3
Results: compilation time
2x average speedup
[Figure: normalized compilation time (0 to 1) per benchmark for Oracle and Classifier. Benchmarks: bzip2, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, applu, art, equake, facerec, fma3d, galgel, lucas, mesa, mgrid, sixtrack, swim, wupwise, GeoMean.]
Results: execution time
[Figure: execution time in seconds (0 to 350) per benchmark for Baseline, Oracle, and Classifier. Benchmarks: bzip2, crafty, eon, gap, gzip, mcf, vortex, vpr, ammp, applu, art, equake, facerec, fma3d, galgel, lucas, mesa, mgrid, sixtrack, swim, wupwise.]
New benchmarks: compilation time
[Figure: normalized compilation time (0 to 1) per benchmark for the Classifier.]
New benchmarks: execution time
4% speedup
[Figure: execution time in seconds (0 to 350) per benchmark for Baseline and Classifier. Benchmarks: apsi, parser, twolf, dmo, argonne.]
Related Work
• Iterative Compilation
  • Pan and Eigenmann
  • Agakov, et al.
• Single Heuristic Tuning
  • Calder, et al.
  • Stephenson, et al.
• Multiple Heuristic Tuning
  • Cavazos, et al.
  • MILEPOST GCC
Conclusions and Future Work
• 2x average compile time decrease
• Future work
  • Execution time improvement
  • -O5 level
  • Performance Counters for better method description
• Other benefits
  • Heuristic Context Infrastructure
  • Bug Finding
Thank you
• Raul Silvera, Arie Tal, Greg Steffan, Mathew Zaleski
• Questions?