Evaluating Heuristic Optimization Phase Order Search Algorithms by Prasad A. Kulkarni

advertisement
Evaluating Heuristic Optimization
Phase Order Search Algorithms
by
Prasad A. Kulkarni
David B. Whalley
Gary S. Tyson Jack W. Davidson
Computer Science Department, Florida State University, Tallahassee, Florida
Computer Science Department, University of Virginia, Charlottesville, Virginia
2007 International Symposium on Code Generation and Optimization (CGO)
/ 32
Compiler Optimizations
• Optimization phases require enabling
conditions
– need specific patterns in the code
– many also need available registers
• Phases interact with each other
• Applying optimizations in different orders
generates different code
2007 International Symposium on Code Generation and Optimization (CGO)
2 / 32
Phase Ordering Problem
• To find an ordering of optimization phases that
produces optimal code with respect to possible
phase orderings
• Evaluating each sequence involves compiling,
assembling, linking, execution and verifying
results
• Best optimization phase ordering depends on
– source application
– target platform
– implementation of optimization phases
• Long standing problem in compiler optimization!!
2007 International Symposium on Code Generation and Optimization (CGO)
3 / 32
Addressing Phase Ordering
• Exhaustive phase order space evaluation
[CGO ’06, LCTES ’06]
– possible for most functions
– long search times for larger functions
• Heuristic approaches
– commonly employed, extensively studied
– allow faster searches
– no guarantees on solution quality
2007 International Symposium on Code Generation and Optimization (CGO)
4 / 32
Survey of Heuristic Algorithms
• Cost and performance comparison
– with optimal
– with other heuristic searches
• Analyze phase order space properties
– sequence length
– leaf sequences
• Improving heuristic search algorithms
– propose new algorithms
2007 International Symposium on Code Generation and Optimization (CGO)
5 / 32
Outline
• Experimental setup
• Local search techniques
– distribution of local minima
– local search algorithms
• Exploiting properties of leaf sequences
– genetic search algorithm
• Conclusions
2007 International Symposium on Code Generation and Optimization (CGO)
6 / 32
Outline
• Experimental setup
• Local search techniques
– distribution of local minima
– local search algorithms
• Exploiting properties of leaf sequences
– genetic search algorithm
• Conclusions
2007 International Symposium on Code Generation and Optimization (CGO)
7 / 32
Experimental Framework
• We used the VPO compilation system
– established compiler framework, started development
in 1988
– comparable performance to gcc –O2
• VPO performs all transformations on a single
representation (RTLs), so it is possible to
perform most phases in an arbitrary order
• Experiments use all the 15 re-orderable
optimization phases in VPO
• Target architecture was the StrongARM SA-100
processor
2007 International Symposium on Code Generation and Optimization (CGO)
8 / 32
VPO Optimization Phases
ID Optimization Phase
ID Optimization Phase
b branch chaining
l
c
n code abstraction
common subexpr. elim.
loop transformations
d remv. unreachable code
o eval. order determin.
g loop unrolling
q strength reduction
h dead assignment elim.
r
reverse branches
i
block reordering
s
instruction selection
j
minimize loop jumps
u remv. useless jumps
k
register allocation
2007 International Symposium on Code Generation and Optimization (CGO)
9 / 32
Benchmarks
• 12 MiBench benchmarks; 88 functions
Category
Program
bitcount
qsort
dijkstra
network
patricia
fft
telecomm
adpcm
jpeg
consumer
tiff2bw
sha
security
blowfish
stringsearch
office
ispell
auto
Description
test processor bit manipulation abilities
sort strings using quicksort sorting algorithm
Dijkstra’s shortest path algorithm
construct patricia trie for IP traffic
fast fourier transform
compress 16-bit linear PCM samples to 4-bit
image compression and decompression
convert color .tiff image to b&w image
secure hash algorithm
symmetric block cipher with variable length key
searches for given words in phrases
fast spelling checker
2007 International Symposium on Code Generation and Optimization (CGO)
10 / 32
Terminology
• Active phase – an optimization phase that
modifies the function representation
• Dormant phase – a phase that is unable to
find any opportunity to change the function
• Function instance – any semantically,
syntactically, and functionally correct
representation of the source function (that
can be produced by our compiler)
2007 International Symposium on Code Generation and Optimization (CGO)
11 / 32
Terminology (cont...)
• Attempted sequence – phase sequence
comprising of both active and dormant
phases
• Active sequence – phase sequence only
comprising active phases
• Batch sequence – active sequence
applied by the default (batch) compiler
2007 International Symposium on Code Generation and Optimization (CGO)
12 / 32
Setup for Analyzing Search
Algorithms
• Exhaustively evaluate optimization phase
order space
– represent phase order space as DAG
• For each search algorithm
– use algorithm to generate next optimization
phase sequence
– lookup performance in DAG
2007 International Symposium on Code Generation and Optimization (CGO)
13 / 32
Phase Order Search Space DAG
• Performance
evaluation of each
phase order is
traversal in the
DAG
– a-b-d = 52
100
a
b
79
b
c
60
c
c
54
66
d
d
52
46
2007 International Symposium on Code Generation and Optimization (CGO)
99
a
b
96
14 / 32
Outline
• Experimental setup
• Local search techniques
– distribution of local minima
– local search algorithms
• Exploiting properties of leaf sequences
– genetic search algorithm
• Conclusions
2007 International Symposium on Code Generation and Optimization (CGO)
15 / 32
Local Search Techniques
• Consecutive sequences differ in only one
position
• if m phases and a
sequence length of
n, then will have
n(m-1) neighbors
bseq
neighbors
a
b c a a a a a a
b
b b a c b b b b
c
c
a
a a a a a a b c
c
c
c
a b c
2007 International Symposium on Code Generation and Optimization (CGO)
c
16 / 32
Local Search Space Properties
• Analyze local search space to
– study distribution of local and global minima
– study importance of sequence length
up to 100 attempts
to generate seq.
of length n
is new
sequence?
Y
get perf.
of seq. &
(m-1)n
neighbors
record if
node is
minima
mark node
as seen
N
exit
2007 International Symposium on Code Generation and Optimization (CGO)
17 / 32
Distribution of Minima
% (num. minima / total samples)
% (num global minima / total minima)
100%
80%
60%
40%
20%
0%
1
1.5
2
2.5
3
3.5
4
4.5
Multiple of Batch Sequence Length
2007 International Symposium on Code Generation and Optimization (CGO)
18 / 32
Hill Climbing
• Steepest decent
– compare all
successors of base
– exit on local minima
randomly generate
sequence of
length n
get perf.
of seq. &
(m-1)n
neighbors
is seq.
minima?
N
select best
neighbor as
new base seq.
Y
record
local
minima
2007 International Symposium on Code Generation and Optimization (CGO)
19 / 32
Hill Climbing Results (cont...)
% best perf. distance from optimal
avg. steps to local minimum
5
4
3
2
1
9
9.
5
10
10
.5
8
8.
5
7
7.
5
6
6.
5
5
5.
5
4
4.
5
3
3.
5
2
2.
5
1
1.
5
0
Multiple of Batch Sequence Length
2007 International Symposium on Code Generation and Optimization (CGO)
20 / 32
Local Search Conclusions
• Phase order space consists of few
minima, but significant percentage of local
minima can be optimal
• Selecting appropriate sequence length is
important
– smaller length results in bad performance
– larger length is expensive
2007 International Symposium on Code Generation and Optimization (CGO)
21 / 32
Outline
• Experimental setup
• Local search techniques
– distribution of local minima
– local search algorithms
• Exploiting properties of leaf sequences
– genetic search algorithm
• Conclusions
2007 International Symposium on Code Generation and Optimization (CGO)
22 / 32
Leaf Sequence Properties
• Leaf function instances are generated when no
additional phases can be successfully applied
– sequences leading to leaf function instances are leaf
sequences
• Leaf sequences result in good performance
– at least one leaf instance represents an optimal
phase ordering for over 86% of functions
– significant percentage of leaf instances among
optimal
2007 International Symposium on Code Generation and Optimization (CGO)
23 / 32
Focusing on Leaf Sequences
• Modify phase order search algorithms to
only produce leaf sequences
– no need to guess appropriate sequence
length
– likely to result in optimal or close to optimal
performance
– leaf function instances comprise only 4.2% of
the total instances
2007 International Symposium on Code Generation and Optimization (CGO)
24 / 32
Genetic Algorithm
• A biased sampling search method
– evolves solutions by merging parts of different
solutions
Create initial
population
of optimization
sequences
Evaluate fitness of
each sequence
in the population
Terminate
cond. ?
Y
Output the
best
sequence
found
N
crossover &
mutation to
create new
generation
2007 International Symposium on Code Generation and Optimization (CGO)
25 / 32
Modified Genetic Algorithm
• Only generate leaf sequences
Create initial
population
of optimization
sequences
Evaluate fitness of
each sequence
in the population
Terminate
cond. ?
Y
Output the
best
sequence
found
N
adjust
sequences
to leaf
sequences
crossover &
mutation to
create new
generation
2007 International Symposium on Code Generation and Optimization (CGO)
26 / 32
Genetic Algorithm – Performance
% Performance from Optimal
All Sequences
Leaf Sequences
10
8
6
4
2
0
1
5
1.7
2.5
5
3.2
4
5
4.7
5.5 6.25
7
5
7.7
8.5
5
9.2
10
.75
10
Multiple of Batch Sequence Length
2007 International Symposium on Code Generation and Optimization (CGO)
27 / 32
Genetic Algorithm – Cost
Leaf Sequences
120
80
40
10
.8
10
9.
25
8.
5
7.
75
7
6.
25
5.
5
4.
75
4
3.
25
2.
5
1.
75
0
1
Number of Generations
All Sequences
Multiple of Batch Sequence Length
2007 International Symposium on Code Generation and Optimization (CGO)
28 / 32
Leaf Search Conclusion
• Benefits of restricting searches to leaf
sequences
– no need for apriori knowledge of appropriate
sequence length
– near-optimal performance
– low cost
2007 International Symposium on Code Generation and Optimization (CGO)
29 / 32
Outline
• Experimental setup
• Local search techniques
– distribution of local minima
– local search algorithms
• Exploiting properties of leaf sequences
– genetic search algorithm
• Conclusions
2007 International Symposium on Code Generation and Optimization (CGO)
30 / 32
Conclusions
• First study to compare heuristic search solutions
with optimal orderings
• Analyzed properties of phase order search
space
– few local and global minima
• Illustrated importance of choosing the
appropriate sequence length
• Demonstrated importance of leaf sequences
– achieve near-optimal performance at low cost
2007 International Symposium on Code Generation and Optimization (CGO)
31 / 32
Questions ?
2007 International Symposium on Code Generation and Optimization (CGO)
32 / 32
Simulated Annealing
• A worse solution is
accepted with
prob = exp (-δf/T)
– δf Æ diff. in perf.
randomly generate
sequence of
length n
get perf.
of seq. &
(m-1)n
neighbors
Y
– T Æ current temp.
• Annealing schedule
– initial temperature
– cooling schedule
is seq.
minima?
Y
N
select best
neighbor as
new base seq.
2007 International Symposium on Code Generation and Optimization (CGO)
select
best with
prob p?
N
record
local
minima
33 / 32
Simulated Annealing (cont...)
• Simulated annealing parameters
– sequence length Æ 1.5 times batch length
– initial temperature Æ 0.5 to 0.95
– cooling schedule Æ 0.5 to 0.95, steps 0.5
• Experimental results
– perf. 0.15% from optimal, std. dev. of 0.13%
– avg. perf. 15.95% worse, std. dev. of 0.55%
– 41.06% iter. reach optimal, std. dev. of 0.81%
2007 International Symposium on Code Generation and Optimization (CGO)
34 / 32
Random Algorithm
• Random sampling used for search spaces
that are discrete and sparse
Randomly
initialize
(leaf)
sequence
Lookup
performance
No
improve
> 100?
Y
Output best
sequence
found & exit
N
Record best
performance
2007 International Symposium on Code Generation and Optimization (CGO)
35 / 32
All Sequences
Leaf Sequences
25
20
15
10
5
10
.8
10
9.
25
8.
5
7.
75
7
6.
25
5.
5
4.
75
4
3.
25
2.
5
1.
75
0
1
% Performance from Optima
Random Algorithm – Performance
Multiple of Batch Sequence Length
2007 International Symposium on Code Generation and Optimization (CGO)
36 / 32
Random Algorithm – Cost
Leaf Sequences
10
.7
5
10
9.
25
8.
5
7.
75
7
6.
25
5.
5
4.
75
4
3.
25
2.
5
1.
75
180
170
160
150
140
130
120
110
100
1
Number of Attempts
All Sequences
Multiple of Batch Sequence Length
2007 International Symposium on Code Generation and Optimization (CGO)
37 / 32
Download