Florida State University
Exhaustive Optimization Phase
Order Space Exploration
Prasad A. Kulkarni
David B. Whalley
Gary S. Tyson
Jack W. Davidson
Symposium on Code Generation and Optimization - 2006
Optimization Phase Ordering
• Optimizing compilers apply several
optimization phases to improve the
performance of applications.
• Optimization phases interact with each other.
• Determining the best order of applying
optimization phases has been a long standing
problem in compilers.
Exhaustive Phase Order
Enumeration... Is It Feasible?
• An obvious approach to the phase
ordering problem is to exhaustively evaluate
all combinations of optimization phases.
• Exhaustive enumeration is difficult:
  • compilers typically contain many different
  optimization phases
  • a phase may apply successfully multiple times
  for each function / program
Optimization Space Properties
• Phase ordering problem can be made more
manageable by exploiting certain properties
of the optimization search space
• optimization phases might not apply any
transformations
• many optimization phases are independent
• Thus, many different orderings of
optimization phases produce the same code.
Re-stating the Phase Ordering
Problem
• Rather than considering all attempted phase
sequences, the phase ordering problem can
be addressed by enumerating all distinct
function instances that can be produced by
any combination of optimization phases.
• We were able to exhaustively enumerate the
phase order space for 109 of the 111 functions,
in a few minutes for most.
Outline
• Experimental framework
• Algorithm for exhaustive enumeration of
the phase order space
• Search space enumeration results
• Optimization phase interaction analysis
• Making conventional compilation faster
• Future work and conclusions
Experimental Framework
• We used the VPO compilation system
  • established compiler framework, started development
  in 1988
  • comparable performance to gcc -O2
• VPO performs all transformations on a single
representation (RTLs), so it is possible to perform
most phases in an arbitrary order.
• Experiments use all 15 available optimization
phases in VPO.
• Target architecture was the StrongARM SA-100
processor.
VPO Optimization Phases
ID  Optimization Phase
b   branch chaining
c   common subexpression elimination
d   remove unreachable code
g   loop unrolling
h   dead assignment elimination
i   block reordering
j   minimize loop jumps
k   register allocation
l   loop transformations
n   code abstraction
o   evaluation order determination
q   strength reduction
r   reverse branches
s   instruction selection
u   remove useless jumps
Disclaimers
• Did not include optimization phases normally
associated with compiler front ends
• no memory hierarchy optimizations
• no inlining or other interprocedural optimizations
• Did not vary how phases are applied.
• Did not include optimizations that require
profile data.
Benchmarks
• Used one program from each of the six
MiBench categories.
• Total of 111 functions.
Category  Program       Description
auto      bitcount      test processor bit manipulation abilities
network   dijkstra      Dijkstra's shortest path algorithm
telecomm  fft           fast Fourier transform
consumer  jpeg          image compression / decompression
security  sha           secure hash algorithm
office    stringsearch  searches for given words in phrases
Naïve Optimization Phase Order
Space Exploration
• All combinations of optimization phase
sequences are attempted.
[Figure: search tree in which every phase (a, b, c, d) is attempted at each node, from level L0 through level L2]
Eliminating Consecutively
Applied Phases
• A phase just applied in our compiler cannot
be immediately active again.
[Figure: the search tree after pruning; below each level-L1 node only the three phases other than the one just applied are attempted]
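To give a feel for the sizes involved, the sketch below counts the phase sequences of a given length for the naive tree and for the tree in which a phase is never attempted immediately after itself. The function names and the chosen lengths are illustrative assumptions; only the count of 15 phases comes from the slides.

```python
def naive_sequences(num_phases: int, length: int) -> int:
    """Sequences of exactly `length` phases when every phase is tried at every step."""
    return num_phases ** length


def no_repeat_sequences(num_phases: int, length: int) -> int:
    """Same count when a phase is never attempted immediately after itself."""
    if length == 0:
        return 1
    return num_phases * (num_phases - 1) ** (length - 1)


if __name__ == "__main__":
    # 15 phases as in VPO; the lengths are arbitrary illustrative values.
    for n in (2, 5, 10):
        print(n, naive_sequences(15, n), no_repeat_sequences(15, n))
```

With the four phases of the figure and two levels, this gives 16 sequences for the naive tree and 12 after removing consecutive repeats. Both counts still grow exponentially, which is why the further pruning steps matter.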
Eliminating Dormant Phases
• Get feedback from the compiler indicating
if any transformations were successfully
applied in a phase.
[Figure: the search tree pruned further by removing the branches of dormant phases]
Detecting Identical Function
Instances
• Some optimization phases are independent
• example: branch chaining & register allocation
• Different phase sequences can produce the
same code
r[2] = 1;
r[3] = r[4] + r[2];
⇒ instruction selection
r[3] = r[4] + 1;

r[2] = 1;
r[3] = r[4] + r[2];
⇒ constant propagation
r[2] = 1;
r[3] = r[4] + 1;
⇒ dead assignment elimination
r[3] = r[4] + 1;
Detecting Equivalent Function
Instances
Source Code:
sum = 0;
for (i = 0; i < 1000; i++)
    sum += a[i];

Register Allocation before Code Motion:
r[10]=0;
r[12]=HI[a];
r[12]=r[12]+LO[a];
r[1]=r[12];
r[9]=4000+r[12];
L3
r[8]=M[r[1]];
r[10]=r[10]+r[8];
r[1]=r[1]+4;
IC=r[1]?r[9];
PC=IC<0,L3;

Code Motion before Register Allocation:
r[11]=0;
r[10]=HI[a];
r[10]=r[10]+LO[a];
r[1]=r[10];
r[9]=4000+r[10];
L5
r[8]=M[r[1]];
r[11]=r[11]+r[8];
r[1]=r[1]+4;
IC=r[1]?r[9];
PC=IC<0,L5;

After Mapping Registers:
r[32]=0;
r[33]=HI[a];
r[33]=r[33]+LO[a];
r[34]=r[33];
r[35]=4000+r[33];
L01
r[36]=M[r[34]];
r[32]=r[32]+r[36];
r[34]=r[34]+4;
IC=r[34]?r[35];
PC=IC<0,L01;
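The register mapping on this slide can be sketched as a renaming pass that numbers registers and labels in order of first appearance. This is a minimal illustration, not VPO's implementation: the function name `remap`, the regular expression, and the starting numbers (r[32], L01, chosen to match the slide) are assumptions.

```python
import re

# Matches a register such as r[12] or a label such as L3 (but not LO[a] or HI[a]).
_TOKEN = re.compile(r"r\[(\d+)\]|\bL(\d+)\b")


def remap(rtls, first_reg=32, first_label=1):
    """Rename registers and labels in order of first appearance."""
    regs, labels = {}, {}

    def rename(match):
        if match.group(1) is not None:           # register
            old = match.group(1)
            if old not in regs:
                regs[old] = first_reg + len(regs)
            return f"r[{regs[old]}]"
        old = match.group(2)                      # label
        if old not in labels:
            labels[old] = first_label + len(labels)
        return f"L{labels[old]:02d}"

    return [_TOKEN.sub(rename, line) for line in rtls]


# The two instruction sequences from the slide.
RA_BEFORE_CM = [
    "r[10]=0;", "r[12]=HI[a];", "r[12]=r[12]+LO[a];", "r[1]=r[12];",
    "r[9]=4000+r[12];", "L3", "r[8]=M[r[1]];", "r[10]=r[10]+r[8];",
    "r[1]=r[1]+4;", "IC=r[1]?r[9];", "PC=IC<0,L3;",
]
CM_BEFORE_RA = [
    "r[11]=0;", "r[10]=HI[a];", "r[10]=r[10]+LO[a];", "r[1]=r[10];",
    "r[9]=4000+r[10];", "L5", "r[8]=M[r[1]];", "r[11]=r[11]+r[8];",
    "r[1]=r[1]+4;", "IC=r[1]?r[9];", "PC=IC<0,L5;",
]

if __name__ == "__main__":
    print(remap(RA_BEFORE_CM) == remap(CM_BEFORE_RA))
```

After renaming, the two sequences compare equal line by line, so the two orderings can be treated as one function instance.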
Resulting Search Space
• Merging equivalent function instances
transforms the tree to a DAG.
[Figure: the search space after merging equivalent function instances, now a DAG with levels L0 to L2]
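The three pruning steps (no immediate repeats, no dormant phases, merging identical instances) can be combined into a breadth-first enumeration. The sketch below uses hypothetical toy phases that act on a set of pending "opportunities" instead of real RTLs; all names are illustrative, and skipping the just-applied phase relies on the slides' statement that such a phase cannot be active again immediately.

```python
from collections import deque


# Hypothetical toy phases: each maps a function instance (a frozenset of
# pending opportunities) to a new instance, or returns it unchanged if dormant.
def phase_a(inst):
    # Consumes "x" and exposes "y", so it enables phase b.
    return (inst - {"x"}) | {"y"} if "x" in inst else inst


def phase_b(inst):
    return inst - {"y"} if "y" in inst else inst


def phase_c(inst):
    return inst - {"z"} if "z" in inst else inst


PHASES = {"a": phase_a, "b": phase_b, "c": phase_c}


def enumerate_instances(start):
    """Breadth-first enumeration of all distinct instances reachable from start."""
    dag = {start: {}}                  # instance -> {active phase: child instance}
    work = deque([(start, None)])      # (instance, phase that first produced it)
    attempted = 0
    while work:
        inst, last = work.popleft()
        for name, apply_phase in PHASES.items():
            if name == last:           # just applied, cannot be active again
                continue
            attempted += 1
            child = apply_phase(inst)
            if child == inst:          # dormant phase: prune this branch
                continue
            dag[inst][name] = child
            if child not in dag:       # identical instances are merged
                dag[child] = {}
                work.append((child, name))
    return dag, attempted


START = frozenset({"x", "z"})
DAG, ATTEMPTED = enumerate_instances(START)
LEAVES = [inst for inst, edges in DAG.items() if not edges]

if __name__ == "__main__":
    print(len(DAG), "instances,", ATTEMPTED, "phases attempted,", len(LEAVES), "leaf")
```

In this toy space the orderings a-c and c-a from the start instance reach different intermediate instances but later converge, so the result is a DAG, not a tree.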
Efficient Detection of Unique
Function Instances
• Even after pruning there may be tens or
hundreds of thousands of unique instances.
• Compute a CRC (cyclic redundancy check)
checksum on the bytes of the RTLs
representing the instructions.
• Use a hash table to check whether an equivalent
function instance already exists in the DAG.
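A minimal sketch of checksum-based detection follows. It uses Python's `zlib.crc32` as a stand-in for whatever CRC the compiler computes, and the `InstanceTable` class is hypothetical. The full comparison on a checksum match is an added safeguard against collisions, not something the slides describe.

```python
import zlib


class InstanceTable:
    """Hash table from checksum to the function instances seen so far."""

    def __init__(self):
        self._buckets = {}             # crc32 value -> list of instruction tuples

    def add(self, rtls):
        """Return True if rtls is new, False if an identical instance exists."""
        instance = tuple(rtls)
        data = "\n".join(instance).encode("utf-8")
        key = zlib.crc32(data)
        bucket = self._buckets.setdefault(key, [])
        if instance in bucket:         # full compare guards against CRC collisions
            return False
        bucket.append(instance)
        return True


if __name__ == "__main__":
    table = InstanceTable()
    print(table.add(["r[32]=0;", "r[33]=HI[a];"]))   # first time seen
    print(table.add(["r[32]=0;", "r[33]=HI[a];"]))   # duplicate
```

Comparing one integer per instance keeps the common case cheap even when the DAG holds hundreds of thousands of instances.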
Techniques to Make Searches
Faster
• Kept a copy of the program representation of the
unoptimized function instance in memory to avoid
repeated disk accesses.
• Also kept the program representation after each
active phase in memory to reduce the number of
phases applied for each sequence.
• Reduced search time by at least a factor of 5 to 10.
• Out of 111 functions in our benchmark suite we
were able to completely enumerate all instances for
109 functions.
Search Space Statistics
Function          Insts   Blk  Loop  Instances     Phases  Len    CF  Leaves
start_inp...(j)   1,371    88     2     74,950  1,153,279   20   153     587
parse_swi...(j)   1,228   198     1    200,397  2,990,221   18    53   2,365
start_inp...(j)   1,009    72     1     39,152    597,147   16    18     324
start_inp...(j)     971    82     1     64,571    999,814   18    47     591
start_inp...(j)     795    63     1      7,018    106,793   15    37      52
fft_float(f)        680    45     4        N/A        N/A  N/A   N/A     N/A
main(f)             624    50     5        N/A        N/A  N/A   N/A     N/A
sha_trans...(h)     541    33     6    343,162  5,119,947   26    95   2,964
read_scan...(j)     480    59     2     34,270    511,093   15    57     540
LZWRea...(j)        472    44     2     49,412    772,864   20    41     159
main(j)             465    40     1     33,620    515,749   17    12     153
dijkstra(d)         354    30     3     86,370  1,361,960   20    18   1,168
....               ....  ....  ....       ....       ....  ....  ....    ....
average           166.7  16.9   0.9   25,362.6  381,857.7   12  27.5   182.9
Weighted Function Instance DAG
• Each node is weighted by the number of
paths to a leaf node.
[Figure: example weighted function instance DAG over phases a, b, c, d]
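The node weights can be computed with a memoized traversal. The DAG below is a hypothetical example (not the one in the figure), stored as a mapping from each node to its outgoing edges; the node names and the `weight` function are assumptions for illustration.

```python
from functools import lru_cache

# Hypothetical DAG: node -> {active phase: child node}. Nodes with no
# outgoing edges are leaves.
DAG = {
    "xz": {"a": "yz", "c": "x"},
    "yz": {"b": "z", "c": "y"},
    "x": {"a": "y"},
    "z": {"c": "end"},
    "y": {"b": "end"},
    "end": {},
}


@lru_cache(maxsize=None)
def weight(node):
    """Number of distinct paths from node to any leaf (a leaf counts as 1)."""
    edges = DAG[node]
    if not edges:
        return 1
    return sum(weight(child) for child in edges.values())


if __name__ == "__main__":
    for name in DAG:
        print(name, weight(name))
```

Memoization matters because merged nodes are reached along many paths; each node's weight is computed once.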
Enabling Interaction Between
Phases
• b enables a along the path a-b-a.
[Figure: the weighted function instance DAG shown earlier]
Enabling Probabilities
[Table: enabling probabilities; rows are the 15 phases (Ph: b, c, d, g, h, i, j, k, l, n, o, q, r, s, u), columns are St followed by the same 15 phases]
Disabling Interaction Between
Phases
• b disables a along the path b-c-d.
[Figure: the weighted function instance DAG shown earlier]
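Enabling and disabling interactions can be counted directly on the DAG edges. The sketch below is a simplified, unweighted version: phase j enables phase i on an edge if i is dormant before j is applied and active afterwards, and disables it in the opposite case. The slides do not give the exact formula, so the ratio used here, the small DAG, and all names are assumptions.

```python
from collections import defaultdict

# Hypothetical DAG: node -> {active phase: child node}.
DAG = {
    "n0": {"a": "n1", "c": "n2"},
    "n1": {"b": "n3"},
    "n2": {},
    "n3": {},
}
PHASES = "abc"


def interaction_probabilities(dag, phases):
    """Return (enable, disable); each maps (i, j) to the ratio for j acting on i."""
    en, en_total = defaultdict(int), defaultdict(int)
    dis, dis_total = defaultdict(int), defaultdict(int)
    for node, edges in dag.items():
        for j, child in edges.items():            # phase j was active at node
            for i in phases:
                if i == j:
                    continue
                before, after = i in edges, i in dag[child]
                if before:                         # i was active: j may disable it
                    dis_total[i, j] += 1
                    dis[i, j] += int(not after)
                else:                              # i was dormant: j may enable it
                    en_total[i, j] += 1
                    en[i, j] += int(after)
    enable = {k: en[k] / n for k, n in en_total.items()}
    disable = {k: dis[k] / n for k, n in dis_total.items()}
    return enable, disable


ENABLE, DISABLE = interaction_probabilities(DAG, PHASES)

if __name__ == "__main__":
    print(ENABLE)
    print(DISABLE)
```

In this toy DAG, a always enables b, while a and c disable each other. A weighted variant could scale each edge by the node weights from the weighted DAG.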
Disabling Probabilities
[Table: disabling probabilities between each pair of the 15 phases (b, c, d, g, h, i, j, k, l, n, o, q, r, s, u)]
Faster Conventional Compiler
• We modified the VPO compiler to use enabling and
disabling probabilities to decrease compilation time.
# p[i] - current probability of phase i being active
# e[i][j] - probability of phase j enabling phase i
# d[i][j] - probability of phase j disabling phase i
For each phase i do
    p[i] = e[i][st];
While (any p[i] > 0) do
    Select j as the current phase with highest probability of being active
    Apply phase j
    If phase j was active then
        For each phase i, where i != j do
            p[i] += ((1-p[i]) * e[i][j]) - (p[i] * d[i][j])
    p[j] = 0
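The pseudocode above can be turned into a small runnable sketch. The toy phases, the probability tables `E` and `D`, and the function name are hypothetical; a real compiler would use the measured enabling and disabling probabilities and apply real transformations.

```python
# Hypothetical toy phases on a frozenset of pending opportunities.
def phase_a(inst):
    return (inst - {"x"}) | {"y"} if "x" in inst else inst


def phase_b(inst):
    return inst - {"y"} if "y" in inst else inst


def phase_c(inst):
    return inst - {"z"} if "z" in inst else inst


PHASES = {"a": phase_a, "b": phase_b, "c": phase_c}

# E[i][j]: assumed probability that phase j enables phase i ("st" = start).
E = {
    "a": {"st": 1.0, "b": 0.0, "c": 0.0},
    "b": {"st": 0.0, "a": 1.0, "c": 0.0},
    "c": {"st": 1.0, "a": 0.0, "b": 0.0},
}
# D[i][j]: assumed probability that phase j disables phase i (none here).
D = {i: {j: 0.0 for j in PHASES} for i in PHASES}


def probabilistic_compile(inst, phases, e, d, start="st"):
    """Apply phases in order of their estimated probability of being active."""
    p = {i: e[i][start] for i in phases}
    applied = []
    while any(prob > 0 for prob in p.values()):
        j = max(p, key=p.get)                  # most likely active phase
        new_inst = phases[j](inst)
        if new_inst != inst:                   # phase j was active
            inst = new_inst
            applied.append(j)
            for i in phases:
                if i != j:
                    p[i] += (1 - p[i]) * e[i][j] - p[i] * d[i][j]
        p[j] = 0                               # j cannot be active again right away
    return inst, applied


RESULT = probabilistic_compile(frozenset({"x", "z"}), PHASES, E, D)

if __name__ == "__main__":
    print(RESULT)
```

Here phase b is never attempted before a has run, because its starting probability is zero and only a raises it. That is the source of the reduction in attempted phases.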
Probabilistic Compilation Results
                  Old Compilation     Prob. Compilation              Prob. / Old
Function          Attempted  Active   Attempted  Active    Time     Size   Speed
start_inp...(j)         233      16          55      14   0.469    1.014     N/A
parse_swi...(j)         233      14          53      12   0.371    1.016   0.972
start_inp...(j)         270      15          55      14   0.353    1.010     N/A
start_inp...(j)         233      14          49      13   0.420    1.003     N/A
start_inp...(j)         231      11          53      12   0.436    1.004   1.000
fft_float(f)            463      28          99      25   0.451    1.012   0.974
main(f)                 284      20          73      18   0.550    1.007   1.000
sha_trans...(h)         284      17          67      16   0.605    0.965   0.953
read_scan...(j)         233      13          43      10   0.342    1.018     N/A
LZWReadByte(j)          268      12          45      11   0.325    1.014     N/A
main(j)                 270      12          57      14   0.375    1.007   1.000
dijkstra(d)             231       9          43       9   0.409    1.010   1.000
....                   ....    ....        ....    ....    ....     ....    ....
average               230.3     8.9        47.7     9.6   0.297    1.015   1.005
Future Work
• Study methods to find more equivalently performing
function instances to further reduce the
optimization phase order space.
• Evaluate approaches to find the dynamically
optimal function instance.
• Improve non-exhaustive searches of the phase
order space.
• Study additional methods to improve conventional
compilers.
Conclusions
• First work to show that the optimization phase
order space can often be completely enumerated
(at least for the phases in our compiler).
• First analysis of the entire phase order space to
capture various phase probabilities.
• Used phase interaction information to achieve a
much faster compiler that still generates
comparable code.
Optimization Phase Independence
• a-c and c-a are independent.
[Figure: the weighted function instance DAG shown earlier]
Optimization Phase Independence
• b-c and c-b are not independent.
[Figure: the weighted function instance DAG shown earlier]
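One way to test independence at a node is to check that applying two phases in either order reaches the same function instance. The sketch below assumes that reading of the slides; the DAG and the `independent` helper are hypothetical, and the full analysis may use a more refined definition.

```python
# Hypothetical DAG: node -> {active phase: child node}.
DAG = {
    "n0": {"a": "n1", "b": "n4", "c": "n2"},
    "n1": {"c": "n3"},
    "n2": {"a": "n3", "b": "n6"},
    "n4": {"c": "n5"},
    "n3": {},
    "n5": {},
    "n6": {},
}


def independent(dag, node, p, q):
    """True if p and q are both active at node and p-q and q-p reach the same instance."""
    edges = dag[node]
    if p not in edges or q not in edges:
        return False
    via_p = dag[edges[p]].get(q)       # instance after p then q, if q is still active
    via_q = dag[edges[q]].get(p)       # instance after q then p, if p is still active
    return via_p is not None and via_p == via_q


if __name__ == "__main__":
    print(independent(DAG, "n0", "a", "c"))
    print(independent(DAG, "n0", "b", "c"))
```

In this example a-c and c-a both reach n3, so a and c are independent at n0, while b-c and c-b reach different instances and are not.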
Independence Probabilities
[Table: independence probabilities between each pair of the 15 phases (b, c, d, g, h, i, j, k, l, n, o, q, r, s, u)]
VPO Optimization Phases (cont...)
• Register assignment (assigning pseudo registers to
hardware registers) is implicitly performed before
the first phase that requires it.
• Some phases are applied after the sequence
• fixing the entry and exit of the function to manage the
run-time stack
• exploiting predication on the ARM
• performing instruction scheduling