A Dynamic Programming Approach to Optimal Integrated Code Generation

Christoph Keßler
Andrzej Bednarski
Linköping University (Sweden)
Outline





Code generation
Our integrated approach
Implementation and results
Current and future work
Conclusion
Code Generation
[Figure: phases of code generation between IR and target code. Labels: IR-level instruction scheduling, instruction selection, target-level instruction scheduling.]
Related Work


Heuristics
Optimal approaches





ILP
Dynamic programming
Branch-and-bound
Enumeration
Constraint logic programming
Integrated Code Generation
[Figure: the same diagram of phases between IR and target code. Labels: IR-level instruction scheduling, instruction selection, target-level instruction scheduling.]
Integrated Approach

Christoph Keßler’s previous work





Scheduling by topological sorting
Dynamic programming
Selection DAG
Time profile
Extended selection DAG
Basic block scope of code generation
Topological Sorting
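[Figure: a topological sorting step on a DAG, showing zero-indegree sets z and z', nodes u and v, and the already scheduled parts scheduled(z) and scheduled(z').]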
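The idea of scheduling by topological sorting can be illustrated with a minimal Python sketch. The example DAG `SUCCS` and the helper names `indegrees`, `select`, and `all_schedules` are assumptions made for illustration; they are not taken from the slides or from the authors' implementation. Each step picks one node v from the current zero-indegree set z and derives the next set z'.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

# Hypothetical example DAG (not the one on the slides): node -> successors.
# An edge u -> v means that u must be scheduled before v.
SUCCS: Dict[str, List[str]] = {
    "a": ["d"],
    "b": ["d"],
    "c": ["e"],
    "d": ["e"],
    "e": [],
}


def indegrees(succs: Dict[str, List[str]]) -> Dict[str, int]:
    """Count the incoming edges of every node."""
    deg = {v: 0 for v in succs}
    for v in succs:
        for w in succs[v]:
            deg[w] += 1
    return deg


def select(z: FrozenSet[str], v: str, deg: Dict[str, int],
           succs: Dict[str, List[str]]) -> Tuple[FrozenSet[str], Dict[str, int]]:
    """Schedule v from zero-indegree set z; return the new set z' and indegrees."""
    new_deg = dict(deg)
    new_z = set(z) - {v}
    for w in succs[v]:
        new_deg[w] -= 1
        if new_deg[w] == 0:  # all predecessors of w are now scheduled
            new_z.add(w)
    return frozenset(new_z), new_deg


def all_schedules(succs: Dict[str, List[str]]) -> Iterator[List[str]]:
    """Enumerate every topological order by trying each node of z in turn."""
    deg = indegrees(succs)
    z0 = frozenset(v for v, d in deg.items() if d == 0)

    def rec(z: FrozenSet[str], deg: Dict[str, int],
            prefix: List[str]) -> Iterator[List[str]]:
        if not z:  # nothing left to select: the prefix is a complete schedule
            yield list(prefix)
            return
        for v in sorted(z):
            z2, deg2 = select(z, v, deg, succs)
            prefix.append(v)
            yield from rec(z2, deg2, prefix)
            prefix.pop()

    yield from rec(z0, deg, [])


if __name__ == "__main__":
    for schedule in all_schedules(SUCCS):
        print(" ".join(schedule))
```

Enumerating schedules this way visits every prefix separately, which is the redundancy that the selection tree and selection DAG slides address.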
Selection Tree
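[Figure: selection tree for an example DAG with nodes a to h. The root is the zero-indegree set {a,b,c}; each edge is labeled with the selected node and leads to a new zero-indegree set, such as {b,c}, {a,c}, {a,b}, {c,d}, {a,e}, {b}, …]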
Selection DAG


Merge multiple instances of the same zero-indegree set z into one selection node
Selection DAG


The selection DAG is leveled, with n+1 levels
Each schedule S corresponds to one path in the selection DAG
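The merging step and the level structure can be sketched as follows. This is a minimal illustration under assumptions: the DAG `PREDS` and the functions `zero_indegree` and `build_selection_dag` are hypothetical names, and n is taken to be the number of DAG nodes. Each level maps a zero-indegree set z to scheduled(z) and to the number of selection-tree paths that were merged into that node.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical example DAG (not the one on the slides): node -> predecessors.
PREDS: Dict[str, List[str]] = {
    "a": [],
    "b": [],
    "c": [],
    "d": ["a", "b"],
    "e": ["c", "d"],
}

Level = Dict[FrozenSet[str], Tuple[FrozenSet[str], int]]


def zero_indegree(preds: Dict[str, List[str]],
                  scheduled: FrozenSet[str]) -> FrozenSet[str]:
    """Unscheduled nodes whose predecessors have all been scheduled."""
    return frozenset(
        v for v, ps in preds.items()
        if v not in scheduled and all(p in scheduled for p in ps)
    )


def build_selection_dag(preds: Dict[str, List[str]]) -> List[Level]:
    """Return n+1 levels; level k holds the sets z reached after k selections."""
    root = zero_indegree(preds, frozenset())
    levels: List[Level] = [{root: (frozenset(), 1)}]
    for _ in range(len(preds)):
        nxt: Level = {}
        for z, (scheduled, paths) in levels[-1].items():
            for v in z:
                s2 = scheduled | {v}
                z2 = zero_indegree(preds, s2)
                # Merge: an already known z2 only accumulates the path count.
                _, old_paths = nxt.get(z2, (s2, 0))
                nxt[z2] = (s2, old_paths + paths)
        levels.append(nxt)
    return levels


if __name__ == "__main__":
    for k, level in enumerate(build_selection_dag(PREDS)):
        merged = sum(paths for _, paths in level.values())
        print(f"level {k}: {len(level)} selection nodes, "
              f"{merged} selection-tree nodes merged")
```

The path counts show how many selection-tree nodes collapse into each level of the selection DAG; every root-to-leaf path still corresponds to one schedule.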
Selection DAG
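[Figure: selection DAG for the same example DAG with nodes a to h. Identical zero-indegree sets from the selection tree, such as {c,d}, are merged into a single selection node; the root is {a,b,c}.]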
Towards Time Optimization

Machine model



Generic superscalar/VLIW architecture
Single/Multiple issue
From IR level to target level



Instruction selection
Register allocation (homogeneous)
Imitate instruction dispatcher behaviour
Time Profile

Window of the instructions scheduled last for each unit that may still influence future scheduling decisions
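[Figure: time profile at reference time t for units u1, u2, u3, showing the most recently scheduled instructions a to f and empty slots (-).]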
Extended Selection Node


An extended selection node (z, t, P) summarizes all schedules of scheduled(z) that end with the time profile (t, P).
Pruning (formal proof in the paper)
[Figure: time profiles over units u1, u2, u3 at reference times t and t', illustrating pruning.]
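The way extended selection nodes compress the solution space can be sketched for a deliberately simplified machine. Everything specific here is an assumption for illustration and not the authors' model: a single-issue pipelined processor, the latency table `LAT`, the DAG `PREDS`, and a profile that stores only the instructions still in flight together with their remaining cycles. Schedules that reach the same (scheduled set, t, P) are merged into one node, because their possible continuations are identical. The paper's formal pruning rule and proof are not reproduced.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical DAG (node -> predecessors) and result latencies in cycles.
PREDS: Dict[str, List[str]] = {
    "a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["c", "d"],
}
LAT: Dict[str, int] = {"a": 3, "b": 1, "c": 1, "d": 1, "e": 1}

Profile = Tuple[Tuple[str, int], ...]      # (instruction, remaining cycles)
State = Tuple[FrozenSet[str], int, Profile]  # (scheduled set, t, P)


def zero_indegree(scheduled: FrozenSet[str]) -> FrozenSet[str]:
    """Unscheduled nodes whose predecessors have all been scheduled."""
    return frozenset(
        v for v, ps in PREDS.items()
        if v not in scheduled and all(p in scheduled for p in ps)
    )


def step(state: State, v: str) -> State:
    """Issue v as early as the dependences allow (single issue per cycle)."""
    scheduled, t, profile = state
    ready = {p: t + rem for p, rem in profile}   # absolute result times
    start = max([t] + [ready[p] for p in PREDS[v] if p in ready])
    t2 = start + 1                               # next free issue slot
    ready[v] = start + LAT[v]
    # Keep only instructions whose results are still pending at time t2.
    profile2 = tuple(sorted((p, r - t2) for p, r in ready.items() if r > t2))
    return (scheduled | {v}, t2, profile2)


def makespan(state: State) -> int:
    """Time at which every scheduled instruction has completed."""
    _, t, profile = state
    return t + max([0] + [rem for _, rem in profile])


def optimal_schedule() -> Tuple[int, List[str]]:
    """Level-wise dynamic programming over merged extended selection nodes."""
    level: Dict[State, List[str]] = {(frozenset(), 0, ()): []}
    for _ in range(len(PREDS)):
        nxt: Dict[State, List[str]] = {}
        for state, schedule in level.items():
            for v in sorted(zero_indegree(state[0])):
                # Equal (z, t, P): keep one representative schedule.
                nxt.setdefault(step(state, v), schedule + [v])
        level = nxt
    best = min(level, key=makespan)
    return makespan(best), level[best]


if __name__ == "__main__":
    print(optimal_schedule())
```

In this toy model the long-latency instruction a has to be issued first so that its latency overlaps with the independent instructions b and c.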
Extended Selection DAG
Level 0
Level 1
Level 2
...
Solution Space



Group the extended selection nodes in each level according to execution time
Construct solution space in order of increasing time
Postpones the combinatorial explosion
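One way to picture construction in order of increasing time is a best-first search with a priority queue. This is a swapped-in stand-in technique, not necessarily the construction used by the authors, and it reuses the same simplified assumptions as a toy model: a single-issue machine, a hypothetical DAG `PREDS`, hypothetical latencies `LAT`, and a profile of in-flight instructions. Because the completion time never decreases when another instruction is appended in this model, the first complete node taken from the queue is time-optimal, and more expensive nodes may never be expanded.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, Iterator, List, Tuple

# Hypothetical DAG (node -> predecessors) and result latencies in cycles.
PREDS: Dict[str, List[str]] = {
    "a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["c", "d"],
}
LAT: Dict[str, int] = {"a": 3, "b": 1, "c": 1, "d": 1, "e": 1}

Profile = Tuple[Tuple[str, int], ...]
State = Tuple[FrozenSet[str], int, Profile]  # (scheduled set, t, P)


def successors(state: State) -> Iterator[Tuple[str, State]]:
    """Yield (selected node, resulting state) for every selectable node."""
    scheduled, t, profile = state
    ready = {p: t + rem for p, rem in profile}
    for v in sorted(PREDS):
        if v in scheduled or any(p not in scheduled for p in PREDS[v]):
            continue
        start = max([t] + [ready[p] for p in PREDS[v] if p in ready])
        t2 = start + 1
        ready2 = dict(ready)
        ready2[v] = start + LAT[v]
        profile2 = tuple(
            sorted((p, r - t2) for p, r in ready2.items() if r > t2))
        yield v, (scheduled | {v}, t2, profile2)


def finish_time(state: State) -> int:
    """Completion time of everything scheduled so far."""
    _, t, profile = state
    return t + max([0] + [rem for _, rem in profile])


def first_optimal() -> Tuple[int, List[str], int]:
    """Expand nodes in order of increasing time; stop at the first full one."""
    start: State = (frozenset(), 0, ())
    tie = count()  # tie-breaker so the heap never compares states
    heap = [(0, next(tie), start, [])]
    seen = set()
    expanded = 0
    while heap:
        time, _, state, schedule = heapq.heappop(heap)
        if state in seen:
            continue
        seen.add(state)
        if len(state[0]) == len(PREDS):
            return time, schedule, expanded
        expanded += 1
        for v, nxt in successors(state):
            if nxt not in seen:
                heapq.heappush(
                    heap, (finish_time(nxt), next(tie), nxt, schedule + [v]))
    raise ValueError("no complete schedule: the dependence graph has a cycle")


if __name__ == "__main__":
    print(first_optimal())
```

The third return value counts how many nodes were expanded before the first complete schedule appeared, which gives a rough feel for how much of the solution space was left unexplored.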
Implementation




C++
LEDA
XML-based architecture description language
LCC as C front end
Results – Random DAGs
Results – FIR Filter
Basic block   DAG #nodes   Time (architecture 1)   Time (architecture 2)
BB1           16           3.5s                    4.0s
BB2           16           8.0s                    9.5s
BB3           30           3:21:50.2s              4:40:44.9s
Results – Matrix Multiplication
Basic block      DAG #nodes   Time (architecture 1)   Time (architecture 2)
BB2              30           1:05.0s                 1:41.8s
BB2 (unrolled)   40           6:08.5s                 9:47.2s
Results – Jacobi Grid Relax.
Basic block     DAG #nodes   Time (architecture 1)   Time (architecture 2)
Loop body (5)   40           1:15.8s                 1:31.8s
Loop body (9)   53           1:36:13.2s              2:00:51.5s
Current and Future Work




Time-space profile for irregular register sets
Speculative instruction selection
Extensions of architecture description language
Beyond basic block level

Time-space profiles as connector descriptions
Conclusion








Goal: fully integrated code generation
Dynamic programming approach
Time profiles to compress the solution space
Improved order of solution space construction
Feasible for medium sized basic blocks
Potential for extensions
Alternative to ILP
Home page: www.ida.liu.se/~chrke/optimist