A Dynamic Programming Approach to Optimal Integrated Code Generation

Christoph Keßler, Andrzej Bednarski
Linköping University (Sweden)

Outline
    Code generation
    Our integrated approach
    Implementation and results
    Current and future work
    Conclusion

Code Generation
    [Diagram: from IR to target code via IR-level instruction scheduling, instruction selection, and target-level instruction scheduling]

Related Work
    Heuristics
    Optimal approaches
        ILP
        Dynamic programming
        Branch-and-bound
        Enumeration
        Constraint logic programming

Integrated Code Generation
    [Diagram: from IR to target code; same phases as on the Code Generation slide]

Integrated Approach
    Christoph Keßler's previous work
        Scheduling by topological sorting
        Dynamic programming
        Selection DAG
    Time profile
    Extended selection DAG
    Basic block scope of code generation

Topological Sorting
    [Figure: zero-indegree sets z and z', nodes u and v, scheduled(z) and scheduled(z')]

Selection Tree
    [Figure: selection tree for an example DAG with nodes a to h, rooted at the zero-indegree set {a,b,c}]

Selection DAG
    Merge multiple instances of the same zero-indegree set z into one selection node
    Selection DAG is leveled in n+1 levels
    Each schedule S corresponds to one path in the selection DAG
    [Figure: selection DAG for the same example DAG with nodes a to h, rooted at {a,b,c}]

Towards Time Optimization
    Machine model
        Generic superscalar/VLIW architecture
        Single/multiple issue
    From IR level to target level
        Instruction selection
        Register allocation (homogeneous)
    Imitate instruction dispatcher behaviour

Time Profile
    Window of the instructions scheduled last for each unit that may still influence future scheduling decisions
    [Figure: example time profile at time t over units u1, u2, u3]

Extended Selection Node
    An extended selection node (z, t, P) summarizes all schedules of scheduled(z) that end with the time profile (t, P).
    Pruning (formal proof in the paper)
    [Figure: three example time profiles, at times t and t', over units u1, u2, u3]

Extended Selection DAG
    [Figure: extended selection DAG with Level 0, Level 1, Level 2, ...]

Solution Space
    Group the extended selection nodes in each level according to execution time
    Construct solution space in order of increasing time
    Postpones the combinatorial explosion

Implementation
    C++
    LEDA
    XML-based architecture description language
    LCC as C front-end

Results – Random DAGs
    [Plots on two slides]

Results – FIR Filter
    Basic block   DAG #nodes   Time (archi. 1)   Time (archi. 2)
    BB1           16           3.5s              4.0s
    BB2           16           8.0s              9.5s
    BB3           30           3:21:50.2s        4:40:44.9s

Results – Matrix Multiplication
    Basic block      DAG #nodes   Time (archi. 1)   Time (archi. 2)
    BB2              30           1:05.0s           1:41.8s
    BB2 (unrolled)   40           6:08.5s           9:47.2s

Results – Jacobi Grid Relax.
    Basic block     DAG #nodes   Time (archi. 1)   Time (archi. 2)
    Loop body (5)   40           1:15.8s           1:31.8s
    Loop body (9)   53           1:36:13.2s        2:00:51.5s

Current and Future Work
    Time-space profile for irregular register sets
    Speculative instruction selection
    Extensions of architecture description language
    Beyond basic block level
    Time-space profiles as connector descriptions

Conclusion
    Goal: fully integrated code generation
    Dynamic programming approach
    Time profiles to compress the solution space
    Improved order of solution space construction
    Feasible for medium-sized basic blocks
    Potential for extensions
    Alternative to ILP

Home page: www.ida.liu.se/~chrke/optimist
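
Sketch: building a selection DAG

The selection DAG described on the slides (merge equal zero-indegree sets, n+1 levels, one path per schedule) can be illustrated with a short Python sketch. It is not the authors' C++/LEDA implementation: the function name build_selection_dag, the successor-dictionary input format, and the two small example DAGs are assumptions made for illustration. The sketch only counts schedules per selection node; it does not model time profiles, instruction selection, or pruning.

```python
def build_selection_dag(succ):
    """Build the leveled selection DAG of a dependence DAG.

    succ maps each node to the list of its successors (hypothetical input format).
    Returns a list of n+1 levels; each level maps a zero-indegree set z to a pair
    (scheduled(z), number of schedules of scheduled(z) that reach z).
    """
    # Predecessor sets, needed to decide when a successor becomes schedulable.
    preds = {v: set() for v in succ}
    for v, ws in succ.items():
        for w in ws:
            preds[w].add(v)

    n = len(succ)
    z0 = frozenset(v for v in succ if not preds[v])
    levels = [{z0: (frozenset(), 1)}]

    for _ in range(n):
        nxt = {}
        for z, (sched, count) in levels[-1].items():
            for v in z:  # selecting v is one edge of the selection DAG
                new_sched = sched | {v}
                released = {w for w in succ[v] if preds[w] <= new_sched}
                new_z = (z - {v}) | released
                # Merge all instances of the same zero-indegree set into one node.
                _, old = nxt.get(new_z, (new_sched, 0))
                nxt[new_z] = (new_sched, old + count)
        levels.append(nxt)
    return levels


# Example 1 (assumed): a and b are independent, both must precede c.
levels = build_selection_dag({"a": ["c"], "b": ["c"], "c": []})

# Example 2 (assumed): four independent nodes, 4! schedules but only 2^4 selection nodes.
indep = build_selection_dag({v: [] for v in "abcd"})
```

In the second example the selection tree would have one leaf per schedule (24), while the selection DAG has 16 nodes in total, which shows in miniature why merging equal zero-indegree sets compresses the solution space.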