
Carnegie Mellon
Optimal Scheduling “in a lifetime” for
the SPIRAL compiler
Frédéric de Mesmay
Theodoros Strigkos
based on Y. Voronenko’s idea
Scientific Code Generation Approaches


Produces blocked, parallelized, SIMDized, scheduled code
Backends: C, assembly, Verilog
The SPIRAL compiler

The IR is simple
• SSA form
• no function calls
• no pointers
• uniform type
• no control flow (dealt with by others)


Huge simple DAG
SPIRAL compiles libraries
• Plenty of time for compilation…
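
As a rough illustration of why such an IR is easy to handle, the sketch below builds the dependency DAG of a straight-line SSA block. The `Op` tuple and the `build_dag` helper are hypothetical names chosen for this example, not SPIRAL's actual data structures; the three-instruction block is the one used in the node packing example of these slides.

```python
from collections import namedtuple

# Hypothetical IR node: one SSA assignment `dst <- lhs op rhs`.
Op = namedtuple("Op", ["dst", "op", "lhs", "rhs"])

def build_dag(ops):
    """Return {dst: set of predecessor dsts} for a straight-line SSA block."""
    defined = set()
    preds = {}
    for o in ops:
        if o.dst in defined:
            raise ValueError(f"{o.dst} assigned twice: not SSA")
        # An operand is a dependency only if an earlier op produced it;
        # anything else is treated as a block input or a constant.
        preds[o.dst] = {v for v in (o.lhs, o.rhs) if v in defined}
        defined.add(o.dst)
    return preds

block = [
    Op("a", "+", "x", 3),
    Op("b", "+", "x", 5),
    Op("c", "+", "a", "b"),
]
dag = build_dag(block)
```

With no pointers, calls, or control flow, the edges found this way are the only ordering constraints a scheduler has to respect.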
Scheduling & Register Allocation

Traditionally done in two separate passes
• The first pass imposes constraints on the second
• Allocate registers first
  – introduces anti- and output dependencies
  – used when targeting out-of-order (OoO) architectures (few ISA registers, hardware scheduler)
• Schedule first
  – may run out of registers
  – used when targeting in-order processors (plenty of registers, limited hardware scheduling)
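
To make the "allocate first" cost concrete, here is a minimal sketch that lists the dependencies of a short block before and after register assignment. The `dependencies` helper, the extra instruction `d`, and the register names `r0` to `r2` are assumptions made up for this example.

```python
def dependencies(instrs):
    """Pairwise dependencies for a list of (dest, sources) instructions.

    Conservative: every earlier/later pair is compared, without checking
    for intervening writes.
    """
    deps = []
    for j, (dj, sj) in enumerate(instrs):
        for i in range(j):
            di, si = instrs[i]
            if di in sj:
                deps.append((i, j, "true"))    # read after write
            if dj in si:
                deps.append((i, j, "anti"))    # write after read
            if di == dj:
                deps.append((i, j, "output"))  # write after write
    return deps

def false_deps(instrs):
    """Keep only the dependencies created by name reuse."""
    return {d for d in dependencies(instrs) if d[2] != "true"}

# SSA form: every value has its own name (x is a block input).
ssa = [("a", ["x"]), ("b", ["x"]), ("c", ["a", "b"]), ("d", ["x"])]
# Same block after a hypothetical allocation onto r1 and r2 (x lives in r0).
allocated = [("r1", ["r0"]), ("r2", ["r0"]), ("r1", ["r1", "r2"]), ("r2", ["r0"])]
```

In SSA form `d` is independent of everything else; once it shares `r2` with `b`, it can no longer be moved ahead of `c`, which is the kind of constraint the first pass imposes on the second.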

Scheduling & Register Allocation

Optimality is achieved with Integer Linear Programming (ILP)
• Schedule code and allocate registers together
• Model the processor with constraints
• Solve the equations
• Optimal scheduling = NP-complete
• Optimal register allocation = NP-complete
• Simultaneous allocation and scheduling = problematic
Node Packing Graph
a <- x + 3
b <- x + 5
c <- a + b

     ASAP  ALAP
a    1     2
b    1     2
c    2     3

[Figure: node packing graph with nodes for operations a, b and c on Adder #1 and Adder #2 across Cycle 1, Cycle 2 and Cycle 3]

$a_{1,1} + a_{1,2} + a_{2,1} + a_{2,2} = 1$
$a_{i,j} \in \{0,1\}$
Operation Assignment : 3 equ.
FU Constraint : 4 equ.
Precedence : 28 equ.
+ Number of registers
+ Spilling
+ Objective function
+…
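
The bookkeeping behind these counts can be sketched as follows. This is an illustration under stated assumptions (unit latency, two adders, a three-cycle horizon, one binary variable per operation, cycle and adder), not the authors' generator. Under these assumptions it reproduces the 3 assignment and 4 functional-unit equations quoted above; the 28 precedence equations depend on the exact formulation and are not reproduced.

```python
from collections import Counter

preds = {"a": [], "b": [], "c": ["a", "b"]}   # c <- a + b
order = ["a", "b", "c"]                       # a topological order
n_cycles, adders = 3, (1, 2)                  # assumed horizon and units

# ASAP: earliest cycle, one cycle after the latest predecessor.
asap = {}
for op in order:
    asap[op] = 1 + max((asap[p] for p in preds[op]), default=0)

# ALAP: latest cycle, one cycle before the earliest successor.
succs = {op: [s for s in order if op in preds[s]] for op in order}
alap = {}
for op in reversed(order):
    alap[op] = min((alap[s] for s in succs[op]), default=n_cycles + 1) - 1

# One binary variable per (operation, cycle, adder) candidate.
variables = [(op, t, fu) for op in order
             for t in range(asap[op], alap[op] + 1) for fu in adders]

# Operation assignment: the variables of each operation sum to 1.
assignment = {op: [v for v in variables if v[0] == op] for op in order}

# Functional-unit constraint: at most one operation per (cycle, adder);
# only slots with more than one candidate need an inequality.
slot_load = Counter((t, fu) for _, t, fu in variables)
fu_constraints = [slot for slot, n in slot_load.items() if n > 1]
```

The ASAP/ALAP windows are what keep the variable count small: operation `c` never gets a variable for cycle 1, and `a` and `b` never get one for cycle 3.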
LP instead of ILP


ILP can hardly schedule DFT[4]…
Every ILP problem can be reformulated so that it
can be solved with an LP solver and still yield an integer
solution!
• LP is polynomial! Where is the catch?
• Characterize integral facets: “good” inequalities
• Have an objective function whose optimum lies on those facets
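
The catch can be seen on the smallest node packing instance, a triangle of three mutually conflicting nodes. The sketch below is a standard textbook example, not taken from the slides: it compares the pairwise "edge" formulation with the single clique inequality, which is the kind of "good" inequality meant above.

```python
from itertools import product

edges = [(0, 1), (1, 2), (0, 2)]   # three mutually conflicting nodes

def edge_feasible(x):
    """Weak formulation: one inequality x_i + x_j <= 1 per conflict."""
    return all(x[i] + x[j] <= 1 for i, j in edges)

def clique_feasible(x):
    """Strong formulation: one inequality for the whole clique."""
    return sum(x) <= 1

binary_points = list(product((0, 1), repeat=3))

# Best 0/1 packing under the weak formulation (brute force).
best_integer = max(sum(x) for x in binary_points if edge_feasible(x))

# A fractional point that the LP relaxation of the weak formulation accepts.
half = (0.5, 0.5, 0.5)
```

Both formulations admit exactly the same 0/1 points, but the edge formulation lets an LP solver reach `half` with objective 1.5, above the integer optimum of 1. The clique inequality cuts that point off, so an LP over the strong formulation already lands on an integer solution.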
Current Implementation



Simple processor model for functional unit
allocation
DFT(8) doable (1 hour)
Poor objective function
• Non-integer solutions, which we fix
• Lose optimality (keep a good solution?)
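
The slides do not say how the non-integer solutions are fixed. Purely as an illustration of what such a repair step could look like, the sketch below rounds a fractional assignment greedily; `round_schedule` and the fractional values in `frac` are made up for this example and are not the authors' method.

```python
def round_schedule(frac, preds, order):
    """Greedy repair. frac maps (op, cycle, fu) -> LP value in [0, 1]."""
    chosen, busy = {}, set()
    for op in order:  # topological order, so predecessors are placed first
        earliest = 1 + max((chosen[p][0] for p in preds[op]), default=0)
        cands = [(val, t, fu) for (o, t, fu), val in frac.items()
                 if o == op and t >= earliest and (t, fu) not in busy]
        if not cands:
            raise ValueError(f"no feasible slot left for {op}")
        # Prefer the largest LP value, then the earliest cycle, then the lowest unit.
        val, t, fu = max(cands, key=lambda c: (c[0], -c[1], -c[2]))
        chosen[op] = (t, fu)
        busy.add((t, fu))
    return chosen

preds = {"a": [], "b": [], "c": ["a", "b"]}
frac = {
    ("a", 1, 1): 0.5, ("a", 2, 1): 0.5,
    ("b", 1, 1): 0.5, ("b", 2, 2): 0.5,
    ("c", 2, 1): 0.25, ("c", 3, 1): 0.75,
}
schedule = round_schedule(frac, preds, ["a", "b", "c"])
```

Here the repaired schedule needs three cycles although two adders would allow two, which is one way a fixing step can give up optimality while still returning a valid schedule.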
What’s Left?



Get rid of non-integer solutions if possible
• Get a better objective function
• If not, determine if our “fixing step” is costly
Add register allocation equations
Deal with necessary spilling
Thank You!

Questions?