Integrated Code Generation Phase ordering problems

advertisement
DF00100 Advanced Compiler Construction
Phase ordering problems
TDDC86 Compiler Optimizations and Code Generation
Integrated Code Generation
IR
gcc,
lcc
 Phase ordering problems
Instruction
selection
 Clustered VLIW Processors:
More phase ordering problems
 Integrated Code Generation
 Our research project: OPTIMIST
target
code
Register
allocation
Instruction scheduling
Christoph Kessler, IDA,
Linköpings universitet, 2009.
C. Kessler, IDA, Linköpings universitet.
TDDC86 Compiler Optimizations and Code Generation
2
Phase ordering problems (1)
Phase ordering problems (2)
Instruction scheduling vs. register allocation
Conflicts Instruction selection  Scheduling / Reg. alloc.
 Selection first: Cost attribute of a pattern covering rule
(a) Scheduling first:
(b) Register allocation first:
determines Live-Ranges
 Register need,
possibly spill-code to be
inserted afterwards
Reuse of same register by different
values introduces ”artificial”
data dependences
 constrains scheduler
is only a coarse estimate of the real cost
(effect on e.g. overall time)
Real cost based on
 currently free functional units
 other instructions ready to execute simultaneously
 pending latencies of already issued but unfinished
instructions
 Integration with instruction scheduling desirable
 Mutations with different resource requirements

C. Kessler, IDA, Linköpings universitet.
TDDC86 Compiler Optimizations and Code Generation
3
a=2*b
oder
a = b << 1
oder
a=b+b ?
 Different instructions with different register
need
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
4
More phase ordering problems
Clustered VLIW processor
 E.g., TI C62x, C64x DSP processors
 Register classes
 Parallel execution constrained by operand residence
 In parallel e.g.:
Data bus
Register File A
v
u
+
u
Register File B
u
v
Unit B1
Unit A1
load on A || load on B || move AB
 Mapping instructions  cluster
should preferably know already beforehand about free
move-slots in schedule...
 Instruction scheduling
 must know mapping to generate moves where needed

 Heuristic [Leupers’00]
add.A1 u, v
C. Kessler, IDA, Linköpings universitet.
add.B1 u, v
5
TDDC86 Compiler Optimizations and Code Generation

iterative optimization by simulated annealing
C. Kessler, IDA, Linköpings universitet.
6
TDDC86 Compiler Optimizations and Code Generation
1
Integrated Code Generation
for Clustered VLIW Architectures
Integrated Code Generation
for non-clustered architectures
IR
Instruction
selection
Target
code
Register
allocation
Instruction scheduling
C. Kessler, IDA, Linköpings universitet.
7
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
8
TDDC86 Compiler Optimizations and Code Generation
Integrated Code Generation
Some Approaches
Example
 Weak integration / Phase coupling
 Clustered 8-issue VLIW processor TI C6201

E.g., register pressure aware scheduler [Goodman/Hsu ’88], HRMS…
 Heuristic Phase Interleaving

 Leupers’
example
E.g., alternating partitioning / scheduling for clust. VLIW [Leupers’00]
INDIR INDIRINDIR INDIR INDIRINDIRINDIR INDIR
 Integration of 2 phases

Instruction selection and register allocation
 DP
 DP

(Dynamic pr.) for space optimization, e.g. [Aho/Johnsson ’77]
for space- or time optimization, e.g. [Fraser et al.’92]
Instruction scheduling and register allocation
 ILP
(Integer Linear Progr.) e.g. [Kästner’00]
 Integration of 3 and more phases

ILP for simple, non-pipelined RISC processor [Wilson et al.’94]
DP, for clustered VLIW: OPTIMIST [K., Bednarski’06]

ILP, for clustered VLIW: Integrated software pipelining [Eriksson, K.’09]

C. Kessler, IDA, Linköpings universitet.
9
TDDC86 Compiler Optimizations and Code Generation
TDDC86 Compiler Optimizations and Code Generation
[A. Bednarski, PhD thesis, Linköping Univ., 2006]
Retargetable integrated code
generator
Open Source
www.ida.liu.se/~chrke/optimist
x
x
11
10
Processor specification language xADML
Our project: OPTIMIST
C. Kessler, IDA, Linköpings universitet.
C. Kessler, IDA, Linköpings universitet.
TDDC86 Compiler Optimizations and Code Generation
<architecture omega="8">
Specify reservation OPx L1 L2 X2
<registers> ... </registers>
tables by
<residenceclasses> ... </residenceclasses>
t+2
X
<funits> ... </funits>
<cycle_matrix>
t+1
X X
<patterns> ... </patterns>
...
<instruction_set>
t
X
X
</cycle_matrix>
<instruction id="ADDP4" op="4407">
<target id="ADD .L1" op0="A" op1="A" op2="A" use_fu="L1"/>
<target id="ADD .L2" op0="B" op1="B" op2="B" use_fu="L2"/>
+
...
</instruction>
...
<transfer>
<target id="MOVE" op0=“A" op1=“B">
<use_fu="X2"/>
<use_fu="L1"/>
</target>
...
</transfer>
</instruction_set>
TDDC86 Compiler Optimizations and Code Generation
</architecture>
C. Kessler,
IDA, Linköpings universitet.
12
2
DF00100 Advanced Compiler Construction
Optimal Integrated Code Generation ...
TDDC86 Compiler Optimizations and Code Generation
 Why ”Optimal” ?

Quantitative assessment of heuristics

Feedback to architecture design
of an application specific processor

Often feasible for small, sometimes not so small instances
absolute
APPENDIX
Optimal Integrated Code Generation
with OPTIMIST
thanks
to modern solver technology and problem
transformations

Christoph Kessler, IDA,
Linköpings universitet, 2009.
Optimal Method 0:
Exhaustive Enumeration
instead of relative comparison
Scientific challenge
C. Kessler, IDA, Linköpings universitet.
14
TDDC86 Compiler Optimizations and Code Generation
Interleaved Exhaustive Enumeration
Any ordering of these
loops is possible...
For all possible instruction selections
For all possible schedules with each selection
For all possible moves of live values ...
 Explores a decision tree (enumeration tree, backtracking tree)
with partial solutions (= code for sub-DAGs)
 Example: (only) instruction scheduling = topological sorting
Evaluate resulting code, keep optimum
 Variant: Interleaved exhaustive enumeration
C. Kessler, IDA, Linköpings universitet.
15
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
16
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
17
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
18
TDDC86 Compiler Optimizations and Code Generation
3
Optimal Method I:
Dynamic Programming
 Compression of the solution space (tree)
 Example: (only) instruction scheduling
C. Kessler, IDA, Linköpings universitet.
19
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
20
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
21
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
22
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
23
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
24
TDDC86 Compiler Optimizations and Code Generation
4
Dynamic Programming,
Regular VLIW / Superscalar Processor
 Compression Theorem I:
Two partial solutions (codes) are equivalent, if they
(a) compute the same subDAG, and
(b) end up in the same pipeline state
 ”time profiles”
[K., Bednarski LCTES 2001]
C. Kessler, IDA, Linköpings universitet.
25
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
26
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
27
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
28
TDDC86 Compiler Optimizations and Code Generation
Results (1) – DP
Dynamic Programming,
Clustered VLIW Processor (1)
 Regular 3-issue VLIW Processor (no MOVEs)
 Register classes / Residence classes
 Random
 Instruction selection constrained by operand residence
DAGs
 Add Move-instructions speculatively  more solutions...
Data bus
Register File A
+
u
Register File B
v
u
v
Unit B1
Unit A1
add.A1 u, v
C. Kessler, IDA, Linköpings universitet.
29
TDDC86 Compiler Optimizations and Code Generation
u
C. Kessler, IDA, Linköpings universitet.
add.B1 u, v
30
TDDC86 Compiler Optimizations and Code Generation
5
Dynamic Programming,
Clustered VLIW Processor (2)
 Compression Theorem II:
Two partial solutions (codes) are equivalent, if they
(a) compute the same subDAG and
(b) end up in the same pipeline-state and
(c) hold the live values at the end in the same residence
classes
C. Kessler, IDA, Linköpings universitet.
31
TDDC86 Compiler Optimizations and Code Generation
Dynamic Programming,
Clustered VLIW Processor (3)
 Partial solutions for (subDAG, pipeState, LiveV-Residences)
C. Kessler, IDA, Linköpings universitet.
32
TDDC86 Compiler Optimizations and Code Generation
Results (2) – DP
Results (3) – DP
 Clustered 8-issue VLIW Processor TI C6201
 Clustered VLIW Processor TI C62x
 Leupers’
 TI C64x DSP Benchmarks and FreeBench
example
Measure for degree
of compression
INDIR INDIRINDIR INDIR INDIRINDIRINDIR INDIR
C. Kessler, IDA, Linköpings universitet.
33
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
Results (4) – DP
Heuristic Methods
 Single-issue vs. Single-cluster vs. Double-cluster architecture
 Heuristic Pruning

34
TDDC86 Compiler Optimizations and Code Generation
Limit number of considered schedules, moves
 Genetic Programming
C. Kessler, IDA, Linköpings universitet.
35
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
36
TDDC86 Compiler Optimizations and Code Generation
6
Results (5) – Heuristic Pruning
Results (6) – Heuristic Pruning
 Regular VLIW processor
 TI C62x Clustered VLIW DSP
C. Kessler, IDA, Linköpings universitet.
37
TDDC86 Compiler Optimizations and Code Generation
See M. Eriksson, PhD thesis, Linköping U., 2011
C. Kessler, IDA, Linköpings universitet.
38
TDDC86 Compiler Optimizations and Code Generation
Optimal Method II:
Integer Linear Programming
 Model by 0/1-variables e.g. ci,p,k,t
und linearen Constraints
 Example:
covering constraint for instruction selection
Supply of pattern instances
No spilling
Total spilling
Partial spilling
(Rematerialization)
C. Kessler, IDA, Linköpings universitet.
39
TDDC86 Compiler Optimizations and Code Generation
ILP-Model (1)
C. Kessler, IDA, Linköpings universitet.
for all i in VG,
SMusterinstanzen p Sk in p St=1...Tmax
C. Kessler, IDA, Linköpings universitet.
ci,p,k,t = 1
(1)
40
TDDC86 Compiler Optimizations and Code Generation
42
TDDC86 Compiler Optimizations and Code Generation
ILP-Model (2)
41
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
7
ILP-Model (3)
Results (7) – DP vs. ILP (AMPL+CPLEX)
 Calculate makespan from the variables ci,p,k,t
 TI C64x DSP Benchmarks and FreeBench
 Minimize makespan t:
min t
Observations:
On the average,
DP can solve larger problem instances
but ILP is often faster (where it terminates).
www.ampl.com
 Problem specified in AMPL
DP works better for deep dependence chains.
 ILP Solver (CPLEX, Gurobi, lp_solve, ...)
C. Kessler, IDA, Linköpings universitet.
43
TDDC86 Compiler Optimizations and Code Generation
C. Kessler, IDA, Linköpings universitet.
44
TDDC86 Compiler Optimizations and Code Generation
Selected Project Publications
 Christoph Kessler, Andrzej Bednarski:
Optimal integrated code generation for VLIW architectures.
Concurrency and Computation: Practice and Experience 18: 1353-1390, 2006.
 Andrzej Bednarski, Christoph Kessler:
Optimal Integrated VLIW Code Generation with Integer Linear Programming.
Proc. Euro-Par 2006 conference, Springer LNCS 4128, pp. 461-472, Aug. 2006.
 Andrzej Bednarski:




Optimal Integrated Code Generation for Digital Signal Processors.
PhD thesis, Linköping university - Institute of Technology, Linköping, Sweden,
June 2006.
Christoph Kessler, Andrzej Bednarski, Mattias Eriksson:
Classification and generation of schedules for VLIW processors.
Concurrency and Computation: Practice and Experience 19: 2369-2389, 2007.
Mattias Eriksson, Christoph Kessler:
Integrated Modulo Scheduling for Clustered VLIW Architectures.
Proc. HiPEAC-2009 High-Performance and Embedded Architecture and
Compilers, Paphos, Cyprus, Jan. 2009. Springer LNCS 5409, pp. 65-79.
Mattias Eriksson: Integrated Code Generation. PhD thesis, Linköping Univ., 2011
www.ida.liu.se/~chrke/optimist
C. Kessler, IDA, Linköpings universitet.
45
TDDC86 Compiler Optimizations and Code Generation
8
Download