Contr o l F

advertisement
Page 1
C. Kessler, IDA, Linköpings Universitet, 2009.
[ASU1e 10.4] [ALSU2e 9.6] [Muchnick 7]
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Control Flow Analysis
necessary to enable global optimizations beyond basic blocks
basis for data-flow analysis
3
2
1
m <= 1 ?
f1 = 1
f0 = 0
receive m
C. Kessler, IDA, Linköpings Universitet, 2009.
i = 2
N
5
i <= m ?
N
11
10
9
i = i + 1
f1 = f2
f0 = f1
Y
8 f2 = f0+f1
6
return f2
4
7
(iterative)
(recursive)
(recursive)
reconstruction of if-then-else, loops
from MIR, from unstructured source code or from target code
! loops: candidates for loop transformations, software pipelining
! if-then-else: candidates for predication
identify basic blocks of a routine
m
Y
Page 3
construct its flow graph / basic block graph
1. Dominator-based analysis
2. Interval analysis
3. Structural analysis
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Example (cont.): Control Flow Graph
receive m
// read arg value
f0 = 0
f1 = 1
if m<=1 goto L3
12
return
i = 2
L1: if i<=m goto L2
return f2
L2: f2 = f0 + f1
f0 = f1
f1 = f2
i = i + 1
goto L1
L3: return f2
// MIR
1
2
3
4
5
6
7
8
9
10
11
12
13
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 2
Page 4
Control Flow Analysis – Running Example
// Fibonacci - Iterative alg.
}
unsigned int f0 = 0,
f1 = 1,
f2, i;
if (m <= 1)
return m;
else {
for (i=2; i<=m; i++) {
f2 = f0 + f1;
f0 = f1;
f1 = f2;
}
return f2;
unsigned int fib (
unsigned int m )
{
}
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Detecting basic blocks
C. Kessler, IDA, Linköpings Universitet, 2009.
receive m
// read arg value
f0 = 0
f1 = 1
if m<=1 goto L3
i = 2
L1: if i<=m goto L2
return f2
L2: f2 = f0 + f1
f0 = f1
f1 = f2
i = i + 1
goto L1
L3: return f2
// MIR, flattened
1
2
3
4
5
6
7
8
9
10
11
12
13
C. Kessler, IDA, Linköpings Universitet, 2009.
basic block (BB)
= max. sequence of consecutive statements (IR or target level)
that can be entered by program control only via the first one
and left only via the last one.
first instruction (“leader”) of a BB: either
+ entry point of a procedure, or
+ branch target, or
+ instruction immediately following a branch or return
! call instructions need not delimit the basic block
(ok for most cases, but not for e.g. instruction scheduling)
! exception-based control transfer not considered here
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Basic-block graph
Page 5
C. Kessler, IDA, Linköpings Universitet, 2009.
Terminology: in [Muchnick’97] called control-flow graph CFG,
whereas “our” CFG (statement level) is there called a “flowchart”
rooted, directed graph G = (N E )
nodes = basic blocks + entry + exit
Succ(b)
=
Pred (b)
1
f0 = 0
receive m
i = 2
f0 = f1
fn 2 N : (b n) 2 E g
B2
Y
exit
entry
B1
B5
N
B6
Y
B4
B3
C. Kessler, IDA, Linköpings Universitet, 2009.
fn 2 N : (n b) 2 E g
Page 7
=
edges = control flow edges from CFG/flowchart
(there connecting BB exits to the leaders of their successor BBs)
+ enter ! initial basic block
+ final basic blocks (no succ.) ! exit
successor BB’s of a BB b:
predecessor BB’s of a BB b:
2
f1 = 1
5
N
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Y
4
3
2
1
m <= 1 ?
f1 = 1
f0 = 0
receive m
return f2
6
5
N
N
10
9
i = i + 1
f1 = f2
f0 = f1
Y
8 f2 = f0+f1
i <= m ?
i = 2
12
Page 6
return m
Y
3
2
1
f1 = 1
f0 = 0
receive m
C. Kessler, IDA, Linköpings Universitet, 2009.
5
i = 2
N
i <= m ?
10
9
i = i + 1
f1 = f2
f0 = f1
Y
8 f2 = f0+f1
6
N
11
C. Kessler, IDA, Linköpings Universitet, 2009.
return f2
m <= 1 ?
7
4
Example (cont.): CFG ! Basic Block Leaders
12
return m
7
11
Page 8
Algorithm for computing the EBB’s of a CFG: see e.g. [Muchnick 7.1]
EBB’s are useful for some optimizations e.g. instruction scheduling
! EBB also known as treegion
! single entry, multiple exits, tree-like internal control flow
Extended basic block (EBB)
= max. sequence of instructions beginning with a leader
that contains no join nodes other than (maybe) its first node
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
3
m <= 1 ?
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
4
Extended basic blocks, regions
Y
return m
i <= m ?
N
9
f1 = f2
Y
8 f2 = f0+f1
6
return f2
10
i = i + 1
N
Example (cont.): CFG + Basic Blocks ! Basic Block Graph
12
7
11
Region
= strongly connected subgraph (SCC) of the CFG with a single entry
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Y
entry
B1
B5
N
N
B6
Y
B4
B3
Example (cont.): Extended Basic Blocks
B2
exit
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 9
Page 11
Graph-theoretic concepts of control-flow analysis (1)
C. Kessler, IDA, Linköpings Universitet, 2009.
C. Kessler, IDA, Linköpings Universitet, 2009.
depth-first search (dfs): recursively explore descendants of a node
before any of its siblings (as far as not yet visited)
dfs-number: order in which dfs enters nodes
tree edges: edges followed by dfs via recursive calls
dfs-tree: ( nodes, tree edges )
non-tree edges classified as
forward edges “F”
back edges “B”
cross edges “C”
not unique! depends on ordering of descendants
see also DFS-slides on course homepage
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Finding Loops
Page 10
Programs spend most of the execution time in loops.
C. Kessler, IDA, Linköpings Universitet, 2009.
! Optimizations that exploit the loop structure are important
loop unrolling, loop parallelization, software pipelining, ...
Loops may be expressed in programs by different constructs
(while, for, goto, ..., compiler-converted tail-recursion)
! Find uniform treatment for program loops
T
1
4
C
T
T
6
Page 12
T
7
T
8
B
B
T
2
T
T
3
T
1
5
C
C
4
T
6
T
7
T
8
B
C. Kessler, IDA, Linköpings Universitet, 2009.
Use a general approach based on graph-theoretic properties of CFG.
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
2
T
3
T
Example: DFS-tree, edge classification
B
F
5
Page 13
C. Kessler, IDA, Linköpings Universitet, 2009.
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 14
Dominance, immediate dominance, strict dominance
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Graph-theoretic concepts of control-flow analysis (2)
idom(b) is unique for each b 2 N
! (i)dominator tree, rooted at entry
Lamp
N
N
Y
B4
B6
Page 16
strict dominance d sdom b if d dom b and d 6= b
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
entry
B1
B5
C. Kessler, IDA, Linköpings Universitet, 2009.
C. Kessler, IDA, Linköpings Universitet, 2009.
i immediately dominates b
(i idom b)
if i dom b and there is no c 2 N, i 6= c 6= b, with i dom c and c dom b
dom is reflexive, transitive, antisymmetric ! partial order on N
d dominates b
(d dom b)
if every possible execution path entry ! b includes d
Given: flow graph G = (N E ), nodes d i p b 2 N
C. Kessler, IDA, Linköpings Universitet, 2009.
preorder traversal of a digraph G = (N E ):
at each node b 2 N process b before its descendants
(not unique; in dfsnum-order)
Page 15
postorder traversal
at each node b 2 N process b after its descendants
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Y
Dominance intuition (2)
Lamp
N
B4
Y
exit
B3
Dominance intuition (1)
entry
B1
N
B6
B2
The entry node dominates all nodes:
Y
B5
B3
Imagine a source of light going into the entry node,
nodes are transparent and edges are optical fibers
B2
exit
Place an opaque barrier at node v ! nodes dominated by v get dark
(Adapted from a nice presentation by J. Amaral 2003)
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 17
C. Kessler, IDA, Linköpings Universitet, 2009.
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Y
exit
entry
B1
B5
Lamp
N
N
B3
Y
B4
B6
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
B2
Dominance intuition (4)
Lamp
N
Y
B4
B3
Dominance intuition (3)
entry
B1
N
B6
C. Kessler, IDA, Linköpings Universitet, 2009.
Node B6 only dominates itself:
Y
B5
Page 19
Node B3 dominates B3, B4, B5, B6:
B2
exit
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
c1
c2
a
....
Page 20
Page 18
C. Kessler, IDA, Linköpings Universitet, 2009.
C. Kessler, IDA, Linköpings Universitet, 2009.
if D =
6 Domin(n)
Domin(n) D; change true
return Domin
p2Pred (n)
change true;
Domin(n) N 8n 2 N ; frg
Domin(r) frg;
while ( change )
change false
for all n 2 N ; frg // in dfsnum order
T Domin( p)
D fng Algorithm 1:
Computing the dominators of a node
or
(1) a = b,
iff
Postdominance
b dom p in the reversed flow graph
a dom b
()
p postdominates b
(p pdom b)
if every possible execution path b ! exit includes p
p pdom b
(2) 9 unique
immediate
predecessor of b,
namely a,
i.e. Pred (b) = fag,
or
(3) for all c 2 Pred (b)
c=
6 a and a dom c.
b
B2
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
entry
B1
N
N
B3
B4
Y
B6
node i
entry
B1
B2
B3
B4
B5
B6
exit
Page 21
Page 23
Domin(i), init.
fentryg
change=true
fentry, B1, B2, B3, B4, B5, B6, exitg
fentry, B1, B2, B3, B4, B5, B6, exitg
fentry, B1, B2, B3, B4, B5, B6, exitg
fentry, B1, B2, B3, B4, B5, B6, exitg
fentry, B1, B2, B3, B4, B5, B6, exitg
fentry, B1, B2, B3, B4, B5, B6, exitg
fentry, B1, B2, B3, B4, B5, B6, exitg
Example: Computing dominators
Y
B5
exit
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Y
entry
B1
N
N
B3
B4
Y
B6
node i
entry
B1
B2
B3
B4
B5
B6
exit
B2
init. Tmp(i)=Domin(i)–fig
fentryg
fentryg
fentry, B1g
fentry, B1g
fentry, B1, B3g
fentry, B1, B3, B4g
fentry, B1, B3, B4g
fentry, B1g
Example: Computing immediate dominators
B2
B5
exit
Dominator tree:
iter. 2 ...
...
...
...
...
...
...
...
...
C. Kessler, IDA, Linköpings Universitet, 2009.
Domin(i), iter. 1
fentryg
fentry, B1g change=true
fentry, B1, B2g
fentry, B1, B3g
fentry, B1, B3, B4g
fentry, B1, B3, B4,B5g
fentry, B1, B3, B4,B6g
fentry, B1, exitg
exit
C. Kessler, IDA, Linköpings Universitet, 2009.
B3
B4
B6
Tmp(i), iter. 1
fentryg
fentryg
fB1g
fB1g
fB3g
fB4g
fB4g
fB1g
entry
B1
B5
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 22
Extension for computing immediate dominators
... compute Domin ...
for each n 2 N
Tmp(n) Domin(n) ; fng
C. Kessler, IDA, Linköpings Universitet, 2009.
C. Kessler, IDA, Linköpings Universitet, 2009.
for each n 2 N ; frg // in dfsnum order
// if a s in Tmp(n) has a dominator t 6= s, remove t from Tmp(n)
for each s 2 Tmp(n)
for each t 2 Tmp(n) ; fsg
if t 2 Tmp(s) then
Tmp(n) Tmp(n) ; ft g
for each n 2 N ; frg // in dfsnum order
idom(n) the b 2Tmp(n)
Page 24
Total time: O(n2e) if sets are represented by bitvectors
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Computing dominators
Algorithm 2 [Lengauer/Tarjan’79]
based on depth first search and path compression
time O(e log n) or O(e α(e n))
(see e.g. [Muchnick pp. 185–190])
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 25
Loops and Strongly Connected Components
T
T
T
T
C
c
a
b
w
v’
? Can’t exist because n dom m
a
b
c
d
e
T
T
T
T
F
C. Kessler, IDA, Linköpings Universitet, 2009.
d
e
C. Kessler, IDA, Linköpings Universitet, 2009.
B
We call a (backward, B) edge (m n) a loop back edge if n dom m.
Remark: Not every B edge is a loop back edge!
w
v’
B
Natural loop of a loop back edge (m n)
= subgraph of n and all nodes v
from which m can be reached without passing through n
v
n
n is the loop header.
B
m
Page 27
c does not dominate d ) not a natural loop (2 entry points)
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Identifying the natural loop of a loop back edge
v
n
stopping recursive backward search at already found loop nodes.
Backwards from m, (df)search predecessors v,
Start by marking m and n as loop nodes.
Algorithm: Compute the loop node set for a given loop back edge (m n)
B
m
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Y
entry
B1
B5
N
N
n
m
B6
Y
B4
B3
Example (cont.): Natural Loop
B2
exit
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Loop header, preheader
B2
Page 26
C. Kessler, IDA, Linköpings Universitet, 2009.
C. Kessler, IDA, Linköpings Universitet, 2009.
B2
B
Page 28
B1
header
pre−header
For technical reasons, add a pre-header (initially empty)
if the header has more than 2 predecessors:
B1
header
B
! Easier to place new instructions immediately before the loop
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Properties of Natural Loops
Page 29
C. Kessler, IDA, Linköpings Universitet, 2009.
Two natural loops with different headers are either disjoint
or one is nested in the other.
Each natural loop is a SCC.
Background:
Strongly connected component (SCC)
= subgraph S = (NS ES), NS N, ES E,
where every node in VS is reachable in S
from every other node in VS via edges in ES.
SCC’s can be computed with Tarjan’s algorithm (extension of dfs)
in time O(jV j + jE j)
[Tarjan’72]
Page 31
C. Kessler, IDA, Linköpings Universitet, 2009.
Several loops sharing a common header node is a pathological special case that must be treated ad hoc.
See e.g. Muchnick 7.4 for more details.
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Reducibility of flow graphs
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Properties of Natural Loops (cont.)
Page 30
C. Kessler, IDA, Linköpings Universitet, 2009.
C. Kessler, IDA, Linköpings Universitet, 2009.
A SCC S V is maximal if every SCC containing S is just S itself.
Example:
entry
B1
B2
B3
exit
Page 32
S1 = fB1 B2 B3g is a maximal SCC.
S2 = fB2g is a SCC but not a maximal SCC.
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Interval analysis
B1a
T2:
Simplest variant: T 1-T 2 Analysis [Ullman’73]
T1:
B1
Try to fold entire flow graph into a single node
Works only for very restricted flow graphs
B2
B1
B1a
Hierarchical folding structure allows for faster / simpler data flow analysis
! nested regions (control tree)
! abstract flowgraph
Divide flowgraph into regions (e.g., loops in CFA)
b
A flow-graph is reducible if all B edges in any DFS tree are loop back edges.
c
Repeatedly collapse a region to an abstract node
d
c’
f
d
Intuitively: ... if there are no jumps into the middles of loops (e.g., goto’s).
c
f
b
e
e
Reducible flow graphs are well-structured (loops properly nested).
Irreducible flow graphs are rare
and can be made reducible by replicating nodes.
B1a
B1a
B3a
T1:
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
B1
T2:
T1-T2 Analysis — Example
T1:
Example:
B1
B2
B3
B4
B1a
B3b
Page 33
T2:
T2:
B1
B2
B2
B1b
C. Kessler, IDA, Linköpings Universitet, 2009.
B1a
B1a
B1
B3
T 1-T 2 control tree
B1a
C. Kessler, IDA, Linköpings Universitet, 2009.
B1b
B1
B3b
B3a
B4
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Structural Analysis
is a special case of interval analysis:
Page 34
C. Kessler, IDA, Linköpings Universitet, 2009.
CFG folding follows the hierarchical structure of the program
! folding transformations for loops, if-then-else, switch, etc.
Every region has 1 entry point
Works only for well-structured programs
– Extensions to handle arbitrary flowgraphs
(define a new region / transf. for otherwise irreducible constructs)
Y
B1
entry
R8
B5
N
R7
n
...
m
B3
Y
B4
B6
Page 36
entry
R7
R8
B6
B5
exit
C. Kessler, IDA, Linköpings Universitet, 2009.
procedure
R9
B2
B3
B4
Region Hierarchy Tree
B1
+ Equations etc. for dataflow analysis can be pre-formulated
for each construct ! faster
R9
B2
exit
Remark: If only loop-based regions are of interest, the hierarchy flattens accordingly (R8 and R9 merged with top level).
N
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 35
Tstmt−block:
B1
B2
Bn
Twhile−loop:
B2
B1a
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
B1a
B1a
B1a
Example (cont.): Structural analysis
B1
B3
Structural analysis — Some regions and transformations
Tself−loop:
B1
Tif−then:
B2
B1
Tif−then−else:
B2
C. Kessler, IDA, Linköpings Universitet, 2009.
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Page 37
TDDC86 Compiler Optimizations and Code Generation — Control Flow Analysis.
Summary: Control-Flow Analysis
g = all leaf regions, i.e., all single blocks in G, in any order
Structural analysis
Interval-based CFA
Dominator-based CFA
Loop detection
Basic blocks, extended basic blocks
Computing a bottom-up order of regions of a reducible flow graph
:::
Input: A reducible flow graph G
Output: A bottom-up ordered list R of loop-based regions of G
1. R fB1 B2
2. repeat
Choose a natural loop L such that,
if there are any natural loops L0 contained within L,
then the (body and loop) regions for these L0 were already added to R.
R.add( the region consisting of the body of L )
// body of L = L without the back edges to the header of L
R.add ( the loop region for L )
until all natural loops have been considered
3. If the entire flow graph is not itself a natural loop,
R.add( the region consisting of the entire flow graph ).
Page 38
C. Kessler, IDA, Linköpings Universitet, 2009.
Download