15-745 Lecture 5 Loops are Key Common loop optimizations

advertisement
Loops are Key
15-745 Lecture 5
• Loops are extremely important
90 10 rule
– the “90-10”
• Loop optimization involves
– understanding control-flow
control flow structure
– Understanding data-dependence information
– sensitivity to side-effecting
side effecting operations
– extra care in some transformations such as
register spilling
Control flow analysis
Natural loops
Classical Loop Optimizations
D
Dependencies
d
i
C
Copyright
i ht © S
Seth
th Goldstein,
G ld t i 2008
Based on slides from Lee & Callahan
Lecture 5
15-745 © 2008
1
Lecture 5
Common loop optimizations
15-745 © 2008
2
Finding Loops
• To optimize loops, we need to find them!
• Could use source language loop information in
the abstract syntax tree…
• BUT:
Scalar opts,
• Hoisting of loop-invariant computations
DF analysis,
l i
– pre-compute before entering the loop Control flow analysis
• Elimination of induction variables
– change
h
p=i*w+b
i* b to p=b,p+=w,
b
when
h w,b
b invariant
i
i
• Loop unrolling
– to improve scheduling of the loop body
Requires
• Software pipelining
understanding
– To improve scheduling of the loop body
data
dependencies
• Loop permutation
– to improve cache memory performance
Lecture 5
15-745 © 2008
– There are multiple source loop constructs: for, while,
do-while, even goto in C
– Want IR to support different languages
– Ideally, we want a single concept of a loop so all have
same analysis,
analysis same optimizations
– Solution: dismantle source-level constructs, then
re-find loops from fundamentals
3
Lecture 5
15-745 © 2008
4
Finding Loops
Control-flow analysis
• To optimize loops, we need to find them!
Specifically
• Specifically:
– loop-header node(s)
• Many languages have goto and other complex
control, so loops can be hard to f
find
nd in
n general
• Determining
g the control structure of a program
p g
is called control-flow analysis
• nodes in a loop that have immediate predecessors
not in the loop
– back edge(s)
• c
control-flow
nt l fl
edges
d s tto
previously executed nodes
• Based on the notion of dominators
– all nodes in the loop
p body
y
Lecture 5
15-745 © 2008
5
Lecture 5
Dominators
• A control-flow edge
from node B3 to B
B2
is a back edge if
B2 dom B3
• a sdom b
– a strictly dominates b if a dom b and a != b
• Furthermore, in
that case node B2
is a loop header
• a idom b
– a immediately
i
di
l d
dominates
i
b if a sdom
d
b
b, AND
ND
there is no c such that a sdom c and c sdom b
15-745 © 2008
6
Back edges and loop headers
• a dom b
dom nates b if
f every possible
poss ble
– node a dominates
execution path from entry to b includes a
Lecture 5
15-745 © 2008
Entry
B1
k = false
i = 1
j = 2
i <= n
B3
B2
..k..
j = j*2
k = true
i = i+1
print j
B5
B4
i = i+1
B6
exit
7
Lecture 5
15-745 © 2008
8
Natural loop
Examples
• Consider a back edge from node n to node h
a
• The natural loop of n→h is the set of nodes L
such that for all x∈L:
– h dom x and
– there is a path from x to n not containing h
Lecture 5
15-745 © 2008
Simple example:
c
(often it’s more
complicated, since
a FOR loop
l
f
found
d iin
the source code
might
g need an if/then
guard)
9
Lecture 5
a
for (..) {
if {
…
} else {
…
if (x) {
e;
break;
)
}
}
c
d
e
f
15-745 © 2008
10
Examples
b
Lecture 5
d
15-745 © 2008
Examples
Try this:
b
11
Lecture 5
e
15-745 © 2008
12
Examples
for (..) {
if {
…
lexically, in loop,
} else {
but not in
…
natural loop
if (x) {
e;
break;
)
}
}
Lecture 5
15-745 © 2008
Examples
for (..) {
if {
…
lexically, in loop,
} else {
but not in
…
natural loop
if (x) {
e;
break;
and another
)
y CFG
reason why
}
analysis is
}
preferred over
source/AST
loops
e
13
Lecture 5
e
15-745 © 2008
Examples
14
Natural Loops
One loop per header..
• Yes, it can happen in C
What are the natural loops?
0
0
1
1
0
{0 1 2}
{0,1,2}
1
2
2
2
0
3
3
{}
1
{1,2,3}
Lecture 5
15-745 © 2008
15
Lecture 5
2
{1,2},{0,1,2,3}
15-745 © 2008
16
Nested Loops
General Loops
• Unless two natural loops have the same header,
they are either
e ther d
disjoint
sjo nt or nested w
within
th n each
other
p (sets
(
of blocks)) with
• If A and B are loops
headers a and b such that a ≠ b and b ∈ A
–B⊂A
– loop B is nested within A
p
– B is the inner loop
• Can compute the loop-nest tree
• A more general looping structure is a strongly
connected
t d componentt off th
the control
t l flow
fl
graph
h
– subgraph <Nscc,Escc> such that
every block in Nscc is reachable from every
other node using only edges in Escc
0
scc
1
Not very useful definition of a loop
Lecture 5
15-745 © 2008
17
Lecture 5
Reducible Flow Graphs
There is a special
p
class of flow graphs,
g p , called
reducible flow graphs, for which several codeoptimizations are especially easy to perform.
In reducible flow graphs loops are unambiguously
defined and dominators can be efficiently
computed.
15-745 © 2008
18
Reducible flow graphs
Definition: A flow graph G is reducible iff we can partition
the edges into two disjoint groups, forward edges and back
edges with the following two properties.
edges,
properties
1. The forward edges form an acyclic graph in which every
node can be reached from the initial node of G.
G
2. The back edges consist only of edges whose heads
dominate their tails.
Why isn’t this
reducible?
This flow graph has no back edges.
edges Thus
Thus, it would be
reducible if the entire graph were acyclic, which is
not the case.
Lecture 5
15-745 19© 2008
2
Lecture 5
15-74520
© 2008
0
1
2
Alternative definition
Properties of Reducible Flow Graphs
• Definition: A flow graph G is reducible if we can
repeatedly collapse (reduce) together blocks (x,y)
where x is the only predecessor of y (ignoring self
loops) until we are left with a single node
0
0
0
0
• In a reducible flow graph,
g p ,
all loops are natural loops
p
• Can use DFS to find loops
• Many analyses are more efficient
– polynomial versus exponential
1,2,3
1
1,2
,
1,2,3
, ,
2
3
Lecture 5
0 1 2 3
0,1,2,3
3
15-745 21© 2008
Lecture 5
Good News
• Most flow graphs are reducible
• Languages prohibit irreducibility
– goto free C
– Java
• Programmers usually don’t use such constructs
even if they’re
they re available
– >90% of old Fortran code reducible
15-74522
© 2008
Dealing with Irreducibility
• Don’t
• Can split nodes and duplicate code to get
reducible graph
– possible exponential blowup
• Other techniques…
0
0
1
1
2
2a
Lecture 5
15-74523
© 2008
Lecture 5
15-745 © 2008
2
Loop-invariant computations
• A definition
t = x op y
in a loop is (conservatively) loop-invariant if
– x and y are constants,
constants or
– all reaching definitions of x and y are
outside the loop
loop, or
– only one definition reaches x (or y), and
that definition is loop
loop-invariant
invariant
Loop optimizations:
L
ti i ti s:
Hoisting
g of loop-invariant
p
computations
• so keep marking iteratively
Lecture 5
15-745 © 2008
25
Lecture 5
Loop-invariant computations
• Be careful:
• In order to “hoist” a loop-invariant computation
out of a loop, we need a place to put itt
Of course, not an issue in SSA
• We could copy it to all immediate predecessors
(except along the back-edge) of the loop
header...
• ...But we can avoid code duplication
p
by
y inserting
g
a new block, called the pre-header
jmp L1;
L2:
• Even though t’s two reaching
expressions
p
are
each invariant, s is not invariant…
15-745 © 2008
26
Hoisting
t1 = expr;
t = expr;
L1:
for () {
b L2;
brc
s = t * 2;
t2 = phi(t1, t3);
t = loop_invariant_expr;
s = t2 * 2;;
x = t + 2;
2
t3 = loop_invariant_expr;
…
x1 = t3 * 2;
}
...
Lecture 5
15-745 © 2008
27
Lecture 5
15-745 © 2008
28
Hoisting
Hoisting
preheaders
A
A
B
B
A’
A
A
B’
B
Lecture 5
15-745 © 2008
29
Lecture 5
Hoisting conditions
15-745 © 2008
We need to be careful...
• For a loop-invariant definition
d: t = x op y
• we can hoist d into the loop’s pre-header only if
• All hoisting conditions must be satisfied!
L0:
L0
t = 0
L1:
i = i + 1
t = a * b
M[i] = t
if i<N goto L1
L2:
x = t
1 d
1.
d’ss block dominates all loop exits at which t is live
liveout, and
2. d is only the only definition of t in the loop, and
3. t is not live-out of the pre-header
OK
Lecture 5
15-745 © 2008
30
31
Lecture 5
L0:
L0
t = 0
L1:
if i>=N goto L2
i = i + 1
t = a * b
M[i] = t
goto L1
L2:
x = t
L0:
L0
t = 0
L1:
i = i + 1
t = a * b
M[i] = t
t = 0
M[j] = t
if i<N goto L1
L2:
violates 1,3
violates 2
15-745 © 2008
32
The basic idea of IVE
• Suppose we have a loop variable
– i initially 0; each iteration i = i + 1
Loop optimizations:
Induction variable
Induction-variable
Strength reduction
• and a variable that linearly depends on it:
x = i * c1 + c2
• In such cases, we can try to
– initialize
i iti li x = io * c11 + c2
2 (execute
(
once))
– increment x by c1 each iteration
Lecture 5
15-745 © 2008
33
Lecture 5
Simple Example of IVE
15-745 © 2008
34
Simple Example of IVE
i <- 0
j' <- 0
k' <- a
i <- 0
i <- 0
H:
if i
j <k <M[k]
i <<
goto
for i = 0 to n
a[i] = 0
H:
>= n goto exit
i * 4
j + a
<- 0
i + 1
H
if i
j <k <M[k]
i <goto
Clearly, j & k do not need to be computed anew
each time since they
y are related to i and
i changes linearly.
Lecture 5
15-745 © 2008
>= n goto e
exit
t
i * 4
j + a
<- 0
i + 1
H
H:
if i >= n goto exit
j <- j'
k <- k'
M[k] <- 0
i <- i + 1
j' <- j' + 4
k' <
<- k' + 4
goto H
B t then
But,
th we don't
d 't even need
d j ((or j')
35
Lecture 5
15-745 © 2008
36
Simple Example of IVE
Simple Example of IVE
Rewrite comparison
i <- 0
j' <- 0
k' <- a
i <- 0
k' <- a
H:
i <- 0
k' <- a
H:
:
:
H:
if i >= n goto exit
j <- j'
k <- k'
M[k] <- 0
i <- i + 1
j' <- j' + 4
k' <
<- k' + 4
goto H
if i >= n goto exit
k <- k'
M[k] <- 0
i <- i + 1
k' <- k' + 4
goto H
15-745 © 2008
37
Lecture 5
k' <- a
n' <- a + (n * 4)
H:
:
15-745 © 2008
k' <- a
n' <- a + (n * 4)
H:
:
if k' >= n' goto exit
k <- k'
M[k] <- 0
k' <- k' + 4
goto H
H:
:
if k' >= n' goto exit
k <- k'
M[k] <- 0
k' <- k' + 4
goto H
if k' >= n' goto exit
M[k'] <- 0
k' <- k' + 4
goto H
Voila!
now, we do
d copy propagation
ti and
d eliminate
li i t k
Lecture 5
38
Copy propagation
k' <- a
n' <- a + (n * 4)
if k' >= a+(n*4)goto exit
k <- k'
M[k] <- 0
k' <- k' + 4
goto H
15-745 © 2008
Simple Example of IVE
Invariant code motion on a+(n*4)
H:
:
if k' >= a+(n*4) goto exi
k <- k'
M[k] <- 0
k' <- k' + 4
goto H
But a+(n*4) is loop invariant
But,
Simple Example of IVE
i <- 0
k' <- a
:
H:
if i >= n goto exit
k <- k'
M[k] <- 0
i <- i + 1
k' <- k' + 4
goto H
D we need
Do
d i?
Lecture 5
i <- 0
k' <- a
39
Lecture 5
15-745 © 2008
40
Simple Example of IVE
What we did
Compare original and result of IVE
•
•
•
•
identified induction variables (i,j,k)
strength reduction (changed * into +)
dead-code elimination (j <- j')
useless variable elimination (j
useless-variable
(j' << jj' + 4)
(This can also be done with ADCE)
• loop invariant identification & code-motion
• almost useless-variable elimination (i)
• copy propagation
k' <- a
n' <- a + (n * 4)
i <- 0
H:
H:
if i
j <k <M[k]
i <
<goto
>= n goto exit
i * 4
j + a
<- 0
i + 1
H
if k' >= n' goto exit
M[k'] <- 0
k' <- k' + 4
goto H
Voila!
Lecture 5
15-745 © 2008
41
Lecture 5
Is it faster?
• Before attempting IVE, it is best to first
perform :
– constant propagation & constant folding
– copy propagation
– loop-invariant hoisting
• Furthermore, one fewer value is computed,
– thus potentially saving a register
– and decreasing the possibility of spilling
15-745 © 2008
42
Loop preparation
• On some hardware,
hardware adds are much faster than
multiplies
Lecture 5
15-745 © 2008
43
Lecture 5
15-745 © 2008
44
How to do it, step 1
Representing IVs
• First, find the basic IVs
– scan loop body for defs of the form
x = x + c or x = x – c
where c is loop
loop-invariant
invariant
– record these basic IVs as
x = (x,
(x 1,
1 c)
– this represents the IV: x = x * 1 + c
• Characterize all induction variables by:
(base-variable, offset, multiple)
p are loopp
– where the offset and multiple
invariant
• IOW, after an induction variable is defined it
equals:
offset
ff t + multiple
lti l * b
base-variable
i bl
Lecture 5
15-745 © 2008
45
Lecture 5
15-745 © 2008
How to do it, step 2
How to do it, step 3
• Scan for derived IVs of the form
k = i * c1 + c2
– where i is a basic IV,
this is the only def of k in the loop
loop, and
c1 and c2 are loop invariant
• We say k is in the family of i
• Record as k = (i, c1, c2)
• Iterate, looking for derived IVs of the form
k = j * c1 + c2
– where IV j =(i, a, b), and
– this is the only def of k in the loop
loop, and
– there is no def of i between the def of j and
the def of k
– c1 and c2 are loop invariant
• Record as k = (i,
(i a
a*c1
c1, b
b*c1+c2)
c1+c2)
Lecture 5
15-745 © 2008
47
Lecture 5
15-745 © 2008
46
48
Simple Example of IVE
Finding the IVs
• Maintain three tables: basic & maybe & other
• Find basic Ivs:
Scan stmts. If var ∉ maybe, and of proper
form,, put
p into basic. Otherwise,, put
p var in
other and remove from maybe.
p
Ivs:
• Find compound
– If var defined more than once, put into other
proper
p form that use a basic
– For all stmts of p
IV
i <- 0
H:
if i
j <k <M[k]
i <goto
>= n goto exit
i * 4
j + a
<- 0
i + 1
H
i: (i, 1, 1) i.e., i = 1 + 1 * i
j: (i, 0, 4) i.e., j = 0 + 4 * i
k: (i, a, 4) i.e., k = a + 4 * i
» FIX THIS SLIDE
S j & k are in
So,
i f
family
il of
fi
Lecture 5
15-745 © 2008
49
Lecture 5
How to do it, step 4
15-745 © 2008
How to do it, step 5
• This is the strength reduction step
• This is the comparison rewriting step
• For an induction variable k = (i, c1, c2)
• For an induction variable k = (i, ak, bk)
– If k used only in definition and comparison
– There exists another variable, j, in the same
class and is not “useless”
useless and j=(i
j=(i, aj, bj)
• Rewrite k < n as
j < (bj/bk)(n
)(n-a
ak)+aj
• Note: since they are in same class:
(j-aj)/bj = (k-ak)/bk
– initialize k = i * c1 + c2 in the preheader
– replace k
k’ss def in the loop by
k = k + c1
– make sure to do this after ii’ss def
Lecture 5
15-745 © 2008
50
51
Lecture 5
15-745 © 2008
52
Notes
Is it faster? (2)
• Are the c1, c2 constant, or just invariant?
constant then you can keep folding them:
– if constant,
they’re always a constant even for derived IVs
otherwise they can be expressions of looploop
– otherwise,
invariant variables
• On some hardware, adds are much faster than
mult pl es
multiplies
• But…not always a win!
– Constant multiplies might otherwise be
reduced to shifts/adds that result in even
better code than IVE
– Scaling of addresses (i*4) might come for
free on your processor’s address modes
• So maybe: only convert i*c1+c2 when c1 is
loop invariant but not a constant
• But if constant, can find IVs of the type
x = i/b
and know that it’s legal, if b evenly divides the
stride…
Lecture 5
15-745 © 2008
53
Common loop optimizations
• Hoisting of loop-invariant computations
– pre-compute before entering the loop
• Elimination of induction variables
– change
h
p=i*w+b
i* b to p=b,p+=w,
b
when
h w,b
b invariant
i
i
• Loop unrolling
– to to improve scheduling of the loop body
Requires
• Software pipelining
understanding
– To improve scheduling of the loop body
data
dependencies
• Loop permutation
– to improve cache memory performance
Lecture 5
15-745 © 2008
55
Lecture 5
15-745 © 2008
54
Download