Classic Optimizations

Classical Optimization

Types of classical optimizations

Operation level: one operation in isolation
Local: optimize pairs of operations in same basic block (with or without dataflow analysis), e.g. peephole optimization
Global: optimize pairs of operations spanning multiple basic blocks; must use dataflow analysis in this case, e.g. reaching definitions, UD/DU chains, or SSA forms
Loop: optimize loop body and nested loops

Local Constant Folding


Goal: eliminate unnecessary operations
Rules:
1. X is an arithmetic operation
2. If src1(X) and src2(X) are constant, then change X by applying the operation
Example:
    r7 = 4 + 1      (X: src1(X) = 4, src2(X) = 1)
    r5 = 2 * r4
    r6 = r5 * 2

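The folding rule can be sketched in a few lines of Python. The tuple encoding `(dest, op, src1, src2)`, the `"mov"` opcode, and the name `fold_constants` are assumptions made for this illustration, not part of the slides.

```python
import operator

# Hypothetical IR: (dest, op, src1, src2); a source is a register name (str)
# or an integer literal (int).
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(block):
    """Replace arithmetic operations on two literals by a move of the result."""
    folded = []
    for dest, op, src1, src2 in block:
        if op in ARITH and isinstance(src1, int) and isinstance(src2, int):
            # Rule 2: both sources are constant, so apply the operation now.
            folded.append((dest, "mov", ARITH[op](src1, src2), None))
        else:
            folded.append((dest, op, src1, src2))
    return folded

block = [("r7", "+", 4, 1), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
result = fold_constants(block)
print(result[0])  # ('r7', 'mov', 5, None)
```

Only the first operation changes; the other two each have a register source and are left alone.
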
Local Constant Combining

Goal: eliminate unnecessary operations
First operation often becomes dead
Rules:
1. Operations X and Y in same basic block
2. X and Y have at least one literal src
3. Y uses dest(X)
4. None of the srcs of X have defs between X and Y (excluding Y)
Example:
    r7 = 5
    r5 = 2 * r4
    r6 = r5 * 2     becomes  r6 = r4 * 4

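A minimal sketch of constant combining, restricted to chained multiplications by a literal (other opcode pairs would need their own combining rule). The tuple IR and the helper names are hypothetical.

```python
def reg_and_literal(src1, src2):
    """Return (register, literal) if exactly one source is a literal, else None."""
    if isinstance(src1, str) and isinstance(src2, int):
        return src1, src2
    if isinstance(src1, int) and isinstance(src2, str):
        return src2, src1
    return None

def combine_multiplies(block):
    """Rewrite Y = dest(X) * c2 as Y = r * (c1 * c2) when X is dest(X) = r * c1."""
    out = list(block)
    for j, (dest_y, op_y, s1, s2) in enumerate(out):
        y = reg_and_literal(s1, s2)
        if op_y != "*" or y is None:
            continue
        used_reg, lit_y = y
        # X is the closest earlier definition of the register that Y uses.
        defs = [i for i in range(j) if out[i][0] == used_reg]
        if not defs:
            continue
        i = defs[-1]
        _, op_x, x1, x2 = out[i]
        x = reg_and_literal(x1, x2)
        if op_x != "*" or x is None:
            continue
        src_reg, lit_x = x
        # Rule 4: the register source of X must not be redefined from X up to Y.
        if any(out[k][0] == src_reg for k in range(i, j)):
            continue
        out[j] = (dest_y, "*", src_reg, lit_x * lit_y)
    return out

block = [("r7", "mov", 5, None), ("r5", "*", 2, "r4"), ("r6", "*", "r5", 2)]
combined = combine_multiplies(block)
```

After the rewrite `r6` no longer reads `r5`, so `r5 = 2 * r4` becomes dead if nothing else uses it.
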
Local Strength Reduction


Goal: replace expensive operations with cheaper ones
Rules (example):
1. X is a multiplication operation where src1(X) or src2(X) is a constant 2^k integer literal
2. Change X by using shift operation
3. For k=1 can use add
Example:
    r7 = 5
    r5 = 2 * r4     becomes  r5 = r4 + r4
    r6 = r4 * 4     becomes  r6 = r4 << 2

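The multiply-by-power-of-two rule might be implemented along these lines. The tuple IR and both function names are assumptions for illustration.

```python
def power_of_two_exponent(n):
    """Return k if n == 2**k with k >= 1, else None."""
    if isinstance(n, int) and n > 1 and n & (n - 1) == 0:
        return n.bit_length() - 1
    return None

def reduce_strength(block):
    """Replace r * 2**k by an add (k == 1) or a left shift (k > 1)."""
    out = []
    for dest, op, src1, src2 in block:
        new_op = (dest, op, src1, src2)
        if op == "*":
            # Multiplication commutes, so the literal may be either source.
            for reg, lit in ((src1, src2), (src2, src1)):
                k = power_of_two_exponent(lit)
                if isinstance(reg, str) and k is not None:
                    new_op = (dest, "+", reg, reg) if k == 1 else (dest, "<<", reg, k)
                    break
        out.append(new_op)
    return out

block = [("r7", "mov", 5, None), ("r5", "*", 2, "r4"), ("r6", "*", "r4", 4)]
reduced_block = reduce_strength(block)
```
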
Local Constant Propagation
Goal: replace register uses with literals (constants) in single basic block
Rules:
1. Operation X is a move to register with src1(X) literal
2. Operation Y uses dest(X)
3. There is no def of dest(X) between X and Y (excluding defs at X and Y)
4. Replace dest(X) in Y with src1(X)
Example:
    r1 = 5
    r2 = _x
    r3 = 7
    r4 = r4 + r1
    r1 = r1 + r2
    r1 = r1 + 1
    r3 = 12
    [five further operations defining r8, r9, r3, r7 and M[r7]; right-hand sides not recoverable]

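One way to apply these rules is a single forward scan that remembers which registers currently hold a literal. The tuple IR, the `"mov"` opcode, the sample block, and the function name below are hypothetical.

```python
def propagate_constants(block):
    """Replace register uses by literals within one basic block."""
    known = {}  # register -> literal it currently holds (set by a move X)
    out = []
    for dest, op, src1, src2 in block:
        # Rules 2-4: a use of dest(X) with no intervening def becomes src1(X).
        src1 = known.get(src1, src1)
        src2 = known.get(src2, src2)
        out.append((dest, op, src1, src2))
        if op == "mov" and isinstance(src1, int):
            known[dest] = src1      # rule 1: move of a literal
        else:
            known.pop(dest, None)   # any other def kills the known value
    return out

block = [
    ("r1", "mov", 5, None),
    ("r3", "mov", 7, None),
    ("r4", "+", "r4", "r1"),
    ("r1", "+", "r1", "r3"),
    ("r8", "+", "r1", 1),
]
propagated = propagate_constants(block)
```

The last operation keeps its use of `r1` because `r1` was redefined by a non-move operation. Constant folding can then simplify `r1 = 5 + 7`.
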
Local Common Subexpression Elimination (CSE)

Goal: eliminate recomputations of an expression
More efficient code
Resulting moves can get copy propagated (see later)
Rules:
1. Operations X and Y have the same opcode and Y follows X
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src between X and Y (excluding Y)
4. No def of dest(X) between X and Y (excluding X and Y)
5. Replace Y with move dest(Y) = dest(X)
Example:
    r1 = r2 + r3
    r4 = r4 + 1
    r1 = 6
    [five further operations defining r6, r2, r5, r7 and r5 from the operand pairs (r2, r3), (r1, 1), (r4, 1), (r2, r3) and (r1, 1); operators not recoverable]

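The five rules can be sketched with a table of currently available expressions. The tuple IR, the sample block, and the function name are assumptions for illustration; source order matters here, so `r2 + r3` and `r3 + r2` are treated as different expressions.

```python
def eliminate_common_subexpressions(block):
    """Replace a recomputed expression by a move from the register holding it."""
    available = {}  # (op, src1, src2) -> register that holds the value
    out = []
    for dest, op, src1, src2 in block:
        key = (op, src1, src2)
        if op != "mov" and key in available:
            out.append((dest, "mov", available[key], None))   # rule 5
        else:
            out.append((dest, op, src1, src2))
        # A def of dest kills expressions reading dest (rule 3)
        # and expressions whose value was held in dest (rule 4).
        available = {k: r for k, r in available.items()
                     if dest not in k[1:] and r != dest}
        if op != "mov" and dest not in (src1, src2):
            available.setdefault(key, dest)
    return out

block = [
    ("r1", "+", "r2", "r3"),
    ("r4", "+", "r4", 1),
    ("r6", "+", "r2", "r3"),   # same as r1's expression
    ("r2", "-", "r1", 1),      # redefines r2
    ("r7", "+", "r2", "r3"),   # not replaced: r2 changed
    ("r5", "-", "r1", 1),      # same as r2's expression
]
cse_result = eliminate_common_subexpressions(block)
```
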
Dead Code Elimination
Goal: eliminate any operation whose result is never used
Rules (dataflow required)
1. X is an operation with no use in DU chain, i.e. dest(X) is not live
2. Delete X if removable (not a mem store or branch)
Rules too simple!
Misses deletion of r4, even after deleting r7, since r4 is live in loop
Better is to trace UD chains backwards from "critical" operations
Example (basic blocks of the control-flow graph; edges not shown):
    r1 = 3
    r2 = 10

    r4 = r4 + 1
    r7 = r1 * r4
    r3 = r3 + 1

    r2 = 0

    r3 = r2 + r1
    M[r1] = r3

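Tracing backwards from critical operations amounts to a mark-and-sweep over the operations. The sketch below makes a loud simplifying assumption: instead of real UD chains it treats every def of a register as possibly reaching every use of it, which is conservative (it may keep too much, never too little). The tuple IR, the `"store"` encoding `(None, "store", address, value)`, and the function name are hypothetical.

```python
def eliminate_dead_code(ops):
    """Keep only operations that a critical operation (here: a store) depends on."""
    defs = {}  # register -> indices of all operations defining it
    for i, (dest, _op, _s1, _s2) in enumerate(ops):
        if dest is not None:
            defs.setdefault(dest, []).append(i)
    live = set()
    worklist = [i for i, op in enumerate(ops) if op[1] == "store"]
    while worklist:
        i = worklist.pop()
        if i in live:
            continue
        live.add(i)
        # Follow the (approximated) UD chains of both sources backwards.
        for src in ops[i][2:]:
            worklist.extend(defs.get(src, []))
    return [op for i, op in enumerate(ops) if i in live]

ops = [
    ("r1", "mov", 3, None),
    ("r2", "mov", 10, None),
    ("r4", "+", "r4", 1),
    ("r7", "*", "r1", "r4"),
    ("r3", "+", "r3", 1),
    ("r2", "mov", 0, None),
    ("r3", "+", "r2", "r1"),
    (None, "store", "r1", "r3"),   # M[r1] = r3
]
kept = eliminate_dead_code(ops)
```

Both `r7 = r1 * r4` and `r4 = r4 + 1` are removed, because nothing critical depends on them, even though `r4` uses itself inside the loop.
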
Local Backward Copy Propagation

Goal: propagate LHS of moves backward
Eliminates useless moves
Rules (dataflow required)
1. X and Y in same block
2. Y is a move to register
3. dest(X) is a register that is not live out of the block
4. Y uses dest(X)
5. dest(Y) not used or defined between X and Y (excluding X and Y)
6. No uses of dest(X) after the first redef of dest(Y)
7. Replace dest(X) (which is src(Y)) on path from X to Y with dest(Y) and remove Y
Example:
    r1 = r8 + r9
    r2 = r9 + r1
    [five further operations defining r4, r6, r9, r7 and r5 from r2, r2, r1, r6 and r6; which of them are plain moves and which add 1 is not recoverable]
    r4 = 0
    r8 = r2 + r7

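A hedged sketch of the backward rules for a single block. The tuple IR, the `live_out` parameter (registers live at block exit, assumed to come from a separate liveness analysis), and all function names are hypothetical. Uses of dest(X) that follow Y are renamed to dest(Y) as long as dest(Y) has not been redefined.

```python
def uses(op, reg):
    return reg in op[2:]

def rename_uses(op, old, new):
    dest, opcode, s1, s2 = op
    return (dest, opcode, new if s1 == old else s1, new if s2 == old else s2)

def backward_copy_propagate(block, live_out):
    block = list(block)
    for j, (dy, opcode, sx, _) in enumerate(block):
        # Rules 2-4: Y is a register move whose source is not live out.
        if opcode != "mov" or not isinstance(sx, str) or sx in live_out:
            continue
        defs = [i for i in range(j) if block[i][0] == sx]
        if not defs:
            continue
        i = defs[-1]  # X: the def of src(Y) that reaches Y
        # Rule 5: dest(Y) neither used nor defined strictly between X and Y.
        if any(op[0] == dy or uses(op, dy) for op in block[i + 1:j]):
            continue
        # Rule 6: after Y, dest(X) may only be used while dest(Y) holds the copy.
        tail, ok, dy_redefined, sx_redefined = [], True, False, False
        for op in block[j + 1:]:
            if not sx_redefined and uses(op, sx):
                if dy_redefined:
                    ok = False
                    break
                op = rename_uses(op, sx, dy)
            if op[0] == sx:
                sx_redefined = True
            if op[0] == dy:
                dy_redefined = True
            tail.append(op)
        if not ok:
            continue
        # Rule 7: X now writes dest(Y) directly and Y disappears.
        x = (dy,) + block[i][1:]
        middle = [rename_uses(op, sx, dy) for op in block[i + 1:j]]
        return backward_copy_propagate(block[:i] + [x] + middle + tail, live_out)
    return block

block = [("r6", "+", "r2", 1), ("r7", "mov", "r6", None), ("r5", "+", "r6", 1)]
propagated_back = backward_copy_propagate(block, live_out={"r5", "r7"})
```
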
Global Constant Propagation
Goal: globally replace register uses with literals
Rules (dataflow required)
1. X is a move to a register with src1(X) literal
2. Y uses dest(X)
3. dest(X) has only one def at X for UD chains to Y
4. Replace dest(X) in Y with src1(X)
Example (operations of the control-flow graph; edges not shown):
    r1 = 4
    r2 = 10

    r5 = 2
    r7 = r1 * r5
    r3 = r3 + r5
    r2 = 0
    r3 = r2 + r1
    r6 = r7 * r4

    M[r1] = r3

Global Constant Propagation with SSA

Goal: globally replace register uses with literals
Rules (high level)
1. For operation X with a register src(X)
2. Find def of src(X) in chain
3. If def is move of literal, src(X) is constant: done
4. If RHS of def is an operation, including φ node, recurse on all srcs
5. Apply rule for operation to determine src(X) constant
6. Note: abstract values ⊤ (top) and ⊥ (bottom) are often used to indicate unknown values
Example (operations of the control-flow graph; edges not shown):
    r1 = 4
    r2 = 10

    r5 = 2
    r7 = r1 * r5
    r3 = r3 + r5
    r2 = 0
    r3 = r2 + r1
    r6 = r7 * r4

    M[r1] = r3
Exercise: compute SSA form and propagate constants

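The recursive search over SSA definitions can be sketched as follows. The dictionary encoding of definitions, the SSA names (`a1`, `d3`, ...), and the single `BOTTOM` marker standing in for "not a known constant" are assumptions for this illustration, and the small program is hypothetical rather than the solution to the exercise. Cycles through φ nodes are treated pessimistically as unknown.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
BOTTOM = object()  # hypothetical marker: value is not a known constant

def constant_value(src, defs, visiting=frozenset()):
    """Return the literal value of src, or BOTTOM if it cannot be determined."""
    if isinstance(src, int):
        return src
    if src not in defs or src in visiting:
        return BOTTOM                      # no def known, or a cycle via a phi
    op, *args = defs[src]
    values = [constant_value(a, defs, visiting | {src}) for a in args]
    if any(v is BOTTOM for v in values):
        return BOTTOM
    if op == "mov":
        return values[0]                   # rule 3: move of a literal
    if op == "phi":
        # A phi is constant only if all incoming values agree.
        return values[0] if all(v == values[0] for v in values) else BOTTOM
    return OPS[op](*values) if op in OPS else BOTTOM   # rule 5

# Each SSA name has exactly one def: name -> (op, srcs...)
defs = {
    "a1": ("mov", 4),
    "b1": ("mov", 2),
    "c1": ("*", "a1", "b1"),
    "d1": ("mov", 10),
    "d2": ("mov", 0),
    "d3": ("phi", "d1", "d2"),
    "e1": ("+", "d3", "a1"),
    "f1": ("phi", "b1", "b1"),
}
print(constant_value("c1", defs))  # 8
```

Here `e1` is not constant because the φ node merges 10 and 0, while `f1` is constant because both incoming values are 2.
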
Forward Copy Propagation

Goal: globally propagate RHS of moves forward
Reduces dependence chain
May be possible to eliminate moves
Rules (dataflow required)
1. X is a move with src1(X) register
2. Y uses dest(X)
3. dest(X) has only one def at X for UD chains to Y
4. src1(X) has no def on any path from X to Y
5. Replace dest(X) in Y with src1(X)
Example (operations of the control-flow graph; edges not shown):
    r1 = r2
    r3 = r4

    r6 = r3 + 1
    r2 = 0

    r5 = r2 + r3

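The global rules need reaching-definitions information. As a simplified illustration, the sketch below applies the same idea to one straight-line sequence only, so "on any path from X to Y" reduces to "between X and Y". The tuple IR and function name are hypothetical, and treating the slide's operations as a single path is an assumption.

```python
def propagate_copies(block):
    """Replace uses of a move's destination by the move's source register."""
    copies = {}  # dest(X) -> src1(X) for moves that are still valid
    out = []
    for dest, op, src1, src2 in block:
        src1 = copies.get(src1, src1)
        src2 = copies.get(src2, src2)
        out.append((dest, op, src1, src2))
        # A def of dest invalidates copies into dest (rule 3)
        # and copies out of dest (rule 4).
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if op == "mov" and isinstance(src1, str) and src1 != dest:
            copies[dest] = src1
    return out

block = [
    ("r1", "mov", "r2", None),
    ("r3", "mov", "r4", None),
    ("r6", "+", "r3", 1),
    ("r2", "mov", 0, None),
    ("r5", "+", "r2", "r3"),
]
forwarded = propagate_copies(block)
```

On this path `r3` is replaced by `r4` in both uses. The copy `r1 = r2` stops being usable once `r2` is redefined.
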
Global Common Subexpression Elimination (CSE)

Goal: eliminate recomputations of an expression
Rules:
1. X and Y have the same opcode and X dominates Y
2. src(X) = src(Y) for all srcs
3. For all srcs, no def of a src on any path between X and Y (excluding Y)
4. Insert rx = dest(X) immediately after X for new register rx
5. Replace Y with move dest(Y) = rx
Example (basic blocks of the control-flow graph; edges not shown):
    r1 = r2 * r6
    r3 = r4 / r7

    r2 = r2 + 1
    r1 = r3 * 7

    r5 = r2 * r6
    r8 = r4 / r7

    r9 = r3 * 7

Loop Optimizations

Loops are the most important target for optimization
    Programs spend much time in loops
Loop optimizations
    Invariant code removal (a.k.a. code motion)
    Global variable migration
    Induction variable strength reduction
    Induction variable elimination

Code Motion
Goal: move loop-invariant computations to preheader
Rules:
1. Operation X in block that dominates all exit blocks
2. X is the only operation to modify dest(X) in loop body
3. All srcs of X have no defs in any of the basic blocks in the loop body
4. Move X to end of preheader
5. Note 1: if one src of X is a memory load, need to check for stores in loop body
6. Note 2: X must be movable and not cause exceptions
Example (basic blocks of the control-flow graph; edges not shown):
  preheader:
    r1 = 0
  header:
    r4 = M[r5]
    r7 = r4 * 3

    r8 = r2 + 1
    r7 = r8 * r4
    r3 = r2 + 1
    r1 = r1 + r7
    M[r1] = r3

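Rules 2 to 4 can be sketched as a repeated scan over the loop's operations. The sketch assumes the dominance condition of rule 1 is checked elsewhere, and it sidesteps notes 1 and 2 by only moving a small set of opcodes that touch no memory and cannot raise exceptions. The tuple IR, the `MOVABLE` set, the sample loop, and the function name are hypothetical.

```python
MOVABLE = {"+", "-", "*", "<<", "mov"}  # no memory access, no exceptions

def hoist_invariants(preheader, loop):
    """Move loop-invariant operations to the end of the preheader."""
    preheader, loop = list(preheader), list(loop)
    changed = True
    while changed:
        changed = False
        defined = [op[0] for op in loop if op[0] is not None]
        for op in loop:
            dest, opcode, src1, src2 = op
            if opcode not in MOVABLE:
                continue
            only_def = defined.count(dest) == 1                          # rule 2
            invariant = src1 not in defined and src2 not in defined      # rule 3
            if only_def and invariant:
                loop.remove(op)
                preheader.append(op)                                     # rule 4
                changed = True
                break   # recompute the defs before looking further
    return preheader, loop

preheader = [("r1", "mov", 0, None)]
loop = [
    ("r8", "+", "r2", 1),
    ("r7", "*", "r8", "r4"),
    ("r1", "+", "r1", "r7"),
    (None, "store", "r1", "r7"),   # M[r1] = r7
]
new_preheader, new_loop = hoist_invariants(preheader, loop)
```

Repeating the scan lets `r7 = r8 * r4` move once `r8 = r2 + 1` has left the loop. The update of `r1` stays because it reads its own previous value.
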
Global Variable Migration

Goal: assign a global variable to a register for the entire duration of a loop
Rules:
1. X is a load or store to M[x]
2. Address x of M[x] not modified in loop
3. Replace all M[x] in loop by new register rx
4. Add rx = M[x] to preheader
5. Add M[x] = rx to each loop exit
6. Memory disambiguation is required: all mem ops in loop whose address can equal x must use same address x
Example (operations of the loop):
    r4 = M[r5]
    r4 = r4 + 1
    r8 = M[r5]
    r7 = r8 * r4
    M[r5] = r7
    M[r5] = r4

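The effect of the transformation can be illustrated with two Python functions that should leave memory in the same state. Memory is modeled as a list, and the function names, the loop body, and the sample values are assumptions for illustration.

```python
def loop_with_memory(mem, x, n):
    """Before: every iteration loads from and stores to M[x]."""
    for _ in range(n):
        r4 = mem[x]
        r4 = r4 + 1
        mem[x] = r4
    return mem

def loop_with_register(mem, x, n):
    """After: M[x] lives in register rx for the duration of the loop."""
    rx = mem[x]            # rule 4: load once in the preheader
    for _ in range(n):
        rx = rx + 1        # rule 3: the loop body only touches rx
    mem[x] = rx            # rule 5: store back at the loop exit
    return mem

before = loop_with_memory([0, 7, 0], x=1, n=5)
after = loop_with_register([0, 7, 0], x=1, n=5)
print(before == after)  # True
```

The equivalence relies on rule 6: no other memory operation in the loop may touch address `x` through a different name.
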
Loop Strength Reduction (1)
Goal: create basic IVs from derived IVs
Rules
1. X is a *, <<, +, or - operation
2. src1(X) is a basic IV
3. src2(X) is invariant
4. No other ops modify dest(X)
5. dest(X) != src(X) for all srcs
6. dest(X) is a register
Example:
  preheader:
    (empty)
  header:
    r5 = r4 - 3
    r4 = r4 + 1
    r7 = r4 * r9    (X: dest(X) = r7, src1(X) = r4, src2(X) = r9)
    r6 = r4 << 2
Basic IV r4 has triple (r4, 1, ?)

Loop Strength Reduction (2)
Transformation
1. Insert into the bottom of the preheader: new_reg = RHS(X)
2. If opcode(X) is not + or -, then insert into the bottom of the preheader: new_inc = inc(src1(X)) opcode(X) src2(X)
3. Else new_inc = inc(src1(X))
4. Insert at each update of src1(X): new_reg += new_inc
5. Change X by: dest(X) = new_reg
Example (after transforming r7 = r4 * r9):
  preheader:
    r1 = r4 * r9
    r2 = 1 * r9
  header:
    r5 = r4 - 3
    r4 = r4 + 1
    r1 = r1 + r2
    r7 = r1
    r6 = r4 << 2
Exercise: apply strength reduction to r5 and r6

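The transformation of r7 = r4 * r9 can be checked by running both versions of the loop side by side. The Python functions below are an illustrative model only: the function names, the trip count `n`, and the sample inputs are assumptions.

```python
def original_loop(r4, r9, n):
    """r7 is recomputed with a multiply in every iteration."""
    r7_values = []
    for _ in range(n):
        r4 = r4 + 1
        r7 = r4 * r9
        r7_values.append(r7)
    return r7_values

def reduced_loop(r4, r9, n):
    """r7 is maintained by an add, as in the transformed code."""
    r1 = r4 * r9          # step 1: new_reg = RHS(X), in the preheader
    r2 = 1 * r9           # step 2: new_inc = inc(r4) * src2(X)
    r7_values = []
    for _ in range(n):
        r4 = r4 + 1
        r1 = r1 + r2      # step 4: update new_reg where r4 is updated
        r7 = r1           # step 5: X becomes a move
        r7_values.append(r7)
    return r7_values

print(original_loop(3, 5, 4))  # [20, 25, 30, 35]
print(reduced_loop(3, 5, 4))   # [20, 25, 30, 35]
```
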
IV Elimination (1)
Goal: remove unnecessary basic IVs from the loop by substituting uses with another basic IV
Rules for IVs with same increment and initial value:
1. Find two basic IV x and y
2. If x and y in same family and have same increment and initial values
3. Incremented at same place
4. x is not live at loop exit
5. For each basic block where x is defined, there are no uses of x between first/last def of x and last/first def of y
6. Replace uses of x with y
Example (basic blocks of the control-flow graph; edges not shown):
    r1 = 0
    r2 = 0

    r1 = r1 - 1
    r2 = r2 - 1

    r9 = r2 + r4
    r7 = r1 * r9

    r4 = M[r1]

    M[r2] = r7
Exercise: apply IV elimination

IV Elimination (2)

Many variants, from simple to complex:
1. Trivial cases: an IV that is never used except by the increment operations and is not live at loop exit
2. IVs with same increment and same initial value
3. IVs with same increment whose initial values are a known constant offset from each other
4. IVs with same increment, but initial values unknown
5. IVs with different increments and no info on initial values
Methods 1 and 2 are virtually free, so always applied
Methods 3 to 5 require preheader operations

IV Elimination (3)

Example for method 4
Before:
    r1 = ?
    r2 = ?
  loop:
    r3 = M[r1+4]
    r4 = M[r2+8]
    …
    r1 = r1 + 4
    r2 = r2 + 4
After:
    r1 = ?
    r2 = ?
    r5 = r2-r1+8
  loop:
    r3 = M[r1+4]
    r4 = M[r1+r5]
    …
    r1 = r1 + 4

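The method 4 example works because r2 - r1 stays constant when both IVs advance by the same increment, so r2 + 8 always equals r1 + r5. A small Python model can confirm this; memory is modeled as a list, and the function names, the trip count `n`, and the sample values are assumptions.

```python
def with_two_ivs(mem, r1, r2, n):
    """Before: both r1 and r2 are incremented in the loop."""
    loaded = []
    for _ in range(n):
        r3 = mem[r1 + 4]
        r4 = mem[r2 + 8]
        loaded.append((r3, r4))
        r1 = r1 + 4
        r2 = r2 + 4
    return loaded

def with_one_iv(mem, r1, r2, n):
    """After: r2 is eliminated; its offset from r1 is computed once."""
    r5 = r2 - r1 + 8      # preheader operation
    loaded = []
    for _ in range(n):
        r3 = mem[r1 + 4]
        r4 = mem[r1 + r5]
        loaded.append((r3, r4))
        r1 = r1 + 4
    return loaded

mem = list(range(100, 200))
print(with_two_ivs(mem, 0, 20, 5) == with_one_iv(mem, 0, 20, 5))  # True
```
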