Notes for 05-31-03
Session 2
Peephole Optimization: improves performance by examining a short sequence of
instructions (the peephole) and replacing it with a shorter or faster sequence. It is
characterized by 4 types of program transformations:
- RIE: redundant-instruction elimination
- FOC: flow-of-control optimizations
- Algebraic simplifications
- Machine idioms: using hardware-specific instructions that give the fastest code for an
operation.
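The first and third transformations above can be sketched on a toy instruction list. This is a minimal sketch, not a real code generator: the tuple format ("store", addr, reg), ("load", reg, addr), ("add"/"mul", dst, src, constant) and the function name peephole are assumptions made for illustration.

```python
def peephole(instrs):
    """Apply two peephole rules to a list of instruction tuples (assumed format)."""
    out = []
    for ins in instrs:
        # Algebraic simplification: x = x + 0 and x = x * 1 change nothing.
        if ins[0] == "add" and ins[1] == ins[2] and ins[3] == 0:
            continue
        if ins[0] == "mul" and ins[1] == ins[2] and ins[3] == 1:
            continue
        # Redundant-instruction elimination: loading a register from the
        # address it was stored to by the immediately preceding instruction.
        # (Only safe when the load is not the target of a jump.)
        if out and ins[0] == "load" and out[-1] == ("store", ins[2], ins[1]):
            continue
        out.append(ins)
    return out


code = [
    ("store", "a", "R0"),     # a = R0
    ("load", "R0", "a"),      # R0 = a   (redundant)
    ("add", "R1", "R1", 0),   # R1 = R1 + 0   (no effect)
    ("mul", "R2", "R3", 4),   # R2 = R3 * 4   (kept)
]
optimized = peephole(code)
# optimized == [("store", "a", "R0"), ("mul", "R2", "R3", 4)]
```

The window here is only two instructions wide; a real peephole pass usually repeats until no rule applies, since one rewrite can expose another.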
Here is an example from the dragon book, dealing with code optimization:
Pg 590:
The leaders are:
(1) i = m-1
(5) i = i + 1
(9) j = j - 1
(13) if i >= j goto (23)
(14) t6 = 4 * i
(23) t11 = 4 * i
There are therefore 6 basic blocks labeled Bi, 1 <= i <= 6.
The flow graph connects the blocks with directed arrows, so jump targets change from
line numbers to block labels:
goto 5 -> goto B2
goto 9 -> goto B3
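The leader list and the relabeling of jump targets can be reproduced with a short sketch. It uses the usual three leader rules (first statement, target of a jump, statement following a jump). The jump table below is an assumption: only statement (13) and the targets 5 and 9 appear in these notes, so the positions of the other jumps (8, 12, 22) and the total of 30 statements should be checked against the book.

```python
def find_leaders(n_statements, jumps):
    """jumps maps the number of each jump statement to its target number."""
    leaders = {1}                      # rule 1: the first statement
    for stmt, target in jumps.items():
        leaders.add(target)            # rule 2: the target of a jump
        if stmt + 1 <= n_statements:
            leaders.add(stmt + 1)      # rule 3: the statement after a jump
    return sorted(leaders)


def basic_blocks(n_statements, jumps):
    """Return {block name: (first, last)} and the jumps retargeted to blocks."""
    leaders = find_leaders(n_statements, jumps)
    ends = [leader - 1 for leader in leaders[1:]] + [n_statements]
    blocks = {f"B{k}": (first, last)
              for k, (first, last) in enumerate(zip(leaders, ends), start=1)}
    block_of = {first: name for name, (first, _) in blocks.items()}
    retargeted = {stmt: block_of[target] for stmt, target in jumps.items()}
    return blocks, retargeted


# Assumed jump statements: {statement number: target statement number}.
JUMPS = {8: 5, 12: 9, 13: 23, 22: 5}
leaders = find_leaders(30, JUMPS)          # [1, 5, 9, 13, 14, 23]
blocks, retargeted = basic_blocks(30, JUMPS)
# retargeted[8] == "B2" (goto 5 -> goto B2), retargeted[12] == "B3"
```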
Global optimization deals with more than one basic block and is characterized by the
following transformations:
- CSE: common subexpression elimination; if 2 expressions are the same and none of
their operands is redefined in between, we only need to compute the value once.
- CP: copy propagation; eliminates unnecessary copies of a variable.
- DCE: dead code elimination
- Constant Folding
In the example, we need to apply local optimization first, then global optimization. Local
optimization reduces the code before applying global optimization.
Local optimization (within B5):
t6 = 4 * i
x = a[t6]
t7 = t6
…
t8 = 4 * j
t10 = t8

Global optimization (across blocks):
In B2, i's value changes:
i = i + 1
t2 = 4 * i
Now looking at B2 and B5 together:
t6 = t2 (note that this does not hold if t2 is redefined after B2)
Similarly for the expression 4 * j:
t4 = 4 * j
t8 = t4
t10 = t8
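The local step (t7 = t6, t10 = t8) can be sketched as common subexpression elimination over a single block. This is a minimal sketch under stated assumptions: statements are tuples (dst, op, arg1, arg2), the name local_cse is hypothetical, and the array accesses of B5 are left out because stores to an array would need extra invalidation rules.

```python
def local_cse(block):
    """block: list of (dst, op, arg1, arg2); returns a rewritten list."""
    available = {}   # (op, arg1, arg2) -> variable currently holding that value
    out = []
    for dst, op, a, b in block:
        key = (op, a, b)
        if op != "copy" and key in available:
            # Same expression, operands unchanged since: reuse the old result.
            out.append((dst, "copy", available[key], None))
        else:
            out.append((dst, op, a, b))
        # dst was just redefined: forget expressions that read dst or live in dst.
        available = {k: v for k, v in available.items()
                     if dst not in k[1:] and v != dst}
        if op != "copy" and dst not in (a, b):
            available.setdefault(key, dst)
    return out


B5 = [
    ("t6", "*", 4, "i"),
    ("t7", "*", 4, "i"),
    ("t8", "*", 4, "j"),
    ("t10", "*", 4, "j"),
]
result = local_cse(B5)
# result[1] == ("t7", "copy", "t6", None) and result[3] == ("t10", "copy", "t8", None)
```

If i is redefined between the two computations of 4 * i, the entry is dropped from available and the second computation is kept, which is the "no redefinition in the middle" condition.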
Apply copy propagation only if there is no redefinition.
t11 = t2 : substitute t2 for t11
The substitution is valid as long as neither t2 nor t11 is redefined between the copy and
the use being replaced.
t6 = t2
t7 = t6
Check that t2 is not redefined after the copy and that t6 is not redefined either.
x = a[t2]
t7 = t2
but we cannot delete t6 yet.
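The t6 = t2 example can be sketched as copy propagation over one block. The tuple format (dst, op, arg1, arg2), the "index" operation standing for x = a[...], and the name copy_propagate are assumptions for illustration.

```python
def copy_propagate(block):
    """block: list of (dst, op, arg1, arg2); a copy is (dst, "copy", src, None)."""
    copies = {}   # dst -> src, for copies that are still valid at this point
    out = []
    for dst, op, a, b in block:
        # Replace each operand that is a known copy by its source.
        a, b = copies.get(a, a), copies.get(b, b)
        out.append((dst, op, a, b))
        # dst is redefined here: drop every copy that writes or reads dst.
        copies = {d: s for d, s in copies.items() if d != dst and s != dst}
        if op == "copy" and a != dst:
            copies[dst] = a
    return out


block = [
    ("t6", "copy", "t2", None),   # t6 = t2
    ("x", "index", "a", "t6"),    # x = a[t6]
    ("t7", "copy", "t6", None),   # t7 = t6
]
propagated = copy_propagate(block)
# propagated[1] == ("x", "index", "a", "t2")   i.e. x = a[t2]
# propagated[2] == ("t7", "copy", "t2", None)  i.e. t7 = t2
```

The statement t6 = t2 is still in the output: copy propagation only rewrites uses, and removing the now-unused copy is a separate step.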
DCE: eliminates code that computes a dead variable (one that is not used after it is
defined) or code that is never reached.
i is defined and then used, so we cannot get rid of it.
u-d chains and d-u chains: u stands for use, d stands for definition. A u-d chain links a
use to the definitions that reach it; a d-u chain links a definition to the uses it reaches.
There are 2 d-u chains for the variable i.
Since t2 = 4 * i and t4 = 4 * j, we can eliminate t6, t7, t8, t10, t11, t12, t13, t15.
Copy propagation leads to dead code elimination.
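How copy propagation exposes dead code can be sketched with a backward pass over one block. This is a minimal sketch: the tuple format (dst, op, arg1, arg2), the name eliminate_dead_code, and the choice of live_out = {"x"} are assumptions, and statements are assumed to have no side effects.

```python
def eliminate_dead_code(block, live_out):
    """Drop assignments whose target is not live; block is (dst, op, a, b)."""
    live = set(live_out)              # variables needed after the block
    kept = []
    for dst, op, a, b in reversed(block):
        if dst not in live:
            continue                  # value is never used later: dead
        kept.append((dst, op, a, b))
        live.discard(dst)             # this definition satisfies the later uses
        live.update(v for v in (a, b) if isinstance(v, str))
    return kept[::-1]


# The block after copy propagation: t6 and t7 are no longer used.
block = [
    ("t6", "copy", "t2", None),   # t6 = t2
    ("x", "index", "a", "t2"),    # x = a[t2]
    ("t7", "copy", "t2", None),   # t7 = t2
]
remaining = eliminate_dead_code(block, live_out={"x"})
# remaining == [("x", "index", "a", "t2")]
```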
Constant folding: at compile time, substitute the values that are already known and
evaluate the resulting constant expressions.
example:
#define debug 0
if (debug){
// dead code, debug is never true.
}
#define base 2
x = y * base    (folds to x = y * 2)
x = y << 1      (the same value, computed with a shift)
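The substitution of known values can be sketched on small expression trees. This is an illustrative sketch only: the tree format (op, left, right), the OPS table, and the name fold are assumptions, and a real compiler would also have to respect overflow rules.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, constants):
    """expr is an int, a name, or (op, left, right); constants maps names to ints."""
    if isinstance(expr, str):
        return constants.get(expr, expr)      # substitute a known constant
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    left, right = fold(left, constants), fold(right, constants)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)           # evaluate at "compile time"
    return (op, left, right)


known = {"debug": 0, "base": 2}
cond = fold("debug", known)                   # 0: the if-body is dead code
prod = fold(("*", "y", "base"), known)        # ("*", "y", 2)
mixed = fold(("+", ("*", "base", 3), "x"), known)   # ("+", 6, "x")
```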
Loop optimizations:
- code motion: move idempotent (loop-invariant) computations out of the loop
- induction variables / reduction in strength
How do we know there is a loop?
- all nodes in the collection are strongly connected: there is a path from any node to
any other node (not to be confused with fully connected, where there is a direct
edge from any node to any other node).
- there is a unique entry point
inner loops:
B2 to B2
B2 to B5 to B2
A loop invariant bears the same result every time (and is different from induction), as
illustrated by this example:
for (i = 0; i<10;i++){
j=i;
}
At the end of the loop, j is always 9 (the value of i in the last iteration).
In our given problem, i is the induction variable.
Take a look at the big loop:
i=i+1
t2 = 4 * i
i (before i = i + 1)    t2
1                       8
2                       12
3                       16
B1:
v = a[t1]
B2:
i = i + 1
t2 = 4 * i

is equivalent to

B1:
v = a[t1]
t2 = 4 * i
B2:
i = i + 1
t2 = t2 + 4
i and t2 are 2 induction variables, but we only need one induction variable per loop,
therefore we can get rid of i = i + 1.
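The equivalence behind the reduction in strength can be checked with a small sketch. The function names are hypothetical, and the starting value i = 1 is chosen to match the table of i and t2 values above.

```python
def with_multiplication(i, iterations):
    """B2 as written originally: a multiplication on every iteration."""
    values = []
    for _ in range(iterations):
        i = i + 1
        t2 = 4 * i
        values.append(t2)
    return values


def with_addition(i, iterations):
    """B2 after reduction in strength: t2 is initialized once, in B1."""
    values = []
    t2 = 4 * i                 # computed once, before the loop
    for _ in range(iterations):
        i = i + 1
        t2 = t2 + 4            # the multiplication is replaced by an addition
        values.append(t2)
    return values


before = with_multiplication(1, 3)   # [8, 12, 16]
after = with_addition(1, 3)          # [8, 12, 16]
```

Both versions keep the relation t2 = 4 * i true at the end of every iteration, which is why the two induction variables move in lock step.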
The same can be done for t4, then using DCE, we can get rid of x. After transformations,
we obtain the following flow graph:
dominator: deals with the flow graph.
d dom n:
- d dominates n if every path from the initial node to n includes d.
- every node dominates itself.
node    dom
1       {1}
2       {1, 2}
3       {1, 3}
4       {1, 3, 4}
5       {1, 3, 4, 5}
6       {1, 3, 4, 6}
7       {1, 3, 4, 7}
8       {1, 3, 4, 7, 8}
9       {1, 3, 4, 7, 8, 9}
10      {1, 3, 4, 7, 8, 10}
the flow graph from pg 603:
immediate dom: each node other than the initial node has a unique immediate dom; it’s
the last dominator of the node (other than the node itself) on any path from the initial
node.
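The dom sets and immediate dominators can be computed with the usual iterative scheme: start every node at "all nodes" and shrink until nothing changes. This is a sketch under a stated assumption: the EDGES list is a reconstruction of the flow graph that is consistent with the dom table in these notes, and it should be checked against the figure on pg 603.

```python
def dominators(edges, entry):
    """Return {node: set of nodes that dominate it}."""
    nodes = {n for edge in edges for n in edge}
    preds = {n: {a for a, b in edges if b == n} for n in nodes}
    dom = {n: set(nodes) for n in nodes}      # start from "everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {entry}):
            # d dominates n if d is n itself or d dominates every predecessor.
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def immediate_dominator(dom, n):
    """The strict dominator of n that is closest to n (None for the entry)."""
    strict = dom[n] - {n}
    return max(strict, key=lambda d: len(dom[d])) if strict else None


# Assumed edge list (tail, head); not given explicitly in the notes.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 3), (4, 5), (4, 6), (5, 7),
         (6, 7), (7, 4), (7, 8), (8, 3), (8, 9), (8, 10), (9, 1), (10, 7)]
dom = dominators(EDGES, entry=1)
idom = {n: immediate_dominator(dom, n) for n in dom}
# dom[9] == {1, 3, 4, 7, 8, 9}; idom[9] == 8; idom[7] == 4; idom[1] is None
```

The immediate dominators are exactly the parent links of the dominator tree.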
Dominator tree:
Finding loops using dominators.
Find an edge whose head dominates its tail.
a -> b (a is the tail, b is the head)
b dom a: a -> b is a back edge
7 -> 4 is a back edge, since 4 dom 7
8 -> 3 is a back edge
10 -> 7, loop: {7, 8, 10}
9 -> 1 (1 dom 9) => loop: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
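The search for back edges and their loops can be sketched as follows. The DOM table is copied from these notes; the EDGES list is an assumption reconstructed to be consistent with that table, so check it against the figure on pg 603. The loop of a back edge is taken to be the head plus every node that can reach the tail without passing through the head.

```python
def back_edges(edges, dom):
    """An edge a -> b is a back edge when its head b dominates its tail a."""
    return [(a, b) for a, b in edges if b in dom[a]]


def natural_loop(edges, tail, head):
    """The head plus all nodes that can reach tail without going through head."""
    loop = {head, tail}
    stack = [tail] if tail != head else []
    while stack:
        n = stack.pop()
        for a, b in edges:
            if b == n and a not in loop:   # a is a predecessor not yet seen
                loop.add(a)
                stack.append(a)
    return loop


# dom sets as given in the notes.
DOM = {1: {1}, 2: {1, 2}, 3: {1, 3}, 4: {1, 3, 4}, 5: {1, 3, 4, 5},
       6: {1, 3, 4, 6}, 7: {1, 3, 4, 7}, 8: {1, 3, 4, 7, 8},
       9: {1, 3, 4, 7, 8, 9}, 10: {1, 3, 4, 7, 8, 10}}
# Assumed edge list (tail, head); not given explicitly in the notes.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 3), (4, 5), (4, 6), (5, 7),
         (6, 7), (7, 4), (7, 8), (8, 3), (8, 9), (8, 10), (9, 1), (10, 7)]

found = back_edges(EDGES, DOM)
# found == [(4, 3), (7, 4), (8, 3), (9, 1), (10, 7)]
inner = natural_loop(EDGES, tail=10, head=7)
# inner == {7, 8, 10}
```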