Notes for 05-31-03, Session 2

Peephole optimization: improves performance by examining a short sequence of instructions (the peephole) and replacing it with a shorter or faster sequence. It is characterized by 4 types of program transformations:
- RIE: redundant-instruction elimination
- FOC: flow-of-control optimizations
- Algebraic simplifications
- Machine idioms: using instructions specific to the hardware to get the fastest piece of code.

Here is an example from the dragon book, dealing with code optimization (pg 590). The leaders are:
    (1)  i = m - 1
    (5)  i = i + 1
    (9)  j = j - 1
    (13) if i >= j goto (23)
    (14) t6 = 4 * i
    (23) t11 = 4 * i
There are therefore 6 basic blocks labeled Bi, 1 <= i <= 6. The flow graph connects the blocks with directed arrows, and jump targets change from line numbers to block names:
    goto (5)  ->  goto B2
    goto (9)  ->  goto B3

Global optimization deals with more than one basic block and is characterized by the following transformations:
- CSE: common subexpression elimination; if 2 expressions are the same and none of their operands is redefined in between, we only need to compute the expression once.
- CP: copy propagation; eliminates unnecessary copies of a variable.
- DCE: dead code elimination
- Constant folding

In the example, we apply local optimization first, then global optimization: local optimization reduces the code that global optimization has to work on.

Local optimization, in B5:
    t6 = 4 * i
    x = a[t6]
    t7 = t6
    …
    t8 = 4 * j
    t10 = t8

Global optimization:
In B2, i's value changes:
    i = i + 1
    t2 = 4 * i
Now looking at B2 and B5 together:
    t6 = t2        (note that this does not hold if t2 is redefined after B2)
Similarly for the expression 4 * j:
    t4 = 4 * j
    t8 = t4
    t10 = t8

Apply copy propagation only if there is no redefinition.
    t11 = t2 : substitute t2 for t11
The substitution can be done as long as t2 is not redefined between the copy and the use being replaced. For example, starting from
    t6 = t2
    x = a[t6]
    t7 = t6
check that t2 is not redefined after the point where it is defined and that t6 is not redefined. Then we get
    x = a[t2]
    t7 = t2
but we cannot delete the copy t6 = t2 yet; that is left to dead code elimination.

DCE: eliminates code that assigns a dead variable (one that is not used after it is defined) and code that is never reached.
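As a rough illustration of how copy propagation sets up dead code elimination, here is a minimal Python sketch applied to the B5 fragment above. The encoding of statements as (destination, expression) pairs, the function names copy_propagate and eliminate_dead, and the choice of x and t7 as the variables live on exit are all assumptions made for this sketch; they do not come from the notes or the book.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def copy_propagate(block):
    """Replace uses of a copied name by its source within one basic block."""
    copies = {}   # active copies: dest -> source, from statements `dest = source`
    result = []
    for dest, expr in block:
        # Rewrite every identifier in expr that is currently a known copy.
        expr = IDENT.sub(lambda m: copies.get(m.group(), m.group()), expr)
        # Redefining dest invalidates any copy that reads or writes dest.
        copies = {d: s for d, s in copies.items() if dest not in (d, s)}
        if IDENT.fullmatch(expr) and expr != dest:
            copies[dest] = expr
        result.append((dest, expr))
    return result

def eliminate_dead(block, live_out):
    """Drop assignments whose destination is not live afterwards (backward scan)."""
    live = set(live_out)
    kept = []
    for dest, expr in reversed(block):
        if dest in live:
            kept.append((dest, expr))
            live.discard(dest)
            live.update(IDENT.findall(expr))
    kept.reverse()
    return kept

# Fragment of B5 after global CSE; live_out is an assumption for illustration.
b5 = [("t6", "t2"), ("x", "a[t6]"), ("t7", "t6")]
propagated = copy_propagate(b5)
cleaned = eliminate_dead(propagated, live_out={"x", "t7"})
print(propagated)  # [('t6', 't2'), ('x', 'a[t2]'), ('t7', 't2')]
print(cleaned)     # [('x', 'a[t2]'), ('t7', 't2')]
```

After propagation the copy t6 = t2 is still present but nothing reads t6 any more, so only the backward liveness scan can remove it.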
i is defined and then used, so we cannot get rid of it. This information is recorded in u-d chains and d-u chains, where u stands for use and d stands for definition. There are 2 d-u chains for the variable i.

Since t2 = 4 * i and t4 = 4 * j, we can eliminate t6, t7, t8, t10, t11, t12, t13, t15. Copy propagation leads to dead code elimination.

Constant folding: substitute, at compile time, values that are already known at compile time. Example:
    #define debug 0
    if (debug) {
        // dead code, debug is never true.
    }

    #define base 2
    x = y * base;    // base is known to be 2, so this can become:
    x = y << 1;

Loop optimizations:
- code motion: move loop-invariant (idempotent) computations out of the loop
- induction variables / reduction in strength

How do we know there is a loop?
- all nodes in the collection are strongly connected: there is a path from any node to any other node (not to be confused with fully connected, where there is a direct connection from every node to every other node).
- there is a unique entry point

inner loops:
    B2 to B2
    B2 to B5 to B2

A loop invariant yields the same result on every iteration (and is different from an induction variable, which changes by a fixed amount on every iteration). The difference is illustrated by this example:
    for (i = 0; i < 10; i++) {
        j = i;
    }
At the end of the loop, j is always 9 (the value of i in the last iteration) and i is always 10.

In our given problem, i is the induction variable. Take a look at the big loop:
    i = i + 1
    t2 = 4 * i

    i (before i = i + 1)    t2
    1                       8
    2                       12
    3                       16

Each time i goes up by 1, t2 goes up by 4, so the multiplication can be replaced by an addition:

    Original:                  Equivalent:
    B1: v = a[t1]              B1: v = a[t1]
                                   t2 = 4 * i
    B2: i = i + 1              B2: i = i + 1
        t2 = 4 * i                 t2 = t2 + 4

i and t2 are 2 induction variables, but we only need one induction variable per loop; therefore, once the loop test is expressed in terms of t2, we can get rid of i = i + 1. The same can be done for t4; then, using DCE, we can get rid of x. After these transformations, we obtain the final flow graph (figure not reproduced here).

Dominators: a relation on the nodes of the flow graph, written d dom n.
- d dominates n if every path from the initial node to n includes d.
- every node dominates itself.

For the flow graph from pg 603 (figure not reproduced here):

    node    dom
    1       {1}
    2       {1, 2}
    3       {1, 3}
    4       {1, 3, 4}
    5       {1, 3, 4, 5}
    6       {1, 3, 4, 6}
    7       {1, 3, 4, 7}
    8       {1, 3, 4, 7, 8}
    9       {1, 3, 4, 7, 8, 9}
    10      {1, 3, 4, 7, 8, 10}

Immediate dominator: each node other than the initial node has a unique immediate dominator; it is the last dominator of the node on any path from the initial node to that node.

Dominator tree: the parent of each node is its immediate dominator.

Finding loops using dominators:
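The dominator sets in the table above can be reproduced with the usual iterative computation, dom(n) = {n} plus the intersection of dom(p) over all predecessors p of n. The Python sketch below is only an illustration: the function names are made up for this example, and the edge list in SUCC is an assumption, reconstructed to be consistent with the table, because the notes do not reproduce the flow graph itself.

```python
def dominators(succ, entry):
    """Iteratively solve dom(n) = {n} | intersection of dom(p) over predecessors p."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}   # start from "every node dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {entry}):
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def immediate_dominators(dom, entry):
    """The dominators of n form a chain, so the strict dominator with the
    largest dom set is the one closest to n."""
    return {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in dom if n != entry}

# ASSUMED edge list (node -> successors); not given in the notes.
SUCC = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}

dom = dominators(SUCC, entry=1)
idom = immediate_dominators(dom, entry=1)
for n in sorted(dom):
    print(n, sorted(dom[n]), "idom:", idom.get(n))
```

The idom mapping is exactly the parent relation of the dominator tree; with the assumed edges, for instance, node 7 has parent 4 and nodes 9 and 10 both have parent 8.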
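Find an edge whose head dominates its tail. For an edge a -> b, a is the tail and b is the head; if b dom a, then a -> b is a back edge.
    7 -> 4: 4 dom 7, so 7 -> 4 is a back edge
    8 -> 3 is a back edge
    10 -> 7 is a back edge; loop: {7, 8, 10}
    9 -> 1 is a back edge; loop: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} (the entire flow graph, since 10 reaches 9 through 7 and 8)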
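The search for back edges and their loops can be sketched in Python as follows. The dominator sets are copied from the table in these notes; the edge list SUCC is an assumption (the notes do not reproduce the flow graph), and the function names are hypothetical. The loop of a back edge is built here as the head plus every node that can reach the tail without passing through the head.

```python
# Dominator sets copied from the table in the notes.
DOM = {1: {1}, 2: {1, 2}, 3: {1, 3}, 4: {1, 3, 4}, 5: {1, 3, 4, 5},
       6: {1, 3, 4, 6}, 7: {1, 3, 4, 7}, 8: {1, 3, 4, 7, 8},
       9: {1, 3, 4, 7, 8, 9}, 10: {1, 3, 4, 7, 8, 10}}

# ASSUMED edge list (node -> successors); not given in the notes.
SUCC = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}

def back_edges(succ, dom):
    """An edge tail -> head is a back edge when head dominates tail."""
    return [(tail, head) for tail in sorted(succ)
            for head in succ[tail] if head in dom[tail]]

def natural_loop(succ, tail, head):
    """Head plus all nodes that reach tail without going through head."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)
    loop = {head, tail}
    stack = [tail] if tail != head else []
    while stack:
        n = stack.pop()
        for p in preds[n]:
            if p not in loop:      # head is already in loop, so the walk stops there
                loop.add(p)
                stack.append(p)
    return loop

for tail, head in back_edges(SUCC, DOM):
    print(f"{tail} -> {head}: loop {sorted(natural_loop(SUCC, tail, head))}")
```

Under the assumed edge list the search also reports 4 -> 3 as a back edge, and the loop of the back edge into node 1 covers all ten nodes.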