CPSC 388 – Compiler Design and Construction
Optimization

Optimization Goal

- Produce better code
    - Fewer instructions
    - Faster execution
- Do not change the behavior of the program!

Optimization Techniques

- Peep-hole optimization
    - Done after code generation
    - Makes small local changes to the assembly
- Moving loop invariants
    - Done before code generation
    - Finds computations in loops that can be moved outside
- Strength reduction in for loops
    - Done before code generation
    - Replaces multiplications with additions
- Copy propagation
    - Done before code generation
    - Replaces a use of a variable with a literal or another variable

Peep-hole Optimization

Look through a small window at the assembly code for common cases that can be improved:

1. Redundant load
2. Redundant push/pop
3. Replace a jump to a jump
4. Remove a jump to the next instruction
5. Replace a jump around a jump
6. Remove useless operations
7. Reduction in strength

Redundant Load

Before:
    store Rx, M
    load  M, Rx
After:
    store Rx, M

Redundant Push/Pop

Before:
    push Rx
    pop  Rx
After:
    … nothing …

Replace a Jump to a Jump

Before:
    goto L1
    …
    L1: goto L2
After:
    goto L2
    …
    L1: goto L2

Remove a Jump to the Next Instruction

Before:
    goto L1
    L1: …
After:
    L1: …

Replace a Jump Around a Jump

Before:
    if T0 = 0 goto L1
    else goto L2
    L1: …
After:
    if T0 != 0 goto L2
    L1: …

Remove Useless Operations

Before:
    add T0, T0, 0
    mul T0, T0, 1
After:
    … nothing …

Reduction in Strength

Before:
    mul T0, T0, 2
    add T0, T0, 1
After:
    shift-left T0
    inc T0

One Optimization May Lead to Another

Original:
    load  Tx, M
    add   Tx, 0
    store Tx, M
After one optimization:
    load  Tx, M
    store Tx, M
After another optimization:
    load  Tx, M

You Try It

The code generated from this program contains opportunities for two of the kinds listed above (redundant load, jump to a jump). Can you explain how just by looking at the source code?

    public class Opt {
        public static void main() {
            int a;
            int b;
            if (true) {
                if (true) {
                    b = 0;
                } else {
                    b = 1;
                }
                return;
            }
            a = 1;
            b = a;
        }
    }

Moving Loop-Invariant Computations Out of the Loop

For greatest gain, optimize “hot spots”, i.e. inner loops.
An expression is loop invariant if the same value is computed on every iteration of the loop. Compute the value once outside the loop and reuse the value inside the loop.

Example (source code)

    for (int i=0; i<100; i++) {
        for (int j=0; j<100; j++) {
            for (int k=0; k<100; k++) {
                A[i][j][k] = i*j*k;
            }
        }
    }

Example (with the address computation made explicit)

    for (int i=0; i<100; i++) {
        for (int j=0; j<100; j++) {
            for (int k=0; k<100; k++) {
                T0 = i*j*k;
                T1 = FP + <offset of A> - i*40000 - j*400 - k*4;
                store T0, 0(T1)
            }
        }
    }

- FP + <offset of A> is invariant to the i loop
- i*40000 is invariant to the j loop
- j*400 and i*j are invariant to the k loop

Example (after moving the invariants)

    tmp0 = FP + <offset of A>
    for (int i=0; i<100; i++) {
        tmp1 = tmp0 - i*40000;
        for (int j=0; j<100; j++) {
            tmp2 = tmp1 - j*400;
            tmp3 = i*j;
            for (int k=0; k<100; k++) {
                T0 = tmp3*k;
                T1 = tmp2 - k*4;
                store T0, 0(T1)
            }
        }
    }

Comparison of the innermost loop before and after (executed 1 million times)

Original code:
- 5 multiplications (3 for the lvalue, 2 for the rvalue)
- 3 subtractions (for the lvalue)
- 1 indexed store

New code:
- 2 multiplications (1 for the lvalue, 1 for the rvalue)
- 1 subtraction (for the lvalue)
- 1 indexed store

Questions

- How do you recognize loop-invariant expressions?
- When and where do we move the computations of those expressions?

Recognizing Loop Invariants

An expression is invariant with respect to a loop if, for every operand, one of the following holds:

- It is a literal
- It is a variable that gets its value only from outside the loop

When and Where to Move Invariant Expressions

- Must consider the safety of the move
- Must consider the profitability of the move

Safety of Moving Invariants

A move is unsafe if evaluating the expression might cause an error and the loop might not get executed:

    b = a;
    while (a != 0) {
        x = 1/b;   // possible “/0” if moved
        a--;
    }

What about preserving the order of events? If the unoptimized code performed output and THEN had a runtime error, is it valid for the optimized code to simply have a runtime error?
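The safety concern above can be made concrete with a small Java sketch. The loop is the one from the slide; the class name, the method names (`original`, `hoistedUnsafely`, `hoistedWithGuard`) and the initial value `x = 0` are assumptions added for illustration. The guarded version shows one common way to keep the hoist safe, not necessarily the approach used in this course.

```java
class HoistSafety {
    // Original loop: 1/b is evaluated only if the loop body runs.
    static int original(int a) {
        int b = a;
        int x = 0;              // assumed initial value, not in the slide
        while (a != 0) {
            x = 1 / b;          // loop invariant: b never changes in the loop
            a--;
        }
        return x;
    }

    // Unsafe hoist: the division now runs even when the loop body never would.
    static int hoistedUnsafely(int a) {
        int b = a;
        int t = 1 / b;          // throws ArithmeticException when b == 0
        int x = 0;
        while (a != 0) {
            x = t;
            a--;
        }
        return x;
    }

    // Guarded hoist: evaluate the invariant only once the loop is known
    // to execute at least one iteration.
    static int hoistedWithGuard(int a) {
        int b = a;
        int x = 0;
        if (a != 0) {
            int t = 1 / b;
            do {
                x = t;
                a--;
            } while (a != 0);
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println("original(0) = " + original(0));
        try {
            hoistedUnsafely(0);
        } catch (ArithmeticException e) {
            System.out.println("hoistedUnsafely(0) failed: " + e.getMessage());
        }
        System.out.println("hoistedWithGuard(0) = " + hoistedWithGuard(0));
    }
}
```

With `a = 0` the original method never divides, while the unsafely hoisted one fails before the loop is even tested, so the optimized program would change the behavior of the source program.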
Changing the order of computations may also change the result of floating-point computations because of differing precision.

Profitability of Moving Invariants

If the computation might NOT execute in the original program, then moving the computation might actually slow down the program!

Moving is Safe and Profitable If

- The loop will execute at least once
- The code will execute if the loop does
    - It isn't inside any condition
    - It is on all paths through the loop (both if and else portions)
- The expression is in the non-short-circuited part of the loop test, e.g. while (x < i+j*100)

You Try It

What are some examples of loops for which the compiler can be sure that the loop will execute at least once?

Strength Reduction

- Concentrate on “hot spots”
- Replace expensive operations (*) with cheaper ones (+)

Example Strength Reduction

    for i from low to high do
        … i*k1+k2 …

where

- i is the loop index
- k1 and k2 are constant with respect to the loop

Consider the sequence of values for i and for the expression:

    Iteration #   i          i*k1+k2
    1             low        low*k1+k2
    2             low+1      (low+1)*k1+k2   = low*k1+k2+k1
    3             low+1+1    (low+1+1)*k1+k2 = low*k1+k2+k1+k1

So the transformation is:

- Compute low*k1+k2 once before the loop
- Store the value in a temporary
- Use the temporary instead of the expression inside the loop
- Increment the temporary by k1 at the end of each iteration

    temp = low*k1+k2
    for i from low to high do
        … temp …
        temp = temp + k1
    end

Another Example

The comments rewrite each expression in the form (loop index)*k1+k2.

    tmp0 = FP + offset A
    for (i=0; i<100; i++) {
        tmp1 = tmp0 - i*40000          // i * -40000 + tmp0
        for (j=0; j<100; j++) {
            tmp2 = tmp1 - j*400        // j * -400 + tmp1
            tmp3 = i*j                 // j * i + 0
            for (k=0; k<100; k++) {
                T0 = tmp3 * k          // k * tmp3 + 0
                T1 = tmp2 - k*4        // k * -4 + tmp2
                store T0, 0(T1)
            }
        }
    }

Now Perform Strength Reduction

    tmp0 = FP + offset A
    temp1 = tmp0                       // temp1 = 0*-40000+tmp0
    for (i=0; i<100; i++) {
        tmp1 = temp1
        temp2 = tmp1                   // temp2 = 0*-400+tmp1
        temp3 = 0                      // temp3 = 0*i+0
        for (j=0; j<100; j++) {
            tmp2 = temp2
            tmp3 = temp3
            temp4 = 0                  // temp4 = 0*tmp3+0
            temp5 = tmp2               // temp5 = 0*-4+tmp2
            for (k=0; k<100; k++) {
                T0 = temp4
                T1 = temp5
                store T0, 0(T1)
                temp4 = temp4 + tmp3
                temp5 = temp5 - 4
            }
            temp2 = temp2 - 400
            temp3 = temp3 + i
        }
        temp1 = temp1 - 40000
    }

You Try It

Suppose that the index variable is incremented by something other than one each time around the loop. For example, consider a loop of the form:

    for (i=low; i<=high; i+=2) ...

Can strength reduction still be performed? If yes, what changes must be made to the proposed algorithm?

Copy Propagation

Statements of the form “x=y” are called copy statements; call such a statement d. For every use, u, of variable x reached by a copy statement d such that:

- No other definition of x reaches u, and
- y can't change between d and u

you can replace the use of x at u with a use of y.

Examples of Copy Propagation

    x=y
    a=x+z
Yes.

    x=y
    if (…) x=2
    a=x+z
No: another definition of x reaches the use.

    x=y
    if (…) y=3
    a=x+z
No: y can change between the copy and the use.

Question

Why is this a useful transformation?

If ALL uses of x reached by definition d are replaced, then the definition d is useless and can be removed.

Before copy propagation:

    tmp0 = FP + offset A
    temp1 = tmp0                       // cannot be propagated
    for (i=0; i<100; i++) {
        tmp1 = temp1
        temp2 = tmp1                   // cannot be propagated
        temp3 = 0                      // cannot be propagated
        for (j=0; j<100; j++) {
            tmp2 = temp2
            tmp3 = temp3
            temp4 = 0                  // cannot be propagated
            temp5 = tmp2               // cannot be propagated
            for (k=0; k<100; k++) {
                T0 = temp4
                T1 = temp5
                store T0, 0(T1)
                temp4 = temp4 + tmp3
                temp5 = temp5 - 4
            }
            temp2 = temp2 - 400
            temp3 = temp3 + i
        }
        temp1 = temp1 - 40000
    }

After copy propagation:

    tmp0 = FP + offset A
    temp1 = tmp0
    for (i=0; i<100; i++) {
        temp2 = temp1
        temp3 = 0
        for (j=0; j<100; j++) {
            temp4 = 0
            temp5 = temp2
            for (k=0; k<100; k++) {
                store temp4, 0(temp5)
                temp4 = temp4 + temp3
                temp5 = temp5 - 4
            }
            temp2 = temp2 - 400
            temp3 = temp3 + i
        }
        temp1 = temp1 - 40000
    }

Comparison Before and After

Before:
- 5 *, 3 +/-, 1 indexed store in the innermost loop

After:
- 2 +/-, 1 indexed store in the innermost loop
- 2 +/-, 2 copy statements in the middle loop
- 1 +/-, 2 copy statements in the outer loop
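As a sanity check on the final loop nest, the sketch below compares the original computation with the strength-reduced, copy-propagated version in Java. Several things here are assumptions made for illustration: the class and method names are invented, a flat `int[]` stands in for the stack frame, and indices count upward in elements (steps of +1, +N, +N*N) rather than downward in bytes (steps of -4, -400, -40000) from FP. The temporaries keep the names used in the slides.

```java
import java.util.Arrays;

class StrengthReductionCheck {
    static final int N = 100;

    // Reference version: multiplications for both the value and the index.
    static int[] withMultiplications() {
        int[] a = new int[N * N * N];
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                for (int k = 0; k < N; k++) {
                    a[i * N * N + j * N + k] = i * j * k;
                }
            }
        }
        return a;
    }

    // Same loop nest after strength reduction and copy propagation:
    // the loops contain only additions and copies.
    static int[] withAdditionsOnly() {
        int[] a = new int[N * N * N];
        int temp1 = 0;                      // index of A[i][0][0]
        for (int i = 0; i < N; i++) {
            int temp2 = temp1;              // index of A[i][j][0]
            int temp3 = 0;                  // running value of i*j
            for (int j = 0; j < N; j++) {
                int temp4 = 0;              // running value of i*j*k
                int temp5 = temp2;          // index of A[i][j][k]
                for (int k = 0; k < N; k++) {
                    a[temp5] = temp4;       // the indexed store
                    temp4 = temp4 + temp3;
                    temp5 = temp5 + 1;
                }
                temp2 = temp2 + N;
                temp3 = temp3 + i;
            }
            temp1 = temp1 + N * N;          // N * N is a compile-time constant
        }
        return a;
    }

    public static void main(String[] args) {
        boolean same = Arrays.equals(withMultiplications(), withAdditionsOnly());
        System.out.println("results identical: " + same);
    }
}
```

Comparing the two arrays element by element is a cheap way to confirm that the transformations did not change the behavior of the program, which is the constraint stated in the optimization goal.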