CSE
4100
Chap 10: Optimization
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155 steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Material for course thanks to:
Laurent Michel
Aggelos Kiayias
Robert LeBarre
CH10.1
Overview
Motivation and Background
Code Level Optimization
Common Sub-expression elimination
Copy Propagation
Dead-code elimination
Peephole optimization
Load/Store elimination
Unreachable code
Flow of Control Optimization
Algebraic simplification
Strength Reduction
Concluding Remarks/Looking Ahead
Motivation
What we achieved
We have working machine code
What is missing
Code generation does not see the “big” picture
We can generate poor instruction sequences
What we need
A simple way to locally improve the code quality
Goal:
Transition from “Lousy” Intermediate Code to More Effective and Efficient Code
Response Time, Performance (Algorithms), Memory Usage
Measured in Terms of Number of Variables Saved, Operands Saved, Memory Accesses, etc.
Where Can Optimization Occur?
Source Program → Front End (LA, Parse, Int. Code) → Int. Code → Code Generator → Target Program
Software Engineer Can:
Profile Program
Change Algorithm/Data
Transform/Improve Loops
Compiler Can:
Improve Loops/Procedure Calls
Calculate Addresses
Use Registers
Select Instructions
Perform Peephole Optimization
All are Optimizations
1st is User Controlled and Defined
2nd is at the Intermediate Code Level, by the Compiler
3rd is at the Assembly Level for the Target Architecture (to take advantage of different machine features)
Code Level Optimization
First Look at Optimization
Section 9.4 in 1st Edition
Introduce and Discuss Basic Blocks
Requirements for Optimization
Section 10.1 in 1st Edition
Basic Blocks, Flow Graphs
In-Depth Examination of Optimization
Section 10.2 in 1st Edition
Function Preserving Transformations
Loop Optimizations
First Look at Optimization
Optimization is Applied to the Three Address Code (3AC) Version of the Source Program. Example:
a + b[i] * c becomes
t1 = b[i]
t2 = t1 * c
t3 = a + t2
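To make the translation into 3AC concrete, here is a minimal Python sketch (an illustration under stated assumptions, not the course's code generator) that flattens an expression tree into three-address statements for the expression a + b[i] * c. The tuple encoding of expressions and the name to_3ac are hypothetical; temporaries are numbered in the order they are created.

```python
from itertools import count


def to_3ac(expr):
    """Flatten an expression tree into three-address code.

    Hypothetical encoding: a variable is a str, an array access is
    ("[]", array_name, index_expr), and a binary operation is
    (op, left, right). Returns (code, result), where result names the
    variable or temporary that holds the value of expr.
    """
    code = []
    temps = count(1)  # produces 1, 2, 3, ... for t1, t2, t3, ...

    def gen(e):
        if isinstance(e, str):  # a plain variable needs no code
            return e
        if e[0] == "[]":  # array access: t = a[index]
            index = gen(e[2])
            t = f"t{next(temps)}"
            code.append(f"{t} = {e[1]}[{index}]")
            return t
        op, left, right = e  # binary operator: t = lhs op rhs
        lhs, rhs = gen(left), gen(right)
        t = f"t{next(temps)}"
        code.append(f"{t} = {lhs} {op} {rhs}")
        return t

    result = gen(expr)
    return code, result


# a + b[i] * c: the multiplication binds tighter, so it is the right operand of +
code, result = to_3ac(("+", "a", ("*", ("[]", "b", "i"), "c")))
print("\n".join(code))
```

The operands are evaluated before the operator that uses them, so the array access comes first, then the multiplication, and the addition last; the final temporary t3 holds the value of the whole expression.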
First Look at Optimization
Once Code has been Generated in 3AC, an Algorithm can be Applied to:
Identify each Basic Block, which Represents a Sequence of Three Address Statements where
Execution Enters at the Top and Leaves at the Bottom
There are No Branches, Except Possibly at the End
Represent the Control Flow
Dependencies Among and Between Basic Blocks
Defines what is Termed a “Flow Graph”
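The partitioning step can be sketched with the usual leader rules: the first statement, every jump target, and every statement that follows a jump start a new block, and each block runs up to the next leader. This is a minimal sketch under assumptions: statements are numbered from 1, jumps are written "goto (n)" in a hypothetical textual encoding, and basic_blocks and flow_graph are illustrative names. The sample program is a hypothetical 12-statement version of the dot-product loop that appears later in this section, with two initialization statements added.

```python
import re

# Hypothetical textual encoding: statements are numbered from 1 and jumps
# are written "goto (n)" or "if <cond> goto (n)".
JUMP = re.compile(r"goto \((\d+)\)")


def basic_blocks(stmts):
    """Partition statements into basic blocks; returns (first, last) numbers."""
    n = len(stmts)
    leaders = {1}  # rule 1: the first statement is a leader
    for num, s in enumerate(stmts, start=1):
        m = JUMP.search(s)
        if m:
            leaders.add(int(m.group(1)))  # rule 2: the target of a jump
            if num < n:
                leaders.add(num + 1)  # rule 3: the statement after a jump
    starts = sorted(leaders)
    # Each block runs from its leader up to, but not including, the next leader.
    return [(a, b - 1) for a, b in zip(starts, starts[1:] + [n + 1])]


def flow_graph(stmts, blocks):
    """Return sorted edges (from_block, to_block) using 0-based block indices."""
    block_of = {num: k for k, (a, b) in enumerate(blocks) for num in range(a, b + 1)}
    edges = set()
    for k, (a, b) in enumerate(blocks):
        last = stmts[b - 1]  # the list is 0-based, statement numbers are 1-based
        m = JUMP.search(last)
        if m:
            edges.add((k, block_of[int(m.group(1))]))  # edge to the jump target
        if not last.startswith("goto") and k + 1 < len(blocks):
            edges.add((k, k + 1))  # fall through to the next block
    return sorted(edges)


prog = [
    "prod := 0",            # (1)
    "i := 1",               # (2)
    "t1 := 4 * i",          # (3)
    "t2 := a[t1]",          # (4)
    "t3 := 4 * i",          # (5)
    "t4 := b[t3]",          # (6)
    "t5 := t2 * t4",        # (7)
    "t6 := prod + t5",      # (8)
    "prod := t6",           # (9)
    "t7 := i + 1",          # (10)
    "i := t7",              # (11)
    "if i <= 20 goto (3)",  # (12)
]
blocks = basic_blocks(prog)
print(blocks)
print(flow_graph(prog, blocks))
```

Here the two initialization statements form one block and statements 3 to 12 form the loop body. The flow graph has an edge from the first block into the loop and an edge from the loop block back to itself.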
Let’s see an Example
First Look at Optimization
Steps 1 to 12 from Two Slides Back are Represented as a Flow Graph:
Optimization Works with Basic Blocks and the Flow Graph to Perform Transformations that:
Generate an Equivalent Flow Graph with Improved Performance
First Look at Optimization
Optimization will Perform Transformations on Basic Blocks/Flow Graph
Resulting Graph(s) Passed Through to Final Code Generation to Obtain Better Code
Two-Fold Goal of Optimization
Reduce Time
Reduce Space
Optimization Used to Come at a Cost:
In the “Old Days”, Turning on the Optimizer Could Double the Compilation Time
From 2 hours to 4 hours
Is this an Issue Today?
First Look at Optimization
Two Types of Transformations
Structure Preserving
Inherent Structure and Implicit Functionality of Basic Blocks are Unchanged
Algebraic
Eliminate Useless Expressions such as x = x + 0 or y = y * 1
Replace Expensive Operators
Change x = y ** 2 to x = y * y
Why?
We’ll Focus on Both …
Structure Preserving Transformations
Common Sub-Expression Elimination
How can the Following Code be Improved?
a = b + c
b = a - d
c = b + c
d = a - d
Improved Version:
a = b + c
b = a - d
c = b + c
d = b
What Must We Make Sure Doesn’t Happen?
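The improved version can be reproduced by a small local CSE pass. This is a sketch under simplifying assumptions: each statement is a (dest, left, op, right) tuple over scalar variables, and local_cse and show are hypothetical names. The important step is invalidation: once b is reassigned, b + c is no longer available, which is why c = b + c is not replaced by a, while the second a - d can safely reuse b.

```python
def local_cse(block):
    """Local common sub-expression elimination within one basic block.

    Each statement is (dest, left, op, right) over scalar variables.
    A repeated "left op right" becomes the copy dest = holder, but only
    while neither operand nor the holder has been reassigned in between.
    """
    available = {}  # (left, op, right) -> variable currently holding the value
    out = []
    for dest, left, op, right in block:
        key = (left, op, right)
        if key in available:
            out.append((dest, available[key], None, None))  # reuse the value
        else:
            out.append((dest, left, op, right))
        # dest now has a new value: forget expressions that read dest
        # and expressions whose value was held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[0], k[2]) and v != dest}
        if key not in available and dest not in (left, right):
            available[key] = dest
    return out


def show(stmt):
    dest, left, op, right = stmt
    return f"{dest} = {left}" if op is None else f"{dest} = {left} {op} {right}"


block = [("a", "b", "+", "c"),
         ("b", "a", "-", "d"),
         ("c", "b", "+", "c"),
         ("d", "a", "-", "d")]
for stmt in local_cse(block):
    print(show(stmt))
```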
Dead-Code Elimination
If x is not Used in Block, Can it be Removed?
x = y + z
What are the Possible Ramifications if so?
Structure Preserving Transformations
Renaming Temporary Variables
Consider the Statement t = b + c
It Can be Changed to u = b + c, where u is a New Temporary
May Reduce the Number of Temporaries
All Uses of this t Must be Changed to u
Interchange of Statements
Consider:
t1 = b + c
t2 = x + y
and Change to:
t2 = x + y
t1 = b + c
This can Occur as Long as:
x and y are not t1
b and c are not t2
What Do you have to Check?
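The two conditions can be checked mechanically. A minimal sketch, assuming each statement is summarized as (variable written, set of variables read) and that only scalar variables are involved, so there are no arrays or pointers that could alias; can_interchange is a hypothetical name. It also requires the two statements to write different variables, which the t1/t2 example satisfies implicitly.

```python
def can_interchange(s1, s2):
    """True if adjacent statements s1; s2 may be swapped safely.

    Each statement is (written_variable, set_of_read_variables); scalars only.
    """
    w1, reads1 = s1
    w2, reads2 = s2
    return (
        w1 not in reads2      # s2 must not read what s1 writes (x, y are not t1)
        and w2 not in reads1  # s1 must not read what s2 writes (b, c are not t2)
        and w1 != w2          # the two statements must not write the same variable
    )


s1 = ("t1", {"b", "c"})  # t1 = b + c
s2 = ("t2", {"x", "y"})  # t2 = x + y
print(can_interchange(s1, s2))                   # True
print(can_interchange(s1, ("t2", {"t1", "y"})))  # False: t2 = t1 + y reads t1
```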
Requirements for Optimization
Identify Frequently Executed Portions of Code and Make them Perform Better
Rule-of-Thumb: Most Programs Spend 80% of their Time in 20% of the Code. Is this True?
We Focus on Loops, since Every Gain in Time is Multiplied by the Number of Loop Iterations
Reduce Loop’s Code and Improve Performance
What Other Programming Technique Should be a Major Concern for Optimization?
Requirements for Optimization
Criteria for Transformations
Preserve the Meaning of the Code
Don’t Change Output, Introduce Errors, etc.
Speed up Programs by a Measurable Amount (on Average for the Entire Code)
Must be Worth the Effort
Stick to Meaningful, Useful Transformations
Provide Different Versions of Compiler
Non-Optimizing
Optimizing
Extra Optimization on Demand
Requirements for Optimization
Beware that Some Optimization Directives may be Ignored!
In C, a Variable can be Declared as “register int i;”
While register is a Feature of the Language, it is Only a Hint: How Effective it is Depends on the Implementation, and Compilers such as cc may Ignore it and Control the Use of Registers Themselves
The Overall Optimization Process
Advantages
Intermediate Code has Explicit Operations, and Identifying Them Promotes Optimization
Intermediate Code is Relatively Machine Independent
Therefore, Optimization is Largely Independent of Final Code Generation for the Target Machine
Example Source Code
Generated Three Address Coding
Flow Graph of Basic Blocks
In-Depth Examination of Optimization
Code-Transformation Techniques:
Local – within a “Basic Block”
Global – across “Basic Blocks”
Data Flow Dependencies Determined by Inspection
What do i, a, and v Refer to?
May Depend on Definitions in Another Basic Block
Scoping is Very Critical
In-Depth Examination of Optimization
Function Preserving Transformations
Common Subexpressions
Copy Propagation
Dead Code Elimination
Loop Optimizations
Code Motion
Induction Variables
Strength Reduction
Common Sub-Expressions
E is a Common Sub-Expression if
E was Previously Computed
The Variables in E are Unchanged since the Previous Computation
What Can be Saved in B5?
B5:
t6 := 4 * i
x := a[t6]
t7 := 4 * i
t8 := 4 * j
t9 := a[t8]
a[t7] := t9
t10 := 4 * j
a[t10] := x
goto B2
t6 and t7 are the Same Computation; t8 and t10 are the Same Computation
Save:
Remove 2 temp variables
Remove 2 multiplications
Remove 4 variable accesses
Remove 2 assignments
Common Sub-Expressions
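What about B6?
B6:
t11 := 4 * i
x := a[t11]
t12 := 4 * i
t13 := 4 * n
t14 := a[t13]
a[t12] := t14
t15 := 4 * n
a[t15] := x
t11 and t12 are the Same Computation; t13 and t15 are the Same Computation
Similar Savings as in B5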
Common Sub-Expressions
What else Can be Accomplished?
Where is Variable j Determined?
In B3, and when Control Drops from B3 to B4 and into B5, no Change Occurs to j!
B3:
j := j - 1
t4 := 4 * j
t5 := a[t4]
if t5 > v goto B3
B5 (after local elimination):
t6 := 4 * i
x := a[t6]
t8 := 4 * j
t9 := a[t8]
a[t6] := t9
a[t8] := x
goto B2
What Does B5 Become? Since t4 Already Holds 4 * j:
t6 := 4 * i
x := a[t6]
t9 := a[t4]
a[t6] := t9
a[t4] := x
goto B2
Are we done? No, t9 is the same as t5!
t6 := 4 * i
x := a[t6]
a[t6] := t5
a[t4] := x
goto B2
Again, savings in accesses, variables, operations, etc.
Common Sub-Expressions
Are we done yet?
Where is “i” defined?
Any Values we can Leverage?
Yes!
t2 := 4 * i is Defined in B2, and i is Unchanged as Control Arrives at B5
t3 := a[t2] is Defined in B2; Neither B3 nor B4 Changes a or t2, so t3 is also Unchanged as Control Arrives at B5
Before:
t6 := 4 * i
x := a[t6]
a[t6] := t5
a[t4] := x
goto B2
After:
x := t3
a[t2] := t5
a[t4] := x
goto B2
The Result Saves:
From 9 statements (the original B5) down to 4
4 Multiplications are Gone
4 Array Accesses are Down to 2
Common Sub-Expressions
B6 is Similarly Changed …
Before:
t11 := 4 * i
x := a[t11]
t13 := 4 * n
t14 := a[t13]
a[t11] := t14
a[t13] := x
After (t1 Holds 4 * n):
x := t3
t14 := a[t1]
a[t2] := t14
a[t1] := x
Resulting Flow Diagram
Copy Propagation
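Introduce a Common Copy Statement to Replace an Arithmetic Calculation with an Assignment
Before (two paths that later join):
Path 1: a := d + e
Path 2: b := d + e
After the Join: c := d + e
After:
Path 1: t := d + e; a := t
Path 2: t := d + e; b := t
After the Join: c := t
Regardless of the Path Chosen, the Use of an Assignment Avoids Recomputing d + e at the Join, Saving Time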
Copy Propagation
In our Example, B5 and B6 are:
B5:
x := t3
a[t2] := t5
a[t4] := x
goto B2
B6:
x := t3
t14 := a[t1]
a[t2] := t14
a[t1] := x
Since x is t3, we can replace the use of x on the right-hand side by t3:
B5:
x := t3
a[t2] := t5
a[t4] := t3
goto B2
B6:
x := t3
t14 := a[t1]
a[t2] := t14
a[t1] := t3
We’ll come back to this shortly!
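A sketch of local copy propagation that reproduces the B5 and B6 rewrites. Assumptions: each statement is an (lhs, rhs) pair of strings, the goto is left out, array stores never create or kill a copy, and there is no aliasing between scalar variables and array elements; copy_propagate is a hypothetical name.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def copy_propagate(block):
    """Local copy propagation over (lhs, rhs) string pairs of one block.

    After a copy "x := y", later uses of x are rewritten to y until x or y
    is reassigned. Stores to array elements never create or kill copies.
    """
    copies = {}  # x -> y for every copy x := y that is still valid

    def rename(m):
        return copies.get(m.group(0), m.group(0))

    out = []
    for lhs, rhs in block:
        rhs = IDENT.sub(rename, rhs)  # rewrite uses on the right-hand side
        if "[" in lhs:  # array store: only the subscript is a use
            name, rest = lhs.split("[", 1)
            out.append((name + "[" + IDENT.sub(rename, rest), rhs))
            continue
        out.append((lhs, rhs))
        # lhs gets a new value: drop every copy that mentions it
        copies = {x: y for x, y in copies.items() if lhs not in (x, y)}
        if IDENT.fullmatch(rhs) and rhs != lhs:
            copies[lhs] = rhs  # record the new copy lhs := rhs
    return out


b5 = [("x", "t3"), ("a[t2]", "t5"), ("a[t4]", "x")]
b6 = [("x", "t3"), ("t14", "a[t1]"), ("a[t2]", "t14"), ("a[t1]", "x")]
for lhs, rhs in copy_propagate(b5) + copy_propagate(b6):
    print(f"{lhs} := {rhs}")
```

After propagation, x is still assigned but no longer read in either block, which is exactly what makes x := t3 a candidate for dead-code elimination.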
Dead Code Elimination
A Variable is “Dead” if its Value will Never be Used Again
Otherwise, the Variable is “Live”
What’s True about B5 and B6?
B5:
x := t3
a[t2] := t5
a[t4] := t3
goto B2
B6:
x := t3
t14 := a[t1]
a[t2] := t14
a[t1] := t3
Can Any Statements be Eliminated? Which Ones?
Why?
B5 and B6 are Now Optimized:
B5 has 9 Statements Reduced to 3
B6 has 8 Statements Reduced to 3
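Dead assignments can be found with a backward scan that tracks which variables are live. A minimal sketch, assuming each statement is an (lhs, rhs) pair of strings, that the caller supplies the set of variables live on exit from the block (here x is assumed not live on exit), and that stores to array elements are always kept because they change memory; eliminate_dead is a hypothetical name.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def eliminate_dead(block, live_out):
    """Delete assignments to dead scalar variables, scanning backward.

    block is a list of (lhs, rhs) string pairs; live_out is the set of
    variables assumed live on exit from the block.
    """
    live = set(live_out)
    kept = []
    for lhs, rhs in reversed(block):
        is_store = "[" in lhs  # a[i] := ... writes memory: always kept
        if not is_store and lhs not in live:
            continue  # the value is never used later: dead assignment
        if is_store:
            live |= set(IDENT.findall(lhs))  # the array name and subscript are read
        else:
            live.discard(lhs)  # lhs is defined here, so it is not live above
        live |= set(IDENT.findall(rhs))  # every operand of rhs is read
        kept.append((lhs, rhs))
    return kept[::-1]


b5 = [("x", "t3"), ("a[t2]", "t5"), ("a[t4]", "t3")]
b6 = [("x", "t3"), ("t14", "a[t1]"), ("a[t2]", "t14"), ("a[t1]", "t3")]
print(eliminate_dead(b5, set()))
print(eliminate_dead(b6, set()))
```

With the goto kept, B5 ends up with 3 statements and B6 with 3, matching the counts on the slide. Because x := t3 is the only statement removed in each block, the result depends entirely on the assumption that x is dead on exit.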
Loop Optimizations
Three Types: Code Motion, Induction Variables, and Strength Reduction
Code Motion
Remove Invariant Operations from a Loop:
while (limit * 2 > i) do
Replaced by:
t = limit * 2
while (t > i) do
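A sketch of the code-motion test and its payoff. The slide does not show the loop body, so this assumes a body of just i = i + 1 and scalar operands; invariant_exprs, original and hoisted are hypothetical names. An expression is treated as loop invariant when none of the variables it reads is assigned inside the loop.

```python
def invariant_exprs(exprs, assigned_in_loop):
    """Expressions none of whose operands is assigned inside the loop.

    exprs maps expression text to the set of variables it reads.
    """
    return [e for e, reads in exprs.items() if not (reads & assigned_in_loop)]


def original(limit, i):
    """while (limit * 2 > i) do i = i + 1; also counts the multiplications."""
    mults = 0
    while True:
        mults += 1  # limit * 2 is re-evaluated on every test
        if not (limit * 2 > i):
            break
        i = i + 1
    return i, mults


def hoisted(limit, i):
    """t = limit * 2; while (t > i) do i = i + 1."""
    t = limit * 2  # the invariant is computed once, before the loop
    while t > i:
        i = i + 1
    return i, 1


print(invariant_exprs({"limit * 2": {"limit"}, "i + 1": {"i"}}, {"i"}))
print(original(10, 0), hoisted(10, 0))
```

Both versions finish with i == 20, but the original evaluates limit * 2 on all 21 loop tests while the hoisted version evaluates it once.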
Induction Variables
Identify Which Variables are Interdependent, i.e., Change in Lock Step:
j = j - 1
t4 = 4 * j
The Update of t4 is Replaced by the Following, with t4 Initialized (to 4 * j) before the Loop:
t4 = t4 - 4
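A quick check that the subtraction-based update produces exactly the same t4 values as recomputing 4 * j. It assumes a loop that simply decrements j a fixed number of times, and the function names are hypothetical:

```python
def with_multiply(j, iterations):
    """Original loop step: j = j - 1; t4 = 4 * j."""
    values = []
    for _ in range(iterations):
        j = j - 1
        t4 = 4 * j  # one multiplication per iteration
        values.append(t4)
    return values


def with_induction_variable(j, iterations):
    """Transformed loop: t4 is initialized once, then stepped by -4."""
    values = []
    t4 = 4 * j  # initialization before the loop establishes t4 == 4 * j
    for _ in range(iterations):
        j = j - 1
        t4 = t4 - 4  # keeps t4 == 4 * j without multiplying
        values.append(t4)
    return values


print(with_multiply(10, 5))            # [36, 32, 28, 24, 20]
print(with_induction_variable(10, 5))  # [36, 32, 28, 24, 20]
```

The invariant t4 == 4 * j holds before the loop and is preserved by each iteration, because decrementing j by 1 lowers 4 * j by exactly 4.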
Loop Optimizations
Strength Reduction
Replace an Expensive Operation (Such as Multiply) with a Cheaper Operation (Such as Add)
In B4, the Test on i and j can be Replaced by a Test on t2 and t4 (since t2 = 4 * i and t4 = 4 * j)
This Eliminates the Need for Variables i and j
Final Optimized Flow Graph – Done?
Turn to Prof. Michel’s Slides …
Motivation
Rewrite the basic block to eliminate common subexpressions
Technique
Change the representation
Move to a tree (more precisely, a DAG, in which a shared subexpression is a single node)!
Example
L1: t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * i
    t4 := b[t3]
    t5 := t2 * t4
    t6 := prod + t5
    prod := t6
    t7 := i + 1
    i := t7
    if i <= 20 then goto L1
What we have
Common sub-expressions are known
Used variables are known (leaves)
Live on exit are known
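The DAG for the example block can be built with a form of value numbering. This is a sketch, not the slides' algorithm verbatim. Statements are (dest, op, x, y) tuples, "[]" stands for array indexing, "=" for a copy, the final conditional jump is left out, and build_dag is a hypothetical name. Leaves stand for constants and the initial values of variables, and t1 and t3 end up labeling the same node.

```python
def build_dag(block):
    """Build a DAG for one basic block.

    block holds (dest, op, x, y) tuples; op "=" is a copy dest := x and
    "[]" is array indexing x[y]. Returns (nodes, current, labels):
    nodes[n] is ("leaf", name, None) or (op, left_node, right_node),
    current maps each assigned variable to the node holding its value on
    exit, and labels maps a node to the variables attached to it.
    """
    nodes, index, current, labels = [], {}, {}, {}

    def lookup_or_add(key):
        if key not in index:  # create a node only if an identical one is missing
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def node_for(name):
        if name in current:  # value computed earlier in this block
            return current[name]
        return lookup_or_add(("leaf", name, None))  # constant or initial value

    for dest, op, x, y in block:
        if op == "=":
            n = node_for(x)  # a copy just adds a label to an existing node
        else:
            n = lookup_or_add((op, node_for(x), node_for(y)))
        if dest in current:  # dest no longer labels its previous node
            labels[current[dest]].remove(dest)
        current[dest] = n
        labels.setdefault(n, []).append(dest)
    return nodes, current, labels


block = [
    ("t1", "*", "4", "i"), ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "i"), ("t4", "[]", "b", "t3"),
    ("t5", "*", "t2", "t4"), ("t6", "+", "prod", "t5"),
    ("prod", "=", "t6", None), ("t7", "+", "i", "1"),
    ("i", "=", "t7", None),
]
nodes, current, labels = build_dag(block)
for n, node in enumerate(nodes):
    print(n, node, labels.get(n, []))
```

Reading the three facts listed above off the result: the node labeled both t1 and t3 is the common subexpression, the leaves (4, i, a, b, prod, 1) are the values used by the block, and current gives the node each variable holds on exit. Which of those variables are actually live on exit still has to come from outside the block.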
Peephole Optimization
Simple Idea
Slide a window over the code
Optimize code in the window only.
Optimizations are
Local [still no big picture]
Semantic preserving
Cheap to implement
Usually, one can repeat the peephole several times!
Each pass can create new opportunities for more optimization
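A minimal sketch of a peephole driver that repeats passes until nothing changes. The instruction strings follow the Intel-style syntax used in this section. The two rules (drop add r,0 and drop a move that copies a value straight back) and all function names are hypothetical. Both rules assume that no label sits between the instructions in the window and that the flags set by add are never read.

```python
import re

MOV = re.compile(r"mov ([^,\s]+),(\S+)$")


def drop_add_zero(window):
    """add r,0 leaves r unchanged (assumes its flag results are never read)."""
    return [] if re.fullmatch(r"add [^,\s]+,0", window[0]) else None


def drop_copy_back(window):
    """mov x,y ; mov y,x: the second move copies the value straight back."""
    first, second = MOV.match(window[0]), MOV.match(window[1])
    if first and second and first.groups() == second.groups()[::-1]:
        return [window[0]]
    return None


RULES = [(1, drop_add_zero), (2, drop_copy_back)]  # (window size, rule)


def peephole_pass(code):
    """One left-to-right sweep; returns (new_code, changed)."""
    code, i, changed = list(code), 0, False
    while i < len(code):
        for size, rule in RULES:
            window = code[i:i + size]
            new = rule(window) if len(window) == size else None
            if new is not None:
                code[i:i + size] = new  # rewrite the window in place
                changed = True
                break  # look again at position i
        else:
            i += 1  # no rule fired: slide the window forward
    return code, changed


def peephole(code):
    """Repeat passes until one makes no change; returns (code, passes)."""
    passes, changed = 0, True
    while changed:
        code, changed = peephole_pass(code)
        passes += 1
    return code, passes


print(peephole(["mov a,eax", "add eax,0", "mov eax,a"]))
```

On this input, the first pass deletes add eax,0, which only then brings the store/reload pair together. The second pass removes the reload, and a third pass confirms that nothing else changes.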
Peephole Optimizer
block_3:
    mov [esp-4],ebp
    mov ebp,esp
    mov [ebp-8],esp
    sub esp,28
    mov eax,[ebp+8]
    cmp eax,0
    mov eax,0
    sete ah
    cmp eax,0
    jz block_5
block_4:
    mov eax,1
    jmp block_6
block_5:
    mov eax,[ebp+8]
    sub eax,1
    push eax
    mov eax,[ebp+4]
    push eax
    mov eax,[eax]
    mov eax,[eax]
    call eax
    add esp,8
    mov ebx,[ebp+8]
    imul ebx,eax
    mov eax,ebx
block_6:
    mov esp,[ebp-8]
    mov ebp,[ebp-4]
    ret
Peephole Optimizations
A Few Simple Techniques [in a nutshell]
Load/Store elimination
Get rid of redundant operations
Unreachable code
Get rid of code guaranteed to never execute
Flow of Control Optimization
Simplify jump sequences.
Algebraic simplification
Use rules of algebra to rewrite some basic operations
Strength Reduction
Replace expensive instructions by cheaper equivalent ones
Machine Idioms
Replace expensive instructions by equivalent ones (for a given machine)
Load / Store Sequences
Imagine the following sequence
mov a,eax
mov eax,a
“a” is a label for a memory location
e.g. a variable in memory or on the stack
If “a” is on the stack, it would look like ebp(k) [k == constant]
What is guaranteed to be true after the first instruction?
Corollary....
Unreachable Code
What is it?
A situation that arises because of...
Conditional compilation
Previous optimizations that “created/exposed” dead code
Example
#define debug 0
....
if (debug) {
    printf("This is a trace message\n");
}
....
Example
The generated code looks like....
    ....
    if (debug == 0) goto L2
    printf("This is a trace message\n");
L2: ....
If we know that debug == 0
Then the code becomes (and 0 == 0 always evaluates to 1, i.e., true):
    ....
    if (0 == 0) goto L2
    printf("This is a trace message\n");
L2: ....
Final transformation
Example
    ....
    goto L2
    printf("This is a trace message\n");
L2: ....
Given this code
There is no way to branch “into” the printf block
The instruction before it (goto L2) always jumps over the printf block
The printf block is never executed. Get rid of it!
Unreachable Code Example
Bottom Line
    ....
    goto L2
L2: ....
Now L2 labels the instruction right after the goto...
So get rid of the goto altogether!
    ....
L2: ....
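The whole sequence (fold the constant test, drop the unreachable printf, drop the goto to the next label) can be sketched as three small passes over a list of instruction strings. Assumptions: labels are separate entries ending in ":", every label is conservatively treated as a possible jump target, only literal integer comparisons of the form "if (c1 == c2) goto L" are folded, and the statement after L2 is a hypothetical placeholder. All function names are illustrative.

```python
import re

CONST_BRANCH = re.compile(r"if \((\d+) == (\d+)\) goto (\w+)$")


def fold_constant_branches(code):
    """if (c1 == c2) goto L with literal integers: always or never taken."""
    out = []
    for ins in code:
        m = CONST_BRANCH.match(ins)
        if m is None:
            out.append(ins)
        elif int(m.group(1)) == int(m.group(2)):
            out.append(f"goto {m.group(3)}")  # always taken: unconditional jump
        # otherwise the branch is never taken and simply disappears
    return out


def remove_unreachable(code):
    """Drop instructions after an unconditional goto until the next label."""
    out, reachable = [], True
    for ins in code:
        if ins.endswith(":"):  # conservatively, any label may be a jump target
            reachable = True
        if reachable:
            out.append(ins)
        if ins.startswith("goto "):
            reachable = False
    return out


def remove_jump_to_next(code):
    """goto L immediately followed by the label L: does nothing."""
    return [ins for k, ins in enumerate(code)
            if not (ins.startswith("goto ") and k + 1 < len(code)
                    and code[k + 1] == ins[len("goto "):] + ":")]


code = [
    "if (0 == 0) goto L2",  # if (debug == 0) goto L2, with debug replaced by 0
    'printf("This is a trace message\\n");',
    "L2:",
    "t1 := a + b",  # hypothetical statement standing in for the code after L2
]
for step in (fold_constant_branches, remove_unreachable, remove_jump_to_next):
    code = step(code)
print(code)
```

The order of the passes matters. The printf only becomes provably unreachable once the test has been folded into an unconditional goto, and the goto only becomes a jump to the next instruction once the printf is gone.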
Flow of Control Optimization
Situation
We can have chains of jumps
Unconditional to conditional or vice-versa
Objective
Avoid extra jumps.
Why? [a.k.a. motivation....]
Example
    if (x relop y) goto L2
    ....
L2:
L3: goto L4
    ....
L4: L4_BLOCK
Flow of Control
What can be done
Collapse the chain
    if (x relop y) goto L4
    ....
L2:
L3: goto L4
    ....
L4: L4_BLOCK
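Collapsing a chain amounts to following goto links until reaching an instruction that is not itself an unconditional jump. A sketch under these assumptions: each instruction is paired with the tuple of labels attached to it, jumps end in "goto <label>", the lines "...." stand for arbitrary code, and thread_jumps is a hypothetical name. A visited set stops the walk if the gotos form a cycle.

```python
import re

JUMP = re.compile(r"(.*\bgoto )(\w+)$")


def thread_jumps(code):
    """Retarget every jump past labels whose instruction is just "goto M".

    code is a list of (labels, instruction) pairs; labels is the tuple of
    label names attached to that instruction.
    """
    forward = {}  # label -> label that its unconditional goto leads to
    for labels, ins in code:
        if ins.startswith("goto "):
            for lab in labels:
                forward[lab] = ins[len("goto "):]

    def final_target(label):
        seen = set()
        while label in forward and label not in seen:  # stop on a goto cycle
            seen.add(label)
            label = forward[label]
        return label

    out = []
    for labels, ins in code:
        m = JUMP.match(ins)
        if m:
            ins = m.group(1) + final_target(m.group(2))
        out.append((labels, ins))
    return out


code = [
    ((), "if (x relop y) goto L2"),
    ((), "...."),
    (("L2", "L3"), "goto L4"),
    ((), "...."),
    (("L4",), "L4_BLOCK"),
]
for labels, ins in thread_jumps(code):
    print(",".join(labels), ins)
```

After threading, the conditional jumps straight to L4. If nothing else jumps to L2 or L3, the goto L4 at that point may itself become unreachable, which feeds back into unreachable-code elimination.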
Algebraic Simplification
Simple Idea
Use algebraic rules to rewrite some code
Examples
x := y + 0   becomes   x := y
x := y * 1   becomes   x := y
x := y * 0   becomes   x := 0
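The three identities map directly onto a small rewrite function. A minimal sketch, assuming statements are (dest, left, op, right) string tuples and integer operands; for IEEE floating point, y * 0 is not always 0 (for example when y is NaN or infinite). simplify is a hypothetical name, and a statement that reduces to x := x is deleted.

```python
def simplify(stmt):
    """Algebraic simplification of (dest, left, op, right) over integers.

    Returns the simplified statement (a copy has op None), or None when the
    statement becomes the useless copy dest := dest.
    """
    dest, left, op, right = stmt
    if op == "+" and "0" in (left, right):
        value = right if left == "0" else left  # y + 0 = 0 + y = y
    elif op == "*" and "0" in (left, right):
        value = "0"  # y * 0 = 0 * y = 0 (checked before the * 1 rule)
    elif op == "*" and "1" in (left, right):
        value = right if left == "1" else left  # y * 1 = 1 * y = y
    else:
        return stmt  # no identity applies
    if value == dest:
        return None  # x := x has no effect and can be deleted
    return (dest, value, None, None)


for stmt in [("x", "y", "+", "0"), ("x", "y", "*", "1"),
             ("x", "y", "*", "0"), ("x", "x", "+", "0")]:
    print(stmt, "->", simplify(stmt))
```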
Strength Reduction
Idea
Replace expensive operations
By semantically equivalent cheaper ones.
Examples
Multiplication by 2 is equivalent to a left shift by 1 (for integers)
A left shift is typically much cheaper than a multiplication
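A sketch of the multiply-to-shift rewrite, generalized to any power of two. It assumes integer operands and statements encoded as (dest, left, op, right) string tuples, and reduce_strength is a hypothetical name. In C, left-shifting a negative signed value is undefined behavior, so this source-level sketch relies on Python's unbounded integers, where y << k == y * 2**k for every integer y.

```python
def reduce_strength(stmt):
    """Rewrite dest := y * 2**k as dest := y << k for integer operands."""
    dest, left, op, right = stmt
    if op == "*" and right.isdigit():
        n = int(right)
        if n > 0 and (n & (n - 1)) == 0:  # exactly one bit set: a power of two
            return (dest, left, "<<", str(n.bit_length() - 1))
    return stmt  # not a multiplication by a power of two


print(reduce_strength(("x", "y", "*", "2")))  # ('x', 'y', '<<', '1')
print(reduce_strength(("x", "y", "*", "6")))  # ('x', 'y', '*', '6')
```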
Hardware Idiom
Idea
Replace expensive instructions by...
Equivalent instructions that are optimized for the platform
Example: add eax,1 becomes inc eax (provided the carry flag, which inc leaves unchanged, is not needed afterward)
Concluding Remarks/Looking Ahead
Optimization Techniques/Concepts are Not Only Relevant to Programming Languages
Database Systems do Optimization to Reduce Access to Secondary Storage
Concern when:
Asking for too Much Data
Joining Three or More Tables at Once
Doing a Cartesian Product Instead of a Join
Improvement: Doing Selections before Joins
Termed Query Optimization
Looking Ahead
Review Machine Code Generation (if time)
Final Exam Review