Chap 10: Optimization


CSE

4100


Prof. Steven A. Demurjian

Computer Science & Engineering Department

The University of Connecticut

371 Fairfield Way, Unit 2155

Storrs, CT 06269-3155

steve@engr.uconn.edu

http://www.engr.uconn.edu/~steve

(860) 486 - 4818

Material for course thanks to:

Laurent Michel

Aggelos Kiayias

Robert LeBarre

Overview

Motivation and Background

Code Level Optimization

 Common Sub-expression elimination

Copy Propagation

Dead-code elimination

Peephole optimization

 Load/Store elimination

Unreachable code

Flow of Control Optimization

Algebraic simplification

Strength Reduction

Concluding Remarks/Looking Ahead

Motivation

What we achieved

 We have working machine code

What is missing

Code generation does not see the “big” picture

 We can generate poor instruction sequences

What we need

 A simple way to locally improve the code quality

Goal:

Transition from “Lousy” Intermediate Code to More Effective and Efficient Code

Response Time, Performance (Algorithms), Memory Usage

 Measured in terms of Number of Variables Saved, Operands Saved, Memory Accesses, etc.

Where can Optimization Occur?

Source Program -> Front End (LA, Parse, Int. Code) -> Int. Code -> Code Generator -> Target Program

Software Engineer can:

 Profile Program

 Change Algorithm Data

 Transform/Improve Loops

 Compiler Can:

 Improve Loops/Proc Calls

Calculate Addresses

Use Registers

Select Instructions

 Perform Peephole Opt.

All are Optimizations

 1st is User Controlled and Defined

 At Intermediate Code Level by Compiler

 At Assembly Level for Target Architecture (to take advantage of different machine features)

Code Level Optimization

First Look at Optimization

 Section 9.4 in 1st Edition

Introduce and Discuss Basic Blocks

Requirements for Optimization

 Section 10.1 in 1st Edition

Basic Blocks, Flow Graphs

In-depth Examination of Optimization

 Section 10.2 in 1st Edition

 Function Preserving Transformations

 Loop Optimizations

First Look at Optimization

 Optimization Applied to 3 Address Coding (3AC) Version of Source Program - Examples:

 A + B[i] * c

    t1 = b[i]
    t2 = t1 * a
    t3 = t2 * c

First Look at Optimization

Once Code has been Generated in 3AC, an Algorithm can be Applied to:

 Identify each Basic Block which Represents a set of Three Address Statements where

Execution Enters at Top and Leaves at Bottom

No Branches within Code

 Represent the Control Flow Dependencies Among and Between Basic Blocks

 Defines what is Termed a “Flow Graph”

Let’s see an Example
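Before turning to the flow graph figure, the partitioning step can be sketched in code. This is a minimal illustration of the usual "leader" rule (first instruction, every jump target, every instruction after a jump), not the course's own implementation. The dict encoding of instructions, the index-valued "target" field, and the names basic_blocks and flow_graph are all assumptions made for the example.

```python
def basic_blocks(code):
    """Return the basic blocks of `code` as lists of instruction indices."""
    if not code:
        return []
    leaders = {0}                          # rule 1: the first instruction
    for i, ins in enumerate(code):
        target = ins.get("target")
        if target is not None:
            leaders.add(target)            # rule 2: the target of a jump
            if i + 1 < len(code):
                leaders.add(i + 1)         # rule 3: the instruction after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


def flow_graph(code, blocks):
    """Return the set of edges (from_block, to_block) between basic blocks."""
    block_of = {b[0]: n for n, b in enumerate(blocks)}   # leader -> block number
    edges = set()
    for n, b in enumerate(blocks):
        last = code[b[-1]]
        if last.get("target") is not None:
            edges.add((n, block_of[last["target"]]))     # jump edge
        if last.get("op") != "goto" and n + 1 < len(blocks):
            edges.add((n, n + 1))                        # fall-through edge
    return edges


# Hypothetical five-instruction program; "target" is an instruction index.
code = [
    {"text": "i := 1"},
    {"text": "t1 := 4 * i"},
    {"text": "i := i + 1"},
    {"text": "if i <= 20 goto 1", "op": "if", "target": 1},
    {"text": "halt"},
]
blocks = basic_blocks(code)
edges = flow_graph(code, blocks)
```

Here the loop body becomes one block with an edge back to itself, which is the shape the optimizer later looks for when it focuses on loops.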

First Look at Optimization

 Steps 1 to 12 from two Slides Back Represented as:

 Optimization Works with Basic Blocks and Flow Graph to Perform Transformations that:

 Generate Equivalent Flow Graph w/Improved Perf.

First Look at Optimization

Optimization will Perform Transformations on Basic Blocks/Flow Graph

Resulting Graph(s) Passed Through to Final Code Generation to Obtain More Optimal Code

Two Fold Goal of Optimization

Reduce Time

Reduce Space

Optimization Used to Come at a Cost:

In “Old Days” Turning on Optimizer Could Double the Compilation Time

 From 2 hours to 4 hours

Is this an Issue Today?

First Look at Optimization

 Two Types of Transformations

 Structure Preserving

Inherent Structure and Implicit Functionality of Basic Blocks is Unchanged

Algebraic

Elimination of Useless Expressions: x = x + 0 or y = y * 1

Replace Expensive Operators

Change x = y ** 2 to x = y * y

Why?

We’ll Focus on Both …

Structure Preserving Transformations

Common Sub-Expression Elimination

How can Following Code be Improved?

    a = b + c
    b = a - d
    c = b + c
    d = a - d

Improved: the last statement becomes d = b

What Must We Make Sure Doesn’t Happen?

Dead-Code Elimination

If x is not Used in Block, Can it be Removed?

x = y + z

What are the Possible Ramifications if so?
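A local pass over one basic block can find repeated expressions such as a - d in the common sub-expression example. The sketch below is a minimal illustration, assuming statements are encoded as (dest, op, left, right) tuples. That encoding and the name local_cse are hypothetical and not part of the course material. It also shows what must not happen: b + c is not reused for c, because b is reassigned in between.

```python
def local_cse(block):
    """Replace recomputed expressions inside one basic block by copies."""
    available = {}   # (op, left, right) -> variable that currently holds it
    out = []
    for dest, op, left, right in block:
        key = (op, left, right)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, left, right))
        # Assigning dest invalidates expressions that read dest or live in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        # Record the new holder unless dest is one of its own operands.
        if key not in available and dest not in (left, right):
            available[key] = dest
    return out


block = [
    ("a", "+", "b", "c"),
    ("b", "-", "a", "d"),
    ("c", "+", "b", "c"),   # not a repeat: b changed since a = b + c
    ("d", "-", "a", "d"),   # repeat of a - d, still held in b
]
result = local_cse(block)
```

The sketch compares expressions textually, so it would not notice that c + b equals b + c; a fuller pass would normalize commutative operators first.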

Structure Preserving Transformations

Renaming Temporary Variables

 Consider the code: t = b + c

Can be Changed to: u = b + c

May Reduce the Number of temporaries

Make Change from all t’s to all u’s

Interchange of Statements

 Consider:
    t1 = b + c
    t2 = x + y

 Change to:
    t2 = x + y
    t1 = b + c

This can Occur as Long as:

 x and y not t1

 b and c not t2

What Do you have to Check?
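The check can be written down directly. This is a small sketch under the same assumed (dest, op, left, right) tuple encoding; the function name can_interchange is hypothetical. Beyond the two conditions listed on the slide, it also requires that the two statements write different variables, which is an added assumption needed for the swap to be safe.

```python
def can_interchange(s1, s2):
    """True if two adjacent statements may be swapped without changing results."""
    d1, _, a1, b1 = s1
    d2, _, a2, b2 = s2
    return (d1 not in (a2, b2)      # s2 must not read what s1 writes
            and d2 not in (a1, b1)  # s1 must not read what s2 writes
            and d1 != d2)           # both must not write the same variable


independent = can_interchange(("t1", "+", "b", "c"), ("t2", "+", "x", "y"))
dependent = can_interchange(("t1", "+", "b", "c"), ("t2", "+", "t1", "y"))
```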

Requirements for Optimization

Identify Frequently Executed Portions of Code and Make them Perform Better

Rule-of-Thumb - Most Programs spend 80% of their Time in 20% of Code. Is this True?

We Focus on Loops since Every Gain in Space or Time is Multiplied by Loop Iterations

Reduce Loop’s Code and Improve Performance

What Other Programming Technique Should be a Major Concern for Optimization?

Requirements for Optimization

 Criteria for Transformations

 Preserve Meaning of Code

Don’t Change Output, Introduce Errors, etc.

Speed up Programs by Measurable Amount (on Average for Entire Code)

Must be Worth the Effort

Stick to Meaningful, Useful Transformations

Provide Different Versions of Compiler

Non-Optimizing

Optimizing

 Extra Optimization on Demand

Requirements for Optimization

 Beware that Some Optimization Directives are Ignored!

In C, Define variable as “register int I;”

 While a Feature of Language, cc States that these Instructions are Ignored and Compiler Controls Use of Registers

The Overall Optimization Process

 Advantages

 Intermediate Code has Explicit Operations and Their Identification Promotes Optimization

Intermediate Code is Relatively Machine Independent

Therefore, Optimization Doesn’t Impact Final Code Generation


Example Source Code


Generated Three Address Coding


Flow Graph of Basic Blocks

In-depth Examination of Optimization

Code-Transformation Techniques:

 Local – within a “Basic Block”

 Global – between “Basic Blocks”

Data Flow Dependencies Determined by Inspection

 what do i, a, and v refer to?

Dependent in Another Basic Block

Scoping is Very Critical

In-depth Examination of Optimization

Function Preserving Transformations

 Common Subexpressions

Copy Propagation

Dead Code Elimination

Loop Optimizations

 Code Motion

Induction Variables

Strength Reduction

Common Sub-Expressions

E is a Common Sub-Expression if

 E was Previously Computed

 Value of E Unchanged since Previous Computation

What Can be Saved in B5?

 t6 and t7 same computation

 t8 and t10 same computation

 Save:

Remove 2 temp variables

Remove 2 multiplications

Remove 4 variable accesses

Remove 2 assignments

[Figure: block B5, ending with a[t10] := x and goto B2]

Common Sub-Expressions

What about B6?

 t11 and t12

 t13 and t15

Similar Savings as in B5

[Figure: block B6, ending with t15 := 4 * n and a[t15] := x]

Common Sub-Expressions

What else Can be Accomplished?

Where is Variable j Determined?

 In B3, and when control drops through B3 to B4 and into B5, no change occurs to j!

 B3 (falls through to B4 and then B5):
    j := j - 1
    t4 := 4 * j
    t5 := a[t4]
    if t5 > 4 goto B3

What Does B5 Become?

 B5 after the local eliminations:
    t6 := 4 * i
    x := a[t6]
    t8 := 4 * j
    t9 := a[t8]
    a[t6] := t9
    a[t8] := x
    goto B2

 B5 reusing t4 from B3:
    t6 := 4 * i
    x := a[t6]
    t9 := a[t4]
    a[t6] := t9
    a[t4] := x
    goto B2

Are we done? No: t9 is the same as t5!

 B5 reusing t5 from B3:
    t6 := 4 * i
    x := a[t6]
    a[t6] := t5
    a[t4] := x
    goto B2

Again savings in access, variables, operations, etc.

Common Sub-Expressions

Are we done yet?

Where is “i” defined?

 Any Values we can Leverage?

Yes!

 t2 = 4*i Defined in B2 and is unchanged as it arrives at B5

 t3 = a[t2] in B3 and B2 and also unchanged as it arrives

Result (Second Listing Below) Saves:

 From 9 statements down to 4

4 Multiplications are Gone

4 addr/array offsets are only 2

 B5 before:
    t6 := 4 * i
    x := a[t6]
    a[t6] := t5
    a[t4] := x
    goto B2

 B5 after:
    x := t3
    a[t2] := t5
    a[t4] := x
    goto B2

Common Sub-Expressions

B6 is Similarly Changed ….

 B6 before:
    t11 := 4 * i
    x := a[t11]
    t13 := 4 * n
    t14 := a[t13]
    a[t11] := t14
    a[t13] := x

 B6 after:
    x := t3
    t14 := a[t1]
    a[t2] := t14
    a[t1] := x


Resulting Flow Diagram

Copy Propagation

 Introduce a Common Copy Statement to Replace an Arithmetic Calculation with Assignment

 Before (two paths joining at c):
    path 1:  a := d + e
    path 2:  b := d + e
    join:    c := d + e

 After:
    path 1:  t := d + e
             a := t
    path 2:  t := d + e
             b := t
    join:    c := t

 Regardless of the Path Chosen, the use of an Assignment Saves Time and Space

Copy Propagation

 In our Example for B5 and B6 Below:

 B5:
    x := t3
    a[t2] := t5
    a[t4] := x
    goto B2

 B6:
    x := t3
    t14 := a[t1]
    a[t2] := t14
    a[t1] := x

 Since x is t3, we can replace the use of x on right hand side as below:

 B5:
    x := t3
    a[t2] := t5
    a[t4] := t3
    goto B2

 B6:
    x := t3
    t14 := a[t1]
    a[t2] := t14
    a[t1] := t3

We’ll come back to this shortly!
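The substitution of t3 for x can be sketched as a single forward pass over a block. This is an illustrative sketch only. It assumes (dest, op, left, right) tuples, where "copy" means dest := left, "load" means dest := left[right], and "store" means dest[left] := right. These encodings and the name propagate_copies are hypothetical.

```python
def propagate_copies(block):
    """Replace uses of a copied variable by its source within one block."""
    copies = {}   # variable -> the variable it is currently a copy of
    out = []
    for dest, op, left, right in block:
        left = copies.get(left, left)
        right = copies.get(right, right)
        out.append((dest, op, left, right))
        if op != "store":            # a store writes an array element, not dest
            # dest is redefined: forget every copy relation that involves it
            copies = {d: s for d, s in copies.items() if dest not in (d, s)}
            if op == "copy":
                copies[dest] = left
    return out


b5 = [("x", "copy", "t3", None),
      ("a", "store", "t2", "t5"),     # a[t2] := t5
      ("a", "store", "t4", "x")]      # a[t4] := x
b6 = [("x", "copy", "t3", None),
      ("t14", "load", "a", "t1"),     # t14 := a[t1]
      ("a", "store", "t2", "t14"),    # a[t2] := t14
      ("a", "store", "t1", "x")]      # a[t1] := x
b5_after = propagate_copies(b5)
b6_after = propagate_copies(b6)
```

After the pass, x := t3 is still present but nothing in the block reads x any more, which is what sets up the next transformation.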

Dead Code Elimination

Variable is “Dead” if its Value will never be Utilized Again Subsequently

Otherwise, Variable is “Live”

What’s True about B5 and B6?

 B5:
    x := t3
    a[t2] := t5
    a[t4] := t3
    goto B2

 B6:
    x := t3
    t14 := a[t1]
    a[t2] := t14
    a[t1] := t3

Can Any Statements be Eliminated? Which Ones?

Why?
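One way to answer these questions mechanically is a backward scan that tracks which variables are still live. The sketch below is a minimal illustration, assuming the (dest, op, left, right) tuple encoding with "copy", "load" (dest := left[right]) and "store" (dest[left] := right), and assuming x is not live on exit from B5 or B6. The encoding, the live_on_exit parameter, and the function name are hypothetical, and the trailing goto is left out of the lists.

```python
def eliminate_dead_code(block, live_on_exit):
    """Drop assignments whose result is never used afterwards."""
    live = set(live_on_exit)
    kept = []
    for dest, op, left, right in reversed(block):
        if op == "store":
            # An array store is visible outside the block: always keep it.
            kept.append((dest, op, left, right))
        elif dest in live:
            kept.append((dest, op, left, right))
            live.discard(dest)       # this statement defines dest
        else:
            continue                 # dest is dead here: drop the statement
        live.update(v for v in (left, right) if v is not None)
    kept.reverse()
    return kept


b5 = [("x", "copy", "t3", None),
      ("a", "store", "t2", "t5"),
      ("a", "store", "t4", "t3")]
b6 = [("x", "copy", "t3", None),
      ("t14", "load", "a", "t1"),
      ("a", "store", "t2", "t14"),
      ("a", "store", "t1", "t3")]
b5_after = eliminate_dead_code(b5, live_on_exit=set())
b6_after = eliminate_dead_code(b6, live_on_exit=set())
```

Passing live_on_exit={"x"} instead keeps x := t3, which is why liveness information from the rest of the flow graph is needed before anything is removed.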

B5 and B6 are Now Optimized with

 B5 has 9 Statements Reduced to 3

 B6 has 8 Statements Reduced to 3

Loop Optimizations

Three Types: Code Motion, Induction Variables, and Strength Reduction

Code Motion

 Remove Invariant Operations from Loop:
    while (limit * 2 > i) do

 Replaced by:
    t = limit * 2
    while (t > i) do

Induction Variables

Identify Which Variables are Interdependent or in Step:
    j = j - 1
    t4 = 4 * j

Replaced by below with an initialization of t4:
    t4 = t4 - 4
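Both loop transformations can be checked by running the original and the rewritten loop side by side. The functions below are a hand-written sketch of what a compiler would do to the intermediate code, not compiler output; their names and the counting and recording logic are assumptions added so the two versions can be compared.

```python
def iterations_before(limit, i):
    count = 0
    while limit * 2 > i:        # limit * 2 is recomputed on every test
        i += 1
        count += 1
    return count


def iterations_after(limit, i):
    t = limit * 2               # code motion: invariant hoisted out of the loop
    count = 0
    while t > i:
        i += 1
        count += 1
    return count


def offsets_before(j, steps):
    seen = []
    for _ in range(steps):
        j = j - 1
        t4 = 4 * j              # a multiply on every iteration
        seen.append(t4)
    return seen


def offsets_after(j, steps):
    seen = []
    t4 = 4 * j                  # initialization of t4 placed before the loop
    for _ in range(steps):
        j = j - 1
        t4 = t4 - 4             # t4 moves in lock step with j
        seen.append(t4)
    return seen
```

Code motion is only valid because limit is not modified inside the loop; the induction variable rewrite is only valid because j changes by a constant step.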

Loop Optimizations

 Strength Reduction

 Replace an Expensive Operation (Such as Multiply) with a Cheaper Operation (Such as Add)

In B4, i and j can be replaced with t2 and t4

This Eliminates the need for Variables i and j


Final Optimized Flow Graph – Done?

Turn to Prof. Michel’s Slides …

Motivation

 Rewrite the basic block to eliminate subexpressions

Technique

 Change the representation

 Move to a tree!

Example

L1: t1 := 4 * i;
    t2 := a[t1];
    t3 := 4 * i;
    t4 := b[t3];
    t5 := t2 * t4;
    t6 := prod + t5;
    prod := t6;
    t7 := i + 1;
    i := t7;
    if i <= 20 then goto L1


 What we have

 Common sub-expressions are known

Used variables are known (leaves)

Live on exit are known
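The move from a statement list to a graph in which equal sub-expressions share one node can be sketched as follows. This is a minimal illustration, not the construction used in the slides' figures. It assumes (dest, op, left, right) tuples, uses "[]" for an array load and "copy" for a plain assignment, and ignores array stores. The names build_dag, nodes and current are hypothetical.

```python
def build_dag(block):
    """Build a DAG for one basic block; equal sub-expressions share a node."""
    nodes = []     # node id -> ("leaf", name, None) or (op, left id, right id)
    interior = {}  # (op, left id, right id) -> node id, used to share nodes
    current = {}   # variable name -> id of the node holding its current value

    def node_for(name):
        if name not in current:            # first use: initial value is a leaf
            current[name] = len(nodes)
            nodes.append(("leaf", name, None))
        return current[name]

    for dest, op, left, right in block:
        if op == "copy":
            current[dest] = node_for(left)
            continue
        key = (op, node_for(left), node_for(right))
        if key not in interior:
            interior[key] = len(nodes)
            nodes.append(key)
        current[dest] = interior[key]
    return nodes, current


block = [
    ("t1", "*", "4", "i"),
    ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "i"),
    ("t4", "[]", "b", "t3"),
    ("t5", "*", "t2", "t4"),
    ("t6", "+", "prod", "t5"),
    ("prod", "copy", "t6", None),
    ("t7", "+", "i", "1"),
    ("i", "copy", "t7", None),
]
nodes, current = build_dag(block)
```

In the result, t1 and t3 map to the same node, so the second 4 * i never needs to be emitted, and the leaves are exactly the values the block reads on entry.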

Peephole Optimization

Simple Idea

 Slide a window over the code

 Optimize code in the window only.

Optimizations are

Local [still no big picture]

Semantic preserving

 Cheap to implement

Usually

 One can repeat the peephole pass several times!

 Each pass can create new opportunities for more optimizations

Peephole Optimizer

block_3:
    mov [esp-4],ebp
    mov ebp,esp
    mov [ebp-8],esp
    sub esp,28
    mov eax,[ebp+8]
    cmp eax,0
    mov eax,0
    sete ah
    cmp eax,0
    jz block_5
block_4:
    mov eax,1
    jmp block_6
block_5:
    mov eax,[ebp+8]
    sub eax,1
    push eax
    mov eax,[ebp+4]
    push eax
    mov eax,[eax]
    mov eax,[eax]
    call eax
    add esp,8
    mov ebx,[ebp+8]
    imul ebx,eax
    mov eax,ebx
block_6:
    mov esp,[ebp-8]
    mov ebp,[ebp-4]
    ret

Peephole Optimizations

 A Few Simple Techniques [in a nutshell]

 Load/Store elimination

Get rid of redundant operations

Unreachable code

Get rid of code guaranteed to never execute

Flow of Control Optimization

Simplify jump sequences.

Algebraic simplification

Use rules of algebra to rewrite some basic operation

Strength Reduction

Replace expensive instructions by equivalent ones (yet cheaper)

Machine Idioms

Replace expensive instructions by equivalent ones (for a given machine)

Load / Store Sequences

 Imagine the following sequence:
    mov a,eax
    mov eax,a

“a” is a label for a memory location

 e.g. a variable in memory or on the stack

 If “a” is on the stack, it would look like ebp(k) [k == constant]

What is guaranteed to be true after the first instruction ?

Corollary....
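The sliding window and the redundant move rule can be sketched together. Whichever operand is the destination, both hold the same value after the first mov, so the second mov changes nothing. The sketch below assumes instructions are tuples such as ("mov", "a", "eax"), that labels are separate ("label", name) entries so a jump target is never silently merged, and that "a" is ordinary memory. The driver peephole, the rule names, and the second rule (add reg,1 to inc reg, which assumes the carry flag is not inspected afterwards) are illustrative assumptions.

```python
def redundant_move(window):
    """mov a,eax followed by mov eax,a: drop the second instruction."""
    if len(window) == 2:
        first, second = window
        if (first[0] == "mov" and second[0] == "mov"
                and first[1] == second[2] and first[2] == second[1]):
            return [first]
    return None


def add_one_to_inc(window):
    """add reg,1 becomes inc reg (a machine idiom)."""
    head = window[0]
    if head[0] == "add" and len(head) == 3 and head[2] == "1":
        return [("inc", head[1])] + list(window[1:])
    return None


def peephole(code, rules, width=2):
    """Slide a window over the code and apply rules until nothing changes."""
    code = list(code)
    changed = True
    while changed:                     # repeat the pass until a fixed point
        changed = False
        i = 0
        while i < len(code):
            window = code[i:i + width]
            for rule in rules:
                replacement = rule(window)
                if replacement is not None:
                    code = code[:i] + replacement + code[i + len(window):]
                    changed = True
                    break              # re-examine the same position
            else:
                i += 1
    return code


code = [("mov", "a", "eax"), ("mov", "eax", "a"), ("add", "eax", "1")]
optimized = peephole(code, [redundant_move, add_one_to_inc])
```

Each rule sees only the instructions inside the window, which is why these optimizations stay cheap and local.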

Unreachable Code

What is it?

 A situation that arises because...

Conditional compilation

 Previous optimizations “created/exposed” dead code

Example

#define debug 0

....

if (debug) {
    printf("This is a trace message\n");
}

....

Example

 The Generated code looks like....

....

    if (debug == 0) goto L2
    printf("This is a trace message\n");
L2: ....

 If we know that...

 debug == 0

Then

    ....
    if (0 == 0) goto L2
    printf("This is a trace message\n");
L2: ....

Example

 Final transformation

    ....
    goto L2
    printf("This is a trace message\n");
L2: ....

 Given this code

There is no way to branch “into” the blue block (the printf call)

The last instruction (goto L2) jumps over the blue block

The blue block is never used. Get rid of it!

Unreachable Code Example

 Bottom Line

    ....
    goto L2
L2: ....

Now L2 is the instruction after goto...

 So get rid of goto altogether!

    ....
L2: ....
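The two clean-up steps from this example can be sketched as small passes over a linear instruction list. This is an illustration only. It assumes instructions are tuples whose first field is a kind such as "goto", "label", "call" or "other", and it conservatively treats every label as a possible jump target. The tuple encoding and both function names are hypothetical.

```python
def remove_unreachable(code):
    """Drop instructions between an unconditional goto and the next label."""
    out, skipping = [], False
    for ins in code:
        if ins[0] == "label":
            skipping = False          # a label may be reached by a jump
        if not skipping:
            out.append(ins)
        if ins[0] == "goto":
            skipping = True           # nothing falls through a goto
    return out


def remove_jump_to_next(code):
    """Drop a goto whose target label is the very next instruction."""
    out = []
    for i, ins in enumerate(code):
        nxt = code[i + 1] if i + 1 < len(code) else None
        if ins[0] == "goto" and nxt == ("label", ins[1]):
            continue
        out.append(ins)
    return out


code = [
    ("other", "...."),
    ("goto", "L2"),
    ("call", "printf"),               # the block that can never execute
    ("label", "L2"),
    ("other", "...."),
]
step1 = remove_unreachable(code)
step2 = remove_jump_to_next(step1)
```

The second pass only finds work because the first one ran, which is the sense in which one optimization exposes opportunities for another.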

Flow of Control Optimization

Situation

 We can have chains of jumps

Direct to conditional or vice-versa

Objective

 Avoid extra jumps.

Why? [a.k.a. motivation....]

Example

    if (x relop y) goto L2
    ....
L2: goto L4
L3: ....
L4: L4_BLOCK

Flow of Control

 What can be done

 Collapse the chain

    if (x relop y) goto L4
    ....
L2: goto L4
L3: ....
L4: L4_BLOCK
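Collapsing a chain means retargeting each jump to the place where the chain finally ends. The sketch below is one way to do it under stated assumptions: instructions are tuples, the jump target is always the last field, and a ("label", name) entry immediately followed by a ("goto", target) marks a link in a chain. The encoding and the name collapse_jump_chains are hypothetical.

```python
def collapse_jump_chains(code):
    """Retarget jumps that land on an unconditional goto."""
    # label -> where the goto placed directly at that label leads
    forwards = {}
    for i, ins in enumerate(code[:-1]):
        if ins[0] == "label" and code[i + 1][0] == "goto":
            forwards[ins[1]] = code[i + 1][1]

    def final(label):
        seen = set()                  # guards against cycles of gotos
        while label in forwards and label not in seen:
            seen.add(label)
            label = forwards[label]
        return label

    out = []
    for ins in code:
        if ins[0] in ("goto", "if"):
            out.append(ins[:-1] + (final(ins[-1]),))
        else:
            out.append(ins)
    return out


code = [
    ("if", "x relop y", "L2"),
    ("other", "...."),
    ("label", "L2"),
    ("goto", "L4"),
    ("label", "L3"),
    ("other", "...."),
    ("label", "L4"),
    ("other", "L4_BLOCK"),
]
collapsed = collapse_jump_chains(code)
```

After this pass the goto at L2 may no longer be the target of any jump, so an unreachable code pass can often remove it.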

Algebraic Simplification

Simple Idea

 Use algebraic rules to rewrite some code

Examples

    x := y + 0    becomes    x := y
    x := y * 1    becomes    x := y
    x := y * 0    becomes    x := 0
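These identities translate into a small rewrite function. The sketch assumes integer arithmetic (for floating point, y * 0 is not always 0), the (dest, op, left, right) tuple encoding with constants written as strings, and a "copy" op for plain assignment. The encoding and the name simplify are hypothetical.

```python
def simplify(ins):
    """Apply x+0, x*1 and x*0 identities to one three-address statement."""
    dest, op, left, right = ins
    if op == "+":
        if right == "0":
            return (dest, "copy", left, None)
        if left == "0":
            return (dest, "copy", right, None)
    if op == "*":
        if "0" in (left, right):
            return (dest, "copy", "0", None)   # integer arithmetic assumed
        if right == "1":
            return (dest, "copy", left, None)
        if left == "1":
            return (dest, "copy", right, None)
    return ins
```

For instance, simplify(("x", "+", "y", "0")) yields the copy x := y, which copy propagation and dead code elimination can then clean up further.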

Strength Reduction

Idea

 Replace expensive operations

 By semantically equivalent cheaper ones.

Examples

Multiplication by 2 is equivalent to a left shift by one bit

A left shift is typically cheaper than a multiply
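The multiply-to-shift rewrite generalizes to any power of two. The sketch below assumes integer operands, the (dest, op, left, right) tuple encoding with the constant as a decimal string in the right operand, and a "<<" op for left shift. The encoding and the name reduce_strength are hypothetical.

```python
def reduce_strength(ins):
    """Rewrite multiplication by a power of two as a left shift."""
    dest, op, left, right = ins
    if op == "*" and isinstance(right, str) and right.isdigit():
        n = int(right)
        if n > 1 and n & (n - 1) == 0:          # n is a power of two
            return (dest, "<<", left, str(n.bit_length() - 1))
    return ins


by_two = reduce_strength(("x", "*", "y", "2"))
by_eight = reduce_strength(("x", "*", "y", "8"))
by_six = reduce_strength(("x", "*", "y", "6"))
```

Multiplication by 6 is left alone here; a real compiler might split it into shifts and adds, but only when the target's cost model says that is cheaper.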

Hardware Idiom

Idea

 Replace expensive instructions by...

 Equivalent instructions that are optimized for the platform

Example

    add eax,1    becomes    inc eax

Concluding Remarks/Looking Ahead

Optimization Techniques/Concepts are Not Only Relevant to Programming Languages

Database Systems do Optimization to Reduce Access to Secondary Storage

Concern when Asking for too Much Data

Joining Three or More Tables at Once

Doing a Cartesian Product Instead of a Join

Doing Selections before Joins

 Termed Query Optimization

Looking Ahead

Review Machine Code Generation (if time)

Final Exam Review
