Compiler Optimization Overview

Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Review: Phases of a Compiler

- Intermediate code optimizations are not machine specific
- Low level optimizations can be machine specific

Review: Compiler Options
Review: Basic Processor Parts
Review: CISC vs RISC

- CISC (e.g., x86 Intel)
  - Multi-clock complex instructions
  - Memory access incorporated in instructions
  - Complex instruction set
- RISC (e.g., the PowerPC in a Mac PowerBook)
  - Single-clock instructions
  - Memory accesses are separate instructions
  - Simple instruction set

Review: Memory Hierarchy

- Memory access becomes exponentially slower at higher levels
- Memory access intensive programs require special optimizations (sketched below)

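As an illustration (a hedged sketch, not from the original slides): in C's row-major layout, loop order alone decides whether accesses walk memory sequentially, which is exactly the kind of special treatment memory-bound code needs.

    #define N 1024
    double a[N][N];

    void scale_slow(void) {
        /* Poor locality: the inner loop strides N doubles per access. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] *= 2.0;
    }

    void scale_fast(void) {
        /* After loop interchange: sequential, cache friendly accesses. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] *= 2.0;
    }
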
Review: Multiple Cores

- Need to create and use ILP
- Multiple cores on the same die can share cache, working together faster
- Can only execute trivial parallelism (Dr. Doughty)
- Must eliminate hazards

Review: Pipelines

Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Optimization Goals

- Speed
- Executable size
- Memory access
- Power usage (embedded)
- Debugging

Optimizing for Speed*

- Useful for CPU intensive applications (graphics, video editing, sorting)
- Scheduling: out of order execution
- Removal of dependencies increases ILP
- Instruction latency
- Multiple ALUs, cores, etc.
- Mix instruction types (int, float, mult, read, write)
- Eliminate jumps (see the unrolling sketch below)
- Buffer writes (cannot write out of order)

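A hedged sketch (not from the slides) of how loop unrolling serves two of these goals at once, removing jumps and exposing independent work for multiple ALUs; it assumes n is a multiple of 4:

    int sum_unrolled(const int *a, int n) {
        /* Four independent partial sums: fewer branches per element,
           and the adds can issue in parallel on multiple ALUs. */
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (int i = 0; i < n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        return s0 + s1 + s2 + s3;
    }
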
Optimizing for Size

- More common for embedded applications
- Competes with power/speed optimizations
- Limiting code size keeps critical loops in memory
- Choose the smaller form of an instruction (CISC)
- Use short constants for jumps (a simpler form of addressing)
- Increase instruction length for loop alignment

Optimizing for Memory

- Useful for memory I/O intensive applications
- Consider proper alignment of data and instructions to reduce cache misses and improve the results of paging (see the struct layout sketch below)
- Use instructions for controlling the cache
- Partially addresses the von Neumann bottleneck
- Reading the lowest level cache in a P4 takes 3 clocks; each higher level is roughly an order of magnitude slower (10, then 100 clocks)

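A hedged sketch (illustrative, not from the slides) of data alignment in practice: reordering members removes compiler padding, so more objects fit per cache line.

    #include <stdio.h>

    struct bad {        /* typically 24 bytes on a 64-bit target */
        char tag;       /* 1 byte + 7 bytes padding (double wants 8-byte alignment) */
        double value;   /* 8 bytes */
        char flag;      /* 1 byte + 7 bytes trailing padding */
    };

    struct good {       /* typically 16 bytes: same fields, reordered */
        double value;
        char tag;
        char flag;      /* 6 bytes trailing padding */
    };

    int main(void) {
        printf("bad: %zu, good: %zu\n", sizeof(struct bad), sizeof(struct good));
        return 0;
    }
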
Analysis

- Alias
- Control flow
- Data flow
- Dependence
- Interprocedural

Alias Analysis

- Determines whether there are multiple ways to access a single data point (illustrated below)
- Knowing aliases helps identify optimizations by recognizing data dependencies and locating redundant code/data updates
- Alias analysis is critical for global optimizations (reference parameters, globally defined data, pointers)

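A hedged C sketch (not from the slides) of why aliasing matters: if p and q may name the same location, a redundant load cannot be removed.

    /* If p and q can alias, the store to *q may change *p, so the
       second read of *p must stay. */
    int may_alias(int *p, int *q) {
        int a = *p;
        *q = 5;
        return a + *p;   /* reload of *p required */
    }

    /* C99 restrict promises no aliasing, licensing the optimization. */
    int no_alias(int *restrict p, int *restrict q) {
        int a = *p;
        *q = 5;
        return a + *p;   /* compiler may reuse a instead of reloading */
    }
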
Control Flow Analysis

- Precursor to critical loop reductions
- Enables replacement of inefficient code
- Gathers information concerning the hierarchical flow of control
- Identifies potential branches in program execution, useful for mitigating pipeline hazards

Example: Fibonacci
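The original Fibonacci slides were images; as a hedged reconstruction, here is an iterative version annotated with the basic blocks and branches a control flow analysis would identify:

    int fib(int n) {
        int prev = 0, curr = 1;          /* B1: entry block */
        for (int i = 2; i <= n; i++) {   /* B2: loop test (conditional branch) */
            int next = prev + curr;      /* B3: loop body, back edge to B2 */
            prev = curr;
            curr = next;
        }
        return (n == 0) ? 0 : curr;      /* B4: exit block */
    }
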
Data Flow Analysis

- Gathers information about how a procedure uses data
- Builds on structures from control flow analysis
- There are many ways to achieve the goal:
  - Reaching definitions: calculate the potential definitions at a given point in the code (illustrated below)
  - Iterative analysis: uses the control flow graph
  - Structural analysis
  - etc.

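A hedged sketch (not from the slides) of reaching definitions:

    /* Which definitions of x can reach the final use? */
    int example(int cond) {
        int x = 1;        /* d1 */
        if (cond)
            x = 2;        /* d2: kills d1 on this path */
        return x + 1;     /* reaching definitions of x here: {d1, d2} */
    }
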
Dependence Analysis*

- Recognizes relationships using a DAG (illustrated below):
  - True/flow dependence
  - Antidependence
  - Output dependence
  - Input dependence (does not affect execution order)
- Instruction scheduling
- Data caching

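A hedged sketch (not from the slides) of all four dependence kinds among C statements:

    int deps(int b) {
        int a, c, d;
        a = b + 1;   /* S1: writes a, reads b */
        c = a * 2;   /* S2: true/flow dependence on S1 (reads a after its write) */
        d = a + b;   /* S3: input dependence with S2 (both read a): order is free */
        b = 7;       /* S4: antidependence on S1 and S3 (writes b after their reads) */
        a = 9;       /* S5: output dependence on S1 (both write a) */
        return a + b + c + d;
    }
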
Interprocedural Analysis

- Incorporates the analysis methods discussed earlier, but on a broader level
- OOD and high level coding methodologies are optimal for human understanding, not computer processing
- Includes analysis of the relationships between function calls to mitigate the overhead of object-oriented code

Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Loop Optimizations*

- Loop optimizations have the greatest impact on overall code performance
- Desire to reduce dependencies to allow ILP
- Desire to reduce the overhead of jumping and branching in a loop
- Predictability: predicting loop behavior to mitigate pipeline hazards
- Loops must be well behaved:
  - Single return
  - No breaks, branches, etc.

Loop Strength Reduction
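The slide's worked example was an image; a hedged sketch of the classic transformation, turning a per-iteration multiply into an add:

    void before(int *a, int n) {
        for (int i = 0; i < n; i++)
            a[i] = i * 8;            /* multiply every iteration */
    }

    void after(int *a, int n) {
        int t = 0;                   /* induction variable */
        for (int i = 0; i < n; i++) {
            a[i] = t;
            t += 8;                  /* cheaper add replaces the multiply */
        }
    }
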
Procedure Optimizations

- Based on control flow
- Desire to eliminate the overhead of context switches
- Possibly turn function calls into branches
- Optimizations occur at high and low levels:
  - High level: procedure integration
  - Low level: inline expansion (see the sketch below)

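A hedged sketch (not from the slides) of inline expansion removing call overhead:

    static int square(int x) { return x * x; }

    void apply_call(int *out, const int *in, int n) {
        for (int i = 0; i < n; i++)
            out[i] = square(in[i]);      /* call/return overhead each iteration */
    }

    void apply_inlined(int *out, const int *in, int n) {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * in[i];      /* body expanded in place */
    }
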
Conventions

- Leaf routines (call no others) have reduced overhead
- Shrink wrapping creates pseudo leaves by adding data flow analysis

Tail Call Optimization: Tail Recursion
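This slide's example was an image; a hedged reconstruction of tail recursion and the loop a compiler can reduce it to:

    /* Tail-recursive factorial: the call is the last action, so the
       current stack frame can be reused (no stack growth). */
    int fact_tail(int n, int acc) {
        if (n <= 1)
            return acc;
        return fact_tail(n - 1, n * acc);   /* tail call */
    }

    /* The loop a tail call optimizer effectively produces. */
    int fact_loop(int n) {
        int acc = 1;
        while (n > 1)
            acc *= n--;
        return acc;
    }
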
Code Scheduling*

- Block scheduling:
  - Blocks optimized as independent pieces of code
  - Cross-block scheduling applied to the optimized blocks
- Branch scheduling:
  - Fill stall cycles after a branch with independent code
  - Reduces the effect of bad branch predictions in the HW pipeline

Software Pipelining

- Overlaps the execution of multiple loop iterations (sketched below)

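A hedged sketch (not from the slides): the load for iteration i+1 is issued while iteration i computes and stores, hiding memory latency.

    void double_all(int *b, const int *a, int n) {
        if (n <= 0) return;
        int t = a[0];                 /* prologue: first load */
        for (int i = 0; i < n - 1; i++) {
            int next = a[i + 1];      /* load for the next iteration */
            b[i] = t * 2;             /* compute/store for the current one */
            t = next;
        }
        b[n - 1] = t * 2;             /* epilogue: last compute/store */
    }
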
Register Allocation

- Applies to low level assembly
- Loops and nesting are used to weigh which values should be maintained in registers:
  - Nested loops weigh more heavily
  - Considers variable activity before and after the block of code is accessed
  - Uses operation costs and the number of times they are performed

Register Allocation Calculation
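The calculation itself was shown as an image; a hedged sketch of a common weighting heuristic (the factor of 10 per loop level is an assumption borrowed from classic allocators, not taken from the slides):

    /* Estimated benefit of keeping one variable in a register: each
       block's references are weighted by 10^(loop nesting depth), so
       uses inside nested loops dominate the decision. */
    double reg_benefit(const int refs[], const int depth[], int nblocks) {
        double benefit = 0.0;
        for (int b = 0; b < nblocks; b++) {
            double weight = 1.0;
            for (int d = 0; d < depth[b]; d++)
                weight *= 10.0;       /* one factor of 10 per loop level */
            benefit += refs[b] * weight;
        }
        return benefit;
    }
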
Register Allocation: Graph Coloring

- Nodes are the subset of objects that should be allocated to registers
- Arcs represent points where two objects are live at the same time
- Arcs represent conflicts where the objects cannot share a register (int, float)
- Color the graph with a number of colors equal to the number of registers
- Assign registers based on color (a greedy sketch follows below)

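A hedged sketch (an illustration, not a production allocator) of greedy coloring over an interference graph with NVARS variables and NREGS registers:

    #define NVARS 5
    #define NREGS 3

    /* interference[i][j] != 0 means variables i and j are live at the
       same time and cannot share a register. Returns -1 on success,
       or the index of a variable to spill. */
    int color_graph(const int interference[NVARS][NVARS], int color[NVARS]) {
        for (int v = 0; v < NVARS; v++) {
            int used[NREGS] = {0};
            for (int u = 0; u < v; u++)        /* colors taken by neighbors */
                if (interference[v][u])
                    used[color[u]] = 1;
            color[v] = -1;
            for (int r = 0; r < NREGS; r++)    /* pick the lowest free color */
                if (!used[r]) { color[v] = r; break; }
            if (color[v] < 0)
                return v;                      /* no free register: spill candidate */
        }
        return -1;
    }
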
Redundancy Elimination

- Based on data flow analysis
- Intermediate level optimization
- Includes (the first two are sketched below):
  - Common subexpression elimination
  - Loop invariant code motion
  - Partial redundancy elimination
  - Code hoisting

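Hedged before/after sketches (not from the slides) of the first two techniques:

    void cse(int a, int b, int c, int d, int *x, int *y) {
        /* Common subexpression elimination: a * b computed once. */
        int t = a * b;
        *x = t + c;        /* was: *x = a * b + c; */
        *y = t - d;        /* was: *y = a * b - d; */
    }

    void licm(int *a, const int *b, int n) {
        /* Loop invariant code motion: n * 4 hoisted out of the loop. */
        int k = n * 4;
        for (int i = 0; i < n; i++)
            a[i] = b[i] + k;   /* was: a[i] = b[i] + n * 4; */
    }
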
Peephole Optimizations

- Focused on very small subsets of code
- Generally performed late in the compilation process
- Arguably covers up bad and incomplete optimizations from earlier passes
- Some examples include (several are sketched below):
  - Dead code elimination (of code created by earlier optimizations)
  - Strength reductions
  - Constant folding
  - Instruction combining
  - Copy propagation
  - Algebraic simplifications

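Hedged one-line sketches (not from the slides) of several of these patterns:

    int peephole(int x) {
        int a = 3 * 60;      /* constant folding:         a = 180;    */
        int b = x * 2;       /* strength reduction:       b = x << 1; */
        int c = b;           /* copy propagation: later uses of c become b */
        int d = c * 1 + 0;   /* algebraic simplification: d = c;      */
        return a + d;
    }
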
Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Continuous Relevance of Compiler Development

- Back ends of compilers for older languages are reworked to take advantage of advances in hardware
- Pipelines are becoming longer
- Multiple cores are now common, allowing more use of parallel instructions

Research Areas

- Domain specific subjects: security, reliability, parallel, distributed, embedded, mobile
- Analysis, prediction, and debugging tools
- Embedded JIT compilation
- Development of a research compiler (GCC)
- Enhancing compiler optimization times, specifically iterative and whole program optimizations
- MS F#: a functional language for .NET, similar to ML

Compiler Job Options

- Additional exploitation of parallel computing environments for desktop platforms
- Multiple OS/environment support
- Integration of AI techniques (machine learning) to know when, how, and where to apply optimizations (GCC)
- Special purpose languages for video, graphics, and audio processing (NVIDIA)
- Special purpose vendors for embedded products (Wind River, VxWorks)

Compiler Job Options

- Library adaptation for reconfigurable processors (GCC)
- Fault tolerance and exception handling for security

Compiler Optimization Problems

- Many optimizations are localized
- Non-local optimizations create increased overhead in the computation process
- Multiple objectives of optimizations create conflicts
  - For example: speed vs. executable size