global-scheduling - University of Delaware

Optimizing Compilers
CISC 673
Spring 2011
Global Instruction Scheduling
John Cavazos
(Ben Perry)
University of Delaware
UNIVERSITY OF DELAWARE • COMPUTER & INFORMATION SCIENCES DEPARTMENT
Overview


Introduction
Pipelining



Instruction Pipeline
Pipeline Execution
Constraints and Dependences
Current Processors


Can execute several operations in a single cycle
“How fast can a program run on a processor with
instruction-level parallelism?”




Potential parallelism in the program
Available parallelism on the processor
Ability to parallelize a sequential program
Find best schedule given constraints
Best targets

Programs with operations that are completely
dependent on each other are no good


Focus on constraints instead of scheduling
Numeric applications with large aggregate data
structures are good.
Pipelines


Instruction Pipelines are found in every processor
Instructions go through multiple steps in the
pipeline, from fetch to writing the result



Fetch, decode, execute, access memory, write result
Pipelining: a new instruction can be fetched
while the current instruction is still being processed.
Each step in the pipeline takes a clock cycle
Example pipeline
Cycle  i         i+1       i+2       i+3       i+4
1      Fetch
2      Identify  Fetch
3      Execute   Identify  Fetch
4      Read      Execute   Identify  Fetch
5      Write     Read      Execute   Identify  Fetch
6                Write     Read      Execute   Identify
7                          Write     Read      Execute
8                                    Write     Read
9                                              Write
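
The staggered table above can be regenerated mechanically. The following is a minimal sketch, assuming an ideal pipeline with no stalls and one new instruction entering per cycle; the function name pipeline_table and the STAGES list are illustrative names, not part of the slides.

```python
# Stage names as used on the slide (hypothetical constant name).
STAGES = ["Fetch", "Identify", "Execute", "Read", "Write"]

def pipeline_table(num_instructions, stages=STAGES):
    """Map each cycle (1-based) to the stage every instruction occupies.

    Assumes an ideal pipeline: no stalls, one instruction issued per cycle.
    An entry is None when the instruction is not in the pipeline that cycle.
    """
    total_cycles = num_instructions + len(stages) - 1
    table = {}
    for cycle in range(1, total_cycles + 1):
        row = []
        for instr in range(num_instructions):
            # Instruction number `instr` enters the pipeline at cycle instr + 1.
            stage_index = cycle - 1 - instr
            in_flight = 0 <= stage_index < len(stages)
            row.append(stages[stage_index] if in_flight else None)
        table[cycle] = row
    return table
```

Under these assumptions, n instructions through a k-stage pipeline finish in n + k - 1 cycles, which is why five instructions take nine cycles in the table.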
Pipelines – Speculative Computing



Load next instruction even if it may be
branched over (speculative)
On a taken branch, the pipeline is emptied
and the branch target must be fetched. (delay)
Hardware can predict which branch to
fetch, but it may be wrong
Pipeline Execution


Execution of an instruction is pipelined if
succeeding instructions not dependent on the
result are allowed to proceed.
Hardware can often detect dependences
(superscalar machines) and pause execution if
an operand isn’t available
Pipeline Execution



Some processors (in an Android phone, perhaps)
leave batch execution to the compiler.
Very-long-instruction-word (VLIW) instructions are
created by the compiler to indicate a batch of
operations to execute in parallel.
Out-of-order instructions can be scheduled
by advanced schedulers; this is best done in
software because of hardware limitations
Code-scheduling Constraints



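Control-dependence – All operations
executed in the original program must be executed
in the optimized one
Data-dependence – The optimized program must
produce the same results as the original
Resource – The schedule must not oversubscribe
the machine's resources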
Data dependence




X = 5; Y = 6
Obviously, we can reorder these operations.
X = 5; Y = X
Obviously, we cannot reorder these.
Data dependence

RAW – Read after write. True dependence.


If a write is followed by a read of the same
location, the read depends on the value written
WAR – Write after Read. Anti-dependence

If the write happens before the read, the read
will get the wrong value.
Dependence

WAW – Write after Write. Output dependence.
If two writes to the same location are reordered,
the location ends up holding the wrong value
WAR and WAW can be eliminated by using different
locations to store different values.
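
The three kinds of data dependence can be detected from the locations each operation reads and writes. This is a minimal sketch, assuming each operation is described by a dict with 'reads' and 'writes' sets; that representation and the name dependences are illustrative, not from the slides.

```python
def dependences(first, second):
    """Return the kinds of data dependence from `first` to `second`.

    `first` precedes `second` in program order. Each argument is a dict
    with 'reads' and 'writes' sets of location names (an assumed encoding).
    """
    kinds = set()
    if first["writes"] & second["reads"]:
        kinds.add("RAW")   # true dependence
    if first["reads"] & second["writes"]:
        kinds.add("WAR")   # anti-dependence
    if first["writes"] & second["writes"]:
        kinds.add("WAW")   # output dependence
    return kinds

# The two examples from the "Data dependence" slide.
x_eq_5 = {"reads": set(), "writes": {"X"}}
y_eq_6 = {"reads": set(), "writes": {"Y"}}
y_eq_x = {"reads": {"X"}, "writes": {"Y"}}
```

An empty result, as for X = 5; Y = 6, means the two operations may be reordered; X = 5; Y = X yields a RAW dependence, so they may not.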


Finding dependences



Compiler: GUILTY until proven innocent!
(always assume operations refer to same
location, and prove it otherwise).
Pointers p and (p + 10) cannot possibly refer
to the same location
Array data dependence analysis:


for i=0 to n: a[2i] = a[2i + 1].
No dependence within the array during this loop:
writes touch only even indices, reads only odd ones
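
One standard way to reach this conclusion automatically is the GCD test, which the slides do not name; it is swapped in here as an illustration. The sketch assumes both subscripts are affine in the loop index and ignores loop bounds, so it can only prove independence, never dependence. The name may_depend is hypothetical.

```python
from math import gcd

def may_depend(write_coeff, write_const, read_coeff, read_const):
    """GCD test for a write a[write_coeff*i + write_const]
    and a read a[read_coeff*j + read_const].

    Returns False when no integers i, j make the two subscripts equal,
    which proves there is no dependence. Returns True when a dependence
    cannot be ruled out (loop bounds are ignored, so this is conservative).
    """
    g = gcd(write_coeff, read_coeff)
    if g == 0:
        # Both subscripts are constants: they collide only if equal.
        return write_const == read_const
    return (read_const - write_const) % g == 0
```

For a[2i] = a[2i + 1] the equation 2i = 2j + 1 has no integer solution because gcd(2, 2) = 2 does not divide 1, so may_depend(2, 0, 2, 1) is False.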
Finding dependences

Pointer alias analysis


Two pointers are aliased if they refer to the
same object. Difficult problem.
Interprocedural Analysis

Needed when parameters are passed by reference or
globals are passed, since these can create aliases
Register allocation



LD temporary_register1, a
ST b, temporary_register1
LD temporary_register2, c
ST d, temporary_register2
Two RAW dependences, but the two load/store pairs
can be reordered.
If temporary_register1 and temporary_register2 get
mapped to the same physical register, we create
another dependence
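
The effect of register allocation on scheduling freedom can be checked directly. This is a minimal sketch, assuming each instruction is described by its read and write sets; the helper names (conflicts, program, pairs_independent) and the register name r1 are illustrative only.

```python
def conflicts(earlier, later):
    """True if `later` has a RAW, WAR, or WAW dependence on `earlier`."""
    return bool(
        earlier["writes"] & later["reads"]
        or earlier["reads"] & later["writes"]
        or earlier["writes"] & later["writes"]
    )

def program(t1, t2):
    """The four instructions from the slide, using registers t1 and t2."""
    return [
        {"text": f"LD {t1}, a", "reads": {"a"}, "writes": {t1}},
        {"text": f"ST b, {t1}", "reads": {t1}, "writes": {"b"}},
        {"text": f"LD {t2}, c", "reads": {"c"}, "writes": {t2}},
        {"text": f"ST d, {t2}", "reads": {t2}, "writes": {"d"}},
    ]

def pairs_independent(instrs):
    """True if the first LD/ST pair has no dependence with the second pair."""
    first, second = instrs[:2], instrs[2:]
    return not any(conflicts(x, y) for x in first for y in second)
```

With two distinct temporaries the pairs are independent; mapping both to one physical register, as in program("r1", "r1"), introduces WAR and WAW dependences that force the original order.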
Control dependence

All operations in a basic block are guaranteed
to execute.



But they’re small
And often highly related.
Optimizing across basic blocks is crucial.
Control dependence


An instruction i1 is control dependent on
instruction i2 if the outcome of i2 determines
whether i1 is to be executed
Speculatively execute across different basic blocks
Speculative computing

Prefetching


Bring data from memory to the cache before it
is needed
Poison bits

Don’t throw exceptions when speculatively
computing. Instead, set a poison bit. If the poisoned
register is really used, then throw the exception.
Speculative computing

Predicated Execution



Change
if (a == 0) b = c
To
st r4, r3
movif r2, r4, r1
Processor supports a conditional store,
enabling combination of basic blocks
Basic Block List Scheduling



NP-complete, but don’t give up.
Basic blocks are typically small.
Start with data-dependence graph


Nodes are instructions and resource
annotations
Edges are data dependences, each labeled with the delay
the destination has to wait (some instructions may
take 10 cycles, others only 1).
List Scheduling


The data-dependence graph of a basic block cannot have cycles
Build a topological ordering of the nodes


several such orderings may exist, though some
are better than others
Choose an ordering of the nodes such that
no node depends on any node that follows it.
List Scheduling
RT = an empty reservation table
Foreach n in SortedNodes:
  - Find the earliest time the instruction could begin
  - Delay the instruction until resources are available
  - Schedule the node after all delays
  - Claim its resources
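
The loop above can be made concrete as follows. This is a minimal sketch under simplifying assumptions that are not in the slides: each instruction uses exactly one resource for exactly one cycle, and the nodes are already given in a topological order. The name list_schedule and the data layout are illustrative.

```python
from collections import defaultdict

def list_schedule(nodes, edges, units):
    """Basic-block list scheduling (simplified sketch).

    nodes: list of (name, resource) pairs in topological order.
    edges: dict mapping (pred, succ) to the delay in cycles that succ
           must wait after pred starts.
    units: dict mapping resource name to units available per cycle.
    Returns a dict mapping each node name to its start cycle.
    """
    reservation = defaultdict(int)      # (cycle, resource) -> units claimed
    preds = defaultdict(list)
    for (pred, succ), delay in edges.items():
        preds[succ].append((pred, delay))

    start = {}
    for name, resource in nodes:
        # Earliest time allowed by the data dependences.
        t = max((start[p] + d for p, d in preds[name]), default=0)
        # Delay until a unit of the needed resource is free.
        while reservation[(t, resource)] >= units[resource]:
            t += 1
        start[name] = t
        reservation[(t, resource)] += 1  # claim the resource
    return start
```

For example, a load followed two cycles later by a dependent add and then a store, with one memory unit and one ALU, is scheduled at cycles 0, 2, and 3. A real scheduler would use a multi-cycle reservation table per instruction; the single-cycle claim here is only to keep the sketch short.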
List Scheduling – better topologies


With unlimited resources, the longest path through
the data-dependence graph gives the length of the
shortest schedule.
Available resources also constrain the schedule; the
critical resource is the one with the largest ratio of
uses to the number of units of that resource
available.
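
Both limits can be computed before scheduling to see which one dominates. This is a minimal sketch, assuming every operation occupies one unit of one resource for one cycle and that nodes arrive in topological order; schedule_lower_bound is a hypothetical name.

```python
from collections import Counter
from math import ceil

def schedule_lower_bound(nodes, edges, units):
    """Return (lower bound on schedule length in cycles, critical resource).

    nodes: non-empty list of (name, resource) pairs in topological order.
    edges: dict mapping (pred, succ) to a delay in cycles.
    units: dict mapping resource name to the number of units available.
    """
    # Longest-path bound: earliest start of each node ignoring resources.
    earliest = {}
    for name, _ in nodes:
        earliest[name] = max(
            (earliest[p] + d for (p, s), d in edges.items() if s == name),
            default=0,
        )
    # Assumption: the last operation still needs one cycle to issue.
    path_bound = max(earliest.values()) + 1

    # Resource bound: the resource with the largest uses-to-units ratio.
    uses = Counter(resource for _, resource in nodes)
    critical = max(uses, key=lambda r: uses[r] / units[r])
    resource_bound = ceil(uses[critical] / units[critical])

    return max(path_bound, resource_bound), critical
```

With four independent loads, one add that waits two cycles on the first load, and a single memory unit, the path bound is 3 cycles but the memory unit forces at least 4.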
Global Code Scheduling





Optimize use of resources across blocks.
Global Code Scheduling - Moving
instructions from one basic block to another
Data AND control dependencies.
All instructions still must be performed
Speculative computing cannot be disruptive.
Global Code Scheduling example




if (!a) {c=b;}
e=d+d
What are the data dependences?
What are the control dependences?
What can intuitively be run in parallel?
Global Code Scheduling Example



if (!a) {c=b;}
e=d+d
Loads take two clock ticks, always hit. R1 =
a, R2 = b, …,
Processor can execute two instructions per clock tick
if (!a) {c=b;}
e=d+d
Block 1:
  load r6, r1      | idle
  noop             | idle
  jumpz r6, b3     | idle
Block 2:
  load r7, r2      | idle
  noop             | idle
  store r3, r7     | idle
Block 3:
  load r8, r4      | idle
  noop             | idle
  add r8,r8,r8     | idle
  st r5, r8        | idle
After scheduling across blocks:
Block 1:
  load r6, r1      | load r8, r4
  load r7, r2      | idle
  add r8,r8,r8     | jumpz r6, b3
Block 2:
  st r5, r8        | st r3, r7
Block 3:
  st r5, r8        | idle
Code movement

Definitions:




Dominates – A dominates B if every path
from the entry to B passes through A.
Post-dominates – B post-dominates A if every
path from A to the exit passes through B.
Downward – Move an operation down along
a control-flow path
Upward – Move an operation up along a control-flow path
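
Dominator sets can be computed with a simple fixed-point iteration over the control-flow graph. This is a minimal sketch, assuming the graph is a dict that lists every block as a key and has a single entry and a single exit; the function names and the block names B1 to B3 are illustrative.

```python
def dominators(cfg, entry):
    """Return a dict mapping each block to the set of blocks that dominate it.

    cfg: dict mapping every block to a list of its successors.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            common = set.intersection(*incoming) if incoming else set()
            new = {n} | common
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def post_dominators(cfg, exit_node):
    """Post-dominators are the dominators of the reversed graph."""
    reverse = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            reverse[s].append(n)
    return dominators(reverse, exit_node)

# Shape of the if-then example: B1 branches to B2 or straight to B3.
example_cfg = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}
```

In example_cfg, B1 dominates B3 and B3 post-dominates B1, while B2 does not post-dominate B1 because the branch can skip it.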
Upward Code Movement


Moving instruction from block src to block
dest. Block src comes after block dest in the
topological-sorted graph. Assume no
dependencies.
If dest dominates src and src post-dominates
dest, then we’re done.
Upward Code Movement

If src does not post-dominate dest, then we
have to speculatively compute



Only desirable if the operation is cheap
Only useful if src is reached.
If dest does not dominate src, copies of the
instruction are needed
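
The two upward-motion conditions can be expressed as a small decision function. This is a minimal sketch, assuming dominator and post-dominator sets are already available as dicts; the function name and the hard-coded sets for a three-block if-then graph (B1 branches to B2 or B3, B2 falls into B3) are illustrative.

```python
def upward_motion_requirements(src, dest, dom, pdom):
    """List what moving an operation up from block src to block dest requires.

    dom[b]  : set of blocks that dominate b.
    pdom[b] : set of blocks that post-dominate b.
    Data dependences are assumed to be absent, as on the slide.
    """
    needs = []
    if src not in pdom[dest]:
        # dest can run without src ever running: the move is speculative.
        needs.append("speculation")
    if dest not in dom[src]:
        # src can be reached without passing dest: other paths need copies.
        needs.append("duplication")
    return needs

# Assumed sets for the if-then graph B1 -> {B2, B3}, B2 -> B3.
DOM = {"B1": {"B1"}, "B2": {"B1", "B2"}, "B3": {"B1", "B3"}}
PDOM = {"B1": {"B1", "B3"}, "B2": {"B2", "B3"}, "B3": {"B3"}}
```

Moving an operation from B3 up to B1 needs nothing extra, while moving one from B2 up to B1 is speculative, since B2 is skipped when the branch is taken.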
Downward Code Movement


Moving instruction from block src to block
dest. Block src comes before block dest in the
topological-sorted graph. Assume no
dependencies
If src dominates dest and dest post-dominates src,
we’re done.
Downward Code Movement

If src does not dominate dest,






Writes are often overwritten
Extra operations will be needed.
Replicate basic blocks and place operation in
new copy of dest
Alternatively, use predicated instructions
(speculative)
If dest does not post-dominate src,
compensation code is needed on the paths that bypass dest
Conclusion



Processors can execute several instructions in
parallel
We take advantage of this by moving code
Code can be moved if no dependencies
occur, but sometimes at a cost.