Evolution of ILP

INSTRUCTION-LEVEL PARALLEL PROCESSORS
Chapter No. 4
Evolution of ILP Processors
What is instruction-level parallelism?
Instruction-level parallelism (ILP) is:
The potential for executing certain instructions in
parallel, because they are independent.
Any technique for identifying and exploiting such
opportunities.
Basic blocks
Sequences of instructions that appear between branches
Usually no more than 5 or 6 instructions!
Loops
for ( i=0; i<N; i++)
x[i] = x[i] + s;
We can only realize ILP by finding sequences of
independent instructions
Dependent instructions must be separated by a
sufficient amount of time
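As a sketch of this idea (the variables here are invented for
illustration), the first three statements below are mutually
independent and could execute in parallel, while the last one
depends on all of them:

int ilp_demo(int x, int y, int p, int q, int m, int n) {
    /* These three statements are mutually independent:
       an ILP processor may execute them in any order,
       or all in the same cycle. */
    int a = x + y;
    int b = p * q;
    int c = m - n;
    /* This statement depends on all three results above,
       so it must be separated from them in time. */
    return a + b + c;
}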
Principal Operation of ILP Processors
Pipelined Operation
A number of functional units are employed in
sequence to perform a single computation.
Each functional unit represents a certain stage
of the computation.
Pipelining allows the overlapped execution of
instructions.
It increases the processor's overall throughput.
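The gain is easy to quantify. A minimal sketch, assuming an
idealized pipeline with no stalls (the 5-stage depth and the
instruction count are example values, not from the slides):

#include <stdio.h>

/* With k pipeline stages, the first instruction needs k cycles;
 * each further instruction then completes one cycle later. */
unsigned long pipelined_cycles(unsigned long k, unsigned long n) {
    return k + (n - 1);
}

int main(void) {
    unsigned long k = 5, n = 1000;     /* example values    */
    unsigned long seq = k * n;         /* no overlap at all */
    unsigned long pip = pipelined_cycles(k, n);
    printf("sequential: %lu cycles, pipelined: %lu, speedup %.2f\n",
           seq, pip, (double)seq / (double)pip);
    return 0;
}

For large n the speedup approaches the number of stages k.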
Superscalar Processors
Increase the ability of the processor to
exploit instruction-level parallelism:
Multiple instructions are issued every cycle.
Multiple pipelines operate in parallel.
The processor receives a sequential stream of instructions.
The decode unit then issues multiple instructions to the
multiple execution units in each cycle.
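Under the same idealized assumptions as the pipeline sketch
above, an issue width of w roughly divides the cycle count by w
(a rough model only; real machines fall short of this because of
dependencies between instructions):

/* n instructions issued w at a time through a k-stage pipeline. */
unsigned long superscalar_cycles(unsigned long k, unsigned long n,
                                 unsigned long w) {
    return k + (n + w - 1) / w - 1;   /* ceil(n/w) issue groups */
}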
Superscalar Approach
VLIW architecture
Abbreviation of VLIW is Very Large Instruction
Word.
VLIW architecture receives multi-operation
instructions, i.e. with multiple fields for the
control of EUs.
Basic structure of superscalar processors and
VLIW is same
multiple EUs, each capable of parallel execution on
data fetched from a register file.
VLIW Approach
Dependencies Between Instructions
Data dependency
Control dependency
Resource dependency
Dependencies Between Instructions
Dependences (constraints) fall into three classes:
Data dependences arise from precedence requirements
concerning referenced data.
Control dependences arise from conditional statements.
Resource dependences arise from limited resources.
Data Dependency
An instruction j is data dependent on a previous
instruction i if it uses data produced by i:
we cannot execute j before the earlier instruction i,
and we cannot execute j and i simultaneously.
Data dependences are properties of the program;
whether they lead to data hazards or stalls in a
pipeline depends on the pipeline organization.
Data Dependency
Data dependencies can be classified according to the
data involved and according to their type:
the data involved in a dependency may come from
registers or from memory;
the dependency may occur either in
straight-line code or in a loop.
Data Dependency in Straight-line Code
Straight-line code may contain three
different types of dependencies:
RAW (read after write)
WAR (write after read)
WAW (write after write)
RAW (Read after Write)

i1: load r1, a;
i2: add  r2, r1, r1;

i2 reads r1, the result of i1, so i2 cannot execute before
i1 has produced it.
Assume a pipeline of Fetch/Decode/Execute/Mem/Writeback.
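The same flow of data at the source level (a hypothetical C
analogue of the two instructions above):

int raw_example(int a) {
    int r1 = a;         /* i1: load r1, a            */
    int r2 = r1 + r1;   /* i2: reads r1 -> RAW on i1 */
    return r2;
}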
Name Dependencies
A “name dependence” occurs when 2 instructions use
the same register or memory location (a name) but
there is no flow of data between the 2 instructions
There are 2 types:
Antidependencies: Occur when an instruction j writes a
register or memory location that instruction i reads – and i is
executed first
Corresponds to a WAR hazard
Output dependencies: Occur when instruction i and
instruction j write the same register or memory location
Protected against by checking for WAW hazards
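A source-level sketch of both name dependences, with variable
reuse standing in for register reuse (the names are invented):

int name_deps(int r2, int r3, int r4, int r5) {
    int r1 = r2 * r3;   /* i1 reads r2 and writes r1            */
    r2 = r4 + r5;       /* i2 rewrites r2: antidependence (WAR) */
    r1 = r2 - r3;       /* i3 rewrites r1: output dep. (WAW)    */
    return r1;
}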
Write after Read (WAR)

i1: mul r1, r2, r3;   /* r1 <= r2 * r3 */
i2: add r2, r4, r5;   /* r2 <= r4 + r5 */
If instruction i2 (add) is executed before
instruction i1 (mul) for some reason, then i1
(mul) could read the wrong value for r2.
Write after Read (WAR)
One reason for delaying i1 would be a stall for
the ‘r3’ value being produced by a previous
instruction. Instruction i2 could proceed
because it has all its operands, thus causing the
WAR hazard.
Use register renaming to eliminate WAR
dependency. Replace r2 with some other
register that has not been used yet.
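Renaming shown at the source level (r2b stands for the fresh,
previously unused register; the name is invented):

int renamed(int r2, int r3, int r4, int r5) {
    int r1  = r2 * r3;   /* i1 still reads the old r2          */
    int r2b = r4 + r5;   /* i2 writes a fresh name: no WAR now */
    return r1 + r2b;     /* i1 and i2 may run in any order     */
}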
Write after Write (WAW)

i1: mul r1, r2, r3;   /* r1 <= r2 * r3 */
i2: add r1, r4, r5;   /* r1 <= r4 + r5 */
If instruction i1 (mul) finishes AFTER instruction
i2 (add), then register r1 would get the wrong
value. Instruction i1 could finish after
instruction i2 if separate execution units were
used for instructions i1 and i2.
Write after Write (WAW)
One way to solve this hazard is to simply let
instruction i1 proceed normally, but disable its
write stage.
Data Dependency in Loops
Instructions belonging to a particular loop
iteration may be dependent on instructions
belonging to previous loop iterations.
This type of dependency is referred to as a
recurrence, or inter-iteration data dependency.
do
   X(I) = A*X(I-1) + B
end do
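The same recurrence in C (the loop bounds are assumed for this
sketch): each iteration needs the X value produced by the
previous one, so the iterations cannot execute in parallel:

void recurrence(double *X, int N, double A, double B) {
    /* X[i] depends on X[i-1]: a loop-carried dependence. */
    for (int i = 1; i < N; i++)
        X[i] = A * X[i - 1] + B;
}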
Data Dependency in Loops
Loops are a “common case” in pretty much any
program so this is worth mentioning…
Consider:
for (j=0; j<=100; j++) {
A[j+1] = A[j] + C[j]; /*S1*/
B[j+1] = B[j] + A[j+1]; /*S2*/
}
Data Dependency in Loops
Now, look at the dependence of S1 on an earlier
iteration of S1.
This is a loop-carried dependence; in other words,
the dependence exists between different iterations of
the loop.
Successive iterations of S1 must execute in order.
The dependence of S2 on S1 is within an iteration and is
not loop-carried:
if it were the only dependence, multiple iterations could
execute in parallel, as long as the statements within each
iteration were kept in order.
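Unrolling two iterations by hand makes both kinds of dependence
visible (a sketch; the arrays are those of the loop above):

void unrolled(double *A, double *B, double *C) {
    A[1] = A[0] + C[0];   /* S1, j=0                                    */
    B[1] = B[0] + A[1];   /* S2, j=0: uses A[1] from S1, same iteration */
    A[2] = A[1] + C[1];   /* S1, j=1: uses A[1] from j=0 (loop-carried) */
    B[2] = B[1] + A[2];   /* S2, j=1: uses B[1] from j=0 (loop-carried) */
}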
Data Dependency Graphs
Data dependency can also be represented by graphs:
edges labelled δt, δa and δo denote RAW, WAR and WAW
dependencies, respectively.
i1: load r1, a;
i2: load r2, b;
i3: add r3, r1, r2;
i4: mul r1, r2, r4;
i5: div r1, r2, r4;
[Graph: δt (RAW) edges run from i1 and i2 to i3, which reads
r1 and r2; a δo (WAW) edge connects i4 and i5, which both
write r1.]
Instruction interpretation: r3 <- (r1) + (r2), etc.
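Such a graph is just instructions as nodes plus labelled edges.
A minimal sketch for the five instructions above, with a few of
the edges hand-derived (further RAW edges on r2 are omitted to
keep it short):

#include <stdio.h>

typedef enum { RAW, WAR, WAW } DepType;  /* dt, da, do above */

typedef struct { int from, to; DepType type; } DepEdge;

int main(void) {
    const char *label[] = { "dt (RAW)", "da (WAR)", "do (WAW)" };
    DepEdge g[] = {
        { 1, 3, RAW },   /* i3 reads r1 written by i1       */
        { 2, 3, RAW },   /* i3 reads r2 written by i2       */
        { 3, 4, WAR },   /* i4 overwrites r1, which i3 read */
        { 4, 5, WAW },   /* i4 and i5 both write r1         */
    };
    for (unsigned i = 0; i < sizeof g / sizeof g[0]; i++)
        printf("i%d -> i%d  %s\n", g[i].from, g[i].to, label[g[i].type]);
    return 0;
}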
Control Dependence
Control dependence determines the ordering of
an instruction with respect to a branch
instruction:
if the branch is taken, the instruction is executed;
if the branch is not taken, it is not.
An instruction that is control dependent on a
branch cannot be moved before the branch:
instructions from the then-part of an if statement cannot
be executed before the branch.
Control Dependence
An instruction that is not control dependent on a
branch cannot be moved after the branch
other instructions cannot be moved into the
then part of an if statement
Example
sub r1, r2, r3;
jz zproc;          /* branch if the result is zero */
mul r4, r1, r1;    /* control dependent on the jz  */
...
zproc: load r1, x;
...
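A hypothetical C analogue of why the mul may not be hoisted
above the branch (the function and names are invented):

int guarded(int r1, int x) {
    if (r1 == 0)        /* the jz: takes the zproc path    */
        return x;
    return r1 * r1;     /* the mul: fall-through path only */
}

Executing the multiply before the test would be speculative: the
taken path never asked for that value, so the early execution
must not change the program's visible behaviour.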
Control Dependence
Published branch statistics:

                                  Branch     Cond. branch   Uncond. branch
                                  ratio (%)  ratio (%)      ratio (%)
                                  gen.p. sci. gen.p. sci.   gen.p. sci.
M88100 (Yeh & Patt, 1992)           24     5    78    82      20     4
RS/6000 (Stephens et al., 1991)     22    11    83    85      18     9
IBM/370 (Lee & Smith, 1984)         29    10    73    73      21     8
PDP-11/70 (Lee & Smith, 1984)       39     -    46     -       8     -
CDC 6400 (Lee & Smith, 1984)        18     -    53     -       4     -
Control Dependence
Frequent conditional branches impose a heavy
performance constraint on ILP processors:
the higher the instruction-issue rate per cycle, the
higher the probability of encountering a conditional
branch in each cycle.
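A back-of-the-envelope model of this effect (the numbers are
example values taken from the branch statistics above):

/* With branch ratio b, an issue group of w instructions
 * contains on average b*w branches.                      */
double branches_per_group(double b, int w) { return b * w; }
/* e.g. b = 0.20, w = 6  ->  1.2 conditional branches per cycle */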
Control Dependence
JC
JC
scalar issue
JC
JC
JC
JC
with 2 intructions/issue
JC
superscalar (or VLIW) issue
JC
JC
with 3 intructions/issue
JC
JC
JC
with 6 intructions/issue
t
JC: conditional branch
Control Dependency Graph
i0: r1 = op1;
i1: r2 = op2;
i2: r3 = op3;
i3: if (r2 > r1) {
i4:     if (r3 > r1)
i5:         r4 = r3;
i6:     else r4 = r1; }
i7: else r4 = r2;
i8: r5 = r4 * r4;
[Figure 5.2.11: An example for a CDG. i0-i2 precede the branch
i3; the T path of i3 leads to the branch i4, whose T and F paths
lead to i5 and i6; the F path of i3 leads to i7; all paths
rejoin at i8.]
Resource Dependency
An instruction is resource dependent on a
previously issued instruction if it requires a
hardware resource that is still being used by the
previously issued instruction.
If, for instance, there is only a single
division unit available, then in the
code sequence
i1: div r1, r2, r3
i2: div r4, r2, r5
i2 is resource dependent on i1.
Instruction Scheduling
Why is instruction scheduling needed?
Instruction scheduling involves:
Detection
detecting where dependencies occur in the code
Resolution
removing those dependencies from the code
There are two approaches to instruction scheduling:
the static approach
the dynamic approach
Instruction Scheduling
Static Scheduling
Detection and resolution are accomplished by the
compiler, which avoids dependencies by
reordering the code.
VLIW processors expect dependency-free code
generated by an ILP compiler.
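A sketch of the kind of reordering a static scheduler performs
(the functions and names are invented; assume the target has a
one-cycle load-use delay):

/* Before: t is used immediately after being loaded, which
 * would stall the pipeline.                               */
double before(const double *x, double s, int i) {
    double t = x[i];       /* load             */
    double r = t * s;      /* uses t at once   */
    double u = x[i + 1];   /* independent load */
    return r + u;
}

/* After: the independent load is moved up to fill the delay. */
double after(const double *x, double s, int i) {
    double t = x[i];       /* load                             */
    double u = x[i + 1];   /* independent work hides the delay */
    double r = t * s;      /* t is ready by now                */
    return r + u;
}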
Dynamic Scheduling
Performed by the processor, which
maintains two windows:
The issue window
contains all fetched instructions that are
intended for issue in the next cycle.
The issue window's width is equal to the issue rate.
All instructions in the window are checked for
dependencies before being issued.
Dynamic Scheduling
The execution window
contains all those instructions that are still in
execution and whose results have not yet been
produced.
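A much-simplified model of the issue check (every name here is
invented for illustration; real issue logic is far richer):

#include <stdbool.h>

#define ISSUE_RATE 4            /* window width = issue rate */

typedef struct { int src1, src2, dest; } Instr;

/* pending[r] is true while an already-issued, still-executing
 * instruction (one in the execution window) will write r.    */
static bool pending[32];

/* Issue instructions from the window in order, stopping at the
 * first one whose operands or destination are still pending.  */
int issue_cycle(Instr win[ISSUE_RATE], int count) {
    int issued = 0;
    for (int i = 0; i < count && i < ISSUE_RATE; i++) {
        Instr *in = &win[i];
        /* RAW check on sources, WAW check on the destination. */
        if (pending[in->src1] || pending[in->src2] ||
            pending[in->dest])
            break;
        pending[in->dest] = true;   /* result now outstanding */
        issued++;
    }
    return issued;
}

When an instruction completes, its pending[] entry would be
cleared, modelling its departure from the execution window.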
Instruction Scheduling in ILP-processors
ILP instruction scheduling comprises:
Detection and resolution of dependencies
Parallel optimization
Parallel Optimization
Parallel optimization is achieved by
reordering the sequence of instructions by
appropriate code transformation for
parallel execution.
Also known as code restructuring or code reorganization.
ILP-instruction Scheduling
Preserving Sequential Consistency
Scheduling must maintain the logical integrity of the program:
div r1, r2, r3;
add r5, r6, r7;
jz anywhere;
Even if the add and the jz are executed in parallel with, or
ahead of, the slow div, the instructions must still appear to
complete in program order.