COSC513 Operating System
Research Paper
Fundamental Properties of Programming
for Parallelism
Student: Feng Chen (134192)
Conditions of Parallelism

Needs exist in three key areas:
 Computation models
 Inter-processor communication
 System integration

Tradeoffs exist among time, space, performance, and cost factors.
Data and resource dependences

Flow dependence: S2 is flow-dependent on S1 if an execution path exists from S1 to S2 and at least one output of S1 feeds in as input to S2
 Antidependence: S2 is antidependent on S1 if S2 follows S1 and the output of S2 overlaps the input to S1
 Output dependence: S1 and S2 are output-dependent if they produce the same output variable
 I/O dependence: the same file is referenced by more than one I/O statement
 Unknown dependence: the index is itself indexed (indirect addressing), there is no loop variable in the index, the loop index is nonlinear, etc.
Example of data dependence

S1: Load R1, A     /move mem(A) to R1
S2: Add R2, R1     /R2 = (R1) + (R2)
S3: Move R1, R3    /move (R3) to R1
S4: Store B, R1    /move (R1) to mem(B)

S2 is flow-dependent on S1
S3 is antidependent on S2
S3 is output-dependent on S1
S2 and S4 are totally independent
S4 is flow-dependent on S1 and S3
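These relations can be checked mechanically by comparing each statement's read and write sets. A minimal Python sketch (not part of the original slides; registers and memory locations are modeled simply as strings):

```python
# Read/write sets of S1-S4 above: e.g. "Load R1, A" reads mem(A), writes R1.
stmts = {
    "S1": {"reads": {"A"},        "writes": {"R1"}},
    "S2": {"reads": {"R1", "R2"}, "writes": {"R2"}},
    "S3": {"reads": {"R3"},       "writes": {"R1"}},
    "S4": {"reads": {"R1"},       "writes": {"B"}},
}

def dependences(first, second):
    """Dependence types of `second` on `first` (first precedes second)."""
    a, b = stmts[first], stmts[second]
    found = []
    if a["writes"] & b["reads"]:
        found.append("flow")    # first writes what second reads
    if a["reads"] & b["writes"]:
        found.append("anti")    # second overwrites what first reads
    if a["writes"] & b["writes"]:
        found.append("output")  # both write the same variable
    return found

print(dependences("S1", "S2"))  # ['flow']
print(dependences("S2", "S3"))  # ['anti']
print(dependences("S1", "S3"))  # ['output']
print(dependences("S2", "S4"))  # [] -- totally independent
```

The empty result for (S2, S4) confirms that those two statements share no conflicting operands.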
Example of I/O dependence

S1: Read(4), A(i)    /read array A from tape unit 4
S2: Rewind(4)        /rewind tape unit 4
S3: Write(4), B(i)   /write array B into tape unit 4
S4: Rewind(4)        /rewind tape unit 4

S1 and S3 are I/O-dependent on each other because both reference the same file (tape unit 4).
This relation should not be violated during execution; otherwise, errors occur.
Control dependence
The situation where the order of
execution of statements cannot be
determined before run time
 Different paths taken after a conditional
branch may change data dependences
 May exist between operations performed
in successive iterations of a loop
 Control dependence often prohibits
parallelism from being exploited

Example of control dependence

Successive iterations of this loop are control-independent:

for (i = 0; i < N; i++) {
    A[i] = C[i];
    if (A[i] < 0)
        A[i] = 1;
}





Example of control dependence

The following loop has control-dependent iterations:

for (i = 1; i < N; i++) {
    if (A[i-1] == 0)
        A[i] = 0;
}

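To see why the second loop's iterations cannot run independently, a small Python sketch (illustrative; the input array is an assumed example) executes the loop body in forward and reverse iteration order and obtains different results:

```python
def run(A, order):
    """Apply the control-dependent body (A[i] = 0 if A[i-1] == 0)
    over the given iteration order and return the resulting array."""
    A = list(A)
    for i in order:
        if A[i - 1] == 0:
            A[i] = 0
    return A

A = [0, 1, 1]
forward = run(A, range(1, len(A)))   # i = 1 then 2
backward = run(A, [2, 1])            # i = 2 then 1
print(forward, backward)  # [0, 0, 0] vs [0, 0, 1]: order changes the result
```

Because whether iteration i executes its assignment depends on what iteration i-1 did, the iterations cannot be reordered or run in parallel.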
Resource dependence

Concerned with conflicts in using shared resources, such as integer units, floating-point units, registers, and memory areas
 ALU dependence: the ALU is the conflicting resource
 Storage dependence: each task must work on independent storage locations or use protected access to share a writable memory area

Detection of parallelism requires a check of the various dependence relations.
Bernstein’s conditions for parallelism

Define:
 Ii as the input set of a process Pi
 Oi as the output set of a process Pi

P1 and P2 can execute in parallel (denoted as P1 || P2) under the conditions:
 I1 ∩ O2 = ∅
 I2 ∩ O1 = ∅
 O1 ∩ O2 = ∅

Note that I1 ∩ I2 ≠ ∅ does not prevent parallelism.
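The three conditions translate directly into set intersections. A minimal Python sketch (the process read/write sets below are illustrative assumptions, not from the slides):

```python
def bernstein_parallel(I1, O1, I2, O2):
    """True iff P1 || P2: each process's inputs are disjoint from the
    other's outputs, and the two output sets are disjoint.
    Shared inputs (I1 and I2 overlapping) are allowed."""
    return not (I1 & O2) and not (I2 & O1) and not (O1 & O2)

# Hypothetical processes: P1: A = X + Y and P2: B = X * Z share only
# the input X, which does not prevent parallelism.
print(bernstein_parallel({"X", "Y"}, {"A"}, {"X", "Z"}, {"B"}))  # True
# A process that reads and writes A conflicts with P1's output A.
print(bernstein_parallel({"X", "Y"}, {"A"}, {"A"}, {"A"}))       # False
```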
Bernstein’s conditions for parallelism
Input set: also called the read set or domain of a process
 Output set: also called the write set or range of a process
 A set of processes can execute in parallel if Bernstein's conditions are satisfied on a pairwise basis; that is, P1 || P2 || … || Pk if and only if Pi || Pj for all i ≠ j

Bernstein’s conditions for parallelism
The parallelism relation is commutative: Pi || Pj implies Pj || Pi
 The relation is not transitive: Pi || Pj and Pj || Pk do not necessarily imply Pi || Pk
 Associativity: Pi || Pj || Pk implies that (Pi || Pj) || Pk = Pi || (Pj || Pk)

Bernstein’s conditions for parallelism

For n processes, there are 3n(n-1)/2 conditions to check; violation of any of them prohibits parallelism, collectively or partially
 Statements or processes that depend on run-time conditions (IF statements or conditional branches) are not transformed into parallel form
 The analysis of dependences can be conducted at the code, subroutine, process, task, and program levels; higher-level dependence can be inferred from that of subordinate levels
Example of parallelism using
Bernstein’s conditions
P1: C = D * E
 P2: M = G + C
 P3: A = B + C
 P4: C = L + M
 P5: F = G / E

Assuming no pipelining is used, five steps are needed for sequential execution.
Example of parallelism using Bernstein's conditions

[Figure: dependence graphs of P1-P5 built from operands B, C, D, E, G, L and the *, +, and / operations, comparing sequential execution in five steps with parallel execution in three steps, where P2, P3, and P5 execute simultaneously.]
Example of parallelism using
Bernstein’s conditions
There are 10 pairs of statements to check against Bernstein's conditions
 Only P2 || P3 || P5 is possible, because P2 || P3, P3 || P5, and P2 || P5 all hold
 If two adders are available simultaneously, the parallel execution requires only three steps
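The pairwise check can be reproduced with a few set operations. A Python sketch (taking P3's inputs as B and C, the reading under which only P2 || P3 || P5 is mutually parallel):

```python
from itertools import combinations

# Input (read) and output (write) sets of the five statements.
procs = {
    "P1": ({"D", "E"}, {"C"}),  # C = D * E
    "P2": ({"G", "C"}, {"M"}),  # M = G + C
    "P3": ({"B", "C"}, {"A"}),  # A = B + C
    "P4": ({"L", "M"}, {"C"}),  # C = L + M
    "P5": ({"G", "E"}, {"F"}),  # F = G / E
}

def parallel(p, q):
    """Bernstein's conditions: Ip ∩ Oq = Iq ∩ Op = Op ∩ Oq = ∅."""
    (i1, o1), (i2, o2) = procs[p], procs[q]
    return not (i1 & o2 or i2 & o1 or o1 & o2)

pairs = [pq for pq in combinations(procs, 2) if parallel(*pq)]
print(pairs)
# [('P1', 'P5'), ('P2', 'P3'), ('P2', 'P5'), ('P3', 'P5'), ('P4', 'P5')]
print(all(parallel(p, q) for p, q in combinations(["P2", "P3", "P5"], 2)))
# True: P2, P3, P5 are pairwise parallel among themselves
```

Five of the ten pairs satisfy the conditions, and P2, P3, P5 is the only set of three that is parallel on a pairwise basis.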
Implementation of parallelism
We need special hardware and software support to implement parallelism
 There is a distinction between hardware and software parallelism
 Parallelism cannot be achieved for free
Hardware parallelism

Often a function of cost and performance tradeoffs
 A processor that issues k instructions per machine cycle is called a k-issue processor
 A conventional processor takes one or more machine cycles to issue a single instruction: a one-issue processor
 A multiprocessor system built with n k-issue processors should be able to handle a maximum of nk threads of instructions simultaneously
Software parallelism

Defined by the control and data dependences of programs
 A function of the algorithm, programming style, and compiler optimization
 The two most cited types of parallel programming:
 Control parallelism: in the form of pipelining and multiple functional units
 Data parallelism: similar operations performed over many data elements by multiple processors; practiced in SIMD and MIMD systems
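As an illustration of the data-parallel style (a sketch only, not tied to any particular SIMD or MIMD machine; the data and operation are assumed examples), the same operation is applied to every element, so the data can be partitioned across workers and the partial results combined:

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(chunk, factor):
    """The identical operation applied to every element of one worker's chunk."""
    return [x * factor for x in chunk]

data = list(range(8))
chunks = [data[0:4], data[4:8]]  # one partition per (simulated) processor

# Each worker runs the same operation on its own data partition.
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(scale_chunk, chunks, [2] * len(chunks)))

result = [x for part in parts for x in part]
print(result)  # identical to the sequential [x * 2 for x in data]
```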
Hardware vs. Software parallelism

Software parallelism:
 Eight instructions in total: 4 loads (L), 2 multiplications (X), 1 addition (+), and 1 subtraction (-)
 Theoretically, the computation can be accomplished in 3 cycles (steps):
 Step 1: L L L L (the four loads)
 Step 2: X X (the two multiplications)
 Step 3: + - (producing results A and B)
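The three-step schedule is an as-soon-as-possible schedule over the dependence graph. A small Python sketch (the exact dependence structure is an assumption consistent with the figure: each multiplication consumes two loads, and both the addition and subtraction consume both products):

```python
# Assumed dependence graph: each instruction lists its predecessors.
deps = {
    "L1": [], "L2": [], "L3": [], "L4": [],   # the four loads
    "X1": ["L1", "L2"], "X2": ["L3", "L4"],   # the two multiplications
    "ADD": ["X1", "X2"], "SUB": ["X1", "X2"], # results A and B
}

def asap_step(instr):
    """Earliest cycle for an instruction: one after its latest predecessor
    (instructions with no predecessors start at step 1)."""
    if not deps[instr]:
        return 1
    return 1 + max(asap_step(p) for p in deps[instr])

steps = {i: asap_step(i) for i in deps}
print(max(steps.values()))  # 3 steps, assuming unlimited functional units
```

With unlimited hardware the schedule finishes in three steps; the next two slides show how limited issue width and processor count stretch this out.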
Hardware vs. Software parallelism

Hardware parallelism (example 1):
 Executed by a 2-issue processor that can perform one memory access and one arithmetic operation simultaneously
 The computation needs 7 cycles (steps)
 There is a mismatch between the hardware and software parallelism

[Figure: the seven-cycle schedule, pairing at most one load with one arithmetic operation per cycle, producing results A and B.]
Hardware vs. Software parallelism

Hardware parallelism (example 2):
 Using a dual-processor system in which each processor is single-issue
 6 cycles are needed to execute the 12 instructions; 2 store (S) operations and 2 load operations are inserted for inter-processor communication through the shared memory
 The S entries are added instructions for inter-processor communication

 Step 1: L L
 Step 2: L L
 Step 3: X X
 Step 4: S S
 Step 5: L L
 Step 6: + - (producing results A and B)