ODD/EVEN PARALLEL PREFIX ALGORITHM

The odd/even algorithm uses another divide and conquer approach: the N inputs are
divided into two groups according to whether their indices are odd or even.
A construction that needs only one prefix computation of half the size is shown in
the figure above. Each odd numbered input is first added to the next higher even
numbered input, giving N/2 pairwise sums. The prefix computation is then performed
on these N/2 sums, yielding the correct result for all even numbered outputs. The
first output is simply the first input, and the remaining odd numbered outputs are
obtained by adding each even numbered result to the next higher odd numbered input.
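The construction described above can be sketched recursively. This is only a minimal illustration, not code from the lecture: the function name `odd_even_prefix` and the `op` parameter are hypothetical, and Python lists are 0-based, so the "odd numbered" inputs of the text sit at even list positions.

```python
from operator import add

def odd_even_prefix(x, op=add):
    """Inclusive prefix of x via the odd/even construction (illustrative sketch)."""
    n = len(x)
    if n <= 1:
        return list(x)
    # Step 1: add each odd numbered input to the next even numbered input
    # (1-based numbering), giving about n/2 pairwise sums.
    pair_sums = [op(x[i], x[i + 1]) for i in range(0, n - 1, 2)]
    # Step 2: a single prefix computation of half the size.
    even_out = odd_even_prefix(pair_sums, op)
    # Step 3: even numbered outputs are taken directly from the half-size
    # prefix; odd numbered outputs add the previous even result to the input.
    out = [x[0]]
    for i in range(1, n):
        if i % 2 == 1:                      # 1-based even position
            out.append(even_out[i // 2])
        else:                               # 1-based odd position 3, 5, ...
            out.append(op(even_out[i // 2 - 1], x[i]))
    return out

print(odd_even_prefix([1, 2, 3, 4, 5, 6, 7, 8]))
```

The sketch runs sequentially; it only mirrors the data dependences of the construction, in which the operations of steps 1 and 3 are mutually independent and could be done in parallel.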
The figure above shows the data dependence graph of the odd/even algorithm for a
vector of 8 elements.
When the odd/even construction is applied recursively, two operations are added
to the depth each time the size is halved (fig. 2.6). The four-input case is an
exception (fig. 2.7): a simple diagram shows that constructing Poe(4) from Poe(2)
yields a depth of 2 instead of 3. Carrying the construction for Poe(N), where
N = 2^k, down to Poe(4), the depth of which is 2,
yields
\[
\mathrm{Depth}(P^{oe}(2^k)) = \sum_{i=3}^{k} 2 + \mathrm{Depth}(P^{oe}(2^2)) = 2k - 2 = 2\log_2 N - 2, \quad k \ge 2.
\]
A similar repetition of the equation
\[
\mathrm{Size}(P^{oe}(2^k)) = \mathrm{Size}(P^{oe}(2^{k-1})) + 2^k - 1,
\]
down to \mathrm{Size}(P^{oe}(2)) = 1, yields
\[
\mathrm{Size}(P^{oe}(2^k)) = \sum_{i=1}^{k} (2^i - 1) = 2^{k+1} - k - 2 = 2N - \log_2 N - 2, \quad k \ge 0.
\]
Thus Pul requires about 0.25 log2 N times as many operations as Poe, but Poe has a
depth about twice that of Pul. If the number of processors is so small that some of
the work must be done sequentially simply for lack of processors, Poe may complete
faster than Pul.
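The trade-off can be illustrated with a rough step count for p processors. The model is an assumption, not part of the lecture: every level of the computation takes ceil(width / p) time steps, Pul is taken to have log2 N levels of N/2 operations each, and Poe uses the level widths of the in-place schedule from the programming section. The function names are hypothetical.

```python
from math import ceil

def steps_upper_lower(n, p):
    # Assumed shape of P^ul: log2(n) levels, each with n/2 independent operations.
    levels = n.bit_length() - 1             # log2(n) for n a power of two
    return levels * ceil((n // 2) / p)

def steps_odd_even(n, p):
    # Level widths follow the two loops of the in-place odd/even schedule.
    steps = 0
    level = 2
    while level <= n:
        steps += ceil(len(range(level, n + 1, level)) / p)
        level *= 2
    level //= 2
    if level == n:
        level //= 2
    while level > 1:
        steps += ceil(len(range(level + level // 2, n + 1, level)) / p)
        level //= 2
    return steps

for p in (4, 512):
    print(p, steps_upper_lower(1024, p), steps_odd_even(1024, p))
```

Under this model, with N = 1024 and only 4 processors, Poe needs 513 steps against 1280 for Pul, while with 512 processors Pul needs 10 steps against 19 for Poe.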
CHARACTERIZING ALGORITHM BEHAVIOR FOR
LARGE PROBLEM SIZE
It is useful to be able to estimate how the computational cost of solving a problem
grows as the size of the problem grows. It is usually sufficient to characterize the
asymptotic behavior. For this purpose the “big O” and “small o” notations are used.
Let f(n) and g(n) be functions of integer n.
Definition: The notation f(n)=O(g(n)) means that there exist a constant c>0 and an
integer N such that for all n>=N, |f(n)|<c|g(n)|.
Definition: The notation f(n)= Ω(g(n)) means that there exist a constant c>0 and an
integer N such that for all n>=N, |f(n)|>c|g(n)|.
Definition: If both f(n)=O(g(n)) and f(n)= Ω(g(n)), then f(n)=Θ(g(n)).
Definition: The notation f(n)=o(g(n)) means that for any ε>0 there exists an
integer N such that for all n>=N, |f(n)|< ε |g(n)|. In other words, f(n)/g(n) tends
to zero as n increases.
Definition: The notation f(n)=ω(g(n)) means that for any constant C, however large,
there exists an integer N such that for all n>=N, |f(n)|>C|g(n)|. In other words,
|f(n)/g(n)| diverges as n increases.
Examples:
f ( n) 
n3
2n
 sin
 (n 3 )
3
M
M
\[
\mathrm{Depth}(P^{ul}(N)) = O(\log_2 N), \quad \mathrm{Depth}(P^{oe}(N)) = O(\log_2 N)
\]
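The definitions can be made concrete with the size and depth formulas for Poe derived earlier. The sketch below gives numerical evidence only, not a proof, and the helper names `size_oe` and `depth_oe` are hypothetical. It prints ratios that stay bounded (suggesting Θ behavior) and one ratio that tends to zero (suggesting small o behavior).

```python
from math import log2

def size_oe(n):
    return 2 * n - log2(n) - 2      # Size(Poe(N)) from the closed form above

def depth_oe(n):
    return 2 * log2(n) - 2          # Depth(Poe(N)) from the closed form above

for k in (4, 8, 16, 32):
    n = 2 ** k
    print(k,
          size_oe(n) / n,           # approaches 2: Size is Theta(N)
          depth_oe(n) / log2(n),    # approaches 2: Depth is Theta(log2 N)
          depth_oe(n) / n)          # approaches 0: Depth is o(N)
```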
PROGRAMMING PARALLEL PREFIX
Let us first consider sequential pseudocode for the Poe algorithm:
level:=2;
while level<=N begin
    for i:=level step level until N
        V[i]:=V[i]+V[i-level/2];
    level:=level*2;
end;
level:=level/2;
if level=N then level:=level/2;
while level>1 begin
    for i:=level+level/2 step level until N
        V[i]:=V[i]+V[i-level/2];
    level:=level/2;
end;
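For readers who want to run the sequential pseudocode, a direct Python transcription might look as follows. The function name is hypothetical, and a dummy element at index 0 is an assumption introduced so that the indices run from 1 to N as in the pseudocode.

```python
def odd_even_prefix_inplace(v):
    """In-place inclusive prefix sums of v[1..N]; v[0] is an unused dummy."""
    n = len(v) - 1
    level = 2
    while level <= n:                               # first loop of the pseudocode
        for i in range(level, n + 1, level):
            v[i] += v[i - level // 2]
        level *= 2
    level //= 2
    if level == n:                                  # this level would have no work
        level //= 2
    while level > 1:                                # second loop of the pseudocode
        for i in range(level + level // 2, n + 1, level):
            v[i] += v[i - level // 2]
        level //= 2
    return v

print(odd_even_prefix_inplace([0, 1, 2, 3, 4, 5, 6, 7, 8])[1:])
```

The integer division `level // 2` replaces `level/2`, which is exact here because `level` is always a power of two.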
PROGRAMMING PARALLEL PREFIX (CONT 1)
Let’s now consider MIMD odd/even parallel prefix code for process j of NP
processes, NP<=N.
private level, j, i;
shared N, NP, V[1:N];
level:=2;
while level<=N begin
    for i:=level+(j-1)*level step level*NP until N
        V[i]:=V[i]+V[i-level/2];
    barrier;
    level:=2*level;
end;
level:=level/2;
if level=N then level:=level/2;
while level>1 begin
    for i:=level+level/2+(j-1)*level step level*NP until N
        V[i]:=V[i]+V[i-level/2];
    barrier;
    level:=level/2;
end;
When a processor executes the barrier statement, it waits until all other processors
have reached their barrier statements and then continues. The barrier is a
synchronization primitive.
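The MIMD code and its barrier can be imitated with Python threads. This is a sketch under stated assumptions: `threading.Barrier` stands in for the `barrier` statement, the function and variable names are hypothetical, and Python threads illustrate the synchronization pattern rather than real parallel speedup.

```python
import threading

def parallel_odd_even_prefix(values, num_procs):
    """Inclusive prefix sums computed by num_procs cooperating threads."""
    n = len(values)
    v = [0] + list(values)                   # shared array, 1-based as in the pseudocode
    barrier = threading.Barrier(num_procs)   # every thread must arrive before any continues

    def worker(j):                           # j = 1..num_procs plays the role of process j
        level = 2
        while level <= n:
            for i in range(level + (j - 1) * level, n + 1, level * num_procs):
                v[i] += v[i - level // 2]
            barrier.wait()                   # finish this level before starting the next
            level *= 2
        level //= 2
        if level == n:
            level //= 2
        while level > 1:
            start = level + level // 2 + (j - 1) * level
            for i in range(start, n + 1, level * num_procs):
                v[i] += v[i - level // 2]
            barrier.wait()
            level //= 2

    threads = [threading.Thread(target=worker, args=(j,)) for j in range(1, num_procs + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return v[1:]

print(parallel_odd_even_prefix([1, 2, 3, 4, 5, 6, 7, 8], 3))
```

Within one level the elements that are written and the elements that are read are disjoint, so the only synchronization needed is the barrier between levels. Every thread computes the same sequence of `level` values and therefore reaches the same number of barriers.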