# Lecture Note 3

```Lecture 3: Parallel Algorithm Design
Techniques of Parallel Algorithm Design
• Balanced binary tree
• Pointer jumping
• Divide and conquer
• Pipelining
• Multi-level divide and conquer
• ...
Balanced binary tree
Processing on binary tree: Let the leaves correspond to input and internal
nodes to processors.
Example
Find the sum of n integers (x1, x2, ... , xn).
Problem of finding Prefix Sum
Definition of Prefix Sum
Input: n integers stored in array A[1..n] in shared memory
Output: array B[1..n], where for each i (1 ≤ i ≤ n)
B[i] = A[1] + A[2] + ... + A[i]
Example Input: A[1..5] = (5, 8, -7, -10, 3)
Output: B[1..5] = (5, 13, 6, -4, -1)
Sequential algorithm for Prefix Sum
main () {
    B[1] = A[1];
    for (i = 2; i <= n; i++) {
        B[i] = B[i-1] + A[i];   /* each prefix sum extends the previous one */
    }
}
Solving Prefix Sum problem on balanced binary tree (1)
Outline of the parallel algorithm for prefix sum
To simplify the problem, let n = 2^k (k is an integer).
(1) Calculate the sub-sums from the leaves to the root, bottom-up.
(2) Using the sub-sums obtained in (1), calculate the prefix sums from the
root to the leaves, top-down.
Solving Prefix Sum problem on balanced binary tree (2)
(1) First read the input at the leaves. Then
calculate the sub-sums from the leaves
to the root, bottom-up.
(2) The root keeps its sub-sum from (1) as its value. From the root
to the leaves, each node sends its right son its own value, and
sends its left son the value of
(its own value) - (the right son's sub-sum from (1)).
[Figure: step (2) on an example tree with 8 leaves. Root: 12. Sons of the root: 12-10 = 2, 12. Next level: 2-(-4) = 6, 2, 12-5 = 7, 12. Leaves: 6-2 = 4, 6, 2-5 = -3, 2, 7-(-3) = 10, 7, 12-(-2) = 14, 12.]
Solving Prefix Sum problem on balanced binary tree (3)
Correctness of the algorithm
• When step (1) finishes, the sub-sum stored in each internal node is the sum
of the leaves in its subtree.
[Figure: step (1) on the input (4, 2, -9, 5, 8, -3, 7, -2). Sub-sums above the leaves: 6, -4, 5, 5; next level: 2, 10; root: 12.]
Solving Prefix Sum problem on balanced binary tree (4)
Correctness of the algorithm - Continued
• In step (2), the value held by each internal node is the prefix sum up to
the last leaf of its subtree. Therefore:
(a) The value sent to the right son is correct, because the right son's
subtree ends at the same leaf.
(b) The value sent to the left son is correct, because it is the node's value
minus the sum of the right son's subtree.
[Figure: the values sent to (a) the right son and (b) the left son]
Solving Prefix Sum problem on balanced binary tree (5)
Algorithm Parallel-PrefixSum (EREW PRAM algorithm)
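main () {
    /* processor i (1 <= i <= n) copies its input to leaf i */
    if (processor number == i) B[0, i] = A[i];
    /* (1) bottom-up: B[h, j] is the sum of the subtree of node j at height h */
    for (h = 1; h <= log n; h++) {
        if (processor number j <= n/2^h) {
            B[h, j] = B[h-1, 2j-1] + B[h-1, 2j];
        }
    }
    /* (2) top-down: C[h, j] is the prefix sum up to the last leaf below node j */
    C[log n, 1] = B[log n, 1];
    for (h = (log n) - 1; h >= 0; h--) {
        if (processor number j <= n/2^h) {
            if (j is even) C[h, j] = C[h+1, j/2];                   /* right son */
            if (j is odd)  C[h, j] = C[h+1, (j+1)/2] - B[h, j+1];  /* left son */
        }
    }
    /* the prefix sums are C[0, 1], ..., C[0, n] */
}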
Solving Prefix Sum problem on balanced binary tree (6)
(First step)
[Figure: the tree of sub-sums for n = 8, listed from the root down, with the input below the leaves]
B[3,1]
B[2,1] B[2,2]
B[1,1] B[1,2] B[1,3] B[1,4]
B[0,1] B[0,2] B[0,3] B[0,4] B[0,5] B[0,6] B[0,7] B[0,8]
A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8]
Solving Prefix Sum problem on balanced binary tree (7)
(Second step)
[Figure: the tree of values for n = 8, listed from the root down]
C[3,1]
C[2,1] C[2,2]
C[1,1] C[1,2] C[1,3] C[1,4]
C[0,1] C[0,2] C[0,3] C[0,4] C[0,5] C[0,6] C[0,7] C[0,8]
Solving Prefix Sum problem on balanced binary tree (8)
Analysis of the algorithm
• Computing time: each for loop is repeated log n times and each iteration can be
executed in O(1) time → O(log n) time
• Number of processors: not larger than n → n processors
• Speed up: O(n/log n)
• Cost: O(n log n)
It is not cost optimal, since the optimal sequential algorithm runs in Θ(n) time.
To reduce the cost, solve the problem sequentially when the size
of the problem is small.
This technique, accelerated cascading, is usually used together with the
balanced binary tree and divide and conquer techniques.
Accelerated cascading for Prefix Sum problem
Policy for improving the algorithm
To make the algorithm cost optimal, we decrease the number of processors
from n to n/log n.
(Note: the computing time of the algorithm remains O(log n).)
Steps:
1. Instead of processing n elements in parallel, divide the n elements into
n/log n groups of log n elements each.
2. Assign one processor to each group and solve the problem for the
group sequentially.
Accelerated cascading for Prefix Sum problem
Improved algorithm Parallel-PrefixSum
(1) Divide the n elements of A[1..n] into n/log n groups of log n elements each.
(O(1) time, O(n/log n) processors)
[Figure: A[1..n] divided into groups of log n elements]
(2) Assign one processor to each group and find the prefix sum within each group.
(O(log n) time, O(n/log n) processors)
Accelerated cascading for Prefix Sum problem (3)
Improved algorithm Parallel-PrefixSum - continued
(3) Let S be the sequence of the last elements of the groups after (2) (each is the sum of its group). Use
algorithm Parallel-PrefixSum to find the prefix sum of S.
(O(log(n/log n)) = O(log n) time, O(n/log n) processors)
[Figure: algorithm Parallel-PrefixSum applied to the last element of each group]
Accelerated cascading for Prefix Sum problem (4)
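Improved algorithm Parallel-PrefixSum - continued
(4) Use the prefix sum of S to find the prefix sum of the input
A[1..n]: each processor adds the prefix sum of S for the preceding group to every element of its own group.
(O(log n) time, O(n/log n) processors)
[Figure: the result of (3) added to each group]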
Accelerated cascading for Prefix Sum problem (5)
Analysis of the improved algorithm
Computing time and the number of processors:
Each step: O(log n) time, O(n/log n) processors
→ In total, O(log n) time, O(n/log n) processors
Speed up = O(n/log n)
Cost: O(log n × n/log n) = O(n)
It is cost optimal.
It is also time optimal (the proof is omitted here).
Therefore, it is an optimal algorithm.
Divide and Conquer
Divide and conquer
(1) 2 divide and conquer
(2) n^ε divide and conquer (ε < 1)
Divide and conquer technique
• Well-known technique in algorithm design
• Solves problems recursively
• Used very often in both sequential and parallel algorithms
How to divide and conquer
(1) Dividing step: dividing the problem into a number of subproblems.
(2) Conquering step: solving each subproblem recursively.
(3) Merging step: merging the solutions of subproblems to the solution of
the original problem.
Convex hull problem
Input: a set of n points in the plane.
Output: the smallest convex polygon that contains all points of the
input. (The convex polygon is represented by the list of its
vertices in clockwise order.)
• A basic problem in computational geometry.
• Has many applications.
• Solved in O(n log n) time sequentially.
In the following we only consider the upper convex hull.
(Upper convex hull: (P9, P8, P1, P0))
[Figure: points P0 to P9 with the upper convex hull and the lower convex hull marked]
Output: (P0, P3, P5, P9, P8, P1)
Merging of two upper convex hulls
Finding the upper common tangent
[Figure: two upper convex hulls on points p1 to p10; upper common tangent = (p3, p8)]
It is known that the upper common tangent can be
found in O(log n) time sequentially.
2 divide and conquer (1)
Outline of the algorithm Parallel-UpperConvexHull
Preprocessing: Sort all the points according to their x coordinates, and let the result be
the sequence (p1, p2, p3, ... , pn).
(1) If the size of the sequence is 2, return the sequence.
(2) Divide (p1, p2, p3, ... , pn) into the left half and the right half, and find the
upper convex hull of each recursively.
(3) Find the upper common tangent of the two upper convex hulls obtained in (2), and
output the upper convex hull of the sequence.
2 divide and conquer (2)
How 2 divide and conquer works
Find the upper common
tangent for two upper convex
hulls of two vertices each.
Find the upper common
tangent for two upper convex
hulls of four vertices each.
Find the upper common
tangent for two upper convex
hulls of eight vertices each.
2 divide and conquer (3)
Recursive execution
• Each time the problem is divided, the size of the subproblems is halved.
• Suppose the size of the subproblems becomes 2 when the
problem is divided k times.
n/2^k = 2 ⇒ k = log_2 n - 1
[Figure: recursion tree with subproblem sizes n, n/2, n/4, ..., 2; height = log_2 n]
2 divide and conquer (4)
Complexity of the algorithm
Preprocessing: Sort the sequence of the points according to their x coordinates.
(1) If the size of the sequence is 2, return the sequence.
(2) Divide the sequence into the left half and the right half,
and find the upper convex hull of each recursively.
(3) Find the upper common tangent of the two upper convex hulls obtained in (2),
and output the upper convex hull of the sequence.
Preprocessing: O(log n) time, n processors
Steps (1)-(3): each step runs in O(log n) time and uses n/2 processors
T(n) = T(n/2) + O(log n). Therefore, T(n) = O(log^2 n)
∴ The algorithm runs in O(log^2 n) time using O(n) processors.
Computational model: There is no concurrent access ⇒ EREW PRAM
2 divide and conquer (5)
Finding the complexity of the algorithm from the recursion tree
• Computing time
[Figure: recursion tree of height log_2 n. From the root down, the subproblem sizes are n, n/2, n/4, ..., 2; the merging times are c log n, c log(n/2), c log(n/4), ..., c; and the numbers of processors used are 1, 2, 4, ..., n/2.]
In total, O(log^2 n)
• Number of processors
At the level of the leaves, n/2 processors are used at the same time. ⇒ n/2 processors
T(n) × P(n) = O(n log^2 n)
It is not cost optimal.
n^(1/2) divide and conquer
Outline of the algorithm
Preprocessing
Sort the sequence of the input points according to their x coordinates, and
let the result be the sequence (p1, p2, p3, ... , pn).
(1) If the size of the sequence is 2, return the sequence.
(2) Divide (p1, p2, p3, ... , pn) into n^(1/2) equally sized subsequences, and find
the upper convex hull of each recursively.
(3) Merge the n^(1/2) upper convex hulls into the upper convex hull of the
sequence.
Merging n^(1/2) upper convex hulls
Assign n^(1/2) processors to each upper convex hull to find the upper common tangents in
O(log n) time, and then determine the edges that belong to the solution.
[Figure: Case 1 and Case 2]
Recursion tree of n^(1/2) divide and conquer
• When the problem is divided once, the size of the subproblems
becomes n^(1/2).
• Suppose that the size of the subproblems becomes 2 when the
problem is divided k times.
n^(1/2^k) = 2 ⇒ k = log log n
[Figure: recursion tree with subproblem sizes n, n^(1/2), n^(1/4), ..., 2; height = log log n]
Analysis of the algorithm
Preprocessing: Sort the sequence of the points by their x coordinates.
(1) If the size of the sequence is 2, return the sequence.
(2) Divide the sequence into n^(1/2) equally sized subsequences,
and find the upper convex hull of each recursively.
(3) Find the upper common tangents of the n^(1/2) upper convex hulls
obtained in (2), and determine the solution.
Preprocessing: O(log n) time, n processors.
Steps (1)-(3): each step O(log n) time, n processors.
T(n) = T(n^(1/2)) + O(log n). Therefore, T(n) = O(log n)
∴ In total, the algorithm runs in O(log n) time using O(n) processors.
Computational model
• Concurrent reading happens in the procedure of finding the upper common tangents
⇒ CREW PRAM
T(n) × P(n) = O(n log n)
It is cost optimal, since the sequential algorithm also takes O(n log n) time.
Exercise
1. Suppose n × n matrices A and B are stored in two-dimensional arrays. Design PRAM
algorithms for A × B using n and n × n processors, respectively. Answer the following
questions:
(1) Which PRAM models do you use in your algorithms?
(2) What are the running times?
(3) Are your algorithms cost optimal?
(4) Are your algorithms time optimal?
2. Design a PRAM algorithm for A × B using k processors (k ≤ n × n). Answer the same
questions.
```
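The two phases of Algorithm Parallel-PrefixSum (balanced binary tree section) can be traced with a small sequential simulation. This is only a sketch under stated assumptions: the function name `parallel_prefix_sum` is hypothetical, each inner `for j` loop stands in for one PRAM step in which all processors j act at once, and the input length is assumed to be a power of two, as in the lecture.

```python
def parallel_prefix_sum(a):
    """Simulate the two-phase balanced-tree prefix sum; len(a) must be 2^k."""
    n = len(a)
    k = n.bit_length() - 1
    assert n >= 1 and (1 << k) == n, "n must be a power of two"

    # B[h][j] and C[h][j] use 1-based j to match the lecture's indexing.
    B = [[0] * (n // 2**h + 1) for h in range(k + 1)]
    C = [[0] * (n // 2**h + 1) for h in range(k + 1)]

    # Leaves: processor i copies A[i] into B[0, i].
    for i in range(1, n + 1):
        B[0][i] = a[i - 1]

    # Phase (1), bottom-up: B[h][j] becomes the sum of its subtree.
    for h in range(1, k + 1):
        for j in range(1, n // 2**h + 1):      # one parallel step
            B[h][j] = B[h - 1][2 * j - 1] + B[h - 1][2 * j]

    # Phase (2), top-down: C[h][j] is the prefix sum up to the last
    # leaf below node j at height h.
    C[k][1] = B[k][1]
    for h in range(k - 1, -1, -1):
        for j in range(1, n // 2**h + 1):      # one parallel step
            if j % 2 == 0:                     # right son: same last leaf
                C[h][j] = C[h + 1][j // 2]
            else:                              # left son: drop the right sibling
                C[h][j] = C[h + 1][(j + 1) // 2] - B[h][j + 1]

    return C[0][1:]


print(parallel_prefix_sum([4, 2, -9, 5, 8, -3, 7, -2]))
# → [4, 6, -3, 2, 10, 7, 14, 12]
```

The input above is the 8-leaf example from the figures, so the intermediate values 6, -4, 5, 5, 2, 10, 12 in `B` can be compared with the tree drawn for step (1).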
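Steps (1) to (4) of the improved algorithm (accelerated cascading for Prefix Sum) can be sketched the same way. The names below are hypothetical, the group size is taken as floor(log2 n) (at least 1), and `itertools.accumulate` is used in step (3) as a sequential stand-in for Algorithm Parallel-PrefixSum; each group would be handled by its own processor on a PRAM.

```python
import math
from itertools import accumulate


def cascaded_prefix_sum(a):
    """Prefix sum via groups of about log n elements (sequential simulation)."""
    n = len(a)
    if n == 0:
        return []
    g = max(1, int(math.log2(n)))              # group size, about log n

    # Step (1): divide the input into groups of g elements.
    groups = [a[s:s + g] for s in range(0, n, g)]

    # Step (2): one processor per group computes its local prefix sums.
    local = [list(accumulate(grp)) for grp in groups]

    # Step (3): prefix sum over S, the last element of each group.
    # Assumption: accumulate replaces the tree-based parallel algorithm here.
    s_prefix = list(accumulate(loc[-1] for loc in local))

    # Step (4): add the prefix sum of S for the preceding group.
    out = []
    for idx, loc in enumerate(local):
        base = s_prefix[idx - 1] if idx > 0 else 0
        out.extend(x + base for x in loc)
    return out


print(cascaded_prefix_sum([5, 8, -7, -10, 3]))
# → [5, 13, 6, -4, -1]
```

With n/log n groups, step (3) runs the O(log n)-time tree algorithm on only n/log n values, which is where the processor count, and therefore the cost, drops.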
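The outline of Parallel-UpperConvexHull (2 divide and conquer) can be illustrated with the sequential sketch below. It is not the lecture's method in one respect: the upper common tangent is found by exhaustive search over vertex pairs, not by the O(log n) search the lecture cites. The function names are hypothetical, the two recursive calls would run in parallel on a PRAM, and the points are assumed to have distinct x coordinates.

```python
def cross(o, a, b):
    """Positive when b lies strictly above the line from o to a (o left of a)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def merge_upper_hulls(left, right):
    """Join two upper hulls, where `left` lies entirely to the left of `right`."""
    pts = left + right
    for i in range(len(left)):
        for j in reversed(range(len(right))):
            # (left[i], right[j]) is the upper common tangent when no
            # vertex of either hull lies above the line through them.
            if all(cross(left[i], right[j], p) <= 0 for p in pts):
                return left[:i + 1] + right[j:]
    raise ValueError("no tangent found; x coordinates must be distinct")


def upper_hull(points):
    """Upper convex hull, listed from left to right."""
    pts = sorted(points)                       # preprocessing: sort by x

    def solve(seq):
        if len(seq) <= 2:                      # step (1): base case
            return list(seq)
        mid = len(seq) // 2                    # step (2): split and recurse
        return merge_upper_hulls(solve(seq[:mid]), solve(seq[mid:]))

    return solve(pts)


pts = [(0, 0), (1, 3), (2, 1), (3, 4), (4, 0), (5, 2), (6, 0), (7, -1)]
print(upper_hull(pts))
# → [(0, 0), (1, 3), (3, 4), (5, 2), (7, -1)]
```

Replacing the exhaustive search in `merge_upper_hulls` with an O(log n) tangent search is what gives the recurrence T(n) = T(n/2) + O(log n) stated in the complexity slide.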
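The two running times quoted for the convex hull algorithms follow from unrolling their recurrences. The derivation below is a sketch that assumes the O(log n) merging term is exactly c log n with base-2 logarithms, that T(2) = c, and that n is a power of two in the first case and of the form 2^(2^k) in the second.

```latex
% 2 divide and conquer: T(n) = T(n/2) + c \log n, with n = 2^m
\begin{align*}
T(2^m) &= T(2^{m-1}) + c\,m
        = c + c \sum_{i=2}^{m} i
        = \frac{c\,m(m+1)}{2} \\
\Rightarrow\; T(n) &= \frac{c \log n\,(\log n + 1)}{2} = O(\log^2 n)
\end{align*}

% n^{1/2} divide and conquer: T(n) = T(n^{1/2}) + c \log n, with n = 2^{2^k}
\begin{align*}
T\bigl(2^{2^k}\bigr) &= T\bigl(2^{2^{k-1}}\bigr) + c\,2^k
        = c + c \sum_{i=1}^{k} 2^i
        = c\,(2^{k+1} - 1) \\
\Rightarrow\; T(n) &= c\,(2 \log n - 1) = O(\log n)
\end{align*}
```

In the second case the per-level costs form a geometric series, so the O(log n) cost of the top level dominates even though the recursion has log log n levels.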