Uploaded by pourhaji

Divide and Conquer Algorithms: Parallel Computation

advertisement
Divide and Conquer
Yan Gu
What is Divide and Conquer?
An effective approach to designing fast algorithms in
sequential computation is the method known as divide
and conquer.
Stratege ---
Divide:
a problem to be solved is broken into a number of subproblems of the same form as the original problems;

Conquer:
the subproblems are then solved independently, usually recursively;

Combine:
finally, the solutions to the subproblems are combined to provide the answer to the original problem.
Classic example
Sorting algorithms

mergesort

quicksort
purpose

Divide and conquer can be sued successfully in
parallel computation;

Derive efficient parallel algorithms by using
divide and conquer algorithm.
Parallel Computation model
PRAM
Parallel Random Access Machine -------
an abstract machine for designing the algorithms applicable to
parallel computers
Instructions:
ER: Exclusive Read
CR: Concurrent Read
EW: Exclusive Write
CW: Concurrent Write
Searching

Problem: given a file of n records, with the ith record,
1<=i<=n, consisting of:
1. a key field containing an integer Si.
2. several other DATA fields containing information.
RAM BINARY SEARCH

Algorithm RAM BINARY SEARCH(S,x,k)
Step1: (1.1) i<- 1
(1.2) h<- n
(1.3) k<- 0
Step2: while I =< h do
(2.1) m<- (i+h)/2
(2.2) if x=Sm
then (i) k<- m
(ii) i<- h+1
else if x<Sm
then h<- m-1
else i<- m+1
end if
end if
end while
time complexity: O(logn)
Search on the PRAM

Assume that the file of n records is stored in the shared memory of a
PRAM WITH n processors P1, P2,…, PN, where 1 < N =<n
suppose now that N < n. the sequence S={S1, S2, …, Sn} is subdivided
into N subsequence of length n/N each, and processor Pi is assigned the
subsequence {S(i-1)(n/N)+1, …, Si(n/N)} , the algorithm is as follows:
1.
All processors read x.
those processors Pi for which S(i-1)(n/N)+1 =< x =< Si(n/N) perform
algorithm RAM BINARY SEARCH on their assigned subsequence;
those processor Pl for which x=S(l-1)(n/N)+j use a MIN CW TO write (l1)(n/N)+j in k.
2.
3.
Time complexity : O (log (n/N) )
Parallel Binary Search

In the parallel binary search, there are N processors,
and we can extend binary search to become an (N+1) –
way search. The algorithm will consist of several stages.
At each stage, the sequence still under consideration is
divided into (N+1) subsequence of equal length. The N
processors simultaneously test the elements at the
boundary between adjacent subsequences for equality
with x.
(the sequence S is assumed to be sorted from left to
right in nondecreasing order).
Parallel Binary Search

Algorithm PRAM SEARCH (S, x, k)
Step1 : (1.1) q<- 1
(1.2) r<- n
(1.3) k<- 0
(1.4) g<- log(n+1)/log(N+1)
Step2: while (q=<r and k=0) do
(2.1) j(0) <- q-1
(2.2) for i=1 to N do in parallel
(i) j(i) <- (q-1)+i(N+1)g-1
(ii) if j(i)=< r
then if x=Sj(i)
then k<- j(i)
else if x< Sj(i)
then ei <- left
else ei <- right
end if
end if
else (a) j(i) <- r+1
(b) ei <-left
end if
(iii) if ei-1 <>ei
then (a) q <- j(i-1) +1
(b) r <- j(i)-1
end if
(iv) if (i=N and ei<> ei+1)
then q <- j(i)+1
end if
end for
(2.3) g <- g-1
end while
Time complexity: O( log N n)
comparison

RAM search
Sequentially executed by single processor
Time complexity: O (logn)

Searching on the PRAM
Sequentially executed by several processors of N processors
Time complexity: O (log (n/N))

PRAM search( Parallel Binary Search)
parallelly executed by N processors
Time complexity: O (log Nn))
Comparison result:
O (log Nn)) < O (log (n/N)) < O (log n)
Conclusion:
PRAM search( Parallel Binary Search) is most efficient algorithm
among these three searching algorithms.
Merging

Suppose that two sequence of number X=(x1, x2,…., xn) and
Y={y1,y2…ym} are given, each sorted in nondecreasing order, with
n>= m>=1. the problem of merging X and Y calls for creating, from
these two sequence, a third sequence of numbers Z={z1, z2,…,
zn+m},also sorted into nondecreasing order, such that each element
of X and each elements of Y appears exactly once in Z.
Sequential algorithm: RAM MERGE (X,Y,Z)
Time complexity : O (n)
Ranking a sorted sequence
Algorithm PRAM RANK (X,Y,R)
If m< 4
Then algorithm PRAM MODIFIED SEARCH compute the ranks of Y in X using N=|Y|1/2 processors
else (1) s<- |Y| ½
(2) for i=1 to s do in parallel
(2.1) algorithm PRAM MODIFIED SEARCH compute the rank r(is) of yis in X using N=|Y|1/2
processors
(2.2) r(0) <- 0
end for
(3) for i=0 to s-1 do in parallel
(3.1) Xi <- {Xr(is)+1, Xr(is+2)+2, …, Xr(i+1)s)}
(3.2) Yi <- {Yis+1, Yis+2, …, Y(i+1)s-1}
(3.3) if r(is)=r((i+1)s)
then Ri={0,0,..,0}
else PRAM RANK(Xi,Yi,Ri)
end if
(3.4) for j=1 to s-1 do in parallel
r(is+j) <- r(is) +ri(j)
end for
end if
Running time of this algorithm: t(n)= t(n1/2)+o(1) =O(loglogn)
Cost : C(n)=p(n) x t(n)=O (nloglogn)
PRAM MERGE
Algorithm PRAM MERGE(X,Y,Z)
Step1: (1.1) PRAM RANK (X,Y,R)
(1.2) for i=1 to m do in parallel
Zi+r(i) <- yi
end for
Step 2: (2.1) PRAM RANK (Y,X,R’)
(2.2) for i=1 to n do in parallel
Zi+r’(i) <- xi
end for.
Using O(n) processors, steps (1.1)and (2.1) run in O(loglogn) time, while steps(1.2) and
(2.2) take constant time. Thus, for algorithm PRAM MERGE
P(n) = O(n),
t(n) = O(loglogn),
c(n) = O(nloglogn),
We can use O(n/loglogn) processor and runs in O(loglogn) time, thus , can lead to a
cost of O(n), which is an optimal cost
Computing the convex hull
Compute the convex hull of a set of points in the plane.
Let Q={q1,q2,…, qn} be a finite sequence representing n
points in the plane. The convex hull of Q, denoted
CH(Q), is the convex polygon with the smallest area
containing all the points of Q. thus, each Qi belong to Q
either lies inside CH(Q) or is a corner of CH(Q).
PRAM CONVEX HULL
Algorithm PRAM CONVEX HULL (n, Q,CH(Q))
Step1: sort the points of Q by their x-coordinates. (PRAM SORT, O(logn))
Step2: partition Q into n1/2 subsets Q1, Q2,.. Qn1/2, separated by vertical
lines, such that Qi is to the left of Qj IF i<j (constant time)
Step3: for i=1 to n1/2 do in parallel
if |Qi| =< 3
then CH (Qi) <- Qi
else PRAM CONVEX HULL (n1/2, Qi, CH(Qi))
end if
end for
(recursive)
Step4: CH(Q) <- CH(Q1) U CH(Q2) U … U CH(Qn1/2)
(merge the convex hulls obtained in step3 to compute CH(Q) , merging a set of
disjoint polygons)
t(n) = t(n1/2)+βlogn => t(n)= O(logn),
p(n)=n,
c(n)=O(nlogn)
Merging a set of disjoint polygons
let u and v be the points of Q with smallest and largest x-coordinate,
Convex hull CH(Q) consist of two parts:
 The upper hull
The sequence of corners of CH(Q) in clockwise order, from u to v
The lower hull
The sequence of corners of CH(Q) in clockwise order, from v to u

• Computing tangents
find the upper (lower) common tangent (k,m) of CH(Qi) and
CH(Qj) , that is, a straight-line segment with end points k and m,
tangent to CH(Qi) at k and CH(Qj) at m.
• Identifying the upper (lower) hull of CH(Q)
find the upper (lower) tangent to CH(Qi)
Question:
Among the three algorithms below for the searching problem:
RAM search
Searching on the PRAM
PRAM search
1) Which one is sequentially executed? which one is partly sequentially executed?
an Which one is parallelly executed?
2) Which one is the most efficiently algorithms among these three algorithms, what
is it’s time complexity?
Answer:
1) RAM search is sequentially executed; Searching on the PRAM is partly
sequentially executed;
PRAM search is parallelly executed.
2) PRAM search is the most efficiently algorithms among these three algorithms, its
time complexity is O (log N n))
Download