Divide and Conquer
Yan Gu

What is Divide and Conquer?
An effective approach to designing fast algorithms in sequential computation is the method known as divide and conquer.

Strategy
Divide: the problem to be solved is broken into a number of subproblems of the same form as the original problem;
Conquer: the subproblems are then solved independently, usually recursively;
Combine: finally, the solutions to the subproblems are combined to provide the answer to the original problem.

Classic examples
Sorting algorithms: mergesort, quicksort

Purpose
Divide and conquer can be used successfully in parallel computation;
derive efficient parallel algorithms by using the divide-and-conquer approach.

Parallel Computation Model
PRAM (Parallel Random Access Machine): an abstract machine for designing algorithms applicable to parallel computers
Memory access instructions:
ER: Exclusive Read
CR: Concurrent Read
EW: Exclusive Write
CW: Concurrent Write

Searching
Problem: given a file of n records, with the ith record, 1 <= i <= n, consisting of:
1. a key field containing an integer S_i;
2. several other DATA fields containing information.
The keys are assumed to be sorted in nondecreasing order. Given an integer x, find an index k such that S_k = x (k = 0 if no such record exists).

RAM BINARY SEARCH
Algorithm RAM BINARY SEARCH (S, x, k)
Step 1: (1.1) i <- 1
        (1.2) h <- n
        (1.3) k <- 0
Step 2: while i <= h do
          (2.1) m <- floor((i+h)/2)
          (2.2) if x = S_m then (i) k <- m
                                (ii) i <- h+1
                else if x < S_m then h <- m-1
                     else i <- m+1
                     end if
                end if
        end while
Time complexity: O(log n)

Search on the PRAM
Assume that the file of n records is stored in the shared memory of a PRAM with N processors P_1, P_2, ..., P_N, where 1 < N <= n. Suppose now that N < n. The sequence S = {S_1, S_2, ..., S_n} is subdivided into N subsequences of length n/N each, and processor P_i is assigned the subsequence {S_((i-1)(n/N)+1), ..., S_(i(n/N))}. The algorithm is as follows:
1. All processors read x.
2. Those processors P_i for which S_((i-1)(n/N)+1) <= x <= S_(i(n/N)) perform algorithm RAM BINARY SEARCH on their assigned subsequence.
3. Those processors P_l for which x = S_((l-1)(n/N)+j) use a MIN CW to write (l-1)(n/N)+j in k.
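The three steps of the block-partitioned PRAM search can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the function names `ram_binary_search` and `pram_block_search` are hypothetical, indices are 0-based, a miss returns -1 instead of k = 0, the block length is rounded up when N does not divide n, and the loop over processors stands in for what a PRAM would do in parallel.

```python
def ram_binary_search(S, x, lo, hi):
    """Binary search for x in the sorted range S[lo..hi] (inclusive, 0-based).

    Returns an index holding x, or -1 if x is absent from the range.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        if S[mid] == x:
            return mid
        if x < S[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


def pram_block_search(S, x, N):
    """Simulate N processors, each searching its own block of S.

    Assumption: blocks have length ceil(n/N); the last one may be shorter.
    """
    n = len(S)
    block = -(-n // N)  # ceil(n / N)
    hits = []
    for p in range(N):  # on a PRAM, all processors run this body at once
        lo = p * block
        hi = min(lo + block, n) - 1
        if lo > hi:
            continue  # this processor has no elements assigned
        # Step 2: only a processor whose block may contain x searches it.
        if S[lo] <= x <= S[hi]:
            k = ram_binary_search(S, x, lo, hi)
            if k != -1:
                hits.append(k)
    # Step 3: MIN CW keeps the smallest of the concurrently written indices.
    return min(hits) if hits else -1
```

With duplicate keys several processors may find x, which is why the write in step 3 needs a conflict-resolution rule; here `min(hits)` plays the role of the MIN CW.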
Time complexity: O(log(n/N))

Parallel Binary Search
In the parallel binary search there are N processors, and binary search is extended to an (N+1)-way search. The algorithm consists of several stages. At each stage, the sequence still under consideration is divided into N+1 subsequences of equal length. The N processors simultaneously test the elements at the boundaries between adjacent subsequences for equality with x. (The sequence S is assumed to be sorted from left to right in nondecreasing order.)

Parallel Binary Search
Algorithm PRAM SEARCH (S, x, k)
Step 1: (1.1) q <- 1
        (1.2) r <- n
        (1.3) k <- 0
        (1.4) g <- ceil(log(n+1)/log(N+1))
Step 2: while (q <= r and k = 0) do
          (2.1) j(0) <- q-1
          (2.2) for i = 1 to N do in parallel
                  (i) j(i) <- (q-1) + i(N+1)^(g-1)
                  (ii) if j(i) <= r
                       then if x = S_j(i) then k <- j(i)
                            else if x < S_j(i) then e_i <- left
                                 else e_i <- right
                                 end if
                            end if
                       else (a) j(i) <- r+1
                            (b) e_i <- left
                       end if
                  (iii) if e_(i-1) <> e_i then (a) q <- j(i-1)+1
                                                (b) r <- j(i)-1
                        end if
                  (iv) if (i = N and e_i <> e_(i+1)) then q <- j(i)+1
                       end if
                end for
          (2.3) g <- g-1
        end while
Time complexity: O(log_N n)

Comparison
RAM search
  Executed sequentially by a single processor
  Time complexity: O(log n)
Searching on the PRAM
  Each active processor (out of N) runs a sequential binary search on its own subsequence
  Time complexity: O(log(n/N))
PRAM search (Parallel Binary Search)
  Executed in parallel by N processors cooperating at every stage
  Time complexity: O(log_N n)
Comparison result, for n large relative to N: O(log_N n) <= O(log(n/N)) <= O(log n)
Conclusion: PRAM search (Parallel Binary Search) is the most efficient of these three searching algorithms.

Merging
Suppose that two sequences of numbers X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_m} are given, each sorted in nondecreasing order, with n >= m >= 1. The problem of merging X and Y calls for creating, from these two sequences, a third sequence of numbers Z = {z_1, z_2, ..., z_(n+m)}, also sorted in nondecreasing order, such that each element of X and each element of Y appears exactly once in Z.
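The merging problem as stated can be solved sequentially with a standard two-pointer scan. The sketch below is that textbook technique, written under the assumption that it matches what a sequential RAM merge does; the slides do not list the body of the algorithm, and the name `ram_merge` and the choice to return Z rather than fill an output parameter are illustrative.

```python
def ram_merge(X, Y):
    """Merge two nondecreasing lists X and Y into one nondecreasing list Z.

    Every element of X and of Y appears exactly once in the result.
    """
    Z = []
    i = j = 0
    # Repeatedly move the smaller head element to the output.
    while i < len(X) and j < len(Y):
        if X[i] <= Y[j]:
            Z.append(X[i])
            i += 1
        else:
            Z.append(Y[j])
            j += 1
    # At most one of the two inputs still has elements left; copy them over.
    Z.extend(X[i:])
    Z.extend(Y[j:])
    return Z
```

Each element is examined once, so the scan takes time proportional to n + m, which is O(n) when n >= m.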
Sequential algorithm: RAM MERGE (X, Y, Z)
Time complexity: O(n)

Ranking a sorted sequence
PRAM RANK computes R = (r(1), ..., r(m)), where r(i) is the rank of y_i in X, that is, the number of elements of X that precede y_i in sorted order.
Algorithm PRAM RANK (X, Y, R)
if m < 4
then algorithm PRAM MODIFIED SEARCH computes the ranks of the elements of Y in X using N = |Y|^(1/2) processors
else (1) s <- |Y|^(1/2)
     (2) for i = 1 to s do in parallel
           (2.1) algorithm PRAM MODIFIED SEARCH computes the rank r(is) of y_(is) in X using N = |Y|^(1/2) processors
           (2.2) r(0) <- 0
         end for
     (3) for i = 0 to s-1 do in parallel
           (3.1) X_i <- {x_(r(is)+1), x_(r(is)+2), ..., x_(r((i+1)s))}
           (3.2) Y_i <- {y_(is+1), y_(is+2), ..., y_((i+1)s-1)}
           (3.3) if r(is) = r((i+1)s) then R_i = {0, 0, ..., 0}
                 else PRAM RANK (X_i, Y_i, R_i)
                 end if
           (3.4) for j = 1 to s-1 do in parallel
                   r(is+j) <- r(is) + r_i(j)
                 end for
         end for
end if
Running time of this algorithm: t(n) = t(n^(1/2)) + O(1) = O(log log n)
Cost: c(n) = p(n) x t(n) = O(n log log n)

PRAM MERGE
Algorithm PRAM MERGE (X, Y, Z)
Step 1: (1.1) PRAM RANK (X, Y, R)
        (1.2) for i = 1 to m do in parallel
                z_(i+r(i)) <- y_i
              end for
Step 2: (2.1) PRAM RANK (Y, X, R')
        (2.2) for i = 1 to n do in parallel
                z_(i+r'(i)) <- x_i
              end for
Using O(n) processors, steps (1.1) and (2.1) run in O(log log n) time, while steps (1.2) and (2.2) take constant time. Thus, for algorithm PRAM MERGE,
p(n) = O(n), t(n) = O(log log n), c(n) = O(n log log n).
The algorithm can be modified to use O(n / log log n) processors while still running in O(log log n) time, which leads to a cost of O(n); this cost is optimal.

Computing the convex hull
Compute the convex hull of a set of points in the plane.
Let Q = {q_1, q_2, ..., q_n} be a finite sequence representing n points in the plane. The convex hull of Q, denoted CH(Q), is the convex polygon with the smallest area containing all the points of Q. Thus, each q_i in Q either lies inside CH(Q) or is a corner of CH(Q).

PRAM CONVEX HULL
Algorithm PRAM CONVEX HULL (n, Q, CH(Q))
Step 1: sort the points of Q by their x-coordinates. (PRAM SORT, O(log n))
Step 2: partition Q into n^(1/2) subsets Q_1, Q_2, ...,
Q_(n^(1/2)), separated by vertical lines, such that Q_i is to the left of Q_j if i < j (constant time)
Step 3: for i = 1 to n^(1/2) do in parallel
          if |Q_i| <= 3 then CH(Q_i) <- Q_i
          else PRAM CONVEX HULL (n^(1/2), Q_i, CH(Q_i))
          end if
        end for (recursive)
Step 4: CH(Q) <- CH(Q_1) U CH(Q_2) U ... U CH(Q_(n^(1/2)))
        (merge the convex hulls obtained in Step 3 to compute CH(Q); this is the problem of merging a set of disjoint polygons)
t(n) = t(n^(1/2)) + β log n => t(n) = O(log n), p(n) = n, c(n) = O(n log n)

Merging a set of disjoint polygons
Let u and v be the points of Q with the smallest and largest x-coordinate, respectively. The convex hull CH(Q) consists of two parts:
The upper hull: the sequence of corners of CH(Q) in clockwise order, from u to v
The lower hull: the sequence of corners of CH(Q) in clockwise order, from v to u
• Computing tangents: find the upper (lower) common tangent (k, m) of CH(Q_i) and CH(Q_j), that is, a straight-line segment with end points k and m, tangent to CH(Q_i) at k and to CH(Q_j) at m.
• Identifying the upper (lower) hull of CH(Q): find the upper (lower) tangent to CH(Q_i)

Question:
Among the three algorithms below for the searching problem:
  RAM search
  Searching on the PRAM
  PRAM search
1) Which one is executed sequentially? Which one is executed partly sequentially? Which one is executed in parallel?
2) Which one is the most efficient of the three, and what is its time complexity?

Answer:
1) RAM search is executed sequentially; Searching on the PRAM is executed partly sequentially; PRAM search is executed in parallel.
2) PRAM search is the most efficient of the three algorithms; its time complexity is O(log_N n).
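The stage counts behind the answer to question 2 can be checked with a small sequential simulation of an (N+1)-way search. This is a simplified sketch, not a line-by-line port of algorithm PRAM SEARCH: the name `multiway_search`, the 0-based indices, the -1 return for a miss, and the way probe positions are spaced are all assumptions made for illustration, and the inner loop stands in for N comparisons that a PRAM would perform in a single parallel step.

```python
def multiway_search(S, x, N):
    """Search sorted list S for x using N probes per stage.

    Returns (index, stages): a 0-based index holding x (or -1 if absent)
    and the number of stages executed.
    """
    q, r = 0, len(S) - 1
    stages = 0
    while q <= r:
        stages += 1
        length = r - q + 1
        # Up to N distinct probe positions splitting S[q..r] into <= N+1 pieces.
        probes = sorted({q + (i * length) // (N + 1) for i in range(1, N + 1)})
        new_q, new_r = q, r
        for j in probes:  # one parallel step on a PRAM
            if S[j] == x:
                return j, stages
            if S[j] < x:
                new_q = j + 1  # x, if present, lies to the right of probe j
            else:
                new_r = j - 1  # x, if present, lies to the left of probe j
                break          # later probes are larger still
        q, r = new_q, new_r
    return -1, stages
```

With N = 1 this degenerates to ordinary binary search, so comparing the stage counts for N = 1 and a larger N on the same input shows the gap between O(log n) and O(log_N n).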