On-line adaptive parallel prefix computation Jean-Louis Roch, Daouda Traoré and Julien Bernard

Presented by Andreas Söderström, ITN

The prefix problem
- Given X = x_1, x_2, …, x_n, compute the n products π_k = x_1 ∘ x_2 ∘ … ∘ x_k for 1 ≤ k ≤ n, where ∘ is some associative operation.
- Example: ∘ = + (i.e. addition), X = 1, 3, 5, 7
  π_1 = 1
  π_2 = 1 + 3 = 4
  π_3 = 1 + 3 + 5 = 9
  π_4 = 1 + 3 + 5 + 7 = 16

Parallel prefix sum (first pass)
[Figure: tree of pairwise sums over the inputs 1 to 8]
  Step 0: 1 2 3 4 5 6 7 8
  Step 1: 3 7 11 15
  Step 2: 10 26
  Step 3: 36

Parallel prefix sum (second pass)
- For every even position, use the value of the parent node.
- For every odd position p_n, compute p_{n-1} + p_n.
[Figure: the same tree traversed from the root down; each entry reads first-pass value → prefix value]
  Step 0: 36 → 36
  Step 1: 10 → 10, 26 → 36
  Step 2: 3 → 3, 7 → 10, 11 → 21, 15 → 36
  Step 3: 1 → 1, 2 → 3, 3 → 6, 4 → 10, 5 → 15, 6 → 21, 7 → 28, 8 → 36

Parallel prefix computation
- Parallel time: 2n/p + O(log n) for p < n/(log n)
- Lower bound for parallel time: 2n/(p+1) for n > p(p+1)/2
- Assumes identical processors!

Parallel prefix computation
- Potential practical problems:
  - The processor setup may be heterogeneous.
  - Processor load may vary due to other users computing on the same machine.
- A schedule that is optimal off-line may no longer be optimal!
- Solution: use on-line scheduling!

The basic idea
- Combine a sequentially optimal algorithm with fine-grained parallelism using work stealing.
[Figure: processors P0, P1, P2, …, Pn, with arrows labelled "Steal work"]

The algorithm: sequential process P_s
- The sequential process P_s starts working on [π_1, π_k], i.e. value indices [1, k], where indices [k+1, m] have been stolen.
- When P_s reaches index k, it communicates π_k to the parallel process P_v that has stolen [k+1, m] and recovers the last index n computed by P_v together with the local prefix result r_n = x_{k+1} ∘ … ∘ x_n.
- P_s uses associativity to calculate π_n = π_k ∘ r_n and continues with the computation from index n+1.

The algorithm: parallel process P_v
- P_v scans for active processes (this can be P_s or another P_v) and steals part of the work from that process.
- P_v computes the local prefix operation on the stolen interval.
- The computation of P_v depends on a previous value and needs to be finalized when that value is known.

The algorithm
[Figure: indices 1 to 16 distributed over P0, P1 and P2; legend: Jump, Result, Finalize, Stealable]

Performance
- If a processor is or becomes slow, part of its work can be stolen by an idle processor.
- Asymptotic optimality (proof provided in the paper)

Performance: p homogeneous processors
[Figure: bar chart for p = 2, 4, 6, 8 under Static and Adaptive scheduling; series: Lower bound, Min, Average, Max]

Performance: p heterogeneous processors
[Figure: bar chart for p = 2, 4, 6, 8 under Static and Adaptive scheduling; series: Lower bound, Min, Average, Max]

Questions?
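The hand-over between the sequential process and a stealing process described under "The algorithm" can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: there is exactly one steal, the split point k is fixed in advance, nothing actually runs in parallel, and the helper names (`local_prefix`, `finalize`, `prefix_with_one_steal`) are hypothetical and not taken from the paper. It shows why associativity is enough: the thief can compute prefixes local to its interval without knowing the prefix at index k, and each of them is turned into a global prefix later with a single application of the operation.

```python
from itertools import accumulate
from operator import add


def local_prefix(xs, op):
    """Prefixes of xs under op, relative to the first element of xs."""
    return list(accumulate(xs, op))


def finalize(pi_k, local, op):
    """Turn local prefixes r_j into global ones via pi_j = pi_k o r_j."""
    return [op(pi_k, r) for r in local]


def prefix_with_one_steal(xs, k, op=add):
    """Simulate one sequential process and one thief (hypothetical helper).

    The sequential process handles xs[:k]; the thief handles xs[k:].
    """
    head = local_prefix(xs[:k], op)  # sequential process: global prefixes
    tail = local_prefix(xs[k:], op)  # thief: local prefixes, pi_k not needed yet
    if not head:                     # nothing before the stolen interval
        return tail
    # Finalize step: combine the last sequential prefix with each local result.
    return head + finalize(head[-1], tail, op)


if __name__ == "__main__":
    print(prefix_with_one_steal([1, 3, 5, 7], 2))  # [1, 4, 9, 16]
```

With X = 1, 3, 5, 7 and k = 2, the sequential part yields 1, 4, the stolen part yields the local prefixes 5, 12, and finalizing with the prefix 4 gives 9, 16, matching the example on the first slide. The operand order in `finalize` is kept as `op(pi_k, r)` so the sketch also works for associative operations that are not commutative, such as string concatenation.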
