Sieve of Eratosthenes
by Fola Olagbemi

Outline
• What is the sieve of Eratosthenes?
• Algorithm used
• Parallelizing the algorithm
• Data decomposition options for the parallel algorithm
  – Block decomposition (to minimize interprocess communication)
• The program (returns count of primes)
• References

What is the sieve of Eratosthenes?
• An algorithm devised by the Greek mathematician Eratosthenes, used to find the prime numbers between 2 and a specified number n.
• It works by repeatedly eliminating the multiples of the smallest unmarked number x in the interval, until x^2 > n.

Algorithm
1. Create a list of the natural numbers in the specified range, e.g. 2, 3, 4, …, n, none of which is marked.
2. Set k to 2 (the smallest number in the list); k is always the smallest unmarked number.
3. Repeat:
   a. Mark all multiples of k between k^2 and n.
   b. Find the smallest unmarked number greater than k and set k to that value,
   until k^2 > n.
4. The numbers that remain unmarked are prime.

Parallelizing the algorithm
• Step 3a is the key parallel computation step.
• A reduction is needed in each iteration to determine the new value of k (process 0, the controller, may not hold the next k).
• A broadcast is needed after the reduction to inform all processes of the new k.
• The goal in parallelizing is to reduce interprocess communication as much as possible.

Data decomposition options
• Goal of the chosen decomposition approach: each process should have roughly the same number of values to work on (optimal load balancing).
• Decomposition options:
  – Interleaved data decomposition
  – Block data decomposition:
    • Grouped
    • Distributed
Interleaved data decomposition
• Example, with p processes:
  – Process 0 handles numbers 2, 2+p, 2+2p, …
  – Process 1 handles numbers 3, 3+p, 3+2p, …
• Advantage: it is easy to know which process is working on the value at any array index i.
• Disadvantages:
  – It can lead to significant load imbalance among processes (depending on how the array elements are divided up).
  – It may require a significant amount of reduction/broadcast traffic (communication overhead).

Block data decomposition
• Grouped: the first few processes hold more values than the last few.
  [Diagram: values assigned contiguously to Task 0 through Task 3, grouped]
• Distributed: the first process holds fewer values, the second more, the third fewer, and so on (this is the option used in the program).
  [Diagram: values assigned contiguously to Task 0 through Task 3, distributed]

Block decomposition formulas
• Grouped:
  – Let r = n mod p (n = number of values in the list, p = number of processes).
  – If r = 0, every process gets a block of n/p values.
  – Otherwise, the first r processes get blocks of size ceil(n/p) and the remaining processes get blocks of size floor(n/p).
• Distributed:
  – The first element controlled by process i is floor(i*n/p).
  – The last element controlled by process i is floor((i+1)*n/p) - 1.

Reducing communication
• To reduce communication (i.e. to reduce the number of MPI_Reduce calls to one), all the primes used for sieving must be held by process 0; that is, process 0's block must contain every value from 2 through sqrt(n).
• This requirement places a limit on the number of processes that can be used with this approach.

References
• Michael J. Quinn, Parallel Programming in C with MPI and OpenMP. McGraw-Hill, 2004.

Questions?