Homework 3: Due Before Class, Thurs. Oct. 18

advertisement
Homework 3: Due Before Class, Thurs. Oct. 18
handin cs4230 hw3 <file>
•  Problem 1 (Amdahl’s Law):
(i) Assuming a 10 percent overhead for a parallel
computation, compute the speedup of applying 100
processors to it, assuming that the overhead remains
constant as processors are added. (ii) Given this
speedup, what is the efficiency? (iii.) Using this
efficiency, suppose each processor is capable of 10
Gflops peak performance. What is the best
performance we can expect for this computation in
Gflops?
•  Problem 2 (Data Dependences):
Does the following code have a loop-carried
dependence? If so, identify the dependence or
dependences and which loop carries it/them.
for (i=1; i < n-1; i++)
for (j=1; j < n-1; j++)
A[i][j] = A[i][j-1] + A[i-1][j+1];
10/04/2012"
CS4230"
2"
Homework 3, cont.
Problem 3 (Locality):
Sketch out how to rewrite the following code to
improve its cache locality and locality in registers.
Assume row-major access order.
for (i=1; i<n; i++)
for (j=1; j<n; j++)
a[j][i] = a[j-1][i-1] + c[j];
Briefly explain how your modifications have improved
memory access behavior of the computation.
10/04/2012"
CS4230"
3"
Homework 3, cont.
Problem 4 (Task Parallelism):
Construct a producer-consumer pipelined code in OpenMP to identify the
set of prime numbers in the sequence of integers from 1 to n. A common
sequential solution to this problem is the sieve of Erasthones. In this
method, a series of all integers is generated starting from 2. The first
number, 2, is prime and kept. All multiples of 2 are deleted because they
cannot be prime. This process is repeated with each remaining number, up
until but not beyond sqrt(n). A possible sequential implementation of this
solution is as follows:
for (i=2; i<=n; i++) prime[i] = true; // initialize
for (i=2; i<= sqrt(n); i++) {
if (prime[i]) {
for (j=i+i; j<=n; j = j+i) prime[j] = false;
}
}
The parallel code can operate on different values of i. First, a series of
consecutive numbers is generated that feeds into the first pipeline stage.
This stage eliminates all multiples of 2 and passes remaining numbers onto
the second stage, which eliminates all multiples of 3, etc. The parallel
code terminates when the “terminator” element arrives at each pipeline
stage.
10/04/2012"
CS4230"
4"
Download