G. Alaghband
Parallel and Distributed Systems
CSCI 4551/5551
Homework 2
Problems 1, 2, 3 due Thursday 9/12/19 (individual work; in-class discussion and
solution; will be collected for grading)
Problems 4, 5, 6, and the team assignments, due 9/19/19
All work is to be submitted as hard copy. For the first submission, you may add a separate page with
corrections made during the class discussion and submit the entire work.
Read chapter 1 and do the following problems:
1. Write SIMD and MIMD pseudo code for the tree summation of n elements of a
one-dimensional array, for n a power of two. Do not use recursive code but organize
the computations as iterations. The key issue is to determine which elements are
to be added by a single arithmetic unit or a single processor at any level of the tree.
You may overwrite elements of the array to be summed.
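As an illustration of the iterative organization Problem 1 asks for, here is a minimal sequential Python sketch, not the SIMD or MIMD pseudo code itself. The function name `tree_sum` and the `stride` variable are assumptions made for this example. At each level of the tree, the additions in the inner loop touch disjoint elements and are therefore independent of one another; that inner loop is the part a SIMD vector operation or a set of MIMD processors would carry out in parallel.

```python
def tree_sum(a):
    """Sum a list whose length is a power of two, overwriting it in place.

    Hypothetical helper for illustration; the result ends up in a[0].
    """
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    stride = 1  # distance between the two operands of each addition
    while stride < n:
        # One level of the tree: every addition here is independent,
        # so the whole level could run as a single parallel step.
        for i in range(0, n, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]
```

For n elements the outer loop runs log2(n) times, once per tree level, and the number of additions halves at each level.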
2. The SIMD matrix multiply pseudo code of Program 1-1 is written to avoid doing
an explicit reduction operation that is needed for the dot product of two vectors.
Write another version of SIMD matrix multiply pseudo code that avoids the
reduction operation. Describe the order of operations and compare it with both
the sequential version and the SIMD version of Program 1-1.
Read chapter 2 and do the following problems:
3. Assume that we have N = 2^l data structures, and that the amount of memory
required by each is given by len[i], i = 0,1,...,N-1. It is desired to determine for all
K whether data structures 0,1,...,K-1, K will all fit in a memory area of size M. The
result of the computation is a logical vector fit[i], i = 0,1,...,N-1 such that fit[K] =
true if and only if data structures 0,1,...,K-1, K will all fit together in a memory
area of size M.
a. What is the size of a sequential algorithm to perform this computation?
Note: Comparison of two numbers is counted the same as any other arithmetic operation.
b. What is the minimum depth for a parallel algorithm to perform this computation?
c. For N = 8 show the dependence graph of an algorithm of size no more than 19
operations and depth no more than 5 which does the computation.
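To make the definition of fit[] in Problem 3 concrete, the following Python sketch computes it directly with a running sum. It is a plain sequential reference for checking results, not the parallel algorithm or dependence graph the problem asks for. The names `fit_vector`, `lengths`, and `m` are assumptions made for this example.

```python
def fit_vector(lengths, m):
    """Return fit, where fit[k] is True iff lengths[0..k] together need at most m.

    Hypothetical reference implementation: one addition and one
    comparison per element, performed strictly left to right.
    """
    fit = []
    running = 0  # total memory needed by data structures 0..k
    for length in lengths:
        running += length
        fit.append(running <= m)
    return fit
```

The running totals are exactly the prefix sums of len[i], which is why this problem belongs with the prefix computations of chapter 2.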
4. Draw P0(32). While coming up with this solution, pay attention to the relation
between the depth and the recursive application of the algorithm. Can you show
that the depth is log2(N)?
5. Consider the evaluation of “perfect trees” using P processors. A “perfect tree” of
height m is a binary tree with 2^m - 1 nodes. It thus has as many nodes as any binary
tree of height m can have. Assume that the nodes are operators which take unit
time to perform. Suppose we have P = 2^(m-1) / 2^k processors, where 0 ≤ k ≤ m-1.
- What is the time T_P in which a perfect tree can be evaluated with P processors?
- What are the speedup S_P and efficiency E_P with this number of processors?
- How does the efficiency vary as k changes?
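For experimenting with small cases of Problem 5, the sketch below tabulates T_P, S_P, and E_P under one particular assumption: the tree is evaluated level by level, so a level holding w nodes takes ceil(w / P) unit-time steps. That schedule is an assumption of this example and is not claimed to be the intended or the best one. The function name `perfect_tree_metrics` is hypothetical. The definitions used are the standard ones, S_P = T_1 / T_P and E_P = S_P / P, with T_1 = 2^m - 1 because each node takes unit time.

```python
from fractions import Fraction


def perfect_tree_metrics(m, k):
    """Return (P, T_P, S_P, E_P) for a perfect tree of height m.

    Assumes a level-by-level schedule (an illustrative choice).
    Fractions keep the speedup and efficiency exact.
    """
    if not 0 <= k <= m - 1:
        raise ValueError("k must satisfy 0 <= k <= m - 1")

    p = 2 ** (m - 1 - k)   # P = 2^(m-1) / 2^k processors
    t1 = 2 ** m - 1        # one processor evaluates every node in turn

    tp = 0
    for level in range(m):         # level 0 is the root, level m-1 the leaves
        width = 2 ** level         # number of nodes on this level
        tp += (width + p - 1) // p  # integer ceiling of width / p

    speedup = Fraction(t1, tp)
    efficiency = speedup / p
    return p, tp, speedup, efficiency
```

Calling it for a fixed m and every k from 0 to m-1 gives a quick table for comparing against a hand derivation.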
6. Start an individual web site to be used for this course. We will use this URL
address for posting your team assignments, and for your research & completed
projects so they are available for peer review.
Website and Teams: Form a team of four members for the team assignments
(these will be part of our HW assignments; the team for final projects may be
different). Provide a list of your team members (last name, first name) and their
URLs, one per line. On the last line, provide the URL of the website for your team’s
postings. Email this information (one email per team) to
HUYNH.MANH@UCDENVER.EDU. (Remember: for emails to me, Manh, and
Iris, use "CSCI 5551" or "CSCI 4551" in the subject field.) Your email message must
be in the following order, and on one line:
Team Assignments: Discuss your findings and post your team’s findings.
Post your answers to the following questions on your team’s website. Do not turn these
in as hard copy. Later in the semester, a review assignment will be made.
Team Assignment 1. Summarize (in no more than two double-spaced typed pages) your
understanding, impressions, and interpretation of the parallel algorithm discussions we have
been having in the last several classes. Pay specific attention to the prefix problem and
the various algorithm designs presented. Feel free to use your lecture notes and Chapter
two of Fundamentals of Parallel Processing, and relate your discussion to any material
covered in class or to your knowledge from outside class. If this assignment has raised
questions in your mind or opened possible new thoughts about parallel processing or
other subjects, I would like to read about them.
Team Assignment 2. Research some application areas for the prefix problem. Specify
two applications for prefix problems. For each, write a short summary and give complete
references. If your reference is a paper, attach a copy of the paper (or a URL); this may
be needed at the time of review if the material you are referencing is not readily
available.
Note: Please refer to the Reference Guide, IEEE Style at
http://www.ieee.org/documents/ieeecitationref.pdf to be sure you present your
references correctly.