G. Alaghband • Parallel and Distributed Systems
CSCI 4551/5551
Homework 2

Problems 1, 2, 3 due Thursday 9/12/19 (individual work; in-class discussion and solution; will be collected for grading)
Problems 4, 5, 6, and the team assignments due 9/19/19

• All work is to be submitted as hard copy. For the first submission, you may add a separate page with your corrections made during class discussion and submit the entire work.

Read chapter 1 and do the following problems:

1. Write SIMD and MIMD pseudo code for the tree summation of n elements of a one-dimensional array, for n a power of two. Do not use recursive code; organize the computations as iterations. The key issue is to determine which elements are to be added by a single arithmetic unit or a single processor at any level of the tree. You may overwrite elements of the array to be summed.

2. The SIMD matrix multiply pseudo code of Program 1-1 is written to avoid doing an explicit reduction operation that is needed for the dot product of two vectors. Write another version of SIMD matrix multiply pseudo code that avoids the reduction operation. Describe the order of operations and compare it with both the sequential version and the SIMD version of Program 1-1.

Read chapter 2 and do the following problems:

3. Assume that we have N = 2^l data structures, and that the amount of memory required by each is given by len[i], i = 0, 1, ..., N-1. It is desired to determine for all K whether data structures 0, 1, ..., K-1, K will all fit in a memory area of size M. The result of the computation is a logical vector fit[i], i = 0, 1, ..., N-1, such that fit[K] = true if and only if data structures 0, 1, ..., K-1, K will all fit together in a memory area of size M.
   a. What is the size of a sequential algorithm to perform this computation? Note: Comparison of two numbers is counted the same as any other arithmetic operation.
   b. What is the minimum depth for a parallel algorithm to perform this computation?
   c. For N = 8, show the dependence graph of an algorithm of size no more than 19 operations and depth no more than 5 which does the computation.

4. Draw P_0(32). While coming up with this solution, pay attention to the relation between the depth and the recursive application of the algorithm. Can you show that the depth is log2 N?

5. Consider the evaluation of "perfect trees" using P processors. A "perfect tree" of height m is a binary tree with 2^m - 1 nodes. It thus has as many nodes as any binary tree of height m can have. Assume that the nodes are operators which take unit time to perform. Suppose we have P = 2^(m-1) / 2^k processors, where 0 ≤ k ≤ m-1.
   - What is the time T_P in which a perfect tree can be evaluated with P processors?
   - What are the speedup S_P and efficiency E_P with this number of processors?
   - How does the efficiency vary as k changes?

6. Start an individual web site to be used for this course. We will use this URL for posting your team assignments, and for your research and completed projects so they are available for peer review.

Website and Teams:
Form a team of four members for the team assignments (these will be part of our HW assignments; the team for final projects may be different). Provide a list of your team members (last name, first name) and your URL, one per line. On the last line provide the URL of a website for your team's postings. Email this information (one email per team) to HUYNH.MANH@UCDENVER.EDU. (Remember, for emails to me, Manh, and Iris, use "CSCI 5551" or "CSCI 4551" in the subject field.) Your email message must be in the following order, and on one line:

Team Assignments:
Discuss the following questions as a team and post your team's findings and answers on your team's website. Do not turn these in as hard copy. Later in the semester, a review assignment will be made.

Team Assignment 1. Summarize (in no more than two double-spaced typed pages) your understanding, impressions, and interpretation of the parallel algorithms discussions we have been having in the last several classes. Pay specific attention to the prefix problem and the various algorithm designs presented. Feel free to use your lecture notes and Chapter 2 of Fundamentals of Parallel Processing, and relate your discussion to any material covered in class or to your knowledge from outside class. If this assignment has raised questions in your mind or opened possible new thoughts about parallel processing or other subjects, I would like to read about them.

Team Assignment 2. Research some application areas for the prefix problem. Specify two applications for prefix problems. For each, write a short summary and give complete references. If your reference is a paper, attach a copy of the paper (or a URL); this may be needed at the time of review if the material you are referencing is not readily available.

Note: Please refer to the Reference Guide, IEEE Style, at http://www.ieee.org/documents/ieeecitationref.pdf to be sure you present your references correctly.
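For orientation only, the prefix problem named in the team assignments can be illustrated with a small sequential sketch in Python; the fit vector defined in Problem 3 is one instance of it. This is not one of the parallel designs from class or the textbook. The function names prefix and fit_vector, the sample lengths, and the capacity value are hypothetical, and "fit" is assumed to mean that the running total is at most M.

```python
from itertools import accumulate
from operator import add


def prefix(values, op=add):
    """Sequential inclusive prefix: result[k] = values[0] op ... op values[k]."""
    return list(accumulate(values, op))


def fit_vector(lengths, capacity):
    """fit[k] is True iff structures 0..k together need at most `capacity`.

    Assumption: "fit" means the running total of len[0..k] is <= M.
    """
    return [total <= capacity for total in prefix(lengths)]


# Hypothetical data for N = 8 structures and a memory area of size M = 14.
lengths = [3, 1, 4, 1, 5, 9, 2, 6]
print(prefix(lengths))          # [3, 4, 8, 9, 14, 23, 25, 31]
print(fit_vector(lengths, 14))  # [True, True, True, True, True, False, False, False]
```

Any associative operator can be passed as op, which is why the same pattern shows up in many of the application areas Team Assignment 2 asks about.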