Introduction

What class is this?
- CS 484 – Parallel Processing

Who am I?
- Mark Clement, 2214 TMCB
- clement@cs.byu.edu, 422-7608

When are office hours?
- MW 10-12 and by appointment

Course and Text
- Homepage (everything is here): BYU Learning Suite
- Text: online tutorials & papers

Course Objectives
- Students will understand and demonstrate the ability to develop shared memory parallel programs (OpenMP & Pthreads), distributed memory parallel programs (MPI), and data parallel programs (CUDA).
- Students will understand and be able to implement several basic parallel algorithms and load balancing techniques.

Lectures
- Follow the schedule on the homepage. We will move quickly. READ BEFORE CLASS!!!!!
- There will be in-class quizzes - no makeups.
- I will post lecture notes on the web.

What is expected of you?
- READ!!!

Assignments
- C/C++ on the supercomputer & CS open labs.
- All assignments include a report. The report IS the assignment; the program is what you did to accomplish the assignment. I will grade your writing.
- Submit through Learning Suite.

Exams
- In the testing center.
- One 8 1/2 x 11 sheet of notes allowed.

What is expected of you?
- Get an account on the university supercomputers: go to fsl.byu.edu and register. Do it today!!!
- Read through the batch jobs tutorial for marylou.
- Schedule a simple hello world run using a PBS script (a sketch of the program is shown below):
  - copy ~mjc22/hello/hello.c and ~mjc22/hello/hello.pbs
  - edit hello.pbs
  - mpicc hello.c
  - sbatch hello.pbs
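For reference, hello.c is a standard MPI "hello world" program. A minimal sketch of what such a program typically looks like is shown below; this is illustrative only and may not match the exact contents of ~mjc22/hello/hello.c.

    /* Minimal MPI hello world (illustrative sketch) */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

        printf("Hello world from rank %d of %d\n", rank, size);

        MPI_Finalize();                       /* shut down MPI cleanly */
        return 0;
    }

Compiling with mpicc and submitting with sbatch, as listed above, lets the scheduler decide where the MPI processes run; the .pbs job script is where you request nodes, processes, and wall time.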
Grading

Grade Distribution
- Homework & labs: 60%
- Midterm: 15%
- Final: 15%
- Project: 10%

Grade Scale
- 94% and above: A
- 90% - 93.9%: A-
- 80% - 89.9%: B-, B, B+
- 65% - 79.9%: C-, C, C+
- I expect everyone to get a good grade!

Policies

Late Work
- I don't like late work. However..... 10% off per school day, limited to 70% off.

Programming Assignments
- To get better than a C-, you must complete ALL labs.
- If you don't complete all the labs, your grade ceiling is a C-.

Other Policies

Honor Code
- I expect you to follow the Honor Code, including the dress and grooming standards.
- You can work together in groups on the homework and laboratories from a conceptual perspective, but the answers that you give and the programs that you write should be your own, not copies of other students' work. Your reports should focus on your own ideas and the things you have learned from your experimentation.

Cheating
- Cheating in any form will NOT be tolerated. This includes copying any part of a homework assignment or programming lab.
- Any assignment turned in that is not your own work will be given a negative full-value score. Repeat offenses or cheating on a test will result in failure of the class.

Systems Abuse Policy
- Abuse in any form will result in immediate suspension of your account(s).

Preventing Sexual Harassment
- Title IX of the Education Amendments of 1972 prohibits sex discrimination against any participant in an educational program or activity receiving federal funds. Title IX covers discrimination in programs, admissions, activities, and student-to-student sexual harassment. BYU's policy against sexual harassment extends not only to employees but to students as well. If you encounter unlawful sexual harassment or gender-based discrimination, please talk to your professor; contact the Equal Employment Office at 377-5895 or 367-5629 (24 hours); or contact the Honor Code Office at 378-2847.

Students With Disabilities
- Brigham Young University is committed to providing a working and learning atmosphere that reasonably accommodates qualified persons with disabilities. If you have any disability that may impair your ability to complete this course successfully, please contact the Services for Students with Disabilities Office (378-2767). Reasonable academic accommodations are reviewed for all students who have qualified, documented disabilities.

Children in the Classroom
- The study of Computer Science requires an exceptional degree of concentration and focus. Having small children in class is often a distraction that degrades the educational experience for the whole class. Please make other arrangements for child care rather than bringing children to class with you. If there are extenuating circumstances, please talk with your instructor in advance.

What will we do in this class?
- Focus on shared memory and message passing programming: Pthreads, OpenMP, MPI.
- Data parallel programming with CUDA.
- Write code for the supercomputers.
- Study parallel algorithms and parallelization.

What is Parallelism?
- Multiple tasks working at the same time on the same problem.

Parallel Computing
- What is a parallel computer? A set of processors that are able to work cooperatively to solve a computational problem.
- Examples: parallel supercomputers, clusters of workstations, symmetric multiprocessors, multi-core processors.

Won't serial computers be fast enough?
- Moore's Law: double in speed every 18 months.
- Predictions of need: the British government in the 1940s predicted it would only need about 2-3 computers; the market for the Cray was predicted to be about 10 machines.
- Problem: such predictions don't take new applications into account.

Applications Drive Supercomputing
- Traditional: weather simulation and prediction, climate modeling, chemical and physical computing.
- New applications: collaborative environments, DNA, virtual reality, parallel databases, games, Photoshop.

Application Needs
- Graphics: 10^9 volume elements, 200 operations per element, real-time display.
- Weather & climate: a one-year simulation involves about 10^16 operations. Accuracy can be improved with higher-resolution grids, which require even more operations.

Cost-Performance Trend
- [Figure: cost vs. performance curves for serial computers from the 1960s through the 1990s.]
- What does this suggest? More performance is easy up to a point. Significant performance increases for current serial computers beyond the saturation point are extremely expensive. Connecting large numbers of microprocessors into a parallel computer overcomes the saturation point: cost stays low and performance increases.

Computer Design
- Single-processor performance has lately been increased by raising the level of internal parallelism: multiple functional units, pipelining.
- Higher performance gains come from incorporating multiple "computers on a chip."
- The gigahertz race is over (Intel won). Multiple cores are where performance will come from.

Computer Performance
- [Figure: peak performance from 1950 to 2000, rising from roughly 10^2 to 10^12 FLOPS (TFLOPS), with machines such as the ENIAC, IBM 704, IBM 7090, CDC 7600, Cray 1, Cray X-MP, Cray C90, and IBM SP-2.]

Communication Performance
- Early 1990s: Ethernet, 10 Mbps
- Mid 1990s: FDDI, 100 Mbps
- Mid 1990s: ATM, 100s of Mbps
- Late 1990s: Fast Ethernet, 100 Mbps
- Late 1990s: Gigabit Ethernet, 1000 Mbps
- Gbps is now commonplace.

Performance Summary
- Applications are demanding more speed.
- Performance trends: processors are increasing in speed, and communication performance is increasing.

Future
- Performance trends suggest a future where parallelism pervades all computing.
- Concurrency is key to performance increases.
Parallel Processing Architectures

Architectures
- A single computer with lots of processors, or multiple interconnected computers.
- Architecture governs programming: shared memory and locks vs. message passing.

Shared Memory Computers
- Uniform Memory Access (UMA): all processors access the same memory; can lead to a memory bottleneck.
- Non-Uniform Memory Access (NUMA): different access speeds; more scalable.

Message Passing Computers / Distributed Shared Memory
- [Diagram: processors and memory modules connected by an interconnection network.]

Message Passing Architectures
- Require some form of interconnection; the network is the bottleneck.
- Key measures: latency and bandwidth, diameter, bisection bandwidth.
- Topologies: line/ring, mesh/torus, tree/fat tree, hypercube, and switched networks (nice, but they also have limitations).

Parallel Programming Properties
- Concurrency: performance should increase by employing multiple processors.
- Scalability: performance should continue to increase as we add more processors.
- Locality of reference: performance will be greater if we only access local memory.
- Architecture affects each of the above!

Parallel Programming Models
- Threads
- Message passing
- Data parallel

So What?
- In this class you will learn to program using each of the parallel programming models (a small threads-model sketch follows below).
- We will talk about the advantages and disadvantages of each model.
- We will also learn about common parallel algorithms and techniques.
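To give an early feel for the threads (shared memory) model, here is a minimal OpenMP sketch in C. It is an illustrative example only, not course-provided code; the array size and the work done in the loop are arbitrary.

    /* Minimal OpenMP sketch: parallelize a loop over shared data */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        /* Threads split the loop iterations; reduction(+:sum)
           combines each thread's partial sum without a data race. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            sum += a[i];
        }

        printf("sum = %f using up to %d threads\n",
               sum, omp_get_max_threads());
        return 0;
    }

Compile with an OpenMP flag (for example, gcc -fopenmp). The MPI and CUDA versions of the same computation look quite different, which is the kind of comparison this course explores.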