CSCI 4125 Programming for Performance
Andrew Rau-Chaplin
arc@cs.dal.ca
www.cs.dal.ca/~arc

Course Objectives
Explore techniques for designing, implementing and evaluating efficient programs for
- Sequential computers
- Shared-Memory Multiprocessors
- Distributed Memory Multicomputers
Make it go fast!

Performance oriented dev cycle
Techniques and tools for a performance oriented development cycle
- Algorithm design
- Implementation
- Benchmarking/evaluation
- Performance tuning
- Quantifying performance

Themes include:
- evaluation of performance
- design of test data sets
- issues of stability/reliability
- scalability
- common performance enhancing techniques
- parallel algorithm design techniques
- identification and elimination of dependencies

Skills Development
- how to design experiments/benchmarks
- how to use statistics in performance evaluation
- how to instrument code to obtain reliable timings
- how to use compiler switches
- how to use a profiler and performance tuning tools
- how to use a debugger/tracing tools
- how to plot performance results

Topics
- Introduction to Parallelism
- Parallel Programming
- Parallel Architectures
- Parallel Algorithms
- Parallel Applications
- Other Parallel Architectures & Algorithms

Official Outline
This course explores the design, implementation, and evaluation of computer programs for applications in which performance is a central issue. In the sequential and multi-core settings, it explores topics such as profiling, cache effects, I/O performance, floating-point issues, multi-threading, and performance tuning techniques. It introduces techniques for the design, implementation and evaluation of programs for Multicore processors, Shared-Memory Multiprocessors (SMPs) and Distributed Memory Multicomputers (Clusters).

Resources
Course web page: www.cs.dal.ca/~arc/teaching/CSc4125
- All notes, readings, assignments
Parallel Machines
- Your laptop!
- CGM6 & CGM7
- Hugh

Readings
- Sorry, no textbook!
- Readings will be assigned

Books
- Introduction to High Performance Computing for Scientists and Engineers, by Georg Hager and Gerhard Wellein
- Parallel Programming, by Peter Pacheco, Morgan Kaufmann
- Structured Parallel Programming, by Michael McCool, Arch D. Robison, and James Reinders
- Parallel Programming in C with MPI and OpenMP, by Quinn
- Parallel Programming with Intel Parallel Studio XE, by S. Blair-Chappell and A. Stokes
- Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost and Ruud van der Pas
- Parallel Programming in OpenMP, by Rohit Chandra, Dave Kohr, Jeff McDonald, Morgan Kaufmann

Prerequisites
- Knowledge of C
- CSCI 3120: Operating Systems
- Good to have: CSCI 3110 - Analysis of Algorithms

Course Evaluation
  Assignments (3)   30%
  Seminar           20%
  Project           40%
  Participation     10%
See course web page for assignment copies and due dates

Assignments Selected From
- Sequential Optimization
- OpenMP
- Cilk
- Threading Building Blocks
- MPI
- Hadoop
- CUDA/OpenCL

Project
Select your own topic. Either
- Optimize an existing codebase, or
- Design and implement an efficient new code
Components: literature/code review, some research or programming work, final paper, presentation
Main Deliverables: demo plus conference-style paper

Questions
- Why are you taking this course?
- Which performance oriented technologies are you interested in?
- How will you know if the course has been a success for you?