MIT OpenCourseWare
http://ocw.mit.edu

6.189 Multicore Programming Primer, January (IAP) 2007

Please use the following citation format:

Saman Amarasinghe, 6.189 Multicore Programming Primer, January (IAP) 2007. (Massachusetts Institute of Technology: MIT OpenCourseWare). http://ocw.mit.edu (accessed MM DD, YYYY). License: Creative Commons Attribution-Noncommercial-Share Alike.

Note: Please use the actual date you accessed this material in your citation.

For more information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

6.189 IAP 2007 Lecture 1
Multicore Programming Primer and Programming Competition Introduction
Prof. Saman Amarasinghe, MIT.

The “Software Crisis”
“To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.”
-- E. Dijkstra, 1972 Turing Award Lecture

The First Software Crisis
● Time Frame: ’60s and ’70s
● Problem: Assembly Language Programming
  • Computers could handle larger, more complex programs
● Needed to get Abstraction and Portability without losing Performance

How Did We Solve the First Software Crisis?
● High-level languages for von Neumann machines
  • FORTRAN and C
● Provided a “common machine language” for uniprocessors
  • Common properties: single flow of control, single memory image
  • Differences: register file, ISA, functional units
The Second Software Crisis
● Time Frame: ’80s and ’90s
● Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers
  • Computers could handle larger, more complex programs
● Needed to get Composability, Malleability and Maintainability
  • High performance was not an issue
  • It was left to Moore’s Law

How Did We Solve the Second Software Crisis?
● Object-Oriented Programming
  • C++, C# and Java
● Also…
  • Better tools
    – Component libraries, Purify
  • Better software engineering methodology
    – Design patterns, specification, testing, code reviews

Today: Programmers are Oblivious to Processors
● Solid boundary between hardware and software
● Programmers don’t have to know anything about the processor
  • High-level languages abstract away the processors
    – Ex: Java bytecode is machine independent
  • Moore’s law does not require the programmers to know anything about the processors to get good speedups
● Programs are oblivious of the processor
  • They work on all processors
  • A program written in the ’70s using C still works and is much faster today
● This abstraction provides a lot of freedom for the programmers

The Origins of a Third Crisis
● Time Frame: 2005 to 20??
● Problem: Sequential performance no longer keeps pace with Moore’s law
● Needed continuous and reasonable performance improvements
  • to support new features
  • to support larger datasets
● While sustaining portability, malleability and maintainability without unduly increasing the complexity faced by the programmer
  • critical to keep up with the current rate of evolution in software

The March to Multicore: Moore’s Law
Image removed due to copyright restrictions. Graph of number of transistors versus year. From Hennessy, J. L., D. A. Patterson, and A. C. Arpaci-Dusseau. Computer Architecture: A Quantitative Approach. 4th ed. Amsterdam, The Netherlands: Morgan Kaufmann, 2006. ISBN: 9780123704900.

The March to Multicore: Uniprocessor Performance (SPECint)
From David Patterson
[Graph: SPECint2000 performance (log scale, 1 to 10000) versus year, 1985 to 2007, for the Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4 and Itanium; Alpha 21064, 21164 and 21264; Sparc, SuperSparc and Sparc64; MIPS; HP PA; PowerPC; and AMD K6, K7 and x86-64.]

The March to Multicore: Uniprocessor Performance (SPECint)
● General-purpose unicores have stopped their historic performance scaling, limited by:
  • Power consumption
  • Wire delays
  • DRAM access latency
  • Diminishing returns of more instruction-level parallelism

Power Consumption (watts)
From David Patterson
[Graph: power consumption (watts, log scale, 1 to 1000) versus year, 1985 to 2007, for the same processors.]

Power Efficiency (watts/spec)
[Graph: watts per SPEC (0 to 0.7) versus year, 1982 to 2006, for the same processors.]
Range of a Wire in One Clock Cycle
• 400 mm² die
• From the SIA Roadmap
[Graph: process technology (microns, 0 to 0.28) versus year, 1996 to 2014, with markers for 700 MHz, 1.25 GHz, 2.1 GHz, 6 GHz, 10 GHz and 13.5 GHz clock rates.]

DRAM Access Latency
Images removed due to copyright restrictions.
● Access times are a speed-of-light issue
● Memory technology is also changing
  • SRAMs are getting harder to scale
  • DRAM no longer has the lowest cost per bit
● Power efficiency is an issue here as well
[Graph: performance (log scale, 1 to 1,000,000) versus year, 1980 to 2004: µProc 60%/yr (2X/1.5 yr); DRAM 9%/yr (2X/10 yrs).]

Diminishing Returns
● The ’80s: Superscalar expansion
  • 50% per year improvement in performance
  • Transistors applied to implicit parallelism
    – pipeline processor (10 CPI → 1 CPI)
● The ’90s: The Era of Diminishing Returns
  • Squeaking out the last implicit parallelism
    – 2-way to 6-way issue, out-of-order issue, branch prediction
    – 1 CPI → 0.5 CPI
  • Performance below expectations
  • Projects delayed & canceled
● The ’00s: The Beginning of the Multicore Era
  • The need for Explicit Parallelism

Unicores are on the verge of extinction
Multicores are here
● MIT Raw: 16 cores, since 2002
● Intel Montecito: 1.7 billion transistors, dual-core IA-64
● Intel Pentium D (Smithfield)
● Intel Tanglewood: dual-core IA-64
● Intel Dempsey: dual-core Xeon
● Intel Pentium Extreme: 3.2 GHz dual core
● Intel Tejas & Jayhawk: unicore (4 GHz P4), cancelled
● Intel Yonah: dual-core mobile
● AMD Opteron: dual core
● Sun Olympus and Niagara: 8 processor cores
● IBM Cell: scalable multicore
● IBM Power 6: dual core
● IBM Power 4 and 5: dual cores, since 2001
[Timeline: the products above are placed on a time axis running from 2001 through the half-years 2H 2004 to 2H 2006.]
Multicores are Here
[Graph: number of cores per chip (log scale, 1 to 512) versus year, 1970 to 20??, plotting the 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Athlon, Itanium, Itanium 2, PA-8800, Opteron, Tanglewood, Power4, PExtreme, Power6, Yonah, Opteron 4P, Xeon MP, Xbox360, Cell, Niagara, Broadcom 1480, Raza XLR, Cavium Octeon, Raw, Intel Tflops, Cisco CSR-1, Ambric AM2045 and Picochip PC102.]

Requirements and Outcomes
● Requirements
  • A good programmer with experience
  • Fluent in C
● Outcomes
  • Know the fundamental concepts of parallel programming (both hardware and software)
  • Understand the issues of parallel performance
  • Be able to synthesize a fairly complex parallel program
  • Gain hands-on experience with the IBM Cell processor

The Project
● You proposed the projects
● We selected 7 teams
  • Mainly by the strength of the project proposals
● Seven Great Projects
  • Distributed Real-time Ray Tracer
  • Global Illumination
  • Linear Algebra Pack
  • Molecular Dynamics Simulator
  • Speech Synthesizer
  • Soft Radio
  • Backgammon Tutor
● Project Characteristics
  • Ambitious but accomplishable
  • Important and relevant
  • Opportunity to sizzle
● Get them started ASAP!
(Image courtesy of Sony Computer Entertainment Inc. Used with permission.)

A Note of Caution
● The Cell processor is very new
● It is not an easy architecture to work with
● The tool chain is thin and brittle
● Most of the staff have limited experience with it
● The projects you are doing are of your own making. They aren’t canned exercises that are tried and proven.
● You will face unexpected problems.
● WE ARE ALL IN THIS TOGETHER!!

Grading
● Mini Quizzes: 16%
  • At the beginning of each class day
  • 5 minutes each
● Lab Projects: 24%
● Final Group Project: 60%
Final Competition
● The competition will be decided on
  • Performance
  • Completeness
  • Algorithmic complexity
  • Demo and presentation
● The winning team will
  • Get gift certificates ($150 each)
  • Be invited to the IBM T. J. Watson Research Center for a day
    – Tour of the facilities
    – Present your project

Staff
● Prof. Saman Amarasinghe
  • Interested in languages, compilers and computer architecture
  • Raw Processor (with Prof. Anant Agarwal)
  • StreamIt language
  • SUIF parallelizing compiler
● Dr. Rodric Rabbah
  • Currently a researcher at IBM Watson Research Center
  • Was a research scientist at CSAIL before that
  • Interested in compilers, computer architecture and FPGAs

Guest Lectures
● Dr. Michael Perrone
  • IBM Watson Research Center
  • Expert in Cell architecture and application development
● Prof. Alan Edelman
  • Math and CS; interested in parallel algorithms
● Prof. Arvind
  • Parallel architectures, compilers and languages
● Dr. Bradley Kuszmaul
  • Research scientist at CSAIL working on Cilk
● Mike Acton
  • Professional game developer
● Bill Thies
  • CSAIL PhD candidate
  • Architect of StreamIt

Lecture Organization
Extracting Parallelism
● Implicit
  • Hardware: Superscalar Processors (start of Lecture 3)
  • Compiler: Parallelizing Compilers (Lectures 11 & 12)
● Explicit
  • Languages: StreamIt (Lecture 8), Star-P (Lecture 13), BlueSpec (Lecture 14), Cilk (Lecture 15)
  • Library: Concurrency (Lecture 4)
  • Design Patterns (Lectures 5, 6, 7)

Schedule
Sessions run Monday to Friday in the weeks of Jan 8, Jan 15, Jan 22 and Jan 29, in two time slots: 10:00–10:55 and 11:05–12:00.
● Mondays
  • Jan 8: Lecture 1 (10:00) and Lecture 2 (11:05)
  • Jan 15: Holiday
  • Jan 22: Lecture 11 (10:00) and Lecture 12 (11:05)
  • Jan 29: Lecture 17 (10:00) and Lecture 18 (11:05)
● Lectures
  • Lecture 1: Course Introduction
  • Lecture 2: Introduction to Cell Processor
  • Lecture 3: Introduction to Parallel Architectures
  • Lecture 4: Introduction to Concurrent Programming
  • Lecture 5: Parallel Programming Concepts
  • Lecture 6: Design Patterns for Parallel Programming I
  • Lecture 7: Design Patterns for Parallel Programming II
  • Lecture 8: StreamIt Language
  • Lecture 9: Debugging and Performance Monitoring
  • Lecture 10: Performance Optimizations
  • Lecture 11: Classic Parallelizing Compilers
  • Lecture 12: StreamIt Parallelizing Compiler
  • Lecture 13: Star-P
  • Lecture 14: Synthesizing Parallel Programs
  • Lecture 15: Cilk
  • Lecture 16: Anatomy of a Game
  • Lecture 17: The Raw Experience
  • Lecture 18: The Future
● Recitations
  • Recitation 1: Getting to Know Cell
  • Recitations 2–3: Cell Programming Hands-On
  • Recitation 4: Cell Debugging Tools
  • Recitations 5, 6: Cell Performance Monitoring Tools
● Other sessions
  • Project Reviews
  • Group Presentations
  • Awards & Reception