Multicore Chips and Parallel Programming
Mary Hall
Dept. of Computer Science and Information Sciences Institute
March 18, 2008 SSE Meeting

The Multicore Paradigm Shift: Technology Drivers

Part 1: Technology Trends
What to do with all these transistors?
• Key ideas:
  – Movement away from increasingly complex processor design and faster clocks
  – Replicated functionality (i.e., parallel) is simpler to design
  – Resources more efficiently utilized
  – Huge power management advantages

The Architectural Continuum
• Supercomputer: IBM BG/L
• Commodity Server: Sun Niagara
• Embedded: Xilinx Virtex 4

Multicore: Impact on Software
Consequences:
  – Individual processors will no longer get faster. At first, they might get a little slower.
  – Today’s software may not perform as well on tomorrow’s hardware as written.
• And forget about adding capability!
The very future of the computing industry demands successful strategies for applications to exploit parallelism across cores!

The Multicore Paradigm Shift: Computing Industry Perspective
We are at the cusp of a transition to multicore, multithreaded architectures, and we still have not demonstrated the ease of programming the move will require… I have talked with a few people at Microsoft Research who say this is also at or near the top of their list [of critical CS research problems].
Justin Rattner, CTO, Intel Corporation

The Rest of this Talk
• Convergence of high-end, conventional and embedded computing
  – Application development and compilation strategies for high-end (supercomputers) are now becoming important for the masses
• Why?
  – Technology trends (Motivation)
• Looking to the future
  1. Automatically generating parallel code is useful, but insufficient.
  2. Parallel computing for the masses demands better parallel programming paradigms.
  3. Compiler technology will become increasingly important to deal with a diversity of optimization challenges… and must be engineered for managing complexity and adapting to new architectures.
  4. Potential to exploit vast machine resources to automatically compose applications and systematically tune application performance.
  5. New tunable library and component technology.

1. Automatic Parallelization
[Figure from Hall et al., “Maximizing Multiprocessor Performance with the SUIF Compiler”, IEEE Computer, Dec. 1996.]
• Old approaches:
  – Limited to loops and array computations
  – Difficult to find sufficient granularity (parallel work between synchronization)
  – Success from fragile, complex software
• New ideas in this area:
  – Finer granularity of parallelism -- more plentiful
  – Combine with hardware support (e.g., speculation and multithreading)

2. Parallel Programming State of the Art
Three dominant classes of applications

| Domains | Appl. Characteristics | Programming Paradigms |
| Scientific Computing | Very large arrays representing simulation region, loops, data parallel | MPI dominant. Also, OpenMP, PGAS, Grids & distributed computing |
| Databases | Queries over large data sets, often distributed | Query languages like SQL |
| Systems and Embedded Software | Fine-grain threads, small number of processors | Low-level threading such as Pthreads |

Domain-specific, intellectually challenging and low-level programming models not suitable for the masses.

2. New Parallel Programming Paradigms
• Transactional memory
  – Section of code executes atomically with subsequent commit or rollback
  – Programming model + hardware support
• Streams and data-parallel models
  – Data streams describe the flow of data
  – Well-suited for certain applications and hardware (IBM Cell, GPUs)
• Domain-specific languages and libraries
  – Parallelism implicit within implementation
Different applications and users demand different solutions. Convergence unlikely. Architecture independence?

3. Engineering a Compiler
• Compiler research will play a crucial role in achieving performance and programmability of multi-core hardware.
• What is the state of compilers today?
  – Roughly 5 year lag between introducing a new architecture and a robust compiler
  – Many interesting new architectures fail in the marketplace due to inadequate software tools
• Today’s compilers are complex and monolithic
  – SUIF has ~500K LOC, Open64 has ~12M LOC
The best research ideas do not always make it into practice

3. A New Kind of “Compiler”
Traditional view:
[Diagram labels: code, Batch Compiler, input data]

3 & 4. Performance Tuning “Compiler”
[Diagram labels: code, transformation script(s), search script(s), input data (characteristics), Experiments Engine, Code Translation]

4. Auto-tuner
[Diagram labels: code, transformation script(s), search script(s), input data (characteristics), Experiments Engine, Code Translation]

Heterogeneous: Additional Complexity
[Diagram labels: Device Type 1, Device Type 2, Device Type 3, Device Type 4, Memory]
• Staging data to/from global memory
• Managing data movement and synchronization
• Partitioning: Where to execute?
Other:
• Utilizing highly tuned libraries
• Differences in programming models (GPP + FPGA is extreme example)

5. Libraries and Component Technology
Traditional View
• Interface: Provides/Requires
• Code (source or binary)
• Data Description: Types, Sizes
Expanded View
• Interface: Abstract Provides/Requires
• Interface: Device Dependencies
• Partial Code (source or tunable binary)
• Code Generator
• Performance: Device, Data Features
• Data Description: Types, Sizes
• Data Description: Map Features to Optimization
Support for automatic selection, tuning, scheduling, etc.

Summary
• Parallel computing is everywhere!
  – And we need software tools
  – Can we find some common ground?
• Strategies
  – Automatic parallelization
  – Libraries and domain-specific tools that hide parallelism; component technology
  – New programming languages
  – Auto-tuners to “test” alternative solutions
• General approach to solving challenges
  – Education: CS503, Parallel Programming
  – Organize the community to support incremental LONG TERM development.
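
Illustration: auto-tuners that “test” alternative solutions
The auto-tuner strategy named in the Summary can be made concrete with a small sketch. This is not the system shown in the talk; it is a minimal Python illustration under stated assumptions. The names make_variant, time_variant and autotune are hypothetical, the "transformation" is just a choice of block size for a blocked sum, and the "search" is exhaustive. It is meant only to show the roles of code translation (generate a variant), the experiments engine (run and time it on representative input data) and the search script (pick the best variant).

```python
import time


def make_variant(block):
    """Code translation (hypothetical): build a blocked-sum variant
    specialized for one block size."""
    def blocked_sum(data):
        total = 0
        for start in range(0, len(data), block):
            total += sum(data[start:start + block])
        return total
    return blocked_sum


def time_variant(fn, data, repeats=3):
    """Experiments engine: run one variant several times and return
    (fastest time in seconds, computed result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = fn(data)
        best = min(best, time.perf_counter() - t0)
    return best, result


def autotune(data, candidates):
    """Search script: exhaustive search over candidate block sizes.
    Variants that change the answer are rejected before timing counts."""
    reference = sum(data)
    timings = {}
    for block in candidates:
        seconds, result = time_variant(make_variant(block), data)
        if result != reference:
            continue
        timings[block] = seconds
    best_block = min(timings, key=timings.get)
    return best_block, timings


if __name__ == "__main__":
    data = list(range(200_000))  # stands in for representative input data
    best, timings = autotune(data, [16, 256, 4096, 65536])
    print("best block size:", best)
```

The winning block size depends on the machine and the input, which is the point of tuning empirically rather than fixing the choice in a batch compiler. A realistic tuner would likely replace the exhaustive loop with a guided search and apply real code transformations instead of a single parameter.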