Research Directions for 21st Century Computer Systems
ASPLOS 2013 Panel

0. Mark Hill: Introduction
1. Kathryn McKinley on NAS Report: The Future of Computing Performance: Game Over or Next Level?
2. Josep Torrellas on CCC Workshops: Advancing Computer Architecture Research (ACAR)
3. Mark Hill on ISAT Workshop: Advancing Computer Systems without Technology Progress
4. Sarita Adve on CCC White Paper: 21st Century Computer Architecture
5. Emmett Witchel unbounded

Impact? $15M NSF XPS (Exploiting Parallelism & Scalability) cites 1 & 4.
Q: What to do to facilitate, transcend, or refute these partially overlapping visions?

The Future of Computing Performance: Game Over or Next Level?
Samuel H. Fuller, Chair
March 22, 2011
Computer Science and Telecommunications Board (CSTB)
National Research Council (NRC)
Thanks to Sam Fuller & Mark Hill

Committee On Sustaining Growth In Computing Performance
Experts Addressed the Problem
• SAMUEL H. FULLER, Analog Devices Inc., Chair
• LUIZ ANDRÉ BARROSO, Google, Inc.
• ROBERT P. COLWELL, Independent Consultant
• WILLIAM J. DALLY, NVIDIA Corporation and Stanford University
• DAN DOBBERPUHL, PA Semi/Apple
• PRADEEP DUBEY, Intel Corporation
• MARK D. HILL, University of Wisconsin–Madison
• MARK HOROWITZ, Stanford University
• DAVID KIRK, NVIDIA Corporation
• MONICA LAM, Stanford University
• KATHRYN S. McKINLEY, University of Texas at Austin
• CHARLES MOORE, Advanced Micro Devices
• KATHERINE YELICK, University of California, Berkeley
Staff
• LYNETTE I. MILLETT, Study Director
• SHENAE BRADLEY, Senior Program Assistant

Executive Summary
1. Computer hardware has transitioned to multicore
2. Dennard scaling of CMOS has broken down
3. Parallelism and locality must be exploited by software
4. Chip power will soon limit multicore scaling

Virtuous Cycle
[Figure labels: doubling of transistors; Devices; Software Innovation; 2x more capable, efficient, cheaper, smaller, …; Software Complexity / Sequential Interface; Hardware Complexity / Sequential Interface]

Breaks in Virtuous Cycle
[Figure labels: doubling of transistors; end of Dennard Scaling; Devices; Software Innovation; 2x more capable, efficient, cheaper, smaller, …; Software Complexity / Sequential Interface; Hardware Complexity / Sequential Interface]

Next Steps
Innovate within and across layers
• Algorithms
• Programming “systems”
• Architecture
• Technology
• Education

Community
No news here? But…
Are we all acting on this knowledge or are we acting business as usual?
Are we thinking beyond next paper to where to create future value?
Denial … Acceptance
Act?

2. Advancing Computer Architecture Research (ACAR)
• Two workshops sponsored by CCC
  o 25 + 19 attendees
• Organizers: J. Torrellas (U Illinois) & M. Oskin (U Wash.)
• Issued a community-wide call for white papers
• Selection committee picked most relevant papers
• Included industry folks
• Also invited DARPA, DOE, NSF program managers
http://www.cra.org/ccc/docs/ACAR_Report_Popular-Parallel-Programming.pdf
http://www.cra.org/ccc/docs/ACAR2-Report.pdf

What We Found
• Data centers and extreme scale computing
  o Energy and power consumption are the key limiters
• Architectures for programmability
  o Performance scaling. Past: no SW changes. Now: extensive SW+HW changes
• Specialized architectures and heterogeneity
  o Ultimate goal: fully automated generation of app-specific HW for programs

What We Found
• End of road for conventional ISA
  o Modern systems are skyscrapers built on the ISA of a bungalow
• Secure, reliable and predictable from the HW up
  o Foundation of computing is breaking apart; malicious parties are exploiting it
• Exploiting emerging technologies
  o Architecture research enables new technologies to enter the market quickly

Discussion Points
• Many directions of research are relevant:
  o Computer systems research is broadening
• Focus on increasing funding pie, not re-distributing it
• Need to create coalitions with other communities:
  o Big data
  o New computing materials and devices
  o Healthcare
  o …
• Need to move away from incrementalism

Advancing Computer Systems without Technology Progress
[Figure: System Capability (log) by decade (80s 90s 00s 10s 20s 30s 40s 50s), with a Fallow Period marked]
Seek ~1000x = two decades of Moore’s Law via four thrusts
The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Approved for Public Release, Distribution Unlimited

A. Spectrum of Hardware Specialization
Metric Ops/mm2 Ops/Watt Time to Soln NRE 1 1 1 1 (domain specific) 1.5 3-5 Progr. Accelerator 3 5-10 Fixed Accelerator 5-10 10 10 (SoC design) 3-5 10 10 10 (SoC design) 10 Normalized to General-Purpose Specialized ISA (domain specific) (app specific) Specialized Mem & Interconnect (monolithic die) Package level integration (multi die: logic,mem,analog) (programming GPP) 2-3 (designing & programming) 2-3 (designing & programming) 5 10+ 10+ (silicon interposer) 1.5 2-3 5

B. Reduce Software Bloat (e.g., matrix multiply)
PHP             9,298,440 ms   51,090x
Python          6,145,070 ms   33,764x
Java              348,749 ms     1816x
C                  19,564 ms      107x
Tiled C            12,887 ms       71x
Vectorized          6,607 ms       36x
BLAS Parallel         182 ms        1
• Can we achieve PHP productivity at BLAS efficiency?

D. Locality-aware Parallelism
• Now: Seek (vast) parallelism
  o e.g., simple, energy efficient cores
• But remote communication >100x cost of compute
[Figure label: = 1200 pJ (24x)]

C. Approximate Computing
Example: second-order differential equation on analog accelerator with digital accelerator.

Workshop Takeaway
• Can Harvest in the “Fallow” Period!
A. HW/SW Specialization/Co-design
B. Reduce SW Bloat
C. Approximate Computing
--------------------------------------------------
~1000x = 2 decades of Moore’s Law!
• D. Systems must exploit LOCALITY-AWARE parallelism
• HILL’s TWO CENTS: Move beyond General-Purpose
  o Systems that do new things, e.g., Kinect
  o Optimizations that help some, e.g., big memory workloads

21st Century Computer Architecture
A Community White Paper, April-May 2012
Mark D. Hill, U Wisconsin (coordinator)
Sarita Adve, U Illinois
David H. Albonesi, Cornell U
David Brooks, Harvard U
Luis Ceze, U Washington
Sandhya Dwarkadas, U Rochester
Joel Emer, Intel/MIT
Babak Falsafi, EPFL
Antonio Gonzalez, Intel/UPC
Mary Jane Irwin, Penn State U
David Kaeli, Northeastern U
Stephen W. Keckler, NVIDIA/U Texas
Christos Kozyrakis, Stanford U
Alvin Lebeck, Duke U
Milo Martin, U Pennsylvania
José F. Martínez, Cornell U
Margaret Martonosi, Princeton U
Kunle Olukotun, Stanford U
Mark Oskin, U Washington
Li-Shiuan Peh, M.I.T.
Milos Prvulovic, Georgia Tech
Steven K. Reinhardt, AMD
Michael Schulte, AMD/U Wisconsin
Simha Sethumadhavan, Columbia U
Guri Sohi, U Wisconsin
Daniel Sorin, Duke U
Josep Torrellas, U Illinois
Thomas F. Wenisch, U Michigan
David Wood, U Wisconsin
Katherine Yelick, UC Berkeley/LBNL
+ Jim Larus & Jeannette Wing gave feedback
+ CCC, Erwin Gianchandani, Ed Lazowska guided process

Technology’s Challenges
Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip
Modest (hidden) transistor unreliability | Increasing transistor unreliability can’t be hidden
Focus on computation over communication | Communication (energy) more expensive than computation
1-time costs amortized via mass market | One-time cost much worse & want specialized platforms
How should architects step up as technology falters?
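
The arithmetic behind "~1000x = 2 decades of Moore’s Law" and behind the Dennard scaling contrast on the Technology’s Challenges slide can be sketched in a few lines of Python. This is a rough illustration only: the two-year doubling period, the 0.7x per-generation figure for the post-Dennard case, and the function names are assumptions made for the example, not numbers or code from the panel.

```python
"""Back-of-the-envelope sketch of the Moore's Law / Dennard scaling arithmetic.

All constants here are illustrative assumptions, not figures from the panel.
"""


def moore_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Capability multiplier if capability doubles every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)


def relative_chip_power(generations: int, power_per_transistor_scale: float) -> float:
    """Chip power after some generations, relative to generation 0.

    Each generation doubles the transistor count (Moore's Law) and multiplies
    the power drawn by each transistor by power_per_transistor_scale.
    """
    transistors = 2.0 ** generations
    power_per_transistor = power_per_transistor_scale ** generations
    return transistors * power_per_transistor


if __name__ == "__main__":
    # Two decades at an assumed two-year doubling period: 2**10 = 1024, i.e. ~1000x.
    print(f"20 years of doubling: {moore_factor(20):.0f}x")

    # Idealized Dennard scaling: power per transistor halves each generation,
    # so doubling the transistor count leaves power per chip constant.
    print(f"Dennard, 5 generations: {relative_chip_power(5, 0.5):.2f}x power")

    # Hypothetical post-Dennard case: power per transistor only drops to 0.7x
    # per generation, so chip power grows 1.4x per generation if every
    # transistor stays active.
    print(f"Post-Dennard, 5 generations: {relative_chip_power(5, 0.7):.2f}x power")
```

Under these assumptions the post-Dennard case compounds quickly, which is one way to read "Can’t repeatedly double power/chip": something other than raw transistor count (parallelism, specialization, energy-first design) has to absorb the difference.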

21st Century Computer Architecture
20th Century | 21st Century
Single-chip in stand-alone computer | Architecture as Infrastructure: Spanning sensors to clouds. Performance plus security, privacy, availability, programmability, …
Performance via invisible instruction level parallelism | Energy First: ● Parallelism ● Specialization ● Cross-layer design
Predictable technologies: CMOS, DRAM, & disks | New technologies (non-volatile memory, near-threshold, 3D, photonics, …). Rethink: memory & storage, reliability, communication
Cross-Cutting: Break current layers with new interfaces

Some Thoughts
[Diagram labels: Architecture, PL, OS, ASPLOS]
??? ASPLOS 2014 ???
Need to step up for agency positions
NSF CCF Division Director Search

5. Emmett Witchel Unbounded

THE 90S SUCKED
JERRY GARCIA DEAD 1995
THE VERVE
THE VERVE PIPE
ARCHITECTURE WAS BORING

MICROARCHITECTURE PROVIDES PERFORMANCE
Intel                             | DEC Alpha
Date   µArch       Clock  Int95   | Date   µArch  Clock  Int95
05/96  Pentium     133    04.2    | 03/96  21064  266    04.3
10/97  Pentium II  266    10.8    | 04/97  21164  500    14.4
09/98  Pentium II  450    17.3    | 09/98  21164  533    16.8
[Figure labels: Architecture; Microarchitecture or Clock rate]
1. Buy machine
2. Wait 18 months
3. Buy next one

LIFE IS BETTER NOW
ARCHITECTURE CHANGES PROVIDE VALUE
Date   µArch         Intel Arch
01/10  Westmere      AES-NI
01/11  Sandy Bridge  Instruction for SHA-1
09/11  Ivy Bridge    RdRand
• VT-x (11/05)
• Extended Page Tables (11/08)
• VT-d (11/08)
• VPID (11/08) (tagged TLB!)
1. Consider app
2. Buy machine
3. Goto 1

HARDWARE + SOFTWARE COOPERATION NECESSARY
The ‘10s belong to ASPLOS
• Security
• Mobile
• Data centers
• Concurrency
• GPU/Accelerator

Kathryn S. McKinley
Kathryn S. McKinley is a Principal Researcher at Microsoft and an Endowed Professor of Computer Science at The University of Texas at Austin. She and her collaborators have produced widely used tools: the DaCapo Java Benchmarks, TRIPS Compiler, Hoard memory manager, MMTk garbage collector toolkit, and Immix garbage collector. Her awards include: NSF CAREER, ASPLOS 2009 Best Paper, 2012 IEEE Top Picks, CACM Research Highlights (2006, 2012), Most Influential OOPSLA Paper from 2002 (awarded 2012), the 2011 ACM SIGPLAN Distinguished Service Award, and the 2012 ACM SIGPLAN Programming Languages Software Award. She has graduated 17 PhD students. She is an IEEE Fellow and ACM Fellow.

Josep Torrellas
Josep Torrellas is a Professor of Computer Science at the University of Illinois at Urbana-Champaign. He is the Director of the Center for Programmable Extreme Scale Computing, and the Director of the Illinois-Intel Parallelism Center (I2PC). He has also been a Willett Faculty Scholar and led the OpenSPARC Center of Excellence. He is the past Chair of the IEEE Technical Committee on Computer Architecture, and currently serves as a Council Member of CRA's Computing Community Consortium. He is a Fellow of IEEE and ACM. He has made many technical contributions in the areas of shared-memory parallel computer architecture, low-power design, hardware reliability, and software dependability. He has graduated 30 Ph.D. students, who are now leaders in academia and industry. He is currently working on the Bulk Multicore Architecture, and on the DARPA-funded Runnemede Extreme Scale Architecture, both in collaboration with Intel.

Mark Hill
Mark D. Hill (www.cs.wisc.edu/~markhill) is a professor in both the computer sciences department and the electrical and computer engineering department at the University of Wisconsin–Madison, where he also co-leads the Wisconsin Multifacet (www.cs.wisc.edu/multifacet/) project with David Wood. His research interests include parallel computer system design, memory system design, computer simulation, deterministic replay and transactional memory. He earned a PhD from the University of California, Berkeley. He is an ACM Fellow and a Fellow of the IEEE.

Sarita Adve
Sarita Adve is Professor of Computer Science at the University of Illinois at Urbana-Champaign. Her research interests are in computer architecture and systems, parallel computing, and power and reliability-aware systems. Her honors include the Anita Borg Institute Women of Vision award in innovation, the ACM SIGARCH Maurice Wilkes award, the University Scholar recognition by the University of Illinois, and an Alfred P. Sloan Research Fellowship. She is a fellow of the ACM and the IEEE. She serves on the boards of the Computing Research Association and ACM SIGARCH. She received her Ph.D. in Computer Science from the University of Wisconsin-Madison in 1993.

Emmett Witchel
Emmett Witchel is an associate professor in computer science at The University of Texas at Austin. He and his group are interested in operating systems, security, and architecture. Most of his current research is about secure systems, GPU systems, and concurrent systems. He received his doctorate from MIT in 2004.