Exascale Computing Initiative (ECI)
Steve Binkley, DOE/ASCR
Bob Meisner, NNSA/ASC
April 1, 2015

Exascale Applications Respond to DOE/NNSA Missions in Discovery, Design, and National Security
• Scientific Discovery
– Mesoscale materials and chemical sciences
– Improved climate models with reduced uncertainty
• Engineering Design
– Nuclear power reactors
– Advanced energy technologies
– Resilient power grid
– Advanced manufacturing
• National Security
– Stockpile stewardship
– Real-time cybersecurity and incident response
(On the original slide, blue bold text indicates planned or existing exascale application projects.)

Stockpile Stewardship Challenges
• Nuclear stockpile: safety, surety, reliability, robustness
• Weapons science
• Non-proliferation and nuclear counter-terrorism
[Figure: thermonuclear burn of a plasma (p, D, T, He-3, He-4), coupling a burning plasma, radiation (photons), and hydrodynamics through atomic physics: Coulomb collisions, Debye screening, quantum interference and diffraction, and spontaneous and stimulated emission; characteristic timescales Δτ_Burn ~ 10^-12 s and Δτ_ee ~ 10^-15 s.]

Mission: Extreme Scale Science for the Next Generation of Scientific Innovation
• DOE's mission is to push the frontiers of science and technology to:
– Enable scientific discovery
– Provide state-of-the-art scientific tools
– Plan, implement, and operate user facilities
• The next generation of advancements will require extreme scale computing
– 1,000X the capabilities of today's petaflop computers with a similar size and power footprint
• Extreme scale computing, however, cannot be achieved by a "business-as-usual" evolutionary approach
• Extreme scale computing will require major novel advances in computing technology: exascale computing
Exascale computing will underpin future scientific innovations.

Exascale Computing Initiative
• Top-line messages:
– This effort is driven by the need for significant improvements in computer performance to enable future scientific discoveries.
– The Department is developing a plan that will result in the deployment of exascale-capable systems by early in the next decade.
– The budget request preserves options consistent with that timeline and keeps the U.S. globally competitive in high performance computing.
– It is important to emphasize that this is a major research and development effort to address and influence significant changes in computing hardware and software, and in our ability to use computers for scientific discovery and engineering. It is not a race to deploy the first exaflops machine.

Exascale Challenges and Issues
• Four primary challenges must be overcome
– Parallelism / concurrency
– Reliability / resiliency
– Energy efficiency
– Memory / storage
• Productivity issues
– Managing system complexity
– Portability / generality
• System design issues
– Scalability
– Time to solution
– Efficiency
• Extensive exascale studies: US (DOE, DARPA, …), Europe, Japan, …
(The resiliency challenge compounds with scale; see the sketch below.)
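Why resiliency gets harder at scale: if each of N independent components fails on average once every M hours, the system as a whole sees a failure roughly every M/N hours. A minimal back-of-the-envelope sketch in Python; the per-node MTBF and the node counts below are illustrative assumptions, not ECI figures:

```python
# Illustrative arithmetic for the resiliency challenge: assuming independent
# components with exponentially distributed lifetimes, the system-level mean
# time between failures (MTBF) is roughly the component MTBF divided by the
# component count. All numbers below are assumptions, not ECI figures.

def system_mtbf_hours(component_mtbf_hours: float, n_components: float) -> float:
    """System MTBF for n_components independent, identical components."""
    return component_mtbf_hours / n_components

COMPONENT_MTBF_HOURS = 5.0e6  # assumed per-node MTBF (~570 years)

# Petascale-era node counts vs. a hypothetical exascale-era count.
for n_nodes in (2.0e4, 1.0e5, 1.0e6):
    mtbf = system_mtbf_hours(COMPONENT_MTBF_HOURS, n_nodes)
    print(f"{n_nodes:10.0e} nodes -> system MTBF ~ {mtbf:7.1f} hours")

# Expected output:
#      2e+04 nodes -> system MTBF ~   250.0 hours
#      1e+05 nodes -> system MTBF ~    50.0 hours
#      1e+06 nodes -> system MTBF ~     5.0 hours
```

Under these assumptions, even very reliable parts yield whole-system failures every few hours at exascale component counts, which is why self-diagnostics and self-healing appear among the target system characteristics later in this deck.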
Impact of No ECI: What's at Stake?
• Power restrictions will limit the performance of future computing systems
– Without ECI, industry will build an energy- and footprint-inefficient point solution
• Declining US leadership in science, engineering, and national security
– HPC is the foundation of the nation's nuclear security and economic leadership
– International R&D investment is already surpassing that of the US
– Asia and Europe: China's Tianhe-2 is #1 on the HPL (Linpack) list; the EU is developing ARM-based systems through Mont-Blanc
• Increasing dependence on foreign technology
– Other countries could enforce export controls against us
– There would be unacceptable cybersecurity and computer supply-chain risks

DOE Exascale Computing Initiative (ECI) R&D Goals
• Develop a new era of computers: exascale computers
– Sustained 10^18 operations/second, with the storage required for a broader range of mission-critical applications
– Create extreme-scale computing: approximately 1,000X the performance of today's computers within a similar size, cost, and power footprint
– Foster a new generation of scientific, engineering, and large-data applications
• Create dramatically more productive systems
– Usable by a wide variety of scientists and engineers, for more problem areas
– Simplify efficiency and scalability for shorter time to solution and science results
• Develop marketable technologies
– Set industry on a new trajectory of progress
– Exploit economies of scale and the trickle-bounce effect
• Prepare for "beyond exascale"

What is Exascale Computing?
• What exascale computing is not
– An exaflops Linpack-benchmark computer
– Just a billion floating-point arithmetic units packaged together
• What exascale computing is
– 1,000X the performance of a "petaflop" system: exaflops sustained performance on complex, real-world applications
– Similar power and space requirements to a petaflops computer
– High programmability, generality, and performance portability

Key Performance Goals for an Exascale Computer (ECI)
• Performance: sustained 1–10 ExaOPS (ExaOPS = 10^18 operations per second)
• Power: 20 MW
• Cabinets: 200–300
• System memory: 128 PB – 256 PB
• Reliability: consistent with current platforms
• Productivity: better than or consistent with current platforms
• Scalable benchmarks: target speedup over "current" systems …
• Throughput benchmarks: target speedup over "current" systems …

Exascale Target System Characteristics
• 20 pJ per average operation
• Billion-way concurrency (current systems have million-way)
• An ecosystem that supports new application development and collaborative work, enables transparent portability, and accommodates legacy applications
• High reliability and resilience through self-diagnostics and self-healing
• Programming environments (high-level languages, tools, …) that increase scientific productivity

Exascale Computing: We Need to Reinvent Computing
• The traditional path of 2X performance improvement every 18 months has ended
– For decades, Moore's Law plus Dennard scaling provided more, faster transistors in each new process technology
– This is no longer true: we have hit a power wall
– The result is unacceptable power requirements for increased performance
• We cannot procure an exascale system based on today's or projected future commodity technology
– Existing HPC solutions cannot be usefully scaled up to exascale
– Energy consumption would be prohibitive (~300 MW); see the worked arithmetic below
• Exascale will require partnering with the U.S. computing industry to chart the future
– Industry is at a crossroads and is open to new paths
– The time is right to push energy efficiency into the marketplace
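The 20 pJ and ~300 MW figures follow from simple energy arithmetic: at 10^18 operations per second, a 20 MW budget allows 20 MW / 10^18 ops/s = 20 pJ per operation, whereas naively scaling a roughly 3 GF/W petascale-era system to an exaflop would draw on the order of 300 MW. A small sketch of that arithmetic; the Titan figures are taken from the comparison table that follows, and everything else is unit conversion:

```python
# Back-of-the-envelope arithmetic behind the 20 pJ/operation target and the
# ~300 MW "business as usual" projection. The Titan figures come from the
# predecessor-computer comparison that follows; the rest is unit conversion.

EXA_OPS_PER_SEC = 1.0e18   # sustained exascale rate (1 ExaOPS)
POWER_GOAL_W = 20.0e6      # ECI power goal: 20 MW

# Energy available per operation under the exascale power goal.
joules_per_op = POWER_GOAL_W / EXA_OPS_PER_SEC
print(f"Energy budget per operation: {joules_per_op * 1e12:.0f} pJ")  # 20 pJ

# Scaling a Titan-class system (27.11 PF peak at ~9 MW) straight to 1 exaflop.
TITAN_PEAK_FLOPS = 27.11e15
TITAN_POWER_W = 9.0e6
flops_per_watt = TITAN_PEAK_FLOPS / TITAN_POWER_W           # ~3 GF/W
scaleup_power_mw = (1.0e18 / flops_per_watt) / 1.0e6
print(f"Naive 1-exaflop scale-up: ~{scaleup_power_mw:.0f} MW")  # ~332 MW
```

Meeting the 20 MW goal at an exaflop implies roughly 50 GF/W, well over an order of magnitude beyond petascale-era efficiency, which is why the deck frames exascale as reinventing computing rather than scaling it up.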
Exascale vs. Predecessor Computers

Parameter | Sequoia (CPU) | Titan (CPU-GPU) | Summit & Sierra (CPU-GPU) | Exascale
Accepted | 2013 | 2013 | 2018 | –
Power (MW) | 8 | 9 | 10 | ~20
Peak performance (PF) | 20.13 | 27.11 | 150 | > 1,000
Linpack performance (PF) | 17.17 | 17.59 | – | –
Cabinets | 96 | 200 | 192 | > 200
Nodes | 98,304 | 18,688 | 3,500 | TBD
System memory (TB) | 1,573 | 710 | 2,100 | > 128,000

ECI Strategy
• Integrate applications, acquisitions, and research and development
• Exploit the co-design process, driven by the full application workflow
• Develop exascale software stacks
• Partner with and fund vendors to transition research to the product space
• Collaborate with other government agencies and other countries, as advantageous

Partnership with Industry is Vital
• We need industry involvement
– We don't want one-off, stove-piped solutions that are obsolete before they're deployed
– We need continued "product" availability and upgrade potential beyond the lifetime of this initiative
• Industry needs us
– Its business model obligates industry to optimize for profit and beat competitors
– Internal investments are heavily weighted towards near-term, evolutionary improvements with a small margin over competitors
– Funding for far-term technology is limited in dollars and constrained in scope
• How do we impact industry?
– Work with companies that have strong internal advocates
– Fund development and demonstration of far-term technologies that clearly show potential as future mass-market products (or mass-market components of families of products)*
– Corollary: do not fund product development
– Industry has demonstrated that it will incorporate promising technologies into future product lines*
(* Industrial contractor, private communication.)

DOE Progress Towards Exascale
• FY2011: MOU between SC and NNSA for the coordination of exascale activities; exascale co-design centers funded; Request for Information on critical and platform technologies
• FY2012: Programming environments (X-Stack); FastForward 1: vendor partnerships on critical component technologies
• FY2013: Exascale strategy plan delivered to Congress; Operating System / Runtime (OS/R) research; DesignForward 1: vendor partnerships on critical system-level technologies; meeting with Secretary Moniz: "go get a solid plan with defendable cost"
• FY2014: Meetings with HPC vendors to validate the ECI timeline and update exascale plans and costs; established the Nexus/Plexus lab structure to determine software plans and costs; FastForward 2: exascale node designs; external review of the "Exascale Preliminary Project Design Document (EPPDD)"
• FY2015: DesignForward 2: conceptual designs of exascale systems; release to ASCAC of the "Preliminary Conceptual Design for an Exascale Computing Initiative"; generate requirements for exascale systems to be developed and deployed in FY2023; develop and release FOAs and RFPs for funding in FY2016
• FY2016: Initiate the Exascale Computing Initiative (ECI)

Schedule Baseline
[Figure: ECI schedule baseline chart.]

Exploit the Co-Design Process
• Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx)
– Director: Timothy Germann (LANL) – http://www.exmatex.org
• Center for Exascale Simulation of Advanced Reactors (CESAR)
– Director: Andrew Siegel (ANL) – https://cesar.mcs.anl.gov
• Center for Exascale Simulation of Combustion in Turbulence (ExaCT)
– Director: Jacqueline Chen (SNL) – http://exactcodesign.org
Current Partnerships with Vendors (jointly funded by SC & NNSA)
• Fast Forward Program – node technologies
– Phase 1: two-year contracts, started July 1, 2012 ($64M)
– Phase 2: two-year contracts, started Fall 2014 ($100M)
– Performers: AMD, Cray, IBM, Intel, NVIDIA
– Project goals & objectives: initiate partnerships with multiple companies to accelerate the R&D of critical node technologies and designs needed for extreme-scale computing; fund technologies targeted for productization in the 5–10 year timeframe
• Design Forward Program – system technologies
– Phase 1: two-year contracts, started Fall 2013 ($23M)
– Phase 2: two-year contracts, started Winter 2015 ($10M)
– Performers: AMD, Cray, IBM, Intel, NVIDIA
– Project goals & objectives: initiate partnerships with multiple companies to accelerate the R&D of interconnect architectures and conceptual designs for future extreme-scale computers; fund technologies targeted for productization in the 5–10 year timeframe

FY 2016 ECI Cross-Cut (in $K)

Program | FY 2015 Enacted | FY 2016 Request | FY 2016 vs FY 2015
NNSA ASC: Advanced Technology Development and Mitigation | 50,000 | 64,000 | +14,000
SC ASCR: Mathematical, Computational, and Computer Sciences Research | 41,000 | 43,511 | +2,511
SC ASCR: High Performance Computing and Network Facilities | 50,000 | 134,383 | +84,383
SC BER | -- | 18,730 | +18,730
SC BES | 8,000 | 12,000 | +4,000
SC Total | 99,000 | 208,624 | +109,624
Exascale Total | 149,000 | 272,624 | +123,624

ECI Major Risks
• Maintaining strong leadership and commitment from the US government
• Achieving the extremely challenging power and productivity goals
• Decreasing reliability as power efficiency and system complexity/concurrency increase
• Vendor commitment and stability, and deployment of the developed technology

Summary
• Leadership in high-performance computing (HPC) and large-scale data analysis will advance national competitiveness in a wide array of strategic sectors, including basic science, national security, energy technology, and economic prosperity.
• The U.S. semiconductor and HPC industries have the ability to develop the necessary technologies for an exascale computing capability early in the next decade.
• An integrated approach to the development of hardware, software, and applications is required to develop exascale computers.
• ECI's goal is to deploy two capable exascale computing systems.

END