DoD HPC Modernization Program & Move Toward Emerging Architectures
Tom Dunn
Naval Meteorology & Oceanography Command
20 November 2014

• HPC RECENT TRENDS (per Top500 list)
• RECENT 2014 DOD ACQUISITIONS
• EXPECTED PROCESSOR COMPETITION
• ONWARD TOWARD EXASCALE

Navy DoD Supercomputing Resource Center
Peak Computational Performance (Teraflops) – Estimates Follow Moore's Law (~2x every 2 yrs)
- 1997 – 0.3 TFs
- 2001 – 8.4 TFs
- 2004 – 32 TFs
- 2006 – 58 TFs
- 2008 – 226 TFs
- 2012 (Dec) – 954 TFs
- 2014 (Jul) – 2,556 TFs
- 2015 (Jul) – 5,760 TFs est.
- 2017 (Jul) – 10,000 TFs est.

Navy DSRC Capabilities
• One of the most capable HPC centers in the DoD and the nation
• Chartered as a DoD Supercomputing Center in 1994
• Computational performance approximately doubles every two years; currently 2,556 teraflops (see the sketch below)
• Systems reside on the Defense Research and Engineering Network (DREN) with 10 Gb connectivity (as of 19 Dec 2013)
• 15% of Navy DSRC's computational and storage capacity is reserved for CNMOC activities' operational use
• R&D and CNMOC Ops are placed in separate system partitions and queues
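The "doubles approximately every two years" trend above can be extrapolated directly. Below is a minimal sketch of that arithmetic, assuming a pure 2x-every-2-years growth from the July 2014 figure in the peak-performance list; the briefing's own 2015 and 2017 estimates reflect planned acquisitions and grow somewhat faster than this simple trend.

```python
# Minimal sketch: extrapolating Navy DSRC peak performance under a pure
# "2x every 2 years" assumption. The July 2014 baseline (2,556 TFLOPS) is
# taken from the briefing; the names and structure here are illustrative.
BASE_YEAR = 2014.5             # July 2014
BASE_TFLOPS = 2_556.0          # measured peak from the briefing
DOUBLING_PERIOD_YEARS = 2.0    # Moore's-law-like doubling period

def projected_peak_tflops(year):
    """Peak performance projected by doubling every two years."""
    return BASE_TFLOPS * 2 ** ((year - BASE_YEAR) / DOUBLING_PERIOD_YEARS)

for year, label in ((2015.5, "Jul 2015"), (2017.5, "Jul 2017")):
    print(f"{label}: ~{projected_peak_tflops(year):,.0f} TFLOPS")
```

Run as-is, this prints roughly 3,600 TFLOPS for July 2015 and 7,200 TFLOPS for July 2017, so the acquisition-driven estimates of 5,760 and 10,000 TFLOPS sit above the simple trend line.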
Top500® Trend Charts
• Top500® Systems by Architecture, June 2006–June 2014
• Number of CPUs in the Top500® Systems by Architecture Type, June 2006–June 2014
• Number of Systems in the Top500® Utilizing Co-Processors or Accelerators, June 2009–June 2014
• Number of Systems in the Top500® by Co-Processor or Accelerator Type, June 2009–June 2014
• Number of Cores in the Top500® by Co-Processor or Accelerator Type, June 2011–June 2014
• Number of Cores in the June 2014 Top500® by CPU Manufacturer

TOP 500 SUPERCOMPUTER LIST (JUNE 2014) BY OEM
Supplier           Top 500 systems
CRAY INC                 51
DELL                      8
HEWLETT PACKARD         182
IBM                     176
SGI                      19
Total (listed)          436
Other suppliers          64

High Performance Computing Modernization Program 2014 HPC Awards (February 2014)

Air Force Research Lab (AFRL) DSRC, Dayton, OH
Cray XC30 system (Lightning)
- 1,281 teraFLOPS
- 56,880 compute cores (2.7 GHz Intel Ivy Bridge)
- 32 NVIDIA Tesla K40 GPGPUs

Navy DSRC, Stennis Space Center, MS
Cray XC30 (Shepard)
- 813 teraFLOPS
- 28,392 compute cores (2.7 GHz Intel Ivy Bridge)
- 124 hybrid nodes, each consisting of 10 Ivy Bridge cores and a 60-core Intel Xeon Phi 5120D
- 32 NVIDIA Tesla K40 GPGPUs
Cray XC30 (Armstrong)
- 786 teraFLOPS
- 29,160 compute cores (2.7 GHz Intel Ivy Bridge)
- 124 hybrid nodes, each consisting of 10 Ivy Bridge cores and a 60-core Intel Xeon Phi 5120D

High Performance Computing Modernization Program 2014 HPC Awards (September 2014)

Army Research Lab (ARL) DSRC, Aberdeen, MD
Cray XC40 system
- 3.77 petaFLOPS
- 101,312 compute cores (2.3 GHz Intel Xeon Haswell)
- 32 NVIDIA Tesla K40 GPGPUs
- 411 TB memory
- 4.6 PB storage

Army Engineer Research and Development Center (ERDC) DSRC, Vicksburg, MS
SGI ICE X system
- 4.66 petaFLOPS
- 125,440 compute cores (2.3 GHz Intel Xeon Haswell)
- 32 NVIDIA Tesla K40 GPGPUs
- 440 TB memory
- 12.4 PB storage

High Performance Computing Modernization Program 2014/2015 HPC Awards

Air Force Research Lab (AFRL) DSRC, Dayton, OH
FY15 funded; OEM and contract award TBD
- 100,000+ compute cores
- 3.5–5.0 petaFLOPS

Navy DSRC, Stennis Space Center, MS
FY15 funded; OEM and contract award TBD
- 100,000+ compute cores
- 3.5–5.0 petaFLOPS

ECMWF (Top500 list, June 2014)
2 Cray XC30 systems, each with:
- 81,160 compute cores (2.7 GHz Intel Ivy Bridge)
- 1,796 teraFLOPS

NOAA NWS/NCEP Weather & Climate Operational Supercomputing System (WCOSS)
Phase I: 2 IBM iDataPlex systems, each with:
- 10,048 compute cores (2.6 GHz Intel Sandy Bridge)
- 213 teraFLOPS
Phase II addition (Jan 2015): 2 IBM NeXtScale systems, each with:
- 24,192 compute cores (2.7 GHz Intel Ivy Bridge)
- 585 teraFLOPS

UK Meteorological Office
Current systems:
- IBM Power 7 system: 18,432 compute cores (3.836 GHz), 565 teraFLOPS
- IBM Power 7 system: 15,360 compute cores (3.876 GHz), 471 teraFLOPS
27 Oct 2014 announcement (128M contract):
- 2 Cray XC40 systems (Intel Xeon Haswell initially)
- >13 times faster than the current systems, with a total of 480,000 compute cores
- Phase 1a: replace the Power 7s by Sep 2015
- Phase 1b: extend both systems to their power limit by Mar 2016
- Phase 1c: add one new system by Mar 2017

Expected Near-Term HPC Processor Options
• 2016 – Intel and ARM (Cray has ARM in-house for testing)
• 2017 – Intel, ARM, and IBM Power 9 (with closely coupled NVIDIA GPUs)

DoD Applications & Exascale Computing
• General external impression
  - In the 2024 timeframe, DoD will have no requirement for a balanced exascale supercomputer (untrue)
  - DoD should not be a significant participant in exascale planning for the U.S. (untrue)
• Reality
  - DoD has compelling coupled multi-physics problems that will require more tightly integrated resources than will be technologically possible in the 2024 timeframe
  - DoD has many other use cases that will benefit from the power efficiencies and novel technologies generated by the advent of exascale computing

HPCMP & 2024 DoD Killer Applications
• HPCMP categorizes its user base into 11 Computational Technology Areas (CTAs)
• Climate Weather Ocean (CWO) is one of the 11 CTAs
• Dr. Burnett (CNMOC TD) is the DoD HPCMP CWO CTA leader
• Each CTA leader was tasked in FY14 to project Killer Apps in their CTA
• Dr. Burnett's CWO CTA analysis is led by Lockheed Martin
• Primary focus is on HYCOM, but the analysis includes NAVGEM and ESPC
• Expect follow-on FY15 funding
• Develop appropriate Kiviat diagrams (example to follow)
• NRL Stennis is part of an ONR-sponsored NOPP project starting in FY14 to look at attached processors (i.e., GPGPUs and accelerators) for HYCOM+CICE+WW3

Relevant Technology Issues
• Classical computing advances may stall in the next 10 years
  - 22 nm: feature size for the latest processors
  - 14 nm: anticipated feature size in 2015
  - 5–7 nm: forecast limit for classical methods
  - Recent 3D approaches are currently used and dense 3D approaches are contemplated, but both have limitations
• Mean time between failures (MTBF) will decrease dramatically
  - Petascale: hours to days
  - Exascale: minutes
• Data management exascale hurdles
• Power management exascale hurdles

Relevant Software Issues
• The gap between intuitive coding (i.e., code readily relatable to the domain science) and high-performance coding will increase
• The underpinnings of architectures will change more rapidly than codes can be refactored
• Parallelism of the underlying mathematics will become asymptotic (at some point) despite the need to scale to millions, if not billions, of processing cores
• Current parallel code is based (in general) on synchronous communications; however, asynchronous methods may be necessary to overcome technology issues (see the sketch after this list)
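The last bullet above contrasts synchronous and asynchronous communication. The sketch below shows the two patterns side by side, assuming a two-rank MPI job and using mpi4py with NumPy buffers; the briefing does not prescribe a library or language, so mpi4py and the variable names here are illustrative choices only.

```python
# Minimal sketch contrasting blocking (synchronous-style) and non-blocking
# (asynchronous-style) point-to-point exchange with mpi4py.
# Run with: mpiexec -n 2 python exchange_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                           # assumes exactly two ranks

send_buf = np.full(4, rank, dtype='d')
recv_buf = np.empty(4, dtype='d')

# Blocking exchange: each call completes before the next statement runs,
# so no local computation can overlap the communication.
if rank == 0:
    comm.Send(send_buf, dest=peer)
    comm.Recv(recv_buf, source=peer)
else:
    comm.Recv(recv_buf, source=peer)
    comm.Send(send_buf, dest=peer)

# Non-blocking exchange: post both transfers, overlap them with independent
# work, then wait -- the pattern asynchronous methods rely on to hide latency.
requests = [comm.Isend(send_buf, dest=peer),
            comm.Irecv(recv_buf, source=peer)]
overlapped_work = float(send_buf.sum())   # stands in for useful computation
MPI.Request.Waitall(requests)

print(f"rank {rank}: received {recv_buf}, overlapped work = {overlapped_work}")
```

The non-blocking form is the building block for the asynchronous methods mentioned above: transfers are posted early and progress while the processor keeps computing.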
Path Forward (Deliverables) [cont.]
• Kiviat diagram conveying system architecture requirements for each impactful advent (a plotting sketch appears at the end of this section)
  - Example: Future Computational Requirements for Hypersonic Flight Simulation
  - Axes: PetaFLOPs; I/O bandwidth (terabytes/s); job duration (weeks); disk capacity (petabytes); memory capacity (petabytes); memory BW (petabytes/s); interconnect BW (petabits/s); 1/(interconnect latency) (1/microseconds)
  - Systems plotted: Spirit (1.5 PF reference system); X-51 1-minute flight sim; SR-72 1-minute flight sim; exascale reference

March Toward Exascale Computing
• Dept of Energy target for exascale in 2024
• Japan target for exascale in 2020 (with $1B in government assistance)
• China target for exascale now in 2020 (originally 2018)
• HPCMP's systems expected in 7 or 8 years – 100 petaflops
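The Kiviat-diagram deliverable described under "Path Forward" can be drafted with standard plotting tools. Below is a minimal sketch using matplotlib polar axes; the axis and series names are taken from the slide, while every numeric value and the radial scaling are placeholders, since the briefing does not supply the underlying data.

```python
# Minimal sketch of a Kiviat (radar) diagram of system architecture
# requirements. Axis and series names come from the briefing; the values
# below are placeholders only and carry no meaning.
import numpy as np
import matplotlib.pyplot as plt

axes_labels = [
    "PetaFLOPs", "I/O bandwidth (TB/s)", "Job duration (weeks)",
    "Disk capacity (PB)", "Memory capacity (PB)", "Memory BW (PB/s)",
    "Interconnect BW (Pb/s)", "1/(interconnect latency) (1/us)",
]
systems = {                                   # placeholder values only
    "Spirit: 1.5 PF reference system": [1, 1, 1, 1, 1, 1, 1, 1],
    "X-51: 1 minute flight sim":       [2, 2, 1, 2, 2, 2, 2, 2],
    "SR-72: 1 minute flight sim":      [3, 2, 2, 3, 3, 2, 3, 2],
    "Exascale reference":              [3, 3, 2, 3, 3, 3, 3, 3],
}

n = len(axes_labels)
angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
angles += angles[:1]                          # close the polygon

fig, ax = plt.subplots(subplot_kw={"polar": True})
for name, values in systems.items():
    data = values + values[:1]
    ax.plot(angles, data, label=name)
    ax.fill(angles, data, alpha=0.15)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(axes_labels, fontsize=7)
ax.set_title("Future Computational Requirements (placeholder data)")
ax.legend(loc="upper right", bbox_to_anchor=(1.4, 1.1), fontsize=7)
plt.tight_layout()
plt.show()
```

Substituting each scenario's projected requirements for the placeholder rows would reproduce a chart of the kind described on the "Path Forward" slide.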