OpenPOWER Innovation: Redefining HPC
Bradley McCredie
VP/IBM Fellow, POWER Systems Development
President, OpenPOWER Foundation
mccredie@us.ibm.com
2/26/2015

Price/Performance History of Computing
[Figure: price/performance over time. Labels: Relays / Mechanical, Tubes, Moore's Law written, Bipolar, Integrated Circuits, CMOS, Multicore]

Oct 10, 2003 – Expected Processor Freq Path
[Figure: Processor Frequency. Processor MHz (0 to 12000) vs. Year (1997 to 2008). Series: Alpha, AMD, HP, IA32 P2,3, IA32 P4, IA32 MP, IA64, Power3, IBM RS64, IBM Giga, Sun. Trend lines: 40% / yr, 35% / yr]

Nov 11, 2004 – Modified Path, One Year Later
[Figure: Processor Frequency. Processor MHz (0 to 6000) vs. Year (1997 to 2009). Series: Alpha, AMD, HP, IA32 P2,3, IA32 P4, IA32 MP, IA64, Power3, IBM RS64, IBM Giga, Sun. Trend lines: 40% / yr, 35% / yr]

Power Systems
Development of Commercial Aviation
[Figure: Aircraft Speed (mph), log scale 1 to 10,000, vs. year (1900 to 2020). Aircraft: Wright Brothers, Curtiss Flying Boat, Ford Trimotor, Boeing 247, DC-3, Boeing Stratoliner, de Havilland Comet, Boeing 707, Concorde. Reference line: Speed of Sound in Air]
Source: F. M. Schellenberg, UCB Bodega Bay, 5-10-2007
© 2014 IBM Corporation

Commercial Aviation
[Figure: same chart as the previous slide, with later airliners added: DC-10, 747, 767, A-320, MD-90, 777]
Source: F. M. Schellenberg, UCB Bodega Bay, 5-10-2007

425 HP 425ci Hemi               425 HP 6.1L Hemi
13 Sec ¼ mile @ 108 MPH         13.3 Sec ¼ mile @ 107 MPH
6.0 Sec 0-60 MPH                4.7 Sec 0-60 MPH

FM Radio
Front Disc Brakes
Lap Belts Only
6.5 MPG

AM/FM/CD/DVD + 30G Storage + iPod
Keyless Entry w/Remote Start
Navigation System, Voice Activated
Run Flat Tires (no spare)
4 Wheel Disc Antilock Brakes + Air Bags
Lighted Cup Holders, Center Console
14 MPG

Industry Trends Generate New Opportunities
Microprocessors are no longer driving sufficient Cost/Performance improvements (Core cost/perf)
[Figure labels: Processors, Semiconductor Technology, At constant technology costs]
IBM Confidential

Industry trends drive innovation beyond the chip…
Microprocessors alone no longer drive sufficient Price/Performance improvements

System Stack
Applications and Services
Systems Management & Cloud Deployment
Systems Acceleration & HW/SW Optimization
Firmware, Operating System and Hypervisor
Processors
Semiconductor Technology

Use Cases
• Workload Acceleration
• Services Delivery Model
• Advanced Memory Tech
• Network & I/O Accel

POWER8, Linux, OpenPOWER
System stack innovations are required to drive Price/Performance

Fueling an Open Development Community
Implementation / HPC / Research
System / Software / Integration
I/O / Storage / Acceleration
Boards / Systems
Chip / SOC
Complete member list at www.openpowerfoundation.org

Innovation with POWER Technology
[Figure labels: GPU/Other, NVLINK, POWER Processors, CAPI/PCI, DMI, Memory Interface Control, Server Class Memory, IBM & Partner Devices]
• Innovation with IBM and Partners is taking place on all interfaces
• Wide variety of innovation strategies (Many not depicted)
• Leveraging different aspects of system design
• All targeting price/performance leadership

OpenPOWER Innovations Targeting HPC
• Altera FPGA acceleration and IBM CAPI: Monte Carlo 250x faster than POWER8 core alone, reduced C code 40x over non-CAPI FPGA
• US Dept of Energy $325M supercomputing contract awarded to IBM, Mellanox, and NVIDIA: DoE systems for science and stockpile stewardship
• Sierra and Summit systems to be >100 PF, 2 GB/core main memory, local NVRAM, and science performance 4x-8x Titan or Sequoia
• Data Engine for NoSQL: 24:1 server consolidation, 3x lower cost per user, 40TB CAPI-attached flash
• CAPI dev kit with FPGA card from Nallatech
• NVIDIA acceleration built into IBM Power S824L
• Tyan OpenPOWER Customer Reference System
• 8x faster than x86 Ivy Bridge on pattern extraction
• 82x faster for Cognos BI and DB2 BLU
© 2014 OpenPOWER Foundation

NVLink Interconnect Differences
System Hardware Design
Current GPU Attach: CPU with System Memory, GPU with Graphics Memory, PCIe Connection at 16+16 GB/s
Future NVLink GPU Attachment: P8' with System Memory, GPUs with Graphics Memory, NVLink at 40+40 GB/s

"NVLink will help improve GPU-GPU peer-to-peer communications, eliminating the need to transfer data via the PCIe bus. It would also allow one or more GPUs to access the system RAM much quicker. The new protocol will debut in 2016 with Pascal, which was revealed to be NVidia's next GPU architecture."

CAPI vs. I/O Device Driver: Data Prep

Typical I/O Model Flow: Total ~13µs for data prep
Before acceleration (7.9µs): DD Call, Copy or Pin Source Data, MMIO Notify Accelerator
Acceleration: Application Dependent, but Equal to below
After acceleration (4.9µs): Poll / Interrupt Completion, Copy or Unpin Result Data, Ret. From DD Completion
Instruction counts shown on the slide: 300, 10,000, 1,000, 3,000, 1,000

Flow with a Coherent Model: Total 0.36µs
Before acceleration (0.3µs): Shared Mem. Notify Accelerator, 400 Instructions
Acceleration: Application Dependent, but Equal to above
After acceleration (0.06µs): Shared Memory Completion, 100 Instructions

Coherent Accelerator Processor Interface (CAPI) Overview
[Figure labels: CAPI, FPGA, IBM-Supplied POWER Service Layer, CAPP, PCIe, Accelerator Function Unit (AFU), POWER8 Processor]

Typical I/O model flow
DD Call -> Copy or Pin Source Data -> MMIO Notify Accelerator -> Acceleration -> Poll / Interrupt Completion -> Copy or Unpin Result Data -> Ret. From DD Completion

Flow with a coherent model
Shared Mem. Notify Accelerator -> Acceleration -> Shared Memory Completion

Advantages of coherent attachment over I/O attachment
• Virtual addressing and data caching
  – Shared memory
  – Lower latency for highly referenced data
• Easier, more natural programming model
  – Traditional thread-level programming
  – Long latency of I/O typically requires restructuring of application
• Enables applications not possible on I/O
  – Pointer chasing, and so on

CAPI and Networking Opportunities

Paving the Road to Exascale with OpenPOWER Technology

IBM / Mellanox / NVIDIA Collaboration Landscape
Road to Exascale
Timeline: 2014, 2015, 2016, 2017
Programming Model: CUDA 5.5; CUDA 7, OpenMP 4.0; CUDA 8, OpenACC, OpenMP 4.0; CUDA 9, OpenMP 4.x
Power Systems: P8 Tuleta - 4U, 2 P8 (+ 2 GPU), PCIe Gen3, CAPI; Firestone - 2U, 2 P8 + 2 GPU, PCIe Gen3, CAPI; HPC Next - 2U, CAPI, NVLink; HPC Future - 2U, Enhanced CAPI, Enhanced NVLink
Cooling: Air Cooled; Air/Water Cooled
Mellanox Interconnect Technology
  Adapters: Connect-IB (dual ports); ConnectX-4 (dual ports); ConnectX-5 (dual ports)
  Switches: FDR InfiniBand; EDR InfiniBand; HDR InfiniBand
  CPU Links: PCI-express Gen3; CAPI over PCI-express Gen3; Enhanced CAPI over PCI-express Gen4
Chip Technology
  GPUs: NVIDIA GPU (GK210); NVIDIA GPU (GP100); NVIDIA GPU (GV100)
  CPU: Power8; Power8; Power Next; Power Future
  NVLink: NVLink JDA; NVLink; Enhanced NVLink

Summary
• The IT industry is being disrupted and being driven by disruptions
• Open hardware and open systems will be one of those disruptions
• Technology innovations will come from many unpredictable places
• POWER technology is well positioned to exploit these new trends
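
The data-prep totals on the "CAPI vs. I/O Device Driver: Data Prep" slide can be checked with a few lines of arithmetic. The sketch below is a minimal illustration and not part of the original deck: it sums the slide's before- and after-acceleration times and instruction counts for each model, then estimates how the overhead gap affects one complete accelerator call. All function and constant names are hypothetical, and the acceleration time passed to `end_to_end_speedup` is an assumed input, since the slide only states that it is application dependent and equal in both models.

```python
# Figures copied from the "CAPI vs. I/O Device Driver: Data Prep" slide.
# Times are in microseconds, split into the work done before and after
# the accelerator runs.
IO_MODEL_US = {"before_acceleration": 7.9, "after_acceleration": 4.9}
COHERENT_US = {"before_acceleration": 0.3, "after_acceleration": 0.06}

# Instruction counts as listed on the slide. Only the totals are used,
# so no per-step assignment is assumed here.
IO_MODEL_INSTRUCTIONS = [300, 10_000, 1_000, 3_000, 1_000]
COHERENT_INSTRUCTIONS = [400, 100]


def overhead_us(phases):
    """Total data-prep overhead for one accelerator call."""
    return sum(phases.values())


def summarize():
    """Totals and ratios for the two attachment models."""
    io_us = overhead_us(IO_MODEL_US)
    coherent_us = overhead_us(COHERENT_US)
    io_instr = sum(IO_MODEL_INSTRUCTIONS)
    coherent_instr = sum(COHERENT_INSTRUCTIONS)
    return {
        "io_us": io_us,
        "coherent_us": coherent_us,
        "time_ratio": io_us / coherent_us,
        "io_instructions": io_instr,
        "coherent_instructions": coherent_instr,
        "instruction_ratio": io_instr / coherent_instr,
    }


def end_to_end_speedup(acceleration_us):
    """Speedup of one full call (overhead plus acceleration) when moving
    from the I/O model to the coherent model. acceleration_us is an
    assumed, application-specific value; it is the same in both models."""
    io_total = acceleration_us + overhead_us(IO_MODEL_US)
    coherent_total = acceleration_us + overhead_us(COHERENT_US)
    return io_total / coherent_total


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
    # Hypothetical acceleration times, chosen only to show the trend.
    for accel in (1.0, 10.0, 100.0):
        print(f"acceleration {accel:>5.1f} us -> "
              f"end-to-end speedup {end_to_end_speedup(accel):.2f}x")
```

The two phase times for the I/O model add up to 12.8µs, which matches the slide's "~13µs", and the coherent phases add up to the stated 0.36µs. For short accelerator calls the driver overhead dominates, which is the case the coherent model targets; as the accelerated work grows, the end-to-end difference between the two models shrinks toward 1x.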