IBM Blue Gene/P
Dr. George Chiu, IEEE Fellow
IBM T.J. Watson Research Center, Yorktown Heights, NY

President Obama Honors IBM's Blue Gene Supercomputer With National Medal Of Technology And Innovation
• Ninth time IBM has received the nation's most prestigious tech award
• Blue Gene has led to breakthroughs in science, energy efficiency and analytics

WASHINGTON, D.C. - 18 Sep 2009: President Obama recognized IBM (NYSE: IBM) and its Blue Gene family of supercomputers with the National Medal of Technology and Innovation, the country's most prestigious award given to leading innovators for technological achievement. President Obama will personally bestow the award at a special White House ceremony on October 7. IBM, which has earned the National Medal of Technology and Innovation on eight other occasions, is the only company recognized with the award this year.

Blue Gene's speed and expandability have enabled business and science to address a wide range of complex problems and make more informed decisions -- not just in the life sciences, but also in astronomy, climate simulation and modeling, and many other areas. Blue Gene systems have helped map the human genome, investigated medical therapies, safeguarded nuclear arsenals, simulated radioactive decay, replicated brain power, flown airplanes, pinpointed tumors, predicted climate trends, and identified fossil fuels -- all without the time and money that would have been required to physically complete these tasks.

The system also reflects breakthroughs in energy efficiency. With the creation of Blue Gene, IBM dramatically shrank the physical size and energy needs of a computing system whose processing speed would otherwise have required a dedicated power plant capable of powering thousands of homes. The influence of the Blue Gene supercomputer's energy-efficient design and computing model can be seen today across the information technology industry. Today, 18 of the top 20 most energy-efficient supercomputers in the world are built on IBM high-performance computing technology, according to the latest Supercomputing 'Green500 List' announced by Green500.org in July 2009.

IBM Blue Gene/P Solution: Expanding the Limits of Breakthrough Science

Blue Gene Technology Roadmap (performance vs. year)
• Blue Gene/L (PPC 440 @ 700 MHz), 2004: scalable to 595 TFlops
• Blue Gene/P (PPC 450 @ 850 MHz), 2007: scalable to 3.56 PF
• Blue Gene/Q (Power multi-core), 2010: scalable to 20 PF
Note: All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
BlueGene Roadmap
• BG/L (5.7 TF/rack)
  – 130 nm ASIC (1999-2004 GA)
  – Up to 104 racks, 212,992 cores, 596 TF/s, 210 MF/W
  – Dual-core system-on-chip, 0.5/1 GB per node
• BG/P (13.9 TF/rack)
  – 90 nm ASIC (2004-2007 GA)
  – Up to 72 racks, 294,912 cores, 1 PF/s, 357 MF/W
  – Quad-core SOC with DMA, 2/4 GB per node
  – SMP support, OpenMP, MPI
• BG/Q
  – 20 PF/s

TOP500 Performance Trend (Rmax in GFlops, June 1993 through November 2009; source: www.top500.org)
• IBM has had the most aggregate performance for the last 20 lists
• IBM has had the #1 system for the last 10 lists (13 lists in total)
• [Chart: Rmax trend over time on a log scale; blue square markers indicate IBM leadership; labeled points include 17.1 TF, 275 TF, 1.1 PF and 22.6 PF]

HPCC 2008
• IBM BG/P, 365 TF Linpack (32 racks, 450 TF peak): #1 on FFT (4485.72) and #1 on Random Access (6.82)
• Cray XT5, 1059 TF Linpack: #1 on HPL and #1 on STREAM

Green 500, November 2007 (Linpack GFLOPS/W)
• [Chart: BG/P leads at roughly 0.37 GFLOPS/W, followed by BG/L at roughly 0.21; the remaining systems (SGI 8200, HP cluster, Cray Sandia, Cray, Cray ORNL, NERSC, JS21 BSC) sit at about 0.15 or below]

Relative power, space and cooling efficiencies (published specs per peak performance)
• [Chart: Racks/TF, kW/TF, sq ft/TF and cooling tons/TF for Sun/Constellation, Cray/XT4 and SGI/ICE, normalized to IBM BG/P at 100%; vertical axis 0-400%]

System BlueGene/P packaging hierarchy (the peak figures are worked through in the sketch at the end of this section)
• Chip: 4 processors, 13.6 GF/s, 8 MB eDRAM
• Compute Card: 1 chip + 20 DRAMs, 13.6 GF/s, 2.0 GB DDR2 (4.0 GB from 6/30/08)
• Node Card: 32 compute cards (chips in a 4x4x2 arrangement) plus 0-1 I/O cards, 435 GF/s, 64 (128) GB
• Rack: 32 node cards, cabled 8x8x16, 13.9 TF/s, 2 (4) TB
• System: 72 racks, 72x32x32, 1 PF/s, 144 (288) TB

BlueGene/P compute ASIC
• 4 PPC450 cores, each with 32 KB L1 instruction and 32 KB L1 data caches, a double-pipeline "Double FPU", a snoop filter and a private L2
• Multiplexing switches connect the cores to two shared L3 banks: 4 MB eDRAM each, with shared L3 directories, 512b data + 72b ECC, usable as L3 cache or as on-chip memory
• Shared SRAM, DMA engine, arbiter, hybrid PMU with 256x64b SRAM, JTAG access
• Two DDR-2 controllers with ECC; 13.6 GB/s DDR-2 DRAM bus
• Network interfaces: Torus (6 links at 3.4 Gb/s bidirectional), Collective (3 links at 6.8 Gb/s bidirectional), Global Barrier (4 global barriers or interrupts), 10 Gbit Ethernet (10 Gb/s)

Execution Modes in BG/P per Node (see the hybrid MPI/OpenMP sketch at the end of this section)
• Quad Mode (virtual node mode, VNM): 4 processes, 1 thread per process
• Dual Mode: 2 processes, 1-2 threads per process
• SMP Mode: 1 process, 1-4 threads per process
The next-generation HPC node: many cores, expensive memory, and a two-tiered programming model spanning hardware and software abstractions.

BG/P 4-core compute card (target 100 FITs, 25% SER)
• BG/P ASIC, 29 mm x 29 mm FC-PBGA
• 2 x 16B interface to 2 or 4 GB SDRAM-DDR2
• NVRAM, monitors, decoupling, Vtt termination
• Connector carrying all network and I/O signals plus power input

BPC Node Card
• 32 compute nodes
• Optional I/O card (one of 2 possible) with 10 Gb optical link
• Local DC-DC regulators (6 required, 8 with redundancy)

First BG/P Rack (2 midplanes)
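The peak figures in the packaging hierarchy above are simple products of clock rate, core count and parts counts. The short C sketch below reproduces them under the usual assumption that each PPC450 core retires 4 floating-point operations per cycle (two-pipe double FPU with fused multiply-add); it is an illustration of the arithmetic, not vendor code.

    /* Peak-performance arithmetic for the Blue Gene/P packaging hierarchy.
     * Assumption: each PPC450 core retires 4 flops/cycle
     * (2-way SIMD double FPU doing fused multiply-add). */
    #include <stdio.h>

    int main(void) {
        const double clock_hz      = 850e6;                      /* PPC450 clock   */
        const double core_flops    = 4.0 * clock_hz;             /* 3.4 GF/s/core  */
        const double chip_gflops   = 4 * core_flops / 1e9;       /* 13.6 GF/s      */
        const double nodecard_gf   = 32 * chip_gflops;           /* ~435 GF/s      */
        const double rack_tflops   = 32 * nodecard_gf / 1e3;     /* ~13.9 TF/s     */
        const double system_pflops = 72 * rack_tflops / 1e3;     /* ~1 PF/s        */

        printf("chip      : %6.1f GF/s\n", chip_gflops);
        printf("node card : %6.1f GF/s\n", nodecard_gf);
        printf("rack      : %6.2f TF/s\n", rack_tflops);
        printf("72 racks  : %6.3f PF/s\n", system_pflops);
        return 0;
    }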
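The execution modes above describe how MPI processes and threads are placed on a node's four cores; the roadmap's "SMP support, OpenMP, MPI" bullet corresponds to the SMP case of one MPI rank per node with up to four OpenMP threads. The sketch below is a generic hybrid MPI/OpenMP illustration of that style, not BG/P-specific code; on a real system the mode is selected through the job launcher rather than in the program.

    /* Hybrid MPI + OpenMP sketch in the style of BG/P SMP mode:
     * one MPI rank per node, up to 4 OpenMP threads per rank.
     * Illustrative only; node mode selection is a launcher setting. */
    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;
        /* Funneled threading: only the main thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double local = 0.0;
        /* Each rank sums its slice of work with up to 4 threads (one per core). */
        #pragma omp parallel for reduction(+:local) num_threads(4)
        for (int i = 0; i < 1000000; i++)
            local += 1.0 / (1.0 + (double)i + rank);

        double global = 0.0;
        /* Cross-node reduction; on BG/P this kind of call uses the collective network. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("ranks=%d  global sum=%f\n", nranks, global);
        MPI_Finalize();
        return 0;
    }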
Hydro-Air Cooling Concept for BlueGene/P (diagram drawn to scale, with 36" and 48" plenum/aisle dimensions indicated)
• Air-cooled BG/L: 25 kW/rack, 3000 CFM/rack
• Air-cooled BG/P: 40 kW/rack, 5000 CFM/rack
• Hydro-air cooled BG/P: 40 kW/rack, 5000 CFM/row
Key: BG rack with cards and fans, airflow, air plenum, air-to-water heat exchanger

Memory capacity and bandwidth comparisons
• [Chart: Main memory capacity per rack for LRZ IA64, Cray XT4, ASC Purple, Roadrunner, BG/P, Sun TACC and SGI ICE]
• [Chart: Peak memory bandwidth per node in byte/flop for SGI ICE, Sun TACC, Itanium 2, POWER5, Cray XT5 4-core, Cray XT3 2-core, Roadrunner and BG/P 4-core]
• [Chart: Main memory bandwidth per rack for LRZ Itanium, Cray XT5, ASC Purple, Roadrunner, BG/P, Sun TACC and SGI ICE]

BlueGene/P Interconnection Networks (see the MPI sketch after the floating-point example below)
3-Dimensional Torus
• Interconnects all compute nodes (73,728)
• Virtual cut-through hardware routing
• 3.4 Gb/s on all 12 node links (5.1 GB/s per node)
• 0.5 µs latency between nearest neighbors, 5 µs to the farthest; MPI: 3 µs for one hop, 10 µs to the farthest
• Communications backbone for computations
• 1.7/3.9 TB/s bisection bandwidth, 188 TB/s total bandwidth
Collective Network
• One-to-all broadcast and reduction operations
• 6.8 Gb/s of bandwidth per link per direction
• One-way tree traversal latency 1.3 µs, MPI 5 µs
• ~62 TB/s total binary tree bandwidth (72K-node machine)
• Interconnects all compute and I/O nodes (1,152 I/O nodes)
Low-Latency Global Barrier and Interrupt
• One-way latency to reach all 72K nodes 0.65 µs, MPI 1.6 µs

Interprocessor peak bandwidth per node (byte/flop)
• [Chart: Roadrunner, Dell Myrinet x86 cluster, Sun TACC, Itanium 2, POWER5, NEC ES, Cray XT4 2-core, Cray XT5 4-core and BG/L,P compared; horizontal axis 0-0.8 byte/flop]

Failures per Month per 100 TFlops (20 BG/L racks): unparalleled reliability
• IA64: 800   • x86: 394   • POWER5: 127   • Blue Gene: 1
Results of a survey conducted by Argonne National Lab on 10 clusters ranging from 1.2 to 365 TFlops (peak); excludes the storage subsystem, management nodes, SAN network equipment, and software outages.

Reproducibility of Floating Point Operations
The example below is illustrated with single-precision floating point (~7 decimal digits of accuracy); the same principle applies to double precision (~16 digits).
  A = 1234567
  B = 1234566
  C = 0.1234567
  (A - B) + C = 1.123457
  (A + C) - B = 1
Caution: floating point with a finite number of digits of accuracy is not associative; the result depends on evaluation order. BG/L and BG/P enforce execution order, so all calculations are reproducible.
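The ordering effect shown in the slide above can be reproduced on any IEEE 754 machine. The sketch below is a plain C illustration in single precision; with binary rounding the second ordering comes out as 1.125 rather than the 1 of the decimal-digit model used on the slide, but the conclusion (the result depends on evaluation order) is unchanged. Compile without value-unsafe optimizations such as -ffast-math, which allow the compiler to re-associate, and assume float arithmetic is actually performed in single precision (the default on SSE and ARM builds).

    /* Non-associativity of finite-precision floating point:
     * (A - B) + C and (A + C) - B give different answers in float. */
    #include <stdio.h>

    int main(void) {
        float A = 1234567.0f;
        float B = 1234566.0f;
        float C = 0.1234567f;

        float r1 = (A - B) + C;   /* A-B is exact (=1), then +C: ~1.1234567   */
        float r2 = (A + C) - B;   /* A+C rounds to 1234567.125, then -B: 1.125 */

        printf("(A - B) + C = %.7g\n", r1);   /* prints 1.123457 */
        printf("(A + C) - B = %.7g\n", r2);   /* prints 1.125    */
        return 0;
    }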
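Applications normally reach the torus, collective and barrier networks described in the interconnect section above through ordinary MPI: a periodic Cartesian communicator expresses the 3D nearest-neighbor pattern the torus serves, and MPI_Allreduce is the kind of call the collective network accelerates. The sketch below is a generic MPI illustration (the 4x4x4 grid is arbitrary and it expects exactly 64 ranks), not BG/P system software.

    /* Generic MPI sketch of the communication patterns BG/P accelerates:
     * 3D periodic Cartesian (torus) halo exchange plus a global reduction.
     * Launch with exactly 64 ranks for the 4x4x4 grid chosen here. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int dims[3] = {4, 4, 4};        /* 64 ranks, arbitrary for illustration */
        int periods[3] = {1, 1, 1};     /* periodic in every dimension => torus */
        MPI_Comm torus;
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /*reorder*/, &torus);

        int rank, coords[3];
        MPI_Comm_rank(torus, &rank);
        MPI_Cart_coords(torus, rank, 3, coords);

        /* Nearest-neighbor exchange along each torus dimension. */
        double send = (double)rank, recv[6];
        for (int d = 0; d < 3; d++) {
            int lo, hi;   /* neighbors in the -d and +d directions */
            MPI_Cart_shift(torus, d, 1, &lo, &hi);
            MPI_Sendrecv(&send, 1, MPI_DOUBLE, hi, 0,
                         &recv[2*d], 1, MPI_DOUBLE, lo, 0, torus, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&send, 1, MPI_DOUBLE, lo, 1,
                         &recv[2*d+1], 1, MPI_DOUBLE, hi, 1, torus, MPI_STATUS_IGNORE);
        }

        /* Global reduction: the operation class the collective network serves. */
        double sum = 0.0;
        MPI_Allreduce(&send, &sum, 1, MPI_DOUBLE, MPI_SUM, torus);

        if (rank == 0)
            printf("coords (%d,%d,%d): -x neighbor sent %g, global sum = %g\n",
                   coords[0], coords[1], coords[2], recv[0], sum);

        MPI_Finalize();
        return 0;
    }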
IBM® System Blue Gene®/P Solution: Expanding the Limits of Breakthrough Science

Summary
Blue Gene/P: Facilitating Extreme Scalability
– Ultrascale capability computing when nothing else will satisfy
– Provides customers with enough computing resources to help solve grand-challenge problems
– Provides competitive advantages for customer applications that need extreme computing power
– Energy-conscious solution supporting green initiatives
– Familiar open/standards operating environment
– Simple porting of parallel codes
Key Solution Highlights
– Leadership performance, space-saving design, low power requirements, high reliability, and easy manageability

Backup

Current HPC Systems Characteristics

                             | IBM POWER6 (IH) | IBM Blue Gene/P   | IBM Roadrunner (Cell)  | ORNL (AMD/Cray)
GF/socket                    | 37.6 @ 4.7 GHz  | 13.6              | 108.8                  | 36.8 @ 2.3 GHz
TF/rack                      | 7.2 (192 chips) | 13.9 (1024 chips) | 5.3                    | 7.066 (192 chips)
GB/core                      | 8               | 0.5/1             | 0.5 (per SPE)          | 2
GB/rack                      | 3072            | 2048/4096         | 192 (AMD) + 192 (Cell) | 1536
Mem BW (byte/flop)           | 1.5             | 1                 | 0.25                   | 0.696
Mem BW (TB/s per rack)       | 2.7             | 13.9              | 1.3                    | 4.915
P-P interconnect (byte/flop) | 0.1             | 0.75              | 0.008                  | 0.17
Kilowatt/rack                | 70              | 31.1              | 9.4                    | 22
Linpack MF/W                 | 104             | 375               | 437                    | 152
Space per 100 TF (sq ft)     | 375             | 170               | 350                    | 280
Racks/PF peak                | 139             | 72                | 186                    | 173
TB/PF peak                   | 427             | 147/295           | 36 + 36                | 220

Blue Gene/L Customers (June 2008): 232 racks sold!
• Advanced Industrial Science and Technology (AIST, Yutaka Akiyama)
• Argonne National Laboratory Consortium (Rick Stevens)
• ASTRON Lofar, Holland - Stella (Kjeld van der Schaaf)
• Boston University
• Brookhaven National Laboratory/SUNY at Stony Brook (NewYorkBlue)
• Centre of Excellence for Applied Research and Training (CERT, UAE)
• CERFACS
• Council for the Central Laboratory of the Research Councils (CCLRC)
• DIAS at HEANet
• Ecole Polytechnique Federale de Lausanne (EPFL, Henry Markram)
• Electricite de France (EDF), France
• Forschungszentrum Jülich GmbH (Thomas Lippert)
• Harvard University (geophysics, computational chemistry)
• IBM Yorktown Research Center (BGW)
• IBM Almaden Research Center
• IBM Zürich Research Lab
• Indian Institute of Science (IISc), Bangalore
• Iowa State University (Srinivas Aluru, genome classification)
• Karolinska Institutet (neuroscience)
• KEK, High Energy Accelerator Research Org. (Shoji Hashimoto)
• Lawrence Livermore National Laboratory (Mark Seager)
• MIT (John Negele)
• NARSS/MCIT
• National Center for Atmospheric Research (NCAR, Richard Loft)
• New Intelligent Workstation Systems Co. Ltd. (NIWS, Ikuo Suesada)
• Princeton University (Orangena)
• Rensselaer Polytechnic Institute (CCNI)
• RIKEN
• San Diego Supercomputing Center (Wayne Pfeiffer) - Intimidata
• University of Alabama, Birmingham
• University of Canterbury, NZ (Blue Fern)
• University of Edinburgh (Richard Kenway)
• University of Minnesota (Hormel Institute)
• University of North Carolina, Chapel Hill (RENCI, Dan Reed)
Per-site BG/L installations range from 1 to 105 racks, with install dates from December 2004 through 2008.

Blue Gene/P Customers
• Argonne National Laboratory: Intrepid 40 racks + Surveyor 1 rack = 41 racks
• ASTRON: 3 racks
• Brookhaven/Stony Brook Consortium: 2 racks
• Council for the Central Laboratory of the Research Councils (CCLRC): 1 rack
• CHPC, South Africa: 1 rack
• Dassault: 1 rack
• Dublin Institute for Advanced Studies (DIAS) on HEANet: 1 rack
• Ecole Polytechnique Federale de Lausanne (EPFL, Henry Markram): 4 racks
• Electricite de France (EDF), France: 8 racks (2008)
• Forschungszentrum Jülich GmbH, JuGene (Thomas Lippert): 72 racks
• Fritz Haber Institute (IPP): 2 racks
• IBM On Demand Center (JEMTs): 4 racks
• IBM Yorktown Research Center (BGW): 4 racks
• IBM Zurich: 1 rack
• ICT, Bulgaria: 2 racks
• Institute for Development and Resources in Intensive Scientific Computing (IDRIS, France) / Laboratoire Bordelais de Recherche en Informatique (LABRI): 10 racks
• KAUST: 16 racks
• Lawrence Livermore National Laboratory: 38 racks
• Moscow State University, Russia: 2 racks
• Oak Ridge National Laboratory (up to 16?): 2 racks
• RZG/Max-Planck-Gesellschaft / Fritz Haber Ins., IPP Institut für Plasma Physik: 3 racks
• Science & Technology Facilities Council (STFC) at Daresbury: 1 rack
• Tata Institute of Fundamental Research (TIFR), India: 1 rack
• University of Rochester, NY: 1 rack
BG/P deliveries span 2007 through 2009, with the largest systems (Argonne and Jülich) installed in phases.

Totals
• BG/P racks: 220 (24 sites)
• BG/L racks: 232 (34 sites)