BG/P Draft Disclosure Deck - Bulgarian Supercomputing Centre

IBM Blue Gene/P
Dr. George Chiu
IEEE Fellow
IBM T.J. Watson Research Center
Yorktown Heights, NY
President Obama Honors IBM's Blue Gene Supercomputer With National Medal of Technology and Innovation
Ninth time IBM has received nation's most prestigious tech award
Blue Gene has led to breakthroughs in science, energy efficiency and analytics
WASHINGTON, D.C. - 18 Sep 2009: President Obama recognized IBM (NYSE: IBM) and its Blue
Gene family of supercomputers with the National Medal of Technology and Innovation, the
country's most prestigious award given to leading innovators for technological achievement.
President Obama will personally bestow the award at a special White House ceremony on
October 7. IBM, which earned the National Medal of Technology and Innovation on eight other
occasions, is the only company recognized with the award this year.
Blue Gene's speed and expandability have enabled business and science to address a wide
range of complex problems and make more informed decisions -- not just in the life sciences, but
also in astronomy, climate, simulations, modeling and many other areas. Blue Gene systems
have helped map the human genome, investigated medical therapies, safeguarded nuclear
arsenals, simulated radioactive decay, replicated brain power, flown airplanes, pinpointed tumors,
predicted climate trends, and identified fossil fuels – all without the time and money that would
have been required to physically complete these tasks.
The system also reflects breakthroughs in energy efficiency. With the creation of Blue Gene, IBM
dramatically shrank the physical size and energy needs of a computing system whose processing
speed would otherwise have required a dedicated power plant capable of powering thousands of
homes.
The influence of the Blue Gene supercomputer's energy-efficient design and computing model
can be seen today across the Information Technology industry. Today, 18 of the top 20 most
energy efficient supercomputers in the world are built on IBM high performance computing
technology, according to the latest Supercomputing 'Green500 List' announced by Green500.org
in July, 2009.
IBM Blue Gene/P Solution: Expanding the Limits of Breakthrough Science
Blue Gene Technology Roadmap (performance roadmap figure)
• Blue Gene/L (2004): PPC 440 @ 700 MHz, scalable to 595 TFlops
• Blue Gene/P (2007): PPC 450 @ 850 MHz, scalable to 3.56 PF
• Blue Gene/Q (2010): Power multi-core, scalable to 20 PF
Note: All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent
goals and objectives only.
BlueGene Roadmap
• BG/L (5.7 TF/rack) – 130 nm ASIC (1999-2004 GA)
– 104 racks, 212,992 cores, 596 TF/s, 210 MF/W; dual-core system-on-chip
– 0.5/1 GB/node
• BG/P (13.9 TF/rack) – 90 nm ASIC (2004-2007 GA)
– 72 racks, 294,912 cores, 1 PF/s, 357 MF/W; quad-core SOC with DMA
– 2/4 GB/node
– SMP support, OpenMP, MPI
• BG/Q
– 20 PF/s
TOP500 Performance Trend (Rmax in GFlops, June 1993 to June 2009; source: www.top500.org)
• IBM has had the most aggregate performance for the last 20 lists
• IBM has had the #1 system for the last 10 lists (13 in total)
• Chart callouts: 22.6 PF, 1.1 PF, 275 TF, 17.1 TF; blue square markers indicate IBM leadership
HPCC 2008
• IBM BG/P, 365 TF Linpack (32 racks, 450 TF peak)
– Number 1 on FFT (4485.72)
– Number 1 on Random Access (6.82)
• Cray XT5, 1059 TF Linpack
– Number 1 on HPL
– Number 1 on Stream
Source: www.top500.org
November 2007 Green500, Linpack GFLOPS/W (bar chart)
• BG/P leads at roughly 0.37 GFLOPS/W, with BG/L at roughly 0.21 GFLOPS/W
• The remaining systems on the chart (SGI 8200, an HP cluster, Cray systems at Sandia, ORNL and NERSC, and the JS21 cluster at BSC) fall between roughly 0.02 and 0.15 GFLOPS/W
Relative power, space and cooling efficiencies (published specs per peak performance)
• Bar chart comparing Racks/TF, kW/TF, Sq Ft/TF and Tons/TF for IBM BG/P against Sun/Constellation, Cray/XT4 and SGI/ICE; with BG/P normalized to 100%, the other systems range up to roughly 300-400% on these metrics
BlueGene/P packaging hierarchy
• Chip: 4 processors, 13.6 GF/s, 8 MB eDRAM
• Compute card: 1 chip + 20 DRAMs, 13.6 GF/s, 2.0 GB DDR2 (4.0 GB as of 6/30/08)
• Node card: 32 compute cards (32 chips, 4x4x2) and 0-1 I/O cards, 435 GF/s, 64 (128) GB
• Rack: 32 node cards, 13.9 TF/s, 2 (4) TB
• System: 72 racks, 72x32x32 (cabled 8x8x16), 1 PF/s, 144 (288) TB
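The peak numbers at each packaging level are simple multiples of the chip figure. The short C sketch below reproduces the arithmetic; the only assumption beyond the counts above is that each core retires 4 flops per cycle through its double FPU at 850 MHz, which is what yields 13.6 GF/s per 4-core chip.

#include <stdio.h>

/* Reproduce the BG/P packaging arithmetic listed above.
 * Assumption: 4 flops/cycle per core via the double FPU at 850 MHz. */
int main(void)
{
    double chip_gfs      = 4 * 0.850 * 4;            /* 4 cores x 0.85 GHz x 4 flops = 13.6 GF/s */
    double node_card_gfs = 32 * chip_gfs;            /* 32 compute cards -> 435.2 GF/s */
    double rack_tfs      = 32 * node_card_gfs / 1e3; /* 32 node cards -> 13.9 TF/s */
    double system_pfs    = 72 * rack_tfs / 1e3;      /* 72 racks -> ~1.0 PF/s */

    printf("chip:      %6.1f GF/s\n", chip_gfs);
    printf("node card: %6.1f GF/s\n", node_card_gfs);
    printf("rack:      %6.1f TF/s\n", rack_tfs);
    printf("system:    %6.3f PF/s\n", system_pfs);
    return 0;
}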
BlueGene/P compute ASIC (block diagram)
• Four PPC450 cores, each with 32 KB L1 instruction and 32 KB L1 data caches, a double FPU, a snoop filter and a private L2/prefetch unit
• Multiplexing switches connect the cores to two shared 4 MB eDRAM L3 banks (each with its own directory, 512b data + 72b ECC), usable as L3 cache or on-chip memory, plus a shared SRAM
• DMA engine, arbiter, hybrid PMU with 256x64b SRAM, and JTAG access port
• Two DDR2 controllers with ECC driving a 13.6 GB/s DDR2 DRAM bus
• Network interfaces: torus (6 links at 3.4 Gb/s, bidirectional), collective (3 links at 6.8 Gb/s, bidirectional), global barrier (4 global barriers or interrupts) and 10 Gb Ethernet
Execution Modes in BG/P per Node
• Quad mode (VNM): 4 processes, 1 thread per process
• Dual mode: 2 processes, 1-2 threads per process
• SMP mode: 1 process, 1-4 threads per process
Next-generation HPC direction:
– Many core
– Expensive memory
– Two-tiered programming model (see the sketch below)
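A minimal illustration of that two-tiered model, with MPI between processes and OpenMP threads inside each process. This is generic MPI/OpenMP code rather than anything BG/P-specific; on BG/P the per-node split into processes and threads (VN, dual or SMP) is selected when the job is launched.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Two-tiered model: MPI across processes, OpenMP threads within a process.
 * In SMP mode one rank per node would run up to 4 threads, in dual mode
 * two ranks with up to 2 threads each, in VN mode four single-threaded ranks. */
int main(int argc, char **argv)
{
    int provided, rank, nranks;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}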
BG/P 4-core compute card (target 100 FITs, ~25% SER)
• BG/P ASIC, 29 mm x 29 mm FC-PBGA
• 2 x 16B interfaces to 2 or 4 GB of SDRAM-DDR2
• NVRAM, monitors, decoupling, Vtt termination
• All network and I/O signals, plus power input

BPC node card
• 32 compute nodes
• Optional I/O card (one of 2 possible) with 10 Gb optical link
• Local DC-DC regulators (6 required, 8 with redundancy)
First BG/P Rack (2 Midplanes)
Hydro-Air Concept for BlueGene/P (figure drawn to scale; key: BG rack with cards and fans, airflow, air plenum, air-to-water heat exchanger)
• Air-cooled BG/L (36"): 25 kW/rack, 3000 CFM/rack
• Air-cooled BG/P (48"): 40 kW/rack, 5000 CFM/rack
• Hydro-air cooled BG/P (36"): 40 kW/rack, 5000 CFM/row
Main Memory Capacity per Rack (bar chart, scale 0-4500): compares LRZ IA64, Cray XT4, ASC Purple, Roadrunner (RR), BG/P, Sun TACC and SGI ICE.
Peak Memory Bandwidth per Node in byte/flop (bar chart, scale 0-2): compares SGI ICE, Sun TACC, Itanium 2, POWER5, Cray XT5 (4-core), Cray XT3 (2-core), Roadrunner and BG/P (4-core); BG/P sits at about 1 byte/flop (see the table in the backup section).
Main Memory Bandwidth per Rack (bar chart, scale 0-14,000): compares LRZ Itanium, Cray XT5, ASC Purple, Roadrunner (RR), BG/P, Sun TACC and SGI ICE.
BlueGene/P Interconnection Networks

3-Dimensional Torus
– Interconnects all compute nodes (73,728)
– Virtual cut-through hardware routing
– 3.4 Gb/s on all 12 node links (5.1 GB/s per node)
– 0.5 µs latency between nearest neighbors, 5 µs to the farthest
– MPI: 3 µs latency for one hop, 10 µs to the farthest
– Communications backbone for computations
– 1.7/3.9 TB/s bisection bandwidth, 188 TB/s total bandwidth

Collective Network
– One-to-all broadcast functionality
– Reduction operations functionality
– 6.8 Gb/s of bandwidth per link per direction
– Latency of one-way tree traversal 1.3 µs, MPI 5 µs
– ~62 TB/s total binary-tree bandwidth (72k machine)
– Interconnects all compute and I/O nodes (1152)

Low-Latency Global Barrier and Interrupt
– Latency of one way to reach all 72K nodes 0.65 µs, MPI 1.6 µs
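From the application side these networks are normally driven through MPI. The sketch below is a generic illustration (standard MPI only, nothing BG/P-specific; the process-grid dimensions chosen by MPI_Dims_create are illustrative) of the two traffic patterns the hardware is built for: nearest-neighbour traffic on a periodic 3D grid, which maps onto the torus, and a global reduction, which maps onto the collective network.

#include <mpi.h>
#include <stdio.h>

/* Nearest-neighbour shift on a periodic 3D process grid (maps naturally onto
 * a 3D torus), followed by a global reduction (maps onto the collective
 * network). */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nranks, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int dims[3] = {0, 0, 0}, periods[3] = {1, 1, 1};
    MPI_Dims_create(nranks, 3, dims);          /* factor the ranks into a 3D grid */

    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &torus);

    /* Shift one double to the +x neighbour and receive from the -x neighbour. */
    int left, right;
    MPI_Cart_shift(torus, 0, 1, &left, &right);

    double val = (double)rank, recv = 0.0;
    MPI_Sendrecv(&val, 1, MPI_DOUBLE, right, 0,
                 &recv, 1, MPI_DOUBLE, left,  0, torus, MPI_STATUS_IGNORE);

    /* Global reduction over all ranks. */
    double sum = 0.0;
    MPI_Allreduce(&val, &sum, 1, MPI_DOUBLE, MPI_SUM, torus);

    if (rank == 0)
        printf("grid %dx%dx%d, sum of ranks = %.0f\n", dims[0], dims[1], dims[2], sum);

    MPI_Finalize();
    return 0;
}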
Interprocessor Peak Bandwidth per Node in byte/flop (bar chart, scale 0-0.8): compares Roadrunner, a Dell Myrinet x86 cluster, Sun TACC, Itanium 2, POWER5, NEC Earth Simulator, Cray XT4 (2-core), Cray XT5 (4-core) and BG/L,P; BG/P sits at about 0.75 byte/flop (see the table in the backup section).
Failures per Month at 100 TFlops (20 BG/L racks): unparalleled reliability
– IA64: ~800
– x86: ~394
– POWER5: ~127
– Blue Gene: ~1
Results of a survey conducted by Argonne National Lab on 10 clusters ranging from 1.2 to 365 TFlops (peak); excludes the storage subsystem, management nodes, SAN network equipment and software outages.
Reproducibility of Floating Point Operations
– The example below uses single-precision floating point (~7 decimal digits of accuracy); the same principle applies to double precision (~16 digits).
– A = 1234567
– B = 1234566
– C = 0.1234567
– (A - B) + C = 1.123457
– (A + C) - B = 1
– Caution: floating-point arithmetic with a finite number of digits is not associative, so the result depends on the order of evaluation.
– BG/L and BG/P enforce execution order, so all calculations are reproducible.
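The slide's example uses a decimal model of single precision; the small C sketch below shows the same effect with IEEE-754 binary floats. The exact second value differs from the decimal example (1.125 rather than 1), but the conclusion is the same: the two evaluation orders give different results unless the execution order is pinned down.

#include <stdio.h>

/* Order of evaluation matters in finite-precision arithmetic.
 * volatile forces each intermediate result to be rounded to float. */
int main(void)
{
    volatile float a = 1234567.0f;
    volatile float b = 1234566.0f;
    volatile float c = 0.1234567f;

    volatile float t1 = a - b;   /* exactly 1.0 */
    volatile float r1 = t1 + c;  /* ~1.1234567 */

    volatile float t2 = a + c;   /* rounds to 1234567.125 in IEEE-754 binary32 */
    volatile float r2 = t2 - b;  /* 1.125 */

    printf("(A-B)+C = %.7f\n", r1);
    printf("(A+C)-B = %.7f\n", r2);
    return 0;
}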
Summary
• Blue Gene/P: Facilitating Extreme Scalability
– Ultrascale capability computing when nothing else will satisfy
– Provides customers with enough computing resources to help solve grand-challenge problems
– Provides competitive advantages for customer applications that need extreme computing power
– Energy-conscious solution supporting green initiatives
– Familiar open-standards operating environment
– Simple porting of parallel codes
• Key Solution Highlights
– Leadership performance, space-saving design, low power requirements, high reliability and easy manageability
Backup
Current HPC Systems Characteristics (June 2008)

Metric | IBM POWER6 (IH) | IBM Blue Gene/P | IBM Roadrunner (Cell) | ORNL (AMD/Cray)
GF/socket | 37.6 @ 4.7 GHz | 13.6 | 108.8 | 36.8 @ 2.3 GHz
TF/rack | 7.2 (192 chips) | 13.9 (1024 chips) | 5.3 | 7.066 (192 chips)
GB/core | 8 | 0.5/1 | 0.5 (per SPE) | 2
GB/rack | 3072 | 2048/4096 | 192 (AMD) + 192 (Cell) | 1536
Mem BW (byte/flop) | 1.5 | 1 | 0.25 | 0.696
Mem BW (TB/s per rack) | 2.7 | 13.9 | 1.3 | 4.915
P-P interconnect (byte/flop) | 0.1 | 0.75 | 0.008 | 0.17
kW/rack | 70 | 31.1 | 9.4 | 22
Linpack MF/W | 104 | 375 | 437 | 152
Space per 100 TF (sq ft) | 375 | 170 | 350 | 280
Racks per PF (peak) | 139 | 72 | 186 | 173
TB per PF (peak) | 427 | 147/295 | 36 + 36 | 220
Blue Gene/L Customers with 232 racks sold!

Customer | Racks | Date
Advanced Industrial Science and Technology (AIST, Yutaka Akiyama) | 4 | 2/05
Argonne National Laboratory Consortium (Rick Stevens) | 1 | 12/04
ASTRON LOFAR, Holland - Stella (Kjeld van der Schaaf) | 6 | 3/30/05, replaced with BG/P
Boston University | 1 | 2004
Brookhaven National Laboratory/SUNY at Stony Brook (NewYorkBlue) | 18 | 2007
Centre of Excellence for Applied Research and Training (CERT, UAE) | 1 | 09/06
CERFACS | 1 | 07/07
Council for the Central Laboratory of the Research Councils (CCLRC) | 2 | 1/07
DIAS at HEAnet | 2 | 2008
Ecole Polytechnique Federale de Lausanne (EPFL, Henry Markram) | 4 | 06/05
Electricite de France (EDF), France | 4 | 10/06
Forschungszentrum Jülich GmbH (Thomas Lippert) | 8 | 12/05
Harvard University (geophysics, computational chemistry) | 2 | 06/06
IBM Yorktown Research Center (BGW) | 20 | 05/05
IBM Almaden Research Center | 2 | 03/05
IBM Zürich Research Lab | 2 | 03/05
Indian Institute of Science (IISc), Bangalore | 4 | 9/07
Iowa State University (Srinivas Aluru, genome classification) | 1 | 12/05
Karolinska Institutet (neuroscience) | 1 | 1/07
KEK, High Energy Accelerator Research Org. (Shoji Hashimoto) | 10 | 03/01/06
Lawrence Livermore National Laboratory (Mark Seager) | 105 | 09/05, 08/07
MIT (John Negele) | 1 | 09/05
NARSS/MCIT | 1 | 2007
National Center for Atmospheric Research (NCAR, Richard Loft) | 1 | 3/05
New Intelligent Workstation Systems Co. Ltd. (NIWS, Ikuo Suesada) | 1 | 1/05
Princeton University (Orangena) | 1 | 9/05
Rensselaer Polytechnic Institute (CCNI) | 17 | 5/07
RIKEN | 1 | 2007
San Diego Supercomputing Center (Wayne Pfeiffer) - Intimidata | 3 | 12/17/04, 11/06
University of Alabama, Birmingham | 1 | 2007
University of Canterbury, NZ (Blue Fern) | 2 | 2007
University of Edinburgh (Richard Kenway) | 1 | 12/04
University of Minnesota (Hormel Institute) | 1 | 2008
University of North Carolina, Chapel Hill (RENCI, Dan Reed) | 2 | 4Q06, 1Q07
Blue Gene/P Customers

Customer | Racks | Date
Argonne National Laboratory (Intrepid 40 racks, Surveyor 1 rack) | 41 | 9 in '07, 32 in '08
ASTRON | 3 | 2008
Brookhaven/Stony Brook Consortium | 2 | 2007
Council for the Central Laboratory of the Research Councils (CCLRC) | 1 | 2007
CHPC, South Africa | 1 | 2008
Dassault | 1 | 2008
Dublin Institute for Advanced Studies (DIAS) on HEAnet | 1 | 2007
Ecole Polytechnique Federale de Lausanne (EPFL, Henry Markram) | 4 | 07/09
Electricite de France (EDF), France | 8 | 2008
Forschungszentrum Jülich GmbH, JuGene (Thomas Lippert) | 72 | 16 in '07, 16 in '08, 40 in '09
Fritz Haber Institute (IPP) | 2 | 2007
IBM On Demand Center (JEMTs) | 4 | 2008
IBM Yorktown Research Center (BGW) | 4 | 2008
IBM Zurich | 1 | 2008
ICT, Bulgaria | 2 | 2008
Institute for Development and Resources in Intensive Scientific Computing (IDRIS, France) / Laboratoire Bordelais de Recherche en Informatique (LaBRI) | 10 | 2008
KAUST | 16 | 2009
Lawrence Livermore National Laboratory | 38 | 2009
Moscow State University, Russia | 2 | 2008
Oak Ridge National Laboratory (up to 16?) | 2 | 2007
RZG/Max-Planck-Gesellschaft/Fritz Haber Inst., IPP Institut für Plasmaphysik | 3 | 2 in '07, 1 in '08
Science & Technology Facilities Council (STFC) at Daresbury | 1 | 2007
Tata Institute of Fundamental Research (TIFR), India | 1 | 2008
University of Rochester, NY | 1 | 2009

Total BG/P racks: 220 (24 sites)
Total BG/L racks: 232 (34 sites)