Application of General Purpose HPC Systems in HPEC David Alexander Director of Engineering

Application of General Purpose
HPC Systems in HPEC
David Alexander
Director of Engineering
SGI, Inc.
dca 8/2003
SGI HPC System Architecture
dca 8/2003
SGI Scalable ccNUMA Architecture
Basic Node Structure and Interconnect
C
A
C
H
E
CPU
CPU
Processor Interface
(>6GB/sec)
Interface
Chip
C
A
C
H
E
C
A
C
H
E
NUMAlink Interconnect
(>6GB/sec)
Memory Interface
(>8GB/sec)
CPU
CPU
C
A
C
H
E
Processor Interface
(>6GB/sec)
Interface
Chip
Memory Interface
(>8GB/sec)
Physical Memory
(2-16GB)
Physical Memory
(2-16GB)
Physical Memory
SGI Scalable ccNUMA Architecture
Scaling to Large Node Counts
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
Interface
Chip
Interface
Chip
Interface
Chip
Interface
Chip
Memory
Memory
Memory
Memory
System Interconnect Fabric
• Scaling to 100’s of processors
• RT Memory Latency < 600ns worst case (64p config)
• Bi-section bandwidth >25GB/sec (64p config)
SGI Scalable ccNUMA Architecture
SGI® NUMAflex™ Modular Design
System
Interconnect
CPU &
Memory
System I/O
Standard I/O
Expansion
Performance: High-bandwidth
interconnect with very low latency
Flexibility: Tailored configurations for
different dimensions of scalability
Investment protection: Add new
technologies as they evolve
Scalability: No central bus or switch; just
modules and NUMAlink™ cables
High BW I/O
Expansion
Graphics Expansion
Storage Expansion
Modules from SGI® Onyx® 3000 Series
SGI Scalable ccNUMA Architecture
The Benefits of Shared Memory
Traditional Clusters
SGI® Altix™ 3000
Commodity interconnect
mem mem mem mem mem
node
+
OS
node
+
OS
node
+
OS
node
+
OS
node
+
OS
...
mem
node
+
OS
Fast NUMAflex™ interconnect
Global shared memory
node
+
OS
node
+
OS
node
...
+
OS
node
+
OS
What is shared memory?
• All nodes operate on one large shared memory space, instead of each node having its own small
memory space
Shared memory is high-performance
• All nodes can access one large memory space efficiently, so complex communication and data
passing between nodes aren’t needed
• Big data sets fit entirely in memory; less disk I/O is needed
Shared memory is cost-effective and easy to deploy
• The SGI Altix 3000 family supports all major parallel programming models
• It requires less memory per node, because large problems can be solved in big shared memory
• Simpler programming means lower tuning and maintenance costs
SGI® Altix™ 3000 HPC Product
dca 8/2003
SGI® Altix™ 3000
Overview
First Linux® node with 64 CPUs in
single-OS image
First clusters with global shared
memory across multiple nodes
First Linux solution with HPC systemand data-management tools
World-record performance for floatingpoint calculations, memory
performance, I/O bandwidth, and real
technical applications
SGI® Altix™ 3000
Fusion of Powerhouse Technologies
SGI® supercomputing technology, Intel’s most advanced
processors, and open-source computing
Global shared memory
(64-bit shared memory across cluster nodes)
SGI ProPack™ and supercomputing enhancements
SGI® NUMAlink™
(Built-in high-bandwidth, low-latency interconnect)
SGI® NUMAflex™
(Third-generation modular supercomputing architecture)
Intel® Itanium® 2 processor family
Industry-standard Linux®
SGI® Altix™ 3000 Family
Extreme Power, Extreme Potential
Model 3700 Superclusters
Model 3300 Servers
Single-node entry offering
4–12 1.30 GHz Itanium® 2
processors, 3MB L3 cache
Up to 96GB memory
Scalable performance offering
4–64 Itanium 2 processors in a single node
1.30 GHz/3MB L3 cache
1.50 GHz/6MB L3 cache
Shared memory across nodes
Scalable to 2,048 processors, 16TB memory
Nodes up to 64P, 4TB memory
Multi-Paradigm Architecture
dca 8/2003
Multi-Paradigm Architecture
Overview
R
• NUMAlink system interconnect
C
• General-purpose compute nodes
IO
MSE
V
MSE
C
C
• Integrated graphics/visualization
IO
R
R
MSE
C
C
C
C
V
C
• Peer-attached general purpose I/O
• Mission-specific accel. and/or I/O (MSE)
IO
C
MSE
C
R
IO
R
IO
C
MSE
C
Multi-Paradigm Architecture
Mission Specific Element (MSE)
Memory
System
Interconnect
Fabric
>5GB/s
TIO ASIC
FPGA,
DSP, or
ASIC
I/O
NL4 Ports
(>5GB/s x 2)
MSE
Implementation variants:
• Customer supplied/loaded FPGA algorithms and/or ASIC/DSP
• Subroutine or standard library acceleration
• Specific use “appliance”
Embedded High Performance
Computing (EHPC)
dca 8/2003
Embedded High Performance Computing
Unified Development/Deployment Environment
EHPC Today
EHPC Vision
Different Development and Deployment Environments
Unified Development and Deployment
Development Environment
HPC methods, tools, hardware
Architecture provides freedom to develop new approaches
Months of work
Millions of dollars
Significant schedule risk
Significant architectural/performance risk
Deployment
Proprietary hardware and software
“Islands of memory” narrows algorithm options
Scalable ccNUMA
System Architecture
SGI
EHPC
System
Mission-Specific
Processing
Embedded Systems
Packaging
Platform Port
Seamless HPC Application
Development Model
Enhanced software development productivity
Superior performance and HW utilization
Demonstration to deployment in days
Benefits from mainstream HPC advances
Embedded High Performance Computing
EHPC Platform
Seamless HPC Application
Development Model
• Unified software environment for development,
prototype and deployment
• Full multi-paradigm computing support
• Field-upgradeable mission specific processing
acceleration
• Highly optimized performance per watt and
performance per slot
• Design to fit into established environmentals, form
factors, and interfaces
•
•
•
•
6U Eurocard form factor
Passive, slot configurable backplane
PMC module connectivity to standard interfaces
Able to address oceanic, ground and airborne
environmentals
SGI
EHPC
System
Mission-Specific
Processing
Embedded Systems
Packaging
Scalable ccNUMA
System Architecture
Embedded High Performance Computing
Family of 6U Form Factor Modules
6U Form Factor
6U Form Factor
Cache
Cache
uP
uP
PMC Module Site
PIC
Memory
Memory
DDR SDRAM
DDR SDRAM
Memory
Memory
DDR SDRAM
DDR SDRAM
Memory
Memory
DDR SDRAM
DDR SDRAM
Memory
Memory
DDR SDRAM
DDR SDRAM
PMC Module Site
NUMAlink
Controller
NUMAlink4
Ports
NUMAlink
Controller
System
Control
Scalable General Purpose
Processor/Memory Blade
Diagnostic &
L1 Ports
System
Control
General Purpose
I/O Blade
NUMAlink4
Ports
Diagnostic &
L1 Ports
Embedded High Performance Computing
Family of 6U Form Factor Modules
6U Form Factor
6U Form Factor
Memory
FPGA
External I/O
(Optional)
TIO
NL4
Port
System
Control
6.4GB/s
FPGA
Memory
TIO
NUMAlink4
Port
ASIC
Diagnostic &
L1 Ports
NUMAlink4
Port
External I/O
(Optional)
ASIC
TIO
NL4
Port
FPGA
6.4GB/s
FPGA
Memory
System
Control
TIO
Memory
Mission-Specific Accelerator and/or I/O Blades
NUMAlink4
Port
Diagnostic &
L1 Ports
NUMAlink4
Port
www.sgi.com
©2002 Silicon Graphics, Inc. All rights reserved. Silicon Graphics, SGI, IRIX, Challenge, Origin, Onyx, and the SGI logo are registered
trademarks and OpenVault, FailSafe, Altix, XFS, and CXFS are trademarks of Silicon Graphics, Inc., in the U.S. and/or other countries
worldwide. MIPS is a registered trademark of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. UNIX is a registered
trademark of The Open Group in the U.S. and other countries. Intel and Itanium are registered trademarks of Intel Corporation. Linux is a
registered trademark of Linus Torvalds. Mac is a registered trademark of Apple Computer, Inc. Windows is a registered trademark or
trademark of Microsoft Corporation in the United States and/or other countries.Red Hat is a registered trademark of Red Hat, Inc. All other
trademarks are the property of their respective owners. (1/03)