Networking Options for Beowulf Clusters

March 22, 2000 Dr. Thomas Sterling, Caltech 1

Presentation to the American Physical Society:

Networking Options for Beowulf Clusters

Dr. Thomas Sterling

California Institute of Technology and

NASA Jet Propulsion Laboratory

March 22, 2000

Points of Inflection in the History of Computing

• Heroic Era (1950)

– technology: vacuum tubes, mercury delay lines, pulse transformers

– architecture: accumulator based

– model: von-Neumann, sequential instruction execution

– examples: Whirlwind, EDSAC

• Mainframe (1960)

– technology: transistors, core memory, disk drives

– architecture: register bank based

– model: reentrant concurrent processes

– examples: IBM 7042, 7090, PDP-1

• Scientific Computer (1970)

– technology: earliest SSI logic gate modules

– model: parallel processing

– examples: CDC 6600, Goodyear STARAN

Points of Inflection in the History of Computing

• Supercomputers (1980)

– technology: ECL, semiconductor integration, RAM

– architecture: pipelined

– model: vector

– example: Cray-1

• Massively Parallel Processing (1990)

– technology: VLSI, microprocessor

– architecture: MIMD

– model: Communicating Sequential Processes, message passing

• ? (2000)

Punctuated Equilibrium: nonlinear dynamics drive to point of inflection

• Drastic reduction in vendor support for HPC

• Component technology for PCs matches workstation capability

• PC hosted software environments achieve sophistication and robustness of mainframe O/S

• Low cost network hardware and software enable balanced PC clusters

• MPPs establish low level of expectation

• Cross-platform parallel programming model

BEOWULF-CLASS SYSTEMS

• Cluster of PCs

– Intel x86

– DEC Alpha

– Mac PowerPC

• Pure M²COTS (mass-market commodity off-the-shelf)

• Unix-like O/S with source

– Linux, BSD, Solaris

• Message passing programming model

– PVM, MPI, BSP, homebrew remedies

• Single user environments

• Large science and engineering applications

Rank | Manufacturer | Computer | Rmax (GFlop/s) | Installation Site | # Proc | Rpeak (GFlop/s)
33 | Sun | HPC 4500 Cluster | 272.1 | Sun Burlington | 720 | 483.84
34 | Compaq | AlphaServer SC | 271.4 | Compaq Computer Corporation Littleton | 512 | 512
44 | Self-made | CPlant Cluster | 232.6 | Sandia National Laboratories Albuquerque | 580 | 580
143 | Sun | HPC 10000 400 MHz Cluster | 68.77 | KT Freetel Seoul | 110 | 88
169 | Compaq | Alphleet Cluster | 61.3 | Institute of Physical and Chemical Res. (RIKEN) Wako | 140 | 140
265 | Self-made | Avalon Cluster | 48.6 | Los Alamos National Laboratory/CNLS Los Alamos | 140 | 149.4
351 | Fujitsu-Siemens | hpcLine Cluster | 41.45 | Universitaet Paderborn - PC2 Paderborn | 192 | 86.4
384 | Sun | HPC 10000 333 MHz Cluster | 39.87 | Dutchtone | 78 | 46.8
397 | SGI | ORIGIN 2000 250 MHz - EthCluster | 39.4 | The Sabre Group Ft Worth | 128 | 64
399 | Sun | HPC 10000 400 MHz Cluster | 39.03 | Computer Manufacturer | 64 | 51.2
400 | Sun | HPC 10000 400 MHz Cluster | 39.03 | Semiconductor Company | 64 | 51.2
420 | SGI | ORIGIN 2000 300 MHz - EthCluster | 37.31 | Industrial Light & Magic | 128 | 76.8
421 | SGI | ORIGIN 2000 250 MHz - EthCluster | 37.31 | Government | 144 | 72
422 | SGI | ORIGIN 2000 250 MHz - EthCluster | 37.31 | America On Line (AOL) | 128 | 64
423 | SGI | ORIGIN 2000 250 MHz - EthCluster | 37.31 | Industrial Light & Magic | 128 | 64
424 | SGI | ORIGIN 2000 250 MHz - EthCluster | 37.31 | NASA/Ames Research Center/NAS Mountain View | 128 | 64
443 | Sun | HPC 10000 333 MHz Cluster | 35.17 | Gedas N.A. (VW) | 70 | 42
445 | SGI | ORIGIN 2000 250 MHz - EthCluster | 34.47 | Government | 112 | 56
454 | Self-made | Parnass2 Cluster | 34.23 | University Bonn - Dep. of Applied Mathematics Bonn | 128 | 57.6
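One way to read the list above is as efficiency, the fraction of theoretical peak (Rpeak) that a system sustains on the benchmark (Rmax). The sketch below computes it for the three "Self-made" entries only. The function name `efficiency` and the dictionary layout are illustrative choices, not part of the presentation, and the pairing of each Rmax with its Rpeak assumes the values follow the row order of the list.

```python
# Rmax and Rpeak in GFlop/s for the three "Self-made" (Beowulf-class)
# entries. Pairing each Rmax with its Rpeak assumes row order is preserved.
clusters = {
    "CPlant Cluster": (232.6, 580.0),
    "Avalon Cluster": (48.6, 149.4),
    "Parnass2 Cluster": (34.23, 57.6),
}


def efficiency(rmax, rpeak):
    """Return the fraction of theoretical peak sustained on the benchmark."""
    return rmax / rpeak


for name, (rmax, rpeak) in clusters.items():
    print(f"{name}: {efficiency(rmax, rpeak):.1%} of peak")
```

On these figures the three clusters sustain roughly a third to three fifths of their peak, which is the gap the later "performance efficiency" bullet refers to.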

Beowulf-class Systems

A New Paradigm for the Business of Computing

• Brings high-end computing to a broad range of problems

– new markets

• Order of magnitude Price-Performance advantage

• Commodity enabled

– no long development lead times

• Low vulnerability to vendor-specific decisions

– companies are ephemeral; Beowulfs are forever

• Rapid response technology tracking

• Just-in-place user-driven configuration

– requirement responsive

• Industry-wide, non-proprietary software environment

Have to Run Big Problems on Big Machines?

• It's work, not peak flops

• A user’s throughput over application cycle

• Big machines yield little slices

– due to time and space sharing

• But data set memory requirements

– wide range of data set needs, three orders of magnitude

– latency tolerant algorithms enable out-of-core computation

• What is Beowulf breakpoint for price-performance?

Throughput Turbochargers

• Recurring costs approx. 10% of MPPs

• Rapid response to technology advances

• Just-in-place configuration and reconfigurability

• High reliability

• Easily maintained through low cost replacement

• Consistent portable programming model

– Unix, C, Fortran, Message passing

• Applicable to wide range of problems and algorithms

• Double machine room throughput at a tenth the cost

• Provides super-linear speedup

Beowulf Project - A Brief History

• Started in late 1993

• NASA Goddard Space Flight Center

– NASA JPL, Caltech, academic and industrial collaborators

• Sponsored by NASA HPCC Program

• Applications: single user science station

– data intensive

– low cost

• General focus:

– single user (dedicated) science and engineering applications

– out of core computation

– system scalability

– Ethernet drivers for Linux

Beowulf System at JPL (Hyglac)

• 16 Pentium Pro PCs, each with 2.5 Gbyte disk, 128 Mbyte memory, Fast Ethernet card.

• Connected using 100Base-T network, through a 16-way crossbar switch.

• Theoretical peak performance: 3.2 GFlop/s.

• Achieved sustained performance: 1.26 GFlop/s.

A 10 Gflops Beowulf

Center for Advanced Computing Research

California Institute of Technology

172 Intel Pentium Pro microprocessors

Avalon architecture and price.

1st printing: May, 1999

2nd printing: Aug. 1999

MIT Press

Beowulf at Work

Beowulf Scalability

Electro-dynamic FDTD Code

 | T3D (shmem) | T3D (MPI) | Hyglac (MPI, Good Load Balance) | Hyglac (MPI, Poor Load Balance)
Interior Computation | 1.8 (1.3*) | 1.8 (1.3*) | 1.1 | 1.1
Interior Communication | 0.007 | 0.08 | 3.8 | 3.8
Boundary Computation | 0.19 | 0.19 | 0.14 | 0.42
Boundary Communication | 0.04 | 1.5 | 50.1 | 0.0
Total | 2.0 (1.5*) | 3.5 (3.0*) | 55.1 | 5.5

(* using assembler kernel)

All timing data is in CPU seconds/simulated time step, for a global grid size of 282 × 362 × 102, distributed on 16 processors.
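The timings above become easier to compare when expressed as the share of each time step spent communicating. The sketch below is a minimal illustration: the name `communication_share` and the data layout are assumptions made here, the T3D figures are the ones without the assembler kernel, and the share is computed from the four per-phase components rather than from the printed totals, which do not always match the component sums exactly.

```python
# CPU seconds per simulated time step, in the order:
# interior computation, interior communication,
# boundary computation, boundary communication.
timings = {
    "T3D (shmem)": [1.8, 0.007, 0.19, 0.04],
    "T3D (MPI)": [1.8, 0.08, 0.19, 1.5],
    "Hyglac (MPI, good load balance)": [1.1, 3.8, 0.14, 50.1],
    "Hyglac (MPI, poor load balance)": [1.1, 3.8, 0.42, 0.0],
}


def communication_share(row):
    """Fraction of the summed per-step time spent in the two communication phases."""
    interior_comm, boundary_comm = row[1], row[3]
    return (interior_comm + boundary_comm) / sum(row)


for system, row in timings.items():
    print(f"{system}: {communication_share(row):.1%} communication")
```

By this measure the load-balanced Hyglac run is dominated by boundary communication, while the poorly balanced run avoids that cost entirely and finishes a step about ten times faster.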

Network Topology Scaling

[Figure: TCP latency and UDP latency in µs, vertical axis from 0 to 350]

Routed Network - Random Pattern

System Area Network Technologies

• Fast Ethernet

– LAN, 100 Mbps, 100 usec

• Gigabit Ethernet

– LAN/SAN, 1000 Mbps, 50 usec

• ATM

– WAN/LAN, 155/620 Mbps

• Myrinet

– SAN, 1250 Mbps, 20 usec

• Giganet

– SAN/VIA, 1000 Mbps, 5 usec

• Servernet II

– SAN/VIA, 1000 Mbps, 10 usec

• SCI

– SAN, 8000 Mbps, 5 usec
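The bandwidth and latency figures in this list can be combined with a first-order cost model, time = latency + size / bandwidth, to see which term dominates for a given message size. This model is a common simplification and is not taken from the slides. The sketch treats every bandwidth figure as Mbps (an assumption), omits ATM because no latency is listed for it, and uses the illustrative names `transfer_time_us` and `half_power_bytes`.

```python
# (bandwidth in Mbps, latency in microseconds) per technology.
# Assumption: all bandwidths are in Mbps. ATM is omitted (no latency given).
networks = {
    "Fast Ethernet": (100, 100),
    "Gigabit Ethernet": (1000, 50),
    "Myrinet": (1250, 20),
    "Giganet": (1000, 5),
    "Servernet II": (1000, 10),
    "SCI": (8000, 5),
}


def transfer_time_us(message_bytes, bandwidth_mbps, latency_us):
    """First-order transfer time in microseconds (1 Mbps = 1 bit per microsecond)."""
    return latency_us + (message_bytes * 8) / bandwidth_mbps


def half_power_bytes(bandwidth_mbps, latency_us):
    """Message size at which latency and serialization time are equal."""
    return bandwidth_mbps * latency_us / 8


for name, (bandwidth, latency) in networks.items():
    t = transfer_time_us(1024, bandwidth, latency)
    n_half = half_power_bytes(bandwidth, latency)
    print(f"{name}: 1 KiB in {t:.1f} usec, break-even size {n_half:.0f} bytes")
```

Messages much smaller than the break-even size are latency-bound, so for fine-grained communication the latency column matters more than the bandwidth column.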

3Com CoreBuilder 9400 Switch and Gigabit Ethernet NIC

Lucent Cajun M770 Multifunction Switch

M2LM-SW16 16-Port Myrinet Switch with 8 SAN ports and 8 LAN ports

Dolphin Modular SCI Switch for System Area Networks

Giganet High Performance Host Adapters

Giganet High Performance Cluster Switch

The Beowulf Delta: looking forward

• 6 years

• Clock rate: X 4

• flops (per chip): X 50 (2-4 proc/chip, 4-8 way ILP/proc)

• #processors: 32

• Networking: X 32 (32 - 64 Gbps)

• Memory: X 10 (4 Gbytes)

• Disk: X 100

• price-performance: X 50

• system performance: 50 Tflops

Million $$ Teraflops Beowulf?

• Today, $3M per peak Tflops

• Before 2002, $1M per peak Tflops

• Performance efficiency is serious challenge

• System integration

– does vendor support of massive parallelism have to mean massive markup?

• System administration, boring but necessary

• Maintenance without vendors; how?

– New kind of vendors for support

• Heterogeneity will become major aspect
