Supercomputers(t)
Gordon Bell
Bay Area Research Center
Microsoft Corp.
http://research.microsoft.com/users/gbell
Photos courtesy of
The Computer Museum History Center
Please only copy with credit!
http://www.computerhistory.org
Supercomputer
• Largest computer at a given time
• Technical use for science and engineering calculations
• Large government defense, weather, and aero laboratories are the first buyers
• Price is no object
• Market size is 3-5 machines
Growth in Computational Resources Used for UK Weather Forecasting
[Log-scale chart, 1950-2000, from roughly 10 ops/s (Leo) to 10 Tflops, with machines marked: Leo, Mercury, KDF9, 195, 205, YMP. The gain is 10^10 over 50 years, i.e. 1.58^50, about 58% per year; the arithmetic is checked below.]
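A minimal sketch of the chart's growth arithmetic, using only the 10^10 and 50-year figures from the slide:

```python
# A 10^10 gain over 50 years implies an annual growth factor of ~1.58,
# i.e. computing power used for forecasting grew ~58% per year.
total_growth = 1e10
years = 50
annual = total_growth ** (1 / years)
print(f"annual factor: {annual:.3f}")                          # ~1.585
print(f"check: {annual:.3f}^{years} = {annual ** years:.2g}")  # ~1e10
```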
What a difference 25 years and spending >10x more makes!
Artist's view of the 40 Tflops ESRDC c2002, versus the LLNL 150 Mflops machine room c1978.
Harvard Mark I
aka IBM ASCC
"I think there is a world market for maybe five computers."
Thomas Watson Senior, Chairman of IBM, 1943
The scientific market is still about that size… 3 computers
• When scientific processing was 100% of the industry, it was a good predictor
• $3 billion: 6 vendors, 7 architectures
• DOE buys 3 very big ($100-$200 M) machines every 3-4 years
Supercomputer price (t)

Time   $M     Structure                               Example
1950   1      mainframes                              many...
1960   3      instruction parallelism, mainframe SMP  IBM / CDC
1970   10     pipelining                              7600 / Cray 1
1980   30     vectors; SCI                            "Crays"
1990   250    MIMDs: mC, SMP, DSM                     "Crays"/MPP
2000   1,000  ASCI, COTS MPP                          Grid, Legion
Supercomputing: speed at any price, using parallelism
Intra-processor:
• Memory overlap & instruction lookahead
• Functional parallelism (2-4)
• Pipelining (10)
• SIMD a la ILLIAC, a 2D array of 64 PEs, vs. vectors
• Wide instruction word (2-4)
• MTA (10-20)
MIMD… processor replication:
• SMP (4-64)
• Distributed Shared Memory SMPs (100)
MIMD… computer replication:
• Multicomputers aka MPP aka clusters (10K)
• Grid: 100K
(The degrees compound across levels; see the sketch below.)
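A minimal sketch of how the parenthesized degrees of parallelism compound, using only factors quoted on this slide; these are peak figures, and real applications achieve far less:

```python
# Degrees of parallelism from the slide; independent levels multiply.
levels = {
    "functional parallelism": 4,      # intra-processor
    "pipelining": 10,                 # intra-processor
    "SMP processors": 64,             # processor replication
    "cluster/MPP computers": 10_000,  # computer replication
}
peak = 1
for name, degree in levels.items():
    peak *= degree
print(f"compound peak parallelism: {peak:,}x")  # 25,600,000x
```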
High performance architectures timeline
[Timeline chart, 1950-2000. Technology eras: vacuum tubes, transistors, MSI (minis), micros, RISC micros, nMicros. Threads: processor overlap and lookahead; the Cray era (6600, 7600, Cray 1, X, Y, C, T), Vector-SMP; mainframes; SMP "multis" leading to DSM (KSR, SGI); clusters (Tandem, VAX, IBM, UNIX, Ncube, Intel) leading to NOW and the Grid; "killer micros"; MPP if n>1000; networks with n>10,000.]
High performance architectures timeline (programming view)
[Timeline chart, 1950-2000, same technology eras. Sequential programming (with SIMD and vector parallelization) runs through the whole period; parallel programming arrives with multicomputers: ultracomputers at 10X the price, "in situ" resources at 100x the parallelism, the MPP era and 10xMPP, NOW, VLC, and the Grid.]
Time line of HPCC contributions
[Timeline chart, 1955-2010:
Processors — IBM: interleaving, overlap, instruction lookahead. CDC/Cray supers: 6600, 7600, vector. DEC: mini, Alpha. Intel: 8008, 8086/8, 286, 386, 486, Ppro, P2/P3, Merced. RISC and "the killer micros". VLIW: Cydrome & Multiflow (both died). SIMD: Illiac IV, CM1, CM2, Maspar (died). Multi-threaded architecture: Denelcor, Tera MTA.
Multiprocessors — SMP cabinet mainframes (Burroughs, Univac, IBM, etc.), SMP "multis" (Sequent, Encore, etc.), SMP on a chip. Vector SMPs: Cray (XMP, YMP), NEC, Fujitsu, Hitachi. Distributed shared memory: BBN, KSR, T3D, T3E, Origin NUMA.
Multicomputers aka clusters aka MPP — clusters of minis or mainframes: Tandem, VAX Cluster, Sysplex. Shared address multicomputers. MPPs: CalTech/Ncube, Intel, Thinking Machines, IBM. UNIX/Beowulf. Workstation clusters: NOW. Worldwide: Grid.]
Time line of HPCC contributions (detail)
[Timeline chart, 1955-2010, with specific machines:
Processors — IBM: Stretch, 360, 370, G. CDC/Cray: 1604, 6600, 7600, Cray 1. DEC: PDP-8, PDP-11, VAX, Alpha. Intel: 8008, 8086/8, 286, 386, 486, Ppro, P2/P3, Merced. RISC: MIPS/PowerPC/Sparc. VLIW: Cydrome & Multiflow (died). SIMD: Illiac IV, CM1, CM2, Maspar. Multi-threaded architecture: Denelcor, Tera MTA.
Multiprocessors — SMP: B5000, Univac, etc.; IBM 8090…; Sun 10K; multis: Sequent, Encore, etc. Vector SMPs: Cray XMP, YMP, C, T; NEC SX 1…5. DSM: KSR; Sun NUMA; SGI/Cray T3D, T3E, Origin NUMA.
Multicomputers aka clusters aka MPP — clusters: Tandem, VAX Cluster, Sysplex, UNIX. Multicomputers: CalTech/Ncube. Intel MPPs: iPSC 1, 2, Paragon, Delta; 1 Tf, 2 Tf. Thinking Machines: CM1, 2, 5. IBM MPP: SP1, SP2. UC/B NOW, Grid.]
Lehmer, UC/Berkeley: precomputer number sieves
ENIAC c1946
Manchester: the first computer. Baby, Mark I, and Atlas
von Neumann computers: RAND Johnniac
Gene Amdahl’s Dissertation and first computer
IBM
IBM Stretch c1961 & 360/91 c1965 consoles!
IBM Terabit Photodigital Store
c1967
STC Terabytes of storage
c1999
Amdahl aka Fujitsu version of
the 360 c1975
IBM ASCI Blue Pacific @ LLNL
CDC, ETA, Cray Research,
Cray Computer
Seymour Cray, 1925-1996
Circuits and Packaging, Plumbing (bits and atoms) & Parallelism… plus Programming and Problems
• Packaging, including heat removal
• High-level bit plumbing… getting the bits from I/O, into memory, through a processor, and back to memory and I/O
• Parallelism
• Programming: O/S and compiler
• Problems being solved
Seymour Cray Computers
• 1951: ERA 1103 control circuits
• 1957: Sperry Rand NTDS; to CDC
• 1959: Little Character to test transistor circuits
• 1960: CDC 1604 (3600, 3800) & 160/160A
• 1964: CDC 6600 (6xxx series)
• 1969: CDC 7600
Cray Research, Cray Computer Corp. and SRC Computer Corp.
• 1976: Cray 1… (1/M, 1/S, XMP, YMP, C90, T90)
• 1985: Cray 2 from Cray Research; Cray Computer's GaAs machines: Cray 3 (1993), Cray 4
• 1999: SRC Company large-scale, shared-memory multiprocessor using x86 microprocessors
Cray contributions…
• Creative and productive during his entire career, 1951-1996.
• Creator and undisputed designer of supers from the 1604 c1960 to the Cray 1, 1S, 1M c1977… the basis for SMPvector: XMP, YMP, T90, C90, 2, 3
• Circuits, packaging, and cooling…
• "The mini" as a peripheral computer
• Use of I/O computers versus I/O processors
• Use of the main processor, interrupted for I/O, versus I/O processors aka IBM Channels
Cray Contributions
• Multi-threaded processor (6600 PPUs)
• CDC 6600 functional parallelism, leading to RISC… software control
• Pipelining in the 7600, leading to...
• Use of vector registers: adopted by 10+ companies; mainstream for technical computing; established the template for vector supercomputer architecture
• SRC Company use of x86 micros (c1996) that could lead to the largest SMP?
"Cray" clock speed (MHz), no. of processors, peak power (Mflops)
[Log-scale chart, 1960-2000, spanning 10^-1 to 10^6, plotting clock speed, processor count, and peak Mflops for the Cray machines; the relationship is sketched below.]
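The three curves are related: peak Mflops is roughly clock x processors x flops per clock. A hypothetical illustration; the Cray 1 and Y-MP figures below are well-known specs, not read from the slide:

```python
# Peak Mflops ~= clock (MHz) x processor count x flops issued per clock.
def peak_mflops(clock_mhz: float, processors: int, flops_per_clock: int) -> float:
    return clock_mhz * processors * flops_per_clock

print(peak_mflops(80, 1, 2))    # Cray 1: ~160 Mflops peak
print(peak_mflops(167, 8, 2))   # Y-MP/8: ~2,672 Mflops peak
```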
CDC 1604 &
6600
CDC 7600: pipelining
CDC 8600 prototype: SMP, scalar, discrete circuits; failed to achieve clock speed
CDC STAR… ETA10
CDC 7600 & Cray 1 at Livermore (photo labels: Cray 1, CDC 7600, disks)
Cray 1 #6 from LLNL, located at The Computer Museum History Center, Moffett Field
Cray 1 150 kW MG set & heat exchanger
Cray XMP/4 Proc. c1984
Cray 2 from NERSC/LBL
Cray 3 processor c1995: 500 MHz, 8 processors, 32 modules with 1K GaAs ICs/module
c1970: Beginning the search for parallelism
SIMDs:
• Illiac IV
• CDC Star
• Cray 1
Illiac IV: first SIMD c1970s
SCI (Strategic Computing Initiative): funded by DARPA and aimed at a teraflops!
• The era of state computers and many efforts to build high-speed computers… led to HPCC
• Thinking Machines, Intel supers, Cray T3 series
Minisupercomputers: a market whose time never came.
Alliant, Convex, Ardent + Stellar = Stardent = 0.
Cydrome and Multiflow: prelude to wide-word parallelism in Merced
• Minisupers with VLIW attack the market
• Like the minisupers, they are repelled
• It's software, software, and software
• Was it a basically good idea that will now work as Merced?
MasPar...
• A less costly CM-1/2, done in silicon chips
• It is repelled.
• Software is the fatal flaw
Thinking Machines:
Thinking Machines: CM1 & CM5 c1983-1993
"In Dec. 1995 computers with 1,000 processors will do most of the scientific processing."
Danny Hillis, 1990 (1 paper or 1 company)
The Bell-Hillis Bet: Massive Parallelism in 1995
[Table comparing TMC against world-wide supers on three measures: applications, petaflops/mo., and revenue.]
Bell-Hillis Bet: wasn't paid off!
• My goal was not necessarily to just win the bet!
• Hennessy and Patterson were to evaluate what was really happening…
• Wanted to understand the degree of MPP progress and programmability
KSR 1: first commercial DSM; NUMA (non-uniform memory access) aka COMA (cache-only memory architecture)
SCI (c1980s): Strategic Computing Initiative funded
ATT/Columbia (Non Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like Connection Machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (Dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, Thinking Machines (Connection Machine).
Those who gave their lives in the search for parallelism
Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC, Chen Systems, CHOPP, Cogent, Convex (now HP), Culler, Cray Computers, Cydrome, Denelcor, Elexsi, ETA, E & S Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, KSR, MasPar, Multiflow, Myrias, Ncube, Pixar, Prisma, SAXPY, SCS, SDSA, Supertek (now Cray), Suprenum, Stardent (Ardent+Stellar), Supercomputer Systems Inc., Synapse, Thinking Machines, Vitec, Vitesse, Wavetracer.
NCSA Cluster of 8 x 128
processors SGI Origin c1999
Humble beginning: in 1981… would you have predicted this would be the basis of supers?
Intel's iPSC 1 & Touchstone Delta
Intel Sandia Cluster 9K PII: 1.8 TF
GB with NT, Compaq, HP cluster
The Alliance LES NT Supercluster
"Supercomputer performance at mail-order prices" -- Jim Gray, Microsoft
• Andrew Chien, CS UIUC --> UCSD
• Rob Pennington, NCSA
• Myrinet network, HPVM, Fast Messages
• Microsoft NT OS, MPI API
• 192 HP 300 MHz + 64 Compaq 333 MHz nodes
Our Tax Dollars At Work
ASCI for Stockpile Stewardship
• Intel/Sandia: 9000 x 1-node Ppro
• LLNL/IBM: 512 x 8 PowerPC (SP2)
• LANL/Cray: 6144 CPUs
• Maui Supercomputer Center: 512 x 1 SP2
ASCI Blue Mountain: 3.1 Tflops SGI Origin 2000
• 12,000 sq. ft. of floor space
• 1.6 MWatts of power
• 530 tons of cooling
• 384 cabinets to house 6144 CPUs with 1536 GB (32 GB / 128 CPUs)
• 48 cabinets for metarouters
• 96 cabinets for 76 TB of RAID disks
• 36 x HIPPI-800 switch cluster interconnect
• 9 cabinets for 36 HIPPI switches
• about 348 miles of fiber cable
(Memory arithmetic checked below.)
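A quick check of the slide's memory arithmetic, using only the figures in the list above:

```python
# ASCI Blue Mountain: 6144 CPUs at 32 GB per 128-CPU system.
cpus = 6144
systems = cpus // 128          # 48 128-CPU Origin 2000 systems
total_gb = systems * 32
print(systems, total_gb)       # 48 systems, 1536 GB, matching the slide
```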
Half of SGI ASCI Computer at
LASL c1999
LASL ASCI Cluster Interconnect
• 18 separate networks
• 18 16x16 crossbar switches
• 6 groups of 8 computers each
[Diagram: the 18 crossbar switches linking the 6 groups of 8 computers.]
LASL ASCI Cluster Interconnect
3 TeraOps makes a difference!
ASCI Blue Mountain MCNP simulation:
• 1 mm resolution (256x256x250)
• 100 million particles
• 2 hours on 6144 CPUs
Typical MCNP BNCT simulation:
• 1 cm resolution (21x21x25)
• 1 million particles
• 1 hour on a 200 MHz PC
(The scale difference is sketched below.)
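A rough sense of the scale difference between the two runs, using only the figures quoted above:

```python
# Relative scale of the two MCNP runs on the slide.
cells_ratio = (256 * 256 * 250) / (21 * 21 * 25)   # ~1,486x the mesh cells
particle_ratio = 100_000_000 / 1_000_000           # 100x the particles
time_ratio = 2 / 1                                 # in only 2x the wall-clock time
print(f"{cells_ratio:,.0f}x cells, {particle_ratio:.0f}x particles, "
      f"{time_ratio:.0f}x the time")
```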
LLNL Architecture
System parameters:
• 3.89 TFLOP/s peak
• 2.6 TB memory
• 62.5 TB global disk
Each SP sector has 488 Silver nodes and 24 HPGN links.
• Sector S: 2.5 GB/node memory, 24.5 TB global disk, 8.3 TB local disk
• Sectors Y and K: 1.5 GB/node memory, 20.5 TB global disk, 4.4 TB local disk each
SST achieved >1.2 TFLOP/s on sPPM, on a problem >70x larger than ever solved before!
[Diagram: sectors S, Y, and K joined by the HPGN, with HiPPI and FDDI connections.]
I/O Hardware Architecture
488-node IBM SP sector: 432 Silver compute nodes, 56 GPFS servers, system data and control networks, 24 SP links to the second-level switch, separate SP first-level switches, and a single STDIO interface.
Full system mode:
• Local and global I/O file system
• Application launch over the full 1,464 Silver nodes
• 2.2 GB/s global I/O performance; 3.66 GB/s local I/O performance
• 1,048 MPI/us tasks, 2,048 MPI/IP tasks
• High-speed, low-latency communication
(Node accounting checked below.)
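A check of the sector and node accounting quoted above, assuming three sectors (S, Y, K) as on the previous slide:

```python
# LLNL SST: three 488-node IBM SP sectors.
compute, gpfs = 432, 56
per_sector = compute + gpfs   # 488 nodes, matching the sector size
total = 3 * per_sector        # 1,464 Silver nodes, matching full system mode
print(per_sector, total)
```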
Fujitsu VPP5000 multicomputer (not available in the U.S.)
Computing nodes:
• Speed: 9.6 Gflops vector, 1.2 Gflops scalar
• Primary memory: 4-16 GB
• Memory bandwidth: 76 GB/s (9.6 x 64 Gb/s)
• Inter-processor comm: 1.6 GB/s non-blocking, with global addressing among all nodes
• I/O: 3 GB/s to SCSI, HIPPI, gigabit ethernet, etc.
1-128 computers deliver up to 1.22 Tflops (figures checked below).
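A check of the quoted figures; reading "9.6 x 64 Gb/s" as a 9.6 GHz by 64-bit path is my interpretation of the slide's shorthand:

```python
# VPP5000: 128 nodes x 9.6 Gflops vector ~= the quoted 1.22 Tflops.
print(128 * 9.6 / 1000)   # 1.2288 Tflops

# Memory bandwidth: 9.6 x 64 Gb/s = 614.4 Gb/s = 76.8 GB/s (~quoted 76 GB/s).
print(9.6 * 64 / 8)       # 76.8 GB/s
```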
NEC SX 5: clustered SMPv (not available in the U.S.)
SMPv computing nodes:
• 4-8 processors/computer
• Processor PAP: 8 Gflops
• Memory
• I/O speed
• Cluster
NEC Supers
High Performance COTS
Raceway (and RACE++) busses:
• ANSI standardized
• Mapped memory, message passing, 'planned direct' transfers
• Circuit switched; the basic bus interface unit is a 6 (8) port bidirectional switch at 40 MB/s (66 MB/s) per port
• Scales to 4000 processors
Skychannel:
• ANSI standardized
• 320 MB/s; crossbar backplane supports up to 1.6 GB/s throughput, non-blocking
• Heart of an Air Force $3M / 256 Gflops system
(Aggregate bandwidth sketched below.)
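A sketch of how the quoted switch numbers relate; treating the 1.6 GB/s backplane figure as a sum of concurrent 320 MB/s transfers is an assumption:

```python
# Skychannel: non-blocking crossbar, 320 MB/s per transfer, 1.6 GB/s total.
print(1600 / 320)   # 5.0 concurrent full-rate transfers

# Raceway: 6-port bidirectional switch at 40 MB/s per port, all ports busy.
print(6 * 40)       # 240 MB/s aggregate through one switch
```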
Mercury & Sky Computers - & $
• Rugged system with 10 modules ~ $100K; $1K/lb
• Scalable to several K processors; ~1-10 Gflops/ft^3
• 10 9U boards x 4 PPC750s -> 440 SPECfp95 in 1 ft^3 (18.5 x 8 x 10.75")
• Sky 384 signal processors, #20 on the 'Top 500', $3M
[Photos: Mercury VME Platinum System; Sky PPC daughtercard]
Brookhaven/Columbia QCD c1999
(1999 Bell Prize for performance/$)
Brookhaven/Columbia QCD
board
HT-MT: What’s 0.55? c1999
HT-MT…
• Mechanical: cooling and signals
• Chips: design tools, fabrication
• Chips: memory, PIM
• Architecture: MTA on steroids
• Storage material
HTMT challenges the heuristics for a successful computer
• Mead 11-year rule: time between lab appearance and commercial use
• Requires >2 breakthroughs
• Team's first computer or super
• It's government funded… albeit at a university