CAS2K5 NCAR Presentation

Fast computers, big/fast storage, fast networks
Marla Meehl
Manager, Network Engineering and Telecommunications,
NCAR/UCAR, Manager of the Front Range GigaPoP
Computational & Information Systems Laboratory
National Center for Atmospheric Research
History of Supercomputing at NCAR
new system @ NWSC
Production Systems
IBM p6 p575/4096 (bluefire)
IBM p6 p575/192 (firefly)
Experimental/Test Systems that became Production Systems
IBM p5+ p575/1744 (blueice)
IBM p5 p575/624 (bluevista)
Experimental/Test Systems
Aspen Nocona-IB/40 (coral)
IBM BlueGene-L/8192 (frost)
IBM BlueGene-L/2048 (frost)
(Red text indicates those systems that are currently in operation within the NCAR Computational and Information Systems Laboratory's computing facility.)
IBM e1350/140 (pegasus)
IBM e1350/264 (lightning)
IBM p4 p690-F/64 (thunder)
IBM p4 p690-C/1600 (bluesky)
IBM p4 p690-C/1216 (bluesky)
SGI Origin 3800/128 (tempest)
Compaq ES40/36 (prospect)
IBM p3 WH2/1308 (blackforest)
IBM p3 WH2/604 (blackforest)
IBM p3 WH1/296 (blackforest)
Linux Networx Pentium-II/16 (tevye)
SGI Origin2000/128 (ute)
Cray J90se/24 (chipeta)
HP SPP-2000/64 (sioux)
[Facility locations on timeline: Marine St., Mesa Lab I, Mesa Lab II]
Cray C90/16 (antero)
Cray J90se/24 (ouray)
Cray J90/20 (aztec)
Cray J90/16 (paiute)
Cray T3D/128 (T3)
Cray T3D/64 (T3)
Cray Y-MP/8I (antero)
CCC Cray 3/4 (graywolf)
IBM SP1/8 (eaglesnest)
TMC CM5/32 (littlebear)
IBM RS/6000 Cluster (CL)
Cray Y-MP/2 (castle)
Cray Y-MP/8 (shavano)
TMC CM2/8192 (capitol)
Cray X-MP/4 (CX)
Cray 1-A S/N 14 (CA)
Cray 1-A S/N 3 (C1)
CDC 7600
CDC 6600
CDC 3600
NWSC
35 systems over 35 years (as of 27 Oct '09)
[Timeline axis: 1960-2015]
Current HPC Systems at NCAR

System Name     Vendor System       # Frames   # Processors   Processor Type   GHz   Peak TFLOPs

Production Systems
bluefire (IB)   IBM p6 Power 575    11         4096           POWER6           4.7   77.00

Research Systems, Divisional Systems & Test Systems
firefly (IB)    IBM p6 Power 575    1          192            POWER6           4.7   3.610
frost (BG/L)    IBM BlueGene/L      4          8192           PowerPC-440      0.7   22.935
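For context, a minimal sketch (not part of the original slides) showing how the Peak TFLOPs column can be reproduced from processor count and clock rate; the value of 4 floating-point operations per cycle per processor is my assumption, chosen because it matches the figures in the table above.

```python
# Sketch: peak TFLOPs = processors x clock (GHz) x flops per cycle / 1000.
# The 4 flops/cycle/processor figure is an assumption consistent with the table above.

def peak_tflops(processors: int, ghz: float, flops_per_cycle: int = 4) -> float:
    """Theoretical peak performance in TFLOPs."""
    return processors * ghz * flops_per_cycle / 1000.0

systems = {
    "bluefire": (4096, 4.7),  # -> ~77.0 TFLOPs
    "firefly":  (192, 4.7),   # -> ~3.6 TFLOPs
    "frost":    (8192, 0.7),  # -> ~22.9 TFLOPs
}

for name, (procs, ghz) in systems.items():
    print(f"{name}: {peak_tflops(procs, ghz):.1f} TFLOPs peak")
```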
High Utilization, Low Queue Wait Times

Bluefire System Utilization (daily average)
Bluefire system utilization is routinely >90%
Average queue-wait time < 1hr
[Chart: daily average utilization, 0-100%, Nov-08 through Nov-09; dips annotated "Spring ML Powerdown" and "May 3 Firmware & Software Upgrade"]

Average Queue Wait Times for User Jobs at the NCAR/CISL Computing Facility
[Chart: # of jobs (0-14,000) by queue-wait time bin (<2m, <5m, <10m, <20m, <40m, <1h, <2h, <4h, <8h, <1d, <2d, >2d) for the Premium, Regular, Economy, and Standby queues]

Average queue-wait times (hh:mm, all jobs) by queue:

Queue       bluefire          lightning         blueice            bluevista
            since Jun'08      since Dec'04      Jan'07-Jun'08      Jan'06-Sep'08
            77.0 TFLOPs       1.1 TFLOPs        12.2 TFLOPs        4.4 TFLOPs
Premium     00:05             00:11             00:11              00:12
Regular     00:36             00:16             00:37              01:36
Economy     01:52             01:10             03:37              03:51
Stand-by    01:18             00:55             02:41              02:14
NCAR FY2009 Computing Resource Usage by Discipline

Bluefire Usage (FY2009):
Climate 49.3%
IP
Accelerated Scientific Discovery 18.0%
Weather 10%
Upper Atmosphere 0.3%
Miscellaneous 0.3%
Cloud Physics 1.3%
Basic Fluid Dynamics 2.6%
Oceanography 4.0%
Atmospheric Chemistry 4.4%
Astrophysics 3.5%
CURRENT NCAR COMPUTING >100 TFLOPs

[Chart: Peak TFLOPs at NCAR (All Systems), 0-100 TFLOPs, Jan-00 through Jan-10, by procurement phase (ARCS Phases 1-4, ICESS Phases 1-2). Systems: IBM POWER3 (babyblue), IBM POWER3 (blackforest), SGI Origin3800/128, IBM POWER4 (bluedawn), IBM POWER4/Colony (bluesky), IBM POWER4/Federation (thunder), IBM Opteron/Linux (lightning), IBM Opteron/Linux (pegasus), IBM BlueGene/L (frost), IBM POWER5/p575/HPS (bluevista), IBM POWER5+/p575/HPS (blueice), IBM POWER6/Power575/IB (firefly), IBM POWER6/Power575/IB (bluefire)]
CISL Computing Resource Users

1347 Users in FY09:
Graduate and Undergrad Students: 343 (25%)
Univ Research Associates: 280 (21%)
NCAR Assoc Sci & Support Staff: 252 (19%)
Univ Faculty: 191 (14%)
NCAR Scientists & Proj Scientists: 160 (12%)
Government: 108 (8%)
Other: 13 (1%)

Universities with largest number of users in FY09
[Bar chart: number of users (0-120) per university; users are from 114 U.S. universities]
Disk Storage Systems

Current Disk Storage for HPC and Data Collections:
– 215 TB for Bluefire (GPFS)
– 335 TB for Data Analysis and Vis (DAV) (GPFS)
– 110 TB for Frost (Local)
– 152 TB for Data Collections (RDA, ESG) (SAN)

Undergoing extensive redesign and expansion:
– Add ~2 PB to the existing GPFS system to create a large central (cross-mounted) filesystem
Data Services Redesign

Near-term Goals:
– Creation of a unified and consistent data environment for NCAR HPC
– High-performance availability of the central filesystem from many projects/systems (RDA, ESG, CAVS, Bluefire, TeraGrid)

Longer-term Goals:
– Filesystem cross-mounting
– Global WAN filesystems

Schedule (the phased additions are summed in the sketch below):
– Phase 1: Add 300 TB (now)
– Phase 2: Add 1 PB (March 2010)
– Phase 3: Add 0.75 PB (TBD)
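As a quick check (my arithmetic, not from the slides), the three phases sum to roughly the ~2 PB expansion mentioned under Disk Storage Systems.

```python
# Sketch: sum the phased capacity additions from the schedule above (in TB)
# and express the total in PB; matches the "~2 PB" figure quoted earlier.
phases_tb = {"Phase 1": 300, "Phase 2": 1000, "Phase 3": 750}
total_pb = sum(phases_tb.values()) / 1000.0
print(f"Planned central filesystem growth: {total_pb:.2f} PB")  # ~2.05 PB
```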
Mass Storage Systems

NCAR MSS (Mass Storage System)
– In production since the 1970s

Transitioning from the NCAR Mass Storage System to HPSS
– Prior to NCAR-Wyoming Supercomputer Center (NWSC) commissioning in 2012
– Cutover Jan 2011
– Transition without data migration

Tape Drives/Silos
– Sun/StorageTek T10000B (1 TB media) with SL8500 libraries
– Phasing out tape silos Q2 2010
– "Oozing" 4 PB (unique), 6 PB with duplicates (rough timescale sketched below)
  – "Online" process
  – Optimized data transfer
  – Average 20 TB/day with 10 streams
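A rough sketch (my own estimate, using only the figures quoted above) of what the quoted rate implies for the MSS-to-HPSS "ooze"; the per-stream number is derived, not stated on the slides.

```python
# Sketch: implied timescale of the MSS -> HPSS data "ooze" at ~20 TB/day,
# for 4 PB of unique data and 6 PB including duplicates (figures from the slide).

TB_PER_PB = 1000  # decimal units for a back-of-the-envelope estimate

def days_to_move(petabytes: float, tb_per_day: float = 20.0) -> float:
    """Days needed to move the given volume at the quoted aggregate rate."""
    return petabytes * TB_PER_PB / tb_per_day

for label, pb in [("unique data", 4.0), ("including duplicates", 6.0)]:
    print(f"{label}: ~{days_to_move(pb):.0f} days at 20 TB/day")

# Derived per-stream average: 20 TB/day spread over 10 streams
per_stream_mb_s = 20e12 / 10 / 86400 / 1e6
print(f"average per stream: ~{per_stream_mb_s:.0f} MB/s")
```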
Estimated Growth

[Chart: Total PBs Stored (with DSS Stewardship & IPCC AR5), 0-35 PB, Oct'07 through Oct'12]
Facility Challenge Beyond 2011

NCAR Data Center
– NCAR data center limits of power/cooling/space have been reached
– Any future computing augmentation will exceed existing center capacity
– Planning in progress

Partnership
Focus on Sustainability and Efficiency
Project Timeline
NCAR-Wyoming Supercomputer Center (NWSC) Partners

NCAR & UCAR
University of Wyoming
State of Wyoming
Cheyenne LEADS
Wyoming Business Council
Cheyenne Light, Fuel & Power Company
National Science Foundation (NSF)

http://cisl.ucar.edu/nwsc/
[Map: NCAR Mesa Lab and Cheyenne]
Focus on Sustainability

Maximum energy efficiency
LEED certification
Achievement of the smallest possible carbon footprint
Adaptable to the ever-changing landscape of High-Performance Computing
Modular and expandable space that can be adapted as program needs or technology demands dictate
Focus On Efficiency

Don't fight mother nature
The cleanest electron is an electron never used
The computer systems drive all of the overhead loads
– NCAR will evaluate in procurement
– Sustained performance / kW
[Pie chart: Typical Modern Data Center load breakdown – IT Load 65.9%, Cooling 15.0%, Electrical Losses 13.0%, Fans 6.0%, Lights 0.1%]

NWSC Design
Focus on the biggest losses
– Compressor-based cooling
– UPS losses
– Transformer losses
[Pie chart: NWSC Design load breakdown – IT Load 91.9%, remainder split among Cooling, Fans, Electrical Losses, and Lights]
Design Progress

Will be one of the most efficient data centers
– Cheyenne climate
– Eliminate refrigerant-based cooling for ~98% of the year
– Efficient water use
– Minimal electrical transformation steps
Guaranteed 10% Renewable
Option for 100% (Wind Power)
Power Usage Effectiveness (PUE)
– PUE = Total Load / Computer Load = 1.08 ~ 1.10 for WY
– Good data centers: 1.5
– Typical commercial: 2.0
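A minimal sketch (mine, not from the slides) showing how these PUE figures follow from the load breakdowns on the "Focus On Efficiency" slide, using the definition PUE = total load / computer load given above.

```python
# Sketch: PUE = total facility load / IT (computer) load.
# IT-load shares are read from the two pie charts on the "Focus On Efficiency"
# slide: ~65.9% of total load for a typical modern data center, ~91.9% for the
# NWSC design; the remainder is cooling, fans, electrical losses, and lights.

def pue(it_fraction: float) -> float:
    """Power Usage Effectiveness for a given IT share of the total load."""
    return 1.0 / it_fraction

for label, it_fraction in [("Typical modern data center", 0.659),
                           ("NWSC design", 0.919)]:
    print(f"{label}: PUE ~ {pue(it_fraction):.2f}")
# -> ~1.52 and ~1.09, consistent with the 1.5 and 1.08-1.10 figures above
```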
Expandable and Modular

Long term
– 24-acre site
– Dual substation electrical feeds
– Utility commitment up to 36 MW
– Substantial fiber optic capability

Medium term – Phase I / Phase II
– Can be doubled
– Enough structure for three or four generations
– Future container solutions or other advances

Short term – Module A – Module B
– Each module 12,000 sq. ft.
– Install 4.5 MW initially; can be doubled in Module A
– Upgrade cooling and electrical systems in 4.5 MW increments
Project Timeline

65% Design Review Complete (October 2009)
95% Design Complete
100% Design Complete (Feb 2010)
Anticipated Groundbreaking (June 2010)
Certification of Occupancy (October 2011)
Petascale System in Production (January 2012)
NCAR-Wyoming Supercomputer Center

HPC services at NWSC

PetaFlop Computing – 1 PFLOP peak minimum
100's PetaByte Archive (+20 PB yearly growth)
– HPSS
– Tape archive hardware acquisition TBD
PetaByte Shared File System(s) (5-15 PB)
– GPFS/HPSS HSM integration (or Lustre)
– Data Analysis and Vis
– Cross-mounting to NCAR Research Labs, others
Research Data Archive data servers and data portals
High-Speed Networking and some enterprise services
NWSC HPC Projection

[Chart: Peak PFLOPs at NCAR, 0.0-2.0 PFLOPs, Jan-04 through Jan-15. Existing systems (IBM POWER4/Colony (bluesky), IBM Opteron/Linux (lightning), IBM Opteron/Linux (pegasus), IBM BlueGene/L (frost), IBM POWER5/p575 (bluevista), IBM POWER5+/p575 (blueice), IBM POWER6/Power575 (bluefire)) under ARCS Phase 4 and ICESS Phases 1-2, followed by the projected NWSC HPC system with an uncertainty band.]
NCAR Data Archive Projection

[Chart: Total Data in the NCAR Archive (Actual and Projected), 0-120 petabytes, Jan-99 through Jan-15, showing Total and Unique data]
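As an illustration only, a linear projection using the "+20 PB yearly growth" figure quoted on the "HPC services at NWSC" slide; the starting size and start year are hypothetical placeholders, not values taken from the archive chart.

```python
# Sketch: linear archive-growth projection based on the "+20 PB yearly growth"
# figure quoted for NWSC. START_PB and START_YEAR are hypothetical placeholders
# chosen for illustration, not values read from the chart above.

START_YEAR = 2012          # NWSC production begins (per the project timeline)
START_PB = 15.0            # hypothetical total archive size at that point
GROWTH_PB_PER_YEAR = 20.0  # growth rate quoted on the NWSC services slide

for year in range(START_YEAR, START_YEAR + 5):
    total_pb = START_PB + GROWTH_PB_PER_YEAR * (year - START_YEAR)
    print(f"{year}: ~{total_pb:.0f} PB total")
```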
Tentative Procurement Timeline

[Gantt chart, Jul-09 through Jan-12: NDA's, RFI, RFI Evaluation, RFP Draft, RFP Refinement, Benchmark Refinement, RFP Release, Proposal Evaluation, Negotiations, NSF Approval, Award, Installation & ATP, NWSC Production]
Front Range GigaPoP (FRGP)

10 years of operation by UCAR
15 members
NLR, I2, ESnet peering @ 10Gbps
Commodity – Qwest and Level3
CPS and TransitRail Peering
Intra-FRGP peering
Akamai
10Gbps ESnet peering
www.frgp.net
Fiber, fiber, fiber

NCAR/UCAR intra-campus fiber – 10Gbps backbone
Boulder Research and Administration Network (BRAN)
Bi-State Optical Network (BiSON) – upgrading to 10/40/100Gbps capable
– Enables 10Gbps NCAR/UCAR TeraGrid connection
– Enables 10Gbps NOAA/DC link via NLR lifetime lambda
DREAM – 10Gbps
SCONE – 1Gbps
Western Regional Network (WRN)

A multi-state partnership to ensure robust, advanced, high-speed networking available for research, education, and related uses
Increased aggregate bandwidth
Decreased costs
Questions