NATIONAL OCEANIC AND ATMOSPHERIC ADMINISTRATION
NOAA High Performance Computing (HPC) Program
Brian Gross
Acting Deputy Director, High Performance Computing and Communications
Acting Project Manager, R&D High Performance Computing System
August 4, 2015
NCEP Production Suite Review
NOAA HPC Agenda
• National Strategic Computing Initiative
• Governance
• Performance
• Funding
• R&D HPCS Overview
• WCOSS Overview
• Schedules
• Big Data Project
National Strategic Computing Initiative
Executive Order signed July 29, 2015 for a multi-agency strategic vision and Federal investment strategy in high-performance computing
Objectives:
– Accelerate delivery of a capable exascale computing system
– Connect computing used for modeling and simulation to data analytic computing
– Establish, over the next 15 years, a viable path forward for future HPC systems post-Moore's Law
– Increase the capacity and capability of an enduring national HPC ecosystem
– Develop public-private collaboration to share benefits between the US Gov’t and industrial and academic sectors
Roles and Responsibilities
– Lead Agencies: DOE, DOD, NSF
– Foundational R&D Agencies: IARPA, NIST
– Deployment Agencies: NASA, NOAA, NIH, FBI, DHS
  • These will develop mission-based HPC requirements to influence the early stages of the design of new HPC systems and will seek viewpoints from the private sector and academia on target HPC requirements
https://www.whitehouse.gov/blog/2015/07/29/advancing-us-leadership-highperformance-computing
NOAA HPC Governance Structure
HPC Board
• LO DAAs, chaired by NOAA CIO
• Strategic Management
• Oversee Performance and Management of NOAA HPC
• Integrate Execution of Prioritized and Funded Requirements Using HPC
• Provide criteria and guidance for establishing allocations
Strategic Execution and Evaluation (SEE) Process
• Provides Prioritized and Funded Requirements
Allocation Committee
• NOAA Lab/Center Directors (HPC relevant)
• NOAA Program Managers (HPC relevant)
• Chaired by Lab/Center Director (Rotating)
• HPC Resource Management
  – Resource Technical Estimating
  – Allocation Planning
  – Allocation Execution
  – Monitor & Evaluate Allocations
Technical Committee (Current Integrated Management Team)
• HPCC Office, NOAA HPC Site Leads, ITSSO
• Chaired by HPCC Office Director
• Architecture/Acquisition Management
  – Acquisition Execution
  – Selection Process
  – Lifecycle Management
  – IT Security
NOAA Administrative Order (NAO) 216-110: Management and Governance of High Performance Computing
NOAA HPC Overview
NOAA HPC Overview
Chart includes FY16 PB increase profile for R&D HPCS
R&D HPCS Overview
Development HPC: Systems Integration Contract (CSC)
Performance Measures
• May 2010-May 2019 / $317M / IDIQ
• 9 yrs with 4-yr base, 4-yr option, 1-yr transition
• Minimum 96.0% System Availability
• Minimum 99.0% Data Availability
Research HPC: Interagency Agreement (DOE/ORNL)
Performance Measures
• Aug 2009-Aug 2016 / $108M / Cost Reimbursable
• 5 year agreement extended 2 years
• New IA signed June 18, 2015, through FY2020
• Minimum 96.0% System Availability
• Minimum 99.0% Data Availability
Systems Configuration
• Zeus - Fairmont, WV (GSA Leased Space)
  – Short-term/seasonal/inter-annual predictions
  – 383 teraflops - SGI
• Theia - Fairmont, WV (Zeus Replacement)
  – Short-term/seasonal/inter-annual predictions
  – 1,024 teraflops - Cray
• Jet - Boulder, CO (NOAA Skaggs Facility)
  – Hurricane forecast improvement
  – 421 teraflops - Aspen & Cray
• Princeton, NJ (NOAA/GFDL)
  – Climate post-processing and analysis
  – 106 nodes (8 core Intel Xeon) - Dell
• Gaea - Oak Ridge, TN (Oak Ridge National Lab)
  – Climate change research and projections
  – 1,100 teraflops - Cray
• Titan - Oak Ridge, TN (Oak Ridge National Lab)
  – Applications for next generation architectures
  – 500 teraflops (2.6M node-hours) allocation of 27,000 teraflops Cray using Nvidia Graphics Processing Units
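As a rough illustration of the capacity figures on this slide, the sketch below adds up the peak teraflops quoted for the NOAA-operated R&D systems and works out what fraction of Titan the NOAA allocation represents. The dictionary and function names are illustrative assumptions, not part of any NOAA tool. Princeton is left out because the slide quotes it in nodes rather than teraflops, and Zeus and Theia are both counted even though Theia is listed as the Zeus replacement, so the total is an upper bound for the transition period.

```python
# Peak capacities in teraflops, copied from the Systems Configuration entries.
# Princeton is omitted (quoted in nodes, not teraflops).
# Titan is handled separately because NOAA holds an allocation, not the system.
listed_peak_tf = {
    "Zeus": 383,
    "Theia": 1024,   # listed as the Zeus replacement
    "Jet": 421,
    "Gaea": 1100,
}

TITAN_ALLOCATION_TF = 500
TITAN_SYSTEM_TF = 27_000


def total_peak_tf(systems):
    """Sum the peak teraflops of all listed systems."""
    return sum(systems.values())


def allocation_share(allocated_tf, system_tf):
    """Fraction of a shared system covered by an allocation."""
    return allocated_tf / system_tf


total = total_peak_tf(listed_peak_tf)
titan_share = allocation_share(TITAN_ALLOCATION_TF, TITAN_SYSTEM_TF)

print(f"Listed R&D peak capacity: {total} teraflops")
print(f"NOAA share of Titan: {titan_share:.1%}")
```

Peak teraflops from different vendors and processor generations are not directly comparable in delivered performance, so the sum is only a coarse sizing figure.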
WCOSS Overview
Contract Award
• Awarded to IBM on November 23, 2011
• ID/IQ w/ firm fixed price task orders
• Contract Value of $502 million
• Period of Performance
  – 5 year base period (Nov 2011 - Nov 2016)
  – 3 year option period
  – 2 year option period for transition
Task Orders
• Task Order 01
  – Initial project management task
• Task Order 02
  – Phase I Base system 170 TF (2012)
  – Phase 2 Midlife Upgrade 600 TF (2015)
• Task Order 03
  – Phase I enhancement 60 TF (2012)
• Task Order 04: Cray XC-40
  – 2,060 teraflops per site
Facility Locations
• Primary
  – Reston, VA (IBM provided facility)
• Backup
  – Orlando, FL (IBM provided facility)
System Configuration
• Identical Systems (per site)
  – IBM iDataPlex (Sandybridge)/NextScale (IvyBridge)
  – 830 teraflops per site
Performance Requirements
– Minimum 99.9% Operational Use Time
– Minimum 99.0% On-time Product Generation
– Minimum 99.0% Development Use Time
– Minimum 99.0% System Availability
– Failover tested regularly
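To give a feel for what these percentage thresholds allow, the sketch below converts a minimum availability into the maximum time that can be lost over a measurement period. The slide does not say over what period the percentages are measured, so the one-year default (8,760 hours, non-leap year) and the 30-day example are assumptions, as is the function name.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def max_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours that may be lost in a period while still meeting the minimum."""
    return period_hours * (100 - availability_pct) / 100


# Percentage thresholds as listed under Performance Requirements.
requirements = {
    "Operational Use Time": 99.9,
    "On-time Product Generation": 99.0,
    "Development Use Time": 99.0,
    "System Availability": 99.0,
}

for name, pct in requirements.items():
    per_year = max_downtime_hours(pct)
    per_month = max_downtime_hours(pct, period_hours=30 * 24)
    print(f"{name}: {pct}% allows up to {per_year:.2f} h/year "
          f"({per_month:.2f} h per 30 days)")
```

Under the one-year assumption, the 99.9% operational requirement leaves under nine hours of lost time per year, a tenth of what the 99.0% thresholds allow.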
R&D HPCS Schedule
[Gantt chart, FY2015 through FY2020. Bar labels are grouped by row in the order extracted; bar positions are not recoverable.]
• Gaea: Current System; Planned Recapitalization; Maintenance and Enhancement of Capability (FY16 Request - $9M/year); Maintenance and Enhancement of Capability (FY16-19 Request - Ramp Up Profile); Recapitalization; DOE IAA / Sustain Operations; DOE IAA / Five-year Agreement (two bars)
• Zeus/Theia: Current System (Zeus); Sandy Supplemental (Phase 1 - Theia); Sandy Supplemental (Phase 2 - Fine-grained); Recapitalization (FY16-19 Request - Ramp Up Profile); Maintenance and Enhancement of Capability (FY16-19 Request - Ramp Up Profile)
• Jet: Current System; Annual Enhancements (three bars); CSC Option; CSC 1 year Option; R&D HPCS Integrator follow-on
• Legend: Acquisition; System Availability; Transition; Contract / IAA
WCOSS Schedule
[Gantt chart by fiscal-year quarter, starting in FY2014. Bar labels are grouped by row; bar positions are not recoverable.]
• IBM IDIQ Contract: Base period; 3 Year IDIQ Option Period; 2 Year IDIQ Option Transition Period
• Task Order 002: Phase I (Phase I system is retired); Phase II (Phase II system is retired)
• Task Order 003: Task Order 003 system is retired
• Task Order 004: TO 4 Award; TO 4 Acceptance; NCEP can decide how long it wishes to keep TO4 System
• Task Order 005 (Power): TO 5 Award
• Task Order 006 (Phase III)
• Task Order 007 (Phase IV)
NOAA’s Big Data Project
The Idea
• Meet DOC Goal to Transform the Department’s data capacity to enhance the value, accessibility and usability of Commerce data for government, business and the public
• Unleash full potential of NOAA data through innovative approaches
• Enable private sector to develop new information products and lines of business
• Improve compliance with Open Data policy (OMB M-13-13)
The Approach
• Position NOAA’s data alongside computing and analysis capabilities
• Create self-sustainable market ecosystem where industry:
  – Moves NOAA data to cloud at no net cost to government
  – Provides public access to original NOAA data
  – Creates potential for new profitable services
• Anchor partners established in April 2015 as nucleus around which data marketplaces (Data Alliances) can form (https://data-alliance.noaa.gov/)
Results to date
• Over 135TB of NEXRAD Level II data moved from National Centers for Environmental Information (NCEI) archive in Asheville, NC to collaborators
Backup Slides
NOAA HPC Organization
[Organization chart; reporting lines are not recoverable from the extracted text.]
• Zachary Goldstein: NOAA CIO and Director, High Performance Computing and Communications
• Brian Gross: Acting Deputy Director, High Performance Computing and Communications
• Bill Lapenta: Director, NCEP
• Ben Kyger: Director, NCEP Central Operations
• NCEP Central Operations
  – Production Management Branch
  – Shared Infrastructure Services Branch
  – Systems Integration Branch
• Mitchell Ross: Director, NOAA Acquisition and Grants Office
• Kelly Mabe: Director, Strategic Sourcing Acquisition Division
• Weather and Climate Operational Supercomputing System: Mike Kane, FAC-P/PM Level III
• R&D High Performance Computing System: Brian Gross (Acting)
• R&D Integrated Management Team
• HPCC Acquisition Support: Michael Blumenfeld, FAC-C Level III; Contract Specialists
• Mike Kane, FAC-COR Level III
• Bernie Siebers, FAC-COR Level III
• Rene Rodriguez, ISSO
• Jeff Flick, ISSO