D. Powell - NCSA Evolution of an HPC Center

NCSA – Evolution of an HPC Center
Infrastructure and Services for Scientific
Analysis and Decision Support
Danny Powell
Executive Director
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Talk Outline
• About NCSA – Who we are now
– Basic numbers
– Mission
– Basic methods of operation
• Projects and Customers
– Cyber-Infrastructure and Science Projects
– Industry
– Education
– Government – Public Health
• Evolving into a successful HPC Center
– How we changed over the years
– User service – centric focus
– Your staff – it’s almost always about the people
– Management – effective roles
National Center for Supercomputing Applications
• Applied Research Unit of University of Illinois
– Origin: 1986 NSF-funded national supercomputing centers
– Original Mission: Provide state-of-the-art computing and data capabilities to the
nation’s scientists and engineers
– Develop software tools and software systems needed to make full use of advanced
computing and data systems (Mosaic, Apache Web Server, Telnet, D2K, MyProxy,
numerous others…)
• NCSA by the Numbers
– Approximately 275 staff (250 technical/professional staff)
– Two facilities (NCSA Building, NPCF) (>220k sq.ft)
Basic Facts about NCSA
• Computing/Data Resources
– Blue Waters: 11+ Petaflop (1+ PF sustained) computer (Cray)
• Most powerful machine in NSF portfolio – NSF’s only Tier One machine
• $350 million project ($200 million construction - $150 million operations)
– Mid-Range Supercomputing systems: ~200 TF
– Archival storage system: 500+ PB
– Advanced visualization systems
• Types of projects
– Local, National and Global scale
– Individual tools to large CI frameworks
– Point solutions to systemic improvements
• IP
– Majority of work at NCSA is open source.
– Can effectively deal with secure environments, proprietary codes, confidentiality
It is All About Working with Others
• Funding
– Federal Agencies, Industry, State of Illinois, Foundations, International sources
– Most projects are partnerships with others (88%)
• Leveraging skills/resources of others
• Goal to be viewed as the “Partner of Choice”
• IACAT (Institute for Advanced Computing Applications and Technologies)
– Integrates applied research of NCSA with basic research teams of Universities
• International Program
– 30+ institutions from 22+ countries
– Faculty and student exchanges, joint projects, workshops, technology sharing
• Industrial Program
– Nationally/internationally recognized for its level of functional interaction, technology transfer, student engagement
– 23+ companies (Fortune 50/100/500, smaller technology companies)
NCSA Bridges Basic Research and Commercialization with Application
[Figure: product life cycle from basic research to commercialization]
Phase 0: Concept/Vision
Phase 1: Feasibility
Phase 2: Design/Development
Phase 3: Prototyping
Phase 4: Production/Deployment
Universities & Labs: Theoretical & Basic Research
NCSA: Applied Prototyping & Development; Optimization & Robustification
Private Industry: Commercialization & Production (.com or .org)
NCSA bridges the gap between Basic Research & Commercialization (Application, Economic Development)
Mission: Enable Science/Engineering/Education
[Diagram: users, NCSA, results]
USERS: High End Computer & Data Needs
NCSA: Enables effective/efficient use of high end computer and data resources in support of science and education
Effective Resource Utilization: Individual tools, System software, Analytics, Visualization, Integrated SW systems, Workflow, User Support, Training
Results: Scientific, Decision Support, Inquiry
Projects and Customers
CyberInfrastructure Development
A Collaboration/Partnership with a Broad Set of Communities
Blue Waters
Blue Waters Project
Input from Scientific Community
• D. Baker, University of Washington
– Protein structure refinement and determination
• M. Campanelli, RIT
– Computational relativity and gravitation
• D. Ceperley, UIUC
– Quantum Monte Carlo molecular dynamics
• J. P. Draayer, LSU
– Ab initio nuclear structure calculations
• P. Fussell, Boeing
– Aircraft design optimization
• C. C. Goodrich
– Space weather modeling
• M. Gordon, T. Windus, Iowa State University
– Electronic structure of molecules
• S. Gottlieb, Indiana University
– Lattice quantum chromodynamics
• V. Govindaraju
– Image processing and feature extraction
• M. L. Klein, University of Pennsylvania
– Biophysical and materials simulations
• J. B. Klemp et al., NCAR
– Weather forecasting/hurricane modeling
• W. K. Liu, Northwestern University
– Multiscale materials simulations
• R. Luettich, University of North Carolina
– Coastal circulation and storm surge modeling
• M. Maxey, Brown University
– Multiphase turbulent flow in channels
• S. McKee, University of Michigan
– Analysis of ATLAS data
• M. L. Norman, UCSD
– Simulations in astrophysics and cosmology
• J. P. Ostriker, Princeton University
– Virtual universe
• J. P. Schaefer, LSST Corporation
– Analysis of LSST datasets
• P. Spentzouris, Fermilab
– Design of new accelerators
• W. M. Tang, Princeton University
– Simulation of fine-scale plasma turbulence
• A. W. Thomas, D. Richards, Jefferson Lab
– Lattice QCD for hadronic and nuclear physics
• J. Tromp, Caltech/Princeton
– Global and regional seismic wave propagation
• P. R. Woodward, University of Minnesota
– Astrophysical fluid dynamics
[Figure: software and programming environment chart, layered on Cray Linux Environment (CLE)/SUSE Linux]
Languages: Fortran/CAF (OpenACC), C (OpenACC), C++ (OpenACC), Python, UPC
Compilers: Cray Compiling Environment (CCE), GNU
Programming Models, Distributed Memory (Cray MPT): MPI, SHMEM
Programming Models, Shared Memory: OpenMP 3.0
Programming Models, PGAS & Global View: UPC (CCE), CAF (CCE)
IO Libraries: NetCDF, HDF5, ADIOS
Optimized Scientific Libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
Tools, Environment setup: Modules
Performance Analysis: Cray Performance Monitoring and Analysis Tool, PAPI, PerfSuite, Tau
Debuggers: Allinea DDT, lgdb
Debugging Support Tools: Fast Track Debugger (CCE w/ DDT), Abnormal Termination Processing, STAT, Cray Comparative Debugger#
Prog. Env.: Eclipse, Traditional
Visualization: VisIt, Paraview, YT
Data Transfer: GO, HPSS
Other labels: Resource Manager, Adaptive/Other, Charm++, RAIT
Legend: Cray developed; 3rd party packaging; Under development; NCSA supported; Licensed ISV SW; Cray added-value to 3rd party
MWTCC, May 31, 2013
Blue Waters
Designed to meet compute-intensive, memory-intensive, and data-intensive needs across a
wide range of disciplines.
• Peak performance: 11.61 PF
• Cray XE6 cabinets: 237
• AMD Interlagos processors: >49,000
• 2.3 GHz
• 22,640 compute nodes
• 362,240 Bulldozer cores
• Cray XK6 cabinets: >30
• NVIDIA GPUs: >3,000
• Interconnect: Cray Gemini / 3D torus
• Usable storage: >25 PB
• Usable storage bandwidth: >1 TB/s
• Aggregate system memory: >1.5 PB
• Memory per core: 4 GB
• Number of disks: >17,000
• Number of memory DIMMS: >190,000
• External network bandwidth: 100 Gb/s scaling to 300 Gb/s
• Integrated near-line environment: scaling to 500 petabytes
• Bandwidth to near-line storage: 100 GB/s
[Image label: System Storage]
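As a rough cross-check of the figures on this slide, the per-node values implied by the core count, node count and memory per core can be worked out directly. The sketch below is illustrative only: the variable names are made up for this example, it assumes the 362,240 cores are spread evenly over the 22,640 compute nodes, and it ignores any memory on other node types, which the slide does not break out.

```python
# Figures quoted on the slide.
COMPUTE_NODES = 22_640
BULLDOZER_CORES = 362_240
MEMORY_PER_CORE_GB = 4

# Assumption: cores are spread evenly over the compute nodes.
cores_per_node = BULLDOZER_CORES // COMPUTE_NODES
memory_per_node_gb = cores_per_node * MEMORY_PER_CORE_GB

# Memory across these compute nodes only (other node types are not counted).
total_memory_gb = COMPUTE_NODES * memory_per_node_gb
total_memory_pb = total_memory_gb / 1_000_000  # decimal petabytes

print(cores_per_node, memory_per_node_gb, round(total_memory_pb, 2))
```

Under these assumptions the compute nodes alone account for a little under the ">1.5 PB" aggregate quoted on the slide; the remainder would have to come from nodes not counted here, which is an inference rather than something the slide states.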
XSEDE – National Compute and Data CyberInfrastructure
• Collaboration between multiple US CI centers with deep experience: a partnership led by NCSA
– PI: John Towns, NCSA/Univ of Illinois
– Co-PIs:
• Jay Boisseau, TACC/Univ of Texas Austin
• Gregg Peterson, NICS/Univ of Tenn-Knoxville
• Ralph Roskies, PSC/CMU
• Nancy Wilkins-Diehr, SDSC/UC-San Diego
• Partners who complement these CI centers with expertise in science, engineering, technology and education
– Univ of Virginia, SURA, Indiana Univ, Univ of Chicago, Berkeley, Shodor, Ohio Supercomputer Center, Cornell, Purdue, Rice, NCAR, Jülich Supercomputing Centre
Advanced Information Systems
National Cyberinfrastructure
Hardware
• Computers
• Data sources
• Data stores
• Networks
Software
• Middleware
• Portals
• Grid-enabled
• Applications
• Visualization
• Data analysis
• Workflows
CyberInfrastructure is also about the tools/systems that allow effective use
• Workflow
• Data management
• Software models/simulations
• Compute resources
• Software/Hardware optimization
• Visualization tools and resources
• Analytic tools
• Collaborative environments
• Resource sharing
• Publishing support tools
Examples:
Community Infrastructure Projects
• Earthquake Engineering
– Consequence based risk management for seismic events
• Environmental Observatories
– Ocean Observatories, Coupled Human/Natural Systems, BioDiversity
• Atmospheric Modeling
– Severe Weather Predictions, Regional Climate Modeling
• Astronomy
– Very large data transport, processing, and analysis pipelines
• BioMedical Informatics
– Multisource infectious disease surveillance and patient safety
• Humanities/Social Science Research
– Digital libraries, Text/Image analysis, social networks
• Science Educational Support Systems
– Teaching support and educational enhancement systems
Projects and Customers
Industrial Partnerships
Private Sector Program Partners – August 2012
Industrial Interests in HPC
• PDM (Product Development Management)
• CRM (Customer Relationship Management)
• ERP (Enterprise Resource Planning)
• SCM (Supply Chain Management)
• BENEFITS:
– Reduced Time-to-Market
– Improved Product Quality
– Reduced Prototyping Costs
– Re-use original data
– Reduced Waste
– Framework for Optimization
– Global Collaboration
Imaginations unbound
Industrial Activities
• Cycle provision
– Overflow – when need exceeds their internal capacity
– Testing – new architectures before purchasing
– Research – testing new methods prior to large investments
• Scalability, algorithms, optimization, security, …
• Prototype tool/system development
• Training
• Peer discussions – on non-competitive basis
– Stated as an important and unique reason for participating
• Industrial park participation
– Partners – proximity to expertise and students
– New company spinoffs
Projects and Customers
Education
Training
• Workshops
– Train the trainer workshops
– Targeted disciplinary/technology/techniques workshops
– National conferences and other venues
• Training materials
– XSEDE https://www.xsede.org/training1
– Blue Waters – Petascale undergraduate education program http://www.shodor.org/petascale/
• Short courses
– Virtual School of Computational Science and Engineering – petascale oriented (including big data)
– http://www.vscse.org/
– Collaboration – multiple universities
Outreach
• Public awareness
– Visualization of real scientific data in public venues
• Planetariums – digital domes – astronomy
– Hubble 3-D
– Cosmic Voyage
• Science and Technology Museums – weather, astronomy
– Search for Life
– Computational Tornado Science
– Dynamic Earth
• TV and Film
– “Tree of Life” - Academy Award nomination – Cinematography
and visual effects
– “Hunt for the Supertwister” - a public television (NOVA)
special
– “Monster of the Milky Way” - NOVA PBS television special
– Others …
Educational Technology
In support of the learning process
• Often, the technology used to support research is also valuable in supporting education
– Digital informational resources
• Books, references, lectures, photos, videos, audio
• Virtual museums, artifacts
• Data, experiments
– Tools
• Analysis, Inquiry, Applications, Visualization
• Models and Simulations
– Collaborative Environments
• Virtual coordination, workflow spaces
• Resource sharing – data, computation, visualization
Projects and Customers
Government and Public Health Informatics
Examples of Uses of HPC / Data Analytics
– Illinois State Police – analysis of historical data to help determine crime (and hence staffing) patterns
– Policy makers – hazard risk assessments and planning (and response)
– Public health officials – early warning on disease outbreaks, with informed options to manage
– National Archives – data tools for long term preservation and for public analysis of the data
– Economic Development – agricultural marketing enhancement and monitoring program
– Policy Decision Support – Urban Planners, Environmental Monitoring, Socio-Economic Modeling, Social Network analysis… many others
Evolving into a successful HPC Center
How we have changed over time
User focus
Keeping your staff sharp – not complacent
Management
Mission: Enable Science/Engineering/Education
[Diagram: users, NCSA, results]
USERS: High End Computer & Data Needs
NCSA: Enables effective/efficient use of high end computer and data resources in support of science and education
Effective Resource Utilization: Individual tools, System software, Analytics, Visualization, Integrated SW systems, Workflow, User Support, Training
Results: Scientific, Decision Support, Inquiry
Traditional Function: System Support
• System Management
– Resource and job scheduling
• Storage Management
– On-line and near-line system and data administration
– Information life cycle management
• Cyber-protection
• Networking provisioning and tuning
• System Monitoring
• System software upgrades and SW management
• Quality Assurance
BW Full Service Overview
User Support Function:
Basic and Beyond
• Requirement Analysis
• Service Request Management
• Application Services
• Application analysis
• Porting and Tuning at scale
• Bottleneck reduction
• Client consulting
• Application re-engineering
• Library and tools creation and support
• Third Party Application support
• Visualization and Data Analysis
• Information provisioning
• Documentation, notification, training, community
• Account/allocation management
• Quality Assurance
Community Engagement Function:
Relationship Building
• Partnership/Team Building
• Structured Requirement Analysis
• Workflow Systems
– Business / operation rules
– Collaborative environments
– Intuitive user interfaces
– Data storage, data management tools
– Visualization and data analytics tools
• Community engagement
• Work Plan Management
• Participation in evaluation and planning
• Trust
Staff Changes (estimated numbers)
Technical staff breakdown                                              Current   Very Early Days
Technical system administration                                        50        70
Applied R&D                                                            100       40
User Support (from basic service to customized disciplinary support)   50        20
Technical management (mid level to senior)                             50        25
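To make the shift in staffing easier to see, the estimated numbers above can be totalled and turned into percentage shares. This is only an illustrative sketch: the function and variable names are invented here, and it assumes the first number listed for each category belongs to the "Current" column and the second to "Very Early Days".

```python
# Estimated technical staff counts from the slide: (current, very early days).
# Assumption: the first value in each pair is the "Current" column.
STAFF = {
    "Technical system administration": (50, 70),
    "Applied R&D": (100, 40),
    "User support": (50, 20),
    "Technical management": (50, 25),
}


def totals(staff):
    """Return (current_total, early_total) summed over all categories."""
    current = sum(now for now, _ in staff.values())
    early = sum(then for _, then in staff.values())
    return current, early


def shares(staff, column):
    """Return each category's percentage share for column 0 (current) or 1 (early)."""
    total = sum(counts[column] for counts in staff.values())
    return {
        name: round(100 * counts[column] / total, 1)
        for name, counts in staff.items()
    }


current_total, early_total = totals(STAFF)
print(current_total, early_total)
print(shares(STAFF, 0))
print(shares(STAFF, 1))
```

Read this way, the current column sums to 250, which agrees with the "250 technical/professional staff" figure given earlier in the talk, and the largest share moves from system administration in the early days to applied R&D now.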
And Finally:
Organizational Management
• Hire and retain skilled staff
– Continued professional development
– Keep staff motivated and sharp
• Proposals – competitions
• Peer speaking engagements – personnel exchanges
• Enable them to grow personally and professionally
– Don’t micromanage – empower your staff to succeed, and let them
• The MONEY – Always the Money!!!
– Core funding – work closely with your core funding sources
– Variety of competitive grant funding
– Help your funding agencies understand the value of HPC and
CyberInfrastructure, and what it takes to be successful.
– It’s not cheap, and the ROI will take time to show value – but
without a long term commitment from your core funding agency, it
will be very, very difficult to accomplish.
Questions?
STEM Smart Workshop • 10 April 2012
Building Integrated Application/Decision Support
Systems – It’s an Iterative Process of Teamwork
[Diagram: iterative development cycle centered on the National Center for Supercomputing Applications]
Stages: Situation Analysis; Requirements Analysis & Specification; Development & System Integration; Prototype or Production Cyberenvironments
Inputs: User Representatives, Team Participation, Application Roadmaps, Technology Roadmaps, Cyberarchitecture Working Group, Integrated Project Teams
Partners: TeraGrid, Working Groups, Advisory Committees, Industrial Partners, International Partners
Components: Portals & GUIs, Workflow Mgmt, S&E Applications, Data Mining & Analysis, Visualization, Webservices, Collaboratories, Middleware, Security, Knowledge and Decision Support
Science & Engineering Application Support
Science Team (ST) Requirements and Challenges Gathering
Initial Contact
• Questionnaire filled in from Science Team
• Collaboration meeting (in person, phone, web video, etc.)
• Requirements analysis
• Project and code status on current systems
Monitor Current State and Progress
• Direct
– Participate in ST telecons and meetings, join mailing lists and wikis, etc.
– Follow trends and adoption of new technology in the area
• Indirect
– Collect resource usage, utilization, performance data
SEAS Staff and Points of Contact (PoC)
PoC Roles
• Understand ST approach
• Develop initial work plans
• Provide in-depth assistance
• Ombudsman, advocate during policy discussions
• Sub-award intermediary
• Assist in code improvements
PoC and SEAS Services
• Application analysis
• Porting, debugging, profiling at scale and depth
• Tuning, optimization and bottleneck reduction
• Algorithmic re-engineering and improvements
• Performance modeling
SEAS Accessibility
• Peer-to-Peer immediate contact
• Email, IM, phone, web, in-person
Work Plan Management
• Regular contact via email, phone, or web conference
• Assessment of milestones and deliverables
Traditional Services
• Help desk
• Service request tracking
• Accounts and allocations
• Consulting
• Software inventory
Information Provisioning
• Team portal/wiki space
• Portal documentation
• Individualized training
• Workshops and webinars
Advanced Information Systems
Major New Data Sources
Computers
New high-end computers are producing
massive amounts of data from ever more
detailed computational models
Sensors, Surveys and Satellites
Sensor arrays, aerial surveys and satellite data
will revolutionize our understanding of the
environment
Instruments
New instruments, e.g., telescopes and detectors,
are using advanced digital technologies to
support increasingly detailed observations
NDEMC - OVERVIEW
• $5M, 18-month Public-Private Partnership (PPP)
• 4 OEMs; 4 solution providers
• Phase 1: 8 manufacturing sector SMEs
• Advanced modeling, simulation & analysis (MS&A)
• Rationale:
– MS&A adoption by OEMs is high and growing
– SMMs’ use of advanced MS&A is suboptimal
– ROI is definitely favorable
• Objectives:
– Boost MS&A adoption at SMMs
– Simplified access to advanced MS&A
– Demonstrate a scalable business model
Networks are Critical Infrastructure