NCSA – Evolution of an HPC Center: Infrastructure and Services for Scientific Analysis and Decision Support
Danny Powell, Executive Director
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Talk Outline
• About NCSA
 – Who we are now
 – Basic numbers
 – Mission
 – Basic methods of operation
• Projects and Customers
 – Cyber-infrastructure and science projects
 – Industry
 – Education
 – Government – public health
• Evolving into a successful HPC Center
 – How we changed over the years
 – A user-service-centric focus
 – Your staff – it's almost always about the people
 – Management – effective roles

National Center for Supercomputing Applications
• An applied research unit of the University of Illinois
 – Origin: one of the NSF-funded national supercomputing centers, established in 1986
 – Original mission: provide state-of-the-art computing and data capabilities to the nation's scientists and engineers
 – Develop the software tools and systems needed to make full use of advanced computing and data systems (Mosaic, Apache Web Server, Telnet, D2K, MyProxy, and numerous others)
• NCSA by the numbers
 – Approximately 275 staff (250 technical/professional)
 – Two facilities (NCSA Building, NPCF; >220,000 sq. ft.)

Basic Facts about NCSA
• Computing/data resources
 – Blue Waters: an 11+ petaflop (1+ PF sustained) Cray system
  • The most powerful machine in the NSF portfolio – NSF's only Tier One machine
  • A $350 million project ($200 million construction, $150 million operations)
 – Mid-range supercomputing systems: ~200 TF
 – Archival storage system: 500+ PB
 – Advanced visualization systems
• Types of projects
 – Local, national, and global in scale
 – From individual tools to large CI frameworks
 – From point solutions to systemic improvements
• IP
 – The majority of work at NCSA is open source
 – Can effectively deal with secure environments, proprietary codes, and confidentiality

It is All About Working with Others
• Funding
 – Federal agencies, industry, the State of Illinois, foundations, international sources
 – Most projects (88%) are partnerships with others
  • Leveraging the skills and resources of others
  • Goal: to be viewed as the "Partner of Choice"
• IACAT (Institute for Advanced Computing Applications and Technologies)
 – Integrates the applied research of NCSA with the basic research teams of universities
• International Program
 – 30+ institutions from 22+ countries
 – Faculty and student exchanges, joint projects, workshops, technology sharing
• Industrial Program
 – Nationally and internationally recognized for its level of functional interaction, technology transfer, and student engagement
 – 23+ companies (Fortune 50/100/500 and smaller technology companies)

NCSA Bridges Basic Research and Commercialization with Application
• Product life cycle: Phase 0 Concept/Vision → Phase 1 Feasibility → Phase 2 Design/Development → Phase 3 Prototyping → Phase 4 Production/Deployment
• Universities & labs: theoretical & basic research
• NCSA: applied prototyping & development, optimization & robustification – bridges the gap between basic research and commercialization
• Private industry: commercialization & production (.com or .org), application, economic development

Mission: Enable Science/Engineering/Education
• USERS: high-end computer & data needs
• Effective resource utilization: individual tools, system software, analytics, visualization, integrated SW systems, workflow, user support, training
• NCSA enables the effective/efficient use of high-end computer and data resources in support of science and education
• Results: scientific, decision support, inquiry

Projects and Customers: CyberInfrastructure Development
• A collaboration/partnership with a broad set of communities

Blue Waters Project: Input from the Scientific Community
• D. Baker, University of Washington – Protein structure refinement and determination
• M. Campanelli, RIT – Computational relativity and gravitation
• D. Ceperley, UIUC – Quantum Monte Carlo molecular dynamics
• J. P. Draayer, LSU – Ab initio nuclear structure calculations
• P. Fussell, Boeing – Aircraft design optimization
• C. C. Goodrich – Space weather modeling
• S. Gottlieb, Indiana University – Lattice quantum chromodynamics
• M. Gordon, T. Windus, Iowa State University – Electronic structure of molecules
• V. Govindaraju – Image processing and feature extraction
• M. L. Klein, University of Pennsylvania – Biophysical and materials simulations
• J. B. Klemp et al., NCAR – Weather forecasting/hurricane modeling
• W. K. Liu, Northwestern University – Multiscale materials simulations
• R. Luettich, University of North Carolina – Coastal circulation and storm surge modeling
• M. Maxey, Brown University – Multiphase turbulent flow in channels
• S. McKee, University of Michigan – Analysis of ATLAS data
• M. L. Norman, UCSD – Simulations in astrophysics and cosmology
• J. P. Ostriker, Princeton University – Virtual universe
• J. P. Schaefer, LSST Corporation – Analysis of LSST datasets
• P. Spentzouris, Fermilab – Design of new accelerators
• W. M. Tang, Princeton University – Simulation of fine-scale plasma turbulence
• A. W. Thomas, D. Richards, Jefferson Lab – Lattice QCD for hadronic and nuclear physics
• J. Tromp, Caltech/Princeton – Global and regional seismic wave propagation
• P. R. Woodward, University of Minnesota – Astrophysical fluid dynamics

Blue Waters Programming Environment
• Languages: Fortran/CAF (OpenACC), C (OpenACC), C++ (OpenACC), Python, UPC
• Compilers: Cray Compiling Environment (CCE), GNU
• Programming models
 – Distributed memory (Cray MPT): MPI, SHMEM
 – Shared memory: OpenMP 3.0
 – PGAS & global view: UPC (CCE), CAF (CCE)
 – Traditional: Charm++
• I/O libraries: NetCDF, HDF5, ADIOS
• Tools
 – Environment setup: Modules
 – Performance analysis: Cray Performance Monitoring and Analysis Tool, PAPI, PerfSuite, Tau
 – Debuggers: Allinea DDT, lgdb
 – Debugging support tools: Fast Track Debugger (CCE with DDT), Abnormal Termination Processing, STAT, Cray Comparative Debugger
 – Programming environment: Eclipse
• Optimized scientific libraries: BLAS (libgoto), LAPACK, ScaLAPACK, Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
• Resource manager: Adaptive/other
• Visualization: VisIt, Paraview, YT
• Data transfer and archive: GO, HPSS, RAIT
• Operating system: Cray Linux Environment (CLE)/SUSE Linux
(Components span Cray-developed, Cray added-value third-party, third-party packaged, NCSA-supported, licensed ISV, and under-development software. MWTCC, May 31, 2013)

Blue Waters: designed to meet compute-intensive, memory-intensive, and data-intensive needs across a wide range of disciplines.
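The distributed-memory model in the programming environment above comes down to domain decomposition plus a reduction. A minimal sketch of that pattern, illustrative only and not the Cray MPT stack: Python threads stand in for MPI ranks, each "rank" integrates f(x) = 4/(1+x²) over its own block of [0, 1], and summing the partial results (cf. MPI_Reduce) approximates pi.

```python
# Illustrative sketch (NOT Cray MPT): block domain decomposition + reduction,
# the pattern an MPI code on Blue Waters would express with real ranks.
from concurrent.futures import ThreadPoolExecutor

NSTEPS = 1_000_000  # total midpoint-rule intervals over [0, 1]

def partial_pi(rank, nranks):
    """Integrate 4/(1+x^2) over the block of intervals owned by this rank."""
    h = 1.0 / NSTEPS
    lo = rank * NSTEPS // nranks        # first interval owned by this rank
    hi = (rank + 1) * NSTEPS // nranks  # one past the last interval
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(lo, hi))

nranks = 4
with ThreadPoolExecutor(max_workers=nranks) as pool:
    parts = list(pool.map(partial_pi, range(nranks), [nranks] * nranks))
pi_estimate = sum(parts)  # the reduction step (cf. MPI_Reduce with MPI_SUM)
print(f"pi ~= {pi_estimate:.8f}")
```

The block decomposition (`lo`/`hi` computed from the rank index) is the same ownership rule a real MPI code would use; only the transport differs.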
• Peak performance: 11.61 PF
• Cray XE6 cabinets: 237
 – AMD Interlagos processors: >49,000 at 2.3 GHz
 – Compute nodes: 22,640
 – Bulldozer cores: 362,240
• Cray XK6 cabinets: >30
 – NVIDIA GPUs: >3,000
• Interconnect: Cray Gemini 3D torus
• Memory: 4 GB per core; >190,000 memory DIMMs; aggregate system memory >1.5 PB
• System storage: usable storage >25 PB; >17,000 disks; usable storage bandwidth >1 TB/s
• Integrated near-line environment: scaling to 500 petabytes; bandwidth to near-line storage 100 GB/s
• External network bandwidth: 100 Gb/s, scaling to 300 Gb/s

XSEDE – National Compute and Data CyberInfrastructure
• A collaboration between multiple US CI centers with deep experience; a partnership led by NCSA
• PI: John Towns, NCSA/Univ of Illinois
• Co-PIs: Jay Boisseau (TACC/Univ of Texas at Austin), Gregg Peterson (NICS/Univ of Tennessee–Knoxville), Ralph Roskies (PSC/CMU), Nancy Wilkins-Diehr (SDSC/UC San Diego)
• Partners who complement these CI centers with expertise in science, engineering, technology, and education: Univ of Virginia, SURA, Indiana Univ, Univ of Chicago, Berkeley, Shodor, Ohio Supercomputer Center, Cornell, Purdue, Rice, NCAR, Jülich Supercomputing Centre

Advanced Information Systems: National Cyberinfrastructure
• Hardware: computers, data sources, data stores, networks
• Software: middleware, portals, grid-enabled applications, visualization, data analysis, workflows

CyberInfrastructure Is Also About the Tools/Systems That Allow Effective Use
• Workflow
• Data management
• Software models/simulations
• Compute resources
• Software/hardware optimization
• Visualization tools and resources
• Analytic tools
• Collaborative environments
• Resource sharing
• Publishing support tools

Examples: Community Infrastructure Projects
• Earthquake engineering
 – Consequence-based risk management for seismic events
• Environmental observatories
 – Ocean observatories, coupled human/natural systems, biodiversity
• Atmospheric modeling
 – Severe weather prediction, regional climate modeling
• Astronomy
 – Very large data transport, processing, and analysis pipelines
• Biomedical informatics
 – Multisource infectious disease surveillance and patient safety
• Humanities/social science research
 – Digital libraries, text/image analysis, social networks
• Science educational support systems
 – Teaching support and educational enhancement systems

Projects and Customers: Industrial Partnerships

Private Sector Program Partners – August 2012

Industrial Interests in HPC
• PDM (Product Development Management)
• CRM (Customer Relationship Management)
• ERP (Enterprise Resource Planning)
• SCM (Supply Chain Management)
• Benefits
 – Reduced time-to-market
 – Improved product quality
 – Reduced prototyping costs
 – Re-use of original data
 – Reduced waste
 – A framework for optimization
 – Global collaboration

Industrial Activities
• Cycle provision
 – Overflow – when need exceeds internal capacity
 – Testing – new architectures before purchasing
 – Research – testing new methods (scalability, algorithms, optimization, security, …) prior to large investments
• Prototype tool/system development
• Training
• Peer discussions – on a non-competitive basis
 – Stated as an important and unique reason for participating
• Industrial park participation
 – Partners – proximity to expertise and students
 – New company spinoffs

Projects and Customers: Education

Training
• Workshops
 – Train-the-trainer workshops
 – Targeted disciplinary/technology/techniques workshops
 – National conferences and other venues
• Training materials
 – XSEDE: https://www.xsede.org/training1
 – Blue Waters – petascale undergraduate education program:
   http://www.shodor.org/petascale/
• Short courses
 – Virtual School of Computational Science and Engineering – petascale-oriented (including big data): http://www.vscse.org/
 – A collaboration among multiple universities

Outreach
• Public awareness – visualization of real scientific data in public venues
• Planetariums – digital domes – astronomy
 – Hubble 3-D
 – Cosmic Voyage
• Science and technology museums – weather, astronomy
 – Search for Life
 – Computational Tornado Science
 – Dynamic Earth
• TV and film
 – "Tree of Life" – Academy Award nomination; cinematography and visual effects
 – "Hunt for the Supertwister" – a public television (NOVA) special
 – "Monster of the Milky Way" – a NOVA PBS television special
 – Others…

Educational Technology: In Support of the Learning Process
• Often, the technology used to support research is also valuable in supporting education
 – Digital informational resources
  • Books, references, lectures, photos, videos, audio
  • Virtual museums, artifacts
  • Data, experiments
 – Tools
  • Analysis, inquiry, applications, visualization
  • Models and simulations
 – Collaborative environments
  • Virtual coordination, workflow spaces
  • Resource sharing – data, computation, visualization

Projects and Customers: Government and Public Health Informatics

Examples of Uses of HPC / Data Analytics
• Illinois State Police – analysis of historical data to help determine crime (and hence staffing) patterns
• Policy makers – hazard risk assessments, planning, and response
• Public health officials – early warning on disease outbreaks, with informed options to manage them
• National Archives – data tools for long-term preservation and for public analysis of the data
• Economic development – agricultural marketing enhancement and monitoring program
• Policy decision support – urban
planners, environmental monitoring, socio-economic modeling, social network analysis, … and many others

Evolving into a Successful HPC Center
• How we have changed over time
• User focus
• Keeping your staff sharp – not complacent
• Management

Mission: Enable Science/Engineering/Education

Traditional Function: System Support
• System management
 – Resource and job scheduling
• Storage management
 – On-line and near-line system and data administration
 – Information life-cycle management
• Cyber-protection
• Network provisioning and tuning
• System monitoring
• System software upgrades and software management
• Quality assurance

User Support Function: Basic and Beyond
• Requirement analysis
• Service request management
• Application services
 – Application analysis
 – Porting and tuning at scale
 – Bottleneck reduction
 – Client consulting
 – Application re-engineering
 – Library and tools creation and support
 – Third-party application support
• Visualization and data analysis
• Information provisioning
 – Documentation, notification, training, community
• Account/allocation management
• Quality assurance

Community Engagement Function: Relationship Building
• Partnership/team building
• Structured requirement analysis
• Workflow systems
 – Business/operation rules
 – Collaborative environments
 – Intuitive user interfaces
 – Data storage and data management tools
 – Visualization and data analytics tools
• Community engagement
• Work plan management
• Participation in evaluation and planning
• Trust

Staff Changes (estimated numbers)

Technical staff breakdown                        Current   Very early days
Technical system administration                     50          70
Applied R&D                                        100          40
User support (from basic service to
  customized disciplinary support)                  50          20
Technical management (mid-level to senior)          50          25

And Finally: Organizational Management
• Hire and retain skilled staff
 – Continued professional development
 – Keep staff motivated and sharp
  • Proposals – competitions
  • Peer speaking engagements – personnel exchanges
 – Enable them to grow personally and professionally
 – Don't micromanage – empower your staff to succeed, and let them
• The MONEY – always the money!
 – Core funding – work closely with your core funding sources
 – A variety of competitive grant funding
 – Help your funding agencies understand the value of HPC and cyberinfrastructure, and what it takes to be successful
 – It's not cheap, and the ROI will take time to show value – but without a long-term commitment from your core funding agency, it will be very, very difficult to succeed.

Questions?
STEM Smart Workshop • 10 April 2012

Building Integrated Application/Decision Support Systems – It's an Iterative Process of Teamwork
• User representatives: team participation, application roadmaps
• Requirements analysis & specification → development & system integration → prototype or production cyberenvironments
• Partners: TeraGrid working groups, advisory committees, industrial partners, international partners
• Inputs: situation analysis, technology roadmaps, cyberarchitecture working group, integrated project teams

Knowledge and Decision Support – Component Layers
• Portals & GUIs, workflow management, S&E applications, data mining & analysis, visualization, web services, collaboratories, middleware, security

Science & Engineering Application Support (SEAS)
• Science Team (ST) requirements and challenges gathering – SEAS staff and points of contact (PoC)
• Initial contact
 – Questionnaire filled in by the science team
 – Collaboration meeting (in person, phone, web video, etc.)
 – Requirements analysis
 – Project and code status on current systems
• PoC roles
 – Understand the ST approach
 – Develop initial work plans
 – Provide in-depth assistance
 – Ombudsman and advocate during policy discussions
 – Sub-award intermediary
• Monitor current state and progress
 – Direct
  • Participate in ST telecons and meetings, join mailing lists and wikis, etc.
  • Follow trends and the adoption of new technology in the area
 – Indirect
  • Collect resource usage, utilization, and performance data
  • Assist in code improvements
• PoC and SEAS services
 – Application analysis
 – Porting, debugging, and profiling at scale and in depth
 – Tuning, optimization, and bottleneck reduction
 – Algorithmic re-engineering and improvements
 – Performance modeling
• SEAS accessibility
 – Peer-to-peer immediate contact
 – Email, IM, phone, web, in person
• Work plan management
 – Regular contact via email, phone, or web conference
 – Assessment of milestones and deliverables
• Traditional services
 – Help desk
 – Service request tracking
 – Accounts and allocations
 – Consulting
 – Software inventory
• Information provisioning
 – Team portal/wiki space
 – Portal documentation
 – Individualized training
 – Workshops and webinars

Advanced Information Systems: Major New Data Sources
• Computers – new high-end computers are producing massive amounts of data from ever more detailed computational models
• Sensors, surveys, and satellites – sensor arrays, aerial surveys, and satellite data will revolutionize our understanding of the environment
• Instruments – new instruments, e.g., telescopes and detectors, are using advanced digital technologies to support increasingly detailed observations

NDEMC – Overview
• A $5M, 18-month public-private partnership (PPP)
• 4 OEMs; 4 solution providers
• Phase 1: 8 manufacturing-sector SMEs
• Advanced modeling, simulation & analysis (MS&A)
• Rationale
 – MS&A adoption by OEMs is high and growing
 – SMMs' use of advanced MS&A is suboptimal
 – The ROI is clearly favorable
• Objectives
 – Boost MS&A adoption at SMMs
 – Simplify access to advanced MS&A
 – Demonstrate a scalable business model

Networks Are Critical Infrastructure
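The profile → find bottleneck → tune loop at the heart of the SEAS application-analysis services can be shown in miniature with Python's built-in profiler. This is a toy sketch only; the function names are invented for illustration, and on a system like Blue Waters the same workflow would use the stack's own tools (CrayPat, PAPI, PerfSuite, Tau).

```python
# Toy sketch of SEAS-style application analysis: profile a small "application",
# then confirm the profiler names the hot function that tuning would target.
import cProfile
import io
import pstats

def slow_norm(v):
    """Deliberately naive Euclidean norm -- the bottleneck to be found."""
    s = 0.0
    for x in v:          # pure-Python loop: slow by design
        s += x * x
    return s ** 0.5

def driver():
    """Stand-in 'application' that calls the hot kernel repeatedly."""
    v = [0.001 * i for i in range(50_000)]
    return [slow_norm(v) for _ in range(10)]

prof = cProfile.Profile()
prof.enable()
result = driver()
prof.disable()

# Dump the top entries by cumulative time into a string and inspect it.
stream = io.StringIO()
pstats.Stats(prof, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
hot_spot_found = "slow_norm" in report  # the profile names the hot function
print("hot spot identified:", hot_spot_found)
```

Once the hot spot is identified, the SEAS steps that follow (tuning, bottleneck reduction, algorithmic re-engineering) have a concrete target.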