LLNL-PRES-501654

FASTMath: Frameworks, Algorithms and Scalable Technologies for Mathematics
FASTMath Team
Lori Diachin, Institute Director
FASTMath SciDAC Institute

The FASTMath SciDAC project focuses on the development and use of mathematics software libraries
The FASTMath SciDAC Institute develops and deploys scalable mathematical algorithms and software tools for reliable simulation of complex physical phenomena, and collaborates with DOE domain scientists to ensure the usefulness and applicability of FASTMath technologies.

Our goal is to help application scientists overcome two fundamental challenges
1. Improve the quality of their simulations
• Increase accuracy
• Increase physical fidelity
• Improve robustness and reliability
2. Adapt computations to make effective use of HPC platforms
• Massive concurrency
• Multi-/many-core nodes
FASTMath addresses both challenges by focusing on the interactions among mathematical algorithms, software design, and computer architectures.

The FASTMath project brings together three previously separate SciDAC applied math projects
• APDEC: focused on structured grid capabilities
• ITAPS: focused on unstructured grid capabilities
• TOPS: focused on solution of algebraic systems
FASTMath continues these focus areas and has also started new work on integrating key technologies.

FASTMath encompasses three broad topical areas
Tools for Problem Discretization
• Structured grid technologies
• Unstructured grid technologies
• Adaptive mesh refinement
• Complex geometry
• High-order discretizations
• Particle methods
• Time integration
Solution of Algebraic Systems
• Iterative solution of linear systems
• Direct solution of linear systems
• Nonlinear systems
• Eigensystems
• Differential variational inequalities
High Level Integrated Capabilities
• Adaptivity through the software stack
• Management of field data
• Coupling different physics domains
• Mesh/particle coupling methods

Structured grid capabilities focus on high order, mapped grids, embedded boundaries, AMR and particles
• Structured AMR
• Mapped multiblock grids
• Embedded boundary methods
• Particle-based methods
Application to cosmology, astrophysics, accelerator modeling, fusion, climate, subsurface reacting flows, low Mach number combustion, etc.

Our unstructured grid capabilities focus on adaptivity, high order, and the tools needed for extreme scaling
• Parallel mesh infrastructures
• Dynamic load balancing
• Mesh adaptation and quality control
• Parallel performance on unstructured meshes
• Architecture-aware implementations
Application to fusion, climate, accelerator modeling, NNSA applications, nuclear energy, manufacturing processes, etc.

Our work on algebraic systems provides key solution technologies to applications
• Linear system solution using direct and iterative solvers
• Nonlinear system solution using acceleration techniques and globalized Newton methods
• Eigensolvers using iterative techniques and optimization
• Differential variational inequalities based on active-set algorithms
Application to fusion, nuclear structure calculation, quantum chemistry, accelerator modeling, climate, dislocation dynamics, etc.

Integrating technologies is a key value added by the FASTMath Institute
• Mesh/solver interactions
• Mesh-to-mesh coupling methods
• Unstructured mesh technologies into simulation workflows
• Software unification strategies
Application to climate, plasma surface interactions, structural mechanics, nuclear energy, cosmology, fluid flow, etc.
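
The "globalized Newton methods" named on the algebraic systems slide can be illustrated with a small sketch. This is not code from any FASTMath package: the function name `newton_backtracking`, its parameters, and the scalar test problem are assumptions chosen for illustration. It shows one common globalization, a backtracking line search that shortens the Newton step until the residual decreases.

```python
import math


def newton_backtracking(f, df, x0, tol=1e-10, max_iter=50):
    """Scalar Newton iteration globalized by a backtracking line search.

    Hypothetical helper for illustration only; production solvers work on
    large systems and use more careful step-acceptance tests.
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        step = -fx / df(x)  # full Newton step
        t = 1.0
        # Halve the step length until the residual shows sufficient decrease.
        while abs(f(x + t * step)) > (1.0 - 1e-4 * t) * abs(fx) and t > 1e-12:
            t *= 0.5
        x += t * step
    return x


# atan(x) = 0 is a standard case where the full Newton step overshoots
# when started at x0 = 3, so the line search has to shorten the first steps.
root = newton_backtracking(math.atan, lambda x: 1.0 / (1.0 + x * x), x0=3.0)
```

The line search costs extra residual evaluations only when the full step fails to reduce the residual; near the solution the full step is accepted and the usual fast local convergence is retained.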

FASTMath encompasses our algorithm development in widely used software
FASTMath Software
Structured Mesh Tools
• BoxLib (Ann Almgren)
• Chombo (Phil Colella)
Unstructured Mesh Tools
• PUMI (Seegyoung Seol)
• MeshAdapt (Mark Shephard)
• MOAB (Vijay Mahadevan)
• Mesquite (Lori Diachin)
• PHASTA (Ken Jansen)
• APF (Cameron Smith)
Partitioning Tools
• Zoltan (Karen Devine)
• ParMA (Mark Shephard)
Linear Solvers
• Hypre (Rob Falgout)
• PETSc (Barry Smith)
• SuperLU (Sherry Li)
• ML/Trilinos (Jonathan Hu)
Nonlinear Solvers/Differential Variational Inequalities
• SUNDIALS (Carol Woodward)
• PETSc (Barry Smith)
• NOX/Trilinos (Andy Salinger)
Time Integrators
• SUNDIALS (Carol Woodward)
• PETSc (Barry Smith)
Eigensolvers
• PARPACK (Chao Yang)

Three primary metrics gauge our success
1. Use of FASTMath technologies in applications
• SciDAC applications
• Broader engagement with SC, NNSA, Energy, DoD, NASA, industry, other
2. Deployment of FASTMath software
• Efficient and robust on DOE compute facilities
• Improving ease of use
3. Participation in the scientific community
• 110 publications; 235 presentations since project inception
• Team and individual tutorials on FASTMath software

Our software is used world-wide and in a variety of application areas
Each entry lists package name and usage statistics, basic capabilities, and application examples.
• Chombo (600 downloads since 2013). Capabilities: structured grid, mapped multi-block, EB, AMR, particles. Applications: fusion (COGENT), subsurface reacting flows (EFRC), climate (PISCEES), space physics, cell biology, etc.
• BoxLib (1040 registered users since 2012). Capabilities: structured grid, AMR, particles. Applications: astrophysics (CASTRO, MAESTRO), cosmology (Nyx), combustion, subsurface flow, etc.
• MOAB/CouPE (~2000 downloads/yr). Capabilities: structured/unstructured grid, mesh-to-mesh transfer. Applications: climate modeling (ACES4BGC, ACME), plasma-surface interactions (PSI), nuclear energy, etc.
• PUMI/MeshAdapt. Capabilities: unstructured mesh management, communications, geometry. Applications: MHD (M3DC1), electromagnetics (ACE3P), fluid/structure interaction, CFD, NASA, DoD, industry.
• Zoltan(2) (>4800 downloads/yr in Trilinos). Capabilities: parallel load balancing and partitioning. Applications: crash simulations, particle-based methods, multi-physics codes, solvers, graph analytics, etc.
• ML/MueLu (>4800 downloads/yr in Trilinos). Capabilities: algebraic multigrid. Applications: climate modeling (PISCEES), Sierra, NNSA applications, DoD applications, etc.
• hypre (~1800 downloads/yr). Capabilities: algebraic multigrid. Applications: groundwater flow, lattice QCD, astrophysics, electromagnetics, NNSA integrated codes, etc.
• PETSc (~40,000 downloads/yr). Capabilities: algebraic solvers and ODE integrators. Applications: plasma surface interactions, fusion, nuclear energy, subsurface flow, etc.
• SuperLU (~25,000 downloads/yr). Capabilities: direct solution of sparse, nonsymmetric systems. Applications: commercial numerical libraries, fusion, materials, accelerator modeling, chemistry, etc.
• SUNDIALS/ARKode (~3300 downloads/yr). Capabilities: time integrators and nonlinear solvers. Applications: climate, dislocation dynamics (ParaDiS), power grid simulations, etc.
• Albany (new technology). Capabilities: finite element code. Applications: climate (PISCEES), structural mechanics, materials,
semiconductor analysis, quantum devices, etc.
• PAALS (new technology). Capabilities: unstructured grids, AMR, linear/nonlinear solvers, load balance. Applications: laser welds, mechanics modeling, electronic circuits, etc.

FASTMath technologies are used in SciDAC application partnerships across SC and NNSA
Columns: Title; PI; FASTMath involvement; Office; Technology (SG, UG, S)
• Predicting Ice Sheet and Climate Evolution at Extreme Scales (PISCEES); Steve Price, LANL and Esmond Ng, LBNL; Ng, Salinger; BER
• Multiscale Methods for Accurate, Efficient, and Scale-Aware Models of the Earth System; William Collins, LBNL; Woodward, Colella; BER
• Applying Computationally Efficient Schemes for BioGeochemical Cycles; Forrest Hoffman, ORNL; Mahadevan; BER
• Charge Transfer and Charge Transport in Photoactivated Systems; Chris Cramer, Minnesota; Ng, C Yang; BES
• Simulating the generation, evolution and fate of electronic excitations in molecular and nanoscale materials with first principles methods; Martin Head-Gordon; Li, Ng, C Yang; BES
• Advanced Modeling of Ions in Solutions, on Surfaces, and in Biological Environments; Roberto Car, Princeton; Ng, C Yang; BES
• Scalable Computational Tools for Discovery and Design: Excited State Phenomena in Energy Materials; James Chelikowsky, UT Austin; C Yang; BES
• Discontinuous methods for accurate, massively parallel quantum molecular dynamics: Lithium ion interface dynamics from first principles; John Pask, LLNL; C Yang; BES
• Optimizing Superconductor Transport Properties through Large-scale Simulation; Andreas Glatz, ANL; Munson; BES
• Plasma Surface Interactions: Bridging from the Surface to the Micron Frontier through Leadership Class Computing; Brian Wirth, ORNL; Constantinescu, Smith; FES
• Center for Edge Physics Simulation; CS Chang, PPPL; Adams, Shephard; FES
• Searching for Physics Beyond the Standard Model: Strongly-Coupled Field Theories at the Intensity and Energy Frontiers; Paul Mackenzie, Fermilab; Falgout; HEP
• Computation-Driven Discovery for the Dark Universe; Salman Habib, ANL; Almgren; HEP
• ComPASS: High Performance Computing for Accelerator Design and Optimization; Panagiotis Spentzouris, Fermilab; Ng, Colella, Li, Munson, C Yang; HEP
• A MultiScale Approach to Nuclear Structure and Reactions: Forming the Computational Bridge between Lattice QCD and Nonrelativistic Many-Body Theory; Wick Haxton, LBNL; Ng, Falgout, C Yang; NP
• Nuclear Computational Low-energy Initiative; Joseph Carlson, LANL; Ng, C Yang; NP
• ParaDiS Dislocation Dynamics; Tom Arsenlis, LLNL; Woodward; NNSA

We have helped the application teams significantly reduce time to solution in their simulations
• Sparse direct solves improve time to solution 20X for accelerators, allowing 8-cavity simulation (Spentzouris)
• Sped up flux surface creation to improve 2D mesh generation in fusion application from 11.5 hours to 1 minute (Chang)
• Acceleration-based nonlinear solvers speed up dislocation dynamics 35-50%; multistage Runge-Kutta methods reduce time steps by 94% (Arsenlis)
• Sophisticated eigensolvers significantly improve materials calculations in many domains, including ions in solution (Car) and excited state phenomena (Chelikowsky, Head-Gordon)

We have helped the application teams achieve unprecedented resolution and increased reliability
• Astrophysics Lyman-α forest simulation at 4096^3 in an 80 Mpc/h box; produced statistics at 1% accuracy for the first time (Habib)
• Predictions of grounding line match experiment for the first time in ice sheet modeling due to AMR (Price)
• Implicit ODE integrators combined with AMG linear solvers enable solution of 4D reaction-diffusion equations for plasma surface interactions (Wirth)
• High-order unstructured meshes for particle accelerators overcome mesh generation/adaptation bottlenecks (Spentzouris)

Our strategy toward architecture awareness focuses on both inter- and intra-node issues
Inter-node: Massive Concurrency
• Reduce communication
• Increase concurrency
• Reduce synchronization
• Address memory footprint
• Enable large communication/computation overlap
Intra-node: Deep NUMA
• MPI + threads for many packages
• Compare task and data parallelism
• Thread communicator to allow passing of thread information among libraries
• Low-level kernels for vector operations that support hybrid programming models

We are developing new algorithms that address architecture awareness
Reduce communication
• AMG: develop non-Galerkin approaches, use redundancy or agglomeration on coarse grids, develop additive AMG variants (hypre) (2X improvement)
• Hierarchical partitioning optimizes communication at each level (Zoltan) (27% improvement in matrix-vector multiply)
• Relaxation and bottom solve in AMR multigrid (Chombo) (2.5X improvement in solver, 40% overall)
• HSS methods
Increase concurrency
• New spectrum slicing eigensolver in PARPACK (computes tens of thousands of eigenvalues in small amounts of time)
• New pole expansion and selected inversion schemes (PEXSI) (now scales to over 100K cores)
• Utilize BG/Q architecture for extreme scaling demonstrations (PHASTA) (3.1M processes on 768K cores, unstructured mesh calculation)
Reduce synchronization points
• Implemented pipelined versions of CG and conjugate residual methods; 4X improvement in speed (PETSc) (30% speed up on 32K cores). Used in PFLOTRAN applications
Address memory footprint issues
• Predictive load balancing schemes for AMR (Zoltan) (allows AMR runs to complete by maintaining memory footprint)
• Hybrid programming models. Used in PHASTA extreme scale applications
Increase communication and computation overlap
• Improved and stabilized look-ahead algorithms (SuperLU) (3X run time improvement). Used in Omega3P accelerator simulations

We are refactoring our software to support hybrid programming models
Block structured AMR in Chombo (MPI + OpenMP)
• Adding a coarse-grained thread loop over blocks and micro-blocking reduced communication costs and memory footprint; performance improvements limited
Unstructured grids using Parallel Control Utility (MPI and threads)
• Utility
layer that allows support of both MPI and threads; showed 30% efficiency improvement and 10% memory reduction on BG/Q
Partitioning with Zoltan2 (MPI + OpenMP)
• Implementation in multi-dimensional jagged geometric partitioning scaled to 8B elements on 64K nodes
Direct linear solvers in SuperLU_DIST (MPI + OpenMP + CUDA)
• Aggregation of small BLAS operations into larger ones to hide long-latency operations resulted in 2.7X faster performance and 5X reduction in memory costs on a 100-node GPU cluster
Linear solver kernels (Pthreads and OpenMP)
• Introduction of a thread communicator allows passing thread information among libraries for portable performance with maintainable kernels
Time integrators in SUNDIALS (MPI + OpenMP)
• New threaded kernels and integration with SuperLU_MT provide speed up and flexibility

The FASTMath team is working together to lower the barriers to software adoption
Tutorials
• Argonne Training Program for Extreme Scale Computing (ATPESC) (2013, 2014 planned)
• Individual tutorials: ACTS Toolkit, SIAM summer school, NERSC user group meeting, Trilinos user group meeting
Developing common build/configure practices
Software catalogue at the FASTMath web site
Software snapshots available at ANL and LBNL

The ATPESC effort provided a key focal point for software unification discussion and strategy
8 hour team tutorial in August 2013
60 early career students, postdocs, staff
Installed and tested most FASTMath packages on Vesta
• Hypre, SuperLU, PETSc, PARPACK, NOX, ML, Zoltan, MOAB, Chombo, PUMI, Albany
• Emphasized interoperability
Lectures (Diachin, Dubey, Falgout, Smith, Shephard, Tautges, Woodward)
Common look and feel hands-on exercises
• Hypre, MOAB, PETSc, SuperLU, Chombo, PUMI, Albany
Participant feedback
“The hands-on exercises were very high quality and well thought out.”
“Very well prepared and executed ... I liked the possibility to choose the particular test I want to put my hands on ... Very well organized, committed, and motivated team ... bravo!”
“The FASTMath hands-on exercises were very well planned. It is easy to tell when a group has put serious effort into their exercises, and they did a great job.”
“Give more time to this session. Make all speakers able to go deeper.”

We are addressing key package management issues targeting software interoperability
Issues addressed:
• Inconsistent installation processes
• Inconsistent or missing configuration information
• Copying sources as a means of managing dependencies
• Spoofing sources (e.g. MPI) as a means of simplifying package code
• Inconsistent or missing versioning
• Managed installations
Publication: “Package Management Practices Essential for Interoperability: Lessons Learned and Strategies Developed for FASTMath”, First Workshop on Sustainable Software for Science: Practice and Experiences, 2013

In FASTMath, we define institute awareness in two ways
• Intra-institute awareness
• Inter-institute awareness

We have active collaborations, technology use, and discussion with all of the SciDAC Institutes
• Effective data layouts and threading capabilities in SUNDIALS speed up ParaDiS code (AC)
• Efficient sparse-dense matrix-matrix kernels for block preconditioned eigensolver for nuclear structure computations (AC)
• Performance analysis of PDSLin, Trilinos components and MOAB (TU)
• Development of in situ visualization for PHASTA CFD simulations (AC)
• Use of VisIt for ice sheet and global climate simulations using Chombo (TU)
• Use of ParaView for MOAB and PUMI based simulations (TU)
• Examining ROMIO MPI-IO implementation for MOAB (ND)
• Interoperability of Dakota and Trilinos components improved (AC)
• Adding gradient computations to Albany for distributed parameters (AC)
• Developing UQ workflow for ice sheet modeling using FASTMath technologies (AC)
• Interfaced dimension reduction algorithm from QUEST to Albany (AC)

FASTMath heavily leverages and
helps drive research in the base research programs at ASCR
• FASTMath technologies leverage the Applied Math base program. Examples: algorithms in Chombo, BoxLib, PETSc, SUNDIALS, hypre, ML and Mesquite
• Driving requirements in the Applied Math base program. Example: fusion collaboration led to a new base math project in high-order discretization methods
• New joint math/CS projects in the Exascale solvers initiative. Example: discussions between FASTMath and SUPER led to the ExReDi Project
• Use of FASTMath capabilities in base CS projects. Example: DTEC X-Stack project is exploring use of domain specific languages in Chombo
Images courtesy of the ESL Project; Sam Williams, LBNL; the DTEC Project; and the hypre Project

The FASTMath team includes experts from four national laboratories and six universities
Lawrence Berkeley National Laboratory
• Mark Adams, Ann Almgren, Phil Colella, Anshu Dubey, Dan Graves, Sherry Li, Lin Lin, Terry Ligocki, Mike Lijewski, Peter McCorquodale, Esmond Ng, Brian Van Straalen, Chao Yang
• Subcontract: Jim Demmel (UC Berkeley)
Lawrence Livermore National Laboratory
• Barna Bihari, Lori Diachin, Milo Dorr, Rob Falgout, Mark Miller, Jacob Schroder, Carol Woodward, Ulrike Yang
• Subcontract: Carl Ollivier-Gooch (Univ of British Columbia)
• Subcontract: Dan Reynolds (Southern Methodist)
Rensselaer Polytechnic Inst.
• E. Seegyoung Seol, Onkar Sahni, Mark Shephard, Cameron Smith
• Subcontract: Ken Jansen (UC Boulder)
Argonne National Laboratory
• Jed Brown, Lois Curfman McInnes, Todd Munson, Vijay Mahadevan, Barry Smith
• Subcontract: Jim Jiao (SUNY Stony Brook)
• Subcontract: Paul Wilson (Univ of Wisconsin)
Sandia National Laboratories
• Karen Devine, Glen Hansen, Jonathan Hu, Vitus Leung, Siva Rajamanickam, Michael Wolf, Andrew Salinger

FASTMath Management
Institute Director: Lori Diachin
• Responsible for overall management of project
• Single POC for FASTMath to ASCR
• Biweekly teleconferences with DOE ASCR
Executive Council
• Responsible for representing the key capability areas of the FASTMath project
• Coordinate project activities and technical reporting
Institutional POCs
• Responsible for budgetary and reporting requirements to DOE
Organization
• Lori Diachin: Institute Director; LLNL POC
• Phil Colella: Structured Grid Technologies
• Mark Shephard: Unstructured Grid Technologies; RPI POC
• Barry Smith: Linear Solver Technologies; ANL POC
• Esmond Ng: Nonlinear and Eigen Solvers; LBNL POC
• Andy Salinger: Integrated Technologies
• Karen Devine: SNL POC

FASTMath Communication
Face to face meetings
• Kick off meeting, Nov 2011
• Project team meetings: Feb 2012, 2013, 2014 at SIAM meetings; July 2012, 2013 at SciDAC PI meetings
Team webinar series for technical exchange with internal and external speakers
Project management
• Biweekly scheduled exec council teleconferences
• Team telecons as needed
Redmine Wiki site for tutorial and software development
Subteam teleconferences and face-to-face meetings

FASTMath review presentations highlight technical work and applications impact in key discipline areas
Time; Talk Title; Presenter
• 1:30-1:55 pm; Structured Grid Technologies; Phil Colella
• 1:55-2:20 pm; Unstructured Grid Technologies; Mark Shephard
• 2:20-2:45 pm; Solver Technologies; Esmond Ng
• 2:45-3:10 pm; Integrated Technologies; Andy Salinger
• 3:10-3:20 pm; Flexible Q&A; All

The FASTMath team is on track and delivering for the SciDAC program
1. FASTMath technologies are integral to a broad range of applications and are used world-wide (SciDAC, SC, NNSA, Applied Energy, DoD, NASA, industry, etc.)
2. We are leaders in mathematical algorithm and software development for current and next generation architectures
3. The FASTMath team is leveraging this unique opportunity to develop new integrated technologies