Solution of Algebraic Systems
Esmond G. Ng, Lawrence Berkeley National Laboratory
for the FASTMath Algebraic Systems Team

The algebraic systems team provides key technologies for the solution of scientific problems
• Linear system solution using direct and iterative solvers.
• Nonlinear system solution using acceleration techniques and globalized Newton methods.
• Eigensolvers using iterative techniques and optimization.
• Differential variational inequalities based on active-set algorithms.
• Applications to fusion, nuclear structure calculations, quantum chemistry, accelerator modeling, climate, dislocation dynamics, etc.

The algebraic systems team includes experts from 4 DOE laboratories and 2 universities
Lawrence Berkeley National Laboratory: Mark Adams, Ann Almgren, Phil Colella, Anshu Dubey, Dan Graves, Sherry Li, Lin Lin, Terry Ligocki, Mike Lijewski, Peter McCorquodale, Esmond Ng, Brian Van Straalen, Chao Yang. Subcontract: Jim Demmel (UC Berkeley).
Lawrence Livermore National Laboratory: Barna Bihari, Lori Diachin, Milo Dorr, Rob Falgout, Mark Miller, Jacob Schroder, Carol Woodward, Ulrike Yang. Subcontracts: Carl Ollivier-Gooch (Univ of British Columbia), Dan Reynolds (Southern Methodist).
Rensselaer Polytechnic Institute: E. Seegyoung Seol, Onkar Sahni, Mark Shephard, Cameron Smith. Subcontract: Ken Jansen (Univ of Colorado Boulder).
Argonne National Laboratory: Jed Brown, Lois Curfman McInnes, Todd Munson, Vijay Mahadevan, Barry Smith. Subcontracts: Jim Jiao (SUNY Stony Brook), Paul Wilson (Univ of Wisconsin).
Sandia National Laboratories: Karen Devine, Glen Hansen, Jonathan Hu, Vitus Leung, Siva Rajamanickam, Michael Wolf, Andrew Salinger.

Goals of the algebraic systems team
• Develop state-of-the-art algorithms for solving algebraic systems that overcome architectural challenges, such as communication and memory bottlenecks.
• Provide efficient, scalable implementations, often in the form of software libraries, for the computational science community to use.
  – Several libraries are award-winning.
  – Most algorithms and libraries exploit multi-/many-core architectures, often with significant speedups.
• Deploy algorithms and software libraries in SciDAC science partnership projects and elsewhere.
• Collaborate with domain scientists to solve specific algebraic systems.

The algebraic systems team participates in 15 of the 19 SciDAC Science Partnership Projects
Projects span the BES, BER, FES, HEP, and NP program offices, plus NNSA:
• Roberto Car (Princeton) – Chemistry
• Bill Collins (LBNL) – Earth System
• C.-S. Chang (PPPL) – EPSI Fusion Edge
• P. Spentzouris (FNAL) – COMPASS Accelerators
• Wick Haxton (LBNL) – CalLAT LQCD
• Chris Cramer (UMN) – Chemistry
• Esmond Ng & Steve Price (LANL) – PISCEES Ice Sheets
• Brian Wirth (ORNL) – PSI Fusion Materials
• Paul Mackenzie (FNAL) – LQCD
• Martin Head-Gordon (LBNL) – Materials
• Forrest Hoffman (ORNL) – ACES4BGC Earth System
• John Pask (LLNL) – Chemistry
• Jim Chelikowsky (Texas) – Materials
• Andreas Glatz (ANL) – Materials
• Salman Habib (ANL) – Dark Matter
• Joe Carlson (LANL) – NUCLEI Nuclear Physics
• Frithjof Karsch (BNL) – LQCD
• Tom Arsenlis (LLNL) – Materials (NNSA)
• So Hirata (UIUC) – Materials
Color key from the original slide: Blue – linear solvers; Green – nonlinear solvers/time integrators; Purple – linear and nonlinear solvers/time integrators; Maroon – linear solvers + eigensolvers; Red – eigensolvers.

Linear equations solvers
Technical Areas:
• Iterative and multigrid/multilevel methods, direct methods.
• Hybrid methods – domain-decomposition based, combining direct and iterative methods.
Notable SW Packages:
• hypre, PETSc, Trilinos (received R&D 100 Awards).
• SuperLU (+ variants), PDSLin.
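To illustrate the distinction between the two solver families on this slide, here is a minimal sketch (an addition for the reader, not part of the original material) that solves the same small model problem with a sparse direct factorization and with a preconditioned iterative method. It uses SciPy, whose splu routine wraps the sequential SuperLU library; the grid size and the simple Jacobi preconditioner are arbitrary illustrative choices.

# Illustrative sketch: sparse direct solve (SuperLU via scipy.sparse.linalg.splu)
# vs. iterative solve (conjugate gradients) on a 2D Laplacian model problem.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 64                                    # n x n grid, chosen only for illustration
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = sp.kronsum(T, T).tocsc()              # 2D Laplacian, symmetric positive definite
b = np.ones(A.shape[0])

# Direct method: factor once, then solve; robust, but memory grows with fill-in.
lu = spla.splu(A)
x_direct = lu.solve(b)

# Iterative method: CG with a simple Jacobi (diagonal) preconditioner; low memory,
# but convergence depends on conditioning and preconditioner quality.
M = spla.LinearOperator(A.shape, matvec=lambda v: v / A.diagonal())
x_iter, info = spla.cg(A, b, M=M)

print("direct residual:   ", np.linalg.norm(A @ x_direct - b))
print("iterative residual:", np.linalg.norm(A @ x_iter - b), "(info =", info, ")")

The direct answer is accurate to roundoff after a single factorization; the iterative answer is accurate to CG's default tolerance, and stronger preconditioners such as multigrid or hybrid direct/iterative methods (hypre, Trilinos, PDSLin) are what make such iterations scale to the problem sizes discussed on the following slides.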
Advancing linear solver technology
Implementation of constrained energy minimization multigrid in Trilinos/MueLu.
• A generalization of smoothed aggregation (SA) AMG.
• Constraints: sparsity pattern, nullspace interpolation. Output: interpolation weights.
• Setup expense can be amortized over many solves and offset with a good initial guess.
• 50% efficiency at 8,200 cores for 3D elasticity.
Hierarchically semi-separable (HSS) structure for dense but data-sparse matrices (e.g., boundary element methods, integral equations, PDEs with smooth kernels) – a new approach to solving a class of structured linear systems.
• Matrix off-diagonal blocks are rank deficient.
• Hierarchical partitioning and nested bases lead to nearly linear complexity.
• Performance: 3D seismic imaging – Helmholtz equations on grids of up to 600^3 points (216M equations), 16,000+ cores, 2x faster, using 1/5 of the memory.
• A new randomized method further reduces arithmetic and communication complexity.

Impact of linear solver technology on science
• Improved performance and robustness of linear and nonlinear solvers in the BISICLES ice sheet modeling code (PISCEES) through Chombo-PETSc coupling. [Figure: solver convergence for the BISICLES velocity solve – PETSc AMG vs. Chombo GMG solvers.]
• Chombo-PETSc coupling was also deployed for embedded boundary methods for reacting pore-scale flow.
• Developing a new gyrokinetic Poisson solver with an accurate flux surface averaging (FSA) term in EPSI (using PETSc's FieldSplit block solver capabilities).
• Use of PDSLin reduces time to solution and memory by factors of 20 and 5, respectively, in the ACE3P code for modeling accelerator cavities (ComPASS).
• Deploying multigrid solvers to tackle hard-to-solve linear systems in QCD calculations (USQCD, CalLAT).

Linear equations solvers
Technical Areas:
• Iterative and multigrid/multilevel methods, direct methods.
• Hybrid methods – domain-decomposition based, combining direct and iterative methods.
Notable SW Packages:
• hypre, PETSc, Trilinos (received R&D 100 Awards).
• SuperLU (+ variants), PDSLin.
Plans for FY14-FY16:
• Improving performance of solvers through hybrid programming and reduction of communication/synchronization requirements.
• Improving capabilities of solvers through development of new techniques.
• Deploying linear solvers in SciDAC Science Partnership Projects.
• Working with domain scientists to solve specific linear systems.

Nonlinear equations solvers and time integrators
Technical Areas:
• Accelerated fixed point methods and implicit/semi-implicit approaches for solving nonlinear systems.
• Time integrators for multi-physics and multi-scale problems.
Notable SW Packages:
• ARKode, SUNDIALS, Trilinos/NOX.

Advancing nonlinear solver and time integrator technology
Developed a new SUNDIALS interface to SuperLU_MT.
Developed the new ARKode solver as part of the SUNDIALS library:
• High-order, stable, time-adaptive, mixed implicit+explicit time integration.
• Fully customizable for specific applications: predictors, time adaptivity, Butcher tables, etc.
• Standard Newton-Krylov and novel accelerated fixed-point nonlinear solvers.
• Native support for non-identity mass matrices (FEM spatial discretizations) and spatial adaptivity (changing vector sizes between steps).
• Use of SUNDIALS linear solvers: parallel Krylov ([F]GMRES, BiCGStab, TFQMR, PCG), serial direct (dense, band, sparse KLU, sparse SuperLU_MT).
Developed new nonlinear solvers in the SUNDIALS KINSOL package (the acceleration idea is sketched below):
• Fixed point and Picard solvers with Anderson acceleration, motivated by work in dislocation dynamics material models and subsurface flow problems.
An Anderson acceleration solver was also added to Trilinos/NOX.
• Motivated by – and used by – the CASL program in Nuclear Energy.
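For readers unfamiliar with Anderson acceleration, the following self-contained sketch shows the basic scheme: the next iterate is extrapolated from the last few fixed-point residuals via a small least-squares problem. This is an illustration written for this document, not code from KINSOL or NOX; the fixed-point map, mixing depth, and test problem are arbitrary choices.

# Illustrative sketch of Anderson acceleration for a fixed-point iteration x = g(x).
# KINSOL and NOX provide production implementations (with damping, restarts, and
# parallel vectors); this toy version shows only the core update.
import numpy as np

def anderson(g, x0, m=5, tol=1e-10, maxit=100):
    """Accelerate x_{k+1} = g(x_k) using up to m previous residuals."""
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    G_hist, F_hist = [gx], [gx - x]    # histories of g-values and residuals f = g(x) - x
    x = gx                             # first step is a plain fixed-point step
    for k in range(1, maxit):
        gx = g(x)
        f = gx - x
        G_hist.append(gx)
        F_hist.append(f)
        if np.linalg.norm(f) < tol:
            return x, k
        mk = min(m, k)
        # Differences of the last mk residuals / g-values (columns of dF and dG are paired)
        dF = np.column_stack([F_hist[-i] - F_hist[-i - 1] for i in range(1, mk + 1)])
        dG = np.column_stack([G_hist[-i] - G_hist[-i - 1] for i in range(1, mk + 1)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)   # small least-squares problem
        x = gx - dG @ gamma                              # accelerated update
        G_hist, F_hist = G_hist[-(m + 1):], F_hist[-(m + 1):]
    return x, maxit

# Example: solve x = cos(x) componentwise; the accelerated iteration typically needs
# far fewer sweeps than the plain fixed-point iteration to reach the same tolerance.
x, iters = anderson(np.cos, np.full(3, 1.0))
print(iters, x)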
Impact of nonlinear solver and time integrator technology on science
Interfaced ARKode's implicit time integrators and SUNDIALS' Anderson-accelerated fixed point solver from KINSOL into the ParaDiS dislocation dynamics simulator (NNSA).
• Higher-order implicit methods with the accelerated fixed point solver gave a ~50% speedup with a 94% reduction in the number of time steps on 16 cores of the LLNL Cab Linux cluster.
• Speedups held across a range of strain rate values.
• The accelerated fixed point solver from KINSOL resulted in a 25% speedup on 4,096 cores of the LLNL Vulcan machine and a 12% speedup on 262K cores of the LLNL Sequoia machine.
The SUNDIALS DAE solver is being used in a DOE Office of Electricity Delivery and Energy Reliability project on transmission power grid modeling.
Trilinos/NOX is being incorporated into several climate research codes.
• BER SciDACs: PISCEES, MultiScale, and Ocean BGC Spin-up.
Trilinos/NOX is used by the CASL program in Nuclear Energy.

Nonlinear equations solvers and time integrators
Technical Areas:
• Accelerated fixed point methods and implicit/semi-implicit approaches for solving nonlinear systems.
• Time integrators for multi-physics and multi-scale problems.
Notable SW Packages:
• SUNDIALS, Trilinos/NOX.
Plans for FY14-FY16:
• Implementing symplectic and multi-rate integrators in ARKode.
• Constructing interfaces between SUNDIALS and hypre to access scalable multigrid preconditioners.
• Improving performance of SUNDIALS through hybrid programming and reduction of communication/synchronization requirements.
• Adding adjoint capability in Albany using NOX, targeting ice sheet initialization.

Eigensolvers
Technical Areas:
• Development and implementation of algorithms for solving a variety of eigenvalue problems.
  – Linear problems: Ax = λBx.
  – Nonlinear problems: H(X)X = XΛ; F(λ)x = 0.
Notable SW Packages:
• PARPACK.
• The eigenvalue work goes beyond PARPACK – much of it is motivated by science problems.

Advancing eigensolver technology and impact on science
Computing many eigenpairs of a Hermitian matrix (excited-state calculations for materials).
• Trace penalty minimization and projected conjugate gradient methods to reduce the number of Rayleigh-Ritz calculations.
• Spectrum slicing (illustrated in the sketch after this slide).
Scalable eigensolver for nuclear configuration interaction calculations.
• Developed topology-aware data distribution.
• Hybrid MPI/OpenMP implementation to overlap communication with computation.
• Resulted in up to 6x speedup on 18,000 cores.
Efficient solver for interior eigenvalues of a non-Hermitian matrix in materials science.
• Developed the Generalized Locally Preconditioned Minimal Residual method (uses harmonic projection to extract approximate eigenvalues from a subspace spanned by the current approximation and nested preconditioned residuals).
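The shift-invert transformation underlying spectrum slicing can be illustrated with a short SciPy sketch (an addition for this document, not FASTMath code): eigsh wraps ARPACK, the serial counterpart of the PARPACK package named above, and with a shift sigma it returns the eigenvalues of A closest to that shift. The model matrix, shift, and slice size below are arbitrary illustrative choices.

# Illustrative sketch: shift-invert Lanczos (ARPACK via scipy.sparse.linalg.eigsh)
# computes interior eigenpairs near a shift -- the basic kernel behind spectrum slicing.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
# 1D Laplacian model problem; its eigenvalues are 2 - 2*cos(k*pi/(n+1)), k = 1..n.
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

sigma = 1.0                 # shift placed inside the spectrum (illustrative choice)
# In shift-invert mode ARPACK iterates with (A - sigma*I)^{-1}, so the eigenvalues
# nearest the shift become dominant and converge in a few Lanczos iterations.
vals, vecs = spla.eigsh(A, k=6, sigma=sigma, which="LM")

exact = 2.0 - 2.0 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
nearest = np.sort(exact[np.argsort(np.abs(exact - sigma))[:6]])
print("computed:", np.sort(vals))
print("exact:   ", nearest)

A full spectrum-slicing solver covers the spectrum with many such shifts, assigns the slices to independent process groups, and uses matrix inertia counts to verify that no eigenvalues are missed; the sketch shows a single slice only.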
Eigensolvers
Technical Areas:
• Development and implementation of algorithms for solving a variety of eigenvalue problems.
  – Linear problems: Ax = λBx.
  – Nonlinear problems: H(X)X = XΛ; F(λ)x = 0.
Notable SW Packages:
• PARPACK.
Plans for FY14-FY16:
• Focus on structured eigenvalue problems (e.g., the Bethe-Salpeter equation, the Casida linear response eigenvalue problem).
• Focus on convergence issues in nonlinear eigenvalue problems (e.g., the self-consistent Dyson equation, the Bloch-Horowitz equation).
• Further performance improvement of linear eigensolvers.
• Deploy fast eigensolvers in electronic structure software packages (Qbox, Quantum ESPRESSO, etc.).

Overcoming architectural challenges – communication reduction
Communication reduction in multigrid solvers.
• Non-Galerkin AMG replaces the usual coarse-grid operators with sparser ones, resulting in 1.2x–2.4x speedups.
• Redundant coarse-grid AMG uses subgroups of processes to perform one or several AMG solves independently, leading to speedups of up to 3x.
• Multi-additive AMG exploits a theoretical identity to inherit the parallelization benefits of additive methods and the convergence properties of multiplicative MG, providing speedups of up to 2x over existing AMG.
Reducing communication in iterative and direct methods.
• Deriving communication lower bounds.
• Designing algorithms that achieve these lower bounds.
• Improving numerical stability in some of the communication-avoiding algorithms.

Overcoming architectural challenges – synchronization reduction
Traditional Krylov solvers all block on global reductions of inner products and norms.
• Pipelined GMRES reorganizes the steps of GMRES to produce the same results in exact arithmetic but delays the use of the inner products and norms, allowing the algorithm to proceed with additional computation and communication before the results of the global reduction are needed.
The primary scaling obstacle for sparse matrix factorization is the high degree of task and data dependency (DAG).
• Developed new scheduling and flexible look-ahead algorithms for DAG execution, reducing processor idle time and shortening the critical path.
• Achieved nearly 3x speedup on thousands of cores.
[Figure: new scheduling in SuperLU_DIST applied to a fusion tokamak simulation (M3D-C1).]

Science impact – FASTMath solvers aim to accelerate LQCD simulations in USQCD and CalLAT
Quantum chromodynamics (QCD) is the theory of the strong force in the Standard Model of particle physics. Scalable solvers for the Dirac equations have been elusive until recently.
Challenges:
• The system is complex, indefinite, and can be extremely ill-conditioned.
• The near null space is unknown and oscillatory. [Figure: real and imaginary parts of a near-null-space component.]
Incorporating a new adaptive algebraic multigrid (AMG) method into hypre and making it available to the USQCD software stack and to CalLAT.
Working on a parallel implementation of the adaptive bootstrap AMG (BAMG) algorithm and exploring communication-reducing ideas based on non-Galerkin techniques.

Science impact – PEXSI revolutionizes ab initio molecular dynamics simulations
Ab initio molecular dynamics is used to study the solid-electrolyte interface (SEI) in Li-ion batteries (Pask) and graphene oxide (Car).
An electronic structure calculation is performed at each time step for the energy, density, and forces, which depend on a spectral projector and conventionally require computing eigenvalues.
An alternative formulation uses the Fermi-Dirac approximation: a pole expansion leads to a solution that requires only selected elements of the inverse of a matrix (a toy sketch of the idea follows this slide).
Pole Expansion and Selected Inversion (PEXSI) has complexity O(n^2) for 3D systems, where n is the number of atoms, compared with the O(n^3) complexity of the diagonalization approach. For 2D and quasi-1D systems, the complexities are O(n^1.5) and O(n), respectively.
PEXSI will be integrated with DGDFT, which will allow massively parallel AIMD to be performed in the design of new anode-electrolyte combinations for safe, reliable, high-capacity, high-charge-rate batteries.
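To make the pole-expansion idea concrete, here is a small self-contained sketch written for this document (it is not PEXSI). It approximates the finite-temperature density matrix f(H) of a toy Hermitian matrix with the textbook Matsubara pole expansion of the Fermi-Dirac function, assembled from matrix resolvents, and compares against full diagonalization. The matrix, inverse temperature, chemical potential, and pole count are arbitrary choices; PEXSI uses a far more efficient optimized pole set and computes only selected entries of each sparse inverse.

# Toy illustration of a pole expansion for the Fermi-Dirac density matrix f(H).
# Uses the (slowly converging) Matsubara expansion
#   f(x) = 1/2 - (2/beta) * sum_k Re[ 1/(x - mu - i*w_k) ],  w_k = (2k+1)*pi/beta,
# assembled from matrix resolvents. PEXSI instead uses optimized poles and selected
# inversion of sparse matrices rather than the dense inverses formed here.
import numpy as np

rng = np.random.default_rng(0)
n, beta, mu, npoles = 20, 4.0, 0.0, 400    # illustrative sizes/parameters

B = rng.standard_normal((n, n))
H = (B + B.T) / 2                          # toy symmetric "Hamiltonian"

# Reference: density matrix via full diagonalization, P = V f(Lambda) V^T (the O(n^3) path).
w, V = np.linalg.eigh(H)
P_diag = (V * (1.0 / (1.0 + np.exp(beta * (w - mu))))) @ V.T

# Pole expansion: accumulate resolvents at the Matsubara poles z_k = mu + i*w_k.
I = np.eye(n)
P_pole = 0.5 * I
for k in range(npoles):
    w_k = (2 * k + 1) * np.pi / beta
    P_pole -= (2.0 / beta) * np.linalg.inv(H - (mu + 1j * w_k) * I).real

# Plain Matsubara poles converge slowly (agreement here is only to a few digits);
# PEXSI's optimized contour reaches much higher accuracy with far fewer poles.
print("max |P_pole - P_diag| =", np.max(np.abs(P_pole - P_diag)))

In PEXSI the matrices are sparse, only the entries of each resolvent matching the sparsity pattern of the Hamiltonian are computed (selected inversion), and the poles are distributed over independent processor groups, which is what yields the reduced complexity quoted above.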
Algebraic systems summary
• Possess a diverse set of expertise in algebraic systems.
• Develop state-of-the-art algorithms and provide efficient, scalable implementations, often in the form of freely available software libraries.
• Work towards overcoming architectural challenges.
  – Address multi-/many-core architectures, with successes.
  – Reduce communication/synchronization and memory requirements.
• Actively interact with domain scientists; involved in 15 of the 19 SciDAC Science Partnership Projects.
• Collaborate with domain scientists to make a significant impact on enabling and accelerating scientific discoveries.