Solution of Algebraic Systems
Esmond G. Ng
Lawrence Berkeley National Laboratory
for the FASTMath Algebraic Systems Team
The algebraic systems team provides key
technologies for the solution of scientific problems
 Linear system solution using direct and iterative solvers.
 Nonlinear system solution using acceleration techniques and globalized Newton methods.
 Eigensolvers using iterative techniques and optimization.
 Differential variational inequalities based on active-set algorithms.
 Applications to fusion, nuclear structure calculations, quantum chemistry, accelerator modeling, climate, dislocation dynamics, etc.
The algebraic systems team includes experts from
4 DOE laboratories and 2 universities
 Lawrence Berkeley National Laboratory: Mark Adams, Ann Almgren, Phil Colella, Anshu Dubey, Dan Graves, Sherry Li, Lin Lin, Terry Ligocki, Mike Lijewski, Peter McCorquodale, Esmond Ng, Brian Van Straalen, Chao Yang.
• Subcontract: Jim Demmel (UC Berkeley).
 Lawrence Livermore National Laboratory: Barna Bihari, Lori Diachin, Milo Dorr, Rob Falgout, Mark Miller, Jacob Schroder, Carol Woodward, Ulrike Yang.
• Subcontracts: Carl Ollivier-Gooch (Univ of British Columbia), Dan Reynolds (Southern Methodist).
 Rensselaer Polytechnic Institute: E. Seegyoung Seol, Onkar Sahni, Mark Shephard, Cameron Smith.
• Subcontract: Ken Jansen (Univ of Colorado Boulder).
 Argonne National Laboratory: Jed Brown, Lois Curfman McInnes, Todd Munson, Vijay Mahadevan, Barry Smith.
• Subcontracts: Jim Jiao (SUNY Stony Brook), Paul Wilson (Univ of Wisconsin).
 Sandia National Laboratories: Karen Devine, Glen Hansen, Jonathan Hu, Vitus Leung, Siva Rajamanickam, Michael Wolf, Andrew Salinger.
Goals of the algebraic systems team
 Develop state-of-the-art algorithms for solving algebraic systems that
overcome architectural challenges.
• Communication and memory bottlenecks.
 Provide efficient, scalable implementations, often in the form of software
libraries, for the computational science community to use.
• Several libraries are award-winning.
• Most algorithms and libraries exploit multi-/many-core architectures, often
with significant speedups.
 Deploy algorithms and software libraries in SciDAC science partnership
projects and elsewhere.
 Collaborate with domain scientists to solve specific algebraic systems.
The algebraic systems team participates in 15 of the
19 SciDAC Science Partnership Projects
Partnership projects and PIs shown in the original figure (spanning BES, BER, FES, HEP, and NP, plus NNSA):
• Roberto Car (Princeton): Chemistry
• Bill Collins (LBNL): Earth System
• C.-S. Chang (PPPL): EPSI Fusion Edge
• P. Spentzouris (FNAL): ComPASS Accelerators
• Wick Haxton (LBNL): CalLAT LQCD
• Chris Cramer (UMN): Chemistry
• Esmond Ng (LBNL) & Steve Price (LANL): PISCEES Ice Sheets
• Brian Wirth (ORNL): PSI Fusion Materials
• Paul Mackenzie (FNAL): LQCD
• Martin Head-Gordon (LBNL): Materials
• Forrest Hoffman (ORNL): ACES4BGC Earth System
• John Pask (LLNL): Chemistry
• Jim Chelikowsky (Texas): Materials
• Andreas Glatz (ANL): Materials
• Salman Habib (ANL): Dark Matter
• Joe Carlson (LANL): NUCLEI Nuclear Physics
• Frithjof Karsch (BNL): LQCD
• Tom Arsenlis (LLNL): Materials (NNSA)
• So Hirata (UIUC): Materials (NNSA)
Color key in the original figure: blue = linear solvers; green = nonlinear solvers/time integrators; purple = linear and nonlinear solvers/time integrators; maroon = linear solvers + eigenvalue; red = eigenvalue.
Linear equations solvers
 Technical Areas:
• Iterative and multigrid/multilevel methods, direct methods.
• Hybrid methods – domain decomposition based, combining direct and iterative
methods.
 Notable SW Packages:
• hypre, PETSc, Trilinos (received R&D 100 Awards).
• SuperLU (+ variants), PDSLin.
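For readers less familiar with the two paradigms, here is a minimal sketch using SciPy, whose sequential sparse LU factorization wraps SuperLU; it is an illustration only, not FASTMath code, and the hypre/PETSc/Trilinos/PDSLin solvers operate at much larger, distributed-memory scales. Using the LU factorization as a preconditioner for GMRES hints at the hybrid direct/iterative flavor.

```python
# A minimal sketch (not FASTMath code): SciPy's sparse LU wraps sequential
# SuperLU (direct), GMRES illustrates the iterative side, and combining them
# as a preconditioner hints at the hybrid direct/iterative idea.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
# 1D Poisson-like test matrix (tridiagonal, SPD)
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Direct: sparse LU factorization (SuperLU under the hood in SciPy)
lu = spla.splu(A)
x_direct = lu.solve(b)

# Iterative: GMRES, here preconditioned with the (exact) LU solve
M = spla.LinearOperator((n, n), matvec=lu.solve)
x_iter, info = spla.gmres(A, b, M=M)

print(np.linalg.norm(x_direct - x_iter))  # the two solves agree
```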
Advancing linear solver technology
 Implementation of constrained energy minimization multigrid in
Trilinos/MueLu.
• Generalization of smoothed aggregation (SA) AMG.
• Constraints: sparsity pattern, nullspace interpolation. Output: interpolation
weights.
• Setup expense can be amortized over many solves, offset w/ good initial guess.
• 50% efficiency @ 8,200 cores for 3D elasticity.
 Hierarchically Semi-Separable (HSS) structure for dense but data-sparse
matrices (e.g., BEM, integral equations, PDEs with smooth kernels): a new
approach to solving a class of structured linear systems (see the sketch below).
• Matrix off-diagonal blocks are rank deficient.
• Hierarchical partitioning and nested bases lead to nearly linear complexity.
• Performance: 3D seismic imaging – Helmholtz equations on up to 600^3 grids
(216M equations), 16,000+ cores, 2x faster, uses 1/5 of the memory.
• New randomized method further reduces arithmetic and communication
complexity.
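A toy illustration of the property HSS solvers exploit (NumPy only, not the HSS code used in the seismic-imaging runs above): a dense matrix sampled from a smooth kernel has off-diagonal blocks of low numerical rank, so they can be compressed hierarchically. The kernel and tolerance below are arbitrary choices for the demonstration.

```python
# Toy sketch: off-diagonal blocks of a smooth-kernel matrix are numerically
# low rank, which is what HSS-type solvers exploit via hierarchical compression.
import numpy as np

n = 512
x = np.linspace(0.0, 1.0, n)
# Dense matrix from a smooth kernel
K = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

# Off-diagonal block coupling the left half of the points to the right half
B = K[: n // 2, n // 2 :]
U, s, Vt = np.linalg.svd(B, full_matrices=False)

tol = 1e-8
rank = int(np.sum(s > tol * s[0]))
print(f"block size {B.shape}, numerical rank {rank}")   # rank << n/2

# Low-rank factors approximate the block to the requested tolerance
B_approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
print(np.linalg.norm(B - B_approx) / np.linalg.norm(B))
```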
Impact of linear solver technology on science
 Improved performance and robustness of linear and nonlinear solvers in the
BISICLES ice sheet modeling code (PISCEES) through Chombo-PETSc coupling.
[Figure: solver convergence for the BISICLES velocity solve, PETSc AMG vs. Chombo GMG solvers.]
 The Chombo-PETSc coupling was also deployed for embedded boundary
methods for reacting pore-scale flow.
 Developing a new gyrokinetic Poisson solver with an accurate flux surface
averaging (FSA) term in EPSI, using PETSc's FieldSplit block solver
capabilities (a block-elimination sketch follows this list).
 Use of PDSLin reduces time to solution and memory by factors of 20 and 5,
respectively, in the ACE3P code for modeling accelerator cavities
(ComPASS).
 Deploying multigrid solver to tackle hard-to-solve linear systems in QCD
calculations (USQCD, CalLAT).
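A minimal NumPy sketch of the block-elimination idea behind field-split and Schur-complement solvers such as PETSc's FieldSplit and PDSLin (this is not their API or implementation): eliminate one block of unknowns and solve a smaller Schur-complement system for the rest.

```python
# Sketch of block elimination via the Schur complement on a 2x2 block system
# [[A00, A01], [A10, A11]] [x0, x1]^T = [b0, b1]^T (dense, for illustration).
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 60, 40
# Diagonally dominant blocks so everything is comfortably invertible
A00 = rng.standard_normal((n0, n0)) + n0 * np.eye(n0)
A01 = rng.standard_normal((n0, n1))
A10 = rng.standard_normal((n1, n0))
A11 = rng.standard_normal((n1, n1)) + n1 * np.eye(n1)
b0 = rng.standard_normal(n0)
b1 = rng.standard_normal(n1)

# Schur complement S = A11 - A10 A00^{-1} A01; in a production solver the
# A00 solves would themselves be parallel direct or preconditioned solves.
A00_inv_A01 = np.linalg.solve(A00, A01)
A00_inv_b0 = np.linalg.solve(A00, b0)
S = A11 - A10 @ A00_inv_A01
x1 = np.linalg.solve(S, b1 - A10 @ A00_inv_b0)
x0 = A00_inv_b0 - A00_inv_A01 @ x1

# Check against solving the assembled monolithic system
A = np.block([[A00, A01], [A10, A11]])
x = np.linalg.solve(A, np.concatenate([b0, b1]))
print(np.linalg.norm(np.concatenate([x0, x1]) - x))
```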
Linear equations solvers
 Technical Areas:
• Iterative and multigrid/multilevel methods, direct methods.
• Hybrid methods – domain decomposition based, combining direct and iterative
methods.
 Notable SW Packages:
• hypre, PETSc, Trilinos (received R&D 100 Awards).
• SuperLU (+ variants), PDSLin.
 Plans for FY14-FY16:
• Improving performance of solvers through hybrid programming and reduction of
communication/synchronization requirements.
• Improving capabilities of solvers through development of new techniques.
• Deploying linear solvers in SciDAC Science Partnership Projects.
• Working with domain scientists to solve specific linear systems.
Nonlinear equations solvers and time integrators
 Technical Areas:
• Accelerated fixed point methods and implicit/semi-implicit approaches for solving
nonlinear systems.
• Time integrators for multi-physics and multi-scale problems.
 Notable SW Packages:
• ARKode, SUNDIALS, Trilinos/NOX.
Advancing nonlinear solver and time integrator
technology
 Developed new SUNDIALS interface to SuperLU_MT.
 Developed new ARKode solver as part of SUNDIALS library:
• High-order, stable, time-adaptive, mixed implicit+explicit time integration.
• Fully-customizable for specific applications: predictors, time adaptivity, Butcher
tables, etc.
• Standard Newton-Krylov and novel accelerated fixed-point nonlinear solvers.
• Native support for non-identity mass matrices (FEM spatial discretizations) and
spatial adaptivity (changing vector sizes between steps).
• Use of SUNDIALS linear solvers: parallel Krylov ([F]GMRES, BiCGStab,
TFQMR, PCG), serial direct (dense, band, sparse KLU, sparse SuperLU_MT).
 Developed new nonlinear solvers in SUNDIALS KINSOL package.
• Fixed point and Picard solvers with Anderson acceleration motivated by work in
dislocation dynamics material models and subsurface flow problems.
 Trilinos/NOX: an Anderson acceleration solver was also added (a minimal
fixed-point sketch follows this list).
• Motivated by – and used by – the CASL program in Nuclear Energy.
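A minimal sketch of Anderson acceleration for a generic fixed-point map, in the spirit of the accelerated fixed-point solvers in KINSOL and NOX (an illustration, not their implementation; the helper function and example problem below are made up for the demo).

```python
# Anderson acceleration of a fixed-point iteration x <- g(x): mix the most
# recent iterates using least-squares coefficients on the residual history.
import numpy as np

def anderson_fixed_point(g, x0, m=5, tol=1e-10, maxit=100):
    """Accelerate x <- g(x) using the last m residual differences."""
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    f = gx - x                      # fixed-point residual g(x) - x
    dG, dF = [], []                 # histories of differences of g(x) and residuals
    k = 0
    for k in range(maxit):
        if np.linalg.norm(f) < tol:
            break
        if dF:
            F = np.column_stack(dF)
            G = np.column_stack(dG)
            # least-squares mixing coefficients: minimize ||f - F @ gamma||
            gamma, *_ = np.linalg.lstsq(F, f, rcond=None)
            x_new = gx - G @ gamma  # Anderson-accelerated update
        else:
            x_new = gx              # plain (Picard) fixed-point step
        gx_new = g(x_new)
        f_new = gx_new - x_new
        dG.append(gx_new - gx)
        dF.append(f_new - f)
        if len(dF) > m:             # keep only the m most recent differences
            dG.pop(0)
            dF.pop(0)
        x, gx, f = x_new, gx_new, f_new
    return x, k

# Example: componentwise fixed point of x = cos(x); the accelerated iteration
# reaches the solution in far fewer steps than plain fixed-point iteration.
x_star, iters = anderson_fixed_point(np.cos, np.zeros(3), m=3)
print(iters, x_star)
```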
Impact of nonlinear solver and time integrator
technology on science
 Interfaced ARKode's implicit time integrators and SUNDIALS' Anderson-accelerated
fixed point solver from KINSOL into the ParaDiS dislocation
dynamics simulator (NNSA).
• Higher-order implicit methods with the accelerated fixed point solver gave ~50%
speedup with a 94% reduction in the number of time steps on 16 cores of the
LLNL Cab Linux cluster.
• Speedups held across a range of strain rate values.
• The accelerated fixed point solver from KINSOL resulted in a 25% speedup on 4,096
cores of the LLNL Vulcan machine and a 12% speedup on 262K cores of the LLNL
Sequoia machine.
 The SUNDIALS DAE solver is being used in a DOE Office of Electricity Delivery
and Energy Reliability project on transmission power grid modeling.
 Trilinos/NOX is being incorporated in several climate research codes.
• BER SciDACs: PISCEES, MultiScale, and Ocean BGC Spin-up.
 Trilinos/NOX is used by the CASL program in Nuclear Energy.
Nonlinear equations solvers and time integrators
 Technical Areas:
• Accelerated fixed point methods and implicit/semi-implicit approaches for solving
nonlinear systems.
• Time integrators for multi-physics and multi-scale problems.
 Notable SW Packages:
• SUNDIALS, Trilinos/NOX.
 Plans for FY14-FY16:
• Implementing symplectic and multi-rate integrators in ARKode.
• Constructing interfaces between SUNDIALS and hypre to access scalable
multigrid preconditioners.
• Improving performance of SUNDIALS through hybrid programming and
reduction of communication/synchronization requirements.
• Adding adjoint capability in Albany using NOX, targeting ice sheet initialization.
Eigensolvers
 Technical Areas:
• Development and implementation of algorithms for solving a variety of
eigenvalue problems.
• Linear problems: Ax = λBx.
• Nonlinear problems: H(X)X = XΛ; F(λ)x = 0.
 Notable SW Packages:
• PARPACK.
• The eigenvalue work goes beyond PARPACK – much of the work is motivated
by science problems.
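For concreteness, here is a small dense example of the linear generalized problem Ax = λBx using SciPy; the FASTMath work targets large sparse and iterative settings (PARPACK in particular is a parallel iterative eigensolver), so this call only states the problem, it is not the team's solver.

```python
# Minimal dense statement of the generalized eigenvalue problem A x = lambda B x
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 100
M = rng.standard_normal((n, n))
A = M + M.T                     # symmetric A
B = M @ M.T + n * np.eye(n)     # symmetric positive definite B

evals, evecs = eigh(A, B)       # solves A x = lambda B x
# verify the first eigenpair
x, lam = evecs[:, 0], evals[0]
print(np.linalg.norm(A @ x - lam * (B @ x)))
```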
Advancing eigensolver technology and impact on
science
 Computing many eigenpairs of a Hermitian matrix (excited states
calculations for materials).
• Goal: reduce the number of Rayleigh-Ritz calculations.
• Trace penalty minimization and projected conjugate gradient methods.
• Spectrum slicing (see the sketch at the end of this slide).
 Scalable eigensolver for nuclear configuration interaction calculations.
• Developed topology-aware data distribution.
• Hybrid MPI/OpenMP implementation to overlap communication with computation.
• Resulted in up to 6x speed-up on 18,000 cores.
 Efficient solver for interior eigenvalues of non-Hermitian matrix in materials
science.
• Developed a Generalized Locally Preconditioned Minimal Residual Method (uses
harmonic projection to extract approximate eigenvalues from a subspace
spanned by the current approximation and nested preconditioned residuals).
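A sketch of the spectrum-slicing idea using SciPy's shift-invert Lanczos (eigsh); the shifts, slice sizes, and test matrix below are arbitrary choices, and production slicing strategies are far more careful about selecting and validating slices. The key point is that each slice is independent, so the slices can be computed in parallel.

```python
# Spectrum slicing sketch: compute eigenvalues near several shifts
# independently via shift-invert Lanczos.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 400
# 1D Laplacian: eigenvalues 2 - 2*cos(k*pi/(n+1)), all in (0, 4)
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

shifts = [0.5, 1.5, 2.5, 3.5]       # centers of the spectral slices
per_slice = 10
found = []
for sigma in shifts:
    # shift-invert: eigenvalues of A closest to sigma converge fastest
    vals = spla.eigsh(A, k=per_slice, sigma=sigma, return_eigenvectors=False)
    found.append(np.sort(vals))

for sigma, vals in zip(shifts, found):
    print(f"slice around {sigma}: [{vals[0]:.4f}, {vals[-1]:.4f}]")
```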
Eigensolvers
 Technical Areas:
• Development and implementation of algorithms for solving a variety of
eigenvalue problems.
• Linear problems: Ax = λBx.
• Nonlinear problems: H(X)X = XΛ; F(λ)x=0.
 Notable SW Packages:
• PARPACK.
 Plans for FY14-FY16:
• Focus on structured eigenvalue problems (e.g., the Bethe-Salpeter equation, the Casida
linear response eigenvalue problem).
• Focus on convergence issues in nonlinear eigenvalue problems (e.g., the self-consistent Dyson equation, the Bloch-Horowitz equation).
• Further performance improvement of linear eigensolvers.
• Deploy fast eigensolvers in electronic structure software packages (Qbox,
quantum-espresso, etc.).
Overcoming architectural challenges –
communication reduction
 Communication reduction in multigrid solvers.
• Non-Galerkin AMG replaces the usual Galerkin coarse-grid
operators with sparser ones, resulting in 1.2x – 2.4x
speedups (a toy sparsification sketch follows this list).
• Redundant coarse-grid AMG uses subgroups of
processes that perform one or several AMG solves
independently, leading to speedups of up to 3x.
• Multi-additive AMG exploits a theoretical identity to inherit the parallelization
benefits of additive methods and the convergence properties of multiplicative
MG, providing speedups up to 2x over existing AMG.
 Reducing communication in iterative and direct methods.
• Deriving communication lower bounds.
• Designing algorithms that achieve these
lower bounds.
• Improving numerical stability in some of the
communication-avoiding algorithms.
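A toy sketch of the non-Galerkin idea (not the hypre algorithm, which chooses and compensates entries far more carefully): form the Galerkin coarse operator P^T A P, observe that it is denser per row than the fine-grid operator, and replace it with a sparser matrix by dropping small entries. The aggregation interpolation and drop rule below are deliberately naive.

```python
# The Galerkin coarse operator P^T A P is denser per row than A, which drives
# up communication; non-Galerkin AMG replaces it with a sparser approximation.
# The "sparsification" here is a naive drop rule just to show the mechanism.
import numpy as np
import scipy.sparse as sp

N = 32                                   # fine grid is N x N
I1 = sp.identity(N)
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N))
A = (sp.kron(I1, T) + sp.kron(T, I1)).tocsr()   # 2D 5-point Laplacian

# Tentative aggregation interpolation: 2x2 blocks of fine points per coarse point
fine = np.arange(N * N)
ix, iy = fine // N, fine % N
coarse = (ix // 2) * (N // 2) + (iy // 2)
P_tent = sp.csr_matrix((np.ones(N * N), (fine, coarse)), shape=(N * N, (N // 2) ** 2))

# Damped-Jacobi smoothing of the interpolation (smoothed-aggregation flavor)
Dinv = sp.diags(1.0 / A.diagonal())
P = (sp.identity(N * N) - (2.0 / 3.0) * (Dinv @ A)) @ P_tent

Ac = (P.T @ A @ P).tocsr()               # Galerkin coarse operator (denser stencil)
print("avg nnz/row  A: %.1f   Galerkin Ac: %.1f" % (A.nnz / A.shape[0], Ac.nnz / Ac.shape[0]))

def drop_small(M, rel_tol):
    """Keep the diagonal and entries with |a_ij| >= rel_tol * max |a_ij| (toy rule)."""
    C = M.tocoo()
    thresh = rel_tol * np.abs(C.data).max()
    keep = (np.abs(C.data) >= thresh) | (C.row == C.col)
    return sp.csr_matrix((C.data[keep], (C.row[keep], C.col[keep])), shape=C.shape)

for tol in (0.05, 0.1, 0.2):
    print("rel_tol %.2f -> avg nnz/row %.1f" % (tol, drop_small(Ac, tol).nnz / Ac.shape[0]))
```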
Overcoming architectural challenges –
synchronization reduction
 Traditional Krylov solvers all block on global
reductions of inner products and norms.
• Pipelined GMRES reorganizes the steps of GMRES so that
it produces the same results in exact arithmetic but delays
the use of the inner products and norms, allowing the
algorithm to proceed with additional computation and
communication before the results of the global reduction
are needed (see the non-blocking reduction sketch at the
end of this slide).
 The primary scaling obstacle for sparse matrix
factorization is the high degree of task and data
dependency (a DAG of tasks).
• Developed new scheduling and flexible look-ahead
algorithms for DAG execution, reducing processor
idle time and shortening the length of the critical path.
• Achieved nearly 3x speedup on thousands of cores.
[Figure: new scheduling in SuperLU_DIST, fusion tokamak simulation (M3D-C1).]
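A minimal mpi4py sketch (assuming an MPI-3 implementation with non-blocking collectives) of the mechanism pipelined Krylov methods rely on: start the global reduction for an inner product, overlap it with local work, and wait for the result only when it is needed. This shows the communication pattern only, not pipelined GMRES itself.

```python
# Overlap a global reduction (inner product) with local computation using a
# non-blocking allreduce, the building block of pipelined Krylov methods.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
n_local = 100000
x = np.random.rand(n_local)
y = np.random.rand(n_local)

local_dot = np.array([np.dot(x, y)])
global_dot = np.empty(1)

# Non-blocking allreduce: the communication proceeds in the background
req = comm.Iallreduce(local_dot, global_dot, op=MPI.SUM)

# ... overlap: do local work that does not depend on the inner product,
# e.g. a local sparse matrix-vector product in a pipelined iteration ...
z = x + 2.0 * y

req.Wait()            # block only when the reduction result is required
if comm.rank == 0:
    print("global inner product:", global_dot[0])
```

Run with, e.g., `mpirun -n 4 python pipelined_reduction_sketch.py` (the file name here is arbitrary).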
Science impact – FASTMath solvers aim to
accelerate LQCD simulations in USQCD and CalLAT
 Quantum Chromodynamics (QCD) is the theory of strong forces in the
Standard Model of particle physics.
 Scalable solvers for the Dirac equations have
been elusive until recently.
 Challenges:
• System is complex, indefinite, and can be
extremely ill-conditioned.
• Near null space is unknown and oscillatory!
[Figure: real and imaginary parts of an oscillatory near-null-space component.]
 Incorporating a new adaptive algebraic
multigrid (AMG) solver into hypre and making it available to the USQCD software
stack and to CalLAT.
 Working on a parallel implementation of the adaptive bootstrap AMG
(BAMG) algorithm and exploring communication-reducing ideas based on
non-Galerkin techniques.
Science impact – PEXSI revolutionizes ab initio
molecular dynamics simulations
 Ab initio molecular dynamics is used to
study the solid-electrolyte interface (SEI)
in Li-ion batteries (Pask) and graphene
oxide (Car).
 An electronic structure calculation is
performed at each time step for the energy,
density, and forces, which depend on a
spectral projector and therefore
conventionally require computing
eigenvalues.
 An alternative formulation uses the
Fermi-Dirac approximation; a pole
expansion leads to a solution that
requires only selected elements of matrix
inverses (see the sketch at the end of
this slide).
 Pole Expansion and Selected Inversion
(PEXSI) has complexity O(n^2) for 3D
systems, where n is the number of
atoms, compared to the O(n^3) complexity
for the diagonalization approach. For 2D
and quasi-1D systems, complexities are
O(n^1.5) and O(n).
 PEXSI will be integrated with DGDFT,
which will allow massively parallel AIMD
to be performed in the design of new
anode-electrolyte combinations for safe,
reliable, high-capacity, high-charge rate
batteries.
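A toy dense sketch of the pole-expansion idea, using plain Matsubara poles (which converge slowly; PEXSI uses a far more efficient optimized pole expansion and computes only selected elements of each inverse with a parallel selected-inversion algorithm): the Fermi-Dirac density matrix is written as a sum of shifted matrix inverses instead of being formed from eigenpairs. The Hamiltonian, temperature, and pole count below are arbitrary demo values.

```python
# Density matrix P = f_FD(H) two ways: full diagonalization vs. a pole
# expansion that only needs inverses of shifted matrices.
import numpy as np

rng = np.random.default_rng(0)
n = 60
M = rng.standard_normal((n, n))
H = 0.5 * (M + M.T)                 # Hermitian "Hamiltonian"
mu, beta = 0.0, 2.0                 # chemical potential, inverse temperature

# Reference: density matrix via full diagonalization
w, V = np.linalg.eigh(H)
P_diag = (V * (1.0 / (1.0 + np.exp(beta * (w - mu))))) @ V.T

# Matsubara pole expansion with poles z_l = mu + i*(2l+1)*pi/beta:
#   f_FD(H) = (1/2) I - (2/beta) * sum_l Re[(H - z_l I)^{-1}]
# (slowly convergent; PEXSI uses far fewer, carefully optimized poles)
npoles = 1000
P_pole = 0.5 * np.eye(n)
for l in range(npoles):
    z = mu + 1j * (2 * l + 1) * np.pi / beta
    P_pole -= (2.0 / beta) * np.linalg.inv(H - z * np.eye(n)).real

print("pole-expansion error:", np.linalg.norm(P_pole - P_diag))
```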
Algebraic systems summary
 Possess a diverse set of expertise in algebraic systems.
 Develop state-of-the-art algorithms and provide efficient, scalable
implementations, often in the form of freely available software libraries.
 Work towards overcoming architectural challenges.
• Address multi-/many-core architectures, with successes.
• Communication/synchronization and memory reduction.
 Actively interact with domain scientists; the team is involved in 15 of the 19
SciDAC Science Partnership Projects.
 Collaborate with domain scientists to make significant impact on enabling
and accelerating scientific discoveries.