LLNL-PRES-501654
FASTMath: Frameworks, Algorithms and Scalable
Technologies for Mathematics
FASTMath Team
Lori Diachin, Institute Director
FASTMath SciDAC Institute
The FASTMath SciDAC project focuses on the
development and use of mathematics software libraries
The FASTMath SciDAC Institute develops and deploys scalable mathematical algorithms and software tools for the reliable simulation of complex physical phenomena, and collaborates with DOE domain scientists to ensure the usefulness and applicability of FASTMath technologies.
Our goal is to help application scientists overcome
two fundamental challenges
1. Improve the quality of their simulations
• Increase accuracy
• Increase physical fidelity
• Improve robustness and reliability
2. Adapt computations to make effective use of HPC platforms
• Massive concurrency
• Multi-/many-core nodes
FASTMath addresses both challenges by focusing on the
interactions among mathematical algorithms, software
design, and computer architectures
The FASTMath project brings together three
previously separate SciDAC applied math projects
 APDEC: Focused on structured grid capabilities
 ITAPS: Focused on unstructured grid capabilities
 TOPS: Focused on solution of algebraic systems
FASTMath continues these focus areas and has begun new work on integrating key technologies.
FASTMath encompasses three broad topical areas

Tools for Problem Discretization
• Structured grid technologies
• Unstructured grid technologies
• Adaptive mesh refinement
• Complex geometry
• High-order discretizations
• Particle methods
• Time integration

Solution of Algebraic Systems
• Iterative solution of linear systems
• Direct solution of linear systems
• Nonlinear systems
• Eigensystems
• Differential variational inequalities

High Level Integrated Capabilities
• Adaptivity through the software stack
• Management of field data
• Coupling different physics domains
• Mesh/particle coupling methods
Structured grid capabilities focus on high order, mapped
grids, embedded boundaries, AMR and particles
• Structured AMR
• Mapped multiblock grids
• Embedded boundary methods
• Particle-based methods
Applications: cosmology, astrophysics, accelerator modeling, fusion, climate, subsurface reacting flows, low Mach number combustion, etc.
Our unstructured grid capabilities focus on adaptivity,
high order, and the tools needed for extreme scaling
• Parallel mesh infrastructures
• Dynamic load balancing (a small Zoltan partitioning sketch follows this slide)
• Mesh adaptation and quality control
• Parallel performance on unstructured meshes
• Architecture-aware implementations
Applications: fusion, climate, accelerator modeling, NNSA applications, nuclear energy, manufacturing processes, etc.
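To make the dynamic load balancing workflow concrete, the following is a minimal sketch of calling Zoltan's recursive coordinate bisection (RCB) partitioner from C. The toy 1D point set, the Points struct, and all parameter choices are illustrative assumptions, not FASTMath application code.

/* Minimal Zoltan RCB sketch: each rank owns a few 1D points and asks
 * Zoltan for a balanced redistribution. */
#include <mpi.h>
#include <stdio.h>
#include "zoltan.h"

#define PTS_PER_RANK 4
typedef struct { int rank; double x[PTS_PER_RANK]; } Points;

static int num_obj(void *data, int *ierr)
{ *ierr = ZOLTAN_OK; return PTS_PER_RANK; }

static void obj_list(void *data, int ngid, int nlid, ZOLTAN_ID_PTR gids,
                     ZOLTAN_ID_PTR lids, int wdim, float *wgts, int *ierr)
{
    Points *p = (Points *)data;
    for (int i = 0; i < PTS_PER_RANK; i++) {
        gids[i] = p->rank * PTS_PER_RANK + i;  /* globally unique ids */
        lids[i] = i;
    }
    *ierr = ZOLTAN_OK;
}

static int num_geom(void *data, int *ierr)
{ *ierr = ZOLTAN_OK; return 1; }               /* 1D coordinates */

static void geom_multi(void *data, int ngid, int nlid, int nobj,
                       ZOLTAN_ID_PTR gids, ZOLTAN_ID_PTR lids,
                       int ndim, double *geom, int *ierr)
{
    Points *p = (Points *)data;
    for (int i = 0; i < nobj; i++) geom[i] = p->x[lids[i]];
    *ierr = ZOLTAN_OK;
}

int main(int argc, char **argv)
{
    float ver;
    MPI_Init(&argc, &argv);
    Zoltan_Initialize(argc, argv, &ver);

    Points pts;
    MPI_Comm_rank(MPI_COMM_WORLD, &pts.rank);
    for (int i = 0; i < PTS_PER_RANK; i++)
        pts.x[i] = pts.rank + 0.25 * i;        /* toy coordinates */

    struct Zoltan_Struct *zz = Zoltan_Create(MPI_COMM_WORLD);
    Zoltan_Set_Param(zz, "LB_METHOD", "RCB");
    Zoltan_Set_Num_Obj_Fn(zz, num_obj, &pts);
    Zoltan_Set_Obj_List_Fn(zz, obj_list, &pts);
    Zoltan_Set_Num_Geom_Fn(zz, num_geom, &pts);
    Zoltan_Set_Geom_Multi_Fn(zz, geom_multi, &pts);

    int changes, ngid, nlid, nimp, nexp;
    int *impProcs, *impParts, *expProcs, *expParts;
    ZOLTAN_ID_PTR impGids, impLids, expGids, expLids;
    Zoltan_LB_Partition(zz, &changes, &ngid, &nlid,
        &nimp, &impGids, &impLids, &impProcs, &impParts,
        &nexp, &expGids, &expLids, &expProcs, &expParts);
    printf("rank %d: exports %d of %d objects\n", pts.rank, nexp, PTS_PER_RANK);

    Zoltan_LB_Free_Part(&impGids, &impLids, &impProcs, &impParts);
    Zoltan_LB_Free_Part(&expGids, &expLids, &expProcs, &expParts);
    Zoltan_Destroy(&zz);
    MPI_Finalize();
    return 0;
}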
Our work on algebraic systems provides key solution
technologies to applications
• Linear system solution using direct and iterative solvers
• Nonlinear system solution using acceleration techniques and globalized Newton methods
• Eigensolvers using iterative techniques and optimization
• Differential variational inequalities based on active-set algorithms
Applications: fusion, nuclear structure calculations, quantum chemistry, accelerator modeling, climate, dislocation dynamics, etc. (A minimal solver-call sketch follows this slide.)
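As a concrete illustration of the solver technologies above, here is a minimal sketch of solving a 1D Laplacian with hypre's BoomerAMG algebraic multigrid solver through the classic IJ interface. The problem, sizes, and tolerances are illustrative assumptions (recent hypre versions may additionally require HYPRE_Init/HYPRE_Finalize).

/* BoomerAMG on a 1D Laplacian via hypre's IJ interface. */
#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_IJ_mv.h"
#include "HYPRE_parcsr_ls.h"

int main(int argc, char *argv[])
{
    int myid, nprocs, N = 100;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns a contiguous block of rows. */
    int local = N / nprocs, ilower = myid * local;
    int iupper = (myid == nprocs - 1) ? N - 1 : ilower + local - 1;

    HYPRE_IJMatrix A;
    HYPRE_IJMatrixCreate(MPI_COMM_WORLD, ilower, iupper, ilower, iupper, &A);
    HYPRE_IJMatrixSetObjectType(A, HYPRE_PARCSR);
    HYPRE_IJMatrixInitialize(A);
    for (int i = ilower; i <= iupper; i++) {
        int cols[3], n = 0; double vals[3];
        if (i > 0)     { cols[n] = i - 1; vals[n++] = -1.0; }
        cols[n] = i; vals[n++] = 2.0;
        if (i < N - 1) { cols[n] = i + 1; vals[n++] = -1.0; }
        HYPRE_IJMatrixSetValues(A, 1, &n, &i, cols, vals);
    }
    HYPRE_IJMatrixAssemble(A);

    HYPRE_IJVector b, x;
    HYPRE_IJVectorCreate(MPI_COMM_WORLD, ilower, iupper, &b);
    HYPRE_IJVectorSetObjectType(b, HYPRE_PARCSR);
    HYPRE_IJVectorInitialize(b);
    HYPRE_IJVectorCreate(MPI_COMM_WORLD, ilower, iupper, &x);
    HYPRE_IJVectorSetObjectType(x, HYPRE_PARCSR);
    HYPRE_IJVectorInitialize(x);
    for (int i = ilower; i <= iupper; i++) {
        double one = 1.0, zero = 0.0;
        HYPRE_IJVectorSetValues(b, 1, &i, &one);   /* rhs = 1 */
        HYPRE_IJVectorSetValues(x, 1, &i, &zero);  /* initial guess = 0 */
    }
    HYPRE_IJVectorAssemble(b);
    HYPRE_IJVectorAssemble(x);

    HYPRE_ParCSRMatrix parA; HYPRE_ParVector parb, parx;
    HYPRE_IJMatrixGetObject(A, (void **) &parA);
    HYPRE_IJVectorGetObject(b, (void **) &parb);
    HYPRE_IJVectorGetObject(x, (void **) &parx);

    /* BoomerAMG used as a standalone solver. */
    HYPRE_Solver amg;
    HYPRE_BoomerAMGCreate(&amg);
    HYPRE_BoomerAMGSetTol(amg, 1e-8);
    HYPRE_BoomerAMGSetMaxIter(amg, 50);
    HYPRE_BoomerAMGSetup(amg, parA, parb, parx);
    HYPRE_BoomerAMGSolve(amg, parA, parb, parx);
    HYPRE_BoomerAMGDestroy(amg);

    HYPRE_IJMatrixDestroy(A);
    HYPRE_IJVectorDestroy(b);
    HYPRE_IJVectorDestroy(x);
    MPI_Finalize();
    return 0;
}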
Integrating technologies is a key value added by the
FASTMath Institute
• Mesh/solver interactions
• Mesh-to-mesh coupling methods
• Integration of unstructured mesh technologies into simulation workflows
• Software unification strategies
Applications: climate, plasma surface interactions, structural mechanics, nuclear energy, cosmology, fluid flow, etc.
FASTMath encompasses our algorithm development in widely used software

Structured Mesh Tools
• BoxLib (Ann Almgren)
• Chombo (Phil Colella)

Unstructured Mesh Tools
• PUMI (Seegyoung Seol)
• MeshAdapt (Mark Shephard)
• MOAB (Vijay Mahadevan)
• Mesquite (Lori Diachin)
• PHASTA (Ken Jansen)
• APF (Cameron Smith)

Partitioning Tools
• Zoltan (Karen Devine)
• ParMA (Mark Shephard)

Linear Solvers
• hypre (Rob Falgout)
• PETSc (Barry Smith)
• SuperLU (Sherry Li)
• ML/Trilinos (Jonathan Hu)

Nonlinear Solvers/Differential Variational Inequalities
• SUNDIALS (Carol Woodward)
• PETSc (Barry Smith)
• NOX/Trilinos (Andy Salinger)

Time Integrators
• SUNDIALS (Carol Woodward)
• PETSc (Barry Smith)

Eigensolvers
• PARPACK (Chao Yang)
Three primary metrics gauge our success
1. Use of FASTMath technologies in applications
• SciDAC applications
• Broader engagement with SC, NNSA, Applied Energy, DoD, NASA, industry, and others
2. Deployment of FASTMath software
• Efficient and robust on DOE compute facilities
• Improving ease of use
3. Participation in the scientific community
• 110 publications and 235 presentations since project inception
• Team and individual tutorials on FASTMath software
Our software is used world-wide and in a variety of application areas

Package (usage stats): basic capabilities; application examples
• Chombo (600 downloads since 2013): structured grid, mapped multi-block, EB, AMR, particles. Applications: fusion (COGENT), subsurface reacting flows (EFRC), climate (PISCEES), space physics, cell biology, etc.
• BoxLib (1040 registered users since 2012): structured grid, AMR, particles. Applications: astrophysics (CASTRO, MAESTRO), cosmology (Nyx), combustion, subsurface flow, etc.
• MOAB/Coupe (~2000 downloads/yr): structured/unstructured grid, mesh-to-mesh transfer. Applications: climate modeling (ACES4BGC, ACME), plasma-surface interactions (PSI), nuclear energy, etc.
• PUMI/MeshAdapt: unstructured mesh management, communications, geometry. Applications: MHD (M3D-C1), electromagnetics (ACE3P), fluid/structure interaction, CFD, NASA, DoD, industry
• Zoltan(2) (>4800 downloads/yr in Trilinos): parallel load balancing and partitioning. Applications: crash simulations, particle-based methods, multi-physics codes, solvers, graph analytics, etc.
• ML/MueLu (>4800 downloads/yr in Trilinos): algebraic multigrid. Applications: climate modeling (PISCEES), Sierra, NNSA applications, DoD applications, etc.
• hypre (~1800 downloads/yr): algebraic multigrid. Applications: groundwater flow, lattice QCD, astrophysics, electromagnetics, NNSA integrated codes, etc.
• PETSc (~40,000 downloads/yr): algebraic solvers and ODE integrators. Applications: plasma surface interactions, fusion, nuclear energy, subsurface flow, etc.
• SuperLU (~25,000 downloads/yr): direct solution of sparse, nonsymmetric systems. Applications: commercial numerical libraries, fusion, materials, accelerator modeling, chemistry, etc.
• SUNDIALS/ARKode (~3300 downloads/yr): time integrators and nonlinear solvers. Applications: climate, dislocation dynamics (ParaDiS), power grid simulations, etc.
• Albany (new technology): finite element code. Applications: climate (PISCEES), structural mechanics, materials, semiconductor analysis, quantum devices, etc.
• PAALS (new technology): unstructured grids, AMR, linear/nonlinear solvers, load balancing. Applications: laser welds, mechanics modeling, electronic circuits, etc.
FASTMath technologies are used in SciDAC application partnerships across SC and NNSA

• Predicting Ice Sheet and Climate Evolution at Extreme Scales (PISCEES). PIs: Steve Price, LANL, and Esmond Ng, LBNL. FASTMath: Ng, Salinger. Office: BER (technologies: SG, UG, S)
• Multiscale Methods for Accurate, Efficient, and Scale-Aware Models of the Earth System. PI: William Collins, LBNL. FASTMath: Woodward, Colella. Office: BER
• Applying Computationally Efficient Schemes for BioGeochemical Cycles. PI: Forrest Hoffman, ORNL. FASTMath: Mahadevan. Office: BER
• Charge Transfer and Charge Transport in Photoactivated Systems. PI: Chris Cramer, Minnesota. FASTMath: Ng, C. Yang. Office: BES
• Simulating the generation, evolution and fate of electronic excitations in molecular and nanoscale materials with first principles methods. PI: Martin Head-Gordon. FASTMath: Li, Ng, C. Yang. Office: BES
• Advanced Modeling of Ions in Solutions, on Surfaces, and in Biological Environments. PI: Roberto Car, Princeton. FASTMath: Ng, C. Yang. Office: BES
• Scalable Computational Tools for Discovery and Design: Excited State Phenomena in Energy Materials. PI: James Chelikowsky, UT Austin. FASTMath: C. Yang. Office: BES
• Discontinuous methods for accurate, massively parallel quantum molecular dynamics: Lithium ion interface dynamics from first principles. PI: John Pask, LLNL. FASTMath: C. Yang. Office: BES
• Optimizing Superconductor Transport Properties through Large-scale Simulation. PI: Andreas Glatz, ANL. FASTMath: Munson. Office: BES
• Plasma Surface Interactions: Bridging from the Surface to the Micron Frontier through Leadership Class Computing. PI: Brian Wirth, ORNL. FASTMath: Constantinescu, Smith. Office: FES
• Center for Edge Physics Simulation. PI: CS Chang, PPPL. FASTMath: Adams, Shephard. Office: FES
• Searching for Physics Beyond the Standard Model: Strongly-Coupled Field Theories at the Intensity and Energy Frontiers. PI: Paul Mackenzie, Fermilab. FASTMath: Falgout. Office: HEP
• Computation-Driven Discovery for the Dark Universe. PI: Salman Habib, ANL. FASTMath: Almgren. Office: HEP
• ComPASS: High Performance Computing for Accelerator Design and Optimization. PI: Panagiotis Spentzouris, Fermilab. FASTMath: Ng, Colella, Li, Munson, C. Yang. Office: HEP
• A MultiScale Approach to Nuclear Structure and Reactions: Forming the Computational Bridge between Lattice QCD and Nonrelativistic Many-Body Theory. PI: Wick Haxton, LBNL. FASTMath: Ng, Falgout, C. Yang. Office: NP
• Nuclear Computational Low-energy Initiative. PI: Joseph Carlson, LANL. FASTMath: Ng, C. Yang. Office: NP
• ParaDiS Dislocation Dynamics. PI: Tom Arsenlis, LLNL. FASTMath: Woodward. Office: NNSA
We have helped the application teams significantly
reduce time to solution in their simulations
• Sparse direct solves improved time to solution 20X for accelerators, allowing an 8-cavity simulation (Spentzouris)
• Sped up flux surface creation to improve 2D mesh generation in a fusion application from 11.5 hours to 1 minute (Chang)
• Acceleration-based nonlinear solvers speed up dislocation dynamics 35-50%; multistage Runge-Kutta methods reduce time steps by 94% (Arsenlis)
• Sophisticated eigensolvers significantly improve materials calculations in many domains, including ions in solution (Car) and excited state phenomena (Chelikowsky, Head-Gordon)
We have helped the application teams achieve
unprecedented resolution and increased reliability
• Astrophysics Lyman-alpha forest simulation at 4096^3 in an 80 Mpc/h box produced statistics at 1% accuracy for the first time (Habib)
• Predictions of the grounding line match experiment for the first time in ice sheet modeling, due to AMR (Price)
• Implicit ODE integrators combined with AMG linear solvers enable solution of 4D reaction-diffusion equations for plasma surface interactions (Wirth)
• High-order unstructured meshes for particle accelerators overcome mesh generation/adaptation bottlenecks (Spentzouris)
Our strategy toward architecture awareness focuses on
both inter- and intra-node issues
Inter-node: massive concurrency
• Reduce communication
• Increase concurrency
• Reduce synchronization
• Address memory footprint
• Enable large communication/computation overlap

Intra-node: deep NUMA
• MPI + threads for many packages
• Compare task and data parallelism
• Thread communicator to allow passing of thread information among libraries
• Low-level kernels for vector operations that support hybrid programming models

(A minimal MPI + OpenMP sketch follows this slide.)
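The following is a minimal sketch of the MPI + threads pattern described above: OpenMP threads share node-local work while MPI handles inter-node reductions. The vector-sum workload is an illustrative stand-in for real kernels, not FASTMath code.

/* Hybrid MPI + OpenMP pattern: threads within a rank, MPI across ranks. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, n = 1 << 20;
    /* Request thread support so OpenMP regions can coexist with MPI;
     * FUNNELED means only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0, global = 0.0;
    /* Intra-node parallelism: threads reduce over this rank's block. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < n; i++)
        local += 1.0 / (double)n;            /* stand-in for real work */

    /* Inter-node parallelism: one collective per rank. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}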
We are developing new algorithms that address
architecture awareness
Reduce
communication
Increase
concurrency
• AMG: develop nonGalerkin approaches,
use redundancy or
agglomeration on coarse
grids, develop additive
AMG variants (hypre)
(2X improvement)
• Hierarchical partitioning
optimizes communication
at each level (Zoltan)
(27% improvement in
matrix-vector multiply)
• Relaxation and bottom
solve in AMR multigrid
(Chombo) (2.5X
improvement in solver,
40% overall)
• HSS methods
• New spectrum slicing
eigensolver in PARPACK
(Computes 10s of
thousands of
eigenvalues in small
amounts of time)
• New pole expansion and
selected inversion
schemes (PEXSI) (now
scales to over 100K
cores)
• Utilize BG/Q architecture
for extreme scaling
demonstrations
(PHASTA) (3.1M
processes on 768K
cores unstructured mesh
calculation)
Reduce
synchronization
points
• Implemented pipelined
versions of CG and
conjugate residual
methods; 4X
improvement in speed
(PETSc) (30% speed up
on 32K cores)
Used in
PFLOTRAN
applications
Address memory
footprint issues
• Predictive load balancing
schemes for AMR
(Zoltan) (Allows AMR
runs to complete by
maintaining memory
footprint)
• Hybrid programming
models
Used in
PHASTA
extreme scale
applications
Increase
communication
and computation
overlap
• Improved and stabilized
look-ahead algorithms
(SuperLU) (3X run time
improvement)
Used in
Omega3P
accelerator
simulations
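As a usage illustration for the pipelined Krylov methods mentioned above, here is a minimal sketch of selecting PETSc's pipelined conjugate gradient (KSPPIPECG) on a 1D Laplacian test matrix. The matrix, size, and options are illustrative; pipelined methods rely on MPI-3 nonblocking reductions to realize their benefit.

/* Pipelined CG in PETSc on a tridiagonal (1D Laplacian) test matrix. */
#include <petscksp.h>

int main(int argc, char **argv)
{
    Mat A; Vec x, b; KSP ksp;
    PetscInt i, n = 100, Istart, Iend;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Assemble the test matrix. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) {
        if (i > 0)   MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);
        if (i < n-1) MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPPIPECG);   /* pipelined CG; KSPPIPECR also exists */
    KSPSetFromOptions(ksp);       /* or select at run time: -ksp_type pipecg */
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
    PetscFinalize();
    return 0;
}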
We are refactoring our software to support hybrid
programming models
• Block-structured AMR in Chombo (MPI + OpenMP): adding a coarse-grained thread loop over blocks and micro-blocking reduced communication costs and memory footprint; performance improvements were limited
• Unstructured grids using the Parallel Control Utility (MPI and threads): a utility layer that supports both MPI and threads; showed 30% efficiency improvement and 10% memory reduction on BG/Q
• Partitioning with Zoltan2 (MPI + OpenMP): implementation of multi-dimensional jagged geometric partitioning scaled to 8B elements on 64K nodes
• Direct linear solvers in SuperLU_DIST (MPI + OpenMP + CUDA): aggregating small BLAS operations into larger ones to hide long-latency operations resulted in 2.7X faster performance and a 5X reduction in memory costs on a 100-node GPU cluster
• Linear solver kernels (Pthreads and OpenMP): introduction of a thread communicator allows passing thread information among libraries for portable performance with maintainable kernels
• Time integrators in SUNDIALS (MPI + OpenMP): new threaded kernels and integration with SuperLU_MT provide speedup and flexibility (a small CVODE usage sketch follows this slide)
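To show what calling a SUNDIALS integrator looks like, here is a minimal sketch using the CVODE API of roughly this deck's era (CVODE 2.x; newer SUNDIALS releases changed the constructor to take a SUNContext). The scalar test problem y' = -y is an illustrative assumption.

/* CVODE 2.x-era sketch: integrate y' = -y, y(0) = 1, to t = 1. */
#include <stdio.h>
#include <math.h>
#include <cvode/cvode.h>
#include <nvector/nvector_serial.h>

/* Right-hand side f(t, y) = -y. */
static int f(realtype t, N_Vector y, N_Vector ydot, void *user_data)
{
    NV_Ith_S(ydot, 0) = -NV_Ith_S(y, 0);
    return 0;
}

int main(void)
{
    realtype t = 0.0, tout = 1.0;
    N_Vector y = N_VNew_Serial(1);
    NV_Ith_S(y, 0) = 1.0;

    /* Adams method with functional (fixed-point) iteration: no linear
     * solver needs to be attached for this nonstiff problem. */
    void *cvode_mem = CVodeCreate(CV_ADAMS, CV_FUNCTIONAL);
    CVodeInit(cvode_mem, f, 0.0, y);
    CVodeSStolerances(cvode_mem, 1e-8, 1e-10);

    CVode(cvode_mem, tout, y, &t, CV_NORMAL);
    printf("y(%g) = %g (exact %g)\n", (double)t,
           (double)NV_Ith_S(y, 0), exp(-1.0));

    N_VDestroy_Serial(y);
    CVodeFree(&cvode_mem);
    return 0;
}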
The FASTMath team is working together to lower the
barriers to software adoption
 Tutorials
• Argonne Training Program for Extreme Scale Computing (ATPESC) (2013; 2014 planned)
• Individual tutorials: ACTS Toolkit, SIAM summer school, NERSC user group meeting, Trilinos user group meeting
 Developing common build/configure practices
 Software catalogue at the FASTMath web site
 Software snapshots available at ANL and LBNL
The ATPESC effort provided a key focal point for
software unification discussion and strategy
 8-hour team tutorial in August 2013
 60 early-career students, postdocs, and staff
 Installed and tested most FASTMath packages on Vesta
• hypre, SuperLU, PETSc, PARPACK, NOX, ML, Zoltan, MOAB, Chombo, PUMI, Albany
• Emphasized interoperability
 Lectures (Diachin, Dubey, Falgout, Smith, Shephard, Tautges, Woodward)
 Common look-and-feel hands-on exercises
• hypre, MOAB, PETSc, SuperLU, Chombo, PUMI, Albany

Participant feedback:
“The hands-on exercises were very high quality and well thought out.”
“Very well prepared and executed ... I liked the possibility to choose the particular test I want to put my hands on ... Very well organized, committed, and motivated team ... bravo!”
“The FASTMath hands-on exercises were very well planned. It is easy to tell when a group has put serious effort into their exercises, and they did a great job.”
“Give more time to this session. Make all speakers able to go deeper.”
We are addressing key package management issues
targeting interoperability software
Issues addressed:
 Inconsistent installation processes
 Inconsistent or missing configuration information
 Copying sources as a means of managing dependencies
 Spoofing sources (e.g., MPI) as a means of simplifying package code
 Inconsistent or missing versioning
 Managed installations

Publication: “Package Management Practices Essential for Interoperability: Lessons Learned and Strategies Developed for FASTMath,” First Workshop on Sustainable Software for Science: Practice and Experiences, 2013
In FASTMath, we define institute awareness in two
ways
 Intra-institute awareness
 Inter-institute awareness
We have active collaborations, technology use, and
discussion with all of the SciDAC Institutes
• Effective data layouts and threading capabilities in SUNDIALS speed up the ParaDiS code (AC)
• Efficient sparse-dense matrix-matrix kernels for a block preconditioned eigensolver for nuclear structure computations (AC)
• Performance analysis of PSDLin, Trilinos components, and MOAB (TU)
• Development of in situ visualization for PHASTA CFD simulations (AC)
• Use of VisIt for ice sheet and global climate simulations using Chombo (TU)
• Use of ParaView for MOAB- and PUMI-based simulations (TU)
• Examining the ROMIO MPI-IO implementation for MOAB (ND)
• Interoperability of Dakota and Trilinos components improved (AC)
• Adding gradient computations to Albany for distributed parameters (AC)
• Developing a UQ workflow for ice sheet modeling using FASTMath technologies (AC)
• Interfaced a dimension reduction algorithm from QUEST to Albany (AC)
FASTMath heavily leverages and helps drive
research in the base research programs at ASCR
• FASTMath technologies leverage the Applied Math base. Examples: algorithms in Chombo, BoxLib, PETSc, SUNDIALS, hypre, ML, and Mesquite
• Driving requirements in the Applied Math base program. Example: a fusion collaboration led to a new base math project in high-order discretization methods
• New joint math/CS projects in the exascale solvers initiative. Example: discussions between FASTMath and SUPER led to the ExReDi project
• Use of FASTMath capabilities in base CS projects. Example: the DTEC X-Stack project is exploring the use of domain-specific languages in Chombo

(Slide images courtesy of the ESL project, Sam Williams (LBNL), the DTEC project, and the hypre project.)
The FASTMath team includes experts from four national
laboratories and six universities
Lawrence Berkeley
National Laboratory
Mark Adams
Ann Almgren
Phil Colella
Anshu Dubey
Dan Graves
Sherry Li
Lin Lin
Terry Ligocki
Mike Lijewski
Peter McCorquodale
Esmond Ng
Brian Van Straalen
Chao Yang
Subcontract: Jim Demmel
(UC Berkeley)
Lawrence Livermore
National Laboratory
Barna Bihari
Lori Diachin
Milo Dorr
Rob Falgout
Mark Miller
Jacob Schroder
Carol Woodward
Ulrike Yang
Subcontract: Carl Ollivier-Gooch
(Univ of British Columbia)
Subcontract: Dan Reynolds
(Southern Methodist)
Rensselaer Polytechnic Inst.
E. Seegyoung Seol
Onkar Sahni
Mark Shephard
Cameron Smith
Subcontract: Ken Jansen
(Univ of Colorado Boulder)
Argonne National Laboratory
Jed Brown
Lois Curfman McInnes
Todd Munson
Vijay Mahadevan
Barry Smith
Subcontract: Jim Jiao
(SUNY Stony Brook)
Subcontract: Paul Wilson
(Univ of Wisconsin)
Sandia National
Laboratories
Karen Devine
Glen Hansen
Jonathan Hu
Vitus Leung
Siva Rajamanickam
Michael Wolf
Andrew Salinger
FASTMath Management
 Institute Director: Lori Diachin
• Responsible for overall management
of project
• Single POC for FASTMath to ASCR
• Biweekly teleconferences with DOE
ASCR
 Executive Council
• Responsible for representing the key
capability areas of FASTMath project
• Coordinate project activities and
technical reporting
 Institutional POCs
• Responsible for budgetary and
reporting requirements to DOE
Organization chart:
• Lori Diachin: Institute Director; LLNL POC
• Phil Colella: Structured Grid Technologies
• Mark Shephard: Unstructured Grid Technologies; RPI POC
• Barry Smith: Linear Solver Technologies; ANL POC
• Esmond Ng: Nonlinear and Eigen Solvers; LBNL POC
• Andy Salinger: Integrated Technologies
• Karen Devine: SNL POC
FASTMath Communication
 Face to face meetings
• Kick-off meeting, Nov 2011
• Project team meetings:
 Feb 2012, 2013, and 2014 at SIAM meetings
 July 2012 and 2013 at SciDAC PI meetings
 Team Webinar series for technical exchange
with internal and external speakers
 Project management
• Biweekly scheduled exec council teleconferences
• Team telecons as needed
 Redmine Wiki site for tutorial and software development
 Subteam teleconferences and face-to-face meetings
FASTMath review presentations highlight technical
work and applications impact in key discipline areas
• 1:30-1:55 pm: Structured Grid Technologies (Phil Colella)
• 1:55-2:20 pm: Unstructured Grid Technologies (Mark Shephard)
• 2:20-2:45 pm: Solver Technologies (Esmond Ng)
• 2:45-3:10 pm: Integrated Technologies (Andy Salinger)
• 3:10-3:20 pm: Flexible Q&A (All)
The FASTMath team is on track and delivering for the
SciDAC program
1. FASTMath technologies are integral to a broad range of applications and are used world-wide (SciDAC, SC, NNSA, Applied Energy, DoD, NASA, industry, etc.)
2. We are leaders in mathematical algorithm and software development for current and next-generation architectures
3. The FASTMath team is leveraging this unique opportunity to develop new integrated technologies