The Role of Analysis in Modern High Performance Computing

J. L. Schwarzmeier, Cray Inc,
July 6, 2007
▪ Where I am coming from
▪ The process of going from ‘problem to be solved’ to ‘solution on computer’, and the role of analysis throughout
▪ Conclusions
Where I am coming from
▪ As much as I have studied and enjoyed math, I am not a researcher in analysis ⇒ this is a personal view of analysis and applied math
• However, I did see much use of analysis for theoretical and numerical purposes in my own work and in the work of others
• It has been 20 years since I left LANL for Cray …
• Cray personnel generally do not work with customers at the level of their fundamental equations, but we continue to see sophisticated uses of analysis by customers
• Sometimes we do recommend alternative numerical schemes to
improve performance
▪ I will describe the ‘journey’ from identifying the problem to be solved …
… to obtaining a solution on a computer, and the role of analysis along the way, sprinkled with examples I have encountered.
Role of analysis in the sciences
▪ Is the use of analysis in science/engineering relegated to a bygone era, before computers and before all science had been ‘discovered’?
▪ Myth: all science has already been discovered
Reality: no. The laws of physics are known, but the focus has changed to solving more realistic problems
Wave function of the universe, $\Psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N, t)$, where $N$ = number of particles in the universe:
$$H\,\Psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N, t) = \frac{ih}{2\pi}\,\frac{\partial \Psi}{\partial t}, \qquad \Psi(\infty, t) = 0$$
This is an approximately ‘true’ but totally useless equation. Since the 1920s, about the only quantum mechanical problems that have been solved ‘exactly’ are the hydrogen atom, a single particle in a harmonic oscillator potential, and other similarly idealized situations. Even on the most powerful supercomputers, quantum chemistry codes struggle mightily to approximately solve the N-atom Schrödinger equation for N ~ O(10) atoms.
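To illustrate why such exactly solvable cases are valuable, here is a minimal sketch (Python with NumPy, chosen only for illustration) that discretizes the 1D harmonic oscillator $H = -\tfrac12\,d^2/dx^2 + \tfrac12 x^2$, in units with $\hbar = m = \omega = 1$, by second-order finite differences and compares its lowest eigenvalues with the exact values $n + \tfrac12$. The box half-width of 10 and the 1000-point grid are arbitrary assumptions.

```python
import numpy as np

# Units with hbar = m = omega = 1, so the exact eigenvalues are n + 1/2.
n_points, half_width = 1000, 10.0            # assumption: arbitrary, illustrative choices
x = np.linspace(-half_width, half_width, n_points)
h = x[1] - x[0]

# H = -(1/2) d^2/dx^2 + (1/2) x^2 with a 3-point Laplacian and psi = 0 at the box edges
diagonal = 1.0 / h**2 + 0.5 * x**2
off_diagonal = np.full(n_points - 1, -0.5 / h**2)
H = np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)

numerical = np.linalg.eigvalsh(H)[:4]        # eigvalsh returns eigenvalues in ascending order
exact = np.arange(4) + 0.5
print(numerical)
print(np.abs(numerical - exact).max())
```

Checking a numerical method against an exact solution in this way is one instance of the ‘sanity check’ role that special solutions play.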
▪ Even on today’s supercomputers, important problems cannot be solved by brute force: clever algorithms and implementations are required, as is knowing what approximations to make
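A back-of-the-envelope count shows why brute force fails for the many-particle equation. This sketch assumes a deliberately coarse, hypothetical resolution of 10 grid points per coordinate. It counts the amplitudes needed to tabulate $\Psi$ on a tensor-product grid in $3N$ coordinates, and the bytes needed to store them as double-precision complex numbers.

```python
def tensor_grid_size(points_per_coordinate, n_particles, dims=3):
    """Number of amplitudes needed to tabulate Psi(x_1, ..., x_N) on a tensor-product grid."""
    return points_per_coordinate ** (dims * n_particles)

BYTES_PER_COMPLEX128 = 16

for n_particles in (1, 2, 5, 10):
    values = tensor_grid_size(10, n_particles)   # assumption: hypothetical 10 points per coordinate
    print(f"N = {n_particles:2d}: {values:.1e} amplitudes, "
          f"{values * BYTES_PER_COMPLEX128:.1e} bytes")
```

Already at N = 10 the table needs 10^30 amplitudes, so the exponential growth, rather than any particular machine, rules out the direct approach.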
Circle of dependency
▪ Scientific progress today often requires multidisciplinary collaboration:
• Team of scientists
• Mathematics, numerical methods & analysis
• Computer specialists
▪ Do we have enough students entering analysis/applied math? Will their training allow them to bridge the gap from scientist/engineer to computer programmer?
1. Scientists define problem
2. Analysis to guide solution
3. Numerics and computer
Step 1 of the Journey: role of scientists/engineers
1. A big opportunity for analysis and computing today is solving new, realistic, specific, often multidisciplinary problems. Specific model equations are derived from general equations by physical and mathematical intuition
2. Focus on a narrowed range of problems of interest
Which terms of the fundamental equations are important? How are multidisciplinary PDEs coupled together? How do we deal with greatly disparate space/time scales? ⇒ modified set of equations
In climate studies, the current model couples atmosphere, ocean, land, and sea ice. We seek to improve the model by including: 100 chemical species in the atmosphere; full carbon cycle modeling with interaction of vegetation, plant decay, and release of CO2; full fresh water hydrology with river basin modeling and drainage into oceans; higher spatial resolution to model cloud physics and the effects of narrow land formations (e.g., Florida) on oceanic currents; better modeling of man-made interactions such as fires, deforestation, development, irrigation, pollution, etc. This is a much expanded set of equations and unknowns, and it needs a mathematical foundation
Another multi-disciplinary example
 Researchers at Rice University Use Cray Supercomputer to Unlock
Biomedical Mysteries and Aid Future Diagnostics
“Members of the Team for Advanced Flow Simulation and Modeling at Rice are
collaborating with colleagues from other institutions to create computational fluid dynamics
models that mimic how blood courses through the brain’s arteries and interacts with an
aneurysm on a vessel wall. An aneurysm is a balloon-like protrusion of an artery that could
be fatal if it bursts. The team uses the Cray system to simulate numerically how the blood,
artery and aneurysm interact with each other. The data is then loaded into a program from
Computational Engineering International called EnSight, which provides visualization and
analytical capabilities.”
“Accurate blood-flow simulations are extremely complex because an artery wall isn’t rigid
and blood pressure fluctuates with the beating of the heart,” says Tayfun Tezduyar,
professor of mechanical engineering at Rice. “We want to understand how much a cerebral
artery wall deforms, how blood flow is affected and what stresses are created that could
affect the aneurysm. A precise understanding of this dynamic will be of great benefit to
brain surgeons when they have to make a decision about whether or not to operate…”
“A traditional Singular Value Decomposition algorithm running on a conventional computer
does not preserve the symmetry of the molecule, making it difficult to isolate and study a
protein’s characteristics. The team developed more accurate algorithms that they ‘parallelized’ to run quickly on the supercomputer…”
Example of HPC for economic competitiveness
800,000 Simulation Hours Helped Create Design For Highly Successful Commercial Aircraft
SEATTLE, WA, July 5, 2007 -- Global supercomputer leader Cray Inc. (Nasdaq GM: CRAY) reported today that 800,000 processor hours of computing time on Cray supercomputers went into the design of the highly successful Boeing 787.
Supercomputer-based modeling and simulation is far more efficient, cost-effective and practical than physical prototyping for testing large numbers of design variables. While physical prototyping is still important for final design validation, Boeing engineers were able to build the 787 Dreamliner after physically testing only 11 wing designs, versus 77 wing designs for the earlier Boeing 767 aircraft.
The Boeing 787 Dreamliner is 20% lighter and produces 20% fewer emissions
than similarly sized airplanes, while providing 10% better per-seat costs per mile,
according to Boeing.
Step 2: mathematical analysis
▪ Analysis provides a range of techniques for posing or formally solving boundary value problems
• What are the mathematical properties of the equations: are they parabolic, elliptic, or hyperbolic? What are the appropriate BCs? Are the initial conditions continuous, differentiable? Should we allow for weak solutions? What are the continuity properties of the operators?
• Can perturbation theory be used to find ‘first-order’ solutions? Are there transformations of the independent and dependent variables that simplify the problem? Are there analytical, limiting solutions? Are there symmetry properties to take advantage of?
• What are the possible methods of solution: eigenfunction expansions, Green’s functions, method of characteristics, transform methods, divergence theorem, Stokes’ theorem? Are there special functions that form an expansion set with the desired continuity, orthogonality, and completeness properties?
• Is the solution derivable from a variational principle? Are there constraints in the problem (Lagrange multipliers)?
• Many of these techniques apply not only to traditional mathematical solutions but also to numerical solutions on computers, only now the linear spaces are finite-dimensional
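The first question in the list, classifying the equations, can be made concrete for a linear second-order PDE in two variables, $a\,u_{xx} + b\,u_{xy} + c\,u_{yy} + \text{(lower-order terms)} = 0$, whose type at a point follows from the sign of the discriminant $b^2 - 4ac$. This is the standard textbook criterion; the helper name and tolerance below are illustrative assumptions.

```python
def classify_second_order(a, b, c, tol=1e-12):
    """Type of a*u_xx + b*u_xy + c*u_yy + (lower-order terms) = 0 at a point."""
    disc = b * b - 4.0 * a * c          # discriminant of the principal part
    if disc < -tol:
        return "elliptic"
    if disc > tol:
        return "hyperbolic"
    return "parabolic"

# Laplace: u_xx + u_yy = 0
print(classify_second_order(1, 0, 1))    # elliptic
# Heat equation u_t = u_xx, with y playing the role of t: u_xx - u_t = 0
print(classify_second_order(1, 0, 0))    # parabolic
# Wave equation u_tt = u_xx, with y as t: u_xx - u_yy = 0
print(classify_second_order(1, 0, -1))   # hyperbolic
```

The type then points to the natural data: boundary values all around for elliptic problems, and initial plus boundary data for parabolic and hyperbolic ones.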
Resolving coordinate-induced singularities
For problems involving cylindrical or spherical domains, cylindrical or spherical coordinates introduce artificial singularities at the origin ⇒ we must eliminate singular solutions, a problem made worse by increasing resolution.
For expansions $g(x,y) = \sum_m f_m(r,\theta)$, $f_m(r,\theta) \sim e^{im\theta}\,u_m(r)$, determine the minimum continuity conditions on the function $u_m(r)$ that allow only analytic solutions about the origin. Do this separately for scalar quantities $u(r)$ versus vector components $(u_r(r), u_\theta(r), u_z(r))$. This can be done independent of any …
For example, in the simple scalar case with $m = 0$, $f_0(r,\theta) = u_0\big(\sqrt{x^2+y^2}\big)$. Derivatives of the latter functional form eventually diverge at the origin unless $u_0(r) = \phi_0(r^2)$. For $m \neq 0$ we have
$$e^{im\theta} = \frac{(x \pm iy)^{|m|}}{r^{|m|}} \quad\Rightarrow\quad f_m(r,\theta) = (x \pm iy)^{|m|}\,\frac{u_m(r)}{r^{|m|}}$$
(upper sign for $m > 0$, lower sign for $m < 0$), so $u_m(r)/r^{|m|}$ must itself be a smooth function of $r^2$. Thus the analytic solution is of the form
$$f_m(r,\theta) = e^{im\theta}\, r^{|m|}\, \phi_m(r^2).$$
Furthermore, in finite element implementations, shape functions whose elements touch the origin can be made explicit functions of $r^2$; shape functions whose finite elements do not reach the origin can be functions of $r$.
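A quick numerical check of the regularity argument, as a minimal sketch in Python/NumPy (function names are illustrative): for $m = 0$ the candidate $u_0(r) = r$ is not a function of $r^2$ and has a cusp at the origin, so its centered second difference along $x$ grows like $2/h$, while $u_0(r) = r^2 = x^2 + y^2$ stays smooth. The last loop confirms that $r^{m} e^{im\theta} = (x + iy)^m$ for $m > 0$ at a random point.

```python
import numpy as np

def second_difference_at_origin(f, h):
    """Centered second difference of f(x, y) along the x axis at the origin."""
    return (f(h, 0.0) - 2.0 * f(0.0, 0.0) + f(-h, 0.0)) / h**2

def cusp(x, y):
    return np.hypot(x, y)        # u_0(r) = r: not a smooth function of r^2

def smooth(x, y):
    return x**2 + y**2           # u_0(r) = r^2 = phi_0(r^2) with phi_0(s) = s

for h in (1e-1, 1e-2, 1e-3):
    print(h, second_difference_at_origin(cusp, h), second_difference_at_origin(smooth, h))

# For m > 0, e^{i m theta} r^m is the polynomial (x + i y)^m, hence analytic at the origin.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(2)
r, theta = np.hypot(x, y), np.arctan2(y, x)
for m in range(1, 5):
    assert np.isclose(r**m * np.exp(1j * m * theta), (x + 1j * y)**m)
```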
Use of Fourier Expansion
▪ For theoretical purposes, decomposing the solution into Fourier ‘wavelengths’ and ‘frequencies’ will always be important for judging the physical scales of interest
▪ Specifically applicable for a) periodic boundary conditions, b) smooth, long-wavelength solutions, c) cases where the number of basis functions needs to be as small as possible, and d) cases where analysis or computation is aided by an orthogonal basis
▪ Currently used in the following HPC applications:
• Turbulence modeling
• Long-range molecular forces in molecular dynamics and materials science
• Applications that have geometries (cylindrical, spherical) with periodic BCs
• Some weather/climate codes, for latitude-longitude grids
▪ But a Fourier expansion is not for every problem
• Solutions with steep gradients exhibit the “Gibbs phenomenon”, where the Fourier solution has strong, spurious oscillations around discontinuities
• Complicated boundaries are difficult with global expansion functions
• Some problems have coordinate singularities in Fourier expansions
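The Gibbs phenomenon is easy to reproduce. This minimal sketch (Python/NumPy; the square wave and the sampling window are illustrative choices) sums the Fourier series of the unit square wave $\operatorname{sign}(\sin x)$ and records the peak just to the right of the jump at $x = 0$. Adding terms does not remove the overshoot: the peak stays about 9% of the jump above the true value, tending to $(2/\pi)\,\mathrm{Si}(\pi) \approx 1.179$.

```python
import numpy as np

def square_wave_partial_sum(x, n_terms):
    """Partial Fourier sum of the odd square wave sign(sin x):
    (4/pi) * sum_{k=0}^{n_terms-1} sin((2k+1) x) / (2k+1)."""
    k = np.arange(n_terms)[:, None]               # harmonics down rows, x along columns
    return (4.0 / np.pi) * np.sum(np.sin((2 * k + 1) * x) / (2 * k + 1), axis=0)

x = np.linspace(1e-4, 0.5, 5001)                  # just to the right of the jump at x = 0
peaks = {n: square_wave_partial_sum(x, n).max() for n in (10, 50, 200)}
for n, peak in peaks.items():
    print(n, peak)   # the overshoot above 1 does not shrink as n grows
```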
DNS code and Fourier Transforms
▪ E.g., 3-D turbulence. A Direct Numerical Simulation (DNS) code seeks to understand the distribution of energy in density/velocity fluctuations versus wavelength for the incompressible Navier-Stokes equations, for a specified source of turbulence. Understanding turbulence is needed for the efficient design of airplanes, vehicles, combustion systems, etc.
Code written in Fortran 90 with MPI
Time evolution: 2nd-order Runge-Kutta
Spatial derivative calculation: pseudospectral method
Typically, FFTs are done in all 3 dimensions.
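The DNS code itself is Fortran 90 with MPI; the following serial Python/NumPy sketch only illustrates the pseudospectral idea of differentiating by multiplying Fourier coefficients by $ik$. The test field, grid size, and function name are assumptions for illustration. The last lines check that a 3D FFT equals three passes of 1D FFTs, one axis at a time, which is the property the transpose strategy exploits.

```python
import numpy as np

def spectral_derivative(u, axis, length=2 * np.pi):
    """Pseudospectral d/dx_axis of a field that is periodic on [0, length) along `axis`."""
    n = u.shape[axis]
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    if n % 2 == 0:
        k[n // 2] = 0.0                               # drop the unpaired Nyquist mode
    shape = [1] * u.ndim
    shape[axis] = n                                   # broadcast k along the chosen axis
    u_hat = np.fft.fft(u, axis=axis)
    return np.real(np.fft.ifft(1j * k.reshape(shape) * u_hat, axis=axis))

n = 32                                                # assumption: small illustrative grid
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
u = np.sin(X) * np.cos(2 * Y) * np.sin(3 * Z)

du_dy = spectral_derivative(u, axis=1)
exact = -2 * np.sin(X) * np.sin(2 * Y) * np.sin(3 * Z)
print(np.abs(du_dy - exact).max())   # round-off level for this band-limited field

# A 3D FFT is a sequence of 1D FFTs along each axis; the transpose strategy moves
# data between passes so that each 1D transform is local to one processor.
staged = np.fft.fft(np.fft.fft(np.fft.fft(u, axis=0), axis=1), axis=2)
print(np.allclose(staged, np.fft.fftn(u)))
```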
Parallel 3D FFT: the so-called transpose strategy, as opposed to the direct strategy. That is, make sure all data along the direction of each 1D transform resides in one processor’s memory, and parallelize over the orthogonal dimension(s).
Data decomposition: N³ grid points over P processors
• Originally 1D (slab) decomposition: divide one side of the cube over P and assign N/P planes to each processor. Limitation: P ≤ N
• Currently 2D (pencil) decomposition: divide a face of the cube (N² points) over P and assign N²/P pencils (columns) to each processor. Limitation: P ≤ N²
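The processor limits of the two decompositions can be tabulated in a few lines of Python. This sketch assumes, for simplicity, that the grid size divides evenly by the processor counts (production codes also handle remainders); the function names are hypothetical. For a 12288³ grid, slabs cap the run at 12288 ranks, while pencils allow up to 12288² ≈ 1.5 × 10⁸.

```python
def slab_shape(n, p):
    """Local block of an n^3 grid under a 1D (slab) decomposition over p ranks."""
    if n % p:                      # also rejects p > n, since then n % p == n
        raise ValueError("slabs need p to divide n (so p <= n)")
    return (n // p, n, n)

def pencil_shape(n, p_rows, p_cols):
    """Local block under a 2D (pencil) decomposition on a p_rows x p_cols rank grid."""
    if n % p_rows or n % p_cols:   # assumption: even division, as above
        raise ValueError("each rank-grid dimension must divide n")
    return (n // p_rows, n // p_cols, n)

n = 12288                          # grid size quoted for the DNS target run
print(n**3)                        # about 1.86e12 grid points
print("max ranks, slabs:  ", n)
print("max ranks, pencils:", n * n)
print(pencil_shape(n, 1024, 1024)) # 1,048,576 ranks, each holding 12 x 12 x 12288
```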
▪ The DNS code needs PFLOPS sustained performance to achieve turbulence scaling on a 12288³ grid in ~40 hours
Issues of Fourier expansions for climate models
Pick geometry, Boundary Conditions (BC), coordinate system
In climate/weather studies, the use of latitude-longitude (and height) coordinates leads to efficient Fourier expansions in longitude and Legendre transforms in latitude. But there are issues with this approach: 1) artificial singularities at the poles ⇒ we must eliminate singular solutions at the poles ⇒ complicated ‘Fourier filtering’, which gives poor load balancing on computers; 2) 1D data decomposition in either the longitude or the latitude dimension leads to limited scalability on computers ⇒ we cannot use more processors to reduce wall time. Furthermore, when switching between the Fourier phase and the Legendre phase, data must be redistributed across processors via global transposes, which require very high bandwidth networks. For these reasons the climate/weather community is moving away from Fourier/spectral methods to finite element approaches.
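A minimal sketch of polar Fourier filtering, assuming a simple truncation-style filter (one common variant; the 60° cutoff latitude and the cos φ / cos φ_c rule are illustrative assumptions, not taken from any particular climate code). The usual motivation is that the east-west grid spacing shrinks like cos φ toward the poles, which would otherwise force a very small explicit time step; the filter removes the short zonal waves there. Only rows poleward of the cutoff do any FFT work, which is the load-imbalance problem that arises when processors own bands of latitude.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6

def zonal_spacing_m(lats_deg, n_lon):
    """East-west grid spacing R * cos(latitude) * d(longitude) on a regular grid."""
    return EARTH_RADIUS_M * np.cos(np.radians(lats_deg)) * (2 * np.pi / n_lon)

def polar_fourier_filter(field, lats_deg, cutoff_lat_deg=60.0):
    """Truncate zonal wavenumbers poleward of the cutoff latitude (illustrative rule).

    field has shape (n_lat, n_lon). At latitude phi beyond the cutoff, keep only
    wavenumbers m <= m_max * cos(phi) / cos(phi_cutoff), with m_max = n_lon // 2.
    """
    n_lon = field.shape[1]
    m = np.arange(n_lon // 2 + 1)                 # wavenumbers returned by rfft
    m_max = n_lon // 2
    cos_cut = np.cos(np.radians(cutoff_lat_deg))
    out = field.copy()
    for j, lat in enumerate(lats_deg):
        if abs(lat) <= cutoff_lat_deg:
            continue                              # equatorward rows: no FFT work at all
        keep = m <= m_max * np.cos(np.radians(lat)) / cos_cut
        out[j] = np.fft.irfft(np.fft.rfft(field[j]) * keep, n=n_lon)
    return out

n_lat, n_lon = 180, 360                            # assumption: illustrative 1-degree grid
lats = np.linspace(-89.5, 89.5, n_lat)
dx = zonal_spacing_m(lats, n_lon)
print(dx[n_lat // 2] / 1e3, dx[0] / 1e3)           # km: near the equator vs. next to the pole

field = np.random.default_rng(0).standard_normal((n_lat, n_lon))
filtered = polar_fourier_filter(field, lats)
```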
More Detailed Models with High Resolution
[Figure: Sea surface temperature (°C) on the last day of year 4 from the 1/10-degree, 42-level POP spinup simulation on Jaguar. The result of this spinup run will be used as the ocean initial condition for a fully coupled climate run. (Mat Maltrud, LANL)]
[Additional figures courtesy M. Gunzberger and P. Duffy]
New methods and scaling
▪ Scaling to 100,000 processors using the cubed sphere:
• Finite Volume with GFDL (Lin, Kerr, …)
• Spectral Elements with NCAR (Taylor, Nair)
▪ A cloud-resolving icosahedral dynamical core is being developed by Randall at CSU under SciDAC2
Example of perturbation theory, Hamiltonian dynamics, etc. in Magnetic Fusion
▪ Plasma physics is the regime of high-temperature, ionized gases. At T > 100 million degrees, nuclear fusion can occur. The energy of fusion reactions is released as high-energy neutrons, photons, or alpha particles, depending on the type of reaction. This energy can be captured in a reactor to make electricity, an ‘ultimate’ energy source
After equilibrium and gross stability are ensured, we need to reduce microturbulence-induced thermal transport to the walls so that applied heating can raise the temperature
Microturbulence requires a particle description rather than a continuous fluid model
A collisionless plasma is described by the Vlasov equation for $f(\mathbf{x}, \mathbf{v}, t)$ in 6D phase space:
$$\frac{\partial f}{\partial t} + \mathbf{v}\cdot\nabla f + \frac{e}{m}\left(\mathbf{E} + \mathbf{v}\times\mathbf{B}\right)\cdot\nabla_{\mathbf{v}} f = 0$$
The electromagnetic field is given by Maxwell’s equations, where the plasma particles are sources of charge density and current density. This is a highly coupled, nonlinear system
Solving the Vlasov equation is equivalent to solving the equations of motion for millions of particles subject to applied and self-consistent forces: Particle-In-Cell (PIC)
The ion temperature gradient instability drives microturbulence, but tokamak fusion devices have a small parameter $\epsilon \equiv r_L/R \ll 1$, where $r_L = v_{th}/(eB/m)$ is the Larmor radius and $R$ is the major radius of the torus. Perturbation analysis for small $\epsilon$ is used to simplify the Hamiltonian
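The PIC step of solving the equations of motion for many particles is commonly done with the Boris scheme. That is a swapped-in, standard technique named here plainly, not necessarily what any particular fusion code uses (gyrokinetic codes such as GTC push gyrocenters rather than full orbits). The sketch pushes one proton in a uniform magnetic field and checks that the orbit radius matches $r_L = v/(eB/m)$ and that the speed is conserved. The parameters (1 T field, 10⁵ m/s speed, 100 steps per gyro-period) are illustrative assumptions.

```python
import numpy as np

def boris_push(x, v, q_over_m, E, B, dt):
    """One Boris step (SI units): half electric kick, magnetic rotation, half kick, drift."""
    v_minus = v + 0.5 * dt * q_over_m * E
    t = 0.5 * dt * q_over_m * B
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)       # rotation preserves |v|
    v_new = v_plus + 0.5 * dt * q_over_m * E
    return x + dt * v_new, v_new

# Assumption: illustrative parameters for a single proton
q_over_m = 9.58e7                     # proton e/m in C/kg (rounded)
B = np.array([0.0, 0.0, 1.0])         # 1 T along z
E = np.zeros(3)                       # no electric field
x = np.zeros(3)
v = np.array([1.0e5, 0.0, 0.0])       # 100 km/s perpendicular to B

omega_c = q_over_m * np.linalg.norm(B)        # cyclotron frequency eB/m
r_larmor = np.linalg.norm(v) / omega_c        # r_L = v / (eB/m)
dt = 2 * np.pi / omega_c / 100                # 100 steps per gyro-period

positions = []
for _ in range(200):                          # two gyro-periods
    x, v = boris_push(x, v, q_over_m, E, B, dt)
    positions.append(x)
positions = np.array(positions)

r_measured = 0.5 * (positions[:, 0].max() - positions[:, 0].min())
print(r_larmor, r_measured, np.linalg.norm(v))
```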
Example of choosing proper coordinates:
Gyrokinetic Toroidal Code (GTC)
▪ Tokamaks are the main candidates for fusion research today
• Plasma is contained in a toroidal (donut-shaped) device. The long way around is the ‘toroidal’ direction, the short way around is the ‘poloidal’ direction, plus the minor radial direction r.
• The ‘obvious’ independent spatial coordinates are $(r, \theta, \zeta)$
• It is better to transform from $r$ to the poloidal flux $\psi(r,\theta)$, which is constant along field lines ($\mathbf{B}\cdot\nabla\psi = 0$):
$$\psi(r,\theta) = \int^{(r,\theta)} \mathbf{B}\cdot d\mathbf{A}, \qquad \psi(r = 0, \theta) = 0$$
Efficiency of Global Field-aligned Mesh
▪ Transform from toroidal coordinates to canonical magnetic coordinates
$$(r, \theta, \zeta) \to (\psi, \alpha, \zeta), \qquad \alpha = \theta - \zeta/q, \qquad q = \frac{r B_\zeta}{R B_\theta}$$
▪ Use perturbation analysis to find canonical coordinates with $H_{ex} = H_2 + O(\epsilon^3)$. Following particle motion with magnetic coordinates in the Hamiltonian ‘straightens’ out B and changes the charge deposition step of PIC from a 3D to a 2D process. This allows for a coarse grid along the field-line direction and saves a factor of ~100 in CPU time. This is huge and easily justifies a team spending a year doing analysis to get this result.
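Why a field-aligned mesh can be coarse along B can be seen with a single illustrative perturbation. Assume, hypothetically, a constant safety factor q = 2 and a mode cos(mθ − nζ) with m/n = q; microturbulence has exactly this character of short wavelength across B and long wavelength along it. At fixed poloidal angle θ the mode oscillates n times per toroidal turn, so a (θ, ζ) grid must resolve it in ζ; at fixed field-line label α = θ − ζ/q it is constant, so very few points are needed along ζ. Real equilibria have magnetic shear (q varies with ψ), which this sketch ignores.

```python
import numpy as np

q = 2.0                        # assumption: constant safety factor (no magnetic shear)
m, n = 40, 20                  # assumption: mode numbers with m / n = q
zeta = np.linspace(0.0, 2 * np.pi, 400)   # one toroidal turn

def mode(theta, zeta):
    """Single-helicity perturbation cos(m*theta - n*zeta)."""
    return np.cos(m * theta - n * zeta)

def spread(values):
    """Peak-to-peak variation of a sampled signal."""
    return values.max() - values.min()

theta0 = 0.3
at_fixed_theta = mode(theta0, zeta)               # sampled along zeta at constant theta
alpha0 = theta0                                   # label of the field line through (theta0, zeta=0)
along_field_line = mode(alpha0 + zeta / q, zeta)  # theta = alpha + zeta/q follows B

print(spread(at_fixed_theta))      # order 2: the mode swings through its full range
print(spread(along_field_line))    # round-off level: no variation along B
```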
Domain Decomposition
▪ Domain decomposition:
• each MPI process holds a toroidal section
• each particle is assigned to a processor according to its position
▪ Initial memory allocation is done locally on each processor to maximize …
▪ Communication between domains is done with MPI calls (runs on most parallel computers)
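A minimal sketch of the ownership rule, assuming for illustration that the torus is cut into equal toroidal sections in the angle ζ and that particles move at most one section per time step, so each rank only exchanges with its two toroidal neighbors. The function names are hypothetical, and the MPI sends and receives that would move the particles are not shown.

```python
import numpy as np

def owner_rank(zeta, n_domains):
    """Rank owning toroidal angle zeta (radians) when [0, 2*pi) is cut into equal sections."""
    zeta = np.mod(zeta, 2 * np.pi)
    ranks = np.floor(zeta / (2 * np.pi) * n_domains).astype(int)
    return np.minimum(ranks, n_domains - 1)      # guard against round-off at 2*pi

def sort_for_shift(zeta, my_rank, n_domains):
    """Masks of local particles that stay, go to the left neighbor, or go to the right one.

    Assumption: n_domains >= 3 and no particle crosses more than one section per step.
    """
    owners = owner_rank(zeta, n_domains)
    left, right = (my_rank - 1) % n_domains, (my_rank + 1) % n_domains
    return owners == my_rank, owners == left, owners == right

zeta_after_push = np.array([0.1, 1.0, 2.0, 6.2])   # hypothetical angles of rank 0's particles
stay, to_left, to_right = sort_for_shift(zeta_after_push, my_rank=0, n_domains=4)
print(stay, to_left, to_right)
```

Because the torus is periodic, rank 0’s left neighbor is the last rank, which is why the neighbor indices wrap with the modulo.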
Step 2: Example of taking advantage of symmetry
▪ Find the eigenfunctions and eigenvalues of an operator $\Omega(x)$ in a 1D geometry with periodic boundary conditions at $x = L$, where the operator also has period $a$, $L = Na$: $\Omega(x + a) = \Omega(x)$. The eigenvalue problem is
$$\Omega(x)\,\psi_n(x) = \lambda_n\,\psi_n(x), \qquad \psi_n(0) = \psi_n(L).$$
The eigenfunctions are of the form
$$\psi_n(x) = \psi_{kn}(x) = e^{ikx}\,u_{kn}(x), \qquad k = \frac{2\pi \ell}{L},\ \ \ell = 1, 2, \ldots, N, \qquad u_{kn}(x + a) = u_{kn}(x).$$
That is, rather than find eigenfunctions over the interval $L = Na$, find reduced eigenfunctions $\psi_{kn}(x)$ over the interval $a$, for each value of $k$. This is an example of the Floquet-Bloch theorem.
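A numerical check of the theorem, as a minimal sketch: discretize the illustrative operator $\Omega = -d^2/dx^2 + V(x)$ with $V(x) = 5\cos(2\pi x/a)$ (an assumption; any $a$-periodic potential works) on $N = 5$ cells of $M = 8$ points each, with periodic boundary conditions at $x = L$. Diagonalizing the full $NM \times NM$ matrix gives the same spectrum as diagonalizing $N$ small $M \times M$ problems on one cell, in which the wrap-around coupling carries the Bloch phase $e^{ika}$. This twisted condition, $\psi_{kn}(x + a) = e^{ika}\psi_{kn}(x)$, is equivalent to $u_{kn}$ being $a$-periodic.

```python
import numpy as np

def ring_operator(v, h, phase=1.0):
    """-d^2/dx^2 + V as a 3-point stencil on a ring of len(v) >= 3 points.

    The bond from the last point back to the first carries `phase`
    (phase = 1 for ordinary periodic BCs, exp(i k a) for a Bloch-reduced cell).
    """
    n = len(v)
    H = np.zeros((n, n), dtype=complex)
    i = np.arange(n)
    H[i, i] = 2.0 / h**2 + v
    H[i[:-1], i[1:]] = -1.0 / h**2
    H[i[1:], i[:-1]] = -1.0 / h**2
    H[n - 1, 0] = -phase / h**2               # psi(x_M) = phase * psi(x_0)
    H[0, n - 1] = -np.conj(phase) / h**2      # keeps the matrix Hermitian
    return H

a, n_cells, m = 1.0, 5, 8                      # period a, N cells, M points per cell
L, h = n_cells * a, a / m
x_cell = h * np.arange(m)
v_cell = 5.0 * np.cos(2 * np.pi * x_cell / a)  # assumption: illustrative a-periodic potential
v_full = np.tile(v_cell, n_cells)              # V(x + a) = V(x) over the whole ring

full = np.linalg.eigvalsh(ring_operator(v_full, h))            # all N*M eigenvalues at once
ks = 2 * np.pi * np.arange(1, n_cells + 1) / L                 # k = 2*pi*l/L, l = 1..N
bloch = np.concatenate([np.linalg.eigvalsh(ring_operator(v_cell, h, np.exp(1j * k * a)))
                        for k in ks])                           # N problems of size M
print(np.allclose(np.sort(full), np.sort(bloch)))
```

For these toy sizes the saving is small, but with dense eigensolvers the full problem costs O((NM)³) while the reduced problems cost N·O(M³), a factor of N² less.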
Step 3: Analysis in computer solution
▪ Many new, critical problems of interest will involve computer solution. Embrace this reality, as analysis can greatly aid the implementation of computer solutions
• Do contributions from new sciences enter as ‘source’ terms to old models, or are new PDEs introduced? What is the coupling at interfaces: what continuity conditions, conservation laws, and boundary conditions are appropriate?
• What are the asymptotic solutions, symmetric solutions, or other ‘first-order’ solutions? They can be crucial in understanding the properties of the general solution space. Special solutions can often be used as sanity checks, to improve the choice of basis functions, to reduce computation, or to improve convergence.
• Applied math is needed to steer among the myriad of possible numerical algorithms, based on the mathematical properties of the final system of equations: methods for different types of PDEs; finite differencing versus finite elements; global eigenfunction expansions versus basis functions with compact support; direct versus iterative solvers; explicit versus implicit iterative methods; convergence and stability of iterative methods; what kind of iterative solver (multigrid, conjugate gradient, …); preconditioners? …
• How is the problem distributed among processors? Are scalable algorithms chosen for inter-processor communication? Is the code written to allow full expression of parallelism to the CPUs, including vectorization?
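One concrete instance of the stability questions listed above is the 1D heat equation $u_t = u_{xx}$ with zero boundary values, a textbook example chosen here for illustration. The forward-Euler (explicit) update is stable only for $r = \Delta t/\Delta x^2 \le 1/2$, while the backward-Euler (implicit) update is stable for any $r$, at the cost of a linear solve per step. In this minimal sketch, stability is judged by the largest-magnitude eigenvalue of each one-step update matrix; both matrices are symmetric, so this spectral radius is also their 2-norm.

```python
import numpy as np

def heat_update_matrices(n, r):
    """One-step update matrices for u_t = u_xx on n interior points, u = 0 at both ends.

    r = dt / dx**2. Explicit: u_new = (I + r*A) u.  Implicit: u_new = (I - r*A)^{-1} u.
    """
    A = (np.diag(np.full(n, -2.0))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))          # 3-point second difference (times dx**2)
    identity = np.eye(n)
    explicit = identity + r * A
    implicit = np.linalg.inv(identity - r * A)    # fine for a demo; use a tridiagonal solve in practice
    return explicit, implicit

def spectral_radius(matrix):
    """Largest eigenvalue magnitude: > 1 means errors grow from step to step."""
    return np.abs(np.linalg.eigvals(matrix)).max()

for r in (0.4, 0.6, 10.0):
    explicit, implicit = heat_update_matrices(50, r)
    print(f"r = {r:4}: explicit {spectral_radius(explicit):.3f}, "
          f"implicit {spectral_radius(implicit):.3f}")
```

The implicit scheme buys unconditional stability with a linear solve per step, which is exactly the kind of trade-off where knowing the mathematical properties of the system guides the choice of algorithm.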
How do university math departments view analysis? Is there an applied math major?
▪ In my opinion, applied math majors should take:
Required: 2 semesters calculus-based introductory physics
1 semester undergraduate mechanics
1 semester undergraduate atomic/quantum physics
1 semester undergraduate numerical methods/analysis
1 semester programming for scientific applications, including …
[1 semester undergraduate electromagnetism]
22 [25] credits outside math courses
Each of the physics courses has graduate counterparts …
Or, rather than physics, one could focus on engineering fields, chemistry, biology, medicine, etc.
▪ Math I found most useful: calculus, vector analysis, complex variables, linear vector spaces, ODEs, PDEs, probability, calculus of variations, linear algebra
▪ Analysis and applied math are crucial components of training the professionals needed to help mankind understand the environment, achieve fusion energy, enable ‘drugs by design’, develop new materials, etc. These also keep the US on top economically and in terms of national security
▪ Today’s work is more targeted toward specific problems that are directly tied to industrial, national, or international need
▪ Many of today’s problems are multidisciplinary, and all require a sound mathematical foundation and understanding
▪ Today’s big problems are solved on supercomputers, and using these machines requires a solid understanding of algorithms and computer system architecture