The Role of Analysis in Modern High Performance Computing
J. L. Schwarzmeier, Cray Inc., July 6, 2007

Outline
• Where I am coming from
• The process of going from 'problem to be solved' to 'solution on a computer', and the role of analysis throughout
• Conclusions

Where I am coming from
A personal view of analysis and applied math: as much as I have studied and enjoyed math, I am not a researcher in analysis.
• However, I did see many uses of analysis for theoretical and numerical purposes in my own work and in the work of others
• It has been 20 years since I left LANL for Cray ...
• Cray personnel generally do not work with customers at the level of their fundamental equations, but we continue to see sophisticated uses of analysis by customers
• Sometimes we do recommend alternative numerical schemes to improve performance
I will describe the 'journey' from identifying a problem to be solved to obtaining its solution on a computer, and the role of analysis along the way, sprinkled with examples I have encountered.

Role of analysis in the sciences
Is the use of analysis in science/engineering relegated to a bygone era, before computers and before all science had been 'discovered'? NO!!
Myth: all science has already been discovered.
Reality: no. The laws of physics are known, but the focus has changed to solving more realistic problems. The wave function of the universe, $\Psi(x_1, x_2, \ldots, x_N, t)$, where N = # of particles in the universe, satisfies
$$H \Psi(x_1, x_2, \ldots, x_N, t) = (ih/2\pi)\, \partial \Psi / \partial t, \qquad \Psi(\infty, t) = 0,$$
an approximately 'true' but totally useless equation. Since the 1920s, about the only quantum mechanical problems that have been solved 'exactly' are: the hydrogen atom, a single particle in a harmonic oscillator potential, and other similarly idealized situations. Even on the most powerful supercomputers, quantum chemistry codes struggle mightily to approximately solve the N-atom Schroedinger equation for N ~ O(10) atoms (see the scaling sketch below).
Even on today's supercomputers, important problems cannot be solved by brute force: clever algorithms and implementations are required, as is knowing what approximations to make.

Circle of dependency
Scientific progress today often requires multidisciplinary collaboration:
• A team of scientists
• Mathematics, numerical methods & analysis
• Computer specialists
Do we have enough students entering analysis/applied math? Will their training allow them to bridge the gap from scientist/engineer to computer programmer?
1. Scientists define the problem
2. Analysis to understand its properties and guide the solution
3. Numerics and computer implementation

Step 1 of the Journey: role of scientists/engineers
1. A big opportunity for analysis and computing today is solving new, realistic, specific, often multi-disciplinary problems. Specific model equations are derived from general equations by physical and mathematical intuition.
2. Focus on a narrowed range of problems of interest. Of the fundamental equations, which terms are important? How are multidisciplinary PDEs coupled together? How do we deal with greatly disparate space/time scales? The result is a modified set of equations.
In climate studies the current model couples atmosphere, ocean, land, and sea ice. Researchers seek to improve the model by including: 100 chemical species in the atmosphere; full carbon-cycle modeling with the interaction of vegetation, plant decay, and release of CO2; full fresh-water hydrology with river-basin modeling and drainage into the oceans; higher spatial resolution to model cloud physics and the effects of narrow land formations (e.g., Florida) on oceanic currents; and better modeling of manmade interactions such as fires, deforestation, development, irrigation, pollution, etc.
This is a much expanded set of equations and unknowns, and it needs a mathematical foundation.
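To make concrete why the N-particle Schroedinger equation above is 'totally useless' for direct solution, here is a rough back-of-envelope sketch; the grid size and data type are my own illustrative assumptions, not numbers from the talk.

```python
# Memory needed just to STORE a discretized wave function
# psi(x1, ..., xN): it lives in 3N dimensions, so storage grows
# as M**(3N) for M grid points per dimension (values assumed).
M = 32                      # grid points per spatial dimension (assumed)
BYTES_PER_AMPLITUDE = 16    # one complex double

for N in (1, 2, 5, 10):
    amplitudes = M ** (3 * N)
    terabytes = amplitudes * BYTES_PER_AMPLITUDE / 1e12
    print(f"N = {N:2d} particles: {amplitudes:.2e} amplitudes, {terabytes:.2e} TB")
```

Already at N = 5 the storage exceeds any machine ever built, which is why clever approximations, not brute force, are required.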
Another multi-disciplinary example
Researchers at Rice University Use Cray Supercomputer to Unlock Biomedical Mysteries and Aid Future Diagnostics
"Members of the Team for Advanced Flow Simulation and Modeling at Rice are collaborating with colleagues from other institutions to create computational fluid dynamics models that mimic how blood courses through the brain's arteries and interacts with an aneurysm on a vessel wall. An aneurysm is a balloon-like protrusion of an artery that could be fatal if it bursts. The team uses the Cray system to simulate numerically how the blood, artery and aneurysm interact with each other. The data is then loaded into a program from Computational Engineering International called EnSight, which provides visualization and analytical capabilities."
"Accurate blood-flow simulations are extremely complex because an artery wall isn't rigid and blood pressure fluctuates with the beating of the heart," says Tayfun Tezduyar, professor of mechanical engineering at Rice. "We want to understand how much a cerebral artery wall deforms, how blood flow is affected and what stresses are created that could affect the aneurysm. A precise understanding of this dynamic will be of great benefit to brain surgeons when they have to make a decision about whether or not to operate..."
"A traditional Singular Value Decomposition algorithm running on a conventional computer does not preserve the symmetry of the molecule, making it difficult to isolate and study a protein's characteristics. The team developed more accurate algorithms that they 'parallelized' to run quickly on the supercomputer..."

Example of HPC for economic competitiveness
CRAY SUPERCOMPUTERS PLAY KEY ROLE IN DESIGNING BOEING 787 DREAMLINER
800,000 Simulation Hours Helped Create Design For Highly Successful Commercial Aircraft
SEATTLE, WA, July 5, 2007 -- Global supercomputer leader Cray Inc. (Nasdaq GM: CRAY) reported today that 800,000 processor hours of computing time on Cray supercomputers went into the design of the highly successful Boeing 787 Dreamliner. Supercomputer-based modeling and simulation is far more efficient, cost-effective and practical than physical prototyping for testing large numbers of design variables. While physical prototyping is still important for final design validation, Boeing engineers were able to build the 787 Dreamliner after physically testing only 11 wing designs, versus 77 wing designs for the earlier Boeing 767 aircraft. The Boeing 787 Dreamliner is 20% lighter and produces 20% fewer emissions than similarly sized airplanes, while providing 10% better per-seat costs per mile, according to Boeing.

Step 2: mathematical analysis
Analysis gives a range of techniques for posing or formally solving boundary value problems:
• What are the mathematical properties of the equations: are they parabolic, elliptic, or hyperbolic? What are appropriate BCs? Are the initial conditions continuous, differentiable? Should we allow for weak solutions? What are the continuity properties of the operators?
• Can perturbation theory be used to find 'first-order' solutions? Are there transformations of the independent or dependent variables that simplify the problem? Are there analytical, limiting solutions? Are there symmetry properties to take advantage of?
• What are possible methods of solution: eigenfunction expansions, Green's functions, method of characteristics, transform methods, divergence theorem, Stokes' theorem? Are there special functions that form an expansion set with the desired continuity, orthogonality, and completeness properties?
• Is the solution derivable from a variational principle? Are there constraints in the problem (Lagrange multipliers)?
• Many of these techniques apply not only to traditional mathematical solutions but also to numerical solutions on computers, only now the linear spaces have finite dimensionality. (A sketch of the first question on this list, PDE classification, follows below.)
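As a minimal illustration of the first bullet, here is a sketch (my own, not from the talk) that classifies a constant-coefficient second-order PDE $A u_{xx} + B u_{xy} + C u_{yy} + \text{lower-order terms} = 0$ by the sign of its discriminant:

```python
def classify_2nd_order_pde(A: float, B: float, C: float) -> str:
    """Classify A*u_xx + B*u_xy + C*u_yy + (lower-order terms) = 0
    by the discriminant B**2 - 4*A*C (constant coefficients assumed)."""
    d = B * B - 4 * A * C
    if d < 0:
        return "elliptic"     # e.g. Laplace equation: u_xx + u_yy = 0
    if d == 0:
        return "parabolic"    # e.g. heat equation: u_t = u_xx (no u_tt term)
    return "hyperbolic"       # e.g. wave equation: u_tt = c^2 u_xx

print(classify_2nd_order_pde(1, 0, 1))    # elliptic
print(classify_2nd_order_pde(1, 0, 0))    # parabolic
print(classify_2nd_order_pde(-4, 0, 1))   # hyperbolic (wave equation, c = 2)
```

The classification matters because it dictates the appropriate boundary/initial conditions and, later, the choice of numerical method.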
Resolving coordinate-induced singularities
Problems involving cylindrical or spherical domains introduce artificial singularities at the origin through the cylindrical or spherical coordinates themselves; the singular solutions must be eliminated, a problem made worse by increasing resolution. For expansions
$$g(x, y) = \sum_m f_m(r, \theta), \qquad f_m(r, \theta) = e^{i m \theta} u_m(r),$$
determine the minimum continuity conditions on the functions $u_m(r)$ that allow only analytic solutions about the origin. Do this separately for scalar quantities $u(r)$ versus vector components $(u_r(r), u_\theta(r), u_z(r))$. This can be done independent of any PDE.
For example, in the simple scalar case with $m = 0$ we have $f_m(r, \theta) = u_m(\sqrt{x^2 + y^2})$. Derivatives of the latter functional form eventually diverge at the origin unless $u_m(r) = \bar{u}_m(r^2)$. For $m \neq 0$ we have $e^{i m \theta} = (x + i y)^m / r^m$, and thus
$$f_m(r, \theta) = \frac{u_m(r)}{r^m} (x + i y)^m.$$
Thus the analytic solution is of the form $f_m(r, \theta) = e^{i m \theta} r^m \bar{u}_m(r^2)$.
Furthermore, in finite element implementations, shape functions that intersect the origin can be made explicit functions of $r^2$; shape functions whose finite elements do not reach the origin can be functions of $r$.

Use of Fourier Expansion
For theoretical purposes, decomposition of the solution in terms of Fourier 'wavelengths' and 'frequencies' will always be important for judging the physical scales of interest. It is specifically applicable for a) periodic boundary conditions, b) smooth, long-wavelength solutions, c) when the number of basis functions needs to be as small as possible, and d) when analysis or computation is aided by an orthogonal basis.
Currently used in the following HPC applications:
• Turbulence modeling
• Long-range molecular forces in molecular dynamics and materials science
• Applications with geometries (cylindrical, spherical) having periodic BCs
• Some weather/climate codes on latitude-longitude grids
But Fourier expansion is not for every problem:
• Solutions with steep gradients exhibit the 'Gibbs phenomenon', where the Fourier solution has strong, spurious oscillations around discontinuities
• Complicated boundaries are difficult with global expansion functions
• Some problems have coordinate singularities in Fourier expansions

DNS code and Fourier transforms
E.g., 3-D turbulence. A Direct Numerical Simulation (DNS) code seeks to understand the distribution of energy in density/velocity fluctuations versus wavelength for the incompressible Navier-Stokes equations, for a specified source of turbulence. Understanding turbulence is needed for the efficient design of airplanes, vehicles, combustion systems, etc.
• Code written in Fortran 90 with MPI
• Time evolution: 2nd-order Runge-Kutta
• Spatial derivative calculation: pseudospectral method (see the sketch below)
• Typically, FFTs are done in all 3 dimensions
• Parallel 3D FFT: the so-called transpose strategy, as opposed to the direct strategy. That is, make sure all data in the direction of the 1D transform resides in one processor's memory, and parallelize over the orthogonal dimension(s).
• Data decomposition: $N^3$ grid points over P processors. Originally a 1D (slab) decomposition: divide one side of the cube over P, assigning N/P planes to each processor; limitation: P <= N. Currently a 2D (pencil) decomposition: divide a side of the cube ($N^2$) over P, assigning $N^2/P$ pencils (columns) to each processor.
The DNS code needs PFLOPS sustained performance to achieve turbulence scaling on a $12288^3$ grid in ~40 hours.
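A minimal serial sketch of the pseudospectral derivative step (illustrative only; the DNS code itself is Fortran 90 with MPI): differentiation of a periodic field becomes multiplication by $ik$ in Fourier space.

```python
import numpy as np

# Pseudospectral derivative on a periodic domain [0, 2*pi):
# transform, multiply each Fourier coefficient by i*k, transform back.
N = 64
x = 2 * np.pi * np.arange(N) / N
u = np.sin(3 * x)                       # test field with known derivative
k = np.fft.fftfreq(N, d=1.0 / N)        # integer wavenumbers 0, 1, ..., -1
du = np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

print(np.max(np.abs(du - 3 * np.cos(3 * x))))   # ~1e-13: spectral accuracy
```

In the parallel 3D code the same 1D operation is applied along each dimension in turn, with the global transposes described above moving the data so that each 1D transform is local to a processor.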
Issues of Fourier expansions for climate models
Pick the geometry, boundary conditions (BCs), and coordinate system. In climate/weather studies, the use of latitude-longitude (and height) coordinates leads to efficient Fourier expansions in longitude and Legendre transforms in latitude. But there are issues with this approach:
1) Artificial singularities at the poles: singular solutions at the poles must be eliminated via complicated 'Fourier filtering', which causes poor load balancing on computers.
2) A 1D data decomposition in either the longitude or latitude dimension leads to limited scalability: more processors cannot be used to reduce wall time. Furthermore, when switching between the Fourier phase and the Legendre phase, data must be redistributed across processors via global transposes, which require very high-bandwidth networks.
For these reasons the climate/weather community is moving away from Fourier/spectral methods toward finite element approaches.

More Detailed Models with High Resolution
[Figure: sea surface temperature (degrees C) on the last day of year 4 from the 1/10-degree, 42-level POP spinup simulation on Jaguar. The result of this spinup run will be used as the ocean initial condition for a fully coupled climate run. (Mat Maltrud, LANL; courtesy M. Gunzberger; courtesy P. Duffy)]

New methods and scaling
Scaling to 100,000 processors using the cubed sphere:
• Finite Volume with GFDL (Lin, Kerr, Putman)
• Spectral Elements with NCAR (Taylor, Nair)
A cloud-resolving icosahedral dynamical core is being developed by Randall at CSU under SciDAC2.

Example of perturbation theory, Hamiltonian dynamics, etc. in magnetic fusion
Plasma physics is the regime of high-temperature, ionized gases. At T > 100 million degrees, nuclear fusion can occur. The energy of fusion reactions is released as high-energy neutrons, photons, or alpha particles, depending on the type of reaction. This energy can be captured in a reactor to make electricity: the 'ultimate' energy source.
After equilibrium and gross stability are ensured, one must reduce microturbulence-induced thermal transport to the walls so that applied heating can raise the temperature. Microturbulence requires a particle description rather than a continuous fluid model. A collisionless plasma is described by the Vlasov equation for $f(\mathbf{x}, \mathbf{v}, t)$ in 6D phase space:
$$\frac{\partial f}{\partial t} + \mathbf{v} \cdot \nabla f + \frac{e}{m} \left( \mathbf{E} + \mathbf{v} \times \mathbf{B} \right) \cdot \nabla_v f = 0$$
The electromagnetic field is given by Maxwell's equations, where the plasma particles are the sources of charge density and current density: a highly coupled, nonlinear system.
Solving the Vlasov equation is equivalent to solving the equations of motion for millions of particles subject to applied and self-consistent forces: Particle-In-Cell (PIC), sketched below.
The ion temperature gradient instability drives the microturbulence, but Tokamak fusion devices have a small parameter $\epsilon = r_L / R \ll 1$, where $r_L = v_{th} / (eB/m)$ is the Larmor radius and $R$ is the major radius of the torus. Perturbation analysis for small $\epsilon$ is used to simplify the Hamiltonian.
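A minimal sketch (my own, not GTC's) of the charge deposition step that gives Particle-In-Cell its name: each particle's charge is scattered to its nearest grid points with linear (cloud-in-cell) weights, shown here in 1D.

```python
import numpy as np

def deposit_charge(x, q, nx, dx):
    """Cloud-in-cell deposition on a periodic 1D grid: scatter each
    particle's charge q to the two nearest grid points, weighted by
    its fractional distance, and return the charge density rho."""
    rho = np.zeros(nx)
    j = np.floor(x / dx).astype(int)     # grid point just left of particle
    w = x / dx - j                       # fractional distance to that point
    np.add.at(rho, j % nx, q * (1.0 - w))
    np.add.at(rho, (j + 1) % nx, q * w)
    return rho / dx

# Example: 10^5 unit-charge particles on a 64-cell periodic grid
rng = np.random.default_rng(0)
nx, dx = 64, 1.0
x = rng.uniform(0.0, nx * dx, size=100_000)
rho = deposit_charge(x, np.ones_like(x), nx, dx)
print(rho.sum() * dx)   # total charge is conserved: 100000.0
```

The deposited charge density feeds the field solve (Maxwell's equations), whose fields then push the particles: the self-consistent loop described above.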
Example of choosing proper coordinates: Gyrokinetic Toroidal Code (GTC)
Tokamaks are the main candidates for fusion research today.
• The plasma is contained in a toroidal (donut-shaped) device. The long way around is the 'toroidal' direction, the short way around is the 'poloidal' direction, plus the minor radial direction.
• The 'obvious' independent spatial coordinates are $(r, \theta, \zeta)$.
• It is better to transform from $r$ to the poloidal flux $\psi(r, \theta) = \int \mathbf{B} \cdot d\mathbf{A}$, with $\psi(r{=}0, \theta) = 0$, so that $\mathbf{B} \cdot \nabla \psi = 0$.

Efficiency of a global field-aligned mesh
Transform from toroidal coordinates to canonical magnetic coordinates, $(r, \theta, \zeta) \to (\psi, \alpha, \zeta)$ with $\alpha = \theta - \zeta / q$ and safety factor $q = r B_\zeta / (R B_\theta)$. Use perturbation analysis to find canonical coordinates with $H_{ex} = H + \epsilon^2 H_2 + O(\epsilon^3)$. Following the particle motion in magnetic coordinates in the Hamiltonian 'straightens out' $\mathbf{B}$ and changes the charge deposition step of PIC from a 3D to a 2D process. This allows a coarse grid and saves a factor of ~100 in CPU time. This is huge, and it easily justifies a team spending a year doing analysis to get it right.

Domain decomposition
• Each MPI process holds a toroidal section
• Each particle is assigned to a processor according to its position
• Initial memory allocation is done locally on each processor to maximize efficiency
• Communication between domains is done with MPI calls (runs on most parallel computers)

Step 2: example of taking advantage of symmetry
Find the eigenfunctions and eigenvalues of an operator $\Omega(x)$ in 1D geometry with periodic boundary conditions at $x = L$, where the operator also has periodicity length $a$, with $L = Na$: $\Omega(x + a) = \Omega(x)$. The eigenvalue problem is
$$\Omega(x)\, \psi_n(x) = \lambda_n \psi_n(x), \qquad \psi_n(0) = \psi_n(L).$$
The eigenfunctions are of the form $\psi_n(x) = \psi_{kn}(x) = e^{i k x} u_{kn}(x)$, where $k = 2\pi j / L$, $j = 1, 2, \ldots, N$, and $u_{kn}(x + a) = u_{kn}(x)$. That is, rather than finding eigenfunctions over the interval $L = Na$, find reduced eigenfunctions $\psi_{kn}(x)$ over the interval $a$, for each value of $k$. This is an example of the Floquet-Bloch theorem (see the numerical check below).
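A small numerical check of this reduction (my own sketch, with a discrete Schroedinger-type operator standing in for $\Omega(x)$): the spectrum of the full periodic operator on $N a$ points equals the union of the spectra of $N$ reduced Bloch operators on $a$ points each.

```python
import numpy as np

N, p = 8, 4          # N unit cells of p grid points each: M = 32 points
M = N * p
V = np.array([0.0, 1.0, 0.5, 2.0])   # potential with period p (arbitrary)

# Full periodic operator: discrete -u'' + V u on all M points
H = np.diag(2.0 + np.tile(V, N)).astype(complex)
for n in range(M):
    H[n, (n + 1) % M] -= 1.0
    H[n, (n - 1) % M] -= 1.0
full_spectrum = np.sort(np.linalg.eigvalsh(H))

# Bloch reduction: one p x p Hermitian problem per allowed wavenumber k,
# with hopping terms across the cell picking up phases exp(+/- i k)
reduced = []
for j in range(N):
    k = 2.0 * np.pi * j / M
    Hk = np.diag(2.0 + V).astype(complex)
    for n in range(p):
        Hk[n, (n + 1) % p] -= np.exp(1j * k)
        Hk[n, (n - 1) % p] -= np.exp(-1j * k)
    reduced.extend(np.linalg.eigvalsh(Hk))

print(np.allclose(full_spectrum, np.sort(reduced)))   # True: same spectrum
```

The payoff is the same as on the slide: diagonalizing N small $p \times p$ matrices costs $O(N p^3)$ rather than $O((Np)^3)$ for the full problem.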
Step 3: analysis in the computer solution
Many new, critical problems of interest will involve computer solution. Embrace this reality, as analysis can greatly aid the implementation of computer solutions.
• Do contributions from new sciences enter as 'source' terms in old models, or are new PDEs introduced? What is the coupling at the interfaces: what continuity conditions, conservation laws, and boundary conditions are appropriate?
• What are the asymptotic solutions, symmetric solutions, or other 'first-order' solutions? They can be crucial in understanding the properties of the general solution space. Special solutions can often be used as sanity checks, to improve the choice of basis functions, to reduce computation, or to improve convergence.
• Applied math is needed to steer among the myriad of possible numerical algorithms, based on the mathematical properties of the final system of equations: methods for different types of PDEs; finite differencing versus finite elements; global eigenfunction expansions versus basis functions with compact support; direct versus iterative solvers; explicit versus implicit iterative methods; convergence and stability of iterative methods; what kind of iterative solver (multi-grid, conjugate gradient, ...); pre-conditioners? ...
• How is the problem distributed among processors? Are scalable algorithms chosen for inter-processor communication? Is the code written to allow full expression of parallelism to the CPUs, including vectorization?

How do university math departments view analysis? Is there an applied math major?
In my opinion, applied math majors should take:
• Required: 2 semesters of calculus-based introductory physics
• 1 semester of undergraduate mechanics
• 1 semester of undergraduate atomic/quantum physics
• 1 semester of undergraduate numerical methods/analysis
• 1 semester of programming for scientific applications, including Fortran/C/C++/Matlab
• [1 semester of undergraduate electromagnetism]
• 22 [25] credits outside math courses
Each of the physics courses has graduate counterparts ... Or, rather than physics, the focus could be on engineering fields, chemistry, biology, medicine, etc.
The math I found most useful: calculus, vector analysis, complex variables, linear vector spaces, ODEs, PDEs, probability, calculus of variations, linear algebra.

Conclusions
Analysis and applied math are crucial components of training the professionals needed to help mankind understand the environment, achieve fusion energy, enable 'drugs by design', develop new materials, etc. These also keep the US on top economically and in terms of national security.
Today's problems are more targeted at solving specific problems directly tied to industrial, national, or international needs.
Many of today's problems are multi-disciplinary, and all require a sound mathematical foundation and understanding.
Today's big problems are solved on supercomputers, and the use of these machines requires a solid understanding of algorithms and computer system architecture.