Parallel Computing 2007: Science Applications
February 26 - March 1, 2007
Geoffrey Fox
Community Grids Laboratory, Indiana University
505 N Morton, Suite 224, Bloomington IN
gcf@indiana.edu

Four Descriptions of Matter -- Quantum, Particle, Statistical, Continuum
• Quantum Physics
• Particle Dynamics
• Statistical Physics
• Continuum Physics
– These give rise to different algorithms, and in some cases one will mix the different descriptions. We briefly describe each with a pointer to the types of algorithms used.
– These descriptions underlie several different fields such as physics, chemistry, environmental modeling and climatology -- indeed any field that studies the physical world from a reasonably fundamental point of view.
– For instance, they directly underlie weather prediction, as this is phrased in terms of properties of the atmosphere.
– However, if you simulate a chemical plant, you would not phrase this directly in terms of atomic properties but rather in terms of phenomenological macroscopic artifacts: "pipes", "valves", "machines", "people" etc. (today several biology simulations are of this phenomenological type)
• General Relativity and Quantum Gravity
– These describe space-time at the ultimate level but are not needed in practical real-world calculations. There are important academic computations studying these descriptions of matter.

Quantum Physics and Examples of Use of Computation
• This is the fundamental description of the microscopic world. You would in principle use it to describe everything, but that is both unnecessary and too difficult, computationally and analytically.
• Quantum physics problems are typified by Quantum Chromodynamics (QCD) calculations, and these end up looking identical to statistical physics problems numerically. There are also some chemistry problems where quantum effects are important. These give rise to several types of algorithms:
– Solution of Schrödinger's equation (a partial differential equation). This can only be done exactly for simple 2 to 4 particle systems.
– Formulation of a large matrix whose rows and columns are the distinct states of the system, followed by typical matrix operations (diagonalization, multiplication, inversion).
– Statistical methods, which can be thought of as Monte Carlo evaluation of the integrals obtained in an integral equation formulation of the problem.
• The resulting problem structures are Grids (QCD) or Matrices.
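To make the matrix formulation concrete, here is a minimal sketch: it invents a small symmetric "Hamiltonian" with nearest-neighbor couplings (the size and entries are purely illustrative, not taken from any particular physics problem) and diagonalizes it. Real problems generate far larger, often sparse matrices and parallelize over matrix elements.

```python
import numpy as np

# Hypothetical example: a small Hamiltonian whose rows and columns label
# the distinct states of a quantum system. The diagonal holds invented
# "state energies"; the off-diagonals couple neighboring states.
n = 8
H = np.diag(np.arange(n, dtype=float))
H += np.diag(-0.5 * np.ones(n - 1), k=1)
H += np.diag(-0.5 * np.ones(n - 1), k=-1)

# Diagonalization: eigenvalues are the energy levels, eigenvectors the
# stationary states. Large sparse cases would use iterative (Lanczos-
# style) solvers and distribute the matrix over processors.
energies, states = np.linalg.eigh(H)
print("lowest three energy levels:", energies[:3])
```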
Particle Dynamics and Examples of Use of Computation
• Quantum effects are only important at small distances (10^-13 cm for the so-called strong or nuclear forces, 10^-8 cm for electromagnetically interacting particles).
• Often these short-distance effects are unimportant and it is sufficient to treat the physics classically. Then all matter is made up of particles, selected from a set of atoms (electrons etc.).
• The best known problems of this type come from biochemistry. Here we study biologically interesting proteins, which are made up of some 10,000 to 100,000 atoms. We hope to understand the chemical basis of life or, more practically, find which proteins are potentially interesting drugs.
• Particles each obey Newton's laws, and the study of proteins generalizes the numerical formulation of the study of the solar system, where the sun and planets are evolved in time as determined by gravity's force law.

Particle Dynamics and Example of Astrophysics
• Astrophysics has several important particle dynamics problems where the particles are not atoms but rather stars, clusters of stars, galaxies or clusters of galaxies.
• The numerical algorithm is similar, but there is an important new approach because we have a lot of particles (currently over N = 10^7) and all particles interact with each other.
• This naively has a computational complexity of O(N^2) at each time step, but clever numerical methods reduce it to O(N) or O(N log N).
• Physics problems addressed include:
– Evolution of the early universe into the structure seen today
– Why are galaxies spiral?
– What happens when galaxies collide?
– What makes globular clusters (with O(10^6) stars) the way they are?

Statistical Physics and Comparison of Monte Carlo and Particle Dynamics
• Large systems reach equilibrium, and ensemble properties (temperature, pressure, specific heat, ...) can be found statistically. This is essentially the law of large numbers (central limit theorem).
• The resulting approach moves particles "randomly" according to some probability and NOT deterministically as in Newton's laws.
• Many properties of particle systems can be calculated either by Monte Carlo or by particle dynamics. Monte Carlo is harder to parallelize, as one cannot evolve the particles independently.
• This can lead to (soluble!) difficulties in parallel algorithms, as the lack of independence implies synchronization issues.
• Many quantum systems are treated just like statistical physics, as quantum theory is built on probability densities.

Continuum Physics as an approximation to Particle Dynamics
• Replace the particle description by averages. 10^23 molecules in a molar volume is too many to handle numerically, so divide the full system into a large number of "small" volumes dV such that:
– Macroscopic properties (temperature, velocity, pressure) are essentially constant within each volume.
• In principle, use statistical physics (or particle dynamics averaged as "transport equations") to describe the volume dV in terms of macroscopic (ensemble) properties of that volume.
• The volume dV must be small enough that the macroscopic properties are indeed constant, and large enough that one can average over the molecular motion to define those properties.
– As a typical molecule is 10^-8 cm in linear dimension, these constraints are not hard to meet.
– The approximation breaks down sometimes, e.g. at leading edges during shuttle reentry. Then one augments the continuum approach (computational fluid dynamics) with explicit particle methods.

Computational Fluid Dynamics
• Computational Fluid Dynamics (CFD) is the dominant numerical field for continuum physics.
• There is a set of partial differential equations which cover:
– liquids, including blood, oil etc.
– gases, including airflow over wings and weather
• We apply computational "fluid" dynamics most often to the gas air. Gases are really particles:
• With a small number (< 10^6) of particles, use "molecular dynamics"; with a large number (10^23), use computational fluid dynamics.
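The O(N^2) structure of the particle dynamics slides above is easy to see in code. Below is a minimal sketch of one leapfrog time step for N gravitating bodies (with G = 1 and an invented softening parameter and step size, chosen only for illustration); a production astrophysics code would replace the all-pairs sum with an O(N log N) tree or O(N) fast multipole method.

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """Direct O(N^2) sum of Newtonian gravity (G = 1), softened by eps."""
    diff = pos[None, :, :] - pos[:, None, :]          # r_j - r_i for all pairs
    dist3 = (np.sum(diff**2, axis=-1) + eps**2) ** 1.5
    np.fill_diagonal(dist3, np.inf)                   # no self-force
    return np.sum(mass[None, :, None] * diff / dist3[:, :, None], axis=1)

def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick step of Newton's laws for all particles."""
    vel += 0.5 * dt * accelerations(pos, mass)
    pos += dt * vel
    vel += 0.5 * dt * accelerations(pos, mass)
    return pos, vel

# Illustrative run: 1000 random bodies evolved for a few steps.
rng = np.random.default_rng(0)
N = 1000
pos, vel = rng.normal(size=(N, 3)), np.zeros((N, 3))
mass = np.full(N, 1.0 / N)
for _ in range(10):
    pos, vel = leapfrog_step(pos, vel, mass, dt=0.01)
```

The same loop structure, with interatomic rather than gravitational forces, is the small-particle-count molecular dynamics case mentioned in the CFD slide.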
Computational Sweet Spots
• A given application needs a certain computer performance to support a given style of computation.
• In 1980 we had a few megaflops (10^6 floating point operations/sec), and this allowed simple two-dimensional continuum physics simulations.
• Now in 2005, we "routinely" have a few teraflops of peak performance, which allows three-dimensional continuum physics simulations.
• However, some areas need much larger computational power and haven't reached "their sweet spot":
– Some computations in nuclear and particle physics are like this.
– One can study the properties of particles with today's computers, but the scattering of two particles appears to require complexity 10^9 x 10^9.
• In some areas there are two sweet spots:
– a low performance sweet spot for a "phenomenological model";
– if you go to a "fundamental description", one needs far more computer power than is available today.
– Biology is of this type.

What needs to be Solved?
• A set of particles or things (cells in biology, transistors in circuit simulation):
– Solve coupled ordinary differential equations.
– There are lots of "things" to decompose over for parallelism.
• One or more fields which are functions of space and time (continuum physics):
– Discretize space and time and define the fields on grid points spread over the domain (see the Jacobi sketch below).
– Parallelize over grid points.
• Matrices which may need to be diagonalized to find eigenvectors and eigenvalues:
– Quantum physics
– Mode analysis -- principal components
– Parallelize over matrix elements.
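For the field case above (and the Laplace example referred to throughout these slides), the canonical kernel is a nearest-neighbor sweep over grid points. A minimal sketch of one Jacobi iteration for Laplace's equation on a 2D grid follows; the grid size and boundary values are arbitrary choices, and a parallel version would decompose the grid into blocks of points with halo exchange at the block edges.

```python
import numpy as np

# Hypothetical setup: a 2D grid with the field fixed on the boundary.
n = 64
phi = np.zeros((n, n))
phi[0, :] = 1.0          # arbitrary boundary condition for illustration

def jacobi_sweep(phi):
    """One Jacobi iteration for Laplace's equation: each interior grid
    point is replaced by the average of its four nearest neighbors.
    Parallelism comes from decomposing the grid points over processors."""
    new = phi.copy()
    new[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                              phi[1:-1, :-2] + phi[1:-1, 2:])
    return new

for _ in range(500):
    phi = jacobi_sweep(phi)
```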
Classes of Physical Simulations
• Mathematical (numerical) formulations of simulations fall into a few key classes, which have their own distinctive algorithmic and parallelism issues.
• The most common formalism is that of a field theory, where the quantities of interest are represented by densities defined over a 1, 2, 3 or 4 dimensional space.
– Such a description could be "fundamental", as in electromagnetism or relativity for the gravitational field, or "approximate", as in CFD where a fluid density averages over a particle description.
– Our Laplace example is of this form, where the field could be either fundamental (as in electrostatics) or approximate (if it comes from the Euler equations of CFD).

Applications reducing to a Coupled Set of Ordinary Differential Equations
• Another set of models of physical systems represents them as a coupled set of discrete entities evolving over time.
– Instead of a field φ(x,t) one gets values φi(t) labeled by a discrete index i.
– Discretizing x in the continuous case leads to the discrete case, but in many cases the discrete formulation is fundamental.
• Within the coupled discrete system class, there are two important approaches:
– Classic time-stepped simulations -- loop over all i at fixed t, updating each φi(t) to φi(t+δt).
– Discrete event simulations -- loop over all events representing changes of state of the φi(t).

Particle Dynamics or Equivalent Problems
• Particles are sets of entities -- sometimes fixed (atoms in a crystal), sometimes moving (galaxies in a universe).
• They are characterized by a force Fij on particle i due to particle j.
• Forces are characterized by their range r: Fij(xi,xj) is zero if the distance |xi - xj| is greater than r.
• Examples:
– The universe
– A globular star cluster
– The atoms in a crystal vibrating under interatomic forces
– Molecules in a protein rotating and flexing under interatomic forces
• The laws of motion are typically ordinary differential equations.
– Ordinary means differentiation with respect to one variable -- typically time.

Classes of Particle Problems
• If the range r is small (as in a crystal), then one gets numerical formulations and parallel computing considerations similar to those in the Laplace example, with local communication.
– We showed in the Laplace module that efficiency increases as the range of the force increases.
• If r is infinite (no cut-off for the force), as in the gravitational problem, one finds rather different issues, which we discuss in this module.
• There are several "non-particle" problems discussed later that reduce to the long-range force problem, characterized by every entity interacting with every other entity.
– Characterized by a calculation where updating entity i involves all other entities j.

Circuit Simulations I
• An electrical or electronic network has the same structure as a particle problem, where the "particles" are components (transistor, resistance, inductance etc.) and the "force" between components i and j is nonzero if and only if i and j are linked in the circuit.
– For simulations of electrical transmission networks (the electrical grid), one would naturally use classic time-stepped simulation, updating each component i from its state at time t to its state at time t+δt (a sketch follows this slide).
• If one is simulating something like a chip, then the time-stepped approach is very wasteful, as 99.99% of the components are doing nothing (i.e. remain in the same state) at any given time step!
– Here is where discrete event simulations (DES) are useful, as one only computes where the action is.
• Biological simulations are often formulated as networks where each component (say a neuron or a cell) is described by an ODE and the network couples the components.
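A time-stepped simulation of a coupled network like those above can be sketched as follows. The network, the coupling law and the time step are all invented for illustration: the point is only the structure, where every component i is updated each step from the states of the components linked to it.

```python
import numpy as np

# Hypothetical network: 100 components, each linked to 3 random others.
rng = np.random.default_rng(1)
n = 100
links = [rng.choice(n, size=3, replace=False) for _ in range(n)]
state = rng.normal(size=n)

def time_step(state, dt=0.01):
    """Classic time-stepped update: every component i advances from t to
    t + dt using the states of its linked components j (an illustrative
    linear relaxation law, standing in for real circuit equations)."""
    new = state.copy()
    for i in range(n):
        coupling = np.sum(state[links[i]] - state[i])
        new[i] = state[i] + dt * coupling
    return new

for _ in range(1000):
    state = time_step(state)
```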
Circuit Simulations II
• Discrete event simulations are clearly preferable on sequential machines, but parallel algorithms are hard due to the need for dynamic load balancing (events are dynamic and not uniform throughout the system) and synchronization (which events can be executed in parallel?).
• There are several important approaches to DES, of which the best known is the Time Warp method originally proposed by David Jefferson -- here one optimistically executes events in parallel and rolls back to an earlier state if this is found to be inconsistent.
• Conservative methods (only execute those events you are certain cannot be impacted by earlier events) have little parallelism -- e.g. there is only one event with the lowest global time.
• DES do not exhibit the classic loosely synchronous compute-communicate structure, as there is no uniform global time -- typically, even with Time Warp, there is no scalable parallelism.

Discrete Event Simulations
• Suppose we try to execute in parallel events E1 and E2 at times t1 and t2 with t1 < t2.
• Consider the timelines of several (4) objects in the system and our two events E1 and E2.
• If E1 generates no interfering events, or generates one (E*12) at a time greater than t2, then our parallel execution of E2 is consistent.
• However, if E1 generates an event E12 before t2, then the execution of E2 has to be rolled back and E12 executed first.
(Figure: timelines of four objects in the system, showing events E1, E11, E12, E21, E22, E2 and E*12 along a time axis.)
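A minimal sequential discrete event loop makes the consistency condition above concrete. The sketch below (the event model and all names are invented) keeps a priority queue of pending events ordered by simulation time; executing an event may schedule new events, and correctness requires that nothing already executed carried a later timestamp -- exactly the condition a parallel Time Warp implementation checks optimistically and, on failure, repairs by rollback.

```python
import heapq

# Hypothetical discrete event simulation: events are (time, object_id).
pending = [(0.0, 0)]          # priority queue ordered by event time
last_time = {}                # last event time seen by each object

def execute(time, obj):
    """Apply the event to obj; here we just record it and (sometimes)
    schedule one follow-on event, standing in for real state changes."""
    last_time[obj] = time
    if time < 10.0 and obj < 3:
        # e.g. E1 generating E12 for another object at a later time
        heapq.heappush(pending, (time + 1.5, obj + 1))

while pending:
    time, obj = heapq.heappop(pending)
    # Sequential execution in global time order is always consistent.
    # A parallel (Time Warp) version would execute optimistically and
    # roll back obj if an event with an earlier timestamp arrived later.
    assert last_time.get(obj, -1.0) <= time
    execute(time, obj)
```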
Matrices and Graphs I
• Especially in cases where the "force" is linear in the φi(t), it is convenient to think of the force as being specified by a matrix M whose elements mij are nonzero if and only if the force between i and j is nonzero. A typical force law is:
Fi = Σj mij φj(t)
• In the Laplace equation example, the matrix M is sparse (most elements are zero), and this is an especially common case for which one can and needs to develop efficient algorithms.
• We discuss in another talk the matrix formulation in the case of partial differential equation solvers.

Matrices and Graphs II
• Another way of looking at these problems is as graphs G, where the nodes of the graph are labeled by the particles i, and one has an edge linking i to j if and only if the force Fij is nonzero.
• In this language, long-range force problems correspond to a dense matrix M (all elements nonzero) and a fully connected graph G.
(Figure: an example graph with 12 numbered nodes; edges link pairs with nonzero force.)

Other N-Body Like Problems - I
• The characteristic structure of an N-body problem is an observable that depends on all pairs of entities from a set of N entities.
• This structure is seen in diverse applications:
• 1) Look at a database of items and calculate some form of correlation between all pairs of database entries.
• 2) This was first used in studies of measurements of a "chaotic dynamical system" with points xi which are vectors of length m.
– Put rij = distance between xi and xj in the m-dimensional space. Then the probability p(rij = r) is proportional to r^(d-1), where d (not equal to m) is the dynamical dimension of the system.
– Calculate this by forming all the rij (for i and j running over the observable points from our system -- usually a time series) and accumulating them in a histogram of bins in r.
– Parallel algorithm in a nutshell: store the histogram replicated in all processors, distribute the vectors equally over the processors, then pipeline the xj through the processors, accumulating the rij as they pass through; add the histograms together at the end.

Other N-Body Like Problems - II
• 3) The Green's function approach to simple partial differential equations gives solutions as integrals of known Green's functions times "source" or "boundary" terms.
– For the simulation of earthquakes in the GEM project, the source terms are strains in the faults, and the stress in any fault segment is an integral over the strains in all other segments.
– Compared to particle dynamics, the force law is replaced by the Green's function, but in each case the total stress/force is a sum over contributions associated with the other entities in the formulation.
• 4) In the so-called vortex method in CFD (Computational Fluid Dynamics), one models the Navier-Stokes equations via the long-range interactions between entities which are the vortices.
• 5) Chemistry uses molecular dynamics, so the particles are molecules, but the force law is usually not gravity but rather Van der Waals forces, which are long range but fall off faster than 1/r^2.

Chapters 5-8 of Sourcebook
• Chapters 5-8 are the main application section of this book!
• The Sourcebook of Parallel Computing, edited by Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, Andy White; October 2002, 760 pages, ISBN 1-55860-871-0, Morgan Kaufmann Publishers. http://www.mkp.com/books_catalog/catalog.asp?ISBN=1-55860-871-0

Computational Fluid Dynamics (CFD) in Chapter 5 I
• This chapter provides a thorough formulation of CFD, with a general discussion of the importance of non-linear terms and, most importantly, viscosity.
• Difficult features like shockwaves and turbulence can be traced to the small coefficient of the highest order derivatives.
• Incompressible flow is approached using the spectral element method, which combines the features of finite elements (coping with complex geometries) and highly accurate approximations within each element.
• These problems need fast solvers for elliptic equations, and there is a detailed discussion of data and matrix structure and the use of iterative conjugate gradient methods.
• This is compared with direct solvers using the static condensation method for calculating the solution (stiffness) matrix.

Computational Fluid Dynamics (CFD) in Chapter 5 II
• The generally important problem of adaptive meshes is described using the successive-refinement quad/oct-tree (in two/three dimensions) method.
• Compressible flow methods are reviewed, and the key problem of coping with the rapid change in field variables at shockwaves is identified.
• One uses a lower order approximation near a shock but preserves the most powerful high order spectral methods in the areas where the flow is smooth.
• Parallel computing (using space filling curves for decomposition) and adaptive meshes are covered.

Space filling curve
(Figure: a space filling curve winding through a two-dimensional mesh; cutting the curve into equal pieces decomposes the mesh among processors.)
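Space-filling-curve decomposition, used here and again in the environment and ocean discussions below, can be sketched with the simplest such curve, the Morton (Z-order) curve; a Hilbert curve as typically drawn has better locality but fiddlier code. Everything below (grid size, processor count) is an illustrative assumption.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of the (ix, iy) cell coordinates to get the
    cell's position along a Z-order space filling curve."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b + 1)
        key |= ((iy >> b) & 1) << (2 * b)
    return key

# Decompose a 2D grid of cells over processors: sort the cells along
# the curve, then cut the curve into equal contiguous pieces.
n, nproc = 32, 4
cells = sorted(((ix, iy) for ix in range(n) for iy in range(n)),
               key=lambda c: morton_key(*c))
chunk = len(cells) // nproc
owner = {cell: min(i // chunk, nproc - 1) for i, cell in enumerate(cells)}
```

Because nearby cells tend to be nearby on the curve, each contiguous piece is spatially compact, which balances load while keeping most grid-neighbor communication local.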
Environment and Energy in Chapter 6 I
• This article describes three distinct problem areas, each illustrating important general approaches.
• Subsurface flow in porous media is needed in both oil reservoir simulations and environmental pollution studies.
– The nearly hyperbolic or parabolic flow equations are characterized by multiple constituents and by very heterogeneous media with possible abrupt discontinuities in the physical domain.
– This motivates the use of domain decomposition methods, where the full region is divided into blocks which can use different solution methods if necessary.
– The blocks must be iteratively reconciled at their boundaries (mortar spaces).
– The IPARS code described has been successfully integrated into two powerful problem-solving environments: NetSolve, described in chapter 14, and DISCOVER (aimed especially at interactive steering) from Rutgers University.

Environment and Energy in Chapter 6 II
• The discussion of the shallow water problem uses a method involving implicit (in the vertical direction) and explicit (in the horizontal plane) time-marching methods.
• It is instructive to see that good parallel performance is obtained by only decomposing in the horizontal directions and keeping the hard-to-parallelize implicit algorithm sequentially implemented.
• The irregular mesh was tackled using space filling curves, as also described in chapter 5.
• Finally, important code coupling (meta-problem in chapter 4 notation) issues are discussed for oil spill simulations, where water and chemical transport need to be modeled in a linked fashion.
• ADR (Active Data Repository) technology from Maryland is used to link the computations between the water and chemical simulations. Sophisticated filtering is needed to match the output and input needs of the two subsystems.

Molecular Quantum Chemistry in Chapter 7 I
• This article surveys in detail two capabilities of the NWChem package from Pacific Northwest Laboratory, and surveys other aspects of computational chemistry.
• This field makes extensive use of particle dynamics algorithms and some use of partial differential equation solvers.
• Characteristic of computational chemistry, however, is the importance of matrix-based methods, and these are the focus of this chapter. The matrix is the Hamiltonian (energy) and is typically symmetric positive definite.
• In a quantum approach, the eigensystem of this matrix gives the equilibrium states of the molecule being studied. This type of problem is characteristic of quantum-theoretical methods in physics and chemistry; particle dynamics is used in classical, non-quantum regimes.

Molecular Quantum Chemistry in Chapter 7 II
• NWChem uses a software approach -- the Global Array (GA) toolkit -- whose programming model lies in between those of HPF and message passing and has been highly successful.
• GA exposes locality to the programmer but provides a shared memory programming model for accessing data stored in remote processors.
• Interestingly, in many cases calculating the matrix elements dominates (over solving for eigenfunctions), and this is a pleasingly parallel task.
• This task requires very careful blocking and staging of the components used to calculate the integrals forming the matrix elements.
• In some approaches, parallel matrix multiplication is important in generating the matrices.
• The matrices are typically taken as full, and very powerful parallel eigensolvers were developed for this problem.
• This area of science clearly shows the benefit of linear algebra libraries (see chapter 20) and general performance enhancements like blocking.
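Blocking, named above as a key performance enhancement, can be illustrated with a cache-blocked matrix multiply. The block and matrix sizes below are arbitrary illustrative choices, and real codes would of course call a tuned BLAS rather than write the loops by hand.

```python
import numpy as np

def blocked_matmul(A, B, bs=64):
    """C = A @ B computed block by block. Working on bs x bs tiles keeps
    each tile resident in cache while it is reused -- the same idea that
    blocked/staged matrix-element generation exploits at larger scale."""
    n, k = A.shape
    m = B.shape[1]
    C = np.zeros((n, m))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C

rng = np.random.default_rng(2)
A, B = rng.normal(size=(256, 256)), rng.normal(size=(256, 256))
assert np.allclose(blocked_matmul(A, B), A @ B)
```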
General Relativity
• This field evolves in time complex partial differential equations which have some similarities with the simpler Maxwell equations used in electromagnetics (Sec. 8.6).
• Key difficulties are the boundary conditions, which are outgoing waves at infinity, and the difficult and unique multiple black hole surface conditions internally.
• Finite differences and adaptive meshes are the usual approach.

Lattice Quantum Chromodynamics (QCD) and Monte Carlo Methods I
• Monte Carlo methods are central to the numerical approaches to many fields (especially in physics and chemistry) and by their nature can take substantial computing resources.
• Note that the error in the computation only decreases like the square root of the computer time used, compared to the power-law convergence of most differential equation and particle dynamics based methods.
• One finds Monte Carlo methods when problems are posed as integral equations, and the often high-dimensional integrals are evaluated by Monte Carlo using a randomly distributed set of integration points.
• The Quantum Chromodynamics (QCD) simulations described in this subsection are a classic example of large-scale Monte Carlo simulations, which perform excellently on most parallel machines due to modest communication costs and a regular structure leading to good node performance.

Errors in Numerical Integration
• For an integral with N points:
– Monte Carlo has error proportional to 1/N^0.5
– The iterated trapezoidal rule has error 1/N^2
– Iterated Simpson's rule has error 1/N^4
– Iterated Gaussian quadrature has error 1/N^2m for a basic integration scheme with m points
• But in d dimensions, all methods except Monte Carlo must set up a grid of N^(1/d) points on a side; that hardly works above d = 3.
– The Monte Carlo error is still 1/N^0.5
– The Simpson error becomes 1/N^(4/d), etc.

Monte Carlo Convergence
• In the homework, for N = 10,000,000 one finds errors in π of around 10^-6 using Simpson's rule.
• This is a combination of rounding error (when a computer does floating point arithmetic, it is inevitably approximate) and the error from the formula, which is proportional to N^-4.
• For Monte Carlo, the error will be about 1.0/N^0.5.
• So an error of 10^-6 requires N = 10^12, i.e. N = 1,000,000,000,000 (100,000 times more than Simpson's rule).
• One doesn't use Monte Carlo to get such precise results!
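These scalings are easy to check numerically. Here is a minimal sketch comparing Simpson's rule and Monte Carlo on the one-dimensional integral of 4/(1+x^2) over [0,1], whose exact value is π; the sample sizes are illustrative.

```python
import numpy as np

f = lambda x: 4.0 / (1.0 + x * x)   # integral of f over [0,1] equals pi

def simpson(n):
    """Iterated Simpson's rule with n (even) intervals: error ~ 1/n^4."""
    x = np.linspace(0.0, 1.0, n + 1)
    w = np.ones(n + 1)
    w[1:-1:2] = 4.0          # odd interior points
    w[2:-1:2] = 2.0          # even interior points
    return np.sum(w * f(x)) / (3.0 * n)

def monte_carlo(n, seed=0):
    """Monte Carlo with n random points: error ~ 1/sqrt(n)."""
    x = np.random.default_rng(seed).random(n)
    return np.mean(f(x))

for n in (100, 10_000, 1_000_000):
    print(n, abs(simpson(n) - np.pi), abs(monte_carlo(n) - np.pi))
```

In one dimension Simpson's rule wins overwhelmingly; the point of the slides above is that in d dimensions its error degrades to 1/N^(4/d) while the Monte Carlo error stays 1/N^0.5, which is why Monte Carlo dominates for the high-dimensional integrals of statistical physics and QCD.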
Lattice Quantum Chromodynamics (QCD) and Monte Carlo Methods II
• This application is straightforward to parallelize and very suitable for HPF, as the basic data structure is an array. However, the work described here uses a portable MPI code.
• Section 8.9 describes some new Monte Carlo algorithms, but QCD advances typically come from new physics insights allowing more efficient numerical formulations.
• This field has generated many special purpose facilities, as the lack of significant I/O and the CPU-intensive nature of QCD allow optimized node designs. The work at Columbia and Tsukuba universities is well known.
• There are other important irregular-geometry Monte Carlo problems, and they see many of the same issues, such as the adaptive load balancing seen in irregular finite element problems.

Ocean Modeling
• This describes the issues encountered in optimizing a whole-earth ocean simulation, including realistic geography and proper ocean-atmosphere boundaries.
• Conjugate gradient solvers and MPI message passing with Fortran 90 are used for the parallel implicit solver for the vertically averaged flow.

Tsunami Simulations
• These are still very preliminary; an area where much more work could be done.

Multidisciplinary Simulations
• Oceans naturally couple to the atmosphere, and the atmosphere couples to the environment, including:
– Deforestation
– Emissions from burning gasoline (fossil fuels)
– Conversely, the atmosphere makes lakes acidic, etc.
• These couplings are not trivial, as the components have very different timescales.

Earthquake Simulations
• Earthquake simulation is a relatively young field, and it is not known how far it can go in forecasting large earthquakes.
• The field has an increasing amount of real-time sensor data, which needs data assimilation techniques and automatic differentiation tools such as those of chapter 24.
• Study of earthquake faults can use finite element techniques or, with some approximation, Green's function approaches, which can use fast multipole methods.
• Analysis of observational and simulation data needs the data mining methods described in subsections 8.7 and 8.8.
• The principal component and hidden Markov classification algorithms currently used in the earthquake field illustrate the diversity of data mining methods when compared to the decision tree methods of section 8.7.
• Most uses of parallel computing are still pleasingly parallel.

Status of the Real-Time Earthquake Forecast Experiment (Original Version)
(JB Rundle et al., PNAS, v99, Suppl. 1, 2514-2521, Feb 19, 2002; KF Tiampo et al., Europhys. Lett., 60, 481-487, 2002; JB Rundle et al., Rev. Geophys. Space Phys., 41(4), DOI 10.1029/2003RG000135, 2003. http://quakesim.jpl.nasa.gov)
• Published February 19, 2002 in Proceedings of the National Academy of Sciences, USA (composite N-S California catalog; decision thresholds 10^-4 and 10^-3 shown).
• Eighteen significant earthquakes (blue circles) have since occurred in Central or Southern California:
– After the work was completed: 1. Big Bear I, M = 5.1, Feb 10, 2001; 2. Coso, M = 5.1, July 17, 2001
– After the paper was in press (September 1, 2001): 3. Anza I, M = 5.1, Oct 31, 2001
– After the paper was published (February 19, 2002): 4. Baja, M = 5.7, Feb 22, 2002; 5. Gilroy, M = 4.9-5.1, May 13, 2002; 6. Big Bear II, M = 5.4, Feb 22, 2003; 7. San Simeon, M = 6.5, Dec 22, 2003; 8. San Clemente Island, M = 5.2, June 15, 2004; 9. Bodie I, M = 5.5, Sept 18, 2004; 10. Bodie II, M = 5.4, Sept 18, 2004; 11. Parkfield I, M = 6.0, Sept 28, 2004; 12. Parkfield II, M = 5.2, Sept 29, 2004; 13. Arvin, M = 5.0, Sept 29, 2004; 14. Parkfield III, M = 5.0, Sept 30, 2004; 15. Wheeler Ridge, M = 5.2, April 16, 2005; 16. Anza II, M = 5.2, June 12, 2005; 17. Yucaipa, M = 4.9-5.2, June 16, 2005; 18. Obsidian Butte, M = 5.1, Sept 2, 2005
• Note: this original forecast was made using both the full Southern California catalog (south of latitude 36°) and the full Northern California catalog (north of 36°). No corrections were applied for the different event statistics in the two catalogs. The margin of error of the anomalies is +/- 11 km.
(Figure: plot of log(seismic potential), showing the increase in potential for significant earthquakes, ~2000 to 2010; green triangles mark locations of large earthquakes (M ≥ 5.0) between Jan 1, 1990 and Dec 31, 1999. CL#03-2015)
(Figure: forecasting world-wide earthquakes, 2000-2010: world-wide seismicity with circles representing large-magnitude events from the ANSS catalog, January 1, 2000 to the present. UC Davis group led by John Rundle.)

Cosmological Structure Formation (CSF)
• CSF is an example of a coupled particle-field problem.
• Here the universe is viewed as a set of particles which generate a gravitational field obeying Poisson's equation.
• The field then determines the force needed to evolve each particle in time. This structure is also seen in plasma physics, where electrons create an electromagnetic field.
• It is hard to generate compatible particle and field decompositions. CSF exhibits large ranges in distance and temporal scale, characteristic of the attractive gravitational force.
• Poisson's equation is solved by fast Fourier transforms, and deeply adaptive meshes are generated.
• The article describes both MPI and CMFortran (HPF-like) implementations.
• Further, it made use of object-oriented techniques (chapter 13) with kernels in F77. Some approaches to this problem class use fast multipole methods. (A sketch of the FFT-based Poisson solve follows the next slide.)

Cosmological Structure Formation (CSF)
• There is a lot of structure in the universe.
(Figure: images of cosmological structure.)
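The particle-mesh idea referred to above can be sketched in its simplest periodic form: deposit particle mass on a grid, solve Poisson's equation with FFTs, and difference the potential to get forces. Everything here (grid size, particle count, unit conventions) is an illustrative assumption, not the scheme of any particular code.

```python
import numpy as np

# Hypothetical periodic particle-mesh step: N particles on an n x n grid.
rng = np.random.default_rng(3)
n, N = 64, 5000
pos = rng.random((N, 2)) * n

# 1) Deposit particle mass onto the grid (nearest grid point, for brevity).
density = np.zeros((n, n))
ij = pos.astype(int) % n
np.add.at(density, (ij[:, 0], ij[:, 1]), 1.0)

# 2) Solve Poisson's equation laplacian(phi) = density with FFTs: in
#    Fourier space the equation becomes -k^2 * phi_k = rho_k.
k = 2.0 * np.pi * np.fft.fftfreq(n)
k2 = k[:, None] ** 2 + k[None, :] ** 2
k2[0, 0] = 1.0                        # avoid dividing by zero (mean mode)
phi_k = -np.fft.fft2(density) / k2
phi_k[0, 0] = 0.0
phi = np.real(np.fft.ifft2(phi_k))

# 3) Forces from centered differences of the potential, read back at each
#    particle's grid cell; these would then advance the particles in time.
fx = -(np.roll(phi, -1, 0) - np.roll(phi, 1, 0)) / 2.0
fy = -(np.roll(phi, -1, 1) - np.roll(phi, 1, 1)) / 2.0
forces = np.stack([fx[ij[:, 0], ij[:, 1]], fy[ij[:, 0], ij[:, 1]]], axis=1)
```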
Computational Electromagnetics (CEM)
• This overview summarizes several different approaches to electromagnetic simulation and notes the growing importance of coupling electromagnetics with other disciplines such as aerodynamics and chemical physics.
• Parallel computing has been successfully applied to the three major approaches to CEM.
• Asymptotic methods use ray tracing, as seen in visualization. Frequency domain methods use moment (spectral) expansions; these were among the earliest uses of large parallel full matrix solvers 10 to 15 years ago and have now switched to the fast multipole approach.
• Finally, time-domain methods use finite volume (element) methods with an unstructured mesh. As in general relativity, special attention is needed to get accurate wave solutions at infinity in the time-domain approach.

Data mining
• Data mining is a broad field with many different applications and algorithms (see also sections 8.4 and 8.8).
• This article describes important algorithms used, for example, in discovering associations between items that are likely to be purchased by the same customer; these associations could occur either in time or because the purchases tended to be in the same shopping basket.
• Other data mining problems discussed include the classification problem, tackled by decision trees.
• These tree-based approaches parallelize effectively (as they are based on huge transaction databases), with load balance being a difficult issue. (A pair-counting sketch of association discovery follows this slide.)
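Association discovery as described above starts from co-occurrence counts over baskets. The sketch below counts, for a toy transaction list invented for illustration, how often each pair of items appears in the same basket -- the all-pairs structure that makes the decomposition and load balance interesting at transaction-database scale.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction database: each basket is a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "beer", "eggs"},
    {"milk", "beer", "bread"},
    {"milk", "eggs"},
]

# Count how often each pair of items is bought together; pairs whose
# count exceeds a support threshold become candidate association rules.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

support = 2
frequent = {p: c for p, c in pair_counts.items() if c >= support}
print(frequent)   # e.g. ('bread', 'milk') appears in 2 baskets
```

A parallel version would partition the transaction database over processors, count locally, and merge the counters, which is why load balance rather than communication dominates the design.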
Signal and Image Processing
• This samples some of the issues from a field which currently makes surprisingly little use of parallel computing, even though good parallel algorithms often exist.
• The field has preferred the convenient programming model and interactive feedback of systems like MATLAB and Khoros.
• These are problem solving environments, as described in chapter 14 of the Sourcebook.

Monte Carlo Methods and Financial Modeling I
• Subsection 8.2 introduces Monte Carlo methods, and this subsection describes some very important developments in the generation of "random" numbers.
• Quasirandom numbers (QRNs) are more uniformly distributed than standard truly random numbers and, for certain integrals, lead to more rapid convergence.
• In particular, these methods have been applied to financial modeling, where one needs to calculate one or more functions (stock prices, their derivatives, or other financial instruments) at some future time by integrating over the possible future values of the underlying variables.
• These future values are given by models based on the past behavior of the stock.

Monte Carlo Methods and Financial Modeling II
• This past behavior can be captured in some cases by the volatility, or standard deviation, of the stock.
• The simplest model is perhaps the Black-Scholes equation, which can be derived from a Gaussian stock distribution combined with an underlying "no-arbitrage" assumption. This asserts that the stock market is always instantaneously in equilibrium and there is no opportunity to make money by exploiting mismatches between buy and sell prices.
• In a physics language, the different players in the stock market form a heat bath which keeps the market in adiabatic equilibrium.
• There is a straightforward (to parallelize and implement) binomial method for predicting the probability distributions of financial instruments. However, Monte Carlo methods and QRNs are the most powerful approach.
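Here is a minimal sketch of the Monte Carlo approach just described, pricing a European call option under the Black-Scholes (geometric Brownian motion) model. All parameter values are invented for illustration, and a quasirandom (e.g. Sobol) sequence could replace the pseudorandom draws to improve convergence.

```python
import numpy as np

def mc_call_price(S0, K, r, sigma, T, n_paths, seed=0):
    """Monte Carlo price of a European call under geometric Brownian
    motion: simulate terminal stock prices, then average the discounted
    payoff. The error falls like 1/sqrt(n_paths); since the paths are
    independent, the computation is pleasingly parallel."""
    z = np.random.default_rng(seed).standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    payoff = np.maximum(ST - K, 0.0)
    return np.exp(-r * T) * payoff.mean()

# Illustrative parameters (not from the text): spot 100, strike 105,
# 5% interest rate, 20% volatility, one year to expiry.
print(mc_call_price(S0=100.0, K=105.0, r=0.05, sigma=0.2, T=1.0,
                    n_paths=1_000_000))
```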
Quasi Real-time Data Analysis of Photon Source Experiments
• This subsection describes a successful application of computational Grids to accelerate the data analysis of an accelerator experiment. It is an example that can be generalized to other cases.
• The accelerator (here a photon source at Argonne) data is passed in real time to a supercomputer where the analysis is performed. Multiple visualization and control stations are also connected to the Grid.

Forces Modeling and Simulation
• This subsection describes event-driven simulations which, as discussed in chapter 4, are very common in military applications.
• A distributed object approach called HLA (see chapter 13) is being used for modern problems of this class.
• Some run in "real time", with synchronization provided by the wall clock and by humans and machines in the loop.
• Other cases are run in "virtual time" in a more traditional standalone fashion.
• This article describes the integration of these military standards with Object Web ideas such as CORBA and .NET from Microsoft.
• One application simulated the interaction of vehicles with a million mines on a distributed Grid of computers.
– This work also parallelized the minefield simulator using threads (chapter 10).

Event Driven Simulations
• This is a graph-based model where independent objects issue events that travel as messages to other objects.
• Hard to parallelize, as there is no guarantee that an event will not arrive from the past in simulation time.
• Often run in "real time".
(Figure: three objects exchanging event messages at times t1 and t2.)

Industrial Strength Parallel Computing I
• Morgan Kaufmann publishes a book "Industrial Strength Parallel Computing" (ISPC), edited by Alice E. Koniges, which is complementary to our book as it has a major emphasis on application experience.
• As a guide for readers interested in further insight into which technologies are useful in which application areas, we give a brief summary of the application chapters of ISPC. We use CRPC to designate work in the Sourcebook and ISPC to denote work in "Industrial Strength Parallel Computing".
• Chapter 7 - Ocean Modeling and Visualization (Yi Chao, P. Peggy Li, Ping Wang, Daniel S. Katz, Benny N. Cheng, Scott Whitman) of ISPC: This uses a variant of the same ocean code described in section 8.4 of CRPC and describes both basic parallel strategies and the integration of the simulation with a parallel 3D volume renderer.
• Chapter 8 - Impact of Aircraft on Global Atmospheric Chemistry (Douglas A. Rotman, John R. Tannahill, Steven L. Baughcum) of ISPC: This discusses issues related to those in chapter 6 of CRPC in the context of estimating the impact of supersonic aircraft emissions on atmospheric chemistry. Task decomposition (code coupling) for different physics packages is combined with domain decomposition and parallel block data decomposition. Again one keeps the vertical direction in each processor and decomposes in the horizontal plane. Nontrivial technical problems are found in the polar regions due to decomposition singularities.
• Chapter 9 - Petroleum Reservoir Management (Michael DeLong, Allyson Gajraj, Wayne Joubert, Olaf Lubeck, James Sanderson, Robert E. Stephenson, Gautam S. Shiralkar, Bart van Bloemen Waanders) of ISPC: This addresses an application covered in chapter 6 of CRPC but focuses on a different code, Falcon, developed as a collaboration between Amoco and Los Alamos. As in other chapters of ISPC, detailed performance results are given; particularly interesting is the discussion of the sparse matrix solver (chapter 21 of CRPC). A very efficient parallel preconditioner for a fully implicit solver was developed based on the ILU (incomplete LU) approach. This rearranged the order of computation but faithfully preserved the sequential algorithm.
Industrial Strength Parallel Computing II
• Chapter 10 - An Architecture-Independent Navier-Stokes Code (Johnson C. T. Wang, Stephen Taylor) of ISPC: This describes the parallelization of a commercial code, ALSINS (from the Aerospace Corporation), which solves the Navier-Stokes equations (chapter 5 of CRPC) using finite difference methods in the Reynolds averaging approximation for turbulence. Domain decomposition (chapters 6 and 20 of CRPC) and MPI are used for the parallelism. The application studied involved flow over Delta and Titan launch rockets.
• Chapter 11 - Gaining Insights into the Flow in a Static Mixer (Olivier Byrde, Mark L. Sawley) of ISPC: This studies flow in commercial chemical mixers using the Reynolds-averaged Navier-Stokes equations and finite volume methods, as in chapter 10 of ISPC. Domain decomposition (chapters 6 and 20 of CRPC) of a block-structured code and PVM are used for the parallelism. The mixing study required a parallel study of particle trajectories in the calculated flow field.
• Chapter 12 - Modeling Groundwater Flow and Contaminant Transport (William J. Bosl, Steven F. Ashby, Chuck Baldwin, Robert D. Falgout, Steven G. Smith, Andrew F. B. Tompson) of ISPC: This presents a groundwater flow (chapter 6 of CRPC) code, ParFlow, that uses finite volume methods to generate the finite difference equations. A highlight is the detailed discussion of parallel multigrid (chapters 8.6, 12 and 21 of CRPC), which is used not as a complete solver but as a preconditioner for a conjugate gradient algorithm.
• Chapter 13 - Simulation of Plasma Reactors (Stephen Taylor, Marc Rieffel, Jerrell Watts, Sadasivan Shankar) of ISPC: This simulates the plasma reactors used in semiconductor manufacturing plants. The Direct Simulation Monte Carlo method is used to model the system in terms of locally interacting particles. Adaptive three-dimensional meshes (chapter 19 of CRPC) are used with a novel diffusive algorithm to control dynamic load balancing (chapter 18 of CRPC).

Industrial Strength Parallel Computing III
• Chapter 14 - Electron-Molecule Collisions for Plasma Modeling (Carl Winstead, Chuo-Han Lee, Vincent McKoy) of ISPC: This complements chapter 13 of ISPC by studying the fundamental particle interactions in plasma reactors. It is instructive to compare the discussion of the algorithm in this chapter with that of chapter 7 of CRPC. They reach similar conclusions, with chapter 7 naturally describing the issues more generally. Two steps -- calculation of matrix elements, and then a horde of matrix multiplications to transform basis sets -- dominate the computation. In this problem class, the matrix solver is not a computationally significant step.
• Chapter 15 - Three-Dimensional Plasma Particle-in-Cell Calculations of Ion Thruster Backflow Contamination (Robie I. Samanta Roy, Daniel E. Hastings, Stephen Taylor) of ISPC: This chapter studies contamination from spacecraft thruster exhaust using a three-dimensional particle-in-cell code. This involves a mix of solving Poisson's equation for the electrostatic field and evolving ions under the forces calculated from this field. There are algorithmic similarities to the astrophysics problems in CRPC section 8.6, but electromagnetic problems produce less extreme density concentrations than the purely attractive (and hence clumping) gravitational force found in astrophysics.
• Chapter 16 - Advanced Atomic-Level Materials Design (Lin H. Yang) of ISPC: This describes a Quantum Molecular Dynamics package implementing the well-known Car-Parrinello method. This is part of the NWChem package featured in chapter 7 of CRPC but not described in detail there. The computation is mostly dominated by 3D FFTs and basic BLAS (complex vector arithmetic) calls, but has significant I/O.
• Chapter 17 - Solving Symmetric Eigenvalue Problems (David C. O'Neal, Raghurama Reddy) of ISPC: This describes parallel eigenvalue determination, which is covered in section 7.4.3 and chapter 20 of CRPC.
• Chapter 18 - Nuclear Magnetic Resonance Simulations (Alan J. Benesi, Kenneth M. Merz, James J. Vincent, Ravi Subramanya) of ISPC: This is a pleasingly parallel computation of NMR spectra obtained by averaging over crystal orientation.
Industrial Strength Parallel Computing IV
• Chapter 19 - Molecular Dynamics Simulations Using Particle-Mesh Ewald Methods (Michael F. Crowley, David W. Deerfield II, Tom A. Darden, Thomas E. Cheatham III) of ISPC: This chapter discusses the parallelization of the widely used molecular dynamics code AMBER and its application to computational biology. Much of the discussion is devoted to implementing a particle-mesh method aimed at fast calculation of the long-range forces. Chapter 8.6 discusses this problem for astrophysical cases; the ISPC discussion focuses on the needed 3D FFT.
• Chapter 20 - Radar Scattering and Antenna Modeling (Tom Cwik, Cinzia Zuffada, Daniel S. Katz, Jay Parker) of ISPC: This article discusses a finite element formulation of computational electromagnetics (see section 8.7 of CRPC), which leads to a sparse matrix problem with multiple right-hand sides. The minimum residual iterative solver was used; this is similar to the conjugate gradient approach described extensively in the CRPC book (chapters 20 and 21 and many applications, especially chapter 5). The complex geometries of realistic antenna and scattering problems demanded sophisticated mesh generation (chapter 19 of CRPC).
• Chapter 21 - Functional Magnetic Resonance Imaging Dataset Analysis (Nigel H. Goddard, Greg Hood, Jonathan D. Cohen, Leigh E. Nystrom, William F. Eddy, Christopher R. Genovese, Douglas C. Noll) of ISPC: This describes a commonly important type of data analysis where raw images (MRI scans in neuroscience) need basic processing before they can be interpreted. This processing for MRI involves a pipeline of 5-15 steps, of which the computationally intense Fourier transforms, interpolation and head motion corrections were parallelized. Sections 8.9 and 8.11 of CRPC describe related applications.
• Chapter 22 - Selective and Sensitive Comparison of Genetic Sequence Data (Alexander J. Ropelewski, Hugh B. Nicholas, Jr., David W. Deerfield II) of ISPC: This describes the very important genome database search problem, implemented in a program called Msearch. The basic sequential algorithm involves very sophisticated pattern matching, but parallelism is straightforward because one can use pleasingly parallel approaches, decomposing the computation over parts of the searched database.
• Chapter 23 - Interactive Optimization of Video Compression Algorithms (Henri Nicolas, Fred Jordan) of ISPC: This chapter describes parallel compression algorithms for video streams. The parallelism involves dividing images into blocks and independently compressing each block. The goal is an interactive system to support the design of new compression methods.

Parallel Computing Works Applications
• Parallel Computing Works, G. C. Fox, P. Messina, and R. Williams; Morgan Kaufmann, San Mateo CA (1994). http://www.old-npac.org/copywrite/pcw/
• These applications are not as sophisticated as those discussed above, as they come from a time when few scientists addressed three-dimensional problems; 2D computations were typically the best you could do in the partial differential equation arena. To make a stark contrast: the early 1983 QCD (section 8.3) computations in PCW were done on the Caltech hypercube, whose 64 nodes could only reach a total of 3 megaflops when combined! Today teraflop performance is available -- almost a million times better.
• Nevertheless, in many applications the parallel approaches described in this book are still sound and state of the art.
• The book develops the Complex Systems formalism used here.

Parallel Computing Works I
• PCW Chapter 3, "A Methodology for Computation", describes more formally the approach taken in chapter 4 of CRPC.
• PCW Chapter 4, "Synchronous Applications I", describes QCD (section 8.3) and other similar statistical physics Monte Carlo simulations on a regular lattice. It also presents a cellular automata model for granular materials (such as sand dunes), which has a simple regular lattice structure, as mentioned in section 4.5 of CRPC.
• PCW Chapter 6, "Synchronous Applications II", describes other regular problems, including convectively dominated flows and the flux-corrected transport differential equations. High statistics studies of two-dimensional statistical physics problems are used to study phase transitions (cf. sections 8.3 and 8.10 of CRPC). Parallel multiscale methods are also described for various image processing algorithms, including surface reconstruction, character recognition, real-time motion field estimation and collective stereopsis (cf. section 8.9 of CRPC).
Parallel Computing Works II
• PCW Chapter 7, "Independent Parallelism", describes what chapter 4 terms "pleasingly parallel" applications. This PCW chapter includes a physics computation of quantum string theory surfaces, parallel random number generation, and ray tracing to study a statistical approach to the gravitational lensing of quasars by galaxies. A high temperature superconductor study used the Quantum Monte Carlo method -- here one uses Monte Carlo methods to generate a set of random independent paths, a different problem structure from that of section 8.3 but a method of general importance in chemistry, condensed matter and nuclear physics. GENESIS was one of the first general purpose biological neural network simulators.
• PCW Chapter 8, "Full Matrix Algorithms and Their Applications", first discusses some parallel matrix algorithms (chapter 20) and applies the Gauss-Jordan matrix solver to a chemical reaction computation. This directly solves Schrödinger's equation for a small number of particles and is different in structure from the problems in CRPC chapter 7; it reduces to a multi-channel ordinary differential equation and leads to full matrix solvers. A section on "electron-molecule collisions" describes a similar structure to the much more sophisticated simulation engines of CRPC chapter 7. Further work by this group can be found in chapter 14 of ISPC.

Parallel Computing Works III
• PCW Chapter 9, "Loosely Synchronous Problems": the above chapters described synchronous or pleasingly parallel systems in the language of chapter 4 of CRPC; this chapter describes several loosely synchronous cases. Geomorphology by micro-mechanical simulations was a different approach to granular systems (from the cellular automata in chapter 4 of PCW), using direct modeling of particles "bouncing off each other". Particle-in-cell simulation of an electron beam plasma instability used particle-in-cell methods, which have of course grown tremendously in sophistication, as seen in the astrophysics simulation of CRPC section 8.6 and the ion thruster simulations in chapter 15 of ISPC (which uses the same approach as described in this PCW chapter). Computational electromagnetics (see section 8.7 of CRPC) used finite element methods and is followed up in chapter 20 of ISPC. Concurrent DASSL applied to dynamic distillation column simulation uses a parallel sparse solver (chapter 21 of CRPC) to tackle the coupled ordinary differential-algebraic equations arising in chemical engineering. This chapter also discusses parallel adaptive multigrid for solving differential equations, an area with similarities to the mesh refinement discussed in CRPC chapters 5, 8.6, 12 and 19; see also chapter 9 of ISPC. Munkres's assignment algorithm was parallelized for a multi-target Kalman filter problem (cf. section 8.8 of CRPC). This PCW chapter also discusses parallel implementations of learning methods for neural networks.
• PCW Chapter 10, "DIME Programming Environment", discusses one of the earliest parallel unstructured mesh generators and applies it to model finite element problems. Chapter 19 of CRPC is an up-to-date survey of this field.

Parallel Computing Works IV
• PCW Chapter 11, "Load Balancing and Optimization", describes approaches to optimization based on physical analogies, including approaches to the well-known traveling salesman problem.
These physical optimization methods complement those discussed in CRPC chapters 8.8 and 22.
• PCW Chapter 12, "Irregular Loosely Synchronous Problems", features some of the harder parallel scientific codes in PCW. This chapter includes two adaptive unstructured mesh problems that used the DIME package described in PCW chapter 10: one was a simulation of the electrosensory system of the fish Gnathonemus petersii, and the other transonic flow in CFD (chapter 5 of CRPC). There is a full discussion of fast multipole methods and their parallelism; these were mentioned in chapters 4 and 8.7 of CRPC, and in PCW they are applied to astrophysical problems similar to those in section 8.6 of CRPC and to the vortex approach to CFD. Fast multipole methods are applied to the same problem class as particle-in-cell codes, as they again involve interacting particles and fields. Chapter 19 of ISPC discusses another biochemistry problem of this class. Parallel sorting is an interesting area, and this PCW chapter describes several good algorithms and compares them. The discussion of cluster algorithms for statistical physics is interesting, as these are the best sequential algorithms but the method is very hard to parallelize. The same difficult structure occurs in some approaches to region finding in image processing, and also in some models of the formation of earthquakes using cellular automata-like models; the clusters are the aligned strains that form the earthquake.

Parallel Computing Works V
• PCW Chapter 14, "Asynchronous Applications", describes examples of the temporally asynchronous algorithms described in chapter 4, where scaling parallelism is not easy. Melting in two dimensions illustrates a subtle point that distinguishes Monte Carlo and PDE algorithms: in Monte Carlo one cannot simultaneously update sites with overlapping neighbors. This complicates the loosely synchronous structure and can make the problem architecture look like that of asynchronous event-driven simulations -- here the events are individual Monte Carlo updates. "Detailed balance" requires that such events be sequentially (if arbitrarily) ordered, which is not easy in a parallel environment. Nevertheless, using the equivalent of multiple threads (chapter 10 of CRPC), one finds an algorithm that gives good parallel performance.
• Computer chess is the major focus of this chapter, where parallelism comes from sophisticated parallelism of the game tree. Statistical methods are used to balance the processing of the different branches of the dynamically pruned game tree. There is a shared database containing previous evaluations of positions, but otherwise the processing of the different possible moves is independent. One does need a clever ordering of the work (evaluation of the different final positions) to avoid a significant number of calculations being wasted because they would "later" be pruned away by a parallel calculation on a different processor. Branch-and-bound applications have similar parallelization characteristics to computer chess. Note this is not the only, and in fact not the easiest, form of parallelism in computer chess: fine grain parallelism in evaluating each position is used in all recent computer chess championship systems. The work described in PCW is complementary to this mainstream activity.
Parallel Computing Works VI
• PCW Chapter 18, "Complex System Simulation and Analysis", describes a few metaproblems using the syntax of section 4.9 of CRPC. ISIS was an Interactive Seismic Imaging System, and there is a long discussion of one of the first large scale parallel military simulations mixing data and task parallelism. This involved generation of a scenario, tracking multiple ballistic missiles, and a simulation of the hoped-for identification and destruction. A very sophisticated parallel Kalman filter was generated in this project.
• Workflow technology would be used in these applications today.