Parallelization

Partial Differential Equations
• Introduction, Adam Zornes
• Discretizations and Iterative Solvers,
Chunfang Chen
• Parallelization, Dr. Danny Thorne
What do You Stand For?
• A PDE is a Partial Differential Equation.
• This is an equation containing partial derivatives with respect to at
least two independent variables.
• In general, partial differential equations are much more difficult to
solve analytically than ordinary differential equations.
What Does a PDE Look Like?
• Let u be a function of x and y. There are
several ways to write a PDE, e.g.,
– u_x + u_y = 0
– ∂u/∂x + ∂u/∂y = 0
The Baskin-Robbins-esque
Characterization of PDEs
• The order is determined by the highest-order
derivative appearing in any term.
• Linear/Nonlinear
– A PDE is nonlinear if it contains the solution
multiplied by one of its partial derivatives, or a
partial derivative raised to some power
• Elliptic/Parabolic/Hyperbolic
Six One Way
• Say we have the following:
A u_xx + 2B u_xy + C u_yy + D u_x + E u_y + F = 0.
• Look at the discriminant B² − AC
– B² − AC < 0: elliptic
– B² − AC = 0: parabolic
– B² − AC > 0: hyperbolic
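As a quick illustration of this discriminant test, the following sketch (Python is assumed here; it is not part of the original slides) classifies a second-order PDE from its coefficients A, B, C:

```python
# Classify A u_xx + 2B u_xy + C u_yy + (lower-order terms) = 0
# by the sign of the discriminant B^2 - AC (illustrative helper).

def classify_2nd_order(A, B, C):
    disc = B * B - A * C
    if disc < 0:
        return "elliptic"
    if disc == 0:
        return "parabolic"
    return "hyperbolic"

print(classify_2nd_order(1, 0, 1))    # Laplace: u_xx + u_yy = 0  -> elliptic
print(classify_2nd_order(-1, 0, 0))   # Heat:    u_t - u_xx = 0   -> parabolic
print(classify_2nd_order(-1, 0, 1))   # Wave:    u_tt - u_xx = 0  -> hyperbolic
```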
Or Half a Dozen Another
• A general linear PDE of order 2 in n variables:
Σ_{i,j} a_ij ∂²u/∂x_i ∂x_j + (lower-order terms) = 0
• Assume symmetry in the coefficients, so that A = [a_ij]
is symmetric. The eigenvalues of A are then real. Let P and Z denote
the number of positive and zero eigenvalues of A.
– Elliptic: Z = 0 and (P = n or P = 0).
– Parabolic: Z > 0 (det(A) = 0).
– Hyperbolic: Z = 0 and (P = 1 or P = n − 1).
– Ultrahyperbolic: Z = 0 and 1 < P < n − 1.
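The eigenvalue-counting rule above can also be applied numerically. A minimal sketch, assuming NumPy is available (the function name is illustrative):

```python
import numpy as np

def classify_by_eigs(A, tol=1e-12):
    """Classify a second-order PDE by the eigenvalue signs of A = [a_ij]."""
    w = np.linalg.eigvalsh(np.asarray(A, dtype=float))  # real, since A is symmetric
    n = len(w)
    Z = int(np.sum(np.abs(w) < tol))    # number of zero eigenvalues
    P = int(np.sum(w > tol))            # number of positive eigenvalues
    if Z > 0:
        return "parabolic"
    if P == n or P == 0:
        return "elliptic"
    if P == 1 or P == n - 1:
        return "hyperbolic"
    return "ultrahyperbolic"

print(classify_by_eigs([[1, 0], [0, 1]]))    # Laplacian     -> elliptic
print(classify_by_eigs([[1, 0], [0, -1]]))   # wave operator -> hyperbolic
```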
Elliptic, Not Just For Exercise
Anymore
• Elliptic partial differential equations have
applications in almost all areas of
mathematics, from harmonic analysis to
geometry to Lie theory, as well as
numerous applications in physics.
• The basic example of an elliptic partial
differential equation is Laplace’s equation
– u_xx + u_yy = 0
The Others
• The heat equation is the basic parabolic equation
– u_t − u_xx − u_yy = 0
• The advection and wave equations are the basic
hyperbolic equations
– u_t − u_x − u_y = 0
– u_tt − u_xx − u_yy = 0
• Theoretically, all second-order linear problems can be
mapped to one of these canonical types
What Happens Where You
Can’t Tell What Will Happen
• Types of boundary conditions
– Dirichlet: specify the value of the function on a
surface
– Neumann: specify the normal derivative of the
function on a surface
– Robin: a linear combination of both
• Initial Conditions
Is It Worth the Effort?
• Basically, is it well-posed?
– A solution to the problem exists.
– The solution is unique.
– The solution depends continuously on the
problem data.
• In practice, this usually involves correctly
specifying the boundary conditions
Why Should You Stay Awake for
the Remainder of the Talk?
• Enormous application to computational
science, reaching into almost every nook
and cranny of the field including, but not
limited to: physics, chemistry, etc.
Example
• Laplace’s equation describes the steady state
of electric or magnetic fields in a vacuum, and
the steady flow of incompressible, non-viscous
fluids
• Poisson’s equation is Laplace’s equation with a
source term, arising when an outside force is
applied to the system
Poisson Equation in 2D
• u_xx + u_yy = f(x, y)
Example: CFD
Computational Fluid Dynamics
• CFD can be defined narrowly as the study of
aerodynamic flow around vehicles, but it can be
generalized to include such areas as
weather and climate simulation, flow of
pollutants in the earth, and flow of liquids in oil
fields (reservoir modelling).
• These problems involve huge systems of PDEs.
• Computational science is the only realistic route to their solution.
Links
• http://www.npac.syr.edu/users/gcf/cps713overI94/
• http://www.cse.uiuc.edu/~rjhartma/pdesong.html
• http://www.maths.soton.ac.uk/teaching/units/ma274/node2.html
• http://www.npac.syr.edu/projects/csep/pde/pde.html
• http://mathworld.wolfram.com/PartialDifferentialEquation.html
Discretization and
Iterative Method for PDEs
Chunfang Chen
Danny Thorne
Adam Zornes
Classification of PDEs
Different mathematical and physical
behaviors:
 Elliptic Type
 Parabolic Type
 Hyperbolic Type
System of coupled equations for several
variables:
 Time: first derivative (second derivative for the wave equation)
 Space: first and second derivatives
Classification of PDEs (cont.)
General form of second-order PDEs (2 variables):
A u_xx + 2B u_xy + C u_yy + D u_x + E u_y + F = 0 (as above)
PDE Model Problems
 Hyperbolic (Propagation)
• Advection equation (first-order linear): u_t + c u_x = 0
• Wave equation (second-order linear): u_tt = c² u_xx
PDE Model Problems (cont.)
 Parabolic (Time- or space-marching)
• Burgers’ equation (second-order nonlinear, diffusion/dispersion): u_t + u u_x = ν u_xx
• Fourier (heat) equation (second-order linear): u_t = α u_xx
PDE Model Problems (cont.)
 Elliptic (Diffusion, equilibrium problems)
• Laplace/Poisson equation (second-order linear): ∇²u = 0 / ∇²u = f
• Helmholtz equation (second-order linear): ∇²u + k²u = f
PDE Model Problems (cont.)
 System of Coupled PDEs
• Navier-Stokes equations, e.g. the incompressible form:
∂u/∂t + (u·∇)u = −∇p/ρ + ν∇²u, ∇·u = 0
Well-Posed Problem
Numerically well-posed:
 Discretized equations
 Auxiliary conditions (discretized/approximated)
• the computational solution exists (existence)
• the computational solution is unique (uniqueness)
• the computational solution depends continuously on the
approximate auxiliary data
• the algorithm should also be well-posed (stable)
Boundary and Initial
Conditions
Initial conditions: the starting point for propagation problems
Boundary conditions: specified on the domain boundary to
determine the interior solution in the computational domain
[Figure: computational domain R with its boundary and outward normal n]
Numerical Methods
 Complex geometry
 Complex equations (nonlinear, coupled)
 Complex initial / boundary conditions
 No analytic solutions
 Numerical methods are needed!
Numerical Methods
Objective: speed and accuracy at minimum cost
 Numerical accuracy (error analysis)
 Numerical stability (stability analysis)
 Numerical efficiency (minimize cost)
 Validation (model/prototype data, field data,
analytic solution, theory, asymptotic solution)
 Reliability and flexibility (reduce preparation
and debugging time)
 Flow visualization (graphics and animations)
Computational solution procedure (flow chart):
Governing equations + ICs/BCs (continuous solutions)
→ Discretization (finite-difference, finite-element, finite-volume, spectral, boundary element, hybrid)
→ System of algebraic equations (or DAEs)
→ Equation (matrix) solver (tridiagonal, ADI, SOR, Gauss-Seidel, Krylov, multigrid)
→ Approximate solution: discrete nodal values U_i(x,y,z,t), p(x,y,z,t), T(x,y,z,t), or φ(x,y,z,t)
Discretization
 Time derivatives
almost exclusively by finite-difference methods
 Spatial derivatives
- Finite-difference: Taylor-series expansion
- Finite-element: low-order shape function and
interpolation function, continuous within each
element
- Finite-volume: integral form of PDE in each
control volume
- There are also other methods, e.g. collocation,
spectral method, spectral element, panel
method, boundary element method
Finite Difference
 Taylor series
f(x) = f(x₀) + f′(x₀)Δx + f″(x₀)Δx²/2! + … + f⁽ⁿ⁾(x₀)Δxⁿ/n! + O(Δxⁿ⁺¹)
 Truncation error
 How to reduce truncation errors?
• Reduce the grid spacing, i.e. use a smaller Δx = x − x₀
• Increase the order of accuracy, i.e. use a larger n
Finite Difference Scheme
 Forward difference: f′(x) ≈ (f(x + Δx) − f(x)) / Δx
 Backward difference: f′(x) ≈ (f(x) − f(x − Δx)) / Δx
 Central difference: f′(x) ≈ (f(x + Δx) − f(x − Δx)) / (2Δx)
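The three schemes can be compared on a smooth function to see their orders of accuracy. A small sketch (Python/NumPy assumed; the test function and the point x0 are arbitrary choices):

```python
import numpy as np

f, dfdx = np.sin, np.cos          # test function and its exact derivative
x0 = 1.0
for dx in (0.1, 0.05, 0.025):
    fwd = (f(x0 + dx) - f(x0)) / dx             # forward,  O(dx)
    bwd = (f(x0) - f(x0 - dx)) / dx             # backward, O(dx)
    ctr = (f(x0 + dx) - f(x0 - dx)) / (2 * dx)  # central,  O(dx^2)
    err = lambda v: abs(v - dfdx(x0))
    print(dx, err(fwd), err(bwd), err(ctr))
# Halving dx roughly halves the one-sided errors but quarters the central error.
```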
Example : Poisson Equation
Solve u_xx + u_yy = f(x, y) on the square domain [−1, 1] × [−1, 1].
[Figure: the square domain with corners (−1,−1), (1,−1), (−1,1), (1,1)]
Example (cont.)
The five-point (central difference) approximation is
(U_{i+1,j} − 2U_{i,j} + U_{i−1,j}) / Δx² + (U_{i,j+1} − 2U_{i,j} + U_{i,j−1}) / Δy² = f_{i,j}
[Stencil: node (i,j) is coupled to its four neighbours (i−1,j), (i+1,j), (i,j−1), (i,j+1)]
Rectangular Grid
After we discretize the Poisson equation on a
rectangular domain, we are left with a finite
number of grid points. The boundary values
of the equation are the only known grid points.
[Figure: 5×5 grid of points (i, j), 0 ≤ i, j ≤ 4; the 16 boundary points are known, the 9 interior points are unknown]
What to solve?
Discretization produces a linear system of
equations, A u = b.
The A matrix is a pentadiagonal banded matrix of a standard form:
4 on the main diagonal and 1 on the bands corresponding to the
four neighbouring grid points (up to the choice of sign convention),
with zeros elsewhere.
A solution method must then be chosen for solving A u = b.
Matrix Storage
We could try and take advantage of the
banded nature of the system, but a
more general solution is the adoption of
a sparse matrix storage strategy.
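As a sketch of such a strategy (SciPy's sparse module is assumed; it is not prescribed by the slides), the pentadiagonal matrix can be assembled directly in compressed sparse row form via a Kronecker-sum construction. With the usual sign convention for −∇², the diagonal entries are 4/h² and the off-diagonals −1/h², matching the magnitudes shown above:

```python
import numpy as np
import scipy.sparse as sp

def poisson2d(n, h=1.0):
    """5-point Laplacian on an n-by-n grid of interior points, CSR storage."""
    I = sp.identity(n, format="csr")
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    return ((sp.kron(I, T) + sp.kron(T, I)) / h**2).tocsr()

A = poisson2d(3)        # the 3x3 interior points of the 5x5 grid above
print(A.toarray())      # 4 on the diagonal, -1 on the four neighbour bands
```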
Limitations of Finite
Differences
 Unfortunately, it is not easy to use
finite differences in complex
geometries.
 While it is possible to formulate
curvilinear finite difference methods,
the resulting equations are usually
pretty nasty.
Finite Element Method
 The finite element method, while more
complicated than finite difference methods,
easily extends to complex geometries.
 A simple (and short) description of the finite
element method is not easy to give.
PDE → Weak Form → Matrix System
Finite Element Method
(Variational Formulations)
 Find u in test space H such that a(u,v) = f(v) for all v
in H, where a is a bilinear form and f is a linear
functional.
 V(x, y) = Σ_j V_j φ_j(x, y), j = 1, …, n
 I(V) = ½ Σ_i Σ_j A_ij V_i V_j − Σ_i b_i V_i, i, j = 1, …, n
 A_ij = a(φ_i, φ_j), i, j = 1, …, n
 b_i = f(φ_i), i = 1, …, n
 The coefficients Vj are computed and the function
V(x,y) is evaluated anyplace that a value is needed.
 The basis functions should have local support (i.e.,
have a limited area where they are nonzero).
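To make this concrete, here is a minimal one-dimensional sketch (Python/NumPy assumed; the function name, mesh size and right-hand side are illustrative, not from the slides): piecewise-linear hat functions on a uniform mesh for −u″ = f on (0,1) with u(0) = u(1) = 0.

```python
import numpy as np

def fem_1d_poisson(n_elem, f=lambda x: 1.0):
    h = 1.0 / n_elem
    n = n_elem - 1                        # interior nodes only (Dirichlet BCs)
    A = np.zeros((n, n)); b = np.zeros(n)
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h   # element stiffness matrix
    for e in range(n_elem):
        nodes = [e - 1, e]                # interior indices of this element's nodes
        xm = (e + 0.5) * h                # element midpoint (one-point quadrature)
        for a_loc, ia in enumerate(nodes):
            if not 0 <= ia < n:
                continue                  # boundary node: u = 0, skip
            b[ia] += f(xm) * h / 2.0      # load contribution
            for b_loc, ib in enumerate(nodes):
                if 0 <= ib < n:
                    A[ia, ib] += ke[a_loc, b_loc]
    return A, b

A, b = fem_1d_poisson(8)
V = np.linalg.solve(A, b)   # the coefficients V_j of the expansion above
```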
Time Stepping Methods
 Standard methods are common:
– Forward Euler (explicit)
– Backward Euler (implicit)
– Crank-Nicolson (implicit)
T jn 1  T jn
t
  L xx T
L xx T j 
n 1
j
 (1   ) L xx T
T j 1  2T j  T j 1
x 2
n
j
θ = 0, Fully-Explicit
θ = 1, Fully-Implicit
θ = ½, Crank-Nicolson
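A compact sketch of the θ-scheme above for the 1D heat equation T_t = T_xx (Python/NumPy assumed; a dense solve is used purely for brevity, and homogeneous Dirichlet ends are implied by dropping the outside neighbours):

```python
import numpy as np

def theta_step(T, dt, dx, theta):
    n = len(T)
    # L_xx as a tridiagonal operator on the unknowns
    L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / dx**2
    I = np.eye(n)
    lhs = I - dt * theta * L                    # implicit part
    rhs = (I + dt * (1.0 - theta) * L) @ T      # explicit part
    return np.linalg.solve(lhs, rhs)

T = np.zeros(9); T[4] = 1.0                     # a spike initial condition
T = theta_step(T, dt=0.01, dx=0.1, theta=0.5)   # one Crank-Nicolson step
```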
Time Stepping Methods (cont.)
 Variable length time stepping
– Most common in Method of Lines (MOL) codes or
Differential Algebraic Equation (DAE) solvers
Solving the System
 The system may be solved using simple
iterative methods - Jacobi, Gauss-Seidel,
SOR, etc.
• Some advantages:
– No explicit storage of the matrix is required
– The methods are fairly robust and reliable
• Some disadvantages:
– Really slow (Gauss-Seidel)
– Really slow (Jacobi)
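A matrix-free sketch of both sweeps for the five-point Poisson stencil u_xx + u_yy = f (Python/NumPy assumed; no explicit matrix is stored, illustrating the first advantage above):

```python
import numpy as np

def jacobi_sweep(u, f, h):
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:] - h * h * f[1:-1, 1:-1])
    return new                                   # uses only "old" values

def gauss_seidel_sweep(u, f, h):
    n, m = u.shape
    for i in range(1, n - 1):                    # uses the newest values in place
        for j in range(1, m - 1):
            u[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j] +
                              u[i, j - 1] + u[i, j + 1] - h * h * f[i, j])
    return u

n, h = 17, 1.0 / 16
u, f = np.zeros((n, n)), np.ones((n, n))         # zero Dirichlet boundary
for _ in range(100):
    u = gauss_seidel_sweep(u, f, h)              # converges, but slowly
```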
Solving the System
Advanced iterative methods (CG, GMRES)
CG is a much more powerful way to solve the
problem.
• Some advantages:
– Easy to program (compared to other advanced
methods)
– Fast (theoretical convergence in N steps for an N
by N system)
• Some disadvantages:
– Explicit representation of the matrix is probably
necessary
– Applies only to SPD matrices
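A bare-bones conjugate gradient sketch (Python/NumPy assumed; the test matrix is an arbitrary SPD example, not from the slides):

```python
import numpy as np

def cg(A, b, tol=1e-10):
    x = np.zeros_like(b)
    r = b - A @ x                    # residual
    p = r.copy()                     # search direction
    rs = r @ r
    for _ in range(len(b)):          # at most N steps in exact arithmetic
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.diag(2.0 * np.ones(5)) + np.diag(-np.ones(4), 1) + np.diag(-np.ones(4), -1)
x = cg(A, np.ones(5))
```

Note that the loop only ever uses the product A @ p, so in principle the matrix could also be applied stencil-wise rather than stored explicitly, although an explicit sparse matrix is often the simplest route.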
Multigrid Algorithm:
Components
 Residual
compute the error of the approximation
 Iterative method/Smoothing Operator
Gauss-Seidel iteration
 Restriction
obtain a ‘coarse grid’
 Prolongation
from the ‘coarse grid’ back to the
original grid
Residual Vector
 The equation we are to solve is defined as: A u = b
 So then the residual is defined to be: r = b − A u_q
 where u_q is a vector approximation to u
 As the approximation u_q improves, the components
of the residual vector r move toward zero
The Restriction Operator
 Defined as ‘half-weighted’ restriction.
 Each new point in the coarser grid depends on
its neighbouring points from the fine grid.
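A sketch of 2D half-weighted restriction on a grid with an odd number of points per side (Python/NumPy assumed; the weights 1/2 for the coincident point and 1/8 for each of its four neighbours are the usual half-weighting choice):

```python
import numpy as np

def restrict_half_weighted(fine):
    nf = fine.shape[0]                    # fine grid is nf x nf, nf = 2*nc - 1
    nc = (nf + 1) // 2
    coarse = np.zeros((nc, nc))
    coarse[0, :], coarse[-1, :] = fine[0, ::2], fine[-1, ::2]   # copy boundary rows
    coarse[:, 0], coarse[:, -1] = fine[::2, 0], fine[::2, -1]   # copy boundary columns
    for I in range(1, nc - 1):
        for J in range(1, nc - 1):
            i, j = 2 * I, 2 * J           # coincident fine-grid point
            coarse[I, J] = (0.5 * fine[i, j]
                            + 0.125 * (fine[i - 1, j] + fine[i + 1, j]
                                       + fine[i, j - 1] + fine[i, j + 1]))
    return coarse

coarse = restrict_half_weighted(np.random.rand(5, 5))   # 5x5 -> 3x3, as above
```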
The Prolongation Operator
 The grid change is exactly the opposite of restriction.
[Figure: the 3×3 coarse-grid points (i, j), 0 ≤ i, j ≤ 2, map back to the corresponding points (2i, 2j) of the 5×5 fine grid]
Prolongation vs. Restriction
 The most efficient multigrid algorithms use
prolongation and restriction operators that
are directly related to each other. In the one-dimensional
case, the restriction operator is, up to a constant factor,
the transpose of the prolongation operator.
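In 1D with linear interpolation and full weighting, that relation is simply R = ½ Pᵀ. A sketch over interior points with homogeneous Dirichlet ends (Python/NumPy assumed; the slides use half-weighting in 2D, where the constant differs, so this is only the 1D illustration):

```python
import numpy as np

def prolongation_1d(nc):
    """Linear interpolation from nc coarse interior points to 2*nc + 1 fine ones."""
    P = np.zeros((2 * nc + 1, nc))
    for J in range(nc):
        i = 2 * J + 1                 # fine point coincident with coarse point J
        P[i, J] = 1.0
        P[i - 1, J] = 0.5
        P[i + 1, J] = 0.5
    return P

P = prolongation_1d(3)                # 3 coarse -> 7 fine interior points
R = 0.5 * P.T                         # full-weighting restriction: rows [1/4 1/2 1/4]
print(R @ np.ones(7))                 # each coarse value is a weighted average -> [1 1 1]
```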
Full Multigrid Algorithm
1. Smooth the initial U vector to obtain a new approximation U_q
2. Form the residual vector: R_q = b − A U_q
3. Restrict R_q to the next coarser grid → R_{q−1}
4. Smooth A e = R_{q−1}, starting with e = 0, to obtain e_{q−1}
5. Form a new residual vector using: R_{q−1} = R_{q−1} − A e_{q−1}
6. Restrict R_2 (on the 5×5 grid) down to R_1 (on the 3×3 grid)
7. Solve A e = R_1 exactly to obtain e_1
8. Prolongate e_1 → e_2 and add it to the e_2 obtained from restriction
9. Smooth A e = R_2, starting from e_2, to obtain a new e_2
10. Prolongate e_{q−1} to e_q and add it to U_q
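Those ten steps span several grid levels; the sketch below shows just one two-grid correction cycle in 1D so the pieces line up with the steps above (Python/NumPy assumed; the damped-Jacobi smoother and the exact coarse solve are simplifying choices, not the slides' exact algorithm):

```python
import numpy as np

def laplacian_1d(n, h):
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

def prolongation_1d(nc):
    P = np.zeros((2 * nc + 1, nc))
    for J in range(nc):
        i = 2 * J + 1
        P[i, J], P[i - 1, J], P[i + 1, J] = 1.0, 0.5, 0.5
    return P

nf, nc, h = 7, 3, 1.0 / 8
A_f, A_c = laplacian_1d(nf, h), laplacian_1d(nc, 2 * h)
P = prolongation_1d(nc); R = 0.5 * P.T
b, u = np.ones(nf), np.zeros(nf)

def smooth(u, nu=3, omega=2.0 / 3.0):                 # damped Jacobi smoother
    for _ in range(nu):
        u = u + omega * (b - A_f @ u) / A_f.diagonal()
    return u

u = smooth(u)                         # 1. pre-smooth the fine-grid approximation
r_c = R @ (b - A_f @ u)               # 2-3. form the residual and restrict it
e_c = np.linalg.solve(A_c, r_c)       # 4-7. solve the coarse-grid error equation
u = u + P @ e_c                       # 8. prolongate the correction and add it
u = smooth(u)                         # 9-10. post-smooth the corrected solution
```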
Reference
http://csep1.phy.ornl.gov/CSEP/PDE/PDE.html
www.mgnet.org/
www.ceprofs.tamu.edu/hchen/
www.cs.cmu.edu/~ph/859B/www/notes/multigrid.pdf
www.cs.ucsd.edu/users/carter/260
www.cs.uh.edu/~chapman/teachpubs/slides04-methods.ppt
http://www.ccs.uky.edu/~douglas/Classes/cs521-s01/index.html
HPC Issues in PDEs, Part 3
Parallelization
Danny Thorne, CS521, University of Kentucky, April 9, 2002
Parallel Computation
• Serious calculations today are mostly done on a parallel
computer.
• The domain is partitioned into subdomains that may or may
not overlap slightly.
• Goal is to calculate as many things in parallel as possible
even if some things have to be calculated on several
processors in order to avoid communication.
• "Communication is the Darth Vader of parallel computing."
Example: Original Mesh
• Consider solving a PDE on this
grid.
• Suppose that only half of the
nodes fit on a processor.
Example: Mesh on Two Processors
• Divide into two connected
subsets and renumber within
the subdomains.
• Communication occurs
between neighbors that cross
the processor boundary.
• Ghost points (or overlap) can
reduce communication
sometimes at the expense of
extra computation.
• Per word, computation costs roughly O(1/1000) as much as
communication.
Mesh Decomposition
• Goals are to maximize interior while minimizing connections
between subdomains. That is, minimize communication.
• Such decomposition problems have been studied in load
balancing for parallel computation.
• Lots of choices:
• METIS, ParMETIS -- University of Minnesota.
• PARTI -- University of Maryland,
• CHACO -- Sandia National Laboratories,
• JOSTLE -- University of Greenwich,
• PARTY -- University of Paderborn,
• SCOTCH -- Université Bordeaux,
• TOP/DOMDEC -- NAS at NASA Ames Research Center.
Mesh Decomposition
• Quality of mesh decomposition has a highly significant effect
on performance.
• Arriving at a “good” decomposition is a complex task.
• “Good” may be problem/architecture dependent.
• A wide variety of well-established methods exist.
• Several packages/libraries provide implementations of
many of these methods.
• Major practical difficulty is differences in file formats.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-11.html
Decomposition Goals
• Load balance.
• Distribute elements evenly across processors.
• Each processor should have equal share of work.
• Communication costs should be minimized.
• Minimize sub-domain boundary elements.
• Minimize number of neighboring domains.
• Distribution should reflect machine architecture.
• Communication versus calculation.
• Bandwidth versus latency.
• Note that optimizing load balance and communication
cost simultaneously is an NP-hard problem.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-13.html
Static Mesh versus Dynamic Mesh
• Static mesh
• Decomposition need only be carried out once.
• Static decomposition may therefore be carried out as a
preprocessing step, often done in serial.
• Dynamic mesh
• Decomposition must be adapted as underlying mesh
changes.
• Dynamic decomposition therefore becomes part of the
calculation itself and cannot be carried out solely as a preprocessing step.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-14.html
Rod Impact
• Notice that the mesh
changes dynamically.
http://www.amtec.com/plotgallery/animation/movies/t-rod.gif
Frisbee
• Need a dynamic mesh
for this?
http://www.llnl.gov/CASC/Overture/henshaw/figures/pib.mpg
Dynamic Decomposition
• To redistribute a mesh
• A new decomposition must be found and required
changes made (shuffling of data between processors).
• The time taken may outweigh the benefit gained so an
assessment of whether it is worthwhile is needed.
• The decomposition must run in parallel and be fast.
• Take into account the old decomposition.
• Make minimal changes.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-15.html
Dual Graphs
• To make a decomposition we need
• A representation of the basic entities being distributed
(e.g. the elements, in the case of a FE mesh),
• An idea of how communication takes place between them.
• A dual graph, based on the mesh, fills this role.
• Vertices in the graph represent the entities (elements).
• Edges in the graph represent communication.
• Graph depends on how data is transferred.
• For finite elements it could be via nodes, edges or faces.
• So a single mesh can have more than one dual graph.
• Edges (communications) or vertices (calculations) may be
weighted.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-16.html
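A small sketch of building such a dual graph for a toy triangular mesh, taking element-to-element communication to be via shared edges (Python assumed; the element list is made up for illustration):

```python
from collections import defaultdict
from itertools import combinations

elements = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]   # triangles as node triples

edge_to_elems = defaultdict(list)
for e, tri in enumerate(elements):
    for edge in combinations(sorted(tri), 2):             # the 3 edges of each triangle
        edge_to_elems[edge].append(e)

dual_edges = {tuple(sorted(pair))
              for elems in edge_to_elems.values()
              for pair in combinations(elems, 2)}         # elements sharing an edge

print(sorted(dual_edges))   # [(0, 1), (1, 2), (2, 3)] -- a chain of elements
```

Choosing node- or face-sharing instead of edge-sharing would give a different dual graph for the same mesh, as noted above.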
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-17.html
2D Example
Graph Partitioning
• This problem occurs in several applications.
• VLSI circuit layout design,
• Efficient orderings for parallel sparse matrix factorization,
• Mesh decomposition.
• This problem has been extensively studied.
• A large body of literature already exists.
• Much work is still being produced in the field.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-18.html
Graph Partitioning
• Given a graph G(V, E), we want to partition it into k parts
such that:
• Each part has roughly the same number of vertices,
• The number of edges that cross partitions is minimized.
http://www-users.cs.umn.edu/~karypis/publications/Talks/multi-constraint/sld002.htm
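A candidate partition can be scored against exactly these two criteria. A small sketch (Python assumed; the graph and the partition are made up):

```python
from collections import Counter

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]   # graph as an edge list
part  = {0: 0, 1: 0, 2: 1, 3: 1}                   # vertex -> part id (k = 2)

sizes = Counter(part.values())                              # balance criterion
edge_cut = sum(1 for u, v in edges if part[u] != part[v])   # cut criterion

print(dict(sizes))   # {0: 2, 1: 2}  -> perfectly balanced
print(edge_cut)      # 3 edges cross the partition: (1,2), (3,0), (1,3)
```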
Mesh Decomposition
• View mesh decomposition as having two aspects:
• Partitioning a graph,
• Mapping the sub-domains to processors.
• Algorithms may
• Address both issues,
• Or simplify the problem by ignoring the latter.
• Whether this is an issue depends on target machine.
• With very fast communications, e.g. the T3D, it's probably
less important.
• On a cluster of workstations it may be absolutely critical.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-22.html
Generalization
Decomposition Algorithms
• Global methods
• Direct k-way partitioning.
• Recursive application of some  -way technique, e.g.
bisection.
• Local refinement techniques
• Incrementally improve quality of an existing
decomposition.
• Multi-level variants
• Work with coarse approximation of graph to increase
performance.
• Hybrid techniques
• Using various combinations of above.
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-24.html
Examples
• Simple, direct k-way algorithms
• Random, Scattered & Linear Bandwidth Reduction
• The Greedy Algorithm
• Optimization algorithms
• Simulated Annealing
• Chained Local Optimization
• Genetic Algorithms
• Geometry based recursive algorithms
• Coordinate Partitioning
• Inertial Partitioning
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-25.html
Examples
• Graph based recursive algorithms
• Layered Partitioning
• Spectral Partitioning
• Local refinement algorithms
• Kernighan and Lin
• Jostle
• Multi-level and hybrid variants
• Multi-level Kernighan and Lin Partitioning
• Multi-level Spectral RQL / Symmlq Partitioning
http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/MeshDecomp-26.html
Space Filling Curves
• Locality preserving.
• Each point lies a
unique distance along
the curve.
Space Filling Curves
• Optimal load balance.
• Subdomain boundaries
are sub-optimal.
• Recall: Optimizing load
and comm is NP-hard.
Space Filling Curves
The main advantages of this
partition method are:
• It is fast compared to graph
partitioning heuristics,
• It runs in parallel,
• It requires no
administration and no
storage of processor
neighborhoods.
• The knowledge of the
separators is enough to
compute where to find a
node and which processor to
ask for it.
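One concrete way to realize this is a Z-order (Morton) curve: interleave the bits of the scaled coordinates, sort along the curve, and cut it into equal-length pieces (Python/NumPy assumed; Morton order is just one choice of space-filling curve, Hilbert curves are another):

```python
import numpy as np

def morton_key(ix, iy, bits=16):
    """Interleave the bits of integer grid coordinates (ix, iy)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b) | ((iy >> b) & 1) << (2 * b + 1)
    return key

def sfc_partition(points, nparts, bits=16):
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    scaled = ((pts - lo) / (hi - lo + 1e-30) * (2**bits - 1)).astype(int)
    keys = [morton_key(ix, iy, bits) for ix, iy in scaled]
    order = np.argsort(keys)                        # position along the curve
    owner = np.empty(len(pts), dtype=int)
    for rank, idx in enumerate(order):
        owner[idx] = rank * nparts // len(pts)      # equal-length curve segments
    return owner

owner = sfc_partition(np.random.rand(1000, 2), nparts=4)   # ~250 points each
```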
Parallel PDE Tools
• Parallel ELLPACK -- Purdue University
• POOMA -- Los Alamos National Laboratory
• Overture -- Lawrence Livermore National Laboratory
• PADRE – LLNL, LANL
• PETSc -- Argonne National Laboratory
• Unstructured Grids -- University of Heidelberg
Links
• A Software Framework for Easy Parallelization of PDE Solvers -- http://www.ifi.uio.no/~xingca/PCFD2000/Slides/pcfd2000/
• Numerical Objects Online -- http://www.nobjects.com/Diffpack/
• Xing Cai's home page at University of Oslo -- http://www.ifi.uio.no/~xingca/
• Related Grid Decomposition Sites -- http://www.erc.msstate.edu/~bilderba/research/other_sites.html
• Domain Decomposition Methods -- http://www.ddm.org/
• People working on Domain Decomposition -- http://www.ddm.org/people.html
• Unstructured Mesh Decomposition -- http://www.epcc.ed.ac.uk/epcc-tec/documents/meshdecomp-slides/
• HPCI Seminar: Parallel Sparse Matrix Solvers -- http://www.epcc.ed.ac.uk/computing/research_activities/HPCI/sparse_seminar.html
• METIS: Family of Multilevel Partitioning Algorithms -- http://www-users.cs.umn.edu/~karypis/metis/index.html
• Multilevel Algorithms for Multi-Constraint Graph Partitioning -- http://www-users.cs.umn.edu/~karypis/publications/Talks/multi-constraint/sld001.htm
• Structured Versus Unstructured Meshes -- http://www.epcc.ed.ac.uk/epcc-tec/courses/images/Meshes.gif
• Domain Decomposition Methods for elliptic PDEs -- http://www.cs.purdue.edu/homes/mav/projects/dom_dec.html
• Multiblock Parti library -- http://www.cs.umd.edu/projects/hpsl/compilers/base_mblock.html
• Chaco: Software for Partitioning Graphs -- http://www.cs.sandia.gov/~bahendr/chaco.html
• A Portable and Efficient Parallel Code for Astrophysical Fluid Dynamics -- http://astro.uchicago.edu/Computing/On_Line/cfd95/camelse.html
• Links for Graph Partitioning -- http://rtm.science.unitn.it/intertools/graph-partitioning/links.html
• A Multilevel Algorithm for Partitioning Graphs -- http://www.chg.ru/SC95PROC/509_BHEN/SC95.HTM
• Multigrid Workbench: The Algorithm -- http://www.mgnet.org/mgnet/tutorials/xwb/mgframe.html
• Domain Decomposition: Parallelization of Mesh Based Applications -- http://www.hlrs.de/organization/par/par_prog_ws/online/DomainDecomposition/index.htm
• Parallel ELLPACK Elliptic PDE Solvers -- http://www.cs.purdue.edu/homes/markus/research/pubs/pde-solvers/paper.html
• POOMA -- http://www.acl.lanl.gov/pooma/
• Overture -- http://www.llnl.gov/CASC/Overture/
• PADRE: A Parallel Asynchronous Data Routing Environment -- http://www.c3.lanl.gov:80/~kei/iscope97.1/iscope97.1.html
• PETSc: The Portable, Extensible Toolkit for Scientific Computation -- http://www-fp.mcs.anl.gov/petsc/
• UG Home Page -- http://cox.iwr.uni-heidelberg.de/~ug/
• Space-filling Curves -- http://mrl.nyu.edu/~dzorin/sig99/zorin3/sld036.htm
• Separate IMAGE for Basic foil 30 Two Space Filling Curves -- http://www.npac.syr.edu/users/gcf/cps615gemoct98/foilsepimagedir/030IMAGE.html
• Using Space-filling Curves for Multi-dimensional Indexing -- http://www.dcs.bbk.ac.uk/~jkl/BNCOD2000/slides.html
• The origins of fractals -- http://plus.maths.org/issue6/turner1/
• Adaptive Parallel Multigrid Methods -- http://wissrech.iam.uni-bonn.de/research/projects/zumbusch/hash.html