Parallel solution of the Helmholtz equation with large wave numbers Dan Gordon Rachel Gordon Computer Science University of Haifa Aerospace Eng. Technion July 1, 2010 Parallel solution of the Helmholtz equation 1 Outline of Talk The Kaczmarz algorithm (KACZ) KACZ CARP (ComponentAveraged Row Projections) Applications of CARP CARP-CG: CG acceleration of CARP Sample results with the Helmholtz equation July 1, 2010 Parallel solution of the Helmholtz equation 2 KACZ: The Kaczmarz algorithm Iterative method, due to Kaczmarz (1937). Rediscovered for CT as ART Geometric algorithm: consider the hyperplane defined by each equation Start from an arbitrary initial point Successively project current point onto the next hyperplane, in cyclic order July 1, 2010 Parallel solution of the Helmholtz equation 3 KACZ: Geometric Description initial point eq. 1 eq. 3 July 1, 2010 eq. 2 Parallel solution of the Helmholtz equation 4 KACZ with Relaxation Parameter KACZ can be used with a relaxation parameter w w=1: project exactly on the hyperplane w<1: project in front of hyperplane w>1: project beyond the hyperplane Cyclic relaxation: eq. i is assigned a relaxation parameter wi July 1, 2010 Parallel solution of the Helmholtz equation 5 Convergence Properties of KACZ KACZ with relaxation (0<w<2) converges for consistent systems: – Herman, Lent & Lutz, 1978 – Trummer, 1981 For inconsistent systems, KACZ converges cyclically: – Tanabe, 1971 – Eggermont, Herman & Lent, 1981 (for cyclic relaxation parameters). July 1, 2010 Parallel solution of the Helmholtz equation 6 Algebraic formulation of KACZ Given the system Consider system Ax = b the "normal equations" T T AA y = b, x=A y Well-known fact: KACZ is SOR applied to the normal equations The relaxation parameter of KACZ is the usual relax. par. of SOR July 1, 2010 Parallel solution of the Helmholtz equation 7 Block Mode & Parallelization Block KACZ: projection onto affine subspace defined by a block of eqns Block-sequential KACZ: – partition eqns into blocks – each block consists of independent eqns – iterate over blocks – in each block, perform projections in parallel July 1, 2010 Parallel solution of the Helmholtz equation 8 CARP: Component-Averaged Row Projections A block-parallel version of KACZ The equations are divided into blocks (not necessarily disjoint) Initial estimate: vector x=(x1,…,xn) Suppose x1 is a variable (component of x) that appears in 3 blocks x1 is “cloned” as y1 , z1 , t1 in the different blocks. Perform one (or more) KACZ iteration(s) on each block (independently, in parallel) July 1, 2010 Parallel solution of the Helmholtz equation 9 CARP – Explanation (cont) The internal iterations in each block produce 3 new values for the clones of x1 : y1’ , z1’ , t1’ The next iterative value of x1 is x1’ = (y1’ + z1’ + t1’)/3 The next iterate is x’ = (x1’ , ... , xn’) Repeat iterations as needed for convergence July 1, 2010 Parallel solution of the Helmholtz equation 10 CARP as Domain Decomposition x0 y 1 clone of x1 x1 external grid point of A Note: domains may overlap domain A July 1, 2010 domain B Parallel solution of the Helmholtz equation 11 Overview of CARP domain A KACZ iterations domain B averaging KACZ iterations cloning KACZ in superspace (with cyclic relaxation) July 1, 2010 Parallel solution of the Helmholtz equation 12 Convergence of CARP Averaging Lemma: the component- averaging and cloning operations of CARP are equivalent to KACZ row- projections in a certain superspace (with w=1) CARP is equivalent to KACZ in the superspace, with cyclic relaxation parameters – known to converge July 1, 2010 Parallel solution of the Helmholtz equation 13 CARP Application: Solution of stiff linear systems from PDEs Elliptic PDEs w/large convection term result in stiff linear systems (large off-diagonal elements) CARP is very robust on these systems, as compared to leading solver/preconditioner combinations Downside: Not always efficient July 1, 2010 Parallel solution of the Helmholtz equation 14 CARP Application: Electron Tomography (joint work with J.-J. Fernández) 3D reconstructions: Each processor is assigned a block of consecutive slices. Data is in overlapping blobs. The blocks are processed in parallel. The values of shared variables are transmitted between the processors which share them, averaged, and redestributed. July 1, 2010 Parallel solution of the Helmholtz equation 15 CARP-CG: CG acceleration of CARP CARP is KACZ in some superspace (with cyclic relaxation parameters) Björck & Elfving (BIT 79): developed CGMN, which is a (sequential) CGacceleration of KACZ (double sweep, fixed relax. parameter) We extended this result to allow cyclic relaxation parameters Result: CARP-CG July 1, 2010 Parallel solution of the Helmholtz equation 16 CARP-CG: Properties Same robustness as CARP Very significant improvement in performance on stiff linear systems derived from elliptic PDEs Very competitive runtime compared to leading solver/preconditioner combinations on systems derived from convection-dominated PDEs Improved performance in ET July 1, 2010 Parallel solution of the Helmholtz equation 17 CARP-CG: Properties On one processor, CARP-CG is identical to CGMN Particularly useful on systems with LARGE off-diagonal elements – example: convection-dominated PDEs Discontinuous coefficients are handled without requiring domain decomposition (DD) July 1, 2010 Parallel solution of the Helmholtz equation 18 Robustness of CARP-CG KACZ inherently normalizes the eqns After normalization, the diagonal elements of AAT are larger than the offdiagonal ones (in each row) This is not diagonal dominance, but it makes the normal eqns manageable Normalization was also found to be useful for discontinuous coefficients July 1, 2010 Parallel solution of the Helmholtz equation 19 The Helmholtz Equation Eqn: -Δu - k2u = f Wave length: l = 2p/k No. of grid pts per l: Ng = 2p/kh Shifted Laplacian approach: – Bayliss, Goldstein & Turkel, 1983 – Erlangga, Vuik & Oosterlee, 2004/06 -Δu – (1- i b)k2 u = f uses multigrid to solve the PC (PC = preconditioner) July 1, 2010 Parallel solution of the Helmholtz equation 20 The Helmholtz Equation Bollhöfer, Grote & Schenk, 2009: introduced algebraic multilevel PC for the Helmholtz eqn in heterogeneous media. Uses symmetric max weight matchings and an inverse-based pivoting method. Apologies to many other contributors to this problem! July 1, 2010 Parallel solution of the Helmholtz equation 21 Experiments CARP-CG was used with a fixed relaxation parameter of 1.7 in all cases Domain: 2nd unit square [0,1][0,1] order central difference scheme July 1, 2010 Parallel solution of the Helmholtz equation 22 Problem 1 (with analytic sol'n) Based on Erlangga et al '04, §6.1 Eqn: (Δ+k2)u = (k2–5p2)sin(px)sin(2py) bndry condition: u=0 on all sides Analytic solution: u = sin(px)sin(2py) Grid points per l: Ng = 6,8,10,12 No. of processors: 1 – 32 k = 100, 300 July 1, 2010 Parallel solution of the Helmholtz equation 23 Problem 2 (homogeneous) Based on Erlangga et al '04, §6.2 Eqn: Δu + k2u = 0 Domain: unit square [0,1]x[0,1] Dirichlet bndry cond. on one side, with a discontinuity at midpoint 1st-order absorbing bndry cond. on other sides Grid points per l: Ng = 6, 8, 10 No. of processors: 1 – 32 k = 75, 150, 300, 450, 600 July 1, 2010 Parallel solution of the Helmholtz equation 28 July 1, 2010 Parallel solution of the Helmholtz equation 29 July 1, 2010 Parallel solution of the Helmholtz equation 30 July 1, 2010 Parallel solution of the Helmholtz equation 31 Problem 3 (heterogeneous) 3-layer heterogeneous problem Based on Erlangga et al '04, §6.3 Everything is identical to Problem 2 EXCEPT: k=600 k=450 k=300 July 1, 2010 Parallel solution of the Helmholtz equation 34 July 1, 2010 Parallel solution of the Helmholtz equation 35 July 1, 2010 Parallel solution of the Helmholtz equation 36 July 1, 2010 Parallel solution of the Helmholtz equation 37 Comparative Timing Results Time/iter of Bi-CGSTAB and GMRES relative to CARP-CG it-ratio = (time/iter of algorithm) / (time/iter of CARP-CG) Method time/iter time/iter it-ratio it-ratio 1 proc 16 proc 1 proc 16 proc CARP-CG 0.0978 0.0134 1.00 1.00 BiCGSTAB 0.1477 0.0344 1.51 2.56 0.1490 0.0212 1.52 1.58 GMRES (restart =10) Results from CARP-CG paper (PARCO, 2010) Timing and Speedup Results Problem 2, k=600, Ng=8, grid: 763763 582,169 (complex) equations, rel-res<10-7 No. proc No. Iter Time (s) Speed-up Efficiency 1 7025 1256 1.00 100% 2 7039 822 1.53 76.4% 4 7066 462 2.72 68.0% 8 7115 255 4.92 61.5% 15 7206 159 7.92 52.8% Summary CARP-CG is highly scalable on the Helmholtz eqn w/high wave numbers Applicable to discontinuous coefficients Very simple to implement General-purpose – useful also for other problems with large offdiagonal elements and discontinuous coefficients Other Potential Applications Fourth-order schemes for the Helmholtz equation (already have good initial results) Maxwell equations Saddle-point problems Circuit problems Linear solvers in some eigenvalue methods ... July 1, 2010 Parallel solution of the Helmholtz equation 41 Relevant Publications http://cs.haifa.ac.il/~gordon/pub.html CARP: SIAM J Sci Comp 2005 CGMN: ACM Trans Math Software 2008 Microscopy: J Parallel & Distr Comp 2008 Large convection + discontin coef: CMES 2009 CARP-CG: Parallel Comp 2010 Scaling for discont coef: J Comp & Appl Math 2010 CARP-CG SOFTWARE AVAILABLE ON REQUEST THANK YOU! July 1, 2010 Parallel solution of the Helmholtz equation 42