Distributed Computation of Wave Propagation Models Using PVM

Richard E. Ewing, Texas A&M University
Robert C. Sharpley, University of South Carolina
Derek Mitchum and Patrick O'Leary, University of Wyoming
James S. Sochacki, James Madison University

The Parallel Virtual Machine lets researchers create a powerful, inexpensive parallel system on which they can solve large, sophisticated problems such as simulating the propagation of seismic waves.

Although MIMD, SIMD, shared-memory, and other emerging supercomputers can attack most of today's large-scale computing problems, these machines are inaccessible to the average researcher. But any researcher with accounts on multiple Unix workstations can corral unused CPU cycles to solve large problems by using a distributed software utility called the Parallel Virtual Machine, developed by Emory University, the University of Tennessee, and Oak Ridge National Laboratory (see the sidebar).1,2

PVM lets researchers connect workstations, mini-supercomputers, or specialty machines to form a relatively inexpensive, powerful, parallel computer. Such hardware is frequently abundant at research locations, so PVM incurs little or no hardware cost. Also, PVM is flexible: It uses existing communication networks (Ethernet or fiber) and remote procedural libraries; it lets programmers use either C or Fortran; and it can emulate several commercial architectures, including hypercubes, meshes, and rings.

We believe that PVM can compete effectively with traditional supercomputers, and we have demonstrated its computational power and cost-effectiveness by simulating the propagation of seismic waves using an isolated Ethernet ring comprising an IBM RS/6000 550 as the host and six RS/6000 320H's as the nodes.

Model equations and numerical method

Geophysicists determine the earth's substructure by producing vibrations (through controlled explosions or vibroseis trucks) at or near the earth's surface. They are particularly interested in the density, sound speed, and Lamé parameters (describing the earth's elastic properties) of the materials composing the section of the earth surrounding the explosion site. Typical measurements include the pressure distribution at the earth's surface caused by the explosion (the pressure seismogram) and the vertical displacement of the earth's surface (the displacement seismogram). Geophysicists use an acoustic wave equation to simulate a pressure seismogram, and an elastic wave equation to simulate a displacement seismogram. From these they determine the substructure's characteristics. (These wave model equations can also solve problems in medical imaging, sonar, and nondestructive testing of materials.)

Determining a wave source's effects on a specified substructure is called the forward problem, while determining the substructure and its parameters is the inverse problem. We address the forward problem here.

We are dealing with a 2D problem, so $x_1$ or $x$ represents distance along the earth's surface, and $x_2$ or $z$ represents depth into the earth. The forward acoustic problem consists of solving the following equation, given $\rho_0$, $c$, $F$, and $S$:

$$u_{tt} = c^2 \rho_0\, \nabla\cdot\Big(\frac{1}{\rho_0}\nabla u\Big) + F(x,t)$$

If $p = -u_t$ and $\vec{v} = (1/\rho_0)\nabla u + \vec{G}$, where $F(x,t) = c^2\rho_0\,\nabla\cdot\vec{G}$ and $\vec{G}_t = (1/\rho_0)\vec{F}$, then $p$ and $\vec{v}$ solve Euler's equations for pressure and velocity. To that equation we add the earth's surface condition: $p(x_1, 0, t) = -u_t(x_1, 0, t) = S(x_1, t)$, where $S$ is a surface excitation ($S = 0$ if there is no surface source).
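That $p$ and $\vec{v}$ solve Euler's equations can be checked by direct substitution; a brief verification sketch (using the relations exactly as written above, with $\vec{F}$ the body force):

\begin{align*}
\rho_0 \vec{v}_t &= \nabla u_t + \rho_0 \vec{G}_t = -\nabla p + \vec{F},\\
p_t &= -u_{tt}
     = -c^2\rho_0\,\nabla\cdot\Big(\tfrac{1}{\rho_0}\nabla u\Big) - F
     = -c^2\rho_0\,\nabla\cdot(\vec{v}-\vec{G}) - F
     = -c^2\rho_0\,\nabla\cdot\vec{v},
\end{align*}

with the last equality using $F = c^2\rho_0\,\nabla\cdot\vec{G}$. These are the linearized Euler equations for velocity and pressure.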
The forward elastic problem consists of solving the following equations, given $\lambda$, $\mu$, $\rho$, $F_1$, $F_2$, $S_1$, and $S_2$:

$$\rho u_{tt} = \frac{\partial}{\partial x}\big[(\lambda + 2\mu)u_x + \lambda w_z\big] + \frac{\partial}{\partial z}\big[\mu(u_z + w_x)\big] + F_1$$
$$\rho w_{tt} = \frac{\partial}{\partial x}\big[\mu(u_z + w_x)\big] + \frac{\partial}{\partial z}\big[(\lambda + 2\mu)w_z + \lambda u_x\big] + F_2$$

where $\rho = \rho(x,z)$ is the equilibrium density; $\lambda = \lambda(x,z)$ and $\mu = \mu(x,z)$ are the Lamé parameters; $\alpha = \sqrt{(\lambda + 2\mu)/\rho}$ and $\beta = \sqrt{\mu/\rho}$ are the P and S wave velocities, respectively; $u$ is the horizontal particle displacement; $w$ is the vertical particle displacement; and $F_1$, $F_2$ are the interior sources. Free-surface boundary conditions describe the earth's surface:

$$\mu(u_z + w_x)\big|_{z=0} = S_1(x,t), \qquad \big[(\lambda + 2\mu)w_z + \lambda u_x\big]\big|_{z=0} = S_2(x,t)$$

where $S_1$ and $S_2$ are the surface excitation sources.

Sidebar: PVM components

PVM has two primary components: controlling daemons and a procedural library. The controlling daemon, PVMD, institutes distributed control by requiring each processing unit in the distributed calculation to execute its own copy of PVMD. Each processing unit thereby absorbs any master/slave overhead. As the controlling daemons exchange information, a resident look-up table of enrolled subprocesses enables interprocessor communication. PVMD also facilitates point-to-point data transfer, message broadcasting, mutual exclusion, process control, shared-memory emulation, and barriers. The set of simple subroutine calls in the procedural library lets programmers interact with PVMD in a relatively transparent manner. Therefore, parallelizing an application requires few subroutine calls and provides flexibility.

Our model of the earth has idealized curves of discontinuity for the density, sound speed, and Lamé parameters that describe the interfaces between layered media. Geophysicists use the inverse problem to locate these interface curves and determine the layers' parameters. For the forward problem, these interfaces and parameters are specified.

We could use many numerical methods to solve the forward problem, including finite-difference, finite-element, Fourier, and pseudospectral methods. Each has strengths and weaknesses. We chose the finite-difference method,3,4 which gives discrete difference equations for each point in the region of interest and integrates the equations at each spatial grid point. We use centered differences to keep second-order accuracy over time. Integration forces continuity of pressure and normal velocity at the interfaces in the acoustic wave equation, and of the particle displacements, normal stresses, and tangential stresses at the interfaces in the elastic wave equation. This method is naturally parallel, because the integration scheme is uniform at each node (grid point) and may be handled independently.
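To make that natural parallelism concrete, here is a minimal C sketch of one explicit time step of the undamped acoustic scheme on a local subgrid (the function and array names are ours, for illustration; this is not the production code):

    /* One explicit time step of a second-order centered-difference
     * update for u_tt = c^2 rho0 div((1/rho0) grad u) + F.
     * A minimal sketch with hypothetical names; not the production code.
     * The arrays are (nx+2) x (nz+2), including one ghost cell on each
     * side that holds neighbor or physical-boundary data. */
    #include <stddef.h>

    #define IDX(j, k, nz) ((size_t)(j) * ((nz) + 2) + (k))

    void acoustic_step(double *u_new, const double *u_cur, const double *u_old,
                       const double *c2rho,   /* c^2 * rho0 at each point   */
                       const double *b,       /* 1/rho0 at each point       */
                       const double *f,       /* body force at current step */
                       int nx, int nz, double dt, double dx, double dz)
    {
        double rx = (dt * dt) / (dx * dx);
        double rz = (dt * dt) / (dz * dz);
        for (int j = 1; j <= nx; j++) {
            for (int k = 1; k <= nz; k++) {
                size_t i = IDX(j, k, nz);
                /* midpoint coefficients from averaged neighbor values */
                double bxp = 0.5 * (b[IDX(j+1, k, nz)] + b[i]);
                double bxm = 0.5 * (b[IDX(j-1, k, nz)] + b[i]);
                double bzp = 0.5 * (b[IDX(j, k+1, nz)] + b[i]);
                double bzm = 0.5 * (b[IDX(j, k-1, nz)] + b[i]);
                double lap =
                    rx * (bxp * (u_cur[IDX(j+1, k, nz)] - u_cur[i])
                        - bxm * (u_cur[i] - u_cur[IDX(j-1, k, nz)]))
                  + rz * (bzp * (u_cur[IDX(j, k+1, nz)] - u_cur[i])
                        - bzm * (u_cur[i] - u_cur[IDX(j, k-1, nz)]));
                u_new[i] = 2.0 * u_cur[i] - u_old[i]
                         + c2rho[i] * lap + dt * dt * f[i];
            }
        }
    }

Each updated value depends only on the two previous time levels at neighboring points, so the loop nest can be split into strips or patches, with only a one-cell ghost border to exchange between processors.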
Sidebar: Wave propagation models

Figure A1 shows an acoustic model of a salt dome lying between 500 and 1,000 meters under three layers of different homogeneous materials that are separated at 200 meters and 400 meters. The model dimensions are 1,000 x 1,000 meters; sound-speed values vary from 1,000 to 5,000 meters per second, and density varies from 1,000 to 5,500 kilograms per cubic meter. The causal source is a surface explosion set off at 400 meters, with frequencies from 3 to 7 Hz. Figure A2 shows pressure distribution (wave propagation) at 0.2, 0.3, 0.4, and 0.5 seconds. The remaining parameters for the acoustic model are dt = 0.001 second and dx = dz = 10 meters.

[Figure A. Acoustic wave simulation: (1) the model; (2) pressure distribution at 0.2, 0.3, 0.4, and 0.5 seconds.]

Figure B1 shows the elastic model, which is the same as the acoustic model except that a fluid-saturated dome lies from 600 to 800 meters and there is an outcropping at the surface. The dome demonstrates S wave generation (not present in acoustic simulations), while the slanted interface at the surface shows the importance of accurate surface boundary conditions for the elastic model. The source, located at x = 600 meters and z = 200 meters, is a compressional spherical source with amplitude in time given by the derivative of a Gaussian. Figure B2 shows particle motion (wave propagation) energy at 0.2, 0.3, 0.4, and 0.5 seconds. The remaining parameters are the same as for the acoustic model.

[Figure B. Elastic wave simulation: (1) the model; (2) particle motion energy at 0.2, 0.3, 0.4, and 0.5 seconds.]

Typically, the region surrounding the explosion site has no physical boundaries, so the numerical simulation should minimize spurious (artificial) reflections off the numerical boundary. We use numerical boundary conditions called absorbing boundary conditions to reduce or eliminate spurious reflections. Since only the processors handling the model's outer edges calculate boundary conditions, this presents a load-balancing problem. We address this problem using a damping method,5 remembering that absorbing boundary conditions are at best approximately absorbing. This method requires a modified wave equation (at the boundary only) that artificially maintains load balancing. This equation requires more calculations at the interior points, but most absorbing boundary conditions are computationally intensive.

The difference approximation to the acoustic equation without damping is

$$u_{j,k}^{n+1} = 2u_{j,k}^{n} - u_{j,k}^{n-1} + \Delta t^2\, c_{j,k}^2\, \rho_{0,j,k} \left[ \frac{b_{j+\frac12,k}\big(u_{j+1,k}^{n} - u_{j,k}^{n}\big) - b_{j-\frac12,k}\big(u_{j,k}^{n} - u_{j-1,k}^{n}\big)}{\Delta x^2} + \frac{b_{j,k+\frac12}\big(u_{j,k+1}^{n} - u_{j,k}^{n}\big) - b_{j,k-\frac12}\big(u_{j,k}^{n} - u_{j,k-1}^{n}\big)}{\Delta z^2} \right] + \Delta t^2 F_{j,k}^{n}$$

where the half-index coefficients are averages of neighboring values of $b_{j,k} = 1/\rho_0(x_j, z_k)$. The finite-difference equation that includes the absorbing boundary conditions is

$$u_{j,k}^{n+1} = v_{j,k} - A_{j,k}\big(u_{j,k}^{n} - u_{j,k}^{n-1}\big)$$

where $v_{j,k}$ is the expression computed in the difference approximation without damping and $A_{j,k}$ is the damping weight. This forms a five-point star for $u$. The difference equations for the elastic equation are similar,4 but they include mixed differences for the cross-derivative terms, and the difference stencil is a nine-point star. The elastic equation's free-surface boundary conditions are difficult to solve using finite differences. We use an implicit method,6 which creates a system that has four bands and must be solved at each time iteration. Directly inverting this matrix is essentially a sequential algorithm, so we solve this system by iterative methods in order to keep the code parallel.

PVM implementation

The parallel version of our acoustic wave propagation simulator uses the host/node approach. The host program performs I/O and dictates the domain decomposition to the node program. The node program gathers and distributes information needed for and produced by the finite-difference calculations, and communicates iterative interprocessor boundary solutions to neighboring nodes with respect to the problem's domain decomposition. This is a 2D decomposition, so we can divide the domain into strips to exploit available vector processors or into patches to reduce communication packet size. Our timings show that this flexibility can help achieve optimal speedup.

The node program calculates communication pathways by assigning node values from 0 to n-1 to the processors. This maintains nearest-neighbor communication, although PVM obtains no computational advantage from it. The communication of the iteration interprocessor boundary solutions synchronizes the node programs, while requirements for output synchronize the host/node programs.
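The boundary exchange itself reduces to a few calls from the PVM procedural library per time step. The sketch below is written against the PVM 3 interface; the message tag, neighbor task IDs, and strip-wise layout are illustrative assumptions, not the production code:

    /* Exchange interprocessor boundary columns with nearest neighbors.
     * A sketch against the PVM 3 library interface; the tag, neighbor
     * tids, and strip decomposition are assumptions for illustration. */
    #include "pvm3.h"

    #define EDGE_TAG 10   /* hypothetical message tag */

    void exchange_edges(double *left_col, double *right_col,
                        double *left_ghost, double *right_ghost,
                        int nz, int left_tid, int right_tid)
    {
        if (left_tid >= 0) {                /* send my leftmost column   */
            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(left_col, nz, 1);
            pvm_send(left_tid, EDGE_TAG);
        }
        if (right_tid >= 0) {               /* send my rightmost column  */
            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(right_col, nz, 1);
            pvm_send(right_tid, EDGE_TAG);
        }
        if (left_tid >= 0) {                /* receive left neighbor's   */
            pvm_recv(left_tid, EDGE_TAG);   /* edge into my ghost column */
            pvm_upkdouble(left_ghost, nz, 1);
        }
        if (right_tid >= 0) {               /* receive right neighbor's  */
            pvm_recv(right_tid, EDGE_TAG);  /* edge into my ghost column */
            pvm_upkdouble(right_ghost, nz, 1);
        }
    }

Because every node runs the same exchange each iteration, the blocking receives also provide the node-program synchronization described above.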
The parallel version of the elastic wave propagation simulator is similar, but we add the implicit method to incorporate stress-free surface conditions. Since these calculations occur only at the surface nodes, load balancing and node-program synchronization become issues. We do not have to reorganize the data structure among the nodes, because the implicit method uses the same finite-difference stencil. However, we also use the conjugate gradient squared algorithm7 as a solver, which yields five barriers to parallelization that involve both inner products and a matrix multiply. The inner products require a global sum across surface nodes, but the associated communication packet is small. For the matrix multiply, the necessary matrix components are locally available, but vector components that correspond to the off-diagonal bands are not resident and must be gathered using nearest-neighbor communication between surface nodes.
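The global sum behind each inner product can be realized by collecting partial sums at one node and rebroadcasting the total. A sketch of one possible PVM realization (the tag and the choice of node 0 as collector are illustrative assumptions, not the production code):

    /* Global sum across the surface nodes: each node sums its local
     * portion, node 0 collects the partials and rebroadcasts the total.
     * A sketch of one possible PVM realization; tag and collector role
     * are assumptions for illustration. */
    #include "pvm3.h"

    #define SUM_TAG 20    /* hypothetical message tag */

    double global_dot(const double *x, const double *y, int n,
                      int my_node, int nnodes, const int *tids)
    {
        double s = 0.0, t;
        for (int i = 0; i < n; i++)        /* local partial inner product */
            s += x[i] * y[i];

        if (my_node == 0) {                /* node 0 accumulates ...      */
            for (int p = 1; p < nnodes; p++) {
                pvm_recv(-1, SUM_TAG);     /* -1: accept any sender       */
                pvm_upkdouble(&t, 1, 1);
                s += t;
            }
            pvm_initsend(PvmDataDefault);  /* ... and broadcasts the total */
            pvm_pkdouble(&s, 1, 1);
            pvm_mcast((int *)(tids + 1), nnodes - 1, SUM_TAG);
        } else {
            pvm_initsend(PvmDataDefault);  /* send my partial to node 0   */
            pvm_pkdouble(&s, 1, 1);
            pvm_send(tids[0], SUM_TAG);
            pvm_recv(tids[0], SUM_TAG);    /* wait for the global total   */
            pvm_upkdouble(&s, 1, 1);
        }
        return s;
    }

Each such reduction is a synchronization point for the surface nodes, but the packet is a single double, so latency rather than bandwidth dominates its cost.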
Computational results

We analyzed the forward problem for two models that are important to geophysicists (see the second sidebar). We ran an acoustic model and an elastic model (with and without the free-surface solve) 10 times each on configurations of one to six processors, for a total of 60 runs per model.

Figure 1 shows the overall communication time for each simulation. The acoustic simulation requires less communication time because we are solving only for the pressure, as opposed to solving for vertical and horizontal displacements. However, the elastic simulation seems to take slightly less communication time with the free-surface solve than without it. This anomaly indicates that the synchronization step in the surface solve alleviated a communication bottleneck caused by network saturation. Since the amount of information communicated is constant, the flattening of the curves indicates how well PVM software parallelizes codes.

[Figure 1. Communication time, plotted against the number of processors (one to six) for the acoustic, elastic, and elastic-with-free-surface-solve simulations.]

Figure 2 shows the ideal computational time for each simulation, given PVM overhead and the chosen algorithms. Highly parallel algorithms such as those we use reduce inhibition of parallelization. The matrix solve, used for the elastic simulation with the free-surface solve, drastically inhibits parallelization because many processors are unused. Figure 3 shows the actual time for each simulation, indicating overall speedups of 5.14 for the acoustic simulation, 5.42 for the elastic simulation, and 4.75 for the elastic simulation with the free-surface solve. Compared to the ideal times in Figure 2, the acoustic simulations balance computation and communication poorly, while the elastic simulations are more evenly balanced. This indicates a need to analyze this balance thoroughly when running distributed code. Figure 4 factors out the startup time to indicate how the host-node paradigm performs. PVM initialization requires only a few seconds and thus does not inhibit speedup.

[Figure 2. Ideal computational time. Figure 3. Actual computational time. Figure 4. Timestep time. Each is plotted against the number of processors (one to six) for the three simulations.]

Future directions

The major performance difference between the acoustic and elastic models was communication overhead caused by the elastic model's free-surface constraints. The matrix computations arising from the free-surface conditions are similar to those for elliptic differential equations. One way to parallelize this computation is to diagonally precondition the system and then parallelize the matrix-multiply part of a preconditioned conjugate-gradient-type iterative procedure. This also requires parallelizing the scalar product and a global sum. Techniques for these parallel computations are available.

As the application's size increases and the discretization sizes decrease, the condition number of the matrix described above will increase significantly, and the diagonal preconditioner will be less effective. Therefore, we are developing better parallelization methods based on domain decomposition.10 We have written a general additive Schwarz overlapping-domain code9 that physically splits the domain; the size of the overlap between regions is given as an input parameter and controls communication. We can then locally apply multigrid methods to give good local preconditioning.
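The overlap bookkeeping itself is simple. The following sketch (a hypothetical helper, for a 1D split along x) shows how the input overlap parameter widens each processor's region and thereby sets the amount of data exchanged between neighboring regions:

    /* Compute a processor's owned range and its overlapped (extended)
     * range for an additive Schwarz splitting along x.  A hypothetical
     * helper for illustration; not the production code. */
    typedef struct {
        int own_lo, own_hi;   /* cells this processor owns           */
        int ext_lo, ext_hi;   /* owned cells plus the overlap region */
    } Subdomain;

    Subdomain split_domain(int ncells, int nprocs, int rank, int overlap)
    {
        Subdomain s;
        int base = ncells / nprocs, rem = ncells % nprocs;
        /* distribute the remainder one cell at a time to low ranks */
        s.own_lo = rank * base + (rank < rem ? rank : rem);
        s.own_hi = s.own_lo + base + (rank < rem ? 1 : 0) - 1;
        /* widen by `overlap` cells on each side, clipped to the grid */
        s.ext_lo = s.own_lo - overlap;
        if (s.ext_lo < 0) s.ext_lo = 0;
        s.ext_hi = s.own_hi + overlap;
        if (s.ext_hi > ncells - 1) s.ext_hi = ncells - 1;
        return s;
    }

A larger overlap improves the local preconditioner but increases the data exchanged per iteration, which is exactly the trade-off the input parameter exposes.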
PVM does have disadvantages. It cannot exploit nearest-neighbor communication. Since it depends on existing networks, communications must follow network packet protocols, so several machines may process a message before it reaches its destination. Also, the network could become a significant bottleneck. For many applications, speedups will become less significant as processors are added and network communication becomes saturated. Finally, we performed our simulation on homogeneous hardware in an isolated network; performance will probably degrade in a heterogeneous environment or a network with heavy or bursty traffic. A heterogeneous system will also cause additional load-balancing problems, and PVM may not be suitable for some algorithms in a heterogeneous environment due to incompatible processors and inaccuracies in their math libraries. However, these drawbacks should diminish as network technologies improve, as the Open Software Foundation addresses system compatibility, and as PVM undergoes continued development.

ACKNOWLEDGMENTS

We thank Patrick K. Malone, Christian Turner, and Phillip Crotwell for their help, and the University of South Carolina and Westinghouse Savannah River Laboratory for use of the IBM RS/6000 computing ring. This work was supported in part by the National Science Foundation under grants EHR-910-8774, EHR-910-8772, and INT-89-14472.

REFERENCES

1. A. Beguelin et al., "A User's Guide to PVM Parallel Virtual Machine," Tech. Report ORNL/TM-11826, Oak Ridge Nat'l Laboratory, Oak Ridge, Tenn., 1991.
2. G.A. Geist and V.S. Sunderam, "Network-Based Concurrent Computing on the PVM System," to appear in Concurrency: Practice and Experience.
3. K.R. Kelly et al., "Modeling: The Forward Method," in Concepts and Techniques in Oil and Gas Exploration, K.C. Jain and R.J.P. de Figueiredo, eds., Soc. of Exploration Geophysicists, Tulsa, Okla., 1982.
4. J.S. Sochacki et al., "Interface Conditions for Acoustic and Elastic Wave Propagation," Geophysics, Vol. 56, No. 2, 1991, pp. 161-181.
5. J.S. Sochacki et al., "Absorbing Boundary Conditions and Surface Waves," Geophysics, Vol. 52, No. 1, 1987, pp. 60-71.
6. J.E. Vidale and R.W. Clayton, "A Stable Free-Surface Boundary Condition for Two-Dimensional Elastic Wave Propagation," Geophysics, Vol. 51, No. 12, 1986, pp. 2247-2249.
7. P. Sonneveld, "CGS: A Fast Lanczos-Type Solver for Nonsymmetric Linear Systems," SIAM J. Scientific and Statistical Computing, Vol. 10, No. 1, 1989, pp. 36-52.
8. J.S. Sochacki et al., "Seismic Modeling and Inversion on the nCube," Fifth Distributed Memory Computing Conference, Vol. 1, IEEE Computer Society Press, Los Alamitos, Calif., 1990, pp. 530-535.
9. R.E. Ewing et al., "Parallelization of Multiphase Models for Contaminant Transport in Porous Media," in Parallel Processing for Scientific Computing, Vol. 1, R. Sincovec et al., eds., 1993, pp. 83-91.
10. J.H. Bramble et al., "Convergence Estimates for Product Iterative Methods with Applications to Domain Decomposition," Math. Comp., Vol. 57, 1991, pp. 23-45.

Richard E. Ewing is professor of mathematics and engineering, director of the Institute for Scientific Computation, dean of science, and Texas Engineering Experiment Station Distinguished Research Chair at Texas A&M University. He has conducted research in numerical analysis, mathematical modeling, fluid flow in porous media, and large-scale scientific computation. He has more than 180 scientific publications in journals, books, and proceedings. He received his PhD in mathematics from the University of Texas at Austin. Readers can contact Ewing at the Institute for Scientific Computation, Texas A&M University, College Station, TX 77843.

Robert C. Sharpley is a professor of mathematics at the University of South Carolina. He is an editor of Constructive Approximation and the author of two research monographs and thirty research articles in approximation theory, functional analysis, numerical analysis, computational science, Fourier analysis, and partial differential equations. He received his PhD in mathematics from the University of Texas at Austin in 1972. Readers can contact Sharpley at the Department of Mathematics, University of South Carolina, Columbia, SC 29208.

Derek Mitchum is a graphics specialist in the Department of Mathematics at the University of South Carolina. He was previously a systems programmer for the University of Wyoming. Readers can contact him at the Department of Mathematics, University of South Carolina, Columbia, SC 29208.

James Sochacki is an applied mathematician in the Department of Mathematics at James Madison University. His research interests include linear and nonlinear wave propagation, especially the numerical approximation of these equations. He is also developing an interdisciplinary undergraduate mathematical modeling center. Readers can contact Sochacki at the Department of Mathematics, James Madison University, Harrisonburg, VA 22807.

Patrick O'Leary is a research scientist in mathematics at the University of Wyoming. His current research interests include parallelism, scientific visualization, and mathematical modeling. Readers can contact O'Leary at the Institute for Scientific Computation, University of Wyoming, Laramie, WY 82071.