Parallelization of Phase-Field Model for Phase Transformation Problems in a Flow Field

Ying Xu^a, J. M. McDonough^b and K. A. Tagavi^a

^a Department of Mechanical Engineering, University of Kentucky, Lexington, KY 40506-0503

^b Departments of Mechanical Engineering and Mathematics, University of Kentucky, Lexington, KY 40506-0503

We implement parallelization of a phase-field model for solidification in a flow field using OpenMP and MPI to compare their parallel performance. The 2-D phase-field and Navier–Stokes equations are presented with prescribed boundary and initial conditions corresponding to lid-driven-cavity flow. Douglas & Gunn time-splitting is applied to the discrete governing equations, and a projection method is employed to solve the momentum equations. Freezing from a supercooled melt of nickel is initiated in the center of the square domain. The approach taken to parallelize the algorithm is described, and results are presented for both OpenMP and MPI, with the latter being decidedly superior.

1. INTRODUCTION

Phase-field models have been applied to simulation of phase transformation problems for decades. Initially most researchers focused on pure substances in the 2-D case and did not consider convection induced by a velocity field or small-scale fluctuations, but phase transitions in binary alloys and solidification in the presence of convection have attracted increasing interest in recent studies. Anderson et al. [1] derived a phase-field model with convection and gave some simple examples of equilibrium of a planar interface, density-change flow and shear flow based on the model they obtained; they also applied sharp-interface asymptotic analysis to this phase-field model with convection [2]. Beckermann et al. [3] provided a phase-field model with convection using similar volume- or ensemble-averaging methods. They also presented more complex examples, such as simulation of convection and coarsening in an isothermal mush of a binary alloy and dendritic growth in the presence of convection; the phase-field and energy equations were solved using an explicit method in that work. Al-Rawahi and Tryggvason [4] simulated 2-D dendritic solidification with convection using a somewhat different method based on front tracking.

We remark that all previous research on solidification in a flow field involved length and time scales of microns and nanoseconds, respectively, which is not appropriate for studies of freezing in typical flow fields such as rivers and lakes, or industrial molds and castings. Therefore, the concepts of multiscale methods should be introduced into phase-field models with convection. Even such formulations still have the drawback of being very CPU intensive. Hence, it is necessary to apply parallelization to phase-field model algorithms in order to decrease wall-clock time to within practical limits.

We introduce the phase-field model with convection in the first part of this paper, followed by numerical solutions corresponding to growth of dendrites in a supercooled melt of nickel. Finally we discuss the approach to parallelization and the speedups obtained. Parallelization via both OpenMP and MPI has been implemented on a symmetric multiprocessor; the latter is found to be significantly more effective but required considerably more programming effort.

2. GOVERNING EQUATIONS OF PHASE-FIELD MODEL WITH CONVECTION

In this section we introduce the equations of the phase-field model including effects of convective transport on macroscopic scales.
The difference in length scales between the dendrites and the flow field makes it unreasonable to employ a dimensionless form of the governing equations, since there is no single appropriate length scale; therefore, dimensional equations are used. Boundary and initial conditions required to formulate a well-posed mathematical problem are also prescribed. The coupled 2-D Navier–Stokes equations and phase-field model are

$$u_x + v_y = 0\,, \tag{1a}$$

$$u_t + (u^2)_x + (uv)_y = -\frac{1}{\rho_0}\,p_x + \frac{\mu(\phi)}{\rho_0}\,\Delta u + X_1(\phi)\,, \tag{1b}$$

$$v_t + (uv)_x + (v^2)_y = -\frac{1}{\rho_0}\,p_y + \frac{\mu(\phi)}{\rho_0}\,\Delta v + X_2(\phi) - \frac{\rho(\phi,T)-\rho_L}{\rho_0}\,g\,, \tag{1c}$$

$$\phi_t + (u\phi)_x + (v\phi)_y = \frac{\epsilon^2}{M}\,\nabla\cdot\big((\xi\cdot\nabla\phi)\,\xi\big) - \frac{30\rho_0 L_0}{M T_m}\,\psi(\phi)(T_m - T) - \frac{\rho_0}{aM}\,\psi'(\phi)\,T + Y(\phi,T)\,, \tag{1d}$$

$$T_t + (uT)_x + (vT)_y = \frac{k}{\rho_0 c_p(\phi)}\,\Delta T + \frac{\epsilon^2}{2\rho_0 c_p(\phi)}\,\nabla\cdot\big((\xi\cdot\nabla\phi)\,\xi\big) - \frac{30 L_0\,\psi(\phi)}{c_p(\phi)}\,\frac{D\phi}{Dt} + W(u,v,\phi,T)\,, \tag{1e}$$

to be solved on a bounded domain $\Omega \subseteq \mathbb{R}^2$. In these equations coordinate subscripts $x$, $y$, $t$ denote partial differentiation, and $\nabla$ and $\Delta$ are the gradient and Laplace operators in the coordinate system imposed on $\Omega$; $D/Dt$ is the usual material derivative. Here $u$ and $v$ are velocities in the $x$ and $y$ directions; $p$, $T$, $\phi$ are gauge pressure, temperature and the phase-field variable, respectively; $L_0$ is the latent heat per unit mass at the melting temperature $T_m$, $k$ is the thermal conductivity, and $\rho_0$ is the reference density. In our computations, density $\rho$, dynamic viscosity $\mu$ and specific heat $c_p$ are no longer constants; they are constant in each bulk phase, but are functions of the phase-field variable over the thin interface separating the bulk phases. In addition, since the Boussinesq approximation is used to represent the buoyancy force term in Eq. (1c), density is also a function of temperature. We express the density, dynamic viscosity and specific heat in the forms

$$\rho(\phi,T) = \rho_S + P(\phi)\,[\rho_L - \rho_S + \beta\rho_L(T - T_m)]\,,$$
$$\mu(\phi) = \mu_S + P(\phi)\,(\mu_L - \mu_S)\,,$$
$$c_p(\phi) = c_{p_S} + P(\phi)\,(c_{p_L} - c_{p_S})\,,$$

where subscripts $S$ and $L$ denote solid and liquid, respectively, and $\beta$ is the coefficient of thermal volumetric expansion. $\psi(\phi)$ is a double-well potential, and $P(\phi)$ is a polynomial introduced to represent the interface; these functions are commonly given as polynomials in $\phi$:

$$\psi(\phi) = \phi^2(1-\phi)^2\,, \qquad P(\phi) = 6\phi^5 - 15\phi^4 + 10\phi^3\,.$$

Other parameters in Eqs. (1) are

$$\epsilon^2 = 6\sqrt{2}\,\sigma\delta\,, \qquad a = \frac{\rho_0 T_m \delta}{6\sqrt{2}\,\sigma}\,, \qquad M = \frac{\rho_0 L_0 \delta}{T_m \mu_k}\,,$$

where $\epsilon^2$ is the positive gradient coefficient, related to the interfacial thickness $\delta$ in such a way that as $\delta \to 0$ the phase-field model approaches the modified Stefan model; $\mu_k$ is the kinetic coefficient, and $M$ is the mobility, which is related to the inverse of the kinetic coefficient; $a$ is a positive parameter occurring in the double-well potential and is related to the surface tension $\sigma$.

The kinetic coefficient $\mu_k$ is a microscopic physical parameter reflecting the kink density at steps on the solid surface and the atom exchange rate at each kink, as explained in Chernov [5] and Ookawa [6]. It is a function of temperature and orientation, and also depends on the material; the kinetic coefficients of metals are the highest among all materials. Linear variation of the kinetic coefficient might be reasonable for a molecularly rough interface; however, it depends strongly on orientation of the interface for facetted interfaces, as shown by Langer [7]. For simplicity, the kinetic coefficient is usually assumed to be constant, as we do herein, but it is difficult to determine the value of this constant either theoretically or experimentally.
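It is worth noting that $P'(\phi) = 30\phi^2(1-\phi)^2 = 30\,\psi(\phi)$, which is why the factor $30\,\psi(\phi)$ appears in the phase-change source terms of Eqs. (1d) and (1e). As a concrete illustration, the following Fortran 90 sketch evaluates these $\phi$-dependent properties pointwise. It is a minimal sketch only: the module and function names are our own, and the nickel property values are rough placeholders, not values taken from this paper.

   ! Sketch: pointwise evaluation of the phase-dependent properties.
   ! Only the formulas for psi, P, rho, mu and cp come from the text;
   ! all names and numerical values below are illustrative placeholders.
   module phase_properties
      implicit none
      real(8), parameter :: rho_S = 8.9d3, rho_L = 7.9d3  ! kg/m^3 (placeholders)
      real(8), parameter :: mu_S  = 1.0d2, mu_L  = 5.0d-3 ! Pa.s; large solid value
                                                          ! mimics rigidity of solid
      real(8), parameter :: cp_S  = 4.4d2, cp_L  = 6.2d2  ! J/(kg.K) (placeholders)
      real(8), parameter :: beta  = 1.0d-4                ! 1/K (placeholder)
      real(8), parameter :: T_m   = 1728.0d0              ! K, melting point of nickel
   contains
      pure function psi(phi) result(w)     ! double-well potential
         real(8), intent(in) :: phi
         real(8) :: w
         w = phi**2 * (1.0d0 - phi)**2
      end function psi

      pure function P(phi) result(p5)      ! interface polynomial, P' = 30*psi
         real(8), intent(in) :: phi
         real(8) :: p5
         p5 = 6.0d0*phi**5 - 15.0d0*phi**4 + 10.0d0*phi**3
      end function P

      pure function rho(phi, T) result(r)  ! Boussinesq-type density
         real(8), intent(in) :: phi, T
         real(8) :: r
         r = rho_S + P(phi)*(rho_L - rho_S + beta*rho_L*(T - T_m))
      end function rho

      pure function mu(phi) result(m)      ! dynamic viscosity
         real(8), intent(in) :: phi
         real(8) :: m
         m = mu_S + P(phi)*(mu_L - mu_S)
      end function mu

      pure function cp(phi) result(c)      ! specific heat
         real(8), intent(in) :: phi
         real(8) :: c
         c = cp_S + P(phi)*(cp_L - cp_S)
      end function cp
   end module phase_properties

Since $P(0) = 0$ and $P(1) = 1$, each property reduces to its solid value where $\phi = 0$ and to its liquid value where $\phi = 1$, varying smoothly only across the thin interface.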
The forcing terms in Eqs. (1) take the forms

$$X_1(\phi) = -\frac{\epsilon^2}{\rho_0}\,\phi_x\big(\xi_1^2\phi_{xx} + 2\xi_1\xi_2\phi_{xy} + \xi_2^2\phi_{yy}\big)\,, \tag{5a}$$

$$X_2(\phi) = -\frac{\epsilon^2}{\rho_0}\,\phi_y\big(\xi_1^2\phi_{xx} + 2\xi_1\xi_2\phi_{xy} + \xi_2^2\phi_{yy}\big)\,, \tag{5b}$$

$$Y(\phi,T) = \frac{30\,p}{\rho_0 M}\,\psi(\phi)\,[\rho_L - \rho_S + \beta\rho_L(T - T_m)]\,, \tag{5c}$$

$$W(u,v,\phi,T) = \frac{\mu(\phi)}{\rho_0 c_p(\phi)}\Big[2\big(u_x^2 + v_y^2\big) + (u_y + v_x)^2\Big] + \frac{\epsilon^2}{4\rho_0 c_p(\phi)}\big(\xi_1^2\phi_x^2 - \xi_2^2\phi_y^2\big)(v_y - u_x) - \frac{\epsilon^2}{2\rho_0 c_p(\phi)}\Big[v_x\big(\xi_1\xi_2\phi_x^2 + \xi_2^2\phi_x\phi_y\big) + u_y\big(\xi_1^2\phi_x\phi_y + \xi_1\xi_2\phi_y^2\big)\Big]\,. \tag{5d}$$

In the above equations, we have introduced the vector

$$\xi = (\xi_1,\xi_2) = |\xi|\left(\frac{\phi_x}{|\nabla\phi|}\,,\;\frac{\phi_y}{|\nabla\phi|}\right) = \big[1 + \epsilon_m\cos m(\theta + \alpha)\big]\left(\frac{\phi_x}{|\nabla\phi|}\,,\;\frac{\phi_y}{|\nabla\phi|}\right) \tag{6a}$$

to represent anisotropy in the interfacial energy and kinetics for a crystal of cubic symmetry with anisotropy strength $\epsilon_m$; $m$ determines the mode of symmetry of the crystal; $\theta = \arctan(\phi_y/\phi_x)$ is the angle between the interface normal and the crystal axis, and $\alpha$ denotes the angle of the symmetry axis with respect to the x-axis. The forcing terms $X_1$ and $X_2$ represent the effects introduced by the solidification process on the flow field; they are nonzero only over the interface region. $Y(\phi,T)$ is a modification to the Stefan condition caused by the buoyancy force term; $W(u,v,\phi,T)$ is the viscous dissipation in the energy equation, and it can usually be neglected since it is small compared with other dissipative terms in that equation.

On the domain $\Omega \equiv [0,l] \times [0,l]$, the prescribed boundary conditions are

$$u = 0 \;\text{ on } \partial\Omega\setminus\{(x,y)\,|\,y = l\}\,, \qquad u = U \;\text{ on } \{(x,y)\,|\,y = l\}\,, \qquad v \equiv 0 \;\text{ on } \partial\Omega\,,$$
$$\frac{\partial p}{\partial n} = 0 \;\text{ on } \partial\Omega \cup \partial\Omega_0\,, \qquad \frac{\partial\phi}{\partial n} = 0 \;\text{ on } \partial\Omega\,, \qquad \frac{\partial T}{\partial n} = 0 \;\text{ on } \partial\Omega\,.$$

Initial conditions are

$$u_0 = v_0 = 0 \;\text{ and }\; T_0 < T_m \;\text{ on } \Omega\,, \qquad \phi_0 = 0 \;\text{ in } \Omega_0 \equiv \big\{(x,y)\;\big|\;|x| + |y| \le l_c\,,\; (x,y) \in [-l_c,l_c]^2\big\}\,,$$

where $l_c$ is one half the length of the diagonal of a 45°-rotated square in the center of the domain.

3. NUMERICAL METHODS AND RESULTS

The governing equations (1b–e) are four coupled nonlinear parabolic equations in conserved form. We apply a projection method due to Gresho [8] to solve the momentum equations while preserving the divergence-free constraint (1a). The Shuman filter [9] is applied to solutions of the momentum equations (1b), (1c) prior to projection to remove aliasing due to under-resolution. Since time-splitting methods are efficient for solving multi-dimensional problems by decomposing them into sequences of 1-D problems, a δ-form Douglas & Gunn [10] procedure is applied to the current model; the structure of the two split steps is sketched below. Quasilinearization of Eqs. (1b–d) is constructed by Fréchet–Taylor expansion in "δ-form" as described by Ames [11] and Bellman and Kalaba [12].
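To indicate that structure, consider a generic semi-discrete 2-D parabolic problem $u_t = (A_x + A_y)u + s$, where $A_x$ and $A_y$ are the matrices of x- and y-direction difference operators; this is a sketch of the generic δ-form Douglas & Gunn construction, not the specific quasilinearized operators of Eqs. (1b–e). With time step $k$ and implicit weight $\theta$ (trapezoidal integration for $\theta = 1/2$), the two split steps are

$$\big(I - \theta k A_x\big)\,\delta u^{(1)} = k\big[(A_x + A_y)\,u^n + s\big]\,,$$
$$\big(I - \theta k A_y\big)\,\delta u = \delta u^{(1)}\,,$$
$$u^{n+1} = u^n + \delta u\,.$$

The first step requires only tridiagonal solves along x-lines and the second only tridiagonal solves along y-lines, while the splitting formally maintains the order of accuracy of the unsplit scheme. This line-by-line structure is what makes the direction-by-direction domain decomposition of Section 4 natural.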
The computations are performed on a square domain $\Omega \equiv [0, 63\ \mathrm{cm}] \times [0, 63\ \mathrm{cm}]$. Initially, the material considered here, nickel, is supercooled by an amount $\Delta T = T_m - T = 224$ K, and freezing begins from a small, rotated square with half-diagonal length $l_c = 1.89$ cm in the center of the domain. The spatial and time step sizes are $\Delta x = \Delta y = 0.315$ cm and $\Delta t = 10^{-6}$ s, respectively, and the length scale for the interfacial thickness is $\delta = 0.105$ cm. The kinetic coefficient $\mu_k$ is chosen to be 2.85 m/(s·K).

Lid-driven-cavity flow is initiated at t = 100 s with U = 1 cm/s; Figure 1 displays the velocity field at t = 100.1 s. The dendrite shape evolves from the initial rotated square to an approximate circle, as shown in Fig. 1. We observe that velocity near the solid-liquid interface is greater than in the surrounding flow, a direct result of the forcing terms $X_1$ and $X_2$ in the momentum equations. We also have found that the flow field has a significant effect on the growth rate of dendrites, although this is not evident from Fig. 1; in particular, the growth rate is decreased by the flow field.

Figure 1. Lid-Driven-Cavity Flow with Freezing from Center at t = 100.1 s

4. APPROACH TO PARALLELIZATION AND RESULTS

Parallelization of the numerical solution procedure is carried out on shared-memory hardware using the HP Fortran 90 compiler under HP-UX. The program is parallelized using OpenMP and MPI running on the HP SuperDome, a symmetric multiprocessor, at the University of Kentucky Computing Center to compare the parallel performance of these two approaches. The maximum number of processors available on a single hypernode of the HP SuperDome is 64, and in the current study each processor is used to compute one part of the whole domain. For the parallelization studies the grid is set at 201 × 201 points, corresponding to the domain size 63 cm × 63 cm.

The procedure for parallelizing two-step Douglas & Gunn time-splitting with MPI is to compute different parts of the domain on different processors, i.e., simply a crude form of domain decomposition. In particular, we divide the domain into n equal pieces along the separate directions corresponding to each split step, where n is the number of processors being used. That is, we first divide the domain in the x direction during the first time-splitting step, and then in the y direction during the second step. Therefore, a redistribution of data among the processors is required between the two steps of the time-splitting; a sketch of this is shown in Fig. 2, and a code sketch is given at the end of this section. Moreover, data transfers are also needed across the adjacent subdomain boundaries of each processor. Since communication between processors increases with increasing number of processors for a fixed number of grid points, parallel performance is expected to degrade in such a case, resulting in only a sub-linear increase of speed-up for MPI.

Figure 2. Distribution of Processors for Two-Level Douglas & Gunn Time-Splitting (the domain is divided among processors 1, 2, ..., n in the x direction during splitting step I and in the y direction during splitting step II)

The implementation of OpenMP, on the other hand, is quite straightforward: it can be done by parallelization of DO loops. All that is necessary is to declare the data within each DO loop shared or private as required by the parallelization, and this is easily handled with OpenMP syntax.

To study the speed-up achieved by parallelization, different numbers n of processors (n = 1, 2, 4, 8, 16, 32) are used to execute the algorithm until $t = 5\times10^{-3}$ s for both OpenMP and MPI. Figure 3 displays the speed-up factor versus number of processors. It shows that, as the number of processors increases, the speed-up factors increase only sub-linearly for both OpenMP and MPI; moreover, the speed-up performance of MPI is better than that of OpenMP. The curve for OpenMP in Fig. 3 also suggests that the speed-up factor attains its maximum at a number of processors only slightly beyond 32 for the present problem. Indeed, parallel efficiency is already quite low for OpenMP by 16 processors, so this is possibly the maximum number that should be used. It should also be mentioned that even though the MPI implementation has not yet been completely optimized, the CPU time of MPI runs is somewhat less than that for OpenMP. Better performance could be achieved if further optimization of MPI were applied within the context of the current algorithm. Such optimization might include use of nonblocking communication, sending noncontiguous data using pack/unpack functions, decreasing unnecessary blocking, and optimizing the number of Douglas & Gunn split line solves simultaneously sent to each processor. The last of these can significantly alter the communication-to-computation time tradeoff.
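The following Fortran 90 sketch illustrates the two strategies for one time step: an OpenMP-parallelized loop over the independent line solves of split step I, and an MPI redistribution of the 1-D decomposition between the split steps. All names, the array layout, and the use of MPI_ALLTOALL are our assumptions for illustration, not the authors' implementation; the routine tridiag_solve_x is a hypothetical stand-in for the actual line solver, and nx, ny are assumed divisible by the number of processes.

   module split_parallel
      use mpi
      implicit none
   contains

      ! OpenMP: the x-direction line solves of split step I are mutually
      ! independent, so the loop over y-lines is distributed over threads.
      subroutine step1_openmp(phi, rhs, nx, ny)
         integer, intent(in)    :: nx, ny
         real(8), intent(inout) :: phi(nx,ny)
         real(8), intent(in)    :: rhs(nx,ny)
         integer :: j
   !$OMP PARALLEL DO
         do j = 2, ny-1
            call tridiag_solve_x(phi, rhs, nx, ny, j)  ! hypothetical line solver
         end do
   !$OMP END PARALLEL DO
      end subroutine step1_openmp

      ! MPI: swap the 1-D decomposition between the two split steps.
      ! During step I each rank owns ny/nprocs contiguous y-rows, so every
      ! x-line is local; before step II the data are redistributed so each
      ! rank owns nx/nprocs contiguous x-columns, making y-lines local.
      subroutine swap_decomposition(slab_y, slab_x, nx, ny, nprocs, comm)
         integer, intent(in)  :: nx, ny, nprocs, comm
         real(8), intent(in)  :: slab_y(nx, ny/nprocs)  ! rows local (step I)
         real(8), intent(out) :: slab_x(ny, nx/nprocs)  ! columns local (step II)
         real(8), allocatable :: sendbuf(:), recvbuf(:)
         integer :: p, i, j, n, ierr, bx, by
         bx = nx/nprocs         ! columns per rank after the swap
         by = ny/nprocs         ! rows per rank before the swap
         allocate(sendbuf(bx*by*nprocs), recvbuf(bx*by*nprocs))
         n = 0                  ! pack: block p holds rank p's bx columns
         do p = 0, nprocs-1     !       of this rank's by rows
            do j = 1, by
               do i = 1, bx
                  n = n + 1
                  sendbuf(n) = slab_y(p*bx+i, j)
               end do
            end do
         end do
         call MPI_ALLTOALL(sendbuf, bx*by, MPI_DOUBLE_PRECISION, &
                           recvbuf, bx*by, MPI_DOUBLE_PRECISION, comm, ierr)
         n = 0                  ! unpack into the column-slab layout
         do p = 0, nprocs-1
            do j = 1, by
               do i = 1, bx
                  n = n + 1
                  slab_x(p*by+j, i) = recvbuf(n)
               end do
            end do
         end do
         deallocate(sendbuf, recvbuf)
      end subroutine swap_decomposition

   end module split_parallel

In the actual algorithm several field arrays (u, v, T, φ) would have to be redistributed in this way every time step, so how many line solves are grouped into each message directly affects the communication-to-computation tradeoff noted above.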
Figure 3. Speed-up Performance of Parallelized Phase-Field Model with Convection (speed-up factor vs. number of processors, comparing the theoretical speed-up with MPI and OpenMP results)

5. SUMMARY AND CONCLUSIONS

In this paper we have compared the parallel performance of OpenMP and MPI implementations of the 2-D phase-field model in a flow field. We found that MPI is more efficient and exhibits higher absolute performance than OpenMP. Moreover, it requires less memory than does OpenMP, since MPI supports distributed-memory while OpenMP supports shared-memory programming. Therefore, since memory requirements for our current problem are high, MPI is recommended for such problems. However, the implementation of MPI is more difficult than that of OpenMP; for example, programming effort for the current problem using MPI was approximately 100 times greater than that using OpenMP.

6. ACKNOWLEDGEMENTS

This work was supported by the Center for Computational Science of the University of Kentucky. We are also grateful to the University of Kentucky Computing Center for use of their HP SuperDome for all the computations.

REFERENCES

1. D. M. Anderson, G. B. McFadden, and A. A. Wheeler. A phase-field model of solidification with convection. Physica D, 135:175–194, 2000.
2. D. M. Anderson, G. B. McFadden, and A. A. Wheeler. A phase-field model with convection: sharp-interface asymptotics. Physica D, 151:305–331, 2001.
3. C. Beckermann, H.-J. Diepers, I. Steinbach, A. Karma, and X. Tong. Modeling melt convection in phase-field simulations of solidification. J. Comput. Phys., 154:468–496, 1999.
4. N. Al-Rawahi and G. Tryggvason. Numerical simulation of dendritic solidification with convection: two-dimensional geometry. J. Comput. Phys., 180:471–496, 2002.
5. A. A. Chernov. Surface morphology and growth kinetics. In R. Ueda and J. B. Mullin, editors, Crystal Growth and Characterization, pages 33–52. North-Holland Publishing Co., Amsterdam, 1975.
6. A. Ookawa. Physical interpretation of nucleation and growth theories. In R. Ueda and J. B. Mullin, editors, Crystal Growth and Characterization, pages 5–19. North-Holland Publishing Co., Amsterdam, 1975.
7. J. S. Langer. Models of pattern formation in first-order phase transitions. In G. Grinstein and G. Mazenko, editors, Directions in Condensed Matter Physics, pages 164–186. World Scientific, Singapore, 1986.
8. P. M. Gresho. On the theory of semi-implicit projection methods for viscous incompressible flow and its implementation via a finite element method that also introduces a nearly consistent mass matrix. Part 1: Theory. Int. J. Numer. Meth. Fluids, 11:587–620, 1990.
9. F. G. Shuman. Numerical methods in weather prediction: smoothing and filtering. Mon. Weath. Rev., 85:357–361, 1957.
10. J. Douglas Jr. and J. E. Gunn. A general formulation of alternating direction methods, Part 1: Parabolic and hyperbolic problems. Numer. Math., 6:428–453, 1964.
11. W. F. Ames. Numerical Methods for Partial Differential Equations. Academic Press, New York, NY, 1977.
12. R. E. Bellman and R. E. Kalaba. Quasilinearization and Nonlinear Boundary-Value Problems.
American Elsevier Publishing Company, Inc., New York, NY, 1965.