Parallelization of Phase-Field Model for Phase Transformation Problems
in a Flow Field
Ying Xu^a, J. M. McDonough^b and K. A. Tagavi^a

^a Department of Mechanical Engineering, University of Kentucky, Lexington, KY 40506-0503

^b Departments of Mechanical Engineering and Mathematics, University of Kentucky, Lexington, KY 40506-0503
We implement parallelization of a phase-field model for solidification in a flow field
using OpenMP and MPI to compare their parallel performance. The 2-D phase-field and
Navier–Stokes equations are presented with prescribed boundary and initial conditions
corresponding to lid-driven-cavity flow. Douglas & Gunn time-splitting is applied to the
discrete governing equations, and a projection method is employed to solve the momentum
equations. Freezing of a supercooled nickel melt is initiated in the center of the square
domain. The approach taken to parallelize the algorithm is described, and results are
presented for both OpenMP and MPI with the latter being decidedly superior.
1. INTRODUCTION
Phase-field models have been applied to simulation of phase transformation problems
for decades. Initially most researchers focused on pure substances in the 2-D case and
did not consider convection induced by a velocity field or small-scale fluctuations. But
phase transitions in binary alloys and solidification in the presence of convection have
attracted increasing interest in recent studies. Anderson et al. [1] derived a phase-field
model with convection and gave some simple examples of equilibrium of a planar interface,
density-change flow and shear flow based on the model they obtained; they also applied
sharp-interface asymptotic analysis to the phase-field model with convection [2].
Beckermann et al. [3] provided a phase-field model with convection based on volume- or
ensemble-averaging methods. They also presented more complex examples
such as simulation of convection and coarsening in an isothermal mush of a binary alloy
and dendritic growth in the presence of convection; phase-field and energy equations were
solved using an explicit method in this research. Al-Rawahi and Tryggvason [4] simulated
2-D dendritic solidification with convection using a somewhat different method based on
front tracking.
We remark that all previous research on solidification in a flow field involved length
and time scales in microns and nanoseconds, respectively, which is not appropriate for
studies of freezing in typical flow fields such as rivers and lakes, or industrial molds and
castings. Therefore, the concepts of multiscale methods should be introduced to phase-field
models with convection. Even such formulations still have the drawback of being
very CPU intensive. Hence, it is necessary to apply parallelization to phase-field model
algorithms in order to decrease wall-clock time to within practical limits.
We introduce the phase-field model with convection in the first part of this paper, followed by numerical solutions corresponding to growth of dendrites in a supercooled melt
of nickel. Finally we discuss the approach to parallelization and the speedups obtained.
Parallelization via both OpenMP and MPI has been implemented on a symmetric multiprocessor; the latter is found to be significantly more effective but requires considerably
more programming effort.
2. GOVERNING EQUATIONS OF PHASE-FIELD MODEL WITH CONVECTION
In this section we introduce the equations of the phase-field model including effects
of convective transport on macroscopic scales. The difference in length scales between
dendrites and the flow field makes it unreasonable to implement a dimensionless form of the
governing equations since there is no single appropriate length scale. Therefore, dimensional equations are employed. Boundary and initial conditions required to formulate a
well-posed mathematical problem are also prescribed.
The coupled 2-D Navier–Stokes equations and phase-field model are
u_x + v_y = 0 ,   (1a)

u_t + (u^2)_x + (uv)_y = -\frac{1}{\rho_0}\,p_x + \frac{\mu(\phi)}{\rho_0}\,\Delta u + X_1(\phi) ,   (1b)

v_t + (uv)_x + (v^2)_y = -\frac{1}{\rho_0}\,p_y + \frac{\mu(\phi)}{\rho_0}\,\Delta v + X_2(\phi) - \frac{\rho(\phi,T) - \rho_L}{\rho_0}\,g ,   (1c)

\phi_t + (u\phi)_x + (v\phi)_y = \frac{\epsilon^2}{M}\,\nabla\cdot\big((\xi\cdot\nabla\phi)\,\xi\big) - \frac{30\rho_0 L_0}{T_m M}\,\psi(\phi)\,(T_m - T) - \frac{\rho_0}{aM}\,\psi'(\phi)\,T + Y(\phi,T) ,   (1d)

T_t + (uT)_x + (vT)_y = \frac{k}{\rho_0 c_p(\phi)}\,\Delta T + \left[\frac{\epsilon^2}{\rho_0 c_p(\phi)}\,\nabla\cdot\big((\xi\cdot\nabla\phi)\,\xi\big) - \frac{30 L_0\,\psi(\phi)}{c_p(\phi)}\right]\frac{D\phi}{Dt} + W(u,v,\phi,T) ,   (1e)
to be solved on a bounded domain Ω ⊆ R². In these equations coordinate subscripts x, y,
t denote partial differentiation, and ∇ and ∆ are gradient and Laplace operators in the
coordinate system imposed on Ω; D/Dt is the usual material derivative. Here u and v are
velocities in x and y directions; p, T , φ are gauge pressure, temperature and phase field
variable, respectively; L0 is the latent heat per unit mass at the melting temperature Tm ,
and k is thermal conductivity; ρ0 is the reference density. In our computations, density ρ,
dynamic viscosity µ and specific heat cp are no longer constants; they are constant in each
bulk phase, but are functions of the phase-field variable over the thin interface separating
the bulk phases. In addition, since the Boussinesq approximation is used to represent the
buoyancy force term in Eq. (1c), density is also a function of temperature. We express
the density, dynamic viscosity and specific heat in the forms:
ρ(φ, T ) = ρS + P (φ)[ρL − ρS + βρL (T − Tm )] ,
µ(φ) = µS + P (φ)(µL − µS ) ,
cp (φ) = cpS + P (φ)(cpL − cpS ) ,
where subscripts S and L denote solid and liquid respectively, and β is the coefficient of
thermal volumetric expansion. ψ(φ) is a double-well potential, and P (φ) is a polynomial
introduced to denote the interface; these functions are commonly given as polynomials in
φ:
ψ(φ) = φ^2 (1 − φ)^2 ,
P(φ) = 6φ^5 − 15φ^4 + 10φ^3 .
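Note that these two polynomials are related by differentiation:

P'(φ) = 30φ^4 − 60φ^3 + 30φ^2 = 30φ^2 (1 − φ)^2 = 30ψ(φ) ,

which accounts for the factor of 30 multiplying ψ(φ) in Eqs. (1d), (1e) and (5c).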
Other parameters in Eqs. (1) are
\epsilon^2 = 6\sqrt{2}\,\sigma\delta , \qquad a = \frac{\rho_0 T_m \delta}{6\sqrt{2}\,\sigma} , \qquad M = \frac{\rho_0 L_0 \delta}{T_m \mu_k} ,
where ε² is the positive gradient coefficient, related to the interfacial thickness δ in such
a way that as δ → 0 the phase-field model approaches the modified Stefan model; µk
is the kinetic coefficient, and M is the mobility, which is related to the inverse of the kinetic
coefficient; a is a positive parameter occurring in the double-well potential which is related
to the surface tension σ. The kinetic coefficient µk is a microscopic physical parameter reflecting the kink density at steps on the solid surface and the atom exchange rate at each kink, as
explained in Chernov [5] and Ookawa [6]. It is a function of temperature and orientation,
and it also depends on the material; the kinetic coefficients of metals are the highest among
all materials. Linear variation of the kinetic coefficient might be reasonable for a molecularly
rough interface; however, it depends strongly on the orientation of the interface for faceted
interfaces, as shown by Langer [7]. For simplicity the kinetic coefficient is usually assumed to be constant, as we do herein, but it is difficult to determine the value of this
constant either theoretically or experimentally.
The forcing terms in Eqs. (1) take the forms
X_1(\phi) = -\frac{\epsilon^2}{\rho_0}\,\phi_x\left(\xi_1^2\,\phi_{xx} + 2\xi_1\xi_2\,\phi_{xy} + \xi_2^2\,\phi_{yy}\right) ,   (5a)

X_2(\phi) = -\frac{\epsilon^2}{\rho_0}\,\phi_y\left(\xi_1^2\,\phi_{xx} + 2\xi_1\xi_2\,\phi_{xy} + \xi_2^2\,\phi_{yy}\right) ,   (5b)

Y(\phi,T) = \frac{30\,p}{\rho_0 M}\,\psi(\phi)\left[\rho_L - \rho_S + \beta\rho_L\,(T - T_m)\right] ,   (5c)

W(u,v,\phi,T) = \frac{\mu(\phi)}{\rho_0 c_p(\phi)}\left[2\left(u_x^2 + v_y^2\right) + (u_y + v_x)^2\right] + \frac{\epsilon^2}{4\rho_0 c_p(\phi)}\left(\xi_1^2\,\phi_x^2 - \xi_2^2\,\phi_y^2\right)(v_y - u_x)
\qquad\qquad - \frac{\epsilon^2}{2\rho_0 c_p(\phi)}\left[v_x\left(\xi_1\xi_2\,\phi_x^2 + \xi_2^2\,\phi_x\phi_y\right) + u_y\left(\xi_1^2\,\phi_x\phi_y + \xi_1\xi_2\,\phi_y^2\right)\right] .   (5d)
In the above equations, we have introduced a vector
\xi = (\xi_1, \xi_2) = |\xi|\left(\frac{\phi_x}{|\nabla\phi|},\;\frac{\phi_y}{|\nabla\phi|}\right) = \big[1 + \epsilon_m \cos m(\theta + \alpha)\big]\left(\frac{\phi_x}{|\nabla\phi|},\;\frac{\phi_y}{|\nabla\phi|}\right) ,   (6a)
to represent anisotropy in the interfacial energy and kinetics for a crystal of cubic symmetry with anisotropy strength ε_m; m determines the mode of symmetry of the crystal;
θ = arctan(φy /φx ) is the angle between the interface normal and the crystal axis, and
α denotes the angle of the symmetry axis with respect to the x-axis. The forcing terms
X1 and X2 represent the effects introduced by the solidification process on the flow field;
they are effective only over the interface region. Y (φ, T ) is a modification to the Stefan
condition caused by the buoyancy force term; W (u, v, φ, T ) is viscous dissipation in the
energy equation, and it can usually be neglected since it is small compared with other
dissipative terms in that equation.
On the domain Ω ≡ [0, l] × [0, l], prescribed boundary conditions are
u = 0        on ∂Ω \ {(x, y) | y = l} ,
u = U        on {(x, y) | y = l} ,
v ≡ 0        on ∂Ω ,
∂p/∂n = 0    on ∂Ω ∪ ∂Ω0 ,
∂φ/∂n = 0    on ∂Ω ,
∂T/∂n = 0    on ∂Ω .
Initial conditions are
u0 = v0 = 0                on Ω ,
φ0 = 0 and T0 < Tm         in Ω0 ≡ {(x, y) | |x| + |y| ≤ lc , (x, y) ∈ [−lc, lc]²} ,
where lc is one half the length of the diagonal of a 45◦ -rotated square in the center of the
domain.
3. NUMERICAL METHODS AND RESULTS
The governing equations (1b–e) are four coupled nonlinear parabolic equations in conserved form. We apply a projection method due to Gresho [8] to solve the momentum
equations while preserving the divergence-free constraint (1a). The Shuman filter [9] is
applied to solutions of the momentum equations (1b), (1c) prior to projection to remove
aliasing due to under-resolution. Since time-splitting methods are efficient for solving
multi-dimensional problems by decomposing them into sequences of 1-D problems, a δ-form Douglas & Gunn [10] procedure is applied to the current model. Quasilinearization
of Eqs. (1b–d) is constructed by Fréchet–Taylor expansion in “δ-form” as described by
Ames [11] and Bellman and Kalaba [12].
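To make the structure of the split solves concrete, a generic two-step δ-form Douglas & Gunn construction for a linearized 2-D problem u_t = (A_x + A_y)u + s, integrated here (for illustration only) with the trapezoidal rule, reads

\left(I - \frac{\Delta t}{2}A_x\right)\delta u^{(1)} = \Delta t\left[(A_x + A_y)u^n + s^{\,n+1/2}\right] ,
\left(I - \frac{\Delta t}{2}A_y\right)\delta u = \delta u^{(1)} ,
u^{n+1} = u^n + \delta u ,

so each split step requires only one-dimensional (typically tridiagonal) line solves in a single direction. The operators A_x and A_y here are generic placeholders rather than the exact discrete operators of Eqs. (1b–e). Nonlinear terms enter in the same δ-form through quasilinearization; for example, the advective flux u² in Eq. (1b) is expanded about the most recent iterate as (u^n)² + 2u^n δu, so that only the increment δu appears implicitly.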
The computations are performed on a square domain Ω ≡ [0, 63 cm] × [0, 63 cm]. Initially the material considered here, nickel, is supercooled by an amount ∆T = Tm − T = 224 K, and freezing begins from a small, rotated square with half-diagonal length lc = 1.89 cm in the center of the domain. The numerical spatial and time step sizes are ∆x = ∆y = 0.315 cm and ∆t = 10⁻⁶ s, respectively, and the length scale for the interfacial thickness is δ = 0.105 cm. The kinetic coefficient µk is chosen to be 2.85 m/(s·K).
Figure 1 displays the velocity field at t = 100.1 s. Lid-driven-cavity flow is introduced at t = 100 s with U = 1 cm/s. The dendrite shape evolves from the initial rotated square to an approximate circle, as shown in Fig. 1. We observe that the velocity near the solid–liquid interface is greater than that in the surrounding fluid, a direct result of the forcing terms X1 and X2 in the momentum equations. We have also found that the flow field has a significant effect on the growth rate of dendrites, although this is not evident from Fig. 1; in particular, the growth rate is decreased by the flow field.
Figure 1. Lid-Driven-Cavity Flow with Freezing from Center at t = 100.1 s
4. APPROACH TO PARALLELIZATION AND RESULTS
Parallelization of the numerical solution procedure is carried out on a shared-memory multiprocessor using the HP Fortran 90 HP-UX compiler. The program is parallelized using OpenMP and MPI running on the HP SuperDome at the University of
Kentucky Computing Center to compare parallel performance of these two approaches.
The maximum number of processors available on a single hypernode of the HP SuperDome is 64, and in the current study each processor is used to compute one part of the
whole domain. For parallelization studies the grid is set at 201 × 201 points corresponding
to the domain size 63 cm × 63 cm. The procedure for parallelizing two-step Douglas &
Gunn time-splitting with MPI is to compute different parts of the domain on different
processors, i.e., simply a crude form of domain decomposition. In particular, we divide the
domain into n equal pieces along the separate directions corresponding to each split step,
where n is the number of processors being used. That is, we first divide the domain in
the x direction during the first time-splitting step, and then in the y direction during the
second step. Therefore, transfers of data among processors are required between the two steps of time-splitting. A sketch of this arrangement is shown in Fig. 2. Moreover, data
transfers are also needed across the adjacent subdomain boundaries of each processor. Since
communication between processors increases with increasing number of processors for a
fixed number of grid points, parallel performance is expected to decrease in such a case,
resulting in only a sub-linear increase of speed-up for MPI.
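The paper does not list its communication routines; the following minimal Fortran sketch illustrates one possible way to redistribute a 2-D field between the two slab decompositions required by the split steps, using MPI_ALLTOALL. It assumes a square N × N grid with N evenly divisible by the number of processes (the actual 201 × 201 grid would require uneven slabs, e.g. via MPI_ALLTOALLV); the array names phi_x and phi_y and the packing scheme are hypothetical.

   program dg_transpose_sketch
     use mpi
     implicit none
     integer, parameter :: N = 200        ! global grid size (assumed divisible by nprocs)
     integer :: nprocs, myrank, ierr, nloc, q, il, jl, off
     double precision, allocatable :: phi_x(:,:), phi_y(:,:), sendbuf(:), recvbuf(:)

     call MPI_INIT(ierr)
     call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
     call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
     nloc = N / nprocs                    ! slab width owned by each process

     ! Step-I layout: phi_x(i, jl) holds complete x-lines for the nloc j-indices owned here.
     ! Step-II layout: phi_y(il, j) holds complete y-lines for the nloc i-indices owned here.
     allocate(phi_x(N, nloc), phi_y(nloc, N))
     allocate(sendbuf(nloc*nloc*nprocs), recvbuf(nloc*nloc*nprocs))
     phi_x = dble(myrank)                 ! placeholder field values

     ! Pack: the nloc x nloc block destined for process q covers q's i-range
     ! and this process's j-range.
     do q = 0, nprocs - 1
        off = q*nloc*nloc
        do jl = 1, nloc
           do il = 1, nloc
              sendbuf(off + il + (jl-1)*nloc) = phi_x(q*nloc + il, jl)
           end do
        end do
     end do

     ! Exchange exactly one block with every process.
     call MPI_ALLTOALL(sendbuf, nloc*nloc, MPI_DOUBLE_PRECISION, &
                       recvbuf, nloc*nloc, MPI_DOUBLE_PRECISION, &
                       MPI_COMM_WORLD, ierr)

     ! Unpack: the block received from process q covers this process's i-range
     ! and q's j-range.
     do q = 0, nprocs - 1
        off = q*nloc*nloc
        do jl = 1, nloc
           do il = 1, nloc
              phi_y(il, q*nloc + jl) = recvbuf(off + il + (jl-1)*nloc)
           end do
        end do
     end do

     call MPI_FINALIZE(ierr)
   end program dg_transpose_sketch

The reverse redistribution after the second split step is the same operation with the roles of the two indices exchanged.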
[Figure 2 (schematic): two panels labeled "splitting step I" and "splitting step II", each showing the (x, y) domain divided into slabs assigned to processors 1, 2, …, n (along x for step I and along y for step II).]
Figure 2. Distribution of Processors for Two-Level Douglas & Gunn Time-Splitting
The implementation of OpenMP, on the other hand, is quite straightforward. It can
be done by parallelizing the DO loops with compiler directives. All that is necessary is to
specify how the data accessed within each DO loop are shared among threads, and this is
easily handled with the OpenMP directive syntax.
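As an illustration only, a directive of the following form parallelizes a typical grid sweep; the arrays, the explicit diffusion-like update, and the coefficient values are hypothetical and are meant only to show how shared and private data are declared, not to reproduce the production code.

   program omp_sweep_sketch
     implicit none
     integer, parameter :: nx = 201, ny = 201      ! grid size used in the parallel studies
     double precision :: T(nx,ny), Tnew(nx,ny)
     double precision :: alpha, dt, dx
     integer :: i, j

     alpha = 1.0d-5          ! hypothetical diffusivity (m^2/s)
     dt    = 1.0d-6          ! time step (s), as in Sec. 3
     dx    = 0.315d-2        ! grid spacing (m), as in Sec. 3
     T     = 0.0d0

     ! Loop indices are PRIVATE to each thread; arrays and coefficients are SHARED.
     !$OMP PARALLEL DO DEFAULT(NONE) SHARED(T, Tnew, alpha, dt, dx) PRIVATE(i, j)
     do j = 2, ny - 1
        do i = 2, nx - 1
           Tnew(i,j) = T(i,j) + alpha*dt/dx**2 * &
                       (T(i+1,j) + T(i-1,j) + T(i,j+1) + T(i,j-1) - 4.0d0*T(i,j))
        end do
     end do
     !$OMP END PARALLEL DO

     T(2:nx-1, 2:ny-1) = Tnew(2:nx-1, 2:ny-1)      ! copy the updated interior back
   end program omp_sweep_sketch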
To study the speed-up achieved by parallelization, different numbers n of processors
(n = 1, 2, 4, 8, 16, 32) are used to execute the algorithm until t = 5 × 10⁻³ s for both
OpenMP and MPI. Figure 3 displays the speed-up factor versus number of processors.
It shows that, as the number of processors increases, the speed-up factors increase only
sub-linearly for both OpenMP and MPI. Moreover, the speed-up performance of MPI is
better than that of OpenMP. The curve for OpenMP in Fig. 3 also suggests that the
speed-up factor attains its maximum at a number of processors only slightly beyond 32
for the present problem. Moreover, it is clear that parallel efficiency is quite low for
OpenMP already by 16 processors, so this is possibly the maximum number that should
be used. It should also be mentioned that even though the MPI implementation has
not yet been completely optimized, the CPU time of MPI runs is somewhat less than
that for OpenMP. Better performance could be achieved if further optimization of MPI is
applied within the context of the current algorithm. Such optimization might include use
of nonblocking communication, sending noncontiguous data using pack/unpack functions,
decreasing unnecessary blocking, and optimizing the number of Douglas & Gunn split line
solves simultaneously sent to each processor. The last of these can significantly alter the
tradeoff between communication time and computation time.
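In the usual definitions assumed here, the speed-up factor and parallel efficiency are

S(n) = T(1)/T(n) ,   E(n) = S(n)/n ,

where T(n) is the wall-clock time using n processors; the theoretical speed-up in Fig. 3 then corresponds to the ideal linear case S(n) = n.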
[Figure 3 (plot): speed-up versus number of processors (both axes 0 to 40), with curves for the theoretical speed-up, MPI results, and OpenMP results.]
Figure 3. Speed-up Performance of Parallelized Phase-Field Model with Convection
5. SUMMARY AND CONCLUSIONS
In this paper we have compared parallel performance of OpenMP and MPI implemented for the 2-D phase-field model in a flow field. We found that MPI is both more
efficient and exhibits higher absolute performance than OpenMP. Moreover, it requires less memory than does OpenMP since MPI supports distributed memory while
OpenMP supports shared-memory programming. Therefore, since memory requirements
for our current problem are high, MPI is recommended for such problems. However, the
implementation of MPI is more difficult than that of OpenMP. For example, programming
effort for the current problem using MPI was approximately 100 times greater than that
using OpenMP.
6. ACKNOWLEDGEMENTS
This work is supported by the Center for Computational Science of the University of Kentucky.
We are also grateful to the University of Kentucky Computing Center for use of their HP
SuperDome for all the computations.
REFERENCES
1. D. M. Anderson, G. B. McFadden, and A. A. Wheeler. A phase-field model of solidification with convection. Physica D, 135:175–194, 2000.
2. D. M. Anderson, G. B. McFadden, and A. A. Wheeler. A phase-field model with
convection: sharp-interface asymptotics. Physica D, 151:305–331, 2001.
3. C. Beckermann, H.-J. Diepers, I. Steinbach, A. Karma, and X. Tong. Modeling melt
convection in phase-field simulations of solidification. J. Comput. Phys., 154:468–496,
1999.
4. Nabeel Al-Rawahi and Gretar Tryggvason. Numerical simulation of dendritic solidification with convection: Two-dimensional geometry. J. Comput. Phys., 180:471–496,
2002.
5. A. A. Chernov. Surface morphology and growth kinetics. In R. Ueda and J. B. Mullin,
editors, Crystal Growth and Characterization, pages 33–52. North-Holland Publishing
Co., Amsterdam, 1975.
6. A. Ookawa. Physical interpretation of nucleation and growth theories. In R. Ueda and
J. B. Mullin, editors, Crystal Growth and Characterization, pages 5–19. North-Holland
Publishing Co., Amsterdam, 1975.
7. J. S. Langer. Models of pattern formation in first-order phase transitions. In G. Grinstein and G. Mazenko, editors, Directions in Condensed Matter Physics, pages 164–
186. World Scientific, Singapore, 1986.
8. P. M. Gresho. On the theory of semi-implicit projection methods for viscous incompressible flow and its implementation via a finite element method that also introduces
a nearly consistent mass matrix. part 1: Theory. Int. J. Numer. Meth. Fluids, 11:587–
620, 1990.
9. F. G. Shuman. Numerical method in weather prediction: smoothing and filtering.
Mon. Weath. Rev., 85:357–361, 1957.
10. J. Douglas Jr. and J. E. Gunn. A general formulation of alternating direction methods,
part 1. parabolic and hyperbolic problems. Numer. Math., 6:428–453, 1964.
11. W. F. Ames. Numerical Methods for Partial Differential Equations. Academic Press,
New York, NY, 1977.
12. R. E. Bellman and R. E. Kalaba. Quasilinearization and Nonlinear Boundary-Value
Problems. American Elsevier Publishing Company, Inc., New York, NY, 1965.