ab initio calculations Abstract J.E. Pask

advertisement
Finite element methods in ab initio electronic structure
calculations
J.E. Pask∗ and P.A. Sterne
Lawrence Livermore National Laboratory,
University of California, Livermore, California 94550
(Dated: March 29, 2005)
Abstract
We review the application of the finite element (FE) method to ab initio electronic structure
calculations in solids. The FE method is a general approach for the solution of differential and
integral equations which uses a strictly local, piecewise polynomial basis. Because the basis is
composed of polynomials, the method is completely general and its accuracy is systematically
improvable. Because the basis is strictly local in real space, the method allows for variable resolution
in real space; produces sparse, structured matrices, enabling the effective use of iterative solution
methods; and is well suited for parallel implementation. The method thus combines significant
advantages of both real-space-grid and basis-oriented approaches, and so is well suited for large,
accurate ab initio calculations. We review the construction and properties of the required FE bases
and their use in the self-consistent solution of the Kohn-Sham equations of density functional theory.
PACS numbers: 02.70.Dh, 71.15.-m, 71.70.+z
∗
Electronic address: pask1@llnl.gov
1
I.
INTRODUCTION
Over the course of the past few decades, the density functional theory (DFT) of Hohenberg, Kohn, and Sham [1, 2] has proven to be an accurate and reliable basis for the
understanding and prediction of a wide range of materials properties from first principles
(ab initio), with no experimental input or empirical parameters. However, the solution of
the equations of DFT is a formidable task and this has limited the range of physical systems which can be investigated by such rigorous, quantum mechanical means. In order to
extend the interpretive and predictive power of such quantum mechanical theories further
into the domain of “real materials”, involving nonstoichiometric deviations, defects, grain
boundaries, surfaces, interfaces, and the like, robust and efficient methods for the solution of
the associated quantum mechanical equations are of critical importance. The finite element
(FE) method [3–5] is a general method for the solution of partial differential and integral
equations which has found wide application in diverse areas of research. Here, we discuss
its application to large-scale ab initio electronic structure calculations of solids.
While the FE method is a general approach, capable of providing accurate solutions to
both all-electron [6] and pseudopotential [7] formulations of the equations of DFT, its application to all-electron problems in molecules and solids has so far been limited by the
large number of basis functions required to adequately describe all-electron solutions near
nuclei, where the solutions can have cusps and oscillate rapidly. White et al. [8] applied the
method to isolated monatomic and diatomic all-electron problems, and found that as many
as 105 basis functions per atom were required to achieve sufficient accuracy using uniform
meshes. Tsuchida and Tsukada [9] have applied the method to isolated H2 molecules using
nonuniform meshes. They found also that, even with nuclei at nodal points in the mesh
to facilitate the modeling of cusps, many thousands of basis functions were still required
to achieve the desired accuracy. This is to be compared with the tens or at most hundreds of basis functions per atom required by conventional all-electron methods such as the
linearized augmented planewave [6] and linearized muffin tin orbital [10] methods, which
employ problem-specific bases constructed from isolated atomic solutions. Such methods
lack the generality of the FE method but, by building known physics into the basis, can be
far more efficient than the FE method for such problems. For the FE method to become
competitive with conventional methods in the all-electron context, specialized basis functions, such as isolated atomic solutions or Gaussians, will likely need to be added to the
standard FE basis to increase the efficiency of the representation, as in the recent work of
Yamakawa and Hyodo [11], for example, on molecules and clusters. In the pseudopotential
context, however, where solutions are much smoother, and simpler, more general basis sets
are typically employed, the FE method is immediately applicable and we focus upon this
context here.
Like the traditional planewave (PW) method [7], the FE method is a variational expansion approach, in which solutions are represented as a linear combination of basis functions.
However, whereas the PW method employs a Fourier basis, with every basis function overlapping every other, the FE method employs a basis of strictly local piecewise polynomials,
each overlapping only its immediate neighbors. Because the FE basis consists of polynomials, the method is completely general and systematically improvable, like the PW method.
Because the FE basis is strictly local, however, the method offers some significant advantages
with respect to large-scale calculations. First, because the FE basis functions are localized,
they can be concentrated where needed in real space to increase the efficiency of the rep-
2
resentation. The PW basis has the same resolution at all points in space and this leads
to substantial inefficiencies in the solution of highly inhomogeneous problems such as those
involving first-row or transition-metal atoms. Recent progress on this issue in the context of
PW bases includes ultrasoft pseudopotentials [12], optimized pseudopotentials [13, 14], and
adaptive coordinate transformations [15–17]. Second, the FE method can accommodate a
wide variety of boundary conditions: Dirichlet boundary conditions for molecules or clusters,
Bloch boundary conditions for crystals, a mixture of these for surfaces, etc. The PW method
is limited to periodic boundary conditions, which leads to inefficiencies in cluster and surface
calculations in particular, where large vacuum regions must be employed to minimize interactions among periodic images. Finally, and most significantly for large-scale calculations,
the strict locality of the FE basis facilitates implementation on massively parallel computational architectures by minimizing the need for nonlocal communications. PW methods
rely on Fourier transforms which can lead to substantial inefficiencies on such distributed
architectures due the need for every node to communicate with every other. Recent progress
on this issue in the context of PW bases includes new fast Fourier transform formulations
[18, 19] and the pioneering work of Skylaris et al. [20] reformulating the PW basis in terms
of periodic bandwidth limited delta functions.
The advantages of a strictly local, real-space approach in large-scale calculations have
been amply demonstrated in the context of finite difference (FD) methods [21–31]. These
methods allow for some variable resolution in real space, can accommodate a variety of
boundary conditions, and require no computation- or communication-intensive transforms.
However, FD methods achieve these advantages by giving up the use of a basis altogether,
which leads to disadvantages such as limited accuracy in integrations and nonvariational
convergence. By retaining the use of a basis while remaining strictly local in real space, FE
methods combine significant advantages of both PW and FD approaches.
The FE method [3–5] has had a long history of success in diverse applications ranging
from civil engineering to quantum mechanics. Applications in engineering go back to the
1950s. Applications to the electronic structure of isolated atomic and molecular systems
began to appear in the 1970s [32]. White, Wilkins, and Teter [8] applied the method to
full 3D atomic and molecular calculations in 1989 and demonstrated the favorable scaling
of the method with the number of basis functions afforded by the strict locality and realspace nature of the basis. Applications to solids have appeared more recently. Hermansson
and Yevick [33] applied the FE method to non-self-consistent solid-state electronic structure calculations in 1986, finding that it was less efficient than the PW method for small
problems with relatively smooth potentials, when uniform meshes and cubic- or lower-order
bases were employed. Subsequent work has focused on the application of the method to
large-scale calculations, where the strict locality of the basis can provide significant advantages in the context of massively parallel computations. Tsuchida and Tsukada [9, 34–37]
have applied the method to full self-consistent molecular and solid-state electronic structure
calculations. They have implemented nonuniform meshes, adaptive coordinate transformations, pseudopotential and all-electron calculations, molecular dynamics, and an O(N)
formulation for insulating materials; and have demonstrated the favorable efficiency of the
method relative to FD approaches. Pask et al. [38–41] have formulated a general FE based
electronic structure method, allowing treatment of metallic as well as insulating systems,
arbitrary Bravais lattices, and full Brillouin zone sampling. They have addressed, in particular, the enforcement of the required Bloch-periodic boundary conditions in the context of a
general C 0 basis, the use of nonlocal operators in the context of self-consistent calculations,
3
and the computation of the crystal potential and total energy in real space. Recently, Jun
[42] has reported initial non-self-consistent results using a mesh-free FE basis to solve the
Schrödinger equation subject to Bloch-periodic boundary conditions in an arbitrary Bravais
lattice with arbitrary Brillouin zone sampling.
The solution of the equations of DFT can be accomplished by a number of approaches,
including direct minimization of the total energy functional [43], solution of the associated
Lagrangian equations [44], and self-consistent solution of the Kohn-Sham equations [1]. In
the context of solids, Tsuchida and Tsukada [9, 34, 35] have formulated an FE based direct
minimization approach for large-scale calculations of insulating systems. Pask et al. [38–
41] have formulated an FE based Kohn-Sham approach appropriate for both metallic and
insulating systems. The Kohn-Sham approach requires the repeated solution of coupled
Schrödinger and Poisson equations with rapidly varying potentials and densities, subject
to Bloch-periodic and periodic boundary conditions, respectively, in general parallelepiped
domains. In practice, the solution of these equations constitutes the most computationally
intensive part of the Kohn-Sham solution process and it is here where different ab initio
methods mainly differ: the choice of methods employed to solve the required Schrödinger
and Poisson equations.
Here, we review the solution of the equations of DFT in solids using the finite element
method. In Sec. II, we discuss the construction and properties of the required FE bases. In
Sec. III, we discuss the solution of the Schrödinger and Poisson equations, subject to the
required boundary conditions, using these bases. In Sec. IV, we discuss the use of these
solutions in the self-consistent solution of the Kohn-Sham equations. In Sec. V, we discuss
the computation of the density-functional total energy. In Sec. VI, we discuss applications.
And we conclude with a summary and outlook in Sec. VII.
II.
FINITE ELEMENT BASES
Finite element bases are, in the end, just well chosen sets of strictly local, piecewise
polynomials. However, there are a great many variations, of various dimensions, geometries,
polynomial orders, and continuities [3, 5], and we do not propose to review them all here.
Rather, we focus on a broad and widely used class of bases which encompasses those most
commonly employed in practice and most often adopted in the quantum-mechanical electronic structure context in particular: nodal bases, defined by their values and/or derivatives
at specified points in the domain.
The construction and key properties of such FE bases are perhaps best conveyed in the
simplest case: a one-dimensional (1D), piecewise-linear basis. Figure 1 shows the steps involved in the construction of such a basis on a domain Ω = (0, 1). The domain is partitioned
into subdomains called elements (Fig. 1(a)). In this case, the domain is partitioned into
three elements Ω1 –Ω3 ; in practice, there are typically many more, so that each element encompasses only a small fraction of the domain. For simplicity, we have chosen a uniform
partition but this need not be the case in general. (Indeed, it is precisely the flexibility to
partition the domain as desired which allows for the substantial efficiency of the basis in
highly inhomogeneous problems.) A parent basis φ̂i is then defined on the parent element
Ω̂ = (−1, 1) (Fig. 1(b)). In this case, the parent basis functions are φ̂1 (ξ) = (1 − ξ)/2 and
φ̂2 (ξ) = (1 + ξ)/2. Since the parent basis consists of two (independent) linear polynomials,
it is complete to linear order, i.e., a linear combination can represent any linear polynomial
exactly. Furthermore, it is defined such that each function takes on the value 1 at exactly
4
one point, its associated node, and vanishes at all (one, in this case) other nodes. Local basis
(e)
functions φi are then generated by transformations ξ (e) (x) of the parent basis functions
φ̂i from the parent element Ω̂ to each element Ωe (Fig. 1(c)). In present case, for example,
(1)
(1)
φ1 (x) ≡ φ̂1 (ξ (1) (x)) = 1 − 3x and φ2 (x) ≡ φ̂2 (ξ (1) (x)) = 3x, where ξ (1) (x) = 6x − 1.
Finally, the piecewise-polynomial basis functions φi of the method are generated by piecing
(e)
together the local basis functions φi (Fig. 1(d)). This is the process of assembly. In the
present case, for example,
(1)
φ1 (x), x ∈ [0, 1/3],
φ1 (x) =
0,
otherwise,
 (1)

φ2 (x), x ∈ [0, 1/3],
φ2 (x) = φ(2)
1 (x), x ∈ [1/3, 2/3],


0,
otherwise.
The above 1D piecewise-linear FE basis possesses the key properties of all such nodal
bases, whether of higher dimension or higher polynomial order. First, the basis functions
are strictly local, i.e., nonzero over only a small fraction of the domain. This leads to sparse
matrices and scalability, as in FD approaches, while retaining the use of a basis, as in PW
approaches. Second, within each element, the basis functions are simple, low-order polynomials, which leads to computational efficiency, generality, and systematic improvability, as
in both FD and PW approaches. Third, the basis functions are C 0 in nature, i.e., continuous but not necessarily smooth. (Higher-order bases can be smoother, but we consider the
C 0 case for generality.) As we shall discuss, this necessitates extra care in the solution of
second-order problems, with periodic boundary conditions in particular. Finally, the basis
functions have the key property
φi (xj ) = δij
i.e., each basis function takes on a value of 1 at its associated node and vanishes at all other
nodes. (Higher-order bases may possess this property with respect to derivatives
as well as
values at nodes.) By virtue of this property, an FE expansion f (x) = i ci φi (x) has the
property f (xj ) = cj , so that the expansion coefficients have a direct, real-space meaning.
This eliminates the need for computationally intensive transforms, such as Fourier transforms
in PW based approaches [7], and facilitates preconditioning in iterative solutions, such as
multigrid in FD approaches [26].
Figure 1(d) shows a general C 0 linear basis, capable of representing any piecewise linear
function (having the same polynomial subintervals) exactly. To solve a problem subject to
vanishing Dirichlet boundary conditions, as occurs in molecular or cluster calculations, one
can restrict the basis as in Fig. 1(e), i.e., omit boundary functions. To solve a problem subject
to periodic boundary conditions, as occurs in solid-state electronic structure calculations,
one can restrict the basis as in Fig. 1(f), i.e., piece together local basis functions across the
domain boundary in addition to piecing together across interelement boundaries. Regarding
this
periodic basis, however, it should be noted that an arbitrary linear combination f (x) =
i ci φi (x) necessarily satisfies
f (0) = f (1)
(1)
but does not necessarily satisfy
f (0) = f (1).
5
(2)
Ω = (0,1)
Ω2
Ω1
(a)
0
Ω3
1/3
φˆ1
1
(b)
ˆ = (−1,1)
Ω
0
ξ1 = −1
0
φ1(1)
φ2(1)
φ1(2)
x
φˆ2
ξ2 = 1
ξ (2) ( x)
ξ (1) ( x)
1
1
2/3
ξ (3) ( x)
φ2(2)
ξ
(transform)
φ1(3)
φ2(3)
(c)
0
x1(1) = 0
φ1
0
x1 = 0
(e)
0
1
(f)
φ2
Periodic
0
x1 = 0
φ3
φ1
x
φ1
φ4
x4 = 1
x
φ2
x1 = 1/ 3
0
x2(3) = 1
x3 = 2 / 3
x2 = 1/ 3
Dirichlet
1
x2(2) = x1(3) = 2 / 3
(assemble)
General
1
(d)
x2(1) = x1(2) = 1/ 3
x2 = 2 / 3
φ2
x2 = 1/ 3
1
φ3
x3 = 2 / 3
x
φ1
1
x
FIG. 1: 1D piecewise-linear FE bases. (a) Domain and elements. (b) Parent element and parent
basis functions. (c) Local basis functions generated by transformations of parent basis functions to
each element. (d) General piecewise-linear basis, generated by piecing together local basis functions
across interelement boundaries. (e) Dirichlet basis, generated by omitting boundary functions. (f)
Periodic basis, generated by piecing together boundary functions.
Thus, unlike PW or other such smooth bases, while the value-periodic condition (1) is
enforced by the use of such an FE basis, the derivative-periodic condition (2) is not. And
so for problems requiring the enforcement of both, as in solid-state electronic structure, the
derivative condition must be enforced by other means [38]. We address this in Sec. III.
Higher-order FE bases are constructed by defining more independent parent basis functions, which requires that some basis functions be of higher order than linear. And, as in the
linear case, what is typically done is to define all functions to be of the same order so that,
for example, to define a 1D quadratic basis, one would define three quadratic parent basis
functions; for a 1D cubic basis, four cubic parent basis functions, etc. Figure 2 shows the
construction of a 1D quadratic basis. With higher-order basis functions, however, come new
possibilities. For example, with cubic basis functions there are sufficient degrees of freedom
to specify both value and slope at end points, thus allowing for the possibility of both value
and slope continuity across interelement boundaries, and so allowing for the possibility of a
6
Ω = (0,1)
Ω2
Ω1
(a)
0
1/3
(b)
ˆ = (−1,1)
Ω
φˆ2
0
ξ1 = −1
φ1(1)
(c)
0
x1(1) = 0
φ3(1)
φ2(1)
x2(1)
φ3(2)
x2(2)
(transform)
φ1(3)
φ3(3)
φ2(3)
x2(3)
x3(2) = x1(3)
x3(3) = 1
φ1
φ2
φ4
φ3
φ5
φ6
φ7
(d)
0
x1 = 0
x2 = 1/ 6
Dirichlet
1
x
(assemble)
General
1
ξ (3) ( x)
φ2(2)
x3(1) = x1(2)
ξ
ξ3 = 1
ξ (2) ( x)
φ1(2)
1
φˆ3
ξ2 = 0
ξ (1) ( x)
1
x
2/3
φˆ1
1
Ω3
φ1
x4 = 1/ 2
x3 = 1/ 3
φ2
φ3
x5 = 2 / 3
x6 = 5 / 6
φ4
x7 = 1
x
φ5
(e)
0
1
x1 = 1/ 6
0
Periodic
φ1
x2 = 1/ 3
φ2
x3 = 1/ 2
φ4
φ3
x4 = 2 / 3
x
x5 = 5 / 6
φ5
φ6
1
φ1
(f)
0
x1 = 0
x2 = 1/ 6
x4 = 1/ 2
x3 = 1/ 3
x5 = 2 / 3
x6 = 5 / 6
x
1
FIG. 2: 1D piecewise-quadratic FE bases. (a) Domain and elements. (b) Parent element and
parent basis functions. (c) Local basis functions generated by transformations of parent basis
functions to each element. (d) General piecewise-quadratic basis, generated by piecing together
local basis functions across interelement boundaries. (e) Dirichlet basis, generated by omitting
boundary functions. (f) Periodic basis, generated by piecing together boundary functions.
Cubic Hermite Parent Basis Functions
φˆ1
φˆ3
φˆ2
φˆ4
FIG. 3: Cubic Hermite parent basis functions φ̂1 –φ̂4 on the parent element Ω̂ = (−1, 1).
7
TABLE I: Basis functions and associated nodes of 3D C 0 cubic parent element. Here, ξ0 = ξi ξ,
η0 = ηi η, and ζ0 = ζi ζ.
(ξi , ηi , ζi )
(±1, ±1, ±1)
(± 13 , ±1, ±1)
(±1, ± 13 , ±1)
(±1, ±1, ± 13 )
φ̂i
1
64 (1
9
64 (1
9
64 (1
9
64 (1
+ ξ0 )(1 + η0 )(1 + ζ0 ){9(ξ 2 + η 2 + ζ 2 ) − 19}
− ξ 2 )(1 + 9ξ0 )(1 + η0 )(1 + ζ0 )
− η 2 )(1 + 9η0 )(1 + ξ0 )(1 + ζ0 )
− ζ 2 )(1 + 9ζ0 )(1 + ξ0 )(1 + η0 )
C 1 (continuous value and slope) rather than C 0 basis. One common family of such bases is
the Hermite family. Higher-order Hermite bases are defined by adding derivative conditions
to existing nodes rather than value conditions at additional nodes. For example, on the
parent element Ω̂ = (−1, 1), the cubic Hermite parent basis functions are defined by the
conditions
φ̂1 (−1) = 1, φ̂1 (−1) = 0, φ̂1 (1) = 0, φ̂1 (1) = 0,
φ̂2 (−1) = 0, φ̂2 (−1) = 1, φ̂2 (1) = 0, φ̂2 (1) = 0,
φ̂3 (−1) = 0, φ̂3 (−1) = 0, φ̂3 (1) = 1, φ̂3 (1) = 0,
φ̂4 (−1) = 0, φ̂4 (−1) = 0, φ̂4 (1) = 0, φ̂4 (1) = 1,
which give
φ̂1 (ξ) = (2 − 3ξ + ξ 3 )/4,
2
3
φ̂2 (ξ) = (1 − ξ − ξ + ξ )/4,
3
(3)
(4)
φ̂3 (ξ) = (2 + 3ξ − ξ )/4,
(5)
φ̂4 (ξ) = (−1 − ξ + ξ 2 + ξ 3 )/4,
(6)
as shown in Fig. 3. Using these parent functions, the corresponding piecewise polynomial FE
(e)
basis is then constructed as before: local basis functions φi are defined by transformations
of the parent functions φ̂i from the parent element Ω̂ to each element Ωe and these are
pieced together to form the final piecewise polynomial basis. However, in this case, four
functions are pieced together across each element boundary rather than two: the two with
matching nonzero values (as in the C 0 case) and the two with matching nonzero derivatives.
The resulting piecewise polynomial basis is then C 1 and complete to cubic order. For
sufficiently smooth problems, such higher order continuity can yield greater accuracy per
degree of freedom [4] and such bases have been used in the electronic structure context
[8, 35]. However, while straightforward in one dimension, in higher dimensions this requires
matching both values and derivatives, including cross terms, across entire curves or surfaces,
which becomes increasingly difficult to accomplish and leads to additional constraints on the
meshes which can be employed [4].
Higher-dimensional FE bases are constructed along the same lines as the 1D case: partition the domain into elements, define local basis functions within each element via transformations of parent basis functions, and piece together the resulting local basis functions
to form the final piecewise-polynomial basis. In higher dimensions, however, there arises a
8
( 1, 1, 1)
(-1, 1, 1)
( 1,-1, 1)
(-1,-1, 1)
(-1, 1,-1)
1, 1,-1)
((1,-1,-1)
(-1,-1,-1)
( 1,-1,-1)
FIG. 4: Three-dimensional C 0 cubic parent element and nodes (denoted by open circles).
significant additional choice: that of shape. The most common 2D element shapes are triangles and quadrilaterals. In 3D, tetrahedra, hexahedra (e.g., parallelepipeds), and wedges are
among the most common. A variety of shapes have been employed in atomic and molecular
calculations [32]. In solid-state electronic structure calculations, the domain can be reduced
to a parallelepiped and C 0 [9, 38] as well as C 1 [35] parallelepiped elements have been employed. Pask et al. [38] have employed a 3D C 0 cubic serendipity basis [5] with parent basis
functions and associated nodes as shown in Table I and Fig. 4. Tsuchida and Tsukada [35]
have employed a 3D C 1 cubic Hermite basis with parent basis functions defined by Cartesian
products of the 1D functions (3)–(6) shown in Fig. 3.
III.
SOLUTION OF THE SCHRÖDINGER AND POISSON EQUATIONS
Having established the key properties of the relevant FE bases, we turn now to their
use in solution of the Kohn-Sham equations of density functional theory in solids. In this
section, we discuss the solution of the required Schrödinger and Poisson equations and in the
next, we discuss the use of these solutions in the self-consistent solution of the Kohn-Sham
equations. For generality, we shall follow the development of Pask et al. [38–41] in terms of
an arbitrary C 0 basis. Since C 1 and smoother bases are contained in C 0 , the development
applies to all such smoother bases as well.
In a perfect crystal, the electronic potential is periodic, i.e.,
V (x + R) = V (x)
(7)
for all lattice vectors R, and the solutions of the Schrödinger equation satisfy Bloch’s theorem
ψ(x + R) = eik·R ψ(x)
(8)
for all lattice vectors R and wavevectors k [45]. Thus the values of V (x) and ψ(x) throughout
the crystal are completely determined by their values in a single unit cell; and so the solutions
of the Poisson and Schrödinger equations in the crystal can be reduced to their solutions in a
single unit cell, subject to boundary conditions consistent with Eqs. (7) and (8), respectively.
9
R3
R2
R1
FIG. 5: Parallelepiped unit cell (domain) Ω, boundary Γ, surfaces Γ1 –Γ3 , and associated lattice
vectors R1 –R3 .
A.
1.
Schrödinger equation
Differential formulation
We consider first the Schrödinger problem:
− 12 ∇2 ψ + V ψ = εψ
(9)
in a unit cell, subject to boundary conditions consistent with Bloch’s theorem, where V is
an arbitrary periodic potential (atomic units are used throughout). Since V is periodic, ψ
can be written in the form
(10)
ψ(x) = u(x)eik·x ,
where u is a complex, cell-periodic function satisfying u(x) = u(x + R) for all lattice vectors
R [45]. Assuming the form (10), the Schrödinger equation (9) becomes
− 12 ∇2 u − ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u = εu,
(11)
where, allowing for the possibility of nonlocality, V l and V nl are the local and nonlocal parts
of V . From the periodicity condition (8), the required boundary conditions on the unit cell
are then [45]
u(x) = u(x + Rl ), x ∈ Γl ,
n̂ · ∇u(x) = n̂ · ∇u(x + Rl ), x ∈ Γl ,
(12)
(13)
where Γl and Rl are the surfaces of the boundary Γ and associated lattice vectors R shown
in Fig. 5, and n̂ is the outward unit normal at x. The required Bloch-periodic problem in
the infinite crystal can thus be reduced to the periodic problem (11)–(13) in the finite unit
cell.
Since the domain has been reduced to the unit cell, however, nonlocal operators require
further consideration. In particular, if as is typically the case for ab initio pseudopotentials,
the domain of definition is all space (i.e., the full crystal), they must be transformed to the
relevant finite subdomain (i.e., the unit cell). For a separable potential of the usual form [7]
a
a
vlm
(x − τa − Rn )hal vlm
(x − τa − Rn ),
(14)
V nl (x, x ) =
n,a,l,m
10
where n runs over all lattice vectors and a runs over atoms in the unit cell, the nonlocal
term e−ik·x V nl eik·x u in Eq. (11) is
−ik·x
a
a
a
dx vlm
vlm (x − τa − Rn )hl
(x − τa − Rn )eik·x u(x ),
(15)
e
n,a,l,m
where the integral is over all space. Rewriting the integral over all space as a sum over all
unit cells and using the cell-periodicity of u, the integral in (15) can be transformed to the
unit cell as follows:
a
dx vlm
(x − τa − Rn )eik·x u(x)
a
dx vlm
(x − τa − Rn − Rn )eik·x e−ik·Rn u(x − Rn )
=
n
=
Ω
n
Ω
a
dx vlm
(x − τa − Rn )eik·x eik·Rn e−ik·Rn u(x),
so that the nonlocal term e−ik·x V nl eik·x u in Eq. (11) becomes
−ik·x
ik·Rn a
a
a
e
vlm (x − τa − Rn )hl
dx
e−ik·Rn vlm
(x − τa − Rn )eik·x u(x ).
e
a,l,m
Ω
n
n
(16)
Having reduced the required problem to a periodic problem on a finite domain, solutions
may be obtained using a periodic FE basis defined in that domain. However, if the basis is
C 0 , as is typically the case, rather than C 1 or smoother, some additional consideration is
required. First, the direct application of the Laplacian to such a basis is clearly problematic.
Second, being periodic in value, consistent with (12), but not necessarily in derivative,
contrary to (13), the basis does not satisfy the required boundary conditions. Both issues
can be resolved by reformulating the original differential formulation (11)–(13) in weak
(integral) form.
2.
Weak formulation
An equivalent integral equation can be formed by taking the inner product of the differential equation (11) with an arbitrary1 “test function” v:
v ∗ − 12 ∇2 u − ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u − εu dx = 0.
Ω
The order of the highest derivative can now be reduced and a boundary term can be created
(whose usefulness will become clear presently) by integrating the ∇2 term by parts:
∗
1
1
∇v · ∇u dx − 2 v ∗ ∇u · n̂ ds
2
Ω
1
Γ
We assume that all functions involved are sufficiently regular to keep the integrals well defined, as will be
the case in our applications. For details regarding the relevant functions spaces see, e.g., Ref. [4].
11
+
Ω
v ∗ −ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u − εu dx = 0.
The derivative boundary condition (13) can now be incorporated by restricting v to
v ∈ V = {v : v(x) = v(x + Rl ), x ∈ Γl } ,
i.e., to satisfy the value condition (12). Then, using the fact that the domain is a parallelepiped, the boundary term can be written as
1
v ∗ (x) [∇u(x) − ∇u(x + Rl )] · n̂ ds,
2
l
Γl
which vanishes upon the assertion of the derivative boundary condition. Thus, the differential equation (11) and derivative boundary condition (13) together imply the integral
equation
∗
∗
l
−ik·x nl ik·x
1
1 2
∇v
·
∇u
dx
+
v
k
u
+
V
u
+
e
V
e
u
−
εu
dx = 0 ∀v ∈ V.
−ik
·
∇u
+
2
2
Ω
Ω
(17)
Conversely, it can be shown [39] that the integral equation (17) implies both the differential
equation and derivative boundary condition. The differential formulation (11)–(13) is thus
equivalent to the following weak formulation: Find the scalars ε and functions u ∈ V such
that
∗
1
∇v · ∇u dx + v ∗ −ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u dx =
2
Ω
Ω
ε v ∗ u dx ∀v ∈ V (18)
Ω
or, more concisely,
h(v, u) = εs(v, u) ∀v ∈ V,
(19)
where V = {v : v(x) = v(x + Rl ), x ∈ Γl }. This formulation contains no derivatives higher
than first order and requires only value-periodicity (i.e., Eq. (12)) of the approximation
space V, thus resolving both of the aforementioned Laplacian and boundary condition issues
in the context of a C 0 basis.
3.
Discretization
Having reformulated the problem in weak form, discretization can now proceed in a C 0
FE basis. Let
n
n
u=
cj φj and v =
d i φi ,
j=1
i=1
where {φk }nk=1 is a real C 0 basis satisfying the value-periodic condition (12) and {cj } and
{di } are complex coefficients, so that u and v are restricted to a finite dimensional subspace
Vn ⊂ V. Then (19) becomes: Find the scalars ε and coefficients cj such that
d∗i cj h(φi , φj ) = ε
d∗i cj s(φi , φj ) ∀{di },
i,j
i,j
12
which implies
cj h(φi , φj ) = ε
j
cj s(φi, φj )
j
due to the arbitrariness the {di }. And so we arrive at a generalized
eigenproblem determining
the approximate eigenvalues ε and eigenfunctions u = j cj φj of the weak formulation and
thus of the required problem:
Hij cj = ε
Sij cj ,
(20)
j
where
Hij =
Sij =
Ω
Ω
dx
j
l
−ik·x nl ik·x
1 2
,
∇φ
·
∇φ
−
ik
·
φ
∇φ
+
k
φ
φ
+
V
φ
φ
+
φ
e
V
e
φ
i
j
i
j
i
j
i
j
i
j
2
2
1
dx φiφj .
(21)
(22)
For a separable potential of the usual form (14), transformed to the unit cell as in (16), the
nonlocal term in (21) becomes
aj ∗
ai a
dx φi(x)e−ik·x V nl eik·x φj (x) =
flm
hl flm
,
(23)
Ω
a,l,m
where
ai
flm
=
Ω
dx φi(x)e−ik·x
a
eik·Rn vlm
(x − τa − Rn ).
n
Since Ω dxφi ∇φj = − Ω dxφj ∇φi for all φi, φj satisfying the value-periodic condition (12),
the local part of H is Hermitian while S is real symmetric. Also, as is clear from (23), for
a separable potential of the usual form (14), the nonlocal part of H is Hermitian as well.
a
are local in
Furthermore, since both the FE basis functions φi and nonlocal projectors vlm
real space, both H and S are sparse, with O(n) nonzero elements, where n is the number of
basis functions. And since the basis functions are simple, low-order polynomials, the matrix
elements can be evaluated to any desired accuracy with minimal computation. Thus, like
the PW method, the FE method is variational and matrix elements can be readily evaluated
to any desired accuracy, independent of the mesh. And like the FD method, the resulting
matrix equations are sparse and well suited for solution by modern iterative methods.
4.
Results
Figure 6 shows a series of FE results for a Si pseudopotential [38, 46]. Since the method
allows for the direct treatment of any Bravais lattice, results are shown for a two-atom
fcc primitive cell. The figure shows the sequence of band structures (eigenvalues εi vs.
wavevector k along a path through high-symmetry points in the Brillouin zone) obtained
for 3 × 3 × 3, 4 × 4 × 4, and 6 × 6 × 6 uniform meshes vs. exact values at selected k points
(where “exact values” were obtained from a well converged PW calculation). The variational
nature of the method is clearly exhibited: the error is strictly positive and the entire band
structure converges rapidly and uniformly from above as the number of basis functions is
increased. Figure 7 shows a more quantitative view of the convergence: the fractional error
13
Si
Energy (eV)
15
10
3x3x3
4x4x4
6x6x6
5
FE
Exact
0
*
L
X
FIG. 6: Finite element (FE) and exact band structures for a Si pseudopotential. FE results are for
a series of uniform meshes: 3×3×3, 4×4×4, and 6×6×6 in the fcc primitive cell. The convergence
is rapid and variational: the entire band structure converges from above, with excellent agreement
for the 6 × 6 × 6 mesh.
log10 Elements in each direction
10
2
10
3
10
4
10
5
10
6
0.9
1
1.1
1.2
1.3
Si
3
EE
k
.123, .234, .345 2 a
cubic C0 elements
2
E1
E2
E3
E4
E5
4
slope
6
log 10
EE
0.8
5
6
6
8
10
12
14 16
Elements in each direction
18 20
FIG. 7: Fractional error ∆E/E of FE eigenvalues at an arbitrary k point for a Si pseudopotential.
FE results are for a series of uniform meshes from 6 × 6 × 6 to 20 × 20 × 20 in the fcc primitive cell.
The variational nature and consistent, sextic convergence of the method are clearly exhibited: the
error is strictly positive and rapidly establishes a constant asymptotic slope of ≈ −6 on the log-log
scale.
14
of the first few eigenvalues with increasing numbers of elements, at an arbitrary k point.
The variational nature and consistent, sextic convergence of the method are clearly shown:
the error is strictly positive and rapidly establishes a constant asymptotic slope of ≈ −6 on
the log-log scale, indicating an error of O(h6 ), where h is the mesh spacing, consistent with
asymptotic convergence theorems for the cubic-complete case [4].
B.
Poisson equation
The Poisson solution proceeds along the same lines as the Schrödinger solution: differential formulation, weak formulation to address Laplacian and boundary condition issues,
then discretization in the appropriate C 0 basis. In this case, the required problem is
∇2 V (x) = 4πρ(x), x ∈ Ω,
(24)
V (x) = V (x + Rl ), x ∈ Γl ,
n̂ · ∇V (x) = n̂ · ∇V (x + Rl ), x ∈ Γl ,
(25)
(26)
subject to boundary conditions
where V (x) is the potential energy of an electron in the charge density ρ(x) and the domain
Ω, bounding surfaces Γl , and lattice vectors Rl are again as in Fig. 5. The weak formulation
of (24)–(26) is [39]: Find V ∈ V such that
(27)
− ∇v · ∇V dx = 4π vρ(x) dx ∀v ∈ V,
Ω
Ω
where V = {v : v(x) = v(x + Rl ), x ∈ Γl }. Subsequent discretization in a real periodic FE
basis φj leads to a symmetric linear system determining the approximate solution V (x) =
j cj φj (x) of the weak formulation and thus of the required problem:
Lij cj = fi ,
(28)
j
where
Lij = − dx ∇φi (x) · ∇φj (x),
Ω
fi = 4π dx φi(x)ρ(x).
(29)
(30)
Ω
As in the FD method, the above matrices are sparse and structured due to the strict locality
of the FE basis, requiring only O(n) storage and O(n) operations for solution by iterative
methods, whereas O(n log n) operations are required in a PW basis.
Figures 8–10 show results for a general triclinic model problem [39] for which an analytic
solution is available. Figure 8 shows the FE and exact solutions for V along a body diagonal
of the triclinic cell, for a series of uniform meshes: 2 × 2 × 2, 4 × 4 × 4, and 6 × 6 × 6. The FE
approximations converge rapidly to the exact solution, with substantial improvement upon
doubling the number of elements in each direction from 2 × 2 × 2 to 4 × 4 × 4, and excellent
agreement for the 6 × 6 × 6 mesh. Furthermore, the approximations converge to the smooth
15
6
Triclinic cell
V s R1 R2 R3
4
2
0
2
2 2 2
4 4 4
6 6 6 Exact
4
0
0.2
0.4
0.6
0.8
1
s
FIG. 8: FE and exact solutions of Poisson’s equation for a model triclinic charge density, for a
series of meshes of increasing resolution. The FE approximations converge rapidly to the smooth,
fully periodic exact solution, though the basis is itself neither smooth nor fully periodic.
0.06
Triclinic cell
e s R1 R2 R3
0.04
0.02
0
0.02
0.04
6 6 6
12 12 12
0.06
0
0.2
0.4
0.6
0.8
1
s
FIG. 9: Error in FE solution of Poisson’s equation for a model triclinic charge density, along a
body diagonal of the cell. The error converges rapidly and uniformly: being of comparable size,
and converging at a comparable rate, at the boundary as elsewhere. The cusps in the error are a
manifestation of the C 0 nature of the approximation.
analytic solution despite the fact that the basis is not smooth (being merely C 0 ) and to
the correct derivative-periodic limit despite the fact that the basis is not derivative-periodic
(being merely value-periodic). The approximations satisfy the value-periodic condition (25)
exactly, by virtue of the basis, and the derivative-periodic condition (26) asymptotically, by
virtue of the weak formulation. A more detailed look at the error of the approximations is
provided by Fig. 9 which shows the error e = VFE − Vexact along the diagonal of the cell for
two meshes: 6 × 6 × 6 and 12 × 12 × 12. The C 0 nature of the approximations is evident
in the cusps (the exact solution being a smooth function in this case). As is clear from
Fig. 8, these cusps correspond to small but in general nonzero differences in slope across
interelement boundaries. These small differences are a natural consequence of basing the
16
Log10 Elements in each direction
10
3
10
4
1
1.2
1.6
Triclinic cell
2
3
Vx
ex
10
1.4
as
y
5
mp
tot
ic s
4
lop
e
4
5
7
Vx
10
2
Log10 e x
0.8
5
9 11 13
17 21 25 29 35 41
Elements in each direction
FIG. 10: Relative L2 error in FE solution of Poisson’s equation for a model triclinic charge density,
for a series of meshes of increasing resolution. The asymptotic slope of ≈ −4 demonstrates the
optimal quartic convergence of the cubic approximation.
approximation on the weak formulation, which allows the approximation of the smooth,
derivative-periodic solution to be drawn from the larger admissible space of functions which
are neither necessarily smooth nor derivative-periodic. And in allowing the approximation
to be drawn from this larger space, the weak formulation allows for a better approximation
than would be possible were it to be restricted to the smooth, derivative-periodic subspace.
Of course, it is not sufficient that the approximation merely converges to the exact solution; it should do so at the maximum rate consistent with the completeness of the basis.
Figure 10 shows the relative L2 error e/V for a series of meshes ranging from 5 × 5 × 5
to 41 × 41 × 41, where e = VFE − Vexact and · denotes the L2 norm. According to standard
FE convergence theorems [4], the L2 norm of the error of an FE approximation of degree p
should be O(hp+1),2 where h is the mesh spacing. On a log-log plot then, the asymptotic
slope of the error vs. 1/h (or, equivalently, the number of elements in each direction) for a
series of FE approximations of degree p should be −(p + 1). A least squares fit of the last
three points in Fig. 10 yields a slope of −4.02, indicating that the approximations do indeed
converge at the optimal rate consistent with the cubic completeness of the basis employed.
Optimal pointwise convergence has also been shown [39].
IV.
SOLUTION OF THE KOHN-SHAM EQUATIONS
In the pseudopotential approximation [7], the Kohn-Sham equations of density functional
theory [1, 2] are given by
− 12 ∇2 ψi (x) + Veff ψi (x) = εi ψi (x),
2
(31)
Assuming that the solution is sufficiently smooth (as in the present case) or, if not, that appropriate
measures such as mesh gradation and/or the introduction of special basis functions at singularities have
been taken.
17
Veff = VIl + VInl + VH + Vxc ,
VI,a (x),
VIl =
a
VInl ψi
=
nl
(x, x )ψi (x ),
dx VI,a
(33)
(34)
a
ρe (x )
,
|x − x |
Vxc = Vxc (x; ρe ),
fi ψi∗ (x)ψi (x),
ρe = −
VH = −
(32)
dx
(35)
(36)
(37)
i
nl
where ψi and εi are the Kohn-Sham eigenfunctions and eigenvalues, VI,a and VI,a
are the
local and nonlocal parts of the ionic pseudopotential of atom a, ρe is the electronic charge
density, the integrals extend over all space, and the summations extend over all atoms a and
states i with occupations fi . (For simplicity, we omit spin and crystal momentum indices
and consider the case in which the external potential arises from the ions.) The nonlocal part
VInl and exchange-correlation potential Vxc are determined by the choice of pseudopotentials
and exchange-correlation functional, respectively. VIl is the Coulomb potential arising from
the ions and VH is that arising from the electrons (Hartree potential).
Since the eigenfunctions ψi depend on the effective potential Veff in (31), which depends
on the electronic density ρe in (35) and (36), which depends again on the eigenfunctions
in (37), these equations are coupled and must be solved self-consistently. The process is
generally as follows (see, e.g., Ref. [7]): An initial electronic charge density ρin
e is constructed
(e.g., by overlapping atomic charge densities). The effective potential Veff is computed from
Eqs. (32)–(36). The eigenstates ψi are computed by solving the associated Schrödinger
equation (31). And finally, a new electronic density ρe is computed from (37). If ρe is
in
sufficiently close to ρin
e , then self-consistency has been reached; otherwise, a new ρe is
constructed based on ρe (and possibly previous iterates) and the process is repeated until
self-consistency is reached.
In practice, it is often more efficient to compute VH by solving the associated Poisson
equation rather than evaluating the integral (35) directly. And so in practice, the selfconsistent solution of the Kohn-Sham equations requires the repeated solution of Schrödinger
(31) and Poisson (in place of (35)) equations until self-consistency is reached. For large
calculations, involving many atoms, the solution of these equations becomes the time limiting
step. And so it is to the solution of these equations that the FE method is applied, with the
goal of extending the range of physical systems that can be investigated by such rigorous,
quantum mechanical means.
The efficiency of the FE method in the context of large-scale calculations derives from
the strict locality of the FE basis in real space, thus minimizing the need for extensive
interprocessor communications on massively parallel computational architectures. In order
to accomplish an efficient Kohn-Sham solution, however, both FE and non-FE parts of the
calculation must proceed efficiently on such computational architectures. Due to the longrange 1/r nature of the Coulomb interaction, the formulation of these non-FE parts of the
computation in such a way as to avoid extensive interprocessor communications requires
some care. The key to accomplishing this in the context of crystalline solids is to formulate all terms in the effective potential and total energy in terms of net neutral densities in
18
VI,a and ρI,a (a.u.)
Si
ρI,a
VI,a
r (a.u.)
FIG. 11: Local part VI,a of Si pseudopotential [55] and corresponding localized charge density ρI,a .
The potential has a long-range 1/r tail whereas the corresponding density is localized in real space.
order to avoid divergences associated with the long-range Coulomb interaction [47–52] and
accomplish otherwise long-range lattice summations without the need for reciprocal space
(Fourier) transformations. A number of such approaches have been developed in the context
of modern real-space electronic structure methods. Alemany et al. [22] employ a uniform
neutralizing density in a real-space formulation of the Hartree term. Ionic terms are, however, computed as in the conventional reciprocal space formalism, using Fourier transforms.
Other real-space formulations have employed localized neutralizing densities to eliminate
the need for reciprocal space transformations altogether. The formulation of Bachelet et
al. [53] employs Gaussian representations of local ionic densities to construct total densities
and potentials and associated neutral terms amenable to evaluation in real space. Tsuchida
and Tsukada [9, 35] employ a combination of localized and uniform neutralizing densities
to construct neutral terms optimized for evaluation in real space. Ordejón et al. [54] form
neutral terms by expressing potentials and energies in terms of neutral pseudoatomic densities and differences of the crystal density from these. The formulation of Fattebert and
Nardelli [31] employs neutralizing Gaussian densities and associated potentials, in the spirit
of the classical Ewald method [48], to render long-range interactions short ranged. Pask et
al. [41] have developed a real-space formulation for the Kohn-Sham effective potential and
total energy which employs localized ionic densities to replace long-ranged ionic potentials.
We review this formulation below.
In an infinite crystal, VIl and VH are divergent and the total Coulomb potential VC = VIl +
VH within the unit cell depends on ions and electrons far from the unit cell due to the longrange 1/r nature of the Coulomb interaction. The latter constitutes a particular problem for
real-space formulations such as the FE method. Both difficulties may be overcome, however,
by replacing the long-range ionic potentials by the short ranged charge densities which
generate them, and incorporating long-range interactions into boundary conditions on the
unit cell. By construction, the local ionic pseudopotentials VI,a of each atom a vary as −Za /r
(or rapidly approach this) outside their respective pseudopotential cutoff radii rc,a; where Za
is the effective ionic charge and r is the radial distance. They thus correspond, by Poisson’s
equation, to charge densities ρI,a strictly localized within rc,a (or rapidly approaching this).
Figure 11 shows a typical local ionic pseudopotential and corresponding ionic charge density.
19
Energy (a.u.)
GaAs
FE
Exact
6x6x6
9x9x9
12x12x12
Γ
Γ
FIG. 12: Finite element (FE) and exact self-consistent band structures for GaAs in the local density
approximation. FE results are for series of meshes: 6 × 6 × 6, 9 × 9 × 9, and 12 × 12 × 12 in the
fcc primitive cell. The self-consistent FE approximations converge rapidly and uniformly the exact
self-consistent solution as the number of elements is increased.
The total ionic charge density is then
ρI =
ρI,a (x),
(38)
a
where the summation extends over all atoms in the crystal. Since the ionic densities are
localized in real space, however, the summation in the unit cell is in fact finite and readily
performed in real space, unlike the summation of ionic potentials. Having constructed the
ionic charge density in the unit cell, the total charge density ρ = ρI + ρe may then be
constructed and the total Coulomb potential VC = VIl + VH may be computed at once by a
single Poisson solution subject to periodic boundary conditions:
∇2 VC (x) = 4πρ(x);
(39)
whereupon Veff may be evaluated as in (32).
Figure 12 shows FE and exact self-consistent band structures (Kohn-Sham eigenvalues εi
vs. wavevector k along a path through high-symmetry points in the Brillouin zone) for GaAs,
in the local density approximation of density functional theory. “Exact” results were taken
from a standard PW calculation, employing Fourier transforms in the usual way [7, 56, 57].
FE results are for a series of meshes, 6 × 6 × 6, 9 × 9 × 9, and 12 × 12 × 12 in the fcc
primitive cell, using the above real-space formulation of the crystal potential. Both FE
and PW calculations used the same nonlocal pseudopotentials [55] and exchange-correlation
20
functional [58]. The self-consistent FE approximations converge rapidly and uniformly, over
the whole of the k-space path, to the exact self-consistent solution as the number of elements
is increased. A more quantitative look at the error of the FE approximations (see below)
shows that the convergence is in fact optimal with respect to the completeness of the chosen
FE basis.
V.
TOTAL ENERGY
In the pseudopotential approximation, the total energy in density-functional theory is
given by
l
nl
+ EeI
+ Eee + EII + Exc ,
Etot = Ts + EeI
Ts =
fi dx ψi∗ (x)(− 12 ∇2 )ψi (x),
i
l
EeI
nl
EeI
(40)
(41)
dx ρe (x)VIl (x),
=
fi dx ψi∗ (x)VInl ψi (x),
=−
(42)
(43)
i
EII
Exc
ρe (x)ρe (x )
,
|x − x |
Za Za
= 12
,
|τa − τa |
a,a =a
= − dx ρe (x)εxc (x; ρe ),
Eee =
1
2
dxdx
(44)
(45)
(46)
where Za is the ionic charge of atom a at position τa and, as in (31)–(37), the integrals
extend over all space, and the summations extend over all atoms a and a , and states i with
l
occupations fi . Ts is the kinetic energy of the non-interacting system; EeI
, Eee , and EII are
the potential energies associated with the Coulomb interaction between electrons and ions,
nl
is the energy associated with
electrons and electrons, and ions and ions, respectively; EeI
the nonlocal part of the ionic potential; and Exc is the exchange-correlation energy. Ts is
nl
determined by the Kohn-Sham orbitals and occupations, EeI
is determined by the choice of
pseudopotentials, and Exc is determined by the choice of exchange-correlation functional.
In an infinite crystal, the total energy per unit cell may be obtained by restricting the
integrals over x and summation on a in (41)–(46) to the unit cell, while the integrals over x
l
and summation on a remain over all space. In this case, EeI
is divergent and negative while
Eee and EII are divergent and positive due to the long-range 1/r nature of the Coulomb
interaction. However, in terms of the total charge density ρ and Coulomb potential VC , the
l
finite total Coulomb energy per unit cell EC = EeI
+ Eee + EII may be obtained at once:
1
dx ρ(x)VC (x) − Es ,
(47)
EC = − 2
Ω
where Ω is the unit cell and Es is the ionic self-energy per unit cell. The ionic self-energy is
subtracted so that EC corresponds to the conventional density-functional Coulomb energy,
21
which excludes ionic self-energy. This self energy may be computed from the ionic potentials
and associated densities:
1
Es = − 2
(48)
dx ρI,a (x)VI,a (x),
a
where the summation is over atoms in the unit cell and the integrals are over all space. The
integrals are readily evaluated as one dimensional radial integrals over a finite interval by
virtue of the spherical symmetry and short range of the ionic densities in real space. The
remaining ion-ion energy in (47) then corresponds to the point-ion energy (45) by virtue of
the localization of the ionic charge densities within their respective pseudopotential cutoff
radii.
The above formulation exploits the fact that, although the total Coulomb energy per unit
cell depends physically on the contributions of ions and electrons throughout the crystal,
it is determined completely, per unit cell, by the density and potential within the unit
cell; so that no Fourier transforms or Ewald sums are required and the evaluation can be
accomplished in O(N) operations in real space, where N is the number of atoms.
In terms of the total Coulomb energy, the total energy per unit cell is then
nl
Etot = Ts + EC + EeI
+ Exc .
(49)
nl
The orbital dependence in Ts and EeI
can be eliminated in the usual way using the relation
l
nl
dx ρe (x)Veff
(x) + EeI
=
fi εi
(50)
Ts −
Ω
i
l
is the local part of the effective potential
implied by the Kohn-Sham equations, where Veff
which produces Kohn-Sham orbitals ψi and eigenvalues εi according to (31), and ρe is the
electronic charge density corresponding to orbitals ψi according to (37). Combining (46)–
(50), we arrive then at an explicit real-space expression for the total energy per unit cell in
terms of the Kohn-Sham eigenvalues:
Etot =
i
fi εi +
Ω
l
dx ρe (x)Veff
(x) − 12 ρ(x)VC (x) − ρe (x)εxc (x; ρe )
+
1
2
dx ρI,a (x)VI,a (x). (51)
a
l
is the local part of the effective potential which
In the self-consistent solution process, Veff
produces Kohn-Sham orbitals ψi and eigenvalues εi . The electronic charge density ρe is
constructed from the orbitals, the total charge density ρ is constructed from ρe , and the
total Coulomb potential VC , from ρ. With such accounting of self-consistent inputs and
outputs, the resulting expression is variational in the output density ρe and quadratically
convergent [7, 41, 59].
Figure 13 shows the convergence of the FE total energy and Kohn-Sham eigenvalues to
exact values as the number of elements is increased in a self-consistent GaAs calculation at
an arbitrary k point [41]. Here, “exact” values were taken from a standard PW calculation,
employing Fourier transforms and Ewald summations in the usual way [7, 56, 57]. Both FE
and PW calculations used the same nonlocal pseudopotentials [55] and exchange-correlation
functional [58]. FE results are for a series of uniform wavefunction meshes from 8 × 8 × 8 to
22
EFE – EEXACT (a.u.)
GaAs
Etot
E1
E2
E3
Elements in each direction
FIG. 13: Error EFE − EEXACT of finite element (FE) self-consistent total energy and Kohn-Sham
eigenvalues at an arbitrary k point for GaAs in the local density approximation. FE results are for
a series of uniform meshes from 8 × 8 × 8 to 32 × 32 × 32 in the fcc primitive cell. The asymptotic
slope of ≈ −6 on the log-log scale shows that both total energy and eigenvalues converge to exact
values at the optimal theoretical rate consistent with the cubic completeness of the basis.
32 × 32 × 32 in the fcc primitive cell. As in the PW method, higher resolution is employed in
the calculation of the charge density and potential (twice the resolution of the wavefunction
mesh in the present case). The optimal, variational convergence of the FE approximations
to the exact self-consistent solution is clearly exhibited: the error is strictly positive and
monotonically decreasing, with an asymptotic slope of ≈ −6 on a log-log scale, indicating
an error of O(h6 ), where h is the mesh spacing, consistent with the cubic completeness
of the basis employed. This is in contrast to FD approaches where, lacking a variational
foundation, the error can be of either sign and may oscillate.
VI.
APPLICATIONS
Since the application of the finite element method to ab initio electronic-structure calculations of solids is yet in its early stages, and codes are still under development, applications
to date are still relatively few and timing comparisons with other more mature methods and
codes are not yet meaningful. The speed and capacity of FE electronic-structure codes will
increase substantially as efficient massively parallel implementations are constructed and optimized. Given this, it is notable that comparable efficiency with an established planewave
code has already been reported [35] in parallel calculations on a small number of processors.
Among the early applications of the finite element method to large-scale ab initio calculations in solids is the work of Sterne et al. [60] on the calculation of positron distributions
and lifetimes in metals. Positrons injected into solids tend to avoid nuclei and concentrate
preferentially in open volume regions. Shortly after injection, they annihilate with surrounding electrons, emitting radiation that can be detected experimentally. The annihilation rate
depends on the overlap the positron’s distribution with the surrounding electronic distribution, and the overlap of the distributions depends on the nature of defects in the material.
The positron lifetime is thus an indicator of the types of defects present. The positron distribution and lifetime can be calculated by ab initio methods, thus turning the experimental
lifetime measurement into a defect-specific probe. Since the number of atoms required to
23
Positron
distribution
Fe
Cu
FIG. 14: Positron distribution in the vicinity of a Cu precipitate in Fe. Shown is the density in a
slice through the center of a 5488 atom cell. The positron concentrates in the vicinity of the Cu
atoms and is suppressed at the nuclei, resulting in a highly inhomogeneous distribution.
adequately model the relevant defects can be large, and the positron wavefunctions are not
at all atomic-orbital like, the FE method is particularly well suited to this computation.
When a positron enters a solid, it quickly thermalizes and either becomes trapped in a
defect, if present, or enters a low-momentum (small k) Bloch state. Thus, in either case, by
the time a positron annihilates, it is in its lowest-energy state. And so the central problem is
to calculate this lowest-energy quantum state. The calculation of positron states proceeds,
+
then, in exactly the same way as for electronic states: First, an effective potential veff
for
the positron in the material is constructed. Then, the corresponding positron Schrödinger
equation
+
ψ + = ε+ ψ +
(52)
− 12 ∇2 ψ + + veff
(in atomic units) is solved. Once the positron wavefunction ψ + has been calculated, the
positron annihilation rate τ , the inverse of the positron lifetime λ, can be calculated [61]:
2
(53)
λ = 1/τ = πr0 c n+ (x)n(x)Γ[n(x)] dx,
Ω
where r0 is the classical electron radius, n is the ground-state electron charge density, n+ is
the positron charge density (|ψ + |2 ), and Γ is an enhancement factor which takes into account
the fact that the electrons are drawn toward the positron, thus increasing the overlap and
annihilation rate.
Sterne et al. [60] have employed the FE method to compute positron distributions and
lifetimes for various defect structures consisting of hundreds to thousands of atoms. Figure 14
shows the positron distribution (density) in the vicinity of a Cu precipitate in Fe. The
positron concentrates in the vicinity of the Cu atoms and is suppressed at the nuclei, resulting
in a highly inhomogeneous, non-electron-like distribution. Because the distribution is nonelectron-like (concentrating away from the nuclei rather than around them), the generality
of the FE basis makes it particularly well suited for such problems, whereas other electronicstructure bases, such as linear muffin tin orbital bases for example, are less well suited. The
precipitate was modeled by a 5488 atom cell—among the largest such calculations reported
by any method to date. Using a 42 × 42 × 42 uniform mesh, corresponding to ∼ 95 cubic
24
basis functions/atom, a non-self-consistent finite-element calculation of the positron lifetime
required just 17.9 hrs of CPU time on a single processor.
Among more recent applications of the FE method to large-scale ab initio calculations
is the work of Tsuchida et al. [36, 37] on molecular dynamics simulations of liquids. They
employ a C 1 cubic basis with adaptive coordinates [34] to increase the efficiency of the
representation. Separable nonlocal pseudopotentials are employed and Brillouin zone sampling is restricted to the Γ point. In Ref. [36], they apply the method to the study of the
structural and dynamic properties of liquid methanol (CH3 OH), using a generalized gradient
approximation [62, 63] to exchange and correlation to improve upon the local density approximation in modeling hydrogen bonds. Adaptive coordinate transformations were found
to be especially useful to concentrate degrees of freedom in the vicinity of strongly localized
O orbitals. The liquid was modeled by 32 molecules in a cubic cell of 24.41 a.u.3 , for a total
of 192 atoms. The molecular dynamics simulation was performed for 1.7 ps with a time step
of 1.2 fs, for a total of ∼ 1400 steps (each of which requiring a self-consistent calculation).
In Ref. [37], the method was applied to the study of liquid formamide (HCONH2 ). A more
recent generalized gradient approximation [64] to exchange and correlation was employed to
improve modeling of hydrogen bonds. The adaptive coordinate transformations were used
to approximately double the resolution of the basis at the C, N, and O centers. The liquid
was modeled by 100 molecules in a cubic supercell of side 35.5 a.u., for a total of 600 atoms.
The molecular dynamics simulation was carried out for 7 ps, with a time step of 1.33 fs, for
a total of ∼ 5200 steps. The scope of this simulation is on the order of the largest performed
by standard planewave methods and demonstrates the capacity of the finite element method
for such large-scale computations already, at this relatively early stage of development.
VII.
SUMMARY AND OUTLOOK
Because FE bases are simultaneously polynomial and strictly local in nature, FE methods
retain significant advantages of FD methods without sacrificing the use of a basis, and in
this sense, combine advantages of both PW and FD methods for ab initio electronic structure calculations. In particular, while variational and systematically improvable, the FE
method produces sparse matrices and requires no computation- or communication-intensive
transforms; and so is well suited to large, accurate calculations on massively parallel architectures. However, FE methods produce generalized rather than standard eigenproblems,
require more memory than FD based approaches, and are more difficult to implement. Because of the relative merits of each approach, and because FE based approaches are yet at
an early stage of development, it cannot be known which approach will prove superior in the
large-scale ab initio electronic structure context in the years to come [26]. FE based ab initio
calculations already involving hundreds of atoms [37] are a promising indication, however,
and the development and optimization of FE based approaches for a range of large-scale
applications remains a very active area of research.
Acknowledgments
We thank B.M. Klein and C.Y. Fong for their early and substantial contributions to this
research. This work was performed under the auspices of the U.S. Department of Energy by
25
University of California, Lawrence Livermore National Laboratory under Contract W-7405Eng-48.
[1] P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964); W. Kohn and L.J. Sham, ibid.
140, A1133 (1965).
[2] For reviews see, e.g., N.D. Lang, Solid State Phys. 28, 255 (1973); A.K. Rajagopal, Adv.
Chem. Phys. 41, 59 (1979); W. Kohn and P. Vashishta, in Theory of the Inhomogeneous
Electron Gas, edited by S. Lundqvist and N.H. March (Plenum Press, New York, 1983), p. 79;
N.D. Lang, ibid., p. 309; A.R. Williams and V. von Barth, ibid., p. 189; J. Callaway and N.H.
March, Solid State Phys. 38, 135 (1984); R.O. Jones and O. Gunnarsson, Rev. Mod. Phys.
61, 689 (1989).
[3] O.C. Zienkiewicz and R.L. Taylor, The Finite Element Method, 4th ed. (McGraw-Hill, New
York, 1988).
[4] G. Strang and G.J. Fix, An Analysis of the Finite Element Method (Prentice-Hall, Englewood
Cliffs, NJ, 1973).
[5] Finite Element Handbook, edited by H. Kardestuncer et al. (McGraw-Hill, New York, 1987).
[6] D. Singh, Plane waves, pseudopotentials and the LAPW method, Kluwer Academic, 1994.
[7] W.E. Pickett, Comput. Phys. Rep. 9, 115 (1989).
[8] S.R. White, J.W. Wilkins, and M.P. Teter, Phys. Rev. B 39, 5819 (1989).
[9] E. Tsuchida and M. Tsukada, Phys. Rev. B 52, 5573 (1995).
[10] H. L. Skriver, The LMTO Method, Springer-Verlag, 1984.
[11] S. Yamakawa and S. Hyodo, J. Alloys and Compounds 356-357, 231 (2003).
[12] D. Vanderbilt, Phys. Rev. B 41, 7892 (1990); K. Laasonen, R. Car, C. Lee, and D. Vanderbilt,
ibid. 43, 6796 (1991).
[13] A.M. Rappe, K.M. Rabe, E. Kaxiras, and J.D. Joannopoulos, Phys. Rev. B 41, 1227 (1990).
[14] J.S. Lin, A. Qteish, M.C. Payne, and V. Heine, Phys. Rev. B 47, 4174 (1993).
[15] F. Gygi, Europhys. Lett. 19, 617 (1992); Phys. Rev. B 48, 11692 (1993); 51, 11190 (1995).
[16] A. Devenyi, K. Cho, T.A. Arias, and J.D. Joannopoulos, Phys. Rev. B 49, 13373 (1994).
[17] D.R. Hamann, Phys. Rev. B 51, 7337 (1995); 51, 9508 (1995); 54, 1568 (1996).
[18] S. Goedecker, M. Boulet, and T. Deutsch, Comput. Phys. Commun. 154, 105 (2003).
[19] A. Canning and D. Raczkowski, Bull. Am. Phys. 49, 344 (2004).
[20] C.-K. Skylaris, A.A. Mostofi, P.D. Haynes, O. Diéguez, and M.C. Payne, Phys. Rev. B 66,
035119 (2002).
[21] J.R. Chelikowsky, N. Troullier, and Y. Saad, Phys. Rev. Lett. 72, 1240 (1994); J.R. Chelikowsky, N. Troullier, K. Wu, and Y. Saad, Phys. Rev. B 50, 11355 (1995).
[22] M.M.G. Alemany, M. Jain, L. Kronik, and J.R. Chelikowsky, Phys. Rev. B 69, 075101 (2004).
[23] A.P. Seitsonen, M.J. Puska, and R.M. Nieminen, Phys. Rev. B 51, 14057 (1995); M. Heiskanen,
T. Torsti, M.J. Puska, and R.M. Nieminen, Phys. Rev. B 63, 245106 (2001).
[24] F. Gygi and G. Galli, Phys. Rev. B 52, R2229 (1995).
[25] K.A. Iyer, M.P. Merrick, and T.L. Beck, J. Chem. Phys. 103, 227 (1995).
[26] T.L. Beck, Rev. Mod. Phys. 72, 1041 (2000).
[27] T. Hoshi, M. Arai, and T. Fujiwara, Phys. Rev. B 52, R5459 (1995).
[28] E.L. Briggs, D.J. Sullivan, and J. Bernholc, Phys. Rev. B 52, R5471 (1995); 54, 14362 (1996).
[29] N.A. Modine, G. Zumbach, and E. Kaxiras, Phys Rev. B 55, 10289 (1997); U.V. Waghmare,
26
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
H. Kim, I.J. Park, N. Modine, P. Maragakis, and E. Kaxiras, Comput. Phys. Commun. 137,
341 (2001).
J.-L Fattebert, J. Comput. Phys. 149, 75 (1999).
J.-L. Fattebert and M.B. Nardelli, in Handbook of Numerical Analysis, vol. X, Elsevier, Amsterdam, 2003.
For reviews see, e.g., J. Linderberg, Comput. Phys. Rep. 6, 209 (1987); J.J.S. Neto and J.
Linderberg, Comput. Phys. Commun. 66, 55 (1991); and L.R. Ram-Mohan, Finite Element
and Boundary Element Applications in Quantum Mechanics (Oxford University Press, New
York, 2002).
B. Hermansson and D. Yevick, Phys. Rev. B 33, 7241 (1986).
E. Tsuchida and M. Tsukada, Phys. Rev. B 54, 7602 (1996).
E. Tsuchida and M. Tsukada; J. Phys. Soc. Japan 67, 3844 (1998).
E. Tsuchida, Y. Kanada, and M. Tsukada, Chem. Phys. Lett. 311, 236 (1999).
E. Tsuchida, J. Chem. Phys. 121, 4740 (2004).
J.E. Pask, B.M. Klein, C.Y. Fong, and P.A. Sterne, Phys. Rev. B 59, 12352 (1999).
J.E. Pask, B.M. Klein, P.A. Sterne, and C.Y. Fong, Comput. Phys. Commun. 135, 1 (2001).
J.E. Pask and P.A. Sterne, in Handbook of Materials Modeling, Vol. 1 (R. Catlow, H. Shercliff,
and S. Yip Eds.), Kluwer Academic Publishers, 2004, in press.
J.E. Pask and P.A. Sterne, Phys. Rev. B 71, 113101 (2005).
S. Jun, Int. J. Numer. Meth. Engng. 59, 1909 (2004).
M.C. Payne, M.P. Teter, D.C. Allan, T.A. Arias, and J.D. Joannopoulos, Rev. Mod. Phys.
64, 1045 (1992).
T.A. Arias, Rev. Mod. Phys. 71, 267 (1999).
N.W. Ashcroft and N.D. Mermin, Solid State Physics (Holt, Rinehart and Winston, New
York, 1976).
M.L. Cohen and T.K. Bergstresser, Phys. Rev. 141, 789 (1966).
E. Madelung, Phys. Z. 19, 524 (1918).
P.P. Ewald, Ann. Phys. 64, 253 (1921); Göttinger Nachr., Math.-Phys. Kl. II 3, 55 (1937).
E. Wigner and F. Seitz, Phys. Rev. 43, 804 (1933); 46, 509 (1934).
K. Fuchs, Proc. R. Soc. 151, 585 (1935).
R.A. Coldwell-Horsfall and A.A. Maradudin, J. Math. Phys. 1, 395 (1960).
J. Ihm, A. Zunger, and M.L. Cohen, J. Phys. C 12, 4409 (1979).
G.B. Bachelet, H.S. Greenside, G.A. Baraff, and M. Schlüter, Phys. Rev. B 24, 4745 (1981);
G.A. Baraff and M. Schlüter, Phys. Rev. B 28, 2296 (1983); G.A. Baraff and M. Schlüter,
Phys. Rev. B 30, 1853 (1984).
P. Ordejón, E. Artacho, and J.M. Soler, Phys. Rev. B 53, 10441 (1996); J.M. Soler, E. Artacho,
J.D. Gale, A. Garcı́a, J. Junquera, P. Ordejón, D. Sánchez-Portal, J. Phys.: Condens. Matter
14, 2745 (2002).
C. Hartwigsen, S. Goedecker, and J. Hutter, Phys. Rev. B 58, 3641 (1998).
X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese, L. Sindic, M.
Verstraete, G. Zerah, F. Jollet, M. Torrent, A. Roy, M. Mikami, Ph. Ghosez, J.-Y. Raty, and
D.C. Allan, Comput. Mat. Sci. 25, 478 (2002).
The abinit code is a common project of the Université Catholique de Louvain, Corning Incorporated, and other contributors (URL http://www.abinit.org).
J.P. Perdew and A. Zunger, Phys. Rev. B 23, (1981).
J.R. Chelikowsky and S.G. Louie, Phys. Rev. B 29, 3470 (1984).
27
[60]
[61]
[62]
[63]
[64]
P.A. Sterne, J.E. Pask, and B.M. Klein, Appl. Surf. Sci. 149, 238 (1999).
See, e.g., M.J. Puska and R.M. Nieminen, Rev. Mod. Phys. 66, 841 (1994).
A.D. Becke, Phys Rev. A 38, 3098 (1988).
C. Lee, W. Yang, and R.G. Parr, Phys. Rev. B 37, 785 (1988).
J.P. Perdew, K. Burke, and M. Ernzerhof, Phys Rev. Lett. 77, 3865 (1996).
28
Download