Finite element methods in ab initio electronic structure calculations J.E. Pask∗ and P.A. Sterne Lawrence Livermore National Laboratory, University of California, Livermore, California 94550 (Dated: March 29, 2005) Abstract We review the application of the finite element (FE) method to ab initio electronic structure calculations in solids. The FE method is a general approach for the solution of differential and integral equations which uses a strictly local, piecewise polynomial basis. Because the basis is composed of polynomials, the method is completely general and its accuracy is systematically improvable. Because the basis is strictly local in real space, the method allows for variable resolution in real space; produces sparse, structured matrices, enabling the effective use of iterative solution methods; and is well suited for parallel implementation. The method thus combines significant advantages of both real-space-grid and basis-oriented approaches, and so is well suited for large, accurate ab initio calculations. We review the construction and properties of the required FE bases and their use in the self-consistent solution of the Kohn-Sham equations of density functional theory. PACS numbers: 02.70.Dh, 71.15.-m, 71.70.+z ∗ Electronic address: pask1@llnl.gov 1 I. INTRODUCTION Over the course of the past few decades, the density functional theory (DFT) of Hohenberg, Kohn, and Sham [1, 2] has proven to be an accurate and reliable basis for the understanding and prediction of a wide range of materials properties from first principles (ab initio), with no experimental input or empirical parameters. However, the solution of the equations of DFT is a formidable task and this has limited the range of physical systems which can be investigated by such rigorous, quantum mechanical means. In order to extend the interpretive and predictive power of such quantum mechanical theories further into the domain of “real materials”, involving nonstoichiometric deviations, defects, grain boundaries, surfaces, interfaces, and the like, robust and efficient methods for the solution of the associated quantum mechanical equations are of critical importance. The finite element (FE) method [3–5] is a general method for the solution of partial differential and integral equations which has found wide application in diverse areas of research. Here, we discuss its application to large-scale ab initio electronic structure calculations of solids. While the FE method is a general approach, capable of providing accurate solutions to both all-electron [6] and pseudopotential [7] formulations of the equations of DFT, its application to all-electron problems in molecules and solids has so far been limited by the large number of basis functions required to adequately describe all-electron solutions near nuclei, where the solutions can have cusps and oscillate rapidly. White et al. [8] applied the method to isolated monatomic and diatomic all-electron problems, and found that as many as 105 basis functions per atom were required to achieve sufficient accuracy using uniform meshes. Tsuchida and Tsukada [9] have applied the method to isolated H2 molecules using nonuniform meshes. They found also that, even with nuclei at nodal points in the mesh to facilitate the modeling of cusps, many thousands of basis functions were still required to achieve the desired accuracy. This is to be compared with the tens or at most hundreds of basis functions per atom required by conventional all-electron methods such as the linearized augmented planewave [6] and linearized muffin tin orbital [10] methods, which employ problem-specific bases constructed from isolated atomic solutions. Such methods lack the generality of the FE method but, by building known physics into the basis, can be far more efficient than the FE method for such problems. For the FE method to become competitive with conventional methods in the all-electron context, specialized basis functions, such as isolated atomic solutions or Gaussians, will likely need to be added to the standard FE basis to increase the efficiency of the representation, as in the recent work of Yamakawa and Hyodo [11], for example, on molecules and clusters. In the pseudopotential context, however, where solutions are much smoother, and simpler, more general basis sets are typically employed, the FE method is immediately applicable and we focus upon this context here. Like the traditional planewave (PW) method [7], the FE method is a variational expansion approach, in which solutions are represented as a linear combination of basis functions. However, whereas the PW method employs a Fourier basis, with every basis function overlapping every other, the FE method employs a basis of strictly local piecewise polynomials, each overlapping only its immediate neighbors. Because the FE basis consists of polynomials, the method is completely general and systematically improvable, like the PW method. Because the FE basis is strictly local, however, the method offers some significant advantages with respect to large-scale calculations. First, because the FE basis functions are localized, they can be concentrated where needed in real space to increase the efficiency of the rep- 2 resentation. The PW basis has the same resolution at all points in space and this leads to substantial inefficiencies in the solution of highly inhomogeneous problems such as those involving first-row or transition-metal atoms. Recent progress on this issue in the context of PW bases includes ultrasoft pseudopotentials [12], optimized pseudopotentials [13, 14], and adaptive coordinate transformations [15–17]. Second, the FE method can accommodate a wide variety of boundary conditions: Dirichlet boundary conditions for molecules or clusters, Bloch boundary conditions for crystals, a mixture of these for surfaces, etc. The PW method is limited to periodic boundary conditions, which leads to inefficiencies in cluster and surface calculations in particular, where large vacuum regions must be employed to minimize interactions among periodic images. Finally, and most significantly for large-scale calculations, the strict locality of the FE basis facilitates implementation on massively parallel computational architectures by minimizing the need for nonlocal communications. PW methods rely on Fourier transforms which can lead to substantial inefficiencies on such distributed architectures due the need for every node to communicate with every other. Recent progress on this issue in the context of PW bases includes new fast Fourier transform formulations [18, 19] and the pioneering work of Skylaris et al. [20] reformulating the PW basis in terms of periodic bandwidth limited delta functions. The advantages of a strictly local, real-space approach in large-scale calculations have been amply demonstrated in the context of finite difference (FD) methods [21–31]. These methods allow for some variable resolution in real space, can accommodate a variety of boundary conditions, and require no computation- or communication-intensive transforms. However, FD methods achieve these advantages by giving up the use of a basis altogether, which leads to disadvantages such as limited accuracy in integrations and nonvariational convergence. By retaining the use of a basis while remaining strictly local in real space, FE methods combine significant advantages of both PW and FD approaches. The FE method [3–5] has had a long history of success in diverse applications ranging from civil engineering to quantum mechanics. Applications in engineering go back to the 1950s. Applications to the electronic structure of isolated atomic and molecular systems began to appear in the 1970s [32]. White, Wilkins, and Teter [8] applied the method to full 3D atomic and molecular calculations in 1989 and demonstrated the favorable scaling of the method with the number of basis functions afforded by the strict locality and realspace nature of the basis. Applications to solids have appeared more recently. Hermansson and Yevick [33] applied the FE method to non-self-consistent solid-state electronic structure calculations in 1986, finding that it was less efficient than the PW method for small problems with relatively smooth potentials, when uniform meshes and cubic- or lower-order bases were employed. Subsequent work has focused on the application of the method to large-scale calculations, where the strict locality of the basis can provide significant advantages in the context of massively parallel computations. Tsuchida and Tsukada [9, 34–37] have applied the method to full self-consistent molecular and solid-state electronic structure calculations. They have implemented nonuniform meshes, adaptive coordinate transformations, pseudopotential and all-electron calculations, molecular dynamics, and an O(N) formulation for insulating materials; and have demonstrated the favorable efficiency of the method relative to FD approaches. Pask et al. [38–41] have formulated a general FE based electronic structure method, allowing treatment of metallic as well as insulating systems, arbitrary Bravais lattices, and full Brillouin zone sampling. They have addressed, in particular, the enforcement of the required Bloch-periodic boundary conditions in the context of a general C 0 basis, the use of nonlocal operators in the context of self-consistent calculations, 3 and the computation of the crystal potential and total energy in real space. Recently, Jun [42] has reported initial non-self-consistent results using a mesh-free FE basis to solve the Schrödinger equation subject to Bloch-periodic boundary conditions in an arbitrary Bravais lattice with arbitrary Brillouin zone sampling. The solution of the equations of DFT can be accomplished by a number of approaches, including direct minimization of the total energy functional [43], solution of the associated Lagrangian equations [44], and self-consistent solution of the Kohn-Sham equations [1]. In the context of solids, Tsuchida and Tsukada [9, 34, 35] have formulated an FE based direct minimization approach for large-scale calculations of insulating systems. Pask et al. [38– 41] have formulated an FE based Kohn-Sham approach appropriate for both metallic and insulating systems. The Kohn-Sham approach requires the repeated solution of coupled Schrödinger and Poisson equations with rapidly varying potentials and densities, subject to Bloch-periodic and periodic boundary conditions, respectively, in general parallelepiped domains. In practice, the solution of these equations constitutes the most computationally intensive part of the Kohn-Sham solution process and it is here where different ab initio methods mainly differ: the choice of methods employed to solve the required Schrödinger and Poisson equations. Here, we review the solution of the equations of DFT in solids using the finite element method. In Sec. II, we discuss the construction and properties of the required FE bases. In Sec. III, we discuss the solution of the Schrödinger and Poisson equations, subject to the required boundary conditions, using these bases. In Sec. IV, we discuss the use of these solutions in the self-consistent solution of the Kohn-Sham equations. In Sec. V, we discuss the computation of the density-functional total energy. In Sec. VI, we discuss applications. And we conclude with a summary and outlook in Sec. VII. II. FINITE ELEMENT BASES Finite element bases are, in the end, just well chosen sets of strictly local, piecewise polynomials. However, there are a great many variations, of various dimensions, geometries, polynomial orders, and continuities [3, 5], and we do not propose to review them all here. Rather, we focus on a broad and widely used class of bases which encompasses those most commonly employed in practice and most often adopted in the quantum-mechanical electronic structure context in particular: nodal bases, defined by their values and/or derivatives at specified points in the domain. The construction and key properties of such FE bases are perhaps best conveyed in the simplest case: a one-dimensional (1D), piecewise-linear basis. Figure 1 shows the steps involved in the construction of such a basis on a domain Ω = (0, 1). The domain is partitioned into subdomains called elements (Fig. 1(a)). In this case, the domain is partitioned into three elements Ω1 –Ω3 ; in practice, there are typically many more, so that each element encompasses only a small fraction of the domain. For simplicity, we have chosen a uniform partition but this need not be the case in general. (Indeed, it is precisely the flexibility to partition the domain as desired which allows for the substantial efficiency of the basis in highly inhomogeneous problems.) A parent basis φ̂i is then defined on the parent element Ω̂ = (−1, 1) (Fig. 1(b)). In this case, the parent basis functions are φ̂1 (ξ) = (1 − ξ)/2 and φ̂2 (ξ) = (1 + ξ)/2. Since the parent basis consists of two (independent) linear polynomials, it is complete to linear order, i.e., a linear combination can represent any linear polynomial exactly. Furthermore, it is defined such that each function takes on the value 1 at exactly 4 one point, its associated node, and vanishes at all (one, in this case) other nodes. Local basis (e) functions φi are then generated by transformations ξ (e) (x) of the parent basis functions φ̂i from the parent element Ω̂ to each element Ωe (Fig. 1(c)). In present case, for example, (1) (1) φ1 (x) ≡ φ̂1 (ξ (1) (x)) = 1 − 3x and φ2 (x) ≡ φ̂2 (ξ (1) (x)) = 3x, where ξ (1) (x) = 6x − 1. Finally, the piecewise-polynomial basis functions φi of the method are generated by piecing (e) together the local basis functions φi (Fig. 1(d)). This is the process of assembly. In the present case, for example, (1) φ1 (x), x ∈ [0, 1/3], φ1 (x) = 0, otherwise, (1) φ2 (x), x ∈ [0, 1/3], φ2 (x) = φ(2) 1 (x), x ∈ [1/3, 2/3], 0, otherwise. The above 1D piecewise-linear FE basis possesses the key properties of all such nodal bases, whether of higher dimension or higher polynomial order. First, the basis functions are strictly local, i.e., nonzero over only a small fraction of the domain. This leads to sparse matrices and scalability, as in FD approaches, while retaining the use of a basis, as in PW approaches. Second, within each element, the basis functions are simple, low-order polynomials, which leads to computational efficiency, generality, and systematic improvability, as in both FD and PW approaches. Third, the basis functions are C 0 in nature, i.e., continuous but not necessarily smooth. (Higher-order bases can be smoother, but we consider the C 0 case for generality.) As we shall discuss, this necessitates extra care in the solution of second-order problems, with periodic boundary conditions in particular. Finally, the basis functions have the key property φi (xj ) = δij i.e., each basis function takes on a value of 1 at its associated node and vanishes at all other nodes. (Higher-order bases may possess this property with respect to derivatives as well as values at nodes.) By virtue of this property, an FE expansion f (x) = i ci φi (x) has the property f (xj ) = cj , so that the expansion coefficients have a direct, real-space meaning. This eliminates the need for computationally intensive transforms, such as Fourier transforms in PW based approaches [7], and facilitates preconditioning in iterative solutions, such as multigrid in FD approaches [26]. Figure 1(d) shows a general C 0 linear basis, capable of representing any piecewise linear function (having the same polynomial subintervals) exactly. To solve a problem subject to vanishing Dirichlet boundary conditions, as occurs in molecular or cluster calculations, one can restrict the basis as in Fig. 1(e), i.e., omit boundary functions. To solve a problem subject to periodic boundary conditions, as occurs in solid-state electronic structure calculations, one can restrict the basis as in Fig. 1(f), i.e., piece together local basis functions across the domain boundary in addition to piecing together across interelement boundaries. Regarding this periodic basis, however, it should be noted that an arbitrary linear combination f (x) = i ci φi (x) necessarily satisfies f (0) = f (1) (1) but does not necessarily satisfy f (0) = f (1). 5 (2) Ω = (0,1) Ω2 Ω1 (a) 0 Ω3 1/3 φˆ1 1 (b) ˆ = (−1,1) Ω 0 ξ1 = −1 0 φ1(1) φ2(1) φ1(2) x φˆ2 ξ2 = 1 ξ (2) ( x) ξ (1) ( x) 1 1 2/3 ξ (3) ( x) φ2(2) ξ (transform) φ1(3) φ2(3) (c) 0 x1(1) = 0 φ1 0 x1 = 0 (e) 0 1 (f) φ2 Periodic 0 x1 = 0 φ3 φ1 x φ1 φ4 x4 = 1 x φ2 x1 = 1/ 3 0 x2(3) = 1 x3 = 2 / 3 x2 = 1/ 3 Dirichlet 1 x2(2) = x1(3) = 2 / 3 (assemble) General 1 (d) x2(1) = x1(2) = 1/ 3 x2 = 2 / 3 φ2 x2 = 1/ 3 1 φ3 x3 = 2 / 3 x φ1 1 x FIG. 1: 1D piecewise-linear FE bases. (a) Domain and elements. (b) Parent element and parent basis functions. (c) Local basis functions generated by transformations of parent basis functions to each element. (d) General piecewise-linear basis, generated by piecing together local basis functions across interelement boundaries. (e) Dirichlet basis, generated by omitting boundary functions. (f) Periodic basis, generated by piecing together boundary functions. Thus, unlike PW or other such smooth bases, while the value-periodic condition (1) is enforced by the use of such an FE basis, the derivative-periodic condition (2) is not. And so for problems requiring the enforcement of both, as in solid-state electronic structure, the derivative condition must be enforced by other means [38]. We address this in Sec. III. Higher-order FE bases are constructed by defining more independent parent basis functions, which requires that some basis functions be of higher order than linear. And, as in the linear case, what is typically done is to define all functions to be of the same order so that, for example, to define a 1D quadratic basis, one would define three quadratic parent basis functions; for a 1D cubic basis, four cubic parent basis functions, etc. Figure 2 shows the construction of a 1D quadratic basis. With higher-order basis functions, however, come new possibilities. For example, with cubic basis functions there are sufficient degrees of freedom to specify both value and slope at end points, thus allowing for the possibility of both value and slope continuity across interelement boundaries, and so allowing for the possibility of a 6 Ω = (0,1) Ω2 Ω1 (a) 0 1/3 (b) ˆ = (−1,1) Ω φˆ2 0 ξ1 = −1 φ1(1) (c) 0 x1(1) = 0 φ3(1) φ2(1) x2(1) φ3(2) x2(2) (transform) φ1(3) φ3(3) φ2(3) x2(3) x3(2) = x1(3) x3(3) = 1 φ1 φ2 φ4 φ3 φ5 φ6 φ7 (d) 0 x1 = 0 x2 = 1/ 6 Dirichlet 1 x (assemble) General 1 ξ (3) ( x) φ2(2) x3(1) = x1(2) ξ ξ3 = 1 ξ (2) ( x) φ1(2) 1 φˆ3 ξ2 = 0 ξ (1) ( x) 1 x 2/3 φˆ1 1 Ω3 φ1 x4 = 1/ 2 x3 = 1/ 3 φ2 φ3 x5 = 2 / 3 x6 = 5 / 6 φ4 x7 = 1 x φ5 (e) 0 1 x1 = 1/ 6 0 Periodic φ1 x2 = 1/ 3 φ2 x3 = 1/ 2 φ4 φ3 x4 = 2 / 3 x x5 = 5 / 6 φ5 φ6 1 φ1 (f) 0 x1 = 0 x2 = 1/ 6 x4 = 1/ 2 x3 = 1/ 3 x5 = 2 / 3 x6 = 5 / 6 x 1 FIG. 2: 1D piecewise-quadratic FE bases. (a) Domain and elements. (b) Parent element and parent basis functions. (c) Local basis functions generated by transformations of parent basis functions to each element. (d) General piecewise-quadratic basis, generated by piecing together local basis functions across interelement boundaries. (e) Dirichlet basis, generated by omitting boundary functions. (f) Periodic basis, generated by piecing together boundary functions. Cubic Hermite Parent Basis Functions φˆ1 φˆ3 φˆ2 φˆ4 FIG. 3: Cubic Hermite parent basis functions φ̂1 –φ̂4 on the parent element Ω̂ = (−1, 1). 7 TABLE I: Basis functions and associated nodes of 3D C 0 cubic parent element. Here, ξ0 = ξi ξ, η0 = ηi η, and ζ0 = ζi ζ. (ξi , ηi , ζi ) (±1, ±1, ±1) (± 13 , ±1, ±1) (±1, ± 13 , ±1) (±1, ±1, ± 13 ) φ̂i 1 64 (1 9 64 (1 9 64 (1 9 64 (1 + ξ0 )(1 + η0 )(1 + ζ0 ){9(ξ 2 + η 2 + ζ 2 ) − 19} − ξ 2 )(1 + 9ξ0 )(1 + η0 )(1 + ζ0 ) − η 2 )(1 + 9η0 )(1 + ξ0 )(1 + ζ0 ) − ζ 2 )(1 + 9ζ0 )(1 + ξ0 )(1 + η0 ) C 1 (continuous value and slope) rather than C 0 basis. One common family of such bases is the Hermite family. Higher-order Hermite bases are defined by adding derivative conditions to existing nodes rather than value conditions at additional nodes. For example, on the parent element Ω̂ = (−1, 1), the cubic Hermite parent basis functions are defined by the conditions φ̂1 (−1) = 1, φ̂1 (−1) = 0, φ̂1 (1) = 0, φ̂1 (1) = 0, φ̂2 (−1) = 0, φ̂2 (−1) = 1, φ̂2 (1) = 0, φ̂2 (1) = 0, φ̂3 (−1) = 0, φ̂3 (−1) = 0, φ̂3 (1) = 1, φ̂3 (1) = 0, φ̂4 (−1) = 0, φ̂4 (−1) = 0, φ̂4 (1) = 0, φ̂4 (1) = 1, which give φ̂1 (ξ) = (2 − 3ξ + ξ 3 )/4, 2 3 φ̂2 (ξ) = (1 − ξ − ξ + ξ )/4, 3 (3) (4) φ̂3 (ξ) = (2 + 3ξ − ξ )/4, (5) φ̂4 (ξ) = (−1 − ξ + ξ 2 + ξ 3 )/4, (6) as shown in Fig. 3. Using these parent functions, the corresponding piecewise polynomial FE (e) basis is then constructed as before: local basis functions φi are defined by transformations of the parent functions φ̂i from the parent element Ω̂ to each element Ωe and these are pieced together to form the final piecewise polynomial basis. However, in this case, four functions are pieced together across each element boundary rather than two: the two with matching nonzero values (as in the C 0 case) and the two with matching nonzero derivatives. The resulting piecewise polynomial basis is then C 1 and complete to cubic order. For sufficiently smooth problems, such higher order continuity can yield greater accuracy per degree of freedom [4] and such bases have been used in the electronic structure context [8, 35]. However, while straightforward in one dimension, in higher dimensions this requires matching both values and derivatives, including cross terms, across entire curves or surfaces, which becomes increasingly difficult to accomplish and leads to additional constraints on the meshes which can be employed [4]. Higher-dimensional FE bases are constructed along the same lines as the 1D case: partition the domain into elements, define local basis functions within each element via transformations of parent basis functions, and piece together the resulting local basis functions to form the final piecewise-polynomial basis. In higher dimensions, however, there arises a 8 ( 1, 1, 1) (-1, 1, 1) ( 1,-1, 1) (-1,-1, 1) (-1, 1,-1) 1, 1,-1) ((1,-1,-1) (-1,-1,-1) ( 1,-1,-1) FIG. 4: Three-dimensional C 0 cubic parent element and nodes (denoted by open circles). significant additional choice: that of shape. The most common 2D element shapes are triangles and quadrilaterals. In 3D, tetrahedra, hexahedra (e.g., parallelepipeds), and wedges are among the most common. A variety of shapes have been employed in atomic and molecular calculations [32]. In solid-state electronic structure calculations, the domain can be reduced to a parallelepiped and C 0 [9, 38] as well as C 1 [35] parallelepiped elements have been employed. Pask et al. [38] have employed a 3D C 0 cubic serendipity basis [5] with parent basis functions and associated nodes as shown in Table I and Fig. 4. Tsuchida and Tsukada [35] have employed a 3D C 1 cubic Hermite basis with parent basis functions defined by Cartesian products of the 1D functions (3)–(6) shown in Fig. 3. III. SOLUTION OF THE SCHRÖDINGER AND POISSON EQUATIONS Having established the key properties of the relevant FE bases, we turn now to their use in solution of the Kohn-Sham equations of density functional theory in solids. In this section, we discuss the solution of the required Schrödinger and Poisson equations and in the next, we discuss the use of these solutions in the self-consistent solution of the Kohn-Sham equations. For generality, we shall follow the development of Pask et al. [38–41] in terms of an arbitrary C 0 basis. Since C 1 and smoother bases are contained in C 0 , the development applies to all such smoother bases as well. In a perfect crystal, the electronic potential is periodic, i.e., V (x + R) = V (x) (7) for all lattice vectors R, and the solutions of the Schrödinger equation satisfy Bloch’s theorem ψ(x + R) = eik·R ψ(x) (8) for all lattice vectors R and wavevectors k [45]. Thus the values of V (x) and ψ(x) throughout the crystal are completely determined by their values in a single unit cell; and so the solutions of the Poisson and Schrödinger equations in the crystal can be reduced to their solutions in a single unit cell, subject to boundary conditions consistent with Eqs. (7) and (8), respectively. 9 R3 R2 R1 FIG. 5: Parallelepiped unit cell (domain) Ω, boundary Γ, surfaces Γ1 –Γ3 , and associated lattice vectors R1 –R3 . A. 1. Schrödinger equation Differential formulation We consider first the Schrödinger problem: − 12 ∇2 ψ + V ψ = εψ (9) in a unit cell, subject to boundary conditions consistent with Bloch’s theorem, where V is an arbitrary periodic potential (atomic units are used throughout). Since V is periodic, ψ can be written in the form (10) ψ(x) = u(x)eik·x , where u is a complex, cell-periodic function satisfying u(x) = u(x + R) for all lattice vectors R [45]. Assuming the form (10), the Schrödinger equation (9) becomes − 12 ∇2 u − ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u = εu, (11) where, allowing for the possibility of nonlocality, V l and V nl are the local and nonlocal parts of V . From the periodicity condition (8), the required boundary conditions on the unit cell are then [45] u(x) = u(x + Rl ), x ∈ Γl , n̂ · ∇u(x) = n̂ · ∇u(x + Rl ), x ∈ Γl , (12) (13) where Γl and Rl are the surfaces of the boundary Γ and associated lattice vectors R shown in Fig. 5, and n̂ is the outward unit normal at x. The required Bloch-periodic problem in the infinite crystal can thus be reduced to the periodic problem (11)–(13) in the finite unit cell. Since the domain has been reduced to the unit cell, however, nonlocal operators require further consideration. In particular, if as is typically the case for ab initio pseudopotentials, the domain of definition is all space (i.e., the full crystal), they must be transformed to the relevant finite subdomain (i.e., the unit cell). For a separable potential of the usual form [7] a a vlm (x − τa − Rn )hal vlm (x − τa − Rn ), (14) V nl (x, x ) = n,a,l,m 10 where n runs over all lattice vectors and a runs over atoms in the unit cell, the nonlocal term e−ik·x V nl eik·x u in Eq. (11) is −ik·x a a a dx vlm vlm (x − τa − Rn )hl (x − τa − Rn )eik·x u(x ), (15) e n,a,l,m where the integral is over all space. Rewriting the integral over all space as a sum over all unit cells and using the cell-periodicity of u, the integral in (15) can be transformed to the unit cell as follows: a dx vlm (x − τa − Rn )eik·x u(x) a dx vlm (x − τa − Rn − Rn )eik·x e−ik·Rn u(x − Rn ) = n = Ω n Ω a dx vlm (x − τa − Rn )eik·x eik·Rn e−ik·Rn u(x), so that the nonlocal term e−ik·x V nl eik·x u in Eq. (11) becomes −ik·x ik·Rn a a a e vlm (x − τa − Rn )hl dx e−ik·Rn vlm (x − τa − Rn )eik·x u(x ). e a,l,m Ω n n (16) Having reduced the required problem to a periodic problem on a finite domain, solutions may be obtained using a periodic FE basis defined in that domain. However, if the basis is C 0 , as is typically the case, rather than C 1 or smoother, some additional consideration is required. First, the direct application of the Laplacian to such a basis is clearly problematic. Second, being periodic in value, consistent with (12), but not necessarily in derivative, contrary to (13), the basis does not satisfy the required boundary conditions. Both issues can be resolved by reformulating the original differential formulation (11)–(13) in weak (integral) form. 2. Weak formulation An equivalent integral equation can be formed by taking the inner product of the differential equation (11) with an arbitrary1 “test function” v: v ∗ − 12 ∇2 u − ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u − εu dx = 0. Ω The order of the highest derivative can now be reduced and a boundary term can be created (whose usefulness will become clear presently) by integrating the ∇2 term by parts: ∗ 1 1 ∇v · ∇u dx − 2 v ∗ ∇u · n̂ ds 2 Ω 1 Γ We assume that all functions involved are sufficiently regular to keep the integrals well defined, as will be the case in our applications. For details regarding the relevant functions spaces see, e.g., Ref. [4]. 11 + Ω v ∗ −ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u − εu dx = 0. The derivative boundary condition (13) can now be incorporated by restricting v to v ∈ V = {v : v(x) = v(x + Rl ), x ∈ Γl } , i.e., to satisfy the value condition (12). Then, using the fact that the domain is a parallelepiped, the boundary term can be written as 1 v ∗ (x) [∇u(x) − ∇u(x + Rl )] · n̂ ds, 2 l Γl which vanishes upon the assertion of the derivative boundary condition. Thus, the differential equation (11) and derivative boundary condition (13) together imply the integral equation ∗ ∗ l −ik·x nl ik·x 1 1 2 ∇v · ∇u dx + v k u + V u + e V e u − εu dx = 0 ∀v ∈ V. −ik · ∇u + 2 2 Ω Ω (17) Conversely, it can be shown [39] that the integral equation (17) implies both the differential equation and derivative boundary condition. The differential formulation (11)–(13) is thus equivalent to the following weak formulation: Find the scalars ε and functions u ∈ V such that ∗ 1 ∇v · ∇u dx + v ∗ −ik · ∇u + 12 k 2 u + V l u + e−ik·x V nl eik·x u dx = 2 Ω Ω ε v ∗ u dx ∀v ∈ V (18) Ω or, more concisely, h(v, u) = εs(v, u) ∀v ∈ V, (19) where V = {v : v(x) = v(x + Rl ), x ∈ Γl }. This formulation contains no derivatives higher than first order and requires only value-periodicity (i.e., Eq. (12)) of the approximation space V, thus resolving both of the aforementioned Laplacian and boundary condition issues in the context of a C 0 basis. 3. Discretization Having reformulated the problem in weak form, discretization can now proceed in a C 0 FE basis. Let n n u= cj φj and v = d i φi , j=1 i=1 where {φk }nk=1 is a real C 0 basis satisfying the value-periodic condition (12) and {cj } and {di } are complex coefficients, so that u and v are restricted to a finite dimensional subspace Vn ⊂ V. Then (19) becomes: Find the scalars ε and coefficients cj such that d∗i cj h(φi , φj ) = ε d∗i cj s(φi , φj ) ∀{di }, i,j i,j 12 which implies cj h(φi , φj ) = ε j cj s(φi, φj ) j due to the arbitrariness the {di }. And so we arrive at a generalized eigenproblem determining the approximate eigenvalues ε and eigenfunctions u = j cj φj of the weak formulation and thus of the required problem: Hij cj = ε Sij cj , (20) j where Hij = Sij = Ω Ω dx j l −ik·x nl ik·x 1 2 , ∇φ · ∇φ − ik · φ ∇φ + k φ φ + V φ φ + φ e V e φ i j i j i j i j i j 2 2 1 dx φiφj . (21) (22) For a separable potential of the usual form (14), transformed to the unit cell as in (16), the nonlocal term in (21) becomes aj ∗ ai a dx φi(x)e−ik·x V nl eik·x φj (x) = flm hl flm , (23) Ω a,l,m where ai flm = Ω dx φi(x)e−ik·x a eik·Rn vlm (x − τa − Rn ). n Since Ω dxφi ∇φj = − Ω dxφj ∇φi for all φi, φj satisfying the value-periodic condition (12), the local part of H is Hermitian while S is real symmetric. Also, as is clear from (23), for a separable potential of the usual form (14), the nonlocal part of H is Hermitian as well. a are local in Furthermore, since both the FE basis functions φi and nonlocal projectors vlm real space, both H and S are sparse, with O(n) nonzero elements, where n is the number of basis functions. And since the basis functions are simple, low-order polynomials, the matrix elements can be evaluated to any desired accuracy with minimal computation. Thus, like the PW method, the FE method is variational and matrix elements can be readily evaluated to any desired accuracy, independent of the mesh. And like the FD method, the resulting matrix equations are sparse and well suited for solution by modern iterative methods. 4. Results Figure 6 shows a series of FE results for a Si pseudopotential [38, 46]. Since the method allows for the direct treatment of any Bravais lattice, results are shown for a two-atom fcc primitive cell. The figure shows the sequence of band structures (eigenvalues εi vs. wavevector k along a path through high-symmetry points in the Brillouin zone) obtained for 3 × 3 × 3, 4 × 4 × 4, and 6 × 6 × 6 uniform meshes vs. exact values at selected k points (where “exact values” were obtained from a well converged PW calculation). The variational nature of the method is clearly exhibited: the error is strictly positive and the entire band structure converges rapidly and uniformly from above as the number of basis functions is increased. Figure 7 shows a more quantitative view of the convergence: the fractional error 13 Si Energy (eV) 15 10 3x3x3 4x4x4 6x6x6 5 FE Exact 0 * L X FIG. 6: Finite element (FE) and exact band structures for a Si pseudopotential. FE results are for a series of uniform meshes: 3×3×3, 4×4×4, and 6×6×6 in the fcc primitive cell. The convergence is rapid and variational: the entire band structure converges from above, with excellent agreement for the 6 × 6 × 6 mesh. log10 Elements in each direction 10 2 10 3 10 4 10 5 10 6 0.9 1 1.1 1.2 1.3 Si 3 EE k .123, .234, .345 2 a cubic C0 elements 2 E1 E2 E3 E4 E5 4 slope 6 log 10 EE 0.8 5 6 6 8 10 12 14 16 Elements in each direction 18 20 FIG. 7: Fractional error ∆E/E of FE eigenvalues at an arbitrary k point for a Si pseudopotential. FE results are for a series of uniform meshes from 6 × 6 × 6 to 20 × 20 × 20 in the fcc primitive cell. The variational nature and consistent, sextic convergence of the method are clearly exhibited: the error is strictly positive and rapidly establishes a constant asymptotic slope of ≈ −6 on the log-log scale. 14 of the first few eigenvalues with increasing numbers of elements, at an arbitrary k point. The variational nature and consistent, sextic convergence of the method are clearly shown: the error is strictly positive and rapidly establishes a constant asymptotic slope of ≈ −6 on the log-log scale, indicating an error of O(h6 ), where h is the mesh spacing, consistent with asymptotic convergence theorems for the cubic-complete case [4]. B. Poisson equation The Poisson solution proceeds along the same lines as the Schrödinger solution: differential formulation, weak formulation to address Laplacian and boundary condition issues, then discretization in the appropriate C 0 basis. In this case, the required problem is ∇2 V (x) = 4πρ(x), x ∈ Ω, (24) V (x) = V (x + Rl ), x ∈ Γl , n̂ · ∇V (x) = n̂ · ∇V (x + Rl ), x ∈ Γl , (25) (26) subject to boundary conditions where V (x) is the potential energy of an electron in the charge density ρ(x) and the domain Ω, bounding surfaces Γl , and lattice vectors Rl are again as in Fig. 5. The weak formulation of (24)–(26) is [39]: Find V ∈ V such that (27) − ∇v · ∇V dx = 4π vρ(x) dx ∀v ∈ V, Ω Ω where V = {v : v(x) = v(x + Rl ), x ∈ Γl }. Subsequent discretization in a real periodic FE basis φj leads to a symmetric linear system determining the approximate solution V (x) = j cj φj (x) of the weak formulation and thus of the required problem: Lij cj = fi , (28) j where Lij = − dx ∇φi (x) · ∇φj (x), Ω fi = 4π dx φi(x)ρ(x). (29) (30) Ω As in the FD method, the above matrices are sparse and structured due to the strict locality of the FE basis, requiring only O(n) storage and O(n) operations for solution by iterative methods, whereas O(n log n) operations are required in a PW basis. Figures 8–10 show results for a general triclinic model problem [39] for which an analytic solution is available. Figure 8 shows the FE and exact solutions for V along a body diagonal of the triclinic cell, for a series of uniform meshes: 2 × 2 × 2, 4 × 4 × 4, and 6 × 6 × 6. The FE approximations converge rapidly to the exact solution, with substantial improvement upon doubling the number of elements in each direction from 2 × 2 × 2 to 4 × 4 × 4, and excellent agreement for the 6 × 6 × 6 mesh. Furthermore, the approximations converge to the smooth 15 6 Triclinic cell V s R1 R2 R3 4 2 0 2 2 2 2 4 4 4 6 6 6 Exact 4 0 0.2 0.4 0.6 0.8 1 s FIG. 8: FE and exact solutions of Poisson’s equation for a model triclinic charge density, for a series of meshes of increasing resolution. The FE approximations converge rapidly to the smooth, fully periodic exact solution, though the basis is itself neither smooth nor fully periodic. 0.06 Triclinic cell e s R1 R2 R3 0.04 0.02 0 0.02 0.04 6 6 6 12 12 12 0.06 0 0.2 0.4 0.6 0.8 1 s FIG. 9: Error in FE solution of Poisson’s equation for a model triclinic charge density, along a body diagonal of the cell. The error converges rapidly and uniformly: being of comparable size, and converging at a comparable rate, at the boundary as elsewhere. The cusps in the error are a manifestation of the C 0 nature of the approximation. analytic solution despite the fact that the basis is not smooth (being merely C 0 ) and to the correct derivative-periodic limit despite the fact that the basis is not derivative-periodic (being merely value-periodic). The approximations satisfy the value-periodic condition (25) exactly, by virtue of the basis, and the derivative-periodic condition (26) asymptotically, by virtue of the weak formulation. A more detailed look at the error of the approximations is provided by Fig. 9 which shows the error e = VFE − Vexact along the diagonal of the cell for two meshes: 6 × 6 × 6 and 12 × 12 × 12. The C 0 nature of the approximations is evident in the cusps (the exact solution being a smooth function in this case). As is clear from Fig. 8, these cusps correspond to small but in general nonzero differences in slope across interelement boundaries. These small differences are a natural consequence of basing the 16 Log10 Elements in each direction 10 3 10 4 1 1.2 1.6 Triclinic cell 2 3 Vx ex 10 1.4 as y 5 mp tot ic s 4 lop e 4 5 7 Vx 10 2 Log10 e x 0.8 5 9 11 13 17 21 25 29 35 41 Elements in each direction FIG. 10: Relative L2 error in FE solution of Poisson’s equation for a model triclinic charge density, for a series of meshes of increasing resolution. The asymptotic slope of ≈ −4 demonstrates the optimal quartic convergence of the cubic approximation. approximation on the weak formulation, which allows the approximation of the smooth, derivative-periodic solution to be drawn from the larger admissible space of functions which are neither necessarily smooth nor derivative-periodic. And in allowing the approximation to be drawn from this larger space, the weak formulation allows for a better approximation than would be possible were it to be restricted to the smooth, derivative-periodic subspace. Of course, it is not sufficient that the approximation merely converges to the exact solution; it should do so at the maximum rate consistent with the completeness of the basis. Figure 10 shows the relative L2 error e/V for a series of meshes ranging from 5 × 5 × 5 to 41 × 41 × 41, where e = VFE − Vexact and · denotes the L2 norm. According to standard FE convergence theorems [4], the L2 norm of the error of an FE approximation of degree p should be O(hp+1),2 where h is the mesh spacing. On a log-log plot then, the asymptotic slope of the error vs. 1/h (or, equivalently, the number of elements in each direction) for a series of FE approximations of degree p should be −(p + 1). A least squares fit of the last three points in Fig. 10 yields a slope of −4.02, indicating that the approximations do indeed converge at the optimal rate consistent with the cubic completeness of the basis employed. Optimal pointwise convergence has also been shown [39]. IV. SOLUTION OF THE KOHN-SHAM EQUATIONS In the pseudopotential approximation [7], the Kohn-Sham equations of density functional theory [1, 2] are given by − 12 ∇2 ψi (x) + Veff ψi (x) = εi ψi (x), 2 (31) Assuming that the solution is sufficiently smooth (as in the present case) or, if not, that appropriate measures such as mesh gradation and/or the introduction of special basis functions at singularities have been taken. 17 Veff = VIl + VInl + VH + Vxc , VI,a (x), VIl = a VInl ψi = nl (x, x )ψi (x ), dx VI,a (33) (34) a ρe (x ) , |x − x | Vxc = Vxc (x; ρe ), fi ψi∗ (x)ψi (x), ρe = − VH = − (32) dx (35) (36) (37) i nl where ψi and εi are the Kohn-Sham eigenfunctions and eigenvalues, VI,a and VI,a are the local and nonlocal parts of the ionic pseudopotential of atom a, ρe is the electronic charge density, the integrals extend over all space, and the summations extend over all atoms a and states i with occupations fi . (For simplicity, we omit spin and crystal momentum indices and consider the case in which the external potential arises from the ions.) The nonlocal part VInl and exchange-correlation potential Vxc are determined by the choice of pseudopotentials and exchange-correlation functional, respectively. VIl is the Coulomb potential arising from the ions and VH is that arising from the electrons (Hartree potential). Since the eigenfunctions ψi depend on the effective potential Veff in (31), which depends on the electronic density ρe in (35) and (36), which depends again on the eigenfunctions in (37), these equations are coupled and must be solved self-consistently. The process is generally as follows (see, e.g., Ref. [7]): An initial electronic charge density ρin e is constructed (e.g., by overlapping atomic charge densities). The effective potential Veff is computed from Eqs. (32)–(36). The eigenstates ψi are computed by solving the associated Schrödinger equation (31). And finally, a new electronic density ρe is computed from (37). If ρe is in sufficiently close to ρin e , then self-consistency has been reached; otherwise, a new ρe is constructed based on ρe (and possibly previous iterates) and the process is repeated until self-consistency is reached. In practice, it is often more efficient to compute VH by solving the associated Poisson equation rather than evaluating the integral (35) directly. And so in practice, the selfconsistent solution of the Kohn-Sham equations requires the repeated solution of Schrödinger (31) and Poisson (in place of (35)) equations until self-consistency is reached. For large calculations, involving many atoms, the solution of these equations becomes the time limiting step. And so it is to the solution of these equations that the FE method is applied, with the goal of extending the range of physical systems that can be investigated by such rigorous, quantum mechanical means. The efficiency of the FE method in the context of large-scale calculations derives from the strict locality of the FE basis in real space, thus minimizing the need for extensive interprocessor communications on massively parallel computational architectures. In order to accomplish an efficient Kohn-Sham solution, however, both FE and non-FE parts of the calculation must proceed efficiently on such computational architectures. Due to the longrange 1/r nature of the Coulomb interaction, the formulation of these non-FE parts of the computation in such a way as to avoid extensive interprocessor communications requires some care. The key to accomplishing this in the context of crystalline solids is to formulate all terms in the effective potential and total energy in terms of net neutral densities in 18 VI,a and ρI,a (a.u.) Si ρI,a VI,a r (a.u.) FIG. 11: Local part VI,a of Si pseudopotential [55] and corresponding localized charge density ρI,a . The potential has a long-range 1/r tail whereas the corresponding density is localized in real space. order to avoid divergences associated with the long-range Coulomb interaction [47–52] and accomplish otherwise long-range lattice summations without the need for reciprocal space (Fourier) transformations. A number of such approaches have been developed in the context of modern real-space electronic structure methods. Alemany et al. [22] employ a uniform neutralizing density in a real-space formulation of the Hartree term. Ionic terms are, however, computed as in the conventional reciprocal space formalism, using Fourier transforms. Other real-space formulations have employed localized neutralizing densities to eliminate the need for reciprocal space transformations altogether. The formulation of Bachelet et al. [53] employs Gaussian representations of local ionic densities to construct total densities and potentials and associated neutral terms amenable to evaluation in real space. Tsuchida and Tsukada [9, 35] employ a combination of localized and uniform neutralizing densities to construct neutral terms optimized for evaluation in real space. Ordejón et al. [54] form neutral terms by expressing potentials and energies in terms of neutral pseudoatomic densities and differences of the crystal density from these. The formulation of Fattebert and Nardelli [31] employs neutralizing Gaussian densities and associated potentials, in the spirit of the classical Ewald method [48], to render long-range interactions short ranged. Pask et al. [41] have developed a real-space formulation for the Kohn-Sham effective potential and total energy which employs localized ionic densities to replace long-ranged ionic potentials. We review this formulation below. In an infinite crystal, VIl and VH are divergent and the total Coulomb potential VC = VIl + VH within the unit cell depends on ions and electrons far from the unit cell due to the longrange 1/r nature of the Coulomb interaction. The latter constitutes a particular problem for real-space formulations such as the FE method. Both difficulties may be overcome, however, by replacing the long-range ionic potentials by the short ranged charge densities which generate them, and incorporating long-range interactions into boundary conditions on the unit cell. By construction, the local ionic pseudopotentials VI,a of each atom a vary as −Za /r (or rapidly approach this) outside their respective pseudopotential cutoff radii rc,a; where Za is the effective ionic charge and r is the radial distance. They thus correspond, by Poisson’s equation, to charge densities ρI,a strictly localized within rc,a (or rapidly approaching this). Figure 11 shows a typical local ionic pseudopotential and corresponding ionic charge density. 19 Energy (a.u.) GaAs FE Exact 6x6x6 9x9x9 12x12x12 Γ Γ FIG. 12: Finite element (FE) and exact self-consistent band structures for GaAs in the local density approximation. FE results are for series of meshes: 6 × 6 × 6, 9 × 9 × 9, and 12 × 12 × 12 in the fcc primitive cell. The self-consistent FE approximations converge rapidly and uniformly the exact self-consistent solution as the number of elements is increased. The total ionic charge density is then ρI = ρI,a (x), (38) a where the summation extends over all atoms in the crystal. Since the ionic densities are localized in real space, however, the summation in the unit cell is in fact finite and readily performed in real space, unlike the summation of ionic potentials. Having constructed the ionic charge density in the unit cell, the total charge density ρ = ρI + ρe may then be constructed and the total Coulomb potential VC = VIl + VH may be computed at once by a single Poisson solution subject to periodic boundary conditions: ∇2 VC (x) = 4πρ(x); (39) whereupon Veff may be evaluated as in (32). Figure 12 shows FE and exact self-consistent band structures (Kohn-Sham eigenvalues εi vs. wavevector k along a path through high-symmetry points in the Brillouin zone) for GaAs, in the local density approximation of density functional theory. “Exact” results were taken from a standard PW calculation, employing Fourier transforms in the usual way [7, 56, 57]. FE results are for a series of meshes, 6 × 6 × 6, 9 × 9 × 9, and 12 × 12 × 12 in the fcc primitive cell, using the above real-space formulation of the crystal potential. Both FE and PW calculations used the same nonlocal pseudopotentials [55] and exchange-correlation 20 functional [58]. The self-consistent FE approximations converge rapidly and uniformly, over the whole of the k-space path, to the exact self-consistent solution as the number of elements is increased. A more quantitative look at the error of the FE approximations (see below) shows that the convergence is in fact optimal with respect to the completeness of the chosen FE basis. V. TOTAL ENERGY In the pseudopotential approximation, the total energy in density-functional theory is given by l nl + EeI + Eee + EII + Exc , Etot = Ts + EeI Ts = fi dx ψi∗ (x)(− 12 ∇2 )ψi (x), i l EeI nl EeI (40) (41) dx ρe (x)VIl (x), = fi dx ψi∗ (x)VInl ψi (x), =− (42) (43) i EII Exc ρe (x)ρe (x ) , |x − x | Za Za = 12 , |τa − τa | a,a =a = − dx ρe (x)εxc (x; ρe ), Eee = 1 2 dxdx (44) (45) (46) where Za is the ionic charge of atom a at position τa and, as in (31)–(37), the integrals extend over all space, and the summations extend over all atoms a and a , and states i with l occupations fi . Ts is the kinetic energy of the non-interacting system; EeI , Eee , and EII are the potential energies associated with the Coulomb interaction between electrons and ions, nl is the energy associated with electrons and electrons, and ions and ions, respectively; EeI the nonlocal part of the ionic potential; and Exc is the exchange-correlation energy. Ts is nl determined by the Kohn-Sham orbitals and occupations, EeI is determined by the choice of pseudopotentials, and Exc is determined by the choice of exchange-correlation functional. In an infinite crystal, the total energy per unit cell may be obtained by restricting the integrals over x and summation on a in (41)–(46) to the unit cell, while the integrals over x l and summation on a remain over all space. In this case, EeI is divergent and negative while Eee and EII are divergent and positive due to the long-range 1/r nature of the Coulomb interaction. However, in terms of the total charge density ρ and Coulomb potential VC , the l finite total Coulomb energy per unit cell EC = EeI + Eee + EII may be obtained at once: 1 dx ρ(x)VC (x) − Es , (47) EC = − 2 Ω where Ω is the unit cell and Es is the ionic self-energy per unit cell. The ionic self-energy is subtracted so that EC corresponds to the conventional density-functional Coulomb energy, 21 which excludes ionic self-energy. This self energy may be computed from the ionic potentials and associated densities: 1 Es = − 2 (48) dx ρI,a (x)VI,a (x), a where the summation is over atoms in the unit cell and the integrals are over all space. The integrals are readily evaluated as one dimensional radial integrals over a finite interval by virtue of the spherical symmetry and short range of the ionic densities in real space. The remaining ion-ion energy in (47) then corresponds to the point-ion energy (45) by virtue of the localization of the ionic charge densities within their respective pseudopotential cutoff radii. The above formulation exploits the fact that, although the total Coulomb energy per unit cell depends physically on the contributions of ions and electrons throughout the crystal, it is determined completely, per unit cell, by the density and potential within the unit cell; so that no Fourier transforms or Ewald sums are required and the evaluation can be accomplished in O(N) operations in real space, where N is the number of atoms. In terms of the total Coulomb energy, the total energy per unit cell is then nl Etot = Ts + EC + EeI + Exc . (49) nl The orbital dependence in Ts and EeI can be eliminated in the usual way using the relation l nl dx ρe (x)Veff (x) + EeI = fi εi (50) Ts − Ω i l is the local part of the effective potential implied by the Kohn-Sham equations, where Veff which produces Kohn-Sham orbitals ψi and eigenvalues εi according to (31), and ρe is the electronic charge density corresponding to orbitals ψi according to (37). Combining (46)– (50), we arrive then at an explicit real-space expression for the total energy per unit cell in terms of the Kohn-Sham eigenvalues: Etot = i fi εi + Ω l dx ρe (x)Veff (x) − 12 ρ(x)VC (x) − ρe (x)εxc (x; ρe ) + 1 2 dx ρI,a (x)VI,a (x). (51) a l is the local part of the effective potential which In the self-consistent solution process, Veff produces Kohn-Sham orbitals ψi and eigenvalues εi . The electronic charge density ρe is constructed from the orbitals, the total charge density ρ is constructed from ρe , and the total Coulomb potential VC , from ρ. With such accounting of self-consistent inputs and outputs, the resulting expression is variational in the output density ρe and quadratically convergent [7, 41, 59]. Figure 13 shows the convergence of the FE total energy and Kohn-Sham eigenvalues to exact values as the number of elements is increased in a self-consistent GaAs calculation at an arbitrary k point [41]. Here, “exact” values were taken from a standard PW calculation, employing Fourier transforms and Ewald summations in the usual way [7, 56, 57]. Both FE and PW calculations used the same nonlocal pseudopotentials [55] and exchange-correlation functional [58]. FE results are for a series of uniform wavefunction meshes from 8 × 8 × 8 to 22 EFE – EEXACT (a.u.) GaAs Etot E1 E2 E3 Elements in each direction FIG. 13: Error EFE − EEXACT of finite element (FE) self-consistent total energy and Kohn-Sham eigenvalues at an arbitrary k point for GaAs in the local density approximation. FE results are for a series of uniform meshes from 8 × 8 × 8 to 32 × 32 × 32 in the fcc primitive cell. The asymptotic slope of ≈ −6 on the log-log scale shows that both total energy and eigenvalues converge to exact values at the optimal theoretical rate consistent with the cubic completeness of the basis. 32 × 32 × 32 in the fcc primitive cell. As in the PW method, higher resolution is employed in the calculation of the charge density and potential (twice the resolution of the wavefunction mesh in the present case). The optimal, variational convergence of the FE approximations to the exact self-consistent solution is clearly exhibited: the error is strictly positive and monotonically decreasing, with an asymptotic slope of ≈ −6 on a log-log scale, indicating an error of O(h6 ), where h is the mesh spacing, consistent with the cubic completeness of the basis employed. This is in contrast to FD approaches where, lacking a variational foundation, the error can be of either sign and may oscillate. VI. APPLICATIONS Since the application of the finite element method to ab initio electronic-structure calculations of solids is yet in its early stages, and codes are still under development, applications to date are still relatively few and timing comparisons with other more mature methods and codes are not yet meaningful. The speed and capacity of FE electronic-structure codes will increase substantially as efficient massively parallel implementations are constructed and optimized. Given this, it is notable that comparable efficiency with an established planewave code has already been reported [35] in parallel calculations on a small number of processors. Among the early applications of the finite element method to large-scale ab initio calculations in solids is the work of Sterne et al. [60] on the calculation of positron distributions and lifetimes in metals. Positrons injected into solids tend to avoid nuclei and concentrate preferentially in open volume regions. Shortly after injection, they annihilate with surrounding electrons, emitting radiation that can be detected experimentally. The annihilation rate depends on the overlap the positron’s distribution with the surrounding electronic distribution, and the overlap of the distributions depends on the nature of defects in the material. The positron lifetime is thus an indicator of the types of defects present. The positron distribution and lifetime can be calculated by ab initio methods, thus turning the experimental lifetime measurement into a defect-specific probe. Since the number of atoms required to 23 Positron distribution Fe Cu FIG. 14: Positron distribution in the vicinity of a Cu precipitate in Fe. Shown is the density in a slice through the center of a 5488 atom cell. The positron concentrates in the vicinity of the Cu atoms and is suppressed at the nuclei, resulting in a highly inhomogeneous distribution. adequately model the relevant defects can be large, and the positron wavefunctions are not at all atomic-orbital like, the FE method is particularly well suited to this computation. When a positron enters a solid, it quickly thermalizes and either becomes trapped in a defect, if present, or enters a low-momentum (small k) Bloch state. Thus, in either case, by the time a positron annihilates, it is in its lowest-energy state. And so the central problem is to calculate this lowest-energy quantum state. The calculation of positron states proceeds, + then, in exactly the same way as for electronic states: First, an effective potential veff for the positron in the material is constructed. Then, the corresponding positron Schrödinger equation + ψ + = ε+ ψ + (52) − 12 ∇2 ψ + + veff (in atomic units) is solved. Once the positron wavefunction ψ + has been calculated, the positron annihilation rate τ , the inverse of the positron lifetime λ, can be calculated [61]: 2 (53) λ = 1/τ = πr0 c n+ (x)n(x)Γ[n(x)] dx, Ω where r0 is the classical electron radius, n is the ground-state electron charge density, n+ is the positron charge density (|ψ + |2 ), and Γ is an enhancement factor which takes into account the fact that the electrons are drawn toward the positron, thus increasing the overlap and annihilation rate. Sterne et al. [60] have employed the FE method to compute positron distributions and lifetimes for various defect structures consisting of hundreds to thousands of atoms. Figure 14 shows the positron distribution (density) in the vicinity of a Cu precipitate in Fe. The positron concentrates in the vicinity of the Cu atoms and is suppressed at the nuclei, resulting in a highly inhomogeneous, non-electron-like distribution. Because the distribution is nonelectron-like (concentrating away from the nuclei rather than around them), the generality of the FE basis makes it particularly well suited for such problems, whereas other electronicstructure bases, such as linear muffin tin orbital bases for example, are less well suited. The precipitate was modeled by a 5488 atom cell—among the largest such calculations reported by any method to date. Using a 42 × 42 × 42 uniform mesh, corresponding to ∼ 95 cubic 24 basis functions/atom, a non-self-consistent finite-element calculation of the positron lifetime required just 17.9 hrs of CPU time on a single processor. Among more recent applications of the FE method to large-scale ab initio calculations is the work of Tsuchida et al. [36, 37] on molecular dynamics simulations of liquids. They employ a C 1 cubic basis with adaptive coordinates [34] to increase the efficiency of the representation. Separable nonlocal pseudopotentials are employed and Brillouin zone sampling is restricted to the Γ point. In Ref. [36], they apply the method to the study of the structural and dynamic properties of liquid methanol (CH3 OH), using a generalized gradient approximation [62, 63] to exchange and correlation to improve upon the local density approximation in modeling hydrogen bonds. Adaptive coordinate transformations were found to be especially useful to concentrate degrees of freedom in the vicinity of strongly localized O orbitals. The liquid was modeled by 32 molecules in a cubic cell of 24.41 a.u.3 , for a total of 192 atoms. The molecular dynamics simulation was performed for 1.7 ps with a time step of 1.2 fs, for a total of ∼ 1400 steps (each of which requiring a self-consistent calculation). In Ref. [37], the method was applied to the study of liquid formamide (HCONH2 ). A more recent generalized gradient approximation [64] to exchange and correlation was employed to improve modeling of hydrogen bonds. The adaptive coordinate transformations were used to approximately double the resolution of the basis at the C, N, and O centers. The liquid was modeled by 100 molecules in a cubic supercell of side 35.5 a.u., for a total of 600 atoms. The molecular dynamics simulation was carried out for 7 ps, with a time step of 1.33 fs, for a total of ∼ 5200 steps. The scope of this simulation is on the order of the largest performed by standard planewave methods and demonstrates the capacity of the finite element method for such large-scale computations already, at this relatively early stage of development. VII. SUMMARY AND OUTLOOK Because FE bases are simultaneously polynomial and strictly local in nature, FE methods retain significant advantages of FD methods without sacrificing the use of a basis, and in this sense, combine advantages of both PW and FD methods for ab initio electronic structure calculations. In particular, while variational and systematically improvable, the FE method produces sparse matrices and requires no computation- or communication-intensive transforms; and so is well suited to large, accurate calculations on massively parallel architectures. However, FE methods produce generalized rather than standard eigenproblems, require more memory than FD based approaches, and are more difficult to implement. Because of the relative merits of each approach, and because FE based approaches are yet at an early stage of development, it cannot be known which approach will prove superior in the large-scale ab initio electronic structure context in the years to come [26]. FE based ab initio calculations already involving hundreds of atoms [37] are a promising indication, however, and the development and optimization of FE based approaches for a range of large-scale applications remains a very active area of research. Acknowledgments We thank B.M. Klein and C.Y. Fong for their early and substantial contributions to this research. This work was performed under the auspices of the U.S. Department of Energy by 25 University of California, Lawrence Livermore National Laboratory under Contract W-7405Eng-48. [1] P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964); W. Kohn and L.J. Sham, ibid. 140, A1133 (1965). [2] For reviews see, e.g., N.D. Lang, Solid State Phys. 28, 255 (1973); A.K. Rajagopal, Adv. Chem. Phys. 41, 59 (1979); W. Kohn and P. Vashishta, in Theory of the Inhomogeneous Electron Gas, edited by S. Lundqvist and N.H. March (Plenum Press, New York, 1983), p. 79; N.D. Lang, ibid., p. 309; A.R. Williams and V. von Barth, ibid., p. 189; J. Callaway and N.H. March, Solid State Phys. 38, 135 (1984); R.O. Jones and O. Gunnarsson, Rev. Mod. Phys. 61, 689 (1989). [3] O.C. Zienkiewicz and R.L. Taylor, The Finite Element Method, 4th ed. (McGraw-Hill, New York, 1988). [4] G. Strang and G.J. Fix, An Analysis of the Finite Element Method (Prentice-Hall, Englewood Cliffs, NJ, 1973). [5] Finite Element Handbook, edited by H. Kardestuncer et al. (McGraw-Hill, New York, 1987). [6] D. Singh, Plane waves, pseudopotentials and the LAPW method, Kluwer Academic, 1994. [7] W.E. Pickett, Comput. Phys. Rep. 9, 115 (1989). [8] S.R. White, J.W. Wilkins, and M.P. Teter, Phys. Rev. B 39, 5819 (1989). [9] E. Tsuchida and M. Tsukada, Phys. Rev. B 52, 5573 (1995). [10] H. L. Skriver, The LMTO Method, Springer-Verlag, 1984. [11] S. Yamakawa and S. Hyodo, J. Alloys and Compounds 356-357, 231 (2003). [12] D. Vanderbilt, Phys. Rev. B 41, 7892 (1990); K. Laasonen, R. Car, C. Lee, and D. Vanderbilt, ibid. 43, 6796 (1991). [13] A.M. Rappe, K.M. Rabe, E. Kaxiras, and J.D. Joannopoulos, Phys. Rev. B 41, 1227 (1990). [14] J.S. Lin, A. Qteish, M.C. Payne, and V. Heine, Phys. Rev. B 47, 4174 (1993). [15] F. Gygi, Europhys. Lett. 19, 617 (1992); Phys. Rev. B 48, 11692 (1993); 51, 11190 (1995). [16] A. Devenyi, K. Cho, T.A. Arias, and J.D. Joannopoulos, Phys. Rev. B 49, 13373 (1994). [17] D.R. Hamann, Phys. Rev. B 51, 7337 (1995); 51, 9508 (1995); 54, 1568 (1996). [18] S. Goedecker, M. Boulet, and T. Deutsch, Comput. Phys. Commun. 154, 105 (2003). [19] A. Canning and D. Raczkowski, Bull. Am. Phys. 49, 344 (2004). [20] C.-K. Skylaris, A.A. Mostofi, P.D. Haynes, O. Diéguez, and M.C. Payne, Phys. Rev. B 66, 035119 (2002). [21] J.R. Chelikowsky, N. Troullier, and Y. Saad, Phys. Rev. Lett. 72, 1240 (1994); J.R. Chelikowsky, N. Troullier, K. Wu, and Y. Saad, Phys. Rev. B 50, 11355 (1995). [22] M.M.G. Alemany, M. Jain, L. Kronik, and J.R. Chelikowsky, Phys. Rev. B 69, 075101 (2004). [23] A.P. Seitsonen, M.J. Puska, and R.M. Nieminen, Phys. Rev. B 51, 14057 (1995); M. Heiskanen, T. Torsti, M.J. Puska, and R.M. Nieminen, Phys. Rev. B 63, 245106 (2001). [24] F. Gygi and G. Galli, Phys. Rev. B 52, R2229 (1995). [25] K.A. Iyer, M.P. Merrick, and T.L. Beck, J. Chem. Phys. 103, 227 (1995). [26] T.L. Beck, Rev. Mod. Phys. 72, 1041 (2000). [27] T. Hoshi, M. Arai, and T. Fujiwara, Phys. Rev. B 52, R5459 (1995). [28] E.L. Briggs, D.J. Sullivan, and J. Bernholc, Phys. Rev. B 52, R5471 (1995); 54, 14362 (1996). [29] N.A. Modine, G. Zumbach, and E. Kaxiras, Phys Rev. B 55, 10289 (1997); U.V. Waghmare, 26 [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] H. Kim, I.J. Park, N. Modine, P. Maragakis, and E. Kaxiras, Comput. Phys. Commun. 137, 341 (2001). J.-L Fattebert, J. Comput. Phys. 149, 75 (1999). J.-L. Fattebert and M.B. Nardelli, in Handbook of Numerical Analysis, vol. X, Elsevier, Amsterdam, 2003. For reviews see, e.g., J. Linderberg, Comput. Phys. Rep. 6, 209 (1987); J.J.S. Neto and J. Linderberg, Comput. Phys. Commun. 66, 55 (1991); and L.R. Ram-Mohan, Finite Element and Boundary Element Applications in Quantum Mechanics (Oxford University Press, New York, 2002). B. Hermansson and D. Yevick, Phys. Rev. B 33, 7241 (1986). E. Tsuchida and M. Tsukada, Phys. Rev. B 54, 7602 (1996). E. Tsuchida and M. Tsukada; J. Phys. Soc. Japan 67, 3844 (1998). E. Tsuchida, Y. Kanada, and M. Tsukada, Chem. Phys. Lett. 311, 236 (1999). E. Tsuchida, J. Chem. Phys. 121, 4740 (2004). J.E. Pask, B.M. Klein, C.Y. Fong, and P.A. Sterne, Phys. Rev. B 59, 12352 (1999). J.E. Pask, B.M. Klein, P.A. Sterne, and C.Y. Fong, Comput. Phys. Commun. 135, 1 (2001). J.E. Pask and P.A. Sterne, in Handbook of Materials Modeling, Vol. 1 (R. Catlow, H. Shercliff, and S. Yip Eds.), Kluwer Academic Publishers, 2004, in press. J.E. Pask and P.A. Sterne, Phys. Rev. B 71, 113101 (2005). S. Jun, Int. J. Numer. Meth. Engng. 59, 1909 (2004). M.C. Payne, M.P. Teter, D.C. Allan, T.A. Arias, and J.D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992). T.A. Arias, Rev. Mod. Phys. 71, 267 (1999). N.W. Ashcroft and N.D. Mermin, Solid State Physics (Holt, Rinehart and Winston, New York, 1976). M.L. Cohen and T.K. Bergstresser, Phys. Rev. 141, 789 (1966). E. Madelung, Phys. Z. 19, 524 (1918). P.P. Ewald, Ann. Phys. 64, 253 (1921); Göttinger Nachr., Math.-Phys. Kl. II 3, 55 (1937). E. Wigner and F. Seitz, Phys. Rev. 43, 804 (1933); 46, 509 (1934). K. Fuchs, Proc. R. Soc. 151, 585 (1935). R.A. Coldwell-Horsfall and A.A. Maradudin, J. Math. Phys. 1, 395 (1960). J. Ihm, A. Zunger, and M.L. Cohen, J. Phys. C 12, 4409 (1979). G.B. Bachelet, H.S. Greenside, G.A. Baraff, and M. Schlüter, Phys. Rev. B 24, 4745 (1981); G.A. Baraff and M. Schlüter, Phys. Rev. B 28, 2296 (1983); G.A. Baraff and M. Schlüter, Phys. Rev. B 30, 1853 (1984). P. Ordejón, E. Artacho, and J.M. Soler, Phys. Rev. B 53, 10441 (1996); J.M. Soler, E. Artacho, J.D. Gale, A. Garcı́a, J. Junquera, P. Ordejón, D. Sánchez-Portal, J. Phys.: Condens. Matter 14, 2745 (2002). C. Hartwigsen, S. Goedecker, and J. Hutter, Phys. Rev. B 58, 3641 (1998). X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese, L. Sindic, M. Verstraete, G. Zerah, F. Jollet, M. Torrent, A. Roy, M. Mikami, Ph. Ghosez, J.-Y. Raty, and D.C. Allan, Comput. Mat. Sci. 25, 478 (2002). The abinit code is a common project of the Université Catholique de Louvain, Corning Incorporated, and other contributors (URL http://www.abinit.org). J.P. Perdew and A. Zunger, Phys. Rev. B 23, (1981). J.R. Chelikowsky and S.G. Louie, Phys. Rev. B 29, 3470 (1984). 27 [60] [61] [62] [63] [64] P.A. Sterne, J.E. Pask, and B.M. Klein, Appl. Surf. Sci. 149, 238 (1999). See, e.g., M.J. Puska and R.M. Nieminen, Rev. Mod. Phys. 66, 841 (1994). A.D. Becke, Phys Rev. A 38, 3098 (1988). C. Lee, W. Yang, and R.G. Parr, Phys. Rev. B 37, 785 (1988). J.P. Perdew, K. Burke, and M. Ernzerhof, Phys Rev. Lett. 77, 3865 (1996). 28