Large-Scale Density-Functional Calculations for Nanometer-Size Si Materials
Jun-Ichi Iwata
Center for Computational Sciences, University of Tsukuba
Feb 23, 2010, Tsukuba-Edinburgh Computational Science Workshop, Edinburgh

Outline
• Quantum mechanical (first-principles) simulation in solid-state physics
   Density-functional theory: W. Kohn (Nobel Prize in 1998)
• Density-functional simulations for large systems
   Real-space DFT program code for parallel computation (RSDFT)
• Applications of RSDFT to Si nanomaterials: >10,000-atom systems

First-Principles Calculation in Material Physics
• We describe material properties from the behavior of electrons and ions (ions → classical, electrons → quantum).
• We solve the Schrödinger equation for the electronic ground state.
• Density-functional theory is a powerful tool for this purpose.

Density-Functional Theory
Electron density:
$$\rho(\mathbf{r}) = \sum_{i=1}^{N} |\phi_i(\mathbf{r})|^2$$
Energy functional (to be minimized):
$$E[\{\phi_i\}] = \sum_{i=1}^{N} \int d\mathbf{r}\, \phi_i^*(\mathbf{r}) \left(-\frac{1}{2}\nabla^2\right) \phi_i(\mathbf{r}) + \int d\mathbf{r}\, \rho(\mathbf{r})\, v_{\mathrm{Ions}}(\mathbf{r}) + \frac{1}{2} \iint d\mathbf{r}\, d\mathbf{r}'\, \frac{\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} + E_{XC}[\rho]$$
Minimizing the energy gives stable atomic and electronic structures.
Kohn-Sham equation (from minimizing with respect to $\phi_i$):
$$\left(-\frac{1}{2}\nabla^2 + v_{\mathrm{Ions}}(\mathbf{r}) + v[\rho](\mathbf{r})\right) \phi_i(\mathbf{r}) = \varepsilon_i\, \phi_i(\mathbf{r})$$
Potential:
$$v[\rho](\mathbf{r}) = \int d\mathbf{r}'\, \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} + \frac{\delta E_{XC}[\rho]}{\delta \rho(\mathbf{r})}$$
→ We have to solve this equation self-consistently (nonlinear eigenvalue problem), because the potential depends on the density built from the solutions.
P. Hohenberg and W. Kohn, Phys. Rev. 136 (1964) B864.
W. Kohn and L. J. Sham, Phys. Rev. 140 (1965) A1133.

Performance of DFT with a Simple Approximation
$$E_{XC}[\rho] = E_X[\rho] + E_C[\rho]$$
Exchange functional in the local-density approximation (LDA):
$$E_X[\rho] = C_X \int d\mathbf{r}\, \rho^{4/3}(\mathbf{r}), \qquad v_X[\rho](\mathbf{r}) = \frac{\delta E_X[\rho]}{\delta \rho(\mathbf{r})} = \frac{4}{3} C_X\, \rho^{1/3}(\mathbf{r})$$
• Correctly describes various properties, with quantitatively good results.
Si (in diamond structure), M. T. Yin and M. L. Cohen, Phys. Rev. B 26, 5668 (1982):

                        DFT calc.   Expt.
  Lattice constant (Å)  5.37        5.41
  Bulk modulus (Mb)     0.977       0.988

Everybody Wants to Apply DFT to Large Systems
[Figure credit: A. Ichimiya et al., Surf. Sci. 493, 555 (2001).]
• Proteins (cytochrome c oxidase): ~30,000 atoms
• Nanostructures (Si pyramid): ~100,000 atoms
• Usually, we treat 10- to 1000-atom systems by DFT.
• However, we need to treat larger systems:
   to study large objects (nanostructures, proteins)
   to make the atomic model more realistic

Real-Space DFT Program Code (RSDFT)
• Solves the Kohn-Sham equation (eigenvalue problem) → computational cost ~ O(N^3)
• Developed for parallel computers
• Higher-order finite-difference pseudopotential method: J. R. Chelikowsky et al., Phys. Rev. B, (1994)

Real-Space Method ( ⇔ Reciprocal-Space (Plane-Wave) Method )
Continuous space → discrete space: a function becomes a column vector.
$$\phi_n(\mathbf{r}) = \phi_n(x, y, z) \;\xrightarrow{\text{discretize}}\; \phi_n(x_i, y_i, z_i) = \phi_n(i), \qquad \boldsymbol{\phi}_n = \bigl(\phi_n(1), \ldots, \phi_n(M)\bigr)^T$$
Laplacian → higher-order finite difference:
$$\nabla^2 \phi_n(x, y, z) \approx \sum_{m=-6}^{6} C_m\, \phi_n(x + m\Delta x, y, z) + \sum_{m=-6}^{6} C_m\, \phi_n(x, y + m\Delta y, z) + \sum_{m=-6}^{6} C_m\, \phi_n(x, y, z + m\Delta z)$$
Typical number of grid points: 10,000 ~ 1,000,000

RSDFT: Suitable for Parallel First-Principles Calculation
• MPI (Message Passing Interface) library
• Real-space finite difference → sparse matrix
• FFT free (FFT is inevitable in the conventional plane-wave code)
Kohn-Sham equation (finite difference):
$$\left(-\frac{1}{2}\nabla^2 + v_s[\rho](\mathbf{r}) + \hat{v}^{PP}_{nloc}(\mathbf{r})\right) \phi_n(\mathbf{r}) = \varepsilon_n\, \phi_n(\mathbf{r})$$
The 3D grid is divided into several regions for parallel computation.
[Figure: the grid is partitioned into regions assigned to CPU0 through CPU8.]
• Higher-order finite difference (communication with MPI_ISEND, MPI_IRECV):
$$\frac{\partial^2}{\partial x^2} \phi_n(x, y, z) \approx \sum_{m=-6}^{6} C_m\, \phi_n(x + m\Delta x, y, z)$$
• Integration (communication with MPI_ALLREDUCE):
$$\int \phi_m^*(\mathbf{r})\, \phi_n(\mathbf{r})\, d\mathbf{r} \approx \sum_{i=1}^{\mathrm{Mesh}} \phi_m^*(\mathbf{r}_i)\, \phi_n(\mathbf{r}_i)\, \Delta x\, \Delta y\, \Delta z$$

Massively Parallel Computing with Our Recently Developed Code "RSDFT"
Iwata et al, J. Comp. Phys. (2010)
• Real-Space Density-Functional Theory code (RSDFT)
• Based on the finite-difference pseudopotential method (J. R. Chelikowsky et al., PRB 1994)
• Highly tuned for massively parallel computers
• Computations are done on the massively parallel cluster PACS-CS at University of Tsukuba.
(Theoretical peak performance = 5.6 GFLOPS/node)
• The largest system in the present study → Si10701H1996
   Grid points = 3,402,059
   Bands = 22,432
[Figure: convergence behavior for Si10701H1996. Vertical axis: |Vnew-Vold|^2, from 10^-1 down to 10^-10 (log scale); horizontal axis: iteration, 0 to 50.]
Computational time (with 1024 nodes of PACS-CS): 6781 sec. × 60 iteration steps = 113 hours

Flow Chart
Algorithm → subspace iteration method (Rayleigh-Ritz method)
1. Input the initial configuration of ions.
2. Calculate the ionic potentials.
3. Electronic structure optimization (repeat until v_new ≈ v_old):
    Conjugate-gradient method: O(N^2)
    Gram-Schmidt orthonormalization: O(N^3)
    Subspace diagonalization: O(N^3)
    Density and potentials update: O(N)
4. Atomic structure optimization: calculate the Hellmann-Feynman forces. If the maximum force has not converged, move the ions and return to step 2.
Electronic structure optimization must be performed in each atomic optimization step.
Total computational cost ~ O(N^3)

Algorithm 1 → Subspace Iteration Method (Rayleigh-Ritz Method)
Problem: an M-dimensional eigenvalue problem, where L is the discretized Laplacian:
$$\left(-\frac{1}{2} L + v_{\mathrm{Ions}} + v_H[\rho] + v_{XC}[\rho]\right) \phi_n = \varepsilon_n\, \phi_n$$
We need the smallest N (≪ M) eigenpairs.
Initial guess: $\phi_1, \phi_2, \ldots, \phi_N$
Minimize the Rayleigh quotients by the conjugate-gradient method:
$$\varepsilon(\phi_n) = \frac{\langle \phi_n | h_{KS} | \phi_n \rangle}{\langle \phi_n | \phi_n \rangle}, \qquad \phi_n \leftarrow \phi_n + \alpha_n p_n \quad \text{(wave function update)}$$

Algorithm 2
Gram-Schmidt orthogonalization, O(MN^2):
$$\phi'_n = \phi_n - \sum_{m=1}^{n-1} \phi'_m \langle \phi'_m | \phi_n \rangle, \qquad \langle \phi'_m | \phi'_n \rangle = \delta_{mn}$$
Subspace diagonalization, using $\phi_1, \phi_2, \ldots, \phi_N$ as a basis set:
• Calculate the matrix elements, O(MN^2):
$$h_{m,n} = \langle \phi_m | h_{KS} | \phi_n \rangle$$
• Diagonalize the N × N matrix, O(N^3):
$$h_{N \times N}\, \mathbf{c}_n = \varepsilon_n\, \mathbf{c}_n$$
• Form the Ritz vectors, O(MN^2):
$$\phi'_n = \sum_{i=1}^{N} c_{i,n}\, \phi_i$$
$\phi'_1, \phi'_2, \ldots, \phi'_N$ ← initial guess for the next iteration

Gram-Schmidt Orthogonalization
~ Active use of Level 3 BLAS in the O(N^3) computation ~
→ Collaboration with computer scientists has greatly improved the performance of RSDFT!
Time and performance for Gram-Schmidt:

                  Time (sec)   GFLOPS/node
  Old algorithm   661 (710)    0.70 (0.65)
  New algorithm   111 (140)    4.30 (3.50)

Theoretical peak performance = 5.6 GFLOPS/node
The O(N^3) part can be computed at nearly 80% of the theoretical peak performance (4.30 of 5.6 GFLOPS/node)!
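The reorganization of Gram-Schmidt into matrix × matrix products can be illustrated with a small NumPy sketch. This is not the RSDFT implementation: the function name `block_gram_schmidt`, the `block` parameter, the use of real-valued vectors, and the omission of the grid volume element are all assumptions made for illustration. Each block of vectors is first projected against all previously orthonormalized vectors with two matrix products (the part that would map onto Level 3 BLAS), and only the small remainder inside the block is handled vector by vector.

```python
import numpy as np

def block_gram_schmidt(phi, block=4):
    """Orthonormalize the columns of phi (shape M x N), block by block.

    Hypothetical helper for illustration; assumes real-valued vectors
    and a plain dot product (no grid volume element).
    """
    q = np.array(phi, dtype=float, copy=True)
    n = q.shape[1]
    for start in range(0, n, block):
        stop = min(start + block, n)
        if start > 0:
            done = q[:, :start]                      # already orthonormal
            # Two matrix x matrix products replace many vector operations.
            overlaps = done.T @ q[:, start:stop]     # <phi'_m | phi_n>
            q[:, start:stop] -= done @ overlaps
        # Remaining small triangular part: vector-by-vector inside the block.
        for j in range(start, stop):
            for i in range(start, j):
                q[:, j] -= q[:, i] * (q[:, i] @ q[:, j])
            q[:, j] /= np.linalg.norm(q[:, j])
    return q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = rng.standard_normal((50, 12))   # M = 50 grid points, N = 12 vectors
    q = block_gram_schmidt(phi, block=4)
    print(np.allclose(q.T @ q, np.eye(12)))
```

With a larger block size, more of the work moves into the two matrix products, which is the design choice that lets an optimized BLAS run close to peak; the trade-off between block size and numerical robustness is not explored in this sketch.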
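Algorithm of GS
$$\begin{aligned}
\phi'_1 &= \phi_1 \\
\phi'_2 &= \phi_2 - \phi'_1 \langle \phi'_1 | \phi_2 \rangle \\
\phi'_3 &= \phi_3 - \phi'_1 \langle \phi'_1 | \phi_3 \rangle - \phi'_2 \langle \phi'_2 | \phi_3 \rangle \\
\phi'_4 &= \phi_4 - \phi'_1 \langle \phi'_1 | \phi_4 \rangle - \phi'_2 \langle \phi'_2 | \phi_4 \rangle - \phi'_3 \langle \phi'_3 | \phi_4 \rangle \\
\phi'_5 &= \phi_5 - \phi'_1 \langle \phi'_1 | \phi_5 \rangle - \phi'_2 \langle \phi'_2 | \phi_5 \rangle - \phi'_3 \langle \phi'_3 | \phi_5 \rangle - \phi'_4 \langle \phi'_4 | \phi_5 \rangle \\
\phi'_6 &= \phi_6 - \phi'_1 \langle \phi'_1 | \phi_6 \rangle - \phi'_2 \langle \phi'_2 | \phi_6 \rangle - \phi'_3 \langle \phi'_3 | \phi_6 \rangle - \phi'_4 \langle \phi'_4 | \phi_6 \rangle - \phi'_5 \langle \phi'_5 | \phi_6 \rangle
\end{aligned}$$
Part of the calculations can be performed as a matrix × matrix operation!

Elapsed Time for 1 Step of Iteration
PACS-CS (5.6 GFLOPS/node), 256 nodes
[Figure: elapsed time (sec, 0 to 600) versus number of Si atoms (512, 1000, 1728, 2744, 4096), broken down into CG O(N^2), GS O(N^3), SD O(N^3), and others.]
→ The times for the O(N^2) part and the O(N^3) part become comparable.

Application 1: Nanometer-Size Si Quantum Dots
The Si quantum dot is a promising material for several device applications:
• Memory
• Single-electron transistor
• Optical device
Clarifying the relation between the "dot size" and the "band gap" is important for controlling the device properties.
Are first-principles calculations useful for such studies? → Yes, but the system size is very large!
[Figure: a model of the Si quantum dot of 6.6 nm diameter (Si7055H1596).]

Band Gaps
ΔSCF gap:
$$E_g^{\Delta SCF} = I(N) - A(N) = E(N+1) + E(N-1) - 2E(N)$$
KS gap:
$$E_g^{KS} = \varepsilon^{KS}_{LU} - \varepsilon^{KS}_{HO}$$
Experimental fit curve, from STS measurement (B. Zaknoon et al., Nano Letters 8, 1689 (2008)):
$$E_g^{\mathrm{Expt.}} = 1.136 + \frac{9.75}{D^2} \ \text{(eV)}$$
[Figure: Eg (eV, 1.0 to 4.0) versus diameter (nm, 2 to 7) for the ΔSCF gap, the KS gap, and the experimental fit; the models range from 300 atoms to >10,000 atoms.]
The ΔSCF gap seems to be close to the KS gap …

Application 2: Si Nanowires
Samsung Si nanowire devices:

                    IEDM2005       IEDM2006
  Diameter of NW    10 nm          8 nm
  Gate length       30 nm          15 nm
  Vdd               1.0 V          1.0 V
  I_on (n)          2.64 mA/μm     1.4 mA/μm
  I_on (p)          1.11 mA/μm     1.94 mA/μm
  I_off (n)         3.1 nA/μm      2.0 nA/μm
  I_off (p)         0.0056 mA/μm   1.0 nA/μm

Several sizes of Si nanowires:
• 4 nm diameter (425 atoms)
• 10 nm diameter (2341 atoms)
• 20 nm diameter (8941 atoms)
There may be an optimum diameter in the region of 10 nm ~ 20 nm.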
Band Structure and DOS of SiNW (d=1nm)
[Figure: band structure (energy in eV, Γ to X) and DOS (states / eV atom) versus energy (eV).]
d=1nm, Si21H20 (41 atoms), Eg=2.60eV (LDA bulk: 0.53eV)

Band Structure and DOS of SiNW (d=4nm)
[Figure: band structure (energy in eV, Γ to X) and DOS (states / eV atom) versus energy (eV).]
d=4nm, Si341H84 (425 atoms), Eg=0.81eV (LDA bulk: 0.53eV)

Band Structure and DOS of SiNW (d=8nm)
[Figure: band structure (energy in eV, Γ to X) and DOS (states / eV atom) versus energy (eV), shown together with bulk Si.]
d=8nm, Si1361H164 (1525 atoms), Eg=0.61eV
Bulk Si: Eg=0.53eV

Si Nanowire with Surface Roughness
[Figure: top view and side view of Si12822H1544.]
Si12822H1544 (14,366 atoms)
・10nm diameter, 3.3nm height, (100)
・Grid spacing: 0.45Å (~14Ry)
・# of grid points: 4,718,592
・# of bands: 29,024
・Memory: 1,022GB~2,044GB
PACS-CS 1024 nodes (peak performance: 5.6 GFLOPS/node)
・Subspace diagonalization: 4600 sec.
・Gram-Schmidt: 2300 sec.
・Conjugate-gradient method: 3700 sec.
・Total energy calc.: 1200 sec.
・Total (1 step): 12,000 sec.

DOS of SiNW with Roughness
[Figure: DOS (states / eV atom) versus energy (eV) for bulk Si and for the SiNW with roughness.]
d=10nm (with roughness), Si12822H1544 (14,366 atoms), Eg=0.57eV

Application 3: Si Divacancy
Structure of the Si divacancy:
・Small yellow balls: vacancies (no atoms)
・Green balls: Si atoms with dangling bonds
[Figure: the two vacancies (v) and their neighboring atoms a, b, c and a', b', c', with the interatomic distances d_ab and d_ac; the resonant-bond type is shown.]
There are two possibilities for the structure of the Si divacancy.
What is the stable structure?
• EPR experiment (Watkins & Corbett, 1965): large-pairing type
• LDA calculation (Saito & Oshiyama, 1994): the resonant-bond type is stable (the large-pairing type was not found). Model size ~ 60 atoms
• More recent LDA calculation (Oguet et al., 1999):
   ・Both the "large-pairing" and "resonant-bond" structures were found.
   ・The large-pairing type is the most stable (the RB type is a local minimum).
   Model size ~ 300 atoms
→ Model size dependence?
[Figure: the large-pairing type, with atoms a, b, c and a', b', c'.]

Si Divacancy
[Figure: d_ac and d_ab (Å, 2.6 to 3.6) versus model size (62, 214, 510, 998 atoms) for the large-pairing, resonant-bond, and small-pairing structures, with the resonant-bond and large-pairing geometries shown alongside.]
• Structures converge at the 998-atom model.
• The LP structure appears in models of 510 atoms or larger.
• The RB structure is the most stable, but the energy difference is very small (<10 meV).
J.-I. Iwata, et al., Phys. Rev. B 77 (2008) 115208

Summary
• We have developed a real-space DFT program code for large systems by utilizing massively parallel computers.
• Collaboration with computer scientists has greatly improved the performance of RSDFT (especially the O(N^3) part of the calculation, with Level 3 BLAS).
• By using a few hundred to 1000 CPUs, we have achieved first-principles calculations for
   ・Si 1000-atom systems with atomic structure optimization
   ・self-consistent electronic structures of Si 10,000-atom systems
• By using large atomic models → the model-size dependence is eliminated.
• We have applied RSDFT to nanometer-scale Si materials (SiNW, SiQD).
• I think RSDFT will become a useful tool for future device development.