Large-Scale Density-Functional calculations for nano-meter size Si materials Jun-Ichi Iwata

Center for Computational Sciences
University of Tsukuba
Feb 23, 2010, Tsukuba-Edinburgh Computational Science Workshop, Edinburgh
Outline
• Quantum Mechanical (First-Principles) Simulation in Solid-State Physics
• Density-Functional Theory: W. Kohn (Nobel Prize in 1998)
• Density-Functional simulations for large systems (>10,000-atom system)
• Real-Space DFT program code for Parallel Computation (RSDFT)
• Applications of RSDFT for Si nano materials
First-Principles Calculation in Material Physics
• We describe material properties from the behavior of electrons and ions.
• Ions → classical, electrons → quantum
• We solve the Schrödinger equation for the electronic ground state.
• Density-functional theory is a powerful tool for this purpose.
Electron density:
$$\rho(\mathbf{r}) = \sum_{i=1}^{N} |\phi_i(\mathbf{r})|^2$$
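Density-Functional Theory
Energy functional (minimize with respect to $\phi_i$):
$$E[\{\phi_i\}] = \sum_{i=1}^{N} \int d\mathbf{r}\, \phi_i^*(\mathbf{r}) \left(-\frac{1}{2}\nabla^2\right) \phi_i(\mathbf{r}) + \int d\mathbf{r}\, \rho(\mathbf{r})\, v_{Ions}(\mathbf{r}) + \frac{1}{2} \iint d\mathbf{r}\, d\mathbf{r}'\, \frac{\rho(\mathbf{r})\, \rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} + E_{XC}[\rho]$$
By minimizing it, we get stable atomic & electronic structures.
Potential:
$$v[\rho](\mathbf{r}) = \int d\mathbf{r}'\, \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} + \frac{\delta E_{XC}[\rho]}{\delta \rho(\mathbf{r})}$$
Kohn-Sham equation:
$$\left(-\frac{1}{2}\nabla^2 + v_{Ions}(\mathbf{r}) + v[\rho](\mathbf{r})\right) \phi_i(\mathbf{r}) = \varepsilon_i\, \phi_i(\mathbf{r})$$
→ We have to solve this equation self-consistently (nonlinear eigenvalue problem).
P. Hohenberg and W. Kohn, Phys. Rev. 136 (1964) B864.
W. Kohn and L. J. Sham, Phys. Rev. 140 (1965) A1133.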
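The self-consistency requirement can be illustrated with a toy loop. The following is only a minimal sketch under loud assumptions, not the RSDFT scheme: it is one-dimensional, uses a harmonic well as a stand-in for the ionic potential, replaces v[ρ] by the hypothetical local model g·ρ, and uses plain linear mixing. The names `scf_1d`, `g`, and `mix` are illustrative.

```python
import numpy as np

def scf_1d(n_grid=161, box=16.0, n_occ=2, g=0.5, mix=0.3, tol=1e-10, max_iter=200):
    """Toy 1D self-consistent loop for (T + v_ions + v[rho]) phi = eps phi."""
    x = np.linspace(-box / 2, box / 2, n_grid)
    h = x[1] - x[0]
    # Kinetic energy -1/2 d^2/dx^2 with a second-order finite difference.
    kin = (np.diag(np.full(n_grid, 1.0 / h**2))
           - np.diag(np.full(n_grid - 1, 0.5 / h**2), 1)
           - np.diag(np.full(n_grid - 1, 0.5 / h**2), -1))
    v_ions = 0.5 * x**2      # assumption: harmonic stand-in for the ionic potential
    v = np.zeros(n_grid)     # density-dependent potential, initial guess
    for iteration in range(1, max_iter + 1):
        eps, c = np.linalg.eigh(kin + np.diag(v_ions + v))
        phi = c[:, :n_occ] / np.sqrt(h)        # normalize so that sum |phi|^2 h = 1
        rho = np.sum(phi**2, axis=1)           # rho(r) = sum_i |phi_i(r)|^2
        v_new = g * rho                        # assumption: toy model for v[rho]
        change = np.sum((v_new - v)**2) * h    # |v_new - v_old|^2
        if change < tol:
            return eps[:n_occ], rho, h, iteration, True
        v = (1.0 - mix) * v + mix * v_new      # simple linear mixing
    return eps[:n_occ], rho, h, max_iter, False

eps, rho, h, n_iter, converged = scf_1d()
```

The potential depends on the density, and the density depends on the orbitals obtained from that potential, which is why the eigenvalue problem has to be repeated until the potential stops changing.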
Performance of DFT with simple approximation
$$E_{XC}[\rho] = E_X[\rho] + E_C[\rho]$$
Exchange functional in the Local-Density Approximation:
$$E_X[\rho] = C_X \int d\mathbf{r}\, \rho^{4/3}(\mathbf{r})$$
$$v_X[\rho](\mathbf{r}) = \frac{\delta E_X[\rho]}{\delta \rho(\mathbf{r})} = \frac{4}{3}\, C_X\, \rho^{1/3}(\mathbf{r})$$
It correctly describes various properties, with quantitatively good results.
Si (in diamond structure): M. T. Yin and M. L. Cohen, Phys. Rev. B 26, 5668 (1982).
                     | DFT calc. | Expt.
Lattice Constant (Å) | 5.37      | 5.41
Bulk Modulus (Mb)    | 0.977     | 0.988
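As a quick check of the LDA exchange formulas, the sketch below evaluates the standard Dirac exchange, whose energy density scales as ρ^(4/3) and whose potential scales as ρ^(1/3). It assumes Hartree atomic units and the usual constant C_X = -(3/4)(3/π)^(1/3), which the slide does not state. The function names are illustrative.

```python
import math

# Assumption: Dirac exchange constant in Hartree atomic units (negative sign included).
C_X = -0.75 * (3.0 / math.pi) ** (1.0 / 3.0)

def exchange_energy_density(rho):
    """Integrand of E_X[rho]: C_X * rho^(4/3)."""
    return C_X * rho ** (4.0 / 3.0)

def exchange_potential(rho):
    """v_X = dE_X/drho = (4/3) * C_X * rho^(1/3)."""
    return (4.0 / 3.0) * C_X * rho ** (1.0 / 3.0)

def numerical_derivative(f, rho, step=1e-6):
    """Central-difference derivative, used only to cross-check v_X."""
    return (f(rho + step) - f(rho - step)) / (2.0 * step)

rho_test = 0.03  # an arbitrary density value chosen for illustration
difference = abs(numerical_derivative(exchange_energy_density, rho_test)
                 - exchange_potential(rho_test))
```

The numerical derivative of the energy density agrees with the analytic potential, which confirms that the exponent 4/3 in the energy goes with the exponent 1/3 in the potential.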
Everybody wants to apply the DFT for Large systems
A. Ichimiya et al., Surf. Sci. 493, 555 (2001).
• Proteins (cytochrome c oxidase): ~30,000 atoms
• Nano structures (Si pyramid): ~100,000 atoms
• Usually, we treat 10- to 1000-atom systems by DFT.
• However, we need to treat larger systems
  • to study large objects (nano structures, proteins)
  • to make the atomic model more realistic
Real-Space DFT program code (RSDFT)
• Solve the Kohn-Sham equation (eigenvalue problem) → computational costs ~ O(N^3)
• Developed for parallel computers
• Higher-order finite-difference pseudopotential method: J. R. Chelikowsky et al., Phys. Rev. B (1994)
Real-Space Method ( ⇔ Reciprocal-Space (Plane-Wave) Method )
Continuous space → discretize → discrete space:
$$\phi_n(\mathbf{r}) = \phi_n(x, y, z) \;\longrightarrow\; \phi_n(x_i, y_i, z_i) = \phi_n(i)$$
The function becomes a column vector:
$$\phi_n = \left(\phi_n(1), \ldots, \phi_n(i), \ldots, \phi_n(M)\right)^T$$
Laplacian → Higher-Order Finite-Difference:
$$\nabla^2 \phi_n(x, y, z) \approx \sum_{m=-6}^{6} C_m\, \phi_n(x + m\Delta, y, z) + \sum_{m=-6}^{6} C_m\, \phi_n(x, y + m\Delta, z) + \sum_{m=-6}^{6} C_m\, \phi_n(x, y, z + m\Delta)$$
Typical number of grid points: 10,000~1,000,000
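The higher-order stencil can be sketched in one dimension. The code below uses the standard closed form for central finite-difference weights of the second derivative. Whether RSDFT generates its coefficients this way is not stated in the slides, so treat the helper names and the periodic sine test as assumptions for illustration.

```python
from fractions import Fraction
from math import factorial, pi, sin

def second_derivative_coefficients(M):
    """Weights C_0..C_M of the central stencil for d^2/dx^2 (unit spacing, C_-m = C_m)."""
    coeffs = [Fraction(0)] * (M + 1)
    for m in range(1, M + 1):
        sign = 1 if m % 2 == 1 else -1
        coeffs[m] = Fraction(sign * 2 * factorial(M) ** 2,
                             m * m * factorial(M - m) * factorial(M + m))
    coeffs[0] = -2 * sum(coeffs[1:])  # the weights of a derivative stencil sum to zero
    return coeffs

def second_derivative_periodic(f, h, M):
    """Apply sum_{m=-M..M} C_m f(x + m h) / h^2 on a periodic 1D grid."""
    c = [float(v) for v in second_derivative_coefficients(M)]
    n = len(f)
    result = []
    for i in range(n):
        acc = c[0] * f[i]
        for m in range(1, M + 1):
            acc += c[m] * (f[(i + m) % n] + f[(i - m) % n])
        result.append(acc / h ** 2)
    return result

def max_error_for_sine(M, n=64):
    """Largest error of the stencil applied to sin(x); the exact answer is -sin(x)."""
    h = 2 * pi / n
    f = [sin(i * h) for i in range(n)]
    d2 = second_derivative_periodic(f, h, M)
    return max(abs(d + v) for d, v in zip(d2, f))
```

With M = 6, as on the slide, the error on a smooth function is far smaller than with the usual three-point formula (M = 1) on the same grid, which is the reason a coarse grid can be used.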
RSDFT: suitable for parallel first-principles calculation
• MPI ( Message Passing Interface ) library
• Real-Space Finite-Difference → Sparse Matrix
• FFT free (FFT is inevitable in the conventional plane-wave code)
Kohn-Sham eq. (finite-difference):
$$\left(-\frac{1}{2}\nabla^2 + v_s[\rho](\mathbf{r}) + \hat{v}_{nloc}^{PP}(\mathbf{r})\right) \phi_n(\mathbf{r}) = \varepsilon_n\, \phi_n(\mathbf{r})$$
The 3D grid is divided into several regions for parallel computation.
[Figure: grid regions assigned to CPU0 through CPU8]
Higher-order finite difference (MPI_ISEND, MPI_IRECV):
$$\frac{\partial^2}{\partial x^2} \phi_n(x, y, z) \approx \sum_{m=-6}^{6} C_m\, \phi_n(x + m\Delta x, y, z)$$
Integration (MPI_ALLREDUCE):
$$\int \phi_m^*(\mathbf{r})\, \phi_n(\mathbf{r})\, d\mathbf{r} \approx \sum_{i=1}^{Mesh} \phi_m^*(\mathbf{r}_i)\, \phi_n(\mathbf{r}_i)\, \Delta x\, \Delta y\, \Delta z$$
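The two communication patterns can be emulated serially. This sketch does not call MPI at all: it is a plain-Python emulation on a periodic 1D grid, with lists standing in for the data held by each rank. The function names are hypothetical and only mirror the roles of MPI_ISEND/MPI_IRECV (halo exchange for the stencil) and MPI_ALLREDUCE (global sum for integrals).

```python
def split(values, n_ranks):
    """Split a 1D grid into equal contiguous subdomains, one per emulated rank."""
    size = len(values) // n_ranks
    return [values[r * size:(r + 1) * size] for r in range(n_ranks)]

def exchange_halos(local, width):
    """Emulate the send/receive step: pad every subdomain with `width`
    points copied from its periodic left and right neighbours."""
    n = len(local)
    padded = []
    for r in range(n):
        left = local[(r - 1) % n][-width:]
        right = local[(r + 1) % n][:width]
        padded.append(left + local[r] + right)
    return padded

def apply_stencil(padded, coeffs):
    """Apply the symmetric stencil sum_m C_m f(i + m) to the interior points."""
    width = len(coeffs) - 1
    out = []
    for i in range(width, len(padded) - width):
        acc = coeffs[0] * padded[i]
        for m in range(1, width + 1):
            acc += coeffs[m] * (padded[i + m] + padded[i - m])
        out.append(acc)
    return out

def allreduce_sum(partials):
    """Emulate an all-reduce with SUM: every rank ends up with the global total."""
    total = sum(partials)
    return [total] * len(partials)

# Illustrative integer data so that the comparison with a global calculation is exact.
grid = [(7 * i * i + 3 * i) % 11 for i in range(24)]
coeffs = [-30, 16, -1]                      # 12 x the fourth-order weights -5/2, 4/3, -1/12
local = split(grid, 4)
padded = exchange_halos(local, len(coeffs) - 1)
stencil_parts = [apply_stencil(p, coeffs) for p in padded]
norm_on_each_rank = allreduce_sum([sum(v * v for v in part) for part in local])
```

Only the boundary layers of each subdomain are exchanged for the finite-difference operator, while inner products need a single number per rank to be summed globally.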
Massively Parallel Computing with our recently developed code “RSDFT”
Iwata et al., J. Comp. Phys. (2010)
Real-Space Density-Functional Theory code (RSDFT)
• Based on the finite-difference pseudopotential method (J. R. Chelikowsky et al., Phys. Rev. B (1994))
• Highly tuned for massively parallel computers
Computations are done on the massively parallel cluster PACS-CS at the University of Tsukuba
(theoretical peak performance = 5.6 GFLOPS/node).
The largest system in the present study → Si10701H1996
Grid points = 3,402,059
Bands = 22,432
Convergence behavior for Si10701H1996
[Figure: |Vnew-Vold|^2 (log scale, 10^-1 down to 10^-10) versus iteration (0 to 50)]
Computational Time (with 1024 nodes of PACS-CS)
6781 sec. × 60 iteration steps = 113 hours
Flow chart
Algorithm → subspace iteration method (Rayleigh-Ritz method)
• Input initial configuration of ions
• Calc. ionic potentials
• Electronic structure optimization (repeat until the convergence check on |v_new - v_old| says yes):
  • Conjugate-Gradient Method: O(N^2)
  • Gram-Schmidt orthonormalization: O(N^3)
  • Subspace Diagonalization: O(N^3)
  • Density, Potentials update: O(N)
• Hellmann-Feynman force
• Atomic structure optimization (repeat until the convergence check on the maximum force says yes): move ions and recompute
Electronic structure optimization must be performed in each atomic optimization step.
Total Computational Cost ~ O(N^3)
Algorithm 1
→ Subspace Iteration Method (Rayleigh-Ritz Method)
Problem:
$$\left(-\frac{1}{2}L + v_{Ions} + v_H[\rho] + v_{XC}[\rho]\right) \phi_n = \varepsilon_n\, \phi_n$$
This is an M-dimensional eigenvalue problem; we need the smallest N (≪M) eigen-pairs.
Initial guess: $\{\phi_1, \phi_2, \ldots, \phi_N\}$
Minimize Rayleigh quotients by the Conjugate-Gradient Method:
$$\varepsilon_n(\phi_n) = \frac{\langle \phi_n | h_{KS} | \phi_n \rangle}{\langle \phi_n | \phi_n \rangle}$$
Wave function update:
$$\phi_n' = \phi_n + \alpha\, p_n$$
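The Rayleigh-quotient minimization can be sketched for a single state. This is a minimal illustration, assuming a small dense symmetric matrix, a Polak-Ribiere choice of conjugate direction, and a step length found by diagonalizing the 2 × 2 problem in span{φ, p}. The slides do not specify these details, so none of them should be read as the RSDFT implementation.

```python
import numpy as np

def lowest_eigenpair_cg(h, phi0, tol=1e-8, max_iter=500):
    """Minimize <phi|h|phi>/<phi|phi> for a symmetric matrix h."""
    phi = phi0 / np.linalg.norm(phi0)
    p_old, g_old = np.zeros_like(phi), None
    for _ in range(max_iter):
        h_phi = h @ phi
        eps = phi @ h_phi                 # Rayleigh quotient (phi is normalized)
        grad = h_phi - eps * phi          # residual, orthogonal to phi
        if np.linalg.norm(grad) < tol:
            break
        # Polak-Ribiere coefficient; fall back to steepest descent if negative.
        beta = 0.0 if g_old is None else max(0.0, grad @ (grad - g_old) / (g_old @ g_old))
        p = -grad + beta * p_old
        p -= (phi @ p) * phi              # keep the search direction orthogonal to phi
        p_unit = p / np.linalg.norm(p)
        h_p = h @ p_unit
        # Exact line minimization: lowest eigenvector of h restricted to span{phi, p}.
        small = np.array([[eps, phi @ h_p], [phi @ h_p, p_unit @ h_p]])
        _, u = np.linalg.eigh(small)
        if u[0, 0] < 0:
            u = -u                        # fix the sign so phi changes continuously
        phi = u[0, 0] * phi + u[1, 0] * p_unit   # phi' proportional to phi + alpha * p
        phi /= np.linalg.norm(phi)
        p_old, g_old = p, grad
    return phi @ (h @ phi), phi

# Illustrative test matrix: 1D finite-difference kinetic energy plus a harmonic well.
n = 60
x = np.linspace(-6.0, 6.0, n)
dx = x[1] - x[0]
h_test = (np.diag(1.0 / dx**2 + 0.5 * x**2)
          - np.diag(np.full(n - 1, 0.5 / dx**2), 1)
          - np.diag(np.full(n - 1, 0.5 / dx**2), -1))
eps_cg, phi_cg = lowest_eigenpair_cg(h_test, np.ones(n))
```

Only matrix-vector products with h are needed, which is why this step suits a sparse finite-difference Hamiltonian.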
Algorithm 2
Gram-Schmidt Orthogonalization: O(MN^2)
$$\phi_n' = \phi_n - \sum_{m=1}^{n-1} \phi_m \langle \phi_m | \phi_n \rangle, \qquad \langle \phi_m | \phi_n \rangle = \delta_{mn}$$
Subspace Diagonalization
• Use $\{\phi_1, \phi_2, \ldots, \phi_N\}$ as a basis set.
• Calc. matrix elements: O(MN^2)
$$h_{m,n} = \langle \phi_m | h_{KS} | \phi_n \rangle$$
• Solve the N × N eigenvalue problem: O(N^3)
$$h_{N \times N}\, c^{(n)} = \varepsilon_n\, c^{(n)}$$
• Form the Ritz vectors: O(MN^2)
$$\phi_n' = \sum_{i=1}^{N} c_i^{(n)}\, \phi_i$$
• $\{\phi_1', \phi_2', \ldots, \phi_N'\}$ ← initial guess for the next iteration
Gram-Schmidt orthogonalization
~Active use of Level 3 BLAS in O(N^3) computation~
→ Collaboration with computer scientists has much improved the performance of RSDFT!
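The subspace diagonalization step maps directly onto three dense operations. The sketch below assumes a small dense symmetric matrix in place of the Kohn-Sham operator and orthonormal columns in `phi`. The function name and the random test data are illustrative.

```python
import numpy as np

def subspace_diagonalization(h, phi):
    """Rayleigh-Ritz step for phi of shape (M, N) with orthonormal columns."""
    h_sub = phi.T @ (h @ phi)        # h_{m,n} = <phi_m|h|phi_n>, O(M N^2)
    eps, c = np.linalg.eigh(h_sub)   # N x N eigenvalue problem, O(N^3)
    return eps, phi @ c              # Ritz vectors, O(M N^2)

# Illustrative check: a rotated set of exact eigenvectors is "unrotated" by the step.
rng = np.random.default_rng(0)
a = rng.standard_normal((40, 40))
h_test = 0.5 * (a + a.T)
exact_eps, exact_vec = np.linalg.eigh(h_test)
rotation, _ = np.linalg.qr(rng.standard_normal((6, 6)))
phi_mixed = exact_vec[:, :6] @ rotation      # spans the 6 lowest states, but mixed
ritz_eps, ritz_vec = subspace_diagonalization(h_test, phi_mixed)
```

When the basis already spans the wanted states, the step returns them exactly; otherwise it gives the best approximation within the subspace, which serves as the next initial guess.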
Time & Performance for Gram-Schmidt
              | Time (sec) | GFLOPS/node
Old algorithm | 661 (710)  | 0.70 (0.65)
New algorithm | 111 (140)  | 4.30 (3.50)
Theoretical peak performance = 5.6 GFLOPS/node
The O(N^3) part can be computed at about 80% of the theoretical peak performance!
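The figures in the table can be turned into a speed-up and a fraction of peak with two divisions. The snippet uses only the unparenthesized numbers from the table; the variable names are illustrative.

```python
old_time_sec, new_time_sec = 661.0, 111.0
new_gflops_per_node = 4.30
peak_gflops_per_node = 5.6

speedup = old_time_sec / new_time_sec                            # roughly a factor of 6
fraction_of_peak = new_gflops_per_node / peak_gflops_per_node    # roughly 0.77
```

The new algorithm is about six times faster and runs at roughly 77% of peak, which the slide rounds to 80%.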
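Algorithm of GS
$$\psi_1' = \psi_1$$
$$\psi_2' = \psi_2 - \psi_1' \langle \psi_1' | \psi_2 \rangle$$
$$\psi_3' = \psi_3 - \psi_1' \langle \psi_1' | \psi_3 \rangle - \psi_2' \langle \psi_2' | \psi_3 \rangle$$
$$\psi_4' = \psi_4 - \psi_1' \langle \psi_1' | \psi_4 \rangle - \psi_2' \langle \psi_2' | \psi_4 \rangle - \psi_3' \langle \psi_3' | \psi_4 \rangle$$
$$\psi_5' = \psi_5 - \psi_1' \langle \psi_1' | \psi_5 \rangle - \psi_2' \langle \psi_2' | \psi_5 \rangle - \psi_3' \langle \psi_3' | \psi_5 \rangle - \psi_4' \langle \psi_4' | \psi_5 \rangle$$
$$\psi_6' = \psi_6 - \psi_1' \langle \psi_1' | \psi_6 \rangle - \psi_2' \langle \psi_2' | \psi_6 \rangle - \psi_3' \langle \psi_3' | \psi_6 \rangle - \psi_4' \langle \psi_4' | \psi_6 \rangle - \psi_5' \langle \psi_5' | \psi_6 \rangle$$
Part of the calculations can be performed as a Matrix × Matrix operation!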
Elapsed time for 1 step of iteration (PACS-CS, 5.6 GFLOPS/node, 256 nodes)
[Figure: time (sec, 0 to 600) versus number of Si atoms (512, 1000, 1728, 2744, 4096), broken down into CG O(N^2), GS O(N^3), SD O(N^3), and Others]
→ The times for the O(N^2) part and the O(N^3) part become comparable.
Application 1
Nano-meter size Si quantum dots
The Si quantum dot is a promising material for several device applications
• Memory
• Single-electron transistor
• Optical device
Clarifying the relation between the “Dot size” and “Band gap” is important for controlling the device properties.
Are first-principles calculations useful for such studies? → Yes, but …
• System size is very large!
[Figure: a model of the Si quantum dot of 6.6 nm diameter (Si7055H1596)]
Band Gaps
$$E_g^{SCF} = I(N) - A(N) = E(N+1) + E(N-1) - 2E(N)$$
$$E_g^{KS} = \varepsilon_{LU}^{KS} - \varepsilon_{HO}^{KS}$$
Experimental fit curve (from STS measurement, B. Zaknoon et al., Nano Letters 8, 1689 (2008)):
$$E_g^{Expt.} = 1.136 + \frac{9.75}{D^2}\ \text{(eV)}$$
[Figure: Eg (eV, 1.0 to 4.0) versus diameter (nm, 2 to 7) for the SCF gap, the KS gap, and Expt.; the models range from 300 atoms to >10,000 atoms]
The ΔSCF gap seems to be closer to the KS gap …
Application 2
Si nanowires
Samsung Si nanowire devices
               | IEDM2005     | IEDM2006
Diameter of NW | 10 nm        | 8 nm
Gate length    | 30 nm        | 15 nm
Vdd            | 1.0 V        | 1.0 V
I_on (n)       | 2.64 mA/μm   | 1.4 mA/μm
I_on (p)       | 1.11 mA/μm   | 1.94 mA/μm
I_off (n)      | 3.1 nA/μm    | 2.0 nA/μm
I_off (p)      | 0.0056 mA/μm | 1.0 nA/μm
Si nanowires of several sizes
• 4 nm diameter (425 atoms)
• 10 nm diameter (2341 atoms)
• 20 nm diameter (8941 atoms)
There may be an optimum diameter in the region of 10 nm ~ 20 nm.
Band Structure and DOS of SiNW (d=1nm)
[Figure: band structure (energy in eV, Γ to X) and DOS ( States / eV atom ) versus energy (eV)]
d=1nm: Si21H20 (41 atoms), Eg=2.60eV (LDA Bulk: 0.53eV)
Band Structure and DOS of SiNW (d=4nm)
[Figure: band structure (energy in eV, Γ to X) and DOS ( States / eV atom ) versus energy (eV)]
d=4nm: Si341H84 (425 atoms), Eg=0.81eV (LDA Bulk: 0.53eV)
Band Structure and DOS of SiNW (d=8nm)
[Figure: band structure (energy in eV, Γ to X) and DOS ( States / eV atom ) versus energy (eV)]
d=8nm: Si1361H164 (1525 atoms), Eg=0.61eV
Bulk Si
[Figure: band structure (energy in eV, Γ to X) and DOS ( States / eV atom ) versus energy (eV)]
Eg=0.53eV
Si nanowire with surface roughness
[Figure: top view and side view of Si12822H1544]
Si12822H1544 (14,366 atoms)
・10 nm diameter, 3.3 nm height, (100)
・Grid spacing: 0.45 Å (~14 Ry)
・# of grid points: 4,718,592
・# of bands: 29,024
・Memory: 1,022 GB ~ 2,044 GB
PACS-CS 1024 nodes (peak performance: 5.6 GFLOPS/node)
・Subspace diagonalization: 4600 sec.
・Gram-Schmidt: 2300 sec.
・Conjugate-Gradient Method: 3700 sec.
・Total Energy calc.: 1200 sec.
・Total (1 step): 12,000 sec.
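A back-of-the-envelope estimate shows where the memory figure comes from. The sketch assumes that each wave-function value is stored as one real 8-byte number, which the slide does not state, and it simply adds up the listed timings.

```python
grid_points = 4_718_592
bands = 29_024
bytes_per_value = 8  # assumption: real double-precision values

# One full set of wave functions: (grid points) x (bands) values.
wavefunction_bytes = grid_points * bands * bytes_per_value
wavefunction_gib = wavefunction_bytes / 2**30

step_times_sec = {
    "subspace diagonalization": 4600,
    "Gram-Schmidt": 2300,
    "conjugate gradient": 3700,
    "total energy": 1200,
}
step_total_sec = sum(step_times_sec.values())
```

Under that assumption a single copy of the wave functions takes about 1,020 GiB, close to the lower memory figure on the slide, and a second copy of the same size would approach the upper figure. The listed timings add up to 11,800 sec., consistent with the rounded total of 12,000 sec. per step.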
DOS of SiNW with roughness
[Figure: DOS ( States / eV atom ) versus energy (eV) for the SiNW with roughness, compared with the DOS of bulk Si]
d=10nm (with roughness): Si12822H1544 (14,366 atoms), Eg=0.57eV
Application 3
Si divacancy
Structure of Si divacancy:
• Small yellow balls: vacancies (no atoms)
• Green balls: Si atoms with dangling bonds
[Figure: atoms a, b, c, a', b', c' around the two vacancies v, with distances d_ab and d_ac, for the Resonant-Bond type and the Large-Pairing type]
There are two possibilities for the structure of Si divacancy. What is the stable structure?
• EPR experiment (Watkins & Corbett, 1965): Large-Pairing type
• LDA calculation (Saito & Oshiyama, 1994): Resonant-Bond type is stable (Large-Pairing type was not found). Model size ~ 60 atoms
• More recent LDA calculation (Oguet et al., 1999): both “Large-Pairing” and “Resonant-Bond” structures were found; Large-Pairing type is the most stable (RB type is a local minimum). Model size ~ 300 atoms
→ Model size dependence?
Si divacancy
[Figure: d_ac, d_ab (Å, 2.6 to 3.6) versus model size (# of atoms: 62, 214, 510, 998) for the Large-Pairing, Resonant-Bond, and Small-Pairing structures]
• Structures converge at the 998-atom model.
• The LP structure appears at 510-atom or larger models.
• The RB structure is the most stable, but the energy difference is very small (<10 meV).
J.-I. Iwata, et al., Phys. Rev. B 77 (2008) 115208
Summary
• We have developed the Real-Space DFT program code (RSDFT) for large systems by utilizing massively parallel computers.
• Collaboration with computer scientists has much improved the performance of RSDFT (especially the O(N^3)-part calculation with Level 3 BLAS).
• By using a few hundred to 1000 CPUs, we have achieved first-principles calculations for
  ・Si 1000-atom systems with atomic structure optimization
  ・self-consistent electronic structures of Si 10,000-atom systems
• By using large atomic models → eliminate the model-size dependence.
• We have applied RSDFT to nano-meter scale Si materials (SiNW, SiQD).
• I think RSDFT will become a useful tool for future device development.