Parallel Exponential Integrators for Quantum Dynamics Katharina Kormann

advertisement
Introduction
Parallel Exponential Integrators for
Quantum Dynamics
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Katharina Kormann
joint work with Magnus Gustafsson and Sverker Holmgren
Uppsala University
Division of Scientific Computing
Theory
Results
Outlook
April 28, 2010
Katharina Kormann | katharina.kormann@it.uu.se
1
Schrödinger Equation for Molecule–Laser
Interaction
i~
∂
ψ(r , t) = Ĥ(r , ∆r , t)ψ(r , t),
∂t
ψ(r , 0) = ψ0 (r ).
2
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
~
Hamiltonian for three-state system: Ĥ = − 2m
∆ + V with


Vg (r )
µ(r )V1 (t)
0
Ve1 (r )
V2 (r )  .
V =  µ(r )V1 (t)
0
V2 (r )
Ve2 (r )
Methods
Lanczos
Minimizing
Communication
Theory
Results
Outlook
A1 $+
0.03
potential energy surface
Error Control
0.04
u
0.02
3
b %u
0.01
0
target state "
!0.01
!0.02
! (t)
!0.03
X1$+g
!0.04
!0.05
!0.06
6
initial state #0
8
10
12
14
16
18
intercore distance
Katharina Kormann | katharina.kormann@it.uu.se
2
Some Numerical Challenges
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
• Curse of dimensionality: dimension of the problem
proportional to 3 times the number of nuclei
• Unbounded domains
• Oscillations of femto-second pulses
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
3
HAParaNDA
• Implementation framework for high-dimensional
time-dependet PDEs
Introduction
• Trade-off: Generality ←→ (Parallel) Efficiency
Spatial
Discretization
• Written in C (but will be ported to C++)
• Designed for massively parallel computations on clusters
Differentiation
Implementation
Temporal
Discretization
Clusters
multicore nodes
of
multicoreof
nodes:
Methods
Lanczos
Minimizing
Communication
• Message passing (MPI) between distributed nodes
Grand scale
required
realistic
• • OpenMP
forcomputing
worksharing
withinforeach
nodeproblems
Error Control
Theory
Results
Outlook
Introduction
Spatial
discretization
Temporal
discretization
Minimizing
communication
Conclusion
• Example node architecture, dual Intel Xeon E5430:
Katharina Kormann | katharina.kormann@it.uu.se
4
Spatial Discretization: Standard Method
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Pseudo-spectral method (Fourier basis)
+ High-accuracy
+ Efficient implementation based on FFT
– Non locality of differential operator
– Not flexible with boundary conditions
– No adaptivity possible
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
5
Spatial Discretization: Alternative
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
High-order finite differences
• trade-off high-accuracy and locality: 8th order often
good compromise
• Kronecker product of stencil for high-dimensions
+ Adaptivity possible
+ Differential operator localized
– 8th order compared to PS points per dimension to be
doubled to reach same accuracy (for equi-distributed
points)
Katharina Kormann | katharina.kormann@it.uu.se
6
Spatial Discretization: Implementation
Introduction
Spatial
Discretization
• Block-structured grid in d dimensions
• Currently employing an equidistant, static grid
• Choose block sizes w.r.t. cache sizes
• Adaptive grid refinement/coarsening to be implemented
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
7
Spatial Discretization: Implementation
Introduction
Spatial
Discretization
• Block-structured grid in d dimensions
• Currently employing an equidistant, static grid
• Choose block sizes w.r.t. cache sizes
• Adaptive grid refinement/coarsening to be implemented
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
7
Spatial Discretization
• High-order finite difference stencils
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
8
Spatial Discretization
• High-order finite difference stencils
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
8
Spatial Discretization
• High-order finite difference stencils
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
8
Spatial Discretization
• High-order finite difference stencils
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
8
Spatial Discretization
• Data dependencies on block boundaries (ghost cells)
• High-dimensional ghost cell blocks
• Large memory overhead due to duplicated data
Introduction
Spatial
Discretization
Differentiation
Implementation
• Communicate data one dimension at a time
• Reuse allocated arrays for ghost data
• Nearest-neighbor communication
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
9
Propagation and chemical properties
(bounded states)
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
• Conservation of total probability → L2 -norm
conservation
• Energy conservation
• Time-reversibility → hermitian evolution operator
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
10
Suitable Integrators for TDSE
• Exponential Integrators:
ψn+1 = exp(− ~i H(tn + ∆t/2)∆t)ψn
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
•
•
•
•
Split operator
Chebyshev scheme
Lanczos / Arnoldi algorithm
Truncated Magnus expansion for increased accuracy
• Symplectic
√ Integrators: Define q(t) =
p(t) =
√
2~<ψ(t) and
2~=ψ(t), then
d
∂
q(t) =
H(q, p, t)
dt
∂p
d
∂
p(t) = − H(q, p, t)
dt
∂q
• partitioned Runge–Kutta
• (Crank–Nicolson)
Katharina Kormann | katharina.kormann@it.uu.se
11
Chebyshev and Lanczos
Lanczos
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
• orthogonal bases of
Krylov subspace and
exponential of
projected
Hamiltonian
• memory: p vectors
• vectors matrix vector
products and scalar
products
Chebyshev
• approximate
exponential based on
Chebyshev
polynomials
• memory: three
vectors
• matrix vector
products
• suitable for short
time steps
• suitable for long time
steps
Katharina Kormann | katharina.kormann@it.uu.se
12
Symmetric Lanczos Algorithm
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Krylov subspace of size p: Kp (H, ψ) = {ψ, Hψ, . . . , H p−1 ψ}
Lp - tridiagonal projection matrix
Vp - orthonormal basis of Kp (H, ψ)
α1
 β1


Lp = 



β1
α2
..
.

β2
..
.
..
.
βp−2 αp−1 βp−1
βp−1 αp



 = VpT HVp ∈ Rp×p .


The vector is then approximated by
i
ψ(t + ∆t) ≈ kψ(t)kVp exp − Lp ∆t e1 ,
~
where e1 ∈ Rp with e1,j = δ1,j , j = 1, . . . , p.
Katharina Kormann | katharina.kormann@it.uu.se
13
Standard Lanczos Algorithm
Scalability issues
Introduction
• Sparse matrix-vector multiplication (nearest-neighbor)
Spatial
Discretization
• Two inner products in each iteration (all-to-all)
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Algorithm 1 The Lanczos Algorithm
v0 = 0
β0 = 0
v1 = Ψk / �Ψk �2
for j = 1, 2, . . . , m do
r = Hvj − βj−1 vj−1
αj = (r,vj )
r = r − αj vj
if j < m then
βj = �r�2
vj+1 = r / βj
end if
end for
Algorithm 2 The Modified Lanczos Algorithm
q0 = 0
r0 = Ψk / �Ψk �2
for j = 0, 1, 2, . . . , m do
t = Arj
Katharina Kormann | katharina.kormann@it.uu.se
14
Algorithm 1 The Lanczos Algorithm
v0 = 0
β0 = 0
v1 = Ψk / �Ψk �2
for j = 1, 2, . . . , m do
r = Hvj − βj−1 vj−1
αj = (r,vj )
r = r − αj vj
if j < m then
βj = �r�2
vj+1 = r / βj
end if
end for
A Modified Lanczos Scheme1
Introduction
Spatial
Discretization
• Alter Lanczos’ algorithm and merge the inner products
• Eliminates one synchronization point per iteration
Differentiation
Implementation
Algorithm 2 The Modified Lanczos Algorithm
q0 = 0
r0 = Ψk / �Ψk �2
for j = 0, 1, 2, . . . , m do
t = Arj
u = (t,rj )
v = (rj ,rj )
√
βj = v
αj+1 = u/v
qj+1 = rj /βj
rj+1 = t/βj − βj qj − αj+1 qj+1
end for
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
1
S. K. Kim and A. T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented
on Parallel Computers (1991)
Katharina Kormann | katharina.kormann@it.uu.se
15
s-step Lanczos1
s-step Lanczos1
• Reduced amount of synchronization by a factor s but
increased amount of arithmetic work
Introduction
Spatial
troduction
Discretization
Differentiation
patial
Implementation
scretization
Algorithm
is much
more complex and
• •Reduced
amount
of synchronization
by adifficult
factor sto optimize
for parallelism
• However:
increased amount of arithmetic work
Algorithm 3 The s-step Lanczos Algorithm
v1 = u
V¯1 = [ v11 , Av11 , . . . , As−1 v11 ]
Temporal
emporal
Discretization
scretization
Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 )
for k = 0, 1, . . . , !m/s" do
Call Scalar1
1
s
vk+1
= Avk1 −V̄k−1 γk−1
− V̄k αsk
Methods
Lanczos
inimizing
Minimizing
mmunication
Communication
1
1
1
, A2 vk+1
, . . . , As vk+1
Compute Avk+1
onclusion
Error Control
Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 )
Theory
Results
Call Scalar2
for j = 2, . . . , s do
j
1
vk+1
= Aj−1 vk+1
−V̄k ∗ tjk
end for
end for
Outlook
i
Scalar1: Form Wk , solve Wk−1 γk−1
= cik−1 , Wk αik = dik , i = 1, . . . , s
Scalar2: Solve Wk tjk = bjk , j = 2, . . . , s
1
S. K. Kim and A. T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented
on Parallel Computers (1991)
1
S. K.
Kim and
T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented
Katharina
Kormann
| A.
katharina.kormann@it.uu.se
16
Comparison of Lanczos Variants
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
inner products
vector updates
matrix × vector
all-to-all
standard
2s
5s
s
2s
optimized
2s
6s
s
s
s-step
2s
2s(s + 1)
s +1
1
Table: Comparison of computational work and communication
between s steps of standard/optimized Lanczos and one s-step
iteration.
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
17
Performance Comparison of Lanczos
5000
Introduction
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
4000
CPU−time (sec.)
Spatial
Discretization
3000
2000
1000
s−step Lanczos
Optimized Lanczos
Standard Lanczos
Error Control
Theory
Results
Outlook
0
8
16
32
64 128 256
Number of cores
512 1024 2048
Figure: Comparison of scaled speedup of the three Lanczos variants
(T = 1000 steps, Nbl = 604 , q = 8, p = 6, s = 3)
Katharina Kormann | katharina.kormann@it.uu.se
18
Potential Improvements1 of s-step Lanczos
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
• use different s-step basis
• use TSQR for reorthogonalization
→ tridiagonal projection matrix
→ improved stability
→ possible to choose larger value of s
Theory
Results
Outlook
1
M. Hoemmen, Communication-Avoiding Krylov Subspace Methods (2010)
Katharina Kormann | katharina.kormann@it.uu.se
19
Error Control and Temporal Adaptivity
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Idea: Improve performance and numerical stability by
adaptively choosing p and ∆t
Note: Trade-off between less iterations and overhead for error
estimation
Error Control
Theory
Results
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
20
Adaptive Magnus–Lanczos: Idea
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
• Choose step size according to error in the Magnus
expansion
• Choose size of the Krylov space according to error in
the Lanczos algorithm1
• Use a posteriori estimate2 to connect local and
global error
• Use special properties of the Schrödinger equation to
avoid solving the dual problem
Outlook
1
M. Hochbruck, C. Lubich, and H, Selhofer: Exponential Integrators for Large Systems of
Differential Equations (1998)
2
Y. Cao and L. Petzold, A posteriori error estimate for ordinary differential equations by
the adjoint method (2004)
Katharina Kormann | katharina.kormann@it.uu.se
21
A posteriori error estimate
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Let hφ, ψ(tf )ir be the interesting quantity.
primal equation

 d ψ(t) = −i Hψ(t),
dt
~

ψ(0) = ψ0 .
dual equation

 d χ(t) = −i Hχ(t),
dt
~

χ(tf ) = φ.
Katharina Kormann | katharina.kormann@it.uu.se
22
A posteriori error estimate
Introduction
View the numerical results as the solution of the
perturbed primal equation
Spatial
Discretization

 d y (t) = −i Hy +S(t) ,
dt
~

y (0) = ψ0 +R .
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
dual equation
Error Control
Theory
Results
Outlook

 d χ(t) = −i Hχ(t),
dt
~

χ(tf ) = φ.
Katharina Kormann | katharina.kormann@it.uu.se
23
A posteriori error estimate
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
Define e(tf ) = ψ(tf ) − y (tf ).
Error in the functional hφ, ψ(tf )ir :
Z tf
hφ, e(tf )ir =
hχ(s), S(s)ir ds + hχ(0), Rir
0
Using Cauchy–Schwarz and norm conservation:
Z tf
|hφ, e(tf )ir | ≤
kχ(s)kr kS(s)kr ds + kχ(0)kr kRkr
0
Z tf
=
kS(s)kr ds + kRkr
0
Katharina Kormann | katharina.kormann@it.uu.se
24
Residual in Magnus expansion
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
The function ψ(t) = exp(Ω(t))ψ0 solves the differential
equation
ψ̇(t) = dexpΩ(t) (Ω̇(t))ψ,
1
1
[A, B] + 3!
[A,
where dexpA (B) = B + 2!
P[A, B]] + . . .. Magnus
expansion gives an expression Ω(t) = ∞
k=1 Ωk (t) such that
i
dexpΩ(t) (Ω̇(t)) = − ~ H.
Outlook
Katharina Kormann | katharina.kormann@it.uu.se
25
Residual in Magnus expansion
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
P
Magnus expansion gives an expression Ω(t) = ∞
k=1 Ωk (t)
i
such that dexpΩ(t) (Ω̇(t)) = − ~ H.
Hence,P
the residual when using a truncation
e K = K Ωk (t) of the series, reads
Ω
k=1
i
˙e
S(t) = dexpΩe (t) ΩK (t) ψ(t) − − Hψ(t)
K
~
= Ω̇K +1 (t)ψ(t) + h.o.t(t − t0 ).
for fixed spatial discretization.
Katharina Kormann | katharina.kormann@it.uu.se
25
Residual in Lanczos
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
• Standard1 and optimized Lanczos:
i
S = ∆t exp − Lp ∆t
βp kψk.
~
p,1
• s-step Lanczos:
i
1
kψkkvk+1
k.
S = ∆t exp − Lp ∆t
~
p,1
Outlook
1
M. Hochbruck, C. Lubich, and H, Selhofer: Exponential Integrators for Large Systems of
Differential Equations (1998)
Katharina Kormann | katharina.kormann@it.uu.se
26
Residual in Lanczos
s-step Lanczos1
• Standard1 and optimized Lanczos:
S = ∆t exp − i Lp ∆t
Introduction
Spatial
Introduction
Discretization
Spatial
Differentiation
discretization
Implementation
Temporal
Temporal
Discretization
discretization
Methods
Minimizing
Lanczos
communication
Minimizing
Communication
Conclusion
Error Control
Theory
Results
Outlook
βp kψk.
~
p,1
• Reduced amount
of synchronization
by a factor s
1 k.
• s-step Lanczos: S = ∆t exp − ~i Lp ∆t p,1 kψkkvk+1
• However: increased amount of arithmetic work
Algorithm 3 The s-step Lanczos Algorithm
v1 = u
V¯1 = [ v11 , Av11 , . . . , As−1 v11 ]
Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 )
for k = 0, 1, . . . , !m/s" do
Call Scalar1
1
s
vk+1
= Avk1 −V̄k−1 γk−1
− V̄k αsk
1
1
1
, A2 vk+1
, . . . , As vk+1
Compute Avk+1
Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 )
Call Scalar2
for j = 2, . . . , s do
j
1
vk+1
= Aj−1 vk+1
−V̄k ∗ tjk
end for
end for
i
Scalar1: Form Wk , solve Wk−1 γk−1
= cik−1 , Wk αik = dik , i = 1, . . . , s
Scalar2: Solve Wk tjk = bjk , j = 2, . . . , s
1
M. Hochbruck, C. Lubich, and H, Selhofer: Exponential Integrators for Large Systems of
Differential
Equations (1998)
1
S. K. Kim and A. T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented
Katharina Kormann | katharina.kormann@it.uu.se
26
Rb2 with tolerance 10−5 (1D)
`2 error
steps1
FFT2
2/adap.
1.7 · 10−6
27.0
16.4
2/fix
6.2 · 10−6
27.0
15.2
2/fix
1.7 · 10−6
54.6
30.1
4/adaptive
8.8 · 10−6
1.0
1.0
Theory
Results
4/fix
5.5 · 10−5
1.0
0.997
Outlook
4/fix
8.8 · 10−6
1.59
1.51
order/step
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
1
2
relative to no. of steps needed by adaptive method 4/6.
relative to no. of FFT needed by adaptive method 4/6.
Katharina Kormann | katharina.kormann@it.uu.se
27
`2 error
steps
FFT
2/4
8.4 · 10−6
1.0
1.0
2/–
2.2 ·
10−5
1.0
1.01
2/–
8.4 · 10−6
1.47
1.38
order/step
Introduction
Spatial
Discretization
Differentiation
Implementation
Table: OClO with tolerance 10−4 (3D).
Temporal
Discretization
`2 error
steps
FFT
2/4
1.6 · 10−4
1.0
1.0
Theory
Results
2/–
3.4 ·
10−4
1.0
0.93
Outlook
2/–
1.6 · 10−4
1.66
1.43
Methods
Lanczos
Minimizing
Communication
order/step
Error Control
Table: IBr with tolerance 10−3 (including dissociation).
Katharina Kormann | katharina.kormann@it.uu.se
28
Adaptivity in Step Size and Lanczos
Step sizes (2D)
Lanczos iterations (2D)
8
Introduction
0.02
7
6
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
0.01
5
4
0
0
10
20
Step sizes (4D)
10
20
Lanczos iterations (4D)
6
0.02
Theory
Results
Outlook
3
0
5
0.01
0
0
4
10
20
Katharina Kormann | katharina.kormann@it.uu.se
3
0
10
20
29
Outlook
Introduction
Spatial
Discretization
Differentiation
Implementation
Temporal
Discretization
Methods
Lanczos
Minimizing
Communication
Error Control
Theory
Results
Outlook
• Hope: improve parallel scalability by minimizing
communication (Results computer architecture
depending)
• High-dimensional PDE problems extremely memory
consuming
• Future work:
• Collect more results on different architectures, study
numerical accuracy
• Optimize s-step Lanczos (possibly CA-Lanczos by
Hoemmen)
• Implement spatial adaptivity
Katharina Kormann | katharina.kormann@it.uu.se
30
Download