Introduction Parallel Exponential Integrators for Quantum Dynamics Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Katharina Kormann joint work with Magnus Gustafsson and Sverker Holmgren Uppsala University Division of Scientific Computing Theory Results Outlook April 28, 2010 Katharina Kormann | katharina.kormann@it.uu.se 1 Schrödinger Equation for Molecule–Laser Interaction i~ ∂ ψ(r , t) = Ĥ(r , ∆r , t)ψ(r , t), ∂t ψ(r , 0) = ψ0 (r ). 2 Introduction Spatial Discretization Differentiation Implementation Temporal Discretization ~ Hamiltonian for three-state system: Ĥ = − 2m ∆ + V with Vg (r ) µ(r )V1 (t) 0 Ve1 (r ) V2 (r ) . V = µ(r )V1 (t) 0 V2 (r ) Ve2 (r ) Methods Lanczos Minimizing Communication Theory Results Outlook A1 $+ 0.03 potential energy surface Error Control 0.04 u 0.02 3 b %u 0.01 0 target state " !0.01 !0.02 ! (t) !0.03 X1$+g !0.04 !0.05 !0.06 6 initial state #0 8 10 12 14 16 18 intercore distance Katharina Kormann | katharina.kormann@it.uu.se 2 Some Numerical Challenges Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication • Curse of dimensionality: dimension of the problem proportional to 3 times the number of nuclei • Unbounded domains • Oscillations of femto-second pulses Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 3 HAParaNDA • Implementation framework for high-dimensional time-dependet PDEs Introduction • Trade-off: Generality ←→ (Parallel) Efficiency Spatial Discretization • Written in C (but will be ported to C++) • Designed for massively parallel computations on clusters Differentiation Implementation Temporal Discretization Clusters multicore nodes of multicoreof nodes: Methods Lanczos Minimizing Communication • Message passing (MPI) between distributed nodes Grand scale required realistic • • OpenMP forcomputing worksharing withinforeach nodeproblems Error Control Theory Results Outlook Introduction Spatial discretization Temporal discretization Minimizing communication Conclusion • Example node architecture, dual Intel Xeon E5430: Katharina Kormann | katharina.kormann@it.uu.se 4 Spatial Discretization: Standard Method Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Pseudo-spectral method (Fourier basis) + High-accuracy + Efficient implementation based on FFT – Non locality of differential operator – Not flexible with boundary conditions – No adaptivity possible Outlook Katharina Kormann | katharina.kormann@it.uu.se 5 Spatial Discretization: Alternative Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook High-order finite differences • trade-off high-accuracy and locality: 8th order often good compromise • Kronecker product of stencil for high-dimensions + Adaptivity possible + Differential operator localized – 8th order compared to PS points per dimension to be doubled to reach same accuracy (for equi-distributed points) Katharina Kormann | katharina.kormann@it.uu.se 6 Spatial Discretization: Implementation Introduction Spatial Discretization • Block-structured grid in d dimensions • Currently employing an equidistant, static grid • Choose block sizes w.r.t. cache sizes • Adaptive grid refinement/coarsening to be implemented Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 7 Spatial Discretization: Implementation Introduction Spatial Discretization • Block-structured grid in d dimensions • Currently employing an equidistant, static grid • Choose block sizes w.r.t. cache sizes • Adaptive grid refinement/coarsening to be implemented Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 7 Spatial Discretization • High-order finite difference stencils Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 8 Spatial Discretization • High-order finite difference stencils Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 8 Spatial Discretization • High-order finite difference stencils Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 8 Spatial Discretization • High-order finite difference stencils Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 8 Spatial Discretization • Data dependencies on block boundaries (ghost cells) • High-dimensional ghost cell blocks • Large memory overhead due to duplicated data Introduction Spatial Discretization Differentiation Implementation • Communicate data one dimension at a time • Reuse allocated arrays for ghost data • Nearest-neighbor communication Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 9 Propagation and chemical properties (bounded states) Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication • Conservation of total probability → L2 -norm conservation • Energy conservation • Time-reversibility → hermitian evolution operator Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 10 Suitable Integrators for TDSE • Exponential Integrators: ψn+1 = exp(− ~i H(tn + ∆t/2)∆t)ψn Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook • • • • Split operator Chebyshev scheme Lanczos / Arnoldi algorithm Truncated Magnus expansion for increased accuracy • Symplectic √ Integrators: Define q(t) = p(t) = √ 2~<ψ(t) and 2~=ψ(t), then d ∂ q(t) = H(q, p, t) dt ∂p d ∂ p(t) = − H(q, p, t) dt ∂q • partitioned Runge–Kutta • (Crank–Nicolson) Katharina Kormann | katharina.kormann@it.uu.se 11 Chebyshev and Lanczos Lanczos Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook • orthogonal bases of Krylov subspace and exponential of projected Hamiltonian • memory: p vectors • vectors matrix vector products and scalar products Chebyshev • approximate exponential based on Chebyshev polynomials • memory: three vectors • matrix vector products • suitable for short time steps • suitable for long time steps Katharina Kormann | katharina.kormann@it.uu.se 12 Symmetric Lanczos Algorithm Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Krylov subspace of size p: Kp (H, ψ) = {ψ, Hψ, . . . , H p−1 ψ} Lp - tridiagonal projection matrix Vp - orthonormal basis of Kp (H, ψ) α1 β1 Lp = β1 α2 .. . β2 .. . .. . βp−2 αp−1 βp−1 βp−1 αp = VpT HVp ∈ Rp×p . The vector is then approximated by i ψ(t + ∆t) ≈ kψ(t)kVp exp − Lp ∆t e1 , ~ where e1 ∈ Rp with e1,j = δ1,j , j = 1, . . . , p. Katharina Kormann | katharina.kormann@it.uu.se 13 Standard Lanczos Algorithm Scalability issues Introduction • Sparse matrix-vector multiplication (nearest-neighbor) Spatial Discretization • Two inner products in each iteration (all-to-all) Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Algorithm 1 The Lanczos Algorithm v0 = 0 β0 = 0 v1 = Ψk / �Ψk �2 for j = 1, 2, . . . , m do r = Hvj − βj−1 vj−1 αj = (r,vj ) r = r − αj vj if j < m then βj = �r�2 vj+1 = r / βj end if end for Algorithm 2 The Modified Lanczos Algorithm q0 = 0 r0 = Ψk / �Ψk �2 for j = 0, 1, 2, . . . , m do t = Arj Katharina Kormann | katharina.kormann@it.uu.se 14 Algorithm 1 The Lanczos Algorithm v0 = 0 β0 = 0 v1 = Ψk / �Ψk �2 for j = 1, 2, . . . , m do r = Hvj − βj−1 vj−1 αj = (r,vj ) r = r − αj vj if j < m then βj = �r�2 vj+1 = r / βj end if end for A Modified Lanczos Scheme1 Introduction Spatial Discretization • Alter Lanczos’ algorithm and merge the inner products • Eliminates one synchronization point per iteration Differentiation Implementation Algorithm 2 The Modified Lanczos Algorithm q0 = 0 r0 = Ψk / �Ψk �2 for j = 0, 1, 2, . . . , m do t = Arj u = (t,rj ) v = (rj ,rj ) √ βj = v αj+1 = u/v qj+1 = rj /βj rj+1 = t/βj − βj qj − αj+1 qj+1 end for Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook 1 S. K. Kim and A. T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented on Parallel Computers (1991) Katharina Kormann | katharina.kormann@it.uu.se 15 s-step Lanczos1 s-step Lanczos1 • Reduced amount of synchronization by a factor s but increased amount of arithmetic work Introduction Spatial troduction Discretization Differentiation patial Implementation scretization Algorithm is much more complex and • •Reduced amount of synchronization by adifficult factor sto optimize for parallelism • However: increased amount of arithmetic work Algorithm 3 The s-step Lanczos Algorithm v1 = u V¯1 = [ v11 , Av11 , . . . , As−1 v11 ] Temporal emporal Discretization scretization Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 ) for k = 0, 1, . . . , !m/s" do Call Scalar1 1 s vk+1 = Avk1 −V̄k−1 γk−1 − V̄k αsk Methods Lanczos inimizing Minimizing mmunication Communication 1 1 1 , A2 vk+1 , . . . , As vk+1 Compute Avk+1 onclusion Error Control Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 ) Theory Results Call Scalar2 for j = 2, . . . , s do j 1 vk+1 = Aj−1 vk+1 −V̄k ∗ tjk end for end for Outlook i Scalar1: Form Wk , solve Wk−1 γk−1 = cik−1 , Wk αik = dik , i = 1, . . . , s Scalar2: Solve Wk tjk = bjk , j = 2, . . . , s 1 S. K. Kim and A. T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented on Parallel Computers (1991) 1 S. K. Kim and T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented Katharina Kormann | A. katharina.kormann@it.uu.se 16 Comparison of Lanczos Variants Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results inner products vector updates matrix × vector all-to-all standard 2s 5s s 2s optimized 2s 6s s s s-step 2s 2s(s + 1) s +1 1 Table: Comparison of computational work and communication between s steps of standard/optimized Lanczos and one s-step iteration. Outlook Katharina Kormann | katharina.kormann@it.uu.se 17 Performance Comparison of Lanczos 5000 Introduction Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication 4000 CPU−time (sec.) Spatial Discretization 3000 2000 1000 s−step Lanczos Optimized Lanczos Standard Lanczos Error Control Theory Results Outlook 0 8 16 32 64 128 256 Number of cores 512 1024 2048 Figure: Comparison of scaled speedup of the three Lanczos variants (T = 1000 steps, Nbl = 604 , q = 8, p = 6, s = 3) Katharina Kormann | katharina.kormann@it.uu.se 18 Potential Improvements1 of s-step Lanczos Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control • use different s-step basis • use TSQR for reorthogonalization → tridiagonal projection matrix → improved stability → possible to choose larger value of s Theory Results Outlook 1 M. Hoemmen, Communication-Avoiding Krylov Subspace Methods (2010) Katharina Kormann | katharina.kormann@it.uu.se 19 Error Control and Temporal Adaptivity Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Idea: Improve performance and numerical stability by adaptively choosing p and ∆t Note: Trade-off between less iterations and overhead for error estimation Error Control Theory Results Outlook Katharina Kormann | katharina.kormann@it.uu.se 20 Adaptive Magnus–Lanczos: Idea Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results • Choose step size according to error in the Magnus expansion • Choose size of the Krylov space according to error in the Lanczos algorithm1 • Use a posteriori estimate2 to connect local and global error • Use special properties of the Schrödinger equation to avoid solving the dual problem Outlook 1 M. Hochbruck, C. Lubich, and H, Selhofer: Exponential Integrators for Large Systems of Differential Equations (1998) 2 Y. Cao and L. Petzold, A posteriori error estimate for ordinary differential equations by the adjoint method (2004) Katharina Kormann | katharina.kormann@it.uu.se 21 A posteriori error estimate Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Let hφ, ψ(tf )ir be the interesting quantity. primal equation d ψ(t) = −i Hψ(t), dt ~ ψ(0) = ψ0 . dual equation d χ(t) = −i Hχ(t), dt ~ χ(tf ) = φ. Katharina Kormann | katharina.kormann@it.uu.se 22 A posteriori error estimate Introduction View the numerical results as the solution of the perturbed primal equation Spatial Discretization d y (t) = −i Hy +S(t) , dt ~ y (0) = ψ0 +R . Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication dual equation Error Control Theory Results Outlook d χ(t) = −i Hχ(t), dt ~ χ(tf ) = φ. Katharina Kormann | katharina.kormann@it.uu.se 23 A posteriori error estimate Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook Define e(tf ) = ψ(tf ) − y (tf ). Error in the functional hφ, ψ(tf )ir : Z tf hφ, e(tf )ir = hχ(s), S(s)ir ds + hχ(0), Rir 0 Using Cauchy–Schwarz and norm conservation: Z tf |hφ, e(tf )ir | ≤ kχ(s)kr kS(s)kr ds + kχ(0)kr kRkr 0 Z tf = kS(s)kr ds + kRkr 0 Katharina Kormann | katharina.kormann@it.uu.se 24 Residual in Magnus expansion Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results The function ψ(t) = exp(Ω(t))ψ0 solves the differential equation ψ̇(t) = dexpΩ(t) (Ω̇(t))ψ, 1 1 [A, B] + 3! [A, where dexpA (B) = B + 2! P[A, B]] + . . .. Magnus expansion gives an expression Ω(t) = ∞ k=1 Ωk (t) such that i dexpΩ(t) (Ω̇(t)) = − ~ H. Outlook Katharina Kormann | katharina.kormann@it.uu.se 25 Residual in Magnus expansion Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook P Magnus expansion gives an expression Ω(t) = ∞ k=1 Ωk (t) i such that dexpΩ(t) (Ω̇(t)) = − ~ H. Hence,P the residual when using a truncation e K = K Ωk (t) of the series, reads Ω k=1 i ˙e S(t) = dexpΩe (t) ΩK (t) ψ(t) − − Hψ(t) K ~ = Ω̇K +1 (t)ψ(t) + h.o.t(t − t0 ). for fixed spatial discretization. Katharina Kormann | katharina.kormann@it.uu.se 25 Residual in Lanczos Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results • Standard1 and optimized Lanczos: i S = ∆t exp − Lp ∆t βp kψk. ~ p,1 • s-step Lanczos: i 1 kψkkvk+1 k. S = ∆t exp − Lp ∆t ~ p,1 Outlook 1 M. Hochbruck, C. Lubich, and H, Selhofer: Exponential Integrators for Large Systems of Differential Equations (1998) Katharina Kormann | katharina.kormann@it.uu.se 26 Residual in Lanczos s-step Lanczos1 • Standard1 and optimized Lanczos: S = ∆t exp − i Lp ∆t Introduction Spatial Introduction Discretization Spatial Differentiation discretization Implementation Temporal Temporal Discretization discretization Methods Minimizing Lanczos communication Minimizing Communication Conclusion Error Control Theory Results Outlook βp kψk. ~ p,1 • Reduced amount of synchronization by a factor s 1 k. • s-step Lanczos: S = ∆t exp − ~i Lp ∆t p,1 kψkkvk+1 • However: increased amount of arithmetic work Algorithm 3 The s-step Lanczos Algorithm v1 = u V¯1 = [ v11 , Av11 , . . . , As−1 v11 ] Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 ) for k = 0, 1, . . . , !m/s" do Call Scalar1 1 s vk+1 = Avk1 −V̄k−1 γk−1 − V̄k αsk 1 1 1 , A2 vk+1 , . . . , As vk+1 Compute Avk+1 Compute (vk1 , vk1 ), (Avk1 , vk1 ), . . . , (A2s−1 vk1 , vk1 ) Call Scalar2 for j = 2, . . . , s do j 1 vk+1 = Aj−1 vk+1 −V̄k ∗ tjk end for end for i Scalar1: Form Wk , solve Wk−1 γk−1 = cik−1 , Wk αik = dik , i = 1, . . . , s Scalar2: Solve Wk tjk = bjk , j = 2, . . . , s 1 M. Hochbruck, C. Lubich, and H, Selhofer: Exponential Integrators for Large Systems of Differential Equations (1998) 1 S. K. Kim and A. T. Chronopoulos: A Class of Lanczos-like Algorithms Implemented Katharina Kormann | katharina.kormann@it.uu.se 26 Rb2 with tolerance 10−5 (1D) `2 error steps1 FFT2 2/adap. 1.7 · 10−6 27.0 16.4 2/fix 6.2 · 10−6 27.0 15.2 2/fix 1.7 · 10−6 54.6 30.1 4/adaptive 8.8 · 10−6 1.0 1.0 Theory Results 4/fix 5.5 · 10−5 1.0 0.997 Outlook 4/fix 8.8 · 10−6 1.59 1.51 order/step Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control 1 2 relative to no. of steps needed by adaptive method 4/6. relative to no. of FFT needed by adaptive method 4/6. Katharina Kormann | katharina.kormann@it.uu.se 27 `2 error steps FFT 2/4 8.4 · 10−6 1.0 1.0 2/– 2.2 · 10−5 1.0 1.01 2/– 8.4 · 10−6 1.47 1.38 order/step Introduction Spatial Discretization Differentiation Implementation Table: OClO with tolerance 10−4 (3D). Temporal Discretization `2 error steps FFT 2/4 1.6 · 10−4 1.0 1.0 Theory Results 2/– 3.4 · 10−4 1.0 0.93 Outlook 2/– 1.6 · 10−4 1.66 1.43 Methods Lanczos Minimizing Communication order/step Error Control Table: IBr with tolerance 10−3 (including dissociation). Katharina Kormann | katharina.kormann@it.uu.se 28 Adaptivity in Step Size and Lanczos Step sizes (2D) Lanczos iterations (2D) 8 Introduction 0.02 7 6 Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control 0.01 5 4 0 0 10 20 Step sizes (4D) 10 20 Lanczos iterations (4D) 6 0.02 Theory Results Outlook 3 0 5 0.01 0 0 4 10 20 Katharina Kormann | katharina.kormann@it.uu.se 3 0 10 20 29 Outlook Introduction Spatial Discretization Differentiation Implementation Temporal Discretization Methods Lanczos Minimizing Communication Error Control Theory Results Outlook • Hope: improve parallel scalability by minimizing communication (Results computer architecture depending) • High-dimensional PDE problems extremely memory consuming • Future work: • Collect more results on different architectures, study numerical accuracy • Optimize s-step Lanczos (possibly CA-Lanczos by Hoemmen) • Implement spatial adaptivity Katharina Kormann | katharina.kormann@it.uu.se 30