Tracking the trajectory in finite precision CG computations Tomáš Gergelits 1 1 Zdeněk Strakoš 1,2 Faculty of Mathematics and Physics, Charles University in Prague 2 Institute of Computer Science AS CR Seminar on Numerical Analysis 2014 Nymburk, 28th January 2014 T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 1 / 21 Contents 1 Introduction 2 The essence of the CG method 3 CG in finite precision arithmetic 4 Trajectory of CG approximations 5 Inertia of computed Krylov subspaces T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 2 / 21 Introduction Contents 1 Introduction 2 The essence of the CG method 3 CG in finite precision arithmetic 4 Trajectory of CG approximations 5 Inertia of computed Krylov subspaces T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 3 / 21 Introduction Simple numerical experiment Solve Ax = b by the CG method A . . . 25 × 25 diagonal and positive definite matrix, b = ones(1, . . . , 1) 0 exact arithmetic finite precision arithmetic relative A−norm of the error 10 20 iterations of CG in exact arithmetic kx − x20 kA = 2.0611 × 10−6 −5 10 20 iterations of CG in finite precision arithmetic (FP CG) −10 kx − x̄20 kA = 1.0953 × 10−2 10 kx20 − x̄20 k∞ = 1.424 × 10−2 −15 10 x 0 10 T. Gergelits, Z. Strakoš 20 30 iteration number 40 50 SNA’14, Nymburk x̄20 x20 2014, January 28 4 / 21 Introduction Simple numerical experiment Solve Ax = b by the CG method A . . . 25 × 25 diagonal and positive definite matrix, b = ones(1, . . . , 1) 0 exact arithmetic finite precision arithmetic relative A−norm of the error 10 20 iterations of CG in exact arithmetic kx − x20 kA = 2.0611 × 10−6 −5 10 20 iterations of CG in finite precision arithmetic (FP CG) −10 kx − x̄20 kA = 1.0953 × 10−2 10 kx20 − x̄20 k∞ = 1.424 × 10−2 −15 10 x 0 10 T. Gergelits, Z. Strakoš 20 30 iteration number 40 50 SNA’14, Nymburk x̄20 x20 2014, January 28 4 / 21 Introduction Simple numerical experiment Solve Ax = b by the CG method A . . . 25 × 25 diagonal and positive definite matrix, b = ones(1, . . . , 1) 0 exact arithmetic finite precision arithmetic relative A−norm of the error 10 20 iterations of CG in exact arithmetic kx − x20 kA = 2.0611 × 10−6 −5 10 20 iterations of CG in finite precision arithmetic (FP CG) −10 kx − x̄20 kA = 1.0953 × 10−2 10 kx20 − x̄20 k∞ = 1.424 × 10−2 −15 10 x 0 10 T. Gergelits, Z. Strakoš 20 30 iteration number 40 50 SNA’14, Nymburk x̄20 x20 2014, January 28 4 / 21 Introduction Simple numerical experiment Solve Ax = b by the CG method A . . . 25 × 25 diagonal and positive definite matrix, b = ones(1, . . . , 1) 0 exact arithmetic finite precision arithmetic relative A−norm of the error 10 20 iterations of CG in exact arithmetic kx − x20 kA = 2.0611 × 10−6 −5 10 29 iterations of FP CG for the same level of accuracy −10 kx − x̄29 kA = 2.0057 × 10−6 10 kx20 − x̄29 k∞ = 1.304 × 10−7 −15 10 x̄29 0 10 T. Gergelits, Z. Strakoš 20 30 iteration number 40 50 SNA’14, Nymburk x20 x 2014, January 28 5 / 21 Introduction Simple numerical experiment Solve Ax = b by the CG method A . . . 25 × 25 diagonal and positive definite matrix, b = ones(1, . . . , 1) 0 exact arithmetic finite precision arithmetic relative A−norm of the error 10 20 iterations of CG in exact arithmetic kx − x20 kA = 2.0611 × 10−6 −5 10 29 iterations of FP CG for the same level of accuracy −10 kx − x̄29 kA = 2.0057 × 10−6 10 kx20 − x̄29 k∞ = 1.304 × 10−7 −15 10 x̄29 0 10 T. Gergelits, Z. Strakoš 20 30 iteration number 40 50 SNA’14, Nymburk x20 x 2014, January 28 5 / 21 The essence of the CG method Contents 1 Introduction 2 The essence of the CG method 3 CG in finite precision arithmetic 4 Trajectory of CG approximations 5 Inertia of computed Krylov subspaces T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 6 / 21 The essence of the CG method The essence of the CG method I. Ax = b, A ∈ FN×N HPD matrix, b ∈ FN where F is C or R Hestenes, Stiefel (1952), Lanczos (1952) CG is a projection method which minimizes the energy norm of the error xk ∈ x0 + Kk (A, r0 ), rk ⊥ Kk (A, r0 ), where k = 1, 2, . . . Kk (A, r0 ) = span{r0 , Ar0 , A2 r0 , . . . , Ak −1 r0 } kx − xk kA = min {kx − ykA : y ∈ x0 + Kk (A, r0 )} CG is conforming with the Galerkin approximation (k ) k∇(u − uh )k2 = k∇(u − uh )k2 + kx − xk k2A T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 7 / 21 The essence of the CG method The essence of the CG method I. Ax = b, A ∈ FN×N HPD matrix, b ∈ FN where F is C or R Hestenes, Stiefel (1952), Lanczos (1952) CG is a projection method which minimizes the energy norm of the error xk ∈ x0 + Kk (A, r0 ), rk ⊥ Kk (A, r0 ), where k = 1, 2, . . . Kk (A, r0 ) = span{r0 , Ar0 , A2 r0 , . . . , Ak −1 r0 } kx − xk kA = min {kx − ykA : y ∈ x0 + Kk (A, r0 )} CG is conforming with the Galerkin approximation (k ) k∇(u − uh )k2 = k∇(u − uh )k2 + kx − xk k2A T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 7 / 21 The essence of the CG method The essence of the CG method I. Ax = b, A ∈ FN×N HPD matrix, b ∈ FN where F is C or R Hestenes, Stiefel (1952), Lanczos (1952) CG is a projection method which minimizes the energy norm of the error xk ∈ x0 + Kk (A, r0 ), rk ⊥ Kk (A, r0 ), where k = 1, 2, . . . Kk (A, r0 ) = span{r0 , Ar0 , A2 r0 , . . . , Ak −1 r0 } kx − xk kA = min {kx − ykA : y ∈ x0 + Kk (A, r0 )} CG is conforming with the Galerkin approximation (k ) k∇(u − uh )k2 = k∇(u − uh )k2 + kx − xk k2A T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 7 / 21 The essence of the CG method The essence of the CG method II. CG method is mathematically equivalent to the Lanczos process: AVk = Vk Tk + δk +1 vk +1 ekT with given v1 = r0 /kr0 k The CG approximations xk are given by Tk yk = kr0 ke1 , xk = x0 + Vk yk . Jacobi matrix Tk is the k × k matrix representation of the OG projected operator Ak ≡ Vk Vk∗ AVk Vk∗ : Kk (A, r0 ) → Kk (A, r0 ). CG is a matrix formulation of the Gauss-Christoffel quadrature ω(λ), A, r0 /kr0 k Z ∞ f (λ) dω(λ) 0 first 2k moments matched ω (k ) (λ), Tk , e1 T. Gergelits, Z. Strakoš SNA’14, Nymburk Pk i=1 (k ) (k ) ωi f (λi ) 2014, January 28 8 / 21 The essence of the CG method The essence of the CG method II. CG method is mathematically equivalent to the Lanczos process: AVk = Vk Tk + δk +1 vk +1 ekT with given v1 = r0 /kr0 k The CG approximations xk are given by Tk yk = kr0 ke1 , xk = x0 + Vk yk . Jacobi matrix Tk is the k × k matrix representation of the OG projected operator Ak ≡ Vk Vk∗ AVk Vk∗ : Kk (A, r0 ) → Kk (A, r0 ). CG is a matrix formulation of the Gauss-Christoffel quadrature ω(λ), A, r0 /kr0 k Z ∞ f (λ) dω(λ) 0 first 2k moments matched ω (k ) (λ), Tk , e1 T. Gergelits, Z. Strakoš SNA’14, Nymburk Pk i=1 (k ) (k ) ωi f (λi ) 2014, January 28 8 / 21 CG in FP arithmetic Contents 1 Introduction 2 The essence of the CG method 3 CG in finite precision arithmetic 4 Trajectory of CG approximations 5 Inertia of computed Krylov subspaces T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 9 / 21 CG in FP arithmetic Delay of convergence Short recurrences =⇒ loss of orthogonality =⇒ delay of convergence 0 exact arithmetic finite precision arithmetic loss of orthogonality relative A−norm of the error 10 −5 10 −10 10 delay of convergence −15 10 0 T. Gergelits, Z. Strakoš 10 20 30 iteration number SNA’14, Nymburk 40 50 2014, January 28 10 / 21 CG in FP arithmetic Scheme of backward-like analysis Paige (1971, 1980), Greenbaum (1989), S (1991), Greenbaum and S (1992) b FN FN(k ) A, FP Lanczos → Tk b A(k), EXACT Lanczos → Tk the first k steps b Eigenvalues of A(k) are tightly clustered around the eigenvalues of A. b In numerical experiments, A(k) can be replaced (with small inaccuracy) b by an “universal” A with sufficiently many eigenvalues in the tight clusters around the eigenvalues of A. T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 11 / 21 CG in FP arithmetic Illustration of backward-like analysis 0 relative energy norm of the error 10 b exact CG on A FP CG on A exact CG on A −5 10 −10 10 0 20 40 60 80 100 iteration number 120 140 160 kx − x̄k kA ≈ kxb − xbk kAb We can relate k-th iteration of finite precision CG applied on A with k-th b iteration of exact CG applied on blurred problem A. T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 12 / 21 Trajectory of CG approximations Contents 1 Introduction 2 The essence of the CG method 3 CG in finite precision arithmetic 4 Trajectory of CG approximations 5 Inertia of computed Krylov subspaces T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 13 / 21 Trajectory of CG approximations Rank-deficiency and delay of convergence FP CG convergence curve shifted correspondingly to the rank-deficiency of the computed Krylov subspace gives the exact CG convergence curve. 0 10 45 relative A−norm of the error rank of the computed Krylov subspace 50 40 35 30 rank−deficiency 25 20 15 −5 10 −10 10 10 FP CG computation exact CG computation 5 0 0 10 20 30 iteration number 40 50 −15 10 shifted FP CG exact CG 0 5 10 15 20 rank of the computed Krylov subspace 25 Rank deficiency: k − k̄ , where k̄ = rank(Kk (A, r0 )) is the numerical rank of the computed Krylov subspace. Threshold: 10−1 (correspondence to a significant loss of orthogonality) T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 14 / 21 Trajectory of CG approximations Comparison of approximations We can compare FP and exact CG approximations x̄k and xk̄ themselves, both x̄k , xk̄ ∈ FN . 0 maximum norm 10 Observation kx − xk̄ k∞ kx − x̄k k∞ kx̄k − xk̄ k∞ −5 10 kx̄k − xk̄ k∞ ≪ 1, kx − xk̄ k∞ k x̄k −xk̄ k ∞ kx−xk̄ k ∞ i.e., distance between approximations is small in comparison with the actual level of error. −10 10 −15 10 0 5 10 15 20 rank of the computed Krylov subspace T. Gergelits, Z. Strakoš SNA’14, Nymburk 25 2014, January 28 15 / 21 Trajectory of CG approximations Comparison of approximations Ax = b FN FN Ax = b delay at the k-th step x xk x̄k x̄k xk̄ exact computation finite precision computation x0 x exact computation finite precision computation Finite precision CG computation tightly follows the trajectory of exact CG computations with the delay which is given by the rank-deficiency of the computed Krylov subspace. T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 16 / 21 Inertia of computed Krylov subspaces Contents 1 Introduction 2 The essence of the CG method 3 CG in finite precision arithmetic 4 Trajectory of CG approximations 5 Inertia of computed Krylov subspaces T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 17 / 21 Inertia of computed Krylov subspaces Comparison of numerical ranks Kk (A, r0 ): k-th Krylov subspace generated by FP CG Kk̄ (A, r0 ): k̄-th Krylov subspace generated by exact CG ( rank(Kk̄ (A, r0 )) = k̄ ) k 56 80 100 196 362 528 611 664 k̄ = rank(Kk (A, r0 )) 56 73 80 93 112 126 131 132 rank(Kk (A, r0 ) ∪ Kk̄ (A, r0 )) 56 74 80 93 113 127 131 132 Bcsstk04 (from MatrixMarket) T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 18 / 21 Inertia of computed Krylov subspaces Distance between subspaces Principal angles between k̄ -dimensional subspace Kk̄ (A, r0 ) and the k̄ -dimensional restriction of the subspace Kk (A, r0 ) which corresponds to its numerical rank: 0 ≤ θ1 ≤ θ2 ≤ . . . ≤ θk̄ ≤ π/2 Distance: distance Kk̄ (A, r0 ), restricted Kk (A, r0 ) 0 10 = sin(θk̄ ) sin(θk̄ ) = distance sin(θk̄−1 ) sin(θk̄−2 ) sin(θk̄−3 ) −5 10 0 20 T. Gergelits, Z. Strakoš 40 60 80 100 k̄ - dimension of the corresponding subspaces SNA’14, Nymburk 120 2014, January 28 19 / 21 Concluding remarks and future work The rate of convergence typically substantially differs for FP and exact CG. However, the trajectories of FP and exact CG approximations seem to be very close to each other. Inertia of computed Krylov subspaces represents phenomenon which needs to be studied. T. Gergelits, Z. Strakoš SNA’14, Nymburk 2014, January 28 20 / 21 Thank you for your kind attention Acknowledgement This work has been supported by the ERC-CZ project LL1202, by the GACR grant 201/13-06684S and by the GAUK grant 695612.