Linear conjugate gradient methods
Presentation for Nonlinear optimisation, equations and least squares
Anders Märak Leffler
February 15, 2012

Outline
1 Introduction
    When is this applicable? · A look at the algorithm
2 Basics - the problem and conjugacy
    Stating the problem · A simpler problem · Conjugacy property · Step lengths · Expanding subspaces · Generating the conjugate vectors
3 Convergence
    Bound independent of λ · Eigenvalue-dependent bounds · Clustering effects
4 Summary

Introduction

When to use linear CG methods
- A method (class of methods) for solving Ax = b, A positive definite.
- Better than Gauss elimination for large systems (avoid it for smaller ones).
- At most n steps to solve, for A ∈ M_{n×n}.
- Often faster convergence, depending on the distribution of the eigenvalues of A.
- Memory efficient.
The algorithm [I]
Given x_0 and a preconditioner M;
Set r_0 ← Ax_0 − b, solve M y_0 = r_0 for y_0, set p_0 ← −y_0, k ← 0;
while r_k ≠ 0
    α_k ← (r_kᵀ y_k) / (p_kᵀ A p_k);
    x_{k+1} ← x_k + α_k p_k;
    r_{k+1} ← r_k + α_k A p_k;
    solve M y_{k+1} = r_{k+1};
    β_{k+1} ← (r_{k+1}ᵀ y_{k+1}) / (r_kᵀ y_k);
    p_{k+1} ← −y_{k+1} + β_{k+1} p_k;
    k ← k + 1;
end (while)

Basics - the problem and conjugacy

Stating the problem
- Solve Ax = b, A ∈ M_{n×n} symmetric positive definite.
- The formulation to be used: min Φ(x) = ½ xᵀAx − bᵀx, with A symmetric positive definite n × n. A standard quadratic-form problem.
- ∇Φ(x) = Ax − b; we write ∇Φ(x) = r(x). At step k: r_k = ∇Φ(x_k) = Ax_k − b.
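To make the problem statement concrete, here is a small numpy check (my own illustration, not from the slides; the matrix and right-hand side are made up): Φ, its gradient r(x) = Ax − b, and the fact that the unique minimiser of Φ solves Ax = b.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)        # symmetric positive definite
b = rng.standard_normal(n)

def phi(x):
    return 0.5 * x @ A @ x - b @ x

def residual(x):
    return A @ x - b               # r(x) = grad Phi(x)

x_star = np.linalg.solve(A, b)     # the minimiser of Phi solves A x = b
x = rng.standard_normal(n)
print(np.linalg.norm(residual(x_star)))   # ~0: the gradient vanishes at x*
print(phi(x) >= phi(x_star))              # True: any other x has larger Phi
```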
A simpler problem
- Consider Φ̄(x) = xᵀBx − bᵀx with B diagonal and positive definite.
- Searching along the coordinate axes yields the optimum in at most dim(Rⁿ) = n steps.
- ...if only there were a transformation of the general problem into this form.

Conjugate vectors
Definition. Vectors P = {p_0, p_1, ..., p_{n−1}}, with p_i ≠ 0 for all i, are conjugate (with respect to A) if p_iᵀ A p_j = 0 for i ≠ j.
- The vectors of P are linearly independent. Thus, if P ⊆ Rⁿ and |P| = n, then span(P) = Rⁿ.
- S = [p_0 ... p_{n−1}] is the corresponding basis matrix.
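A tiny concrete instance of the definition (my own example, not from the slides): for a symmetric A, the eigenvectors are A-conjugate, since v_iᵀ A v_j = λ_j v_iᵀ v_j = 0 for i ≠ j.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # small SPD example (made up)
lam, V = np.linalg.eigh(A)          # orthonormal eigenvectors
p0, p1 = V[:, 0], V[:, 1]

print(p0 @ A @ p1)                  # ~0: p0, p1 are conjugate w.r.t. A
print(p0 @ A @ p0, p1 @ A @ p1)     # positive (equal to the eigenvalues)
```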
Conjugate vectors (continued)
- The change of variables y = S⁻¹x yields Φ(x) = Φ(Sy) = ½ yᵀ(SᵀAS)y − (Sᵀb)ᵀy.
- SᵀAS is diagonal with positive diagonal entries, so this reduces to the simple case.
- Interpretation: minimise along the directions p_i; convergence in ≤ n steps.
- Method: generate the sequence {x_k} by x_{k+1} = x_k + α_k p_k.
Step lengths α_k
- Given: current iterate x_k, search direction p_k, residual r_k.
- Minimising φ(α) = Φ(x_k + α p_k) = ½ (x_k + α p_k)ᵀA(x_k + α p_k) − bᵀ(x_k + α p_k) gives explicitly
    α_k = (bᵀp_k − x_kᵀA p_k) / (p_kᵀA p_k) = −(r_kᵀ p_k) / (p_kᵀA p_k).

Expanding subspace minimisation
Theorem (Nocedal & Wright, thm 5.2). Let x_0 ∈ Rⁿ be the starting point and {x_k} a sequence generated as above. Then
    r_kᵀ p_i = 0 for i = 0, 1, ..., k − 1,
and x_k is the minimiser of Φ(x) = ½ xᵀAx − bᵀx over the set {x : x = x_0 + span{p_0, ..., p_{k−1}}}.
Proof. For details, see Nocedal & Wright, p. 106f.
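A small numpy check of this expanding-subspace property (my own construction, not from the slides): build A-conjugate directions by Gram-Schmidt in the A-inner product, take exact steps α_k = −r_kᵀp_k / (p_kᵀAp_k), and verify that each residual is orthogonal to all previously used directions and that the method finishes in n steps.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)                  # symmetric positive definite
b = rng.standard_normal(n)

# A-conjugate directions via Gram-Schmidt in <u, v>_A = v^T A u,
# starting from the standard basis (any linearly independent set works).
P = []
for e in np.eye(n):
    p = e.copy()
    for q in P:
        p -= (q @ A @ e) / (q @ A @ q) * q   # subtract the A-projection onto q
    P.append(p)

x = np.zeros(n)
for k, p in enumerate(P):
    r = A @ x - b
    # Thm 5.2: the current residual is orthogonal to every direction used so far.
    assert all(abs(r @ P[i]) < 1e-8 for i in range(k))
    alpha = -(r @ p) / (p @ A @ p)           # exact line search along p
    x = x + alpha * p

print(np.linalg.norm(A @ x - b))             # ~0 after n steps
```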
- In other words, the gradient satisfies ∇Φ(x_k) ⊥ p_i for i = 0, ..., k − 1.
- The process: minimise over affine subspaces of increasing dimension.

A digression
- If P = {p_0, ..., p_{k−1}} is conjugate, then P ∪ {p_k} is conjugate if p_kᵀ A p_i = 0 for i = 0, ..., k − 1.
- Compare expanding an orthonormal basis: given a set V of orthonormal vectors v_i, add v_k if v_k ⊥ v_i for all i.
- Cf. the Gram-Schmidt orthogonalisation process: project and subtract, using a scalar product.
- Idea: do the same here, but with ⟨x, y⟩_A = yᵀAx as the scalar product. It satisfies all the inner-product properties:
    ⟨x, y⟩_A = yᵀAx = (A symmetric) = yᵀAᵀx = xᵀAy = ⟨y, x⟩_A;
    ⟨αx + βy, z⟩_A = zᵀA(αx + βy) = α zᵀAx + β zᵀAy;
    ⟨x, x⟩_A = xᵀAx ≥ 0 for all x, since A is positive definite;
    ⟨x, x⟩_A = 0 iff x = 0, since A is positive definite (for x ≠ 0, xᵀAx > 0).
- Expensive: this requires all previous vectors to be stored.
Memory efficiency
- It is unnecessary to store the old vectors. Update by
    p_k = −r_k + β_k p_{k−1},   β_k = (p_{k−1}ᵀ A r_k) / (p_{k−1}ᵀ A p_{k−1}).
- The latter is given by the condition
    ⟨p_k, p_{k−1}⟩_A = p_{k−1}ᵀ A p_k = −p_{k−1}ᵀ A r_k + β_k p_{k−1}ᵀ A p_{k−1} = 0.
- Note that ⟨p_k, p_i⟩_A = 0 for all i = 0, 1, ..., k − 2 as well:
    p_iᵀ A p_k = p_iᵀ A (−r_k + β_k p_{k−1}) = −p_iᵀ A r_k + 0 = (by thm 5.2) = 0.
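A minimal sketch of the resulting memory-efficient iteration (mine, not from the slides): only the current residual and the previous direction are kept, yet the directions it produces come out mutually A-conjugate. The directions are stored here only to verify that afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
r = A @ x - b
p = -r
dirs = []                            # kept only for the conjugacy check below
for k in range(n):
    if np.linalg.norm(r) < 1e-12:
        break
    Ap = A @ p
    alpha = -(r @ p) / (p @ Ap)      # exact step along p_k
    x = x + alpha * p
    r = r + alpha * Ap               # r_{k+1} = r_k + alpha_k A p_k
    beta = (r @ Ap) / (p @ Ap)       # beta_{k+1} = p_k^T A r_{k+1} / p_k^T A p_k
    dirs.append(p)
    p = -r + beta * p                # p_{k+1} = -r_{k+1} + beta_{k+1} p_k

D = np.array(dirs).T
C = D.T @ A @ D
print(np.abs(C - np.diag(np.diag(C))).max())   # ≈ 0 up to rounding: directions are A-conjugate
print(np.linalg.norm(A @ x - b))               # ≈ 0: A x = b solved
```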
Convergence

At most n steps
Theorem (Nocedal & Wright, thm 5.1). The sequence {x_k} generated as above converges to the optimum x* in at most n steps.
Proof. For details, see Nocedal & Wright, p. 103f.

Basic idea of the eigenvalue-dependent bounds
- Define a suitable norm which corresponds to the problem.
- Extract the eigenvalues and eigenvectors of A.
- Express the difference in arguments in the eigenbasis.
- Create inequalities bounding the distance from the minimum (in ‖·‖_A, i.e. the error).

The norm ‖·‖_A
- We define ‖z‖²_A = ⟨z, z⟩_A.
- Conveniently, ½ ‖x − x*‖²_A = ½ (x − x*)ᵀA(x − x*) = Φ(x) − Φ(x*) (expand the quadratic and use Ax* = b).
- Minimising ‖x − x*‖²_A is thus equivalent to minimising Φ(x).
Restating the current vectors
- By construction, x_{k+1} = x_0 + α_0 p_0 + ... + α_k p_k.
- span{p_0, ..., p_k} = span{r_0, r_1, ..., r_k} = span{r_0, A r_0, ..., A^k r_0} (proofs in Nocedal & Wright, p. 109ff, omitted).
- Changing basis to {r_0, A r_0, ..., A^k r_0}: x_{k+1} = x_0 + γ_0 r_0 + γ_1 A r_0 + ... + γ_k A^k r_0.
- Define P*_k(·) as the polynomial with coefficients γ_0, ..., γ_k, so that x_{k+1} − x_0 = P*_k(A) r_0.
- Since r_0 = A(x_0 − x*), we get x_{k+1} − x* = x_0 + P*_k(A) r_0 − x* = [I + P*_k(A) A](x_0 − x*).
- Let 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_n be the eigenvalues of A and v_1, ..., v_n corresponding orthonormal eigenvectors. Recall P_k(·).
- The values P_k(λ_i) are eigenvalues of P_k(A):
    P_k(A) v_i = γ_0 I v_i + γ_1 A v_i + γ_2 A² v_i + ... + γ_k A^k v_i = γ_0 v_i + γ_1 λ_i v_i + ... + γ_k λ_i^k v_i = P_k(λ_i) v_i.
- Now restate x_{k+1} − x* in the basis of eigenvectors:
- Writing x_0 − x* = Σ_{i=1}^n β_i v_i, we get
    x_{k+1} − x* = Σ_{i=1}^n [1 + λ_i P*_k(λ_i)] β_i v_i,
    ‖x_{k+1} − x*‖²_A = Σ_{i=1}^n λ_i [1 + λ_i P*_k(λ_i)]² β_i².
- Objective: find P*_k (i.e. its coefficients) which minimises this quantity.
- Extract the maximal factor over i to create a bound:
    ‖x_{k+1} − x*‖²_A ≤ min_{P_k} max_{1≤i≤n} [1 + λ_i P_k(λ_i)]² ( Σ_{j=1}^n λ_j β_j² ).
- Looking at the remaining sum: Σ_{j=1}^n λ_j β_j² = ‖x_0 − x*‖²_A (it is just the A-norm expansion of x_0 − x*).
- Thus the right-hand side is min_{P_k} max_{1≤i≤n} [1 + λ_i P_k(λ_i)]² ‖x_0 − x*‖²_A.

Theorem (Nocedal & Wright, thm 5.4). If A has r distinct eigenvalues, then the conjugate gradient algorithm will stop at x* after at most r iterations.
Proof.
- Let τ_1 < τ_2 < ... < τ_r be the r distinct eigenvalues of A.
- Define Q_r(λ) = ((−1)^r / (τ_1 τ_2 ... τ_r)) (λ − τ_1) ... (λ − τ_r). Note that Q_r(0) = ((−1)^r / (τ_1 ... τ_r)) (−1)^r τ_1 ... τ_r = 1.
- Thus P̄_{r−1}(λ) = (Q_r(λ) − 1)/λ is a polynomial of degree r − 1.
- We verify that this does in fact minimise ‖x_r − x*‖²_A:
    0 ≤ min_{P_{r−1}} max_{1≤i≤n} [1 + λ_i P_{r−1}(λ_i)]² ≤ max_{1≤i≤n} [1 + λ_i P̄_{r−1}(λ_i)]² = max_{1≤i≤n} Q_r(λ_i)² = 0.
- That is, at iteration r the distance to the optimum is 0, and x_r = x*, as the solution is unique (strictly convex quadratic).
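The key step of the proof can be checked numerically (my own illustration; the eigenvalues are made up): Q_r equals 1 at zero and vanishes at every eigenvalue, so 1 + λ_i P̄_{r−1}(λ_i) = Q_r(λ_i) = 0 for all i, which makes the min-max bound zero at step r.

```python
import numpy as np

taus = np.array([1.0, 4.0, 9.0, 25.0])       # r = 4 distinct eigenvalues (made up)
r = len(taus)

def Q(lam):
    # Q_r(lam) = (-1)^r / (tau_1 * ... * tau_r) * (lam - tau_1) * ... * (lam - tau_r)
    return (-1.0) ** r / np.prod(taus) * np.prod(lam - taus)

def P_bar(lam):
    # P_bar_{r-1}(lam) = (Q_r(lam) - 1) / lam, a polynomial of degree r - 1
    return (Q(lam) - 1.0) / lam

print(Q(0.0))                                 # 1.0, as required
print([1.0 + t * P_bar(t) for t in taus])     # all zero: the bound vanishes at step r
```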
Theorem (Nocedal & Wright, thm 5.5). If A has eigenvalues λ_1 ≤ λ_2 ≤ ... ≤ λ_n, we have that
    ‖x_{k+1} − x*‖²_A ≤ ( (λ_{n−k} − λ_1) / (λ_{n−k} + λ_1) )² ‖x_0 − x*‖²_A.
Proof. Left as an exercise (by Nocedal & Wright as well; sadly I didn't have time to complete this).

Implications
- Say that there are m large eigenvalues, the rest being small (and clustered).
- By the ordering i < j ⇒ λ_i ≤ λ_j, we can use thm 5.5 (with k = m) to say
    ‖x_{m+1} − x*‖_A ≤ ( (λ_{n−m} − λ_1) / (λ_{n−m} + λ_1) ) ‖x_0 − x*‖_A.
- Defining ε = λ_{n−m} − λ_1, we have roughly ‖x_{m+1} − x*‖_A ≈ ε ‖x_0 − x*‖_A (intuitively: the clustered small eigenvalues make ε relatively small).
- I.e., already at iteration m + 1 we get a good approximation.
- In general we want to have clustered eigenvalues.
Condition numbers
- Recall K(A) = ‖A‖_2 ‖A⁻¹‖_2 = λ_n / λ_1 (λ_i ordered from smallest to largest).
- We can create the bound
    ‖x_k − x*‖_A ≤ 2 ( (√K(A) − 1) / (√K(A) + 1) )^k ‖x_0 − x*‖_A.
- Nocedal & Wright warn that this bound often overshoots.
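One way to read the bound (a quick calculation of mine, not from the slides): the number of iterations it guarantees for a given error reduction grows roughly like √K(A).

```python
import numpy as np

def iters_for_tol(kappa, tol=1e-6):
    """Smallest k with 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k <= tol."""
    rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return int(np.ceil(np.log(tol / 2.0) / np.log(rho)))

for kappa in (10.0, 100.0, 1e4, 1e6):
    print(int(kappa), iters_for_tol(kappa))
```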
A note on preconditioning
- If A is unsuitable, change variables before solving: let y = Cx, with C nonsingular, and minimise in the new variables.
- Selecting the preconditioning method is matrix-specific. Nocedal & Wright specifically mention methods based on approximate Cholesky factorisation.
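A tiny illustration of the idea (my own example; the diagonal/Jacobi scaling choice is not from the slides): for a badly scaled SPD matrix, even the simple change of variables with C = diag(A)^{1/2} can reduce the condition number dramatically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
G = rng.standard_normal((n, n))
B = G @ G.T + n * np.eye(n)             # SPD with a modest condition number
s = 10.0 ** rng.uniform(-3, 3, n)       # wildly different row/column scales
A = (s[:, None] * B) * s[None, :]       # A = S B S: SPD but badly scaled

d = np.sqrt(np.diag(A))                 # C = diag(A)^{1/2}
A_prec = A / d[:, None] / d[None, :]    # C^{-1} A C^{-1}, unit diagonal

print(np.linalg.cond(A))                # huge
print(np.linalg.cond(A_prec))           # much smaller
```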
Summary

The algorithm revisited
Given x_0 and a preconditioner M;
Set r_0 ← Ax_0 − b, solve M y_0 = r_0 for y_0, set p_0 ← −y_0, k ← 0;
while r_k ≠ 0
    α_k ← (r_kᵀ y_k) / (p_kᵀ A p_k);
    x_{k+1} ← x_k + α_k p_k;
    r_{k+1} ← r_k + α_k A p_k;
    solve M y_{k+1} = r_{k+1};
    β_{k+1} ← (r_{k+1}ᵀ y_{k+1}) / (r_kᵀ y_k);
    p_{k+1} ← −y_{k+1} + β_{k+1} p_k;
    k ← k + 1;
end (while)
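A runnable numpy sketch of the algorithm above (my own translation of the pseudocode; the example problem and the diagonal preconditioner are made up for illustration):

```python
import numpy as np

def preconditioned_cg(A, b, M, x0, tol=1e-10, maxiter=None):
    """Preconditioned CG as in the pseudocode; A and M symmetric positive definite."""
    x = x0.copy()
    r = A @ x - b                        # r_0 = A x_0 - b
    y = np.linalg.solve(M, r)            # solve M y_0 = r_0 (cheap if M is e.g. diagonal)
    p = -y
    for _ in range(maxiter or len(b)):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = (r @ y) / (p @ Ap)       # alpha_k = r_k^T y_k / p_k^T A p_k
        x = x + alpha * p
        r_new = r + alpha * Ap           # r_{k+1} = r_k + alpha_k A p_k
        y_new = np.linalg.solve(M, r_new)
        beta = (r_new @ y_new) / (r @ y) # beta_{k+1} = r_{k+1}^T y_{k+1} / r_k^T y_k
        p = -y_new + beta * p
        r, y = r_new, y_new
    return x

# Example with a Jacobi (diagonal) preconditioner M = diag(A).
rng = np.random.default_rng(0)
n = 100
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)
b = rng.standard_normal(n)
x = preconditioned_cg(A, b, np.diag(np.diag(A)), np.zeros(n))
print(np.linalg.norm(A @ x - b))         # ~0
```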