Linear conjugate gradient methods
Presentation for Nonlinear optimisation, equations and least squares
Anders Märak Leffler
February 15, 2012
Outline
1 Introduction
    When is this applicable?
    A look at the algorithm
2 Basics - the problem and conjugacy
    Stating the problem
    A simpler problem
    Conjugacy property
    Step lengths
    Expanding subspaces
    Generating the conjugate vectors
3 Convergence
    Bound independent of λ
    Eigenvalue-dependent bounds
    Clustering effects
4 Summary
When to use linear CG methods
A method (a class of methods) for solving Ax = b, with A positive definite.
Better than Gaussian elimination for large systems (avoid for smaller ones).
When to use linear CG methods
At most n steps to solve, for A ∈ M_{n×n}.
Often faster convergence, depending on the distribution of the eigenvalues of A.
Memory efficient.
The algorithm [I]
Given x0 and a preconditioner M;
Set r0 ← Ax0 − b, solve My0 = r0 for y0, set p0 ← −y0, k ← 0;
while rk ≠ 0
    αk ← (rk^T yk) / (pk^T A pk);
    xk+1 ← xk + αk pk;
    rk+1 ← rk + αk A pk;
    Solve M yk+1 = rk+1;
    βk+1 ← (rk+1^T yk+1) / (rk^T yk);
    pk+1 ← −yk+1 + βk+1 pk;
    k ← k + 1;
end (while)
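A minimal Python sketch of this loop, using NumPy. The test system and the Jacobi (diagonal) preconditioner M = diag(A) are illustrative assumptions; any symmetric positive definite M works.

```python
import numpy as np

def preconditioned_cg(A, b, x0, M, tol=1e-10, max_iter=None):
    """Preconditioned CG for Ax = b, A symmetric positive definite."""
    n = len(b)
    max_iter = max_iter or n
    x = x0.astype(float)
    r = A @ x - b                  # r0 = A x0 - b
    y = np.linalg.solve(M, r)      # solve M y0 = r0
    p = -y
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ y) / (p @ (A @ p))
        x = x + alpha * p
        r_new = r + alpha * (A @ p)
        y_new = np.linalg.solve(M, r_new)
        beta = (r_new @ y_new) / (r @ y)
        p = -y_new + beta * p
        r, y = r_new, y_new
    return x

# Example: a small SPD system, Jacobi preconditioner.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = preconditioned_cg(A, b, np.zeros(2), M=np.diag(np.diag(A)))
```

Note that only the current r, y, p are kept between iterations; this is the memory efficiency discussed later.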
Stating the problem
Solve Ax = b, with A ∈ M_{n×n} symmetric positive definite.
The formulation to be used: min Φ(x) = (1/2) x^T A x − b^T x, A symmetric positive definite, n × n.
A standard quadratic-form problem.
∇Φ(x) = Ax − b; write r(x) = ∇Φ(x).
At step k: rk = ∇Φ(xk) = A xk − b.
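The identity ∇Φ(x) = Ax − b is easy to sanity-check numerically. A sketch using NumPy; the matrix, vectors, and finite-difference step h are arbitrary choices.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
Phi = lambda x: 0.5 * x @ A @ x - b @ x

x = np.array([0.3, -0.7])
h = 1e-6
# Central finite differences approximate the gradient component-wise.
fd_grad = np.array([
    (Phi(x + h * e) - Phi(x - h * e)) / (2 * h)
    for e in np.eye(2)
])
residual = A @ x - b                      # r(x) = grad Phi(x)
```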
A simpler problem
Consider Φ̄(x) = x^T B x − b^T x, with B diagonal positive definite.
Searching along the coordinate axes yields the optimum in at most dim(R^n) = n steps.
. . . if only there were a transformation of the general problem.
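For a diagonal B the problem separates per coordinate, so one exact minimisation along each axis finishes the job. A sketch in NumPy; the particular B and b are arbitrary.

```python
import numpy as np

B = np.diag([2.0, 5.0, 1.0])      # diagonal positive definite
b = np.array([4.0, -10.0, 3.0])
Phi_bar = lambda x: x @ B @ x - b @ x

x = np.zeros(3)
# Exact line search along coordinate axis i:
# minimise B_ii * x_i**2 - b_i * x_i  =>  x_i = b_i / (2 * B_ii).
for i in range(3):                # n = 3 axis searches, one per coordinate
    x[i] = b[i] / (2 * B[i, i])

# The gradient 2 B x - b now vanishes, so x minimises Phi_bar.
```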
Conjugate vectors
Definition
A set of vectors P = {p0, p1, . . . , pn−1}, pi ≠ 0 for all i, is conjugate (with respect to A) if
pi^T A pj = 0 for i ≠ j.
The vectors of P are linearly independent.
Thus, if P ⊆ R^n and |P| = n, then span(P) = R^n.
S = [p0 . . . pn−1] is the corresponding basis matrix.
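One concrete conjugate set: the eigenvectors of A, since vi^T A vj = λj vi^T vj = 0 for i ≠ j. A NumPy check with an arbitrary SPD test matrix.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric positive definite
_, V = np.linalg.eigh(A)          # columns of V: orthonormal eigenvectors
P = [V[:, i] for i in range(3)]

# Pairwise conjugacy: p_i^T A p_j = 0 for i != j.
off_diag = [P[i] @ A @ P[j] for i in range(3) for j in range(3) if i != j]

# Conjugate vectors are linearly independent: S = [p0 p1 p2] has full rank.
S = np.column_stack(P)
```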
Conjugate vectors
The change of variables y = S^{−1} x yields Φ(x) = Φ(Sy) = (1/2) y^T (S^T A S) y − (S^T b)^T y.
S^T A S is diagonal, with positive diagonal entries.
This reduces the problem to the simple case.
Interpretation: minimise along the directions pi; convergence in at most n steps.
Method: generate the sequence {xk} by xk+1 = xk + αk pk.
Step lengths αk
Given:
the current vector xk,
the search direction pk,
the residual rk.
Minimise
φ(α) = Φ(xk + α pk) = (1/2)(xk + α pk)^T A (xk + α pk) − b^T (xk + α pk).
Explicitly, αk = (b^T pk − xk^T A pk) / (pk^T A pk) = −(rk^T pk) / (pk^T A pk).
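The closed-form step length can be checked against a brute-force scan of φ(α). A sketch; the matrix A, vector b, and the particular xk and pk are arbitrary choices.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
Phi = lambda x: 0.5 * x @ A @ x - b @ x

xk = np.array([1.0, 1.0])
pk = np.array([1.0, -2.0])
rk = A @ xk - b

alpha_k = -(rk @ pk) / (pk @ (A @ pk))   # closed-form exact line search

# phi(alpha) = Phi(xk + alpha * pk) should be minimal at alpha_k.
alphas = np.linspace(alpha_k - 1, alpha_k + 1, 2001)
phi_vals = [Phi(xk + a * pk) for a in alphas]
best = alphas[int(np.argmin(phi_vals))]
```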
Expanding subspace minimisation
Theorem (Nocedal & Wright, thm 5.2)
Let x0 ∈ R^n be the starting point and {xk} a sequence generated as above. Then
rk^T pi = 0 for i = 0, 1, . . . , k − 1, and xk is the minimiser of
Φ(x) = (1/2) x^T A x − b^T x over the set {x : x = x0 + span{p0, . . . , pk−1}}.
Proof.
For details, see Nocedal & Wright, p. 106f.
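The orthogonality rk^T pi = 0 can be observed on a run of unpreconditioned CG. A sketch; the SPD test matrix is an arbitrary choice, and the direction update uses the Gram-Schmidt form of βk from later in the talk.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
r = A @ x - b
p = -r
dirs, checks = [], []
for _ in range(3):
    alpha = -(r @ p) / (p @ (A @ p))
    x = x + alpha * p
    r = A @ x - b
    # The new residual is orthogonal to every earlier search direction.
    dirs.append(p)
    checks.extend(abs(r @ q) for q in dirs)
    beta = (p @ (A @ r)) / (p @ (A @ p))
    p = -r + beta * p
```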
The gradient satisfies ∇Φ(xk) ⊥ pi for i = 0, . . . , k − 1.
Process: minimise over affine subspaces of increasing dimension.
A digression
If P = {p0, . . . , pk−1} is conjugate, then P ∪ {pk} is conjugate if
pk^T A pi = 0 for i = 0, . . . , k − 1.
Compare expanding an orthonormal basis: given a set V of orthonormal vectors vi,
add vk if vk ⊥ vi for all i.
Cf. the Gram-Schmidt orthogonalisation process: project and subtract, using the scalar product.
Idea: do the same, but with ⟨x, y⟩_A = y^T A x as the scalar product.
It satisfies all the scalar-product properties:
⟨x, y⟩_A = y^T A x = (A symmetric) = y^T A^T x = x^T A y = ⟨y, x⟩_A.
⟨αx + βy, z⟩_A = z^T A (αx + βy) = α z^T A x + β z^T A y.
⟨x, x⟩_A = x^T A x ≥ 0 for all x, since A is positive definite.
⟨x, x⟩_A = 0 iff x = 0, since A is positive definite (for x ≠ 0, x^T A x > 0).
Expensive - requires all previous vectors to be stored.
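The Gram-Schmidt analogy can be made literal: A-orthogonalise an arbitrary basis using ⟨x, y⟩_A. A sketch; the starting basis (the standard one) and the test matrix are arbitrary choices.

```python
import numpy as np

def a_inner(x, y, A):
    return y @ A @ x

def gram_schmidt_conjugate(vectors, A):
    """A-orthogonalise: project out the <., q>_A components and subtract."""
    ps = []
    for v in vectors:
        p = v.copy()
        for q in ps:               # all previous vectors must be stored
            p = p - (a_inner(v, q, A) / a_inner(q, q, A)) * q
        ps.append(p)
    return ps

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
ps = gram_schmidt_conjugate(list(np.eye(3)), A)
pairs = [ps[i] @ A @ ps[j] for i in range(3) for j in range(3) if i != j]
```

The inner loop over `ps` is exactly the expense the slide mentions: every previous direction is kept and re-used.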
Memory efficiency
It is unnecessary to store the old vectors.
Update by pk = −rk + βk pk−1, with βk = (pk−1^T A rk) / (pk−1^T A pk−1).
The latter is given by the condition
⟨pk, pk−1⟩_A = pk−1^T A pk = −pk−1^T A rk + βk pk−1^T A pk−1 = 0.
Note that ⟨pk, pi⟩_A = 0 for all i = 0, 1, . . . , k − 2 as well:
pi^T A pk = pi^T A (−rk + βk pk−1) = −pi^T A rk + 0 = (by thm 5.2) = 0.
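The point of the recurrence is that enforcing ⟨pk, pk−1⟩_A = 0 buys conjugacy to all earlier directions for free. A numerical check; the random SPD test system is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 4))
A = Q @ Q.T + 4 * np.eye(4)       # random symmetric positive definite
b = rng.standard_normal(4)

x = np.zeros(4)
r = A @ x - b
p = -r
P = [p]
for _ in range(3):
    alpha = -(r @ p) / (p @ (A @ p))
    x = x + alpha * p
    r = A @ x - b
    beta = (p @ (A @ r)) / (p @ (A @ p))
    p = -r + beta * p             # only p_{k-1} is needed here
    P.append(p)

# p_k is A-conjugate to every earlier direction, not just the last one.
pairs = [P[i] @ A @ P[j] for i in range(4) for j in range(4) if i != j]
```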
At most n steps
Theorem (Nocedal & Wright, thm 5.1)
The sequence {xk} generated as above converges to the optimum x* in at most n steps.
Proof.
For details, see Nocedal & Wright, p. 103f.
Basic idea
Define a suitable norm corresponding to the problem.
Extract the eigenvalues and eigenvectors.
Express the difference in arguments in the eigenbasis.
Create inequalities bounding the distance from the minimum (in ‖·‖_A, i.e. the error).
The norm ‖·‖_A
We define ‖z‖_A^2 = ⟨z, z⟩_A.
Conveniently, (1/2)‖x − x*‖_A^2 = (1/2)(x − x*)^T A (x − x*) = Φ(x) − Φ(x*).
min ‖x − x*‖_A^2 is thus equivalent to min Φ(x).
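The identity is easy to verify numerically. A sketch; A, b, and the evaluation point x are arbitrary choices.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
Phi = lambda x: 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)    # the minimiser of Phi
x = np.array([2.0, -1.0])

err_sq = (x - x_star) @ A @ (x - x_star)   # ||x - x*||_A^2
lhs = 0.5 * err_sq
rhs = Phi(x) - Phi(x_star)
```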
Introduction
Basics - the problem and conjugacy
Convergence
Summary
Bound independent of λ
Eigenvalue-dependent bounds
Clustering effects
Restating the current vectors
By construction $x_{k+1} = x_0 + \alpha_0 p_0 + \ldots + \alpha_k p_k$.
$\mathrm{span}\{p_0, \ldots, p_k\} = \mathrm{span}\{r_0, r_1, \ldots, r_k\} = \mathrm{span}\{r_0, A r_0, \ldots, A^k r_0\}$
(proofs in Nocedal & Wright, p. 109ff, omitted).
Changing basis to $\{r_0, A r_0, \ldots, A^k r_0\}$:
$x_{k+1} = x_0 + \gamma_0 r_0 + \gamma_1 A r_0 + \ldots + \gamma_k A^k r_0$.
Define $P_k^*(\cdot)$ as the polynomial with coefficients $\gamma_0, \ldots, \gamma_k$, so $x_{k+1} - x_0 = P_k^*(A) r_0$.
Since $r_0 = A x_0 - b = A(x_0 - x^*)$,
$x_{k+1} - x^* = x_0 + P_k^*(A) r_0 - x^* = [I + P_k^*(A) A](x_0 - x^*)$.
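The span claim can be verified empirically: run a few steps of plain (unpreconditioned) CG and check that the update lies in the Krylov subspace. $A$ and $b$ are made-up SPD test data:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)        # SPD test matrix
b = rng.standard_normal(6)

x = np.zeros(6)
x0 = x.copy()
r = A @ x - b
r0 = r.copy()
p = -r
for k in range(3):                 # three standard CG iterations
    alpha = (r @ r) / (p @ (A @ p))
    x = x + alpha * p
    r_new = r + alpha * (A @ p)
    beta = (r_new @ r_new) / (r @ r)
    p = -r_new + beta * p
    r = r_new

# Krylov basis {r0, A r0, A^2 r0} as columns; x3 - x0 should lie in its span.
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(3)])
gamma, *_ = np.linalg.lstsq(K, x - x0, rcond=None)
assert np.allclose(K @ gamma, x - x0)
```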
Let $0 < \lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ be the eigenvalues of $A$, with $v_1, \ldots, v_n$ corresponding orthonormal eigenvectors.
Recall $P_k(\cdot)$. The $P_k(\lambda_i)$ are eigenvalues of $P_k(A)$:
$P_k(A) v_i = \gamma_0 I v_i + \gamma_1 A v_i + \gamma_2 A^2 v_i + \ldots + \gamma_k A^k v_i$
$= \gamma_0 v_i + \gamma_1 \lambda_i v_i + \ldots + \gamma_k \lambda_i^k v_i$
$= P_k(\lambda_i) v_i$.
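A quick numerical check of $P_k(A) v_i = P_k(\lambda_i) v_i$; the matrix $A$ and the coefficients $\gamma$ are arbitrary test choices, not actual CG coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)        # SPD test matrix
lam, V = np.linalg.eigh(A)         # eigenvalues ascending, orthonormal eigenvectors

gamma = np.array([0.7, -0.3, 0.1])  # P(t) = 0.7 - 0.3 t + 0.1 t^2
P_A = sum(g * np.linalg.matrix_power(A, j) for j, g in enumerate(gamma))

for i in range(5):
    v = V[:, i]
    P_lam = sum(g * lam[i] ** j for j, g in enumerate(gamma))
    # applying the matrix polynomial scales each eigenvector by P(lambda_i)
    assert np.allclose(P_A @ v, P_lam * v)
```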
Restate $x_{k+1} - x^*$ in the basis of eigenvectors, writing $x_0 - x^* = \sum_{i=1}^n \beta_i v_i$:
$x_{k+1} - x^* = \sum_{i=1}^n [1 + \lambda_i P_k^*(\lambda_i)] \beta_i v_i$
$\|x_{k+1} - x^*\|_A^2 = \sum_{i=1}^n \lambda_i [1 + \lambda_i P_k^*(\lambda_i)]^2 \beta_i^2$
Objective: find $P_k^*$ (its coefficients) which minimise this quantity.
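Both identities can be verified for an arbitrary polynomial; the coefficients $\gamma$ below are arbitrary test values, and $A$, $x_0$, $x^*$ are made-up data:

```python
import numpy as np

rng = np.random.default_rng(8)
Mx = rng.standard_normal((5, 5))
A = Mx @ Mx.T + 5 * np.eye(5)      # SPD test matrix
lam, V = np.linalg.eigh(A)
x_star = rng.standard_normal(5)
x0 = rng.standard_normal(5)
beta = V.T @ (x0 - x_star)         # coefficients of x0 - x* in the eigenbasis

gamma = np.array([0.2, -0.05])     # P(t) = 0.2 - 0.05 t
P_A = gamma[0] * np.eye(5) + gamma[1] * A
err = (np.eye(5) + P_A @ A) @ (x0 - x_star)   # plays the role of x_{k+1} - x*

# Same vector via the expansion sum_i [1 + lam_i P(lam_i)] beta_i v_i
P_lam = gamma[0] + gamma[1] * lam
err2 = V @ ((1 + lam * P_lam) * beta)
assert np.allclose(err, err2)

# A-norm identity: ||err||_A^2 = sum_i lam_i [1 + lam_i P(lam_i)]^2 beta_i^2
assert np.isclose(err @ A @ err, np.sum(lam * (1 + lam * P_lam) ** 2 * beta ** 2))
```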
Extract the maximal coefficient to create a bound:
$\|x_{k+1} - x^*\|_A^2 \le \min_{P_k} \max_{1 \le i \le n} [1 + \lambda_i P_k(\lambda_i)]^2 \left( \sum_{j=1}^n \lambda_j \beta_j^2 \right)$
Looking at the RHS:
$\sum_{j=1}^n \lambda_j \beta_j^2 = \|x_0 - x^*\|_A^2$
Thus the RHS is
$\min_{P_k} \max_{1 \le i \le n} [1 + \lambda_i P_k(\lambda_i)]^2 \, \|x_0 - x^*\|_A^2$
Theorem (Nocedal & Wright, thm 5.4)
If $A$ has $r$ distinct eigenvalues, then the conjugate gradient algorithm will stop at $x^*$ after at most $r$ iterations.
Proof.
Let $\tau_1 < \tau_2 < \ldots < \tau_r$ be the $r$ distinct eigenvalues of $A$.
Define $Q_r(\lambda) = \frac{(-1)^r}{\tau_1 \tau_2 \cdots \tau_r} (\lambda - \tau_1) \cdots (\lambda - \tau_r)$. Note that
$Q_r(0) = \frac{(-1)^r}{\tau_1 \cdots \tau_r} (-1)^r \tau_1 \cdots \tau_r = 1$.
Thus $\bar{P}_{r-1}(\lambda) = (Q_r(\lambda) - 1)/\lambda$ is a polynomial of degree $r - 1$. We verify that it does in fact minimise $\|x_r - x^*\|_A^2$:
$0 \le \min_{P_{r-1}} \max_{1 \le i \le n} [1 + \lambda_i P_{r-1}(\lambda_i)]^2 \le \max_{1 \le i \le n} [1 + \lambda_i \bar{P}_{r-1}(\lambda_i)]^2 = \max_{1 \le i \le n} Q_r(\lambda_i)^2 = 0$.
That is, at iteration $r$ the distance to the optimum is 0, and $x_r = x^*$, as the solution is unique (strictly convex quadratic).
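The finite-termination property is easy to observe in practice. Below, an SPD test matrix is built with exactly 3 distinct eigenvalues (the values 1, 4, 9 and size $n = 8$ are arbitrary choices), and standard CG reaches the solution in at most 3 iterations up to floating-point noise:

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))       # random orthogonal basis
lam = np.array([1, 1, 1, 4, 4, 4, 9, 9], dtype=float)  # r = 3 distinct eigenvalues
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(8)
x_star = np.linalg.solve(A, b)

x = np.zeros(8)
r = A @ x - b
p = -r
iters = 0
while np.linalg.norm(r) > 1e-8 and iters < 8:
    alpha = (r @ r) / (p @ (A @ p))
    x = x + alpha * p
    r_new = r + alpha * (A @ p)
    beta = (r_new @ r_new) / (r @ r)
    p = -r_new + beta * p
    r = r_new
    iters += 1

assert iters <= 3                            # stops after at most r = 3 steps
assert np.allclose(x, x_star, atol=1e-6)
```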
Theorem (Nocedal & Wright, thm 5.5)
If $A$ has eigenvalues $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$, we have that
$\|x_{k+1} - x^*\|_A^2 \le \left( \frac{\lambda_{n-k} - \lambda_1}{\lambda_{n-k} + \lambda_1} \right)^2 \|x_0 - x^*\|_A^2$
Proof.
Left as an exercise (by Nocedal & Wright as well; sadly I didn't have time to complete this).
Implications
Say that there are $m$ large eigenvalues, the rest being small (and clustered).
By the ordering $i < j \Rightarrow \lambda_i \le \lambda_j$, we can use thm 5.5 with $k = m$ to say
$\|x_{m+1} - x^*\|_A^2 \le \left( \frac{\lambda_{n-m} - \lambda_1}{\lambda_{n-m} + \lambda_1} \right)^2 \|x_0 - x^*\|_A^2$
Defining $\epsilon = \frac{\lambda_{n-m} - \lambda_1}{\lambda_{n-m} + \lambda_1}$, we have
$\|x_{m+1} - x^*\|_A \le \epsilon \|x_0 - x^*\|_A \approx 0$
(intuitively: $\epsilon$ is small because the cluster width $\lambda_{n-m} - \lambda_1$ is relatively small). I.e., after $m + 1$ iterations we get a good approximation.
In general we want to have clustered eigenvalues.
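A small experiment showing the clustering effect; the spectrum (a tight cluster near 1 plus $m = 3$ large outliers) and the problem size are arbitrary test choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 30, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.concatenate([1 + 1e-3 * rng.random(n - m),      # tight cluster near 1
                      np.array([50.0, 80.0, 120.0])])    # m large eigenvalues
A = Q @ np.diag(lam) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

def a_err(x):
    e = x - x_star
    return np.sqrt(e @ A @ e)        # error in the A-norm

x = np.zeros(n); r = A @ x - b; p = -r
errs = [a_err(x)]
for k in range(m + 2):
    alpha = (r @ r) / (p @ (A @ p))
    x = x + alpha * p
    r_new = r + alpha * (A @ p)
    beta = (r_new @ r_new) / (r @ r)
    p = -r_new + beta * p
    r = r_new
    errs.append(a_err(x))

# After m + 1 = 4 iterations the A-norm error has collapsed, as thm 5.5 predicts.
assert errs[m + 1] < 1e-2 * errs[0]
```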
Condition numbers
Recall $K(A) = \|A\|_2 \|A^{-1}\|_2 = \lambda_n / \lambda_1$ ($\lambda_i$ ordered from smallest to largest).
We can create the bound
$\|x_k - x^*\|_A \le 2 \left( \frac{\sqrt{K(A)} - 1}{\sqrt{K(A)} + 1} \right)^k \|x_0 - x^*\|_A$.
Nocedal & Wright warn that this bound is often pessimistic in practice.
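The bound (and how loose it tends to be) can be inspected directly; $A$ and $b$ are made-up SPD test data:

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20 * np.eye(20)      # well-conditioned SPD test matrix
b = rng.standard_normal(20)
x_star = np.linalg.solve(A, b)
lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]           # K(A) = lambda_n / lambda_1
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

def a_err(x):
    e = x - x_star
    return np.sqrt(e @ A @ e)

x = np.zeros(20); r = A @ x - b; p = -r
e0 = a_err(x)
for k in range(1, 11):
    alpha = (r @ r) / (p @ (A @ p))
    x = x + alpha * p
    r_new = r + alpha * (A @ p)
    beta = (r_new @ r_new) / (r @ r)
    p = -r_new + beta * p
    r = r_new
    # bound holds at every step, usually with plenty of slack
    assert a_err(x) <= 2 * rho ** k * e0 + 1e-10
```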
A note on preconditioning
If $A$ is ill-conditioned, change variables before solving.
Let $y = Cx$, $C$ nonsingular, and minimise in the new variables; the matrix seen by CG becomes $C^{-T} A C^{-1}$.
Selecting the preconditioning method is matrix-specific.
Nocedal & Wright specifically mention methods based on incomplete Cholesky factorisation.
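As a toy illustration of the change of variables, here is a Jacobi-style scaling $C = \mathrm{diag}(\sqrt{a_{ii}})$ applied to a badly scaled SPD test matrix (the matrix and the choice of $C$ are illustrative assumptions, not the incomplete Cholesky approach mentioned above):

```python
import numpy as np

rng = np.random.default_rng(6)
D = np.diag(np.array([1.0, 10.0, 100.0, 1000.0]))  # badly scaled diagonal
A = D + 0.1 * np.ones((4, 4))                      # SPD, poorly conditioned
C = np.diag(np.sqrt(np.diag(A)))                   # Jacobi-style scaling
Cinv = np.linalg.inv(C)
A_hat = Cinv.T @ A @ Cinv                          # matrix seen by CG in y = C x

def cond(B):
    w = np.linalg.eigvalsh(B)
    return w[-1] / w[0]

# the transformed system is much better conditioned than the original
assert cond(A_hat) < cond(A)
```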
The algorithm revisited
Given $x_0$, preconditioner $M$;
Set $r_0 \leftarrow A x_0 - b$, solve $M y_0 = r_0$ for $y_0$, set $p_0 = -y_0$, $k \leftarrow 0$;
while $r_k \ne 0$
    $\alpha_k \leftarrow \dfrac{r_k^T y_k}{p_k^T A p_k}$;
    $x_{k+1} \leftarrow x_k + \alpha_k p_k$;
    $r_{k+1} \leftarrow r_k + \alpha_k A p_k$;
    Solve $M y_{k+1} = r_{k+1}$;
    $\beta_{k+1} \leftarrow \dfrac{r_{k+1}^T y_{k+1}}{r_k^T y_k}$;
    $p_{k+1} \leftarrow -y_{k+1} + \beta_{k+1} p_k$;
    $k \leftarrow k + 1$;
end (while)
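The pseudocode above transcribes directly into Python. This sketch uses a diagonal (Jacobi) preconditioner $M = \mathrm{diag}(A)$ so that $M y = r$ is a cheap elementwise divide; the test matrix and tolerances are arbitrary choices:

```python
import numpy as np

def preconditioned_cg(A, b, M_diag, x0, tol=1e-10, maxiter=200):
    x = x0.copy()
    r = A @ x - b                  # r_0 = A x_0 - b
    y = r / M_diag                 # solve M y = r for diagonal M
    p = -y
    for _ in range(maxiter):
        if np.linalg.norm(r) <= tol:
            break
        Ap = A @ p
        alpha = (r @ y) / (p @ Ap)
        x = x + alpha * p
        r_new = r + alpha * Ap
        y_new = r_new / M_diag     # solve M y_{k+1} = r_{k+1}
        beta = (r_new @ y_new) / (r @ y)
        p = -y_new + beta * p
        r, y = r_new, y_new
    return x

rng = np.random.default_rng(7)
G = rng.standard_normal((15, 15))
A = G @ G.T + 15 * np.eye(15)      # SPD test matrix
b = rng.standard_normal(15)
x = preconditioned_cg(A, b, np.diag(A), np.zeros(15))
assert np.allclose(A @ x, b, atol=1e-6)
```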