ECE 530 – Analysis Techniques for Large-Scale Electrical Systems
Lecture 25: Krylov Subspace Methods
Prof. Hao Zhu
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
haozhu@illinois.edu
12/2/2014
1
Announcements
• No class on Thursday Dec 4
• Homework 8 posted, due on Thursday Dec 11
2
Krylov Subspace Outline
• Review of fields and vector spaces
• Eigensystem basics
• Definition of Krylov subspaces and annihilating polynomial
• Generic Krylov subspace solver
• Steepest descent
• Conjugate gradient
3
Krylov Subspace
• Iterative methods to solve Ax = b build on the idea that
  $$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b} = -\frac{1}{a_0}\sum_{j=0}^{m-1} a_{j+1}\,\mathbf{A}^{j}\mathbf{b}$$
  where the $a_j$ are the coefficients of an annihilating polynomial of A, $\sum_{j=0}^{m} a_j \mathbf{A}^j = \mathbf{0}$ with $a_0 \ne 0$
• Given a matrix A and a vector v, the ith order Krylov subspace is defined as
  $$\mathbf{K}_i(\mathbf{v},\mathbf{A}) = \operatorname{span}\{\mathbf{v},\, \mathbf{A}\mathbf{v},\, \mathbf{A}^2\mathbf{v},\, \ldots,\, \mathbf{A}^{i-1}\mathbf{v}\}$$
• For a specified matrix A and a vector v, the largest value of i is bounded (by the degree of the annihilating polynomial of A with respect to v)
4
Generic Krylov Subspace Solver
• The following is a generic Krylov subspace solver for Ax = b using only matrix-vector multiplies (a minimal Python sketch follows this slide)
• Step 1: Start with an initial guess x(0) and some predefined error tolerance ε > 0; compute the residual r(0) = b – A x(0); set i = 0
• Step 2: While ||r(i)|| ≥ ε Do
  (a) i := i + 1
  (b) get Ki(r(0), A)
  (c) find x(i) in {x(0) + Ki(r(0), A)} to minimize ||r(i)||
  Stop
5
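As a concrete illustration of the generic scheme above, here is a minimal NumPy sketch (not part of the original slides; function names are mine) that forms the Krylov basis explicitly and minimizes ||r(i)|| by least squares. Practical solvers orthogonalize the basis instead of using the raw powers A^j r(0), which quickly become ill-conditioned.

```python
import numpy as np

def krylov_basis(A, v, i):
    """Columns span K_i(v, A) = span{v, Av, ..., A^{i-1} v}."""
    V = np.empty((len(v), i))
    V[:, 0] = v
    for j in range(1, i):
        V[:, j] = A @ V[:, j - 1]
    return V

def generic_krylov_solve(A, b, x0=None, eps=1e-8, max_dim=None):
    """Direct (illustrative) implementation of the generic solver:
    at step i, minimize ||b - A x|| over x in x0 + K_i(r0, A)."""
    n = len(b)
    x0 = np.zeros(n) if x0 is None else x0
    r0 = b - A @ x0                              # r(0)
    max_dim = n if max_dim is None else max_dim
    x = x0
    for i in range(1, max_dim + 1):
        if np.linalg.norm(b - A @ x) < eps:
            break
        V = krylov_basis(A, r0, i)               # basis of K_i(r0, A)
        # choose coefficients c to minimize ||r0 - A V c|| (least squares)
        c, *_ = np.linalg.lstsq(A @ V, r0, rcond=None)
        x = x0 + V @ c
    return x
```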
Krylov Subspace Solver
• Note that no calculations are performed in Step 2 once i exceeds its largest value
• The Krylov subspace methods differ from each other in
  – the construction scheme of the Krylov subspace in Step 2(b)
  – the residual minimization criterion used in Step 2(c)
• A common initial guess is x(0) = 0, giving r(0) = b – A x(0) = b
• Every solver involves the A matrix only in matrix-vector products: A^i r(0), i = 1, 2, …
6
Iterative Optimization Methods
• Directly constructing the Krylov subspace for any A and r(0) would be computationally expensive
• We will instead introduce iterative optimization methods for solving Ax = b, which turn out to be special cases of Krylov subspace methods
• Without loss of generality, consider the system Ax = b where A is symmetric (i.e., A = A^T) and positive definite (i.e., A ≻ 0, all eigenvalues positive)
• Any Ax = b with nonsingular A is equivalent to A^T A x = A^T b, where A^T A is symmetric and positive definite (a short numerical check follows this slide)
7
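A small numerical check of the normal-equations equivalence stated above; the system values here are arbitrary and illustrative, not from the slides.

```python
import numpy as np

# A nonsymmetric but nonsingular system (illustrative values)
A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
b = np.array([5.0, 6.0])

M = A.T @ A                      # symmetric: M == M.T
c = A.T @ b                      # right-hand side of the normal equations

x_direct = np.linalg.solve(A, b)
x_normal = np.linalg.solve(M, c)
print(np.allclose(x_direct, x_normal))      # True: same solution
print(np.all(np.linalg.eigvalsh(M) > 0))    # True: A^T A is positive definite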
Optimization Problem
• Consider the convex problem of minimizing
  $$f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} - \mathbf{b}^T\mathbf{x}$$
• The optimal x* that minimizes f(x) is given by the solution of
  $$\nabla_{\mathbf{x}} f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b} = \mathbf{0}$$
  which is exactly the solution to Ax = b (a quick numerical check follows this slide)
• The classical method for convex optimization is the steepest descent scheme
8
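A quick check (illustrative SPD values of my choosing) that the gradient of f vanishes exactly at the solution of Ax = b, so the minimizer of f is that solution.

```python
import numpy as np

# A small SPD example (illustrative values)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

f    = lambda x: 0.5 * x @ A @ x - b @ x   # f(x) = 1/2 x'Ax - b'x
grad = lambda x: A @ x - b                 # gradient of f

x_star = np.linalg.solve(A, b)             # solution of Ax = b
print(np.allclose(grad(x_star), 0.0))      # True: gradient vanishes at x*
print(f(x_star) < f(x_star + 0.1))         # True: nearby points have larger f
```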
Steepest Descent Algorithm
• Iteratively update x along the direction $-\nabla f(\mathbf{x}) = \mathbf{b} - \mathbf{A}\mathbf{x}$
• The stepsize is selected to minimize f(x) along $-\nabla f(\mathbf{x})$
• Set i = 0, ε > 0, x(0) = 0, so r(0) = b – A x(0) = b
• While ||r(i)|| ≥ ε Do
  (a) calculate
  $$\alpha^{(i)} = \frac{[\mathbf{r}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{r}^{(i)}]^T\mathbf{A}\,\mathbf{r}^{(i)}}$$
  (b) x(i+1) = x(i) + α(i) r(i)
  (c) r(i+1) = r(i) – α(i) A r(i)
  (d) i := i + 1
  End While
• Note there is only one matrix-vector multiply per iteration (a Python sketch follows this slide)
9
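A minimal NumPy sketch of the steepest descent algorithm on this slide; names and default tolerances are mine, and A is assumed symmetric positive definite.

```python
import numpy as np

def steepest_descent(A, b, eps=1e-8, max_iter=10_000):
    """Steepest descent for SPD A, following the steps on the slide."""
    x = np.zeros_like(b, dtype=float)     # x(0) = 0
    r = b - A @ x                         # r(0) = b
    i = 0
    while np.linalg.norm(r) >= eps and i < max_iter:
        Ar = A @ r                        # the one matrix-vector product
        alpha = (r @ r) / (r @ Ar)        # (a) stepsize
        x = x + alpha * r                 # (b) update iterate
        r = r - alpha * Ar                # (c) update residual
        i += 1                            # (d)
    return x, i
```

Reusing Ar for both the stepsize and the residual update keeps it to a single matrix-vector multiply per iteration, as noted on the slide.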
Steepest Descent Convergence
• We define the A-norm of x as
  $$\|\mathbf{x}\|_{\mathbf{A}}^2 = \mathbf{x}^T\mathbf{A}\mathbf{x}$$
• We can show exponential convergence, that is,
  $$\|\mathbf{x}^{(i)} - \mathbf{x}^*\|_{\mathbf{A}} \le \left(\frac{\kappa-1}{\kappa+1}\right)^{i}\|\mathbf{x}^{(0)} - \mathbf{x}^*\|_{\mathbf{A}}$$
  where κ is the condition number of A, i.e.,
  $$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$$
10
Steepest Descent Convergence
• Because (κ–1)/(κ+1) < 1, the error decreases with each steepest descent iteration, albeit potentially quite slowly for large κ
• The function value decreases more quickly, as per
  $$f(\mathbf{x}^{(i)}) - f(\mathbf{x}^*) \le \left(\frac{\kappa-1}{\kappa+1}\right)^{2i}\left(f(\mathbf{x}^{(0)}) - f(\mathbf{x}^*)\right)$$
  but this can still be quite slow if κ is large (a short script after this slide computes the contraction factor for the example system)
• The issue is that steepest descent often takes steps along the same direction as its earlier steps
11
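To see the contraction factor concretely, the snippet below (mine, not from the slides) computes κ and (κ–1)/(κ+1) for the 3×3 SPD system used in the worked example later in these slides.

```python
import numpy as np

# The 3x3 SPD matrix from the worked example later in these slides
A = np.array([[10.0,  5.0, -4.0],
              [ 5.0, 12.0,  6.0],
              [-4.0,  6.0, 10.0]])

lam = np.linalg.eigvalsh(A)                # eigenvalues of symmetric A
kappa = lam.max() / lam.min()              # condition number
rho = (kappa - 1.0) / (kappa + 1.0)        # per-iteration error contraction
print(kappa, rho, rho**2)                  # rho**2 governs the f(x) decay bound
```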
Conjugate Direction Methods
• An improvement over steepest descent is to take the exact number of steps using a set of search directions and obtain the solution after n such steps; this is the basic idea in the conjugate direction methods
• The image compares steepest descent with a conjugate direction approach
Image Source: http://en.wikipedia.org/wiki/File:Conjugate_gradient_illustration.svg
12
Conjugate Direction Methods
• The basic idea is that the n search directions, denoted by d(0), d(1), …, d(n–1), need to be A-orthogonal, that is,
  $$[\mathbf{d}^{(i)}]^T\mathbf{A}\,\mathbf{d}^{(j)} = 0, \quad \forall\, i \ne j,\quad i, j \in \{0, 1, \ldots, n-1\}$$
• At the ith iteration, we update
  $$\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}, \quad i = 0, 1, \ldots, n-1$$
13
Stepsize Selection
• The stepsize α(i) is chosen such that
  $$f(\mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}) = \min_{\alpha} f(\mathbf{x}^{(i)} + \alpha\,\mathbf{d}^{(i)})$$
• Setting the derivative to zero,
  $$0 = [\mathbf{d}^{(i)}]^T\,\nabla f(\mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}) = [\mathbf{d}^{(i)}]^T\left(\mathbf{A}(\mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}) - \mathbf{b}\right)$$
  which gives
  $$\alpha^{(i)} = \frac{[\mathbf{d}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{d}^{(i)}]^T\mathbf{A}\,\mathbf{d}^{(i)}}$$
14
Convergence Proof
• To prove the convergence of the conjugate direction method, we can show that
  $$\mathbf{x}^{(i+1)} = \arg\min_{\mathbf{x}\in M_i} f(\mathbf{x}), \quad \text{where } M_i = \{\mathbf{x}^{(0)} + \operatorname{span}\{\mathbf{d}^{(0)}, \ldots, \mathbf{d}^{(i)}\}\}$$
• This is exactly due to the A-orthogonality of the d(i)'s
• Suppose all of d(0), d(1), …, d(n–1) are linearly independent (l.i.); then span{d(0), …, d(n–1)} = R^n, so M_{n–1} = R^n
• Therefore, x(n) = arg min f(x) = x* is the optimum
15
Linearly Independent Directions
• Proposition: If A is positive definite and the nonzero vectors d(0), d(1), …, d(n–1) are A-orthogonal, then these vectors are linearly independent (l.i.)
• Proof: Suppose there are constants a_i, i = 0, 1, …, n–1, such that
  $$a_0\mathbf{d}^{(0)} + a_1\mathbf{d}^{(1)} + \cdots + a_{n-1}\mathbf{d}^{(n-1)} = \mathbf{0}$$
  (recall the vectors are l.i. only if this forces all a_i = 0)
  Multiplying by A and taking the scalar product with d(i) gives
  $$a_i\,[\mathbf{d}^{(i)}]^T\mathbf{A}\,\mathbf{d}^{(i)} = 0$$
  Since A is positive definite, it follows that a_i = 0
  Hence, the vectors are l.i.
16
Conjugate Direction Method
• Given the search direction d(i), the ith iteration is
  $$\mathbf{r}^{(i)} = \mathbf{b} - \mathbf{A}\mathbf{x}^{(i)}$$
  $$\alpha^{(i)} = \frac{[\mathbf{d}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{d}^{(i)}]^T\mathbf{A}\,\mathbf{d}^{(i)}}$$
  $$\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}$$
  $$\mathbf{r}^{(i+1)} = \mathbf{r}^{(i)} - \alpha^{(i)}\mathbf{A}\,\mathbf{d}^{(i)}$$
• What we have not yet covered is how to get the n search directions; we'll cover that shortly, but the next slide presents an algorithm, followed by an example (a sketch of this update appears after this slide)
17
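A minimal sketch of one conjugate direction iteration as written on this slide, assuming the search direction d is already available and A-orthogonal to the earlier ones; the function name is illustrative.

```python
import numpy as np

def conjugate_direction_step(A, b, x, d):
    """One iteration of the conjugate direction update for SPD A,
    given the current iterate x and search direction d."""
    r = b - A @ x                              # residual r(i)
    Ad = A @ d                                 # single matrix-vector product
    alpha = (d @ r) / (d @ Ad)                 # stepsize
    x_next = x + alpha * d                     # x(i+1)
    r_next = r - alpha * Ad                    # r(i+1)
    return x_next, r_next
```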
Orthogonalization
• To quickly generate A-orthogonal search directions, one can use the Gram-Schmidt orthogonalization procedure (a sketch follows this slide)
• Suppose we are given a l.i. set of n vectors {u0, u1, …, un–1}; successively construct d(j), j = 0, 1, 2, …, n–1, by removing from uj all the components along the directions d(j–1), d(j–2), …, d(0)
• The trick is to use the gradient directions, i.e., ui = r(i) for all i = 0, 1, …, n–1, which yields the very popular conjugate gradient method
18
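A sketch of Gram-Schmidt carried out in the A-inner product, following the description above; the function name and interface are mine.

```python
import numpy as np

def a_orthogonalize(A, U):
    """Gram-Schmidt in the A-inner product <x, y> = x' A y.
    U: list of linearly independent vectors u_0, ..., u_{n-1}.
    Returns A-orthogonal directions d_0, ..., d_{n-1}."""
    D = []
    for u in U:
        d = np.array(u, dtype=float)
        for dk in D:
            # remove from u its component along dk, measured in the A-inner product
            d -= (dk @ (A @ u)) / (dk @ (A @ dk)) * dk
        D.append(d)
    return D
```

With the u_i chosen as the residuals r(i), most of these projection terms vanish and only the most recent direction is needed, which is what the two-term recursion of the conjugate gradient method on the next slides exploits.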
Conjugate Gradient Method
• Set i = 0, ε > 0, x(0) = 0, so r(0) = b – A x(0) = b
• While ||r(i)|| ≥ ε Do
  (a) If i = 0 Then d(0) = r(0)
      Else Begin
      $$\beta^{(i)} = \frac{[\mathbf{r}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{r}^{(i-1)}]^T\mathbf{r}^{(i-1)}}$$
      d(i) = r(i) + β(i) d(i–1)
      End
19
Conjugate Gradient Algorithm
  (b) Update the stepsize
      $$\alpha^{(i)} = \frac{[\mathbf{d}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{d}^{(i)}]^T\mathbf{A}\,\mathbf{d}^{(i)}}$$
  (c) x(i+1) = x(i) + α(i) d(i)
  (d) r(i+1) = r(i) – α(i) A d(i)
  (e) i := i + 1
  End While
• Note that there is only one matrix-vector multiply per iteration! (A Python sketch of the full algorithm follows this slide.)
20
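A minimal NumPy sketch of the conjugate gradient algorithm of slides 19-20, assuming SPD A; the names, tolerance, and iteration cap are mine.

```python
import numpy as np

def conjugate_gradient(A, b, eps=1e-8, max_iter=None):
    """Conjugate gradient for SPD A, following slides 19-20."""
    x = np.zeros_like(b, dtype=float)        # x(0) = 0
    r = b - A @ x                            # r(0) = b
    d = r.copy()                             # d(0) = r(0)
    i = 0
    # at most n steps in exact arithmetic; rounding may need a few more
    max_iter = len(b) if max_iter is None else max_iter
    while np.linalg.norm(r) >= eps and i < max_iter:
        Ad = A @ d                           # the single matrix-vector product
        alpha = (d @ r) / (d @ Ad)           # (b) stepsize
        x = x + alpha * d                    # (c) iterate update
        r_new = r - alpha * Ad               # (d) residual update
        beta = (r_new @ r_new) / (r @ r)     # (a) coefficient for next direction
        d = r_new + beta * d                 # next search direction
        r = r_new
        i += 1
    return x, i
```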
Conjugate Gradient Example
• Using the same system as before, let
  $$\mathbf{A} = \begin{bmatrix} 10 & 5 & -4 \\ 5 & 12 & 6 \\ -4 & 6 & 10 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} 10 \\ 20 \\ 15 \end{bmatrix}$$
  We are solving for
  $$\mathbf{x}^* = \begin{bmatrix} 3.354 \\ -1.645 \\ 3.829 \end{bmatrix}$$
• Select i = 0, x(0) = 0, ε = 0.1; then r(0) = b
• With i = 0, d(0) = r(0) = b
21
Conjugate Gradient Example
$$\alpha^{(0)} = \frac{[\mathbf{d}^{(0)}]^T\mathbf{r}^{(0)}}{[\mathbf{d}^{(0)}]^T\mathbf{A}\,\mathbf{d}^{(0)}} = \frac{725}{12450} = 0.0582$$
$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha^{(0)}\mathbf{d}^{(0)} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + 0.0582\begin{bmatrix} 10 \\ 20 \\ 15 \end{bmatrix} = \begin{bmatrix} 0.582 \\ 1.165 \\ 0.873 \end{bmatrix}$$
$$\mathbf{r}^{(1)} = \mathbf{r}^{(0)} - \alpha^{(0)}\mathbf{A}\,\mathbf{d}^{(0)} = \begin{bmatrix} 10 \\ 20 \\ 15 \end{bmatrix} - 0.0582\begin{bmatrix} 10 & 5 & -4 \\ 5 & 12 & 6 \\ -4 & 6 & 10 \end{bmatrix}\begin{bmatrix} 10 \\ 20 \\ 15 \end{bmatrix} = \begin{bmatrix} 1.847 \\ -2.129 \\ 1.606 \end{bmatrix}$$
$$i = i + 1 = 1$$
This first step exactly matches steepest descent
22
Conjugate Gradient Example
• With i = 1, solve for β(1):
  $$\beta^{(1)} = \frac{[\mathbf{r}^{(1)}]^T\mathbf{r}^{(1)}}{[\mathbf{r}^{(0)}]^T\mathbf{r}^{(0)}} = \frac{10.524}{725} = 0.01452$$
  $$\mathbf{d}^{(1)} = \mathbf{r}^{(1)} + \beta^{(1)}\mathbf{d}^{(0)} = \begin{bmatrix} 1.847 \\ -2.129 \\ 1.606 \end{bmatrix} + 0.01452\begin{bmatrix} 10 \\ 20 \\ 15 \end{bmatrix} = \begin{bmatrix} 1.992 \\ -1.838 \\ 1.824 \end{bmatrix}$$
• Then
  $$\alpha^{(1)} = \frac{[\mathbf{d}^{(1)}]^T\mathbf{r}^{(1)}}{[\mathbf{d}^{(1)}]^T\mathbf{A}\,\mathbf{d}^{(1)}} = 1.388$$
23
Conjugate Gradient Example
• And
  $$\mathbf{x}^{(2)} = \mathbf{x}^{(1)} + \alpha^{(1)}\mathbf{d}^{(1)} = \begin{bmatrix} 0.582 \\ 1.165 \\ 0.873 \end{bmatrix} + 1.388\begin{bmatrix} 1.992 \\ -1.838 \\ 1.824 \end{bmatrix} = \begin{bmatrix} 3.348 \\ -1.386 \\ 3.405 \end{bmatrix}$$
  $$\mathbf{r}^{(2)} = \mathbf{r}^{(1)} - \alpha^{(1)}\mathbf{A}\,\mathbf{d}^{(1)} = \begin{bmatrix} 1.847 \\ -2.129 \\ 1.606 \end{bmatrix} - 1.388\begin{bmatrix} 10 & 5 & -4 \\ 5 & 12 & 6 \\ -4 & 6 & 10 \end{bmatrix}\begin{bmatrix} 1.992 \\ -1.838 \\ 1.824 \end{bmatrix} = \begin{bmatrix} -2.923 \\ -0.532 \\ 2.658 \end{bmatrix}$$
  $$i = 1 + 1 = 2$$
24
Conjugate Gradient Example
• With i = 2, solve for β(2):
  $$\beta^{(2)} = \frac{[\mathbf{r}^{(2)}]^T\mathbf{r}^{(2)}}{[\mathbf{r}^{(1)}]^T\mathbf{r}^{(1)}} = \frac{15.897}{10.524} = 1.511$$
  $$\mathbf{d}^{(2)} = \mathbf{r}^{(2)} + \beta^{(2)}\mathbf{d}^{(1)} = \begin{bmatrix} -2.923 \\ -0.532 \\ 2.658 \end{bmatrix} + 1.511\begin{bmatrix} 1.992 \\ -1.838 \\ 1.824 \end{bmatrix} = \begin{bmatrix} 0.086 \\ -3.308 \\ 5.413 \end{bmatrix}$$
• Then
  $$\alpha^{(2)} = \frac{[\mathbf{d}^{(2)}]^T\mathbf{r}^{(2)}}{[\mathbf{d}^{(2)}]^T\mathbf{A}\,\mathbf{d}^{(2)}} = 0.078$$
25
Conjugate Gradient Example
• And
  $$\mathbf{x}^{(3)} = \mathbf{x}^{(2)} + \alpha^{(2)}\mathbf{d}^{(2)} = \begin{bmatrix} 3.348 \\ -1.386 \\ 3.405 \end{bmatrix} + 0.078\begin{bmatrix} 0.086 \\ -3.308 \\ 5.413 \end{bmatrix} = \begin{bmatrix} 3.354 \\ -1.646 \\ 3.829 \end{bmatrix}$$
  $$\mathbf{r}^{(3)} = \mathbf{r}^{(2)} - \alpha^{(2)}\mathbf{A}\,\mathbf{d}^{(2)} = \begin{bmatrix} -2.923 \\ -0.532 \\ 2.658 \end{bmatrix} - 0.078\begin{bmatrix} 10 & 5 & -4 \\ 5 & 12 & 6 \\ -4 & 6 & 10 \end{bmatrix}\begin{bmatrix} 0.086 \\ -3.308 \\ 5.413 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
  $$i = 2 + 1 = 3$$
• Done in 3 = n iterations! (A short script reproducing this example follows this slide.)
26
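A short script (mine, not from the slides) that reproduces the example, using the 3×3 system with the off-diagonal signs as reconstructed above; SciPy's built-in conjugate gradient routine is compared against a direct solve.

```python
import numpy as np
from scipy.sparse.linalg import cg

# 3x3 SPD system from the example
A = np.array([[10.0,  5.0, -4.0],
              [ 5.0, 12.0,  6.0],
              [-4.0,  6.0, 10.0]])
b = np.array([10.0, 20.0, 15.0])

x_direct = np.linalg.solve(A, b)   # approx [ 3.354, -1.646,  3.829]
x_cg, info = cg(A, b)              # info == 0 indicates convergence
print(x_direct)
print(x_cg, info)
```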
Krylov Subspace Method
• Recall that in the ith iteration of the generic Krylov solver, we want to find x(i) in {x(0) + Ki(r(0), A)} that minimizes ||r(i)|| = ||b – A x(i)||
• In conjugate gradient, the iterate x(i) actually minimizes
  $$f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} - \mathbf{b}^T\mathbf{x}$$
  over the linear manifold {x(0) + Ki(r(0), A)}
• With positive definite A, both methods attain
  $$\mathbf{x}^{(n)} = \mathbf{x}^* = \mathbf{A}^{-1}\mathbf{b}$$
• For a general invertible A, we have to use the Generalized Minimum Residual method (GMRES); a usage sketch follows this slide
27
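A usage sketch of SciPy's GMRES implementation on an arbitrary illustrative nonsymmetric system; the matrix values are mine, not from the slides.

```python
import numpy as np
from scipy.sparse.linalg import gmres

# An arbitrary nonsymmetric, nonsingular system (illustrative values)
A = np.array([[3.0, 2.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 2.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

x, info = gmres(A, b)                      # info == 0 on successful convergence
print(info, np.linalg.norm(b - A @ x))     # residual norm should be small
```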
References
• D. P. Bertsekas, Nonlinear Programming, 2nd Edition, Athena Scientific, 1999, Chapter 1
• Y. Saad, Iterative Methods for Sparse Linear Systems, 2002, available free online at www.users.cs.umn.edu/~saad/IterMethBook_2ndEd.pdf
28