ECE 530 – Analysis Techniques for Large-Scale Electrical Systems
Lecture 25: Krylov Subspace Methods
Prof. Hao Zhu
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
haozhu@illinois.edu
12/2/2014

Announcements
• No class on Thursday Dec 4
• Homework 8 posted, due on Thursday Dec 11

Krylov Subspace Outline
• Review of fields and vector spaces
• Eigensystem basics
• Definition of Krylov subspaces and annihilating polynomial
• Generic Krylov subspace solver
• Steepest descent
• Conjugate gradient

Krylov Subspace
• Iterative methods to solve $\mathbf{A}\mathbf{x}=\mathbf{b}$ build on the idea that, with $a_0 + a_1\lambda + \cdots + a_m\lambda^m$ the annihilating (minimal) polynomial of $\mathbf{A}$,
  $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b} = -\frac{1}{a_0}\sum_{j=0}^{m-1} a_{j+1}\,\mathbf{A}^{j}\mathbf{b}$
• Given a matrix $\mathbf{A}$ and a vector $\mathbf{v}$, the $i$th-order Krylov subspace is defined as
  $\mathbf{K}_i(\mathbf{v},\mathbf{A}) = \operatorname{span}\{\mathbf{v}, \mathbf{A}\mathbf{v}, \mathbf{A}^2\mathbf{v}, \ldots, \mathbf{A}^{i-1}\mathbf{v}\}$
• For a specified matrix $\mathbf{A}$ and vector $\mathbf{v}$, the largest value of $i$ for which the subspace keeps growing is bounded (by the degree of the annihilating polynomial)

Generic Krylov Subspace Solver
• The following is a generic Krylov subspace method for solving $\mathbf{A}\mathbf{x}=\mathbf{b}$ using only matrix-vector multiplies
• Step 1: Start with an initial guess $\mathbf{x}^{(0)}$ and some predefined error tolerance $\epsilon > 0$; compute the residual $\mathbf{r}^{(0)} = \mathbf{b} - \mathbf{A}\mathbf{x}^{(0)}$; set $i = 0$
• Step 2: While $\|\mathbf{r}^{(i)}\| \ge \epsilon$ Do
  (a) $i := i + 1$
  (b) get $\mathbf{K}_i(\mathbf{r}^{(0)},\mathbf{A})$
  (c) find $\mathbf{x}^{(i)}$ in $\{\mathbf{x}^{(0)} + \mathbf{K}_i(\mathbf{r}^{(0)},\mathbf{A})\}$ to minimize $\|\mathbf{r}^{(i)}\|$
  Stop

Krylov Subspace Solver
• Note that no calculations are performed in Step 2 once $i$ becomes greater than its largest value
• The Krylov subspace methods differ from each other in
  – the construction scheme for the Krylov subspace in Step 2(b)
  – the residual minimization criterion used in Step 2(c)
• A common initial guess is $\mathbf{x}^{(0)} = \mathbf{0}$, giving $\mathbf{r}^{(0)} = \mathbf{b} - \mathbf{A}\mathbf{x}^{(0)} = \mathbf{b}$
• Every solver involves the $\mathbf{A}$ matrix only in matrix-vector products: $\mathbf{A}^i\mathbf{r}^{(0)}$, $i = 1, 2, \ldots$

Iterative Optimization Methods
• Directly constructing the Krylov subspace for an arbitrary $\mathbf{A}$ and $\mathbf{r}^{(0)}$ would be computationally expensive
• We will instead introduce iterative optimization methods for solving $\mathbf{A}\mathbf{x}=\mathbf{b}$, which turn out to be special cases of Krylov subspace methods
• Without loss of generality, consider the system $\mathbf{A}\mathbf{x}=\mathbf{b}$ where $\mathbf{A}$ is symmetric (i.e., $\mathbf{A} = \mathbf{A}^T$) and positive definite (i.e., $\mathbf{A} \succ 0$, all eigenvalues positive)
• Any $\mathbf{A}\mathbf{x}=\mathbf{b}$ with nonsingular $\mathbf{A}$ is equivalent to $\mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$, where $\mathbf{A}^T\mathbf{A}$ is symmetric and positive definite

Optimization Problem
• Consider the convex problem
  $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} - \mathbf{b}^T\mathbf{x}$
• The optimal $\mathbf{x}^*$ that minimizes $f(\mathbf{x})$ is given by the solution of
  $\nabla_{\mathbf{x}} f = \mathbf{A}\mathbf{x} - \mathbf{b} = \mathbf{0}$,
  which is exactly the solution to $\mathbf{A}\mathbf{x}=\mathbf{b}$
• The classical method for convex optimization entails the application of the steepest descent scheme

Steepest Descent Algorithm
• Iteratively update $\mathbf{x}$ along the direction $-\nabla f(\mathbf{x}) = \mathbf{b} - \mathbf{A}\mathbf{x}$
• The stepsize is selected to minimize $f(\mathbf{x})$ along $-\nabla f(\mathbf{x})$
• Set $i = 0$, $\epsilon > 0$, $\mathbf{x}^{(0)} = \mathbf{0}$, so $\mathbf{r}^{(0)} = \mathbf{b} - \mathbf{A}\mathbf{x}^{(0)} = \mathbf{b}$
• While $\|\mathbf{r}^{(i)}\| \ge \epsilon$ Do
  (a) calculate $\alpha^{(i)} = \dfrac{[\mathbf{r}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{r}^{(i)}]^T\mathbf{A}\mathbf{r}^{(i)}}$
  (b) $\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{r}^{(i)}$
  (c) $\mathbf{r}^{(i+1)} = \mathbf{r}^{(i)} - \alpha^{(i)}\mathbf{A}\mathbf{r}^{(i)}$
  (d) $i := i + 1$
  End While
• Note there is only one matrix-vector multiply per iteration (a code sketch of this loop follows the convergence discussion below)

Steepest Descent Convergence
• We define the A-norm of $\mathbf{x}$ by $\|\mathbf{x}\|_{\mathbf{A}}^2 = \mathbf{x}^T\mathbf{A}\mathbf{x}$
• We can show exponential convergence, that is,
  $\|\mathbf{x}^{(i)} - \mathbf{x}^*\|_{\mathbf{A}} \le \left(\dfrac{\kappa-1}{\kappa+1}\right)^{i} \|\mathbf{x}^{(0)} - \mathbf{x}^*\|_{\mathbf{A}}$
  where $\kappa = \lambda_{\max}(\mathbf{A})/\lambda_{\min}(\mathbf{A})$ is the condition number of $\mathbf{A}$

Steepest Descent Convergence
• Because $(\kappa-1)/(\kappa+1) < 1$, the error decreases with each steepest descent iteration, albeit potentially quite slowly for large $\kappa$
• The function value decreases more quickly, as per
  $\dfrac{f(\mathbf{x}^{(i)}) - f(\mathbf{x}^*)}{f(\mathbf{x}^{(0)}) - f(\mathbf{x}^*)} \le \left(\dfrac{\kappa-1}{\kappa+1}\right)^{2i}$,
  but this can still be quite slow if $\kappa$ is large
• The issue is that steepest descent often finds itself taking steps along the same direction as its earlier steps
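The steepest descent loop above translates directly into a few lines of NumPy. The following is a minimal sketch, assuming A is a symmetric positive definite NumPy array and b a 1-D array; the function name, tolerance, and iteration cap are illustrative choices, not part of the lecture.

```python
import numpy as np

def steepest_descent(A, b, tol=1e-8, max_iter=10000):
    """Solve A x = b for symmetric positive definite A by steepest descent."""
    x = np.zeros_like(b, dtype=float)   # x(0) = 0, so r(0) = b
    r = b - A @ x
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:     # stop once the residual is small enough
            break
        Ar = A @ r                      # the single matrix-vector product per iteration
        alpha = (r @ r) / (r @ Ar)      # exact line-search stepsize along r = -grad f
        x = x + alpha * r
        r = r - alpha * Ar              # update residual without recomputing b - A x
    return x
```

For ill-conditioned A this loop can need many iterations, which motivates the conjugate direction methods that follow.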
Conjugate Direction Methods
• An improvement over steepest descent is to take the exact number of steps using a set of search directions and obtain the solution after $n$ such steps; this is the basic idea in the conjugate direction methods
• [Figure: comparison of steepest descent with a conjugate direction approach]
  Image source: http://en.wikipedia.org/wiki/File:Conjugate_gradient_illustration.svg

Conjugate Direction Methods
• The basic idea is that the $n$ search directions, denoted by $\mathbf{d}^{(0)}, \mathbf{d}^{(1)}, \ldots, \mathbf{d}^{(n-1)}$, need to be A-orthogonal, that is,
  $[\mathbf{d}^{(i)}]^T\mathbf{A}\mathbf{d}^{(j)} = 0, \quad i \ne j, \quad i, j = 0, 1, \ldots, n-1$
• At the $i$th iteration, we update
  $\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}, \quad i = 0, 1, \ldots, n-1$

Stepsize Selection
• The stepsize $\alpha^{(i)}$ is chosen such that
  $f(\mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}) = \min_{\alpha} f(\mathbf{x}^{(i)} + \alpha\,\mathbf{d}^{(i)})$
• Setting the derivative to zero,
  $0 = [\mathbf{d}^{(i)}]^T\nabla f(\mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}) = [\mathbf{d}^{(i)}]^T\big(\mathbf{A}(\mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}) - \mathbf{b}\big)$,
  which gives $\alpha^{(i)} = \dfrac{[\mathbf{d}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{d}^{(i)}]^T\mathbf{A}\mathbf{d}^{(i)}}$ with $\mathbf{r}^{(i)} = \mathbf{b} - \mathbf{A}\mathbf{x}^{(i)}$

Convergence Proof
• To prove the convergence of the conjugate direction method, we can show that
  $\mathbf{x}^{(i+1)} = \arg\min_{\mathbf{x}\in M_i} f(\mathbf{x})$, where $M_i = \{\mathbf{x}^{(0)} + \operatorname{span}\{\mathbf{d}^{(0)}, \ldots, \mathbf{d}^{(i)}\}\}$
• This is exactly due to the A-orthogonality of the $\mathbf{d}^{(i)}$'s
• Suppose all of $\mathbf{d}^{(0)}, \mathbf{d}^{(1)}, \ldots, \mathbf{d}^{(n-1)}$ are linearly independent (l.i.); then
  $\operatorname{span}\{\mathbf{d}^{(0)}, \ldots, \mathbf{d}^{(n-1)}\} = \mathbb{R}^n$, so $M_{n-1} = \mathbb{R}^n$
• Therefore, $\mathbf{x}^{(n)} = \arg\min_{\mathbf{x}} f(\mathbf{x}) = \mathbf{x}^*$ is the optimum

Linearly Independent Directions
• Proposition: If $\mathbf{A}$ is positive definite and the nonzero vectors $\mathbf{d}^{(0)}, \mathbf{d}^{(1)}, \ldots, \mathbf{d}^{(n-1)}$ are A-orthogonal, then these vectors are linearly independent (l.i.)
• Proof: Suppose there are constants $a_i$, $i = 0, 1, \ldots, n-1$, such that
  $a_0\mathbf{d}^{(0)} + a_1\mathbf{d}^{(1)} + \cdots + a_{n-1}\mathbf{d}^{(n-1)} = \mathbf{0}$
  Recall the vectors are l.i. only if this forces all $a_i = 0$. Multiplying by $\mathbf{A}$ and then taking the scalar product with $\mathbf{d}^{(i)}$ gives
  $a_i\,[\mathbf{d}^{(i)}]^T\mathbf{A}\mathbf{d}^{(i)} = 0$
  Since $\mathbf{A}$ is positive definite, it follows that $a_i = 0$. Hence, the vectors are l.i.

Conjugate Direction Method
• Given the search direction $\mathbf{d}^{(i)}$, the $i$th iteration is
  $\mathbf{r}^{(i)} = \mathbf{b} - \mathbf{A}\mathbf{x}^{(i)}$
  $\alpha^{(i)} = \dfrac{[\mathbf{d}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{d}^{(i)}]^T\mathbf{A}\mathbf{d}^{(i)}}$
  $\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}$
  $\mathbf{r}^{(i+1)} = \mathbf{r}^{(i)} - \alpha^{(i)}\mathbf{A}\mathbf{d}^{(i)}$
• What we have not yet covered is how to get the $n$ search directions. We'll cover that shortly, but the next slide presents an algorithm, followed by an example.

Orthogonalization
• To quickly generate A-orthogonal search directions, one can use the Gram-Schmidt orthogonalization procedure
• Suppose we are given a l.i. set of $n$ vectors $\{\mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{n-1}\}$; successively construct $\mathbf{d}^{(j)}$, $j = 0, 1, \ldots, n-1$, by removing from $\mathbf{u}_j$ all the components along the directions $\mathbf{d}^{(j-1)}, \mathbf{d}^{(j-2)}, \ldots, \mathbf{d}^{(0)}$
• The trick is to use the gradient directions, i.e., $\mathbf{u}_i = \mathbf{r}^{(i)}$ for all $i = 0, 1, \ldots, n-1$, which yields the very popular conjugate gradient method

Conjugate Gradient Method
• Set $i = 0$, $\epsilon > 0$, $\mathbf{x}^{(0)} = \mathbf{0}$, so $\mathbf{r}^{(0)} = \mathbf{b} - \mathbf{A}\mathbf{x}^{(0)} = \mathbf{b}$
• While $\|\mathbf{r}^{(i)}\| \ge \epsilon$ Do
  (a) If $i = 0$ Then $\mathbf{d}^{(0)} = \mathbf{r}^{(0)}$
      Else
        $\beta^{(i)} = \dfrac{[\mathbf{r}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{r}^{(i-1)}]^T\mathbf{r}^{(i-1)}}$
        $\mathbf{d}^{(i)} = \mathbf{r}^{(i)} + \beta^{(i)}\mathbf{d}^{(i-1)}$
      End

Conjugate Gradient Algorithm
  (b) Update the stepsize $\alpha^{(i)} = \dfrac{[\mathbf{d}^{(i)}]^T\mathbf{r}^{(i)}}{[\mathbf{d}^{(i)}]^T\mathbf{A}\mathbf{d}^{(i)}}$
  (c) $\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} + \alpha^{(i)}\mathbf{d}^{(i)}$
  (d) $\mathbf{r}^{(i+1)} = \mathbf{r}^{(i)} - \alpha^{(i)}\mathbf{A}\mathbf{d}^{(i)}$
  (e) $i := i + 1$
  End While
• Note that there is only one matrix-vector multiply per iteration! (See the code sketch below.)
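Steps (a)–(e) above combine into the short routine below: a minimal NumPy sketch under the same assumptions as before (symmetric positive definite A), with an illustrative function name and tolerance. In exact arithmetic the loop terminates in at most n iterations, which the worked example that follows illustrates for n = 3.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8):
    """Solve A x = b for symmetric positive definite A by conjugate gradient."""
    n = len(b)
    x = np.zeros_like(b, dtype=float)    # x(0) = 0, so r(0) = b
    r = b - A @ x
    d = r.copy()                         # d(0) = r(0)
    rs_old = r @ r
    for _ in range(n):                   # at most n steps in exact arithmetic
        if np.sqrt(rs_old) < tol:
            break
        Ad = A @ d                       # the single matrix-vector product per iteration
        alpha = (d @ r) / (d @ Ad)       # step (b): exact line-search stepsize
        x = x + alpha * d                # step (c)
        r = r - alpha * Ad               # step (d)
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d    # step (a) for the next pass: beta = rs_new / rs_old
        rs_old = rs_new
    return x
```

Only the residual norms and one matrix-vector product are carried between iterations, which is what makes the method attractive for large sparse systems.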
Conjugate Gradient Example
• Using the same system as before, let
  $\mathbf{A} = \begin{bmatrix} 10 & -5 & -4 \\ -5 & 12 & -6 \\ -4 & -6 & 10 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 10 \\ -20 \\ 15 \end{bmatrix}$
  We are solving for $\mathbf{x} \approx [3.354,\ 1.646,\ 3.829]^T$
• Select $i = 0$, $\mathbf{x}^{(0)} = \mathbf{0}$, $\epsilon = 0.1$; then $\mathbf{r}^{(0)} = \mathbf{b}$
• With $i = 0$, $\mathbf{d}^{(0)} = \mathbf{r}^{(0)} = \mathbf{b}$

Conjugate Gradient Example
• $\alpha^{(0)} = \dfrac{[\mathbf{d}^{(0)}]^T\mathbf{r}^{(0)}}{[\mathbf{d}^{(0)}]^T\mathbf{A}\mathbf{d}^{(0)}} = \dfrac{725}{12450} = 0.0582$
• $\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha^{(0)}\mathbf{d}^{(0)} = \mathbf{0} + 0.0582\,[10,\ -20,\ 15]^T = [0.582,\ -1.165,\ 0.873]^T$
• $\mathbf{r}^{(1)} = \mathbf{r}^{(0)} - \alpha^{(0)}\mathbf{A}\mathbf{d}^{(0)} = [1.847,\ 2.129,\ 1.606]^T$
• $i := i + 1 = 1$
• This first step exactly matches steepest descent

Conjugate Gradient Example
• With $i = 1$, solve for $\beta^{(1)}$:
  $\beta^{(1)} = \dfrac{[\mathbf{r}^{(1)}]^T\mathbf{r}^{(1)}}{[\mathbf{r}^{(0)}]^T\mathbf{r}^{(0)}} = \dfrac{10.524}{725} = 0.01452$
• $\mathbf{d}^{(1)} = \mathbf{r}^{(1)} + \beta^{(1)}\mathbf{d}^{(0)} = [1.847,\ 2.129,\ 1.606]^T + 0.01452\,[10,\ -20,\ 15]^T = [1.992,\ 1.838,\ 1.824]^T$
• Then $\alpha^{(1)} = \dfrac{[\mathbf{d}^{(1)}]^T\mathbf{r}^{(1)}}{[\mathbf{d}^{(1)}]^T\mathbf{A}\mathbf{d}^{(1)}} = 1.388$

Conjugate Gradient Example
• $\mathbf{x}^{(2)} = \mathbf{x}^{(1)} + \alpha^{(1)}\mathbf{d}^{(1)} = [0.582,\ -1.165,\ 0.873]^T + 1.388\,[1.992,\ 1.838,\ 1.824]^T = [3.348,\ 1.386,\ 3.405]^T$
• $\mathbf{r}^{(2)} = \mathbf{r}^{(1)} - \alpha^{(1)}\mathbf{A}\mathbf{d}^{(1)} = [-2.923,\ 0.532,\ 2.658]^T$
• $i := i + 1 = 2$

Conjugate Gradient Example
• With $i = 2$, solve for $\beta^{(2)}$:
  $\beta^{(2)} = \dfrac{[\mathbf{r}^{(2)}]^T\mathbf{r}^{(2)}}{[\mathbf{r}^{(1)}]^T\mathbf{r}^{(1)}} = \dfrac{15.897}{10.524} = 1.511$
• $\mathbf{d}^{(2)} = \mathbf{r}^{(2)} + \beta^{(2)}\mathbf{d}^{(1)} = [-2.923,\ 0.532,\ 2.658]^T + 1.511\,[1.992,\ 1.838,\ 1.824]^T = [0.086,\ 3.308,\ 5.413]^T$
• Then $\alpha^{(2)} = \dfrac{[\mathbf{d}^{(2)}]^T\mathbf{r}^{(2)}}{[\mathbf{d}^{(2)}]^T\mathbf{A}\mathbf{d}^{(2)}} = 0.078$

Conjugate Gradient Example
• $\mathbf{x}^{(3)} = \mathbf{x}^{(2)} + \alpha^{(2)}\mathbf{d}^{(2)} = [3.348,\ 1.386,\ 3.405]^T + 0.078\,[0.086,\ 3.308,\ 5.413]^T = [3.354,\ 1.646,\ 3.829]^T$
• $\mathbf{r}^{(3)} = \mathbf{r}^{(2)} - \alpha^{(2)}\mathbf{A}\mathbf{d}^{(2)} = [0,\ 0,\ 0]^T$
• $i := i + 1 = 3$
• Done in $3 = n$ iterations!

Krylov Subspace Method
• Recall that in the $i$th iteration of the generic Krylov solver, we want to find $\mathbf{x}^{(i)}$ in $\{\mathbf{x}^{(0)} + \mathbf{K}_i(\mathbf{r}^{(0)},\mathbf{A})\}$ that minimizes $\|\mathbf{r}^{(i)}\| = \|\mathbf{b} - \mathbf{A}\mathbf{x}^{(i)}\|$
• In conjugate gradient, the iterate $\mathbf{x}^{(i)}$ actually minimizes
  $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} - \mathbf{b}^T\mathbf{x}$
  over the linear manifold $\{\mathbf{x}^{(0)} + \mathbf{K}_i(\mathbf{r}^{(0)},\mathbf{A})\}$
• With positive definite $\mathbf{A}$, both methods attain $\mathbf{x}^{(n)} = \mathbf{x}^* = \mathbf{A}^{-1}\mathbf{b}$
• For a general invertible $\mathbf{A}$, we have to use the Generalized Minimum Residual method (GMRES)

References
• D. P. Bertsekas, Nonlinear Programming, 2nd Edition, Chapter 1, Athena Scientific, 1999
• Y. Saad, Iterative Methods for Sparse Linear Systems, 2002, free online at www.users.cs.umn.edu/~saad/IterMethBook_2ndEd.pdf
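As a quick numerical check of the worked example above, the short script below (illustrative only) confirms that the example matrix is positive definite and that the directly computed solution matches the conjugate gradient result.

```python
import numpy as np

# Example system from the conjugate gradient slides above.
A = np.array([[10., -5., -4.],
              [-5., 12., -6.],
              [-4., -6., 10.]])
b = np.array([10., -20., 15.])

print(np.all(np.linalg.eigvalsh(A) > 0))  # True: A is symmetric positive definite
print(np.linalg.solve(A, b))              # approx. [3.354, 1.646, 3.829]
```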