Steepest Descent and Conjugate Gradient Methods

MATH 450, October 6, 2008

Steepest Descent Method

Recall the minimization of the function p(x) = ⟨x, Ax⟩ − 2⟨x, b⟩: stepping from x along a direction y with the optimal step size t̂ = ⟨y, b − Ax⟩ / ⟨y, Ay⟩ gives

    p(x + t̂y) = p(x) − ⟨y, b − Ax⟩² / ⟨y, Ay⟩.

Thus we can construct an iterative method with step size

    t^(k) = ⟨y^(k), b − Ax^(k)⟩ / ⟨y^(k), Ay^(k)⟩

along the direction y^(k) at x^(k), i.e.,

    x^(k+1) = x^(k) + t^(k) · y^(k).

For the steepest descent method we choose y^(k) = b − Ax^(k), which is the steepest descent direction, i.e., the negative gradient of p(x) at x^(k). This is the residual vector, and we can show that ⟨y^(k), e^(k)⟩ ≥ 0 for positive definite A, with equality only when Ax^(k) = b, where e^(k) denotes the error x − x^(k) (see Problem 4 of this section). A NumPy sketch of this iteration is given at the end of these notes.

Steepest Descent Method

What is the number of iterations?

General Questions for Non-stationary Iterative Methods

1. What is the direction to step from x^(k) to x^(k+1)?
2. What is the step size?
3. How many iterations are needed?

A-orthonormal System

For a set of nonzero vectors {y_k}:

1. Orthogonal: ⟨y_i, y_j⟩ = 0 if i ≠ j.
2. Orthonormal: ⟨y_i, y_j⟩ = δ_ij.
3. A-orthonormal: ⟨y_i, Ay_j⟩ = δ_ij, for A symmetric and positive definite.

A-orthonormal System

Let {u^(1), u^(2), ..., u^(n)} be an A-orthonormal system. Define

    x^(k) = x^(k−1) + ⟨b − Ax^(k−1), u^(k)⟩ u^(k),   (1 ≤ k ≤ n),

in which x^(0) is an arbitrary point of ℝⁿ. Then Ax^(n) = b. (A numerical check of this result is given at the end of these notes.)

Proof. Note the step size is t^(k) = ⟨b − Ax^(k−1), u^(k)⟩ and the iteration is x^(k) = x^(k−1) + t^(k) u^(k), and thus

    Ax^(k) = Ax^(k−1) + t^(k) Au^(k).   (1)

But then Ax^(k−1) = Ax^(k−2) + t^(k−1) Au^(k−1), and so on. So

    Ax^(n) = Ax^(0) + t^(1) Au^(1) + t^(2) Au^(2) + ··· + t^(n) Au^(n).

Taking the inner product of this vector with any u^(k), 1 ≤ k ≤ n, and using A-orthonormality, we get

    ⟨Ax^(n), u^(k)⟩ = ⟨Ax^(0), u^(k)⟩ + t^(k),

or

    ⟨Ax^(n) − b, u^(k)⟩ = ⟨Ax^(0) − b, u^(k)⟩ + t^(k).

In order to show that ⟨Ax^(n) − b, u^(k)⟩ = 0 we need to show the right-hand side is 0. By definition,

    t^(k) = ⟨b − Ax^(k−1), u^(k)⟩
          = ⟨b − Ax^(0), u^(k)⟩ + ⟨Ax^(0) − Ax^(1), u^(k)⟩ + ··· + ⟨Ax^(k−2) − Ax^(k−1), u^(k)⟩
          = ⟨b − Ax^(0), u^(k)⟩ + ⟨−t^(1) Au^(1), u^(k)⟩ + ··· + ⟨−t^(k−1) Au^(k−1), u^(k)⟩   (use Eq. (1))
          = ⟨b − Ax^(0), u^(k)⟩,

where the last equality uses ⟨Au^(j), u^(k)⟩ = 0 for j ≠ k. Thus ⟨Ax^(n) − b, u^(k)⟩ = 0 for every u^(k); since the u^(k) form a basis of ℝⁿ, the vector Ax^(n) − b must be 0, i.e., Ax^(n) = b. ∎

Normalization is needed if {u^(i)} is an A-orthogonal system rather than A-orthonormal; see Thm 2 on page 237.

How to produce an orthonormal system → the Conjugate Gradient Method

    Input x^(0), A, M, b, ε
    r^(0) ← b − Ax^(0)
    u^(0) ← r^(0)
    for k = 0, 1, ..., M − 1:
        if u^(k) = 0 then stop
        t^(k) ← ⟨r^(k), r^(k)⟩ / ⟨u^(k), Au^(k)⟩
        x^(k+1) ← x^(k) + t^(k) u^(k)
        r^(k+1) ← r^(k) − t^(k) Au^(k)
        if ‖r^(k+1)‖ ≤ ε then stop
        s^(k) ← ⟨r^(k+1), r^(k+1)⟩ / ⟨r^(k), r^(k)⟩
        u^(k+1) ← r^(k+1) + s^(k) u^(k)

Major idea: use the residual vectors to generate a set of orthogonal vectors. (A NumPy transcription of this pseudocode is given at the end of these notes.)

How to produce an orthonormal system → the Conjugate Gradient Method

• Prove that {u^(k)} is an A-orthogonal set.
• Prove that {r^(k)} is an orthogonal set.
• Prove that r^(i) = b − Ax^(i).

Please see Thm 3 of Section 4.7 for the full proof.
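
Code Sketches (Python/NumPy)

The following is a minimal NumPy sketch of the steepest descent iteration above, assuming A is symmetric positive definite; the function name steepest_descent and the parameters tol and max_iter are illustrative choices, not from the notes.

    import numpy as np

    def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
        """Minimize p(x) = <x, Ax> - 2<x, b> by stepping along the residual.

        y = b - Ax is the negative gradient of p at x, and
        t = <y, y> / <y, Ay> is the exact line-search step size.
        (Names and tolerances are illustrative, not from the notes.)
        """
        x = np.array(x0, dtype=float)
        for _ in range(max_iter):
            y = b - A @ x                    # residual = steepest descent direction
            if np.linalg.norm(y) <= tol:     # Ax is close enough to b
                break
            t = (y @ y) / (y @ (A @ y))      # optimal step size along y
            x = x + t * y
        return x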
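
The theorem on A-orthonormal systems can be checked numerically: build an A-orthonormal basis by Gram-Schmidt in the A-inner product ⟨u, Av⟩ (this construction and all names below are my own illustration, not part of the notes), run the n update steps from an arbitrary x^(0), and verify that Ax^(n) = b.

    import numpy as np

    def a_orthonormalize(A, V):
        """Gram-Schmidt in the A-inner product <u, Av>.

        Columns of V are assumed linearly independent; returns vectors
        u with <u_i, A u_j> = delta_ij. (Illustrative helper, not from
        the notes.)
        """
        U = []
        for v in V.T:
            w = np.array(v, dtype=float)
            for u in U:
                w = w - (w @ (A @ u)) * u        # remove A-component along earlier u
            U.append(w / np.sqrt(w @ (A @ w)))   # normalize so <w, Aw> = 1
        return U

    rng = np.random.default_rng(0)
    n = 5
    B = rng.standard_normal((n, n))
    A = B @ B.T + n * np.eye(n)                  # symmetric positive definite test matrix
    b = rng.standard_normal(n)

    U = a_orthonormalize(A, np.eye(n))
    x = rng.standard_normal(n)                   # arbitrary x^(0)
    for u in U:
        x = x + ((b - A @ x) @ u) * u            # x^(k) = x^(k-1) + <b - Ax^(k-1), u^(k)> u^(k)
    print(np.allclose(A @ x, b))                 # True: Ax^(n) = b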
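
Finally, a direct NumPy transcription of the conjugate gradient pseudocode above; the function name and default arguments are illustrative choices. M caps the number of iterations and eps plays the role of ε.

    import numpy as np

    def conjugate_gradient(A, b, x0, M=1000, eps=1e-10):
        """Conjugate gradient iteration, mirroring the pseudocode above."""
        x = np.array(x0, dtype=float)
        r = b - A @ x                        # r^(0)
        u = r.copy()                         # u^(0) = r^(0)
        for _ in range(M):
            if np.allclose(u, 0.0):          # u^(k) = 0: stop
                break
            Au = A @ u
            t = (r @ r) / (u @ Au)           # t^(k)
            x = x + t * u                    # x^(k+1)
            r_new = r - t * Au               # r^(k+1), updated without recomputing b - Ax
            if np.linalg.norm(r_new) <= eps:
                break
            s = (r_new @ r_new) / (r @ r)    # s^(k)
            u = r_new + s * u                # u^(k+1)
            r = r_new
        return x

For example, conjugate_gradient(np.array([[4., 1.], [1., 3.]]), np.array([1., 2.]), np.zeros(2)) reaches the solution in at most two steps: in exact arithmetic, conjugate gradient solves an n × n symmetric positive definite system in at most n iterations.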