C3 THEORY OF COMPUTATIONAL DYNAMICS

HANDOUT 3

CONVERGENCE OF THE NEWTON METHOD

We show that as long as $D_xF$ is not singular at the root, the Newton method in higher dimensions still converges quadratically. Recall that the Newton method for finding a root $x^*$ of a function $F : \mathbb{R}^m \to \mathbb{R}^m$ is given by

$$x_{n+1} = x_n - [D_{x_n}F]^{-1} \cdot F(x_n) \tag{1}$$

To determine the rate of convergence, define $\varepsilon_n = x_n - x^*$, as in one dimension. Then

$$\varepsilon_{n+1} = \varepsilon_n - [D_{x^*+\varepsilon_n}F]^{-1} \cdot F(x^*+\varepsilon_n) = [D_{x^*+\varepsilon_n}F]^{-1} \left( D_{x^*+\varepsilon_n}F \cdot \varepsilon_n - F(x^*+\varepsilon_n) \right) \tag{2}$$

Now, Taylor expanding, we have

$$D_{x^*+\varepsilon_n}F = D_{x^*}F + D^2_{x^*}F(\varepsilon_n, \,\cdot\,) + O(\|\varepsilon_n\|^2) \tag{3}$$

and

$$F(x^*+\varepsilon_n) = F(x^*) + D_{x^*}F \cdot \varepsilon_n + \tfrac{1}{2} D^2_{x^*}F(\varepsilon_n, \varepsilon_n) + O(\|\varepsilon_n\|^3) \tag{4}$$

Since $F(x^*) = 0$, (3) and (4) together give

$$D_{x^*+\varepsilon_n}F \cdot \varepsilon_n - F(x^*+\varepsilon_n) = \tfrac{1}{2} D^2_{x^*}F(\varepsilon_n, \varepsilon_n) + O(\|\varepsilon_n\|^3) \tag{5}$$

To complete the analysis, we claim that if $D_{x^*}F$ is not singular, then

$$[D_{x^*+\varepsilon_n}F]^{-1} = [D_{x^*}F]^{-1} + O(\|\varepsilon_n\|) \tag{6}$$

Substituting (5) and (6) into (2) then yields

$$\varepsilon_{n+1} = \tfrac{1}{2} [D_{x^*}F]^{-1} \cdot D^2_{x^*}F(\varepsilon_n, \varepsilon_n) + O(\|\varepsilon_n\|^3)$$

which is exactly analogous to the one dimensional case and gives the required quadratic convergence: for some constant $K$ and all sufficiently small $\|\varepsilon_n\|$,

$$\|\varepsilon_{n+1}\| \le K \, \|\varepsilon_n\|^2$$

It thus remains to verify (6), or in other words that if $A$ is not singular, then

$$(A+H)^{-1} = A^{-1} + O(\|H\|) \tag{7}$$

Intuitively this is reasonable since

$$(A+H) \cdot (A^{-1} + O(\|H\|)) = \mathrm{Id} + O(\|H\|)$$

so that $A^{-1} + O(\|H\|)$ is the inverse of $A+H$ up to an error of order $\|H\|$. To be more precise, we need to show that the map $\Phi(A) = A^{-1}$ is differentiable at $A$ if $A$ is not singular, since in that case (7) follows from the first order Taylor expansion of $\Phi$ at $A$. There are two approaches to showing the differentiability of $\Phi$:

1. CRAMER'S RULE

Recall that if $A$ is not singular, then

$$\Phi(A) = \frac{1}{\det A} \, B^{\dagger}$$

where $B$ is the matrix of cofactors of $A$ and $\dagger$ denotes the transpose. Now, $\det A$ and the cofactors of $A$ are all polynomials in the coefficients of $A$ and hence infinitely differentiable as functions of $A$. Hence, wherever $\det A \ne 0$, $\Phi(A)$ is infinitely differentiable as well.

2. COMPUTATION OF THE DERIVATIVE

The above argument does not actually tell us what the derivative of $\Phi$ is.
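Although not strictly necessary for our analysis of the Newton method, we give a detailed derivation of $D_A\Phi$ since this is a typical example of calculations of this kind. Note the overall strategy: we first suppose that $\Phi$ is differentiable and use this to compute a candidate for $D_A\Phi$; we then show that this candidate does indeed satisfy the definition of a derivative.

We begin by considering an expression of the form

$$(A+H)\left(A^{-1} + \Psi(H) + O(\|H\|^2)\right) = \mathrm{Id}$$

where $\Psi$ is a linear function which is our candidate for $D_A\Phi$. Expanding this, we get

$$\mathrm{Id} + HA^{-1} + A\Psi(H) + H\Psi(H) + O(\|H\|^2) = \mathrm{Id}$$

so that, multiplying on the left by $A^{-1}$,

$$\Psi(H) = -A^{-1}HA^{-1} - A^{-1}H\Psi(H) + O(\|H\|^2)$$

If $\Psi$ is linear then $\Psi(H) = O(\|H\|)$, so the term $A^{-1}H\Psi(H)$ is $O(\|H\|^2)$. Collecting the $O(\|H\|)$ terms, we get

$$\Psi(H) = -A^{-1}HA^{-1}$$

To show that $\Psi(H)$ is indeed the derivative of $\Phi$, it is sufficient to verify that

$$\Phi(A+H) - \Phi(A) - \Psi(H) = O(\|H\|^2)$$

Now

$$
\begin{aligned}
(A+H)\left[\Phi(A+H) - \Phi(A) - \Psi(H)\right]
&= (A+H)(A+H)^{-1} - (A+H)A^{-1} + AA^{-1}HA^{-1} + HA^{-1}HA^{-1} \\
&= \mathrm{Id} - \mathrm{Id} - HA^{-1} + HA^{-1} + O(\|H\|^2) \\
&= O(\|H\|^2)
\end{aligned}
$$

Hence if $\Phi(A+H) - \Phi(A) - \Psi(H) = O(\|H\|^k)$, we have

$$(A+H) \, O(\|H\|^k) = O(\|H\|^2)$$

Because $A+H$ is invertible with bounded inverse for all sufficiently small $H$, multiplication by $A+H$ does not change the order in $\|H\|$, so that $k = 2$ as required.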
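
The two main claims of this handout can be sanity-checked numerically. The following is a minimal Python sketch, not part of the original handout: the helper names, the 2x2 matrices `A` and `H0`, and the test function F(x, y) = (x^2 + y^2 - 2, x - y) with root (1, 1) are all assumptions chosen purely for illustration. It computes the ratio ||(A+H)^-1 - A^-1 - Psi(H)|| / ||H||^2 for shrinking H, with Psi(H) = -A^-1 H A^-1, and the ratio ||eps_{n+1}|| / ||eps_n||^2 along a short Newton iteration. If the analysis above holds, both ratios should remain bounded.

```python
import math


def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(M):
    """Inverse of a 2x2 matrix via the cofactor formula (Cramer's rule)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def fro(M):
    """Frobenius norm of a 2x2 matrix."""
    return math.sqrt(sum(x * x for row in M for x in row))


def inverse_remainder_ratio(A, H):
    """||(A+H)^-1 - A^-1 - Psi(H)|| / ||H||^2, with Psi(H) = -A^-1 H A^-1."""
    A_inv = inv2(A)
    AH_inv = inv2([[A[i][j] + H[i][j] for j in range(2)] for i in range(2)])
    lin = matmul(matmul(A_inv, H), A_inv)  # this is -Psi(H)
    rem = [[AH_inv[i][j] - A_inv[i][j] + lin[i][j] for j in range(2)]
           for i in range(2)]
    return fro(rem) / fro(H) ** 2


def F(x, y):
    """Illustrative test function (an assumption); F(1, 1) = (0, 0)."""
    return (x * x + y * y - 2.0, x - y)


def DF(x, y):
    """Jacobian of F; its determinant at the root (1, 1) is -4, so it is non-singular."""
    return [[2.0 * x, 2.0 * y], [1.0, -1.0]]


def newton_errors(x, y, steps=4, root=(1.0, 1.0)):
    """Run Newton's method (1) and return the error norms ||eps_n||."""
    errors = [math.hypot(x - root[0], y - root[1])]
    for _ in range(steps):
        J_inv = inv2(DF(x, y))
        f1, f2 = F(x, y)
        x -= J_inv[0][0] * f1 + J_inv[0][1] * f2
        y -= J_inv[1][0] * f1 + J_inv[1][1] * f2
        errors.append(math.hypot(x - root[0], y - root[1]))
    return errors


# Remainder of the first order expansion of the inverse, for shrinking H = t * H0.
A = [[2.0, 1.0], [1.0, 3.0]]
H0 = [[0.3, -0.2], [0.1, 0.4]]
ratios = [inverse_remainder_ratio(A, [[t * h for h in row] for row in H0])
          for t in (1e-1, 1e-2, 1e-3)]
print("inverse remainder / ||H||^2:", ratios)

# Error ratios ||eps_{n+1}|| / ||eps_n||^2 for Newton's method started at (2, 0.5).
errs = newton_errors(2.0, 0.5)
newton_ratios = [errs[n + 1] / errs[n] ** 2 for n in range(len(errs) - 1)]
print("Newton error ratios:", newton_ratios)
```

The iteration is deliberately stopped after four steps: once the error approaches machine precision, rounding noise dominates and the ratio is no longer meaningful. The sketch inverts the Jacobian explicitly only to mirror formula (1); solving the linear system instead would be the more usual choice in practice.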