BIT 2002, Vol. 42, No. 1, pp. 206–213
0006-3835/02/4201-0206 \$16.00 © Swets & Zeitlinger

A UNIFIED CONVERGENCE THEORY FOR NEWTON-TYPE METHODS FOR ZEROS OF NONLINEAR OPERATORS IN BANACH SPACES ∗

XINGHUA WANG¹, CHONG LI², and MING-JUN LAI³ †

¹ Department of Mathematics, Zhejiang University, Hangzhou, 310028, P. R. China
² Department of Applied Mathematics, Southeast University, Nanjing, 210096, P. R. China
³ Department of Mathematics, University of Georgia, Athens, GA 30602, USA. email: mjlai@math.uga.edu

Abstract.

The paper is concerned with the convergence problem of Newton-type methods for finding zeros of nonlinear operators in Banach spaces. Some families of nonlinear operators are defined by different Lipschitz conditions, and a “universal constant” is introduced so that a unified convergence criterion for these methods is established for the defined families.

AMS subject classification: 47H10, 65J15, 65H10.

Key words: Newton’s method, Banach space, convergence.

1 Introduction.

Let $E$ and $F$ be real or complex Banach spaces with norm $\|\cdot\|$. For $x_0 \in E$ and $r > 0$, let $B(x_0, r)$ and $\overline{B(x_0, r)}$ denote the open and closed ball with radius $r$ and center $x_0$, respectively. Let $f : B(x_0, r) \to F$ be a nonlinear operator with continuous Fréchet derivative $f'$. In this paper, we assume that the inverse $f'(x_0)^{-1}$ of $f'(x_0)$ exists. The Newton method and its variations are the major numerical methods for solving the equation $f(x) = 0$. Regarding the existence and uniqueness of the solution and the convergence of Newton’s method, the most famous result is the Kantorovich Theorem.

Theorem 1.1 (Kantorovich Theorem [6]). Suppose that $f'(x_0)^{-1} f'(x)$ satisfies

\[
\|f'(x_0)^{-1}(f'(x') - f'(x))\| \le \gamma \|x' - x\|, \quad \forall x', x, \ \|x' - x\| + \|x - x_0\| \le \frac{1}{\gamma}. \tag{1.1}
\]

Let $\beta = \|f'(x_0)^{-1} f(x_0)\|$. If $\gamma\beta \le 1/2$, then the Newton sequence $\{x_n\}$ defined by

\[
x_{n+1} = x_n - f'(x_n)^{-1} f(x_n), \quad n = 0, 1, 2, \ldots, \tag{1.2}
\]

converges to the unique solution of the equation $f(x) = 0$ in $B(x_0, 1/\gamma)$.
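As an illustration of Theorem 1.1, the following minimal sketch applies the Newton iteration (1.2) in the simplest Banach space, the real line. The test function $f(x) = x^2 - 2$, the starting point $x_0 = 1.5$, and the helper name `newton` are assumptions chosen for this example and do not come from the paper. Since $f'' \equiv 2$, the Lipschitz constant in (1.1) is $\gamma = 2/|f'(x_0)|$.

```python
import math


def newton(f, df, x0, tol=1e-14, max_iter=50):
    """Scalar Newton iteration (1.2): x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) <= tol:
            break
    return x


# Hypothetical test problem (not from the paper): f(x) = x^2 - 2, x0 = 1.5.
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
x0 = 1.5

# f'' is constant (= 2), so f'(x0)^{-1} f' is Lipschitz with gamma = 2 / |f'(x0)|.
gamma = 2.0 / abs(df(x0))
beta = abs(f(x0) / df(x0))           # beta = |f'(x0)^{-1} f(x0)|
kantorovich_ok = gamma * beta <= 0.5  # hypothesis of Theorem 1.1

root = newton(f, df, x0)
print(kantorovich_ok, root, abs(root - x0) < 1.0 / gamma)
```

In this example the Kantorovich condition holds, and the computed limit lies in the ball $B(x_0, 1/\gamma)$, as the theorem predicts.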
∗ Received September 2000. Revised April 2001. Communicated by Åke Björck.
† This project is jointly supported by the Special Funds for Major State Basic Research Projects (Grant No. G19990328), the National (Grant No. 19971013) and Jiangsu Provincial (Grant No. BK99001) Natural Science Foundation of China.

Since then, many similar convergence results for the Newton method and its variations have been established in the literature under various similar conditions; see, for example, [7, 12, 5, 1]. However, there is no unified convergence theorem. The purpose of the present paper is to establish such a theorem. After reviewing some preliminary results on majorizing functions and recalling the definitions of the Newton method and its variations in Section 2, we prove some convergence theorems for the Newton method in Section 3 and for its variations in Section 4. Finally, we present some conclusions in Section 5.

2 Preliminaries.

In our study, the following cubic majorizing function $h$ plays a key role:

\[
h(t) = \beta - t + \frac{1}{2}\gamma t^2 + \frac{1}{6} L t^3, \tag{2.1}
\]

where $\beta$, $\gamma$, $L$ are some fixed constants.

Proposition 2.1. Let

\[
r = \frac{2}{\gamma + \sqrt{\gamma^2 + 2L}}, \qquad
b = \frac{2\left(\gamma + 2\sqrt{\gamma^2 + 2L}\right)}{3\left(\gamma + \sqrt{\gamma^2 + 2L}\right)^2}. \tag{2.2}
\]

Then the function $h$ is monotonically decreasing on $[0, r]$ and monotonically increasing on $[r, +\infty)$. Moreover, if $\beta \le b$, then

\[
h(r) = \beta - b \le 0, \qquad h(\beta) > 0, \qquad h(+\infty) > \beta > 0.
\]

Thus $h$ has a unique zero in each of the two intervals; these zeros are denoted by $t^*$ and $t^{**}$, respectively. They satisfy

\[
\beta < t^* < \frac{r}{b}\,\beta < r < t^{**} \tag{2.3}
\]

when $\beta < b$, and $t^* = t^{**}$ when $\beta = b$.

Next we introduce some families of nonlinear operators. For $\gamma$, $L$, $r$, $b$ given as above, let $C^1(x_0, r)$ denote the set of all operators $f$ mapping $B(x_0, r)$ to $F$ such that $f'$ is continuous on $B(x_0, r)$ and $f'(x_0)^{-1}$ exists, and let $C^2(x_0, r)$ denote the subset of $C^1(x_0, r)$ of those $f$ for which $f''$ is continuous on $B(x_0, r)$.

Recall the Kantorovich condition (1.1).
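The constants of Proposition 2.1 can be checked numerically. The sketch below is only a sanity check under assumed sample values ($\gamma = L = 1$, $\beta = b/2$); the helper names `h`, `r_and_b`, and `bisect` are hypothetical and not from the paper. It evaluates (2.2), locates the two zeros of $h$ by bisection, and lets one compare them with the ordering (2.3).

```python
import math


def h(t, beta, gamma, L):
    """Cubic majorizing function (2.1)."""
    return beta - t + 0.5 * gamma * t**2 + L * t**3 / 6.0


def r_and_b(gamma, L):
    """Constants r and b from (2.2)."""
    s = math.sqrt(gamma**2 + 2.0 * L)
    r = 2.0 / (gamma + s)
    b = 2.0 * (gamma + 2.0 * s) / (3.0 * (gamma + s) ** 2)
    return r, b


def bisect(g, lo, hi, iters=200):
    """Plain bisection; assumes g changes sign on [lo, hi]."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed sample values, chosen only for illustration.
gamma, L = 1.0, 1.0
r, b = r_and_b(gamma, L)
beta = 0.5 * b                       # any beta < b would do
g = lambda t: h(t, beta, gamma, L)

t_star = bisect(g, 0.0, r)           # zero of h in [0, r]
hi = 2.0 * r
while g(hi) <= 0.0:                  # enlarge the bracket until h > 0
    hi *= 2.0
t_star2 = bisect(g, r, hi)           # zero of h in [r, +inf)

print(r, b, t_star, t_star2)
```

For $L = 0$ the same function returns $r = 1/\gamma$ and $b = 1/(2\gamma)$, so that $\beta \le b$ becomes the Kantorovich condition $\gamma\beta \le 1/2$.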
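In addition to the Kantorovich condition (1.1), we consider the following similar conditions:

\[
\|f'(x_0)^{-1}(f'(x) - f'(x_0))\| \le \gamma \|x - x_0\|, \quad \forall x, \ \|x - x_0\| \le r, \tag{2.4}
\]

\[
\|f'(x_0)^{-1}(f'(x') - f'(x))\| \le \left(\gamma + L\|x - x_0\| + \tfrac{1}{2} L \|x' - x\|\right)\|x' - x\|, \quad \forall x', x, \ \|x' - x\| + \|x - x_0\| \le r, \tag{2.5}
\]

\[
\|f'(x_0)^{-1}(f'(x) - f'(x_0))\| \le \left(\gamma + \tfrac{1}{2} L \|x - x_0\|\right)\|x - x_0\|, \quad \forall x, \ \|x - x_0\| \le r, \tag{2.6}
\]

\[
\|f'(x_0)^{-1} f''(x_0)\| \le \gamma, \tag{2.7}
\]

\[
\|f'(x_0)^{-1}(f''(x') - f''(x))\| \le L \|x' - x\|, \quad \forall x', x, \ \|x' - x\| + \|x - x_0\| \le r, \tag{2.8}
\]

\[
\|f'(x_0)^{-1}(f''(x) - f''(x_0))\| \le L \|x - x_0\|, \quad \forall x, \ \|x - x_0\| \le r. \tag{2.9}
\]

We define

\[
\begin{aligned}
K^{(1)}(x_0, \gamma) &= \{f \in C^1(x_0, r) : \text{(1.1) holds}\}, \\
K^{(1)}_{\mathrm{cent}}(x_0, \gamma) &= \{f \in C^1(x_0, r) : \text{(2.4) holds}\}, \\
K^{(1)}(x_0, \gamma, L) &= \{f \in C^1(x_0, r) : \text{(2.5) holds}\}, \\
K^{(1)}_{\mathrm{cent}}(x_0, \gamma, L) &= \{f \in C^1(x_0, r) : \text{(2.6) holds}\}, \\
K^{(2)}(x_0, \gamma, L) &= \{f \in C^2(x_0, r) : \text{(2.7) and (2.8) hold}\}, \\
K^{(2)}_{\mathrm{cent}}(x_0, \gamma, L) &= \{f \in C^2(x_0, r) : \text{(2.7) and (2.9) hold}\}.
\end{aligned}
\]

Then we have the following:

Proposition 2.2.
(i) $K^{(1)}_{\mathrm{cent}}(x_0, \gamma) = K^{(1)}_{\mathrm{cent}}(x_0, \gamma, 0)$;
(ii) $K^{(1)}(x_0, \gamma) = K^{(1)}(x_0, \gamma, 0)$;
(iii) $K^{(2)}(x_0, \gamma, L) \subset K^{(2)}_{\mathrm{cent}}(x_0, \gamma, L) \subset K^{(1)}(x_0, \gamma, L) \subset K^{(1)}_{\mathrm{cent}}(x_0, \gamma, L)$.

Proof. It suffices to prove that

\[
K^{(2)}_{\mathrm{cent}}(x_0, \gamma, L) \subset K^{(1)}(x_0, \gamma, L), \tag{2.10}
\]

since the other relations are clear. Observe that for any $x'$, $x$ with $\|x' - x\| + \|x - x_0\| \le r$,

\[
\begin{aligned}
\|f'(x_0)^{-1}(f'(x') - f'(x))\|
&\le \|f'(x_0)^{-1} f''(x_0)\| \, \|x' - x\|
 + \left\| \int_0^1 f'(x_0)^{-1} \left(f''(x + \tau(x' - x)) - f''(x_0)\right) d\tau \right\| \|x' - x\| \\
&\le \left(\gamma + L\|x - x_0\| + \tfrac{1}{2} L \|x' - x\|\right)\|x' - x\|,
\end{aligned}
\]

so that (2.10) holds. This completes the proof.

We also need some other lemmas. Their proofs are straightforward and hence are omitted here. Throughout the paper, we always denote $\beta = \|f'(x_0)^{-1} f(x_0)\|$.

Lemma 2.3. Let $f \in K^{(1)}_{\mathrm{cent}}(x_0, \gamma, L)$ and suppose that $\beta \le b$. Then for any $x \in B(x_0, r)$, $f'(x)^{-1}$ exists and satisfies

\[
\|f'(x)^{-1} f'(x_0)\| \le -\frac{1}{h'(\|x - x_0\|)}. \tag{2.11}
\]

Lemma 2.4. Let $f \in K^{(2)}_{\mathrm{cent}}(x_0, \gamma, L)$ and suppose that $\beta \le b$. Then for any $x \in B(x_0, r)$,

\[
\|f'(x_0)^{-1} f''(x)\| \le h''(\|x - x_0\|). \tag{2.12}
\]

3 Convergence theorems for the Newton method.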
The following two theorems can be obtained from Theorems 3.1 and 1.5 in [9] by taking $L(u) = \gamma + Lu$. However, to make this paper self-contained, we include their proofs here.

Theorem 3.1. Suppose that $f \in K^{(1)}(x_0, \gamma, L)$. If $\beta \le b$, then the Newton sequence $\{x_n\}$ defined by (1.2) converges to the solution of the equation $f(x) = 0$ in $B(x_0, r)$.

Proof. Let $\{t_n\}$ denote the sequence generated by the Newton method (1.2) for the majorizing function $h$ with the initial point $t_0 = 0$. It is easy to check that $t_n$ increases monotonically and converges to $t^*$. We will show inductively the following inequality:

\[
\|x_n - x_{n-1}\| \le t_n - t_{n-1}, \quad n = 1, 2, \ldots. \tag{3.1}
\]

Obviously, (3.1) holds for $n = 1$. Now assume that (3.1) holds for all indices up to some $n$. Then $x_{n-1+\tau} \in B(x_0, t^*)$ for all $0 \le \tau \le 1$, where

\[
x_{n-1+\tau} = x_{n-1} + \tau(x_n - x_{n-1}). \tag{3.2}
\]

Set

\[
t_{n-1+\tau} = t_{n-1} + \tau(t_n - t_{n-1}). \tag{3.3}
\]

We have that

\[
\begin{aligned}
f(x_n) &= f(x_n) - f(x_{n-1}) - f'(x_{n-1})(x_n - x_{n-1}) \\
&= \int_0^1 \{f'(x_{n-1+\tau}) - f'(x_{n-1})\}(x_n - x_{n-1}) \, d\tau.
\end{aligned}
\]

Using the condition (2.5), it follows from the induction assumption that

\[
\begin{aligned}
\|f'(x_0)^{-1} f(x_n)\|
&\le \int_0^1 \left(\gamma + L\|x_{n-1} - x_0\| + \tfrac{1}{2} L \tau \|x_n - x_{n-1}\|\right) \tau \|x_n - x_{n-1}\|^2 \, d\tau \\
&\le \int_0^1 \left(\gamma + L t_{n-1} + \tfrac{1}{2} L \tau (t_n - t_{n-1})\right) \tau (t_n - t_{n-1})^2 \, d\tau = h(t_n).
\end{aligned}
\]

Then, from Proposition 2.2 and Lemma 2.3, we have that

\[
\|x_{n+1} - x_n\| \le \|f'(x_n)^{-1} f'(x_0)\| \, \|f'(x_0)^{-1} f(x_n)\| \le -\frac{h(t_n)}{h'(t_n)} = t_{n+1} - t_n,
\]

that is, (3.1) holds for all $n = 1, 2, \ldots$. Therefore, $\{x_n\} \subset B(x_0, t^*)$ converges to an element, say, $x^*$, and $\{f'(x_n)\}$ is uniformly bounded by condition (2.5). It follows that $f(x^*) = 0$, proving the theorem.

Theorem 3.2. Suppose that $f \in K^{(1)}_{\mathrm{cent}}(x_0, \gamma, L)$. If $\beta \le b$, then the equation $f(x) = 0$ has a unique solution in the closed ball $\overline{B(x_0, r)}$.

Proof. In fact, we can show the following two more general results:

Claim I. Suppose that $f$ satisfies condition (2.6) in $B(x_0, t^*)$. Then $f(x) = 0$ has at least one solution in $B(x_0, t^*)$ when $\beta \le b$.

Claim II.
Suppose that $f$ satisfies condition (2.6) in $B(x_0, \xi)$, where $\xi = t^*$ if $\beta = b$ and $t^* \le \xi < t^{**}$ if $\beta < b$. Then $f(x) = 0$ has a unique solution in $B(x_0, \xi)$ when $\beta \le b$.

Proof of Claim I. Define

\[
\begin{aligned}
x_{n+1} &= x_n - f'(x_0)^{-1} f(x_n), & n &= 0, 1, 2, \ldots, \\
t_{n+1} &= t_n + h(t_n), & n &= 0, 1, 2, \ldots, \quad t_0 = 0.
\end{aligned}
\]

Similarly, we can show inductively that inequality (3.1) holds for the sequence $\{x_n\}$ defined above. Indeed, using the notation of (3.2) and (3.3), we have

\[
x_{n+1} - x_n = -\int_0^1 \{f'(x_0)^{-1} f'(x_{n-1+\tau}) - I\}(x_n - x_{n-1}) \, d\tau.
\]

From condition (2.6) and the induction assumption it follows that

\[
\begin{aligned}
\|x_{n+1} - x_n\|
&\le \int_0^1 \left(\gamma + \tfrac{1}{2} L \|x_{n-1+\tau} - x_0\|\right) \|x_{n-1+\tau} - x_0\| \, \|x_n - x_{n-1}\| \, d\tau \\
&\le \int_0^1 \left(\gamma + \tfrac{1}{2} L t_{n-1+\tau}\right) t_{n-1+\tau} (t_n - t_{n-1}) \, d\tau \\
&= \int_0^1 \{h'(t_{n-1+\tau}) + 1\}(t_n - t_{n-1}) \, d\tau = t_{n+1} - t_n.
\end{aligned}
\]

This proves that $\{x_n\} \subset B(x_0, t^*)$ and that it converges to a solution $x^*$ of $f(x) = 0$, since $t_n$ increases monotonically and converges to $t^*$. The proof of Claim I is complete.

Proof of Claim II. In order to show Claim II, for any $x'_0 \in B(x_0, \xi)$, define two sequences as follows:

\[
\begin{aligned}
x'_{n+1} &= x'_n - f'(x_0)^{-1} f(x'_n), & n &= 0, 1, \ldots, \\
t'_{n+1} &= t'_n + h(t'_n), & n &= 0, 1, \ldots,
\end{aligned}
\]

where $t'_0 = \|x'_0 - x_0\|$. Let $x_n^\tau = x_n + \tau(x'_n - x_n)$. Then

\[
x'_{n+1} - x_{n+1} = -\int_0^1 \{f'(x_0)^{-1} f'(x_n^\tau) - I\}(x'_n - x_n) \, d\tau.
\]

With arguments similar to those above, we obtain

\[
\|x'_n - x_n\| \le t'_n - t_n, \quad n = 0, 1, 2, \ldots,
\]

so that $x'_n$ also converges to $x^*$, the limit of $x_n$. This proves the conclusion of Claim II and completes the proof of the theorem.

It should be remarked that, in Claim I and Claim II, the restrictions on the nonlinear operator $f$ depend upon $t^*$ and $t^{**}$, and so upon the majorizing function $h$, although the claims are more general than Theorem 3.2. Note that the family of nonlinear operators in Theorem 3.2 is independent of $h$ and $\beta$.

Obviously, when $L = 0$, Theorems 3.1 and 3.2 give the Kantorovich theorem.
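To make the reduction to the Kantorovich theorem explicit, the following short computation (a worked check added for illustration, assuming $\gamma > 0$) substitutes $L = 0$ into the constants (2.2):

```latex
\[
L = 0: \qquad
r = \frac{2}{\gamma + \sqrt{\gamma^2}} = \frac{1}{\gamma},
\qquad
b = \frac{2(\gamma + 2\gamma)}{3(\gamma + \gamma)^2}
  = \frac{6\gamma}{12\gamma^2}
  = \frac{1}{2\gamma},
\]
\[
\beta \le b \iff \gamma\beta \le \tfrac{1}{2}.
\]
```

Hence, for $L = 0$, the ball $B(x_0, r)$ is $B(x_0, 1/\gamma)$, the condition $\beta \le b$ is the Kantorovich condition, and condition (2.5) reduces to (1.1), in agreement with Proposition 2.2 (ii).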
Moreover, Theorems 3.1 and 3.2, together with Proposition 2.2, also give the following result, which is the main result obtained by Huang [5] for $f \in K^{(2)}(x_0, \gamma, L)$ and by Gutiérrez [1] for $f \in K^{(2)}_{\mathrm{cent}}(x_0, \gamma, L)$.

Corollary 3.3. Suppose that $f \in K^{(2)}(x_0, \gamma, L)$ or $f \in K^{(2)}_{\mathrm{cent}}(x_0, \gamma, L)$. If $\beta \le b$, then the Newton sequence $\{x_n\}$ defined by (1.2) converges to the unique solution of the equation $f(x) = 0$ in $B(x_0, r)$.

4 Convergence theorems for some variations of the Newton method.

There are many modified Newton methods. However, the ones that obviously improve the computational efficiency are the following two variations, which were proposed by King [7] and Werner [12]:

\[
x_{n+1} = x_n - f'(x_{m[n/m]})^{-1} f(x_n), \quad n = 0, 1, \ldots, \tag{4.1}
\]

where $[n/m]$ denotes the integer part of $n/m$, and

\[
\begin{aligned}
x_{n+1} &= x_n - f'(y_n)^{-1} f(x_n), \\
y_{n+1} &= x_{n+1} - \tfrac{1}{2} f'(y_n)^{-1} f(x_{n+1}),
\end{aligned}
\qquad n = 0, 1, \ldots, \quad y_0 = x_0. \tag{4.2}
\]

We recommend the iteration (4.2) in particular. Its convergence order is raised to $1 + \sqrt{2}$, although the number of evaluations of the function value is twice that of the Newton method.

Theorem 4.1. Let $f \in K^{(1)}(x_0, \gamma, L)$ and suppose that $\beta \le b$. Then the sequence $\{x_n\}$ generated by (4.1) with the initial point $x_0$ converges to the unique solution $x^*$ of the equation $f(x) = 0$.

Proof. Let $\{t_n\}$ be the sequence corresponding to $\{x_n\}$ when the iteration (4.1) is applied to the real function $h$. Then it is easy to show that $\{t_n\}$ increases monotonically and tends to $t^*$. Furthermore, note that

\[
\begin{aligned}
f(x_{n+1}) &= f(x_{n+1}) - f(x_n) - f'(x_{m[n/m]})(x_{n+1} - x_n) \\
&= \int_0^1 [f'(x_n + \tau(x_{n+1} - x_n)) - f'(x_n)] \, d\tau \, (x_{n+1} - x_n) \\
&\quad + [f'(x_n) - f'(x_{m[n/m]})](x_{n+1} - x_n).
\end{aligned}
\]

From $f \in K^{(1)}(x_0, \gamma, L)$ and Lemma 2.3 we can prove inductively that

\[
\|x_{n+1} - x_n\| \le t_{n+1} - t_n, \quad n = 0, 1, \ldots,
\]

so that $\{x_n\}$ converges to the unique solution $x^*$ of the equation $f(x) = 0$, which completes the proof.

Theorem 4.2. Let $f \in K^{(2)}(x_0, \gamma, L)$ and suppose that $\beta \le b$.
Then the sequence $\{x_n\}$ generated by (4.2) with the initial point $x_0$ converges to the unique solution $x^*$ of the equation $f(x) = 0$.

Proof. Let $\{x_n\}$ and $\{y_n\}$ be the two sequences defined by (4.2). We introduce the sequence $\{z_n\}$ such that $y_n = \tfrac{1}{2}(x_n + z_n)$. Thus (4.2) can be rewritten as

\[
\begin{aligned}
x_{n+1} &= x_n - f'(y_n)^{-1} f(x_n), \\
z_{n+1} &= x_{n+1} - f'(y_n)^{-1} f(x_{n+1}),
\end{aligned}
\qquad n = 0, 1, \ldots, \quad y_0 = x_0. \tag{4.3}
\]

Let $\{t_n\}$, $\{s_n\}$ and $\{r_n\}$ be the sequences corresponding to $\{x_n\}$, $\{y_n\}$ and $\{z_n\}$ for the function $h$, where $s_0 = t_0 = 0$. Then we can prove inductively that

\[
0 = t_0 = s_0 \le t_n \le r_n \le t_{n+1} \le r_{n+1} \le t^* \tag{4.5}
\]

and $\lim_{n\to\infty} t_n = \lim_{n\to\infty} s_n = \lim_{n\to\infty} r_n = t^*$. Furthermore, observe that

\[
\begin{aligned}
f(x_{n+1}) &= f(x_{n+1}) - f(x_n) - f'(y_n)(x_{n+1} - x_n) \\
&= \{f(x_{n+1}) - f(z_n) - f'(z_n)(x_{n+1} - z_n)\} + \{f'(z_n) - f'(y_n)\}(x_{n+1} - z_n) \\
&\quad + \{f(z_n) - f(y_n) - f'(y_n)(z_n - y_n)\} - \{f(x_n) - f(y_n) - f'(y_n)(x_n - y_n)\} \\
&= \int_0^1 [f'(z_n + \tau(x_{n+1} - z_n)) - f'(z_n)] \, d\tau \, (x_{n+1} - z_n) + \{f'(z_n) - f'(y_n)\}(x_{n+1} - z_n) \\
&\quad + \int_0^1 f''(y_n + \tau(z_n - y_n))(1 - \tau) \, d\tau \, (z_n - y_n)^2
 - \int_0^1 f''(y_n + \tau(x_n - y_n))(1 - \tau) \, d\tau \, (x_n - y_n)^2 \\
&= \int_0^1 [f'(z_n + \tau(x_{n+1} - z_n)) - f'(z_n)] \, d\tau \, (x_{n+1} - z_n) + \{f'(z_n) - f'(y_n)\}(x_{n+1} - z_n) \\
&\quad + \int_0^1 [f''(y_n + \tau(y_n - x_n)) - f''(y_n + \tau(x_n - y_n))](1 - \tau) \, d\tau \, (x_n - y_n)^2,
\end{aligned}
\]

using $z_n - y_n = y_n - x_n$. Hence, from Lemma 2.3 and Lemma 2.4, we can show by induction that, for $n = 0, 1, \ldots$,

\[
x_n, y_n, z_n \in B(x_0, t^*),
\]

\[
\|f'(y_n)^{-1} f(x_n)\| \le -h'(s_n)^{-1} h(t_n),
\qquad
\|f'(y_n)^{-1} f(x_{n+1})\| \le -h'(s_n)^{-1} h(t_{n+1}),
\]

so that

\[
\|z_n - x_n\| \le r_n - t_n
\quad \text{and} \quad
\|x_{n+1} - z_n\| \le t_{n+1} - r_n, \qquad n = 0, 1, \ldots.
\]

This completes the proof.

5 Conclusions.

We have established unified convergence theorems which include many known results on the convergence of the Newton method and its variations. It is worth remarking that the unification lies not only in the convergence theorems themselves, but also in the fact that they all hold under the same condition $\beta \le b$.
Therefore the constant $b$ is often called the “universal constant”. Moreover, it should be pointed out that the ideas and techniques developed in this paper can be used to establish the same result for Halley’s method [10, 2, 4, 13], which means that Halley’s iteration sequence converges under the same condition $\beta \le b$. Thus Halley’s method is also included in our unified framework. In other words, the “universal constant” does not depend on any particular method for solving a nonlinear equation. The condition $\beta \le b$ arises as the condition guaranteeing the existence of zeros of the majorizing function $h$. So this condition is the same for Newton’s method, Newton-type methods, Halley’s method and others (Chebyshev, super-Halley, etc.). In short, our results achieve a rather high degree of unification.

REFERENCES

1. J. M. Gutiérrez, A new semilocal convergence theorem for Newton method, J. Comput. Appl. Math., 79 (1997), pp. 131–145.
2. D. F. Han and X. Wang, The error estimates of Halley’s method, Numer. Math. JCU, 6 (1997), pp. 231–240.
3. D. F. Han and X. Wang, Convergence on a deformed Newton method, Appl. Math. Comput., 94 (1998), pp. 65–72.
4. M. A. Hernández, A note on Halley’s method, Numer. Math., 59 (1991), pp. 273–276.
5. Z. D. Huang, A note on the Kantorovich theorem for Newton iteration, J. Comput. Appl. Math., 47 (1993), pp. 211–217.
6. L. V. Kantorovich and G. P. Akilov, Functional Analysis, Pergamon Press, New York, 1982.
7. R. F. King, Tangent method for nonlinear equations, Numer. Math., 18 (1972), pp. 298–304.
8. X. Wang, On error estimates for some numerical root-finding methods, Acta Math. Sinica, 22 (1979), pp. 638–642 (in Chinese).
9. X. Wang, Convergence of Newton’s method and inverse function in Banach spaces, Math. Comp., 68 (1999), pp. 169–186.
10. X. Wang, Convergence of the iteration of Halley family and Smale operator class in Banach space, Science in China (Ser. A), 41 (1998), pp. 700–709.
11. X.
Wang, Convergence of the iteration of Halley’s family in weak condition, Chinese Science Bulletin, 42 (1997), pp. 552–555.
12. W. Werner, Über ein Verfahren der Ordnung $1 + \sqrt{2}$ zur Nullstellenbestimmung, Numer. Math., 32 (1979), pp. 333–342.
13. T. Yamamoto, On the method of tangent hyperbolas in Banach spaces, J. Comput. Appl. Math., 21 (1988), pp. 75–86.