Applied Numerical Mathematics Kantorovich-type convergence criterion for inexact Newton methods Weiping Shen

advertisement
Applied Numerical Mathematics 59 (2009) 1599–1611
Contents lists available at ScienceDirect
Applied Numerical Mathematics
www.elsevier.com/locate/apnum
Kantorovich-type convergence criterion for inexact Newton methods ✩
Weiping Shen a,∗ , Chong Li b
a
b
Department of Mathematics, Zhejiang Normal University, Jinhua, 321004, PR China
Department of Mathematics, Zhejiang University, Hangzhou, 310027, PR China
a r t i c l e
i n f o
Article history:
Received 2 September 2008
Received in revised form 23 October 2008
Accepted 4 November 2008
Available online 13 November 2008
MSC:
65H10
65J15
47H30
a b s t r a c t
Assuming that the first derivative of an operator satisfies the Lipschitz condition, a
Kantorovich-type convergence criterion for inexact Newton methods is established, which
includes the well-known Kantorovich’s theorem as a special case. Comparisons and a
numerical example are presented to illustrate that our results obtained in the present
paper improve and extend some recent results in [X.P. Guo, On semilocal convergence
of inexact Newton methods, J. Comput. Math. 25 (2007) 231–242; W.P. Shen, C. Li,
Convergence criterion of inexact methods for operators with Hölder continuous derivatives,
Taiwan. J. Math. 12 (2008) 1865–1882].
© 2008 IMACS. Published by Elsevier B.V. All rights reserved.
Keywords:
Nonlinear equation
Inexact Newton methods
Lipschitz condition
Kantorovich-type theorem
1. Introduction
Let X and Y be (real or complex) Banach spaces, D ⊆ X be an open subset and let F : D ⊆ X → Y be a nonlinear operator
with continuous Fréchet derivative denoted by F . Finding solutions of the nonlinear operator equation
F (x) = 0
(1.1)
in Banach spaces is a very general subject which is widely used in both theoretical and applied areas of mathematics. The
most important method to find an approximation of a solution of (1.1) is Newton’s method which takes the following form:
xn+1 = xn − F (xn )−1 F (xn ),
n = 0, 1, 2, . . . .
(1.2)
The convergence issue of Newton’s method has been studied extensively, see for example [5,7,8,11,16–18,21–24]. These
results can be distinguished into two classes. One is about local convergence discussing the properties of Newton’s method
through the information depending on the solution x∗ of (1.1) (cf. [22–24]), and the other is about semilocal convergence
which only deals with the initial point x0 (cf. [5,7,8,11,16–18,21]). Among the semilocal convergence results on Newton’s
method, one of the famous results is the well-known Kantorovich’s theorem (cf. [11]) which, under the very mild condition
that the first derivative F is Lipschitz continuous, provides the following convergence criterion of Newton’s method:
✩
*
This project was supported in part by the National Natural Science Foundations of China (Grant Nos. 10671175; 10731060).
Corresponding author.
E-mail addresses: shenweiping@zjnu.cn (W.P. Shen), cli@zju.edu.cn (C. Li).
0168-9274/$30.00 © 2008 IMACS. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.apnum.2008.11.002
1600
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
0 < αγ 1
2
(1.3)
,
where γ is the Lipschitz constant and α F (x0 )−1 F (x0 ).
As expressed in (1.2), we have to solve the following Newton equation at each Newton step:
F (xn )sn = − F (xn ).
(1.4)
This sometimes makes Newton’s method inefficient from the point of view of practical calculations especially when F (xn )
is large and sparse. While using linear iterative methods to approximate the solution of (1.4) instead of solving it exactly
can reduce some of the costs of Newton’s method which was studied extensively and applied in [1–3,6,10,14,19,25] (such a
variant is the so-called inexact Newton method). In general, the inexact Newton method has the following general form:
Algorithm A[rn ; x0 ]. For n = 0 and a given initial guess x0 until convergence do
1. For the residual rn and the iteration xn , find the step sn satisfying
F (xn )sn = − F (xn ) + rn .
(1.5)
2. xn+1 = xn + sn .
3. Set n = n + 1 and turn to Step 1.
Here {rn } is a sequence of elements in Y (depending on {xn } in general).
As it is well known, the convergence behavior of the inexact Newton method depends on the residual controls of {rn }.
Several authors (cf. [6,19]) have analyzed the local convergence behavior in some manner such that the stopping relative
residuals {rn } satisfy rn ηn F (xn ). Ypma used in [25] the affine invariant condition F (xn )−1 rn ηn F (xn )−1 F (xn ),
which makes the method become an affine invariant one, and analyzed also the local convergence property of the inexact
Newton method.
The above results concern mainly with the local convergence of the inexact Newton method. In the spirit of Kantorovich’s
theorem, the semi-local convergence analysis for the inexact Newton method was studied recently, see [1,2,10] for example.
As in the case of the local convergence analysis, different residual controls were used. For example, the residual controls
rn ηn F (xn ) was adopted in [2]; while in [1], Argyros considered the residual controls rn dn hn /cn where {dn }, {hn }
and {cn } are real sequences satisfying some suitable conditions.
In particular, Guo considered in her recent paper [10] the residual controls
F (x0 )−1 rn ηn F (x0 )−1 F (xn ) for each n ∈ N,
(1.6)
and, under the Lipschitz continuity assumption on F , established the following convergence criterion for the inexact Newton
method (cf. [10, Theorem 2.6]):
αγ (4η + 5)3 − 2η2 − 14η − 11
,
(1 + η)(1 − η)2
(1.7)
where α , γ are the same as in (1.3) and η := supn0 ηn . However, the criterion (1.7) is not satisfactory; in particular, the
convergence criterion in the case when ηn ≡ 0 is much worse than the criterion (1.3).
Motivated by the ideas of the inexact Newton-like method for the inverse eigenvalue problem (cf. [4]), the authors of
the present paper presented in [12] the following residual controls:
1+β
P n rn ηn P n F (xn )
for each n ∈ N,
(1.8)
where { P n } is a sequence of invertible operators from Y to X and 0 β 1, and established the local convergence of
order 1 + β for inexact Newton-like methods. A semi-local convergence analysis for inexact Newton-like methods under the
( L , p )-Hölder condition on F (x0 )−1 F was done in [15] for the residual controls (1.8) with β > 0. Obviously, in the special
case when P n = F (x0 )−1 , (1.8) reduces to
F (x0 )−1 rn ηn F (x0 )−1 F (xn )1+β
for each n ∈ N.
(1.9)
In the present paper, by considering the residual controls (1.9), we continue our study on the convergence criterion of
the inexact Newton method. Under the assumption of the Lipschitz continuity of F (x0 )−1 F , we establish a Kantorovichtype convergence criterion for the inexact Newton method Algorithm A[rn ; x0 ]. Note that in the special case when β = 0,
(1.9) reduces to Guo’s controls (1.6) and our result extends the corresponding one in [10]. In particular, we get the following
convergence criterion:
αγ (1 − η)2
,
(1 + η)(2(1 + η) − η(1 − η)2 )
(1.10)
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
1601
which is better than (1.7) for η ∈ [0, 12 ] as showed in the last section. For the case when β = 1, our result in the present
paper is better than the corresponding one in paper [15]. Moreover, for the case when 0 < β < 1, our main result is also
sharper than the corresponding one in paper [15] for the case when γ 1. It is worth to be remarked that our results
obtained in the present paper include Kantorovich’s theorem as a special case. Our approach is based on the construction
of the majorizing function, which is completely different from that used in [10].
2. Preliminaries
Let X and Y be Banach spaces. Throughout the whole paper, we use B(x, r ) to stand for the open ball in X with center
x and radius r > 0, and assume that 0 β 1. Let σ , λ and θ be positive constants. We define two real-valued functions
ϕβ and ψβ respectively by
ϕβ (t ) =
σ
2
t 2 + 21−β θ t 1+β − (1 + θ)t + λ
for each t 0
(2.1)
and
σ
t 2 + θ t 1+β − (1 + θ)t + λ for each t 0.
2
The derivatives of ϕβ and ψβ are respectively
ψβ (t ) =
(2.2)
ϕβ (t ) = σ t + 21−β (1 + β)θ t β − (1 + θ) for each t 0
(2.3)
ψβ (t ) = σ t + (1 + β)θ t β − (1 + θ) for each t 0.
(2.4)
−ψβ (t ) −ϕβ (t ) for each t 0.
(2.5)
and
Hence
Furthermore, we assume that θ < 1 in the case when β = 0. By (2.3), the derivative ϕβ is strictly increasing and has
the values ϕβ (0) < 0 and ϕβ (+∞) = +∞. It follows that the equation ϕβ (t ) = 0 has a unique positive solution. In the
remainder, we denote the solution by r∗ , that is, r∗ satisfies
ϕβ (r∗ ) = σ r∗ + 21−β (1 + β)θ r∗β − (1 + θ) = 0.
(2.6)
Write
σ
1+β
r 2 + β 21−β θ r∗ .
2 ∗
Thus, we are ready to prove the following lemma which describes some useful properties of the function
b :=
(2.7)
ϕβ .
Lemma 2.1. Let ϕβ be defined by (2.1). Then ϕβ is strictly decreasing on [0, r∗ ] and satisfies that
ϕβ (r∗ ) = λ − b.
(2.8)
Moreover, if
λ b,
(2.9)
then ϕβ has a unique zero t ∗ in [0, r∗ ].
Proof. By (2.3) and (2.6),
ϕβ (r∗ ) =
ϕβ is negative on [0, r∗ ); hence ϕβ is strictly decreasing on [0, r∗ ]. Since, by (2.1),
σ
1+β
r 2 + 21−β θ r∗
2 ∗
1−β
= σ r∗ + 2
− (1 + θ)r∗ + λ
β
(1 + β)θ r∗
− (1 + θ) r∗ −
σ
2
1−β
r + β2
2 ∗
1+β
θ r∗
+ λ,
it follows from (2.6) and (2.7) that (2.8) holds.
Now assume that (2.9) holds. Then, ϕβ (r∗ ) 0 by (2.8). Note that ϕβ is strictly decreasing on [0, r∗ ] and that
Therefore, ϕβ has a unique zero t ∗ in [0, r∗ ], and the proof is complete. 2
(2.10)
ϕβ (0) > 0.
Let t 0 = 0 and let {tn } denote the sequence generated by
tn+1 = tn −
ϕβ (tn )
ψβ (tn )
for each n = 0, 1, . . . .
The convergence property of the sequence {tn }, which will play a key role, is described in the following lemma.
(2.11)
1602
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
Lemma 2.2. Suppose that (2.9) holds. Let t ∗ be the unique zero of ϕβ in [0, r∗ ] and let {tn } be the sequence generated by (2.11). Then
tn < tn+1 < t ∗
Consequently, {tn }
tn+1 − tn for each n ∈ N.
(2.12)
converges increasingly to t ∗ . Moreover, in the special case when
λ
1+θ
β > 0,
for each n ∈ N.
(2.13)
Proof. We will verify (2.12) by mathematical induction. To prove (2.12) holds for n = 0, we note that
t1 =
⎧
⎨ λ,
β = 0;
λ
⎩
, β > 0.
1+θ
(2.14)
Then t 1 r∗ . In fact, if β > 0, then by (2.7) and (2.6),
b=
σ
1+β
r 2 + β 21−β θ r∗
2 ∗
β
σ r∗ + 21−β (1 + β)θ r∗ r∗ = (1 + θ)r∗ ;
(2.15)
b
λ
b
r∗ and so t 1 = 1+θ
1+θ
r∗ by (2.9). If β = 0, then by (2.7) and (2.6),
hence 1+θ
b=
σ
r 2 σ r∗2 = (1 − θ)r∗ r∗ ;
2 ∗
(2.16)
hence t 1 = λ b r∗ by (2.9).
Furthermore, since
⎧σ
2
⎪
β = 0;
⎪
⎨ 2 λ + θλ > 0,
2
1+β
ϕβ (t 1 ) =
λ
λ
σ
⎪
⎪
⎩
+ 21−β θ
> 0, β > 0,
2 1+θ
1+θ
(2.12) holds for n = 0 because ϕβ is strictly decreasing on [0, r∗ ] and
Now assume that t 1 < t 2 < · · · < tn < t ∗ . Then,
ϕβ (t ∗ ) = 0.
ϕβ (tn ) > 0
(2.17)
and tn < r∗ . Thus, thanks to (2.5), (2.6) and the fact that ϕβ is strictly increasing, one has
−ψβ (tn ) −ϕβ (tn ) > −ϕβ (r∗ ) = 0.
(2.18)
This and (2.17) imply that
tn+1 = tn −
ϕβ (tn )
> tn .
ψβ (tn )
On the other hand, by (2.5) and (2.17), tn+1 tn − ϕβ (tn )/ϕβ (tn ). Since the function t → t − ϕβ (t )/ϕβ (t ) is monotonically
increasing on [0, t ∗ ], it follows that
tn+1 tn −
ϕβ (tn )
ϕβ (t ∗ )
< t∗ − ∗ = t∗.
ϕβ (tn )
ϕβ (t )
Hence, (2.12) holds for each n 0. This means that {tn } converges increasingly to a point, say ς . Clearly ς ∈ [0, t ∗ ] and ς
is a zero of ϕβ (cf. (2.11)). Noting that t ∗ is the unique zero of ϕβ in [0, r∗ ], one has that ς = t ∗ and so {tn } converges
increasingly to t ∗ .
Finally, suppose that β > 0. By (2.5) and (2.17),
tn+1 − tn = −
ϕβ (tn )
ϕβ (tn )
ϕβ (t 0 )
− − ψβ (tn )
ϕβ (tn )
ϕβ (t 0 )
for each n = 0, 1, . . . ,
(2.19)
where the last inequality holds because the function M defined by M (t ) := −ϕβ (t )/ϕβ (t ) for each 0 t t ∗ is monotonically decreasing. Noting that −ϕβ (t 0 )/ϕβ (t 0 ) = λ/(1 + θ), we see that (2.13) holds by (2.19). This completes the proof.
2
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
1603
3. Kantorovich-type convergence criterion
Recall that F : D → Y is an operator with continuous Fréchet derivative denoted by F . Let x0 ∈ D be such that the
inverse F (x0 )−1 exists. We say that F (x0 )−1 F satisfies Lipschitz condition on B(x0 , r ) with the Lipschitz constant γ if
F (x0 )−1 F (x) − F ( y ) γ x − y for all x, y ∈ B(x0 , r ).
(3.1)
Assume that B(x0 , γ1 ) ⊆ D throughout the whole paper. The following lemma can be proved by Banach’s Lemma with a
standard argument, see for example [21].
Lemma 3.1. Let r γ1 and let x ∈ B(x0 , r ). Then F (x) is invertible and satisfies that
−1 F (x) F (x0 ) 1
1 − γ x − x0 (3.2)
.
For the remainder of the present paper, we assume that the residuals {rn } satisfy (1.9) and that
η := supn0 ηn < 1. Let
α := F (x0 )−1 F (x0 ).
(3.3)
Without loss of generality, we may assume throughout that x0 is not a zero of F . This means that
σ :=
and
1/(1+β)
γ (1 + η
)
,
(1 + αγ η(1 + η))(1 + η(α β − 1))
θ :=
η
1 + η(α β − 1)
α (1 + η),
β = 0;
α 1 + η1/(1+β) (1 + θ), β > 0.
λ :=
α > 0. Write
(3.4)
(3.5)
Recall that r∗ is the unique positive zero of ϕβ (that is, r∗ satisfies (2.6)) and that {tn } denotes the sequence generated by
(2.11) with σ , θ and λ given by (3.4) and (3.5). The main theorem now can be stated as follows.
Theorem 3.1. Let σ and θ be given by (3.4). Suppose that F (x0 )−1 F satisfies the Lipschitz condition on B(x0 , t ∗ ) with the Lipschitz
constant γ . Suppose that
⎧
⎪
⎪
⎪
⎨
(1 − η)2
,
β = 0;
γ (1 + η)(2(1 + η) − η(1 − η)2 )
α
⎪
σ r∗2 + 2β 21−β θ r∗1+β
⎪
−1/(1+β) , β > 0.
⎪
⎩ min
,
η
2(1 + θ)(1 + η1/(1+β) )
(3.6)
Let {xn } be the sequence generated by Algorithm A[rn ; x0 ]. Then {xn } converges to a solution x∗ of (1.1) and the following assertion
holds:
xn − x∗ t ∗ − tn for each n ∈ N.
(3.7)
Proof. Let b be defined by (2.7). We assert that
⇐⇒
condition (3.6) holds
λ b and α β ηβ/(1+β) 1 .
(3.8)
In the case when β > 0, (3.8) is a direct consequence of the definitions of b and λ respectively given in (2.7) and (3.5). In
the case when β = 0, (2.6) and (2.7) reduce to
σ r∗ = 1 − θ and b =
σ
r2.
2 ∗
Furthermore,
σ=
γ (1 + η)
1 + αγ η(1 + η)
,
θ = η and λ = α (1 + η).
It follows that
b=
σ
2
r∗2 =
Consequently,
(1 − θ)2
2σ
=
(1 − η)2 (1 + αγ η(1 + η))
.
2γ (1 + η)
1604
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
α (1 + η) (1 − η)2 (1 + αγ η(1 + η))
2γ (1 + η)
⇐⇒
λ b.
(3.9)
This is equivalent to the assertion (3.8) because α β ηβ/(1+β) 1 holds automatically for β = 0. Thus (3.8) is proved and
Lemma 2.2 is applicable to concluding that (2.12) and (2.13) hold. To complete the proof, it is sufficient to verify that
1 + η1/(1+β) F (x0 )−1 F (xn ) tn+1 − tn
1 − γ tn
(3.10)
xn+1 − xn tn+1 − tn
(3.11)
and
hold for each n ∈ N. In fact, granting this, one has from (3.11) that
xn+m − xn m
xn+i − xn+i −1 i =1
m
(tn+i − tn+i −1 ) = tn+m − tn .
(3.12)
i =1
Therefore {xn } is a Cauchy sequence and (3.7) holds by letting m → ∞ in (3.12). Moreover, letting n → ∞ in (3.10) shows
that the limit x∗ is a solution of (1.1).
Below, we will prove (3.10) and (3.11) by mathematical induction. Note first that, by (1.9), if n ∈ N is such that xn is
well-defined, then
F (x0 )−1 rn ηn F (x0 )−1 F (xn )1+β η F (x0 )−1 F (xn )1+β .
Since t 1 = α (1 + η
(3.13)
1/(1+β)
), (3.10) is clear for n = 0 by the definition of α . By (3.13),
x1 − x0 = − F (x0 )−1 F (x0 ) + F (x0 )−1 r0 α + ηα 1+β .
Using (3.6), one has that
(3.14)
ηα β η1/(1+β) . This together with (3.14) yields that
x1 − x0 α + ηα 1+β α 1 + η1/(1+β) = t 1 − t 0 .
This shows that (3.11) holds for n = 0. Assume now that (3.10) and (3.11) hold for all n m − 1. We have to prove (3.10)
and (3.11) hold for n = m. For this end, we apply Algorithm A[rn ; x0 ] to get that
F (xm ) = F (xm ) − F (xm−1 ) − F (xm−1 )(xm − xm−1 ) + rm−1
1
=
τ
F xm
−1 dτ (xm − xm−1 ) − F (xm−1 )(xm − xm−1 ) + rm−1 ,
0
where xnτ = xn + τ (xn+1 − xn ) for each 0 τ 1. Hence,
1
τ −1
F (x0 )−1 F (xm ) F xm−1 − F (xm−1 ) dτ (xm − xm−1 ) + F (x0 )−1 rm−1 .
F (x0 )
(3.15)
0
Below we will show that
1
γ
τ −1
F xm−1 − F (xm−1 ) (xm − xm−1 ) dτ (tm − tm−1 )2
F (x0 )
2
(3.16)
0
and
F (x0 )−1 rm−1 η 1 + η1/(1+β) −1−β (tm − tm−1 )1+β .
(3.17)
Since (3.11) holds for all n m − 1, one has that
τ
x
m−1
−2
m
−2
m
− x0 xi +1 − xi + τ xm − xm−1 (t i +1 − t i ) + τ (tm − tm−1 ).
i =0
(3.18)
i =0
Hence, by (2.12),
τ
x
m−1
In particular,
− x0 τ tm + (1 − τ )tm−1 < t ∗ .
(3.19)
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
xm − x0 tm < t ∗ and xm−1 − x0 tm−1 < t ∗ .
1605
(3.20)
Thus, applying the Lipschitz condition, we get that
1
τ −1
F
(
x
)
F
x
F
(
x
)
(
x
−
x
)
d
τ
−
0
m−1
m
m−1
m−1
0
1
τ
γ xm
−1 − xm−1 xm − xm−1 dτ
0
=
γ
2
γ
2
xm − xm−1 2
(tm − tm−1 )2 ,
(3.21)
where the last inequality holds because of (3.11) (with n = m − 1). This completes the proof of (3.16). To show (3.17), we
note by (3.10) (with n = m − 1) that
F (x0 )−1 F (xm−1 ) 1 + η1/(1+β) −1 (tm − tm−1 ).
Thus, we have by (3.13) that
F (x0 )−1 rm−1 η F (x0 )−1 F (xm−1 )1+β η 1 + η1/(1+β) −1−β (tm − tm−1 )1+β .
(3.22)
Therefore, (3.17) is proved. Combining (3.15)–(3.17), one has that
F (x0 )−1 F (xm ) γ (tm − tm−1 )2 + η 1 + η1/(1+β) −1−β (tm − tm−1 )1+β .
2
(3.23)
Hence,
1 + η1/(1+β) F (x0 )−1 F (xm ) 1 − γ tm
γ (1 + η1/(1+β) )
η(1 + η1/(1+β) )−β
(tm − tm−1 )2 +
(tm − tm−1 )1+β .
2(1 − γ tm )
1 − γ tm
(3.24)
We claim that
σ (1 − γ t ) −ψβ (t )γ 1 + η1/(1+β) 0 for each t ∈ [t 1 , r∗ ].
(3.25)
In fact, let t ∈ [t 1 , r∗ ]. Then, by the definition of λ,
α 1 + η1/(1+β) = t 1 t .
Since
(3.26)
η η1/(1+β) (noting η < 1), it follows that
1 + αγ η(1 + η) 1 + αγ η1/(1+β) 1 + η1/(1+β) 1 + γ η1/(1+β) t .
Noting that θ
= η/(1 + η(α β
1
1 + η(α β − 1)
(3.27)
− 1)) and thanks to (3.26), one has that
= 1 + θ 1 − αβ
β
1 + θ − (1 + β)θ α β 1 + η1/(1+β)
1 + θ − (1 + β)θ t β .
(3.28)
In view of (2.4), ψβ (t ) = σ t + (1 + β)θ t β − (1 + θ). This and (3.28) imply that
1
1 + η(α β − 1)
Noting that
σ
−ψβ (t ) + σ t .
(3.29)
σ = γ (1 + η1/(1+β) )/((1 + αγ η(1 + η))(1 + η(α β − 1))), we use (3.28) and (3.27) to get that
γ (1 + η1/(1+β) )(−ψβ (t ) + σ t ) γ (1 + η1/(1+β) )(−ψβ (t ) + σ t )
=
,
1 + γ η1/(1+β) t
1 − γ t + γ (1 + η1/(1+β) )t
which is equivalent to
σ (1 − γ t ) −ψβ (t )γ 1 + η1/(1+β) .
(3.30)
1606
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
Since −ψβ (t ) −ϕβ (t ) −ϕβ (r∗ ) = 0, one sees that (3.25) is proved and the claim stands. In particular, (3.25) implies that
σ (1 − γ tm ) −ψβ (tm )γ (1 + η1/(1+β) ) and so
γ (1 + η1/(1+β) )
σ
,
1 − γ tm
−ψβ (tm )
(3.31)
because tm ∈ (t 1 , r∗ ) and −ψβ (tm ) > 0. Consequently, in view of the definitions of
σ and θ ,
η(1 + η1/(1+β) )−β
η(1 + η1/(1+β) )−β σ
1 − γ tm
−ψβ (tm )γ (1 + η1/(1+β) )
=
η(1 + η1/(1+β) )−β
−ψβ (tm )(1 + αγ η(1 + η))(1 + η(α β
− 1))
θ
.
−ψβ (tm )
(3.32)
This together with (3.31) and (3.24) implies
1 + η1/(1+β) F (x0 )−1 F (xm ) 1 − γ tm
σ (tm − tm−1 )2
−2ψβ (tm )
Noting the following well-known inequality
t 1+β + (1 + β)t 21−β (1 + t )1+β − 1
+
θ(tm − tm−1 )1+β
.
−ψβ (tm )
(3.33)
for each t 0,
we have
tm − tm−1
1+β
+ (1 + β)
tm−1
tm − tm−1
tm−1
21−β
1+
tm − tm−1
1+β
tm−1
−1 .
(3.34)
It follows that
1+β
(tm − tm−1 )1+β 21−β tm
1+β
− 21−β tm−1 − (1 + β)tm−1 (tm − tm−1 ).
β
(3.35)
Combining this with (3.33), one has that
1 + η1/(1+β) F (x0 )−1 F (xm )
1 − γ tm
−
=−
1
ψβ (tm )
σ
2
1+β
(tm − tm−1 )2 + 21−β θ tm
1+β
β
− 21−β θ tm−1 − (1 + β)θ tm−1 (tm − tm−1 )
ϕβ (tm ) − ϕβ (tm−1 ) − ψβ (tm−1 )(tm − tm−1 )
ψβ (tm )
= tm+1 − tm ;
(3.36)
and hence (3.10) holds for n = m. To verify (3.11) for n = m, we take t = t ∗ in (3.25) to get
σ 1 − γ t ∗ −ψβ t ∗ γ 1 + η1/(1+β) −ϕβ t ∗ γ 1 + η1/(1+β) −ϕβ (r∗ )γ 1 + η1/(1+β) = 0,
(3.37)
which implies t ∗ γ1 . Thus, by (3.20), Lemma 3.1 is applicable to concluding that F (xm )−1 exists and
F (xm )−1 F (x0 ) 1
1 − γ xm − x0 1
1 − γ tm
.
(3.38)
Then by (3.13), we conclude that
xm+1 − xm F (xm )−1 F (x0 ) F (x0 )−1 F (xm ) + F (x0 )−1 rm 1
F (x0 )−1 F (xm ) + η F (x0 )−1 F (xm )1+β .
1 − γ tm
(3.39)
We assert that
1+β
η F (x0 )−1 F (xm )
η1/(1+β) F (x0 )−1 F (xm ).
(3.40)
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
1607
Indeed, (3.40) is trivial in the case when β = 0. For the case when β > 0, by (3.10) (with n = m) and (2.13),
β
ηβ/(1+β) F (x0 )−1 F (xm ) ηβ/(1+β) (1 − γ tm )β
ηβ/(1+β) λβ
β
(
t
−
t
)
.
m
+
1
m
(1 + η1/(1+β) )β
(1 + η1/(1+β) )β (1 + θ)β
Recalling that λ = α (1 + η1/(1+β) )(1 + θ), it follows from (3.8) that
β
ηβ/(1+β) F (x0 )−1 F (xm ) ηβ/(1+β) λβ
(1 + η
1/(1+β) )β (1 + θ)β
= α β ηβ/(1+β) 1,
which implies (3.40) holds. Hence
xm+1 − xm 1 + η1/(1+β) F (x0 )−1 F (xm ) tm+1 − tm ,
1 − γ tm
(3.41)
thanks to (3.10) proved just for n = m. Therefore, (3.11) holds for n = m and the proof is complete.
2
In particular, in the case when ηn ≡ 0, Algorithm A[rn ; x0 ] reduces to Newton’s method, and furthermore one has σ = γ ,
θ = 0, λ = α . Thus r∗ = γ1 . In this case the sequence {tn } defined by (2.11) reduce to Newton’s sequence with initial point
t 0 = 0. Hence, by a well-known result, see for example [9,13,20] and [21], we have that
n
ξ2
−1
t̂ ∗ − tn = 2n −1
j =0
where
∗
t̂ =
1−
ξj
t̂ ∗
for each n ∈ N,
1 − 2αγ
and
γ
ξ=
1−
1+
(3.42)
1 − 2αγ
1 − 2αγ
(3.43)
.
Thus, Theorem 3.1 reduces to Kantorovich’s theorem.
Corollary 3.1. Let ηn ≡ 0 for each n 0. Suppose that F (x0 )−1 F satisfies the Lipschitz condition (3.1) with r = (1 −
and that
1
0 < αγ 2
1 − 2αγ )/γ
.
Then Newton’s sequence {xn } converges to a solution x∗ of (1.1) satisfying
2n −1
xn − x∗ ξ n
t̂ ∗ ,
2 −1 j
j =0 ξ
where ξ and t̂ ∗ are defined by (3.43).
Note that the convergence criterion given in Theorem 3.1 is implicit in the case when β > 0. An explicit convergence
criterion is given in the following corollary.
Corollary 3.2. Suppose that β > 0 and that F (x0 )−1 F satisfies the Lipschitz condition (3.1) on B(x0 , t ∗ ). Suppose also that
α min
21−1/β
μ1/β ν 1+1/β
−
21−1/β η
μ1+1/β ν 2+1/β
−
21−2/β γ
μ2/β ν
,
2/β
η−1/(1+β) ,
(3.44)
where μ = γ + (1 + β)η and ν = 1 + η1/(1+β) . Let {xn } be the sequence generated by Algorithm A[rn ; x0 ]. Then {xn } converges to a
solution x∗ of (1.1) and the following assertion holds:
xn − x∗ t ∗ − tn for each n ∈ N.
Proof. By Theorem 3.1, it suffices to prove that (3.44) implies (3.6) in the case when β > 0. To do this, we suppose that
(3.44) holds. Then
1 + ηα β
να 21−1/β (1 + ηα β )
ν
1/β
−
21−1/β η(1 + ηα β )
1+1/β
1+1/β
−
21−2/β γ (1 + ηα β )
ν
μ2/β ν 2/β−1
21−1/β (1 + ηα β )
21−1/β η
21−2/β γ
− 1+1/β 1+1/β − 2/β 2/β−1
.
1
/(β)
1
/β
μ
ν
μ ν
(1 + αγ η(1 + η))
μ
ν
μ
1/β
μ
(3.45)
1608
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
ν = 1 + η1/(1+β) , one has by the definitions of σ , θ and λ (cf. (3.4) and (3.5)) that
γν
,
1 + η αβ − 1 σ =
(1 + αγ η(1 + η))
β
1 + η α − 1 θ = η,
Noting that β > 0 and
1+η
α β − 1 (1 + θ) = 1 + ηα β
and
1+η
ϕβ defined by (2.2), we get that, for each t > 0,
Substituting these into
1+η
α β − 1 λ = 1 + ηα β να .
α β − 1 ϕβ (t ) =
γν
t2
+ η21−β t 1+β − 1 + ηα β t + 1 + ηα β να .
(1 + αγ η(1 + η)) 2
Consequently,
1+η
=
=
α β − 1 ϕβ 21−1/β μ−1/β ν −1/β
1+β
(21−1/β μ−1/β ν −1/β )2
γν
+ η21−β 21−1/β μ−1/β ν −1/β
(1 + αγ η(1 + η))
2
− 1 + ηα β 21−1/β μ−1/β ν −1/β + 1 + ηα β να ,
21−2/β γ
μ2/β ν 2/β−1 (1 + αγ η(1 + η))
+
21−1/β η
μ1+1/β ν 1+1/β
−
21−1/β (1 + ηα β
μ1/β ν 1/β
+ 1 + ηα β να .
This together with (3.45) implies that ϕβ (21−1/β μ−1/β ν −1/β ) 0. By the definition of r∗ , r∗ is the minimizer of
(0, +∞). It follows that ϕβ (r∗ ) ϕβ (21−1/β μ−1/β ν −1/β ) 0 and so λ b by (2.8), that is,
α
ϕβ on
σ r∗2 + 2β 21−β θ r∗1+β
.
2(1 + θ)(1 + η1/(1+β) )
2
The proof is complete.
Below we consider the special case when β = 1. Then,
√
σ=
γ (1 + η)
,
(1 + αγ η(1 + η))(1 + η(α − 1))
λ=
α (1 + η)(1 + αη)
.
1 + η(α − 1)
and
θ=
η
1 + η(α − 1)
(3.46)
√
(3.47)
Note that in this case the sequence {tn } defined by (2.11) also reduces to Newton’s sequence with initial point t 0 = 0. Hence,
as have noted in Corollary 3.1, we have that
n
ξ2
−1
t̂ ∗ − tn = 2n −1
j =0
where
∗
t̂ =
1+θ −
ξj
t̂ ∗
for each n ∈ N,
(1 + θ)2 − 2(σ + 2θ)λ
σ + 2θ
(3.48)
(1 + θ)2 − 2(σ + 2θ)λ
and ξ =
.
1 + θ + (1 + θ)2 − 2(σ + 2θ)λ
1+θ −
(3.49)
We have the following corollary, which gives an explicit criterion sharper than the corresponding one in Corollary 3.2, and
an explicit estimate for the error xn − x∗ .
Corollary 3.3. Suppose that β = 1 and that F (x0 )−1 F satisfies the Lipschitz condition (3.1) with r = t̂ ∗ . Suppose also that
α min
2γ (1 +
√
1
√
η)2 + 4η(1 + η) − η
,√ .
1
η
Then the sequence {xn } generated by Algorithm A[rn ; x0 ] converges to a solution x∗ of (1.1) satisfying
2n −1
xn − x∗ ξ n
t̂ ∗ for each n ∈ N.
2 −1 j
j =0 ξ
where t̂ ∗ and ξ are defined by (3.49).
(3.50)
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
Proof. Using the expressions of
r∗ =
1+θ
σ + 2θ
=
1609
σ and θ in (3.46), we conclude from (2.6) that
1 + αη
.
√
γ (1 + η)/(1 + αγ η(1 + η)) + 2η
(3.51)
Therefore, noting that β = 1 and thanks to (3.46) and (3.51), the criterion (3.6) is equivalent to
α min
2γ (1 +
√
1 + αη
√
η)2 /(1 + αγ η(1 + η)) + 4η(1 + η)
1
,√
η
(3.52)
.
On the other hand, (3.50) is equivalent to
α min
Since
2γ (1 +
√
1 + αη
1
√
η)2 + 4η(1 + η)
,√
η
.
(3.53)
α > 0, (3.53) implies criterion (3.52); hence (3.6) holds. This completes the proof by Theorem 3.1 and (3.48). 2
4. Concluding remarks
By using a majorizing function technique, we have established in the previous section a general convergence criterion for
the inexact Newton method with the controls given by (1.9). Our approach is different from that employed by Argyros in
[1], where the Lipschitz continuity assumption on the second Fréchet derivative is assumed. The remainder of this section
is devoted to some comparisons of our results with some known results reported in [10] and [15].
For the special case when β = 0, Guo established in [10, Theorem 2.6] the following convergence criterion:
αγ (4η + 5)3 − 2η2 − 14η − 11
.
(1 + η)(1 − η)2
(4.1)
By Theorem 3.1 of the present paper, we have the following convergence criterion for β = 0:
αγ (1 − η)2
.
(1 + η)(2(1 + η) − η(1 − η)2 )
(4.2)
Define the functions g 1 and g 2 respectively by
g 1 (η) =
(4η + 5)3 − 2η2 − 14η − 11
(1 + η)(1 − η)2
g 2 (η) =
(1 − η)2
(1 + η)(2(1 + η) − η(1 − η)2 )
for each
η ∈ [0, 1)
for each
η ∈ [0, 1).
and
Then (4.1) and (4.2) are respectively equivalent to
αγ g1 (η) and αγ g2 (η).
The graphs of the functions g 1 and g 2 are illustrated in Fig. 1, from which one can see that the criterion (4.2) is sharper
than (4.1) for η ∈ [0, η0 ) while for η ∈ (η0 , 1) the criterion (4.1) is a little better than (4.2) but the difference is much
smaller, where η0 = 0.5267 . . . is the solution of the equation g 1 (η) = g 2 (η) in [0, 1].
For the case when β > 0, the authors of the present paper have established in [15] the following explicit convergence
criterion for the inexact Newton method under the ( L , p )-Hölder condition:
⎧
1−1/β −1/β −1/β β
⎪
1 + η1/(1+β)
γ + (1 + β)η
, 1 , 0 < β < 1,
⎪
⎨ (1 + β)(1 + η1/(1+β) ) min 2
α
1
1
⎪
⎪
⎩ min
,√ ,
β = 1.
√
η
2(γ + 2η)(1 + η)2
(4.3)
Consider the special case when β = 1. Since
2γ (1 +
√
1
√
η)2 + 4η(1 + η) − η
2γ (1 +
√
1
√
η)2 + 4η(1 + η)2
=
1
2(γ + 2η)(1 +
√
η)2
,
it follows that the convergence criterion (3.50) in Corollary 3.3 is sharper than that given in (4.3).
Now consider the case when 0 < β < 1. We assert that our convergence criterion (3.44) obtained in Corollary 3.2 is
sharper than the one in (4.3) in the case when γ 1. To show this, suppose that (4.3) holds and that γ 1. Recall that
μ = γ + (1 + β)η and ν = 1 + η1/(1+β) . Then μ, ν 1. Thus, thanks to the fact that 1 + β 21/β , one has
1610
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
Fig. 1. Graphs of g 1 (η) and g 2 (η).
1−
(1 + β)η
μ
(1 + β)γ
−
21/β μ1/β
=
γ
(1 + β)γ
−
0,
μ 21/β μ1/β
which is equivalent to
1−
η
γ
β
−
.
μ 21/β μ1/β 1 + β
Thus (4.3) gives
α
β 21−1/β
21−1/β
η
γ
21−1/β
21−1/β η
21−2/β γ
1
−
−
= 1/β 1+1/β − 1+1/β 1+1/β − 2/β 1+1/β .
1
/β
1
+
1
/β
1
/β
1
+
1
/β
1
/β
1
/β
μ 2 μ
(1 + β)μ ν
μ ν
μ ν
μ
ν
μ ν
Hence
α
21−1/β
−
μ1/β ν 1+1/β
21−1/β η
μ1+1/β ν 2+1/β
−
21−2/β γ
μ2/β ν 2/β
.
This shows that (3.44) is sharper than the one in (4.3) when γ 1.
We end this paper with an example for which Theorem 3.1 (with β = 0) is applicable but not [10, Theorem 2.6].
Example 4.1. Let R2 be endowed with the l2 -norm. Let x0 = (ξ10 , ξ20 ) T = (0.95, 0.96) T . Define F : R2 → R2 by
F (x) :=
Then
2
F (x) =
1
ξ12
−
1
2
ξ1 −ξ2
1
2
and
F (x) − F ( y ) =
ξ22 ,
1
2
T
for each x = (ξ1 , ξ2 ) T ∈ R2 .
50
for each x = (ξ1 , ξ2 ) T ∈ R2
0
ξ1 −
23
ξ1 − ζ1 ζ2 − ξ2
0
0
for each x = (ξ1 , ξ2 ) T , y = (ζ1 , ζ2 ) T ∈ R2 .
Thus, one has that
F (x0 )−1 F (x) − F ( y ) F (x0 )−1 F (x) − F ( y )
2
F
F
−1 2
F (x0 )
(ξ1 − ζ1 ) + (ξ2 − ζ2 )2
F
= F (x0 )−1 F x − y 2 .
W.P. Shen, C. Li / Applied Numerical Mathematics 59 (2009) 1599–1611
This means that F (x0 )−1 F satisfies Lipschitz condition (3.1) with the Lipschitz constant
α = F (x0 )−1 F (x0 )2 , we obtain
1611
γ = F (x0 )−1 F . Noting that
αγ = F (x0 )−1 F F (x0 )−1 F (x0 )2 = 0.149144421 . . . .
On the other hand, taking
η = 0.15, one has
(1 − 2η)
= 0.191370741 . . . > αγ
(1 + η)(2(1 + η) − η(1 − 2η)2 )
2
and
(4η + 5)3 − 2η2 − 14η − 11
= 0.128802424 . . . < αγ .
(1 + η)(1 − η)2
Thus, Theorem 3.1 (with β = 0) is applicable but not [10, Theorem 2.6].
References
[1] I.K. Argyros, A new convergence theorem for the inexact Newton methods based on assumptions involving the second Fréchet derivative, Comput.
Math. Appl. 37 (1999) 109–115.
[2] Z.Z. Bai, P.L. Tong, On the affine invariant convergence theorems of inexact Newton method and Broyden’s method, J. UEST China 23 (1994) 535–540.
[3] P.N. Brown, A.C. Hindmarsh, Matrix-free methods for stiff systems of ODEs, SIAM J. Numer. Anal. 23 (1986) 610–638.
[4] R.H. Chan, H.L. Chung, S.F. Xu, The inexact Newton-like method for inverse eigenvalue problem, BIT 43 (2003) 7–20.
[5] F. Cianciaruso, E. De Pascale, Newton–Kantorovich approximations when the derivative is Hölderian: old and new results, Numer. Funct. Anal. Optim. 24
(2003) 713–723.
[6] R.S. Dembo, S.C. Eisenstat, T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal. 19 (1982) 400–408.
[7] J.A. Ezquerro, M.A. Hernández, Generalized differentiability conditions for Newton’s method, IMA J. Numer. Anal. 22 (2002) 187–205.
[8] J.A. Ezquerro, M.A. Hernández, On an application of Newton’s method to nonlinear operators with w-conditioned second derivative, BIT 42 (2002)
519–530.
[9] W.B. Gragg, R.A. Tapia, Optimal error bounds for the Newton–Kantorovich theorem, SIAM J. Numer. Anal. 11 (1974) 10–13.
[10] X.P. Guo, On semilocal convergence of inexact Newton methods, J. Comput. Math. 25 (2007) 231–242.
[11] L.V. Kantorovich, On Newton’s method for functional equations, Dokl. Akad. Nauk SSSR (N.S.) 59 (1948) 1237–1240.
[12] C. Li, W.P. Shen, Local convergence of inexact methods under the Hölder condition, J. Comput. Appl. Math. 222 (2008) 544–560.
[13] A.M. Ostrowski, Solutions of Equations in Euclidean and Banach Spaces, Academic Press, New York, 1973.
[14] A.D. Polyanin, A.V. Manzhirov, Handbook of Integral Equations, CRC Press, Boca Raton, FL, 1998.
[15] W.P. Shen, C. Li, Convergence criterion of inexact methods for operators with Hölder continuous derivatives, Taiwan. J. Math. 12 (2008) 1865–1882.
[16] S. Smale, The fundamental theorem of algebra and complexity theory, Bull. Amer. Math. Soc. (N.S.) 4 (1981) 1–36.
[17] S. Smale, Newton’s method estimates from data at one point, in: The Merging of Disciplines: New Directions in Pure, Applied, and Computational
Mathematics, Laramie, WY, 1985, Springer, New York, 1986, pp. 185–196.
[18] S. Smale, Complexity theory and numerical analysis, Acta Numer. 6 (1997) 523–551.
[19] T. Steihaug, Quasi-Newton methods for large scale nonlinear problems, Ph.D. Thesis, School of Organization and Management, Yale University, 1981.
[20] X.H. Wang, Convergence of an iteration process, KeXue Tongbao 20 (1975) 558–559.
[21] X.H. Wang, Convergence of Newton’s method and inverse function theorem in Banach space, Math. Comput. 68 (1999) 169–186.
[22] X.H. Wang, Convergence of Newton’s method and uniqueness of the solution of equations in Banach space, IMA J. Numer. Anal. 20 (2000) 123–134.
[23] X.H. Wang, C. Li, Convergence of Newton’s method and uniqueness of the solution of equations in Banach spaces II, Acta Math. Sin. (Engl. Ser.) 19
(2003) 405–412.
[24] X.H. Wang, C. Li, Local and global behavior for algorithms of solving equations, Chinese Sci. Bull. 46 (2001) 441–448.
[25] T.J. Ypma, Local convergence of inexact Newton methods, SIAM J. Numer. Anal. 21 (1984) 583–590.
Download