Local and global behavior for algorithms of solving equations

REVIEWS · Chinese Science Bulletin Vol. 46 No. 6 March 2001

WANG Xinghua 1,2 & LI Chong 2,3
1. Department of Mathematics, Zhejiang University, Hangzhou 310028, China;
2. Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing 100080, China;
3. Department of Applied Mathematics, Southeast University, Nanjing 210096, China
Correspondence should be addressed to Wang Xinghua (e-mail: wangxh@mail.hz.zj.cn)
Abstract  The theory of "point estimate" and the concept of "general convergence", which were put forward by Smale in order to investigate the complexity of algorithms for solving equations, have had a deep impact on research into the local behavior, the semi-local behavior and the global behavior of iteration methods. The point estimate criterion he introduced not only provides a tool for quantitative analysis of the local behavior but also motivates the establishment of unified criteria for the semi-local behavior. Studying the global behavior from the viewpoint of discrete dynamical systems leads to many profound research subjects and opens up a rich and colorful prospect. In this review, we summarize the research progress and some applications in nonsmooth optimization.

Keywords: Banach space, nonlinear operator equation, point estimate, Sullivan domain, nonsmooth optimization.
Finding a solution of the nonlinear operator equation

f(x) = 0    (0.1)

in a Banach space X is a very general subject which is widely used in theoretical and applied areas of mathematics. Here we suppose that f is a nonlinear operator from some domain D in a real or complex Banach space X to another Banach space Y. The most important method for finding an approximate solution is the Newton method, defined by

x_{n+1} = x_n − f′(x_n)^{-1} f(x_n),  n = 0, 1, …,    (0.2)

for some initial value x_0 ∈ D. In general, the iteration sequence {x_n} generated by the Newton method (0.2) converges to a solution x* of eq. (0.1) at a quadratic rate, that is,

‖x_{n+1} − x*‖ = O(‖x_n − x*‖²),  n → ∞.    (0.3)
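As a concrete numerical illustration of the quadratic rate (0.3) (a minimal sketch of ours, not from the original survey; the scalar example f(x) = x² − 2 is an arbitrary choice):

```python
import math

def newton(f, fprime, x0, steps):
    """Newton iteration x_{n+1} = x_n - f'(x_n)^{-1} f(x_n) (scalar case)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

# Example: f(x) = x^2 - 2, zero x* = sqrt(2).
xs = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0, 5)
root = math.sqrt(2)
errors = [abs(x - root) for x in xs]
# The error is roughly squared at every step (quadratic convergence):
for e_prev, e_next in zip(errors, errors[1:]):
    print(f"{e_prev:.3e} -> {e_next:.3e}")
```

Each error is about the square of the previous one, so the number of correct digits roughly doubles per step.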
There are several kinds of higher-order generalizations of the Newton method. The two most important are the iterations of Euler's family and of Halley's family. The kth iteration of Euler's family is defined as the kth partial sum of the Taylor expansion of the local inverse f_x^{-1} of f around x, evaluated at f(x):

E_{k,f}(x) = x + Σ_{j=1}^{k} (1/j!) (f_x^{-1})^{(j)}(f(x)) (−f(x))^j.    (0.4)

When k = 1, we obtain the Newton iteration E_{1,f}(x) = N_f(x) = x − f′(x)^{-1} f(x).
When X = Y = ℝ or ℂ, the kth iteration H_{k,f}(x) of Halley's family is defined as a zero of the Padé approximant of order k to f at x; the numerator of this Padé approximant is a linear expression while its denominator is a polynomial of degree k − 1. H_{k,f}(x) is connected with the classical Bernoulli algorithm[1], and by this connection the general iteration of Halley's family can be given in Banach spaces[2]. When k = 1, we again obtain the Newton method: H_{1,f}(x) = N_f(x). Note that the iterations H_{k,f}(x) and E_{k,f}(x) both use the values of the derivatives of f at x up to order k and converge with order k + 1, which is the maximal order for the information set {f(x_n), f′(x_n), …, f^{(k)}(x_n)}[3]. Besides the Newton method, they are the most representative iterations.
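For scalar f, the k = 2 members of the two families have explicit formulas: E_{2,f} is Chebyshev's method and H_{2,f} is the classical Halley iteration. The following sketch (our illustration; the example f(x) = x³ − 2 is an arbitrary choice) shows both converging with order 3:

```python
def chebyshev_euler2(f, f1, f2, x):
    """Second member of Euler's family (Chebyshev's method):
    E_{2,f}(x) = x - f/f' - f'' f^2 / (2 f'^3)."""
    v, d1, d2 = f(x), f1(x), f2(x)
    return x - v / d1 - d2 * v * v / (2 * d1 ** 3)

def halley2(f, f1, f2, x):
    """Second member of Halley's family (classical Halley iteration):
    H_{2,f}(x) = x - 2 f f' / (2 f'^2 - f f'')."""
    v, d1, d2 = f(x), f1(x), f2(x)
    return x - 2 * v * d1 / (2 * d1 * d1 - v * d2)

# f(x) = x^3 - 2, zero x* = 2^(1/3); both methods use f, f' and f''.
f  = lambda x: x ** 3 - 2
f1 = lambda x: 3 * x ** 2
f2 = lambda x: 6 * x
root = 2 ** (1 / 3)

x_e = x_h = 1.0
for _ in range(4):
    x_e = chebyshev_euler2(f, f1, f2, x_e)
    x_h = halley2(f, f1, f2, x_h)
print(abs(x_e - root), abs(x_h - root))   # both at machine precision
```

Both iterations use f, f′, f″ (the derivatives up to order k = 2) and reach machine precision in a handful of steps, consistent with convergence order k + 1 = 3.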
In this review, based on Smale's creative work[4], we summarize the research progress made by the authors and their colleagues on the local behavior, the semi-local behavior and the global behavior of these iterations, as well as some applications in nonsmooth optimization.
1 Smale's point estimate theory

(i) Kantorovich's classical theory. The well-known work on the convergence of the Newton method due to Kantorovich[5,6], which is the beginning of the modern investigation of methods for solving equations, is a constant reference point. Under the hypothesis that f″ is bounded on D and that x0 is far enough from the boundary of D, this famous work transforms the convergence of the Newton method into the verification of a bound on a criterion at x0. The result combines proper hypotheses with a clear and definite conclusion, so that it has been applied and generalized widely.
(ii) Approximate zero. In the report written for the 20th International Congress of Mathematicians[4], Smale too admired Kantorovich's work very much. However, to investigate the complexity of algorithms for solving equations, he thought we should not be tied to Kantorovich's frame but should restart the investigation. In view of some unexpected later progress, Smale's opinion of restarting has proved truly farsighted. Smale[4] claimed that if there is an initial value x0 such that the sequence {x_n} generated by the Newton method from x0 satisfies

‖x_n − x*‖ ≤ q^{2^n − 1} ‖x_0 − x*‖    (1.1)

for q = 1/2, then the zero x* of f is determined effectively. Such an initial value x0 is called an approximate zero of the adjoint zero x*. (From refs. [4, 7—9] to refs. [10, 11], the formation of the concept of Smale's approximate zero went through a process[12]; here we adopt the final form from ref. [10]. By the way, we should point out that Chen[13] gave some other definitions of the approximate zero, but Theorem 7.2 in Wang's paper[14] indicates that these definitions are equivalent to Smale's definition.) Obviously, the complexity of solving equations lies in the cost of finding an approximate zero.

(iii) Point estimate. To this end, we need a criterion depending only on the information of f at one point x, together with a rule judging whether x is an approximate zero. To remove the dependence on domain information of f, Smale[4,8] supposed that f is analytic at x (and that the analyticity is not destroyed artificially inside the convergence ball of the Taylor series of f at x) and defined

γ_f(x) = sup_{k ≥ 2} ‖f′(x)^{-1} f^{(k)}(x)/k!‖^{1/(k−1)}.    (1.2)

Hereby he introduced the criteria α_f(x) = γ_f(x)‖f′(x)^{-1} f(x)‖ and δ_f(x) = γ_f(x*)‖x − x*‖, and proved that if δ = δ_f(x) ≤ (5 − √17)/4, then inequality (1.1) holds for q = δ/(1 − 4δ + 2δ²). In particular, if δ ≤ (3 − √7)/2, then x is an approximate zero of the adjoint zero x*. Moreover, the determination bound, as a constant, is exact. This result can be viewed as a description of the local behavior of the Newton method, which section 2 develops.

However, in order to yield inequality (1.1) and to judge x an approximate zero, the bound given by Smale for the criterion α_f(x) is not exact. Wang et al.[15] obtained the following exact result: if α = α_f(x) ≤ 3 − 2√2, then inequality (1.1) holds for

q = (1 − α − √(1 − 6α + α²)) / (1 − α + √(1 − 6α + α²)).

In particular, if α ≤ (13 − 3√17)/4, then x is an approximate zero. This result can be viewed as a description of the semi-local behavior of the Newton method, which section 3 develops.
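For a polynomial, the supremum in (1.2) runs over finitely many k, so α_f(x) is directly computable. The sketch below (our illustration; the cubic x³ − 2x + 2 and the point x0 = −1.7, near its real root, are arbitrary choices) checks the exact criterion α ≤ 3 − 2√2:

```python
from math import factorial, sqrt

def poly_derivs(coeffs, x):
    """All derivatives f^(k)(x) of a polynomial whose coefficients are
    listed from the highest degree down, by repeated differentiation."""
    ders = []
    c = list(coeffs)
    while c:
        v = 0.0
        for a in c:          # Horner evaluation of the current derivative
            v = v * x + a
        ders.append(v)
        m = len(c) - 1
        c = [a * (m - i) for i, a in enumerate(c[:-1])]
    return ders              # ders[k] = f^(k)(x)

def smale_alpha(coeffs, x):
    """alpha_f(x) = gamma_f(x) * |f'(x)^{-1} f(x)|, where gamma_f(x) is the
    maximum over 2 <= k <= deg f of |f^(k)(x)/(k! f'(x))|^{1/(k-1)}."""
    d = poly_derivs(coeffs, x)
    beta = abs(d[0] / d[1])
    gamma = max(abs(d[k] / (factorial(k) * d[1])) ** (1.0 / (k - 1))
                for k in range(2, len(d)))
    return gamma * beta

# f(x) = x^3 - 2x + 2 near its real root x* ~ -1.7693
alpha = smale_alpha([1.0, 0.0, -2.0, 2.0], -1.7)
print(alpha, alpha <= 3 - 2 * sqrt(2))   # x0 = -1.7 passes the criterion
```

Here α ≈ 0.056 < 3 − 2√2 ≈ 0.172, so this single evaluation at x0 certifies x0 = −1.7 as an approximate zero without any knowledge of x*.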
(iv) A problem of Smale. To explain the reasonableness of the criterion α_f(x), Smale[4] proved that

α_φ(t) ≥ 1,  φ(t) = Σ_{i=0}^{d} t^i,  d ≥ 1,  ∀t ∈ (0, 1),    (1.3)

where d ∈ ℕ ∪ {∞} and ℕ is the set of all natural numbers. Considering the extension of his criterion to Hilbert spaces, Smale presented an open problem in ref. [4]: whether inequality (1.3) remains true for φ(t) = (Σ_{i=0}^{d} t^{2i})^{1/2}. Wang et al.[16] proved that α_φ(t) < 1 whenever 0 < t < 1/√2, and so answered Smale's problem negatively.

(v) Global complexity of algorithms for solving equations. In the report[4], Smale gave an estimate of the global complexity of the method for solving equations and applied it to some linear programming problems. Wang et al.[16] found that the result was not perfect and completed it.

Most of the work around Smale's report[4] had been completed before 1992. The main results can be found in Wang's surveys[17,18].
2 Local behavior of iterations

(i) Local behavior. When x is near the solution x* of equation (0.1), what properties does an iteration have? For example, in the case when x* is a simple root, we have

‖E_{k,f}(x) − x*‖ = O(‖x − x*‖^{k+1}),  ‖H_{k,f}(x) − x*‖ = O(‖x − x*‖^{k+1}),    (2.1)

as x → x*; that is, the kth iterations of Euler's family and Halley's family, as shown above, are both of convergence order k + 1. These are local behaviors of the iterations. Of course, they are rather rough.

(ii) Convergence ball of the Newton method. Smale's point estimate criterion provides a tool for the quantitative study of the local behavior of iterations. Restated, his result gives the exact radius r = (5 − √17)/(4γ_f(x*)) of the convergence ball of the Newton method. However, Traub et al.[19] and Wang[20] had proved before that if, for a positive constant L,

‖f′(x*)^{-1}(f′(x) − f′(x′))‖ ≤ L‖x − x′‖,  ∀x, x′ ∈ D,    (2.2)

then inequality (1.1) holds for q = (3/2)Lρ(x0) provided that ρ(x0) = ‖x0 − x*‖ ≤ 2/(3L). Moreover, 2/(3L) is the best constant; in other words, under condition (2.2), the exact radius of the convergence ball of the Newton method is r = 2/(3L).
Wang[21] put forward the concept of the so-called "weak condition", which is weaker than the condition that f is analytic at x with γ_f(x) ≤ γ but still yields the same quantitative relations. For convenience, we first give the definition. We say that f satisfies the γ-condition of order k at x if

‖f′(x)^{-1} f^{(i)}(x)‖ ≤ i! γ^{i−1},  i = 2, …, k;
‖f′(x)^{-1} f^{(k+1)}(x′)‖ ≤ (k+1)! γ^k (1 − γ‖x′ − x‖)^{−k−2}.    (2.3)

Wang et al.[22] showed that the exact radius of the convergence ball of the Newton method is still r = (5 − √17)/(4γ) provided that f satisfies the γ-condition of order 1 at x*.
Furthermore, Wang[23] introduced the more general and weaker "radius Lipschitz condition with L average", which includes both condition (2.2) and the γ-condition of order 1 at x*:

‖f′(x*)^{-1}(f′(x) − f′(x_τ))‖ ≤ ∫_{τρ(x)}^{ρ(x)} L(u) du,  ∀x ∈ B(x*, r), 0 ≤ τ ≤ 1,    (2.4)

where L is a positive nondecreasing function and x_τ = x* + τ(x − x*). In this case, he proved that if r satisfies

(1/r) ∫_0^r (r + u) L(u) du ≤ 1,    (2.5)

then the Newton method with initial value x0 ∈ B(x*, r) converges and inequality (1.1) holds for

q = ∫_0^{ρ(x0)} u L(u) du / [ρ(x0) (1 − ∫_0^{ρ(x0)} L(u) du)].    (2.6)

Moreover, the value of r given by (2.5) is exact. Thus, taking L(u) = L a constant and L(u) = 2γ(1 − γu)^{−3} in (2.5), respectively, we immediately obtain the exact radii r = 2/(3L), given by Traub et al.[19] and Wang[20], and r = (5 − √17)/(4γ), given by Wang et al.[22] and Smale[8].
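The exactness claims after (2.5) can be checked numerically: substituting either claimed radius makes the left-hand side of (2.5) equal to 1. A small sketch of ours (midpoint quadrature; γ = 1 and the constant L = 2 are arbitrary choices):

```python
import math

def smale_L(gamma):
    """L(u) = 2*gamma / (1 - gamma*u)^3, the Smale-case average function."""
    return lambda u: 2 * gamma / (1 - gamma * u) ** 3

def lhs_25(r, L, n=100000):
    """(1/r) * integral_0^r (r + u) L(u) du, by the midpoint rule."""
    h = r / n
    s = sum((r + (i + 0.5) * h) * L((i + 0.5) * h) for i in range(n))
    return s * h / r

gamma = 1.0
r_smale = (5 - math.sqrt(17)) / (4 * gamma)     # claimed exact radius
val_smale = lhs_25(r_smale, smale_L(gamma))
print(val_smale)                                # ~ 1

L0 = 2.0                                        # constant L
val_const = lhs_25(2 / (3 * L0), lambda u: L0)
print(val_const)                                # ~ 1
```

In both cases the left-hand side sits at the boundary value 1, confirming that (5 − √17)/(4γ) and 2/(3L) are exactly the radii determined by (2.5).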
(iii) Uniqueness ball of the solution of the equation. In the convergence ball of the Newton method, the solution of eq. (0.1) is certainly unique. But could the uniqueness ball be larger? Could the conditions imposed on f be weaker? In fact, this is the case. Wang[23] introduced the "center Lipschitz condition with L average":

‖f′(x*)^{-1}(f′(x) − f′(x*))‖ ≤ ∫_0^{ρ(x)} L(u) du,  ∀x ∈ B(x*, r).    (2.7)

This is the case τ = 0 in (2.4), but here L is only required to be a positive integrable function. Under this condition, he proved that if r satisfies

(1/r) ∫_0^r (r − u) L(u) du ≤ 1,    (2.8)

then eq. (0.1) has a unique solution in B(x*, r), and the value of r given by (2.8) is exact. Similarly, taking L(u) to be a constant L, (2.8) gives the radius r = 2/L of the uniqueness ball of eq. (0.1) under condition (2.2), while taking L(u) = 2γ(1 − γu)^{−3}, (2.8) gives the radius r = 1/(2γ) of the uniqueness ball of eq. (0.1), which completely improves Dedieu's result[24] that eq. (0.1) has a unique solution in B(x*, (5 − √17)/(4γ)) in the case when γ_f(x*) ≤ γ.
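The radii claimed after (2.8) admit the same kind of numerical check as the convergence radii: at r = 1/(2γ) for L(u) = 2γ(1 − γu)^{−3}, and at r = 2/L for constant L, the left-hand side of (2.8) equals 1 (our sketch; γ = 1 and L = 2 arbitrary):

```python
def lhs_28(r, L, n=100000):
    """(1/r) * integral_0^r (r - u) L(u) du, by the midpoint rule."""
    h = r / n
    s = sum((r - (i + 0.5) * h) * L((i + 0.5) * h) for i in range(n))
    return s * h / r

gamma = 1.0
val_smale = lhs_28(1 / (2 * gamma),
                   lambda u: 2 * gamma / (1 - gamma * u) ** 3)
print(val_smale)                           # ~ 1 at r = 1/(2*gamma)

L0 = 2.0
val_const = lhs_28(2 / L0, lambda u: L0)
print(val_const)                           # ~ 1 at r = 2/L
```

Note that 1/(2γ) is more than twice the convergence radius (5 − √17)/(4γ) ≈ 0.219/γ, illustrating that the uniqueness ball is genuinely larger.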
(iv) There is no universal constant. It should be noted that the radius of the uniqueness ball of eq. (0.1) differs from that of the convergence ball of the Newton method even if f is analytic at x* with γ_f(x*) ≤ γ. Huang[25] obtained the radii of the convergence balls of the Euler method E_{2,f}^n(x0) and the Halley method H_{2,f}^n(x0), which are respectively 0.181462/γ and 0.164878/γ. Therefore, there is no universal constant for the iterations, which is quite different from the semi-local behavior of iterations studied in the next section.
3 Semi-local behavior of iterations

(i) Semi-local behavior. The properties of iteration methods that are determined completely by information at the initial value x0 are called the semi-local behavior. Kantorovich's theory is considered the successful model for investigating the semi-local behavior of iteration methods. But it also easily misled one into thinking that a more complicated bound is required to determine the convergence of iterations of higher order than the Newton method. The work growing out of Smale's α criterion corrected this misconception in time.

(ii) The universal constant. Wang et al.[2] showed that if α_f(x0) ≤ 3 − 2√2, then the iterations of Halley's family H_{k,f}^n(x0) and of Euler's family E_{k,f}^n(x0), as well as the Euler series E_{∞,f}(x0), all converge. This result first revealed the unified character of the semi-local behavior of the iterations for finding zeros of equations. Under the hypothesis that f is analytic at x0, 3 − 2√2 is a universal constant for judging convergence by Smale's α criterion.

(iii) The universal constant applies to operators of different smoothness. Note that Kantorovich's theorem only assumes that f″ is bounded. The analyticity assumption on f at x0 seems too strong for the study of the semi-local behavior of the Newton method. Is this strong condition necessary to obtain a universal constant? Wang[21] pointed out that it is not. The γ-condition of order k in the last section was introduced for this purpose. It was proved that for any given positive integer k, if f satisfies the γ-condition of order k at x0, then α = γ‖f′(x0)^{-1} f(x0)‖ ≤ 3 − 2√2 implies that eq. (0.1) has a unique solution in B(x0, r), where r1 ≤ r ≤ r2, and that for all 1 ≤ j ≤ k the iteration H_{j,f}^n(x0) of Halley's family converges, where

r1, r2 = (1 + α ∓ √(1 − 6α + α²)) / (4γ).    (3.1)

Recently Wang[26] proved that the above result remains true for Euler's family.
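The bounds in (3.1) can be checked against the Smale-case majorizing function: with L(u) = 2γ(1 − γu)^{−3}, the integral β − t + ∫_0^t (t − u)L(u) du evaluates in closed form to β − t + γt²/(1 − γt), whose two positive zeros should be exactly r1 and r2. A numerical check of ours (α = 0.1 and γ = 1 are arbitrary admissible choices):

```python
import math

def r1_r2(alpha, gamma):
    """The two bounds of eq. (3.1):
    (1 + alpha -/+ sqrt(1 - 6 alpha + alpha^2)) / (4 gamma)."""
    s = math.sqrt(1 - 6 * alpha + alpha ** 2)
    return (1 + alpha - s) / (4 * gamma), (1 + alpha + s) / (4 * gamma)

alpha, gamma = 0.1, 1.0           # alpha <= 3 - 2*sqrt(2) ~ 0.1716
beta = alpha / gamma              # alpha = gamma * ||f'(x0)^{-1} f(x0)||
h = lambda t: beta - t + gamma * t * t / (1 - gamma * t)

r1, r2 = r1_r2(alpha, gamma)
print(r1, r2, h(r1), h(r2))       # h vanishes at both radii
```

Both values annihilate the majorizing function, and at the critical value α = 3 − 2√2 the square root vanishes, so r1 = r2 and the existence and uniqueness radii coalesce.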
(iv) Unifying the Kantorovich condition and the Smale condition. Smale's condition has been weakened so that the semi-local behavior of the Newton method only needs the γ-condition of order 1,

‖f′(x0)^{-1} f″(x)‖ ≤ 2γ / (1 − γ‖x − x0‖)³.    (3.2)

It seems possible to unify this condition with Kantorovich's condition. To this end, similarly to (2.7), Wang[14] introduced the center Lipschitz condition with L average at x0:

‖f′(x0)^{-1}(f′(x) − f′(x0))‖ ≤ ∫_0^{ρ(x)} L(u) du,  ∀x ∈ B(x0, r),    (3.3)

where L is a positive integrable function. In this case, we proved that if

β = ‖f′(x0)^{-1} f(x0)‖ ≤ b = ∫_0^{r0} u L(u) du,    (3.4)

then eq. (0.1) has a unique solution in the ball B(x0, r):

x* ∈ B(x0 − f′(x0)^{-1} f(x0), r1 − β) ⊂ B(x0, r1),    (3.5)

where r0 satisfies ∫_0^{r0} L(u) du = 1 while r1 and r2 are the two positive zeros of
h(t) = β − t + ∫_0^t (t − u) L(u) du,    (3.6)

with r1 ≤ r0 ≤ r2. Similarly to (2.4), we again introduce the condition

‖f′(x0)^{-1}(f′(x) − f′(x′))‖ ≤ ∫_{ρ(x)}^{ρ(x x′)} L(u) du,
∀x ∈ B(x0, r), ∀x′ ∈ B(x, r − ρ(x)),    (3.7)

where L is a positive nondecreasing function and ρ(x x′) = ‖x′ − x‖ + ρ(x). In this case, we proved that if (3.4) holds, then the sequence {x_n} generated by the Newton method converges and satisfies

‖x* − x_n‖ ≤ (r1 − t_n) (‖x* − x_{n−1}‖ / (r1 − t_{n−1}))² ≤ (r1 − t_n) (‖x* − x_0‖ / r1)^{2^n},    (3.8)

where t_n = N_h^n(0), i.e. {t_n} is generated by the Newton method for h with initial value t_0 = 0. Taking L to be a constant and L(u) = 2γ(1 − γu)^{−3}, we obtain the results corresponding to the Kantorovich condition and the Smale condition, respectively.

It is worth noting that the two different problems depend on two different conditions: the existence and uniqueness of the solution of eq. (0.1) depend on condition (3.3), while the convergence of the Newton method depends on condition (3.7), and condition (3.7) implies condition (3.3). We again see the unity of the semi-local behavior: the two problems are judged by the same bound (3.4), while the more basic problem depends on the weaker condition.

It is expected that the convergence of the Euler method E_{2,f}^n(x0) and the Halley method H_{2,f}^n(x0) can be judged by the same bound (3.4) provided that condition (3.7) is strengthened properly. To this end, it suffices to assume that L′ is nondecreasing and that f satisfies

‖f′(x0)^{-1} f″(x0)‖ ≤ L(0),
‖f′(x0)^{-1}(f″(x) − f″(x′))‖ ≤ ∫_{ρ(x)}^{ρ(x x′)} L′(u) du.    (3.9)

Obviously, condition (3.9) implies condition (3.7). Now we again take L(u) = 2γ(1 − γu)^{−3} to obtain the weak Smale condition, and L(u) = γ + Cu to obtain the following condition of Kantorovich type:

‖f′(x0)^{-1} f″(x0)‖ ≤ γ,
‖f′(x0)^{-1}(f″(x) − f″(x′))‖ ≤ C‖x − x′‖.    (3.10)

Under condition (3.10), we have

r0 = 2 / (γ + √(γ² + 2C)),  b = 2(γ + 2√(γ² + 2C)) / (3(γ + √(γ² + 2C))²).    (3.11)

The determination bound (3.11) was first raised in our seminar. By this determination bound under condition (3.10) (and under the γ-condition by the determination bound b = (3 − 2√2)/γ), some young members of our group, such as D. F. Han, Z. D. Huang and K. W. Liang, investigated the semi-local behavior of many concrete iterations.

(v) Operator classes of Smale. The operators satisfying the condition γ_f(x) ≤ γ may be considered to define an operator class S^(∞). Likewise, the operators satisfying the γ-condition of order k at x define an operator class S^(k), larger than S^(∞). Clearly, as k increases, they form a monotonic inclusion chain:

S ⊃ S^(1) ⊃ S^(2) ⊃ … ⊃ S^(∞).    (3.12)

Roughly speaking, the largest class S is the operator class defined by (3.3) with L(u) = 2γ(1 − γu)^{−3}. We call them the operator classes of Smale. Besides, we can also obtain chains of generalized operator classes of Smale when the γ-condition is extended. For any chain of operator classes
of Smale, there exists a universal constant b such that the criterion

β = ‖f′(x0)^{-1} f(x0)‖ ≤ b    (3.13)

judges the existence and uniqueness of the zero of any operator in S, the convergence of the first k iterations of Euler's family and Halley's family for any operator in S^(k), and the convergence of all iterations of Euler's family and Halley's family, as well as of the Euler series and the Bernoulli algorithm, for any operator in S^(∞). Wang[27] made a preliminary study of the possible generalized operator classes of Smale, and further results are given in the recent paper "Unified criterion of iteration for zero point in Banach spaces".
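The Kantorovich-type bound (3.11) can be verified directly from its definition: with L(u) = γ + Cu, the value r0 must satisfy ∫_0^{r0} L(u) du = 1, and b must equal ∫_0^{r0} uL(u) du. A quick check of ours (γ = 1 and C = 2 are arbitrary positive choices; the integrals of the linear function L are written in closed form):

```python
import math

gamma, C = 1.0, 2.0
s = math.sqrt(gamma ** 2 + 2 * C)
r0 = 2 / (gamma + s)                              # eq. (3.11)
b  = 2 * (gamma + 2 * s) / (3 * (gamma + s) ** 2)  # eq. (3.11)

# exact integrals for L(u) = gamma + C*u:
int_L  = gamma * r0 + C * r0 ** 2 / 2              # should be 1
int_uL = gamma * r0 ** 2 / 2 + C * r0 ** 3 / 3     # should equal b
print(int_L, int_uL, b)
```

The printed values confirm ∫_0^{r0} L = 1 and ∫_0^{r0} uL = b, so (3.11) is just the normalization condition on r0 from section 3 specialized to the Kantorovich-type average L(u) = γ + Cu.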
4 Global behavior for polynomials

(i) From the viewpoint of discrete dynamical systems. Many iterations, such as those of Euler's family and Halley's family, are rational maps when f is a polynomial, so the results of Fatou and Julia on discrete dynamics can be applied. Investigating the global behavior of iterations from this viewpoint, we can grasp the main problems completely. For example, if a "dead cycle" appeared for some iteration, we might formerly have blamed a fault of the iteration itself. But from the viewpoint of discrete dynamics, a "dead cycle" is simply a periodic orbit. The closure of the repelling periodic points of a rational iteration is called its Julia set, which, for a typical iteration, is a nowhere dense perfect set with as many points as the reals. Therefore, it is not strange that a "dead cycle" appears for some iteration. In fact, there must be many dead cycles for every iteration.
(ii) No generally convergent one-point stationary algorithm. From the viewpoint of discrete dynamical systems, the most important problem is whether there exists any generally convergent one-point stationary algorithm. "General convergence" means that there exists an open set with full measure U ⊂ F_d × ℂ̄ such that the algorithm with initial value z converges to a zero of f for any (f, z) ∈ U, where F_d is the linear space of all polynomials of degree d and ℂ̄ = ℂ ∪ {∞}. McMullen[28] answered this problem negatively: he proved that there does not exist any generally convergent one-point stationary algorithm when d > 3.
(iii) Positive measure sets of initial values for which the Newton method fails. By the above fact, for every rational iteration there is an open set with positive measure on which the iteration fails. In particular, for the Newton method, Curry et al.[29], relying on Fatou's theorem[30] that the immediate basin of an attracting cycle of a rational iteration must contain at least one of its critical points, used computers to search for cubic polynomials having an initial value domain of positive measure on which the iteration does not converge. They also drew some pictures similar to the Mandelbrot set (see the color pictures in ref. [31]). Smale[32] directly constructed such a cubic polynomial:

f0(x) = (1/2)x³ − x + 1.

The Newton iteration for this polynomial has a superattractive periodic orbit {0, 1} of period 2. In the congress report "Dynamics of Newton's method on the Riemann sphere" given at the Fifth Conference of the Computational Mathematical Society of China in 1995[31], Wang generalized Smale's example: he constructed a cubic polynomial f_λ(z) such that Newton's method has a periodic orbit {0, 1} of period 2 with multiplier λ. Moreover, for different λ, he also drew four types of Sullivan domains that make the Newton method fail[33]. By Shishikura's theorem, a necessary condition for the Julia set to be disconnected is that there exist two fixed points with multiplier λ = 1 or |λ| > 1. This implies that the Julia set of the Newton iteration is connected (it has only one fixed point with |λ| > 1, namely ∞, while the others have |λ| < 1), so that it has no Herman ring.
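The failure of Newton's method on Smale's cubic can be observed directly: {0, 1} is a superattracting 2-cycle, and nearby initial values are trapped by the cycle rather than by a root (a small sketch of ours; the starting point 0.05 is an arbitrary value near the cycle):

```python
def newton_step(x):
    """Newton iteration for Smale's cubic f0(x) = x^3/2 - x + 1."""
    f  = x ** 3 / 2 - x + 1
    fp = 1.5 * x ** 2 - 1
    return x - f / fp

# {0, 1} is a period-2 cycle of the Newton map:
print(newton_step(0.0), newton_step(1.0))   # 1.0 0.0

# an initial value near the cycle is captured by it, not by a root:
x = 0.05
for _ in range(40):
    x = newton_step(x)
orbit = {round(x, 6), round(newton_step(x), 6)}
print(orbit)                                # {0.0, 1.0}
```

Since the cycle is superattracting, an entire open neighborhood of it (a set of positive measure) consists of initial values for which the Newton method never converges to a zero of f0.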
(iv) There is almost no phenomenon of phony convergence for the iterations of Halley's family. There is no end to finding initial value domains with positive measure on which an iteration fails to converge. One important problem is whether the periods of the periodic orbits corresponding to these Sullivan basins are all greater than 1, or whether some of them equal 1. If there is a periodic orbit of period 1, i.e. a fixed point, then there are many phenomena of "phony convergence". In general, the zeros of a polynomial are fixed points of the iteration. However, as the topological degree of the iteration is often greater than the degree of the polynomial (the topological degree of the Newton iteration equals the degree of the polynomial, which is its advantage), the iteration generally has some fixed points that are not zeros of the polynomial. We call such fixed points extraneous fixed points. A repelling extraneous fixed point need not worry us, since no initial value set of positive measure corresponds to it on which the iteration fails to converge. But for other fixed points, especially superattractive or attractive extraneous fixed points, when we take the initial value in the corresponding Sullivan domain, the iteration sequence converges to a point that is not a zero of the polynomial. This is the phenomenon of "phony convergence", also called a "numerical extraneous root". Smale[32] proved that the Newton iteration has no extraneous fixed point in the Gauss plane (this is seldom the case for other iterations). In addition, ∞ is a repelling extraneous fixed point of the Newton iteration, so the Newton iteration cannot diverge to ∞. Vrscay et al.[34] pointed out that the Halley iteration H_{2,P}^n(x) and the iterations H_{k,P}^n(x) of Halley's family for P(x) = x^d − 1 (d ≥ 2) have only repelling extraneous fixed points in the Gauss plane. Wang et al.[1] solved the problem completely: they proved that, for any polynomial f ∈ F_d, all the extraneous fixed points (at most (k−1)(d−1)+1 of them) of the kth iteration of Halley's family on the whole Riemann sphere are repelling. Therefore, there is almost no phony convergence for any iteration of Halley's family, and the phenomenon of "numerical extraneous roots" almost never appears.
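For the classical Halley iteration H_{2,f}, the finite extraneous fixed points are exactly the critical points of f that are not zeros (the numerator 2ff′ vanishes there while f ≠ 0), and the repelling character asserted above can be observed numerically: at a simple critical point the multiplier works out to 3. Our illustration below uses the arbitrary cubic f(x) = x³ − 3x + 5, whose critical points ±1 are not zeros:

```python
def halley(f, f1, f2, x):
    """Classical Halley iteration H_{2,f}(x) = x - 2 f f' / (2 f'^2 - f f'')."""
    v, d1, d2 = f(x), f1(x), f2(x)
    return x - 2 * v * d1 / (2 * d1 * d1 - v * d2)

f  = lambda x: x ** 3 - 3 * x + 5
f1 = lambda x: 3 * x ** 2 - 3
f2 = lambda x: 6 * x

# The critical points x = +/-1 are fixed points of H_{2,f}, not zeros of f:
fixed = [halley(f, f1, f2, 1.0), halley(f, f1, f2, -1.0)]
print(fixed)                      # [1.0, -1.0]

# Multiplier H'(c) at each extraneous fixed point, by central differences:
h = 1e-6
mults = [(halley(f, f1, f2, c + h) - halley(f, f1, f2, c - h)) / (2 * h)
         for c in (1.0, -1.0)]
print(mults)                      # both ~ 3: |multiplier| > 1, repelling
```

Since the multiplier exceeds 1 in modulus, no Fatou component of positive measure is attached to these fixed points, in line with the absence of phony convergence for Halley's family.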
(v) The phenomenon of phony convergence occurs for the iterations of Euler's family on an open set with positive measure. In contrast to Halley's family, every iteration of Euler's family except the Newton iteration has non-repelling extraneous fixed points of every type, as shown by Wang et al. in the paper "Dynamics of the iterations of Euler's family and Halley's family". In particular, for the Euler iteration E_{2,f}^n(x), we gave a polynomial having an extraneous fixed point with eigenvalue λ, where λ = r exp(2παi) is an arbitrary complex number[35]. Letting the parameters r and α in the expression of λ take different values, we obtain extraneous fixed points that are respectively superattractive (r = 0), attractive (0 < r < 1), rationally indifferent (r = 1, α a rational number) and Siegel points (r = 1, α an irrational number satisfying the Bryuno condition[36] (ref. [37])). The Sullivan domains adjoint to these points, that is, the superattractive domain, the attractive domain, the parabolic domain and the Siegel disk, which are drawn in the paper[35], are initial value domains with positive measure that make the Euler method fail. Taking the initial value in these Fatou domains (i.e. the superattractive, attractive and parabolic domains), the phenomenon of "phony convergence", that is, of a "numerical extraneous root", appears for the Euler method.

Plate I is the color picture of the Sullivan domains of the Euler iteration for the polynomial f0(x) = x³ + x² + 2x − 4. The size of its window is [−3, 3] × [−2, 2]. The red domain is the superattractive Sullivan domain corresponding to the extraneous fixed point z = 0. The three roots of f0(x) are respectively 1 and −1 ± i√3. This picture is helpful for viewing the color pictures in ref. [35], because there the size of the window is only [−0.2, 0.2] × [−0.2, 0.2], so that the three roots of f_λ(x) cannot be seen clearly. In fact, f_λ is just a perturbation of the coefficients of f0, so the locations of the three roots of f_λ are just perturbations of the corresponding three roots of f0.

One open problem is whether there exist Herman rings for the iterations of Euler's family.

5 Gauss-Newton method for convex composite optimization

(i) Convex composite optimization. Convex composite optimization is one of the main directions of nonsmooth optimization theory; such problems can be written as the following optimization problem in a Banach space:

min F(x) := h(f(x)),    (5.1)

where f is a continuously Fréchet differentiable operator from a real reflexive Banach space X to another Banach space Y, while h is a convex function on Y. This problem has recently received a great deal of attention, and justifiably so, since many problems in optimization theory, such as nonlinear inclusions, minimax problems and penalization methods, can be cast within this framework. Moreover, this model provides a unifying framework for the development and analysis of algorithmic solution techniques. The Gauss-Newton method is one of the main methods for solving convex composite optimization problems.

(ii) Gauss-Newton method. The Gauss-Newton method was proposed by Gauss in the early 19th century for solving the nonlinear least-squares problem

min f(x)^T f(x).    (5.2)

Discarding the term containing the second derivative of f(x), we obtain the basic Gauss-Newton method:

x_{n+1} = x_n − [f′(x_n)^T f′(x_n)]^{-1} f′(x_n)^T f(x_n).    (5.3)

Naturally, in the iteration (5.3), f′(x_n) needs to have full column rank. To relax this limitation, we linearize f and solve the linear least-squares problem

min (f(x_n) + f′(x_n)d)^T (f(x_n) + f′(x_n)d)    (5.4)

to get a solution d_n, and then set x_{n+1} = x_n + d_n. Using the notation of the generalized inverse, the Gauss-Newton method can be rewritten as

x_{n+1} = x_n − [f′(x_n)]^+ f(x_n).    (5.5)

Applying this idea to convex composite optimization problems, we get the Gauss-Newton method for solving problem (5.1): for ∆ > 0 and x ∈ X, let D_∆(x) denote the set of solutions of the optimization problem

min_{‖d‖ ≤ ∆} h(f(x) + f′(x)d),    (5.6)

and let d(x, C) denote the distance from x to C. Then the (n+1)th step of the Gauss-Newton method is

x_{n+1} = x_n + d_n,    (5.7)

where d_n ∈ D_∆(x_n) is chosen to satisfy ‖d_n‖ ≤ η d(0, D_∆(x_n)), with η ≥ 1 fixed beforehand.

(iii) Semi-local behavior in the normal case. When X = ℝⁿ and Y = ℝᵐ, using the idea of strong uniqueness from approximation theory, Womersley[38] gave a condition under which the Gauss-Newton method converges to the local solution x* at a quadratic rate, assuming that the local solution x* of the convex composite optimization problem (5.1) is strongly unique. This result generalizes the corresponding result[39] for the case when h is a norm. Compared with Womersley's work, Burke et al.[40] made great progress in this direction. Their research
is based on the following two basic assumptions: one is the
weak sharpness of the set C of minima for h, which is a
weakening and generalizing of the strong uniqueness, that
is, there exists λ > 0 such that
(5.8)
h(y) hmin+λd(y,C), ∀ y Y,
and the other is the existence of the regular points x of
the inclusion f(x) C. Under the two assumptions and the
condition that h and f ′ satisfy Lipschitz condition in the
(5.11)
T −1 ( x 0 ) y = {z ∈ X : f ′( x 0 ) z ∈ y + C} .
Let L be the nondecreasing function given in Section 3.
Suppose that
corresponding neighborhoods of f (x ) and x , respectively, Burke and Ferris provide a neighborhood of x ,
determined by some information on x , such that, with the
initial point x0 from this neighborhood, the sequence generated by the Gauss-Newton method converges to a solution x* of problem (5.1) at a quadratic rate. Of course, the
information on x is closely related to the Lipschitz constants of h and f ′ and the constant λ in (5.8). It is worth
noting that their result does not require that the solution of
the problem (5.1) is unique. Although the weak sharpness
along with the regularity of x* implies the strong uniqueness of the solution x* in the case that C is a singleton,
Burke and Ferris’ assumptions are more proper because
they are made on h and f, respectively. However, the assumption that the minima set C of h is a weak sharp minima is quite strong. In fact, if some partial derivative of h
exists, then its minima set C will not be weak sharp. The
recent work due to Li and Wang relaxes this restriction. In
the paper[41, 42], only in the assumption that the inclusion f(x)
C has the regular point x and f ′ satisfies Lipschitz
condition in some neighborhood of x , did we obtain the
same results for any convex function h. In this case, the
information on x is independent from h so as to be simpler and clearer.
( ) Semi-local behavior in the surjective case .
One problem worth being considered is whether the
majorizing function technique for the Newton method to
solve nonlinear equations in section 3 can be applied to the
Gauss-Newton method for complex convex optimization
problems. This work has been completed by Li and Wang
in the recent paper “Point estimate of the Gauss-Newton
method for complex convex optimization problems”. Assume that X, Y are two reflexive Banach spaces, and the
minima set C for h is a nonempty closed convex cone with
0 C . Now for x0 X the convex pro- cess is defined as
follows:
T(x0)x = f (x0)x−C, ∀ x X.
(5.9)
According to the definition of Robinson[43], the norm and the inverse of the convex process T(x0) are defined respectively as

‖T(x0)‖ = sup_{‖x‖ ≤ 1} inf{ ‖y‖ : y ∈ T(x0)x }    (5.10)

and

T⁻¹(x0)y = { x ∈ X : y ∈ T(x0)x },  ∀y ∈ Y,    (5.11)

with ‖T⁻¹(x0)‖ defined analogously. Furthermore, assume that r0 satisfies ∫_0^{r0} L(u) du = 1, as in section 3, and that

‖T⁻¹(x0)‖ ‖f ′(x) − f ′(x′)‖ ≤ ∫_{ρ(x)}^{ρ(x) + ‖x′ − x‖} L(u) du,  ∀x ∈ B(x0, r0), ∀x′ ∈ B(x, r0 − ρ(x)).    (5.12)

Set b = ∫_0^{r0} u L(u) du, as in section 3, while r1 ≤ r2 denote the two positive zeros of the majorizing function

h_{β,L}(t) = β − t + ∫_0^t (t − u) L(u) du.    (5.13)
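The quantities r0 and b admit closed forms for the Smale-type choice L(u) = 2γ(1 − γu)⁻³: the condition ∫_0^{r0} L(u) du = (1 − γr0)⁻² − 1 = 1 gives r0 = (1 − 1/√2)/γ, and integration by parts gives b = ∫_0^{r0} u L(u) du = (3 − 2√2)/γ. A short numerical sketch (Python; γ = 1 and the hand-rolled Simpson quadrature are illustrative choices, not from the paper) confirms both identities:

```python
# Numerical check of r0 and b for the Smale-type weight
#   L(u) = 2*gamma*(1 - gamma*u)**(-3),  with gamma = 1 for simplicity.
# Closed forms derived by direct integration:
#   r0 = 1 - 1/sqrt(2)   (so that the integral of L over [0, r0] equals 1)
#   b  = 3 - 2*sqrt(2)
import math

gamma = 1.0
L = lambda u: 2.0 * gamma * (1.0 - gamma * u) ** (-3)

def simpson(f, a, b_, n=10000):
    """Composite Simpson quadrature on [a, b_] with n (even) subintervals."""
    h = (b_ - a) / n
    s = f(a) + f(b_)
    s += sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3.0

r0 = (1.0 - 1.0 / math.sqrt(2.0)) / gamma

print(round(simpson(L, 0.0, r0), 6))                   # -> 1.0
print(round(simpson(lambda u: u * L(u), 0.0, r0), 6))  # -> 0.171573 == 3 - 2*sqrt(2)
```

The value 3 − 2√2 ≈ 0.1716 is exactly the familiar bound appearing in point estimate criteria of Smale type.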
Let the sequence {tn} be generated by the Newton method with initial value t0 = 0 for the function h_{β,L}. Under the assumptions that T(x0) is surjective and that condition (5.12) is satisfied, we proved that β = ‖x1 − x0‖ ≤ b implies that the sequence {xn} generated by the Gauss-Newton method with initial value x0 (when η = 1 and Δ is large enough) converges to a solution x* ∈ B(x0, r) of problem (5.1) at a quadratic rate, and that the following inequality holds:

‖xn − x*‖ ≤ r − tn,

where r1 ≤ r ≤ r2 if β < b and r = r1 = r2 if β = b. In particular, taking L to be a constant yields the generalization of the famous Kantorovich theorem to convex composite optimization, while taking L(u) = 2γ(1 − γu)⁻³ yields the corresponding generalization of the theorem of Smale type.
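For constant L the majorizing function reduces to the Kantorovich quadratic h_{β,L}(t) = β − t + Lt²/2, with r0 = 1/L and b = 1/(2L), so the criterion β ≤ b is exactly the classical condition Lβ ≤ 1/2. A minimal sketch (Python; L = 1 and β = 0.2 are illustrative values, not from the paper) runs the scalar Newton iteration on this function from t0 = 0 and checks the monotone convergence tn ↑ r1:

```python
# Scalar Newton iteration on the majorizing function h(t) = beta - t + L*t**2/2
# with constant L (here L = 1 and beta = 0.2 <= b = 1/(2L) = 0.5; both values
# are illustrative choices, not from the paper).
import math

L_const, beta = 1.0, 0.2
h  = lambda t: beta - t + L_const * t * t / 2.0
dh = lambda t: -1.0 + L_const * t

# Smaller positive zero r1 of h, to which the Newton sequence {t_n} increases.
r1 = (1.0 - math.sqrt(1.0 - 2.0 * L_const * beta)) / L_const

t = 0.0
for n in range(6):
    t = t - h(t) / dh(t)   # Newton step; convergence is quadratic

print(round(t, 12), round(r1, 12))  # the two values agree to machine precision
```

The majorizing sequence {tn} produced here is precisely the scalar sequence that dominates ‖xn − x*‖ in the theorem above.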
Acknowledgements This work was jointly supported by the Special Funds for Major State Basic Research Projects (Grant No. G19990328), the National Natural Science Foundation of China (Grant No. 19971013), and the Zhejiang (Grant No. 100002) and Jiangsu (Grant No. BK99001) Provincial Natural Science Foundations of China.
References
1. Wang, X. H., Han, D. F., On fixed points and Julia sets for iterations of two families, Chinese J. Numer. & Appl., 1997, 19(3): 94.
2. Wang, X. H., Zheng, S. M., Han, D. F., Convergence on Euler series, the iterations of Euler's and Halley's families, Acta Math. Sinica (in Chinese), 1990, 33(6): 721.
3. Wozniakowski, H., Generalized information and maximal order of iteration for operator equations, SIAM J. Numer. Anal., 1975, 12: 121.
4. Smale, S., The fundamental theory for solving equations, in Proceedings of the International Congress of Mathematicians, Providence, RI: AMS, 1987, 185.
5. Kantorovich, L. V., On Newton method for functional equations, Dokl. Acad. N. USSR, 1948, 59(7): 1237.
6. Kantorovich, L. V., Akilov, G. P., Functional Analysis, New York: Pergamon Press, 1982.
7. Smale, S., The fundamental theorem of algebra and complexity theory, Bull. (New Ser.) Amer. Math. Soc., 1981, 4: 1.
8. Smale, S., Newton's method estimates from data at one point, in The Merging of Disciplines: New Directions in Pure, Applied and Computational Mathematics (eds. Ewing, R., Gross, K., Martin, C.), New York: Springer, 1986, 185-196.
9. Wang, X. H., Xuan, X. H., Random polynomial space and complexity theory, Scientia Sinica Ser. A, 1987, 30(7): 637.
10. Smale, S., Complexity theory and numerical analysis, Acta Numer., 1997, 6: 523.
11. Blum, L., Cucker, F., Shub, M. et al., Complexity and Real Computation, New York: Springer, 1998.
12. Wang, X. H., Definite version on precise point estimate, Progress in Natural Science, 1998, 8(2): 152.
13. Chen, P. Y., Approximate zeros of quadratically convergent algorithms, Math. Comput., 1994, 63: 247.
14. Wang, X. H., Convergence of Newton's method and inverse function theorem in Banach space, Math. Comput., 1999, 68: 169.
15. Wang, X. H., Han, D. F., On dominating sequence method in the point estimate and Smale theorem, Scientia Sinica Ser. A, 1990, 33(2): 135.
16. Wang, X. H., Han, D. F., Shen, G. X., Some remarks on Smale's "Algorithms for solving equations", Acta Math. Sinica (New Ser.), 1992, 8(4): 337.
17. Wang, X. H., Some results relevant to Smale's reports, in From Topology to Computation: Proceedings of the Smalefest (eds. Hirsch, M., Marsden, J., Shub, M.), New York: Springer, 1993, 456-465.
18. Wang, X. H., A summary on continuous complexity theory, Contemp. Math., 1994, 162: 155.
19. Traub, J., Wozniakowski, H., Convergence and complexity of Newton iteration for operator equations, J. Assoc. Comput. Mach., 1979, 29: 250.
20. Wang, X. H., Convergent neighborhood on Newton's method, Kexue Tongbao (Chinese Science Bulletin), Special Issue of Math., Phy. and Chemistry, 1980, 25: 36.
21. Wang, X. H., Convergence on the iteration of Halley family in weak conditions, Chinese Science Bulletin, 1997, 42(7): 552.
22. Wang, X. H., Han, D. F., Criterion α and Newton's method, Chinese J. Numer. and Appl. Math., 1997, 19(2): 96.
23. Wang, X. H., Convergence of Newton's method and uniqueness of the solution of equations in Banach space, IMA J. Numer. Anal., 2000, 20(1): 123.
24. Dedieu, J. P., Estimations for the separation number of a polynomial system, J. Symbolic Comput., 1997, 21: 1.
25. Huang, Z. D., On a family of Chebyshev-Halley type methods in Banach space under weak Smale conditions, Numer. Anal. JCU, 2000, 9(1): 37.
26. Wang, X. H., Convergence of iterations of Euler family under weak condition, Science in China, Ser. A, 2000, 43(9): 958.
27. Wang, X. H., Convergence of the iteration of Halley's family and Smale operator class in Banach space, Science in China, Ser. A, 1998, 41(7): 700.
28. McMullen, C., Families of rational maps and iterative root-finding algorithms, Ann. Math., 1987, 125: 467.
29. Curry, H., Garnett, L., Sullivan, D., On the iteration of a rational function: computer experiments with Newton's method, Commun. Math. Phys., 1983, 91: 267.
30. Fatou, P., Sur les équations fonctionnelles, Bull. Soc. Math. France, 1919, 47: 161-271; 1920, 48: 33-94, 208-314.
31. Si, Z. C., Yuan, Y. X., Wonderful Computation, Changsha: Hunan Science and Technique Press, 1999.
32. Smale, S., On the efficiency of algorithms of analysis, Bull. (New Ser.) Amer. Math. Soc., 1985, 13: 87.
33. Sullivan, D., Quasiconformal homeomorphisms and dynamics, Part I: Solution of the Fatou-Julia problem on wandering domains, Ann. Math., 1985, 122: 401.
34. Vrscay, E. R., Gilbert, W. J., Extraneous fixed points, basin boundaries and chaotic dynamics for Schröder and König rational iteration functions, Numer. Math., 1988, 52: 1.
35. Wang, X. H., Han, D. F., The extraneous fixed points of Euler iteration and corresponding Sullivan's basin, Science in China, Ser. A, 2001, 44(3): 292.
36. Bryuno, A. D., Convergence of transformations of differential equations to normal forms, Dokl. Akad. Nauk USSR, 1965, 165: 987.
37. Yoccoz, J. C., Linéarisation des germes de difféomorphismes holomorphes de (C, 0), C. R. Acad. Sci. Paris, 1988, 36: 55.
38. Womersley, R. S., Local properties of algorithms for minimizing nonsmooth composite functions, Mathematical Programming, 1985, 32: 69.
39. Jittorntrum, K., Osborne, M. R., Strong uniqueness and second order convergence in nonlinear discrete approximation, Numerische Mathematik, 1980, 34: 439.
40. Burke, J. V., Ferris, M. C., A Gauss-Newton method for convex composite optimization, Mathematical Programming, 1995, 71: 179.
41. Li, C., Wang, X. H., Gauss-Newton methods for a class of nonsmooth optimization problems, Progress in Natural Science, 2000, 10(6): 470.
42. Li, C., Wang, X. H., On convergence of the Gauss-Newton method for convex composite optimization, Math. Programming, 2000, 94: to appear.
43. Robinson, S., Normed convex processes, Trans. Amer. Math. Soc., 1972, 174: 127.
(Received October 30, 2000)
Chinese Science Bulletin Vol. 46 No. 6 March 2001