REVIEWS

Local and global behavior for algorithms of solving equations

WANG Xinghua 1,2 & LI Chong 2,3

1. Department of Mathematics, Zhejiang University, Hangzhou 310028, China;
2. Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing 100080, China;
3. Department of Applied Mathematics, Southeast University, Nanjing 210096, China

Correspondence should be addressed to Wang Xinghua (e-mail: wangxh@mail.hz.zj.cn)

Abstract  The theory of "point estimate" and the concept of "general convergence", put forward by Smale in order to investigate the complexity of algorithms for solving equations, have had a deep impact on research into the local, semi-local and global behavior of iteration methods. The point estimate criterion he introduced not only provides a tool for the quantitative analysis of local behavior but also motivates the establishment of unified criteria for semi-local behavior. Studying the global behavior from the viewpoint of discrete dynamical systems leads to many profound research subjects and opens up a rich and colorful prospect. In this review, we summarize the research progress and some applications in nonsmooth optimization.

Keywords: Banach space, nonlinear operator equation, point estimate, Sullivan domain, nonsmooth optimization.

Finding a solution of the nonlinear operator equation

$$f(x) = 0 \tag{0.1}$$

in a Banach space X is a very general subject which is widely used in theoretical and applied areas of mathematics. Here we suppose that f is a nonlinear operator from some domain D in a real or complex Banach space X to another Banach space Y. The most important method for finding an approximate solution is the Newton method, defined by

$$x_{n+1} = x_n - f'(x_n)^{-1} f(x_n), \qquad n = 0, 1, \ldots, \tag{0.2}$$

for some initial value $x_0 \in D$. In general, the iteration sequence $\{x_n\}$ generated by the Newton method (0.2) converges to a solution $x^*$ of eq. (0.1) at a quadratic rate, that is,

$$\|x_{n+1} - x^*\| = O\!\left(\|x_n - x^*\|^2\right), \qquad n \to \infty. \tag{0.3}$$

There are several kinds of higher-order generalizations of the Newton method. The two most important are the iterations of Euler's family and of Halley's family. The kth iteration of Euler's family is defined as the kth partial sum of the Taylor expansion of the local inverse $f_x^{-1}$ of f about f(x):

$$E_{k,f}(x) = x + \sum_{j=1}^{k} \frac{1}{j!}\,\bigl(f_x^{-1}\bigr)^{(j)}(f(x))\,(-f(x))^{j}. \tag{0.4}$$

When k = 1 we obtain the Newton iteration $E_{1,f}(x) = N_f(x) = x - f'(x)^{-1} f(x)$. When $X = Y = \mathbb{R}$ or $\mathbb{C}$, the kth iteration $H_{k,f}(x)$ of Halley's family is defined as a zero of the Padé approximant of order k to f at x; the numerator of this Padé approximant is a linear expression while its denominator is a polynomial of degree k−1. $H_{k,f}(x)$ is connected with the classical Bernoulli algorithm[1], and through this connection the general iteration of Halley's family can be given in Banach spaces[2]. When k = 1, we again obtain the Newton method $H_{1,f}(x) = N_f(x)$. Note that the iterations $H_{k,f}(x)$ and $E_{k,f}(x)$ both use the values of the derivatives of f at x up to order k and both converge with order k+1, which is the maximal order attainable from the information set $\{f(x_n), f'(x_n), \ldots, f^{(k)}(x_n)\}$[3]. Besides the Newton method, they are the most representative iterations.
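For scalar f, the k = 2 members of the two families have simple closed forms: (0.4) works out to $E_{2,f}(x) = x - f/f' - f''f^2/(2f'^3)$, and the (1,1) Padé construction gives the classical Halley formula $H_{2,f}(x) = x - 2ff'/(2f'^2 - ff'')$. The following minimal sketch (an illustration only, not the authors' code; the test function $x^3 - 2$ is an arbitrary choice) compares the three iterations numerically.

```python
# Scalar illustration of the Newton, Euler (k = 2) and Halley (k = 2) iterations.
# The closed forms follow from (0.4) and the Pade description for scalar analytic f.

def newton(f, f1, x):
    return x - f(x) / f1(x)

def euler2(f, f1, f2, x):          # E_{2,f}: 2nd partial sum of the local inverse's Taylor series
    u = f(x) / f1(x)
    return x - u - 0.5 * f2(x) * u**2 / f1(x)

def halley2(f, f1, f2, x):         # H_{2,f}: zero of the (1,1) Pade approximant of f at x
    return x - 2.0 * f(x) * f1(x) / (2.0 * f1(x)**2 - f(x) * f2(x))

f  = lambda x: x**3 - 2.0          # arbitrary test function with simple root 2^(1/3)
f1 = lambda x: 3.0 * x**2
f2 = lambda x: 6.0 * x
root = 2.0 ** (1.0 / 3.0)

for name, step in [("Newton    ", lambda x: newton(f, f1, x)),
                   ("Euler k=2 ", lambda x: euler2(f, f1, f2, x)),
                   ("Halley k=2", lambda x: halley2(f, f1, f2, x))]:
    x, errs = 1.0, []
    for _ in range(5):
        x = step(x)
        errs.append(abs(x - root))
    print(name, ["%.2e" % e for e in errs])
```

For a simple root, the printed error sequences shrink with the expected orders: roughly squaring at each step for Newton and cubing for the two third-order methods.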
In this review, based on Smale's creative work[4], we summarize the progress made by the authors and their colleagues on the local behavior, the semi-local behavior and the global behavior of these iterations, as well as some applications in nonsmooth optimization.

1 Smale's point estimate theory

(i) Kantorovich's classical theory. The well-known work on the convergence of the Newton method due to Kantorovich[5,6], which marks the beginning of the modern investigation of methods for solving equations, remains our constant reference. Under the hypothesis that $f''$ is bounded on D and that $x_0$ is far enough from the boundary of D, this famous work reduces the convergence of the Newton method to checking that a certain criterion at $x_0$ satisfies a bound. The result gives proper hypotheses as well as a clear and definite conclusion, so that it can be applied and generalized widely.

(ii) Approximate zero. In the report written for the 20th International Congress of Mathematicians[4], Smale also expressed great admiration for Kantorovich's work. However, in order to investigate the complexity of algorithms for solving equations, he thought we should not be tied to Kantorovich's framework but should restart the investigation. In view of the later, sometimes unexpected, progress, Smale's insistence on a fresh start showed real foresight and sagacity. Smale[4] claimed that if there is an initial value $x_0$ such that the sequence $\{x_n\}$ generated by the Newton method from $x_0$ satisfies

$$\|x_n - x^*\| \le q^{2^n - 1}\,\|x_0 - x^*\| \tag{1.1}$$

with q = 1/2, then the zero $x^*$ of f is determined effectively. Such an initial value $x_0$ is called an approximate zero of the associated zero $x^*$. (The concept of Smale's approximate zero took shape through a process, from refs. [4, 7—9] to refs. [10, 11][12]; here we adopt the final form, from ref. [10]. By the way, we should point out that Chen[13] gave some other definitions of the approximate zero, but Theorem 7.2 in Wang's paper[14] shows that these definitions are equivalent to Smale's.) Obviously, the complexity of solving equations lies in the cost of finding an approximate zero.

(iii) Point estimate. To this end, we need a criterion that depends only on the information of f at one point x, together with a rule for judging whether x is an approximate zero. In order to remove the dependence on information about f over its whole domain, Smale[4,8] supposed that f is analytic at x (and that the analyticity is not artificially destroyed inside the convergence ball of the Taylor series of f at x) and defined

$$\gamma_f(x) = \sup_{k \ge 2}\left\| f'(x)^{-1}\,\frac{f^{(k)}(x)}{k!} \right\|^{\frac{1}{k-1}}. \tag{1.2}$$

Hereby he introduced the criteria $\alpha_f(x) = \gamma_f(x)\,\|f'(x)^{-1} f(x)\|$ and $\delta_f(x) = \gamma_f(x^*)\,\|x - x^*\|$, and proved that if $\delta = \delta_f(x) \le (5 - \sqrt{17})/4$, then the inequality (1.1) holds for $q = \delta/(1 - 4\delta + 2\delta^2)$. In particular, if $\delta \le (3 - \sqrt{7})/2$, x is an approximate zero of the associated zero $x^*$. Moreover, the determination bound, as a constant, is exact. This result can be viewed as a description of the local behavior of the Newton method, which the next section will develop. However, in order to obtain the inequality (1.1) and to judge that x is an approximate zero, the bound given by Smale for the criterion $\alpha_f(x)$ is not exact. Wang et al.[15] obtained the following exact result: if $\alpha = \alpha_f(x) \le 3 - 2\sqrt{2}$, then the inequality (1.1) holds for

$$q = \frac{1 - \alpha - \sqrt{1 - 6\alpha + \alpha^2}}{1 - \alpha + \sqrt{1 - 6\alpha + \alpha^2}}.$$

In particular, if $\alpha \le (13 - 3\sqrt{17})/4$, x is an approximate zero. This result can be viewed as a description of the semi-local behavior of the Newton method, which Section 3 will develop.
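For a polynomial, the supremum in (1.2) is a finite maximum over k = 2, …, d, so $\gamma_f$ and $\alpha_f$ can be evaluated directly. A small sketch (the sample polynomial and test points are arbitrary choices, not taken from the text) that checks the two bounds just quoted:

```python
import numpy as np
from math import factorial, sqrt

def gamma_alpha(coeffs, x):
    """gamma_f(x) and alpha_f(x) of (1.2) for a real polynomial
    (coefficients given highest degree first, as for numpy.poly1d)."""
    derivs = [np.poly1d(coeffs)]
    while derivs[-1].order > 0:               # f, f', ..., f^(d)
        derivs.append(np.polyder(derivs[-1]))
    d = derivs[0].order
    f0, f1 = derivs[0](x), derivs[1](x)
    gamma = max(abs(derivs[k](x) / (factorial(k) * f1)) ** (1.0 / (k - 1))
                for k in range(2, d + 1))
    return gamma, gamma * abs(f0 / f1)

coeffs = [1.0, 0.0, -2.0, 2.0]                # x^3 - 2x + 2, an arbitrary example
for x0 in (5.0, -2.0, -1.8):
    g, a = gamma_alpha(coeffs, x0)
    print("x0 = %5.2f  alpha = %.4f  approx. zero bound: %s  convergence bound: %s"
          % (x0, a, a <= (13 - 3 * sqrt(17)) / 4, a <= 3 - 2 * sqrt(2)))
```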
(iv) A problem of Smale. To explain the reasonableness of the criterion $\alpha_f(x)$, Smale[4] proved that

$$\alpha_\varphi(t) < 1, \qquad \varphi(t) = \sum_{i=0}^{d} t^{i}, \quad d \ge 1, \quad \forall t \in (0, 1), \tag{1.3}$$

where $d \in \mathbb{N} \cup \{\infty\}$ and $\mathbb{N}$ is the set of all natural numbers. Considering the extension of his criterion to Hilbert spaces, Smale posed in ref. [4] the open problem of whether the inequality (1.3) remains true for $\varphi(t) = \bigl(\sum_{i=0}^{d} t^{2i}\bigr)^{1/2}$. Wang et al.[16] proved that $\alpha_\varphi(t) \ge 1$ whenever $0 < t < 1/\sqrt{2}$, and so answered Smale's problem negatively.

(v) Global complexity of algorithms for solving equations. In the report[4], Smale gave an estimate of the global complexity of the method for solving equations and applied it to some linear programming problems. Wang et al.[16] found that the result was not perfect and completed it.

Most of the work around Smale's report[4] had been completed before 1992; the main results can be found in Wang's surveys[17,18].

2 Local behavior of iterations

(i) Local behavior. When x is near the solution $x^*$ of eq. (0.1), what properties does an iteration have? For example, when $x^*$ is a simple root, we have

$$\|E_{k,f}(x) - x^*\| = O\!\left(\|x - x^*\|^{k+1}\right), \qquad \|H_{k,f}(x) - x^*\| = O\!\left(\|x - x^*\|^{k+1}\right) \tag{2.1}$$

as $x \to x^*$; that is, the kth iterations of Euler's family and Halley's family, as noted above, are both of convergence order k+1. These are statements about the local behavior of iterations; of course, they are rather rough.

(ii) Convergence ball of the Newton method. Smale's point estimate criterion provides the tool for quantitative research on the local behavior of iterations. In summary, his result gave the exact radius $r = (5 - \sqrt{17})/(4\gamma_f(x^*))$ of the convergence ball of the Newton method. However, Traub et al.[19] and Wang[20] had proved earlier that if, for a positive constant L,

$$\|f'(x^*)^{-1}(f'(x) - f'(x'))\| \le L\,\|x - x'\|, \qquad \forall x, x' \in D, \tag{2.2}$$

then the inequality (1.1) holds for $q = \tfrac{3}{2} L \rho(x_0)$ provided that $\rho(x_0) = \|x_0 - x^*\| \le 2/(3L)$. Moreover, 2/(3L) is the best constant; in other words, under condition (2.2) the exact radius of the convergence ball of the Newton method is r = 2/(3L). Wang[21] put forward the concept of a so-called "weak condition", which is weaker than the condition that f is analytic at x with $\gamma_f(x) \le \gamma$, but still yields the same quantitative relations. For convenience we first give the definition. We say that f satisfies the γ-condition of order k at x if

$$\|f'(x)^{-1} f^{(i)}(x)\| \le i!\,\gamma^{i-1}, \quad i = 2, \ldots, k; \qquad \|f'(x)^{-1} f^{(k+1)}(x')\| \le (k+1)!\,\gamma^{k}\,(1 - \gamma\|x' - x\|)^{-k-2}. \tag{2.3}$$

Wang et al.[22] showed that the exact radius of the convergence ball of the Newton method is still $r = (5 - \sqrt{17})/(4\gamma)$ provided that f satisfies the γ-condition of order 1 at $x^*$. Furthermore, Wang[23] introduced the more general and weaker "radius Lipschitz condition with L average", which includes both condition (2.2) and the γ-condition of order 1 at $x^*$:

$$\|f'(x^*)^{-1}(f'(x) - f'(x_\tau))\| \le \int_{\tau\rho(x)}^{\rho(x)} L(u)\,du, \qquad \forall x \in B(x^*, r), \quad 0 \le \tau \le 1, \tag{2.4}$$

where L is a positive nondecreasing function, $x_\tau = x^* + \tau(x - x^*)$ and $\rho(x) = \|x - x^*\|$. In this case he proved that if r satisfies

$$\frac{1}{r}\int_0^{r}(r + u)\,L(u)\,du \le 1, \tag{2.5}$$

then the Newton method with any initial value $x_0 \in B(x^*, r)$ is convergent and the inequality (1.1) holds for

$$q = \frac{\int_0^{\rho(x_0)} u\,L(u)\,du}{\rho(x_0)\left(1 - \int_0^{\rho(x_0)} L(u)\,du\right)}. \tag{2.6}$$

Moreover, the value of r given by (2.5) is exact.
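The radius equation (2.5) is easy to solve numerically for a given L. The sketch below (illustrative only; the sample constants are arbitrary) finds the r at which equality holds in (2.5) and compares it with the closed-form radii recorded in the next paragraph.

```python
from scipy.integrate import quad
from scipy.optimize import brentq

def conv_radius(L, upper):
    """The r at which equality holds in (2.5); the left-hand side grows with r."""
    g = lambda r: quad(lambda u: (r + u) * L(u), 0.0, r)[0] / r - 1.0
    return brentq(g, 1e-12, upper)

Lc, gamma = 0.7, 1.3                                   # arbitrary sample constants
print(conv_radius(lambda u: Lc, 10.0), 2.0 / (3.0 * Lc))
print(conv_radius(lambda u: 2.0 * gamma / (1.0 - gamma * u) ** 3, 0.9 / gamma),
      (5.0 - 17.0 ** 0.5) / (4.0 * gamma))
```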
Thus, from (2.5), taking L(u) = L a constant and $L(u) = 2\gamma(1 - \gamma u)^{-3}$, respectively, we immediately obtain the exact radii r = 2/(3L), given by Traub et al.[19] and Wang[20], and $r = (5 - \sqrt{17})/(4\gamma)$, given by Wang et al.[22] and Smale[8].

(iii) Uniqueness ball of the solution of the equation. In the convergence ball of the Newton method, the solution of eq. (0.1) is certainly unique. But could the uniqueness ball be larger? Could the conditions imposed on f be weaker? In fact, this is the case. Wang[23] introduced the "center Lipschitz condition with L average":

$$\|f'(x^*)^{-1}(f'(x) - f'(x^*))\| \le \int_0^{\rho(x)} L(u)\,du, \qquad \forall x \in B(x^*, r). \tag{2.7}$$

This is just the case τ = 0 of (2.4), but here L is only required to be a positive integrable function. Under this condition he proved that if r satisfies

$$\frac{1}{r}\int_0^{r}(r - u)\,L(u)\,du \le 1, \tag{2.8}$$

then eq. (0.1) has a unique solution in $B(x^*, r)$, and the value of r given by (2.8) is exact. Similarly, when we take L(u) to be a constant L, (2.8) gives the radius r = 2/L of the uniqueness ball of eq. (0.1) under condition (2.2), while when we take $L(u) = 2\gamma(1 - \gamma u)^{-3}$, (2.8) gives the radius $r = 1/(2\gamma)$ of the uniqueness ball of eq. (0.1), which completely improves the result, due to Dedieu[24], that eq. (0.1) has a unique solution in $B(x^*, (5 - \sqrt{17})/(4\gamma))$ when $\gamma_f(x^*) \le \gamma$.

(iv) There is no universal constant. It should be noted that the radius of the uniqueness ball of eq. (0.1) differs from that of the convergence ball of the Newton method even if f is analytic at $x^*$ and $\gamma_f(x^*) \le \gamma$. Huang[25] obtained the radii of the convergence balls of the Euler method $E_{2,f}^n(x_0)$ and the Halley method $H_{2,f}^n(x_0)$, which are, respectively, 0.181462/γ and 0.164878/γ. Therefore, there is no universal constant for these iterations, which is quite different from the semi-local behavior of iterations studied in the next section.

3 Semi-local behavior of iterations

(i) Semi-local behavior. The properties of an iteration method that are determined completely by information at the initial value $x_0$ are called its semi-local behavior. Kantorovich's theory is regarded as the successful model for investigating the semi-local behavior of iteration methods. But it also easily created the misleading impression that ever more complicated bounds are required to determine the convergence of iterations of order higher than the Newton method. The work stemming from Smale's α criterion corrected this impression in time.

(ii) The universal constant. Wang et al.[2] showed that if $\alpha_f(x_0) \le 3 - 2\sqrt{2}$, then the iterations of Halley's family $H_{k,f}^n(x_0)$ and of Euler's family $E_{k,f}^n(x_0)$, as well as the Euler series $E_{\infty,f}(x_0)$, all converge. This result first revealed the unified character of the semi-local behavior of iterations for finding zeros of equations: under the hypothesis that f is analytic at $x_0$, $3 - 2\sqrt{2}$ is a universal constant for judging convergence by Smale's α criterion.

(iii) The universal constant applies to operators of different smoothness. Note that Kantorovich's theorem only assumes that $f''$ is bounded. The analyticity assumption on f at $x_0$ therefore seems too strong for the study of the semi-local behavior of the Newton method. Is such a strong condition necessary to obtain a universal constant? Wang[21] pointed out that it is not; the γ-condition of order k in the preceding section was introduced for exactly this purpose.
It was proved that, for any given positive integer k, if f satisfies the γ-condition of order k at $x_0$, then $\alpha = \gamma\,\|f'(x_0)^{-1} f(x_0)\| \le 3 - 2\sqrt{2}$ implies that eq. (0.1) has a unique solution in $B(x_0, r)$ with $r_1 \le r \le r_2$, and that for all $1 \le j \le k$ the iteration $H_{j,f}^n(x_0)$ of Halley's family converges, where

$$r_{1,2} = \frac{1 + \alpha \mp \sqrt{1 - 6\alpha + \alpha^2}}{4\gamma}. \tag{3.1}$$

Recently Wang[26] further proved that the above result is still true for Euler's family.

(iv) Unifying the Kantorovich condition and the Smale condition. Smale's condition has been weakened so that the semi-local behavior of the Newton method only needs the γ-condition of order 1,

$$\|f'(x_0)^{-1} f''(x)\| \le \frac{2\gamma}{(1 - \gamma\|x - x_0\|)^{3}}. \tag{3.2}$$

It therefore seems possible to unify this condition with Kantorovich's condition. To this end, similarly to (2.7), Wang[14] introduced the center Lipschitz condition with L average at $x_0$:

$$\|f'(x_0)^{-1}(f'(x) - f'(x_0))\| \le \int_0^{\rho(x)} L(u)\,du, \qquad \forall x \in B(x_0, r), \tag{3.3}$$

where L is a positive integrable function and now $\rho(x) = \|x - x_0\|$. In this case we proved that if

$$\beta = \|f'(x_0)^{-1} f(x_0)\| \le b = \int_0^{r_0} u\,L(u)\,du, \tag{3.4}$$

then eq. (0.1) has a unique solution $x^*$ in the ball $B(x_0, r)$, and

$$x^* \in B\bigl(x_0 - f'(x_0)^{-1} f(x_0),\ r_1 - \beta\bigr) \subset B(x_0, r_1), \tag{3.5}$$

where $r_0$ satisfies $\int_0^{r_0} L(u)\,du = 1$, while $r_1$ and $r_2$ are the two positive zeros of

$$h(t) = \beta - t + \int_0^{t}(t - u)\,L(u)\,du, \tag{3.6}$$

with $r_1 \le r_0 \le r_2$. Similarly to (2.4), introduce further the condition

$$\|f'(x_0)^{-1}(f'(x) - f'(x'))\| \le \int_{\rho(x)}^{\rho(x, x')} L(u)\,du, \qquad \forall x \in B(x_0, r),\ \forall x' \in B(x, r - \rho(x)), \tag{3.7}$$

where L is a positive nondecreasing function and $\rho(x, x') = \|x' - x\| + \rho(x)$. In this case we proved that if (3.4) holds, the sequence $\{x_n\}$ generated by the Newton method converges and satisfies

$$\|x^* - x_n\| \le \frac{r_1 - t_n}{r_1 - t_{n-1}}\,\|x^* - x_{n-1}\| \le \cdots \le \left(\frac{\|x^* - x_0\|}{r_1}\right)^{2^n - 1}\|x^* - x_0\|, \tag{3.8}$$

where $t_n = N_h^n(0)$. Taking L to be a constant and $L(u) = 2\gamma(1 - \gamma u)^{-3}$, we obtain the results corresponding to the Kantorovich condition and the Smale condition, respectively. It is worth noting that the two different problems rest on two different conditions: the existence and uniqueness of the solution of eq. (0.1) depend on condition (3.3), while the convergence of the Newton method depends on condition (3.7), and condition (3.7) implies condition (3.3). Here we again see the unity of the semi-local behavior: the two problems are judged by the same bound (3.4), while the more basic problem rests on the weaker condition.

It is expected that the convergence of the Euler method $E_{2,f}^n(x_0)$ and the Halley method $H_{2,f}^n(x_0)$ can be judged by the same bound (3.4), provided only that condition (3.7) is strengthened appropriately. To this end it suffices to assume that L' is nondecreasing and that f satisfies

$$\|f'(x_0)^{-1} f''(x_0)\| \le L(0), \qquad \|f'(x_0)^{-1}(f''(x) - f''(x'))\| \le \int_{\rho(x)}^{\rho(x, x')} L'(u)\,du. \tag{3.9}$$

Obviously condition (3.9) implies condition (3.7). Now, taking $L(u) = 2\gamma(1 - \gamma u)^{-3}$ again gives the weak Smale condition, while taking $L(u) = \gamma + Cu$ gives the following condition of Kantorovich type:

$$\|f'(x_0)^{-1} f''(x_0)\| \le \gamma, \qquad \|f'(x_0)^{-1}(f''(x) - f''(x'))\| \le C\,\|x - x'\|. \tag{3.10}$$

Under condition (3.10) we have

$$r_0 = \frac{2}{\gamma + \sqrt{\gamma^2 + 2C}}, \qquad b = \frac{2\bigl(\gamma + 2\sqrt{\gamma^2 + 2C}\bigr)}{3\bigl(\gamma + \sqrt{\gamma^2 + 2C}\bigr)^{2}}. \tag{3.11}$$

The determination bound (3.11) was first obtained in our seminar. Using this determination bound under condition (3.10) (and the determination bound $b = (3 - 2\sqrt{2})/\gamma$ under the γ-condition), some young members of our group, such as D. F. Han, Z. D. Huang and K. W. Liang, investigated the semi-local behavior of many concrete iterations.
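As a numerical check of (3.11) and of the majorizing sequence $t_n = N_h^n(0)$, the sketch below (sample values of γ, C and β are arbitrary) evaluates $r_0$ and b for the Kantorovich-type choice $L(u) = \gamma + Cu$, for which $h(t) = \beta - t + \gamma t^2/2 + C t^3/6$, and runs Newton's method on this scalar majorizing function.

```python
import math

gamma, C = 1.0, 1.0                       # arbitrary sample constants
s = math.sqrt(gamma ** 2 + 2.0 * C)

# closed forms (3.11)
r0_closed = 2.0 / (gamma + s)
b_closed = 2.0 * (gamma + 2.0 * s) / (3.0 * (gamma + s) ** 2)

# direct definitions: int_0^{r0} L(u) du = 1 and b = int_0^{r0} u L(u) du
r0 = (s - gamma) / C                      # root of gamma*r + C*r**2/2 = 1
b = gamma * r0 ** 2 / 2.0 + C * r0 ** 3 / 3.0
print(r0, r0_closed, b, b_closed)         # the two pairs agree

# majorizing function h(t) = beta - t + gamma*t^2/2 + C*t^3/6 and its Newton sequence
beta = 0.9 * b                            # criterion (3.4): beta <= b
h  = lambda t: beta - t + gamma * t ** 2 / 2.0 + C * t ** 3 / 6.0
dh = lambda t: -1.0 + gamma * t + C * t ** 2 / 2.0
t = 0.0
for _ in range(8):                        # t_n = N_h^n(0) increases to the smaller zero r1 of h
    t = t - h(t) / dh(t)
print(t, h(t))
```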
(v) Operator classes of Smale. It can be considered that the operators satisfying the condition $\gamma_f(x) \le \gamma$ define an operator class $S(\infty)$. Accordingly, the operators satisfying the γ-condition of order k at x define an operator class S(k) larger than $S(\infty)$. Clearly, as k increases these classes form a monotone inclusion chain

$$S \supset S(1) \supset S(2) \supset \cdots \supset S(\infty). \tag{3.12}$$

Roughly speaking, the largest class S is the operator class defined by (3.3) with $L(u) = 2\gamma(1 - \gamma u)^{-3}$. We call them the operator classes of Smale. Besides, some chains of generalized operator classes of Smale can also be obtained when the γ-condition is extended. For any chain of operator classes of Smale there exists a universal constant b such that the criterion

$$\beta = \|f'(x_0)^{-1} f(x_0)\| \le b \tag{3.13}$$

serves to judge the existence and uniqueness of the zero of any operator in S, the convergence of the first k iterations of Euler's family and Halley's family for any operator in S(k), and the convergence of all iterations of Euler's family and Halley's family, as well as of the Euler series and the Bernoulli algorithm, for any operator in $S(\infty)$. Wang[27] made a preliminary study of possible generalized operator classes of Smale, and further results are given in the recent paper "Unified criterion of iteration for zero point in Banach spaces".

4 Global behavior for polynomials

(i) From the viewpoint of discrete dynamical systems. Many iterations, such as those of Euler's family and Halley's family, are rational maps when f is a polynomial, so the results of Fatou and Julia on discrete dynamics can be applied. Investigating the global behavior of iterations from this viewpoint, we can grasp the main problems completely. For example, if a "dead circle" appeared for some iteration, one might previously have thought this was a fault of the iteration itself; but from the viewpoint of discrete dynamics, a "dead circle" is simply a periodic orbit. The closure of the set of repelling periodic points of a rational iteration is called its Julia set, which, for a typical iteration, is a nowhere dense perfect set with as many points as the reals. Therefore it is not strange that a "dead circle" appears for some iteration; in fact, every iteration must have many of them.

(ii) No generally convergent one-point stationary algorithm. From the viewpoint of discrete dynamical systems, the most important problem is whether there exists a generally convergent one-point stationary algorithm. "General convergence" means that there exists an open set of full measure $U \subset F_d \times \overline{\mathbb{C}}$ such that the algorithm started from the initial value z converges to a zero of f for every $(f, z) \in U$, where $F_d$ is the linear space of all polynomials of degree d and $\overline{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$. McMullen[28] answered this problem negatively: he proved that no generally convergent one-point stationary algorithm exists when d > 3.

(iii) A positive measure set of initial values for which the Newton method fails. By the above fact, for every rational iteration there is an open set of initial values with positive measure on which the iteration fails. In particular, for the Newton method, Curry et al.[29], relying on Fatou's theorem[30] that each attracting basin of a rational map must contain at least one of its critical points, used computers to search for cubic polynomials having an initial value domain of positive measure on which the iteration does not converge.
They also drew some pictures similar to the Mandelbrot set (see the color pictures in ref. [31]). Smale[32] directly constructed such a cubic polynomial:

$$f_0(x) = \frac{1}{2}x^3 - x + 1.$$

The Newton iteration for this polynomial has a superattracting periodic orbit {0, 1} of period 2. In the congress report "Dynamics of Newton's method on the Riemann sphere", given at the Fifth Conference of the Computational Mathematics Society of China in 1995[31], Wang generalized Smale's example: he constructed a cubic polynomial $f_\lambda(z)$ such that Newton's method has a periodic orbit {0, 1} of period 2 with multiplier λ. Moreover, for different λ he also drew four types of Sullivan domains on which the Newton method fails[33]. By Shishikura's theorem, a necessary condition for the Julia set to be disconnected is the existence of two fixed points with multiplier λ = 1 or |λ| > 1. This implies that the Julia set of the Newton iteration is connected (it has only one fixed point with |λ| > 1, namely ∞, while all the others have |λ| < 1), so that it has no Herman ring.

(iv) There is almost no phenomenon of phony convergence for the iterations of Halley's family. There is no end to finding initial value domains of positive measure on which an iteration fails to converge. One important question is whether the periods of the periodic orbits corresponding to these Sullivan basins are all greater than 1, or whether some of them equal 1. If there is a periodic orbit of period 1, i.e. a fixed point, then a great deal of "phony convergence" occurs. In general, the zeros of the polynomial are fixed points of the iteration. However, since the topological degree of the iteration is often larger than the degree of the polynomial (the topological degree of the Newton iteration equals the degree of the polynomial, which is one of its advantages), the iteration generally has fixed points that are not zeros of the polynomial; we call them extraneous fixed points. A repelling extraneous fixed point need not worry us, since no set of initial values with positive measure is attracted to it. But for the other extraneous fixed points, especially superattracting or attracting ones, if the initial value is taken in the corresponding Sullivan domain, the iteration sequence converges to a point that is not a zero of the polynomial. This is the phenomenon of "phony convergence", also called a "numerical extraneous root". Smale[32] proved that the Newton iteration has no extraneous fixed point in the Gauss plane (a property seldom met among iterations). In addition, ∞ is a repelling extraneous fixed point of the Newton iteration, so the Newton iteration cannot diverge to ∞. Vrscay et al.[34] pointed out that the Halley iteration $H_{2,P}^n(x)$ and the iterations $H_{k,P}^n(x)$ of Halley's family for $P(x) = x^d - 1$ (d ≥ 2) have only repelling extraneous fixed points in the Gauss plane. Wang et al.[1] solved the problem completely: they proved that, for any polynomial $f \in F_d$, all the extraneous fixed points (at most (k−1)(d−1)+1 of them) of the kth iteration of Halley's family on the whole Riemann sphere are repelling. Therefore there is almost no phony convergence for any iteration of Halley's family, and almost no phenomenon of "numerical extraneous roots" appears.
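Assuming the reconstruction $f_0(x) = \tfrac{1}{2}x^3 - x + 1$ of Smale's cubic above (the displayed coefficients are partly damaged in the source), a few lines confirm the superattracting 2-cycle {0, 1} of the Newton map and estimate, on a sample grid, how large the corresponding set of failing initial values is; the grid, iteration count and tolerance are arbitrary choices.

```python
import numpy as np

f   = lambda z: 0.5 * z ** 3 - z + 1.0       # assumed reading of Smale's cubic f0
df  = lambda z: 1.5 * z ** 2 - 1.0
d2f = lambda z: 3.0 * z
N   = lambda z: z - f(z) / df(z)             # Newton map
dN  = lambda z: f(z) * d2f(z) / df(z) ** 2   # N'(z) = f f'' / (f')^2

print(N(0.0), N(1.0))                        # 1.0 and 0.0: {0, 1} is a 2-cycle
print(dN(0.0) * dN(1.0))                     # 0.0: the cycle is superattracting

# fraction of a sample grid of complex starting points NOT attracted to a root
roots = np.roots([0.5, 0.0, -1.0, 1.0])
x, y = np.meshgrid(np.linspace(-3, 3, 301), np.linspace(-2, 2, 201))
z = x + 1j * y
with np.errstate(all="ignore"):
    for _ in range(60):
        z = z - f(z) / df(z)
dist = np.min(np.abs(z[..., None] - roots[None, None, :]), axis=-1)
print(np.mean(~(dist < 1e-6)))               # a clearly positive fraction fails
```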
Opposite to Helley’s family, any iteration, but the Newton iteration, of Euler’s family has non-exclusive extraneous fixed points of every type, which was given by Wang et al. in the paper “Dynamics of the iteration of Euler’s family and Halley’s family”. In particular, for the Euler iteration E 2n, f ( x) , we gave a polynomial to have the extraneous fixed point of the eigenvalue λ, where λ = r exp(2πα i) is any complex number[35]. Let the parameters r and α in the expression of λ take the different values. We obtain that the extraneous fixed points are respectively superattractive (r = 0), attractive (0 < r < 1), rationally indifferent (r = 1, α rational number) and Siegel point (r = 1, α is irrational number satisfying Bryuno [36] condition (ref. [37]). Sullivan domains adjoint to these points, that is, superattractive domain, attractive domain, parabolic domain and Siegel disk, which are drawn in the paper[35], are the initial value domains with positive measure that makes the Euler method fail. Taking the initial value in these Fatou domains (i.e. superattractive domain, attractive domain, parabolic domain), the phenomenon of “phony convergence”, that is, “numerical extraneous root”, appears for the Euler method. Plate is the color picture of Sullivan domains of the Euler iteration for the polynomial f0(x) = x3 + x2 + 2x−4. The sizes of its windows are [−3,3] [−2,2]. The red domain is the superattractive Sullivan corresponding to the extraneous fixed point z = 0. The three roots of f0(x) are be written as an optimization problem as follows in a Banach space: min F(x):= h(f(x)), (5.1) where f is a continuously Frechet differentiable operator from a real reflexive Banach space X to another Y while h is a convex function on Y. This problem has recently been received a great deal of attention and is justifiable, so many problems in optimization theory, such as nonlinear inclusion, minimax problems and penalization methods, can be cast within this framework. Moreover, this model provides a unifying framework for the development and analysis of algorithmic solution techniques. The GaussNewton method is one of the main methods for solving convex composite optimization problems. ( ) Gauss-Newton method. The Gauss-Newton method was proposed by Gauss in the early 19th century for finding the solution of the following nonlinear leastsquare problem: min f (x)Tf (x). (5.2) Abandoning the term containing the second derivative of f(x), we obtain the basic Gauss-Newton method: x n +1 = x n − [ f ′( x n ) T f ′( x n )]−1 f ′( x n ) T f ( x n ) . (5.3) Naturally, in the iteration (5.3), f (xn) need to be full column rank. To relax this limitation, we linearize f and solve the linear least-square problem: min(f (xn)+f (xn)d)T(f (xn)+f (xn)d), (5.4) to get the solution dn. Then set xn+1 = xn+dn. If using the notation of generalized inverse, the Gauss-Newton method can be rewritten as xn+1 = xn−[f (xn)]+f(xn). (5.5) Applying this idea to convex composite optimization problems, we get the Gauss-Newton method of solving problem (5.1): For ∆ 0, x X, let D∆(x) denote the set of solutions of the following optimization problem (5.6) min h( f ( x) + f ′( x )d ) ||d || ∆ Convex ( ) Convex composite optimization. composite optimization problems are one of the main directions of the nonsmooth optimization theory, which can and let d(x,C) denote the distance from x to C. 
(iii) Semi-local behavior in the normal case. When $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$, using the idea of strong uniqueness from approximation theory, Womersley[38] gave a condition under which the Gauss-Newton method converges to the local solution $x^*$ at a quadratic rate, under the assumption that the local solution $x^*$ of the convex composite optimization problem (5.1) is strongly unique. This result generalizes the corresponding result[39] for the case when h is a norm. Compared with Womersley's work, Burke et al.[40] made great progress in this direction. Their research is based on two basic assumptions. One is the weak sharpness of the set C of minima of h, a weakening and generalization of strong uniqueness: there exists λ > 0 such that

$$h(y) \ge h_{\min} + \lambda\, d(y, C), \qquad \forall y \in Y. \tag{5.8}$$

The other is the existence of a regular point $\bar{x}$ of the inclusion $f(x) \in C$. Under these two assumptions, and assuming that h and f′ satisfy Lipschitz conditions in suitable neighborhoods of $f(\bar{x})$ and $\bar{x}$, respectively, Burke and Ferris exhibit a neighborhood of $\bar{x}$, determined by some information at $\bar{x}$, such that for any initial point $x_0$ in this neighborhood the sequence generated by the Gauss-Newton method converges to a solution $x^*$ of problem (5.1) at a quadratic rate. Of course, the information at $\bar{x}$ is closely related to the Lipschitz constants of h and f′ and to the constant λ in (5.8). It is worth noting that their result does not require the solution of problem (5.1) to be unique. Although weak sharpness together with the regularity of $x^*$ implies the strong uniqueness of the solution $x^*$ when C is a singleton, Burke and Ferris' assumptions are more appropriate because they are imposed on h and f separately. However, the assumption that the minima set C of h is a set of weak sharp minima is quite strong; in fact, if some partial derivative of h exists, then its minima set C cannot be weakly sharp. Recent work of Li and Wang relaxes this restriction: in refs. [41, 42], assuming only that the inclusion $f(x) \in C$ has a regular point $\bar{x}$ and that f′ satisfies a Lipschitz condition in some neighborhood of $\bar{x}$, we obtained the same results for an arbitrary convex function h. In this case the information at $\bar{x}$ is independent of h, and is therefore simpler and clearer.

(iv) Semi-local behavior in the surjective case. One problem worth considering is whether the majorizing function technique used in Section 3 for the Newton method for nonlinear equations can be applied to the Gauss-Newton method for convex composite optimization problems.
This has been accomplished by Li and Wang in the recent paper "Point estimate of the Gauss-Newton method for convex composite optimization problems". Assume that X and Y are two reflexive Banach spaces and that the minima set C of h is a nonempty closed convex cone with $0 \in C$. For $x_0 \in X$ the convex process $T(x_0)$ is defined by

$$T(x_0)x = f'(x_0)x - C, \qquad \forall x \in X. \tag{5.9}$$

According to Robinson's definitions[43], the norm and the inverse of the convex process $T(x_0)$ are, respectively,

$$\|T(x_0)\| = \sup_{\|x\| \le 1}\,\inf\{\|y\| : y \in T(x_0)x\} \tag{5.10}$$

and

$$T^{-1}(x_0)y = \{z \in X : f'(x_0)z \in y + C\}. \tag{5.11}$$

Let L be the nondecreasing function given in Section 3, and suppose that

$$\|T^{-1}(x_0)\|\,\|f'(x) - f'(x')\| \le \int_{\rho(x)}^{\rho(x, x')} L(u)\,du, \qquad \forall x \in B(x_0, r),\ \forall x' \in B(x, r - \rho(x)). \tag{5.12}$$

Furthermore, assume that $r_0$ satisfies $\int_0^{r_0} L(u)\,du = 1$ and that $b = \int_0^{r_0} u\,L(u)\,du$, as in Section 3, while $r_1 \le r_2$ are the two positive zeros of the majorizing function

$$h_{\beta, L}(t) = \beta - t + \int_0^{t}(t - u)\,L(u)\,du. \tag{5.13}$$

Let the sequence $\{t_n\}$ be generated by the Newton method for the function $h_{\beta,L}$ with initial value $t_0 = 0$. Under the assumption that $T(x_0)$ is surjective and that condition (5.12) is satisfied, we proved that $\beta = \|x_1 - x_0\| \le b$ implies that the sequence $\{x_n\}$ generated by the Gauss-Newton method with initial value $x_0$ (when η = 1 and Δ is large enough) converges to a solution $x^* \in B(x_0, r)$ of problem (5.1) at a quadratic rate, and that

$$\|x_n - x^*\| \le r - t_n,$$

where $r_1 \le r \le r_2$ if β < b and $r = r_1 = r_2$ if β = b. In particular, taking L constant yields the generalization of the famous Kantorovich theorem to convex composite optimization, while taking $L(u) = 2\gamma(1 - \gamma u)^{-3}$ yields the corresponding generalization of the theorem of Smale type.

Acknowledgements  This work was jointly supported by the Special Funds for Major State Basic Research Projects (Grant No. G19990328), the National Natural Science Foundation of China (Grant No. 19971013), and the Zhejiang (Grant No. 100002) and Jiangsu (Grant No. BK99001) Provincial Natural Science Foundations of China.

References
1. Wang, X. H., Han, D. F., On fixed points and Julia sets for iterations of two families, Chinese J. Numer. & Appl. Math., 1997, 19(3): 94.
2. Wang, X. H., Zheng, S. M., Han, D. F., Convergence on Euler series, the iterations of Euler's and Halley's families, Acta Math. Sinica (in Chinese), 1990, 33(6): 721.
3. Wozniakowski, H., Generalized information and maximal order of iteration for operator equations, SIAM J. Numer. Anal., 1975, 12: 121.
4. Smale, S., The fundamental theory for solving equations, in Proceedings of the International Congress of Mathematicians, Providence, RI: AMS, 1987, 185.
5. Kantorovich, L. V., On Newton method for functional equations, Dokl. Acad. N. USSR, 1948, 59(7): 1237.
6. Kantorovich, L. V., Akilov, G. P., Functional Analysis, New York: Pergamon Press, 1982.
7. Smale, S., The fundamental theorem of algebra and complexity theory, Bull. (New Ser.) Amer. Math. Soc., 1981, 4: 1.
8. Smale, S., Newton's method estimates from data at one point, in The Merging of Disciplines: New Directions in Pure, Applied and Computational Mathematics (eds. Ewing, R., Gross, K., Martin, C.), New York: Springer, 1986, 185–196.
9. Wang, X. H., Xuan, X. H., Random polynomial space and complexity theory, Scientia Sinica Ser. A, 1987, 30(7): 637.
10. Smale, S., Complexity theory and numerical analysis, Acta Numer., 1997, 6: 523.
11. Blum, L., Cucker, F., Shub, M. et al., Complexity and Real Computation, New York: Springer, 1998.
12. Wang, X. H., Definite version on precise point estimate, Progress in Natural Science, 1998, 8(2): 152.
13. Chen, P. Y., Approximate zeros of quadratically convergent algorithms, Math. Comput., 1994, 63: 247.
14. Wang, X. H., Convergence of Newton's method and inverse function theorem in Banach spaces, Math. Comput., 1999, 68: 169.
15. Wang, X. H., Han, D. F., On dominating sequence method in the point estimate and Smale theorem, Scientia Sinica Ser. A, 1990, 33(2): 135.
16. Wang, X. H., Han, D. F., Shen, G. X., Some remarks on Smale's "Algorithms for solving equations", Acta Math. Sinica (New Ser.), 1992, 8(4): 337.
17. Wang, X. H., Some results relevant to Smale's reports, in From Topology to Computation: Proceedings of the Smalefest (eds. Hirsch, M., Marsden, J., Shub, M.), New York: Springer, 1993, 456–465.
18. Wang, X. H., A summary on continuous complexity theory, Contemp. Math., 1994, 162: 155.
19. Traub, J., Wozniakowski, H., Convergence and complexity of Newton iteration for operator equation, J. Assoc. Comput. Mach., 1979, 29: 250.
20. Wang, X. H., Convergent neighborhood on Newton's method, Kexue Tongbao (Chinese Science Bulletin), Special Issue of Math., Phys. and Chem., 1980, 25: 36.
21. Wang, X. H., Convergence on the iteration of Halley family in weak conditions, Chinese Science Bulletin, 1997, 42(7): 552.
22. Wang, X. H., Han, D. F., Criterion α and Newton's method, Chinese J. Numer. & Appl. Math., 1997, 19(2): 96.
23. Wang, X. H., Convergence of Newton's method and uniqueness of the solution of equations in Banach space, IMA J. Numer. Anal., 2000, 20(1): 123.
24. Dedieu, J. P., Estimations for the separation number of a polynomial system, J. Symbolic Comput., 1997, 21: 1.
25. Huang, Z. D., On a family of Chebyshev-Halley type methods in Banach space under weak Smale conditions, Numer. Anal. JCU, 2000, 9(1): 37.
26. Wang, X. H., Convergence of iterations of Euler family under weak condition, Science in China, Ser. A, 2000, 43(9): 958.
27. Wang, X. H., Convergence of the iteration of Halley's family and Smale operator class in Banach space, Science in China, Ser. A, 1998, 41(7): 700.
28. McMullen, C., Families of rational maps and iterative root-finding algorithms, Ann. Math., 1987, 125: 467.
29. Curry, H., Garnett, L., Sullivan, D., On the iteration of a rational function: computer experiments with Newton's method, Commun. Math. Phys., 1983, 91: 267.
30. Fatou, P., Sur les équations fonctionnelles, Bull. Soc. Math. France, 1919, 47: 161–271; 1920, 48: 33–94, 208–314.
31. Si, Z. C., Yuan, Y. X., Wonderful Computation, Changsha: Hunan Science and Technology Press, 1999.
32. Smale, S., On the efficiency of algorithms of analysis, Bull. (New Ser.) Amer. Math. Soc., 1985, 13: 87.
33. Sullivan, D., Quasiconformal homeomorphisms and dynamics, Part I: Solution of the Fatou-Julia problem on wandering domains, Ann. Math., 1985, 122: 401.
34. Vrscay, E. R., Gilbert, W. J., Extraneous fixed points, basin boundaries and chaotic dynamics for Schröder and König rational iteration functions, Numer. Math., 1988, 52: 1.
35. Wang, X. H., Han, D. F., The extraneous fixed points of Euler iteration and corresponding Sullivan's basin, Science in China, Ser. A, 2001, 44(3): 292.
36. Bryuno, A. D., Convergence of transformations of differential equations to normal forms, Dokl. Akad. Nauk USSR, 1965, 165: 987.
37. Yoccoz, J. C., Linéarisation des germes de difféomorphismes holomorphes de (C, 0), C. R. Acad. Sci. Paris, 1988, 36: 55.
38. Womersley, R. S., Local properties of algorithms for minimizing nonsmooth composite functions, Math. Programming, 1985, 32: 69.
39. Jittorntrum, K., Osborne, M. R., Strong uniqueness and second order convergence in nonlinear discrete approximation, Numer. Math., 1980, 34: 439.
40. Burke, J. V., Ferris, M. C., A Gauss-Newton method for convex composite optimization, Math. Programming, 1995, 71: 179.
41. Li, C., Wang, X. H., Gauss-Newton methods for a class of nonsmooth optimization problems, Progress in Natural Science, 2000, 10(6): 470.
42. Li, C., Wang, X. H., On convergence of the Gauss-Newton method for convex composite optimization, Math. Programming, 2000, 94: to appear.
43. Robinson, S., Normed convex processes, Trans. Amer. Math. Soc., 1972, 174: 127.

(Received October 30, 2000)