IMA Journal of Numerical Analysis (2011) 31, 322–347
doi:10.1093/imanum/drp015
Advance Access publication on October 11, 2009

Kantorovich's theorems for Newton's method for mappings and optimization problems on Lie groups

Jin-Hua Wang†
Department of Mathematics, Zhejiang University of Technology, Hangzhou 310032, People's Republic of China

Chong Li‡
Department of Mathematics, Zhejiang University, Hangzhou 310027, People's Republic of China

[Received on 4 June 2008; revised on 1 April 2009]

With the classical assumptions on f, a convergence criterion of Newton's method (independent of affine connections) to find zeros of a mapping f from a Lie group to its Lie algebra is established, and estimates of the convergence domains of Newton's method are obtained, which improve the corresponding results in Owren & Welfert (2000, BIT Numer. Math., 40, 121–145) and Wang & Li (2006, J. Zhejiang Univ. Sci. A, 8, 978–986). Applications to optimization are provided and the results due to Mahony (1996, Linear Algebra Appl., 248, 67–89) are extended and improved accordingly.

Keywords: Newton's method; Lie group; Lipschitz condition.

1. Introduction

Recently, there has been increased interest in studying numerical algorithms on manifolds. Classical examples are given by eigenvalue problems, symmetric eigenvalue problems, invariant subspace computations, optimization problems with equality constraints, etc. (see, for example, Luenberger, 1972; Gabay, 1982; Smith, 1993, 1994; Mahony, 1994, 1996; Udriste, 1994; Edelman et al., 1998; Owren & Welfert, 2000; Adler et al., 2002; Absil et al., 2007; Li et al., 2009a). In particular, optimization problems on Lie groups or homogeneous spaces have been studied recently in the context of using continuous-time differential equations for solving problems in numerical linear algebra.
For example, let φ: G → R be given by

  φ(x) = −tr(x^T Q x D)  for each x ∈ G,  (1.1)

where G = SO(N, R) := {x ∈ R^{N×N} | x^T x = I_N and det x = 1}, D ∈ R^{N×N} is the diagonal matrix with diagonal entries 1, 2, . . . , N and Q is a fixed symmetric matrix. Brockett (1988, 1991) and Chu & Driessel (1990) considered the following optimization problem:

  min_{x∈G} φ(x).  (1.2)

† Corresponding author. Email: wjh@zjut.edu.cn
‡ Email: cli@zju.edu.cn
© The author 2009. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.

Brockett (1988, 1991) showed that the minimum x* ∈ G occurs when x*^T Q x* is a diagonal matrix with diagonal entries (eigenvalues of Q) in ascending order. Thus solving the nonlinear optimization problem is equivalent to solving the numerical linear algebra problem of computing the eigenvalues and eigenvectors of the matrix Q. There are many other examples where linear algebra problems are formulated as optimization problems on Lie groups or homogeneous spaces (cf. Bayer & Lagarias, 1989; Helmke & Moore, 1990; Smith, 1991; Mahony et al., 1993).

For a general differentiable function φ on a Lie group, numerical optimization algorithms for solving (1.2) have been studied extensively (see, for example, Brockett, 1993; Mahony et al., 1993; Shub & Smale, 1993; Smith, 1993; Mahony, 1994; Moore et al., 1994). In particular, Mahony (1996) used one-parameter subgroups of a Lie group to develop a version of Newton's method on an arbitrary Lie group, where the approach for solving (1.2) via Newton's method was explored and the local convergence was analysed. In this manner the algorithm presented is independent of affine connections on the Lie group.

On the other hand, motivated by looking for approaches to solving ordinary differential equations on Lie groups, Owren & Welfert (2000) considered the implicit Euler method as a generic example of an implicit integration method in a Lie group setting. Consider the initial value problem on G:

  x′ = x · g(x),  x(0) = x^{(0)},  (1.3)

where g: G → 𝒢 is a differentiable map and x^{(0)} is a random starting point. The application of one step of the backward Euler method to (1.3) leads to the fixed-point problem

  x = x^{(0)} · exp(l g(x)),  (1.4)

where l represents the size of the discretization step. Clearly, solving the problem (1.4) is equivalent to solving the equation

  f(x) = 0,  (1.5)

where f: G → 𝒢 is the mapping defined by f(x) = exp^{−1}((x^{(0)})^{−1} · x) − l g(x) for each x ∈ G. Owren & Welfert (2000) introduced Newton's method, independent of affine connections on the Lie group, for solving the equation (1.5) and showed that Newton's method for the map with its differential satisfying the classical Lipschitz condition is locally quadratically convergent. Recently, the authors of the present paper studied the problems of existence and uniqueness of solutions of (1.5) and estimates of convergence balls of Newton's method for (1.5) in Wang & Li (2007), where, however, all results except Theorem 2 of Wang & Li (2007) on the convergence of Newton's method were for Abelian groups. Extensions of Smale's point estimate theory for Newton's method on Lie groups were presented in Li et al. (2009b).

In a vector space framework, as is well known, one of the most important results on Newton's method is Kantorovich's theorem (cf. Kantorovich & Akilov, 1982). Under the mild condition that the second Fréchet derivative of F is bounded (or, more generally, the first derivative is Lipschitz continuous) on a proper open metric ball around the initial point x0, Kantorovich's theorem provides a simple and clear criterion, based on the knowledge of the first derivative around the initial point, ensuring the existence and uniqueness of the solution of the equation and the quadratic convergence of Newton's method.

Another important result on Newton's method is Smale's (1986) point estimate theory (i.e., α-theory and γ-theory), where the notion of approximate zeros was introduced and rules were established for judging an initial point x0 to be an approximate zero, depending on the information of the analytic nonlinear operator at this initial point and at a solution x*. There has been a lot of work on weakening and/or extending the Lipschitz continuity assumptions made on the mappings (see, for example, Zabrejko & Nguen (1987), Gutiérrez & Hernández (2000), Wang (2000), Ezquerro & Hernández (2002a,b) and the references therein). In particular, Zabrejko & Nguen (1987) parameterized the classical Lipschitz continuity, and Wang (2000) introduced the notion of Lipschitz conditions with an L-average to unify both Kantorovich's and Smale's criteria. In a Riemannian manifold framework an analogue of the well-known Kantorovich theorem was given in Ferreira & Svaiter (2002) for Newton's method for vector fields on Riemannian manifolds, while extensions of Smale's (1986) α-theory and γ-theory to analytic vector fields and analytic mappings on Riemannian manifolds were provided in Dedieu et al. (2003). In the recent paper Li & Wang (2006) the convergence criteria in Dedieu et al. (2003) were improved by using the notion of the γ-condition for vector fields and mappings on Riemannian manifolds. The radii of uniqueness balls of singular points of vector fields satisfying the γ-conditions were estimated in Wang & Li (2006), while the local behaviour of Newton's method on Riemannian manifolds was studied in Li & Wang (2005). Recently, inspired by the previous work of Zabrejko & Nguen (1987) on Kantorovich's majorant method, Alvarez et al. (2008) introduced a Lipschitz-type radial function for the covariant derivative of vector fields and mappings on Riemannian manifolds and established a unified convergence criterion for Newton's method on Riemannian manifolds.

In the spirit of the works mentioned above, a natural and interesting problem is whether an analogue of the well-known Kantorovich theorem (independent of the connection) can be established for Newton's method (for solving (1.5) and/or the optimization problem (1.2)) on Lie groups. On a finite-dimensional complete and connected Riemannian manifold M, Newton's method is defined in terms of geodesics, and the property that any two points of M are connected by at least one geodesic plays a key role in the study. Newton's method on a Lie group is defined in terms of one-parameter subgroups. However, there is no similar property for one-parameter subgroups in an arbitrary Lie group, which makes the study on a Lie group more complicated. In the recent paper Wang & Li (2007) we gave a kind of Kantorovich theorem for (1.5) by using the following metric Lipschitz condition at x0:

  ‖df_{x0}^{−1}(df_{x′} − df_x)‖ ≤ L d(x′, x)  for all x′, x ∈ G with d(x0, x) + d(x, x′) < r1,  (1.6)

where d(·, ·) is the Riemannian distance induced by the left-invariant Riemannian metric. Clearly, this kind of Lipschitz condition still depends on the metric on the underlying group and is generally very difficult to verify in a noncompact group (cf. Example 3.8 in Section 3).

The purpose of the present paper is to establish Kantorovich's theorem (independent of the connection) for Newton's method on a Lie group. More precisely, under the assumption that the differential of f satisfies the Lipschitz condition around the initial point (which is expressed in terms of one-parameter subgroups, is independent of the metric, and is weaker than the metric Lipschitz condition (1.6)), the convergence criterion of Newton's method for solving (1.5) is established in Section 3. As a consequence, estimates of the convergence domains are also obtained. The main feature of our results (Theorems 3.1 and 3.3) is that they are completely independent of the metrics on the groups. In particular, Theorem 3.1 improves and extends Theorem 2 of Wang & Li (2007), while Theorem 3.3 (and its corollaries) improves and extends the corresponding result in Owren & Welfert (2000). In Section 4 we will show that Newton's method for solving the problem (1.2) is equivalent to the one for solving (1.5), where f is a map from a Lie group to its Lie algebra associated to φ, and, as applications, the convergence criterion and the estimates of the convergence domains of Newton's method for solving the problem (1.2) are provided. The results on the convergence domains improve the corresponding results due to Mahony (1996), while the result on the convergence criterion seems new for Newton's method for solving the problem (1.2). Examples are provided to show that the results of the present paper are applicable where those in Mahony (1996) and Owren & Welfert (2000) are not. Most of the notions and notation used in the present paper are standard (see, for example, Helgason, 1978; Varadarajan, 1984).
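As a small numerical illustration of problem (1.2) (a sketch, not from the paper): on G = SO(2, R) every element is a rotation, so (1.2) can be minimized by brute force over the rotation angle. The matrix Q below is an arbitrary symmetric example; at the computed minimizer x, the matrix x^T Q x is (nearly) diagonal with its entries, the eigenvalues of Q, in ascending order, as Brockett's result predicts.

```python
import math

D = [1.0, 2.0]                      # D = diag(1, 2)
Q = [[2.0, 1.0], [1.0, 3.0]]        # an arbitrary fixed symmetric matrix

def phi(theta):
    """phi(x) = -tr(x^T Q x D) for the rotation x = x(theta) in SO(2)."""
    c, s = math.cos(theta), math.sin(theta)
    x = [[c, -s], [s, c]]
    xt = [[x[0][0], x[1][0]], [x[0][1], x[1][1]]]
    qx = [[sum(Q[i][k]*x[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    m = [[sum(xt[i][k]*qx[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return -sum(m[i][i]*D[i] for i in range(2)), m

# brute-force minimization over a fine grid of rotation angles
val, m = min((phi(2*math.pi*t/20000) for t in range(20000)), key=lambda p: p[0])

# at the minimizer, x^T Q x is (nearly) diagonal with ascending diagonal entries
assert abs(m[0][1]) < 1e-2
assert m[0][0] < m[1][1]
```

The diagonal entries agree with the eigenvalues (5 ± √5)/2 of this Q, the smaller eigenvalue paired with the smaller diagonal entry of D.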
2. Notions and preliminaries

A Lie group (G, ·) is a Hausdorff topological group with countable bases that also has the structure of an analytic manifold such that the group product and the inversion are analytic operations in the differentiable structure given on the manifold. The dimension of a Lie group is that of the underlying manifold, and we shall always assume that it is m-dimensional. The symbol e designates the identity element of G. Let 𝒢 be the Lie algebra of the Lie group G, that is, the tangent space T_e G of G at e, equipped with the Lie bracket [·, ·]: 𝒢 × 𝒢 → 𝒢.

In what follows we will make use of the left translation of the Lie group G. We define for each y ∈ G the left translation L_y: G → G by

  L_y(z) = y · z  for each z ∈ G.  (2.1)

The differential of L_y at z is denoted by (L′_y)_z, which clearly determines a linear isomorphism from T_z G to the tangent space T_{y·z} G. In particular, the differential (L′_y)_e of L_y at e determines a linear isomorphism from 𝒢 to the tangent space T_y G.

The exponential map exp: 𝒢 → G is certainly the most important construction associated to 𝒢 and G, and is defined as follows. Given u ∈ 𝒢, let σ_u: R → G be the one-parameter subgroup of G determined by the left-invariant vector field X_u: y ↦ (L′_y)_e(u), i.e., σ_u satisfies σ_u(0) = e and

  σ′_u(t) = X_u(σ_u(t)) = (L′_{σ_u(t)})_e(u)  for each t ∈ R.  (2.2)

The value of the exponential map exp at u is then defined by exp(u) = σ_u(1). Moreover, we have that

  exp(tu) = σ_{tu}(1) = σ_u(t)  for each t ∈ R and u ∈ 𝒢  (2.3)

and

  exp((t + s)u) = exp(tu) · exp(su)  for any t, s ∈ R and u ∈ 𝒢.  (2.4)

Note that the exponential map is not surjective in general. However, the exponential map is a diffeomorphism on an open neighbourhood of 0 ∈ 𝒢. In the case when G is Abelian, exp is also a homomorphism from 𝒢 to G, i.e.,

  exp(u + v) = exp(u) · exp(v)  for all u, v ∈ 𝒢.  (2.5)
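For matrix groups, where exp is the usual matrix exponential, properties (2.4) and (2.5) are easy to check numerically. The sketch below (an illustration, not from the paper) verifies exp((t+s)u) = exp(tu)·exp(su) for a skew-symmetric u in so(2), and that the Abelian identity (2.5) genuinely fails for two non-commuting directions:

```python
def mmul(a, b):
    return [[sum(a[i][k]*b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mexp(a, terms=30):
    """Matrix exponential of a 2x2 matrix via the power series sum a^n/n!."""
    out = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[e/n for e in row] for row in mmul(term, a)]
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

def scal(t, a):
    return [[t*a[i][j] for j in range(2)] for i in range(2)]

u = [[0.0, -1.0], [1.0, 0.0]]       # an element of the Lie algebra so(2)
t, s = 0.3, 0.5
lhs = mexp(scal(t + s, u))
rhs = mmul(mexp(scal(t, u)), mexp(scal(s, u)))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))

# (2.5) fails for non-commuting u, v: exp(u)·exp(v) != exp(u + v)
a = [[0.0, 0.2], [0.0, 0.0]]
b = [[0.0, 0.0], [0.2, 0.0]]
p = mmul(mexp(a), mexp(b))
q = mexp([[a[i][j] + b[i][j] for j in range(2)] for i in range(2)])
assert any(abs(p[i][j] - q[i][j]) > 1e-4 for i in range(2) for j in range(2))
```

The discrepancy in the last check is exactly what the Baker–Campbell–Hausdorff correction terms discussed next account for.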
In the non-Abelian case exp is not a homomorphism and, by the Baker–Campbell–Hausdorff formula (cf. Varadarajan, 1984, p. 114), (2.5) must be replaced by

  exp(w) = exp(u) · exp(v)  (2.6)

for all u and v in a neighbourhood of 0 ∈ 𝒢, where w is defined by

  w := u + v + (1/2)[u, v] + (1/12)([u, [u, v]] + [v, [v, u]]) + · · · .  (2.7)

Let f: G → 𝒢 be a C¹-map and let x ∈ G. We use f′_x to denote the differential of f at x. Then, by DoCarmo (1992, p. 9) (the proof given there for a smooth mapping still works for a C¹-map), for each Δx ∈ T_x G and any nontrivial smooth curve c: (−ε, ε) → G with c(0) = x and c′(0) = Δx, one has that

  f′_x Δx = d/dt (f ∘ c)(t)|_{t=0}.  (2.8)

In particular,

  f′_x Δx = d/dt f(x · exp(t (L′_{x^{−1}})_x Δx))|_{t=0}  for each Δx ∈ T_x G.  (2.9)

Define the linear map df_x: 𝒢 → 𝒢 by

  df_x u = d/dt f(x · exp(tu))|_{t=0}  for each u ∈ 𝒢.  (2.10)

Then, by (2.9), we have

  df_x = f′_x ∘ (L′_x)_e.  (2.11)

Also, by definition, we have for all t > 0 that

  d/dt f(x · exp(tu)) = df_{x·exp(tu)} u  for each u ∈ 𝒢  (2.12)

and

  f(x · exp(tu)) − f(x) = ∫_0^t df_{x·exp(su)} u ds  for each u ∈ 𝒢.  (2.13)

For the remainder of the present paper we always assume that ⟨·, ·⟩ is an inner product on 𝒢 and ‖·‖ is the associated norm on 𝒢. We now introduce the following distance on G, which plays a key role in the study. Let x, y ∈ G and define

  ϱ(x, y) := inf { Σ_{i=1}^k ‖u_i‖ : there exist k ≥ 1 and u_1, . . . , u_k ∈ 𝒢 such that y = x · exp u_1 · · · exp u_k },  (2.14)

where we adopt the convention that inf ∅ = +∞. It is easy to verify that ϱ(·, ·) is a distance on G and that the topology induced by this distance is equivalent to the original one on G. Let x ∈ G and r > 0. We denote the corresponding ball of radius r around x by C_r(x), that is, C_r(x) := {y ∈ G | ϱ(x, y) < r}.

Let L(𝒢) denote the set of all linear operators on 𝒢. Below we shall use the notion of the L-Lipschitz condition and a useful lemma.

DEFINITION 2.1 Let r > 0, let x0 ∈ G and let T be a mapping from G to L(𝒢). Then T is said to satisfy the L-Lipschitz condition on C_r(x0) if

  ‖T(x · exp u) − T(x)‖ ≤ L‖u‖  (2.15)

holds for any u ∈ 𝒢 and x ∈ C_r(x0) such that ‖u‖ + ϱ(x, x0) < r.

LEMMA 2.2 Let 0 < r ≤ 1/L and let x0 ∈ G be such that df_{x0}^{−1} exists. Suppose that df_{x0}^{−1} df satisfies the L-Lipschitz condition on C_r(x0). Let x ∈ C_r(x0) be such that there exist k ≥ 1 and u_0, . . . , u_k ∈ 𝒢 satisfying x = x0 · exp u_0 · · · exp u_k and Σ_{i=0}^k ‖u_i‖ < r. Then df_x^{−1} exists and

  ‖df_x^{−1} df_{x0}‖ ≤ 1 / (1 − L Σ_{i=0}^k ‖u_i‖).  (2.16)

Proof. We write y0 = x0 and y_{i+1} = y_i · exp u_i for each i = 0, . . . , k. Since (2.15) holds with T = df_{x0}^{−1} df, one has that

  ‖df_{x0}^{−1}(df_{y_i·exp u_i} − df_{y_i})‖ ≤ L‖u_i‖  for each 0 ≤ i ≤ k.  (2.17)

Noting that y_{k+1} = x, we have that

  ‖df_{x0}^{−1} df_x − I_𝒢‖ ≤ Σ_{i=0}^k ‖df_{x0}^{−1}(df_{y_i·exp u_i} − df_{y_i})‖ ≤ L Σ_{i=0}^k ‖u_i‖ < 1.

Thus the conclusion follows from the Banach lemma and the proof is complete.

3. Convergence criteria

Following Owren & Welfert (2000), we define Newton's method with initial point x0 for f on a Lie group as follows:

  x_{n+1} = x_n · exp(−df_{x_n}^{−1} f(x_n))  for each n = 0, 1, . . . .  (3.1)

Let β > 0 and L > 0. The quadratic majorizing function h, which was used in Kantorovich & Akilov (1982) and Wang (2000), is defined by

  h(t) = (L/2) t² − t + β  for each t ≥ 0.  (3.2)

Let {t_n} denote the sequence generated by Newton's method with initial value t0 = 0 for h, that is,

  t_{n+1} = t_n − h′(t_n)^{−1} h(t_n)  for each n = 0, 1, . . . .  (3.3)

Assume that λ := Lβ ≤ 1/2. Then h has two zeros r1 and r2 given by
  r1 = (1 − √(1 − 2λ))/L  and  r2 = (1 + √(1 − 2λ))/L.  (3.4)

Moreover, {t_n} is monotonically increasing and convergent to r1, and satisfies

  r1 − t_n = (ξ^{2^n − 1} / Σ_{j=0}^{2^n − 1} ξ^j) r1  for each n = 0, 1, . . . ,  (3.5)

where

  ξ = (1 − √(1 − 2λ)) / (1 + √(1 − 2λ)).  (3.6)

Recall that f: G → 𝒢 is a C¹-mapping. In the remainder of this section we always assume that x0 ∈ G is such that df_{x0}^{−1} exists, and set β := ‖df_{x0}^{−1} f(x0)‖.

THEOREM 3.1 Suppose that df_{x0}^{−1} df satisfies the L-Lipschitz condition on C_{r1}(x0) and that

  λ = Lβ ≤ 1/2.  (3.7)

Then the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges to a zero x* of f. Moreover, for each n = 0, 1, . . . the following assertions hold:

  ϱ(x_{n+1}, x_n) ≤ ‖df_{x_n}^{−1} f(x_n)‖ ≤ t_{n+1} − t_n,  (3.8)

  ϱ(x_n, x*) ≤ (ξ^{2^n − 1} / Σ_{j=0}^{2^n − 1} ξ^j) r1.  (3.9)

Proof. We write v_n = −df_{x_n}^{−1} f(x_n) for each n = 0, 1, . . .. Below we shall show that each v_n is well defined and that

  ϱ(x_{n+1}, x_n) ≤ ‖v_n‖ ≤ t_{n+1} − t_n  (3.10)

holds for each n = 0, 1, . . .. Given this, one sees that the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges to a zero x* of f because, by (3.1), x_{n+1} = x_n · exp v_n for each n = 0, 1, . . .. Furthermore, assertions (3.8) and (3.9) hold for each n, and the proof of the theorem is completed.

Note that v_0 is well defined by assumption and x_1 = x_0 · exp v_0. Hence ϱ(x_1, x_0) ≤ ‖v_0‖. Since ‖v_0‖ = ‖−df_{x_0}^{−1} f(x_0)‖ = β = t_1 − t_0, it follows that (3.10) is true for n = 0. We now proceed by mathematical induction on n. For this purpose we assume that v_n is well defined and (3.10) holds for each n ≤ k − 1. Then

  Σ_{i=0}^{k−1} ‖v_i‖ ≤ t_k − t_0 = t_k < r1  and  x_k = x_0 · exp v_0 · · · exp v_{k−1}.  (3.11)

Thus we can use Lemma 2.2 to conclude that df_{x_k}^{−1} exists and

  ‖df_{x_k}^{−1} df_{x_0}‖ ≤ 1/(1 − L t_k) = −h′(t_k)^{−1}.  (3.12)

Therefore v_k is well defined.
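The scalar majorant sequence {t_n} driving these bounds is simple to compute directly. The following sketch (with illustrative values L = 2 and β = 0.2, chosen for this demonstration and not taken from the paper, so that λ = 0.4 ≤ 1/2) checks the monotone convergence t_n ↑ r1 from (3.3)–(3.4) and the exactness of the error formula (3.5):

```python
L, beta = 2.0, 0.2                          # illustrative values with lam = L*beta <= 1/2
lam = L*beta
sq = (1 - 2*lam)**0.5
r1 = (1 - sq)/L                             # smaller zero of h, from (3.4)
xi = (1 - sq)/(1 + sq)                      # ratio xi from (3.6)

h  = lambda t: 0.5*L*t*t - t + beta         # majorizing function (3.2)
dh = lambda t: L*t - 1

seq = [0.0]                                 # t_0 = 0
for _ in range(25):
    seq.append(seq[-1] - h(seq[-1])/dh(seq[-1]))   # Newton step (3.3)

# monotone convergence t_n -> r1
assert all(a <= b + 1e-15 for a, b in zip(seq, seq[1:]))
assert abs(seq[-1] - r1) < 1e-12

# the closed-form error (3.5): r1 - t_n = xi^(2^n - 1) * r1 / sum_{j<2^n} xi^j, e.g. n = 2
n = 2
pred = xi**(2**n - 1)*r1/sum(xi**j for j in range(2**n))
assert abs((r1 - seq[n]) - pred) < 1e-9
```

The doubling exponent 2^n in (3.5) is exactly the quadratic convergence that Theorem 3.1 transfers from {t_n} to the iterates {x_n}.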
Observe that

  f(x_k) = f(x_k) − f(x_{k−1}) − df_{x_{k−1}} v_{k−1}
        = ∫_0^1 df_{x_{k−1}·exp(t v_{k−1})} v_{k−1} dt − df_{x_{k−1}} v_{k−1}
        = ∫_0^1 (df_{x_{k−1}·exp(t v_{k−1})} − df_{x_{k−1}}) v_{k−1} dt,

where the second equality is valid because of (2.13). Therefore, applying (2.15), one has that

  ‖df_{x_0}^{−1} f(x_k)‖ ≤ ∫_0^1 ‖df_{x_0}^{−1}(df_{x_{k−1}·exp(t v_{k−1})} − df_{x_{k−1}})‖ ‖v_{k−1}‖ dt
    ≤ ∫_0^1 L t ‖v_{k−1}‖ ‖v_{k−1}‖ dt
    ≤ (L/2)(t_k − t_{k−1})²
    = h(t_{k−1}) + h′(t_{k−1})(t_k − t_{k−1}) + (1/2) h″(t_{k−1})(t_k − t_{k−1})²
    = h(t_k),  (3.13)

where the first equality holds because h(t_{k−1}) + h′(t_{k−1})(t_k − t_{k−1}) = 0 by (3.3) and h″ = L. Combining this with (3.12) yields that

  ‖v_k‖ = ‖df_{x_k}^{−1} f(x_k)‖ ≤ ‖df_{x_k}^{−1} df_{x_0}‖ ‖df_{x_0}^{−1} f(x_k)‖ ≤ −h′(t_k)^{−1} h(t_k) = t_{k+1} − t_k.  (3.14)

Since x_{k+1} = x_k · exp v_k, we have ϱ(x_{k+1}, x_k) ≤ ‖v_k‖. This together with (3.14) gives that (3.10) holds for n = k, which completes the proof of the theorem.

The remainder of this section is devoted to an estimate of the convergence domain of Newton's method on G around a zero x* of f. Below we shall always assume that x* ∈ G is a zero of f such that df_{x*}^{−1} exists.

LEMMA 3.2 Let 0 < r ≤ 1/L and let x0 ∈ C_r(x*) be such that there exist j ≥ 1 and w_1, . . . , w_j ∈ 𝒢 satisfying

  x0 = x* · exp w_1 · · · exp w_j  (3.15)

and Σ_{i=1}^j ‖w_i‖ < r. Suppose that df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_r(x*). Then df_{x0}^{−1} exists and

  ‖df_{x0}^{−1} f(x0)‖ ≤ ( (2 + L Σ_{i=1}^j ‖w_i‖) Σ_{i=1}^j ‖w_i‖ ) / ( 2(1 − L Σ_{i=1}^j ‖w_i‖) ).  (3.16)

Proof. By Lemma 2.2, df_{x0}^{−1} exists and

  ‖df_{x0}^{−1} df_{x*}‖ ≤ 1/(1 − L Σ_{i=1}^j ‖w_i‖).  (3.17)

We write y0 = x* and y_i = y_{i−1} · exp w_i for each i = 1, . . . , j. Thus, by (3.15), we have y_j = x0. Fixing i, one has from (2.13) that

  f(y_i) − f(y_{i−1}) = ∫_0^1 df_{y_{i−1}·exp(τ w_i)} w_i dτ,

which implies that

  df_{x*}^{−1}(f(y_i) − f(y_{i−1})) = ∫_0^1 df_{x*}^{−1}(df_{y_{i−1}·exp(τ w_i)} − df_{x*}) w_i dτ + w_i.
  (3.18)

Since df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_r(x*), it follows that

  ‖df_{x*}^{−1}(df_{y_{κ−1}·exp w_κ} − df_{y_{κ−1}})‖ ≤ L‖w_κ‖  for each κ = 1, . . . , j.

This gives that

  ‖df_{x*}^{−1}(df_{y_{i−1}·exp(τ w_i)} − df_{x*})‖
    ≤ Σ_{κ=1}^{i−1} ‖df_{x*}^{−1}(df_{y_{κ−1}·exp w_κ} − df_{y_{κ−1}})‖ + ‖df_{x*}^{−1}(df_{y_{i−1}·exp(τ w_i)} − df_{y_{i−1}})‖
    ≤ L ( Σ_{κ=1}^{i−1} ‖w_κ‖ + τ ‖w_i‖ ).  (3.19)

Noting that f(x*) = 0 and hence f(x0) = Σ_{i=1}^j (f(y_i) − f(y_{i−1})), we have from (3.18) and (3.19) that

  ‖df_{x*}^{−1} f(x0)‖ ≤ Σ_{i=1}^j ( ∫_0^1 ‖df_{x*}^{−1}(df_{y_{i−1}·exp(τ w_i)} − df_{x*})‖ ‖w_i‖ dτ + ‖w_i‖ )
    ≤ Σ_{i=1}^j ( ∫_0^1 L ( Σ_{κ=1}^{i−1} ‖w_κ‖ + τ ‖w_i‖ ) ‖w_i‖ dτ + ‖w_i‖ )
    ≤ ( (L/2) Σ_{i=1}^j ‖w_i‖ + 1 ) Σ_{i=1}^j ‖w_i‖.

Combining this with (3.17) yields that

  ‖df_{x0}^{−1} f(x0)‖ ≤ ‖df_{x0}^{−1} df_{x*}‖ ‖df_{x*}^{−1} f(x0)‖ ≤ ( (2 + L Σ_{i=1}^j ‖w_i‖) Σ_{i=1}^j ‖w_i‖ ) / ( 2(1 − L Σ_{i=1}^j ‖w_i‖) ),

which completes the proof of the lemma.

THEOREM 3.3 Let 0 < r ≤ 1/(4L). Suppose that f(x*) = 0 and that df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_{3r/(1−Lr)}(x*). Let x0 ∈ C_r(x*). Then the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges quadratically to a zero y* of f with ϱ(x*, y*) < 3r/(1 − Lr).

Proof. Since x0 ∈ C_r(x*), there exist j ≥ 1 and w_1, . . . , w_j ∈ 𝒢 satisfying

  x0 = x* · exp w_1 · · · exp w_j

and Σ_{i=1}^j ‖w_i‖ < r. By Lemma 3.2, df_{x0}^{−1} exists and

  β := ‖df_{x0}^{−1} f(x0)‖ ≤ ( (2 + L Σ_{i=1}^j ‖w_i‖) Σ_{i=1}^j ‖w_i‖ ) / ( 2(1 − L Σ_{i=1}^j ‖w_i‖) ).  (3.20)

Set L̄ = L/(1 − L Σ_{i=1}^j ‖w_i‖) and r̄ = (2 + Lr)r/(1 − Lr). We claim that df_{x0}^{−1} df satisfies the L̄-Lipschitz condition on C_{r̄}(x0). To see this let x ∈ C_{r̄}(x0) and u ∈ 𝒢 be such that ‖u‖ + ϱ(x0, x) < r̄. Thus

  ‖u‖ + ϱ(x, x*) ≤ ‖u‖ + ϱ(x, x0) + ϱ(x0, x*) < r̄ + r ≤ 3r/(1 − Lr).

Since df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_{3r/(1−Lr)}(x*), it follows that

  ‖df_{x*}^{−1}(df_{x·exp u} − df_x)‖ ≤ L‖u‖.
  (3.21)

Consequently, by Lemma 2.2, one has that

  ‖df_{x0}^{−1}(df_{x·exp u} − df_x)‖ ≤ ‖df_{x0}^{−1} df_{x*}‖ ‖df_{x*}^{−1}(df_{x·exp u} − df_x)‖ ≤ L‖u‖/(1 − L Σ_{i=1}^j ‖w_i‖) = L̄‖u‖.

This shows that df_{x0}^{−1} df satisfies the L̄-Lipschitz condition on C_{r̄}(x0). Let

  λ̄ := L̄β  and  r̄1 = (1 − √(1 − 2λ̄))/L̄ = 2β/(1 + √(1 − 2λ̄)).  (3.22)

Then, by (3.20), we have

  r̄1 ≤ 2β ≤ ( (2 + L Σ_{i=1}^j ‖w_i‖) Σ_{i=1}^j ‖w_i‖ ) / (1 − L Σ_{i=1}^j ‖w_i‖) ≤ r̄.  (3.23)

Moreover, since L Σ_{i=1}^j ‖w_i‖ ≤ Lr ≤ 1/4, it follows that

  λ̄ = L̄β ≤ ( (2 + L Σ_{i=1}^j ‖w_i‖) L Σ_{i=1}^j ‖w_i‖ ) / ( 2(1 − L Σ_{i=1}^j ‖w_i‖)² ) ≤ ( (2 + 1/4)(1/4) ) / ( 2(1 − 1/4)² ) = 1/2.  (3.24)

Thus Theorem 3.1 is applicable and the sequence {x_n} generated by Newton's method (3.1) with initial point x0 converges to a zero, say y*, of f with ϱ(x0, y*) ≤ r̄1. Furthermore,

  ϱ(x*, y*) ≤ ϱ(x*, x0) + ϱ(x0, y*) < r + r̄1 ≤ r + r̄ ≤ 3r/(1 − Lr).

The proof of the theorem is complete.

In particular, taking r = 1/(4L) in Theorem 3.3, the following corollary is obvious.

COROLLARY 3.4 Suppose that f(x*) = 0 and that df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_{1/L}(x*). Let x0 ∈ C_{1/(4L)}(x*). Then the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges quadratically to a zero y* of f with ϱ(x*, y*) < 1/L.

Theorem 3.3, as well as Corollary 3.4, gives an estimate of the convergence domain for Newton's method. However, we do not know whether the limit y* of the sequence generated by Newton's method with initial point x0 from this domain is equal to the zero x*. The following corollary provides a convergence domain from which the sequence generated by Newton's method with initial point x0 converges to the zero x*. Let r > 0. We use B(0, r) to denote the open ball at 0 with radius r in 𝒢, that is, B(0, r) := {v ∈ 𝒢 | ‖v‖ < r}.

COROLLARY 3.5 Suppose that f(x*) = 0 and that df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_{1/L}(x*).
Let ρ > 0 be the largest number such that C_ρ(e) ⊆ exp(B(0, 1/L)) and let r = min{ρ/(3 + Lρ), 1/(4L)}. Let us write N(x*, r) := x* · exp(B(0, r)). Then, for each x0 ∈ N(x*, r), the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges quadratically to x*.

Proof. Let x0 ∈ N(x*, r). Then there exists v ∈ 𝒢 such that x0 = x* · exp v and

  ‖v‖ < r ≤ min{ρ/(3 + Lρ), 1/(4L)}.

This implies that x0 ∈ C_r(x*) and

  3r/(1 − Lr) ≤ min{1/L, ρ}.  (3.25)

Hence Theorem 3.3 can be applied to conclude that the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges to a zero, say y*, of f with ϱ(x*, y*) < 3r/(1 − Lr). This together with (3.25) implies that ϱ(y*, x*) < ρ. Hence there exists u ∈ 𝒢 such that ‖u‖ < 1/L and y* = x* · exp u. Since

  ‖ ∫_0^1 df_{x*}^{−1} df_{x*·exp(τu)} dτ − I ‖ = ‖ ∫_0^1 df_{x*}^{−1}(df_{x*·exp(τu)} − df_{x*}) dτ ‖ ≤ ∫_0^1 Lτ‖u‖ dτ < 1,

it follows from the Banach lemma that ∫_0^1 df_{x*}^{−1} df_{x*·exp(τu)} dτ is invertible, and so is ∫_0^1 df_{x*·exp(τu)} dτ. Note that

  ∫_0^1 df_{x*·exp(τu)} dτ u = f(y*) − f(x*) = 0.

We obtain that u = 0 and y* = x*, which completes the proof of the corollary.

Recall that, in the special case when G is a compact connected Lie group, G has a bi-invariant Riemannian metric (cf. DoCarmo, 1992, p. 46). Below we assume that G is a compact connected Lie group endowed with a bi-invariant Riemannian metric. An estimate of the convergence domain with the same property as in Corollary 3.5 is described in the following corollary.

COROLLARY 3.6 Let G be a compact connected Lie group endowed with a bi-invariant Riemannian metric. Let 0 < r ≤ 1/(4L).
Suppose that f(x*) = 0 and that df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_{3r/(1−Lr)}(x*). Let x0 ∈ C_r(x*). Then the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges quadratically to x*.

Proof. By Theorem 3.3, the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges to a zero, say y*, of f with ϱ(x*, y*) < 3r/(1 − Lr). Clearly, there is a minimizing geodesic c connecting (x*)^{−1} · y* and e. Since G is a compact connected Lie group endowed with a bi-invariant Riemannian metric, it follows from Helgason (1978, p. 224) that c is also a one-parameter subgroup of G. Consequently, there exists u ∈ 𝒢 such that y* = x* · exp u and

  ‖u‖ = ϱ(x*, y*) < 3r/(1 − Lr).

Since

  ‖ ∫_0^1 df_{x*}^{−1} df_{x*·exp(τu)} dτ − I ‖ = ‖ ∫_0^1 df_{x*}^{−1}(df_{x*·exp(τu)} − df_{x*}) dτ ‖ ≤ ∫_0^1 Lτ‖u‖ dτ < 1,

it follows from the Banach lemma that ∫_0^1 df_{x*}^{−1} df_{x*·exp(τu)} dτ is invertible, and so is ∫_0^1 df_{x*·exp(τu)} dτ. As

  ∫_0^1 df_{x*·exp(τu)} dτ u = f(y*) − f(x*) = 0,

we have that u = 0, and so y* = x*. This completes the proof of the corollary.

In particular, taking r = 1/(4L) in Corollary 3.6, one has the following result.

COROLLARY 3.7 Let G be a compact connected Lie group endowed with a bi-invariant Riemannian metric. Suppose that f(x*) = 0 and that df_{x*}^{−1} df satisfies the L-Lipschitz condition on C_{1/L}(x*). Let x0 ∈ C_{1/(4L)}(x*). Then the sequence {x_n} generated by Newton's method (3.1) with initial point x0 is well defined and converges quadratically to x*.

Clearly, Corollaries 3.5 and 3.7 improve the corresponding local convergence result in Owren & Welfert (2000, Theorem 4.6). The following example is devoted to an application of the results of this section to the initial value problem on the special linear group SL(N, R) and the special orthogonal group SO(N, R) that was considered by Owren & Welfert (2000). This, in particular, presents an example for which the results of the present paper are applicable but not those in Mahony (1996) and Owren & Welfert (2000).

For the following example we define the second differential d²θ_x: 𝒢² → 𝒢 for a C²-map θ: G → 𝒢 as follows:

  d²θ_x u_1 u_2 = ∂²/∂t_2∂t_1 θ(x · exp(t_2 u_2) · exp(t_1 u_1)) |_{t_2 = t_1 = 0}.  (3.26)

Then, by (2.13), we have that

  dθ_{x·exp(tu)} − dθ_x = ∫_0^t d²θ_{x·exp(su)} u ds  for each u ∈ 𝒢 and t ∈ R.  (3.27)

EXAMPLE 3.8 Let N be a positive integer and let I_N be the N × N identity matrix. Let G be the special linear group under standard matrix multiplication (cf. Varadarajan, 1984), that is,

  G = SL(N, R) := {x ∈ R^{N×N} | det x = 1}.

Then G is a connected Lie group and its Lie algebra is

  𝒢 = T_e G = sl(N, R) := {v ∈ R^{N×N} | tr(v) = 0},

where e = I_N. We endow 𝒢 with the standard inner product

  ⟨u, v⟩ = tr(u^T v)  for any u, v ∈ 𝒢.  (3.28)

Hence the corresponding norm ‖·‖ is the Frobenius norm. Moreover, the exponential map exp: 𝒢 → G is given by

  exp(v) = Σ_{n≥0} vⁿ/n!  for each v ∈ 𝒢,

and its inverse is the logarithm (cf. Hall, 2004, p. 34)

  exp^{−1}(z) = Σ_{k≥1} (−1)^{k−1} (z − I_N)^k / k  for each z ∈ G with ‖z − I_N‖ < 1.  (3.29)

Let g: G → 𝒢 be a differentiable map and let x^{(0)} be a random starting point. Consider the following initial value problem on G studied in Owren & Welfert (2000):

  x′ = x · g(x),  x(0) = x^{(0)}.  (3.30)

The application of one step of the backward Euler method to (3.30) leads to the fixed-point problem

  x = x^{(0)} · exp(l g(x)),  (3.31)

where l represents the size of the discretization step. Let f: G → 𝒢 be defined by

  f(x) = exp^{−1}((x^{(0)})^{−1} · x) − l g(x)  for each x ∈ G.

Thus solving the equation (3.31) is equivalent to finding a zero of f.
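Before working through the SL(N, R) estimates, the iteration (3.1) can be tried on a deliberately simple Lie group. Take G = (R_{>0}, ·), a one-dimensional Lie group with Lie algebra R and exponential map exp(u) = e^u, and the toy map f(x) = x − 2 (an illustrative choice, not the map f of the example): by (2.10), df_x u = d/dt (x e^{tu} − 2)|_{t=0} = x u, so the Newton step is x_{n+1} = x_n · exp(−(x_n − 2)/x_n).

```python
import math

# Newton's method (3.1) on the multiplicative group G = (R_>0, ·),
# for the toy map f(x) = x - 2 with df_x u = x*u.
x = 1.0                                  # initial point x0
trace = [x]
for _ in range(8):
    v = -(x - 2.0)/x                     # v_n = -df_{x_n}^{-1} f(x_n), an element of the Lie algebra R
    x = x * math.exp(v)                  # group update x_{n+1} = x_n · exp(v_n)
    trace.append(x)

# the iterates converge (quadratically once close) to the zero x* = 2
assert abs(trace[-1] - 2.0) < 1e-12
```

Note that the update multiplies by exp(v_n) rather than adding v_n, so every iterate automatically stays in the group; this is the same structural point that makes (3.1) connection-independent on a general Lie group.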
Let g be the function considered in Owren & Welfert (2000), defined by

  g(x) = (sin x)(2x − 5x²) − ((sin x)(2x − 5x²))^T  for each x ∈ G,

where

  sin x = Σ_{j≥1} (−1)^{j−1} x^{2j−1}/(2j − 1)!  for each x ∈ G.

Consider the special case when l = 1 and x^{(0)} = exp v_0 with v_0 ∈ 𝒢 satisfying ‖v_0‖ ≤ 1/11. To apply our results we have to estimate the norm of df_{x_0}^{−1}. To do this we write w(·) = exp^{−1}((x^{(0)})^{−1} · (·)). Let x := exp v_1 exp v_2 · · · exp v_k for some v_1, v_2, . . . , v_k ∈ 𝒢 with Σ_{i=0}^k ‖v_i‖ < 1/4. Since

  e^t − 1 ≤ (5/4) t  for each t ∈ [0, 1/4],  (3.32)

one can use mathematical induction to prove that

  ‖x − I_N‖ ≤ e^{Σ_{i=1}^k ‖v_i‖} − 1 ≤ (5/4) Σ_{i=1}^k ‖v_i‖.  (3.33)

Consequently, we have

  ‖(x^{(0)})^{−1} x − I_N‖ ≤ (5/4) Σ_{i=0}^k ‖v_i‖ < 5/16 < 1.  (3.34)

Thus, by definition and using (3.29), one has for each u ∈ 𝒢 that

  dw_x(u) = Σ_{j≥1} (−1)^{j−1} ( Σ_{i=0}^{j−1} ((x^{(0)})^{−1}x − I_N)^i (x^{(0)})^{−1} x u ((x^{(0)})^{−1}x − I_N)^{j−1−i} ) / j.

Since

  ‖(x^{(0)})^{−1} x u‖ = ‖(x^{(0)})^{−1} x u − u + u‖ ≤ (‖(x^{(0)})^{−1} x − I_N‖ + 1)‖u‖,  (3.35)

it follows from (3.35) that

  ‖dw_x(u) − u‖ ≤ ( ‖(x^{(0)})^{−1}x − I_N‖ + Σ_{j≥2} (‖(x^{(0)})^{−1}x − I_N‖ + 1)‖(x^{(0)})^{−1}x − I_N‖^{j−1} ) ‖u‖
    = ( 2‖(x^{(0)})^{−1}x − I_N‖ / (1 − ‖(x^{(0)})^{−1}x − I_N‖) ) ‖u‖.

Hence, by (3.34), we have

  ‖dw_x − I_𝒢‖ ≤ 2‖(x^{(0)})^{−1}x − I_N‖/(1 − ‖(x^{(0)})^{−1}x − I_N‖) ≤ 10 Σ_{i=0}^k ‖v_i‖ / (4 − 5 Σ_{i=0}^k ‖v_i‖),  (3.36)

where I_𝒢 is the identity on 𝒢. Similarly, one has

  ‖d²w_x‖ ≤ Σ_{j≥1} [ (‖(x^{(0)})^{−1}x − I_N‖ + 1)‖(x^{(0)})^{−1}x − I_N‖^{j−1} + (j − 1)(‖(x^{(0)})^{−1}x − I_N‖ + 1)² ‖(x^{(0)})^{−1}x − I_N‖^{j−2} ]
    ≤ (3 + ‖(x^{(0)})^{−1}x − I_N‖) / (1 − ‖(x^{(0)})^{−1}x − I_N‖)².

Combining this and (3.34) gives that

  ‖d²w_x‖ ≤ (3 + (5/4) Σ_{i=0}^k ‖v_i‖) / (1 − (5/4) Σ_{i=0}^k ‖v_i‖)².  (3.37)

Let x0 = I_N and L = 51. Below we verify that df_{x0}^{−1} df satisfies the L-Lipschitz condition at x0 on C_{1/50}(x0).
For this purpose we let v ∈ 𝒢 and x ∈ C_{1/50}(x₀). Then there exist k ≥ 1 and v₁, ..., v_k ∈ 𝒢 such that x = x₀ · exp v₁ ··· exp v_k and ‖v‖ + Σ_{i=1}^k ‖v_i‖ < 1/50. Let s ∈ [0, 1] and write y := x · exp sv = exp v₁ ··· exp v_k · exp sv. Note that, by (3.36), we have

‖dw_{x₀} − I_𝒢‖ ≤ 10‖v₀‖/(4 − 5‖v₀‖) ≤ 2/7  (3.38)

because ‖v₀‖ ≤ 1/11. Since ‖v₀‖ ≤ 1/11 < 1/10, it follows from (3.37) that

‖d²w_y‖ ≤ (3 + (5/4)(1/10 + Σ_{i=1}^k ‖v_i‖ + s‖v‖)) / (1 − (5/4)(1/10 + Σ_{i=1}^k ‖v_i‖ + s‖v‖))².  (3.39)

On the other hand, note that

g(x) = Σ_{j≥1} (−1)^{j−1} [ 2(x^{2j} − (x^{2j})^T)/(2j − 1)! − 5(x^{2j+1} − (x^{2j+1})^T)/(2j − 1)! ].  (3.40)

Then, by definition, dg_{x₀} = (−6 cos 1 − 16 sin 1) I_𝒢. Hence dg_{x₀}⁻¹ = [1/(−6 cos 1 − 16 sin 1)] I_𝒢 and

(I_𝒢 − dg_{x₀})⁻¹ = [1/(1 + 6 cos 1 + 16 sin 1)] I_𝒢.  (3.41)

It follows from (3.38) that

‖(I_𝒢 − dg_{x₀})⁻¹‖ ‖df_{x₀} − (I_𝒢 − dg_{x₀})‖ = ‖(I_𝒢 − dg_{x₀})⁻¹‖ ‖dw_{x₀} − I_𝒢‖ ≤ [1/(1 + 6 cos 1 + 16 sin 1)] · (2/7) < 1.

Thus the Banach lemma is applied to conclude that

‖df_{x₀}⁻¹‖ ≤ [1/(1 + 6 cos 1 + 16 sin 1)] / (1 − [1/(1 + 6 cos 1 + 16 sin 1)] · (2/7)) ≤ 1/10.  (3.42)

By definition, we have

‖d²g_y‖ ≤ 2 Σ_{j≥1} [ 2(2j)²‖y^{2j}‖ + 5(2j + 1)²‖y^{2j+1}‖ ] / (2j − 1)!.  (3.43)

Using (3.33), we have that ‖y^m‖ ≤ (5/4)m(Σ_{i=1}^k ‖v_i‖ + s‖v‖) + 1 for each m ≥ 1, and it follows from (3.43) that

‖d²g_y‖ ≤ (5/4)(Σ_{i=1}^k ‖v_i‖ + s‖v‖) Σ_{j≥1} [4(2j)³ + 10(2j + 1)³]/(2j − 1)! + Σ_{j≥1} [4(2j)² + 10(2j + 1)²]/(2j − 1)!.  (3.44)

Note that for each j ≥ 1 we have

2(2j)^i/(2j − 1)! ≤ (2j − 1 + i)···(2j + 1)(2j)/(2j − 1)! + (2j − 2 + i)···2j(2j − 1)/(2j − 2)! for i = 2, 3.

Then, by elementary calculations, we have that

2 Σ_{j≥1} (2j)^i/(2j − 1)! ≤ Σ_{l≥0} (l + i)···(l + 1)/l! ≤ i! 2^i e for i = 2, 3.

Similarly,

2 Σ_{j≥1} (2j + 1)^i/(2j − 1)! ≤ (i + 1)! 2^i e for i = 2, 3.

Combining (3.44) with the above two inequalities gives the following estimate:

‖d²g_y‖ ≤ 1320e (Σ_{i=1}^k ‖v_i‖ + s‖v‖) + 136e.  (3.45)

This together with (3.39) and (3.42) yields that

‖df_{x₀}⁻¹ d²f_y‖ ≤ ‖df_{x₀}⁻¹‖ ‖d²w_y‖ + ‖df_{x₀}⁻¹‖ ‖d²g_y‖
≤ (1/10) [ (3 + (5/4)(1/10 + Σ_{i=1}^k ‖v_i‖ + s‖v‖))/(1 − (5/4)(1/10 + Σ_{i=1}^k ‖v_i‖ + s‖v‖))² + 1320e(Σ_{i=1}^k ‖v_i‖ + s‖v‖) + 136e ].  (3.46)

Then it follows from (3.27) and the fact that y = x · exp sv that

‖df_{x₀}⁻¹(df_{x·exp v} − df_x)‖ ≤ ∫₀¹ ‖df_{x₀}⁻¹ d²f_{x·exp sv}‖ ‖v‖ ds
≤ ∫₀¹ (1/10)[ (3 + (5/4)(1/10 + Σ_{i=1}^k ‖v_i‖ + s‖v‖))/(1 − (5/4)(1/10 + Σ_{i=1}^k ‖v_i‖ + s‖v‖))² + 1320e(Σ_{i=1}^k ‖v_i‖ + s‖v‖) + 136e ] ‖v‖ ds
≤ 51‖v‖.  (3.47)

Thus the claim stands. Moreover, r₁ < 1/L < 1/50, and so df_{x₀}⁻¹ df satisfies the L-Lipschitz condition at x₀ on C_{r₁}(x₀). Noting that f(x₀) = −v₀, we have by (3.42) that

λ = Lβ = L‖df_{x₀}⁻¹ f(x₀)‖ ≤ L‖df_{x₀}⁻¹‖ ‖v₀‖ ≤ 51 · (1/10) · ‖v₀‖ < 1/2.

Thus Theorem 3.1 is applied to conclude that the sequence generated by (3.1) with initial point x₀ = I_N converges to a zero x* of f.

To illustrate an application of Corollary 3.4 let us take x^{(0)} = I_N, that is, f: G → 𝒢 is defined by

f(x) = exp⁻¹(x) − (sin x)(2x − 5x²) + ((sin x)(2x − 5x²))^T for each x ∈ G.

Then x* := I_N is a zero of f. Furthermore, by (3.35), we have dw_{x*} = I_𝒢 and

df_{x*} = dw_{x*} − dg_{x*} = (1 + 6 cos 1 + 16 sin 1) I_𝒢.  (3.48)

As before, let v ∈ 𝒢 and x ∈ C_{1/50}(x*). Then there exist k ≥ 1 and v₁, v₂, ..., v_k ∈ 𝒢 such that x = exp v₁ exp v₂ ··· exp v_k and Σ_{i=1}^k ‖v_i‖ + ‖v‖ < 1/50. Similarly, let s ∈ [0, 1] and write y := x exp sv = exp v₁ exp v₂ ··· exp v_k exp sv. Note that, by (3.37) (as v₀ = 0), one has that

‖d²w_y‖ ≤ (3 + (5/4)(Σ_{i=1}^k ‖v_i‖ + s‖v‖)) / (1 − (5/4)(Σ_{i=1}^k ‖v_i‖ + s‖v‖))²,  (3.49)

where (5/4)(Σ_{i=1}^k ‖v_i‖ + s‖v‖) < 1/40.
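The constants 1320e and 136e in (3.45) come from the two elementary series bounds just used; both are easy to confirm numerically. A short sketch (ours; 40 terms are ample since the factorials dominate):

```python
import math

e = math.e
for i in (2, 3):
    s_even = 2 * sum((2 * j) ** i / math.factorial(2 * j - 1) for j in range(1, 40))
    s_odd = 2 * sum((2 * j + 1) ** i / math.factorial(2 * j - 1) for j in range(1, 40))
    assert s_even <= math.factorial(i) * 2 ** i * e        # bound for the even powers
    assert s_odd <= math.factorial(i + 1) * 2 ** i * e     # bound for the odd powers

# the resulting coefficients in (3.45):
# (5/4)(2 * 3! * 2^3 + 5 * 4! * 2^3) = 1320 and 2 * 2! * 2^2 + 5 * 3! * 2^2 = 136
assert (5 / 4) * (2 * math.factorial(3) * 2 ** 3 + 5 * math.factorial(4) * 2 ** 3) == 1320
assert 2 * math.factorial(2) * 2 ** 2 + 5 * math.factorial(3) * 2 ** 2 == 136
```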
Note that x* = x₀ = I_N. Then, using (3.43)–(3.45) and (3.49), one can verify (with almost the same arguments as we did for (3.46) and (3.47)) that

‖df_{x*}⁻¹ d²f_y‖ ≤ 121/(480(1 − (5/4)(Σ_{i=1}^k ‖v_i‖ + s‖v‖))²) + (e/12)(1320(Σ_{i=1}^k ‖v_i‖ + s‖v‖) + 136)

and

‖df_{x*}⁻¹(df_{x·exp v} − df_x)‖ ≤ 51‖v‖,

that is, df_{x*}⁻¹ df satisfies the L-Lipschitz condition at x* on C_{1/50}(x*) and so on C_{1/L}(x*) because 1/L < 1/50. Take x₀ = x*·exp v with v ∈ 𝒢 and ‖v‖ < 1/204. Corollary 3.4 is applied to conclude that the sequence generated by (3.1) with initial point x₀ is well defined and converges quadratically to a zero y* of f in C_{1/L}(x*).

Furthermore, if we take G to be the special orthogonal group under standard matrix multiplication, that is,

G = SO(N, R) := {x ∈ R^{N×N} | x^T x = I_N and det x = 1},  (3.50)

then G is a compact connected Lie group and its Lie algebra is the set of all N × N skew-symmetric matrices, that is,

𝒢 = so(N, R) := {v ∈ R^{N×N} | v^T + v = 0}.  (3.51)

Note that [u, v] = uv − vu and ⟨[u, v], w⟩_e = −⟨u, [w, v]⟩_e for any u, v, w ∈ 𝒢. One can easily verify (cf. DoCarmo, 1992, p. 41) that the left-invariant Riemannian metric induced by the inner product in (3.28) is a bi-invariant metric on G. Then Corollary 3.7 is applicable and the sequence generated by (3.1) with initial point x₀ is well defined and converges quadratically to x*.

We end this section with a remark.

REMARK 3.9 It would be helpful to make some comparisons of Theorem 3.1 with the corresponding result given by Theorem 2 in Wang & Li (2007), where the convergence criterion (3.7) was provided under the following metric L-Lipschitz condition at x₀:

‖df_{x₀}⁻¹(df_{x′} − df_x)‖ ≤ L d(x′, x) for all x′, x ∈ G with d(x₀, x) + d(x, x′) < r₁,  (3.52)

where d(·,·) is the Riemannian distance induced by the left-invariant Riemannian metric. Clearly, this kind of metric L-Lipschitz condition is dependent on the metric on G and is much stronger than the L-Lipschitz condition on C_{r₁}(x₀) given in Definition 2.1:

‖df_{x₀}⁻¹(df_{x·exp u} − df_x)‖ ≤ L‖u‖ for all x ∈ C_{r₁}(x₀) and u ∈ 𝒢 with ϱ(x, x₀) + ‖u‖ < r₁.  (3.53)

Usually, in a noncompact Lie group, (3.52) is difficult to verify, in particular for points x and x′ that no one-parameter semigroups connect, because df_(·) contains no information about the distance between the two points in G. Hence it is not convenient to apply Theorem 2 of Wang & Li (2007). For example, consider the group G = SL(N, R) in Example 3.8. It would be very difficult to verify the metric L-Lipschitz condition (3.52) (in fact, we do not know how to do that).

4. Applications to optimization problems

Let φ: G → R be a C²-map. Consider the following optimization problem:

min_{x∈G} φ(x).  (4.1)

Newton's method for solving (4.1) was presented in Mahony (1996), where a local quadratic convergence result was established for a smooth function φ. Let X ∈ 𝒢. Following Mahony (1996), we use X̃ to denote the left-invariant vector field associated with X, defined by X̃(x) = (L′_x)_e X for each x ∈ G, and X̃φ is the Lie derivative of φ with respect to the left-invariant vector field X̃, that is, for each x ∈ G we have

(X̃φ)(x) = d/dt φ(x · exp tX)|_{t=0}.  (4.2)

Let {X₁, ..., X_n} be an orthonormal basis of 𝒢. According to Helmke & Moore (1994, p. 356) (see also Mahony, 1996), grad φ is a vector field on G defined by

grad φ(x) = (X̃₁, ..., X̃_n)((X̃₁φ)(x), ..., (X̃_nφ)(x))^T = Σ_{j=1}^n (X̃_jφ)(x) X̃_j for each x ∈ G.  (4.3)

Then Newton's method with initial point x₀ ∈ G that was considered in Mahony (1996) can be written in a coordinate-free form as follows.

ALGORITHM 4.1 Find X_k ∈ 𝒢 such that X̃_k(x) = (L′_x)_e X_k and grad φ(x_k) + grad(X̃_kφ)(x_k) = 0. Set x_{k+1} = x_k · exp X_k. Set k ← k + 1 and repeat.

Let f: G → 𝒢 be the mapping defined by

f(x) = (L′_x)_e⁻¹ grad φ(x) for each x ∈ G.  (4.4)
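The objects (4.2)–(4.4) are directly computable. The sketch below (ours; it takes φ to be the Brockett cost (1.1) on SO(3), approximates the Lie derivative (4.2) by central differences, and uses the closed form (X̃φ)(x) = tr(X[x^TQx, D]), obtained here by differentiating (1.1) directly) checks that (X̃φ)(x) is linear in X, so that the expansion (4.3) over an orthonormal basis is consistent:

```python
import numpy as np

rng = np.random.default_rng(1)
Q0 = rng.standard_normal((3, 3))
Q = Q0 + Q0.T                                # a fixed symmetric matrix
D = np.diag([1.0, 2.0, 3.0])
phi = lambda x: -np.trace(x.T @ Q @ x @ D)   # the cost (1.1)

def mexp(v, terms=25):
    out, term = np.eye(3), np.eye(3)
    for n in range(1, terms):
        term = term @ v / n
        out = out + term
    return out

def lie_deriv(X, x, h=1e-5):
    # central-difference approximation of (4.2)
    return (phi(x @ mexp(h * X)) - phi(x @ mexp(-h * X))) / (2 * h)

b1 = np.array([[0.0, 1, 0], [-1, 0, 0], [0, 0, 0]])
b2 = np.array([[0.0, 0, 1], [0, 0, 0], [-1, 0, 0]])
b3 = np.array([[0.0, 0, 0], [0, 0, 1], [0, -1, 0]])
Xs = [b / np.sqrt(2) for b in (b1, b2, b3)]  # orthonormal basis of so(3, R)

x = mexp(0.3 * b1) @ mexp(-0.2 * b3)         # a point of SO(3)
U0 = rng.standard_normal((3, 3))
U = U0 - U0.T                                # an arbitrary direction in so(3, R)

# linearity behind (4.3): U-tilde(phi) = sum_j <U, X_j>_e X_j-tilde(phi)
coeffs = [np.trace(Xj.T @ U) for Xj in Xs]
assert abs(lie_deriv(U, x) - sum(c * lie_deriv(Xj, x) for c, Xj in zip(coeffs, Xs))) < 1e-6

# closed form derived here: d/dt phi(x exp tU)|_0 = tr(U [A, D]) with A = x^T Q x
A = x.T @ Q @ x
assert abs(lie_deriv(U, x) - np.trace(U @ (A @ D - D @ A))) < 1e-6
```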
Define the linear operator H_xφ: 𝒢 → 𝒢 for each x ∈ G by

(H_xφ)X = (L′_x)_e⁻¹ grad(X̃φ)(x) for each X ∈ 𝒢.  (4.5)

Then H_(·)φ defines a mapping from G to L(𝒢). The following proposition gives the equivalence between df_x and H_xφ.

PROPOSITION 4.2 Let f(·) and H_(·)φ be defined by (4.4) and (4.5), respectively. Then

df_x = H_xφ for each x ∈ G.  (4.6)

Proof. Let x ∈ G and let {X₁, ..., X_n} be an orthonormal basis of 𝒢. In view of (4.3) and (4.4), we have that

f(x) = (X₁, ..., X_n)((X̃₁φ)(x), ..., (X̃_nφ)(x))^T.  (4.7)

Since φ is a C²-mapping, it is easy to see by definition that

X̃_j(X̃φ)(x) = X̃(X̃_jφ)(x) for each X ∈ 𝒢 and j = 1, 2, ..., n.  (4.8)

Therefore, by (4.7), we have that

df_x(X) = d/dt f(x · exp tX)|_{t=0}
= (X₁, ..., X_n)( d/dt (X̃₁φ)(x · exp tX)|_{t=0}, ..., d/dt (X̃_nφ)(x · exp tX)|_{t=0} )^T
= (X₁, ..., X_n)( (X̃(X̃₁φ))(x), ..., (X̃(X̃_nφ))(x) )^T
= (X₁, ..., X_n)( (X̃₁(X̃φ))(x), ..., (X̃_n(X̃φ))(x) )^T
= (L′_x)_e⁻¹ (X̃₁, ..., X̃_n)( (X̃₁(X̃φ))(x), ..., (X̃_n(X̃φ))(x) )^T = (H_xφ)X,

where the fourth equality holds because of (4.8). This means that (4.6) holds and the proof is complete.

REMARK 4.3 One can easily see from Proposition 4.2 that, with the same initial point, the sequence generated by Algorithm 4.1 for φ coincides with the one generated by Newton's method (3.1) for f defined by (4.4).

Let x₀ ∈ G be such that (H_{x₀}φ)⁻¹ exists and let βφ := ‖(H_{x₀}φ)⁻¹(L′_{x₀})_e⁻¹ grad φ(x₀)‖. Recall that r₁ is defined by (3.4). Then the main theorem of this section is as follows.

THEOREM 4.4 Suppose that

λ := Lβφ ≤ 1/2  (4.9)

and that (H_{x₀}φ)⁻¹(H_(·)φ) satisfies the L-Lipschitz condition on C_{r₁}(x₀).
Then the sequence generated by Algorithm 4.1 with initial point x₀ is well defined and converges to a critical point x* of φ at which grad φ(x*) = 0. Furthermore, if H_{x₀}φ is additionally positive definite and the following Lipschitz condition is satisfied:

‖(H_{x₀}φ)⁻¹‖ ‖H_{x·exp u}φ − H_xφ‖ ≤ L‖u‖ for x ∈ G and u ∈ 𝒢 with ϱ(x₀, x) + ‖u‖ < r₁,  (4.10)

then x* is a local solution of (4.1).

Proof. Recall that f is defined by (4.4). Then, by Proposition 4.2, df_x = H_xφ for each x ∈ G. Hence, by the assumptions, df_{x₀}⁻¹ df satisfies the L-Lipschitz condition on C_{r₁}(x₀) and condition (3.7) is satisfied because Lβ = Lβφ ≤ 1/2. Thus Theorem 3.1 is applicable. Hence the sequence generated by Newton's method for f with initial point x₀ is well defined and converges to a zero x* of f. Consequently, by Remark 4.3, one sees that the first assertion holds.

To prove the second assertion we assume that H_{x₀}φ is additionally positive definite and the Lipschitz condition (4.10) is satisfied. It is sufficient to prove that H_{x*}φ is positive definite. Let λ* and λ₀ be the minimum eigenvalues of H_{x*}φ and H_{x₀}φ, respectively. Then λ₀ > 0. We have to show that λ* > 0. To do this we let {x_n} be the sequence generated by Algorithm 4.1 and write v_n = df_{x_n}⁻¹ f(x_n) for each n = 0, 1, .... Then, by Remark 4.3,

x_{n+1} = x_n · exp(−v_n) for each n = 0, 1, ...,  (4.11)

and, by Theorem 3.1, we have

‖v_n‖ ≤ t_{n+1} − t_n for each n = 0, 1, ....  (4.12)

Therefore for each n = 0, 1, ... we have

‖(H_{x₀}φ)⁻¹‖ ‖H_{x_{n+1}}φ − H_{x₀}φ‖ = ‖(H_{x₀}φ)⁻¹‖ ‖H_{x_n·exp(−v_n)}φ − H_{x₀}φ‖
≤ Σ_{j=0}^n ‖(H_{x₀}φ)⁻¹‖ ‖H_{x_j·exp(−v_j)}φ − H_{x_j}φ‖
≤ Σ_{j=0}^n L‖v_j‖
≤ Σ_{j=0}^n L(t_{j+1} − t_j)
≤ Lr₁  (4.13)

due to (4.10)–(4.12).
Since

|λ*/λ₀ − 1| = (1/λ₀) | min_{v∈𝒢, ‖v‖=1} ⟨(H_{x*}φ)v, v⟩ − min_{v∈𝒢, ‖v‖=1} ⟨(H_{x₀}φ)v, v⟩ | ≤ ‖(H_{x₀}φ)⁻¹‖ ‖H_{x*}φ − H_{x₀}φ‖,

it follows that

|λ*/λ₀ − 1| ≤ lim_{n→∞} ‖(H_{x₀}φ)⁻¹‖ ‖H_{x_{n+1}}φ − H_{x₀}φ‖ ≤ Lr₁ < 1

due to (4.13). This implies that λ* > 0 and completes the proof.

Similar to the proof of Theorem 4.4, we can verify the following corollaries.

COROLLARY 4.5 Let x* be a local optimal solution of (4.1) such that (H_{x*}φ)⁻¹ exists. Suppose that (H_{x*}φ)⁻¹(H_(·)φ) satisfies the L-Lipschitz condition on C_{1/L}(x*). Let x₀ ∈ C_{1/(4L)}(x*). Then the sequence generated by Algorithm 4.1 with initial point x₀ is well defined and converges to a critical point, say y*, of φ and ϱ(x*, y*) < 1/L. Furthermore, if H_{x*}φ is additionally positive definite and the following Lipschitz condition is satisfied:

‖(H_{x*}φ)⁻¹‖ ‖H_{x·exp u}φ − H_xφ‖ ≤ L‖u‖ for x ∈ G and u ∈ 𝒢 with ϱ(x*, x) + ‖u‖ < 1/L,  (4.14)

then y* is also a local solution of (4.1).

COROLLARY 4.6 Let x* be a local optimal solution of (4.1) such that (H_{x*}φ)⁻¹ exists. Suppose that (H_{x*}φ)⁻¹(H_(·)φ) satisfies the L-Lipschitz condition on C_{1/L}(x*). Let ρ > 0 be the largest number such that C_ρ(e) ⊆ exp(B(0, 1/L)) and let r = min{ρ/(3 + Lρ), 1/(4L)}. Write N(x*, r) := x* · exp(B(0, r)). Then, for each x₀ ∈ N(x*, r), the sequence generated by Algorithm 4.1 with initial point x₀ is well defined and converges quadratically to x*.

COROLLARY 4.7 Let G be a compact connected Lie group that is endowed with a bi-invariant Riemannian metric. Let x* be a local optimal solution of (4.1) such that (H_{x*}φ)⁻¹ exists. Suppose that (H_{x*}φ)⁻¹(H_(·)φ) satisfies the L-Lipschitz condition on C_{1/L}(x*). Let x₀ ∈ C_{1/(4L)}(x*). Then the sequence generated by Algorithm 4.1 with initial point x₀ is well defined and converges quadratically to x*.
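The setting of these corollaries can be exercised on the Brockett cost (1.1). The sketch below (ours, not the paper's computation) runs Algorithm 4.1 on SO(3), using the commutator expressions (4.15) and (4.17) of Example 4.8 below for grad φ and H_xφ; Q is built with known eigenvectors so that a nondegenerate minimizer is known, and the seed, starting perturbation and tolerance are our choices:

```python
import numpy as np

D = np.diag([1.0, 2.0, 3.0])
b1 = np.array([[0.0, 1, 0], [-1, 0, 0], [0, 0, 0]])
b2 = np.array([[0.0, 0, 1], [0, 0, 0], [-1, 0, 0]])
b3 = np.array([[0.0, 0, 0], [0, 0, 1], [0, -1, 0]])
coords = lambda w: np.array([w[0, 1], w[0, 2], w[1, 2]])

def mexp(v, terms=30):
    out, term = np.eye(3), np.eye(3)
    for n in range(1, terms):
        term = term @ v / n
        out = out + term
    return out

rng = np.random.default_rng(2)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] = -R[:, 0]                      # make R a rotation
Q = R @ np.diag([1.0, 2.0, 3.0]) @ R.T      # symmetric, eigenvector matrix R

def newton_step(x):
    A = x.T @ Q @ x
    fx = -(A @ D - D @ A)                   # (4.15): (L'_x)^{-1} grad phi(x) = -[A, D]
    def H(X):                               # (4.17): (H_x phi)X = [A, [D, X]] for skew X
        S = D @ X - X @ D
        return A @ S - S @ A
    Hmat = np.column_stack([coords(H(bi)) for bi in (b1, b2, b3)])
    c = np.linalg.solve(Hmat, -coords(fx))  # Algorithm 4.1: (H_x phi)X_k = -f(x_k)
    return x @ mexp(c[0] * b1 + c[1] * b2 + c[2] * b3)

x = R @ mexp(0.1 * b1 - 0.05 * b3)          # start near the minimizer x* = R
for _ in range(10):
    x = newton_step(x)
A = x.T @ Q @ x
assert np.linalg.norm(A @ D - D @ A) < 1e-8   # grad phi vanishes: a critical point
```

With the eigenvalues of Q paired in increasing order with the entries of D, the Hessian eigenvalues (a_i − a_j)(d_i − d_j) are positive, so the limit is a nondegenerate local minimizer, as in Theorem 4.4.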
We end this paper with two examples with φ defined by (1.1) for which our results in the present paper are applicable but not the results in Mahony (1996).

EXAMPLE 4.8 Let G and 𝒢 be given by (3.50) and (3.51), respectively, with N = 3. Consider the optimization problem with φ: G → R given by

φ(x) = −tr(x^T Q x D) for each x ∈ G,

where D is the diagonal matrix with diagonal entries 1, 2 and 3 and Q is a fixed symmetric matrix. Then it is known (see, for example, Smith, 1993, 1994; Mahony, 1996) that

grad φ(x) = −(L′_x)_e [x^T Qx, D] = −x[x^T Qx, D]  (4.15)

and

(X̃φ)(x) = −tr(x^T Qx[D, X^T]).  (4.16)

Therefore, in view of (4.15), we have

grad(X̃φ)(x) = −(L′_x)_e [x^T Qx, [D, X^T]] = −x[x^T Qx, [D, X^T]].

This implies that

(H_xφ)X = (L′_x)_e⁻¹ grad(X̃φ)(x) = −[x^T Qx, [D, X^T]].  (4.17)

Fix X ∈ 𝒢 and define the map g: G → 𝒢 by

g(x) := (H_xφ)X = −[x^T Qx, [D, X^T]] for each x ∈ G.  (4.18)

Then, by definition (cf. (2.10)), for each s ∈ [0, 1] and u ∈ 𝒢 we have

dg_{x·exp su} u = d/dt ( −[(x e^{su} e^{tu})^T Q x e^{su} e^{tu}, [D, X^T]] )|_{t=0}
= −d/dt [ e^{−tu} ((x e^{su})^T Q x e^{su}) e^{tu}, [D, X^T] ]|_{t=0}
= −d/dt [ Ad_{e^{−tu}}((x e^{su})^T Q x e^{su}), [D, X^T] ]|_{t=0}
= −[[−u, (x e^{su})^T Q x e^{su}], [D, X^T]],

where

Ad_{e^{tX}}(Y) = Y + t[X, Y] + (t²/2!)[X, [X, Y]] + ··· for each X, Y ∈ 𝒢.  (4.19)

Remember that the norm ‖·‖ is the Frobenius norm. Consequently, for each u ∈ 𝒢 we have

‖dg_{x·exp su} u‖ ≤ 8‖u‖‖Q‖‖D‖‖X‖ = 8√14 ‖u‖‖Q‖‖X‖.

Hence, applying (2.13), we have that

‖g(x·exp u) − g(x)‖ ≤ ∫₀¹ ‖dg_{x·exp(su)} u‖ ds ≤ 8√14 ‖u‖‖Q‖‖X‖.

This together with (4.18) implies that

‖H_{x·exp u}φ − H_xφ‖ ≤ 8√14 ‖u‖‖Q‖ for each u ∈ 𝒢.  (4.20)

In particular, taking

Q = [[1, 0.003, 0], [0.003, 2, 0], [0, 0, 0]],

we have

‖H_{x·exp u}φ − H_xφ‖ ≤ 16√21 ‖u‖ for each u ∈ 𝒢.  (4.21)

Let

b₁ = [[0, 1, 0], [−1, 0, 0], [0, 0, 0]],  b₂ = [[0, 0, 1], [0, 0, 0], [−1, 0, 0]],  b₃ = [[0, 0, 0], [0, 0, 1], [0, −1, 0]].  (4.22)

Then {b₁, b₂, b₃} is a basis of so(3, R).
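The commutator formulas (4.15) and (4.17) can be checked against finite differences of φ. The sketch below (ours; random symmetric Q, ad hoc step sizes) tests the gradient coordinates −[A, D] via first differences and the quadratic form of H_xφ via second differences:

```python
import numpy as np

rng = np.random.default_rng(3)
Q0 = rng.standard_normal((3, 3))
Q = Q0 + Q0.T
D = np.diag([1.0, 2.0, 3.0])
phi = lambda x: -np.trace(x.T @ Q @ x @ D)

def mexp(v, terms=30):
    out, term = np.eye(3), np.eye(3)
    for n in range(1, terms):
        term = term @ v / n
        out = out + term
    return out

b1 = np.array([[0.0, 1, 0], [-1, 0, 0], [0, 0, 0]])
b3 = np.array([[0.0, 0, 0], [0, 0, 1], [0, -1, 0]])
x = mexp(0.4 * b1) @ mexp(0.2 * b3)          # a point of SO(3)
U0 = rng.standard_normal((3, 3))
U = U0 - U0.T                                # a direction in so(3, R)
A = x.T @ Q @ x

# (4.15): d/dt phi(x exp tU)|_0 = <-[A, D], U>_e
h = 1e-5
first = (phi(x @ mexp(h * U)) - phi(x @ mexp(-h * U))) / (2 * h)
grad_coord = -(A @ D - D @ A)                # (L'_x)^{-1} grad phi(x)
assert abs(first - np.trace(grad_coord.T @ U)) < 1e-5

# (4.17): <(H_x phi)U, U>_e agrees with d^2/dt^2 phi(x exp tU)|_0
h2 = 1e-4
second = (phi(x @ mexp(h2 * U)) - 2 * phi(x) + phi(x @ mexp(-h2 * U))) / h2 ** 2
S = D @ U.T - U.T @ D
HU = -(A @ S - S @ A)                        # (H_x phi)U = -[A, [D, U^T]]
assert abs(second - np.trace(HU.T @ U)) < 1e-3
```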
Thus we can endow so(3, R) with the 2-norm ‖·‖₂ defined by

‖u‖₂ = √(u₁² + u₂² + u₃²) for each u = u₁b₁ + u₂b₂ + u₃b₃ ∈ so(3, R).

It is routine to check that ‖u‖ = √2 ‖u‖₂ for each u ∈ so(3, R). Let x₀ = I₃ and let

C = [[1, 0, 0], [0, −2, −0.003], [0, −0.006, −2]].

Then, by (4.17), we have that

(H_{x₀}φ)(b₁, b₂, b₃) = (b₁, b₂, b₃)C, and so (H_{x₀}φ)⁻¹(b₁, b₂, b₃) = (b₁, b₂, b₃)C⁻¹.  (4.23)

Therefore for any u = u₁b₁ + u₂b₂ + u₃b₃ ∈ so(3, R) we have

‖(H_{x₀}φ)⁻¹u‖ = ‖(b₁, b₂, b₃)C⁻¹(u₁, u₂, u₃)^T‖ = √2 ‖C⁻¹(u₁, u₂, u₃)^T‖₂ ≤ √2 ‖C⁻¹‖ ‖u‖₂ = ‖C⁻¹‖ ‖u‖.

Consequently,

‖(H_{x₀}φ)⁻¹‖ ≤ ‖C⁻¹‖ ≤ √2.  (4.24)

Combining this with (4.21) yields that (H_{x₀}φ)⁻¹(H_(·)φ) satisfies the L-Lipschitz condition with L = 16√42. Furthermore, by (4.15) and (4.23), we have

βφ = ‖(H_{x₀}φ)⁻¹(L′_{x₀})_e⁻¹ grad φ(x₀)‖ = ‖(H_{x₀}φ)⁻¹[Q, D]‖ = 0.003√2.

Hence λ = Lβφ < 1/2. Thus Theorem 4.4 is applicable and the sequence generated by Algorithm 4.1 with initial point x₀ = I₃ is well defined and convergent to a critical point x* of φ.

EXAMPLE 4.9 Consider the same problem as in Example 4.8 but with Q defined by

Q = [[2, −2, 0], [−2, 1, −2], [0, −2, 0]].

Let

x* = (1/3)[[1, −2, 2], [2, −1, −2], [2, 2, 1]].  (4.25)

Since x*^T Q x* = diag(−2, 1, 4), it follows that x* is a local optimal solution of φ (cf. Brockett, 1988, 1991; Mahony, 1996). Let x ∈ G and u ∈ 𝒢. Then it follows from (4.20) that

‖H_{x·exp u}φ − H_xφ‖ ≤ 8√14 ‖u‖‖Q‖ = 56√6 ‖u‖.  (4.26)

Let b₁, b₂ and b₃ be given by (4.22) and let

C = diag(3, 12, 3).

With the same arguments as we had in Example 4.8, we can show that (H_{x*}φ)(b₁, b₂, b₃) = (b₁, b₂, b₃)C and

‖(H_{x*}φ)⁻¹‖ ≤ ‖C⁻¹‖ = √33/12.  (4.27)
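The concrete numbers in these two examples can be reproduced mechanically. The sketch below (ours) checks the norm identity ‖u‖ = √2 ‖u‖₂ and, for Example 4.9, that x*^T Q x* = diag(−2, 1, 4), that H_{x*}φ acts on the basis as C = diag(3, 12, 3), and that ‖C⁻¹‖ = √33/12:

```python
import math
import numpy as np

b1 = np.array([[0.0, 1, 0], [-1, 0, 0], [0, 0, 0]])
b2 = np.array([[0.0, 0, 1], [0, 0, 0], [-1, 0, 0]])
b3 = np.array([[0.0, 0, 0], [0, 0, 1], [0, -1, 0]])
D = np.diag([1.0, 2.0, 3.0])

# the norm identity ||u|| = sqrt(2) ||u||_2 on so(3, R)
rng = np.random.default_rng(4)
u1, u2, u3 = rng.standard_normal(3)
u = u1 * b1 + u2 * b2 + u3 * b3
assert abs(np.linalg.norm(u) - math.sqrt(2) * math.hypot(u1, u2, u3)) < 1e-12

# Example 4.9: A = x*^T Q x* = diag(-2, 1, 4)
Q = np.array([[2.0, -2, 0], [-2, 1, -2], [0, -2, 0]])
xs = np.array([[1.0, -2, 2], [2, -1, -2], [2, 2, 1]]) / 3
A = xs.T @ Q @ xs
assert np.allclose(A, np.diag([-2.0, 1, 4]))

def H(X):                                   # (4.17): (H_{x*} phi)X = -[A, [D, X^T]]
    S = D @ X.T - X.T @ D
    return -(A @ S - S @ A)

C = np.diag([3.0, 12.0, 3.0])
for i, bi in enumerate((b1, b2, b3)):
    assert np.allclose(H(bi), C[i, i] * bi)  # H acts diagonally on the basis
assert abs(np.linalg.norm(np.linalg.inv(C)) - math.sqrt(33) / 12) < 1e-12
```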
Hence H_{x*}φ is positive definite and (H_{x*}φ)⁻¹(H_(·)φ) satisfies the L-Lipschitz condition with L = 14√22 due to (4.26) and (4.27). Now let x₀ = x*·exp v with x* given by (4.25) and v = 0.00269 b₁. Then ‖v‖ < 1/(4L). Note that G is compact connected and endowed with a bi-invariant Riemannian metric. Hence Corollary 4.7 is applicable and the sequence generated by Algorithm 4.1 with initial point x₀ = x*·exp v is well defined and converges quadratically to x*.

Funding

This research was partially supported by the National Natural Science Foundation of China (grants 10671175; 10731060).

REFERENCES

ABSIL, P. A., BAKER, C. G. & GALLIVAN, K. A. (2007) Trust-region methods on Riemannian manifolds. Found. Comput. Math., 7, 303–330.
ADLER, R., DEDIEU, J. P., MARGULIES, J., MARTENS, M. & SHUB, M. (2002) Newton method on Riemannian manifolds and a geometric model for human spine. IMA J. Numer. Anal., 22, 1–32.
ALVAREZ, F., BOLTE, J. & MUNIER, J. (2008) A unifying local convergence result for Newton's method in Riemannian manifolds. Found. Comput. Math., 8, 197–226.
BAYER, D. A. & LAGARIAS, J. C. (1989) The nonlinear geometry of linear programming I, affine and projective scaling trajectories; II, Legendre transform coordinates and central trajectories. Trans. Am. Math. Soc., 314, 499–581.
BROCKETT, R. W. (1988) Dynamical systems that sort lists, diagonalise matrices and solve linear programming problems. Proceedings of 27th IEEE Conference on Decision and Control. Austin, TX, pp. 799–803.
BROCKETT, R. W. (1991) Dynamical systems that sort lists, diagonalise matrices and solve linear programming problems. Linear Algebra Appl., 146, 79–91.
BROCKETT, R. W. (1993) Differential geometry and the design of gradient algorithms. Proc. Sympos. Pure Math., 54, 69–92.
CHU, M. T. & DRIESSEL, K. R. (1990) The projected gradient method for least squares matrix approximation with spectral constraints. SIAM J. Numer. Anal., 27, 1050–1060.
DEDIEU, J. P., PRIOURET, P. & MALAJOVICH, G. (2003) Newton's method on Riemannian manifolds: covariant alpha theory. IMA J. Numer. Anal., 23, 395–419.
DO CARMO, M. P. (1992) Riemannian Geometry. Boston, MA: Birkhäuser.
EDELMAN, A., ARIAS, T. A. & SMITH, S. T. (1998) The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20, 303–353.
EZQUERRO, J. A. & HERNÁNDEZ, M. A. (2002a) Generalized differentiability conditions for Newton's method. IMA J. Numer. Anal., 22, 187–205.
EZQUERRO, J. A. & HERNÁNDEZ, M. A. (2002b) On an application of Newton's method to nonlinear operators with w-conditioned second derivative. BIT, 42, 519–530.
FERREIRA, O. P. & SVAITER, B. F. (2002) Kantorovich's theorem on Newton's method in Riemannian manifolds. J. Complex., 18, 304–329.
GABAY, D. (1982) Minimizing a differentiable function over a differential manifold. J. Optim. Theory Appl., 37, 177–219.
GUTIÉRREZ, J. M. & HERNÁNDEZ, M. A. (2000) Newton's method under weak Kantorovich conditions. IMA J. Numer. Anal., 20, 521–532.
HALL, B. C. (2004) Lie Groups, Lie Algebras, and Representations. Graduate Texts in Mathematics, vol. 222. New York: Springer.
HELGASON, S. (1978) Differential Geometry, Lie Groups, and Symmetric Spaces. New York: Academic Press.
HELMKE, U. & MOORE, J. B. (1990) Singular value decomposition via gradient flows. Syst. Control Lett., 14, 369–377.
HELMKE, U. & MOORE, J. B. (1994) Optimization and Dynamical Systems. Communications and Control Engineering. London: Springer.
KANTOROVICH, L. V. & AKILOV, G. P. (1982) Functional Analysis. Oxford: Pergamon.
LI, C., LÓPEZ, G. & MARTÍN-MÁRQUEZ, V. (2009a) Monotone vector fields and the proximal point algorithm on Hadamard manifolds. J. Lond. Math. Soc., 79, 663–683.
LI, C. & WANG, J. H. (2005) Convergence of Newton's method and uniqueness of zeros of vector fields on Riemannian manifolds. Sci. China Ser. A, 48, 1465–1478.
LI, C. & WANG, J. H. (2006) Newton's method on Riemannian manifolds: Smale's point estimate theory under the γ-condition. IMA J. Numer. Anal., 26, 228–251.
LI, C., WANG, J. H. & DEDIEU, J. P. (2009b) Smale's point estimate theory for Newton's method on Lie groups. J. Complex., 25, 128–151.
LUENBERGER, D. G. (1972) The gradient projection method along geodesics. Manage. Sci., 18, 620–631.
MAHONY, R. E. (1994) Optimization algorithms on homogeneous spaces: with applications in linear systems theory. Ph.D. Thesis, Department of Systems Engineering, University of Canberra, Australia.
MAHONY, R. E. (1996) The constrained Newton method on a Lie group and the symmetric eigenvalue problem. Linear Algebra Appl., 248, 67–89.
MAHONY, R. E., HELMKE, U. & MOORE, J. B. (1993) Pole placement algorithms for symmetric realisations. Proceedings of 32nd IEEE Conference on Decision and Control, San Antonio, TX, pp. 1355–1358.
MOORE, J. B., MAHONY, R. E. & HELMKE, U. (1994) Numerical gradient algorithms for eigenvalue and singular value calculations. SIAM J. Matrix Anal. Appl., 15, 881–902.
OWREN, B. & WELFERT, B. (2000) The Newton iteration on Lie groups. BIT Numer. Math., 40, 121–145.
SHUB, M. & SMALE, S. (1993) Complexity of Bezout's theorem, I: geometric aspects. J. Am. Math. Soc., 6, 459–501.
SMALE, S. (1986) Newton's method estimates from data at one point. The Merging of Disciplines: New Directions in Pure, Applied and Computational Mathematics (R. Ewing, K. Gross & C. Martin eds). New York: Springer, pp. 185–196.
SMITH, S. T. (1991) Dynamical systems that perform the singular value decomposition. Syst. Control Lett., 16, 319–327.
SMITH, S. T. (1993) Geometric optimization method for adaptive filtering. Ph.D. Thesis, Harvard University, Cambridge, MA.
SMITH, S. T. (1994) Optimization techniques on Riemannian manifolds. Fields Institute Communications (A. Bloch ed.), vol. 3. Providence, RI: American Mathematical Society, pp. 113–146.
UDRISTE, C. (1994) Convex Functions and Optimization Methods on Riemannian Manifolds. Mathematics and Its Applications, vol. 297. Dordrecht: Kluwer Academic Publishers.
VARADARAJAN, V. S. (1984) Lie Groups, Lie Algebras and Their Representations. Graduate Texts in Mathematics, vol. 102. New York: Springer.
WANG, J. H. & LI, C. (2006) Uniqueness of the singular points of vector fields on Riemannian manifolds under the γ-condition. J. Complex., 22, 533–548.
WANG, J. H. & LI, C. (2007) Kantorovich's theorem for Newton's method on Lie groups. J. Zhejiang Univ. Sci. A, 8, 978–986.
WANG, X. H. (2000) Convergence of Newton's method and uniqueness of the solution of equations in Banach space. IMA J. Numer. Anal., 20, 123–134.
ZABREJKO, P. P. & NGUEN, D. F. (1987) The majorant method in the theory of Newton–Kantorovich approximations and the Ptak error estimates. Numer. Funct. Anal. Optim., 9, 671–674.