IMA Journal of Numerical Analysis (2011) 31, 322–347
doi:10.1093/imanum/drp015
Advance Access publication on October 11, 2009
Kantorovich’s theorems for Newton’s method for mappings and
optimization problems on Lie groups
JIN-HUA WANG†
Department of Mathematics, Zhejiang University of Technology, Hangzhou 310032, People’s Republic of China
CHONG LI‡
Department of Mathematics, Zhejiang University, Hangzhou 310027, People’s Republic of China
[Received on 4 June 2008; revised on 1 April 2009]
With the classical assumptions on f , a convergence criterion of Newton’s method (independent of affine
connections) to find zeros of a mapping f from a Lie group to its Lie algebra is established, and estimates
of the convergence domains of Newton’s method are obtained, which improve the corresponding results
in Owren & Welfert (2000, BIT Numer. Math., 40, 121–145) and Wang & Li (2006, J. Zhejiang Univ. Sci.
A, 8, 978–986). Applications to optimization are provided and the results due to Mahony (1996, Linear
Algebra Appl., 248, 67–89) are extended and improved accordingly.
Keywords: Newton’s method; Lie group; Lipschitz condition.
1. Introduction
Recently, there has been an increased interest in studying numerical algorithms on manifolds. Classical
examples are given by eigenvalue problems, symmetric eigenvalue problems, invariant subspace computations, optimization problems with equality constraints, etc. (see, for example, Luenberger, 1972;
Gabay, 1982; Smith, 1993, 1994; Mahony, 1994, 1996; Udriste, 1994; Edelman et al., 1998; Owren
& Welfert, 2000; Adler et al., 2002; Absil et al., 2007; Li et al., 2009a). In particular, optimization
problems on Lie groups or homogeneous spaces have been studied recently in the context of using
continuous-time differential equations for solving problems in numerical linear algebra. For example,
let $\phi\colon G \to \mathbb{R}$ be given by

$$\phi(x) = -\mathrm{tr}(x^\mathrm{T} Q x D) \quad \text{for each } x \in G, \qquad (1.1)$$

where $G = \mathrm{SO}(N, \mathbb{R}) := \{x \in \mathbb{R}^{N \times N} \,|\, x^\mathrm{T} x = I_N \text{ and } \det x = 1\}$, $D \in \mathbb{R}^{N \times N}$ is the diagonal matrix with diagonal entries $1, 2, \ldots, N$ and $Q$ is a fixed symmetric matrix. Brockett (1988, 1991) and Chu & Driessel (1990) considered the following optimization problem:

$$\min_{x \in G} \phi(x). \qquad (1.2)$$
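The characterization of the minimizer of (1.1)–(1.2) can be illustrated numerically. The following sketch is ours, not the authors': it assumes NumPy, builds a random symmetric $Q$, and checks that the orthogonal matrix whose columns are eigenvectors of $Q$ ordered by ascending eigenvalue (the arrangement Brockett identifies) is never beaten by random orthogonal matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))
Q = (A + A.T) / 2                       # a fixed symmetric matrix
D = np.diag(np.arange(1.0, N + 1))      # diagonal entries 1, 2, ..., N

def phi(x):
    return -np.trace(x.T @ Q @ x @ D)   # the cost (1.1)

# Candidate minimizer: columns of x_star are eigenvectors of Q, ordered so
# that the eigenvalues ascend (np.linalg.eigh returns ascending order).
evals, x_star = np.linalg.eigh(Q)
if np.linalg.det(x_star) < 0:
    x_star[:, 0] *= -1                  # flip one column to land in SO(N);
                                        # phi only sees (x^T Q x)_ii, so it is unchanged
best = phi(x_star)

# phi at random orthogonal matrices never beats the sorted eigenvector basis
samples = []
for _ in range(200):
    q, _ = np.linalg.qr(rng.standard_normal((N, N)))
    samples.append(phi(q))
```

By the rearrangement inequality, pairing the largest diagonal weight of $D$ with the largest eigenvalue of $Q$ maximizes $\mathrm{tr}(x^\mathrm{T} Q x D)$, which is why the ascending/ascending pairing minimizes $\phi$.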
† Corresponding author. Email: wjh@zjut.edu.cn
‡ Email: cli@zju.edu.cn
© The author 2009. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.
Downloaded from imajna.oxfordjournals.org at Biomedical Library Serials on February 14, 2011
Brockett (1988, 1991) showed that the minimum $x^* \in G$ occurs when $x^{*\mathrm{T}} Q x^*$ is a diagonal matrix with diagonal entries (eigenvalues of $Q$) in ascending order. Thus solving the nonlinear optimization problem is equivalent to solving the numerical linear algebra problem of computing the eigenvalues and eigenvectors of the matrix $Q$. There are many other examples where linear algebra problems are formulated as optimization problems on Lie groups or homogeneous spaces (cf. Bayer & Lagarias, 1989; Helmke & Moore, 1990; Smith, 1991; Mahony et al., 1993). For a general differentiable function $\phi$ on a Lie group, numerical optimization algorithms for solving (1.2) have been studied extensively (see, for example, Brockett, 1993; Mahony et al., 1993; Shub & Smale, 1993; Smith, 1993; Mahony, 1994; Moore et al., 1994). In particular, Mahony (1996) used one-parameter subgroups of a Lie group to develop a version of Newton's method on an arbitrary Lie group, where the approach for solving (1.2) via Newton's method was explored and the local convergence was analysed. In this manner the algorithm presented is independent of affine connections on the Lie group.

On the other hand, motivated by looking for approaches to solving ordinary differential equations on Lie groups, Owren & Welfert (2000) considered the implicit Euler method as a generic example of an implicit integration method in a Lie group setting. Consider the initial value problem on $G$:

$$x' = x \cdot g(x), \qquad x(0) = x^{(0)}, \qquad (1.3)$$

where $g\colon G \to \mathcal{G}$ is a differentiable map and $x^{(0)}$ is a random starting point. The application of one step of the backward Euler method to (1.3) leads to the fixed-point problem

$$x = x^{(0)} \cdot \exp(l g(x)), \qquad (1.4)$$

where $l$ represents the size of the discretization step. Clearly, solving the problem (1.4) is equivalent to solving the equation

$$f(x) = 0, \qquad (1.5)$$

where $f\colon G \to \mathcal{G}$ is the mapping defined by

$$f(x) = \exp^{-1}\big((x^{(0)})^{-1} \cdot x\big) - l g(x) \quad \text{for each } x \in G.$$

Owren & Welfert (2000) introduced Newton's method, independent of affine connections on the Lie group, for solving the equation (1.5) and showed that Newton's method is locally quadratically convergent when the differential of the map satisfies the classical Lipschitz condition. Recently, the authors of the present paper studied the existence and uniqueness of solutions of (1.5) and estimates of convergence balls of Newton's method for (1.5) in Wang & Li (2007), where, however, all results except Theorem 2 of Wang & Li (2007) on the convergence of Newton's method were for Abelian groups. Extensions of Smale's point estimate theory for Newton's method on Lie groups were presented in Li et al. (2009b).

In a vector space framework, as is well known, one of the most important results on Newton's method is Kantorovich's theorem (cf. Kantorovich & Akilov, 1982). Under the mild condition that the second Fréchet derivative of $F$ is bounded (or, more generally, the first derivative is Lipschitz continuous) on a proper open metric ball around the initial point $x_0$, Kantorovich's theorem provides a simple and clear criterion, based on knowledge of the first derivative around the initial point, ensuring the existence and uniqueness of the solution of the equation and the quadratic convergence of Newton's method. Another important result on Newton's method is Smale's (1986) point estimate theory (i.e., $\alpha$-theory and $\gamma$-theory), where the notion of approximate zeros was introduced and rules for judging an initial point $x_0$ to be an approximate zero were established, depending on the information of the analytic nonlinear operator at this initial point and at a solution $x^*$. There has been a lot of work on weakening and/or extending the Lipschitz continuity assumptions made on the mappings (see, for example, Zabrejko & Nguen (1987), Gutiérrez & Hernández (2000), Wang (2000), Ezquerro & Hernández (2002a,b) and the references therein). In particular, Zabrejko & Nguen (1987) parameterized the classical Lipschitz continuity, and Wang (2000) introduced the notion of Lipschitz conditions with an $L$-average to unify both Kantorovich's and Smale's criteria.

In a Riemannian manifold framework an analogue of the well-known Kantorovich theorem was given in Ferreira & Svaiter (2002) for Newton's method for vector fields on Riemannian manifolds, while extensions of Smale's (1986) $\alpha$-theory and $\gamma$-theory to analytic vector fields and analytic mappings on Riemannian manifolds were provided in Dedieu et al. (2003). In the recent paper Li & Wang (2006) the convergence criteria in Dedieu et al. (2003) were improved by using the notion of the $\gamma$-condition for vector fields and mappings on Riemannian manifolds. The radii of uniqueness balls of singular points of vector fields satisfying the $\gamma$-conditions were estimated in Wang & Li (2006), while the local behaviour of Newton's method on Riemannian manifolds was studied in Li & Wang (2005). Recently, inspired by the previous work of Zabrejko & Nguen (1987) on Kantorovich's majorant method, Alvarez et al. (2008) introduced a Lipschitz-type radial function for the covariant derivative of vector fields and mappings on Riemannian manifolds and established a unified convergence criterion for Newton's method on Riemannian manifolds.

In the spirit of the works mentioned above, a natural and interesting problem is whether an analogue of the well-known Kantorovich theorem (independent of the connection) can be established for Newton's method (for solving (1.5) and/or the optimization problem (1.2)) on Lie groups. On a finite-dimensional complete and connected Riemannian manifold $M$, Newton's method is defined in terms of geodesics, and the property that any two points of $M$ are connected by at least one geodesic plays a key role in the study. Newton's method on a Lie group is defined in terms of one-parameter subgroups. However, one-parameter subgroups in an arbitrary Lie group enjoy no such property, which makes the study on a Lie group more complicated. In the recent paper Wang & Li (2007) we gave a kind of Kantorovich theorem for (1.5) by using the following metric Lipschitz condition at $x_0$:

$$\|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{x'} - \mathrm{d}f_x)\| \le L\, d(x', x) \quad \forall\, x', x \in G \text{ with } d(x_0, x) + d(x, x') < r_1, \qquad (1.6)$$

where $d(\cdot, \cdot)$ is the Riemannian distance induced by the left-invariant Riemannian metric. Clearly, this kind of Lipschitz condition is still dependent on the metric on the underlying group and is generally very difficult to verify in a noncompact group (cf. Example 3.8 in Section 3).

The purpose of the present paper is to establish Kantorovich's theorem (independent of the connection) for Newton's method on a Lie group. More precisely, under the assumption that the differential of $f$ satisfies a Lipschitz condition around the initial point (which is in terms of one-parameter subgroups, is independent of the metric and is weaker than the metric Lipschitz condition (1.6)), the convergence criterion of Newton's method for solving (1.5) is established in Section 3. As a consequence, estimates of the convergence domains are also obtained. The main feature of our results (Theorems 3.1 and 3.3) is that they are completely independent of the metrics on the groups. In particular, Theorem 3.1 improves and extends Theorem 2 of Wang & Li (2007), while Theorem 3.3 (and its corollaries) improves and extends the corresponding result in Owren & Welfert (2000). In Section 4 we will show that Newton's method for solving the problem (1.2) is equivalent to the one for solving (1.5), where $f$ is a map from a Lie group to its Lie algebra associated to $\phi$, and, as applications, the convergence criterion and the estimates of the convergence domains of Newton's method for solving the problem (1.2) are provided. The results on the convergence domains improve the corresponding results due to Mahony (1996), while the result on the convergence criterion seems new for Newton's method for solving the problem (1.2). Examples are provided to show that our results in the present paper are applicable where those of Mahony (1996) and Owren & Welfert (2000) are not.
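The backward Euler step (1.4) is itself a fixed-point problem, and for a small step size it can be solved by plain fixed-point iteration. The following sketch is ours, not the authors': it works on $\mathrm{SO}(3)$, assumes SciPy's `expm`/`logm` for the matrix exponential and logarithm, and the map `g` (the skew part of $Bx$ for a random $B$) is an arbitrary illustrative choice of a map into the Lie algebra.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
skew = lambda m: (m - m.T) / 2          # projection onto so(3)

def g(x):
    # a smooth map from SO(3) into its Lie algebra so(3) (illustrative choice)
    return skew(B @ x)

l = 0.05                                 # discretization step size
x0 = expm(skew(rng.standard_normal((3, 3))))   # starting point on SO(3)

x = x0
for _ in range(100):                     # fixed-point iteration for (1.4)
    x = x0 @ expm(l * g(x))

# residual of f(x) = exp^{-1}((x0)^{-1} x) - l g(x), as in (1.5)
residual = np.linalg.norm(np.real(logm(x0.T @ x)) - l * g(x))
```

For small $l$ the map $x \mapsto x^{(0)} \exp(l g(x))$ is a contraction on the (compact) group, so the iteration converges; Newton's method, studied in this paper, replaces this slow linear iteration with a quadratically convergent one.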
2. Notions and preliminaries

Most of the notions and notation used in the present paper are standard (see, for example, Helgason, 1978; Varadarajan, 1984). A Lie group $(G, \cdot)$ is a Hausdorff topological group with countable bases that also has the structure of an analytic manifold such that the group product and the inversion are analytic operations in the differentiable structure given on the manifold. The dimension of a Lie group is that of the underlying manifold, and we shall always assume that it is $m$-dimensional. The symbol $e$ designates the identity element of $G$. Let $\mathcal{G}$ be the Lie algebra of the Lie group $G$, that is, the tangent space $T_e G$ of $G$ at $e$, equipped with the Lie bracket $[\cdot,\cdot]\colon \mathcal{G} \times \mathcal{G} \to \mathcal{G}$.

In what follows we will make use of the left translation of the Lie group $G$. We define for each $y \in G$ the left translation $L_y\colon G \to G$ by

$$L_y(z) = y \cdot z \quad \text{for each } z \in G. \qquad (2.1)$$

The differential of $L_y$ at $z$ is denoted by $(L_y')_z$, which clearly determines a linear isomorphism from $T_z G$ to the tangent space $T_{y \cdot z} G$. In particular, the differential $(L_y')_e$ of $L_y$ at $e$ determines a linear isomorphism from $\mathcal{G}$ to the tangent space $T_y G$. The exponential map $\exp\colon \mathcal{G} \to G$ is certainly the most important construction associated to $G$ and $\mathcal{G}$ and is defined as follows. Given $u \in \mathcal{G}$, let $\sigma_u\colon \mathbb{R} \to G$ be the one-parameter subgroup of $G$ determined by the left-invariant vector field $X_u\colon y \mapsto (L_y')_e(u)$, i.e., $\sigma_u$ satisfies

$$\sigma_u(0) = e \quad \text{and} \quad \sigma_u'(t) = X_u(\sigma_u(t)) = (L_{\sigma_u(t)}')_e(u) \quad \text{for each } t \in \mathbb{R}. \qquad (2.2)$$

The value of the exponential map $\exp$ at $u$ is then defined by

$$\exp(u) = \sigma_u(1).$$

Moreover, we have

$$\exp(tu) = \sigma_{tu}(1) = \sigma_u(t) \quad \text{for each } t \in \mathbb{R} \text{ and } u \in \mathcal{G} \qquad (2.3)$$

and

$$\exp\big((t+s)u\big) = \exp(tu) \cdot \exp(su) \quad \text{for any } t, s \in \mathbb{R} \text{ and } u \in \mathcal{G}. \qquad (2.4)$$

Note that the exponential map is not surjective in general. However, the exponential map is a diffeomorphism on an open neighbourhood of $0 \in \mathcal{G}$. In the case when $G$ is Abelian, $\exp$ is also a homomorphism from $\mathcal{G}$ to $G$, i.e.,

$$\exp(u + v) = \exp(u) \cdot \exp(v) \quad \text{for all } u, v \in \mathcal{G}. \qquad (2.5)$$
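For a matrix group the exponential map is the matrix exponential, and the one-parameter-subgroup property (2.4) can be checked directly. A minimal numerical sketch (ours, not the authors'), assuming SciPy's `expm` and using $\mathfrak{sl}(3, \mathbb{R})$:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
m = rng.standard_normal((3, 3))
u = m - np.trace(m) / 3 * np.eye(3)   # a traceless matrix, i.e. u in sl(3, R)

t, s = 0.7, -1.3
# property (2.4): exp((t+s)u) = exp(tu)·exp(su), since tu and su commute
left = expm((t + s) * u)
right = expm(t * u) @ expm(s * u)
```

Note that $\det \exp(u) = \mathrm{e}^{\mathrm{tr}(u)} = 1$ for traceless $u$, so $\exp$ indeed maps $\mathfrak{sl}(3, \mathbb{R})$ into $\mathrm{SL}(3, \mathbb{R})$.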
In the non-Abelian case $\exp$ is not a homomorphism and, by the Baker–Campbell–Hausdorff formula (cf. Varadarajan, 1984, p. 114), (2.5) must be replaced by

$$\exp(w) = \exp(u) \cdot \exp(v) \qquad (2.6)$$

for all $u$ and $v$ in a neighbourhood of $0 \in \mathcal{G}$, where $w$ is defined by

$$w := u + v + \frac{1}{2}[u, v] + \frac{1}{12}\big([u, [u, v]] + [v, [v, u]]\big) + \cdots. \qquad (2.7)$$

Let $f\colon G \to \mathcal{G}$ be a $C^1$-map and let $x \in G$. We use $f_x'$ to denote the differential of $f$ at $x$. Then, by DoCarmo (1992, p. 9) (the proof given there for a smooth mapping still works for a $C^1$-map), for each $\Delta x \in T_x G$ and any nontrivial smooth curve $c\colon (-\varepsilon, \varepsilon) \to G$ with $c(0) = x$ and $c'(0) = \Delta x$, one has

$$f_x' \Delta x = \frac{\mathrm{d}}{\mathrm{d}t} (f \circ c)(t) \Big|_{t=0}. \qquad (2.8)$$

In particular,

$$f_x' \Delta x = \frac{\mathrm{d}}{\mathrm{d}t} f\big(x \cdot \exp(t (L_{x^{-1}}')_x \Delta x)\big) \Big|_{t=0} \quad \text{for each } \Delta x \in T_x G. \qquad (2.9)$$

Define the linear map $\mathrm{d}f_x\colon \mathcal{G} \to \mathcal{G}$ by

$$\mathrm{d}f_x u = \frac{\mathrm{d}}{\mathrm{d}t} f(x \cdot \exp(tu)) \Big|_{t=0} \quad \text{for each } u \in \mathcal{G}. \qquad (2.10)$$

Then, by (2.9), we have

$$\mathrm{d}f_x = f_x' \circ (L_x')_e. \qquad (2.11)$$

Also, by definition, we have for all $t \ge 0$ that

$$\frac{\mathrm{d}}{\mathrm{d}t} f(x \cdot \exp(tu)) = \mathrm{d}f_{x \cdot \exp(tu)} u \quad \text{for each } u \in \mathcal{G} \qquad (2.12)$$

and

$$f(x \cdot \exp(tu)) - f(x) = \int_0^t \mathrm{d}f_{x \cdot \exp(su)} u \, \mathrm{d}s \quad \text{for each } u \in \mathcal{G}. \qquad (2.13)$$

For the remainder of the present paper we always assume that $\langle \cdot, \cdot \rangle$ is an inner product on $\mathcal{G}$ and $\|\cdot\|$ is the associated norm on $\mathcal{G}$. We now introduce the following distance on $G$, which plays a key role in the study. Let $x, y \in G$ and define

$$\varrho(x, y) := \inf\left\{ \sum_{i=1}^k \|u_i\| \;:\; \text{there exist } k \ge 1 \text{ and } u_1, \ldots, u_k \in \mathcal{G} \text{ such that } y = x \cdot \exp u_1 \cdots \exp u_k \right\}, \qquad (2.14)$$

where we adopt the convention that $\inf \emptyset = +\infty$. It is easy to verify that $\varrho(\cdot, \cdot)$ is a distance on $G$ and that the topology induced by this distance is equivalent to the original one on $G$.

Let $x \in G$ and $r > 0$. We denote the corresponding ball of radius $r$ around $x$ of $G$ by $C_r(x)$, that is,

$$C_r(x) := \{ y \in G \,|\, \varrho(x, y) < r \}.$$

Let $\mathcal{L}(\mathcal{G})$ denote the set of all linear operators on $\mathcal{G}$. Below we shall use the notion of the $L$-Lipschitz condition and a useful lemma.
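The truncated Baker–Campbell–Hausdorff series (2.7) can be verified numerically for small matrices. The following check is ours, not the authors'; it assumes SciPy's `expm`/`logm` and compares the third-order truncation of (2.7) against the exact matrix logarithm of $\exp(u)\exp(v)$.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)

def traceless(m):
    return m - np.trace(m) / m.shape[0] * np.eye(m.shape[0])

# small elements of sl(3, R), so the truncation error is of fourth order
u = 0.01 * traceless(rng.standard_normal((3, 3)))
v = 0.01 * traceless(rng.standard_normal((3, 3)))
br = lambda a, b: a @ b - b @ a          # Lie bracket [a, b]

# BCH series (2.7), truncated after the third-order terms
w = u + v + 0.5 * br(u, v) + (br(u, br(u, v)) + br(v, br(v, u))) / 12.0

exact = np.real(logm(expm(u) @ expm(v)))
err_bch = np.linalg.norm(exact - w)      # O(||u||^2 ||v||^2), tiny here
err_first = np.linalg.norm(exact - (u + v))   # the naive Abelian guess (2.5)
```

The first-order guess $u + v$ fails by a term of size $\tfrac12\|[u,v]\|$, which is exactly the obstruction to (2.5) in the non-Abelian case.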
DEFINITION 2.1 Let $r > 0$, let $x_0 \in G$ and let $T$ be a mapping from $G$ to $\mathcal{L}(\mathcal{G})$. Then $T$ is said to satisfy the $L$-Lipschitz condition on $C_r(x_0)$ if

$$\|T(x \cdot \exp u) - T(x)\| \le L \|u\| \qquad (2.15)$$

holds for any $u \in \mathcal{G}$ and $x \in C_r(x_0)$ such that $\|u\| + \varrho(x, x_0) < r$.

LEMMA 2.2 Let $0 < r \le \frac{1}{L}$ and let $x_0 \in G$ be such that $\mathrm{d}f_{x_0}^{-1}$ exists. Suppose that $\mathrm{d}f_{x_0}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_r(x_0)$. Let $x \in C_r(x_0)$ be such that there exist $k \ge 1$ and $u_0, \ldots, u_k \in \mathcal{G}$ satisfying $x = x_0 \cdot \exp u_0 \cdots \exp u_k$ and $\sum_{i=0}^k \|u_i\| < r$. Then $\mathrm{d}f_x^{-1}$ exists and

$$\|\mathrm{d}f_x^{-1} \mathrm{d}f_{x_0}\| \le \frac{1}{1 - L \sum_{i=0}^k \|u_i\|}. \qquad (2.16)$$

Proof. We write $y_0 = x_0$ and $y_{i+1} = y_i \cdot \exp u_i$ for each $i = 0, \ldots, k$. Since (2.15) holds with $T = \mathrm{d}f_{x_0}^{-1} \mathrm{d}f$, one has that

$$\|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{y_i \cdot \exp u_i} - \mathrm{d}f_{y_i})\| \le L \|u_i\| \quad \text{for each } 0 \le i \le k. \qquad (2.17)$$

Noting that $y_{k+1} = x$, we have

$$\|\mathrm{d}f_{x_0}^{-1} \mathrm{d}f_x - I_{\mathcal{G}}\| = \|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{y_k \cdot \exp u_k} - \mathrm{d}f_{x_0})\| \le \sum_{i=0}^k \|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{y_i \cdot \exp u_i} - \mathrm{d}f_{y_i})\| \le L \sum_{i=0}^k \|u_i\| < 1.$$

Thus the conclusion follows from the Banach lemma and the proof is complete.

3. Convergence criteria

Following Owren & Welfert (2000), we define Newton's method with initial point $x_0$ for $f$ on a Lie group as follows:

$$x_{n+1} = x_n \cdot \exp\big({-\mathrm{d}f_{x_n}^{-1} f(x_n)}\big) \quad \text{for each } n = 0, 1, \ldots. \qquad (3.1)$$

Let $\beta > 0$ and $L > 0$. The quadratic majorizing function $h$, which was used in Kantorovich & Akilov (1982) and Wang (2000), is defined by

$$h(t) = \frac{L}{2} t^2 - t + \beta \quad \text{for each } t \ge 0. \qquad (3.2)$$

Let $\{t_n\}$ denote the sequence generated by Newton's method with initial value $t_0 = 0$ for $h$, that is,

$$t_{n+1} = t_n - h'(t_n)^{-1} h(t_n) \quad \text{for each } n = 0, 1, \ldots. \qquad (3.3)$$
Assume that $\lambda := L\beta \le \frac{1}{2}$. Then $h$ has two zeros $r_1$ and $r_2$ given by

$$r_1 = \frac{1 - \sqrt{1 - 2\lambda}}{L} \quad \text{and} \quad r_2 = \frac{1 + \sqrt{1 - 2\lambda}}{L}. \qquad (3.4)$$

Moreover, $\{t_n\}$ is monotonically increasing and convergent to $r_1$, and satisfies

$$r_1 - t_n = \frac{\xi^{2^n - 1}}{\sum_{j=0}^{2^n - 1} \xi^j} \, r_1 \quad \text{for each } n = 0, 1, \ldots, \qquad (3.5)$$

where

$$\xi = \frac{1 - \sqrt{1 - 2\lambda}}{1 + \sqrt{1 - 2\lambda}}. \qquad (3.6)$$

Recall that $f\colon G \to \mathcal{G}$ is a $C^1$-mapping. In the remainder of this section we always assume that $x_0 \in G$ is such that $\mathrm{d}f_{x_0}^{-1}$ exists, and set $\beta := \|\mathrm{d}f_{x_0}^{-1} f(x_0)\|$.

THEOREM 3.1 Suppose that $\mathrm{d}f_{x_0}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{r_1}(x_0)$ and that

$$\lambda = L\beta \le \frac{1}{2}. \qquad (3.7)$$

Then the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges to a zero $x^*$ of $f$. Moreover, for each $n = 0, 1, \ldots$ the following assertions hold:

$$\varrho(x_{n+1}, x_n) \le \|\mathrm{d}f_{x_n}^{-1} f(x_n)\| \le t_{n+1} - t_n, \qquad (3.8)$$

$$\varrho(x_n, x^*) \le \frac{\xi^{2^n - 1}}{\sum_{j=0}^{2^n - 1} \xi^j} \, r_1. \qquad (3.9)$$

Proof. We write $v_n = -\mathrm{d}f_{x_n}^{-1} f(x_n)$ for each $n = 0, 1, \ldots$. Below we shall show that each $v_n$ is well defined and

$$\varrho(x_{n+1}, x_n) \le \|v_n\| \le t_{n+1} - t_n \qquad (3.10)$$

holds for each $n = 0, 1, \ldots$. Given this, one sees that the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges to a zero $x^*$ of $f$ because, by (3.1),

$$x_{n+1} = x_n \cdot \exp v_n \quad \text{for each } n = 0, 1, \ldots.$$

Furthermore, assertions (3.8) and (3.9) hold for each $n$, and the proof of the theorem is completed.

Note that $v_0$ is well defined by assumption and $x_1 = x_0 \cdot \exp v_0$. Hence $\varrho(x_1, x_0) \le \|v_0\|$. Since $\|v_0\| = \|{-\mathrm{d}f_{x_0}^{-1} f(x_0)}\| = \beta = t_1 - t_0$, it follows that (3.10) is true for $n = 0$. We now proceed by mathematical induction on $n$. For this purpose we assume that $v_n$ is well defined and (3.10) holds for each $n \le k - 1$. Then

$$\sum_{i=0}^{k-1} \|v_i\| \le t_k - t_0 = t_k < r_1 \quad \text{and} \quad x_k = x_0 \cdot \exp v_0 \cdots \exp v_{k-1}. \qquad (3.11)$$
Thus we use Lemma 2.2 to conclude that $\mathrm{d}f_{x_k}^{-1}$ exists and

$$\|\mathrm{d}f_{x_k}^{-1} \mathrm{d}f_{x_0}\| \le \frac{1}{1 - L t_k} = -h'(t_k)^{-1}. \qquad (3.12)$$

Therefore $v_k$ is well defined. Observe that

$$f(x_k) = f(x_k) - f(x_{k-1}) - \mathrm{d}f_{x_{k-1}} v_{k-1} = \int_0^1 \mathrm{d}f_{x_{k-1} \cdot \exp(t v_{k-1})} v_{k-1} \, \mathrm{d}t - \mathrm{d}f_{x_{k-1}} v_{k-1} = \int_0^1 \big(\mathrm{d}f_{x_{k-1} \cdot \exp(t v_{k-1})} - \mathrm{d}f_{x_{k-1}}\big) v_{k-1} \, \mathrm{d}t,$$

where the second equality is valid because of (2.13). Therefore, applying (2.15), one has that

$$\|\mathrm{d}f_{x_0}^{-1} f(x_k)\| \le \int_0^1 \|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{x_{k-1} \cdot \exp(t v_{k-1})} - \mathrm{d}f_{x_{k-1}})\| \, \|v_{k-1}\| \, \mathrm{d}t \le \int_0^1 L t \|v_{k-1}\| \, \|v_{k-1}\| \, \mathrm{d}t \le \frac{L}{2}(t_k - t_{k-1})^2 = h(t_{k-1}) + h'(t_{k-1})(t_k - t_{k-1}) + \frac{1}{2} h''(t_{k-1})(t_k - t_{k-1})^2 = h(t_k), \qquad (3.13)$$

where the first equality holds because $h(t_{k-1}) + h'(t_{k-1})(t_k - t_{k-1}) = 0$ by (3.3) and $h'' = L$. Combining this with (3.12) yields that

$$\|v_k\| = \|\mathrm{d}f_{x_k}^{-1} f(x_k)\| \le \|\mathrm{d}f_{x_k}^{-1} \mathrm{d}f_{x_0}\| \, \|\mathrm{d}f_{x_0}^{-1} f(x_k)\| \le -h'(t_k)^{-1} h(t_k) = t_{k+1} - t_k. \qquad (3.14)$$

Since $x_{k+1} = x_k \cdot \exp v_k$, we have $\varrho(x_{k+1}, x_k) \le \|v_k\|$. This together with (3.14) gives that (3.10) holds for $n = k$, which completes the proof of the theorem.

The remainder of this section is devoted to an estimate of the convergence domain of Newton's method on $G$ around a zero $x^*$ of $f$. Below we shall always assume that $x^* \in G$ is such that $\mathrm{d}f_{x^*}^{-1}$ exists.

LEMMA 3.2 Let $0 < r \le \frac{1}{L}$ and let $x_0 \in C_r(x^*)$ be such that there exist $j \ge 1$ and $w_1, \ldots, w_j \in \mathcal{G}$ satisfying

$$x_0 = x^* \cdot \exp w_1 \cdots \exp w_j \qquad (3.15)$$
and $\sum_{i=1}^j \|w_i\| < r$. Suppose that $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_r(x^*)$. Then $\mathrm{d}f_{x_0}^{-1}$ exists and

$$\|\mathrm{d}f_{x_0}^{-1} f(x_0)\| \le \frac{\big(2 + L \sum_{i=1}^j \|w_i\|\big) \sum_{i=1}^j \|w_i\|}{2\big(1 - L \sum_{i=1}^j \|w_i\|\big)}. \qquad (3.16)$$

Proof. By Lemma 2.2, $\mathrm{d}f_{x_0}^{-1}$ exists and

$$\|\mathrm{d}f_{x_0}^{-1} \mathrm{d}f_{x^*}\| \le \frac{1}{1 - L \sum_{i=1}^j \|w_i\|}. \qquad (3.17)$$

We write $y_0 = x^*$ and $y_i = y_{i-1} \cdot \exp w_i$ for each $i = 1, \ldots, j$. Thus, by (3.15), we have $y_j = x_0$. Fixing $i$, one has from (2.13) that

$$f(y_i) - f(y_{i-1}) = \int_0^1 \mathrm{d}f_{y_{i-1} \cdot \exp(\tau w_i)} w_i \, \mathrm{d}\tau,$$

which implies that

$$\mathrm{d}f_{x^*}^{-1}\big(f(y_i) - f(y_{i-1})\big) = \int_0^1 \mathrm{d}f_{x^*}^{-1}\big(\mathrm{d}f_{y_{i-1} \cdot \exp(\tau w_i)} - \mathrm{d}f_{x^*}\big) w_i \, \mathrm{d}\tau + w_i. \qquad (3.18)$$

Since $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_r(x^*)$, it follows that

$$\|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{y_{\kappa-1} \cdot \exp w_\kappa} - \mathrm{d}f_{y_{\kappa-1}})\| \le L \|w_\kappa\| \quad \text{for each } \kappa = 1, \ldots, j.$$

This gives that

$$\|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{y_{i-1} \cdot \exp(\tau w_i)} - \mathrm{d}f_{x^*})\| \le \sum_{\kappa=1}^{i-1} \|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{y_{\kappa-1} \cdot \exp w_\kappa} - \mathrm{d}f_{y_{\kappa-1}})\| + \|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{y_{i-1} \cdot \exp(\tau w_i)} - \mathrm{d}f_{y_{i-1}})\| \le L \left( \sum_{\kappa=1}^{i-1} \|w_\kappa\| + \tau \|w_i\| \right). \qquad (3.19)$$

Noting that $f(x_0) = \sum_{i=1}^j \big(f(y_i) - f(y_{i-1})\big)$, we have from (3.18) and (3.19) that

$$\|\mathrm{d}f_{x^*}^{-1} f(x_0)\| \le \sum_{i=1}^j \left( \int_0^1 \|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{y_{i-1} \cdot \exp(\tau w_i)} - \mathrm{d}f_{x^*})\| \, \|w_i\| \, \mathrm{d}\tau + \|w_i\| \right) \le \sum_{i=1}^j \left( \int_0^1 L \left( \sum_{\kappa=1}^{i-1} \|w_\kappa\| + \tau \|w_i\| \right) \|w_i\| \, \mathrm{d}\tau + \|w_i\| \right) = \left( \frac{L}{2} \sum_{i=1}^j \|w_i\| + 1 \right) \sum_{i=1}^j \|w_i\|.$$

Combining this with (3.17) yields that

$$\|\mathrm{d}f_{x_0}^{-1} f(x_0)\| \le \|\mathrm{d}f_{x_0}^{-1} \mathrm{d}f_{x^*}\| \, \|\mathrm{d}f_{x^*}^{-1} f(x_0)\| \le \frac{\big(2 + L \sum_{i=1}^j \|w_i\|\big) \sum_{i=1}^j \|w_i\|}{2\big(1 - L \sum_{i=1}^j \|w_i\|\big)},$$

which completes the proof of the lemma.
THEOREM 3.3 Let $0 < r \le \frac{1}{4L}$. Suppose that $f(x^*) = 0$ and that $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{3r/(1-Lr)}(x^*)$. Let $x_0 \in C_r(x^*)$. Then the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges quadratically to a zero $y^*$ of $f$, and $\varrho(x^*, y^*) < \frac{3r}{1 - Lr}$.

Proof. Since $x_0 \in C_r(x^*)$, there exist $j \ge 1$ and $w_1, \ldots, w_j \in \mathcal{G}$ satisfying

$$x_0 = x^* \cdot \exp w_1 \cdots \exp w_j$$

and $\sum_{i=1}^j \|w_i\| < r$. By Lemma 3.2, $\mathrm{d}f_{x_0}^{-1}$ exists and

$$\beta := \|\mathrm{d}f_{x_0}^{-1} f(x_0)\| \le \frac{\big(2 + L \sum_{i=1}^j \|w_i\|\big) \sum_{i=1}^j \|w_i\|}{2\big(1 - L \sum_{i=1}^j \|w_i\|\big)}. \qquad (3.20)$$

Set $\bar{L} = \frac{L}{1 - L \sum_{i=1}^j \|w_i\|}$ and $\bar{r} = \frac{(2 + Lr)r}{1 - Lr}$. Then $\mathrm{d}f_{x_0}^{-1} \mathrm{d}f$ satisfies the $\bar{L}$-Lipschitz condition on $C_{\bar{r}}(x_0)$. To see this, let $x \in C_{\bar{r}}(x_0)$ and $u \in \mathcal{G}$ be such that $\|u\| + \varrho(x_0, x) < \bar{r}$. Thus

$$\|u\| + \varrho(x, x^*) \le \|u\| + \varrho(x, x_0) + \varrho(x_0, x^*) < \bar{r} + r \le \frac{3r}{1 - Lr}.$$

Since $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{3r/(1-Lr)}(x^*)$, it follows that

$$\|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{x \cdot \exp u} - \mathrm{d}f_x)\| \le L \|u\|. \qquad (3.21)$$

Consequently, by Lemma 2.2, one has that

$$\|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{x \cdot \exp u} - \mathrm{d}f_x)\| \le \|\mathrm{d}f_{x_0}^{-1} \mathrm{d}f_{x^*}\| \, \|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{x \cdot \exp u} - \mathrm{d}f_x)\| \le \frac{L}{1 - L \sum_{i=1}^j \|w_i\|} \|u\| = \bar{L} \|u\|.$$

This shows that $\mathrm{d}f_{x_0}^{-1} \mathrm{d}f$ satisfies the $\bar{L}$-Lipschitz condition on $C_{\bar{r}}(x_0)$. Let

$$\bar{\lambda} := \bar{L}\beta \quad \text{and} \quad \bar{r}_1 = \frac{1 - \sqrt{1 - 2\bar{\lambda}}}{\bar{L}} = \frac{2\beta}{1 + \sqrt{1 - 2\bar{\lambda}}}. \qquad (3.22)$$

Then, by (3.20), we have

$$\bar{r}_1 \le 2\beta \le \frac{\big(2 + L \sum_{i=1}^j \|w_i\|\big) \sum_{i=1}^j \|w_i\|}{1 - L \sum_{i=1}^j \|w_i\|} \le \bar{r}. \qquad (3.23)$$

Moreover, since $L \sum_{i=1}^j \|w_i\| \le Lr \le \frac{1}{4}$, it follows that

$$\bar{\lambda} \le \frac{\big(2 + L \sum_{i=1}^j \|w_i\|\big) L \sum_{i=1}^j \|w_i\|}{2\big(1 - L \sum_{i=1}^j \|w_i\|\big)^2} \le \frac{\big(2 + \frac{1}{4}\big)\frac{1}{4}}{2\big(1 - \frac{1}{4}\big)^2} = \frac{1}{2}. \qquad (3.24)$$

Thus Theorem 3.1 is applicable and the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ converges to a zero, say $y^*$, of $f$ and $\varrho(x_0, y^*) \le \bar{r}_1$. Furthermore,

$$\varrho(x^*, y^*) \le \varrho(x^*, x_0) + \varrho(x_0, y^*) < r + \bar{r}_1 \le r + \bar{r} \le \frac{3r}{1 - Lr}.$$

The proof of the theorem is complete.

In particular, taking $r = \frac{1}{4L}$ in Theorem 3.3, the following corollary is obvious.

COROLLARY 3.4 Suppose that $f(x^*) = 0$ and that $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{1/L}(x^*)$. Let $x_0 \in C_{1/(4L)}(x^*)$. Then the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges quadratically to a zero $y^*$ of $f$ with $\varrho(x^*, y^*) < \frac{1}{L}$.

Theorem 3.3 as well as Corollary 3.4 gives an estimate of the convergence domain for Newton's method. However, we do not know whether the limit $y^*$ of the sequence generated by Newton's method with initial point $x_0$ from this domain is equal to the zero $x^*$. The following corollary provides a convergence domain from which the sequence generated by Newton's method with initial point $x_0$ converges to the zero $x^*$. Let $r > 0$. We use $B(0, r)$ to denote the open ball at $0$ with radius $r$ in $\mathcal{G}$, that is,

$$B(0, r) := \{ v \in \mathcal{G} \,|\, \|v\| < r \}.$$

COROLLARY 3.5 Suppose that $f(x^*) = 0$ and that $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{1/L}(x^*)$. Let $\rho > 0$ be the largest number such that $C_\rho(e) \subseteq \exp\big(B(0, \frac{1}{L})\big)$ and let $r = \min\big\{\frac{\rho}{3 + L\rho}, \frac{1}{4L}\big\}$. Let us write $N(x^*, r) := x^* \cdot \exp(B(0, r))$. Then, for each $x_0 \in N(x^*, r)$, the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges quadratically to $x^*$.

Proof. Let $x_0 \in N(x^*, r)$. Then there exists $v \in \mathcal{G}$ such that $x_0 = x^* \cdot \exp v$ and

$$\|v\| < r \le \min\left\{ \frac{\rho}{3 + L\rho}, \frac{1}{4L} \right\}.$$

This implies that $x_0 \in C_r(x^*)$ and

$$\frac{3r}{1 - Lr} \le \min\left\{ \frac{1}{L}, \rho \right\}. \qquad (3.25)$$

Hence Theorem 3.3 can be applied to conclude that the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges to a zero, say $y^*$, of $f$ with $\varrho(x^*, y^*) < \frac{3r}{1 - Lr}$. This together with (3.25) implies that $\varrho(y^*, x^*) < \rho$. Hence there exists $u \in \mathcal{G}$ such that $\|u\| < \frac{1}{L}$ and
$y^* = x^* \cdot \exp u$. Since

$$\left\| \int_0^1 \mathrm{d}f_{x^*}^{-1} \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau - I \right\| = \left\| \int_0^1 \mathrm{d}f_{x^*}^{-1}\big(\mathrm{d}f_{x^* \cdot \exp(\tau u)} - \mathrm{d}f_{x^*}\big) \, \mathrm{d}\tau \right\| \le \int_0^1 L \tau \|u\| \, \mathrm{d}\tau < 1,$$

it follows from the Banach lemma that $\int_0^1 \mathrm{d}f_{x^*}^{-1} \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau$ is invertible and so is $\int_0^1 \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau$. Note that

$$\left( \int_0^1 \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau \right) u = f(y^*) - f(x^*) = 0.$$

We obtain that $u = 0$ and $y^* = x^*$, which completes the proof of the corollary.

Recall that, in the special case when $G$ is a compact connected Lie group, $G$ has a bi-invariant Riemannian metric (cf. DoCarmo, 1992, p. 46). Below we assume that $G$ is a compact connected Lie group endowed with a bi-invariant Riemannian metric. Therefore an estimate of the convergence domain with the same property as in Corollary 3.5 is described in the following corollary.

COROLLARY 3.6 Let $G$ be a compact connected Lie group endowed with a bi-invariant Riemannian metric. Let $0 < r \le \frac{1}{4L}$. Suppose that $f(x^*) = 0$ and that $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{3r/(1-Lr)}(x^*)$. Let $x_0 \in C_r(x^*)$. Then the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges quadratically to $x^*$.

Proof. By Theorem 3.3, the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges to a zero, say $y^*$, of $f$ with $\varrho(x^*, y^*) < \frac{3r}{1 - Lr}$. Clearly, there is a minimizing geodesic $c$ connecting $(x^*)^{-1} \cdot y^*$ and $e$. Since $G$ is a compact connected Lie group endowed with a bi-invariant Riemannian metric, it follows from Helgason (1978, p. 224) that $c$ is also a one-parameter subgroup of $G$. Consequently, there exists $u \in \mathcal{G}$ such that $y^* = x^* \cdot \exp u$ and

$$\|u\| = \varrho(x^*, y^*) < \frac{3r}{1 - Lr}.$$

Since

$$\left\| \int_0^1 \mathrm{d}f_{x^*}^{-1} \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau - I \right\| = \left\| \int_0^1 \mathrm{d}f_{x^*}^{-1}\big(\mathrm{d}f_{x^* \cdot \exp(\tau u)} - \mathrm{d}f_{x^*}\big) \, \mathrm{d}\tau \right\| \le \int_0^1 L \tau \|u\| \, \mathrm{d}\tau < 1,$$

it follows from the Banach lemma that $\int_0^1 \mathrm{d}f_{x^*}^{-1} \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau$ is invertible and so is $\int_0^1 \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau$. As

$$\left( \int_0^1 \mathrm{d}f_{x^* \cdot \exp(\tau u)} \, \mathrm{d}\tau \right) u = f(y^*) - f(x^*) = 0,$$

we have that $u = 0$, and so $y^* = x^*$. This completes the proof of the corollary.
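On a compact group such as $\mathrm{SO}(3)$ the iteration (3.1) can be run directly with the matrix exponential. The following sketch is ours, not the authors': it assumes SciPy's `expm`/`logm`, takes the simple map $f(x) = \exp^{-1}(a^{-1} x)$ (whose zero is $x = a$), and approximates $\mathrm{d}f_x$ by forward differences on a basis of $\mathfrak{so}(3)$, which is enough to observe the fast local convergence the corollaries describe.

```python
import numpy as np
from scipy.linalg import expm, logm

# basis of the Lie algebra so(3) (skew-symmetric matrices)
E = [np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]]),
     np.array([[0., 0., 1.], [0., 0., 0.], [-1., 0., 0.]]),
     np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])]

a = expm(0.3 * E[0] + 0.2 * E[1])      # the sought zero of f

def f(x):
    # f(x) = exp^{-1}(a^{-1} x), an element of so(3); zero exactly at x = a
    return np.real(logm(a.T @ x))

def coords(m):
    # coordinates of a skew matrix in the basis E
    return np.array([m[2, 1], m[0, 2], m[1, 0]])

def newton_step(x, h=1e-6):
    fx = coords(f(x))
    J = np.empty((3, 3))
    for i in range(3):
        # df_x E_i = d/dt f(x·exp(t E_i))|_{t=0}, by forward differences
        J[:, i] = (coords(f(x @ expm(h * E[i]))) - fx) / h
    v = np.linalg.solve(J, -fx)        # Newton correction in coordinates
    return x @ expm(v[0] * E[0] + v[1] * E[1] + v[2] * E[2])

x = a @ expm(0.4 * E[2] + 0.1 * E[0])  # perturbed initial point on SO(3)
for _ in range(8):
    x = newton_step(x)

residual = np.linalg.norm(f(x))
```

Every iterate stays on the group by construction, since the update multiplies by $\exp$ of a Lie algebra element; no projection step is needed.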
In particular, taking $r = \frac{1}{4L}$ in Corollary 3.6, one has the following result.

COROLLARY 3.7 Let $G$ be a compact connected Lie group endowed with a bi-invariant Riemannian metric. Suppose that $f(x^*) = 0$ and that $\mathrm{d}f_{x^*}^{-1} \mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{1/L}(x^*)$. Let $x_0 \in C_{1/(4L)}(x^*)$. Then the sequence $\{x_n\}$ generated by Newton's method (3.1) with initial point $x_0$ is well defined and converges quadratically to $x^*$.

Clearly, Corollaries 3.5 and 3.7 improve the corresponding local convergence result in Owren & Welfert (2000, Theorem 4.6). The following example is devoted to an application of our results in this section to the initial value problem on the special linear group $\mathrm{SL}(N, \mathbb{R})$ and the special orthogonal group $\mathrm{SO}(N, \mathbb{R})$ that was considered by Owren & Welfert (2000). This, in particular, presents an example for which our results in the present paper are applicable but not the results in Mahony (1996) and Owren & Welfert (2000).

For the following example we define the second differential $\mathrm{d}^2\theta_x\colon \mathcal{G}^2 \to \mathcal{G}$ for a $C^2$-map $\theta\colon G \to \mathcal{G}$ as follows:

$$\mathrm{d}^2\theta_x u_1 u_2 = \frac{\partial^2}{\partial t_2 \partial t_1} \theta(x \cdot \exp t_2 u_2 \cdot \exp t_1 u_1) \Big|_{t_2 = t_1 = 0}. \qquad (3.26)$$

Then, by (2.13), we have that

$$\mathrm{d}\theta_{x \cdot \exp(tu)} - \mathrm{d}\theta_x = \int_0^t \mathrm{d}^2\theta_{x \cdot \exp(su)} u \, \mathrm{d}s \quad \text{for each } u \in \mathcal{G} \text{ and } t \in \mathbb{R}. \qquad (3.27)$$

EXAMPLE 3.8 Let $N$ be a positive integer and let $I_N$ be the $N \times N$ identity matrix. Let $G$ be the special linear group under standard matrix multiplication (cf. Varadarajan, 1984), that is,

$$G = \mathrm{SL}(N, \mathbb{R}) := \{ x \in \mathbb{R}^{N \times N} \,|\, \det x = 1 \}.$$

Then $G$ is a connected Lie group and its Lie algebra is

$$\mathcal{G} = T_e G = \mathfrak{sl}(N, \mathbb{R}) := \{ v \in \mathbb{R}^{N \times N} \,|\, \mathrm{tr}(v) = 0 \},$$

where $e = I_N$. We endow $\mathcal{G}$ with the standard inner product

$$\langle u, v \rangle_e = \mathrm{tr}(u^\mathrm{T} v) \quad \text{for any } u, v \in \mathcal{G}. \qquad (3.28)$$

Hence the corresponding norm $\|\cdot\|$ is the Frobenius norm. Moreover, the exponential map $\exp\colon \mathcal{G} \to G$ is given by

$$\exp(v) = \sum_{n \ge 0} \frac{v^n}{n!} \quad \text{for each } v \in \mathcal{G}$$

and its inverse is the logarithm (cf. Hall, 2004, p. 34)

$$\exp^{-1}(z) = \sum_{k \ge 1} (-1)^{k-1} \frac{(z - I_N)^k}{k} \quad \text{for each } z \in G \text{ with } \|z - I_N\| < 1. \qquad (3.29)$$
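The logarithm series (3.29) converges quickly near the identity and can be checked against a library matrix logarithm. The sketch below is ours, not the authors'; it assumes SciPy's `expm`/`logm` and works in $\mathfrak{sl}(3, \mathbb{R})$.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(2)
m = rng.standard_normal((3, 3))
v = 0.03 * (m - np.trace(m) / 3 * np.eye(3))   # small element of sl(3, R)
z = expm(v)                                     # z in SL(3, R), close to I_3

def log_series(z, terms=60):
    # exp^{-1}(z) = sum_{k>=1} (-1)^{k-1} (z - I)^k / k, valid for ||z - I|| < 1
    a = z - np.eye(z.shape[0])
    out, p = np.zeros_like(a), np.eye(z.shape[0])
    for k in range(1, terms + 1):
        p = p @ a
        out += (-1) ** (k - 1) * p / k
    return out

w = log_series(z)
```

Since $\mathrm{tr}\big(\exp^{-1}(z)\big) = \log \det z = 0$ for $z \in \mathrm{SL}(3, \mathbb{R})$, the computed logarithm lands back in the Lie algebra, as (3.29) requires.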
Let $g\colon G \to \mathcal{G}$ be a differentiable map and let $x^{(0)}$ be a random starting point. Consider the following initial value problem on $G$ studied in Owren & Welfert (2000):

$$x' = x \cdot g(x), \qquad x(0) = x^{(0)}. \qquad (3.30)$$

The application of one step of the backward Euler method to (3.30) leads to the fixed-point problem

$$x = x^{(0)} \cdot \exp(l g(x)), \qquad (3.31)$$

where $l$ represents the size of the discretization step. Let $f\colon G \to \mathcal{G}$ be defined by

$$f(x) = \exp^{-1}\big((x^{(0)})^{-1} \cdot x\big) - l g(x) \quad \text{for each } x \in G.$$

Thus solving the equation (3.31) is equivalent to finding a zero of $f$.

Let $g$ be the function considered in Owren & Welfert (2000) defined by

$$g(x) = (\sin x)(2x - 5x^2) - \big((\sin x)(2x - 5x^2)\big)^\mathrm{T} \quad \text{for each } x \in G,$$

where

$$\sin x = \sum_{j \ge 1} (-1)^{j-1} \frac{x^{2j-1}}{(2j-1)!} \quad \text{for each } x \in G.$$

Consider the special case when $l = 1$ and $x^{(0)} = \exp v_0$ with $v_0 \in \mathcal{G}$ satisfying $\|v_0\| \le \frac{1}{11}$. To apply our results we have to estimate the norm of $\mathrm{d}f_x^{-1}$. To do this we write $w(\cdot) = \exp^{-1}\big((x^{(0)})^{-1} \cdot (\cdot)\big)$. Let $x := \exp v_1 \exp v_2 \cdots \exp v_k$ for some $v_1, v_2, \ldots, v_k \in \mathcal{G}$ with $\sum_{i=0}^k \|v_i\| < \frac{1}{4}$. Since

$$\mathrm{e}^t - 1 \le \frac{5}{4} t \quad \text{for each } t \in \left[0, \frac{1}{4}\right], \qquad (3.32)$$

one can use mathematical induction to prove that

$$\|x - I_N\| \le \mathrm{e}^{\sum_{i=1}^k \|v_i\|} - 1 \le \frac{5}{4} \sum_{i=1}^k \|v_i\|. \qquad (3.33)$$

Consequently, we have

$$\|(x^{(0)})^{-1} x - I_N\| \le \frac{5}{4} \sum_{i=0}^k \|v_i\| < \frac{5}{16} < 1. \qquad (3.34)$$

Thus, by definition and using (3.29), one has for each $u \in \mathcal{G}$ that

$$\mathrm{d}w_x(u) = \sum_{j \ge 1} (-1)^{j-1} \frac{\sum_{i=0}^{j-1} \big((x^{(0)})^{-1} x - I_N\big)^i (x^{(0)})^{-1} x u \big((x^{(0)})^{-1} x - I_N\big)^{j-1-i}}{j}. \qquad (3.35)$$

Since

$$\|(x^{(0)})^{-1} x u\| = \|(x^{(0)})^{-1} x u - u + u\| \le \big(\|(x^{(0)})^{-1} x - I_N\| + 1\big) \|u\|,$$
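The product bound underlying (3.33) can be sanity-checked numerically. The sketch below is ours, not the authors'; it assumes SciPy's `expm`, uses the Frobenius norm throughout (as in (3.28)), and draws a few small elements of $\mathfrak{sl}(3, \mathbb{R})$ whose norms sum below $\frac14$, so that (3.32) applies.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)

def small_traceless(scale=0.05):
    m = rng.standard_normal((3, 3))
    m -= np.trace(m) / 3 * np.eye(3)
    return scale * m / np.linalg.norm(m)   # Frobenius norm exactly `scale`

vs = [small_traceless() for _ in range(4)]
s = sum(np.linalg.norm(v) for v in vs)     # = 0.2 here, below 1/4

x = np.eye(3)
for v in vs:
    x = x @ expm(v)                        # x = exp(v_1)···exp(v_k)

# ||x - I|| <= e^s - 1 <= (5/4) s, the chain of inequalities in (3.33)
lhs = np.linalg.norm(x - np.eye(3))
```

The first inequality follows from submultiplicativity, $\|\exp(v) - I\| \le \mathrm{e}^{\|v\|} - 1$, applied factor by factor; the second is the elementary estimate (3.32).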
it follows from (3.35) that
$$\|\mathrm{d}w_x(u) - u\| \leqslant \left(\sum_{j\geqslant 2}\frac{j\,(\|(x^{(0)})^{-1}x - I_N\| + 1)\,\|(x^{(0)})^{-1}x - I_N\|^{j-1}}{j} + \|(x^{(0)})^{-1}x - I_N\|\right)\|u\| = \frac{2\|(x^{(0)})^{-1}x - I_N\|}{1 - \|(x^{(0)})^{-1}x - I_N\|}\,\|u\|,$$
and hence
$$\|\mathrm{d}w_x - I_{\mathcal{G}}\| \leqslant \frac{2\|(x^{(0)})^{-1}x - I_N\|}{1 - \|(x^{(0)})^{-1}x - I_N\|} \leqslant \frac{10\sum_{i=0}^{k}\|v_i\|}{4 - 5\sum_{i=0}^{k}\|v_i\|}, \qquad (3.36)$$
where $I_{\mathcal{G}}$ is the identity on $\mathcal{G}$. Similarly, one has
$$\|\mathrm{d}^2 w_x\| \leqslant \sum_{j\geqslant 1}\bigl[(\|(x^{(0)})^{-1}x - I_N\| + 1)\|(x^{(0)})^{-1}x - I_N\|^{j-1} + (j-1)(\|(x^{(0)})^{-1}x - I_N\| + 1)^2\|(x^{(0)})^{-1}x - I_N\|^{j-2}\bigr] \leqslant \frac{3 + \|(x^{(0)})^{-1}x - I_N\|}{(1 - \|(x^{(0)})^{-1}x - I_N\|)^2}.$$
Combining this and (3.34) gives that
$$\|\mathrm{d}^2 w_x\| \leqslant \frac{3 + \frac{5}{4}\sum_{i=0}^{k}\|v_i\|}{\left(1 - \frac{5}{4}\sum_{i=0}^{k}\|v_i\|\right)^2}. \qquad (3.37)$$
Let $x_0 = I_N$ and $L = 51$. Below we verify that $\mathrm{d}f_{x_0}^{-1}\mathrm{d}f$ satisfies the $L$-Lipschitz condition at $x_0$ on $C_{1/50}(x_0)$. For this purpose we let $v \in \mathcal{G}$ and $x \in C_{1/50}(x_0)$. Then there exist $k \geqslant 1$ and $v_1, \ldots, v_k \in \mathcal{G}$ such that $x = x_0 \cdot \exp v_1 \cdots \exp v_k$ and $\|v\| + \sum_{i=1}^{k}\|v_i\| < \frac{1}{50}$. Let $s \in [0, 1]$ and write $y := x \cdot \exp sv = \exp v_1 \cdots \exp v_k \cdot \exp sv$. Note that, by (3.36), we have that
$$\|\mathrm{d}w_{x_0} - I_{\mathcal{G}}\| \leqslant \frac{10\|v_0\|}{4 - 5\|v_0\|} \leqslant \frac{2}{7} \qquad (3.38)$$
because $\|v_0\| \leqslant \frac{1}{11} < \frac{1}{10}$. Since
$$\frac{5}{4}\left(\|v_0\| + \sum_{i=1}^{k}\|v_i\| + s\|v\|\right) < \frac{5}{4}\left(\frac{1}{10} + \frac{1}{50}\right) = \frac{3}{20},$$
it follows from (3.37) that
$$\|\mathrm{d}^2 w_y\| \leqslant \frac{144}{35\left(1 - \frac{10}{7}\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right)\right)^2}. \qquad (3.39)$$
On the other hand, note that
$$g(x) = 2\sum_{j\geqslant 1}(-1)^{j-1}\frac{x^{2j} - (x^{2j})^{T}}{(2j-1)!} - 5\sum_{j\geqslant 1}(-1)^{j-1}\frac{x^{2j+1} - (x^{2j+1})^{T}}{(2j-1)!}. \qquad (3.40)$$
Then, by definition, $\mathrm{d}g_{x_0} = (-6\cos 1 - 16\sin 1)I_{\mathcal{G}}$. Hence
$$\mathrm{d}g_{x_0}^{-1} = \frac{1}{-6\cos 1 - 16\sin 1}\,I_{\mathcal{G}}$$
and
$$(I_{\mathcal{G}} - \mathrm{d}g_{x_0})^{-1} = \frac{1}{1 + 6\cos 1 + 16\sin 1}\,I_{\mathcal{G}}. \qquad (3.41)$$
It follows from (3.38) that
$$\|(I_{\mathcal{G}} - \mathrm{d}g_{x_0})^{-1}\|\|\mathrm{d}f_{x_0} - (I_{\mathcal{G}} - \mathrm{d}g_{x_0})\| = \|(I_{\mathcal{G}} - \mathrm{d}g_{x_0})^{-1}\|\|\mathrm{d}w_{x_0} - I_{\mathcal{G}}\| \leqslant \frac{1}{1 + 6\cos 1 + 16\sin 1}\cdot\frac{2}{7} < 1.$$
Thus the Banach lemma is applied to conclude that
$$\left\|\mathrm{d}f_{x_0}^{-1}\right\| \leqslant \frac{\dfrac{1}{1 + 6\cos 1 + 16\sin 1}}{1 - \dfrac{1}{1 + 6\cos 1 + 16\sin 1}\cdot\dfrac{2}{7}} \leqslant \frac{1}{10}. \qquad (3.42)$$
By definition, we have
$$\|\mathrm{d}^2 g_y\| \leqslant 2\left(2\sum_{j\geqslant 1}\frac{(2j)^2\|y^{2j}\|}{(2j-1)!} + 5\sum_{j\geqslant 1}\frac{(2j+1)^2\|y^{2j+1}\|}{(2j-1)!}\right). \qquad (3.43)$$
Using (3.33), we have that $\|y^i\| \leqslant \frac{5}{4}\,i\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right) + 1$ for each $i \geqslant 1$, and it follows from (3.43) that
$$\|\mathrm{d}^2 g_y\| \leqslant \frac{5}{4}\left(\sum_{j\geqslant 1}\frac{4(2j)^3 + 10(2j+1)^3}{(2j-1)!}\right)\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right) + \sum_{j\geqslant 1}\frac{4(2j)^2 + 10(2j+1)^2}{(2j-1)!}. \qquad (3.44)$$
Note that for each $j \geqslant 1$ we have
$$\frac{2(2j)^i}{(2j-1)!} \leqslant \frac{(2j-1+i)\cdots(2j+1)\,2j}{(2j-1)!} + \frac{(2j-2+i)\cdots 2j(2j-1)}{(2j-2)!} \quad \text{for } i = 2, 3.$$
Then, by elementary calculations, we have that
$$2\sum_{j\geqslant 1}\frac{(2j)^i}{(2j-1)!} \leqslant \sum_{l\geqslant 0}\frac{(l+i)\cdots(l+1)}{l!} \leqslant i!\,2^i\mathrm{e} \quad \text{for } i = 2, 3.$$
Similarly,
$$2\sum_{j\geqslant 1}\frac{(2j+1)^i}{(2j-1)!} \leqslant (i+1)!\,2^i\mathrm{e} \quad \text{for } i = 2, 3.$$
Combining (3.44) with the above two inequalities gives the following estimate:
$$\|\mathrm{d}^2 g_y\| \leqslant 1320\mathrm{e}\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right) + 136\mathrm{e}. \qquad (3.45)$$
This together with (3.39) and (3.42) yields that
$$\left\|\mathrm{d}f_{x_0}^{-1}\mathrm{d}^2 f_y\right\| \leqslant \left\|\mathrm{d}f_{x_0}^{-1}\right\|\|\mathrm{d}^2 w_y\| + \left\|\mathrm{d}f_{x_0}^{-1}\right\|\|\mathrm{d}^2 g_y\| \leqslant \frac{72}{175\left(1 - \frac{10}{7}\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right)\right)^2} + \frac{\mathrm{e}}{10}\left(1320\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right) + 136\right). \qquad (3.46)$$
Then it follows from (3.27) and the fact that $y = x \cdot \exp sv$ that
$$\begin{aligned}
\left\|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{x\cdot\exp v} - \mathrm{d}f_x)\right\| &\leqslant \int_0^1 \left\|\mathrm{d}f_{x_0}^{-1}\mathrm{d}^2 f_{x\cdot\exp sv}\right\|\|v\|\,\mathrm{d}s\\
&\leqslant \int_0^1 \left(\frac{72}{175\left(1 - \frac{10}{7}\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right)\right)^2} + \frac{\mathrm{e}}{10}\left(1320\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right) + 136\right)\right)\|v\|\,\mathrm{d}s\\
&\leqslant 51\|v\|. \qquad (3.47)
\end{aligned}$$
Thus the claim stands. Moreover, $r_1 < \frac{1}{L} < \frac{1}{50}$, and so $\mathrm{d}f_{x_0}^{-1}\mathrm{d}f$ satisfies the $L$-Lipschitz condition at $x_0$ on $C_{r_1}(x_0)$. Noting that $f(x_0) = -v_0$, we have by (3.42) that
$$\lambda = L\beta = L\left\|\mathrm{d}f_{x_0}^{-1}f(x_0)\right\| \leqslant L\left\|\mathrm{d}f_{x_0}^{-1}\right\|\cdot\|v_0\| \leqslant 51\cdot\frac{1}{10}\cdot\|v_0\| < \frac{1}{2}.$$
Thus Theorem 3.1 is applied to conclude that the sequence generated by (3.1) with initial point $x_0 = I_N$ converges to a zero $x^*$ of $f$.
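This worked example can be reproduced numerically. The sketch below is our own illustration rather than a construction from the paper: it runs Newton's method (3.1) on SO(3) for $f(x) = \exp^{-1}((x^{(0)})^{-1}x) - g(x)$ with $l = 1$, a sample $v_0 = 0.05\,b_1$ (so $\|v_0\| \leqslant \frac{1}{11}$), truncated power series for $\exp$, $\exp^{-1}$ and $\sin$, and a finite-difference approximation of $\mathrm{d}f_x$; the basis matrices $b_1, b_2, b_3$ are those of (4.22) below.

```python
import numpy as np

B1 = np.array([[0., 1, 0], [-1, 0, 0], [0, 0, 0]])
B2 = np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]])
B3 = np.array([[0., 0, 0], [0, 0, 1], [0, -1, 0]])
I3 = np.eye(3)

def expm(A, terms=30):
    # Matrix exponential by truncated Taylor series (adequate for the
    # small skew-symmetric matrices appearing here).
    E, T = I3.copy(), I3.copy()
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def logm(B, terms=80):
    # exp^{-1} near the identity: log(I + A) = sum_k (-1)^(k-1) A^k / k.
    A = B - I3
    L, P = np.zeros((3, 3)), I3.copy()
    for k in range(1, terms):
        P = P @ A
        L = L + (-1.0) ** (k - 1) * P / k
    return L

def sinm(A, terms=10):
    # sin of a matrix, by its defining power series.
    S, P, fact, sign = np.zeros((3, 3)), A.copy(), 1.0, 1.0
    for j in range(1, terms):
        S = S + sign * P / fact
        P = P @ A @ A
        fact *= (2 * j) * (2 * j + 1)
        sign = -sign
    return S

def g(x):
    M = sinm(x) @ (2 * x - 5 * x @ x)
    return M - M.T

v0 = 0.05 * B1            # ||v0|| = 0.05*sqrt(2) <= 1/11, as required
x0inv = expm(-v0)         # (x^(0))^{-1}

def f(x):                 # f(x) = exp^{-1}((x^(0))^{-1} x) - g(x), l = 1
    return logm(x0inv @ x) - g(x)

def coords(M):            # coordinates of a skew matrix in {b1, b2, b3}
    return np.array([M[0, 1], M[0, 2], M[1, 2]])

x, h = I3.copy(), 1e-6
for _ in range(12):       # Newton's method (3.1), forward-difference df
    J = np.column_stack([coords((f(x @ expm(h * B)) - f(x)) / h)
                         for B in (B1, B2, B3)])
    c = np.linalg.solve(J, coords(f(x)))
    x = x @ expm(-(c[0] * B1 + c[1] * B2 + c[2] * B3))

print(np.linalg.norm(f(x)))   # residual of the computed zero
```

The residual drops to roundoff level within a few iterations, consistent with the convergence guaranteed by Theorem 3.1.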
To illustrate an application of Corollary 3.4 let us take $x^{(0)} = I_N$, that is, $f\colon G \to \mathcal{G}$ is defined by
$$f(x) = \exp^{-1}(x) - (\sin x)(2x - 5x^2) + ((\sin x)(2x - 5x^2))^{T} \quad \text{for each } x \in G.$$
Then $x^* := I_N$ is a zero of $f$. Furthermore, by (3.35), we have $\mathrm{d}w_{x^*} = I_{\mathcal{G}}$ and
$$\mathrm{d}f_{x^*} = \mathrm{d}w_{x^*} - \mathrm{d}g_{x^*} = (1 + 6\cos 1 + 16\sin 1)I_{\mathcal{G}}. \qquad (3.48)$$
As before, let $v \in \mathcal{G}$ and $x \in C_{1/50}(x^*)$. Then there exist $k \geqslant 1$ and $v_1, v_2, \ldots, v_k \in \mathcal{G}$ such that $x = \exp v_1 \exp v_2 \cdots \exp v_k$ and $\sum_{i=1}^{k}\|v_i\| + \|v\| < \frac{1}{50}$. Similarly, let $s \in [0, 1]$ and write $y := x\exp sv = \exp v_1 \exp v_2 \cdots \exp v_k \exp sv$. Note that, by (3.37) (as $v_0 = 0$), one has that
$$\|\mathrm{d}^2 w_y\| \leqslant \frac{3 + \frac{1}{40}}{\left(1 - \frac{5}{4}\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right)\right)^2}. \qquad (3.49)$$
Note that $x^* = x_0 = I_N$. Then, using (3.43)–(3.45) and (3.49), one can verify (with almost the same arguments as we did for (3.46) and (3.47)) that
$$\left\|\mathrm{d}f_{x^*}^{-1}\mathrm{d}^2 f_y\right\| \leqslant \frac{121}{480\left(1 - \frac{5}{4}\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right)\right)^2} + \frac{\mathrm{e}}{12}\left(1320\left(\sum_{i=1}^{k}\|v_i\| + s\|v\|\right) + 136\right)$$
and
$$\left\|\mathrm{d}f_{x^*}^{-1}(\mathrm{d}f_{x\cdot\exp v} - \mathrm{d}f_x)\right\| \leqslant 51\|v\|,$$
that is, $\mathrm{d}f_{x^*}^{-1}\mathrm{d}f$ satisfies the $L$-Lipschitz condition at $x^*$ on $C_{1/50}(x^*)$ and so on $C_{1/L}(x^*)$ because $\frac{1}{L} < \frac{1}{50}$. Take $x_0 = x^* \cdot \exp v$ with $v \in \mathcal{G}$ and $\|v\| < \frac{1}{204}$. Corollary 3.4 is applied to conclude that the sequence generated by (3.1) with initial point $x_0$ is well defined and converges quadratically to a zero $y^*$ of $f$ in $C_{1/L}(x^*)$.

Furthermore, if we take $G$ to be the special orthogonal group under standard matrix multiplication, that is,
$$G = \mathrm{SO}(N, \mathbb{R}) := \{x \in \mathbb{R}^{N\times N} \mid x^{T}x = I_N \text{ and } \det x = 1\}, \qquad (3.50)$$
then $G$ is a compact connected Lie group and its Lie algebra is the set of all $N \times N$ skew-symmetric matrices, that is,
$$\mathcal{G} = \mathrm{so}(N, \mathbb{R}) := \{v \in \mathbb{R}^{N\times N} \mid v^{T} + v = 0\}. \qquad (3.51)$$
Note that $[u, v] = uv - vu$ and $\langle [u, v], w\rangle_e = -\langle u, [w, v]\rangle_e$ for any $u, v, w \in \mathcal{G}$. One can easily verify (cf. DoCarmo, 1992, p. 41) that the left-invariant Riemannian metric induced by the inner product in (3.28) is a bi-invariant metric on $G$. Then Corollary 3.7 is applicable and the sequence generated by (3.1) with initial point $x_0$ is well defined and converges quadratically to $x^*$.

We end this section with a remark.

REMARK 3.9 It would be helpful to make some comparisons of Theorem 3.1 with the corresponding result given by theorem 2 in Wang & Li (2007), where the convergence criterion (3.7) was provided under the following metric $L$-Lipschitz condition at $x_0$:
$$\left\|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{x'} - \mathrm{d}f_x)\right\| \leqslant L\,d(x', x) \quad \forall\, x', x \in G \text{ with } d(x_0, x) + d(x, x') < r_1, \qquad (3.52)$$
where $d(\cdot, \cdot)$ is the Riemannian distance induced by the left-invariant Riemannian metric. Clearly, this kind of metric $L$-Lipschitz condition is dependent on the metric on $G$ and is much stronger than the $L$-Lipschitz condition on $C_{r_1}(x_0)$ given in Definition 2.1:
$$\left\|\mathrm{d}f_{x_0}^{-1}(\mathrm{d}f_{x\cdot\exp u} - \mathrm{d}f_x)\right\| \leqslant L\|u\| \quad \forall\, x \in C_{r_1}(x_0) \text{ and } u \in \mathcal{G} \text{ with } \varrho(x, x_0) + \|u\| < r_1. \qquad (3.53)$$
Usually, in a noncompact Lie group, (3.52) is difficult to verify, in particular for points $x$ and $x'$ that are connected by no one-parameter semigroup, because $\mathrm{d}f_{(\cdot)}$ contains no information about the distance between the two points in $G$. Hence it is not convenient to apply theorem 2 of Wang & Li (2007). For example, consider the group $G = \mathrm{SL}(N, \mathbb{R})$ in Example 3.8. It would be very difficult to verify the metric $L$-Lipschitz condition (3.52) (in fact, we do not know how to do that).

4. Applications to optimization problems

Let $\phi\colon G \to \mathbb{R}$ be a $C^2$-map. Consider the following optimization problem:
$$\min_{x\in G} \phi(x). \qquad (4.1)$$
Newton’s method for solving (4.1) was presented in Mahony (1996), where a local quadratic convergence result was established for a smooth function φ.
Let $X \in \mathcal{G}$. Following Mahony (1996), we use $\widetilde{X}$ to denote the left-invariant vector field associated with $X$ defined by
$$\widetilde{X}(x) = (L'_x)_e X \quad \text{for each } x \in G,$$
and $\widetilde{X}\phi$ is the Lie derivative of $\phi$ with respect to the left-invariant vector field $\widetilde{X}$, that is, for each $x \in G$ we have
$$(\widetilde{X}\phi)(x) = \left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\phi(x \cdot \exp tX). \qquad (4.2)$$
Let $\{X_1, \ldots, X_n\}$ be an orthonormal basis of $\mathcal{G}$. According to Helmke & Moore (1994, p. 356) (see also Mahony, 1996), $\operatorname{grad}\phi$ is a vector field on $G$ defined by
$$\operatorname{grad}\phi(x) = (\widetilde{X}_1, \ldots, \widetilde{X}_n)(\widetilde{X}_1\phi(x), \ldots, \widetilde{X}_n\phi(x))^{T} = \sum_{j=1}^{n}\widetilde{X}_j\phi(x)\,\widetilde{X}_j \quad \text{for each } x \in G. \qquad (4.3)$$
Then Newton's method with initial point $x_0 \in G$ that was considered in Mahony (1996) can be written in a coordinate-free form as follows.

ALGORITHM 4.1
Find $X_k \in \mathcal{G}$ such that $\widetilde{X}_k = (L'_{x_k})_e X_k$ and
$$\operatorname{grad}\phi(x_k) + \operatorname{grad}(\widetilde{X}_k\phi)(x_k) = 0.$$
Set $x_{k+1} = x_k \cdot \exp X_k$.
Set $k \leftarrow k + 1$ and repeat.

Let $f\colon G \to \mathcal{G}$ be a mapping defined by
$$f(x) = (L'_x)_e^{-1}\operatorname{grad}\phi(x) \quad \text{for each } x \in G. \qquad (4.4)$$
Define the linear operator $H_x\phi\colon \mathcal{G} \to \mathcal{G}$ for each $x \in G$ by
$$(H_x\phi)X = (L'_x)_e^{-1}\operatorname{grad}(\widetilde{X}\phi)(x) \quad \text{for each } X \in \mathcal{G}. \qquad (4.5)$$
Then $H_{(\cdot)}\phi$ defines a mapping from $G$ to $L(\mathcal{G})$. The following proposition gives the equivalence between $\mathrm{d}f_x$ and $H_x\phi$.

PROPOSITION 4.2 Let $f(\cdot)$ and $H_{(\cdot)}\phi$ be defined by (4.4) and (4.5), respectively. Then
$$\mathrm{d}f_x = H_x\phi \quad \text{for each } x \in G. \qquad (4.6)$$

Proof. Let $x \in G$ and let $\{X_1, \ldots, X_n\}$ be an orthonormal basis of $\mathcal{G}$. In view of (4.3) and (4.4), we have that
$$f(x) = (X_1, \ldots, X_n)((\widetilde{X}_1\phi)(x), \ldots, (\widetilde{X}_n\phi)(x))^{T}. \qquad (4.7)$$
Since $\phi$ is a $C^2$-mapping, it is easy to see by definition that
$$\widetilde{X}(\widetilde{X}_j\phi)(x) = \widetilde{X}_j(\widetilde{X}\phi)(x) \quad \text{for each } X \in \mathcal{G} \text{ and } j = 1, 2, \ldots, n. \qquad (4.8)$$
Therefore, by (4.7), we have that
$$\begin{aligned}
\mathrm{d}f_x(X) &= \left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0} f(x \cdot \exp tX)\\
&= (X_1, \ldots, X_n)\left(\left.\tfrac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}(\widetilde{X}_1\phi)(x \cdot \exp tX), \ldots, \left.\tfrac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}(\widetilde{X}_n\phi)(x \cdot \exp tX)\right)^{T}\\
&= (X_1, \ldots, X_n)(\widetilde{X}(\widetilde{X}_1\phi)(x), \ldots, \widetilde{X}(\widetilde{X}_n\phi)(x))^{T}\\
&= (X_1, \ldots, X_n)(\widetilde{X}_1(\widetilde{X}\phi)(x), \ldots, \widetilde{X}_n(\widetilde{X}\phi)(x))^{T}\\
&= (L'_x)_e^{-1}(\widetilde{X}_1, \ldots, \widetilde{X}_n)(\widetilde{X}_1(\widetilde{X}\phi)(x), \ldots, \widetilde{X}_n(\widetilde{X}\phi)(x))^{T},
\end{aligned}$$
where the fourth equality holds because of (4.8). This means that (4.6) holds and the proof is complete.
REMARK 4.3 One can easily see from Proposition 4.2 that, with the same initial point, the sequence generated by Algorithm 4.1 for $\phi$ coincides with the one generated by Newton's method (3.1) for $f$ defined by (4.4).
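Since Algorithm 4.1 uses only Lie derivatives of $\phi$, it can be sketched with finite differences when no closed form for $H_x\phi$ is at hand. The following is a minimal sketch under our own assumptions, not the paper's implementation: the group is SO(3), the cost is the trace function (1.1) with the data of Example 4.8 below, and the central-difference step $h = 10^{-4}$, the truncated matrix exponential and the fixed iteration count are arbitrary choices.

```python
import numpy as np

B1 = np.array([[0., 1, 0], [-1, 0, 0], [0, 0, 0]])
B2 = np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]])
B3 = np.array([[0., 0, 0], [0, 0, 1], [0, -1, 0]])
BASIS = [B / np.sqrt(2) for B in (B1, B2, B3)]  # orthonormal for <u,v> = tr(u^T v)

def expm(A, terms=30):
    # Matrix exponential by truncated Taylor series (small arguments only).
    E, T = np.eye(3), np.eye(3)
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

Q = np.array([[1., 0.003, 0], [0.003, 2, 0], [0, 0, 0]])
D = np.diag([1., 2, 3])

def phi(x):
    # The cost function (1.1) on SO(3).
    return -np.trace(x.T @ Q @ x @ D)

def lie(psi, x, X, h=1e-4):
    # Central-difference approximation of the Lie derivative (4.2).
    return (psi(x @ expm(h * X)) - psi(x @ expm(-h * X))) / (2 * h)

def newton_step(x):
    grad = np.array([lie(phi, x, Xj) for Xj in BASIS])
    # H[j][i] approximates X_j(X_i phi)(x), the matrix of H_x phi, cf. (4.5).
    H = np.array([[lie(lambda y: lie(phi, y, Xi), x, Xj)
                   for Xi in BASIS] for Xj in BASIS])
    c = np.linalg.solve(H, -grad)   # grad phi(x_k) + grad(X_k~ phi)(x_k) = 0
    return x @ expm(sum(ci * Xi for ci, Xi in zip(c, BASIS)))

x = np.eye(3)
for _ in range(6):
    x = newton_step(x)
S = x.T @ Q @ x
print(np.linalg.norm(S @ D - D @ S))   # [x^T Q x, D] vanishes at a critical point
```

The iterates settle at a critical point of $\phi$, with the final accuracy limited by the finite-difference noise rather than by the method itself.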
Let $x_0 \in G$ be such that $\left(H_{x_0}\phi\right)^{-1}$ exists and let $\beta_\phi := \left\|\left(H_{x_0}\phi\right)^{-1}(L'_{x_0})_e^{-1}\operatorname{grad}\phi(x_0)\right\|$. Recall that $r_1$ is defined by (3.4). Then the main theorem of this section is as follows.

THEOREM 4.4 Suppose that
$$\lambda := L\beta_\phi \leqslant \frac{1}{2} \qquad (4.9)$$
and that $\left(H_{x_0}\phi\right)^{-1}(H_{(\cdot)}\phi)$ satisfies the $L$-Lipschitz condition on $C_{r_1}(x_0)$. Then the sequence generated by Algorithm 4.1 with initial point $x_0$ is well defined and converges to a critical point $x^*$ of $\phi$ at which $\operatorname{grad}\phi(x^*) = 0$.

Furthermore, if $H_{x_0}\phi$ is additionally positive definite and the following Lipschitz condition is satisfied:
$$\left\|\left(H_{x_0}\phi\right)^{-1}\right\|\|H_{x\cdot\exp u}\phi - H_x\phi\| \leqslant L\|u\| \quad \text{for } x \in G \text{ and } u \in \mathcal{G} \text{ with } \varrho(x_0, x) + \|u\| < r_1, \qquad (4.10)$$
then $x^*$ is a local solution of (4.1).
Proof. Recall that $f$ is defined by (4.4). Then, by Proposition 4.2, $\mathrm{d}f_x = H_x\phi$ for each $x \in G$. Hence, by the assumptions, $\mathrm{d}f_{x_0}^{-1}\mathrm{d}f$ satisfies the $L$-Lipschitz condition on $C_{r_1}(x_0)$ and condition (3.7) is satisfied because $L\beta = L\beta_\phi \leqslant \frac{1}{2}$. Thus Theorem 3.1 is applicable. Hence the sequence generated by Newton's method for $f$ with initial point $x_0$ is well defined and converges to a zero $x^*$ of $f$. Consequently, by Remark 4.3, one sees that the first assertion holds.

To prove the second assertion we assume that $H_{x_0}\phi$ is additionally positive definite and the Lipschitz condition (4.10) is satisfied. It is sufficient to prove that $H_{x^*}\phi$ is positive definite. Let $\lambda^*$ and $\lambda_0$ be the minimum eigenvalues of $H_{x^*}\phi$ and $H_{x_0}\phi$, respectively. Then $\lambda_0 > 0$. We have to show that $\lambda^* > 0$. To do this we let $\{x_n\}$ be the sequence generated by Algorithm 4.1 and write $v_n = \mathrm{d}f_{x_n}^{-1}f(x_n)$ for each $n = 0, 1, \ldots$. Then, by Remark 4.3,
$$x_{n+1} = x_n \cdot \exp(-v_n) \quad \text{for each } n = 0, 1, \ldots, \qquad (4.11)$$
and, by Theorem 3.1, we have
$$\|v_n\| \leqslant t_{n+1} - t_n \quad \text{for each } n = 0, 1, \ldots. \qquad (4.12)$$
Therefore for each $n = 0, 1, \ldots$ we have
$$\begin{aligned}
\left\|\left(H_{x_0}\phi\right)^{-1}\right\|\|H_{x_{n+1}}\phi - H_{x_0}\phi\| &= \left\|\left(H_{x_0}\phi\right)^{-1}\right\|\|H_{x_n\cdot\exp(-v_n)}\phi - H_{x_0}\phi\|\\
&\leqslant \sum_{j=0}^{n}\left\|\left(H_{x_0}\phi\right)^{-1}\right\|\|H_{x_j\cdot\exp(-v_j)}\phi - H_{x_j}\phi\|\\
&\leqslant \sum_{j=0}^{n} L\|v_j\| \leqslant \sum_{j=0}^{n} L(t_{j+1} - t_j) \leqslant Lr_1 \qquad (4.13)
\end{aligned}$$
due to (4.10)–(4.12). Since
$$\left|\frac{\lambda^*}{\lambda_0} - 1\right| = \frac{1}{\lambda_0}\left|\min_{v\in\mathcal{G},\,\|v\|=1}\langle (H_{x^*}\phi)v, v\rangle - \min_{v\in\mathcal{G},\,\|v\|=1}\langle (H_{x_0}\phi)v, v\rangle\right| \leqslant \left\|\left(H_{x_0}\phi\right)^{-1}\right\|\|H_{x^*}\phi - H_{x_0}\phi\|,$$
it follows that
$$\left|\frac{\lambda^*}{\lambda_0} - 1\right| \leqslant \lim_{n\to\infty}\left\|\left(H_{x_0}\phi\right)^{-1}\right\|\|H_{x_{n+1}}\phi - H_{x_0}\phi\| \leqslant Lr_1 < 1$$
due to (4.13). This implies that $\lambda^* > 0$ and completes the proof.
Similarly to the proof of Theorem 4.4, we can verify the following corollaries.
COROLLARY 4.5 Let $x^*$ be a local optimal solution of (4.1) such that $(H_{x^*}\phi)^{-1}$ exists. Suppose that $(H_{x^*}\phi)^{-1}(H_{(\cdot)}\phi)$ satisfies the $L$-Lipschitz condition on $C_{1/L}(x^*)$. Let $x_0 \in C_{1/4L}(x^*)$. Then the sequence generated by Algorithm 4.1 with initial point $x_0$ is well defined and converges to a critical point, say $y^*$, of $\phi$ with $\varrho(x^*, y^*) < \frac{1}{L}$.

Furthermore, if $H_{x^*}\phi$ is additionally positive definite and the following Lipschitz condition is satisfied:
$$\|(H_{x^*}\phi)^{-1}\|\|H_{x\cdot\exp u}\phi - H_x\phi\| \leqslant L\|u\| \quad \text{for } x \in G \text{ and } u \in \mathcal{G} \text{ with } \varrho(x^*, x) + \|u\| < \frac{1}{L}, \qquad (4.14)$$
then $y^*$ is also a local solution of (4.1).
COROLLARY 4.6 Let $x^*$ be a local optimal solution of (4.1) such that $(H_{x^*}\phi)^{-1}$ exists. Suppose that $(H_{x^*}\phi)^{-1}(H_{(\cdot)}\phi)$ satisfies the $L$-Lipschitz condition on $C_{1/L}(x^*)$. Let $\rho > 0$ be the largest number such that $C_\rho(e) \subseteq \exp\left(B\left(0, \frac{1}{L}\right)\right)$ and let $r = \min\left\{\frac{\rho}{3 + L\rho}, \frac{1}{4L}\right\}$. Write $N(x^*, r) := x^* \cdot \exp(B(0, r))$. Then, for each $x_0 \in N(x^*, r)$, the sequence generated by Algorithm 4.1 with initial point $x_0$ is well defined and converges quadratically to $x^*$.
COROLLARY 4.7 Let $G$ be a compact connected Lie group that is endowed with a bi-invariant Riemannian metric. Let $x^*$ be a local optimal solution of (4.1) such that $(H_{x^*}\phi)^{-1}$ exists. Suppose that $(H_{x^*}\phi)^{-1}(H_{(\cdot)}\phi)$ satisfies the $L$-Lipschitz condition on $C_{1/L}(x^*)$. Let $x_0 \in C_{1/4L}(x^*)$. Then the sequence generated by Algorithm 4.1 with initial point $x_0$ is well defined and converges quadratically to $x^*$.
We end this paper with two examples with φ defined by (1.1) for which our results in the present
paper are applicable but not the results in Mahony (1996).
EXAMPLE 4.8 Let $G$ and $\mathcal{G}$ be given by (3.50) and (3.51), respectively, with $N = 3$. Consider the optimization problem with $\phi\colon G \to \mathbb{R}$ given by
$$\phi(x) = -\operatorname{tr}(x^{T}Qx D) \quad \text{for each } x \in G,$$
where $D$ is the diagonal matrix with diagonal entries 1, 2 and 3 and $Q$ is a fixed symmetric matrix. Then it is known (see, for example, Smith, 1993, 1994; Mahony, 1996) that
$$\operatorname{grad}\phi(x) = -(L'_x)_e[x^{T}Qx, D] = -x[x^{T}Qx, D] \qquad (4.15)$$
and
$$\widetilde{X}\phi(x) = -\operatorname{tr}(x^{T}Qx[D, X^{T}]). \qquad (4.16)$$
Therefore, in view of (4.15), we have
$$\operatorname{grad}(\widetilde{X}\phi)(x) = -(L'_x)_e[x^{T}Qx, [D, X^{T}]] = -x[x^{T}Qx, [D, X^{T}]].$$
This implies that
$$(H_x\phi)X = (L'_x)_e^{-1}\operatorname{grad}(\widetilde{X}\phi)(x) = -[x^{T}Qx, [D, X^{T}]]. \qquad (4.17)$$
Fix $X \in \mathcal{G}$ and define the map $g\colon G \to \mathcal{G}$ by
$$g(x) := (H_x\phi)X = -[x^{T}Qx, [D, X^{T}]] \quad \text{for each } x \in G. \qquad (4.18)$$
Then, by definition (cf. (2.10)), for each $s \in [0, 1]$ and $u \in \mathcal{G}$ we have
$$\begin{aligned}
\mathrm{d}g_{x\cdot\exp su}u &= \left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}\left(-[(x\mathrm{e}^{su}\mathrm{e}^{tu})^{T}Qx\mathrm{e}^{su}\mathrm{e}^{tu}, [D, X^{T}]]\right)\\
&= -\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}[\mathrm{e}^{-tu}(x\mathrm{e}^{su})^{T}Qx\mathrm{e}^{su}\mathrm{e}^{tu}, [D, X^{T}]]\\
&= -\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}[\mathrm{Ad}_{\mathrm{e}^{-tu}}((x\mathrm{e}^{su})^{T}Qx\mathrm{e}^{su}), [D, X^{T}]]\\
&= -[[-u, (x\mathrm{e}^{su})^{T}Qx\mathrm{e}^{su}], [D, X^{T}]],
\end{aligned}$$
where
$$\mathrm{Ad}_{\mathrm{e}^{Xt}}(Y) = Y + t[X, Y] + \frac{t^2}{2!}[X, [X, Y]] + \cdots \quad \text{for each } X, Y \in \mathcal{G}. \qquad (4.19)$$
Remember that the norm $\|\cdot\|$ is the Frobenius norm. Consequently, for each $u \in \mathcal{G}$ we have
$$\|\mathrm{d}g_{x\cdot\exp su}u\| \leqslant 8\|u\|\|Q\|\|D\|\|X\| = 8\sqrt{14}\|u\|\|Q\|\|X\|.$$
Hence, applying (2.13), we have that
$$\|g(x\cdot\exp u) - g(x)\| \leqslant \int_0^1 \|\mathrm{d}g_{x\cdot\exp(su)}u\|\,\mathrm{d}s \leqslant 8\sqrt{14}\|u\|\|Q\|\|X\|.$$
This together with (4.18) implies that
$$\|H_{x\cdot\exp u}\phi - H_x\phi\| \leqslant 8\sqrt{14}\|u\|\|Q\| \quad \text{for each } u \in \mathcal{G}. \qquad (4.20)$$
In particular, taking
$$Q = \begin{pmatrix} 1 & 0.003 & 0\\ 0.003 & 2 & 0\\ 0 & 0 & 0 \end{pmatrix},$$
we have
$$\|H_{x\cdot\exp u}\phi - H_x\phi\| \leqslant 16\sqrt{21}\|u\| \quad \text{for each } u \in \mathcal{G}. \qquad (4.21)$$
Let
$$b_1 = \begin{pmatrix} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0 & 0 & 1\\ 0 & 0 & 0\\ -1 & 0 & 0 \end{pmatrix}, \quad b_3 = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{pmatrix}. \qquad (4.22)$$
Then $\{b_1, b_2, b_3\}$ is a basis of $\mathrm{so}(3, \mathbb{R})$. Thus we can endow $\mathrm{so}(3, \mathbb{R})$ with the 2-norm $\|\cdot\|_2$ defined by
$$\|u\|_2 = \sqrt{u_1^2 + u_2^2 + u_3^2} \quad \text{for each } u = u_1 b_1 + u_2 b_2 + u_3 b_3 \in \mathrm{so}(3, \mathbb{R}).$$
It is routine to check that $\|u\| = \sqrt{2}\|u\|_2$ for each $u \in \mathrm{so}(3, \mathbb{R})$. Let $x_0 = I_3$ and let
$$C = \begin{pmatrix} 1 & 0 & 0\\ 0 & -2 & -0.003\\ 0 & -0.006 & -2 \end{pmatrix}.$$
Then, by (4.17), we have that
$$H_{x_0}\phi\,(b_1, b_2, b_3) = (b_1, b_2, b_3)C,$$
and so
$$\left(H_{x_0}\phi\right)^{-1}(b_1, b_2, b_3) = (b_1, b_2, b_3)C^{-1}. \qquad (4.23)$$
Therefore for any $u = u_1 b_1 + u_2 b_2 + u_3 b_3 \in \mathrm{so}(3, \mathbb{R})$ we have
$$\left\|\left(H_{x_0}\phi\right)^{-1}u\right\| = \|(b_1, b_2, b_3)C^{-1}(u_1, u_2, u_3)^{T}\| = \sqrt{2}\|C^{-1}(u_1, u_2, u_3)^{T}\|_2 \leqslant \sqrt{2}\|C^{-1}\|\|u\|_2 = \|C^{-1}\|\|u\|.$$
Consequently,
$$\left\|\left(H_{x_0}\phi\right)^{-1}\right\| \leqslant \|C^{-1}\| \leqslant \sqrt{2}. \qquad (4.24)$$
Combining this with (4.21) yields that $\left(H_{x_0}\phi\right)^{-1}(H_{(\cdot)}\phi)$ satisfies the $L$-Lipschitz condition with $L = 16\sqrt{42}$. Furthermore, by (4.15) and (4.23), we have
$$\beta_\phi = \left\|\left(H_{x_0}\phi\right)^{-1}(L'_{x_0})_e^{-1}\operatorname{grad}\phi(x_0)\right\| = \left\|\left(H_{x_0}\phi\right)^{-1}[Q, D]\right\| = 0.003\sqrt{2}.$$
Hence $\lambda = L\beta_\phi < \frac{1}{2}$. Thus Theorem 4.4 is applicable and the sequence generated by Algorithm 4.1 with initial point $x_0 = I_3$ is well defined and convergent to a critical point $x^*$ of $\phi$.
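A minimal numerical run of this example using the closed forms (4.15) and (4.17): the sketch below is our own illustration, with the truncated matrix exponential and the iteration count as implementation choices. Algorithm 4.1 solves $(H_{x_k}\phi)X_k = -f(x_k)$ in the basis $\{b_1, b_2, b_3\}$ and updates $x_{k+1} = x_k\exp X_k$.

```python
import numpy as np

B1 = np.array([[0., 1, 0], [-1, 0, 0], [0, 0, 0]])
B2 = np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]])
B3 = np.array([[0., 0, 0], [0, 0, 1], [0, -1, 0]])
Q = np.array([[1., 0.003, 0], [0.003, 2, 0], [0, 0, 0]])
D = np.diag([1., 2, 3])

def expm(A, terms=30):
    # Matrix exponential by truncated Taylor series (small arguments only).
    E, T = np.eye(3), np.eye(3)
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def brk(A, B):
    return A @ B - B @ A        # matrix commutator [A, B]

def coords(M):
    # Coordinates of a skew matrix in the basis {b1, b2, b3} of (4.22).
    return np.array([M[0, 1], M[0, 2], M[1, 2]])

x = np.eye(3)
for _ in range(8):
    S = x.T @ Q @ x
    fx = -brk(S, D)             # (L'_x)_e^{-1} grad phi(x), cf. (4.15)
    # Columns: coordinates of (H_x phi) b_i = -[S, [D, b_i^T]], cf. (4.17).
    H = np.column_stack([coords(-brk(S, brk(D, B.T))) for B in (B1, B2, B3)])
    c = np.linalg.solve(H, -coords(fx))
    x = x @ expm(c[0] * B1 + c[1] * B2 + c[2] * B3)

S = x.T @ Q @ x
print(np.linalg.norm(brk(S, D)))   # grad phi vanishes at the critical point
```

The iterates reach a critical point of $\phi$ (where $[x^{T}Qx, D] = 0$) to machine precision within a few steps, as predicted by Theorem 4.4.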
EXAMPLE 4.9 Consider the same problem as in Example 4.8 but with $Q$ defined by
$$Q = \begin{pmatrix} 2 & -2 & 0\\ -2 & 1 & -2\\ 0 & -2 & 0 \end{pmatrix}.$$
Let
$$x^* = \frac{1}{3}\begin{pmatrix} 1 & -2 & 2\\ 2 & -1 & -2\\ 2 & 2 & 1 \end{pmatrix}. \qquad (4.25)$$
Since $x^{*T}Qx^* = \operatorname{diag}(-2, 1, 4)$, it follows that $x^*$ is a local optimal solution of $\phi$ (cf. Brockett, 1988, 1991; Mahony, 1996). Let $x \in G$ and $u \in \mathcal{G}$. Then it follows from (4.20) that
$$\|H_{x\cdot\exp u}\phi - H_x\phi\| \leqslant 8\sqrt{14}\|u\|\|Q\| = 56\sqrt{6}\|u\|. \qquad (4.26)$$
Let $b_1$, $b_2$ and $b_3$ be given by (4.22) and let
$$C = \begin{pmatrix} 3 & 0 & 0\\ 0 & 12 & 0\\ 0 & 0 & 3 \end{pmatrix}.$$
With the same arguments as we had in Example 4.8, we can show that
$$(H_{x^*}\phi)(b_1, b_2, b_3) = (b_1, b_2, b_3)C$$
and, consequently,
$$\|(H_{x^*}\phi)^{-1}\| \leqslant \|C^{-1}\| = \frac{\sqrt{33}}{12}. \qquad (4.27)$$
Hence $H_{x^*}\phi$ is positive definite and $(H_{x^*}\phi)^{-1}(H_{(\cdot)}\phi)$ satisfies the $L$-Lipschitz condition with $L = 14\sqrt{22}$ due to (4.26) and (4.27). Now let $x_0 = x^* \cdot \exp v$ with $x^*$ given by (4.25) and $v = 0.00269\,b_1$. Then $\|v\| < \frac{1}{4L}$. Note that $G$ is compact connected and endowed with a bi-invariant Riemannian metric. Hence Corollary 4.7 is applicable and the sequence generated by Algorithm 4.1 with initial point $x_0 = x^* \cdot \exp v$ is well defined and converges quadratically to $x^*$.
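The claims of this example can be checked numerically. The sketch below is ours, with the iteration count and tolerances as arbitrary choices: it verifies that $x^{*T}Qx^* = \operatorname{diag}(-2, 1, 4)$ and that Algorithm 4.1 started at $x_0 = x^*\cdot\exp(0.00269\,b_1)$ returns to $x^*$.

```python
import numpy as np

B1 = np.array([[0., 1, 0], [-1, 0, 0], [0, 0, 0]])
B2 = np.array([[0., 0, 1], [0, 0, 0], [-1, 0, 0]])
B3 = np.array([[0., 0, 0], [0, 0, 1], [0, -1, 0]])
Q = np.array([[2., -2, 0], [-2, 1, -2], [0, -2, 0]])
D = np.diag([1., 2, 3])
xstar = np.array([[1., -2, 2], [2, -1, -2], [2, 2, 1]]) / 3   # (4.25)

def expm(A, terms=30):
    # Matrix exponential by truncated Taylor series (small arguments only).
    E, T = np.eye(3), np.eye(3)
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def brk(A, B):
    return A @ B - B @ A        # matrix commutator [A, B]

def coords(M):
    # Coordinates of a skew matrix in the basis {b1, b2, b3} of (4.22).
    return np.array([M[0, 1], M[0, 2], M[1, 2]])

x = xstar @ expm(0.00269 * B1)  # perturbed starting point x0 = x* exp(v)
for _ in range(8):
    S = x.T @ Q @ x
    fx = -brk(S, D)             # (L'_x)_e^{-1} grad phi(x), cf. (4.15)
    H = np.column_stack([coords(-brk(S, brk(D, B.T))) for B in (B1, B2, B3)])
    c = np.linalg.solve(H, -coords(fx))
    x = x @ expm(c[0] * B1 + c[1] * B2 + c[2] * B3)

print(np.linalg.norm(x - xstar))   # the iterates converge back to x*
```

The distance to $x^*$ collapses to roundoff level in a handful of steps, consistent with the quadratic convergence asserted by Corollary 4.7.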
Funding

This research was partially supported by the National Natural Science Foundation of China (grants 10671175; 10731060).

REFERENCES
Absil, P. A., Baker, C. G. & Gallivan, K. A. (2007) Trust-region methods on Riemannian manifolds. Found. Comput. Math., 7, 303–330.
Adler, R., Dedieu, J. P., Margulies, J., Martens, M. & Shub, M. (2002) Newton method on Riemannian manifolds and a geometric model for human spine. IMA J. Numer. Anal., 22, 1–32.
Alvarez, F., Bolte, J. & Munier, J. (2008) A unifying local convergence result for Newton's method in Riemannian manifolds. Found. Comput. Math., 8, 197–226.
Bayer, D. A. & Lagarias, J. C. (1989) The nonlinear geometry of linear programming I, affine and projective scaling trajectories; II, Legendre transform coordinates and central trajectories. Trans. Am. Math. Soc., 314, 499–581.
Brockett, R. W. (1988) Dynamical systems that sort lists, diagonalise matrices and solve linear programming problems. Proceedings of 27th IEEE Conference on Decision and Control. Austin, TX, pp. 799–803.
Brockett, R. W. (1991) Dynamical systems that sort lists, diagonalise matrices and solve linear programming problems. Linear Algebra Appl., 146, 79–91.
Brockett, R. W. (1993) Differential geometry and the design of gradient algorithms. Proc. Sympos. Pure Math., 54, 69–92.
Chu, M. T. & Driessel, K. R. (1990) The projected gradient method for least squares matrix approximation with spectral constraints. SIAM J. Numer. Anal., 27, 1050–1060.
Dedieu, J. P., Priouret, P. & Malajovich, G. (2003) Newton's method on Riemannian manifolds: covariant alpha theory. IMA J. Numer. Anal., 23, 395–419.
Do Carmo, M. P. (1992) Riemannian Geometry. Boston, MA: Birkhäuser.
Edelman, A., Arias, T. A. & Smith, T. (1998) The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20, 303–353.
Ezquerro, J. A. & Hernández, M. A. (2002a) Generalized differentiability conditions for Newton's method. IMA J. Numer. Anal., 22, 187–205.
Ezquerro, J. A. & Hernández, M. A. (2002b) On an application of Newton's method to nonlinear operators with w-conditioned second derivative. BIT, 42, 519–530.
Ferreira, O. P. & Svaiter, B. F. (2002) Kantorovich's theorem on Newton's method in Riemannian manifolds. J. Complex., 18, 304–329.
Gabay, D. (1982) Minimizing a differentiable function over a differential manifold. J. Optim. Theory Appl., 37, 177–219.
Gutiérrez, J. M. & Hernández, M. A. (2000) Newton's method under weak Kantorovich conditions. IMA J. Numer. Anal., 20, 521–532.
Hall, B. C. (2004) Lie Groups, Lie Algebras, and Representations. Graduate Texts in Mathematics, vol. 222. New York: Springer.
Helgason, S. (1978) Differential Geometry, Lie Groups, and Symmetric Spaces. New York: Academic Press.
Helmke, U. & Moore, J. B. (1990) Singular value decomposition via gradient flows. Syst. Control Lett., 14, 369–377.
Helmke, U. & Moore, J. B. (1994) Optimization and Dynamical Systems. Communications and Control Engineering. London: Springer.
Kantorovich, L. V. & Akilov, G. P. (1982) Functional Analysis. Oxford: Pergamon.
Li, C., López, G. & Martín-Márquez, V. (2009a) Monotone vector fields and the proximal point algorithm on Hadamard manifolds. J. Lond. Math. Soc., 79, 663–683.
Li, C. & Wang, J. H. (2005) Convergence of Newton's method and uniqueness of zeros of vector fields on Riemannian manifolds. Sci. China Ser. A, 48, 1465–1478.
Li, C. & Wang, J. H. (2006) Newton's method on Riemannian manifolds: Smale's point estimate theory under the γ-condition. IMA J. Numer. Anal., 26, 228–251.
Li, C., Wang, J. H. & Dedieu, J. P. (2009b) Smale's point estimate theory for Newton's method on Lie groups. J. Complex., 25, 128–151.
Luenberger, D. G. (1972) The gradient projection method along geodesics. Manage. Sci., 18, 620–631.
Mahony, R. E. (1994) Optimization algorithms on homogeneous spaces: with applications in linear systems theory. Ph.D. Thesis, Department of Systems Engineering, University of Canberra, Australia.
Mahony, R. E. (1996) The constrained Newton method on a Lie group and the symmetric eigenvalue problem. Linear Algebra Appl., 248, 67–89.
Mahony, R. E., Helmke, U. & Moore, J. B. (1993) Pole placement algorithms for symmetric realisations. Proceedings of 32nd IEEE Conference on Decision and Control, San Antonio, TX, pp. 1355–1358.
Moore, J. B., Mahony, R. E. & Helmke, U. (1994) Numerical gradient algorithms for eigenvalue and singular value calculations. SIAM J. Matrix Anal. Appl., 15, 881–902.
Owren, B. & Welfert, B. (2000) The Newton iteration on Lie groups. BIT Numer. Math., 40, 121–145.
Shub, M. & Smale, S. (1993) Complexity of Bezout's theorem, I: geometric aspects. J. Am. Math. Soc., 6, 459–501.
Smale, S. (1986) Newton's method estimates from data at one point. The Merging of Disciplines: New Directions in Pure, Applied and Computational Mathematics (R. Ewing, K. Gross & C. Martin eds). New York: Springer, pp. 185–196.
Smith, S. T. (1991) Dynamical systems that perform the singular value decomposition. Syst. Control Lett., 16, 319–327.
Smith, S. T. (1993) Geometric optimization method for adaptive filtering. Ph.D. Thesis, Harvard University, Cambridge, MA.
Smith, S. T. (1994) Optimization techniques on Riemannian manifolds. Fields Institute Communications (A. Bloch ed), vol. 3. Providence, RI: American Mathematical Society, pp. 113–146.
Udriste, C. (1994) Convex Functions and Optimization Methods on Riemannian Manifolds. Mathematics and Its Applications, vol. 297. Dordrecht: Kluwer Academic Publishers.
Varadarajan, V. S. (1984) Lie Groups, Lie Algebras and Their Representations. Graduate Texts in Mathematics, vol. 102. New York: Springer.
Wang, J. H. & Li, C. (2006) Uniqueness of the singular points of vector fields on Riemannian manifolds under the γ-condition. J. Complex., 22, 533–548.
Wang, J. H. & Li, C. (2007) Kantorovich's theorem for Newton's method on Lie groups. J. Zhejiang Univ. Sci. A, 8, 978–986.
Wang, X. H. (2000) Convergence of Newton's method and uniqueness of the solution of equations in Banach space. IMA J. Numer. Anal., 20, 123–134.
Zabrejko, P. P. & Nguen, D. F. (1987) The majorant method in the theory of Newton–Kantorovich approximations and the Pták error estimates. Numer. Funct. Anal. Optim., 9, 671–674.