A NEW SCALING FOR NEWTON’S ITERATION FOR THE POLAR
DECOMPOSITION AND ITS BACKWARD STABILITY
RALPH BYERS∗† ‡ AND HONGGUO XU∗§
Abstract. We propose a scaling scheme for Newton's iteration for calculating the polar decomposition. The scaling factors are generated by a simple scalar iteration in which the initial value depends only on estimates of the extreme singular values of the original matrix, which can, for example, be the Frobenius norms of the matrix and its inverse. In exact arithmetic, for matrices with condition number no greater than 10^{16}, with this scaling scheme no more than 9 iterations are needed for convergence to the unitary polar factor with a convergence tolerance roughly equal to 10^{-16}. It is proved that if matrix inverses computed in finite precision arithmetic satisfy a backward-forward error model then the numerical method is backward stable. It is also proved that Newton's method with Higham's scaling or with Frobenius norm scaling is backward stable.
Key words. matrix sign function, polar decomposition, singular value decomposition (SVD),
Newton’s method, numerical stability, scaling
AMS subject classifications. 65F05, 65G05
1. Introduction. Every matrix A ∈ C^{n×n} has a polar decomposition A = QH, where H = H^* ∈ C^{n×n} is Hermitian positive semi-definite and Q ∈ C^{n×n} is unitary, i.e., Q^*Q = I. The polar decomposition is unique with positive definite Hermitian factor H if and only if A is nonsingular. Its applications include unitary approximation and distance calculations [8, 9, 12]. The polar decomposition generalizes to rectangular matrices; see, for example, [15]. We consider only the square matrix case here, because numerical methods for computing the polar decomposition typically begin by reducing the problem to the square matrix case using, for example, a QR factorization [5, 8]. (An algorithm that works directly with rectangular matrices appears in [6].)
The polar decomposition may be easily constructed from a singular value decomposition (SVD) of A. However, the SVD is a substantial calculation that displays
much more of the structure of A than does the polar decomposition. Constructing
the polar decomposition from the SVD destroys this extra information and wastes the
arithmetic work used to compute it. It is intuitively more appealing to use the polar
decomposition as a preliminary step in the computation of the SVD as in [12].
When A is nonsingular, one way to compute the polar decomposition is through Newton's iteration

(1.1)    Q_{k+1} = (1/2)( ζ_k Q_k + (ζ_k Q_k)^{-*} ),    Q_0 = A,
where ζ_k = ζ(Q_k) > 0 is a positive scalar function of Q_k chosen to accelerate convergence [8]. Each iterate Q_k has polar decomposition Q_k = Q H_k, where Q is the unitary polar factor of A, H_0 = H is the Hermitian polar factor of A, and H_{k+1} = (ζ_k H_k + (ζ_k H_k)^{-1})/2 for k ≥ 0. For appropriately chosen acceleration parameters ζ_k, lim_{k→∞} H_k = I. Hence, the unitary polar factor is Q = lim_{k→∞} Q_k and the Hermitian polar factor is H = lim_{k→∞} Q_k^* A.

_______________
∗ Department of Mathematics, University of Kansas, Lawrence, Kansas 66045, USA. xu@math.ku.edu.
† Deceased.
‡ This material is based upon work partially supported by the University of Kansas General Research Fund allocations 2301062-003 and 2301054-003 and by the National Science Foundation awards 0098150, 0112375, and 9977352.
§ This author is partially supported by NSF under Grant No. EPS-9874732 with matching support from the State of Kansas and the University of Kansas General Research Fund allocation 2301717-003.
_______________
Iteration (1.1) was first proposed in [8] and studied further in [5, 6, 14]. It is called "Newton's iteration" because it can be derived from Newton's method applied to the equation X^*X = I. It is closely related to Newton's iteration for the matrix sign function [17, 22].
Simplicity is an attractive feature of (1.1). Apart from the computation of ζ_k, each iteration needs only one matrix inversion and one matrix-matrix addition. This simplicity allows implementations of (1.1) to take advantage of hierarchical memory and parallelism [1, 2, 13]. Many authors have studied choices of the acceleration parameters ζ_k [3, 4, 8, 16, 17, 22]. If ζ_k ≡ 1, then the iterates Q_k converge quadratically to the unitary polar factor Q [8]. Convergence is also quadratic if ζ = ζ(U) is a smooth function of U ∈ C^{n×n} and ζ(U) = 1 whenever U is unitary.
The choice

(1.2)    ζ_k^{(2)} = ( ||Q_k^{-1}||_2 / ||Q_k||_2 )^{1/2},

where ||·||_2 is the spectral norm, is proposed in [8]. This scale factor is optimal in the sense that, given Q_k, (1.2) minimizes the next error ||Q_{k+1} − Q||_2. With this scale factor, for the matrices Q_k generated by (1.1), the error sequence ||Q_k − Q||_2 converges monotonically to zero. Unfortunately, to determine the scale factor (1.2), one needs to compute two extreme singular values of Q_k at each iteration. In order to preserve the rapid convergence of (1.1) with scaling (1.2), highly accurate values of these extreme singular values are required to guarantee ζ_k^{(2)} → 1. This is expensive enough to make the scale factor (1.2) unattractive.
To save the cost of computing the extreme singular values, one might approximate (1.2). A commonly used scale factor is the (1, ∞)-scaling

(1.3)    ζ_k^{(1,∞)} = ( (||Q_k^{-1}||_1 ||Q_k^{-1}||_∞) / (||Q_k||_1 ||Q_k||_∞) )^{1/4}

(where ||·||_1 and ||·||_∞ are the 1-norm and ∞-norm, respectively), which was proposed by Higham in [8]. The factor ζ_k^{(1,∞)} is within a constant factor of ζ_k^{(2)}. It adds a negligible amount of arithmetic work compared to the cost of Q_k^{-1}, which is needed at each iteration anyway.
The scale factor

(1.4)    ζ_k^{(F)} = ||Q_k^{-1}||_F^{1/2} ||Q_k||_F^{-1/2}

(where ||·||_F is the Frobenius norm) is discussed in [5, 8, 16]. It can also be computed at a negligible cost. It is optimal in the sense that, given Q_k, it minimizes ||Q_{k+1}||_F, and it causes the sequence ||Q_k||_F to converge monotonically [5].
Another relatively inexpensive scale factor is [4]

(1.5)    ζ_k^{(d)} = |det(Q_k)|^{-1/n}.

The complex modulus of the determinant is very inexpensively obtained from the same matrix factorization used to calculate Q_k^{-1}. This scaling is optimal in the sense that, for a given iterate Q_k, it minimizes D(Q_{k+1}) = Σ_{j=1}^{n} (ln σ_j^{(k+1)})^2, where σ_j^{(k+1)} is the j-th singular value of Q_{k+1}. The function D(Q_{k+1}) is a measure of the departure of Q_{k+1} from the set of unitary matrices.
This paper considers the sub-optimal scaling strategy

(1.6)    ζ_0 = 1/√(ab),    ζ_1 = √( 2√(ab) / (a + b) ),    ζ_k = 1/√(ρ(ζ_{k−1})),    k = 2, 3, . . . ,

where ρ(x) = (x + x^{-1})/2 and a and b are any numbers such that 0 < a ≤ ||A^{-1}||_2^{-1} ≤ ||A||_2 ≤ b. Apart from estimating the extreme singular values of the initial matrix Q_0 = A, the scale factors cost only several floating point operations per iteration. Moreover, only rough estimates of ||A||_2 and ||A^{-1}||_2^{-1} are needed. One may simply choose a = ||A^{-1}||_F^{-1} and b = ||A||_F. From Table 2.1, with such choices, for any matrix with condition number no greater than 10^{16} and size no greater than 10^{11}, at most nine iterations of (1.1) with scaling (1.6) are necessary to approximate the unitary polar factor Q to within 2-norm distance less than 10^{-16}.
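To make the scalar recursion concrete, here is a minimal Python sketch of (1.6), assuming NumPy; the function name suboptimal_scale_factors is ours, chosen for illustration only.

    import numpy as np

    def suboptimal_scale_factors(a, b, p):
        # Scale factors zeta_0, ..., zeta_{p-1} from (1.6), given numbers
        # 0 < a <= ||A^{-1}||_2^{-1} and b >= ||A||_2.
        rho = lambda x: (x + 1.0 / x) / 2.0
        zetas = [1.0 / np.sqrt(a * b)]                             # zeta_0
        if p > 1:
            zetas.append(np.sqrt(2.0 * np.sqrt(a * b) / (a + b)))  # zeta_1
        while len(zetas) < p:
            zetas.append(1.0 / np.sqrt(rho(zetas[-1])))            # zeta_k, k >= 2
        return zetas

    # The factors approach 1 as the iteration proceeds:
    print(suboptimal_scale_factors(a=1e-8, b=1.0, p=6))

Only scalar arithmetic is involved, which is the sense in which the scheme is essentially costless once a and b are available.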
We show below that in the presence of rounding error, (1.1) with (1.6) is numerically stable, assuming that matrix inverses are calculated with small backward-forward error. This is the case, for example, when matrix inverses are computed using the bidiagonal reduction [7, Page 252]. We also prove the numerical stability of Newton's iteration with any of the scalings (1.2)–(1.4).
Commenting on an early draft of this paper, Krystyna Ziȩtak pointed out that
the suboptimal (quasi-optimal) scaling parameters were discovered independently by
Andrzej Kielbasiński but not published in the open literature. They were presented
by Krystyna Ziȩtak at the 1999 Householder meeting at Whistler and the 1999 ILAS
conference at Barcelona. In Section 5 of their recent paper [19], Kielbasiński et al. mention the quasi-optimal scaling parameters. In [18], these authors gave an error analysis of Higham's method [8] under the same mixed backward-forward stability assumption for matrix inversion. Numerical experiments are reported in [23].
In the following A ∈ Cn×n is always nonsingular. A = U ΣV ∗ is the SVD of A,
where U and V are unitary, Σ = diag(σ1 , . . . , σn ) is diagonal, and σ1 ≥ . . . ≥ σn ≥ 0
are the singular values of A. The set of the singular values is denoted by σ(A). The
condition number with respect to the spectral norm of A is denoted by κ2 (A) = σ1 /σn .
Following [7, Page 18], a flop is the computational work of a floating point addition,
subtraction, multiplication or division together with the associated subscripting and
indexing overhead. It takes two flops to execute the Fortran statement A(I,J) =
A(I,J) + C*A(K,J).
2. Scaling and convergence. Let A ∈ C^{n×n} be nonsingular with the SVD A = UΣV^*, Σ = diag(σ_1, σ_2, . . . , σ_n). Each Newton iterate Q_k in (1.1) has singular value decomposition Q_k = UΣ_kV^*, where

(2.1)    Σ_{k+1} = ( ζ_kΣ_k + (ζ_kΣ_k)^{-1} )/2,    Σ_0 = Σ.

In particular, Q_k has singular values σ_1^{(k)}, σ_2^{(k)}, . . . , σ_n^{(k)} (in no particular order when k > 0) that obey

(2.2)    σ_j^{(0)} = σ_j,    σ_j^{(k+1)} = (1/2)( ζ_kσ_j^{(k)} + 1/(ζ_kσ_j^{(k)}) ) = ρ(ζ_kσ_j^{(k)}),    k = 0, 1, 2, . . . ,

where ρ(x) = (x + x^{-1})/2. For appropriately chosen ζ_k, lim_{k→∞} σ_j^{(k)} = 1 and lim_{k→∞} Q_k = UV^* = Q. Consequently, the convergence properties of (1.1) derive directly from the n scalar sequences σ_j^{(k)} determined by (2.2). To attain good convergence behavior in (1.1), the acceleration parameters ζ_k must interact well with ρ(x) = (x + x^{-1})/2.
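These formulas are easy to check directly; the following sketch (ours, assuming NumPy and a real matrix so that the −∗ operation is the inverse transpose) performs one scaled Newton step and confirms that the singular values transform by ρ(ζσ_j) while the singular vectors stay fixed.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    sigma = np.linalg.svd(A, compute_uv=False)

    zeta = 1.0 / np.sqrt(sigma[0] * sigma[-1])          # zeta_0 with a = sigma_n, b = sigma_1
    Q1 = (zeta * A + np.linalg.inv(zeta * A).T) / 2.0   # one step of (1.1), real case

    rho = lambda x: (x + 1.0 / x) / 2.0
    sigma1 = np.linalg.svd(Q1, compute_uv=False)
    # Singular values of Q1 match rho(zeta * sigma_j), per (2.2).
    print(np.allclose(np.sort(sigma1), np.sort(rho(zeta * sigma))))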
The following two lemmas list some easily verified elementary properties of ρ(x) = (x + x^{-1})/2.
Lemma 2.1. If x > 0, then
1. ρ(1/x) = ρ(x);
2. 1 ≤ ρ(x) ≤ max(x, x^{-1}), with either equality iff x = 1;
3. ρ(x) is decreasing on x ∈ (0, 1] and increasing on x ∈ [1, ∞).
Lemma 2.2. Suppose that 0 < a ≤ b. Define α_ζ = max{(ζa)^{-1}, ζb} and ζ_opt = (ab)^{-1/2}. Then we have the following properties.
1. For any ζ > 0, 1 ≤ max_{a≤x≤b} ρ(ζx) = ρ(α_ζ), and 1 = max_{a≤x≤b} ρ(ζx) iff α_ζ = 1.
2. For any ζ > 0, 1 ≤ min_{a≤x≤b} ρ(ζx), and 1 = min_{a≤x≤b} ρ(ζx) iff ζa ≤ 1 ≤ ζb.
3. min_{ζ>0} α_ζ = α_{ζ_opt} = √(b/a), and ζ = ζ_opt is the only minimizer.
4. min_{ζ>0} max_{a≤x≤b} ρ(ζx) = min_{ζ>0} ρ(α_ζ) = ρ(α_{ζ_opt}) = ρ(√(b/a)).
In the following, for ease of notation, let τ(x) be the function

(2.3)    τ(x) = ρ(√x) = (1/2)( √x + 1/√x ).

The k-fold composition of τ(x) with itself is written τ^k(x), i.e., τ^0(x) = x, τ^1(x) = τ(x), and, for k ≥ 1, τ^{k+1}(x) = τ(τ^k(x)). Similarly, ρ^k(x) is the k-fold composition of ρ(x) = (x + x^{-1})/2 with itself.
Suppose that 0 < a ≤ σ_n ≤ σ_1 ≤ b. Consider the sequence of intervals generated by Newton's iteration: [a_0, b_0] = [a, b], [a_1, b_1] = ρ(ζ_0[a_0, b_0]), [a_2, b_2] = ρ(ζ_1[a_1, b_1]), . . . . It follows from (2.2) that σ_j^{(k)} ∈ [a_k, b_k], j = 1, 2, . . . , n, k = 0, 1, . . . . Note that min_{x>0} ρ(x) = 1, so, for k ≥ 1, [a_k, b_k] ⊆ [1, b_k] and

(2.4)    1 ≤ σ_j^{(k)} ≤ b_k.
It is intuitively satisfying to choose the sequence of acceleration parameters ζ_k in (1.1) to minimize the sequence b_k.
From Lemma 2.2, the initial optimal scaling factor is ζ_0 = (ab)^{-1/2}. The initial interval is scaled to be ζ_0[a, b] = [√(a/b), √(b/a)], which contains 1. The next interval is

[a_1, b_1] = ρ(ζ_0[a, b]) = [1, ρ(√(b/a))] = [1, τ(b/a)],

where τ(x) is given by (2.3). The left endpoint is a_1 = 1, so the optimal scaling factor for the next iteration is ζ_1 = b_1^{-1/2} = 1/√(τ(b/a)). The next interval is

[a_2, b_2] = ρ(ζ_1[a_1, b_1]) = [1, ρ(√(τ(b/a)))] = [1, τ^2(b/a)].

An easy induction shows that the sequence of intervals is [a_0, b_0] = [a, b] and, for k ≥ 1, [a_k, b_k] = [1, τ^k(b/a)], and the sequence of optimal scaling factors is

(2.5)    ζ_0 = 1/√(ab),    ζ_k = 1/√(τ^k(b/a)),    k = 1, 2, 3, . . . ,

which is equivalent to (1.6).
Since τ(x) = ρ(√x) ≤ ρ(x) for x ≥ 1,

τ(b/a) ≤ ρ(b/a).

By induction we have

τ^k(b/a) ≤ ρ^k(b/a)

for all k ≥ 1. The sequence b/a, ρ(b/a), ρ^2(b/a), . . . is generated by Newton's iteration x_{k+1} = ρ(x_k) with x_0 = b/a. It converges to 1 quadratically. Obviously τ^k(b/a) ≥ 1 for k ≥ 0, so b/a, τ(b/a), τ^2(b/a), . . . also converges to 1 at least quadratically. It is not difficult to show that 1 ≤ τ(x) ≤ x for any x ≥ 1. We have

b_k = τ^k(b/a) = τ(τ^{k−1}(b/a)) ≤ τ^{k−1}(b/a) = b_{k−1}.

Hence, after the first step, the sequence of intervals satisfies

[a_1, b_1] ⊇ [a_2, b_2] ⊇ . . . ⊇ [a_k, b_k] ⊇ . . . ,

and it converges to the single point 1 quadratically. (Note that a_k = 1 for all k ≥ 1.) The initial interval [a_0, b_0] = [a, b] is an exception because, in general, [a, b] ⊉ [1, τ(b/a)].
Based on this fact and (2.4), the convergence properties of (1.1) with (1.6) are
clear, and we summarize them in the following theorem.
Theorem 2.3. If

(2.6)    0 < a ≤ ||A^{-1}||_2^{-1} ≤ ||A||_2 ≤ b,

and Q_k is obtained from the Newton iteration (1.1) with scaling (1.6), then

(2.7)    ||Q_k − Q||_2 ≤ τ^k(b/a) − 1 ≤ ρ^k(b/a) − 1,    k = 1, 2, . . . .
In fact, the convergence properties are highly satisfactory even when b/a is large.
Table 2.1 uses Theorem 2.3 to list the number of Newton iterations (1.1) with scaling (1.6) (and exact arithmetic) required to guarantee selected absolute errors δ > ||Q_k − Q||_2 for selected values of b/a. The table demonstrates that Newton's iteration (1.1) with scaling (1.6) typically needs no more than nine iterations to attain typical floating point precision accuracy. The table also demonstrates that convergence is insensitive to the choice of a and b: widely differing values of b/a need similar numbers of iterations to attain similar accuracy. In particular, the easy-to-compute choices a = ||A^{-1}||_F^{-1} and b = ||A||_F satisfy (2.6) and are unlikely to lead to even one more iteration than the optimal choices a = ||A^{-1}||_2^{-1} and b = ||A||_2, particularly for ill-conditioned matrices. For instance, for any A ∈ C^{n×n} with κ_2(A) = 10^{16}, for a = ||A^{-1}||_F^{-1} and b = ||A||_F we have b/a ≤ nκ_2(A) = n·10^{16}. Then b/a ≤ 10^{27} for any n ≤ 10^{11}, and the number of iterations is 9, the same as with the optimal choices.
In Theorem 2.3, smaller values of b/a give smaller values of τ^k(b/a) and hence better error bounds. Inequality (2.6) implies that b/a ≥ κ_2(A) = ||A^{-1}||_2 ||A||_2, and equality can be achieved only with a = ||A^{-1}||_2^{-1} = σ_n and b = ||A||_2 = σ_1. With a = σ_n and b = σ_1 the scaling factors (1.6) are

(2.8)    ζ_0 = 1/√(σ_1σ_n),    ζ_k = 1/√(τ^k(κ_2(A)))    (k ≥ 1),

and the corresponding intervals are [σ_n, σ_1] and [1, τ^k(κ_2(A))] for k ≥ 1. Let Σ_k be the matrices generated by (2.1) with a = σ_n and b = σ_1. It is easy to verify that in this case the right endpoint of the k-th interval is a singular value of Σ_k. This in turn implies that inequality (2.7) is an equality, i.e.,

||Q_k − Q||_2 = ||Σ_k − I||_2 = τ^k(κ_2(A)) − 1.

Table 2.1
Number of Newton iterations (1.1) with scaling (1.6) (and exact arithmetic) required to guarantee absolute error ||Q_k − Q||_2 < δ for selected values of δ and b/a such that 0 < a ≤ ||A^{-1}||_2^{-1} ≤ ||A||_2 ≤ b. See Theorem 2.3.

    δ \ b/a     10   10^5   10^10   10^15   10^20   10^25   10^27
    10^{-1}      2      4       5       6       6       6       6
    10^{-4}      4      5       6       7       7       8       8
    10^{-7}      4      6       7       8       8       8       8
    10^{-10}     5      7       7       8       8       9       9
    10^{-13}     5      7       8       8       9       9       9
    10^{-16}     5      7       8       9       9       9       9
    10^{-19}     6      7       8       9       9      10      10
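The counts in Table 2.1 follow from iterating τ on b/a until τ^k(b/a) − 1 < δ. A short Python sketch (ours; it carries y_k = τ^k(b/a) − 1 rather than τ^k(b/a) itself, since quantities of the form 1 + 10^{-17} are not representable in double precision):

    import numpy as np

    def iterations_needed(b_over_a, delta, kmax=60):
        # Smallest k with tau^k(b/a) - 1 < delta, per Theorem 2.3.
        y = b_over_a - 1.0
        for k in range(1, kmax + 1):
            s = y / (np.sqrt(1.0 + y) + 1.0)   # s = sqrt(x) - 1 for x = 1 + y
            y = s * s / (2.0 * (1.0 + s))      # tau(x) - 1 = (sqrt(x)-1)^2 / (2 sqrt(x))
            if y < delta:
                return k
        return None

    # Reproduces the b/a = 1e27 column of Table 2.1: 6, 8, 8, 9, 9, 9, 10.
    for d in (1e-1, 1e-4, 1e-7, 1e-10, 1e-13, 1e-16, 1e-19):
        print(f"delta = {d:.0e}: {iterations_needed(1e27, d)} iterations")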
The number sequence b_k was also derived in [16] in order to show the convergence behavior of Newton's method with the optimal scale factors. It is shown in [16, 11] that, when a = σ_n and b = σ_1, for Q_k generated with ζ_{k−1}^{(2)} defined in (1.2) one has ||Q_k||_2 ≤ b_k. Due to this fact we call the ζ_k defined in (2.5) sub-optimal scale factors. Note that b_k is derived from a different interpretation here: it is the right endpoint of the k-th interval generated by applying Newton's iteration to the initial interval [a, b]. For this interval iteration, ζ_k is the scale factor that minimizes b_{k+1} − 1 (i.e., it makes [a_{k+1}, b_{k+1}] as close to 1 as possible).
3. The Algorithm. Newton's method (1.1) with scaling scheme (1.6) is implemented by the following algorithm.

Algorithm 3.1 (Newton's method (1.1) with scaling (1.6))
Input: Nonsingular matrix A ∈ C^{n×n} and a stopping criterion δ > 0
Output: The polar decomposition A = QH
Step 0:
  a. Set Q_0 = A; compute Q_0^{-*}
  b. Choose a ≤ ||Q_0^{-1}||_2^{-1} and b ≥ ||Q_0||_2; ζ_0 = 1/√(ab)
  c. Set k = 0
Step 1: While ||Q_k − Q_k^{-*}||_F ≥ δ
  a. Q_{k+1} = ( ζ_kQ_k + ζ_k^{-1}Q_k^{-*} )/2
  b. If k = 0, ζ_1 = √( 2 / (√(b/a) + √(a/b)) )
     Else ζ_{k+1} = √( 2 / (ζ_k + ζ_k^{-1}) )
     End if
  c. Compute Q_{k+1}^{-*}
  d. k = k + 1
End while
Step 2: Q = (Q_k + Q_k^{-*})/2;    H = (1/2)( Q^*A + (Q^*A)^* )
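For illustration, a runnable NumPy sketch of Algorithm 3.1 in the real case follows. It is ours rather than the authors' Matlab code, and it uses LU-based inversion via numpy.linalg.inv instead of Algorithm BR of Appendix A.1, so it illustrates only the iteration and scaling; see remark 3 below for the stability caveat.

    import numpy as np

    def polar_newton(A, delta=None):
        # Newton's iteration (1.1) with sub-optimal scaling (1.6), real A.
        n = A.shape[0]
        if delta is None:
            delta = n ** 0.25 * np.sqrt(np.finfo(A.dtype).eps)  # cf. remark 2
        Q = A.copy()
        Qinv = np.linalg.inv(Q)
        a, b = 1.0 / np.linalg.norm(Qinv, 'fro'), np.linalg.norm(Q, 'fro')
        zeta = 1.0 / np.sqrt(a * b)                             # zeta_0
        k = 0
        while np.linalg.norm(Q - Qinv.T, 'fro') >= delta:
            Q = (zeta * Q + Qinv.T / zeta) / 2.0                # step 1a
            if k == 0:
                zeta = np.sqrt(2.0 / (np.sqrt(b / a) + np.sqrt(a / b)))
            else:
                zeta = np.sqrt(2.0 / (zeta + 1.0 / zeta))
            Qinv = np.linalg.inv(Q)                             # step 1c
            k += 1
        Q = (Q + Qinv.T) / 2.0                                  # step 2
        H = (Q.T @ A + (Q.T @ A).T) / 2.0
        return Q, H, k

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 50))
    Q, H, k = polar_newton(A)
    print(k, np.linalg.norm(Q @ H - A) / np.linalg.norm(A))     # small residual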
Here are some remarks.
1. The matrix A^{-1} = Q_0^{-1} needs to be computed in the first iteration anyway. Hence, power iterations on A and A^{-1} may be used to estimate the extreme singular values σ_1 and σ_n^{-1}, respectively, using only O(n^2) extra flops per iteration. These estimates may then serve as b and a^{-1}, respectively. Since highly accurate estimates of σ_1 and σ_n are unnecessary, a few power iterations should suffice. Alternatively, ||A||_F and ||A^{-1}||_F^{-1} may be used for b and a.
2. The stopping criterion ||Q_k − Q_k^{-*}||_F < δ is essentially equivalent to ||Q_{k+1} − Q_k||_F < δ, which is used in [11, Section 8.9]. This follows from the fact that when ζ_k ≈ 1 (which is usually the case for a small δ),

Q_{k+1} − Q_k = (1/2)( ζ_k^{-1}Q_k^{-*} − (2 − ζ_k)Q_k ) ≈ (1/2)( Q_k^{-*} − Q_k ).

In practice, in order for the computed Q_k to be within O(ε) of a unitary matrix, where ε is the machine epsilon, it is sufficient to choose δ = O(√ε). See (4.16).
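The near-identity behind this equivalence is easy to observe numerically. In the sketch below (ours, assuming NumPy), ζ_k is set to exactly 1, in which case Q_{k+1} − Q_k = (Q_k^{-*} − Q_k)/2 holds exactly and the two stopping quantities agree up to the factor 2.

    import numpy as np

    rng = np.random.default_rng(2)
    Q = np.linalg.qr(rng.standard_normal((30, 30)))[0]
    Qk = Q + 1e-4 * rng.standard_normal((30, 30))       # a nearly unitary iterate
    zeta = 1.0                                          # zeta_k ~ 1 near convergence
    Qk1 = (zeta * Qk + np.linalg.inv(zeta * Qk).T) / 2.0
    lhs = np.linalg.norm(Qk - np.linalg.inv(Qk).T, 'fro')
    rhs = np.linalg.norm(Qk1 - Qk, 'fro')
    print(lhs, 2 * rhs)   # agree to rounding error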
3. Commonly the matrix inversion method used in the algorithm is an LU factorization-based method such as Gaussian elimination with partial or complete pivoting [8]. Such an inversion method usually works well in practice [18]. In order to guarantee that the algorithm is numerically backward stable, one may use the more expensive bidiagonal reduction-based matrix inversion method provided in Appendix A.1, so that the computed matrix inverses satisfy the backward-forward error model; see Assumption 4.1 in Section 4 below.
4. Estimating σ_1 and σ_n usually uses O(n^2) flops. Each iteration uses 2n^3 flops for the matrix inverse by an LU factorization-based method and O(n^2) flops for the matrix addition. Computing H uses 2n^3 flops. If p is the number of iterations for convergence, then the algorithm uses a total of roughly 2(p + 1)n^3 flops [8]. When p = 9 this is about 20n^3 flops, which is less than the QR-like SVD method (which takes 22n^3 to 26n^3 flops for the SVD and 4n^3 for Q and H). If the bidiagonal reduction-based matrix inversion method is used, the total cost is 2(3p + 1)n^3 flops.
5. In order to reduce the cost while maintaining numerical stability, one may first use the bidiagonal reduction-based method for a few iterations. When κ_2(Q_k) is not too large, say at most 100, one shifts to an LU factorization-based inversion method for the subsequent iterations. The matrix inverses essentially satisfy the backward-forward error model in the latter case [10, Section 14]. Also, it takes only a few iterations for the condition number to drop below 100. In the case when κ_2(A) = 10^{16}, with a = σ_n and b = σ_1, we have ||Q_3||_2 = τ^3(10^{16}) ≈ 42. Since this is usually the worst case in practice, the bidiagonal reduction-based method is required in no more than 3 iterations. With this strategy, the maximum cost (with p = 9) is 3 · 6n^3 + (9 − 3) · 2n^3 + 2n^3 = 32n^3 flops.
6. Although the cost of computing the scale factors (1.3)–(1.5) is negligible in Newton's iteration, computing the sub-optimal scale factors is essentially costless. Also, the use of sub-optimal scaling simplifies the algorithm, since the "shifting scale factor to 1" strategy, which is used for the (1, ∞)-scaling ([8]), is not needed. Finally, with sub-optimal scaling, the number of iterations is in general no greater than 9, and it can be determined in advance by simply computing τ^k(b/a) − 1. It is still not clear how to predict the number of iterations with other scalings, although in practice it is observed that the (1, ∞)-scaling and the sub-optimal scaling have essentially the same convergence rate.
4. Stability and Rounding Error Analysis. In this section, a first order error analysis establishes that Newton's method (1.1) with scaling (1.6) can be implemented in a backward stable way. The same conclusion is drawn for the scalings (1.2), (1.3), and (1.4). In outline, the approach is to estimate the residual ||A − Q̂Ĥ||_2 for the rounding-error-perturbed unitary factor Q̂ and Hermitian factor Ĥ produced by finite precision arithmetic in the algorithm in Section 3. The method here is to first estimate the forward errors Q̂ − Q and Ĥ − H and then use them to estimate the residual.
For the error analysis, we employ the standard model of floating point arithmetic with machine epsilon ε [10, Section 2.2].
We also need the following assumptions.
Assumption 4.1. If a nonsingular matrix A ∈ C^{n×n} is inverted using finite precision arithmetic with machine epsilon ε to obtain a "computed inverse" X, then

X = (A + E)^{-1} + F,

where E, F ∈ C^{n×n} are perturbation matrices satisfying

||E||_2 ≤ c_1(n)ε||A||_2,    ||F||_2 ≤ c_2(n)ε||A^{-1}||_2,

and c_i(n) (i = 1, 2) are some low degree polynomials in n.
In Newton’s method it is typical to use Gaussian elimination with partial or
complete pivoting for computing matrix inverses. Although it works well in practice,
the computed matrix inverses may not satisfy Assumption 4.1 [10, Section 14.1]. We
show in Appendix A.1 that Assumption 4.1 is satisfied by a matrix inversion algorithm
that uses the bidiagonal reduction method.
Assumption 4.2.

c_3(n)κ_2(A)ε < 1,

where c_3(n) is a low degree polynomial in n.
Note that 1/κ_2(A) is the relative distance of a nonsingular A to the nearest singular matrix [7, Page 73], i.e.,

1/κ_2(A) = min_{det(A+E)=0} ||E||_2/||A||_2.

If such a condition does not hold, then matrices like A + E in Assumption 4.1 can be singular. So this is a condition on the numerical nonsingularity of A. It is essential in the subsequent first order error analysis, although it will not be explicitly stated.
We now begin the error analysis. In practice, rounding errors perturb Newton's method recurrence (1.1). Under Assumption 4.1, if Q̂_k is the computed version of Q_k, then

Q̂_{k+1} = (ζ_k/2)Q̂_k + F_{k,1} + (1/(2ζ_k))( (Q̂_k + F_{k,2})^{-*} + F_{k,3} )
        = (ζ_k/2)(Q̂_k + F_{k,2}) + (1/(2ζ_k))(Q̂_k + F_{k,2})^{-*} + F_{k,1} + (1/(2ζ_k))F_{k,3} − (ζ_k/2)F_{k,2}
(4.1)   =: (ζ_k/2)(Q̂_k + F_{kb}) + (1/(2ζ_k))(Q̂_k + F_{kb})^{-*} + F_{kf},

where F_{kb} = F_{k,2} and F_{kf} = F_{k,1} + (1/(2ζ_k))F_{k,3} − (ζ_k/2)F_{k,2}. The perturbation matrix F_{k,1} represents rounding errors introduced by floating point matrix addition and scalar multiplication, and the perturbation matrices F_{k,2} and F_{k,3} represent rounding errors introduced by matrix inversion under Assumption 4.1. The F's obey the bounds

||F_{k,1}||_2 ≤ d_1 ε max( ||ζ_kQ̂_k||_2, ||ζ_k^{-1}Q̂_k^{-*}||_2 ),
||F_{k,2}||_2 ≤ d_2 ε ||Q̂_k||_2,
||F_{k,3}||_2 ≤ d_3 ε ||Q̂_k^{-*}||_2,
||F_{kb}||_2 ≤ d_b ε ||Q̂_k||_2,
||F_{kf}||_2 ≤ (d_f ε/2) max( ||ζ_kQ̂_k||_2, ||ζ_k^{-1}Q̂_k^{-*}||_2 ),

where ε is the machine epsilon and d_1, d_2 = d_b, d_3, and d_f are some modest constants that may depend on n, the details of the arithmetic, and the inversion algorithm, but depend neither on Q_k nor Q̂_k. Each Q_k is a smooth function of Q_{k−1} and each F_{k,j} = O(ε), so, by induction,

(4.2)    Q̂_k = Q_k + O(ε).

Hence, the bounds above may be loosely expressed in terms of Q_k as

(4.3)    ||F_{kb}||_2 ≤ d_b ε||Q_k||_2 + O(ε^2),
(4.4)    ||F_{kf}||_2 ≤ (d_f ε/2) max( ||ζ_kQ_k||_2, ||ζ_k^{-1}Q_k^{-*}||_2 ) + O(ε^2) ≤ d_f ε||Q_{k+1}||_2 + O(ε^2).

Inequality (4.4) is a consequence of (2.2).
We need the following lemma to continue our analysis.
Lemma 4.3. Let A ∈ C^{n×n} be a nonsingular matrix with polar decomposition A = QH, with Q ∈ C^{n×n} unitary and H ∈ C^{n×n} Hermitian positive definite. If F ∈ C^{n×n} with ||F||_2 = 1 and t ≥ 0, then, when t is sufficiently small, A + tF has the polar decomposition

A + tF = ( Q(I + tE) + O(t^2) )( H + tG + O(t^2) ),

where G ∈ C^{n×n} is the unique Hermitian solution to

(4.5)    F^*A + A^*F = GH + HG,

and E ∈ C^{n×n} is the unique skew-Hermitian solution to

(4.6)    Q^*F − F^*Q = EH + HE.

Also, E is given by

(4.7)    E = (Q^*F − G)H^{-1}.

Proof. See Appendix A.2.
At each perturbed Newton iteration (4.1), rounding errors are equivalent to perturbing Q̂_k to Q̂_k + F_{kb}, taking one Newton step (1.1), and then perturbing the result by adding F_{kf}. Let Q̂_k = W_kĤ_k and Q̂_k + F_{kb} = W̃_kH̃_k be the polar decompositions of Q̂_k and Q̂_k + F_{kb}, respectively. By Lemma 4.3,

(4.8)    W̃_k = W_k(I + E_{kb}) + O(ε^2),

where E_{kb} satisfies

(4.9)    E_{kb}Ĥ_k + Ĥ_kE_{kb} = W_k^*F_{kb} − F_{kb}^*W_k + O(ε^2).

From (4.1),

(ζ_k/2)(Q̂_k + F_{kb}) + (1/(2ζ_k))(Q̂_k + F_{kb})^{-*} = Q̂_{k+1} − F_{kf}.

Since the unitary factor in the polar decomposition of the left-hand side matrix is W̃_k, applying Lemma 4.3 to Q̂_{k+1}, we have

W̃_k = W_{k+1}(I − E_{kf}) + O(ε^2),

or, equivalently,

(4.10)    W_{k+1} = W̃_k(I + E_{kf}) + O(ε^2),

where E_{kf} satisfies

(4.11)    E_{kf}Ĥ_{k+1} + Ĥ_{k+1}E_{kf} = W_{k+1}^*F_{kf} − F_{kf}^*W_{k+1} + O(ε^2).

Since Q_k has the polar decomposition Q_k = QH_k and Q̂_k satisfies (4.2), by Lemma 4.3 one also has Ĥ_k = H_k + O(ε) and W_k = Q + O(ε). Based on these first-order error results, (4.9) and (4.11) can be expressed as

(4.12)    E_{kb}H_k + H_kE_{kb} = Q^*F_{kb} − F_{kb}^*Q + O(ε^2),
(4.13)    E_{kf}H_{k+1} + H_{k+1}E_{kf} = Q^*F_{kf} − F_{kf}^*Q + O(ε^2).

Combining (4.8) and (4.10), one has

W_{k+1} = W_k(I + E_{kb} + E_{kf}) + O(ε^2).

It follows by induction (with W_0 = Q) that

W_j = Q(I + E_j) + O(ε^2),    j > 0,

where

(4.14)    E_j = Σ_{k=0}^{j−1} (E_{kb} + E_{kf}).

Suppose that Algorithm 3.1 applied to a nonsingular matrix A ∈ C^{n×n} with polar decomposition A = QH completes after p iterations in Step 1. We obtain Q̂_p that satisfies ||Q̂_p − Q̂_p^{-*}||_2 ≤ ||Q̂_p − Q̂_p^{-*}||_F < δ. With the polar decomposition Q̂_p = W_pĤ_p and (2.2), we have (see the proof in Appendix A.3)

(4.15)    (1/2)(Q̂_p + Q̂_p^{-*}) = W_p + ΔQ̂_p,

where

||ΔQ̂_p||_2 < δ^2/8.
Suppose δ is small. Step 2 of the algorithm produces approximate polar factors

Q̂ = (1/2)(Q̂_p + Q̂_p^{-*}) + F,    Ĥ = (1/2)( Q̂^*A + A^*Q̂ + K ),

where F accounts for the rounding error in forming Q̂ from Q̂_p and K for the rounding error in forming Ĥ from A and Q̂, obeying

||F|| ≤ d_F ε,    ||K|| ≤ d_K ε||A||_2,

with modest constants d_F and d_K. So we have

(4.16)    ||Q̂ − W_p||_2 ≤ d_F ε + δ^2/8.

Since W_p = Q(I + E) with E := E_p defined in (4.14), by (4.15),

(4.17)    Q̂ = Q(I + E) + ΔQ̂_p + F = Q(I + E + L),

where L = Q^*(ΔQ̂_p + F) satisfies

||L||_2 ≤ d_L max(ε, δ^2)

for some modest constant d_L, which combines the rounding error F and the effect of the stopping criterion. Both d_L and d_K may depend on n and the details of the finite precision arithmetic and computational algorithm, but not on A, Q̂, or Ĥ. With (4.17) and the fact that E is skew-Hermitian,

(4.18)    Ĥ = (1/2)( (I + E + L)^*Q^*A + A^*Q(I + E + L) + K )
           = (1/2)( (I − E + L^*)H + H(I + E + L) + K )
           = (1/2)( 2H − EH + HE + L^*H + HL + K ).

So by (4.17) and (4.18), to first order,

Q̂Ĥ − A = (1/2)Q(I + E + L)( 2H − EH + HE + L^*H + HL + K ) − A
        = (1/2)Q( 2H − EH + HE + L^*H + HL + 2EH + 2LH + K ) − A + O(||L||_2^2) + O(ε||L||_2) + O(ε^2)
        = (1/2)Q( EH + HE + L^*H + HL + 2LH + K ) + O(max(ε^2, εδ^2, δ^4)).

From (4.14) this expression can be written

(4.19)    Q̂Ĥ − A = (1/2)Q Σ_{k=0}^{p−1}( E_{kb}H + HE_{kb} + E_{kf}H + HE_{kf} ) + (1/2)Q( L^*H + HL + 2LH + K ) + O(max(ε^2, εδ^2, δ^4)).

Note that so far the sub-optimal scale factors have not played a role. In order to continue the analysis we need the following lemma, which involves the sub-optimal scaling.
Lemma 4.4. Let A ∈ C^{n×n} be a nonsingular matrix with singular value decomposition UΣV^*, Σ = diag(σ_1, σ_2, . . . , σ_n), and σ_1 ≥ σ_2 ≥ · · · ≥ σ_n. Consider Newton's iteration (2.1) with scaling (1.6) and initial iterate Σ_0 = Σ. Let σ_j^{(k)} be the j-th diagonal entry of Σ_k, and let σ_max^{(k)} = max_{1≤j≤n} σ_j^{(k)}. If a, b satisfy 0 < a ≤ σ_n and b ≥ σ_1, then, for all k ≥ 0 and 1 ≤ i, j ≤ n,

σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}) ≤ b / (σ_i + σ_j).

Proof. See Appendix A.4.
Recall that E_{kb} and E_{kf} satisfy (4.12) and (4.13), respectively. Let

H_k = VΣ_kV^*

be an eigen-decomposition, where V is unitary and Σ_k is diagonal, obeying (2.1). (Recall that throughout the algorithm the singular vectors of the Q_k's are the singular vectors of A. In particular, for all k, the unitary matrix of right singular vectors of A is also a unitary matrix of eigenvectors of H_k.) In this notation, (4.12) and (4.13) can be written

Ẽ_{kb}Σ_k + Σ_kẼ_{kb} = F̃_{kb} − F̃_{kb}^* + O(ε^2),
Ẽ_{kf}Σ_{k+1} + Σ_{k+1}Ẽ_{kf} = F̃_{kf} − F̃_{kf}^* + O(ε^2),

where

(4.20)    Ẽ_{kb} = V^*E_{kb}V,    Ẽ_{kf} = V^*E_{kf}V,    F̃_{kb} = V^*Q^*F_{kb}V,    F̃_{kf} = V^*Q^*F_{kf}V.

So, the (i, j)-th entries of Ẽ_{kb} and Ẽ_{kf} are

(4.21)    ẽ_{ij,kb} = (f̃_{ij,kb} − f̃_{ji,kb}) / (σ_i^{(k)} + σ_j^{(k)}) + O(ε^2),    ẽ_{ij,kf} = (f̃_{ij,kf} − f̃_{ji,kf}) / (σ_i^{(k+1)} + σ_j^{(k+1)}) + O(ε^2).

Note that ||Ẽ_{kj}||_2 = ||E_{kj}||_2 and ||F̃_{kj}||_2 = ||F_{kj}||_2 for j = b, f, because V and Q are unitary.
Multiplying (4.19) on the left by V^* and on the right by V gives

V^*(Q̂Ĥ − A)V = (1/2)V^*QV Σ_{k=0}^{p−1}( Ẽ_{kb}Σ + ΣẼ_{kb} + Ẽ_{kf}Σ + ΣẼ_{kf} ) + (1/2)V^*( L^*H + HL + 2LH + K )V + O(max(ε^2, εδ^2, δ^4)),

where Ẽ_{kb} and Ẽ_{kf} are given by (4.20). From (4.21), the (i, j)-th entry of the sum Σ_{k=0}^{p−1}( Ẽ_{kb}Σ + ΣẼ_{kb} + Ẽ_{kf}Σ + ΣẼ_{kf} ) is

Σ_{k=0}^{p−1} (ẽ_{ij,kb} + ẽ_{ij,kf})(σ_i + σ_j)
  = Σ_{k=0}^{p−1} [ (f̃_{ij,kb} − f̃_{ji,kb}) / (σ_i^{(k)} + σ_j^{(k)}) + (f̃_{ij,kf} − f̃_{ji,kf}) / (σ_i^{(k+1)} + σ_j^{(k+1)}) ] (σ_i + σ_j) + O(ε^2)
  = Σ_{k=0}^{p−1} [ ( (f̃_{ij,kb} − f̃_{ji,kb}) / σ_max^{(k)} ) · ( (σ_i + σ_j)σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}) ) + ( (f̃_{ij,kf} − f̃_{ji,kf}) / σ_max^{(k+1)} ) · ( (σ_i + σ_j)σ_max^{(k+1)} / (σ_i^{(k+1)} + σ_j^{(k+1)}) ) ] + O(ε^2).

Inequalities (4.3) and (4.4) and Lemma 4.4 imply

| Σ_{k=0}^{p−1} (ẽ_{ij,kb} + ẽ_{ij,kf})(σ_i + σ_j) |
  ≤ 2ε Σ_{k=0}^{p−1} [ d_b (σ_i + σ_j)σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}) + d_f (σ_i + σ_j)σ_max^{(k+1)} / (σ_i^{(k+1)} + σ_j^{(k+1)}) ] + O(ε^2)
  ≤ 2pε(d_b + d_f)b + O(ε^2).

Hence, the residual is bounded as

||Q̂Ĥ − A||_2 ≤ || Q Σ_{k=0}^{p−1}( E_{kb}H + HE_{kb} + E_{kf}H + HE_{kf} )/2 ||_2 + || Q( L^*H + HL + 2LH + K )/2 ||_2 + O(max(ε^2, εδ^2, δ^4))
            ≤ npε(d_b + d_f)b + (2d_L + d_K/2) max(ε, δ^2)||A||_2 + O(max(ε^2, εδ^2, δ^4)).

In the same way, from (4.18) we can obtain

||Ĥ − H||_2 ≤ ||(−EH + HE)/2||_2 + ||(L^*H + HL + K)/2||_2 + O(δ^4)
           ≤ npε(d_b + d_f)b + (d_L + d_K/2) max(ε, δ^2)||A||_2 + O(max(ε^2, εδ^2, δ^4)).

By applying Lemma 4.4 to (4.21) to estimate ||E||_2, from (4.17) we can derive

||Q̂ − Q||_2 ≤ ||QE||_2 + ||QL||_2 ≤ θnpε(d_b + d_f)b + d_L max(ε, δ^2),

where

(4.22)    θ = 1/σ_n if A is complex,    θ = 2/(σ_{n−1} + σ_n) if A is real.

(The formula in the real case is based on the fact that ẽ_{ii,kb} = ẽ_{ii,kf} = 0, from (4.21).)
We present the above error analysis results, as well as (4.16), in the following theorem.
Theorem 4.5. Suppose that A ∈ C^{n×n} satisfies Assumption 4.2 and has the polar decomposition A = QH. Let Q̂ and Ĥ be the matrices computed by Algorithm 3.1 after p iterations with a matrix inversion method that satisfies the error model in Assumption 4.1. Then

||Q̂Ĥ − A||_2 ≤ npε(d_b + d_f)b + (2d_L + d_K/2) max(ε, δ^2)||A||_2 + O(max(ε^2, εδ^2, δ^4)),
||Ĥ − H||_2 ≤ npε(d_b + d_f)b + (d_L + d_K/2) max(ε, δ^2)||A||_2 + O(max(ε^2, εδ^2, δ^4)),
||Q̂ − Q||_2 ≤ θnpε(d_b + d_f)b + d_L max(ε, δ^2),
||Q̂ − W_p||_2 ≤ d_F ε + δ^2/8,

where d_b, d_f, d_L, d_K, d_F are some modest constants, W_p is a unitary matrix, and θ is defined in (4.22).
As noted in Table 2.1, in most practical situations p ≤ 9. Therefore, if b is not too much greater than ||A||_2 and the algorithm uses a stopping criterion δ not too much greater than √ε, then the algorithm is backward stable.
Corollary 4.6. If b = ||A||_2 and δ = n^{1/4}√ε, then

||Q̂Ĥ − A||_2 ≤ ( np(d_b + d_f) + √n(2d_L + d_K/2) ) ε||A||_2 + O(ε^2),
||Ĥ − H||_2 ≤ ( np(d_b + d_f) + √n(d_L + d_K/2) ) ε||A||_2 + O(ε^2),
||Q̂ − Q||_2 ≤ np(d_b + d_f)ε(θ||A||_2) + √n d_L ε,
||Q̂ − W_p||_2 ≤ (d_F + √n/8)ε.
Note that the error bounds for ||Ĥ − H||_2 and ||Q̂ − Q||_2 coincide with the perturbation results; see, for example, [8, 21, 20] and [11, Section 8.2]. The quantity θ||A||_2 serves as the condition number for the perturbation of Q.
Remark 1. The same procedure can be used to give an error analysis for Newton's method with other scalings. Note that the backward stability depends on whether

(4.23)    (σ_i + σ_j)σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}) = O(||A||_2),

which depends on the scaling factors. From Remark 2 in Appendix A.4, (4.23) holds for the optimal scaling (1.2). Lemma A.3 in Appendix A.5 shows that (4.23) also holds for the (1, ∞)-scaling (1.3) and the scaling (1.4). Therefore, Newton's method with these three scalings is also backward stable under the same conditions as Theorem 4.5 and an appropriate stopping criterion, when the number of iterations is not too large. (See Remark 3 in Appendix A.5.)
We also observe that numerical stability does not necessarily depend on how fast the method converges. In fact, one can show that when ||A||_2 ≥ ||A^{-1}||_2, Newton's method without scaling (a = b = 1) computes a polar decomposition satisfying the same error bounds given in Theorem 4.5.
5. Numerical Examples. We did some numerical experiments with Newton's method (1.1) with scaling (1.6), using a = ||A^{-1}||_F^{-1} and b = ||A||_F, and also with the (1, ∞)-scaling (1.3). The main purpose is to test the numerical stability results and the convergence rate, and to compare the sub-optimal scaling and the (1, ∞)-scaling. For this reason we used the bidiagonal reduction matrix inversion method, Algorithm BR, for computing matrix inverses.
All numerical experiments were done on a Dell PC with a Pentium-IV processor, in Matlab version 7.2 with machine epsilon ε ≈ 2.22 × 10^{-16}.
In the numerical experiments we use the stopping criterion δ = n^{1/4}√ε, where n is the size of the matrices. For Newton's method with the (1, ∞)-scaling, the scale factor is shifted to 1 when ||X_{k+1} − X_k||_F/||X_{k+1}||_F < 10^{-2}. Based on the results in Corollary 4.6, if Q̂ and Ĥ are the computed unitary and Hermitian polar factors produced by Algorithm 3.1, then we expect to observe ||A − Q̂Ĥ||_2/||A||_2, ||H − Ĥ||_2/||H||_2, and ||Q − Q̂||_2/(θ||A||_2) not much larger than ε.
In the tables we use the following notation:

e_Q = ||Q − Q̂||_2/(θ||A||_2),    e_H = ||H − Ĥ||_2/||H||_2,    res = ||A − Q̂Ĥ||_2/||A||_2,    ror = ||Q̂^*Q̂ − I||_2,

where the "exact" factors Q and H for the matrices in the first example are obtained from the SVD of A using Matlab's variable precision arithmetic vpa with 24 significant decimal digits; p is the number of iterations, and n is the dimension of the matrices. The symbol "HSF" refers to Newton's method (1.1) with Higham's (1, ∞)-scaling. The symbol "SUB" refers to (1.1) with scaling (1.6) using the Frobenius norms for the initial interval, i.e., a = ||A^{-1}||_F^{-1} and b = ||A||_F.
Example 1. Three groups of twenty real matrices were constructed with dimension 20 by using Matlab's gallery('randsvd',20,kappa,5) with kappa equal to 10^2, 10^8, and 10^{15}, respectively. The singular values of the generated matrices are random values with uniformly distributed logarithm. For each group the ranges of the condition numbers κ_2(A) and of θ||A||_2 are listed below.

kappa = 10^2:  κ_2(A) ∈ [37.7, 90.6],  θ||A||_2 ∈ [33.5, 83.7]
kappa = 10^8:  κ_2(A) ∈ [1.13, 6.75] × 10^7,  θ||A||_2 ∈ [0.2, 5.44] × 10^7
kappa = 10^{15}:  κ_2(A) ∈ [6.35 × 10^{10}, 7.53 × 10^{14}],  θ||A||_2 ∈ [1.78 × 10^{10}, 5.22 × 10^{14}]

The test results are summarized in Table 5.1, where for each group the minimum and maximum values of the errors, residuals, and numbers of iterations are listed.
Table 5.1
Extreme values of errors, residuals, and iteration counts from Example 1.

                     p            e_Q                  e_H                  res                  ror
  kappa         SUB  HSF     SUB      HSF         SUB      HSF         SUB      HSF         SUB      HSF
  10^2    Min    6    6     2.6e−17  2.2e−17     2.2e−16  1.7e−16     4.1e−16  3.5e−16     7.9e−16  7.0e−16
          Max    6    7     7.2e−17  7.7e−17     4.1e−16  3.9e−16     8.9e−16  8.2e−16     1.1e−15  1.2e−15
  10^8    Min    8    8     7.4e−19  7.4e−19     2.5e−16  1.9e−16     4.0e−16  2.7e−16     8.0e−16  7.9e−16
          Max    8    8     1.7e−17  1.7e−17     4.1e−16  3.9e−16     7.5e−16  5.9e−16     1.1e−15  1.2e−15
  10^15   Min    8    8     6.9e−19  6.9e−19     1.9e−16  2.2e−16     3.5e−16  3.9e−16     7.8e−16  8.2e−16
          Max    9    9     2.3e−17  2.3e−17     4.4e−16  4.0e−16     6.3e−16  7.2e−16     1.3e−15  1.1e−15
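For readers without access to gallery('randsvd',...), the following NumPy sketch (ours) generates test matrices of the same type, with log-uniformly distributed singular values and known polar factors against which a solver, such as the polar_newton sketch in Section 3, can be compared.

    import numpy as np

    def randsvd_like(n, kappa, rng):
        # Random matrix with log-uniform singular values in [1/kappa, 1],
        # mimicking gallery('randsvd', n, kappa, 5).
        s = np.exp(rng.uniform(np.log(1.0 / kappa), 0.0, size=n))
        s[0], s[-1] = 1.0, 1.0 / kappa                 # pin the extreme values
        U = np.linalg.qr(rng.standard_normal((n, n)))[0]
        V = np.linalg.qr(rng.standard_normal((n, n)))[0]
        A = U @ np.diag(s) @ V.T
        return A, U @ V.T                              # A and its unitary polar factor

    rng = np.random.default_rng(3)
    A, Q_exact = randsvd_like(20, 1e8, rng)
    H_exact = Q_exact.T @ A
    print(np.linalg.cond(A))                           # ~1e8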
Example 2. In this example the test matrices are the Hilbert matrices, the n × n matrices with entries a_{ij} = 1/(i + j − 1). The example uses dimensions n = 6, 8, 10, 12, and 14. For every Hilbert matrix, the polar decomposition is A_n = I_nA_n.
The condition number κ_2(A_n) ranges from 1.5 × 10^7 to 5.1 × 10^{17}, and θ||A_n||_2 ranges from 2.6 × 10^5 to 1.0 × 10^{17}. The test results are reported in Table 5.2.
Table 5.2
Errors, residuals, and iteration counts for Hilbert matrices from Example 2.

              p            e_Q                  e_H                  res                  ror
   n     SUB  HSF     SUB      HSF         SUB      HSF         SUB      HSF         SUB      HSF
   6      8    7     1.2e−18  1.2e−18     2.3e−16  1.9e−16     2.6e−16  2.5e−16     2.6e−16  2.8e−16
   8      8    8     1.6e−19  1.6e−19     1.9e−16  1.4e−16     2.4e−16  2.5e−16     3.9e−16  5.3e−16
  10      9    8     2.7e−19  2.7e−19     8.7e−17  8.7e−17     1.8e−16  1.8e−16     6.2e−16  6.8e−16
  12      9    9     4.1e−19  4.1e−19     1.4e−16  1.6e−16     3.0e−16  3.3e−16     6.3e−16  8.5e−16
  14      9    9     2.0e−17  2.0e−17     2.3e−16  2.7e−16     3.8e−16  6.3e−16     6.5e−16  1.0e−15
Newton's method with the sub-optimal scaling performed well in both examples, calculating the polar factors to nearly full precision. As predicted, it takes at most 9 iterations. Newton's method with the (1, ∞)-scaling performs equally well.
6. Conclusion. The sub-optimal scaling scheme (1.6) is essentially costless and simplifies the algorithm of Newton's iteration (1.1) for computing polar factors. In a typical floating point system, with this scaling scheme, for matrices with κ_2(A) < 10^{16} no more than 9 iterations are needed for convergence to the unitary polar factor with a convergence tolerance roughly equal to the machine epsilon. By employing the bidiagonal factorization for matrix inversion, (1.1) with (1.6) forms a provably backward stable algorithm. Newton's method with the (1, ∞)-scaling or the scaling (1.4) is also proved to be backward stable, provided the number of iterations is not too large.
Acknowledgment. We thank Nick Higham for detailed suggestions and sending
part of his unpublished book [11]. We thank Andrzej Kielbasiński for pointing out a
mistake in an earlier version and the referees for their comments and suggestions.
REFERENCES
[1] Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox. Part I. In R. F. Sincovec et al., editor, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, PA, 1993. Also available as Computer Science Report CSD-92-718, University of California, Berkeley, CA, 1992.
[2] Z. Bai and J. Demmel. Design of a parallel nonsymmetric eigenroutine toolbox. Part II. Technical Report 95-11, Department of Mathematics, University of Kentucky, Lexington, KY, 1995.
[3] L. A. Balzer. Accelerated convergence of the matrix sign function method of solving Lyapunov,
Riccati and other matrix equations. Internat. J. Control, 32(6):1057–1078, 1980.
[4] R. Byers. Solving the algebraic Riccati equation with the matrix sign function. Linear Algebra
Appl., 85:267–279, 1987.
[5] A. A. Dubrulle. An optimum iteration for the matrix polar decomposition. Electron. Trans.
Numer. Anal., 8:21–25 (electronic), 1999.
[6] W. Gander. Algorithms for the polar decomposition. SIAM J. Sci. Statist. Comput.,
11(6):1102–1115, 1990.
[7] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Studies in the
Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, third edition,
1996.
[8] N. J. Higham. Computing the polar decomposition—with applications. SIAM J. Sci. Statist.
Comput., 7(4):1160–1174, 1986.
[9] N. J. Higham. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra
Appl., 103:103–118, 1988.
[10] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM Publications, Philadelphia, PA, USA, second edition, 2002.
[11] N. J. Higham. Functions of Matrices: Theory and Computation. SIAM Publications, Philadelphia, PA, USA, 2008.
[12] N. J. Higham and P. Papadimitriou. Parallel singular value decomposition via the polar decomposition. Technical Report Numerical Analysis Report No. 239, Department of Mathematics, University of Manchester, Manchester M13 9PL, England, 1993.
[13] N. J. Higham and P. Papadimitriou. A parallel algorithm for computing the polar decomposition. Parallel Computing, 20(8):1161–1173, 1994.
[14] N. J. Higham and R. S. Schreiber. Fast polar decomposition of an arbitrary matrix. SIAM J.
Sci. Statist. Comput., 11:648–655, 1990.
[15] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge,
1990. Corrected reprint of the 1985 original.
[16] C. S. Kenney and A. J. Laub. On scaling Newton’s method for polar decomposition and the
matrix sign function. SIAM J. Matrix Anal. Appl., 13:688–706, 1992.
[17] C. S. Kenney and A. J. Laub. The matrix sign function. IEEE Trans. Automat. Control,
40(8):1330–1348, 1995.
[18] A. Kielbasiński and K. Ziȩtak. Numerical behavior of Higham’s scaled method for polar decomposition. Numerical Algorithms, 32:105–140, 2003.
[19] A. Kielbasiński, P. Zieliński, and K. Ziȩtak. Higham’s scaled method for polar decomposition
and numerical matrix-inversion. Technical Report Institute of Mathematics and Computer
Science Report I18/2007/P-045, Wroclaw University of Technology, Wroclaw, Poland, 2007.
[20] R.-C. Li. New perturbation bounds for the unitary polar factor. SIAM J. Matrix Anal. Appl.,
16:327–332, 1995.
[21] R. Mathias. Perturbation bounds for the polar decomposition. SIAM J. Matrix Anal. Appl.,
14:588–597, 1993.
[22] J. D. Roberts. Linear model reduction and solution of the algebraic Riccati equation by use
of the sign function. Internat. J. Control, 32:677–687, 1980. (Reprint of Technical Report
No. TR-13, CUED/B-Control, Cambridge University, Engineering Department, 1971).
[23] P. Zieliński and K. Ziȩtak. Polar decomposition - properties, applications and algorithms.
Applied Mathematics, Annals of the Polish Mathematical Society, 38:23–49, 1995.
Appendix A.
A.1. Bidiagonal reduction-based matrix inversion algorithm.
Algorithm BR
Input: Nonsingular matrix A ∈ Cn×n
Output: G = A−1
Step 1: Compute A = U BV ∗ with U, V unitary and B upper bidiagonal
Step 2: Solve BY = U ∗ for Y by back substitution
Step 3: Compute G = V Y
In Step 1 one may use Householder reflectors to perform the reduction. The reduction needs (8/3)n^3 flops and computing U needs (4/3)n^3 flops. The matrix V is stored in factorized form. The cost of solving the matrix equation is O(n^2) flops. With the factorized form of V it needs 2n^3 flops to compute G, so the total cost is 6n^3 flops.
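A self-contained NumPy sketch of Algorithm BR follows (ours; for clarity it accumulates U and V explicitly, whereas the flop counts above assume V is kept in factored form).

    import numpy as np

    def house(x):
        # Householder vector v and scalar beta with
        # (I - beta * outer(v, v)) @ x = -copysign(||x||, x[0]) * e_1.
        v = x.copy()
        sigma = np.linalg.norm(x)
        if sigma == 0.0:
            return v, 0.0
        v[0] += np.copysign(sigma, x[0])
        return v, 2.0 / (v @ v)

    def bidiagonalize(A):
        # Golub-Kahan reduction A = U B V^T with B upper bidiagonal.
        n = A.shape[0]
        B, U, V = A.astype(float).copy(), np.eye(n), np.eye(n)
        for k in range(n):
            v, beta = house(B[k:, k])            # annihilate below the diagonal
            B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
            U[:, k:] -= beta * np.outer(U[:, k:] @ v, v)
            if k < n - 2:
                w, beta = house(B[k, k+1:])      # annihilate right of the superdiagonal
                B[k:, k+1:] -= beta * np.outer(B[k:, k+1:] @ w, w)
                V[:, k+1:] -= beta * np.outer(V[:, k+1:] @ w, w)
        return U, B, V

    def invert_br(A):
        # Algorithm BR: A = U B V^T; solve B Y = U^T by back substitution;
        # return G = V Y = A^{-1}.
        U, B, V = bidiagonalize(A)
        n = A.shape[0]
        Y = np.empty_like(U)
        Y[n-1] = U.T[n-1] / B[n-1, n-1]
        for i in range(n - 2, -1, -1):
            Y[i] = (U.T[i] - B[i, i+1] * Y[i+1]) / B[i, i]
        return V @ Y

    A = np.random.default_rng(4).standard_normal((8, 8))
    print(np.max(np.abs(invert_br(A) @ A - np.eye(8))))   # near machine epsilon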
In order to show that a matrix inverse computed by Algorithm BR satisfies Assumption 4.1, we need the following lemma.
Lemma A.1. Consider the system Bx = z, where z ∈ C^n and B is nonsingular and upper bidiagonal, with diagonal entries α_1, . . . , α_n and superdiagonal entries β_1, . . . , β_{n−1}. Let x̂ be the numerical solution computed by back substitution. Then x̂ satisfies

x̂ = B^{-1}(z + δz) + δx,

where

|δz| ≤ 3nε|z| + O(ε^2),    |δx| ≤ 3nε|x̂| + O(ε^2).
Proof. The components of the computed vector x̂ can be formulated as

x̂_n = z_n / ( α_n(1 + ε_n) ),    x̂_k = ( z_k(1 + δ_k) − β_kx̂_{k+1} ) / ( α_k(1 + ε_k) ),    1 ≤ k ≤ n − 1,

where |ε_n|, |δ_k| < ε and |ε_k| < 3ε for k ≤ n − 1. So we have

(A.1)    α_nx̂_n(1 + ε_n) = z_n,
         α_{n−1}x̂_{n−1}(1 + ε_{n−1}) + β_{n−1}x̂_n = z_{n−1}(1 + δ_{n−1}),
         . . .
         α_1x̂_1(1 + ε_1) + β_1x̂_2 = z_1(1 + δ_1).

Define x̃ = [x̃_1, . . . , x̃_n]^T and z̃ = [z̃_1, . . . , z̃_n]^T with

x̃_n = x̂_n(1 + ε_n),    x̃_{n−1} = x̂_{n−1}(1 + ε_{n−1})(1 + ε_n),    . . . ,    x̃_1 = x̂_1 Π_{k=1}^{n}(1 + ε_k),
z̃_n = z_n,    z̃_{n−1} = z_{n−1}(1 + δ_{n−1})(1 + ε_n),    . . . ,    z̃_1 = z_1(1 + δ_1) Π_{k=2}^{n}(1 + ε_k).

By multiplying the 2nd equation in (A.1) by (1 + ε_n), the 3rd by (1 + ε_n)(1 + ε_{n−1}), and so on, we obtain that x̃ and z̃ satisfy

Bx̃ = z̃.

Let δz = z̃ − z and δx = x̂ − x̃. Then from x̃ = B^{-1}z̃ we have

x̂ = B^{-1}(z + δz) + δx.

The error bounds for |δx| and |δz| follow simply from the definitions.
Theorem A.2. Let X be the inverse of A computed by Algorithm BR. Then X satisfies Assumption 4.1.
Proof. We only consider the first order errors.
Let ÛB̂V̂^* be the computed bidiagonal factorization of A. Then Û = U + ΔU_1 and V̂ = V + ΔV_1, where U, V are unitary, ||ΔU_1||_2 ≤ d_1ε, ||ΔV_1||_2 ≤ d_2ε, and

UB̂V^* = A + E,

where ||E||_2 ≤ d_3ε||A||_2 for some modest constants d_1, d_2, d_3. Let Ŷ be the numerical solution of B̂Y = Û^* computed by back substitution. By Lemma A.1,

Ŷ = B̂^{-1}(Û^* + ΔU_2) + ΔY,

where

||ΔU_2||_2 ≤ 3n^{3/2}ε,    ||ΔY||_2 ≤ 3n^{3/2}ε||Ŷ||_2.

Let X be the computed matrix product V̂Ŷ. We have

X = V̂Ŷ + ΔX,

where ||ΔX||_2 ≤ d_4ε||Ŷ||_2 for some modest constant d_4. Now

X = V̂B̂^{-1}(Û^* + ΔU_2) + V̂ΔY + ΔX = V̂B̂^{-1}Û^* + V̂B̂^{-1}ΔU_2 + V̂ΔY + ΔX
  =: VB̂^{-1}U^* + F = (A + E)^{-1} + F,

where

F = ΔV_1B̂^{-1}U^* + VB̂^{-1}(ΔU_1)^* + V̂B̂^{-1}ΔU_2 + V̂ΔY + ΔX.

It is easily verified that ||F||_F ≤ (6n^{3/2} + d_1 + d_2 + d_4)ε||A^{-1}||_2.
A.2. Proof of Lemma 4.3. Equations (4.5) and (4.7) are established in the proof of Theorem 2.5 in [8]. Here we slightly modify that proof to establish (4.6).
Let A(t) = A + tF have polar decomposition Q(t)H(t). Note that H(t) = (A(t)^*A(t))^{1/2} (the positive definite square root) and Q(t) = A(t)H(t)^{-1} are sums, differences, products, quotients, and compositions with C^∞ functions of the entries of A(t), which is trivially a C^∞ function. Hence, Q(t) and H(t) are also C^∞. (Here we use the fact that A is nonsingular to observe that H(t) = (A(t)^*A(t))^{1/2} avoids the singularity of the square root at zero.) Using Q̇ and Ḣ to denote differentiation with respect to t, Taylor's theorem implies that

Q(t) = Q(0) + tQ̇(0) + O(t^2) = Q(I + tQ^*Q̇(0)) + O(t^2),
H(t) = H(0) + tḢ(0) + O(t^2).

The proof of Theorem 2.5 in [8] shows Ḣ(0) = G = G^* with G given by (4.5), and Q^*(0)Q̇(0) = E = −E^* with E given by (4.7).
Differentiate A + tF = Q(t)H(t) to get F = Q̇H + QḢ. Evaluating at t = 0 and letting E = Q^*(0)Q̇(0), G = Ḣ(0) gives Q^*F = EH + G. Using the facts that E is skew-Hermitian and G is Hermitian while subtracting its Hermitian transpose from this equation gives (4.6). The Lyapunov operator on the right-hand side is nonsingular, because A nonsingular implies that the eigenvalues of the Hermitian polar factor H are real and positive. Hence, the solution E is unique.
A.3. Proof for (4.15). Let Q̂_p = U_pΣ_pV_p^* be the SVD of Q̂_p. Recall that the singular values satisfy σ_i^{(p)} ≥ 1. Then ||Q̂_p − Q̂_p^{-*}||_2 < δ implies

σ_i^{(p)} − 1/σ_i^{(p)} < δ,    i = 1, 2, . . . , n.

Because

σ_i^{(p)} − 1/σ_i^{(p)} = (σ_i^{(p)} + 1)(σ_i^{(p)} − 1) / σ_i^{(p)},

we have

σ_i^{(p)} − 1 < δσ_i^{(p)} / (σ_i^{(p)} + 1).

Then

(1/2)( σ_i^{(p)} + 1/σ_i^{(p)} ) − 1 = (σ_i^{(p)} − 1)^2 / (2σ_i^{(p)}) < [ σ_i^{(p)} / (2(σ_i^{(p)} + 1)^2) ] δ^2.

Since the function x/(x + 1)^2 is decreasing for x ≥ 1, we have σ_i^{(p)}/(σ_i^{(p)} + 1)^2 ≤ 1/4. Hence

(1/2)( σ_i^{(p)} + 1/σ_i^{(p)} ) − 1 < δ^2/8,

and, using W_p = U_pV_p^*,

||ΔQ̂_p||_2 = ||(Q̂_p + Q̂_p^{-*})/2 − W_p||_2 = ||U_p( (Σ_p + Σ_p^{-1})/2 − I )V_p^*||_2 = max_i [ (1/2)( σ_i^{(p)} + 1/σ_i^{(p)} ) − 1 ] < δ^2/8.
A.4. Proof of Lemma 4.4. By (2.4), σ_max^{(k)} ≤ b_k for k ≥ 0, so we only need to show

b_k / (σ_i^{(k)} + σ_j^{(k)}) ≤ b / (σ_i + σ_j),    k ≥ 0.

An easy calculation shows that, for all i and j,

σ_i^{(k)} + σ_j^{(k)} = (ζ_{k−1}/2)( σ_i^{(k−1)} + σ_j^{(k−1)} )( 1 + 1/(ζ_{k−1}^2σ_i^{(k−1)}σ_j^{(k−1)}) ).

Recall that b_k satisfies

b_k = (1/2)( ζ_{k−1}b_{k−1} + 1/(ζ_{k−1}b_{k−1}) ) = (ζ_{k−1}/2) b_{k−1}( 1 + 1/(ζ_{k−1}^2b_{k−1}^2) ).

Because σ_i^{(k−1)}σ_j^{(k−1)} ≤ b_{k−1}^2,

b_k / (σ_i^{(k)} + σ_j^{(k)}) = [ b_{k−1}( 1 + 1/(ζ_{k−1}^2b_{k−1}^2) ) ] / [ ( σ_i^{(k−1)} + σ_j^{(k−1)} )( 1 + 1/(ζ_{k−1}^2σ_i^{(k−1)}σ_j^{(k−1)}) ) ] ≤ b_{k−1} / (σ_i^{(k−1)} + σ_j^{(k−1)}).

An easy induction on k now implies

b_k / (σ_i^{(k)} + σ_j^{(k)}) ≤ b_0 / (σ_i^{(0)} + σ_j^{(0)}) = b / (σ_i + σ_j),    k ≥ 0.

Remark 2.
1. The condition a ≤ ||A^{-1}||_2^{-1} is essential for proving Lemma 4.4. Without it one may not have the inequality σ_j^{(k)} ≤ b_k, and the result cannot be proved.
2. In the case that the optimal scaling is employed, b_k = σ_max^{(k)} and one has the same result.
A.5. Relation (4.23) for the scalings (1.3) and (1.4).
Lemma A.3. Suppose that the matrix sequence Q_k is generated by Newton's iteration with scaling ζ_k that is either ζ_k^{(1,∞)} in (1.3) or ζ_k^{(F)} in (1.4). The singular values of Q_k satisfy

(A.2)    σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}) ≤ ( Π_{ℓ=0}^{k−1} ψ_ℓ ) ||A||_2/(σ_i + σ_j) ≤ n^{k/2} ||A||_2/(σ_i + σ_j),    k ≥ 0,  1 ≤ i, j ≤ n,

where ψ_ℓ = max{ 1, 1/(ζ_ℓ^2σ_max^{(ℓ)}σ_min^{(ℓ)}) } ≤ √n.
Proof. Let

w_k = 1/(ζ_k^2σ_min^{(k)}σ_max^{(k)}),    k ≥ 0.

If

ζ_k = ζ_k^{(1,∞)} = ( (||Q_k^{-1}||_1||Q_k^{-1}||_∞) / (||Q_k||_1||Q_k||_∞) )^{1/4},

using the inequalities ||A||_2 ≤ √(||A||_1||A||_∞) ≤ √n||A||_2, we have ([11, Page 208])

(A.3)    ζ_k / n^{1/4} ≤ 1/√(σ_min^{(k)}σ_max^{(k)}) ≤ n^{1/4} ζ_k.

If

ζ_k = ζ_k^{(F)} = √( ||Q_k^{-1}||_F / ||Q_k||_F ),

using the inequalities ||A||_2 ≤ ||A||_F ≤ √n||A||_2, we also have (A.3). So in both cases we have

1/√n ≤ w_k ≤ √n.

Construct an interval [a_k, b_k] according to the rule

a_k = σ_min^{(k)}w_k,  b_k = σ_max^{(k)},    if w_k ≤ 1;
a_k = σ_min^{(k)},  b_k = σ_max^{(k)}w_k,    if w_k ≥ 1.

Because a_k ≤ σ_min^{(k)} and b_k ≥ σ_max^{(k)}, we have σ_1^{(k)}, . . . , σ_n^{(k)} ∈ [a_k, b_k]. Also, in both cases we have (ζ_ka_k)^{-1} = ζ_kb_k, so ρ(ζ_ka_k) = ρ(ζ_kb_k) and

σ_j^{(k+1)} = ρ(ζ_kσ_j^{(k)}) ∈ ρ(ζ_k[a_k, b_k]) = [1, ρ(ζ_kb_k)],    j = 1, 2, . . . , n.

Since

ρ(ζ_kb_k) = (1/2)( ζ_kb_k + 1/(ζ_kb_k) ) = (ζ_kb_k/2)( 1 + 1/(ζ_kb_k)^2 ),

we have

σ_max^{(k+1)} / (σ_i^{(k+1)} + σ_j^{(k+1)}) ≤ ρ(ζ_kb_k) / (σ_i^{(k+1)} + σ_j^{(k+1)})
  = [ b_k( 1 + 1/(ζ_kb_k)^2 ) ] / [ ( σ_i^{(k)} + σ_j^{(k)} )( 1 + 1/(ζ_k^2σ_i^{(k)}σ_j^{(k)}) ) ]
  ≤ b_k / (σ_i^{(k)} + σ_j^{(k)}).

If w_k ≥ 1, then b_k = σ_max^{(k)}w_k. Because σ_i^{(k)}σ_j^{(k)} ≤ (σ_max^{(k)})^2 ≤ b_k^2, we have

σ_max^{(k+1)} / (σ_i^{(k+1)} + σ_j^{(k+1)}) ≤ b_k / (σ_i^{(k)} + σ_j^{(k)}) = w_k σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}).

If w_k ≤ 1, then b_k = σ_max^{(k)}. Because σ_i^{(k)}σ_j^{(k)} ≤ (σ_max^{(k)})^2 = b_k^2, we have

σ_max^{(k+1)} / (σ_i^{(k+1)} + σ_j^{(k+1)}) ≤ σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}).

Hence

σ_max^{(k+1)} / (σ_i^{(k+1)} + σ_j^{(k+1)}) ≤ ψ_k σ_max^{(k)} / (σ_i^{(k)} + σ_j^{(k)}),    ψ_k = max{1, w_k} ≤ √n.

Then the inequalities in (A.2) can be easily derived.
Remark 3. Suppose that Newton's method with scaling ζ_k^{(1,∞)} or ζ_k^{(F)} terminates after p iterations. We have the same error bounds as in Theorem 4.5, but with a factor n^{p/2} in the first term of the bounds for ||Q̂Ĥ − A||_2, ||Ĥ − H||_2, and ||Q̂ − Q||_2. When n and p are both large, this factor is notably large and one may not be able to use the bounds to claim backward stability. Unfortunately, we are unable to provide an upper bound for p, although it is observed that p is usually moderate in practice (p ≤ 9 for the (1, ∞)-scaling for all examples in Section 5).
However, we argue that the factor n^{p/2} is an overestimate. The point is that when Q_k gets close to a unitary matrix, ζ_k^{(F)} and ζ_k^{(1,∞)} get close to 1. Then w_k, and with it ψ_k, will be close to 1. So in practice w_k will be around 1 after a couple of iterations.