Electronic Transactions on Numerical Analysis.
Volume 43, pp. 45-59, 2014.
Copyright 2014, Kent State University.
ISSN 1068-9613.
http://etna.math.kent.edu
A MINIMAL RESIDUAL NORM METHOD FOR LARGE-SCALE
SYLVESTER MATRIX EQUATIONS∗
SAID AGOUJIL†, ABDESLEM H. BENTBIB‡, KHALIDE JBILOU§, AND EL MOSTAFA SADEK‡§
Abstract. In this paper, we present a new method for solving large-scale Sylvester matrix equations with a
low-rank right-hand side. The proposed method is an iterative method based on a projection onto an extended
block Krylov subspace by minimization of the norm of the residual. The obtained reduced-order problem is solved
via different direct or iterative solvers that exploit the structure of the linear operator associated with the obtained
matrix equation. In particular, we use the global LSQR algorithm as an iterative method for the derived low-order
problem. When convergence is achieved, a low-rank approximate solution, given as a product of
two low-rank matrices, is computed, and a stopping procedure based on an economical computation of the norm of the residual
is proposed. Different numerical examples are presented, and the proposed minimal residual approach is compared
with the corresponding Galerkin-type approach.
Key words. extended block Krylov subspaces, low-rank approximation, Sylvester equation, minimal residual
methods
AMS subject classifications. 65F10, 65F30
1. Introduction. In this paper, we consider the large-scale Sylvester matrix equation
(1.1)    A X + X B + E F^T = 0,
where A and B are square matrices of size n×n and s×s, respectively, and E, F are matrices
of size n × r and s × r, respectively.
Sylvester equations play a fundamental role in many problems in control and communication
theory, image processing, signal processing, filtering, model reduction, decoupling
techniques for ordinary and partial differential equations, and the implementation of implicit numerical methods for ordinary differential equations; see, e.g., [3, 6, 10, 13] and the references
therein. The Sylvester matrix equation (1.1) has a unique solution if and only if α + β ≠ 0 for
all α ∈ σ(A) and β ∈ σ(B), where σ(Z) denotes the spectrum of the matrix Z. For small- to
medium-sized problems, direct methods such as the Bartels-Stewart [2] and the Hessenberg-Schur [8] algorithms can be used. These algorithms are based on reducing A and B to
triangular or Hessenberg form.
In recent years, several projection methods based on Krylov subspaces have been
proposed; see, e.g., [5, 6, 10, 12, 13, 14, 18, 23] and the references therein. The main idea
of these methods is to use a block (global or extended) Krylov subspace and to apply the
block (global or extended block) Arnoldi process to construct orthonormal bases. The
original large Sylvester matrix equation is then projected onto these Krylov subspaces. The alternating direction implicit (ADI) iterations [3, 4, 16] can also be used if spectral
information about A and B is available. Note that ADI iterations converge quickly
if suitable shifts for A and B can be computed effectively and if linear systems with the shifted
coefficient matrices can be solved at low cost.
∗ Received October 30, 2013. Accepted May 6, 2014. Published online on September 5, 2014. Recommended by
L. Reichel.
† Laboratory LAMAI, Department of Computer Science, Faculty of Sciences and Technology, B.P. 509, Errachidia 52000, Morocco (agoujil@gmail.com).
‡ Faculté des Sciences et Techniques-Gueliz, Laboratoire de Mathématiques Appliquées et Informatique, Morocco (a.bentbib@uca.ma, sadek@lmpa.univ-littoral.fr).
§ Laboratoire LMPA, 50 rue F. Buisson, ULCO, 62228, Calais, France
(jbilou@lmpa.univ-littoral.fr).
The approximate solutions produced by Krylov subspace methods are given as
X_m = V_m Y_m W_m^T, where V_m and W_m are orthonormal matrices whose columns form bases
of the Krylov subspaces K_m(A, E) and K_m(B^T, F), respectively. Then, the Sylvester matrix
equation is projected via the following Galerkin condition

V_m^T (A V_m Y_m W_m^T + V_m Y_m W_m^T B + E F^T) W_m = 0.

The obtained low-order Sylvester matrix equation is solved by direct methods such as the
Hessenberg-Schur method. Instead of using a Galerkin-type projection, in this paper we
consider the minimization of the Frobenius norm of the residual associated with the Sylvester
matrix equation (1.1), which leads to the following minimization problem

Y_m^{MR} = argmin_{X_m = V_m Y_m W_m^T} ‖A X_m + X_m B + E F^T‖_F.
The rest of the paper is organized as follows. In the next section, we recall the extended block Arnoldi algorithm with some of its properties. In Section 3, we define the minimal residual method for Sylvester matrix equations by using the extended Krylov subspaces
K_m(A, E) and K_m(B^T, F). Then, in Section 4, we give some direct and iterative methods
for solving the obtained low-order minimization problem. Finally, Section 5 is devoted to
numerical experiments.
Throughout the paper, we use the following notation. The Frobenius inner product of
two matrices X and Y is defined by ⟨X, Y⟩_F = tr(X^T Y), where tr(Z) denotes the trace of a
square matrix Z. The associated norm is the Frobenius norm, denoted by ‖·‖_F. The Kronecker
product of two matrices A and B is defined by A ⊗ B = [a_{i,j} B], where A = [a_{i,j}]. This
product satisfies the properties (A ⊗ B)(C ⊗ D) = (AC ⊗ BD) and (A ⊗ B)^T = A^T ⊗ B^T.
Finally, if X is a matrix, then vec(X) is the vector obtained by stacking the columns of X.
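For readers who want to check these Kronecker-product rules numerically, a small NumPy verification (ours, not part of the original text) is:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 2)), rng.standard_normal((4, 5))
C, D = rng.standard_normal((2, 3)), rng.standard_normal((5, 2))

# (A x B)(C x D) = (AC) x (BD)  and  (A x B)^T = A^T x B^T
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
```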
2. The extended block Arnoldi process. In this section, we recall the extended Krylov
subspace and the extended block Arnoldi process. Let V be a matrix of dimension n × r. Then
the block Krylov subspace associated with (A, V) is defined as

K_m(A, V) = Range{V, AV, A^2 V, . . . , A^{m−1} V}.

Assuming that the matrix A is nonsingular, the extended block Krylov subspace associated
with (A, A^{−1}, V) is given by (see [5, 23])

K^e_m(A, V) = Range{V, A^{−1}V, AV, A^{−2}V, A^2 V, . . . , A^{m−1}V, A^{−m}V}
            = K_m(A, V) + K_m(A^{−1}, A^{−1}V).

The extended block Arnoldi process allows us to construct an orthonormal basis of the extended block Krylov subspace K^e_m(A, V); see [9, 23].
ALGORITHM 1. The extended block Arnoldi algorithm (EBA).
Input: A an n × n matrix, V an n × r matrix, and m an integer.
1. Compute the QR decomposition of [V, A^{−1}V], i.e., [V, A^{−1}V] = V_1 Λ.
2. Set V_0 = [ ].
3. For j = 1, 2, . . . , m
   (a) Set V_j^{(1)}: first r columns of V_j; V_j^{(2)}: last r columns of V_j.
   (b) Set V_j = [V_{j−1}, V_j] and V̂_{j+1} = [A V_j^{(1)}, A^{−1} V_j^{(2)}].
   (c) Orthogonalize V̂_{j+1} with respect to V_j to get V_{j+1}, i.e.,
        i. For i = 1, 2, . . . , j
       ii.   H_{i,j} = V_i^T V̂_{j+1}
      iii.   V̂_{j+1} = V̂_{j+1} − V_i H_{i,j}
       iv. End For
   (d) Compute the QR decomposition of V̂_{j+1}, i.e., V̂_{j+1} = V_{j+1} H_{j+1,j}.
4. End For
Since the extended block Arnoldi algorithm involves the Gram-Schmidt process, the obtained block vectors V_m = [V_1, V_2, . . . , V_m] (V_i ∈ R^{n×2r}) have their columns mutually
orthogonal provided none of the upper triangular matrices H_{j+1,j} is rank deficient. After m steps, Algorithm 1 has built an orthonormal basis V_m of the extended block Krylov
subspace Range{V, AV, . . . , A^{m−1}V, A^{−1}V, . . . , A^{−m}V} and a block upper Hessenberg matrix H_m whose nonzero blocks are the H_{i,j}. Note that each submatrix H_{i,j}
(1 ≤ i ≤ j ≤ m) is of order 2r.
Let T_m ∈ R^{2mr×2mr} be the restriction of the matrix A to the extended Krylov subspace
K^e_m(A, V), i.e., T_m = V_m^T A V_m. It is shown in [23] that T_m is also block upper Hessenberg
with 2r × 2r blocks. Moreover, a recursion is derived to compute T_m from H_m without
requiring matrix-vector products with A. For more details on how to compute T_m from H_m,
we refer to [23]. We note that for large problems, the inverse of the matrix A is not computed
explicitly; instead, one can use iterative solvers with preconditioners for the linear
systems involving A. We also notice that Algorithm 1 could suffer from a breakdown
if, for some j, the upper triangular matrix H_{j+1,j} is rank deficient. To avoid this problem,
one may define a sequential extended block Arnoldi process with a deflation procedure, in the same
manner as has already been done for the classical block Arnoldi algorithm.
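For illustration, the following NumPy/SciPy sketch implements Algorithm 1 for dense matrices. It is a minimal version (no deflation or breakdown handling), and the function name and interface are ours, not taken from the paper.

```python
import numpy as np
from scipy.linalg import qr, lu_factor, lu_solve

def extended_block_arnoldi(A, V, m):
    """Sketch of Algorithm 1 (EBA): orthonormal basis of K^e_m(A, V).

    A is n x n (dense here for simplicity), V is n x r.  Returns Vm
    (n x 2mr, orthonormal columns) and the block upper Hessenberg
    matrix H of size 2(m+1)r x 2mr.
    """
    n, r = V.shape
    lu_piv = lu_factor(A)                      # one factorization for all A^{-1} products
    U, _ = qr(np.hstack([V, lu_solve(lu_piv, V)]), mode='economic')   # V_1
    blocks = [U]                               # list of n x 2r blocks V_1, V_2, ...
    H = np.zeros((2 * (m + 1) * r, 2 * m * r))
    for j in range(m):
        Vj = blocks[j]
        # V_hat = [A V_j^(1), A^{-1} V_j^(2)]
        W = np.hstack([A @ Vj[:, :r], lu_solve(lu_piv, Vj[:, r:])])
        for i in range(j + 1):                 # block Gram-Schmidt against V_1, ..., V_j
            Hij = blocks[i].T @ W
            H[2*r*i:2*r*(i+1), 2*r*j:2*r*(j+1)] = Hij
            W = W - blocks[i] @ Hij
        Q, Rjj = qr(W, mode='economic')        # V_{j+1} H_{j+1,j}
        H[2*r*(j+1):2*r*(j+2), 2*r*j:2*r*(j+1)] = Rjj
        blocks.append(Q)
    return np.hstack(blocks[:m]), H
```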
PROPOSITION 2.1 ([9]). Let T̄_m = V_{m+1}^T A V_m. Then we have the following relations:

A V_m = V_{m+1} T̄_m = V_m T_m + V_{m+1} T_{m+1,m} E_m^T,

where E_m = [0_{2(m−1)r×2r}, I_{2r}]^T is the matrix of the last 2r columns of the identity matrix I_{2mr}.
In the next section, we define the minimal residual approach for solving Sylvester matrix
equations. For this we can use block, global or extended Krylov subspaces. We will focus
only on extended Krylov subspace methods; the other approaches could be used in the same
manner.
3. The minimal residual method for large-scale Sylvester matrix equations. In recent years, several projection methods based on Krylov subspaces have been proposed to produce approximations to the exact solutions of Lyapunov or Sylvester equations;
see, e.g., [6, 12, 13, 14, 22, 23]. Most of these projection methods use a Galerkin condition to extract the approximate solutions from the projection space.
In this section, we consider approximate solutions of the form

X_m^{MR} = V_m Y_m^{MR} W_m^T,

where Y_m^{MR} solves the following low-order minimization problem

Y_m^{MR} = argmin_{X_m = V_m Y_m W_m^T} ‖A X_m + X_m B + E F^T‖_F,

where V_m = [V_1, V_2, . . . , V_m] ∈ R^{n×2mr} and W_m = [W_1, W_2, . . . , W_m] ∈ R^{s×2mr} are the orthonormal matrices constructed by applying simultaneously m steps of the extended block Arnoldi
algorithm to the pairs (A, E) and (B^T, F), respectively. In this case, we have

A V_m = V_{m+1} T̄_m^A,    B^T W_m = W_{m+1} T̄_m^B.
We have the following result.
THEOREM 3.1. Let V_m and W_m be the orthonormal matrices constructed by applying
simultaneously m steps of the extended block Arnoldi algorithm to the pairs (A, E) and
(B^T, F), respectively. Then the minimization problem

Y_m^{MR} = argmin_{X_m = V_m Y_m W_m^T} ‖A X_m + X_m B + E F^T‖_F

can be written as

(3.1)    Y_m^{MR} = argmin_{Y_m} ‖ T̄_m^A Y_m [I, 0] + [I; 0] Y_m (T̄_m^B)^T + [R_E R_F^T, 0; 0, 0] ‖_F,

where E = V_1 R_E and F = W_1 R_F are the QR factorizations of E and F, respectively, and where
[·, ·] and [·; ·] denote horizontal and vertical block concatenation, so that [I, 0] ∈ R^{2mr×2(m+1)r}
and [R_E R_F^T, 0; 0, 0] ∈ R^{2(m+1)r×2(m+1)r}.
Proof. We have

min_{X = V_m Y_m W_m^T} ‖A X + X B + E F^T‖_F
   = min_{Y_m} ‖A V_m Y_m W_m^T + V_m Y_m W_m^T B + V_1 R_E R_F^T W_1^T‖_F
   = min_{Y_m} ‖ V_{m+1} ( T̄_m^A Y_m [I, 0] + [I; 0] Y_m (T̄_m^B)^T + [R_E R_F^T, 0; 0, 0] ) W_{m+1}^T ‖_F
   = min_{Y_m} ‖ T̄_m^A Y_m [I, 0] + [I; 0] Y_m (T̄_m^B)^T + [R_E R_F^T, 0; 0, 0] ‖_F,

where the last equality follows from the orthonormality of the columns of V_{m+1} and W_{m+1}.
Notice that the computation of the norm of the residual R(X_m) requires only the knowledge of Y_m^{MR} but does not require the approximate solution X_m, which is computed only
when convergence is achieved. This residual norm can be calculated by
(3.2)    r_m = ‖ T̄_m^A Y_m^{MR} [I, 0] + [I; 0] Y_m^{MR} (T̄_m^B)^T + [R_E R_F^T, 0; 0, 0] ‖_F.
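As an illustration, the residual norm (3.2) can be evaluated from the small projected matrices only; a NumPy sketch (with illustrative function and variable names, and shapes as in Section 3) could look as follows.

```python
import numpy as np

def reduced_residual_norm(TA_bar, TB_bar, RE, RF, Y):
    """Residual norm (3.2) of the projected problem, assuming
    TA_bar, TB_bar have shape (p + q, p) with p = 2mr and q = 2r,
    RE, RF have 2r rows, and Y is p x p (names are illustrative)."""
    p = Y.shape[0]
    q = TA_bar.shape[0] - p
    R = np.zeros((p + q, p + q))
    R[:, :p] += TA_bar @ Y                     # TbarA_m * Y * [I, 0]
    R[:p, :] += Y @ TB_bar.T                   # [I; 0] * Y * (TbarB_m)^T
    R[:RE.shape[0], :RF.shape[0]] += RE @ RF.T # leading block R_E R_F^T
    return np.linalg.norm(R, 'fro')
```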
Recall that the Galerkin approach produces approximations X_m^{GA} = V_m Y_m^{GA} W_m^T, where
Y_m^{GA} is a solution of the low-order Sylvester matrix equation

(3.3)    T_m^A Y_m^{GA} + Y_m^{GA} (T_m^B)^T + Ẽ_1 F̃_1^T = 0,

where Ẽ_1 = V_m^T E and F̃_1 = W_m^T F. The projected problem (3.3) has a unique solution if
and only if the matrices T_m^A and −(T_m^B)^T have no eigenvalue in common. Notice that problem (3.1) does not have this restriction. Now, the main issue is how to solve the reduced-order
minimization problem (3.1). In the next section, we describe different direct and iterative
strategies for solving (3.1). These techniques were inspired by those used in [18] for Lyapunov
matrix equations.
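For completeness, a short SciPy sketch of the Galerkin step (3.3) is given below; it relies on scipy.linalg.solve_sylvester (a Bartels-Stewart-type solver) and uses illustrative names.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def galerkin_reduced_solution(TA, TB, E1, F1):
    """Solve the projected Sylvester equation (3.3):
    TA Y + Y TB^T = -E1 F1^T, with TA = T_m^A and TB = T_m^B."""
    return solve_sylvester(TA, TB.T, -E1 @ F1.T)
```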
4. Methods for solving the reduced minimization problem.
4.1. The least squares approach associated with the Kronecker form. Using the
Kronecker product properties, problem (3.1) can be written as

(4.1)    min_y ‖ ( [I; 0] ⊗ T̄_m^A + T̄_m^B ⊗ [I; 0] ) y + c ‖_2,

where c = vec([R_E R_F^T, 0; 0, 0]) and y = vec(Y). This least squares problem is then solved
by direct methods based on a QR factorization. For small values of the iteration number m, this gives good results. However, when m increases, the method becomes very slow
and cannot be used.
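A direct dense solution of (4.1) is easy to sketch in NumPy; the following (illustrative names, practical only for small m) forms the Kronecker matrix explicitly and calls a dense least squares solver.

```python
import numpy as np

def solve_reduced_kronecker(TA_bar, TB_bar, RE, RF):
    """Direct solution of (4.1): vectorize (3.1) and solve the dense
    least squares problem.  Shapes as in Section 3 (illustrative names)."""
    p = TA_bar.shape[1]                  # p = 2mr
    q = TA_bar.shape[0] - p              # q = 2r
    Ipq = np.eye(p + q, p)               # the block [I; 0]
    M = np.kron(Ipq, TA_bar) + np.kron(TB_bar, Ipq)
    C = np.zeros((p + q, p + q))
    C[:RE.shape[0], :RF.shape[0]] = RE @ RF.T
    c = C.flatten(order='F')             # c = vec([R_E R_F^T, 0; 0, 0])
    y, *_ = np.linalg.lstsq(M, -c, rcond=None)
    return y.reshape((p, p), order='F')  # Y such that vec(Y) = y
```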
4.2. The Hu-Reichel method for solving the reduced problem. Here we discuss the
Hu-Reichel method (see [11]) for solving the reduced-order matrix least squares problem (3.1).
More precisely, we apply the approach given in [15] and the strategy used in [18].
Let T_m^A = U T_A U^T and T_m^B = V T_B V^T be the real Schur decompositions of the matrices
T_m^A and T_m^B, respectively, and let h_A and h_B denote the last 2r rows of T̄_m^A and T̄_m^B,
respectively, so that T̄_m^A = [T_m^A; h_A] and T̄_m^B = [T_m^B; h_B]. The minimization problem (3.1)
can then be written as

min_Y ‖ [T_m^A; h_A] Y [I, 0] + [I; 0] Y [T_m^B; h_B]^T + [R_E R_F^T, 0; 0, 0] ‖_F,

from which we obtain the new minimization problem

min_Y ‖ [ T_m^A Y + Y (T_m^B)^T + R_E R_F^T,  Y h_B^T ;  h_A Y,  0 ] ‖_F,
which is equivalent to

min_Y ( ‖T_m^A Y + Y (T_m^B)^T + R_E R_F^T‖_F^2 + ‖h_A Y‖_F^2 + ‖Y h_B^T‖_F^2 )
  = min_Y ( ‖U T_A U^T Y + Y (V T_B V^T)^T + R_E R_F^T‖_F^2 + ‖h_A Y‖_F^2 + ‖Y h_B^T‖_F^2 )
  = min_{Ỹ} ( ‖T_A Ỹ + Ỹ T_B^T + U^T R_E R_F^T V‖_F^2 + ‖h_A U Ỹ‖_F^2 + ‖Ỹ V^T h_B^T‖_F^2 ),

where Ỹ = U^T Y V. Using the Kronecker product, the preceding problem is transformed into

(4.2)    min_{ỹ} ‖ [ I ⊗ T_A + T_B ⊗ I ; I ⊗ (h_A U) ; (h_B V) ⊗ I ] ỹ + [ vec(U^T R_E R_F^T V) ; 0 ; 0 ] ‖_2^2,

where ỹ = vec(Ỹ). Problem (4.2) can also be expressed as

min_{ỹ} ‖ [R; S] ỹ + [d; 0] ‖_2^2,

where R = I ⊗ T_A + T_B ⊗ I, S = [ I ⊗ (h_A U) ; (h_B V) ⊗ I ], and d = vec(U^T R_E R_F^T V).
The associated normal equation is (R^T R + S^T S) ỹ + R^T d = 0. If the matrix R is
nonsingular, then we have

(I + (S R^{−1})^T S R^{−1}) z + d = 0,    where z = R ỹ.

Now, using the Sherman-Morrison-Woodbury formula, we get

z = −d + (S R^{−1})^T ( I + S R^{−1} (S R^{−1})^T )^{−1} S R^{−1} d.

Therefore, from z we obtain Ỹ and then Y, the solution of problem (3.1). We notice that an
economical strategy for computing S R^{−1} is also possible and can be obtained in the same
way as in [15, 18]. Furthermore, in its matrix form, problem (4.2) could also be solved by the
global LSQR or by the global CG method with an appropriate preconditioner.
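A compact sketch of this Sherman-Morrison-Woodbury step is given below (illustrative names; it assumes that R is nonsingular and that dense solves with R are affordable).

```python
import numpy as np

def smw_solve(R, S, d):
    """Sherman-Morrison-Woodbury step of Section 4.2 (illustrative):
    solve (I + (S R^{-1})^T S R^{-1}) z = -d via a small system of the
    size of the number of rows of S, then recover y_tilde = R^{-1} z."""
    P = np.linalg.solve(R.T, S.T).T            # P = S R^{-1}
    small = np.eye(P.shape[0]) + P @ P.T       # I + (S R^{-1})(S R^{-1})^T
    z = -d + P.T @ np.linalg.solve(small, P @ d)
    return np.linalg.solve(R, z)               # y_tilde such that z = R y_tilde
```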
REMARK 4.1. Both the Kronecker least squares approach and the Hu-Reichel method
require O(m^3 r^3) multiplications, which is similar to the amount of computation required at
each step of the Galerkin method when the Bartels-Stewart algorithm is used for solving the
low-order Sylvester equation (3.3). When m increases, the least squares approach and the
Hu-Reichel method become expensive. In these cases, one should use the iterative methods
described in the next two subsections.
4.3. The global LSQR algorithm. In this section, we show how to adapt the
LSQR algorithm of Paige and Saunders [19] to the solution of the low-order least squares problem (3.1). The classical LSQR algorithm is analytically equivalent to the conjugate gradient
method applied to the associated normal equation. The LSQR method makes use of the
Golub-Kahan bidiagonalization process [7].
At each step of the process, we determine an approximation Y^k to the solution Y_m^{MR}
of problem (3.1). For simplicity, problem (3.1) is now written in the following way:

(4.3)    min_Y ‖L_m(Y) − C‖_F,
where

(4.4)    L_m(Y) = T̄_m^A Y [I, 0] + [I; 0] Y (T̄_m^B)^T    and    C = −[R_E R_F^T, 0; 0, 0].

Notice that the adjoint of the linear operator L_m with respect to the Frobenius inner product
is given by

(4.5)    L_m^*(Z) = (T̄_m^A)^T Z [I; 0] + [I, 0] Z T̄_m^B.
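The operators (4.4) and (4.5) and the adjoint relation ⟨L_m(Y), Z⟩_F = ⟨Y, L_m^*(Z)⟩_F are easy to check numerically; the following NumPy sketch (ours, with illustrative names) does so on random data.

```python
import numpy as np

def L_op(TA_bar, TB_bar, Y):
    """The operator L_m of (4.4) (dense sketch, illustrative names)."""
    p = TA_bar.shape[1]
    J = np.eye(TA_bar.shape[0], p)             # the block [I; 0]
    return TA_bar @ Y @ J.T + J @ Y @ TB_bar.T

def L_adj(TA_bar, TB_bar, Z):
    """Its adjoint L_m^* of (4.5) w.r.t. the Frobenius inner product."""
    p = TA_bar.shape[1]
    J = np.eye(TA_bar.shape[0], p)
    return TA_bar.T @ Z @ J + J.T @ Z @ TB_bar

# quick check of <L(Y), Z>_F = <Y, L*(Z)>_F on random data
rng = np.random.default_rng(0)
p, q = 8, 2
TA, TB = rng.standard_normal((p + q, p)), rng.standard_normal((p + q, p))
Y, Z = rng.standard_normal((p, p)), rng.standard_normal((p + q, p + q))
assert np.isclose(np.trace(L_op(TA, TB, Y).T @ Z), np.trace(Y.T @ L_adj(TA, TB, Z)))
```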
The global Lanczos bidiagonalization process. This iterative procedure is initialized
by

β_1 Ũ_1 = C,                β_1 = ‖C‖_F,
α_1 Ṽ_1 = L_m^*(Ũ_1),       α_1 = ‖L_m^*(Ũ_1)‖_F,
and continued using the recurrence relations

β_{i+1} Ũ_{i+1} = L_m(Ṽ_i) − α_i Ũ_i,
α_{i+1} Ṽ_{i+1} = L_m^*(Ũ_{i+1}) − β_{i+1} Ṽ_i.

The scalars α_i > 0 and β_i > 0 are chosen such that ‖Ũ_i‖_F = ‖Ṽ_i‖_F = 1, i = 1, 2, . . . We
set Ũ_k := [Ũ_1, Ũ_2, . . . , Ũ_k], Ṽ_k := [Ṽ_1, Ṽ_2, . . . , Ṽ_k], and let T̄_k ∈ R^{(k+1)×k} be the
lower bidiagonal matrix with diagonal entries α_1, . . . , α_k and subdiagonal entries β_2, . . . , β_{k+1}.
It is not difficult to show that the constructed blocks Ũ_i and Ṽ_i are F-orthonormal (which
means that they are orthonormal with respect to the Frobenius inner product). Using the
global Lanczos process, we compute approximations Y^k to the exact solution Y_m^{MR} of
problem (3.1). We can write the previous recurrence relations in matrix form as

(4.6)    [L_m(Ṽ_1), L_m(Ṽ_2), . . . , L_m(Ṽ_k)] = Ũ_{k+1} (T̄_k ⊗ I).

The method consists in searching for an approximate solution of the form

Y^k = Σ_{i=1}^{k} z^{(i)} Ṽ_i.
Applying the linear operator L_m to Y^k, we get

L_m(Y^k) = L_m( Σ_{i=1}^{k} z^{(i)} Ṽ_i ) = Σ_{i=1}^{k} z^{(i)} L_m(Ṽ_i)
         = [L_m(Ṽ_1), . . . , L_m(Ṽ_k)] (z_k ⊗ I) = Ũ_{k+1} (T̄_k ⊗ I)(z_k ⊗ I)
         = Ũ_{k+1} (T̄_k z_k ⊗ I),

where z_k = (z^{(1)}, z^{(2)}, . . . , z^{(k)})^T. Then,

‖C − L_m(Y^k)‖_F = ‖β_1 Ũ_1 − Ũ_{k+1} (T̄_k z_k ⊗ I)‖_F
               = ‖Ũ_{k+1} (β_1 e_1 ⊗ I) − Ũ_{k+1} (T̄_k z_k ⊗ I)‖_F
               = ‖Ũ_{k+1} ((β_1 e_1 − T̄_k z_k) ⊗ I)‖_F
               = ‖β_1 e_1 − T̄_k z_k‖_2,

since the blocks of Ũ_{k+1} are F-orthonormal. Therefore, problem (4.3) is equivalent to

min_{z_k} ‖β_1 e_1 − T̄_k z_k‖_2.
The global LSQR algorithm for solving problem (3.1) is summarized in Algorithm 2.
ALGORITHM 2. The global LSQR algorithm (Gl-LSQR).
1. Set Y^0 = 0. Compute β_1 = ‖C‖_F, Ũ_1 = C/β_1, α_1 = ‖L_m^*(Ũ_1)‖_F, Ṽ_1 = L_m^*(Ũ_1)/α_1,
   and set W̃_1 = Ṽ_1, Φ̄_1 = β_1, ρ̄_1 = α_1.
2. For i = 1, 2, . . . , k_max
   (a) β_{i+1} = ‖L_m(Ṽ_i) − α_i Ũ_i‖_F
   (b) Ũ_{i+1} = (L_m(Ṽ_i) − α_i Ũ_i)/β_{i+1}
   (c) L_i = L_m^*(Ũ_{i+1}) − β_{i+1} Ṽ_i
   (d) α_{i+1} = ‖L_i‖_F
   (e) Ṽ_{i+1} = L_i/α_{i+1}, ρ_i = (ρ̄_i^2 + β_{i+1}^2)^{1/2}
   (f) c_i = ρ̄_i/ρ_i, s_i = β_{i+1}/ρ_i
   (g) θ_{i+1} = s_i α_{i+1}, ρ̄_{i+1} = c_i α_{i+1}
   (h) Φ_i = c_i Φ̄_i, Φ̄_{i+1} = s_i Φ̄_i
   (i) Y^i = Y^{i−1} + (Φ_i/ρ_i) W̃_i
   (j) W̃_{i+1} = Ṽ_{i+1} − (θ_{i+1}/ρ_i) W̃_i. If |Φ̄_{i+1}| is small enough, then stop.
3. End For
The preceding algorithm computes an approximate solution Y^k, with
1 ≤ k ≤ k_max, of the minimization problem (4.3). Next, we give an expression of the associated residual norm.
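As a cross-check, and not as the paper's implementation, the reduced problem (4.3) can also be vectorized and handed to SciPy's LSQR through a LinearOperator built from (4.4) and (4.5); a sketch with illustrative names follows.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def solve_reduced_lsqr(TA_bar, TB_bar, RE, RF, tol=1e-12, itmax=1000):
    """Matrix-free alternative to Algorithm 2 (illustrative sketch)."""
    p = TA_bar.shape[1]
    q = TA_bar.shape[0] - p
    J = np.eye(p + q, p)                              # the block [I; 0]
    C = np.zeros((p + q, p + q))
    C[:RE.shape[0], :RF.shape[0]] = -RE @ RF.T        # right-hand side C of (4.4)

    def matvec(y):                                    # vec(L_m(Y))
        Y = y.reshape((p, p), order='F')
        return (TA_bar @ Y @ J.T + J @ Y @ TB_bar.T).flatten(order='F')

    def rmatvec(z):                                   # vec(L_m^*(Z))
        Z = z.reshape((p + q, p + q), order='F')
        return (TA_bar.T @ Z @ J + J.T @ Z @ TB_bar).flatten(order='F')

    Lop = LinearOperator(((p + q) ** 2, p ** 2), matvec=matvec, rmatvec=rmatvec)
    y = lsqr(Lop, C.flatten(order='F'), atol=tol, btol=tol, iter_lim=itmax)[0]
    return y.reshape((p, p), order='F')
```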
THEOREM 4.2. Let X_m = V_m Y_m W_m^T be the approximate solution to the Sylvester
equation obtained after m iterations of the EBA algorithm, where

Y_m = argmin_Y ‖ T̄_m^A Y [I, 0] + [I; 0] Y (T̄_m^B)^T + [R_E R_F^T, 0; 0, 0] ‖_F

is obtained by the global LSQR algorithm. Then

‖R(X_m)‖_F = Φ̄,

where Φ̄ = |Φ̄_{k_max+1}| is given in Algorithm 2.
Proof. We have

‖R(X_m)‖_F = ‖A X_m + X_m B + E F^T‖_F
          = ‖ T̄_m^A Y_m [I, 0] + [I; 0] Y_m (T̄_m^B)^T + [R_E R_F^T, 0; 0, 0] ‖_F
          = min_Y ‖L_m(Y) − C‖_F = Φ̄.
Theorem 4.2 plays a very important role in practice. It allows us to compute the norm
of the residual without computing the approximate solution, which is available only when
convergence is achieved.
If the matrices A and B are stable (Re(λ_i(A)) < 0 and Re(λ_i(B)) < 0), then the
Sylvester matrix equation (1.1) has a unique solution given by the integral representation
(see [17])

X = ∫_0^∞ e^{tA} E F^T e^{tB} dt.
The logarithmic "2-norm" of the stable matrix A is defined by

μ_2(A) = (1/2) λ_max(A + A^T) < 0.

The logarithmic norm provides a useful bound for the matrix exponential. It is known
(see [21]) that

‖e^{tA}‖_2 ≤ e^{μ_2(A) t},    t ≥ 0.
In the following theorem, we give an upper bound for the norm of the error X − X_m.
THEOREM 4.3. Assume that A and B are stable matrices, and let X_m = V_m Y_m W_m^T be
the approximate solution to the Sylvester equation obtained after m iterations of the extended
block Arnoldi algorithm. Then we have the following upper bound for the error X − X_m:

‖X − X_m‖_2 ≤ −r_m / (μ_2(A) + μ_2(B)),

where r_m is the residual norm given by (3.2).
Proof. Subtracting R(X_m) = A X_m + X_m B + E F^T from the initial Sylvester matrix
equation (1.1), we get

A(X_m − X) + (X_m − X) B = R(X_m).

As A and B are assumed to be stable, we find

X − X_m = ∫_0^∞ e^{tA} R(X_m) e^{tB} dt.

Hence, by using the logarithmic norm, we obtain

‖X − X_m‖_2 ≤ ‖R(X_m)‖_2 ∫_0^∞ e^{t(μ_2(A)+μ_2(B))} dt = −‖R(X_m)‖_2 / (μ_2(A) + μ_2(B)).

Therefore, as ‖R(X_m)‖_2 ≤ ‖R(X_m)‖_F = r_m and μ_2(A) + μ_2(B) < 0, the result follows.
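The bound of Theorem 4.3 is straightforward to evaluate numerically; a small NumPy sketch for dense A and B (illustrative names) is given below.

```python
import numpy as np

def log_norm_2(A):
    """Logarithmic 2-norm mu_2(A) = (1/2) lambda_max(A + A^T)."""
    return 0.5 * np.linalg.eigvalsh(A + A.T).max()

def error_bound(A, B, res_norm):
    """Upper bound of Theorem 4.3 (assumes mu_2(A) + mu_2(B) < 0)."""
    return -res_norm / (log_norm_2(A) + log_norm_2(B))
```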
The convergence of the global LSQR algorithm may be slow, in which case the method
should be used with a preconditioner. Different strategies are possible, and one can adapt the
techniques used in [1].
4.4. The preconditioned global conjugate gradient method. In this section, we consider the preconditioned global conjugate gradient method (PGCG) for solving the reduced
least squares problem (4.3). This is a matrix version of the well-known PCGLS algorithm, the
classical preconditioned CG method applied to the normal equation. The normal equation
associated with (4.3) is given by

(4.7)    L_m^*(L_m(Y)) = L_m^*(C),

where the operators L_m and L_m^* are defined in (4.4) and (4.5), respectively.
Let the matrices T̄_m^A and T̄_m^B be partitioned as

T̄_m^A = [T_m^A; h_A]    and    T̄_m^B = [T_m^B; h_B],
where h_A and h_B represent the last 2r rows of the matrices T̄_m^A and T̄_m^B, respectively. Then
the normal equation (4.7) can be written as

(4.8)    (T̄_m^A)^T T̄_m^A Y + Y (T̄_m^B)^T T̄_m^B + (T_m^A)^T Y (T_m^B)^T + T_m^A Y T_m^B − C_1 = 0,

where C_1 = L_m^*(C). Considering the singular value decompositions (SVD) of the matrices
T̄_m^A and T̄_m^B,

T̄_m^A = U^A Σ^A (V^A)^T,    T̄_m^B = U^B Σ^B (V^B)^T,

we get the following eigendecompositions:

(T̄_m^A)^T T̄_m^A = Q_A D_A Q_A^T,    (T̄_m^B)^T T̄_m^B = Q_B D_B Q_B^T,

where Q_A = V^A, Q_B = V^B, D_A = (Σ^A)^T Σ^A, and D_B = (Σ^B)^T Σ^B. Setting Ỹ = Q_A^T Y Q_B
and C̃ = Q_A^T C_1 Q_B, the normal equation (4.8) is now expressed as

(4.9)    D_A Ỹ + Ỹ D_B + T̃_m^A Ỹ T̃_m^B + (T̃_m^A)^T Ỹ (T̃_m^B)^T − C̃ = 0,

where T̃_m^A = Q_A^T T_m^A Q_A and T̃_m^B = Q_B^T T_m^B Q_B. This expression suggests
that one can use the first part as a preconditioner, that is, the matrix operator

(4.10)    P(Ỹ) = D_A Ỹ + Ỹ D_B.

It can be seen that expression (4.9) corresponds to the normal equations of the matrix operator

(4.11)    L̃_m(Ỹ) = T̂_m^A Ỹ [Q_B^T, 0] + [Q_A; 0] Ỹ (T̂_m^B)^T,

where T̂_m^A = T̄_m^A Q_A and T̂_m^B = T̄_m^B Q_B. Therefore, the preconditioned global conjugate
gradient algorithm is obtained by applying the preconditioner (4.10) to the normal equations
associated with the matrix linear operator defined by (4.11). This is summarized in Algorithm 3.
ALGORITHM 3. The preconditioned global CG algorithm (PGCG).
1. Choose a tolerance tol > 0 and a maximum number of iterations j_max.
   Choose Ỹ_0, and set R̃_0 = C − L̃_m(Ỹ_0), S_0 = L̃_m^*(R̃_0), Z_0 = P^{−1}(S_0), P_0 = Z_0.
2. For j = 0, 1, 2, . . . , j_max
   (a) W_j = L̃_m(P_j)
   (b) α_j = ⟨S_j, Z_j⟩_F / ‖W_j‖_F^2
   (c) Ỹ_{j+1} = Ỹ_j + α_j P_j
   (d) R̃_{j+1} = R̃_j − α_j W_j
   (e) If ‖R̃_{j+1}‖_F < tol, stop;
       else
   (f) S_{j+1} = L̃_m^*(R̃_{j+1})
   (g) Z_{j+1} = P^{−1}(S_{j+1})
   (h) β_j = ⟨S_{j+1}, Z_{j+1}⟩_F / ⟨S_j, Z_j⟩_F
   (i) P_{j+1} = Z_{j+1} + β_j P_j.
3. End For
Notice that the application of the preconditioner P requires solving a Sylvester equation at each
iteration. However, since the matrices D_A and D_B of these Sylvester matrix equations are diagonal,
the cost is small.
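Concretely, applying P^{−1} amounts to an entrywise division by the sums of the diagonal entries of D_A and D_B; a short NumPy sketch (illustrative names) is:

```python
import numpy as np

def precond_solve(DA, DB, S):
    """Apply the preconditioner (4.10): solve D_A Y + Y D_B = S.
    With D_A = diag(dA) and D_B = diag(dB), the solution is entrywise
    Y_ij = S_ij / (dA_i + dB_j), assuming all denominators are nonzero."""
    dA = np.diag(DA)
    dB = np.diag(DB)
    return S / (dA[:, None] + dB[None, :])
```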
4.5. Low-rank form of the approximate solutions. The solution X_m can be given as
a product of two matrices of low rank. It is possible to decompose it as X_m = Z_1 Z_2^T,
where the matrices Z_1 and Z_2 are of low rank (lower than 2mr). Consider the singular value
decomposition of the 2mr × 2mr matrix

Y_m^{MR} = Ỹ_1 Σ Ỹ_2^T,

where Σ is the diagonal matrix of the singular values of Y_m^{MR} sorted in decreasing order.
Let Y_{1,l} and Y_{2,l} be the 2mr × l matrices consisting of the first l columns of Ỹ_1 and Ỹ_2,
respectively, corresponding to the l singular values of magnitude larger than some tolerance.
We obtain the truncated singular value decomposition

Y_m^{MR} ≈ Y_{1,l} Σ_l Y_{2,l}^T,

where Σ_l = diag[σ_1, . . . , σ_l]. Setting Z_{1,m} = V_m Y_{1,l} Σ_l^{1/2} and Z_{2,m} = W_m Y_{2,l} Σ_l^{1/2}, it
follows that

(4.12)    X_m ≈ Z_{1,m} Z_{2,m}^T.

This is very important for large problems, because one does not have to compute and store
the approximation X_m at each iteration. We notice that there are cheaper ways to compute
low-rank representations of the approximate solution; see [3].
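A possible NumPy sketch of this truncation step is given below (illustrative names; Vm and Wm denote the EBA bases V_m and W_m).

```python
import numpy as np

def low_rank_factors(Vm, Wm, Y, tol=1e-12):
    """Low-rank factors of X_m ~ Z1 Z2^T as in (4.12): truncate the SVD
    of the small matrix Y = Y_m^MR at singular values larger than tol."""
    U1, s, U2t = np.linalg.svd(Y)
    l = max(1, int(np.sum(s > tol)))           # number of retained singular values
    sqrt_s = np.sqrt(s[:l])
    Z1 = Vm @ (U1[:, :l] * sqrt_s)             # V_m Y_{1,l} Sigma_l^{1/2}
    Z2 = Wm @ (U2t[:l, :].T * sqrt_s)          # W_m Y_{2,l} Sigma_l^{1/2}
    return Z1, Z2
```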
The minimal residual algorithm for solving Sylvester matrix equations (MR) is summarized in Algorithm 4.
ALGORITHM 4. The minimal residual method for large Sylvester matrix equations (MR).
1. Choose a tolerance tol > 0 and a maximum number of iterations iter_max.
2. For m = 1, 2, 3, . . . , iter_max
3.    Update V_m, T̄_m^A by EBA (Algorithm 1) applied to (A, E).
4.    Update W_m, T̄_m^B by EBA applied to (B^T, F).
5.    Solve the low-order problem (3.1).
6.    If ‖R(X_m)‖_F ≤ tol, stop.
7. End For
8. Using (4.12), the approximate solution X_m is given by X_m ≈ Z_{1,m} Z_{2,m}^T.
5. Numerical experiments. In this section, we present some numerical examples for
continuous-time Sylvester matrix equations. We give comparisons between the Galerkin projection approach (GA) and the minimal residual (MR) approach for large-scale problems,
where the reduced least squares minimization problem is solved by the global LSQR algorithm (Gl-LSQR), by the preconditioned global conjugate gradient method (PGCG), by the direct
method based on the Kronecker form, or by the Hu-Reichel method (HR). In the last experiment, we also give a comparison
with the low-rank factored ADI (Lr ADI) method described in [3, 4]. The algorithms are
coded in Matlab 8.0. For the Galerkin approach (GA), at each iteration m the projected
problem of size 2mr × 2mr is solved by the Bartels-Stewart algorithm [2]. When solving the reduced minimization problem, the global LSQR and the preconditioned global CG
(PGCG) are stopped when the relative norm of the residual is less than 10^{−12} or when
a maximum of k_max = 1000 iterations is reached. The minimal residual method (Algorithm 4) and
the Galerkin approach are stopped when the norm of the residual is less than tol = 10^{−7}, and
a maximum of iter_max = 50 outer iterations is allowed.
FIG. 5.1. Log10 of the residual norms versus the number of iterations. GA: solid line, MR: dashed line.
The first set of matrices A and B is obtained from the discretization of the operator

(5.1)    Lu = Δu − f_1(x, y) ∂u/∂x + f_2(x, y) ∂u/∂y + g(x, y)u

on the unit square [0, 1] × [0, 1] with homogeneous Dirichlet boundary conditions. The number of inner grid points in each direction is n_0 for the operator Lu. The dimensions of the
matrices A and B are n = n_0^2 and s = s_0^2, respectively. The discretization of the operator Lu yields matrices extracted from the Lyapack package [20] using the command
fdm_2d_matrix and denoted as A = fdm(f_1(x, y), f_2(x, y), g(x, y)). For the second set
of test matrices, we use the matrices add32, pde2961, and thermal from the Harwell-Boeing
Collection¹ and also the matrix flow_meter from the Oberwolfach Collection². The coefficients of the matrices E and F are random values uniformly distributed on [0, 1], and we set
r = 2.
EXAMPLE 5.1. In Figures 5.1 and 5.2, we plot the residual norms versus the number of iterations for the minimal residual and the Galerkin approaches. In Figure 5.1, the matrices A and B are obtained from the discretization of the operator Lu
with dimensions n = 4900 and s = 3600, respectively, and the reduced minimization problem (3.1) is solved with the global LSQR algorithm. In Figure 5.2, we use the matrices A = pde2961 and B = thermal from the Harwell-Boeing collection with dimensions
n = 2961 and s = 3456, respectively; for this experiment, the reduced-order problem is
solved by the direct least squares method associated with the Kronecker form (4.1). As can
be seen from these two figures, the MR algorithm converges successfully, while for the GA
approach, we obtain residual norms around 10^{−5}.
EXAMPLE 5.2. For the second set of experiments, we compare the performance of the
MR method, combined with the different techniques for solving the reduced-order minimization
problem, with that of the Galerkin approach (GA). The reduced minimization problem is solved by the
direct least squares method, by the global LSQR, by the preconditioned global conjugate gradient method applied to the normal equation (PGCG), and by the Hu-Reichel method (HR).
1 http://math.nist.gov/MatrixMarket
2 https://portal.uni-freiburg.de/imteksimulation/benchmark
FIG. 5.2. Log10 of the residual norms versus the number of iterations. GA: solid line, MR: dashed line.
In Table 5.1,
we list the CPU times (in seconds), the total number of outer iterations, and the norms of the
residuals obtained by these different approaches. The reported times include the time required
for LU computations of the matrices A and B used in the extended block Arnoldi process. As
can be seen from this table, the results given by the Galerkin approach are not good (except
for the first experiment). In some cases, the approximations produced by the GA method
deteriorate after a number of iterations. We also notice that when using the direct method
for solving the reduced minimization problem, the CPU time is higher when many outer
iterations are needed to achieve convergence; this is the case for the last two experiments.
EXAMPLE 5.3. For the last experiment, we compare the performance of the GA approach, the MR method with PGCG as the solver for the low-order minimization problem, and
the low-rank factored ADI (Lr ADI) method described in [3, 4]. For this experiment, the
matrices A and B are the same as those given in [4, Example 1]. They are obtained from
the 5-point discretization of the operator Lu in (5.1) with dimensions n = 6400 for A and
s = 3600 for B. For this experiment, E and F are random matrices with r = 4 columns.
In Table 5.2, we list the residual norms and the corresponding CPU time for each method.
For this experiment, the algorithms are stopped when the relative residual norms are smaller
than 10^{−11}.
6. Conclusion. In this paper, we presented an iterative method for solving large-scale
Sylvester matrix equations with low-rank right-hand sides. The proposed method is based on
a projection onto extended block Krylov subspaces combined with a minimization property. The obtained low-order minimization problem is solved via iterative or direct methods. To speed up
the convergence when solving the low-order minimization problem, we use a preconditioned
global conjugate gradient method applied to the normal matrix equation. The approximate
solutions are given as products of two low-rank matrices, which allows one to save memory for
large problems. The advantage of the minimal residual method compared with its Galerkin-type counterpart is its stability, as demonstrated by the numerical tests performed.
TABLE 5.1
Results for the experiments of Example 5.2.

Matrices                              Method         CPU time   Its.   Res. norm
A = thermal, B = add32                MR(PGCG)       26s        10     1.1 × 10^{-8}
n = 3456, s = 4960                    MR(direct)     26s         9     2.1 × 10^{-8}
                                      MR(Gl-LSQR)    28s        10     1.3 × 10^{-8}
                                      MR(HR)         27s         9     3.5 × 10^{-8}
                                      GA             25s         9     1.4 × 10^{-8}

A = fdm(cos(xy), e^{yx}, 100)         MR(PGCG)       45s        15     1.9 × 10^{-8}
B = add32                             MR(direct)     56s        15     2.3 × 10^{-8}
n = 90000, s = 4960                   MR(Gl-LSQR)    49s        17     1.1 × 10^{-8}
                                      MR(HR)         88s        15     3.1 × 10^{-8}
                                      GA             83s        21     8.6 × 10^{-5}

A = fdm(sin(xy), e^{xy}, 10)          MR(PGCG)       56s        18     2.5 × 10^{-8}
B = thermal                           MR(direct)     58s        16     3.1 × 10^{-8}
n = 122500, s = 3456                  MR(Gl-LSQR)    --         50     --
                                      MR(HR)         110s       16     2.0 × 10^{-8}
                                      GA             82s        25     8.4 × 10^{-5}

A = flow_meter                        MR(PGCG)       49s        16     2.0 × 10^{-8}
B = fdm(sin(xy), xy, 1000)            MR(direct)     700s       35     3.1 × 10^{-8}
n = 9669, s = 40000                   MR(Gl-LSQR)    --         50     --
                                      MR(HR)         150s       35     2.4 × 10^{-8}
                                      GA             95s        50     2.5 × 10^{-5}

A = fdm(xy, y^2, 1)                   MR(PGCG)       706s       18     2.1 × 10^{-8}
B = fdm(xy, cos(xy), 10)              MR(direct)     1500s      42     1.5 × 10^{-8}
n = 122500, s = 48400                 MR(Gl-LSQR)    --         --     --
                                      MR(HR)         --         --     --
                                      GA             805s       50     4.2 × 10^{-4}
TABLE 5.2
Results for the experiment of Example 5.3.

Method       Res. norm          CPU time
MR(PGCG)     1.2 × 10^{-12}     2.8s
GA           6.4 × 10^{-12}     3.2s
Lr ADI       2.5 × 10^{-12}     5.2s
Acknowledgments. We would like to thank the two referees for their valuable remarks
and helpful comments and suggestions. We also thank Dr. Patrick Kürschner for providing
us with the Matlab program for the Lr ADI method.
REFERENCES
[1] J. BAGLAMA, L. REICHEL, AND D. RICHMOND, An augmented LSQR method, Numer. Algorithms, 64 (2013), pp. 263–293.
[2] R. H. BARTELS AND G. W. STEWART, Algorithm 432: Solution of the matrix equation AX + XB = C, Comm. ACM, 15 (1972), pp. 820–826.
[3] P. BENNER, R.-C. LI, AND N. TRUHAR, On the ADI method for Sylvester equations, J. Comput. Appl. Math., 233 (2009), pp. 1035–1045.
[4] P. BENNER AND P. KÜRSCHNER, Computing real low-rank solutions of Sylvester equations by the factored ADI method, Comput. Math. Appl., 67 (2014), pp. 1656–1672.
[5] V. DRUSKIN AND L. KNIZHNERMAN, Extended Krylov subspaces: approximation of the matrix square root and related functions, SIAM J. Matrix Anal. Appl., 19 (1998), pp. 755–771.
[6] A. EL GUENNOUNI, K. JBILOU, AND A. J. RIQUET, Block Krylov subspace methods for solving large Sylvester equations, Numer. Algorithms, 29 (2002), pp. 75–96.
[7] G. H. GOLUB AND W. KAHAN, Calculating the singular values and pseudo-inverse of a matrix, J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205–224.
[8] G. H. GOLUB, S. NASH, AND C. VAN LOAN, A Hessenberg-Schur method for the problem AX + XB = C, IEEE Trans. Automat. Control, 24 (1979), pp. 909–913.
[9] M. HEYOUNI AND K. JBILOU, An extended block Arnoldi algorithm for large-scale solutions of the continuous-time algebraic Riccati equation, Electron. Trans. Numer. Anal., 33 (2009), pp. 53–62.
    http://etna.mcs.kent.edu/vol.33.2008-2009/pp53-62.dir
[10] M. HEYOUNI, Extended Arnoldi methods for large low-rank Sylvester matrix equations, Appl. Numer. Math., 60 (2010), pp. 1171–1182.
[11] D. HU AND L. REICHEL, Krylov-subspace methods for the Sylvester equation, Linear Algebra Appl., 172 (1992), pp. 283–313.
[12] I. M. JAIMOUKHA AND E. M. KASENALLY, Krylov subspace methods for solving large Lyapunov equations, SIAM J. Numer. Anal., 31 (1994), pp. 227–251.
[13] K. JBILOU, Low-rank approximate solution to large Sylvester matrix equations, Appl. Math. Comput., 177 (2006), pp. 365–376.
[14] K. JBILOU AND A. J. RIQUET, Projection methods for large Lyapunov matrix equations, Linear Algebra Appl., 415 (2006), pp. 344–358.
[15] Z. JIA AND Y. SUN, A QR decomposition based solver for the least squares problems from the minimal residual method for the Sylvester equation, J. Comput. Math., 25 (2007), pp. 531–542.
[16] D. KRESSNER AND C. TOBLER, Low-rank tensor Krylov subspace methods for parametrized linear systems, SIAM J. Matrix Anal. Appl., 32 (2011), pp. 1288–1316.
[17] P. LANCASTER AND M. TISMENETSKY, The Theory of Matrices, Academic Press, Orlando, 1985.
[18] Y. LIN AND V. SIMONCINI, Minimal residual methods for large scale Lyapunov equations, Appl. Numer. Math., 72 (2013), pp. 52–71.
[19] C. C. PAIGE AND M. A. SAUNDERS, LSQR: an algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Software, 8 (1982), pp. 43–71.
[20] T. PENZL, LYAPACK: A MATLAB toolbox for large Lyapunov and Riccati equations, model reduction problems, and linear-quadratic optimal control problems, software available at https://www.tu-chemnitz.de/sfb393/lyapack/.
[21] C. MOLER AND C. VAN LOAN, Nineteen dubious ways to compute the exponential of a matrix, SIAM Rev., 20 (1978), pp. 801–836.
[22] Y. SAAD, Numerical solution of large Lyapunov equations, in Signal Processing, Scattering, Operator Theory and Numerical Methods, M. A. Kaashoek, J. H. van Schuppen, and A. C. Ran, eds., Progress in Systems and Control Theory 5, Birkhäuser, Boston, 1990, pp. 503–511.
[23] V. SIMONCINI, A new iterative method for solving large-scale Lyapunov matrix equations, SIAM J. Sci. Comput., 29 (2007), pp. 1268–1288.