ETNA, Electronic Transactions on Numerical Analysis, Volume 43, pp. 45-59, 2014. Copyright 2014, Kent State University. ISSN 1068-9613. http://etna.math.kent.edu

A MINIMAL RESIDUAL NORM METHOD FOR LARGE-SCALE SYLVESTER MATRIX EQUATIONS∗

SAID AGOUJIL†, ABDESLEM H. BENTBIB‡, KHALIDE JBILOU§, AND EL MOSTAFA SADEK‡§

∗ Received October 30, 2013. Accepted May 6, 2014. Published online on September 5, 2014. Recommended by L. Reichel.
† Laboratory LAMAI, Department of Computer Science, Faculty of Sciences and Technology, B.P. 509, Errachidia 52000, Morocco (agoujil@gmail.com).
‡ Faculté des Sciences et Techniques-Gueliz, Laboratoire de Mathématiques Appliquées et Informatique, Morocco (a.bentbib@uca.ma, sadek@lmpa.univ-littoral.fr).
§ Laboratoire LMPA, ULCO, 50 rue F. Buisson, 62228 Calais, France (jbilou@lmpa.univ-littoral.fr).

Abstract. In this paper, we present a new method for solving large-scale Sylvester matrix equations with a low-rank right-hand side. The proposed method is an iterative method based on a projection onto an extended block Krylov subspace combined with a minimization of the residual norm. The resulting reduced-order problem is solved by direct or iterative solvers that exploit the structure of the linear operator associated with the projected matrix equation. In particular, we use the global LSQR algorithm as an iterative method for the derived low-order problem. When convergence is achieved, a low-rank approximate solution is computed as a product of two low-rank matrices, and a stopping procedure based on an economical computation of the residual norm is proposed. Several numerical examples are presented, and the proposed minimal residual approach is compared with the corresponding Galerkin-type approach.

Key words. extended block Krylov subspaces, low-rank approximation, Sylvester equation, minimal residual methods

AMS subject classifications. 65F10, 65F30

1. Introduction. In this paper, we consider the large-scale Sylvester matrix equation
\[
(1.1) \qquad A X + X B + E F^T = 0,
\]
where A and B are square matrices of size n × n and s × s, respectively, and E, F are matrices of size n × r and s × r, respectively. Sylvester equations play a fundamental role in many problems in control, communication theory, image processing, signal processing, filtering, model reduction, decoupling techniques for ordinary and partial differential equations, and the implementation of implicit numerical methods for ordinary differential equations; see, e.g., [3, 6, 10, 13] and the references therein.

The Sylvester matrix equation (1.1) has a unique solution if and only if α + β ≠ 0 for all α ∈ σ(A) and β ∈ σ(B), where σ(Z) denotes the spectrum of the matrix Z. For small- to medium-sized problems, direct methods such as the Bartels-Stewart [2] and the Hessenberg-Schur [8] algorithms can be used. These algorithms are based on reducing A and B to triangular or Hessenberg form.

During the last years, several projection methods based on Krylov subspaces have been proposed; see, e.g., [5, 6, 10, 12, 13, 14, 18, 23] and the references therein. The main idea of these methods is to use a block (global or extended block) Krylov subspace and to apply the corresponding Arnoldi process to construct orthonormal bases. The original large Sylvester matrix equation is then projected onto these Krylov subspaces. The alternating direction implicit (ADI) iterations [3, 4, 16] can also be used if spectral information about A and B is available. ADI iterations converge quickly provided that good shifts for A and B can be computed and the shifted linear systems can be solved efficiently at low cost.
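For reference, small- and medium-sized instances of (1.1) can be handled directly along these lines; the following Python sketch uses SciPy's Bartels-Stewart-based solver on a randomly generated, safely solvable test problem (the sizes, shifts, and data are illustrative only and are not taken from this paper).

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Small random test problem: A X + X B + E F^T = 0 with a rank-r right-hand side.
rng = np.random.default_rng(0)
n, s, r = 50, 40, 2
A = rng.standard_normal((n, n)) - n * np.eye(n)   # shifted so that sigma(A) and -sigma(B) stay apart
B = rng.standard_normal((s, s)) - s * np.eye(s)
E = rng.standard_normal((n, r))
F = rng.standard_normal((s, r))

# solve_sylvester computes X with A X + X B = Q (Bartels-Stewart), so take Q = -E F^T.
X = solve_sylvester(A, B, -E @ F.T)
print(np.linalg.norm(A @ X + X @ B + E @ F.T, 'fro'))   # Frobenius residual
```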
The approximate solutions produced by Krylov subspace methods are given as X_m = V_m Y_m W_m^T, where V_m and W_m are orthonormal matrices whose columns form bases of the Krylov subspaces K_m(A, E) and K_m(B^T, F), respectively. The Sylvester matrix equation is then projected via the Galerkin condition
\[
\mathbf{V}_m^T \left( A \mathbf{V}_m Y_m \mathbf{W}_m^T + \mathbf{V}_m Y_m \mathbf{W}_m^T B + E F^T \right) \mathbf{W}_m = 0.
\]
The resulting low-order Sylvester matrix equation is solved by a direct method such as the Hessenberg-Schur method. Instead of using a Galerkin-type projection, in this paper we consider the minimization of the Frobenius norm of the residual associated with the Sylvester matrix equation (1.1), which leads to the minimization problem
\[
Y_m^{MR} = \operatorname*{argmin}_{X_m = \mathbf{V}_m Y_m \mathbf{W}_m^T} \left\| A X_m + X_m B + E F^T \right\|_F .
\]

The rest of the paper is organized as follows. In the next section, we recall the extended block Arnoldi algorithm together with some of its properties. In Section 3, we define the minimal residual method for Sylvester matrix equations by using the extended Krylov subspaces K_m(A, E) and K_m(B^T, F). In Section 4, we describe some direct and iterative methods for solving the resulting low-order minimization problem. Finally, Section 5 is devoted to numerical experiments.

Throughout the paper, we use the following notation. The Frobenius inner product of the matrices X and Y is defined by ⟨X, Y⟩_F = tr(X^T Y), where tr(Z) denotes the trace of a square matrix Z. The associated norm is the Frobenius norm, denoted by ||·||_F. The Kronecker product of two matrices A and B is defined by A ⊗ B = [a_{i,j} B], where A = [a_{i,j}]. This product satisfies the properties (A ⊗ B)(C ⊗ D) = (AC ⊗ BD) and (A ⊗ B)^T = A^T ⊗ B^T. Finally, if X is a matrix, then vec(X) is the vector obtained by stacking the columns of X.

2. The extended block Arnoldi process. In this section, we recall the extended Krylov subspace and the extended block Arnoldi process. Let V be a matrix of dimension n × r. The block Krylov subspace associated with (A, V) is defined as
\[
\mathcal{K}_m(A, V) = \operatorname{Range}\big( [V, AV, A^2 V, \ldots, A^{m-1} V] \big).
\]
Assuming that the matrix A is nonsingular, the extended block Krylov subspace associated with (A, A^{-1}, V) is given by (see [5, 23])
\[
\mathcal{K}^e_m(A, V) = \operatorname{Range}\big( [V, A^{-1}V, AV, A^{-2}V, A^2V, \ldots, A^{m-1}V, A^{-m}V] \big)
= \mathcal{K}_m(A, V) + \mathcal{K}_m(A^{-1}, A^{-1}V).
\]
The extended block Arnoldi process allows us to construct an orthonormal basis of the extended block Krylov subspace K^e_m(A, V); see [9, 23].

ALGORITHM 1. The extended block Arnoldi algorithm (EBA).
Input: A an n × n matrix, V an n × r matrix, and an integer m.
1. Compute the QR decomposition of [V, A^{-1}V], i.e., [V, A^{-1}V] = V_1 Λ.
2. Set V_0 = [ ].
3. For j = 1, 2, ..., m
   (a) Set V_j^{(1)}: first r columns of V_j; V_j^{(2)}: last r columns of V_j.
   (b) Set 𝐕_j = [𝐕_{j−1}, V_j] and V̂_{j+1} = [A V_j^{(1)}, A^{-1} V_j^{(2)}].
   (c) Orthogonalize V̂_{j+1} against 𝐕_j to get V_{j+1}, i.e.,
       i. For i = 1, 2, ..., j
       ii.   H_{i,j} = V_i^T V̂_{j+1}
       iii.  V̂_{j+1} = V̂_{j+1} − V_i H_{i,j}
       iv. End For
   (d) Compute the QR decomposition of V̂_{j+1}, i.e., V̂_{j+1} = V_{j+1} H_{j+1,j}.
4. End For
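For illustration, the following Python/NumPy sketch mimics Algorithm 1 on dense data. The function name, the reuse of a single LU factorization of A for all systems with A, and the commented sanity check are implementation choices of this sketch, not part of the algorithm's statement; no deflation is performed if some H_{j+1,j} is rank deficient.

```python
import numpy as np
from scipy.linalg import qr, lu_factor, lu_solve

def extended_block_arnoldi(A, V, m):
    """Sketch of Algorithm 1 (EBA): orthonormal basis of K^e_m(A, V).
    Returns Vm (n x 2mr, orthonormal columns) and the block upper Hessenberg
    matrix H of size 2(m+1)r x 2mr. Assumes A nonsingular, no rank-deficient blocks."""
    n, r = V.shape
    lu = lu_factor(A)                                   # one factorization for all A^{-1} products
    V1, _ = qr(np.hstack([V, lu_solve(lu, V)]), mode='economic')   # step 1
    blocks = [V1]                                       # blocks[j] = V_{j+1}, each n x 2r
    H = np.zeros((2 * (m + 1) * r, 2 * m * r))
    for j in range(m):
        Vj = blocks[j]
        # Step (b): A on the first r columns, A^{-1} on the last r columns of V_{j+1}.
        W = np.hstack([A @ Vj[:, :r], lu_solve(lu, Vj[:, r:])])
        # Step (c): block Gram-Schmidt against all previously built blocks.
        for i in range(j + 1):
            Hij = blocks[i].T @ W
            H[2*i*r:2*(i+1)*r, 2*j*r:2*(j+1)*r] = Hij
            W = W - blocks[i] @ Hij
        # Step (d): QR of the remainder gives V_{j+2} and H_{j+1,j}.
        Vnext, Hnext = qr(W, mode='economic')
        H[2*(j+1)*r:2*(j+2)*r, 2*j*r:2*(j+1)*r] = Hnext
        blocks.append(Vnext)
    return np.hstack(blocks[:m]), H

# Sanity check on hypothetical data:
#   Vm, H = extended_block_arnoldi(A, E, 5);  np.linalg.norm(Vm.T @ Vm - np.eye(Vm.shape[1]))
```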
Since the extended block Arnoldi algorithm involves a Gram-Schmidt process, the computed blocks V_m = [V_1, V_2, ..., V_m] (V_i ∈ R^{n×2r}) have mutually orthonormal columns provided that none of the upper triangular matrices H_{j+1,j} is rank deficient. After m steps, Algorithm 1 has built an orthonormal basis V_m of the extended block Krylov subspace Range({V, AV, ..., A^{m−1}V, A^{−1}V, ..., A^{−m}V}) and a block upper Hessenberg matrix H_m whose nonzero blocks are the H_{i,j}. Note that each submatrix H_{i,j} is of order 2r.

Let T_m ∈ R^{2mr×2mr} denote the restriction of the matrix A to the extended Krylov subspace K^e_m(A, V), i.e., T_m = V_m^T A V_m. It is shown in [23] that T_m is also block upper Hessenberg with 2r × 2r blocks. Moreover, a recursion is available to compute T_m from H_m without requiring additional matrix-vector products with A; for details we refer to [23]. We note that for large problems the inverse of A is not computed explicitly; in this case, iterative solvers with preconditioners can be used for the linear systems with A that occur in Algorithm 1. We also note that Algorithm 1 could suffer from a breakdown if, for some j, the upper triangular matrix H_{j+1,j} is rank deficient. To avoid this problem, one may define a sequential extended block Arnoldi process with a deflation procedure, in the same manner as has already been done for the classical block Arnoldi algorithm.

PROPOSITION 2.1 ([9]). Let T̄_m = V_{m+1}^T A V_m. Then we have the relations
\[
A \mathbf{V}_m = \mathbf{V}_{m+1} \bar{T}_m = \mathbf{V}_m T_m + V_{m+1} T_{m+1,m} E_m^T ,
\]
where E_m = [0_{2r×2(m−1)r}, I_{2r}]^T is the matrix of the last 2r columns of the identity matrix I_{2mr}.

In the next section, we define the minimal residual approach for solving Sylvester matrix equations. Block, global, or extended Krylov subspaces could be used; we focus on extended Krylov subspace methods, the other approaches being treated in the same manner.

3. The minimal residual method for large-scale Sylvester matrix equations. In recent years, several projection methods based on Krylov subspaces have been proposed to produce approximations to the exact solutions of Lyapunov or Sylvester equations; see, e.g., [6, 12, 13, 14, 22, 23]. Most of these projection methods use a Galerkin condition to extract the approximate solutions from the projection space.

In this section, we consider approximate solutions of the form X_m^{MR} = V_m Y_m^{MR} W_m^T, where Y_m^{MR} solves the low-order minimization problem
\[
Y_m^{MR} = \operatorname*{argmin}_{X_m = \mathbf{V}_m Y_m \mathbf{W}_m^T} \left\| A X_m + X_m B + E F^T \right\|_F ,
\]
and where V_m = [V_1, V_2, ..., V_m] ∈ R^{n×2mr} and W_m = [W_1, W_2, ..., W_m] ∈ R^{s×2mr} are the orthonormal matrices constructed by simultaneously applying m steps of the extended block Arnoldi algorithm to the pairs (A, E) and (B^T, F), respectively. In this case, we have
\[
A \mathbf{V}_m = \mathbf{V}_{m+1} \bar{T}^A_m , \qquad B^T \mathbf{W}_m = \mathbf{W}_{m+1} \bar{T}^B_m .
\]
We have the following result.

THEOREM 3.1. Let V_m and W_m be the orthonormal matrices constructed by simultaneously applying m steps of the extended block Arnoldi algorithm to the pairs (A, E) and (B^T, F), respectively. Then the minimization problem
\[
Y_m^{MR} = \operatorname*{argmin}_{X_m = \mathbf{V}_m Y_m \mathbf{W}_m^T} \left\| A X_m + X_m B + E F^T \right\|_F
\]
can be written as
\[
(3.1) \qquad Y_m^{MR} = \operatorname*{argmin}_{Y} \left\| \bar{T}^A_m\, Y\, [\, I \;\; 0\,] + \begin{bmatrix} I \\ 0 \end{bmatrix} Y\, (\bar{T}^B_m)^T + \begin{bmatrix} R_E R_F^T & 0 \\ 0 & 0 \end{bmatrix} \right\|_F ,
\]
where E = V_1 R_E and F = W_1 R_F are the QR factorizations of E and F, respectively, and I denotes the identity matrix of order 2mr.
Proof.
We have
\[
\min_{X_m = \mathbf{V}_m Y \mathbf{W}_m^T} \| A X_m + X_m B + E F^T \|_F
= \min_{Y} \| A \mathbf{V}_m Y \mathbf{W}_m^T + \mathbf{V}_m Y \mathbf{W}_m^T B + V_1 R_E R_F^T W_1^T \|_F .
\]
Using the relations A V_m = V_{m+1} T̄^A_m and W_m^T B = (B^T W_m)^T = (T̄^B_m)^T W_{m+1}^T, together with V_m = V_{m+1} [I; 0], W_m^T = [I  0] W_{m+1}^T, and V_1 R_E R_F^T W_1^T = V_{m+1} [[R_E R_F^T, 0],[0, 0]] W_{m+1}^T, this becomes
\[
\min_{Y} \left\| \mathbf{V}_{m+1} \left( \bar{T}^A_m\, Y\, [\, I \;\; 0\,] + \begin{bmatrix} I \\ 0\end{bmatrix} Y (\bar{T}^B_m)^T + \begin{bmatrix} R_E R_F^T & 0 \\ 0 & 0\end{bmatrix} \right) \mathbf{W}_{m+1}^T \right\|_F .
\]
Since V_{m+1} and W_{m+1} have orthonormal columns, the Frobenius norm above equals the Frobenius norm of the matrix between parentheses, which gives (3.1).

Notice that the computation of the residual norm ||R(X_m)||_F requires only the knowledge of Y_m^{MR}; the approximate solution X_m itself is needed only when convergence is achieved. This residual norm can be evaluated as
\[
(3.2) \qquad r_m = \left\| \bar{T}^A_m\, Y_m\, [\, I \;\; 0\,] + \begin{bmatrix} I \\ 0\end{bmatrix} Y_m (\bar{T}^B_m)^T + \begin{bmatrix} R_E R_F^T & 0 \\ 0 & 0\end{bmatrix} \right\|_F .
\]

Recall that the Galerkin approach produces approximations X_m^{GA} = V_m Y_m^{GA} W_m^T, where Y_m^{GA} is a solution of the low-order Sylvester matrix equation
\[
(3.3) \qquad T^A_m Y_m^{GA} + Y_m^{GA} (T^B_m)^T + \tilde{E}_1 \tilde{F}_1^T = 0 ,
\]
where Ẽ_1 = V_m^T E and F̃_1 = W_m^T F. The projected problem (3.3) has a unique solution if and only if the matrices T^A_m and −T^B_m have no eigenvalue in common. Notice that problem (3.1) does not have this restriction.

The main issue now is how to solve the reduced-order minimization problem (3.1). In the rest of this section, we describe different direct and iterative strategies for solving (3.1). These techniques are inspired by those proposed in [18] for Lyapunov matrix equations.

4. Methods for solving the reduced minimization problem.

4.1. The least squares approach associated with the Kronecker form. Using the Kronecker product properties, problem (3.1) can be written as
\[
(4.1) \qquad \min_{y} \left\| \left( \begin{bmatrix} I \\ 0\end{bmatrix} \otimes \bar{T}^A_m + \bar{T}^B_m \otimes \begin{bmatrix} I \\ 0\end{bmatrix} \right) y + c \right\|_2 ,
\]
where c = vec([[R_E R_F^T, 0],[0, 0]]) and y = vec(Y). This least squares problem can be solved by direct methods based on the QR factorization. For small values of the iteration number m this gives good results; however, when m increases, the method becomes very slow and can no longer be used.

4.2. The Hu-Reichel method for solving the reduced problem. Here we discuss the Hu-Reichel approach (see [11]) for solving the reduced-order matrix least squares problem (3.1). More precisely, we apply the approach given in [15] together with the strategy used in [18]. Let h_A and h_B denote the last 2r rows of T̄^A_m and T̄^B_m, respectively, so that T̄^A_m = [T^A_m; h_A] and T̄^B_m = [T^B_m; h_B], and let T^A_m = U T_A U^T and T^B_m = V T_B V^T be real Schur decompositions of T^A_m and T^B_m. The minimization problem (3.1) then reads
\[
\min_{Y} \left\| \begin{bmatrix} T^A_m \\ h_A \end{bmatrix} Y\, [\, I \;\; 0\,] + \begin{bmatrix} I \\ 0\end{bmatrix} Y \begin{bmatrix} T^B_m \\ h_B \end{bmatrix}^T + \begin{bmatrix} R_E R_F^T & 0 \\ 0 & 0\end{bmatrix} \right\|_F
= \min_{Y} \left\| \begin{bmatrix} T^A_m Y + Y (T^B_m)^T + R_E R_F^T & Y h_B^T \\ h_A Y & 0 \end{bmatrix} \right\|_F ,
\]
which is equivalent to
\[
\min_{Y} \left( \| T^A_m Y + Y (T^B_m)^T + R_E R_F^T \|_F^2 + \| h_A Y \|_F^2 + \| Y h_B^T \|_F^2 \right)
\]
\[
= \min_{Y} \left( \| U T_A U^T Y + Y (V T_B V^T)^T + R_E R_F^T \|_F^2 + \| h_A Y \|_F^2 + \| Y h_B^T \|_F^2 \right)
\]
\[
= \min_{\tilde{Y}} \left( \| T_A \tilde{Y} + \tilde{Y} T_B^T + U^T R_E R_F^T V \|_F^2 + \| h_A U \tilde{Y} \|_F^2 + \| \tilde{Y} V^T h_B^T \|_F^2 \right),
\]
where Ỹ = U^T Y V. Using the Kronecker product, the preceding problem is transformed into
\[
(4.2) \qquad \min_{\tilde{y}} \left\| \begin{bmatrix} I \otimes T_A + T_B \otimes I \\ I \otimes (h_A U) \\ (h_B V) \otimes I \end{bmatrix} \tilde{y} + \begin{bmatrix} \mathrm{vec}(U^T R_E R_F^T V) \\ 0 \\ 0 \end{bmatrix} \right\|_2^2 ,
\]
where ỹ = vec(Ỹ). Problem (4.2) can also be expressed as
\[
\min_{\tilde{y}} \left\| \begin{bmatrix} R \\ S \end{bmatrix} \tilde{y} + \begin{bmatrix} d \\ 0 \end{bmatrix} \right\|_2^2 ,
\qquad \text{where } R = I \otimes T_A + T_B \otimes I, \quad S = \begin{bmatrix} I \otimes (h_A U) \\ (h_B V) \otimes I \end{bmatrix}, \quad d = \mathrm{vec}(U^T R_E R_F^T V).
\]
The associated normal equation is (R^T R + S^T S) ỹ + R^T d = 0. If the matrix R is nonsingular, then, with z = R ỹ,
\[
\big( I + (S R^{-1})^T S R^{-1} \big)\, z + d = 0 .
\]
Using the Sherman-Morrison-Woodbury formula, we get
\[
z = -d + (S R^{-1})^T \big( I + S R^{-1} (S R^{-1})^T \big)^{-1} S R^{-1} d .
\]
From z we obtain Ỹ and then Y, the solution of problem (3.1).
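As a concrete baseline for the reduced problem, the Kronecker form (4.1) of Section 4.1 can be assembled explicitly and passed to a dense least-squares solver; the following NumPy sketch does exactly that. The function name and the assumed shapes of R_E and R_F are illustrative, and this approach is only viable for small reduced dimensions.

```python
import numpy as np

def solve_reduced_lstsq(TA_bar, TB_bar, RE, RF):
    """Direct least-squares solve of the reduced problem (4.1).
    TA_bar, TB_bar: 2(m+1)r x 2mr;  RE, RF: 2r x r (from E = V_1 R_E, F = W_1 R_F)."""
    p, q = TA_bar.shape                                # p = 2(m+1)r, q = 2mr
    J = np.vstack([np.eye(q), np.zeros((p - q, q))])   # the matrix [I; 0]
    # Coefficient matrix of (4.1): (J kron TA_bar) + (TB_bar kron J).
    M = np.kron(J, TA_bar) + np.kron(TB_bar, J)
    # Constant block [[RE RF^T, 0], [0, 0]] of size p x p; its vec is c.
    C = np.zeros((p, p))
    C[:RE.shape[0], :RF.shape[0]] = RE @ RF.T
    y, *_ = np.linalg.lstsq(M, -C.reshape(-1, order='F'), rcond=None)
    return y.reshape((q, q), order='F')                # Y minimizing the Frobenius residual
```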
We notice that an economical strategy for computing S R^{-1} is also possible and can be obtained in the same way as in [15, 18]. Furthermore, in its matrix form, problem (4.2) could also be solved by the global LSQR method or by the global CG method with an appropriate preconditioner.

REMARK 4.1. Both the Kronecker least squares approach and the Hu-Reichel method require O(m^3 r^3) multiplications, which is similar to the amount of computation required at each step of the Galerkin method when the Bartels-Stewart algorithm is used for solving the low-order Sylvester equation (3.3). When m increases, the least squares approach and the Hu-Reichel method become expensive. In these cases, one should use iterative methods such as those defined in the next two subsections.

4.3. The global LSQR algorithm. In this section, we show how to adapt the LSQR algorithm of Paige and Saunders [19] to the solution of the low-order least squares problem (3.1). The classical LSQR algorithm is analytically equivalent to the conjugate gradient method applied to the associated normal equation, and it relies on the Golub-Kahan bidiagonalization process [7]. At each step k of the process, we determine an approximation Y^k to the solution Y_m^{MR} of problem (3.1). For simplicity, problem (3.1) is written as
\[
(4.3) \qquad \min_{Y} \| L_m(Y) - C \|_F ,
\]
where
\[
(4.4) \qquad L_m(Y) = \bar{T}^A_m\, Y\, [\, I \;\; 0\,] + \begin{bmatrix} I \\ 0\end{bmatrix} Y (\bar{T}^B_m)^T ,
\qquad
C = - \begin{bmatrix} R_E R_F^T & 0 \\ 0 & 0\end{bmatrix} .
\]
Notice that the adjoint of the linear operator L_m with respect to the Frobenius inner product is given by
\[
(4.5) \qquad L^*_m(Z) = (\bar{T}^A_m)^T Z \begin{bmatrix} I \\ 0\end{bmatrix} + [\, I \;\; 0\,]\, Z\, \bar{T}^B_m .
\]

The global Lanczos bidiagonalization process. This iterative procedure is initialized by
\[
\beta_1 \tilde{U}_1 = C, \qquad \alpha_1 \tilde{V}_1 = L^*_m(\tilde{U}_1), \qquad \beta_1 = \|C\|_F, \qquad \alpha_1 = \|L^*_m(\tilde{U}_1)\|_F ,
\]
and continued using the recurrence relations
\[
\beta_{i+1} \tilde{U}_{i+1} = L_m(\tilde{V}_i) - \alpha_i \tilde{U}_i ,
\qquad
\alpha_{i+1} \tilde{V}_{i+1} = L^*_m(\tilde{U}_{i+1}) - \beta_{i+1} \tilde{V}_i ,
\qquad i = 1, 2, \ldots
\]
The scalars α_i > 0 and β_i > 0 are chosen such that ||Ũ_i||_F = ||Ṽ_i||_F = 1. We set Ũ_k := [Ũ_1, Ũ_2, ..., Ũ_k], Ṽ_k := [Ṽ_1, Ṽ_2, ..., Ṽ_k], and
\[
\bar{T}_k = \begin{bmatrix}
\alpha_1 & & & \\
\beta_2 & \alpha_2 & & \\
 & \beta_3 & \ddots & \\
 & & \ddots & \alpha_k \\
 & & & \beta_{k+1}
\end{bmatrix} \in \mathbb{R}^{(k+1) \times k} .
\]
It is not difficult to show that the constructed blocks Ṽ_1, ..., Ṽ_k (and likewise Ũ_1, ..., Ũ_{k+1}) are F-orthonormal, that is, orthonormal with respect to the Frobenius inner product. Using the global Lanczos process, we compute approximations Y^k of the exact solution Y_m^{MR} of problem (3.1). The recurrence relations above can be written in matrix form as
\[
(4.6) \qquad [L_m(\tilde{V}_1), L_m(\tilde{V}_2), \ldots, L_m(\tilde{V}_k)] = \tilde{\mathbf{U}}_{k+1} (\bar{T}_k \otimes I) .
\]
The method consists in searching for an approximate solution of the form
\[
Y^k = \sum_{i=1}^{k} z^{(i)} \tilde{V}_i .
\]
Applying the linear operator L_m to Y^k, we get
\[
L_m(Y^k) = \sum_{i=1}^{k} z^{(i)} L_m(\tilde{V}_i)
= [L_m(\tilde{V}_1), \ldots, L_m(\tilde{V}_k)] (z_k \otimes I)
= \tilde{\mathbf{U}}_{k+1} (\bar{T}_k \otimes I)(z_k \otimes I)
= \tilde{\mathbf{U}}_{k+1} (\bar{T}_k z_k \otimes I) ,
\]
where z_k = (z^{(1)}, z^{(2)}, ..., z^{(k)})^T. Then
\[
\| C - L_m(Y^k) \|_F
= \big\| \tilde{\mathbf{U}}_{k+1} (\beta_1 e_1 \otimes I) - \tilde{\mathbf{U}}_{k+1} (\bar{T}_k z_k \otimes I) \big\|_F
= \big\| \tilde{\mathbf{U}}_{k+1} \big( (\beta_1 e_1 - \bar{T}_k z_k) \otimes I \big) \big\|_F
= \| \beta_1 e_1 - \bar{T}_k z_k \|_2 ,
\]
where the last equality follows from the F-orthonormality of the blocks Ũ_1, ..., Ũ_{k+1}. Therefore, problem (4.3) is equivalent to
\[
\min_{z_k} \| \beta_1 e_1 - \bar{T}_k z_k \|_2 .
\]
The global LSQR algorithm for solving problem (3.1) is summarized in Algorithm 2.

ALGORITHM 2. The global LSQR algorithm (Gl-LSQR).
1. Set Y^0 = 0. Compute β_1 = ||C||_F, Ũ_1 = C/β_1, α_1 = ||L^*_m(Ũ_1)||_F, Ṽ_1 = L^*_m(Ũ_1)/α_1, and set W̃_1 = Ṽ_1, Φ̄_1 = β_1, ρ̄_1 = α_1.
2. For i = 1, 2, ..., k_max
   (a) Ŵ_i = L_m(Ṽ_i) − α_i Ũ_i, β_{i+1} = ||Ŵ_i||_F
   (b) Ũ_{i+1} = Ŵ_i / β_{i+1}
   (c) L_i = L^*_m(Ũ_{i+1}) − β_{i+1} Ṽ_i
   (d) α_{i+1} = ||L_i||_F
   (e) Ṽ_{i+1} = L_i / α_{i+1}, ρ_i = (ρ̄_i^2 + β_{i+1}^2)^{1/2}
   (f) c_i = ρ̄_i/ρ_i, s_i = β_{i+1}/ρ_i
   (g) θ_{i+1} = s_i α_{i+1}, ρ̄_{i+1} = −c_i α_{i+1}
   (h) Φ_i = c_i Φ̄_i, Φ̄_{i+1} = s_i Φ̄_i
   (i) Y^i = Y^{i−1} + (Φ_i/ρ_i) W̃_i
   (j) W̃_{i+1} = Ṽ_{i+1} − (θ_{i+1}/ρ_i) W̃_i. If |Φ̄_{i+1}| is small enough, then stop.
3. End For
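To make Algorithm 2 concrete, the following Python sketch implements the operators (4.4)-(4.5) and the Gl-LSQR iteration with the standard Paige-Saunders recurrences written for matrix iterates and Frobenius inner products. The function names, the relative stopping test, and the absence of breakdown handling are choices of this sketch, not part of the algorithm as stated above.

```python
import numpy as np

def make_reduced_ops(TA_bar, TB_bar):
    """Operators (4.4) and (4.5): L(Y) = TA_bar Y [I 0] + [I; 0] Y TB_bar^T and its adjoint."""
    p, q = TA_bar.shape                                 # p = 2(m+1)r, q = 2mr
    J = np.vstack([np.eye(q), np.zeros((p - q, q))])    # the matrix [I; 0]
    def L(Y):  return TA_bar @ Y @ J.T + J @ Y @ TB_bar.T
    def Lt(Z): return TA_bar.T @ Z @ J + J.T @ Z @ TB_bar
    return L, Lt

def global_lsqr(L, Lt, C, maxit=1000, tol=1e-12):
    """Gl-LSQR sketch of Algorithm 2: LSQR with matrices instead of vectors."""
    normC = np.linalg.norm(C, 'fro')
    beta = normC; U = C / beta
    R = Lt(U); alpha = np.linalg.norm(R, 'fro'); V = R / alpha
    W = V.copy(); Y = np.zeros_like(V)
    phibar, rhobar = beta, alpha
    for _ in range(maxit):
        # One global Golub-Kahan bidiagonalization step (no breakdown handling).
        U = L(V) - alpha * U;   beta = np.linalg.norm(U, 'fro');  U /= beta
        Vn = Lt(U) - beta * V;  alpha = np.linalg.norm(Vn, 'fro'); V = Vn / alpha
        # Eliminate the new subdiagonal entry beta and update the iterate.
        rho = np.hypot(rhobar, beta)
        c, s = rhobar / rho, beta / rho
        theta, rhobar = s * alpha, -c * alpha
        phi, phibar = c * phibar, s * phibar   # |phibar| = current residual norm (Theorem 4.2)
        Y = Y + (phi / rho) * W
        W = V - (theta / rho) * W
        if abs(phibar) < tol * normC:
            break
    return Y, abs(phibar)
```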
The algorithm computes an approximate solution Y^k, with 1 ≤ k ≤ k_max, of the minimization problem (4.3). Next, we give an expression for the associated residual norm.

THEOREM 4.2. Let X_m = V_m Y_m W_m^T be the approximate solution of the Sylvester equation obtained after m iterations of the EBA algorithm, where
\[
Y_m = \operatorname*{argmin}_{Y} \left\| \bar{T}^A_m\, Y\, [\, I \;\; 0\,] + \begin{bmatrix} I \\ 0\end{bmatrix} Y (\bar{T}^B_m)^T + \begin{bmatrix} R_E R_F^T & 0 \\ 0 & 0\end{bmatrix} \right\|_F
\]
is computed by the global LSQR algorithm. Then ||R_m(X_m)||_F = Φ, where Φ = |Φ̄_{k+1}| is the quantity produced at the final iteration k of Algorithm 2.
Proof. We have
\[
\| R(X_m) \|_F = \| A X_m + X_m B + E F^T \|_F
= \left\| \bar{T}^A_m\, Y_m\, [\, I \;\; 0\,] + \begin{bmatrix} I \\ 0\end{bmatrix} Y_m (\bar{T}^B_m)^T + \begin{bmatrix} R_E R_F^T & 0 \\ 0 & 0\end{bmatrix} \right\|_F
= \min_{Y} \| L_m(Y) - C \|_F = \Phi .
\]
Theorem 4.2 plays an important role in practice: it allows us to compute the residual norm without forming the approximate solution, which is assembled only when convergence is achieved.

If the matrices A and B are stable (Re(λ_i(A)) < 0 and Re(λ_i(B)) < 0), then the Sylvester matrix equation (1.1) has a unique solution given by the integral representation (see [17])
\[
X = \int_0^{\infty} e^{tA}\, E F^T\, e^{tB} \, dt .
\]
The logarithmic 2-norm of the stable matrix A is defined by
\[
\mu_2(A) = \tfrac{1}{2}\, \lambda_{\max}(A + A^T) < 0 .
\]
The logarithmic norm provides a useful bound for the matrix exponential. It is known (see [21]) that
\[
\| e^{tA} \|_2 \le e^{\mu_2(A)\, t}, \qquad t \ge 0 .
\]
In the following theorem, we give an upper bound for the norm of the error X − X_m.

THEOREM 4.3. Assume that A and B are stable matrices, and let X_m = V_m Y_m W_m^T be the approximate solution of the Sylvester equation obtained after m iterations of the extended block Arnoldi algorithm. Then we have the upper bound
\[
\| X - X_m \|_2 \le \frac{-\, r_m}{\mu_2(A) + \mu_2(B)} ,
\]
where r_m is the residual norm given by (3.2).
Proof. Subtracting the residual equation R(X_m) = A X_m + X_m B + E F^T from the Sylvester matrix equation (1.1), we get A(X_m − X) + (X_m − X) B = R(X_m). As A and B are assumed to be stable, this Sylvester equation for the error gives
\[
X - X_m = \int_0^{\infty} e^{tA}\, R(X_m)\, e^{tB} \, dt .
\]
Hence, by using the logarithmic norm, we obtain
\[
\| X - X_m \|_2 \le \| R(X_m) \|_2 \int_0^{\infty} e^{t (\mu_2(A) + \mu_2(B))} \, dt
= \frac{-\,\| R(X_m) \|_2}{\mu_2(A) + \mu_2(B)} .
\]
Since ||R(X_m)||_2 ≤ ||R(X_m)||_F and μ_2(A) + μ_2(B) < 0, the result follows.

The convergence of the global LSQR algorithm may be slow, in which case the method should be used with a preconditioner. Different strategies are possible; for instance, one can adapt the techniques used in [1].
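The bound of Theorem 4.3 is easy to evaluate for moderate dimensions; the following sketch computes the logarithmic 2-norms and the resulting error bound with dense symmetric eigenvalue solves (for truly large matrices one would instead estimate the extreme eigenvalue iteratively; the data below are placeholders).

```python
import numpy as np

def log_norm_2(M):
    """Logarithmic 2-norm: mu_2(M) = (1/2) * lambda_max(M + M^T)."""
    return 0.5 * np.linalg.eigvalsh(M + M.T)[-1]

def error_bound(A, B, res_norm):
    """Upper bound of Theorem 4.3: ||X - X_m||_2 <= -r_m / (mu_2(A) + mu_2(B)),
    valid when mu_2(A) + mu_2(B) < 0 (A and B stable)."""
    mu = log_norm_2(A) + log_norm_2(B)
    if mu >= 0:
        raise ValueError("bound requires mu_2(A) + mu_2(B) < 0")
    return -res_norm / mu

# Illustrative use with stable random matrices (placeholder data):
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) - 30 * np.eye(200)
B = rng.standard_normal((150, 150)) - 30 * np.eye(150)
print(error_bound(A, B, res_norm=1e-7))
```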
4.4. The preconditioned global conjugate gradient method. In this section, we consider a preconditioned global conjugate gradient method (PGCG) for solving the reduced least squares problem (4.3). This is a matrix version of the well-known PCGLS algorithm, the classical preconditioned CG method applied to the normal equation. The normal equation associated with (4.3) is
\[
(4.7) \qquad L^*_m\big( L_m(Y) \big) = L^*_m(C) ,
\]
where the operators L_m and L^*_m are defined in (4.4) and (4.5), respectively. Let the matrices T̄^A_m and T̄^B_m be partitioned as
\[
\bar{T}^A_m = \begin{bmatrix} T^A_m \\ h_A \end{bmatrix},
\qquad
\bar{T}^B_m = \begin{bmatrix} T^B_m \\ h_B \end{bmatrix},
\]
where h_A and h_B are the last 2r rows of T̄^A_m and T̄^B_m, respectively. Then the normal equation (4.7) can be written as
\[
(4.8) \qquad (\bar{T}^A_m)^T \bar{T}^A_m\, Y + Y\, (\bar{T}^B_m)^T \bar{T}^B_m + (T^A_m)^T\, Y\, (T^B_m)^T + T^A_m\, Y\, T^B_m - C_1 = 0 ,
\]
where C_1 = L^*_m(C). Considering the singular value decompositions (SVD)
\[
\bar{T}^A_m = U^A \Sigma^A (V^A)^T, \qquad \bar{T}^B_m = U^B \Sigma^B (V^B)^T ,
\]
we get the eigendecompositions
\[
(\bar{T}^A_m)^T \bar{T}^A_m = Q_A D_A Q_A^T, \qquad (\bar{T}^B_m)^T \bar{T}^B_m = Q_B D_B Q_B^T ,
\]
where Q_A = V^A, Q_B = V^B, D_A = (Σ^A)^T Σ^A, and D_B = (Σ^B)^T Σ^B. Setting Ỹ = Q_A^T Y Q_B and C̃ = Q_A^T C_1 Q_B, the normal equation (4.8) becomes
\[
(4.9) \qquad D_A \tilde{Y} + \tilde{Y} D_B + \tilde{T}^A_m\, \tilde{Y}\, \tilde{T}^B_m + (\tilde{T}^A_m)^T\, \tilde{Y}\, (\tilde{T}^B_m)^T - \tilde{C} = 0 ,
\]
where T̃^A_m = Q_A^T T^A_m Q_A and T̃^B_m = Q_B^T T^B_m Q_B. This expression suggests using the first part as a preconditioner, that is, the matrix operator
\[
(4.10) \qquad P(\tilde{Y}) = D_A \tilde{Y} + \tilde{Y} D_B .
\]
It can be seen that expression (4.9) corresponds to the normal equation associated with the matrix linear operator
\[
(4.11) \qquad \tilde{L}_m(\tilde{Y}) = \bar{T}^A_m Q_A\, \tilde{Y}\, [\, Q_B^T \;\; 0\,] + \begin{bmatrix} Q_A \\ 0\end{bmatrix} \tilde{Y}\, (\bar{T}^B_m Q_B)^T = L_m(Q_A \tilde{Y} Q_B^T) .
\]
Therefore, the preconditioned global conjugate gradient algorithm is obtained by applying the preconditioner (4.10) to the normal equation associated with the matrix linear operator defined by (4.11). This is summarized in Algorithm 3.

ALGORITHM 3. The preconditioned global CG algorithm (PGCG).
1. Choose a tolerance tol > 0 and a maximum number of iterations j_max. Choose Ỹ_0, and set R̃_0 = C − L̃_m(Ỹ_0), S_0 = L̃^*_m(R̃_0), Z_0 = P^{-1}(S_0), P_0 = Z_0.
2. For j = 0, 1, 2, ..., j_max
   (a) W_j = L̃_m(P_j)
   (b) α_j = ⟨S_j, Z_j⟩_F / ||W_j||_F^2
   (c) Ỹ_{j+1} = Ỹ_j + α_j P_j
   (d) R̃_{j+1} = R̃_j − α_j W_j
   (e) If ||R̃_{j+1}||_F < tol, stop; else
   (f) S_{j+1} = L̃^*_m(R̃_{j+1})
   (g) Z_{j+1} = P^{-1}(S_{j+1})
   (h) β_j = ⟨S_{j+1}, Z_{j+1}⟩_F / ⟨S_j, Z_j⟩_F
   (i) P_{j+1} = Z_{j+1} + β_j P_j.
3. End For

Notice that applying the preconditioner P requires solving a Sylvester equation at each iteration. However, since the matrices D_A and D_B of these Sylvester equations are diagonal, the solve reduces to the entrywise division Z_{ij} = S_{ij} / ((D_A)_{ii} + (D_B)_{jj}), so the cost is low.

4.5. Low-rank form of the approximate solutions. The solution X_m can be delivered as a product X_m ≈ Z_1 Z_2^T of two low-rank matrices Z_1 and Z_2. Consider the singular value decomposition of the 2mr × 2mr matrix
\[
Y_m^{MR} = \tilde{Y}_1\, \Sigma\, \tilde{Y}_2^T ,
\]
where Σ is the diagonal matrix of the singular values of Y_m^{MR} sorted in decreasing order. Let U_{1,l} and U_{2,l} be the 2mr × l matrices consisting of the first l columns of Ỹ_1 and Ỹ_2, respectively, corresponding to the l singular values of magnitude larger than some tolerance. We obtain the truncated singular value decomposition
\[
Y_m^{MR} \approx U_{1,l}\, \Sigma_l\, U_{2,l}^T ,
\qquad \Sigma_l = \mathrm{diag}[\sigma_1, \ldots, \sigma_l] .
\]
Setting Z_{1,m} = V_m U_{1,l} Σ_l^{1/2} and Z_{2,m} = W_m U_{2,l} Σ_l^{1/2}, it follows that
\[
(4.12) \qquad X_m \approx Z_{1,m} Z_{2,m}^T .
\]
This is very important for large problems, because one does not have to compute and store the approximation X_m at each iteration. We note that there are cheaper ways to compute low-rank representations of the approximate solution; see [3].

The minimal residual method for solving Sylvester matrix equations is summarized in Algorithm 4.

ALGORITHM 4. The minimal residual method for large Sylvester matrix equations (MR).
1. Choose a tolerance tol > 0 and a maximum number of iterations iter_max.
2. For m = 1, 2, 3, ..., iter_max
3.   Update V_m and T̄^A_m by the EBA algorithm (Algorithm 1) applied to (A, E).
4.   Update W_m and T̄^B_m by the EBA algorithm applied to (B^T, F).
5.   Solve the low-order problem (3.1).
6.   If ||R(X_m)||_F ≤ tol, stop.
7. End For
8. Using (4.12), return the approximate solution in the factored form X_m ≈ Z_{1,m} Z_{2,m}^T.
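The truncation step (4.12) used in step 8 of Algorithm 4 is a standard low-rank compression of the reduced solution; a minimal NumPy sketch follows (the relative threshold and the function name are illustrative).

```python
import numpy as np

def low_rank_factors(Vm, Wm, Y, tol=1e-12):
    """Step 8 of Algorithm 4 via (4.12): truncated SVD of the reduced solution Y
    and assembly of the factors Z1, Z2 with X_m ~ Z1 @ Z2.T."""
    U1, sigma, U2t = np.linalg.svd(Y)
    l = max(1, int(np.sum(sigma > tol * sigma[0])))   # keep singular values above a relative threshold
    sqrt_sig = np.sqrt(sigma[:l])
    Z1 = Vm @ (U1[:, :l] * sqrt_sig)        # V_m U_{1,l} Sigma_l^{1/2}
    Z2 = Wm @ (U2t[:l, :].T * sqrt_sig)     # W_m U_{2,l} Sigma_l^{1/2}
    return Z1, Z2
```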
5. Numerical experiments. In this section, we present some numerical examples for continuous-time Sylvester matrix equations. We compare the Galerkin projection approach (GA) and the minimal residual (MR) approach for large-scale problems, using the global LSQR method (Gl-LSQR), the preconditioned global conjugate gradient method (PGCG), the direct Kronecker-form least squares solver, and the Hu-Reichel method (HR) for the reduced least squares minimization problem. In the last experiment, we also give a comparison with the low-rank factored ADI (Lr ADI) method described in [3, 4]. The algorithms were coded in Matlab 8.0.

For the Galerkin approach (GA), at each iteration m the projected problem of size 2mr × 2mr is solved by the Bartels-Stewart algorithm [2]. When solving the reduced minimization problem, the global LSQR and the preconditioned global CG (PGCG) methods are stopped when the relative residual norm is less than tol_l = 10^{-12} or when a maximum of k_max = 1000 iterations is reached. The minimal residual method (Algorithm 4) and the Galerkin approach are stopped when the residual norm is less than tol = 10^{-7}, and a maximum of iter_max = 50 outer iterations is allowed.

FIG. 5.1. Log10 of the residual norms versus the number of iterations. GA: solid line, MR: dashed line.

The first set of matrices A and B is obtained from the discretization of the operator
\[
(5.1) \qquad L u = \Delta u - f_1(x, y) \frac{\partial u}{\partial x} + f_2(x, y) \frac{\partial u}{\partial y} + g(x, y)\, u
\]
on the unit square [0, 1] × [0, 1] with homogeneous Dirichlet boundary conditions. The number of inner grid points in each direction is n_0, and the dimensions of the matrices A and B are n = n_0^2 and s = s_0^2, respectively. The discretization of the operator L u yields matrices extracted from the Lyapack package [20] using the command fdm_2d_matrix; they are denoted by A = fdm(f_1(x, y), f_2(x, y), g(x, y)).

For the second set of test matrices, we use the matrices add32, pde2961, and thermal from the Harwell-Boeing Collection(1) and the matrix flow_meter from the Oberwolfach Collection(2). The coefficients of the matrices E and F are random values uniformly distributed on [0, 1], and we set r = 2.

(1) http://math.nist.gov/MatrixMarket
(2) https://portal.uni-freiburg.de/imteksimulation/benchmark

EXAMPLE 5.1. In Figures 5.1 and 5.2, we plot the residual norms versus the number of iterations for the minimal residual and the Galerkin approaches. In Figure 5.1, the matrices A and B are obtained from the discretization of the operator L u with dimensions n = 4900 and s = 3600, respectively, and the global LSQR algorithm is used to solve the reduced minimization problem (3.1). In Figure 5.2, we use the matrices A = pde2961 and B = thermal from the Harwell-Boeing collection with dimensions n = 2961 and s = 3456, respectively, and the reduced-order problem is solved by the direct least squares method associated with the Kronecker form (4.1). As can be seen from these two figures, the MR algorithm converges successfully, while for the GA approach we obtain residual norms around 10^{-5} only.

FIG. 5.2. Log10 of the residual norms versus the number of iterations. GA: solid line, MR: dashed line.

EXAMPLE 5.2.
For the second set of experiments, we compare the performance of the MR method, combined with the different techniques for solving the reduced-order minimization problem, with that of the Galerkin approach (GA). The reduced minimization problem is solved by the direct least squares method, by the global LSQR method, by the preconditioned global conjugate gradient method applied to the normal equation (PGCG), and by the Hu-Reichel method (HR). In Table 5.1, we list the CPU times (in seconds), the total number of outer iterations, and the residual norms obtained with these different approaches. The reported times include the time required for the LU factorizations of the matrices A and B used in the extended block Arnoldi process. As can be seen from this table, the results given by the Galerkin approach are not good (except for the first experiment); in some cases, the approximations produced by the GA method deteriorate after a number of iterations. We also notice that when the direct method is used for solving the reduced minimization problem, the CPU time grows when many outer iterations are needed to achieve convergence; this is the case for the last two experiments.

EXAMPLE 5.3. For the last experiment, we compare the performance of the GA approach, of the MR method with PGCG as the solver for the low-order minimization problem, and of the low-rank factored ADI (Lr ADI) method described in [3, 4]. The matrices A and B are the same as those given in [4, Example 1]; they are obtained from the 5-point discretization of the operator L u in (5.1) with dimensions n = 6400 for A and s = 3600 for B. For this experiment, E and F are random matrices with r = 4 columns. In Table 5.2, we list the residual norms and the corresponding CPU times for each method. Here, the algorithms are stopped when the relative residual norms are smaller than 10^{-11}.

6. Conclusion. In this paper, we presented an iterative method for solving large-scale Sylvester matrix equations with low-rank right-hand sides. The proposed method is based on a projection onto extended block Krylov subspaces combined with a minimization property. The resulting low-order minimization problem is solved by iterative or direct methods. To speed up the convergence when solving the low-order minimization problem, we use a preconditioned global conjugate gradient method applied to the associated normal matrix equation. The approximate solutions are given as products of two low-rank matrices, which saves memory for large problems. The advantage of the minimal residual method over its Galerkin-type counterpart is its stability, as demonstrated by the numerical tests.

TABLE 5.1. Results for the experiments of Example 5.2.
Matrices                                        Method         CPU time   Its.   Res. norm
A = thermal, B = add32                          MR(PGCG)       26s        10     1.1 × 10^-8
n = 3456, s = 4960                              MR(direct)     26s         9     2.1 × 10^-8
                                                MR(Gl-LSQR)    28s        10     1.3 × 10^-8
                                                MR(HR)         27s         9     3.5 × 10^-8
                                                GA             25s         9     1.4 × 10^-8

A = fdm(cos(xy), e^(yx), 100), B = add32        MR(PGCG)       45s        15     1.9 × 10^-8
n = 90000, s = 4960                             MR(direct)     56s        15     2.3 × 10^-8
                                                MR(Gl-LSQR)    49s        17     1.1 × 10^-8
                                                MR(HR)         88s        15     3.1 × 10^-8
                                                GA             83s        21     8.6 × 10^-5

A = fdm(sin(xy), e^(xy), 10), B = thermal       MR(PGCG)       56s        18     2.5 × 10^-8
n = 122500, s = 3456                            MR(direct)     58s        16     3.1 × 10^-8
                                                MR(Gl-LSQR)    --         50     --
                                                MR(HR)         110s       16     2.0 × 10^-8
                                                GA             82s        25     8.4 × 10^-5

A = flow_meter, B = fdm(sin(xy), xy, 1000)      MR(PGCG)       49s        16     2.0 × 10^-8
n = 9669, s = 40000                             MR(direct)     700s       35     3.1 × 10^-8
                                                MR(Gl-LSQR)    --         50     --
                                                MR(HR)         150s       35     2.4 × 10^-8
                                                GA             95s        50     2.5 × 10^-5

A = fdm(xy, y^2, 1), B = fdm(xy, cos(xy), 10)   MR(PGCG)       706s       18     2.1 × 10^-8
n = 122500, s = 48400                           MR(direct)     1500s      42     1.5 × 10^-8
                                                MR(Gl-LSQR)    --         --     --
                                                MR(HR)         --         --     --
                                                GA             805s       50     4.2 × 10^-4

TABLE 5.2. Results for the experiment of Example 5.3.

Method      Res. norm        CPU time
MR(PGCG)    1.2 × 10^-12     2.8s
GA          6.4 × 10^-12     3.2s
Lr ADI      2.5 × 10^-12     5.2s

Acknowledgments. We would like to thank the two referees for their valuable remarks and helpful comments and suggestions. We also thank Dr. Patrick Kürschner for providing us with the Matlab program for the Lr ADI method.

REFERENCES

[1] J. Baglama, L. Reichel, and D. Richmond, An augmented LSQR method, Numer. Algorithms, 64 (2013), pp. 263-293.
[2] R. H. Bartels and G. W. Stewart, Algorithm 432: Solution of the matrix equation AX + XB = C, Comm. ACM, 15 (1972), pp. 820-826.
[3] P. Benner, R.-C. Li, and N. Truhar, On the ADI method for Sylvester equations, J. Comput. Appl. Math., 233 (2009), pp. 1035-1045.
[4] P. Benner and P. Kürschner, Computing real low-rank solutions of Sylvester equations by the factored ADI method, Comput. Math. Appl., 67 (2014), pp. 1656-1672.
[5] V. Druskin and L. Knizhnerman, Extended Krylov subspaces: approximation of the matrix square root and related functions, SIAM J. Matrix Anal. Appl., 19 (1998), pp. 755-771.
[6] A. El Guennouni, K. Jbilou, and A. J. Riquet, Block Krylov subspace methods for solving large Sylvester equations, Numer. Algorithms, 29 (2002), pp. 75-96.
[7] G. H. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205-224.
[8] G. H. Golub, S. Nash, and C. Van Loan, A Hessenberg-Schur method for the problem AX + XB = C, IEEE Trans. Automat. Control, 24 (1979), pp. 909-913.
[9] M. Heyouni and K. Jbilou, An extended block Arnoldi algorithm for large-scale solutions of the continuous-time algebraic Riccati equation, Electron. Trans. Numer. Anal., 33 (2009), pp. 53-62. http://etna.mcs.kent.edu/vol.33.2008-2009/pp53-62.dir
[10] M. Heyouni, Extended Arnoldi methods for large low-rank Sylvester matrix equations, Appl. Numer. Math., 60 (2010), pp. 1171-1182.
[11] D. Hu and L. Reichel, Krylov-subspace methods for the Sylvester equation, Linear Algebra Appl., 172 (1992), pp. 283-313.
[12] I. M. Jaimoukha and E. M. Kasenally, Krylov subspace methods for solving large Lyapunov equations, SIAM J. Numer. Anal., 31 (1994), pp. 227-251.
[13] K. Jbilou, Low-rank approximate solutions to large Sylvester matrix equations, Appl. Math. Comput., 177 (2006), pp. 365-376.
[14] K. Jbilou and A. J. Riquet, Projection methods for large Lyapunov matrix equations, Linear Algebra Appl., 415 (2006), pp. 344-358.
[15] Z. Jia and Y. Sun, A QR decomposition based solver for the least squares problems from the minimal residual method for the Sylvester equation, J. Comput. Math., 25 (2007), pp. 531-542.
[16] D. Kressner and C. Tobler, Low-rank tensor Krylov subspace methods for parametrized linear systems, SIAM J. Matrix Anal. Appl., 32 (2011), pp. 1288-1316.
[17] P. Lancaster and M. Tismenetsky, The Theory of Matrices, Academic Press, Orlando, 1985.
[18] Y. Lin and V. Simoncini, Minimal residual methods for large scale Lyapunov equations, Appl. Numer. Math., 72 (2013), pp. 52-71.
[19] C. C. Paige and M. A. Saunders, LSQR: an algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Software, 8 (1982), pp. 43-71.
[20] T. Penzl, LYAPACK: A MATLAB toolbox for large Lyapunov and Riccati equations, model reduction problems, and linear-quadratic optimal control problems, software available at https://www.tu-chemnitz.de/sfb393/lyapack/.
[21] C. Moler and C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, SIAM Rev., 20 (1978), pp. 801-836.
[22] Y. Saad, Numerical solution of large Lyapunov equations, in Signal Processing, Scattering, Operator Theory and Numerical Methods, M. A. Kaashoek, J. H. van Schuppen, and A. C. Ran, eds., Progress in Systems and Control Theory 5, Birkhäuser, Boston, 1990, pp. 503-511.
[23] V. Simoncini, A new iterative method for solving large-scale Lyapunov matrix equations, SIAM J. Sci. Comput., 29 (2007), pp. 1268-1288.