ETNA, Electronic Transactions on Numerical Analysis, Volume 43, pp. 163-187, 2015. Copyright 2015, Kent State University. ISSN 1068-9613. http://etna.math.kent.edu

BLOCK GRAM-SCHMIDT DOWNDATING∗

JESSE L. BARLOW†

Abstract. Given positive integers m, n, and p with m ≥ n + p and p ≪ n, a method is proposed to modify the QR decomposition of X ∈ R^{m×n} to produce a QR decomposition of X with p rows deleted. The algorithm is based upon the classical block Gram-Schmidt method, requires an approximation of the norm of the inverse of a triangular matrix, has O(mnp) operations, and achieves an accuracy in the matrix 2-norm that is comparable to similar bounds for related procedures for p = 1 in the vector 2-norm. Since the algorithm is based upon matrix-matrix operations, it is appropriate for modern cache-oriented computer architectures.

Key words. QR decomposition, singular value decomposition, orthogonality, downdating, matrix-matrix operations.

AMS subject classifications. 65F25, 65F20, 65F35

1. Introduction. Let X ∈ R^{m×n} and an integer p be given, where m ≥ n + p and p ≪ n. Suppose that we have the orthogonal decomposition (i.e., QR decomposition)

(1.1) X = U R,

where U ∈ R^{m×n} is left orthogonal and R ∈ R^{n×n} is upper triangular. Let X be partitioned as

(1.2) X = [ X_0 ; X̄ ],  X_0 ∈ R^{p×n},  X̄ ∈ R^{(m−p)×n},

and suppose that we wish to produce the QR decomposition

(1.3) X̄ = Ū R̄,

where Ū ∈ R^{(m−p)×n̄} is left orthogonal, R̄ ∈ R^{n̄×n} is upper trapezoidal, and n̄ ≤ n. Obtaining (1.3) inexpensively from (1.1), called the block downdating problem, is important in the context of solving recursive least squares problems where observations are added or deleted over time. It also arises as an intermediate computation in a recent null space algorithm by Overton et al. [18]. In (1.2), the first p observations are deleted, but, by simply applying a row permutation to X, any p observations can be deleted. Without changing the algorithm, we could assume that U ∈ R^{m×n_0} and that R ∈ R^{n_0×n} is upper trapezoidal, where n_0 ≤ n, but, for simplicity, we assume n_0 = n.

For p = 1, the block downdating problem (or simply the downdating problem) has an extensive literature [4, 9, 20]. A block downdating algorithm for the Cholesky decomposition based upon hyperbolic transformations was described by Q. Liu [17]. Our block CGS algorithm is closely related to the BCGS2 algorithm in [3] and the CGS2 algorithm in [1, 12]: matrix-matrix operations are substituted for matrix-vector operations, orthogonal decompositions (either the QR or the singular value decomposition) are substituted for normalizations, and inverse norm estimates and singular values are used in orthogonality tests instead of the size of vector norms. This leads to a BLAS-3 [10] or matrix-matrix operation oriented algorithm, which is better suited to modern computer architectures as it makes more effective use of caching. It requires O(mnp) operations.

∗ Received December 5, 2013. Accepted November 5, 2014. Published online on January 23, 2015. Recommended by L. Reichel. The research of Jesse L. Barlow was sponsored by the National Science Foundation under contract no. CCF-1115704.
† Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802–6822, USA (barlow@cse.psu.edu).
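As a point of reference for the O(mnp) target, the brute-force alternative to (1.3) is simply to refactor the surviving rows. A minimal NumPy sketch (function name hypothetical) of that baseline:

import numpy as np

def downdate_by_recompute(X, p):
    """Brute-force baseline for (1.3): recompute a thin QR factorization of X
    with its first p rows deleted.  This costs O((m - p) n^2) operations; the
    block downdating algorithm developed below reuses (1.1) to reach the same
    goal in O(mnp) operations with p << n."""
    Ubar, Rbar = np.linalg.qr(X[p:, :])   # Ubar is (m - p) x n left orthogonal
    return Ubar, Rbar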
The downdating problem for p = 1 is solved by adding the column b = e_1, the first column of the identity, to X in (1.1), updating its QR factorization, and obtaining the matrices Ū and R̄ in (1.3) as "by-products" of that updating process. For p > 1, we substitute for b = e_1 the left orthogonal matrix B ∈ R^{m×p} given by

(1.4) B = [ V ; 0 ],  V ∈ R^{p×p} orthogonal,

where V occupies the first p rows. Following a script in [3] for adding a block of columns B, we seek an integer k ≤ p, a left orthogonal matrix Q_B ∈ R^{m×k}, an upper trapezoidal matrix R_B ∈ R^{k×p}, and a matrix S_B ∈ R^{n×p} such that

(1.5) B = U S_B + Q_B R_B,
(1.6) U^T Q_B = 0.

Once the problem (1.5)–(1.6) is solved, we have the decomposition

[ V  X_0 ; 0  X̄ ] = [ Q_B  U ] [ R_B  0 ; S_B  R ].

We then let Z ∈ R^{(n+k)×(n+k)} be an orthogonal matrix such that

(1.7) Z^T [ R_B  0 ; S_B  R ] = [ R_V  Y_0 ; 0  R̄ ],

where R_V ∈ R^{p×p}, the column blocks have widths p and n, n̄ = n − p + k, and R̄ ∈ R^{n̄×n} remains upper trapezoidal. Applying Z to [ Q_B  U ], we obtain

(1.8) Ũ = [ Q_B  U ] Z = [ Ũ_1  Ũ_2 ],

where Ũ_1 has p columns and Ũ_2 has n̄ columns, and we note that

(1.9) [ V  X_0 ; 0  X̄ ] = Ũ [ R_V  Y_0 ; 0  R̄ ].

Thus,

[ V ; 0 ] = Ũ_1 R_V,

requiring that the matrix Ũ_1 in (1.8) has the form

(1.10) Ũ_1 = [ U_V ; 0 ],  U_V ∈ R^{p×p},

and implying that U_V, R_V ∈ R^{p×p} must be orthogonal. Since Ũ is left orthogonal, Ũ_2 ∈ R^{m×n̄} has the form

(1.11) Ũ_2 = [ 0 ; Ū ],

with a p × n̄ zero block on top, so that Ū in (1.11) and R̄ in (1.9) satisfy (1.3). Once Q_B, S_B, and R_B are obtained, the decomposition (1.7) can be constructed efficiently as a product of 2 × 2 orthogonal transformations.

Unfortunately, if the matrix

(1.12) C = [ B  U ]

is rank deficient, the problem of obtaining Q_B, R_B, and S_B in (1.5)–(1.6) is ill-posed; the left orthogonal matrix Q_B is not unique, R_B is rank deficient, and the resulting factorization in (1.3) is rank deficient. From (1.5)–(1.6), C in (1.12) can be factored into

C = [ Q_B  U ] [ R_B  0 ; S_B  I_n ].

Since [ Q_B  U ] is left orthogonal and the matrix on the right is quasi-upper triangular, C is rank deficient (ill-conditioned) only if R_B is. There is little difficulty in obtaining Q_B, R_B, and S_B that satisfy (1.5) with a small residual, i.e., where ‖B − U S_B − Q_B R_B‖_2 is small. However, if C is (near) rank deficient, the ill-conditioning in R_B can make it difficult to obtain Q_B such that ‖U^T Q_B‖_2 is small, i.e., such that (1.6) has a small residual. Understanding this issue and constructing an algorithm that addresses it are the main themes of the text below.

To develop block CGS downdating, we relax the assumption that U is left orthogonal and instead assume that

(1.13) ‖I_n − U^T U‖_2 ≤ ξ ≪ 1

for some small and unknown value ξ. In some contexts (as in, say, [3]), we may assume that ξ ≤ f(m, n) ε_M, where ε_M is machine precision and f(m, n) is a modestly growing function, but, in our discussion, we simply assume that it is "small." It is possible that ξ depends on the condition number of R, for example, when U is the result of a modified Gram-Schmidt QR factorization [6]. Using the assumption (1.13), in Section 2 we design our algorithm with the goal of computing Q_B ∈ R^{m×k}, R_B ∈ R^{k×p} upper trapezoidal, and S_B ∈ R^{n×p}, k ≤ p, such that

(1.14) ‖B − U S_B − Q_B R_B‖_2 ≤ √5 ξ + O(ξ²),
(1.15) ‖U^T Q_B‖_2 ≤ 0.5 ξ + O(ξ²).

When k < p, i.e., when R_B is strictly upper trapezoidal, the algorithm produces a lower bound estimate ξ_est for ξ in (1.13).
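To make the roles of Z and Ũ in (1.7)–(1.11) concrete, the following NumPy sketch (hypothetical helper; a dense stand-in, not the structured algorithm of Section 3.1) assembles the downdated factorization from any Q_B, S_B, R_B satisfying (1.5)–(1.6). It takes Z from a full QR factorization and, because a generic Z does not keep R̄ upper trapezoidal, finishes with one extra dense QR:

import numpy as np

def finish_downdate(U, R, Q_B, S_B, R_B):
    """Carry out (1.7)-(1.11) with dense factorizations, given
    B = U S_B + Q_B R_B and U^T Q_B ~ 0 as in (1.5)-(1.6)."""
    m, n = U.shape
    k, p = R_B.shape
    T = np.block([[R_B, np.zeros((k, n))],
                  [S_B, R]])               # middle factor in (1.7)
    # any orthogonal Z with Z^T [R_B; S_B] = [R_V; 0] serves for this sketch
    Z, _ = np.linalg.qr(T[:, :p], mode='complete')
    W = Z.T @ T                            # = [ R_V  Y_0 ; 0  Rbar ]
    Ut = np.hstack([Q_B, U]) @ Z           # = [ Utilde_1  Utilde_2 ], cf. (1.8)
    Ubar, Rbar = Ut[p:, p:], W[p:, p:]     # top block of Utilde_2 is ~0 by (1.11)
    Q_fix, Rbar = np.linalg.qr(Rbar)       # restore the triangularity of Rbar
    return Ubar @ Q_fix, Rbar

In exact arithmetic, X̄ = Ū R̄ then follows from the bottom block row of (1.9); the structured Z of Section 3.1 avoids both the dense complete QR and the trailing fix-up factorization.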
The algorithm we develop and the bounds associated with it, (1.14)–(1.15), assume the reliability of an O(p² log p) operation heuristic that, for an upper triangular matrix R_2, finds the largest integer k such that, for a prescribed constant β_orth, ‖R_2^{-1}(1:k, 1:k)‖_2 ≤ β_orth.

The outline of this paper is as follows. The algorithm for computing Q_B, R_B, and S_B that satisfy (1.14)–(1.15) is assembled in Section 2. It is based firmly upon the function block_CGS2_step from [3, Function 2.2], which we restate as Function 2.1. Our modification, given as block_CGS2_down (Function 2.2), is justified by Theorems 2.1 and 2.2; block_CGS2_step requires a condition number estimate to determine the appropriate value k for the dimensions of Q_B and R_B, and this leads to the function block_downdate_info (Function 2.4). We forgo a backward error analysis of block_downdate_info, but this could be established from the backward error analysis of block_CGS2_step in [3, Section 3.2]. In Section 3.1, we perform a sequence of Givens rotations to produce Z in (1.7) that results in the factorization (1.3). In Section 3.2, we give an analysis of the relationship between ‖I_n̄ − Ū^T Ū‖_F and ‖I_n − U^T U‖_F and show how, in practice, Ũ_1 and Ũ_2 deviate from the structural orthogonality in (1.10)–(1.11). Numerical tests are presented in Section 4 along with a Householder-based algorithm for adding a block of rows in Section 4.1. Proofs of important theorems are given in Section 5, and we summarize with a brief conclusion in Section 6.

FUNCTION 2.1 (Function block_CGS2_step from [3]).

function [QB, SB, RB] = block_CGS2_step(U, B)
%
% First block Gram-Schmidt step: produces Y1 = (I − U U^T)B.
% The left orthogonal Q1 satisfies Range(Q1) = Range(Y1) = Range((I_m − U U^T)B)
% if R1 is nonsingular.
%
(1) S1 = U^T B;
(2) Y1 = B − U S1;
(3) Q1 R1 = Y1;  % QR decomposition of Y1
%
% Second block Gram-Schmidt step: produces Y2 = (I − U U^T)Q1.
% The left orthogonal QB satisfies Range(QB) = Range(Y2) = Range((I_m − U U^T)² B)
% if R1 and R2 are nonsingular.
%
(4) S2 = U^T Q1;
(5) Y2 = Q1 − U S2;
(6) QB R2 = Y2;  % QR decomposition of Y2
%
% Assemble RB and SB to satisfy (1.5)
%
(7) SB = S1 + S2 R1;  RB = R2 R1;
end block_CGS2_step

2. A block Gram-Schmidt algorithm for downdating.

2.1. The functions block_CGS2_step and block_CGS2_down. We begin our development by examining the function block_CGS2_step from [3, Function 2.2] (given as Function 2.1 here) applied to a general matrix B ∈ R^{m×p} with the intention of producing matrices Q_B, R_B, and S_B to solve (1.5)–(1.6). The function performs two block classical Gram-Schmidt (BCGS) steps on B. The resulting matrices Q_B and R_B satisfy Q_B R_B = (I_m − U U^T)² B, and Q_B, S_B, and R_B satisfy (1.5). The remaining issue is the extent to which (1.6), or at least (1.15), can be satisfied. In [3], Function 2.1 is part of a QR factorization of a larger matrix, and the conditions on that QR factorization implicitly impose conditions on R_B relative to the loss of orthogonality in U.

Assuming U satisfies (1.13) and that R_B is nonsingular, we have that

U^T Q_B = (I_n − U^T U)² U^T B R_B^{-1},  ‖U‖_2 ≤ (1 + ξ)^{1/2}.

A crude norm bound is

‖U^T Q_B‖_2 ≤ ‖I_n − U^T U‖_2² ‖U‖_2 ‖B‖_2 ‖R_B^{-1}‖_2 ≤ ξ² (1 + ξ)^{1/2} ‖B‖_2 ‖R_B^{-1}‖_2.

If

(2.1) ξ ‖B‖_2 ‖R_B^{-1}‖_2 ≤ c_orth

for some constant c_orth, then

(2.2) ‖U^T Q_B‖_2 ≤ c_orth ξ + O(ξ²).
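The effect of the second block Gram-Schmidt pass in Function 2.1 is easy to observe numerically. In the small experiment below (illustrative data of our own choosing, not from the paper), U is perturbed away from left orthogonality and steps (1)–(6) are applied to a random B; the printed norms show the one-pass loss of orthogonality of order ξ shrinking toward the ξ²-level quantity bounded in (2.2):

import numpy as np

rng = np.random.default_rng(0)
m, n, p = 300, 250, 40
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
U = U0 + 1e-8 * rng.standard_normal((m, n))   # left orthogonality mildly lost
xi = np.linalg.norm(np.eye(n) - U.T @ U, 2)   # the xi of (1.13)

B = rng.standard_normal((m, p))
S1 = U.T @ B                                  # steps (1)-(3)
Q1, R1 = np.linalg.qr(B - U @ S1)
S2 = U.T @ Q1                                 # steps (4)-(6)
QB, R2 = np.linalg.qr(Q1 - U @ S2)

print(xi)
print(np.linalg.norm(U.T @ Q1, 2))            # after one pass: O(xi)
print(np.linalg.norm(U.T @ QB, 2))            # after two passes: cf. (2.2)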
The inequality (2.1) poses two problems: we cannot always guarantee that R_B is nonsingular, much less that it satisfies (2.1), and ξ is not necessarily known. The relationship (2.1)–(2.2) can be made columnwise in that, for any positive integer k ≤ p,

(2.3) ‖U^T Q_B(:, 1:k)‖_2 ≤ c_orth ξ + O(ξ²)

if

(2.4) ξ ‖B(:, 1:k)‖_2 ‖R_B^{-1}(1:k, 1:k)‖_2 ≤ c_orth.

Unfortunately, (2.3)–(2.4) does not lead to such a bound as in (1.14). The downdating problem (1.3) demands that we choose B of the form (1.4), but our ideas for choosing Q_B, R_B, and S_B apply to the situation in [3] where B is more general, provided that we normalize it so that ‖B‖_2 = 1. For the downdating problem, R_B and S_B are discarded once the decomposition (1.3) is computed, whereas for the application in [3], these quantities are needed in subsequent computations.

To delete the first p rows from the QR decomposition (1.1), we let

B_0 = [ I_p ; 0 ]

and look at the application of Function 2.1 to B_0. Steps (1)–(2) of Function 2.1 read

S_1 = U^T B_0 = U(1:p, :)^T,  Y_1 = B_0 − U S_1.

Instead of computing the QR decomposition in step (3), we compute the singular value decomposition (SVD) of Y_1. Thus, we have

(2.5) Y_1 = Q_1 R_1 V^T,  Q_1 ∈ R^{m×p} left orthogonal,  V ∈ R^{p×p} orthogonal,
(2.6) R_1 = diag(ρ_1, . . . , ρ_p),  ρ_1 ≥ · · · ≥ ρ_p ≥ 0.

REMARK 2.1. In (2.5), the matrix R_1 in statement (3) of Function 2.1 is replaced by the diagonal matrix of Y_1's singular values rather than by its upper triangular factor. We use the same notation for it since it is the matrix R_1 obtained by Function 2.1 with B given by (1.4) and V being the right singular vector matrix in (2.5). Moreover, R_1 is used in the same way in the algorithm described below as it is in Function 2.1. We substitute the SVD because it sets us up to extract useful information for solving the problem (1.14)–(1.15).

To obtain a matrix Q_B that is nearly orthogonal to U, the second BCGS step, steps (4)–(5) in Function 2.1, reads

S_2 = U^T Q_1,  Y_2 = Q_1 − U S_2,
(2.7) Q̂_B R_2 = Y_2, QR decomposition, Q̂_B ∈ R^{m×p} left orthogonal, R_2 ∈ R^{p×p} upper triangular.

To guarantee that Q_1 and Q̂_B are left orthogonal to near machine accuracy, that is,

(2.8) ‖I_p − Q^T Q‖_F ≤ ε_M L(m, p) ≤ ξ,  Q ∈ {Q_1, Q̂_B},

where L(m, p) = O(mp^{3/2}), ε_M is the machine unit, and ξ is defined in (1.13), we recommend that (2.5) be computed with either the Golub-Kahan-Householder (GKH) SVD [13] or the Lawson-Hanson-Chan (LHC) SVD [16, Section 18.5], [8] and that the QR factorization (2.7) be computed using the Householder QR factorization [7]. Step (7) of Function 2.1 becomes

(2.9) S_B = S_1 V + S_2 R_1,  R̂_B = R_2 R_1.

Equations (2.7) and (2.9) yield Q̂_B, R̂_B, and S_B such that

(2.10) B = U S_B + Q̂_B R̂_B.

We write Q̂_B and R̂_B because we are not yet ready to accept these matrices as solutions to (1.14)–(1.15); S_B, however, will not be further modified by our algorithms. The only difference between (2.9) and step (7) of Function 2.1 is the presence of V in the computation of S_B.

The modifications (2.5) and (2.9) to block_CGS2_step produce block_CGS2_down, given in Function 2.2. This function produces the same Q̂_B, R̂_B, and S_B as Function 2.1 applied to U and B in (1.4). Note that it also outputs R_1, the diagonal matrix of singular values from the SVD in (2.5)–(2.6), and the upper triangular matrix R_2.

FUNCTION 2.2 (The function block_CGS2_down).

function [Q̂B, SB, R̂B, R2, R1] = block_CGS2_down(U, p)
%
% First orthogonalization of B0 = [I_p; 0] against U
%
(1) S1 = U^T [I_p; 0]  (= U(1:p, :)^T);
(2) Y1 = [I_p; 0] − U S1;
(3) Q1 R1 V^T = Y1;  % SVD of Y1
%
% Second orthogonalization
%
(4) S2 = U^T Q1;  Y2 = Q1 − U S2;
(5) Q̂B R2 = Y2;  % QR decomposition of Y2
(6) SB = S1 V + S2 R1;  R̂B = R2 R1;
end block_CGS2_down
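A direct NumPy transcription of Function 2.2 may clarify the bookkeeping (a sketch: np.linalg.svd and np.linalg.qr stand in for the GKH/LHC SVD and the Householder QR recommended above):

import numpy as np

def block_CGS2_down(U, p):
    """NumPy sketch of Function 2.2.  R1 is returned as the diagonal matrix
    of singular values of Y1, as in (2.5)-(2.6)."""
    m, n = U.shape
    S1 = U[:p, :].T                              # (1)  S1 = U^T [I_p; 0]
    Y1 = -U @ S1                                 # (2)  Y1 = [I_p; 0] - U S1
    Y1[:p, :] += np.eye(p)
    Q1, rho, Vt = np.linalg.svd(Y1, full_matrices=False)   # (3)  SVD of Y1
    R1 = np.diag(rho)                            # rho_1 >= ... >= rho_p >= 0
    S2 = U.T @ Q1                                # (4)
    Y2 = Q1 - U @ S2
    QB_hat, R2 = np.linalg.qr(Y2)                # (5)  QR decomposition of Y2
    SB = S1 @ Vt.T + S2 @ R1                     # (6)  note the extra V of (2.9)
    RB_hat = R2 @ R1
    # the block implicitly orthogonalized is B = [V; 0] of (1.4) with V = Vt.T,
    # so (2.10) can be checked:  B ~ U @ SB + QB_hat @ RB_hat
    return QB_hat, SB, RB_hat, R2, R1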
REMARK 2.2. Function 2.2 requires three matrix multiplications with U, one SVD of Y_1 using the GKH or LHC SVD, one QR decomposition of Y_2, two matrix multiplications and one matrix addition to form S_B, and one multiplication of a triangular matrix by a diagonal matrix to form R_B. This totals 6mnp + 10mp² + (20/3)p³ + 2np² + O(m + n) operations. The dominant cost is the matrix multiplication with U.

Although equation (2.10) guarantees that Q̂_B, R̂_B, and S_B satisfy (1.5) (and thus also (1.14)), we cannot guarantee that Q_B = Q̂_B satisfies (1.15). However, we are able to identify a subset of the columns of Q̂_B that are sufficiently orthogonal to U. More precisely, for a given constant c_orth such that 0 < c_orth ≤ 1, we intend to find the largest integer k such that

(2.11) ‖U^T Q̂_B(:, 1:k)‖_2 ≤ c_orth ξ + O(ξ²),
(2.12) ‖U^T Q̂_B(:, 1:k)‖_F ≤ c_orth ξ_F + O(ξ_F²),  ξ_F ≥ max{‖I_n − U^T U‖_F, ξ}.

In this paper, we default to c_orth = 0.5 as it was used in [4] for the case p = 1. Daniel et al. [9] used c_orth = 1 for the case p = 1 since it maintains roughly the same level of orthogonality. Values of c_orth > 1 are not desirable since they would make the bound in (2.11) larger than the bound in (1.13). On the other hand, we do not want the restriction (2.11) to be too harsh; thus we recommend that c_orth be bounded away from zero.

To assure the bound (1.15), we need two theorems, both of which require the definition

(2.13) β_j = ‖R_2^{-1}(1:j, 1:j)‖_2,  j = 1, . . . , p,  β_0 = 0,

and both of which are proved in Section 5.1. Equation (2.13) defines the norms of the inverses of the leading principal submatrices of R_2 in (2.7). The first theorem relates the values β_j in (2.13) to the singular values in the diagonal matrix R_1 in (2.5).

THEOREM 2.1. Assume that U satisfies (1.13). Let Q_1 in (2.5) be left orthogonal and β_j, j = 1, . . . , p, be given by (2.13). Let ρ_j, j = 1, . . . , p, be the singular values of Y_1 in (2.5). If, for a given constant c_orth with 0 < c_orth ≤ 1, k is the largest integer such that

(2.14) β_k ≤ β_orth = (1 + c_orth²)^{1/2}

and k < p, then

(2.15) ρ_j ≤ α_orth ξ + O(ξ²),  α_orth = (1 + c_orth²)^{1/2} / c_orth,  for j = k + 1, . . . , p.

Using c_orth = 0.5, the two constants in Theorem 2.1 are β_orth = √5/2 and α_orth = √5. If k < p, then the inequality ρ_{k+1} ≤ α_orth ξ + O(ξ²) yields the lower bound estimate of ξ given by

(2.16) ξ_est = ρ_{k+1} / α_orth.

This estimate may be useful in determining whether U has drifted too far from being left orthogonal after a sequence of modifications to the QR factorization.

A second theorem, which follows from Theorem 2.1, shows that establishing k in (2.14) leads to an algorithm to produce Q_B, S_B, and R_B satisfying (1.14)–(1.15).

THEOREM 2.2. Assume the hypothesis and terminology of Theorem 2.1 including the assumption that k satisfies (2.14). Let Q̂_B, R̂_B, and S_B be the output of Function 2.2. Then
we have (2.11)–(2.12) and

(2.17) B = U S_B + Q̂_B(:, 1:k) R̂_B(1:k, :) + D_k,

where D_p = 0_{m×p},

(2.18) D_k = Q̂_B(:, k+1:p) R̂_B(k+1:p, k+1:p),  k < p,

and

(2.19) ‖D_k‖_2 ≤ 0 if k = p, and ‖D_k‖_2 ≤ ρ_{k+1} if k < p.

Thus, using the bound for ρ_{k+1} in (2.15), for c_orth = 0.5, we have that Q_B = Q̂_B(:, 1:k), R_B = R̂_B(1:k, :), and S_B satisfy (1.14)–(1.15). In the next section, we give an O(p² log p) operation algorithm to find the integer k that satisfies (2.14).

2.2. Finding the largest k such that Q̂_B(:, 1:k) is near orthogonal to U. To produce an algorithm to find the largest k satisfying (2.14) for the upper triangular matrix R_2 in step (5) of Function 2.2, we need an algorithm to compute ‖R_2^{-1}(1:j, 1:j)‖_2 with reasonable accuracy for a given j, and we need a binary search. To produce the first, we compute z_j, w_j ∈ R^j such that

R_2^{-1}(1:j, 1:j) w_j = β_j z_j,  R_2^{-T}(1:j, 1:j) z_j = β_j w_j + f_j,

thus yielding the approximate leading singular triplet (β_j, z_j, w_j) of R_2^{-1}(1:j, 1:j). The vector f_j is a residual that satisfies

w_j^T f_j = 0,  ‖f_j‖_2 ≤ tol · β_j

for some tolerance tol. This can easily be done with a few steps of a Golub-Kahan-Lanczos (GKL) bidiagonal reduction followed by an algorithm to find the largest singular triplet of a bidiagonal matrix. This approach is related to ideas in Ferng, Golub, and Plemmons [11] and to ideas that have been used in ULV decompositions [2, 5]. If we are seeking the leading singular value of R_2^{-1}(1:j, 1:j), such a procedure is akin to finding the leading eigenvalue of a symmetric positive definite matrix by the Lanczos algorithm, and no reorthogonalization is necessary in this circumstance [19]. Since the details of the GKL bidiagonal reduction and the process of extracting the leading singular triplet are explored in detail elsewhere (see, for instance, [14, Chapters 8 and 9]), we skip them here and simply assume that the triplet (β_j, z_j, w_j) can be delivered by the "black box" call

[β_j, z_j, w_j] = GKL_inv_norm(R_2(1:j, 1:j)).

FUNCTION 2.3 (Lanczos-based routine for finding k satisfying (2.14)).

function k = max_col_orth(R2, β_orth)
p = length(R2);
if β_orth |R2(1,1)| < 1
  % All of R2 is too small; no columns are guaranteed orthogonal
  k = 0;
else
  [β, z, w] = GKL_inv_norm(R2);
  if β <= β_orth
    % ‖R2^{-1}‖_2 ≤ β_orth; all columns are guaranteed orthogonal
    k = p;
  else
    % Do binary search.
    first = 1;  last = p;
    % At any given point in this loop, first ≤ k < last.
    while last − first > 1
      middle = ⌊(first + last)/2⌋;
      cols = 1 : middle;
      [β, z, w] = GKL_inv_norm(R2(cols, cols));
      if β > β_orth
        last = middle;
      else
        first = middle;
      end;
    end;
    k = first;
  end;
end;
end max_col_orth

Coupling GKL_inv_norm with a binary search that successively brackets k in the interval first ≤ k < last, starting with first = 1 and last = p and converging when k = first = last − 1, yields the function max_col_orth (Function 2.3), which produces k in (2.14).
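A NumPy transcription of Function 2.3 follows (a sketch: a dense SVD stands in for the GKL_inv_norm black box, which reproduces the values β_j but not the O(p² log p) operation count):

import numpy as np

def inv_norm(R2jj):
    """Stand-in for GKL_inv_norm: ||R2(1:j,1:j)^{-1}||_2 from a dense SVD."""
    sigma_min = np.linalg.svd(R2jj, compute_uv=False)[-1]
    return np.inf if sigma_min == 0.0 else 1.0 / sigma_min

def max_col_orth(R2, beta_orth):
    """Largest k with ||R2(1:k,1:k)^{-1}||_2 <= beta_orth, by binary search;
    this relies on beta_j being nondecreasing in j."""
    p = R2.shape[0]
    if beta_orth * abs(R2[0, 0]) < 1.0:
        return 0                      # even beta_1 exceeds beta_orth
    if inv_norm(R2) <= beta_orth:
        return p                      # all columns are guaranteed orthogonal
    first, last = 1, p                # loop invariant: first <= k < last
    while last - first > 1:
        middle = (first + last) // 2
        if inv_norm(R2[:middle, :middle]) > beta_orth:
            last = middle
        else:
            first = middle
    return first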
2.3. The necessary information for a block downdate. The following function, Function 2.4, uses block_CGS2_down (Function 2.2) to produce Q̂_B, S_B, and R̂_B and then uses max_col_orth (Function 2.3) to find k such that Q_B = Q̂_B(:, 1:k), S_B, and R_B = R̂_B(1:k, :) satisfy (1.14)–(1.15) as shown by Theorem 2.2.

FUNCTION 2.4 (Block Downdate Information).

function [QB, SB, RB, k, ξ_est] = block_downdate_info(U, p)
%
% Produces information for a block downdate operation for deleting the first
% p rows from a QR decomposition where U is a near left orthogonal matrix.
% The function max_col_orth from Section 2.2 is called to find the largest
% integer k such that ‖R2^{-1}(1:j, 1:j)‖_2 ≤ β_orth for 1 ≤ j ≤ k.
%
% Define constants used in the function.
% Specify c_orth = 0.5 to get the bounds (1.14)–(1.15).
%
(1) c_orth = 0.5;  β_orth = (1 + c_orth²)^{1/2};  α_orth = β_orth / c_orth;
(2) [Q̂B, SB, R̂B, R2, R1] = block_CGS2_down(U, p);
%
% Determine "rank = k", the number of columns of Q̂B guaranteed orthogonal to U.
%
(3) k = max_col_orth(R2, β_orth);
%
% Give the lower bound estimate ξ_est in (2.16) for ξ.
% Note that ρ_{k+1} = R1(k+1, k+1).
%
(4) if k < p
(5)   ξ_est = R1(k+1, k+1)/α_orth;
(6) else
(7)   ξ_est = 0;
(8) end;
%
% Produce the rank k solution to satisfy (1.14)–(1.15).
%
(9) RB = R̂B(1:k, :);
(10) QB = Q̂B(:, 1:k);
end block_downdate_info

REMARK 2.3. Except for the O(p² log p) operations for max_col_orth, almost all of the operations for Function 2.4 are from block_CGS2_down; thus it requires 6mnp + 10mp² + (20/3)p³ + 2np² + O(m + n + p² log p) operations.

3. Producing a new QR factorization.

3.1. The algorithm to produce a new QR factorization. The remaining step in performing the block downdate is to find an orthogonal matrix Z such that

(3.1) Z^T [ R_B  0 ; S_B  R ] = [ R_V  Y_0 ; 0  R̄ ],  n̄ = n − p + k,

where the row blocks on the left have k and n rows, those on the right have p and n̄ rows, the column blocks have widths p and n, and R̄ remains upper trapezoidal. First, permute the rows of the above matrix so that

P^T [ R_B  0 ; S_B  R ] = [ R_B  0 ; S_B(n̄+1:n, :)  R(n̄+1:n, :) ; S_B(1:n̄, :)  R(1:n̄, :) ].

We then let Z̆_0 be a product of Householder transformations such that

Z̆_0^T [ R_B ; S_B(n̄+1:n, :) ] = R̆_B,

where R̆_B is upper triangular. Letting Z_0 = diag(Z̆_0, I_n̄), we have

Z_0^T [ R_B  0 ; S_B(n̄+1:n, :)  R(n̄+1:n, :) ; S_B(1:n̄, :)  R(1:n̄, :) ] = [ R̆_B  Y̆_0 ; S̆_B  R̆ ],

with row blocks of p and n̄ rows. Note that R̆ is upper trapezoidal and that Y̆_0 is nonzero only in columns n̄+1, . . . , n.

The remaining orthogonal transformations in Z are either Givens rotations or Householder transformations applied to only two rows. For simplicity, we refer to both kinds as "Givens rotations". Since R̆_B is upper triangular, we let

(3.2) Z̆ = Z_1 · · · Z_p,

where each Z_j, j = 1, . . . , p, is given by Z_j = G_{j,p+n̄} · · · G_{j,p+1} and G_{j,ℓ} is a Givens rotation rotating rows j and ℓ and inserting a zero in position (j, ℓ), so that

Z̆^T [ R̆_B ; S̆_B ] = [ R_V ; 0 ].

In terms of data movement, (3.2) is a poor Givens ordering. A better one is to let

Z̆ = Ẑ_1 · · · Ẑ_{p−1} Ẑ_p · · · Ẑ_{n̄} · · · Ẑ_{n̄+p−1}.

Here we have

Ẑ_j = G_{j,n̄+p} · · · G_{1,n̄+p−j+1},  for j < p,
Ẑ_j = G_{p,n̄+2p−j} · · · G_{1,n̄+p−j+1},  for p ≤ j ≤ n̄,
Ẑ_j = G_{p,n̄+2p−j} · · · G_{j−n̄+1,p+1},  for n̄ < j < n̄ + p.

On its set of "active rows," each of these Ẑ_j has the form

Ẑ_j = [ Γ_j  Δ_j ; −Δ_j  Γ_j ],  Γ_j² + Δ_j² = I,

where Γ_j and Δ_j are diagonal. The orthogonal factor Z in the operation above is given by

Z = P Z_0 Ẑ_1 · · · Ẑ_{n̄+p−1}.

REMARK 3.1. Ignoring terms of O(mn), the algorithm described in this section requires 2mp² + (10/3)p³ − 2kp² operations to calculate and apply Z_0 and 6mn̄p + 3nn̄p + 3n̄p² operations to calculate and apply Z̆, for a total of 6mn̄p + 3nn̄p + 3n̄p² + 2mp² + (10/3)p³ − 2kp² operations. The dominant cost in this algorithm is the operation of updating the orthogonal factor as in (1.8).
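For concreteness, here is a from-scratch NumPy sketch of the retriangularization just described (hypothetical helper: it applies P, a Householder-based Z̆_0, and the simple Givens ordering (3.2) rather than the cache-friendly one, and it accumulates Z^T densely only so that the result can be checked against (3.1)):

import numpy as np

def retriangularize(R_B, S_B, R, k):
    """Compute Z^T [R_B 0; S_B R] = [R_V Y_0; 0 Rbar] as in (3.1), keeping
    Rbar upper trapezoidal.  Returns R_V, Y_0, Rbar, and Zt = Z^T."""
    p = R_B.shape[1]
    n = R.shape[1]
    nbar = n - p + k
    M = np.block([[R_B, np.zeros((k, n))], [S_B, R]])
    # P^T: keep the R_B rows, then rows nbar+1..n of [S_B R], then rows 1..nbar
    perm = list(range(k)) + list(range(k + nbar, k + n)) + list(range(k, k + nbar))
    M = M[perm]
    Zt = np.eye(k + n)[perm]
    # Z0: Householder QR making the leading p x p block upper triangular
    Q0, _ = np.linalg.qr(M[:p, :p])
    M[:p, :] = Q0.T @ M[:p, :]
    Zt[:p, :] = Q0.T @ Zt[:p, :]
    # Givens sweep: for each j, zero column j of the S-block against row j;
    # the rotations of Z_j^T are applied from row p+nbar down to row p+1,
    # which is what preserves the trapezoidal form of Rbar
    for j in range(p):
        for i in range(p + nbar - 1, p - 1, -1):
            r = np.hypot(M[j, j], M[i, j])
            if r == 0.0:
                continue
            c, s = M[j, j] / r, M[i, j] / r
            G = np.array([[c, s], [-s, c]])
            M[[j, i], :] = G @ M[[j, i], :]
            Zt[[j, i], :] = G @ Zt[[j, i], :]
    return M[:p, :p], M[:p, p:], M[p:, p:], Zt

The new orthogonal factor is then Ũ = [ Q_B  U ] Zt^T as in (1.8), and Ū is its trailing (m − p) × n̄ block.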
3.2. Properties of the new QR factorization. In our new QR factorization, we have Ũ given in (1.8) with

(3.3) Ũ = [ U_V  ΔU ; ΔU_V  Ū ],

with row blocks of p and m − p rows and column blocks of widths p and n̄, where, if ξ = 0, ΔU = 0_{p×n̄} and ΔU_V = 0_{(m−p)×p}. In practice, this will not necessarily be the case. Given the QR factorization (1.1), our downdate algorithm has produced

[ V  X_0 ; 0  X̄ ] = [ Q_B  U ] [ R_B  0 ; S_B  R ] + [ D_k  0 ]
                  = [ Q_B  U ] Z Z^T [ R_B  0 ; S_B  R ] + [ D_k  0 ]
                  = Ũ [ R_V  Y_0 ; 0  R̄ ] + [ D_k  0 ]
(3.4)             = [ U_V  ΔU ; ΔU_V  Ū ] [ R_V  Y_0 ; 0  R̄ ] + [ D_k  0 ].

Blockwise, that is

[ V ; 0 ] = [ U_V ; ΔU_V ] R_V + D_k,
X_0 = U_V Y_0 + (ΔU) R̄,
(3.5) X̄ = (ΔU_V) Y_0 + Ū R̄.

We accept Ū above to produce (1.3). Thus we need bounds for

(3.6) ‖X̄ − Ū R̄‖_F

and

(3.7) ‖I_n̄ − Ū^T Ū‖_F.

For the quantity in (3.6), we could use an identical argument to obtain a bound in the 2-norm, but for the quantity in (3.7), the Frobenius norm yields a more meaningful bound. Bounds for ‖ΔU‖_2 and ‖ΔU_V‖_2 are also desirable. Theorems 3.1 and 3.3 are proved in Section 5.2. The short proof of Corollary 3.2, which follows from Theorem 3.1, is given in this section.

THEOREM 3.1. Let Ũ be the result of applying the matrix Z defined to perform the operation (1.7), using Q_B, R_B, and S_B produced by Function 2.4 with input U satisfying (1.13) and Q_B satisfying (2.8). Assume that Q_B is exactly left orthogonal. If Ũ is partitioned according to (3.3), then

(3.8) ‖ΔU_V‖_2 ≤ α_orth ξ + O(ξ²),
(3.9) ‖ΔU‖_2 ≤ (α_orth + γ_orth) ξ + O(ξ²),
(3.10) ‖I_{n+k} − Ũ^T Ũ‖_2 ≤ γ_orth ξ + O(ξ²),

where γ_orth = 1 + c_orth.

COROLLARY 3.2. Assume the terminology and hypothesis of Theorem 3.1. Then

(3.11) ‖X̄ − Ū R̄‖_F ≤ ‖ΔU_V‖_2 ‖X‖_F
(3.12)              ≤ α_orth ξ ‖X‖_F + O(ξ²).

Proof. From (3.5), we have

X̄ − Ū R̄ = (ΔU_V) Y_0,  where  Y_0 = Z(:, 1:p)^T [ 0 ; R ].

Thus,

‖X̄ − Ū R̄‖_F = ‖(ΔU_V) Y_0‖_F ≤ ‖ΔU_V‖_2 ‖Y_0‖_F.

Since

‖Y_0‖_F = ‖Z(:, 1:p)^T [ 0 ; R ]‖_F ≤ ‖X‖_F,

we have (3.11). The bound (3.8) gives us (3.12).

We now let

Ũ_1 = [ U_V ; ΔU_V ],  Ũ_2 = [ ΔU ; Ū ],

each partitioned into p and m − p rows, and present a theorem bounding the quantity in (3.7).

THEOREM 3.3. Assume the terminology and hypothesis of Theorem 3.1. Let Q_B satisfy (2.8), and define ξ_F = max{‖I_n − U^T U‖_F, ξ}. Then

(3.13) ‖I_n̄ − Ũ_2^T Ũ_2‖_F² ≤ ξ_F² + 2(‖U^T Q_B‖_F² − ‖Ũ_1^T Ũ_2‖_F²) + (‖I_k − Q_B^T Q_B‖_F² − ‖I_p − Ũ_1^T Ũ_1‖_F²),
(3.14) ‖I_n̄ − Ū^T Ū‖_F ≤ ‖I_n̄ − Ũ_2^T Ũ_2‖_F + ‖(ΔU)^T (ΔU)‖_F
(3.15)               ≤ ‖I_n̄ − Ũ_2^T Ũ_2‖_F + √p (α_orth + γ_orth)² ξ² + O(ξ_F³).

Thus,

(3.16) ‖I_n̄ − Ū^T Ū‖_F² ≤ (5/2) ξ_F² − 2 ‖Ũ_1^T Ũ_2‖_F² − ‖I_p − Ũ_1^T Ũ_1‖_F² + O(ξ_F³).

Theorem 3.3 establishes that the loss of orthogonality in Ū will not be significantly worse than that in U. Moreover, it is possible that Ū is closer to a left orthogonal matrix than U is, since ‖I_p − Ũ_1^T Ũ_1‖_F and ‖Ũ_1^T Ũ_2‖_F may contain a significant portion of the loss of orthogonality in Ũ, while ‖I_k − Q_B^T Q_B‖_F will be near machine accuracy and our implied bound from (1.15) for ‖U^T Q_B‖_F could be quite pessimistic. In the next section, our numerical tests on a sliding window problem bear out this observation.
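The quantities (3.6)–(3.7) are exactly what the numerical tests of the next section monitor; a small hypothetical measurement helper:

import numpy as np

def downdate_diagnostics(X, p, Ubar, Rbar):
    """Measure the relative residual of Corollary 3.2 and the orthogonality
    loss bounded in Theorem 3.3 (test-harness sketch, not an algorithm step)."""
    nbar = Ubar.shape[1]
    rel_residual = np.linalg.norm(X[p:, :] - Ubar @ Rbar) / np.linalg.norm(X)
    orth_loss = np.linalg.norm(np.eye(nbar) - Ubar.T @ Ubar)
    return rel_residual, orth_loss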
4. Numerical tests. We consider a classic "sliding window" problem from statistics. Here, we compute the QR decomposition of a matrix X(t) ∈ R^{m×n} which is a slice of a large matrix X_big ∈ R^{M×n}. The results displayed are for M = 4000, m = 300, n = 250. At step t,

X(t) = X_big(p(t−1)+1 : p(t−1)+m, :),

where, in the example shown, p = 40. The matrix X_big is constructed by generating a random M × n matrix using MATLAB's randn function, which simulates a standard normal distribution, and then multiplying the rows at random by factors of 1, 10^{−7}, 10^{−14}, and 10^{−21}. Thus X_big is a random matrix with rows that have large and small entries.

4.1. Adding a block of rows to a QR factorization. For the purposes of our numerical tests in Section 4.2, we also need an algorithm that adds a block of rows to a QR factorization (a sketch combining this update with the downdating step is given at the end of this section). Again, supposing that we already have the factorization (1.1) and we wish to add a p × n block of rows given by X_new, we simply compute the QR factorization

[ R ; X_new ] = Q_new [ R_new ; 0 ],

where Q_new is the product of n Householder transformations. Thus, the new QR factorization is

[ X ; X_new ] = U_new R_new,  where  U_new = [ U  0 ; 0  I_p ] Q_new(:, 1:n).

4.2. The sliding window experiment. Given the QR factorization X(t) = U(t)R(t) at step t, the QR factorization of X(t+1) is produced by adding p rows at the bottom of X(t) using the algorithm in Section 4.1, deleting the p rows at the top of X(t), and updating its QR factorization using Function 2.4 followed by the algorithm in Section 3.1 to obtain X(t+1) = U(t+1)R(t+1). For t = 1, 2, . . ., rank changes were frequent because of the wild scaling of X_big.

At t = 1, we produce the QR decomposition X(1) = U(1)R(1) using the modified Gram-Schmidt algorithm. Since X(1) is ill-conditioned, consistent with the bound in [6], we expect that ‖I − U(1)^T U(1)‖_2 will be significantly larger than the IEEE double precision machine unit ε_M = 2^{−53} ≈ 1.1102 × 10^{−16}.

[FIG. 4.1. Loss of orthogonality for the sliding window example: log10 of the orthogonality loss (4.1) and of the loss estimate ξ_est of (2.16) versus window number.]

We produce two graphs. The first, Figure 4.1, is the graph of

(4.1) ξ = ‖I − U(t)^T U(t)‖_2.

The symbol "+" in the graph indicates the estimate of ξ given by ξ_est in (2.16) from the function block_downdate_info whenever k < p holds in Theorem 2.2. As can be observed, ξ_est is a fairly accurate estimate of ξ. The second, Figure 4.2, is the graph of

(4.2) ‖X(t) − U(t)R(t)‖_2 / ‖X(t)‖_2.

The symbol "*" is graphed whenever k < p holds in Function 2.4, which makes R̄ have smaller rank than R.

Note that in Figure 4.1, the value of ‖I − U(t)^T U(t)‖_2 improves to near machine precision after about 10–20 steps, and this improvement persists. A similar pattern can be observed for the relative residual ‖X(t) − U(t)R(t)‖_2 / ‖X(t)‖_2; that is, it also improves to near machine precision and remains at this level. If we compute the QR factorization of X(1) with Householder transformations instead of the modified Gram-Schmidt method, the loss of orthogonality starts out at near machine precision and stays there. The residuals, graphed in Figure 4.2, follow the same pattern.

We have repeated this test with different values of M, m, n, and p many times. If U(1) satisfied the fundamental assumption (1.13), the result was always similar. However, if the matrix U(1) in the initial MGS factorization of X(1) did not satisfy the assumption (1.13), meaning that U(1) could not be considered "near left orthogonal" and thus did not meet a fundamental assumption of this work, the residuals still self-corrected, but the loss of orthogonality either did not self-correct or took significantly longer to do so.
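The experiment can be reproduced in outline with the sketch below (assumptions: np.linalg.qr, a Householder factorization, stands in for the MGS start at t = 1, so the initial orthogonality loss is already at machine level, and block_downdate denotes a hypothetical routine composing Function 2.4 with the retriangularization of Section 3.1):

import numpy as np

def add_rows(U, R, X_new):
    """Section 4.1 sketch: append the rows X_new to X = U R with one
    Householder QR factorization of the stacked matrix [R; X_new]."""
    n = R.shape[1]
    Q_new, R_new = np.linalg.qr(np.vstack([R, X_new]))   # thin (n + p) x n QR
    U_new = np.vstack([U @ Q_new[:n, :], Q_new[n:, :]])  # = [U 0; 0 I_p] Q_new
    return U_new, R_new

def sliding_window(X_big, m, p, steps):
    """Sliding-window driver: add p rows at the bottom, delete p at the top."""
    U, R = np.linalg.qr(X_big[:m, :])
    for t in range(1, steps):
        lo = p * t
        U, R = add_rows(U, R, X_big[lo + m - p : lo + m, :])
        U, R = block_downdate(U, R, p)       # hypothetical composite downdater
        xi = np.linalg.norm(np.eye(U.shape[1]) - U.T @ U, 2)   # the graph (4.1)
        print(t, xi)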
[FIG. 4.2. Residuals for the sliding window example: log10 of the relative residual (4.2) versus window number, with the steps where R̄ is rank deficient marked.]

5. Proofs of the key theorems.

5.1. Proofs of Theorems 2.1 and 2.2. Using the form of R_1 in (2.6), we have that if, for some ℓ ≤ p, it holds that ρ_ℓ = · · · = ρ_p = 0, then

(I − U U^T) [ V(:, ℓ:p) ; 0 ] = 0,

thus concluding that [ V(:, ℓ:p) ; 0 ] is linearly dependent upon the columns of U and thereby allowing us to immediately reduce the dimension of the problem. Therefore, without loss of generality, we assume that

(5.1) ρ_1 ≥ · · · ≥ ρ_p > 0

and thus that R_1 is nonsingular. Using (5.1), we note that

(5.2) U^T Q_1 = (I − U^T U) U^T B R_1^{-1} = (I − U^T U) F_1,  where F_1 = U^T B R_1^{-1}.

This allows us to rewrite Q_1 as

(5.3) Q_1 = B R_1^{-1} − U F_1,

leading to the following lemma.

LEMMA 5.1. Let U ∈ R^{m×n} and ξ satisfy (1.13). Let B, Q_1 ∈ R^{m×p} be left orthogonal matrices related as in (5.3), n + p ≤ m, let R_1 ∈ R^{p×p} be nonsingular, and let F_1 = U^T B R_1^{-1}. Then for any unit vector w ∈ R^p,

(5.4) ‖R_1^{-1} w‖_2² = 1 + ‖F_1 w‖_2² (1 + δ_w),  |δ_w| ≤ ξ,

where δ_w depends upon w.

Proof. Using the fact that Q_1 and B are left orthogonal and computing the normal equations matrix of both sides of (5.3) yields

I = R_1^{-T} R_1^{-1} − R_1^{-T} B^T U F_1 − F_1^T U^T B R_1^{-1} + F_1^T U^T U F_1
  = R_1^{-T} R_1^{-1} − 2 F_1^T F_1 + F_1^T U^T U F_1
  = R_1^{-T} R_1^{-1} − F_1^T F_1 − F_1^T (I − U^T U) F_1.

Thus,

R_1^{-T} R_1^{-1} = I + F_1^T F_1 + F_1^T (I − U^T U) F_1,

so that for any unit vector w ∈ R^p, ‖w‖_2 = 1, the use of norm inequalities yields

‖R_1^{-1} w‖_2² = 1 + ‖F_1 w‖_2² + w^T F_1^T (I − U^T U) F_1 w
              ≤ 1 + ‖F_1 w‖_2² + ‖I − U^T U‖_2 ‖F_1 w‖_2²
              = 1 + (1 + ξ) ‖F_1 w‖_2².

By a similar argument,

‖R_1^{-1} w‖_2² ≥ 1 + (1 − ξ) ‖F_1 w‖_2².

Thus, for some |δ_w| ≤ ξ that depends upon w, we have (5.4).

REMARK 5.1. Note that Lemma 5.1 does not depend upon R_1 being diagonal; R_1 only needs to be nonsingular.

From (5.4), using the definition of the 2-norm yields

‖F_1(:, 1:k)‖_2 ≤ (‖R_1^{-1}(1:k, 1:k)‖_2² − 1)^{1/2} (1 − ξ)^{−1/2}.

From the definition of R_1 in (2.5)–(2.6), i.e., as a diagonal matrix of singular values, we have that

(5.5) ‖F_1(:, 1:k)‖_2 ≤ ((1 − ρ_k²)^{1/2} / ρ_k) (1 − ξ)^{−1/2},

and thus from (5.2) it follows that

‖U^T Q_1(:, 1:k)‖_2 ≤ ‖I − U^T U‖_2 ‖F_1(:, 1:k)‖_2 ≤ ξ (1 − ρ_k²)^{1/2} / ρ_k + O(ξ²).

The following lemma allows us to assume that R_2 in (2.7) is nonsingular.

LEMMA 5.2. Assume the hypothesis and terminology of Lemma 5.1. Let the left orthogonal matrix Q̂_B ∈ R^{m×p} and the upper triangular matrix R_2 ∈ R^{p×p} be given by (2.7). If R_2(1:k, 1:k) is singular, then

(5.6) ρ_k ≤ ξ (1 + ξ).

Proof. If ρ_k = 0, the lemma holds trivially, so we assume that ρ_k > 0. For the left orthogonal matrix B in (1.4),

Q_1 R_1 = (I − U U^T) B,  Q̂_B R_2 R_1 = (I − U U^T)² B.

If R_2(1:k, 1:k) is singular, there is a vector v ≠ 0 such that R_2(1:k, 1:k) v = 0. Since ρ_1 ≥ · · · ≥ ρ_k > 0 and R_1(1:k, 1:k) = diag(ρ_1, . . . , ρ_k),
we can choose v so that v = R_1(1:k, 1:k) w, where w is a unit vector. Thus,

Q̂_B(:, 1:k) R_2(1:k, 1:k) R_1(1:k, 1:k) w = (I − U U^T)² B(:, 1:k) w = 0.

Since

(I − U U^T)² = I − U U^T − U (I − U^T U) U^T,

this implies

Q_1(:, 1:k) R_1(1:k, 1:k) w = (I − U U^T) B(:, 1:k) w = U (I − U^T U) U^T B(:, 1:k) w.

Thus, by the definition of ρ_k and using the fact that ‖w‖_2 = 1,

ρ_k ≤ ‖R_1(1:k, 1:k) w‖_2 ≤ ‖U (I − U^T U) U^T B(:, 1:k) w‖_2 ≤ ‖U (I − U^T U) U^T B(:, 1:k)‖_2.

Since B is left orthogonal, we have

ρ_k ≤ ‖U‖_2² ‖I − U^T U‖_2 ≤ ξ ‖U‖_2².

From the assumption (1.13),

‖U‖_2² = ‖U^T U‖_2 ≤ 1 + ‖I − U^T U‖_2 ≤ 1 + ξ;

thus ρ_k satisfies (5.6).

Assuming R_2 is nonsingular, we have that

U^T Q̂_B = (I − U^T U) F_2,  F_2 = U^T Q_1 R_2^{-1},

so that

‖U^T Q̂_B(:, 1:k)‖_2 ≤ ξ ‖F_2(:, 1:k)‖_2,  ‖U^T Q̂_B(:, 1:k)‖_F ≤ ξ_F ‖F_2(:, 1:k)‖_2.

Invoking Lemma 5.1, for any unit vector w ∈ R^p, we obtain

‖R_2^{-1} w‖_2² = 1 + ‖F_2 w‖_2² (1 + δ_w),  |δ_w| ≤ ξ,

so that, again from the definition of the matrix 2-norm,

‖U^T Q̂_B(:, 1:k)‖_2 ≤ ξ (β_k² − 1)^{1/2} + O(ξ²),
‖U^T Q̂_B(:, 1:k)‖_F ≤ ξ_F (β_k² − 1)^{1/2} + O(ξ_F²),

where β_k = ‖R_2^{-1}(1:k, 1:k)‖_2.

Proof of Theorem 2.1. We note that F_1 and F_2 are related according to

(5.7) F_2 = U^T Q_1 R_2^{-1} = (I − U^T U) U^T B R_1^{-1} R_2^{-1} = (I − U^T U) F_1 R_2^{-1}.

We now exploit (5.7) to show an important relationship between β_j = ‖R_2^{-1}(1:j, 1:j)‖_2 and ρ_j, the jth singular value of Y_1. Applying norm inequalities to (5.7) and (5.5), we have

‖F_2(:, 1:j)‖_2 ≤ ‖I − U^T U‖_2 ‖F_1(:, 1:j)‖_2 ‖R_2^{-1}(1:j, 1:j)‖_2
              ≤ ξ ((1 − ρ_j²)^{1/2} / ρ_j) (1 + ‖F_2(:, 1:j)‖_2²)^{1/2} + O(ξ²),

which yields

‖F_2(:, 1:j)‖_2 / (1 + ‖F_2(:, 1:j)‖_2²)^{1/2} ≤ ξ (1 − ρ_j²)^{1/2} / ρ_j + O(ξ²).

Since β_j > β_orth implies ‖F_2(:, 1:j)‖_2 > c_orth + O(ξ), and since x/(1 + x²)^{1/2} is a strictly increasing function for x > 0, this implies that

c_orth / (1 + c_orth²)^{1/2} ≤ ξ (1 − ρ_j²)^{1/2} / ρ_j + O(ξ²) ≤ ξ / ρ_j + O(ξ²),

which becomes ρ_j ≤ α_orth ξ + O(ξ²), where α_orth is given in (2.15).

We note that if k is the largest integer such that β_k ≤ (1 + c_orth²)^{1/2}, then the matrix Q_B = Q̂_B(:, 1:k) as defined in Theorem 2.2 satisfies

(5.8) ‖U^T Q_B‖_2 ≤ c_orth ξ + O(ξ²),  ‖U^T Q_B‖_F ≤ c_orth ξ_F + O(ξ_F²).

Some algebra shows that

(5.9) B = U S_B + Q̂_B R̂_B = U S_B + Q_B R_B + D_k,

where D_k is given by (2.18). Thus, we have that

(5.10) ‖B − U S_B − Q_B R_B‖_2 = ‖D_k‖_2 = ‖Q̂_B(:, k+1:p) R̂_B(k+1:p, k+1:p)‖_2 = ‖R̂_B(k+1:p, k+1:p)‖_2.

To prove a bound for ‖R̂_B(k+1:p, k+1:p)‖_2, and thus to bound the residual in (5.10), we need the following lemma proved in [3].

LEMMA 5.3 ([3, Lemma 3.2]). If U ∈ R^{m×n} satisfies (1.13), then ‖I_m − U U^T‖_2 ≤ 1.

Proof of Theorem 2.2. From (5.8), we have that Q_B = Q̂_B(:, 1:k) satisfies (2.11). For k = p, we have D_k = 0, and (2.19) is trivially satisfied. For k < p, from (5.9), we have (2.17)–(2.18); thus, by orthogonal equivalence,

‖D_k‖_2 = ‖R̂_B(k+1:p, k+1:p)‖_2.

Thus, we only need to establish a bound for ‖R̂_B(k+1:p, k+1:p)‖_2 to prove the theorem. Since

R̂_B(k+1:p, k+1:p) = R_2(k+1:p, k+1:p) R_1(k+1:p, k+1:p),

by a standard norm inequality and the SVD structure (2.5)–(2.6), it follows that

(5.11) ‖R̂_B(k+1:p, k+1:p)‖_2 ≤ ‖R_2(k+1:p, k+1:p)‖_2 ‖R_1(k+1:p, k+1:p)‖_2 = ρ_{k+1} ‖R_2(k+1:p, k+1:p)‖_2 ≤ ρ_{k+1} ‖R_2‖_2.
A bit of algebra shows that

R_2 = Q̂_B^T (I − U U^T) Q_1;

thus we can use orthogonal equivalence and Lemma 5.3 to show

‖R_2‖_2 ≤ ‖I − U U^T‖_2 ≤ 1.

Thus, from (5.11),

‖D_k‖_2 = ‖R̂_B(k+1:p, k+1:p)‖_2 ≤ ρ_{k+1} ‖R_2‖_2 ≤ ρ_{k+1},

which establishes (2.19).

5.2. Proofs of Theorems 3.1 and 3.3. We begin with the proof of Theorem 3.1. First, we need two lemmas.

LEMMA 5.4. Let Ũ and γ_orth be as in Theorem 3.1. Then

ξ̃ := ‖I_{n+k} − Ũ^T Ũ‖_2 ≤ γ_orth ξ + O(ξ²).

Proof. We have that

I_{n+k} − Ũ^T Ũ = Z^T (I_{n+k} − [ Q_B  U ]^T [ Q_B  U ]) Z = Z^T [ I_k − Q_B^T Q_B   −Q_B^T U ; −U^T Q_B   I_n − U^T U ] Z.

Thus, using standard norm inequalities, we have

‖I_{n+k} − Ũ^T Ũ‖_2 = ‖[ I_k − Q_B^T Q_B   −Q_B^T U ; −U^T Q_B   I_n − U^T U ]‖_2
                   ≤ ‖[ ‖I_k − Q_B^T Q_B‖_2   ‖U^T Q_B‖_2 ; ‖U^T Q_B‖_2   ‖I_n − U^T U‖_2 ]‖_2.

Since Q_B satisfies (2.8), since Theorem 2.2 implies that Function 2.4 produces a matrix Q_B satisfying

‖U^T Q_B‖_2 ≤ c_orth ξ + O(ξ²),  ‖I_k − Q_B^T Q_B‖_2 ≤ ξ,

and since U is assumed to satisfy (1.13), it follows that

‖I_{n+k} − Ũ^T Ũ‖_2 ≤ ‖[ 1  c_orth ; c_orth  1 ]‖_2 ξ + O(ξ²) = γ_orth ξ + O(ξ²).

LEMMA 5.5. Let R_V ∈ R^{p×p} be the upper triangular matrix defined in (3.1). Using the terminology of Theorem 2.1 and Lemma 5.4 with the convention that ρ_{p+1} = 0, we have

(5.12) ‖R_V^{-1}‖_2 ≤ (1 + ξ̃)^{1/2} (1 − ρ_{k+1})^{−1} ≤ 1 + (α_orth + γ_orth/2) ξ + O(ξ²),
(5.13) ‖R_V‖_2 ≤ 1 + (α_orth + γ_orth/2) ξ + O(ξ²).

Proof. Theorem 2.2 implies

(5.14) Ũ_1 R_V = [ V ; 0 ] − D_k,

where D_k is bounded as in (2.19). Through the use of a singular value inequality in [15, Problem 7.3.P16], we conclude that

σ_p(Ũ_1) σ_j(R_V) ≤ σ_j(Ũ_1 R_V) ≤ σ_1(Ũ_1) σ_j(R_V).

Thus, using standard norm inequalities,

σ_p(R_V) ‖Ũ_1‖_2 ≥ σ_p([ V ; 0 ]) − ‖D_k‖_2.

Since V is orthogonal and

‖Ũ_1‖_2 ≤ ‖Ũ‖_2 ≤ (1 + ξ̃)^{1/2},

we have that, within a margin of O(ξ²),

σ_p(R_V) (1 + ξ̃)^{1/2} ≥ 1 − ‖D_k‖_2.

Invoking Theorem 2.2 yields

σ_p(R_V) ≥ (1 + ξ̃)^{−1/2} (1 − ρ_{k+1}).

Thus,

‖R_V^{-1}‖_2 = σ_p(R_V)^{−1} ≤ (1 + ξ̃)^{1/2} (1 − ρ_{k+1})^{−1} + O(ξ²).

From the bound for ξ̃ in Lemma 5.4 and the bound for ρ_{k+1} in Theorem 2.1, we have

‖R_V^{-1}‖_2 ≤ 1 + (α_orth + γ_orth/2) ξ + O(ξ²).

To establish the bound for ‖R_V‖_2, simply note that

R_V = Ũ_1^† ([ V ; 0 ] − D_k),

so that

‖R_V‖_2 ≤ ‖Ũ_1^†‖_2 (1 + ρ_{k+1}) ≤ (1 − ξ̃)^{−1/2} (1 + ρ_{k+1}).

The bound (5.13) follows from an argument similar to that for (5.12).

We are now ready to prove Theorem 3.1.

Proof of Theorem 3.1. We have already proved (3.10). Next we bound ‖ΔU_V‖_2. From (1.8), (3.1), (3.3), and (5.14), we have

ΔU_V = −D_k(p+1:m, :) R_V^{-1}.

Thus, from Theorem 2.1 and Lemma 5.5,

‖ΔU_V‖_2 ≤ ‖D_k(p+1:m, :)‖_2 ‖R_V^{-1}‖_2 ≤ ρ_{k+1} (1 + (α_orth + γ_orth/2) ξ) + O(ξ²) = α_orth ξ + O(ξ²),

which is (3.8). We now proceed to prove (3.9). We have that

Ũ_1^T Ũ_2 = U_V^T ΔU + (ΔU_V)^T Ū,

so that

(5.15) ‖U_V^T ΔU‖_2 ≤ ‖Ũ_1^T Ũ_2‖_2 + ‖(ΔU_V)^T Ū‖_2.

To bound the first term in (5.15), note that

‖Ũ_1^T Ũ_2‖_2 ≤ ‖I_{n+k} − Ũ^T Ũ‖_2 = ξ̃,

and thus

(5.16) ‖U_V^T ΔU‖_2 ≤ ξ̃ + ‖ΔU_V‖_2 ‖Ū‖_2.

Since Ū is just the lower right block of Ũ,

‖Ū‖_2 ≤ ‖Ũ‖_2 ≤ (1 + ξ̃)^{1/2},

so that (5.16) becomes

‖U_V^T ΔU‖_2 ≤ ξ̃ + ‖ΔU_V‖_2 (1 + ξ̃)^{1/2}.

Again using the singular value result in [15, Problem 7.3.P16], we have

σ_p(U_V) ‖ΔU‖_2 ≤ (ξ̃ + ‖ΔU_V‖_2)(1 + ξ̃)^{1/2}.

We note that from (3.4) it follows that

U_V = (V − D_k(1:p, :)) R_V^{-1};

hence

σ_p(U_V) ≥ σ_p(V − D_k(1:p, :)) σ_p(R_V^{-1}).
Using the orthogonality of V and the bound for ‖D_k‖_2 from Theorem 2.2, we have

σ_p(U_V) ≥ (1 − ρ_{k+1}) ‖R_V‖_2^{−1}.

Therefore, using (5.13) yields

‖ΔU‖_2 ≤ (ξ̃ + ‖ΔU_V‖_2)(1 + ξ̃)^{1/2} (1 − ρ_{k+1})^{−1} ‖R_V‖_2
       ≤ (ξ̃ + ‖ΔU_V‖_2)(1 + ξ̃)^{1/2} (1 + ρ_{k+1})(1 − ρ_{k+1})^{−1} + O(ξ²)
       ≤ (α_orth + γ_orth) ξ + O(ξ²).

We now prove Theorem 3.3.

Proof of Theorem 3.3. We note that

(5.17) ‖I_{n+k} − Ũ^T Ũ‖_F² = ‖[ I_p − Ũ_1^T Ũ_1   −Ũ_1^T Ũ_2 ; −Ũ_2^T Ũ_1   I_n̄ − Ũ_2^T Ũ_2 ]‖_F²
       = ‖I_p − Ũ_1^T Ũ_1‖_F² + 2 ‖Ũ_1^T Ũ_2‖_F² + ‖I_n̄ − Ũ_2^T Ũ_2‖_F²

and that, by orthogonal equivalence,

(5.18) ‖I_{n+k} − Ũ^T Ũ‖_F² = ‖Z^T (I_{n+k} − [ Q_B  U ]^T [ Q_B  U ]) Z‖_F²
       = ‖I_{n+k} − [ Q_B  U ]^T [ Q_B  U ]‖_F²
       = ‖I_k − Q_B^T Q_B‖_F² + 2 ‖U^T Q_B‖_F² + ‖I_n − U^T U‖_F².

Equating (5.17) and (5.18) and solving for ‖I_n̄ − Ũ_2^T Ũ_2‖_F² obtains (3.13).

To obtain (3.14), we note that

Ũ_2 = [ ΔU ; Ū ];

thus

I_n̄ − Ū^T Ū = (I_n̄ − Ũ_2^T Ũ_2) + (ΔU)^T (ΔU).

Thus,

‖I_n̄ − Ū^T Ū‖_F ≤ ‖I_n̄ − Ũ_2^T Ũ_2‖_F + ‖(ΔU)^T (ΔU)‖_F,

which is (3.14). From the inequalities

‖(ΔU)^T (ΔU)‖_F ≤ √p ‖(ΔU)^T (ΔU)‖_2 ≤ √p ‖ΔU‖_2²

and the bound (3.9), we obtain (3.15).

To get (3.16), we note that

(5.19) ‖I_n − U^T U‖_F, ‖I_k − Q_B^T Q_B‖_F ≤ ξ_F

and that

(5.20) ‖U^T Q_B‖_F ≤ c_orth ξ_F + O(ξ_F²) = (1/2) ξ_F + O(ξ_F²);

thus

‖I_n̄ − Ū^T Ū‖_F² ≤ (‖I_n̄ − Ũ_2^T Ũ_2‖_F + √p (α_orth + γ_orth)² ξ²)² + O(ξ_F⁴)
                 ≤ ‖I_n̄ − Ũ_2^T Ũ_2‖_F² + O(ξ_F³).

From (3.13) and (5.19)–(5.20), we have

‖I_n̄ − Ū^T Ū‖_F² ≤ (5/2) ξ_F² − 2 ‖Ũ_1^T Ũ_2‖_F² − ‖I_p − Ũ_1^T Ũ_1‖_F² + O(ξ_F³),

which is (3.16).

6. Conclusion. We have taken the vector 2-norm formulation of the downdating algorithms in [4, 9, 20] for deleting a single row from a QR factorization and fashioned a matrix 2-norm formulation for a block downdating algorithm designed to delete p rows from a matrix. Similar to results shown in [4], if we are asked to delete p rows from a QR decomposition with a near left orthogonal factor U satisfying (1.13), we can obtain a QR decomposition for the remaining m − p rows that has a new left orthogonal factor Ū whose loss of orthogonality can be bounded as in Theorem 3.3. Our numerical tests indicate that repeated block updates and downdates often have a correcting effect on the loss of orthogonality.

Acknowledgments. The author completed much of this work while visiting the University of California at Berkeley as the guest of James Demmel. That visit provided a stimulating and helpful atmosphere to work on the ideas in this paper as well as some other ideas. During that time, a conversation with Ming Gu had an important influence on this work. Thanks also to Lothar Reichel and to two careful referees for helpful suggestions that improved this manuscript.

REFERENCES

[1] N. Abdelmalek, Round off error analysis for Gram-Schmidt method and solution of linear least squares problems, Nordisk Tidskr. Informationsbehandling (BIT), 11 (1971), pp. 354–367.
[2] J. Barlow, H. Erbay, and I. Slapničar, An alternative algorithm for the refinement of ULV decompositions, SIAM J. Matrix Anal. Appl., 27 (2005), pp. 198–211.
[3] J. Barlow and A. Smoktunowicz, Reorthogonalized block classical Gram-Schmidt, Numer. Math., 123 (2013), pp. 395–423.
[4] J. Barlow, A. Smoktunowicz, and H. Erbay, Improved Gram-Schmidt downdating methods, BIT, 45 (2005), pp. 259–285.
[5] J. Barlow, P. Yoon, and H. Zha, An algorithm and a stability theory for downdating the ULV decomposition, BIT, 36 (1996), pp. 14–40.
[6] Å. Björck, Solving linear least squares problems by Gram-Schmidt orthogonalization, Nordisk Tidskr. Informationsbehandling (BIT), 7 (1967), pp. 1–21.
[7] P. Businger and G. Golub, Linear least squares solutions by Householder transformations, Numer. Math., 7 (1965), pp. 269–276.
[8] T. Chan, An improved algorithm for computing the singular value decomposition, ACM Trans. Math. Software, 8 (1982), pp. 72–83.
[9] J. W. Daniel, W. B. Gragg, L. Kaufman, and G. W. Stewart, Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization, Math. Comp., 30 (1976), pp. 772–795.
[10] J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, A set of level 3 basic linear algebra subprograms, ACM Trans. Math. Software, 16 (1990), pp. 1–17.
[11] W. Ferng, G. Golub, and R. Plemmons, Adaptive Lanczos methods for recursive condition estimation, Numer. Algorithms, 1 (1991), pp. 1–20.
[12] L. Giraud, J. Langou, M. Rozložník, and J. van den Eshof, Rounding error analysis of the classical Gram-Schmidt orthogonalization process, Numer. Math., 101 (2005), pp. 87–100.
[13] G. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, J. Soc. Indust. Appl. Math. Ser. B Numer. Anal., 2 (1965), pp. 205–224.
[14] G. Golub and C. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.
[15] R. Horn and C. Johnson, Matrix Analysis, 2nd ed., Cambridge University Press, Cambridge, 2013.
[16] C. Lawson and R. Hanson, Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, 1974.
[17] Q. Liu, Modified Gram-Schmidt-based methods for block downdating the Cholesky factorization, J. Comput. Appl. Math., 235 (2011), pp. 1897–1905.
[18] M. Overton, N. Guglielmi, and G. Stewart, An efficient algorithm for generalized null space decomposition, SIAM J. Matrix Anal. Appl., to appear.
[19] B. Parlett, H. Simon, and L. Stringer, On estimating the largest eigenvalue with the Lanczos algorithm, Math. Comp., 38 (1982), pp. 153–166.
[20] K. Yoo and H. Park, Accurate downdating of a modified Gram-Schmidt QR decomposition, BIT, 36 (1996), pp. 166–181.