Continuous analogues of matrix factorizations
NASC seminar, 9th May 2014
Alex Townsend, DPhil student, Mathematical Institute, University of Oxford
(joint work with Nick Trefethen)
Many thanks to Gil Strang, MIT. Work supported by EPSRC grant EP/P505666/1.

Introduction: Discrete vs. continuous

  v = column vector          f(x)                      chebfun     [Battles & Trefethen, 04]
  A = tall skinny matrix     [f_1(x) | ··· | f_n(x)]   quasimatrix [Stewart, 98]
  A = square matrix          f(x, y)                   chebfun2    [T. & Trefethen, 13]
                                                       chebop      [Driscoll, Bornemann, & Trefethen, 08]
                                                       cmatrix     [T. & Trefethen, 14]
  Av                         ∫ f(s, y) v(s) ds
  SVD, QR, LU, Chol          ?
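The table pairs the matrix–vector product Av with the integral ∫ f(s, y) v(s) ds. A minimal numerical sketch of that analogy (the function `apply_cmatrix`, the grid size, and the example kernels are this writeup's assumptions, not part of the talk): discretizing the integral with a quadrature rule turns the integral operator back into an ordinary matrix–vector product.

```python
import numpy as np

def apply_cmatrix(f, v, a=-1.0, b=1.0, n=200):
    """Approximate (Av)(y) = ∫_a^b f(s, y) v(s) ds on a grid of y values.

    With trapezoidal weights w, the integral operator becomes the
    matrix-vector product F @ (w * v(s)).
    """
    s = np.linspace(a, b, n)           # quadrature nodes
    w = np.full(n, (b - a) / (n - 1))  # trapezoid weights
    w[0] *= 0.5
    w[-1] *= 0.5
    y = np.linspace(a, b, n)
    F = f(s[None, :], y[:, None])      # F[i, j] = f(s_j, y_i)
    return y, F @ (w * v(s))

# Example kernel and input function (illustrative choices only).
y, Av = apply_cmatrix(lambda s, y: np.exp(s * y), lambda s: s**2)
```

For smooth kernels the trapezoid rule is a crude stand-in for the Chebyshev technology chebfun actually uses, but the structure of the computation is the same.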
Interested in continuous analogues rather than infinite analogues.
(Aside: infinite analogues are Schmidt, Wiener–Hopf, infinite-dimensional QR, etc.)

Introduction: Matrices, quasimatrices, cmatrices

  matrix        m × n
  quasimatrix   [a, b] × n
  cmatrix       [a, b] × [c, d]

A cmatrix is a continuous function of (y, x) ∈ [a, b] × [c, d].

Introduction: Matrices vs. cmatrices

  An m × n matrix:                           An [a, b] × [c, d] cmatrix:                  Question
  entries indexed by {1,...,m} × {1,...,n}   entries indexed by [a, b] × [c, d] ⊂ R²
  Well-ordered                               Not well-ordered by <                        What is the 1st column?
  Successor                                  No successor                                 What is the next column?
  A null set                                 Null subsets                                 What sparsity makes sense?
  Finite                                     Infinite                                     Convergence?

Three heroes: smoothness, pivoting, and machine precision (ε_mach).

Singular value decomposition: Matrix factorization

  A = UΣV^T,   Σ = diagonal,   U, V = orthonormal columns.

Exists: the SVD exists and is (almost) unique.
Application: the best rank-r approximation
is A_r, the first r terms (optimal in the 2- and Frobenius norms).
Separable model: A = ∑_{j=1}^n σ_j u_j v_j^T is a sum of outer products.
Computation: bidiagonalize, then iterate [Golub & Kahan 1965].

Singular value decomposition: Continuous analogue

  A = UΣV^T,   Σ = diagonal (σ_1, σ_2, ...),   U, V = quasimatrices with orthonormal columns u_1, u_2, ... and v_1, v_2, ... — at least formally.

Exists: the SVD exists if A is continuous, and is (almost) unique [Schmidt 1907].
Application: the best rank-r approximation
is f_r, the first r terms (in the L²-norm) [Weyl 1912].
Separable model: A = ∑_{j=1}^∞ σ_j u_j v_j^T is a sum of "outer products".
Computation: avoid bidiagonalization.

Singular value decomposition: Absolute and uniform convergence of the SVD

Theorem. Let A be an [a, b] × [c, d] cmatrix that is (uniformly) Lipschitz continuous in both variables. Then the SVD of A exists, the singular values are unique with σ_j → 0 as j → ∞, and

  A = ∑_{j=1}^∞ σ_j u_j v_j^T,

where the series converges uniformly and absolutely to A.

Proof: see [Schmidt 1907], [Hammerstein 1923], and [Smithies 1937].

If A satisfies the assumptions of the theorem, then A = UΣV^T.

Singular value decomposition: Algorithm

1. Compute a QR factorization A ≈ Q_A R_A.
2. Compute a quasimatrix QR factorization R_A^T = Q_R R_R (Householder triangularization of a quasimatrix [Trefethen 08]).
3. Compute the SVD R_R = UΣV^T. Then A = (Q_A V) Σ (Q_R U)^T.

This is a continuous analogue of a discrete algorithm [Ipsen 90].

Singular value decomposition: Related work

Erhard Schmidt; James Mercer; Autonne, Bateman, Hammerstein, Kellogg, Picard, Smithies, Weyl; Aizerman, Braverman, König, Rozonoer; Carl Eckart & Gale Young; Golub, Hestenes, Kahan, Kogbetliantz, Reinsch.

LU decomposition: Matrix factorization

  A = P⁻¹LU,   P = permutation,   L = unit lower-triangular,   U = upper-triangular.

P⁻¹L is "psychologically" lower-triangular.
Exists: it (almost) exists and, with extra conditions, is (almost) unique.
Application: used to solve dense linear systems Ax = b.
Separable model: A = ∑_{j=1}^n ℓ_j u_j^T is a sum of outer products [Pan 2000].
Computation: Gaussian elimination with pivoting.

LU decomposition: Continuous analogue

  A = LU,   L = unit lower-triangular,   U = upper-triangular.
Exists: it (usually) exists and, with extra conditions, is (almost) unique.
Application: can be used to "solve" integral equations.
Separable model: A = ∑_{j=1}^∞ ℓ_j u_j^T is a sum of outer products.
Computation: continuous analogue of GECP (GE with complete pivoting).

LU decomposition: Computation

The standard point of view: factor A into P⁻¹LU.
A different point of view: each step of GE is a rank-1 update,

  A ← A − A(j, :) A(:, k) / A(j, k)            (GE step for matrices)
  A ← A − A(y_0, :) A(:, x_0) / A(y_0, x_0)    (GE step for functions)

We use complete pivoting. Pivoting orders the columns and rows.

LU decomposition: What is a triangular quasimatrix?

In A = LU, L is a quasimatrix with columns ℓ_1, ℓ_2, ... and U has rows u_1^T, u_2^T, .... What is a lower-triangular quasimatrix?

[Figure: the columns of L with zeros and ones marked at the pivot rows y_1, y_2, y_3, y_4, y_5.]

Red dots = 0's, blue squares = 1's.
The position of the 0's is determined by the pivoting strategy.
Forward substitution has a continuous analogue.
More precisely, L is lower-triangular with respect to y_1, y_2, ....

LU decomposition: Absolute and uniform convergence of LU

Theorem. Let A be an [a, b] × [c, d] continuous cmatrix. Suppose A(·, x) is analytic in the "stadium" of radius 2ρ(b − a) about [a, b] for some ρ > 1, where it is bounded in absolute value by M (uniformly in x). Then

  A = ∑_{j=1}^∞ ℓ_j u_j^T,

where the series converges uniformly and absolutely to A. Moreover,

  ‖A − ∑_{j=1}^k ℓ_j u_j^T‖_∞ ≤ M ρ^{−k}.

[Figure: the stadium of radius 2ρ(b − a) about the interval [a, b].]

LU decomposition: A Chebfun2 application

Low-rank function approximation:

A = chebfun2(@(x,y) cos(10*(x.^2+y))+sin(10*(x+y.^2)));
contour(A, '.')   % dots mark the pivot locations

[Figure: contour plots of the approximants of rank 2, 5, 28, 33, 65, and 125.]

  A(y, x) ≈ ∑_{j=1}^k ℓ_j(y) u_j(x),
  ∫_c^d ∫_a^b A(y, x) dy dx ≈ ∑_{j=1}^k (∫_a^b ℓ_j(y) dy)(∫_c^d u_j(x) dx).

LU decomposition: A Chebfun2 application

The SVD is optimal, but GE can be faster.

  2D Runge function:   A(y, x) = 1 / (1 + γ(x² + y²)).
  Wendland's CSRBFs:   A_s(y, x) = φ_{3,s}(‖x − y‖₂) ∈ C^{2s}.
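The SVD-versus-GE comparison can be sketched discretely (the grid size, γ, and function names below are this writeup's assumptions): sample the 2D Runge function on a grid, run GE with complete pivoting as a sequence of rank-1 updates, and compare against the optimal truncated SVD.

```python
import numpy as np

def ge_low_rank(A, k):
    """k steps of Gaussian elimination with complete pivoting, viewed as
    rank-1 updates: A <- A - A[:, x0] A[y0, :] / A[y0, x0]."""
    R = A.copy()
    approx = np.zeros_like(A)
    for _ in range(k):
        # Complete pivoting: take the entry of largest absolute value.
        y0, x0 = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        if R[y0, x0] == 0:
            break
        term = np.outer(R[:, x0], R[y0, :]) / R[y0, x0]
        approx += term
        R -= term
    return approx

# Sample the 2D Runge function 1/(1 + gamma*(x^2 + y^2)) on a grid.
gamma = 100.0
x = np.linspace(-1, 1, 120)
X, Y = np.meshgrid(x, x)
A = 1.0 / (1.0 + gamma * (X**2 + Y**2))

# Rank-20 GE approximation vs. the (optimal) rank-20 truncated SVD.
U, s, Vt = np.linalg.svd(A)
svd20 = (U[:, :20] * s[:20]) @ Vt[:20]
ge20 = ge_low_rank(A, 20)
err_svd = np.linalg.norm(A - svd20) / np.linalg.norm(A)
err_ge = np.linalg.norm(A - ge20) / np.linalg.norm(A)
```

The truncated SVD always wins in the Frobenius norm, but for smooth functions like this one the GE error tracks it closely while each step costs only a rank-1 update.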
[Figure: relative L² error versus rank of approximant for SVD and GE; left, the 2D Runge function with γ = 1, 10, 100; right, Wendland's φ_{3,0} ∈ C⁰, φ_{3,1} ∈ C², φ_{3,3} ∈ C⁶.]

LU decomposition: Related work

Eugene Tyrtyshnikov: Goreinov, Oseledets, Savostyanov, Zamarashkin.
Mario Bebendorf: Gesenhues, Griebel, Hackbusch, Rjasanow.
Keith Geddes: Carvajal, Chapman.
Petros Drineas: Candès, Greengard, Mahoney, Martinsson, Rokhlin.
Many others: Halko, Liberty, Martinsson, O'Neil, Tropp, Tygert, Woolfe, etc.

Moral of the story: iterative GE is everywhere, under different guises.

Cholesky factorization: Matrix factorization

  A = R^T R,   R = upper-triangular.

Exists: exists and is unique if A is a positive-definite matrix.
Application: a numerical test for a positive-definite matrix.
Separable model: A = ∑_{j=1}^n r_j r_j^T is a sum of outer products.
Computation: the Cholesky algorithm, i.e., GECP on a positive-definite matrix.

Cholesky factorization: Continuous analogue

  A = R^T R,   R = an upper-triangular quasimatrix with rows r_1^T, r_2^T, ... — at least formally.

Pivoting: essential; a continuous analogue of pivoted Cholesky.
Exists: exists and is essentially unique for nonnegative definite functions.

Definition. An [a, b] × [a, b] continuous symmetric cmatrix A is nonnegative definite if

  v^T A v = ∫_a^b ∫_a^b v(y) A(y, x) v(x) dx dy ≥ 0   for all v ∈ C[a, b].

Cholesky factorization: Convergence

Theorem. Let A be an [a, b] × [a, b] continuous, symmetric, and nonnegative definite cmatrix. Suppose that A(·, x) is analytic in the closed Bernstein ellipse E_{2ρ(b − a)} with foci a and b, with ρ > 1, and bounded there in absolute value by M, uniformly in y. Then

  A = ∑_{j=1}^∞ r_j r_j^T,

where the series converges uniformly and absolutely to A. Moreover,

  ‖A − ∑_{j=1}^k r_j r_j^T‖_∞ ≤ 32Mk ρ^{−k} / (4ρ − 1).

[Figure: the Bernstein ellipse E_{2ρ(b − a)} with foci a and b.]

Cholesky factorization: Computation

Pivoted Cholesky = GECP on a nonnegative definite function.¹ Each step is a rank-1 update:

  A ← A − A(:, x_0) A(x_0, :) / A(x_0, x_0)

[Figures: pivot locations at successive steps; pivot size decaying from 10⁰ to below 10⁻¹⁵ over about 14 steps.]

¹ Always take the absolute maximum on the diagonal, even if there is a tie with an off-diagonal entry.

Cholesky factorization: A Chebfun2 application

A test for symmetric nonnegative definite functions:

A = chebfun2(@(x,y) cos(10*x.*y) + y + x.^2 + sin(10*x.*y));
B = A.' * A;
chol(B)

[Figure: pivot locations for B (left) and for an inverse multiquadric (right) on [−1, 1]².]

All the pivots are nonnegative and on the line y = x ⇒ nonnegative definite.

Demo

References

Z. Battles & L. N. Trefethen, An extension of MATLAB to continuous functions and operators, SISC, 25 (2004), pp. 1743–1770.
T. A. Driscoll, F. Bornemann, & L. N. Trefethen, The chebop system for automatic solution of differential equations, BIT, 48 (2008), pp. 701–723.
C. Eckart & G. Young, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.
N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd edition, SIAM, 2002.
E. Schmidt, Zur Theorie der linearen und nichtlinearen Integralgleichungen. I Teil: Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener, Math. Ann., 63 (1907), pp. 433–476.
G. W. Stewart, Afternotes Goes to Graduate School, SIAM, Philadelphia, 1998.
A. Townsend & L. N. Trefethen, Gaussian elimination as an iterative algorithm, SIAM News, March 2013.
A. Townsend & L. N. Trefethen, An extension of Chebfun to two dimensions, to appear in SISC, 2013.
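The pivoted Cholesky computation from the slides above (pivot on the largest diagonal entry, then the rank-1 update A ← A − A(:, x_0) A(x_0, :) / A(x_0, x_0)) can be sketched discretely. The function name, kernel, and grid below are this writeup's assumptions, not the Chebfun2 implementation.

```python
import numpy as np

def pivoted_cholesky(A, k, tol=1e-14):
    """k steps of pivoted Cholesky on a symmetric nonnegative definite
    matrix: pivot on the largest diagonal entry (as in the footnote),
    then apply the rank-1 update A <- A - A[:, x0] A[x0, :] / A[x0, x0]."""
    R = A.astype(float).copy()
    terms = []
    for _ in range(k):
        x0 = np.argmax(np.diag(R))
        piv = R[x0, x0]
        if piv < tol:      # a negative pivot would flag an indefinite A
            break
        terms.append(R[:, x0] / np.sqrt(piv))
        R -= np.outer(R[:, x0], R[x0, :]) / piv
    return np.array(terms)  # rows r_j^T, so A ≈ sum_j r_j r_j^T

# Example: a nonnegative definite matrix sampled from a smooth kernel.
x = np.linspace(-1, 1, 80)
A = np.exp(-(x[:, None] - x[None, :])**2)   # Gaussian kernel
Rrows = pivoted_cholesky(A, 20)
Ak = Rrows.T @ Rrows                        # rank-at-most-20 approximation
err = np.linalg.norm(A - Ak) / np.linalg.norm(A)
```

Because the kernel is analytic, the pivots decay geometrically, mirroring the ρ^{−k} bound in the convergence theorem: a handful of steps already reproduces A to near machine precision.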