Tom Lyche

Numerical Linear Algebra and Matrix Factorizations

Texts in Computational Science and Engineering 22

Editors: Timothy J. Barth, Michael Griebel, David E. Keyes, Risto M. Nieminen, Dirk Roose, Tamar Schlick

More information about this series at http://www.springer.com/series/5151

Tom Lyche
Blindern, University of Oslo
Oslo, Norway
ISSN 1611-0994
ISSN 2197-179X (electronic)
Texts in Computational Science and Engineering
ISBN 978-3-030-36467-0
ISBN 978-3-030-36468-7 (eBook)
https://doi.org/10.1007/978-3-030-36468-7
Mathematics Subject Classification (2010): 15-XX, 65-XX
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword
It is a pleasure to write this foreword to the book “Numerical Linear Algebra
and Matrix Factorizations” by Tom Lyche. I see this book project from three
perspectives, corresponding to my three different roles: first, as a friend and close
colleague of Tom for a number of years, secondly as the present department head,
and, finally, as a researcher within the international linear algebra and matrix theory
community. The book actually has a long history and started out as lecture notes
that Tom wrote for a course in numerical linear algebra. For almost forty years this
course has been an important and popular course for our students in mathematics,
both in theoretical and more applied directions, as well as students in statistics,
physics, mechanics and computer science. These notes have been revised multiple
times over the years, and new topics have been added. I have had the pleasure
of lecturing the course myself, using Tom's lecture notes, and I believe that both the
selection of topics and the combined approach of theory and algorithms are very
appealing. This is also what our students point out when they have taken this course.
As we know, the area presented in this book plays a central role in many
applications of mathematics and in scientific computing in general. Sometimes,
in the international linear algebra and matrix theory community, one divides the
area into numerical linear algebra, applied linear algebra and core (theoretical)
linear algebra. This may serve some purpose, but often it is fruitful to have a more
unified view on this, in order to see the interplay between theory, applications and
algorithms. I think this view dominates this book, and that this makes the book
interesting to a wide range of readers. Finally, I would like to thank Tom for his
work with this book and the mentioned course, and for being a good colleague from
whom I have learned a lot. I know that his international research community in
spline theory also shares this view. Most importantly, I hope that you, the reader, will
enjoy the book!
Oslo, Norway
June 2019
Geir Dahl
Preface
This book, which has grown out of a one-semester course at the University of Oslo,
targets upper undergraduate and beginning graduate students in mathematics, statistics, computational physics and engineering who need a mathematical background
in numerical linear algebra and related matrix factorizations.
Mastering the material in this book should enable a student to analyze computational problems and develop his or her own algorithms for solving problems of the
following kind (a short MATLAB illustration follows the list):
• System of linear equations. Given a (square) matrix A and a vector b. Find a
vector x such that Ax = b.
• Least squares. Given a (rectangular) matrix A and a vector b. Find a vector x
such that the sum of squares of the components of b − Ax is as small as possible.
• Eigenvalues and eigenvectors. Given a (square) matrix A . Find a number λ
and/or a nonzero vector x such that Ax = λx.
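For readers who want to experiment as they go, the short MATLAB sketch below (an illustration only, not one of the book's listings; the small test matrices are invented) shows how the three problem types are typically solved with built-in commands before their underlying factorizations are studied.

    % Small, arbitrary test data (illustration only).
    A = [4 1; 1 3];          b = [1; 2];
    x = A \ b;               % linear system: solve Ax = b

    C = [1 0; 1 1; 1 2];     d = [1; 2; 2];
    y = C \ d;               % least squares: minimize the sum of squares of d - C*y

    [V, D] = eig(A);         % eigenpairs: columns of V and diagonal of D satisfy A*V = V*D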
Such problems can be large and difficult to handle, so much can be gained by
understanding and taking advantage of special structures. For this we need a good
understanding of basic numerical linear algebra and matrix factorizations. Factoring
a matrix into a product of simpler matrices is a crucial tool in numerical linear
algebra, for it allows one to tackle large problems by solving a sequence of
easier ones.
The main characteristics of this book are as follows:
1. It is self-contained, only assuming first year calculus, an introductory course
in linear algebra, and some experience in solving mathematical problems on a
computer. A special feature of this book is the detailed proofs of practically all
results. Parts of the book can be studied independently, making it suitable for
self-study.
2. There are numerous exercises which can be found at the end of each chapter.
In a separate book we offer solutions to all problems. Solutions of many exam
problems given for this course at the University of Oslo are included in this
separate volume.
3. The book consists of an introductory first chapter and 15 more chapters, which
fall naturally into six thematically related parts. The chapters are designed so
that each can be covered in roughly one week of a one-semester course. Toward
the goal of being self-contained, the first chapter contains a review of linear
algebra, and is provided to the reader for convenient occasional reference.
4. Many of the chapters contain material beyond what might normally be covered
in one week of lectures. A typical 15-week semester's curriculum could consist
of the following curated material:
   LU and QR factorizations: 2.4, 2.5, 3.2, 3.3, 3.5, 4.1, 4.2, 5.1–5.4, 5.6
   SVD, norms and LSQ: 6.1, 6.3, 7.1–7.4, 8.1–8.3, 9.1–9.3, 9.4.1
   Kronecker products: 10.1, 10.2, 10.3, 11.1, 11.2, 11.3
   Iterative methods: 12.1–12.4, 13.1–13.3, 13.5
   Eigenpairs: 14.1–14.5, 15.1–15.3
Chapters 2–4 give a rather complete treatment of various LU factorizations.
Chapters 5–9 cover QR and singular value factorizations, matrix norms, least
squares methods and perturbation theory for linear equations and least squares
problems.
Chapter 10 gives an introduction to Kronecker products. We illustrate their use
by giving simple proofs of properties of the matrix arising from a discretization
of the two-dimensional Poisson equation. Also, we study fast methods based on
eigenvector expansions and the Fast Fourier Transform in Chap. 11. Some
background from Chaps. 2, 3 and 4 may be needed for Chaps. 10 and 11.
Iterative methods are studied in Chaps. 12 and 13. This includes the classical
methods of Jacobi, Gauss-Seidel, Richardson and Successive Overrelaxation
(SOR), as well as a derivation and convergence analysis of the methods of
steepest descent and conjugate gradients. The preconditioned conjugate gradient
method is introduced and applied to the Poisson problem with variable coefficients.
In Chap. 14 we consider perturbation theory for eigenvalues, the power
method and its variants, and use the Inertia Theorem to find a single eigenvalue of
a symmetric matrix. Chapter 15 gives a brief informal introduction to one of the
most celebrated algorithms of the twentieth century, the QR method for finding
all eigenvalues and eigenvectors of a matrix.
5. In this book we give many detailed numerical algorithms for solving linear
algebra problems. We have written these algorithms as functions in MATLAB. A
list of these functions and the page number where they can be found is included
after the table of contents. Moreover, their listings can be found online at http://
folk.uio.no/tom/numlinalg/code. Complexity is discussed briefly in Sect. 3.3.2.
As for programming issues, we often vectorize the algorithms, leading to shorter
and more efficient programs. Stability is important both for the mathematical
problems and for the numerical algorithms. Stability can be studied in terms of
perturbation theory that leads to condition numbers, see Chaps. 8, 9 and 14. We
will often use phrases like “the algorithm is numerically stable” or “the algorithm
is not numerically stable” without saying precisely what we mean by this.
Loosely speaking, an algorithm is numerically stable if the solution, computed
in floating point arithmetic, is the exact solution of a slightly perturbed problem.
To determine upper bounds for these perturbations is the topic of backward error
analysis. We refer to [7] and [17, 18] for an in-depth treatment.
A list of freely available software tools for solving linear algebra problems
can be found at
www.netlib.org/utk/people/JackDongarra/la-sw.html
To supplement this volume the reader might consult Björck [2], Meyer [15] and
Stewart [17, 18]. For matrix analysis the two volumes by Horn and Johnson [9, 10]
contain considerable additional material.
Acknowledgments
I would like to thank my colleagues Elaine Cohen, Geir Dahl, Michael Floater,
Knut Mørken, Richard Riesenfeld, Nils Henrik Risebro, Øyvind Ryan and Ragnar
Winther for all the inspiring discussions we have had over the years. Earlier versions
of this book were converted to LaTeX by Are Magnus Bruaset and Njål Foldnes
with help for the final version from Øyvind Ryan. I thank Christian Schulz, Georg
Muntingh and Øyvind Ryan, who helped me with the exercise sessions, and we have,
in a separate volume, provided solutions to practically all problems in this book. I
also thank an anonymous referee for useful suggestions. Finally, I would like to give
a special thanks to Larry Schumaker for his enduring friendship and encouragement
over the years.
Oslo, Norway
June 2019
Tom Lyche
Contents

1 A Short Review of Linear Algebra ..... 1
  1.1 Notation ..... 1
  1.2 Vector Spaces and Subspaces ..... 5
    1.2.1 Linear Independence and Bases ..... 6
    1.2.2 Subspaces ..... 8
    1.2.3 The Vector Spaces Rn and Cn ..... 10
  1.3 Linear Systems ..... 11
    1.3.1 Basic Properties ..... 12
    1.3.2 The Inverse Matrix ..... 13
  1.4 Determinants ..... 15
  1.5 Eigenvalues, Eigenvectors and Eigenpairs ..... 18
  1.6 Exercises Chap. 1 ..... 20
    1.6.1 Exercises Sect. 1.1 ..... 20
    1.6.2 Exercises Sect. 1.3 ..... 21
    1.6.3 Exercises Sect. 1.4 ..... 22

Part I LU and QR Factorizations

2 Diagonally Dominant Tridiagonal Matrices; Three Examples ..... 27
  2.1 Cubic Spline Interpolation ..... 27
    2.1.1 Polynomial Interpolation ..... 28
    2.1.2 Piecewise Linear and Cubic Spline Interpolation ..... 28
    2.1.3 Give Me a Moment ..... 31
    2.1.4 LU Factorization of a Tridiagonal System ..... 34
  2.2 A Two Point Boundary Value Problem ..... 37
    2.2.1 Diagonal Dominance ..... 38
  2.3 An Eigenvalue Problem ..... 40
    2.3.1 The Buckling of a Beam ..... 40
  2.4 The Eigenpairs of the 1D Test Matrix ..... 41
  2.5 Block Multiplication and Triangular Matrices ..... 43
    2.5.1 Block Multiplication ..... 43
    2.5.2 Triangular Matrices ..... 46
  2.6 Exercises Chap. 2 ..... 48
    2.6.1 Exercises Sect. 2.1 ..... 48
    2.6.2 Exercises Sect. 2.2 ..... 52
    2.6.3 Exercises Sect. 2.3 ..... 53
    2.6.4 Exercises Sect. 2.4 ..... 53
    2.6.5 Exercises Sect. 2.5 ..... 54
  2.7 Review Questions ..... 55

3 Gaussian Elimination and LU Factorizations ..... 57
  3.1 3 by 3 Example ..... 57
  3.2 Gauss and LU ..... 59
  3.3 Banded Triangular Systems ..... 62
    3.3.1 Algorithms for Triangular Systems ..... 62
    3.3.2 Counting Operations ..... 64
  3.4 The PLU Factorization ..... 66
    3.4.1 Pivoting ..... 66
    3.4.2 Permutation Matrices ..... 66
    3.4.3 Pivot Strategies ..... 69
  3.5 The LU and LDU Factorizations ..... 70
    3.5.1 Existence and Uniqueness ..... 71
  3.6 Block LU Factorization ..... 74
  3.7 Exercises Chap. 3 ..... 75
    3.7.1 Exercises Sect. 3.3 ..... 75
    3.7.2 Exercises Sect. 3.4 ..... 76
    3.7.3 Exercises Sect. 3.5 ..... 78
    3.7.4 Exercises Sect. 3.6 ..... 81
  3.8 Review Questions ..... 81

4 LDL* Factorization and Positive Definite Matrices ..... 83
  4.1 The LDL* Factorization ..... 83
  4.2 Positive Definite and Semidefinite Matrices ..... 85
    4.2.1 The Cholesky Factorization ..... 87
    4.2.2 Positive Definite and Positive Semidefinite Criteria ..... 89
  4.3 Semi-Cholesky Factorization of a Banded Matrix ..... 91
  4.4 The Non-symmetric Real Case ..... 95
  4.5 Exercises Chap. 4 ..... 96
    4.5.1 Exercises Sect. 4.2 ..... 96
  4.6 Review Questions ..... 97

5 Orthonormal and Unitary Transformations ..... 99
  5.1 Inner Products, Orthogonality and Unitary Matrices ..... 99
    5.1.1 Real and Complex Inner Products ..... 100
    5.1.2 Orthogonality ..... 102
    5.1.3 Sum of Subspaces and Orthogonal Projections ..... 104
    5.1.4 Unitary and Orthogonal Matrices ..... 106
  5.2 The Householder Transformation ..... 107
  5.3 Householder Triangulation ..... 111
    5.3.1 The Algorithm ..... 111
    5.3.2 The Number of Arithmetic Operations ..... 113
    5.3.3 Solving Linear Systems Using Unitary Transformations ..... 113
  5.4 The QR Decomposition and QR Factorization ..... 114
    5.4.1 Existence ..... 114
  5.5 QR and Gram-Schmidt ..... 116
  5.6 Givens Rotations ..... 117
  5.7 Exercises Chap. 5 ..... 119
    5.7.1 Exercises Sect. 5.1 ..... 119
    5.7.2 Exercises Sect. 5.2 ..... 119
    5.7.3 Exercises Sect. 5.4 ..... 120
    5.7.4 Exercises Sect. 5.5 ..... 123
    5.7.5 Exercises Sect. 5.6 ..... 123
  5.8 Review Questions ..... 125

Part II Eigenpairs and Singular Values

6 Eigenpairs and Similarity Transformations ..... 129
  6.1 Defective and Nondefective Matrices ..... 129
    6.1.1 Similarity Transformations ..... 131
    6.1.2 Algebraic and Geometric Multiplicity of Eigenvalues ..... 132
  6.2 The Jordan Factorization ..... 133
  6.3 The Schur Factorization and Normal Matrices ..... 135
    6.3.1 The Schur Factorization ..... 135
    6.3.2 Unitary and Orthogonal Matrices ..... 135
    6.3.3 Normal Matrices ..... 137
    6.3.4 The Rayleigh Quotient ..... 139
    6.3.5 The Quasi-Triangular Form ..... 139
    6.3.6 Hermitian Matrices ..... 140
  6.4 Minmax Theorems ..... 141
    6.4.1 The Hoffman-Wielandt Theorem ..... 143
  6.5 Left Eigenvectors ..... 143
    6.5.1 Biorthogonality ..... 144
  6.6 Exercises Chap. 6 ..... 145
    6.6.1 Exercises Sect. 6.1 ..... 145
    6.6.2 Exercises Sect. 6.2 ..... 147
    6.6.3 Exercises Sect. 6.3 ..... 149
    6.6.4 Exercises Sect. 6.4 ..... 150
  6.7 Review Questions ..... 150

7 The Singular Value Decomposition ..... 153
  7.1 The SVD Always Exists ..... 154
    7.1.1 The Matrices A*A, AA* ..... 154
  7.2 Further Properties of SVD ..... 156
    7.2.1 The Singular Value Factorization ..... 156
    7.2.2 SVD and the Four Fundamental Subspaces ..... 159
  7.3 A Geometric Interpretation ..... 159
  7.4 Determining the Rank of a Matrix Numerically ..... 161
    7.4.1 The Frobenius Norm ..... 161
    7.4.2 Low Rank Approximation ..... 162
  7.5 Exercises Chap. 7 ..... 163
    7.5.1 Exercises Sect. 7.1 ..... 163
    7.5.2 Exercises Sect. 7.2 ..... 164
    7.5.3 Exercises Sect. 7.4 ..... 167
  7.6 Review Questions ..... 168

Part III Matrix Norms and Least Squares

8 Matrix Norms and Perturbation Theory for Linear Systems ..... 171
  8.1 Vector Norms ..... 171
  8.2 Matrix Norms ..... 174
    8.2.1 Consistent and Subordinate Matrix Norms ..... 174
    8.2.2 Operator Norms ..... 175
    8.2.3 The Operator p-Norms ..... 177
    8.2.4 Unitary Invariant Matrix Norms ..... 179
    8.2.5 Absolute and Monotone Norms ..... 180
  8.3 The Condition Number with Respect to Inversion ..... 180
    8.3.1 Perturbation of the Right Hand Side in a Linear Systems ..... 181
    8.3.2 Perturbation of a Square Matrix ..... 183
  8.4 Proof That the p-Norms Are Norms ..... 185
    8.4.1 p-Norms and Inner Product Norms ..... 188
  8.5 Exercises Chap. 8 ..... 190
    8.5.1 Exercises Sect. 8.1 ..... 190
    8.5.2 Exercises Sect. 8.2 ..... 191
    8.5.3 Exercises Sect. 8.3 ..... 194
    8.5.4 Exercises Sect. 8.4 ..... 197
  8.6 Review Questions ..... 198

9 Least Squares ..... 199
  9.1 Examples ..... 200
    9.1.1 Curve Fitting ..... 202
  9.2 Geometric Least Squares Theory ..... 204
  9.3 Numerical Solution ..... 205
    9.3.1 Normal Equations ..... 205
    9.3.2 QR Factorization ..... 206
    9.3.3 Singular Value Decomposition, Generalized Inverses and Least Squares ..... 207
  9.4 Perturbation Theory for Least Squares ..... 210
    9.4.1 Perturbing the Right Hand Side ..... 211
    9.4.2 Perturbing the Matrix ..... 212
  9.5 Perturbation Theory for Singular Values ..... 213
    9.5.1 The Minmax Theorem for Singular Values and the Hoffman-Wielandt Theorem ..... 213
  9.6 Exercises Chap. 9 ..... 216
    9.6.1 Exercises Sect. 9.1 ..... 216
    9.6.2 Exercises Sect. 9.2 ..... 217
    9.6.3 Exercises Sect. 9.3 ..... 218
    9.6.4 Exercises Sect. 9.4 ..... 221
    9.6.5 Exercises Sect. 9.5 ..... 221
  9.7 Review Questions ..... 222

Part IV Kronecker Products and Fourier Transforms

10 The Kronecker Product ..... 225
  10.1 The 2D Poisson Problem ..... 225
    10.1.1 The Test Matrices ..... 228
  10.2 The Kronecker Product ..... 229
  10.3 Properties of the 2D Test Matrices ..... 232
  10.4 Exercises Chap. 10 ..... 234
    10.4.1 Exercises Sects. 10.1, 10.2 ..... 234
    10.4.2 Exercises Sect. 10.3 ..... 234
  10.5 Review Questions ..... 236

11 Fast Direct Solution of a Large Linear System ..... 237
  11.1 Algorithms for a Banded Positive Definite System ..... 237
    11.1.1 Cholesky Factorization ..... 238
    11.1.2 Block LU Factorization of a Block Tridiagonal Matrix ..... 238
    11.1.3 Other Methods ..... 239
  11.2 A Fast Poisson Solver Based on Diagonalization ..... 239
  11.3 A Fast Poisson Solver Based on the Discrete Sine and Fourier Transforms ..... 242
    11.3.1 The Discrete Sine Transform (DST) ..... 242
    11.3.2 The Discrete Fourier Transform (DFT) ..... 242
    11.3.3 The Fast Fourier Transform (FFT) ..... 244
    11.3.4 A Poisson Solver Based on the FFT ..... 247
  11.4 Exercises Chap. 11 ..... 247
    11.4.1 Exercises Sect. 11.3 ..... 247
  11.5 Review Questions ..... 250

Part V Iterative Methods for Large Linear Systems

12 The Classical Iterative Methods ..... 253
  12.1 Classical Iterative Methods; Component Form ..... 253
    12.1.1 The Discrete Poisson System ..... 255
  12.2 Classical Iterative Methods; Matrix Form ..... 257
    12.2.1 Fixed-Point Form ..... 258
    12.2.2 The Splitting Matrices for the Classical Methods ..... 258
  12.3 Convergence ..... 260
    12.3.1 Richardson's Method ..... 261
    12.3.2 Convergence of SOR ..... 263
    12.3.3 Convergence of the Classical Methods for the Discrete Poisson Matrix ..... 264
    12.3.4 Number of Iterations ..... 266
    12.3.5 Stopping the Iteration ..... 267
  12.4 Powers of a Matrix ..... 268
    12.4.1 The Spectral Radius ..... 268
    12.4.2 Neumann Series ..... 270
  12.5 The Optimal SOR Parameter ω ..... 271
  12.6 Exercises Chap. 12 ..... 274
    12.6.1 Exercises Sect. 12.3 ..... 274
    12.6.2 Exercises Sect. 12.4 ..... 276
  12.7 Review Questions ..... 277

13 The Conjugate Gradient Method ..... 279
  13.1 Quadratic Minimization and Steepest Descent ..... 280
  13.2 The Conjugate Gradient Method ..... 283
    13.2.1 Derivation of the Method ..... 283
    13.2.2 The Conjugate Gradient Algorithm ..... 285
    13.2.3 Numerical Example ..... 286
    13.2.4 Implementation Issues ..... 286
  13.3 Convergence ..... 288
    13.3.1 The Main Theorem ..... 288
    13.3.2 The Number of Iterations for the Model Problems ..... 289
    13.3.3 Krylov Spaces and the Best Approximation Property ..... 289
  13.4 Proof of the Convergence Estimates ..... 293
    13.4.1 Chebyshev Polynomials ..... 293
    13.4.2 Convergence Proof for Steepest Descent ..... 296
    13.4.3 Monotonicity of the Error ..... 298
  13.5 Preconditioning ..... 299
  13.6 Preconditioning Example ..... 302
    13.6.1 A Variable Coefficient Problem ..... 302
    13.6.2 Applying Preconditioning ..... 305
  13.7 Exercises Chap. 13 ..... 306
    13.7.1 Exercises Sect. 13.1 ..... 306
    13.7.2 Exercises Sect. 13.2 ..... 307
    13.7.3 Exercises Sect. 13.3 ..... 309
    13.7.4 Exercises Sect. 13.4 ..... 312
    13.7.5 Exercises Sect. 13.5 ..... 313
  13.8 Review Questions ..... 313

Part VI Eigenvalues and Eigenvectors

14 Numerical Eigenvalue Problems ..... 317
  14.1 Eigenpairs ..... 317
  14.2 Gershgorin's Theorem ..... 318
  14.3 Perturbation of Eigenvalues ..... 320
    14.3.1 Nondefective Matrices ..... 322
  14.4 Unitary Similarity Transformation of a Matrix into Upper Hessenberg Form ..... 324
    14.4.1 Assembling Householder Transformations ..... 326
  14.5 Computing a Selected Eigenvalue of a Symmetric Matrix ..... 326
    14.5.1 The Inertia Theorem ..... 328
    14.5.2 Approximating λm ..... 329
  14.6 Exercises Chap. 14 ..... 330
    14.6.1 Exercises Sect. 14.1 ..... 330
    14.6.2 Exercises Sect. 14.2 ..... 331
    14.6.3 Exercises Sect. 14.3 ..... 331
    14.6.4 Exercises Sect. 14.4 ..... 332
    14.6.5 Exercises Sect. 14.5 ..... 332
  14.7 Review Questions ..... 334

15 The QR Algorithm ..... 335
  15.1 The Power Method and Its Variants ..... 335
    15.1.1 The Power Method ..... 335
    15.1.2 The Inverse Power Method ..... 339
    15.1.3 Rayleigh Quotient Iteration ..... 340
  15.2 The Basic QR Algorithm ..... 342
    15.2.1 Relation to the Power Method ..... 343
    15.2.2 Invariance of the Hessenberg Form ..... 344
    15.2.3 Deflation ..... 345
  15.3 The Shifted QR Algorithms ..... 345
  15.4 Exercises Chap. 15 ..... 346
    15.4.1 Exercises Sect. 15.1 ..... 346
  15.5 Review Questions ..... 347

Part VII Appendix

16 Differentiation of Vector Functions ..... 351

References ..... 355

Index ..... 357
List of Figures

Fig. 1.1  The triangle T defined by the three points P1, P2 and P3 ..... 23
Fig. 2.1  The polynomial of degree 13 interpolating f(x) = arctan(10x) + π/2 on [−1, 1]. See text ..... 29
Fig. 2.2  The piecewise linear polynomial interpolating f(x) = arctan(10x) + π/2 at n = 14 uniform points on [−1, 1] ..... 29
Fig. 2.3  A cubic spline with one knot interpolating f(x) = x^4 on [0, 2] ..... 31
Fig. 2.4  A cubic B-spline ..... 34
Fig. 2.5  The cubic spline interpolating f(x) = arctan(10x) + π/2 at 14 equidistant sites on [−1, 1]. The exact function is also shown ..... 51
Fig. 3.1  Gaussian elimination ..... 59
Fig. 3.2  Lower triangular 5 × 5 band matrices: d = 1 (left) and d = 2 (right) ..... 62
Fig. 5.1  The construction of v1 and v2 in Gram-Schmidt. The constant c is given by c := ⟨s2, v1⟩/⟨v1, v1⟩ ..... 103
Fig. 5.2  The orthogonal projections of s + t into S and T ..... 105
Fig. 5.3  The Householder transformation in Example 5.1 ..... 108
Fig. 5.4  A plane rotation ..... 117
Fig. 7.1  The ellipse y1^2/9 + y2^2 = 1 (left) and the rotated ellipse AS (right) ..... 160
Fig. 8.1  A convex function ..... 185
Fig. 9.1  A least squares fit to data ..... 201
Fig. 9.2  Graphical interpretation of the bounds in Theorem 9.8 ..... 211
Fig. 10.1  Numbering of grid points ..... 227
Fig. 10.2  The 5-point stencil ..... 227
Fig. 10.3  Band structure of the 2D test matrix ..... 228
Fig. 11.1  Fill-in in the Cholesky factor of the Poisson matrix (n = 100) ..... 238
Fig. 12.1  The functions α → |1 − αλ1| and α → |1 − αλn| ..... 262
Fig. 12.2  ρ(Gω) with ω ∈ [0, 2] for n = 100 (lower curve) and n = 2500 (upper curve) ..... 266
Fig. 13.1  Level curves for Q(x, y) given by (13.4). Also shown is a steepest descent iteration (left) and a conjugate gradient iteration (right) to find the minimum of Q (cf. Examples 13.1, 13.2) ..... 281
Fig. 13.2  The orthogonal projection of x − x0 into Wk ..... 291
Fig. 13.3  This is an illustration of the proof of Theorem 13.6 for k = 3. f ≡ Q − Q∗ has a double zero at μ1 and one zero between μ2 and μ3 ..... 295
Fig. 14.1  The Gershgorin disk Ri ..... 319
Fig. 15.1  Post multiplication in a QR step ..... 344
List of Tables

Table 12.1  The number of iterations kn to solve the discrete Poisson problem with n unknowns using the methods of Jacobi, Gauss-Seidel, and SOR (see text) with a tolerance 10^−8 ..... 256
Table 12.2  Spectral radii for GJ, G1, Gω∗ and the smallest integer kn such that ρ(G)^kn ≤ 10^−8 ..... 266
Table 13.1  The number of iterations K for the averaging problem on a √n × √n grid for various n ..... 287
Table 13.2  The number of iterations K for the Poisson problem on a √n × √n grid for various n ..... 287
Table 13.3  The number of iterations K (no preconditioning) and Kpre (with preconditioning) for the problem (13.52) using the discrete Poisson problem as a preconditioner ..... 305
Table 15.1  Quadratic convergence of Rayleigh quotient iteration ..... 341
Listings

2.1  trifactor ..... 36
2.2  trisolve ..... 36
2.3  splineint ..... 50
2.4  findsubintervals ..... 50
2.5  splineval ..... 51
3.1  rforwardsolve ..... 63
3.2  rbacksolve ..... 63
3.3  cforwardsolve ..... 64
3.4  L1U ..... 73
3.5  cbacksolve ..... 75
4.1  LDLs ..... 85
4.2  bandcholesky ..... 89
4.3  bandsemicholeskyL ..... 94
5.1  housegen ..... 110
5.2  housetriang ..... 112
5.3  rothesstri ..... 123
11.1  fastpoisson ..... 241
11.2  fftrec ..... 246
12.1  jdp ..... 256
12.2  sordp ..... 257
13.1  cg ..... 286
13.2  cgtest ..... 287
13.3  pcg ..... 301
14.1  hesshousegen ..... 325
14.2  accumulateQ ..... 326
15.1  powerit ..... 338
15.2  rayleighit ..... 341
Chapter 1
A Short Review of Linear Algebra
In this introductory chapter we give a compact introduction to linear algebra with
emphasis on Rn and Cn . For a more elementary introduction, see for example the
book [13].
1.1 Notation
The following sets and notations will be used in this book.
1. The sets of natural numbers, integers, rational numbers, real numbers, and
complex numbers are denoted by N, Z, Q, R, C, respectively.
2. We use the “colon equal” symbol v := e to indicate that the symbol v is defined
by the expression e.
3. Rn is the set of n-tuples of real numbers, which we will represent as boldface column vectors. Thus x ∈ Rn means

    x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},

   where xi ∈ R for i = 1, . . . , n. Row vectors are normally identified using the transpose operation. Thus if x ∈ Rn then x is a column vector and x^T is a row vector.
4. Addition and scalar multiplication are denoted and defined by

    x + y := \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix}, \qquad a x := \begin{bmatrix} a x_1 \\ \vdots \\ a x_n \end{bmatrix},

   for x, y ∈ Rn and a ∈ R.
5. Rm×n is the set of m × n matrices A with real elements. The integers m and n are the number of rows and columns in the tableau

    A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.

   The element in the ith row and jth column of A will be denoted by ai,j, aij, A(i, j) or (A)i,j. We use the notations

    a_{:j} := \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}, \qquad a_{i:}^{T} := [a_{i1}, a_{i2}, \ldots, a_{in}], \qquad A = [a_{:1}, a_{:2}, \ldots, a_{:n}] = \begin{bmatrix} a_{1:}^{T} \\ a_{2:}^{T} \\ \vdots \\ a_{m:}^{T} \end{bmatrix}

   for the columns a:j and rows a^T_i: of A. We often drop the colon and write aj and a^T_i with the risk of some confusion. If m = 1 then A is a row vector, if n = 1 then A is a column vector, while if m = n then A is a square matrix. In this text we will denote matrices by boldface capital letters A, B, C, · · · and vectors most often by boldface lower case letters x, y, z, · · · .
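In MATLAB, which the book uses for its algorithms, the same column and row notation is available directly through colon indexing. The short sketch below is an illustration only (the small test matrix is invented), not one of the book's listings.

    A = [1 2 3; 4 5 6];   % a 2-by-3 matrix
    aij = A(2, 3);        % element a_23 in row 2, column 3
    a2  = A(:, 2);        % second column a_:2
    a1T = A(1, :);        % first row a_1:^T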
6. A complex number is a number written in the form x = a + ib, where a, b are real numbers and i, the imaginary unit, satisfies i^2 = -1. The set of all such numbers is denoted by C. The numbers a = Re x and b = Im x are the real and imaginary part of x. The number \bar{x} := a - ib is called the complex conjugate of x = a + ib, and |x| := \sqrt{x\bar{x}} = \sqrt{a^2 + b^2} the absolute value or modulus of x. The complex exponential function can be defined by

    e^x = e^{a+ib} := e^a (\cos b + i \sin b).

   In particular,

    e^{i\pi/2} = i, \qquad e^{i\pi} = -1, \qquad e^{2i\pi} = 1.
   We have e^{x+y} = e^x e^y for all x, y ∈ C. The polar form of a complex number is

    x = a + ib = r e^{i\theta}, \qquad r = |x| = \sqrt{a^2 + b^2}, \qquad \cos\theta = a/r, \quad \sin\theta = b/r.
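These quantities have direct counterparts in MATLAB's built-in complex arithmetic; the following small sketch (an illustration only, not one of the book's listings) relates the notation above to those functions.

    x = 3 + 4i;            % a complex number a + ib
    a = real(x);  b = imag(x);
    xbar = conj(x);        % complex conjugate a - ib
    r = abs(x);            % modulus sqrt(a^2 + b^2), here 5
    theta = angle(x);      % argument theta, so that x = r*exp(1i*theta)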
7. For matrices and vectors with complex elements we use the notation A ∈ Cm×n and x ∈ Cn. We define complex row vectors using either the transpose x^T or the conjugate transpose operation x^* := \bar{x}^T = [\bar{x}_1, \ldots, \bar{x}_n]. If x ∈ Rn then x^* = x^T.
8. For x, y ∈ Cn and a ∈ C the operations of vector addition and scalar multiplication are defined by component operations as in the real case (cf. 4.).
9. The arithmetic operations on rectangular matrices are
   • matrix addition C := A + B if A, B, C are matrices of the same size, i.e., with the same number of rows and columns, and cij := aij + bij for all i, j.
   • multiplication by a scalar C := αA, where cij := α aij for all i, j.
   • matrix multiplication C := AB, C = A · B or C = A ∗ B, where A ∈ Cm×p, B ∈ Cp×n, C ∈ Cm×n, and c_{ij} := \sum_{k=1}^{p} a_{ik} b_{kj} for i = 1, . . . , m, j = 1, . . . , n.
   • element-by-element matrix operations C := A × B, D := A/B, and E := A ∧ r, where all matrices are of the same size and cij := aij bij, dij := aij/bij and eij := aij^r for all i, j and suitable r. For the division A/B we assume that all elements of B are nonzero. The element-by-element product C = A × B is known as the Schur product and also the Hadamard product.
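In MATLAB the matrix product and the element-by-element operations are distinguished by a preceding dot, as the following illustrative sketch shows (invented test data, not one of the book's listings).

    A = [1 2; 3 4];  B = [5 6; 7 8];
    C = A * B;        % matrix multiplication, c_ij = sum over k of a_ik*b_kj
    H = A .* B;       % element-by-element (Schur/Hadamard) product
    D = A ./ B;       % element-by-element division (all elements of B nonzero)
    E = A .^ 2;       % element-by-element power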
10. Let A ∈ Rm×n or A ∈ Cm×n. The transpose A^T and conjugate transpose A^* are n × m matrices with elements a^T_{ij} := a_{ji} and a^*_{ij} := \bar{a}_{ji}, respectively. If B is an n × p matrix then (AB)^T = B^T A^T and (AB)^* = B^* A^*. A matrix A ∈ Cn×n is symmetric if A^T = A and Hermitian if A^* = A.
11. The unit vectors in Rn and Cn are denoted by

    e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n := \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix},

    while I_n = I := [\delta_{ij}]_{i,j=1}^{n}, where

    \delta_{ij} := \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases}    (1.1)

    is the identity matrix of order n. Both the columns and the transpose of the rows of I are the unit vectors e1, e2, . . . , en.
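A one-line MATLAB illustration (not from the book) of the identity matrix and of the unit vectors as its columns:

    n = 4;
    I = eye(n);        % identity matrix of order n
    e2 = I(:, 2);      % the unit vector e_2 is the second column of I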
12. Some matrices with many zeros have names indicating their "shape". Suppose A ∈ Rn×n or A ∈ Cn×n. Then A is
    • diagonal if aij = 0 for i ≠ j.
    • upper triangular or right triangular if aij = 0 for i > j.
    • lower triangular or left triangular if aij = 0 for i < j.
    • upper Hessenberg if aij = 0 for i > j + 1.
    • lower Hessenberg if aij = 0 for i < j + 1.
    • tridiagonal if aij = 0 for |i − j| > 1.
    • d-banded if aij = 0 for |i − j| > d.
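Matrices of these shapes can be produced in MATLAB with the built-in functions tril and triu; the sketch below is an illustration only, using an arbitrary test matrix.

    A = magic(5);              % a full 5-by-5 test matrix
    L = tril(A);               % lower triangular part of A
    U = triu(A);               % upper triangular part of A
    H = triu(A, -1);           % upper Hessenberg: zero below the first subdiagonal
    T = triu(tril(A, 1), -1);  % tridiagonal: keep only entries with |i - j| <= 1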
13. We use the following notations for diagonal and tridiagonal n × n matrices:

    diag(d_i) = diag(d_1, \ldots, d_n) := \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix},

    B = tridiag(a_i, d_i, c_i) = tridiag(a, d, c) := \begin{bmatrix} d_1 & c_1 & & & \\ a_1 & d_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & a_{n-2} & d_{n-1} & c_{n-1} \\ & & & a_{n-1} & d_n \end{bmatrix}.

    Here bii := di for i = 1, . . . , n, bi+1,i := ai, bi,i+1 := ci for i = 1, . . . , n − 1, and bij := 0 otherwise.
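In MATLAB the corresponding (dense) constructions use the built-in diag function; a small sketch with made-up entries, for illustration only:

    d = [2 2 2 2];  a = [-1 -1 -1];  c = [-1 -1 -1];
    D = diag(d);                               % diag(d_1, ..., d_n)
    B = diag(d) + diag(a, -1) + diag(c, 1);    % tridiag(a, d, c)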
14. Suppose A ∈ Cm×n and 1 ≤ i1 < i2 < · · · < ir ≤ m, 1 ≤ j1 < j2 < · · · < jc ≤ n. The matrix A(i, j) ∈ Cr×c is the submatrix of A consisting of rows i := [i1, . . . , ir] and columns j := [j1, . . . , jc]:

    A(i, j) := A\begin{pmatrix} i_1 & i_2 & \cdots & i_r \\ j_1 & j_2 & \cdots & j_c \end{pmatrix} = \begin{bmatrix} a_{i_1,j_1} & a_{i_1,j_2} & \cdots & a_{i_1,j_c} \\ a_{i_2,j_1} & a_{i_2,j_2} & \cdots & a_{i_2,j_c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i_r,j_1} & a_{i_r,j_2} & \cdots & a_{i_r,j_c} \end{bmatrix}.

    For the special case of consecutive rows and columns we also use the notation

    A(r_1 : r_2, c_1 : c_2) := \begin{bmatrix} a_{r_1,c_1} & a_{r_1,c_1+1} & \cdots & a_{r_1,c_2} \\ a_{r_1+1,c_1} & a_{r_1+1,c_1+1} & \cdots & a_{r_1+1,c_2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r_2,c_1} & a_{r_2,c_1+1} & \cdots & a_{r_2,c_2} \end{bmatrix}.
1.2 Vector Spaces and Subspaces
Many mathematical systems have analogous properties to vectors in R2 or R3 .
Definition 1.1 (Real Vector Space) A real vector space is a nonempty set V,
whose objects are called vectors, together with two operations + : V × V −→ V
and · : R × V −→ V, called addition and scalar multiplication, satisfying the
following axioms for all vectors u, v, w in V and scalars c, d in R.
(V1) The sum u + v is in V,
(V2) u + v = v + u,
(V3) u + (v + w) = (u + v) + w,
(V4) There is a zero vector 0 such that u + 0 = u,
(V5) For each u in V there is a vector −u in V such that u + (−u) = 0,
(S1) The scalar multiple c · u is in V,
(S2) c · (u + v) = c · u + c · v,
(S3) (c + d) · u = c · u + d · u,
(S4) c · (d · u) = (cd) · u,
(S5) 1 · u = u.
The scalar multiplication symbol · is often omitted, writing cv instead of c · v. We
define u − v := u + (−v). We call V a complex vector space if the scalars consist
of all complex numbers C. In this book a vector space is either real or complex.
From the axioms it follows that
1. The zero vector is unique.
2. For each u ∈ V the negative −u of u is unique.
3. 0u = 0, c0 = 0, and −u = (−1)u.
Here are some examples
1. The spaces Rn and Cn , where n ∈ N, are real and complex vector spaces,
respectively.
2. Let D be a subset of R and d ∈ N. The set V of all functions f , g : D → Rd is a
real vector space with
    (f + g)(t) := f(t) + g(t),   (cf)(t) := c f(t),   for t ∈ D and c ∈ R.
Two functions f , g in V are equal if f (t) = g(t) for all t ∈ D. The zero
element is the zero function given by f (t) = 0 for all t ∈ D and the negative
of f is given by −f = (−1)f . In the following we will use boldface letters for
functions only if d > 1.
3. For n ≥ 0 the space Π_n of polynomials of degree at most n consists of all polynomials p : R → R, p : R → C, or p : C → C of the form

    p(t) := a_0 + a_1 t + a_2 t^2 + · · · + a_n t^n,     (1.2)
where the coefficients a0 , . . . , an are real or complex numbers. p is called the
zero polynomial if all coefficients are zero. All other polynomials are said to be
nontrivial. The degree of a nontrivial polynomial p given by (1.2) is the smallest
integer 0 ≤ k ≤ n such that p(t) = a0 + · · · + ak t k with ak = 0. The degree of
the zero polynomial is not defined. Πn is a vector space if we define addition and
scalar multiplication as for functions.
Definition 1.2 (Linear Combination) For n ≥ 1 let X := {x_1, . . . , x_n} be a set of vectors in a vector space V and let c_1, . . . , c_n be scalars.
1. The sum c_1 x_1 + · · · + c_n x_n is called a linear combination of x_1, . . . , x_n.
2. The linear combination is nontrivial if c_j x_j ≠ 0 for at least one j.
3. The set of all linear combinations of elements in X is denoted span(X).
4. A vector space is finite dimensional if it has a finite spanning set; i.e., there exists n ∈ N and {x_1, . . . , x_n} in V such that V = span({x_1, . . . , x_n}).
Example 1.1 (Linear Combinations)
1. Any x = [x1 , . . . , xm ]T in Cm can be written as a linear combination of the unit
vectors as x = x1 e1 + x2 e2 + · · · + xm em . Thus, Cm = span({e1 , . . . , em }) and
Cm is finite dimensional. Similarly Rm is finite dimensional.
2. Let Π = ∪n Πn be the space of all polynomials. Π is a vector space that
is not finite dimensional. For suppose Π is finite dimensional. Then Π =
span({p1 , . . . , pm }) for some polynomials p1 , . . . , pm . Let d be an integer such
that the degree of pj is less than d for j = 1, . . . , m. A polynomial of degree d
cannot be written as a linear combination of p1 , . . . , pm , a contradiction.
1.2.1 Linear Independence and Bases
Definition 1.3 (Linear Independence) A set X = {x 1 , . . . , x n } of nonzero
vectors in a vector space is linearly dependent if 0 can be written as a nontrivial
linear combination of {x 1 , . . . , x n }. Otherwise X is linearly independent.
A set of vectors X = {x_1, . . . , x_n} is linearly independent if and only if

    c_1 x_1 + · · · + c_n x_n = 0   ⇒   c_1 = · · · = c_n = 0.     (1.3)
Suppose {x 1 , . . . , x n } is linearly independent. Then
1. If x ∈ span(X ) then the scalars c1 , . . . , cn in the representation x = c1 x 1 +· · ·+
cn x n are unique.
2. Any nontrivial linear combination of x_1, . . . , x_n is nonzero.
Lemma 1.1 (Linear Independence and Span) Suppose v 1 , . . . , v n span a vector
space V and that w1 , . . . , w k are linearly independent vectors in V. Then k ≤ n.
Proof Suppose k > n. Write w_1 as a linear combination of elements from the set X_0 := {v_1, . . . , v_n}, say w_1 = c_1 v_1 + · · · + c_n v_n. Since w_1 ≠ 0 not all the c's are equal to zero. Pick a nonzero c, say c_{i_1}. Then v_{i_1} can be expressed as a linear combination of w_1 and the remaining v's. So the set X_1 := {w_1, v_1, . . . , v_{i_1−1}, v_{i_1+1}, . . . , v_n} must also be a spanning set for V. We repeat this for w_2 and X_1. In the linear combination w_2 = d_{i_1} w_1 + Σ_{j≠i_1} d_j v_j, we must have d_{i_2} ≠ 0 for some i_2 with i_2 ≠ i_1, for otherwise w_2 = d_{i_1} w_1, contradicting the linear independence of the w's. So the set X_2 consisting of the v's with v_{i_1} replaced by w_1 and v_{i_2} replaced by w_2 is again a spanning set for V. Repeating this process n − 2 more times we obtain a spanning set X_n where v_1, . . . , v_n have been replaced by w_1, . . . , w_n. Since k > n we can then write w_k as a linear combination of w_1, . . . , w_n, contradicting the linear independence of the w's. We conclude that k ≤ n.
Definition 1.4 (Basis) A finite set of vectors {v 1 , . . . , v n } in a vector space V is a
basis for V if
1. span{v 1 , . . . , v n } = V.
2. {v 1 , . . . , v n } is linearly independent.
Theorem 1.1 (Basis Subset of a Spanning Set) Suppose V is a vector space and
that {v 1 , . . . , v n } is a spanning set for V. Then we can find a subset {v i1 , . . . , v ik }
that forms a basis for V.
Proof If {v 1 , . . . , v n } is linearly dependent we can express one of the v’s as a
nontrivial linear combination of the remaining v’s and drop that v from the spanning
set. Continue this process until the remaining v’s are linearly independent. They still
span the vector space and therefore form a basis.
Corollary 1.1 (Existence of a Basis) A vector space is finite dimensional (cf.
Definition 1.2) if and only if it has a basis.
Proof Let V = span{v 1 , . . . , v n } be a finite dimensional vector space. By Theorem 1.1, V has a basis. Conversely, if V = span{v 1 , . . . , v n } and {v 1 , . . . , v n } is a
basis then it is by definition a finite spanning set.
Theorem 1.2 (Dimension of a Vector Space) Every basis for a vector space V
has the same number of elements. This number is called the dimension of the vector
space and denoted dim V.
Proof Suppose X = {v 1 , . . . , v n } and Y = {w1 , . . . , w k } are two bases for V. By
Lemma 1.1 we have k ≤ n. Using the same Lemma with X and Y switched we
obtain n ≤ k. We conclude that n = k.
The set of unit vectors {e_1, . . . , e_n} forms a basis for both R^n and C^n.
Theorem 1.3 (Enlarging Vectors to a Basis) Every linearly independent set of
vectors {v 1 , . . . , v k } in a finite dimensional vector space V can be enlarged to a
basis for V.
Proof If {v 1 , . . . , v k } does not span V we can enlarge the set by one vector v k+1
which cannot be expressed as a linear combination of {v 1 , . . . , v k }. The enlarged
set is also linearly independent. Continue this process. Since the space is finite
dimensional it must stop after a finite number of steps.
1.2.2 Subspaces
Definition 1.5 (Subspace) A nonempty subset S of a real or complex vector space
V is called a subspace of V if
(V1) The sum u + v is in S for any u, v ∈ S.
(S1) The scalar multiple cu is in S for any scalar c and any u ∈ S.
Using the operations in V, any subspace S of V is a vector space, i.e., all 10
axioms V 1 − V 5 and S1 − S5 are satisfied for S. In particular, S must contain the
zero element in V. This follows since the operations of vector addition and scalar
multiplication are inherited from V.
Example 1.2 (Examples of Subspaces)
1. {0}, where 0 is the zero vector is a subspace, the trivial subspace. The dimension
of the trivial subspace is defined to be zero. All other subspaces are nontrivial.
2. V is a subspace of itself.
3. span(X ) is a subspace of V for any X = {x 1 , . . . , x n } ⊆ V. Indeed, it is easy to
see that (V1) and (S1) hold.
4. The sum of two subspaces S and T of a vector space V is defined by
S + T := {s + t : s ∈ S and t ∈ T }.
(1.4)
Clearly (V1) and (S1) hold and it is a subspace of V .
5. The intersection of two subspaces S and T of a vector space V is defined by
S ∩ T := {x : x ∈ S and x ∈ T }.
(1.5)
It is a subspace of V.
6. The union of two subspaces S and T of a vector space V is defined by
S ∪ T := {x : x ∈ S or x ∈ T }.
(1.6)
In general it is not a subspace of V.
7. A sum of two subspaces S and T of a vector space V is called a direct sum and
denoted S ⊕ T if S ∩ T = {0}.
Theorem 1.4 (Dimension Formula for Sums of Subspaces) Let S and T be two finite dimensional subspaces of a vector space V. Then
dim(S + T ) = dim(S) + dim(T ) − dim(S ∩ T ).
(1.7)
In particular, for a direct sum
dim(S ⊕ T ) = dim(S) + dim(T ).
(1.8)
Proof Let {u1 , . . . , up } be a basis for S ∩ T , where {u1 , . . . , up } = ∅, the empty
set, in the case S ∩ T = {0}. We use Theorem 1.3 to extend {u1 , . . . , up } to a basis
{u1 , . . . , up , s 1 , . . . , s q } for S and a basis {u1 , . . . , up , t 1 , . . . , t r } for T . Every
x ∈ S + T can be written as a linear combination of
{u1 , . . . , up , s 1 , . . . , s q , t 1 , . . . , t r }
so these vectors span S + T. We show that they are linearly independent and hence a basis. Suppose u + s + t = 0, where u := Σ_{j=1}^{p} α_j u_j, s := Σ_{j=1}^{q} ρ_j s_j, and t := Σ_{j=1}^{r} σ_j t_j. Now s = −(u + t) belongs to both S and to T and hence s ∈ S ∩ T. Therefore s can be written as a linear combination of u_1, . . . , u_p, say s := Σ_{j=1}^{p} β_j u_j. But then 0 = Σ_{j=1}^{p} β_j u_j − Σ_{j=1}^{q} ρ_j s_j, and since
{u_1, . . . , u_p, s_1, . . . , s_q}
is linearly independent we must have β_1 = · · · = β_p = ρ_1 = · · · = ρ_q = 0
and hence s = 0. We then have u + t = 0 and by linear independence of
{u1 , . . . , up , t 1 , . . . , t r } we obtain α1 = · · · = αp = σ1 = · · · = σr = 0. We
have shown that the vectors {u1 , . . . , up , s 1 , . . . , s q , t 1 , . . . , t r } constitute a basis
for S + T . But then
dim(S +T ) = p +q +r = (p +q)+(p +r)−p = dim(S)+dim(T )−dim(S ∩T )
and (1.7) follows. Equation (1.7) implies (1.8) since dim{0} = 0.
It is convenient to introduce a matrix transforming a basis in a subspace into a
basis for the space itself.
Lemma 1.2 (Change of Basis Matrix) Suppose S is a subspace of a finite
dimensional vector space V and let {s 1 , . . . , s n } be a basis for S and {v 1 , . . . , v m }
a basis for V. Then each s j can be expressed as a linear combination of v 1 , . . . , v m ,
say
    s_j = Σ_{i=1}^{m} a_ij v_i   for j = 1, . . . , n.     (1.9)
If x ∈ S then x = Σ_{j=1}^{n} c_j s_j = Σ_{i=1}^{m} b_i v_i for some coefficients b := [b_1, . . . , b_m]^T, c := [c_1, . . . , c_n]^T. Moreover b = Ac, where A = [a_ij] ∈ C^{m×n} is given by (1.9). The matrix A has linearly independent columns.
Proof Equation (1.9) holds for some a_ij since s_j ∈ V and {v_1, . . . , v_m} spans V. Since {s_1, . . . , s_n} is a basis for S and {v_1, . . . , v_m} a basis for V, every x ∈ S can be written x = Σ_{j=1}^{n} c_j s_j = Σ_{i=1}^{m} b_i v_i for some scalars (c_j) and (b_i). But then

    Σ_{i=1}^{m} b_i v_i = x = Σ_{j=1}^{n} c_j s_j  =  Σ_{j=1}^{n} c_j Σ_{i=1}^{m} a_ij v_i  =  Σ_{i=1}^{m} ( Σ_{j=1}^{n} a_ij c_j ) v_i,

using (1.9) in the third step. Since {v_1, . . . , v_m} is linearly independent it follows that b_i = Σ_{j=1}^{n} a_ij c_j for i = 1, . . . , m, or b = Ac. Finally, to show that A has linearly independent columns suppose b := Ac = 0 for some c = [c_1, . . . , c_n]^T. Define x := Σ_{j=1}^{n} c_j s_j. Then x = Σ_{i=1}^{m} b_i v_i and since b = 0 we have x = 0. But since {s_1, . . . , s_n} is linearly independent it follows that c = 0.
The matrix A in Lemma 1.2 is called a change of basis matrix.
1.2.3 The Vector Spaces Rn and Cn
When V = Rm or Cm we can think of n vectors in V, say x 1 , . . . , x n , as a set
X := {x 1 , . . . , x n } or as the columns of an m × n matrix X = [x 1 , . . . , x n ]. A
linear combination can then be written as a matrix times vector Xc, where c =
[c1 , . . . , cn ]T is the vector of scalars. Thus
R(X) := {Xc : c ∈ Rn } = span(X ).
Definition 1.6 (Column Space, Null Space, Inner Product and Norm) Associated with an m × n matrix X = [x 1 , . . . , x n ], where x j ∈ V, j = 1, . . . , n are the
following subspaces of V.
1. The subspace R(X) is called the column space of X. It is the smallest subspace
containing X = {x 1 , . . . , x n }. The dimension of R(X) is called the rank of X.
The matrix X has rank n if and only if it has linearly independent columns.
2. R(X T ) is called the row space of X. It is generated by the rows of X written as
column vectors.
3. The subspace N (X) := {y ∈ Rn : Xy = 0} is called the null space or kernel
space of X. The dimension of N (X) is called the nullity of X and denoted
null(X).
4. The standard inner product is

    ⟨x, y⟩ := y^∗ x = Σ_{j=1}^{n} x_j ȳ_j.     (1.10)

5. The Euclidean norm is defined by

    ‖x‖_2 := ( Σ_{j=1}^{n} |x_j|^2 )^{1/2} = √(x^∗ x).     (1.11)
Clearly N (X) is nontrivial if and only if X has linearly dependent columns. Inner
products and norms are treated in more generality in Chaps. 5 and 8.
The following Theorem is shown in any basic course in linear algebra. See
Exercise 7.10 for a simple proof using the singular value decomposition.
Theorem 1.5 (Counting Dimensions of Fundamental Subspaces) Suppose X ∈
Cm×n . Then
1. rank(X) = rank(X∗ ).
2. rank(X) + null(X) = n,
3. rank(X) + null(X ∗ ) = m,
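A quick numerical illustration of Theorem 1.5, using MATLAB's rank and null functions (a hypothetical example, not from the text):

% Column 3 equals column 1 + column 2, so rank(X) = 2, the nullity is 1,
% and their sum equals n = 3, in agreement with Theorem 1.5.
X = [1 2 3; 2 4 6; 1 0 1];
r = rank(X);
N = null(X);                  % orthonormal basis for the null space of X
disp([r, size(N,2), r + size(N,2)])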
1.3 Linear Systems
Consider a linear system
    a_11 x_1 + a_12 x_2 + · · · + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + · · · + a_2n x_n = b_2
        ⋮          ⋮                 ⋮        ⋮
    a_m1 x_1 + a_m2 x_2 + · · · + a_mn x_n = b_m
of m equations in n unknowns. Here for all i, j , the coefficients aij , the unknowns
xj , and the components bi of the right hand side are real or complex numbers. The
system can be written as a vector equation
x1 a 1 + x2 a 2 + · · · + xn a n = b,
where a_j = [a_1j, . . . , a_mj]^T ∈ C^m for j = 1, . . . , n and b = [b_1, . . . , b_m]^T ∈ C^m.
It can also be written as a matrix equation

    Ax = [ a_11  a_12  ···  a_1n ] [ x_1 ]   [ b_1 ]
         [ a_21  a_22  ···  a_2n ] [ x_2 ]   [ b_2 ]
         [  ⋮     ⋮    ⋱    ⋮   ] [  ⋮  ] = [  ⋮  ] = b.
         [ a_m1  a_m2  ···  a_mn ] [ x_n ]   [ b_m ]
The system is homogeneous if b = 0 and it is said to be underdetermined,
square, or overdetermined if m < n, m = n, or m > n, respectively.
1.3.1 Basic Properties
A linear system has a unique solution, infinitely many solutions, or no solution. To
discuss this we first consider the real case, and a homogeneous underdetermined
system.
Lemma 1.3 (Underdetermined System) Suppose A ∈ Rm×n with m < n. Then
there is a nonzero x ∈ Rn such that Ax = 0.
Proof Suppose A ∈ Rm×n with m < n. The n columns of A span a subspace of
Rm . Since Rm has dimension m the dimension of this subspace is at most m. By
Lemma 1.1 the columns of A must be linearly dependent. It follows that there is a
nonzero x ∈ Rn such that Ax = 0.
A square matrix is either nonsingular or singular.
Definition 1.7 (Real Nonsingular or Singular Matrix) A square matrix A ∈
Rn×n is said to be nonsingular if the only real solution of the homogeneous system
Ax = 0 is x = 0. The matrix is singular if there is a nonzero x ∈ Rn such that
Ax = 0.
Theorem 1.6 (Linear Systems; Existence and Uniqueness) Suppose A ∈ Rn×n .
The linear system Ax = b has a unique solution x ∈ Rn for any b ∈ Rn if and only
if the matrix A is nonsingular.
Proof Suppose A is nonsingular. We define B = [A, b] ∈ R^{n×(n+1)} by adding a column to A. By Lemma 1.3 there is a nonzero z ∈ R^{n+1} such that Bz = 0. If we write z = [z̃^T, z_{n+1}]^T, where z̃ = [z_1, . . . , z_n]^T ∈ R^n and z_{n+1} ∈ R, then

    Bz = [A, b] [z̃^T, z_{n+1}]^T = Az̃ + z_{n+1} b = 0.
We cannot have z_{n+1} = 0, for then Az̃ = 0 for a nonzero z̃, contradicting the nonsingularity of A. Define x := −z̃/z_{n+1}. Then

    Ax = −A z̃/z_{n+1} = −(1/z_{n+1}) Az̃ = −(1/z_{n+1}) (−z_{n+1} b) = b,

so x is a solution.
Suppose Ax = b and Ay = b for x, y ∈ Rn . Then A(x − y) = 0 and since A is
nonsingular we conclude that x − y = 0 or x = y. Thus the solution is unique.
Conversely, if Ax = b has a unique solution for any b ∈ Rn then Ax = 0 has a
unique solution which must be x = 0. Thus A is nonsingular.
For the complex case we have
Lemma 1.4 (Complex Underdetermined System) Suppose A ∈ Cm×n with m <
n. Then there is a nonzero x ∈ Cn such that Ax = 0.
Definition 1.8 (Complex Nonsingular Matrix) A square matrix A ∈ Cn×n is said
to be nonsingular if the only complex solution of the homogeneous system Ax = 0
is x = 0. The matrix is singular if it is not nonsingular.
Theorem 1.7 (Complex Linear System; Existence and Uniqueness) Suppose
A ∈ Cn×n . The linear system Ax = b has a unique solution x ∈ Cn for any
b ∈ Cn if and only if the matrix A is nonsingular.
1.3.2 The Inverse Matrix
Suppose A ∈ Cn×n is a square matrix. A matrix B ∈ Cn×n is called a right inverse
of A if AB = I . A matrix C ∈ Cn×n is said to be a left inverse of A if CA = I .
We say that A is invertible if it has both a left- and a right inverse. If A has a right
inverse B and a left inverse C then
C = CI = C(AB) = (CA)B = I B = B
and this common inverse is called the inverse of A and denoted by A−1 . Thus the
inverse satisfies A−1 A = AA−1 = I .
We want to characterize the class of invertible matrices and start with the following result.
Theorem 1.8 (Product of Nonsingular Matrices) If A, B, C ∈ Cn×n with AB =
C then C is nonsingular if and only if both A and B are nonsingular. In particular,
if either AB = I or BA = I then A is nonsingular and A−1 = B.
Proof Suppose both A and B are nonsingular and let Cx = 0. Then ABx = 0 and
since A is nonsingular we see that Bx = 0. Since B is nonsingular we have x = 0.
We conclude that C is nonsingular.
For the converse suppose first that B is singular and let x ∈ Cn be a nonzero
vector so that Bx = 0. But then Cx = (AB)x = A(Bx) = A0 = 0 so C is
singular. Finally suppose B is nonsingular, but A is singular. Let x̃ be a nonzero
vector such that Ax̃ = 0. By Theorem 1.7 there is a vector x such that Bx = x̃ and
x is nonzero since x̃ is nonzero. But then Cx = (AB)x = A(Bx) = Ax̃ = 0 for a
nonzero vector x and C is singular.
Theorem 1.9 (When Is a Square Matrix Invertible?) A square matrix is invertible if and only if it is nonsingular.
Proof Suppose first A is a nonsingular matrix. By Theorem 1.7 each of the linear systems Ab_i = e_i has a unique solution b_i for i = 1, . . . , n. Let B = [b_1, . . . , b_n]. Then AB = [Ab_1, . . . , Ab_n] = [e_1, . . . , e_n] = I so that A has a right inverse B.
By Theorem 1.8 B is nonsingular since I is nonsingular and AB = I . Since B is
nonsingular we can use what we have shown for A to conclude that B has a right
inverse C, i.e. BC = I . But then AB = BC = I so B has both a right inverse and
a left inverse which must be equal so A = C. Since BC = I we have BA = I , so
B is also a left inverse of A and A is invertible.
Conversely, if A is invertible then it has a right inverse B. Since AB = I and I
is nonsingular, we again use Theorem 1.8 to conclude that A is nonsingular.
To verify that some matrix B is an inverse of another matrix A it is enough to
show that B is either a left inverse or a right inverse of A. This calculation also
proves that A is nonsingular. We use this observation to give simple proofs of the
following results.
Corollary 1.2 (Basic Properties of the Inverse Matrix) Suppose A, B ∈ Cn×n
are nonsingular and c is a nonzero constant.
1. A^{−1} is nonsingular and (A^{−1})^{−1} = A.
2. C = AB is nonsingular and C^{−1} = B^{−1} A^{−1}.
3. A^T is nonsingular and (A^T)^{−1} = (A^{−1})^T =: A^{−T}.
4. A^∗ is nonsingular and (A^∗)^{−1} = (A^{−1})^∗ =: A^{−∗}.
5. cA is nonsingular and (cA)^{−1} = (1/c) A^{−1}.
Proof
1. Since A−1 A = I the matrix A is a right inverse of A−1 . Thus A−1 is nonsingular
and (A−1 )−1 = A.
2. We note that (B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 B = I . Thus AB is
invertible with the indicated inverse since it has a left inverse.
3. Now I = I T = (A−1 A)T = AT (A−1 )T showing that (A−1 )T is a right inverse
of AT . The proof of part 4 is similar.
5. The matrix (1/c) A^{−1} is a one-sided inverse of cA.
1.4 Determinants
The first systematic treatment of determinants was given by Cauchy in 1812. He
adopted the word “determinant”. The first use of determinants was made by Leibniz
in 1693 in a letter to De L’Hôspital. By the beginning of the twentieth century the
theory of determinants filled four volumes of almost 2000 pages (Muir, 1906–1923.
Historic references can be found in this work). The main use of determinants in this
text will be to study the characteristic polynomial of a matrix and to show that a
matrix is nonsingular.
For any A ∈ Cn×n the determinant of A is defined by the number
    det(A) = Σ_{σ∈S_n} sign(σ) a_{σ(1),1} a_{σ(2),2} · · · a_{σ(n),n}.     (1.12)

This sum ranges over all n! permutations of {1, 2, . . . , n}. Moreover, sign(σ) = (−1)^k, where k is the number of times a bigger integer precedes a smaller one in σ. We also denote the determinant by (Cayley, 1841)

    | a_11  a_12  ···  a_1n |
    | a_21  a_22  ···  a_2n |
    |  ⋮     ⋮    ⋱    ⋮   |
    | a_n1  a_n2  ···  a_nn |
From the definition we have

    | a_11  a_12 |
    | a_21  a_22 |  = a_11 a_22 − a_21 a_12.

The first term on the right corresponds to the identity permutation given by σ(i) = i, i = 1, 2. The second term comes from the permutation σ = {2, 1}. For n = 3 there are six permutations of {1, 2, 3}. Then

    | a_11  a_12  a_13 |
    | a_21  a_22  a_23 |  = a_11 a_22 a_33 − a_11 a_32 a_23 − a_21 a_12 a_33
    | a_31  a_32  a_33 |        + a_21 a_32 a_13 + a_31 a_12 a_23 − a_31 a_22 a_13.

This follows since sign({1, 2, 3}) = sign({2, 3, 1}) = sign({3, 1, 2}) = 1, and noting that interchanging two numbers in a permutation reverses its sign we find sign({2, 1, 3}) = sign({3, 2, 1}) = sign({1, 3, 2}) = −1.
To compute the value of a determinant from the definition can be a trying
experience. It is often better to use elementary operations on rows or columns
to reduce it to a simpler form. For example, if A is triangular then det(A) =
a11 a22 · · · ann , the product of the diagonal elements. In particular, for the identity
matrix det(I ) = 1. The elementary operations using either rows or columns are
1. Interchanging two rows (columns): det(B) = − det(A),
2. Multiplying a row (column) by a scalar α: det(B) = α det(A),
3. Adding a constant multiple of one row (column) to another row (column): det(B) = det(A),
where B is the result of performing the indicated operation on A.
If only a few elements in a row or column are nonzero then a cofactor expansion
can be used. These expansions take the form
    det(A) = Σ_{j=1}^{n} (−1)^{i+j} a_ij det(A_ij)   for i = 1, . . . , n,  (row)     (1.13)

    det(A) = Σ_{i=1}^{n} (−1)^{i+j} a_ij det(A_ij)   for j = 1, . . . , n,  (column).     (1.14)
Here Ai,j denotes the submatrix of A obtained by deleting the ith row and j th
column of A. For A ∈ Cn×n and 1 ≤ i, j ≤ n the determinant det(Aij ) is called the
cofactor of aij .
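As an illustration (not part of the text), the row expansion (1.13) with i = 1 translates directly into a recursive MATLAB function. The function name cofactordet is hypothetical, and the cost grows factorially, so this is only for very small matrices:

function dA = cofactordet(A)
% dA = cofactordet(A)
% det(A) by cofactor expansion along the first row, cf. (1.13) with i = 1.
n = size(A,1);
if n == 1
    dA = A(1,1);
else
    dA = 0;
    for j = 1:n
        A1j = A(2:n, [1:j-1, j+1:n]);            % delete row 1 and column j
        dA = dA + (-1)^(1+j)*A(1,j)*cofactordet(A1j);
    end
end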
Example 1.3 (Determinant Equation for a Straight Line) The equation for a straight
line through two points (x1 , y1 ) and (x2 , y2 ) in the plane can be written as the
equation
    det(A) := | 1  x    y   |
              | 1  x_1  y_1 |  = 0
              | 1  x_2  y_2 |

involving a determinant of order 3. We can compute this determinant using row operations of type 3. Subtracting row 2 from row 3 and then row 1 from row 2, and then using a cofactor expansion on the first column we obtain

    | 1  x    y   |   | 1   x         y       |
    | 1  x_1  y_1 | = | 0   x_1 − x   y_1 − y |
    | 1  x_2  y_2 |   | 0   x_2 − x_1 y_2 − y_1 |

        = | x_1 − x    y_1 − y   |  = (x_1 − x)(y_2 − y_1) − (y_1 − y)(x_2 − x_1).
          | x_2 − x_1  y_2 − y_1 |

Rearranging the equation det(A) = 0 we obtain

    y − y_1 = ((y_2 − y_1)/(x_2 − x_1)) (x − x_1),
which is the slope form of the equation of a straight line.
We will freely use, without proofs, the following properties of determinants. If A, B are square matrices of order n with real or complex elements, then

1. det(AB) = det(A) det(B).
2. det(A^T) = det(A), and det(A^∗) is the complex conjugate of det(A).
3. det(aA) = a^n det(A), for a ∈ C.
4. A is singular if and only if det(A) = 0.
5. If A = [ C  D ; 0  E ] for some square matrices C, E then det(A) = det(C) det(E).
6. Cramer's rule: Suppose A ∈ C^{n×n} is nonsingular and b ∈ C^n. Let x = [x_1, x_2, . . . , x_n]^T be the unique solution of Ax = b. Then

    x_j = det(A_j(b)) / det(A),   j = 1, 2, . . . , n,

where A_j(b) denotes the matrix obtained from A by replacing the j-th column of A by b.
7. Adjoint formula for the inverse: If A ∈ C^{n×n} is nonsingular then

    A^{−1} = (1/det(A)) adj(A),

where the matrix adj(A) ∈ C^{n×n} with elements adj(A)_{i,j} = (−1)^{i+j} det(A_{j,i}) is called the adjoint of A. Moreover, A_{j,i} denotes the submatrix of A obtained by deleting the j-th row and i-th column of A.
8. Cauchy–Binet formula: Let A ∈ C^{m×p}, B ∈ C^{p×n} and C = AB. Suppose 1 ≤ r ≤ min{m, n, p} and let i = {i_1, . . . , i_r} and j = {j_1, . . . , j_r} be integers with 1 ≤ i_1 < i_2 < · · · < i_r ≤ m and 1 ≤ j_1 < j_2 < · · · < j_r ≤ n. Then

    | c_{i_1,j_1} ··· c_{i_1,j_r} |         | a_{i_1,k_1} ··· a_{i_1,k_r} |   | b_{k_1,j_1} ··· b_{k_1,j_r} |
    |      ⋮             ⋮      |  =  Σ_k |      ⋮             ⋮       | · |      ⋮             ⋮       | ,
    | c_{i_r,j_1} ··· c_{i_r,j_r} |         | a_{i_r,k_1} ··· a_{i_r,k_r} |   | b_{k_r,j_1} ··· b_{k_r,j_r} |

where we sum over all k = {k_1, . . . , k_r} with 1 ≤ k_1 < k_2 < · · · < k_r ≤ p. More compactly,

    det C(i, j) = Σ_k det A(i, k) det B(k, j).     (1.15)

Note the resemblance to the formula for matrix multiplication.
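A small numerical check of (1.15) (a hypothetical example, not from the text), with m = n = r = 2 and p = 3, so that C(i, j) = C:

% Verify det(C) = sum over k of det(A(:,k))*det(B(k,:)), k = {k1,k2}, k1 < k2.
A = [1 2 0; 3 1 1];                 % 2-by-3
B = [1 0; 2 1; 0 3];                % 3-by-2
C = A*B;
K = [1 2; 1 3; 2 3];                % all index pairs from {1,2,3}
rhs = 0;
for t = 1:size(K,1)
    k = K(t,:);
    rhs = rhs + det(A(:,k))*det(B(k,:));
end
disp([det(C) rhs])                  % both equal 10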
1.5 Eigenvalues, Eigenvectors and Eigenpairs
Suppose A ∈ Cn×n is a square matrix, λ ∈ C and x ∈ Cn . We say that (λ, x) is an
eigenpair for A if Ax = λx and x is nonzero. The scalar λ is called an eigenvalue
and x is said to be an eigenvector.1 The set of eigenvalues is called the spectrum
of A and is denoted by σ (A). For example, σ (I ) = {1, . . . , 1} = {1}.
Eigenvalues are the roots of the characteristic polynomial.
Lemma 1.5 (Characteristic Equation) For any A ∈ Cn×n we have λ ∈
σ (A) ⇐⇒ det(A − λI ) = 0.
Proof Suppose (λ, x) is an eigenpair for A. The equation Ax = λx can be written
(A − λI )x = 0. Since x is nonzero the matrix A − λI must be singular with a
zero determinant. Conversely, if det(A − λI ) = 0 then A − λI is singular and
(A − λI )x = 0 for some nonzero x ∈ Cn . Thus Ax = λx and (λ, x) is an eigenpair
for A.
The expression det(A − λI) is a polynomial of exact degree n in λ. For n = 3 we have

    det(A − λI) = | a_11 − λ   a_12       a_13     |
                  | a_21       a_22 − λ   a_23     | .
                  | a_31       a_32       a_33 − λ |

Expanding this determinant by the first column we find

    det(A − λI) = (a_11 − λ) | a_22 − λ  a_23     | − a_21 | a_12  a_13     | + a_31 | a_12      a_13 |
                             | a_32      a_33 − λ |         | a_32  a_33 − λ |        | a_22 − λ  a_23 |

                = (a_11 − λ)(a_22 − λ)(a_33 − λ) + r(λ)

for some polynomial r of degree at most one. In general
det(A − λI ) = (a11 − λ)(a22 − λ) · · · (ann − λ) + r(λ),
(1.16)
where each term in r(λ) has at most n − 2 factors containing λ. It follows that r is
a polynomial of degree at most n − 2, det(A − λI ) is a polynomial of exact degree
n in λ and the eigenvalues are the roots of this polynomial.
We observe that det(A − λI ) = (−1)n det(λI − A) so det(A − λI ) = 0 if and
only if det(λI − A) = 0.
1 The word “eigen” is derived from German and means “own”.
Definition 1.9 (Characteristic Polynomial of a Matrix) The function πA : C →
C given by πA (λ) = det(A − λI ) is called the characteristic polynomial of A.
The equation det(A − λI ) = 0 is called the characteristic equation of A.
By the fundamental theorem of algebra an n × n matrix has, counting multiplicities, precisely n eigenvalues λ_1, . . . , λ_n, some of which might be complex even if A is real. The complex eigenpairs of a real matrix occur in complex conjugate pairs. Indeed, taking the complex conjugate on both sides of the equation Ax = λx with A real gives A x̄ = λ̄ x̄.
Theorem 1.10 (Sums and Products of Eigenvalues; Trace) For any A ∈ Cn×n
trace(A) = λ1 + λ2 + · · · + λn ,
det(A) = λ1 λ2 · · · λn ,
(1.17)
where the trace of A ∈ Cn×n is the sum of its diagonal elements
trace(A) := a11 + a22 + · · · + ann .
(1.18)
Proof We compare two different expansions of πA . On the one hand from (1.16)
we find
πA (λ) = (−1)n λn + cn−1 λn−1 + · · · + c0 ,
where cn−1 = (−1)n−1 trace(A) and c0 = πA (0) = det(A). On the other hand
πA (λ) = (λ1 − λ) · · · (λn − λ) = (−1)n λn + dn−1 λn−1 + · · · + d0 ,
where dn−1 = (−1)n−1 (λ1 + · · · + λn ) and d0 = λ1 · · · λn . Since cj = dj for all j
we obtain (1.17).
For a 2 × 2 matrix the characteristic equation takes the convenient form
λ2 − trace(A)λ + det(A) = 0.
(1.19)
Thus, if A = [2 1; 1 2] then trace(A) = 4, det(A) = 3 so that π_A(λ) = λ^2 − 4λ + 3.
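These relations are easy to confirm numerically; a hypothetical MATLAB check (not from the text) for the 2 × 2 example above:

A = [2 1; 1 2];
lambda = eig(A);                       % eigenvalues 1 and 3
disp([sum(lambda) trace(A)])           % both equal 4, cf. (1.17)
disp([prod(lambda) det(A)])            % both equal 3, cf. (1.17)
disp(roots([1 -trace(A) det(A)]).')    % roots of lambda^2 - 4*lambda + 3, cf. (1.19)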
Since A is singular ⇐⇒ Ax = 0 for some x ≠ 0 ⇐⇒ Ax = 0x for some x ≠ 0 ⇐⇒ zero is an eigenvalue of A, we obtain
Theorem 1.11 (Zero Eigenvalue) The matrix A ∈ Cn×n is singular if and only if
zero is an eigenvalue.
Since the determinant of a triangular matrix is equal to the product of the diagonal
elements the eigenvalues of a triangular matrix are found on the diagonal. In general
it is not easy to find all eigenvalues of a matrix. However, sometimes the dimension
of the problem can be reduced. Since the determinant of a block triangular matrix is
equal to the product of the determinants of the diagonal blocks we obtain
Theorem 1.12 (Eigenvalues of a Block Triangular Matrix) If A = [ B  D ; 0  C ] is block triangular then π_A = π_B · π_C.
1.6 Exercises Chap. 1
1.6.1 Exercises Sect. 1.1
Exercise 1.1 (Strassen Multiplication (Exam Exercise 2017-1)) (By arithmetic
operations we mean additions, subtractions, multiplications and divisions.)
Let A and B be n × n real matrices.
a) With A, B ∈ Rn×n , how many arithmetic operations are required to form the
product AB?
b) Consider the 2n × 2n block matrix equation

    [ W  X ]   [ A  B ] [ E  F ]
    [ Y  Z ] = [ C  D ] [ G  H ] ,

where all matrices A, . . . , Z are in R^{n×n}. How many operations does it take to compute W, X, Y and Z by the obvious algorithm?
c) An alternative method to compute W, X, Y and Z is to use Strassen's formulas:

    P_1 = (A + D)(E + H),   P_2 = (C + D)E,        P_3 = A(F − H),
    P_4 = D(G − E),         P_5 = (A + B)H,        P_6 = (C − A)(E + F),
    P_7 = (B − D)(G + H),

    W = P_1 + P_4 − P_5 + P_7,   X = P_3 + P_5,
    Y = P_2 + P_4,               Z = P_1 + P_3 − P_2 + P_6.
You do not have to verify these formulas. What is the operation count for this
method?
d) Describe a recursive algorithm, based on Strassen’s formulas, which given two
matrices A and B of size m × m, with m = 2k for some k ≥ 0, calculates the
product AB.
e) Show that the operation count of the recursive algorithm is O(m^(log2 7)). Note that log2(7) ≈ 2.8 < 3, so this is less costly than straightforward matrix multiplication.
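For orientation only, here is a minimal MATLAB sketch (not part of the exercise or its official solution) of the recursive algorithm asked for in d), assuming both factors are m × m with m = 2^k. The blocks A11, A12, A21, A22 play the roles of A, B, C, D and B11, B12, B21, B22 the roles of E, F, G, H; the function name strassen is hypothetical:

function C = strassen(A,B)
% C = strassen(A,B)  -- recursive Strassen multiplication for m = 2^k.
m = size(A,1);
if m == 1
    C = A*B;
else
    n = m/2; i1 = 1:n; i2 = n+1:m;
    A11 = A(i1,i1); A12 = A(i1,i2); A21 = A(i2,i1); A22 = A(i2,i2);
    B11 = B(i1,i1); B12 = B(i1,i2); B21 = B(i2,i1); B22 = B(i2,i2);
    P1 = strassen(A11+A22, B11+B22);   % (A+D)(E+H)
    P2 = strassen(A21+A22, B11);       % (C+D)E
    P3 = strassen(A11, B12-B22);       % A(F-H)
    P4 = strassen(A22, B21-B11);       % D(G-E)
    P5 = strassen(A11+A12, B22);       % (A+B)H
    P6 = strassen(A21-A11, B11+B12);   % (C-A)(E+F)
    P7 = strassen(A12-A22, B21+B22);   % (B-D)(G+H)
    C = [P1+P4-P5+P7, P3+P5; P2+P4, P1+P3-P2+P6];
end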
1.6.2 Exercises Sect. 1.3
Exercise 1.2 (The Inverse of a General 2 × 2 Matrix) Show that

    [ a  b ]^{−1}        [  d  −b ]
    [ c  d ]       =  α  [ −c   a ] ,    α = 1/(ad − bc),

for any a, b, c, d such that ad − bc ≠ 0.
Exercise 1.3 (The Inverse of a Special 2 × 2 Matrix) Find the inverse of

    A = [ cos θ  −sin θ ]
        [ sin θ   cos θ ] .
Exercise 1.4 (Sherman-Morrison Formula) Suppose A ∈ Cn×n , and B, C ∈
Rn×m for some n, m ∈ N. If (I + C T A−1 B)−1 exists then
(A + BC T )−1 = A−1 − A−1 B(I + C T A−1 B)−1 C T A−1 .
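A hypothetical numerical check (not from the text) of this formula for one small example; the matrices below are arbitrary choices for which both inverses exist:

n = 4; m = 2;
A = diag([1 2 3 4]);
B = [1 0; 0 1; 1 1; 0 2];
C = [1 2; 0 1; 1 0; 2 1];
lhs = inv(A + B*C');
rhs = inv(A) - inv(A)*B*inv(eye(m) + C'*inv(A)*B)*C'*inv(A);
disp(norm(lhs - rhs))               % of the order of rounding error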
Exercise 1.5 (Inverse Update (Exam Exercise 1977-1))
a) Let u, v ∈ R^n and suppose v^T u ≠ 1. Show that I − uv^T has an inverse given by I − τ uv^T, where τ = 1/(v^T u − 1).
b) Let A ∈ R^{n×n} be nonsingular with inverse C := A^{−1}, and let a ∈ R^n. Let Ā be the matrix which differs from A by exchanging the i-th row of A with a^T, i.e., Ā = A − e_i(e_i^T A − a^T), where e_i is the i-th column in the identity matrix I. Show that if

    λ := a^T C e_i ≠ 0,     (1.20)

then Ā has an inverse C̄ = Ā^{−1} given by

    C̄ = C ( I + (1/λ) e_i (e_i^T − a^T C) ).     (1.21)

c) Write an algorithm which for given C and a checks if (1.20) holds and computes C̄ provided λ ≠ 0. (Hint: Use (1.21) to find formulas for computing each column in C̄.)
Exercise 1.6 (Matrix Products (Exam Exercise 2009-1)) Let A, B, C, E ∈
Rn×n be matrices where AT = A. In this problem an (arithmetic) operation is an
addition or a multiplication. We ask about exact numbers of operations.
a) How many operations are required to compute the matrix product BC? How
many operations are required if B is lower triangular?
b) Show that there exists a lower triangular matrix L ∈ Rn×n such that A = L+LT .
c) We have E T AE = S + S T where S = E T LE. How many operations are
required to compute E T AE in this way?
1.6.3 Exercises Sect. 1.4
Exercise 1.7 (Cramer's Rule; Special Case) Solve the following system by Cramer's rule:

    [ 1  2 ] [ x_1 ]   [ 3 ]
    [ 2  1 ] [ x_2 ] = [ 6 ] .
Exercise 1.8 (Adjoint Matrix; Special Case) Show that if

    A = [ 2  −6   3 ]
        [ 3  −2  −6 ]
        [ 6   3   2 ] ,

then

    adj(A) = [  14   21  42 ]
             [ −42  −14  21 ]
             [  21  −42  14 ] .

Moreover,

    adj(A) A = [ 343    0    0 ]
               [   0  343    0 ] = det(A) I.
               [   0    0  343 ]
Exercise 1.9 (Determinant Equation for a Plane) Show that

    | x    y    z    1 |
    | x_1  y_1  z_1  1 |
    | x_2  y_2  z_2  1 |  = 0
    | x_3  y_3  z_3  1 |

is the equation for a plane through three points (x_1, y_1, z_1), (x_2, y_2, z_2), (x_3, y_3, z_3) in space.
Fig. 1.1 The triangle T defined by the three points P_1, P_2 and P_3
Exercise 1.10 (Signed Area of a Triangle) Let P_i = (x_i, y_i), i = 1, 2, 3, be three points in the plane defining a triangle T. Show that the area of T is²

    A(T) = (1/2) | 1    1    1   |
                 | x_1  x_2  x_3 |
                 | y_1  y_2  y_3 | .

The area is positive if we traverse the vertices in counterclockwise order.
Exercise 1.11 (Vandermonde Matrix) Show that

    | 1  x_1  x_1^2  ···  x_1^{n−1} |
    | 1  x_2  x_2^2  ···  x_2^{n−1} |
    | ⋮   ⋮    ⋮           ⋮      |  =  ∏_{i>j} (x_i − x_j),
    | 1  x_n  x_n^2  ···  x_n^{n−1} |

where ∏_{i>j}(x_i − x_j) = ∏_{i=2}^{n} (x_i − x_1)(x_i − x_2) · · · (x_i − x_{i−1}). This determinant is called the Vandermonde determinant.³
Exercise 1.12 (Cauchy Determinant (1842)) Let α = [α1 , . . . , αn ]T , β =
[β1 , . . . , βn ]T be in Rn .
a) Consider the matrix A ∈ Rn×n with elements ai,j = 1/(αi + βj ), i, j =
1, 2, . . . , n. Show that
det(A) = P g(α)g(β)
² Hint: A(T) = A(ABP_3P_1) + A(P_3BCP_2) − A(P_1ACP_2), cf. Fig. 1.1.
³ Hint: subtract x_n times column k from column k + 1 for k = n−1, n−2, . . . , 1.
where P = ∏_{i=1}^{n} ∏_{j=1}^{n} a_ij, and for γ = [γ_1, . . . , γ_n]^T

    g(γ) = ∏_{i=2}^{n} (γ_i − γ_1)(γ_i − γ_2) · · · (γ_i − γ_{i−1}).

Hint: Multiply the i-th row of A by ∏_{j=1}^{n} (α_i + β_j) for i = 1, 2, . . . , n. Call the resulting matrix C. Each element of C is a product of n − 1 factors α_r + β_s. Hence det(C) is a sum of terms where each term contains precisely n(n − 1) factors α_r + β_s. Thus det(C) = q(α, β) where q is a polynomial of degree at most n(n − 1) in the α_i and β_j. Since det(A) and therefore det(C) vanishes if α_i = α_j for some i ≠ j or β_r = β_s for some r ≠ s, we have that q(α, β) must be divisible by each factor in g(α) and g(β). Since g(α)g(β) is a polynomial of degree n(n − 1), we have

    q(α, β) = k g(α) g(β)

for some constant k independent of α and β. Show that k = 1 by choosing β_i + α_i = 0, i = 1, 2, . . . , n.
b) Notice that the cofactor of any element in the above matrix A is the determinant of a matrix of similar form. Use the cofactor and determinant of A to represent the elements of A^{−1} = (b_{j,k}). Answer:

    b_{j,k} = (α_k + β_j) A_k(−β_j) B_j(−α_k),

where

    A_k(x) = ∏_{s≠k} (α_s − x)/(α_s − α_k),    B_k(x) = ∏_{s≠k} (β_s − x)/(β_s − β_k).
Exercise 1.13 (Inverse of the Hilbert Matrix) Let H_n = (h_{i,j}) be the n × n matrix with elements h_{i,j} = 1/(i + j − 1). Use Exercise 1.12 to show that the elements t^n_{i,j} in T_n = H_n^{−1} are given by

    t^n_{i,j} = f(i) f(j) / (i + j − 1),

where

    f(i + 1) = ((i^2 − n^2)/i^2) f(i),   i = 1, 2, . . . ,    f(1) = −n.
Part I
LU and QR Factorizations
The first three chapters in this part consider ways of factoring a matrix A into a lower
triangular matrix L and an upper triangular matrix U resulting in the product A =
LU . We also consider the factorization A = LDU , where L is lower triangular,
D is diagonal and U is upper triangular. Moreover, L and U have ones on their
respective diagonals.
Three simple introductory problems and related LU factorizations are considered
in Chap. 2. We also consider some basic properties of triangular matrices and
the powerful tool of block multiplication. We consider Gaussian elimination, its
relation to LU factorization, and the general theory of LU factorizations in Chap. 3.
Symmetric positive definite matrices, where LU factorizations play an important
role, are considered in Chap. 4.
There exist problems where Gaussian elimination leads to inaccurate results.
Such problems can to a large extent be avoided by using an alternative method based
on QR factorization. Here A = QR, where Q is unitary, i.e., Q∗ Q = I , and R
is upper triangular. QR factorization is related to Gram-Schmidt orthogonalization
of a basis in a vector space. The QR factorization plays an important role in
computational least squares and eigenvalue problems.
Chapter 2
Diagonally Dominant Tridiagonal Matrices; Three Examples
In this chapter we consider three problems originating from:
• cubic spline interpolation,
• a two point boundary value problem,
• an eigenvalue problem for a two point boundary value problem.
Each of these problems leads to a linear algebra problem with a matrix which is
diagonally dominant and tridiagonal. Taking advantage of structure we can show
existence, uniqueness and characterization of a solution, and derive efficient and
stable algorithms based on LU factorization to compute a numerical solution.
For a particular tridiagonal test matrix we determine all its eigenvectors and
eigenvalues. We will need these later when studying more complex problems.
We end the chapter with an introduction to block multiplication, a powerful tool
in matrix analysis and numerical linear algebra. Block multiplication is applied to
derive some basic facts about triangular matrices.
2.1 Cubic Spline Interpolation
We consider the following interpolation problem.
Given an interval [a, b], n + 1 ≥ 2 equidistant sites in [a, b],

    x_i = a + ((i − 1)/n)(b − a),   i = 1, 2, . . . , n + 1,     (2.1)
and y values y := [y_1, . . . , y_{n+1}]^T ∈ R^{n+1}. We seek a function g : [a, b] → R such that

    g(x_i) = y_i,   for i = 1, . . . , n + 1.     (2.2)
For simplicity we only consider equidistant sites. More generally they could be any
a ≤ x1 < x2 < · · · < xn+1 ≤ b.
2.1.1 Polynomial Interpolation
Since there are n + 1 interpolation conditions in (2.2) a natural choice for a function
g is a polynomial of degree n. As shown in most books on numerical methods such
a g is uniquely defined and there are good algorithms for computing it. Evidently,
when n = 1, g is the straight line
g(x) = y1 +
y2 − y1
(x − x1 ),
x2 − x1
(2.3)
known as the linear interpolation polynomial.
Polynomial interpolation is an important technique which often gives good
results, but the interpolant g can have undesirable oscillations when n is large. As
an example, consider the function given by
f (x) = arctan(10x) + π/2,
x ∈ [−1, 1].
The function f and the polynomial g of degree at most 13 satisfying (2.2) with
[a, b] = [−1, 1] and yi = f (xi ), i = 1, . . . , 14 is shown in Fig. 2.1. The interpolant
has large oscillations near the end of the range. This is an example of the Runge
phenomenon. Using larger n will only make the oscillations bigger.1
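A hypothetical MATLAB sketch (not from the text) that reproduces the qualitative behaviour in Fig. 2.1 with polyfit/polyval; polyfit may warn that the problem is badly conditioned, which is part of the point:

n = 13;
x = linspace(-1,1,n+1);                 % n+1 = 14 uniform sites
y = atan(10*x) + pi/2;
p = polyfit(x,y,n);                     % interpolating polynomial of degree 13
t = linspace(-1,1,400);
plot(t, polyval(p,t), t, atan(10*t)+pi/2, x, y, 'o')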
2.1.2 Piecewise Linear and Cubic Spline Interpolation
To avoid oscillations like the one in Fig. 2.1 piecewise linear interpolation can be
used. An example is shown in Fig. 2.2. The interpolant g approximates the original
function quite well, and for some applications, like plotting, the linear interpolant
using many points is what is used. Note that g is a piecewise polynomial of the form
    g(x) := { p_1(x),      if x_1 ≤ x < x_2,
              p_2(x),      if x_2 ≤ x < x_3,
                ⋮
              p_{n−1}(x),  if x_{n−1} ≤ x < x_n,
              p_n(x),      if x_n ≤ x ≤ x_{n+1},     (2.4)
1 This is due to the fact that the sites are uniformly spaced. High degree interpolation converges
uniformly to the function being interpolated when a sequence consisting of the extrema of the
Chebyshev polynomial of increasing degree is used as sites. This is not true for any continuous
function (the Faber theorem), but holds if the function is Lipschitz continuous.
Fig. 2.1 The polynomial of degree 13 interpolating f(x) = arctan(10x) + π/2 on [−1, 1]. See text
Fig. 2.2 The piecewise linear polynomial interpolating f(x) = arctan(10x) + π/2 at n = 14 uniform points on [−1, 1]
where each pi is a polynomial of degree ≤ 1. In particular, p1 is given in (2.3) and
the other polynomials pi are given by similar expressions.
The piecewise linear interpolant is continuous, but the first derivative will usually
have jumps at the interior sites. We can obtain a smoother approximation by letting
g be a piecewise polynomial of higher degree. With degree 3 (cubic) we obtain
continuous derivatives of order ≤ 2 (C 2 ). We consider here the following functions
giving examples of C 2 cubic spline interpolants.
Definition 2.1 (The D²-Spline Problem) Given n ∈ N, an interval [a, b], y ∈ R^{n+1}, knots (sites) x_1, . . . , x_{n+1} given by (2.1) and numbers μ_1, μ_{n+1}. The problem is to find a function g : [a, b] → R such that
• piecewise cubic polynomial: g is of the form (2.4) with each p_i a cubic polynomial,
• smoothness: g ∈ C²[a, b], i.e., derivatives of order ≤ 2 are continuous on [a, b],
• interpolation: g(x_i) = y_i, i = 1, 2, . . . , n + 1,
• D² boundary conditions: g''(a) = μ_1, g''(b) = μ_{n+1}.
We call g a D²-spline. It is called an N-spline or natural spline if μ_1 = μ_{n+1} = 0.
Example 2.1 (A D²-Spline) Suppose we choose n = 2 and sample data from the function f : [0, 2] → R given by f(x) = x^4. Thus we consider the D²-spline problem with [a, b] = [0, 2], y := [0, 1, 16]^T and μ_1 = g''(0) = 0, μ_3 = g''(2) = 48. The knots are x_1 = 0, x_2 = 1 and x_3 = 2. The function g given by

    g(x) := { p_1(x) = −(1/2)x + (3/2)x^3,                                if 0 ≤ x < 1,
              p_2(x) = 1 + 4(x − 1) + (9/2)(x − 1)^2 + (13/2)(x − 1)^3,   if 1 ≤ x ≤ 2,     (2.5)

is a D²-spline solving this problem. Indeed, p_1 and p_2 are cubic polynomials. For smoothness we find p_1(1) = p_2(1) = 1, p_1'(1) = p_2'(1) = 4, p_1''(1) = p_2''(1) = 9, which implies that g ∈ C²[0, 2]. Finally we check that the interpolation and boundary conditions hold. Indeed, g(0) = p_1(0) = 0, g(1) = p_2(1) = 1, g(2) = p_2(2) = 16, g''(0) = p_1''(0) = 0 and g''(2) = p_2''(2) = 48. Note that p_1'''(x) = 9 ≠ 39 = p_2'''(x), showing that the third derivative of g is piecewise constant with a jump discontinuity at the interior knot. A plot of f and g is shown in Fig. 2.3. It is hard to distinguish one from the other.
We note that
• The C² condition is equivalent to

    p_{i−1}^{(j)}(x_i) = p_i^{(j)}(x_i),   j = 0, 1, 2,   i = 2, . . . , n.
• The extra boundary conditions D2 or N are introduced to obtain a unique
interpolant. Indeed counting requirements we have 3(n − 1) C 2 conditions,
n + 1 conditions (2.2), and two boundary conditions, adding to 4n. Since a
cubic polynomial has four coefficients, this number is equal to the number of
coefficients of the n polynomials p1 , . . . , pn and give hope for uniqueness of the
interpolant.
Fig. 2.3 A cubic spline with one knot interpolating f (x) = x 4 on [0, 2]
2.1.3 Give Me a Moment
Existence and uniqueness of a solution of the D2 -spline problem hinges on the
nonsingularity of a linear system of equations that we now derive. The unknowns
are derivatives at the knots. Here we use second derivatives which are sometimes
called moments. We start with the following lemma.
Lemma 2.1 (Representing Each p_i Using (0, 2) Interpolation) Given a < b, h = (b − a)/n with n ≥ 2, x_i = a + (i − 1)h, and numbers y_i, μ_i for i = 1, . . . , n + 1. For i = 1, . . . , n there are unique cubic polynomials p_i such that

    p_i(x_i) = y_i,   p_i(x_{i+1}) = y_{i+1},   p_i''(x_i) = μ_i,   p_i''(x_{i+1}) = μ_{i+1}.     (2.6)

Moreover,

    p_i(x) = c_{i,1} + c_{i,2}(x − x_i) + c_{i,3}(x − x_i)^2 + c_{i,4}(x − x_i)^3,   i = 1, . . . , n,     (2.7)

where

    c_{i,1} = y_i,   c_{i,2} = (y_{i+1} − y_i)/h − (h/3)μ_i − (h/6)μ_{i+1},   c_{i,3} = μ_i/2,   c_{i,4} = (μ_{i+1} − μ_i)/(6h).     (2.8)
Proof Consider p_i in the form (2.7) for some 1 ≤ i ≤ n. Evoking (2.6) we find p_i(x_i) = c_{i,1} = y_i. Since p_i''(x) = 2c_{i,3} + 6c_{i,4}(x − x_i) we obtain c_{i,3} from p_i''(x_i) = 2c_{i,3} = μ_i (a moment), and then c_{i,4} from p_i''(x_{i+1}) = μ_i + 6h c_{i,4} = μ_{i+1}. Finally we find c_{i,2} by solving p_i(x_{i+1}) = y_i + c_{i,2} h + (μ_i/2) h^2 + ((μ_{i+1} − μ_i)/(6h)) h^3 = y_{i+1}. For j = 0, 1, 2, 3 the shifted powers (x − x_i)^j constitute a basis for cubic polynomials and the formulas (2.8) are unique by construction. It follows that p_i is unique.

Theorem 2.1 (Constructing a D²-Spline) Suppose for some moments μ_1, . . . ,
μn+1 that each pi is given as in Lemma 2.1 for i = 1, . . . , n. If in addition
    μ_{i−1} + 4μ_i + μ_{i+1} = (6/h^2)(y_{i+1} − 2y_i + y_{i−1}),   i = 2, . . . , n,     (2.9)
then the function g given by (2.4) solves a D2 -spline problem.
Proof Suppose for 1 ≤ i ≤ n that p_i is given as in Lemma 2.1 for some μ_1, . . . , μ_{n+1}. Consider the C² requirement. Since p_{i−1}(x_i) = p_i(x_i) = y_i and p''_{i−1}(x_i) = p''_i(x_i) = μ_i for i = 2, . . . , n it follows that g ∈ C² if and only if p'_{i−1}(x_i) = p'_i(x_i) for i = 2, . . . , n. By (2.7)

    p'_{i−1}(x_i) = c_{i−1,2} + 2h c_{i−1,3} + 3h^2 c_{i−1,4}
                  = (y_i − y_{i−1})/h − (h/3)μ_{i−1} − (h/6)μ_i + 2h (μ_{i−1}/2) + 3h^2 (μ_i − μ_{i−1})/(6h)
                  = (y_i − y_{i−1})/h + (h/6)μ_{i−1} + (h/3)μ_i,
    p'_i(x_i) = c_{i,2} = (y_{i+1} − y_i)/h − (h/3)μ_i − (h/6)μ_{i+1}.     (2.10)

A simple calculation shows that p'_{i−1}(x_i) = p'_i(x_i) if and only if (2.9) holds. Finally consider the function g given by (2.4). If (2.9) holds then g ∈ C²[a, b]. By construction g(x_i) = y_i, i = 1, . . . , n + 1, g''(a) = p_1''(x_1) = μ_1 and g''(b) = p_n''(x_{n+1}) = μ_{n+1}. It follows that g solves the D²-spline problem.
In order for the D2 -spline to exist we need to show that μ2 , . . . , μn always can be
determined from (2.9). For n ≥ 3 and with μ1 and μn+1 given (2.9) can be written
in the form
    [ 4  1             ] [ μ_2     ]   [ (6/h²) δ²y_2 − μ_1       ]
    [ 1  4  1          ] [ μ_3     ]   [ (6/h²) δ²y_3             ]
    [    ⋱  ⋱  ⋱       ] [  ⋮      ] = [        ⋮                 ] ,   δ²y_i := y_{i+1} − 2y_i + y_{i−1}.     (2.11)
    [       1  4  1    ] [ μ_{n−1} ]   [ (6/h²) δ²y_{n−1}         ]
    [          1  4    ] [ μ_n     ]   [ (6/h²) δ²y_n − μ_{n+1}   ]
This is a square linear system of equations. We recall (see Theorem 1.6) that a
square system Ax = b has a solution for all right hand sides b if and only if the
coefficient matrix A is nonsingular, i.e., the homogeneous system Ax = 0 only
has the solution x = 0. Moreover, the solution is unique. We need to show that the
coefficient matrix in (2.11) is nonsingular.
We observe that the matrix in (2.11) is strictly diagonally dominant in accordance
with the following definition.
Definition 2.2 (Strict Diagonal Dominance) The matrix A = [aij ] ∈ Cn×n is
strictly diagonally dominant if
    |a_ii| > Σ_{j≠i} |a_ij|,   i = 1, . . . , n.     (2.12)
Theorem 2.2 (Strict Diagonal Dominance) A strictly diagonally dominant matrix
is nonsingular. Moreover, if A ∈ Cn×n is strictly diagonally dominant then the
solution x of Ax = b is bounded as follows:

    max_{1≤i≤n} |x_i| ≤ max_{1≤i≤n} |b_i|/σ_i,   where σ_i := |a_ii| − Σ_{j≠i} |a_ij|.     (2.13)

Proof We first show that the bound (2.13) holds for any solution x. Choose k so that |x_k| = max_i |x_i|. Then

    |b_k| = | a_kk x_k + Σ_{j≠k} a_kj x_j | ≥ |a_kk||x_k| − Σ_{j≠k} |a_kj||x_j| ≥ |x_k| ( |a_kk| − Σ_{j≠k} |a_kj| ) = |x_k| σ_k,

and this implies max_{1≤i≤n} |x_i| = |x_k| ≤ |b_k|/σ_k ≤ max_{1≤i≤n} |b_i|/σ_i. For the nonsingularity, if Ax = 0, then max_{1≤i≤n} |x_i| ≤ 0 by (2.13), and so x = 0.
For an alternative simple proof of the nonsingularity based on Gershgorin circle
theorem see Exercise 14.3.
Theorem 2.2 implies that the system (2.11) has a unique solution giving rise
to a function g detailed in Lemma 2.1 and solving the D2 -spline problem. For
uniqueness suppose g1 and g2 are two D2 -splines interpolating the same data. Then
g := g1 −g2 is an N-spline satisfying (2.2) with y = 0. The solution [μ2 , . . . , μn ]T
of (2.11) and also μ1 = μn+1 are zero. It follows from (2.8) that all coefficients ci,j
are zero. We conclude that g = 0 and g1 = g2 .
Example 2.2 (Cubic B-Spline) For the N-spline with [a, b] = [0, 4] and y = [0, 1/6, 2/3, 1/6, 0]^T the linear system (2.9) takes the form

    [ 4  1  0 ] [ μ_2 ]   [  2 ]
    [ 1  4  1 ] [ μ_3 ] = [ −6 ] .
    [ 0  1  4 ] [ μ_4 ]   [  2 ]
Fig. 2.4 A cubic B-spline
The solution is μ2 = μ4 = 1, μ3 = −2. The knotset is {0, 1, 2, 3, 4}. Using (2.8)
(cf. Exercise 2.6) we find
    g(x) := { p_1(x) = (1/6)x^3,                                              if 0 ≤ x < 1,
              p_2(x) = 1/6 + (1/2)(x − 1) + (1/2)(x − 1)^2 − (1/2)(x − 1)^3,  if 1 ≤ x < 2,
              p_3(x) = 2/3 − (x − 2)^2 + (1/2)(x − 2)^3,                      if 2 ≤ x < 3,
              p_4(x) = 1/6 − (1/2)(x − 3) + (1/2)(x − 3)^2 − (1/6)(x − 3)^3,  if 3 ≤ x ≤ 4.     (2.14)
A plot of this spline is shown in Fig. 2.4. On (0, 4) the function g equals the nonzero
part of a function known as a C 2 cubic B-spline.
2.1.4 LU Factorization of a Tridiagonal System
To find the D²-spline g we have to solve the tridiagonal system (2.11). Consider
solving a general tridiagonal linear system Ax = b where A = tridiag(ai , di , ci ) ∈
Cn×n . Instead of using Gaussian elimination directly, we can construct two matrices
L and U such that A = LU . Since Ax = LU x = b we can find x by solving
two systems Lz = b and U x = z. Moreover L and U are both triangular and
bidiagonal, and if in addition they are nonsingular the two systems can be solved
easily without using elimination.
In our case we write the product A = LU in the form

    [ d_1  c_1                      ]   [ 1                    ] [ u_1  c_1                    ]
    [ a_1  d_2  c_2                 ]   [ l_1  1               ] [      u_2  c_2               ]
    [       ⋱    ⋱     ⋱           ] = [      ⋱    ⋱          ] [           ⋱      ⋱          ] .     (2.15)
    [      a_{n−2} d_{n−1} c_{n−1}  ]   [      l_{n−2}  1      ] [           u_{n−1}  c_{n−1}  ]
    [              a_{n−1}  d_n     ]   [           l_{n−1}  1 ] [                     u_n     ]
To find L and U we first consider the case n = 3. Equation (2.15) takes the form

    [ d_1  c_1  0  ]   [ 1    0    0 ] [ u_1  c_1  0  ]   [ u_1      c_1             0             ]
    [ a_1  d_2  c_2 ] = [ l_1  1    0 ] [ 0    u_2  c_2 ] = [ l_1 u_1  l_1 c_1 + u_2   c_2           ] ,
    [ 0    a_2  d_3 ]   [ 0    l_2  1 ] [ 0    0    u_3 ]   [ 0        l_2 u_2         l_2 c_2 + u_3 ]

and the systems Lz = b and Ux = z can be written

    [ 1    0    0 ] [ z_1 ]   [ b_1 ]        [ u_1  c_1  0  ] [ x_1 ]   [ z_1 ]
    [ l_1  1    0 ] [ z_2 ] = [ b_2 ] ,      [ 0    u_2  c_2 ] [ x_2 ] = [ z_2 ] .
    [ 0    l_2  1 ] [ z_3 ]   [ b_3 ]        [ 0    0    u_3 ] [ x_3 ]   [ z_3 ]

Comparing elements we find

    u_1 = d_1,    l_1 = a_1/u_1,    u_2 = d_2 − l_1 c_1,    l_2 = a_2/u_2,    u_3 = d_3 − l_2 c_2,
    z_1 = b_1,    z_2 = b_2 − l_1 z_1,    z_3 = b_3 − l_2 z_2,
    x_3 = z_3/u_3,    x_2 = (z_2 − c_2 x_3)/u_2,    x_1 = (z_1 − c_1 x_2)/u_1.
In general, if

    u_1 = d_1,    l_k = a_k/u_k,    u_{k+1} = d_{k+1} − l_k c_k,    k = 1, 2, . . . , n − 1,     (2.16)
then A = LU. If u_1, u_2, . . . , u_{n−1} are nonzero then (2.16) is well defined. If in addition u_n ≠ 0 then we can solve Lz = b and Ux = z for z and x. We formulate this as two algorithms. In trifactor, vectors l ∈ C^{n−1}, u ∈ C^n are computed from a, c ∈ C^{n−1}, d ∈ C^n. This implements the LU factorization of a tridiagonal matrix:
function [l,u]=trifactor(a,d,c)
% [l,u]=trifactor(a,d,c)
% LU factorization (2.15)-(2.16) of the tridiagonal matrix tridiag(a,d,c).
u=d; l=a;
for k=1:length(a)
    l(k)=a(k)/u(k);
    u(k+1)=d(k+1)-l(k)*c(k);
end
Listing 2.1 trifactor
In trisolve, the solution x of a tridiagonal system with r right hand sides is computed from a previous call to trifactor. Here l ∈ C^{n−1} and u ∈ C^n are output from trifactor and b ∈ C^{n×r} for some r ∈ N:
function x = trisolve(l,u,c,b)
% x = trisolve(l,u,c,b)
% Solves Lz = b (forward sweep) and Ux = z (back substitution), where L and
% U are the bidiagonal factors computed by trifactor.
x=b;
n=size(b,1);
for k=2:n
    x(k,:)=b(k,:)-l(k-1)*x(k-1,:);
end
x(n,:)=x(n,:)/u(n);
for k=(n-1):-1:1
    x(k,:)=(x(k,:)-c(k)*x(k+1,:))/u(k);
end
Listing 2.2 trisolve
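As a hypothetical usage example (not one of the book's listings), the 3 × 3 system of Example 2.2 can be solved with these two functions:

a = [1 1]; d = [4 4 4]; c = [1 1];      % the matrix tridiag(a,d,c) of Example 2.2
b = [2; -6; 2];
[l,u] = trifactor(a,d,c);
mu = trisolve(l,u,c,b);                 % returns mu = [1; -2; 1], cf. Example 2.2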
Since division by zero can occur, the algorithms will not work in general, but for
tridiagonal strictly diagonally dominant systems we have
Theorem 2.3 (LU of a Tridiagonal Strictly Dominant System) A strictly diagonally dominant tridiagonal matrix has a unique LU factorization of the form (2.15).
Proof We show that the u_k's in (2.16) are nonzero for k = 1, . . . , n. For this it is sufficient to show by induction that

    |u_k| ≥ σ_k + |c_k|,   where σ_k := |d_k| − |a_{k−1}| − |c_k| > 0,   k = 1, . . . , n,     (2.17)

and where a_0 := c_n := 0. By assumption |u_1| = |d_1| = σ_1 + |c_1|. Suppose |u_k| ≥ σ_k + |c_k| for some 1 ≤ k ≤ n − 1. Then |c_k|/|u_k| < 1 and by (2.16) and strict diagonal dominance

    |u_{k+1}| = |d_{k+1} − l_k c_k| = |d_{k+1} − a_k c_k / u_k| ≥ |d_{k+1}| − |a_k||c_k|/|u_k|     (2.18)
              ≥ |d_{k+1}| − |a_k| = σ_{k+1} + |c_{k+1}|.
Corollary 2.1 (Stability of the LU Factorization) Suppose A ∈ C^{n×n} is tridiagonal and strictly diagonally dominant with computed elements in the LU factorization given by (2.16). Then (2.17) holds, u_1 = d_1 and

    |l_k| = |a_k|/|u_k| ≤ |a_k|/(|d_k| − |a_{k−1}|),    |u_{k+1}| ≤ |d_{k+1}| + |a_k||c_k|/(|d_k| − |a_{k−1}|),    k = 1, . . . , n − 1.     (2.19)

Proof Using (2.16) and (2.17) for 1 ≤ k ≤ n − 1 we find

    |l_k| = |a_k|/|u_k| ≤ |a_k|/(|d_k| − |a_{k−1}|),    |u_{k+1}| ≤ |d_{k+1}| + |l_k||c_k| ≤ |d_{k+1}| + |a_k||c_k|/(|d_k| − |a_{k−1}|).
• For a strictly diagonally dominant tridiagonal matrix it follows from Corollary 2.1 that the LU factorization algorithm trifactor is stable meaning that
we cannot have severe growth in the computed elements uk and lk .
• The number of arithmetic operations to compute the LU factorization of a tridiagonal matrix of order n using (2.16) is 3n−3, while the number of arithmetic operations for Algorithm trisolve is r(5n−4), where r is the number of right-hand
sides. This means that the complexity to solve a tridiagonal system is O(n), or
more precisely 8n − 7 when r = 1, and this number only grows linearly2 with n.
2.2 A Two Point Boundary Value Problem
Consider the simple two point boundary value problem

    −u''(x) = f(x),   x ∈ [0, 1],    u(0) = 0,   u(1) = 0,     (2.20)
where f is a given continuous function on [0, 1] and u is an unknown function. This
problem is also known as the one-dimensional (1D) Poisson problem. In principle
it is easy to solve (2.20) exactly. We just integrate f twice and determine the two
integration constants so that the homogeneous boundary conditions u(0) = u(1) =
0 are satisfied. For example, if f(x) = 1 then u(x) = x(1 − x)/2 is the solution.
Suppose f cannot be integrated exactly. Problem (2.20) can then be solved approximately using the finite difference method. We need a difference approximation to the second derivative. If g is a function differentiable at x then

    g'(x) = lim_{h→0} ( g(x + h/2) − g(x − h/2) ) / h,
² We show in Sect. 3.3.2 that Gaussian elimination on a full n × n system is an O(n³) process.
and applying this to a function u that is twice differentiable at x,

    u''(x) = lim_{h→0} ( u'(x + h/2) − u'(x − h/2) ) / h
           = lim_{h→0} ( (u(x + h) − u(x))/h − (u(x) − u(x − h))/h ) / h
           = lim_{h→0} ( u(x + h) − 2u(x) + u(x − h) ) / h^2.
To define the points where this difference approximation is used we choose a
positive integer m, let h := 1/(m + 1) be the discretization parameter, and replace
the interval [0, 1] by grid points xj := j h for j = 0, 1, . . . , m + 1. We then obtain
approximations vj to the exact solution u(xj ) for j = 1, . . . , m by replacing the
differential equation by the difference equation
    (−v_{j−1} + 2v_j − v_{j+1}) / h^2 = f(jh),   j = 1, . . . , m,   v_0 = v_{m+1} = 0.
Moving the h² factor to the right hand side this can be written as an m × m linear system

    T v = [  2  −1              ] [ v_1     ]        [ f(h)       ]
          [ −1   2  −1          ] [ v_2     ]        [ f(2h)      ]
          [      ⋱   ⋱    ⋱    ] [  ⋮      ] = h^2 [   ⋮        ]  =: b.     (2.21)
          [         −1   2  −1 ] [ v_{m−1} ]        [ f((m−1)h)  ]
          [             −1   2 ] [ v_m     ]        [ f(mh)      ]
The matrix T is called the second derivative matrix and will occur frequently in
this book. It is our second example of a tridiagonal matrix, T = tridiag(ai , di , ci ) ∈
Rm×m , where in this case ai = ci = −1 and di = 2 for all i.
2.2.1 Diagonal Dominance
We want to show that (2.21) has a unique solution. Note that T is not strictly
diagonally dominant. However, T is weakly diagonally dominant in accordance
with the following definition.
Definition 2.3 (Diagonal Dominance) The matrix A = [a_ij] ∈ C^{n×n} is weakly diagonally dominant if

    |a_ii| ≥ Σ_{j≠i} |a_ij|,   i = 1, . . . , n.     (2.22)
We showed in Theorem 2.2 that a strictly diagonally dominant matrix is
nonsingular. This is in general not true in the weakly diagonally dominant case.
Consider the three matrices

    A_1 = [ 1 1 0 ]        A_2 = [ 1 0 0 ]        A_3 = [  2 −1  0 ]
          [ 1 2 1 ] ,            [ 0 0 0 ] ,            [ −1  2 −1 ] .
          [ 0 1 1 ]              [ 0 0 1 ]              [  0 −1  2 ]

They are all weakly diagonally dominant, but A_1 and A_2 are singular, while A_3 is nonsingular. Indeed, for A_1 column two is the sum of columns one and three, A_2 has a zero row, and det(A_3) = 4 ≠ 0. It follows that for the nonsingularity and
existence of an LU factorization of a weakly diagonally dominant matrix we need
some additional conditions. Here are some sufficient conditions.
Theorem 2.4 (Weak Diagonal Dominance) Suppose A = tridiag(a_i, d_i, c_i) ∈ C^{n×n} is tridiagonal and weakly diagonally dominant. If in addition |d_1| > |c_1| and a_i ≠ 0 for i = 1, . . . , n − 2, then A has a unique LU factorization (2.15). If in addition d_n ≠ 0, then A is nonsingular.

Proof The proof is similar to the proof of Theorem 2.2. The matrix A has an LU factorization if the u_k's in (2.16) are nonzero for k = 1, . . . , n − 1. For this it is sufficient to show by induction that |u_k| > |c_k| for k = 1, . . . , n − 1. By assumption |u_1| = |d_1| > |c_1|. Suppose |u_k| > |c_k| for some 1 ≤ k ≤ n − 2. Then |c_k|/|u_k| < 1 and by (2.16) and since a_k ≠ 0

    |u_{k+1}| = |d_{k+1} − l_k c_k| = |d_{k+1} − a_k c_k / u_k| ≥ |d_{k+1}| − |a_k||c_k|/|u_k| > |d_{k+1}| − |a_k|.     (2.23)

This also holds for k = n − 1 if a_{n−1} ≠ 0. By (2.23) and weak diagonal dominance |u_{k+1}| > |d_{k+1}| − |a_k| ≥ |c_{k+1}| and it follows by induction that an LU factorization exists. It is unique since any LU factorization must satisfy (2.16). For the nonsingularity we need to show that u_n ≠ 0. For then by Lemma 2.5, both L and U are nonsingular, and this is equivalent to A = LU being nonsingular. If a_{n−1} ≠ 0 then by (2.16) |u_n| > |d_n| − |a_{n−1}| ≥ 0 by weak diagonal dominance, while if a_{n−1} = 0 then again by (2.23) |u_n| ≥ |d_n| > 0.
Consider now the special system T v = b given by (2.21). The matrix T is weakly
diagonally dominant and satisfies the additional conditions in Theorem 2.4. Thus it
is nonsingular and we can solve the system in O(n) arithmetic operations using the
algorithms trifactor and trisolve.
We could use the explicit inverse of T, given in Exercise 2.15, to compute the solution of T v = b as v = T⁻¹b. However, this is not a good idea. In fact, all elements in T⁻¹ are nonzero and the calculation of T⁻¹b requires O(n²) operations.
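As an illustration, the following sketch assembles and solves the system (2.21) for a given right hand side function f. It is ours, not from the text: the function name poisson1D is hypothetical, and we assume that trifactor and trisolve have the calling sequence used in Listing 2.3.

function v = poisson1D(f,m)
% v = poisson1D(f,m)
% Solves (2.21): T v = h^2 [f(h); ...; f(mh)] with T = tridiag(-1,2,-1).
h = 1/(m+1);
a = -ones(m-1,1); d = 2*ones(m,1); c = a;  % sub-, main- and superdiagonal of T
b = h^2*f(h*(1:m)');                       % right hand side; f must accept vectors
[l,u] = trifactor(a,d,c);                  % O(m) LU factorization (Algorithm 2.1)
v = trisolve(l,u,c,b);                     % O(m) forward and back substitution
end

For f(x) = 1 the computed v_j agree with the exact solution u(x_j) = x_j(1 − x_j)/2 up to rounding, since the difference approximation is exact for quadratic functions.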
2.3 An Eigenvalue Problem
Recall that if A ∈ Cn×n is a square matrix and Ax = λx for some nonzero x ∈
Cn , then λ ∈ C is called an eigenvalue and x an eigenvector. We call (λ, x) an
eigenpair of A.
2.3.1 The Buckling of a Beam
Consider a horizontal beam of length L located between 0 and L on the x-axis of
the plane. We assume that the beam is fixed at x = 0 and x = L and that a force F is
applied at (L, 0) in the direction towards the origin. This situation can be modeled
by the boundary value problem
$$R y''(x) = -F y(x), \qquad y(0) = y(L) = 0, \tag{2.24}$$
where y(x) is the vertical displacement of the beam at x, and R is a constant defined by the rigidity of the beam. We can transform the problem to the unit interval [0, 1] by considering the function u : [0, 1] → R given by u(t) := y(tL). Since u''(t) = L²y''(tL), the problem (2.24) then becomes
$$u''(t) = -K u(t), \qquad u(0) = u(1) = 0, \qquad K := \frac{F L^{2}}{R}. \tag{2.25}$$
Clearly u = 0 is a solution, but we can have nonzero solutions corresponding to certain values of K known as eigenvalues. The corresponding function u is called an eigenfunction. If F = 0 then K = 0 and u = 0 is the only solution, but if the force is increased it will reach a critical value where the beam will buckle and maybe break. This critical value corresponds to the smallest eigenvalue of (2.25). With u(t) = sin(πt) we find u''(t) = −π²u(t) and this u is a solution if K = π². It can be shown that this is the smallest eigenvalue of (2.25), and solving for F we find F = π²R/L².
We can approximate this eigenvalue numerically. Choosing m ∈ N, h := 1/(m + 1) and using for the second derivative the approximation
$$u''(jh) \approx \frac{u((j+1)h) - 2u(jh) + u((j-1)h)}{h^{2}}, \quad j = 1, \ldots, m,$$
(this is the same finite difference approximation as in Sect. 2.2) we obtain
$$\frac{-v_{j-1} + 2v_j - v_{j+1}}{h^{2}} = K v_j, \quad j = 1, \ldots, m, \quad h = \frac{1}{m+1}, \qquad v_0 = v_{m+1} = 0,$$
where vj ≈ u(j h) for j = 0, . . . , m + 1. If we define λ := h2 K then we obtain the
equation
T v = λv, with v = [v1 , . . . , vm ]T ,
(2.26)
and
$$T = T_m := \mathrm{tridiag}_m(-1, 2, -1) = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix} \in \mathbb{R}^{m\times m}. \tag{2.27}$$
The problem now is to determine the eigenvalues of T. Normally we would need a numerical method to determine the eigenvalues of a matrix, but for this simple problem the eigenvalues can be determined exactly. We show in the next subsection that the smallest eigenvalue of (2.26) is given by λ = 4 sin²(πh/2). Since λ = h²K = h²FL²/R we can solve for F to obtain
$$F = \frac{4\sin^{2}(\pi h/2)\,R}{h^{2}L^{2}}.$$
For small h this is a good approximation to the value π²R/L² we computed above.
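A minimal numerical check of this statement (our own sketch, not from the text): as h decreases the discrete value 4 sin²(πh/2)/h² approaches π², so the computed buckling force approaches π²R/L².

% Compare the discrete eigenvalue 4*sin(pi*h/2)^2/h^2 with pi^2
for m = [10 100 1000]
    h = 1/(m+1);
    err = abs(4*sin(pi*h/2)^2/h^2 - pi^2);
    fprintf('m = %4d   error = %8.2e\n', m, err);
end

The error decreases like O(h²), in agreement with Exercise 2.11.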
2.4 The Eigenpairs of the 1D Test Matrix
The second derivative matrix T = tridiag(−1, 2, −1) is a special case of the
tridiagonal matrix
$$T_1 := \mathrm{tridiag}(a, d, a) \tag{2.28}$$
where a, d ∈ R. We call this the 1D test matrix. It is symmetric and strictly diagonally dominant if |d| > 2|a|.
We show that the eigenvectors are the columns of the sine matrix defined by
$$S = \Big[\sin\Big(\frac{jk\pi}{m+1}\Big)\Big]_{j,k=1}^{m} \in \mathbb{R}^{m\times m}. \tag{2.29}$$
For m = 3,
$$S = [s_1, s_2, s_3] = \begin{bmatrix} \sin\frac{\pi}{4} & \sin\frac{2\pi}{4} & \sin\frac{3\pi}{4} \\ \sin\frac{2\pi}{4} & \sin\frac{4\pi}{4} & \sin\frac{6\pi}{4} \\ \sin\frac{3\pi}{4} & \sin\frac{6\pi}{4} & \sin\frac{9\pi}{4} \end{bmatrix} = \begin{bmatrix} t & 1 & t \\ 1 & 0 & -1 \\ t & -1 & t \end{bmatrix}, \qquad t := \frac{1}{\sqrt 2}.$$
Lemma 2.2 (Eigenpairs of 1D Test Matrix) Suppose T_1 = (t_{kj})_{k,j} = tridiag(a, d, a) ∈ R^{m×m} with m ≥ 2, a, d ∈ R, and let h = 1/(m + 1).
1. We have T_1 s_j = λ_j s_j for j = 1, …, m, where
$$s_j = [\sin(j\pi h), \sin(2j\pi h), \ldots, \sin(mj\pi h)]^T, \tag{2.30}$$
$$\lambda_j = d + 2a\cos(j\pi h). \tag{2.31}$$
2. The eigenvalues are distinct and the eigenvectors are orthogonal:
$$s_j^T s_k = \frac{m+1}{2}\,\delta_{j,k} = \frac{1}{2h}\,\delta_{j,k}, \quad j, k = 1, \ldots, m. \tag{2.32}$$
Proof We find for 1 < k < m
$$(T_1 s_j)_k = \sum_{l=1}^{m} t_{k,l}\sin(lj\pi h) = a\big(\sin((k-1)j\pi h) + \sin((k+1)j\pi h)\big) + d\sin(kj\pi h) = 2a\cos(j\pi h)\sin(kj\pi h) + d\sin(kj\pi h) = \lambda_j s_{k,j}.$$
This also holds for k = 1, m, and part 1 follows. Since jπh = jπ/(m + 1) ∈ (0, π) for j = 1, …, m and the cosine function is strictly monotone decreasing on (0, π) the eigenvalues are distinct, and since T_1 is symmetric it follows from Lemma 2.3 below that the eigenvectors s_j are orthogonal. To finish the proof of (2.32) we compute
$$s_j^T s_j = \sum_{k=1}^{m}\sin^{2}(kj\pi h) = \sum_{k=0}^{m}\sin^{2}(kj\pi h) = \sum_{k=0}^{m}\frac{1-\cos(2kj\pi h)}{2} = \frac{m+1}{2} - \frac{1}{2}\sum_{k=0}^{m}\cos(2kj\pi h) = \frac{m+1}{2},$$
since the last cosine sum is zero. We show this by summing a geometric series of complex exponentials. With i = √−1 we find
$$\sum_{k=0}^{m}\cos(2kj\pi h) + i\sum_{k=0}^{m}\sin(2kj\pi h) = \sum_{k=0}^{m}e^{2ikj\pi h} = \frac{e^{2i(m+1)j\pi h} - 1}{e^{2ij\pi h} - 1} = 0,$$
and (2.32) follows.
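The formulas (2.30)–(2.32) are easy to verify numerically. The following sketch is ours (not from the text); it builds T_1 = tridiag(a, d, a) for m = 6 and checks that the columns of the sine matrix S are eigenvectors with eigenvalues d + 2a cos(jπh).

% Numerical check of Lemma 2.2 for m = 6, a = -1, d = 2
m = 6; a = -1; d = 2; h = 1/(m+1);
T1 = diag(d*ones(m,1)) + diag(a*ones(m-1,1),1) + diag(a*ones(m-1,1),-1);
S = sin(pi*h*(1:m)'*(1:m));          % S(j,k) = sin(jk*pi*h), cf. (2.29)
lambda = d + 2*a*cos(pi*h*(1:m));    % eigenvalues (2.31)
norm(T1*S - S*diag(lambda))          % of order machine precision
norm(S'*S - eye(m)/(2*h))            % orthogonality relation (2.32)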
Recall that the conjugate transpose of a matrix is defined by A* := Ā^T, where Ā is obtained from A by taking the complex conjugate of all elements. A matrix A ∈ C^{n×n} is Hermitian if A* = A. A real symmetric matrix is Hermitian.
Lemma 2.3 (Eigenpairs of a Hermitian Matrix) The eigenvalues of a Hermitian matrix are real. Moreover, eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proof Suppose A* = A and Ax = λx with x ≠ 0. We multiply both sides of Ax = λx from the left by x* and divide by x*x to obtain λ = (x*Ax)/(x*x). Taking complex conjugates we find λ̄ = (x*Ax)*/(x*x)* = (x*A*x)/(x*x) = (x*Ax)/(x*x) = λ, and λ is real.
Suppose that (λ, x) and (μ, y) are two eigenpairs for A with μ ≠ λ. Multiplying Ax = λx by y* gives
$$\lambda y^{*}x = y^{*}Ax = (x^{*}A^{*}y)^{*} = (x^{*}Ay)^{*} = (\mu x^{*}y)^{*} = \mu y^{*}x,$$
using that μ is real. Since λ ≠ μ it follows that y*x = 0, which means that x and y are orthogonal.
2.5 Block Multiplication and Triangular Matrices
Block multiplication is a powerful and essential tool for dealing with matrices. It
will be used extensively in this book. We will also need some basic facts about
triangular matrices.
2.5.1 Block Multiplication
A rectangular matrix A can be partitioned into submatrices by drawing horizontal lines between selected rows and vertical lines between selected columns. For example, the matrix
$$A = \begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}$$
can be partitioned as
$$\text{(i)}\;\; \begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix} = \left[\begin{array}{c|cc}1&2&3\\\hline 4&5&6\\7&8&9\end{array}\right], \qquad \text{(ii)}\;\; [a_{:1}, a_{:2}, a_{:3}] = \left[\begin{array}{c|c|c}1&2&3\\4&5&6\\7&8&9\end{array}\right],$$
$$\text{(iii)}\;\; \begin{bmatrix}a_{1:}^T\\a_{2:}^T\\a_{3:}^T\end{bmatrix} = \left[\begin{array}{ccc}1&2&3\\\hline 4&5&6\\\hline 7&8&9\end{array}\right], \qquad \text{(iv)}\;\; [A_{11}, A_{12}] = \left[\begin{array}{c|cc}1&2&3\\4&5&6\\7&8&9\end{array}\right].$$
In (i) the matrix A is divided into four submatrices
$$A_{11} = \begin{bmatrix}1\end{bmatrix}, \quad A_{12} = \begin{bmatrix}2&3\end{bmatrix}, \quad A_{21} = \begin{bmatrix}4\\7\end{bmatrix}, \quad A_{22} = \begin{bmatrix}5&6\\8&9\end{bmatrix},$$
while in (ii) and (iii) A has been partitioned into columns and rows, respectively. The submatrices in a partition are often referred to as blocks and a partitioned matrix is sometimes called a block matrix.
In the following we assume that A ∈ Cm×p and B ∈ Cp×n . Here are some rules
and observations for block multiplication.
1. If B = [b_{:1}, …, b_{:n}] is partitioned into columns then the partition of the product AB into columns is
$$AB = [Ab_{:1}, Ab_{:2}, \ldots, Ab_{:n}].$$
In particular, if I is the identity matrix of order p then
$$A = AI = A[e_1, e_2, \ldots, e_p] = [Ae_1, Ae_2, \ldots, Ae_p]$$
and we see that column j of A can be written Ae_j for j = 1, …, p.
2. Similarly, if A is partitioned into rows then
$$AB = \begin{bmatrix}a_{1:}^T\\a_{2:}^T\\\vdots\\a_{m:}^T\end{bmatrix} B = \begin{bmatrix}a_{1:}^T B\\a_{2:}^T B\\\vdots\\a_{m:}^T B\end{bmatrix},$$
and taking A = I it follows that row i of B can be written e_i^T B for i = 1, …, m.
3. It is often useful to write the matrix-vector product Ax as a linear combination
of the columns of A
Ax = x1 a :1 + x2 a :2 + · · · + xp a :p .
4. If B = [B_1, B_2], where B_1 ∈ C^{p×r} and B_2 ∈ C^{p×(n−r)}, then
$$A[B_1, B_2] = [AB_1, AB_2].$$
This follows from Rule 1. by an appropriate grouping of columns.
5. If A = \begin{bmatrix}A_1\\A_2\end{bmatrix}, where A_1 ∈ C^{k×p} and A_2 ∈ C^{(m−k)×p}, then
$$\begin{bmatrix}A_1\\A_2\end{bmatrix} B = \begin{bmatrix}A_1 B\\A_2 B\end{bmatrix}.$$
This follows from Rule 2. by a grouping of rows.
6. If A = [A_1, A_2] and B = \begin{bmatrix}B_1\\B_2\end{bmatrix}, where A_1 ∈ C^{m×s}, A_2 ∈ C^{m×(p−s)}, B_1 ∈ C^{s×n} and B_2 ∈ C^{(p−s)×n}, then
$$[A_1, A_2]\begin{bmatrix}B_1\\B_2\end{bmatrix} = A_1 B_1 + A_2 B_2.$$
Indeed,
$$(AB)_{ij} = \sum_{k=1}^{p} a_{ik}b_{kj} = \sum_{k=1}^{s} a_{ik}b_{kj} + \sum_{k=s+1}^{p} a_{ik}b_{kj} = (A_1B_1)_{ij} + (A_2B_2)_{ij} = (A_1B_1 + A_2B_2)_{ij}.$$
7. If A = [A_{11}, A_{12}; A_{21}, A_{22}] and B = [B_{11}, B_{12}; B_{21}, B_{22}] are partitioned into 2 × 2 block form then
$$\begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}\begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix} = \begin{bmatrix}A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22}\\ A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22}\end{bmatrix},$$
provided the vertical partition in A matches the horizontal one in B, i.e. the number of columns in A_{11} and A_{21} equals the number of rows in B_{11} and B_{12}, and the number of columns in A equals the number of rows in B. To show this we use Rule 4. to obtain
$$AB = \left[\begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}\begin{bmatrix}B_{11}\\B_{21}\end{bmatrix}, \ \begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}\begin{bmatrix}B_{12}\\B_{22}\end{bmatrix}\right].$$
We complete the proof using Rules 5. and 6.
8. Consider finally the general case. If all the matrix products A_{ik}B_{kj} in
$$C_{ij} = \sum_{k=1}^{s} A_{ik}B_{kj}, \quad i = 1, \ldots, p, \ j = 1, \ldots, q$$
are well defined then
$$\begin{bmatrix}A_{11}&\cdots&A_{1s}\\ \vdots & & \vdots\\ A_{p1}&\cdots&A_{ps}\end{bmatrix}\begin{bmatrix}B_{11}&\cdots&B_{1q}\\ \vdots & & \vdots\\ B_{s1}&\cdots&B_{sq}\end{bmatrix} = \begin{bmatrix}C_{11}&\cdots&C_{1q}\\ \vdots & & \vdots\\ C_{p1}&\cdots&C_{pq}\end{bmatrix}.$$
The requirements are that
• the number of columns in A is equal to the number of rows in B;
• the position of the vertical partition lines in A has to match the position of the horizontal partition lines in B. The horizontal lines in A and the vertical lines in B can be anywhere.
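These block rules are easy to check numerically. The following sketch is ours (not from the text); it verifies Rule 7 for random 5 × 5 matrices with a 2 + 3 partition of the rows and columns.

% Check Rule 7 on random matrices with a 2+3 block partition
A = rand(5); B = rand(5);
r = 1:2; s = 3:5;                      % block index sets
C11 = A(r,r)*B(r,r) + A(r,s)*B(s,r);
C12 = A(r,r)*B(r,s) + A(r,s)*B(s,s);
C21 = A(s,r)*B(r,r) + A(s,s)*B(s,r);
C22 = A(s,r)*B(r,s) + A(s,s)*B(s,s);
norm([C11 C12; C21 C22] - A*B)         % of order machine precision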
2.5.2 Triangular Matrices
We need some basic facts about triangular matrices and we start with
Lemma 2.4 (Inverse of a Block Triangular Matrix) Suppose
$$A = \begin{bmatrix}A_{11}&A_{12}\\0&A_{22}\end{bmatrix}$$
where A, A_{11} and A_{22} are square matrices. Then A is nonsingular if and only if both A_{11} and A_{22} are nonsingular. In that case
$$A^{-1} = \begin{bmatrix}A_{11}^{-1}&C\\0&A_{22}^{-1}\end{bmatrix}, \tag{2.33}$$
for some matrix C.
Proof Suppose A is nonsingular. We partition B := A^{-1} conformally with A and have
$$BA = \begin{bmatrix}B_{11}&B_{12}\\B_{21}&B_{22}\end{bmatrix}\begin{bmatrix}A_{11}&A_{12}\\0&A_{22}\end{bmatrix} = \begin{bmatrix}I&0\\0&I\end{bmatrix} = I.$$
Using block multiplication we find
$$B_{11}A_{11} = I, \quad B_{21}A_{11} = 0, \quad B_{21}A_{12} + B_{22}A_{22} = I, \quad B_{11}A_{12} + B_{12}A_{22} = 0.$$
The first equation implies that A_{11} is nonsingular, this in turn implies that B_{21} = 0·A_{11}^{-1} = 0 in the second equation, and then the third equation simplifies to B_{22}A_{22} = I. We conclude that also A_{22} is nonsingular. From the fourth equation we find
$$B_{12} = C = -A_{11}^{-1}A_{12}A_{22}^{-1}.$$
Conversely, if A_{11} and A_{22} are nonsingular then
$$\begin{bmatrix}A_{11}^{-1}&-A_{11}^{-1}A_{12}A_{22}^{-1}\\0&A_{22}^{-1}\end{bmatrix}\begin{bmatrix}A_{11}&A_{12}\\0&A_{22}\end{bmatrix} = \begin{bmatrix}I&0\\0&I\end{bmatrix} = I$$
and A is nonsingular with the indicated inverse.
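The formula C = −A_{11}^{-1}A_{12}A_{22}^{-1} can also be checked directly. A small numerical sketch (ours, not from the text), for random blocks shifted to be safely nonsingular:

% Check the inverse formula (2.33) for a block upper triangular matrix
A11 = rand(3) + 3*eye(3); A22 = rand(2) + 2*eye(2); A12 = rand(3,2);
A = [A11 A12; zeros(2,3) A22];
C = -A11\A12/A22;                        % C = -inv(A11)*A12*inv(A22)
Ainv = [inv(A11) C; zeros(2,3) inv(A22)];
norm(A*Ainv - eye(5))                    % of order machine precision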
Consider now a triangular matrix.
Lemma 2.5 (Inverse of a Triangular Matrix) An upper (lower) triangular matrix A = [a_{ij}] ∈ C^{n×n} is nonsingular if and only if the diagonal elements a_{ii}, i = 1, …, n, are nonzero. In that case the inverse is upper (lower) triangular with diagonal elements a_{ii}^{-1}, i = 1, …, n.
Proof We use induction on n. The result holds for n = 1: the 1-by-1 matrix A = [a_{11}] is nonsingular if and only if a_{11} ≠ 0, and in that case A^{-1} = [a_{11}^{-1}]. Suppose the result holds for n = k and let A ∈ C^{(k+1)×(k+1)} be upper triangular. We partition A in the form
$$A = \begin{bmatrix}A_k & a_k\\ 0 & a_{k+1,k+1}\end{bmatrix}$$
and note that A_k ∈ C^{k×k} is upper triangular. By Lemma 2.4 A is nonsingular if and only if A_k and (a_{k+1,k+1}) are nonsingular, and in that case
$$A^{-1} = \begin{bmatrix}A_k^{-1} & c\\ 0 & a_{k+1,k+1}^{-1}\end{bmatrix},$$
for some c ∈ C^k. By the induction hypothesis A_k is nonsingular if and only if the diagonal elements a_{11}, …, a_{kk} of A_k are nonzero, and in that case A_k^{-1} is upper triangular with diagonal elements a_{ii}^{-1}, i = 1, …, k. The result for A follows.
Lemma 2.6 (Product of Triangular Matrices) The product C = AB = (c_{ij}) of two upper (lower) triangular matrices A = (a_{ij}) and B = (b_{ij}) is upper (lower) triangular with diagonal elements c_{ii} = a_{ii}b_{ii} for all i.
Proof Exercise.
A matrix is called unit triangular if it is triangular with 1’s on the diagonal.
Lemma 2.7 (Unit Triangular Matrices) For a unit upper (lower) triangular
matrix A ∈ Cn×n :
1. A is nonsingular and the inverse is unit upper (lower) triangular.
2. The product of two unit upper (lower) triangular matrices is unit upper (lower)
triangular.
Proof 1. follows from Lemma 2.5, while Lemma 2.6 implies 2.
2.6 Exercises Chap. 2
2.6.1 Exercises Sect. 2.1
Exercise 2.1 (The Shifted Power Basis Is a Basis) Show that the polynomials {(x − x_i)^j}_{0≤j≤n} form a basis for the polynomials of degree at most n.³
Exercise 2.2 (The Natural Spline, n = 1) How can one define an N-spline when
n = 1?
Exercise 2.3 (Bounding the Moments) Show that for the N-spline the solution of
the linear system (2.11) is bounded as follows4 :
$$\max_{2\le j\le n} |\mu_j| \le \frac{3}{h^{2}}\max_{2\le i\le n} |y_{i+1} - 2y_i + y_{i-1}|.$$
Exercise 2.4 (Moment Equations for 1. Derivative Boundary Conditions) Suppose instead of the D2 boundary conditions we use D1 conditions given by g'(a) = s_1 and g'(b) = s_{n+1} for some given numbers s_1 and s_{n+1}. Show that the linear system for the moments of a D1-spline can be written
$$\begin{bmatrix} 2 & 1 & & & \\ 1 & 4 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 4 & 1 \\ & & & 1 & 2 \end{bmatrix}\begin{bmatrix}\mu_1\\\mu_2\\\vdots\\\mu_n\\\mu_{n+1}\end{bmatrix} = \frac{6}{h^{2}}\begin{bmatrix} y_2 - y_1 - hs_1\\ \delta^2 y_2\\ \vdots\\ \delta^2 y_n\\ hs_{n+1} - y_{n+1} + y_n \end{bmatrix}, \tag{2.34}$$
where δ²y_i := y_{i+1} − 2y_i + y_{i−1}, i = 2, …, n. Hint: Use (2.10) to compute g'(x_1) and g'(x_{n+1}). Is g unique?
³ Hint: consider an arbitrary polynomial of degree n and expand it in a Taylor series around x_i.
⁴ Hint: use Theorem 2.2.
Exercise 2.5 (Minimal Norm Property of the Natural Spline) Study the proof of the following theorem.⁵
Theorem 2.5 (Minimal Norm Property of a Cubic Spline) Suppose g is an N-spline. Then
$$\int_a^b \big(g''(x)\big)^2\,dx \le \int_a^b \big(h''(x)\big)^2\,dx$$
for all h ∈ C²[a, b] such that h(x_i) = g(x_i), i = 1, …, n + 1.
Proof Let h be any interpolant as in the theorem. We first show the orthogonality condition
$$\int_a^b g''e'' = 0, \qquad e := h - g. \tag{2.35}$$
Integration by parts gives ∫_a^b g''e'' = [g''e']_a^b − ∫_a^b g'''e'. The first term is zero since g'' is continuous and g''(b) = g''(a) = 0. For the second term, since g''' is equal to a constant v_i on each subinterval (x_i, x_{i+1}) and e(x_i) = 0 for i = 1, …, n + 1,
$$\int_a^b g'''e' = \sum_{i=1}^n \int_{x_i}^{x_{i+1}} g'''e' = \sum_{i=1}^n v_i\int_{x_i}^{x_{i+1}} e' = \sum_{i=1}^n v_i\big(e(x_{i+1}) - e(x_i)\big) = 0.$$
Writing h'' = g'' + e'' and using (2.35)
$$\int_a^b (h'')^2 = \int_a^b (g'' + e'')^2 = \int_a^b (g'')^2 + \int_a^b (e'')^2 + 2\int_a^b g''e'' = \int_a^b (g'')^2 + \int_a^b (e'')^2 \ge \int_a^b (g'')^2$$
and the proof is complete.
Exercise 2.6 (Computing the D2 -Spline) Let g be the D2 -spline corresponding to
an interval [a, b], a vector y ∈ Rn+1 and μ1 , μn+1 . The vector x = [x1 , . . . , xn ] and
5 The name spline is inherited from a“physical analogue”, an elastic ruler that is used to draw
smooth curves. Heavy weights, called ducks, are used to force the ruler to pass through, or near
given locations. The ruler will take a shape that minimizes its potential energy. Since the potential
energy is proportional to the integral of the square of the curvature, and the curvature can be
approximated by the second derivative it follows from Theorem 2.5 that the mathematical N-spline
approximately models the physical spline.
the coefficient matrix C ∈ Rn×4 in (2.7) are returned in the following algorithm. It
uses Algorithms 2.1 and 2.2 to solve the tridiagonal linear system.
function [x,C]=splineint(a,b,y,mu1,munp1)
% [x,C]=splineint(a,b,y,mu1,munp1)
y=y(:); n=length(y)-1;
h=(b-a)/n; x=a:h:b-h; c=ones(n-2,1);
[l,u]= trifactor(c,4*ones(n-1,1),c);
b1=6/h^2*(y(3:n+1)-2*y(2:n)+y(1:n-1));
b1(1)=b1(1)-mu1; b1(n-1)=b1(n-1)-munp1;
mu= [mu1;trisolve(l,u,c,b1);munp1];
C=zeros(4*n,1);
C(1:4:4*n-3)=y(1:n);
C(2:4:4*n-2)=(y(2:n+1)-y(1:n))/h...
-h*mu(1:n)/3-h*mu(2:n+1)/6;
C(3:4:4*n-1)=mu(1:n)/2;
C(4:4:4*n)=(mu(2:n+1)-mu(1:n))/(6*h);
C=reshape(C,4,n)’;
end
Listing 2.3 splineint
Use the algorithm to compute the ci,j in Example 2.2.
Exercise 2.7 (Spline Evaluation) To plot a piecewise polynomial g in the
form (2.4) we need to compute values g(rj ) at a number of sites r = [r1 , . . . , rm ] ∈
Rm for some reasonably large integer m. To determine g(rj ) for some j we need to
find an integer ij so that g(rj ) = pij (rj ).
Given k ∈ N, t = [t1 , . . . , tk ] and a real number x. We consider the problem
of computing an integer i so that i = 1 if x < t2 , i = k if x ≥ tk , and ti ≤
x < ti+1 otherwise. If x ∈ Rm is a vector then an m-vector i should be computed,
such that the j th component of i gives the location of the j th component of x.
The following MATLAB function determines i = [i1 , . . . , im ]. It uses the built in
MATLAB functions length, min, sort, find.
function i = findsubintervals(t,x)
%i = findsubintervals(t,x)
k=length(t); m=length(x);
if k<2
i=ones(m,1);
else
t(1)=min(x(1),t(1))-1;
[~,j]=sort([t(:)’,x(:)’]);
i=(find(j>k)-(1:m))’;
end
Listing 2.4 findsubintervals
Fig. 2.5 The cubic spline interpolating f (x) = arctan(10x) + π/2 at 14 equidistant sites on
[−1, 1]. The exact function is also shown
Use findsubintervals and the algorithm splineval below to make the
plots in Fig. 2.5.
function [X,G]=splineval(x,C,X)
% [X,G]=splineval(x,C,X)
m=length(X);
i=findsubintervals(x,X);
G=zeros(m,1);
for j=1:m
k=i(j);
t=X(j)-x(k);
G(j)=[1,t,t^2,t^3]*C(k,:)’;
end
Listing 2.5 splineval
Given output x, C of splineint, defining a cubic spline g, and a vector X, splineval computes the vector G = g(X).
2.6.2 Exercises Sect. 2.2
Exercise 2.8 (Central Difference Approximation of 2. Derivative) Consider
$$\delta^2 f(x) := \frac{f(x+h) - 2f(x) + f(x-h)}{h^{2}}, \quad h > 0, \quad f : [x-h, x+h] \to \mathbb{R}.$$
a) Show using Taylor expansion that if f ∈ C²[x − h, x + h] then for some η_2
$$\delta^2 f(x) = f''(\eta_2), \quad x - h < \eta_2 < x + h.$$
b) Show that if f ∈ C⁴[x − h, x + h] then for some η_4
$$\delta^2 f(x) = f''(x) + \frac{h^{2}}{12} f^{(4)}(\eta_4), \quad x - h < \eta_4 < x + h.$$
δ²f(x) is known as the central difference approximation to the second derivative at x.
Exercise 2.9 (Two Point Boundary Value Problem) We consider a finite difference method for the two point boundary value problem
$$-u''(x) + r(x)u'(x) + q(x)u(x) = f(x), \quad x \in [a, b], \qquad u(a) = g_0, \quad u(b) = g_1. \tag{2.36}$$
We assume that the given functions f, q and r are continuous on [a, b] and that q(x) ≥ 0 for x ∈ [a, b]. It can then be shown that (2.36) has a unique solution u.
To solve (2.36) numerically we choose m ∈ N, h = (b − a)/(m + 1), x_j = a + jh for j = 0, 1, …, m + 1 and solve the difference equation
$$\frac{-v_{j-1} + 2v_j - v_{j+1}}{h^{2}} + r(x_j)\frac{v_{j+1} - v_{j-1}}{2h} + q(x_j)v_j = f(x_j), \quad j = 1, \ldots, m, \tag{2.37}$$
with v_0 = g_0 and v_{m+1} = g_1.
a) Show that (2.37) leads to a tridiagonal linear system Av = b, where A = tridiag(a_j, d_j, c_j) ∈ R^{m×m} has elements
$$a_j = -1 - \frac{h}{2}r(x_j), \quad c_j = -1 + \frac{h}{2}r(x_j), \quad d_j = 2 + h^{2}q(x_j),$$
and
$$b_j = \begin{cases} h^{2}f(x_1) - a_1 g_0, & \text{if } j = 1,\\ h^{2}f(x_j), & \text{if } 2 \le j \le m-1,\\ h^{2}f(x_m) - c_m g_1, & \text{if } j = m. \end{cases}$$
b) Show that the linear system satisfies the conditions in Theorem 2.4 if the spacing h is so small that \tfrac{h}{2}|r(x)| < 1 for all x ∈ [a, b].
c) Propose a method to find v_1, …, v_m.
Exercise 2.10 (Two Point Boundary Value Problem; Computation)
a) Consider the problem (2.36) with r = 0, f = q = 1 and boundary conditions
u(0) = 1, u(1) = 0. The exact solution is u(x) = 1 − sinh x/ sinh 1. Write a
computer program to solve (2.37) for h = 0.1, 0.05, 0.025, 0.0125, and compute
the “error” max1≤j ≤m |u(xj ) − vj | for each h.
b) Make a combined plot of the solution u and the computed points vj , j =
0, . . . , m + 1 for h = 0.1.
c) One can show that the error is proportional to hp for some integer p. Estimate p
based on the error for h = 0.1, 0.05, 0.025, 0.0125.
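One possible starting point for part a) is sketched below. This is our own sketch, not a solution from the text; it assumes the trifactor/trisolve calling sequence of Listing 2.3 and uses the coefficients from Exercise 2.9 with r = 0, f = q = 1.

% Sketch for Exercise 2.10 a): -u'' + u = 1, u(0) = 1, u(1) = 0
for h = [0.1 0.05 0.025 0.0125]
    m = round(1/h) - 1; x = h*(1:m)';
    a = -ones(m-1,1); d = (2 + h^2)*ones(m,1);  % a_j = c_j = -1, d_j = 2 + h^2
    b = h^2*ones(m,1); b(1) = b(1) + 1;         % boundary value u(0) = 1 enters b_1
    [l,u] = trifactor(a,d,a); v = trisolve(l,u,a,b);
    uex = 1 - sinh(x)/sinh(1);                  % exact solution
    fprintf('h = %7.4f   error = %9.2e\n', h, max(abs(uex - v)));
end

The printed errors should decrease roughly by a factor of four when h is halved, which suggests p = 2 in part c).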
2.6.3 Exercises Sect. 2.3
Exercise 2.11 (Approximate Force) Show that
$$F = \frac{4\sin^{2}(\pi h/2)R}{h^{2}L^{2}} = \frac{\pi^{2}R}{L^{2}} + O(h^{2}).$$
Exercise 2.12 (Symmetrize Matrix (Exam Exercise 1977-3)) Let A ∈ Rn×n be
tridiagonal and suppose ai,i+1 ai+1,i > 0 for i = 1, . . . , n − 1. Show that there
exists a diagonal matrix D = diag(d1 , . . . , dn ) with di > 0 for all i such that
B := DAD −1 is symmetric.
2.6.4 Exercises Sect. 2.4
Exercise 2.13 (Eigenpairs T of Order 2) Compute directly the eigenvalues and
eigenvectors for T when n = 2 and thus verify Lemma 2.2 in this case.
Exercise 2.14 (LU Factorization of 2. Derivative Matrix) Show that T = LU, where
$$L = \begin{bmatrix} 1 & 0 & \cdots & & 0\\ -\tfrac12 & 1 & \ddots & & \vdots\\ 0 & -\tfrac23 & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & -\tfrac{m-1}{m} & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0\\ 0 & \tfrac32 & -1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & \ddots & 0\\ \vdots & & \ddots & \tfrac{m}{m-1} & -1\\ 0 & \cdots & \cdots & 0 & \tfrac{m+1}{m} \end{bmatrix}. \tag{2.38}$$
This is the LU factorization of T.
Exercise 2.15 (Inverse of the 2. Derivative Matrix) Let S ∈ R^{m×m} have elements s_{ij} given by
$$s_{i,j} = s_{j,i} = \frac{j(m+1-i)}{m+1}, \quad 1 \le j \le i \le m. \tag{2.39}$$
Show that ST = I and conclude that T^{-1} = S.
2.6.5 Exercises Sect. 2.5
Exercise 2.16 (Matrix Element as a Quadratic Form) For any matrix A show
that aij = eTi Aej for all i, j .
Exercise 2.17 (Outer Product Expansion of a Matrix) For any matrix A ∈ C^{m×n} show that
$$A = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}\, e_i e_j^T.$$
Exercise 2.18 (The Product AT A) Let B = AT A. Explain why this product is
defined for any matrix A. Show that bij = a T:i a :j for all i, j .
Exercise 2.19 (Outer Product Expansion) For A ∈ Rm×n and B ∈ Rp×n show
that
AB T = a :1 bT:1 + a :2 bT:2 + · · · + a :n bT:n .
This is called the outer product expansion of the columns of A and B.
Exercise 2.20 (System with Many Right Hand Sides; Compact Form) Suppose
A ∈ Rm×n , B ∈ Rm×p , and X ∈ Rn×p . Show that
AX = B
⇐⇒
Ax :j = b:j , j = 1, . . . , p.
Exercise 2.21 (Block Multiplication Example) Suppose A = [A_1, A_2] and B = \begin{bmatrix}B_1\\0\end{bmatrix}. When is AB = A_1B_1?
Exercise 2.22 (Another Block Multiplication Example) Suppose A, B, C ∈ R^{n×n} are given in block form by
$$A := \begin{bmatrix}\lambda & a^T\\ 0 & A_1\end{bmatrix}, \qquad B := \begin{bmatrix}1 & 0^T\\ 0 & B_1\end{bmatrix}, \qquad C := \begin{bmatrix}1 & 0^T\\ 0 & C_1\end{bmatrix},$$
where A_1, B_1, C_1 ∈ R^{(n-1)×(n-1)}. Show that
$$CAB = \begin{bmatrix}\lambda & a^T B_1\\ 0 & C_1 A_1 B_1\end{bmatrix}.$$
2.7 Review Questions
2.7.1 How do we define nonsingularity of a matrix?
2.7.2 Define the second derivative matrix T. How did we show that it is nonsingular?
2.7.3 Why do we not use the explicit inverse of T to solve the linear system Tx = b?
2.7.4 What are the eigenpairs of the matrix T?
2.7.5 Why are the diagonal elements of a Hermitian matrix real?
2.7.6 Is the matrix [1, 1+i; 1+i, 2] Hermitian? Symmetric?
2.7.7 Is a weakly diagonally dominant matrix nonsingular?
2.7.8 Is a strictly diagonally dominant matrix always nonsingular?
2.7.9 Does a tridiagonal matrix always have an LU factorization?
Chapter 3
Gaussian Elimination and LU
Factorizations
In this chapter we first review Gaussian elimination. Gaussian elimination leads
to an LU factorization of the coefficient matrix or more generally to a PLU
factorization, if row interchanges are introduced. Here P is a permutation matrix, L
is lower triangular and U is upper triangular.
We also consider in great detail the general theory of LU factorizations.
3.1 3 by 3 Example
Gaussian elimination with row interchanges is the classical method for solving n
linear equations in n unknowns.1 We first recall how it works on a 3 × 3 system.
Example 3.1 (Gaussian Elimination on a 3 × 3 System) Consider a nonsingular system of three equations in three unknowns:
$$\begin{aligned} a_{11}^{(1)}x_1 + a_{12}^{(1)}x_2 + a_{13}^{(1)}x_3 &= b_1^{(1)}, &&\text{I}\\ a_{21}^{(1)}x_1 + a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 &= b_2^{(1)}, &&\text{II}\\ a_{31}^{(1)}x_1 + a_{32}^{(1)}x_2 + a_{33}^{(1)}x_3 &= b_3^{(1)}. &&\text{III} \end{aligned}$$
1 The
method was known long before Gauss used it in 1809. It was further developed by Doolittle
in 1881, see [4].
To solve this system by Gaussian elimination suppose a_{11}^{(1)} ≠ 0. We subtract l_{21}^{(1)} := a_{21}^{(1)}/a_{11}^{(1)} times equation I from equation II and l_{31}^{(1)} := a_{31}^{(1)}/a_{11}^{(1)} times equation I from equation III. The result is
$$\begin{aligned} a_{11}^{(1)}x_1 + a_{12}^{(1)}x_2 + a_{13}^{(1)}x_3 &= b_1^{(1)}, &&\text{I}\\ a_{22}^{(2)}x_2 + a_{23}^{(2)}x_3 &= b_2^{(2)}, &&\text{II}'\\ a_{32}^{(2)}x_2 + a_{33}^{(2)}x_3 &= b_3^{(2)}, &&\text{III}' \end{aligned}$$
where b_i^{(2)} = b_i^{(1)} − l_{i1}^{(1)}b_1^{(1)} for i = 2, 3 and a_{ij}^{(2)} = a_{ij}^{(1)} − l_{i,1}^{(1)}a_{1j}^{(1)} for i, j = 2, 3.
If a_{11}^{(1)} = 0 and a_{21}^{(1)} ≠ 0 we first interchange equation I and equation II. If a_{11}^{(1)} = a_{21}^{(1)} = 0 we interchange equation I and III. Since the system is nonsingular the first column cannot be zero and an interchange is always possible.
If a_{22}^{(2)} ≠ 0 we subtract l_{32}^{(2)} := a_{32}^{(2)}/a_{22}^{(2)} times equation II′ from equation III′ to obtain
$$\begin{aligned} a_{11}^{(1)}x_1 + a_{12}^{(1)}x_2 + a_{13}^{(1)}x_3 &= b_1^{(1)}, &&\text{I}\\ a_{22}^{(2)}x_2 + a_{23}^{(2)}x_3 &= b_2^{(2)}, &&\text{II}'\\ a_{33}^{(3)}x_3 &= b_3^{(3)}, &&\text{III}'' \end{aligned}$$
where a_{33}^{(3)} = a_{33}^{(2)} − l_{32}^{(2)}a_{23}^{(2)} and b_3^{(3)} = b_3^{(2)} − l_{32}^{(2)}b_2^{(2)}. If a_{22}^{(2)} = 0 then a_{32}^{(2)} ≠ 0 (cf. Sect. 3.4) and we first interchange equation II′ and equation III′. The reduced system is easy to solve since it is upper triangular. Starting from the bottom and moving upwards we find
$$x_3 = b_3^{(3)}/a_{33}^{(3)}, \qquad x_2 = \big(b_2^{(2)} - a_{23}^{(2)}x_3\big)/a_{22}^{(2)}, \qquad x_1 = \big(b_1^{(1)} - a_{12}^{(1)}x_2 - a_{13}^{(1)}x_3\big)/a_{11}^{(1)}.$$
This is known as back substitution. Gaussian elimination leads to an LU factorization.
Indeed, if a_{kk}^{(k)} ≠ 0, k = 1, 2, then
$$LU := \begin{bmatrix} 1 & 0 & 0\\ l_{21}^{(1)} & 1 & 0\\ l_{31}^{(1)} & l_{32}^{(2)} & 1 \end{bmatrix}\begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)}\\ 0 & a_{22}^{(2)} & a_{23}^{(2)}\\ 0 & 0 & a_{33}^{(3)} \end{bmatrix} = \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)}\\ l_{21}^{(1)}a_{11}^{(1)} & l_{21}^{(1)}a_{12}^{(1)} + a_{22}^{(2)} & l_{21}^{(1)}a_{13}^{(1)} + a_{23}^{(2)}\\ l_{31}^{(1)}a_{11}^{(1)} & l_{31}^{(1)}a_{12}^{(1)} + l_{32}^{(2)}a_{22}^{(2)} & l_{31}^{(1)}a_{13}^{(1)} + l_{32}^{(2)}a_{23}^{(2)} + a_{33}^{(3)} \end{bmatrix} = \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)}\\ a_{21}^{(1)} & a_{22}^{(1)} & a_{23}^{(1)}\\ a_{31}^{(1)} & a_{32}^{(1)} & a_{33}^{(1)} \end{bmatrix} = A.$$
Thus Gaussian elimination leads to an LU factorization of the coefficient matrix
A(1) (cf. the proof of Theorem 3.2).
3.2 Gauss and LU
In Gaussian elimination without row interchanges we start with a linear system
Ax = b and generate a sequence of equivalent systems A(k) x = b(k) for k =
1, . . . , n, where A(1) = A, b(1) = b, and A(k) has zeros under the diagonal in its
first k − 1 columns. Thus A(n) is upper triangular and the system A(n) x = b(n) is
easy to solve. The process is illustrated in Fig. 3.1.
The matrix A^{(k)} takes the form
$$A^{(k)} = \begin{bmatrix}
a_{1,1}^{(1)} & \cdots & a_{1,k-1}^{(1)} & a_{1,k}^{(1)} & \cdots & a_{1,j}^{(1)} & \cdots & a_{1,n}^{(1)} \\
 & \ddots & \vdots & \vdots & & \vdots & & \vdots \\
 & & a_{k-1,k-1}^{(k-1)} & a_{k-1,k}^{(k-1)} & \cdots & a_{k-1,j}^{(k-1)} & \cdots & a_{k-1,n}^{(k-1)} \\
 & & & a_{k,k}^{(k)} & \cdots & a_{k,j}^{(k)} & \cdots & a_{k,n}^{(k)} \\
 & & & \vdots & & \vdots & & \vdots \\
 & & & a_{i,k}^{(k)} & \cdots & a_{i,j}^{(k)} & \cdots & a_{i,n}^{(k)} \\
 & & & \vdots & & \vdots & & \vdots \\
 & & & a_{n,k}^{(k)} & \cdots & a_{n,j}^{(k)} & \cdots & a_{n,n}^{(k)}
\end{bmatrix}. \tag{3.1}$$
The process transforming A^{(k)} into A^{(k+1)} for k = 1, …, n − 1 can be described as follows.
for i = k + 1 : n
    l_{ik}^{(k)} = a_{ik}^{(k)} / a_{kk}^{(k)}
    for j = k : n
        a_{ij}^{(k+1)} = a_{ij}^{(k)} − l_{ik}^{(k)} a_{kj}^{(k)}                              (3.2)
Fig. 3.1 Gaussian elimination: the sequence A^{(1)}, A^{(2)}, …, A^{(k)}, …, A^{(n)}
For j = k it follows from (3.2) that a_{ik}^{(k+1)} = a_{ik}^{(k)} − \big(a_{ik}^{(k)}/a_{kk}^{(k)}\big)a_{kk}^{(k)} = 0 for i = k + 1, …, n. Thus A^{(k+1)} will have zeros under the diagonal in its first k columns and the elimination is carried one step further. The numbers l_{ik}^{(k)} in (3.2) are called multipliers.
Gaussian elimination with no row interchanges is valid if and only if the pivots a_{kk}^{(k)} are nonzero for k = 1, …, n − 1. This depends on certain submatrices of A known as principal submatrices.
Definition 3.1 (Principal Submatrix) For k = 1, …, n the matrices A_{[k]} ∈ C^{k×k} given by
$$A_{[k]} := A(1:k, 1:k) = \begin{bmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & & \vdots\\ a_{k1} & \cdots & a_{kk} \end{bmatrix}$$
are called the leading principal submatrices of A ∈ C^{n×n}. More generally, a matrix B ∈ C^{k×k} is called a principal submatrix of A if B = A(r, r), where r = [r_1, …, r_k] for some 1 ≤ r_1 < ⋯ < r_k ≤ n. Thus,
$$b_{i,j} = a_{r_i,r_j}, \quad i, j = 1, \ldots, k.$$
The determinant of a (leading) principal submatrix is called a (leading) principal minor.
A principal submatrix is leading if r_j = j for j = 1, …, k. Also a principal submatrix is special in that it uses the same rows and columns of A. The only principal submatrices of order k = 1 are the diagonal elements of A.
Example 3.2 (Principal Submatrices) The principal submatrices of A = \begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix} are
$$[1], \ [5], \ [9], \quad \begin{bmatrix}1&2\\4&5\end{bmatrix}, \ \begin{bmatrix}1&3\\7&9\end{bmatrix}, \ \begin{bmatrix}5&6\\8&9\end{bmatrix}, \quad A.$$
The leading principal submatrices are
$$[1], \quad \begin{bmatrix}1&2\\4&5\end{bmatrix}, \quad A.$$
Theorem 3.1 We have a_{k,k}^{(k)} ≠ 0 for k = 1, …, n − 1 if and only if the leading principal submatrices A_{[k]} of A are nonsingular for k = 1, …, n − 1. Moreover
$$\det(A_{[k]}) = a_{11}^{(1)}a_{22}^{(2)}\cdots a_{kk}^{(k)}, \quad k = 1, \ldots, n. \tag{3.3}$$
Proof Let B_k := A^{(k)}_{[k-1]} be the upper left (k − 1) × (k − 1) corner of A^{(k)} given by (3.1). Observe that the elements of B_k are computed from A by using only elements from A_{[k−1]}. Since the determinant of a matrix does not change under the operation of subtracting a multiple of one row from another row, the determinant of A_{[k]} equals the product of the diagonal elements of B_{k+1}, and (3.3) follows. But then a_{11}^{(1)}\cdots a_{kk}^{(k)} ≠ 0 for k = 1, …, n − 1 if and only if det(A_{[k]}) ≠ 0 for k = 1, …, n − 1, or equivalently A_{[k]} is nonsingular for k = 1, …, n − 1.
Gaussian elimination is a way to compute the LU factorization of the coefficient matrix.
Theorem 3.2 Suppose A ∈ C^{n×n} and that the leading principal submatrices A_{[k]} are nonsingular for k = 1, …, n − 1. Then Gaussian elimination with no row interchanges results in an LU factorization of A. In particular A = LU, where
$$L = \begin{bmatrix} 1 & & & \\ l_{21}^{(1)} & 1 & & \\ \vdots & & \ddots & \\ l_{n1}^{(1)} & l_{n2}^{(2)} & \cdots & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} a_{11}^{(1)} & \cdots & a_{1n}^{(1)}\\ & \ddots & \vdots\\ & & a_{nn}^{(n)} \end{bmatrix}, \tag{3.4}$$
where the l_{ij}^{(j)} and a_{ij}^{(i)} are given by (3.2).
Proof From (3.2) we have for all i, j
$$l_{ik}^{(k)}a_{kj}^{(k)} = a_{ij}^{(k)} - a_{ij}^{(k+1)} \ \text{ for } k < \min(i, j), \qquad l_{ij}^{(j)}a_{jj}^{(j)} = a_{ij}^{(j)} \ \text{ for } i > j.$$
Thus for i ≤ j we find
$$(LU)_{ij} = \sum_{k=1}^{i-1} l_{ik}^{(k)}a_{kj}^{(k)} + a_{ij}^{(i)} = \sum_{k=1}^{i-1}\big(a_{ij}^{(k)} - a_{ij}^{(k+1)}\big) + a_{ij}^{(i)} = a_{ij}^{(1)} = a_{ij}, \tag{3.5}$$
while for i > j
$$(LU)_{ij} = \sum_{k=1}^{j-1} l_{ik}^{(k)}a_{kj}^{(k)} + l_{ij}^{(j)}a_{jj}^{(j)} = \sum_{k=1}^{j-1}\big(a_{ij}^{(k)} - a_{ij}^{(k+1)}\big) + a_{ij}^{(j)} = a_{ij}^{(1)} = a_{ij}. \tag{3.6}$$
Note that this theorem holds even if A is singular. Since L is nonsingular, the matrix U is then singular, and we must have a_{nn}^{(n)} = 0 when A is singular.
3.3 Banded Triangular Systems
Once we know an LU factorization of A the system Ax = b is solved in two steps.
Since LU x = b we have Ly = b, where y := U x. We first solve Ly = b, for y
and then U x = y for x.
3.3.1 Algorithms for Triangular Systems
A nonsingular triangular linear system Ax = b is easy to solve. By Lemma 2.5 A
has nonzero diagonal elements. Consider first the lower triangular case. For n = 3
the system is
⎡
⎤⎡ ⎤ ⎡ ⎤
a11 0 0
b1
x1
⎣a21 a22 0 ⎦ ⎣x2 ⎦ = ⎣b2 ⎦ .
x3
b3
a31 a32 a33
From the first equation we find x1 = b1 /a11 . Solving the second equation for x2 we
obtain x2 = (b2 − a21 x1 )/a22 . Finally the third equation gives x3 = (b3 − a31 x1 −
a32 x2 )/a33 . This process is known as forward substitution. In general
k−1
xk = bk −
ak,j xj /akk ,
k = 1, 2, . . . , n.
(3.7)
j =1
When A is a lower triangular band matrix the number of arithmetic operations necessary to find x can be reduced. Suppose A is lower triangular and d-banded, so that a_{k,j} = 0 for j ∉ {l_k, l_k + 1, …, k} for k = 1, 2, …, n, where l_k := max(1, k − d), see Fig. 3.2. For a lower triangular d-band matrix the calculation in (3.7) can be simplified as follows:
$$x_k = \Big(b_k - \sum_{j=l_k}^{k-1} a_{k,j}x_j\Big)\big/a_{kk}, \quad k = 1, 2, \ldots, n. \tag{3.8}$$
Note that (3.8) reduces to (3.7) if d = n. Letting A(k, l_k:(k−1)) ∗ x(l_k:(k−1)) denote the sum \sum_{j=l_k}^{k-1} a_{kj}x_j we arrive at the following algorithm, where the initial
Fig. 3.2 Lower triangular 5 × 5 band matrices: d = 1 (left) and d = 2 (right)
$$\begin{bmatrix} a_{11} & 0 & 0 & 0 & 0\\ a_{21} & a_{22} & 0 & 0 & 0\\ 0 & a_{32} & a_{33} & 0 & 0\\ 0 & 0 & a_{43} & a_{44} & 0\\ 0 & 0 & 0 & a_{54} & a_{55} \end{bmatrix}, \qquad \begin{bmatrix} a_{11} & 0 & 0 & 0 & 0\\ a_{21} & a_{22} & 0 & 0 & 0\\ a_{31} & a_{32} & a_{33} & 0 & 0\\ 0 & a_{42} & a_{43} & a_{44} & 0\\ 0 & 0 & a_{53} & a_{54} & a_{55} \end{bmatrix}$$
“r” in the name signals that this algorithm is row oriented. The algorithm takes a
nonsingular lower triangular d-banded matrix A ∈ Cn×n , and b ∈ Cn , as input, and
returns an x ∈ Cn so that Ax = b. For each k we take the inner product of a part of
a row with the already computed unknowns.
function x=rforwardsolve(A,b,d)
% x=rforwardsolve(A,b,d)
n=length(b); x=b;
x(1)=b(1)/A(1,1);
for k=2:n
lk=max(1,k-d);
x(k)=(b(k)-A(k,lk:(k-1))*x(lk:(k-1)))/A(k,k);
end
Listing 3.1 rforwardsolve
A system Ax = b, where A is upper triangular, must be solved by back substitution or 'bottom-up'. We first find x_n from the last equation and then move upwards for the remaining unknowns. For an upper triangular d-banded matrix this leads to the following algorithm, which takes a nonsingular upper triangular d-banded matrix A ∈ C^{n×n}, b ∈ C^n and d as input, and returns an x ∈ C^n so that Ax = b.
function x=rbacksolve(A,b,d)
% x=rbacksolve(A,b,d)
n=length(b); x=b;
x(n)=b(n)/A(n,n);
for k=n-1:-1:1
uk=min(n,k+d);
x(k)=(b(k)-A(k,(k+1):uk)*x((k+1):uk))/A(k,k);
end
Listing 3.2 rbacksolve
Example 3.3 (Column Oriented Forwardsolve) In this example we develop a column oriented vectorized version of forward substitution. For a backward substitution see Exercise 3.1. Consider the system Ax = b, where A ∈ Cn×n is lower
triangular. Suppose after k − 1 steps of the algorithm we have a reduced system in
the form
$$\begin{bmatrix} a_{k,k} & 0 & \cdots & 0\\ a_{k+1,k} & a_{k+1,k+1} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ a_{n,k} & a_{n,k+1} & \cdots & a_{n,n} \end{bmatrix}\begin{bmatrix} x_k\\ x_{k+1}\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} b_k\\ b_{k+1}\\ \vdots\\ b_n \end{bmatrix}.$$
This system is of order n − k + 1. The unknowns are x_k, …, x_n.
We see that x_k = b_k/a_{k,k} and eliminating x_k from the remaining equations we obtain a system of order n − k with unknowns x_{k+1}, …, x_n:
$$\begin{bmatrix} a_{k+1,k+1} & 0 & \cdots & 0\\ a_{k+2,k+1} & a_{k+2,k+2} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ a_{n,k+1} & \cdots & & a_{n,n} \end{bmatrix}\begin{bmatrix} x_{k+1}\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} b_{k+1}\\ \vdots\\ b_n \end{bmatrix} - x_k\begin{bmatrix} a_{k+1,k}\\ \vdots\\ a_{n,k} \end{bmatrix}.$$
Thus at the kth step, k = 1, 2, …, n, we set x_k = b_k/A(k, k) and update b as follows:
b((k + 1) : n) = b((k + 1) : n) − x(k) ∗ A((k + 1) : n, k).
This leads to the following algorithm for column oriented forward solve, which
takes a nonsingular lower triangular d-banded matrix A ∈ Cn×n , b ∈ Cn , and d as
input, and returns an x ∈ Cn so that Ax = b.
function x=cforwardsolve(A,b,d)
%x=cforwardsolve(A,b,d)
x=b; n=length(b);
for k=1:n-1
x(k)=b(k)/A(k,k); uk=min(n,k+d);
b((k+1):uk)=b((k+1):uk)-A((k+1):uk,k)*x(k);
end
x(n)=b(n)/A(n,n);
end
Listing 3.3 cforwardsolve
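A quick consistency check (ours, not from the text): the row oriented and column oriented versions should return the same solution on a random banded lower triangular system.

% rforwardsolve and cforwardsolve should agree
n = 8; d = 2;
A = tril(rand(n) + 2*eye(n)); A = A - tril(A,-d-1);   % lower triangular, d-banded
b = rand(n,1);
norm(rforwardsolve(A,b,d) - cforwardsolve(A,b,d))     % of order machine precision
norm(A*rforwardsolve(A,b,d) - b)                      % residual check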
3.3.2 Counting Operations
It is useful to have a number which indicates the amount of work an algorithm requires. In this book we measure this by estimating the total number of (complex) arithmetic operations. We count additions, subtractions, multiplications and divisions, but not work on indices. As an example we show that the LU factorization of a full matrix of order n using Gaussian elimination requires exactly
$$N_{LU} := \frac{2}{3}n^3 - \frac{1}{2}n^2 - \frac{1}{6}n \tag{3.9}$$
operations. Let M, D, A, S be the number of (complex) multiplications, divisions, additions, and subtractions. In (3.2) the multiplications and subtractions occur in the calculation of a_{ij}^{(k+1)} = a_{ij}^{(k)} − l_{ik}^{(k)}a_{kj}^{(k)}, which is carried out (n − k)² times. Moreover,
each calculation involves one subtraction and one multiplication. Thus we find
$$M + S = 2\sum_{k=1}^{n-1}(n-k)^2 = 2\sum_{m=1}^{n-1}m^2 = \frac{2}{3}n(n-1)\Big(n-\frac12\Big).$$
For each k there are n − k divisions, giving a sum of \sum_{k=1}^{n-1}(n-k) = \tfrac12 n(n-1). Since there are no additions we obtain the total
$$M + D + A + S = \frac{2}{3}n(n-1)\Big(n-\frac12\Big) + \frac12 n(n-1) = N_{LU}$$
given by (3.9).
We are only interested in N_{LU} when n is large, and for such n the term (2/3)n³ dominates. We therefore regularly ignore lower order terms and use number of operations both for the exact count and for the highest order term. We also say more loosely that the number of operations is O(n³). We will use the number of operations counted in one of these ways as a measure of the complexity of an algorithm and say that the complexity of LU factorization of a full matrix is O(n³), or more precisely (2/3)n³.
We will compare the number of arithmetic operations of many algorithms with the number of arithmetic operations of Gaussian elimination and define for n ∈ N the number G_n as follows:
Definition 3.2 (G_n := (2/3)n³) We define G_n := (2/3)n³.
There is a quick way to arrive at the leading term 2n³/3. We only consider the operations contributing to this term. In (3.2) the leading term comes from the inner loop contributing to M + S. Then we replace sums by integrals, letting the summation indices be continuous variables and adjusting the limits of integration in an insightful way to simplify the calculation. Thus,
$$M + S = 2\sum_{k=1}^{n-1}(n-k)^2 \approx 2\int_{1}^{n-1}(n-k)^2\,dk \approx 2\int_{0}^{n}(n-k)^2\,dk = \frac{2}{3}n^3$$
and this is the correct leading term.
Consider next N_S, the number of operations for a forward plus a backward substitution. By (3.7) we obtain
$$N_S = 2\sum_{k=1}^{n}(2k-1) \approx 2\int_{1}^{n}(2k-1)\,dk \approx 4\int_{0}^{n}k\,dk = 2n^2.$$
The last integral actually gives the exact value for the sum in this case (cf. (3.26)).
We see that LU factorization is an O(n³) process while solving a triangular system requires O(n²) arithmetic operations. Thus, if n = 10⁶ and one arithmetic operation requires c = 10⁻¹⁴ seconds of computing time, then cn³ = 10⁴ seconds ≈ 3 hours and cn² = 0.01 second, giving dramatic differences in computing time.
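The gap between O(n³) and O(n²) is easy to observe in practice. The following timing sketch is ours (not from the text) and uses MATLAB's built-in lu together with triangular backslash solves; the diagonal shift is only there to keep the random matrix well conditioned.

% Compare the cost of an LU factorization with the triangular solves
n = 2000; A = rand(n) + n*eye(n); b = rand(n,1);
tic, [L,U,P] = lu(A); tLU = toc;          % O(n^3) factorization
tic, y = L\(P*b); x = U\y; tSolve = toc;  % O(n^2) forward and back substitution
fprintf('LU: %.3f s   triangular solves: %.4f s\n', tLU, tSolve);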
3.4 The PLU Factorization
Theorem 3.1 shows that Gaussian elimination without row interchanges can fail on a nonsingular system. A simple example is
$$\begin{bmatrix}0&1\\1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix}.$$
We show here that any nonsingular linear system can be solved by Gaussian elimination if we incorporate row interchanges.
3.4.1 Pivoting
Interchanging two rows (and/or two columns) during Gaussian elimination is known
as pivoting. The element which is moved to the diagonal position (k, k) is called
the pivot element or pivot for short, and the row containing the pivot is called the
pivot row. Gaussian elimination with row pivoting can be described as follows.
1. Choose r_k ≥ k so that a_{r_k,k}^{(k)} ≠ 0.
2. Interchange rows r_k and k of A^{(k)}.
3. Eliminate by computing l_{ik}^{(k)} and a_{ij}^{(k+1)} using (3.2).
To show that Gaussian elimination can always be carried to completion by using suitable row interchanges, suppose by induction on k that A^{(k)} is nonsingular. Since A^{(1)} = A this holds for k = 1. By Lemma 2.4 the lower right diagonal block in A^{(k)} is nonsingular. But then at least one element in the first column of that block must be nonzero, and it follows that r_k exists so that a_{r_k,k}^{(k)} ≠ 0. But then A^{(k+1)} is nonsingular since it is computed from A^{(k)} using row operations preserving the nonsingularity. We conclude that A^{(k)} is nonsingular for k = 1, …, n.
3.4.2 Permutation Matrices
Row interchanges can be described in terms of permutation matrices.
Definition 3.3 A permutation matrix is a matrix of the form
P = I (:, p) = [ei1 , ei2 , . . . , ein ] ∈ Rn×n ,
where ei1 , . . . , ein is a permutation of the unit vectors e1 , . . . , en ∈ Rn .
Every permutation p = [i_1, …, i_n]^T of the integers 1, 2, …, n gives rise to a permutation matrix and vice versa. Post-multiplying a matrix A by a permutation matrix results in a permutation of the columns, while pre-multiplying by a permutation matrix gives a permutation of the rows. In symbols
$$AP = A(:, p), \qquad P^T A = A(p, :). \tag{3.10}$$
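The identities (3.10) can be tried out directly in MATLAB. The sketch below is ours (not from the text); p is a random permutation of 1, …, 5.

% Check (3.10): AP permutes columns, P'A permutes rows
n = 5; p = randperm(n)'; I = eye(n);
P = I(:,p);                 % permutation matrix, Definition 3.3
A = magic(n);
norm(A*P - A(:,p))          % both norms are zero
norm(P'*A - A(p,:))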
Indeed, AP = [Ae_{i_1}, …, Ae_{i_n}] = A(:, p) and P^T A = (A^T P)^T = (A^T(:, p))^T = A(p, :).
Since P^T P = I the inverse of P is equal to its transpose, P^{-1} = P^T, and PP^T = I as well. We will use a particularly simple permutation matrix.
Definition 3.4 We define a (j, k)-interchange matrix I_{jk} by interchanging columns j and k of the identity matrix.
Since I_{jk} = I_{kj}, and we obtain the identity by applying I_{jk} twice, we see that I_{jk}^2 = I, so an interchange matrix is symmetric and equal to its own inverse. Pre-multiplying a matrix by an interchange matrix interchanges two rows of the matrix, while post-multiplication interchanges two columns.
We can keep track of the row interchanges using pivot vectors p_k. We define
$$p := p_n, \quad\text{where } p_1 := [1, 2, \ldots, n]^T \text{ and } p_{k+1} := I_{r_k,k}\, p_k \text{ for } k = 1, \ldots, n-1. \tag{3.11}$$
We obtain p_{k+1} from p_k by interchanging the entries r_k and k in p_k. In particular, since r_k ≥ k, the first k − 1 components in p_k and p_{k+1} are the same.
There is a close relation between the pivot vectors p_k and the corresponding interchange matrices P_k := I_{r_k,k}. Since P_k I(p_k, :) = I(P_k p_k, :) = I(p_{k+1}, :) we obtain
$$P^T = P_{n-1}\cdots P_1 = I(p, :), \qquad P = P_1 P_2 \cdots P_{n-1} = I(:, p). \tag{3.12}$$
Instead of interchanging the rows of A during elimination we can keep track of the ordering of the rows using the pivot vectors p_k. Gaussian elimination with row pivoting starting with a_{ij}^{(1)} = a_{ij} can be described as follows:
p = [1, …, n]^T;
for k = 1 : n − 1
    choose r_k ≥ k so that a_{p_{r_k},k}^{(k)} ≠ 0
    p = I_{r_k,k}\, p
    for i = k + 1 : n
        a_{p_i,k}^{(k)} = a_{p_i,k}^{(k)} / a_{p_k,k}^{(k)}
        for j = k : n
            a_{p_i,j}^{(k+1)} = a_{p_i,j}^{(k)} − a_{p_i,k}^{(k)} a_{p_k,j}^{(k)}                     (3.13)
This leads to the following factorization:
Theorem 3.3 Gaussian elimination with row pivoting on a nonsingular matrix A ∈ C^{n×n} leads to the factorization A = PLU, where P is a permutation matrix, L is lower triangular with ones on the diagonal, and U is upper triangular. More explicitly, P = I(:, p), where p = I_{r_{n-1},n-1}\cdots I_{r_1,1}[1, \ldots, n]^T, and
$$L = \begin{bmatrix} 1 & & & \\ a_{p_2,1}^{(1)} & 1 & & \\ \vdots & & \ddots & \\ a_{p_n,1}^{(1)} & a_{p_n,2}^{(2)} & \cdots & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} a_{p_1,1}^{(1)} & \cdots & a_{p_1,n}^{(1)}\\ & \ddots & \vdots\\ & & a_{p_n,n}^{(n)} \end{bmatrix}. \tag{3.14}$$
Proof The proof is analogous to the proof for LU factorization without pivoting. From (3.13) we have for all i, j
$$a_{p_i,k}^{(k)}a_{p_k,j}^{(k)} = a_{p_i,j}^{(k)} - a_{p_i,j}^{(k+1)} \ \text{ for } k < \min(i, j), \qquad a_{p_i,j}^{(j)}a_{p_j,j}^{(j)} = a_{p_i,j}^{(j)} \ \text{ for } i > j.$$
Thus for i ≤ j we find
$$(LU)_{ij} = \sum_{k=1}^{n} l_{i,k}u_{kj} = \sum_{k=1}^{i-1} a_{p_i,k}^{(k)}a_{p_k,j}^{(k)} + a_{p_i,j}^{(i)} = \sum_{k=1}^{i-1}\big(a_{p_i,j}^{(k)} - a_{p_i,j}^{(k+1)}\big) + a_{p_i,j}^{(i)} = a_{p_i,j}^{(1)} = a_{p_i,j} = \big(P^T A\big)_{ij},$$
while for i > j
$$(LU)_{ij} = \sum_{k=1}^{n} l_{ik}u_{kj} = \sum_{k=1}^{j-1} a_{p_i,k}^{(k)}a_{p_k,j}^{(k)} + a_{p_i,j}^{(j)}a_{p_j,j}^{(j)} = \sum_{k=1}^{j-1}\big(a_{p_i,j}^{(k)} - a_{p_i,j}^{(k+1)}\big) + a_{p_i,j}^{(j)} = a_{p_i,j}^{(1)} = a_{p_i,j} = \big(P^T A\big)_{ij}.$$
The PLU factorization can also be written P T A = LU . This shows that for a
nonsingular matrix there is a permutation of the rows of A so that the permuted
matrix has an LU factorization.
3.4.3 Pivot Strategies
The choice of pivot element in (3.13) is not unique. In partial pivoting we select the largest element,
$$|a_{r_k,k}^{(k)}| := \max\{|a_{i,k}^{(k)}| : k \le i \le n\},$$
with r_k the smallest such index in case of a tie. The following example illustrates that small pivots should be avoided.
Example 3.4 Applying Gaussian elimination without row interchanges to the linear system
$$\begin{aligned} 10^{-4}x_1 + 2x_2 &= 4\\ x_1 + x_2 &= 3 \end{aligned}$$
we obtain the upper triangular system
$$\begin{aligned} 10^{-4}x_1 + 2x_2 &= 4\\ (1 - 2\times 10^{4})x_2 &= 3 - 4\times 10^{4}. \end{aligned}$$
The exact solution is
$$x_2 = \frac{-39997}{-19999} \approx 2, \qquad x_1 = \frac{4 - 2x_2}{10^{-4}} = \frac{20000}{19999} \approx 1.$$
Suppose we round the result of each arithmetic operation to three digits. The solutions fl(x_1) and fl(x_2) computed in this way are
fl(x_2) = 2, fl(x_1) = 0.
The computed value 0 of x_1 is completely wrong. Suppose instead we apply Gaussian elimination to the same system, but where we have interchanged the equations. The system is
$$\begin{aligned} x_1 + x_2 &= 3\\ 10^{-4}x_1 + 2x_2 &= 4 \end{aligned}$$
and we obtain the upper triangular system
$$\begin{aligned} x_1 + x_2 &= 3\\ (2 - 10^{-4})x_2 &= 4 - 3\times 10^{-4}. \end{aligned}$$
Now the solution is computed as follows:
$$x_2 = \frac{3.9997}{1.9999} \approx 2, \qquad x_1 = 3 - x_2 \approx 1.$$
In this case rounding each calculation to three digits produces fl(x_1) = 1 and fl(x_2) = 2, which is quite satisfactory since it is the exact solution rounded to three digits.
Related to partial pivoting is scaled partial pivoting. Here r_k is the smallest index such that
$$\frac{|a_{r_k,k}^{(k)}|}{s_{r_k}} := \max\Big\{\frac{|a_{i,k}^{(k)}|}{s_i} : k \le i \le n\Big\}, \qquad s_k := \max_{1\le j\le n}|a_{kj}|.$$
This can sometimes give more accurate results if the coefficient matrix has coefficients of wildly different sizes. Note that the scaling factors s_k are computed using the initial matrix.
It is also possible to interchange both rows and columns. The choice
$$|a_{r_k,s_k}^{(k)}| := \max\{|a_{i,j}^{(k)}| : k \le i, j \le n\},$$
with r_k, s_k the smallest such indices in case of a tie, is known as complete pivoting.
Complete pivoting is known to be more numerically stable than partial pivoting, but
requires a lot of search and is seldom used in practice.
3.5 The LU and LDU Factorizations
Gaussian elimination without row interchanges is one way of computing an LU factorization of a matrix. There are other ways that can be advantageous for certain kinds of problems. Here we consider the general theory of LU factorizations. Recall that A = LU is an LU factorization of A ∈ C^{n×n} if L ∈ C^{n×n} is lower triangular and U ∈ C^{n×n} is upper triangular, i.e.,
$$L = \begin{bmatrix} l_{1,1} & \cdots & 0\\ \vdots & \ddots & \vdots\\ l_{n,1} & \cdots & l_{n,n} \end{bmatrix}, \qquad U = \begin{bmatrix} u_{1,1} & \cdots & u_{1,n}\\ \vdots & \ddots & \vdots\\ 0 & \cdots & u_{n,n} \end{bmatrix}.$$
To find an LU factorization there is one equation for each of the n² elements in A, and L and U contain a total of n² + n unknown elements. There are several ways to restrict the number of unknowns to n².
L1U: l_{ii} = 1 for all i,
LU1: u_{ii} = 1 for all i,
LDU: A = LDU, l_{ii} = u_{ii} = 1 for all i, D = diag(d_{11}, …, d_{nn}).
3.5.1 Existence and Uniqueness
Consider the L1U factorization. Three things can happen. An L1U factorization
exists and is unique, it exists, but it is not unique, or it does not exist. The 2 × 2 case
illustrates this.
Example 3.5 (L1U of a 2 × 2 Matrix) Let a, b, c, d ∈ C. An L1U factorization of A = \begin{bmatrix}a&b\\c&d\end{bmatrix} must satisfy the equations
$$\begin{bmatrix}a&b\\c&d\end{bmatrix} = \begin{bmatrix}1&0\\l_1&1\end{bmatrix}\begin{bmatrix}u_1&u_2\\0&u_3\end{bmatrix} = \begin{bmatrix}u_1&u_2\\u_1 l_1 & u_2 l_1 + u_3\end{bmatrix}$$
for the unknowns l_1 in L and u_1, u_2, u_3 in U. The equations are
$$u_1 = a, \quad u_2 = b, \quad a l_1 = c, \quad b l_1 + u_3 = d. \tag{3.15}$$
These equations do not always have a solution. Indeed, the main problem is the equation a l_1 = c. There are essentially three cases:
1. a ≠ 0: The matrix has a unique L1U factorization.
2. a = c = 0: The L1U factorization exists, but it is not unique. Any value for l_1 can be used.
3. a = 0, c ≠ 0: No L1U factorization exists.
Consider the four matrices
$$A_1 := \begin{bmatrix}2&-1\\-1&2\end{bmatrix}, \quad A_2 := \begin{bmatrix}0&1\\1&1\end{bmatrix}, \quad A_3 := \begin{bmatrix}0&1\\0&2\end{bmatrix}, \quad A_4 := \begin{bmatrix}1&1\\1&1\end{bmatrix}.$$
From the previous discussion it follows that A_1 has a unique L1U factorization, A_2 has no L1U factorization, A_3 has an L1U factorization but it is not unique, and A_4 has a unique L1U factorization even if it is singular.
In preparation for the main theorem about LU factorization we prove a simple lemma. Recall that
$$A_{[k]} := \begin{bmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & & \vdots\\ a_{k1} & \cdots & a_{kk} \end{bmatrix}$$
is called a leading principal submatrix of A.
Lemma 3.1 (L1U of Leading Principal Submatrices) Suppose A = LU is an L1U factorization of A ∈ C^{n×n}. For k = 1, …, n let A_{[k]}, L_{[k]}, U_{[k]} be the leading principal submatrices of A, L, U, respectively. Then A_{[k]} = L_{[k]}U_{[k]} is an L1U factorization of A_{[k]} for k = 1, …, n.
Proof For k = 1, …, n − 1 we partition A = LU as follows:
$$\begin{bmatrix} A_{[k]} & B_k\\ C_k & F_k \end{bmatrix} = \begin{bmatrix} L_{[k]} & 0\\ M_k & N_k \end{bmatrix}\begin{bmatrix} U_{[k]} & S_k\\ 0 & T_k \end{bmatrix} = \begin{bmatrix} L_{[k]}U_{[k]} & L_{[k]}S_k\\ M_kU_{[k]} & M_kS_k + N_kT_k \end{bmatrix}, \tag{3.16}$$
where F_k, N_k, T_k ∈ C^{(n-k)\times(n-k)}. Comparing blocks we find A_{[k]} = L_{[k]}U_{[k]}. Since L_{[k]} is unit lower triangular and U_{[k]} is upper triangular this is an L1U factorization of A_{[k]}.
The following theorem gives a necessary and sufficient condition for existence of a unique LU factorization. The conditions are the same for the three factorizations L1U, LU1 and LDU.
Theorem 3.4 (LU Theorem) A square matrix A ∈ C^{n×n} has a unique L1U (LU1, LDU) factorization if and only if the leading principal submatrices A_{[k]} of A are nonsingular for k = 1, …, n − 1.
Proof Suppose A_{[k]} is nonsingular for k = 1, …, n − 1. Under these conditions Gaussian elimination gives an L1U factorization (cf. Theorem 3.2). We give another proof here that in addition to showing uniqueness also gives alternative ways to compute the L1U factorization. The proofs for the LU1 and LDU factorizations are similar and left as exercises.
We use induction on n to show that A has a unique L1U factorization. The result is clearly true for n = 1, since the unique L1U factorization of a 1 × 1 matrix is [a_{11}] = [1][a_{11}]. Suppose that A_{[n−1]} has a unique L1U factorization A_{[n−1]} = L_{n−1}U_{n−1}, and that A_{[1]}, …, A_{[n−1]} are nonsingular. By block multiplication
$$A = \begin{bmatrix} A_{[n-1]} & c_n\\ r_n^T & a_{nn} \end{bmatrix} = \begin{bmatrix} L_{n-1} & 0\\ l_n^T & 1 \end{bmatrix}\begin{bmatrix} U_{n-1} & u_n\\ 0 & u_{nn} \end{bmatrix} = \begin{bmatrix} L_{n-1}U_{n-1} & L_{n-1}u_n\\ l_n^T U_{n-1} & l_n^T u_n + u_{nn} \end{bmatrix} \tag{3.17}$$
if and only if A_{[n−1]} = L_{n−1}U_{n−1} and l_n, u_n ∈ C^{n−1} and u_{nn} ∈ C are determined from
$$U_{n-1}^T l_n = r_n, \qquad L_{n-1}u_n = c_n, \qquad u_{nn} = a_{nn} - l_n^T u_n. \tag{3.18}$$
Since A_{[n−1]} is nonsingular it follows that L_{n−1} and U_{n−1} are nonsingular, and therefore l_n, u_n, and u_{nn} are uniquely given. Thus (3.17) gives a unique L1U factorization of A.
Conversely, suppose A has a unique L1U factorization A = LU. By Lemma 3.1 A_{[k]} = L_{[k]}U_{[k]} is an L1U factorization of A_{[k]} for k = 1, …, n − 1. Suppose
A_{[k]} is singular for some k ≤ n − 1. We will show that this leads to a contradiction. Let k be the smallest integer so that A_{[k]} is singular. Since A_{[j]} is nonsingular for j ≤ k − 1 it follows from what we have already shown that A_{[k]} = L_{[k]}U_{[k]} is the unique L1U factorization of A_{[k]}. The matrix U_{[k]} is singular since A_{[k]} is singular and L_{[k]} is nonsingular. By (3.16) we have U_{[k]}^T M_k^T = C_k^T. This can be written as n − k linear systems for the columns of M_k^T. By assumption M_k^T exists, but since U_{[k]}^T is singular M_k is not unique, a contradiction.
By combining the last two equations in (3.18) we obtain with k = n
$$U_{k-1}^T l_k = r_k, \qquad \begin{bmatrix} L_{k-1} & 0\\ l_k^T & 1 \end{bmatrix}\begin{bmatrix} u_k\\ u_{kk} \end{bmatrix} = \begin{bmatrix} c_k\\ a_{kk} \end{bmatrix}.$$
This can be used in an algorithm to compute the L1U factorization. Moreover, if A is d-banded then the first k − d components in r_k and c_k are zero, so both L and U will be d-banded. Thus we can use the banded rforwardsolve Algorithm 3.1 to solve the lower triangular system U_{k-1}^T l_k = r_k for the kth row l_k^T in L and the kth column [u_k; u_{kk}] in U for k = 2, …, n. This leads to the following algorithm to compute the L1U factorization of a d-banded matrix A with d ≥ 1. The algorithm will fail if the conditions in the LU theorem are not satisfied.
function [L,U]=L1U(A,d)
% [L,U]=L1U(A,d)
n=length(A);
L=eye(n,n); U=zeros(n,n);U(1,1)=A(1,1);
for k=2:n
km=max(1,k-d);
L(k,km:(k-1))=rforwardsolve(U(km:(k-1) ...
,km:(k-1))’,A(k,km:(k-1))’,d)’;
U(km:k,k)=rforwardsolve(L(km:k,km:k),A(km:k,k),d);
end
Listing 3.4 L1U
For each k we essentially solve a lower triangular linear system of order d. Thus the number of arithmetic operations for this algorithm is O(d²n).
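As a usage example (ours, not from the text), the banded L1U algorithm applied to the second derivative matrix T of Sect. 2.2 with bandwidth d = 1 reproduces the factors given in (2.38).

% L1U factorization of T = tridiag(-1,2,-1) with bandwidth d = 1
m = 5;
T = diag(2*ones(m,1)) - diag(ones(m-1,1),1) - diag(ones(m-1,1),-1);
[L,U] = L1U(T,1);
norm(L*U - T)     % of order machine precision
diag(U)'          % 2, 3/2, 4/3, 5/4, 6/5, cf. (2.38)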
Remark 3.1 (LU of Upper Triangular Matrix) A matrix A ∈ Cn×n can have an LU
factorization even if A[k] is singular for some k < n. By Theorem 4.1 such an LU
factorization cannot be unique. An L1U factorization of an upper triangular matrix
A is A = I A so it always exists even if A has zeros somewhere on the diagonal.
By Lemma 2.5, if some akk is zero then A[k] is singular and the L1U factorization
is not unique. In particular, for the zero matrix any unit lower triangular matrix can
be used as L in an L1U factorization.
3.6 Block LU Factorization
Suppose A ∈ C^{n×n} is a block matrix of the form
$$A := \begin{bmatrix} A_{11} & \cdots & A_{1m}\\ \vdots & & \vdots\\ A_{m1} & \cdots & A_{mm} \end{bmatrix}, \tag{3.19}$$
where each diagonal block A_{ii} is square. We call the factorization
$$A = LU = \begin{bmatrix} I & & & \\ L_{21} & I & & \\ \vdots & & \ddots & \\ L_{m1} & \cdots & L_{m,m-1} & I \end{bmatrix}\begin{bmatrix} U_{11} & U_{12} & \cdots & U_{1m}\\ & U_{22} & \cdots & U_{2m}\\ & & \ddots & \vdots\\ & & & U_{mm} \end{bmatrix} \tag{3.20}$$
a block L1U factorization of A. Here the ith diagonal blocks I and U_{ii} in L and U have the same size as A_{ii}, the ith diagonal block in A. Moreover, the U_{ii} are not necessarily upper triangular. Block LU1 and block LDU factorizations are defined similarly.
The results for element-wise LU factorization carry over to block LU factorization as follows.
Theorem 3.5 (Block LU Theorem) Suppose A ∈ C^{n×n} is a block matrix of the form (3.19). Then A has a unique block LU factorization (3.20) if and only if the leading principal block submatrices
$$A_{\{k\}} := \begin{bmatrix} A_{11} & \cdots & A_{1k}\\ \vdots & & \vdots\\ A_{k1} & \cdots & A_{kk} \end{bmatrix}$$
are nonsingular for k = 1, …, m − 1.
Proof Suppose A_{\{k\}} is nonsingular for k = 1, …, m − 1. Following the proof of Theorem 3.4, suppose A_{\{m−1\}} has a unique block LU factorization A_{\{m−1\}} = L_{\{m−1\}}U_{\{m−1\}}, and that A_{\{1\}}, …, A_{\{m−1\}} are nonsingular. Then L_{\{m−1\}} and U_{\{m−1\}} are nonsingular and
$$A = \begin{bmatrix} A_{\{m-1\}} & B\\ C^T & A_{mm} \end{bmatrix} = \begin{bmatrix} L_{\{m-1\}} & 0\\ C^T U_{\{m-1\}}^{-1} & I \end{bmatrix}\begin{bmatrix} U_{\{m-1\}} & L_{\{m-1\}}^{-1}B\\ 0 & A_{mm} - C^T U_{\{m-1\}}^{-1}L_{\{m-1\}}^{-1}B \end{bmatrix} \tag{3.21}$$
is a block LU factorization of A. It is unique by derivation. Conversely, suppose A has a unique block LU factorization A = LU. Then as in Lemma 3.1 it is easily seen that A_{\{k\}} = L_{\{k\}}U_{\{k\}} is the unique block LU factorization of A_{\{k\}} for k = 1, …, m. The rest of the proof is similar to the proof of Theorem 3.4.
Remark 3.2 (Comparing LU and Block LU) The number of arithmetic operations
for the block LU factorization is the same as for the ordinary LU factorization. An
advantage of the block method is that it combines many of the operations into matrix
operations.
Remark 3.3 (A Block LU Is Not an LU) Note that (3.20) is not an LU factorization of A since the U_{ii}'s are not upper triangular in general. To relate the block LU factorization to the usual LU factorization we assume that each U_{ii} has an LU factorization U_{ii} = L̃_{ii}Ũ_{ii}. Then A = L̂Û, where L̂ := L diag(L̃_{ii}) and Û := diag(L̃_{ii})^{-1}U, and this is an ordinary LU factorization of A.
3.7 Exercises Chap. 3
3.7.1 Exercises Sect. 3.3
Exercise 3.1 (Column Oriented Backsolve) Suppose A ∈ Cn×n is nonsingular,
upper triangular, d-banded, and b ∈ Cn . Justify the following column oriented
vectorized algorithm for solving Ax = b.
function x=cbacksolve(A,b,d)
% x=cbacksolve(A,b,d)
x=b; n=length(b);
for k=n:-1:2
x(k)=b(k)/A(k,k); lk=max(1,k-d);
b(lk:(k-1))=b(lk:(k-1))-A(lk:(k-1),k)*x(k);
end
x(1)=b(1)/A(1,1);
end
Listing 3.5 cbacksolve
Exercise 3.2 (Computing the Inverse of a Triangular Matrix) Suppose A ∈
Cn×n is a nonsingular lower triangular matrix. By Lemma 2.5 the inverse B =
[b1 , . . . , bn ] is also lower triangular. The kth column bk of B is the solution of the
linear systems Abk = ek . Show that bk (k) = 1/a(k, k) for k = 1, . . . , n, and
explain why we can find bk by solving the linear systems
A((k +1):n, (k +1):n)bk ((k +1):n) = −A((k +1):n, k)bk (k),
k = 1, . . . , n−1.
(3.22)
Is it possible to store the interesting part of bk in A as soon as it is computed?
When A instead is upper triangular, show also that we can find bk by solving the
linear systems
A(1:k, 1:k)bk (1:k) = I (1:k, k),
k = n, n − 1, . . . , 1,
(3.23)
for k = n, n − 1, . . . , 1.
Exercise 3.3 (Finite Sums of Integers) Use induction on m, or some other method, to show that
$$1 + 2 + \cdots + m = \tfrac12 m(m+1), \tag{3.24}$$
$$1^2 + 2^2 + \cdots + m^2 = \tfrac13 m\big(m+\tfrac12\big)(m+1), \tag{3.25}$$
$$1 + 3 + 5 + \cdots + (2m-1) = m^2, \tag{3.26}$$
$$1\cdot 2 + 2\cdot 3 + 3\cdot 4 + \cdots + (m-1)m = \tfrac13(m-1)m(m+1). \tag{3.27}$$
Exercise 3.4 (Multiplying Triangular Matrices) Show that the matrix multiplication AB can be done in \tfrac13 n(2n^2+1) ≈ G_n arithmetic operations when A ∈ R^{n×n} is lower triangular and B ∈ R^{n×n} is upper triangular. What about BA?
3.7.2 Exercises Sect. 3.4
Exercise 3.5 (Using PLU for A∗ ) Suppose we know the PLU factors P , L, U in a
PLU factorization A = P LU of A ∈ Cn×n . Explain how we can solve the system
A∗ x = b economically.
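One possible approach, sketched in MATLAB under the assumption that P is stored as a permutation matrix: since A^* = U^*L^*P^T, the system A^*x = b requires only two triangular solves and a permutation, so the cost is O(n^2).

% Sketch: solve A'*x = b given A = P*L*U.
z = U' \ b;      % U' is lower triangular: forward substitution
w = L' \ z;      % L' is upper triangular: back substitution
x = P * w;       % since P'*x = w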
Exercise 3.6 (Using PLU for Determinant) Suppose we know the PLU factors
P , L, U in a PLU factorization A = P LU of A ∈ Cn×n . Explain how we can use
this to compute the determinant of A.
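A minimal sketch (assuming P, L, U are stored as matrices and L is unit lower triangular): the determinant is the product of the diagonal of U times the sign of the permutation.

% Sketch: det(A) from A = P*L*U with det(L) = 1 and det(P) = +-1.
detA = det(P) * prod(diag(U));   % for large n one would accumulate the sign of P
                                 % and use logarithms to avoid overflow/underflow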
Exercise 3.7 (Using PLU for A−1 ) Suppose the factors P , L, U in a PLU factorization of A ∈ Cn×n are known. Use Exercise 3.4 to show that it takes approximately
2Gn arithmetic operations to compute A−1 = U −1 L−1 P T . Here we have not
counted the final multiplication with P T which amounts to n row interchanges.
Exercise 3.8 (Upper Hessenberg System (Exam Exercise 1994-2)) Gaussian elimination with row pivoting can be written in the following form if for each k we exchange rows k and k + 1:
Algorithm 1
1. for k = 1, 2, . . . , n − 1
(a) exchange ak,j and ak+1,j for j = k, k + 1, . . . , n
(b) for i = k + 1, k + 2, . . . , n
i. ai,k = mi,k = ai,k /ak,k
ii. ai,j = ai,j − mi,k ak,j for j = k + 1, k + 2, . . . , n
To solve the set of equations Ax = b we have the following algorithm:
Algorithm 2
1. for k = 1, 2, . . . , n − 1
(a) exchange bk and bk+1
(b) bi = bi − ai,k bk for i = k + 1, k + 2, . . . , n
2. xn = bn /an,n
3. for k = n − 1, n − 2, . . . , 1
(a) sum = 0
(b) sum = sum + ak,j xj for j = k + 1, k + 2, . . . , n
(c) xk = (bk − sum)/ak,k
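For concreteness, the two algorithms can be transcribed directly into MATLAB as follows (an illustrative sketch, not part of the original exam text; it assumes b is a column vector and that the forced exchange never produces a zero pivot):

function x = solveforcedexchange(A, b)
% Sketch: Gaussian elimination where rows k and k+1 are exchanged at every step.
n = length(b);
for k = 1:n-1                                        % Algorithm 1
    A([k k+1], k:n) = A([k+1 k], k:n);               % 1(a)
    for i = k+1:n
        A(i,k) = A(i,k)/A(k,k);                      % 1(b)i, store multiplier
        A(i,k+1:n) = A(i,k+1:n) - A(i,k)*A(k,k+1:n); % 1(b)ii
    end
end
for k = 1:n-1                                        % Algorithm 2, step 1
    b([k k+1]) = b([k+1 k]);
    b(k+1:n) = b(k+1:n) - A(k+1:n,k)*b(k);
end
x = zeros(n,1); x(n) = b(n)/A(n,n);                  % steps 2 and 3
for k = n-1:-1:1
    x(k) = (b(k) - A(k,k+1:n)*x(k+1:n))/A(k,k);
end
end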
We say that H ∈ R^{n×n} is unreduced upper Hessenberg if it is upper Hessenberg and the subdiagonal elements satisfy h_{i,i−1} ≠ 0 for i = 2, . . . , n.
a) Let H ∈ Rn×n be unreduced upper Hessenberg. Give an O(n2 ) algorithm for
solving the linear system H x = b using suitable specializations of Algorithms 1
and 2.
b) Find the number of multiplications/divisions in the algorithm you developed in
exercise a). Is division by zero possible?
c) Let U ∈ Rn×n be upper triangular and nonsingular. We define
C := U + veT1 ,
(3.28)
where v ∈ Rn and e1 is the first unit vector in Rn . We also let
P := I 1,2 I 2,3 · · · I n−1,n ,
(3.29)
where the I i,j are obtained from the identity matrix by interchanging rows i and
j . Explain why the matrix E := CP is unreduced upper Hessenberg.
d) Let A ∈ R^{n×n} be nonsingular. We assume that A has a unique L1U factorization A = LU. To a given w ∈ R^n we define a rank one modification of A by
B := A + w e_1^T.   (3.30)
Show that B has the factorization B = LH P T , where L is unit lower triangular,
P is given by (3.29) and H is unreduced upper Hessenberg.
e) Use the results above to sketch an O(n2 ) algorithm for solving the linear system
Bx = b, where B is given by (3.30). We assume that the matrices L and U in
the L1U factorization of A have already been computed.
3.7.3 Exercises Sect. 3.5
Exercise 3.9 (# Operations for Banded Triangular Systems) Show that for 1 ≤ d ≤ n Algorithm 3.4, with A(k, k) = 1 for k = 1, . . . , n in Algorithm 3.1, requires exactly N_{LU}(n, d) := (2d^2 + d)n − (d^2 + d)(8d + 1)/6 = O(d^2 n) operations.² In particular, for a full matrix d = n − 1 and we find N_{LU}(n, n) = \tfrac23 n^3 − \tfrac12 n^2 − \tfrac16 n ≈ G_n, in agreement with the exact count (3.9) for Gaussian elimination, while for a tridiagonal matrix N_{LU}(n, 1) = 3n − 3 = O(n).
Exercise 3.10 (L1U and LU1) Show that the matrix A3 in Example 3.5 has
no LU1 or LDU factorization. Give an example of a matrix that has an LU1
factorization, but no LDU or L1U factorization.
Exercise 3.11 (LU of Nonsingular Matrix) Show that the following are equivalent for a nonsingular matrix A ∈ Cn×n .
1. A has an LDU factorization.
2. A has an L1U factorization.
3. A has an LU1 factorization.
Exercise 3.12 (Row Interchange) Show that A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} has a unique L1U factorization. Note that we have only interchanged rows in Example 3.5.
Exercise 3.13 (LU and Determinant) Suppose A has an L1U factorization A =
LU . Show that
det(A[k] ) = u11 u22 · · · ukk for k = 1, . . . , n.
²Hint (for Exercise 3.9): Consider the cases 2 ≤ k ≤ d and d + 1 ≤ k ≤ n separately.
Exercise 3.14 (Diagonal Elements in U) Suppose A ∈ C^{n×n} and A_{[k]} is nonsingular for k = 1, . . . , n − 1. Use Exercise 3.13 to show that the diagonal elements u_{kk} in the L1U factorization are
u_{11} = a_{11},  u_{kk} = \frac{\det(A_{[k]})}{\det(A_{[k-1]})},  for k = 2, . . . , n.   (3.31)
Exercise 3.15 (Proof of LDU Theorem) Give a proof of the LU theorem for the
LDU case.
Exercise 3.16 (Proof of LU1 Theorem) Give a proof of the LU theorem for the
LU1 case.
Exercise 3.17 (Computing the Inverse (Exam Exercise 1978-1)) Let A ∈ Rn×n
be nonsingular and with a unique L1U factorization A = LU . We partition L and
U as follows:
L = \begin{bmatrix} 1 & 0^T \\ l_1 & L_{2,2} \end{bmatrix}, \qquad U = \begin{bmatrix} u_{1,1} & u_1^T \\ 0 & U_{2,2} \end{bmatrix},   (3.32)
where L_{2,2}, U_{2,2} ∈ R^{(n−1)×(n−1)}. Define A_{2,2} := L_{2,2} U_{2,2} and B_{2,2} := A_{2,2}^{-1}.
a) Show that A^{-1} = B, where
B := \begin{bmatrix} (1 + u_1^T B_{2,2} l_1)/u_{1,1} & -u_1^T B_{2,2}/u_{1,1} \\ -B_{2,2} l_1 & B_{2,2} \end{bmatrix}.   (3.33)
b) Suppose that the elements l_{i,j}, i > j, in L and u_{i,j}, j ≥ i, in U are stored in A as the elements a_{i,j}. Write an algorithm that overwrites the elements of A with those of A^{-1}. Only one extra vector s ∈ R^n should be used.
Exercise 3.18 (Solving T H x = b (Exam Exercise 1981-3)) In this exercise we
consider nonsingular matrices T , H , S ∈ Rn×n with T = (tij ) upper triangular,
H = (h_{ij}) upper Hessenberg and S := TH. We assume that H has a unique LU1 factorization H = LU with ‖L‖_∞ ‖U‖_∞ ≤ K‖H‖_∞ for a constant K that is not too large. In this exercise the number of operations is the highest order term in the number of multiplications and divisions.
a) Give an algorithm which computes S from T and H without using the lower parts
(tij , i > j ) of T and (hij , i > j + 1) of H . In what order should the elements
in S be computed if S overwrites the elements in H ? What is the number of
operations of the algorithm?
b) Show that L is upper Hessenberg.
c) Give a detailed algorithm for finding the LU1-factorization of H stored in H .
Determine the number of operations in the algorithm.
d) Given b ∈ R^n and T, H as before. Suppose S and the LU1-factorization are not computed. We want to find x ∈ R^n such that Sx = b. We have the following two methods:
Method 1:
1. S = T H
2. Solve Sx = b
Method 2:
1. Solve T z = b
2. Solve H x = z
What method would you prefer? Give reasons for your answer.
Exercise 3.19 (L1U Factorization Update (Exam Exercise 1983-1)) Let A ∈
Rn×n be nonsingular with columns a 1 , a 2 , . . . , a n . We assume that A has a unique
L1U factorization A = LU .
For a positive integer p ≤ n and b ∈ Rn we define
B := [a 1 , . . . , a p−1 , a p+1 , . . . , a n , b] ∈ Rn×n .
a) Show that H := L^{-1}B is upper Hessenberg. We assume that H has a unique L1U factorization H = L_H U_H.
b) Approximately how many multiplications/divisions are required to find the L1U factorization of H?
c) Suppose we have found the L1U factorization H := LH U H of H . Explain how
we can find the L1U factorization of B from LH and U H .
Exercise 3.20 (U1L Factorization (Exam Exercise 1990-1)) We say that A ∈
Rn×n has a U1L factorization if A = U L for an upper triangular matrix U ∈ Rn×n
with ones on the diagonal and a lower triangular L ∈ Rn×n . A UL and the more
common LU factorization are analogous, but normally not the same.
a) Find a U1L factorization of the matrix
A := \begin{bmatrix} -3 & -2 \\ 4 & 2 \end{bmatrix}.
b) Let the columns of P ∈ Rn×n be the unit vectors in reverse order, i.e.,
P := [en , en−1 , . . . , e1 ].
Show that P^T = P and P^2 = I. What is the connection between the elements in A and PA?
c) Let B := PAP. Find integers r, s, depending on i, j, n, such that b_{i,j} = a_{r,s}.
d) Make a detailed algorithm which to a given A ∈ R^{n×n} determines B := PAP. The element b_{i,j} of B should be stored in position i, j in A. You should not use other matrices than A and a scalar w ∈ R.
e) Let PAP = MR be an L1U factorization of PAP, i.e., M is lower triangular with ones on the diagonal and R is upper triangular. Express the matrices U and L in a U1L factorization of A in terms of M, R and P.
f) Give necessary and sufficient conditions for a matrix to have a unique U1L factorization.
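The construction in parts b)-e) is easy to test numerically. The following MATLAB sketch uses a made-up example matrix; MATLAB's lu is used for the L1U factorization of PAP, and the third output is inspected only to confirm that no pivoting occurred for this particular matrix.

% Sketch: a U1L factorization of A obtained from an L1U factorization of P*A*P.
A = [1 2; 3 4]; n = size(A,1);      % made-up example
P = fliplr(eye(n));                 % P = [e_n, ..., e_1], P' = P, P^2 = I
[M,R,Pp] = lu(P*A*P);               % Pp is the identity here, so P*A*P = M*R
U = P*M*P;                          % unit upper triangular
L = P*R*P;                          % lower triangular
norm(A - U*L)                       % ~0, so A = U*L is a U1L factorization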
3.7.4 Exercises Sect. 3.6
Exercise 3.21 (Making Block LU into LU) Show that L̂ is unit lower triangular
and Û is upper triangular.
3.8 Review Questions
3.8.1 When is a triangular matrix nonsingular?
3.8.2 What is the general condition for Gaussian elimination without row interchanges to be well defined?
3.8.3 What is the content of the LU theorem?
3.8.4 Approximately how many arithmetic operations are needed for
• the multiplication of two square matrices?
• the LU factorization of a matrix?
• the solution of Ax = b, when A is triangular?
3.8.5 What is a PLU factorization? When does it exist?
3.8.6 What is complete pivoting?
Chapter 4
LDL* Factorization and Positive Definite Matrices
In this chapter we consider LU factorizations of Hermitian and positive definite
matrices. Recall that a matrix A ∈ C^{n×n} is Hermitian if A^* = A, i.e., a_{ji} = \overline{a_{ij}} for all i, j. A real Hermitian matrix is symmetric. Since a_{ii} = \overline{a_{ii}}, the diagonal elements of a Hermitian matrix must be real.
4.1 The LDL* Factorization
There are special versions of the LU factorization for Hermitian and positive definite matrices which take advantage of the special properties of such matrices. The most important ones are
1. the LDL* factorization which is an LDU factorization with U = L∗ and D a
diagonal matrix with real diagonal elements
2. the LL* factorization which is an LU factorization with U = L∗ and lii > 0
all i.
A matrix A having an LDL* factorization must be Hermitian since D is real, so that A^* = (LDL^*)^* = LD^*L^* = A. The LL* factorization is called a Cholesky factorization.
Example 4.1 (LDL* of 2 × 2 Hermitian Matrix) Let a, d ∈ R and b ∈ C. An LDL* factorization of a 2 × 2 Hermitian matrix must satisfy the equations
\begin{bmatrix} a & \bar b \\ b & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ l_1 & 1 \end{bmatrix} \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} \begin{bmatrix} 1 & \bar l_1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} d_1 & d_1 \bar l_1 \\ d_1 l_1 & d_1 |l_1|^2 + d_2 \end{bmatrix}
for the unknowns l_1 in L and d_1, d_2 in D. They are determined from
d_1 = a, \quad a l_1 = b, \quad d_2 = d - a|l_1|^2.   (4.1)
There are essentially three cases:
1. a ≠ 0: The matrix has a unique LDL* factorization. Note that d_1 and d_2 are real.
2. a = b = 0: The LDL* factorization exists, but it is not unique. Any value for l_1 can be used.
3. a = 0, b ≠ 0: No LDL* factorization exists.
Lemma 3.1 carries over to the Hermitian case.
Lemma 4.1 (LDL* of Leading Principal Sub Matrices) Suppose A = LDL∗
is an LDL* factorization of A ∈ Cn×n . For k = 1, . . . , n let A[k] ,L[k] and D [k]
be the leading principal submatrices of A,L and D, respectively. Then A[k] =
L[k] D [k] L∗[k] is an LDL* factorization of A[k] for k = 1, . . . , n.
Proof For k = 1, . . . , n − 1 we partition A = LDL^* as follows:
A = \begin{bmatrix} A_{[k]} & B_k^* \\ B_k & F_k \end{bmatrix} = \begin{bmatrix} L_{[k]} & 0 \\ M_k & N_k \end{bmatrix} \begin{bmatrix} D_{[k]} & 0 \\ 0 & E_k \end{bmatrix} \begin{bmatrix} L_{[k]}^* & M_k^* \\ 0 & N_k^* \end{bmatrix} = LDL^*,   (4.2)
where F_k, N_k, E_k ∈ C^{(n-k)×(n-k)}. Block multiplication gives A_{[k]} = L_{[k]} D_{[k]} L_{[k]}^*.
Since L[k] is unit lower triangular and D [k] is real and diagonal this is an LDL*
factorization of A[k] .
Theorem 4.1 (LDL* Theorem) The matrix A ∈ Cn×n has a unique LDL*
factorization if and only if A = A∗ and A[k] is nonsingular for k = 1, . . . , n − 1.
Proof We essentially repeat the proof of Theorem 3.4 incorporating the necessary
changes. Suppose A∗ = A and that A[k] is nonsingular for k = 1, . . . , n − 1.
Note that A∗[k] = A[k] for k = 1, . . . , n. We use induction on n to show that A
has a unique LDL* factorization. The result is clearly true for n = 1, since the
unique LDL* factorization of a 1-by-1 matrix is [a11] = [1][a11][1] and a11 is real
since A∗ = A. Suppose that A[n−1] has a unique LDL* factorization A[n−1] =
Ln−1 D n−1 L∗n−1 , and that A[1] , . . . , A[n−1] are nonsingular. By definition D n−1 is
real. Using block multiplication,
A = \begin{bmatrix} A_{[n-1]} & a_n \\ a_n^* & a_{nn} \end{bmatrix} = \begin{bmatrix} L_{n-1} & 0 \\ l_n^* & 1 \end{bmatrix} \begin{bmatrix} D_{n-1} & 0 \\ 0 & d_{nn} \end{bmatrix} \begin{bmatrix} L_{n-1}^* & l_n \\ 0^* & 1 \end{bmatrix} = \begin{bmatrix} L_{n-1} D_{n-1} L_{n-1}^* & L_{n-1} D_{n-1} l_n \\ l_n^* D_{n-1} L_{n-1}^* & l_n^* D_{n-1} l_n + d_{nn} \end{bmatrix}   (4.3)
if and only if A_{[n-1]} = L_{n-1} D_{n-1} L_{n-1}^*, and
a_n = L_{n-1} D_{n-1} l_n, \qquad a_{nn} = l_n^* D_{n-1} l_n + d_{nn}.   (4.4)
Thus we obtain an LDL* factorization of A that is unique since Ln−1 and D n−1 are
nonsingular. Also dnn is real since ann and D n−1 are real.
For the converse we use Lemma 4.1 in the same way as Lemma 3.1 was used to
prove Theorem 3.4.
Here is an analog of Algorithm 3.4 that tries to compute the LDL* factorization
of a d-banded matrix A with d ≥ 1. It uses the upper part of the matrix.
function [L,dg]=LDLs(A,d)
% [L,dg]=LDLs(A,d)
n=length(A);
L=eye(n,n); dg=zeros(n,1);dg(1)=A(1,1);
for k=2:n
m=rforwardsolve(L(1:k-1,1:k-1),A(1:k-1,k),d);
L(k,1:k-1)=m./dg(1:k-1);
dg(k)=A(k,k)-L(k,1:k-1)*m;
end
Listing 4.1 LDLs
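A possible usage sketch (it assumes the function rforwardsolve from Chapter 3 is on the path, since LDLs calls it):

% Usage sketch for LDLs on the tridiagonal second derivative matrix.
n = 6;
T = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
[L,dg] = LDLs(T,1);                 % d = 1 since T is tridiagonal
norm(T - L*diag(dg)*L')             % of the order of rounding error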
The number of arithmetic operations for the LDL* factorization is approximately \tfrac12 G_n, half the number of operations needed for the LU factorization. Indeed, in the L1U factorization we needed to solve two triangular systems to find the vectors s and m, while only one such system is needed to find m in the Hermitian case (4.3). The work to find d_{nn} is O(n) and does not contribute to the highest order term.
Example 4.2 (A Factorization) Is the factorization
\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1/3 & 1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 8/3 \end{bmatrix} \begin{bmatrix} 1 & 1/3 \\ 0 & 1 \end{bmatrix}
an LDL* factorization?
4.2 Positive Definite and Semidefinite Matrices
Given A ∈ C^{n×n}, the function f : C^n → R given by
f(x) = x^* A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} \bar x_i x_j
is called a quadratic form. Note that f is real valued if A is Hermitian. Indeed,
\overline{f(x)} = \overline{x^* A x} = (x^* A x)^* = x^* A^* x = x^* A x = f(x).
Definition 4.1 (Positive Definite Matrix) We say that a matrix A ∈ Cn×n is
(i) positive definite if A∗ = A and x ∗ Ax > 0 for all nonzero x ∈ Cn ;
(ii) positive semidefinite if A∗ = A and x ∗ Ax ≥ 0 for all x ∈ Cn ;
(iii) negative (semi)definite if −A is positive (semi)definite.
We observe that
1. The zero-matrix is positive semidefinite, while the unit matrix is positive definite.
2. The matrix A is positive definite if and only if it is positive semidefinite and
x ∗ Ax = 0 ⇒ x = 0.
3. A positive definite matrix A is nonsingular. For if Ax = 0 then x ∗ Ax = 0 and
this implies that x = 0.
4. It follows from Lemma 4.6 that a nonsingular positive semidefinite matrix is
positive definite.
5. If A is real then it is enough to show definiteness for real vectors only. Indeed, if
A ∈ Rn×n , AT = A and x T Ax > 0 for all nonzero x ∈ Rn then z∗ Az > 0 for
all nonzero z ∈ Cn . For if z = x + iy = 0 with x, y ∈ Rn then
z^* A z = (x - iy)^T A (x + iy) = x^T A x - i y^T A x + i x^T A y - i^2 y^T A y = x^T A x + y^T A y,
and this is positive since at least one of the real vectors x, y is nonzero.
Example 4.3 (Gradient and Hessian) Symmetric positive definite matrices are important in nonlinear optimization. Consider (cf. (16.1)) the gradient ∇f and Hessian Hf of a function f : Ω ⊂ R^n → R,
∇f(x) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{bmatrix} ∈ R^n, \qquad Hf(x) = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_n} \end{bmatrix} ∈ R^{n×n}.
We assume that f has continuous first and second order partial derivatives on Ω.
Under suitable conditions on the domain Ω it is shown in advanced calculus texts
that if ∇f (x) = 0 and Hf (x) is positive definite then x is a local minimum for f .
This can be shown using the second-order Taylor expansion (16.2). Moreover, x is
a local maximum if ∇f (x) = 0 and Hf (x) is negative definite.
Lemma 4.2 (The Matrix A∗ A) The matrix A∗ A is positive semidefinite for any
m, n ∈ N and A ∈ Cm×n . It is positive definite if and only if A has linearly
independent columns or equivalently rank n.
Proof Clearly A^*A is Hermitian. Let x ∈ C^n and set z := Ax. By the definition (1.11) of the Euclidean norm we have x^* A^* A x = z^* z = ‖z‖_2^2 = ‖Ax‖_2^2 ≥ 0, with equality if and only if Ax = 0. It follows that A^*A is positive semidefinite, and positive definite if and only if A has linearly independent columns. But this is equivalent to A having rank n (cf. Definition 1.6).
Lemma 4.3 (T Is Positive Definite) The second derivative matrix T = tridiag(−1, 2, −1) ∈ R^{n×n} is positive definite.
Proof Clearly T is symmetric. For any x ∈ R^n,
x^T T x = 2\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n-1} x_i x_{i+1} - \sum_{i=2}^{n} x_{i-1} x_i
        = \sum_{i=1}^{n-1} x_i^2 - 2\sum_{i=1}^{n-1} x_i x_{i+1} + \sum_{i=1}^{n-1} x_{i+1}^2 + x_1^2 + x_n^2
        = x_1^2 + x_n^2 + \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2.
Thus x T T x ≥ 0 and if x T T x = 0 then x1 = xn = 0 and xi = xi+1 for i =
1, . . . , n − 1 which implies that x = 0. Hence T is positive definite.
4.2.1 The Cholesky Factorization
Recall that a principal submatrix B = A(r, r) ∈ Ck×k of a matrix A ∈ Cn×n has
elements bi,j = ari ,rj for i, j = 1, . . . , k, where 1 ≤ r1 < · · · < rk ≤ n. It is a
leading principal submatrix, denoted A[k] if r = [1, 2, . . . , k]T . We have
A(r, r) = X∗ AX,
X := [er1 , . . . , erk ] ∈ Cn×k .
(4.5)
Lemma 4.4 (Submatrices) Any principal submatrix of a positive (semi)definite
matrix is positive (semi)definite.
Proof Let X and B := A(r, r) be given by (4.5). If A is positive semidefinite then
B is positive semidefinite since
y ∗ By = y ∗ X∗ AXy = x ∗ Ax ≥ 0,
y ∈ Ck ,
x := Xy.
(4.6)
Suppose A is positive definite and y ∗ By = 0. By (4.6) we have x = 0 and since
X has linearly independent columns it follows that y = 0. We conclude that B is
positive definite.
Theorem 4.2 (LDL* and LL*) The following statements are equivalent for a matrix A ∈ C^{n×n}:
1. A is positive definite,
2. A has an LDL* factorization with positive diagonal elements in D,
3. A has a Cholesky factorization.
If the Cholesky factorization exists it is unique.
Proof Recall that A−∗ := (A−1 )∗ = (A∗ )−1 .
We show that 1 ⇒ 2 ⇒ 3 ⇒ 1.
1 ⇒ 2: Suppose A is positive definite. By Lemma 4.4 the leading principal
submatrices A[k] ∈ Ck×k are positive definite and therefore nonsingular for
k = 1, . . . , n − 1. Since A is Hermitian it has by Theorem 4.1 a unique
LDL* factorization A = LDL∗ . To show that the ith diagonal element in D
is positive we note that x i := L−∗ ei is nonzero since L−∗ is nonsingular. But
then dii = e∗i Dei = e∗i L−1 AL−∗ ei = x ∗i Ax i > 0 since A is positive definite.
2 ⇒ 3: Suppose A has an LDL* factorization A = LDL^* with positive diagonal elements d_{ii} in D. Then A = SS^*, where S := LD^{1/2} and D^{1/2} := diag(\sqrt{d_{11}}, \ldots, \sqrt{d_{nn}}), and this is a Cholesky factorization of A.
3 ⇒ 1: Suppose A has a Cholesky factorization A = LL∗ . Clearly A∗ = A.
Since L has positive diagonal elements it is nonsingular and A is positive definite
by Lemma 4.2.
For uniqueness suppose LL^* = SS^* are two Cholesky factorizations of the positive definite matrix A. Since A is nonsingular both L and S are nonsingular. Then S^{-1}L = S^*L^{-*}, where by Lemma 2.5 S^{-1}L is lower triangular and S^*L^{-*} is upper triangular, with diagonal elements l_{ii}/s_{ii} and s_{ii}/l_{ii}, respectively. But then both matrices must be equal to the same diagonal matrix and l_{ii}^2 = s_{ii}^2. By positivity l_{ii} = s_{ii} and we conclude that S^{-1}L = I = S^*L^{-*}, which means that L = S.
A Cholesky factorization can also be written in the equivalent form A = R^*R, where R = L^* is upper triangular with positive diagonal elements.
Example 4.4 (2 × 2) The matrix A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} has an LDL* and a Cholesky factorization given by
\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\tfrac12 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & \tfrac32 \end{bmatrix} \begin{bmatrix} 1 & -\tfrac12 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 \\ -1/\sqrt{2} & \sqrt{3/2} \end{bmatrix} \begin{bmatrix} \sqrt{2} & -1/\sqrt{2} \\ 0 & \sqrt{3/2} \end{bmatrix}.
There are many good algorithms for finding the Cholesky factorization of a
matrix, see [3]. The following version for finding the factorization of a matrix A
with bandwidth d ≥ 1 uses the LDL* factorization Algorithm 4.1. Only the upper
part of A is used. The algorithm uses the MATLAB command diag.
function L=bandcholesky(A,d)
%L=bandcholesky(A,d)
[L,dg]=LDLs(A,d);
L=L*diag(sqrt(dg));
end
Listing 4.2 bandcholesky
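A small usage sketch; MATLAB's chol is used only as an independent check and returns the upper triangular factor R with A = R^*R, so R^* should agree with L:

% Usage sketch for bandcholesky, checked against chol.
n = 6;
T = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
L = bandcholesky(T,1);
norm(T - L*L')                      % ~0
norm(L - chol(T)')                  % ~0 by uniqueness of the Cholesky factor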
As for the LDL* factorization the leading term in an operation count for a band matrix is O(d^2 n). When d is small this is a considerable saving compared to the count \tfrac12 G_n = n^3/3 for a full matrix.
4.2.2 Positive Definite and Positive Semidefinite Criteria
Not all Hermitian matrices are positive definite, and sometimes we can tell just by
glancing at the matrix that it cannot be positive definite. Here are some necessary
conditions.
Theorem 4.3 (Necessary Conditions for Positive (Semi)Definiteness) If A ∈
C^{n×n} is positive (semi)definite then for all i, j with i ≠ j:
1. a_{ii} > 0   (a_{ii} ≥ 0),
2. |Re(a_{ij})| < (a_{ii} + a_{jj})/2   (|Re(a_{ij})| ≤ (a_{ii} + a_{jj})/2),
3. |a_{ij}| < \sqrt{a_{ii} a_{jj}}   (|a_{ij}| ≤ \sqrt{a_{ii} a_{jj}}),
4. if A is positive semidefinite and a_{ii} = 0 for some i then a_{ij} = a_{ji} = 0 for j = 1, . . . , n.
Proof Clearly a_{ii} = e_i^T A e_i > (≥) 0 and Part 1 follows. If α, β ∈ C and αe_i + βe_j ≠ 0 then
0 < (≤) (αe_i + βe_j)^* A (αe_i + βe_j) = |α|^2 a_{ii} + |β|^2 a_{jj} + 2\mathrm{Re}(\bar α β a_{ij}).   (4.7)
Taking α = 1, β = ±1 we obtain a_{ii} + a_{jj} ± 2\mathrm{Re}(a_{ij}) > 0 and this implies Part 2. We first show Part 3 when A is positive definite. Taking α = -a_{ij}, β = a_{ii} in (4.7) we find
0 < |a_{ij}|^2 a_{ii} + a_{ii}^2 a_{jj} - 2|a_{ij}|^2 a_{ii} = a_{ii}(a_{ii} a_{jj} - |a_{ij}|^2).
Since aii > 0 Part 3 follows in the positive definite case.
Suppose now A is positive semidefinite. For ε > 0 we define B := A + εI. The matrix B is positive definite since it is Hermitian and x^* B x ≥ ε‖x‖_2^2 > 0 for any nonzero x ∈ C^n. From what we have shown,
|a_{ij}| = |b_{ij}| < \sqrt{b_{ii} b_{jj}} = \sqrt{(a_{ii} + ε)(a_{jj} + ε)}, \qquad i ≠ j.
Since ε > 0 is arbitrary Part 3 follows in the semidefinite case. Since A is Hermitian
Part 3 implies Part 4.
Example 4.5 (Not Positive Definite) Consider the matrices
A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 2 \\ 2 & 2 \end{bmatrix}, \quad A_3 = \begin{bmatrix} -2 & 1 \\ 1 & 2 \end{bmatrix}.
Here A_1 and A_3 are not positive definite, since a diagonal element is not positive. A_2 is not positive definite since neither Part 2 nor Part 3 in Theorem 4.3 is satisfied.
The matrix \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} enjoys all the necessary conditions in Theorem 4.3. But to decide if it is positive definite it is nice to have sufficient conditions as well.
We start by considering eigenvalues of a positive (semi)definite matrix.
Lemma 4.5 (Positive Eigenvalues) A matrix is positive (semi)definite if and only
if it is Hermitian and all its eigenvalues are positive (nonnegative).
Proof Suppose A is positive (semi)definite. Then A is Hermitian by definition, and
if Ax = λx and x is nonzero, then x^* A x = λ x^* x. This implies that λ > 0 (≥ 0) since A is positive (semi)definite and x^* x = ‖x‖_2^2 > 0. Conversely, suppose
A ∈ Cn×n is Hermitian with positive (nonnegative) eigenvalues λ1 , . . . , λn . By
Theorem 6.9 (the spectral theorem) there is a matrix U ∈ Cn×n with U ∗ U =
U U ∗ = I such that U ∗ AU = diag(λ1 , . . . , λn ). Let x ∈ Cn and define z :=
U ∗ x = [z1 , . . . , zn ]T ∈ Cn . Then x = U U ∗ x = U z and by the spectral theorem
x^* A x = z^* U^* A U z = z^* \mathrm{diag}(λ_1, \ldots, λ_n) z = \sum_{j=1}^{n} λ_j |z_j|^2 ≥ 0.
It follows that A is positive semidefinite. Since U ∗ is nonsingular we see that z =
U ∗ x is nonzero if x is nonzero, and therefore A is positive definite.
Lemma 4.6 (Positive Semidefinite and Nonsingular) A matrix is positive definite
if and only if it is positive semidefinite and nonsingular.
Proof If A is positive definite then it is positive semidefinite and if Ax = 0 then
x ∗ Ax = 0 which implies that x = 0. Conversely, if A is positive semidefinite then
it is Hermitian with nonnegative eigenvalues (cf. Lemma 4.5). If it is nonsingular all
eigenvalues are positive (cf. Theorem 1.11), and it follows from Lemma 4.5 that A
is positive definite.
The following necessary and sufficient conditions can be used to decide if a
matrix is positive definite.
Theorem 4.4 (Positive Definite Characterization) The following statements are
equivalent for a matrix A ∈ Cn×n .
1. A is positive definite.
2. A is Hermitian with only positive eigenvalues.
3. A is Hermitian and all leading principal submatrices have a positive determinant.
4. A = BB ∗ for a nonsingular B ∈ Cn×n .
Proof
1 ⇐⇒ 2: This follows from Lemma 4.5.
1 ⇒ 3: A positive definite matrix has positive eigenvalues, and since the
determinant of a matrix equals the product of its eigenvalues (cf. Theorem 1.10)
the determinant is positive. Every leading principal submatrix of a positive
definite matrix is positive definite (cf. Lemma 4.4) and therefore has a positive
determinant.
3 ⇒ 4: Since a leading principal submatrix has a positive determinant it is
nonsingular and Theorem 4.1 implies that A has a unique LDL* factorization
and by Theorem 4.2 a unique Cholesky factorization A = BB ∗ with B = L.
4 ⇒ 1: This follows from Lemma 4.2.
Example 4.6 (Positive Definite Characterization) Consider the symmetric matrix
A := \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}.
1. We have x T Ax = 2x12 + 2x22 + (x1 + x2 )2 > 0 for all nonzero x showing that A
is positive definite.
2. The eigenvalues of A are λ1 = 2 and λ2 = 4. They are positive showing that A
is positive definite since it is symmetric.
3. We find det(A[1] ) = 3 and det(A[2] ) = 8 showing again that A is positive definite
since it is also symmetric.
4. Finally A is positive definite since by Example 4.2 we have
A = BB^*, \qquad B = \begin{bmatrix} 1 & 0 \\ 1/3 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{8/3} \end{bmatrix}.
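The four characterizations are easy to check numerically for this matrix (a quick sketch; eig, det and chol are used as black boxes):

% Sketch: numerical check of Theorem 4.4 for A = [3 1; 1 3].
A = [3 1; 1 3];
x = randn(2,1); x'*A*x              % 1: positive for any nonzero x
eig(A)                              % 2: eigenvalues 2 and 4, both positive
[det(A(1,1)), det(A)]               % 3: leading principal minors 3 and 8
B = chol(A)';                       % 4: A = B*B' with B nonsingular
norm(A - B*B')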
4.3 Semi-Cholesky Factorization of a Banded Matrix
A positive semidefinite matrix has a factorization that is similar to the Cholesky
factorization.
Definition 4.2 (Semi-Cholesky Factorization) A factorization A = LL∗ of A ∈
Cn×n , where L is lower triangular with nonnegative diagonal elements is called a
semi-Cholesky factorization.
Note that a semi-Cholesky factorization of a positive definite matrix is necessarily a Cholesky factorization. For if A is positive definite then it is nonsingular and
then L must be nonsingular. Thus the diagonal elements of L cannot be zero.
Theorem 4.5 (Characterization, Semi-Cholesky Factorization) A matrix A ∈
Cn×n has a semi-Cholesky factorization A = LL∗ if and only if it is positive
semidefinite.
Proof If A = LL^* is a semi-Cholesky factorization then A is Hermitian. Moreover, x^* A x = ‖L^* x‖_2^2 ≥ 0 and A is positive semidefinite. For the converse we use
induction on n. A positive semidefinite matrix of order one has a semi-Cholesky
factorization since the only element in A is nonnegative. Suppose any positive
semidefinite matrix of order n − 1 has a semi-Cholesky factorization and suppose
A ∈ C^{n×n} is positive semidefinite. We partition A as follows:
A = \begin{bmatrix} α & v^* \\ v & B \end{bmatrix}, \qquad α ∈ C, \; v ∈ C^{n-1}, \; B ∈ C^{(n-1)×(n-1)}.   (4.8)
There are two cases. Suppose first α = e∗1 Ae1 > 0. We claim that C := B −
vv ∗ /α is positive semidefinite. C is Hermitian since B is. To show that C is positive
semidefinite we consider any y ∈ Cn−1 and define x ∗ := [−y ∗ v/α, y ∗ ] ∈ Cn . Then
0 ≤ x^* A x = \begin{bmatrix} -y^* v/α & y^* \end{bmatrix} \begin{bmatrix} α & v^* \\ v & B \end{bmatrix} \begin{bmatrix} -v^* y/α \\ y \end{bmatrix}
 = \begin{bmatrix} 0 & -(y^* v) v^*/α + y^* B \end{bmatrix} \begin{bmatrix} -v^* y/α \\ y \end{bmatrix}   (4.9)
 = -y^* v v^* y/α + y^* B y = y^* C y.
So C ∈ C(n−1)×(n−1) is positive semidefinite and by the induction hypothesis it has
a semi-Cholesky factorization C = L_1 L_1^*. The matrix
L^* := \begin{bmatrix} β & v^*/β \\ 0 & L_1^* \end{bmatrix}, \qquad β := \sqrt{α},   (4.10)
is upper triangular with nonnegative diagonal elements and
LL^* = \begin{bmatrix} β & 0 \\ v/β & L_1 \end{bmatrix} \begin{bmatrix} β & v^*/β \\ 0 & L_1^* \end{bmatrix} = \begin{bmatrix} α & v^* \\ v & B \end{bmatrix} = A
is a semi-Cholesky factorization of A.
If α = 0 then Part 4 of Theorem 4.3 implies that v = 0. Moreover, B ∈ C^{(n-1)×(n-1)} in (4.8) is positive semidefinite and therefore has a semi-Cholesky factorization B = L_1 L_1^*. But then A = LL^*, where L = \begin{bmatrix} 0 & 0^* \\ 0 & L_1 \end{bmatrix}, is a semi-Cholesky factorization of A. Indeed, L is lower triangular and
LL^* = \begin{bmatrix} 0 & 0^* \\ 0 & L_1 \end{bmatrix} \begin{bmatrix} 0 & 0^* \\ 0 & L_1^* \end{bmatrix} = \begin{bmatrix} 0 & 0^* \\ 0 & B \end{bmatrix} = A.
Recall that a matrix A is d-banded if aij = 0 for |i − j | > d. A (semi-) Cholesky
factorization preserves bandwidth.
Theorem 4.6 (Bandwidth Semi-Cholesky Factor) The semi-Cholesky factor L
given by (4.10) has the same bandwidth as A.
Proof Suppose A ∈ Cn×n is d-banded. Then v ∗ = [u∗ , 0∗ ] in (4.8), where u ∈ Cd ,
and therefore C := B − vv ∗ /α differs from B only in the upper left d × d corner. It
follows that C has the same bandwidth as B and A. By induction on n, C = L1 L∗1 ,
where L∗1 has the same bandwidth as C. But then L in (4.10) has the same bandwidth
as A.
Consider now implementing an algorithm based on the previous discussion.
Since A is Hermitian we only need to use the lower part of A. The first column
of L is [β, v ∗ /β]∗ if α > 0. If α = 0 then by 4 in Theorem 4.3 the first column of
A is zero and this is also the first column of L. We obtain
if A(1,1) > 0
    A(1,1) = sqrt(A(1,1))
    A(2:n,1) = A(2:n,1)/A(1,1)                    (4.11)
    for j = 2:n
        A(j:n,j) = A(j:n,j) - A(j,1)*A(j:n,1)
Here we store the first column of L in the first column of A and the lower part of
C = B − vv ∗ /α in the lower part of A(2 : n, 2 : n).
The code can be made more efficient when A is a d-banded matrix. We simply replace all occurrences of n by min(i + d, n). Continuing the reduction we arrive at the following algorithm, which takes a d-banded positive semidefinite A and d ≥ 1 as input, and returns a lower triangular matrix L so that A = LL^*. This is the Cholesky factorization of A if A is positive definite and a semi-Cholesky factorization of A otherwise. The algorithm uses the MATLAB command tril:
function L=bandsemicholeskyL(A,d)
%L=bandsemicholeskyL(A,d)
n=length(A);
for k=1:n
kp=min(n,k+d);
if A(k,k)>0
A(k,k)=sqrt(A(k,k));
A((k+1):kp,k)=A((k+1):kp,k)/A(k,k);
for j=k+1:kp
A(j:kp,j)=A(j:kp,j)-A(j,k)*A(j:kp,k);
end
else
A(k:kp,k)=zeros(kp-k+1,1);
end
end
L=tril(A);
end
Listing 4.3 bandsemicholeskyL
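A usage sketch on a singular positive semidefinite matrix (the example is made up):

% Usage sketch: semi-Cholesky factorization of a singular positive
% semidefinite matrix with bandwidth 1.
A = [1 1 0; 1 1 0; 0 0 2];
L = bandsemicholeskyL(A,1);
norm(A - L*L')                      % ~0; note that L(2,2) = 0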
In the algorithm we overwrite the lower triangle of A with the elements of L. Column k of L is zero for those k where l_{kk} = 0. We reduce round-off noise by forcing those entries to be zero. In the semidefinite case no update is necessary and we “do nothing”.
Deciding when a diagonal element is zero can be a problem in floating point
arithmetic.
We end the section with some necessary and sufficient conditions for a matrix to
be positive semidefinite.
Theorem 4.7 (Positive Semidefinite Characterization) The following statements are equivalent for a matrix A ∈ C^{n×n}:
1. A is positive semidefinite.
2. A is Hermitian with only nonnegative eigenvalues.
3. A is Hermitian and all principal submatrices have a nonnegative determinant.
4. A = BB^* for some B ∈ C^{n×n}.
Proof
1 ⇐⇒ 2: This follows from Lemma 4.5.
1 ⇐⇒ 4: This follows from Theorem 4.5
1 ⇐⇒ 3: We refer to page 567 of [15], where it is shown that 4 ⇒ 3 (and
therefore 1 ⇒ 3 since 1 ⇐⇒ 4) and 3 ⇒ 1.
Example 4.7 (Positive Semidefinite Characterization) Consider the symmetric matrix A := \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.
1. We have x ∗ Ax = x12 + x22 + x1 x2 + x2 x1 = (x1 + x2 )2 ≥ 0 for all x ∈ R2
showing that A is positive semidefinite.
2. The eigenvalues of A are λ1 = 2 and λ2 = 0 and they are nonnegative showing
that A is positive semidefinite since it is symmetric.
3. There are three principal submatrices. They have determinants det([a_{11}]) = 1, det([a_{22}]) = 1 and det(A) = 0, showing again that A is positive semidefinite.
4. Finally A is positive semidefinite since A = BB^*, where B = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}.
In Part 3 of Theorem 4.7 we require nonnegativity of all principal minors, while only positivity of the leading principal minors was required for positive definite matrices (cf. Theorem 4.4). To see that nonnegativity of the leading principal minors is not enough, consider the matrix A := \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}. The leading principal minors are nonnegative, but A is not positive semidefinite.
4.4 The Non-symmetric Real Case
In this section we say that a matrix A ∈ Rn×n is positive semidefinite if x ∗ Ax ≥ 0
for all x ∈ Rn and positive definite if x ∗ Ax > 0 for all nonzero x ∈ Rn . Thus
we do not require A to be symmetric. This means that some of the eigenvalues can
be complex (cf. Example 4.8). Note that a non-symmetric positive definite matrix is nonsingular, but in Exercise 4.3 you can show that the converse is not true.
We have the following theorem.
Theorem 4.8 (The Non-symmetric Case) Suppose A ∈ Rn×n is positive definite.
Then the following holds true.
1. Every principal submatrix of A is positive definite,
2. A has a unique LU factorization,
3. the real eigenvalues of A are positive,
4. det(A) > 0,
5. a_{ii} a_{jj} > a_{ij} a_{ji} for i ≠ j.
Proof
1. The proof is the same as for Lemma 4.4.
2. Since all leading submatrices are positive definite they are nonsingular and the
result follows from the LU Theorem 3.4.
3. Suppose (λ, x) is an eigenpair of A and that λ is real. Since A is real we can choose x to be real. Multiplying Ax = λx by x^T and solving for λ we find λ = \frac{x^T A x}{x^T x} > 0.
4. The determinant of A equals the product of its eigenvalues. The eigenvalues are either real and positive or occur in complex conjugate pairs. The product of two nonzero complex conjugate numbers is positive.
5. The principal submatrix \begin{bmatrix} a_{ii} & a_{ij} \\ a_{ji} & a_{jj} \end{bmatrix} has a positive determinant.
Example 4.8 (2 × 2 Positive Definite) A non-symmetric positive definite matrix can have complex eigenvalues. The family of matrices
A[a] := \begin{bmatrix} 2 & 2-a \\ a & 1 \end{bmatrix}, \qquad a ∈ R,
is positive definite for any a ∈ R. Indeed, for any nonzero x ∈ R^2,
x^T A x = 2x_1^2 + (2-a)x_1 x_2 + a x_2 x_1 + x_2^2 = x_1^2 + (x_1 + x_2)^2 > 0.
The eigenvalues of A[a] are positive for a ∈ [1 - \tfrac{\sqrt{5}}{2}, 1 + \tfrac{\sqrt{5}}{2}] and complex for other values of a.
4.5 Exercises Chap. 4
4.5.1 Exercises Sect. 4.2
Exercise 4.1 (Positive Definite Characterizations) Show directly that all four characterizations in Theorem 4.4 hold for the matrix \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.
Exercise 4.2 (L1U Factorization (Exam 1982-1)) Find the L1U factorization of the following matrix A ∈ R^{n×n}:
A = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}.
Is A positive definite?
Exercise 4.3 (A Counterexample) In the non-symmetric case a nonsingular positive semidefinite matrix is not necessarily positive definite. Show this by considering the matrix A := \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}.
Exercise 4.4 (Cholesky Update (Exam Exercise 2015-2))
a) Let E ∈ Rn×n be of the form E = I + uuT , where u ∈ Rn . Show that E is
symmetric and positive definite, and find an expression for E −1 .1
b) Let A ∈ Rn×n be of the form A = B + uuT , where B ∈ Rn×n is symmetric and
positive definite, and u ∈ Rn . Show that A can be decomposed as
A = L(I + vv T )LT ,
where L is nonsingular and lower triangular, and v ∈ Rn .
c) Assume that the Cholesky decomposition of B is already computed. Outline a
procedure to solve the system Ax = b, where A is of the form above.
Exercise 4.5 (Cholesky Update (Exam Exercise 2016-2)) Let A ∈ Rn×n be a
symmetric positive definite matrix with a known Cholesky factorization A = LLT .
Furthermore, let A+ be a corresponding (n + 1) × (n + 1) matrix of the form
A_+ = \begin{bmatrix} A & a \\ a^T & α \end{bmatrix},
where a is a vector in Rn , and α is a real number. We assume that the matrix A+ is
symmetric positive definite.
a) Show that if A_+ = L_+ L_+^T is the Cholesky factorization of A_+, then L_+ is of the form
L_+ = \begin{bmatrix} L & 0 \\ y^T & λ \end{bmatrix},
i.e., that the leading principal n × n submatrix of L_+ is L.
b) Explain why α > ‖L^{-1} a‖_2^2.
c) Explain how you can compute L+ when L is known.
4.6 Review Questions
4.6.1 What is the content of the LDL* theorem?
4.6.2 Is A^*A always positive definite?
¹Hint (for Exercise 4.4): The matrix E^{-1} is of the form E^{-1} = I + a u u^T for some a ∈ R.
4.6.3 Is the matrix \begin{bmatrix} 10 & 4 & 3 \\ 4 & 0 & 2 \\ 3 & 2 & 5 \end{bmatrix} positive definite?
4.6.4 What class of matrices has a Cholesky factorization?
4.6.5 What is the bandwidth of the Cholesky factor of a band matrix?
4.6.6 For a symmetric matrix give 3 conditions that are equivalent to positive definiteness.
4.6.7 What class of matrices has a semi-Cholesky factorization?
Chapter 5
Orthonormal and Unitary Transformations
In Gaussian elimination and LU factorization we solve a linear system by transforming it to triangular form. These are not the only kinds of transformations that can be used for such a task. Matrices with orthonormal columns, called unitary matrices, can be used to reduce a square matrix to upper triangular form and, more generally, a rectangular matrix to upper triangular (also called upper trapezoidal) form. This leads to a decomposition of a rectangular matrix known as a QR decomposition and a reduced form which we refer to as a QR factorization. The QR decomposition and factorization will be used in later chapters to solve least squares and eigenvalue problems.
Unitary transformations have the advantage that they preserve the Euclidean norm of a vector. This means that when a unitary transformation is applied to an inaccurate vector the error will not grow. Thus a unitary transformation is said to be numerically stable. We consider two classes of unitary transformations known as Householder and Givens transformations, respectively.
5.1 Inner Products, Orthogonality and Unitary Matrices
An inner product or scalar product in a vector space is a function mapping pairs
of vectors into a scalar.
5.1.1 Real and Complex Inner Products
Definition 5.1 (Inner Product) An inner product in a complex vector space V is
a function V × V → C satisfying for all x, y, z ∈ V and all a, b ∈ C the following
conditions:
1. ⟨x, x⟩ ≥ 0 with equality if and only if x = 0.   (positivity)
2. ⟨x, y⟩ = \overline{⟨y, x⟩}.   (skew symmetry)
3. ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩.   (linearity)
The pair (V, ·, ·) is called an inner product space.
Note the complex conjugate in 2. Since
⟨x, ay + bz⟩ = \overline{⟨ay + bz, x⟩} = \overline{a⟨y, x⟩ + b⟨z, x⟩} = \bar a \overline{⟨y, x⟩} + \bar b \overline{⟨z, x⟩},
we find
⟨x, ay + bz⟩ = \bar a ⟨x, y⟩ + \bar b ⟨x, z⟩, \qquad ⟨ax, ay⟩ = |a|^2 ⟨x, y⟩.   (5.1)
An inner product in a real vector space V is a real valued function satisfying Properties 1, 2, 3 in Definition 5.1, where we can replace skew symmetry by symmetry:
⟨x, y⟩ = ⟨y, x⟩   (symmetry).
In the real case we have linearity in both variables since we can remove the complex conjugates in (5.1).
Recall that (cf. (1.10)) the standard inner product in C^n is given by
⟨x, y⟩ := y^* x = x^T \bar y = \sum_{j=1}^{n} x_j \bar y_j.
Note the complex conjugate on y. It is clearly an inner product in C^n.
The function
‖·‖ : V → R, \qquad x ↦ ‖x‖ := \sqrt{⟨x, x⟩}   (5.2)
is called the inner product norm. The inner product norm for the standard inner product is the Euclidean norm ‖x‖ = ‖x‖_2 = \sqrt{x^* x}.
The following inequality holds for any inner product.
Theorem 5.1 (Cauchy-Schwarz Inequality) For any x, y in a real or complex inner product space,
|⟨x, y⟩| ≤ ‖x‖ ‖y‖,   (5.3)
with equality if and only if x and y are linearly dependent.
Proof If y = 0 then 0x + y = 0 and x and y are linearly dependent. Moreover the inequality holds with equality since ⟨x, y⟩ = ⟨x, 0y⟩ = \bar 0 ⟨x, y⟩ = 0 and ‖y‖ = 0. So assume y ≠ 0. Define
z := x - ay, \qquad a := \frac{⟨x, y⟩}{⟨y, y⟩}.
By linearity ⟨z, y⟩ = ⟨x, y⟩ - a⟨y, y⟩ = 0, so that by 2. and (5.1)
⟨ay, z⟩ + ⟨z, ay⟩ = a\overline{⟨z, y⟩} + \bar a ⟨z, y⟩ = 0.   (5.4)
But then
‖x‖^2 = ⟨x, x⟩ = ⟨z + ay, z + ay⟩ \overset{(5.4)}{=} ⟨z, z⟩ + ⟨ay, ay⟩ \overset{(5.1)}{=} ‖z‖^2 + |a|^2 ‖y‖^2 ≥ |a|^2 ‖y‖^2 = \frac{|⟨x, y⟩|^2}{‖y‖^2}.
Multiplying by ‖y‖^2 gives (5.3). We have equality if and only if z = 0, which means that x and y are linearly dependent.
Theorem 5.2 (Inner Product Norm) For all x, y in an inner product space and all a in C we have
1. ‖x‖ ≥ 0 with equality if and only if x = 0.   (positivity)
2. ‖ax‖ = |a| ‖x‖.   (homogeneity)
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖,   (subadditivity)
where ‖x‖ := \sqrt{⟨x, x⟩}.
In general a function ‖·‖ : C^n → R that satisfies these three properties is called a vector norm. A class of vector norms called p-norms will be studied in Chap. 8.
Proof The first statement is an immediate consequence of positivity, while the second one follows from (5.1). Expanding ‖x + ay‖^2 = ⟨x + ay, x + ay⟩ using (5.1) we obtain
‖x + ay‖^2 = ‖x‖^2 + a⟨y, x⟩ + \bar a⟨x, y⟩ + |a|^2 ‖y‖^2, \qquad a ∈ C, \; x, y ∈ V.   (5.5)
Now (5.5) with a = 1 and the Cauchy-Schwarz inequality implies
‖x + y‖^2 ≤ ‖x‖^2 + 2‖x‖‖y‖ + ‖y‖^2 = (‖x‖ + ‖y‖)^2.
Taking square roots completes the proof.
In the real case the Cauchy-Schwarz inequality implies that -1 ≤ ⟨x, y⟩/(‖x‖‖y‖) ≤ 1 for nonzero x and y, so there is a unique angle θ in [0, π] such that
\cos θ = \frac{⟨x, y⟩}{‖x‖‖y‖}.   (5.6)
This defines the angle between vectors in a real inner product space.
5.1.2 Orthogonality
Definition 5.2 (Orthogonality) Two vectors x, y in a real or complex inner product space are orthogonal or perpendicular, denoted x ⊥ y, if ⟨x, y⟩ = 0. The vectors are orthonormal if in addition ‖x‖ = ‖y‖ = 1.
From the definitions (5.6), (5.20) of the angle θ between two nonzero vectors in R^n or C^n it follows that x ⊥ y if and only if θ = π/2.
Theorem 5.3 (Pythagoras) For a real or complex inner product space,
‖x + y‖^2 = ‖x‖^2 + ‖y‖^2, \qquad \text{if } x ⊥ y.   (5.7)
Proof We set a = 1 in (5.5) and use the orthogonality.
Definition 5.3 (Orthogonal and Orthonormal Bases) A set of nonzero vectors {v_1, . . . , v_k} in a subspace S of a real or complex inner product space is an orthogonal basis for S if it is a basis for S and ⟨v_i, v_j⟩ = 0 for i ≠ j. It is an orthonormal basis for S if it is a basis for S and ⟨v_i, v_j⟩ = δ_{ij} for all i, j.
A basis for a subspace of an inner product space can be turned into an orthogonal or orthonormal basis for the subspace by the following construction (Fig. 5.1).
Fig. 5.1 The construction of v_1 and v_2 in Gram-Schmidt. The constant c is given by c := ⟨s_2, v_1⟩/⟨v_1, v_1⟩.
Theorem 5.4 (Gram-Schmidt) Let {s_1, . . . , s_k} be a basis for a real or complex inner product space (S, ⟨·, ·⟩). Define
v_1 := s_1, \qquad v_j := s_j - \sum_{i=1}^{j-1} \frac{⟨s_j, v_i⟩}{⟨v_i, v_i⟩} v_i, \qquad j = 2, . . . , k.   (5.8)
Then {v_1, . . . , v_k} is an orthogonal basis for S and the normalized vectors
{u_1, . . . , u_k} := \Bigl\{ \frac{v_1}{‖v_1‖}, . . . , \frac{v_k}{‖v_k‖} \Bigr\}
form an orthonormal basis for S.
Proof To show that {v_1, . . . , v_k} is an orthogonal basis for S we use induction on k. Define subspaces S_j := span{s_1, . . . , s_j} for j = 1, . . . , k. Clearly v_1 = s_1 is an orthogonal basis for S_1. Suppose for some j ≥ 2 that v_1, . . . , v_{j-1} is an orthogonal basis for S_{j-1} and let v_j be given by (5.8) as a linear combination of s_j and v_1, . . . , v_{j-1}. Now each of these v_i is a linear combination of s_1, . . . , s_i, and we obtain v_j = \sum_{i=1}^{j} a_i s_i for some a_1, . . . , a_j with a_j = 1. Since s_1, . . . , s_j are linearly independent and a_j ≠ 0 we deduce that v_j ≠ 0. By the induction hypothesis,
⟨v_j, v_l⟩ = ⟨s_j, v_l⟩ - \sum_{i=1}^{j-1} \frac{⟨s_j, v_i⟩}{⟨v_i, v_i⟩} ⟨v_i, v_l⟩ = ⟨s_j, v_l⟩ - \frac{⟨s_j, v_l⟩}{⟨v_l, v_l⟩} ⟨v_l, v_l⟩ = 0
for l = 1, . . . , j - 1. Thus v_1, . . . , v_j is an orthogonal basis for S_j.
If {v_1, . . . , v_k} is an orthogonal basis for S then clearly {u_1, . . . , u_k} is an orthonormal basis for S.
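For the standard inner product in R^n or C^n the construction (5.8) can be coded directly as follows (an illustrative sketch of classical Gram-Schmidt; the columns of S are the basis vectors s_1, . . . , s_k and must be linearly independent):

function [V,U] = gramschmidt(S)
% Sketch of (5.8): classical Gram-Schmidt for the standard inner product.
[n,k] = size(S);
V = zeros(n,k); U = zeros(n,k);
for j = 1:k
    v = S(:,j);
    for i = 1:j-1
        v = v - (V(:,i)'*S(:,j))/(V(:,i)'*V(:,i))*V(:,i);  % subtract projections
    end
    V(:,j) = v;                     % orthogonal basis
    U(:,j) = v/norm(v);             % orthonormal basis
end
end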
Sometimes we want to extend an orthogonal basis for a subspace to an orthogonal
basis for a larger space.
Theorem 5.5 (Orthogonal Extension of Basis) Suppose S ⊂ T are finite dimensional subspaces of a vector space V. An orthogonal basis for S can always be
extended to an orthogonal basis for T .
Proof Suppose dim S := k < dim T = n. Using Theorem 1.3 we first extend
an orthogonal basis s 1 , . . . , s k for S to a basis s 1 , . . . , s k , s k+1 , . . . , s n for T , and
then apply the Gram-Schmidt process to this basis obtaining an orthogonal basis
v 1 , . . . , v n for T . This is an extension of the basis for S since v i = s i for i =
1, . . . , k. We show this by induction. Clearly v 1 = s 1 . Suppose for some 2 ≤ r < k
that v_j = s_j for j = 1, . . . , r − 1. Consider (5.8) for j = r. Since ⟨s_r, v_i⟩ = ⟨s_r, s_i⟩ = 0 for i < r we obtain v_r = s_r.
Letting S = span(s 1 , . . . , s k ) and T be Rn or Cn we obtain
Corollary 5.1 (Extending Orthogonal Vectors to a Basis) For 1 ≤ k < n a
set {s 1 , . . . , s k } of nonzero orthogonal vectors in Rn or Cn can be extended to an
orthogonal basis for the whole space.
5.1.3 Sum of Subspaces and Orthogonal Projections
Suppose S and T are subspaces of a real or complex vector space V endowed with
an inner product ⟨x, y⟩. We define
• Sum: S + T := {s + t : s ∈ S and t ∈ T},
• direct sum S ⊕ T: a sum where S ∩ T = {0},
• orthogonal sum S ⊕^⊥ T: a sum where ⟨s, t⟩ = 0 for all s ∈ S and t ∈ T.
We note that
• S + T is a vector space, a subspace of V which in this book will be Rn or Cn (cf.
Example 1.2).
• Every v ∈ S ⊕ T can be decomposed uniquely in the form v = s + t, where
s ∈ S and t ∈ T . For if v = s 1 + t 1 = s 2 + t 2 for s 1 , s 2 ∈ S and t 1 , t 2 ∈ T ,
then 0 = s 1 − s 2 + t 1 − t 2 or s 1 − s 2 = t 2 − t 1 . It follows that s 1 − s 2 and t 2 − t 1
belong to both S and T and hence to S ∩ T . But then s 1 − s 2 = t 2 − t 1 = 0 so
s 1 = s 2 and t 2 = t 1 .
By (1.8) in the introduction chapter we have
dim(S ⊕ T ) = dim(S) + dim(T ).
The subspaces S and T in a direct sum are called complementary subspaces.
• An orthogonal sum is a direct sum. For if v ∈ S ∩ T then v is orthogonal to itself, ⟨v, v⟩ = 0, which implies that v = 0. We often write T := S^⊥.
• Suppose v = s_0 + t_0 ∈ S ⊕ T, where s_0 ∈ S and t_0 ∈ T. The vector s_0 is called the oblique projection of v into S along T. Similarly, the vector t_0 is called the oblique projection of v into T along S. If S ⊕^⊥ T is an orthogonal sum then s_0 is called the orthogonal projection of v into S. Similarly, t_0 is called the orthogonal projection of v in T = S^⊥. The orthogonal projections are illustrated in Fig. 5.2.
Fig. 5.2 The orthogonal projections of s + t into S and T.
Theorem 5.6 (Orthogonal Projection) Let S and T be subspaces of a finite dimensional real or complex vector space V with an inner product ⟨·, ·⟩. The orthogonal projections s_0 of v ∈ S ⊕^⊥ T into S and t_0 of v ∈ S ⊕^⊥ T into T satisfy v = s_0 + t_0, and
⟨s_0, s⟩ = ⟨v, s⟩ for all s ∈ S, \qquad ⟨t_0, t⟩ = ⟨v, t⟩ for all t ∈ T.   (5.9)
Moreover, if {v_1, . . . , v_k} is an orthogonal basis for S then
s_0 = \sum_{i=1}^{k} \frac{⟨v, v_i⟩}{⟨v_i, v_i⟩} v_i.   (5.10)
Proof We have ⟨s_0, s⟩ = ⟨v - t_0, s⟩ = ⟨v, s⟩, since ⟨t_0, s⟩ = 0 for all s ∈ S, and (5.9) follows. If s_0 is given by (5.10) then for j = 1, . . . , k
⟨s_0, v_j⟩ = \sum_{i=1}^{k} \frac{⟨v, v_i⟩}{⟨v_i, v_i⟩} ⟨v_i, v_j⟩ = \frac{⟨v, v_j⟩}{⟨v_j, v_j⟩} ⟨v_j, v_j⟩ = ⟨v, v_j⟩.
By linearity (5.9) holds for all s ∈ S. By uniqueness it must be the orthogonal projection of v ∈ S ⊕^⊥ T into S. The proof for t_0 is similar.
Corollary 5.2 (Best Approximation) Let S be a subspace of a finite dimensional real or complex vector space V with an inner product ⟨·, ·⟩ and corresponding norm ‖v‖ := \sqrt{⟨v, v⟩}. If s_0 ∈ S is the orthogonal projection of v ∈ V then
‖v - s_0‖ < ‖v - s‖, \qquad \text{for all } s ∈ S, \; s ≠ s_0.   (5.11)
Proof Let s_0 ≠ s ∈ S and 0 ≠ u := s_0 - s ∈ S. It follows from (5.9) that ⟨v - s_0, u⟩ = 0. By (5.7) (Pythagoras) we obtain
‖v - s‖^2 = ‖v - s_0 + u‖^2 = ‖v - s_0‖^2 + ‖u‖^2 > ‖v - s_0‖^2.
5.1.4 Unitary and Orthogonal Matrices
In the rest of this chapter orthogonality is in terms of the standard inner product in C^n given by ⟨x, y⟩ := y^* x = \sum_{j=1}^{n} x_j \bar y_j. For symmetric and Hermitian matrices we have the following characterization.
Lemma 5.1 Let A ∈ C^{n×n} and let ⟨x, y⟩ be the standard inner product in C^n. Then
1. A^T = A ⟺ ⟨Ax, y⟩ = ⟨x, \bar A y⟩ for all x, y ∈ C^n.
2. A^* = A ⟺ ⟨Ax, y⟩ = ⟨x, Ay⟩ for all x, y ∈ C^n.
Proof Suppose A^T = A and x, y ∈ C^n. Then
⟨x, \bar A y⟩ = (\bar A y)^* x = y^* \bar A^* x = y^* A^T x = y^* A x = ⟨Ax, y⟩.
For the converse we take x = e_j and y = e_i for some i, j and obtain
e_i^T A e_j = ⟨Ae_j, e_i⟩ = ⟨e_j, \bar A e_i⟩ = e_i^T A^T e_j.
Thus A = A^T since they have the same i, j element for all i, j. The proof of 2. is similar.
A square matrix U ∈ Cn×n is unitary if U ∗ U = I . If U is real then U T U =
I and U is called an orthogonal matrix. Unitary and orthogonal matrices have
orthonormal columns.
If U ∗ U = I the matrix U is nonsingular, U −1 = U ∗ and therefore U U ∗ =
U U −1 = I as well. Moreover, both the columns and rows of a unitary matrix of
order n form orthonormal bases for Cn . We also note that the product of two unitary
matrices is unitary. Indeed, if U ∗1 U 1 = I and U ∗2 U 2 = I then (U 1 U 2 )∗ (U 1 U 2 ) =
U ∗2 U ∗1 U 1 U 2 = I .
Theorem 5.7 (Unitary Matrix) The matrix U ∈ C^{n×n} is unitary if and only if ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ C^n. In particular, if U is unitary then ‖Ux‖_2 = ‖x‖_2 for all x ∈ C^n.
Proof If U^*U = I and x, y ∈ C^n then
⟨Ux, Uy⟩ = (Uy)^*(Ux) = y^* U^* U x = y^* x = ⟨x, y⟩.
Conversely, if ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ C^n then U^*U = I since for i, j = 1, . . . , n
(U^*U)_{i,j} = e_i^* U^* U e_j = (Ue_i)^*(Ue_j) = ⟨Ue_j, Ue_i⟩ = ⟨e_j, e_i⟩ = e_i^* e_j,
so that (U^*U)_{i,j} = δ_{i,j} for all i, j. The last part of the theorem follows immediately by taking y = x.
5.2 The Householder Transformation
We consider a unitary matrix with many useful properties.
Definition 5.4 (Householder Transformation) A matrix H ∈ Cn×n of the form
H := I − uu∗ , where u ∈ Cn and u∗ u = 2
is called a Householder transformation. The name elementary reflector is also
used.
In the real case and for n = 2 we find
H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \begin{bmatrix} u_1 & u_2 \end{bmatrix} = \begin{bmatrix} 1 - u_1^2 & -u_1 u_2 \\ -u_2 u_1 & 1 - u_2^2 \end{bmatrix}.
01
A Householder transformation is Hermitian and unitary. Indeed, H ∗ = (I −
uu∗ )∗ = H and
H ∗ H = H 2 = (I − uu∗ )(I − uu∗ ) = I − 2uu∗ + u(u∗ u)u∗ = I .
In the real case H is symmetric and orthogonal.
There are several ways to represent a Householder transformation. Householder used I - 2uu^*, where u^*u = 1. For any nonzero v ∈ C^n the matrix
H := I - 2\frac{vv^*}{v^*v}   (5.12)
is a Householder transformation. Indeed, H = I - uu^*, where u := \sqrt{2}\, v/‖v‖_2 has length \sqrt{2}.
Two vectors can, under certain conditions, be mapped into each other by a Householder transformation.
Lemma 5.2 Suppose x, y ∈ C^n are two vectors such that ‖x‖_2 = ‖y‖_2, y^*x is real and v := x - y ≠ 0. Then (I - 2\frac{vv^*}{v^*v}) x = y.
Proof Since x^*x = y^*y and \mathrm{Re}(y^*x) = y^*x we have
v^*v = (x - y)^*(x - y) = 2x^*x - 2\mathrm{Re}(y^*x) = 2v^*x.   (5.13)
But then (I - 2\frac{vv^*}{v^*v}) x = x - \frac{2v^*x}{v^*v} v = x - v = y.
There is a nice geometric interpretation of this lemma. We have
H = I - \frac{2vv^*}{v^*v} = P - \frac{vv^*}{v^*v}, \qquad \text{where } P := I - \frac{vv^*}{v^*v},
and
Px = x - \frac{v^*x}{v^*v} v \overset{(5.13)}{=} x - \frac{1}{2} v = \frac{1}{2}(x + y).
If x, y ∈ R^n it follows that Hx is the reflected image of x. The “mirror” M := {w ∈ R^n : w^*v = 0} contains the vector (x + y)/2 and has normal x - y. This is illustrated for the real case in Fig. 5.3.
Fig. 5.3 The Householder transformation in Example 5.1.
Example 5.1 (Reflector) Suppose x := [1, 0, 1]^T and y := [-1, 0, 1]^T. Then v = x - y = [2, 0, 0]^T and
H := I - \frac{2vv^T}{v^Tv} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{2}{4} \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
P := I - \frac{vv^T}{v^Tv} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \frac{1}{4} \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
The set
M := {w ∈ R^3 : w^Tv = 0} = \{ [w_1, w_2, w_3]^T : 2w_1 = 0 \}
is the yz plane (cf. Fig. 5.3), Hx = [-1, 0, 1]^T = y, and Px = [0, 0, 1]^T = (x + y)/2 ∈ M.
Householder transformations can be used to produce zeros in vectors. In the following theorem we map any vector in C^n into a multiple of the first unit vector.
Theorem 5.8 (Zeros in Vectors) For any x ∈ C^n there is a Householder transformation H ∈ C^{n×n} such that
Hx = a e_1, \qquad a = -ρ‖x‖_2, \qquad ρ := \begin{cases} x_1/|x_1|, & \text{if } x_1 ≠ 0, \\ 1, & \text{otherwise.} \end{cases}
Proof If x = 0 then Hx = 0 and a = 0. Any u with ‖u‖_2 = \sqrt{2} will work, and we choose u := \sqrt{2}\, e_1 in this case. For x ≠ 0 we define
u := \frac{z + e_1}{\sqrt{1 + z_1}}, \qquad \text{where } z := \bar ρ\, x/‖x‖_2.   (5.14)
Since |ρ| = 1 we have ρ‖x‖_2 z = |ρ|^2 x = x. Moreover, ‖z‖_2 = 1 and z_1 = |x_1|/‖x‖_2 is real, so that u^*u = \frac{(z + e_1)^*(z + e_1)}{1 + z_1} = \frac{2 + 2z_1}{1 + z_1} = 2. Finally,
Hx = x - (u^*x)u = ρ‖x‖_2 (z - (u^*z)u) = ρ‖x‖_2 \Bigl(z - \frac{(z^* + e_1^*)z}{1 + z_1}(z + e_1)\Bigr) = ρ‖x‖_2 (z - (z + e_1)) = -ρ‖x‖_2 e_1 = a e_1.
The formulas in Theorem 5.8 are implemented in the following algorithm, adapted from [17]. For any given x ∈ C^n, a number a and a vector u with u^*u = 2 are computed so that (I - uu^*)x = a e_1:
function [u,a]=housegen(x)
% [u,a]=housegen(x)
a=norm(x);
if a==0
u=x; u(1)=sqrt(2); return;
end
if x(1)== 0
r=1;
else
r=x(1)/abs(x(1));
end
u=conj(r)*x/a;
u(1)=u(1)+1;
u=u/sqrt(u(1));
a=-r*a;
end
Listing 5.1 housegen
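A small usage sketch (the matrix H is formed explicitly only for illustration; in practice one applies it as x - (u'*x)*u):

% Usage sketch for housegen.
x = [3; 4; 0];
[u,a] = housegen(x);
H = eye(3) - u*u';
norm(H*x - a*[1;0;0])               % ~0, here a = -5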
Note that
• In Theorem 5.8 the first component of z is z_1 = |x_1|/‖x‖_2 ≥ 0. Since ‖z‖_2 = 1 we have 1 ≤ 1 + z_1 ≤ 2. It follows that we avoid cancellation error when computing 1 + z_1, and u and a are computed in a numerically stable way.
• In order to compute H x for a vector x we do not need to form the matrix H .
Indeed, H x = (I − uu∗ )x = x − (u∗ x)u. If u, x ∈ Rm this requires 2m
operations to find uT x, m operations for (uT x)u and m operations for the final
subtraction of the two vectors, a total of 4m arithmetic operations. If A ∈ Rm×n
then 4mn operations are required for H A = A − (uT A)u, i.e., 4m operations
for each of the n columns of A.
• Householder transformations can also be used to zero out only the lower part of a vector. Suppose x^T := [y^T, z^T], where y ∈ C^k, z ∈ C^{n-k} for some 1 ≤ k < n. The command [û, a] := housegen(z) defines a Householder transformation Ĥ = I - ûû^* so that Ĥz = a e_1 ∈ C^{n-k}. With u := \begin{bmatrix} 0 \\ û \end{bmatrix} ∈ C^n we see that u^*u = û^*û = 2, and
Hx = \begin{bmatrix} y \\ a e_1 \end{bmatrix}, \qquad \text{where } H := I - uu^* = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} - \begin{bmatrix} 0 \\ û \end{bmatrix} \begin{bmatrix} 0^* & û^* \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & Ĥ \end{bmatrix},
defines a Householder transformation that produces zeros in the lower part of x.
5.3 Householder Triangulation
We say that a matrix R ∈ C^{m×n} is upper trapezoidal if r_{i,j} = 0 for j < i and i = 2, 3, . . . , m. Upper trapezoidal matrices corresponding to m < n, m = n, and m > n look as follows:
\begin{bmatrix} x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix}, \qquad \begin{bmatrix} x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \\ 0 & 0 & 0 & x \end{bmatrix}, \qquad \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \\ 0 & 0 & 0 \end{bmatrix}.
In this section we consider a method for bringing a matrix to upper trapezoidal form
using Householder transformations.
5.3.1 The Algorithm
We treat the cases m > n and m ≤ n separately and consider first m > n. We describe how to find a sequence H_1, . . . , H_n of Householder transformations such that
A_{n+1} := H_n H_{n-1} \cdots H_1 A = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = R,
where R_1 is square and upper triangular. We define
A_1 := A, \qquad A_{k+1} = H_k A_k, \qquad k = 1, 2, . . . , n.
Suppose Ak has the following form
A_k = \begin{bmatrix}
a_{1,1}^{(1)} & \cdots & a_{1,k-1}^{(1)} & a_{1,k}^{(1)} & \cdots & a_{1,j}^{(1)} & \cdots & a_{1,n}^{(1)} \\
 & \ddots & \vdots & \vdots & & \vdots & & \vdots \\
 & & a_{k-1,k-1}^{(k-1)} & a_{k-1,k}^{(k-1)} & \cdots & a_{k-1,j}^{(k-1)} & \cdots & a_{k-1,n}^{(k-1)} \\
 & & & a_{k,k}^{(k)} & \cdots & a_{k,j}^{(k)} & \cdots & a_{k,n}^{(k)} \\
 & & & \vdots & & \vdots & & \vdots \\
 & & & a_{i,k}^{(k)} & \cdots & a_{i,j}^{(k)} & \cdots & a_{i,n}^{(k)} \\
 & & & \vdots & & \vdots & & \vdots \\
 & & & a_{m,k}^{(k)} & \cdots & a_{m,j}^{(k)} & \cdots & a_{m,n}^{(k)}
\end{bmatrix} = \begin{bmatrix} B_k & C_k \\ 0 & D_k \end{bmatrix}.   (5.15)
Thus Ak is upper trapezoidal in its first k − 1 columns (which is true for k = 1).
Let Ĥ_k := I - û_k û_k^* be a Householder transformation that maps the first column [a_{k,k}^{(k)}, . . . , a_{m,k}^{(k)}]^T of D_k to a multiple of e_1, i.e., Ĥ_k (D_k e_1) = a_k e_1. Using Algorithm 5.1 we have [û_k, a_k] = housegen(D_k e_1). Then
H_k := \begin{bmatrix} I_{k-1} & 0 \\ 0 & Ĥ_k \end{bmatrix}
is a Householder transformation and
A_{k+1} := H_k A_k = \begin{bmatrix} B_k & C_k \\ 0 & Ĥ_k D_k \end{bmatrix} = \begin{bmatrix} B_{k+1} & C_{k+1} \\ 0 & D_{k+1} \end{bmatrix},
where B_{k+1} ∈ C^{k×k} is upper triangular and D_{k+1} ∈ C^{(m-k)×(n-k)}. Thus A_{k+1} is upper trapezoidal in its first k columns and the reduction has been carried one step further. At the end R := A_{n+1} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, where R_1 is upper triangular.
The process can also be applied to A ∈ Cm×n if m ≤ n. If m = 1 then A is
already in upper trapezoidal form. Suppose m > 1. In this case m − 1 Householder
transformations will suffice and H m−1 · · · H 1 A is upper trapezoidal.
In an algorithm we can store most of the vector û_k = [u_{kk}, . . . , u_{mk}]^T and the matrix A_k in A. However, the elements u_{k,k} and a_k = r_{k,k} have to compete for the diagonal in A. For m = 4 and n = 3 the two possibilities look as follows:
A = \begin{bmatrix} u_{11} & r_{12} & r_{13} \\ u_{21} & u_{22} & r_{23} \\ u_{31} & u_{32} & u_{33} \\ u_{41} & u_{42} & u_{43} \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ u_{21} & r_{22} & r_{23} \\ u_{31} & u_{32} & r_{33} \\ u_{41} & u_{42} & u_{43} \end{bmatrix}.
The following algorithm for Householder triangulation takes A ∈ Cm×n and
B ∈ Cm×r as input, and uses housegen to compute Householder transformations
H 1 , . . . , H s so that R = H s · · · H 1 A is upper trapezoidal, and C = H s · · · H 1 B.
The matrices R and C are returned. If B is the empty matrix then C is the empty
matrix with m rows and 0 columns. rk,k is stored in A, and uk,k is stored in a
separate vector. We will see that the algorithm can be used to solve linear systems
and least squares problems with right hand side(s) B, and to compute the product of
the Householder transformations by choosing B = I .
function [R,C] = housetriang(A,B)
% [R,C] = housetriang(A,B)
% Reduces A to upper trapezoidal form R using Householder transformations
% and applies the same transformations to B, so that C = H_s*...*H_1*B.
[m,n] = size(A); r = size(B,2); A = [A,B];
for k = 1:min(n,m-1)
    [v,A(k,k)] = housegen(A(k:m,k));                      % v = u_k, A(k,k) = r_kk
    C = A(k:m,k+1:n+r); A(k:m,k+1:n+r) = C - v*(v'*C);    % apply H_k to the rest
end
R = triu(A(:,1:n)); C = A(:,n+1:n+r);
end
Listing 5.2 housetriang
Here v = ûk and the update is computed as Ĥ k C = (I − vv ∗ )C = C − v(v ∗ C).
The MATLAB command triu extracts the upper triangular part of A introducing
zeros in rows n + 1, . . . , m.
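A small usage sketch of Listing 5.2 (assuming a housegen as sketched in Sect. 5.2 is available): with B = I the returned C is the accumulated product H_s ··· H_1, so in the real case C'R should reproduce A.

A = [1 2; 3 4; 5 6];            % any m-by-n matrix with m > n
B = eye(3);                     % collect the product of the Householder transformations
[R, C] = housetriang(A, B);     % R = H2*H1*A (upper trapezoidal), C = H2*H1
norm(C'*R - A)                  % of the order of rounding errors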
5.3.2 The Number of Arithmetic Operations
The bulk of the work in Algorithm 5.2 is the computation of C − v(v∗C) for each k. Since in Algorithm 5.2 we have C ∈ C^{(m−k+1)×(n+r−k)} and m ≥ n, the cost of computing the update C − v(v^T C) in the real case is 4(m − k + 1)(n + r − k) arithmetic operations. This implies that the work in Algorithm 5.2 can be estimated as
\[
\int_0^n 4(m-k)(n+r-k)\,dk = 2m(n+r)^2 - \tfrac{2}{3}(n+r)^3.
\tag{5.16}
\]
For m = n and r = 0 this gives 4n^3/3 = 2G_n for the number of arithmetic operations to bring a matrix A ∈ R^{n×n} to upper triangular form using Householder transformations.
5.3.3 Solving Linear Systems Using Unitary Transformations
Consider now the linear system Ax = b, where A is square. Using Algorithm 5.2 we obtain a system Rx = c that is upper triangular, and nonsingular if A is nonsingular. Thus it can be solved by back substitution and we have a method
for solving linear systems that is an alternative to Gaussian elimination. The two
methods are similar since they both reduce A to upper triangular form using certain
transformations and they both work for nonsingular systems.
Which method is better? Here is a very brief discussion.
• Advantages with Householder:
– Row interchanges are not necessary, but see [3].
– Numerically stable.
• Advantages with Gauss
– Half the number of arithmetic operations compared to Householder.
– Row interchanges are often not necessary.
– Usually stable (but no guarantee).
Linear systems can be constructed where Gaussian elimination will fail numerically even if row interchanges are used, see [21]. On the other hand the transformations used in Householder triangulation are unitary so the method is quite stable.
So why is Gaussian elimination more popular than Householder triangulation? One
reason is that the number of arithmetic operations in (5.16) when m = n is 4n3 /3 =
2Gn , which is twice the number for Gaussian elimination. Numerical stability can
be a problem with Gaussian elimination, but years and years of experience shows
that it works well for most practical problems and pivoting is often not necessary.
Also Gaussian elimination often wins for banded and sparse problems.
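As a sketch of the alternative to Gaussian elimination described above, the following lines solve a square system with Listing 5.2 followed by back substitution; MATLAB's backslash on the triangular factor stands in for the back solve (Algorithm 3.2) assumed in this chapter.

n = 5;
A = randn(n) + n*eye(n);        % a reasonably conditioned test matrix
b = randn(n,1);
[R, c] = housetriang(A, b);     % R upper triangular, c = H_{n-1}*...*H_1*b
x = R\c;                        % back substitution
norm(A*x - b)                   % residual of the order of rounding errors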
5.4 The QR Decomposition and QR Factorization
Gaussian elimination without row interchanges results in an LU factorization
A = LU of A ∈ Cn×n . Consider Householder triangulation of A. Applying
Algorithm 5.2 gives R = H n−1 · · · H 1 A implying the factorization A = QR,
where Q = H 1 · · · H n−1 is unitary and R is upper triangular. This is known as a
QR factorization of A.
5.4.1 Existence
For a rectangular matrix we define the following.
Definition 5.5 (QR Decomposition) Let A ∈ Cm×n with m, n ∈ N. We say that
A = QR is a QR decomposition of A if Q ∈ Cm,m is square and unitary and
R ∈ Cm×n is upper trapezoidal. If m ≥ n then R takes the form
\[
R = \begin{pmatrix} R_1 \\ 0_{m-n,n} \end{pmatrix},
\]
where R 1 ∈ Cn×n is upper triangular and 0m−n,n is the zero matrix with m − n
rows and n columns. For m ≥ n we call A = Q1 R 1 a QR factorization of A if
Q1 ∈ Cm×n has orthonormal columns and R 1 ∈ Cn×n is upper triangular.
Suppose m ≥ n. A QR factorization is obtained from a QR decomposition A = QR by simply using the first n columns of Q and the first n rows of R. Indeed, if we partition Q as [Q_1, Q_2] and R = [R_1; 0], where Q_1 ∈ C^{m×n} and R_1 ∈ C^{n×n}, then A = Q_1 R_1 is a QR factorization of A. On the other hand a QR factorization A = Q_1 R_1 of A can be turned into a QR decomposition by extending the set of columns {q_1, ..., q_n} of Q_1 into an orthonormal basis {q_1, ..., q_n, q_{n+1}, ..., q_m} for C^m and adding m − n rows of zeros to R_1. We then obtain a QR decomposition A = QR, where Q = [q_1, ..., q_m] and R = [R_1; 0].
Example 5.2 (QR Decomposition and Factorization) Consider the factorization
\[
A = \begin{pmatrix} 1&3&1\\ 1&3&7\\ 1&-1&-4\\ 1&-1&2 \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} 1&1&-1&-1\\ 1&1&1&1\\ 1&-1&-1&1\\ 1&-1&1&-1 \end{pmatrix}
\times \begin{pmatrix} 2&2&3\\ 0&4&5\\ 0&0&6\\ 0&0&0 \end{pmatrix} = QR.
\]
Since Q^TQ = I and R is upper trapezoidal, this is a QR decomposition of A. A QR factorization A = Q_1 R_1 is obtained by dropping the last column of Q and the last row of R, so that
\[
A = \frac{1}{2}\begin{pmatrix} 1&1&-1\\ 1&1&1\\ 1&-1&-1\\ 1&-1&1 \end{pmatrix}
\times \begin{pmatrix} 2&2&3\\ 0&4&5\\ 0&0&6 \end{pmatrix} = Q_1 R_1.
\]
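The factorization in Example 5.2 is easy to check numerically; note that MATLAB's built-in qr may return factors that differ from the ones above by signs of corresponding columns of Q1 and rows of R1.

A = [1 3 1; 1 3 7; 1 -1 -4; 1 -1 2];
Q = [1 1 -1 -1; 1 1 1 1; 1 -1 -1 1; 1 -1 1 -1]/2;
R = [2 2 3; 0 4 5; 0 0 6; 0 0 0];
norm(Q*R - A), norm(Q'*Q - eye(4))   % the stated QR decomposition reproduces A
[Q1, R1] = qr(A, 0);                 % economy-size QR factorization
norm(Q1*R1 - A)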
Consider existence and uniqueness.
Theorem 5.9 (Existence of QR Decomposition) Any matrix A ∈ Cm×n with
m, n ∈ N has a QR decomposition.
Proof If m = 1 then A is already in upper trapezoidal form and A = [1]A is a
QR decomposition of A. Suppose m > 1 and set s := min(m − 1, n). Note that
the function housegen(x) returns the vector u in a Householder transformation
for any vector x. With B = I in Algorithm 5.2 we obtain R = CA and C =
H s · · · H 2 H 1 . Thus A = QR is a QR decomposition of A since Q := C ∗ =
H 1 · · · H s is a product of unitary matrices and therefore unitary.
Theorem 5.10 (Uniqueness of QR Factorization) If m ≥ n then the QR factorization is unique if A has linearly independent columns and R has positive diagonal
elements.
Proof Let A = Q1 R 1 be a QR factorization of A ∈ Cm×n . Now A∗ A =
R ∗1 Q∗1 Q1 R 1 = R ∗1 R 1 . By Lemma 4.2 the matrix A∗ A is positive definite, the
matrix R 1 is nonsingular, and if its diagonal elements are positive then R ∗1 R 1 is the
Cholesky factorization of A∗ A. Since the Cholesky factorization is unique it follows
that R_1 is unique, and since necessarily Q_1 = A R_1^{-1}, it must also be unique.
Example 5.3 (QR Decomposition and Factorization) Consider finding the QR decomposition and factorization of the matrix A = [2 −1; −1 2] using the method of the uniqueness proof of Theorem 5.10. We find B := A^TA = [5 −4; −4 5]. The Cholesky factorization B = R^TR is given by
\[
R = \frac{1}{\sqrt{5}}\begin{pmatrix} 5 & -4 \\ 0 & 3 \end{pmatrix}.
\]
Now R^{-1} = \frac{1}{3\sqrt{5}}\begin{pmatrix} 3 & 4 \\ 0 & 5 \end{pmatrix}, so Q = AR^{-1} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}. Since A is square, A = QR is both the QR decomposition and the QR factorization of A.
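The construction in Example 5.3 can be mimicked with MATLAB's chol, which returns the upper triangular Cholesky factor of A^TA. This is for illustrating the uniqueness proof only, not a recommended way of computing QR factorizations of ill-conditioned matrices.

A = [2 -1; -1 2];
R = chol(A'*A);                      % upper triangular with positive diagonal
Q = A/R;                             % Q = A*inv(R)
norm(Q'*Q - eye(2)), norm(Q*R - A)   % Q orthogonal and A = QR, up to rounding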
5.5 QR and Gram-Schmidt
The Gram-Schmidt orthogonalization of the columns of A can be used to find the
QR factorization of A.
Theorem 5.11 (QR and Gram-Schmidt) Suppose A ∈ R^{m×n} has rank n and let v_1, ..., v_n be the result of applying Gram-Schmidt to the columns a_1, ..., a_n of A, i.e.,
\[
v_1 = a_1, \qquad v_j = a_j - \sum_{i=1}^{j-1} \frac{a_j^T v_i}{v_i^T v_i}\, v_i, \quad\text{for } j = 2, \ldots, n.
\tag{5.17}
\]
Let Q_1 := [q_1, ..., q_n], q_j := v_j/\|v_j\|_2, j = 1, ..., n, and
\[
R_1 := \begin{pmatrix}
\|v_1\|_2 & a_2^Tq_1 & a_3^Tq_1 & \cdots & a_{n-1}^Tq_1 & a_n^Tq_1 \\
 & \|v_2\|_2 & a_3^Tq_2 & \cdots & a_{n-1}^Tq_2 & a_n^Tq_2 \\
 & & \|v_3\|_2 & \cdots & a_{n-1}^Tq_3 & a_n^Tq_3 \\
 & & & \ddots & \vdots & \vdots \\
 & & & & \|v_{n-1}\|_2 & a_n^Tq_{n-1} \\
 & & & & 0 & \|v_n\|_2
\end{pmatrix}.
\tag{5.18}
\]
Then A = Q_1 R_1 is the unique QR factorization of A.

Proof Let Q_1 and R_1 be given by (5.18). The matrix Q_1 is well defined and has orthonormal columns, since {q_1, ..., q_n} is an orthonormal basis for R(A) by Theorem 5.4. By (5.17)
\[
a_j = v_j + \sum_{i=1}^{j-1} \frac{a_j^T v_i}{v_i^T v_i}\, v_i = r_{jj} q_j + \sum_{i=1}^{j-1} q_i r_{ij} = Q_1 R_1 e_j, \qquad j = 1, \ldots, n.
\]
Clearly R_1 has positive diagonal elements and the factorization is unique.
Example 5.4 (QR Using Gram-Schmidt) Consider finding the QR decomposition and factorization of the matrix A = [2 −1; −1 2] = [a_1, a_2] using Gram-Schmidt. Using (5.17) we find v_1 = a_1 and v_2 = a_2 − \frac{a_2^Tv_1}{v_1^Tv_1}v_1 = \frac{3}{5}\begin{pmatrix} 1 \\ 2 \end{pmatrix}. Thus Q = [q_1, q_2], where q_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \end{pmatrix} and q_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}. By (5.18) we find
\[
R_1 = R = \begin{pmatrix} \|v_1\|_2 & a_2^Tq_1 \\ 0 & \|v_2\|_2 \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 5 & -4 \\ 0 & 3 \end{pmatrix}
\]
and this agrees with what we found in Example 5.3.
Warning The Gram-Schmidt orthogonalization process should not be used to
compute the QR factorization numerically. The columns of Q1 computed in floating
point arithmetic using Gram-Schmidt orthogonalization will often be far from
orthogonal. There is a modified version of Gram-Schmidt which behaves better
numerically, see [2]. Here we only considered Householder transformations (cf.
Algorithm 5.2).
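For completeness, a direct transcription of (5.17)–(5.18) into MATLAB might look as follows (the helper name is ours, not the book's); as the warning above says, this classical Gram-Schmidt variant is for illustration only and should not be used as a numerical QR algorithm.

function [Q1, R1] = gramschmidt_qr(A)
% Classical Gram-Schmidt QR, cf. (5.17)-(5.18). Illustration only.
[m, n] = size(A);
Q1 = zeros(m, n); R1 = zeros(n, n);
for j = 1:n
    v = A(:, j);
    for i = 1:j-1
        R1(i, j) = Q1(:, i)'*A(:, j);   % r_ij = a_j'*q_i
        v = v - R1(i, j)*Q1(:, i);
    end
    R1(j, j) = norm(v);                 % r_jj = ||v_j||_2
    Q1(:, j) = v/R1(j, j);              % q_j = v_j/||v_j||_2
end
end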
5.6 Givens Rotations
In some applications, the matrix we want to triangulate has a special structure.
Suppose for example that A ∈ Rn×n is square and upper Hessenberg as illustrated
by a Wilkinson diagram for n = 4
\[
A = \begin{pmatrix} x&x&x&x\\ x&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{pmatrix}.
\]
Only one element in each column needs to be annihilated and a full Householder
transformation will be inefficient. In this case we can use a simpler transformation.
Definition 5.6 (Givens Rotation, Plane Rotation) A plane rotation (also called a Givens rotation) is a matrix P ∈ R^{2×2} of the form
\[
P := \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \quad\text{where } c^2 + s^2 = 1.
\]
A plane rotation is an orthogonal matrix and there is a unique angle θ ∈ [0, 2π) such that c = cos θ and s = sin θ. Moreover, the identity matrix is a plane rotation corresponding to θ = 0. A vector x in the plane is rotated an angle θ clockwise by P. See Exercise 5.16 and Fig. 5.4.
Fig. 5.4 A plane rotation (x rotated through the angle θ to y = Px)
A Givens rotation can be used to introduce one zero in a vector. Consider first the case of a 2-vector. Suppose
\[
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \ne 0, \qquad c := \frac{x_1}{r}, \quad s := \frac{x_2}{r}, \quad r := \|x\|_2.
\]
If x ∈ R^2 then
\[
Px = \frac{1}{r}\begin{pmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \frac{1}{r}\begin{pmatrix} x_1^2 + x_2^2 \\ 0 \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix},
\]
and we have introduced a zero in x. We can take P = I when x = 0.
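A small helper returning c, s and r for a given 2-vector could look as follows; the name and interface are ours, not the book's.

function [c, s, r] = givensrot(x1, x2)
% Returns c, s with c^2 + s^2 = 1 and r = norm([x1;x2]) so that
% [c s; -s c]*[x1; x2] = [r; 0].
r = norm([x1, x2]);
if r > 0
    c = x1/r; s = x2/r;
else
    c = 1; s = 0;        % x = 0: take P = I
end
end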
For an n-vector x ∈ R^n and 1 ≤ i < j ≤ n we define a rotation in the i,j-plane as a matrix P_{ij} = (p_{kl}) ∈ R^{n×n} by p_{kl} = δ_{kl}, except for positions ii, jj, ij, ji, which are given by
\[
\begin{pmatrix} p_{ii} & p_{ij} \\ p_{ji} & p_{jj} \end{pmatrix} = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \quad\text{where } c^2 + s^2 = 1.
\]
Thus, for n = 4,
\[
P_{12} = \begin{pmatrix} c&s&0&0\\ -s&c&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix},\quad
P_{13} = \begin{pmatrix} c&0&s&0\\ 0&1&0&0\\ -s&0&c&0\\ 0&0&0&1 \end{pmatrix},\quad
P_{23} = \begin{pmatrix} 1&0&0&0\\ 0&c&s&0\\ 0&-s&c&0\\ 0&0&0&1 \end{pmatrix}.
\]
Premultiplying a matrix by a rotation in the i,j-plane changes only rows i and j of the matrix, while postmultiplying the matrix by such a rotation only changes columns i and j. In particular, if B = P_{ij}A and C = AP_{ij} then B(k,:) = A(k,:), C(:,k) = A(:,k) for all k ≠ i, j and
\[
\begin{pmatrix} B(i,:) \\ B(j,:) \end{pmatrix} = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}\begin{pmatrix} A(i,:) \\ A(j,:) \end{pmatrix},
\qquad
\bigl[\, C(:,i)\;\; C(:,j) \,\bigr] = \bigl[\, A(:,i)\;\; A(:,j) \,\bigr]\begin{pmatrix} c & s \\ -s & c \end{pmatrix}.
\tag{5.19}
\]
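In code one exploits (5.19) by never forming P_ij explicitly; a sketch:

A = magic(4); i = 2; j = 3;
c = 0.6; s = 0.8;                         % any c, s with c^2 + s^2 = 1
B = A;
B([i j], :) = [c s; -s c]*A([i j], :);    % B = P_ij*A: only rows i and j change
C = A;
C(:, [i j]) = A(:, [i j])*[c s; -s c];    % C = A*P_ij: only columns i and j change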
Givens rotations can be used as an alternative to Householder transformations
for solving linear systems. It can be shown that for a dense system of order n the
number of arithmetic operations is asymptotically 2n^3, corresponding to the work of 3 Gaussian eliminations, while the work using Householder transformations corresponds to 2 Gaussian eliminations. However, for matrices with a special
structure Givens rotations can be used to advantage. As an example consider an
upper Hessenberg matrix A ∈ Rn×n . It can be transformed to upper triangular form
using rotations P i,i+1 for i = 1, . . . , n − 1. For n = 4 the process can be illustrated
as follows.
\[
A = \begin{pmatrix} x&x&x&x\\ x&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{pmatrix}
\xrightarrow{P_{12}}
\begin{pmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{pmatrix}
\xrightarrow{P_{23}}
\begin{pmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&r_{22}&r_{23}&r_{24}\\ 0&0&x&x\\ 0&0&x&x \end{pmatrix}
\xrightarrow{P_{34}}
\begin{pmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&r_{22}&r_{23}&r_{24}\\ 0&0&r_{33}&r_{34}\\ 0&0&0&r_{44} \end{pmatrix}.
\]
For an algorithm see Exercise 5.18. This reduction is used in the QR method
discussed in Chap. 15.
5.7 Exercises Chap. 5
5.7.1 Exercises Sect. 5.1
Exercise 5.1 (The A∗A Inner Product) Suppose A ∈ C^{m×n} has linearly independent columns. Show that ⟨x, y⟩ := y∗A∗Ax defines an inner product on C^n.
Exercise 5.2 (Angle Between Vectors in Complex Case) Show that in the complex case there is a unique angle θ in [0, π/2] such that
\[
\cos θ = \frac{|\langle x, y\rangle|}{\|x\|\,\|y\|}.
\tag{5.20}
\]
Exercise 5.3 (x T Ay Inequality (Exam Exercise 1979-3)) Suppose A ∈ Rn×n is
symmetric positive definite. Show that
|x T Ay|2 ≤ x T Ax y T Ay
for all x, y ∈ Rn , with equality if and only if x and y are linearly dependent.
5.7.2 Exercises Sect. 5.2
Exercise 5.4 (What Does Algorithm Housegen Do When x = e1 ?) Determine
H in Algorithm 5.1 when x = e1 .
Exercise 5.5 (Examples of Householder Transformations) If x, y ∈ R^n with ‖x‖_2 = ‖y‖_2 and v := x − y ≠ 0 then it follows from Example 5.1 that \bigl(I − 2\frac{vv^T}{v^Tv}\bigr)x = y. Use this to construct a Householder transformation H such that Hx = y in the following cases.
a) x = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad y = \begin{pmatrix} 5 \\ 0 \end{pmatrix}.
b) x = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \quad y = \begin{pmatrix} 0 \\ 3 \\ 0 \end{pmatrix}.
Exercise 5.6 (2 × 2 Householder Transformation) Show that a real 2 × 2 Householder transformation can be written in the form
\[
H = \begin{pmatrix} -\cos φ & \sin φ \\ \sin φ & \cos φ \end{pmatrix}.
\]
Find Hx if x = [cos φ, sin φ]^T.
Exercise 5.7 (Householder Transformation (Exam Exercise 2010-1))
a) Suppose x, y ∈ R^n with ‖x‖_2 = ‖y‖_2 and v := x − y ≠ 0. Show that
\[
Hx = y, \quad\text{where } H := I - 2\frac{vv^T}{v^Tv}.
\]
b) Let B ∈ R^{4×4} be given by
\[
B := \begin{pmatrix} 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\\ ε&0&0&0 \end{pmatrix},
\]
where 0 < ε < 1. Compute a Householder transformation H and a matrix B_1 such that the first column of B_1 := HBH has a zero in the last two positions.
5.7.3 Exercises Sect. 5.4
Exercise 5.8 (QR Decomposition)
\[
A = \begin{pmatrix} 1&2\\ 1&2\\ 1&0\\ 1&0 \end{pmatrix}, \quad
Q = \frac{1}{2}\begin{pmatrix} 1&1&1&1\\ 1&1&-1&-1\\ 1&-1&-1&1\\ 1&-1&1&-1 \end{pmatrix}, \quad
R = \begin{pmatrix} 2&2\\ 0&2\\ 0&0\\ 0&0 \end{pmatrix}.
\]
Show that Q is orthonormal and that QR is a QR decomposition of A. Find a QR factorization of A.
Exercise 5.9 (Householder Triangulation)
a) Let
\[
A := \begin{pmatrix} 1&0&1\\ -2&-1&0\\ 2&2&1 \end{pmatrix}.
\]
Find Householder transformations H 1 , H 2 ∈ R3×3 such that H 2 H 1 A is
upper triangular.
b) Find the QR factorization of A, when R has positive diagonal elements.
Exercise 5.10 (Hadamard’s Inequality) In this exercise we use the QR factorization to prove a classical determinant inequality. For any A = [a_1, ..., a_n] ∈ C^{n×n} we have
\[
|\det(A)| \le \prod_{j=1}^{n} \|a_j\|_2.
\tag{5.21}
\]
Equality holds if and only if A has a zero column or the columns of A are orthogonal.
a) Show that if Q is unitary then |det(Q)| = 1.
b) Let A = QR be a QR factorization of A and let R = [r_1, ..., r_n]. Show that (A^*A)_{jj} = \|a_j\|_2^2 = (R^*R)_{jj} = \|r_j\|_2^2.
c) Show that |det(A)| = \prod_{j=1}^{n} |r_{jj}| \le \prod_{j=1}^{n} \|a_j\|_2.
d) Show that we have equality if A has a zero column.
e) Suppose the columns of A are nonzero. Show that we have equality if and only if the columns of A are orthogonal. (Hint: show that we have equality ⟺ R is diagonal ⟺ A^*A is diagonal.)
Exercise 5.11 (QL Factorization (Exam Exercise 1982-2)) Suppose B ∈ Rn×n
is symmetric and positive definite. It can be shown that B has a factorization of the
form B = LT L, where L is lower triangular with positive diagonal elements (you
should not show this). Note that this is different from the Cholesky factorization
B = LLT .
a) Suppose B = LT L. Write down the equations to determine the elements li,j of
L, in the order i = n, n − 1, . . . , 1 and j = i, 1, 2 . . . , i − 1.
b) Explain (without making a detailed algorithm) how the LT L factorization can be
used to solve the linear system Bx = c. Compute ‖L‖_F. Is the algorithm stable?
c) Show that every nonsingular matrix A ∈ Rn×n can be factored in the form A =
QL, where Q ∈ Rn×n is orthogonal and L ∈ Rn×n is lower triangular with
positive diagonal elements.
d) Show that the QL factorization in c) is unique.
Exercise 5.12 (QL-Factorization (Exam Exercise 1982-3)) In this exercise we will develop an algorithm to find a QL-factorization of A ∈ R^{n×n} (cf. Exam exercise 1982-2) using Householder transformations.
a) Given vectors a := [a_1, ..., a_n]^T ∈ R^n and e_n := [0, ..., 0, 1]^T. Find v ∈ R^n such that the Householder transformation H := I − 2\frac{vv^*}{v^*v} satisfies Ha = −se_n, where |s| = ‖a‖_2. How should we choose the sign of s?
b) Let 1 ≤ r ≤ n, v_r ∈ R^r, v_r ≠ 0, and
\[
V_r := I_r - 2\frac{v_r v_r^*}{v_r^* v_r} = I_r - u_r u_r^*, \quad\text{with } u_r := \sqrt{2}\,\frac{v_r}{\|v_r\|_2}.
\]
Show that H := \begin{pmatrix} V_r & 0 \\ 0 & I_{n-r} \end{pmatrix} is a Householder transformation. Show also that if a_{i,j} = 0 for i = 1, ..., r and j = r + 1, ..., n then the last n − r columns of A and HA are the same.
c) Explain, without making a detailed algorithm, how for a given matrix A ∈ R^{n×n} we can find Householder transformations H_1, ..., H_{n−1} such that H_{n−1} · · · H_1 A is lower triangular. Give a QL factorization of A.
Exercise 5.13 (QR Fact. of Band Matrices (Exam Exercise 2006-2)) Let A ∈
Rn×n be a nonsingular symmetric band matrix with bandwidth d ≤ n − 1, so that
aij = 0 for all i, j with |i − j | > d. We define B := AT A and let A = QR be the
QR factorization of A where R has positive diagonal entries.
a) Show that B is symmetric.
b) Show that B has bandwidth ≤ 2d.
c) Write a MATLAB function B=ata(A,d) which computes B. You shall
exploit the symmetry and the function should only use O(cn2 ) flops, where c
only depends on d.
d) Estimate the number of arithmetic operations in your algorithm.
e) Show that AT A = R T R.
f) Explain why R has upper bandwidth 2d.
g) We consider 3 methods for finding the QR factorization of the band matrix A,
where we assume that n is much bigger than d. The methods are based on
1. Gram-Schmidt orthogonalization,
2. Householder transformations,
3. Givens rotations.
Which method would you recommend for a computer program using floating
point arithmetic? Give reasons for your answer.
Exercise 5.14 (Find QR Factorization (Exam Exercise 2008-2)) Let
\[
A := \begin{pmatrix} 2&1\\ 2&-3\\ -2&-1\\ -2&3 \end{pmatrix}.
\]
a) Find the Cholesky factorization of AT A.
b) Find the QR factorization of A.
5.7.4 Exercises Sect. 5.5
Exercise 5.15 (QR Using Gram-Schmidt, II) Construct Q1 and R 1 in Example 5.2 using Gram-Schmidt orthogonalization.
5.7.5 Exercises Sect. 5.6
Exercise 5.16 (Plane Rotation) Show that if x = \begin{pmatrix} r\cos α \\ r\sin α \end{pmatrix} then Px = \begin{pmatrix} r\cos(α-θ) \\ r\sin(α-θ) \end{pmatrix}.
Exercise 5.17 (Updating the QR Decomposition) Let H ∈ R4,4 be upper
Hessenberg. Find Givens rotation matrices G1 , G2 , G3 such that
G3 G2 G1 H = R
is upper triangular. (Here each G_k = P_{i,j} for suitable i, j, c and s, and for each k you are meant to find suitable i and j.)
Exercise 5.18 (Solving Upper Hessenberg System Using Rotations) Let A ∈
Rn×n be upper Hessenberg and nonsingular, and let b ∈ Rn . The following algorithm
solves the linear system Ax = b using rotations P k,k+1 for k = 1, . . . , n − 1. It
uses the back solve Algorithm 3.2. Determine the number of arithmetic operations
of this algorithm.
function x=rothesstri(A,b)
% x=rothesstri(A,b)
% Solves an upper Hessenberg system Ax = b using rotations P_{k,k+1}
% followed by back substitution (Algorithm 3.2, rbacksolve).
n=length(A); A=[A b];
for k=1:n-1
    r=norm([A(k,k),A(k+1,k)]);
    if r>0
        c=A(k,k)/r; s=A(k+1,k)/r;
        A([k k+1],k+1:n+1)=[c s;-s c]*A([k k+1],k+1:n+1);   % rotate rows k and k+1
    end
    A(k,k)=r; A(k+1,k)=0;
end
x=rbacksolve(A(:,1:n),A(:,n+1),n);
end
Listing 5.3 rothesstri
Exercise 5.19 (A Givens Transformation (Exam Exercise 2013-2)) A Givens rotation of order 2 has the form G := \begin{pmatrix} c & s \\ -s & c \end{pmatrix} ∈ R^{2×2}, where s^2 + c^2 = 1.
a) Is G symmetric and unitary?
b) Given x_1, x_2 ∈ R and set r := \sqrt{x_1^2 + x_2^2}. Find G and y_1, y_2 so that y_1 = y_2, where \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = G\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
Exercise 5.20 (Givens Transformations (Exam Exercise 2016-3)) Recall that a rotation in the ij-plane is an m × m matrix, denoted P_{i,j}, which differs from the identity matrix only in the entries ii, ij, ji, jj, which equal
\[
\begin{pmatrix} p_{ii} & p_{ij} \\ p_{ji} & p_{jj} \end{pmatrix} = \begin{pmatrix} \cos θ & \sin θ \\ -\sin θ & \cos θ \end{pmatrix},
\]
i.e., these four entries are those of a Givens rotation.
a) For θ ∈ R, let P be a Givens rotation of the form
\[
P = \begin{pmatrix} \cos θ & \sin θ \\ -\sin θ & \cos θ \end{pmatrix}
\]
and let x be a fixed vector in R^2. Show that there exists a unique θ ∈ (−π/2, π/2] so that Px = ±‖x‖_2 e_1, where e_1 = (1, 0)^T.
b) Show that, for any vector w ∈ R^m, one can find rotations in the 12-plane, 23-plane, ..., (m − 1)m-plane, so that
\[
P_{1,2} P_{2,3} \cdots P_{m-2,m-1} P_{m-1,m}\, w = \begin{pmatrix} α \\ 0 \\ \vdots \\ 0 \end{pmatrix},
\]
where α = ±‖w‖_2.
c) Assume that m ≥ n. Recall that an m × n matrix A with entries a_{i,j} is called upper trapezoidal if there are no nonzero entries below the main diagonal (a_{1,1}, a_{2,2}, ..., a_{n,n}) (for m = n, upper trapezoidal is the same as upper triangular). Recall also that an m × n matrix is said to be in upper Hessenberg form if there are no nonzero entries below the subdiagonal (a_{2,1}, a_{3,2}, ..., a_{n,n−1}). Explain that, if an m × n matrix H is in upper Hessenberg form, one can find plane rotations so that P_{m−1,m} P_{m−2,m−1} · · · P_{2,3} P_{1,2} H is upper trapezoidal.
d) Let again A be an m × n matrix with m ≥ n, and let A^− be the matrix obtained by removing column k in A. Explain how you can find a QR decomposition of A^−, when we already have a QR decomposition A = QR of A. (Hint: consider the matrix Q^TA^−.)
Exercise 5.21 (Cholesky and Givens (Exam Exercise 2018-2)) Assume that A is n × n symmetric positive definite, with Cholesky factorization A = LL^*. Assume also that z is a given column vector of length n.
a) Explain why A + zz^* has a unique Cholesky factorization.
b) Assume that we are given a QR decomposition
\[
\begin{pmatrix} L^* \\ z^* \end{pmatrix} = Q\begin{pmatrix} R \\ 0 \end{pmatrix},
\]
with R square and upper triangular. Explain why R is nonsingular. Explain also that, if R also has nonnegative diagonal entries, then A + zz^* has the Cholesky factorization R^*R.
c) Explain how one can find plane rotations P_{i_1,n+1}, P_{i_2,n+1}, ..., P_{i_n,n+1} so that
\[
P_{i_1,n+1} P_{i_2,n+1} \cdots P_{i_n,n+1}\begin{pmatrix} L^* \\ z^* \end{pmatrix} = \begin{pmatrix} R \\ 0 \end{pmatrix},
\tag{5.22}
\]
with R upper triangular, and explain how to obtain a QR decomposition of \begin{pmatrix} L^* \\ z^* \end{pmatrix} from this. In particular you should write down the numbers i_1, ..., i_n. Is it possible to choose the plane rotations so that R in (5.22) also has positive diagonal entries?
5.8 Review Questions
5.8.1 What is a Householder transformation?
5.8.2 Why are they good for numerical work?
5.8.3 What are the main differences between solving a linear system by Gaussian elimination and Householder transformations?
5.8.4 What are the differences between a QR decomposition and a QR factorization?
5.8.5 Does any matrix have a QR decomposition?
5.8.6 What is a Givens transformation?
5.8.7 Is a unitary matrix always well conditioned?
Part II
Eigenpairs and Singular Values
We turn now to eigenpairs of matrices, i.e., eigenvalues and corresponding
eigenvectors. The eigenpairs of a matrix are easily determined if it is diagonal.
Indeed, the eigenvalues are the diagonal elements and the eigenvectors are unit
vectors. We will see that not all matrices can be reduced to diagonal form using
eigenvalue preserving transformations known as similarity transformations. This
raises the question: how close to a diagonal matrix can we reduce a general
matrix using similarity transformations? We give one answer to this question, the
Jordan factorization or the Jordan canonical form. We also characterize matrices
which can be diagonalized using unitary similarity transformations, and study the
subclass of Hermitian matrices. Numerical methods for determining eigenvalues
and eigenvectors will be considered in Chaps. 14 and 15.
In the second chapter in this part we consider the important singular value
decomposition of a rectangular matrix. This decomposition will play a central role
in several of the remaining chapters in this book.
Chapter 6
Eigenpairs and Similarity
Transformations
We have seen that a Hermitian matrix is positive definite if and only if it has positive
eigenvalues. Eigenvalues and some related quantities called singular values occur in
many branches of applied mathematics and are also needed for a deeper study of
linear systems and least squares problems. In this and the next chapter we study
eigenvalues and singular values. Recall that if A ∈ Cn×n is a square matrix, λ ∈ C
and x ∈ Cn then (λ, x) is an eigenpair for A if Ax = λx and x is nonzero.
The scalar λ is called an eigenvalue and x is said to be an eigenvector. The set of
eigenvalues is called the spectrum of A and is denoted by σ (A). For example,
σ (I ) = {1, . . . , 1} = {1}. The eigenvalues are the roots of the characteristic
polynomial of A given for λ ∈ C by
πA (λ) = det(A − λI ).
The equation det(A − λI ) = 0 is called the characteristic equation of A.
Equivalently the characteristic equation can be written det(λI − A) = 0.
6.1 Defective and Nondefective Matrices
For the eigenvectors we will see that it is important to know if the eigenvectors of a
matrix of order n form a basis for Cn . We say that A is defective if the eigenvectors
do not form a basis for Cn and nondefective otherwise.
We have the following sufficient condition for a matrix to be nondefective.
Theorem 6.1 (Distinct Eigenvalues) A matrix with distinct eigenvalues is nondefective, i.e., its eigenvectors are linearly independent.
Proof The proof is by contradiction. Suppose A has eigenpairs (λ_k, x_k), k = 1, ..., n, with linearly dependent eigenvectors. Let m be the smallest integer such that {x_1, ..., x_m} is linearly dependent. Thus \sum_{j=1}^{m} c_j x_j = 0, where at least one c_j is nonzero. We must have m ≥ 2 since eigenvectors are nonzero. We find
\[
\sum_{j=1}^{m} c_j x_j = 0 \;\Rightarrow\; \sum_{j=1}^{m} c_j A x_j = \sum_{j=1}^{m} c_j λ_j x_j = 0.
\]
From the last relation we subtract \sum_{j=1}^{m} c_j λ_m x_j = 0 and find \sum_{j=1}^{m-1} c_j(λ_j − λ_m)x_j = 0. But since λ_j − λ_m ≠ 0 for j = 1, ..., m − 1 and at least one c_j ≠ 0 for j < m, we see that {x_1, ..., x_{m−1}} is linearly dependent, contradicting the minimality of m.
If some of the eigenvalues occur with multiplicity higher than one then the matrix
can be either defective or nondefective.
Example 6.1 (Defective and Nondefective Matrices) Consider the matrices
\[
I := \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
J := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\]
Since I x = x and λ1 = λ2 = 1 any vector x ∈ C2 is an eigenvector for I . In
particular the two unit vectors e1 and e2 are eigenvectors and form an orthonormal
basis for C2 . We conclude that the identity matrix is nondefective. The matrix J
also has the eigenvalue one with multiplicity two, but since J x = x if and only if
x2 = 0, any eigenvector must be a multiple of e1 . Thus J is defective.
If the eigenvectors x_1, ..., x_n form a basis for C^n then any x ∈ C^n can be written
\[
x = \sum_{j=1}^{n} c_j x_j \quad\text{for some scalars } c_1, \ldots, c_n.
\]
We call this an eigenvector expansion of x. Thus to any nondefective matrix there
corresponds an eigenvector expansion.
Example 6.2 (Eigenvector Expansion Example) Eigenpairs of [2 −1; −1 2] are (1, [1, 1]^T) and (3, [1, −1]^T). Any x = [x_1, x_2]^T ∈ C^2 has the eigenvector expansion
\[
x = \frac{x_1 + x_2}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \frac{x_1 - x_2}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]
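The expansion in Example 6.2 is easily checked numerically; with the eigenvector matrix X from MATLAB's eig, the coefficients of x are obtained by solving a linear system.

A = [2 -1; -1 2];
[X, D] = eig(A);        % columns of X are eigenvectors, D diagonal of eigenvalues
x = [5; -3];
c = X\x;                % coefficients of x in the eigenvector basis
norm(A*x - X*(D*c))     % A*x = sum_j c_j*lambda_j*x_j, so this is of rounding size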
6.1.1 Similarity Transformations
We need a transformation that can be used to simplify a matrix without changing
the eigenvalues.
Definition 6.1 (Similar Matrices) Two matrices A, B ∈ Cn×n are said to be
similar if there is a nonsingular matrix S ∈ Cn×n such that B = S −1 AS. The
transformation A → B is called a similarity transformation. The columns of S
are denoted by s 1 , s 2 , . . . , s n .
We note that
1. Similar matrices have the same eigenvalues, they even have the same characteristic polynomial. Indeed, by the product rule for determinants det(AC) =
det(A) det(C) so that
πB (λ) = det(S −1 AS − λI ) = det S −1 (A − λI )S
= det(S −1 ) det(A − λI ) det(S) = det(S −1 S) det(A − λI ) = πA (λ),
since det(I ) = 1.
2. (λ, x) is an eigenpair for S −1 AS if and only if (λ, Sx) is an eigenpair for A. In
fact (S −1 AS)x = λx if and only if A(Sx) = λ(Sx).
3. If S −1 AS = D = diag(λ1 , . . . , λn ) we can partition AS = SD by
columns to obtain [As 1 , . . . , As n ] = [λ1 s 1 , . . . , λn s n ]. Thus the columns of
S are eigenvectors of A. Moreover, A is nondefective since S is nonsingular.
Conversely, if A is nondefective then it can be diagonalized by a similarity
transformation S −1 AS, where the columns of S are eigenvectors of A.
4. For any square matrices A, C ∈ C^{n×n} the two products AC and CA have the same characteristic polynomial. More generally, for rectangular matrices A ∈ C^{m×n} and C ∈ C^{n×m}, with say m > n, the bigger matrix has m − n extra zero eigenvalues:
\[
π_{AC}(λ) = λ^{m-n}\, π_{CA}(λ), \qquad λ ∈ C.
\tag{6.1}
\]
To show this define for any m, n ∈ N block triangular matrices of order n + m by
\[
E := \begin{pmatrix} AC & 0 \\ C & 0 \end{pmatrix}, \qquad
F := \begin{pmatrix} 0 & 0 \\ C & CA \end{pmatrix}, \qquad
S = \begin{pmatrix} I & A \\ 0 & I \end{pmatrix}.
\]
The matrix S is nonsingular with S^{-1} = \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}. Moreover, ES = SF, so E and F are similar and therefore have the same characteristic polynomials. Moreover, this polynomial is the product of the characteristic polynomials of the diagonal blocks. But then π_E(λ) = λ^n π_{AC}(λ) = π_F(λ) = λ^m π_{CA}(λ). This implies the statements for m ≥ n.
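A quick numerical illustration of (6.1): for random rectangular A and C the products AC and CA share their nonzero eigenvalues, the larger product having m − n extra zeros.

m = 5; n = 3;
A = randn(m, n); C = randn(n, m);
sort(eig(A*C)).'        % n nonzero eigenvalues (generically) plus m-n zeros
sort(eig(C*A)).'        % the same nonzero eigenvalues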
6.1.2 Algebraic and Geometric Multiplicity of Eigenvalues
Linear independence of eigenvectors depends on the multiplicity of the eigenvalues
in a nontrivial way. For multiple eigenvalues we need to distinguish between two
kinds of multiplicities.
Suppose A ∈ C^{n×n} has k distinct eigenvalues λ_1, ..., λ_k with multiplicities a_1, ..., a_k so that
\[
π_A(λ) := \det(A - λI) = (λ_1 - λ)^{a_1} \cdots (λ_k - λ)^{a_k}, \qquad λ_i \ne λ_j \text{ for } i \ne j, \qquad \sum_{i=1}^{k} a_i = n.
\tag{6.2}
\]
The positive integer ai = a(λi ) = aA (λi ) is called the multiplicity, or more
precisely the algebraic multiplicity of the eigenvalue λi . The multiplicity of an
eigenvalue is simple (double, triple) if ai is equal to one (two, three).
To define a second kind of multiplicity we consider for each λ ∈ σ (A) the
nullspace
N (A − λI ) := {x ∈ Cn : (A − λI )x = 0}
(6.3)
of A − λI . The nullspace is a subspace of Cn consisting of all eigenvectors of A
corresponding to the eigenvalue λ. The dimension of the subspace must be at least
one since A − λI is singular.
Definition 6.2 (Geometric Multiplicity) The geometric multiplicity g = g(λ) =
gA (λ) of an eigenvalue λ of A is the dimension of the nullspace N (A − λI ).
Example 6.3 (Geometric Multiplicity) The n × n identity matrix I has the eigenvalue λ = 1 with π_I(λ) = (1 − λ)^n. Since I − λI is the zero matrix when λ = 1, the nullspace of I − λI is all of n-space and it follows that a = g = n. On the other hand we saw in Example 6.1 that the matrix J := [1 1; 0 1] has the eigenvalue λ = 1 with a = 2 and any eigenvector is a multiple of e_1. Thus g = 1.
Theorem 6.2 (Geometric Multiplicity of Similar Matrices) Similar matrices
have the same eigenvalues with the same algebraic and geometric multiplicities.
Proof Similar matrices have the same characteristic polynomials and only the invariance of geometric multiplicity needs to be shown. Suppose λ ∈ σ(A), dim N(S^{-1}AS − λI) = k, and dim N(A − λI) = ℓ. We need to show that k = ℓ. Suppose v_1, ..., v_k is a basis for N(S^{-1}AS − λI). Then S^{-1}ASv_i = λv_i, or ASv_i = λSv_i, i = 1, ..., k. But then {Sv_1, ..., Sv_k} ⊂ N(A − λI), which implies that k ≤ ℓ. Similarly, if w_1, ..., w_ℓ is a basis for N(A − λI) then {S^{-1}w_1, ..., S^{-1}w_ℓ} ⊂ N(S^{-1}AS − λI), which implies that k ≥ ℓ. We conclude that k = ℓ.
For a proof of the following theorem see the next section; this can also be shown without using the Jordan factorization, see [9].
Theorem 6.3 (Geometric Multiplicity) We have
1. The geometric multiplicity of an eigenvalue is always bounded above by the
algebraic multiplicity of the eigenvalue.
2. The number of linearly independent eigenvectors of a matrix equals the sum of
the geometric multiplicities of the eigenvalues.
3. A matrix A ∈ Cn×n has n linearly independent eigenvectors if and only if the
algebraic and geometric multiplicity of all eigenvalues are the same.
6.2 The Jordan Factorization
We have seen that a nondefective matrix can be diagonalized by its eigenvectors,
while a defective matrix does not enjoy this property. The following question arises.
How close to a diagonal matrix can we reduce a general matrix by a similarity
transformation? We give one answer to this question, called the Jordan factorization,
or the Jordan canonical form, in Theorem 6.4. For a proof, see for example [10]. The
Jordan factorization is an important tool in matrix analysis and it has applications to
systems of differential equations, see [8].
The Jordan factorization involves bidiagonal matrices called Jordan blocks.
Definition 6.3 (Jordan Block) A Jordan block of order m, denoted J_m(λ), is an m × m matrix of the form
\[
J_m(λ) := \begin{pmatrix}
λ & 1 & 0 & \cdots & 0 & 0 \\
0 & λ & 1 & \cdots & 0 & 0 \\
0 & 0 & λ & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & λ & 1 \\
0 & 0 & 0 & \cdots & 0 & λ
\end{pmatrix}
= λI_m + E_m, \qquad
E_m := \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}.
\tag{6.4}
\]
A 3 × 3 Jordan block has the form J_3(λ) = \begin{pmatrix} λ&1&0\\ 0&λ&1\\ 0&0&λ \end{pmatrix}. Since a Jordan block is upper triangular, λ is an eigenvalue of J_m(λ) and any eigenvector must be a multiple of e_1. Indeed, if J_m(λ)v = λv for some v = [v_1, ..., v_m] then λv_{i−1} + v_i = λv_{i−1}, i = 2, ..., m, which shows that v_2 = · · · = v_m = 0. Thus, the eigenvalue λ of J_m(λ) has algebraic multiplicity a = m and geometric multiplicity g = 1.
The Jordan factorization is a factorization of a matrix into Jordan blocks.
Theorem 6.4 (The Jordan Factorization of a Matrix) Suppose A ∈ C^{n×n} has k distinct eigenvalues λ_1, ..., λ_k of algebraic multiplicities a_1, ..., a_k and geometric multiplicities g_1, ..., g_k. There is a nonsingular matrix S ∈ C^{n×n} such that
\[
J := S^{-1}AS = \operatorname{diag}(U_1, \ldots, U_k), \quad\text{with } U_i ∈ C^{a_i × a_i},
\tag{6.5}
\]
where each U_i is block diagonal having g_i Jordan blocks along the diagonal
\[
U_i = \operatorname{diag}\bigl(J_{m_{i,1}}(λ_i), \ldots, J_{m_{i,g_i}}(λ_i)\bigr).
\tag{6.6}
\]
Here m_{i,1}, ..., m_{i,g_i} are positive integers and they are unique if they are ordered so that m_{i,1} ≥ m_{i,2} ≥ · · · ≥ m_{i,g_i}. Moreover, a_i = \sum_{j=1}^{g_i} m_{i,j} for all i.
We note that
1. The matrices S and J in (6.5) are called Jordan factors. We also call J the
Jordan factorization of A.
2. The columns of S are called principal vectors or generalized eigenvectors.
They satisfy the matrix equation AS = SJ .
3. Each U i is upper triangular with the eigenvalue λi on the diagonal and consists
of gi Jordan blocks. These Jordan blocks can be taken in any order and it is
customary to refer to any such block diagonal matrix as the Jordan factorization
of A.
Example 6.4 (Jordan Factorization) As an example consider the Jordan factorization
\[
J := \operatorname{diag}(U_1, U_2) = \begin{pmatrix}
2&1&0& & & & & \\
0&2&1& & & & & \\
0&0&2& & & & & \\
 & & &2&1& & & \\
 & & &0&2& & & \\
 & & & & &2& & \\
 & & & & & &3&1\\
 & & & & & &0&3
\end{pmatrix} ∈ R^{8×8}.
\tag{6.7}
\]
31
03
We encountered this matrix in Exercise 6.1. The eigenvalues together with their
algebraic and geometric multiplicities can be read off directly from the Jordan
factorization.
• U 1 = diag(J 3 (2), J 2 (2), J 1 (2)) and U 2 = J 2 (3).
• 2 is an eigenvalue of algebraic multiplicity 6 and geometric multiplicity 3, the
number of Jordan blocks corresponding to λ = 2.
• 3 is an eigenvalue of algebraic multiplicity 2 and geometric multiplicity 1.
The columns of S = [s_1, ..., s_8] are determined from the columns of J as follows:
\[
As_1 = 2s_1,\quad As_2 = s_1 + 2s_2,\quad As_3 = s_2 + 2s_3,\quad As_4 = 2s_4,\quad As_5 = s_4 + 2s_5,\quad As_6 = 2s_6,\quad As_7 = 3s_7,\quad As_8 = s_7 + 3s_8.
\]
We see that the generalized eigenvector corresponding to the first column in a
Jordan block is an eigenvector of A. The remaining generalized eigenvectors are
not eigenvectors.
The matrix
\[
J := \begin{pmatrix}
3&1& & & & & & \\
0&3& & & & & & \\
 & &2&1& & & & \\
 & &0&2& & & & \\
 & & & &2& & & \\
 & & & & &2&1&0\\
 & & & & &0&2&1\\
 & & & & &0&0&2
\end{pmatrix}
\]
is also a Jordan factorization of A. In any Jordan factorization of this A the sizes of the 4 Jordan blocks J_2(3), J_2(2), J_1(2), J_3(2) are uniquely given.
Proof of Theorem 6.3
1. The algebraic multiplicity ai of an eigenvalue λi is equal to the size of the
corresponding U i . Moreover each U i contains gi Jordan blocks of size mi,j ≥ 1.
Thus gi ≤ ai .
2. Since A and J are similar the geometric multiplicities of the eigenvalues of
these matrices are the same, and it is enough to prove statement 2 for the Jordan
factor J . We show this only for the matrix J given by (6.7). The general case
should then be clear. There are only 4 eigenvectors of J , namely e1 , e4 , e6 , e7
corresponding to the 4 Jordan blocks. These 4 vectors are clearly linearly
independent. Moreover there are k = 2 distinct eigenvalues and g1 + g2 =
3 + 1 = 4.
3. Since g_i ≤ a_i for all i and \sum_i a_i = n, we have \sum_i g_i = n if and only if a_i = g_i for i = 1, ..., k.
6.3 The Schur Factorization and Normal Matrices
6.3.1 The Schur Factorization
We turn now to unitary similarity transformations S −1 AS, where S = U is
unitary. Thus S −1 = U ∗ and a unitary similarity transformation takes the form
U ∗ AU .
6.3.2 Unitary and Orthogonal Matrices
Although not every matrix can be diagonalized it can be brought into triangular
form by a unitary similarity transformation.
Theorem 6.5 (Schur Factorization) For each A ∈ Cn×n there exists a unitary
matrix U ∈ Cn×n such that R := U ∗ AU is upper triangular.
The matrices U and R in the Schur factorization are called Schur factors. We
call A = U RU ∗ the Schur factorization of A.
Proof We use induction on n. For n = 1 the matrix U is the 1 × 1 identity matrix.
Assume that the theorem is true for all k ×k matrices, and suppose A ∈ Cn×n , where
n := k + 1. Let (λ1 , v 1 ) be an eigenpair for A with v 1 2 = 1. By Theorem 5.5
we can extend v 1 to an orthonormal basis {v 1 , v 2 , . . . , v n } for Cn . The matrix V :=
[v 1 , . . . , v n ] ∈ Cn×n is unitary, and
V ∗ AV e1 = V ∗ Av 1 = λ1 V ∗ v 1 = λ1 e1 .
It follows that
\[
V^*AV = \begin{pmatrix} λ_1 & x^* \\ 0 & M \end{pmatrix}, \quad\text{for some } M ∈ C^{k×k} \text{ and } x ∈ C^k.
\tag{6.8}
\]
By the induction hypothesis there is a unitary matrix W_1 ∈ C^{(n−1)×(n−1)} such that W_1^*MW_1 is upper triangular. Define
\[
W = \begin{pmatrix} 1 & 0^* \\ 0 & W_1 \end{pmatrix} \quad\text{and}\quad U = VW.
\]
Then W and U are unitary and
\[
U^*AU = W^*(V^*AV)W = \begin{pmatrix} 1 & 0^* \\ 0 & W_1^* \end{pmatrix}\begin{pmatrix} λ_1 & x^* \\ 0 & M \end{pmatrix}\begin{pmatrix} 1 & 0^* \\ 0 & W_1 \end{pmatrix} = \begin{pmatrix} λ_1 & x^*W_1 \\ 0 & W_1^*MW_1 \end{pmatrix}
\]
is upper triangular.
If A has complex eigenvalues then U will be complex even if A is real. The
following is a real version of Theorem 6.5.
Theorem 6.6 (Schur Form, Real Eigenvalues) For each A ∈ Rn×n with real
eigenvalues there exists an orthogonal matrix U ∈ Rn×n such that U T AU is upper
triangular.
Proof Consider the proof of Theorem 6.5. Since A and λ_1 are real the eigenvector v_1 is real, so the matrix V is real and V^TV = I. By the induction hypothesis W_1 is real, hence W is real and W^TW = I. But then also U = VW is real and U^TU = I.
A real matrix with some complex eigenvalues can only be reduced to block
triangular form by a real unitary similarity transformation. We consider this in
Sect. 6.3.5.
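Theorem 6.5 can be explored with MATLAB's schur function; the option 'complex' requests the triangular (possibly complex) Schur form, while the default for a real matrix is the real quasi-triangular form discussed in Sect. 6.3.5.

A = randn(4);
[U, R] = schur(A, 'complex');                              % A = U*R*U', R upper triangular
norm(U'*U - eye(4)), norm(R - triu(R)), norm(U*R*U' - A)   % all of rounding-error size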
Example 6.5 (Deflation Example) By using the unitary transformation V on the n × n matrix A, we obtain a matrix M of order n − 1. M has the same eigenvalues as A except λ. Thus we can find another eigenvalue of A by working with a smaller matrix M. This is an example of a deflation technique which is very useful in numerical work. The second derivative matrix
\[
T := \begin{pmatrix} 2&-1&0\\ -1&2&-1\\ 0&-1&2 \end{pmatrix}
\]
has an eigenpair (2, x_1), where x_1 = [−1, 0, 1]^T. Find the remaining eigenvalues using deflation. For this we extend x_1 to a basis {x_1, x_2, x_3} for R^3 by defining x_2 = [0, 1, 0]^T, x_3 = [1, 0, 1]^T. This is already an orthogonal basis and normalizing we obtain the orthogonal matrix
\[
V = \begin{pmatrix} -\tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \tfrac{1}{\sqrt{2}} & 0 & \tfrac{1}{\sqrt{2}} \end{pmatrix}.
\]
We obtain (6.8) with λ = 2 and
\[
M = \begin{pmatrix} 2 & -\sqrt{2} \\ -\sqrt{2} & 2 \end{pmatrix}.
\]
We can now find the remaining eigenvalues of A from the 2 × 2 matrix M.
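The deflation in Example 6.5 is easy to verify numerically.

T = [2 -1 0; -1 2 -1; 0 -1 2];
V = [-1 0 1; 0 sqrt(2) 0; 1 0 1]/sqrt(2);   % orthogonal, first column = normalized x_1
V'*T*V                                      % block form (6.8) with lambda = 2
eig([2 -sqrt(2); -sqrt(2) 2])               % remaining eigenvalues 2 -/+ sqrt(2)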
6.3.3 Normal Matrices
A matrix A ∈ Cn×n is normal if A∗ A = AA∗ . In this section we show that a matrix
has orthogonal eigenvectors if and only if it is normal.
Examples of normal matrices are
1. A^* = A (Hermitian),
2. A^* = −A (skew-Hermitian),
3. A^* = A^{-1} (unitary),
4. A = diag(d_1, ..., d_n) (diagonal).
Clearly the matrices in 1., 2. and 3. are normal. If A is diagonal then
\[
A^*A = \operatorname{diag}(\bar d_1 d_1, \ldots, \bar d_n d_n) = \operatorname{diag}(|d_1|^2, \ldots, |d_n|^2) = AA^*,
\]
and A is normal. The second derivative matrix T in (2.27) is symmetric and therefore
normal. The eigenvalues of a normal matrix can be complex (cf. Exercise 6.21).
However in the Hermitian case the eigenvalues are real (cf. Lemma 2.3).
The following theorem shows that A has a set of orthogonal eigenvectors if and
only if it is normal.
Theorem 6.7 (Spectral Theorem for Normal Matrices) A matrix A ∈ Cn×n is
normal if and only if there exists a unitary matrix U ∈ Cn×n such that U ∗ AU = D
is diagonal. If D = diag(λ1 , . . . , λn ) and U = [u1 , . . . , un ] then (λj , uj ), j =
1, . . . , n are orthonormal eigenpairs for A.
Proof If B = U ∗ AU , with B diagonal, and U ∗ U = I , then A = U BU ∗ and
AA∗ = (U BU ∗ )(U B ∗ U ∗ ) = U BB ∗ U ∗ and
A∗ A = (U B ∗ U ∗ )(U BU ∗ ) = U B ∗ BU ∗ .
Now BB ∗ = B ∗ B since B is diagonal, and A is normal.
Conversely, suppose A∗ A = AA∗ . By Theorem 6.5 we can find U with U ∗ U =
I such that B := U ∗ AU is upper triangular. Since A is normal B is normal. Indeed,
BB ∗ = U ∗ AU U ∗ A∗ U = U ∗ AA∗ U = U ∗ A∗ AU = B ∗ B.
The proof is complete if we can show that an upper triangular normal matrix B must be diagonal. The diagonal elements e_{ii} in E := B^*B and f_{ii} in F := BB^* are given by
\[
e_{ii} = \sum_{k=1}^{n} \bar b_{ki} b_{ki} = \sum_{k=1}^{i} |b_{ki}|^2
\quad\text{and}\quad
f_{ii} = \sum_{k=1}^{n} b_{ik}\bar b_{ik} = \sum_{k=i}^{n} |b_{ik}|^2.
\]
The result now follows by equating e_{ii} and f_{ii} for i = 1, 2, ..., n. In particular for i = 1 we have |b_{11}|^2 = |b_{11}|^2 + |b_{12}|^2 + · · · + |b_{1n}|^2, so b_{1k} = 0 for k = 2, 3, ..., n. Suppose B is diagonal in its first i − 1 rows, so that b_{jk} = 0 for j = 1, ..., i − 1, k = j + 1, ..., n. Then
\[
e_{ii} = \sum_{k=1}^{i} |b_{ki}|^2 = |b_{ii}|^2 = \sum_{k=i}^{n} |b_{ik}|^2 = f_{ii}
\]
and it follows that b_{ik} = 0, k = i + 1, ..., n. By induction on the rows we see that B is diagonal. The last part of the theorem follows from Sect. 6.1.1.
Example 6.6 The orthogonal diagonalization of A = [2 −1; −1 2] is U^TAU = diag(1, 3), where U = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.
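Checking Example 6.6 in MATLAB:

A = [2 -1; -1 2];
U = [1 1; 1 -1]/sqrt(2);
U'*A*U          % diag(1,3) up to rounding errors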
6.3.4 The Rayleigh Quotient
The Rayleigh quotient is a useful tool when studying eigenvalues.
Definition 6.4 (Rayleigh Quotient) For A ∈ C^{n×n} and a nonzero x the number
\[
R(x) = R_A(x) := \frac{x^*Ax}{x^*x}
\]
is called a Rayleigh quotient. If (λ, x) is an eigenpair for A then R(x) = \frac{x^*Ax}{x^*x} = λ.
Equation (6.9) in the following theorem shows that the Rayleigh quotient of a
normal matrix is a convex combination of its eigenvalues.
Theorem 6.8 (Convex Combination of the Eigenvalues) Suppose A ∈ C^{n×n} is normal with orthonormal eigenpairs (λ_j, u_j), for j = 1, 2, ..., n. Then the Rayleigh quotient is a convex combination of the eigenvalues of A:
\[
R_A(x) = \frac{\sum_{i=1}^{n} λ_i |c_i|^2}{\sum_{j=1}^{n} |c_j|^2}, \qquad x \ne 0, \quad x = \sum_{j=1}^{n} c_j u_j.
\tag{6.9}
\]
Proof By orthonormality of the eigenvectors, x^*x = \sum_{i=1}^{n}\sum_{j=1}^{n} \bar c_i c_j u_i^*u_j = \sum_{j=1}^{n} |c_j|^2. Similarly, x^*Ax = \sum_{i=1}^{n}\sum_{j=1}^{n} \bar c_i c_j λ_j u_i^*u_j = \sum_{i=1}^{n} λ_i |c_i|^2, and (6.9) follows. This is clearly a convex combination, since the weights are nonnegative and \sum_{i=1}^{n} |c_i|^2 / \sum_{j=1}^{n} |c_j|^2 = 1.
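A consequence of (6.9) worth checking numerically: for a normal (here real symmetric) matrix the Rayleigh quotient always lies between the extreme eigenvalues.

A = [2 -1; -1 2];
x = randn(2, 1);
R = (x'*A*x)/(x'*x);
[min(eig(A)), R, max(eig(A))]   % R lies between the extreme eigenvalues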
6.3.5 The Quasi-Triangular Form
How far can we reduce a real matrix A with some complex eigenvalues by a real unitary similarity transformation? To study this we note that the complex eigenvalues of a real matrix occur in conjugate pairs, λ = μ + iν, λ̄ = μ − iν, where μ, ν are real. The real 2 × 2 matrix
\[
M = \begin{pmatrix} μ & ν \\ -ν & μ \end{pmatrix}
\tag{6.10}
\]
has eigenvalues λ = μ + iν and λ̄ = μ − iν.
Definition 6.5 (Quasi-Triangular Matrix) We say that a matrix is quasitriangular if it is block triangular with only 1 × 1 and 2 × 2 blocks on the
diagonal. Moreover, no 2 × 2 block should have real eigenvalues.
As an example consider the matrix
\[
R := \begin{pmatrix} D_1 & R_{1,2} & R_{1,3} \\ 0 & D_2 & R_{2,3} \\ 0 & 0 & D_3 \end{pmatrix}, \quad
D_1 := \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}, \quad
D_2 := \begin{pmatrix} 1 \end{pmatrix}, \quad
D_3 := \begin{pmatrix} 3 & 2 \\ -1 & 1 \end{pmatrix}.
\]
Since R is block triangular the characteristic polynomial of R is given by π_R = π_{D_1} π_{D_2} π_{D_3}. We find
\[
π_{D_1}(λ) = π_{D_3}(λ) = λ^2 - 4λ + 5, \qquad π_{D_2}(λ) = λ - 1,
\]
and the eigenvalues of D_1 and D_3 are λ_1 = 2 + i, λ_2 = 2 − i, while D_2 obviously has the eigenvalue λ = 1.
Any A ∈ Rn×n can be reduced to quasi-triangular form by a real orthogonal
similarity transformation. For a proof see [16]. We will encounter the quasi-triangular form in Chap. 15.
6.3.6 Hermitian Matrices
The special cases where A is Hermitian, or real and symmetric, deserve special
attention.
Theorem 6.9 (Spectral Theorem, Complex Form) Suppose A ∈ Cn×n is Hermitian. Then A has real eigenvalues λ1 , . . . , λn . Moreover, there is a unitary matrix
U ∈ Cn×n such that
U ∗ AU = diag(λ1 , . . . , λn ).
For any such U the columns {u1 , . . . , un } of U are orthonormal eigenvectors of A
and Auj = λj uj for j = 1, . . . , n.
Proof That the eigenvalues are real was shown in Lemma 2.3. The rest follows from
Theorem 6.7.
There is also a real version.
Theorem 6.10 (Spectral Theorem (Real Form)) Suppose A ∈ Rn×n is symmetric. Then A has real eigenvalues λ1 , λ2 , . . . , λn . Moreover, there is an orthogonal
matrix U ∈ Rn×n such that
U T AU = diag(λ1 , λ2 , . . . , λn ).
For any such U the columns {u1 , . . . , un } of U are orthonormal eigenvectors of A
and Auj = λj uj for j = 1, . . . , n.
Proof Since a real symmetric matrix has real eigenvalues and eigenvectors this
follows from Theorem 6.9.
6.4 Minmax Theorems
There are some useful characterizations of the eigenvalues of a Hermitian matrix in terms of the Rayleigh quotient R(x) = R_A(x) := \frac{x^*Ax}{x^*x}. First we show

Theorem 6.11 (Minmax) Suppose A ∈ C^{n×n} is Hermitian with orthonormal eigenpairs (λ_j, u_j), 1 ≤ j ≤ n, ordered so that λ_1 ≥ · · · ≥ λ_n. Let 1 ≤ k ≤ n. For any subspace S of C^n of dimension n − k + 1,
\[
λ_k \le \max_{x ∈ S,\; x \ne 0} R(x),
\tag{6.11}
\]
with equality for S = S̃ := span(u_k, ..., u_n) and x = u_k.

Proof Let S be any subspace of C^n of dimension n − k + 1 and define S' := span(u_1, ..., u_k). It is enough to find y ∈ S so that R(y) ≥ λ_k. Now S + S' := {s + s' : s ∈ S, s' ∈ S'} is a subspace of C^n and by (1.7)
\[
\dim(S ∩ S') = \dim(S) + \dim(S') - \dim(S + S') \ge (n - k + 1) + k - n = 1.
\]
It follows that S ∩ S' is nonempty. Let y ∈ S ∩ S', y = \sum_{j=1}^{k} c_j u_j with \sum_{j=1}^{k} |c_j|^2 = 1. Defining c_j = 0 for k + 1 ≤ j ≤ n, we obtain by Theorem 6.8
\[
\max_{x ∈ S,\; x \ne 0} R(x) \ge R(y) = \sum_{j=1}^{n} λ_j |c_j|^2 = \sum_{j=1}^{k} λ_j |c_j|^2 \ge \sum_{j=1}^{k} λ_k |c_j|^2 = λ_k,
\]
and (6.11) follows. To show equality suppose z ∈ S = S̃. Now z = \sum_{j=k}^{n} d_j u_j for some d_k, ..., d_n with \sum_{j=k}^{n} |d_j|^2 = 1, and by Theorem 6.8, R(z) = \sum_{j=k}^{n} λ_j |d_j|^2 ≤ λ_k. Since z ∈ S̃ is arbitrary we have \max_{x ∈ S̃,\, x \ne 0} R(x) ≤ λ_k and equality in (6.11) follows for S = S̃. Moreover, R(u_k) = λ_k.
There is also a maxmin version of this result.

Theorem 6.12 (Maxmin) Suppose A ∈ C^{n×n} is Hermitian with eigenvalues λ_1, ..., λ_n, ordered so that λ_1 ≥ · · · ≥ λ_n. Let 1 ≤ k ≤ n. For any subspace S of C^n of dimension k,
\[
λ_k \ge \min_{x ∈ S,\; x \ne 0} R(x),
\tag{6.12}
\]
with equality for S = S̃ := span(u_1, ..., u_k) and x = u_k. Here (λ_j, u_j), 1 ≤ j ≤ n, are orthonormal eigenpairs for A.

Proof The proof is very similar to the proof of Theorem 6.11. We define S' := span(u_k, ..., u_n) and show that R(y) ≤ λ_k for some y ∈ S ∩ S'. It is easy to see that R(y) ≥ λ_k for any y ∈ S̃.
These theorems immediately lead to classical minmax and maxmin characterizations.

Corollary 6.1 (The Courant-Fischer Theorem) Suppose A ∈ C^{n×n} is Hermitian with eigenvalues λ_1, ..., λ_n, ordered so that λ_1 ≥ · · · ≥ λ_n. Then
\[
λ_k = \min_{\dim(S) = n-k+1}\;\max_{x ∈ S,\; x \ne 0} R(x) = \max_{\dim(S) = k}\;\min_{x ∈ S,\; x \ne 0} R(x), \qquad k = 1, \ldots, n.
\tag{6.13}
\]
Using Theorem 6.11 we can prove inequalities of eigenvalues without knowing
the eigenvectors and we can get both upper and lower bounds.
Theorem 6.13 (Eigenvalue Perturbation for Hermitian Matrices) Let A, B ∈ C^{n×n} be Hermitian with eigenvalues α_1 ≥ α_2 ≥ · · · ≥ α_n and β_1 ≥ β_2 ≥ · · · ≥ β_n. Then
\[
α_k + ε_n \le β_k \le α_k + ε_1, \qquad k = 1, \ldots, n,
\tag{6.14}
\]
where ε_1 ≥ ε_2 ≥ · · · ≥ ε_n are the eigenvalues of E := B − A.

Proof Since E is a difference of Hermitian matrices it is Hermitian and its eigenvalues are real. Let (α_j, u_j), j = 1, ..., n, be orthonormal eigenpairs for A and let S := span{u_k, ..., u_n}. By Theorem 6.11 we obtain
\[
β_k \le \max_{x ∈ S,\; x \ne 0} R_B(x) \le \max_{x ∈ S,\; x \ne 0} R_A(x) + \max_{x ∈ S,\; x \ne 0} R_E(x)
\le \max_{x ∈ S,\; x \ne 0} R_A(x) + \max_{x ∈ C^n,\; x \ne 0} R_E(x) = α_k + ε_1,
\]
and this proves the upper inequality. For the lower one we define D := −E and observe that −ε_n is the largest eigenvalue of D. Since A = B + D it follows from the result just proved that α_k ≤ β_k − ε_n, which is the same as the lower inequality.

In many applications of this result the eigenvalues of the matrix E will be small
and then the theorem states that the eigenvalues of B are close to those of A.
Moreover, it associates a unique eigenvalue of A with each eigenvalue of B.
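A numerical check of (6.14) with a random symmetric perturbation:

n = 5;
A = randn(n); A = (A + A')/2;             % symmetric A
E = 1e-3*randn(n); E = (E + E')/2;        % small symmetric perturbation
B = A + E;
alpha = sort(eig(A), 'descend'); beta = sort(eig(B), 'descend');
eps1 = max(eig(E)); epsn = min(eig(E));
[min(beta - alpha - epsn), max(beta - alpha - eps1)]   % first >= 0, second <= 0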
6.4.1 The Hoffman-Wielandt Theorem
We can also give a bound involving all eigenvalues. The following theorem shows
that the eigenvalue problem for a normal matrix is well conditioned.
Theorem 6.14 (Hoffman-Wielandt Theorem) Suppose A, B ∈ C^{n×n} are both normal matrices with eigenvalues λ_1, ..., λ_n and μ_1, ..., μ_n, respectively. Then there is a permutation i_1, ..., i_n of 1, 2, ..., n such that
\[
\sum_{j=1}^{n} |μ_{i_j} - λ_j|^2 \le \sum_{i=1}^{n}\sum_{j=1}^{n} |a_{ij} - b_{ij}|^2.
\tag{6.15}
\]
For a proof of this theorem see [19, p. 190]. For a Hermitian matrix we can use the identity permutation if we order both sets of eigenvalues in nonincreasing or nondecreasing order.
6.5 Left Eigenvectors
Definition 6.6 (Left and Right Eigenpairs) Suppose A ∈ Cn×n is a square matrix,
λ ∈ C and y ∈ C^n is nonzero. We say that (λ, y) is a left eigenpair for A if y^*A = λy^*, or equivalently A^*y = λ̄y. We say that (λ, y) is a right eigenpair for A if Ay = λy. If (λ, y) is a left eigenpair then λ is called a left eigenvalue and y a left eigenvector. Similarly if (λ, y) is a right eigenpair then λ is called a right eigenvalue and y a right eigenvector.
In this book an eigenpair will always mean a right eigenpair. A left eigenvector is an eigenvector of A^*. If λ is a left eigenvalue of A then λ̄ is an eigenvalue of A^* and then λ is an eigenvalue of A (cf. Exercise 6.3). Thus left and right eigenvalues
are identical, but left and right eigenvectors are in general different. For a Hermitian
matrix the right and left eigenpairs are the same.
Using right and left linearly independent eigenpairs we get some useful eigenvector expansions.

Theorem 6.15 (Biorthogonal Eigenvector Expansion) If A ∈ C^{n×n} has linearly independent right eigenvectors {x_1, ..., x_n} then there exists a set of left eigenvectors {y_1, ..., y_n} with y_i^*x_j = δ_{i,j}. Conversely, if A ∈ C^{n×n} has linearly independent left eigenvectors {y_1, ..., y_n} then there exists a set of right eigenvectors {x_1, ..., x_n} with y_i^*x_j = δ_{i,j}. For any scaling of these sets we have the eigenvector expansions
\[
v = \sum_{j=1}^{n} \frac{y_j^* v}{y_j^* x_j}\, x_j = \sum_{k=1}^{n} \frac{x_k^* v}{y_k^* x_k}\, y_k, \qquad v ∈ C^n.
\tag{6.16}
\]
Proof For any right eigenpairs (λ_1, x_1), ..., (λ_n, x_n) and left eigenpairs (λ_1, y_1), ..., (λ_n, y_n) of A we have AX = XD, Y^*A = DY^*, where
\[
X := [x_1, \ldots, x_n], \qquad Y := [y_1, \ldots, y_n], \qquad D := \operatorname{diag}(λ_1, \ldots, λ_n).
\]
Suppose X is nonsingular. Then AX = XD ⇒ A = XDX^{-1} ⇒ X^{-1}A = DX^{-1}, and it follows that Y^* := X^{-1} contains a collection of left eigenvectors such that Y^*X = I. Thus the columns of Y are linearly independent and y_i^*x_j = δ_{i,j}. Similarly, if Y is nonsingular then AY^{-*} = Y^{-*}D and it follows that X := Y^{-*} contains a collection of linearly independent right eigenvectors such that Y^*X = I. If v = \sum_{j=1}^{n} c_j x_j then y_i^*v = \sum_{j=1}^{n} c_j y_i^*x_j = c_i y_i^*x_i, so c_i = y_i^*v/(y_i^*x_i) for i = 1, ..., n, and the first expansion in (6.16) follows. The second expansion follows similarly.

For a Hermitian matrix the right eigenvectors {x_1, ..., x_n} are also left eigenvectors and (6.16) takes the form
\[
v = \sum_{j=1}^{n} \frac{x_j^* v}{x_j^* x_j}\, x_j.
\tag{6.17}
\]
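The construction Y^* = X^{-1} from this proof is easily reproduced numerically for a matrix with linearly independent eigenvectors.

A = [1 2; 3 2];
[X, D] = eig(A);        % right eigenvectors
Y = inv(X)';            % columns of Y are left eigenvectors
norm(Y'*A - D*Y')       % y_i'*A = lambda_i*y_i', up to rounding
Y'*X                    % identity: biorthogonality y_i'*x_j = delta_ij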
6.5.1 Biorthogonality
Left- and right eigenvectors corresponding to distinct eigenvalues are orthogonal.
Theorem 6.16 (Biorthogonality) Suppose (μ, y) and (λ, x) are left and right
eigenpairs of A ∈ Cn×n . If λ = μ then y ∗ x = 0.
Proof Using the eigenpair relation in two ways we obtain y ∗ Ax = λy ∗ x = μy ∗ x
and we conclude that y ∗ x = 0.
Right and left eigenvectors corresponding to the same eigenvalue are sometimes
orthogonal, sometimes not.
Theorem 6.17 (Simple Eigenvalue) Suppose (λ, x) and (λ, y) are right and left eigenpairs of A ∈ C^{n×n}. If λ has algebraic multiplicity one then y^*x ≠ 0.

Proof Assume that ‖x‖_2 = 1. We have (cf. (6.8))
\[
V^*AV = \begin{pmatrix} λ & z^* \\ 0 & M \end{pmatrix},
\]
where V is unitary and Ve_1 = x. We show that if y^*x = 0 then λ is also an eigenvalue of M, contradicting the multiplicity assumption on λ. Let u := V^*y. Then
\[
(V^*A^*V)u = V^*A^*y = \bar λ V^*y = \bar λ u,
\]
so (\bar λ, u) is an eigenpair of V^*A^*V. But then y^*x = u^*V^*Ve_1 = u^*e_1. Suppose that u^*e_1 = 0, i.e., u = \begin{pmatrix} 0 \\ v \end{pmatrix} for some nonzero v ∈ C^{n−1}. Then
\[
V^*A^*Vu = \begin{pmatrix} \bar λ & 0^* \\ z & M^* \end{pmatrix}\begin{pmatrix} 0 \\ v \end{pmatrix} = \begin{pmatrix} 0 \\ M^*v \end{pmatrix} = \bar λ\begin{pmatrix} 0 \\ v \end{pmatrix}
\]
and λ is an eigenvalue of M.

The case with multiple eigenvalues is more complicated. For example, the matrix A := [1 1; 0 1] has one eigenvalue λ = 1 of algebraic multiplicity two, one right eigenvector x = e_1 and one left eigenvector y = e_2. Thus x and y are orthogonal.
6.6 Exercises Chap. 6
6.6.1 Exercises Sect. 6.1
Exercise 6.1 (Eigenvalues of a Block Triangular Matrix) What are the eigenvalues of the matrix
\[
\begin{pmatrix}
2&1&0&0&0&0&0&0\\
0&2&1&0&0&0&0&0\\
0&0&2&0&0&0&0&0\\
0&0&0&2&0&0&0&0\\
0&0&0&1&2&0&0&0\\
0&0&0&0&0&2&0&0\\
0&0&0&0&0&0&3&0\\
0&0&0&0&0&0&1&3
\end{pmatrix} ∈ R^{8×8}\,?
\]
Exercise 6.2 (Characteristic Polynomial of Transpose) We have det(B^T) = det(B) and det(B̄) = \overline{\det(B)} for any square matrix B. Use this to show that
a) π_{A^T} = π_A,
b) π_{A^*}(λ) = \overline{π_A(\bar λ)}.
Exercise 6.3 (Characteristic Polynomial of Inverse) Suppose (λ, x) is an eigenpair for A ∈ Cn×n . Show that
a) If A is nonsingular then (λ−1 , x) is an eigenpair for A−1 .
b) (λk , x) is an eigenpair for Ak for k ∈ Z.
Exercise 6.4 (The Power of the Eigenvector Expansion) Show that if A ∈ C^{n×n} is nondefective with eigenpairs (λ_j, x_j), j = 1, ..., n, then for any x ∈ C^n and k ∈ N
\[
A^k x = \sum_{j=1}^{n} c_j λ_j^k x_j \quad\text{for some scalars } c_1, \ldots, c_n.
\tag{6.18}
\]
Show that if A is nonsingular then (6.18) holds for all k ∈ Z.
Exercise 6.5 (Eigenvalues of an Idempotent Matrix) Let λ ∈ σ (A) where A2 =
A ∈ Cn×n . Show that λ = 0 or λ = 1. (A matrix is called idempotent if A2 = A).
Exercise 6.6 (Eigenvalues of a Nilpotent Matrix) Let λ ∈ σ (A) where Ak = 0
for some k ∈ N. Show that λ = 0. (A matrix A ∈ Cn×n such that Ak = 0 for some
k ∈ N is called nilpotent).
Exercise 6.7 (Eigenvalues of a Unitary Matrix) Let λ ∈ σ (A), where A∗ A = I .
Show that |λ| = 1.
Exercise 6.8 (Nonsingular Approximation of a Singular Matrix) Suppose A ∈ C^{n×n} is singular. Then we can find ε_0 > 0 such that A + εI is nonsingular for all ε ∈ C with |ε| < ε_0. Hint: det(A) = λ_1 λ_2 · · · λ_n, where λ_i are the eigenvalues of A.
Exercise 6.9 (Companion Matrix) For q_0, ..., q_{n−1} ∈ C let p(λ) = λ^n + q_{n−1}λ^{n−1} + · · · + q_0 be a polynomial of degree n in λ. We derive two matrices that have (−1)^n p as their characteristic polynomial.
a) Show that p = (−1)^n π_A where
\[
A = \begin{pmatrix}
-q_{n-1} & -q_{n-2} & \cdots & -q_1 & -q_0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.
\]
A is called a companion matrix of p.
b) Show that p = (−1)^n π_B where
\[
B = \begin{pmatrix}
0 & 0 & \cdots & 0 & -q_0 \\
1 & 0 & \cdots & 0 & -q_1 \\
0 & 1 & \cdots & 0 & -q_2 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & -q_{n-1}
\end{pmatrix}.
\]
Thus B can also be regarded as a companion matrix for p.
Exercise 6.10 (Find Eigenpair Example) Find eigenvalues and eigenvectors of
\[
A = \begin{pmatrix} 1&2&3\\ 0&2&3\\ 0&0&2 \end{pmatrix}.
\]
Is A defective?
Exercise 6.11 (Right or Wrong? (Exam Exercise 2005-1)) Decide if the following statements are right or wrong. Give supporting arguments for your decisions.
a) The matrix
\[
A = \frac{1}{6}\begin{pmatrix} 3 & 4 \\ 4 & -3 \end{pmatrix}
\]
is orthogonal?
b) Let
\[
A = \begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix}
\]
where a ∈ R. There is a nonsingular matrix Y ∈ R^{2×2} and a diagonal matrix D ∈ R^{2×2} such that A = YDY^{-1}?
Exercise 6.12 (Eigenvalues of Tridiagonal Matrix (Exam Exercise 2009-3)) Let A ∈ R^{n×n} be tridiagonal (i.e. a_{ij} = 0 when |i − j| > 1) and suppose also that a_{i+1,i}a_{i,i+1} > 0 for i = 1, ..., n − 1. Show that the eigenvalues of A are real. (Hint: show that there is a diagonal matrix D such that D^{-1}AD is symmetric.)
6.6.2 Exercises Sect. 6.2
Exercise 6.13 (Jordan Example) For the Jordan factorization of the matrix
\[
A = \begin{pmatrix} 3&0&1\\ -4&1&-2\\ -4&0&-1 \end{pmatrix}
\quad\text{we have}\quad
J = \begin{pmatrix} 1&1&0\\ 0&1&0\\ 0&0&1 \end{pmatrix}.
\]
Find S.

Exercise 6.14 (A Nilpotent Matrix) Show that
\[
(J_m(λ) - λI)^r = \begin{pmatrix} 0 & I_{m-r} \\ 0 & 0 \end{pmatrix}
\]
for 1 ≤ r ≤ m − 1, and conclude that (J_m(λ) − λI)^m = 0.
Exercise 6.15 (Properties of the Jordan Factorization) Let J be the Jordan factorization of a matrix A ∈ C^{n×n} as given in Theorem 6.4. Then for r = 0, 1, 2, ..., m = 2, 3, ..., and any λ ∈ C
a) A^r = SJ^rS^{-1},
b) J^r = diag(U_1^r, ..., U_k^r),
c) U_i^r = diag(J_{m_{i,1}}(λ_i)^r, ..., J_{m_{i,g_i}}(λ_i)^r),
d) J_m(λ)^r = (E_m + λI_m)^r = \sum_{k=0}^{\min\{r, m-1\}} \binom{r}{k} λ^{r-k} E_m^k.
Exercise 6.16 (Powers of a Jordan Block) Find J 100 and A100 for the matrix in
Exercise 6.13.
Exercise 6.17 (The Minimal Polynomial) Let J be the Jordan factorization of a matrix A ∈ C^{n×n} as given in Theorem 6.4. The polynomial
\[
μ_A(λ) := \prod_{i=1}^{k} (λ_i - λ)^{m_i}, \quad\text{where } m_i := \max_{1 \le j \le g_i} m_{i,j},
\tag{6.19}
\]
is called the minimal polynomial of A. We define the matrix polynomial μ_A(A) by replacing the factors λ_i − λ by λ_iI − A.
a) We have π_A(λ) = \prod_{i=1}^{k}\prod_{j=1}^{g_i} (λ_i - λ)^{m_{i,j}}. Use this to show that the minimal polynomial divides the characteristic polynomial, i.e., π_A = μ_A ν_A for some polynomial ν_A.
b) Show that μ_A(A) = 0 ⟺ μ_A(J) = 0.
c) (can be difficult) Use Exercises 6.14, 6.15 and the maximality of m_i to show that μ_A(A) = 0. Thus a matrix satisfies its minimal equation. Finally show that the degree of any polynomial p such that p(A) = 0 is at least as large as the degree of the minimal polynomial.
d) Use the previous parts to show the Cayley-Hamilton Theorem, which says that a matrix satisfies its characteristic equation π_A(A) = 0.
Exercise 6.18 (Cayley Hamilton Theorem (Exam Exercise 1996-3)) Suppose p is a polynomial given by $p(t) := \sum_{j=0}^{r} b_j t^j$, where b_j ∈ C and A ∈ C^{n×n}. We define the matrix p(A) ∈ C^{n×n} by
$$p(A) := \sum_{j=0}^{r} b_j A^j,$$
where A⁰ := I. From this it follows that if p(t) := (t − α₁)⋯(t − α_r) for some α₁, ..., α_r ∈ C then p(A) = (A − α₁I)⋯(A − α_rI). We accept this without proof. Let U^*AU = T, where U is unitary and T is upper triangular with the eigenvalues of A on the diagonal.
a) Find the characteristic polynomial π_A of $A = \begin{bmatrix} 2 & 1 \\ -1 & 4 \end{bmatrix}$. Show that π_A(A) = 0.
b) Let now A ∈ C^{n×n} be arbitrary. For any polynomial p show that p(A) = U p(T) U^*.
c) Let n, k ∈ N with 1 ≤ k < n. Let C, D ∈ C^{n×n} be upper triangular. Moreover, c_{i,j} = 0 for i, j ≤ k and d_{k+1,k+1} = 0. Define E := CD and show that e_{i,j} = 0 for i, j ≤ k + 1.
d) Now let p := π_A be the characteristic polynomial of A. Show that p(T) = 0 (hint: use a suitable factorization of p and use c)). Then show that p(A) = 0. (Cayley-Hamilton Theorem)
6.6.3 Exercises Sect. 6.3
Exercise 6.19 (Schur Factorization Example) Show that a Schur factorization of $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}$ is $U^T A U = \begin{bmatrix} -1 & 1 \\ 0 & 4 \end{bmatrix}$, where $U = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}$.
Exercise 6.20 (Skew-Hermitian Matrix) Suppose C = A + iB, where A, B ∈ R^{n×n}. Show that C is skew-Hermitian if and only if A^T = −A and B^T = B.
Exercise 6.21 (Eigenvalues of a Skew-Hermitian Matrix) Show that any eigenvalue of a skew-Hermitian matrix is purely imaginary.
Exercise 6.22 (Eigenvector Expansion Using Orthogonal Eigenvectors) Show that if the eigenpairs (λ_1, u_1), ..., (λ_n, u_n) of A ∈ C^{n×n} are orthogonal, i.e., u_j^* u_k = 0 for j ≠ k, then the eigenvector expansions of x and Ax ∈ C^n take the form
$$x = \sum_{j=1}^{n} c_j u_j, \qquad Ax = \sum_{j=1}^{n} c_j \lambda_j u_j, \qquad \text{where } c_j = \frac{u_j^* x}{u_j^* u_j}. \tag{6.20}$$
Exercise 6.23 (Rayleigh Quotient (Exam Exercise 2015-3))
a) Let A ∈ R^{n×n} be a symmetric matrix. Explain how we can use the spectral theorem for symmetric matrices to show that
$$\lambda_{\min} = \min_{x \ne 0} R(x) = \min_{\|x\|_2 = 1} R(x),$$
where λ_min is the smallest eigenvalue of A, and R(x) is the Rayleigh quotient given by
$$R(x) := \frac{x^T A x}{x^T x}.$$
b) Let x, y ∈ R^n such that ‖x‖_2 = 1 and y ≠ 0. Show that
$$R(x - ty) = R(x) - 2t\,\big(Ax - R(x)x\big)^T y + O(t^2),$$
where t > 0 is small. (Hint: use Taylor's theorem for the function f(t) = R(x − ty).)
c) Based on the characterization given in a) above it is tempting to develop an algorithm for computing λ_min by approximating the minimum of R(x) over the unit ball
$$B_1 := \{x \in \mathbb{R}^n \mid \|x\|_2 = 1\}.$$
Assume that x_0 ∈ B_1 satisfies Ax_0 − R(x_0)x_0 ≠ 0, i.e., (R(x_0), x_0) is not an eigenpair for A. Explain how we can find a vector x_1 ∈ B_1 such that R(x_1) < R(x_0).
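A small MATLAB illustration of part a); the symmetric test matrix and the number of random samples are assumptions made only for this sketch.

    A = [4 1 0; 1 3 1; 0 1 2];             % assumed symmetric test matrix
    X = randn(3,10000);                    % random nonzero directions
    Rvals = sum(X.*(A*X)) ./ sum(X.*X);    % Rayleigh quotients R(x), one per column
    [min(Rvals), min(eig(A))]              % sampled minimum approaches lambda_min from above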
6.6.4 Exercises Sect. 6.4
Exercise 6.24 (Eigenvalue Perturbation for Hermitian Matrices) Show that in
Theorem 6.13, if E is symmetric positive semidefinite then βi ≥ αi .
Exercise 6.25 (Hoffman-Wielandt) Show that (6.15) does not hold for the matrices
$$A := \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix}, \qquad B := \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}.$$
Why does this not contradict the Hoffman-Wielandt theorem?
Exercise 6.26 (Biorthogonal Expansion) Determine right and left eigenpairs for the matrix $A := \begin{bmatrix} 3 & 2 \\ 1 & 2 \end{bmatrix}$ and the two expansions in (6.16) for any v ∈ R².
Exercise 6.27 (Generalized Rayleigh Quotient) For A ∈ C^{n×n} and any y, x ∈ C^n with y^*x ≠ 0 the quantity
$$R(y, x) = R_A(y, x) := \frac{y^* A x}{y^* x}$$
is called a generalized Rayleigh quotient for A. Show that if (λ, x) is a right eigenpair for A then R(y, x) = λ for any y with y^*x ≠ 0. Also show that if (λ, y) is a left eigenpair for A then R(y, x) = λ for any x with y^*x ≠ 0.
6.7 Review Questions
6.7.1 Do A, A^T and A^* have the same eigenvalues? What about A^*A and AA^*?
6.7.2 Can a matrix with multiple eigenvalues be similar to a diagonal matrix?
6.7.3 What is the geometric multiplicity of an eigenvalue? Can it be bigger than the algebraic multiplicity?
6.7.4 What is the Jordan factorization of a matrix?
6.7.5 What are the eigenvalues of a diagonal matrix?
6.7.6 What are the Schur factors of a matrix?
6.7.7 What is a quasi-triangular matrix?
6.7.8 Give some classes of normal matrices. Why are normal matrices important?
6.7.9 State the Courant-Fischer theorem.
6.7.10 State the Hoffman-Wielandt theorem for Hermitian matrices.
6.7.11 What is a left eigenvector of a matrix?
Chapter 7
The Singular Value Decomposition
The singular value decomposition and the reduced form called the singular value
factorization are useful both for theory and practice. Some of their applications
include solving over-determined equations, principal component analysis in statistics, numerical determination of the rank of a matrix, algorithms used in search
engines, and the theory of matrices.
We know from Theorem 6.7 that a square matrix A can be diagonalized by
a unitary similarity transformation if and only if it is normal, that is A∗ A =
AA∗ . In particular, if A ∈ Cn×n is normal then it has a set of orthonormal
eigenpairs (λ1 , u1 ), . . . , (λn , un ). Letting U := [u1 , . . . , un ] ∈ Cn×n and D :=
diag(λ1 , . . . , λn ) we have the spectral decomposition
$$A = U D U^*, \qquad \text{where } U^* U = I. \tag{7.1}$$
The singular value decomposition (SVD) is a decomposition of a matrix in the form
A = U Σ V ∗ , where U and V are unitary, and Σ is a nonnegative diagonal matrix,
i.e., Σ_{ij} = 0 for all i ≠ j and Σ_{ii} ≥ 0 for all i. The diagonal elements σ_i := Σ_{ii} are
called singular values, while the columns of U and V are called singular vectors.
To be a singular value decomposition the singular values should be ordered, i.e.,
σi ≥ σi+1 for all i.
Example 7.1 (SVD) The following is a singular value decomposition of a rectangular matrix:
$$A = \frac{1}{15}\begin{bmatrix} 14 & 2 \\ 4 & 22 \\ 16 & 13 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 & 2 & 2 \\ 2 & -2 & 1 \\ 2 & 1 & -2 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix} = U\Sigma V^*. \tag{7.2}$$
Indeed, U and V are unitary since the columns (the singular vectors) are orthonormal, and Σ is a nonnegative diagonal matrix with singular values σ1 = 2 and
σ2 = 1.
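The factors in Example 7.1 can be reproduced with MATLAB's built-in svd; the columns of U and V may differ in sign from those displayed above.

    A = [14 2; 4 22; 16 13]/15;
    [U,S,V] = svd(A);
    diag(S)                    % singular values 2 and 1
    norm(U*S*V' - A)           % of the order of machine precision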
7.1 The SVD Always Exists
The singular value decomposition is closely related to the eigenpairs of A∗ A and
AA∗ .
7.1.1 The Matrices A∗ A, AA∗
To start we show that bases for the four fundamental subspaces R(A), N (A),
R(A∗ ) and N (A∗ ) of a matrix A can be determined from the eigenpairs of A∗ A
and AA∗ .
Theorem 7.1 (The Matrices A∗ A, AA∗ ) Suppose m, n ∈ N and A ∈ Cm×n .
1. The matrices A∗ A ∈ Cn×n and AA∗ ∈ Cm×m have the same nonzero eigenvalues with the same algebraic multiplicities. Moreover the extra eigenvalues of the
larger matrix are all zero.
2. The matrices A∗ A and AA∗ are Hermitian with nonnegative eigenvalues.
3. Let (λj , v j ) be orthonormal eigenpairs for A∗ A with
λ1 ≥ · · · ≥ λr > 0 = λr+1 = · · · = λn .
Then {Av 1 , . . . , Av r } is an orthogonal basis for the column space R(A) :=
{Ay ∈ Cm : y ∈ Cn } and {v r+1 , . . . , v n } is an orthonormal basis for the
nullspace N (A) := {y ∈ Cn : Ay = 0}.
4. Let (λj , uj ) be orthonormal eigenpairs for AA∗ . If λj > 0, j = 1, . . . , r and
λj = 0, j = r + 1, . . . , m then {A∗ u1 , . . . , A∗ ur } is an orthogonal basis for
the column space R(A∗ ) and {ur+1 , . . . , um } is an orthonormal basis for the
nullspace N (A∗ ).
5. The rank of A equals the number of positive eigenvalues of A∗ A and AA∗ .
Proof
1. Consider the characteristic polynomials π_{A^*A} and π_{AA^*}. By (6.1) we have
$$\lambda^m \pi_{A^*A}(\lambda) = \lambda^n \pi_{AA^*}(\lambda), \qquad \lambda \in \mathbb{C},$$
and the claim follows.
2. The matrices A^*A and AA^* are Hermitian and positive semidefinite and therefore have nonnegative eigenvalues (cf. Lemmas 4.2 and 4.5). Moreover, if A^*Av = λv with v ≠ 0, then
$$\lambda = \frac{v^* A^* A v}{v^* v} = \frac{\|Av\|_2^2}{\|v\|_2^2} \ge 0. \tag{7.3}$$
3. By orthonormality of v_1, ..., v_n we have (Av_j)^* Av_k = v_j^* A^* A v_k = λ_k v_j^* v_k = 0 for j ≠ k, showing that Av_1, ..., Av_n are orthogonal vectors. Moreover, (7.3) implies that Av_1, ..., Av_r are nonzero and Av_j = 0 for j = r+1, ..., n. In particular, the elements of {Av_1, ..., Av_r} and {v_{r+1}, ..., v_n} are linearly independent vectors in R(A) and N(A), respectively. The proof will be complete once it is shown that R(A) ⊂ span(Av_1, ..., Av_r) and N(A) ⊂ span(v_{r+1}, ..., v_n). Suppose x ∈ R(A). Then x = Ay for some y ∈ C^n. Let $y = \sum_{j=1}^n c_j v_j$ be an eigenvector expansion of y. Since Av_j = 0 for j = r+1, ..., n we obtain $x = Ay = \sum_{j=1}^n c_j Av_j = \sum_{j=1}^r c_j Av_j \in \operatorname{span}(Av_1, \dots, Av_r)$. Finally, if $y = \sum_{j=1}^n c_j v_j \in N(A)$, then we have $Ay = \sum_{j=1}^r c_j Av_j = 0$, and c_1 = ⋯ = c_r = 0 since Av_1, ..., Av_r are linearly independent. But then $y = \sum_{j=r+1}^n c_j v_j \in \operatorname{span}(v_{r+1}, \dots, v_n)$.
4. Since AA^* = B^*B with B := A^*, this follows from part 3 applied to B.
5. By parts 1 and 2, A^*A and AA^* have the same number r of positive eigenvalues, and by parts 3 and 4, r is the rank of A.
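Part 1 of the theorem is easy to check numerically; the random 5-by-3 matrix below is an assumption used only for illustration.

    A = randn(5,3);
    sort(eig(A'*A))            % three nonnegative eigenvalues
    sort(eig(A*A'))            % same nonzero eigenvalues plus two (numerically) zero ones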
The following theorem shows, in a constructive way, that any matrix has a
singular value decomposition.
Theorem 7.2 (Existence of SVD) Suppose for m, n, r ∈ N that A ∈ C^{m×n} has rank r, and that (λ_j, v_j) are orthonormal eigenpairs for A^*A with λ_1 ≥ ⋯ ≥ λ_r > 0 = λ_{r+1} = ⋯ = λ_n. Define
1. V := [v_1, ..., v_n] ∈ C^{n×n},
2. Σ ∈ R^{m×n} a diagonal matrix with diagonal elements $\sigma_j := \sqrt{\lambda_j}$ for j = 1, ..., min(m, n),
3. U := [u_1, ..., u_m] ∈ C^{m×m}, where u_j := σ_j^{-1} A v_j for j = 1, ..., r and u_{r+1}, ..., u_m is any extension of u_1, ..., u_r to an orthonormal basis u_1, ..., u_m for C^m.
Then A = UΣV^* is a singular value decomposition of A.
Proof Let U, Σ, V be as in the theorem. The vectors u_1, ..., u_r are orthonormal since Av_1, ..., Av_r are orthogonal and σ_j = ‖Av_j‖_2 > 0, j = 1, ..., r, by (7.3). But then U and V are unitary and Σ is a nonnegative diagonal matrix. Moreover,
$$U\Sigma = U[\sigma_1 e_1, \dots, \sigma_r e_r, 0, \dots, 0] = [\sigma_1 u_1, \dots, \sigma_r u_r, 0, \dots, 0] = [Av_1, \dots, Av_n].$$
Thus U Σ = AV and since V is square and unitary we find U ΣV ∗ = AV V ∗ = A
and we have an SVD of A with σ1 ≥ σ2 ≥ · · · ≥ σr .
Example 7.2 (Find SVD) To derive the SVD in (7.2), where $A = \frac{1}{15}\begin{bmatrix} 14 & 2 \\ 4 & 22 \\ 16 & 13 \end{bmatrix}$, we first compute the eigenpairs of
$$B := A^T A = \frac{1}{25}\begin{bmatrix} 52 & 36 \\ 36 & 73 \end{bmatrix}$$
as
$$B\begin{bmatrix} 3 \\ 4 \end{bmatrix} = 4\begin{bmatrix} 3 \\ 4 \end{bmatrix}, \qquad B\begin{bmatrix} 4 \\ -3 \end{bmatrix} = 1\begin{bmatrix} 4 \\ -3 \end{bmatrix}.$$
Thus σ_1 = 2, σ_2 = 1, and $V = \frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix}$. Now u_1 = Av_1/σ_1 = [1, 2, 2]^T/3 and u_2 = Av_2/σ_2 = [2, −2, 1]^T/3. For an SVD we also need u_3, which is any vector of length one orthogonal to u_1 and u_2; u_3 = [2, 1, −2]^T/3 is such a vector, and we obtain the singular value decomposition (7.2).
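The construction of Theorem 7.2 can be mimicked in MATLAB for the matrix of Example 7.2. This is a sketch for illustration only; as remarked after Example 7.3, forming A^T A is not how the SVD is computed in practice.

    A = [14 2; 4 22; 16 13]/15;
    [V,D] = eig(A'*A);
    [lam,idx] = sort(diag(D),'descend');  V = V(:,idx);
    sigma = sqrt(lam);                    % singular values 2 and 1
    U12 = A*V*diag(1./sigma);             % u_j = A*v_j/sigma_j, j = 1,2
    u3  = null(U12');                     % unit vector orthogonal to u1 and u2
    U   = [U12 u3];
    norm(U*[diag(sigma); 0 0]*V' - A)     % close to machine precision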
7.2 Further Properties of SVD
We first consider a reduced SVD that is often convenient.
7.2.1 The Singular Value Factorization
Suppose A = UΣV^* is a singular value decomposition of A of rank r. Consider the block partitions
$$U = [U_1, U_2] \in \mathbb{C}^{m\times m}, \quad U_1 := [u_1, \dots, u_r], \quad U_2 := [u_{r+1}, \dots, u_m],$$
$$V = [V_1, V_2] \in \mathbb{C}^{n\times n}, \quad V_1 := [v_1, \dots, v_r], \quad V_2 := [v_{r+1}, \dots, v_n],$$
$$\Sigma = \begin{bmatrix} \Sigma_1 & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{bmatrix} \in \mathbb{R}^{m\times n}, \quad \text{where } \Sigma_1 := \operatorname{diag}(\sigma_1, \dots, \sigma_r). \tag{7.4}$$
Thus Σ_1 contains the r positive singular values on the diagonal, and for k, l ≥ 0 the symbol 0_{k,l} = [ ] denotes the empty matrix if k = 0 or l = 0, and the zero matrix with k rows and l columns otherwise. We obtain by block multiplication a reduced factorization
$$A = U\Sigma V^* = U_1 \Sigma_1 V_1^*. \tag{7.5}$$
As an example,
$$\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\,[2]\,\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \end{bmatrix}.$$
Definition 7.1 (SVF) Let m, n, r ∈ N and suppose A ∈ Cm×n has r positive
singular values, i.e., A has rank r. A singular value factorization (SVF) is a
factorization of A ∈ Cm×n of the form A = U 1 Σ 1 V ∗1 , where U 1 ∈ Cm×r and
V 1 ∈ Cn×r have orthonormal columns, and Σ 1 ∈ Rr×r is a diagonal matrix with
σ1 ≥ · · · ≥ σr > 0.
An SVD and an SVF of a matrix A of rank r are closely related.
1. Let A = U ΣV ∗ be an SVD of A. Then A = U 1 Σ 1 V ∗1 is an SVF of A, where
U 1 , V 1 contain the first r columns of U , V respectively, and Σ1 ∈ Rr×r is a
diagonal matrix with the positive singular values on the diagonal.
2. Conversely, suppose A = U 1 Σ 1 V ∗1 is a singular value factorization of A. Extend
U 1 and V 1 in any way to unitary matrices U ∈ Cm×m and V ∈ Cn×n , and let Σ
be given by (7.4). Then A = U ΣV ∗ is an SVD of A.
3. If A = [u_1, ..., u_r] diag(σ_1, ..., σ_r) [v_1, ..., v_r]^* is a singular value factorization of A then
$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^*. \tag{7.6}$$
This is known as the outer product form of the SVD and SVF; a short numerical sketch is given after this list.
4. We note that a nonsingular square matrix has full rank and only positive singular
values. Thus the SVD and SVF are the same for a nonsingular matrix.
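The following MATLAB sketch illustrates the outer product form (7.6); svd(A,'econ') returns a thin factorization, and the random test matrix is an assumption.

    A = randn(6,4);
    [U,S,V] = svd(A,'econ');
    r = rank(A);
    B = zeros(size(A));
    for j = 1:r
        B = B + S(j,j)*U(:,j)*V(:,j)';    % sum of rank-one terms sigma_j*u_j*v_j'
    end
    norm(B - A)                           % essentially zero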
Example 7.3 (r < n < m) To find the SVF and SVD of
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix},$$
we first compute eigenpairs of
$$B := A^T A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}$$
as
$$B\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad B\begin{bmatrix} 1 \\ -1 \end{bmatrix} = 0\begin{bmatrix} 1 \\ -1 \end{bmatrix},$$
and we find σ_1 = 2, σ_2 = 0. Thus r = 1, m = 3, n = 2 and
$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma_1 = [2], \qquad V = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$
We find $u_1 = Av_1/\sigma_1 = s_1/\sqrt{2}$, where s_1 = [1, 1, 0]^T, and the SVF of A is given by
$$A = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\,[2]\,\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \end{bmatrix}.$$
To find an SVD we need to extend u_1 to an orthonormal basis for R³. We first extend s_1 to a basis {s_1, s_2, s_3} for R³, apply the Gram-Schmidt orthogonalization process to {s_1, s_2, s_3}, and then normalize. Choosing the basis
$$s_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad s_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad s_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},$$
we find from (5.8) $w_1 = s_1$, $w_2 = s_2 - \frac{s_2^T w_1}{w_1^T w_1}w_1 = \begin{bmatrix} -1/2 \\ 1/2 \\ 0 \end{bmatrix}$, and $w_3 = s_3 - \frac{s_3^T w_1}{w_1^T w_1}w_1 - \frac{s_3^T w_2}{w_2^T w_2}w_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$. Normalizing the w_i's we obtain $u_1 = w_1/\|w_1\|_2 = [1/\sqrt{2}, 1/\sqrt{2}, 0]^T$, $u_2 = w_2/\|w_2\|_2 = [-1/\sqrt{2}, 1/\sqrt{2}, 0]^T$, and $u_3 = s_3/\|s_3\|_2 = [0, 0, 1]^T$. Therefore A = UΣV^T is an SVD, where
$$U := \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \in \mathbb{R}^{3,3}, \qquad \Sigma := \begin{bmatrix} 2 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{3,2}, \qquad V := \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \in \mathbb{R}^{2,2}.$$
The method we used to find the singular value decomposition in the examples
and exercises can be suitable for hand calculation with small matrices, but it is
not appropriate as a basis for a general purpose numerical method. In particular,
the Gram-Schmidt orthogonalization process is not numerically stable, and forming
A^*A can lead to extra errors in the computation. Standard computer implementations of the singular value decomposition [16] first reduce A to bidiagonal form and then use an adapted version of the QR algorithm in which the matrix A^*A is never formed. The QR algorithm is discussed in Chap. 15.
7.2.2 SVD and the Four Fundamental Subspaces
The singular vectors form orthonormal bases for the four fundamental subspaces
R(A), N (A), R(A∗ ), and N (A∗ ).
Theorem 7.3 (Singular Vectors and Orthonormal Bases) For positive integers m, n let A ∈ C^{m×n} have rank r and a singular value decomposition A = [u_1, ..., u_m] Σ [v_1, ..., v_n]^* = UΣV^*. Then the singular vectors satisfy
$$Av_i = \sigma_i u_i, \quad i = 1, \dots, r, \qquad\qquad Av_i = 0, \quad i = r+1, \dots, n,$$
$$A^* u_i = \sigma_i v_i, \quad i = 1, \dots, r, \qquad\qquad A^* u_i = 0, \quad i = r+1, \dots, m. \tag{7.7}$$
Moreover,
1. {u_1, ..., u_r} is an orthonormal basis for R(A),
2. {u_{r+1}, ..., u_m} is an orthonormal basis for N(A^*),
3. {v_1, ..., v_r} is an orthonormal basis for R(A^*),
4. {v_{r+1}, ..., v_n} is an orthonormal basis for N(A). (7.8)
Proof If A = UΣV^* then AV = UΣ, or in terms of the block partition (7.4), $A[V_1, V_2] = [U_1, U_2]\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}$. But then AV_1 = U_1Σ_1, AV_2 = 0, and this implies the first part of (7.7). Taking the conjugate transpose of A = UΣV^* gives A^* = VΣ^*U^*, or A^*U = VΣ^*. Using the block partition as before we obtain the last part of (7.7).
It follows from Theorem 7.1 that {Av_1, ..., Av_r} is an orthogonal basis for R(A), {A^*u_1, ..., A^*u_r} is an orthogonal basis for R(A^*), {v_{r+1}, ..., v_n} is an orthonormal basis for N(A), and {u_{r+1}, ..., u_m} is an orthonormal basis for N(A^*). By (7.7), {u_1, ..., u_r} is an orthonormal basis for R(A) and {v_1, ..., v_r} is an orthonormal basis for R(A^*).
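In MATLAB the four orthonormal bases can be read off directly from [U,S,V] = svd(A); the rank-one test matrix and the tolerance used to decide which singular values count as zero are assumptions made for this sketch.

    A = [1 2 3; 2 4 6];                  % rank-one example (assumption)
    [U,S,V] = svd(A);
    tol = max(size(A))*eps(S(1,1));      % assumed tolerance
    r = sum(diag(S) > tol);
    colA   = U(:,1:r);                   % orthonormal basis for R(A)
    nullAt = U(:,r+1:end);               % orthonormal basis for N(A*)
    rowA   = V(:,1:r);                   % orthonormal basis for R(A*)
    nullA  = V(:,r+1:end);               % orthonormal basis for N(A)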
7.3 A Geometric Interpretation
The singular value decomposition and factorization give insight into the geometry of
a linear transformation. Consider the linear transformation T : Rn → Rm given by
T z := Az where A ∈ Rm×n . Assume that rank(A) = n. In the following theorem
we show that the function T maps the unit sphere in Rn given by S := {z ∈ Rn :
z2 = 1} onto an ellipsoid E := AS = {Az : z ∈ S} in Rm .
Theorem 7.4 (SVF Ellipse) Suppose A ∈ R^{m×n} has rank r = n, and let A = U_1Σ_1V_1^T be a singular value factorization of A. Then E = U_1Ẽ, where
$$\tilde{E} := \Big\{ y = [y_1, \dots, y_n]^T \in \mathbb{R}^n : \frac{y_1^2}{\sigma_1^2} + \dots + \frac{y_n^2}{\sigma_n^2} = 1 \Big\}.$$
Proof Suppose z ∈ S. Then Az = U_1Σ_1V_1^Tz = U_1y, where y := Σ_1V_1^Tz. Since rank(A) = n it follows that V_1 = V is square, so that V_1V_1^T = I. But then V_1Σ_1^{-1}y = z and we obtain
$$1 = \|z\|_2^2 = \|V_1\Sigma_1^{-1}y\|_2^2 = \|\Sigma_1^{-1}y\|_2^2 = \frac{y_1^2}{\sigma_1^2} + \dots + \frac{y_n^2}{\sigma_n^2}.$$
This implies that y ∈ Ẽ. Finally, x = Az = U_1Σ_1V_1^Tz = U_1y with y ∈ Ẽ implies that E = U_1Ẽ.
The equation $1 = \frac{y_1^2}{\sigma_1^2} + \dots + \frac{y_n^2}{\sigma_n^2}$ describes an ellipsoid in R^n with semiaxes of length σ_j along the unit vectors e_j for j = 1, ..., n. Since the orthonormal transformation y ↦ U_1y preserves length, the image E = AS is a rotated ellipsoid with semiaxes of length σ_j along the left singular vectors u_j = Ue_j, j = 1, ..., n. Since Av_j = σ_ju_j for j = 1, ..., n, the right singular vectors define points in S that are mapped onto the semiaxes of E.
Example 7.4 (Ellipse) Consider the transformation A: R² → R² given by the matrix
$$A := \frac{1}{25}\begin{bmatrix} 11 & 48 \\ 48 & 39 \end{bmatrix}$$
in Example 7.8. Recall that σ_1 = 3, σ_2 = 1, u_1 = [3, 4]^T/5 and u_2 = [−4, 3]^T/5. The ellipses y_1²/σ_1² + y_2²/σ_2² = 1 and E = AS = U_1Ẽ are shown in Fig. 7.1.
Fig. 7.1 The ellipse y_1²/9 + y_2² = 1 (left) and the rotated ellipse AS (right)
Since y = U_1^Tx = [3/5 x_1 + 4/5 x_2, −4/5 x_1 + 3/5 x_2]^T, the equation for the ellipse on the right is
$$\frac{\big(\tfrac{3}{5}x_1 + \tfrac{4}{5}x_2\big)^2}{9} + \frac{\big(-\tfrac{4}{5}x_1 + \tfrac{3}{5}x_2\big)^2}{1} = 1.$$
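A figure like Fig. 7.1 can be generated with a few lines of MATLAB; this is only a plotting sketch.

    A = [11 48; 48 39]/25;
    t = linspace(0,2*pi,200);
    Z = [cos(t); sin(t)];      % points on the unit circle S
    E = A*Z;                   % the rotated ellipse A*S
    plot(E(1,:),E(2,:)), axis equal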
7.4 Determining the Rank of a Matrix Numerically
In many elementary linear algebra courses a version of Gaussian elimination, called Gauss-Jordan elimination, is used to determine the rank of a matrix. Carrying this out by hand for a large matrix can be a Herculean task, and if a computer with floating point arithmetic is used the result will not be reliable: entries that should be zero in the final result will have nonzero values because of round-off errors. As an alternative we can use the singular value decomposition to determine the rank. Although success is not at all guaranteed, the result will be more reliable than with Gauss-Jordan elimination.
By Theorem 7.2 the rank of a matrix is equal to the number of nonzero singular
values, and if we have computed the singular values, then all we have to do is
to count the nonzero ones. The problem however is the same as for Gaussian
elimination. Due to round-off errors none of the computed singular values are likely
to be zero.
7.4.1 The Frobenius Norm
This commonly occurring matrix norm will be used here in a discussion of how many of the computed singular values can possibly be considered to be zero. The Frobenius norm of a matrix A ∈ C^{m×n} is defined by
$$\|A\|_F := \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2}. \tag{7.9}$$
There is a relation between the Frobenius norm of a matrix and its singular values. First we derive some elementary properties of this norm. A systematic study of matrix norms is given in the next chapter.
Lemma 7.1 (Frobenius Norm Properties) For any m, n ∈ N and any matrix A ∈ C^{m×n}:
1. ‖A^*‖_F = ‖A‖_F,
2. $\|A\|_F^2 = \sum_{j=1}^{n} \|a_{:j}\|_2^2$,
3. ‖UA‖_F = ‖AV‖_F = ‖A‖_F for any unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n},
4. ‖AB‖_F ≤ ‖A‖_F ‖B‖_F for any B ∈ C^{n×k}, k ∈ N,
5. ‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2 for all x ∈ C^n.
Proof
1. $\|A^*\|_F^2 = \sum_{j=1}^{n}\sum_{i=1}^{m} |\overline{a_{ij}}|^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2 = \|A\|_F^2$.
2. This follows since the Frobenius norm is the Euclidian norm of a vector, ‖A‖_F := ‖vec(A)‖_2, where vec(A) ∈ C^{mn} is the vector obtained by stacking the columns a_{:j} of A on top of each other.
3. Recall that if U^*U = I then ‖Ux‖_2 = ‖x‖_2 for all x ∈ C^n. Applying this to each column a_{:j} of A we find, using 2., $\|UA\|_F^2 = \sum_{j=1}^{n}\|Ua_{:j}\|_2^2 = \sum_{j=1}^{n}\|a_{:j}\|_2^2 = \|A\|_F^2$. Similarly, since VV^* = I we find, using 1., ‖AV‖_F = ‖V^*A^*‖_F = ‖A^*‖_F = ‖A‖_F.
4. Using the Cauchy-Schwarz inequality and 2. we obtain
$$\|AB\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{k} |a_{i:}^* b_{:j}|^2 \le \sum_{i=1}^{m}\sum_{j=1}^{k} \|a_{i:}\|_2^2\,\|b_{:j}\|_2^2 = \|A\|_F^2\,\|B\|_F^2.$$
5. Since ‖v‖_F = ‖v‖_2 for a vector, this follows by taking k = 1 and B = x in 4.
Theorem 7.5 (Frobenius Norm and Singular Values) We have $\|A\|_F = \sqrt{\sigma_1^2 + \dots + \sigma_n^2}$, where σ_1, ..., σ_n are the singular values of A.
Proof Using part 3 of Lemma 7.1 we find
$$\|A\|_F = \|U^* A V\|_F = \|\Sigma\|_F = \sqrt{\sigma_1^2 + \dots + \sigma_n^2}.$$
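Theorem 7.5 is easy to verify numerically; the random matrix below is an assumption.

    A = randn(5,3);
    [norm(A,'fro'), sqrt(sum(svd(A).^2))]   % the two numbers agree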
7.4.2 Low Rank Approximation
Suppose m ≥ n ≥ 1 and A ∈ C^{m×n} has a singular value decomposition $A = U\begin{bmatrix} D \\ 0 \end{bmatrix}V^*$, where D = diag(σ_1, ..., σ_n). We choose ε > 0 and let 1 ≤ r ≤ n be the smallest integer such that σ_{r+1}² + ⋯ + σ_n² < ε². Define $A_r := U\begin{bmatrix} D_r \\ 0 \end{bmatrix}V^*$, where D_r := diag(σ_1, ..., σ_r, 0, ..., 0) ∈ R^{n×n}. By Lemma 7.1,
$$\|A - A_r\|_F = \Big\|U\begin{bmatrix} D - D_r \\ 0 \end{bmatrix}V^*\Big\|_F = \|D - D_r\|_F = \sqrt{\sigma_{r+1}^2 + \dots + \sigma_n^2} < \epsilon.$$
Thus, if ε is small then A is near a matrix A_r of rank r. This can be used to determine rank numerically. We choose an r such that $\sqrt{\sigma_{r+1}^2 + \dots + \sigma_n^2}$ is "small". Then we postulate that rank(A) = r since A is close to a matrix of rank r.
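A minimal MATLAB sketch of this way of estimating rank; the nearly rank-one test matrix and the tolerance epsilon are assumptions.

    A = [1 2 3; 2 4 6.00001; 3 6 9];         % nearly rank-one matrix (assumption)
    s = svd(A);
    epsilon = 1e-3;                          % assumed tolerance
    tails = sqrt(cumsum(s(end:-1:1).^2));    % tails(k) = sqrt(s_n^2 + ... + s_(n-k+1)^2)
    r = numel(s) - sum(tails < epsilon);     % numerical rank
    [U,S,V] = svd(A);
    Ar = U(:,1:r)*S(1:r,1:r)*V(:,1:r)';      % nearby matrix of rank r
    norm(A - Ar,'fro')                       % smaller than epsilon by construction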
The following theorem shows that of all m × n matrices of rank r, A_r is closest to A when the distance is measured in the Frobenius norm.
Theorem 7.6 (Best Low Rank Approximation) Suppose A ∈ R^{m×n} has singular values σ_1 ≥ ⋯ ≥ σ_n ≥ 0. For any r ≤ rank(A) we have
$$\|A - A_r\|_F = \min_{\substack{B \in \mathbb{R}^{m\times n} \\ \operatorname{rank}(B) = r}} \|A - B\|_F = \sqrt{\sigma_{r+1}^2 + \dots + \sigma_n^2}.$$
For the proof of this theorem we refer to p. 322 of [16].
7.5 Exercises Chap. 7
7.5.1 Exercises Sect. 7.1
Exercise 7.1 (SVD1) Show that the decomposition
$$A := \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = U D U^T \tag{7.10}$$
is both a spectral decomposition and a singular value decomposition.
Exercise 7.2 (SVD2) Show that the decomposition
$$A := \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} =: U\Sigma V^T \tag{7.11}$$
is a singular value decomposition. Show that A is defective, so it cannot be diagonalized by any similarity transformation.
Exercise 7.3 (SVD Examples) Find the singular value decomposition of the following matrices:
a) $A = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$,
b) $A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \\ 2 & 2 \end{bmatrix}$.
Exercise 7.4 (More SVD Examples) Find the singular value decomposition of the following matrices:
a) A = e_1, the first unit vector in R^m,
b) A = e_n^T, the transpose of the last unit vector in R^n,
c) $A = \begin{bmatrix} -1 & 0 \\ 0 & 3 \end{bmatrix}$.
Exercise 7.5 (Singular Values of a Normal Matrix) Show that
a) the singular values of a normal matrix are the absolute values of its eigenvalues,
b) the singular values of a symmetric positive semidefinite matrix are its eigenvalues.
Exercise 7.6 (The Matrices A^*A, AA^* and SVD) Show the following: If A = UΣV^* is a singular value decomposition of A ∈ C^{m×n} then
a) A^*A = V diag(σ_1², ..., σ_n²) V^* is a spectral decomposition of A^*A,
b) AA^* = U diag(σ_1², ..., σ_m²) U^* is a spectral decomposition of AA^*,
c) the columns of U are orthonormal eigenvectors of AA^*,
d) the columns of V are orthonormal eigenvectors of A^*A.
Exercise 7.7 (Singular Values (Exam Exercise 2005-2)) Given the statement:
“If A ∈ Rn×n has singular values (σ1 , . . . , σn ) then A2 has singular values
(σ12 , . . . , σn2 )”. Find a class of matrices for which the statement is true. Show that
the statement is not true in general.
7.5.2 Exercises Sect. 7.2
Exercise 7.8 (Nonsingular Matrix) Derive the SVF and SVD of the matrix $A = \frac{1}{25}\begin{bmatrix} 11 & 48 \\ 48 & 39 \end{bmatrix}$. Also, possibly using a computer, find its spectral decomposition UDU^T. The matrix A is normal, but the spectral decomposition is not an SVD. Why?
(Answer: $A = \frac{1}{5}\begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}\frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix}$.)
Exercise 7.9 (Full Row Rank) Find the SVF and SVD of
$$A := \frac{1}{15}\begin{bmatrix} 14 & 4 & 16 \\ 2 & 22 & 13 \end{bmatrix} \in \mathbb{R}^{2\times 3}.$$
(Hint: take the transpose of the matrix in (7.2).)
Exercise 7.10 (Counting Dimensions of Fundamental Subspaces) Suppose A ∈
Cm×n . Show using SVD that
a) rank(A) = rank(A∗ ).
b) rank(A) + null(A) = n,
c) rank(A) + null(A∗ ) = m,
where null(A) is defined as the dimension of N (A).
Exercise 7.11 (Rank and Nullity Relations) Use Theorem 7.1 to show that for
any A ∈ Cm×n
a) rank A = rank(A∗ A) = rank(AA∗ ),
b) null(A∗ A) = null A, and null(AA∗ ) = null(A∗ ).
Exercise 7.12 (Orthonormal Bases Example) Let A and B be as in Example 7.2.
Give orthonormal bases for R(B) and N (B).
Exercise 7.13 (Some Spanning Sets) Show for any A ∈ C^{m×n} that R(A^*A) = R(V_1) = R(A^*).
Exercise 7.14 (Singular Values and Eigenpairs of a Composite Matrix) Let A ∈ C^{m×n} with m ≥ n have singular values σ_1, ..., σ_n, left singular vectors u_1, ..., u_m ∈ C^m, and right singular vectors v_1, ..., v_n ∈ C^n. Show that the matrix
$$C := \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} \in \mathbb{C}^{(m+n)\times(m+n)}$$
has the n + m eigenpairs
$$\{(\sigma_1, p_1), \dots, (\sigma_n, p_n), (-\sigma_1, q_1), \dots, (-\sigma_n, q_n), (0, r_{n+1}), \dots, (0, r_m)\},$$
where
$$p_i = \begin{bmatrix} u_i \\ v_i \end{bmatrix}, \qquad q_i = \begin{bmatrix} u_i \\ -v_i \end{bmatrix}, \qquad r_j = \begin{bmatrix} u_j \\ 0 \end{bmatrix}, \qquad \text{for } i = 1, \dots, n,\ j = n+1, \dots, m.$$
Exercise 7.15 (Polar Decomposition (Exam Exercise 2011-2)) Given n ∈ N and a singular value decomposition A = UΣV^T of a square matrix A ∈ R^{n,n}, consider the matrices
$$Q := UV^T, \qquad P := V\Sigma V^T \tag{7.12}$$
of order n.
a) Show that
$$A = QP \tag{7.13}$$
and show that Q is orthonormal.
b) Show that P is symmetric positive semidefinite, and positive definite if A is nonsingular. The factorization in (7.13) is called a polar factorization.
c) Use the singular value decomposition of A to give a suitable definition of $B := \sqrt{A^T A}$ so that P = B.
For the rest of this problem assume that A is nonsingular. Consider the iterative method
$$X_{k+1} = \frac{1}{2}\big(X_k + X_k^{-T}\big), \quad k = 0, 1, 2, \dots, \quad \text{with } X_0 = A, \tag{7.14}$$
for finding Q.
d) Show that the iteration (7.14) is well defined by showing that X_k = UΣ_kV^T for a diagonal matrix Σ_k with positive diagonal elements, k = 0, 1, 2, ....
e) Show that
$$X_{k+1} - Q = \frac{1}{2}\,X_k^{-T}\big(X_k - Q\big)^T\big(X_k - Q\big) \tag{7.15}$$
and use (7.15) and the Frobenius norm to show (quadratic convergence to Q)
$$\|X_{k+1} - Q\|_F \le \frac{1}{2}\,\|X_k^{-1}\|_F\,\|X_k - Q\|_F^2. \tag{7.16}$$
f) Write a MATLAB program function [Q,P,k] = polardecomp(A,tol,K) to carry out the iteration in (7.14). The output is approximations Q and P = Q^TA to the polar decomposition A = QP of A and the number of iterations k such that ‖X_{k+1} − X_k‖_F < tol · ‖X_{k+1}‖_F. Set k = K + 1 if convergence is not achieved in K iterations. The Frobenius norm in MATLAB is written norm(A,'fro').
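A bare-bones sketch of the iteration (7.14) is given below; it is not the full program polardecomp asked for in f), since it uses a fixed number of steps and no convergence test.

    function [Q,P] = polarsketch(A,nsteps)
    % Newton-type iteration (7.14) for the orthogonal polar factor of a nonsingular A
    X = A;
    for k = 1:nsteps
        X = (X + inv(X)')/2;     % X_{k+1} = (X_k + X_k^{-T})/2
    end
    Q = X;  P = Q'*A;            % then A is approximately Q*P
    end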
Exercise 7.16 (Underdetermined System (Exam Exercise 2015-1))
a) Let A be the matrix
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \\ -1 & 3 \end{bmatrix}.$$
Compute ‖A‖_1 and ‖A‖_∞.
b) Let B be the matrix
$$B = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 1 \end{bmatrix}.$$
Find the spaces span(B^T) and ker(B).
c) Consider the underdetermined linear system
$$x_1 \hphantom{{}+ x_2} - x_3 = 4, \qquad x_1 + x_2 + x_3 = 12.$$
Find the solution x ∈ R³ with ‖x‖_2 as small as possible.
d) Let A ∈ R^{m×n} be a matrix with linearly independent columns, and b ∈ R^m a vector. Assume that we use the Gauss-Seidel method (cf. Chap. 12) to solve the normal equations A^TAx = A^Tb. Will the method converge? Justify your answer.
7.5.3 Exercises Sect. 7.4
Exercise 7.17 (Rank Example) Consider the singular value decomposition
$$A := \begin{bmatrix} 0 & 3 & 3 \\ 4 & 1 & -1 \\ 4 & 1 & -1 \\ 0 & 3 & 3 \end{bmatrix} = \begin{bmatrix} \tfrac12 & -\tfrac12 & -\tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 & \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 & -\tfrac12 & -\tfrac12 \\ \tfrac12 & -\tfrac12 & \tfrac12 & -\tfrac12 \end{bmatrix}\begin{bmatrix} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\frac{1}{3}\begin{bmatrix} 2 & 2 & 1 \\ 2 & -1 & -2 \\ 1 & -2 & 2 \end{bmatrix}.$$
a) Give orthonormal bases for R(A), R(A^T), N(A) and N(A^T).
b) Explain why for all matrices B ∈ R^{4,3} of rank one we have ‖A − B‖_F ≥ 6.
c) Give a matrix A_1 of rank one such that ‖A − A_1‖_F = 6.
Exercise 7.18 (Another Rank Example) Let A be the n × n matrix that for n = 4 takes the form
$$A = \begin{bmatrix} 1 & -1 & -1 & -1 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Thus A is upper triangular with diagonal elements one and all elements above the diagonal equal to −1. Let B be the matrix obtained from A by changing the (n, 1) element from zero to −2^{2−n}.
a) Show that Bx = 0, where x := [2^{n−2}, 2^{n−3}, ..., 2^0, 1]^T. Conclude that B is singular, det(A) = 1, and ‖A − B‖_F = 2^{2−n}. Thus even if det(A) is not small the Frobenius norm of A − B is small for large n, and the matrix A is very close to being singular for large n.
b) Use Theorem 7.6 to show that the smallest singular value σ_n of A is bounded above by 2^{2−n}.
Exercise 7.19 (Norms, Cholesky and SVD (Exam Exercise 2016-1))
a) Let A be the matrix
$$A = \begin{bmatrix} 3 & 1 \\ 2 & 3 \\ -1 & 5 \end{bmatrix}.$$
Compute ‖A‖_1, ‖A‖_∞ and ‖A‖_F.
b) Let T be the matrix
$$T = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}.$$
Show that T is symmetric positive definite, and find the Cholesky factorization T = LL^T of T.
c) Let A = UΣV^* be a singular value decomposition of the m × n matrix A with m ≥ n, and let $A_r = \sum_{i=1}^{r} \sigma_i u_i v_i^*$, where 1 ≤ r ≤ n, σ_i are the singular values of A, and where u_i, v_i are the columns of U and V. Prove that
$$\|A - A_r\|_F^2 = \sigma_{r+1}^2 + \dots + \sigma_n^2.$$
7.6 Review Questions
7.6.1 Consider an SVD and an SVF of a matrix A.
• What are the singular values of A?
• How is the SVD defined?
• How can we find an SVF if we know an SVD?
• How can we find an SVD if we know an SVF?
• What are the relations between the singular vectors?
• Which singular vectors form bases for R(A) and N(A^*)?
7.6.2 How are the Frobenius norm and singular values related?
Part III
Matrix Norms and Least Squares
We introduce vector and matrix norms and use them to study how sensitive the
solution of a linear system is to perturbation in the data. This leads to the important
concept of condition number.
In the second chapter in this part we consider solving linear systems in the least squares sense. We give examples, present the basic theory, and discuss numerical methods and perturbation theory. Singular values and the important concept of generalized inverses play a central role in our presentation.
Chapter 8
Matrix Norms and Perturbation Theory
for Linear Systems
Norms are used to measure the size of vectors and matrices.
8.1 Vector Norms
Definition 8.1 (Vector Norm) A (vector) norm in a real (resp. complex) vector space V is a function ‖·‖: V → R that satisfies for all x, y in V and all a in R (resp. C)
1. ‖x‖ ≥ 0 with equality if and only if x = 0.  (positivity)
2. ‖ax‖ = |a| ‖x‖.  (homogeneity)
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.  (subadditivity)
The triple (V, R, ‖·‖) (resp. (V, C, ‖·‖)) is called a normed vector space and the inequality 3. is called the triangle inequality.
In this book the vector space will be one of R^n, C^n or one of the matrix spaces R^{m×n} or C^{m×n}. Vector addition is defined by elementwise addition and scalar multiplication is defined by multiplying every element by the scalar.
We encountered norms associated with any inner product in R^n or C^n in Chap. 5. That these inner product norms are really norms was shown in Theorem 5.2. In this book we will use the following family of vector norms on V = C^n and V = R^n.
Definition 8.2 (Vector p-Norms) We define for p ≥ 1 and x ∈ R^n or x ∈ C^n the p-norms by
$$\|x\|_p := \Big(\sum_{j=1}^{n} |x_j|^p\Big)^{1/p}, \tag{8.1}$$
$$\|x\|_\infty := \max_{1\le j\le n} |x_j|. \tag{8.2}$$
The most important cases are p = 1, 2, ∞:
1. $\|x\|_1 := \sum_{j=1}^{n} |x_j|$  (the one-norm or l_1-norm),
2. $\|x\|_2 := \sqrt{\sum_{j=1}^{n} |x_j|^2}$  (the two-norm, l_2-norm, or Euclidian norm),
3. $\|x\|_\infty := \max_{1\le j\le n} |x_j|$  (the infinity-norm, l_∞-norm, or max norm).
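In MATLAB these norms are available through the norm function; the vector below is an assumed example.

    x = [3; -4; 12];
    [norm(x,1), norm(x,2), norm(x,Inf)]   % 19, 13 and 12
    norm(x,3)                              % a general p-norm, here p = 3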
Some remarks are in order.
1. In Sect. 8.4 we show that the p-norms are vector norms for 1 ≤ p ≤ ∞.
2. The triangle inequality ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p is called Minkowski's inequality.
3. To prove it one first establishes Hölder's inequality
$$\sum_{j=1}^{n} |x_j y_j| \le \|x\|_p \|y\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1, \quad x, y \in \mathbb{C}^n. \tag{8.3}$$
The relation 1/p + 1/q = 1 means that if p = 1 then q = ∞ and vice versa. Hölder's inequality is the same as the Cauchy-Schwarz inequality (cf. Theorem 5.1) for the Euclidian norm p = 2.
4. The infinity norm is related to the other p-norms by
$$\lim_{p\to\infty} \|x\|_p = \|x\|_\infty \quad \text{for all } x \in \mathbb{C}^n. \tag{8.4}$$
5. The equation (8.4) clearly holds for x = 0. For x ≠ 0 we write
$$\|x\|_p = \|x\|_\infty \Big(\sum_{j=1}^{n}\Big(\frac{|x_j|}{\|x\|_\infty}\Big)^p\Big)^{1/p}.$$
Now each term in the sum is not greater than one and at least one term is equal to one, and we obtain
$$\|x\|_\infty \le \|x\|_p \le n^{1/p}\,\|x\|_\infty, \qquad p \ge 1. \tag{8.5}$$
Since lim_{p→∞} n^{1/p} = 1 for any fixed n ∈ N, we see that (8.4) follows.
6. In Exercise 8.28 we show the following generalization of inequality (8.5):
$$\|x\|_{p'} \le \|x\|_p \le n^{1/p - 1/p'}\,\|x\|_{p'}, \qquad x \in \mathbb{C}^n, \quad 1 \le p \le p' \le \infty. \tag{8.6}$$
We return now to the general vector norm case.
Definition 8.3 (Equivalent Norms) We say that two norms ‖·‖ and ‖·‖′ on V are equivalent if there are positive constants m and M such that for all vectors x ∈ V we have
$$m\|x\| \le \|x\|' \le M\|x\|. \tag{8.7}$$
By (8.5) the p- and ∞-norms are equivalent for any p ≥ 1. This result is generalized in the following theorem.
Theorem 8.1 (Basic Properties of Vector Norms) The following holds for a normed vector space (V, C, ‖·‖).
1. ‖x − y‖ ≥ |‖x‖ − ‖y‖| for all x, y ∈ V  (inverse triangle inequality).
2. The vector norm is a continuous function V → R.
3. All vector norms on V are equivalent provided V is finite dimensional.
Proof
1. Since ‖x‖ = ‖x − y + y‖ ≤ ‖x − y‖ + ‖y‖ we obtain ‖x − y‖ ≥ ‖x‖ − ‖y‖. By symmetry ‖x − y‖ = ‖y − x‖ ≥ ‖y‖ − ‖x‖, and we obtain the inverse triangle inequality.
2. This follows from the inverse triangle inequality.
3. The following proof can be skipped by those who do not have the necessary background in advanced calculus. Let ‖·‖′ be another norm on V and define the ‖·‖ unit sphere
$$S := \{y \in V : \|y\| = 1\}.$$
The set S is a closed and bounded set and the function f: S → R given by f(y) = ‖y‖′ is continuous by what we just showed. Therefore f attains its minimum and maximum value on S. Thus, there are positive constants m and M such that
$$m \le \|y\|' \le M, \qquad y \in S. \tag{8.8}$$
For any nonzero x ∈ V we have y := x/‖x‖ ∈ S, and (8.7) follows if we apply (8.8) to these y.
8.2 Matrix Norms
For simplicity we consider only norms on the vector space (C^{m×n}, C). All results also hold for (R^{m×n}, R). A matrix norm ‖·‖: C^{m×n} → R is simply a vector norm on C^{m×n}. Thus 1., 2. and 3. in Definition 8.1 hold, where we replace x and y by m × n matrices A and B, respectively. The Frobenius norm
$$\|A\|_F := \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2}$$
is a matrix norm. Indeed, writing all elements of A in a string of length mn we see that the Frobenius norm is the Euclidian norm on the space C^{mn}.
Adapting Theorem 8.1 to the matrix situation gives
Theorem 8.2 (Matrix Norm Equivalence) All matrix norms on C^{m×n} are equivalent. Thus, if ‖·‖ and ‖·‖′ are two matrix norms on C^{m×n} then there are positive constants μ and M such that
$$\mu\|A\| \le \|A\|' \le M\|A\|$$
holds for all A ∈ C^{m×n}. Moreover, a matrix norm is a continuous function.
Any vector norm ‖·‖_V on C^{mn} defines a matrix norm on C^{m×n} given by ‖A‖ := ‖vec(A)‖_V, where vec(A) ∈ C^{mn} is the vector obtained by stacking the columns of A on top of each other. In particular, to the p vector norms for p = 1, 2, ∞, we have the corresponding sum norm, Frobenius norm, and max norm defined by
$$\|A\|_S := \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|, \qquad \|A\|_F := \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2}, \qquad \|A\|_M := \max_{i,j} |a_{ij}|. \tag{8.9}$$
Of these norms the Frobenius norm is the most useful. Some of its properties were derived in Lemma 7.1 and Theorem 7.5.
8.2.1 Consistent and Subordinate Matrix Norms
Since matrices can be multiplied it is useful to have an analogue of subadditivity
for matrix multiplication. For square matrices the product AB is defined in a fixed
space Cn×n , while in the rectangular case matrix multiplication combines matrices
in different spaces. The following definition captures this distinction.
Definition 8.4 (Consistent Matrix Norms) A matrix norm is called consistent on C^{n×n} if
4. ‖AB‖ ≤ ‖A‖ ‖B‖  (submultiplicativity)
holds for all A, B ∈ C^{n×n}. A matrix norm is consistent if it is defined on C^{m×n} for all m, n ∈ N, and 4. holds for all matrices A, B for which the product AB is defined.
Clearly the Frobenius norm is defined for all m, n ∈ N. From Lemma 7.1 it follows that the Frobenius norm is consistent.
For a consistent matrix norm on C^{n×n} we have the inequality
$$\|A^k\| \le \|A\|^k \quad \text{for } A \in \mathbb{C}^{n\times n} \text{ and } k \in \mathbb{N}. \tag{8.10}$$
When working with norms one often has to bound the vector norm of a matrix times a vector by the norm of the matrix times the norm of the vector. This leads to the following definition.
Definition 8.5 (Subordinate Matrix Norms) Suppose m, n ∈ N are given, let ‖·‖_α on C^m and ‖·‖_β on C^n be vector norms, and let ‖·‖ be a matrix norm on C^{m×n}. We say that the matrix norm ‖·‖ is subordinate to the vector norms ‖·‖_α and ‖·‖_β if
$$\|Ax\|_\alpha \le \|A\|\,\|x\|_\beta \quad \text{for all } A \in \mathbb{C}^{m\times n} \text{ and all } x \in \mathbb{C}^n.$$
By Lemma 7.1 we have ‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2 for all x ∈ C^n. Thus the Frobenius norm is subordinate to the Euclidian vector norm.
For consistent matrix norms we have
Proposition 8.1 For m, n ∈ N, A ∈ C^{m×n}, all x ∈ C^n and any consistent matrix norm ‖·‖,
$$\|Ax\| \le \|A\|\,\|x\|, \tag{8.11}$$
i.e., a consistent matrix norm is subordinate to itself. Moreover, the matrix power bound (8.10) holds for all square matrices A ∈ C^{n×n}.
Proof Since a consistent matrix norm is defined on C^{m×n} for all m, n ∈ N, the consistency implies that (8.11) holds for A ∈ C^{m×n} and B := x ∈ C^{n×1}. The last statement also follows immediately from the consistency.
8.2.2 Operator Norms
Corresponding to vector norms on Cn and Cm there is an induced matrix norm on
Cm×n which we call the operator norm. It is possible to consider one vector norm
on C^m and another vector norm on C^n, but we treat only the case of one vector norm defined on C^n for all n ∈ N. (In the case of one vector norm ‖·‖_α on C^m and another vector norm ‖·‖_β on C^n we would define ‖A‖ := max_{x≠0} ‖Ax‖_α/‖x‖_β.)
Definition 8.6 (Operator Norm) Let ‖·‖ be a vector norm defined on C^n for all n ∈ N. For given m, n ∈ N and A ∈ C^{m×n} we define
$$\|A\| := \max_{x \ne 0} \frac{\|Ax\|}{\|x\|}. \tag{8.12}$$
We call this the operator norm corresponding to the vector norm ‖·‖.
With a risk of confusion we use the same symbol for the operator norm and the corresponding vector norm. Before we show that the operator norm is a matrix norm we make some observations.
1. It is enough to take the max over subsets of C^n. For example,
$$\|A\| = \max_{\|x\| = 1} \|Ax\|. \tag{8.13}$$
The set
$$S := \{x \in \mathbb{C}^n : \|x\| = 1\} \tag{8.14}$$
is the unit sphere in C^n with respect to the vector norm ‖·‖. It is enough to take the max over this unit sphere since
$$\max_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \max_{x \ne 0} \Big\|A\frac{x}{\|x\|}\Big\| = \max_{\|y\| = 1} \|Ay\|.$$
2. The operator norm is subordinate to the corresponding vector norm. Thus,
$$\|Ax\| \le \|A\|\,\|x\| \quad \text{for all } A \in \mathbb{C}^{m\times n} \text{ and } x \in \mathbb{C}^n. \tag{8.15}$$
3. We can use max instead of sup in (8.12). This follows by the following compactness argument. The unit sphere S given by (8.14) is bounded. It is also finite dimensional and closed, and hence compact. Moreover, since the vector norm ‖·‖: S → R is a continuous function, it follows that the function f: S → R given by f(x) = ‖Ax‖ is continuous. But then f attains its max and min and we have
$$\|A\| = \|Ax^*\| \quad \text{for some } x^* \in S. \tag{8.16}$$
Lemma 8.1 (The Operator Norm Is a Consistent Matrix Norm) If ‖·‖ is a vector norm defined on C^n for all n ∈ N, then the operator norm given by (8.12) is a consistent matrix norm. Moreover, ‖I‖ = 1.
Proof We use (8.13). In 2. and 3. below we take the max over the unit sphere S given by (8.14).
1. Nonnegativity is obvious. If ‖A‖ = 0 then Ay = 0 for each y ∈ C^n. In particular, each column Ae_j of A is zero. Hence A = 0.
2. ‖cA‖ = max_x ‖cAx‖ = max_x |c| ‖Ax‖ = |c| ‖A‖.
3. ‖A + B‖ = max_x ‖(A + B)x‖ ≤ max_x ‖Ax‖ + max_x ‖Bx‖ = ‖A‖ + ‖B‖.
4. $\|AB\| = \max_{x\ne 0}\frac{\|ABx\|}{\|x\|} = \max_{Bx\ne 0}\frac{\|ABx\|}{\|x\|} = \max_{Bx\ne 0}\frac{\|ABx\|}{\|Bx\|}\frac{\|Bx\|}{\|x\|} \le \max_{y\ne 0}\frac{\|Ay\|}{\|y\|}\,\max_{x\ne 0}\frac{\|Bx\|}{\|x\|} = \|A\|\,\|B\|.$
That ‖I‖ = 1 for any operator norm follows immediately from the definition.
Since ‖I‖_F = √n, we see that the Frobenius norm is not an operator norm for n > 1.
8.2.3 The Operator p-Norms
Recall that the p or l_p vector norms (8.1) are given by
$$\|x\|_p := \Big(\sum_{j=1}^{n} |x_j|^p\Big)^{1/p}, \quad p \ge 1, \qquad \|x\|_\infty := \max_{1\le j\le n} |x_j|.$$
The operator norms ‖·‖_p defined from these p-vector norms are used quite frequently for p = 1, 2, ∞. We define for any 1 ≤ p ≤ ∞
$$\|A\|_p := \max_{x\ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|y\|_p = 1} \|Ay\|_p. \tag{8.17}$$
For p = 1, 2, ∞ we have explicit expressions for these norms.
Theorem 8.3 (One-Two-Inf-Norms) For A ∈ C^{m×n} we have
$$\|A\|_1 := \max_{1\le j\le n} \|Ae_j\|_1 = \max_{1\le j\le n} \sum_{k=1}^{m} |a_{k,j}|, \quad \text{(max column sum)}$$
$$\|A\|_2 := \sigma_1, \quad \text{(largest singular value of } A\text{)}$$
$$\|A\|_\infty := \max_{1\le k\le m} \|e_k^T A\|_1 = \max_{1\le k\le m} \sum_{j=1}^{n} |a_{k,j}|. \quad \text{(max row sum)} \tag{8.18}$$
The two-norm ‖A‖_2 is also called the spectral norm of A.
Proof We proceed as follows:
(a) We derive a constant K_p such that ‖Ax‖_p ≤ K_p for any x ∈ C^n with ‖x‖_p = 1.
(b) We give an extremal vector y_e ∈ C^n with ‖y_e‖_p = 1 so that ‖Ay_e‖_p = K_p.
It then follows from (8.17) that ‖A‖_p = ‖Ay_e‖_p = K_p.
2-norm: Let A = UΣV^* be a singular value decomposition of A, define K_2 := σ_1, c := V^*x, and y_e := v_1, the singular vector corresponding to σ_1. Then x = Vc, ‖c‖_2 = ‖x‖_2 = 1, and using (7.7) in (b) we find
(a) $\|Ax\|_2^2 = \|U\Sigma V^* x\|_2^2 = \|\Sigma c\|_2^2 = \sum_{j=1}^{n}\sigma_j^2|c_j|^2 \le \sigma_1^2\sum_{j=1}^{n}|c_j|^2 = \sigma_1^2$,
(b) ‖Av_1‖_2 = ‖σ_1 u_1‖_2 = σ_1.
1-norm: Define K_1, c and y_e by K_1 := ‖Ae_c‖_1 = max_{1≤j≤n} ‖Ae_j‖_1 and y_e := e_c, a unit vector. Then ‖y_e‖_1 = 1 and we obtain
(a) $\|Ax\|_1 = \sum_{k=1}^{m}\Big|\sum_{j=1}^{n} a_{kj}x_j\Big| \le \sum_{k=1}^{m}\sum_{j=1}^{n}|a_{kj}||x_j| = \sum_{j=1}^{n}\Big(\sum_{k=1}^{m}|a_{kj}|\Big)|x_j| \le K_1$,
(b) ‖Ay_e‖_1 = K_1.
∞-norm: Define K_∞, r and y_e by K_∞ := ‖e_r^T A‖_1 = max_{1≤k≤m} ‖e_k^T A‖_1 and y_e := [e^{-iθ_1}, ..., e^{-iθ_n}]^T, where a_{rj} = |a_{rj}|e^{iθ_j} for j = 1, ..., n.
(a) $\|Ax\|_\infty = \max_{1\le k\le m}\Big|\sum_{j=1}^{n} a_{kj}x_j\Big| \le \max_{1\le k\le m}\sum_{j=1}^{n}|a_{kj}||x_j| \le K_\infty$,
(b) $\|Ay_e\|_\infty = \max_{1\le k\le m}\Big|\sum_{j=1}^{n} a_{kj}e^{-i\theta_j}\Big| = K_\infty$. The last equality is correct because $\Big|\sum_{j=1}^{n} a_{kj}e^{-i\theta_j}\Big| \le \sum_{j=1}^{n}|a_{kj}| \le K_\infty$, with equality for k = r.
Example 8.1 (Comparing One-Two-Inf-Norms) The largest singular value of the matrix $A := \frac{1}{15}\begin{bmatrix} 14 & 4 & 16 \\ 2 & 22 & 13 \end{bmatrix}$ is σ_1 = 2 (cf. Example 7.9). We find
$$\|A\|_1 = \frac{29}{15}, \qquad \|A\|_2 = 2, \qquad \|A\|_\infty = \frac{37}{15}, \qquad \|A\|_F = \sqrt{5}.$$
The values of these norms do not differ by much.
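The numbers in Example 8.1 can be checked in MATLAB against the explicit formulas of Theorem 8.3.

    A = [14 4 16; 2 22 13]/15;
    [norm(A,1), norm(A,2), norm(A,Inf), norm(A,'fro')]
    [max(sum(abs(A),1)), max(svd(A)), max(sum(abs(A),2))]   % column sums, sigma_1, row sums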
In some cases the spectral norm can be expressed directly in terms of the eigenvalues of the matrix.
Theorem 8.4 (Spectral Norm) Suppose A ∈ C^{n×n} has singular values σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n and eigenvalues |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|. Then
$$\|A\|_2 = \sigma_1 \quad \text{and} \quad \|A^{-1}\|_2 = \frac{1}{\sigma_n}, \tag{8.19}$$
$$\|A\|_2 = \lambda_1 \quad \text{and} \quad \|A^{-1}\|_2 = \frac{1}{\lambda_n}, \quad \text{if } A \text{ is positive definite}, \tag{8.20}$$
$$\|A\|_2 = |\lambda_1| \quad \text{and} \quad \|A^{-1}\|_2 = \frac{1}{|\lambda_n|}, \quad \text{if } A \text{ is normal}. \tag{8.21}$$
For the norms of A^{-1} we assume that A is nonsingular.
Proof Since 1/σ_n is the largest singular value of A^{-1}, (8.19) follows. By Exercise 7.5 the singular values of a positive definite matrix (normal matrix) are equal to the eigenvalues (absolute values of the eigenvalues). This implies (8.20) and (8.21).
The following result is sometimes useful.
Theorem 8.5 (Spectral Norm Bound) For any A ∈ C^{m×n} we have ‖A‖_2² ≤ ‖A‖_1 ‖A‖_∞.
Proof Let (σ², v) be an eigenpair for A^*A corresponding to the largest singular value σ of A. Then
$$\|A\|_2^2\,\|v\|_1 = \sigma^2\|v\|_1 = \|\sigma^2 v\|_1 = \|A^*Av\|_1 \le \|A^*\|_1\,\|A\|_1\,\|v\|_1.$$
Observing that ‖A^*‖_1 = ‖A‖_∞ by Theorem 8.3 and canceling ‖v‖_1 proves the result.
8.2.4 Unitary Invariant Matrix Norms
Definition 8.7 (Unitary Invariant Norm) A matrix norm ‖·‖ on C^{m×n} is called unitary invariant if ‖UAV‖ = ‖A‖ for any A ∈ C^{m×n} and any unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n}.
When a unitary invariant matrix norm is used, the size of a perturbation is not increased by a unitary transformation. Thus if U and V are unitary then U(A + E)V = UAV + F, where ‖F‖ = ‖E‖.
It follows from Lemma 7.1 that the Frobenius norm is unitary invariant. We show here that this also holds for the spectral norm.
Theorem 8.6 (Unitary Invariant Norms) The Frobenius norm and the spectral norm are unitary invariant. Moreover,
$$\|A^*\|_F = \|A\|_F \quad \text{and} \quad \|A^*\|_2 = \|A\|_2.$$
Proof The results for the Frobenius norm follow from Lemma 7.1. Suppose A ∈ C^{m×n} and let U ∈ C^{m×m} and V ∈ C^{n×n} be unitary. Since the 2-vector norm is unitary invariant we obtain
$$\|UA\|_2 = \max_{\|x\|_2 = 1}\|UAx\|_2 = \max_{\|x\|_2 = 1}\|Ax\|_2 = \|A\|_2.$$
Now A and A^* have the same nonzero singular values, and it follows from Theorem 8.3 that ‖A^*‖_2 = ‖A‖_2. Moreover V^* is unitary. Using these facts we find
$$\|AV\|_2 = \|(AV)^*\|_2 = \|V^*A^*\|_2 = \|A^*\|_2 = \|A\|_2.$$
It can be shown that the spectral norm is the only unitary invariant operator norm, see p. 357 of [10].
8.2.5 Absolute and Monotone Norms
A vector norm ‖·‖ on C^n is an absolute norm if ‖x‖ = ‖|x|‖ for all x ∈ C^n. Here |x| := [|x_1|, ..., |x_n|]^T is the vector of absolute values of the components of x. Clearly the vector p-norms are absolute norms. We state without proof (see Theorem 5.5.10 of [10]) that a vector norm on C^n is an absolute norm if and only if it is a monotone norm, i.e.,
$$|x_i| \le |y_i|,\ i = 1, \dots, n \ \Longrightarrow\ \|x\| \le \|y\|, \quad \text{for all } x, y \in \mathbb{C}^n.$$
Absolute and monotone matrix norms are defined as for vector norms.
The study of matrix norms will be continued in Chap. 12.
8.3 The Condition Number with Respect to Inversion
Consider the system of two linear equations
$$\begin{aligned} x_1 + x_2 &= 20, \\ x_1 + (1 - 10^{-16})x_2 &= 20 - 10^{-15}, \end{aligned}$$
whose exact solution is x_1 = x_2 = 10. If we replace the second equation by
$$x_1 + (1 + 10^{-16})x_2 = 20 - 10^{-15},$$
the exact solution changes to x_1 = 30, x_2 = −10. Here a small change in one of the coefficients, from 1 − 10^{−16} to 1 + 10^{−16}, changed the exact solution by a large amount.
A mathematical problem in which the solution is very sensitive to changes in
the data is called ill-conditioned. Such problems can be difficult to solve on a
computer.
In this section we consider what effect a small change (perturbation) in the data A, b has on the inverse of A and on the solution x of a linear system Ax = b. To measure this we use vector and matrix norms. In this section ‖·‖ will denote a vector norm on C^n and also a matrix norm on C^{n×n}. We assume that the matrix norm is consistent on C^{n×n} and subordinate to the vector norm. Thus, for any A, B ∈ C^{n×n} and any x ∈ C^n we have
$$\|AB\| \le \|A\|\,\|B\| \quad \text{and} \quad \|Ax\| \le \|A\|\,\|x\|.$$
Recall that this holds if the matrix norm is the operator norm corresponding to the given vector norm. It also holds for the Frobenius matrix norm and the Euclidian vector norm; this follows from Lemma 7.1. We recall that if I ∈ R^{n×n} then ‖I‖ = 1 for an operator norm, while ‖I‖_F = √n.
8.3.1 Perturbation of the Right-Hand Side in a Linear System
Suppose x, y solve Ax = b and (A + E)y = b + e, respectively, where A, A + E ∈ C^{n×n} are nonsingular and b, e ∈ C^n. How large can ‖y − x‖ be? The difference y − x measures the absolute error in y as an approximation to x, while ‖y − x‖/‖x‖ and ‖y − x‖/‖y‖ are measures for the relative error.
We consider first the simpler case of a perturbation in the right-hand side b.
Theorem 8.7 (Perturbation in the Right-Hand Side) Suppose A ∈ C^{n×n} is nonsingular, b, e ∈ C^n, b ≠ 0 and Ax = b, Ay = b + e. Then
$$\frac{1}{K(A)}\frac{\|e\|}{\|b\|} \le \frac{\|y - x\|}{\|x\|} \le K(A)\frac{\|e\|}{\|b\|}, \tag{8.22}$$
where K(A) = ‖A‖ ‖A^{-1}‖ is the condition number of A.
Proof Subtracting Ax = b from Ay = b + e we have A(y − x) = e, or y − x = A^{-1}e. Combining ‖y − x‖ = ‖A^{-1}e‖ ≤ ‖A^{-1}‖ ‖e‖ and ‖b‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖ we obtain the upper bound in (8.22). Combining ‖e‖ ≤ ‖A‖ ‖y − x‖ and ‖x‖ ≤ ‖A^{-1}‖ ‖b‖ we obtain the lower bound.
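The bound (8.22) can be illustrated in MATLAB; the Hilbert matrix hilb(8) and the size of the perturbation are assumptions chosen only to give a moderately ill-conditioned example.

    A = hilb(8);  x = ones(8,1);  b = A*x;
    e = 1e-10*randn(8,1);                  % perturbation of the right-hand side
    y = A\(b+e);
    rel_err   = norm(y-x)/norm(x)
    rel_bound = cond(A)*norm(e)/norm(b)    % upper bound from (8.22); rel_err stays below it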
Consider (8.22). The quantity ‖e‖/‖b‖ is a measure of the size of the perturbation e relative to the size of b. The upper bound says that ‖y − x‖/‖x‖ in the worst case can be K(A) times as large as ‖e‖/‖b‖.
The bounds in (8.22) depend on K(A). This number is called the condition number with respect to inversion of a matrix, or just the condition number of A, if it is clear from the context that we are talking about inverting a matrix. The condition number depends on the matrix A and on the norm used. If K(A) is large, A is called ill-conditioned (with respect to inversion). If K(A) is small, A is called well-conditioned (with respect to inversion). We always have K(A) ≥ 1, for since ‖x‖ = ‖Ix‖ ≤ ‖I‖ ‖x‖ for any x we have ‖I‖ ≥ 1 and therefore ‖A‖ ‖A^{-1}‖ ≥ ‖AA^{-1}‖ = ‖I‖ ≥ 1.
Since all matrix norms are equivalent, the dependence of K(A) on the norm chosen is less important than the dependence on A. Example 8.1 provided an illustration of this. See also Exercise 8.19. Sometimes one chooses the spectral norm when discussing properties of the condition number, and the l_1, l_∞, or Frobenius norm when one wishes to compute it or estimate it.
Suppose we have computed an approximate solution y to Ax = b. The vector r(y) := Ay − b is called the residual vector, or just the residual. We can bound ‖x − y‖ in terms of ‖r‖.
Theorem 8.8 (Perturbation and Residual) Suppose A ∈ C^{n×n}, b ∈ C^n, A is nonsingular and b ≠ 0. Let r(y) := Ay − b for y ∈ C^n. If Ax = b then
$$\frac{1}{K(A)}\frac{\|r(y)\|}{\|b\|} \le \frac{\|y - x\|}{\|x\|} \le K(A)\frac{\|r(y)\|}{\|b\|}. \tag{8.23}$$
Proof We simply take e = r(y) in Theorem 8.7.
Consider next a perturbation in the coefficient matrix in a linear system. Suppose A, E ∈ C^{n×n} with A, A + E nonsingular. We want to compare the solutions x and y of the systems Ax = b and (A + E)y = b.
Theorem 8.9 (Perturbation in Matrix) Suppose A, E ∈ C^{n×n}, b ∈ C^n with A nonsingular and b ≠ 0. If r := ‖A^{-1}E‖ < 1 then A + E is nonsingular. If Ax = b and (A + E)y = b then
$$\frac{\|y - x\|}{\|y\|} \le r \le K(A)\frac{\|E\|}{\|A\|}, \tag{8.24}$$
$$\frac{\|y - x\|}{\|x\|} \le \frac{r}{1 - r} \le \frac{K(A)}{1 - r}\frac{\|E\|}{\|A\|}. \tag{8.25}$$
Proof We show that A + E singular implies r ≥ 1. Suppose A + E is singular. Then (A + E)x = 0 for some nonzero x ∈ C^n. Multiplying by A^{-1} it follows that (I + A^{-1}E)x = 0, and this implies that ‖x‖ = ‖A^{-1}Ex‖ ≤ r‖x‖. But then r ≥ 1.
Subtracting (A + E)y = b from Ax = b gives A(x − y) = Ey, or x − y = A^{-1}Ey. Taking norms and dividing by ‖y‖ proves (8.24). Solving x − y = A^{-1}Ey for y we obtain y = (I + A^{-1}E)^{-1}x. By Theorem 12.14 we have $\|y\| \le \|(I + A^{-1}E)^{-1}\|\,\|x\| \le \frac{\|x\|}{1-r}$, and (8.24) implies $\|y - x\| \le r\|y\| \le \frac{r}{1-r}\|x\| \le \frac{K(A)}{1-r}\frac{\|E\|}{\|A\|}\|x\|$. Dividing by ‖x‖ gives (8.25).
In Theorem 8.9 we gave bounds for the relative error in x as an approximation to y and the relative error in y as an approximation to x. ‖E‖/‖A‖ is a measure for the size of the perturbation E in A relative to the size of A. The condition number again plays a crucial role. ‖y − x‖/‖y‖ can be as large as K(A) times ‖E‖/‖A‖. It can be shown that the upper bound can be attained for any A and any b. In deriving the upper bound we used the inequality ‖A^{-1}Ey‖ ≤ ‖A^{-1}E‖ ‖y‖. For a more or less random perturbation E this is not a severe overestimate for ‖A^{-1}Ey‖. In the situation where E is due to round-off errors, (8.24) can give a fairly realistic estimate for ‖y − x‖/‖y‖.
The following explicit expressions for the 2-norm condition number follow from Theorem 8.4.
Theorem 8.10 (Spectral Condition Number) Suppose A ∈ C^{n×n} is nonsingular with singular values σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n > 0 and eigenvalues |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n| > 0. Then
$$K_2(A) = \begin{cases} \lambda_1/\lambda_n, & \text{if } A \text{ is positive definite}, \\ |\lambda_1|/|\lambda_n|, & \text{if } A \text{ is normal}, \\ \sigma_1/\sigma_n, & \text{in general}. \end{cases} \tag{8.26}$$
It follows that A is ill-conditioned with respect to inversion if and only if σ_1/σ_n is large, or λ_1/λ_n is large when A is positive definite.
If A is well-conditioned, (8.23) says that ‖y − x‖/‖x‖ ≈ ‖r(y)‖/‖b‖. In other words, the accuracy in y is about the same order of magnitude as the residual as long as ‖b‖ ≈ 1. If A is ill-conditioned, anything can happen. We can for example have an accurate solution even if the residual is large.
8.3.2 Perturbation of a Square Matrix
Suppose A is nonsingular and E a perturbation of A. We expect B := A + E to be nonsingular when E is small relative to A. But how small is small? It is also useful to have bounds on ‖B^{-1}‖ in terms of ‖A^{-1}‖ and on the difference B^{-1} − A^{-1}.
We consider the relative errors ‖B^{-1}‖/‖A^{-1}‖, ‖B^{-1} − A^{-1}‖/‖B^{-1}‖ and ‖B^{-1} − A^{-1}‖/‖A^{-1}‖.
Theorem 8.11 (Perturbation of Inverse Matrix) Suppose A ∈ C^{n×n} is nonsingular and let B := A + E ∈ C^{n×n} be nonsingular. For any consistent matrix norm ‖·‖ we have
$$\frac{\|B^{-1} - A^{-1}\|}{\|B^{-1}\|} \le \|A^{-1}E\| \le K(A)\frac{\|E\|}{\|A\|}, \tag{8.27}$$
where K(A) := ‖A‖ ‖A^{-1}‖. If r := ‖A^{-1}E‖ < 1 then B is nonsingular and
$$\frac{1}{1 + r} \le \frac{\|B^{-1}\|}{\|A^{-1}\|} \le \frac{1}{1 - r}. \tag{8.28}$$
We also have
$$\frac{\|B^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{r}{1 - r} \le \frac{K(A)}{1 - r}\frac{\|E\|}{\|A\|}. \tag{8.29}$$
We can replace ‖A^{-1}E‖ by ‖EA^{-1}‖ everywhere.
Proof That B is nonsingular if r < 1 follows from Theorem 8.9. We have −E = A − B = A(B^{-1} − A^{-1})B = B(B^{-1} − A^{-1})A, so that
$$B^{-1} - A^{-1} = -A^{-1}EB^{-1} = -B^{-1}EA^{-1}. \tag{8.30}$$
Therefore, if B is nonsingular then by (8.30)
$$\|B^{-1} - A^{-1}\| \le \|A^{-1}E\|\,\|B^{-1}\| \le K(A)\frac{\|E\|}{\|A\|}\|B^{-1}\|.$$
Dividing through by ‖B^{-1}‖ gives the upper bounds in (8.27). Next, (8.30) implies
$$\|B^{-1}\| \le \|A^{-1}\| + \|A^{-1}E\|\,\|B^{-1}\| \le \|A^{-1}\| + r\|B^{-1}\|.$$
Solving for ‖B^{-1}‖ and dividing by ‖A^{-1}‖ we obtain the upper bound in (8.28). Similarly we obtain the lower bound in (8.28) from ‖A^{-1}‖ ≤ ‖B^{-1}‖ + r‖B^{-1}‖.
The bound in (8.29) follows by multiplying (8.27) by ‖B^{-1}‖/‖A^{-1}‖ and using (8.28).
That we can replace ‖A^{-1}E‖ by ‖EA^{-1}‖ everywhere follows from (8.30).
8.4 Proof That the p-Norms Are Norms
We want to show
Theorem 8.12 (The p Vector Norms Are Norms) For 1 ≤ p ≤ ∞ and x ∈ C^n let
$$\|x\|_p := \Big(\sum_{j=1}^{n} |x_j|^p\Big)^{1/p}, \qquad \|x\|_\infty := \max_{1\le j\le n}|x_j|.$$
Then for all 1 ≤ p ≤ ∞, x, y ∈ C^n and all a ∈ C
1. ‖x‖_p ≥ 0 with equality if and only if x = 0.  (positivity)
2. ‖ax‖_p = |a| ‖x‖_p.  (homogeneity)
3. ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p.  (subadditivity)
Positivity and homogeneity follow immediately. To show the subadditivity we need some elementary properties of convex functions.
Definition 8.8 (Convex Function) Let I ⊂ R be an interval. A function f: I → R is convex if
$$f\big((1 - \lambda)x_1 + \lambda x_2\big) \le (1 - \lambda)f(x_1) + \lambda f(x_2) \tag{8.31}$$
for all x_1, x_2 ∈ I with x_1 < x_2 and all λ ∈ [0, 1]. The sum $\sum_{j=1}^{n}\lambda_j x_j$ is called a convex combination of x_1, ..., x_n if λ_j ≥ 0 for j = 1, ..., n and $\sum_{j=1}^{n}\lambda_j = 1$.
The convexity condition is illustrated in Fig. 8.1.
Fig. 8.1 A convex function
Lemma 8.2 (A Sufficient Condition for Convexity) If f ∈ C²[a, b] and f''(x) ≥ 0 for x ∈ [a, b] then f is convex.
Proof We recall the formula for linear interpolation with remainder (cf. a book on numerical methods). For any a ≤ x_1 ≤ x ≤ x_2 ≤ b there is a c ∈ [x_1, x_2] such that
$$\begin{aligned} f(x) &= \frac{x_2 - x}{x_2 - x_1}f(x_1) + \frac{x - x_1}{x_2 - x_1}f(x_2) + (x - x_1)(x - x_2)f''(c)/2 \\ &= (1 - \lambda)f(x_1) + \lambda f(x_2) + (x_2 - x_1)^2\lambda(\lambda - 1)f''(c)/2, \qquad \lambda := \frac{x - x_1}{x_2 - x_1}. \end{aligned}$$
Since λ ∈ [0, 1] we have f(x) ≤ (1 − λ)f(x_1) + λf(x_2). Moreover,
$$x = \frac{x_2 - x}{x_2 - x_1}x_1 + \frac{x - x_1}{x_2 - x_1}x_2 = (1 - \lambda)x_1 + \lambda x_2,$$
so that (8.31) holds, and f is convex.
The following inequality is elementary, but can be used to prove many nontrivial inequalities.
Theorem 8.13 (Jensen's Inequality) Suppose I ⊂ R is an interval and f: I → R is convex. Then for all n ∈ N, all λ_1, ..., λ_n with λ_j ≥ 0 for j = 1, ..., n and $\sum_{j=1}^{n}\lambda_j = 1$, and all z_1, ..., z_n ∈ I,
$$f\Big(\sum_{j=1}^{n}\lambda_j z_j\Big) \le \sum_{j=1}^{n}\lambda_j f(z_j).$$
Proof We use induction on n. The result is trivial for n = 1. Let n ≥ 2, assume the inequality holds for n − 1, and let λ_j, z_j for j = 1, ..., n be given as in the theorem. Since n ≥ 2 we have λ_i < 1 for at least one i, so assume without loss of generality that λ_1 < 1, and define $u := \sum_{j=2}^{n}\frac{\lambda_j}{1-\lambda_1}z_j$. Since $\sum_{j=2}^{n}\lambda_j = 1 - \lambda_1$ this is a convex combination of n − 1 terms and the induction hypothesis implies that $f(u) \le \sum_{j=2}^{n}\frac{\lambda_j}{1-\lambda_1}f(z_j)$. But then by the convexity of f
$$f\Big(\sum_{j=1}^{n}\lambda_j z_j\Big) = f\big(\lambda_1 z_1 + (1 - \lambda_1)u\big) \le \lambda_1 f(z_1) + (1 - \lambda_1)f(u) \le \sum_{j=1}^{n}\lambda_j f(z_j),$$
and the inequality holds for n.
Corollary 8.1 (Weighted Geometric/Arithmetic Mean Inequality) Suppose $\sum_{j=1}^{n}\lambda_j a_j$ is a convex combination of nonnegative numbers a_1, ..., a_n. Then
$$a_1^{\lambda_1} a_2^{\lambda_2} \cdots a_n^{\lambda_n} \le \sum_{j=1}^{n}\lambda_j a_j, \tag{8.32}$$
where 0⁰ := 0.
Proof The result is trivial if one or more of the a_j are zero, so assume a_j > 0 for all j. Consider the function f: (0, ∞) → R given by f(x) = −log x. Since f''(x) = 1/x² > 0 for x ∈ (0, ∞), it follows from Lemma 8.2 that this function is convex. By Jensen's inequality
$$-\log\Big(\sum_{j=1}^{n}\lambda_j a_j\Big) \le -\sum_{j=1}^{n}\lambda_j\log(a_j) = -\log\big(a_1^{\lambda_1}\cdots a_n^{\lambda_n}\big),$$
or $\log\big(a_1^{\lambda_1}\cdots a_n^{\lambda_n}\big) \le \log\Big(\sum_{j=1}^{n}\lambda_j a_j\Big)$. The inequality follows since exp(log x) = x for x > 0 and the exponential function is monotone increasing.
Taking λ_j = 1/n for all j in (8.32) we obtain the classical geometric/arithmetic mean inequality
$$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{1}{n}\sum_{j=1}^{n}a_j. \tag{8.33}$$
Corollary 8.2 (Hölder’s Inequality) For x, y ∈ Cn and 1 ≤ p ≤ ∞
n
|xj yj | ≤ xp yq , where
j =1
1
1
+ = 1.
p q
Proof We leave the proof for p = 1 and p = ∞ as an exercise so assume 1 < p <
∞. For any a, b ≥ 0 the weighted arithmetic/geometric mean inequality implies
that
1
1
ap bq ≤
1
1
1
1
a + b, where + = 1.
p
q
p q
(8.34)
If x = 0 or y = 0 there is nothing to prove so assume that both x and y are nonzero.
Using 8.34 on each term in the middle sum we obtain
1
xp yq
n
n
|xj yj | =
j =1
j =1
|xj |p
p
xp
1
p
|yj |q
q
yq
and the proof of the inequality is complete.
1
q
n
≤
j =1
1 |xj |p 1 |yj |q
+
p xpp q yqq
=1
188
8 Matrix Norms and Perturbation Theory for Linear Systems
Corollary 8.3 (Minkowski’s Inequality) For x, y ∈ Cn and 1 ≤ p ≤ ∞
x + yp ≤ xp + yp .
Proof We leave the proof for p = 1 and p = ∞ as an exercise so assume 1 < p <
∞. We write
n
n
p
x + yp =
|xj + yj |p ≤
j =1
n
|xj ||xj + yj |p−1 +
j =1
|yj ||xj + yj |p−1 .
j =1
We apply Hölder’s inequality with exponent p and q to each sum. In view of the
relation (p − 1)q = p the result is
p
p/q
x + yp ≤ xp x + yp
p/q
+ yp x + yp
p−1
= (xp + yp )x + yp
,
and canceling the common factor, the inequality follows.
8.4.1 p-Norms and Inner Product Norms
It is possible to characterize the p-norms that are derived from an inner product. We
start with the following identity.
Theorem 8.14 (Parallelogram Identity) For all x, y in a real or complex inner
product space
x + y2 + x − y2 = 2x2 + 2y2 ,
(8.35)
where is the inner product norm in the space.
Proof We set a = ±1 in (5.5) and add the two equations.
Theorem 8.15 (When Is a Norm an Inner Product Norm?) To a given norm
on a real or complex vector space V there exists an inner product on V such that
x, x = x2 if and only if the parallelogram identity (8.35) holds for all x, y ∈ V.
Proof If x, x = x2 then
x + y2 + x − y2 = x + y, x + y + x − y, x − y = 2x2 + 2y2
and the parallelogram identity holds. For the converse we prove the real case and
leave the complex case as an exercise. Suppose (8.35) holds for all x, y in the real
vector space V. We show that
x, y :=
1
x + y2 − x − y2 ,
4
x, y ∈ V
(8.36)
8.4 Proof That the p-Norms Are Norms
189
defines an inner product on V. Clearly 1. and 2. in Definition 5.1 hold. The hard part
is to show 3. We need to show that
x, z + y, z = x + y, z,
ax, y = ax, y,
x, y, z ∈ V,
(8.37)
x, y ∈ V.
(8.38)
a ∈ R,
Now
(8.36)
4x, z + 4y, z = x + z2 − x − z2 + y + z2 − y − z2
x +y x −y 2
x + y y −x 2
+
− z−
+
= z+
2
2
2
2
x +y x −y 2
x +y y −x 2
−
− z−
−
+ z+
2
2
2
2
x +y 2
x−y 2
x +y 2
y−x 2
(8.35)
+ 2
− 2z −
− 2
= 2z +
2
2
2
2
x+y
(8.36)
, z,
= 8
2
or
x, z + y, z = 2
x+y
, z,
2
x, y, z ∈ V.
In particular, since y = 0 implies y, z = 0 we obtain x, z = 2 x2 , z for all
x, z ∈ V. This means that 2 x+y
2 , z = x + y, z for all x, y, z ∈ V and (8.37)
follows.
We first show (8.38) when a = n is a positive integer. By induction
(8.37)
nx, y = (n − 1)x + x, y = (n − 1)x, y + x, y = nx, y.
(8.39)
If m, n ∈ N then
m2 n
(8.39)
(8.39)
x, y = mnx, y = mnx, y,
m
implying that (8.38) holds for positive rational numbers
n
n
x, y = x, y.
m
m
Now if a > 0 there is a sequence {an } of positive rational numbers converging to a.
For each n
(8.36)
an x, y = an x, y =
1
an x + y2 − an x − y2 .
4
190
8 Matrix Norms and Perturbation Theory for Linear Systems
Taking limits and using continuity of norms we obtain ax, y = ax, y. This also
holds for a = 0. Finally, if a < 0 then (−a) > 0 and from what we just showed
(8.36)
(−a)x, y = (−a)x, y =
1
−ax + y2 − −ax − y2 = −ax, y,
4
so (8.38) also holds for negative a.
Corollary 8.4 (Are the p-Norms Inner Product Norms?) For the p vector norms
on V = Rn or V = Cn , 1 ≤ p ≤ ∞, n ≥ 2, there is an inner product on V such that
x, x = x2p for all x ∈ V if and only if p = 2.
Proof For p = 2 the p-norm is the Euclidian norm which corresponds to the
standard inner product. If p = 2 then the parallelogram identity (8.35) does not
hold for say x := e1 and y := e2 .
8.5 Exercises Chap. 8
8.5.1 Exercises Sect. 8.1
Exercise 8.1 (An A-Norm Inequality (Exam Exercise 1982-4)) Given a symmetric positive definite matrix A ∈ Rn×n with eigenvalues 0 < λn ≤ · · · ≤ λ1 .
Show that
/
λ1
y2 ,
xA ≤ yA ⇒ x2 ≤
λn
where
xA :=
x T Ax,
x ∈ Rn .
Exercise 8.2 (A Orthogonal Bases (Exam Exercise 1995-4)) Let A ∈ Rn×n be a
symmetric and positive definite matrix and assume b1 , . . . , bn is a basis for Rn . We
define B k := [b1 , . . . , bk ] ∈ Rn×k for k = 1, . . . , n. We consider in this exercise the
inner product ·, · defined by x, y := x T Ay for x, y ∈ Rn and the corresponding
norm xA := x, x1/2 . We define b̃1 := b1 and
−1
b̃k := bk − B k−1 B Tk−1 AB k−1 B Tk−1 Abk ,
k = 2, . . . , n.
a) Show that B Tk AB k is positive definite for k = 1, . . . , n.
b) Show that for k = 2, . . . , n we have (i) b̃k , bj = 0 for j = 1, . . . , k − 1 and
(ii) b̃k − bk ∈ span(b1 , . . . , b k−1 ).
8.5 Exercises Chap. 8
191
c) Explain why b̃1 , . . . , b̃n is a basis for Rn which in addition is A-orthogonal, i.e.,
b̃i , b̃j = 0 for all i, j ≤ n, i = j .
d) Define B̃ n := [b̃1 , . . . , b̃n ]. Show that there is an upper triangular matrix T ∈
Rn×n with ones on the diagonal and satisfies B n = B̃ n T .
e) Assume that the matrix T in d) is such that |tij | ≤ 12 for all i, j ≤ n, i = j .
Assume also that b̃ k 2A ≤ 2b̃k+1 2A for k = 1, . . . , n−1 and that det(B n ) = 1.
Show that then2
b1 A b2 A · · · bn A ≤ 2n(n−1)/4 det(A).
8.5.2 Exercises Sect. 8.2
Exercise 8.3 (Consistency of Sum Norm?) Show that the sum norm is consistent.
Exercise 8.4 (Consistency of Max Norm?) Show that the max norm is not
consistent by considering 11 11 .
Exercise 8.5 (Consistency of Modified Max Norm)
a) Show that the norm
A :=
√
mnAM ,
A ∈ Cm×n
is a consistent matrix norm.
√
b) Show that the constant mn can be replaced by m and by n.
Exercise 8.6 (What Is the Sum Norm Subordinate to?) Show that the sum norm
is subordinate to the l1 -norm.
Exercise 8.7 (What Is the Max Norm Subordinate to?)
a) Show that the max norm is subordinate to the ∞ and 1 norm, i.e., Ax∞ ≤
AM x1 holds for all A ∈ Cm×n and all x ∈ Cn .
b) Show that if AM = |akl |, then Ael ∞ = AM el 1 .
∞
c) Show that AM = maxx =0 Ax
x1 .
Exercise 8.8 (Spectral Norm) Let m, n ∈ N and A ∈ Cm×n . Show that
A2 =
2 Hint:
max
x2 =y2 =1
Show that b̃1 2A · · · b̃n 2A = det(A).
|y ∗ Ax|.
192
8 Matrix Norms and Perturbation Theory for Linear Systems
Exercise 8.9 (Spectral Norm of the Inverse) Suppose A ∈ Cn×n is nonsingular.
Show that Ax2 ≥ σn for all x ∈ Cn with x2 = 1. Show that
A−1 2 = max
x =0
x2
.
Ax2
Exercise 8.10 (p-Norm Example) Let
A=
2 −1
.
−1 2
Compute Ap and A−1 p for p = 1, 2, ∞.
Exercise 8.11 (Unitary Invariance of the Spectral Norm) Show that V A2 =
A2 holds even for a rectangular V as long as V ∗ V = I .
Exercise 8.12 (AU 2 Rectangular A) Find A ∈ R2×2 and U ∈ R2×1 with
U T U = I such that AU 2 < A2 . Thus, in general, AU 2 = A2 does
not hold for a rectangular U even if U ∗ U = I .
Exercise 8.13 (p-Norm of Diagonal Matrix) Show that Ap = ρ(A) :=
max |λi | (the largest eigenvalue of A), 1 ≤ p ≤ ∞, when A is a diagonal matrix.
Exercise 8.14 (Spectral Norm of a Column Vector) A vector a ∈ Cm can also be
considered as a matrix A ∈ Cm,1 .
a) Show that the spectral matrix norm (2-norm) of A equals the Euclidean vector
norm of a.
b) Show that Ap = ap for 1 ≤ p ≤ ∞.
Exercise 8.15 (Norm of Absolute Value Matrix) If A ∈ Cm×n has elements aij ,
let |A| ∈ Rm×n be the matrix with elements |aij |.
√
1+i −2
a) Compute |A| if A =
, i = −1.
1 1−i
b) Show that for any A ∈ Cm×n AF = |A| F , Ap = |A| p for p = 1, ∞.
c) Show that for any A ∈ Cm×n A2 ≤ |A| 2 .
d) Find a real symmetric 2 × 2 matrix A such that A2 < |A| 2 .
Exercise 8.16 (An Iterative Method (Exam Exercise 2017-3)) Assume that
A ∈ Cn×n is non-singular and nondefective (the eigenvectors of A form a basis
for Cn ). We wish to solve Ax = b. Assume that we have a list of the eigenvalues
{λ1 , λ2 , . . . , λm }, in no particular order. We have that m ≤ n, since some of the
eigenvalues may have multiplicity larger than one. Given x 0 ∈ Cn , and k ≥ 0, we
8.5 Exercises Chap. 8
193
define the sequence {x k }m−1
k=0 by
x k+1 = x k +
1
λk+1
r k , where r k = b − Ax k .
a) Let the coefficients cik be defined by
n
rk =
cik ui ,
i=1
where {(σi , ui )}ni=1 are the eigenpairs of A. Show that
ci,k+1 =
⎧
⎨0
⎩ci,k
if σi = λk+1 ,
1 − λk+1
otherwise.
σi
b) Show that for some l ≤ m, we have that x l = x l+1 = · · · = x m = x, where
Ax = b.
c) Consider this iteration for the n × n matrix T = tridiag(c, d, c), where d and c
are positive real numbers and d > 2c. The eigenvalues of T are
λj = d + 2c cos
jπ
,
n+1
j = 1, . . . , n.
What is the operation count for solving T x = b using the iterative algorithm
above?
d) Let now B be a symmetric n × n matrix which is zero on the “tridiagonal”, i.e.,
bij = 0 if |i − j | ≤ 1. Set A = T + B, where T is the tridiagonal matrix above.
We wish to solve Ax = b by the iterative scheme
T x k+1 = b − Bx k .
(8.40)
Recall that if E ∈ Rn×n has eigenvalues λ1 , . . . , λn then ρ(E) := maxi |λi | is
the spectral radius of E. Show that ρ(T −1 B) ≤ ρ(T −1 )ρ(B).
e) Show that the iteration (8.40) will converge if3
⎧
⎨
min max
⎩ i
3 Hint:
use Gershgorin’s theorem.
n
n
|bij |, max
j =1
j
|bij |
i=1
⎫
⎬
⎭
< d − 2c.
194
8 Matrix Norms and Perturbation Theory for Linear Systems
8.5.3 Exercises Sect. 8.3
Exercise 8.17 (Perturbed Linear Equation (Exam Exercise 1981-2)) Given the
systems Ax = b, Ay = b + e, where
1.1 1
A :=
,
1 1
2.1
b
,
b := 1 =
b2
2.0
e
e := 1 ,
e2
e2 = 0.1.
We define δ := x − y2 /x2 .
a) Determine K2 (A) = A2 A−1 2 . Give an upper bound and a positive lower
bound for δ without computing x and y.
b) Suppose as before that b2 = 2.0 and e2 = 0.1. Determine b1 and e which
maximize δ.
Exercise 8.18 (Sharpness of Perturbation Bounds) The upper and lower bounds
for y − x/x given by (8.22) can be attained for any matrix A, but only for
special choices of b. Suppose y A and y A−1 are vectors with y A = y A−1 = 1
and A = Ay A and A−1 = A−1 y A−1 .
a) Show that the upper bound in (8.22) is attained if b = Ay A and e = y A−1 .
b) Show that the lower bound is attained if b = y A−1 and e = Ay A .
Exercise 8.19 (Condition Number of 2. Derivative Matrix) In this exercise we
will show that for m ≥ 1
1
4
(m + 1)2 − 2/3 < condp (T ) ≤ (m + 1)2 ,
π2
2
p = 1, 2, ∞,
(8.41)
where T := tridiag(−1, 2, −1) ∈ Rm×m and condp (T ) := T p T −1 p is the pnorm condition number of T . The p matrix norm is given by (8.17). You will need
the explicit inverse of T given by (2.39) and the eigenvalues given in Lemma 2.2.
As usual we define h := 1/(m + 1).
a) Show that for m ≥ 3
1
cond1 (T ) = cond∞ (T ) =
2
,
m odd,
h−2 ,
−2
h − 1, m even.
and that cond1 (T ) = cond∞ (T ) = 3 for m = 2.
b) Show that for p = 2 and m ≥ 1 we have
cond2 (T ) = cot2
πh 2
= 1/ tan2
πh .
2
(8.42)
8.5 Exercises Chap. 8
195
c) Show the bounds
4
4 −2 2
h − < cond2 (T ) < 2 h−2 .
2
π
3
π
(8.43)
Hint: For the upper bound use the inequality tan x > x valid for 0 < x < π/2.
For the lower bound we use (without proof) the inequality cot2 x > x12 − 23 for
x > 0.
d) Show (8.41).
Exercise 8.20 (Perturbation of the Identity Matrix) Let E be a square matrix.
a) Show that if I − E is nonsingular then
(I − E)−1 − I ≤ E
(I − E)−1 b) If E < 1 then (I − E)−1 is nonsingular by exists and
1
1
≤ (I − E)−1 ≤
1 + E
1 − E
Show the lower bound. Show the upper bound if I = 1. In general for a
consistent matrix norm (i.e., the Frobenius norm) the upper bound follows from
Theorem 12.14 using Neumann series.
c) Show that if E < 1 then
(I − E)−1 − I ≤
E
.
1 − E
Exercise 8.21 (Lower Bounds in (8.27) and (8.29))
a) Solve for E in (8.30) and show that
K(B)−1
B −1 − A−1 E
≤
.
A
B −1 b) Show using a) and (8.28) that
B −1 − A−1 K(B)−1 E
≤
.
1 + r A
A−1 Exercise 8.22 (Periodic Spline Interpolation (Exam Exercise 1993-2)) Let the
components of x = [x0, . . . , xn ]T ∈ Rn+1 define a partition of the interval [a, b],
a = x0 < x1 < · · · < xn = b,
196
8 Matrix Norms and Perturbation Theory for Linear Systems
and given a dataset y := [y0 , . . . , yn ]T ∈ Rn+1 , where we assume y0 = yn . The
periodic cubic spline interpolation problem is defined by finding a cubic spline
function g satisfying the conditions
g(xi ) = yi ,
i = 0, 1, . . . , n,
g (a) = g (b),
g (a) = g (b).
(Recall that g is a cubic polynomial on each interval (xi−1 , xi ), for i = 1, . . . , n
with smoothness C 2 [a, b].)
We define si := g (xi ), i = 0, . . . , n. It can be shown that the vector s :=
[s1 , . . . , sn ]T is determined from a linear system
As = b,
(8.44)
where b ∈ Rn is a given vector determined by x and y. The matrix A ∈ Rn×n is
given by
⎡
⎤
· · · 0 λ1
⎥
.
2 μ2 . .
0 ⎥
⎥
. ⎥
.. .. .. ..
. .
.
. .. ⎥
⎥,
⎥
.. .. .. ..
. .
.
. 0 ⎥
⎥
⎥
..
. λn−1 2 μn−1 ⎦
0 · · · 0 λn 2
2 μ1 0
⎢
⎢ λ2
⎢
⎢
⎢0
A := ⎢
⎢ .
⎢ ..
⎢
⎢
⎣0
μn
where
λi :=
hi
,
hi−1 + hi
μi :=
hi−1
,
hi−1 + hi
, i = 1, . . . , n,
and
hi = xi+1 − xi ,
i = 0, . . . , n − 1, and hn = h0 .
You shall not argue or prove the system (8.44). Throughout this exercise we assume
that
1
hi
≤
≤ 2,
2
hi−1
i = 1, . . . , n.
8.5 Exercises Chap. 8
197
a) Show that
A∞ = 3
A1 ≤
and that
10
.
3
b) Show that A−1 ∞ ≤ 1.
c) Show that A−1 1 ≤ 32 .
d) Let s and b be as in (8.44), where we assume b = 0. Let e ∈ Rn be such that
ep /bp ≤ 0.01. Suppose ŝ satisfies
Aŝ = b + e.
Give estimates for
ŝ − s∞
s∞
and
ŝ − s1
.
s1
Exercise 8.23 (LSQ MATLAB Program (Exam Exercise 2013-4)) Suppose
A ∈ Rm×n , b ∈ Rm , where A has rank n and let A = U Σ V T be a singular
value factorization of A. Thus U ∈ Rm×n and Σ, V ∈ Rn×n . Write a MATLAB
function [x,K]=lsq(A,b) that uses the singular value factorization of A
to calculate a least squares solution x = V Σ −1 U T b to the system Ax =
b and the spectral (2-norm) condition number of A. The MATLAB command
[U,Sigma,V]=svd(A,0) computes the singular value factorization of A.
8.5.4 Exercises Sect. 8.4
Exercise 8.24 (When Is a Complex Norm an Inner Product Norm?) Given a
vector norm in a complex vector space V, and suppose (8.35) holds for all x, y ∈ V.
Show that
1
(8.45)
x + y2 − x − y2 + ix + iy2 − ix − iy2 ,
4
√
defines an inner product on V, where i = −1. The identity (8.45) is called the
polarization identity.4
x, y :=
Exercise 8.25 (p Norm for p = 1 and p = ∞) Show that ·p is a vector norm
in Rn for p = 1, p = ∞.
4 Hint:
We have x, y = s(x, y) + is(x, iy), where s(x, y) :=
1
4
x + y2 − x − y2 .
198
8 Matrix Norms and Perturbation Theory for Linear Systems
Exercise 8.26 (The p-Norm Unit Sphere) The set
Sp = {x ∈ Rn : xp = 1}
is called the unit sphere in Rn with respect to p. Draw Sp for p = 1, 2, ∞ for n = 2.
Exercise 8.27 (Sharpness of p-Norm Inequality) For p ≥ 1, and any x ∈ Cn we
have x∞ ≤ xp ≤ n1/p x∞ (cf. (8.5)).
Produce a vector x l such that x l ∞ = x l p and another vector x u such that
x u p = n1/p x u ∞ . Thus, these inequalities are sharp.
Exercise 8.28 (p-Norm Inequalities for Arbitrary p) If 1 ≤ q ≤ p ≤ ∞ then
xp ≤ xq ≤ n1/q−1/p xp ,
x ∈ Cn .
Hint: For the rightmost inequality use Jensen’s inequality Cf. Theorem 8.13 with
f (z) = zp/q and zi = |xi |q . For the left inequality consider first yi = xi /x∞ ,
i = 1, 2, . . . , n.
8.6 Review Questions
8.6.1
•
•
•
•
•
•
•
What is a consistent matrix norm?
what is a subordinate matrix norm?
is an operator norm consistent?
why is the Frobenius norm not an operator norm?
what is the spectral norm of a matrix?
how do we compute A∞ ?
what is the spectral condition number of a symmetric positive definite
matrix?
8.6.2 Does there exist a vector norm such that Ax ≤ AF x for all A ∈
Cn×n , x ∈ Cn , m, n ∈ N?
8.6.3 Why is A2 ≤ AF for any matrix A?
8.6.4 What is the spectral norm of the inverse of a normal matrix?
Chapter 9
Least Squares
Consider the linear system Ax = b of m equations in n unknowns. It is
overdetermined, if m > n, square, if m = n, and underdetermined, if m < n. In
either case the system can only be solved approximately if b ∈
/ R(A), the column
space of A. One way to solve Ax = b approximately is to select a vector norm
·, say a p-norm, and look for x ∈ Cn which minimizes Ax − b. The use of
the one and ∞ norm can be formulated as linear programming problems, while the
Euclidian norm leads to a linear system and has applications in statistics. Only this
norm is considered here.
Definition 9.1 (Least Squares Problem (LSQ)) Suppose m, n ∈ N, A ∈ Cm×n
and b ∈ Cm . To find x ∈ Cn that minimizes E : Cn → R given by
E(x) := Ax − b22 ,
is called the least squares problem. A minimizer x is called a least squares
solution.
√
Since the square root function is monotone, minimizing E(x) or E(x) is
equivalent.
Example 9.1 (Average) Consider an overdetermined linear system of 3 equations in
one unknown
x1 = 1
x1 = 1,
x1 = 2
⎡ ⎤
1
A = ⎣1⎦ ,
x = [x1 ],
1
© Springer Nature Switzerland AG 2020
T. Lyche, Numerical Linear Algebra and Matrix Factorizations,
Texts in Computational Science and Engineering 22,
https://doi.org/10.1007/978-3-030-36468-7_9
⎡ ⎤
1
b = ⎣1⎦ .
2
199
200
9 Least Squares
To solve this as a least squares problem we compute
Ax − b22 = (x1 − 1)2 + (x1 − 1)2 + (x1 − 2)2 = 3x12 − 8x1 + 6.
Setting the first derivative with respect to x1 equal to zero we obtain 6x1 − 8 = 0 or
x1 = 4/3, the average of b1 , b2 , b3 . The second derivative is positive and x1 = 4/3
is a global minimum.
We will show below the following results, valid for any m, n ∈ N, A ∈ Cm×n
and b ∈ Cn .
Theorem 9.1 (Existence) The least squares problem always has a solution.
Theorem 9.2 (Uniqueness) The solution of the least squares problem is unique if
and only if A has linearly independent columns.
Theorem 9.3 (Characterization) x ∈ Cn is a solution of the least squares problem
if and only if A∗ Ax = A∗ b.
The linear system A∗ Ax = A∗ b is known as the normal equations. By
Lemma 4.2 the coefficient matrix A∗ A is symmetric and positive semidefinite, and
it is positive definite if and only if A has linearly independent columns. This is
the same condition which guarantees that the least squares problem has a unique
solution.
9.1 Examples
Example 9.2 (Linear Regression) We want to fit a straight line p(t) = x1 + x2 t to
m ≥ 2 given data (tk , yk ) ∈ R2 , k = 1, . . . , m. This is part of the linear regression
process in statistics. We obtain the linear system
⎡ ⎤
⎤ ⎡
⎤
1 t1 p(t1 )
y1
⎢ .. ⎥ ⎢ .. ⎥ x1
⎢ .. ⎥
Ax = ⎣ . ⎦ = ⎣ . ⎦
= ⎣ . ⎦ = b.
x2
1 tm
ym
p(tm )
⎡
This is square for m = 2 and overdetermined for m > 2. The matrix A has linearly
independent columns if and only if the set {t1 , . . . , tm } of sites contains at least two
distinct elements. For if say ti = tj then
⎡ ⎤
⎡ ⎤ ⎡ ⎤
0
1
t1
0
1 ti c1
⎢ .. ⎥
⎢ .. ⎥ ⎢ .. ⎥
⇒ c1 = c2 = 0.
=
c1 ⎣ . ⎦ + c2 ⎣ . ⎦ = ⎣ . ⎦ ⇒
0
1 tj c2
tm
0
1
9.1 Examples
201
Conversely, if t1 = · · · = tm then the columns of A are linearly dependent. The
normal equations are
⎡
⎤
1 t1 m
tk x1
1 · · · 1 ⎢ . ⎥ x1
A∗ Ax =
=
,
⎣ .. ⎦
x2
tk
tk2 x2
t1 · · · tm
1 tm
⎡ ⎤
y1
yk
1 ··· 1 ⎢ . ⎥
=
= A∗ b,
⎣ .. ⎦ =
tk yk
t1 · · · tm
ym
where k ranges from 1 to m in the sums. By what we showed the coefficient matrix
is positive semidefinite and positive definite if we have at least two distinct cites. If
m = 2 and t1 = t2 then both systems Ax = b and A∗ Ax = A∗ b are square, and p
is the linear interpolant to the data. Indeed, p is linear and p(tk ) = yk , k = 1, 2.
With the data
t 1.0 2.0 3.0 4.0
y 3.1 1.8 1.0 0.1
6
4 10 x1
=
. The data and the least
10.1
10 30 x2
squares polynomial p(t) = x1 + x2 t = 3.95 − 0.98t are shown in Fig. 9.1.
the normal equations become
Example 9.3 (Input/Output Model) Suppose we have a simple input/output model.
To every input u ∈ Rn we obtain an output y ∈ R. Assuming we have a linear
relation
y = u∗ x =
n
ui x i ,
i=1
Fig. 9.1 A least squares fit to
data
y
6
@
@
×
@
@
@
@
×
@
@
×
@
@
@
×
@
- t
202
9 Least Squares
between u and y, how can we determine x?
Performing m ≥ n experiments we obtain a table of values
u u1 u2 · · · um
.
y y1 y2 · · · ym
We would like to find x such that
⎡
u∗1
⎢ u∗
⎢ 2
Ax = ⎢ .
⎣ ..
⎤
⎡
⎤
y1
⎥
⎢ y2 ⎥
⎥
⎢ ⎥
⎥ x = ⎢ . ⎥ = b.
⎦
⎣ .. ⎦
u∗m
ym
We can estimate x by solving the least squares problem minAx − b22 .
9.1.1 Curve Fitting
Given
•
•
•
•
size: 1 ≤ n ≤ m,
sites: S := {t1 , t2 , . . . , tm } ⊂ [a, b],
y-values: y = [y1 , y2 , . . . , ym ]∗ ∈ Rm ,
functions: φj : [a, b] → R, j = 1, . . . , n.
Find a function (curve fit) p : [a, b] → R given by p := nj=1 xj φj such that
p(tk ) ≈ yk for k = 1, . . . , m.
A solution to the curve fitting problem is found by finding an approximate
solution to the following overdetermined set of linear equations
⎤ ⎡
⎤⎡ ⎤ ⎡ ⎤
φ1 (t1 ) · · · φn (t1 )
x1
y1
p(t1 )
⎥
⎢
⎥
⎢
⎢ .. ⎥ ⎢ ..
.
.
.. ⎦ ⎣ .. ⎦ = ⎣ ... ⎥
Ax = ⎣ . ⎦ = ⎣ .
⎦ =: b.
xn
ym
p(tm )
φ1 (tm ) · · · φn (tm )
⎡
(9.1)
We propose to find x ∈ Rn as a solution of the corresponding least squares problem
given by
m
E(x) := Ax − b22 =
k=1
n
j =1
2
xj φj (tk ) − yk .
(9.2)
9.1 Examples
203
Typical examples of functions φj are polynomials, trigonometric functions, exponential functions, or splines.
In (9.2) one can also include weights wk > 0 for k = 1, . . . , m and minimize
m
E(x) :=
wk
k=1
n
2
xj φj (tk ) − yk .
j =1
If yk is an accurate observation, we can choose a large weight wk . This will force
p(tk ) − yk to be small. Similarly, a small wk will allow p(tk ) − yk to be large. If
an estimate for the standard deviation δyk in yk is known for each k, we can choose
wk = 1/(δyk )2 , k = 1, 2, . . . , m. For simplicity we will assume in the following
that wk = 1 for all k.
Lemma 9.1 (Curve Fitting) Let A be given by (9.1). The matrix A∗ A is symmetric
positive definite if and only if
n
p(tk ) :=
xj φj (tk ) = 0,
k = 1, . . . , m ⇒ x1 = · · · = xn = 0.
(9.3)
j =1
Proof By Lemma 4.2 A∗ A is positive definite if and only if A has linearly
n
independent columns. Since (Ax)k =
j =1 xj φj (tk ), k = 1, . . . , m this is
equivalent to (9.3).
Example 9.4 (Ill Conditioning and the Hilbert Matrix) The normal equations can
be extremely ill-conditioned. Consider the curve fitting problem using the polynomials φj (t) := t j −1 , for j = 1, . . . , n and equidistant sites tk = (k − 1)/(m − 1)
for k = 1, . . . , m. The normal equations are B n x = cn , where for n = 3
⎡
m
B 3 x := ⎣ tk
tk2
tk
tk2
tk3
⎤⎡ ⎤ ⎡
tk2
x1
tk3 ⎦ ⎣ x2 ⎦ = ⎣
tk4
x3
⎤
yk
tk yk ⎦ .
tk2 yk
B n is positive definite if at least n of the t’s are distinct. However B n is extremely
ill-conditioned even for moderate n. Indeed, m1 B n ≈ H n , where H n ∈ Rn×n is the
Hilbert Matrix with i, j element 1/(i + j − 1). Thus, for n = 3
⎡
1
H 3 = ⎣ 21
1
3
1
2
1
3
1
4
1⎤
3
1⎦ .
4
1
5
204
9 Least Squares
The elements of m1 B n are Riemann sums approximations to the elements of H n . In
fact, if B n = [bi,j ]ni,j =1 then
1
1
bi,j =
m
m
≈
0
m
i+j −2
tk
=
k=1
1
x i+j −2 dx =
1
m
m
k=1
k−1
m−1
i+j −2
1
= hi,j .
i+j −1
The elements of H −1
n are determined in Exercise 1.13. We find K1 (H 6 ) :=
H 6 1 H −1
≈
3
· 107 . It appears that m1 B n and hence B n is ill-conditioned
6 1
for moderate n at least if m is large. The cure for this problem is to use a different
basis for polynomials. Orthogonal polynomials are an excellent choice. Another
possibility is to use the shifted power basis (t − t˜)j −1 , j = 1, . . . , n, for a suitable t˜.
9.2 Geometric Least Squares Theory
The least squares problem can be studied as a quadratic minimization problem. In
the real case we have
E(x) := Ax − b22 = (Ax − b)∗ (Ax − b) = x ∗ A∗ Ax − 2x ∗ A∗ b + b∗ b.
Minimization of a quadratic function like E(x) will be considered in Chap. 13.
Here we consider a geometric approach based on orthogonal sums of subspaces, cf.
Sect. 5.1.3.
With the usual inner product x, y = y ∗ x, orthogonal sums and projections we
can prove the existence, uniqueness and characterization theorems for least squares
problems. For A ∈ Cm×n we consider the column space S := R(A) of A and the
null space T := N (A∗ ) of A∗ . These are subspaces of Cm and by Theorem 7.3 we
have the orthogonal sum
⊥
Cm = R(A)⊕N (A∗ ).
(9.4)
Proof of Theorem 9.1 It follows from (9.4) that any b ∈ Cm can be decomposed
uniquely as b = b1 + b2 , where b1 is the orthogonal projection of b into R(A)
and b2 is the orthogonal projection of b into N (A∗ ). Suppose x ∈ Cn . Clearly
Ax −b1 ∈ R(A) since it is a subspace and b2 ∈ N (A∗ ). But then Ax −b1 , b2 = 0
and by Pythagoras
Ax − b22 = (Ax − b1 ) − b2 22 = Ax − b1 22 + b2 22 ≥ b2 22
9.3 Numerical Solution
205
with equality if and only if Ax = b1 . It follows that the set of all least squares
solutions is
{x ∈ Cn : Ax = b1 }.
(9.5)
This set is nonempty since b1 ∈ R(A).
Proof of Theorem 9.2 The set (9.5) contains exactly one element if and only if A
has linearly independent columns.
Proof of Theorem 9.3 If x solves the least squares problem then Ax − b1 = 0 and
it follows that A∗ (Ax − b) = A∗ (Ax − b1 ) = 0 since b2 ∈ N (A∗ ). This shows that
the normal equations hold. Conversely, if A∗ Ax = A∗ b then A∗ b2 = 0 implies that
A∗ (Ax − b1 ) = 0. But then Ax − b1 ∈ R(A) ∩ N (A∗ ) showing that Ax − b1 = 0,
and x is a least squares solution.
9.3 Numerical Solution
We assume that m ≥ n, A ∈ Cm×n , b ∈ Cm . We consider numerical methods based
on normal equations, QR factorization, or Singular Value Factorization. For more
see [2]. We discuss the first two approaches in this section. Another possibility is to
use an iterative method like the conjugate gradient method (cf. Exercise 13.10).
9.3.1 Normal Equations
We assume that rank(A) = n, i.e., A has linearly independent columns. The
coefficient matrix B := A∗ A in the normal equations is positive definite, and we
can solve these equations using the Cholesky factorization of B. Consider forming
the normal equations. We can use either a column oriented (inner product)- or a row
oriented (outer product) approach.
1. inner product: (A∗ A)i,j =
(A∗ b)i =
2. outer product: A∗ A =
m
k=1 a k,i ak,j ,
m
a b ,i=
k=1
⎡ k,i ⎤k
i, j = 1, . . . , n,
1, . . . , n,
a k,1
m ⎢ .. ⎥
∗
k=1 ⎣ . ⎦ [ak1 · · · akn ], A b =
a k,n
⎡
⎤
a k,1
m ⎢ .. ⎥
k=1 ⎣ . ⎦ bk .
a k,n
The outer product form is suitable for large problems since it uses only one pass
through the data importing one row of A at a time from some separate storage.
Consider the number of operations to find the least squares solution for real data.
We need 2m arithmetic operations for each inner product. Since B is symmetric
we only need to compute n(n + 1)/2 such inner products. It follows that B can be
computed in approximately mn2 arithmetic operations. In conclusion the number
206
9 Least Squares
of operations are mn2 to find B, 2mn to find c := A∗ b, n3 /3 to find L such that
B = LL∗ , n2 to solve Ly = c and n2 to solve L∗ x = y. If m ≈ n it takes 43 n3 =
2Gn arithmetic operations. If m is much bigger than n the number of operations is
approximately mn2 , the work to compute B.
Conditioning of A can be a problem with the normal equation approach. We have
Theorem 9.4 (Spectral Condition Number of A∗ A) Suppose 1 ≤ n ≤ m and
that A ∈ Cm×n has linearly independent columns. Then
K2 (A∗ A) := A∗ A2 (A∗ A)−1 2 =
σ2
λ1
= 12 = K2 (A)2 ,
λn
σn
(9.6)
where λ1 ≥ · · · ≥ λn > 0 are the eigenvalues of A∗ A, and σ1 ≥ · · · ≥ σn > 0 are
the singular values of A.
Proof Since A∗ A is Hermitian it follows from Theorem 8.10 that K2 (A) =
∗
K2 (A A) =
λ1
λn .
But λi =
σi2
by Theorem 7.2 and the proof is complete.
σ1
σn
and
It follows from Theorem 9.4 that the 2-norm condition number of B := A∗ A is
the square of the condition number of A and therefore can be quite large even if A
is only mildly ill-conditioned. Another difficulty which can be encountered is that
the computed A∗ A might not be positive definite. See Problem 9.21 for an example.
9.3.2 QR Factorization
The QR factorization can be used to solve the least squares problem. We assume
that rank(A) = n, i.e., A has linearly independent columns. Suppose A = Q1 R 1
is a QR factorization of A. Since Q1 ∈ Cm×n has orthonormal columns we find
A∗ A = R ∗1 Q∗1 Q1 R 1 = R ∗1 R 1 ,
A∗ b = R ∗1 Q∗1 b.
Since A has rank n the matrix R ∗1 is nonsingular and can be canceled. Thus
A∗ Ax = A∗ b ⇒ R 1 x = c1 ,
c1 := Q∗1 b.
We can use Householder transformations or Givens rotations to find R 1 and c1 .
Consider using the Householder triangulation algorithm Algorithm 5.2. We find
R = Q∗ A and c = Q∗ b, where A = QR is the QR decomposition of A.
The matrices R 1 and c1 are located in the first n rows of R and c. Using also
Algorithm 3.2 we have the following method to solve the full rank least squares
problem.
1. [R,c]=housetriang(A,b).
2. x=rbacksolve(R(1:n,1:n),c(1:n),n).
9.3 Numerical Solution
207
Example 9.5 (Solution Using QR Factorization) Consider the least squares problem with
⎡
⎤
⎡ ⎤
1 3 1
1
⎢1 3 7 ⎥
⎢1⎥
⎥
⎢ ⎥
A=⎢
⎣1 −1 −4⎦ and b = ⎣1⎦ .
1 −1 2
1
This is the matrix in Example 5.2. The least squares solution x is found by solving
the system
⎡ ⎤
⎡
⎤
⎡ ⎤
⎤⎡ ⎤
1
1 1 1 1
2
223
x1
⎢1⎥
1
⎣0 4 5⎦ ⎣x2 ⎦ = ⎣ 1 1 −1 −1⎦ × ⎢ ⎥ = ⎣0⎦ ,
⎣1⎦
2
x3
−1 1 −1 1
0
006
1
⎡
and we find x = [1, 0, 0]∗.
Using Householder triangulation is a useful alternative to normal equations for
solving full rank least squares problems. It can even be extended to rank deficient
problems, see [2]. The 2 norm condition number for the system R 1 x = c1 is
K2 (R 1 ) = K2 (Q1 R 1 ) = K2 (A), and as discussed in the previous section this
is the square root of K2 (A∗ A), the condition number for the normal equations.
Thus if A is mildly ill-conditioned the normal equations can be quite ill-conditioned
and solving the normal equations can give inaccurate results. On the other hand
Algorithm 5.2 is quite stable.
But using Householder transformations requires more work. The leading term in
the number of arithmetic operations in Algorithm 5.2 is approximately 2mn2 −
2n3 /3, (cf. (5.16)) while the number of arithmetic operations needed to form
the normal equations, taking advantage of symmetry is approximately mn2 . Thus
for m much larger than n using Householder triangulation requires twice as
many arithmetic operations as the approach based on the normal equations. Also,
Householder triangulation have problems taking advantage of the structure in sparse
problems.
Using MATLAB a least squares solution can be found using x=A\b if A has
full rank.For rank deficient problems the function x=lscov(A,b) finds a least
squares solution with a maximal number of zeros in x.
9.3.3 Singular Value Decomposition, Generalized Inverses and
Least Squares
Further insight into the least squares problem can be obtained by considering a singular value decomposition of A and the corresponding singular value factorization.
208
9 Least Squares
If A has rank r then
Σ1 0 V ∗1
A = U ΣV = [U 1 , U 2 ]
= U 1 Σ 1 V ∗1 ,
0 0 V ∗2
∗
Σ1 = diag(σ1 , . . . , σr ),
(9.7)
where
U 1 = [u1 , . . . , ur ], U 2 = [ur+1 , . . . , um ],
U ∗1 U 1 = I , U ∗2 U 2 = I ,
V 1 = [v 1 , . . . , v r ], V 2 = [v r+1 , . . . , v n ],
V ∗1 V 1 = I , V ∗2 V 2 = I
and σ1 ≥ · · · ≥ σr > 0. We recall (cf. Theorem 7.3)
•
•
•
•
the set of columns of U 1
the set of columns of U 2
the set of columns of V 1
the set of columns of V 2
is an orthonormal basis for R(A),
is an orthonormal basis for N (A∗ ),
is an orthonormal basis for R(A∗ ),
is an orthonormal basis for N (A).
The concept of the inverse of a matrix can be generalized to any rectangular
matrix.
Theorem 9.5 (The Generalized Inverse) For any m, n ∈ N and any A ∈ Cm×n
there is a unique matrix A† ∈ Cn×m such that
AA† A = A, A† AA† = A† , (A† A)∗ = A† A, (AA† )∗ = AA† .
(9.8)
If U 1 Σ 1 V ∗1 is a singular value factorization of A then
∗
A† = V 1 Σ −1
1 U 1.
(9.9)
Proof For existence we show that the matrices
A = U 1 Σ 1 V ∗1 ,
∗
A† := V 1 Σ −1
1 U1
satisfies (9.8). Since U 1 and V 1 have orthonormal columns we find
∗
∗
∗
A† A = V 1 Σ −1
1 U 1U 1Σ 1V 1 = V 1V 1,
∗
∗
AA† = U 1 Σ 1 V ∗1 V 1 Σ −1
1 U 1 = U 1U 1.
(9.10)
A similar calculation shows that (A† A)∗ = V 1 V ∗1 and (AA† )∗ = U 1 U ∗1 showing
that A† A and AA† are Hermitian. Moreover, by (9.10)
AA† A = U 1 Σ 1 V ∗1 V 1 V ∗1 = U 1 Σ 1 V ∗1 = A
∗
∗
−1 ∗
†
A† AA† = V 1 Σ −1
1 U 1U 1U 1 = V 1Σ U 1 = A .
9.3 Numerical Solution
209
Thus (9.8) follows.
That there is only one matrix A† ∈ Cn×m satisfying (9.8) is shown in
Exercise 9.5.
The matrix A† is called the generalized inverse of A. We note that
1. If A is square and nonsingular then A−1 satisfies (9.8) so that A−1 = A† . Indeed,
A−1 A = AA−1 = I implies that A−1 A and AA−1 are Hermitian. Moreover,
AA−1 A = A, A−1 AA−1 = A−1 . By uniqueness A−1 = A† .
2. We show in Exercise 9.7 that if A has linearly independent columns then
A† = (A∗ A)−1 A∗ .
(9.11)
For further properties and examples of the generalized inverse see the exercises.
Orthogonal projections can be expressed in terms of generalized inverses and
singular vectors.
Theorem 9.6 (Orthogonal Projections) Given m, n ∈ N, A ∈ Cm×n of rank r,
and let S be one of the subspaces R(A), N (A∗ ). The orthogonal projection of
v ∈ Cm into S can be written as a matrix P S times the vector v in the form P S v,
where
r
P R(A) = AA† = U 1 U ∗1 =
uj u∗j ∈ Cm×m ,
j =1
†
P N (A∗ ) = I − AA =
U 2 U ∗2
m
=
(9.12)
uj u∗j
∈C
m×m
.
j =r+1
where A† is the generalized inverse of A, and A = U ΣV ∗ ∈ Cm×n is a singular
value decomposition of A (cf. (9.7)).
Proof By block multiplication we have for any v ∈ Cm
∗
U1
v = U U v = [U 1 , U 2
v = s + t,
U ∗2
∗
where s = U 1 U ∗1 v ∈ R(A) and t = U 2 U ∗2 v ∈ N (A∗ ). By uniqueness and (9.10)
we obtain the first equation in (9.12). Since v = (U 1 U ∗1 + U 2 U ∗2 )v for any v ∈ Cm
we have U 1 U ∗1 + U 2 U ∗2 = I , and hence U 2 U ∗2 = I − U 1 U ∗1 = I − AA† and the
second equation in (9.12) follows.
Corollary 9.1 (LSQ Characterization Using Generalized Inverse) x ∈ Cn
solves the least squares problem minx Ax − b22 if and only if x = A† b + z,
where A† is the generalized inverse of A and z ∈ N (A).
210
9 Least Squares
Proof It follows from Theorem 9.6 that b1 := AA† b is the orthogonal projection
of b ∈ Cm into R(A). Moreover, (9.5) implies that x is a least squares solution if
and only if Ax = b1 .
Let x be a least squares solution, i.e., Ax = b1 . If z := x − A† b then Az =
Ax − AA† b = b1 − b1 = 0 and x = A† b + z.
Conversely, if x = A† b + z with Az = 0 then Ax = A(A† b + z) = b1 and x is
a least squares solution.
The least squares solution A† b has an interesting property.
Theorem 9.7 (Minimal Norm Solution) The least squares solution with minimal
Euclidian norm is x = A† b corresponding to z = 0.
Proof Consider a singular value decomposition of A using the notation in (9.7).
Suppose x = A† b + z, with z ∈ N (A). Since the columns of V 2 form a basis for
N (A) we have z = V 2 y for some y. Moreover, V ∗2 V 1 = 0 since V has orthonormal
columns. But then z∗ A† b = y ∗ V ∗2 V 1 Σ −1 U ∗1 b = 0. Thus z and A† b are orthogonal
so that by Pythagoras x22 = A† b + z22 = A† b22 + z22 ≥ A† b22 with
equality for z = 0.
Example 9.6 (Rank Deficient
Least Squares Solution) Consider the least squares
problem with A = 11 11 and b = [1, 1]∗. The singular value factorization,A† and
A† b are given by
1 1 !1" 1 1
1 1
1 1/2
√ 1 1 = A, A† b =
A := √
.
[2] √ 1 1 , A† = √
1/2
1
1
2
4
2
2
2
2
Using Corollary 9.1 we find the general solution [1/2, 1/2] +[a, −a] for any a ∈ C.
The MATLAB function lscov gives the solution [1, 0]∗ corresponding to a = 1/2,
while the minimal norm solution is [1/2, 1/2] obtained for a = 0.
9.4 Perturbation Theory for Least Squares
In this section we consider what effect small changes in the data A, b have on the
solution x of the least squares problem minAx − b2 .
If A has linearly independent columns then we can write the least squares
solution x (the solution of A∗ Ax = A∗ b) as (cf. Exercise 9.7)
x = A† b = A† b1 ,
A† = (A∗ A)−1 A∗ ,
where b1 is the orthogonal projection of b into the column space R(A).
9.4 Perturbation Theory for Least Squares
211
9.4.1 Perturbing the Right Hand Side
Let us now consider the effect of a perturbation in b on x.
Theorem 9.8 (Perturbing the Right Hand Side) Suppose A ∈ Cm×n has linearly
independent columns, and let b, e ∈ Cm . Let x, y ∈ Cn be the solutions of
minAx − b2 and minAy − b − e2 . Finally, let b1 , e1 be the orthogonal
projections of b and e into R(A). If b1 = 0, we have for any operator norm
1 e1 y − x
e1 ≤
≤ K(A)
,
K(A) b1 x
b1 K(A) = AA† .
(9.13)
Proof Subtracting x = A† b1 from y = A† b1 + A† e1 we have y − x = A† e1 .
Thus y − x = A† e1 ≤ A† e 1 . Moreover, b1 = Ax ≤ Ax.
Therefore y − x/x ≤ AA† e1 /b1 proving the rightmost inequality.
From A(x − y) = e1 and x = A† b1 we obtain the leftmost inequality.
Equation (9.13) is analogous to the bound (8.22) for linear systems. We see that
the number K(A) = AA† generalizes the condition number AA−1 for
a square matrix. The main difference between (9.13) and (8.22) is however that
e/b in (8.22) has been replaced by e1 /b1 , the orthogonal projections of
e and b into R(A). If b lies almost entirely in N (A∗ ), i.e. b/b1 is large, then
e1 /b 1 can be much larger than e/b. This is illustrated in Fig. 9.2. If b
is almost orthogonal to R(A), then e1 /b 1 will normally be much larger than
e/b.
Example 9.7 (Perturbing the Right Hand Side) Suppose
⎡
⎤
11
A = ⎣0 1⎦,
00
⎡
⎤
10−4
b = ⎣ 0 ⎦,
1
⎡
⎤
10−6
e = ⎣ 0 ⎦.
0
Fig. 9.2 Graphical
interpretation of the bounds
in Theorem 9.8
N(AT)
b
e
e
span(A)
1
b1
b
2
212
9 Least Squares
For this example we can compute K(A) by finding A† explicitly. Indeed,
11
2 −1
1 −1 0
∗
−1
†
∗
−1 ∗
A A=
, (A A) =
, A = (A A) A =
.
12
−1 1
0 1 0
∗
Thus K∞ (A) = A∞ A† ∞ = 2 · 2 = 4 is quite small.!
"
100
Consider now the projections b1 and e1 . We find AA† = 0 1 0 . Hence
000
b1 = AA† b = [10−4 , 0, 0]∗ ,
and e1 = AA† e = [10−6 , 0, 0]∗ .
Thus e1 ∞ /b1 ∞ = 10−2 and (9.13) takes the form
1 −2 y − x∞
10 ≤
≤ 4 · 10−2 .
4
x∞
(9.14)
To verify the bounds we compute the solutions as x = A† b = [10−4 , 0]∗ and
y = A† (b + e) = [10−4 + 10−6 , 0]∗ . Hence
10−6
x − y∞
= −4 = 10−2 ,
x∞
10
in agreement with (9.14)
For each A we can find b and e so that we have equality in the upper bound in
(9.13). The lower bound is best possible in a similar way.
9.4.2 Perturbing the Matrix
The analysis of the effects of a perturbation E in A is quite difficult. The following
result is stated without proof, see [12, p. 51]. For other estimates see [2] and [19].
Theorem 9.9 (Perturbing the Matrix) Suppose A, E ∈ Cm×n , m > n, where A
has linearly independent columns and α := 1 − E2 A† 2 > 0. Then A + E
has linearly independent columns. Let b = b1 + b2 ∈ Cm where b1 and b2 are the
orthogonal projections into R(A) and N (A∗ ) respectively. Suppose b1 = 0. Let x
and y be the solutions of minAx − b2 and min(A + E)y − b2 . Then
ρ=
x − y2
E2
1
≤ K(1 + βK)
,
x2
α
A2
β=
b 2 2
, K = A2 A† 2 .
b 1 2
(9.15)
Equation (9.15) says that the relative error in y as an approximation to x can be at
most K(1+βK)/α times as large as the size E2 /A2 of the relative perturbation
9.5 Perturbation Theory for Singular Values
213
in A. β will be small if b lies almost entirely in R(A), and we have approximately
ρ ≤ α1 KE2 /A2 . This corresponds to the estimate (8.25) for linear systems.
If β is not small, the term α1 K 2 βE2 /A2 will dominate. In other words, the
condition number is roughly K(A) if β is small and K(A)2 β if β is not small. Note
that β is large if b is almost orthogonal to R(A) and that b2 = b −Ax is the residual
of x.
9.5 Perturbation Theory for Singular Values
In this section we consider what effect a small change in the matrix A has on the
singular values.
9.5.1 The Minmax Theorem for Singular Values and the
Hoffman-Wielandt Theorem
We have a minmax and maxmin characterization for singular values.
Theorem 9.10 (The Courant-Fischer Theorem for Singular Values) Suppose
A ∈ Cm×n has singular values σ1 , σ2 , . . . , σn ordered so that σ1 ≥ · · · ≥ σn . Then
for k = 1, . . . , n
σk =
min
max
dim(S )=n−k+1 x∈S
x =0
Ax2
Ax2
= max min
.
x2
x2
x∈
S
dim(S )=k
(9.16)
x =0
Proof We have
Ax22
x22
=
x ∗ A∗ Ax
(Ax)∗ (Ax)
=
= RA∗ A (x),
x∗x
x∗x
the Rayleigh quotient of A∗ A. Since the singular values of A are the nonnegative
square roots of the eigenvalues of A∗ A, the results follow from the Courant-Fischer
Theorem for eigenvalues, see Theorem 6.1.
By taking k = 1 and k = n in (9.16) we obtain for any A ∈ Cm×n
σ1 = maxn
x∈C
x =0
Ax2
,
x2
σn = minn
x∈C
x =0
Ax2
.
x2
This follows since the only subspace of Cn of dimension n is Cn itself.
Using Theorem 9.10 we obtain the following result.
(9.17)
214
9 Least Squares
Theorem 9.11 (Perturbation of Singular Values) Let A, B ∈ Rm×n be rectangular matrices with singular values α1 ≥ α2 ≥ · · · ≥ αn and β1 ≥ β2 ≥ · · · ≥ βn .
Then
|αj − βj | ≤ A − B2 , for j = 1, 2, . . . , n.
(9.18)
Proof Fix j and let S be the n−j +1 dimensional subspace for which the minimum
in Theorem 9.10 is obtained for B. Then
(B + (A − B))x2
Bx2
(A − B)x2
≤ max
+ max
x2
x2
x∈S
x∈S x2
x∈S
αj ≤ max
x =0
x =0
x =0
≤ βj + A − B2 .
By symmetry we obtain βj ≤ αj + A − B2 and the proof is complete.
The following result is an analogue of Theorem 8.11.
Theorem 9.12 (Generalized Inverse When Perturbing the Matrix) Let A, E ∈
Rm×n have singular values α1 ≥ · · · ≥ αn and 1 ≥ · · · ≥ n . If rank(A + E) ≤
rank(A) = r and A† 2 E2 < 1 then
1. rank(A + E) = rank(A),
A† 2
2. (A + E)† 2 ≤
†
1−A 2 E2
=
1
αr −
1
.
Proof Suppose A has rank r and let B := A + E have singular values β1 ≥
· · · ≥ βn . In terms of singular values the inequality A† 2 E2 < 1 can be written
1 /αr < 1 or αr > 1 . By Theorem 9.11 we have αr − βr ≤ E2 = 1 , which
implies βr ≥ αr − 1 > 0, and this shows that rank(A + E) ≥ r. Thus 1. follows.
To prove 2., the inequality βr ≥ αr − 1 implies that
(A + E)† 2 =
1
1
≤
βr
αr −
1
=
1/αr
A† 2
=
.
1 − 1 /αr
1 − A† 2 E2
The Hoffman-Wielandt Theorem, see Theorem 6.14, for eigenvalues of Hermitian matrices can be written
n
n
n
|μj − λj |2 ≤ A − B2F :=
j =1
|aij − bij |2 ,
(9.19)
i=1 j =1
where A, B ∈ Cn×n are both Hermitian matrices with eigenvalues λ1 ≥ · · · ≥ λn
and μ1 ≥ · · · ≥ μn , respectively.
For singular values we have a similar result.
9.5 Perturbation Theory for Singular Values
215
Theorem 9.13 (Hoffman-Wielandt Theorem for Singular Values) For any
m, n ∈ N and A, B ∈ Cm×n we have
n
|βj − αj |2 ≤ A − B2F .
(9.20)
j =1
where α1 ≥ · · · ≥ αn and β1 ≥ · · · ≥ βn are the singular values of A and B,
respectively.
Proof We apply the Hoffman-Wielandt Theorem for eigenvalues to the Hermitian
matrices
0 A
0 B
C :=
and
D
:=
∈ C(m+n)×(m+n) .
A∗ 0
B∗ 0
If C and D have eigenvalues λ1 ≥ · · · ≥ λm+n and μ1 ≥ · · · ≥ μm+n , respectively
then
m+n
|λj − μj |2 ≤ C − D2F .
(9.21)
j =1
Suppose A has rank r and SVD [u1 , . . . , um ]Σ[v 1 , . . . , v n ]∗ . We use (7.7) and
determine the eigenpairs of C as follows.
0 A ui
Av i
αi ui
ui
=
=
=
α
, i = 1, . . . , r,
i
A∗ 0 v i
A∗ ui
αi v i
vi
−Av i
−αi ui
ui
0 A
ui
=
=
= −αi
, i = 1, . . . , r,
A∗ ui
αi v i
−v i
A∗ 0 −v i
0
0 A ui
0
u
=
=
= 0 i , i = r + 1, . . . , m,
∗
∗
0
0
0
A 0
A ui
Av i
0
0
0 A 0
=
, i = r + 1, . . . , n.
=0
=
∗
vi
0
0
A 0 vi
Thus C has the 2r eigenvalues α1 , −α1 , . . . , αr , −αr and m + n − 2r additional
zero eigenvalues. Similarly, if B has rank s then D has the 2s eigenvalues
β1 , −β1 , . . . , βs , −βs and m + n − 2s additional zero eigenvalues. Let
t := max(r, s).
Then
λ1 ≥ · · · ≥ λm+n = α1 ≥ · · · ≥ αt ≥ 0 = · · · = 0 ≥ −αt ≥ · · · ≥ −α1 ,
μ1 ≥ · · · ≥ μm+n = β1 ≥ · · · ≥ βt ≥ 0 = · · · = 0 ≥ −βt ≥ · · · ≥ −β1 .
216
9 Least Squares
We find
m+n
t
|λj − μj | =
j =1
t
|αi − βi | +
2
i=1
t
|−αi + βi | = 2
2
|αi − βi |2
2
i=1
i=1
and
C −D2F = 0
A−B 2
F = B −A2F +(B −A)∗ 2F = 2B −A2F .
A∗ − B ∗ 0
But then (9.21) implies ti=1 |αi − βi |2 ≤ B − A2F . Since t ≤ n and αi = βi = 0
for i = t + 1, . . . , n we obtain (9.20).
Because of Theorem 9.11 and the Hoffman-Wielandt Theorem for singular
values, Theorem 9.13 we will say that the singular values of a matrix are well
conditioned. Changing the Frobenius norm or the spectral norm of a matrix by
small amount only changes the singular values by a small amount.
9.6 Exercises Chap. 9
9.6.1 Exercises Sect. 9.1
Exercise 9.1 (Fitting a Circle to Points) In this problem we derive an algorithm
to fit a circle (t − c1 )2 + (y − c2 )2 = r 2 to m ≥ 3 given points (ti , yi )m
i=1 in the
(t, y)-plane. We obtain the overdetermined system
(ti − c1 )2 + (yi − c2 )2 = r 2 , i = 1, . . . , m,
(9.22)
of m equations in the three unknowns c1 , c2 and r. This system is nonlinear, but it
can be solved from the linear system
ti x1 + yi x2 + x3 = ti2 + yi2 , i = 1, . . . , m,
(9.23)
and then setting c1 = x1 /2, c2 = x2 /2 and r 2 = c12 + c22 + x3 .
a) Derive (9.23) from (9.22). Explain how we can find c1 , c2 , r once [x1 , x2 , x3 ] is
determined.
b) Formulate (9.23) as a linear least squares problem for suitable A and b.
c) Does the matrix A in b) have linearly independent columns?
d) Use (9.23) to find the circle passing through the three points (1, 4), (3, 2), (1, 0).
9.6 Exercises Chap. 9
217
Exercise 9.2 (Least Square Fit (Exam Exercise 2018-1))
√ √ 2 √2
a) Let A be the matrix
. Find the singular values of A, and compute
3
0
A2 .
3α
b) Consider the matrix A =
, where α is a real number. For which values of
α1
α is A positive definite?
c) We would like to fit the points p 1 = (0, 1), p 2 = (1, 0), p 3 = (2, 1) to a straight
line in the plane. Find a line p(x) = mx + b which minimizes
3
p(xi ) − yi 2 ,
i=1
where p i = (xi , yi ). Is this solution unique?
9.6.2 Exercises Sect. 9.2
Exercise 9.3 (A Least Squares Problem (Exam Exercise 1983-2)) Suppose A ∈
Rm×n and let I ∈ Rn×n be the identity matrix. We define F : Rn → R by
F (x) := Ax − b22 + x22 .
a) Show that the matrix B := I + AT A is symmetric and positive definite.
b) Show that
F (x) = x T Bx − 2cT x + bT b,
where c = AT b.
c) Show that to every b ∈ Rm there is a unique x which minimizes F . Moreover, x
is the unique solution of the linear system (I + AT A)x = AT b.
Exercise 9.4 (Weighted Least Squares (Exam Exercise 1977-2)) For m ≥ n
we are given A ∈ Rm×n with linearly independent columns, b ∈ Rm , and D :=
diag(d1 , d2 , . . . , dm ) ∈ Rm×m , where di > 0, i = 1, 2, . . . , m. We want to minimize
m
r(x)2D :=
ri (x)2 di ,
i=1
x ∈ Rn ,
(9.24)
218
9 Least Squares
where ri = ri (x), i = 1, 2, . . . , m are the components of the vector
r = r(x) = b − Ax.
a) Show that r(x)2D in (9.24) obtains a unique minimum when x = x min is the
solution of the system
AT DAx = AT Db.
b) Show that
K2 (AT DA) ≤ K2 (AT A)K2 (D),
where for any nonsingular matrix K2 (B) := B2 B −1 2 .
9.6.3 Exercises Sect. 9.3
Exercise 9.5 (Uniqueness of Generalized Inverse) Given A ∈ Cm×n , and suppose B, C ∈ Cn×m satisfy
ABA
BAB
(AB)∗
(BA)∗
=A
=B
= AB
= BA
(1) ACA = A,
(2) CAC = C,
(3) (AC)∗ = AC,
(4) (CA)∗ = CA.
Verify the following proof that B = C.
B = (BA)B = (A∗ )B ∗ B = (A∗ C ∗ )A∗ B ∗ B = CA(A∗ B ∗ )B
= CA(BAB) = (C)AB = C(AC)AB = CC ∗ A∗ (AB)
= CC ∗ (A∗ B ∗ A∗ ) = C(C ∗ A∗ ) = CAC = C.
Exercise 9.6 (Verify That a !Matrix
Inverse) Show that the
" Is a Generalized
11
1 110
†
generalized inverse of A = 1 1 is A = 4 1 1 0 without using the singular
00
value decomposition of A.
Exercise 9.7 (Linearly Independent Columns and Generalized Inverse) Suppose A ∈ Cm×n has linearly independent columns. Show that A∗ A is nonsingular
and A† = (A∗ A)−1 A∗ . If A has linearly independent rows, then show that AA∗ is
nonsingular and A† = A∗ (AA∗ )−1 .
9.6 Exercises Chap. 9
219
Exercise 9.8 (More Orthogonal Projections) Given m, n ∈ N, A ∈ Cm×n of
rank r, and let S be one of the subspaces R(A∗ ), N (A). Show that the orthogonal
projection of v ∈ Cn into S can be written as a matrix P S times the vector v in the
form P S v, where
r
P R(A∗ ) = A† A = V 1 V ∗1 =
v j v ∗j ∈ Cn×n
j =1
P N (A) = I − A† A = V 2 V ∗2 =
n
(9.25)
v j v ∗j ∈ Cn×n .
j =r+1
where A† is the generalized inverse of A and A = U ΣV ∗ ∈ Cm×n , is a singular
value decomposition of A (cf. (9.7)). Thus (9.12) and (9.25) give the orthogonal
projections into the 4 fundamental subspaces. Hint: by Theorem 7.3 we have the
orthogonal sum
⊥
Cn = R(A∗ )⊕N (A).
Exercise 9.9 (The Generalized Inverse of a Vector) Show that u† = (u∗ u)−1 u∗
if u ∈ Cn,1 is nonzero.
Exercise 9.10 (The Generalized Inverse of an Outer Product) If A = uv ∗ where
u ∈ Cm , v ∈ Cn are nonzero, show that
A† =
1 ∗
A ,
α
α = u22 v22 .
Exercise 9.11 (The Generalized Inverse of a Diagonal Matrix) Show that
diag(λ1 , . . . , λn )† = diag(λ†1 , . . . , λ†n ) where
,
λ†i =
1/λi , λi = 0
0 λi = 0.
Exercise 9.12 (Properties of the Generalized Inverse) Suppose A ∈ Cm×n .
Show that
a) (A∗ )† = (A† )∗ .
b) (A† )† = A.
c) (αA)† = α1 A† , α = 0.
Exercise 9.13 (The Generalized Inverse of a Product) Suppose k, m, n ∈ N, A ∈
Cm×n , B ∈ Cn×k . Suppose A has linearly independent columns and B has linearly
independent rows.
a) Show that (AB)† = B † A† .
Hint: Let E = AB, F = B † A† . Show by using A† A = BB † = I that F is the
generalized inverse of E.
b) Find A ∈ R1,2 , B ∈ R2,1 such that (AB)† = B † A† .
220
9 Least Squares
Exercise 9.14 (The Generalized Inverse of the Conjugate Transpose) Show that
A∗ = A† if and only if all singular values of A are either zero or one.
Exercise 9.15 (Linearly Independent Columns) Show that if A has rank n then
A(A∗ A)−1 A∗ b is the projection of b into R(A). (Cf. Exercise 9.8.)
Exercise 9.16 (Analysis of the General Linear System) Consider the linear
system Ax = b where A ∈ Cn×n has rank r > 0 and b ∈ Cn . Let
U ∗ AV =
Σ1 0
0 0
represent the singular value decomposition of A.
a) Let c = [c1 , . . . , cn ]∗ = U ∗ b and y = [y1 , . . . , yn ]∗ = V ∗ x. Show that Ax = b
if and only if
Σ1 0
y = c.
0 0
b) Show that Ax = b has a solution x if and only if cr+1 = · · · = cn = 0.
c) Deduce that a linear system Ax = b has either no solution, one solution or
infinitely many solutions.
Exercise 9.17 (Fredholm’s Alternative) For any A ∈ Cm×n , b ∈ Cn show that
one and only one of the following systems has a solution
(1)
Ax = b,
(2)
A∗ y = 0, y ∗ b = 0.
In other words either b ∈ R(A), or we can find y ∈ N (A∗ ) such that y ∗ b = 0. This
is called Fredholm’s alternative.
Exercise 9.18 (SVD (Exam Exercise 2017-2)) Let A ∈ Cm×n , with m ≥ n, be a
matrix on the form
B
A=
C
where B is a non-singular n × n matrix and C is in C(m−n)×n . Let A† denote the
pseudoinverse of A. Show that A† 2 ≤ B −1 2 .
9.6 Exercises Chap. 9
221
9.6.4 Exercises Sect. 9.4
Exercise 9.19 (Condition Number) Let
⎡
⎤
12
A = ⎣1 1⎦,
11
⎡
⎤
b1
b = ⎣ b2 ⎦ .
b3
a) Determine the projections b1 and b2 of b on R(A) and N (A∗ ).
b) Compute K(A) = A2 A† 2 .
Exercise 9.20 (Equality in Perturbation Bound) Let A ∈ Cm×n . Suppose y A
and y A† are vectors with y A = y A† = 1 and A = Ay A and A† =
A† y A† .
a) Show that we have equality to the right in (9.13) if b = Ay A , e1 = y A† .
b) Show that we have equality to the left if we switch b and e1 in a).
c) Let A be as in Example 9.7. Find extremal b and e when the l∞ norm is used.
This generalizes the sharpness results in Exercise 8.18. For if m = n and A is
nonsingular then A† = A−1 and e1 = e.
Exercise 9.21 (Problem Using Normal Equations) Consider the least squares
problems where
⎡
⎤
1 1
A = ⎣1 1 ⎦,
1 1+
⎡ ⎤
2
b = ⎣3⎦,
∈ R.
2
a) Find the normal equations and the exact least squares solution.
∗
2
b) Suppose is small and we replace the (2, 2)
√ entry 3+2 + in A A by 3+2 .
(This will be done in a computer
if < u, u being the round-off unit). For
√
example, if u = 10−16 then u = 10−8 . Solve A∗ Ax = A∗ b for x and compare
with the x found in a). (We will get a much more accurate result using the QR
factorization or the singular value decomposition on this problem).
9.6.5 Exercises Sect. 9.5
Exercise 9.22 (Singular Values Perturbation (Exam Exercise 1980-2)) Let
A( ) ∈ Rn×n be bidiagonal with ai,j = 0 for i, j = 1, . . . , n and j = i, i + 1.
Moreover, for some 1 ≤ k ≤ n − 1 we have ak,k+1 = ∈ R. Show that
|σi ( ) − σi (0)| ≤ | |,
i = 1, . . . , n,
where σi ( ), i = 1, . . . , n are the singular values of A( ).
222
9 Least Squares
9.7 Review Questions
9.7.1
9.7.2
9.7.3
9.7.4
Do the normal equations always have a solution?
When is the least squares solution unique?
Express the general least squares solution in terms of the generalized inverse.
Consider perturbing the right-hand side in a linear equation and a least squares
problem. What is the main difference in the perturbation inequalities?
9.7.5 Why does one often prefer using QR factorization instead of normal equations
for solving least squares problems.
9.7.6 What is an orthogonal sum?
9.7.7 How is an orthogonal projection defined?
Part IV
Kronecker Products and Fourier
Transforms
We give an introduction to Kronecker products of matrices and the fast Fourier
transform. We illustrate the usefulness by giving a fast method for solving the 2
dimensional discrete Poison Equation based on the fast Fourier transform.
Chapter 10
The Kronecker Product
Matrices arising from 2D and 3D problems sometimes have a Kronecker product
structure. Identifying a Kronecker structure can be very rewarding since it simplifies
the derivation of properties of such matrices.
10.1 The 2D Poisson Problem
Let Ω := (0, 1)2 = {(x, y) : 0 < x, y < 1} be the open unit square with boundary
∂Ω. Consider the problem
− Δu := −
∂ 2u ∂ 2u
− 2 = f on Ω,
∂x 2
∂y
(10.1)
u := 0 on ∂Ω.
Here the function f is given and continuous on Ω, and we seek a function u =
u(x, y) such that (10.1) holds and which is zero on ∂Ω.
Let m be a positive integer. We solve the problem numerically by finding
approximations vj,k ≈ u(j h, kh) on a grid of points given by
Ω h := {(j h, kh) : j, k = 0, 1, . . . , m + 1},
where h = 1/(m + 1).
The points Ωh := {(j h, kh) : j, k = 1, . . . , m} are called the interior points, while
Ω h \ Ωh are the boundary points. The solution is zero at the boundary points. Using
the difference approximation from Chap. 2 for the second derivative we obtain the
© Springer Nature Switzerland AG 2020
T. Lyche, Numerical Linear Algebra and Matrix Factorizations,
Texts in Computational Science and Engineering 22,
https://doi.org/10.1007/978-3-030-36468-7_10
225
226
10 The Kronecker Product
following approximations for the partial derivatives
vj −1,k − 2vj,k + vj +1,k ∂ 2 u(j h, kh)
vj,k−1 − 2vj,k +vj,k+1
∂ 2 u(j h, kh)
≈
,
≈
.
∂x 2
h2
∂y 2
h2
Inserting this in (10.1) we get the following discrete analog of (10.1)
−Δh vj,k = fj,k ,
vj,k = 0,
(j h, kh) ∈ Ωh ,
(j h, kh) ∈ ∂Ωh ,
(10.2)
(10.3)
where fj,k := f (j h, kh) and
− Δh vj,k :=
−vj −1,k + 2vj,k − vj +1,k
−vj,k−1 + 2vj,k − vj,k+1
+
.
h2
h2
(10.4)
Let us take a closer look at (10.2). It consists of n := m2 linear equations. Since the
values at the boundary points are known, the unknowns are the n numbers vj,k at
the interior points. These linear equations can be written as a matrix equation in the
form
T V + V T = h2 F
with h = 1/(m + 1),
(10.5)
where T = tridiag(−1, 2, −1) ∈ Rm×m is the second derivative matrix given by
(2.27) and
⎤
v1,1 · · · v1,m
⎢
.. ⎥ ∈ Rm×m ,
V := ⎣ ...
. ⎦
vm,1 · · · vm,m
⎡
⎤
f1,1 · · · f1,m
⎢
.. ⎥ ∈ Rm×m .
F := ⎣ ...
. ⎦
fm,1 · · · fm,m
⎡
(10.6)
Indeed, the (j, k) element in T V + V T is given by
m
m
T j,i vi,k +
i=1
vj,i T i,k = −vj −1,k + 2vj,k − vj +1,k − vj,k−1 + 2vj,k − vj,k+1 ,
i=1
and if we divide by h2 this is precisely the left hand side of (10.2).
To write (10.5) in standard form Ax = b we need to order the unknowns vj,k in
some way. The following operation of vectorization of a matrix gives one possible
ordering.
Definition 10.1 (vec Operation) For any B ∈ Rm×n we define the vector
vec(B) := [b11 , . . . , bm1 , b12 , . . . , bm2 , . . . , b1n , . . . , bmn ]T ∈ Rmn
by stacking the columns of B on top of each other.
10.1 The 2D Poisson Problem
227
1,1
1,2
1,3
1,3
2,3
3,3
7
8
9
2,1
2,2
2,3
1,2
2,2
3,2
4
5
6
3,1
3,2
3,3
1,1
2,1
3,1
1
2
3
vj,k in V - matrix
v in grid
x i in grid
x i+m
-1
j,k
Fig. 10.1 Numbering of grid points
vj,k+1
vj-1,k
vj+1,k x i-1
vj,k
x i+1
xi
vj,k-1
-1
4
xi-m
-1
-1
Fig. 10.2 The 5-point stencil
Let x := vec(V ) ∈ Rn , where n = m2 . Note that forming x by stacking the
columns of V on top of each other means an ordering of the grid points. For m = 3
this is illustrated in Fig. 10.1. We call this the natural ordering. The elements in
(10.2) defines a 5-point stencil, as shown in Fig. 10.2.
To find the matrix A we note that for values of j, k where the 5-point stencil does
not touch the boundary, (10.2) implies
4xi − xi−1 − xi+1 − xi−m − xi+m = bi ,
where xi = vj,k and bi = h2 fj,k . This must be modified close to the boundary. We
then obtain the linear system
Ax = b,
A ∈ Rn×n ,
b ∈ Rn ,
n = m2 ,
(10.7)
where x = vec(V ), b = h2 vec(F ) and A is the Poisson matrix given by
aii = 4,
ai+1,i = ai,i+1 = −1,
ai+m,i = ai,i+m = −1,
aij = 0,
i = 1, . . . , n,
i = 1, . . . , n − 1, i = m, 2m, . . . , (m − 1)m,
i = 1, . . . , n − m,
otherwise.
(10.8)
228
10 The Kronecker Product
n=9
n = 25
n = 100
Fig. 10.3 Band structure of the 2D test matrix
For m = 3 we have the following matrix
$$A = \begin{bmatrix}
 4 & -1 &  0 & -1 &  0 &  0 &  0 &  0 &  0\\
-1 &  4 & -1 &  0 & -1 &  0 &  0 &  0 &  0\\
 0 & -1 &  4 &  0 &  0 & -1 &  0 &  0 &  0\\
-1 &  0 &  0 &  4 & -1 &  0 & -1 &  0 &  0\\
 0 & -1 &  0 & -1 &  4 & -1 &  0 & -1 &  0\\
 0 &  0 & -1 &  0 & -1 &  4 &  0 &  0 & -1\\
 0 &  0 &  0 & -1 &  0 &  0 &  4 & -1 &  0\\
 0 &  0 &  0 &  0 & -1 &  0 & -1 &  4 & -1\\
 0 &  0 &  0 &  0 &  0 & -1 &  0 & -1 &  4
\end{bmatrix}.$$
The bands of the weakly diagonally dominant matrix A are illustrated in
Fig. 10.3.
10.1.1 The Test Matrices
In Sect. 2.4 we encountered the 1-dimensional test matrix $T_1 \in \mathbb{R}^{m \times m}$ defined for any real numbers a, d by
$$T_1 := \mathrm{tridiag}(a, d, a). \qquad (10.9)$$
The (2-dimensional) Poisson matrix is a special case of the matrix $T_2 = [a_{ij}] \in \mathbb{R}^{n \times n}$ with elements
$$\begin{aligned} a_{ii} &= 2d, && i = 1, \ldots, n,\\ a_{i,i+1} = a_{i+1,i} &= a, && i = 1, \ldots, n-1,\ i \neq m, 2m, \ldots, (m-1)m,\\ a_{i,i+m} = a_{i+m,i} &= a, && i = 1, \ldots, n-m,\\ a_{ij} &= 0, && \text{otherwise}, \end{aligned} \qquad (10.10)$$
and where a, d are real numbers. We will refer to this matrix simply as the 2D test matrix. For m = 3 the 2D test matrix looks as follows
$$T_2 = \begin{bmatrix}
2d &  a &  0 &  a &  0 &  0 &  0 &  0 &  0\\
 a & 2d &  a &  0 &  a &  0 &  0 &  0 &  0\\
 0 &  a & 2d &  0 &  0 &  a &  0 &  0 &  0\\
 a &  0 &  0 & 2d &  a &  0 &  a &  0 &  0\\
 0 &  a &  0 &  a & 2d &  a &  0 &  a &  0\\
 0 &  0 &  a &  0 &  a & 2d &  0 &  0 &  a\\
 0 &  0 &  0 &  a &  0 &  0 & 2d &  a &  0\\
 0 &  0 &  0 &  0 &  a &  0 &  a & 2d &  a\\
 0 &  0 &  0 &  0 &  0 &  a &  0 &  a & 2d
\end{bmatrix}. \qquad (10.11)$$
The partition into 3 × 3 sub matrices shows that T 2 is block tridiagonal.
Properties of T 2 can be derived from properties of T 1 by using properties of the
Kronecker product.
10.2 The Kronecker Product
Definition 10.2 (Kronecker Product) For any positive integers p, q, r, s we define the Kronecker product of two matrices $A \in \mathbb{R}^{p \times q}$ and $B \in \mathbb{R}^{r \times s}$ as a matrix $C \in \mathbb{R}^{pr \times qs}$ given in block form as
$$C = \begin{bmatrix} A b_{1,1} & A b_{1,2} & \cdots & A b_{1,s}\\ A b_{2,1} & A b_{2,2} & \cdots & A b_{2,s}\\ \vdots & \vdots & \ddots & \vdots\\ A b_{r,1} & A b_{r,2} & \cdots & A b_{r,s} \end{bmatrix}.$$
We denote the Kronecker product of A and B by C = A ⊗ B.
This definition of the Kronecker product is known more precisely as the left
Kronecker product. In the literature one often finds the right Kronecker product
which in our notation is given by B ⊗ A.
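MATLAB's built-in kron uses the right Kronecker product convention, so the product $A \otimes B$ of this book corresponds to kron(B,A). The following sketch is a numerical check of the mixed product rule of Lemma 10.1 below; the sizes of the random matrices are chosen only for the example.

A = rand(2,3); C = rand(3,2);        % A*C is defined
B = rand(4,2); D = rand(2,5);        % B*D is defined
lhs = kron(B,A)*kron(D,C);           % (A kron B)(C kron D) in the book's (left) convention
rhs = kron(B*D, A*C);                % (AC) kron (BD)
norm(lhs - rhs)                      % of the order of rounding errors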
The Kronecker product $u \otimes v = [u^T v_1, \ldots, u^T v_r]^T$ of two column vectors $u \in \mathbb{R}^p$ and $v \in \mathbb{R}^r$ is a column vector of length $p \cdot r$.
The test matrix $T_2$ can be written as a sum of Kronecker products. Indeed, if m = 3 then
$$T_1 = \begin{bmatrix} d & a & 0\\ a & d & a\\ 0 & a & d \end{bmatrix}, \qquad I = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}$$
and
$$T_1 \otimes I + I \otimes T_1 = \begin{bmatrix} T_1 & 0 & 0\\ 0 & T_1 & 0\\ 0 & 0 & T_1 \end{bmatrix} + \begin{bmatrix} dI & aI & 0\\ aI & dI & aI\\ 0 & aI & dI \end{bmatrix} = T_2$$
given by (10.11). This formula holds for any integer $m \ge 2$:
$$T_2 = T_1 \otimes I + I \otimes T_1, \qquad T_1, I \in \mathbb{R}^{m \times m}, \quad T_2 \in \mathbb{R}^{m^2 \times m^2}. \qquad (10.12)$$
The sum of two Kronecker products involving the identity matrix is worthy of a
special name.
Definition 10.3 (Kronecker Sum) For positive integers r, s, k, let A ∈ Rr×r , B ∈
Rs×s , and I k be the identity matrix of order k. The sum A ⊗ I s + I r ⊗ B is known
as the Kronecker sum of A and B.
In other words, the 2D test matrix T 2 is the Kronecker sum involving the 1D test
matrix T 1 .
The following simple arithmetic rules hold for Kronecker products. For scalars λ, μ and matrices $A, A_1, A_2, B, B_1, B_2, C$ of dimensions such that the operations are defined, we have
$$\begin{aligned} (\lambda A) \otimes (\mu B) &= \lambda\mu\, (A \otimes B),\\ (A_1 + A_2) \otimes B &= A_1 \otimes B + A_2 \otimes B,\\ A \otimes (B_1 + B_2) &= A \otimes B_1 + A \otimes B_2,\\ (A \otimes B) \otimes C &= A \otimes (B \otimes C). \end{aligned} \qquad (10.13)$$
Note however that in general $A \otimes B \neq B \otimes A$, but it can be shown that there are permutation matrices P, Q such that $B \otimes A = P(A \otimes B)Q$, see [9].
The following mixed product rule is an essential tool for dealing with Kronecker products and sums.
Lemma 10.1 (Mixed Product Rule) Suppose A, B, C, D are rectangular matrices with dimensions so that the products AC and BD are well defined. Then the
product (A ⊗ B)(C ⊗ D) is defined and
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
Proof If $B \in \mathbb{R}^{r \times t}$ and $D \in \mathbb{R}^{t \times s}$ for some integers r, s, t, then
$$(A \otimes B)(C \otimes D) = \begin{bmatrix} A b_{1,1} & \cdots & A b_{1,t}\\ \vdots & & \vdots\\ A b_{r,1} & \cdots & A b_{r,t} \end{bmatrix} \begin{bmatrix} C d_{1,1} & \cdots & C d_{1,s}\\ \vdots & & \vdots\\ C d_{t,1} & \cdots & C d_{t,s} \end{bmatrix}. \qquad (10.14)$$
Thus for all i, j
$$\big((A \otimes B)(C \otimes D)\big)_{i,j} = AC \sum_{k=1}^t b_{i,k} d_{k,j} = (AC)(BD)_{i,j} = \big((AC) \otimes (BD)\big)_{i,j}.$$
Using the mixed product rule we obtain the following properties of Kronecker
products and sums.
Theorem 10.1 (Properties of Kronecker Products) Suppose for $r, s \in \mathbb{N}$ that $A \in \mathbb{R}^{r \times r}$ and $B \in \mathbb{R}^{s \times s}$ are square matrices with eigenpairs $(\lambda_i, u_i)$, $i = 1, \ldots, r$, and $(\mu_j, v_j)$, $j = 1, \ldots, s$. Moreover, let $F, V \in \mathbb{R}^{r \times s}$. Then
1. $(A \otimes B)^T = A^T \otimes B^T$ (this also holds for rectangular matrices).
2. If A and B are nonsingular then $A \otimes B$ is nonsingular with $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$.
3. If A and B are symmetric then $A \otimes B$ and $A \otimes I_s + I_r \otimes B$ are symmetric.
4. $(A \otimes B)(u_i \otimes v_j) = \lambda_i \mu_j (u_i \otimes v_j)$, $i = 1, \ldots, r$, $j = 1, \ldots, s$.
5. $(A \otimes I_s + I_r \otimes B)(u_i \otimes v_j) = (\lambda_i + \mu_j)(u_i \otimes v_j)$, $i = 1, \ldots, r$, $j = 1, \ldots, s$.
6. If one of A, B is positive definite and the other is positive semidefinite then $A \otimes I + I \otimes B$ is positive definite.
7. $A V B^T = F \iff (A \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(F)$.
8. $A V + V B^T = F \iff (A \otimes I_s + I_r \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(F)$.
Before giving the simple proofs of this theorem we present some comments.
1. The transpose (or the inverse) of an ordinary matrix product equals the transpose
(or the inverse) of the matrices in reverse order. For Kronecker products the order
is kept.
2. The eigenvalues of the Kronecker product (sum) are the product (sum) of the
eigenvalues of the factors. The eigenvectors are the Kronecker products of the
eigenvectors of the factors. In particular, the eigenvalues of the test matrix T 2 are
sums of eigenvalues of T 1 .
3. Since we already know that T = tridiag(−1, 2, −1) is positive definite the 2D
Poisson matrix A = T ⊗ I + I ⊗ T is also positive definite.
4. The system $A V B^T = F$ in part 7 can be solved by first finding W from $AW = F$, and then finding V from $B V^T = W^T$. This is preferable to solving the much larger linear system $(A \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(F)$; see the sketch following these comments.
5. A fast way to solve the 2D Poisson problem in the form $T V + V T = F$ will be considered in the next chapter.
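The following MATLAB sketch illustrates comment 4 on a small example: the two-step solve and the Kronecker system give the same V. The sizes and the diagonal shifts (used only to keep A and B safely nonsingular) are choices made for this illustration.

r = 5; s = 4;
A = rand(r,r) + r*eye(r);
B = rand(s,s) + s*eye(s);
F = rand(r,s);
W = A \ F;                    % step 1: A*W = F, so W = V*B'
V = (B \ W')';                % step 2: B*V' = W'
v = kron(B,A) \ F(:);         % the book's (A kron B) is kron(B,A) in MATLAB
norm(V - reshape(v,r,s))      % of the order of rounding errors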
Proof of Theorem 10.1
1. Exercise.
2. By the mixed product rule $(A \otimes B)(A^{-1} \otimes B^{-1}) = (A A^{-1}) \otimes (B B^{-1}) = I_r \otimes I_s = I_{rs}$. Thus $A \otimes B$ is nonsingular with the indicated inverse.
3. By 1, $(A \otimes B)^T = A^T \otimes B^T = A \otimes B$. Moreover, since then $A \otimes I$ and $I \otimes B$ are symmetric, their sum is symmetric.
4. $(A \otimes B)(u_i \otimes v_j) = (A u_i) \otimes (B v_j) = (\lambda_i u_i) \otimes (\mu_j v_j) = (\lambda_i \mu_j)(u_i \otimes v_j)$ for all i, j, where we used the mixed product rule.
5. $(A \otimes I_s)(u_i \otimes v_j) = \lambda_i (u_i \otimes v_j)$ and $(I_r \otimes B)(u_i \otimes v_j) = \mu_j (u_i \otimes v_j)$. The result now follows by summing these relations.
6. By 1, $A \otimes I + I \otimes B$ is symmetric. Moreover, the eigenvalues $\lambda_i + \mu_j$ are positive since for all i, j both $\lambda_i$ and $\mu_j$ are nonnegative and one of them is positive. It follows that $A \otimes I + I \otimes B$ is positive definite.
7. We partition V, F, and $B^T$ by columns as $V = [v_1, \ldots, v_s]$, $F = [f_1, \ldots, f_s]$ and $B^T = [b_1, \ldots, b_s]$. Then we have
$$\begin{aligned} (A \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(F) &\iff \begin{bmatrix} A b_{11} & \cdots & A b_{1s}\\ \vdots & & \vdots\\ A b_{s1} & \cdots & A b_{ss} \end{bmatrix} \begin{bmatrix} v_1\\ \vdots\\ v_s \end{bmatrix} = \begin{bmatrix} f_1\\ \vdots\\ f_s \end{bmatrix}\\ &\iff \Big[A \sum_j b_{1j} v_j, \ldots, A \sum_j b_{sj} v_j\Big] = [f_1, \ldots, f_s]\\ &\iff A[V b_1, \ldots, V b_s] = F\\ &\iff A V B^T = F. \end{aligned}$$
8. From the proof of 7 we have $(A \otimes I_s)\,\mathrm{vec}(V) = \mathrm{vec}(A V I_s^T)$ and $(I_r \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(I_r V B^T)$. But then
$$(A \otimes I_s + I_r \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(F) \iff A V I_s^T + I_r V B^T = F \iff A V + V B^T = F.$$
For more on Kronecker products see [9].
10.3 Properties of the 2D Test Matrices
Using Theorem 10.1 we can derive properties of the 2D test matrix $T_2$ from those of $T_1$. Recall (cf. Lemma 2.2) that $T_1 s_j = \lambda_j s_j$ for $j = 1, \ldots, m$, where
$$\lambda_j = d + 2a\cos(j\pi h), \qquad h := \frac{1}{m+1}, \qquad (10.15)$$
$$s_j = [\sin(j\pi h), \sin(2j\pi h), \ldots, \sin(mj\pi h)]^T. \qquad (10.16)$$
Moreover, the eigenvalues are distinct and the eigenvectors are orthogonal:
$$s_j^T s_k = \frac{m+1}{2}\,\delta_{j,k} = \frac{1}{2h}\,\delta_{j,k}, \qquad j, k = 1, \ldots, m. \qquad (10.17)$$
Theorem 10.2 (Eigenpairs of 2D Test Matrix) For fixed $m \ge 2$ let $T_2$ be the matrix given by (10.10) and let $h = 1/(m+1)$. Then
$$T_2 (s_j \otimes s_k) = (\lambda_j + \lambda_k)(s_j \otimes s_k), \qquad j, k = 1, \ldots, m, \qquad (10.18)$$
where $(\lambda_j, s_j)$ are the eigenpairs of $T_1$ given by (10.15) and (10.16). The eigenvectors $s_j \otimes s_k$ are orthogonal,
$$(s_j \otimes s_k)^T (s_p \otimes s_q) = \frac{1}{4h^2}\,\delta_{j,p}\delta_{k,q}, \qquad j, k, p, q = 1, \ldots, m, \qquad (10.19)$$
and $T_2$ is positive definite if $d > 0$ and $d \ge 2|a|$.
Proof Equation (10.18) follows from Part 5 of Theorem 10.1. Using the transpose rule, the mixed product rule and (2.32) we find for $j, k, p, q = 1, \ldots, m$
$$(s_j \otimes s_k)^T (s_p \otimes s_q) = (s_j^T \otimes s_k^T)(s_p \otimes s_q) = (s_j^T s_p) \otimes (s_k^T s_q) = \frac{1}{4h^2}\,\delta_{j,p}\delta_{k,q}$$
and (10.19) follows. Since $T_2$ is symmetric, $T_2$ is positive definite if the $\lambda_j$ given by (10.15) are positive. But this is true whenever $d > 0$ and $d \ge 2|a|$ (cf. Exercise 10.5).
Corollary 10.1 The spectral condition number of the discrete Poisson matrix $A \in \mathbb{R}^{m^2 \times m^2}$ given by (10.8) is given by
$$\|A\|_2 \|A^{-1}\|_2 = \frac{\cos^2 w}{\sin^2 w}, \qquad w := \frac{\pi}{2(m+1)}. \qquad (10.20)$$
Proof Recall that by (10.15) with $d = 2$, $a = -1$, and (10.18), the eigenvalues $\lambda_{j,k}$ of A are
$$\lambda_{j,k} = 4 - 2\cos(2jw) - 2\cos(2kw) = 4\sin^2(jw) + 4\sin^2(kw), \qquad j, k = 1, \ldots, m.$$
Using trigonometric formulas, it follows that the largest and smallest eigenvalues of A are given by
$$\lambda_{\max} = 8\cos^2 w, \qquad \lambda_{\min} = 8\sin^2 w.$$
Since $d > 0$ and $d \ge 2|a|$ it follows that A is positive definite. By (8.26) we have $\|A\|_2\|A^{-1}\|_2 = \lambda_{\max}/\lambda_{\min}$ and (10.20) follows.
10.4 Exercises Chap. 10
10.4.1 Exercises Sects. 10.1, 10.2
Exercise 10.1 (4 × 4 Poisson Matrix) Write down the Poisson matrix for m = 2
and show that it is strictly diagonally dominant.
Exercise 10.2 (Properties of Kronecker Products) Prove (10.13).
Exercise 10.3 (Eigenpairs of Kronecker Products (Exam Exercise 2008-3)) Let
A, B ∈ Rn×n . Show that the eigenvalues of the Kronecker product A ⊗ B are
products of the eigenvalues of A and B and that the eigenvectors of A ⊗ B are
Kronecker products of the eigenvectors of A and B.
10.4.2 Exercises Sect. 10.3
Exercise 10.4 (Second Derivative Matrix Is Positive Definite) Write down the eigenvalues of $T = \mathrm{tridiag}(-1, 2, -1)$ using (10.15) and conclude that T is symmetric positive definite.
Exercise 10.5 (1D Test Matrix Is Positive Definite?) Show that the matrix T 1 is
symmetric positive definite if d > 0 and d ≥ 2|a|.
Exercise 10.6 (Eigenvalues for 2D Test Matrix of Order 4) For m = 2 the matrix (10.10) is given by
$$A = \begin{bmatrix} 2d & a & a & 0\\ a & 2d & 0 & a\\ a & 0 & 2d & a\\ 0 & a & a & 2d \end{bmatrix}.$$
Show that $\lambda = 2a + 2d$ is an eigenvalue corresponding to the eigenvector $x = [1, 1, 1, 1]^T$. Verify that apart from a scaling of the eigenvector this agrees with (10.15) and (10.16) for $j = k = 1$ and $m = 2$.
Exercise 10.7 (Nine Point Scheme for Poisson Problem) Consider the following 9 point difference approximation to the Poisson problem $-\Delta u = f$, $u = 0$ on the boundary of the unit square (cf. (10.1)):
$$\begin{aligned} \text{(a)}\quad & -(\Delta_h v)_{j,k} = (\mu f)_{j,k}, && j, k = 1, \ldots, m,\\ \text{(b)}\quad & 0 = v_{0,k} = v_{m+1,k} = v_{j,0} = v_{j,m+1}, && j, k = 0, 1, \ldots, m+1,\\ \text{(c)}\quad & -(\Delta_h v)_{j,k} = [20 v_{j,k} - 4v_{j-1,k} - 4v_{j,k-1} - 4v_{j+1,k} - 4v_{j,k+1} &&\\ & \qquad\qquad - v_{j-1,k-1} - v_{j+1,k-1} - v_{j-1,k+1} - v_{j+1,k+1}]/(6h^2), &&\\ \text{(d)}\quad & (\mu f)_{j,k} = [8 f_{j,k} + f_{j-1,k} + f_{j,k-1} + f_{j+1,k} + f_{j,k+1}]/12. && \end{aligned} \qquad (10.21)$$
a) Write down the 4-by-4 system we obtain for m = 2.
b) Find $v_{j,k}$ for $j, k = 1, 2$, if $f(x, y) = 2\pi^2 \sin(\pi x)\sin(\pi y)$ and m = 2. Answer: $v_{j,k} = 5\pi^2/66$.
It can be shown that (10.21) defines an $O(h^4)$ approximation to (10.1).
Exercise 10.8 (Matrix Equation for Nine Point Scheme) Consider the nine point
difference approximation to (10.1) given by (10.21) in Problem 10.7.
a) Show that (10.21) is equivalent to the matrix equation
$$T V + V T - \frac{1}{6} T V T = h^2 \mu F. \qquad (10.22)$$
Here $\mu F$ has elements $(\mu f)_{j,k}$ given by (10.21d) and $T = \mathrm{tridiag}(-1, 2, -1)$.
b) Show that the standard form of the matrix equation (10.22) is $Ax = b$, where $A = T \otimes I + I \otimes T - \frac{1}{6} T \otimes T$, $x = \mathrm{vec}(V)$, and $b = h^2\,\mathrm{vec}(\mu F)$.
Exercise 10.9 (Biharmonic Equation) Consider the biharmonic equation
$$\Delta^2 u(s,t) := \Delta\big(\Delta u(s,t)\big) = f(s,t), \quad (s,t) \in \Omega, \qquad u(s,t) = 0, \ \Delta u(s,t) = 0, \quad (s,t) \in \partial\Omega. \qquad (10.23)$$
Here Ω is the open unit square. The condition $\Delta u = 0$ is called the Navier boundary condition. Moreover, $\Delta^2 u = u_{xxxx} + 2u_{xxyy} + u_{yyyy}$.
a) Let $v = -\Delta u$. Show that (10.23) can be written as a system
$$-\Delta v(s,t) = f(s,t), \quad (s,t) \in \Omega, \qquad -\Delta u(s,t) = v(s,t), \quad (s,t) \in \Omega, \qquad u(s,t) = v(s,t) = 0, \quad (s,t) \in \partial\Omega. \qquad (10.24)$$
b) Discretizing, using (10.4), with $T = \mathrm{tridiag}(-1, 2, -1) \in \mathbb{R}^{m \times m}$, $h = 1/(m+1)$, and $F = \big(f(jh, kh)\big)_{j,k=1}^m$ we get two matrix equations
$$T V + V T = h^2 F, \qquad T U + U T = h^2 V.$$
Show that
$$(T \otimes I + I \otimes T)\,\mathrm{vec}(V) = h^2\,\mathrm{vec}(F), \qquad (T \otimes I + I \otimes T)\,\mathrm{vec}(U) = h^2\,\mathrm{vec}(V),$$
and hence $A = (T \otimes I + I \otimes T)^2$ is the matrix for the standard form of the discrete biharmonic equation.
c) Show that with $n = m^2$ the vector form and standard form of the systems in b) can be written
$$T^2 U + 2 T U T + U T^2 = h^4 F \qquad\text{and}\qquad Ax = b, \qquad (10.25)$$
where $A = T^2 \otimes I + 2\,T \otimes T + I \otimes T^2 \in \mathbb{R}^{n \times n}$, $x = \mathrm{vec}(U)$, and $b = h^4\,\mathrm{vec}(F)$.
d) Determine the eigenvalues and eigenvectors of the matrix A in c) and show that
it is positive definite. Also determine the bandwidth of A.
e) Suppose we want to solve the standard form equation Ax = b. We have two
representations for the matrix A, the product one in b) and the one in c). Which
one would you prefer for the basis of an algorithm? Why?
10.5 Review Questions
10.5.1 Consider the Poisson matrix.
• Write this matrix as a Kronecker sum,
• how are its eigenvalues and eigenvectors related to the second derivative
matrix?
• is it symmetric? positive definite?
10.5.2 What are the eigenpairs of T 1 := tridiagonal(a, d, a)?
10.5.3 What are the inverse and transpose of a Kronecker product?
10.5.4
• Give an economical general way to solve the linear system $(A \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(F)$.
• Same for $(A \otimes I_s + I_r \otimes B)\,\mathrm{vec}(V) = \mathrm{vec}(F)$.
Chapter 11
Fast Direct Solution of a Large Linear
System
11.1 Algorithms for a Banded Positive Definite System
In this chapter we present a fast method for solving $Ax = b$, where A is the Poisson matrix (10.8). Thus, for n = 9,
$$A = \begin{bmatrix}
 4 & -1 &  0 & -1 &  0 &  0 &  0 &  0 &  0\\
-1 &  4 & -1 &  0 & -1 &  0 &  0 &  0 &  0\\
 0 & -1 &  4 &  0 &  0 & -1 &  0 &  0 &  0\\
-1 &  0 &  0 &  4 & -1 &  0 & -1 &  0 &  0\\
 0 & -1 &  0 & -1 &  4 & -1 &  0 & -1 &  0\\
 0 &  0 & -1 &  0 & -1 &  4 &  0 &  0 & -1\\
 0 &  0 &  0 & -1 &  0 &  0 &  4 & -1 &  0\\
 0 &  0 &  0 &  0 & -1 &  0 & -1 &  4 & -1\\
 0 &  0 &  0 &  0 &  0 & -1 &  0 & -1 &  4
\end{bmatrix} = \begin{bmatrix} T + 2I & -I & 0\\ -I & T + 2I & -I\\ 0 & -I & T + 2I \end{bmatrix},$$
where T = tridiag(−1, 2, −1). For the matrix A we know by now that
1. It is positive definite.
2. It is banded.
3. It is block-tridiagonal.
4. We know the eigenvalues and eigenvectors of A.
5. The eigenvectors are orthogonal.
11.1.1 Cholesky Factorization
Since A is positive definite we can use the Cholesky factorization $A = LL^*$, with L lower triangular, to solve $Ax = b$. Since A and L have the same bandwidth $d = \sqrt{n}$, the complexity of this factorization is $O(nd^2) = O(n^2)$, cf. Algorithm 4.2. We need to store A, and this can be done in sparse form.
The nonzero elements in L are shown in Fig. 11.1 for n = 100. Note that most of the zeros between the diagonals in A have become nonzero in L. This is known as fill-in.
11.1.2 Block LU Factorization of a Block Tridiagonal Matrix
The Poisson matrix has a block tridiagonal structure. Consider finding the block LU factorization of a block tridiagonal matrix. We are looking for a factorization of the form
$$\begin{bmatrix} D_1 & C_1 & & \\ A_1 & D_2 & C_2 & \\ & \ddots & \ddots & \ddots\\ & & A_{m-1} & D_m \end{bmatrix} = \begin{bmatrix} I & & & \\ L_1 & I & & \\ & \ddots & \ddots & \\ & & L_{m-1} & I \end{bmatrix} \begin{bmatrix} U_1 & C_1 & & \\ & U_2 & \ddots & \\ & & \ddots & C_{m-1}\\ & & & U_m \end{bmatrix}. \qquad (11.1)$$
[Fig. 11.1 Fill-in in the Cholesky factor of the Poisson matrix (n = 100).]
Here D 1 , . . . , D m and U 1 , . . . , U m are square matrices while A1 , . . . , Am−1 ,L1 ,
. . .,Lm−1 and C 1 , . . . , C m−1 can be rectangular.
Using block multiplication the formulas (2.16) generalize to
$$U_1 = D_1, \qquad L_k = A_k U_k^{-1}, \quad U_{k+1} = D_{k+1} - L_k C_k, \qquad k = 1, 2, \ldots, m-1. \qquad (11.2)$$
To solve the system $Ax = b$ we partition b conformally with A in the form $b^T = [b_1^T, \ldots, b_m^T]$. The formulas for solving $Ly = b$ and $Ux = y$ are as follows:
$$\begin{aligned} y_1 &= b_1, & y_k &= b_k - L_{k-1} y_{k-1}, && k = 2, 3, \ldots, m,\\ x_m &= U_m^{-1} y_m, & x_k &= U_k^{-1}(y_k - C_k x_{k+1}), && k = m-1, \ldots, 2, 1. \end{aligned} \qquad (11.3)$$
The solution is then x T = [x T1 , . . . , x Tm ]. To find Lk in (11.2) we solve the linear
systems Lk U k = Ak . Similarly we need to solve a linear system to find x k in (11.3).
The number of arithmetic operations using block factorizations is O(n2 ),
asymptotically the same as for Cholesky factorization. However we only need to
store the m × m blocks, and using matrix operations can be advantageous.
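The recursions (11.2) and (11.3) translate almost directly into code. The following MATLAB sketch assumes the blocks are supplied as cell arrays; the function name and this calling convention are choices made for the illustration, not something prescribed by the text.

function x = blocktrisolve(Asub, D, C, b)
% Block LU solve of a block tridiagonal system using (11.2)-(11.3).
% D{1..m} are the diagonal blocks, Asub{1..m-1} and C{1..m-1} the sub- and
% superdiagonal blocks, and b{1..m} the blocks of the right hand side.
m = length(D);
U = cell(m,1); L = cell(m-1,1); y = cell(m,1); x = cell(m,1);
U{1} = D{1};
for k = 1:m-1
    L{k} = Asub{k}/U{k};               % L_k = A_k U_k^{-1}
    U{k+1} = D{k+1} - L{k}*C{k};
end
y{1} = b{1};
for k = 2:m
    y{k} = b{k} - L{k-1}*y{k-1};       % forward substitution, L y = b
end
x{m} = U{m}\y{m};
for k = m-1:-1:1
    x{k} = U{k}\(y{k} - C{k}*x{k+1});  % back substitution, U x = y
end
end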
11.1.3 Other Methods
Other methods include
• Iterative methods, (we study this in Chaps. 12 and 13),
• multigrid. See [5],
• fast solvers based on diagonalization and the fast Fourier transform. See
Sects. 11.2, 11.3.
11.2 A Fast Poisson Solver Based on Diagonalization
The algorithm we now derive will only require O(n3/2 ) arithmetic operations and
we only need to work with matrices of order m. Using the fast Fourier transform the
number of arithmetic operations can be reduced further to O(n log n).
To start we recall that Ax = b can be written as a matrix equation in the form
(cf. (10.5))
T V + V T = h2 F
with h = 1/(m + 1),
where T = tridiag(−1, 2, −1) ∈ Rm×m is the second derivative matrix, V =
(vj,k ) ∈ Rm×m are the unknowns, and F = (fj,k ) = (f (j h, kh)) ∈ Rm×m contains
function values.
Recall (cf. Lemma 2.2) that the eigenpairs of T are given by $T s_j = \lambda_j s_j$, $j = 1, \ldots, m$, where
$$s_j = [\sin(j\pi h), \sin(2j\pi h), \ldots, \sin(mj\pi h)]^T, \qquad \lambda_j = 2 - 2\cos(j\pi h) = 4\sin^2(j\pi h/2), \qquad h = 1/(m+1),$$
and $s_j^T s_k = \delta_{j,k}/(2h)$ for all j, k. Let
$$S := [s_1, \ldots, s_m] = \big[\sin(jk\pi h)\big]_{j,k=1}^m \in \mathbb{R}^{m \times m}, \qquad D = \mathrm{diag}(\lambda_1, \ldots, \lambda_m). \qquad (11.4)$$
Then
$$T S = [T s_1, \ldots, T s_m] = [\lambda_1 s_1, \ldots, \lambda_m s_m] = S D, \qquad S^2 = S^T S = \frac{1}{2h} I.$$
Define $X \in \mathbb{R}^{m \times m}$ by $V = S X S$, where V is the solution of $T V + V T = h^2 F$. Then
$$\begin{aligned} T V + V T = h^2 F &\overset{V = SXS}{\iff} TSXS + SXST = h^2 F\\ &\overset{S(\,\cdot\,)S}{\iff} STSXS^2 + S^2XSTS = h^2 SFS = h^2 G\\ &\overset{TS = SD}{\iff} S^2DXS^2 + S^2XS^2D = h^2 G\\ &\overset{S^2 = I/(2h)}{\iff} DX + XD = 4h^4 G. \end{aligned}$$
Since D is diagonal, the equation $DX + XD = 4h^4 G$ is easy to solve. For the (j, k) element we find
$$(DX + XD)_{j,k} = \sum_{\ell=1}^m d_{j,\ell}\, x_{\ell,k} + \sum_{\ell=1}^m x_{j,\ell}\, d_{\ell,k} = \lambda_j x_{j,k} + \lambda_k x_{j,k},$$
so that for all j, k
$$x_{j,k} = 4h^4 g_{j,k}/(\lambda_j + \lambda_k) = h^4 g_{j,k}/(\sigma_j + \sigma_k), \qquad \sigma_j := \lambda_j/4 = \sin^2(j\pi h/2).$$
Thus to find V we compute
1. $G = S F S$,
2. $x_{j,k} = h^4 g_{j,k}/(\sigma_j + \sigma_k)$, $j, k = 1, \ldots, m$,
3. $V = S X S$.
We can compute X, S and the σ's without using loops. Using outer products, element by element division, and raising a matrix element by element to a power we find
$$X = h^4\, G./M, \qquad\text{where } M := \begin{bmatrix} \sigma_1\\ \vdots\\ \sigma_m \end{bmatrix}[1, \ldots, 1] + \begin{bmatrix} 1\\ \vdots\\ 1 \end{bmatrix}[\sigma_1, \ldots, \sigma_m],$$
$$S = \sin\Big(\pi h \begin{bmatrix} 1\\ \vdots\\ m \end{bmatrix}[1\ 2\ \cdots\ m]\Big), \qquad \sigma = \sin\Big(\frac{\pi h}{2}\begin{bmatrix} 1\\ \vdots\\ m \end{bmatrix}\Big)\wedge 2.$$
We now get the following algorithm to solve numerically the Poisson problem $-\Delta u = f$ on $\Omega = (0,1)^2$ and $u = 0$ on $\partial\Omega$ using the 5-point scheme, i.e., let $m \in \mathbb{N}$, $h = 1/(m+1)$, and $F = (f(jh, kh)) \in \mathbb{R}^{m \times m}$. We compute $V \in \mathbb{R}^{(m+2) \times (m+2)}$ using diagonalization of $T = \mathrm{tridiag}(-1, 2, -1) \in \mathbb{R}^{m \times m}$.
function V=fastpoisson(F)
%function V=fastpoisson(F)
  m=length(F); h=1/(m+1); hv=pi*h*(1:m)';
  sigma=sin(hv/2).^2;
  S=sin(hv*(1:m));
  G=S*F*S;
  X=h^4*G./(sigma*ones(1,m)+ones(m,1)*sigma');
  V=zeros(m+2,m+2);
  V(2:m+1,2:m+1)=S*X*S;
end
Listing 11.1 fastpoisson
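For $u(x,y) = \sin(\pi x)\sin(\pi y)$ one has $-\Delta u = 2\pi^2\sin(\pi x)\sin(\pi y)$, so Listing 11.1 can be checked by comparing the computed grid values with this exact solution; the grid size in the snippet below is chosen only for illustration.

m = 50; h = 1/(m+1);
[Xg,Yg] = ndgrid(h*(1:m), h*(1:m));          % interior grid points (jh, kh)
F = 2*pi^2*sin(pi*Xg).*sin(pi*Yg);           % right hand side f(jh, kh)
V = fastpoisson(F);
[Xf,Yf] = ndgrid(h*(0:m+1), h*(0:m+1));      % full grid including the boundary
max(max(abs(V - sin(pi*Xf).*sin(pi*Yf))))    % discretization error, of order h^2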
The formulas are fully vectorized. Since the line computing X in Algorithm 11.1 only requires $O(m^2)$ arithmetic operations, the complexity of this algorithm is for large m determined by the 4 m-by-m matrix multiplications and is given by $O(4 \times 2m^3) = O(8n^{3/2})$.¹ The method is very fast and will be used as a preconditioner for a more complicated problem in Chap. 13. In 2012 it took about 0.2 seconds on a laptop to find the $10^6$ unknowns $v_{j,k}$ on a 1000 × 1000 grid.
¹ It is possible to compute V using only two matrix multiplications and hence reduce the complexity to $O(4n^{3/2})$. This is detailed in Problem 11.4.
11.3 A Fast Poisson Solver Based on the Discrete Sine and
Fourier Transforms
In Algorithm 11.1 we need to compute the product of the sine matrix S ∈ Rm×m
given by (11.4) and a matrix A ∈ Rm×m . Since the matrices are m-by-m this will
normally require O(m3 ) operations. In this section we show that it is possible to
calculate the products SA and AS in O(m2 log2 m) operations.
We need to discuss certain transforms known as the discrete sine transform, the
discrete Fourier transform and the fast Fourier transform. In addition we have
the discrete cosine transform which will not be discussed here. These transforms
are of independent interest. They have applications to signal processing and image
analysis, and are often used when one is dealing with discrete samples of data on a
computer.
11.3.1 The Discrete Sine Transform (DST)
Given $v = [v_1, \ldots, v_m]^T \in \mathbb{R}^m$ we say that the vector $w = [w_1, \ldots, w_m]^T$ given by
$$w_j = \sum_{k=1}^m \sin\Big(\frac{jk\pi}{m+1}\Big) v_k, \qquad j = 1, \ldots, m,$$
is the discrete sine transform (DST) of v. In matrix form we can write the DST as the matrix times vector product $w = Sv$, where S is the sine matrix given by (11.4). We can then identify the matrix $B = SA$ as the DST of $A \in \mathbb{R}^{m \times n}$, i.e. as the DST of the columns of A. The product $B = AS$ can also be interpreted as a DST. Indeed, since S is symmetric we have $B = (SA^T)^T$ which means that B is the transpose of the DST of the rows of A. It follows that we can compute the unknowns V in Algorithm 11.1 by carrying out discrete sine transforms on 4 m-by-m matrices in addition to the computation of X.
11.3.2 The Discrete Fourier Transform (DFT)
The fast computation of the DST is based on its relation to the discrete Fourier
transform (DFT) and the fact that the DFT can be computed by a technique known
as the fast Fourier transform (FFT). To define the DFT let for $N \in \mathbb{N}$
$$\omega_N = e^{-2\pi i/N} = \cos(2\pi/N) - i\sin(2\pi/N), \qquad (11.5)$$
where $i = \sqrt{-1}$ is the imaginary unit. Given $y = [y_1, \ldots, y_N]^T \in \mathbb{R}^N$ we say that $z = [z_1, \ldots, z_N]^T$ given by
$$z_{j+1} = \sum_{k=0}^{N-1} \omega_N^{jk} y_{k+1}, \qquad j = 0, \ldots, N-1,$$
is the discrete Fourier transform (DFT) of y. We can write this as a matrix times vector product $z = F_N y$, where the Fourier matrix $F_N \in \mathbb{C}^{N \times N}$ has elements $\omega_N^{jk}$, $j, k = 0, 1, \ldots, N-1$. For a matrix we say that $B = F_N A$ is the DFT of A.
As an example, since
$$\omega_4 = e^{-2\pi i/4} = \cos(\pi/2) - i\sin(\pi/2) = -i$$
we find $\omega_4^2 = (-i)^2 = -1$, $\omega_4^3 = (-i)(-1) = i$, $\omega_4^4 = (-1)^2 = 1$, $\omega_4^6 = i^2 = -1$, $\omega_4^9 = i^3 = -i$, and so
$$F_4 = \begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & \omega_4 & \omega_4^2 & \omega_4^3\\ 1 & \omega_4^2 & \omega_4^4 & \omega_4^6\\ 1 & \omega_4^3 & \omega_4^6 & \omega_4^9 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & -i & -1 & i\\ 1 & -1 & 1 & -1\\ 1 & i & -1 & -i \end{bmatrix}. \qquad (11.6)$$
The following lemma shows how the discrete sine transform of order m can be computed from the discrete Fourier transform of order 2m + 2. We recall that for any complex number w
$$\sin w = \frac{e^{iw} - e^{-iw}}{2i}.$$
Lemma 11.1 (Sine Transform as Fourier Transform) Given a positive integer m and a vector $x \in \mathbb{R}^m$. Component k of Sx is equal to i/2 times component k + 1 of $F_{2m+2} z$, where
$$z^T = [0, x^T, 0, -x_B^T] \in \mathbb{R}^{2m+2}, \qquad x_B^T := [x_m, \ldots, x_2, x_1].$$
In symbols
$$(Sx)_k = \frac{i}{2}(F_{2m+2} z)_{k+1}, \qquad k = 1, \ldots, m.$$
Proof Let $\omega = \omega_{2m+2} = e^{-2\pi i/(2m+2)} = e^{-\pi i/(m+1)}$. We note that
$$\omega^{jk} = e^{-\pi i jk/(m+1)}, \qquad \omega^{(2m+2-j)k} = e^{-2\pi i k} e^{\pi i jk/(m+1)} = e^{\pi i jk/(m+1)}.$$
Component k + 1 of $F_{2m+2} z$ is then given by
$$\begin{aligned} (F_{2m+2} z)_{k+1} &= \sum_{j=0}^{2m+1} \omega^{jk} z_{j+1} = \sum_{j=1}^m x_j \omega^{jk} - \sum_{j=1}^m x_j \omega^{(2m+2-j)k}\\ &= \sum_{j=1}^m x_j \big(e^{-\pi i jk/(m+1)} - e^{\pi i jk/(m+1)}\big) = -2i\sum_{j=1}^m x_j \sin\Big(\frac{jk\pi}{m+1}\Big) = -2i\,(Sx)_k. \end{aligned}$$
Dividing both sides by $-2i$ and noting that $-1/(2i) = -i/(2i^2) = i/2$, proves the lemma.
It follows that we can compute the DST of length m by extracting m components from the DFT of length N = 2m + 2.
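Lemma 11.1 can be used directly with MATLAB's built-in fft, which uses the same sign convention $\omega_N = e^{-2\pi i/N}$ as (11.5). The following sketch (the name dstfft is just a label chosen here) computes $Sx$ this way.

function w = dstfft(x)
% Discrete sine transform w = S*x computed via an FFT of length 2m+2 (Lemma 11.1).
x = x(:); m = length(x);
z = [0; x; 0; -x(end:-1:1)];     % z = [0, x^T, 0, -x_B^T]^T in R^(2m+2)
f = fft(z);                      % F_{2m+2} z
w = real(1i/2*f(2:m+1));         % components 2,...,m+1 scaled by i/2 (real up to rounding)
end

For an m-by-n matrix A this gives SA by applying dstfft to each column; since S is symmetric, AS can be obtained as the transpose of the DST of the rows of A.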
11.3.3 The Fast Fourier Transform (FFT)
From a linear algebra viewpoint the fast Fourier transform is a quick way to compute
the matrix- vector product F N y. Suppose N is even. The key to the FFT is a
connection between F N and F N/2 which makes it possible to compute the FFT
of order N as two FFT’s of order N/2. By repeating this process we can reduce the
number of arithmetic operations to compute a DFT from O(N 2 ) to O(N log2 N).
Suppose N is even. The connection between F N and F N/2 involves a permutation matrix P N ∈ RN×N given by
P N = [e1 , e3 , . . . , eN−1 , e2 , e4 , . . . , eN ],
where the ek = (δj,k ) are unit vectors. If A is a matrix with N columns [a 1 , . . . , a N ]
then
AP N = [a 1 , a 3 , . . . , a N−1 , a 2 , a 4 , . . . , a N ],
i.e. post multiplying A by P N permutes the columns of A so that all the odd-indexed
columns are followed by all the even-indexed columns. For example we have from
(11.6)
$$P_4 = [e_1, e_3, e_2, e_4] = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad F_4 P_4 = \begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & -1 & -i & i\\ 1 & 1 & -1 & -1\\ 1 & -1 & i & -i \end{bmatrix},$$
where we have indicated a certain block structure of $F_4 P_4$. These blocks can be related to the 2-by-2 matrix $F_2$. We define the diagonal scaling matrix $D_2$ by
$$D_2 = \mathrm{diag}(1, \omega_4) = \begin{bmatrix} 1 & 0\\ 0 & -i \end{bmatrix}.$$
Since $\omega_2 = e^{-2\pi i/2} = -1$ we find
$$F_2 = \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}, \qquad D_2 F_2 = \begin{bmatrix} 1 & 1\\ -i & i \end{bmatrix},$$
and we see that
$$F_4 P_4 = \begin{bmatrix} F_2 & D_2 F_2\\ F_2 & -D_2 F_2 \end{bmatrix}.$$
This result holds in general.
Theorem 11.1 (Fast Fourier Transform) If N = 2m is even then
$$F_{2m} P_{2m} = \begin{bmatrix} F_m & D_m F_m\\ F_m & -D_m F_m \end{bmatrix}, \qquad (11.7)$$
where
$$D_m = \mathrm{diag}(1, \omega_N, \omega_N^2, \ldots, \omega_N^{m-1}). \qquad (11.8)$$
Proof Fix integers p, q with $1 \le p, q \le m$ and set $j := p - 1$ and $k := q - 1$. Since
$$\omega_m^m = 1, \qquad \omega_{2m}^{2k} = \omega_m^k, \qquad \omega_{2m}^m = -1, \qquad (F_m)_{p,q} = \omega_m^{jk}, \qquad (D_m F_m)_{p,q} = \omega_{2m}^j \omega_m^{jk},$$
we find by considering elements in the four sub-blocks in turn
$$\begin{aligned} (F_{2m} P_{2m})_{p,q} &= \omega_{2m}^{j(2k)} = \omega_m^{jk},\\ (F_{2m} P_{2m})_{p+m,q} &= \omega_{2m}^{(j+m)(2k)} = \omega_m^{(j+m)k} = \omega_m^{jk},\\ (F_{2m} P_{2m})_{p,q+m} &= \omega_{2m}^{j(2k+1)} = \omega_{2m}^j \omega_m^{jk},\\ (F_{2m} P_{2m})_{p+m,q+m} &= \omega_{2m}^{(j+m)(2k+1)} = \omega_{2m}^{j+m} \omega_m^{(j+m)k} = -\omega_{2m}^j \omega_m^{jk}. \end{aligned}$$
It follows that the four m-by-m blocks of $F_{2m} P_{2m}$ have the required structure.
Using Theorem 11.1 we can carry out the DFT as a block multiplication. Let $y \in \mathbb{R}^{2m}$ and set $w = P_{2m}^T y = [w_1^T, w_2^T]^T$, where
$$w_1^T = [y_1, y_3, \ldots, y_{2m-1}], \qquad w_2^T = [y_2, y_4, \ldots, y_{2m}].$$
Then
$$F_{2m} y = F_{2m} P_{2m} P_{2m}^T y = F_{2m} P_{2m} w = \begin{bmatrix} F_m & D_m F_m\\ F_m & -D_m F_m \end{bmatrix} \begin{bmatrix} w_1\\ w_2 \end{bmatrix} = \begin{bmatrix} q_1 + q_2\\ q_1 - q_2 \end{bmatrix},$$
where
$$q_1 = F_m w_1, \qquad q_2 = D_m (F_m w_2).$$
In order to compute $F_{2m} y$ we need to compute $F_m w_1$ and $F_m w_2$. Thus, by combining two FFT's of order m we obtain an FFT of order 2m. If $N = 2^k$ then this process can be applied recursively as in the following MATLAB function:
function z=fftrec(y)
%function z=fftrec(y)
  y=y(:);
  n=length(y);
  if n==1
    z=y;
  else
    q1=fftrec(y(1:2:n-1));
    q2=exp(-2*pi*1i/n).^(0:n/2-1)'.*fftrec(y(2:2:n));
    z=[q1+q2; q1-q2];
  end
Listing 11.2 fftrec
Statement 3 is included so that the input y ∈ Rn can be either a row or column
vector, while the output z is a column vector.
Such a recursive version of FFT is useful for testing purposes, but is much too
slow for large problems. A challenge for FFT code writers is to develop nonrecursive
versions and also to handle efficiently the case where N is not a power of two. We
refer to [14] for further details.
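As a quick sanity check one can compare fftrec with MATLAB's built-in fft on a vector whose length is a power of two; the length below is arbitrary.

y = rand(2^10,1);
norm(fftrec(y) - fft(y))     % of the order of rounding errors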
The complexity of the FFT is given by $\gamma N\log_2 N$ for some constant γ independent of N. To show this for the special case when N is a power of two let $x_k$ be the complexity (the number of arithmetic operations) when $N = 2^k$. Since we need two FFT's of order $N/2 = 2^{k-1}$ and a multiplication with the diagonal matrix $D_{N/2}$, it is reasonable to assume that $x_k = 2x_{k-1} + \gamma 2^k$ for some constant γ independent of k. Since $x_0 = 0$ we obtain by induction on k that $x_k = \gamma k 2^k$. Indeed, this holds for k = 0 and if $x_{k-1} = \gamma(k-1)2^{k-1}$ then $x_k = 2x_{k-1} + \gamma 2^k = 2\gamma(k-1)2^{k-1} + \gamma 2^k = \gamma k 2^k$. Reasonable implementations of FFT typically have $\gamma \approx 5$, see [14].
The efficiency improvement using the FFT to compute the DFT is spectacular for large N. The direct multiplication $F_N y$ requires $O(8N^2)$ arithmetic operations since complex arithmetic is involved. Assuming that the FFT uses $5N\log_2 N$ arithmetic operations we find for $N = 2^{20} \approx 10^6$ the ratio
$$\frac{8N^2}{5N\log_2 N} \approx 84000.$$
Thus if the FFT takes one second of computing time and the computing time is proportional to the number of arithmetic operations then the direct multiplication would take something like 84000 seconds or 23 hours.
11.3.4 A Poisson Solver Based on the FFT
We now have all the ingredients to compute the matrix products SA and AS using
FFT’s of order 2m + 2 where m is the order of S and A. This can then be used
for quick computation of the exact solution V of the discrete Poisson problem in
Algorithm 11.1. We first compute H = SF using Lemma 11.1 and m FFT’s, one
for each of the m columns of F . We then compute G = H S by m FFT’s, one for
each of the rows of H . After X is determined we compute Z = SX and V = ZS
by another 2m FFT's. In total the work amounts to 4m FFT's of order 2m + 2. Since one FFT requires $O(\gamma(2m+2)\log_2(2m+2))$ arithmetic operations the 4m FFT's amount to
$$8\gamma m(m+1)\log_2(2m+2) \approx 8\gamma m^2\log_2 m = 4\gamma n\log_2 n,$$
where $n = m^2$ is the size of the linear system $Ax = b$ we would be solving if Cholesky factorization was used. This should be compared to the $O(8n^{3/2})$ arithmetic operations used in Algorithm 11.1 requiring 4 straightforward matrix multiplications with S. What is faster will depend heavily on the programming of the FFT and the size of the problem. We refer to [14] for other efficient ways to implement the DST.
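A rough sketch of this FFT-based variant of Listing 11.1 is given below; the function names fastpoissonfft and dstcols are introduced here only for the illustration. Each product with S is computed column by column via Lemma 11.1, and products from the right use the symmetry of S.

function V = fastpoissonfft(F)
% FFT-based variant of Listing 11.1: every product with S is a DST.
m = length(F); h = 1/(m+1);
sigma = sin(pi*h*(1:m)'/2).^2;
G = dstcols(dstcols(F)')';                       % G = S*F*S (S is symmetric)
X = h^4*G./(sigma*ones(1,m) + ones(m,1)*sigma');
V = zeros(m+2,m+2);
V(2:m+1,2:m+1) = dstcols(dstcols(X)')';          % V = S*X*S
end

function B = dstcols(A)
% Apply the DST of Lemma 11.1 to each column of A, i.e. B = S*A.
[m,n] = size(A); B = zeros(m,n);
for j = 1:n
  z = [0; A(:,j); 0; -A(m:-1:1,j)];
  f = fft(z);
  B(:,j) = real(1i/2*f(2:m+1));
end
end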
11.4 Exercises Chap. 11
11.4.1 Exercises Sect. 11.3
Exercise 11.1 (Fourier Matrix) Show that the Fourier matrix F 4 is symmetric,
but not Hermitian.
Exercise 11.2 (Sine Transform as Fourier Transform) Verify Lemma 11.1
directly when m = 1.
Exercise 11.3 (Explicit Solution of the Discrete Poisson Equation) Show that the exact solution of the discrete Poisson equation (10.5) can be written $V = (v_{i,j})_{i,j=1}^m$, where
$$v_{ij} = \frac{1}{(m+1)^4} \sum_{p=1}^m \sum_{r=1}^m \sum_{k=1}^m \sum_{l=1}^m \frac{\sin\frac{ip\pi}{m+1}\,\sin\frac{jr\pi}{m+1}\,\sin\frac{kp\pi}{m+1}\,\sin\frac{lr\pi}{m+1}}{\big(\sin\frac{p\pi}{2(m+1)}\big)^2 + \big(\sin\frac{r\pi}{2(m+1)}\big)^2}\, f_{k,l}.$$
Exercise 11.4 (Improved Version of Algorithm 11.1) Algorithm 11.1 involves
multiplying a matrix by S four times. In this problem we show that it is enough
to multiply by S two times. We achieve this by diagonalizing only the second T
in T V + V T = h2 F . Let D = diag(λ1 , . . . , λm ), where λj = 4 sin2 (j πh/2),
j = 1, . . . , m.
a) Show that
$$T X + X D = C, \qquad\text{where } X = V S \text{ and } C = h^2 F S.$$
b) Show that
$$(T + \lambda_j I)x_j = c_j, \qquad j = 1, \ldots, m, \qquad (11.9)$$
where X = [x 1 , . . . , x m ] and C = [c1 , . . . , cm ]. Thus we can find X by solving
m linear systems, one for each of the columns of X. Recall that a tridiagonal
m × m system can be solved by Algorithms 2.1 and 2.2 in 8m − 7 arithmetic
operations. Give an algorithm to find X which only requires O(δm2 ) arithmetic
operations for some constant δ independent of m.
c) Describe a method to compute V which only requires O(4m3 ) = O(4n3/2 )
arithmetic operations.
d) Describe a method based on the fast Fourier transform which requires
O(2γ n log2 n) where γ is the same constant as mentioned at the end of the
last section.
Exercise 11.5 (Fast Solution of 9 Point Scheme) Consider the equation
$$T V + V T - \frac{1}{6} T V T = h^2 \mu F,$$
that was derived in Exercise 10.8 for the 9-point scheme. Define the matrix $X = (x_{j,k})$ by $V = S X S$, where V is the solution of (10.22). Show that
$$D X + X D - \frac{1}{6} D X D = 4h^4 G, \qquad\text{where } G = S\,\mu F\,S,$$
where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, with $\lambda_j = 4\sin^2(j\pi h/2)$, $j = 1, \ldots, m$, and that
$$x_{j,k} = \frac{h^4 g_{j,k}}{\sigma_j + \sigma_k - \frac{2}{3}\sigma_j\sigma_k}, \qquad\text{where } \sigma_j = \sin^2(j\pi h/2) \text{ for } j, k = 1, 2, \ldots, m.$$
Show that $\sigma_j + \sigma_k - \frac{2}{3}\sigma_j\sigma_k > 0$ for $j, k = 1, 2, \ldots, m$. Conclude that the matrix A in Exercise 10.8 b) is symmetric positive definite and that (10.21) always has a solution V.
Exercise 11.6 (Algorithm for Fast Solution of 9 Point Scheme) Derive an algorithm for solving (10.21) which for large m requires essentially the same number of
operations as in Algorithm 11.1. (We assume that μF already has been formed).
Exercise 11.7 (Fast Solution of Biharmonic Equation) For the biharmonic problem we derived in Exercise 10.9 the equation
$$T^2 U + 2 T U T + U T^2 = h^4 F.$$
Define the matrix $X = (x_{j,k})$ by $U = S X S$ where U is the solution of (10.25). Show that
$$D^2 X + 2 D X D + X D^2 = 4h^6 G, \qquad\text{where } G = S F S,$$
and that
$$x_{j,k} = \frac{h^6 g_{j,k}}{4(\sigma_j + \sigma_k)^2}, \qquad\text{where } \sigma_j = \sin^2(j\pi h/2) \text{ for } j, k = 1, 2, \ldots, m.$$
Exercise 11.8 (Algorithm for Fast Solution of Biharmonic Equation) Use Exercise 11.7 to derive an algorithm
function U=simplefastbiharmonic(F)
which requires only $O(\delta n^{3/2})$ operations to find U in Problem 10.9. Here δ is some constant independent of n.
Exercise 11.9 (Check Algorithm for Fast Solution of Biharmonic Equation) In Exercise 11.8 compute the solution U corresponding to F = ones(m,m). For some small m's check that you get the same solution obtained by solving the standard form $Ax = b$ in (10.25). You can use x = A\b for solving $Ax = b$. Use F(:) to vectorize a matrix and reshape(x,m,m) to turn a vector $x \in \mathbb{R}^{m^2}$ into an m×m matrix. Use the MATLAB command surf(U) for plotting U for, say, m = 50. Compare the result with Exercise 11.8 by plotting the difference between both matrices.
Exercise 11.10 (Fast Solution of Biharmonic Equation Using 9 Point Rule)
Repeat Exercises 10.9, 11.8 and 11.9 using the nine point rule (10.21) to solve the
system (10.24).
11.5 Review Questions
11.5.1 Consider the Poisson matrix.
• What is the bandwidth of its Cholesky factor?
• approximately how many arithmetic operations does it take to find the
Cholesky factor?
• same question for block LU,
• same question for the fast Poisson solver with and without FFT.
11.5.2 What is the discrete sine transform and discrete Fourier transform of a
vector?
Part V
Iterative Methods for Large Linear
Systems
Gaussian elimination, LU and Cholesky factorization are direct methods. In absence
of rounding errors they are used to find the exact solution of a linear system using
a finite number of arithmetic operations. In an iterative method we start with an
approximation x 0 to the exact solution x and then compute a sequence {x k } such that
hopefully x k → x. Iterative methods are mainly used for large sparse systems, i.e.,
where many of the elements in the coefficient matrix are zero. The main advantages
of iterative methods are reduced storage requirements and ease of implementation.
In an iterative method the main work in each iteration is a matrix times vector
multiplication, an operation which often does not need storing the matrix, not even
in sparse form.
In this part we consider the iterative methods of Jacobi, Gauss-Seidel, successive
over relaxation (SOR), steepest descent and conjugate gradients.
Chapter 12
The Classical Iterative Methods
In this chapter we consider the classical iterative methods of Richardson, Jacobi,
Gauss-Seidel and an accelerated version of Gauss-Seidel’s method called successive
overrelaxation (SOR). David Young developed in his thesis a beautiful theory
describing the convergence rate of SOR, see [22].
We give the main points of this theory specialized to the discrete Poisson matrix.
With a careful choice of an acceleration parameter the amount of work using SOR
on the discrete Poisson problem is the same as for the fast Poisson solver without
FFT (cf. Algorithm 11.1). Moreover, SOR is not restricted to constant coefficient
methods on a rectangle. However, to obtain fast convergence using SOR it is
necessary to have a good estimate for an acceleration parameter.
For convergence we need to study convergence of powers of matrices. In this
chapter we only use matrix norms which are consistent on Cn×n and subordinate to
a vector norm on Cn , (cf. Definitions 8.4 and 8.5).
12.1 Classical Iterative Methods; Component Form
We start with an example showing how a linear system can be solved using an
iterative method.
Example 12.1 (Iterative Methods on a Special 2 × 2 Matrix) Solving for the diagonal elements, the linear system
$$\begin{bmatrix} 2 & -1\\ -1 & 2 \end{bmatrix} \begin{bmatrix} y\\ z \end{bmatrix} = \begin{bmatrix} 1\\ 1 \end{bmatrix}$$
can be written in component form as $y = (z+1)/2$ and $z = (y+1)/2$. Starting with $y_0, z_0$ we generate two sequences $\{y_k\}$ and $\{z_k\}$ using the difference equations $y_{k+1} = (z_k+1)/2$ and $z_{k+1} = (y_k+1)/2$. This is an example of Jacobi's method. If $y_0 = z_0 = 0$ then we find $y_1 = z_1 = 1/2$ and in general $y_k = z_k = 1 - 2^{-k}$ for $k = 0, 1, 2, 3, \ldots$. The iteration converges to the exact solution $[1, 1]^T$, and the error is halved in each iteration.
We can improve the convergence rate by using the most current approximation in each iteration. This leads to Gauss-Seidel's method: $y_{k+1} = (z_k + 1)/2$ and $z_{k+1} = (y_{k+1} + 1)/2$. If $y_0 = z_0 = 0$ then we find $y_1 = 1/2$, $z_1 = 3/4$, $y_2 = 7/8$, $z_2 = 15/16$, and in general $y_k = 1 - 2 \cdot 4^{-k}$ and $z_k = 1 - 4^{-k}$ for $k = 1, 2, 3, \ldots$. The error is now reduced by a factor 4 in each iteration.
Consider the general case. Suppose $A \in \mathbb{C}^{n \times n}$ is nonsingular and $b \in \mathbb{C}^n$. Suppose we know an approximation $x_k = [x_k(1), \ldots, x_k(n)]^T$ to the exact solution x of $Ax = b$. We need to assume that the rows are ordered so that A has nonzero diagonal elements. Solving the ith equation of $Ax = b$ for $x(i)$, we obtain a fixed-point form of $Ax = b$:
$$x(i) = \Big(-\sum_{j=1}^{i-1} a_{ij} x(j) - \sum_{j=i+1}^{n} a_{ij} x(j) + b_i\Big)\big/a_{ii}, \qquad i = 1, 2, \ldots, n. \qquad (12.1)$$
1. In Jacobi's method (J method) we substitute $x_k$ into the right hand side of (12.1) and compute a new approximation by
$$x_{k+1}(i) = \Big(-\sum_{j=1}^{i-1} a_{ij} x_k(j) - \sum_{j=i+1}^{n} a_{ij} x_k(j) + b_i\Big)\big/a_{ii}, \qquad i = 1, 2, \ldots, n. \qquad (12.2)$$
2. Gauss-Seidel's method (GS method) is a modification of Jacobi's method, where we use the new $x_{k+1}(i)$ immediately after it has been computed:
$$x_{k+1}(i) = \Big(-\sum_{j=1}^{i-1} a_{ij} x_{k+1}(j) - \sum_{j=i+1}^{n} a_{ij} x_k(j) + b_i\Big)\big/a_{ii}, \qquad i = 1, 2, \ldots, n. \qquad (12.3)$$
3. The successive overrelaxation method (SOR method) is obtained by introducing an acceleration parameter $0 < \omega < 2$ in the GS method. We write $x(i) = \omega x(i) + (1-\omega)x(i)$ and this leads to the method
$$x_{k+1}(i) = \omega\Big(-\sum_{j=1}^{i-1} a_{ij} x_{k+1}(j) - \sum_{j=i+1}^{n} a_{ij} x_k(j) + b_i\Big)\big/a_{ii} + (1-\omega)x_k(i). \qquad (12.4)$$
The SOR method reduces to the Gauss-Seidel method for ω = 1. Denoting the right hand side of (12.3) by $x_{k+1}^{gs}$ we can write (12.4) as $x_{k+1} = \omega x_{k+1}^{gs} + (1-\omega)x_k$, and we see that $x_{k+1}$ is located on the straight line passing through the two points $x_{k+1}^{gs}$ and $x_k$. The restriction $0 < \omega < 2$ is necessary for convergence (cf. Theorem 12.6). Normally, the best results are obtained for the relaxation parameter ω in the range $1 \le \omega < 2$ and then $x_{k+1}$ is computed by linear extrapolation, i.e., it is not located between $x_{k+1}^{gs}$ and $x_k$.
4. We mention also briefly the symmetric successive overrelaxation method (SSOR). One iteration in SSOR consists of two SOR sweeps. A forward SOR sweep (12.4), computing an approximation denoted $x_{k+1/2}$ instead of $x_{k+1}$, is followed by a backward SOR sweep computing
$$x_{k+1}(i) = \omega\Big(-\sum_{j=1}^{i-1} a_{ij} x_{k+1/2}(j) - \sum_{j=i+1}^{n} a_{ij} x_{k+1}(j) + b_i\Big)\big/a_{ii} + (1-\omega)x_{k+1/2}(i) \qquad (12.5)$$
in the order $i = n, n-1, \ldots, 1$. The method is slower and more complicated than the SOR method. Its main use is as a symmetric preconditioner. For if A is symmetric then SSOR combines the two SOR steps in such a way that the resulting iteration matrix is similar to a symmetric matrix. We will not discuss this method any further here and refer to Sect. 13.6 for an alternative example of a preconditioner.
We will refer to the J, GS and SOR methods as the classical (iterative) methods.
12.1.1 The Discrete Poisson System
Consider the classical methods applied to the discrete Poisson matrix $A \in \mathbb{R}^{n \times n}$ given by (10.8). Let $n = m^2$ and set $h = 1/(m+1)$. In component form the linear system $Ax = b$ can be written (cf. (10.4))
$$4v(i,j) - v(i-1,j) - v(i+1,j) - v(i,j-1) - v(i,j+1) = h^2 f_{i,j}, \qquad i, j = 1, \ldots, m,$$
with homogeneous boundary conditions also given in (10.4). Solving for $v(i,j)$ we obtain the fixed point form
$$v(i,j) = \big(v(i-1,j) + v(i+1,j) + v(i,j-1) + v(i,j+1) + e_{i,j}\big)/4, \qquad (12.6)$$
where $e_{i,j} := f_{i,j}/(m+1)^2$. The J, GS, and SOR methods take the form
$$\begin{aligned} \text{J:}\quad & v_{k+1}(i,j) = \big(v_k(i-1,j) + v_k(i,j-1) + v_k(i+1,j) + v_k(i,j+1) + e(i,j)\big)/4,\\ \text{GS:}\quad & v_{k+1}(i,j) = \big(v_{k+1}(i-1,j) + v_{k+1}(i,j-1) + v_k(i+1,j) + v_k(i,j+1) + e(i,j)\big)/4,\\ \text{SOR:}\quad & v_{k+1}(i,j) = \omega\big(v_{k+1}(i-1,j) + v_{k+1}(i,j-1) + v_k(i+1,j) + v_k(i,j+1) + e(i,j)\big)/4 + (1-\omega)v_k(i,j). \end{aligned} \qquad (12.7)$$

Table 12.1 The number of iterations $k_n$ to solve the discrete Poisson problem with n unknowns using the methods of Jacobi, Gauss-Seidel, and SOR (see text) with a tolerance $10^{-8}$

        k_100   k_2500   k_10000   k_40000   k_160000
  J       385     8386
  GS      194     4194
  SOR      35      164       324       645       1286
We note that for GS and SOR we have used the natural ordering, i.e., $(i_1, j_1) < (i_2, j_2)$ if and only if $j_1 \le j_2$ and $i_1 < i_2$ if $j_1 = j_2$. For the J method any ordering can be used.
In Algorithm 12.1 we give a MATLAB program to test the convergence of Jacobi's method on the discrete Poisson problem. We carry out Jacobi iterations on the linear system (12.6) with $F = (f_{i,j}) \in \mathbb{R}^{m \times m}$, starting with $V_0 = 0 \in \mathbb{R}^{(m+2) \times (m+2)}$. The output is the number of iterations k needed to obtain $\|V^{(k)} - U\|_M := \max_{i,j}|v_{ij} - u_{ij}| < \mathrm{tol}$. Here $[u_{ij}] \in \mathbb{R}^{(m+2) \times (m+2)}$ is the "exact" solution of (12.6) computed using the fast Poisson solver in Algorithm 11.1. We set k = K + 1 if convergence is not obtained in K iterations. In Table 12.1 we show the output $k = k_n$ from this algorithm using F = ones(m, m) for m = 10, 50, $K = 10^4$, and tol = $10^{-8}$. We also show the number of iterations for Gauss-Seidel and SOR with a value of ω known as the optimal acceleration parameter $\omega^* := 2/\big(1 + \sin(\frac{\pi}{m+1})\big)$. We will derive this value later.
function k=jdp(F,K,tol)
% k=jdp(F,K,tol)
  m=length(F); U=fastpoisson(F);
  V=zeros(m+2,m+2); E=F/(m+1)^2;
  for k=1:K
    V(2:m+1,2:m+1)=(V(1:m,2:m+1)+V(3:m+2,2:m+1)...
      +V(2:m+1,1:m)+V(2:m+1,3:m+2)+E)/4;
    if max(max(abs(V-U)))<tol, return
    end
  end
  k=K+1;
end
Listing 12.1 jdp
For the GS and SOR methods we have used Algorithm 12.2. This is the analog
of Algorithm 12.1 using SOR instead of J to solve the discrete Poisson problem. w
is an acceleration parameter with 0 < w < 2. For w = 1 we obtain Gauss-Seidel’s
method.
function k=sordp(F,K,w,tol)
% k=sordp(F,K,w,tol)
  m=length(F); U=fastpoisson(F); V=zeros(m+2,m+2); E=F/(m+1)^2;
  for k=1:K
    for j=2:m+1
      for i=2:m+1
        V(i,j)=w*(V(i-1,j)+V(i+1,j)+V(i,j-1)...
          +V(i,j+1)+E(i-1,j-1))/4+(1-w)*V(i,j);
      end
    end
    if max(max(abs(V-U)))<tol, return
    end
  end
  k=K+1;
end
Listing 12.2 sordp
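With the optimal acceleration parameter ω* the SOR iteration counts of Table 12.1 can be reproduced roughly as follows (m and the tolerance are the values used in the text).

m = 50; F = ones(m,m);
w = 2/(1 + sin(pi/(m+1)));       % optimal acceleration parameter
k = sordp(F, 10^4, w, 1e-8)      % about 164 iterations for n = 2500
k1 = sordp(F, 10^4, 1, 1e-8)     % w = 1 is Gauss-Seidel, about 4194 iterations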
We make several remarks about these programs and the results in Table 12.1.
1. The rate (speed) of convergence is quite different for the four methods. The J
and GS methods converge, but rather slowly. The J method needs about twice as
many iterations as the GS method. The improvement using the SOR method with
optimal ω is spectacular.
2. We show in Sect. 12.3.4 that the number of iterations $k_n$ for a size n problem is $k_n = O(n)$ for the J and GS methods and $k_n = O(\sqrt{n})$ for SOR with optimal ω. The choice of tol will only influence the constants multiplying n or $\sqrt{n}$.
3. From (12.1.1) it follows that each iteration requires O(n) arithmetic operations.
Thus the number of arithmetic operations to achieve a given tolerance is
O(kn × n). Therefore the number of arithmetic operations for the J and GS
method is O(n2 ), while it is only O(n3/2 ) for the SOR method with optimal
ω. Asymptotically, for J and GS this is the same as using banded Cholesky, while
SOR competes with the fast method (without FFT).
4. We do not need to store the coefficient matrix so the storage requirements for
these methods on the discrete Poisson problem is O(n), asymptotically the same
as for the fast methods.
5. Jacobi’s method has the advantage that it can be easily parallelized.
12.2 Classical Iterative Methods; Matrix Form
To study convergence we need matrix formulations of the classical methods.
12.2.1 Fixed-Point Form
In general we can construct an iterative method by choosing a nonsingular matrix M and writing $Ax = b$ in the equivalent form
$$Mx = (M - A)x + b. \qquad (12.8)$$
The matrix M is known as a splitting matrix. The corresponding iterative method is given by
$$M x_{k+1} = (M - A)x_k + b \qquad (12.9)$$
or
$$x_{k+1} := G x_k + c, \qquad G = I - M^{-1}A, \quad c = M^{-1}b. \qquad (12.10)$$
This is known as a fixed-point iteration. Starting with $x_0$ this defines a sequence $\{x_k\}$ of vectors in $\mathbb{C}^n$. For a general $G \in \mathbb{C}^{n \times n}$ and $c \in \mathbb{C}^n$ a solution of $x = Gx + c$ is called a fixed-point. The fixed-point is unique if $I - G$ is nonsingular.
If $\lim_{k\to\infty} x_k = x$ for some $x \in \mathbb{C}^n$ then x is a fixed-point since
$$x = \lim_{k\to\infty} x_{k+1} = \lim_{k\to\infty}(G x_k + c) = G\lim_{k\to\infty} x_k + c = Gx + c.$$
The matrix M can also be interpreted as a preconditioning matrix. We first write $Ax = b$ in the equivalent form $M^{-1}Ax = M^{-1}b$ or $x = x - M^{-1}Ax + M^{-1}b$. This again leads to the iterative method (12.10), and M is chosen to reduce the condition number of A.
12.2.2 The Splitting Matrices for the Classical Methods
Different choices of M in (12.9) lead to different iterative methods. We now derive M for the classical methods. For GS and SOR it is convenient to write A as a sum of three matrices, $A = D - A_L - A_R$, where $-A_L$, D, and $-A_R$ are the lower, diagonal, and upper part of A, respectively. Thus $D := \mathrm{diag}(a_{11}, \ldots, a_{nn})$,
$$A_L := \begin{bmatrix} 0 & & & \\ -a_{2,1} & 0 & & \\ \vdots & \ddots & \ddots & \\ -a_{n,1} & \cdots & -a_{n,n-1} & 0 \end{bmatrix}, \qquad A_R := \begin{bmatrix} 0 & -a_{1,2} & \cdots & -a_{1,n}\\ & \ddots & \ddots & \vdots\\ & & 0 & -a_{n-1,n}\\ & & & 0 \end{bmatrix}. \qquad (12.11)$$
Theorem 12.1 (Splitting Matrices for J, GS and SOR) The splitting matrices $M_J$, $M_1$ and $M_\omega$ for the J, GS and SOR methods are given by
$$M_J = D, \qquad M_1 = D - A_L, \qquad M_\omega = \omega^{-1}D - A_L. \qquad (12.12)$$
Proof To find M we write the methods in the form (12.9) where the coefficient of b is equal to one. Moving $a_{ii}$ to the left hand side of the Jacobi iteration (12.2) we obtain the matrix form $Dx_{k+1} = (D - A)x_k + b$ showing that $M_J = D$.
For the SOR method a matrix form is
$$Dx_{k+1} = \omega\big(A_L x_{k+1} + A_R x_k + b\big) + (1-\omega)Dx_k. \qquad (12.13)$$
Dividing both sides by ω and moving $A_L x_{k+1}$ to the left hand side this takes the form $\big(\omega^{-1}D - A_L\big)x_{k+1} = A_R x_k + b + (\omega^{-1} - 1)Dx_k$, showing that $M_\omega = \omega^{-1}D - A_L$. We obtain $M_1$ by letting ω = 1 in $M_\omega$.
Example 12.2 (Splitting Matrices) For the system
$$\begin{bmatrix} 2 & -1\\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 1\\ 1 \end{bmatrix}$$
we find
$$A_L = \begin{bmatrix} 0 & 0\\ 1 & 0 \end{bmatrix}, \qquad D = \begin{bmatrix} 2 & 0\\ 0 & 2 \end{bmatrix}, \qquad A_R = \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix},$$
and
$$M_J = D = \begin{bmatrix} 2 & 0\\ 0 & 2 \end{bmatrix}, \qquad M_\omega = \omega^{-1}D - A_L = \begin{bmatrix} 2\omega^{-1} & 0\\ -1 & 2\omega^{-1} \end{bmatrix}.$$
The iteration matrix $G_\omega = I - M_\omega^{-1}A$ is given by
$$G_\omega = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} - \begin{bmatrix} \omega/2 & 0\\ \omega^2/4 & \omega/2 \end{bmatrix} \begin{bmatrix} 2 & -1\\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1-\omega & \omega/2\\ \omega(1-\omega)/2 & 1-\omega+\omega^2/4 \end{bmatrix}. \qquad (12.14)$$
For the J and GS method we have
$$G_J = I - D^{-1}A = \begin{bmatrix} 0 & 1/2\\ 1/2 & 0 \end{bmatrix}, \qquad G_1 = \begin{bmatrix} 0 & 1/2\\ 0 & 1/4 \end{bmatrix}. \qquad (12.15)$$
We could have derived these matrices directly from the component form of the iteration. For example, for the GS method we have the component form
$$x_{k+1}(1) = \tfrac{1}{2}x_k(2) + \tfrac{1}{2}, \qquad x_{k+1}(2) = \tfrac{1}{2}x_{k+1}(1) + \tfrac{1}{2}.$$
Substituting the value of $x_{k+1}(1)$ from the first equation into the second equation we find
$$x_{k+1}(2) = \tfrac{1}{2}\big(\tfrac{1}{2}x_k(2) + \tfrac{1}{2}\big) + \tfrac{1}{2} = \tfrac{1}{4}x_k(2) + \tfrac{3}{4}.$$
Thus
$$x_{k+1} = \begin{bmatrix} x_{k+1}(1)\\ x_{k+1}(2) \end{bmatrix} = \begin{bmatrix} 0 & 1/2\\ 0 & 1/4 \end{bmatrix} \begin{bmatrix} x_k(1)\\ x_k(2) \end{bmatrix} + \begin{bmatrix} 1/2\\ 3/4 \end{bmatrix} = G_1 x_k + c.$$
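The convergence rates observed in Example 12.1 (error halved for Jacobi, reduced by a factor 4 for Gauss-Seidel) are exactly the spectral radii of these iteration matrices, which can be checked numerically:

A = [2 -1; -1 2]; D = diag(diag(A));
AL = -tril(A,-1);
GJ = eye(2) - D\A;                        % Jacobi iteration matrix
G1 = eye(2) - (D - AL)\A;                 % Gauss-Seidel iteration matrix
[max(abs(eig(GJ))), max(abs(eig(G1)))]    % spectral radii 0.5 and 0.25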
12.3 Convergence
For Newton’s method the choice of starting value is important. This is not the case
for methods of the form x k+1 := Gx k + c.
In the following we assume that G ∈ Cn×n , c ∈ Cn and I − G is nonsingular.
We let x ∈ Cn be the unique fixed point satisfying x = Gx + c.
Definition 12.1 (Convergence of x k+1 := Gx k + c) We say that the iterative
method x k+1 := Gx k + c converges if the sequence {x k } converges for any starting
vector x 0 .
We have the following necessary and sufficient condition for convergence:
Theorem 12.2 (Convergence of an Iterative Method) The iterative method
x k+1 := Gx k + c converges if and only if limk→∞ Gk = 0.
Proof Suppose limk→∞ Gk = 0, and let x be the unique fixed point. We subtract
x = Gx + c from x k+1 = Gx k + c. The vector c cancels and we obtain x k+1 − x =
G(x k − x). By induction on k
x k − x = Gk (x 0 − x),
k = 0, 1, 2, . . .
(12.16)
Clearly x k − x → 0 if Gk → 0 and the method converges. For the converse we let
x be as before and choose x 0 − x = ej , the j th unit vector for j = 1, . . . , n. Since
x k − x → 0 for any x 0 we have Gk ej → 0 for j = 1, . . . , n which implies that
limk→∞ Gk = 0.
Theorem 12.3 (Sufficient Condition for Convergence) If $\|G\| < 1$ then the iteration $x_{k+1} = Gx_k + c$ converges.
Proof We have
$$\|x_k - x\| = \|G^k(x_0 - x)\| \le \|G^k\|\,\|x_0 - x\| \le \|G\|^k\,\|x_0 - x\| \to 0, \qquad k \to \infty.$$
A necessary and sufficient condition for convergence involves the eigenvalues of G. We define the spectral radius of a matrix $A \in \mathbb{C}^{n \times n}$ as the maximum absolute value of its eigenvalues,
$$\rho(A) := \max_{\lambda \in \sigma(A)} |\lambda|. \qquad (12.17)$$
Theorem 12.4 (When Does an Iterative Method Converge?) Suppose G ∈ Cn×n
with I − G nonsingular and let c ∈ Cn . The iteration x k+1 = Gx k + c converges if
and only if ρ(G) < 1.
We will prove this theorem using Theorem 12.10 in Sect. 12.4.
12.3.1 Richardson’s Method
The Richardson's method (R method) is defined by
$$x_{k+1} = x_k + \alpha\big(b - Ax_k\big). \qquad (12.18)$$
Here we pick an acceleration parameter α and compute a new approximation by adding a multiple of the residual vector $r_k := b - Ax_k$. Note that we do not need the assumption of nonzero diagonal elements. Richardson considered this method in 1910.
We will assume that α is real. If all eigenvalues of A have positive real parts then the R method converges provided α is positive and sufficiently small. We show this result for positive eigenvalues and leave the more general case to Exercise 12.2.
Theorem 12.5 (Convergence of Richardson's Method) If A has positive eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$ then the R method given by $x_{k+1} = (I - \alpha A)x_k + \alpha b$ converges if and only if $0 < \alpha < 2/\lambda_1$. Moreover,
$$\rho(I - \alpha A) > \rho(I - \alpha_o A) = \frac{\kappa - 1}{\kappa + 1}, \quad \alpha \in \mathbb{R}\setminus\{\alpha_o\}, \qquad \alpha_o := \frac{2}{\lambda_1 + \lambda_n}, \quad \kappa := \frac{\lambda_1}{\lambda_n}. \qquad (12.19)$$
[Fig. 12.1 The functions $\alpha \mapsto |1 - \alpha\lambda_1|$ and $\alpha \mapsto |1 - \alpha\lambda_n|$.]
Proof The eigenvalues of $I - \alpha A$ are $1 - \alpha\lambda_j$, $j = 1, \ldots, n$. We have
$$\rho_\alpha := \rho(I - \alpha A) = \max_j|1 - \alpha\lambda_j| = \begin{cases} 1 - \alpha\lambda_1, & \text{if } \alpha \le 0,\\ 1 - \alpha\lambda_n, & \text{if } 0 < \alpha \le \alpha_o,\\ \alpha\lambda_1 - 1, & \text{if } \alpha > \alpha_o, \end{cases}$$
see Fig. 12.1. Clearly $1 - \alpha\lambda_n = \alpha\lambda_1 - 1$ for $\alpha = \alpha_o$ and
$$\rho_{\alpha_o} = \alpha_o\lambda_1 - 1 = \frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n} = \frac{\kappa - 1}{\kappa + 1} < 1.$$
We have $\rho_\alpha < 1$ if and only if $\alpha > 0$ and $\alpha\lambda_1 - 1 < 1$, showing convergence if and only if $0 < \alpha < 2/\lambda_1$, and $\rho_\alpha > \rho_{\alpha_o}$ for $\alpha \le 0$ and $\alpha \ge 2/\lambda_1$. Finally, if $0 < \alpha < \alpha_o$ then $\rho_\alpha = 1 - \alpha\lambda_n > 1 - \alpha_o\lambda_n = \rho_{\alpha_o}$, and if $\alpha_o < \alpha < 2/\lambda_1$ then $\rho_\alpha = \alpha\lambda_1 - 1 > \alpha_o\lambda_1 - 1 = \rho_{\alpha_o}$.
For a positive definite matrix we obtain
Corollary 12.1 (Rate of Convergence for the R Method) Suppose A is positive definite with largest and smallest eigenvalues $\lambda_{\max}$ and $\lambda_{\min}$, respectively. Richardson's method $x_{k+1} = (I - \alpha A)x_k + \alpha b$ converges if and only if $0 < \alpha < 2/\lambda_{\max}$. With $\alpha = \alpha_o := \frac{2}{\lambda_{\max} + \lambda_{\min}}$ we have the error estimate
$$\|x_k - x\|_2 \le \Big(\frac{\kappa - 1}{\kappa + 1}\Big)^k \|x_0 - x\|_2, \qquad k = 0, 1, 2, \ldots \qquad (12.20)$$
where $\kappa := \lambda_{\max}/\lambda_{\min}$ is the spectral condition number of A.
Proof The spectral norm $\|\cdot\|_2$ is consistent and therefore $\|x_k - x\|_2 \le \|I - \alpha_o A\|_2^k\,\|x_0 - x\|_2$. But for a positive definite matrix the spectral norm is equal to the spectral radius and the result follows from (12.19).
12.3.2 Convergence of SOR
The condition ω ∈ (0, 2) is necessary for convergence of the SOR method.
Theorem 12.6 (Necessary Condition for Convergence of SOR) Suppose $A \in \mathbb{C}^{n \times n}$ is nonsingular with nonzero diagonal elements. If the SOR method applied to A converges then $\omega \in (0, 2)$.
Proof We have (cf. (12.13)) $Dx_{k+1} = \omega\big(A_L x_{k+1} + A_R x_k + b\big) + (1-\omega)Dx_k$, or $x_{k+1} = \omega\big(Lx_{k+1} + Rx_k + D^{-1}b\big) + (1-\omega)x_k$, where $L := D^{-1}A_L$ and $R := D^{-1}A_R$. Thus $(I - \omega L)x_{k+1} = \big(\omega R + (1-\omega)I\big)x_k + \omega D^{-1}b$, so the following form of the iteration matrix is obtained:
$$G_\omega = (I - \omega L)^{-1}\big(\omega R + (1-\omega)I\big). \qquad (12.21)$$
We next compute the determinant of $G_\omega$. Since $I - \omega L$ is lower triangular with ones on the diagonal, the same holds for the inverse by Lemma 2.5, and therefore the determinant of this matrix is equal to one. The matrix $\omega R + (1-\omega)I$ is upper triangular with $1 - \omega$ on the diagonal and therefore its determinant equals $(1-\omega)^n$. It follows that $\det(G_\omega) = (1-\omega)^n$. Since the determinant of a matrix equals the product of its eigenvalues we must have $|\lambda| \ge |1-\omega|$ for at least one eigenvalue λ of $G_\omega$ and we conclude that $\rho(G_\omega) \ge |\omega - 1|$. But then $\rho(G_\omega) \ge 1$ if ω is not in the interval (0, 2) and by Theorem 12.4 SOR diverges.
The SOR method always converges for a positive definite matrix.
Theorem 12.7 (SOR on Positive Definite Matrix) SOR converges for a positive definite matrix $A \in \mathbb{R}^{n \times n}$ if and only if $0 < \omega < 2$. In particular, Gauss-Seidel's method converges for a positive definite matrix.
Proof By Theorem 12.6 convergence implies $0 < \omega < 2$. Suppose now $0 < \omega < 2$ and let $(\lambda, x)$ be an eigenpair for $G_\omega$. Note that λ and x can be complex. We need to show that $|\lambda| < 1$. The following identity will be shown below:
$$\omega^{-1}(2-\omega)|1-\lambda|^2\, x^*Dx = (1 - |\lambda|^2)\, x^*Ax, \qquad (12.22)$$
where $D := \mathrm{diag}(a_{11}, \ldots, a_{nn})$. Now $x^*Ax$ and $x^*Dx$ are positive for all nonzero $x \in \mathbb{C}^n$ since a positive definite matrix has positive diagonal elements $a_{ii} = e_i^TAe_i > 0$. It follows that the left hand side of (12.22) is nonnegative and then the right hand side must be nonnegative as well. This implies $|\lambda| \le 1$. If $|\lambda| = 1$ then (12.22) implies that $\lambda = 1$ and it remains to show that this is not possible. By (12.10) and (12.12) we have
$$G_\omega x = (I - M_\omega^{-1}A)x = x - (\omega^{-1}D - A_L)^{-1}Ax$$
and the eigenpair equation $G_\omega x = \lambda x$ can be written $x - (\omega^{-1}D - A_L)^{-1}Ax = \lambda x$ or
$$Ax = (\omega^{-1}D - A_L)y, \qquad y := (1-\lambda)x. \qquad (12.23)$$
Now $Ax \neq 0$ implies that $\lambda \neq 1$.
To prove equation (12.22) we first show that
$$Ey = \lambda Ax, \qquad E := \omega^{-1}D + A_R - D = \omega^{-1}D - A_L - A. \qquad (12.24)$$
The second equality follows immediately from $A = D - A_L - A_R$. By (12.23) and (12.24) we have
$$Ey = (\omega^{-1}D - A_L - A)y = Ax - Ay = Ax - (1-\lambda)Ax = \lambda Ax.$$
Again using (12.23), (12.24) and adding $(Ax)^*y = y^*(\omega^{-1}D - A_L)^*y = y^*(\omega^{-1}D - A_R)y$ and $y^*(\lambda Ax) = y^*Ey = y^*(\omega^{-1}D + A_R - D)y$ we find
$$(Ax)^*y + y^*(\lambda Ax) = y^*(\omega^{-1}D - A_R)y + y^*(\omega^{-1}D + A_R - D)y = y^*(2\omega^{-1} - 1)Dy = \omega^{-1}(2-\omega)|1-\lambda|^2\, x^*Dx.$$
Since $(Ax)^* = x^*A^* = x^*A$, $y := (1-\lambda)x$ and $y^* = (1-\bar{\lambda})x^*$, this also equals
$$(Ax)^*y + y^*(\lambda Ax) = (1-\lambda)x^*Ax + \lambda(1-\bar{\lambda})x^*Ax = (1 - |\lambda|^2)x^*Ax,$$
and (12.22) follows.
12.3.3 Convergence of the Classical Methods for the Discrete Poisson Matrix
We know the eigenvalues of the discrete Poisson matrix A given by (10.8) and we can use this to estimate the number of iterations necessary to achieve a given accuracy for the various methods.
Recall that by (10.15) the eigenvalues $\lambda_{j,k}$ of A are
$$\lambda_{j,k} = 4 - 2\cos(j\pi h) - 2\cos(k\pi h), \qquad j, k = 1, \ldots, m, \quad h = 1/(m+1).$$
Consider first the J method. The matrix $G_J = I - D^{-1}A = I - A/4$ has eigenvalues
$$\mu_{j,k} = 1 - \tfrac{1}{4}\lambda_{j,k} = \tfrac{1}{2}\cos(j\pi h) + \tfrac{1}{2}\cos(k\pi h), \qquad j, k = 1, \ldots, m. \qquad (12.25)$$
It follows that $\rho(G_J) = \cos(\pi h) < 1$. Since $G_J$ is symmetric it is normal, and the spectral norm is equal to the spectral radius (cf. Theorem 8.4). We obtain
$$\|x_k - x\|_2 \le \|G_J\|_2^k\,\|x_0 - x\|_2 = \cos^k(\pi h)\,\|x_0 - x\|_2, \qquad k = 0, 1, 2, \ldots \qquad (12.26)$$
The R method given by $x_{k+1} = x_k + \alpha r_k$ with $\alpha = 2/(\lambda_{\max} + \lambda_{\min}) = 1/4$ is the same as the J method, so (12.26) holds in this case as well. This also follows from Corollary 12.1 with κ given by (10.20).
For the SOR method it is possible to explicitly determine $\rho(G_\omega)$ for any $\omega \in (0, 2)$. The following result will be shown in Sect. 12.5.
Theorem 12.8 (The Spectral Radius of SOR Matrix) Consider the SOR iteration (12.1.1), with the natural ordering. The spectral radius of $G_\omega$ is
$$\rho(G_\omega) = \begin{cases} \frac{1}{4}\Big(\omega\beta + \sqrt{(\omega\beta)^2 - 4(\omega - 1)}\Big)^2, & \text{for } 0 < \omega \le \omega^*,\\ \omega - 1, & \text{for } \omega^* < \omega < 2, \end{cases} \qquad (12.27)$$
where $\beta := \rho(G_J) = \cos(\pi h)$ and
$$\omega^* := \frac{2}{1 + \sqrt{1 - \beta^2}} > 1. \qquad (12.28)$$
Moreover,
$$\rho(G_\omega) > \rho(G_{\omega^*}) \quad\text{for } \omega \in (0, 2)\setminus\{\omega^*\}. \qquad (12.29)$$
A plot of ρ(Gω ) as a function of ω ∈ (0, 2) is shown in Fig. 12.2 for n = 100
(lower curve) and n = 2500 (upper curve). As ω increases the spectral radius of Gω
decreases monotonically to the minimum ω∗ . Then it increases linearly to the value
one for ω = 2. We call ω∗ the optimal relaxation parameter.
For the discrete Poisson problem we have β = cos(πh) and it follows from
(12.27), (12.28) that
ω* = 2/(1 + sin(πh)),  ρ(G_{ω*}) = ω* − 1 = (1 − sin(πh))/(1 + sin(πh)),  h = 1/(m + 1).  (12.30)
Letting ω = 1 in (12.27) we find ρ(G_1) = β^2 = ρ(G_J)^2 = cos^2(πh) for the GS method. Thus, for the discrete Poisson problem the J method needs about twice as many iterations as the GS method for a given accuracy.
The values of ρ(G_J), ρ(G_1), and ρ(G_{ω*}) = ω* − 1 are shown in Table 12.2 for n = 100 and n = 2500. We also show the smallest integer k_n such that ρ(G)^{k_n} ≤ 10^{-8}. This is an estimate for the number of iterations needed to obtain an accuracy of 10^{-8}. These values are comparable to the exact values given in Table 12.1.
Fig. 12.2 ρ(G_ω) with ω ∈ [0, 2] for n = 100 (lower curve) and n = 2500 (upper curve)
Table 12.2 Spectral radii for G_J, G_1, G_{ω*} and the smallest integer k_n such that ρ(G)^{k_n} ≤ 10^{-8}

        n = 100    n = 2500   k_100   k_2500
J       0.959493   0.998103   446     9703
GS      0.920627   0.99621    223     4852
SOR     0.56039    0.88402    32      150
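The entries of Table 12.2 can be reproduced directly from the formulas above. The MATLAB lines below are a minimal sketch (not one of the book's algorithms); they compute β = cos(πh), the three spectral radii and the smallest k with ρ^k ≤ 10^{-8}.

% Reproduce the spectral radii and iteration counts of Table 12.2.
for n = [100 2500]
    m = sqrt(n); h = 1/(m+1);
    beta   = cos(pi*h);                         % rho(G_J)
    rhoJ   = beta;
    rhoGS  = beta^2;                            % rho(G_1) = rho(G_J)^2
    rhoSOR = (1 - sin(pi*h))/(1 + sin(pi*h));   % rho(G_{omega*}) = omega* - 1
    rhos = [rhoJ rhoGS rhoSOR];
    ks = ceil(-8*log(10)./log(rhos));           % smallest k with rho^k <= 1e-8
    fprintf('n=%5d  J:%8.6f GS:%8.6f SOR:%8.6f  k: %d %d %d\n', n, rhos, ks);
end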
12.3.4 Number of Iterations
Consider next the rate of convergence of the iteration x_{k+1} = Gx_k + c. We would like to know how fast the iterative method converges. Recall that x_k − x = G^k(x_0 − x). For k sufficiently large

‖x_k − x‖ ≤ ‖G^k‖ ‖x_0 − x‖ ≈ ρ(G)^k ‖x_0 − x‖.

For the last formula we apply Theorem 12.13, which says that lim_{k→∞} ‖G^k‖^{1/k} = ρ(G). For Jacobi's method and the spectral norm we have ‖G_J^k‖_2 = ρ(G_J)^k (cf. (12.26)).
For fast convergence we should use a G with small spectral radius.
Lemma 12.1 (Number of Iterations) Suppose ρ(G) = 1 − η for some 0 < η < 1
and let s ∈ N. Then
k̃ := s log(10)/η  (12.31)

is an estimate for the smallest number of iterations k so that ρ(G)^k ≤ 10^{-s}.
Proof The estimate k̃ is an approximate solution of the equation ρ(G)^k = 10^{-s}. Indeed, since −log(1 − η) ≈ η when η is small,

k = −s log(10)/log(1 − η) ≈ s log(10)/η = k̃.
The following estimates are obtained. They agree with those we found numerically in Sect. 12.1.1.
• R and J: ρ(G_J) = cos(πh) = 1 − η, η = 1 − cos(πh) = (1/2)π^2h^2 + O(h^4) = π^2/(2n) + O(n^{-2}). Thus,

  k̃_n = (2 log(10)s/π^2) n + O(n^{-1}) = O(n).

• GS: ρ(G_1) = cos^2(πh) = 1 − η, η = 1 − cos^2(πh) = sin^2(πh) = π^2 h^2 + O(h^4) = π^2/n + O(n^{-2}). Thus,

  k̃_n = (log(10)s/π^2) n + O(n^{-1}) = O(n).

• SOR: ρ(G_{ω*}) = (1 − sin(πh))/(1 + sin(πh)) = 1 − 2πh + O(h^2). Thus,

  k̃_n = (log(10)s/(2π)) √n + O(n^{-1/2}) = O(√n).
2π
We note that

1. The convergence depends on the behavior of the powers G^k as k increases. The matrix M should be chosen so that all elements in G^k converge quickly to zero and such that the linear system (12.9) is easy to solve for x_{k+1}. These are conflicting demands. M should be an approximation to A to obtain a G with small elements, but then (12.9) might not be easy to solve for x_{k+1}.
2. The convergence lim_{k→∞} ‖G^k‖^{1/k} = ρ(G) can be quite slow (cf. Exercise 12.15).
12.3.5 Stopping the Iteration
In Algorithms 12.1 and 12.2 we had access to the exact solution and could stop the
iteration when the error was sufficiently small in the infinity norm. The decision
when to stop is obviously more complicated when the exact solution is not known.
One possibility is to choose a vector norm, keep track of ‖x_{k+1} − x_k‖, and stop when this number is sufficiently small. The following result indicates that ‖x_k − x‖ can be quite large if ‖G‖ is close to one.
Lemma 12.2 (Be Careful When Stopping) If x_k = Gx_{k−1} + c, x = Gx + c and ‖G‖ < 1 then

‖x_k − x_{k−1}‖ ≥ ((1 − ‖G‖)/‖G‖) ‖x_k − x‖,  k ≥ 1.  (12.32)
Proof We find

‖x_k − x‖ = ‖G(x_{k−1} − x)‖ ≤ ‖G‖ ‖x_{k−1} − x‖
          = ‖G‖ ‖x_{k−1} − x_k + x_k − x‖ ≤ ‖G‖ (‖x_{k−1} − x_k‖ + ‖x_k − x‖).

Thus (1 − ‖G‖)‖x_k − x‖ ≤ ‖G‖ ‖x_{k−1} − x_k‖, which implies (12.32).
Another possibility is to stop when the residual vector r_k := b − Ax_k is sufficiently small in some norm. To use the residual vector for stopping it is convenient to write the iterative method (12.10) in an alternative form. If M is the splitting matrix of the method then by (12.9) we have Mx_{k+1} = Mx_k − Ax_k + b. This leads to

x_{k+1} = x_k + M^{-1} r_k,  r_k = b − Ax_k.  (12.33)
Testing on r k works fine if A is well conditioned, but Theorem 8.8 shows that
the relative error in the solution can be much larger than the relative error in r k if A
is ill-conditioned.
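As an illustration of (12.33) and the residual-based stopping test, here is a minimal MATLAB sketch of a generic splitting iteration. The splitting matrix M (for example M = D for Jacobi, or M = ω^{-1}D − A_L for SOR) is assumed to be supplied by the caller; the function name and argument list are our own choices, not a listing from the book.

function [x,k] = splitting_iteration(M, A, b, x, tol, itmax)
% Fixed point iteration x_{k+1} = x_k + M\r_k with r_k = b - A*x_k, cf. (12.33).
% Stops when norm(r_k)/norm(b) <= tol or after itmax iterations.
r = b - A*x;
nb = norm(b);
for k = 0:itmax
    if norm(r)/nb <= tol
        return
    end
    x = x + M\r;          % solve M*d = r_k and update
    r = b - A*x;          % new residual
end
end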
12.4 Powers of a Matrix
Let A ∈ Cn×n be a square matrix. In this section we consider the special matrix
sequence {Ak } of powers of A. We want to know when this sequence converges to
the zero matrix. Such a sequence occurs in iterative methods (cf. (12.16)), in Markov processes in statistics, in the convergence of geometric series of matrices (Neumann series, cf. Sect. 12.4.2) and in many other applications.
12.4.1 The Spectral Radius
In this section we show the following important theorem.
Theorem 12.10 (When Is lim_{k→∞} A^k = 0?) For any A ∈ C^{n×n} we have

lim_{k→∞} A^k = 0 ⟺ ρ(A) < 1,

where ρ(A) is the spectral radius of A given by (12.17).
Clearly ρ(A) < 1 is a necessary condition for lim_{k→∞} A^k = 0. For if (λ, x) is an eigenpair of A with |λ| ≥ 1 and ‖x‖_2 = 1, then A^k x = λ^k x, and this implies ‖A^k‖_2 ≥ ‖A^k x‖_2 = ‖λ^k x‖_2 = |λ|^k, so A^k does not tend to zero. The sufficiency is harder to show. We construct a consistent matrix norm on C^{n×n} such that ‖A‖ < 1 and then use Theorems 12.2 and 12.3.
We start with

Theorem 12.11 (Any Consistent Norm Majorizes the Spectral Radius) For any matrix norm ‖·‖ that is consistent on C^{n×n} and any A ∈ C^{n×n} we have ρ(A) ≤ ‖A‖.

Proof Let (λ, x) be an eigenpair for A, let ‖·‖ be a consistent matrix norm on C^{n×n}, and define X := [x, . . . , x] ∈ C^{n×n}. Then λX = AX, which implies |λ| ‖X‖ = ‖λX‖ = ‖AX‖ ≤ ‖A‖ ‖X‖. Since X ≠ 0 we obtain |λ| ≤ ‖A‖.
The next theorem shows that if ρ(A) < 1 then ‖A‖ < 1 for some consistent matrix norm on C^{n×n}, thus completing the proof of Theorem 12.10.

Theorem 12.12 (The Spectral Radius Can Be Approximated by a Norm) Let A ∈ C^{n×n} and ε > 0 be given. There is a consistent matrix norm ‖·‖ on C^{n×n} such that ρ(A) ≤ ‖A‖ ≤ ρ(A) + ε.
Proof Let A have eigenvalues λ_1, . . . , λ_n. By the Schur Triangulation Theorem 6.5 there is a unitary matrix U and an upper triangular matrix R = [r_{ij}] such that U^*AU = R. For t > 0 we define D_t := diag(t, t^2, . . . , t^n) ∈ R^{n×n}, and note that the (i, j) element in D_t R D_t^{-1} is given by t^{i−j} r_{ij} for all i, j. For n = 3

D_t R D_t^{-1} = [ λ_1  t^{-1}r_{12}  t^{-2}r_{13} ; 0  λ_2  t^{-1}r_{23} ; 0  0  λ_3 ].

For each B ∈ C^{n×n} and t > 0 we use the one norm to define the matrix norm ‖B‖_t := ‖D_t U^* B U D_t^{-1}‖_1. We leave it as an exercise to show that ‖·‖_t is a consistent matrix norm on C^{n×n}. We define ‖B‖ := ‖B‖_t, where t is chosen so large that the sum of the absolute values of all off-diagonal elements in D_t R D_t^{-1} is less than ε. Then

‖A‖ = ‖D_t U^* A U D_t^{-1}‖_1 = ‖D_t R D_t^{-1}‖_1 = max_{1≤j≤n} Σ_{i=1}^n |(D_t R D_t^{-1})_{ij}| ≤ max_{1≤j≤n} (|λ_j| + ε) = ρ(A) + ε.
A consistent matrix norm of a matrix can be much larger than the spectral radius.
However the following result holds.
Theorem 12.13 (Spectral Radius Convergence) For any consistent matrix norm ‖·‖ on C^{n×n} and any A ∈ C^{n×n} we have

lim_{k→∞} ‖A^k‖^{1/k} = ρ(A).  (12.34)
Proof Let ‖·‖ be a consistent matrix norm on C^{n×n}. If λ is an eigenvalue of A then λ^k is an eigenvalue of A^k for any k ∈ N. By Theorem 12.11 we then obtain ρ(A)^k = ρ(A^k) ≤ ‖A^k‖ for any k ∈ N, so that ρ(A) ≤ ‖A^k‖^{1/k}. Let ε > 0 and consider the matrix B := (ρ(A) + ε)^{-1}A. Then ρ(B) = ρ(A)/(ρ(A) + ε) < 1 and B^k → 0 by Theorem 12.10 as k → ∞. Choose N ∈ N such that ‖B^k‖ < 1 for all k ≥ N. Then for k ≥ N

‖A^k‖ = (ρ(A) + ε)^k ‖B^k‖ < (ρ(A) + ε)^k.

We have shown that ρ(A) ≤ ‖A^k‖^{1/k} ≤ ρ(A) + ε for k ≥ N. Since ε is arbitrary the result follows.
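Theorem 12.13 is easy to observe numerically. The following MATLAB fragment is a sketch with an arbitrarily chosen nonnormal test matrix; it prints ‖A^k‖_2^{1/k}, which approaches ρ(A) from above as k grows. See also Exercise 12.15 for how slow this convergence can be.

% Illustrate norm(A^k)^(1/k) -> rho(A), cf. (12.34).
A = [0.9 5; 0 0.8];          % nonnormal test matrix with rho(A) = 0.9
rhoA = max(abs(eig(A)));
for k = [1 5 10 50 100 200]
    fprintf('k = %3d   norm(A^k)^(1/k) = %8.6f   rho(A) = %4.2f\n', ...
            k, norm(A^k)^(1/k), rhoA);
end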
12.4.2 Neumann Series
Let B be a square matrix. In this section we consider the Neumann series Σ_{k=0}^∞ B^k, which is a matrix analogue of a geometric series of numbers.
Consider an infinite series Σ_{k=0}^∞ A_k of matrices in C^{n×n}. We say that the series converges if the sequence of partial sums {S_m} given by S_m = Σ_{k=0}^m A_k converges. The series converges if and only if {S_m} is a Cauchy sequence, i.e., to each ε > 0 there exists an integer N so that ‖S_l − S_m‖ < ε for all l > m ≥ N.
Theorem 12.14 (Neumann Series) Suppose B ∈ C^{n×n}. Then

1. The series Σ_{k=0}^∞ B^k converges if and only if ρ(B) < 1.
2. If ρ(B) < 1 then (I − B) is nonsingular and (I − B)^{-1} = Σ_{k=0}^∞ B^k.
3. If ‖B‖ < 1 for some consistent matrix norm ‖·‖ on C^{n×n} then

   ‖(I − B)^{-1}‖ ≤ 1/(1 − ‖B‖).  (12.35)
Proof

1. Suppose ρ(B) < 1. We show that S_m := Σ_{k=0}^m B^k is a Cauchy sequence and hence convergent. Let ε > 0. By Theorem 12.12 there is a consistent matrix norm ‖·‖ on C^{n×n} such that ‖B‖ < 1. Then for l > m

   ‖S_l − S_m‖ = ‖Σ_{k=m+1}^l B^k‖ ≤ Σ_{k=m+1}^l ‖B‖^k ≤ ‖B‖^{m+1} Σ_{k=0}^∞ ‖B‖^k = ‖B‖^{m+1}/(1 − ‖B‖).

   But then {S_m} is a Cauchy sequence provided N is such that ‖B‖^{N+1}/(1 − ‖B‖) < ε.
   Conversely, suppose (λ, x) is an eigenpair for B with |λ| ≥ 1. We find S_m x = Σ_{k=0}^m B^k x = (Σ_{k=0}^m λ^k) x. Since λ^k does not tend to zero the series Σ_{k=0}^∞ λ^k is not convergent, and therefore {S_m x}, and hence {S_m}, does not converge.
2. We have

   Σ_{k=0}^m B^k (I − B) = I + B + · · · + B^m − (B + · · · + B^{m+1}) = I − B^{m+1}.  (12.36)

   Since ρ(B) < 1 we conclude that B^{m+1} → 0, and hence taking limits in (12.36) we obtain Σ_{k=0}^∞ B^k (I − B) = I, which completes the proof of 2.
3. By 2: ‖(I − B)^{-1}‖ = ‖Σ_{k=0}^∞ B^k‖ ≤ Σ_{k=0}^∞ ‖B‖^k = 1/(1 − ‖B‖).
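Part 2 of Theorem 12.14 can be checked numerically by comparing partial sums of the Neumann series with (I − B)^{-1}. The MATLAB lines below are a minimal sketch; the matrix B is an arbitrary choice with ρ(B) < 1.

% Partial sums of the Neumann series approximating inv(I - B), cf. Theorem 12.14.
B = [0.2 0.5; 0.1 0.3];              % rho(B) < 1
S = eye(2); P = eye(2);              % S = sum of B^0..B^m, P = current power B^m
X = inv(eye(2) - B);
for m = 1:30
    P = P*B;
    S = S + P;
end
fprintf('error after 30 terms: %.2e\n', norm(S - X));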
12.5 The Optimal SOR Parameter ω
The following analysis is only carried out for the discrete Poisson matrix. It also holds for the averaging matrix given by (10.10). A more general theory is presented in [22]. We will compare the eigenpair equations for G_J and G_ω. It is convenient to write these equations using the matrix formulation TV + VT = h^2 F. If G_J v = μv is an eigenpair equation for G_J then

(1/4)(v_{i−1,j} + v_{i,j−1} + v_{i+1,j} + v_{i,j+1}) = μ v_{i,j},  i, j = 1, . . . , m,  (12.37)

where v := vec(V) ∈ R^{m^2} and v_{i,j} = 0 if i ∈ {0, m + 1} or j ∈ {0, m + 1}.
Suppose (λ, w) is an eigenpair for G_ω. By (12.21), (I − ωL)^{-1}(ωR + (1 − ω)I)w = λw or

(ωR + λωL)w = (λ + ω − 1)w.  (12.38)
Let w = vec(W), where W ∈ C^{m×m}. Then (12.38) can be written

(ω/4)(λ w_{i−1,j} + λ w_{i,j−1} + w_{i+1,j} + w_{i,j+1}) = (λ + ω − 1) w_{i,j},  (12.39)

where w_{i,j} = 0 if i ∈ {0, m + 1} or j ∈ {0, m + 1}.
Theorem 12.15 (The Optimal ω) Consider the SOR method applied to the discrete Poisson matrix (10.10), where we use the natural ordering. Moreover, assume ω ∈ (0, 2).

1. If λ ≠ 0 is an eigenvalue of G_ω then

   μ := (λ + ω − 1)/(ω λ^{1/2})  (12.40)

   is an eigenvalue of G_J.
2. If μ is an eigenvalue of G_J and λ satisfies the equation

   μ ω λ^{1/2} = λ + ω − 1  (12.41)

   then λ is an eigenvalue of G_ω.
Proof Suppose (λ, w) is an eigenpair for G_ω. We claim that (μ, v) is an eigenpair for G_J, where μ is given by (12.40) and v = vec(V) with v_{i,j} := λ^{−(i+j)/2} w_{i,j}. Indeed, replacing w_{i,j} by λ^{(i+j)/2} v_{i,j} in (12.39) and cancelling the common factor λ^{(i+j)/2} we obtain

(ω/4)(v_{i−1,j} + v_{i,j−1} + v_{i+1,j} + v_{i,j+1}) = λ^{−1/2}(λ + ω − 1) v_{i,j}.

But then

G_J v = (L + R)v = ((λ + ω − 1)/(ω λ^{1/2})) v = μv.

For the converse let (μ, v) be an eigenpair for G_J and let λ be a solution of (12.41). We define as before v =: vec(V), w := vec(W) with w_{i,j} := λ^{(i+j)/2} v_{i,j}. Inserting this in (12.37) and cancelling λ^{−(i+j)/2} we obtain

(1/4)(λ^{1/2} w_{i−1,j} + λ^{1/2} w_{i,j−1} + λ^{−1/2} w_{i+1,j} + λ^{−1/2} w_{i,j+1}) = μ w_{i,j}.

Multiplying by ω λ^{1/2} we obtain

(ω/4)(λ w_{i−1,j} + λ w_{i,j−1} + w_{i+1,j} + w_{i,j+1}) = ω μ λ^{1/2} w_{i,j}.

Thus, if ω μ λ^{1/2} = λ + ω − 1 then by (12.39) (λ, w) is an eigenpair for G_ω.
Proof of Theorem 12.8 Combining statements 1 and 2 in Theorem 12.15 we see that ρ(G_ω) = |λ(μ)|, where λ(μ) is an eigenvalue of G_ω satisfying (12.41) for some eigenvalue μ of G_J. The eigenvalues of G_J are (1/2)cos(jπh) + (1/2)cos(kπh), j, k = 1, . . . , m, so μ is real and both μ and −μ are eigenvalues. Thus, to compute ρ(G_ω) it is enough to consider (12.41) for a positive eigenvalue μ of G_J. Solving (12.41) for λ = λ(μ) gives

λ(μ) := (1/4)[ωμ ± √((ωμ)^2 − 4(ω − 1))]^2.  (12.42)
Both roots λ(μ) are eigenvalues of G_ω. The discriminant

d(ω) := (ωμ)^2 − 4(ω − 1)

is strictly decreasing on (0, 2) since

d′(ω) = 2(ωμ^2 − 2) < 2(ω − 2) < 0.
Moreover d(0) = 4 > 0 and d(2) = 4μ^2 − 4 < 0. As a function of ω, λ(μ) changes from real to complex when d(ω) = 0. The root in (0, 2) is

ω = ω̃(μ) := 2(1 − √(1 − μ^2))/μ^2 = 2/(1 + √(1 − μ^2)).  (12.43)
In the complex case we find

|λ(μ)| = (1/4)[(ωμ)^2 + 4(ω − 1) − (ωμ)^2] = ω − 1,  ω̃(μ) < ω < 2.
In the real case both roots of (12.42) are positive and the larger one is

λ(μ) = (1/4)[ωμ + √((ωμ)^2 − 4(ω − 1))]^2,  0 < ω ≤ ω̃(μ).  (12.44)

Both λ(μ) and ω̃(μ) are strictly increasing as functions of μ. It follows that |λ(μ)| is maximized for μ = ρ(G_J) =: β, and for this value of μ we obtain (12.27) for 0 < ω ≤ ω̃(β) = ω*.
Evidently ρ(G_ω) = ω − 1 is strictly increasing in ω* < ω < 2. Equation (12.29) will follow if we can show that ρ(G_ω) is strictly decreasing in 0 < ω < ω*. By differentiation

d/dω [ωβ + √((ωβ)^2 − 4(ω − 1))] = [β√((ωβ)^2 − 4(ω − 1)) + ωβ^2 − 2] / √((ωβ)^2 − 4(ω − 1)).

Since β^2(ω^2β^2 − 4ω + 4) < (2 − ωβ^2)^2 the numerator is negative, and the strict decrease of ρ(G_ω) in 0 < ω < ω* follows.
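The formula (12.27) can be compared with the spectral radius of the SOR iteration matrix computed directly. The MATLAB sketch below assembles the m^2 × m^2 discrete Poisson matrix explicitly with kron (only feasible for small m); the grid size and sampled values of ω are our own choices.

% Compare formula (12.27) with the computed rho(G_omega) for a small Poisson matrix.
m = 10; h = 1/(m+1); beta = cos(pi*h);
T = diag(2*ones(m,1)) - diag(ones(m-1,1),1) - diag(ones(m-1,1),-1);
A = kron(T, eye(m)) + kron(eye(m), T);        % discrete Poisson matrix, n = m^2
D = diag(diag(A)); AL = -tril(A,-1); AR = -triu(A,1);
omegastar = 2/(1 + sqrt(1 - beta^2));
for omega = [0.8 1.0 1.4 omegastar 1.95]
    G = (D/omega - AL)\(AR + (1/omega - 1)*D);   % SOR iteration matrix G_omega
    if omega <= omegastar
        rho_formula = (omega*beta + sqrt(max((omega*beta)^2 - 4*(omega-1),0)))^2/4;
    else
        rho_formula = omega - 1;
    end
    fprintf('omega=%6.4f  computed=%8.6f  formula=%8.6f\n', ...
            omega, max(abs(eig(G))), rho_formula);
end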
12.6 Exercises Chap. 12
12.6.1 Exercises Sect. 12.3
Exercise 12.1 (Richardson and Jacobi) Show that if a_{ii} = d ≠ 0 for all i then Richardson's method with α := 1/d is the same as Jacobi's method.
Exercise 12.2 (R-Method When Eigenvalues Have Positive Real Part) Suppose
all eigenvalues λj of A have positive real parts uj for j = 1, . . . , n and that α is
real. Show that the R method converges if and only if 0 < α < minj (2uj /|λj |2 ).
Exercise 12.3 (Divergence Example for J and GS) Show that both Jacobi's method and Gauss-Seidel's method diverge for A = [1 2; 3 4].
Exercise 12.4 (2 by 2 Matrix) We want to show that Gauss-Seidel converges if and only if Jacobi converges for a 2 by 2 matrix A := [a_{11} a_{12}; a_{21} a_{22}] ∈ R^{2×2}.

a) Show that the spectral radius for the Jacobi method is ρ(G_J) = √(|a_{21}a_{12}/(a_{11}a_{22})|).
b) Show that the spectral radius for the Gauss-Seidel method is ρ(G_1) = |a_{21}a_{12}/(a_{11}a_{22})|.
c) Conclude that Gauss-Seidel converges if and only if Jacobi converges.
Exercise 12.5 (Example: GS Converges, J Diverges) Show (by finding its eigenvalues) that the matrix

A := [1 a a; a 1 a; a a 1]

is positive definite for −1/2 < a < 1. Thus, GS converges for these values of a. Show that the J method does not converge for 1/2 < a < 1.
Exercise 12.6 (Example: GS Diverges, J Converges) Let G_J and G_1 be the iteration matrices for the Jacobi and Gauss-Seidel methods applied to the matrix A := [1 0 1/2; 1 1 0; −1 1 1].^1

a) Show that G_1 = [0 0 −1/2; 0 0 1/2; 0 0 −1] and conclude that GS diverges.
b) Show that p(λ) := det(λI − G_J) = λ^3 + (1/2)λ + (1/2).
c) Show that if |λ| ≥ 1 then p(λ) ≠ 0. Conclude that J converges.
Exercise 12.7 (Strict Diagonal Dominance; The J Method) Show that the J method converges if |a_{ii}| > Σ_{j≠i} |a_{ij}| for i = 1, . . . , n.
1 Stewart Venit, “The convergence of Jacobi and Gauss-Seidel iteration”, Mathematics Magazine
48 (1975), 163–167.
Exercise 12.8 (Strict Diagonal Dominance; The GS Method) Consider the GS method. Suppose r := max_i r_i < 1, where r_i = Σ_{j≠i} |a_{ij}|/|a_{ii}|. Show using induction on i that |ε_{k+1}(j)| ≤ r ‖ε_k‖_∞ for j = 1, . . . , i, where ε_k := x_k − x denotes the error. Conclude that Gauss-Seidel's method is convergent when A is strictly diagonally dominant.
Exercise 12.9 (Convergence Example for Fixed Point Iteration) Consider for a ∈ C

x := [x_1; x_2] = [0 a; a 0][x_1; x_2] + [1−a; 1−a] =: Gx + c.

Starting with x_0 = 0, show by induction

x_k(1) = x_k(2) = 1 − a^k,  k ≥ 0,

and conclude that the iteration converges to the fixed-point x = [1, 1]^T for |a| < 1 and diverges for |a| > 1. Show that ρ(G) = 1 − η with η = 1 − |a|. Compute the estimate (12.31) for the rate of convergence for a = 0.9 and s = 16 and compare with the true number of iterations determined from |a|^k ≤ 10^{-16}.
Exercise 12.10 (Estimate in Lemma 12.1 Can Be Exact) Consider the iteration
in Example 12.2. Show that ρ(GJ ) = 1/2. Then show that x k (1) = x k (2) = 1−2−k
for k ≥ 0. Thus the estimate in Lemma 12.1 is exact in this case.
Exercise 12.11 (Iterative Method (Exam Exercise 1991-3)) Let A ∈ R^{n×n} be a symmetric positive definite matrix with ones on the diagonal and let b ∈ R^n. We will consider an iterative method for the solution of Ax = b. Observe that A may be written A = I − L − L^T, where L is lower triangular with zeros on the diagonal, l_{i,j} = 0 when j ≥ i. The method is defined by

Mx_{k+1} = Nx_k + b,  (12.45)

where M and N are given by the splitting

A = M − N,  M = (I − L)(I − L^T),  N = LL^T.  (12.46)

a) Let x ≠ 0 be an eigenvector of M^{-1}N with eigenvalue λ. Show that

   λ = x^T N x / (x^T A x + x^T N x).  (12.47)

b) Show that the sequence {x_k} generated by (12.45) converges to the solution x of Ax = b for any starting vector x_0.
c) Consider the following algorithm:

   1. Choose x = [x(1), x(2), . . . , x(n)]^T.
   2. for k = 1, 2, 3, . . .
        for i = 1, 2, . . . , n − 1, n, n, n − 1, n − 2, . . . , 1
          x(i) = b(i) − Σ_{j≠i} a(i, j)x(j)   (12.48)

   Is there a connection between this algorithm and the method of Gauss-Seidel? Show that the algorithm (12.48) leads up to the splitting (12.46).
Exercise 12.12 (Gauss-Seidel Method (Exam Exercise 2008-1)) Consider the linear system Ax = b in which

A := [3 0 1; 0 7 2; 1 2 4]

and b := [1, 9, −2]^T.

a) With x_0 = [1, 1, 1]^T, carry out one iteration of the Gauss-Seidel method to find x_1 ∈ R^3.
b) If we continue the iteration, will the method converge? Why?
c) Write a MATLAB program for the Gauss-Seidel method applied to a matrix A ∈ R^{n×n} and right-hand side b ∈ R^n. Use the ratio of the current residual to the initial residual as the stopping criterion, as well as a maximum number of iterations.^2
12.6.2 Exercises Sect. 12.4
Exercise 12.13 (A Special Norm) Show that ‖B‖_t := ‖D_t U^* B U D_t^{-1}‖_1, defined in the proof of Theorem 12.12, is a consistent matrix norm on C^{n×n}.
Exercise 12.14 (Is A + E Nonsingular?) Suppose A ∈ Cn×n is nonsingular and
E ∈ Cn×n . Show that A + E is nonsingular if ρ(A−1 E) < 1.
2 Hint:
The function C=tril(A) extracts the lower part of A into a lower triangular matrix C.
Exercise 12.15 (Slow Spectral Radius Convergence) The convergence lim_{k→∞} ‖A^k‖^{1/k} = ρ(A) can be quite slow. Consider the bidiagonal matrix A ∈ R^{n×n} with λ on the diagonal and a on the superdiagonal,

A := [λ a 0 · · · 0 0; 0 λ a · · · 0 0; 0 0 λ · · · 0 0; . . . ; 0 0 0 · · · λ a; 0 0 0 · · · 0 λ].

If |λ| = ρ(A) < 1 then lim_{k→∞} A^k = 0 for any a ∈ R. We show below that the (1, n) element of A^k is given by f(k) := C(k, n−1) a^{n−1} λ^{k−n+1} for k ≥ n − 1.

a) Pick an n, e.g. n = 5, and make a plot of f(k) for λ = 0.9, a = 10, and n − 1 ≤ k ≤ 200. Your program should also compute max_k f(k). Use your program to determine how large k must be before f(k) < 10^{-8}.
b) We can determine the elements of A^k explicitly for any k. Let E := (A − λI)/a. Show by induction that E^k = [0 I_{n−k}; 0 0] for 1 ≤ k ≤ n − 1 and that E^n = 0.
c) We have A^k = (aE + λI)^k = Σ_{j=0}^{min{k,n−1}} C(k, j) a^j λ^{k−j} E^j and conclude that the (1, n) element is given by f(k) for k ≥ n − 1.
12.7 Review Questions
12.7.1 Consider a matrix A ∈ C^{n×n} with nonzero diagonal elements.

• Define the J and GS method in component form.
• Do they always converge?
• Give a necessary and sufficient condition that A^n → 0.
• Is there a matrix norm consistent on C^{n×n} such that ‖A‖ < ρ(A)?
12.7.2 What is a Neumann series? When does it converge?
12.7.3 How do we define convergence of a fixed point iteration x_{k+1} = Gx_k + c? When does it converge?
12.7.4 Define Richardson’s method.
Chapter 13
The Conjugate Gradient Method
The conjugate gradient method was published by Hestenes and Stiefel in 1952, [6]
as a direct method for solving linear systems. Today its main use is as an iterative
method for solving large sparse linear systems. On a test problem we show that
it performs as well as the SOR method with optimal acceleration parameter, and
we do not have to estimate any such parameter. However the conjugate gradient
method is restricted to positive definite systems. We also consider a mathematical
formulation of the preconditioned conjugate gradient method. It is used to speed
up convergence of the conjugate gradient method. We only give one example of a
possible preconditioner. See [1] for a more complete treatment of iterative methods
and preconditioning.
The conjugate gradient method can also be used for minimization and is
related to a method known as steepest descent. This method and the conjugate
gradient method are both minimization methods, and iterative methods, for solving
equations.
Throughout this chapter A ∈ R^{n×n} will be a symmetric and positive definite matrix. We recall that A has positive eigenvalues and that the spectral (2-norm) condition number of A is given by κ := λ_max/λ_min, where λ_max and λ_min are the largest and smallest eigenvalue of A.
The analysis of the methods in this chapter is in terms of two inner products on R^n, the usual inner product ⟨x, y⟩ = x^T y with the associated Euclidian norm ‖x‖_2 = √(x^T x), and the A-inner product and the corresponding A-norm given by

⟨x, y⟩_A := x^T Ay,  ‖y‖_A := √(y^T Ay),  x, y ∈ R^n.  (13.1)
We note that the A-inner product is an inner product on R^n. Indeed, for any x, y, z ∈ R^n

1. ⟨x, x⟩_A = x^T Ax ≥ 0 and ⟨x, x⟩_A = 0 if and only if x = 0, since A is positive definite,
2. ⟨x, y⟩_A := x^T Ay = (x^T Ay)^T = y^T A^T x = y^T Ax = ⟨y, x⟩_A by symmetry of A,
3. ⟨x + y, z⟩_A := x^T Az + y^T Az = ⟨x, z⟩_A + ⟨y, z⟩_A, true for any A.

By Theorem 5.2 the A-norm is a vector norm on R^n since it is an inner product norm, and the Cauchy-Schwarz inequality holds

|x^T Ay|^2 ≤ (x^T Ax)(y^T Ay),  x, y ∈ R^n.  (13.2)
13.1 Quadratic Minimization and Steepest Descent
We start by discussing some aspects of quadratic minimization and its relation to solving linear systems.
Consider for a positive definite A ∈ R^{n×n}, b ∈ R^n and c ∈ R the quadratic function Q : R^n → R given by

Q(y) := (1/2) y^T Ay − b^T y + c.  (13.3)
As an example, some level curves of

Q(x, y) := (1/2) [x, y] [2 −1; −1 2] [x; y] = x^2 − xy + y^2  (13.4)

are shown in Fig. 13.1. The level curves are ellipses and the graph of Q is a paraboloid (cf. Exercise 13.2).
The following expansion will be used repeatedly. For y, h ∈ R^n and ε ∈ R

Q(y + εh) = Q(y) − ε h^T r(y) + (1/2) ε^2 h^T Ah,  where r(y) := b − Ay.  (13.5)
Minimizing a quadratic function is equivalent to solving a linear system.

Lemma 13.1 (Quadratic Function) A vector x ∈ R^n minimizes Q given by (13.3) if and only if Ax = b. Moreover, the residual r(y) := b − Ay for any y ∈ R^n is equal to the negative gradient, i.e., r(y) = −∇Q(y), where ∇ := [∂/∂y_1, . . . , ∂/∂y_n]^T.
Fig. 13.1 Level curves for Q(x, y) given by (13.4). Also shown is a steepest descent iteration (left) and a conjugate gradient iteration (right) to find the minimum of Q (cf. Examples 13.1, 13.2)

Proof If y = x, ε = 1, and Ax = b, then (13.5) simplifies to Q(x + h) = Q(x) + (1/2) h^T Ah, and since A is positive definite Q(x + h) > Q(x) for all nonzero h ∈ R^n. It follows that x is the unique minimum of Q. Conversely, if Ax ≠ b and
h := r(x), then by (13.5), Q(x + εh) − Q(x) = −ε(h^T r(x) − (1/2) ε h^T Ah) < 0 for ε > 0 sufficiently small. Thus x does not minimize Q. By (13.5), for y ∈ R^n,

∂Q(y)/∂y_i := lim_{ε→0} (1/ε)(Q(y + ε e_i) − Q(y)) = lim_{ε→0} (1/ε)(−ε e_i^T r(y) + (1/2) ε^2 e_i^T A e_i) = −e_i^T r(y),  i = 1, . . . , n,

showing that r(y) = −∇Q(y).
A general class of minimization algorithms for Q and solution algorithms for a linear system is given as follows:

1. Choose x_0 ∈ R^n.
2. For k = 0, 1, 2, . . .
   Choose a "search direction" p_k,
   Choose a "step length" α_k,
   Compute x_{k+1} = x_k + α_k p_k.  (13.6)
We would like to generate a sequence {x k } that converges quickly to the minimum
x of Q.
For a fixed direction p_k we say that α_k is optimal if Q(x_{k+1}) is as small as possible, i.e.

Q(x_{k+1}) = Q(x_k + α_k p_k) = min_{α∈R} Q(x_k + α p_k).

By (13.5) we have Q(x_k + α p_k) = Q(x_k) − α p_k^T r_k + (1/2) α^2 p_k^T A p_k, where r_k := b − A x_k. Since p_k^T A p_k ≥ 0 we find a minimum α_k by solving ∂/∂α Q(x_k + α p_k) = 0. It follows that the optimal α_k is uniquely given by

α_k := p_k^T r_k / (p_k^T A p_k).  (13.7)
In the method of steepest descent, also known as the gradient method, we choose p_k = r_k, the negative gradient, and the optimal α_k. Starting from x_0 we compute for k = 0, 1, 2, . . .

x_{k+1} = x_k + (r_k^T r_k / r_k^T A r_k) r_k.  (13.8)
This is similar to Richardson's method (12.18), but in that method we used a constant step length. Computationally, a step in the steepest descent iteration can be organized as follows:

p_k = r_k,  t_k = A p_k,  α_k = (p_k^T r_k)/(p_k^T t_k),
x_{k+1} = x_k + α_k p_k,  r_{k+1} = r_k − α_k t_k.  (13.9)
Here, and in general, the following update of the residual is used:

r_{k+1} = b − A x_{k+1} = b − A(x_k + α_k p_k) = r_k − α_k A p_k.  (13.10)

In the steepest descent method the choice p_k = r_k implies that the last two gradients are orthogonal. Indeed, by (13.10), r_{k+1}^T r_k = (r_k − α_k A r_k)^T r_k = 0 since α_k = r_k^T r_k / (r_k^T A r_k) and A is symmetric.
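The steps (13.9) translate directly into code. The following MATLAB function is a minimal sketch of steepest descent written in the style of the cg algorithm given later in this chapter; the stopping test on ‖r_k‖_2/‖b‖_2 and the argument list are our own choices.

function [x,K] = steepestdescent(A, b, x, tol, itmax)
% Steepest descent iteration (13.9) for a symmetric positive definite A.
r = b - A*x; rho0 = b'*b; rho = r'*r;
for k = 0:itmax
    if sqrt(rho/rho0) <= tol
        K = k; return
    end
    p = r; t = A*p;
    a = rho/(p'*t);        % optimal step length (13.7) with p = r
    x = x + a*p; r = r - a*t;
    rho = r'*r;
end
K = itmax + 1;
end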
Example 13.1 (Steepest Descent Iteration) Suppose Q(x, y) is given by (13.4). Starting with x_0 = [−1, −1/2]^T and r_0 = −A x_0 = [3/2, 0]^T we find

t_0 = 3[1, −1/2]^T,  α_0 = 1/2,  x_1 = −4^{-1}[1, 2]^T,  r_1 = 3·4^{-1}[0, 1]^T,
t_1 = 3·4^{-1}[−1, 2]^T,  α_1 = 1/2,  x_2 = −4^{-1}[1, 1/2]^T,  r_2 = 3·4^{-1}[1/2, 0]^T,

and in general for k ≥ 1

t_{2k−2} = 3·4^{1−k}[1, −1/2]^T,  x_{2k−1} = −4^{-k}[1, 2]^T,  r_{2k−1} = 3·4^{-k}[0, 1]^T,
t_{2k−1} = 3·4^{-k}[−1, 2]^T,  x_{2k} = −4^{-k}[1, 1/2]^T,  r_{2k} = 3·4^{-k}[1/2, 0]^T.

Since α_k = 1/2 is constant for all k, the methods of Richardson, Jacobi and steepest descent are the same on this simple problem. See the left part of Fig. 13.1. The rate of convergence is determined from ‖x_{j+1}‖_2/‖x_j‖_2 = ‖r_{j+1}‖_2/‖r_j‖_2 = 1/2 for all j.
13.2 The Conjugate Gradient Method
In the steepest descent method the last two gradients are orthogonal. In the conjugate gradient method all gradients are orthogonal.^1 We achieve this by using A-orthogonal search directions, i.e., p_i^T A p_j = 0 for all i ≠ j.
13.2.1 Derivation of the Method
As in the steepest descent method we choose a starting vector x 0 ∈ Rn . If r 0 =
b −Ax 0 = 0 then x 0 is the exact solution and we are finished, otherwise we initially
make a steepest descent step. It follows that r T1 r 0 = 0 and p0 := r 0 .
For the general case we define for j ≥ 0

p_j := r_j − Σ_{i=0}^{j−1} (r_j^T A p_i / p_i^T A p_i) p_i,  (13.11)
x_{j+1} := x_j + α_j p_j,  α_j := r_j^T r_j / (p_j^T A p_j),  (13.12)
r_{j+1} = r_j − α_j A p_j.  (13.13)
We note that
1. p j is computed by the Gram-Schmidt orthogonalization process applied to the
residuals r 0 , . . . , r j using the A-inner product. The search directions are therefore A-orthogonal and nonzero as long as the residuals are linearly independent.
2. Equation (13.13) follows from (13.10).
3. It can be shown that the step length αj is optimal for all j (cf. Exercise 13.7).
^1 It is this property that has given the method its name.
Lemma 13.2 (The Residuals Are Orthogonal) Suppose for some k ≥ 0 that x_j is well defined, r_j ≠ 0, and r_i^T r_j = 0 for i, j = 0, 1, . . . , k, i ≠ j. Then x_{k+1} is well defined and r_{k+1}^T r_j = 0 for j = 0, 1, . . . , k.

Proof Since the residuals r_j are orthogonal and nonzero for j ≤ k, they are linearly independent, and it follows from the Gram-Schmidt Theorem 5.4 that p_k is nonzero and p_k^T A p_i = 0 for i < k. But then x_{k+1} and r_{k+1} are well defined. Now, by (13.13) and (13.11),

r_{k+1}^T r_j = (r_k − α_k A p_k)^T r_j = r_k^T r_j − α_k p_k^T A (p_j + Σ_{i=0}^{j−1} (r_j^T A p_i / p_i^T A p_i) p_i) = r_k^T r_j − α_k p_k^T A p_j = 0,  j = 0, 1, . . . , k,

where we used p_k^T A p_i = 0 for i < j in the third equality. That the final expression is equal to zero follows by orthogonality and A-orthogonality for j < k, and by the definition of α_k for j = k. This completes the proof.
The conjugate gradient method is also a direct method. The residuals are
orthogonal and therefore linearly independent if they are nonzero. Since dim Rn = n
the n + 1 residuals r 0 , . . . , r n cannot all be nonzero and we must have r k = 0 for
some k ≤ n. Thus we find the exact solution in at most n iterations.
The expression (13.11) for p_k can be greatly simplified. All terms except the last one vanish, since by orthogonality of the residuals and (13.13)

r_j^T A p_i = r_j^T (r_i − r_{i+1})/α_i = 0,  i = 0, 1, . . . , j − 2.
With j = k + 1, (13.11) therefore takes the simple form p_{k+1} = r_{k+1} + β_k p_k and we find, using (13.13) and (13.12),

β_k := −r_{k+1}^T A p_k / (p_k^T A p_k) = r_{k+1}^T (r_{k+1} − r_k) / (α_k p_k^T A p_k) = r_{k+1}^T r_{k+1} / (r_k^T r_k).  (13.14)
To summarize, in the conjugate gradient method we start with x_0, p_0 = r_0 = b − A x_0 and then generate a sequence of vectors {x_k} as follows. For k = 0, 1, 2, . . .

x_{k+1} := x_k + α_k p_k,  α_k := r_k^T r_k / (p_k^T A p_k),  (13.15)
r_{k+1} := r_k − α_k A p_k,  (13.16)
p_{k+1} := r_{k+1} + β_k p_k,  β_k := r_{k+1}^T r_{k+1} / (r_k^T r_k).  (13.17)
(13.17)
The residuals and search directions are orthogonal and A-orthogonal, respectively.
For computation we organize the iterations as follows for k = 0, 1, 2, . . .
t k = Ap k ,
αk = (r Tk r k )/(p Tk t k ),
x k+1 = x k + αk pk ,
r k+1 = r k − αk t k ,
(13.18)
βk = (r Tk+1 r k+1 )/(r Tk r k ),
p k+1 := r k+1 + βk pk .
Note that (13.18) differs from (13.9) only in the computation of the search direction.
Example 13.2 (Conjugate Gradient Iteration) Consider (13.18) applied to the positive definite linear system

[2 −1; −1 2][x_1; x_2] = [0; 0].

Starting as in Example 13.1 with x_0 = [−1, −1/2]^T we find p_0 = r_0 = [3/2, 0]^T and then

t_0 = [3, −3/2]^T,  α_0 = 1/2,  x_1 = [−1/4, −1/2]^T,  r_1 = [0, 3/4]^T,  β_0 = 1/4,
p_1 = [3/8, 3/4]^T,  t_1 = [0, 9/8]^T,  α_1 = 2/3,  x_2 = 0,  r_2 = 0.

Thus x_2 is the exact solution, as illustrated in the right part of Fig. 13.1.
13.2.2 The Conjugate Gradient Algorithm
In this section we give numerical examples and discuss implementation.
The formulas in (13.18) form a basis for the following algorithm, which solves the positive definite linear system Ax = b by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when ‖r_k‖_2/‖b‖_2 ≤ tol or k > itmax. K is the number of iterations used:
function [x,K]=cg(A,b,x,tol,itmax)
% [x,K]=cg(A,b,x,tol,itmax)
r=b-A*x; p=r; rho0=b'*b; rho=r'*r;
for k=0:itmax
  if sqrt(rho/rho0)<= tol
    K=k; return
  end
  t=A*p; a=rho/(p'*t);
  x=x+a*p; r=r-a*t;
  rhos=rho; rho=r'*r;
  p=r+(rho/rhos)*p;
end
K=itmax+1;
end
Listing 13.1 cg
The work involved in each iteration is

1. one matrix times vector (t = Ap),
2. two inner products (p^T t and r^T r),
3. three vector-plus-scalar-times-vector operations (x = x + ap, r = r − at and p = r + (rho/rhos)p).

The dominating part is the computation of t = Ap.
13.2.3 Numerical Example
We test the conjugate gradient method on two examples. For a similar test for the steepest descent method see Exercise 13.9. Consider the matrix given by the Kronecker sum T_2 := T_1 ⊗ I + I ⊗ T_1, where T_1 := tridiag_m(a, d, a) ∈ R^{m×m} and a, d ∈ R. We recall that this matrix is positive definite if d > 0 and d ≥ 2|a| (cf. Theorem 10.2). We set h = 1/(m + 1) and f = [1, . . . , 1]^T ∈ R^n.
We consider two problems.

1. a = 1/9, d = 5/18, the averaging matrix.
2. a = −1, d = 2, the Poisson matrix.
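For moderate m the test matrix can be assembled explicitly and passed to Algorithm 13.1. The MATLAB lines below are a minimal sketch; T_1 is built with spdiags rather than the tridiagonal helper referenced in Listing 13.2, and the grid size, tolerance and iteration limit are our own choices.

% Assemble T2 = T1 kron I + I kron T1 explicitly and run cg (feasible only for moderate m).
m = 50; h = 1/(m+1); n = m^2;
a = -1; d = 2;                               % Poisson case; use a=1/9, d=5/18 for averaging
T1 = spdiags([a*ones(m,1) d*ones(m,1) a*ones(m,1)], -1:1, m, m);
T2 = kron(T1, speye(m)) + kron(speye(m), T1);
b = h^2*ones(n,1);                           % b = h^2*f with f = [1,...,1]^T
[x,K] = cg(T2, b, zeros(n,1), 1e-8, 1000);
fprintf('m = %d, iterations K = %d\n', m, K);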
13.2.4 Implementation Issues
Note that for our test problems T_2 only has O(5n) nonzero elements. Therefore, taking advantage of the sparseness of T_2 we can compute t in Algorithm 13.1 in O(n) arithmetic operations. With such an implementation the total number of arithmetic operations in one iteration is O(n). We also note that it is not necessary to store the matrix T_2.
To use the conjugate gradient algorithm on the test matrix for large n it is advantageous to use a matrix equation formulation. We define matrices V, R, P, B, T ∈ R^{m×m} by x = vec(V), r = vec(R), p = vec(P), t = vec(T), and h^2 f = vec(B). Then T_2 x = h^2 f ⟺ T_1 V + V T_1 = B, and t = T_2 p ⟺ T = T_1 P + P T_1. This leads to the following algorithm for testing the conjugate gradient algorithm on the matrix

A = tridiag_m(a, d, a) ⊗ I_m + I_m ⊗ tridiag_m(a, d, a) ∈ R^{(m^2)×(m^2)}.
function [V,K]=cgtest(m,a,d,tol,itmax)
% [V,K]=cgtest(m,a,d,tol,itmax)
R=ones(m)/(m+1)^2; rho=sum(sum(R.*R)); rho0=rho; P=R;
V=zeros(m,m); T1=sparse(tridiagonal(a,d,a,m));
for k=1:itmax
  if sqrt(rho/rho0)<= tol
    K=k; return
  end
  T=T1*P+P*T1;
  a=rho/sum(sum(P.*T)); V=V+a*P; R=R-a*T;
  rhos=rho; rho=sum(sum(R.*R)); P=R+(rho/rhos)*P;
end
K=itmax+1;
end
Listing 13.2 cgtest
For both the averaging and Poisson matrix we use tol = 10^{-8}.
For the averaging matrix we obtain the values in Table 13.1. The convergence is quite rapid. It appears that the number of iterations can be bounded independently of n, and therefore we solve the problem in O(n) operations. This is the best we can do for a problem with n unknowns.
Consider next the Poisson problem. In Table 13.2 we list K, the required number of iterations, and K/√n.
Table 13.1 The number of iterations K for the averaging problem on a √n × √n grid for various n

n   2500   10000   40000   1000000   4000000
K   19     18      18      16        15

Table 13.2 The number of iterations K for the Poisson problem on a √n × √n grid for various n

n      2500   10000   40000   160000
K      94     188     370     735
K/√n   1.88   1.88    1.85    1.84
The results show that K is much smaller than n and appears to be proportional to √n. This is the same speed as for SOR, and we don't have to estimate any acceleration parameter.
13.3 Convergence
13.3.1 The Main Theorem
Recall that the A-norm of a vector x ∈ R^n is given by ‖x‖_A := √(x^T Ax). The following theorem gives upper bounds for the A-norm of the error in both steepest descent and conjugate gradients.

Theorem 13.3 (Error Bound for Steepest Descent and Conjugate Gradients) Suppose A is positive definite. For the A-norms of the errors in the steepest descent method (13.8) the following upper bounds hold

‖x − x_k‖_A / ‖x − x_0‖_A ≤ ((κ − 1)/(κ + 1))^k < e^{−(2/κ)k},  k > 0,  (13.19)

while for the conjugate gradient method we have

‖x − x_k‖_A / ‖x − x_0‖_A ≤ 2((√κ − 1)/(√κ + 1))^k < 2e^{−(2/√κ)k},  k ≥ 0.  (13.20)

Here κ = cond_2(A) := λ_max/λ_min is the spectral condition number of A, where λ_max and λ_min are the largest and smallest eigenvalue of A, respectively.
Theorem 13.3 implies

1. Since (κ − 1)/(κ + 1) < 1 the steepest descent method always converges for a positive definite matrix. The convergence can be slow when (κ − 1)/(κ + 1) is close to one, and this happens even for a moderately ill-conditioned A.
2. The rate of convergence for the conjugate gradient method appears to be determined by the square root of the spectral condition number. This is much better than the estimate for the steepest descent method, especially for problems with large condition numbers.
3. The proofs of the estimates in (13.19) and (13.20) are quite different. This is in spite of their similar appearance.
13.3.2 The Number of Iterations for the Model Problems
Consider the test matrix

T_2 := tridiag_m(a, d, a) ⊗ I_m + I_m ⊗ tridiag_m(a, d, a) ∈ R^{(m^2)×(m^2)}.

The eigenvalues were given in (10.15) as

λ_{j,k} = 2d + 2a cos(jπh) + 2a cos(kπh),  j, k = 1, . . . , m.  (13.21)

For the averaging problem given by d = 5/18, a = 1/9, the largest and smallest eigenvalue of T_2 are given by λ_max = 5/9 + (4/9)cos(πh) and λ_min = 5/9 − (4/9)cos(πh). Thus

κ_A = (5 + 4 cos(πh))/(5 − 4 cos(πh)) ≤ 9,

and the condition number is bounded independently of n. It follows from (13.20) that the number of iterations can be bounded independently of the size n of the problem, and this is in agreement with what we observed in Table 13.1.
For the Poisson problem we have by (10.20) the condition number

κ_P = λ_max/λ_min = cos^2(πh/2)/sin^2(πh/2),  and  √κ_P = cos(πh/2)/sin(πh/2) ≈ 2/(πh) ≈ (2/π)√n.

Thus (see also Exercise 8.19) we solve the discrete Poisson problem in O(n^{3/2}) arithmetic operations using the conjugate gradient method. This is the same as for the SOR method and for the fast method without the FFT. In comparison the Cholesky Algorithm requires O(n^2) arithmetic operations both for the averaging and the Poisson problem.
13.3.3 Krylov Spaces and the Best Approximation Property
For the convergence analysis of the conjugate gradient method certain subspaces of R^n called Krylov spaces play a central role. In fact, the iterates in the conjugate gradient method are best approximations of the solution from these subspaces using the A-norm to measure the error.
The Krylov spaces are defined by W_0 = {0} and

W_k = span(r_0, Ar_0, A^2 r_0, . . . , A^{k−1} r_0),  k = 1, 2, 3, . . . .
They are nested subspaces

W_0 ⊂ W_1 ⊂ W_2 ⊂ · · · ⊂ W_n ⊂ R^n

with dim(W_k) ≤ k for all k ≥ 0. Moreover, if v ∈ W_k then Av ∈ W_{k+1}.
Lemma 13.3 (Krylov Space) For the iterates in the conjugate gradient method we have

x_k − x_0 ∈ W_k,  r_k, p_k ∈ W_{k+1},  k = 0, 1, . . . ,  (13.22)

and

r_k^T w = p_k^T A w = 0,  w ∈ W_k.  (13.23)

Proof Equation (13.22) clearly holds for k = 0 since p_0 = r_0. Suppose it holds for some k ≥ 0. Then r_{k+1} = r_k − α_k A p_k ∈ W_{k+2}, p_{k+1} = r_{k+1} + β_k p_k ∈ W_{k+2} and, by (13.12), x_{k+1} − x_0 = x_k − x_0 + α_k p_k ∈ W_{k+1}. Thus (13.22) follows by induction. The equation (13.23) follows since any w ∈ W_k is a linear combination of {r_0, r_1, . . . , r_{k−1}} and also of {p_0, p_1, . . . , p_{k−1}}.
Theorem 13.4 (Best Approximation Property) Suppose Ax = b, where A ∈ R^{n×n} is positive definite and {x_k} is generated by the conjugate gradient method (cf. (13.15)). Then

‖x − x_k‖_A = min_{w∈W_k} ‖x − x_0 − w‖_A.  (13.24)

Proof Fix k, let w ∈ W_k and u := x_k − x_0 − w. By (13.22) u ∈ W_k and then (13.23) implies that ⟨x − x_k, u⟩_A = r_k^T u = 0. Using Corollary 5.2 we obtain

‖x − x_0 − w‖_A = ‖x − x_k + u‖_A ≥ ‖x − x_k‖_A,

with equality for u = 0.
If x_0 = 0 then (13.24) says that x_k is the element in W_k that is closest to the solution x in the A-norm. More generally, if x_0 ≠ 0 then x − x_k = (x − x_0) − (x_k − x_0) and x_k − x_0 is the element in W_k that is closest to x − x_0 in the A-norm. This is the orthogonal projection of x − x_0 into W_k, see Fig. 13.2.
Recall that to each polynomial p(t) := Σ_{j=0}^m a_j t^j there corresponds a matrix polynomial p(A) := a_0 I + a_1 A + · · · + a_m A^m. Moreover, if (λ_j, u_j) are eigenpairs of A then (p(λ_j), u_j) are eigenpairs of p(A) for j = 1, . . . , n.

Fig. 13.2 The orthogonal projection of x − x_0 into W_k

Lemma 13.4 (Krylov Space and Polynomials) Suppose Ax = b where A ∈ R^{n×n} is positive definite with orthonormal eigenpairs (λ_j, u_j), j = 1, 2, . . . , n, and let r_0 := b − Ax_0 for some x_0 ∈ R^n. To each w ∈ W_k there corresponds a polynomial P(t) := Σ_{j=0}^{k−1} a_j t^j such that w = P(A)r_0. Moreover, if r_0 = Σ_{j=1}^n σ_j u_j then

‖x − x_0 − w‖_A^2 = Σ_{j=1}^n (σ_j^2/λ_j) Q(λ_j)^2,  Q(t) := 1 − tP(t).  (13.25)
Proof If w ∈ W_k then w = a_0 r_0 + a_1 Ar_0 + · · · + a_{k−1} A^{k−1} r_0 for some scalars a_0, . . . , a_{k−1}. But then w = P(A)r_0. We find x − x_0 − P(A)r_0 = A^{-1}(r_0 − AP(A)r_0) = A^{-1}Q(A)r_0 and A(x − x_0 − P(A)r_0) = Q(A)r_0. Therefore,

‖x − x_0 − P(A)r_0‖_A^2 = c^T A^{-1} c,  where c = (I − AP(A))r_0 = Q(A)r_0.  (13.26)

Using the eigenvector expansion for r_0 we obtain

c = Σ_{j=1}^n σ_j Q(λ_j) u_j,  A^{-1}c = Σ_{i=1}^n σ_i (Q(λ_i)/λ_i) u_i.  (13.27)

Now (13.25) follows by the orthonormality of the eigenvectors.
We will use the following theorem to estimate the rate of convergence.
Theorem 13.5 (cg and Best Polynomial Approximation) Suppose [a, b] with 0 < a < b is an interval containing all the eigenvalues of A. Then in the conjugate gradient method

‖x − x_k‖_A / ‖x − x_0‖_A ≤ min_{Q∈Π_k, Q(0)=1} max_{a≤x≤b} |Q(x)|,  (13.28)

where Π_k denotes the class of univariate polynomials of degree ≤ k with real coefficients.
Proof By (13.25) with Q(t) = 1 (corresponding to P(A) = 0) we find ‖x − x_0‖_A^2 = Σ_{j=1}^n σ_j^2/λ_j. Therefore, by the best approximation property Theorem 13.4 and (13.25), for any w ∈ W_k

‖x − x_k‖_A^2 ≤ ‖x − x_0 − w‖_A^2 ≤ max_{a≤x≤b} |Q(x)|^2 Σ_{j=1}^n σ_j^2/λ_j = max_{a≤x≤b} |Q(x)|^2 ‖x − x_0‖_A^2,

where Q ∈ Π_k and Q(0) = 1. Minimizing over such polynomials Q and taking square roots, the result follows.
In the next section we use properties of the Chebyshev polynomials to show that

‖x − x_k‖_A / ‖x − x_0‖_A ≤ min_{Q∈Π_k, Q(0)=1} max_{λ_min≤x≤λ_max} |Q(x)| = 2/(a^{-k} + a^k),  a := (√κ − 1)/(√κ + 1),  (13.29)

where κ = λ_max/λ_min is the spectral condition number of A. Ignoring the second term in the denominator, this implies the first inequality in (13.20).
Consider the second inequality in (13.20). The inequality

(x − 1)/(x + 1) < e^{−2/x}  for x > 1  (13.30)

follows from the familiar series expansion of the exponential function. Indeed, with y = 1/x, using 2^k/k! = 2 for k = 1, 2, and 2^k/k! < 2 for k > 2, we find

e^{2/x} = e^{2y} = Σ_{k=0}^∞ (2y)^k/k! < −1 + 2 Σ_{k=0}^∞ y^k = (1 + y)/(1 − y) = (x + 1)/(x − 1),

and (13.30) follows.
13.4 Proof of the Convergence Estimates
13.4.1 Chebyshev Polynomials
The proof of the estimate (13.29) for the error in the conjugate gradient method is based on an extremal property of the Chebyshev polynomials. Suppose a < b, c ∉ [a, b] and k ∈ N. Consider the set S_k of all polynomials Q of degree ≤ k such that Q(c) = 1. For any continuous function f on [a, b] we define

‖f‖_∞ = max_{a≤x≤b} |f(x)|.

We want to find a polynomial Q* ∈ S_k such that

‖Q*‖_∞ = min_{Q∈S_k} ‖Q‖_∞.

We will show that Q* is uniquely given as a suitably shifted and normalized version of the Chebyshev polynomial. The Chebyshev polynomial T_n of degree n can be defined recursively by

T_{n+1}(t) = 2t T_n(t) − T_{n−1}(t),  n ≥ 1,  t ∈ R,

starting with T_0(t) = 1 and T_1(t) = t. Thus T_2(t) = 2t^2 − 1, T_3(t) = 4t^3 − 3t, etc. In general T_n is a polynomial of degree n.
There are some convenient closed form expressions for Tn .
Lemma 13.5 (Closed Forms of Chebyshev Polynomials) For n ≥ 0

1. T_n(t) = cos(n arccos t) for t ∈ [−1, 1],
2. T_n(t) = (1/2)[(t + √(t^2 − 1))^n + (t + √(t^2 − 1))^{−n}] for |t| ≥ 1.

Proof

1. With P_n(t) = cos(n arccos t) we have P_n(t) = cos nφ, where t = cos φ. Therefore,

   P_{n+1}(t) + P_{n−1}(t) = cos((n + 1)φ) + cos((n − 1)φ) = 2 cos φ cos nφ = 2t P_n(t),

   and it follows that P_n satisfies the same recurrence relation as T_n. Since P_0 = T_0 and P_1 = T_1 we have P_n = T_n for all n ≥ 0.
2. Fix t with |t| ≥ 1 and let x_n := T_n(t) for n ≥ 0. The recurrence relation for the Chebyshev polynomials can then be written

   x_{n+1} − 2t x_n + x_{n−1} = 0 for n ≥ 1, with x_0 = 1, x_1 = t.  (13.31)
To solve this difference equation we insert x_n = z^n into (13.31) and obtain z^{n+1} − 2t z^n + z^{n−1} = 0 or z^2 − 2tz + 1 = 0. The roots of this equation are

z_1 = t + √(t^2 − 1),  z_2 = t − √(t^2 − 1) = (t + √(t^2 − 1))^{−1}.

Now z_1^n, z_2^n and more generally c_1 z_1^n + c_2 z_2^n are solutions of (13.31) for any constants c_1 and c_2. We find these constants from the initial conditions x_0 = c_1 + c_2 = 1 and x_1 = c_1 z_1 + c_2 z_2 = t. Since z_1 + z_2 = 2t the solution is c_1 = c_2 = 1/2.
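The three-term recurrence is also a convenient way to evaluate T_n numerically. The short MATLAB check below is a sketch (n and the sample points are arbitrary choices); it evaluates T_n by the recurrence and compares with the closed form cos(n arccos t) of Lemma 13.5 on [−1, 1].

% Evaluate T_n by the three-term recurrence and compare with cos(n*acos(t)) on [-1,1].
n = 7; t = linspace(-1, 1, 11);
T0 = ones(size(t)); T1 = t;
for j = 1:n-1
    T2 = 2*t.*T1 - T0;     % T_{j+1} = 2 t T_j - T_{j-1}
    T0 = T1; T1 = T2;
end
max_err = max(abs(T1 - cos(n*acos(t))));
fprintf('max deviation for n = %d: %.2e\n', n, max_err);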
We show that the unique solution to our minimization problem is

Q*(x) = T_k(u(x)) / T_k(u(c)),  u(x) = (b + a − 2x)/(b − a).  (13.32)

Clearly Q* ∈ S_k.
Theorem 13.6 (A Minimal Norm Problem) Suppose a < b, c ∉ [a, b] and k ∈ N. If Q ∈ S_k and Q ≠ Q* then ‖Q‖_∞ > ‖Q*‖_∞.

Proof Recall that a nonzero polynomial p of degree k can have at most k zeros. If p(z) = p′(z) = 0, we say that p has a double zero at z. Counting such a zero as two zeros, it is still true that a nonzero polynomial of degree k has at most k zeros.
|Q*| takes on its maximum 1/|T_k(u(c))| at the k + 1 points μ_0, . . . , μ_k in [a, b] such that u(μ_i) = cos(iπ/k) for i = 0, 1, . . . , k. Suppose Q ∈ S_k and that ‖Q‖_∞ ≤ ‖Q*‖_∞. We have to show that Q ≡ Q*. Let f ≡ Q − Q*. We show that f has at least k zeros in [a, b]. Since f is a polynomial of degree ≤ k and f(c) = 0, this means that f ≡ 0, or equivalently Q ≡ Q*.
Consider I_j = [μ_{j−1}, μ_j] for a fixed j. Let

σ_j = f(μ_{j−1}) f(μ_j).

We have σ_j ≤ 0. For if say Q*(μ_j) > 0 then

Q(μ_j) ≤ ‖Q‖_∞ ≤ ‖Q*‖_∞ = Q*(μ_j)

so that f(μ_j) ≤ 0. Moreover,

−Q(μ_{j−1}) ≤ ‖Q‖_∞ ≤ ‖Q*‖_∞ = −Q*(μ_{j−1}).

Thus f(μ_{j−1}) ≥ 0 and it follows that σ_j ≤ 0. Similarly, σ_j ≤ 0 if Q*(μ_j) < 0.
If σ_j < 0, f must have a zero in I_j since it is continuous. Suppose σ_j = 0. Then f(μ_{j−1}) = 0 or f(μ_j) = 0. If f(μ_j) = 0 then Q(μ_j) = Q*(μ_j). But then μ_j is a maximum or minimum both for Q and Q*. If μ_j ∈ (a, b) then Q′(μ_j) = Q*′(μ_j) = 0. Thus f(μ_j) = f′(μ_j) = 0, and f has a double zero at μ_j. We can count this as one zero for I_j and one for I_{j+1}. If μ_j = b, we still have a zero in I_j. Similarly, if f(μ_{j−1}) = 0, a double zero of f at μ_{j−1} appears if μ_{j−1} ∈ (a, b). We count this as one zero for I_{j−1} and one for I_j.
In this way we associate one zero of f with each of the k intervals I_j, j = 1, 2, . . . , k. We conclude that f has at least k zeros in [a, b].
Theorem 13.6 with a and b the smallest and largest eigenvalue of A, and c = 0, implies that the minimizing polynomial in (13.29) is given by

Q*(x) = T_k((b + a − 2x)/(b − a)) / T_k((b + a)/(b − a)).  (13.33)
By Lemma 13.5

max_{a≤x≤b} |T_k((b + a − 2x)/(b − a))| = max_{−1≤t≤1} |T_k(t)| = 1.
Moreover, with t = (b + a)/(b − a) we have

t + √(t^2 − 1) = (√κ + 1)/(√κ − 1),  κ = b/a.  (13.34)
Thus again by Lemma 13.5 we find

T_k((b + a)/(b − a)) = T_k((κ + 1)/(κ − 1)) = (1/2)[((√κ + 1)/(√κ − 1))^k + ((√κ − 1)/(√κ + 1))^k],  (13.35)

and (13.29) follows (Fig. 13.3).
Fig. 13.3 This is an illustration of the proof of Theorem 13.6 for k = 3. f ≡ Q −Q∗ has a double
zero at μ1 and one zero between μ2 and μ3
13.4.2 Convergence Proof for Steepest Descent
For the proof of (13.19) the following inequality will be used.
Theorem 13.7 (Kantorovich Inequality) For any positive definite matrix A ∈ R^{n×n}

1 ≤ (y^T Ay)(y^T A^{-1} y)/(y^T y)^2 ≤ (M + m)^2/(4Mm),  y ≠ 0, y ∈ R^n,  (13.36)

where M := λ_max and m := λ_min are the largest and smallest eigenvalue of A, respectively.
Proof If (λ_j, u_j) are orthonormal eigenpairs of A, then (λ_j^{-1}, u_j) are eigenpairs for A^{-1}, j = 1, . . . , n. Let y = Σ_{j=1}^n c_j u_j be the corresponding eigenvector expansion of a vector y ∈ R^n. By orthonormality (cf. (6.9))

a := y^T Ay / (y^T y) = Σ_{i=1}^n t_i λ_i,  b := y^T A^{-1} y / (y^T y) = Σ_{i=1}^n t_i/λ_i,  (13.37)

where

t_i = c_i^2 / Σ_{j=1}^n c_j^2 ≥ 0,  i = 1, . . . , n,  and  Σ_{i=1}^n t_i = 1.  (13.38)
Thus a and b are convex combinations of the eigenvalues of A and A^{-1}, respectively. Let c be a positive constant to be chosen later. By the geometric/arithmetic mean inequality (8.33) and (13.37)

√(ab) = √((ac)(b/c)) ≤ (ac + b/c)/2 = (1/2) Σ_{i=1}^n t_i (λ_i c + 1/(λ_i c)) = (1/2) Σ_{i=1}^n t_i f(λ_i c),
where f : [mc, Mc] → R is given by f(x) := x + 1/x. By (13.38)

√(ab) ≤ (1/2) max_{mc≤x≤Mc} f(x).

Since f ∈ C^2 and f′′ is positive, it follows from Lemma 8.2 that f is a convex function. But a convex function takes its maximum at one of the endpoints of the range (cf. Exercise 13.16) and we obtain

√(ab) ≤ (1/2) max{f(mc), f(Mc)}.  (13.39)
Choosing c := 1/√(mM) we find f(mc) = f(Mc) = √(M/m) + √(m/M) = (M + m)/√(mM). By (13.39) we obtain

(y^T Ay)(y^T A^{-1} y)/(y^T y)^2 = ab ≤ (M + m)^2/(4Mm),

the upper bound in (13.36). For the lower bound we use the Cauchy-Schwarz inequality as follows:

1 = (Σ_{i=1}^n t_i)^2 = (Σ_{i=1}^n (t_i λ_i)^{1/2}(t_i/λ_i)^{1/2})^2 ≤ (Σ_{i=1}^n t_i λ_i)(Σ_{i=1}^n t_i/λ_i) = ab.
Proof of (13.19) Let ε_j := x − x_j, j = 0, 1, . . ., where Ax = b. It is enough to show that

‖ε_{k+1}‖_A^2 / ‖ε_k‖_A^2 ≤ ((κ − 1)/(κ + 1))^2,  k = 0, 1, 2, . . . ,  (13.40)

for then ‖ε_k‖_A ≤ ((κ − 1)/(κ + 1)) ‖ε_{k−1}‖_A ≤ · · · ≤ ((κ − 1)/(κ + 1))^k ‖ε_0‖_A. It follows from (13.8) that

ε_{k+1} = ε_k − α_k r_k,  α_k := r_k^T r_k / (r_k^T A r_k).

We find

‖ε_k‖_A^2 = ε_k^T A ε_k = r_k^T A^{-1} r_k,
‖ε_{k+1}‖_A^2 = (ε_k − α_k r_k)^T A (ε_k − α_k r_k) = ε_k^T A ε_k − 2α_k r_k^T A ε_k + α_k^2 r_k^T A r_k = ‖ε_k‖_A^2 − (r_k^T r_k)^2/(r_k^T A r_k).

Combining these and using the Kantorovich inequality

‖ε_{k+1}‖_A^2 / ‖ε_k‖_A^2 = 1 − (r_k^T r_k)^2 / ((r_k^T A r_k)(r_k^T A^{-1} r_k)) ≤ 1 − 4λ_min λ_max/(λ_min + λ_max)^2 = ((κ − 1)/(κ + 1))^2,

and (13.40) is proved.
13.4.3 Monotonicity of the Error
The error analysis for the conjugate gradient method is based on the A-norm. We
end this chapter by considering the Euclidian norm of the error, and show that it is
strictly decreasing.
Theorem 13.8 (The Error in cg Is Strictly Decreasing) Let, in the conjugate gradient method, m be the smallest integer such that r_{m+1} = 0. For k ≤ m we have ‖ε_{k+1}‖_2 < ‖ε_k‖_2. More precisely,

‖ε_k‖_2^2 − ‖ε_{k+1}‖_2^2 = (‖p_k‖_2^2 / ‖p_k‖_A^2)(‖ε_k‖_A^2 + ‖ε_{k+1}‖_A^2),

where ε_j = x − x_j and Ax = b.
Proof For j ≤ m

ε_j = x_{m+1} − x_j = x_m − x_j + α_m p_m = x_{m−1} − x_j + α_{m−1} p_{m−1} + α_m p_m = · · ·

so that

ε_j = Σ_{i=j}^m α_i p_i,  α_i = r_i^T r_i / (p_i^T A p_i).  (13.41)

By (13.41) and A-orthogonality

‖ε_j‖_A^2 = ε_j^T A ε_j = Σ_{i=j}^m α_i^2 p_i^T A p_i = Σ_{i=j}^m (r_i^T r_i)^2 / (p_i^T A p_i).  (13.42)

By (13.17) and Lemma 13.3

p_i^T p_k = (r_i + β_{i−1} p_{i−1})^T p_k = β_{i−1} p_{i−1}^T p_k = · · · = β_{i−1} · · · β_k (p_k^T p_k),

and since β_{i−1} · · · β_k = (r_i^T r_i)/(r_k^T r_k) we find

p_i^T p_k = (r_i^T r_i / r_k^T r_k) p_k^T p_k,  i ≥ k.  (13.43)

Since

‖ε_k‖_2^2 = ‖ε_{k+1} + x_{k+1} − x_k‖_2^2 = ‖ε_{k+1} + α_k p_k‖_2^2,

we obtain, using (13.41), (13.43) and (13.42) in turn,

‖ε_k‖_2^2 − ‖ε_{k+1}‖_2^2 = α_k (2 p_k^T ε_{k+1} + α_k p_k^T p_k) = 2 Σ_{i=k+1}^m α_k α_i p_i^T p_k + α_k^2 p_k^T p_k
= (α_k p_k^T p_k / r_k^T r_k)(2 Σ_{i=k+1}^m α_i r_i^T r_i + α_k r_k^T r_k)
= (p_k^T p_k / p_k^T A p_k)(2 Σ_{i=k+1}^m (r_i^T r_i)^2/(p_i^T A p_i) + (r_k^T r_k)^2/(p_k^T A p_k))
= (‖p_k‖_2^2 / ‖p_k‖_A^2)(‖ε_{k+1}‖_A^2 + ‖ε_k‖_A^2),

and the Theorem is proved.
13.5 Preconditioning
For problems Ax = b of size n, where both n and cond2 (A) are large, it is
often possible to improve the performance of the conjugate gradient method by
using a technique known as preconditioning. Instead of Ax = b we consider an
equivalent system BAx = Bb, where B is nonsingular and cond2 (BA) is smaller
than cond2 (A). The matrix B will in many cases be the inverse of another matrix,
B = M −1 . We cannot use CG on BAx = Bb directly since BA in general is not
symmetric even if both A and B are. But if B (and hence M) is positive definite
then we can apply CG to a symmetrized system and then transform the recurrence
formulas to an iterative method for the original system Ax = b. This iterative
method is known as the preconditioned conjugate gradient method. We shall see that the convergence properties of this method are determined by the eigenvalues of BA.
Suppose B is positive definite. By Theorem 4.4 there is a nonsingular matrix C such that B = C^T C. (C is only needed for the derivation and will not appear in the final formulas.) Now

BAx = Bb ⟺ C^T(CAC^T)C^{−T}x = C^T Cb ⟺ (CAC^T)y = Cb,

where y := C^{−T}x. We have 3 linear systems

Ax = b  (13.44)
BAx = Bb  (13.45)
(CAC^T)y = Cb,  and  x = C^T y.  (13.46)
Note that (13.44) and (13.46) are positive definite linear systems. In addition to being positive definite, the matrix CAC^T is similar to BA. Indeed,

C^T(CAC^T)C^{−T} = BA.

Thus CAC^T and BA have the same eigenvalues. Therefore, if we apply the conjugate gradient method to (13.46) then the rate of convergence will be determined by the eigenvalues of BA.
We apply the conjugate gradient method to (CAC^T)y = Cb. Denoting the search direction by q_k and the residual by z_k := Cb − CAC^T y_k, we obtain the following from (13.15), (13.16), and (13.17):

y_{k+1} = y_k + α_k q_k,  α_k = z_k^T z_k / (q_k^T (CAC^T) q_k),
z_{k+1} = z_k − α_k (CAC^T) q_k,
q_{k+1} = z_{k+1} + β_k q_k,  β_k = z_{k+1}^T z_{k+1} / (z_k^T z_k).

With

x_k := C^T y_k,  p_k := C^T q_k,  s_k := C^T z_k,  r_k := C^{−1} z_k  (13.47)
this can be transformed into

x_{k+1} = x_k + α_k p_k,  α_k = s_k^T r_k / (p_k^T A p_k),  (13.48)
r_{k+1} = r_k − α_k A p_k,  (13.49)
s_{k+1} = s_k − α_k BA p_k,  (13.50)
p_{k+1} = s_{k+1} + β_k p_k,  β_k = s_{k+1}^T r_{k+1} / (s_k^T r_k).  (13.51)
Here x_k will be an approximation to the solution x of Ax = b, r_k = b − Ax_k is the residual in the original system, and s_k = C^T z_k = C^T(Cb − CAC^T y_k) = Bb − BAx_k is the residual in the preconditioned system. If we set r_0 = b − Ax_0, p_0 = s_0 = Br_0, we obtain the following preconditioned conjugate gradient algorithm for determining approximations x_k to the solution of a positive definite system Ax = b, by considering the system BAx = Bb, with B positive definite. The iteration is stopped when ‖r_k‖_2/‖b‖_2 ≤ tol or k > itmax. K is the number of iterations used, and x (= x_0) is the starting iteration.
function [x,K]=pcg(A,B,b,x,tol,itmax)
% [x,K]=pcg(A,B,b,x,tol,itmax)
r=b-A*x; p=B*r; s=p; rho=s'*r; rho0=b'*b;
for k=0:itmax
  if sqrt(rho/rho0)<= tol
    K=k; return
  end
  t=A*p; a=rho/(p'*t);
  x=x+a*p; r=r-a*t;
  w=B*t; s=s-a*w;
  rhos=rho; rho=s'*r;
  p=s+(rho/rhos)*p;   % search direction update (13.51) uses the preconditioned residual s
end
K=itmax+1;
end
Listing 13.3 pcg
Apart from the calculation of ρ this algorithm is quite similar to Algorithm 13.1.
The main additional work is contained in w = B ∗ t. We’ll discuss this further in
connection with an example. There the inverse of B is known and we have to solve
a linear system to find w.
We have the following convergence result for this algorithm.

Theorem 13.9 (Error Bound Preconditioned cg) Suppose we apply a positive definite preconditioner B to the positive definite system Ax = b. Then the quantities x_k computed in Algorithm 13.3 satisfy the following bound:

‖x − x_k‖_A / ‖x − x_0‖_A ≤ 2((√κ − 1)/(√κ + 1))^k  for k ≥ 0,

where κ = λ_max/λ_min is the ratio of the largest and smallest eigenvalue of BA.
Proof Since Algorithm 13.3 is equivalent to solving (13.46) by the conjugate gradient method, Theorem 13.3 implies that

‖y − y_k‖_{CAC^T} / ‖y − y_0‖_{CAC^T} ≤ 2((√κ − 1)/(√κ + 1))^k  for k ≥ 0,

where y_k is the conjugate gradient approximation to the solution y of (13.46) and κ is the ratio of the largest and smallest eigenvalue of CAC^T. Since BA and CAC^T are similar, this is the same as the κ in the theorem. By (13.47) we have

‖y − y_k‖_{CAC^T}^2 = (y − y_k)^T (CAC^T)(y − y_k) = (C^T(y − y_k))^T A (C^T(y − y_k)) = ‖x − x_k‖_A^2,

and the proof is complete.
We conclude that B should satisfy the following requirements for a problem of
size n:
1. The eigenvalues of BA should be located in a narrow interval. Preferably one
should be able to bound the length of the interval independently of n.
2. The evaluation of Bx for a given vector x should not be expensive in storage and
arithmetic operations, ideally O(n) for both.
In this book we only consider one example of a preconditioner. For a comprehensive treatment of preconditioners see [1].
13.6 Preconditioning Example
13.6.1 A Variable Coefficient Problem
Consider the problem

−∂/∂x (c(x, y) ∂u/∂x) − ∂/∂y (c(x, y) ∂u/∂y) = f(x, y),  (x, y) ∈ Ω = (0, 1)^2,
u(x, y) = 0,  (x, y) ∈ ∂Ω.  (13.52)

Here Ω is the open unit square while ∂Ω is the boundary of Ω. The functions f and c are given and we seek a function u = u(x, y) such that (13.52) holds. We assume that c and f are defined and continuous on Ω and that c(x, y) > 0 for all (x, y) ∈ Ω. The problem (13.52) reduces to the Poisson problem (10.1) in the special case where c(x, y) = 1 for (x, y) ∈ Ω.
To solve (13.52) numerically, we choose m ∈ N, set h := 1/(m + 1), and define index sets

I_m := {(j, k) : 1 ≤ j, k ≤ m},  Ī_m := {(j, k) : 0 ≤ j, k ≤ m + 1},  ∂I_m := Ī_m \ I_m.

We compute approximations v_{j,k} ≈ u(x_j, y_k) on a grid of points

{(x_j, y_k) = (jh, kh) : (j, k) ∈ Ī_m}
using a finite difference method. For univariate functions f, g we approximate derivatives by using the central difference approximations

d/dt (f(t) d/dt g(t)) ≈ [f(t + h/2) g′(t + h/2) − f(t − h/2) g′(t − h/2)] / h
                      ≈ [f(t + h/2)(g(t + h) − g(t)) − f(t − h/2)(g(t) − g(t − h))] / h^2

to obtain

(L_h v)_{j,k} := (dv)_{j,k} / h^2 = f_{j,k},  (j, k) ∈ I_m,  (13.53)
v_{j,k} = 0,  (j, k) ∈ ∂I_m,  (13.54)
where f_{j,k} := f(x_j, y_k), (dv)_{j,k} := (d_1 v)_{j,k} + (d_2 v)_{j,k},

(d_1 v)_{j,k} := c_{j−1/2,k}(v_{j,k} − v_{j−1,k}) − c_{j+1/2,k}(v_{j+1,k} − v_{j,k}) ≈ −h^2 (∂/∂x (c ∂u/∂x))_{j,k},
(d_2 v)_{j,k} := c_{j,k−1/2}(v_{j,k} − v_{j,k−1}) − c_{j,k+1/2}(v_{j,k+1} − v_{j,k}) ≈ −h^2 (∂/∂y (c ∂u/∂y))_{j,k},  (13.55)

and where c_{p,q} = c(ph, qh) for p, q ∈ R. The equation (13.53) can be written in matrix form as
matrix form as
⎡
⎡
⎤
⎤
f1,1 . . . f1,m
(dv)1,1 . . . (dv)1,m
1 ⎢
⎢ .
⎥
..
.. ⎥ .
Lh v = F , Lh v := 2 ⎣ ...
⎦ , F := ⎣ ..
.
. ⎦
h
(dv)m,1 . . . (dv)m,m
fm,1 . . . fm,m
(13.56)
This is a linear system with the elements of
⎡
⎤
v1,1 . . . v1,m
⎢
.. ⎥
V := ⎣ ...
. ⎦
vm,1 . . . vm,m
as unknowns. The system h2 Lh v = h2 F can be written in standard form Ax = b
where x = vec(V ), b = h2 vec(F ), and the coefficient matrix A ∈ Rn×n is defined
as follows
Ax = Avec(V ) := h2 vec(Lh v).
(13.57)
If c(x, y) = 1 for all (x, y) ∈ Ω we recover the Poisson matrix (10.8). In general we can show that A is positive definite for all m ∈ N provided c(x, y) > 0 for all (x, y) ∈ Ω. For this we do not need the explicit form of A.
To start we define for m ∈ N a discrete inner product on the space of matrices R^{m×m},

⟨V, W⟩ := h^2 Σ_{j,k=1}^m v_{j,k} w_{j,k}.  (13.58)
We then have the following lemma.
Lemma 13.6 (Discrete Inner Product) If V , W ∈ Rm×m and vj,k = wj,k = 0 for
(j, k) ∈ ∂Im , then
⟨L_h v, W⟩ = Σ_{j=1}^m Σ_{k=0}^m c_{j,k+1/2} (v_{j,k+1} − v_{j,k})(w_{j,k+1} − w_{j,k})
           + Σ_{j=0}^m Σ_{k=1}^m c_{j+1/2,k} (v_{j+1,k} − v_{j,k})(w_{j+1,k} − w_{j,k}).   (13.59)
Proof If m ∈ N, a_i, b_i, c_i ∈ R for i = 0, . . . , m and b_0 = c_0 = b_{m+1} = c_{m+1} = 0, then
Σ_{i=1}^m ( a_{i−1}(b_i − b_{i−1}) − a_i(b_{i+1} − b_i) ) c_i = Σ_{i=0}^m a_i(b_{i+1} − b_i)(c_{i+1} − c_i).   (13.60)
Indeed, the left hand side can be written
Σ_{i=0}^m a_i(b_{i+1} − b_i)c_{i+1} − Σ_{i=0}^m a_i(b_{i+1} − b_i)c_i,
and the right hand side of (13.60) follows. We apply (13.60) to (d_1 v)_{j,k} w_{j,k} and (d_2 v)_{j,k} w_{j,k} given by (13.55) and (13.59) follows.
Theorem 13.10 (Positive Definite Matrix) If c(x, y) > 0 for (x, y) ∈ Ω then the
matrix A defined by (13.57) via the linear system (13.56) is positive definite.
Proof By (13.59) ⟨L_h v, W⟩ = ⟨W, L_h v⟩ and symmetry follows. We take W = V and obtain quadratic factors in (13.59). Since c_{j+1/2,k} and c_{j,k+1/2} correspond to values of c in Ω for the values of j, k in the sums, it follows that they are positive and ⟨L_h v, V⟩ ≥ 0 for all V ∈ R^{m×m}. If ⟨L_h v, V⟩ = 0 then all the quadratic factors must be zero, and v_{j,k+1} = v_{j,k} for k = 0, 1, . . . , m and j = 1, . . . , m. Now v_{j,0} = v_{j,m+1} = 0 implies that V = 0. It follows that the matrix A of the linear system (13.56) is positive definite.
13.6.2 Applying Preconditioning
Consider solving Ax = b, where A is given by (13.57) and b ∈ Rn . Since A is
positive definite it is nonsingular and the system has a unique solution x ∈ Rn .
Moreover we can use either Cholesky factorization or the block tridiagonal solver to find x. Since the bandwidth of A is m = √n, both of these methods require O(n²) arithmetic operations for large n.
If we choose c(x, y) ≡ 1 in (13.52), we get the Poisson problem. With this in mind, we may think of the coefficient matrix A_p arising from the discretization of the Poisson problem as an approximation to the matrix (13.57). This suggests using B = A_p^{−1}, the inverse of the discrete Poisson matrix, as a preconditioner for the system (13.53).
Consider Algorithm 13.3. With this preconditioner the calculation w = Bt takes the form A_p w_k = t_k.
In Sect. 11.2 we developed a simple fast Poisson solver, cf. Algorithm 11.1. This method can be used to solve A_p w = t.
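To indicate how the pieces fit together, here is a hedged MATLAB sketch of preconditioned conjugate gradients for this problem in array form. It is not Algorithm 13.3 verbatim: the function names pcgvc and fastpoisson are our own, applyA is the matrix-free sketch given after (13.57), and fastpoisson is assumed to solve A_p w = vec(T) by the fast Poisson solver of Algorithm 11.1, returning w as an m × m array.

function [V, it] = pcgvc(c, F, tol, itmax)
% Preconditioned CG for A*vec(V) = h^2*vec(F), A given by (13.57),
% with B = Ap^{-1} applied via an assumed fast Poisson solve.
  m = size(F,1); h = 1/(m+1);
  V = zeros(m,m);
  R = h^2*F;                        % residual b - A*x for the start vector V = 0
  nb = norm(R,'fro');
  W = fastpoisson(R);               % assumed: solves Ap*vec(W) = vec(R)
  P = W;
  rho = sum(sum(R.*W));
  for it = 1:itmax
    T = applyA(P, c, h);            % T = A*p in array form
    a = rho/sum(sum(P.*T));
    V = V + a*P;
    R = R - a*T;
    if norm(R,'fro') <= tol*nb, return; end
    W = fastpoisson(R);
    rhonew = sum(sum(R.*W));
    P = W + (rhonew/rho)*P;
    rho = rhonew;
  end
end

The inner products are taken over the arrays, which is the same as taking them over the corresponding vec'ed vectors.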
Consider the specific problem where
c(x, y) = e^{−x−y} and f(x, y) = 1.
We have used Algorithm 13.1 (conjugate gradient without preconditioning) and Algorithm 13.3 (conjugate gradient with preconditioning) to solve the problem (13.52). We used x_0 = 0 and ε = 10^{−8}. The results are shown in Table 13.3. Without preconditioning the number of iterations still seems to be more or less proportional to √n, although the convergence is slower than for the constant coefficient problem. Using preconditioning speeds up the convergence considerably. The number of iterations appears to be bounded independently of n.
Using a preconditioner increases the work in each iteration. For the present example the number of arithmetic operations in each iteration changes from O(n) without preconditioning to O(n^{3/2}) or O(n log₂ n) with preconditioning. This is not a large increase, and both the number of iterations and the computing time are reduced significantly.
Let us finally show that the number κ = λmax /λmin which determines the rate
of convergence for the preconditioned conjugate gradient method applied to (13.52)
can be bounded independently of n.
Table 13.3 The number of iterations K (no preconditioning) and K_pre (with preconditioning) for the problem (13.52) using the discrete Poisson problem as a preconditioner

  n       2500    10000   22500   40000   62500
  K       222     472     728     986     1246
  K/√n    4.44    4.72    4.85    4.93    4.98
  K_pre   22      23      23      23      23
Theorem 13.12 (Eigenvalues of Preconditioned Matrix) Suppose 0 < c_0 ≤ c(x, y) ≤ c_1 for all (x, y) ∈ [0, 1]². For the eigenvalues of the matrix BA = A_p^{−1}A just described we have
κ = λ_max/λ_min ≤ c_1/c_0.
Proof Suppose A_p^{−1}Ax = λx for some x ∈ R^n \ {0}. Then Ax = λA_p x. Multiplying this by x^T and solving for λ we find
λ = (x^T A x)/(x^T A_p x).
We computed x^T A x in (13.59) and we obtain x^T A_p x by setting all the c's there equal to one:
x^T A_p x = Σ_{i=1}^m Σ_{j=0}^m (v_{i,j+1} − v_{i,j})² + Σ_{j=1}^m Σ_{i=0}^m (v_{i+1,j} − v_{i,j})².
Thus x^T A_p x > 0, and bounding all the c's in (13.59) from below by c_0 and above by c_1 we find
c_0 (x^T A_p x) ≤ x^T A x ≤ c_1 (x^T A_p x),
which implies that c_0 ≤ λ ≤ c_1 for all eigenvalues λ of BA = A_p^{−1}A.
Using c(x, y) = e^{−x−y} as above, we find c_0 = e^{−2} and c_1 = 1. Thus κ ≤ e² ≈ 7.4, a quite acceptable matrix condition number, which explains the convergence results from our numerical experiment.
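For small m the bound of Theorem 13.12 is easy to check numerically by forming A and A_p column by column and computing the eigenvalues of A_p^{−1}A. The snippet below is our own illustration; it assumes the applyA sketch given after (13.57).

% Numerical check of kappa <= c1/c0 = e^2 for a small grid (illustrative only).
m = 10; h = 1/(m+1); n = m^2;
c = @(x,y) exp(-x-y);
A = zeros(n,n); Ap = zeros(n,n);
for i = 1:n
  E = zeros(m,m); E(i) = 1;                   % i-th unit vector, stored as an array
  A(:,i)  = reshape(applyA(E, c, h), n, 1);
  Ap(:,i) = reshape(applyA(E, @(x,y) 1, h), n, 1);
end
lam = sort(real(eig(Ap\A)));                  % eigenvalues of B*A = Ap^{-1}*A
kappa = lam(end)/lam(1)                       % observed to stay below exp(2) ~ 7.39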
13.7 Exercises Chap. 13
13.7.1 Exercises Sect. 13.1
Exercise 13.1 (A-Norm) One can show that the A-norm is a vector norm on Rn
without using the fact that it is an inner product norm. Show this with the help of
the Cholesky factorization of A.
Exercise 13.2 (Paraboloid) Let A = U DU T be the spectral decomposition of A,
i.e., U is orthogonal and D = diag(λ1 , . . . , λn ) is diagonal. Define new variables
v = [v1 , . . . , vn ]T := U T y, and set c := U T b = [c1 , . . . , cn ]T . Show that
Q(y) = ½ Σ_{j=1}^n λ_j v_j² − Σ_{j=1}^n c_j v_j.
Exercise 13.3 (Steepest Descent Iteration) Verify the numbers in Example 13.1.
Exercise 13.4 (Steepest Descent (Exam Exercise 2011-1)) The method of steepest descent can be used to solve a linear system Ax = b for x ∈ Rn , where A ∈ Rn,n
is symmetric and positive definite, and b ∈ Rn . With x 0 ∈ Rn an initial guess,
the iteration is x k+1 = x k + αk r k , where r k is the residual, r k = b − Ax k , and
α_k = (r_k^T r_k)/(r_k^T A r_k).
a) Compute x_1 if A = [2 −1; −1 2], b = [1, 1]^T and x_0 = 0.
b) If the k-th error, ek = x k − x, is an eigenvector of A, what can you say about
x k+1 ?
13.7.2 Exercises Sect. 13.2
Exercise 13.5 (Conjugate Gradient Iteration, II) Do one iteration with the conjugate gradient method when x_0 = 0. (Answer: x_1 = (b^T b)/(b^T A b) b.)
Exercise 13.6 (Conjugate Gradient Iteration, III) Do two conjugate gradient iterations for the system
[2 −1; −1 2][x_1; x_2] = [0; 3]
starting with x_0 = 0.
Exercise 13.7 (The cg Step Length Is Optimal) Show that the step length α_k in the conjugate gradient method is optimal.²
Exercise 13.8 (Starting Value in cg) Show that the conjugate gradient method (13.18) for Ax = b starting with x_0 is the same as applying the method to the system Ay = r_0 := b − Ax_0 starting with y_0 = 0.³
² Hint: use induction on k to show that p_k = r_k + Σ_{j=0}^{k−1} a_{k,j} r_j for some constants a_{k,j}.
³ Hint: The conjugate gradient method for Ay = r_0 can be written y_{k+1} := y_k + γ_k q_k, γ_k := (s_k^T s_k)/(q_k^T A q_k), s_{k+1} := s_k − γ_k A q_k, q_{k+1} := s_{k+1} + δ_k q_k, δ_k := (s_{k+1}^T s_{k+1})/(s_k^T s_k). Show that y_k = x_k − x_0, s_k = r_k, and q_k = p_k, for k = 0, 1, 2, . . ..
Exercise 13.9 (Program Code for Testing Steepest Descent) Write a function K=sdtest(m,a,d,tol,itmax) to test the steepest descent method on the matrix T_2. Make the analogues of Tables 13.1 and 13.2. For Table 13.2 it is enough to test for say n = 100, 400, 1600, 2500, and tabulate K/n instead of K/√n in the last row. Conclude that the upper bound (13.19) is realistic. Compare also with the number of iterations for the J and GS method in Table 12.1.
Exercise 13.10 (Using cg to Solve Normal Equations) Consider solving the
linear system AT Ax = AT b by using the conjugate gradient method. Here
A ∈ R^{m,n}, b ∈ R^m and A^T A is positive definite.⁴ Explain why only the following
modifications in Algorithm 13.1 are necessary
1. r=A’(b-A*x); p=r;
2. a=rho/(t’*t);
3. r=r-a*A’*t;
Note that the condition number of the normal equations is cond2 (A)2 , the square of
the condition number of A.
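The following self-contained MATLAB sketch (often called CGLS) reflects these three modifications; it is our own reimplementation with assumed names, not the book's Algorithm 13.1 with the edits applied.

function [x, it] = cgls(A, b, tol, itmax)
% Conjugate gradients applied to the normal equations A'*A*x = A'*b,
% without forming A'*A explicitly.
  [~, n] = size(A);
  x = zeros(n,1);
  r = A'*(b - A*x);  p = r;            % modification 1
  rho = r'*r;  rho0 = rho;
  for it = 1:itmax
    t = A*p;
    a = rho/(t'*t);                    % modification 2
    x = x + a*p;
    r = r - a*(A'*t);                  % modification 3
    rhonew = r'*r;
    if sqrt(rhonew) <= tol*sqrt(rho0), return; end
    p = r + (rhonew/rho)*p;
    rho = rhonew;
  end
end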
Exercise 13.11 (AT A Inner Product (Exam Exercise 2018-3)) In this problem
we consider linear systems of the form Ax = b, where A ∈ Rn×n and b ∈ Rn
are given, and x ∈ Rn is the unknown vector. We assume throughout that A is
nonsingular.
a) Let {v_i}_{i=1}^k be a set of linearly independent vectors in R^n, and let ⟨·, ·⟩ be an inner product in R^n. Explain why the k × k matrix N with entries n_{ij} = ⟨v_i, v_j⟩ is symmetric positive definite.
b) Let W ⊂ R^n be any linear subspace. Show that there is one and only one vector x̂ ∈ W so that
w^T A^T A x̂ = w^T A^T b,   for all w ∈ W,
and that x̂ satisfies
‖b − Ax̂‖₂ ≤ ‖b − Aw‖₂,   for all w ∈ W.
c) In the rest of this problem we consider the situation above, but where the vector
space W is taken to be the Krylov space
Wk := span(b, Ab, . . . , Ak−1 b).
We use the inner product in R^n given by
⟨v, w⟩_A := v^T A^T A w,   v, w ∈ R^n.
4 This system known as the normal equations appears in linear least squares problems and was
considered in this context in Chap. 9.
The associated approximations of x, corresponding to x̂ in W_k, are then denoted x_k. Assume that x_k ∈ W_k is already determined. In addition, assume that we already have computed a "search direction" p_k ∈ W_{k+1} such that ‖Ap_k‖₂ = ‖p_k‖_A = 1, and such that
⟨p_k, w⟩_A = 0,   for all w ∈ W_k.
Show that x_{k+1} = x_k + α_k p_k for a suitable α_k ∈ R, and express α_k in terms of the residual r_k := b − Ax_k, and p_k.
d) Assume that A is symmetric, but not necessarily positive definite. Assume
further that the vectors pk−2 , p k−1 , and pk are already known with properties as
above. Show that
Ap k−1 ∈ span(pk−2 , p k−1 , pk ).
Use this to suggest how the search vectors p k can be computed recursively.
13.7.3 Exercises Sect. 13.3
Exercise 13.12 (Krylov Space and cg Iterations) Consider the linear system
Ax = b where
A = [2 −1 0; −1 2 −1; 0 −1 2],   and b = [4, 0, 0]^T.
a) Determine the vectors defining the Krylov spaces for k ≤ 3 taking as initial approximation x_0 = 0. Answer: [b, Ab, A²b] = [4 8 20; 0 −4 −16; 0 0 4].
b) Carry out three CG-iterations on Ax = b. Answer:
[x_0, x_1, x_2, x_3] = [0 2 8/3 3; 0 0 4/3 2; 0 0 0 1],
[r_0, r_1, r_2, r_3] = [4 0 0 0; 0 2 0 0; 0 0 4/3 0],
[Ap_0, Ap_1, Ap_2] = [8 0 0; −4 3 0; 0 −2 16/9],
[p_0, p_1, p_2, p_3] = [4 1 4/9 0; 0 2 8/9 0; 0 0 12/9 0].
c) Verify that
• dim(W_k) = k for k = 0, 1, 2, 3.
• x_3 is the exact solution of Ax = b.
• r_0, . . . , r_{k−1} is an orthogonal basis for W_k for k = 1, 2, 3.
• p_0, . . . , p_{k−1} is an A-orthogonal basis for W_k for k = 1, 2, 3.
• ‖r_k‖ is monotonically decreasing.
• ‖x_k − x‖ is monotonically decreasing.
Exercise 13.13 (Antisymmetric System (Exam Exercise 1983-3)) In this and the next exercise ⟨x, y⟩ = x^T y is the usual inner product in R^n. We note that
⟨x, y⟩ = ⟨y, x⟩,   x, y ∈ R^n,   (13.61)
⟨Cx, y⟩ = ⟨y, C^T x⟩ = ⟨C^T x, y⟩ (by (13.61)),   x, y ∈ R^n, C ∈ R^{n×n}.   (13.62)
Let B ∈ Rn×n be an antisymmetric matrix, i.e., B T = −B, and let A := I − B,
where I is the unit matrix in Rn .
a) Show that
⟨Bx, x⟩ = 0,   x ∈ R^n,   (13.63)
⟨Ax, x⟩ = ⟨x, x⟩ = ‖x‖₂².   (13.64)
b) Show that ‖Ax‖₂² = ‖x‖₂² + ‖Bx‖₂² and that ‖A‖₂ = √(1 + ‖B‖₂²).
c) Show that A is nonsingular,
‖A^{−1}‖₂ = max_{x≠0} ‖x‖₂/‖Ax‖₂,
and ‖A^{−1}‖₂ ≤ 1.
d) Let 1 ≤ k ≤ n, W = span(w_1, . . . , w_k) a k-dimensional subspace of R^n and b ∈ R^n. Show that if x ∈ W is such that
⟨Ax, w⟩ = ⟨b, w⟩ for all w ∈ W,   (13.65)
then ‖x‖₂ ≤ ‖b‖₂.
With x := Σ_{j=1}^k x_j w_j the problem (13.65) is equivalent to finding real numbers x_1, . . . , x_k solving the linear system
Σ_{j=1}^k x_j ⟨Aw_j, w_i⟩ = ⟨b, w_i⟩,   i = 1, . . . , k.   (13.66)
Show that (13.65) has a unique solution x ∈ W.
e) Let x∗ := A^{−1}b. Show that
‖x∗ − x‖₂ ≤ ‖A‖₂ min_{w∈W} ‖x∗ − w‖₂.   (13.67)
Exercise 13.14 (cg Antisymmetric System (Exam Exercise 1983-4)) (It is recommended to study Exercise 13.13 before starting this exercise.) As in Exercise 13.13 let B ∈ R^{n×n} be an antisymmetric matrix, i.e., B^T = −B, let ⟨x, y⟩ = x^T y be the usual inner product in R^n, let A := I − B, where I is the unit matrix in R^n, and let b ∈ R^n. The purpose of this exercise is to develop an iterative algorithm for the linear system Ax = b. The algorithm is partly built on the same idea as the conjugate gradient method for positive definite systems.
Let x_0 = 0 be the initial approximation to the exact solution x∗ := A^{−1}b. For k = 1, 2, . . . , n we let
W_k := span(b, Bb, . . . , B^{k−1}b).
For k = 1, 2, . . . , n we define x_k ∈ W_k by
⟨Ax_k, w⟩ = ⟨b, w⟩,   for all w ∈ W_k.
The vector x_k is uniquely determined as shown in Exercise 13.13 d), and that it is a "good" approximation to x∗ follows from (13.67). In this exercise we will derive a recursive algorithm to determine x_k.
For k = 0, . . . , n we set
r_k := b − Ax_k, and ρ_k := ‖r_k‖₂².
Let m ∈ N be such that
ρ_k ≠ 0,   k = 0, . . . , m.
Let ω_0, ω_1, . . . , ω_m be real numbers defined recursively for k = 1, 2, . . . , m by
ω_k := 1 if k = 0, and ω_k := (1 + ω_{k−1}^{−1} ρ_k/ρ_{k−1})^{−1} otherwise.   (13.68)
We will show below that x_k and r_k satisfy the following recurrence relations for k = 0, 1, . . . , m − 1:
x_{k+1} = (1 − ω_k)x_{k−1} + ω_k(x_k + r_k),   (13.69)
r_{k+1} = (1 − ω_k)r_{k−1} + ω_k B r_k,   (13.70)
starting with x_0 = x_{−1} = 0 and r_0 = r_{−1} = b.
a) Show that 0 < ω_k < 1 for k = 1, 2, . . . , m.
b) Explain briefly how to define an iterative algorithm for determining x_k using the formulas (13.68), (13.69), (13.70) and estimate the number of arithmetic operations in each iteration.
c) Show that ⟨r_k, r_j⟩ = 0 for j = 0, 1, . . . , k − 1.
d) Show that if k ≤ m + 1 then W_k = span(r_0, r_1, . . . , r_{k−1}) and dim W_k = k.
e) Show that if 1 ≤ k ≤ m − 1 then
B r_k = α_k r_{k+1} + β_k r_{k−1},   (13.71)
where α_k := ⟨B r_k, r_{k+1}⟩/ρ_{k+1} and β_k := ⟨B r_k, r_{k−1}⟩/ρ_{k−1}.
f) Define α_0 := ⟨B r_0, r_1⟩/ρ_1 and show that α_0 = 1.
g) Show that if 1 ≤ k ≤ m − 1 then β_k = −α_{k−1} ρ_k/ρ_{k−1}.
h) Show that⁵
⟨r_{k+1}, A^{−1} r_{k+1}⟩ = ⟨r_{k+1}, A^{−1} r_j⟩,   j = 0, 1, . . . , k.   (13.72)
i) Use (13.71) and (13.72) to show that α_k + β_k = 1 for k = 1, 2, . . . , m − 1.
j) Show that α_k ≥ 1 for k = 1, 2, . . . , m − 1.
k) Show that x_k, r_k and ω_k satisfy the recurrence relations (13.68), (13.69) and (13.70).
13.7.4 Exercises Sect. 13.4
Exercise 13.15 (Another Explicit Formula for the Chebyshev Polynomial) Show that
T_n(t) = cosh(n arccosh t) for t ≥ 1,
where arccosh is the inverse function of cosh x := (e^x + e^{−x})/2.
⁵ Hint: Show that A^{−1}(r_{k+1} − r_j) ∈ W_{k+1}.
Exercise 13.16 (Maximum of a Convex Function) Show that if f : [a, b] → R
is convex then maxa≤x≤b f (x) ≤ max{f (a), f (b)}.
13.7.5 Exercises Sect. 13.5
Exercise 13.17 (Variable Coefficient) For m = 2, show that (13.57) takes the form
[ a_{1,1}      −c_{3/2,1}   −c_{1,3/2}   0          ] [v_{1,1}]   [(dv)_{1,1}]
[ −c_{3/2,1}   a_{2,2}      0            −c_{2,3/2} ] [v_{2,1}] = [(dv)_{2,1}]
[ −c_{1,3/2}   0            a_{3,3}      −c_{3/2,2} ] [v_{1,2}]   [(dv)_{1,2}]
[ 0            −c_{2,3/2}   −c_{3/2,2}   a_{4,4}    ] [v_{2,2}]   [(dv)_{2,2}]
where
a_{1,1} = c_{1/2,1} + c_{1,1/2} + c_{1,3/2} + c_{3/2,1},
a_{2,2} = c_{3/2,1} + c_{2,1/2} + c_{2,3/2} + c_{5/2,1},
a_{3,3} = c_{1/2,2} + c_{1,3/2} + c_{1,5/2} + c_{3/2,2},
a_{4,4} = c_{3/2,2} + c_{2,3/2} + c_{2,5/2} + c_{5/2,2}.
Show that the matrix A is symmetric, and if c(x, y) > 0 for all (x, y) ∈ Ω then it is strictly diagonally dominant.
13.8 Review Questions
13.8.1 Do the steepest descent and conjugate gradient methods always converge?
13.8.2 What kind of orthogonalities occur in the conjugate gradient method?
13.8.3 What is a Krylov space?
13.8.4 What is a convex function?
13.8.5 How do SOR and conjugate gradient compare?
Part VI
Eigenvalues and Eigenvectors
In this and the next chapter we briefly give some numerical methods for finding one or more eigenvalues and eigenvectors of a matrix. Both Hermitian and non-Hermitian matrices are considered.
But first we consider a location result for eigenvalues and then give a useful upper bound for how much an eigenvalue can change when the elements of the matrix are perturbed.
Chapter 14
Numerical Eigenvalue Problems
14.1 Eigenpairs
Consider the eigenpair problem for some classes of matrices A ∈ Cn×n .
Diagonal Matrices The eigenpairs are easily determined. Since Aei = aii ei
the eigenpairs are (λi , ei ), where λi = aii for i = 1, . . . , n. Moreover, the
eigenvectors of A are linearly independent.
Triangular Matrices Suppose A is upper or lower triangular. Consider finding the eigenvalues. Since det(A − λI) = ∏_{i=1}^n (a_{ii} − λ), the eigenvalues are λ_i = a_{ii} for i = 1, . . . , n, the diagonal elements of A. To determine the eigenvectors can be more challenging since A can be defective, i.e., the eigenvectors are not necessarily linearly independent, cf. Chap. 6.
Block Diagonal Matrices Suppose
A = diag(A1 , A2 , . . . , Ar ),
Ai ∈ Cmi ×mi .
Here the eigenpair problem reduces to r smaller problems. Let Ai X i = Xi D i
define the eigenpairs of Ai for i = 1, . . . , r and let X := diag(X1 , . . . , X r ),
D := diag(D_1, . . . , D_r). Then the eigenpairs for A are given by
AX = diag(A_1, . . . , A_r) diag(X_1, . . . , X_r) = diag(A_1 X_1, . . . , A_r X_r) = diag(X_1 D_1, . . . , X_r D_r) = XD.
Block Triangular Matrices Let A_{11}, A_{22}, . . . , A_{rr} be the diagonal blocks of A. By Property 8 of determinants
det(A − λI) = ∏_{i=1}^r det(A_{ii} − λI)
and the eigenvalues are found from the eigenvalues of the diagonal blocks.
In this and the next chapter we consider some numerical methods for finding one
or more of the eigenvalues and eigenvectors of a matrix A ∈ Cn×n . Maybe the first
method which comes to mind is to form the characteristic polynomial πA of A, and
then use a polynomial root finder, like Newton’s method to determine one or several
of the eigenvalues.
It turns out that this is not suitable as an all-purpose method. One reason is that a small change in one of the coefficients of π_A(λ) can lead to a large change in the roots of the polynomial. For example, if π_A(λ) = λ^{16} and q(λ) = λ^{16} − 10^{−16}, then the roots of π_A are all equal to zero, while the roots of q are λ_j = 10^{−1} e^{2πij/16}, j = 1, . . . , 16. The roots of q have absolute value 0.1, so a perturbation of magnitude 10^{−16} in one of the polynomial coefficients has led to an error of approximately 0.1 in the roots. The situation can be somewhat remedied by representing the polynomials in a different basis.
In this text we will only consider methods which work directly with the matrix.
But before that, in Sect. 14.3 we consider how much the eigenvalues change when
the elements in the matrix are perturbed. We start with a simple but useful result for
locating the eigenvalues.
14.2 Gershgorin’s Theorem
The following theorem is useful for locating eigenvalues of an arbitrary square
matrix.
Theorem 14.1 (Gershgorin’s Circle Theorem) Suppose A ∈ Cn×n . Define for
i = 1, 2, . . . , n
R_i = {z ∈ C : |z − a_{ii}| ≤ r_i},   r_i := Σ_{j=1, j≠i}^n |a_{ij}|,
C_j = {z ∈ C : |z − a_{jj}| ≤ c_j},   c_j := Σ_{i=1, i≠j}^n |a_{ij}|.
14.2 Gershgorin’s Theorem
319
Fig. 14.1 The Gershgorin disk R_i: a disk in the complex plane with center a_{i,i} and radius r_i (horizontal axis: real part; vertical axis: imaginary part)
Then any eigenvalue of A lies in R ∩ C where R = R1 ∪ R2 ∪ · · · ∪ Rn and
C = C1 ∪ C2 ∪ · · · ∪ Cn .
Proof Suppose (λ, x) is an eigenpair for A. We claim that λ ∈ R_i, where i is such that |x_i| = ‖x‖_∞. Indeed, Ax = λx implies that Σ_j a_{ij}x_j = λx_i, or (λ − a_{ii})x_i = Σ_{j≠i} a_{ij}x_j. Dividing by x_i and taking absolute values we find
|λ − a_{ii}| = |Σ_{j≠i} a_{ij}x_j/x_i| ≤ Σ_{j≠i} |a_{ij}||x_j/x_i| ≤ r_i
since |x_j/x_i| ≤ 1 for all j. Thus λ ∈ R_i.
Since λ is also an eigenvalue of AT , it must be in one of the row disks of AT .
But these are the column disks Cj of A. Hence λ ∈ Cj for some j .
The set Ri is a subset of the complex plane consisting of all points inside a circle
with center at aii and radius ri , c.f. Fig. 14.1. Ri is called a (Gerschgorin) row disk.
An eigenvalue λ lies in the union of the row disks R1 , . . . , Rn and also in the
union of the column disks C1 , . . . , Cn . If A is Hermitian then Ri = Ci for i =
1, 2, . . . , n. Moreover, in this case the eigenvalues of A are real, and the Gerschgorin
disks can be taken to be intervals on the real line.
Example 14.1 (Gershgorin) Let T = tridiag(−1, 2, −1) ∈ R^{m×m} be the second derivative matrix. Since T is Hermitian we have R_i = C_i for all i and the eigenvalues are real. We find R_1 = R_m = {z ∈ R : |z − 2| ≤ 1} and
R_i = {z ∈ R : |z − 2| ≤ 2},   i = 2, 3, . . . , m − 1.
We conclude that λ ∈ [0, 4] for any eigenvalue λ of T. To check this, we recall that by Lemma 2.2 the eigenvalues of T are given by
λ_j = 4 sin²( jπ/(2(m + 1)) ),   j = 1, 2, . . . , m.
"2
!
π
When m is large the smallest eigenvalue 4 sin 2(m+1)
is very close to zero and
"2
!
mπ
the largest eigenvalue 4 sin 2(m+1)
is very close to 4. Thus Gerschgorin’s theorem
gives a remarkably good estimate for large m.
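A quick numerical illustration of this example (our own snippet, not from the book) verifies that all eigenvalues of T lie in the union [0, 4] of the Gershgorin intervals:

% Gershgorin check for T = tridiag(-1,2,-1), m = 10 (illustrative).
m = 10;
T = full(gallery('tridiag', m, -1, 2, -1));
r = sum(abs(T), 2) - abs(diag(T));      % row radii r_i
lo = min(diag(T) - r);  hi = max(diag(T) + r);
lam = eig(T);
fprintf('all eigenvalues in [%g, %g]: %d\n', lo, hi, all(lam >= lo & lam <= hi))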
Sometimes some of the Gerschgorin disks are disjoint from the others, and we have
Corollary 14.1 (Disjoint Gershgorin Disks) If p of the Gershgorin row disks are
disjoint from the others, the union of these disks contains precisely p eigenvalues.
The same result holds for the column disks.
Proof Consider a family of matrices
A(t) := D + t(A − D),   D := diag(a_{11}, . . . , a_{nn}),   t ∈ [0, 1].
We have A(0) = D and A(1) = A. As a function of t, every eigenvalue of A(t) is a continuous function of t. This follows from Theorem 14.2, see Exercise 14.5. The row disks R_i(t) of A(t) have radius proportional to t, indeed
R_i(t) = {z ∈ C : |z − a_{ii}| ≤ t r_i},   r_i := Σ_{j=1, j≠i}^n |a_{ij}|.
Clearly 0 ≤ t_1 < t_2 ≤ 1 implies R_i(t_1) ⊂ R_i(t_2), and R_i(1) is a row disk of A for all i. Suppose the disks R_{i_1}(1), . . . , R_{i_p}(1) are disjoint from the other disks of A and set R^p(t) := ∪_{k=1}^p R_{i_k}(t) for t ∈ [0, 1]. Now R^p(0) contains only the p eigenvalues a_{i_1,i_1}, . . . , a_{i_p,i_p} of A(0) = D. As t increases from zero to one the set R^p(t) is disjoint from the other row disks of A and by the continuity of the eigenvalues cannot lose or gain eigenvalues. It follows that R^p(1) must contain p eigenvalues of A.
Example 14.2 Consider the matrix A = [1 ε_1 ε_2; ε_3 2 ε_4; ε_5 ε_6 3], where |ε_i| ≤ 10^{−15} for all i. By Corollary 14.1 the eigenvalues λ_1, λ_2, λ_3 of A are distinct and satisfy |λ_j − j| ≤ 2 × 10^{−15} for j = 1, 2, 3.
14.3 Perturbation of Eigenvalues
In this section we study the following problem: given matrices A, E ∈ C^{n×n}, where we think of E as a perturbation of A, by how much do the eigenvalues of A and A + E differ? Not surprisingly this problem is more complicated than the corresponding problem for linear systems.
We illustrate this by considering two examples. Suppose A_0 := 0 is the zero matrix. If λ ∈ σ(A_0 + E) = σ(E), then |λ| ≤ ‖E‖_∞ by Theorem 12.11, and any
zero eigenvalue of A_0 is perturbed by at most ‖E‖_∞. On the other hand consider for ε > 0 the matrices
A_1 := [0 1 0 ⋯ 0; 0 0 1 ⋯ 0; ⋮ ⋱ ⋱ ⋮; 0 0 ⋯ 0 1; 0 0 ⋯ 0 0],   E := [0 0 ⋯ 0 0; ⋮ ⋮; 0 0 ⋯ 0 0; ε 0 ⋯ 0 0] = ε e_n e_1^T.
The characteristic polynomial of A_1 + E is π(λ) := (−1)^n(λ^n − ε), and the zero eigenvalues of A_1 are perturbed by the amount |λ| = ‖E‖_∞^{1/n}. Thus, for n = 16, a perturbation of say ε = 10^{−16} gives a change in eigenvalue of 0.1.
The following theorem shows that a dependence ‖E‖_∞^{1/n} is the worst that can happen.
Theorem 14.2 (Elsner’s Theorem (1985)) Suppose A, E ∈ Cn×n . To every μ ∈
σ (A + E) there is a λ ∈ σ (A) such that
1/n
|μ − λ| ≤ KE2 ,
1−1/n
K = A2 + A + E2
.
(14.1)
Proof Suppose A has eigenvalues λ_1, . . . , λ_n and let λ_1 be one which is closest to μ. Let u_1 with ‖u_1‖₂ = 1 be an eigenvector corresponding to μ, and extend u_1 to an orthonormal basis {u_1, . . . , u_n} of C^n. Note that
‖(μI − A)u_1‖₂ = ‖(A + E)u_1 − Au_1‖₂ = ‖Eu_1‖₂ ≤ ‖E‖₂,
∏_{j=2}^n ‖(μI − A)u_j‖₂ ≤ ∏_{j=2}^n (|μ| + ‖Au_j‖₂) ≤ (‖A + E‖₂ + ‖A‖₂)^{n−1}.
Using this and Hadamard's inequality (5.21) we find
|μ − λ_1|^n ≤ ∏_{j=1}^n |μ − λ_j| = |det(μI − A)| = |det((μI − A)[u_1, . . . , u_n])|
≤ ‖(μI − A)u_1‖₂ ∏_{j=2}^n ‖(μI − A)u_j‖₂ ≤ ‖E‖₂ (‖A + E‖₂ + ‖A‖₂)^{n−1}.
The result follows by taking nth roots in this inequality.
It follows from this theorem that the eigenvalues depend continuously on the elements of the matrix. The factor ‖E‖₂^{1/n} shows that this dependence is almost, but not quite, differentiable. As an example, the eigenvalues of the matrix [1 1; ε 1] are 1 ± √ε, and these expressions are not differentiable at ε = 0.
14.3.1 Nondefective Matrices
Recall that a matrix is nondefective if the eigenvectors form a basis for C^n. For nondefective matrices we can get rid of the annoying exponent 1/n in ‖E‖₂ in (14.1). For a more general discussion than the one in the following theorem see [19].
Theorem 14.3 (Absolute Errors) Suppose A ∈ C^{n×n} has linearly independent eigenvectors {x_1, . . . , x_n} and let X = [x_1, . . . , x_n] be the eigenvector matrix. To any μ ∈ C and x ∈ C^n with ‖x‖_p = 1 we can find an eigenvalue λ of A such that
|λ − μ| ≤ K_p(X)‖r‖_p,   1 ≤ p ≤ ∞,   (14.2)
where r := Ax − μx and K_p(X) := ‖X‖_p ‖X^{−1}‖_p. If for some E ∈ C^{n×n} it holds that (μ, x) is an eigenpair for A + E, then we can find an eigenvalue λ of A such that
|λ − μ| ≤ K_p(X)‖E‖_p,   1 ≤ p ≤ ∞.   (14.3)
Proof If μ ∈ σ(A) then we can take λ = μ and (14.2), (14.3) hold trivially. So assume μ ∉ σ(A). Since A is nondefective it can be diagonalized: we have A = XDX^{−1}, where D = diag(λ_1, . . . , λ_n) and (λ_j, x_j) are the eigenpairs of A for j = 1, . . . , n. Define D_1 := D − μI. Then D_1^{−1} = diag((λ_1 − μ)^{−1}, . . . , (λ_n − μ)^{−1}) exists and
X D_1^{−1} X^{−1} r = X(D − μI)^{−1}X^{−1}(A − μI)x = (A − μI)^{−1}(A − μI)x = x.
Using this and Lemma 14.1 below we obtain
1 = ‖x‖_p = ‖X D_1^{−1} X^{−1} r‖_p ≤ ‖D_1^{−1}‖_p K_p(X)‖r‖_p = K_p(X)‖r‖_p / min_j |λ_j − μ|.
But then (14.2) follows. If (A + E)x = μx then 0 = Ax − μx + Ex = r + Ex. But then ‖r‖_p = ‖−Ex‖_p ≤ ‖E‖_p. Inserting this in (14.2) proves (14.3).
The equation (14.3) shows that for a nondefective matrix the absolute error can
be magnified by at most Kp (X), the condition number of the eigenvector matrix
with respect to inversion. If Kp (X) is small then a small perturbation changes the
eigenvalues by small amounts.
Even if we get rid of the exponent 1/n, the equation (14.3) illustrates that it can be
difficult or sometimes impossible to compute accurate eigenvalues and eigenvectors
of matrices with almost linearly dependent eigenvectors. On the other hand the
eigenvalue problem for normal matrices is better conditioned. Indeed, if A is normal
then it has a set of orthonormal eigenvectors and the eigenvector matrix is unitary. If
we restrict attention to the 2-norm then K2 (X) = 1 and (14.3) implies the following
result.
Theorem 14.4 (Perturbations, Normal Matrix) Suppose A ∈ Cn×n is normal
and let μ be an eigenvalue of A + E for some E ∈ Cn×n . Then we can find an
eigenvalue λ of A such that |λ − μ| ≤ ‖E‖₂.
For an even stronger result for Hermitian matrices see Corollary 6.13. We
conclude that the situation for the absolute error in an eigenvalue of a Hermitian
matrix is quite satisfactory. Small perturbations in the elements are not magnified in
the eigenvalues.
In the proof of Theorem 14.3 we used that the p-norm of a diagonal matrix is
equal to its spectral radius.
Lemma 14.1 (p-Norm of a Diagonal Matrix) If A = diag(λ_1, . . . , λ_n) is a diagonal matrix then ‖A‖_p = ρ(A) for 1 ≤ p ≤ ∞.
Proof For p = ∞ the proof is left as an exercise. For any x ∈ C^n and p < ∞ we have
‖Ax‖_p = ‖[λ_1 x_1, . . . , λ_n x_n]^T‖_p = ( Σ_{j=1}^n |λ_j|^p |x_j|^p )^{1/p} ≤ ρ(A)‖x‖_p.
Thus ‖A‖_p = max_{x≠0} ‖Ax‖_p/‖x‖_p ≤ ρ(A). But from Theorem 12.11 we have ρ(A) ≤ ‖A‖_p and the proof is complete.
For the accuracy of an eigenvalue of small magnitude we are interested in the
size of the relative error.
Theorem 14.5 (Relative Errors) Suppose in Theorem 14.3 that A ∈ C^{n×n} is nonsingular. To any μ ∈ C and x ∈ C^n with ‖x‖_p = 1, we can find an eigenvalue λ of A such that
|λ − μ|/|λ| ≤ K_p(X)K_p(A) ‖r‖_p/‖A‖_p,   1 ≤ p ≤ ∞,   (14.4)
where r := Ax − μx. If for some E ∈ C^{n×n} it holds that (μ, x) is an eigenpair for A + E, then we can find an eigenvalue λ of A such that
|λ − μ|/|λ| ≤ K_p(X)‖A^{−1}E‖_p ≤ K_p(X)K_p(A) ‖E‖_p/‖A‖_p,   1 ≤ p ≤ ∞.   (14.5)
Proof Applying Theorem 12.11 to A^{−1} we have for any λ ∈ σ(A)
1/|λ| ≤ ‖A^{−1}‖_p = K_p(A)/‖A‖_p
and (14.4) follows from (14.2). To prove (14.5) we define the matrices B := μA^{−1} and F := −A^{−1}E. If (λ_j, x_j) are the eigenpairs for A then (μ/λ_j, x_j) are the eigenpairs for B for j = 1, . . . , n. Since (μ, x) is an eigenpair for A + E we find
(B + F − I)x = (μA^{−1} − A^{−1}E − I)x = A^{−1}(μI − (E + A))x = 0.
Thus (1, x) is an eigenpair for B + F. Applying Theorem 14.3 to this eigenvalue we can find λ ∈ σ(A) such that |μ/λ − 1| ≤ K_p(X)‖F‖_p = K_p(X)‖A^{−1}E‖_p, which proves the first estimate in (14.5). The second inequality in (14.5) follows from the submultiplicativity of the p-norm.
14.4 Unitary Similarity Transformation of a Matrix
into Upper Hessenberg Form
Before attempting to find eigenvalues and eigenvectors of a matrix (exceptions are
made for certain sparse matrices), it is often advantageous to reduce it by similarity
transformations to a simpler form. Orthogonal or unitary similarity transformations
are particularly important since they are insensitive to round-off errors in the
elements of the matrix. In this section we show how this reduction can be carried
out.
Recall that a matrix A ∈ Cn×n is upper Hessenberg if ai,j = 0 for j =
1, 2, . . . , i −2, i = 3, 4, . . . , n. We will reduce A ∈ Cn×n to upper Hessenberg form
by unitary similarity transformations. Let A1 = A and define Ak+1 = H k Ak H k for
k = 1, 2, . . . , n − 2. Here H k is a Householder transformation chosen to introduce
zeros in the elements of column k of Ak under the subdiagonal. The final matrix
An−1 will be upper Hessenberg. Householder transformations were used in Chap. 5
to reduce a matrix to upper triangular form. To preserve eigenvalues similarity
transformations are needed and then the final matrix in the reduction cannot in
general be upper triangular.
If A1 = A is Hermitian, the matrix An−1 will be Hermitian and tridiagonal. For
if A∗k = Ak then
A∗k+1 = (H k Ak H k )∗ = H k A∗k H k = Ak+1 .
Since An−1 is upper Hessenberg and Hermitian, it must be tridiagonal.
To describe the reduction to upper Hessenberg or tridiagonal form in more detail we partition A_k as follows:
A_k = [B_k C_k; D_k E_k].
Suppose B_k ∈ C^{k,k} is upper Hessenberg, and the first k − 1 columns of D_k ∈ C^{n−k,k} are zero, i.e. D_k = [0, 0, . . . , 0, d_k]. Let V_k = I − v_k v_k^* ∈ C^{n−k,n−k} be a Householder transformation such that V_k d_k = α_k e_1. Define
H_k = [I_k 0; 0 V_k] ∈ C^{n×n}.
The matrix H_k is a Householder transformation, and we find
A_{k+1} = H_k A_k H_k = [I_k 0; 0 V_k][B_k C_k; D_k E_k][I_k 0; 0 V_k] = [B_k, C_k V_k; V_k D_k, V_k E_k V_k].
Now V k D k = [V k 0, . . . , V k 0, V k d k ] = (0, . . . , 0, αk e1 ). Moreover, the matrix
B k is not affected by the H k transformation. Therefore the upper left (k+1)×(k+1)
corner of Ak+1 is upper Hessenberg and the reduction is carried one step further. The
reduction stops with An−1 which is upper Hessenberg.
To find Ak+1 we use Algorithm 5.1 to find v k and αk . We store v k in the kth
column of a matrix L as L(k + 1 : n, k) = v k . This leads to the following algorithm
for reducing a matrix A ∈ Cn×n to upper Hessenberg form using Householder
transformations. The algorithm returns the reduced matrix B. B is tridiagonal if A
is symmetric. Details of the transformations are stored in a lower triangular matrix
L, also returned by the algorithm. The elements of L can be used to assemble a
unitary matrix Q such that B = Q∗ AQ. Algorithm 5.1 is used in each step of the
reduction:
function [L,B] = hesshousegen(A)
  n=length(A); L=zeros(n,n); B=A;
  for k=1:n-2
    [v,B(k+1,k)]=housegen(B(k+1:n,k));
    L((k+1):n,k)=v; B((k+2):n,k)=zeros(n-k-1,1);
    C=B((k+1):n,(k+1):n);
    B((k+1):n,(k+1):n)=C-v*(v'*C);
    C=B(1:n,(k+1):n); B(1:n,(k+1):n)=C-(C*v)*v';
  end
end
Listing 14.1 hesshousegen
14.4.1 Assembling Householder Transformations
We can use the output of Algorithm 14.1 to assemble the matrix Q ∈ R^{n×n} such that Q is orthogonal and Q^*AQ is upper Hessenberg. We need to compute the product Q = H_1 H_2 ⋯ H_{n−2}, where H_k = [I_k 0; 0 I − v_k v_k^T] and v_k ∈ R^{n−k}. Since v_1 ∈ R^{n−1} and v_{n−2} ∈ R² it is most economical to assemble the product from right to left. We compute
Q_{n−1} = I and Q_k = H_k Q_{k+1} for k = n − 2, n − 3, . . . , 1.
Suppose Q_{k+1} has the form Q_{k+1} = [I_k 0; 0 U_k], where U_k ∈ R^{n−k,n−k}. Then
Q_k = H_k Q_{k+1} = [I_k 0; 0 I − v_k v_k^T][I_k 0; 0 U_k] = [I_k 0; 0 U_k − v_k(v_k^T U_k)].
This leads to the following algorithm for assembling Householder transformations.
The algorithm assumes that L is output from Algorithm 14.1, and assembles an
orthogonal matrix Q from the columns of L so that Q∗ AQ is upper Hessenberg.
function Q = accumulateQ(L)
  n=length(L); Q=eye(n);
  for k=n-2:-1:1
    v=L((k+1):n,k); C=Q((k+1):n,(k+1):n);
    Q((k+1):n,(k+1):n)=C-v*(v'*C);
  end
Listing 14.2 accumulateQ
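A small test (our own snippet, assuming the housegen routine of Algorithm 5.1 is available) indicates how the two functions fit together; it checks that Q is orthogonal and that Q^*AQ agrees with the returned Hessenberg matrix up to rounding errors.

% Illustrative check of hesshousegen and accumulateQ.
n = 6;
A = randn(n);                        % a random real test matrix
[L, B] = hesshousegen(A);
Q = accumulateQ(L);
disp(norm(Q'*Q - eye(n)))            % should be of the order of machine epsilon
disp(norm(Q'*A*Q - B))               % B = Q'*A*Q up to rounding errors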
14.5 Computing a Selected Eigenvalue of a Symmetric
Matrix
Let A ∈ Rn×n be symmetric with eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn . In this section
we consider a method to compute an approximation to the mth eigenvalue λm for
some 1 ≤ m ≤ n. Using Householder similarity transformations as outlined in the
previous section we can assume that A is symmetric and tridiagonal,
A = [ d_1 c_1; c_1 d_2 c_2; ⋱ ⋱ ⋱; c_{n−2} d_{n−1} c_{n−1}; c_{n−1} d_n ].   (14.6)
Suppose one of the off-diagonal elements is equal to zero, say c_i = 0. We then have A = [A_1 0; 0 A_2], where
A_1 = [ d_1 c_1; c_1 d_2 c_2; ⋱ ⋱ ⋱; c_{i−2} d_{i−1} c_{i−1}; c_{i−1} d_i ]   and   A_2 = [ d_{i+1} c_{i+1}; c_{i+1} d_{i+2} c_{i+2}; ⋱ ⋱ ⋱; c_{n−2} d_{n−1} c_{n−1}; c_{n−1} d_n ].
Thus A is block diagonal and we can split the eigenvalue problem into two smaller
problems involving A1 and A2 . We assume that this reduction has been carried out
so that A is irreducible, i.e., c_i ≠ 0 for i = 1, . . . , n − 1.
We first show that irreducibility implies that the eigenvalues are distinct.
Lemma 14.2 (Distinct Eigenvalues of a Tridiagonal Matrix) An irreducible,
tridiagonal and symmetric matrix A ∈ Rn×n has n real and distinct eigenvalues.
Proof Let A be given by (14.6). By Theorem 6.10 the eigenvalues are real. Define
for x ∈ R the polynomial pk (x) := det(xI k − Ak ) for k = 1, . . . , n, where Ak
is the upper left k × k corner of A (the leading principal submatrix of order k).
The eigenvalues of A are the roots of the polynomial pn . Using the last column to
expand for k ≥ 2 the determinant pk+1 (x) we find
p_{k+1}(x) = (x − d_{k+1}) p_k(x) − c_k² p_{k−1}(x).   (14.7)
Since p_1(x) = x − d_1 and p_2(x) = (x − d_2)(x − d_1) − c_1², this also holds for k = 0, 1 if we define p_{−1}(x) = 0 and p_0(x) = 1. For M sufficiently large we have
p_2(−M) > 0,   p_2(d_1) < 0,   p_2(+M) > 0.
Since p_2 is continuous there are y_1 ∈ (−M, d_1) and y_2 ∈ (d_1, M) such that p_2(y_1) = p_2(y_2) = 0. It follows that the root d_1 of p_1 separates the roots of p_2, so y_1 and y_2 must be distinct. Consider next
p_3(x) = (x − d_3)p_2(x) − c_2² p_1(x) = (x − d_3)(x − y_1)(x − y_2) − c_2²(x − d_1).
Since y1 < d1 < y2 we have for M sufficiently large
p3 (−M) < 0,
p3 (y1 ) > 0,
p3 (y2 ) < 0,
p3 (+M) > 0.
Thus the roots x1 , x2 , x3 of p3 are separated by the roots y1 , y2 of p2 . In the general
case suppose for k ≥ 2 that the roots z1 , . . . , zk−1 of pk−1 separate the roots
y1 , . . . , yk of pk . Choose M so that y0 := −M < y1 , yk+1 := M > yk . Then
y0 < y1 < z1 < y2 < z2 · · · < zk−1 < yk < yk+1 .
We claim that for M sufficiently large
p_{k+1}(y_j) = (−1)^{k+1−j}|p_{k+1}(y_j)| ≠ 0,   for j = 0, 1, . . . , k + 1.
This holds for j = 0, k + 1, and for j = 1, . . . , k since
p_{k+1}(y_j) = −c_k² p_{k−1}(y_j) = −c_k²(y_j − z_1) ⋯ (y_j − z_{k−1}).
It follows that the roots x1 , . . . , xk+1 are separated by the roots y1 , . . . , yk of pk and
by induction the roots of pn (the eigenvalues of A) are distinct.
14.5.1 The Inertia Theorem
We say that two matrices A, B ∈ Cn×n are congruent if A = E ∗ BE for some
nonsingular matrix E ∈ Cn×n . By Theorem 6.7 a Hermitian matrix A is both
congruent and similar to a diagonal matrix D, U ∗ AU = D where U is unitary. The
eigenvalues of A are the diagonal elements of D. Let π(A), ζ(A) and υ(A) denote
the number of positive, zero and negative eigenvalues of A. If A is Hermitian then
all eigenvalues are real and π(A) + ζ (A) + υ(A) = n.
Theorem 14.6 (Sylvester’s Inertia Theorem) If A, B ∈ Cn×n are Hermitian and
congruent then π(A) = π(B), ζ (A) = ζ (B) and υ(A) = υ(B).
Proof Suppose A = E ∗ BE, where E is nonsingular. Assume first that A and B
are diagonal matrices. Suppose π(A) = k and π(B) = m < k. We shall show that
this leads to a contradiction. Let E 1 be the upper left m × k corner of E. Since
m < k, we can find a nonzero x such that E 1 x = 0 (cf. Lemma 1.3). Let y T =
[x T , 0T ] ∈ Cn , and z = [z1 , . . . , zn ]T = Ey. Then zi = 0 for i = 1, 2, . . . , m.
If A has positive eigenvalues λ1 , . . . , λk and B has eigenvalues μ1 , . . . , μn , where
μi ≤ 0 for i ≥ m + 1 then
y^*Ay = Σ_{i=1}^n λ_i|y_i|² = Σ_{i=1}^k λ_i|x_i|² > 0.
i=1
But
y^*Ay = y^*E^*BEy = z^*Bz = Σ_{i=m+1}^n μ_i|z_i|² ≤ 0,
a contradiction.
We conclude that π(A) = π(B) if A and B are diagonal. Moreover, υ(A) =
π(−A) = π(−B) = υ(B) and ζ (A) = n − π(A) − υ(A) = n − π(B) − υ(B) =
ζ (B). This completes the proof for diagonal matrices.
Let in the general case U 1 and U 2 be unitary matrices such that U ∗1 AU 1 = D 1
and U ∗2 BU 2 = D 2 where D 1 and D 2 are diagonal matrices. Since A = E ∗ BE,
we find D 1 = F ∗ D 2 F where F = U ∗2 EU 1 is nonsingular. Thus D 1 and D 2
are congruent diagonal matrices. But since A and D 1 , B and D 2 have the same
eigenvalues, we find π(A) = π(D 1 ) = π(D 2 ) = π(B). Similar results hold for ζ
and υ.
Corollary 14.2 (Counting Eigenvalues Using the LDL* Factorization) Suppose A = tridiag(c_i, d_i, c_i) ∈ R^{n×n} is symmetric and that α ∈ R is such that A − αI has a symmetric LU factorization, i.e. A − αI = LDL^T where L is unit lower triangular and D is diagonal. Then the number of eigenvalues of A strictly less than α equals the number of negative diagonal elements in D. The diagonal elements d_1(α), . . . , d_n(α) in D can be computed recursively as follows:
d_1(α) = d_1 − α,   d_k(α) = d_k − α − c_{k−1}²/d_{k−1}(α),   k = 2, 3, . . . , n.   (14.8)
(14.8)
Proof Since the diagonal elements in L in an LU factorization equal the diagonal elements in D in an LDL^T factorization, we see that the formulas in (14.8) follow immediately from (2.16). Since L is nonsingular, A − αI and D are congruent.
By the previous theorem υ(A − αI ) = υ(D), the number of negative diagonal
elements in D. If Ax = λx then (A − αI )x = (λ − α)x, and λ − α is an eigenvalue
of A − αI . But then υ(A − αI ) equals the number of eigenvalues of A which are
less than α.
14.5.2 Approximating λm
Corollary 14.2 can be used to determine the mth eigenvalue of A, where λ1 ≥ λ2 ≥
· · · ≥ λn . Using Gerschgorin’s theorem we first find an interval [a, b], such that
(a, b) contains the eigenvalues of A. Let for x ∈ [a, b]
ρ(x) := #{k : dk (x) > 0 for k = 1, . . . , n}
be the number of eigenvalues of A which are strictly greater than x. Clearly ρ(a) = n, ρ(b) = 0. Choosing a tolerance ε and using bisection we proceed as follows:
h = b − a;
for j = 1 : itmax
    c = (a + b)/2;
    if b − a < ε ∗ h
        λ = (a + b)/2; return                                   (14.9)
    end
    k = ρ(c);
    if k ≥ m a = c else b = c;
end
We generate a sequence {[aj , bj ]} of intervals, each containing λm and bj −aj =
2−j (b − a).
As it stands this method will fail if in (14.8) one of the d_k(α) is zero. One possibility is to replace such a d_k(α) by a suitable small number, say δ_k = c_k ε_M, where ε_M is the machine epsilon, typically about 2 × 10^{−16} for MATLAB. This replacement is done if |d_k(α)| < |δ_k|.
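A compact MATLAB sketch of the whole counting-and-bisection strategy is given below. It is our own illustrative code written under the assumptions just stated (Exercise 14.14 asks for a more careful version); the function name and the way the starting interval is chosen are our choices.

function lambda = bisecteig(c, d, m, itmax)
% Approximate the m-th largest eigenvalue of A = tridiag(c,d,c) by bisection,
% counting eigenvalues with the recursion (14.8).
  n = length(d);
  r = abs([0; c(:)]) + abs([c(:); 0]);       % Gershgorin radii
  a = min(d(:) - r);  b = max(d(:) + r);     % all eigenvalues lie in [a,b]
  h = b - a;
  for j = 1:itmax
    x = (a + b)/2;
    if b - a < eps*h, break; end
    k = 0; dk = 0;                           % k will hold rho(x)
    for i = 1:n
      if i == 1
        dk = d(1) - x;
      else
        if abs(dk) < abs(c(i-1))*eps, dk = c(i-1)*eps; end   % the replacement above
        dk = d(i) - x - c(i-1)^2/dk;
      end
      if dk > 0, k = k + 1; end
    end
    if k >= m, a = x; else b = x; end
  end
  lambda = (a + b)/2;
end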
14.6 Exercises Chap. 14
14.6.1 Exercises Sect. 14.1
Exercise 14.1 (Yes or No (Exam Exercise 2006-1)) Answer simply yes or no to
the following questions:
a) Every matrix A ∈ Cm×n has a singular value decomposition?
b) The algebraic multiplicity of an eigenvalue is always less than or equal to the
geometric multiplicity?
c) The QR factorization of a matrix A ∈ Rn×n can be determined by Householder
transformations in O(n2 ) arithmetic operations?
d) Let ρ(A) be the spectral radius of A ∈ Cn×n . Then limk→∞ Ak = 0 if and only
if ρ(A) < 1?
14.6.2 Exercises Sect. 14.2
Exercise 14.2 (Nonsingularity Using Gershgorin) Consider the matrix
A = [4 1 0 0; 1 4 1 0; 0 1 4 1; 0 0 1 4].
Show using Gershgorin's theorem that A is nonsingular.
Exercise 14.3 (Gershgorin, Strictly Diagonally Dominant Matrix) Show using Gershgorin's circle theorem that a strictly diagonally dominant matrix A (|a_{i,i}| > Σ_{j≠i} |a_{i,j}| for all i) is nonsingular.
Exercise 14.4 (Gershgorin Disks (Exam Exercise 2009-2)) The eigenvalues of
A ∈ Rn,n lie inside R ∩ C, where R := R1 ∪ · · · ∪ Rn is the union of the row disks
Ri of A, and C = C1 ∪ · · · ∪ Cn is the union of the column disks Cj . You do not
need to prove this. Write a MATLAB function [s,r,c]=gershgorin(A) that
computes the centres s = [s1 , . . . , sn ] ∈ Rn of the row and column disks, and their
radii r = [r1 , . . . , rn ] ∈ Rn and c = [c1 , . . . , cn ] ∈ Rn , respectively.
14.6.3 Exercises Sect. 14.3
Exercise 14.5 (Continuity of Eigenvalues) Suppose
A(t) := D + t(A − D),   D := diag(a_{11}, . . . , a_{nn}),   t ∈ R,
0 ≤ t_1 < t_2 ≤ 1 and that μ is an eigenvalue of A(t_2). Show, using Theorem 14.2 with A = A(t_1) and E = A(t_2) − A(t_1), that A(t_1) has an eigenvalue λ such that
|λ − μ| ≤ C(t_2 − t_1)^{1/n},   where C ≤ 2(‖D‖₂ + ‖A − D‖₂).
Thus, as a function of t, every eigenvalue of A(t) is a continuous function of t.
Exercise 14.6 (∞-Norm of a Diagonal Matrix) Give a direct proof that ‖A‖_∞ = ρ(A) if A is diagonal.
Exercise 14.7 (Eigenvalue Perturbations (Exam Exercise 2010-2)) Let A = [a_{kj}], E = [e_{kj}], and B = [b_{kj}] be matrices in R^{n,n} with
a_{kj} = 1 if j = k + 1 and 0 otherwise,   e_{kj} = ε if k = n, j = 1 and 0 otherwise,   (14.10)
and B = A + E, where 0 < ε < 1. Thus for n = 4,
A := [0 1 0 0; 0 0 1 0; 0 0 0 1; 0 0 0 0],   E := [0 0 0 0; 0 0 0 0; 0 0 0 0; ε 0 0 0],   B := [0 1 0 0; 0 0 1 0; 0 0 0 1; ε 0 0 0].
a) Find the eigenvalues of A and B.
b) Show that ‖A‖₂ = ‖B‖₂ = 1 for arbitrary n ∈ N.
c) Recall Elsner's Theorem (Theorem 14.2). Let A, E, B be given by (14.10). What upper bound does (14.1) in Elsner's theorem give for the eigenvalue μ = ε^{1/n} of B? How sharp is this upper bound?
14.6.4 Exercises Sect. 14.4
Exercise 14.8 (Number of Arithmetic Operations, Hessenberg Reduction) Show that the number of arithmetic operations for Algorithm 14.1 is (10/3)n³ = 5G_n.
Exercise 14.9 (Assemble Householder Transformations) Show that the number of arithmetic operations required by Algorithm 14.2 is (4/3)n³ = 2G_n.
Exercise 14.10 (Tridiagonalize a Symmetric Matrix) If A is real and symmetric
we can modify Algorithm 14.1 as follows. To find Ak+1 from Ak we have to
compute V k E k V k where E k is symmetric. Dropping subscripts we have to compute
a product of the form G = (I − vv^T)E(I − vv^T). Let w := Ev, β := ½ v^T w and
z := w −βv. Show that G = E −vzT −zv T . Since G is symmetric, only the sub- or
superdiagonal elements of G need to be computed. Computing G in this way, it can
be shown that we need O(4n3 /3) operations to tridiagonalize a symmetric matrix
by orthonormal similarity transformations. This is less than half the work to reduce
a nonsymmetric matrix to upper Hessenberg form. We refer to [18] for a detailed
algorithm.
14.6.5 Exercises Sect. 14.5
Exercise 14.11 (Counting Eigenvalues) Consider the matrix in Exercise 14.2.
Determine the number of eigenvalues greater than 4.5.
Exercise 14.12 (Overflow in LDL* Factorization) Let for n ∈ N
A_n = [10 1 0 ⋯ 0; 1 10 1 ⋱ ⋮; 0 ⋱ ⋱ ⋱ 0; ⋮ ⋱ 1 10 1; 0 ⋯ 0 1 10] ∈ R^{n×n}.
a) Let d_k be the diagonal elements of D in an LDL* factorization of A_n. Show that 5 + √24 < d_k ≤ 10, k = 1, 2, . . . , n.
b) Show that D_n := det(A_n) > (5 + √24)^n. Give n_0 ∈ N such that your computer gives an overflow when D_{n_0} is computed in floating point arithmetic.
Exercise 14.13 (Simultaneous Diagonalization) (Simultaneous diagonalization of two symmetric matrices by a congruence transformation). Let A, B ∈ R^{n×n} where A^T = A and B is symmetric positive definite. Then B = U^T D U where U is orthonormal and D = diag(d_1, . . . , d_n) has positive diagonal elements. Let
Â := D^{−1/2} U A U^T D^{−1/2},   where D^{−1/2} := diag(d_1^{−1/2}, . . . , d_n^{−1/2}).
a) Show that Â is symmetric.
Let Â = Û D̂ Û^T where Û is orthonormal and D̂ is diagonal. Set E = U^T D^{−1/2} Û.
b) Show that E is nonsingular and that E^T A E = D̂, E^T B E = I.
For a more general result see Theorem 10.1 in [11].
Exercise 14.14 (Program Code for One Eigenvalue) Suppose A = tridiag(c, d, c)
is symmetric and tridiagonal with elements d1 , . . . , dn on the diagonal and
c1 , . . . , cn−1 on the neighboring subdiagonals. Let λ1 ≥ λ2 ≥ · · · ≥ λn be the
eigenvalues of A. We shall write a program to compute one eigenvalue λm for a
given m using bisection and the method outlined in (14.9).
a) Write a function k=counting(c,d,x) which for given x counts the
number of eigenvalues of A strictly greater than x. Use the replacement described
above if one of the dj (x) is close to zero.
b) Write a function lambda=findeigv(c,d,m) which first estimates an interval (a, b] containing all eigenvalues of A and then generates a sequence {(a_j, b_j]} of intervals each containing λ_m. Iterate until b_j − a_j ≤ (b − a)ε_M, where ε_M is MATLAB's machine epsilon eps. Typically ε_M ≈ 2.22 × 10^{−16}.
c) Test the program on T := tridiag(−1, 2, −1) of size 100. Compare the exact
value of λ5 with your result and the result obtained by using MATLAB’s built-in
function eig.
Exercise 14.15 (Determinant of Upper Hessenberg Matrix) Suppose A ∈ Cn×n
is upper Hessenberg and x ∈ C. We will study two algorithms to compute f (x) =
det(A − xI ).
a) Show that Gaussian elimination without pivoting requires O(n2 ) arithmetic
operations.
b) Show that the number of arithmetic operations is the same if partial pivoting is
used.
c) Estimate the number of arithmetic operations if Givens rotations are used.
d) Compare the two methods discussing advantages and disadvantages.
14.7 Review Questions
14.7.1 Suppose A, E ∈ C^{n×n}. To every μ ∈ σ(A + E) there is a λ ∈ σ(A) which is in some sense close to μ.
        • What is the general result (Elsner's theorem)?
        • What if A is non defective?
        • What if A is normal?
        • What if A is Hermitian?
14.7.2 Can Gerschgorin's theorem be used to check if a matrix is nonsingular?
14.7.3 How many arithmetic operations does it take to reduce a matrix by similarity transformations to upper Hessenberg form by Householder transformations?
14.7.4 Give a condition ensuring that a tridiagonal symmetric matrix has real and distinct eigenvalues.
14.7.5 What is the content of Sylvester's inertia theorem?
14.7.6 Give an application of this theorem.
Chapter 15
The QR Algorithm
The QR algorithm is a method to find all eigenvalues and eigenvectors of a matrix.
In this chapter we give a brief informal introduction to this important algorithm. For
a more complete treatment see [18].
The QR algorithm is related to a simpler method called the power method, and we start by studying this method and its variants.
15.1 The Power Method and Its Variants
These methods can be used to compute a single eigenpair of a matrix. They also
play a role when studying properties of the QR algorithm.
15.1.1 The Power Method
The power method in its basic form is a technique to compute an approximation
to the eigenvector corresponding to the largest (in absolute value) eigenvalue of
a matrix A ∈ Cn×n . As a by product we can also find an approximation to the
corresponding eigenvalue. We define a sequence {zk } of vectors in Cn by
zk := Ak z0 = Azk−1 ,
k = 1, 2, . . . .
(15.1)
Example 15.1 (Power Method) Let
A = [2 −1; −1 2],   z_0 := [1, 0]^T.
We find
z_1 = Az_0 = [2, −1]^T,   z_2 = Az_1 = [5, −4]^T,   · · · ,   z_k = ½[1 + 3^k, 1 − 3^k]^T,   · · · .
It follows that 2z_k/3^k converges to [1, −1]^T, an eigenvector corresponding to the dominant eigenvalue λ = 3. The sequence of Rayleigh quotients {z_k^T A z_k / z_k^T z_k} will converge to the dominant eigenvalue λ = 3.
To understand better what happens we expand z_0 in terms of the eigenvectors:
z_0 = ½[1, −1]^T + ½[1, 1]^T = c_1 v_1 + c_2 v_2.
Since A^k has eigenpairs (λ_j^k, v_j), j = 1, 2, we find
z_k = c_1 λ_1^k v_1 + c_2 λ_2^k v_2 = c_1 3^k v_1 + c_2 1^k v_2.
Thus 3^{−k} z_k = c_1 v_1 + 3^{−k} c_2 v_2 → c_1 v_1. Since c_1 ≠ 0 we obtain convergence to the dominant eigenvector.
Let A ∈ C^{n×n} have eigenpairs (λ_j, v_j), j = 1, . . . , n. Given z_0 ∈ C^n we assume that
(i) |λ_1| > |λ_2| ≥ |λ_3| ≥ · · · ≥ |λ_n|,
(ii) z_0^T v_1 ≠ 0,   (15.2)
(iii) A has linearly independent eigenvectors.
The first assumption means that A has a dominant eigenvalue λ_1 of algebraic multiplicity one. The second assumption says that z_0 has a component in the direction v_1. The third assumption is not necessary, but is included in order to simplify the analysis.
To see what happens let z_0 = c_1 v_1 + c_2 v_2 + · · · + c_n v_n, where by assumption (ii) of (15.2) we have c_1 ≠ 0. Since A^k v_j = λ_j^k v_j for all j we see that
z_k = c_1 λ_1^k v_1 + c_2 λ_2^k v_2 + · · · + c_n λ_n^k v_n,   k = 0, 1, 2, . . . .   (15.3)
Dividing by λ_1^k we find
z_k/λ_1^k = c_1 v_1 + c_2 (λ_2/λ_1)^k v_2 + · · · + c_n (λ_n/λ_1)^k v_n,   k = 0, 1, 2, . . . .   (15.4)
Assumption (i) of (15.2) implies that (λ_j/λ_1)^k → 0 as k → ∞ for all j ≥ 2 and we obtain
lim_{k→∞} z_k/λ_1^k = c_1 v_1,   (15.5)
the dominant eigenvector of A. It can be shown that this also holds for defective matrices as long as (i) and (ii) of (15.2) hold, see for example page 58 of [18].
In practice we need to scale the iterates z_k somehow and we normally do not know λ_1. Instead we choose a norm on C^n, set x_0 = z_0/‖z_0‖ and generate for k = 1, 2, . . . unit vectors as follows:
(i) y_k = A x_{k−1},
(ii) x_k = y_k/‖y_k‖.   (15.6)
Lemma 15.1 (Convergence of the Power Method) Suppose (15.2) holds. Then
lim_{k→∞} (|λ_1|/λ_1)^k x_k = c_1 v_1 / (|c_1| ‖v_1‖).
In particular, if λ_1 > 0 and c_1 > 0 then the sequence {x_k} will converge to the eigenvector u_1 := v_1/‖v_1‖ of unit length.
Proof By induction on k it follows that x_k = z_k/‖z_k‖ for all k ≥ 0, where z_k = A^k z_0. Indeed, this holds for k = 1, and if it holds for k − 1 then y_k = Ax_{k−1} = Az_{k−1}/‖z_{k−1}‖ = z_k/‖z_{k−1}‖ and x_k = (z_k/‖z_{k−1}‖)(‖z_{k−1}‖/‖z_k‖) = z_k/‖z_k‖.
But then
x_k = z_k/‖z_k‖ = (c_1 λ_1^k/|c_1 λ_1^k|) · ( v_1 + (c_2/c_1)(λ_2/λ_1)^k v_2 + · · · + (c_n/c_1)(λ_n/λ_1)^k v_n ) / ‖ v_1 + (c_2/c_1)(λ_2/λ_1)^k v_2 + · · · + (c_n/c_1)(λ_n/λ_1)^k v_n ‖,   k = 0, 1, 2, . . . ,
and this implies the lemma.
Suppose we know an approximate eigenvector u of A, but not the corresponding eigenvalue μ. One way of estimating μ is to minimize the Euclidean norm of the residual r(λ) := Au − λu.
Theorem 15.1 (The Rayleigh Quotient Minimizes the Residual) Let A ∈ C^{n×n}, u ∈ C^n \ {0}, and let ρ : C → R be given by ρ(λ) = ‖Au − λu‖₂. Then ρ is minimized when λ := (u^*Au)/(u^*u), the Rayleigh quotient for A.
Proof Assume u^*u = 1 and extend u to an orthonormal basis {u, U} for C^n. Then U^*u = 0 and
[u^*; U^*](Au − λu) = [u^*Au − λu^*u; U^*Au − λU^*u] = [u^*Au − λ; U^*Au].
By unitary invariance of the Euclidean norm
ρ(λ)² = |u^*Au − λ|² + ‖U^*Au‖₂²,
and ρ has a global minimum at λ = u^*Au.
Using Rayleigh quotients we can incorporate the calculation of the eigenvalue
into the power iteration. We can then compute the residual and stop the iteration
when the residual is sufficiently small. But what does it mean to be sufficiently
small? Recall that if A is nonsingular with a nonsingular eigenvector matrix X and (μ, u) is an approximate eigenpair with ‖u‖₂ = 1, then by (14.4) we can find an eigenvalue λ of A such that
|λ − μ|/|λ| ≤ K₂(X)K₂(A) ‖Au − μu‖₂/‖A‖₂.
Thus if the relative residual is small and both A and X are well conditioned then the
relative error in the eigenvalue will be small.
This discussion leads to the power method with Rayleigh quotient computation.
Given A ∈ Cn×n , a starting vector z ∈ Cn , a maximum number K of iterations, and
a convergence tolerance tol. The power method combined with a Rayleigh quotient
estimate for the eigenvalue is used to compute a dominant eigenpair (l, x) of A
with ‖x‖₂ = 1. The integer it returns the number of iterations needed in order for ‖Ax − lx‖₂/‖A‖_F < tol. If no such eigenpair is found in K iterations the value
it = K + 1 is returned.
function [l,x,it]=powerit(A,z,K,tol)
% [l,x,it]=powerit(A,z,K,tol)
  af=norm(A,'fro'); x=z/norm(z);
  for k=1:K
    y=A*x; l=x'*y;
    if norm(y-l*x)/af<tol
      it=k; x=y/norm(y); return
    end
    x=y/norm(y);
  end
  it=K+1;
end
Listing 15.1 powerit
Example 15.2 (Power Method) We try powerit on the three matrices
A_1 := [1 2; 3 4],   A_2 := [1.7 −0.4; 0.15 2.2],   and A_3 = [1 2; −3 4].
In each case we start with the random vector z = [0.6602, 0.3420] and tol = 10−6 .
For A1 we get convergence in 7 iterations, for A2 it takes 174 iterations, and for A3
we do not get convergence.
The matrix A3 does not have a dominant eigenvalue since the two eigenvalues
are complex conjugate of each other. Thus the basic condition (i) of (15.2) is not
satisfied and the power method diverges. The enormous difference in the rate of
convergence for A1 and A2 can be explained by looking at (15.4). The rate of convergence depends on the ratio |λ_2|/|λ_1|. If this ratio is small then the convergence
is fast, while it can be quite slow if the ratio is close to one. The eigenvalues of
A1 are λ1 = 5.3723 and λ2 = −0.3723 giving a quite small ratio of 0.07 and
the convergence is fast. On the other hand the eigenvalues of A2 are λ1 = 2 and
λ2 = 1.9 and the corresponding ratio is 0.95 resulting in slow convergence.
A variant of the power method is the shifted power method. In this method we
choose a number s and apply the power method to the matrix A − sI . The number
s is called a shift since it shifts an eigenvalue λ of A to λ − s of A − sI . Sometimes
the convergence can be faster if the shift is chosen intelligently. For example, if we
apply the shifted power method to A2 in Example 15.2 with shift 1.8, then with the
same starting vector and tol as above, we get convergence in 17 iterations instead
of 174 for the unshifted algorithm.
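In terms of the powerit function above, a shifted run is just a call on the shifted matrix, with the shift added back afterwards; the snippet below is our own, with the numbers taken from the text.

% Shifted power method applied to A2 with shift s = 1.8 (illustrative).
A2 = [1.7 -0.4; 0.15 2.2];
z = [0.6602; 0.3420];  s = 1.8;
[l, x, it] = powerit(A2 - s*eye(2), z, 100, 1e-6);
lambda = l + s                        % approximates the dominant eigenvalue 2 of A2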
15.1.2 The Inverse Power Method
Another variant of the power method with Rayleigh quotient is the inverse power
method. This method can be used to determine any eigenpair (λ, x) of A as long as
λ has algebraic multiplicity one. In the inverse power method we apply the power
method to the inverse matrix (A − sI )−1 , where s is a shift. If A has eigenvalues
λ1 , . . . , λn in no particular order then (A − sI )−1 has eigenvalues
μ1 (s) = (λ1 − s)−1 , μ2 (s) = (λ2 − s)−1 , . . . , μn (s) = (λn − s)−1 .
Suppose λ1 is a simple eigenvalue of A. Then lims→λ1 |μ1 (s)| = ∞, while
lims→λ1 μj (s) = (λj − λ1 )−1 < ∞ for j = 2, . . . , n. Hence, by choosing s
sufficiently close to λ1 the inverse power method will converge to that eigenvalue.
For the inverse power method (15.6) is replaced by
(i) (A − sI)y_k = x_{k−1},
(ii) x_k = y_k/‖y_k‖.   (15.7)
Note that we solve the linear system rather than computing the inverse matrix.
Normally the PLU factorization of A − sI is precomputed in order to speed up
the computation.
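A minimal MATLAB sketch of (15.7) with a precomputed PLU factorization might look as follows; this is our own illustrative code with assumed argument names, not an algorithm from the book.

function [l, x, it] = invpowerit(A, s, z, K, tol)
% Inverse power method with shift s: power method applied to (A - s*I)^{-1},
% reusing one LU factorization of A - s*I in every iteration.
  n = length(A);
  [L, U, P] = lu(A - s*eye(n));      % PLU factorization, computed once
  x = z/norm(z);
  for k = 1:K
    y = U\(L\(P*x));                 % solves (A - s*I)*y = x
    x = y/norm(y);
    l = x'*A*x;                      % Rayleigh quotient estimate of the eigenvalue of A
    if norm(A*x - l*x) <= tol*norm(A,'fro')
      it = k; return
    end
  end
  it = K + 1;
end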
15.1.3 Rayleigh Quotient Iteration
A variant of the inverse power method is known simply as Rayleigh quotient
iteration. In this method we change the shift from iteration to iteration, using the
previous Rayleigh quotient s k−1 as the current shift. In each iteration we need to
compute the following quantities:
(i) (A − s_{k−1}I)y_k = x_{k−1},
(ii) x_k = y_k/‖y_k‖,
(iii) s_k = x_k^* A x_k,
(iv) r_k = Ax_k − s_k x_k.
We can avoid the calculation of Ax_k in (iii) and (iv). Let
ρ_k := (y_k^* x_{k−1})/(y_k^* y_k),   w_k := x_{k−1}/‖y_k‖₂.
Then
s_k = (y_k^* A y_k)/(y_k^* y_k) = s_{k−1} + (y_k^*(A − s_{k−1}I)y_k)/(y_k^* y_k) = s_{k−1} + (y_k^* x_{k−1})/(y_k^* y_k) = s_{k−1} + ρ_k,
r_k = Ax_k − s_k x_k = (Ay_k − (s_{k−1} + ρ_k)y_k)/‖y_k‖₂ = (x_{k−1} − ρ_k y_k)/‖y_k‖₂ = w_k − ρ_k x_k.
Another problem is that the linear system in (i) becomes closer and closer to
singular as sk converges to the eigenvalue. Thus the system becomes more and more
ill-conditioned and we can expect large errors in the computed y k . This is indeed
true, but we are lucky. Most of the error occurs in the direction of the eigenvector and
this error disappears when we normalize y k in (ii). Miraculously, the normalized
eigenvector will be quite accurate.
Given an approximation (s, x) to an eigenpair (λ, v) of a matrix A ∈ C^{n×n}, the following algorithm computes a hopefully better approximation to (λ, v) by doing one Rayleigh quotient iteration. The length nr of the new residual is also returned.
function [x,s,nr]=rayleighit(A,x,s)
% [x,s,nr]=rayleighit(A,x,s)
% One Rayleigh quotient iteration for the approximate eigenpair (s,x).
n=length(x);
y=(A-s*eye(n,n))\x;   % (i)   solve (A - s*I)*y = x
yn=norm(y);
w=x/yn;               %       w = x_in/||y||, used to update s and r cheaply
x=y/yn;               % (ii)  new normalized eigenvector approximation
rho=x'*w;             %       rho = y'*x_in/(y'*y), cf. the derivation above
s=s+rho;              % (iii) updated Rayleigh quotient
nr=norm(w-rho*x);     % (iv)  length of the new residual A*x - s*x
end
Listing 15.2 rayleighit
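As a usage sketch, rayleighit is simply called repeatedly; the driver below uses the matrix and starting values of Example 15.3 further down, while the loop and the printing are our own illustration.

% Driver calling rayleighit five times (cf. Example 15.3 below).
A = [1 2; 3 4];
x = [1; 1];  s = 0;
for k = 1:5
    [x, s, nr] = rayleighit(A, x, s);
    fprintf('k = %d   ||r||_2 = %8.1e   s = %14.10f\n', k, nr, s)
end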
Since the shift changes from iteration to iteration the computation of y in rayleighit will require O(n^3) arithmetic operations for a full matrix. For such
a matrix it might be useful to reduce it to an upper Hessenberg form, or tridiagonal
form, before starting the iteration. However, if we have a good approximation to
an eigenpair then only a few iterations are necessary to obtain close to machine
accuracy.
If Rayleigh quotient iteration converges the convergence will be quadratic and
sometimes even cubic. We illustrate this with an example.
Example 15.3 (Rayleigh Quotient Iteration) The smallest eigenvalue of the matrix A = [1 2; 3 4] is λ_1 = (5 − √33)/2 ≈ −0.37. Starting with x = [1, 1]^T and s = 0, rayleighit converges to this eigenvalue and corresponding eigenvector. In Table 15.1 we show the rate of convergence by iterating rayleighit 5 times. The errors are approximately squared in each iteration, indicating quadratic convergence.
Table 15.1 Quadratic convergence of Rayleigh quotient iteration

k       ‖r‖_2        s − λ_1
1       1.0e+000      3.7e−001
2       7.7e−002     −1.2e−002
3       1.6e−004     −2.9e−005
4       8.2e−010     −1.4e−010
5       2.0e−020     −2.2e−016
15.2 The Basic QR Algorithm
The QR algorithm is an iterative method to compute all eigenvalues and eigenvectors
of a matrix A ∈ Cn×n . The matrix is reduced to triangular form by a sequence of
unitary similarity transformations computed from the QR factorization of A. Recall
that for a square matrix the QR factorization and the QR decomposition are the
same. If A = QR is a QR factorization then Q ∈ Cn×n is unitary, Q∗ Q = I and
R ∈ Cn×n is upper triangular.
The basic QR algorithm takes the following form:
A_1 = A
for k = 1, 2, . . .
    Q_k R_k = A_k        (QR factorization of A_k)
    A_{k+1} = R_k Q_k                                          (15.8)
end
The determination of the QR factorization of A_k and the computation of R_k Q_k is called a QR step. It is not at all clear that a QR step does anything useful. However, since R_k = Q_k^∗ A_k we find

A_{k+1} = R_k Q_k = Q_k^∗ A_k Q_k,                             (15.9)

so A_{k+1} is unitarily similar to A_k. By induction A_{k+1} is unitarily similar to A. Thus each A_k has the same eigenvalues as A. We shall see that the basic QR algorithm is related to the power method.
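A minimal MATLAB sketch of the basic iteration (15.8) is given below; the function name basicqr and the use of a fixed number of steps K are our simplifications (in practice one monitors the subdiagonal elements instead).

function A = basicqr(A, K)
% Basic QR algorithm (15.8): K QR steps A_{k+1} = R_k*Q_k.
for k = 1:K
    [Q, R] = qr(A);    % QR factorization of the current iterate
    A = R*Q;           % next iterate, unitarily similar to the previous one
end
end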
Here are two examples to illustrate what happens.
Example 15.4 (QR Iteration; Real Eigenvalues) We start with
A_1 = A = [2 1; 1 2] = (1/√5)[−2 −1; −1 2] · (1/√5)[−5 −4; 0 3] = Q_1 R_1

and obtain

A_2 = R_1 Q_1 = (1/√5)[−5 −4; 0 3] · (1/√5)[−2 −1; −1 2] = (1/5)[14 −3; −3 6] = [2.8 −0.6; −0.6 1.2].

Continuing we find

A_4 ≈ [2.997 −0.074; −0.074 1.0027],    A_10 ≈ [3.0000 −0.0001; −0.0001 1.0000].
A10 is almost diagonal and contains approximations to the eigenvalues λ1 = 3 and
λ2 = 1 on the diagonal.
Example 15.5 (QR Iteration; Complex Eigenvalues) Applying the QR iteration (15.8) to the matrix
A_1 = A =
  [ 0.9501  0.8913  0.8214  0.9218
    0.2311  0.7621  0.4447  0.7382
    0.6068  0.4565  0.6154  0.1763
    0.4860  0.0185  0.7919  0.4057 ]

we obtain

A_14 =
  [  2.323       0.047223   −0.39232   −0.65056
    −2.1e−10     0.13029     0.36125    0.15946
    −4.1e−10    −0.58622     0.052576  −0.25774
     1.2e−14     3.3e−05    −1.1e−05    0.22746 ].
This matrix is almost quasi-triangular and estimates for the eigenvalues λ1 , . . . , λ4
of A can now easily be determined from the diagonal blocks of A14 . The 1 × 1
blocks give us two real eigenvalues λ1 ≈ 2.323 and λ4 ≈ 0.2275. The middle
2 × 2 block has complex eigenvalues resulting in λ2 ≈ 0.0914 + 0.4586i and λ3 ≈
0.0914 − 0.4586i. From Gerschgorin’s circle Theorem 14.1 and Corollary 14.1 it
follows that the approximations to the real eigenvalues are quite accurate. We would
also expect the complex eigenvalues to have small absolute errors.
These two examples illustrate what most often happens in general. The sequence
(Ak )k converges to the triangular Schur form (Cf. Theorem 6.5) if all the eigenvalues
are real or the quasi-triangular Schur form (Cf. Definition 6.5) if some of the
eigenvalues are complex.
15.2.1 Relation to the Power Method
Let us show that the basic QR algorithm is related to the power method. We obtain the QR factorization of the powers A^k as follows:

Theorem 15.3 (QR and Power) For k = 1, 2, 3, . . ., the QR factorization of A^k is A^k = Q̃_k R̃_k, where

Q̃_k := Q_1 · · · Q_k   and   R̃_k := R_k · · · R_1,            (15.10)

and Q_1, . . . , Q_k, R_1, . . . , R_k are the matrices generated by the basic QR algorithm (15.8).
⎡ x x x x ⎤            ⎡ x x x x ⎤            ⎡ x x x x ⎤            ⎡ x x x x ⎤
⎢ 0 x x x ⎥  P^∗_{1,2} ⎢ x x x x ⎥  P^∗_{2,3} ⎢ x x x x ⎥  P^∗_{3,4} ⎢ x x x x ⎥
⎢ 0 0 x x ⎥  ------->  ⎢ 0 0 x x ⎥  ------->  ⎢ 0 x x x ⎥  ------->  ⎢ 0 x x x ⎥
⎣ 0 0 0 x ⎦            ⎣ 0 0 0 x ⎦            ⎣ 0 0 0 x ⎦            ⎣ 0 0 x x ⎦

Fig. 15.1 Postmultiplication in a QR step
Proof By (15.9)

A_k = Q_{k−1}^∗ A_{k−1} Q_{k−1} = Q_{k−1}^∗ Q_{k−2}^∗ A_{k−2} Q_{k−2} Q_{k−1} = · · · = Q̃_{k−1}^∗ A Q̃_{k−1}.    (15.11)

The proof is by induction on k. Clearly Q̃_1 R̃_1 = Q_1 R_1 = A_1 = A. Suppose Q̃_{k−1} R̃_{k−1} = A^{k−1} for some k ≥ 2. Since Q_k R_k = A_k and using (15.11),

Q̃_k R̃_k = Q̃_{k−1}(Q_k R_k)R̃_{k−1} = Q̃_{k−1} A_k R̃_{k−1} = (Q̃_{k−1} Q̃_{k−1}^∗) A Q̃_{k−1} R̃_{k−1} = A A^{k−1} = A^k.

Since R̃_k is upper triangular, its first column is a multiple of e_1, so that

A^k e_1 = Q̃_k R̃_k e_1 = r̃_{11}^{(k)} Q̃_k e_1,   or   q̃_1^{(k)} := Q̃_k e_1 = (1/r̃_{11}^{(k)}) A^k e_1.
Since ‖q̃_1^{(k)}‖_2 = 1, the first column of Q̃_k is the result of applying the normalized power iteration (15.6) to the starting vector x_0 = e_1. If this iteration converges we conclude that the first column of Q̃_k must converge to a dominant eigenvector of A. It can be shown that the first column of A_k must then converge to λ_1 e_1, where λ_1 is a dominant eigenvalue of A. This is clearly what happens in Examples 15.4 and 15.5. Indeed, what is observed in practice is that the sequence (Q̃_k^∗ A Q̃_k)_k converges to a (quasi-triangular) Schur form of A.
15.2.2 Invariance of the Hessenberg Form
One QR step requires O(n^3) arithmetic operations for a matrix A of order n. By an initial reduction of A to upper Hessenberg form H_1 using Algorithm 14.1, the cost of a QR step can be reduced to O(n^2). Consider a QR step on H_1. We first
determine plane rotations P i,i+1 , i = 1, . . . , n − 1 so that P n−1,n · · · P 1,2 H 1 = R 1
is upper triangular. The details were described in Sect. 5.6. Thus H 1 = Q1 R 1 ,
where Q1 = P ∗1,2 · · · P ∗n−1,n is a QR factorization of H 1 . To finish the QR step we
compute R 1 Q1 = R 1 P ∗1,2 · · · P ∗n−1,n . This postmultiplication step is illustrated by
the Wilkinson diagram in Fig. 15.1.
The postmultiplication by P i,i+1 introduces a nonzero in position (i + 1, i)
leaving the other elements marked by a zero in Fig. 15.1 unchanged. Thus the final
matrix R_1 P_{1,2}^∗ · · · P_{n−1,n}^∗ is upper Hessenberg, and a QR step leaves the Hessenberg form invariant.
In conclusion, to compute A_{k+1} from A_k requires O(n^2) arithmetic operations if A_k is upper Hessenberg and O(n) arithmetic operations if A_k is tridiagonal.
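The following MATLAB sketch carries out one such QR step on a real upper Hessenberg matrix using the built-in planerot for the plane rotations; the function name hessqrstep and the cell array holding the rotations are our own choices.

function H = hessqrstep(H)
% One QR step H -> R*Q on a real upper Hessenberg matrix H using
% n-1 plane rotations; a sketch, not an optimized implementation.
n = size(H,1);
G = cell(n-1,1);
for i = 1:n-1
    [Gi, ~] = planerot(H(i:i+1, i));   % rotation zeroing the (i+1,i) entry
    H(i:i+1, i:n) = Gi*H(i:i+1, i:n);  % premultiply rows i and i+1
    G{i} = Gi;
end
% Postmultiply R by the transposed rotations, cf. Fig. 15.1; only the
% leading i+1 rows of columns i and i+1 can be nonzero at this stage.
for i = 1:n-1
    H(1:i+1, i:i+1) = H(1:i+1, i:i+1)*G{i}';
end
end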
15.2.3 Deflation
If a subdiagonal element ai+1,i of an upper Hessenberg matrix A is equal to zero,
then the eigenvalues of A are the union of the eigenvalues of the two smaller
matrices A(1 : i, 1 : i) and A(i + 1 : n, i + 1 : n). Thus, if during the iteration
the (i + 1, i) element of Ak is sufficiently small then we can continue the iteration
on the two smaller submatrices separately.
To see what effect this can have on the eigenvalues of A suppose |a_{i+1,i}^{(k)}| ≤ ε. Let Â_k := A_k − a_{i+1,i}^{(k)} e_{i+1} e_i^T be the matrix obtained from A_k by setting the (i + 1, i) element equal to zero. Since A_k = Q̃_{k−1}^∗ A Q̃_{k−1} we have

Â_k = Q̃_{k−1}^∗ (A + E) Q̃_{k−1},    E = −Q̃_{k−1} (a_{i+1,i}^{(k)} e_{i+1} e_i^T) Q̃_{k−1}^∗.

Since Q̃_{k−1} is unitary, ‖E‖_F = ‖a_{i+1,i}^{(k)} e_{i+1} e_i^T‖_F = |a_{i+1,i}^{(k)}| ≤ ε, and setting a_{i+1,i}^{(k)} = 0 amounts to a perturbation in the original A of at most ε. For how to choose ε see the discussion on pages 94–95 in [18].
This deflation occurs often in practice and can with a proper implementation
reduce the computation time considerably. It should be noted that to find the
eigenvectors of the original matrix one has to continue with some care, see [18].
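As an illustration, such a deflation test might look as follows in MATLAB for an upper Hessenberg iterate Ak; the tolerance is only one possible choice, see [18] for the discussion referred to above.

% Hypothetical deflation test; the tolerance is illustrative only.
tol = eps*norm(Ak, 'fro');
for i = 1:size(Ak,1)-1
    if abs(Ak(i+1,i)) <= tol
        Ak(i+1,i) = 0;   % the eigenvalue problem decouples into
                         % Ak(1:i,1:i) and Ak(i+1:end,i+1:end)
    end
end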
15.3 The Shifted QR Algorithms
As in the inverse power method it is possible to speed up the convergence by introducing shifts. The explicitly shifted QR algorithm works as follows:

A_1 = A
for k = 1, 2, . . .
    Choose a shift s_k
    Q_k R_k = A_k − s_k I        (QR factorization of A_k − s_k I)
    A_{k+1} = R_k Q_k + s_k I
end
Since R_k = Q_k^∗(A_k − s_k I) we find

A_{k+1} = Q_k^∗(A_k − s_k I)Q_k + s_k I = Q_k^∗ A_k Q_k

and A_{k+1} and A_k are unitarily similar.
The shifted QR algorithm is related to the power method with shift, cf. Theorem 15.3, and also to the inverse power method. In fact the last column of Q_k is the result of one iteration of the inverse power method applied to A^∗ with shift s_k. Indeed, since A_k − s_k I = Q_k R_k we have (A_k − s_k I)^∗ = R_k^∗ Q_k^∗ and (A_k − s_k I)^∗ Q_k = R_k^∗. Thus, since R_k^∗ is lower triangular with (n, n) element r̄_{nn}^{(k)}, we find (A_k − s_k I)^∗ Q_k e_n = R_k^∗ e_n = r̄_{nn}^{(k)} e_n, from which the conclusion follows.
The shift s_k := e_n^T A_k e_n is called the Rayleigh quotient shift, while the eigenvalue of the lower right 2 × 2 corner of A_k closest to the (n, n) element of A_k is called the Wilkinson shift. This shift can be used to find complex eigenvalues of a real matrix. The convergence is very fast and at least quadratic, both for the Rayleigh quotient shift and the Wilkinson shift.
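In MATLAB notation the two shifts can be computed as follows for an iterate Ak of order n; the variable names are ours.

% Rayleigh quotient shift and Wilkinson shift for the n-by-n iterate Ak.
n = size(Ak,1);
s_rq = Ak(n,n);                      % Rayleigh quotient shift e_n'*Ak*e_n
ev = eig(Ak(n-1:n, n-1:n));          % eigenvalues of the lower right 2-by-2 corner
[~, j] = min(abs(ev - Ak(n,n)));
s_wilk = ev(j);                      % Wilkinson shift: the one closest to Ak(n,n)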
By doing two QR iterations at a time it is possible to find both real and complex
eigenvalues of a real matrix without using complex arithmetic. The corresponding
algorithm is called the implicitly shifted QR algorithm.
After having computed the eigenvalues we can compute the eigenvectors in steps.
First we find the eigenvectors of the triangular or quasi-triangular matrix. We then
compute the eigenvectors of the upper Hessenberg matrix and finally we get the
eigenvectors of A.
The QR algorithm without shifts does not always converge. A simple example is given by A := [0 1; 1 0]. We obtain A_k = A for all k ∈ N. For convergence of the shifted QR algorithm for unitary upper Hessenberg matrices see [20].
Practical experience indicates that only O(n) iterations are needed to find all eigenvalues of A. Thus both the explicitly and implicitly shifted QR algorithms are normally O(n^3) algorithms.
For further remarks and detailed algorithms see [18].
15.4 Exercises Chap. 15
15.4.1 Exercises Sect. 15.1
Exercise 15.1 (Orthogonal Vectors) Show that u and Au − λu are orthogonal when λ = u^∗Au / u^∗u.
15.5 Review Questions
15.4.1 What is the main use of the power method?
15.4.2 Can the QR method be used to find all eigenvectors of a matrix?
15.4.3 Can the power method be used to find an eigenvalue?
15.4.4 Does the power method converge to an eigenvector corresponding to a complex eigenvalue?
15.4.5 What is the inverse power method?
15.4.6 Give a relation between the QR algorithm and the power method.
15.4.7 How can we make the basic QR algorithm converge faster?
Part VII
Appendix
Chapter 16
Differentiation of Vector Functions
We give a short introduction to differentiation of vector functions.
For any sufficiently differentiable f : Rn → R we recall that the partial derivative
with respect to the ith variable of f is defined by
D_i f(x) := ∂f(x)/∂x_i := lim_{h→0} [f(x + h e_i) − f(x)]/h,    x ∈ R^n,
where ei is the ith unit vector in Rn . For each x ∈ Rn we define the gradient
∇f (x) ∈ Rn , and the hessian H f = ∇∇ T f (x) ∈ Rn×n of f by
∇f := [D_1 f; . . . ; D_n f],    H_f := ∇∇^T f := [D_1D_1 f · · · D_1D_n f; . . . ; D_nD_1 f · · · D_nD_n f],    (16.1)
where ∇^T f := (∇f)^T is the row vector gradient. The operators ∇∇^T and ∇^T ∇ are quite different. Indeed, ∇^T ∇f = D_1^2 f + · · · + D_n^2 f =: ∇^2 f is the Laplacian of f, while ∇∇^T can be thought of as an outer product resulting in a matrix.
Lemma 16.1 (Product Rules) For f, g : Rn → R we have the product rules
1. ∇(fg) = f ∇g + g∇f,   ∇^T(fg) = f ∇^T g + g∇^T f,
2. ∇∇^T(fg) = ∇f ∇^T g + ∇g ∇^T f + f ∇∇^T g + g ∇∇^T f,
3. ∇^2(fg) = 2∇^T f ∇g + f ∇^2 g + g∇^2 f.
We define the Jacobian of a vector function f = [f_1, . . . , f_m]^T : R^n → R^m as the m × n matrix

∇^T f := [D_1 f_1 · · · D_n f_1; . . . ; D_1 f_m · · · D_n f_m].
As an example, if f(x) = f(x, y) = x^2 − xy + y^2 and g(x, y) := [f(x, y), x − y]^T then

∇f(x, y) = [2x − y; −x + 2y],    ∇^T g(x, y) = [2x − y   −x + 2y; 1   −1],

H_f(x, y) = [∂^2f/∂x^2   ∂^2f/∂x∂y; ∂^2f/∂y∂x   ∂^2f/∂y^2] = [2 −1; −1 2].
The second order Taylor expansion in n variables can be expressed in terms of
the gradient and the hessian.
Lemma 16.2 (Second Order Taylor Expansion) Suppose f ∈ C^2(Ω), where Ω ⊂ R^n contains two points x, x + h such that the line segment L := {x + th : t ∈ (0, 1)} ⊂ Ω. Then

f(x + h) = f(x) + h^T ∇f(x) + (1/2) h^T ∇∇^T f(c)h,   for some c ∈ L.    (16.2)
Proof Let g : [0, 1] → R be defined by g(t) := f(x + th). Then g ∈ C^2[0, 1] and by the chain rule

g(0) = f(x),   g(1) = f(x + h),

g'(t) = Σ_{i=1}^n h_i ∂f(x + th)/∂x_i = h^T ∇f(x + th),

g''(t) = Σ_{i=1}^n Σ_{j=1}^n h_i h_j ∂^2 f(x + th)/(∂x_i ∂x_j) = h^T ∇∇^T f(x + th) h.

Inserting these expressions in the second order Taylor expansion

g(1) = g(0) + g'(0) + (1/2) g''(u),   for some u ∈ (0, 1),

we obtain (16.2) with c = x + uh.
The gradient and hessian of some functions involving matrices can be found from
the following lemma.
Lemma 16.3 (Functions Involving Matrices) For any m, n ∈ N, B ∈ Rn×n , C ∈
Rm×n , and x ∈ Rn , y ∈ Rm we have
1. ∇(y T C) = ∇ T (Cx) = C,
2. ∇(x T Bx) = (B + B T )x, ∇ T (x T Bx) = x T (B + B T ),
3. ∇∇ T (x T Bx) = B + B T .
Proof
1. We find D_i(y^T C) = lim_{h→0} (1/h)[(y + he_i)^T C − y^T C] = e_i^T C and D_i(Cx) = lim_{h→0} (1/h)[C(x + he_i) − Cx] = Ce_i, and 1. follows.
2. Here we find

D_i(x^T Bx) = lim_{h→0} (1/h)[(x + he_i)^T B(x + he_i) − x^T Bx] = lim_{h→0} [e_i^T Bx + x^T Be_i + h e_i^T Be_i] = e_i^T(B + B^T)x,

and the first part of 2. follows. Taking transposes we obtain the second part.
3. Combining 1. and 2. we obtain 3.
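The formulas in Lemma 16.3 are easy to check numerically. The MATLAB snippet below compares the gradient formula in 2. with central differences for a random B and x; all variable names are ours, and since x^T Bx is quadratic the central difference is exact up to rounding.

% Numerical check of item 2 of Lemma 16.3: grad(x'*B*x) = (B + B')*x.
n = 5;  B = randn(n);  x = randn(n,1);  h = 1e-6;
f = @(z) z'*B*z;
g_fd = zeros(n,1);
for i = 1:n
    e = zeros(n,1);  e(i) = 1;
    g_fd(i) = (f(x + h*e) - f(x - h*e))/(2*h);   % central difference D_i f
end
g_exact = (B + B')*x;
disp(norm(g_fd - g_exact))   % tiny: only rounding error remains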
References
1. O. Axelsson, Iterative Solution Methods (Cambridge University Press, 1994)
2. Å. Björck, Numerical Methods in Matrix Computations (Springer, 2015)
3. G.H. Golub, C.F. Van Loan, Matrix Computations, 4th edn. (The Johns Hopkins University Press, Baltimore, MD, 2013)
4. J.F. Grcar, Mathematicians of Gaussian elimination. Not. AMS 58, 782–792 (2011)
5. W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations, 2nd edn. (Springer International Publishing, Berlin, 2016)
6. M. Hestenes, E. Stiefel, Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409–436 (1952)
7. N.J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd edn. (SIAM, Philadelphia, 2002)
8. M.W. Hirsch, S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra (Academic Press, San Diego, 1974)
9. R.A. Horn, C.R. Johnson, Topics in Matrix Analysis (Cambridge University Press, Cambridge, UK, 1991)
10. R.A. Horn, C.R. Johnson, Matrix Analysis, 2nd edn. (Cambridge University Press, Cambridge, UK, 2013)
11. P. Lancaster, L. Rodman, Canonical forms for Hermitian matrix pairs under strict equivalence and congruence. SIAM Rev. 47, 407–443 (2005)
12. C.L. Lawson, R.J. Hanson, Solving Least Squares Problems (Prentice-Hall, Englewood Cliffs, NJ, 1974)
13. D.C. Lay, S.R. Lay, J.J. McDonald, Linear Algebra and Its Applications, 5th edn. (Pearson Education Limited, 2016)
14. C. Van Loan, Computational Frameworks for the Fast Fourier Transform (SIAM, Philadelphia, 1992)
15. C.D. Meyer, Matrix Analysis and Applied Linear Algebra (SIAM, Philadelphia, 2000)
16. G.W. Stewart, Introduction to Matrix Computations (Academic Press, New York, 1973)
17. G.W. Stewart, Matrix Algorithms Volume I: Basic Decompositions (SIAM, Philadelphia, 1998)
18. G.W. Stewart, Matrix Algorithms Volume II: Eigensystems (SIAM, Philadelphia, 2001)
19. G.W. Stewart, J. Sun, Matrix Perturbation Theory (Academic Press, San Diego, 1990)
20. T.L. Wang, W.B. Gragg, Convergence of the shifted QR algorithm for unitary Hessenberg matrices. Math. Comput. 71, 1473–1496 (2002)
21. J.H. Wilkinson, The Algebraic Eigenvalue Problem (Clarendon Press, Oxford, 1965)
22. D.M. Young, Iterative Solution of Large Linear Systems (Academic Press, New York, 1971)
Index
A
Absolute error, 181
A-inner product, 279
Algebraic multiplicity, 132
Algorithms
assemble Householder transformations,
326
backsolve, 63
backsolve column oriented, 75
bandcholesky, 89
cg, 286
fastpoisson, 241
findsubintervals, 50
forwardsolve, 63
forwardsolve column oriented, 64
housegen, 110
Householder reduction to Hessenberg form,
325
Householder triangulation, 112
Jacobi, 256
L1U factorization, 73
LDL* factorization, 85
the power method, 338
preconditioned cg, 301
Rayleigh quotient iteration, 341
SOR, 257
spline evaluation, 50
splineint, 50
testing conjugate gradient, 287
trifactor, 36
trisolve, 36
upper Hessenberg linear system, 123
A-norm, 279, 306
B
Back substitution, 58
Bandsemi-cholesky, 94
Biharmonic equation, 235
fast solution method, 249
nine point rule, 250
Block LU theorem, 74
C
Cauchy-Binet formula, 17
Cauchy determinant, 23
Cauchy-Schwarz inequality, 101
Cayley Hamilton Theorem, 148
Central difference, 52
Central difference approximation
second derivative, 52
Change of basis matrix, 9
Characteristic equation, 19, 129
Characteristic polynomial, 19, 129
Chebyshev polynomial, 293
Cholesky factorization, 83
Column operations, 16
Column space (span), 10
Companion matrix, 146
Complementary subspaces, 104
Complete pivoting, 70
Complexity of an algorithm, 65
Condition number
ill-conditioned, 181
Congruent matrices, 328
Conjugate gradient method, 279
convergence, 288
derivation, 283
Krylov spaces, 289
least squares problem, 308
preconditioning, 299
preconditioning algorithm, 301
preconditioning convergence, 301
Convex combinations, 139, 185, 296
Convex function, 185
Courant-Fischer theorem, 142
Cubic spline
minimal norm, 49
D
Defective matrix, 129
Deflation, 137
Determinant
area of a triangle, 23
Cauchy, 23
Cauchy-Binet, 17
cofactor expansion, 16
definition, 15
elementary operations, 16
plane equation, 22
principal minor, 60
straight line equation, 16
Vandermonde, 23
Dirac delta, 3
Direct sum, 104
Discrete cosine transform, 242
Discrete Fourier transform (DFT), 242, 243
Fourier matrix, 243
Discrete sine transform (DST), 242
E
Eigenpair, 18, 40, 129
eigenvalue, 18
eigenvector, 18
left eigenpair, 143
1D test matrix, 42
right eigenpair, 143
spectrum, 18
Eigenvalue, 40, 129
algebraic multiplicity, 132
characteristic equation, 19, 129
characteristic polynomial, 19, 129
Courant-Fischer theorem, 142
geometric multiplicity, 132
Hoffman-Wielandt theorem, 143
left eigenvalue, 143
location, 318
Rayleigh quotient, 139
right eigenvalue, 143
spectral theorem, 140
spectrum, 129
triangular matrix, 20
Eigenvector, 40, 129
left eigenvector, 143
right eigenvector, 143
Eigenvector expansion, 130
Elementary reflector, 107
Elsner’s theorem, 321
Equivalent norms, 173
Euclidian norm, 11
Exams
1977-1, inverse update, 21
1977-2, weighted least squares, 217
1977-3, symmetrize matrix, 53
1978-1, computing the inverse, 79
1979-3, x T Ay inequality, 119
1980-2, singular values perturbation, 221
1981-2, perturbed linear equation, 194
1981-3, solving T H x = b, 79
1982-1, L1U factorization, 96
1982-2, QL-factorization, 121
1982-3, QL-factorization, 121
1982-4, an A-norm inequality, 190
1983-1, L1U factorization update, 80
1983-2, a least squares problem, 217
1983-3, antisymmetric system, 310
1983-4, cg antisymmetric system, 311
1990-1, U1L factorization, 80
1991-3, Iterative method, 275
1993-2, periodic spline interpolation, 195
1994-2, upper Hessenberg system, 76
1995-4, A orthogonal bases, 190
1996-3, Cayley Hamilton Theorem, 148
2005-1, right or wrong?, 147
2005-2, singular values, 164
2006-1, yes or no, 330
2006-2, QR fact. of band matrices, 122
2008-1, Gauss-Seidel method, 276
2008-2, find QR factorization, 122
2008-3, eigenpairs of Kronecker prod, 234
2009-1, matrix products, 21
2009-2, Gershgorin disks, 331
2009-3, eigenvalues of tridiag matrix, 147
2010-1, Householder transformation, 120
2010-2, eigenvalue perturbations, 331
2011-1, steepest descent, 307
2011-2, polar decomposition, 165
2013-2, a Givens transformation, 123
2013-4, LSQ MATLAB program, 197
2015-1, underdetermined system, 166
2015-2, Cholesky update, 97
2015-3, Rayleigh quotient, 149
2016-1, Norms, Cholesky and SVD, 168
2016-2 Cholesky update, 97
2016-3, operation counts, 124
2017-1, Strassen multiplication, 20
2017-2, SVD , 220
2017-3, an iterative method, 192
2018-1, least square fit, 217
2018-2, Cholesky and Givens, 125
2018-3, AT A inner product, 308
Extension of basis, 104
F
Fast Fourier transform (FFT), 242, 244
recursive FFT, 246
Fill-in, 238
Finite difference method, 37
Fixed-point, 258
Fixed point form of discrete Poisson equation,
255
Fixed-point iteration, 258
Fourier matrix, 243
Fredholm’s alternative, 220
Frobenius norm, 161
G
Gaussian elimination, 59
complete pivoting, 70
interchange matrix, 67
multipliers, 60
pivot, 66
pivoting, 66
pivot row, 66
pivots, 60
pivot vector, 67
scaled partial pivoting, 70
Generalized eigenvectors, 134
Generalized inverse, 209
Geometric multiplicity, 132
Gerschgorin’s theorem, 318
Given’s rotation, 117
Gradient, 86, 351
Gradient method, 282
Gram-Schmidt, 103
H
Hadamard’s inequality, 121
Hessian, 86, 351
Hilbert matrix, 24, 203
Hoffman-Wielandt theorem, 143
Hölder’s inequality, 172, 187
Householder transformation, 107
I
Ill-conditioned problem, 181
Inequality
geometric/arithmetic mean, 187
Hölder, 187
Kantorovich, 296
Minkowski, 188
Inner product, 99, 100
inner product norm, 100
Pythagoras’ theorem, 102
standard, 11
standard inner product in Cn , 100
Inner product space
orthogonal basis, 102
orthonormal basis, 102
Interchange matrix, 67
Inverse power method, 339
Inverse triangle inequality, 173
Iterative method
convergence, 260
Gauss-Seidel, 254
Jacobi, 254
SOR, 254
SOR, convergence, 263
SSOR, 255
Iterative methods, 251
J
Jacobian, 352
Jordan factorization, 134
generalized eigenvectors, 134
Jordan block, 133
Jordan canonical form, 133
principal vectors, 134
Jordan factors, 134
K
Kronecker product, 229
eigenvectors, 231
inverse, 231
left product, 229
mixed product rule, 230
nonsingular, 231
positive definite, 231
properties, 231
right product, 229
symmetry, 231
transpose, 231
Kronecker sum, 230
nonsingular, 231
positive definite, 231
symmetry, 231
Krylov space, 289
L
Laplacian, 351
LDL theorem, 84
Leading principal block submatrices, 74
Leading principal minor, 60
Leading principal submatrices, 60
Least squares
error analysis, 210
Least squares problem, 199
Least squares solution, 199
Left eigenpair, 143
Left eigenvalue, 143
Left eigenvector, 143
Left triangular, 70
Linear combination
nontrivial, 6
span, 6
Linear system
Cramer’s rule, 17
existence and uniqueness, 12, 13
homogeneous, 12
overdetermined, 12
residual vector, 182
square, 12
underdetermined, 12
Linearly dependent, 6
Linearly independent, 6
LSQ, 199
LU factorization, 70
LU theorem, 72
M
Matrix
A∗ A, 86
addition, 3
adjoint, 17
adjoint formula for the inverse, 17
banded, 4
block matrix, 44
block triangular, 46
cofactor, 16
column space (span), 10
companion matrix, 146
computing inverse, 75
conjugate transpose, 3
deflation, 137
diagonal, 4
element-by-element operations, 3
Hadamard product, 3
Hermitian, 3, 43
Hilbert, 24
Hilbert matrix, inverse, 24
idempotent, 146
identity matrix, 3
ill-conditioned, 182
inverse, 13
invertible, 13, 14
Kronecker product, 229
LDL* and LL*, 88
leading principal minor, 60
leading principal submatrices, 60
left inverse, 13
left triangular, 4
lower Hessenberg, 4
lower triangular, 4
LU theorem, 72
multiplication, 3
negative (semi)definite, 86
Neumann series, 270
nilpotent, 146
nonsingular, 12, 13
nonsingular products, 13
nullity, 10
null space (N), 10
operator norm, 176
outer product, 54
outer product expansion, 54
permutation, 66
positive definite, 86
positive semidefinite, 86
principal minor, 60
principal submatrix, 60
product of triangular matrices, 47
quasi-triangular, 139
right inverse, 13
right triangular, 4
row space, 10
scalar multiplication, 3
Schur product, 3
second derivative, 38
Sherman-Morrison formula, 21
similar matrices, 131
similarity transformation, 131
singular, 12, 13
spectral radius, 261, 268
Strassen multiplication, 20
strictly diagonally dominant, 33
symmetric, 3
test matrix,1D , 41
test matrix, 2D, 229
trace, 19
transpose, 3
triangular, 47
tridiagonal, 4
unit triangular, 48
upper Hessenberg, 4
upper trapezoidal, 111
upper triangular, 4
vec Operation, 226
weakly diagonally dominant, 38
well-conditioned, 182
Matrix norm
consistent norm, 175
Frobenius norm, 161, 174
max norm, 174
operator norm, 175
spectral norm, 178
subordinate norm, 175
sum norm, 174
two-norm, 178
Minimal polynomial, 148
Minkowski’s inequality, 172, 188
Mixed product rule, 230
N
Natural ordering, 227
Negative (semi)definite, 86
Neumann series, 270
Nilpotent matrix, 146
Nondefective matrix, 129
Nonsingular matrix, 12
Nontrivial subspaces, 8
Norm, 171
l1 -norm, 172
l2 -norm, 172
l∞ -norm, 172
absolute norm, 180
continuity, 173
Euclidian norm, 172
infinity-norm, 172
max norm, 172
monotone norm, 180
one-norm, 172
triangle inequality, 171
two-norm, 172
Normal matrix, 137
Normal equations, 200
Null space (N), 10
O
Oblique projection, 104
1D test matrix, 41
Operation count, 64
Operator norm, 176
Optimal relaxation parameter, 265
Optimal step length, 282
Orthogonal matrix, 106
Orthogonal projection, 105
Orthogonal sum, 104
Outer product, 54
P
Paraboloid, 306
Parallelogram identity, 188
Partial pivoting, 69
Permutation matrix, 66
Perpendicular vectors, 102
Pivot vector, 67
Plane rotation, 117
p-norms, 172
Poisson matrix, 227
Poisson problem, 225
five point stencil, 227
nine point scheme, 234
Poisson matrix, 227
variable coefficients, 302
Poisson problem (1D), 37
Polarization identity, 197
Polynomial
degree, 6
linear interpolation, 28
nontrivial, 6
Runge phenomenon, 28
zero, 6
Positive definite, 86, 95
Positive semidefinite, 86, 95
Power method, 335
inverse, 339
Rayleigh quotient iteration, 340
shifted, 339
Preconditioned conjugate gradient method, 279
Preconditioning, 299
Preconditioning matrix, 258
Principal submatrix, 60
Principal vectors, 134
Q
QR algorithm
implicit shift, 346
Rayleigh quotient shift, 346
shifted, 345
Wilkinson shift, 346
QR decomposition, 114
QR factorization, 114
Quadratic form, 86
R
Rank of a matrix, 10
Rate of convergence, 266
Rayleigh quotient, 139
generalized, 150
Rayleigh quotient iteration, 340
Relative error, 181
Residual vector, 182
Richardson’s method, 261
Right eigenpair, 143
Right eigenvalue, 143
Right eigenvector, 143
Rotation in the i, j -plane, 118
Row operations, 16
Row space, 10
S
Scalar product, 99
Schur factorization, 136
Schur factors, 136
Second derivative matrix, 38
positive definite, 87
Semi-Cholesky factorization, 91
Sherman-Morrison formula, 21
Shifted power method, 339
Similar matrices, 131
Similarity transformation, 131
Singular value
Courant-Fischer theorem, 213
Hoffman-Wielandt theorem, 215
Singular value factorization (SVF), 157
Singular values, 153
error analysis, 213
well conditioned, 216
Singular vectors, 153
Spectral radius, 261, 268
Spectral theorem, 140
Spectrum, 18, 129
Splitting matrices for J, GS and SOR, 259
Splitting matrix, 258
Standard inner product in Cn , 106
Steepest descent, 282
Stencil, 227
Sum of subspaces, 104
Sums of integers, 76
Sylvester’s inertia theorem, 328
T
Trace, 19
Triangle inequality, 171
Triangular matrix
lower triangular, 70
upper triangular, 70
2D test matrix, 229
Two point boundary value problem, 37
U
Unitary matrix, 106
Unit vectors, 3
Upper trapezoidal matrix, 111
Upper triangular, 70
V
Variable coefficient, 313
Vector
addition, 5
angle, 102
linearly dependent, 6
linearly independent, 6
negative, 5
orthogonal, 102
orthonormal, 102
scalar multiplication, 5
zero, 5
Vector norm, 101, 171
Vector space
basis, 7
change of basis, 10
change of basis matrix, 9
complex, 5
complex inner product space, 100
dimension, 7
dimension formula, sums of subspaces, 9
direct sum, 8
enlarging vectors to a basis, 7
examples of subspaces, 8
existence of basis, 7
finite dimensional, 6
intersection, 8
nontrivial subspaces, 8
normed, 171
orthogonal vectors, 102
real, 5
span, 6
subspace, 8
sum, 8
trivial subspace, 8
union, 8
vectors, 5
Vectorization, 226
W
Weights, 203
Z
Zero function, 5